I am very interested in the underlying mechanisms by which code is built and executed. Every day I use executables and yet I have no real understanding of how they work. I began reading about the ELF file format: how it's structured, what the sections are, how linking works and so on. I found a couple of resources that helped me understand a lot. The first is a PDF called "libelf by Example" by J. Koshy, which describes how to use the libelf library to read an ELF file and extract its contents. The second is an excellent series of posts by Ian Lance Taylor about how linkers work. I copy-pasted his series into a single huge document and you can read it here. After reading I became curious about exactly how relocation works when statically linking an object file into an executable. I figured there must be a minimal object and minimal executable I could create where I could really "see" the relocation happening. This is that process.
We are going to create a simple object file (obj1.o) and link it into a binary, observing the relocation table in the object file and how the program linker performs the relocation when including the object into the binary.
Consider the following source file, obj1.c. It has no includes:
int obj1_global_int;
void obj1_global_int_increment_func() {
    obj1_global_int++;
}
Compile it into an object file with:
cc -c -o obj1.o obj1.c
This is the full output of readelf -a, including the relocation sections:
ceade@ncc1701:$ readelf -a obj1.o
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: REL (Relocatable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x0
Start of program headers: 0 (bytes into file)
Start of section headers: 656 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 0 (bytes)
Number of program headers: 0
Size of section headers: 64 (bytes)
Number of section headers: 12
Section header string table index: 11
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS 0000000000000000 00000040
0000000000000016 0000000000000000 AX 0 0 1
[ 2] .rela.text RELA 0000000000000000 000001e8
0000000000000030 0000000000000018 I 9 1 8
[ 3] .data PROGBITS 0000000000000000 00000056
0000000000000000 0000000000000000 WA 0 0 1
[ 4] .bss NOBITS 0000000000000000 00000056
0000000000000000 0000000000000000 WA 0 0 1
[ 5] .comment PROGBITS 0000000000000000 00000056
000000000000002e 0000000000000001 MS 0 0 1
[ 6] .note.GNU-stack PROGBITS 0000000000000000 00000084
0000000000000000 0000000000000000 0 0 1
[ 7] .eh_frame PROGBITS 0000000000000000 00000088
0000000000000038 0000000000000000 A 0 0 8
[ 8] .rela.eh_frame RELA 0000000000000000 00000218
0000000000000018 0000000000000018 I 9 7 8
[ 9] .symtab SYMTAB 0000000000000000 000000c0
00000000000000f0 0000000000000018 10 8 8
[10] .strtab STRTAB 0000000000000000 000001b0
0000000000000037 0000000000000000 0 0 1
[11] .shstrtab STRTAB 0000000000000000 00000230
0000000000000059 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
l (large), p (processor specific)
There are no section groups in this file.
There are no program headers in this file.
Relocation section '.rela.text' at offset 0x1e8 contains 2 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000000006 000800000002 R_X86_64_PC32 0000000000000004 obj1_global_int - 4
00000000000f 000800000002 R_X86_64_PC32 0000000000000004 obj1_global_int - 4
Relocation section '.rela.eh_frame' at offset 0x218 contains 1 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000000020 000200000002 R_X86_64_PC32 0000000000000000 .text + 0
The decoding of unwind sections for machine type Advanced Micro Devices X86-64 is not currently supported.
Symbol table '.symtab' contains 10 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS obj1.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 3
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
5: 0000000000000000 0 SECTION LOCAL DEFAULT 6
6: 0000000000000000 0 SECTION LOCAL DEFAULT 7
7: 0000000000000000 0 SECTION LOCAL DEFAULT 5
8: 0000000000000004 4 OBJECT GLOBAL DEFAULT COM obj1_global_int
9: 0000000000000000 22 FUNC GLOBAL DEFAULT 1 obj1_global_int_increment
No version information found in this file.
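So that's pretty straightforward. The symbol table was larger than I expected, mostly because of the SECTION symbols, and I mean to dig into why that is and maybe write it up. Note that the function appears in the symbol table as obj1_global_int_increment rather than obj1_global_int_increment_func; this looks like readelf truncating long names in its default output (readelf -W prints them in full). Let's go through all the sections in the file:
The "NULL" section. Every ELF file has a null section header at index 0, because that index is reserved to mean "undefined":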
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
Next is the program code. Its size is 0x16 (22) bytes, which matches the disassembly below: the last instruction is the single byte at offset 0x15:
[ 1] .text PROGBITS 0000000000000000 00000040
0000000000000016 0000000000000000 AX 0 0 1
The relocation table. This contains the relocations to be performed when the object is included into an executable (more details below):
[ 2] .rela.text RELA 0000000000000000 000001e8
0000000000000030 0000000000000018 I 9 1 8
An empty .data section. It would hold initialized global data, but this file has none:
[ 3] .data PROGBITS 0000000000000000 00000056
0000000000000000 0000000000000000 WA 0 0 1
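A .bss section. From the Linkers tutorial: "A section which takes up memory space but has no associated contents. This is used for zero-initialized data." It is empty here because obj1_global_int is recorded as a common symbol (Ndx COM in the symbol table), so the linker allocates its storage at link time: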
[ 4] .bss NOBITS 0000000000000000 00000056
0000000000000000 0000000000000000 WA 0 0 1
From Quora "It's used to hold comments about the generated ELF (details such as compiler version and execution platform).":
[ 5] .comment PROGBITS 0000000000000000 00000056
000000000000002e 0000000000000001 MS 0 0 1
This is an empty section whose only purpose is to indicate whether the stack needs to be executable. The linker uses the flags on this section (here there is no X flag) to set the permissions of the stack in the final executable:
[ 6] .note.GNU-stack PROGBITS 0000000000000000 00000084
0000000000000000 0000000000000000 0 0 1
The .eh_frame section holds call frame information used to unwind the stack, for example for C++ exception handling and backtraces:
[ 7] .eh_frame PROGBITS 0000000000000000 00000088
0000000000000038 0000000000000000 A 0 0 8
The relocations for .eh_frame (its Info field of 7 points at section 7):
[ 8] .rela.eh_frame RELA 0000000000000000 00000218
0000000000000018 0000000000000018 I 9 7 8
The symbol table for the object:
[ 9] .symtab SYMTAB 0000000000000000 000000c0
00000000000000f0 0000000000000018 10 8 8
A string table for storing strings like "obj1_global_int" and "obj1_global_int_increment_func":
[10] .strtab STRTAB 0000000000000000 000001b0
0000000000000037 0000000000000000 0 0 1
The string table for the section headers, storing strings like ".symtab", ".rela.text" etc:
[11] .shstrtab STRTAB 0000000000000000 00000230
0000000000000059 0000000000000000 0 0 1
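The EntSize column above is the size of one fixed-size record in a section, so dividing Size by EntSize gives the number of entries. Here is a minimal sketch of that arithmetic. The type names rela64_t and sym64_t and the helper entry_count are my own, made up for illustration; the field layouts are meant to mirror Elf64_Rela and Elf64_Sym from <elf.h>, and both come out at 0x18 bytes on a typical x86-64 compiler.

```c
#include <stdint.h>

/* Hypothetical local copies of the ELF64 record layouts. The real
 * definitions are Elf64_Rela and Elf64_Sym in <elf.h>. */
typedef struct {
    uint64_t r_offset;  /* where in the target section to patch */
    uint64_t r_info;    /* symbol index and relocation type */
    int64_t  r_addend;  /* constant added to the symbol value */
} rela64_t;

typedef struct {
    uint32_t st_name;   /* offset of the name in .strtab */
    uint8_t  st_info;   /* type and binding */
    uint8_t  st_other;  /* visibility */
    uint16_t st_shndx;  /* section index (the Ndx column) */
    uint64_t st_value;  /* the Value column */
    uint64_t st_size;   /* the Size column */
} sym64_t;

/* Number of fixed-size records in a section. Returns 0 for sections
 * that have no EntSize, such as .text. */
static inline uint64_t entry_count(uint64_t size, uint64_t entsize) {
    return entsize == 0 ? 0 : size / entsize;
}
```

With the numbers from the section headers, entry_count(0x30, 0x18) gives the 2 entries of .rela.text and entry_count(0xf0, 0x18) gives the 10 entries of .symtab.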
Relocation is the process of modifying machine instructions so that address references are fixed up during linking. When the object file is created, the compiler generates a relocation section that describes the location of the bytes that need to be changed when the machine code is copied into the binary. The relocation section ".rela.text" contains the code relocations to be performed when inserting the object code into an executable. The offsets are 0x06 and 0x0f (6 and 15). The program linker copies the object code in the ".text" section into the executable and then needs to adjust any code that refers to obj1_global_int so that it points at the variable's final address.
Relocation section '.rela.text' at offset 0x1e8 contains 2 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000000006 000800000002 R_X86_64_PC32 0000000000000004 obj1_global_int - 4
00000000000f 000800000002 R_X86_64_PC32 0000000000000004 obj1_global_int - 4
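The Info column packs two values into one 64-bit number: the upper 32 bits are an index into the symbol table and the lower 32 bits are the relocation type. A small sketch of the unpacking follows. The function names rela_sym and rela_type and the constant RELOC_X86_64_PC32 are mine; they are intended to behave like the ELF64_R_SYM and ELF64_R_TYPE macros and the R_X86_64_PC32 constant from <elf.h>.

```c
#include <stdint.h>

/* Upper 32 bits of r_info: index into .symtab. */
static inline uint32_t rela_sym(uint64_t info) {
    return (uint32_t)(info >> 32);
}

/* Lower 32 bits of r_info: relocation type. */
static inline uint32_t rela_type(uint64_t info) {
    return (uint32_t)(info & 0xffffffffu);
}

/* Numeric value of R_X86_64_PC32 in the x86-64 ABI. */
enum { RELOC_X86_64_PC32 = 2 };
```

For the Info value 000800000002 this yields symbol index 8, which is obj1_global_int in the symbol table above, and type 2, which is R_X86_64_PC32.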
Let's look at the disassembly of the object code (objdump -d obj1.o):
obj1.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <obj1_global_int_increment_func>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # a <obj1_global_int_increment_func+0xa>
a: 83 c0 01 add $0x1,%eax
d: 89 05 00 00 00 00 mov %eax,0x0(%rip) # 13 <obj1_global_int_increment_func+0x13>
13: 90 nop
14: 5d pop %rbp
15: c3 retq
Here we can see the function moving the value of obj1_global_int into %eax, adding 0x01 to it and moving that value back into obj1_global_int. The relocations are at offset 0x06 and 0x0f. These correspond to these two instructions:
8b 05 00 00 00 00
89 05 00 00 00 00
%rip is the instruction pointer register. An operand like 0x0(%rip) uses RIP-relative addressing: it refers to a memory location at a fixed displacement from the address of the next instruction. In the disassembly above the displacement is 0x0, which is only a placeholder and is obviously not going to work. The relocations tell the linker to fill in the real displacement when the object is linked into the executable. Let's see what it looks like when linked into a binary. Here is main.c:
#include "obj1.h"
int main(int argc, char** argv) {
    obj1_global_int_increment_func();
    return 0;
}
And compiled (gcc -o main obj1.o main.c). Now let's disassemble the executable and find the part containing the included obj1.o (objdump -d main):
0000000000000660 <obj1_global_int_increment_func>:
660: 55 push %rbp
661: 48 89 e5 mov %rsp,%rbp
664: 8b 05 c2 09 20 00 mov 0x2009c2(%rip),%eax # 20102c <obj1_global_int>
66a: 83 c0 01 add $0x1,%eax
66d: 89 05 b9 09 20 00 mov %eax,0x2009b9(%rip) # 20102c <obj1_global_int>
673: 90 nop
674: 5d pop %rbp
675: c3 retq
First we can see that the object code has a different address in the program memory space (0x660 to 0x675, rather than 0x0 to 0x15). If we look at the two instructions we observed before we can see the relocations have been made so the two previous instructions are now:
8b 05 c2 09 20 00
89 05 b9 09 20 00
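To read those bytes by hand: the first two bytes of each instruction are the opcode and addressing mode, and the remaining four are a signed 32-bit displacement stored little endian. The sketch below decodes the displacement and adds it to the address of the following instruction, which is how the CPU resolves a RIP-relative operand. The helper names read_disp32 and rip_target are made up for this post.

```c
#include <stdint.h>

/* Read a signed 32-bit little-endian displacement from instruction bytes. */
static inline int32_t read_disp32(const uint8_t *p) {
    uint32_t v = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return (int32_t)v;
}

/* Address a RIP-relative operand refers to. The displacement is added
 * to the address of the next instruction, that is insn_addr + insn_len. */
static inline uint64_t rip_target(uint64_t insn_addr, uint64_t insn_len,
                                  int32_t disp) {
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}
```

Both instructions are 6 bytes long. The load at 0x664 carries the displacement 0x2009c2 and the store at 0x66d carries 0x2009b9, and rip_target maps both to 0x20102c, the address objdump prints for obj1_global_int.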
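The program linker has determined the final addresses of the code and of obj1_global_int, and written the distance between them into each instruction as a displacement relative to the instruction pointer. The displacement bytes are stored little endian, so c2 09 20 00 is 0x002009c2, a positive offset and not a two's complement negative one. The comment objdump prints confirms this: the first instruction ends at 0x66a, and 0x66a + 0x2009c2 = 0x20102c, the address of obj1_global_int. The offset looks big simply because the linker placed the variable in the writable data area of the executable, roughly 2 MiB above the code.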
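The values the linker wrote can be reproduced from the relocation entries. As far as I can tell from the x86-64 ABI, an R_X86_64_PC32 relocation is computed as S + A - P, where S is the final address of the symbol, A is the addend from the relocation entry and P is the final address of the bytes being patched. The function name reloc_pc32 below is my own; this is a sketch of the formula, not the linker's actual code.

```c
#include <stdint.h>

/* R_X86_64_PC32: value = S + A - P, truncated to 32 bits.
 *   S = final address of the symbol
 *   A = addend from the relocation entry
 *   P = final address of the bytes being patched */
static inline int32_t reloc_pc32(uint64_t S, int64_t A, uint64_t P) {
    return (int32_t)(uint32_t)(S + (uint64_t)A - P);
}
```

In the executable the function starts at 0x660, so the two patch sites are at 0x660 + 0x06 = 0x666 and 0x660 + 0x0f = 0x66f, and S is 0x20102c. With the addend of -4, reloc_pc32 gives 0x2009c2 and 0x2009b9, matching the patched bytes. The -4 is there because P points at the 4-byte displacement field, while the CPU measures the displacement from the end of the instruction, which is 4 bytes further on.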