目标文件的格式 链接到标题
目前,Linux 平台流行的 可执行文件(Executable)主要包含以下格式:
- Linux 下的 ELF(Executable Linkable Format),注意这里是二进制文件其内容的组织格式,与后缀无关;
目标文件是源代码经过编译后但是未进行链接的那些中间文件(Linux 下为 .o
文件),它与可执行文件格式非常相似,一般与可执行文件一起采用同一种格式存储,Linux 下采用 ELF 文件格式。
动态链接库(Dynamic Linking Library)、静态链接库(Static Linking Library)均采用可执行文件格式存储,Linux 下均按照 ELF 格式存储。
- Linux 下的
.so
、.a
;
ELF 文件结构 链接到标题
CSAPP 上的 ELF 格式文件结构图:
更详细的 ELF 文件结构图:
可以看到,ELF 文件包含四个部分:
- 第一部分为 ELF Header;
- 第二部分为 Program Header Table,Relocatable object file 中该部分不存在,Executable object file 中该部分存在;
- 第三部分为 ELF Sections,包括
.text
、.rodata
、.data
、.bss
等; - 第四部分为 ELF Section Header Table(或称节头表,后面以 sht 指代),注意 sht 不像 ELF Header 那样只有一块,它由多个 Section header table entry 组成。
ELF 的 16 进制内容 链接到标题
elf.c
的内容如下:
unsigned long long data1 = 0xdddddddd11111111;
unsigned long long data2 = 0xdddddddd22222222;
void func1() {
}
void func2() {
}
执行 gcc -c elf.c
,会生成 elf.o
,执行 hexdump elf.o > elf.txt
,会将 elf.o
的内容以 16 进制形式存储在 elf.txt
中,elf.txt
的内容如下:
00000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 01 00 3e 00 01 00 00 00 00 00 00 00 00 00 00 00 |..>.............|
00000020 00 00 00 00 00 00 00 00 28 02 00 00 00 00 00 00 |........(.......|
00000030 00 00 00 00 40 00 00 00 00 00 40 00 0b 00 0a 00 |....@.....@.....|
00000040 55 48 89 e5 90 5d c3 55 48 89 e5 90 5d c3 00 00 |UH...].UH...]...|
00000050 11 11 11 11 dd dd dd dd 22 22 22 22 dd dd dd dd |........""""....|
00000060 00 47 43 43 3a 20 28 44 65 62 69 61 6e 20 31 32 |.GCC: (Debian 12|
00000070 2e 32 2e 30 2d 31 34 29 20 31 32 2e 32 2e 30 00 |.2.0-14) 12.2.0.|
00000080 14 00 00 00 00 00 00 00 01 7a 52 00 01 78 10 01 |.........zR..x..|
00000090 1b 0c 07 08 90 01 00 00 1c 00 00 00 1c 00 00 00 |................|
000000a0 00 00 00 00 07 00 00 00 00 41 0e 10 86 02 43 0d |.........A....C.|
000000b0 06 42 0c 07 08 00 00 00 1c 00 00 00 3c 00 00 00 |.B..........<...|
000000c0 00 00 00 00 07 00 00 00 00 41 0e 10 86 02 43 0d |.........A....C.|
000000d0 06 42 0c 07 08 00 00 00 00 00 00 00 00 00 00 00 |.B..............|
000000e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000f0 01 00 00 00 04 00 f1 ff 00 00 00 00 00 00 00 00 |................|
00000100 00 00 00 00 00 00 00 00 00 00 00 00 03 00 01 00 |................|
00000110 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000120 07 00 00 00 11 00 02 00 00 00 00 00 00 00 00 00 |................|
00000130 08 00 00 00 00 00 00 00 0d 00 00 00 11 00 02 00 |................|
00000140 08 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 |................|
00000150 13 00 00 00 12 00 01 00 00 00 00 00 00 00 00 00 |................|
00000160 07 00 00 00 00 00 00 00 19 00 00 00 12 00 01 00 |................|
00000170 07 00 00 00 00 00 00 00 07 00 00 00 00 00 00 00 |................|
00000180 00 65 6c 66 2e 63 00 64 61 74 61 31 00 64 61 74 |.elf.c.data1.dat|
00000190 61 32 00 66 75 6e 63 31 00 66 75 6e 63 32 00 00 |a2.func1.func2..|
000001a0 20 00 00 00 00 00 00 00 02 00 00 00 02 00 00 00 | ...............|
000001b0 00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 |........@.......|
000001c0 02 00 00 00 02 00 00 00 07 00 00 00 00 00 00 00 |................|
000001d0 00 2e 73 79 6d 74 61 62 00 2e 73 74 72 74 61 62 |..symtab..strtab|
000001e0 00 2e 73 68 73 74 72 74 61 62 00 2e 74 65 78 74 |..shstrtab..text|
000001f0 00 2e 64 61 74 61 00 2e 62 73 73 00 2e 63 6f 6d |..data..bss..com|
00000200 6d 65 6e 74 00 2e 6e 6f 74 65 2e 47 4e 55 2d 73 |ment..note.GNU-s|
00000210 74 61 63 6b 00 2e 72 65 6c 61 2e 65 68 5f 66 72 |tack..rela.eh_fr|
00000220 61 6d 65 00 00 00 00 00 00 00 00 00 00 00 00 00 |ame.............|
00000230 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000260 00 00 00 00 00 00 00 00 1b 00 00 00 01 00 00 00 |................|
00000270 06 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000280 40 00 00 00 00 00 00 00 0e 00 00 00 00 00 00 00 |@...............|
00000290 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 |................|
000002a0 00 00 00 00 00 00 00 00 21 00 00 00 01 00 00 00 |........!.......|
000002b0 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000002c0 50 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 |P...............|
000002d0 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 |................|
000002e0 00 00 00 00 00 00 00 00 27 00 00 00 08 00 00 00 |........'.......|
000002f0 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000300 60 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |`...............|
00000310 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 |................|
00000320 00 00 00 00 00 00 00 00 2c 00 00 00 01 00 00 00 |........,.......|
00000330 30 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |0...............|
00000340 60 00 00 00 00 00 00 00 20 00 00 00 00 00 00 00 |`....... .......|
00000350 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 |................|
00000360 01 00 00 00 00 00 00 00 35 00 00 00 01 00 00 00 |........5.......|
00000370 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000380 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000390 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 |................|
000003a0 00 00 00 00 00 00 00 00 4a 00 00 00 01 00 00 00 |........J.......|
000003b0 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000003c0 80 00 00 00 00 00 00 00 58 00 00 00 00 00 00 00 |........X.......|
000003d0 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 |................|
000003e0 00 00 00 00 00 00 00 00 45 00 00 00 04 00 00 00 |........E.......|
000003f0 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |@...............|
00000400 a0 01 00 00 00 00 00 00 30 00 00 00 00 00 00 00 |........0.......|
00000410 08 00 00 00 06 00 00 00 08 00 00 00 00 00 00 00 |................|
00000420 18 00 00 00 00 00 00 00 01 00 00 00 02 00 00 00 |................|
00000430 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000440 d8 00 00 00 00 00 00 00 a8 00 00 00 00 00 00 00 |................|
00000450 09 00 00 00 03 00 00 00 08 00 00 00 00 00 00 00 |................|
00000460 18 00 00 00 00 00 00 00 09 00 00 00 03 00 00 00 |................|
00000470 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000480 80 01 00 00 00 00 00 00 1f 00 00 00 00 00 00 00 |................|
00000490 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 |................|
000004a0 00 00 00 00 00 00 00 00 11 00 00 00 03 00 00 00 |................|
000004b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000004c0 d0 01 00 00 00 00 00 00 54 00 00 00 00 00 00 00 |........T.......|
000004d0 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 |................|
000004e0 00 00 00 00 00 00 00 00 |........|
000004e8
在 /usr/include/elf.h
中,有两个结构体 Elf64_Ehdr
和 Elf64_Shdr
,分别对应 ELF Header 和 Section header table entry,我们根据这两个结构体的声明,就能大致解析 elf.txt
文件。
typedef uint16_t Elf64_Half;
typedef uint32_t Elf64_Word;
typedef int32_t Elf64_Sword;
typedef uint64_t Elf64_Xword;
typedef int64_t Elf64_Sxword;
typedef uint64_t Elf64_Addr;
typedef uint64_t Elf64_Off;
typedef struct
{
Elf64_Word sh_name; /* Section name (string tbl index) */
Elf64_Word sh_type; /* Section type */
Elf64_Xword sh_flags; /* Section flags */
Elf64_Addr sh_addr; /* Section virtual addr at execution */
Elf64_Off sh_offset; /* Section file offset */ 为 0x0e = 14
Elf64_Xword sh_size; /* Section size in bytes */
Elf64_Word sh_link; /* Link to another section */
Elf64_Word sh_info; /* Additional section information */
Elf64_Xword sh_addralign; /* Section alignment */
Elf64_Xword sh_entsize; /* Entry size if section holds table */
} Elf64_Shdr
typedef struct
{
unsigned char e_ident[EI_NIDENT]; /* Magic number and other info */
Elf64_Half e_type; /* Object file type */
Elf64_Half e_machine; /* Architecture */
Elf64_Word e_version; /* Object file version */
Elf64_Addr e_entry; /* Entry point virtual address */
Elf64_Off e_phoff; /* Program header table file offset */
Elf64_Off e_shoff; /* Section header table file offset */
Elf64_Word e_flags; /* Processor-specific flags */
Elf64_Half e_ehsize; /* ELF header size in bytes */
Elf64_Half e_phentsize; /* Program header table entry size */
Elf64_Half e_phnum; /* Program header table entry count */
Elf64_Half e_shentsize; /* Section header table entry size */
Elf64_Half e_shnum; /* Section header table entry count */
Elf64_Half e_shstrndx; /* Section header string table index */
} Elf64_Ehdr;
解析 ELF 文件 链接到标题
elf.txt
第一行 00000000
对应的内容为 magic number,又称魔数,标识该文件格式,例如 0x7f
说明是 ELF
格式,0x45
、0x4c
、0x46
分别是 E
、 L
、F
的 ascii 码。
第二行 00000010
前 8 个字节分别表示 e_type
、e_machine
、e_version
,e_type
为 0x01
表示为 Relocatable object file,e_machine
为 0x3e
表示为 x86_64
机器,后 8 个字节为 e_entry
,规定 ELF
程序的入口虚拟地址,操作系统在加载完程序之后从这个地址开始执行进程的指令,Relocatable object file 一般没有入口地址,该值为 0。
第三行 00000020
前 8 个字节为 e_phoff
,表示文件中 Program Header Table 的偏移量,对 Relocatable object file 来说无意义,为 0;后 8 个字节为 e_shoff
,注意是小端机,其值为 0x228
,说明 sht 的偏移量为 0x228
(也可以认为从 0x228
处起为 sht)。
第四行 00000040
前 4 个字节为 e_flags
,我们不关心;5、6 字节为 e_ehsize
,为 0x40
,说明 ELF Header 占据大小为 40 字节;7、8 字节为 e_phentsize
,为每个 Program header table entry 占据的大小,由于是 Relocatable object file,这里为 0;9、10 字节为 Program header table entry 的个数,也为 0;11、12 字节为 sht entry 的大小,为 0x40
,即 64 字节;13、14 字节为 sht entry 的个数,为 0x0b
;最后两个字节为 e_shstrndx
,值为 0xa
,指示了节名称字符串表在 sht 中的位置。
到这里,前四行的内容即为 ELF Header 的所有内容,我们已经可以计算出该 ELF 格式文件的大小了,为
$size = e_shoff + e_shentsize * e_shnum = \texttt{0x228} + \texttt{0x40} * \texttt{0x0b} = \texttt{0x4e8}$,elf.txt
文件的最后一行正好是 0x4e8
。
我们也可以执行 readelf -h elf.o
,其内容如下:
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: REL (Relocatable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x0
Start of program headers: 0 (bytes into file)
Start of section headers: 552 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 0 (bytes)
Number of program headers: 0
Size of section headers: 64 (bytes)
Number of section headers: 11
Section header string table index: 10
与我们之前的分析也能够对应起来。
Section header table 解析 链接到标题
首先,执行 readelf -S elf.o
,输出如下:
There are 11 section headers, starting at offset 0x228:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS 0000000000000000 00000040
000000000000000e 0000000000000000 AX 0 0 1
[ 2] .data PROGBITS 0000000000000000 00000050
0000000000000010 0000000000000000 WA 0 0 8
[ 3] .bss NOBITS 0000000000000000 00000060
0000000000000000 0000000000000000 WA 0 0 1
[ 4] .comment PROGBITS 0000000000000000 00000060
0000000000000020 0000000000000001 MS 0 0 1
[ 5] .note.GNU-stack PROGBITS 0000000000000000 00000080
0000000000000000 0000000000000000 0 0 1
[ 6] .eh_frame PROGBITS 0000000000000000 00000080
0000000000000058 0000000000000000 A 0 0 8
[ 7] .rela.eh_frame RELA 0000000000000000 000001a0
0000000000000030 0000000000000018 I 8 6 8
[ 8] .symtab SYMTAB 0000000000000000 000000d8
00000000000000a8 0000000000000018 9 3 8
[ 9] .strtab STRTAB 0000000000000000 00000180
000000000000001f 0000000000000000 0 0 1
[10] .shstrtab STRTAB 0000000000000000 000001d0
0000000000000054 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
D (mbind), l (large), p (processor specific)
首先,从之前的 ELF Header 我们可以得知,sht 是从 0x228
开始的,sht entry 的 size 为 0x40
,第零号 entry
的 name
为空,其他值也全都是 0,我们可以看第一号 entry
。
前 4 个字节为 sh_name
,值为 0x1b
,表示的是它所代表的字符串在 section header string table
中的起始位置的偏移(或者说索引),我们可以看到,.shstrtab
的 offset
为 0x1d0
,0x1d0 + 0x1b
的位置,以该位置起始的字符串为 0x2e
、0x74
、0x65
、0x78
、0x74
,其 ascii 码对应的字符组成的字符串为 .text
。
5~8 四个字节为 sh_type
,值为 0x1
;
9~16 这八个字节为 sh_flags
,值为0x6
,对应二进制为 $110_{(2)}$,要从二进制掩码的角度考虑这个值,通过 sh_flags
和 sh_type
即可确定该 Section 是 PROGBITS
类型,具有可写和可分配的属性;
17~24 这八个字节为 sh_addr
,为 Section 在内存中的虚拟地址,对于 Relocatable object file 来说,该值没什么意义,因此为 0;
25~32 这八个字节为 sh_offset
,即该 sht 对应的 Section table 的在文件中的 offset
,值为 0x40
;
33~40 这八个字节为 sh_size
,表示 Sectionn table 的大小,这里为 0x0e
;
41~48 这八个字节为 sh_link
和 sh_info
,这里对其意义暂时不予讨论;
49~56 这八个字节为 sh_addralign
,表示节头表的对齐方式;
59~64 这八个字节为 sh_entsize
,表示 Section table 的单个表项的大小,这里值为 0x0
,没有意义;
我们可以查看 .symtab
sht 的 sh_entsize
为 0x18
,其 sh_size
为 0xa8
,$\frac{\texttt{0xa8}}{\texttt{0x18}} = \texttt{0x7}$,所以 .symtab
Section table 一共有七个表项。
执行 readelf -s elf.o
,输出为
Symbol table '.symtab' contains 7 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS elf.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1 .text
3: 0000000000000000 8 OBJECT GLOBAL DEFAULT 2 data1
4: 0000000000000008 8 OBJECT GLOBAL DEFAULT 2 data2
5: 0000000000000000 7 FUNC GLOBAL DEFAULT 1 func1
6: 0000000000000007 7 FUNC GLOBAL DEFAULT 1 func2
输出与计算结果一致。