Object file formats

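Currently, the dominant executable format on Linux is:

  • ELF (Executable and Linkable Format). Note that this refers to how the contents of the binary file are organized and has nothing to do with the file extension.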

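An object file is the intermediate file produced by compiling source code without linking it (a .o file on Linux). Its layout is very similar to that of an executable, so it is usually stored in the same format as executables; on Linux that format is ELF.

Shared libraries (dynamically linked libraries) are also stored in the executable file format, which on Linux is ELF. Static libraries are slightly different: a static library is an ar archive whose members are ELF object files, so the archive itself is a container rather than a single ELF file.

  • On Linux these are .so (shared) and .a (static) files.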

ELF file structure

[Figure: ELF file layout, from CSAPP]

[Figure: a more detailed ELF file layout]

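As shown, an ELF file consists of four parts:

  • The first part is the ELF Header;
  • The second part is the Program Header Table. It is absent from a relocatable object file and present in an executable object file;
  • The third part is the ELF sections, including .text, .rodata, .data, .bss, and so on;
  • The fourth part is the ELF Section Header Table (abbreviated as sht below). Unlike the ELF Header, which is a single block, the sht is made up of multiple section header table entries.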

Hex contents of an ELF file

The contents of elf.c are as follows:

unsigned long long data1 = 0xdddddddd11111111;
unsigned long long data2 = 0xdddddddd22222222;
void func1() {
}
void func2() {
}

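Running gcc -c elf.c produces elf.o. Running hexdump -C elf.o > elf.txt then stores the contents of elf.o in hexadecimal form in elf.txt (the -C option produces the offset, hex bytes, and ASCII columns shown here). The contents of elf.txt are as follows: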

00000000  7f 45 4c 46 02 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  01 00 3e 00 01 00 00 00  00 00 00 00 00 00 00 00  |..>.............|
00000020  00 00 00 00 00 00 00 00  28 02 00 00 00 00 00 00  |........(.......|
00000030  00 00 00 00 40 00 00 00  00 00 40 00 0b 00 0a 00  |....@.....@.....|
00000040  55 48 89 e5 90 5d c3 55  48 89 e5 90 5d c3 00 00  |UH...].UH...]...|
00000050  11 11 11 11 dd dd dd dd  22 22 22 22 dd dd dd dd  |........""""....|
00000060  00 47 43 43 3a 20 28 44  65 62 69 61 6e 20 31 32  |.GCC: (Debian 12|
00000070  2e 32 2e 30 2d 31 34 29  20 31 32 2e 32 2e 30 00  |.2.0-14) 12.2.0.|
00000080  14 00 00 00 00 00 00 00  01 7a 52 00 01 78 10 01  |.........zR..x..|
00000090  1b 0c 07 08 90 01 00 00  1c 00 00 00 1c 00 00 00  |................|
000000a0  00 00 00 00 07 00 00 00  00 41 0e 10 86 02 43 0d  |.........A....C.|
000000b0  06 42 0c 07 08 00 00 00  1c 00 00 00 3c 00 00 00  |.B..........<...|
000000c0  00 00 00 00 07 00 00 00  00 41 0e 10 86 02 43 0d  |.........A....C.|
000000d0  06 42 0c 07 08 00 00 00  00 00 00 00 00 00 00 00  |.B..............|
000000e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000f0  01 00 00 00 04 00 f1 ff  00 00 00 00 00 00 00 00  |................|
00000100  00 00 00 00 00 00 00 00  00 00 00 00 03 00 01 00  |................|
00000110  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000120  07 00 00 00 11 00 02 00  00 00 00 00 00 00 00 00  |................|
00000130  08 00 00 00 00 00 00 00  0d 00 00 00 11 00 02 00  |................|
00000140  08 00 00 00 00 00 00 00  08 00 00 00 00 00 00 00  |................|
00000150  13 00 00 00 12 00 01 00  00 00 00 00 00 00 00 00  |................|
00000160  07 00 00 00 00 00 00 00  19 00 00 00 12 00 01 00  |................|
00000170  07 00 00 00 00 00 00 00  07 00 00 00 00 00 00 00  |................|
00000180  00 65 6c 66 2e 63 00 64  61 74 61 31 00 64 61 74  |.elf.c.data1.dat|
00000190  61 32 00 66 75 6e 63 31  00 66 75 6e 63 32 00 00  |a2.func1.func2..|
000001a0  20 00 00 00 00 00 00 00  02 00 00 00 02 00 00 00  | ...............|
000001b0  00 00 00 00 00 00 00 00  40 00 00 00 00 00 00 00  |........@.......|
000001c0  02 00 00 00 02 00 00 00  07 00 00 00 00 00 00 00  |................|
000001d0  00 2e 73 79 6d 74 61 62  00 2e 73 74 72 74 61 62  |..symtab..strtab|
000001e0  00 2e 73 68 73 74 72 74  61 62 00 2e 74 65 78 74  |..shstrtab..text|
000001f0  00 2e 64 61 74 61 00 2e  62 73 73 00 2e 63 6f 6d  |..data..bss..com|
00000200  6d 65 6e 74 00 2e 6e 6f  74 65 2e 47 4e 55 2d 73  |ment..note.GNU-s|
00000210  74 61 63 6b 00 2e 72 65  6c 61 2e 65 68 5f 66 72  |tack..rela.eh_fr|
00000220  61 6d 65 00 00 00 00 00  00 00 00 00 00 00 00 00  |ame.............|
00000230  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000260  00 00 00 00 00 00 00 00  1b 00 00 00 01 00 00 00  |................|
00000270  06 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000280  40 00 00 00 00 00 00 00  0e 00 00 00 00 00 00 00  |@...............|
00000290  00 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  |................|
000002a0  00 00 00 00 00 00 00 00  21 00 00 00 01 00 00 00  |........!.......|
000002b0  03 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000002c0  50 00 00 00 00 00 00 00  10 00 00 00 00 00 00 00  |P...............|
000002d0  00 00 00 00 00 00 00 00  08 00 00 00 00 00 00 00  |................|
000002e0  00 00 00 00 00 00 00 00  27 00 00 00 08 00 00 00  |........'.......|
000002f0  03 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000300  60 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |`...............|
00000310  00 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  |................|
00000320  00 00 00 00 00 00 00 00  2c 00 00 00 01 00 00 00  |........,.......|
00000330  30 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |0...............|
00000340  60 00 00 00 00 00 00 00  20 00 00 00 00 00 00 00  |`....... .......|
00000350  00 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  |................|
00000360  01 00 00 00 00 00 00 00  35 00 00 00 01 00 00 00  |........5.......|
00000370  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000380  80 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000390  00 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  |................|
000003a0  00 00 00 00 00 00 00 00  4a 00 00 00 01 00 00 00  |........J.......|
000003b0  02 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000003c0  80 00 00 00 00 00 00 00  58 00 00 00 00 00 00 00  |........X.......|
000003d0  00 00 00 00 00 00 00 00  08 00 00 00 00 00 00 00  |................|
000003e0  00 00 00 00 00 00 00 00  45 00 00 00 04 00 00 00  |........E.......|
000003f0  40 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |@...............|
00000400  a0 01 00 00 00 00 00 00  30 00 00 00 00 00 00 00  |........0.......|
00000410  08 00 00 00 06 00 00 00  08 00 00 00 00 00 00 00  |................|
00000420  18 00 00 00 00 00 00 00  01 00 00 00 02 00 00 00  |................|
00000430  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000440  d8 00 00 00 00 00 00 00  a8 00 00 00 00 00 00 00  |................|
00000450  09 00 00 00 03 00 00 00  08 00 00 00 00 00 00 00  |................|
00000460  18 00 00 00 00 00 00 00  09 00 00 00 03 00 00 00  |................|
00000470  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000480  80 01 00 00 00 00 00 00  1f 00 00 00 00 00 00 00  |................|
00000490  00 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  |................|
000004a0  00 00 00 00 00 00 00 00  11 00 00 00 03 00 00 00  |................|
000004b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000004c0  d0 01 00 00 00 00 00 00  54 00 00 00 00 00 00 00  |........T.......|
000004d0  00 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  |................|
000004e0  00 00 00 00 00 00 00 00                           |........|
000004e8

/usr/include/elf.h declares two structs, Elf64_Ehdr and Elf64_Shdr, which correspond to the ELF Header and a section header table entry respectively. Using these two declarations we can roughly parse elf.txt.

typedef uint16_t Elf64_Half;
typedef uint32_t Elf64_Word;
typedef	int32_t  Elf64_Sword;
typedef uint64_t Elf64_Xword;
typedef	int64_t  Elf64_Sxword;
typedef uint64_t Elf64_Addr;
typedef uint64_t Elf64_Off;
typedef struct
{
  Elf64_Word	sh_name;		/* Section name (string tbl index) */
  Elf64_Word	sh_type;		/* Section type */
  Elf64_Xword	sh_flags;		/* Section flags */
  Elf64_Addr	sh_addr;		/* Section virtual addr at execution */
  Elf64_Off	sh_offset;		/* Section file offset */
  Elf64_Xword	sh_size;		/* Section size in bytes */
  Elf64_Word	sh_link;		/* Link to another section */
  Elf64_Word	sh_info;		/* Additional section information */
  Elf64_Xword	sh_addralign;		/* Section alignment */
  Elf64_Xword	sh_entsize;		/* Entry size if section holds table */
} Elf64_Shdr;

typedef struct
{
  unsigned char	e_ident[EI_NIDENT];	/* Magic number and other info */
  Elf64_Half	e_type;			/* Object file type */
  Elf64_Half	e_machine;		/* Architecture */
  Elf64_Word	e_version;		/* Object file version */
  Elf64_Addr	e_entry;		/* Entry point virtual address */
  Elf64_Off	e_phoff;		/* Program header table file offset */
  Elf64_Off	e_shoff;		/* Section header table file offset */
  Elf64_Word	e_flags;		/* Processor-specific flags */
  Elf64_Half	e_ehsize;		/* ELF header size in bytes */
  Elf64_Half	e_phentsize;		/* Program header table entry size */
  Elf64_Half	e_phnum;		/* Program header table entry count */
  Elf64_Half	e_shentsize;		/* Section header table entry size */
  Elf64_Half	e_shnum;		/* Section header table entry count */
  Elf64_Half	e_shstrndx;		/* Section header string table index */
} Elf64_Ehdr;
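As a quick sanity check on these declarations, the sketch below mirrors the two structs with fixed-width types and reports their sizes and two field offsets. The type names `Ehdr64` and `Shdr64` and the helper functions are hypothetical stand-ins written for this illustration, not glibc's definitions, and the numbers assume natural alignment as on x86-64 Linux.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of Elf64_Ehdr; field order copied from the listing above.
 * EI_NIDENT is 16 in elf.h. */
typedef struct {
    unsigned char e_ident[16];
    uint16_t e_type;
    uint16_t e_machine;
    uint32_t e_version;
    uint64_t e_entry;
    uint64_t e_phoff;
    uint64_t e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize;
    uint16_t e_phentsize;
    uint16_t e_phnum;
    uint16_t e_shentsize;
    uint16_t e_shnum;
    uint16_t e_shstrndx;
} Ehdr64;

/* Hypothetical mirror of Elf64_Shdr. */
typedef struct {
    uint32_t sh_name;
    uint32_t sh_type;
    uint64_t sh_flags;
    uint64_t sh_addr;
    uint64_t sh_offset;
    uint64_t sh_size;
    uint32_t sh_link;
    uint32_t sh_info;
    uint64_t sh_addralign;
    uint64_t sh_entsize;
} Shdr64;

/* Sizes of the two structs in bytes. */
size_t ehdr_size(void) { return sizeof(Ehdr64); }
size_t shdr_size(void) { return sizeof(Shdr64); }

/* Byte offsets of two fields that the walkthrough reads from the dump. */
size_t ehdr_shoff_offset(void) { return offsetof(Ehdr64, e_shoff); }
size_t shdr_offset_offset(void) { return offsetof(Shdr64, sh_offset); }
```

Under those assumptions both structs are 64 (0x40) bytes, e_shoff sits at byte offset 0x28 of the header, and sh_offset sits at byte offset 24 of an entry.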

Parsing the ELF file

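The first line of elf.txt (offset 00000000) holds the 16-byte e_ident array. Its first four bytes are the magic number that identifies the file format: 0x7f, followed by 0x45, 0x4c, and 0x46, the ASCII codes of E, L, and F. The next three bytes are 0x02 (64-bit class), 0x01 (little endian), and 0x01 (ELF version 1), and the rest are 0.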
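The check on the identification bytes can be sketched in C as follows. The function names and the `sample_ident` array are hypothetical and exist only for this illustration; the byte values are copied from the first line of the dump.

```c
#include <stdint.h>
#include <string.h>

/* First 16 bytes of elf.o, copied from offset 0x00 of the dump. */
const uint8_t sample_ident[16] = {
    0x7f, 0x45, 0x4c, 0x46, 0x02, 0x01, 0x01, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
};

/* Returns 1 if the buffer starts with 0x7f 'E' 'L' 'F', otherwise 0. */
int has_elf_magic(const uint8_t *buf, size_t len)
{
    static const uint8_t magic[4] = { 0x7f, 'E', 'L', 'F' };
    return len >= 4 && memcmp(buf, magic, 4) == 0;
}

/* Byte 4 is the class: 1 means 32-bit, 2 means 64-bit.
 * Returns 0 for any other value. */
int elf_class_bits(const uint8_t *ident)
{
    if (ident[4] == 1) return 32;
    if (ident[4] == 2) return 64;
    return 0;
}

/* Byte 5 is the data encoding: 1 means little endian, 2 means big endian. */
int elf_is_little_endian(const uint8_t *ident)
{
    return ident[5] == 1;
}
```

For the sample bytes this reports a 64-bit, little-endian ELF file, in line with what readelf -h prints for Class and Data.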

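In the second line (offset 00000010), the first 8 bytes hold e_type, e_machine, and e_version. e_type is 0x01, which means a relocatable object file; e_machine is 0x3e, which means x86-64; e_version is 0x1. The last 8 bytes are e_entry, the virtual address of the program's entry point: after loading the program, the operating system starts executing the process's instructions from this address. A relocatable object file normally has no entry point, so the value is 0.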

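In the third line (offset 00000020), the first 8 bytes are e_phoff, the file offset of the Program Header Table; it is meaningless for a relocatable object file and is 0. The last 8 bytes are e_shoff. Keeping in mind that the bytes are stored in little-endian order, its value is 0x228, so the sht starts at file offset 0x228.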

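In the fourth line (offset 00000030), the first 4 bytes are e_flags, which we do not care about here. Bytes 5 and 6 are e_ehsize with value 0x40, so the ELF Header occupies 64 bytes. Bytes 7 and 8 are e_phentsize, the size of each program header table entry, which is 0 because this is a relocatable object file. Bytes 9 and 10 are e_phnum, the number of program header table entries, also 0. Bytes 11 and 12 are e_shentsize, the size of one sht entry: 0x40, that is, 64 bytes. Bytes 13 and 14 are e_shnum, the number of sht entries: 0x0b. The last two bytes are e_shstrndx with value 0xa, the index of the section name string table within the sht.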

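These four lines make up the whole ELF Header. Because the sht is the last thing in this file, we can already compute the file size: $size = e\_shoff + e\_shentsize \times e\_shnum = \texttt{0x228} + \texttt{0x40} \times \texttt{0x0b} = \texttt{0x4e8}$, and the last line of elf.txt is indeed 0x4e8.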
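The header walkthrough can be reproduced with a small sketch that decodes little-endian fields straight from the 64 header bytes. The helper names (`le16`, `le64`, `ehdr_*`, `sht_end`) and the `sample_ehdr` array are hypothetical; the bytes are transcribed from offsets 0x00 to 0x3f of the dump, and the field offsets follow the Elf64_Ehdr declaration.

```c
#include <stdint.h>

/* The 64-byte ELF header of elf.o, copied from offsets 0x00-0x3f of the dump. */
const uint8_t sample_ehdr[64] = {
    0x7f, 0x45, 0x4c, 0x46, 0x02, 0x01, 0x01, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x01, 0x00, 0x3e, 0x00, 0x01, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x28, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x40, 0x00, 0x0b, 0x00, 0x0a, 0x00,
};

/* Decode a 2-byte little-endian value. */
uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode an 8-byte little-endian value. */
uint64_t le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Field accessors; the offsets follow the Elf64_Ehdr layout. */
uint16_t ehdr_type(const uint8_t *h)      { return le16(h + 16); }
uint16_t ehdr_machine(const uint8_t *h)   { return le16(h + 18); }
uint64_t ehdr_shoff(const uint8_t *h)     { return le64(h + 40); }
uint16_t ehdr_shentsize(const uint8_t *h) { return le16(h + 58); }
uint16_t ehdr_shnum(const uint8_t *h)     { return le16(h + 60); }
uint16_t ehdr_shstrndx(const uint8_t *h)  { return le16(h + 62); }

/* End of the section header table. This equals the file size only when
 * the table is the last thing in the file, as it is for elf.o. */
uint64_t sht_end(const uint8_t *h)
{
    return ehdr_shoff(h) + (uint64_t)ehdr_shentsize(h) * ehdr_shnum(h);
}
```

Decoding byte by byte keeps the sketch independent of the host's endianness; copying the bytes into an Elf64_Ehdr would also work on a little-endian machine.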

We can also run readelf -h elf.o, whose output is:

ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              REL (Relocatable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          552 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           64 (bytes)
  Number of section headers:         11
  Section header string table index: 10

This matches our analysis above.

Parsing the section header table

First, run readelf -S elf.o. The output is:

There are 11 section headers, starting at offset 0x228:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .text             PROGBITS         0000000000000000  00000040
       000000000000000e  0000000000000000  AX       0     0     1
  [ 2] .data             PROGBITS         0000000000000000  00000050
       0000000000000010  0000000000000000  WA       0     0     8
  [ 3] .bss              NOBITS           0000000000000000  00000060
       0000000000000000  0000000000000000  WA       0     0     1
  [ 4] .comment          PROGBITS         0000000000000000  00000060
       0000000000000020  0000000000000001  MS       0     0     1
  [ 5] .note.GNU-stack   PROGBITS         0000000000000000  00000080
       0000000000000000  0000000000000000           0     0     1
  [ 6] .eh_frame         PROGBITS         0000000000000000  00000080
       0000000000000058  0000000000000000   A       0     0     8
  [ 7] .rela.eh_frame    RELA             0000000000000000  000001a0
       0000000000000030  0000000000000018   I       8     6     8
  [ 8] .symtab           SYMTAB           0000000000000000  000000d8
       00000000000000a8  0000000000000018           9     3     8
  [ 9] .strtab           STRTAB           0000000000000000  00000180
       000000000000001f  0000000000000000           0     0     1
  [10] .shstrtab         STRTAB           0000000000000000  000001d0
       0000000000000054  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  D (mbind), l (large), p (processor specific)

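From the ELF Header we know that the sht starts at 0x228 and that each sht entry is 0x40 bytes. Entry 0 has an empty name and all of its other fields are 0, so we look at entry 1, which starts at 0x228 + 0x40 = 0x268.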
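Locating an sht entry is plain arithmetic on the header fields. A minimal sketch, with a hypothetical function name:

```c
#include <stdint.h>

/* File offset of section header table entry `index`, given the
 * e_shoff and e_shentsize values read from the ELF Header. */
uint64_t sht_entry_offset(uint64_t e_shoff, uint16_t e_shentsize, uint16_t index)
{
    return e_shoff + (uint64_t)e_shentsize * index;
}
```

With the values from elf.o, entry 1 is at 0x268, and entry 10 (the e_shstrndx entry) is at 0x4a8, where the dump shows sh_name 0x11 and sh_type 0x3.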

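The first 4 bytes are sh_name with value 0x1b. This is the offset (or index) within the section header string table at which the name string starts. The file offset of .shstrtab is 0x1d0, so the name is at 0x1d0 + 0x1b = 0x1eb. The bytes starting there are 0x2e, 0x74, 0x65, 0x78, 0x74, which in ASCII spell .text.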
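The name lookup can be sketched as follows. The `sample_shstrtab` array is transcribed from the 0x54 bytes at file offset 0x1d0 of the dump; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Contents of .shstrtab (0x54 bytes at file offset 0x1d0), transcribed
 * from the dump. The literal's implicit trailing NUL is the table's
 * last byte. */
const char sample_shstrtab[] =
    "\0.symtab\0.strtab\0.shstrtab\0.text\0.data\0.bss"
    "\0.comment\0.note.GNU-stack\0.rela.eh_frame";

/* Returns the NUL-terminated name at offset `off` in the string table,
 * or NULL if the offset is out of range or the table is not
 * NUL-terminated. */
const char *strtab_get(const char *tab, size_t tab_size, uint32_t off)
{
    if (off >= tab_size || tab[tab_size - 1] != '\0')
        return NULL;
    return tab + off;
}

/* Returns 1 if the name stored at `off` equals `expected`, otherwise 0. */
int section_name_is(const char *tab, size_t tab_size, uint32_t off,
                    const char *expected)
{
    const char *name = strtab_get(tab, tab_size, off);
    return name != NULL && strcmp(name, expected) == 0;
}
```

Offset 0x1b yields .text. Offset 0x4a, the sh_name of entry 6, points into the middle of .rela.eh_frame and yields .eh_frame, so the two names share storage.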

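Bytes 5~8 are sh_type, with value 0x1 (PROGBITS).

Bytes 9~16 are sh_flags, with value 0x6, or $110_{(2)}$ in binary. This value is a bit mask: bit 1 (0x2) means the section is allocated in memory and bit 2 (0x4) means it is executable, while bit 0 (0x1, writable) is clear. Together, sh_type and sh_flags tell us that this section is of type PROGBITS and is allocatable and executable, which matches the AX flags that readelf reports for .text.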
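Reading sh_flags as a bit mask can be sketched like this. The constant values 0x1, 0x2, and 0x4 are those of SHF_WRITE, SHF_ALLOC, and SHF_EXECINSTR in elf.h; the `FLAG_*` names and the functions are hypothetical, chosen so that the sketch does not depend on elf.h.

```c
#include <stdint.h>

/* Same values as SHF_WRITE, SHF_ALLOC and SHF_EXECINSTR in elf.h. */
enum { FLAG_WRITE = 0x1, FLAG_ALLOC = 0x2, FLAG_EXECINSTR = 0x4 };

int flag_writable(uint64_t sh_flags)   { return (sh_flags & FLAG_WRITE) != 0; }
int flag_alloc(uint64_t sh_flags)      { return (sh_flags & FLAG_ALLOC) != 0; }
int flag_executable(uint64_t sh_flags) { return (sh_flags & FLAG_EXECINSTR) != 0; }

/* Writes readelf-style letters for the three low flag bits into `out`
 * (which must hold at least 4 bytes) and returns `out`. */
char *flags_to_letters(uint64_t sh_flags, char out[4])
{
    int n = 0;
    if (flag_writable(sh_flags))   out[n++] = 'W';
    if (flag_alloc(sh_flags))      out[n++] = 'A';
    if (flag_executable(sh_flags)) out[n++] = 'X';
    out[n] = '\0';
    return out;
}
```

For 0x6 the letters are AX, as in the .text row of the readelf -S output; for 0x3, the sh_flags value of .data and .bss, they are WA.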

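Bytes 17~24 are sh_addr, the virtual address of the section in memory. It has no meaning for a relocatable object file, so it is 0.

Bytes 25~32 are sh_offset, the file offset of the section that this sht entry describes. The value is 0x40.

Bytes 33~40 are sh_size, the size of the section in bytes. Here it is 0x0e.

Bytes 41~48 are sh_link and sh_info (4 bytes each). We do not discuss their meaning here.

Bytes 49~56 are sh_addralign, the alignment required by the section. Here it is 0x1.

Bytes 57~64 are sh_entsize, the size of a single entry when the section holds a table of fixed-size entries. Here it is 0x0, which means .text holds no such table.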
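Putting the field-by-field reading together, the sketch below decodes one 64-byte sht entry into a struct. The `SectionHeader` type, the `decode_shdr` function, and the `sample_shdr1` array are hypothetical names; the bytes are entry 1, copied from offsets 0x268 to 0x2a7 of the dump.

```c
#include <stdint.h>

/* Section header table entry 1 (.text), bytes 0x268-0x2a7 of the dump. */
const uint8_t sample_shdr1[64] = {
    0x1b, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, /* sh_name, sh_type */
    0x06, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* sh_flags */
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* sh_addr */
    0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* sh_offset */
    0x0e, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* sh_size */
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* sh_link, sh_info */
    0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* sh_addralign */
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* sh_entsize */
};

/* Decoded copy of one entry; same fields as Elf64_Shdr. */
typedef struct {
    uint32_t name, type;
    uint64_t flags, addr, offset, size;
    uint32_t link, info;
    uint64_t addralign, entsize;
} SectionHeader;

/* Decode an nbytes-wide little-endian value (nbytes <= 8). */
static uint64_t read_le(const uint8_t *p, int nbytes)
{
    uint64_t v = 0;
    for (int i = nbytes - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Decode a raw 64-byte entry field by field, in declaration order. */
SectionHeader decode_shdr(const uint8_t raw[64])
{
    SectionHeader s;
    s.name      = (uint32_t)read_le(raw + 0, 4);
    s.type      = (uint32_t)read_le(raw + 4, 4);
    s.flags     = read_le(raw + 8, 8);
    s.addr      = read_le(raw + 16, 8);
    s.offset    = read_le(raw + 24, 8);
    s.size      = read_le(raw + 32, 8);
    s.link      = (uint32_t)read_le(raw + 40, 4);
    s.info      = (uint32_t)read_le(raw + 44, 4);
    s.addralign = read_le(raw + 48, 8);
    s.entsize   = read_le(raw + 56, 8);
    return s;
}
```

The decoded values line up with the .text row of the readelf -S output: offset 0x40, size 0x0e, alignment 1.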

Looking at the sht entry for .symtab, its sh_entsize is 0x18 and its sh_size is 0xa8. Since $\frac{\texttt{0xa8}}{\texttt{0x18}} = \texttt{0x7}$, the .symtab section contains seven entries.

Running readelf -s elf.o prints:

Symbol table '.symtab' contains 7 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS elf.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1 .text
     3: 0000000000000000     8 OBJECT  GLOBAL DEFAULT    2 data1
     4: 0000000000000008     8 OBJECT  GLOBAL DEFAULT    2 data2
     5: 0000000000000000     7 FUNC    GLOBAL DEFAULT    1 func1
     6: 0000000000000007     7 FUNC    GLOBAL DEFAULT    1 func2

The output agrees with the calculation.
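The entry count, and the Type and Bind columns of the symbol table, can be recomputed with a short sketch. It relies on the 24-byte Elf64_Sym layout from the ELF specification (st_name, st_info, st_other, st_shndx, st_value, st_size), which this article does not list; the function names and the `sample_sym5` array are hypothetical. The bytes are symbol 5 (func1), found at 0xd8 + 5 * 0x18 = 0x150 in the dump.

```c
#include <stdint.h>

/* Symbol table entry 5 (func1): 0x18 bytes at file offset 0x150. */
const uint8_t sample_sym5[24] = {
    0x13, 0x00, 0x00, 0x00,                         /* st_name: offset into .strtab */
    0x12,                                           /* st_info: bind and type */
    0x00,                                           /* st_other */
    0x01, 0x00,                                     /* st_shndx: section index */
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* st_value */
    0x07, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* st_size */
};

/* Number of fixed-size entries in a table section; 0 if sh_entsize is 0. */
uint64_t table_entry_count(uint64_t sh_size, uint64_t sh_entsize)
{
    return sh_entsize ? sh_size / sh_entsize : 0;
}

/* The high 4 bits of st_info are the binding (0 LOCAL, 1 GLOBAL). */
unsigned sym_bind(uint8_t st_info) { return st_info >> 4; }

/* The low 4 bits of st_info are the type (1 OBJECT, 2 FUNC, 3 SECTION, 4 FILE). */
unsigned sym_type(uint8_t st_info) { return st_info & 0xf; }
```

For func1, st_info is 0x12, which splits into binding 1 (GLOBAL) and type 2 (FUNC), in line with row 5 of the readelf -s output.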

References

计算机那些事 (4)——ELF 文件结构 ("Computer Matters (4): ELF File Structure")

CSAPP

yangaaamin