- Learning Linux Binary Analysis
- Ryan “elfmaster” O'Neill
- 2021-07-16 12:56:54
ELF dynamic linking
In the old days, everything was statically linked. If a program used external library functions, the entire library was compiled directly into the executable. ELF supports dynamic linking, which is a much more efficient way to go about handling shared libraries.
When a program is loaded into memory, the dynamic linker also loads and binds the shared libraries that are needed to that process address space. The topic of dynamic linking is rarely understood by people in any depth as it is a relatively complex procedure and seems to work like magic under the hood. In this section, we will demystify some of its complexities and reveal how it works and also how it can be abused by attackers.
Shared libraries are compiled as position-independent and can therefore be easily relocated into a process address space. A shared library is a dynamic ELF object. If you look at readelf -h lib.so, you will see that the e_type (ELF file type) is called ET_DYN. Dynamic objects are very similar to executables. They do not typically have a PT_INTERP segment since they are loaded by the program interpreter, and therefore will not be invoking a program interpreter.
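The e_type check described above can be sketched in a few lines of C. This is a minimal illustration, not code from the book: the helper name elf_type_name() is hypothetical, it assumes a 64-bit ELF header image in the host's byte order, and it relies only on the definitions in /usr/include/elf.h.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a printable name for the e_type field of an
 * ELF header image, or NULL if the buffer is too small or lacks the ELF
 * magic. Assumes a 64-bit header in host byte order. */
const char *elf_type_name(const void *image, size_t len)
{
    Elf64_Ehdr ehdr;

    if (len < sizeof(ehdr))
        return NULL;
    memcpy(&ehdr, image, sizeof(ehdr));      /* copy to avoid unaligned reads */
    if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0)
        return NULL;                         /* not an ELF file */

    switch (ehdr.e_type) {
    case ET_REL:  return "ET_REL";           /* relocatable object */
    case ET_EXEC: return "ET_EXEC";          /* fixed-address executable */
    case ET_DYN:  return "ET_DYN";           /* shared object */
    case ET_CORE: return "ET_CORE";          /* core dump */
    default:      return "unknown";
    }
}
```

Passing the first bytes of a shared library read from disk should yield "ET_DYN". Position-independent executables report the same type, so this value alone does not distinguish a library from a PIE binary.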
When a shared library is loaded into a process address space, it must have any relocations satisfied that reference other shared libraries. The dynamic linker must modify the GOT (Global offset table) of the executable (located in the section .got.plt), which is a table of addresses located in the data segment. It is in the data segment because it must be writeable (at least initially; see read-only relocations as a security feature). The dynamic linker patches the GOT with resolved shared library addresses. We will explain the process of lazy linking shortly.
The auxiliary vector
When a program gets loaded into memory by the sys_execve() syscall, the executable is mapped in and given a stack (among other things). The stack for that process address space is set up in a very specific way to pass information to the dynamic linker. This particular setup and arrangement of information is known as the auxiliary vector or auxv. The bottom of the stack (which is its highest memory address since the stack grows down on x86 architecture) is loaded with the following information:
[argc][argv][envp][auxiliary][.ascii data for argv/envp]
The auxiliary vector (or auxv) is a series of ElfN_auxv_t structs.
typedef struct {
    uint64_t a_type;      /* Entry type */
    union {
        uint64_t a_val;   /* Integer value */
    } a_un;
} Elf64_auxv_t;
The a_type describes the auxv entry type, and the a_val provides its value. The following are some of the most important entry types that are needed by the dynamic linker:
#define AT_EXECFD  2   /* File descriptor of program */
#define AT_PHDR    3   /* Program headers for program */
#define AT_PHENT   4   /* Size of program header entry */
#define AT_PHNUM   5   /* Number of program headers */
#define AT_PAGESZ  6   /* System page size */
#define AT_ENTRY   9   /* Entry point of program */
#define AT_UID     11  /* Real uid */
The dynamic linker retrieves information from the stack about the executing program. The linker must know where the program headers are, the entry point of the program, and so on. The list above shows only a few of the auxv entry types, taken from /usr/include/elf.h.
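To make the lookup concrete, here is a minimal sketch of how a consumer such as the dynamic linker might scan the vector for one entry. The function name auxv_lookup() is hypothetical, and the sketch assumes the vector ends with an AT_NULL entry, which is how the kernel terminates it.

```c
#include <elf.h>
#include <stdint.h>

/* Hypothetical helper: scan an auxiliary vector for an entry of the given
 * type. The vector is assumed to be terminated by an AT_NULL entry.
 * Returns 1 and stores the value in *out when found, 0 otherwise. */
int auxv_lookup(const Elf64_auxv_t *auxv, uint64_t type, uint64_t *out)
{
    for (; auxv->a_type != AT_NULL; auxv++) {
        if (auxv->a_type == type) {
            *out = auxv->a_un.a_val;
            return 1;
        }
    }
    return 0;
}
```

Ordinary programs rarely need to walk the vector by hand. On Linux with glibc, getauxval() from <sys/auxv.h> returns the value for a given type, for example getauxval(AT_PAGESZ), and the raw entries can also be read from /proc/self/auxv.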
The auxiliary vector gets set up by a kernel function called create_elf_tables() that resides in the Linux source code /usr/src/linux/fs/binfmt_elf.c.
In fact, the execution process from the kernel looks something like the following:
- sys_execve() calls do_execve_common().
- do_execve_common() calls search_binary_handler().
- search_binary_handler() calls load_elf_binary().
- load_elf_binary() calls create_elf_tables().
The following is some of the code from create_elf_tables() in /usr/src/linux/fs/binfmt_elf.c that adds auxv entries:
NEW_AUX_ENT(AT_PAGESZ, ELF_EXEC_PAGESIZE);
NEW_AUX_ENT(AT_PHDR, load_addr + exec->e_phoff);
NEW_AUX_ENT(AT_PHENT, sizeof(struct elf_phdr));
NEW_AUX_ENT(AT_PHNUM, exec->e_phnum);
NEW_AUX_ENT(AT_BASE, interp_load_addr);
NEW_AUX_ENT(AT_ENTRY, exec->e_entry);
As you can see, the ELF entry point and the address of the program headers, among other values, are placed onto the stack with the NEW_AUX_ENT() macro in the kernel.
Once a program is loaded into memory and the auxiliary vector has been filled in, control is passed to the dynamic linker. The dynamic linker resolves symbols and relocations for shared libraries that are linked into the process address space. By default, an executable is dynamically linked with the GNU C library libc.so. The ldd command will show you the shared library dependencies of a given executable.
Learning about the PLT/GOT
The PLT (procedure linkage table) and GOT (Global offset table) can be found in executable files and shared libraries. We will be focusing specifically on the PLT/GOT of an executable program. When a program calls a shared library function such as strcpy() or printf(), which are not resolved until runtime, there must exist a mechanism to dynamically link the shared libraries and resolve the addresses to the shared functions. When a dynamically linked program is compiled, it handles shared library function calls in a specific way, far differently from a simple call instruction to a local function.
Let's take a look at a call to the libc.so function fgets() in a 32-bit compiled ELF executable. We will use a 32-bit executable in our examples because the relationship with the GOT is easier to visualize since IP relative addressing is not used, as it is in 64-bit executables:
objdump -d test
...
 8048481: e8 da fe ff ff       call 8048360 <fgets@plt>
...
The address 0x8048360 corresponds to the PLT entry for fgets(). Let's take a look at that address in our executable:
objdump -d test (grep for 8048360)
...
08048360 <fgets@plt>:          /* A jmp into the GOT */
 8048360: ff 25 00 a0 04 08    jmp *0x804a000
 8048366: 68 00 00 00 00       push $0x0
 804836b: e9 e0 ff ff ff       jmp 8048350 <_init+0x34>
...
So the call to fgets() leads to 8048360, which is the PLT jump table entry for fgets(). As we can see, there is an indirect jump to the address stored at 0x804a000 in the preceding disassembled code output. This address is a GOT (Global offset table) entry that holds the address to the actual fgets() function in the libc shared library.
However, when the default behavior, lazy linking, is in use, the address of a function has not yet been resolved by the dynamic linker the first time that function is called. Lazy linking means that the dynamic linker does not resolve every function at program loading time. Instead, it resolves functions as they are called, which is made possible through the .plt and .got.plt sections (which correspond to the procedure linkage table and the global offset table, respectively). This behavior can be changed to what is called strict linking with the LD_BIND_NOW environment variable so that all dynamic linking happens right at program loading time. Lazy linking improves load-time performance, which is why it is the default behavior, but it can also be unpredictable since a linking error may not occur until after the program has been running for some time. I have actually only experienced this myself one time over the course of years. It is also worth noting that some security features, namely read-only relocations, cannot be fully applied unless strict linking is enabled: the .got.plt section (among others) can only be marked read-only after the dynamic linker has finished patching it, and thus strict linking must be used.
Let's take a look at the relocation entry for fgets():
$ readelf -r test
Offset    Info      Type             SymValue  SymName
...
0804a000  00000107  R_386_JUMP_SLOT  00000000  fgets
...
Note
R_386_JUMP_SLOT is a relocation type for PLT/GOT entries. On x86_64, it is called R_X86_64_JUMP_SLOT.
Notice that the relocation offset is the address 0x804a000, the same address that the fgets() PLT jumps into. Assuming that fgets() is being called for the first time, the dynamic linker has to resolve the address of fgets() and place its value into the GOT entry for fgets().
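The Info column in the readelf output packs two values into one word. As a small sketch (the wrapper names r_info_sym() and r_info_type() are ours, while the ELF32_R_SYM and ELF32_R_TYPE macros and the R_386_JMP_SLOT constant come from /usr/include/elf.h), the value 0x00000107 can be split into its symbol index and relocation type like this:

```c
#include <elf.h>
#include <stdint.h>

/* Hypothetical wrappers around the elf.h macros for a 32-bit r_info word. */

/* Upper 24 bits: index of the symbol in the dynamic symbol table. */
uint32_t r_info_sym(uint32_t r_info)
{
    return ELF32_R_SYM(r_info);
}

/* Lower 8 bits: relocation type, e.g. R_386_JMP_SLOT (printed by readelf
 * as R_386_JUMP_SLOT). */
uint32_t r_info_type(uint32_t r_info)
{
    return ELF32_R_TYPE(r_info);
}
```

For the fgets() entry, r_info_sym(0x107) gives symbol index 1 and r_info_type(0x107) gives 7, the numeric value of the jump slot relocation type on 32-bit x86.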
Let's take a look at the GOT in our test program:
08049ff4 <_GLOBAL_OFFSET_TABLE_>:
 8049ff4: 28 9f 04 08 00 00    sub %bl,0x804(%edi)
 8049ffa: 00 00                add %al,(%eax)
 8049ffc: 00 00                add %al,(%eax)
 8049ffe: 00 00                add %al,(%eax)
 804a000: 66 83 04 08 76       addw $0x76,(%eax,%ecx,1)
 804a005: 83 04 08 86          addl $0xffffff86,(%eax,%ecx,1)
 804a009: 83 04 08 96          addl $0xffffff96,(%eax,%ecx,1)
 804a00d: 83                   .byte 0x83
 804a00e: 04 08                add $0x8,%al
The address 0x08048366 is found at 0x804a000 in the GOT. The GOT holds data rather than code, so the instructions that objdump prints next to it are meaningless; only the raw bytes matter. Remember that little endian reverses the byte order, so the address appears as 66 83 04 08. This address is not the address of the fgets() function since it has not yet been resolved by the linker, but instead points back down into the PLT entry for fgets(). Let's look at the PLT entry for fgets() again:
08048360 <fgets@plt>:
 8048360: ff 25 00 a0 04 08    jmp *0x804a000
 8048366: 68 00 00 00 00       push $0x0
 804836b: e9 e0 ff ff ff       jmp 8048350 <_init+0x34>
So, jmp *0x804a000 jumps to the address stored there, 0x8048366, which is the push $0x0 instruction. That push instruction has a purpose, which is to tell the dynamic linker which entry to resolve. Strictly speaking, on 32-bit x86 the pushed value is the offset of the fgets() relocation entry within the PLT relocation table. The offset 0x0 selects the first relocation, and its target is the first GOT entry that is reserved for a shared library symbol value, which is actually the fourth GOT entry, GOT[3]. In other words, the shared library addresses don't get plugged in starting at GOT[0]; they begin at GOT[3] (the fourth entry) because the first three are reserved for other purposes.
Note
Take note of the following GOT offsets:
- GOT[0] contains an address that points to the dynamic segment of the executable, which is used by the dynamic linker for extracting dynamic linking-related information
- GOT[1] contains the address of the link_map structure that is used by the dynamic linker to resolve symbols
- GOT[2] contains the address of the dynamic linker's _dl_runtime_resolve() function that resolves the actual symbol address for the shared library function
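The arithmetic behind these entries is easy to verify by hand. The sketch below uses two hypothetical helpers: got_entry_addr() assumes 4-byte entries, as on 32-bit x86, and le32_decode() assembles a little-endian word from the raw bytes shown in the GOT dump of our test program.

```c
#include <stdint.h>

/* Hypothetical helper: address of GOT[index], assuming 4-byte entries. */
uint32_t got_entry_addr(uint32_t got_base, unsigned index)
{
    return got_base + index * 4u;
}

/* Hypothetical helper: decode four little-endian bytes into a 32-bit value. */
uint32_t le32_decode(const unsigned char b[4])
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```

With the GOT based at 0x8049ff4, GOT[1] sits at 0x8049ff8, GOT[2] at 0x8049ffc, and GOT[3] at 0x804a000, which is the slot that the fgets() PLT entry jumps through. Decoding the bytes 66 83 04 08 stored there gives 0x08048366, and decoding the first four bytes of the table, 28 9f 04 08, gives 0x08049f28, the value held in GOT[0].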
The last instruction in the fgets() PLT stub is a jmp 8048350. This address points to the very first PLT entry in every executable, known as PLT-0.
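Unlike the indirect jmp through the GOT, this jmp and the original call to fgets@plt encode their targets as 32-bit displacements relative to the end of the instruction. A minimal sketch of that calculation follows; the function name rel32_target() is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: target of an x86 call/jmp with a 32-bit relative
 * displacement. The displacement is added to the address of the NEXT
 * instruction, that is, the instruction address plus its length. */
uint32_t rel32_target(uint32_t insn_addr, uint32_t insn_len, int32_t disp)
{
    return insn_addr + insn_len + (uint32_t)disp;
}
```

The bytes e9 e0 ff ff ff at 0x804836b carry the displacement 0xffffffe0, or -0x20, so rel32_target(0x804836b, 5, -0x20) yields 0x8048350, the address of PLT-0. Likewise, e8 da fe ff ff at 0x8048481 carries -0x126, which yields 0x8048360, the fgets() PLT entry.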
PLT-0 from our executable contains the following code:
 8048350: ff 35 f8 9f 04 08    pushl 0x8049ff8
 8048356: ff 25 fc 9f 04 08    jmp *0x8049ffc
 804835c: 00 00                add %al,(%eax)
The first pushl instruction pushes the value stored in the second GOT entry, GOT[1] (located at 0x8049ff8), onto the stack. As noted earlier, that value is the address of the link_map structure.
The jmp *0x8049ffc performs an indirect jmp through the third GOT entry, GOT[2], which contains the address of the dynamic linker's _dl_runtime_resolve() function, therefore transferring control to the dynamic linker and resolving the address for fgets(). Once fgets() has been resolved, all future calls to the PLT entry for fgets() will result in a jump to the fgets() code itself, rather than pointing back into the PLT and going through the lazy linking process again.
The following is a summary of what we have just covered:
- Call fgets@PLT (to call the fgets function).
- PLT code does an indirect jmp to the address in the GOT.
- The GOT entry contains the address that points back into PLT at the push instruction.
- The push $0x0 instruction pushes the offset that identifies the fgets() relocation, and thus its GOT entry, onto the stack.
- The final fgets() PLT instruction is a jmp to the PLT-0 code.
- The first instruction of PLT-0 pushes the value stored in GOT[1], which is the address of the link_map struct, onto the stack.
- The second instruction of PLT-0 is a jmp to the address in GOT[2] that points to the dynamic linker's _dl_runtime_resolve(), which then handles the R_386_JUMP_SLOT relocation by writing the symbol value (memory address) of fgets() to its corresponding GOT entry in the .got.plt section.
The next time fgets() is called, the PLT entry will jump directly to the function itself rather than having to perform the relocation procedure again.
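The whole sequence can be modeled in plain C with a function pointer standing in for the GOT slot. This is only a toy model of the idea, not how the dynamic linker is implemented, and every name in it is made up for illustration: got_slot plays the GOT entry, lazy_stub() plays the push/jmp path through PLT-0 and _dl_runtime_resolve(), and real_function() plays fgets() in libc.

```c
typedef int (*func_t)(int);

static int resolve_count;            /* how many times "resolution" ran */
static int lazy_stub(int x);         /* models the PLT push/jmp path */

/* Models the GOT entry: before the first call it points back at the stub. */
static func_t got_slot = lazy_stub;

/* Models the real shared library function. */
static int real_function(int x)
{
    return x * 2;
}

/* Models the resolver: patch the slot, then continue to the real function. */
static int lazy_stub(int x)
{
    resolve_count++;
    got_slot = real_function;
    return got_slot(x);
}

/* Models the PLT entry: always an indirect jump through the slot. */
int call_through_plt(int x)
{
    return got_slot(x);
}

int times_resolved(void)
{
    return resolve_count;
}
```

Calling call_through_plt() twice runs lazy_stub() only once. After the first call, the slot holds the address of real_function(), just as the GOT entry holds the address of fgets() after the first resolution.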
The dynamic segment revisited
I earlier referenced the dynamic segment as a section named .dynamic. The dynamic segment has a section header referencing it, but it also has a program header referencing it because it must be found during runtime by the dynamic linker; since section headers don't get loaded into memory, there has to be an associated program header for it.
The dynamic segment contains an array of structs of type ElfN_Dyn:
typedef struct {
    Elf32_Sword d_tag;
    union {
        Elf32_Word d_val;
        Elf32_Addr d_ptr;
    } d_un;
} Elf32_Dyn;
The d_tag field contains a tag that matches one of the numerous definitions that can be found in the ELF(5) man pages. I have listed some of the most important ones used by the dynamic linker.
DT_NEEDED
This holds the string table offset to the name of a needed shared library.
DT_SYMTAB
This contains the address of the dynamic symbol table, also known by its section name .dynsym.
DT_HASH
This holds the address of the symbol hash table, also known by its section name .hash. Binaries built with the GNU-style hash table carry a .gnu.hash section instead, which is referenced by the separate DT_GNU_HASH tag.
DT_STRTAB
This holds the address of the symbol string table, also known by its section name .dynstr.
DT_PLTGOT
This holds the address of the global offset table.
Note
The preceding dynamic tags demonstrate how the location of certain sections can be found through the dynamic segment, which can aid in the forensics reconstruction task of rebuilding a section header table. If the section header table has been stripped, a clever individual can rebuild parts of it by getting information from the dynamic segment (that is, the .dynstr, .dynsym, and .hash, among others).
Other segments such as text and data can yield information that you need as well (such as for the .text and .data sections).
The d_val member of ElfN_Dyn holds an integer value that has various interpretations, such as the size of a relocation entry, to give one instance.
The d_ptr member holds a virtual memory address that can point to various locations needed by the linker; a good example would be the address to the symbol table for the d_tag DT_SYMTAB.
The dynamic linker uses the d_tag of each ElfN_Dyn entry to locate the parts of the executable that it needs. For example, the entry tagged DT_SYMTAB has a d_ptr that gives the virtual address of the symbol table.
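A search of the dynamic array by tag might look like the following sketch. The function name dyn_find() is hypothetical; ElfW(Dyn) comes from <link.h> and expands to Elf32_Dyn or Elf64_Dyn for the host, and the sketch assumes the array ends with a DT_NULL entry, as the ELF specification requires.

```c
#include <elf.h>
#include <link.h>
#include <stddef.h>

/* Hypothetical helper: return the first entry in a DT_NULL-terminated
 * dynamic array whose d_tag matches, or NULL if no entry has that tag. */
const ElfW(Dyn) *dyn_find(const ElfW(Dyn) *dyn, long tag)
{
    for (; dyn->d_tag != DT_NULL; dyn++) {
        if (dyn->d_tag == tag)
            return dyn;
    }
    return NULL;
}
```

In a dynamically linked program on glibc, the array for the running executable is exposed through the _DYNAMIC symbol declared in <link.h>, so dyn_find(_DYNAMIC, DT_SYMTAB) would be one way to try this out. Tags such as DT_NEEDED can occur several times, so a real consumer would keep iterating instead of stopping at the first match.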
When the dynamic linker is mapped into memory, it first handles any of its own relocations if necessary; remember that the linker is a shared library itself. It then looks at the executable program's dynamic segment and searches for the DT_NEEDED tags that contain pointers to the strings or pathnames of the necessary shared libraries. When it maps a needed shared library into the memory, it accesses the library's dynamic segment (yes, they too have dynamic segments) and adds the library's symbol table to a chain of symbol tables that exists to hold the symbol tables for each mapped library.
The linker creates a struct link_map entry for each shared library and stores it in a linked list:
struct link_map {
    ElfW(Addr) l_addr;                 /* Base address shared object is loaded at. */
    char *l_name;                      /* Absolute file name object was found in. */
    ElfW(Dyn) *l_ld;                   /* Dynamic section of the shared object. */
    struct link_map *l_next, *l_prev;  /* Chain of loaded objects. */
};
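Walking this list is a matter of following l_next until it is NULL. The sketch below searches the chain for an object whose file name contains a given substring; the function name link_map_find() is hypothetical, and only the public fields of struct link_map from <link.h> are used.

```c
#include <link.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the first link_map entry whose l_name
 * contains the given substring, or NULL if none does. */
const struct link_map *link_map_find(const struct link_map *head,
                                     const char *name)
{
    const struct link_map *m;

    for (m = head; m != NULL; m = m->l_next) {
        if (m->l_name != NULL && strstr(m->l_name, name) != NULL)
            return m;
    }
    return NULL;
}
```

When the goal is simply to enumerate the objects loaded into the current process, glibc also offers dl_iterate_phdr() in <link.h>, which avoids touching the linker's list directly.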
Once the linker has finished building its list of dependencies, it handles the relocations on each library, similar to the relocations we discussed earlier in this chapter, as well as fixing up the GOT of each shared library. Lazy linking still applies to the PLT/GOT of shared libraries as well, so GOT relocations (of type R_386_JMP_SLOT) won't happen until the point when a function has actually been called.
For more detailed information on ELF and dynamic linking, read the ELF specification online or take a look at some of the interesting glibc source code available. Hopefully, dynamic linking has become less of a mystery and more of an intrigue at this point. In Chapter 7, Process Memory Forensics, we will be covering PLT/GOT poisoning techniques to redirect shared library function calls. Subverting dynamic linking is a very fun technique.