- Learning Linux Binary Analysis
- Ryan “elfmaster” O'Neill
- 1979 words
- 2021-07-16 12:56:54
ELF relocations
From the ELF(5) man pages:
Relocation is the process of connecting symbolic references with symbolic definitions. Relocatable files must have information that describes how to modify their section contents, thus allowing executable and shared object files to hold the right information for a process's program image. Relocation entries are these data.
The process of relocation relies on symbols and sections, which is why we covered symbols and sections first. Relocation records essentially contain information about how to patch the code related to a given symbol. Relocations are literally a mechanism for binary patching, and even for hot-patching in memory when the dynamic linker is involved. The linker program, /bin/ld, which is used to create executable files and shared libraries, must have some type of metadata that describes how to patch certain instructions. This metadata is stored as what we call relocation records. I will further explain relocations by using an example.
Imagine having two object files linked together to create an executable. We have obj1.o, which contains the code to call a function named foo() that is located in obj2.o. Both obj1.o and obj2.o are analyzed by the linker program and contain relocation records so that they may be linked to create a fully working executable program. Symbolic references will be resolved into symbolic definitions, but what does that even mean? Object files are relocatable code, which means code that is meant to be relocated to a given address within an executable segment. Before the relocation process happens, the code contains symbols and instructions that cannot function or be referenced properly without first knowing their location in memory. These must be patched after the linker knows the position of the instruction or symbol within the executable segment.
Let's take a quick look at a 64-bit relocation entry:
    typedef struct {
        Elf64_Addr r_offset;
        uint64_t   r_info;
    } Elf64_Rel;
And some relocation entries require an addend:
    typedef struct {
        Elf64_Addr r_offset;
        uint64_t   r_info;
        int64_t    r_addend;
    } Elf64_Rela;
The r_offset points to the location that requires the relocation action. A relocation action describes the details of how to patch the code or data contained at r_offset.
The r_info gives both the symbol table index with respect to which the relocation must be made and the type of relocation to apply.
The r_addend specifies a constant addend used to compute the value stored in the relocatable field.
The relocation records for 32-bit ELF files are the same as for 64-bit, but use 32-bit integers. The object file code in the following example will be compiled as 32-bit so that we can demonstrate implicit addends, which are not as commonly used in 64-bit. An implicit addend occurs when the relocation records are stored in ElfN_Rel type structures, which don't contain an r_addend field, and therefore the addend is stored in the relocation target itself. The 64-bit executables tend to use the ElfN_Rela structs that contain an explicit addend. I think it is worth understanding both scenarios, but implicit addends are a little more confusing, so it makes sense to bring light to this area.
Let's take a look at the source code:
    void foo(void);   /* defined in obj2.o */

    void _start(void)
    {
        foo();
    }
We see that it calls the foo() function. However, the foo() function is not located directly within that source code file; so, upon compiling, there will be a relocation entry created that is necessary for later satisfying the symbolic reference:
    $ objdump -d obj1.o

    obj1.o:     file format elf32-i386

    Disassembly of section .text:

    00000000 <func>:
       0:   55                      push   %ebp
       1:   89 e5                   mov    %esp,%ebp
       3:   83 ec 08                sub    $0x8,%esp
       6:   e8 fc ff ff ff          call   7 <func+0x7>
       b:   c9                      leave
       c:   c3                      ret
As we can see, the call to foo() is the e8 fc ff ff ff instruction at offset 6, and its operand holds the value 0xfffffffc, which is the implicit addend. Also notice the call 7. The number 7 is the offset of the relocation target to be patched. So when obj1.o (which calls foo(), located in obj2.o) is linked with obj2.o to make an executable, the linker processes a relocation entry that points at offset 7, telling it which location needs to be modified. The linker then patches the 4 bytes at offset 7 so that they contain the real offset to the foo() function, after foo() has been positioned somewhere within the executable.
Note
The call instruction e8 fc ff ff ff contains the implicit addend and is important to remember for this lesson; the value 0xfffffffc is -(4) or -(sizeof(uint32_t)). A dword is 4 bytes on a 32-bit system, which is the size of this relocation target.
    $ readelf -r obj1.o

    Relocation section '.rel.text' at offset 0x394 contains 1 entries:
     Offset     Info    Type            Sym.Value  Sym. Name
    00000007  00000902 R_386_PC32        00000000   foo
As we can see, a relocation field at offset 7 is specified by the relocation entry's r_offset field.
R_386_PC32 is the relocation type. To understand all of these types, read the ELF specs. Each relocation type requires a different computation on the relocation target being modified. R_386_PC32 modifies the target with S + A - P, where:
- S is the value of the symbol whose index resides in the relocation entry.
- A is the addend found in the relocation entry (or, for an implicit addend, in the relocation target itself).
- P is the place (section offset or address) of the storage unit being relocated (computed using r_offset).
Let's look at the final output of our executable after linking obj1.o and obj2.o on a 32-bit system:
    $ gcc -nostdlib obj1.o obj2.o -o relocated
    $ objdump -d relocated

    relocated:     file format elf32-i386

    Disassembly of section .text:

    080480d8 <func>:
     80480d8:   55                      push   %ebp
     80480d9:   89 e5                   mov    %esp,%ebp
     80480db:   83 ec 08                sub    $0x8,%esp
     80480de:   e8 05 00 00 00          call   80480e8 <foo>
     80480e3:   c9                      leave
     80480e4:   c3                      ret
     80480e5:   90                      nop
     80480e6:   90                      nop
     80480e7:   90                      nop

    080480e8 <foo>:
     80480e8:   55                      push   %ebp
     80480e9:   89 e5                   mov    %esp,%ebp
     80480eb:   5d                      pop    %ebp
     80480ec:   c3                      ret
We can see that the call instruction at 0x80480de has been modified: its 4-byte operand (the relocation target, at 0x80480df) now holds the 32-bit offset value 5, which points to foo(). The value 5 is the result of the R_386_PC32 relocation action:
S + A – P: 0x80480e8 + 0xfffffffc – 0x80480df = 5
The 0xfffffffc is the same as -4 when interpreted as a signed integer, so the calculation can also be seen as:
0x80480e8 - (0x80480df + sizeof(uint32_t)) = 5
To calculate an offset into a virtual address, use the following computation:
address_of_call + offset + 5 (where 5 is the length of the call instruction)
Which in this case is 0x80480de + 5 + 5 = 0x80480e8.
Note
Pay attention to this computation as it is important to remember and can be used when calculating offsets to addresses frequently.
An address may also be computed into an offset with the following computation:
address - address_of_call - 5 (where 5 is the length of the call instruction: one opcode byte plus the 4-byte immediate operand). In this case, 0x80480e8 - 0x80480de - 5 = 5.
As mentioned previously, the ELF specs cover ELF relocations in depth, and we will be visiting some of the types used in dynamic linking in the next section, such as R_386_JMP_SLOT relocation entries.
Relocatable code injection-based binary patching
Relocatable code injection is a technique that hackers, virus writers, or anyone who wants to modify the code in a binary may utilize as a way to relink a binary after it's already been compiled and linked into an executable. That is, you can inject an object file into an executable, update the executable's symbol table to reflect newly inserted functionality, and perform the necessary relocations on the injected object code so that it becomes a part of the executable.
A complicated virus might use this technique rather than just appending position-independent code. This technique requires making room in the target executable to inject the code, followed by applying the relocations. We will cover binary infection and code injection more thoroughly in Chapter 4, ELF Virus Technology – Linux/Unix Viruses.
As mentioned in Chapter 1, The Linux Environment and Its Tools, there is an amazing tool called Eresi (http://www.eresi-project.org), which is capable of relocatable code injection (aka ET_REL injection). I also designed a custom reverse engineering tool for ELF, namely, Quenya. It is very old but can be found at http://www.bitlackeys.org/projects/quenya_32bit.tgz. Quenya has many features and capabilities, and one of them is to inject object code into an executable. This can be very useful for patching a binary by hijacking a given function. Quenya is only a prototype and was never developed to the extent that the Eresi project was. I am only using it as an example because I am more familiar with it; however, I will say that for more reliable results, it may be desirable to either use Eresi or write your own tooling.
Let us pretend we are an attacker and we want to infect a 32-bit program that calls puts() to print Hello World. Our goal is to hijack puts() so that calls to it execute evil_puts() instead:
    #include <sys/syscall.h>

    /* Raw write(2) system call via int 0x80 (32-bit x86 only). */
    int _write (int fd, void *buf, int count)
    {
        long ret;

        __asm__ __volatile__ ("pushl %%ebx\n\t"
                              "movl %%esi,%%ebx\n\t"
                              "int $0x80\n\t"
                              "popl %%ebx"
                              : "=a" (ret)
                              : "0" (SYS_write), "S" ((long) fd),
                                "c" ((long) buf), "d" ((long) count));
        if (ret >= 0) {
            return (int) ret;
        }
        return -1;
    }

    int evil_puts(void)
    {
        _write(1, "HAHA puts() has been hijacked!\n", 31);
        return 0;
    }
Now we compile evil_puts.c into evil_puts.o and inject it into our program called ./hello_world:
    $ ./hello_world
    Hello World
This program calls the following:
puts("Hello World\n");
We now use Quenya to inject and relocate our evil_puts.o file into hello_world:
    [Quenya v0.1@alchemy] reloc evil_puts.o hello_world
    0x08048624 addr: 0x8048612
    0x080485c4 _write addr: 0x804861e
    0x080485c4 addr: 0x804868f
    0x080485c4 addr: 0x80486b7
    Injection/Relocation succeeded
As we can see, the _write() function from our evil_puts.o object file has been relocated and assigned the address 0x080485c4 in the executable file hello_world, and the relocation target at 0x804861e that refers to it has been patched accordingly. The next command, hijack, overwrites the global offset table entry for puts() with the address of evil_puts():
    [Quenya v0.1@alchemy] hijack binary hello_world evil_puts puts
    Attempting to hijack function: puts
    Modifying GOT entry for puts
    Successfully hijacked function: puts
    Committing changes into executable file
    [Quenya v0.1@alchemy] quit
And Whammi!
    ryan@alchemy:~/quenya$ ./hello_world
    HAHA puts() has been hijacked!
We have successfully relocated an object file into an executable and modified the executable's control flow so that it executes the code that we injected. If we use readelf -s on hello_world, we can actually now see a symbol for evil_puts().
For your interest, I have included a small snippet of code that contains the ELF relocation mechanics in Quenya; it may be a little bit obscure without seeing the rest of the code base, but it is also somewhat straightforward if you have retained what we learned about relocations:
    switch(obj.shdr[i].sh_type)
    {
    case SHT_REL: /* Section contains ElfN_Rel records */
        rel = (Elf32_Rel *)(obj.mem + obj.shdr[i].sh_offset);
        for (j = 0; j < obj.shdr[i].sh_size / sizeof(Elf32_Rel); j++, rel++)
        {
            /* symbol table */
            symtab = (Elf32_Sym *)obj.section[obj.shdr[i].sh_link];

            /* symbol we are applying relocation to */
            symbol = &symtab[ELF32_R_SYM(rel->r_info)];

            /* section to modify */
            TargetSection = &obj.shdr[obj.shdr[i].sh_info];
            TargetIndex = obj.shdr[i].sh_info;

            /* target location */
            TargetAddr = TargetSection->sh_addr + rel->r_offset;

            /* pointer to relocation target */
            RelocPtr = (Elf32_Addr *)(obj.section[TargetIndex] + rel->r_offset);

            /* relocation value */
            RelVal = symbol->st_value;
            RelVal += obj.shdr[symbol->st_shndx].sh_addr;

            printf("0x%08x %s addr: 0x%x\n", RelVal,
                   &SymStringTable[symbol->st_name], TargetAddr);

            switch (ELF32_R_TYPE(rel->r_info))
            {
            /* R_386_PC32 2 word32 S + A - P */
            case R_386_PC32:
                *RelocPtr += RelVal;
                *RelocPtr -= TargetAddr;
                break;

            /* R_386_32 1 word32 S + A */
            case R_386_32:
                *RelocPtr += RelVal;
                break;
            }
        }
As shown in the preceding code, the relocation target that RelocPtr points to is modified according to the relocation action requested by the relocation type (such as R_386_32).
Although relocatable code binary injection is a good example of the idea behind relocations, it is not a perfect example of how a linker actually performs it with multiple object files. Nevertheless, it still retains the general idea and application of a relocation action. Later on we will talk about shared library (ET_DYN) injection, which brings us now to the topic of dynamic linking.