- Learning Linux Binary Analysis
- Ryan “elfmaster” O'Neill
- 1831 words
- 2021-07-16 12:56:53
ELF symbols
Symbols are a symbolic reference to some type of data or code such as a global variable or function. For instance, the printf() function is going to have a symbol entry that points to it in the dynamic symbol table .dynsym. In most shared libraries and dynamically linked executables, there exist two symbol tables. In the readelf -S output shown previously, you can see two sections: .dynsym and .symtab.
The .dynsym section contains global symbols that reference symbols from an external source, such as libc functions like printf, whereas .symtab contains all of the symbols in .dynsym as well as the local symbols for the executable, such as global variables or local functions that you have defined in your code. So .symtab contains all of the symbols, whereas .dynsym contains just the dynamic/global symbols.
So the question is: Why have two symbol tables if .symtab already contains everything that's in .dynsym? If you check out the readelf -S output of an executable, you will see that some sections are marked A (ALLOC) or WA (WRITE/ALLOC) or AX (ALLOC/EXEC). If you look at .dynsym, you will see that it is marked ALLOC, whereas .symtab has no flags.
ALLOC means that the section will be allocated at runtime and loaded into memory; .symtab is not loaded into memory because it is not necessary at runtime. The .dynsym section contains symbols that can only be resolved at runtime, and therefore they are the only symbols needed at runtime by the dynamic linker. So, while the .dynsym symbol table is necessary for the execution of dynamically linked executables, the .symtab symbol table exists only for debugging and linking purposes and is often stripped (removed) from production binaries to save space.
Let's take a look at what an ELF symbol entry looks like for 64-bit ELF files:
typedef struct {
    uint32_t      st_name;
    unsigned char st_info;
    unsigned char st_other;
    uint16_t      st_shndx;
    Elf64_Addr    st_value;
    uint64_t      st_size;
} Elf64_Sym;
Symbol entries are contained within the .symtab and .dynsym sections, which is why the sh_entsize (section header entry size) for those sections is equal to sizeof(ElfN_Sym).
st_name
The st_name contains an offset into the symbol table's string table (located in either .dynstr or .strtab), where the name of the symbol is located, such as printf.
st_value
The st_value holds the value of the symbol (either an address or offset of its location).
st_size
The st_size contains the size of the symbol, such as the size of a global function ptr, which would be 4 bytes on a 32-bit system.
st_other
This member defines the symbol visibility.
st_shndx
Every symbol table entry is defined in relation to some section. This member holds the relevant section header table index.
st_info
The st_info specifies the symbol type and binding attributes. For a complete list of these types and attributes, consult the ELF(5) man page. The symbol types start with STT_ whereas the symbol bindings start with STB_. A few common ones are explained in the next sections.
Symbol types
We've got the following symbol types:
- STT_NOTYPE: The symbol's type is undefined
- STT_FUNC: The symbol is associated with a function or other executable code
- STT_OBJECT: The symbol is associated with a data object
Symbol bindings
We've got the following symbol bindings:
- STB_LOCAL: Local symbols are not visible outside the object file containing their definition, such as a function declared static.
- STB_GLOBAL: Global symbols are visible to all object files being combined. One file's definition of a global symbol will satisfy another file's undefined reference to the same symbol.
- STB_WEAK: Similar to global binding, but with less precedence, meaning that the binding is weak and may be overridden by another symbol (with the same name) that is not marked as STB_WEAK.
There are macros for packing and unpacking the binding and type fields:
- ELF32_ST_BIND(info) or ELF64_ST_BIND(info) extract a binding from an st_info value
- ELF32_ST_TYPE(info) or ELF64_ST_TYPE(info) extract a type from an st_info value
- ELF32_ST_INFO(bind, type) or ELF64_ST_INFO(bind, type) convert a binding and a type into an st_info value
Let's look at the symbol table for the following source code:
static inline void foochu() { /* Do nothing */ }

void func1() { /* Do nothing */ }

void _start()
{
    func1();
    foochu();
}
The following is the command to see the symbol table entries for functions foochu and func1:
ryan@alchemy:~$ readelf -s test | egrep 'foochu|func1'
     7: 080480d8     5 FUNC    LOCAL  DEFAULT    2 foochu
     8: 080480dd     5 FUNC    GLOBAL DEFAULT    2 func1
We can see that the foochu function has a value of 0x80480d8, and is a function (STT_FUNC) that has a local symbol binding (STB_LOCAL). If you recall, we talked a little bit about LOCAL bindings, which mean that the symbol cannot be seen outside the object file it is defined in, which is why foochu is local, since we declared it with the static keyword in our source code.
Symbols make life easier for everyone; they are a part of ELF objects for the purpose of linking, relocation, readable disassembly, and debugging. This brings me to the topic of a useful tool that I coded in 2013, named ftrace. Similar to, and in the same spirit as, ltrace and strace, ftrace will trace all of the function calls made within the binary and can also show other branch instructions such as jumps. I originally designed ftrace to help in reversing binaries for which I didn't have the source code while at work. ftrace is considered to be a dynamic analysis tool. Let's take a look at some of its capabilities. We compile a binary with the following source code:
#include <stdio.h>

int func1(int a, int b, int c)
{
    printf("%d %d %d\n", a, b, c);
}

int main(void)
{
    func1(1, 2, 3);
}
Now, assuming that we don't have the preceding source code and we want to know the inner workings of the binary that it compiles into, we can run ftrace on it. First let's look at the synopsis:
ftrace [-p <pid>] [-Sstve] <prog>
The usage is as follows:
- [-p]: This traces by PID
- [-t]: This is for the type detection of function args
- [-s]: This prints string values
- [-v]: This gives a verbose output
- [-e]: This gives miscellaneous ELF information (symbols, dependencies)
- [-S]: This shows function calls with stripped symbols
- [-C]: This completes the control flow analysis
Let's give it a try:
ryan@alchemy:~$ ftrace -s test
[+] Function tracing begins here:
PLT_call@0x400420:__libc_start_main()
LOCAL_call@0x4003e0:_init()
(RETURN VALUE) LOCAL_call@0x4003e0: _init() = 0
LOCAL_call@0x40052c:func1(0x1,0x2,0x3)  // notice values passed
PLT_call@0x400410:printf("%d %d %d\n")  // notice we see string value
1 2 3
(RETURN VALUE) PLT_call@0x400410: printf("%d %d %d\n") = 6
(RETURN VALUE) LOCAL_call@0x40052c: func1(0x1,0x2,0x3) = 6
LOCAL_call@0x400470:deregister_tm_clones()
(RETURN VALUE) LOCAL_call@0x400470: deregister_tm_clones() = 7
A clever individual might now be asking: What happens if a binary's symbol table has been stripped? That's right; you can strip a binary of its symbol table; however, a dynamically linked executable will always retain .dynsym but will discard .symtab if it is stripped, so only the imported library symbols will show up.
If a binary is compiled statically (gcc -static) or without libc linking (gcc -nostdlib), and it is then stripped with the strip command, the binary will have no symbol table at all, since the dynamic symbol table is no longer imperative. ftrace behaves differently with the -S flag, which tells ftrace to show every function call even if there is no symbol attached to it. When using the -S flag, ftrace will display function names as sub_<address_of_function>, similar to how IDA Pro will show functions that have no symbol table reference.
Let's look at the following very simple source code:
int foo(void)
{
}

void _start()
{
    foo();
    __asm__("leave");
}
The preceding source code simply calls the foo() function and exits. The reason we are using _start() instead of main() is because we compile it with the following:
gcc -nostdlib test2.c -o test2
The gcc flag -nostdlib directs the linker to omit standard libc linking conventions and to simply compile the code that we have and nothing more. The default entry point is a symbol called _start():
ryan@alchemy:~$ ftrace ./test2
[+] Function tracing begins here:
LOCAL_call@0x400144:foo()
(RETURN VALUE) LOCAL_call@0x400144: foo() = 0
Now let's strip the symbol table and run ftrace on it again:
ryan@alchemy:~$ strip test2
ryan@alchemy:~$ ftrace -S test2
[+] Function tracing begins here:
LOCAL_call@0x400144:sub_400144()
(RETURN VALUE) LOCAL_call@0x400144: sub_400144() = 0
We now notice that the foo() function has been replaced by sub_400144(), which shows that the function call is happening at address 0x400144. Now if we look at the binary test2 before we stripped the symbols, we can see that 0x400144 is indeed where foo() is located:
ryan@alchemy:~$ objdump -d test2

test2:     file format elf64-x86-64

Disassembly of section .text:

0000000000400144 <foo>:
  400144:   55                      push   %rbp
  400145:   48 89 e5                mov    %rsp,%rbp
  400148:   5d                      pop    %rbp
  400149:   c3                      retq

000000000040014a <_start>:
  40014a:   55                      push   %rbp
  40014b:   48 89 e5                mov    %rsp,%rbp
  40014e:   e8 f1 ff ff ff          callq  400144 <foo>
  400153:   c9                      leaveq
  400154:   5d                      pop    %rbp
  400155:   c3                      retq
In fact, to give you a really good idea of how helpful symbols can be to reverse engineers (when we have them), let's take a look at the test2 binary, this time without symbols, to demonstrate how it becomes slightly less obvious to read. This is primarily because branch instructions no longer have a symbol name attached to them, so analyzing the control flow becomes more tedious and requires more annotation, which some disassemblers like IDA Pro allow us to do as we go:
$ objdump -d test2

test2:     file format elf64-x86-64

Disassembly of section .text:

0000000000400144 <.text>:
  400144:   55                      push   %rbp
  400145:   48 89 e5                mov    %rsp,%rbp
  400148:   5d                      pop    %rbp
  400149:   c3                      retq
  40014a:   55                      push   %rbp
  40014b:   48 89 e5                mov    %rsp,%rbp
  40014e:   e8 f1 ff ff ff          callq  0x400144
  400153:   c9                      leaveq
  400154:   5d                      pop    %rbp
  400155:   c3                      retq
The only thing that gives us an idea of where a new function starts is the procedure prologue, which is at the beginning of every function unless gcc -fomit-frame-pointer has been used, in which case function boundaries become less obvious to identify.
This book assumes that the reader already has some knowledge of assembly language, since teaching x86 asm is not the goal of this book, but notice the procedure prologue in the preceding listing (push %rbp followed by mov %rsp,%rbp), which helps denote the start of each function. The procedure prologue just sets up the stack frame for each new function that has been called by backing up the base pointer on the stack and setting its value to the stack pointer's value before the stack pointer is adjusted to make room for local variables. This way, arguments and local variables can be referenced at fixed offsets from the address stored in the base pointer register ebp/rbp.
Now that we've gotten a grasp on symbols, the next step is to understand relocations. We will see in the next section how symbols, relocations, and sections are all closely tied together and live at the same level of abstraction within the ELF format.