
Components of toolchains

The GNU toolchain is a term used for a collection of programming tools under the GNU Project umbrella. This suite of tools is what is normally called a toolchain, and is used for the development of applications and operating systems. It plays an important role in the development of embedded systems and Linux systems, in particular.

The following projects are included in the GNU toolchain:

  • GNU make: This represents an automation tool used for compilation and build
  • GNU Compiler Collection (GCC): This represents a compiler suite that supports a number of programming languages
  • GNU Binutils: This contains tools, such as linkers, assemblers, and so on; these tools are able to manipulate binaries
  • GNU Bison: This is a parser generator
  • GNU Debugger (GDB): This is a code debugging tool
  • GNU m4: This is an m4 macro processor
  • GNU build system (autotools): This consists of the following:
    • Autoconf
    • Autoheader
    • Automake
    • Libtool

The projects included in the toolchain are depicted in the following diagram:

An embedded development environment needs more than a cross-compilation toolchain. It needs libraries, and it should target system-specific packages, such as programs, libraries, and utilities, as well as host-specific debuggers, editors, and utilities. In some cases, usually in a company's environment, a number of servers host the target devices, and certain hardware probes are connected to the host through Ethernet or other methods. This emphasizes the fact that an embedded distribution includes a great number of tools and, usually, a number of these tools require customization. Presenting each of these would take up more than a chapter in a book.

In this book, however, we will cover only the toolchain building components. These include the following:

  • binutils
  • gcc
  • glibc (C libraries)
  • kernel headers

I will start by introducing the first item on this list, the GNU Binutils package. Developed under the GNU GPL license, it represents a set of tools that are used to create and manage binary files, object code, assembly files, and profile data for a given architecture. Here is a list of the tools available in the GNU Binutils package, along with their functionality:

  • The GNU linker, that is ld
  • The GNU assembler, that is as
  • A utility that converts addresses into filenames and line numbers, that is addr2line
  • A utility to create, extract, and modify archives, that is ar
  • A tool that lists the symbols available inside object files, that is nm
  • A tool that copies and translates object files, that is objcopy
  • A tool that displays information from object files, that is objdump
  • A tool that generates an index for the contents of an archive, that is ranlib
  • A tool that displays information from any ELF format object file, that is readelf
  • A tool that lists the section sizes of an object or archive file, that is size
  • A tool that lists printable strings from files, that is strings
  • A utility that discards symbols, that is strip
  • A filter to demangle encoded C++ symbols, that is c++filt
  • A tool that creates the files needed to build and use DLLs, that is dlltool
  • A new, faster, ELF-only linker, which is still in beta testing, that is gold
  • A tool that displays profiling information, that is gprof
  • A tool that converts object code into an NLM, that is nlmconv
  • A Windows-compatible message compiler, that is windmc
  • A compiler for Windows resource files, that is windres

The majority of these tools use the Binary File Descriptor (BFD) library for low-level data manipulation, and also, many of them use the opcode library to assemble and disassemble operations.

Note

Useful information about binutils can be found at http://www.gnu.org/software/binutils/.

In the toolchain generation process, the next item on the list is the kernel headers, which are needed by the C library for interaction with the kernel. Before compiling the corresponding C library, the kernel headers need to be supplied so that they can offer access to the available system calls, data structures, and constant definitions. Of course, any C library defines sets of specifications that are specific to each hardware architecture; here, I am referring to the application binary interface (ABI).

An application binary interface (ABI) represents the interface between two modules. It specifies how functions are called and what kind of information should be passed between components or to the operating system. Referring to a book, such as The Linux Kernel Primer, will do you good; in my opinion, it is a complete guide to what the ABI offers. I will try to reproduce its definition for you.

An ABI can be seen as a set of rules similar to a protocol or an agreement that offers the possibility for a linker to put together compiled modules into one component without the need of recompilation. At the same time, an ABI describes the binary interface between these components. Having this sort of convention and conforming to an ABI offers the benefits of linking object files that could have been compiled with different compilers.

It can be easily seen from both of these definitions that an ABI is dependent on the type of platform, which can include physical hardware, some kind of virtual machine, and so on. It may also be dependent on the programming language that is used and the compiler, but most of it depends on the platform.

The ABI describes how the generated code operates. The code generation process must also be aware of the ABI, but when coding in a high-level language, paying attention to the ABI is rarely a problem. This information can be considered necessary knowledge for specifying some ABI-related options.

As a general rule, the ABI must be respected in interactions with external components. However, with regard to interactions between internal modules, the user is free to do whatever he or she wants. Basically, they are able to reinvent the ABI and form their own dependence on the limitations of the machine. A simple analogy involves citizens of a country or region: they have learned and known the language of that region since they were born, so they are able to understand one another and communicate without problems. An external citizen who wants to communicate needs to learn the language of the region first; once inside the community this seems natural, so it does not constitute a problem. Compilers, similarly, are able to design their own custom calling conventions when they know the limitations of the functions called within a module. This exercise is typically done for optimization reasons. However, this can be considered an abuse of the term ABI.

With regard to the user space ABI, the kernel is backward compatible: binaries generated using kernel headers older than the ones available on the running kernel will still work. The disadvantage of this is that binaries generated with a toolchain that uses newer kernel headers may depend on system calls and data structures that older kernels do not provide. The need for the latest kernel headers is justified by the need to have access to the latest kernel features.

The GNU Compiler Collection, also known as GCC, represents a compiler system that constitutes the key component of the GNU toolchain. Although it was originally named the GNU C Compiler, because it only handled the C programming language, it soon began to support a collection of languages, such as C, C++, Objective-C, Fortran, Java, Ada, and Go, as well as the libraries for other languages (such as libstdc++, libgcj, and so on).

It was originally written as the compiler for the GNU operating system and developed as 100 percent free software; it is distributed under the GNU GPL. This helped it extend its functionality across a wide variety of architectures, and it played an important role in the growth of open source software.

The development of GCC started with the effort put in by Richard Stallman to bootstrap the GNU operating system. This quest led Stallman to write his own compiler from scratch. It was released in 1987, with Stallman as the author and others as contributors. By 1991, it had already reached a stable phase, but it was unable to include improvements due to its architectural limitations. This meant that work on GCC version 2 had begun, but it did not take long for the need for new language interfaces to appear, and developers started making their own forks of the compiler source code. These fork initiatives proved to be very inefficient, and because of the difficulty of the code acceptance procedure, working on GCC became really frustrating.

This changed in 1997, when a group of developers gathered as the Experimental/Enhanced GNU Compiler System (EGCS) workgroup and started merging several forks into one project. They had so much success in this venture, and gathered so many features, that the Free Software Foundation (FSF) halted its development of GCC version 2 and appointed EGCS as the official GCC version and maintainers in April 1999. The two projects were unified with the release of GCC 2.95. More information on the history and release history of the GNU Compiler Collection can be found at https://www.gnu.org/software/gcc/releases.html and http://en.wikipedia.org/wiki/GNU_Compiler_Collection#Revision_history.

The GCC interface follows the Unix convention: users call a language-specific driver, which interprets arguments and calls the compiler proper. The driver then runs an assembler on the resulting output and, if necessary, runs a linker to obtain the final executable. For each language, a separate compiler program performs the reading of the source code.

The process of obtaining an executable from source code involves several steps. After the first step, an abstract syntax tree is generated, and at this stage, compiler optimizations and static code analysis can be applied. Optimizations and static code analysis can be applied both on the architecture-independent GIMPLE representation (or its superset, GENERIC) and on the architecture-dependent Register Transfer Language (RTL) representation, which is similar to the LISP language. The machine code is generated using a pattern-matching algorithm written by Jack Davidson and Christopher Fraser.

GCC was initially written almost entirely in the C language, although the Ada frontend is written mostly in Ada. However, in 2012, the GCC committee announced the use of C++ as an implementation language. Even so, this migration cannot be considered finished, and the main development activities continue to include adding support for new languages, optimizations, improved runtime libraries, and increased speed for debugging applications.

Each available frontend generates a tree from the given source code. Using this abstract tree form, different languages can share the same backend. GCC initially used Look-Ahead LR (LALR) parsers generated with Bison, but it moved to recursive-descent parsers for C, C++, and Objective-C in 2006. Today, all available frontends use handwritten recursive-descent parsers.

Until recently, the syntax tree abstraction of a program was not independent of the target processor, because the meaning of the tree differed from one language frontend to another, and each provided its own tree syntax. All this changed with the architecture-independent GENERIC and GIMPLE representations, which were introduced with GCC version 4.0.

GENERIC is the more complex intermediate representation, while GIMPLE is a simplified GENERIC that targets all the frontends of GCC. Frontends such as those for C, C++, and Java produce GENERIC tree representations directly. Others use different intermediate representations that are then parsed and converted to GENERIC representations.

In the GIMPLE transformation, complex expressions are split into three-address code using temporary variables. The GIMPLE representation was inspired by the SIMPLE representation used in the McCAT compiler for simplifying the analysis and optimization of programs.

The middle stage of GCC involves code analysis and optimization, and works independently of both the compiled language and the target architecture. It starts from the GENERIC representation and continues to the Register Transfer Language (RTL) representation. The optimizations mostly involve jump threading, instruction scheduling, loop optimization, subexpression elimination, and so on. The RTL optimizations are less important than the ones done on GIMPLE representations. However, they include dead code elimination, global value numbering, partial redundancy elimination, sparse conditional constant propagation, scalar replacement of aggregates, and even automatic vectorization and automatic parallelization.

The GCC backend is mainly represented by preprocessor macros and specific target architecture functions, such as endianness definitions, calling conventions, or word sizes. The initial stage of the backend uses these representations to generate the RTL; this suggests that although GCC's RTL representation is nominally processor-independent, the initial processing of abstract instructions is adapted for each specific target.

A machine-specific description file contains RTL patterns, code snippets, and operand constraints that form the final assembly. In the process of RTL generation, the constraints of the target architecture are verified. An RTL snippet must match one or more RTL patterns from the machine description file and, at the same time, satisfy the limitations of these patterns; otherwise, converting the final RTL into machine code would be impossible. Toward the end of compilation, the RTL representation takes a strict form: each instruction refers to real machine registers and to a template from the target's machine description file.

As a result, the machine code is obtained by calling small snippets of code, which are associated with corresponding patterns. In this way, instructions are generated from target instruction sets. This process involves the usage of registers, offsets, and addresses from the reload phase.

Note

More information about a GCC compiler can be found at http://gcc.gnu.org/ or http://en.wikipedia.org/wiki/GNU_Compiler_Collection.

The last element that needs to be introduced here is the C library. It represents the interface between a Linux kernel and applications used on a Linux system. At the same time, it offers aid for the easier development of applications. There are a couple of C libraries available in this community:

  • glibc
  • eglibc
  • Newlib
  • bionic
  • musl
  • uClibc
  • dietlibc
  • Klibc

The choice of the C library used by the GCC compiler is made in the toolchain generation phase, and it is influenced not only by the size and application support offered by the libraries, but also by standards compliance, completeness, and personal preference.
