When generating a toolchain, the first thing that needs to be done is to establish the ABI used to generate binaries. This means that the kernel needs to understand this ABI and, at the same time, all the binaries in the system need to be compiled with the same ABI.
When working with the GNU toolchain, a good source of information about how work is done with these tools is the GNU coding standards. Their purpose is simple: to make sure that work within the GNU ecosystem is performed in a clean, easy, and consistent manner. They are guidelines for people who use GNU tools to write reliable, solid, and portable software. Their main focus is the C language, but the rules are also useful for other programming languages. The purpose of each rule is explained, so that the reasoning behind it is passed on to the reader.
We will also focus on the C programming language. The GNU coding standards state that, with occasional exceptions, GNU utilities and libraries should be upward compatible with those in Berkeley Unix, with Standard C, and with POSIX, wherever these standards specify their behavior. When these standards conflict, it is useful to offer a compatibility mode for each of them.
Standard C and POSIX prohibit many kinds of extensions. However, such extensions can still be provided, as long as a --posix, --ansi, or --compatible option is included to disable them. If an extension has a significant chance of breaking a real program or script, it is not really upward compatible, and its interface should be redesigned to make it so.
Many GNU programs suppress the extensions that are known to conflict with POSIX if the POSIXLY_CORRECT environment variable is defined, even with an empty value. When a feature is used only by users, not by programs or scripts, and it is poorly done in Unix, it can be replaced completely with something totally different and better, although offering a compatible feature as well is also a good idea. Additional useful features are always welcome.
A quick look at the GNU coding standards documentation reveals some useful information:
When a function argument has a type narrower than int, it is generally better to declare it as int, even if you are tempted to define a narrower data type. There are, of course, special cases where this is hard to do. One example is the dev_t system type, which is shorter than int on some machines and wider on others. The only way to support non-standard C types here is to check the width of dev_t using Autoconf and then choose the argument type accordingly. However, it may not be worth the trouble.
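To make the dev_t case concrete, the following sketch compares the width of dev_t with that of int directly in C. The Autoconf side is an assumption for illustration: AC_CHECK_SIZEOF([dev_t]) and AC_CHECK_SIZEOF([int]) in configure.ac would define SIZEOF_DEV_T and SIZEOF_INT in config.h, and the hypothetical dev_arg_t typedef chooses the argument type from them, falling back to dev_t when those macros are absent.

```c
#include <sys/types.h>

/* Hypothetical argument type chosen from Autoconf results (assumed to come
   from AC_CHECK_SIZEOF in config.h). If the macros are missing, the
   condition evaluates to false and dev_t is used unchanged. */
#if defined SIZEOF_DEV_T && defined SIZEOF_INT && SIZEOF_DEV_T < SIZEOF_INT
typedef int dev_arg_t;     /* dev_t is narrower than int: pass an int */
#else
typedef dev_t dev_arg_t;   /* otherwise pass dev_t as-is */
#endif

/* Report how dev_t compares with int on this machine:
   -1 if narrower, 0 if the same width, 1 if wider. */
int dev_t_vs_int (void)
{
  if (sizeof (dev_t) < sizeof (int))
    return -1;
  return sizeof (dev_t) > sizeof (int) ? 1 : 0;
}
```

On glibc-based Linux systems dev_t is a 64-bit type, so the function reports that it is wider than int there; on other systems the answer may differ, which is exactly why a build-time check is needed.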
The GNU Project treats the standards published by other organizations as suggestions rather than orders: it implements them only where doing so makes the system better overall. In most situations, following published standards suits users' needs, because it makes their programs and scripts more portable. For example, GCC implements almost all the features of Standard C, as the standard requires, which is a great advantage for C developers. The same applies to GNU utilities that follow the POSIX.2 specification.
Some points of the specifications are deliberately not followed, for the sole purpose of making the GNU system better for users. For example, Standard C prohibits nearly all extensions to C, yet GCC implements many of them, and some were later adopted by the standard. Developers who want these constructs to produce the error messages required by the standard can use the --pedantic option, which was implemented so that GCC can be described as a complete implementation of the standard.
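As an illustration of such an extension, the sketch below uses a GNU C statement expression, ({ ... }), together with __typeof__, to build a maximum macro that evaluates each argument exactly once. Standard C has no equivalent; GCC and Clang accept it by default, and compiling with -pedantic (or --pedantic) makes GCC warn that ISO C forbids braced groups within expressions. The macro and function names are hypothetical.

```c
/* GNU C extension: statement expression + __typeof__. Each argument is
   stored in a local variable first, so side effects happen only once. */
#define MAX_ONCE(a, b)              \
  ({ __typeof__ (a) _a = (a);       \
     __typeof__ (b) _b = (b);       \
     _a > _b ? _a : _b; })

/* Read two successive values of *COUNTER and return the larger one.
   With a naive ((a) > (b) ? (a) : (b)) macro, the increments would run
   more than once. */
int max_of_next_two (int *counter)
{
  return MAX_ONCE ((*counter)++, (*counter)++);
}
```

Starting with a counter of 5, the function returns 6 and leaves the counter at 7, because each increment runs exactly once.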
The POSIX.2 standard specifies that commands such as du and df should output sizes in units of 512 bytes by default. However, users want units of 1 KB, so that is the default behavior implemented. Anyone who wants the behavior required by the POSIX standard needs to set the POSIXLY_CORRECT environment variable.
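A minimal sketch of this convention is shown below. It is a simplification, not the GNU Coreutils code: the real du and df also honor other variables such as BLOCK_SIZE, which this sketch ignores, and the function names are hypothetical. The only rule modeled is that the mere presence of POSIXLY_CORRECT, whatever its value, switches the default unit from 1024 to 512 bytes.

```c
#include <stdlib.h>

/* Default display unit: 512 bytes when POSIXLY_CORRECT is defined (even
   as an empty string), 1024 bytes otherwise. */
unsigned long default_block_size (void)
{
  return getenv ("POSIXLY_CORRECT") != NULL ? 512 : 1024;
}

/* Convert a byte count into a number of display blocks, rounding up so
   that a partially used block still counts as one. */
unsigned long long size_in_blocks (unsigned long long bytes)
{
  unsigned long bs = default_block_size ();
  return (bytes + bs - 1) / bs;
}
```

With POSIXLY_CORRECT unset, 2048 bytes are reported as 2 blocks; setting the variable makes the same size appear as 4 blocks.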
Another example is that GNU utilities do not always follow the letter of the POSIX.2 specification: they support long-named command-line options and the intermixing of options with ordinary arguments. This minor incompatibility with POSIX is never a problem in practice, and it is very useful for developers. The main idea is not to reject a new feature, or remove an old one, merely because a standard says it is deprecated or forbidden.
To write robust code, a number of guidelines should be followed. The first is to avoid arbitrary limits on the length or number of any data structure, including files, file names, lines, and symbols, by allocating all data structures dynamically. Most Unix utilities silently truncate long lines; GNU utilities must not do this kind of thing.
Utilities that read files should not drop null characters or any other nonprinting characters. The only sensible exceptions are utilities specifically intended to interface with certain types of printers or terminals that are unable to handle these characters. My advice is to make programs work properly with sequences of bytes that represent multibyte characters, UTF-8 being the most important such encoding.
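The two guidelines above can be illustrated with a line reader that has no fixed length limit and keeps every byte, NUL included. This is a minimal sketch with a hypothetical read_line name; on POSIX systems, the standard getline function offers similar behavior. Because the line may contain NUL bytes, the length is returned separately instead of relying on strlen.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read one line of arbitrary length from FP. Returns a malloc'ed buffer
   holding the line without its newline and stores its length in *LENP.
   Bytes are copied verbatim, so NUL and other nonprinting characters are
   kept. Returns NULL at end of file or if memory runs out. The caller
   must free the result. */
char *read_line (FILE *fp, size_t *lenp)
{
  size_t cap = 64, len = 0;
  char *buf = malloc (cap);
  int c;

  if (buf == NULL)
    return NULL;
  while ((c = getc (fp)) != EOF && c != '\n')
    {
      if (len + 1 == cap)            /* keep room for a terminating NUL */
        {
          char *bigger = realloc (buf, cap * 2);
          if (bigger == NULL)        /* the old block is still valid */
            {
              free (buf);
              return NULL;
            }
          buf = bigger;
          cap *= 2;
        }
      buf[len++] = (char) c;
    }
  if (c == EOF && len == 0)          /* nothing left to read */
    {
      free (buf);
      return NULL;
    }
  buf[len] = '\0';
  *lenp = len;
  return buf;
}
```

A 10,000-character line is read as easily as a short one, and a line such as "a\0b" comes back with a length of 3 rather than being cut at the NUL.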
Check every system call for an error return, unless you deliberately want to ignore the errors. Every error message resulting from a failing system call should include the system error text, obtained from strerror, perror, or an equivalent function, together with the name of the file involved, if any, and the name of the utility. This makes the error message easy to read and understand for anyone who interacts with the source code or the program.
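A minimal sketch of such an error message follows. The helper name open_or_report is hypothetical; it prints the utility name, the file name, and the strerror text, producing a message such as "myprog: config.txt: No such file or directory", instead of a bare "cannot open file".

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Open FILENAME read-only. On failure, print
   "PROGRAM: FILENAME: <system error text>" to stderr, leave errno set,
   and return -1. On success, return the descriptor (close it with close). */
int open_or_report (const char *program, const char *filename)
{
  int fd = open (filename, O_RDONLY);
  if (fd < 0)
    {
      int saved_errno = errno;     /* fprintf may change errno */
      fprintf (stderr, "%s: %s: %s\n", program, filename,
               strerror (saved_errno));
      errno = saved_errno;
    }
  return fd;
}
```

Saving errno before calling fprintf matters because any library call, including the one that prints the message, is allowed to overwrite it.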
Check every call to malloc or realloc to see whether it returned a null pointer. Check realloc even when you are making a block smaller: on systems that round block sizes up to a power of 2, realloc may return a different block when you ask for less space. On some Unix systems, realloc has a bug that destroys the storage block when it returns a null pointer. GNU realloc does not have this bug: if it fails, the original block remains unchanged. If you want to run the same program on Unix without losing data in this case, you can check whether the bug has been fixed on that system or use GNU malloc.
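The usual way to apply this advice is to store the result of realloc in a temporary pointer, so that a failure does not overwrite the only pointer to the original block. The sketch below uses a hypothetical resize_block helper and assumes a realloc without the Unix bug mentioned above; it also refuses a size of zero, because realloc (p, 0) does not behave portably.

```c
#include <stdlib.h>

/* Resize the block at *PTR to NEW_SIZE bytes (NEW_SIZE must be nonzero).
   Returns 0 on success and -1 on failure. On failure, *PTR still points
   to the original, intact block, which the caller still owns. */
int resize_block (void **ptr, size_t new_size)
{
  void *p;

  if (new_size == 0)
    return -1;               /* realloc (p, 0) is not portable */
  p = realloc (*ptr, new_size);   /* checked even when shrinking */
  if (p == NULL)
    return -1;               /* *PTR is left unchanged */
  *ptr = p;
  return 0;
}
```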
Expect free to alter the contents of the block that was freed: the block must not be read or modified afterward. Anything you need from the block must be fetched before calling free.
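A classic place where this rule matters is freeing a linked list: the pointer to the next node lives inside the block being released, so it has to be read before free is called. The node type and free_list function below are a hypothetical sketch.

```c
#include <stdlib.h>

struct node
{
  int value;
  struct node *next;
};

/* Free every node of a singly linked list and return how many were freed.
   The next pointer is fetched before free, because free may alter the
   contents of the block it releases. */
size_t free_list (struct node *head)
{
  size_t count = 0;

  while (head != NULL)
    {
      struct node *next = head->next;   /* fetch first ... */
      free (head);                      /* ... then free */
      head = next;
      count++;
    }
  return count;
}
```

Writing head = head->next after free (head) would read freed memory, which is exactly the mistake this guideline warns against.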
When malloc fails in a noninteractive program, treat it as a fatal error. In an interactive program, one that reads commands from the user, it is better to abort the current command and return to the command reader loop. This lets the user free up virtual memory, for example by killing other processes, and then retry the command.
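For the noninteractive case, GNU programs commonly wrap malloc in a function named xmalloc that never returns a null pointer. The version below is a minimal local sketch of that convention, not Gnulib's implementation; an interactive program would instead report the failure and return to its command loop.

```c
#include <stdio.h>
#include <stdlib.h>

/* Allocate SIZE bytes or terminate the program: intended only for
   noninteractive programs, where running out of memory is fatal. */
void *xmalloc (size_t size)
{
  void *p = malloc (size);

  /* malloc (0) may legitimately return NULL, so only treat NULL as an
     error when memory was actually requested. */
  if (p == NULL && size != 0)
    {
      fputs ("memory exhausted\n", stderr);
      exit (EXIT_FAILURE);
    }
  return p;
}
```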
Use the getopt_long function to decode arguments.
When static storage is written to during program execution, initialize it with explicit C code; that way, restarting the program, or part of it, without reloading it reinitializes those variables. Reserve C initialized declarations for data that will not be changed.
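A minimal sketch of the distinction, using a hypothetical logging module: the table of level names never changes, so a C initialized declaration suits it, while the message counters are written at run time and are therefore set by an explicit initialization function that can be called again to restart the module.

```c
#include <stddef.h>

/* Read-only data: a C initialized declaration is appropriate. */
static const char *const level_names[] = { "debug", "info", "error" };

#define N_LEVELS (sizeof level_names / sizeof level_names[0])

/* Static storage written during execution: initialized by code below. */
static size_t message_counts[N_LEVELS];

/* (Re)initialize the module; calling it again resets the counters
   without reloading the program. */
void log_init (void)
{
  for (size_t i = 0; i < N_LEVELS; i++)
    message_counts[i] = 0;
}

/* Count a message at LEVEL (0 to N_LEVELS - 1); returns the new count. */
size_t log_message (int level)
{
  return ++message_counts[level];
}

const char *level_name (int level)
{
  return level_names[level];
}
```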
Avoid low-level interfaces to obscure Unix data structures, such as file directories, utmp, or the layout of kernel memory, because they are less likely to work compatibly. For example, to find all the files inside a directory, use the readdir function or another high-level interface; these are supported compatibly and do not have such problems.
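As a small sketch of the high-level approach, the hypothetical count_entries function below lists a directory through opendir and readdir, never touching the on-disk directory format.

```c
#include <dirent.h>
#include <string.h>

/* Count the entries of DIRNAME, excluding "." and "..", or return -1 if
   the directory cannot be opened. */
long count_entries (const char *dirname)
{
  DIR *dir = opendir (dirname);
  struct dirent *ent;
  long count = 0;

  if (dir == NULL)
    return -1;
  while ((ent = readdir (dir)) != NULL)
    if (strcmp (ent->d_name, ".") != 0 && strcmp (ent->d_name, "..") != 0)
      count++;
  closedir (dir);
  return count;
}
```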
For signal handling, use either the BSD variant of signal or the POSIX sigaction function; the alternative USG signal interface is an inferior design. Nowadays, using the POSIX signal functions may be the easiest way to make a program portable. Whether to support systems where signal has only the USG behavior is up to the developer.
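The following sketch installs a SIGINT handler with the POSIX sigaction function. The handler only sets a volatile sig_atomic_t flag, which is one of the few things that is safe to do inside a signal handler; the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t interrupted = 0;

/* Keep the handler minimal: just record that the signal arrived. */
static void on_sigint (int signo)
{
  (void) signo;
  interrupted = 1;
}

/* Install the handler with sigaction; returns 0 on success, -1 on error. */
int install_sigint_handler (void)
{
  struct sigaction sa;

  memset (&sa, 0, sizeof sa);
  sa.sa_handler = on_sigint;
  sigemptyset (&sa.sa_mask);    /* block no extra signals in the handler */
  sa.sa_flags = 0;
  return sigaction (SIGINT, &sa, NULL);
}

int was_interrupted (void)
{
  return interrupted;
}
```

The main loop of a program would poll was_interrupted () and shut down cleanly, rather than doing the cleanup inside the handler itself.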
In error checks that detect impossible conditions, just abort the program; there is usually no point in printing a message. Such checks indicate the existence of bugs, and whoever wants to fix them will have to read the source code and probably run a debugger. So explain the problem using comments in the source code. The relevant data will be in variables, which are easy to examine with a debugger.
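A minimal sketch of this style follows, using a hypothetical color_code function. The impossible case is explained by a comment and handled with abort, which leaves the program state, including the offending value, available for inspection in a debugger or core dump.

```c
#include <stdlib.h>

enum color { RED, GREEN, BLUE };

/* Map a color to a hypothetical register code. */
int color_code (enum color c)
{
  switch (c)
    {
    case RED:   return 0x10;
    case GREEN: return 0x20;
    case BLUE:  return 0x30;
    }
  /* Impossible: every enum value is handled above, so reaching this point
     means a caller bug or memory corruption. No message is printed; the
     value of C can be examined with a debugger. */
  abort ();
}
```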
Do not use the number of errors encountered as a program's exit status. This does not work, because exit status values are limited to 8 bits (0 through 255), and a single run of the program might encounter 256 or more errors. For example, if a process tries to return 256 as its exit status, the parent process will see a status of zero and conclude that the program finished successfully.
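The truncation can be observed directly. In the sketch below, observed_exit_status (a hypothetical helper) forks a child that exits with a given status and reports what the parent receives through waitpid and WEXITSTATUS; exit_status_for shows the safe alternative of mapping any number of errors to a fixed failure status.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Return the exit status the parent actually sees when a child calls
   _exit (STATUS), or -1 on error. Only the low 8 bits survive. */
int observed_exit_status (int status)
{
  int wstatus;
  pid_t pid = fork ();

  if (pid < 0)
    return -1;
  if (pid == 0)
    _exit (status);            /* child: exit at once, no stdio flushing */
  if (waitpid (pid, &wstatus, 0) < 0 || !WIFEXITED (wstatus))
    return -1;
  return WEXITSTATUS (wstatus);
}

/* Safe alternative: any number of errors maps to a fixed status. */
int exit_status_for (unsigned long error_count)
{
  return error_count == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

A child that exits with 256 is reported to the parent as 0, that is, as a success, whereas exit_status_for (256) still yields EXIT_FAILURE.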
If you create temporary files, check the TMPDIR environment variable; if it is defined, use the directory it specifies instead of /tmp. Temporary files should be created with caution, because creating them in world-writable directories can open security holes. In C, this can be avoided by creating temporary files in the following manner:
fd = open (filename, O_WRONLY | O_CREAT | O_EXCL, 0600);
This can also be done using the mkstemps function, which is made available by Gnulib.
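Putting both pieces of advice together, the sketch below honors TMPDIR and creates the file with mkstemp, the POSIX sibling of mkstemps that takes no suffix argument. mkstemp replaces the XXXXXX characters and creates the file exclusively with mode 0600, which gives the same protection as the open call shown above. The make_temp_file name and the "myprog" prefix are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Create a private temporary file in $TMPDIR, or in /tmp if TMPDIR is
   unset or empty. On success, return an open descriptor and store the
   malloc'ed path in *PATHP; the caller must close, unlink, and free.
   Return -1 on failure. */
int make_temp_file (char **pathp)
{
  const char *dir = getenv ("TMPDIR");
  const char *tmpl = "/myprog-XXXXXX";
  char *path;
  int fd;

  if (dir == NULL || *dir == '\0')
    dir = "/tmp";
  path = malloc (strlen (dir) + strlen (tmpl) + 1);
  if (path == NULL)
    return -1;
  strcpy (path, dir);
  strcat (path, tmpl);

  /* Exclusive creation: fails rather than reuse an existing file or
     follow a symlink planted by another user. */
  fd = mkstemp (path);
  if (fd < 0)
    {
      free (path);
      return -1;
    }
  *pathp = path;
  return fd;
}
```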
In bash, use the noclobber shell option (set -o noclobber, or its short form set -C) to avoid the previously mentioned problem. Furthermore, the mktemp utility, available in the GNU Coreutils package, is a more general solution for creating temporary files from a shell script.
Having introduced the packages that make up a toolchain, this section presents the steps needed to obtain a custom toolchain. The generated toolchain will contain the same sources as the ones available in the Poky dizzy branch: gcc version 4.9, binutils version 2.24, and glibc version 2.20. Ubuntu systems also offer shortcuts: a generic toolchain can be installed using the package manager. Other alternatives include downloading the custom toolchains available inside Board Support Packages, or toolchains from third parties such as CodeSourcery and Linaro. More information on toolchains can be found at http://elinux.org/Toolchains. The architecture used for the demo is ARM.
The toolchain build process has eight steps. I will only outline the activities required for each of them, but I must mention that they are all automated inside the Yocto Project recipes. In the Yocto Project, the toolchain is generated behind the scenes. To interact with the generated toolchain, the simplest option is to call the meta-ide-support target, which will be presented in the appropriate section. The steps are as follows:
The setup: This represents the step in which top-level build directories and source subdirectories are created. In this step, variables such as TARGET, SYSROOT, ARCH, COMPILER, PATH, and others are defined.
Getting the sources: This represents the step in which packages, such as binutils, gcc, glibc, the Linux kernel headers, and various patches, are made available for use in later steps.
GNU Binutils setup: This represents the steps in which the binutils package is built, as shown here:
Unzip the sources available from the corresponding release
Patch the sources accordingly, if this applies
Configure the package accordingly
Compile the sources
Install the resulting binaries in the corresponding location
Linux kernel headers setup: This represents the steps in which the Linux kernel sources are used to prepare the kernel headers, as shown here:
Unzip the kernel sources.
Patch the kernel sources, if this applies.
Configure the kernel for the selected architecture. In this step, the corresponding kernel config file is generated. More information about Linux kernel will be presented in Chapter 4, Linux Kernel.
Prepare the Linux kernel headers for user space (the kernel's headers_install make target does this).
Install the headers in the corresponding locations.
Glibc headers setup: This represents the steps used to set up the glibc build area and install the glibc headers, as shown here:
Unzip the glibc archive and header files
Patch the sources, if this applies
Configure the sources accordingly, using the --with-headers option to point glibc to the corresponding Linux kernel headers
Build the glibc header files
Install the headers accordingly
GCC first stage setup: This represents the step in which a first-stage C compiler is built and used to generate the C runtime files, such as crti.o and crtn.o:
Unzip the gcc archive
Patch the gcc sources if necessary
Configure the sources enabling the needed features
Compile the C runtime components
Install the resulting compiler accordingly
Build the glibc sources: This represents the step in which the glibc sources are built and the necessary ABI setup is done, as shown here:
Configure the glibc library, setting the -mabi and -march compiler flags accordingly
Compile the sources
Install glibc accordingly
GCC second stage setup: This represents the final setup phase in which the toolchain configuration is finished, as shown here:
Configure the gcc sources
Compile the sources
Install the binaries in the corresponding location
After these steps are performed, a toolchain will be available for the developer to use. The same strategy and build steps are followed inside the Yocto Project.
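Since the whole process starts from the choice of ABI, a quick sanity check for a freshly built toolchain is to compile a small file with it and inspect the ABI-related macros that GCC predefines. The sketch below relies on __ARM_EABI__, __ARM_PCS_VFP (defined for the hard-float calling convention), __arm__, and __aarch64__; the target_abi function name is hypothetical. The full list of predefined macros can also be printed with gcc -dM -E - < /dev/null, using the cross compiler's gcc.

```c
/* Describe the ABI this translation unit was compiled for, based on
   macros GCC predefines for ARM targets. */
const char *target_abi (void)
{
#if defined(__aarch64__)
  return "AArch64 (AAPCS64)";
#elif defined(__ARM_EABI__) && defined(__ARM_PCS_VFP)
  return "ARM EABI, hard-float calling convention";
#elif defined(__ARM_EABI__)
  return "ARM EABI, soft-float calling convention";
#elif defined(__arm__)
  return "ARM, old ABI (OABI)";
#else
  return "not an ARM target";
#endif
}
```

Compiling this file with the new cross compiler and with the host compiler should give different answers; every binary in the target system should report the same ARM ABI.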