
Working with Basic Data Types

Everything in a computer is a sequence of binary digits. C's intrinsic data types enable the compiler to tell the computer how to interpret binary sequences of data.

A binary sequence plus a data type results in a meaningful value. The data type not only leads to a meaningful value but it also helps determine what kind of operations on that value make sense. Operations involve manipulating values as well as converting or casting a value from one data type to a related data type.

Once we have explored C's intrinsic data types in this chapter, we can then use them as building blocks for more complex data representations. This chapter, then, is the basis for the more complex data representations we will encounter in Chapter 8, Creating and Using Enumerations, through Chapter 16, Creating and Using More Complex Structures.

The following topics will be covered in this chapter:

  • Understanding bytes and chunks of data
  • Working with whole numbers
  • Working with numbers with decimal places
  • Using single characters
  • Understanding false (or zero) versus true (or anything not exactly zero)
  • Understanding how types are implemented on your computer with sizeof()
  • Understanding casting
  • Discovering the minimum and maximum values for each type on your computer

Technical requirements

For the rest of this book, unless otherwise noted, you will continue to use your computer with the following:

  • The plain text editor of your choice
  • A console, Terminal, or command-line window (depending on your OS)
  • The compiler, either GCC or Clang, for your particular OS

For the sake of consistency, it is best if you use the same computer and programming tools. By doing so, you can focus more closely on the details of C on your computer.

The source code for this chapter can be found at https://github.com/PacktPublishing/Learn-C-Programming.

Understanding data types

Everything in a computer is a sequence of binary digits (or bits). A single bit is either off (0) or on (1). Eight bits are strung together to form a byte. A byte is the basic data unit. Bytes are treated singly, as pairs called 16-bit words, as quadruples to form 32-bit words, and as octets to form 64-bit words. These combinations of sizes of bytes are used in the following ways:

  • Instructions for the CPU
  • Addresses for the locations of all things in the computer
  • Data values

The compiler generates binary instructions from our C statements; hence, we don't need to deal with the instructions since we are writing proper C syntax.

We also interact with various parts of the computer via the address of that part. Typically, we don't do this directly. For instance, we've seen how printf() knows how to fetch the data from a function call we make and then move it to the part of the computer that spills it out to the console. We are not, nor should we be, concerned with these addresses since from computer to computer and from version to version of our operating system, they may change.

We will deal with the addresses of some, but not all, things in the computer in Chapter 13, Using Pointers. For the most part, again, the compiler handles these issues for us.

Both instructions and addresses deal with data. Instructions manipulate data and move it around. Addresses are required for the instructions to be able to fetch the data and also to store the data. In between fetching and storing, instructions manipulate it.

Before we get to manipulating data, we need to understand how data is represented and the various considerations for each type of data.

Here is the basic problem. We have a pattern of black and white bits; what does it mean?

To illustrate how a pattern alone may not provide enough information for proper interpretation, let's consider the following sequence. What does 13 mean in this context?

OK, I know what you are thinking. But wait! Look again. Now, what does 13 mean in this context?

Combining both aspects, we can now see the full spectrum of the problem:

The central picture is just a two-dimensional pattern of black and white pixels. In one context, the central picture makes sense seen as the number 13; in another context, the central picture makes the most sense seen as the letter B. We can only resolve the ambiguity of the pixel pattern from its context with other pixel patterns. How we interpret the patterns of black and white pixels is entirely dependent upon the context in which we view them.

This is very much like the byte sequences the compiler generates, which the CPU processes. Internally, commands, addresses, and data in the computer are nothing more than sequences of 1s and 0s of various sizes. How the computer interprets the patterns of 1s and 0s is entirely dependent upon the context given to them by the computer language and the programmer.

We, as programmers, must provide the guidelines to the compiler, consequently to the CPU, on how to interpret the sequence. We do this in C by explicitly assigning a data type to the data we want to manipulate.

C is a strongly typed language. That is, every value must have a type associated with it. It should be noted that some languages infer the type of a piece of data by how it is used. They will also make assumptions about how to convert one data type into another. These are called loosely typed languages. C also does conversions from one type to another, but the rules are fairly specific compared to other programming languages.

In C, as in most programming languages, there are five basic and intrinsic data types. Intrinsic means these types and all operations on them are built into the language.

Five basic types are as follows:

  • Whole numbers: They can represent a positive-only range of values or a range that includes both positive and negative values.
  • Numbers with fractions, or decimal numbers: These are all the numbers between whole numbers, such as 0.79, 1.125, and 3.14159 (an approximate value for π), or even 3.1415926535897932384626433 (an even more precise but still approximate value for π). Decimal numbers can always include negative values.
  • Characters: These are the basis of C strings. Some languages have a separate string type. In C, strings are a special case of arrays of characters—not a data type but a special arrangement of contiguous character values.
  • Boolean values: These can be of any size depending on the preference of the compiler and the machine's preferred whole number size.
  • Addresses: These are the location of bytes in a computer's memory. C provides for direct addresses of values in memory. Many languages do not allow direct addressing.

Within each of these five types, there are different sizes of types to represent different ranges of values. C has very specific rules about how to convert a given data type into another. Some are valid, others make no sense. We will explore these in Chapter 4, Using Variables and Assignment.

For now, we need to understand the basic types and the different sizes of values they might represent.

Bytes and chunks of data

The smallest data value in C is a bit. However, bit operations tend to be very expensive and not all that common for most computer problems. We will not go into bit operations in this book. If you find you need to delve deeper into bit operations in C, please check out the annotated bibliography in the appendix for texts that treat this subject more fully.

The basic data value in C is a byte, or a sequence of 8 bits. A byte can represent 256, or 2^8, distinct values. Unsigned, these values range from 0 to 255, or 2^8 - 1. 0 is a value that must be represented in the set of 256 values; we can't leave that value out. A byte can represent either a non-negative integer in the range 0 to 255 or a signed integer in the range -128 to 127. In either case, there are only 256 unique combinations of 1s and 0s.

While most humans don't ordinarily count this high, for a computer, this is a very narrow range of values. A byte is the smallest of the chunks of data since each byte in memory can be addressed directly. A byte is also commonly used for alphanumeric characters (like you are now reading) but is not large enough for Unicode characters. ASCII characters and Unicode characters will be explained in great detail in Chapter 15, Working with Strings.

Chunks of bytes grow by powers of 2: 1 byte, 2 bytes, 4 bytes, 8 bytes, and 16 bytes. The following table shows how these may be used:

In the history of computing, there have been various byte ranges for basic computation. The very earliest and simplest CPUs used 1-byte integers. These very rapidly developed into 16-bit computers whose address space and largest integer value could be expressed in 2 bytes. As the range of integers increased from 2 to 4 to 8, so too did the range of possible memory addresses and the ranges of floating-point numbers.

As the problems that were addressed by computers further expanded, computers themselves expanded. This resulted in more powerful computers with a 4-byte address range and 4-byte integer values. These machines were prevalent from the 1990s through to the early part of the 21st century.

Today, most desktop computers are 64-bit computing devices that can address incredibly large amounts of memory and model problems that can account for all the atoms in the universe! For problems that require the processing of values that are 128 bits (16 bytes) and larger, very specialized computers have been developed.

You will seldom, if ever, need to consider those astronomically large numbers but they are necessary to solve mind-bendingly large and complex problems. Nonetheless, what you can do with very small chunks and relatively small ranges of values, you can also do with large ones. It is more important for us to learn how different types are represented and used, regardless of their size.

Notice in the preceding table the correlation between the number of bits in a chunk and the exponent in the binary form. Also notice that the number of bytes is a power of 2: 2^0, 2^1, 2^2, 2^3, 2^4. There are no 3-byte, 5-byte, or 7-byte chunks. They are just not needed.

You can also see from the table that the typical use of a chunk is directly related to its size. In C, the machine's preferred whole number size is typically the same size as an address. That is, the machine's natural integer size is the count of the largest number of bytes that the machine can address. This is not a hard rule, but it is a common guideline.

Byte allocations and ranges may vary from machine to machine. Embedded computers, tablets, and phones will likely have different sizes for each type than desktop computers or even supercomputers. We'll create the sizes_ranges.c program later in this chapter to confirm and verify the sizes and ranges of integers on your machine. This program will be handy to run whenever you are presented with a new system on which to develop C programs.

Representing whole numbers

The basic whole number type is an integer or just int. Integers can either be positive only, called unsigned, or they can be negative and positive, called signed. As you might expect, the natural use for integers is to count things. You must specify unsigned to get a positive-only integer; use it when you know you will not need negative values.

To be explicit, the full type name is unsigned int, where the keyword int is optional.

An unsigned integer has its lowest value of 0 and its highest value when all bits are set to 1. For instance, a single byte value has a possible 256 values but their range is 0 to 255. This is sometimes called the one-off problem (more widely known as the off-by-one problem), where the starting value for counting is 0 and not 1, as we were taught when we first learned to count. It is a problem because it takes some time for new programmers to adjust their thinking. Until you are comfortable thinking in this way, the one-off problem will be a common source of confusion and possibly the cause of bugs in your code. We will revisit the need for this kind of thinking when we explore loops (Chapter 7, Exploring Loops and Iteration) and when we work with arrays (Chapter 11, Working with Arrays, and Chapter 12, Working with Multi-Dimensional Arrays) and strings (Chapter 15, Working with Strings).

Representing positive and negative whole numbers

When negative numbers are needed, that is, whole numbers smaller than 0, we specify them with the signed keyword. So, a signed integer would be specified as signed int. The natural use for signed integers is when we want to express a direction relative to zero, either larger or smaller. By default, and without any extra specifiers, integers are signed.

A signed integer uses one of the bits to indicate whether the remaining bits represent a positive or negative number. Typically, this is the most significant bit; the least significant bit is that which represents the value 1. As with positive whole numbers, a signed integer has the same number of values, but the range is shifted so that half of the values are below 0, or, algebraically speaking, to the left of 0. For instance, a single signed byte has 256 possible values but their range is -128 to 127. Remember to count 0 as one of the possible values. Hence the apparent asymmetric range of values (there's that pesky one-off problem, again).

Specifying different sizes of integers

Integers can be specified to have various sizes for their data chunk. The smallest chunk is a single byte. This is called a char. It is so named for historical reasons. Before Unicode came along, the full set of English characters, uppercase, lowercase, numbers, punctuation, and certain special characters, could be represented with 256 values. In some languages, a byte is actually called a byte; unfortunately, not in C.

C99 added more integer types that specify the exact width of integer values. The basic set of these is of the form int<n>_t or uint<n>_t, where <n> is 8, 16, 32, or 64. Values of these types have exactly that number of bits. Such type specifications allow much greater predictability when porting a program from one computer system to a different one with possibly a different CPU and operating system. There are additional integer types to aid portability that are not listed here:

Notes:

  • When signed or unsigned is specified, the type is guaranteed to have the corresponding positive and negative or positive-only range. When neither is specified, short, int, long, and long long are signed; whether plain char is signed depends on the compiler.
  • short is guaranteed to be at least 2 bytes but may be longer depending upon the machine.
  • int and unsigned are guaranteed to be at least 2 bytes, although they are 4 bytes on most current desktop systems. long and unsigned long are guaranteed to be at least 4 bytes but may be longer depending on the machine.
  • long long and unsigned long long are guaranteed to be at least 8 bytes but may be longer depending on the machine.

Do not be too concerned with all of these variations at first. For the most part, you can safely use int until you begin developing programs on a wider variety of hardware where portability is a bigger concern.

While int types represent whole numbers, this is a relatively small set of numbers unless we can also represent the numbers between whole numbers—numbers with fractions or decimal numbers.

Representing numbers with decimals

Not everything in the world is a whole number. For that, we have numbers with fractions or decimal numbers. Decimal numbers are used most naturally for measuring things.

A real number is of the following form:

significand × 10^exponent

Here, both the significand and the exponent are signed integers. The size of each depends upon the number of bytes for a given real number type. There are no unsigned components. This provides a very large range of numbers, from positive to negative as well as very small fractional values:

Typically, when real numbers are used, either very precise values are desired or the calculations tend to have incredibly large ranges.

For completeness, decimal numbers are just one part of the set of real numbers. Real numbers include all rational numbers, irrational numbers, transcendental numbers, such as π, and integers. Real numbers exist on a number line. These are contrasted with complex numbers, which have an imaginary component: a multiple of (-1)^(1/2), the square root of -1.

Another use for values is to represent alphabetical characters.

Representing single characters

To specify a single character, use either char or unsigned char. C was developed in the time before Unicode. The character set they decided upon using was ASCII (short for American Standard Code for Information Interchange). All the necessary characters for printing control, device control, and printable characters and punctuation could be represented in 7 bits.

One reason ASCII was chosen was because of its somewhat logical ordering of uppercase and lowercase letters. An uppercase A and lowercase a are different by only 1 bit. This makes it relatively easy to convert from uppercase to lowercase and vice versa. There is an ASCII table provided for your reference in the Appendix; we also develop a program to print a complete ASCII table in Chapter 15, Working with Strings.

To summarize ASCII's organization, refer to the following table:

As Unicode was developed and has become standardized, it used 2-byte or 4-byte encodings for the many character sets of the world's languages. 7-bit ASCII codes were incorporated into the lowest 7 bits for backward compatibility with original ASCII. However, Unicode is not implemented uniformly across all operating systems.

Representing Boolean true/false

A Boolean value is one that evaluates to true or false. On some systems, YES and yes are equivalent to true while NO and no are equivalent to false. For instance, Is today Wednesday? evaluates to true only 1 out of 7 days. The other 6 days, it evaluates to false.

Before C99, there was no explicit type for Boolean. A value of any type that is 0 (exactly zero) is considered as also evaluating to a Boolean false. Any other value than exactly 0 (a bit pattern of only zeros) will evaluate to a Boolean value of true. Real numbers rarely, if ever, evaluate exactly to 0, especially after any kind of operation on them. These data types would therefore almost always evaluate to true and so would be poor choices as a Boolean substitute.

Since C99, a _Bool type has been available, which, when evaluated, will always evaluate to only 0 or 1. When we include the stdbool.h file, we are able to use the bool type as well; this is a bit cleaner than using the cumbersome _Bool type.

As a general rule, it is always more reliable to test for zero-ness, or false, than to rely on the compiler's implementation for interpreting Boolean true values from other types.

Understanding the sizes of data types

As we discussed earlier, the number of bytes that a type uses is directly related to the range of values it can hold. Up to this point, this has all been necessarily theoretical. Let's now write a program to demonstrate what we've been exploring.

The sizeof() operator

sizeof is a built-in operator, not a function, although it is usually written with parentheses like a function call. Given a C data type, it yields the number of bytes that type occupies. Let's write a program to see how this works.

In the first part, we'll set up the necessary include files, declare function prototypes, and create our main() function. Even though we show this program in two parts, it is really just a single file. The following program, sizes_ranges1.c, shows the first part of our program:

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

// function prototypes
void printSizes( void );

int main( void )
{
  printSizes();
}

The header file, stdio.h, is included, as are two new header files, stdint.h and stdbool.h. Recall that stdio.h declares, amongst other things, the function prototype for printf(). stdint.h defines the fixed-width integer types, such as int8_t and uint64_t, along with their limits. stdbool.h defines the bool data type and the values true and false. These are part of the C Standard Library. We will encounter several other C Standard Library files but not all of them. All of them are listed with a brief description of their purpose in the Appendix. We will learn a great deal more about header files in Chapter 24, Working with Multi-File Programs.

As you can see, we call a function that has been declared, or prototyped, but has not yet been defined. Let's define it in the next section of the program:

// function to print the # of bytes for each of C11's data types
//
void printSizes( void )
{
  printf( "Size of C data types\n\n" );
  printf( "Type Bytes\n\n" );
  printf( "char %lu\n" , sizeof( char ) );
  printf( "int8_t %lu\n" , sizeof( int8_t ) );
  printf( "unsigned char %lu\n" , sizeof( unsigned char ) );
  printf( "uint8_t %lu\n" , sizeof( uint8_t ) );
  printf( "short %lu\n" , sizeof( short ) );
  printf( "int16_t %lu\n" , sizeof( int16_t ) );
  printf( "uint16_t %lu\n" , sizeof( uint16_t ) );
  printf( "int %lu\n" , sizeof( int ) );
  printf( "unsigned %lu\n" , sizeof( unsigned ) );
  printf( "long %lu\n" , sizeof( long ) );
  printf( "unsigned long %lu\n" , sizeof( unsigned long ) );
  printf( "int32_t %lu\n" , sizeof( int32_t ) );
  printf( "uint32_t %lu\n" , sizeof( uint32_t ) );
  printf( "long long %lu\n" , sizeof( long long ) );
  printf( "int64_t %lu\n" , sizeof( int64_t ) );
  printf( "unsigned long long %lu\n" , sizeof( unsigned long long ) );
  printf( "uint64_t %lu\n" , sizeof( uint64_t ) );
  printf( "\n" );
  printf( "float %lu\n" , sizeof( float ) );
  printf( "double %lu\n" , sizeof( double ) );
  printf( "long double %lu\n" , sizeof( long double ) );
  printf( "\n" );
  printf( "bool %lu\n" , sizeof( bool ) );
  printf( "\n" );
}

In this program, we need to include the header file, <stdint.h>, which defines the fixed-width integer types. If you omit this include, you'll get a few errors. Try that—comment out that include line and see what happens.

To get the new bool definition, we also have to include <stdbool.h>. What happens if you omit that file?

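The result of sizeof has the type size_t, which on my system is unsigned long. Therefore, we use the format specifier %lu to print a value of that type. On a system where size_t is a different type, use %zu, the format specifier that C99 provides specifically for size_t.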

On my system, I get the following output:

On my 64-bit operating system, a pointer is 8 bytes (64 bits). So too, then, are long and unsigned long.

How do the values reported by your system differ from these?

Ranges of values

Let's extend this program to provide the ranges for each data type. While we could compute these values ourselves, they are defined in two header files: limits.h for integer limits and float.h for real number limits. To implement this, we add another function prototype, add a call to that function from within main(), and then define the function to print out the ranges. In the printRanges() function, each row is labeled with the fixed-width type that the corresponding limits.h value matches on most current systems; for example, the int32_t row prints INT_MIN and INT_MAX.

Let's add another function. In the following code, the additional include directives and function prototype are highlighted:

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>
#include <limits.h>
#include <float.h>

// function prototypes
void printSizes( void );
void printRanges( void );

int main( void ) {
  printSizes();
  printRanges();
}

Now, with the printRanges() prototype in place, let's add its definition. The printSizes() function is unchanged:

void printRanges( void )  {
  printf( "Ranges for integer data types in C\n\n" );
  printf( "int8_t %20d %20d\n" , SCHAR_MIN , SCHAR_MAX );
  printf( "int16_t %20d %20d\n" , SHRT_MIN , SHRT_MAX );
  printf( "int32_t %20d %20d\n" , INT_MIN , INT_MAX );
  printf( "int64_t %20lld %20lld\n" , LLONG_MIN , LLONG_MAX );
  printf( "uint8_t %20d %20d\n" , 0 , UCHAR_MAX );
  printf( "uint16_t %20d %20d\n" , 0 , USHRT_MAX );
  printf( "uint32_t %20d %20u\n" , 0 , UINT_MAX );
  printf( "uint64_t %20d %20llu\n" , 0 , ULLONG_MAX );
  printf( "\n" );
  printf( "Ranges for real number data types in C\n\n" );
  printf( "float %14.7g %14.7g\n" , FLT_MIN , FLT_MAX );
  printf( "double %14.7g %14.7g\n" , DBL_MIN , DBL_MAX );
  printf( "long double %14.7Lg %14.7Lg\n" , LDBL_MIN , LDBL_MAX );
  printf( "\n" );
}

Some of the numbers that appear after % in the format specifier string may appear mysterious. These will be explained in exhaustive detail in Chapter 19, Exploring Formatted Output. The result of the added function should look like this, in addition to what we had before:

How do the values from your system compare?

Summary

Again, whew!

There were a lot of details about data types, chunk sizes, and value ranges. The key idea from this chapter is to remember that there are only really four data types—integer, real number, character, and boolean. The fifth type, pointers, is really just a special case of integers.

In the next chapter, we will explore how to use the different types of values when we create and assign values.
