Book: Red Hat Enterprise Linux Troubleshooting Guide
Author: Benjamin Cane
Updated: 2021-07-09 21:50:12

Troubleshooting commands

This section will cover frequently used troubleshooting commands that can be used to gather information from the system or a running service. While it is not feasible to cover every possible command, the commands used do cover fundamental troubleshooting steps for Linux systems.

Command-line basics

The troubleshooting steps used within this book are primarily command-line based. While it is possible to perform many of these things from a graphical desktop environment, the more advanced items are command-line specific. As such, this book assumes that the reader has at least a basic understanding of Linux. To be more specific, this book assumes that the reader has logged into a server via SSH and is familiar with basic commands such as cd, cp, mv, rm, and ls.

For those who might not have much familiarity, I wanted to quickly cover some basic command-line usage that will be required knowledge for this book.

Command flags

Many readers are probably familiar with the following command:

$ ls -la
total 588
drwx------. 5 vagrant vagrant   4096 Jul  4 21:26 .
drwxr-xr-x. 3 root    root        20 Jul 22  2014 ..
-rw-rw-r--. 1 vagrant vagrant 153104 Jun 10 17:03 app.c

Most readers will recognize this as the ls command, which performs a directory listing. What might be less familiar is what exactly the -la part of the command is or does. To understand this better, let's look at the ls command by itself:

$ ls
app.c  application  app.py  bomber.py  index.html  lookbusy-1.4  lookbusy-1.4.tar.gz  lotsofiles

This execution of the ls command looks very different from the previous one, because this is the default output for ls. The -la portion of the command is what is commonly referred to as command flags or options. Command flags allow a user to change the default behavior of a command by providing it with specific options.

In fact, the -la flags are two separate options, -l and -a; they can even be specified separately:

$ ls -l -a
total 588
drwx------. 5 vagrant vagrant   4096 Jul  4 21:26 .
drwxr-xr-x. 3 root    root        20 Jul 22  2014 ..
-rw-rw-r--. 1 vagrant vagrant 153104 Jun 10 17:03 app.c

We can see from the preceding snippet that the output of ls -la is exactly the same as ls -l -a. For common commands, such as the ls command, it does not matter whether the flags are grouped or separated; they will be parsed in the same way. Throughout this book, examples will show both grouped and ungrouped flags. If grouping or ungrouping is performed for any specific reason, it will be called out; otherwise, the grouping or ungrouping used within this book is chosen for visual appeal and memorization.

In addition to grouping and ungrouping, this book will also show flags in their long format. The previous examples used the flag -a, which is known as a short flag. The same option can also be provided in the long format, --all:

$ ls -l --all
total 588
drwx------. 5 vagrant vagrant   4096 Jul  4 21:26 .
drwxr-xr-x. 3 root    root        20 Jul 22  2014 ..
-rw-rw-r--. 1 vagrant vagrant 153104 Jun 10 17:03 app.c

The -a and --all flags are the same option; it can simply be represented in both short and long form.

One important thing to remember is that not every short flag has a long form, and vice versa. Each command has its own syntax: some commands only support the short form, others only support the long form, but many support both. In most cases, the long and short flags will both be documented within the command's man page.
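The equivalence of grouped and separated flags can be checked directly. The following is a minimal sketch, assuming a POSIX shell with mktemp available; the scratch directory and the file names in it are made up for illustration, and the --all comparison is guarded because the long form is not guaranteed to exist in every ls implementation.

```shell
# Build a scratch directory with one visible and one hidden file
# (hypothetical names, used only for this demonstration).
demo_dir=$(mktemp -d)
mkdir "$demo_dir/work"
touch "$demo_dir/work/app.c" "$demo_dir/work/.hidden"

# Capture the same listing with grouped and with separated short flags.
grouped=$(ls -la "$demo_dir/work")
separate=$(ls -l -a "$demo_dir/work")

if [ "$grouped" = "$separate" ]; then
    flags_match=yes
else
    flags_match=no
fi
echo "grouped and separated flags match: $flags_match"

# Compare the long form only when this ls implementation accepts it.
if ls --all "$demo_dir/work" >/dev/null 2>&1; then
    long_form=$(ls -l --all "$demo_dir/work")
    if [ "$grouped" = "$long_form" ]; then
        echo "long form matches as well"
    fi
fi

rm -rf "$demo_dir"
```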

Piping command output

Another common command-line practice that will be used several times throughout this book is piping output. Specifically, examples such as the following:

$ ls -l --all | grep app
-rw-rw-r--. 1 vagrant vagrant 153104 Jun 10 17:03 app.c
-rwxrwxr-x. 1 vagrant vagrant  29390 May 18 00:47 application
-rw-rw-r--. 1 vagrant vagrant   1198 Jun 10 17:03 app.py

In the preceding example, the output of the ls -l --all command is piped to the grep command. By placing | (the pipe character) between the two commands, the output of the first command is "piped" to the input of the second command. Here, the ls command is executed first; the grep command then searches its output for any instance of the pattern "app".

Piping output to grep will be used quite often throughout this book, as it is a simple way to trim the output to a manageable size. Many times the examples will also contain multiple levels of piping:

$ ls -la | grep app | awk '{print $4,$9}'
vagrant app.c
vagrant application
vagrant app.py

In the preceding code, the output of ls -la is piped to the input of grep; however, this time, the output of grep is also piped to the input of awk.

While many commands can be piped to, not every command supports this. In general, commands that accept input from files or from the command line also accept piped input, which they read from standard input. As with the flags, a command's man page can be used to identify whether the command accepts piped input or not.
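The multi-stage pipeline above can be reproduced on any machine by feeding it a fixed copy of the listing. This is only a sketch: the sample_listing helper is a hypothetical stand-in for a live ls -la, populated with the lines shown in the earlier examples.

```shell
# Hypothetical helper that prints a fixed copy of the listing shown
# earlier, so the pipeline behaves the same everywhere.
sample_listing() {
    cat <<'EOF'
total 588
drwx------. 5 vagrant vagrant   4096 Jul  4 21:26 .
drwxr-xr-x. 3 root    root        20 Jul 22  2014 ..
-rw-rw-r--. 1 vagrant vagrant 153104 Jun 10 17:03 app.c
-rwxrwxr-x. 1 vagrant vagrant  29390 May 18 00:47 application
-rw-rw-r--. 1 vagrant vagrant   1198 Jun 10 17:03 app.py
EOF
}

# Stage 1 produces the listing, stage 2 keeps lines containing "app",
# stage 3 prints the group (field 4) and the file name (field 9).
owners=$(sample_listing | grep app | awk '{print $4, $9}')
printf '%s\n' "$owners"

# grep -c counts the matching lines instead of printing them.
match_count=$(sample_listing | grep -c app)
echo "matching lines: $match_count"
```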

Gathering general information

When managing the same servers for a long time, you start to remember key information about those servers, such as the amount of physical memory, the size and layout of their filesystems, and what processes should be running. However, when you are not familiar with the server in question, it is always a good idea to gather this type of information.

The commands in this section can be used to gather this type of general information.

w – show who is logged on and what they are doing

Early in my systems administration career, I had a mentor who used to tell me, "I always run w when I log into a server." This simple tip has proven useful over and over again in my career. The w command is simple; when executed, it will output information such as system uptime, load average, and who is logged in:

# w
 04:07:37 up 14:26,  2 users,  load average: 0.00, 0.01, 0.05
USER     TTY        LOGIN@   IDLE   JCPU   PCPU WHAT
root     tty1      Wed13   11:24m  0.13s  0.13s -bash
root     pts/0     20:47    1.00s  0.21s  0.19s -bash

This information can be extremely useful when working with unfamiliar systems. The output can be useful even when you are familiar with the system. With this command, you can see:

When this system was last rebooted:
04:07:37 up 14:26
This information can be extremely useful, whether the issue is an alert for a service like Apache being down or a user calling in because they were locked out of the system. When these issues are caused by an unexpected reboot, the reported issue often does not include this information. By running the w command, it is easy to see the time elapsed since the last reboot.

The load average of the system:
load average: 0.00, 0.01, 0.05
The load average is a very important measurement of system health. To summarize it, the load average is the average number of processes that are running or waiting to run over a period of time; on Linux, this also counts processes in uninterruptible sleep, such as those waiting on disk I/O. The three numbers in the output of w cover different periods: from left to right, the last 1 minute, 5 minutes, and 15 minutes.

Who is logged in and what they are running:
USER     TTY        LOGIN@   IDLE   JCPU   PCPU WHAT
root     tty1      Wed13   11:24m  0.13s  0.13s -bash
The final piece of information that the w command provides is the users that are currently logged in and what command they are executing.

This part of the output is similar to that of the who command; it includes the user logged in, when they logged in, how long they have been idle, and what command their shell is running. The last item in that list is extremely important.

When working with big teams, it is common for more than one person to respond to an issue or ticket. By running the w command immediately after login, you will see what other users are doing, which helps you avoid overriding any troubleshooting or corrective steps the other person has taken.
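When the load averages need to be compared or logged, they can be pulled out of the header line with awk. This is a minimal sketch that works on the header line copied from the w output above; the load_averages function name is made up for this example, and on a live system the same line could be taken from w | head -n 1.

```shell
# Header line copied from the w output shown above.
w_header=' 04:07:37 up 14:26,  2 users,  load average: 0.00, 0.01, 0.05'

# Print the 1, 5, and 15 minute load averages from a w header line
# read on standard input.
load_averages() {
    awk -F'load average: ' '{ split($2, avg, ", "); print avg[1], avg[2], avg[3] }'
}

loads=$(printf '%s\n' "$w_header" | load_averages)
echo "load averages (1m 5m 15m): $loads"
```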

rpm – RPM package manager

The rpm command is used to manage RPM Package Manager (RPM) packages. With this command, you can install and remove RPM packages, as well as search for packages that are already installed.

Earlier in this chapter, we saw how the rpm command can be used to look for configuration files. The following are several additional ways we can use the rpm command to find critical information.

Listing all packages installed

Often when troubleshooting services, a critical step is identifying the version of the service and how it was installed. To list all RPM packages installed on a system, simply execute the rpm command with -q (query) and -a (all):

# rpm -q -a
kpatch-0.0-1.el7.noarch
virt-what-1.13-5.el7.x86_64
filesystem-3.2-18.el7.x86_64
gssproxy-0.3.0-9.el7.x86_64
hicolor-icon-theme-0.12-7.el7.noarch

The rpm command is a very versatile command with many flags. In the preceding example, the -q and -a flags are used. The -q flag tells the rpm command that the action being taken is a query; you can think of this as being put into a "search mode". The -a or --all flag tells the rpm command to list all packages.

A useful feature is to add the --last flag to the preceding command, as this causes the rpm command to list the packages by install time with the latest being first.

Listing all files deployed by a package

Another useful rpm function is to show all of the files deployed by a specific package:

# rpm -q --filesbypkg kpatch-0.0-1.el7.noarch
kpatch                    /usr/bin/kpatch
kpatch                    /usr/lib/systemd/system/kpatch.service

In the preceding example, we again use the -q flag to specify that we are running a query, along with the --filesbypkg flag. The --filesbypkg flag will cause the rpm command to list all of the files deployed by the specified package.

This example can be very useful when trying to identify a service's configuration file location.

Using package verification

In this third example, we are going to use an extremely useful feature of rpm: verify. The rpm command has the ability to verify whether or not the files deployed by a specified package have been altered from their original contents. To do this, we will use the -V (verify) flag:

# rpm -V httpd
S.5....T.  c /etc/httpd/conf/httpd.conf

In the preceding example, we simply run the rpm command with the -V flag followed by a package name. Just as the -q flag is used for querying, the -V flag is used for verifying. With this command, we can see that only the /etc/httpd/conf/httpd.conf file was listed; this is because rpm will only output files that have been altered.

In the first column of this output, we can see which verification checks the file failed. While this column is a bit cryptic at first, the rpm man page has a useful table (as shown in the following list) explaining what each character means:

S: This means that the file size differs
M: This means that the mode differs (includes permissions and file type)
5: This means that the digest (formerly MD5 sum) differs
D: This means that the device major/minor number differs
L: This means that the readLink(2) path differs
U: This means that the user ownership differs
G: This means that the group ownership differs
T: This means that mTime differs
P: This means that caPabilities differs

Using this list, we can see that httpd.conf's file size, digest (MD5 sum), and mtime (modify time) are not what was deployed by the httpd package. This means that it is highly likely that the httpd.conf file has been modified after installation.

While the rpm command might not seem like a troubleshooting command at first, the preceding examples show just how powerful a troubleshooting tool it can be. With these examples, it is simple to identify important files and whether or not those files have been modified from the deployed version.
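Reading the attribute column by eye gets tedious when many files are reported. The following sketch translates an attribute string into words using the letter meanings from the list above; the describe_verify_flags function and its short labels are invented for this example and are not part of rpm.

```shell
# Translate an rpm -V attribute string such as "S.5....T." into words.
# Each letter:label pair mirrors one entry of the list above.
describe_verify_flags() {
    result=""
    for pair in S:size M:mode 5:digest D:device L:symlink \
                U:user G:group T:mtime P:capabilities; do
        letter=${pair%%:*}    # text before the first colon
        meaning=${pair#*:}    # text after the first colon
        case $1 in
            *"$letter"*) result="$result $meaning" ;;
        esac
    done
    # Strip the leading space left by the first append.
    echo "${result# }"
}

changed=$(describe_verify_flags 'S.5....T.')
echo "httpd.conf differs in: $changed"
```

On a system with rpm installed, the first column of each rpm -V output line could be passed to the function in the same way.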

df – report file system space usage

The df command is a very useful command when troubleshooting file system issues. The df command is used to output space utilization for mounted file systems:

# df -h
Filesystem             Size  Used Avail Use% Mounted on
/dev/mapper/rhel-root  6.7G  1.6G  5.2G  24% /
devtmpfs               489M     0  489M   0% /dev
tmpfs                  498M     0  498M   0% /dev/shm
tmpfs                  498M   13M  485M   3% /run
tmpfs                  498M     0  498M   0% /sys/fs/cgroup
/dev/sdb1              212G   58G  144G  29% /repos
/dev/sda1              497M  117M  380M  24% /boot

In the preceding example, the df command included the -h flag. This flag causes the df command to print any size values in a "human readable" format. By default, df will simply print these values in kilobytes. From the example, we can quickly see the current usage of all mounted filesystems. Specifically, if we look at the output, we can see that the / filesystem is currently 24 percent used:

Filesystem             Size  Used Avail Use% Mounted on
/dev/mapper/rhel-root  6.7G  1.6G  5.2G  24% /

This is a very quick and easy way to identify whether any file system is full. In addition, the df command is also very useful in showing details of what file systems are mounted and where they are mounted to. From the line containing the / filesystem, we can see that the underlying device is /dev/mapper/rhel-root.

From this one command, we were able to identify two critical pieces of information: how full each file system is, and which device backs each mount point.
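Scanning the Use% column can be automated with a short awk filter. The sketch below runs against a fixed copy of the df -h output shown above; the sample_df and over_threshold names and the 25 percent limit are assumptions chosen for illustration. On a live system, df -P is a reasonable input because it keeps each file system on a single line.

```shell
# Fixed copy of the df -h output shown above.
sample_df() {
    cat <<'EOF'
Filesystem             Size  Used Avail Use% Mounted on
/dev/mapper/rhel-root  6.7G  1.6G  5.2G  24% /
devtmpfs               489M     0  489M   0% /dev
tmpfs                  498M     0  498M   0% /dev/shm
tmpfs                  498M   13M  485M   3% /run
tmpfs                  498M     0  498M   0% /sys/fs/cgroup
/dev/sdb1              212G   58G  144G  29% /repos
/dev/sda1              497M  117M  380M  24% /boot
EOF
}

# Print "mount point usage" for every file system at or above the
# percentage given as the first argument.
over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= limit + 0) print $6, use "%"
    }'
}

full=$(sample_df | over_threshold 25)
echo "at or above 25%: $full"
```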

Showing available inodes

The default behavior for df is to show the amount of used file system space. However, it can also be used to show the quantity of inodes available, used, and free for each file system. To output the inode utilization, simply add the -i (inode) flag when executing the df command:

# df -i
Filesystem              Inodes IUsed    IFree IUse% Mounted on
/dev/mapper/rhel-root  7032832 44318  6988514    1% /
devtmpfs                125039   347   124692    1% /dev

It is still possible to use the -h flag with df to print the output in a human readable format. However, with the -i flag, this abbreviates the output to M for millions, K for thousands, and so on. This output can be easily confused with megabytes or kilobytes, so in general, I do not use the human readable inode output when sharing the output with other users or administrators.

free – display memory utilization

When executed, the free command will output statistics about the memory available and in use on the system:

$ free
             total       used       free     shared    buffers     cached
Mem:       1018256     789796     228460      13116       3608     543484
-/+ buffers/cache:     242704     775552
Swap:       839676          4     839672

From the previous example, we can see that the output of the free command provides the total available memory, amount of memory currently used, and amount of memory free. The free command is a simple and quick way to identify the current state of memory on a system.

However, the output of free can be a bit confusing at first.

What is free, is not always free

Linux utilizes memory differently as compared to other operating systems. In the preceding output, you will see that it has 543,484 KB listed as cached. This memory, while technically used, is actually part of the available memory. The system can reallocate this cached memory as required.

A quick and easy way of seeing what is actually used or free is to look at the -/+ buffers/cache line, the second line of data in the output. In the preceding output, it shows that 775,552 KB of memory is available on the system.
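The figures on the -/+ buffers/cache line can be reproduced from the Mem: line: free plus buffers plus cached gives the memory that is really available, and used minus buffers minus cached gives the memory that is really in use. The sketch below checks that arithmetic against a fixed copy of the output above. It assumes the older free layout shown in this section; other versions of free may print different columns.

```shell
# Fixed copy of the free output shown above.
sample_free() {
    cat <<'EOF'
             total       used       free     shared    buffers     cached
Mem:       1018256     789796     228460      13116       3608     543484
-/+ buffers/cache:     242704     775552
Swap:       839676          4     839672
EOF
}

# On the Mem: line: $3 = used, $4 = free, $6 = buffers, $7 = cached.
really_free=$(sample_free | awk '$1 == "Mem:" { print $4 + $6 + $7 }')
really_used=$(sample_free | awk '$1 == "Mem:" { print $3 - $6 - $7 }')

echo "available to applications: $really_free KB"
echo "used by applications:      $really_used KB"
```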

The /proc/meminfo file

In previous RHEL releases, the second line of the free command was the easiest method for identifying how much memory is available. However, with RHEL 7, there have been some improvements to the /proc/meminfo file. One of those improvements is the addition of the MemAvailable statistic:

$ grep Available /proc/meminfo
MemAvailable:     641056 kB

The /proc/meminfo file is one of the many useful files located in the /proc file system. This file is maintained by the kernel and contains the system's current memory statistics. This file can be very useful when troubleshooting memory issues as it contains much more information than the output of the free command.
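For scripts, the MemAvailable value is easy to extract and convert. This is a minimal sketch: the mem_available_mib function name is hypothetical, and the small stand-in file combines the MemAvailable value shown above with the total from the earlier free output so that the example does not depend on the machine it runs on.

```shell
# Stand-in for /proc/meminfo, limited to two lines for illustration.
meminfo_sample=$(mktemp)
cat > "$meminfo_sample" <<'EOF'
MemTotal:        1018256 kB
MemAvailable:     641056 kB
EOF

# Print MemAvailable in whole MiB from a meminfo-style file.
# Without an argument, the real /proc/meminfo is read.
mem_available_mib() {
    awk '$1 == "MemAvailable:" { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

available=$(mem_available_mib "$meminfo_sample")
echo "MemAvailable: ${available} MiB"

rm -f "$meminfo_sample"
```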

ps – report a snapshot of current running processes

The ps command is a fundamental command for any troubleshooting activity. This command, when executed, will output a list of running processes:

# ps
  PID TTY          TIME CMD
15618 pts/0    00:00:00 ps
17633 pts/0    00:00:00 bash

The ps command has many flags and options to show different information about running processes. The following are a few example ps commands that are useful during troubleshooting.

Printing every process in long format

The following ps command uses the -e (every process), -l (long format), and -f (full format) flags. These flags cause the ps command not only to print every process but also to print them in a format that provides quite a bit of useful information:

# ps -elf
F S UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY   TIME CMD
1 S root   2     0   0  80  0 - 0 kthrea Dec24 ?   00:00:00 [kthreadd]

In the preceding output of ps -elf, we can see many useful pieces of information for the kthreadd process, such as the parent process ID (PPID), the priority (PRI), the niceness value (NI), and the size of the process's core image in pages (SZ).

I have found that the preceding example is a very general-purpose ps command and can be used in most situations.

Printing a specific user's processes

The output of the preceding example can get quite large, making it difficult to identify specific processes. This example uses the -U flag to specify a user, which causes the ps command to print all processes running as the specified user (postfix in the following case):

# ps -U postfix -l
F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
4 S    89  1546  1536  0  80   0 - 23516 ep_pol ?        00:00:00 qmgr
4 S    89 16711  1536  0  80   0 - 23686 ep_pol ?        00:00:00 pickup

It is important to note that the -U flag can also be combined with other flags to provide even more information on the running processes. In the preceding example, the -l flag is once again used to print the output in the long format.

Printing a process by process ID

If the process ID or PID is already known, it is possible to narrow down the process listing even further by specifying the process with the -p (process ID) flag:

# ps -p 1236 -l
F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY       TIME CMD
4 S     0  1236     1  0  80   0 - 20739 poll_s ?    00:00:00 sshd

This can be especially useful when combined with the -L (show threads with LWP column) or -m (show threads after processes) flags, which are used to print process threads. When troubleshooting multithreaded applications, the -L and -m flags can be critical.

Printing processes with performance information

The ps command allows the user to customize the columns printed with the -o (user defined format) flag:

# ps -U postfix -o pid,user,pcpu,vsz,cmd
  PID USER     %CPU    VSZ CMD
 1546 postfix   0.0  94064 qmgr -l -t unix -u
16711 postfix   0.0  94744 pickup -l -t unix -u

The -o option allows for a wide number of custom columns. In the preceding example, I selected options that are similar to those printed in the top command.
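Once the columns are chosen, ordinary text tools can rank the processes. The sketch below sorts a fixed copy of the output above by the VSZ column; the sample_ps helper is a hypothetical stand-in for the live command. On systems that use the procps version of ps, a similar ordering should be available through its --sort option, but that is not exercised here.

```shell
# Fixed copy of the ps -U postfix -o pid,user,pcpu,vsz,cmd output.
sample_ps() {
    cat <<'EOF'
  PID USER     %CPU    VSZ CMD
 1546 postfix   0.0  94064 qmgr -l -t unix -u
16711 postfix   0.0  94744 pickup -l -t unix -u
EOF
}

# Drop the header, sort numerically on column 4 (VSZ) in descending
# order, keep the first row, and print its PID and command name.
largest=$(sample_ps | tail -n +2 | sort -k4,4nr | head -n 1 | awk '{print $1, $5}')
echo "largest virtual size: $largest"
```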

The top command is one of the most popular Linux troubleshooting commands. It is used to show the top processes ordered by CPU usage (by default). In this chapter, I have opted to omit the top command, as I feel that the ps command is even more fundamental and flexible than the top command. As one becomes more familiar with the ps command, the top command will be easy to learn and understand.

Networking

Networking is an essential skill for any systems administrator. Without a properly configured network interface, a server serves little purpose. The commands in this section are specifically for looking up network configuration and current status. These commands are essential to learn, as they will not only be useful for troubleshooting but also for day-to-day setup and configuration.

ip – show and manipulate network settings

The ip command is used to manage network settings such as interface configuration, routing and essentially anything network related. While these are not traditionally considered troubleshooting tasks, the ip command can also be used to display a system's network configuration. Without being able to look up networking details such as routing or device configuration, it would be very difficult to troubleshoot network-related issues.

The following examples show various ways to use the ip command to identify critical network configuration settings.

Show IP address configuration for a specific device

One of the core uses of the ip command is to look up a network interface and display its configuration. To do this, we will use the following command:

# ip addr show dev enp0s3
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:6e:35:18 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic enp0s3
       valid_lft 45083sec preferred_lft 45083sec
    inet6 fe80::a00:27ff:fe6e:3518/64 scope link
       valid_lft forever preferred_lft forever

In the preceding ip command, the first option, addr (address), defines the type of information we are looking for. The second option, show, tells ip to display the configuration selected by the first option. The third option, dev (device), is followed by the network interface device in question, enp0s3. If the third option is omitted, the ip command will show the address configuration for all network devices.

The device name enp0s3 might look a bit strange to those who have experience with previous RHEL releases. This device follows a newer network device naming scheme introduced with systemd. As of RHEL 7, network devices use names such as this one, which are based on firmware, topology, and location information (here, an Ethernet device on PCI bus 0, slot 3).

To find out more about RHEL 7's new naming scheme simply reference the following URL:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Networking_Guide/ch-Consistent_Network_Device_Naming.html

Show routing configuration

The ip command can also be used to show routing configurations. This information is essential for troubleshooting connectivity issues between servers:

# ip route show
default via 10.0.2.2 dev enp0s3  proto static  metric 1024
10.0.2.0/24 dev enp0s3  proto kernel  scope link  src 10.0.2.15
192.168.56.0/24 dev enp0s8  proto kernel  scope link  src 192.168.56.101

The preceding ip command uses the route option followed by the show option to display all defined routes for this server. Like the previous example, it is possible to limit this output to a specific device by adding the dev (device) option followed by the device name:

# ip route show dev enp0s3
default via 10.0.2.2  proto static  metric 1024
10.0.2.0/24  proto kernel  scope link  src 10.0.2.15
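When checking connectivity, the default gateway is often the first value needed from the routing table. The following sketch extracts it from a fixed copy of the ip route show output above; the sample_routes helper is a made-up stand-in for the live command.

```shell
# Fixed copy of the ip route show output shown above.
sample_routes() {
    cat <<'EOF'
default via 10.0.2.2 dev enp0s3  proto static  metric 1024
10.0.2.0/24 dev enp0s3  proto kernel  scope link  src 10.0.2.15
192.168.56.0/24 dev enp0s8  proto kernel  scope link  src 192.168.56.101
EOF
}

# On the default route line, field 3 is the gateway address and
# field 5 is the outgoing device.
default_route=$(sample_routes | awk '$1 == "default" { print $3, $5 }')
echo "default gateway and device: $default_route"
```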

Show network statistics for a specified device

Where the previous examples showed ways to look up the current networking configuration, this next command uses the -s (statistics) flag to show network statistics for the specified device:

# ip -s link show dev enp0s3
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 08:00:27:6e:35:18 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    109717927  125911   0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    3944294    40127    0       0       0       0

In the preceding example, the link (network device) option selects link-level information, the -s flag adds the statistics counters, and the dev option limits the output to the specified device.

The statistics information shown can be useful when troubleshooting packets that are being dropped or to identify which interface has higher network utilization.
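Because the counter values sit on the line after each RX: or TX: header, a small awk script can pair them up. This sketch works on a fixed copy of the output above; the sample_link_stats and link_errors names are invented for this example.

```shell
# Fixed copy of the ip -s link show dev enp0s3 output shown above.
sample_link_stats() {
    cat <<'EOF'
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 08:00:27:6e:35:18 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    109717927  125911   0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    3944294    40127    0       0       0       0
EOF
}

# For each RX:/TX: header, read the following line and report its
# third (errors) and fourth (dropped) values.
link_errors() {
    awk '$1 == "RX:" || $1 == "TX:" {
        dir = $1
        sub(/:/, "", dir)
        getline
        print dir, "errors=" $3, "dropped=" $4
    }'
}

error_summary=$(sample_link_stats | link_errors)
printf '%s\n' "$error_summary"
```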

netstat – network statistics

The netstat command is an essential tool in any system administrator's tool belt. This can be seen in the fact that the netstat command is available even on operating systems that do not traditionally use the command line for administration.

Printing network connections

One of the primary uses of netstat is to print the existing established network connections. This can be done by simply executing netstat; however, if the -a (all) flag is used, the output will also include listening ports:

# netstat -na
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address      Foreign Address    State
tcp        0      0 127.0.0.1:25       0.0.0.0:*          LISTEN
tcp        0      0 0.0.0.0:44969      0.0.0.0:*          LISTEN
tcp        0      0 0.0.0.0:111        0.0.0.0:*          LISTEN
tcp        0      0 0.0.0.0:22         0.0.0.0:*          LISTEN
tcp        0      0 192.168.56.101:22  192.168.56.1:50122 ESTABLISHED
tcp6       0      0 ::1:25             :::*               LISTEN

While the -a (all) flag in the preceding netstat command causes it to print listening ports as well, the -n flag forces the output into a numeric format, such as printing IP addresses rather than DNS host names.

The preceding example will be used heavily during Chapter 5, Network Troubleshooting, where we will be troubleshooting network connectivity.

Printing all ports listening for TCP connections

I have seen many instances where a service is running and is visible via the ps command; however, the port for clients to connect to was not bound and listening. The following netstat command can be very useful when troubleshooting connectivity issues with a service:

# netstat -nlp --tcp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      1536/master
tcp        0      0 0.0.0.0:44969           0.0.0.0:*               LISTEN      1270/rpc.statd
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      1215/rpcbind
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1236/sshd
tcp6       0      0 ::1:25                  :::*                    LISTEN      1536/master
tcp6       0      0 :::111                  :::*                    LISTEN      1215/rpcbind
tcp6       0      0 :::22                   :::*                    LISTEN      1236/sshd
tcp6       0      0 :::46072                :::*                    LISTEN      1270/rpc.statd

The preceding command is very useful as it combines three useful options:

-l (listening), which tells netstat to only list listening sockets
--tcp, which tells netstat to limit the output to TCP connections
-p (program), which tells netstat to list the PID and name of the process listening on that port
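To answer the question "is anything listening on this port, and which process is it?", the output of this command can be filtered by port number. The following is a sketch against a fixed copy of the output above; the sample_netstat and listeners_on_port names are assumptions made for illustration.

```shell
# Fixed copy of the netstat -nlp --tcp output shown above.
sample_netstat() {
    cat <<'EOF'
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      1536/master
tcp        0      0 0.0.0.0:44969           0.0.0.0:*               LISTEN      1270/rpc.statd
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      1215/rpcbind
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1236/sshd
tcp6       0      0 ::1:25                  :::*                    LISTEN      1536/master
tcp6       0      0 :::111                  :::*                    LISTEN      1215/rpcbind
tcp6       0      0 :::22                   :::*                    LISTEN      1236/sshd
tcp6       0      0 :::46072                :::*                    LISTEN      1270/rpc.statd
EOF
}

# Print the PID/program of every socket listening on the given port.
# The port is whatever follows the last colon of the local address,
# which also handles IPv6 entries such as :::22.
listeners_on_port() {
    awk -v port="$1" '$6 == "LISTEN" {
        n = split($4, parts, ":")
        if (parts[n] == port) print $7
    }' | sort -u
}

ssh_listener=$(sample_netstat | listeners_on_port 22)
echo "listening on port 22: $ssh_listener"
```

An empty result for a port that a service is expected to serve matches the situation described earlier, where the process is running but its port is not bound.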

Delay

An often overlooked option with netstat is to utilize the delay feature. By adding a number at the end of the command, netstat will continuously run and will sleep for the specified number of seconds between executions.

If the following command is executed, the netstat command will print all listening TCP sockets every five seconds:

# netstat -nlp --tcp 5

The delay feature can be very useful when investigating network connectivity issues, as it can easily show when an application binds a port for new connections.

Performance

While we touched a bit on troubleshooting performance with commands such as free and ps, this section will show some very useful commands that answer the age-old question of "Why is it slow?"

iotop – a simple top-like I/O monitor

The iotop command is relatively new to Linux. In previous RHEL releases, it was available but not installed by default. The iotop command provides a top-like interface, but rather than showing which processes are utilizing the most CPU time or memory, it shows processes ordered by I/O utilization:

# iotop
Total DISK READ :       0.00 B/s | Total DISK WRITE :       0.00 B/s
Actual DISK READ:       0.00 B/s | Actual DISK WRITE:       0.00 B/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN      IO COMMAND
 1536 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % master -w
    1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % systemd --switched-root --system --deserialize 23

Unlike some of the previous commands, iotop is highly specialized: it shows processes that are utilizing I/O. There are, however, some very useful flags that change iotop's default behavior, such as -o (only), which tells iotop to print only processes using I/O rather than all processes. Another useful pair of flags is -q (quiet) and -n (number of iterations).

Together with the -o flag, these flags can be used to tell iotop to print only the processes using I/O without clearing the screen for the next iteration:

# iotop -o -q -n2
Total DISK READ :     0.00 B/s | Total DISK WRITE :       0.00 B/s
Actual DISK READ:     0.00 B/s | Actual DISK WRITE:       0.00 B/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN   IO   COMMAND
Total DISK READ :     0.00 B/s | Total DISK WRITE :       0.00 B/s
Actual DISK READ:     0.00 B/s | Actual DISK WRITE:       0.00 B/s
22965 be/4 root       0.00 B/s    0.00 B/s  0.00 %  0.03 % [kworker/0:3]

If we look at the preceding example output, we can see two independent iterations of the iotop command. However, unlike previous examples, the output is continuous allowing us to see which processes were using I/O at each iteration.

By default, the delay between iotop iterations is 1 second; however, this can be modified with the -d (delay) flag.

iostat – report I/O and CPU statistics

Where iotop shows what processes are utilizing I/O, iostat shows what devices are being utilized:

# iostat -t 1 2
Linux 3.10.0-123.el7.x86_64 (localhost.localdomain)   12/25/2014 _x86_64_  (1 CPU)

12/25/2014 03:20:10 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.11    0.00    0.17    0.01    0.00   99.72

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               0.38         2.84         7.02     261526     646339
sdb               0.01         0.06         0.00       5449         12
dm-0              0.33         2.77         7.00     254948     644275
dm-1              0.00         0.01         0.00        936          4

12/25/2014 03:20:11 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.99    0.00    0.00   99.01

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               0.00         0.00         0.00          0          0
sdb               0.00         0.00         0.00          0          0
dm-0              0.00         0.00         0.00          0          0
dm-1              0.00         0.00         0.00          0          0

The preceding iostat command uses the -t (timestamp) flag to print a timestamp with each report. The two numbers are the interval and count values. In the preceding example, iostat is run with a one-second interval for a total of two reports.

The iostat command can be very useful for diagnosing issues related to I/O. However, the output can often be misleading. When executed, the values provided in the first report are averages since the last reboot of the system. The subsequent reports are since the previous report. In this example, we executed two reports, one second apart. You can see that the numbers in the first report are much higher than the second report.

For this reason, many systems administrators simply ignore the first report but they do not fully understand why. Therefore, it is not uncommon for someone unfamiliar with iostat to react to the values in the first report.

The iostat command does have a flag -y (omit first report), which will actually cause iostat to omit the first report. This is a good flag to teach users who may not be very familiar with using iostat.
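Where output has already been captured without -y, the first report can be filtered out afterwards. The sketch below does this for a fixed copy of the iostat -t 1 2 output above by counting avg-cpu: headers; the sample_iostat and skip_first_report names are made up for this example.

```shell
# Fixed copy of the iostat -t 1 2 output shown above.
sample_iostat() {
    cat <<'EOF'
Linux 3.10.0-123.el7.x86_64 (localhost.localdomain)   12/25/2014 _x86_64_  (1 CPU)

12/25/2014 03:20:10 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.11    0.00    0.17    0.01    0.00   99.72

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               0.38         2.84         7.02     261526     646339
sdb               0.01         0.06         0.00       5449         12
dm-0              0.33         2.77         7.00     254948     644275
dm-1              0.00         0.01         0.00        936          4

12/25/2014 03:20:11 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.99    0.00    0.00   99.01

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               0.00         0.00         0.00          0          0
sdb               0.00         0.00         0.00          0          0
dm-0              0.00         0.00         0.00          0          0
dm-1              0.00         0.00         0.00          0          0
EOF
}

# Print "device tps" for every report except the first one, which
# holds averages since boot. Each avg-cpu: header starts a new report,
# and a blank line ends the device table.
skip_first_report() {
    awk '
        /^avg-cpu:/ { reports++ }
        /^Device:/  { in_devices = 1; next }
        NF == 0     { in_devices = 0 }
        in_devices && reports >= 2 { print $1, $2 }
    '
}

interval_tps=$(sample_iostat | skip_first_report)
printf '%s\n' "$interval_tps"
```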

Manipulating the output

The iostat command also has quite a few useful flags that allow you to manipulate how it presents data, such as -p (device), which limits statistics to a specified device, and -x (extended stats), which prints extended statistics:

# iostat -p sda -tx
Linux 3.10.0-123.el7.x86_64 (localhost.localdomain)   12/25/2014 _x86_64_  (1 CPU)

12/25/2014 03:38:00 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.11    0.00    0.17    0.01    0.00   99.72

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.01     0.02    0.13    0.25     2.81     6.95 51.70     0.00    7.62    1.57   10.79   0.85   0.03
sda1              0.00     0.00    0.02    0.02     0.05     0.02 3.24     0.00    0.24    0.42    0.06   0.23   0.00
sda2              0.01     0.02    0.11    0.19     2.75     6.93 65.47     0.00    9.34    1.82   13.58   0.82   0.02

The preceding example uses the -p flag to specify the sda device, the -t flag to print timestamps, and the -x flag to print extended statistics. These flags can be very useful when measuring I/O performance for specific devices.
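When only one value from the extended report is of interest, it can be pulled out by column name rather than by position. The following is a rough sketch, not a definitive tool: the function name iostat_field is hypothetical, and it assumes the header row starts with Device: as in the output above (it also accepts Device without the colon, on the assumption that some sysstat versions print it that way).

```shell
# Hypothetical helper: print one named column for one device from
# "iostat -x" output read on standard input.
# Usage: iostat -x | iostat_field sda %util
iostat_field() {
    awk -v dev="$1" -v col="$2" '
        # Header row: remember the position of the requested column.
        $1 == "Device:" || $1 == "Device" {
            idx = 0
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            next
        }
        # Data row for the requested device: print the remembered column.
        idx && $1 == dev { print $idx }
    '
}
```

Looking the column up by name keeps the helper working if the order of the columns differs from the output shown here.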

vmstat – report virtual memory statistics

Where iostat is used to report statistics about disk I/O performance, vmstat is used to report statistics about memory usage and performance:

# vmstat 1 3
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
2  0      4 225000   3608 544900    0    0     3     7   17   28 0  0 100  0  0
0  0      4 224992   3608 544900    0    0     0     0   19   19 0  0 100  0  0
0  0      4 224992   3608 544900    0    0     0     0    6    9 0  0 100  0  0

The vmstat syntax is very similar to iostat where you provide an interval and count of reports as command line arguments. Also, like iostat, the first report is actually an average since the last reboot and subsequent reports are since the previous report. Unfortunately, unlike the iostat command, the vmstat command does not include a flag to omit the first report. As such, in most cases, it is appropriate to simply ignore the first report.
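Since vmstat has no such flag, one way to avoid reading the since-boot values by mistake is to filter them out. The following is a minimal sketch that assumes the default layout shown above (two header lines followed by one line per report); the function name vmstat_skip_first is hypothetical.

```shell
# Hypothetical filter: drop the first vmstat report (averages since boot).
# Assumes two header lines, so the first report is the third line of output.
vmstat_skip_first() {
    awk 'NR != 3'
}
```

It would be used as vmstat 1 3 | vmstat_skip_first, which keeps the headers and prints only the interval-based reports.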

Although vmstat has no flag to omit the first report, it does have other useful flags, such as -m (slabs), which causes vmstat to output the system's slabinfo at a defined interval, and -s (stats), which prints an extended report of the system's memory statistics:

# vmstat --stats
      1018256 K total memory
       793416 K used memory
       290372 K active memory
       360660 K inactive memory
       224840 K free memory
         3608 K buffer memory
       544908 K swap cache
       839676 K total swap
            4 K used swap
       839672 K free swap
        10191 non-nice user cpu ticks
           67 nice user cpu ticks
        11353 system cpu ticks
      9389547 idle cpu ticks
          556 IO-wait cpu ticks
           33 IRQ cpu ticks
         4434 softirq cpu ticks
            0 stolen cpu ticks
       267011 pages paged in
       647220 pages paged out
            0 pages swapped in
            1 pages swapped out
      1619609 interrupts
      2662083 CPU context switches
   1419453695 boot time
        59061 forks

The preceding code is an example of the -s or --stats flag being used.
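Because this report prints one value per line, it is easy to post-process. As an illustrative sketch, the hypothetical function mem_used_pct below computes used memory as a percentage of total memory from the two lines labeled total memory and used memory. In the output above, used memory equals total memory minus free memory, so it includes buffers and cache; treat the result as a rough indicator only.

```shell
# Hypothetical helper: percentage of total memory reported as used,
# computed from "vmstat -s" output read on standard input.
mem_used_pct() {
    awk '
        /total memory/ { total = $1 }   # first field is the value in K
        /used memory/  { used = $1 }
        END { if (total > 0) printf "%.1f\n", used * 100 / total }
    '
}
```

It would be used as vmstat -s | mem_used_pct.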

sar – collect, report, or save system activity information

One very useful utility is the sar command, which comes with the sysstat package. The sysstat package includes various utilities that collect system metrics such as disk, CPU, memory, and network utilization. By default, this collection runs every 10 minutes as a cron job defined in /etc/cron.d/sysstat.

While the data collected by sysstat can be very useful, this package is sometimes removed in high performance environments because collecting system utilization statistics adds to the system's utilization and can cause performance degradation. To see if the sysstat package is installed, simply use the rpm command with the -q (query) flag:

# rpm -q sysstat
sysstat-10.1.5-4.el7.x86_64
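In a script, the same query can serve as a guard before relying on sar. The following is a minimal sketch; the function name have_sysstat is hypothetical, and it relies only on rpm -q returning a non-zero exit status when the package is not installed.

```shell
# Hypothetical check: succeed only if the sysstat package is installed.
have_sysstat() {
    # rpm -q exits non-zero when the package is not installed;
    # its output is discarded because only the exit status matters here.
    rpm -q sysstat >/dev/null 2>&1
}
```

A troubleshooting script could then run have_sysstat before calling sar, and print a message instead when the package has been removed.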

Using the sar command

The sar command allows users to review the information collected by the sysstat utilities. When executed with no flags, the sar command will print the current day's CPU statistics:

# sar | head -6
Linux 3.10.0-123.el7.x86_64 (localhost.localdomain)   12/25/2014   _x86_64_  (1 CPU)

12:00:01 AM     CPU     %user     %nice   %system   %iowait %steal     %idle
12:10:02 AM     all      0.05      0.00      0.20      0.01 0.00     99.74
12:20:01 AM     all      0.05      0.00      0.18      0.00 0.00     99.77
12:30:01 AM     all      0.06      0.00      0.25      0.00 0.00     99.69

Every day at midnight, the sysstat collector creates a new file to store the collected statistics. To reference the statistics within that file, simply use the -f (file) flag to run sar against the specified file:

# sar -f /var/log/sa/sa13
Linux 3.10.0-123.el7.x86_64 (localhost.localdomain)   12/13/2014   _x86_64_  (1 CPU)

10:24:43 AM       LINUX RESTART

10:30:01 AM     CPU     %user     %nice   %system   %iowait %steal     %idle
10:40:01 AM     all      2.99      0.00      0.96      0.43 0.00     95.62
10:50:01 AM     all      9.70      0.00      2.17      0.00 0.00     88.13
11:00:01 AM     all      0.31      0.00      0.30      0.02 0.00     99.37
11:10:01 AM     all      1.20      0.00      0.41      0.01 0.00     98.38
11:20:01 AM     all      0.01      0.00      0.04      0.01 0.00     99.94
11:30:01 AM     all      0.92      0.07      0.42      0.01 0.00     98.59
11:40:01 AM     all      0.17      0.00      0.08      0.00 0.00     99.74
11:50:02 AM     all      0.01      0.00      0.03      0.00 0.00     99.96

In the preceding code, the file specified was /var/log/sa/sa13; this file contains statistics for the 13th day of the current month.
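Since the file name is derived from the day of the month, the path can be built rather than typed. The following sketch assumes the /var/log/sa/saDD layout shown above; the function name sa_file is hypothetical.

```shell
# Hypothetical helper: build the sysstat data file path for a day of the month.
# Assumes the /var/log/sa/saDD naming shown in the example above.
sa_file() {
    day="${1#0}"   # strip a leading zero so printf does not read "08" as octal
    printf '/var/log/sa/sa%02d\n' "$day"
}
```

For example, sar -f "$(sa_file 13)" is equivalent to the command shown above.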

The sar command has many useful flags, far too many to list in this chapter. A few extremely useful flags are listed as follows:

-b: This prints I/O statistics similar to the iostat command
-n ALL: This prints network statistics for all network devices
-r: This prints memory utilization statistics
-A: This prints all statistics gathered. It is essentially equivalent to running sar -bBdHqrRSuvwWy -I SUM -I XALL -m ALL -n ALL -u ALL -P ALL

While the sar command shows many statistics, many of them overlap with commands we have already covered, such as iostat and vmstat. The biggest benefit of the sar command is the ability to review statistics from the past. This ability is critical when troubleshooting a performance issue that occurred for a short period of time or was already mitigated.
