- Red Hat Enterprise Linux Troubleshooting Guide
- Benjamin Cane
- 2021-07-09 21:50:12
Troubleshooting commands
This section will cover frequently used troubleshooting commands that can be used to gather information from the system or a running service. While it is not feasible to cover every possible command, the commands used do cover fundamental troubleshooting steps for Linux systems.
Command-line basics
The troubleshooting steps used within this book are primarily command-line based. While it is possible to perform many of these things from a graphical desktop environment, the more advanced items are command-line specific. As such, this book assumes that the reader has at least a basic understanding of Linux. To be more specific, this book assumes that the reader has logged into a server via SSH and is familiar with basic commands such as cd, cp, mv, rm, and ls.
For those who might not have much familiarity, I wanted to quickly cover some basic command-line usage that will be required knowledge for this book.
Many readers are probably familiar with the following command:
$ ls -la
total 588
drwx------. 5 vagrant vagrant   4096 Jul  4 21:26 .
drwxr-xr-x. 3 root    root        20 Jul 22  2014 ..
-rw-rw-r--. 1 vagrant vagrant 153104 Jun 10 17:03 app.c
Most should recognize that this is the ls command and that it is used to perform a directory listing. What might not be familiar is what exactly the -la part of the command is or does. To understand this better, let's look at the ls command by itself:
$ ls
app.c application app.py bomber.py index.html lookbusy-1.4 lookbusy-1.4.tar.gz lotsofiles
This execution of the ls command looks very different from the first. The reason is that the latter is the default output for ls. The -la portion of the command is what is commonly referred to as command flags or options. Command flags allow a user to change the default behavior of a command by providing it with specific options.
In fact, the -la flags are two separate options, -l and -a; they can even be specified separately:
$ ls -l -a
total 588
drwx------. 5 vagrant vagrant   4096 Jul  4 21:26 .
drwxr-xr-x. 3 root    root        20 Jul 22  2014 ..
-rw-rw-r--. 1 vagrant vagrant 153104 Jun 10 17:03 app.c
We can see from the preceding snippet that the output of ls -la is exactly the same as ls -l -a. For common commands, such as the ls command, it does not matter if the flags are grouped or separated; they will be parsed in the same way. Throughout this book, examples will show both grouped and ungrouped flags. If grouping or ungrouping is performed for any specific reason, it will be called out; otherwise, the grouping or ungrouping used within this book is used for visual appeal and memorization.
In addition to grouping and ungrouping, this book will also show flags in their long format. In the previous examples, we showed the flag -a; this is known as a short flag. The same option can also be provided in the long format, --all:
$ ls -l --all
total 588
drwx------. 5 vagrant vagrant   4096 Jul  4 21:26 .
drwxr-xr-x. 3 root    root        20 Jul 22  2014 ..
-rw-rw-r--. 1 vagrant vagrant 153104 Jun 10 17:03 app.c
The -a and --all flags are essentially the same option; it can simply be represented in both short and long form.
One important thing to remember is that not every short flag has a long form and vice versa. Each command has its own syntax: some commands only support the short form, others only support the long form, but many support both. In most cases, the long and short flags will both be documented within the command's man page.
Another common command-line practice that will be used several times throughout this book is piping output. Specifically, examples such as the following:
$ ls -l --all | grep app
-rw-rw-r--. 1 vagrant vagrant 153104 Jun 10 17:03 app.c
-rwxrwxr-x. 1 vagrant vagrant  29390 May 18 00:47 application
-rw-rw-r--. 1 vagrant vagrant   1198 Jun 10 17:03 app.py
In the preceding example, the output of the ls -l --all command is piped to the grep command. By placing | (the pipe character) between the two commands, the output of the first command is "piped" to the input of the second command. The ls command is executed first; the grep command then searches its output for any instance of the pattern "app".
Piping output to grep will be used quite often throughout this book, as it is a simple way to trim the output to a manageable size. Many times the examples will also contain multiple levels of piping:
$ ls -la | grep app | awk '{print $4,$9}'
vagrant app.c
vagrant application
vagrant app.py
In the preceding code, the output of ls -la is piped to the input of grep; however, this time, the output of grep is also piped to the input of awk.
While many commands can be piped to, not every command supports this. In general, commands that accept user input from files or command-line also accept piped input. As with the flags, a command's man page can be used to identify whether the command accepts piped input or not.
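As a small illustration of chaining, the sketch below wraps the grep and awk stages from the preceding example in a shell function so the same filter can be reused on any ls -l style input. The function name group_and_name is hypothetical, and the sample rows are copied from the listings shown earlier.

```shell
#!/bin/sh
# group_and_name PATTERN (hypothetical helper): read `ls -l` style lines on
# stdin, keep those matching PATTERN, and print field 4 and field 9
# (the group and the file name in the listings above).
group_and_name() {
    grep -- "$1" | awk '{print $4, $9}'
}

# Sample rows copied from the listing shown earlier in this section.
group_and_name app <<'EOF'
-rw-rw-r--. 1 vagrant vagrant 153104 Jun 10 17:03 app.c
-rwxrwxr-x. 1 vagrant vagrant 29390 May 18 00:47 application
drwxr-xr-x. 3 root root 20 Jul 22 2014 ..
EOF
```

Against a live directory this would be called as ls -la | group_and_name app. File names containing spaces would need a different approach, since awk splits fields on whitespace.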
Gathering general information
When managing the same servers for a long time, you start to remember key information about those servers, such as the amount of physical memory, the size and layout of their filesystems, and what processes should be running. However, when you are not familiar with the server in question, it is always a good idea to gather this type of information.
The commands in this section can be used to gather this type of general information.
Early in my systems administration career, I had a mentor who used to tell me: "I always run w when I log into a server." This simple tip has been very useful over and over again in my career. The w command is simple; when executed, it will output information such as system uptime, load average, and who is logged in:
# w
 04:07:37 up 14:26,  2 users,  load average: 0.00, 0.01, 0.05
USER     TTY      LOGIN@   IDLE   JCPU   PCPU WHAT
root     tty1     Wed13   11:24m  0.13s  0.13s -bash
root     pts/0    20:47    1.00s  0.21s  0.19s -bash
This information can be extremely useful when working with unfamiliar systems, and it remains useful even when you are familiar with the system. With this command, you can see:
- When this system was last rebooted: 04:07:37 up 14:26
  This information can be extremely useful, whether the trigger is an alert for a service like Apache being down or a user calling in because they were locked out of the system. When these issues are caused by an unexpected reboot, the reported issue does not often include this information. By running the w command, it is easy to see the time elapsed since the last reboot.
- The load average of the system: load average: 0.00, 0.01, 0.05
  The load average is a very important measurement of system health. To summarize it, the load average is the average number of processes that are running, waiting for CPU time, or blocked in uninterruptible sleep (typically waiting on I/O) over a period of time. The three numbers in the output of w represent different time periods, ordered from left to right as 1 minute, 5 minutes, and 15 minutes.
- Who is logged in and what they are running:
  USER     TTY      LOGIN@   IDLE   JCPU   PCPU WHAT
  root     tty1     Wed13   11:24m  0.13s  0.13s -bash
  The final piece of information that the w command provides is the users that are currently logged in and what command they are executing.
This is essentially the same output as the who command, which includes the user logged in, when they logged in, how long they have been idle, and what command their shell is running. The last item in that list is extremely important.
Oftentimes, when working with big teams, it is common for more than one person to respond to an issue or ticket. By running the w command immediately after login, you will see what other users are doing, preventing you from overriding any troubleshooting or corrective steps the other person has taken.
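A load average is easier to judge relative to the number of CPUs in the system. The sketch below divides a load value by a CPU count; treating a result near or above 1.00 as a busy system is a common rule of thumb and an assumption here, not something stated in this section. The function name load_per_cpu is hypothetical, and the live demo assumes a Linux system with /proc/loadavg.

```shell
#!/bin/sh
# load_per_cpu LOAD NCPU (hypothetical helper): print LOAD divided by NCPU
# with two decimal places.
load_per_cpu() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

# Live demo: the first field of /proc/loadavg is the 1-minute load average.
if [ -r /proc/loadavg ]; then
    read -r one five fifteen rest < /proc/loadavg
    # Fall back to 1 CPU if getconf is unavailable (assumption for this sketch).
    cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    echo "1-minute load per CPU: $(load_per_cpu "$one" "$cpus")"
fi
```

For the sample output above, a 1-minute load of 0.00 on a single CPU indicates an essentially idle system.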
The rpm command is used to manage RPM (Red Hat Package Manager) packages. With this command, you can install and remove RPM packages, as well as search for packages that are already installed.
Earlier in this chapter, we saw how the rpm command can be used to look for configuration files. The following are several additional ways we can use the rpm command to find critical information.
Often when troubleshooting services, a critical step is identifying the version of the service and how it was installed. To list all RPM packages installed on a system, simply execute the rpm command with -q (query) and -a (all):
# rpm -q -a
kpatch-0.0-1.el7.noarch
virt-what-1.13-5.el7.x86_64
filesystem-3.2-18.el7.x86_64
gssproxy-0.3.0-9.el7.x86_64
hicolor-icon-theme-0.12-7.el7.noarch
The rpm command is a very diverse command with many flags. In the preceding example, the -q and -a flags are used. The -q flag tells the rpm command that the action being taken is a query; you can think of this as being put into a "search mode". The -a or --all flag tells the rpm command to list all packages.
A useful feature is to add the --last flag to the preceding command, as this causes the rpm command to list the packages by install time with the latest being first.
Another useful rpm function is to show all of the files deployed by a specific package:
# rpm -q --filesbypkg kpatch-0.0-1.el7.noarch
kpatch                    /usr/bin/kpatch
kpatch                    /usr/lib/systemd/system/kpatch.service
In the preceding example, we again use the -q flag to specify that we are running a query, along with the --filesbypkg flag. The --filesbypkg flag will cause the rpm command to list all of the files deployed by the specified package.
This example can be very useful when trying to identify a service's configuration file location.
In this third example, we are going to use an extremely useful feature of rpm: verify. The rpm command has the ability to verify whether or not the files deployed by a specified package have been altered from their original contents. To do this, we will use the -V (verify) flag:
# rpm -V httpd
S.5....T.  c /etc/httpd/conf/httpd.conf
In the preceding example, we simply run the rpm command with the -V flag followed by a package name. Just as the -q flag is used for querying, the -V flag is used for verifying. With this command, we can see that only the /etc/httpd/conf/httpd.conf file was listed; this is because rpm will only output files that have been altered.
In the first column of this output, we can see which verification checks the file failed. While this column is a bit cryptic at first, the rpm man page has a useful table (as shown in the following list) explaining what each character means:
- S: This means that the file size differs
- M: This means that the mode differs (includes permissions and file type)
- 5: This means that the digest (formerly MD5 sum) differs
- D: This indicates a device major/minor number mismatch
- L: This indicates a readLink(2) path mismatch
- U: This means that the user ownership differs
- G: This means that the group ownership differs
- T: This means that the mTime differs
- P: This means that the caPabilities differ
Using this list, we can see that the file size, MD5 sum, and mtime (modify time) of httpd.conf are not what was deployed by httpd.rpm. This means that it is highly likely that the httpd.conf file has been modified after installation.
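Reading the flag column by eye gets tedious when many files are listed. As a minimal sketch, the hypothetical helper below (decode_verify_flags is not part of rpm) translates a flag string such as S.5....T. into words using the table above. It is an illustration only: a dot is treated as a passed check, and any character not in the table is skipped.

```shell
#!/bin/sh
# decode_verify_flags FLAGS (hypothetical helper): translate the first column
# of `rpm -V` output into a space-separated list of failed checks.
decode_verify_flags() {
    flags=$1
    out=
    while [ -n "$flags" ]; do
        rest=${flags#?}          # everything after the first character
        c=${flags%"$rest"}       # the first character itself
        flags=$rest
        case $c in
            S) name="size" ;;
            M) name="mode" ;;
            5) name="digest" ;;
            D) name="device" ;;
            L) name="link" ;;
            U) name="user" ;;
            G) name="group" ;;
            T) name="mtime" ;;
            P) name="capabilities" ;;
            *) continue ;;       # "." (check passed) or anything unknown
        esac
        out="${out:+$out }$name"
    done
    echo "$out"
}

# The flag string from the httpd.conf example above.
decode_verify_flags "S.5....T."
```

For the httpd.conf line, the helper reports the size, digest, and mtime checks, matching the manual reading of the flags.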
While the rpm command might not seem like a troubleshooting command at first, the preceding examples show just how powerful a troubleshooting tool it can be. With these examples, it is simple to identify important files and whether or not those files have been modified from the deployed version.
The df command is a very useful command when troubleshooting file system issues. The df command is used to output space utilization for mounted file systems:
# df -h
Filesystem             Size  Used Avail Use% Mounted on
/dev/mapper/rhel-root  6.7G  1.6G  5.2G  24% /
devtmpfs               489M     0  489M   0% /dev
tmpfs                  498M     0  498M   0% /dev/shm
tmpfs                  498M   13M  485M   3% /run
tmpfs                  498M     0  498M   0% /sys/fs/cgroup
/dev/sdb1              212G   58G  144G  29% /repos
/dev/sda1              497M  117M  380M  24% /boot
In the preceding example, the df command included the -h flag. This flag causes the df command to print any size values in a "human readable" format. By default, df will simply print these values in kilobytes. From the example, we can quickly see the current usage of all mounted filesystems. Specifically, if we look at the output, we can see that the / filesystem is currently 24 percent used:
Filesystem             Size  Used Avail Use% Mounted on
/dev/mapper/rhel-root  6.7G  1.6G  5.2G  24% /
This is a very quick and easy way to identify whether any file system is full. In addition, the df command is also very useful in showing details of what file systems are mounted and where they are mounted. From the line containing the / filesystem, we can see that the underlying device is /dev/mapper/rhel-root.
From this one command, we were able to identify two critical pieces of information.
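When a server has many mounts, it can help to filter the df output down to the file systems above a usage threshold. The following is a minimal sketch; the function name filesystems_over and the threshold are hypothetical, and it assumes the usage percentage is in column 5 and the mount point in column 6, as in the df output shown above (mount points containing spaces would not be handled).

```shell
#!/bin/sh
# filesystems_over PERCENT (hypothetical helper): read df output on stdin and
# print the mount point and usage of each file system at or above PERCENT.
filesystems_over() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)                 # strip the trailing percent sign
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Sample rows copied from the df -h output above.
filesystems_over 25 <<'EOF'
Filesystem             Size  Used Avail Use% Mounted on
/dev/mapper/rhel-root  6.7G  1.6G  5.2G  24% /
/dev/sdb1              212G   58G  144G  29% /repos
/dev/sda1              497M  117M  380M  24% /boot
EOF
```

On a live system this could be run as df -P | filesystems_over 90; the -P flag keeps each file system on a single line, which makes the column positions predictable.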
The default behavior for df is to show the amount of used file system space. However, it can also be used to show the quantity of inodes available, used, and free for each file system. To output the inode utilization, simply add the -i (inode) flag when executing the df command:
# df -i
Filesystem             Inodes  IUsed   IFree IUse% Mounted on
/dev/mapper/rhel-root 7032832  44318 6988514    1% /
devtmpfs               125039    347  124692    1% /dev
It is still possible to use the -h flag with df to print the output in a human readable format. However, with the -i flag, this abbreviates the output to M for millions, K for thousands, and so on. This output can be easily confused with megabytes or kilobytes, so in general, I do not use the human readable inode output when sharing the output with other users/administrators.
When executed, the free command will output statistics about the memory available and in use on the system:
$ free
             total       used       free     shared    buffers     cached
Mem:       1018256     789796     228460      13116       3608     543484
-/+ buffers/cache:     242704     775552
Swap:       839676          4     839672
From the previous example, we can see that the output of the free command provides the total memory, the amount of memory currently used, and the amount of memory free. The free command is a simple and quick way to identify the current state of memory on a system.
However, the output of free can be a bit confusing at first.
Linux utilizes memory differently as compared to other operating systems. In the preceding output, you will see that it has 543,484 KB listed as cached. This memory, while technically used, is actually part of the available memory. The system can reallocate this cached memory as required.
A quick and easy way of seeing what is actually used or free is the second line of output (-/+ buffers/cache). The preceding output shows that 775,552 KB of memory is available on the system.
In previous RHEL releases, the second line of the free command was the easiest method for identifying how much memory is available. However, with RHEL 7, there have been some improvements to the /proc/meminfo file. One of those improvements is the addition of the MemAvailable statistic:
$ grep Available /proc/meminfo
MemAvailable:     641056 kB
The /proc/meminfo file is one of the many useful files located in the /proc file system. This file is maintained by the kernel and contains the system's current memory statistics. This file can be very useful when troubleshooting memory issues as it contains much more information than the output of the free command.
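Because /proc/meminfo is plain text, its values are easy to combine. The sketch below expresses MemAvailable as a percentage of MemTotal; the function name mem_available_pct is hypothetical, and the optional file argument exists only so the function can be tried against a saved copy of the file.

```shell
#!/bin/sh
# mem_available_pct [FILE] (hypothetical helper): print MemAvailable as a
# percentage of MemTotal, reading FILE or /proc/meminfo by default.
mem_available_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%.1f\n", avail * 100 / total }' \
        "${1:-/proc/meminfo}"
}

# Live demo on systems that expose /proc/meminfo.
if [ -r /proc/meminfo ]; then
    echo "Available memory: $(mem_available_pct)%"
fi
```

On kernels that do not report MemAvailable, the result would be 0.0, so the second line of the free output remains the fallback there.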
The ps command is a fundamental command for any troubleshooting activity. This command, when executed, will output a list of running processes:
# ps
  PID TTY          TIME CMD
15618 pts/0    00:00:00 ps
17633 pts/0    00:00:00 bash
The ps command has many flags and options to show different information about running processes. The following are a few example ps commands that are useful during troubleshooting.
The following ps command uses the -e (everything, all processes), -l (long format), and -f (full format) flags. These flags will cause the ps command to not only print every process but also print them in a format that provides quite a bit of useful information:
# ps -elf
F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
1 S root         2     0  0  80   0 -     0 kthrea Dec24 ?        00:00:00 [kthreadd]
In the preceding output of ps -elf, we can see many useful pieces of information for the kthreadd process, such as the parent process ID (PPID), the priority (PRI), the niceness value (NI), and the size of the process image in pages (SZ).
I have found that the preceding example is a very general-purpose ps command and can be used in most situations.
The output of the preceding example can get quite large, making it difficult to identify specific processes. The next example uses the -U flag to specify a user. This causes the ps command to print all processes running as the specified user, postfix in the following case:
# ps -U postfix -l
F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
4 S    89  1546  1536  0  80   0 - 23516 ep_pol ?        00:00:00 qmgr
4 S    89 16711  1536  0  80   0 - 23686 ep_pol ?        00:00:00 pickup
It is important to note that the -U flag can also be combined with other flags to provide even more information on the running processes. In the preceding example, the -l flag is once again used to print the output in the long format.
If the process ID or PID is already known, it is possible to narrow down the process listing even further by specifying the process with the -p (process ID) flag:
# ps -p 1236 -l
F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
4 S     0  1236     1  0  80   0 - 20739 poll_s ?        00:00:00 sshd
This can be especially useful when combined with the -L (show threads with LWP column) or -m (show threads after process) flag, which are used to print process threads. When troubleshooting multithreaded applications, the -L and -m flags can be critical.
The ps command allows the user to customize the columns printed with the -o (user defined format) flag:
# ps -U postfix -o pid,user,pcpu,vsz,cmd
  PID USER     %CPU    VSZ CMD
 1546 postfix   0.0  94064 qmgr -l -t unix -u
16711 postfix   0.0  94744 pickup -l -t unix -u
The -o option allows for a wide number of custom columns. In the preceding version, I selected options that are similar to those printed in the top command.
The top command is one of the most popular Linux troubleshooting commands. It is used to show the top processes ordered by CPU usage (by default). In this chapter, I have opted to omit the top command, as I feel that the ps command is even more fundamental and flexible than the top command. As one becomes more familiar with the ps command, the top command will be easy to learn and understand.
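To approximate the ordering that top provides, the custom-column output can be sorted on its %CPU column. The following is a rough sketch; sort_by_cpu is a hypothetical helper that assumes %CPU is the third column, as in the pid,user,pcpu,vsz,cmd format used above.

```shell
#!/bin/sh
# sort_by_cpu (hypothetical helper): read ps output on stdin, keep the header
# line in place, and sort the remaining rows by column 3 (%CPU), highest first.
sort_by_cpu() {
    {
        IFS= read -r header
        printf '%s\n' "$header"
        sort -k3,3nr
    }
}

# Live demo: show the five busiest processes if ps is available.
if command -v ps >/dev/null 2>&1; then
    ps -eo pid,user,pcpu,vsz,cmd | sort_by_cpu | head -n 6
fi
```

Unlike top, this is a single snapshot; the %CPU value reported by ps is averaged over the lifetime of each process rather than over a recent interval.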
Networking
Networking is an essential skill for any systems administrator. Without a properly configured network interface, a server serves little purpose. The commands in this section are specifically for looking up network configuration and current status. These commands are essential to learn, as they will not only be useful for troubleshooting but also for day-to-day setup and configuration.
The ip command is used to manage network settings such as interface configuration, routing, and essentially anything network related. While these are not traditionally considered troubleshooting tasks, the ip command can also be used to display a system's network configuration. Without being able to look up networking details such as routing or device configuration, it would be very difficult to troubleshoot network-related issues.
The following examples show various ways to use the ip command to identify critical network configuration settings.
One of the core uses of the ip command is to look up a network interface and display its configuration. To do this, we will use the following command:
# ip addr show dev enp0s3
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:6e:35:18 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic enp0s3
       valid_lft 45083sec preferred_lft 45083sec
    inet6 fe80::a00:27ff:fe6e:3518/64 scope link
       valid_lft forever preferred_lft forever
In the preceding ip command, the first option provided, addr (address), is used to define the type of information we are looking for. The second option, show, tells ip to display the configuration of the first option. The third option, dev (device), is followed by the network interface device in question, enp0s3. If the third option is omitted, the ip command will show the address configuration for all network devices.
The device name enp0s3 might look a bit strange for those who have experience with previous RHEL releases. This device is following a newer network device naming scheme introduced with systemd. As of RHEL 7, network devices will use device names such as the preceding one, which are based on device driver and BIOS details.
To find out more about RHEL 7's new naming scheme simply reference the following URL:
The ip command can also be used to show routing configurations. This information is essential for troubleshooting connectivity issues between servers:
# ip route show
default via 10.0.2.2 dev enp0s3 proto static metric 1024
10.0.2.0/24 dev enp0s3 proto kernel scope link src 10.0.2.15
192.168.56.0/24 dev enp0s8 proto kernel scope link src 192.168.56.101
The preceding ip command uses the route option followed by the show option to display all defined routes for this server. Like the previous example, it is possible to limit this output to a specific device by adding the dev (device) option followed by the device name:
# ip route show dev enp0s3
default via 10.0.2.2 proto static metric 1024
10.0.2.0/24 proto kernel scope link src 10.0.2.15
Where the previous examples showed ways to look up the current networking configuration, this next command uses the -s (statistics) flag to show network statistics for the specified device:
# ip -s link show dev enp0s3
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 08:00:27:6e:35:18 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    109717927  125911   0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    3944294    40127    0       0       0       0
In the preceding example, the link (network device) option was used to specify that the statistics should be limited to the specified device.
The statistics information shown can be useful when troubleshooting packets that are being dropped or to identify which interface has higher network utilization.
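When checking many interfaces, the error and drop counters can be pulled out of this output with a short filter. The sketch below is illustrative only: link_errors is a hypothetical helper, and it assumes the layout shown above, where each RX: or TX: header line is followed by a line of values with errors in the third column and dropped in the fourth.

```shell
#!/bin/sh
# link_errors (hypothetical helper): read `ip -s link` output on stdin and
# print the errors and dropped counters for each direction.
link_errors() {
    awk '$1 == "RX:" { dir = "rx"; next }
         $1 == "TX:" { dir = "tx"; next }
         dir != ""   { printf "%s errors=%s dropped=%s\n", dir, $3, $4; dir = "" }'
}

# Sample taken from the enp0s3 statistics shown above.
link_errors <<'EOF'
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 08:00:27:6e:35:18 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    109717927  125911   0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    3944294    40127    0       0       0       0
EOF
```

On a live system this would be run as ip -s link show dev enp0s3 | link_errors; counters that keep increasing between runs are usually more telling than their absolute values.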
The netstat command is an essential tool in any system administrator's tool belt. This can be seen by the fact that the netstat command is universally available, even on operating systems that do not traditionally utilize the command line for administration.
One of the primary uses of netstat is to print the existing established network connections. This can be done by simply executing netstat; however, if the -a (all) flag is used, the output will also include listening ports:
# netstat -na
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:44969           0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 192.168.56.101:22       192.168.56.1:50122      ESTABLISHED
tcp6       0      0 ::1:25                  :::*                    LISTEN
While the -a (all) flag used in the preceding netstat command causes it to print all listening ports in addition to established connections, the -n flag is used to force output into a numeric format, such as printing IP addresses rather than DNS host names.
The preceding example will be used heavily during Chapter 5, Network Troubleshooting, where we will be troubleshooting network connectivity.
I have seen many instances where a service is running and is visible via the ps command; however, the port for clients to connect to was not bound and listening. The following netstat command can be very useful when troubleshooting connectivity issues with a service:
# netstat -nlp --tcp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      1536/master
tcp        0      0 0.0.0.0:44969           0.0.0.0:*               LISTEN      1270/rpc.statd
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      1215/rpcbind
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1236/sshd
tcp6       0      0 ::1:25                  :::*                    LISTEN      1536/master
tcp6       0      0 :::111                  :::*                    LISTEN      1215/rpcbind
tcp6       0      0 :::22                   :::*                    LISTEN      1236/sshd
tcp6       0      0 :::46072                :::*                    LISTEN      1270/rpc.statd
The preceding command is very useful as it combines three useful options:
- -l (listening), which tells netstat to only list listening sockets
- --tcp, which tells netstat to limit the output to TCP connections
- -p (program), which tells netstat to list the PID and name of the process listening on that port
An often overlooked option with netstat is the delay feature. By adding a number at the end of the command, netstat will continuously run and will sleep for the specified number of seconds between executions.
If the following command is executed, the netstat command will print all listening TCP sockets every five seconds:
# netstat -nlp --tcp 5
The delay feature can be very useful when investigating network connectivity issues, as it can easily show when an application binds a port for new connections.
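For scripted checks, the same listing can answer a simple yes-or-no question: is anything listening on a given port? The sketch below is one way to do that; is_listening is a hypothetical helper that assumes the column layout shown above, with the local address in column 4 and the state in column 6.

```shell
#!/bin/sh
# is_listening PORT (hypothetical helper): read `netstat -nl --tcp` output on
# stdin and succeed (exit 0) if any socket in LISTEN state is bound to PORT.
is_listening() {
    awk -v port="$1" '
        $6 == "LISTEN" {
            n = split($4, parts, ":")     # port is the last ":"-separated piece
            if (parts[n] == port) found = 1
        }
        END { exit found ? 0 : 1 }'
}

# Sample rows copied from the netstat output above.
sample='tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1236/sshd
tcp6 0 0 :::111 :::* LISTEN 1215/rpcbind'

if printf '%s\n' "$sample" | is_listening 22; then
    echo "port 22 is listening"
fi
```

On a live system this would be run as netstat -nl --tcp | is_listening 22, and the exit status can drive an if statement or a monitoring check.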
Performance
While we touched a bit on troubleshooting performance with commands such as free and ps, this section will show some very useful commands that answer the age-old question of "Why is it slow?"
The iotop command is a relatively new command in Linux. In previous RHEL releases, while available, it was not installed by default. The iotop command provides a top command-like interface, but rather than showing which processes are utilizing the most CPU time or memory, it shows processes ordered by I/O utilization:
# iotop
Total DISK READ :       0.00 B/s | Total DISK WRITE :       0.00 B/s
Actual DISK READ:       0.00 B/s | Actual DISK WRITE:       0.00 B/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
 1536 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % master -w
    1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % systemd --switched-root --system --deserialize 23
Unlike some of the previous commands, iotop is very specialized to showing processes utilizing I/O. There are, however, some very useful flags that can change iotop's default behavior. One is -o (only), which tells iotop to only print processes using I/O rather than its default behavior of printing all processes. Another useful set of flags is -q (quiet) and -n (number of iterations).
Together with the -o flag, these flags can be used to tell iotop to print only the processes using I/O without clearing the screen for the next iteration:
# iotop -o -q -n2
Total DISK READ :       0.00 B/s | Total DISK WRITE :       0.00 B/s
Actual DISK READ:       0.00 B/s | Actual DISK WRITE:       0.00 B/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN      IO    COMMAND
Total DISK READ :       0.00 B/s | Total DISK WRITE :       0.00 B/s
Actual DISK READ:       0.00 B/s | Actual DISK WRITE:       0.00 B/s
22965 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.03 % [kworker/0:3]
If we look at the preceding example output, we can see two independent iterations of the iotop command. However, unlike previous examples, the output is continuous, allowing us to see which processes were using I/O at each iteration.
By default, the delay between iotop iterations is 1 second; however, this can be modified with the -d (delay) flag.
Where iotop shows what processes are utilizing I/O, iostat shows what devices are being utilized:
# iostat -t 1 2
Linux 3.10.0-123.el7.x86_64 (localhost.localdomain) 12/25/2014 _x86_64_ (1 CPU)
12/25/2014 03:20:10 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.11    0.00    0.17    0.01    0.00   99.72
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               0.38         2.84         7.02     261526     646339
sdb               0.01         0.06         0.00       5449         12
dm-0              0.33         2.77         7.00     254948     644275
dm-1              0.00         0.01         0.00        936          4
12/25/2014 03:20:11 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.99    0.00    0.00   99.01
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               0.00         0.00         0.00          0          0
sdb               0.00         0.00         0.00          0          0
dm-0              0.00         0.00         0.00          0          0
dm-1              0.00         0.00         0.00          0          0
The preceding iostat command uses the -t (timestamp) flag to print a timestamp with each report. The two numbers are interval and count values. In the preceding example, iostat is run with a one second interval for a total count of two iterations.
The iostat command can be very useful for diagnosing issues related to I/O. However, the output can often be misleading. When executed, the values provided in the first report are averages since the last reboot of the system. Each subsequent report covers only the time since the previous report. In this example, we executed two reports, one second apart. You can see that the numbers in the first report are much higher than those in the second report.
For this reason, many systems administrators simply ignore the first report, often without fully understanding why. It is also not uncommon for someone unfamiliar with iostat to react to the values in the first report.
The iostat command does have a flag, -y (omit first report), which will cause iostat to omit the first report. This is a good flag to teach users who may not be very familiar with using iostat.
The iostat command also has quite a few useful flags that allow you to manipulate how it presents data. Flags such as -p (device) allow you to limit statistics to a specified device, and -x (extended stats) will print extended statistics:
# iostat -p sda -tx
Linux 3.10.0-123.el7.x86_64 (localhost.localdomain) 12/25/2014 _x86_64_ (1 CPU)
12/25/2014 03:38:00 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.11    0.00    0.17    0.01    0.00   99.72
Device: rrqm/s wrqm/s  r/s  w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda       0.01   0.02 0.13 0.25  2.81  6.95    51.70     0.00  7.62    1.57   10.79  0.85  0.03
sda1      0.00   0.00 0.02 0.02  0.05  0.02     3.24     0.00  0.24    0.42    0.06  0.23  0.00
sda2      0.01   0.02 0.11 0.19  2.75  6.93    65.47     0.00  9.34    1.82   13.58  0.82  0.02
The preceding example uses the -p flag to specify the sda device, the -t flag to print timestamps, and the -x flag to print extended statistics. These flags can be very useful when measuring I/O performance for specific devices.
Where iostat is used to report statistics about disk I/O performance, vmstat is used to report statistics about memory usage and performance:
# vmstat 1 3
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      4 225000   3608 544900    0    0     3     7   17   28  0  0 100  0  0
 0  0      4 224992   3608 544900    0    0     0     0   19   19  0  0 100  0  0
 0  0      4 224992   3608 544900    0    0     0     0    6    9  0  0 100  0  0
The vmstat syntax is very similar to iostat, where you provide an interval and count of reports as command line arguments. Also, like iostat, the first report is actually an average since the last reboot and subsequent reports are since the previous report. Unfortunately, unlike the iostat command, the vmstat command does not include a flag to omit the first report. As such, in most cases, it is appropriate to simply ignore the first report.
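One way to ignore the first report without relying on memory is to filter it out. The sketch below assumes the layout shown above, where two header lines come first and the since-boot report is the third line of output; skip_first_report is a hypothetical helper name.

```shell
#!/bin/sh
# skip_first_report (hypothetical helper): read vmstat output on stdin and
# drop line 3, which holds the averages since the last reboot.
skip_first_report() {
    awk 'NR != 3'
}

# Sample copied from the vmstat 1 3 output above.
skip_first_report <<'EOF'
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      4 225000   3608 544900    0    0     3     7   17   28  0  0 100  0  0
 0  0      4 224992   3608 544900    0    0     0     0   19   19  0  0 100  0  0
 0  0      4 224992   3608 544900    0    0     0     0    6    9  0  0 100  0  0
EOF
```

On a live system this would be run as vmstat 1 3 | skip_first_report, which leaves the two header lines and the two interval-based reports.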
While vmstat might not include a flag to omit the first report, it does have some very useful flags, such as -m (slabs), which causes vmstat to output the system's slabinfo at a defined interval, and -s (stats), which prints an extended report of the memory statistics for the system:
# vmstat --stats
1018256 K total memory
793416 K used memory
290372 K active memory
360660 K inactive memory
224840 K free memory
3608 K buffer memory
544908 K swap cache
839676 K total swap
4 K used swap
839672 K free swap
10191 non-nice user cpu ticks
67 nice user cpu ticks
11353 system cpu ticks
9389547 idle cpu ticks
556 IO-wait cpu ticks
33 IRQ cpu ticks
4434 softirq cpu ticks
0 stolen cpu ticks
267011 pages paged in
647220 pages paged out
0 pages swapped in
1 pages swapped out
1619609 interrupts
2662083 CPU context switches
1419453695 boot time
59061 forks
The preceding code is an example of the -s or --stats flag being used.
One very useful utility is the sar command, which comes with the sysstat package. The sysstat package includes various utilities that collect system metrics such as disk, CPU, memory, and network utilization. By default, this collection will run every 10 minutes and is executed as a cron job within /etc/cron.d/sysstat.
While the data collected by sysstat can be very useful, this package is sometimes removed in high performance environments, as the collection of the system utilization statistics can add to the system's utilization, causing performance degradation. To see if the sysstat package is installed, simply use the rpm command with the -q (query) flag:
# rpm -q sysstat
sysstat-10.1.5-4.el7.x86_64
The sar command allows users to review the information collected by the sysstat utilities. When executed with no flags, the sar command will print the current day's CPU statistics:
# sar | head -6
Linux 3.10.0-123.el7.x86_64 (localhost.localdomain) 12/25/2014 _x86_64_ (1 CPU)
12:00:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
12:10:02 AM     all      0.05      0.00      0.20      0.01      0.00     99.74
12:20:01 AM     all      0.05      0.00      0.18      0.00      0.00     99.77
12:30:01 AM     all      0.06      0.00      0.25      0.00      0.00     99.69
Every day at midnight, the sysstat collector will create a new file to store the collected statistics. To reference the statistics within that file, simply use the -f (file) flag to run sar against the specified file:
# sar -f /var/log/sa/sa13
Linux 3.10.0-123.el7.x86_64 (localhost.localdomain) 12/13/2014 _x86_64_ (1 CPU)
10:24:43 AM       LINUX RESTART
10:30:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
10:40:01 AM     all      2.99      0.00      0.96      0.43      0.00     95.62
10:50:01 AM     all      9.70      0.00      2.17      0.00      0.00     88.13
11:00:01 AM     all      0.31      0.00      0.30      0.02      0.00     99.37
11:10:01 AM     all      1.20      0.00      0.41      0.01      0.00     98.38
11:20:01 AM     all      0.01      0.00      0.04      0.01      0.00     99.94
11:30:01 AM     all      0.92      0.07      0.42      0.01      0.00     98.59
11:40:01 AM     all      0.17      0.00      0.08      0.00      0.00     99.74
11:50:02 AM     all      0.01      0.00      0.03      0.00      0.00     99.96
In the preceding code, the file specified was /var/log/sa/sa13; this file contains statistics for the 13th day of the current month.
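Since the file name is just the day of the month appended to sa, the path can be built in a script. The following is a small sketch under that assumption; sa_file_for_day is a hypothetical helper, and the /var/log/sa directory is the location used in the example above.

```shell
#!/bin/sh
# sa_file_for_day DAY (hypothetical helper): print the sysstat data file path
# for the given day of the month, zero-padded to two digits.
sa_file_for_day() {
    day=${1#0}                   # drop one leading zero so 08 is not read as octal
    printf '/var/log/sa/sa%02d\n' "$day"
}

# The 13th day of the month, as in the example above.
sa_file_for_day 13
```

The result can be passed straight to sar, for example sar -f "$(sa_file_for_day 13)". Whether a file for a given day still exists depends on how long the local sysstat configuration retains history.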
The sar command has many useful flags, far too many to list in this chapter. A few extremely useful flags are listed as follows:
- -b: This prints I/O statistics similar to the iostat command
- -n ALL: This prints network statistics for all network devices
- -r: This prints memory utilization statistics
- -A: This prints all statistics gathered. It is essentially equivalent to running sar -bBdHqrRSuvwWy -I SUM -I XALL -m ALL -n ALL -u ALL -P ALL
While the sar command shows many statistics, most of them can also be gathered with commands we have already covered, such as iostat or vmstat. The biggest benefit of the sar command is the ability to review statistics from the past. This ability is critical when troubleshooting a performance issue that occurred for a short period of time or was already mitigated.