Measuring CPU Load

Various tools are available for monitoring CPU load, but it’s usually the combination of these tools that provides the most useful data.

Besides showing how long the system has been up, the uptime command gives you a rough estimate of the system load. It prints the current time, the length of time that the system has been up, the number of users logged in, and the average number of jobs in the run queue over the last 1, 5, and 15 minutes. When I type the uptime command, the system responds with the following:

11:23am  up 2 day(s), 19:15,  1 user,  load average: 0.01, 0.02, 0.04 

Let’s look at the load average numbers. The load average is the sum of the run queue length and the number of jobs currently running on the CPUs. In short, it’s a rough estimate of CPU usage. The three figures are averages over the last 1, 5, and 15 minutes. A high load average means that the system is being used heavily and that response time is likely to be sluggish. What is a high load average? It depends on your system. If you’ve been keeping an eye on the load average, you’ll know from the history of the system what is a good average and what is a bad one. Normally, I would say a load average of 3 or less is good, but I’ve seen systems with a load average of 5 in which performance is still good. Different system configurations behave differently under the same load average.

Keep in mind that the load average is simply a starting point. A low load average does not guarantee good response times; a system can respond slowly for reasons that the load average does not show.
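To keep a history of load averages, as suggested above, the three figures can be pulled out of the uptime output with a short script. The following Python sketch is only an illustration: the function names and the default threshold of 3 (taken from the rule of thumb above) are assumptions, and it parses the sample line shown earlier rather than running uptime itself.

```python
import re

def parse_load_averages(uptime_line):
    """Return the (1-min, 5-min, 15-min) load averages from one line of uptime output."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load average found in: %r" % uptime_line)
    return tuple(float(value) for value in match.groups())

def load_trend(averages):
    """Compare the 1-minute figure with the 15-minute figure."""
    one, _five, fifteen = averages
    if one > fifteen:
        return "rising"
    if one < fifteen:
        return "falling"
    return "steady"

def looks_high(averages, threshold=3.0):
    """True if any figure exceeds the threshold (hypothetical default; tune per system)."""
    return any(value > threshold for value in averages)

sample = "11:23am  up 2 day(s), 19:15,  1 user,  load average: 0.01, 0.02, 0.04"
averages = parse_load_averages(sample)
print(averages, load_trend(averages), looks_high(averages))
```

Logging these values at regular intervals gives you the baseline you need to judge what is normal for a particular system.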

The ps command will give you more useful information regarding what is going on with your system. Use the following options with the ps command to get a complete picture of all the processes running on your system:

ps -elf

The system responds with the following:

F S      UID   PID  PPID  C PRI NI   ADDR    SZ    WCHAN   STIME TTY   TIME CMD 
19 T     root     0     0  0   0 SY    ?      0            Apr 30 ?    0:18 sched 
 8 S     root     1     0  0  40 20    ?    150        ?   Apr 30 ?    0:00 /etc/init -
19 S     root     2     0  0   0 SY    ?      0        ?   Apr 30 ?    0:00 pageout 
19 S     root     3     0  0   0 SY    ?      0        ?   Apr 30 ?    1:01 fsflush 
 8 S     root   333     1  0  40 20    ?    217        ?   Apr 30 ?    0:00 /usr/lib/saf/sac -t 300
 8 S     root  2087     1  0  40 20    ?    239        ? 10:42:32 ?    0:00 /bin/ksh /usr/dt/bin/sdtvolcheck -d
 8 S     root   144     1  0  40 20    ?    273        ?   Apr 30 ?    0:00 /usr/sbin/rpcbind
 8 S     root    52     1  0  40 20    ?    268        ?   Apr 30 ?    0:00 /usr/lib/sysevent/syseventd
 8 S     root    62     1  0  40 20    ?    343        ?   Apr 30 ?    0:01 /usr/lib/picl/picld
 8 S     root   190     1  0  40 20    ?    562        ?   Apr 30 ?    0:00 /usr/lib/autofs/automountd
 8 S     root   233     1  0  40 20    ?    173        ?   Apr 30 ?    0:00 /usr/lib/power/powerd
 8 S     root   166     1  0  40 20    ?    292        ?   Apr 30 ?    0:00 /usr/sbin/inetd -s
 8 S   daemon   183     1  0  40 20    ?    306        ?   Apr 30 ?    0:00 /usr/lib/nfs/statd
 8 S     root   201     1  0  40 20    ?    410        ?   Apr 30 ?    0:00 /usr/sbin/syslogd
 8 S     root   220     1  0  40 20    ?    394        ?   Apr 30 ?    0:00 /usr/lib/lpsched
 8 S     root   180     1  0  40 20    ?    266        ?   Apr 30 ?    0:00 /usr/lib/nfs/lockd
 8 S     root   215     1  0  40 20    ?    449        ?   Apr 30 ?    0:01 /usr/openwin/bin/fbconsole -d :0

The ps command was covered in detail in Chapter 15, “Managing Processes,” so I won’t go into detail on this command again.

The prstat command is similar to the ps command, except (as shown in Chapter 15) it continually updates the display of information on your screen. Use this command to watch processes on your system that might be eating up system resources. The sdtprocess GUI, also described in Chapter 15, provides a friendlier graphical version of this command.

vmstat also provides a convenient summary of system activity. When you run vmstat with no arguments, the result is a summary of activity since boot time. To obtain useful real-time statistics, run vmstat with a reporting interval in seconds, as follows:

vmstat 30 

This tells vmstat to report every 30 seconds and to display the results on the screen, as follows, until you press Ctrl+C to interrupt the command:

kthr      memory            page            disk          faults      cpu 
 r b w   swap  free  re  mf pi po fr de sr dd f0 s0 --   in   sy   cs us sy id 
 0 0 0 596704 31592   0   1  0  0  0  0  0  0  0  0  0  403   96   61  2  0 98 
 0 0 0 595040 24624   2  12  0  0  0  0  0  1  0  0  0  404  104   62  0  0 99 
 0 0 0 595040 24624   2  11  0  0  0  0  0  1  0  0  0  413  147   79  0  1 99 

Note

Disregard the first line of output. This is a summary of information since the system was booted.


The vmstat command outputs columns of information with a header across the top. Each field of output is described in Table 19.1.

Table 19.1. vmstat Fields
Field         Description
kthr/r        Run queue length.
kthr/b        Kernel threads blocked while waiting for I/O.
kthr/w        Idle processes that have been swapped.
memory/swap   Free, unreserved swap space (KB).
memory/free   Free memory (KB).
page/re       Pages reclaimed from the free list.
page/mf       Minor faults (page in memory but not mapped). If the page is still in memory, a minor fault remaps the page.
page/pi       Paged in from swap (KB/s). (When a page is brought back from the swap device, the process will stop execution and wait. This might affect performance.)
page/po       Paged out to swap (KB/s). The page has been written and freed.
page/fr       Freed or destroyed (KB/s). This column reports the activity of the page scanner.
page/de       Anticipated short-term memory shortfall (KB).
page/sr       Scan rate (pages). This number is not reported as a “rate” but as a total number of pages scanned.
disk/s#       Disk activity for disk # (disk operations per second).
faults/in     Interrupts per second.
faults/sy     System calls per second.
faults/cs     Context switches per second.
cpu/us        User CPU time (%).
cpu/sy        System (kernel) CPU time (%).
cpu/id        Idle + I/O wait CPU time (%).
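To see how the field names in Table 19.1 line up with the columns of output, the following Python sketch labels each value with its section and field name (for example, cpu/id). It is a minimal illustration that assumes the layout shown in the sample output: 3 kthr columns, 2 memory columns, 7 page columns, a variable number of disk columns, 3 faults columns, and 3 cpu columns. The names parse_vmstat and SAMPLE are hypothetical.

```python
SAMPLE = """\
kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr dd f0 s0 --   in   sy   cs us sy id
 0 0 0 596704 31592   0   1  0  0  0  0  0  0  0  0  0  403   96   61  2  0 98
 0 0 0 595040 24624   2  12  0  0  0  0  0  1  0  0  0  404  104   62  0  0 99
 0 0 0 595040 24624   2  11  0  0  0  0  0  1  0  0  0  413  147   79  0  1 99
"""

def parse_vmstat(text):
    """Parse vmstat output (two header lines, then samples) into a list of dicts."""
    rows = [line.split() for line in text.strip().splitlines()]
    names = rows[1]                       # second header line: r b w swap free ...
    n_disk = len(names) - 18              # 18 fixed columns; the rest are disks
    sections = (["kthr"] * 3 + ["memory"] * 2 + ["page"] * 7 +
                ["disk"] * n_disk + ["faults"] * 3 + ["cpu"] * 3)
    # Prefix each name with its section so faults/sy and cpu/sy stay distinct.
    keys = [section + "/" + name for section, name in zip(sections, names)]
    return [dict(zip(keys, (int(value) for value in row))) for row in rows[2:]]

samples = parse_vmstat(SAMPLE)
interval_samples = samples[1:]            # drop the first line (summary since boot)
for s in interval_samples:
    print("run queue:", s["kthr/r"], "free KB:", s["memory/free"], "idle %:", s["cpu/id"])
```

Prefixing each key with its section matters because the name sy appears twice in the header, once for system calls and once for system CPU time.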

Note

The free column in vmstat now really does mean memory that is free and not used by the page cache. In the past, it gave unreliable results.


The column labeled r under the kthr section is the run queue: the number of kernel threads waiting to get on the CPU(s). The id column under the cpu section is CPU idle time. If id stays at 0 (zero) while the run queue remains long, the system lacks the CPU resources to keep up with the process demand. Here’s an example of a system that lacks CPU resources:

 kthr     memory              page                disk         faults        cpu
 r b w    swap   free re  mf  pi po   fr de  sr m0 m1 m2 m3   in   sy   cs us sy id
45 0 0 2887216 182104  3 707 449  6  455  0  80  2  6  1  0 1531 5797  983 61 30  9
58 0 0 2831312  46408  5 983 582 56 3211  0 492  0  0  0  0 1413 4797 1027 69 31  0
55 0 0 2830944  56064  2 649 656  3  806  0 121  0  0  0  0 1441 4627  989 69 31  0

Notice that the CPU idle time drops to zero and that the CPU is spending the majority of its time in user space (see the us column). Two approaches can be taken here: add CPUs, or review the application code to determine whether the application can be optimized.
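The check just described can be automated when you are scanning a long vmstat log. The Python sketch below is one possible way to do it, not a definitive test: it treats a sample as CPU starved when idle time is 0 and the run queue is not empty, and it reports whether user or system time dominates. The function name assess_cpu and that rule are assumptions made for illustration.

```python
def assess_cpu(sample_line):
    """Return (starved, dominant) for one line of vmstat sample output.

    Assumes the first column is the run queue (kthr/r) and the last three
    columns are us, sy, and id, as in the output shown above.
    """
    fields = [int(value) for value in sample_line.split()]
    run_queue = fields[0]
    user, system, idle = fields[-3:]
    starved = idle == 0 and run_queue > 0   # illustrative rule, not an official one
    dominant = "user" if user >= system else "system"
    return starved, dominant

lines = [
    "45 0 0 2887216 182104 3 707 449 6 455 0 80 2 6 1 0 1531 5797 983 61 30 9",
    "58 0 0 2831312 46408 5 983 582 56 3211 0 492 0 0 0 0 1413 4797 1027 69 31 0",
    "55 0 0 2830944 56064 2 649 656 3 806 0 121 0 0 0 0 1441 4627 989 69 31 0",
]
results = [assess_cpu(line) for line in lines]
for line, (starved, dominant) in zip(lines, results):
    print("r=%s starved=%s dominant=%s" % (line.split()[0], starved, dominant))
```

A sample flagged as starved with user time dominant points toward the application itself, which matches the two approaches described above.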
