Sometimes minor problems can be tracked down by watching the behavior of an application in user space. Watching programs can also help in building confidence that a driver is working correctly. For example, we were able to feel confident about scull after looking at how its read implementation reacted to read requests for different amounts of data.
There are various ways to watch a user-space program working. You can run a debugger on it to step through its functions, add print statements, or run the program under strace. Here we’ll discuss just the last technique, which is most interesting when the real goal is examining kernel code.
The strace command is a powerful tool that
shows all the system calls issued by a user-space program. Not only
does it show the calls, but it can also show the arguments to the
calls, as well as return values in symbolic form. When a system call
fails, both the symbolic value of the error (e.g.,
ENOMEM
) and the corresponding string (Out of memory
) are displayed. strace
has many command-line options; the most useful of which are
-t to display the time
when each call is executed,
-T to display the time spent
in the call, -e to limit the
types of calls traced, and -o to redirect
the output to a file. By default, strace
prints tracing information on stderr
.
strace receives information from the kernel itself. This means that a program can be traced regardless of whether or not it was compiled with debugging support (the -g option to gcc) and whether or not it is stripped. You can also attach tracing to a running process, similar to the way a debugger can connect to a running process and control it.
The trace information is often used to support bug reports sent to application developers, but it’s also invaluable to kernel programmers. We’ve seen how driver code executes by reacting to system calls; strace allows us to check the consistency of input and output data of each call.
For example,the following screen dump shows the last lines of running the command strace ls /dev > /dev/scull0:
[...] open("/dev", O_RDONLY|O_NONBLOCK) = 4 fcntl(4, F_SETFD, FD_CLOEXEC) = 0 brk(0x8055000) = 0x8055000 lseek(4, 0, SEEK_CUR) = 0 getdents(4, /* 70 entries */, 3933) = 1260 [...] getdents(4, /* 0 entries */, 3933) = 0 close(4) = 0 fstat(1, {st_mode=S_IFCHR|0664, st_rdev=makedev(253, 0), ...}) = 0 ioctl(1, TCGETS, 0xbffffa5c) = -1 ENOTTY (Inappropriate ioctl for device) write(1, "MAKEDEV atibm audio audio1 a"..., 4096) = 4000 write(1, "d2 sdd3 sdd4 sdd5 sdd6 sdd7"..., 96) = 96 write(1, "4 sde5 sde6 sde7 sde8 sde9 "..., 3325) = 3325 close(1) = 0 _exit(0) = ?
It’s apparent in the first write call that after ls finished looking in the target directory, it tried to write 4 KB. Strangely (for ls), only four thousand bytes were written, and the operation was retried. However, we know that the write implementation in scull writes a single quantum at a time, so we could have expected the partial write. After a few steps, everything sweeps through, and the program exits successfully.
As another example, let’s read the scull device (using the wc command):
[...] open("/dev/scull0", O_RDONLY) = 4 fstat(4, {st_mode=S_IFCHR|0664, st_rdev=makedev(253, 0), ...}) = 0 read(4, "MAKEDEV atibm audio audio1 a"..., 16384) = 4000 read(4, "d2 sdd3 sdd4 sdd5 sdd6 sdd7"..., 16384) = 3421 read(4, "", 16384) = 0 fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(3, 7), ...}) = 0 ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0 write(1, " 7421 /dev/scull0 ", 20) = 20 close(4) = 0 _exit(0) = ?
As expected, read is able to retrieve only four
thousand bytes at a time, but the total amount of data is the same
that was written in the previous example. It’s interesting to note how
retries are organized in this example, as opposed to the previous
trace. wc is optimized for fast reading
and thus bypasses the standard library, trying to read more data with
a single system call. You can see from the read
lines in the trace how wc tried to read 16
KB at a time.
Linux experts can find much useful information in the output of strace. If you’re put off by all the symbols, you can limit yourself to watching how the file methods (open, read, and so on) work.
Personally, we find strace most useful for pinpointing runtime errors from system calls. Often the perror call in the application or demo program isn’t verbose enough to be useful for debugging, and being able to tell exactly which arguments to which system call triggered the error can be a great help.