Chapter 4. UNIX I/O

UNIX uses a uniform device interface, through file descriptors, that allows the same I/O calls to be used for terminals, disks, tapes, audio and even network communication. This chapter explores the five functions that form the basis for UNIX device-independent I/O. The chapter also examines I/O from multiple sources, blocking I/O with timeouts, inheritance of file descriptors and redirection. The code carefully handles errors and interruption by signals.

Device Terminology

A peripheral device is a piece of hardware accessed by a computer system. Common peripheral devices include disks, tapes, CD-ROMs, screens, keyboards, printers, mouse devices and network interfaces. User programs perform control and I/O to these devices through system calls to operating system modules called device drivers. A device driver hides the details of device operation and protects the device from unauthorized use. Devices of the same type may vary substantially in their operation, so to be usable, even a single-user machine needs device drivers. Some operating systems provide pseudodevice drivers to simulate devices such as terminals. Pseudoterminals, for example, simplify the handling of remote login to computer systems over a network or a modem line.

Some operating systems provide specific system calls for each type of supported device, requiring the systems programmer to learn a complex set of calls for device control. UNIX has greatly simplified the programmer device interface by providing uniform access to most devices through five functions—open, close, read, write and ioctl. All devices are represented by files, called special files, that are located in the /dev directory. Thus, disk files and other devices are named and accessed in the same way. A regular file is just an ordinary data file on disk. A block special file represents a device with characteristics similar to a disk. The device driver transfers information from a block special device in blocks or chunks, and usually such devices support the capability of retrieving a block from anywhere on the device. A character special file represents a device with characteristics similar to a terminal. The device appears to represent a stream of bytes that must be accessed in sequential order.
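The distinction between regular, block special and character special files can be checked from a program. The following is a minimal sketch, not part of the chapter's code: it assumes the POSIX stat function and the S_ISREG, S_ISBLK, S_ISCHR and S_ISDIR macros from sys/stat.h, and the helper name filetype is hypothetical.

```c
#include <sys/stat.h>

/* Hypothetical helper: describe the type of the file named by path.
   Returns "unknown" if stat fails (for example, the file does not exist). */
const char *filetype(const char *path) {
   struct stat sb;

   if (stat(path, &sb) == -1)
      return "unknown";
   if (S_ISREG(sb.st_mode))
      return "regular file";
   if (S_ISBLK(sb.st_mode))
      return "block special file";
   if (S_ISCHR(sb.st_mode))
      return "character special file";
   if (S_ISDIR(sb.st_mode))
      return "directory";
   return "other";
}
```

On many systems, calling filetype on a terminal device in /dev reports a character special file, but the exact device names vary by system.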

Reading and Writing

UNIX provides sequential access to files and other devices through the read and write functions. The read function attempts to retrieve nbyte bytes from the file or device represented by fildes into the user variable buf. You must actually provide a buffer that is large enough to hold nbyte bytes of data. (A common mistake is to provide an uninitialized pointer, buf, rather than an actual buffer.)

SYNOPSIS

   #include <unistd.h>

   ssize_t read(int fildes, void *buf, size_t nbyte);
                                                              POSIX

If successful, read returns the number of bytes actually read. If unsuccessful, read returns –1 and sets errno. The following table lists the mandatory errors for read.

errno         cause
ECONNRESET    read attempted on a socket and connection was forcibly closed by its peer
EAGAIN        O_NONBLOCK is set for file descriptor and thread would be delayed
EBADF         fildes is not a valid file descriptor open for reading
EINTR         read was terminated due to receipt of a signal and no data was transferred
EIO           process is a member of a background process group attempting to read from its controlling terminal and either process is ignoring or blocking SIGTTIN or process group is orphaned
ENOTCONN      read attempted on socket that is not connected
EOVERFLOW     the file is a regular file, nbyte is greater than 0, and the starting position exceeds offset maximum
ETIMEDOUT     read attempted on socket and transmission timeout occurred
EWOULDBLOCK   file descriptor is for socket marked O_NONBLOCK and no data is waiting to be received (EAGAIN is alternative)

A read operation for a regular file may return fewer bytes than requested if, for example, it reached end-of-file before completely satisfying the request. A read operation for a regular file returns 0 to indicate end-of-file. When special files corresponding to devices are read, the meaning of a read return value of 0 depends on the implementation and the particular device. A read operation for a pipe returns as soon as the pipe is not empty, so the number of bytes read can be less than the number of bytes requested. (Pipes are a type of communication buffer discussed in Chapter 6.) When reading from a terminal, read returns 0 when the user enters an end-of-file character. On many systems the default end-of-file character is Ctrl-D.

The ssize_t data type is a signed integer data type used for the number of bytes read, or –1 if an error occurs. On some systems, this type may be larger than an int. The size_t data type is an unsigned integer data type used for the number of bytes to read.

Example 4.1. 

The following code segment reads at most 100 bytes into buf from standard input.

char buf[100];
ssize_t bytesread;

bytesread = read(STDIN_FILENO, buf, 100);

This code does no error checking.
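One way to add the missing checks is to test the return value of read for each of the three cases described above: an error (–1), end-of-file (0) and a positive byte count. The following is a minimal sketch; the helper name checkedread is hypothetical, and it does not restart after a signal.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: read up to size bytes from fd into buf and
   report the outcome on standard error. Returns the value from read. */
ssize_t checkedread(int fd, char *buf, size_t size) {
   ssize_t bytesread;

   bytesread = read(fd, buf, size);
   if (bytesread == -1)
      perror("Failed to read");                  /* errno was set by read */
   else if (bytesread == 0)
      fprintf(stderr, "End-of-file\n");
   else                                /* may be fewer bytes than requested */
      fprintf(stderr, "Read %ld bytes\n", (long)bytesread);
   return bytesread;
}
```

A call such as checkedread(STDIN_FILENO, buf, 100) with a 100-byte array buf corresponds to the read of Example 4.1.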

The file descriptor, which represents a file or device that is open, can be thought of as an index into the process file descriptor table. The file descriptor table is in the process user area and provides access to the system information for the associated file or device.

When you execute a program from the shell, the program starts with three open streams associated with file descriptors STDIN_FILENO, STDOUT_FILENO and STDERR_FILENO. STDIN_FILENO and STDOUT_FILENO are standard input and standard output, respectively. By default, these two streams usually correspond to keyboard input and screen output. Programs should use STDERR_FILENO, the standard error device, for error messages and should never close it. In legacy code standard input, standard output and standard error are represented by 0, 1 and 2, respectively. However, you should always use their symbolic names rather than these numeric values. Section 4.6 explains how file descriptors work.

Example 4.2. 

What happens when the following code executes?

char *buf;
ssize_t bytesread;

bytesread = read(STDIN_FILENO, buf, 100);

Answer:

The code segment, which may compile without error, does not allocate space for buf. The result of read is unpredictable, but most probably it will generate a memory access violation. If buf is an automatic variable stored on the stack, it is not initialized to any particular value. Whatever that memory happens to hold is treated as the address of the buffer for reading.
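If the buffer must be referenced through a pointer, the pointer has to refer to real storage before read is called. The following sketch allocates the buffer with malloc; the helper name readalloc and its interface are assumptions made for illustration only.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: allocate size bytes, then read into that buffer.
   On success, stores the buffer in *bufp (the caller must free it) and
   returns the number of bytes read. On failure, returns -1. */
ssize_t readalloc(int fd, char **bufp, size_t size) {
   char *buf;
   ssize_t bytesread;
   int error;

   if ((buf = malloc(size)) == NULL)      /* buf now points to real storage */
      return -1;
   if ((bytesread = read(fd, buf, size)) == -1) {
      error = errno;                 /* preserve the errno value from read */
      free(buf);
      errno = error;
      return -1;
   }
   *bufp = buf;
   return bytesread;
}
```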

The readline function of Program 4.1 reads bytes, one at a time, into a buffer of fixed size until a newline character ('\n') or an error occurs. The function handles end-of-file, limited buffer size and interruption by a signal. The readline function returns the number of bytes read or –1 if an error occurs. A return value of 0 indicates an end-of-file before any characters were read. A return value greater than 0 indicates the number of bytes read. In this case, the buffer contains a string ending in a newline character. A return value of –1 indicates that errno has been set and one of the following errors occurred.

  • An error occurred on read.

  • At least one byte was read and an end-of-file occurred before a newline was read.

  • nbytes-1 bytes were read and no newline was found.

Upon successful return of a value greater than 0, the buffer contains a string ending in a newline character. If readline reads from a file that does not end with a newline character, it treats the last line read as an error. The readline function is available in the restart library of Appendix B.

Program 4.1. readline.c

The readline function returns the next line from a file.

#include <errno.h>
#include <unistd.h>

int readline(int fd, char *buf, int nbytes) {
   int numread = 0;
   int returnval;

   while (numread < nbytes - 1) {
      returnval = read(fd, buf + numread, 1);
      if ((returnval == -1) && (errno == EINTR))
         continue;
      if ( (returnval == 0) && (numread == 0) )
         return 0;
      if (returnval == 0)
         break;
      if (returnval == -1)
         return -1;
      numread++;
      if (buf[numread-1] == '\n') {
         buf[numread] = '\0';
         return numread;
      }
   }
   errno = EINVAL;
   return -1;
}

Example 4.3. 

The following code segment calls the readline function of Program 4.1 to read a line of at most 99 bytes from standard input.

int bytesread;
char mybuf[100];

bytesread = readline(STDIN_FILENO, mybuf, sizeof(mybuf));

Example 4.4. 

Under what circumstances does the readline function of Program 4.1 return a buffer with no newline character?

Answer:

This can only happen if the return value is 0 or –1. The return value of 0 indicates that nothing was read. The return of –1 indicates some type of error. In either case, the buffer may not contain a string.

The write function attempts to output nbyte bytes from the user buffer buf to the file represented by file descriptor fildes.

SYNOPSIS

  #include <unistd.h>

  ssize_t write(int fildes, const void *buf, size_t nbyte);
                                                                    POSIX

If successful, write returns the number of bytes actually written. If unsuccessful, write returns –1 and sets errno. The following table lists the mandatory errors for write.

errno         cause
ECONNRESET    write attempted on a socket that is not connected
EAGAIN        O_NONBLOCK is set for file descriptor and thread would be delayed
EBADF         fildes is not a valid file descriptor open for writing
EFBIG         attempt to write a file that exceeds implementation-defined maximum; file is a regular file, nbyte is greater than 0, and starting position exceeds offset maximum
EINTR         write was terminated due to receipt of a signal and no data was transferred
EIO           process is a member of a background process group attempting to write to controlling terminal, TOSTOP is set, process is neither blocking nor ignoring SIGTTOU and process group is orphaned
ENOSPC        no free space remaining on device containing the file
EPIPE         attempt to write to a pipe or FIFO not open for reading or that has only one end open (thread may also get SIGPIPE), or write attempted on socket shut down for writing or not connected (if not connected, also generates SIGPIPE signal)
EWOULDBLOCK   file descriptor is for socket marked O_NONBLOCK and write would block (EAGAIN is alternative)

Example 4.5. 

What can go wrong with the following code segment?

#define BLKSIZE 1024
char buf[BLKSIZE];

read(STDIN_FILENO, buf, BLKSIZE);
write(STDOUT_FILENO, buf, BLKSIZE);

Answer:

The write function assumes that the read has filled buf with BLKSIZE bytes. However, read may fail or may not read the full BLKSIZE bytes. In these two cases, write outputs garbage.

Example 4.6. 

What can go wrong with the following code segment to read from standard input and write to standard output?

#define BLKSIZE 1024
char buf[BLKSIZE];
ssize_t bytesread;

bytesread = read(STDIN_FILENO, buf, BLKSIZE);
if (bytesread > 0)
   write(STDOUT_FILENO, buf, bytesread);

Answer:

Although write uses bytesread rather than BLKSIZE, there is no guarantee that write actually outputs all of the bytes requested. Furthermore, either read or write can be interrupted by a signal. In this case, the interrupted call returns a –1 with errno set to EINTR.

Program 4.2 copies bytes from the file represented by fromfd to the file represented by tofd. The function restarts read and write if either is interrupted by a signal. Notice that the write statement specifies the buffer by a pointer, bp, rather than by a fixed address such as buf. If the previous write operation did not output all of buf, the next write operation must start from the end of the previous output. The copyfile function returns the number of bytes copied, that is, the total number of bytes written to tofd, and does not indicate whether or not an error occurred.

Example 4.7. simplecopy.c

The following program calls copyfile to copy a file from standard input to standard output.

#include <stdio.h>
#include <unistd.h>

int copyfile(int fromfd, int tofd);

int main (void) {
   int numbytes;

   numbytes = copyfile(STDIN_FILENO, STDOUT_FILENO);
   fprintf(stderr, "Number of bytes copied: %d\n", numbytes);
   return 0;
}

Example 4.8. 

What happens when you run the program of Example 4.7?

Answer:

Standard input is usually set to read one line at a time, so I/O is likely to be entered and echoed on line boundaries. The I/O continues until you enter the end-of-file character (often Ctrl-D by default) at the start of a line or you interrupt the program by entering the interrupt character (often Ctrl-C by default). Use the stty -a command to find the current settings for these characters.

Program 4.2. copyfile1.c

The copyfile function copies a file from fromfd to tofd.

#include <errno.h>
#include <unistd.h>
#define BLKSIZE 1024

int copyfile(int fromfd, int tofd) {
   char *bp;
   char buf[BLKSIZE];
   int bytesread, byteswritten;
   int totalbytes = 0;

   for (  ;  ;  ) {
      while (((bytesread = read(fromfd, buf, BLKSIZE)) == -1) &&
             (errno == EINTR)) ;         /* handle interruption by signal */
      if (bytesread <= 0)          /* real error or end-of-file on fromfd */
         break;
      bp = buf;
      while (bytesread > 0) {
         while(((byteswritten = write(tofd, bp, bytesread)) == -1 ) &&
              (errno == EINTR)) ;        /* handle interruption by signal */
         if (byteswritten <= 0)                     /* real error on tofd */
            break;
         totalbytes += byteswritten;
         bytesread -= byteswritten;
         bp += byteswritten;
      }
      if (byteswritten == -1)                       /* real error on tofd */
          break;
   }
   return totalbytes;
}

Example 4.9. 

How would you use the program of Example 4.7 to copy the file myin.dat to myout.dat?

Answer:

Use redirection. If the executable of Example 4.7 is called simplecopy, the line would be as follows.

simplecopy < myin.dat > myout.dat

The problems of restarting read and write after signals and of writing the entire amount requested occur in nearly every program using read and write. Program 4.3 shows a separate r_read function that you can use instead of read when you want to restart after a signal. Similarly, Program 4.4 shows a separate r_write function that restarts after a signal and writes the full amount requested. For convenience, a number of functions, including r_read, r_write, copyfile and readline, have been collected in a library called restart.c. The prototypes for these functions are contained in restart.h, and we include this header file when necessary. Appendix B presents the complete restart library implementation.

Program 4.3. r_read.c

The r_read function is similar to read except that it restarts itself if interrupted by a signal.

#include <errno.h>
#include <unistd.h>

ssize_t r_read(int fd, void *buf, size_t size) {
   ssize_t retval;

   while (retval = read(fd, buf, size), retval == -1 && errno == EINTR) ;
   return retval;
}

Program 4.4. r_write.c

The r_write function is similar to write except that it restarts itself if interrupted by a signal and writes the full amount requested.

#include <errno.h>
#include <unistd.h>

ssize_t r_write(int fd, void *buf, size_t size) {
   char *bufp;
   size_t bytestowrite;
   ssize_t byteswritten;
   size_t totalbytes;

   for (bufp = buf, bytestowrite = size, totalbytes = 0;
        bytestowrite > 0;
        bufp += byteswritten, bytestowrite -= byteswritten) {
      byteswritten = write(fd, bufp, bytestowrite);
      if ((byteswritten) == -1 && (errno != EINTR))
         return -1;
      if (byteswritten == -1)
         byteswritten = 0;
      totalbytes += byteswritten;
   }
   return totalbytes;
}

The functions r_read and r_write can greatly simplify programs that need to read and write while handling signals.

Program 4.5 shows the readwrite function that reads bytes from one file descriptor and writes all of the bytes read to another one. It uses a buffer of size PIPE_BUF to transfer at most PIPE_BUF bytes. This size is useful for writing to pipes since a write to a pipe of PIPE_BUF bytes or less is atomic. Program 4.6 shows a version of copyfile that uses the readwrite function. Compare this with Program 4.2.

Program 4.5. readwrite.c

A function that reads from one file descriptor and writes all the bytes read to another file descriptor.

#include <limits.h>
#include "restart.h"
#define BLKSIZE PIPE_BUF

int readwrite(int fromfd, int tofd) {
   char buf[BLKSIZE];
   int bytesread;

   if ((bytesread = r_read(fromfd, buf, BLKSIZE)) == -1)
      return -1;
   if (bytesread == 0)
      return 0;
   if (r_write(tofd, buf, bytesread) == -1)
      return -1;
   return bytesread;
}

Program 4.6. copyfile.c

A simplified implementation of copyfile that uses r_read and r_write.

#include <unistd.h>
#include "restart.h"
#define BLKSIZE 1024

int copyfile(int fromfd, int tofd) {
   char buf[BLKSIZE];
   int bytesread, byteswritten;
   int totalbytes = 0;

   for (  ;  ;  ) {
      if ((bytesread = r_read(fromfd, buf, BLKSIZE)) <= 0)
         break;
      if ((byteswritten = r_write(tofd, buf, bytesread)) == -1)
         break;
      totalbytes += byteswritten;
   }
   return totalbytes;
}

The r_write function writes all the bytes requested and restarts the write if fewer bytes are written. The r_read function restarts only if interrupted by a signal, and it often reads fewer bytes than requested. The readblock function is a version of read that continues reading until the requested number of bytes is read or an error occurs. Program 4.7 shows an implementation of readblock. The readblock function is part of the restart library. It is especially useful for reading structures.

Program 4.7. readblock.c

A function that reads a specific number of bytes.

#include <errno.h>
#include <unistd.h>

ssize_t readblock(int fd, void *buf, size_t size) {
   char *bufp;
   size_t bytestoread;
   ssize_t bytesread;
   size_t totalbytes;

   for (bufp = buf, bytestoread = size, totalbytes = 0;
        bytestoread > 0;
        bufp += bytesread, bytestoread -= bytesread) {
      bytesread = read(fd, bufp, bytestoread);
      if ((bytesread == 0) && (totalbytes == 0))
         return 0;
      if (bytesread == 0) {
         errno = EINVAL;
         return -1;
      }
      if ((bytesread) == -1 && (errno != EINTR))
         return -1;
      if (bytesread == -1)
         bytesread = 0;
      totalbytes += bytesread;
   }
   return totalbytes;
}

There are only three possibilities for the return value of readblock. The readblock function returns 0 if an end-of-file occurs before any bytes are read. This happens if the first call to read returns 0. If readblock is successful, it returns size, signifying that the requested number of bytes was successfully read. Otherwise, readblock returns –1 and sets errno. If readblock reaches the end-of-file after some, but not all, of the needed bytes have been read, readblock returns –1 and sets errno to EINVAL.

Example 4.10. 

The following code segment can be used to read a pair of integers from an open file descriptor.

struct {
   int x;
   int y;
} point;
if (readblock(fd, &point, sizeof(point)) <= 0)
   fprintf(stderr, "Cannot read a point.\n");

Program 4.8 combines readblock with r_write to read a fixed number of bytes from one open file descriptor and write them to another open file descriptor.

Program 4.8. readwriteblock.c

A function that copies a fixed number of bytes from one file descriptor to another.

#include "restart.h"

int readwriteblock(int fromfd, int tofd, char *buf, int size) {
   int bytesread;

   bytesread = readblock(fromfd, buf, size);
   if (bytesread != size)                      /* can only be 0 or -1 */
      return bytesread;
   return r_write(tofd, buf, size);
}

Opening and Closing Files

The open function associates a file descriptor (the handle used in the program) with a file or physical device. The path parameter of open points to the pathname of the file or device, and the oflag parameter specifies status flags and access modes for the opened file. You must include a third parameter to specify access permissions if you are creating a file.

SYNOPSIS

   #include <fcntl.h>
   #include <sys/stat.h>

   int open(const char *path, int oflag, ...);
                                                          POSIX

If successful, open returns a nonnegative integer representing the open file descriptor. If unsuccessful, open returns –1 and sets errno. The following table lists the mandatory errors for open.

errno          cause
EACCES         search permission on component of path prefix denied, or file exists and permissions specified by oflag denied, or file does not exist and write permission on parent directory denied, or O_TRUNC specified and write permission denied
EEXIST         O_CREAT and O_EXCL are set and named file already exists
EINTR          signal was caught during open
EISDIR         named file is directory and oflag includes O_WRONLY or O_RDWR
ELOOP          a loop exists in resolution of path
EMFILE         OPEN_MAX file descriptors currently open in calling process
ENAMETOOLONG   the length of path exceeds PATH_MAX, or a pathname component is longer than NAME_MAX
ENFILE         maximum allowable number of files currently open in system
ENOENT         O_CREAT not set and named file does not exist, or O_CREAT is set and either path prefix does not exist or path is an empty string
ENOSPC         directory or file system for new file cannot be expanded, the file does not exist and O_CREAT is specified
ENOTDIR        a component of the path prefix is not a directory
ENXIO          O_NONBLOCK is set, the named file is a FIFO, O_WRONLY is set, no process has file open for reading; file is a special file and device associated with file does not exist
EOVERFLOW      named file is a regular file and size cannot be represented by an object of type off_t
EROFS          the named file resides on a read-only file system and one of O_WRONLY, O_RDWR, O_CREAT (if the file does not exist), or O_TRUNC is set in oflag

Construct the oflag argument by taking the bitwise OR (|) of the desired combination of the access mode and the additional flags. The POSIX values for the access mode flags are O_RDONLY, O_WRONLY and O_RDWR. You must specify exactly one of these designating read-only, write-only or read-write access, respectively.

The additional flags include O_APPEND, O_CREAT, O_EXCL, O_NOCTTY, O_NONBLOCK and O_TRUNC. The O_APPEND flag causes the file offset to be moved to the end of the file before a write, allowing you to add to an existing file. In contrast, O_TRUNC truncates the length of a regular file opened for writing to 0. The O_CREAT flag causes a file to be created if it doesn’t already exist. If you include the O_CREAT flag, you must also pass a third argument to open to designate the permissions. If you want to avoid writing over an existing file, use the combination O_CREAT | O_EXCL. This combination returns an error if the file already exists. The O_NOCTTY flag prevents an opened device from becoming a controlling terminal. Controlling terminals are discussed in Section 11.5. The O_NONBLOCK flag controls whether the open returns immediately or blocks until the device is ready. Section 4.8 discusses how the O_NONBLOCK flag affects the behavior of read and write. Certain POSIX extensions specify additional flags. You can find the flags in fcntl.h.
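As an illustration of combining an access mode with additional flags, the following sketch opens a file with O_WRONLY | O_CREAT | O_APPEND so that each call adds to the end of the file and creates the file if it does not exist. The helper name appendline is hypothetical, and for brevity the sketch does not restart after a signal or retry a partial write.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: append the string line to the file named by path,
   creating the file if necessary. Returns 0 on success and -1 on failure. */
int appendline(const char *path, const char *line) {
   int fd;
   size_t len = strlen(line);
   ssize_t written;

   /* O_CREAT requires the third argument giving the permissions */
   fd = open(path, O_WRONLY | O_CREAT | O_APPEND, S_IRUSR | S_IWUSR);
   if (fd == -1)
      return -1;
   written = write(fd, line, len);   /* offset moves to the end before write */
   if (close(fd) == -1)
      return -1;
   return (written == (ssize_t)len) ? 0 : -1;
}
```

Replacing O_APPEND with O_TRUNC in this sketch would instead discard the existing contents each time the file is opened.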

Example 4.11. 

The following code segment opens the file /home/ann/my.dat for reading.

int myfd;
myfd = open("/home/ann/my.dat", O_RDONLY);

This code does no error checking.

Example 4.12. 

How can the call to open of Example 4.11 fail?

Answer:

The open function returns –1 if the file doesn’t exist, the open call was interrupted by a signal or the process doesn’t have the appropriate access permissions. If your code uses myfd for a subsequent read or write operation, the operation fails.

Example 4.13. 

The following code segment restarts open after a signal occurs.

int myfd;
while((myfd = open("/home/ann/my.dat", O_RDONLY)) == -1 &&
       errno == EINTR) ;
if (myfd == -1)               /* it was a real error, not a signal */
   perror("Failed to open the file");
else                                                /* continue on */

Example 4.14. 

How would you modify Example 4.13 to open /home/ann/my.dat for nonblocking read?

Answer:

You would OR the O_RDONLY and the O_NONBLOCK flags.

myfd = open("/home/ann/my.dat", O_RDONLY | O_NONBLOCK);

Each file has three classes associated with it: a user (or owner), a group and everybody else (others). The possible permissions or privileges are read(r), write(w) and execute(x). These privileges are specified separately for the user, the group and others. When you open a file with the O_CREAT flag, you must specify the permissions as the third argument to open in a mask of type mode_t.

Historically, the file permissions were laid out in a mask of bits with 1’s in designated bit positions of the mask, signifying that a class had the corresponding privilege. Figure 4.1 shows an example of a typical layout of such a permission mask. Although numerically coded permission masks frequently appear in legacy code, you should avoid using numerical values in your programs.

Figure 4.1. Historical layout of the permissions mask.

POSIX defines symbolic names for masks corresponding to the permission bits so that you can specify file permissions independently of the implementation. These names are defined in sys/stat.h. Table 4.1 lists the symbolic names and their meanings. To form the permission mask, bitwise OR the symbols corresponding to the desired permissions.

Table 4.1. POSIX symbolic names for file permissions.

symbol     meaning
S_IRUSR    read by owner
S_IWUSR    write by owner
S_IXUSR    execute by owner
S_IRWXU    read, write, execute by owner
S_IRGRP    read by group
S_IWGRP    write by group
S_IXGRP    execute by group
S_IRWXG    read, write, execute by group
S_IROTH    read by others
S_IWOTH    write by others
S_IXOTH    execute by others
S_IRWXO    read, write, execute by others
S_ISUID    set user ID on execution
S_ISGID    set group ID on execution

Example 4.15. 

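The following code segment creates a file, info.dat, in the current directory. If the info.dat file already exists, it is opened without being truncated because O_TRUNC is not specified, so subsequent writes overwrite its existing contents starting at the beginning of the file. The new file can be read or written by the user and only read by everyone else.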

int fd;
mode_t fdmode = (S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);

if ((fd = open("info.dat", O_RDWR | O_CREAT, fdmode)) == -1)
   perror("Failed to open info.dat");

Program 4.9 copies a source file to a destination file. Both filenames are passed as command-line arguments. Because the open function for the destination file has O_CREAT | O_EXCL, the file copy fails if that file already exists.

Program 4.9. copyfilemain.c

A program to copy a file.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>
#include "restart.h"

#define READ_FLAGS O_RDONLY
#define WRITE_FLAGS (O_WRONLY | O_CREAT | O_EXCL)
#define WRITE_PERMS (S_IRUSR | S_IWUSR)

int main(int argc, char *argv[]) {
   int bytes;
   int fromfd, tofd;

   if (argc != 3) {
      fprintf(stderr, "Usage: %s from_file to_file\n", argv[0]);
      return 1;
   }

   if ((fromfd = open(argv[1], READ_FLAGS)) == -1) {
      perror("Failed to open input file");
      return 1;
   }

   if ((tofd = open(argv[2], WRITE_FLAGS, WRITE_PERMS)) == -1) {
      perror("Failed to create output file");
      return 1;
   }

   bytes = copyfile(fromfd, tofd);
   printf("%d bytes copied from %s to %s\n", bytes, argv[1], argv[2]);
   return 0;                                   /* the return closes the files */
}

Program 4.9 returns immediately after performing the copy and does not explicitly close the files. The return from main causes the necessary cleanup to release the resources associated with open files. In general, however, you should be careful to release open file descriptors by calling close.

The close function has a single parameter, fildes, representing the open file whose resources are to be released.

SYNOPSIS

   #include <unistd.h>

   int close(int fildes);
                                     POSIX

If successful, close returns 0. If unsuccessful, close returns –1 and sets errno. The following table lists the mandatory errors for close.

errno    cause
EBADF    fildes is not a valid file descriptor
EINTR    the close function was interrupted by a signal

Program 4.10 shows an r_close function that restarts itself after interruption by a signal. Its prototype is in the header file restart.h.

Program 4.10. r_close.c

The r_close function is similar to close except that it restarts itself if interrupted by a signal.

#include <errno.h>
#include <unistd.h>

int r_close(int fd) {
   int retval;

   while (retval = close(fd), retval == -1 && errno == EINTR) ;
   return retval;
}

The select Function

The handling of I/O from multiple sources is an important problem that arises in many different forms. For example, a program may want to overlap terminal I/O with reading input from a disk or with printing. Another example occurs when a program expects input from two different sources, but it doesn’t know which input will be available first. If the program tries to read from source A, and in fact, input was only available from source B, the program blocks. To solve this problem, we need to block until input from either source becomes available. Blocking until at least one member of a set of conditions becomes true is called OR synchronization. The condition for the case described is “input available” on a descriptor.

One method of monitoring multiple file descriptors is to use a separate process for each one. Program 4.11 takes two command-line arguments, the names of two files to monitor. The parent process opens both files before creating the child process. The parent monitors the first file descriptor, and the child monitors the second. Each process echoes the contents of its file to standard output. If two named pipes are monitored, output appears as input becomes available.

Program 4.11. monitorfork.c

A program that monitors two files by forking a child process.

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include "restart.h"

int main(int argc, char *argv[]) {
   int bytesread;
   int childpid;
   int fd, fd1, fd2;

   if (argc != 3) {
      fprintf(stderr, "Usage: %s file1 file2\n", argv[0]);
      return 1;
   }
   if ((fd1 = open(argv[1], O_RDONLY)) == -1) {
      fprintf(stderr, "Failed to open file %s:%s\n", argv[1], strerror(errno));
      return 1;
   }
   if ((fd2 = open(argv[2], O_RDONLY)) == -1) {
      fprintf(stderr, "Failed to open file %s:%s\n", argv[2], strerror(errno));
      return 1;
   }
   if ((childpid = fork()) == -1) {
      perror("Failed to create child process");
      return 1;
   }
   if (childpid > 0)                                         /* parent code */
      fd = fd1;
   else
      fd = fd2;
   bytesread = copyfile(fd, STDOUT_FILENO);
   fprintf(stderr, "Bytes read: %d\n", bytesread);
   return 0;
}

While using separate processes to monitor two file descriptors can be useful, the two processes have separate address spaces and so it is difficult for them to interact.

Example 4.16. 

How would you modify Program 4.11 so that it prints the total number of bytes read from the two files?

Answer:

Set up some form of interprocess communication before creating the child. For example, the parent process could create a pipe and the child could send its byte count to the pipe when it has finished. After the parent has processed its file, the parent could wait for the child and read the byte count from the pipe.
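The pipe-based approach described in this answer might be sketched as follows. This is an illustration under stated assumptions rather than a modification of Program 4.11: the function name totalviapipe is hypothetical, and the parameters parentcount and childcount stand in for the values that each process would obtain from its own call to copyfile.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch: the child reports childcount through a pipe and
   the parent returns the sum of both counts, or -1 on error. */
long totalviapipe(long parentcount, long childcount) {
   pid_t childpid;
   int fds[2];
   ssize_t n;
   long received = 0;

   if (pipe(fds) == -1)                   /* create the pipe before forking */
      return -1;
   if ((childpid = fork()) == -1) {
      close(fds[0]);
      close(fds[1]);
      return -1;
   }
   if (childpid == 0) {                                       /* child code */
      close(fds[0]);
      n = write(fds[1], &childcount, sizeof(childcount));
      _exit(n == (ssize_t)sizeof(childcount) ? 0 : 1);
   }
   close(fds[1]);                                            /* parent code */
   while (((n = read(fds[0], &received, sizeof(received))) == -1) &&
          (errno == EINTR)) ;          /* restart if interrupted by signal */
   close(fds[0]);
   while ((waitpid(childpid, NULL, 0) == -1) && (errno == EINTR)) ;
   if (n != (ssize_t)sizeof(received))
      return -1;
   return parentcount + received;
}
```

Because the count is much smaller than PIPE_BUF bytes, the child's write is atomic and the parent can expect to receive the whole value in a single read.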

The select call provides a method of monitoring file descriptors from a single process. It can monitor for three possible conditions—a read can be done without blocking, a write can be done without blocking, or a file descriptor has error conditions pending. Older versions of UNIX defined the select function in sys/time.h, but the POSIX standard now uses sys/select.h.

The nfds parameter of select gives the range of file descriptors to be monitored. The value of nfds must be at least one greater than the largest file descriptor to be checked. The readfds parameter specifies the set of descriptors to be monitored for reading. Similarly, writefds specifies the set of descriptors to be monitored for writing, and errorfds specifies the file descriptors to be monitored for error conditions. The descriptor sets are of type fd_set. Any of these parameters may be NULL, in which case select does not monitor any descriptors for the corresponding event. The last parameter is a timeout value that forces a return from select after a certain period of time has elapsed, even if no descriptors are ready. When timeout is NULL, select may block indefinitely.

SYNOPSIS

   #include <sys/select.h>

   int select(int nfds, fd_set *restrict readfds,
              fd_set *restrict writefds, fd_set *restrict errorfds,
              struct timeval *restrict timeout);

   void FD_CLR(int fd, fd_set *fdset);
   int FD_ISSET(int fd, fd_set *fdset);
   void FD_SET(int fd, fd_set *fdset);
   void FD_ZERO(fd_set *fdset);
                                                                              POSIX

On successful return, select clears all the descriptors in each of readfds, writefds and errorfds except those descriptors that are ready. If successful, the select function returns the number of file descriptors that are ready. If unsuccessful, select returns –1 and sets errno. The following table lists the mandatory errors for select.

errno     cause
EBADF     one or more file descriptor sets specified an invalid file descriptor
EINTR     the select was interrupted by a signal before timeout or selected event occurred
EINVAL    an invalid timeout interval was specified, or nfds is less than 0 or greater than FD_SETSIZE

Historically, systems implemented the descriptor set as an integer bit mask, but that implementation does not work for more than 32 file descriptors on most systems. The descriptor sets are now usually represented by bit fields in arrays of integers. Use the macros FD_SET, FD_CLR, FD_ISSET and FD_ZERO to manipulate the descriptor sets in an implementation-independent way as demonstrated in Program 4.12.

The FD_SET macro sets the bit in *fdset corresponding to the fd file descriptor, and the FD_CLR macro clears the corresponding bit. The FD_ZERO macro clears all the bits in *fdset. Use these three macros to set up descriptor masks before calling select. Use the FD_ISSET macro after select returns, to test whether the bit corresponding to the file descriptor fd is set in the mask.
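As a small illustration of the setup step, the following sketch builds a descriptor set from an array and computes the nfds value that select expects. The function name buildset is hypothetical. Skipping out-of-range descriptors, rather than reporting an error, is a choice made only for this sketch.

```c
#include <sys/select.h>

/* Hypothetical helper: clear *setp, then add every descriptor in fd[] that
   lies in the range 0 to FD_SETSIZE - 1. Returns the value to pass as the
   nfds parameter of select (largest descriptor added plus one, or 0). */
int buildset(const int fd[], int numfds, fd_set *setp) {
   int i;
   int nfds = 0;

   FD_ZERO(setp);                            /* start with an empty set */
   for (i = 0; i < numfds; i++) {
      if ((fd[i] < 0) || (fd[i] >= FD_SETSIZE))
         continue;                 /* out of range for an fd_set, skip it */
      FD_SET(fd[i], setp);
      if (fd[i] >= nfds)
         nfds = fd[i] + 1;
   }
   return nfds;
}
```

After the call, FD_ISSET(fd[i], setp) is nonzero for each descriptor that was added, and FD_CLR(fd[i], setp) removes a single descriptor without disturbing the others.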

Example 4.12. whichisready.c

A function that blocks until one of two file descriptors is ready.

#include <errno.h>
#include <string.h>
#include <sys/select.h>

int whichisready(int fd1, int fd2) {
   int maxfd;
   int nfds;
   fd_set readset;

   if ((fd1 < 0) || (fd1 >= FD_SETSIZE) ||
       (fd2 < 0) || (fd2 >= FD_SETSIZE)) {
      errno = EINVAL;
      return -1;
   }
   maxfd = (fd1 > fd2) ? fd1 : fd2;
   FD_ZERO(&readset);
   FD_SET(fd1, &readset);
   FD_SET(fd2, &readset);
   nfds = select(maxfd+1, &readset, NULL, NULL, NULL);
   if (nfds == -1)
      return -1;
   if (FD_ISSET(fd1, &readset))
      return fd1;
   if (FD_ISSET(fd2, &readset))
      return fd2;
   errno = EINVAL;
   return -1;
}

The function whichisready blocks until at least one of the two file descriptors passed as parameters is ready for reading and returns that file descriptor. If both are ready, it returns the first file descriptor. If unsuccessful, whichisready returns –1 and sets errno.

Example 4.13. copy2files.c

A function that uses select to do two concurrent file copies.

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include "restart.h"

int copy2files(int fromfd1, int tofd1, int fromfd2, int tofd2) {
   int bytesread;
   int maxfd;
   int num;
   fd_set readset;
   int totalbytes = 0;

   if ((fromfd1 < 0) || (fromfd1 >= FD_SETSIZE) ||
       (tofd1 < 0) || (tofd1 >= FD_SETSIZE) ||
       (fromfd2 < 0) || (fromfd2 >= FD_SETSIZE) ||
       (tofd2 < 0) || (tofd2 >= FD_SETSIZE))
      return 0;
   maxfd = fromfd1;                     /* find the biggest fd for select */
   if (fromfd2 > maxfd)
      maxfd = fromfd2;

   for ( ; ; ) {
      FD_ZERO(&readset);
      FD_SET(fromfd1, &readset);
      FD_SET(fromfd2, &readset);
      if (((num = select(maxfd+1, &readset, NULL, NULL, NULL)) == -1) &&
         (errno == EINTR))
         continue;
      if (num == -1)
         return totalbytes;
      if (FD_ISSET(fromfd1, &readset)) {
         bytesread = readwrite(fromfd1, tofd1);
         if (bytesread <= 0)
            break;
         totalbytes += bytesread;
      }
      if (FD_ISSET(fromfd2, &readset)) {
         bytesread = readwrite(fromfd2, tofd2);
         if (bytesread <= 0)
            break;
         totalbytes += bytesread;
      }
   }
   return totalbytes;
}

The whichisready function of Program 4.12 is problematic because it always chooses fd1 if both fd1 and fd2 are ready. The copy2files function copies bytes from fromfd1 to tofd1 and from fromfd2 to tofd2 without making any assumptions about the order in which the bytes become available in the two directions. The function returns if either copy encounters an error or end-of-file.

The copy2files function of Program 4.13 can be generalized to monitor multiple file descriptors for input. Such a problem might be encountered by a command processor that was monitoring requests from different terminals. The program cannot predict which source will produce the next input, so it must use a method such as select. In addition, the set of monitored descriptors is dynamic—the program must remove a source from the monitoring set if an error condition arises on that source’s descriptor.

The monitorselect function in Program 4.14 monitors an array of open file descriptors fd. When input is available on file descriptor fd[i], the program reads information from fd[i] and calls docommand. The monitorselect function has two parameters: an array of open file descriptors and the number of file descriptors in the array. The function restarts the select or read if either is interrupted by a signal. When read encounters other types of errors or an end-of-file, monitorselect closes the corresponding descriptor and removes it from the monitoring set. The monitorselect function returns when all descriptors have indicated an error or end-of-file.

The waitfdtimed function in Program 4.15 takes two parameters: a file descriptor and an ending time. It uses gettimeout to calculate the timeout interval from the end time and the current time obtained by a call to gettimeofday. (See Section 9.1.3.) If select returns prematurely because of a signal, waitfdtimed recalculates the timeout and calls select again. The standard does not say anything about the value of the timeout parameter or the fd_set parameters of select when it is interrupted by a signal, so we reset them inside the while loop.

You can use the select timeout feature to implement a timed read operation, as shown in Program 4.16. The readtimed function behaves like read except that it takes an additional parameter, seconds, specifying a timeout in seconds. The readtimed function returns –1 with errno set to ETIME if no input becomes available within the interval given by seconds. If interrupted by a signal, readtimed restarts with the remaining time. Most of the complication comes from the need to restart select with the remaining time when select is interrupted by a signal. The select function does not provide a direct way of determining the time remaining in this case. The readtimed function in Program 4.16 sets the end time for the timeout by calling add2currenttime in Program 4.15. It uses this value when calling waitfdtimed from Program 4.15 to wait until the file descriptor can be read or the end time has passed.

Example 4.14. monitorselect.c

A function to monitor file descriptors using select.

#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/select.h>
#include <sys/types.h>
#include "restart.h"
#define BUFSIZE 1024
void docommand(char *, int);

void monitorselect(int fd[], int numfds) {
   char buf[BUFSIZE];
   int bytesread;
   int i;
   int maxfd;
   int numnow, numready;
   fd_set readset;

   maxfd = 0;                  /* set up the range of descriptors to monitor */
   for (i = 0; i < numfds; i++) {
       if ((fd[i] < 0) || (fd[i] >= FD_SETSIZE))
          return;
       if (fd[i] >= maxfd)
          maxfd = fd[i] + 1;
   }
   numnow = numfds;
   while (numnow > 0) {            /* continue monitoring until all are done */
      FD_ZERO(&readset);                  /* set up the file descriptor mask */
      for (i = 0; i < numfds; i++)
         if (fd[i] >= 0)
            FD_SET(fd[i], &readset);
      numready = select(maxfd, &readset, NULL, NULL, NULL);  /* which ready? */
      if ((numready == -1) && (errno == EINTR))     /* interrupted by signal */
         continue;
      else if (numready == -1)                          /* real select error */
         break;
      for (i = 0; (i < numfds) && (numready > 0); i++) { /* read and process */
         if (fd[i] == -1)                         /* this descriptor is done */
            continue;
         if (FD_ISSET(fd[i], &readset)) {        /* this descriptor is ready */
            bytesread = r_read(fd[i], buf, BUFSIZE);
            numready--;
            if (bytesread > 0)
               docommand(buf, bytesread);
            else  {           /* error occurred on this descriptor, close it */
               r_close(fd[i]);
               fd[i] = -1;
               numnow--;
            }
         }
      }
   }
   for (i = 0; i < numfds; i++)
       if (fd[i] >= 0)
           r_close(fd[i]);
}

Example 4.15. waitfdtimed.c

A function that waits for a given time for input to be available from an open file descriptor.

#include <errno.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>
#include "restart.h"
#define MILLION 1000000L
#define D_MILLION 1000000.0

static int gettimeout(struct timeval end,
                               struct timeval *timeoutp) {
   gettimeofday(timeoutp, NULL);
   timeoutp->tv_sec = end.tv_sec - timeoutp->tv_sec;
   timeoutp->tv_usec = end.tv_usec - timeoutp->tv_usec;
   if (timeoutp->tv_usec >= MILLION) {
      timeoutp->tv_sec++;
      timeoutp->tv_usec -= MILLION;
   }
   if (timeoutp->tv_usec < 0) {
      timeoutp->tv_sec--;
      timeoutp->tv_usec += MILLION;
   }
   if ((timeoutp->tv_sec < 0) ||
       ((timeoutp->tv_sec == 0) && (timeoutp->tv_usec == 0))) {
      errno = ETIME;
      return -1;
   }
   return 0;
}

struct timeval add2currenttime(double seconds) {
   struct timeval newtime;

   gettimeofday(&newtime, NULL);
   newtime.tv_sec += (int)seconds;
   newtime.tv_usec += (int)((seconds - (int)seconds)*D_MILLION + 0.5);
   if (newtime.tv_usec >= MILLION) {
      newtime.tv_sec++;
      newtime.tv_usec -= MILLION;
   }
   return newtime;
}

int waitfdtimed(int fd, struct timeval end) {
   fd_set readset;
   int retval;
   struct timeval timeout;

   if ((fd < 0) || (fd >= FD_SETSIZE)) {
      errno = EINVAL;
      return -1;
   }
   FD_ZERO(&readset);
   FD_SET(fd, &readset);
   if (gettimeout(end, &timeout) == -1)
      return -1;
   while (((retval = select(fd + 1, &readset, NULL, NULL, &timeout)) == -1)
           && (errno == EINTR)) {
      if (gettimeout(end, &timeout) == -1)
         return -1;
      FD_ZERO(&readset);
      FD_SET(fd, &readset);
   }
   if (retval == 0) {
      errno = ETIME;
      return -1;
   }
   if (retval == -1)
      return -1;
   return 0;
}

Example 4.16. readtimed.c

A function that does a timed read from an open file descriptor.

#include <sys/time.h>
#include "restart.h"

ssize_t readtimed(int fd, void *buf, size_t nbyte, double seconds) {
   struct timeval timedone;

   timedone = add2currenttime(seconds);
   if (waitfdtimed(fd, timedone) == -1)
      return (ssize_t)(-1);
   return r_read(fd, buf, nbyte);
}

Example 4.17. 

Why is it necessary to test whether newtime.tv_usec is greater than or equal to a million when it is set from the fractional part of seconds? What are the consequences of having that value equal to one million?

Answer:

Since the value is rounded to the nearest microsecond, a fraction such as 0.999999999 might round to one million when multiplied by D_MILLION. The action of functions that use struct timeval values is not specified when the tv_usec field is not strictly less than one million.
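The rounding can be checked in isolation. The sketch below repeats only the expression that add2currenttime uses for the fractional part; the function name roundedusec is hypothetical.

```c
#define D_MILLION 1000000.0

/* Hypothetical helper: the microsecond value that add2currenttime would
   add to tv_usec for a given interval, before any normalization. */
long roundedusec(double seconds) {
   return (long)((seconds - (int)seconds)*D_MILLION + 0.5);
}
```

For an interval of 1.9999996 seconds the fractional part is about 0.9999996, so the expression evaluates to roughly 1000000.1 before truncation and the function returns 1000000. Even if the current tv_usec were zero, the sum would not be a valid tv_usec value without the test in add2currenttime.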

Example 4.18. 

One way to simplify Program 4.15 is to just restart the select with the same timeout whenever it is interrupted by a signal. What is wrong with this?

Answer:

If your program receives signals regularly and the time between signals is smaller than the timeout interval, waitfdtimed never times out.

The 2000 version of POSIX introduced a new version of select called pselect. The pselect function is identical to the select function, but it uses a more precise timeout structure, struct timespec, and allows for the blocking or unblocking of signals while it is waiting for I/O to be available. The struct timespec structure is discussed in Section 9.1.4. However, at the time of writing (March 2003), none of our test operating systems supported pselect.
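On a system that does provide pselect, a call might look like the following sketch. The function name waitreadable is hypothetical, and the choice of an empty signal mask (so that no signals are blocked during the wait) is an assumption made for illustration. The sketch does not restart after a signal; it reports EINTR to the caller.

```c
#include <errno.h>
#include <signal.h>
#include <sys/select.h>
#include <time.h>

/* Hypothetical helper: wait up to sec seconds plus nsec nanoseconds for fd
   to become readable. Returns 1 if fd is ready, 0 if the wait timed out,
   and -1 with errno set on error (including EINTR). */
int waitreadable(int fd, long sec, long nsec) {
   sigset_t mask;
   fd_set readset;
   struct timespec timeout;

   if ((fd < 0) || (fd >= FD_SETSIZE)) {
      errno = EINVAL;
      return -1;
   }
   if (sigemptyset(&mask) == -1)    /* no signals blocked during the wait */
      return -1;
   timeout.tv_sec = sec;
   timeout.tv_nsec = nsec;             /* nanoseconds, not microseconds */
   FD_ZERO(&readset);
   FD_SET(fd, &readset);
   return pselect(fd + 1, &readset, NULL, NULL, &timeout, &mask);
}
```

Because only one descriptor is in the set, a positive return value from pselect can only be 1.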

The poll Function

The poll function is similar to select, but it organizes the information by file descriptor rather than by type of condition. That is, the possible events for one file descriptor are stored in a struct pollfd. In contrast, select organizes information by the type of event and has separate descriptor masks for read, write and error conditions. The poll function is part of the POSIX:XSI Extension and has its origins in UNIX System V.

The poll function takes three parameters: fds, nfds and timeout. The fds parameter is an array of struct pollfd, representing the monitoring information for the file descriptors. The nfds parameter gives the number of descriptors to be monitored. The timeout value is the time in milliseconds that the poll should wait without receiving an event before returning. If the timeout value is –1, poll never times out. Because timeout is an int, the maximum timeout period with 32-bit integers is 2,147,483,647 milliseconds, or about 24.8 days.

SYNOPSIS

  #include <poll.h>

  int poll(struct pollfd fds[], nfds_t nfds, int timeout);
                                                                      POSIX:XSI

The poll function returns 0 if it times out. If successful, poll returns the number of descriptors that have events. If unsuccessful, poll returns –1 and sets errno. The following table lists the mandatory errors for poll.

errno     cause
EAGAIN    allocation of internal data structures failed, but a subsequent request may succeed
EINTR     a signal was caught during poll
EINVAL    nfds is greater than OPEN_MAX

The struct pollfd structure includes the following members.

int fd;         /* file descriptor */
short events;   /* requested events */
short revents;  /* returned events */

The fd member is the file descriptor number, and the events and revents members are constructed by taking the bitwise OR of flags representing the various events listed in Table 4.2. Set events to contain the events to monitor; poll fills in the revents with the events that have occurred. The poll function sets the POLLHUP, POLLERR and POLLNVAL flags in revents to reflect the existence of the associated conditions. You do not need to set the corresponding bits in events for these. If fd is less than zero, the events field is ignored and revents is set to zero. The standard does not specify how end-of-file is to be handled. End-of-file can either be communicated by an revents flag of POLLHUP or a normal read of 0 bytes. It is possible for POLLHUP to be set even if POLLIN or POLLRDNORM indicates that there is still data to read. Therefore, normal reading should be handled before error checking.

Table 4.2. Values of the event flags for the poll function.

event flag    meaning
POLLIN        read other than high priority data without blocking
POLLRDNORM    read normal data without blocking
POLLRDBAND    read priority data without blocking
POLLPRI       read high-priority data without blocking
POLLOUT       write normal data without blocking
POLLWRNORM    same as POLLOUT
POLLERR       error occurred on the descriptor
POLLHUP       device has been disconnected
POLLNVAL      file descriptor invalid
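The relationship between events and revents can be seen with a single descriptor. The following is a minimal sketch; the function name pollone is hypothetical, and it makes one poll call without restarting after a signal.

```c
#include <poll.h>

/* Hypothetical helper: poll one descriptor for the requested events.
   Returns the revents flags filled in by poll (0 if the call timed out),
   or -1 with errno set if poll itself fails. */
int pollone(int fd, short events, int timeoutms) {
   struct pollfd pfd;

   pfd.fd = fd;
   pfd.events = events;                       /* input: what to watch for */
   pfd.revents = 0;                           /* output: what happened */
   if (poll(&pfd, 1, timeoutms) == -1)
      return -1;
   return pfd.revents;
}
```

A caller tests the result with the bitwise AND operator, for example pollone(fd, POLLIN, 0) & POLLIN. A descriptor that is not open produces POLLNVAL in the result even though POLLNVAL was never requested in events.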

Program 4.17 implements a function to process commands from multiple file descriptors by using the poll function. Compare the implementation with that of Program 4.14. The select call modifies the file descriptor sets that are passed to it, and the program must reset these descriptor sets each time it calls select. The poll function has a number of advantages. It uses separate members for input and return values (events and revents), so the list of monitored descriptors does not need to be reset after each call. Unlike select, the poll function treats errors as events that cause poll to return. The timeout parameter is easier to use, although its range is limited. Finally, poll does not need an argument based on the largest descriptor value, as select does with nfds.

Example 4.17. monitorpoll.c

A function to monitor an array of file descriptors by using poll.

#include <errno.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>
#include "restart.h"
#define BUFSIZE 1024

void docommand(char *, int);

void monitorpoll(int fd[], int numfds)  {
   char buf[BUFSIZE];
   int bytesread;
   int i;
   int numnow = 0;
   int numready;
   struct pollfd *pollfd;

   for (i=0; i< numfds; i++)             /* initialize the polling structure */
      if (fd[i] >= 0)
          numnow++;
   if ((pollfd = (void *)calloc(numfds, sizeof(struct pollfd))) == NULL)
      return;
   for (i = 0; i < numfds; i++) {
      (pollfd + i)->fd = *(fd + i);
      (pollfd + i)->events = POLLRDNORM;
   }
   while (numnow > 0) {        /* Continue monitoring until descriptors done */
      numready = poll(pollfd, numfds, -1);
      if ((numready == -1) && (errno == EINTR))
         continue;                /* poll interrupted by a signal, try again */
      else if (numready == -1)            /* real poll error, can't continue */
         break;
      for (i = 0; i < numfds && numready > 0; i++)  {
         if ((pollfd + i)->revents) {
            if ((pollfd + i)->revents & (POLLRDNORM | POLLIN) ) {
               bytesread = r_read(fd[i], buf, BUFSIZE);
               numready--;
               if (bytesread > 0)
                  docommand(buf, bytesread);
               else
                  bytesread = -1;                             /* end of file */
            } else if ((pollfd + i)->revents & (POLLERR | POLLHUP | POLLNVAL))
               bytesread = -1;
            else                    /* descriptor not involved in this round */
               bytesread = 0;
            if (bytesread == -1) {      /* error occurred, remove descriptor */
               r_close(fd[i]);
               (pollfd + i)->fd = -1;
               numnow--;
            }
         }
      }
   }
   for (i = 0; i < numfds; i++)
       if ((pollfd + i)->fd >= 0)         /* skip descriptors already closed */
           r_close(fd[i]);
   free(pollfd);
}

File Representation

Files are designated within C programs either by file pointers or by file descriptors. The standard I/O library functions for ISO C (fopen, fscanf, fprintf, fread, fwrite, fclose and so on) use file pointers. The UNIX I/O functions (open, read, write, close and ioctl) use file descriptors. File pointers and file descriptors provide logical designations called handles for performing device-independent input and output. The symbolic names for the file pointers that represent standard input, standard output and standard error are stdin, stdout and stderr, respectively. These symbolic names are defined in stdio.h. The symbolic names for the file descriptors that represent standard input, standard output and standard error are STDIN_FILENO, STDOUT_FILENO and STDERR_FILENO, respectively. These symbolic names are defined in unistd.h.
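The two kinds of handle can be related with the POSIX fileno function, which is not part of ISO C and returns the file descriptor underlying a file pointer. The following sketch uses a hypothetical function name.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the three standard file pointers are
   layered on the three standard file descriptors, and 0 otherwise. */
int standardhandlesmatch(void) {
   return (fileno(stdin) == STDIN_FILENO) &&
          (fileno(stdout) == STDOUT_FILENO) &&
          (fileno(stderr) == STDERR_FILENO);
}
```

In a program that has not reopened or closed its standard streams, the function is expected to return 1.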

Example 4.19. 

Explain the difference between a library function and a system call.

Answer:

The POSIX standard does not make a distinction between library functions and system calls. Traditionally, a library function is an ordinary function that is placed in a collection of functions called a library, usually because it is useful, widely used or part of a specification, such as C. A system call is a request to the operating system for service. It involves a trap to the operating system and often a context switch. System calls are associated with particular operating systems. Many library functions such as read and write are, in fact, jackets for system calls. That is, they reformat the arguments in the appropriate system-dependent form and then call the underlying system call to perform the actual operation.

Although the implementation details differ, versions of UNIX follow a similar implementation model for handling file descriptors and file pointers within a process. The remainder of this section provides a schematic model of how file descriptors (UNIX I/O) and file pointers (ISO C I/O) work. We use this model to explain redirection (Section 4.7) and inheritance (Section 4.6.3, Section 6.2 and Chapter 7).

File descriptors

The open function associates a file or physical device with the logical handle used in the program. The file or physical device is specified by a character string (e.g., /home/johns/my.dat or /dev/tty). The handle is an integer that can be thought of as an index into a file descriptor table that is specific to a process. The table contains an entry for each open file in the process. The file descriptor table is part of the process user area, but the program cannot access it except through functions using the file descriptor.

Example 4.20. 

Figure 4.2 shows a schematic of the file descriptor table after a program executes the following.

myfd = open("/home/ann/my.dat", O_RDONLY);

The open function creates an entry in the file descriptor table that points to an entry in the system file table. The open function returns the value 3, specifying that the file descriptor entry is in position three of the process file descriptor table.

Figure 4.2. Schematic diagram of the relationship between the file descriptor table, the system file table and the in-memory inode table in a UNIX-like operating system after the code of Example 4.20 executes.

The system file table, which is shared by all the processes in the system, has an entry for each active open. Each system file table entry contains the file offset, an indication of the access mode (i.e., read, write or read-write) and a count of the number of file descriptor table entries pointing to it.

Several system file table entries may correspond to the same physical file. Each of these entries points to the same entry in the in-memory inode table. The in-memory inode table contains an entry for each active file in the system. When a program opens a particular physical file that is not currently open, the call creates an entry in this inode table for that file. Figure 4.2 shows that the file /home/ann/my.dat had been opened before the code of Example 4.20 because there are two entries in the system file table with pointers to the entry in the inode table. (The label B designates the earlier pointer in the figure.)
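The effect of separate system file table entries can be observed from a program. In the sketch below, the function name offsetdemo is hypothetical. It also uses the POSIX dup function, which this section has not introduced, as one way to obtain a second file descriptor table entry that points to an existing system file table entry.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open path twice, duplicate the first descriptor,
   and read one byte through each of the three descriptors in turn.
   Stores the bytes in out[0], out[1] and out[2].
   Returns 0 on success and -1 on failure. */
int offsetdemo(const char *path, char out[3]) {
   int fda, fdb, fdc;
   int ok;

   if ((fda = open(path, O_RDONLY)) == -1)
      return -1;
   if ((fdb = open(path, O_RDONLY)) == -1) {  /* second active open */
      close(fda);
      return -1;
   }
   if ((fdc = dup(fda)) == -1) {    /* second descriptor, same open as fda */
      close(fda);
      close(fdb);
      return -1;
   }
   ok = (read(fda, &out[0], 1) == 1) &&   /* offset of first open: 0 to 1 */
        (read(fdb, &out[1], 1) == 1) &&   /* own entry, own offset: 0 to 1 */
        (read(fdc, &out[2], 1) == 1);     /* shares fda's offset: 1 to 2 */
   close(fda);
   close(fdb);
   close(fdc);
   return ok ? 0 : -1;
}
```

For a file that begins with abcdefg, the two independent opens both deliver a, while the duplicated descriptor delivers b because it shares the offset of the first open.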

Example 4.21. 

What happens when the process whose file descriptor table is shown in Figure 4.2 executes the close(myfd) function?

Answer:

The operating system deletes the fourth entry in the file descriptor table and the corresponding entry in the system file table. (See Section 4.6.3 for a more complete discussion.) If the operating system also deleted the inode table entry, it would leave pointer B hanging in the system file table. Therefore, the inode table entry must have a count of the system file table entries that are pointing to it. When a process executes the close function, the operating system decrements the count in the inode entry. If the inode entry has a 0 count, the operating system deletes the inode entry from memory. (The operating system might not actually delete the entry right away on the chance that it will be accessed again in the immediate future.)

Example 4.22. 

The system file table entry contains an offset that gives the current position in the file. If two processes have each opened a file for reading, each process has its own offset into the file and reads the entire file independently of the other process. What happens if each process opens the same file for write? What would happen if the file offset were stored in the inode table instead of the system file table?

Answer:

The writes are independent of each other. Each user can write over what the other user has written because of the separate file offsets for each process. On the other hand, if the offsets were stored in the inode table rather than in the system file table, the writes from different active opens would be consecutive. Also, the processes that had opened a file for reading would only read parts of the file because the file offset they were using could be updated by other processes.

Example 4.23. 

Suppose a process opens a file for reading and then forks a child process. Both the parent and child can read from the file. How are reads by these two processes related? What about writes?

Answer:

The child receives a copy of the parent’s file descriptor table at the time of the fork. The processes share a system file table entry and therefore also share the file offset. The two processes read different parts of the file. If no other processes have the file open, writes append to the end of the file and no data is lost on writes. Subsection 4.6.3 covers this situation in more detail.

File pointers and buffering

The ISO C standard I/O library uses file pointers rather than file descriptors as handles for I/O. A file pointer points to a data structure called a FILE structure in the user area of the process.

Example 4.24. 

The following code segment opens the file /home/ann/my.dat for output and then writes a string to the file.

FILE *myfp;

if ((myfp = fopen("/home/ann/my.dat", "w")) == NULL)
   perror("Failed to open /home/ann/my.dat");
else
   fprintf(myfp, "This is a test");

Figure 4.3 shows a schematic of the FILE structure allocated by the fopen call of Example 4.24. The FILE structure contains a buffer and a file descriptor value. The file descriptor value is the index of the entry in the file descriptor table that is actually used to output the file to disk. In some sense the file pointer is a handle to a handle.

Figure 4.3. Schematic handling of a file pointer after fopen.

What happens when the program calls fprintf? The result depends on the type of file that was opened. Disk files are usually fully buffered, meaning that the fprintf does not actually write the "This is a test" message to disk, but instead writes the bytes to a buffer in the FILE structure. When the buffer fills, the I/O subsystem calls write with the file descriptor, as in the previous section. The delay between the time when a program executes fprintf and the time when the writing actually occurs may have interesting consequences, especially if the program crashes. Buffered data is sometimes lost on system crashes, so it is even possible for a program to appear to complete normally while its disk output is incomplete.

How can a program avoid the effects of buffering? An fflush call forces whatever has been buffered in the FILE structure to be written out. A program can also call setvbuf to disable buffering.
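The effect of fflush can be observed by asking the operating system for the file size before and after the call. The following sketch uses hypothetical function names and a temporary file from tmpfile. The size before the flush is typically 0 for a short message, but the exact value depends on the buffering chosen by the library.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: size of the file underlying fp as the operating
   system currently reports it, or -1 on error. */
static long sizeseenbyos(FILE *fp) {
   struct stat sb;

   if (fstat(fileno(fp), &sb) == -1)
      return -1;
   return (long)sb.st_size;
}

/* Hypothetical helper: write msg to a new temporary file and record the
   reported file size before and after fflush. Returns 0 on success. */
int flushdemo(const char *msg, long *before, long *after) {
   FILE *fp;

   if ((fp = tmpfile()) == NULL)
      return -1;
   if (fputs(msg, fp) == EOF) {
      fclose(fp);
      return -1;
   }
   *before = sizeseenbyos(fp);      /* msg may still be in the FILE buffer */
   if (fflush(fp) == EOF) {
      fclose(fp);
      return -1;
   }
   *after = sizeseenbyos(fp);       /* the buffer has been passed to write */
   fclose(fp);
   return 0;
}
```

Calling setvbuf(fp, NULL, _IONBF, 0) immediately after the file is opened, before any other operation on fp, would make the two sizes agree without an explicit fflush.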

Terminal I/O works a little differently. Files associated with terminals are line buffered rather than fully buffered (except for standard error, which, by default, is not buffered). On output, line buffering means that the line is not written out until the buffer is full or until a newline symbol is encountered.

Example 4.25. bufferout.c

How does the output appear when the following program executes?

#include <stdio.h>

int main(void) {
   fprintf(stdout, "a");
   fprintf(stderr, "a has been written\n");
   fprintf(stdout, "b");
   fprintf(stderr, "b has been written\n");
   fprintf(stdout, "\n");
   return 0;
}

Answer:

The messages written to standard error appear before the 'a' and 'b' because standard output is line buffered, whereas standard error is not buffered.

Example 4.26. bufferinout.c

How does the output appear when the following program executes?

#include <stdio.h>

int main(void) {
   int i;
   fprintf(stdout, "a");
   scanf("%d", &i);
   fprintf(stderr, "a has been written\n");
   fprintf(stdout, "b");
   fprintf(stderr, "b has been written\n");
   fprintf(stdout, "\n");
   return 0;
}

Answer:

The scanf function flushes the buffer for stdout, so 'a' is displayed before the number is read in. After the number has been entered, 'b' still appears after the "b has been written" message.

The issue of buffering is more subtle than the previous discussion might lead you to believe. If a program that uses file pointers for a buffered device crashes, the last partial buffer created from the fprintf calls may never be written out. When the buffer is full, a write operation is performed. Completion of a write operation does not mean that the data actually made it to disk. In fact, the operating system copies the data to a system buffer cache. Periodically, the operating system writes these dirty blocks to disk. If the operating system crashes before it writes the block to disk, the program still loses the data. Presumably, a system crash is less likely to happen than an individual program crash.
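A program that needs stronger assurance can deal with both layers of buffering in turn. The sketch below uses a hypothetical function name and the POSIX fsync function, which this section does not discuss; treat its use here as an assumption. Even after fsync returns, a write cache in the storage hardware may still hold the data.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: move pending output for fp out of the FILE buffer
   and then ask the operating system to write its cached copy to the
   storage device. Returns 0 on success and -1 on failure. */
int flushtodisk(FILE *fp) {
   if (fflush(fp) == EOF)            /* FILE buffer to operating system */
      return -1;
   if (fsync(fileno(fp)) == -1)      /* system buffer cache to device */
      return -1;
   return 0;
}
```

The order matters: calling fsync first would leave any bytes still held in the FILE buffer untouched.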

Inheritance of file descriptors

When fork creates a child, the child inherits a copy of most of the parent’s environment and context, including the signal state, the scheduling parameters and the file descriptor table. The implications of inheritance are not always obvious. Because children receive a copy of their parent’s file descriptor table at the time of the fork, the parent and children share the same file offsets for files that were opened by the parent prior to the fork.

Example 4.27. openfork.c

In the following program, the child inherits the file descriptor for my.dat. Each process reads and outputs one character from the file.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

int main(void) {
   char c = '!';
   int myfd;

   if ((myfd = open("my.dat", O_RDONLY)) == -1) {
      perror("Failed to open file");
      return 1;
   }
   if (fork() == -1) {
      perror("Failed to fork");
      return 1;
   }
   read(myfd, &c, 1);
   printf("Process %ld got %c\n", (long)getpid(), c);
   return 0;
}

Figure 4.4 shows the parent and child file descriptor tables for Example 4.27. The file descriptor table entries of the two processes point to the same entry in the system file table. The parent and child therefore share the file offset, which is stored in the system file table.

Figure 4.4. If the parent opens my.dat before forking, both parent and child share the system file table entry.

Example 4.28. 

Suppose the first few bytes in the file my.dat are abcdefg. What output would be generated by Example 4.27?

Answer:

Since the two processes share the file offset, the first one to read gets a and the second one to read gets b. Two lines are generated in the following form.

Process nnn got a
Process mmm got b

In theory, the lines could be output in either order but most likely would appear in the order shown.
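The order of the two reads can be made deterministic by having the parent wait for the child before reading. The following sketch is a variation on Example 4.27 with a hypothetical function name; it returns the parent's byte instead of printing it.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: open path, fork, let the child read one byte, and
   then read one byte in the parent through the same open.
   Returns the byte read by the parent, or -1 on any failure. */
int parentreadsafterchild(const char *path) {
   char c;
   pid_t childpid;
   int myfd;
   int status = 0;
   pid_t waited;

   if ((myfd = open(path, O_RDONLY)) == -1)
      return -1;
   if ((childpid = fork()) == -1) {
      close(myfd);
      return -1;
   }
   if (childpid == 0)                                       /* child code */
      _exit((read(myfd, &c, 1) == 1) ? 0 : 1);   /* moves the shared offset */
   while (((waited = waitpid(childpid, &status, 0)) == -1) &&
          (errno == EINTR))
      ;                                    /* parent: child reads first */
   if ((waited == -1) || !WIFEXITED(status) || (WEXITSTATUS(status) != 0) ||
       (read(myfd, &c, 1) != 1)) {
      close(myfd);
      return -1;
   }
   close(myfd);
   return (unsigned char)c;
}
```

If my.dat begins with abcdefg, the child consumes a and the function returns 'b', because the child's read advanced the offset in the shared system file table entry.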

Example 4.29. 

When a program closes a file, the entry in the file descriptor table is freed. What about the corresponding entry in the system file table?

Answer:

The system file table entry can only be freed if no more file descriptor table entries are pointing to it. For this reason, each system file table entry contains a count of the number of file descriptor table entries that are pointing to it. When a process closes a file, the operating system decrements the count and deletes the entry only when the count becomes 0.
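The effect of the count can be seen within a single process. The following sketch assumes the POSIX dup function, which this section has not introduced, to create a second file descriptor table entry for the same system file table entry. The function name is hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Open path, duplicate the descriptor, close the original, and read one
   byte through the duplicate. Returns the number of bytes read (1 if the
   system file table entry is still usable) or -1 on failure. */
ssize_t read_after_closing_original(const char *path, char *c) {
   int fd, fdcopy;
   ssize_t n;

   if ((fd = open(path, O_RDONLY)) == -1)
      return -1;
   if ((fdcopy = dup(fd)) == -1) {     /* two entries now point to the file */
      close(fd);
      return -1;
   }
   if (close(fd) == -1) {              /* one entry remains, so the file stays open */
      close(fdcopy);
      return -1;
   }
   n = read(fdcopy, c, 1);
   close(fdcopy);                      /* last reference is released here */
   return n;
}
```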

Example 4.30. 

How does fork affect the system file table?

Answer:

The system file table is in system space and is not duplicated by fork. However, each entry in the system file table keeps a count of the number of file descriptor table entries pointing to it. These counts must be adjusted to reflect the new file descriptor table created for the child.

Example 4.31. forkopen.c

In the following program, the parent and child each open my.dat for reading, read one character, and output that character.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

int main(void) {
   char c = '!';
   int myfd;

   if (fork() == -1) {
      perror("Failed to fork");
      return 1;
   }
   if ((myfd = open("my.dat", O_RDONLY)) == -1) {
      perror("Failed to open file");
      return 1;
   }
   read(myfd, &c, 1);
   printf("Process %ld got %c\n", (long)getpid(), c);
   return 0;
}

Figure 4.5 shows the file descriptor tables for Example 4.31. The file descriptor table entries corresponding to my.dat point to different system file table entries. Consequently, the parent and child do not share the file offset. The child does not inherit the file descriptor, because each process opens the file after the fork and each open creates a new entry in the system file table. The parent and child still share system file table entries for standard input, standard output and standard error.

Figure 4.5. If the parent and child open my.dat after the fork call, their file descriptor table entries point to different system file table entries.

Example 4.32. 

Suppose the first few bytes in the file my.dat are abcdefg. What output would be generated by Example 4.31?

Answer:

Since the two processes use different file offsets, each process reads the first byte of the file. Two lines are generated in the following form.

Process nnn got a
Process mmm got a
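Independent offsets do not require two processes. As a rough illustration, the hypothetical function below opens the same file twice in one process. Each open creates its own system file table entry, so each descriptor starts reading at the beginning of the file.

```c
#include <fcntl.h>
#include <unistd.h>

/* Open path twice and read one byte through each descriptor.
   Returns 0 on success and -1 on failure. */
int first_bytes_two_opens(const char *path, char *c1, char *c2) {
   int fd1, fd2;
   int result = -1;

   if ((fd1 = open(path, O_RDONLY)) == -1)
      return -1;
   if ((fd2 = open(path, O_RDONLY)) == -1) {
      close(fd1);
      return -1;
   }
   /* the read on fd1 does not move the offset used by fd2 */
   if (read(fd1, c1, 1) == 1 && read(fd2, c2, 1) == 1)
      result = 0;
   close(fd1);
   close(fd2);
   return result;
}
```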

Example 4.33. fileiofork.c

What output would be generated by the following program?

#include <stdio.h>
#include <unistd.h>

int main(void) {
   printf("This is my output.");
   fork();
   return 0;
}

Answer:

Because of buffering, the output of printf is likely to be written to the buffer corresponding to stdout, but not to the actual output device. Since this buffer is part of the user space, it is duplicated by fork. When the parent and the child each terminate, the return from main causes the buffers to be flushed as part of the cleanup. The output appears as follows.

This is my output.This is my output.

Example 4.34. fileioforkline.c

What output would be generated by the following program?

#include <stdio.h>
#include <unistd.h>

int main(void) {
   printf("This is my output.\n");
   fork();
   return 0;
}

Answer:

The buffering of standard output is usually line buffering. This means that the buffer is flushed when it contains a newline. Since in this case a newline is output, the buffer will probably be flushed before the fork and only one line of output will appear.

Filters and Redirection

UNIX provides a large number of utilities that are written as filters. A filter reads from standard input, performs a transformation, and outputs the result to standard output. Filters write their error messages to standard error. All of the parameters of a filter are communicated as command-line arguments. The input data should have no headers or trailers, and a filter should not require any interaction with the user.

Examples of useful UNIX filters include head, tail, more, sort, grep and awk. The cat command takes a list of filenames as command-line arguments, reads each of the files in succession, and echoes the contents of each file to standard output. However, if no input file is specified, cat takes its input from standard input and writes its results to standard output. In this case, cat behaves like a filter.

Recall that a file descriptor is an index into the file descriptor table of that process. Each entry in the file descriptor table points to an entry in the system file table, which is created when the file is opened. A program can modify the file descriptor table entry so that it points to a different entry in the system file table. This action is known as redirection. Most shells interpret the greater than character (>) on the command line as redirection of standard output and the less than character (<) as redirection of standard input. (Associate > with output by picturing it as an arrow pointing in the direction of the output file.)

Example 4.35. 

The cat command with no command-line arguments reads from standard input and echoes to standard output. The following command redirects standard output to my.file with >.

cat > my.file

The cat command of Example 4.35 gathers what is typed from the keyboard into the file my.file. Figure 4.6 depicts the file descriptor table for Example 4.35. Before redirection, entry [1] of the file descriptor table points to a system file table entry corresponding to the usual standard output device. After the redirection, entry [1] points to a system file table entry for my.file.

Figure 4.6. Status of the file descriptor table before and after redirection for the process that is executing cat > my.file.

The redirection of standard output in cat > my.file occurs because the shell changes the standard output entry of the file descriptor table (a pointer to the system file table) to reference a system file table entry associated with my.file. To accomplish this redirection in a C program, first open my.file to establish an appropriate entry in the system file table. After the open operation, copy the pointer to my.file into the entry for standard output by executing the dup2 function. Then, call close to eliminate the extra file descriptor table entry for my.file.

The dup2 function takes two parameters, fildes and fildes2. It closes entry fildes2 of the file descriptor table if it was open and then copies the pointer of entry fildes into entry fildes2.

SYNOPSIS

  #include <unistd.h>

  int dup2(int fildes, int fildes2);
                                                  POSIX

On success, dup2 returns fildes2, the value of the descriptor that now duplicates fildes. On failure, dup2 returns –1 and sets errno. The following table lists the mandatory errors for dup2.

errno    cause
EBADF    fildes is not a valid open file descriptor, or fildes2 is negative or greater than or equal to OPEN_MAX
EINTR    dup2 was interrupted by a signal

Example 4.36. 

Program 4.18 redirects standard output to the file my.file and then appends a short message to that file.

Figure 4.7 shows the effect of the redirection on the file descriptor table of Program 4.18. The open function causes the operating system to create a new entry in the system file table and to set entry [3] of the file descriptor table to point to this entry. The dup2 function closes the descriptor corresponding to the second parameter (standard output) and then copies the entry corresponding to the first parameter (fd) into the entry corresponding to the second parameter (STDOUT_FILENO). From that point on in the program, a write to standard output goes to my.file.

Figure 4.7. Status of the file descriptor table during the execution of Program 4.18.

Program 4.18. redirect.c

A program that redirects standard output to the file my.file.

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>
#include "restart.h"
#define CREATE_FLAGS (O_WRONLY | O_CREAT | O_APPEND)
#define CREATE_MODE (S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH)

int main(void) {
   int fd;

   fd = open("my.file", CREATE_FLAGS, CREATE_MODE);
   if (fd == -1) {
       perror("Failed to open my.file");
       return 1;
   }
   if (dup2(fd, STDOUT_FILENO) == -1) {
      perror("Failed to redirect standard output");
      return 1;
   }
   if (r_close(fd) == -1) {
      perror("Failed to close the file");
      return 1;
   }
   if (write(STDOUT_FILENO, "OK", 2) == -1) {
      perror("Failed in writing to file");
      return 1;
   }
   return 0;
}
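Program 4.18 redirects its own standard output. A shell that interprets a command such as cat > my.file performs the same steps in a child process before executing the command. The following is a minimal sketch of that idea under stated assumptions: the function name, the exit codes 126 and 127, and the choice of O_TRUNC are illustrative and are not taken from any particular shell.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define OUT_FLAGS (O_WRONLY | O_CREAT | O_TRUNC)
#define OUT_MODE (S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH)

/* Run argv[0] with standard output redirected to outfile. Returns the exit
   status of the command, or -1 if the fork or wait failed or the command
   did not exit normally. */
int run_redirected(char *const argv[], const char *outfile) {
   pid_t childpid;
   int fd, status;

   if ((childpid = fork()) == -1)
      return -1;
   if (childpid == 0) {                     /* only the child is redirected */
      if ((fd = open(outfile, OUT_FLAGS, OUT_MODE)) == -1)
         _exit(126);
      if (fd != STDOUT_FILENO) {
         if (dup2(fd, STDOUT_FILENO) == -1)
            _exit(126);
         close(fd);                         /* drop the extra table entry */
      }
      execvp(argv[0], argv);
      _exit(127);                           /* reached only if exec failed */
   }
   if (waitpid(childpid, &status, 0) != childpid || !WIFEXITED(status))
      return -1;
   return WEXITSTATUS(status);
}
```

Because the redirection happens after the fork, the file descriptor table of the calling process is unchanged. The command itself needs no knowledge of the redirection, since it simply writes to descriptor 1.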

File Control

The fcntl function is a general-purpose function for retrieving and modifying the flags associated with an open file descriptor. The fildes argument of fcntl specifies the descriptor, and the cmd argument specifies the operation. The fcntl function may take additional parameters depending on the value of cmd.

SYNOPSIS

  #include <fcntl.h>
  #include <unistd.h>
  #include <sys/types.h>

  int fcntl(int fildes, int cmd, /* arg */ ...);
                                                              POSIX

The interpretation of the return value of fcntl depends on the value of the cmd parameter. However, if unsuccessful, fcntl returns –1 and sets errno. The following table lists the mandatory errors for fcntl.

errno        cause
EACCES       cmd is F_SETLK and locking not allowed
EBADF        fildes is not a valid open file descriptor or file is not opened properly for type of lock
EINTR        cmd is F_SETLKW and function interrupted by a signal
EINVAL       cmd is invalid, or cmd is F_DUPFD and arg is negative or greater than or equal to OPEN_MAX, or cmd is a locking function and arg is invalid, or fildes refers to a file that does not support locking
EMFILE       cmd is F_DUPFD and OPEN_MAX descriptors for process are open, or no file descriptors greater than or equal to arg are available
ENOLCK       cmd is F_SETLK or F_SETLKW and locks would exceed limit
EOVERFLOW    one of values to be returned cannot be represented correctly, or requested lock offset cannot be represented in off_t

The fcntl function may only be interrupted by a signal when the cmd argument is F_SETLKW (block until the lock request is satisfied). In this case, fcntl returns –1 and sets errno to EINTR. Table 4.3 lists the POSIX values of the cmd parameter for fcntl.

An important example of the use of file control is to change an open file descriptor to use nonblocking I/O. When a file descriptor has been set for nonblocking I/O, the read and write functions return –1 and set errno to EAGAIN to report that the process would be delayed if a blocking I/O operation were tried. Nonblocking I/O is useful for monitoring multiple file descriptors while doing other work. Section 4.4 and Section 4.5 discuss the select and poll functions that allow a process to block until any of a set of descriptors becomes available. However, both of these functions block while waiting for I/O, so no other work can be done during the wait.

Table 4.3. POSIX values for cmd as specified in fcntl.h.

cmd         meaning
F_DUPFD     duplicate a file descriptor
F_GETFD     get file descriptor flags
F_SETFD     set file descriptor flags
F_GETFL     get file status flags and access modes
F_SETFL     set file status flags and access modes
F_GETOWN    if fildes is a socket, get process or group ID for out-of-band signals
F_SETOWN    if fildes is a socket, set process or group ID for out-of-band signals
F_GETLK     get first lock that blocks description specified by arg
F_SETLK     set or clear segment lock specified by arg
F_SETLKW    same as F_SETLK except it blocks until request satisfied
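As a small illustration of the F_GETFL command, the following function tests the access mode of an open descriptor. It is a sketch with a hypothetical name. It relies on the POSIX mask O_ACCMODE from fcntl.h to separate the access mode from the other file status flags.

```c
#include <fcntl.h>
#include <unistd.h>

/* Return 1 if fd is open for writing, 0 if it is open only for reading,
   and -1 (with errno set by fcntl) on failure. */
int is_writable(int fd) {
   int flags, mode;

   if ((flags = fcntl(fd, F_GETFL)) == -1)
      return -1;
   mode = flags & O_ACCMODE;            /* keep only the access mode bits */
   return (mode == O_WRONLY) || (mode == O_RDWR);
}
```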

To perform nonblocking I/O, a program can call open with the O_NONBLOCK flag set. A program can also change an open descriptor to be nonblocking by setting the O_NONBLOCK flag, using fcntl. To set an open descriptor to perform nonblocking I/O, use the F_GETFL command with fcntl to retrieve the flags associated with the descriptor. Use inclusive bitwise OR of O_NONBLOCK with these flags to create a new flags value. Finally, set the descriptor flags to this new value, using the F_SETFL command of fcntl.

Example 4.37. setnonblock.c

The following function sets an already opened file descriptor fd for nonblocking I/O.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int setnonblock(int fd) {
   int fdflags;

   if ((fdflags = fcntl(fd, F_GETFL, 0)) == -1)
      return -1;
   fdflags |= O_NONBLOCK;
   if (fcntl(fd, F_SETFL, fdflags) == -1)
      return -1;
   return 0;
}

If successful, setnonblock returns 0. Otherwise, setnonblock returns –1 and sets errno.

The setnonblock function of Example 4.37 reads the current value of the flags associated with fd, performs a bitwise OR with O_NONBLOCK, and installs the modified flags. After this function executes, a read from fd returns immediately if no input is available.

Example 4.38. setblock.c

The following function changes the I/O mode associated with file descriptor fd to blocking by clearing the O_NONBLOCK file flag. To clear the flag, use bitwise AND with the complement of the O_NONBLOCK flag.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int setblock(int fd) {
   int fdflags;

   if ((fdflags = fcntl(fd, F_GETFL, 0)) == -1)
      return -1;
   fdflags &= ~O_NONBLOCK;
   if (fcntl(fd, F_SETFL, fdflags) == -1)
      return -1;
   return 0;
}

If successful, setblock returns 0. If unsuccessful, setblock returns –1 and sets errno.

Example 4.39. process_or_do_work.c

The following function assumes that fd1 and fd2 are open for reading in nonblocking mode. If input is available from either one, the function calls docommand with the data read. Otherwise, the code calls dosomething. This implementation gives priority to fd1 and always handles input from this file descriptor before handling fd2.

#include <errno.h>
#include <unistd.h>
#include "restart.h"

void docommand(char *, int);
void dosomething(void);

void process_or_do_work(int fd1, int fd2) {
   char buf[1024];
   ssize_t bytesread;

   for ( ; ; ) {
      bytesread = r_read(fd1, buf, sizeof(buf));
      if ((bytesread == -1) && (errno != EAGAIN))
         return;                                    /* a real error on fd1 */
      else if (bytesread > 0) {
         docommand(buf, bytesread);
         continue;
      }
      bytesread = r_read(fd2, buf, sizeof(buf));
      if ((bytesread == -1) && (errno != EAGAIN))
         return;                                    /* a real error on fd2 */
      else if (bytesread > 0)
         docommand(buf, bytesread);
      else
         dosomething();          /* input not available, do something else */
   }
}

Exercise: Atomic Logging

Sometimes multiple processes need to output to the same log file. Problems can arise if one process loses the CPU while it is outputting to the log file and another process tries to write to the same file. The messages could get interleaved, making the log file unreadable. We use the term atomic logging to mean that multiple writes of one process to the same file are not mixed up with the writes of other processes writing to the same file.

This exercise describes a series of experiments to help you understand the issues involved when multiple processes try to write to the same file. We then introduce an atomic logging library and provide a series of examples of how to use the library. Appendix D.1 describes the actual implementation of this library, which is used in several places throughout the book as a tool for debugging programs.

The experiments in this section are based on Program 3.1, which creates a chain of processes. Program 4.19 modifies Program 3.1 so that the original process opens a file before creating the children. Each child writes a message to the file instead of to standard error. Each message is written in two pieces. Since the processes share an entry in the system file table, they share the file offset. Each time a process writes to the file, the file offset is updated.

Example 4.40. 

Run Program 4.19 several times and see if it generates output in the same order each time. Can you tell which parts of the output came from each process?

Answer:

On most systems, the output appears in the same order for most runs and each process generates a single line of output. However, this outcome is not guaranteed by the program. It is possible (but possibly unlikely) for one process to lose the CPU before both parts of its output are written to the file. In this case, the output is jumbled.

Program 4.19. chainopenfork.c

A program that opens a file before creating a chain of processes.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/stat.h>

#define BUFSIZE 1024
#define CREATE_FLAGS (O_WRONLY | O_CREAT | O_TRUNC)
#define CREATE_PERMS (S_IRUSR | S_IWUSR| S_IRGRP | S_IROTH)

int main  (int argc, char *argv[]) {
   char buf[BUFSIZE];
   pid_t childpid = 0;
   int fd;
   int i, n;

   if (argc != 3){       /* check for valid number of command-line arguments */
      fprintf (stderr, "Usage: %s processes filename\n", argv[0]);
      return 1;
   }
                                        /* open the log file before the fork */
   fd = open(argv[2], CREATE_FLAGS, CREATE_PERMS);
   if (fd < 0) {
      perror("Failed to open file");
      return 1;
   }
   n = atoi(argv[1]);                              /* create a process chain */
   for (i = 1; i < n; i++)
       if (childpid = fork())
          break;
   if (childpid == -1) {
      perror("Failed to fork");
      return 1;
   }
                                       /* write twice to the common log file */
   sprintf(buf, "i:%d process:%ld ", i, (long)getpid());
   write(fd, buf, strlen(buf));
   sprintf(buf, "parent:%ld child:%ld\n", (long)getppid(), (long)childpid);
   write(fd, buf, strlen(buf));
   return 0;
}

Example 4.41. 

Put sleep(1); after the first write function in Program 4.19 and run it again. Now what happens?

Answer:

Most likely, each process outputs the values of the first two integers and then each process outputs the last two integers.

Example 4.42. 

Copy chainopenfork.c to a file called chainforkopen.c and move the code to open the file after the loop that forks the children. How does the behavior of chainforkopen.c differ from that of chainopenfork.c?

Answer:

Each process now has a different system file table entry, and so each process has a different file offset. Because of O_TRUNC, each open deletes what was previously written to the file. Each process starts writing from the beginning of the file, overwriting what the other processes have written. The last process to write has control of the final file contents.

Example 4.43. 

Run chainforkopen several times and see if it generates the same order of the output each time. Which process was executed last? Do you see anything unusual about the contents of the file?

Answer:

The process that outputs last may be different on different systems. If the last process writes fewer bytes than another process, the file contains additional bytes after the line written by the last process.

If independent processes open the same log file, the results might be similar to those of Exercise 4.43. The last process to output overwrites what was previously written. One way to try to solve this problem is to call lseek to move to the end of the file before writing.

Example 4.44. 

Copy chainforkopen.c to a file called chainforkopenseek.c. Add code before each write to perform lseek to the end of the file. Also, remove the O_TRUNC flag from CREATE_FLAGS. Run the program several times and observe the behavior. Use a different file name each time.

Answer:

The lseek operation works as long as the process does not lose the CPU between lseek and write. For fast machines, you may have to run the program many times to observe this behavior. You can increase the likelihood of creating mixed-up output by putting sleep(1); between lseek and write.

If a file is opened with the O_APPEND flag, then it automatically does all writes to the end of the file.

Example 4.45. 

Copy chainforkopen.c to a file called chainforkappend.c. Modify the CREATE_FLAGS constant by replacing O_TRUNC with O_APPEND. Run the program several times, possibly inserting sleep(1) between the write calls. What happens?

Answer:

The O_APPEND flag solves the problem of processes overwriting the log entries of other processes, but it does not prevent the individual pieces written by one process from being mixed up with the pieces of another.

Example 4.46. 

Copy chainforkappend.c to a file called chainforkonewrite.c. Combine the pair of sprintf calls so that the program uses a single write call to output its information. How does the program behave?

Answer:

The output is no longer interleaved.

Example 4.47. 

Copy chainforkonewrite.c to a file called chainforkfprintf.c. Replace open with a corresponding fopen function. Replace the single write with fprintf. How does the program behave?

Answer:

The fprintf operation causes the output to be written to a buffer in the user area. Eventually, the I/O subsystem calls write to output the contents of the buffer. You have no control over when write is called except that you can force a write operation by calling fflush. Process output can be interleaved if the buffer fills in the middle of the fprintf operation. Adding sleep(1); shouldn’t cause the problem to occur more or less often.

An atomic logging library

To make an atomic logger, we have to use a single write call to output information that we want to appear together in the log. The file must be opened with the O_APPEND flag. Here is the statement about the O_APPEND flag from the write man page that guarantees that the writing is atomic if we use the O_APPEND flag.

If the O_APPEND flag of the file status flags is set, the file offset will be set to the end of the file prior to each write and no intervening file modification operation will occur between changing the file offset and the write operation.

In the examples given here, it is simple to combine everything into a single call to write, but later we encounter situations in which it is more difficult. Appendix D.1 contains a complete implementation of a module that can be used with a program in which atomic logging is needed. A program using this module should include Program 4.20, which contains the prototypes for the publicly accessible functions. Note that the interface is simple and the implementation details are completely hidden from the user.

Program 4.20. atomic_logger.h

The include file for the atomic logging module.

int atomic_log_array(char *s, int len);
int atomic_log_clear();
int atomic_log_close();
int atomic_log_open(char *fn);
int atomic_log_printf(char *fmt, ...);
int atomic_log_send();
int atomic_log_string(char *s);

The atomic logger allows you to control how the output of programs that are running on the same machine is interspersed in a log file. To use the logger, first call atomic_log_open to create the log file. Call atomic_log_close when all logging is completed. The logger stores in a temporary buffer items written with atomic_log_array, atomic_log_string and atomic_log_printf. When the program calls atomic_log_send, the logger outputs the entire buffer, using a single write call, and frees the temporary buffers. The atomic_log_clear operation frees the temporary buffers without actually outputting to the log file. Each function in the atomic logging library returns 0 if successful. If unsuccessful, these functions return –1 and set errno.
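The gather-then-send idea can be sketched in a few lines. The code below is not the implementation of Appendix D.1. It is a simplified illustration with hypothetical mini_log_ names, a fixed-size buffer instead of dynamically allocated ones, and only a string interface. Unlike the library, it does not set errno when the buffer is full.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define MINI_LOG_MAX 4096
#define LOG_FLAGS (O_WRONLY | O_CREAT | O_APPEND)
#define LOG_PERMS (S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH)

static char logbuf[MINI_LOG_MAX];      /* items gathered since the last send */
static size_t loglen = 0;
static int logfd = -1;

int mini_log_open(const char *fn) {    /* O_APPEND makes each write go to the end */
   logfd = open(fn, LOG_FLAGS, LOG_PERMS);
   loglen = 0;
   return (logfd == -1) ? -1 : 0;
}

int mini_log_string(const char *s) {   /* save s without writing it */
   size_t len = strlen(s);

   if (len > sizeof(logbuf) - loglen)
      return -1;
   memcpy(logbuf + loglen, s, len);
   loglen += len;
   return 0;
}

int mini_log_send(void) {              /* output everything with one write */
   if (logfd == -1)
      return -1;
   if (write(logfd, logbuf, loglen) != (ssize_t)loglen)
      return -1;
   loglen = 0;
   return 0;
}

int mini_log_close(void) {
   int result = (logfd == -1) ? -1 : close(logfd);

   logfd = -1;
   return result;
}
```

Nothing reaches the file until mini_log_send is called, so the pieces logged by one process between sends appear together in the file.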

The atomic logging facility provides three formats for writing to the log. Use atomic_log_array to write an array of a known number of bytes. Use atomic_log_string to log a string. Alternatively, you can use atomic_log_printf with a syntax similar to fprintf. Program 4.21 shows a version of the process chain that uses the first two forms for output to the atomic logger.

Example 4.48. 

How would you modify Program 4.21 to use atomic_log_printf?

Answer:

Eliminate the buf array and replace the four lines of code involving sprintf, atomic_log_array and atomic_log_string with the following.

atomic_log_printf("i:%d process:%ld ", i, (long)getpid());
atomic_log_printf("parent:%ld child ID:%ld\n",
                  (long)getppid(), (long)childpid);

Alternatively use the following single call.

atomic_log_printf("i:%d process:%ld parent:%ld child:%ld\n",
                  i, (long)getpid(), (long)getppid(), (long)childpid);

Program 4.21. chainforkopenlog.c

A program that uses the atomic logging module of Appendix D.1.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include "atomic_logger.h"

#define BUFSIZE 1024

int main  (int argc, char *argv[]) {
   char buf[BUFSIZE];
   pid_t childpid = 0;
   int i, n;

   if (argc != 3){       /* check for valid number of command-line arguments */
      fprintf (stderr, "Usage: %s processes filename\n", argv[0]);
      return 1;
   }
   n = atoi(argv[1]);                              /* create a process chain */
   for (i = 1; i < n; i++)
       if (childpid = fork())
          break;
   if (childpid == -1) {
      perror("Failed to fork");
      return 1;
   }

   if (atomic_log_open(argv[2]) == -1) {             /* open atomic log file */
      fprintf(stderr, "Failed to open log file");
      return 1;
   }
                                /* log the output, using two different forms */
   sprintf(buf, "i:%d process:%ld", i, (long)getpid());
   atomic_log_array(buf, strlen(buf));
   sprintf(buf, " parent:%ld child:%ld\n", (long)getppid(), (long)childpid);
   atomic_log_string(buf);
   if (atomic_log_send() == -1) {
      fprintf(stderr, "Failed to send to log file");
      return 1;
   }
   atomic_log_close();
   return 0;
}

Example 4.49. 

Modify Program 4.19 to open an atomic log file after forking the children. (Do not remove the other open function call.) Repeat Exercises 4.40 through 4.47 after adding code to output the same information to the atomic logger as to the original file. Compare the output of the logger with the contents of the file.

Example 4.50. 

What happens if Program 4.19 opens the log file before forking the children?

Answer:

Logging should still be atomic. However, if the parent writes information to the log and doesn’t clear it before the fork, the children have a copy of this information in their logging buffers.

Another logging interface that is useful for debugging concurrent programs is the remote logging facility described in detail in Appendix D.2. Instead of logging information being sent to a file, it is sent to another process that has its own environment for displaying and saving the logged information. The remote logging process has a graphical user interface that allows the user to display the log. The remote logger does not have a facility for gathering information from a process to be displayed in a single block in the log file, but it allows logging from processes on multiple machines.

Exercise: A cat Utility

The cat utility has the following POSIX specification[52].

NAME
cat - concatenate and print files

SYNOPSIS
cat [-u] [file ...]

DESCRIPTION
        The cat utility shall read files in sequence and shall write
        their contents to the standard output in the same sequence.

OPTIONS
        The cat utility shall conform to the Base Definitions volume
        of IEEE Std 1003.1-2001, Section 12.2, Utility Syntax Guidelines.

        The following option shall be supported:

        -u      Write bytes from the input file to the standard output
                without delay as each is read

OPERANDS
        The following operand shall be supported:

        file    A pathname of an input file. If no file operands are
                specified, the standard input shall be used. If a file
                is '-', the cat utility shall read from the standard
                input at that point in the sequence. The cat utility
                shall not close and reopen standard input when it is
                referenced in this way, but shall accept multiple
                occurrences of '-' as a file operand.

STDIN
        The standard input shall be used only if no file operands are
        specified, or if a file operand is '-'. See the INPUT FILES
        section.

INPUT FILES
        The input files can be any file type.

ENVIRONMENT VARIABLES
        (.... a long section omitted here ....)

ASYNCHRONOUS EVENTS
        Default.

STDOUT
        The standard output shall contain the sequence of bytes read from
        the input files.  Nothing else shall be written to the standard
        output.

STDERR
        The standard error shall be used only for diagnostic messages.

OUTPUT FILES
        None.

EXTENDED DESCRIPTION
        None.

EXIT STATUS
        The following exit values shall be returned:

        0:      All input files were output successfully.

        >0      An error occurred.

CONSEQUENCES OF ERRORS
        Default.

The actual POSIX description continues with other sections, including APPLICATION USAGE, EXAMPLES and RATIONALE.

  1. Compare the POSIX description of cat with the man page for cat on your system and note any differences.

  2. Execute the cat command for many examples, including multiple input files and files that don’t exist. Include a case in which you redirect standard input to a disk file and use several '-' files on the command line. Explain what happens.

  3. Write your own cat utility to conform to the standard. Try to duplicate the behavior of the actual cat utility.

  4. Read the section of the cat man page on ENVIRONMENT VARIABLES.

  5. Experiment with the effect of relevant environment variables on the behavior of cat.

  6. Incorporate the handling of environment variables into your own cat utility.
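As a possible starting point for step 3, the following sketch copies a list of operands to an output descriptor. It is deliberately incomplete: the function names are hypothetical, the -u option and diagnostic messages are omitted, and the reads and writes are not restarted after a signal as the restart library functions would do.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define BLKSIZE 4096

/* Copy bytes from fromfd to tofd until end-of-file. Returns 0 on success
   and -1 on failure. */
static int copy_fd(int fromfd, int tofd) {
   char buf[BLKSIZE];
   ssize_t nread, nwritten, total;

   while ((nread = read(fromfd, buf, sizeof(buf))) > 0)
      for (total = 0; total < nread; total += nwritten)
         if ((nwritten = write(tofd, buf + total, nread - total)) == -1)
            return -1;
   return (nread == -1) ? -1 : 0;
}

/* Copy each operand to tofd in sequence, reading standard input for "-" or
   when there are no operands. Returns 0 if every operand was copied and 1
   otherwise, in the manner of the exit status of cat. */
int cat_files(int count, char *const files[], int tofd) {
   int fd, i;
   int status = 0;

   if (count == 0)
      return (copy_fd(STDIN_FILENO, tofd) == -1) ? 1 : 0;
   for (i = 0; i < count; i++) {
      if (strcmp(files[i], "-") == 0) {   /* do not close or reopen stdin */
         if (copy_fd(STDIN_FILENO, tofd) == -1)
            status = 1;
         continue;
      }
      if ((fd = open(files[i], O_RDONLY)) == -1) {
         status = 1;                      /* a full cat also writes a diagnostic */
         continue;
      }
      if (copy_fd(fd, tofd) == -1)
         status = 1;
      close(fd);
   }
   return status;
}
```

A main function would pass the operands that remain after option processing together with STDOUT_FILENO and would return the result as its exit status.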

Additional Reading

Advanced Programming in the UNIX Environment by Stevens [112] has an extensive discussion of UNIX I/O from a programmer’s viewpoint. Many books on Linux or UNIX programming also cover I/O. The USENIX Conference Proceedings are a good source of current information on tools and approaches evolving under UNIX.
