UNIX uses a uniform device interface, through file descriptors, that allows the same I/O calls to be used for terminals, disks, tapes, audio and even network communication. This chapter explores the five functions that form the basis for UNIX device-independent I/O. The chapter also examines I/O from multiple sources, blocking I/O with timeouts, inheritance of file descriptors and redirection. The code carefully handles errors and interruption by signals.
A peripheral device is a piece of hardware accessed by a computer system. Common peripheral devices include disks, tapes, CD-ROMs, screens, keyboards, printers, mouse devices and network interfaces. User programs perform control and I/O to these devices through system calls to operating system modules called device drivers. A device driver hides the details of device operation and protects the device from unauthorized use. Devices of the same type may vary substantially in their operation, so to be usable, even a single-user machine needs device drivers. Some operating systems provide pseudodevice drivers to simulate devices such as terminals. Pseudoterminals, for example, simplify the handling of remote login to computer systems over a network or a modem line.
Some operating systems provide specific system calls for each type of supported device, requiring the systems programmer to learn a complex set of calls for device control. UNIX has greatly simplified the programmer device interface by providing uniform access to most devices through five functions: open, close, read, write and ioctl. All devices are represented by files, called special files, that are located in the /dev directory. Thus, disk files and other devices are named and accessed in the same way. A regular file is just an ordinary data file on disk. A block special file represents a device with characteristics similar to a disk. The device driver transfers information from a block special device in blocks or chunks, and usually such devices support the capability of retrieving a block from anywhere on the device. A character special file represents a device with characteristics similar to a terminal. The device appears to represent a stream of bytes that must be accessed in sequential order.
UNIX provides sequential access to files and other devices through the read and write functions. The read function attempts to retrieve nbyte bytes from the file or device represented by fildes into the user variable buf. You must actually provide a buffer that is large enough to hold nbyte bytes of data. (A common mistake is to provide an uninitialized pointer, buf, rather than an actual buffer.)
SYNOPSIS

#include <unistd.h>

ssize_t read(int fildes, void *buf, size_t nbyte);

POSIX
If successful, read returns the number of bytes actually read. If unsuccessful, read returns –1 and sets errno. The following table lists the mandatory errors for read.
| cause |
---|---|
| read attempted on a socket and connection was forcibly closed by its peer |
|
|
|
|
|
|
| process is a member of a background process group attempting to read from its controlling terminal and either process is ignoring or blocking |
| read attempted on socket that is not connected |
| the file is a regular file, |
| read attempted on socket and transmission timeout occurred |
| file descriptor is for socket marked |
A read operation for a regular file may return fewer bytes than requested if, for example, it reached end-of-file before completely satisfying the request. A read operation for a regular file returns 0 to indicate end-of-file. When special files corresponding to devices are read, the meaning of a read return value of 0 depends on the implementation and the particular device. A read operation for a pipe returns as soon as the pipe is not empty, so the number of bytes read can be less than the number of bytes requested. (Pipes are a type of communication buffer discussed in Chapter 6.) When reading from a terminal, read returns 0 when the user enters an end-of-file character. On many systems the default end-of-file character is Ctrl-D.
The ssize_t data type is a signed integer data type used for the number of bytes read, or –1 if an error occurs. On some systems, this type may be larger than an int. The size_t is an unsigned integer data type for the number of bytes to read.
Example 4.1.
The following code segment reads at most 100 bytes into buf from standard input.

char buf[100];
ssize_t bytesread;

bytesread = read(STDIN_FILENO, buf, 100);

This code does no error checking.
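A checked version of the same call might look like the following minimal sketch. The helper name checkedread is hypothetical and is not part of the chapter's restart library; it only illustrates how the three kinds of return value (a positive count, 0 and –1) can be told apart.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: read up to size bytes from fd and report the outcome.
   Returns the byte count, 0 on end-of-file, or -1 on error (errno set). */
ssize_t checkedread(int fd, char *buf, size_t size) {
   ssize_t bytesread;

   bytesread = read(fd, buf, size);
   if (bytesread == -1)
      perror("Failed to read");             /* errno was set by read */
   else if (bytesread == 0)
      fprintf(stderr, "End-of-file\n");     /* nothing more to read */
   return bytesread;
}
```

A caller would pass a real buffer, for example checkedread(STDIN_FILENO, buf, sizeof(buf)) with char buf[100], and use at most the returned number of bytes. The sketch does not restart after EINTR.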
The file descriptor, which represents a file or device that is open, can be thought of as an index into the process file descriptor table. The file descriptor table is in the process user area and provides access to the system information for the associated file or device.
When you execute a program from the shell, the program starts with three open streams associated with file descriptors STDIN_FILENO, STDOUT_FILENO and STDERR_FILENO. STDIN_FILENO and STDOUT_FILENO are standard input and standard output, respectively. By default, these two streams usually correspond to keyboard input and screen output. Programs should use STDERR_FILENO, the standard error device, for error messages and should never close it. In legacy code standard input, standard output and standard error are represented by 0, 1 and 2, respectively. However, you should always use their symbolic names rather than these numeric values. Section 4.6 explains how file descriptors work.
Example 4.2.
What happens when the following code executes?

char *buf;
ssize_t bytesread;

bytesread = read(STDIN_FILENO, buf, 100);

Answer:
The code segment, which may compile without error, does not allocate space for buf. The result of read is unpredictable, but most probably it will generate a memory access violation. If buf is an automatic variable stored on the stack, it is not initialized to any particular value. Whatever that memory happens to hold is treated as the address of the buffer for reading.
The readline function of Program 4.1 reads bytes, one at a time, into a buffer of fixed size until a newline character ('\n') or an error occurs. The function handles end-of-file, limited buffer size and interruption by a signal. The readline function returns the number of bytes read or –1 if an error occurs. A return value of 0 indicates an end-of-file before any characters were read. A return value greater than 0 indicates the number of bytes read. In this case, the buffer contains a string ending in a newline character. A return value of –1 indicates that errno has been set and one of the following errors occurred.
An error occurred on read.
At least one byte was read and an end-of-file occurred before a newline was read.
nbytes-1 bytes were read and no newline was found.
If readline reads from a file that does not end with a newline character, it treats the last line read as an error. The readline function is available in the restart library of Appendix B.
Program 4.1. readline.c
The readline function returns the next line from a file.

#include <errno.h>
#include <unistd.h>

int readline(int fd, char *buf, int nbytes) {
   int numread = 0;
   int returnval;

   while (numread < nbytes - 1) {
      returnval = read(fd, buf + numread, 1);
      if ((returnval == -1) && (errno == EINTR))
         continue;                        /* restart after a signal */
      if ((returnval == 0) && (numread == 0))
         return 0;                        /* end-of-file, nothing read */
      if (returnval == 0)
         break;                           /* end-of-file in mid-line */
      if (returnval == -1)
         return -1;
      numread++;
      if (buf[numread-1] == '\n') {
         buf[numread] = '\0';             /* terminate the string */
         return numread;
      }
   }
   errno = EINVAL;
   return -1;
}

Example 4.3.
The following code segment calls the readline function of Program 4.1 to read a line of at most 99 bytes from standard input.

int bytesread;
char mybuf[100];

bytesread = readline(STDIN_FILENO, mybuf, sizeof(mybuf));

Example 4.4.
Under what circumstances does the readline function of Program 4.1 return a buffer with no newline character?
Answer:
This can only happen if the return value is 0 or –1. The return value of 0 indicates that nothing was read. The return of –1 indicates some type of error. In either case, the buffer may not contain a string.
The write function attempts to output nbyte bytes from the user buffer buf to the file represented by file descriptor fildes.
SYNOPSIS

#include <unistd.h>

ssize_t write(int fildes, const void *buf, size_t nbyte);

POSIX
If successful, write returns the number of bytes actually written. If unsuccessful, write returns –1 and sets errno. The following table lists the mandatory errors for write.
| cause |
---|---|
| write attempted on a socket that is not connected |
|
|
|
|
| attempt to write a file that exceeds implementation-defined maximum; file is a regular file, |
|
|
| process is a member of a background process group attempting to write to controlling terminal, |
| no free space remaining on device containing the file |
| attempt to write to a pipe or FIFO not open for reading or that has only one end open (thread may also get |
| file descriptor is for socket marked |
Example 4.5.
What can go wrong with the following code segment?

#define BLKSIZE 1024
char buf[BLKSIZE];

read(STDIN_FILENO, buf, BLKSIZE);
write(STDOUT_FILENO, buf, BLKSIZE);

Answer:
The write function assumes that the read has filled buf with BLKSIZE bytes. However, read may fail or may not read the full BLKSIZE bytes. In these two cases, write outputs garbage.
Example 4.6.
What can go wrong with the following code segment to read from standard input and write to standard output?

#define BLKSIZE 1024
char buf[BLKSIZE];
ssize_t bytesread;

bytesread = read(STDIN_FILENO, buf, BLKSIZE);
if (bytesread > 0)
   write(STDOUT_FILENO, buf, bytesread);

Answer:
Although write uses bytesread rather than BLKSIZE, there is no guarantee that write actually outputs all of the bytes requested. Furthermore, either read or write can be interrupted by a signal. In this case, the interrupted call returns a –1 with errno set to EINTR.
Program 4.2 copies bytes from the file represented by fromfd to the file represented by tofd. The function restarts read and write if either is interrupted by a signal. Notice that the write statement specifies the buffer by a pointer, bp, rather than by a fixed address such as buf. If the previous write operation did not output all of buf, the next write operation must start from the end of the previous output. The copyfile function returns the number of bytes copied and does not indicate whether or not an error occurred.
Example 4.7. simplecopy.c
The following program calls copyfile to copy a file from standard input to standard output.

#include <stdio.h>
#include <unistd.h>

int copyfile(int fromfd, int tofd);

int main (void) {
   int numbytes;

   numbytes = copyfile(STDIN_FILENO, STDOUT_FILENO);
   fprintf(stderr, "Number of bytes copied: %d\n", numbytes);
   return 0;
}

Example 4.8.
What happens when you run the program of Example 4.7?
Answer:
Standard input is usually set to read one line at a time, so I/O is likely to be entered and echoed on line boundaries. The I/O continues until you enter the end-of-file character (often Ctrl-D by default) at the start of a line or you interrupt the program by entering the interrupt character (often Ctrl-C by default). Use the stty -a command to find the current settings for these characters.
Program 4.2. copyfile1.c
The copyfile function copies a file from fromfd to tofd.

#include <errno.h>
#include <unistd.h>
#define BLKSIZE 1024

int copyfile(int fromfd, int tofd) {
   char *bp;
   char buf[BLKSIZE];
   int bytesread, byteswritten;
   int totalbytes = 0;

   for ( ; ; ) {
      while (((bytesread = read(fromfd, buf, BLKSIZE)) == -1) &&
             (errno == EINTR))
         ;                          /* handle interruption by signal */
      if (bytesread <= 0)           /* real error or end-of-file on fromfd */
         break;
      bp = buf;
      while (bytesread > 0) {
         while (((byteswritten = write(tofd, bp, bytesread)) == -1) &&
                (errno == EINTR))
            ;                       /* handle interruption by signal */
         if (byteswritten <= 0)     /* real error on tofd */
            break;
         totalbytes += byteswritten;
         bytesread -= byteswritten;
         bp += byteswritten;
      }
      if (byteswritten == -1)       /* real error on tofd */
         break;
   }
   return totalbytes;
}

Example 4.9.
How would you use the program of Example 4.7 to copy the file myin.dat to myout.dat?
Answer:
Use redirection. If the executable of Example 4.7 is called simplecopy, the line would be as follows.
simplecopy < myin.dat > myout.dat
The problems of restarting read and write after signals and of writing the entire amount requested occur in nearly every program using read and write. Program 4.3 shows a separate r_read function that you can use instead of read when you want to restart after a signal. Similarly, Program 4.4 shows a separate r_write function that restarts after a signal and writes the full amount requested. For convenience, a number of functions, including r_read, r_write, copyfile and readline, have been collected in a library called restart.c. The prototypes for these functions are contained in restart.h, and we include this header file when necessary. Appendix B presents the complete restart library implementation.
Program 4.3. r_read.c
The r_read function is similar to read except that it restarts itself if interrupted by a signal.

#include <errno.h>
#include <unistd.h>

ssize_t r_read(int fd, void *buf, size_t size) {
   ssize_t retval;

   while (retval = read(fd, buf, size), retval == -1 && errno == EINTR)
      ;
   return retval;
}

Program 4.4. r_write.c
The r_write function is similar to write except that it restarts itself if interrupted by a signal and writes the full amount requested.

#include <errno.h>
#include <unistd.h>

ssize_t r_write(int fd, void *buf, size_t size) {
   char *bufp;
   size_t bytestowrite;
   ssize_t byteswritten;
   size_t totalbytes;

   for (bufp = buf, bytestowrite = size, totalbytes = 0;
        bytestowrite > 0;
        bufp += byteswritten, bytestowrite -= byteswritten) {
      byteswritten = write(fd, bufp, bytestowrite);
      if ((byteswritten) == -1 && (errno != EINTR))
         return -1;
      if (byteswritten == -1)       /* interrupted: retry the same range */
         byteswritten = 0;
      totalbytes += byteswritten;
   }
   return totalbytes;
}

The functions r_read and r_write can greatly simplify programs that need to read and write while handling signals.
Program 4.5 shows the readwrite function that reads bytes from one file descriptor and writes all of the bytes read to another one. It uses a buffer of size PIPE_BUF to transfer at most PIPE_BUF bytes. This size is useful for writing to pipes since a write to a pipe of PIPE_BUF bytes or less is atomic. Program 4.6 shows a version of copyfile that uses the readwrite function. Compare this with Program 4.2.
Program 4.5. readwrite.c
A program that reads from one file descriptor and writes all the bytes read to another file descriptor.

#include <limits.h>
#include "restart.h"
#define BLKSIZE PIPE_BUF

int readwrite(int fromfd, int tofd) {
   char buf[BLKSIZE];
   int bytesread;

   if ((bytesread = r_read(fromfd, buf, BLKSIZE)) == -1)
      return -1;
   if (bytesread == 0)
      return 0;
   if (r_write(tofd, buf, bytesread) == -1)
      return -1;
   return bytesread;
}

Program 4.6. copyfile.c
A simplified implementation of copyfile that uses r_read and r_write.

#include <unistd.h>
#include "restart.h"
#define BLKSIZE 1024

int copyfile(int fromfd, int tofd) {
   char buf[BLKSIZE];
   int bytesread, byteswritten;
   int totalbytes = 0;

   for ( ; ; ) {
      if ((bytesread = r_read(fromfd, buf, BLKSIZE)) <= 0)
         break;
      if ((byteswritten = r_write(tofd, buf, bytesread)) == -1)
         break;
      totalbytes += byteswritten;
   }
   return totalbytes;
}

The r_write function writes all the bytes requested and restarts the write if fewer bytes are written. The r_read function restarts only if interrupted by a signal, so it often reads fewer bytes than requested. The readblock function is a version of read that continues reading until the requested number of bytes is read or an error occurs. Program 4.7 shows an implementation of readblock. The readblock function is part of the restart library. It is especially useful for reading structures.
Program 4.7. readblock.c
A function that reads a specific number of bytes.

#include <errno.h>
#include <unistd.h>

ssize_t readblock(int fd, void *buf, size_t size) {
   char *bufp;
   size_t bytestoread;
   ssize_t bytesread;
   size_t totalbytes;

   for (bufp = buf, bytestoread = size, totalbytes = 0;
        bytestoread > 0;
        bufp += bytesread, bytestoread -= bytesread) {
      bytesread = read(fd, bufp, bytestoread);
      if ((bytesread == 0) && (totalbytes == 0))
         return 0;                  /* end-of-file before any bytes */
      if (bytesread == 0) {         /* end-of-file in mid-block */
         errno = EINVAL;
         return -1;
      }
      if ((bytesread) == -1 && (errno != EINTR))
         return -1;
      if (bytesread == -1)          /* interrupted: retry */
         bytesread = 0;
      totalbytes += bytesread;
   }
   return totalbytes;
}

There are only three possibilities for the return value of readblock. The readblock function returns 0 if an end-of-file occurs before any bytes are read. This happens if the first call to read returns 0. If readblock is successful, it returns size, signifying that the requested number of bytes was successfully read. Otherwise, readblock returns –1 and sets errno. If readblock reaches the end-of-file after some, but not all, of the needed bytes have been read, readblock returns –1 and sets errno to EINVAL.
Example 4.10.
The following code segment can be used to read a pair of integers from an open file descriptor.

struct {
   int x;
   int y;
} point;

if (readblock(fd, &point, sizeof(point)) <= 0)
   fprintf(stderr, "Cannot read a point.\n");

Program 4.8 combines readblock with r_write to read a fixed number of bytes from one open file descriptor and write them to another open file descriptor.
Program 4.8. readwriteblock.c
A program that copies a fixed number of bytes from one file descriptor to another.

#include "restart.h"

int readwriteblock(int fromfd, int tofd, char *buf, int size) {
   int bytesread;

   bytesread = readblock(fromfd, buf, size);
   if (bytesread != size)         /* can only be 0 or -1 */
      return bytesread;
   return r_write(tofd, buf, size);
}

The open function associates a file descriptor (the handle used in the program) with a file or physical device. The path parameter of open points to the pathname of the file or device, and the oflag parameter specifies status flags and access modes for the opened file. You must include a third parameter to specify access permissions if you are creating a file.
SYNOPSIS

#include <fcntl.h>
#include <sys/stat.h>

int open(const char *path, int oflag, ...);

POSIX
If successful, open returns a nonnegative integer representing the open file descriptor. If unsuccessful, open returns –1 and sets errno. The following table lists the mandatory errors for open.
| cause |
---|---|
| search permission on component of path prefix denied, or file exists and permissions specified by |
|
|
| signal was caught during |
| named file is directory and |
| a loop exists in resolution of |
|
|
| the length of |
| maximum allowable number of files currently open in system |
|
|
| directory or file system for new file cannot be expanded, the file does not exist and |
| a component of the path prefix is not a directory |
|
|
| named file is a regular file and size cannot be represented by an object of type |
| the named file resides on a read-only file system and one of |
Construct the oflag argument by taking the bitwise OR (|) of the desired combination of the access mode and the additional flags. The POSIX values for the access mode flags are O_RDONLY, O_WRONLY and O_RDWR. You must specify exactly one of these designating read-only, write-only or read-write access, respectively.
The additional flags include O_APPEND, O_CREAT, O_EXCL, O_NOCTTY, O_NONBLOCK and O_TRUNC. The O_APPEND flag causes the file offset to be moved to the end of the file before a write, allowing you to add to an existing file. In contrast, O_TRUNC truncates the length of a regular file opened for writing to 0. The O_CREAT flag causes a file to be created if it doesn’t already exist. If you include the O_CREAT flag, you must also pass a third argument to open to designate the permissions. If you want to avoid writing over an existing file, use the combination O_CREAT | O_EXCL. This combination returns an error if the file already exists. The O_NOCTTY flag prevents an opened device from becoming a controlling terminal. Controlling terminals are discussed in Section 11.5. The O_NONBLOCK flag controls whether the open returns immediately or blocks until the device is ready. Section 4.8 discusses how the O_NONBLOCK flag affects the behavior of read and write. Certain POSIX extensions specify additional flags. You can find the flags in fcntl.h.
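The flag combinations described above can be illustrated with a short sketch. The helper names createnew and openappend are hypothetical and exist only for this illustration. The first refuses to overwrite an existing file by combining O_CREAT with O_EXCL, and the second opens an existing file so that each write lands at the end.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create path only if it does not already exist.
   Returns an open descriptor, or -1 with errno set (EEXIST if it exists). */
int createnew(const char *path) {
   int fd;

   while (((fd = open(path, O_WRONLY | O_CREAT | O_EXCL,
                      S_IRUSR | S_IWUSR)) == -1) && (errno == EINTR))
      ;                              /* restart if interrupted by a signal */
   return fd;
}

/* Hypothetical helper: open an existing file so every write is appended. */
int openappend(const char *path) {
   return open(path, O_WRONLY | O_APPEND);
}
```

Because O_CREAT is present in createnew, the third argument giving the permissions is required. The openappend call passes only two arguments since it never creates a file.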
Example 4.11.
The following code segment opens the file /home/ann/my.dat for reading.

int myfd;

myfd = open("/home/ann/my.dat", O_RDONLY);

This code does no error checking.
Example 4.12.
How can the call to open of Example 4.11 fail?
Answer:
The open function returns –1 if the file doesn’t exist, the open call was interrupted by a signal or the process doesn’t have the appropriate access permissions. If your code uses myfd for a subsequent read or write operation, the operation fails.
Example 4.13.
The following code segment restarts open after a signal occurs.

int myfd;

while (((myfd = open("/home/ann/my.dat", O_RDONLY)) == -1) &&
       (errno == EINTR))
   ;
if (myfd == -1)          /* it was a real error, not a signal */
   perror("Failed to open the file");
else {
   /* continue on */
}

Example 4.14.
How would you modify Example 4.13 to open /home/ann/my.dat for nonblocking read?
Answer:
You would OR the O_RDONLY and the O_NONBLOCK flags.

myfd = open("/home/ann/my.dat", O_RDONLY | O_NONBLOCK);

Each file has three classes associated with it: a user (or owner), a group and everybody else (others). The possible permissions or privileges are read (r), write (w) and execute (x). These privileges are specified separately for the user, the group and others. When you open a file with the O_CREAT flag, you must specify the permissions as the third argument to open in a mask of type mode_t.
Historically, the file permissions were laid out in a mask of bits with 1’s in designated bit positions of the mask, signifying that a class had the corresponding privilege. Figure 4.1 shows an example of a typical layout of such a permission mask. Although numerically coded permission masks frequently appear in legacy code, you should avoid using numerical values in your programs.
POSIX defines symbolic names for masks corresponding to the permission bits so that you can specify file permissions independently of the implementation. These names are defined in sys/stat.h. Table 4.1 lists the symbolic names and their meanings. To form the permission mask, bitwise OR the symbols corresponding to the desired permissions.
Table 4.1. POSIX symbolic names for file permissions.
symbol | meaning |
---|---|
S_IRUSR | read by owner |
S_IWUSR | write by owner |
S_IXUSR | execute by owner |
S_IRWXU | read, write, execute by owner |
S_IRGRP | read by group |
S_IWGRP | write by group |
S_IXGRP | execute by group |
S_IRWXG | read, write, execute by group |
S_IROTH | read by others |
S_IWOTH | write by others |
S_IXOTH | execute by others |
S_IRWXO | read, write, execute by others |
S_ISUID | set user ID on execution |
S_ISGID | set group ID on execution |
Example 4.15.
The following code segment creates a file, info.dat, in the current directory. If the info.dat file already exists, it is overwritten. The new file can be read or written by the user and only read by everyone else.

int fd;
mode_t fdmode = (S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);

if ((fd = open("info.dat", O_RDWR | O_CREAT, fdmode)) == -1)
   perror("Failed to open info.dat");

Program 4.9 copies a source file to a destination file. Both filenames are passed as command-line arguments. Because the open function for the destination file has O_CREAT | O_EXCL, the file copy fails if that file already exists.
Program 4.9. copyfilemain.c
A program to copy a file.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>
#include "restart.h"

#define READ_FLAGS O_RDONLY
#define WRITE_FLAGS (O_WRONLY | O_CREAT | O_EXCL)
#define WRITE_PERMS (S_IRUSR | S_IWUSR)

int main(int argc, char *argv[]) {
   int bytes;
   int fromfd, tofd;

   if (argc != 3) {
      fprintf(stderr, "Usage: %s from_file to_file\n", argv[0]);
      return 1;
   }
   if ((fromfd = open(argv[1], READ_FLAGS)) == -1) {
      perror("Failed to open input file");
      return 1;
   }
   if ((tofd = open(argv[2], WRITE_FLAGS, WRITE_PERMS)) == -1) {
      perror("Failed to create output file");
      return 1;
   }
   bytes = copyfile(fromfd, tofd);
   printf("%d bytes copied from %s to %s\n", bytes, argv[1], argv[2]);
   return 0;                              /* the return closes the files */
}

Program 4.9 returns immediately after performing the copy and does not explicitly close the files. The return from main causes the necessary cleanup to release the resources associated with open files. In general, however, you should be careful to release open file descriptors by calling close.
The close function has a single parameter, fildes, representing the open file whose resources are to be released.
SYNOPSIS

#include <unistd.h>

int close(int fildes);

POSIX
If successful, close returns 0. If unsuccessful, close returns –1 and sets errno. The following table lists the mandatory errors for close.
| cause |
---|---|
|
|
| the |
Program 4.10 shows an r_close function that restarts itself after interruption by a signal. Its prototype is in the header file restart.h.
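The listing for Program 4.10 is not reproduced in this section. A minimal sketch that is consistent with the description above, and that follows the same pattern as r_read, might look like this. It is an assumption about the shape of the function, not necessarily the restart library's exact code.

```c
#include <errno.h>
#include <unistd.h>

/* Sketch of an r_close as described in the text: retry close while it
   fails with EINTR. Returns 0 on success or -1 with errno set. */
int r_close(int fildes) {
   int retval;

   while (((retval = close(fildes)) == -1) && (errno == EINTR))
      ;                     /* restart if interrupted by a signal */
   return retval;
}
```

POSIX leaves the state of the descriptor unspecified when close fails with EINTR, and on some systems (Linux among them) the descriptor is already released at that point. On such systems, retrying can close a descriptor that another thread has just opened, so treat this loop as illustrative.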
The handling of I/O from multiple sources is an important problem that arises in many different forms. For example, a program may want to overlap terminal I/O with reading input from a disk or with printing. Another example occurs when a program expects input from two different sources, but it doesn’t know which input will be available first. If the program tries to read from source A, and in fact, input was only available from source B, the program blocks. To solve this problem, we need to block until input from either source becomes available. Blocking until at least one member of a set of conditions becomes true is called OR synchronization. The condition for the case described is “input available” on a descriptor.
One method of monitoring multiple file descriptors is to use a separate process for each one. Program 4.11 takes two command-line arguments, the names of two files to monitor. The parent process opens both files before creating the child process. The parent monitors the first file descriptor, and the child monitors the second. Each process echoes the contents of its file to standard output. If two named pipes are monitored, output appears as input becomes available.
Program 4.11. monitorfork.c
A program that monitors two files by forking a child process.

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include "restart.h"

int main(int argc, char *argv[]) {
   int bytesread;
   int childpid;
   int fd, fd1, fd2;

   if (argc != 3) {
      fprintf(stderr, "Usage: %s file1 file2\n", argv[0]);
      return 1;
   }
   if ((fd1 = open(argv[1], O_RDONLY)) == -1) {
      fprintf(stderr, "Failed to open file %s:%s\n", argv[1], strerror(errno));
      return 1;
   }
   if ((fd2 = open(argv[2], O_RDONLY)) == -1) {
      fprintf(stderr, "Failed to open file %s:%s\n", argv[2], strerror(errno));
      return 1;
   }
   if ((childpid = fork()) == -1) {
      perror("Failed to create child process");
      return 1;
   }
   if (childpid > 0)                           /* parent code */
      fd = fd1;
   else                                        /* child code */
      fd = fd2;
   bytesread = copyfile(fd, STDOUT_FILENO);
   fprintf(stderr, "Bytes read: %d\n", bytesread);
   return 0;
}

While using separate processes to monitor two file descriptors can be useful, the two processes have separate address spaces and so it is difficult for them to interact.
Example 4.16.
How would you modify Program 4.11 so that it prints the total number of bytes read from the two files?
Answer:
Set up some form of interprocess communication before creating the child. For example, the parent process could create a pipe and the child could send its byte count to the pipe when it has finished. After the parent has processed its file, the parent could wait for the child and read the byte count from the pipe.
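The approach in this answer can be sketched as follows. The function totalwithchild is hypothetical: instead of copying files, the parent and child each start with a byte count supplied by the caller, so that only the pipe communication is shown.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch: the child sends childcount through a pipe and the
   parent adds it to parentcount. Returns the total, or -1 on error. */
int totalwithchild(int parentcount, int childcount) {
   int fds[2];
   int fromchild;
   pid_t childpid;

   if (pipe(fds) == -1)                  /* create the pipe before forking */
      return -1;
   if ((childpid = fork()) == -1) {
      close(fds[0]);
      close(fds[1]);
      return -1;
   }
   if (childpid == 0) {                                    /* child code */
      close(fds[0]);
      if (write(fds[1], &childcount, sizeof(childcount)) !=
          (ssize_t)sizeof(childcount))
         _exit(1);
      _exit(0);
   }
   close(fds[1]);                                         /* parent code */
   if (read(fds[0], &fromchild, sizeof(fromchild)) !=
       (ssize_t)sizeof(fromchild))
      fromchild = -1;                    /* error or premature end-of-file */
   close(fds[0]);
   while ((waitpid(childpid, NULL, 0) == -1) && (errno == EINTR))
      ;                                  /* wait for the child to finish */
   return (fromchild == -1) ? -1 : parentcount + fromchild;
}
```

In Program 4.11 itself, each count would come from the call to copyfile. The sketch does not restart read or write after a signal; the r_read and r_write functions could be substituted for that purpose.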
The select call provides a method of monitoring file descriptors from a single process. It can monitor for three possible conditions: a read can be done without blocking, a write can be done without blocking, or a file descriptor has error conditions pending. Older versions of UNIX defined the select function in sys/time.h, but the POSIX standard now uses sys/select.h.
The nfds parameter of select gives the range of file descriptors to be monitored. The value of nfds must be at least one greater than the largest file descriptor to be checked. The readfds parameter specifies the set of descriptors to be monitored for reading. Similarly, writefds specifies the set of descriptors to be monitored for writing, and errorfds specifies the file descriptors to be monitored for error conditions. The descriptor sets are of type fd_set. Any of these parameters may be NULL, in which case select does not monitor the descriptor for the corresponding event. The last parameter is a timeout value that forces a return from select after a certain period of time has elapsed, even if no descriptors are ready. When timeout is NULL, select may block indefinitely.
SYNOPSIS

#include <sys/select.h>

int select(int nfds, fd_set *restrict readfds,
           fd_set *restrict writefds, fd_set *restrict errorfds,
           struct timeval *restrict timeout);
void FD_CLR(int fd, fd_set *fdset);
int FD_ISSET(int fd, fd_set *fdset);
void FD_SET(int fd, fd_set *fdset);
void FD_ZERO(fd_set *fdset);

POSIX
On successful return, select clears all the descriptors in each of readfds, writefds and errorfds except those descriptors that are ready. If successful, the select function returns the number of file descriptors that are ready. If unsuccessful, select returns –1 and sets errno. The following table lists the mandatory errors for select.
| cause |
---|---|
| one or more file descriptor sets specified an invalid file descriptor |
| the |
| an invalid timeout interval was specified, or |
Historically, systems implemented the descriptor set as an integer bit mask, but that implementation does not work for more than 32 file descriptors on most systems. The descriptor sets are now usually represented by bit fields in arrays of integers. Use the macros FD_SET, FD_CLR, FD_ISSET and FD_ZERO to manipulate the descriptor sets in an implementation-independent way as demonstrated in Program 4.12.
The FD_SET macro sets the bit in *fdset corresponding to the fd file descriptor, and the FD_CLR macro clears the corresponding bit. The FD_ZERO macro clears all the bits in *fdset. Use these three macros to set up descriptor masks before calling select. Use the FD_ISSET macro after select returns, to test whether the bit corresponding to the file descriptor fd is set in the mask.
Program 4.12. whichisready.c
A function that blocks until one of two file descriptors is ready.

#include <errno.h>
#include <string.h>
#include <sys/select.h>

int whichisready(int fd1, int fd2) {
   int maxfd;
   int nfds;
   fd_set readset;

   if ((fd1 < 0) || (fd1 >= FD_SETSIZE) ||
       (fd2 < 0) || (fd2 >= FD_SETSIZE)) {
      errno = EINVAL;
      return -1;
   }
   maxfd = (fd1 > fd2) ? fd1 : fd2;
   FD_ZERO(&readset);
   FD_SET(fd1, &readset);
   FD_SET(fd2, &readset);
   nfds = select(maxfd+1, &readset, NULL, NULL, NULL);
   if (nfds == -1)
      return -1;
   if (FD_ISSET(fd1, &readset))
      return fd1;
   if (FD_ISSET(fd2, &readset))
      return fd2;
   errno = EINVAL;
   return -1;
}

The function whichisready blocks until at least one of the two file descriptors passed as parameters is ready for reading and returns that file descriptor. If both are ready, it returns the first file descriptor. If unsuccessful, whichisready returns –1 and sets errno.
Program 4.13. copy2files.c
A function that uses select to do two concurrent file copies.

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include "restart.h"

int copy2files(int fromfd1, int tofd1, int fromfd2, int tofd2) {
   int bytesread;
   int maxfd;
   int num;
   fd_set readset;
   int totalbytes = 0;

   if ((fromfd1 < 0) || (fromfd1 >= FD_SETSIZE) ||
       (tofd1 < 0) || (tofd1 >= FD_SETSIZE) ||
       (fromfd2 < 0) || (fromfd2 >= FD_SETSIZE) ||
       (tofd2 < 0) || (tofd2 >= FD_SETSIZE))
      return 0;
   maxfd = fromfd1;                     /* find the biggest fd for select */
   if (fromfd2 > maxfd)
      maxfd = fromfd2;

   for ( ; ; ) {
      FD_ZERO(&readset);
      FD_SET(fromfd1, &readset);
      FD_SET(fromfd2, &readset);
      if (((num = select(maxfd+1, &readset, NULL, NULL, NULL)) == -1) &&
          (errno == EINTR))
         continue;
      if (num == -1)
         return totalbytes;
      if (FD_ISSET(fromfd1, &readset)) {
         bytesread = readwrite(fromfd1, tofd1);
         if (bytesread <= 0)
            break;
         totalbytes += bytesread;
      }
      if (FD_ISSET(fromfd2, &readset)) {
         bytesread = readwrite(fromfd2, tofd2);
         if (bytesread <= 0)
            break;
         totalbytes += bytesread;
      }
   }
   return totalbytes;
}

The whichisready
function of Program 4.12 is problematic because it always chooses fd1
if both fd1
and fd2
are ready. The copy2files
function copies bytes from fromfd1
to tofd1
and from fromfd2
to tofd2
without making any assumptions about the order in which the bytes become available in the two directions. The function returns if either copy encounters an error or end-of-file.
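One possible way to remove the bias toward fd1 is to alternate the preferred descriptor whenever both are ready. The following is a minimal sketch under that assumption; the name whichisready_fair and the static toggle are hypothetical and are not part of Program 4.12. Like whichisready, it does not restart select after a signal.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical variant of whichisready: when both descriptors are ready,
   alternate which one is reported so that neither source is starved. */
int whichisready_fair(int fd1, int fd2) {
    static int preferfirst = 1;       /* flips each time both are ready */
    fd_set readset;
    int maxfd, nfds, ready1, ready2, choice;

    if ((fd1 < 0) || (fd1 >= FD_SETSIZE) ||
        (fd2 < 0) || (fd2 >= FD_SETSIZE)) {
        errno = EINVAL;
        return -1;
    }
    maxfd = (fd1 > fd2) ? fd1 : fd2;
    FD_ZERO(&readset);
    FD_SET(fd1, &readset);
    FD_SET(fd2, &readset);
    nfds = select(maxfd + 1, &readset, NULL, NULL, NULL);
    if (nfds == -1)
        return -1;
    ready1 = FD_ISSET(fd1, &readset);
    ready2 = FD_ISSET(fd2, &readset);
    if (ready1 && ready2) {           /* both ready: take turns */
        choice = preferfirst ? fd1 : fd2;
        preferfirst = !preferfirst;
        return choice;
    }
    if (ready1)
        return fd1;
    if (ready2)
        return fd2;
    errno = EINVAL;
    return -1;
}
```

The static variable makes the sketch unsuitable for multithreaded use; a caller-supplied state parameter would avoid that at the cost of a wider interface.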
The copy2files
function of Program 4.13 can be generalized to monitor multiple file descriptors for input. Such a problem might be encountered by a command processor that was monitoring requests from different terminals. The program cannot predict which source will produce the next input, so it must use a method such as select
. In addition, the set of monitored descriptors is dynamic—the program must remove a source from the monitoring set if an error condition arises on that source’s descriptor.
The monitorselect
function in Program 4.14 monitors an array of open file descriptors fd
. When input is available on file descriptor fd[i]
, the program reads information from fd[i]
and calls docommand
. The monitorselect
function has two parameters: an array of open file descriptors and the number of file descriptors in the array. The function restarts the select
or read
if either is interrupted by a signal. When read
encounters other types of errors or an end-of-file, monitorselect
closes the corresponding descriptor and removes it from the monitoring set. The monitorselect
function returns when all descriptors have indicated an error or end-of-file.
The waitfdtimed
function in Program 4.15 takes two parameters: a file descriptor and an ending time. It uses gettimeout
to calculate the timeout interval from the end time and the current time obtained by a call to gettimeofday
. (See Section 9.1.3.) If select
returns prematurely because of a signal, waitfdtimed
recalculates the timeout and calls select
again. The standard does not say anything about the value of the timeout
parameter or the fd_set
parameters of select
when it is interrupted by a signal, so we reset them inside the while
loop.
You can use the select
timeout feature to implement a timed read operation, as shown in Program 4.16. The readtimed
function behaves like read
except that it takes an additional parameter, seconds
, specifying a timeout in seconds. The readtimed
function returns –1 with errno
set to ETIME
if no input is available in the next seconds
interval. If interrupted by a signal, readtimed
restarts with the remaining time. Most of the complication comes from the need to restart select
with the remaining time when select
is interrupted by a signal. The select
function does not provide a direct way of determining the time remaining in this case. The readtimed
function in Program 4.16 sets the end time for the timeout by calling add2currenttime
in Program 4.15. It uses this value when calling waitfdtimed
from Program 4.15 to wait until the file descriptor can be read or the end time has passed.
Program 4.14. monitorselect.c
A function to monitor file descriptors using select.

    #include <errno.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/select.h>
    #include <sys/types.h>
    #include "restart.h"
    #define BUFSIZE 1024

    void docommand(char *, int);

    void monitorselect(int fd[], int numfds) {
        char buf[BUFSIZE];
        int bytesread;
        int i;
        int maxfd;
        int numnow, numready;
        fd_set readset;

        maxfd = 0;                /* set up the range of descriptors to monitor */
        for (i = 0; i < numfds; i++) {
            if ((fd[i] < 0) || (fd[i] >= FD_SETSIZE))
                return;
            if (fd[i] >= maxfd)
                maxfd = fd[i] + 1;
        }
        numnow = numfds;
        while (numnow > 0) {          /* continue monitoring until all are done */
            FD_ZERO(&readset);                /* set up the file descriptor mask */
            for (i = 0; i < numfds; i++)
                if (fd[i] >= 0)
                    FD_SET(fd[i], &readset);
            numready = select(maxfd, &readset, NULL, NULL, NULL); /* which ready? */
            if ((numready == -1) && (errno == EINTR))   /* interrupted by signal */
                continue;
            else if (numready == -1)                        /* real select error */
                break;
            for (i = 0; (i < numfds) && (numready > 0); i++) { /* read and process */
                if (fd[i] == -1)                      /* this descriptor is done */
                    continue;
                if (FD_ISSET(fd[i], &readset)) {     /* this descriptor is ready */
                    bytesread = r_read(fd[i], buf, BUFSIZE);
                    numready--;
                    if (bytesread > 0)
                        docommand(buf, bytesread);
                    else {      /* error or end-of-file on this descriptor, close it */
                        r_close(fd[i]);
                        fd[i] = -1;
                        numnow--;
                    }
                }
            }
        }
        for (i = 0; i < numfds; i++)
            if (fd[i] >= 0)
                r_close(fd[i]);
    }
Program 4.15. waitfdtimed.c
A function that waits for a given time for input to be available from an open file descriptor.

    #include <errno.h>
    #include <string.h>
    #include <sys/select.h>
    #include <sys/time.h>
    #include "restart.h"
    #define MILLION 1000000L
    #define D_MILLION 1000000.0

    /* Store in *timeoutp the time remaining until end; fail if none is left. */
    static int gettimeout(struct timeval end, struct timeval *timeoutp) {
        gettimeofday(timeoutp, NULL);
        timeoutp->tv_sec = end.tv_sec - timeoutp->tv_sec;
        timeoutp->tv_usec = end.tv_usec - timeoutp->tv_usec;
        if (timeoutp->tv_usec >= MILLION) {
            timeoutp->tv_sec++;
            timeoutp->tv_usec -= MILLION;
        }
        if (timeoutp->tv_usec < 0) {
            timeoutp->tv_sec--;
            timeoutp->tv_usec += MILLION;
        }
        if ((timeoutp->tv_sec < 0) ||
            ((timeoutp->tv_sec == 0) && (timeoutp->tv_usec == 0))) {
            errno = ETIME;
            return -1;
        }
        return 0;
    }

    /* Return the current time plus the given number of seconds. */
    struct timeval add2currenttime(double seconds) {
        struct timeval newtime;

        gettimeofday(&newtime, NULL);
        newtime.tv_sec += (int)seconds;
        newtime.tv_usec += (int)((seconds - (int)seconds)*D_MILLION + 0.5);
        if (newtime.tv_usec >= MILLION) {
            newtime.tv_sec++;
            newtime.tv_usec -= MILLION;
        }
        return newtime;
    }

    int waitfdtimed(int fd, struct timeval end) {
        fd_set readset;
        int retval;
        struct timeval timeout;

        if ((fd < 0) || (fd >= FD_SETSIZE)) {
            errno = EINVAL;
            return -1;
        }
        FD_ZERO(&readset);
        FD_SET(fd, &readset);
        if (gettimeout(end, &timeout) == -1)
            return -1;
        while (((retval = select(fd + 1, &readset, NULL, NULL, &timeout)) == -1)
               && (errno == EINTR)) {
            if (gettimeout(end, &timeout) == -1)   /* recompute remaining time */
                return -1;
            FD_ZERO(&readset);                     /* reset the descriptor set */
            FD_SET(fd, &readset);
        }
        if (retval == 0) {
            errno = ETIME;
            return -1;
        }
        if (retval == -1)
            return -1;
        return 0;
    }
Program 4.16. readtimed.c
A function to do a timed read from an open file descriptor.

    #include <sys/time.h>
    #include "restart.h"

    ssize_t readtimed(int fd, void *buf, size_t nbyte, double seconds) {
        struct timeval timedone;

        timedone = add2currenttime(seconds);
        if (waitfdtimed(fd, timedone) == -1)
            return (ssize_t)(-1);
        return r_read(fd, buf, nbyte);
    }
Example 4.17.
Why is it necessary to test whether newtime.tv_usec
is greater than or equal to a million when it is set from the fractional part of seconds
? What are the consequences of having that value equal to one million?
Answer:
Since the value is rounded to the nearest microsecond, a fraction such as 0.999999999 might round to one million when multiplied by MILLION. The behavior of functions that use struct timeval values is not specified when the tv_usec field is not strictly less than one million.
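The rounding case described in this answer can be reproduced in isolation. The sketch below uses a hypothetical helper, seconds2timeval, which is not part of Program 4.15; it applies the same rounding and carry as add2currenttime but without adding the current time, and it assumes a nonnegative argument.

```c
#include <sys/time.h>

#define MILLION 1000000L
#define D_MILLION 1000000.0

/* Hypothetical helper: convert a nonnegative interval in seconds to a
   struct timeval whose tv_usec is strictly less than one million. */
struct timeval seconds2timeval(double seconds) {
    struct timeval tv;

    tv.tv_sec = (long)seconds;
    tv.tv_usec = (long)((seconds - (long)seconds)*D_MILLION + 0.5);
    if (tv.tv_usec >= MILLION) {     /* rounding carried into the seconds */
        tv.tv_sec++;
        tv.tv_usec -= MILLION;
    }
    return tv;
}
```

With an argument of 1.9999999, the fractional part rounds up to exactly one million microseconds, so the carry produces 2 seconds and 0 microseconds rather than an out-of-range tv_usec.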
Example 4.18.
One way to simplify Program 4.15 is to just restart the select
with the same timeout whenever it is interrupted by a signal. What is wrong with this?
Answer:
If your program receives signals regularly and the time between signals is smaller than the timeout interval, waitfdtimed
never times out.
The 2000 version of POSIX introduced a new version of select
called pselect
. The pselect
function is identical to the select
function, but it uses a more precise timeout structure, struct timespec
, and allows for the blocking or unblocking of signals while it is waiting for I/O to be available. The struct timespec
structure is discussed in Section 9.1.4. However, at the time of writing (March 2003), none of our test operating systems supported pselect
.
The poll
function is similar to select
, but it organizes the information by file descriptor rather than by type of condition. That is, the possible events for one file descriptor are stored in a struct pollfd
. In contrast, select
organizes information by the type of event and has separate descriptor masks for read, write and error conditions. The poll
function is part of the POSIX:XSI Extension and has its origins in UNIX System V.
The poll
function takes three parameters: fds
, nfds
and timeout
. The fds
is an array of struct pollfd
, representing the monitoring information for the file descriptors. The nfds
parameter gives the number of descriptors to be monitored. The timeout
value is the time in milliseconds that the poll
should wait without receiving an event before returning. If the timeout
value is –1, poll
never times out. If integers are 32 bits, the maximum timeout period is about 30 minutes.
SYNOPSIS

    #include <poll.h>

    int poll(struct pollfd fds[], nfds_t nfds, int timeout);

POSIX:XSI
The poll
function returns 0 if it times out. If successful, poll
returns the number of descriptors that have events. If unsuccessful, poll
returns –1 and sets errno
. The following table lists the mandatory errors for poll
.
errno | cause |
---|---|
EAGAIN | allocation of internal data structures failed, but a subsequent request may succeed |
EINTR | a signal was caught during poll |
EINVAL | nfds is greater than OPEN_MAX |
The struct pollfd
structure includes the following members.
    int fd;           /* file descriptor */
    short events;     /* requested events */
    short revents;    /* returned events */
The fd
is the file descriptor number, and the events
and revents
are constructed by taking the logical OR of flags representing the various events listed in Table 4.2. Set events
to contain the events to monitor; poll
fills in the revents
with the events that have occurred. The poll
function sets the POLLHUP
, POLLERR
and POLLNVAL
flags in revents
to reflect the existence of the associated conditions. You do not need to set the corresponding bits in events
for these. If fd
is less than zero, the events
field is ignored and revents
is set to zero. The standard does not specify how end-of-file is to be handled. End-of-file can either be communicated by an revents
flag of POLLHUP
or a normal read of 0 bytes. It is possible for POLLHUP
to be set even if POLLIN
or POLLRDNORM
indicates that there is still data to read. Therefore, normal reading should be handled before error checking.
Table 4.2. Values of the event flags for the poll
function.
event flag | meaning |
---|---|
POLLIN | read other than high priority data without blocking |
POLLRDNORM | read normal data without blocking |
POLLRDBAND | read priority data without blocking |
POLLPRI | read high-priority data without blocking |
POLLOUT | write normal data without blocking |
POLLWRNORM | same as POLLOUT |
POLLERR | error occurred on the descriptor |
POLLHUP | device has been disconnected |
POLLNVAL | file descriptor invalid |
Program 4.17 implements a function to process commands from multiple file descriptors by using the poll
function. Compare the implementation with that of Program 4.14. The select
call modifies the file descriptor sets that are passed to it, and the program must reset these descriptor sets each time it calls select
. The poll
function uses separate variables for input and return values, so it is not necessary to reset the list of monitored descriptors after each call to poll
. The poll
function has a number of advantages. The masks do not need to be reset after each call. Unlike select
, the poll
function treats errors as events that cause poll
to return. The timeout
parameter is easier to use, although its range is limited. Finally, poll
does not need a max_fd
argument.
Program 4.17. monitorpoll.c
A function to monitor an array of file descriptors by using poll.

    #include <errno.h>
    #include <poll.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include "restart.h"
    #define BUFSIZE 1024

    void docommand(char *, int);

    void monitorpoll(int fd[], int numfds) {
        char buf[BUFSIZE];
        int bytesread;
        int i;
        int numnow = 0;
        int numready;
        struct pollfd *pollfd;

        for (i = 0; i < numfds; i++)             /* count the open descriptors */
            if (fd[i] >= 0)
                numnow++;
        if ((pollfd = (void *)calloc(numfds, sizeof(struct pollfd))) == NULL)
            return;
        for (i = 0; i < numfds; i++) {     /* initialize the polling structure */
            (pollfd + i)->fd = *(fd + i);
            (pollfd + i)->events = POLLRDNORM;
        }
        while (numnow > 0) {     /* continue monitoring until descriptors done */
            numready = poll(pollfd, numfds, -1);
            if ((numready == -1) && (errno == EINTR))
                continue;           /* poll interrupted by a signal, try again */
            else if (numready == -1)        /* real poll error, can't continue */
                break;
            for (i = 0; i < numfds && numready > 0; i++) {
                if ((pollfd + i)->revents) {
                    if ((pollfd + i)->revents & (POLLRDNORM | POLLIN)) {
                        bytesread = r_read(fd[i], buf, BUFSIZE);
                        numready--;
                        if (bytesread > 0)
                            docommand(buf, bytesread);
                        else
                            bytesread = -1;                     /* end of file */
                    } else if ((pollfd + i)->revents &
                               (POLLERR | POLLHUP | POLLNVAL))
                        bytesread = -1;      /* POLLNVAL would otherwise spin */
                    else               /* descriptor not involved in this round */
                        bytesread = 0;
                    if (bytesread == -1) { /* error occurred, remove descriptor */
                        r_close(fd[i]);
                        (pollfd + i)->fd = -1;
                        numnow--;
                    }
                }
            }
        }
        for (i = 0; i < numfds; i++)   /* close only descriptors still monitored */
            if ((pollfd + i)->fd >= 0)
                r_close(fd[i]);
        free(pollfd);
    }
Files are designated within C programs either by file pointers or by file descriptors. The standard I/O library functions for ISO C (fopen
, fscanf
, fprintf
, fread
, fwrite
, fclose
and so on) use file pointers. The UNIX I/O functions (open
, read
, write
, close
and ioctl
) use file descriptors. File pointers and file descriptors provide logical designations called handles for performing device-independent input and output. The symbolic names for the file pointers that represent standard input, standard output and standard error are stdin
, stdout
and stderr
, respectively. These symbolic names are defined in stdio.h
. The symbolic names for the file descriptors that represent standard input, standard output and standard error are STDIN_FILENO
, STDOUT_FILENO
and STDERR_FILENO
, respectively. These symbolic names are defined in unistd.h
.
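The two sets of names refer to the same underlying descriptors. As a small illustration, the POSIX fileno function returns the descriptor behind a file pointer; the function name stdstreams_match below is hypothetical, and the check assumes the program has not reopened its standard streams.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical check: return 1 if the three standard file pointers are
   attached to the three standard file descriptors, 0 otherwise. */
int stdstreams_match(void) {
    return (fileno(stdin) == STDIN_FILENO) &&
           (fileno(stdout) == STDOUT_FILENO) &&
           (fileno(stderr) == STDERR_FILENO);
}
```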
Example 4.19.
Explain the difference between a library function and a system call.
Answer:
The POSIX standard does not make a distinction between library functions and system calls. Traditionally, a library function is an ordinary function that is placed in a collection of functions called a library, usually because it is useful, widely used or part of a specification, such as C. A system call is a request to the operating system for service. It involves a trap to the operating system and often a context switch. System calls are associated with particular operating systems. Many library functions such as read
and write
are, in fact, jackets for system calls. That is, they reformat the arguments in the appropriate system-dependent form and then call the underlying system call to perform the actual operation.
Although the implementation details differ, versions of UNIX follow a similar implementation model for handling file descriptors and file pointers within a process. The remainder of this section provides a schematic model of how file descriptors (UNIX I/O) and file pointers (ISO C I/O) work. We use this model to explain redirection (Section 4.7) and inheritance (Section 4.6.3, Section 6.2 and Chapter 7).
The open
function associates a file or physical device with the logical handle used in the program. The file or physical device is specified by a character string (e.g., /home/johns/my.dat
or /dev/tty
). The handle is an integer that can be thought of as an index into a file descriptor table that is specific to a process. It contains an entry for each open file in the process. The file descriptor table is part of the process user area, but the program cannot access it except through functions using the file descriptor.
Example 4.20.
Figure 4.2 shows a schematic of the file descriptor table after a program executes the following.
myfd = open("/home/ann/my.dat", O_RDONLY);
The open
function creates an entry in the file descriptor table that points to an entry in the system file table. The open
function returns the value 3, specifying that the file descriptor entry is in position three of the process file descriptor table.
Figure 4.2. Schematic diagram of the relationship between the file descriptor table, the system file table and the in-memory inode table in a UNIX-like operating system after the code of Example 4.20 executes.
The system file table, which is shared by all the processes in the system, has an entry for each active open
. Each system file table entry contains the file offset, an indication of the access mode (i.e., read, write or read-write) and a count of the number of file descriptor table entries pointing to it.
Several system file table entries may correspond to the same physical file. Each of these entries points to the same entry in the in-memory inode table. The in-memory inode table contains an entry for each active file in the system. When a program opens a particular physical file that is not currently open, the call creates an entry in this inode table for that file. Figure 4.2 shows that the file /home/ann/my.dat
had been opened before the code of Example 4.20 because there are two entries in the system file table with pointers to the entry in the inode table. (The label B designates the earlier pointer in the figure.)
Example 4.21.
What happens when the process whose file descriptor table is shown in Figure 4.2 executes the close(myfd)
function?
Answer:
The operating system deletes the fourth entry in the file descriptor table and the corresponding entry in the system file table. (See Section 4.6.3 for a more complete discussion.) If the operating system also deleted the inode table entry, it would leave pointer B hanging in the system file table. Therefore, the inode table entry must have a count of the system file table entries that are pointing to it. When a process executes the close
function, the operating system decrements the count in the inode entry. If the inode entry has a 0 count, the operating system deletes the inode entry from memory. (The operating system might not actually delete the entry right away on the chance that it will be accessed again in the immediate future.)
Example 4.22.
The system file table entry contains an offset that gives the current position in the file. If two processes have each opened a file for reading, each process has its own offset into the file and reads the entire file independently of the other process. What happens if each process opens the same file for write? What would happen if the file offset were stored in the inode table instead of the system file table?
Answer:
The writes are independent of each other. Each user can write over what the other user has written because of the separate file offsets for each process. On the other hand, if the offsets were stored in the inode table rather than in the system file table, the writes from different active opens would be consecutive. Also, the processes that had opened a file for reading would only read parts of the file because the file offset they were using could be updated by other processes.
Example 4.23.
Suppose a process opens a file for reading and then forks a child process. Both the parent and child can read from the file. How are reads by these two processes related? What about writes?
Answer:
The child receives a copy of the parent’s file descriptor table at the time of the fork. The processes share a system file table entry and therefore also share the file offset. The two processes read different parts of the file. If no other processes have the file open, writes append to the end of the file and no data is lost on writes. Subsection 4.6.3 covers this situation in more detail.
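The difference between separate and shared file offsets can be observed within a single process. The sketch below is an illustration under stated assumptions: the function offsetdemo and the temporary file name are hypothetical, and dup stands in for the descriptor sharing that fork produces, since both leave two file descriptor table entries pointing to one system file table entry.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical demonstration: write "abcdefg" to a temporary file and then
   read one byte through each of three descriptors. Returns 0 on success. */
int offsetdemo(char *fromfirst, char *fromsecond, char *fromdup) {
    char name[] = "/tmp/offsetdemoXXXXXX";
    int fd1, fd2, fd3, ok;

    if ((fd1 = mkstemp(name)) == -1)
        return -1;
    fd2 = open(name, O_RDONLY);   /* second open: new system file table entry */
    fd3 = dup(fd1);               /* duplicate: shares fd1's entry and offset */
    unlink(name);                 /* file disappears when descriptors close */
    ok = (fd2 != -1) && (fd3 != -1) &&
         (write(fd1, "abcdefg", 7) == 7) &&
         (lseek(fd1, 0, SEEK_SET) == 0) &&
         (read(fd1, fromfirst, 1) == 1) &&     /* advances the shared offset */
         (read(fd2, fromsecond, 1) == 1) &&    /* independent offset */
         (read(fd3, fromdup, 1) == 1);         /* continues from fd1's offset */
    if (fd2 != -1)
        close(fd2);
    if (fd3 != -1)
        close(fd3);
    close(fd1);
    return ok ? 0 : -1;
}
```

The independently opened descriptor starts again at the first byte, while the duplicated descriptor picks up where the original left off.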
The ISO C standard I/O library uses file pointers rather than file descriptors as handles for I/O. A file pointer points to a data structure called a FILE
structure in the user area of the process.
Example 4.24.
The following code segment opens the file /home/ann/my.dat
for output and then writes a string to the file.
    FILE *myfp;

    if ((myfp = fopen("/home/ann/my.dat", "w")) == NULL)
        perror("Failed to open /home/ann/my.dat");
    else
        fprintf(myfp, "This is a test");
Figure 4.3 shows a schematic of the FILE
structure allocated by the fopen
call of Example 4.24. The FILE
structure contains a buffer and a file descriptor value. The file descriptor value is the index of the entry in the file descriptor table that is actually used to output the file to disk. In some sense the file pointer is a handle to a handle.
What happens when the program calls fprintf
? The result depends on the type of file that was opened. Disk files are usually fully buffered, meaning that the fprintf
does not actually write the This is a test message to disk, but instead writes the bytes to a buffer in the FILE
structure. When the buffer fills, the I/O subsystem calls write
with the file descriptor, as in the previous section. The delay between the time when a program executes fprintf
and the time when the writing actually occurs may have interesting consequences, especially if the program crashes. Buffered data is sometimes lost on system crashes, so it is even possible for a program to appear to complete normally but its disk output could be incomplete.
How can a program avoid the effects of buffering? An fflush
call forces whatever has been buffered in the FILE
structure to be written out. A program can also call setvbuf
to disable buffering.
Terminal I/O works a little differently. Files associated with terminals are line buffered rather than fully buffered (except for standard error, which, by default, is not buffered). On output, line buffering means that the line is not written out until the buffer is full or until a newline symbol is encountered.
Example 4.25. bufferout.c
How does the output appear when the following program executes?
    #include <stdio.h>

    int main(void) {
        fprintf(stdout, "a");
        fprintf(stderr, "a has been written\n");
        fprintf(stdout, "b");
        fprintf(stderr, "b has been written\n");
        fprintf(stdout, "\n");
        return 0;
    }
Answer:
The messages written to standard error appear before the 'a'
and 'b'
because standard output is line buffered, whereas standard error is not buffered.
Example 4.26. bufferinout.c
How does the output appear when the following program executes?
    #include <stdio.h>

    int main(void) {
        int i;

        fprintf(stdout, "a");
        scanf("%d", &i);
        fprintf(stderr, "a has been written\n");
        fprintf(stdout, "b");
        fprintf(stderr, "b has been written\n");
        fprintf(stdout, "\n");
        return 0;
    }
Answer:
The scanf
function flushes the buffer for stdout
, so 'a'
is displayed before the number is read in. After the number has been entered, 'b'
still appears after the b has been written
message.
The issue of buffering is more subtle than the previous discussion might lead you to believe. If a program that uses file pointers for a buffered device crashes, the last partial buffer created from the fprintf
calls may never be written out. When the buffer is full, a write
operation is performed. Completion of a write
operation does not mean that the data actually made it to disk. In fact, the operating system copies the data to a system buffer cache. Periodically, the operating system writes these dirty blocks to disk. If the operating system crashes before it writes the block to disk, the program still loses the data. Presumably, a system crash is less likely to happen than an individual program crash.
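A program that needs stronger guarantees can address both layers of buffering. The sketch below is one possible approach, not something the text prescribes: it combines fflush with the POSIX fsync function, which this section does not otherwise discuss, and the name flushtodisk is hypothetical.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: move buffered output for fp out of the FILE
   structure, then ask the operating system to commit it to the device.
   Returns 0 on success and -1 on failure. */
int flushtodisk(FILE *fp) {
    if (fflush(fp) == EOF)          /* user-space buffer to system cache */
        return -1;
    return fsync(fileno(fp));       /* system cache to the device */
}
```

Calling such a function after every write defeats the purpose of buffering, so it is usually reserved for points where losing the data would be costly.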
When fork
creates a child, the child inherits a copy of most of the parent’s environment and context, including the signal state, the scheduling parameters and the file descriptor table. The implications of inheritance are not always obvious. Because children receive a copy of their parent’s file descriptor table at the time of the fork, the parent and children share the same file offsets for files that were opened by the parent prior to the fork.
Example 4.27. openfork.c
In the following program, the child inherits the file descriptor for my.dat
. Each process reads and outputs one character from the file.
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/stat.h>

    int main(void) {
        char c = '!';
        int myfd;

        if ((myfd = open("my.dat", O_RDONLY)) == -1) {
            perror("Failed to open file");
            return 1;
        }
        if (fork() == -1) {
            perror("Failed to fork");
            return 1;
        }
        read(myfd, &c, 1);
        printf("Process %ld got %c\n", (long)getpid(), c);
        return 0;
    }
Figure 4.4 shows the parent and child file descriptor tables for Example 4.27. The file descriptor table entries of the two processes point to the same entry in the system file table. The parent and child therefore share the file offset, which is stored in the system file table.
Figure 4.4. If the parent opens my.dat
before forking, both parent and child share the system file table entry.
Example 4.28.
Suppose the first few bytes in the file my.dat
are abcdefg
. What output would be generated by Example 4.27?
Answer:
Since the two processes share the file offset, the first one to read gets a
and the second one to read gets b
. Two lines are generated in the following form.
    Process nnn got a
    Process mmm got b
In theory, the lines could be output in either order but most likely would appear in the order shown.
Example 4.29.
When a program closes a file, the entry in the file descriptor table is freed. What about the corresponding entry in the system file table?
Answer:
The system file table entry can only be freed if no more file descriptor table entries are pointing to it. For this reason, each system file table entry contains a count of the number of file descriptor table entries that are pointing to it. When a process closes a file, the operating system decrements the count and deletes the entry only when the count becomes 0.
Example 4.30.
How does fork
affect the system file table?
Answer:
The system file table is in system space and is not duplicated by fork
. However, each entry in the system file table keeps a count of the number of file descriptor table entries pointing to it. These counts must be adjusted to reflect the new file descriptor table created for the child.
Example 4.31. forkopen.c
In the following program, the parent and child each open my.dat
for reading, read one character, and output that character.
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/stat.h>

    int main(void) {
        char c = '!';
        int myfd;

        if (fork() == -1) {
            perror("Failed to fork");
            return 1;
        }
        if ((myfd = open("my.dat", O_RDONLY)) == -1) {
            perror("Failed to open file");
            return 1;
        }
        read(myfd, &c, 1);
        printf("Process %ld got %c\n", (long)getpid(), c);
        return 0;
    }
Figure 4.5 shows the file descriptor tables for Example 4.31. The file descriptor table entries corresponding to my.dat
point to different system file table entries. Consequently, the parent and child do not share the file offset. The child does not inherit the file descriptor, because each process opens the file after the fork and each open
creates a new entry in the system file table. The parent and child still share system file table entries for standard input, standard output and standard error.
Figure 4.5. If the parent and child open my.dat
after the fork
call, their file descriptor table entries point to different system file table entries.
Example 4.32.
Suppose the first few bytes in the file my.dat
are abcdefg
. What output would be generated by Example 4.31?
Answer:
Since the two processes use different file offsets, each process reads the first byte of the file. Two lines are generated in the following form.
    Process nnn got a
    Process mmm got a
Example 4.33. fileiofork.c
What output would be generated by the following program?
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        printf("This is my output.");
        fork();
        return 0;
    }
Answer:
Because of buffering, the output of printf
is likely to be written to the buffer corresponding to stdout
, but not to the actual output device. Since this buffer is part of the user space, it is duplicated by fork
. When the parent and the child each terminate, the return from main
causes the buffers to be flushed as part of the cleanup. The output appears as follows.
This is my output.This is my output.
Example 4.34. fileioforkline.c
What output would be generated by the following program?
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        printf("This is my output.\n");
        fork();
        return 0;
    }
Answer:
The buffering of standard output is usually line buffering. This means that the buffer is flushed when it contains a newline. Since in this case a newline is output, the buffer will probably be flushed before the fork
and only one line of output will appear.
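Since the outcome in both exercises depends on how standard output happens to be buffered, a program that must not duplicate output can flush explicitly before forking. The wrapper below is a minimal sketch under that assumption; the name forkflushed is hypothetical.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: flush every open output stream so that the child
   does not inherit a copy of pending buffered output, then fork. */
pid_t forkflushed(void) {
    fflush(NULL);      /* a null pointer flushes all open output streams */
    return fork();
}
```

With this wrapper in place of fork, the program of Example 4.33 would be expected to print its message once, because the buffer is already empty when it is duplicated.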
UNIX provides a large number of utilities that are written as filters. A filter reads from standard input, performs a transformation, and outputs the result to standard output. Filters write their error messages to standard error. All of the parameters of a filter are communicated as command-line arguments. The input data should have no headers or trailers, and a filter should not require any interaction with the user.
Examples of useful UNIX filters include head
, tail
, more
, sort
, grep
and awk
. The cat
command takes a list of filenames as command-line arguments, reads each of the files in succession, and echoes the contents of each file to standard output. However, if no input file is specified, cat
takes its input from standard input and writes its results to standard output. In this case, cat
behaves like a filter.
Recall that a file descriptor is an index into the file descriptor table of that process. Each entry in the file descriptor table points to an entry in the system file table, which is created when the file is opened. A program can modify the file descriptor table entry so that it points to a different entry in the system file table. This action is known as redirection. Most shells interpret the greater than character (>
) on the command line as redirection of standard output and the less than character (<
) as redirection of standard input. (Associate >
with output by picturing it as an arrow pointing in the direction of the output file.)
Example 4.35.
The cat
command with no command-line arguments reads from standard input and echoes to standard output. The following command redirects standard output to my.file
with >
.
cat > my.file
The cat
command of Example 4.35 gathers what is typed from the keyboard into the file my.file
. Figure 4.6 depicts the file descriptor table for Example 4.35. Before redirection, entry [1]
of the file descriptor table points to a system file table entry corresponding to the usual standard output device. After the redirection, entry [1]
points to a system file table entry for my.file
.
Figure 4.6. Status of the file descriptor table before and after redirection for the process that is executing cat > my.file
.
The redirection of standard output in cat > my.file
occurs because the shell changes the standard output entry of the file descriptor table (a pointer to the system file table) to reference a system file table entry associated with my.file
. To accomplish this redirection in a C program, first open my.file
to establish an appropriate entry in the system file table. After the open
operation, copy the pointer to my.file
into the entry for standard output by executing the dup2
function. Then, call close
to eliminate the extra file descriptor table entry for my.file
.
The dup2
function takes two parameters, fildes
and fildes2
. It closes entry fildes2
of the file descriptor table if it was open and then copies the pointer of entry fildes
into entry fildes2
.
SYNOPSIS

    #include <unistd.h>

    int dup2(int fildes, int fildes2);

POSIX
On success, dup2
returns the file descriptor value that was duplicated. On failure, dup2
returns –1 and sets errno
. The following table lists the mandatory errors for dup2
.
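errno | cause |
---|---|
EBADF | fildes is not a valid open file descriptor, or fildes2 is negative or greater than or equal to OPEN_MAX |
EINTR | dup2 was interrupted by a signal |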
Example 4.36.
Program 4.18 redirects standard output to the file my.file
and then appends a short message to that file.
Figure 4.7 shows the effect of the redirection on the file descriptor table of Program 4.18. The open function causes the operating system to create a new entry in the system file table and to set entry [3] of the file descriptor table to point to this entry. The dup2 function closes the descriptor corresponding to the second parameter (standard output) and then copies the entry corresponding to the first parameter (fd) into the entry corresponding to the second parameter (STDOUT_FILENO). From that point on in the program, a write to standard output goes to my.file.
Program 4.18. redirect.c
A program that redirects standard output to the file my.file.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include "restart.h"

    #define CREATE_FLAGS (O_WRONLY | O_CREAT | O_APPEND)
    #define CREATE_MODE (S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH)

    int main(void) {
        int fd;

        fd = open("my.file", CREATE_FLAGS, CREATE_MODE);
        if (fd == -1) {
            perror("Failed to open my.file");
            return 1;
        }
        if (dup2(fd, STDOUT_FILENO) == -1) {    /* entry [1] now refers to my.file */
            perror("Failed to redirect standard output");
            return 1;
        }
        if (r_close(fd) == -1) {                /* the extra descriptor is not needed */
            perror("Failed to close the file");
            return 1;
        }
        if (write(STDOUT_FILENO, "OK", 2) == -1) {
            perror("Failed in writing to file");
            return 1;
        }
        return 0;
    }

The fcntl function is a general-purpose function for retrieving and modifying the flags associated with an open file descriptor. The fildes argument of fcntl specifies the descriptor, and the cmd argument specifies the operation. The fcntl function may take additional parameters depending on the value of cmd.

SYNOPSIS

    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/types.h>

    int fcntl(int fildes, int cmd, /* arg */ ...);

    POSIX
The interpretation of the return value of fcntl depends on the value of the cmd parameter. However, if unsuccessful, fcntl returns –1 and sets errno. The following table lists the mandatory errors for fcntl.
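| errno | cause |
|---|---|
| EACCES | cmd is F_SETLK and the requested lock conflicts with a lock held by another process |
| EBADF | fildes is not a valid open file descriptor, or the file is not open in the mode required for the type of lock requested |
| EINTR | cmd is F_SETLKW and fcntl was interrupted by a signal |
| EINVAL | cmd is invalid, or cmd is F_DUPFD and arg is negative or greater than or equal to OPEN_MAX, or cmd is a locking command and arg is invalid or fildes refers to a file that does not support locking |
| EMFILE | cmd is F_DUPFD and OPEN_MAX descriptors are already open for the process, or no file descriptors greater than or equal to arg are available |
| ENOLCK | cmd is F_SETLK or F_SETLKW and satisfying the request would exceed the system limit on locks |
| EOVERFLOW | one of the values to be returned cannot be represented correctly, or the requested lock offset cannot be represented in off_t |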
The fcntl function may only be interrupted by a signal when the cmd argument is F_SETLKW (block until the requested lock can be acquired). In this case, fcntl returns –1 and sets errno to EINTR. Table 4.3 lists the POSIX values of the cmd parameter for fcntl.
An important example of the use of file control is to change an open file descriptor to use nonblocking I/O. When a file descriptor has been set for nonblocking I/O, the read and write functions return –1 and set errno to EAGAIN to report that the process would be delayed if a blocking I/O operation were tried. Nonblocking I/O is useful for monitoring multiple file descriptors while doing other work. Section 4.4 and Section 4.5 discuss the select and poll functions that allow a process to block until any of a set of descriptors becomes available. However, both of these functions block while waiting for I/O, so no other work can be done during the wait.
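Table 4.3. POSIX values for cmd as specified in fcntl.h.

| cmd | meaning |
|---|---|
| F_DUPFD | duplicate a file descriptor |
| F_GETFD | get file descriptor flags |
| F_SETFD | set file descriptor flags |
| F_GETFL | get file status flags and access modes |
| F_SETFL | set file status flags and access modes |
| F_GETOWN | if fildes is a socket, get process or group ID for out-of-band signals |
| F_SETOWN | if fildes is a socket, set process or group ID for out-of-band signals |
| F_GETLK | get first lock that blocks description specified by arg |
| F_SETLK | set or clear segment lock specified by arg |
| F_SETLKW | same as F_SETLK except that it blocks until request satisfied |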
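As a small illustration of the F_GETFL command, the following sketch reports whether a descriptor was opened for writing. The function name is_open_for_writing is an assumption made for this example. The value returned by F_GETFL combines the file status flags with the access mode, so the code masks the result with O_ACCMODE before comparing.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd is open for writing, 0 if it is
   open for reading only, and -1 (with errno set by fcntl) on error. */
int is_open_for_writing(int fd) {
    int flags;

    if ((flags = fcntl(fd, F_GETFL)) == -1)
        return -1;                      /* for example, EBADF */
    flags &= O_ACCMODE;                 /* keep only the access-mode bits */
    return (flags == O_WRONLY) || (flags == O_RDWR);
}
```

For the two ends of a pipe, the function should report 0 for the read end and 1 for the write end.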
To perform nonblocking I/O, a program can call open with the O_NONBLOCK flag set. A program can also change an open descriptor to be nonblocking by setting the O_NONBLOCK flag, using fcntl. To set an open descriptor to perform nonblocking I/O, use the F_GETFL command with fcntl to retrieve the flags associated with the descriptor. Use inclusive bitwise OR of O_NONBLOCK with these flags to create a new flags value. Finally, set the descriptor flags to this new value, using the F_SETFL command of fcntl.
Example 4.37. setnonblock.c
The following function sets an already opened file descriptor fd for nonblocking I/O.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int setnonblock(int fd) {
        int fdflags;

        if ((fdflags = fcntl(fd, F_GETFL, 0)) == -1)
            return -1;
        fdflags |= O_NONBLOCK;               /* add the flag, keep the others */
        if (fcntl(fd, F_SETFL, fdflags) == -1)
            return -1;
        return 0;
    }

If successful, setnonblock returns 0. Otherwise, setnonblock returns –1 and sets errno.
The setnonblock function of Example 4.37 reads the current value of the flags associated with fd, performs a bitwise OR with O_NONBLOCK, and installs the modified flags. After this function executes, a read from fd returns immediately if no input is available.
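The effect described above can be observed on an empty pipe. The sketch below is illustrative only, and the function name empty_pipe_read_would_block is an assumption. It repeats the F_GETFL and F_SETFL steps of Example 4.37 inline so that the block stands alone, and then attempts a read before any data has been written.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if a nonblocking read on an empty pipe fails with EAGAIN,
   0 if it behaves differently, and -1 if the setup itself fails. */
int empty_pipe_read_would_block(void) {
    int fds[2];
    int flags, result;
    char c;
    ssize_t n;

    if (pipe(fds) == -1)
        return -1;
    if ((flags = fcntl(fds[0], F_GETFL, 0)) == -1 ||
        fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    n = read(fds[0], &c, 1);            /* nothing has been written yet */
    result = (n == -1 && errno == EAGAIN);
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Without the O_NONBLOCK flag, the same read would block indefinitely, because the write end of the pipe is still open and no data ever arrives.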
Example 4.38. setblock.c
The following function changes the I/O mode associated with file descriptor fd to blocking by clearing the O_NONBLOCK file flag. To clear the flag, use bitwise AND with the complement of the O_NONBLOCK flag.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int setblock(int fd) {
        int fdflags;

        if ((fdflags = fcntl(fd, F_GETFL, 0)) == -1)
            return -1;
        fdflags &= ~O_NONBLOCK;              /* clear the flag, keep the others */
        if (fcntl(fd, F_SETFL, fdflags) == -1)
            return -1;
        return 0;
    }

If successful, setblock returns 0. If unsuccessful, setblock returns –1 and sets errno.
Example 4.39. process_or_do_work.c
The following function assumes that fd1 and fd2 are open for reading in nonblocking mode. If input is available from either one, the function calls docommand with the data read. Otherwise, the code calls dosomething. This implementation gives priority to fd1 and always handles input from this file descriptor before handling fd2.

    #include <errno.h>
    #include <unistd.h>
    #include "restart.h"

    void docommand(char *, int);
    void dosomething(void);

    void process_or_do_work(int fd1, int fd2) {
        char buf[1024];
        ssize_t bytesread;

        for ( ; ; ) {
            bytesread = r_read(fd1, buf, sizeof(buf));
            if ((bytesread == -1) && (errno != EAGAIN))
                return;                            /* a real error on fd1 */
            else if (bytesread > 0) {
                docommand(buf, bytesread);
                continue;
            }
            bytesread = r_read(fd2, buf, sizeof(buf));
            if ((bytesread == -1) && (errno != EAGAIN))
                return;                            /* a real error on fd2 */
            else if (bytesread > 0)
                docommand(buf, bytesread);
            else
                dosomething();     /* input not available, do something else */
        }
    }

Sometimes multiple processes need to output to the same log file. Problems can arise if one process loses the CPU while it is outputting to the log file and another process tries to write to the same file. The messages could get interleaved, making the log file unreadable. We use the term atomic logging to mean that multiple writes of one process to the same file are not mixed up with the writes of other processes writing to the same file.
This exercise describes a series of experiments to help you understand the issues involved when multiple processes try to write to the same file. We then introduce an atomic logging library and provide a series of examples of how to use the library. Appendix D.1 describes the actual implementation of this library, which is used in several places throughout the book as a tool for debugging programs.
The experiments in this section are based on Program 3.1, which creates a chain of processes. Program 4.19 modifies Program 3.1 so that the original process opens a file before creating the children. Each child writes a message to the file instead of to standard error. Each message is written in two pieces. Since the processes share an entry in the system file table, they share the file offset. Each time a process writes to the file, the file offset is updated.
Exercise 4.40.
Run Program 4.19 several times and see if it generates output in the same order each time. Can you tell which parts of the output came from each process?
Answer:
On most systems, the output appears in the same order for most runs and each process generates a single line of output. However, this outcome is not guaranteed by the program. It is possible (though unlikely) for one process to lose the CPU before both parts of its output are written to the file. In this case, the output is jumbled.
Program 4.19. chainopenfork.c
A program that opens a file before creating a chain of processes.

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/stat.h>

    #define BUFSIZE 1024
    #define CREATE_FLAGS (O_WRONLY | O_CREAT | O_TRUNC)
    #define CREATE_PERMS (S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH)

    int main (int argc, char *argv[]) {
        char buf[BUFSIZE];
        pid_t childpid = 0;
        int fd;
        int i, n;

        if (argc != 3) {   /* check for valid number of command-line arguments */
            fprintf(stderr, "Usage: %s processes filename\n", argv[0]);
            return 1;
        }
                                       /* open the log file before the fork */
        fd = open(argv[2], CREATE_FLAGS, CREATE_PERMS);
        if (fd < 0) {
            perror("Failed to open file");
            return 1;
        }
        n = atoi(argv[1]);                        /* create a process chain */
        for (i = 1; i < n; i++)
            if ((childpid = fork()))   /* parent (or failed fork) leaves loop */
                break;
        if (childpid == -1) {
            perror("Failed to fork");
            return 1;
        }
                                     /* write twice to the common log file */
        sprintf(buf, "i:%d process:%ld ", i, (long)getpid());
        write(fd, buf, strlen(buf));
        sprintf(buf, "parent:%ld child:%ld\n", (long)getppid(), (long)childpid);
        write(fd, buf, strlen(buf));
        return 0;
    }

Exercise 4.41.
Put sleep(1); after the first write function in Program 4.19 and run it again. Now what happens?
Answer:
Most likely, each process outputs the values of the first two integers and then each process outputs the last two integers.
Exercise 4.42.
Copy chainopenfork.c to a file called chainforkopen.c and move the code to open the file after the loop that forks the children. How does the behavior of chainforkopen.c differ from that of chainopenfork.c?
Answer:
Each process now has a different system file table entry, and so each process has a different file offset. Because of O_TRUNC, each open deletes what was previously written to the file. Each process starts writing from the beginning of the file, overwriting what the other processes have written. The last process to write has control of the final file contents.
Exercise 4.43.
Run chainforkopen several times and see if it generates the same order of the output each time. Which process was executed last? Do you see anything unusual about the contents of the file?
Answer:
The process that outputs last may be different on different systems. If the last process writes fewer bytes than another process, the file contains additional bytes after the line written by the last process.
If independent processes open the same log file, the results might be similar to that of Exercise 4.43. The last process to output overwrites what was previously written. One way to try to solve this problem is to call lseek to move to the end of the file before writing.
Exercise 4.44.
Copy chainforkopen.c to a file called chainforkopenseek.c. Add code before each write to perform lseek to the end of the file. Also, remove the O_TRUNC flag from CREATE_FLAGS. Run the program several times and observe the behavior. Use a different file name each time.
Answer:
The lseek operation works as long as the process does not lose the CPU between lseek and write. For fast machines, you may have to run the program many times to observe this behavior. You can increase the likelihood of creating mixed-up output by putting sleep(1); between lseek and write.
If a file is opened with the O_APPEND flag, all writes through that descriptor automatically go to the end of the file.
Exercise 4.45.
Copy chainforkopen.c to a file called chainforkappend.c. Modify the CREATE_FLAGS constant by replacing O_TRUNC with O_APPEND. Run the program several times, possibly inserting sleep(1) between the write calls. What happens?
Answer:
The O_APPEND flag solves the problem of processes overwriting the log entries of other processes, but it does not prevent the individual pieces written by one process from being mixed up with the pieces of another.
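The difference that O_APPEND makes can be reproduced inside one process, because every call to open creates its own system file table entry with its own offset. The following sketch stands in for two independent processes by opening the same file twice. The function name size_after_two_writers and the scratch-file handling are assumptions for illustration.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Two independent opens of path each write one byte. Returns the final
   file size, or -1 on error. extra_flags is either 0 or O_APPEND. */
long size_after_two_writers(const char *path, int extra_flags) {
    int flags = O_WRONLY | O_CREAT | extra_flags;
    int fd1, fd2;
    struct stat sb;
    long size = -1;

    unlink(path);                              /* start from an empty file */
    fd1 = open(path, flags, S_IRUSR | S_IWUSR);
    fd2 = open(path, flags, S_IRUSR | S_IWUSR);
    if (fd1 != -1 && fd2 != -1 &&
        write(fd1, "A", 1) == 1 &&             /* offset of fd1 becomes 1 */
        write(fd2, "B", 1) == 1 &&             /* fd2 has its own offset */
        fstat(fd1, &sb) == 0)
        size = (long)sb.st_size;
    if (fd1 != -1)
        close(fd1);
    if (fd2 != -1)
        close(fd2);
    unlink(path);                              /* remove the scratch file */
    return size;
}
```

Without O_APPEND, the second writer starts at offset 0 and overwrites the first byte, so the file ends up 1 byte long. With O_APPEND, each write goes to the current end of the file and the size is 2.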
Exercise 4.46.
Copy chainforkappend.c to a file called chainforkonewrite.c. Combine the pair of sprintf calls so that the program uses a single write call to output its information. How does the program behave?
Answer:
The output is no longer interleaved.
Exercise 4.47.
Copy chainforkonewrite.c to a file called chainforkfprintf.c. Replace open with a corresponding fopen function. Replace the single write with fprintf. How does the program behave?
Answer:
The fprintf operation causes the output to be written to a buffer in the user area. Eventually, the I/O subsystem calls write to output the contents of the buffer. You have no control over when write is called except that you can force a write operation by calling fflush. Process output can be interleaved if the buffer fills in the middle of the fprintf operation. Adding sleep(1); shouldn’t cause the problem to occur more or less often.
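The buffering described in this answer can be made visible by checking the size of the file before and after fflush. The following sketch assumes a hypothetical function named sizes_around_fflush and a short message that fits in the stdio buffer. It requests full buffering explicitly with setvbuf rather than relying on the default for the stream.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Writes msg with fprintf and records the file size before and after
   fflush. Returns 0 on success and -1 on error. */
int sizes_around_fflush(const char *path, const char *msg,
                        long *before, long *after) {
    FILE *fp;
    struct stat sb;
    int result = -1;

    if ((fp = fopen(path, "w")) == NULL)
        return -1;
    /* request full buffering explicitly rather than rely on the default */
    if (setvbuf(fp, NULL, _IOFBF, BUFSIZ) == 0 &&
        fprintf(fp, "%s", msg) >= 0 &&
        stat(path, &sb) == 0) {
        *before = (long)sb.st_size;     /* data is still in the user buffer */
        if (fflush(fp) == 0 && stat(path, &sb) == 0) {
            *after = (long)sb.st_size;  /* fflush forced the write */
            result = 0;
        }
    }
    fclose(fp);
    unlink(path);                       /* remove the scratch file */
    return result;
}
```

On typical implementations, a short message leaves the file empty until fflush is called, which is why the timing of the underlying write is outside the control of a program that uses fprintf alone.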
To make an atomic logger, we have to use a single write call to output information that we want to appear together in the log. The file must be opened with the O_APPEND flag. Here is the statement about the O_APPEND flag from the write man page that guarantees that the writing is atomic if we use the O_APPEND flag.

If the O_APPEND flag of the file status flags is set, the file offset will be set to the end of the file prior to each write and no intervening file modification operation will occur between changing the file offset and the write operation.
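Combining the two requirements, one write call and the O_APPEND flag, might look like the following sketch. The function name append_record, the RECORD_MAX limit and the record layout (modeled on the output of Program 4.19) are assumptions for illustration. The record is formatted completely in memory before anything is written.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define RECORD_MAX 256                  /* assumed upper bound on a record */

/* Hypothetical helper: formats one record and emits it with one write.
   Returns 0 on success and -1 on error or if the record does not fit. */
int append_record(const char *path, int i, long parent, long child) {
    char buf[RECORD_MAX];
    int fd, len;
    ssize_t written;

    len = snprintf(buf, sizeof(buf), "i:%d process:%ld parent:%ld child:%ld\n",
                   i, (long)getpid(), parent, child);
    if (len < 0 || (size_t)len >= sizeof(buf))
        return -1;
    fd = open(path, O_WRONLY | O_CREAT | O_APPEND, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;
    written = write(fd, buf, (size_t)len);  /* the only write for this record */
    close(fd);
    return (written == (ssize_t)len) ? 0 : -1;
}
```

Opening and closing the file for every record keeps the sketch short. A real logger would more likely keep the descriptor open between records.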
In the examples given here, it is simple to combine everything into a single call to write, but later we encounter situations in which it is more difficult. Appendix D.1 contains a complete implementation of a module that can be used with a program in which atomic logging is needed. A program using this module should include Program 4.20, which contains the prototypes for the publicly accessible functions. Note that the interface is simple and the implementation details are completely hidden from the user.
Program 4.20. atomic_logger.h
The include file for the atomic logging module.

    int atomic_log_array(char *s, int len);
    int atomic_log_clear();
    int atomic_log_close();
    int atomic_log_open(char *fn);
    int atomic_log_printf(char *fmt, ...);
    int atomic_log_send();
    int atomic_log_string(char *s);

The atomic logger allows you to control how the output of programs that are running on the same machine is interspersed in a log file. To use the logger, first call atomic_log_open to create the log file. Call atomic_log_close when all logging is completed. The logger stores in a temporary buffer items written with atomic_log_array, atomic_log_string and atomic_log_printf. When the program calls atomic_log_send, the logger outputs the entire buffer, using a single write call, and frees the temporary buffers. The atomic_log_clear operation frees the temporary buffers without actually outputting to the log file. Each function in the atomic logging library returns 0 if successful. If unsuccessful, these functions return –1 and set errno.
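The buffer-then-send idea can be sketched in a few lines. The code below is not the implementation of Appendix D.1. It is a deliberately reduced illustration with hypothetical mini_log_ names, a fixed-size buffer (an assumption) and no attempt to set errno when the buffer is full. It shows only the essential point: nothing reaches the file until the send operation issues one write on a descriptor opened with O_APPEND.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define MINI_LOG_MAX 4096               /* assumed fixed buffer size */

static int mini_fd = -1;
static char mini_buf[MINI_LOG_MAX];
static size_t mini_len = 0;

int mini_log_open(const char *fn) {
    mini_fd = open(fn, O_WRONLY | O_CREAT | O_APPEND, S_IRUSR | S_IWUSR);
    return (mini_fd == -1) ? -1 : 0;
}

int mini_log_string(const char *s) {    /* buffer the text, no output yet */
    size_t n = strlen(s);

    if (n > MINI_LOG_MAX - mini_len)
        return -1;                      /* would overflow the buffer */
    memcpy(mini_buf + mini_len, s, n);
    mini_len += n;
    return 0;
}

int mini_log_clear(void) {              /* discard without writing */
    mini_len = 0;
    return 0;
}

int mini_log_send(void) {               /* one write for everything buffered */
    ssize_t written;

    if (mini_fd == -1)
        return -1;
    if (mini_len == 0)
        return 0;
    written = write(mini_fd, mini_buf, mini_len);
    if (written != (ssize_t)mini_len)
        return -1;
    mini_len = 0;
    return 0;
}

int mini_log_close(void) {
    int result = (mini_fd == -1) ? -1 : close(mini_fd);

    mini_fd = -1;
    mini_len = 0;
    return result;
}
```

A caller would open the log once, add any number of strings, and call mini_log_send at the point where the accumulated pieces should appear together in the file.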
The atomic logging facility provides three formats for writing to the log. Use atomic_log_array to write an array of a known number of bytes. Use atomic_log_string to log a string. Alternatively, you can use atomic_log_printf with a syntax similar to fprintf. Program 4.21 shows a version of the process chain that uses the first two forms for output to the atomic logger.
Exercise 4.48.
How would you modify Program 4.21 to use atomic_log_printf?
Answer:
Eliminate the buf array and replace the four lines of code involving sprintf, atomic_log_array and atomic_log_string with the following.

    atomic_log_printf("i:%d process:%ld ", i, (long)getpid());
    atomic_log_printf("parent:%ld child ID:%ld\n", (long)getppid(), (long)childpid);

Alternatively, use the following single call.

    atomic_log_printf("i:%d process:%ld parent:%ld child:%ld\n",
                      i, (long)getpid(), (long)getppid(), (long)childpid);

Program 4.21. chainforkopenlog.c
A program that uses the atomic logging module of Appendix D.1.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include "atomic_logger.h"

    #define BUFSIZE 1024

    int main (int argc, char *argv[]) {
        char buf[BUFSIZE];
        pid_t childpid = 0;
        int i, n;

        if (argc != 3) {   /* check for valid number of command-line arguments */
            fprintf(stderr, "Usage: %s processes filename\n", argv[0]);
            return 1;
        }
        n = atoi(argv[1]);                        /* create a process chain */
        for (i = 1; i < n; i++)
            if ((childpid = fork()))   /* parent (or failed fork) leaves loop */
                break;
        if (childpid == -1) {
            perror("Failed to fork");
            return 1;
        }
        if (atomic_log_open(argv[2]) == -1) {       /* open atomic log file */
            fprintf(stderr, "Failed to open log file");
            return 1;
        }
                              /* log the output, using two different forms */
        sprintf(buf, "i:%d process:%ld", i, (long)getpid());
        atomic_log_array(buf, strlen(buf));
        sprintf(buf, " parent:%ld child:%ld\n", (long)getppid(), (long)childpid);
        atomic_log_string(buf);
        if (atomic_log_send() == -1) {
            fprintf(stderr, "Failed to send to log file");
            return 1;
        }
        atomic_log_close();
        return 0;
    }

Exercise 4.49.
Modify Program 4.19 to open an atomic log file after forking the children. (Do not remove the other open function call.) Repeat Exercise 4.40 through Exercise 4.47 after adding code to output the same information to the atomic logger as to the original file. Compare the output of the logger with the contents of the file.
Exercise 4.50.
What happens if Program 4.19 opens the log file before forking the children?
Answer:
Logging should still be atomic. However, if the parent writes information to the log and doesn’t clear it before the fork, the children have a copy of this information in their logging buffers.
Another logging interface that is useful for debugging concurrent programs is the remote logging facility described in detail in Appendix D.2. Instead of logging information being sent to a file, it is sent to another process that has its own environment for displaying and saving the logged information. The remote logging process has a graphical user interface that allows the user to display the log. The remote logger does not have a facility for gathering information from a process to be displayed in a single block in the log file, but it allows logging from processes on multiple machines.
The cat utility has the following POSIX specification[52].

    NAME
        cat - concatenate and print files

    SYNOPSIS
        cat [-u] [file ...]

    DESCRIPTION
        The cat utility shall read files in sequence and shall write their
        contents to the standard output in the same sequence.

    OPTIONS
        The cat utility shall conform to the Base Definitions volume of
        IEEE Std 1003.1-2001, Section 12.2, Utility Syntax Guidelines.

        The following option shall be supported:

        -u    Write bytes from the input file to the standard output
              without delay as each is read

    OPERANDS
        The following operand shall be supported:

        file  A pathname of an input file. If no file operands are
              specified, the standard input shall be used. If a file is
              '-', the cat utility shall read from the standard input at
              that point in the sequence. The cat utility shall not close
              and reopen standard input when it is referenced in this way,
              but shall accept multiple occurrences of '-' as a file
              operand.

    STDIN
        The standard input shall be used only if no file operands are
        specified, or if a file operand is '-'. See the INPUT FILES section.

    INPUT FILES
        The input files can be any file type.

    ENVIRONMENT VARIABLES
        (.... a long section omitted here ....)

    ASYNCHRONOUS EVENTS
        Default.

    STDOUT
        The standard output shall contain the sequence of bytes read from
        the input files. Nothing else shall be written to the standard
        output.

    STDERR
        The standard error shall be used only for diagnostic messages.

    OUTPUT FILES
        None.

    EXTENDED DESCRIPTION
        None.

    EXIT STATUS
        The following exit values shall be returned:

        0     All input files were output successfully.
        >0    An error occurred.

    CONSEQUENCES OF ERRORS
        Default.

The actual POSIX description continues with other sections, including APPLICATION USAGE, EXAMPLES and RATIONALE.
Compare the POSIX description of cat with the man page for cat on your system and note any differences.
Execute the cat command for many examples, including multiple input files and files that don’t exist. Include a case in which you redirect standard input to a disk file and use several '-' files on the command line. Explain what happens.
Write your own cat utility to conform to the standard. Try to duplicate the behavior of the actual cat utility.
Read the section of the cat man page on ENVIRONMENT VARIABLES.
Experiment with the effect of relevant environment variables on the behavior of cat.
Incorporate the handling of environment variables into your own cat utility.
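One possible starting point for the utility is a loop that copies bytes from one descriptor to another. The sketch below is only a fragment under stated assumptions: the function name copy_fd and the BLKSIZE constant are hypothetical, and option parsing, the '-' operand, diagnostics and exit status are left to the exercise. It restarts read and write when they are interrupted by a signal and handles partial writes.

```c
#include <errno.h>
#include <unistd.h>

#define BLKSIZE 4096                    /* assumed transfer size */

/* Hypothetical core loop for a cat-like utility: copies from one descriptor
   to another until end-of-file. Returns bytes copied or -1 on error. */
long copy_fd(int from, int to) {
    char buf[BLKSIZE];
    long total = 0;
    ssize_t nread, nwritten;
    char *p;

    for ( ; ; ) {
        nread = read(from, buf, sizeof(buf));
        if (nread == -1 && errno == EINTR)
            continue;                   /* interrupted, try again */
        if (nread == -1)
            return -1;
        if (nread == 0)
            return total;               /* end-of-file */
        for (p = buf; nread > 0; p += nwritten, nread -= nwritten) {
            nwritten = write(to, p, (size_t)nread);
            if (nwritten == -1 && errno == EINTR)
                nwritten = 0;           /* interrupted, retry same bytes */
            else if (nwritten == -1)
                return -1;
            total += nwritten;
        }
    }
}
```

A main function built around this loop would call copy_fd once per file operand, passing STDIN_FILENO when the operand is '-' or when no operands are given, and STDOUT_FILENO as the destination.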
Advanced Programming in the UNIX Environment by Stevens [112] has an extensive discussion of UNIX I/O from a programmer’s viewpoint. Many books on Linux or UNIX programming also cover I/O. The USENIX Conference Proceedings are a good source of current information on tools and approaches evolving under UNIX.