Chapter 2. Programs, Processes and Threads

One popular definition of a process is an instance of a program whose execution has started but has not yet terminated. This chapter discusses the differences between programs and processes and the ways in which the former are transformed into the latter. The chapter addresses issues of program layout, command-line arguments, program environment and exit handlers.

How a Program Becomes a Process

A program is a prepared sequence of instructions to accomplish a defined task. To write a C source program, a programmer creates disk files containing C statements that are organized into functions. An individual C source file may also contain variable and function declarations, type and macro definitions (e.g., typedef) and preprocessor commands (e.g., #ifdef, #include, #define). The source program contains exactly one main function.

Traditionally, C source filenames have a .c extension, and header filenames have a .h extension. Header files usually only contain macro and type definitions, defined constants and function declarations. Use the #include preprocessor command to insert the contents of a header file into the source.

The C compiler translates each source file into an object file. The compiler then links the individual object files with the necessary libraries to produce an executable module. When a program is run or executed, the operating system copies the executable module into a program image in main memory.

A process is an instance of a program that is executing. Each instance has its own address space and execution state. When does a program become a process? The operating system reads the program into memory. The allocation of memory for the program image is not enough to make the program a process. The process must have an ID (the process ID) so that the operating system can distinguish among individual processes. The process state indicates the execution status of an individual process. The operating system keeps track of the process IDs and corresponding process states and uses the information to allocate and manage resources for the system. The operating system also manages the memory occupied by the processes and the memory available for allocation.

When the operating system has added the appropriate information in the kernel data structures and has allocated the necessary resources to run the program code, the program has become a process. A process has an address space (memory it can access) and at least one flow of control called a thread. The variables of a process can either remain in existence for the life of the process (static storage) or be automatically allocated when execution enters a block and deallocated when execution leaves the block (automatic storage). Appendix A.5 discusses C storage classes in detail.

A process starts with a single flow of control that executes a sequence of instructions. The processor program counter keeps track of the next instruction to be executed by that processor (CPU). The CPU increments the program counter after fetching an instruction and may further modify it during the execution of the instruction, for example, when a branch occurs. Multiple processes may reside in memory and execute concurrently, almost independently of each other. For processes to communicate or cooperate, they must explicitly interact through operating system constructs such as the filesystem (Section 5.1), pipes (Section 6.1), shared memory (Section 15.3) or a network (Chapters 18-22).

Threads and Thread of Execution

When a program executes, the value of the process program counter determines which process instruction is executed next. The resulting stream of instructions, called a thread of execution, can be represented by the sequence of instruction addresses assigned to the program counter during the execution of the program’s code.

Example 2.1. 

Process 1 executes statements 245, 246 and 247 in a loop. Its thread of execution can be represented as 245₁, 246₁, 247₁, 245₁, 246₁, 247₁, 245₁, 246₁, 247₁ . . . , where the subscripts identify the thread of execution as belonging to process 1.

The sequence of instructions in a thread of execution appears to the process as an uninterrupted stream of addresses. From the point of view of the processor, however, the threads of execution from different processes are intermixed. The point at which execution switches from one process to another is called a context switch.

Example 2.2. 

Process 1 executes its statements 245, 246 and 247 in a loop as in Example 2.1, and process 2 executes its statements 10, 11, 12 . . . . The CPU executes instructions in the order 245₁, 246₁, 247₁, 245₁, 246₁, [context-switch instructions], 10₂, 11₂, 12₂, 13₂, [context-switch instructions], 247₁, 245₁, 246₁, 247₁ . . . . Context switches occur between 246₁ and 10₂ and between 13₂ and 247₁. The processor sees the threads of execution interleaved, whereas the individual processes see uninterrupted sequences.

A natural extension of the process model allows multiple threads to execute within the same process. Switching among threads of the same process avoids much of the overhead of a full process context switch, and the threads share code and data. The approach may improve program performance on machines with multiple processors. Programs with natural parallelism in the form of independent tasks operating on shared data can take advantage of the added execution power of these multiprocessor machines. Operating systems have significant natural parallelism and perform better by having multiple, simultaneous threads of execution. Vendors advertise symmetric multiprocessing support in which the operating system and applications have multiple undistinguished threads of execution that take advantage of parallel hardware.

A thread is an abstract data type that represents a thread of execution within a process. A thread has its own execution stack, program counter value, register set and state. By declaring many threads within the confines of a single process, a programmer can write programs that achieve parallelism with low overhead. While these threads provide low-overhead parallelism, they may require additional synchronization because they reside in the same process address space and therefore share process resources. Some people call processes heavyweight because of the work needed to start them. In contrast, threads are sometimes called lightweight processes.
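The following sketch is not from the original text; it illustrates why threads that share an address space need synchronization. Several POSIX threads increment one counter that lives in the process, and a mutex serializes the updates. The names count_with_threads, increment, struct shared_counter and the limit MAXTHREADS are hypothetical, and some systems require compiling with -pthread.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define MAXTHREADS 16          /* arbitrary limit chosen for this sketch */

/* State shared by all threads of the process: one counter, one mutex. */
struct shared_counter {
   pthread_mutex_t lock;
   long value;
   int iterations;             /* increments performed by each thread */
};

static void *increment(void *arg) {
   struct shared_counter *sc = arg;
   int i;

   for (i = 0; i < sc->iterations; i++) {
      pthread_mutex_lock(&sc->lock);    /* serialize updates to the shared value */
      sc->value++;
      pthread_mutex_unlock(&sc->lock);
   }
   return NULL;
}

/* Runs nthreads threads that all update the same counter and stores the
   final count in *result. Like the POSIX thread functions it calls, it
   returns 0 on success or an error number, rather than setting errno. */
int count_with_threads(int nthreads, int iterations, long *result) {
   pthread_t tids[MAXTHREADS];
   struct shared_counter sc;
   int created;
   int error;
   int i;

   if ((nthreads < 1) || (nthreads > MAXTHREADS) || (result == NULL))
      return EINVAL;
   sc.value = 0;
   sc.iterations = iterations;
   if ((error = pthread_mutex_init(&sc.lock, NULL)) != 0)
      return error;
   for (created = 0; created < nthreads; created++)
      if ((error = pthread_create(&tids[created], NULL, increment, &sc)) != 0)
         break;
   for (i = 0; i < created; i++)        /* wait for every thread that started */
      pthread_join(tids[i], NULL);
   pthread_mutex_destroy(&sc.lock);
   *result = sc.value;
   return error;
}
```

For example, count_with_threads(4, 10000, &total) should leave 40000 in total. Without the mutex, concurrent increments of the shared value could be lost.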

Layout of a Program Image

After loading, the program executable appears to occupy a contiguous block of memory called a program image. Figure 2.1 shows a sample layout of a program image in its logical address space [112]. The program image has several distinct sections. The program text or code is shown in low-order memory. The initialized and uninitialized static variables have their own sections in the image. Other sections include the heap, stack and environment.

Figure 2.1. Sample layout for a program image in main memory.

An activation record is a block of memory allocated on the top of the process stack to hold the execution context of a function during a call. Each function call creates a new activation record on the stack. The activation record is removed from the stack when the function returns, providing the last-called-first-returned order for nested function calls.

The activation record contains the return address, the parameters (whose values are copied from the corresponding arguments), status information and a copy of some of the CPU register values at the time of the call. The process restores the register values on return from the call represented by the record. The activation record also contains automatic variables that are allocated within the function while it is executing. The particular format for an activation record depends on the hardware and on the programming language.
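A small sketch, not part of the original text and assuming only standard C: each recursive call receives its own activation record, so the automatic variable local in one call is a different object from local in its caller, and the caller's copy is unaffected. The function frames_are_distinct is hypothetical. Because the actual stack layout is implementation dependent, the sketch compares only addresses of objects that are alive at the same time.

```c
#include <stddef.h>

/* Returns 1 if each of the depth+1 nested calls had its own copy of local.
   callers_local is NULL for the outermost call. */
int frames_are_distinct(int depth, const int *callers_local) {
   int local = depth;                 /* automatic: lives in this call's record */

   if ((callers_local != NULL) && (&local == callers_local))
      return 0;                       /* two calls would be sharing storage */
   if ((callers_local != NULL) && (*callers_local != depth + 1))
      return 0;                       /* the caller's copy must be untouched */
   if (depth == 0)
      return 1;
   return frames_are_distinct(depth - 1, &local);
}
```

A call such as frames_are_distinct(5, NULL) should return 1.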

In addition to the static and automatic variables, the program image contains space for argc and argv and for allocations by malloc. The malloc family of functions allocates storage from a free memory pool called the heap. Storage allocated on the heap persists until it is freed or until the program exits. If a function calls malloc, the storage remains allocated after the function returns. The program cannot access the storage after the return unless it has a pointer to the storage that is accessible after the function returns.
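The following sketch (the name copy_to_heap is hypothetical; POSIX.1-2008 also provides strdup for this job) returns a pointer to storage allocated with malloc. The storage outlives the call because it is on the heap; returning the address of an automatic array instead would leave the caller with a pointer into an activation record that no longer exists.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of s that remains valid after this function returns.
   Returns NULL if malloc fails. The caller must free the result. */
char *copy_to_heap(const char *s) {
   char *copy;
   size_t len = strlen(s) + 1;        /* include the terminating '\0' */

   if ((copy = malloc(len)) == NULL)
      return NULL;
   memcpy(copy, s, len);
   return copy;         /* the returned pointer keeps the storage reachable */
}
```

The caller owns the returned string and must eventually free it; otherwise each call leaks len bytes.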

Static variables that are not explicitly initialized in their declarations are initialized to 0 before the program starts executing. Notice that the initialized static variables and the uninitialized static variables occupy different sections in the program image. Typically, the initialized static variables are part of the executable module on disk, but the uninitialized static variables are not. Of course, the automatic variables are not part of the executable module because they are only allocated when execution enters their defining block. The initial values of automatic variables are indeterminate unless the program explicitly initializes them.
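A minimal sketch of static storage class, with the hypothetical name next_sequence_number: counter has no initializer, so it starts at 0, and it keeps its value between calls. An automatic variable declared the same way would have an indeterminate value on each entry to the function.

```c
/* Returns 1 on the first call, 2 on the second, and so on. */
int next_sequence_number(void) {
   static int counter;        /* no initializer, so it starts at 0 */

   return ++counter;          /* the value survives from call to call */
}
```

The first call returns 1 and the second returns 2. This hidden state is exactly what makes such functions troublesome when several threads call them.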

Example 2.3. 

Use ls -l to compare the sizes of the executable modules for the following two C programs. Explain the results.

Version 1: largearrayinit.c

int myarray[50000] = {1, 2, 3, 4};

int main(void) {
   myarray[0] = 3;
   return 0;
}

Version 2: largearray.c

int myarray[50000];

int main(void) {
    myarray[0] = 3;
    return 0;
}

Answer:

The executable module for Version 1 should be about 200,000 bytes larger than that of Version 2 (50,000 elements of 4 bytes each, assuming a 4-byte int) because the myarray of Version 1 is initialized static data and is therefore part of the executable module. The myarray of Version 2 is not allocated until the program is loaded in memory, and the array elements are initialized to 0 at that time.

Static variables can make a program unsafe for threaded execution. For example, the C library function readdir and its relatives described in Section 5.2 use static variables to hold return values. The function strtok discussed in Section 2.6 uses a static variable to keep track of its progress between calls. Neither of these functions can be safely called by multiple threads within a program. In other words, they are not thread-safe. External static variables also make code more difficult to debug because successive invocations of a function that references a static variable may behave in unexpected ways. For these reasons, avoid using static variables except under controlled circumstances. Section 2.9 presents an example of when to use variables with static storage class.

Although the program image appears to occupy a contiguous block of memory, in practice, the operating system maps the program image into noncontiguous blocks of physical memory. A common mapping divides the program image into equal-sized pieces, called pages. The operating system loads the individual pages into memory and looks up the location of the page in a table when the processor references memory on that page. This mapping allows a large logical address space for the stack and heap without actually using physical memory unless it is needed. The operating system hides the existence of such an underlying mapping, so the programmer can view the program image as logically contiguous even when some of the pages do not actually reside in memory.
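The arithmetic behind the page lookup can be sketched as follows, assuming the page size reported by sysconf(_SC_PAGESIZE). The address of a variable is split into a page number and an offset within the page. The function name page_of is hypothetical, and the address is the logical address the program sees, not the physical location that the operating system chooses.

```c
#include <stdint.h>
#include <unistd.h>

/* Splits the logical address p into a page number and an offset within
   that page. Returns 0 on success, -1 if the page size is unavailable. */
int page_of(const void *p, uintptr_t *page, uintptr_t *offset) {
   long size = sysconf(_SC_PAGESIZE);
   uintptr_t addr = (uintptr_t)p;

   if (size <= 0)
      return -1;
   *page = addr / (uintptr_t)size;     /* which page holds the address */
   *offset = addr % (uintptr_t)size;   /* position inside that page */
   return 0;
}
```

For a 4096-byte page, the logical address 0x12345 is on page 0x12 at offset 0x345. The operating system translates only the page number; the offset is the same in the physical page.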

Library Function Calls

We introduce most library functions with a condensed version of their specifications; always refer to the man pages for more complete information.

The summary starts with a brief description of the function and its parameters, followed by a SYNOPSIS box giving the required header files and the function prototype. (Unfortunately, some compilers do not give warning messages if the header files are missing, so be sure to use lint as described in Appendix A to detect these problems.) The SYNOPSIS box also names the POSIX standard that specifies the function. A description of the function return values and a discussion of how the function reports errors follow the SYNOPSIS box. Here is a typical summary.

The close function deallocates the file descriptor specified by fildes.

SYNOPSIS

   #include <unistd.h>

   int close(int fildes);
                                   POSIX

If successful, close returns 0. If unsuccessful, close returns –1 and sets errno. The following table lists the mandatory errors for close.

   errno      cause
   EBADF      fildes is not valid
   EINTR      close was interrupted by a signal

This book’s summary descriptions generally include the mandatory errors. These are the errors that the standard requires that every implementation detect. We include these particular errors because they are a good indication of the major points of failure. You must handle all errors, not just the mandatory ones. POSIX often defines many other types of optional errors. If an implementation chooses to treat the specified condition as an error, then it should use the specified error value. Implementations are free to define other errors as well. When there is only one mandatory error, we describe it in a sentence. When the function has more than one mandatory error, we use a table like the one for close.

Traditional UNIX functions usually return –1 (or sometimes NULL) and set errno to indicate the error. The POSIX standards committee decided that new functions (the POSIX thread functions, for example) would not use errno and would instead return an error number directly as the function return value. We illustrate both ways of handling errors in examples throughout the text.

Example 2.4. 

The following code segment demonstrates how to call the close function.

int fildes;

if (close(fildes) == -1)
   perror("Failed to close the file");

The code assumes that the unistd.h header file has been included in the source. In general, we do not show the header files for code segments.

The perror function outputs to standard error a message corresponding to the current value of errno. If s is not NULL, perror outputs the string (an array of characters terminated by a null character) pointed to by s and followed by a colon and a space. Then, perror outputs an error message corresponding to the current value of errno followed by a newline.

SYNOPSIS

   #include <stdio.h>

   void perror(const char *s);
                                    POSIX:CX

No return values and no errors are defined for perror.

Example 2.5. 

The output produced by Example 2.4 might be as follows.

Failed to close the file: invalid file descriptor

The strerror function returns a pointer to the system error message corresponding to the error code errnum.

SYNOPSIS

   #include <string.h>

   char *strerror(int errnum);
                                   POSIX:CX

If successful, strerror returns a pointer to the error string. No values are reserved for failure.

Use strerror to produce informative messages, or use it with functions that return error codes directly without setting errno.
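As a sketch of the second use, posix_memalign is one of the newer POSIX functions that returns an error number directly instead of reporting failure through errno, so the returned value is what must be passed to strerror. The wrapper name alloc_aligned is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Allocates size bytes aligned on an alignment-byte boundary in *bufp.
   posix_memalign returns an error number rather than setting errno,
   so that number goes straight to strerror. Returns 0 or the error number. */
int alloc_aligned(void **bufp, size_t alignment, size_t size) {
   int error;

   if ((error = posix_memalign(bufp, alignment, size)) != 0)
      fprintf(stderr, "Failed to allocate %lu bytes aligned on %lu: %s\n",
              (unsigned long)size, (unsigned long)alignment, strerror(error));
   return error;
}
```

Requesting an alignment that is not a power-of-two multiple of sizeof(void *), such as 3, makes posix_memalign fail with EINVAL.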

Example 2.6. 

The following code segment uses strerror to output a more informative error message when close fails.

int fildes;

if (close(fildes) == -1)
   fprintf(stderr, "Failed to close file descriptor %d: %s\n",
                   fildes, strerror(errno));

The strerror function may change errno. You should save and restore errno if you need to use it again.

Example 2.7. 

The following code segment illustrates how to use strerror and still preserve the value of errno.

int error;
int fildes;

if (close(fildes) == -1) {
   error = errno;                           /* temporarily save errno */
   fprintf(stderr, "Failed to close file descriptor %d: %s\n",
                   fildes, strerror(errno));
   errno = error;    /* restore errno after writing the error message */
}

Correctly handling errno is a tricky business. A library function may change errno even though its man page doesn't explicitly say so, because its implementation may call other functions that set errno. Also, applications must not modify the string returned by strerror, and subsequent calls to either strerror or perror may overwrite this string.

Another common problem is that many library calls return early with an error if the process is interrupted by a signal. Functions generally report this type of return with an error code of EINTR. For example, the close function may be interrupted by a signal. In this case, the error was not due to a problem with its execution but was a result of some external factor. Usually the program should not treat this interruption as an error but should restart the call.

Example 2.8. 

The following code segment restarts the close function if a signal occurs.

int error;
int fildes;

while (((error = close(fildes)) == -1) && (errno == EINTR))  ;
if (error == -1)
   perror("Failed to close the file"); /* a real close error occurred */

The while loop of Example 2.8 has an empty statement clause. It simply calls close until it either executes successfully or encounters a real error. The problem of restarting library calls is so common that we provide a library of restarted calls with prototypes defined in restart.h. Each function name consists of the regular library name with r_ prepended. For example, the restart library designates a restarted version of close by the name r_close.
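The source of restart.h is not shown here. Assuming r_close behaves like the loop of Example 2.8, one possible implementation is the following sketch; it is not necessarily the code of the actual restart library.

```c
#include <errno.h>
#include <unistd.h>

/* Sketch of a restarting close: retry only while close fails because a
   signal interrupted it. Returns 0 on success, -1 with errno set otherwise. */
int r_close(int fildes) {
   int retval;

   while (((retval = close(fildes)) == -1) && (errno == EINTR)) ;
   return retval;
}
```

One caution: on some systems, including Linux, the descriptor is released even when close reports EINTR, and the Linux close man page advises against retrying in that case.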

Example 2.9. 

The following code segment illustrates how to use a version of close from the restart library.

#include "restart.h"     /* user-defined library not part of standard */
int fildes;

if (r_close(fildes) == -1)
   perror("Failed to close the file"); /* a true close error occurred */

Function Return Values and Errors

Error handling is a key issue in writing reliable systems programs. When you are writing a function, think in terms of that function being called millions of times by the same application. How do you want the function to behave? In general, functions should never exit on their own, but rather should always indicate an error to the calling program. This strategy gives the caller an opportunity to recover or to shut down gracefully.

Functions should also not make unexpected changes to the process state that persist beyond the return from the function. For example, if a function blocks signals, it should restore the signal mask to its previous value before returning.

Finally, the function should release all the hidden resources that it uses during its execution. Suppose a function allocates a temporary buffer by calling malloc and does not free it before returning. One call to this function may not cause a problem, but hundreds or thousands of successive calls may cause the process memory usage to exceed its limits. Usually, a function that allocates memory should either free the memory or make a pointer available to the calling program. Otherwise, a long-running program may have a memory leak; that is, memory “leaks” out of the system and is not available until the process terminates.

You should also be aware that the failure of a library function usually does not cause your program to stop executing. Instead, the program continues, possibly using inconsistent or invalid data. You must examine the return value of every library function that can return an error that affects the running of your program, even if you think the chance of such an error occurring is remote.

Your own functions should also engage in careful error handling and communication. Standard approaches to handling errors in UNIX programs include the following.

  • Print out an error message and exit the program (only in main).

  • Return –1 or NULL, and set an error indicator such as errno.

  • Return an error code.

In general, functions should never exit on their own but should always report an error to the calling program. Error messages within a function may be useful during the debugging phase but generally should not appear in the final version. A good way to handle debugging is to enclose debugging print statements in a conditional compilation block so that you can reactivate them if necessary.

Example 2.10. 

The following code segment shows an example of how to use conditional compilation for error messages in functions.

    #define DEBUG    /* comment this line out for no error messages */

    int myfun(int x) {
       x++;
    #ifdef DEBUG
       fprintf(stderr, "The current value of x is %d\n", x);
    #endif
       return x;
    }

If you comment the #define line out, the fprintf statement is not compiled and myfun does no printing. Alternatively, you can leave the #define out of the code completely and define DEBUG on the compiler line as follows.

cc -DDEBUG ...

Most library functions provide good models for implementing functions. Here are guidelines to follow.

  1. Make use of return values to communicate information and to make error trapping easy for the calling program.

  2. Do not exit from functions. Instead, return an error value to allow the calling program flexibility in handling the error.

  3. Make functions general but usable. (Sometimes these are conflicting goals.)

  4. Do not make unnecessary assumptions about sizes of buffers. (This is often hard to implement.)

  5. When it is necessary to use limits, use standard system-defined limits rather than arbitrary constants.

  6. Do not reinvent the wheel—use standard library functions when possible.

  7. Do not modify input parameter values unless it makes sense to do so.

  8. Do not use static variables or dynamic memory allocation if automatic allocation will do just as well.

  9. Analyze all the calls to the malloc family to make sure the program frees the memory that was allocated.

  10. Consider whether a function is ever called recursively or from a signal handler or from a thread. Functions with variables of static storage class may not behave in the desired way. (The error number can cause a big problem here.)

  11. Analyze the consequences of interruptions by signals.

  12. Carefully consider how the entire program terminates.

Argument Arrays

A command line consists of tokens (the arguments) that are separated by white space: blanks, tabs or a backslash (\) at the end of a line. Each token is a string of characters containing no white space unless quotation marks are used to group tokens. When a user enters a command line corresponding to a C executable program, the shell parses the command line into tokens and passes the result to the program in the form of an argument array. An argument array is an array of pointers to strings. The end of the array is marked by an entry containing a NULL pointer. Argument arrays are also useful for handling a variable number of arguments in calls to execvp and for handling environment variables. (Refer to Section 3.5 for an example of their application.)

Example 2.11. 

The following command line contains the four tokens: mine, -c, 10 and 2.0.

mine -c 10 2.0

The first token on a command line is the name of the command or executable. Figure 2.2 shows the argument array for the command line of Example 2.11.

Figure 2.2. The argv array for the call mine -c 10 2.0.

Example 2.12. 

The mine program of Example 2.11 might start with the following line.

int main(int argc, char *argv[])

In Example 2.12, the argc parameter contains the number of command-line tokens or arguments (four for Example 2.11), and argv is an array of pointers to the command-line tokens. The argv is an example of an argument array.
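The C standard guarantees that argv[argc] is a null pointer, so argc can be recomputed by walking the array to its terminating NULL, as in this sketch with the hypothetical name count_args.

```c
#include <stddef.h>

/* Counts the entries of an argument array up to its terminating NULL.
   For the argv passed to main, the result equals argc. */
int count_args(char *const argv[]) {
   int n = 0;

   while (argv[n] != NULL)
      n++;
   return n;
}
```

For the argument array of Figure 2.2, count_args returns 4, the same value that main receives in argc.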

Creating an argument array with makeargv

This section develops a function, makeargv, that creates an argument array from a string of tokens. The makeargv function illustrates some complications introduced by static variables. We use this function in several projects and exercises of subsequent chapters.

Example 2.13. 

Here is a prototype for a makeargv function that creates an argument array from a string of tokens.

char **makeargv(char *s);

The makeargv of Example 2.13 has a string input parameter and returns a pointer to an argv array. If the call fails, makeargv returns a NULL pointer.

Example 2.14. 

The following code segment illustrates how the makeargv function of Example 2.13 might be invoked.

int i;
char **myargv;
char mytest[] = "This is a test";

if ((myargv = makeargv(mytest)) == NULL)
   fprintf(stderr, "Failed to construct an argument array\n");
else
   for (i = 0; myargv[i] != NULL; i++)
      printf("%d:%s\n", i, myargv[i]);

Example 2.15. 

The following alternative prototype specifies that makeargv should pass the argument array as a parameter. This alternative version of makeargv returns an integer giving the number of tokens in the input string. In this case, makeargv returns –1 to indicate an error.

int makeargv(char *s, char ***argvp);

Example 2.16. 

The following code segment calls the makeargv function defined in Example 2.15.

int i;
char **myargv;
char mytest[] = "This is a test";
int numtokens;

if ((numtokens = makeargv(mytest, &myargv)) == -1)
   fprintf(stderr, "Failed to construct an argument array\n");
else
   for (i = 0; i < numtokens; i++)
      printf("%d:%s\n", i, myargv[i]);

Because C uses call-by-value parameter passing, makeargv cannot change the caller's myargv unless it receives the address of myargv. The prototype of Example 2.15 therefore has one more level of indirection (char *** rather than char **), and Example 2.16 passes &myargv. A more general version of makeargv allows an extra parameter that represents the set of delimiters to use in parsing the string.

Example 2.17. 

The following prototype shows a makeargv function that has a delimiter set parameter.

int makeargv(const char *s, const char *delimiters, char ***argvp);

The const qualifier means that the function does not modify the memory pointed to by the first two parameters.

Program 2.1 calls the makeargv function of Example 2.17 to create an argument array from a string passed on the command line. The program checks that it has exactly one command-line argument and outputs a usage message if that is not the case. The main program returns 1 if it fails, and 0 if it completes successfully. The call to makeargv uses blank and tab as delimiters. The shell also uses the same delimiters, so be sure to enclose the command-line arguments in double quotes as shown in Example 2.18.

Example 2.18. 

If the executable for Program 2.1 is called argtest, the following command creates and prints an argument array for This is a test.

argtest "This is a test"

Program 2.1. argtest.c

A program that takes a single string as its command-line argument and calls makeargv to create an argument array.

#include <stdio.h>
#include <stdlib.h>
int makeargv(const char *s, const char *delimiters, char ***argvp);

int main(int argc, char *argv[]) {
   char delim[] = " \t";
   int i;
   char **myargv;
   int numtokens;

   if (argc != 2) {
      fprintf(stderr, "Usage: %s string\n", argv[0]);
      return 1;
   }
   if ((numtokens = makeargv(argv[1], delim, &myargv)) == -1) {
      fprintf(stderr, "Failed to construct an argument array for %s\n", argv[1]);
      return 1;
   }
   printf("The argument array contains:\n");
   for (i = 0; i < numtokens; i++)
      printf("%d:%s\n", i, myargv[i]);
   return 0;
}

Implementation of makeargv

This section develops an implementation of makeargv based on the prototype of Example 2.17 as follows.

int makeargv(const char *s, const char *delimiters, char ***argvp);

The makeargv function creates an argument array pointed to by argvp from the string s, using the delimiters specified by delimiters. If successful, makeargv returns the number of tokens. If unsuccessful, makeargv returns –1 and sets errno.

The const qualifiers on s and delimiters show that makeargv does not modify either s or delimiters. The implementation does not make any a priori assumptions about the length of s or of delimiters. The function also releases all memory that it dynamically allocates except for the actual returned array, so makeargv can be called multiple times without causing a memory leak.

In writing general library programs, you should avoid imposing unnecessary a priori limitations on sizes (e.g., by using buffers of predefined size). Although the system-defined constant MAX_CANON is a reasonable buffer size for handling command-line arguments, the makeargv function might be called to make an environment list or to parse an arbitrary command string read from a file. This implementation of makeargv allocates all buffers dynamically by calling malloc and uses the C library function strtok to split off individual tokens. To preserve the input string s, makeargv does not apply strtok directly to s. Instead, it creates a scratch area of the same size pointed to by t and copies s into it. The overall implementation strategy is as follows.

  1. Use malloc to allocate a buffer t for parsing the string in place. The t buffer must be large enough to contain s and its terminating '\0'.

  2. Copy s into t. Figure 2.3 shows the result for the string "mine -c 10 2.0".

    Figure 2.3. The makeargv makes a working copy of the string s in the buffer t to avoid modifying that input parameter.

  3. Make a pass through the string t, using strtok to count the tokens.

  4. Use the count (numtokens) to allocate an argv array.

  5. Copy s into t again.

  6. Use strtok to obtain pointers to the individual tokens, modifying t and effectively parsing t in place. Figure 2.4 shows the method for parsing the tokens in place.

    Figure 2.4. The makeargv parses the tokens in place by using strtok.

The implementation of makeargv discussed here uses the C library function strtok to split a string into tokens. The first call to strtok is different from subsequent calls. On the first call, pass the address of the string to parse as the first argument, s1. On subsequent calls for parsing the same string, pass a NULL for s1. The second argument to strtok, s2, is a string of allowed token delimiters.

SYNOPSIS

   #include <string.h>

   char *strtok(char *restrict s1, const char *restrict s2);
                                                                  POSIX:CX

Each successive call to strtok returns the start of the next token and inserts a '\0' at the end of the token being returned. The strtok function returns NULL when no tokens remain in s1.

It is important to understand that strtok does not allocate new space for the tokens, but rather it tokenizes s1 in place. Thus, if you need to access the original s1 after calling strtok, you should pass a copy of the string.
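The following sketch applies that advice. The hypothetical function count_tokens gives strtok a private copy of s, so the caller's string, which may even be a string literal, is left unchanged.

```c
#include <stdlib.h>
#include <string.h>

/* Counts the tokens of s without modifying it: strtok writes '\0' bytes
   into its first argument, so it is handed a private copy.
   Returns the token count, or -1 if the copy cannot be allocated. */
int count_tokens(const char *s, const char *delimiters) {
   char *copy;
   int count = 0;

   if ((copy = malloc(strlen(s) + 1)) == NULL)
      return -1;
   strcpy(copy, s);
   if (strtok(copy, delimiters) != NULL)
      for (count = 1; strtok(NULL, delimiters) != NULL; count++) ;
   free(copy);                        /* the scratch copy is no longer needed */
   return count;
}
```

For "mine -c 10 2.0" with the delimiters " \t", count_tokens returns 4.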

The restrict qualifier on the two parameters requires that any object referenced by s1 in this function cannot also be accessed by s2. That is, the tail end of the string being parsed cannot be used to contain the delimiters. This restriction, one that would normally be satisfied in any conceivable application, allows the compiler to perform optimizations on the code for strtok. The const qualifier on the second parameter indicates that the strtok function does not modify the delimiter string.

Program 2.2 shows an implementation of makeargv. Since strtok allows the caller to specify which delimiters to use for separating tokens, the implementation includes a delimiters string as a parameter. The program begins by using strspn to skip over leading delimiters. This ensures that **argvp, which points to the first token, also points to the start of the scratch buffer, called t in the program. If an error occurs, this scratch buffer is explicitly freed. Otherwise, the calling program can free this buffer. The call to free may not be important for most programs, but if makeargv is called frequently from a shell or a long-running communication program, the unfreed space from failed calls to makeargv can accumulate. When using malloc or a related call, analyze whether to free the memory if an error occurs or when the function returns.

Program 2.2. makeargv.c

An implementation of makeargv.

#include <errno.h>
#include <stdlib.h>
#include <string.h>

int makeargv(const char *s, const char *delimiters, char ***argvp) {
   int error;
   int i;
   int numtokens;
   const char *snew;
   char *t;

   if ((s == NULL) || (delimiters == NULL) || (argvp == NULL)) {
      errno = EINVAL;
      return -1;
   }
   *argvp = NULL;
   snew = s + strspn(s, delimiters);         /* snew is real start of string */
   if ((t = malloc(strlen(snew) + 1)) == NULL)
      return -1;
   strcpy(t, snew);
   numtokens = 0;
   if (strtok(t, delimiters) != NULL)     /* count the number of tokens in s */
      for (numtokens = 1; strtok(NULL, delimiters) != NULL; numtokens++) ;

                             /* create argument array for ptrs to the tokens */
   if ((*argvp = malloc((numtokens + 1)*sizeof(char *))) == NULL) {
      error = errno;
      free(t);
      errno = error;
      return -1;
   }
                        /* insert pointers to tokens into the argument array */
   if (numtokens == 0)
      free(t);
   else {
      strcpy(t, snew);
      **argvp = strtok(t, delimiters);
      for (i = 1; i < numtokens; i++)
         *((*argvp) + i) = strtok(NULL, delimiters);
   }
   *((*argvp) + numtokens) = NULL;             /* put in final NULL pointer */
   return numtokens;
}

Example 2.19. freemakeargv.c

The following function frees all the memory associated with an argument array that was allocated by makeargv. If the first entry in the array is not NULL, freeing the entry also frees the memory allocated for all the strings. The argument array is freed next. Notice that it would be incorrect to free the argument array and then access the first entry.

#include <stdlib.h>

void freemakeargv(char **argv) {
   if (argv == NULL)
      return;
   if (*argv != NULL)
      free(*argv);
   free(argv);
}

Thread-Safe Functions

The strtok function is not a model that you should emulate in your programs. Because of its definition (page 35), it must use an internal static variable to keep track of the current location of the next token to parse within the string. As a result, when calls to strtok that parse different strings are interleaved in the same program, the parsing of the respective strings may interfere because there is only one variable for the location.

Program 2.3 shows an incorrect way to determine the average number of words per line by using strtok. The wordaverage function uses strtok to find each line and then calls wordcount to count the number of words on that line. Unfortunately, wordcount also uses strtok, this time to parse the words on the line. Each of these functions would be correct by itself if the other did not call strtok. The wordaverage function handles the first line correctly, but when wordaverage calls strtok to parse the second line, the internal state kept by strtok has been overwritten by wordcount.

The behavior that causes wordaverage to fail also prevents strtok from being used safely in programs with multiple threads. If one thread is in the process of using strtok and a second thread calls strtok, subsequent calls may not behave properly. POSIX defines a thread-safe function, strtok_r, to be used in place of strtok. The _r stands for reentrant, an obsolescent term indicating the function can be reentered (called again) before a previous call finishes.

Program 2.3. wordaveragebad.c

An incorrect use of strtok to determine the average number of words per line.

#include <string.h>
#define LINE_DELIMITERS "\n"
#define WORD_DELIMITERS " "

static int wordcount(char *s) {
   int count = 1;

   if (strtok(s, WORD_DELIMITERS) == NULL)
      return 0;
   while (strtok(NULL, WORD_DELIMITERS) != NULL)
      count++;
   return count;
}

double wordaverage(char *s) {    /* return average number of words per line in s */
   int linecount = 1;
   char *nextline;
   int words;

   nextline = strtok(s, LINE_DELIMITERS);
   if (nextline == NULL)
      return 0.0;
   words = wordcount(nextline);
   while ((nextline = strtok(NULL, LINE_DELIMITERS)) != NULL) {
      words += wordcount(nextline);
      linecount++;
   }
   return (double)words/linecount;
}

The strtok_r function behaves similarly to strtok except for an additional parameter, lasts, a user-provided pointer to a location that strtok_r uses to store the starting address for the next parse.

SYNOPSIS

   #include <string.h>

   char *strtok_r(char *restrict s, const char *restrict sep,
                  char **restrict lasts);
                                                                  POSIX:TSF

Each successive call to strtok_r returns the start of the next token and inserts a '\0' at the end of the token being returned. The strtok_r function returns NULL when it reaches the end of s.

Program 2.4 corrects Program 2.3 by using strtok_r. Notice that the identifier lasts used by each function has no linkage, so each invocation accesses a distinct object. Thus, the two functions use different variables for the third parameter of strtok_r and do not interfere.

Program 2.4. wordaverage.c

A correct use of strtok_r to determine the average number of words per line.

#include <string.h>
#define LINE_DELIMITERS "\n"
#define WORD_DELIMITERS " "

static int wordcount(char *s) {
   int count = 1;
   char *lasts;

   if (strtok_r(s, WORD_DELIMITERS, &lasts) == NULL)
      return 0;
   while (strtok_r(NULL, WORD_DELIMITERS, &lasts) != NULL)
      count++;
   return count;
}

double wordaverage(char *s) {   /* return average number of words per line in s */
   char *lasts;
   int linecount = 1;
   char *nextline;
   int words;

   nextline = strtok_r(s, LINE_DELIMITERS, &lasts);
   if (nextline == NULL)
      return 0.0;
   words = wordcount(nextline);
   while ((nextline = strtok_r(NULL, LINE_DELIMITERS, &lasts)) != NULL) {
      words += wordcount(nextline);
      linecount++;
   }
   return (double)words/linecount;
}
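Because each parse keeps its position in its own lasts variable, strtok_r can even interleave the parsing of two different strings within a single loop, something that is impossible with strtok. The following sketch pairs the i-th token of one string with the i-th token of another; the function pairtokens and its parameters are hypothetical. The _POSIX_C_SOURCE definition is an assumption about the build environment: it requests the POSIX declaration of strtok_r on systems that hide it in strict ISO C mode.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>

/* Hypothetical helper: store up to maxpairs (name, value) token pairs.
   lastn and lastv hold independent parse positions, so the two parses
   can be interleaved safely; with strtok they would clobber each other. */
int pairtokens(char *names, char *values, const char *delims,
               char *pairs[][2], int maxpairs) {
   char *lastn;
   char *lastv;
   char *n;
   char *v;
   int count = 0;

   n = strtok_r(names, delims, &lastn);
   v = strtok_r(values, delims, &lastv);
   while ((n != NULL) && (v != NULL) && (count < maxpairs)) {
      pairs[count][0] = n;
      pairs[count][1] = v;
      count++;
      n = strtok_r(NULL, delims, &lastn);      /* advance the first parse */
      v = strtok_r(NULL, delims, &lastv);      /* advance the second parse */
   }
   return count;
}
```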

Use of Static Variables

While care must be taken in using static variables in situations with multiple threads, static variables are useful. For example, a static variable can hold internal state information between calls to a function.
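For instance, the hypothetical function nextid below uses a static local variable to remember the last identifier it handed out. The variable is initialized only once, before the program starts, and keeps its value from one call to the next. Like the hidden state inside strtok, this variable is shared by all callers, so the function is not safe to call from multiple threads without additional locking.

```c
/* Hypothetical example: return 1, 2, 3, ... on successive calls. */
int nextid(void) {
   static int lastid = 0;     /* static storage, no linkage: persists between calls */

   return ++lastid;
}
```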

Program 2.5 shows a function called bubblesort along with auxiliary functions for keeping track of the number of interchanges made. The variable count has static storage because it is declared outside any block. The static qualifier gives this variable internal linkage, guaranteeing that count cannot be accessed directly by any function outside bubblesort.c. The clearcount function and the interchange code in the onepass function are the only code segments that modify count. The internal linkage allows other files linked with bubblesort.c to use an identifier named count without interfering with the integer count in this file.

The three functions clearcount, getcount and bubblesort have external linkage and are accessible from outside. Notice that the static qualifier for onepass gives this function internal linkage so that it is not accessible from outside this file. By using appropriate storage and linkage classes, bubblesort hides its implementation details from its callers.

Program 2.5. bubblesort.c

A function that sorts an array of integers and counts the number of interchanges made in the process.

static int count = 0;

static int onepass(int a[], int n) { /* return true if interchanges are made */
   int i;
   int interchanges = 0;
   int temp;

   for (i = 0; i < n - 1; i++)
      if (a[i] > a[i+1]) {
         temp = a[i];
         a[i] = a[i+1];
         a[i+1] = temp;
         interchanges = 1;
         count++;
      }
   return interchanges;
}

void clearcount(void) {
   count = 0;
}

int getcount(void) {
   return count;
}

void bubblesort(int a[], int n) {               /* sort a in ascending order */
   int i;
   for (i = 0; i < n - 1; i++)
      if (!onepass(a, n - i))
         break;
}

Example 2.20. 

For each object and function in Program 2.5 give the storage and linkage class where appropriate.

Answer:

The function onepass has internal linkage. The other functions have external linkage. Functions do not have a storage class. The count identifier has internal linkage and static storage. All other variables have no linkage and automatic storage. (See Section A.5 for additional discussion about linkage.)

Section 2.9 discusses a more complex use of static variables to approximate object-oriented behavior in a C program.

Structure of Static Objects

Static variables are commonly used in the C implementation of a data structure as an object. The data structure and all the functions that access it are placed in a single source file, and the data structure is defined outside any function. The data structure has the static attribute, giving it internal linkage: it is private to that source file. Any references to the data structure outside the file are made through the access functions (methods, in object-oriented terminology) defined within the file. The actual details of the data structure should be invisible to the outside world so that a change in the internal implementation does not require a change to the calling program. You can often make an object thread-safe by placing locking mechanisms in its access functions without affecting outside callers.
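A minimal sketch of this pattern, assuming POSIX threads are available, appears below. The names total, totallock, addtotal and gettotal are hypothetical. The data and its mutex are static, so other files reach them only through the two access functions, and the locking inside those functions is invisible to callers. The fragment only shows where the locking goes; it is not a complete object.

```c
#include <pthread.h>

static long total = 0;                          /* the object's private data */
static pthread_mutex_t totallock = PTHREAD_MUTEX_INITIALIZER;

int addtotal(long amount) {            /* add amount to the total under the lock */
   if (pthread_mutex_lock(&totallock) != 0)
      return -1;
   total += amount;
   return (pthread_mutex_unlock(&totallock) != 0) ? -1 : 0;
}

int gettotal(long *valuep) {           /* copy the current total into *valuep */
   if (pthread_mutex_lock(&totallock) != 0)
      return -1;
   *valuep = total;
   return (pthread_mutex_unlock(&totallock) != 0) ? -1 : 0;
}
```

Callers use addtotal and gettotal the same way whether or not the lock is present, which is what lets the locking be added later without changing outside code.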

This section develops an implementation of a list object organized according to the type of static structure just described. Each element of the list consists of a time and a string of arbitrary length. The user can store items in the list object and traverse the list object to examine the contents of the list. The user may not modify data that has already been put in the list. This list object is useful for logging operations such as keeping a list of commands executed by a program.

The requirements make the implementation of the list both challenging and interesting. Since the user cannot modify data items once they are inserted, the implementation must make sure that no caller has access to a pointer to an item stored in the list. To satisfy this requirement, the implementation adds to the list a pointer to a copy of the string rather than a pointer to the original string. Also, when the user retrieves data from the list, the implementation returns a pointer to a copy of the data rather than a pointer to the actual data. In the latter case, the caller is responsible for freeing the memory occupied by the copy.

The trickiest part of the implementation is the traversal of the list. During a traversal, the list must save the current position to know where to start the next request. We do not want to do this the way strtok does, since this approach would make the list object unsafe for multiple simultaneous traversals. We also do not want to use the strtok_r strategy, which requires the calling program to provide a location for storing a pointer to the next entry in the list. This pointer would allow the calling program to modify entries in the list, a feature we have ruled out in the specification.

We solve this problem by providing the caller with a key value to use in traversing the list. The list object keeps an array of pointers to items in the list indexed by the key. The memory used by these pointers should be freed or reused when the key is no longer needed so that the implementation does not consume unnecessary memory resources.
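The key idea can be seen in miniature in the following sketch, which is not the list implementation developed below. The names openkey, nextitem, MAXKEYS, NUMITEMS and the fixed array of sample strings are all hypothetical. The object keeps one cursor per key in a static array, and the caller holds only an integer. For brevity, nextitem returns const pointers to read-only string literals rather than malloc'd copies as Program 2.7 does.

```c
#include <errno.h>
#include <stddef.h>

#define MAXKEYS 4                     /* hypothetical fixed number of keys */
#define NUMITEMS 3

static const char *items[NUMITEMS] = {"ls", "pwd", "date"};
static int cursors[MAXKEYS] = {-1, -1, -1, -1};   /* -1 marks a free key */

int openkey(void) {                   /* start an independent traversal */
   int key;

   for (key = 0; key < MAXKEYS; key++)
      if (cursors[key] == -1) {
         cursors[key] = 0;
         return key;
      }
   errno = EAGAIN;                    /* all keys are in use */
   return -1;
}

/* Return the next item for key, or NULL at the end of the list (which also
   frees the key) or if key is invalid (errno is then set to EINVAL). */
const char *nextitem(int key) {
   if ((key < 0) || (key >= MAXKEYS) || (cursors[key] == -1)) {
      errno = EINVAL;
      return NULL;
   }
   if (cursors[key] == NUMITEMS) {    /* end of list: release the key */
      cursors[key] = -1;
      return NULL;
   }
   return items[cursors[key]++];
}
```

Two keys obtained from openkey advance independently: reading two items with one key does not move the other.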

Program 2.6 shows the listlib.h file containing the prototypes of the four access functions: accessdata, adddata, getdata and freekey. The data_t structure holds a time_t value (time) and a pointer to a character string of undetermined length (string). Programs that use the list must include the listlib.h header file.

Program 2.6. listlib.h

The header file listlib.h.

#include <time.h>

typedef struct data_struct {
     time_t time;
     char *string;
} data_t;

int accessdata(void);
int adddata(data_t data);
int freekey(int key);
int getdata(int key, data_t *datap);

Program 2.7 shows an implementation of the list object. The adddata function inserts a copy of the data item at the end of the list. The getdata function copies the next item in the traversal of the list into a user-supplied buffer of type data_t. The getdata function allocates memory for the copy of the string field of this data buffer, and the caller is responsible for freeing it.

The accessdata function returns an integer key for traversing the data list. Each key value produces an independent traversal starting from the beginning of the list. When the key is no longer needed, the caller can free the key resources by calling freekey. The key is also freed when the getdata function gives a NULL pointer for the string field of *datap to signify that there are no more entries to examine. Do not call freekey once you have reached the end of the list.

If successful, accessdata returns a valid nonnegative key. The other three functions return 0 if successful. If unsuccessful, these functions return –1 and set errno.

Program 2.7. listlib.c

A list object implementation.

#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include "listlib.h"
#define TRAV_INIT_SIZE 8

typedef struct list_struct {
     data_t item;
     struct list_struct *next;
} list_t;

static list_t endlist;
static list_t *headptr = NULL;
static list_t *tailptr = NULL;
static list_t **travptrs = NULL;
static int travptrs_size = 0;

int accessdata(void) {              /* return a nonnegative key if successful */
   int i;
   list_t **newptrs;
   if (headptr == NULL) {             /* can't access a completely empty list */
      errno = EINVAL;
      return -1;
   }
   if (travptrs_size == 0) {                               /* first traversal */
      travptrs = (list_t **)calloc(TRAV_INIT_SIZE, sizeof(list_t *));
      if (travptrs == NULL)     /* couldn't allocate space for traversal keys */
         return -1;
      travptrs[0] = headptr;
      travptrs_size = TRAV_INIT_SIZE;
      return 0;
   }
   for (i = 0; i < travptrs_size; i++) {    /* look for an empty slot for key */
      if (travptrs[i] == NULL) {
         travptrs[i] = headptr;
         return i;
      }
   }
   newptrs = realloc(travptrs, 2*travptrs_size*sizeof(list_t *));
   if (newptrs == NULL)        /* couldn't expand the array of traversal keys */
      return -1;
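   travptrs = newptrs;
   for (i = travptrs_size + 1; i < 2*travptrs_size; i++)
      travptrs[i] = NULL;           /* realloc does not initialize new slots */
   travptrs[travptrs_size] = headptr;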
   travptrs_size *= 2;
   return travptrs_size/2;
}

int adddata(data_t data) {   /* allocate node for data and add to end of list */
   list_t *newnode;
   int nodesize;

   nodesize = sizeof(list_t) + strlen(data.string) + 1;
   if ((newnode = (list_t *)(malloc(nodesize))) == NULL) /* couldn't add node */
      return -1;
   newnode->item.time = data.time;
   newnode->item.string = (char *)newnode + sizeof(list_t);
   strcpy(newnode->item.string, data.string);
   newnode->next = NULL;
   if (headptr == NULL)
      headptr = newnode;
   else
      tailptr->next = newnode;
   tailptr = newnode;
   return 0;
}

int getdata(int key, data_t *datap) { /* copy next item and set datap->string */
   list_t *t;

   if ( (key < 0) || (key >= travptrs_size) || (travptrs[key] == NULL) ) {
      errno = EINVAL;
      return -1;
   }
   if (travptrs[key] == &endlist) { /* end of list, set datap->string to NULL */
      datap->string = NULL;
      travptrs[key] = NULL;
      return 0;       /* reaching end of list natural condition, not an error */
   }
   t = travptrs[key];
   datap->string = (char *)malloc(strlen(t->item.string) + 1);
   if (datap->string == NULL) /* couldn't allocate space for returning string */
      return -1;
   datap->time = t->item.time;
   strcpy(datap->string, t->item.string);
   if (t->next == NULL)
      travptrs[key] = &endlist;
   else
      travptrs[key] = t->next;
   return 0;
}

int freekey(int key) {                 /* release traversal key for reuse */
   if ( (key < 0) || (key >= travptrs_size) || (travptrs[key] == NULL) ) {
      errno = EINVAL;                  /* key out of range or already freed */
      return -1;
   }
   travptrs[key] = NULL;
   return 0;
}

The implementation of Program 2.7 does not assume an upper bound on the length of the string field of data_t. The adddata function appends to its internal list structure a node containing a copy of data. The malloc function allocates space for both the list_t and its string data in a contiguous block. The only way that adddata can fail is if malloc fails. The accessdata function fails if the list is empty or if there are not sufficient resources to provide an additional access stream. The freekey function fails if the key passed is not valid or has already been freed. Finally, getdata fails if the key is not valid or if it cannot allocate space for the copy of the string. Reaching the end of a list during traversal is a natural occurrence rather than an error; getdata signals it by setting the string field of *datap to NULL.

The implementation in Program 2.7 uses a key that is just an index into an array of traversal pointers. The implementation allocates the array dynamically with a small initial size. When the number of traversal streams exceeds the size of the array, accessdata calls realloc to expand the array.
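One detail of this growth step deserves care: realloc preserves the old contents but leaves the added space uninitialized. Code that later searches for free slots by testing for NULL, as accessdata does, must therefore clear the new slots itself. The following generic sketch doubles an array of pointers and sets each new slot to NULL; the function growptrs is hypothetical and assumes the current size is greater than zero.

```c
#include <stdlib.h>

/* Hypothetical helper: double the array *arrayp of *sizep pointers and
   set every new slot to NULL.  Returns 0 on success or -1 on failure. */
int growptrs(void ***arrayp, int *sizep) {
   int i;
   int newsize = 2*(*sizep);
   void **newarray;

   newarray = realloc(*arrayp, newsize*sizeof(void *));
   if (newarray == NULL)          /* the old array is still valid and unchanged */
      return -1;
   for (i = *sizep; i < newsize; i++)
      newarray[i] = NULL;         /* realloc leaves these slots uninitialized */
   *arrayp = newarray;
   *sizep = newsize;
   return 0;
}
```

Assigning the result of realloc to a temporary first, as both this sketch and accessdata do, avoids losing the original array when realloc fails.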

The data structures for the object and the code for the access functions of listlib are in a single file. Several later projects use this list object or one that is similar. In an object representation, outside callers should not have access to the internal representation of the object. For example, they should not be aware that the object uses a linked list rather than an array or other implementation of the abstract data structure.

The implementation of Program 2.7 allows nested or recursive calls to correctly add data to the list or to independently traverse the list. However, the functions have critical sections that must be protected in a multithreaded environment. Sections 13.2.3 and 13.6 discuss how this can be done.

Example 2.21. 

What happens if you try to access an empty list in Program 2.7?

Answer:

The accessdata function returns –1 and sets errno to EINVAL, indicating an error.

Program 2.8 executes commands and keeps an internal history, using the list data object of Program 2.7. The program takes an optional command-line argument, history. If history is present, the program outputs a history of commands run thus far whenever the program reads the string "history" from standard input.

Program 2.8 calls runproc to run the command and showhistory to display the history of commands that were run. The program uses fgets instead of gets to prevent a buffer overrun on input. MAX_CANON is a constant specifying the maximum number of bytes in a terminal input line. If MAX_CANON is not defined in limits.h, then the maximum line length depends on the particular device and the program sets the value to 8192 bytes.

Program 2.9 shows the source file containing the runproc and showhistory functions. When runproc successfully executes a command, it adds a node to the history list by calling adddata. The showhistory function displays the contents of each node in the list by calling the getdata function. After displaying the string in a data item, showhistory frees the memory allocated by the getdata call. The showhistory function does not call freekey explicitly because it does a complete traversal of the list.

Program 2.8. keeplog.c

A main program that reads commands from standard input and executes them.

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifndef MAX_CANON
#define MAX_CANON 8192
#endif

int runproc(char *cmd);
void showhistory(FILE *f);

int main(int argc, char *argv[]) {
   char cmd[MAX_CANON];
   int history = 1;

   if (argc == 1)
      history = 0;
   else if ((argc > 2) || strcmp(argv[1], "history")) {
      fprintf(stderr, "Usage: %s [history]\n", argv[0]);
      return 1;
   }
   while (fgets(cmd, MAX_CANON, stdin) != NULL) {
      size_t len = strlen(cmd);

      if ((len > 0) && (cmd[len - 1] == '\n'))     /* strip the newline */
         cmd[len - 1] = 0;
      if (history && !strcmp(cmd, "history"))
         showhistory(stdout);
      else if (runproc(cmd)) {
         perror("Failed to execute command");
         break;
      }
   }
   printf("\n\n>>>>>>The list of commands executed is:\n");
   showhistory(stdout);
   return 0;
}

The runproc function of Program 2.9 calls the system function to execute a command. The runproc function returns 0 if the command can be executed. If the command cannot be executed, runproc returns –1 with errno set.

The system function passes the command parameter to a command processor for execution. It behaves as if a child process were created with fork and the child process invoked sh with execl.

SYNOPSIS

  #include <stdlib.h>

  int system(const char *command);
                                       POSIX:CX

If command is NULL, the system function always returns a nonzero value to mean that a command language interpreter is available. If command is not NULL, system returns the termination status of the command language interpreter after the execution of command. If system could not fork a child or get the termination status, it returns –1 and sets errno. A zero termination status generally indicates successful completion.
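Because system returns the status in the same form that waitpid reports it, a caller that cares whether the command succeeded should decode the value with the macros from <sys/wait.h>. If the shell itself cannot be executed, POSIX specifies a status as if the shell had called _exit(127). The sketch below uses a hypothetical function, runstatus, to do the decoding.

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: run cmd with system and return the command's exit
   status (0-255), -1 if system itself failed, or -2 if the shell did not
   exit normally (for example, it was terminated by a signal). */
int runstatus(const char *cmd) {
   int status;

   if ((status = system(cmd)) == -1)
      return -1;
   if (!WIFEXITED(status))
      return -2;
   return WEXITSTATUS(status);
}
```

Note that runproc in Program 2.9 treats only a return of –1 as failure, so a command that runs but exits with a nonzero status is still recorded in the history.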

Program 2.9. keeploglib.c

The file keeploglib.c.

#include <stdio.h>
#include <stdlib.h>
#include "listlib.h"

int runproc(char *cmd) { /* execute cmd; store cmd and time in history list */
   data_t execute;

   if (time(&(execute.time)) == -1)
      return -1;
   execute.string = cmd;
   if (system(cmd) == -1)           /* command could not be executed at all */
      return -1;
   return adddata(execute);
}

void showhistory(FILE *f) {          /* output the history list to the file f */
   data_t data;
   int key;

   key = accessdata();
   if (key == -1) {
      fprintf(f, "No history\n");
      return;
   }
   while (!getdata(key, &data) && (data.string != NULL)) {
      fprintf(f, "Command: %s\nTime: %s\n", data.string, ctime(&(data.time)));
      free(data.string);
   }
}

Process Environment

An environment list consists of an array of pointers to strings of the form name=value. The name specifies an environment variable, and the value specifies a string value associated with the environment variable. The last entry of the array is NULL.

The external variable environ points to the process environment list when the process begins executing. The strings in the process environment list can appear in any order.

SYNOPSIS

   extern char **environ;
                             POSIX

If the process is initiated by execl, execlp, execv or execvp, then the process inherits the environment list of the process just before the execution of exec. The execle and execve functions specifically set the environment list as discussed in Section 3.5.

Example 2.22. environ.c

The following C program outputs the contents of its environment list and exits.

#include <stdio.h>

extern char **environ;

int main(void) {
   int i;

   printf("The environment list follows:\n");
   for (i = 0; environ[i] != NULL; i++)
      printf("environ[%d]: %s\n", i, environ[i]);
   return 0;
}

Environment variables provide a mechanism for using system-specific or user-specific information in setting defaults within a program. For example, a program may need to write status information in the user’s home directory or may need to find an executable file in a particular place. The user can set the information about where to look for executables in a single variable. Applications interpret the value of an environment variable in an application-specific way. Some of the environment variables described by POSIX are shown in Table 2.1. These environment variables are not required, but if one of these variables is present, it must have the meaning specified in the table.

Use getenv to determine whether a specific variable has a value in the process environment. Pass the name of the environment variable as a string.

SYNOPSIS

  #include <stdlib.h>

  char *getenv(const char *name);
                                        POSIX:CX

The getenv function returns NULL if the variable does not have a value. If the variable has a value, getenv returns a pointer to the string containing that value. Be careful about calling getenv more than once without copying the first return string into a buffer. Some implementations of getenv use a static buffer for the return strings and overwrite the buffer on each call.
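One way to follow this advice is to copy each value as soon as it is retrieved. The hypothetical helper getenvcopy below returns a malloc'd copy that later getenv calls cannot overwrite; the caller is responsible for freeing it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of the value of name, or
   NULL if the variable has no value or memory cannot be allocated. */
char *getenvcopy(const char *name) {
   char *value;
   char *copy;

   if ((value = getenv(name)) == NULL)
      return NULL;
   if ((copy = malloc(strlen(value) + 1)) != NULL)
      strcpy(copy, value);          /* private copy survives later getenv calls */
   return copy;
}
```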

Table 2.1. POSIX environment variables and their meanings.

   variable       meaning
   COLUMNS        preferred width in columns for terminal
   HOME           user’s home directory
   LANG           locale when not specified by LC_ALL or LC_*
   LC_ALL         overriding name of locale
   LC_COLLATE     name of locale for collating information
   LC_CTYPE       name of locale for character classification
   LC_MESSAGES    name of locale for negative or affirmative responses
   LC_MONETARY    name of locale for monetary editing
   LC_NUMERIC     name of locale for numeric editing
   LC_TIME        name of locale for date/time information
   LINES          preferred number of lines on a page or vertical screen
   LOGNAME        login name associated with a process
   PATH           path prefixes for finding executables
   PWD            absolute pathname of the current working directory
   SHELL          pathname of the user’s preferred command interpreter
   TERM           terminal type for output
   TMPDIR         pathname of directory for temporary files
   TZ             time zone information

Example 2.23. 

POSIX specifies that the shell sh should use the environment variable MAIL as the pathname of the mailbox for incoming mail, provided that the MAILPATH variable is not set. The following code segment sets mailp to the value of the environment variable MAIL if this variable is defined and MAILPATH is not defined. Otherwise, the segment sets mailp to a default value.

#define MAILDEFAULT "/var/mail"
char *mailp = NULL;

if (getenv("MAILPATH") == NULL)
   mailp = getenv("MAIL");
if (mailp == NULL)
   mailp = MAILDEFAULT;

The first call to getenv in Example 2.23 merely checks for the existence of MAILPATH, so it is not necessary to copy the return value to a separate buffer before calling getenv again.

Do not confuse environment variables with predefined constants like MAX_CANON. The predefined constants are defined in header files with #define. Their values are constants and known at compile time. To see whether a definition of such a constant exists, use the #ifndef preprocessor directive as in Program 2.8. In contrast, environment variables are dynamic, and their values are not known until run time.

Example 2.24. getpaths.c

Write a function to produce an argument array containing the components of the PATH environment variable.

Answer:

#include <stdlib.h>
#define PATH_DELIMITERS ":"

int makeargv(const char *s, const char *delimiters, char ***argvp);

char **getpaths(void) {
   char **myargv;
   char *path;

   path = getenv("PATH");
   if (makeargv(path, PATH_DELIMITERS, &myargv) == -1)
      return NULL;
   else
      return myargv;
}

Process Termination

When a process terminates, the operating system deallocates the process resources, updates the appropriate statistics and notifies other processes of the demise. The termination can either be normal or abnormal. The activities performed during process termination include canceling pending timers and signals, releasing virtual memory resources, releasing other process-held system resources such as locks, and closing files that are open. The operating system records the process status and resource usage, notifying the parent in response to a wait function.

In UNIX, a process does not completely release its resources after termination until the parent waits for it. If its parent is not waiting when the process terminates, the process becomes a zombie. A zombie is an inactive process whose resources are deleted later when its parent waits for it. When a process terminates, its orphaned children and zombies are adopted by a special system process. In traditional UNIX systems, this special process is called the init process, a process with process ID value 1 that periodically waits for children.

A normal termination occurs under the following conditions.

  • Return from main

  • Implicit return from main (the main function falls off the end)

  • Call to exit, _Exit or _exit

The C exit function calls user-defined exit handlers that were registered by atexit in the reverse order of registration. After calling the user-defined handlers, exit flushes any open streams that have unwritten buffered data and then closes all open streams. Finally, exit removes all temporary files that were created by tmpfile() and then terminates control. Using the return statement from main has the same effect as calling exit with the corresponding status. Reaching the end of main has the same effect as calling exit(0).

The _Exit and _exit functions do not call user-defined exit handlers before terminating control. The POSIX standard does not specify what happens when a program calls these functions: that is, whether open streams are flushed or temporary files are removed.

The functions exit, _Exit and _exit take a small integer parameter, status, indicating the termination status of the program. Use a status value of 0 to report a successful termination. Programmer-defined nonzero values of status report errors. Example 3.22 on page 77 illustrates how a parent can determine the value of status when it waits for the child. Only the low-order byte of the status value is available to the parent process.

SYNOPSIS

   #include <stdlib.h>

   void exit(int status);
   void _Exit(int status);
                                ISO C
SYNOPSIS

   #include <unistd.h>

   void _exit(int status);

                               POSIX
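The following sketch shows both the reaping of a child and the truncation of the status value. The hypothetical function childstatus creates a child that calls _exit with a given status, waits for it so that it does not remain a zombie, and returns what the parent sees through WEXITSTATUS. The child calls _exit rather than exit so that it does not run exit handlers or flush stdio buffers inherited from the parent. A status of 300, for example, reaches the parent as 300 & 0xFF, which is 44.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: return the exit status a parent observes for a
   child that calls _exit(status), or -1 on error. */
int childstatus(int status) {
   pid_t pid;
   int wstatus;

   if ((pid = fork()) == -1)
      return -1;
   if (pid == 0)                              /* child: terminate at once */
      _exit(status);
   while (waitpid(pid, &wstatus, 0) == -1)    /* parent: reap the child */
      if (errno != EINTR)
         return -1;
   if (!WIFEXITED(wstatus))
      return -1;
   return WEXITSTATUS(wstatus);      /* only the low-order 8 bits survive */
}
```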

The C atexit function installs a user-defined exit handler. Exit handlers are executed in last-installed, first-executed order when the program returns from main or calls exit. Use multiple calls to atexit to install several handlers. The atexit function takes a single parameter, the function to be executed as a handler.

SYNOPSIS

   #include <stdlib.h>

   int atexit(void (*func)(void));
                                        ISO C

If successful, atexit returns 0. If unsuccessful, atexit returns a nonzero value.
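The order in which exit handlers run can be observed directly. In the sketch below, the function exitorder and the handlers first and second are hypothetical. A child process installs first and then second and calls exit. The handlers report through a pipe, and the parent reads back the sequence "21", confirming last-installed, first-executed order.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int handlerfd = -1;            /* pipe write end, set in the child */

static void first(void) {             /* installed first, so it runs last */
   write(handlerfd, "1", 1);
}

static void second(void) {            /* installed last, so it runs first */
   write(handlerfd, "2", 1);
}

/* Store the handler output in buf (at least 3 bytes); 0 on success. */
int exitorder(char *buf) {
   int fd[2];
   pid_t pid;
   ssize_t n;
   int total = 0;

   if (pipe(fd) == -1)
      return -1;
   if ((pid = fork()) == -1) {
      close(fd[0]);
      close(fd[1]);
      return -1;
   }
   if (pid == 0) {                                /* child */
      close(fd[0]);
      handlerfd = fd[1];
      if ((atexit(first) != 0) || (atexit(second) != 0))
         _exit(1);
      exit(0);                                    /* runs second, then first */
   }
   close(fd[1]);                                  /* parent */
   while ((total < 2) && ((n = read(fd[0], buf + total, 2 - total)) > 0))
      total += n;
   buf[total] = '\0';
   close(fd[0]);
   return (waitpid(pid, NULL, 0) == -1) ? -1 : 0;
}
```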

Program 2.10 has an exit handler, showtimes, that causes statistics about the time used by the program and its children to be output to standard error before the program terminates. The times function returns timing information in the form of the number of clock ticks. The showtimes function converts the time to seconds by dividing by the number of clock ticks per second (found by calling sysconf). Chapter 9 discusses time more completely.

Program 2.10. showtimes.c

A program with an exit handler that outputs CPU usage.

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/times.h>

static void showtimes(void) {
   double ticks;
   struct tms tinfo;

   if ((ticks = (double) sysconf(_SC_CLK_TCK)) == -1)
      perror("Failed to determine clock ticks per second");
   else if (times(&tinfo) == (clock_t)-1)
      perror("Failed to get times information");
   else {
      fprintf(stderr, "User time:              %8.3f seconds\n",
         tinfo.tms_utime/ticks);
      fprintf(stderr, "System time:            %8.3f seconds\n",
         tinfo.tms_stime/ticks);
      fprintf(stderr, "Children's user time:   %8.3f seconds\n",
         tinfo.tms_cutime/ticks);
      fprintf(stderr, "Children's system time: %8.3f seconds\n",
         tinfo.tms_cstime/ticks);
   }
}

int main(void) {
   if (atexit(showtimes)) {
      fprintf(stderr, "Failed to install showtimes exit handler\n");
      return 1;
   }
   /* rest of main program goes here */
   return 0;
}

A process can also terminate abnormally either by calling abort or by processing a signal that causes termination. The signal may be generated by an external event (like Ctrl-C from the keyboard) or by an internal error such as an attempt to access an illegal memory location. An abnormal termination may produce a core dump, and user-installed exit handlers are not called.

Exercise: An env Utility

The env utility examines the environment and modifies it to execute another command. When called without arguments, the env command writes the current environment to standard output. The optional utility argument specifies the command to be executed under the modified environment. The optional -i argument means that env should ignore the environment inherited from the shell when executing utility. Without the -i option, env uses the name=value arguments to modify, rather than replace, the current environment in which utility executes. The env utility does not modify the environment of the shell that executes it.

SYNOPSIS

   env [-i] [name=value] ... [utility [argument ...]]
                                                           POSIX:Shell and Utilities

Example 2.25. 

Calling env from the C shell on a machine running Sun Solaris produced the following output.

HOME=/users/srobbins
USER=srobbins
LOGNAME=srobbins
PATH=/bin:/usr/bin:/usr/ucb:/usr/bin/X11:/usr/local/bin
MAIL=/var/mail/srobbins
TZ=US/Central
SSH2_CLIENT=129.115.12.131 41064 129.115.12.131 22
TERM=sun-cmd
DISPLAY=sqr3:12.0
SSH2_SFTP_LOG_FACILITY=-1
PWD=/users/srobbins

Write a program called doenv that behaves in the same way as the env utility when executing another program.

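  1. When called with no arguments, the doenv utility outputs the current environment to standard output by traversing the environ external variable. (The getenv function looks up only a single named variable, so it cannot list the whole environment.)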

  2. When doenv is called with the optional -i argument, the entire environment is replaced by the name=value pairs. Otherwise, the pairs modify or add to the current environment.

  3. If the utility argument is given, use system to execute utility after the environment has been appropriately changed. Otherwise, print the changed environment to standard output, one entry per line.

  4. One way to change the current environment in a program is to overwrite the value of the environ external variable. If you are completely replacing the old environment (-i option), count the number of name=value pairs, allocate enough space for a new array of pointers (don’t forget the extra NULL entry that terminates it), copy the pointers from argv into the array, and set environ to point to it.

  5. If you are modifying the current environment by overwriting environ, allocate enough space to hold the old entries and any new entries to be added. Copy the pointers from the old environ into the new one. For each name=value pair, determine whether the name is already in the old environment. If name appears, just replace the pointer. Otherwise, add the new entry to the array.

  6. Note that it is not safe to just append new entries to the old environ, since you cannot expand the old environ array with realloc. If all name=value pairs correspond to entries already in the environment, just replace the corresponding pointers in environ.

Exercise: Message Logging

The exercise in this section describes a logging library that is similar to the list object defined in listlib.h and listlib.c of Program 2.6 and Program 2.7, respectively. The logging utility allows the caller to save a message at the end of a list. The logger also records the time that the message was logged. Program 2.11 shows the log.h file for the logger.

Program 2.11. log.h

The header file log.h for the logging facility.

#include <time.h>

typedef struct data_struct {
     time_t time;
     char *string;
} data_t;

int addmsg(data_t data);
void clearlog(void);
char *getlog(void);
int savelog(char *filename);

The data_t structure and the addmsg function have the same respective roles as the list_t structure and adddata function of listlib.h. The savelog function saves the logged messages to a disk file. The clearlog function releases all the storage that has been allocated for the logged messages and empties the list of logged messages. The getlog function allocates enough space for a string containing the entire log, copies the log into this string, and returns a pointer to the string. It is the responsibility of the calling program to free this memory when necessary.

If successful, addmsg and savelog return 0. A successful getlog call returns a pointer to the log string. If unsuccessful, addmsg and savelog return -1. An unsuccessful getlog call returns NULL. These three functions also set errno on failure.

Program 2.12 contains templates for the four functions specified in log.h, as well as the static structures for the list itself. Complete the implementation of loglib.c. Use the logging facility to save the messages that were printed by some of your programs. How might you use this facility for program debugging and testing?

Program 2.12. loglib.c

A template for a simple logging facility.

#include <stdlib.h>
#include <string.h>
#include "log.h"

typedef struct list_struct {
     data_t item;
     struct list_struct *next;
} log_t;

static log_t *headptr = NULL;
static log_t *tailptr = NULL;

int addmsg(data_t data) {
   return 0;
}

void clearlog(void) {
}

char *getlog(void) {
   return NULL;
}

int savelog(char *filename) {
   return 0;
}
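
One possible way to fill in the template is sketched below. It is a sketch under stated assumptions, not the book's solution. data_t is repeated from log.h so the file compiles on its own. Each entry's node and message copy share a single allocation, so clearlog needs only one free per entry. Assumptions: addmsg ignores the caller's data.time and stamps the entry with time(NULL), since the text says the logger records when the message was logged; the getlog line format (seconds since the Epoch, a space, the message) and savelog's use of append mode are arbitrary choices.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Same layout as data_t in log.h, repeated so this sketch compiles alone. */
typedef struct data_struct {
   time_t time;
   char *string;
} data_t;

typedef struct list_struct {
   data_t item;
   struct list_struct *next;
} log_t;

static log_t *headptr = NULL;
static log_t *tailptr = NULL;

/* Append a copy of data.string, stamped with the current time. */
int addmsg(data_t data) {
   log_t *newnode;
   size_t len;

   if (data.string == NULL) {
      errno = EINVAL;
      return -1;
   }
   len = strlen(data.string) + 1;
   /* node and string share one allocation, so one free releases both */
   if ((newnode = malloc(sizeof(log_t) + len)) == NULL)
      return -1;                                  /* malloc sets errno */
   newnode->item.time = time(NULL);
   newnode->item.string = (char *)newnode + sizeof(log_t);
   memcpy(newnode->item.string, data.string, len);
   newnode->next = NULL;
   if (headptr == NULL)
      headptr = newnode;
   else
      tailptr->next = newnode;
   tailptr = newnode;
   return 0;
}

/* Free every node and leave the log empty. */
void clearlog(void) {
   log_t *next;

   while (headptr != NULL) {
      next = headptr->next;
      free(headptr);
      headptr = next;
   }
   tailptr = NULL;
}

/* Return the whole log as one malloc'ed string (caller frees), one
   "<seconds since the Epoch> <message>" line per entry. */
char *getlog(void) {
   size_t total = 1;                              /* room for the final '\0' */
   log_t *p;
   char *buf, *pos;

   for (p = headptr; p != NULL; p = p->next)
      total += (size_t)snprintf(NULL, 0, "%ld %s\n",
                                (long)p->item.time, p->item.string);
   if ((buf = malloc(total)) == NULL)
      return NULL;
   pos = buf;
   *pos = '\0';
   for (p = headptr; p != NULL; p = p->next)
      pos += sprintf(pos, "%ld %s\n", (long)p->item.time, p->item.string);
   return buf;
}

/* Append the current log to filename. */
int savelog(char *filename) {
   char *buf;
   FILE *fp;
   int error = 0;

   if ((buf = getlog()) == NULL)
      return -1;
   if ((fp = fopen(filename, "a")) == NULL) {
      free(buf);
      return -1;
   }
   if (fputs(buf, fp) == EOF)
      error = 1;
   if (fclose(fp) == EOF)
      error = 1;
   free(buf);
   return error ? -1 : 0;
}
```

An empty log yields an empty string from getlog rather than NULL, so callers can distinguish "nothing logged yet" from an allocation failure.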

Additional Reading

The prerequisite programming background for doing the projects in this text includes a general knowledge of UNIX and C. Appendix A summarizes the basics of developing programs in a UNIX environment. UNIX in a Nutshell: A Desktop Quick Reference for System V by Robbins and Gilly is a good user’s reference [94]. A Practical Guide to the UNIX System, 3rd ed. by Sobell [108] gives an overview of UNIX and its utilities from the user perspective. The classic reference to C is The C Programming Language, 2nd ed. by Kernighan and Ritchie [62]. C: A Reference Manual, 4th ed. by Harbison and Steele [46] provides a detailed discussion of many of the C language issues that you might encounter in programming the projects for this text. Finally, Standard C Library by Plauger is an interesting, but ultimately detailed, look at C library function implementation [91]. The final arbiter of C questions is the ISO C Standard [56].
