Chapter 20: Getting Input from the Command Line

Up until this point, we have not read any input for any of our programs from any source. All program data has been hardcoded in the program itself. In this chapter, we will begin exploring programming input with one of the simplest available methods—inputting from the console's command line.

We will revisit the main() function with our added knowledge of function parameters and arrays of strings. We will then explore how to retrieve strings via an argument to the main() function.

The following topics will be covered in this chapter:

  • Understanding the two forms of main()
  • Understanding how argc and argv are related
  • Writing a program to retrieve values from argv and printing them
  • Understanding how to use getopt() for simple command-line processing

So, let's dive right in!

Technical requirements

Continue to use the tools you chose in the Technical requirements section of Chapter 1, Running Hello, World!

The source code for this chapter can be found at https://github.com/PacktPublishing/Learn-C-Programming-Second-Edition/tree/main/Chapter20.

Revisiting the main() function

The main() function is, as we have seen, the first place our program begins execution. Recall that before execution begins, various kinds of memory are allocated within our program space. Once the memory is allocated and initialized, the system calls main(), pretty much as we might call any other function. In this regard, main() is a function like any other.

The special features of main()

On the other hand, main() is not just another function in C. The main() function is a special C function with the following unique properties:

  • main() is not normally called by any other function in our program; the system calls it for us.
  • main() is activated as the first function called when the program is invoked to begin execution.
  • When we return from the main() function, execution stops, and the program is terminated.
  • There are only two forms of the main() function prototype:
    • Has no arguments at all
    • Has exactly two arguments—an int value and a char* array

We will explore the second form of the main() function in this chapter.

The two forms of main()

Up until now, we have been using the first form of main():

int main( void ) { ... }

The second form of main() is as follows:

int main( int argc , char* argv[] ) { ... }

Here, we have the following:

  • argc is the short name for the argument count.
  • argv is the short name for the argument vector.

When our program declares the main() function in the second form, the command-line interpreter processes the command line and populates these two variables, passing them into the main() function body when the system calls main(). We can then access those values through these variable names.

It should be noted that argc and argv are arbitrary names. You might want to use alternative names in main(), as follows:

int main( int argumentCount, char* argumentVector[] ) { ... }

You could even use the following:

int main( int numArgs, char* argStrings[] ) { ... }

The names of the variables used in the main() function definition are not significant. However, what is significant is that the first parameter is an int value (with a name that we choose) and the second parameter is an array of char* (also with a name that we choose). argc and argv are merely common names used in the main() function declaration.

You may sometimes alternatively see main() declared as follows:

int main( int argc, char** argv ) { ... }

This form is equivalent to the others because of the interchangeability of pointer notation and array notation. However, I find using array notation clearer in this case, and, therefore, it is preferred.

We will explore how to use these parameters to retrieve the values from the string vector, or an array of string pointers, argv.

Using argc and argv

While we could give alternate names to the argc and argv parameters, we will use these two names throughout this chapter for consistency.

When we invoke a program, we now see the following:

  • Memory is allocated in the program space.
  • Command-line arguments are processed into function parameters passed into main(), or ignored if those parameters are absent.
  • The execution begins with a call to main().

The first thing to note is that every argument from the command line is broken up into strings. A pointer to the beginning of each string is placed in argv, and the argc counter is incremented. In many cases, string input is sufficient without any further processing. We will explore converting string input into other values in the next chapter, Chapter 21, Exploring Formatted Input.

The program name itself is always placed in argv[0]. Therefore, argc will always be at least 1.

Each argument is separated by whitespace. We can make a single argument of several space-separated words by enclosing the group of words in either single ('…') or double ("…") quotation marks.

Simple use of argc and argv

We can now explore command-line arguments with the following program:

#include <stdio.h>
#include <stdlib.h>    // for exit()

// Print a reminder of how to invoke the program, then stop.
void Usage( char* exec )  {
  printf( " usage: %s <argument1> <argument2> ... <argumentN>\n" ,
          exec );
  exit(1);
}

int main(int argc, char *argv[] )  {
  if( 1 == argc )  {
    printf( " No arguments given on command line.\n\n" );
    Usage( argv[0] );
    return 0;    // not reached; Usage() calls exit()
  }

  printf( "argument count = [%d]\n" , argc );
  printf( "executable = [%s]\n" , argv[0] );
  for( int i = 1 ; i < argc ; i++ )  {
    printf( "argument %d = [%s]\n" , i , argv[i] );
  }
  putchar( '\n' );
  return 0;
}

Whenever we process command-line arguments, it is always a good idea to implement a Usage() or usage() function. The purpose of this function is to remind the user of the proper use of the command-line arguments. The Usage() function is usually fairly simple and is typically presented when something goes wrong. Therefore, rather than return from this function, we call exit() to stop any further processing and immediately end the program.

This program first checks whether any arguments have been passed into main() via argv. If not, it prints a usage message and exits; otherwise, it prints the argument count and the executable name, and then iterates through argv, printing each argument on a line by itself.

Enter this program into a file called showArgs.c, save it, compile it, and run it with the following invocations:

showArgs
showArgs one two three four five six
showArgs one two,three "four five" six
showArgs "one two three four five six"
showArgs "one two three" 'four five six'
showArgs "one 'two' three" 'four "five" six'

You should see the following output:

Figure 20.1 – Various inputs to showArgs.c and outputs

First, no arguments are given and the usage message is printed. Next, six arguments are given. Notice that because the program name is always counted, argc is 7, even though we only entered six arguments. In the remaining argument examples, various placements of a comma and single- and double-quotation mark pairs are tried. Notice that in the last example, 'two' is part of the first parameter and "five" is included in the second parameter. You may want to experiment further with other variations of delimiters and arguments yourself.

Command-line switches and command-line processors

The showArgs.c program is an extremely simple command-line argument processor. It merely prints out each command-line argument and does nothing else with any of them. In later chapters, we will see some ways that we might use these arguments.

We have been using command-line switches whenever we compiled our programs. Consider the following command:

cc showArgs.c -o showArgs -Wall -Werror -std=c17

We have given the cc program the following arguments:

  • The name of the input file to compile, which is showArgs.c.
  • An output file specifier, which is -o.
  • The name of the output file, which is showArgs. This represents an argument pair where the specifier and additional information are given in the very next argument. Notice how the specifier is preceded by a hyphen (-).
  • The option to enable a broad set of commonly useful warnings with -Wall. Notice how this single parameter is preceded by a hyphen (-) but not with a space separator.
  • The option to treat all warnings as errors with -Werror. This has a similar format to -Wall.
  • Finally, the option to use the C17 language standard with -std=c17, where the specifier is std and the option is c17. Notice how the two parts are separated by an equals sign (=).

This single command exhibits four different types of argument specifier formats—a single argument, an argument pair, two arguments where added options are appended to the specifier, and finally, a specifier using the equals sign (=) to add information to a single argument.

From this, you might begin to imagine how some command-line processors can be quite complicated as they provide a wide set of options for execution. There is no single standard command-line processor, nor a standard set of command-line options. Each set of command-line options is specific to the given program that provides those options. 

It is beyond the scope of this book to delve deeper into the myriad approaches employed for command-line processing. Some approaches are straightforward, while others are quite complex. Most approaches to command-line processing have evolved over time along with the program of which they are a part. New options and new option formats were often required and added. Old options and option formats are rarely discarded. In many cases, the resulting command-line processor has grown to become a tangled web of complex code. Therefore, a high degree of caution is recommended when trying to modify an existing command-line processor.

However, there is one POSIX routine, getopt(), and one GNU C Library function, getopt_long(), that are intended to simplify command-line option processing. Neither is part of the C Standard Library itself. The older getopt() routine, declared in unistd.h, expects single-character options. The newer and preferred getopt_long() routine is declared in getopt.h; it can process both single-character options and whole-word option specifiers. The getopt_long() function is system-dependent: it is a GNU extension that is not specified by POSIX, and there are subtle differences in the way it is implemented in the GNU C Library versus other system libraries.

To get a flavor of getopt() and how it makes argument processing somewhat simpler, although not completely so, let's explore the following very simple program using getopt():

#include <stdio.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>    // for getopt(), optarg, and optind

// Each letter is an option; a trailing colon means it takes an argument.
static char* options_string = "pra:t:y:";

typedef struct _book {
    bool bPublished;
    bool bReprint;
    char* title;
    char* author;
    char* year;
} Book;

void Usage( char* exec ) {
  printf( "\nUsage: %s -r -p -t 'title' -a 'name' -y year\n\n" ,
          exec );
  exit(1);
}

// Echo the command line exactly as it was given.
void PrintArgs( int argc , char** argv )  {
  printf( "argument count = [%d]\n" , argc );
  printf( "%s " , argv[0] );
  for( int i = 1 ; i < argc ; i++ )  {
    printf( "%s " , argv[i] );
  }
  putchar( '\n' );
  putchar( '\n' );
}

Before we get to main(), we have declared a Book structure, a Usage() function, and a variation of our previous program in a PrintArgs() function. PrintArgs() does no processing of the arguments; it will only show us the given command line so that we can see how getopt() processes each part. Notice also that we have declared options_string, which we will pass to getopt() to tell it what to expect. Every single letter in this string defines an option. A letter on its own takes no additional argument, whereas a letter followed by a colon indicates that an additional argument is expected. getopt() may or may not distinguish between required and optional arguments; this depends upon the specific system implementation.

options_string specifies p for published, r for reprinted, a for author (which expects an additional argument), t for title (which also expects an additional argument), and y for year (which also expects an additional argument). These arguments will be parsed by calls to getopt() and handled within a switch() statement, where the appropriate action is taken: variables set, files opened, and flags set, for example.

In main(), we perform the option processing in a while()… loop with a switch() statement to handle each case, as follows:

int main(int argc, char *argv[]) {
  int ch;    // getopt() returns an int; -1 signals the end of the options
  Book book = { false , false , 0 , 0, 0 };
  PrintArgs( argc , argv );

  while( (ch = getopt( argc , argv , options_string ) ) != -1 ) {
    switch (ch) {
      case 'p':
        book.bPublished = true;  break;
      case 'r':
        book.bReprint = true;    break;
      case 't':
        book.title = optarg;     break;
      case 'a':
        book.author = optarg;    break;
      case 'y':
        book.year = optarg;      break;
      default:
        Usage( argv[0] );        break;
    }
  }
  printf( " Title is [%s]\n" , book.title );
  printf( "Author is [%s]\n" , book.author );
  printf( "Published [%s]\n" , book.bPublished ? "yes" : "no" );
  if( book.year ) printf( "  Year is [%s]\n" , book.year );
  printf( "Reprinted [%s]\n" , book.bReprint? "yes" : "no" );
  if( optind < argc )  {
    printf( "non-option ARGV-elements: " );
    while( optind < argc )  {
      printf( "%s ", argv[ optind++ ] );
    }
    putchar( '\n' );
  }
  return 0;
}

In the main() function, a Book structure is first declared to hold the values given on the command line, and PrintArgs() is called to echo the command line. A loop is then entered to process the command-line arguments. This is the core processing loop for parsing and processing each command-line argument. Each time through the loop, getopt() is called to parse the next option and its additional argument, if needed. Note that the only way we exit this loop is when getopt() returns -1 because it has no more options to process. We test for that case in the while()… loop condition, enabling the looping to stop.

In options_string, the a:, t:, and y: entries indicate that each of those single-letter options has an additional value associated with it. The value of each such argument is found in the optarg pointer variable. This variable is declared in unistd.h alongside getopt(), as are the opterr, optind, and optopt variables. If an invalid option is encountered, getopt() returns '?', which falls into the default: case, and the usage message is printed. After exiting the while()… loop, each argument value that getopt() processed is printed. Finally, the very last if()... statement checks whether any command-line arguments remain that were not specified by options_string. If so, it prints them out.

Note that in this basic example, when we get a valid option, we simply set a member element of the book structure.

Create this program as getopt_book.c. Compile and run it. Try invoking it with the following command lines:

getopt_book -t "There and Back" -a "Bilbo Baggins" -p -y 1955 -r
getopt_book -a "Jeff Szuhay" -t "Hello, world!" -y -2020 -p
getopt_book -x 
getopt_book

You should see the following output:

Figure 20.2 – Various inputs to getopt_book.c and outputs

Note how several variables have already been declared for us in unistd.h for use with getopt(). These include optarg, to hold a pointer to the argument value string, and optind, to keep track of which index in the argument list is being processed.

In the repository, there is an additional example of a working GNU C getopt_long() program called getopt_long_CNUC.c. It is there for your continued exploration. You will see that it is quite a bit more flexible than getopt(). That program exercises the various types of required and optional arguments.

Summary

We have explored the simplest way to provide input to our programs via the command line. We first specified how the main() function can receive arguments that contain the count and values of arguments given to the program. We saw how argc and argv are related, and how to access each argv string. A simple program to print out arguments given to it was provided for further experimentation. We noted how all arguments are passed into main() as strings. Once we access those arguments, we can perform further processing on them to alter the behavior of our program. Finally, a very simple command-line processor was provided to demonstrate the use of the POSIX getopt() function.

In the next chapter, we will explore a more comprehensive way to receive input from the user while a program is running. Just as printf() writes formatted data from program variables to the console (screen), the scanf() function reads and formats data from the console (keyboard) into program variables.

Questions

  1. How many forms of main() are there?
  2. What does argv provide to our program?
  3. Is there a single standard command-line processor to parse and process argv?