CHAPTER 14

Command Line and
Shell Interaction

This chapter and the next explore interacting with Perl through the command line. This is a broader subject than it might at first appear, since it covers everything from command-line processing to properties of shells and terminal programming. A desktop icon can be configured to trigger a command too, so even in a graphical world command-line processing can still be relevant. Perl was partly inspired by Unix commands like sed and awk and so inherits a lot of shell-like sensibilities and syntax. Not surprisingly, therefore, Perl turns out to be quite good at command-line processing.

We begin by looking at how we read command-line options into our Perl programs, starting with the special array @ARGV and then the Getopt::Std and Getopt::Long modules. Following that we see how to find and possibly set the invoked program name and look at examining and affecting the environment with the special hash %ENV. Finally, we look at various ways that we can create command shells written in Perl itself, as well as blending features of the underlying shell into Perl.

Parsing Command-Line Arguments

When any Perl program is started, Perl passes any command-line arguments specified to the program in the special array @ARGV, which is automatically predefined by Perl to contain the list of arguments before execution starts. Perl does not perform any special processing on passed arguments, nor does it look for special arguments—how we deal with passed arguments is entirely up to us. Of course, Perl scripts can still be started with arguments even when they are being run from other programs, so all this still applies even if our script is not being started interactively.

Command-Line Conventions

There is, of course, more than one syntax for command-line options, depending on the shell and platform in use. Although there are several different conventions, they all work on the idea of options (sometimes called switches) and values. In the Unix world, an option is an argument that is, traditionally, one character long, prefixed with a minus, potentially accompanied by a following value:

> program -a 1 -b -2 -c -d

Options and values are distinct because options have a minus prefix, whereas values do not. If an option is followed by another option, it doesn't have a value. We can also have trailing arguments, values that do not belong to an option and have meanings in their own right.

> program -a 1 -b -2 -c -d file.txt file2.txt

How do we know if file.txt is a value belonging to the -d option or a stand-alone argument? We don't, just by inspection. The documentation and usage information for this program will have to tell us.

In general, the fewer characters we have to type to execute a command, the better. As it happens, the POSIX standard defines a fairly flexible convention for single-letter, single-minus options that allows bundling of valueless options and also permits values to follow options directly (with no space) or after an equals sign:

> program -a1 -b=-2 -cd

The advantage of eliminating a space as the option/value separator is that we can specify otherwise tricky values like negative numbers without ambiguity. The option -cd is a bundle of the options -c and -d. Because options are always single letters, and neither of these example options takes a value, the application can determine that the option is really a bundle of two.
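To make the mechanics concrete, here is a minimal hand-rolled sketch of unbundling a POSIX-style option cluster. This is only an illustration of the convention (real programs should use the Getopt modules covered later), and for simplicity a value must be attached directly to its option:

```perl
#!/usr/bin/perl
# bundledemo.pl - minimal sketch of unbundling POSIX-style options
use strict;
use warnings;

# options that take a value; everything else is a Boolean flag
my %takes_value = (a => 1, b => 1);

sub parse_bundle {
   my ($arg, $opts) = @_;
   my @chars = split //, substr($arg, 1);   # strip the leading '-'
   while (defined(my $c = shift @chars)) {
      if ($takes_value{$c}) {
         my $value = join '', @chars;       # rest of the string is the value
         $value =~ s/^=//;                  # allow -a=1 as well as -a1
         $opts->{$c} = $value;
         last;
      }
      $opts->{$c} = 1;                      # Boolean flag
   }
}

my %opts;
parse_bundle($_, \%opts) for qw(-a1 -b=-2 -cd);
# %opts now holds (a => 1, b => -2, c => 1, d => 1)
```

Because -a and -b are declared as value options, everything after the letter (minus an optional equals sign) is taken as the value, which is how a negative number like -2 survives intact.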

A more recent convention for processing command-line arguments is provided by so-called GNU long options. In the following example, a double minus is used as the prefix for long option names that may be descriptive words rather than just single letters:

> program --option1=1 --option2=-2 --option3 --option4

Long names break the single-letter rule for option names and so prevent both bundling and appending a value directly to an option. The advantage is that they are easier to remember and considerably easier to understand. For instance, a long option like --verbose is much more comprehensible than -v. Unless we have so many options that we are running out of letters, a friendly program will provide both forms, one for clarity and one for speed of typing once the user has learned how to use the command.

In these examples, we used - and --, but strictly speaking these are a Unix convention. In the Windows world, the original prefix was the forward slash, so we would be typing commands like this:

> program /a1 /b=-2 /c /d

More modern Windows command-line applications, particularly those inspired by or ported from Unix, understand both - and /. However, prefixes aside, the principle is just the same. It is important to realize that, unlike wildcards, - and / are not characters interpreted by the shell (be it a DOS or Unix shell). They are just a convention for the application itself to take note of and handle itself, and as such we are free to handle any kind of prefix or prefixes we like. Adhering to a consistent and recognizable convention is just friendlier for anyone else who wants to use our programs.

Another aspect of Windows shells that can cause problems is that, in general, they do not understand single quotes as a Unix shell would. First this means that whenever we need to include quoted text inside a value, we must use double quotes for both the value and the internal text and make sure to escape internal quotes with backslash characters. Second, characters with meaning to the shell also have to be escaped; we can't rely on single quotes to prevent the shell from expanding values via interpolation as we can in Unix. Perl's own -e option, which takes arbitrary Perl code as its value, is particularly vulnerable to issues like this.

Simple option processing is easy to arrange in Perl without additional support just by inspecting the contents of the @ARGV array. However, once we start adding concepts like options that may take optional values, options that may be defined many times, or options that only accept numeric values, things become more complex. Fortunately, a lot of the hard work in defining consistent command lines can be done for us by the Getopt::Std and Getopt::Long modules. Both these modules provide extensive support for command-line argument processing, as well as syntax checking.

The @ARGV Array

The @ARGV array contains all the arguments that were passed to our program when it was started. What constitutes an "argument" depends not on Perl, but on the shell that was used to start our program. However, in most cases spaces separate arguments from one another (Unix shell programmers may care to look up the IFS environment variable in the shell documentation). For example, if we were to run a Perl program called myscript like this:

> perl myscript -u sventek -p cuckoo

this results in an @ARGV array containing the four values -u, sventek, -p, and cuckoo. Unlike the C language, the first argument is not the program name. Instead, Perl places that information in the special variable $0 (or $PROGRAM_NAME with the English module), which is the convention used by Unix shells. Unlike shells, Perl does not put the next argument in $1, and it actually uses $1 and onward for quite a different purpose, as we saw in Chapter 11.

Perl's reason for assigning the program name to $0 and removing it from the argument list is twofold. In part, it is in deference to shell programming, but more significantly it allows us to do clever things with @ARGV that the presence of the program name would make inconvenient. Most notable among these is reading input from files passed in @ARGV automatically using the readline operator. From the shell programmer's perspective, @ARGV is similar to the shell variable $* with the program name removed. To C or C++ programmers, @ARGV is a smarter version of char *argv[], where we no longer need a separate int argc to tell us how many arguments the array contains.

To check the number of arguments, we can either use scalar or find the highest element in the array (remember that Perl indexes from zero):

scalar(@ARGV);  # number of arguments passed
$#ARGV;         # highest element = no. of arguments -1

In the preceding example, we are obviously passing arguments that are key-value pairs, so we would probably want to turn them into a hash. This won't work if an odd number of elements were passed, so we need to check for that before blindly turning @ARGV into a hash. We probably ought to also check that all the keys start with a minus, since that is the convention we are following. The following code handles all these issues:

#!/usr/bin/perl
# turntohash.pl
use warnings;
use strict;

my %args;

if (scalar(@ARGV)%2) {
   die "Odd number of arguments passed to $0";
} else {
   %args = @ARGV;   # convert to hash
   foreach (keys %args) {
      # check each of the keys
      die "Bad argument '$_' does not start with -" unless /^-/;
   }
}

This example has its limitations, however. It doesn't handle multiple instances of the same argument; later definitions simply override earlier ones. Additionally, we don't make any attempt to check that the arguments have valid names, though we could easily do that within the loop. However, for simple argument parsing, it does the job without requiring the help of an external module like Getopt::Std.
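If repeated options ought to accumulate instead of overwriting, we can build the hash ourselves rather than assigning @ARGV to it wholesale. Here is a sketch; promoting the value to an array reference on the second occurrence is just one possible policy:

```perl
#!/usr/bin/perl
# turntohash2.pl - like turntohash.pl, but repeated options accumulate
use strict;
use warnings;

sub args_to_hash {
   my @argv = @_;
   my %args;
   die "Odd number of arguments passed to $0\n" if @argv % 2;
   while (@argv) {
      my ($key, $value) = splice @argv, 0, 2;
      die "Bad argument '$key' does not start with -\n" unless $key =~ /^-/;
      if (exists $args{$key}) {
         # promote to an array reference on the second occurrence
         $args{$key} = [ $args{$key} ] unless ref $args{$key};
         push @{ $args{$key} }, $value;
      } else {
         $args{$key} = $value;
      }
   }
   return %args;
}

my %args = args_to_hash(@ARGV);
```

A caller then checks ref $args{$key} to see whether an option appeared once or several times.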

Passing Arguments to Perl Itself

Since Perl passes everything supplied on the command line to our own programs, we cannot pass arguments to Perl itself when we run a script directly, although the shebang line (typically #!/usr/bin/perl) handles this problem for us elegantly, if we are able to use it. Otherwise, there are two possible solutions to this problem. One is to simply start our script via Perl while supplying any arguments we like to Perl:

> perl -w -Mstrict -MMyModule myperlscript -u sventek -p cuckoo

Note that Perl determines the name of the program to run, myperlscript, because it is the first bare argument (that is, first value that does not belong to a preceding option) to appear on the command line. All arguments that appear after it are deemed arguments to the program, not to Perl, and so this command does not do what we want:

> perl -w myperlscript -Mstrict -MMyModule # ERROR: -M passed to script, not Perl

The only rule here is to ensure that options meant for Perl are specified early enough on the command line.

If the name of the program (in this case, myperlscript) happens to resemble a command-line option or an optional value for the preceding option, we can make it explicit by using the special -- argument:

> perl -Mstrict -MMyModule -w -- myperlscript -name sventek -pass cuckoo

The -- argument is a convention, at least in the Unix world, which means that nothing should be processed beyond this point. Perl uses -- to separate its own arguments from the script name and the script's own arguments. When using the -s option (see the next section) to set variables from the command line, -- can also be used to stop feeding arguments to -s and to leave them in @ARGV; in that case, the switches to be consumed by -s appear between the program name and the --.

We can also specify arguments to pass to Perl by setting the environment variable PERL5OPT:

setenv PERL5OPT "-Mstrict -MPOSIX -w"   # Unix (csh)
export PERL5OPT="-Mstrict -MPOSIX -w"   # Unix (ksh/bash)
PERL5OPT="-Mstrict -MPOSIX -w"; export PERL5OPT   # Unix (older ksh)
set PERL5OPT = -Mstrict -MPOSIX -w   # DOS/Windows

Perl sees the value of PERL5OPT when it starts, and this is used with every Perl script, removing the need to type it in on a program-by-program basis. See Chapter 1 for more on Perl's environment variables.

Setting Variables from @ARGV

The -s option enables rudimentary switch parsing by Perl itself. With -s in effect, Perl removes any minus-prefixed arguments that follow the script name from @ARGV and instead sets a variable of the same name inside the script, either to the value following an equals sign or to 1 if there is none. For example:

> perl -s myscript.pl -debug

This sets the variable $debug inside the script myscript.pl to the value 1. Alternatively, to set a different debug level, we could use

> perl -s myscript.pl -debug=2

The $debug variable in this example is a global package variable; we can access it from within the script as $main::debug, or declare it with use vars or our (the preferred way from Perl 5.6 onward). There is no limit to how many variables we may specify in this way.
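A script that expects a variable from -s must therefore declare it before use under use strict. A minimal sketch (the -debug switch name is just an example):

```perl
#!/usr/bin/perl
# debuglevel.pl - run as: perl -s debuglevel.pl -debug=2
use strict;
use warnings;

our $debug;                    # set by perl's -s switch, if supplied
$debug = 0 unless defined $debug;

print "Debugging at level $debug\n" if $debug;
```

Run without -s and a switch, $debug simply defaults to 0 and the message is suppressed.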

An interesting, if slightly odd, use of -s is in the shebang line of a script:

#!/usr/bin/perl -s
...

This causes Perl to interpret any minus-prefixed arguments passed to the script as variables to define, and is potentially useful for debugging applications that otherwise do not take arguments.

Reading Files from @ARGV

One of the most common classes of command-line utilities consists of scripts that process one or more files given as arguments. The Unix commands cat, more, and strings all fall under this banner, as do the DOS utilities type, dir, and del.

Since this is such a common use of command-line arguments, Perl caters for it with a special shortcut when we use the <> operator to read from standard input. Specifically, if we use <>, Perl tries to open each element in @ARGV as a file in turn and returns the lines read. If @ARGV is empty, <> reads from standard input. This allows us to write incredibly terse and concise scripts because we can eliminate the need to open filehandles or check that the arguments are really files; Perl will handle it for us. For example, here is a simple version of the Unix cat or DOS type command implemented in Perl:

print while <>;

Short, isn't it? We can try out this one-line program on any text file, and it will dutifully print out its contents. It works because <> attempts to read input from the contents of the @ARGV array, taking each entry to be a file name. Before we continue, it is worth comparing the preceding code with this very similar-looking example:

print <>;

While this produces the same result, it does so by evaluating <> in list context, which has the effect of reading all the lines in all the files into memory first, and then supplying that list of lines to print. As a result, it could consume enormous quantities of memory, depending on the files involved. The first example reads one line at a time only, so it does not have this problem. Returning to the first example, if nothing is contained in @ARGV, then standard input is used instead. Either way, the while loop aliases each line read to $_, which the print prints out:

> perl -e "print while <>" file1.txt file2.txt

Indeed, this is such a common task that Perl lets us place an implicit while (<>) {...} loop around code with the -n or -p options (other options that work in conjunction with -n and -p are -a, -F, -l, and -0). So we could just have said

> perl -ne "print" file1.txt file2.txt

Just as with normal file reads, the line count variable, $., also works here, so we can print out a file with line numbers by modifying our program to read

#!/usr/bin/perl
# cat.pl
use warnings;
use strict;

print "$. : $_" while <>;

Or, as a command-line script:

> perl -ne 'print "$.:$_"' file1.txt file2.txt

Note the single quotes, which allow us to use double quotes for the interpolated string. In Windows shells, single quotes do not work this way, and so we must use double quotes and escape the inner double quotes:

> perl -ne "print \"$.:$_\"" file1.txt file2.txt

This version is unreliable in a Unix shell, however, because double quotes do not stop the shell from interpolating variables such as $_ itself. Another option under Windows would be to use the qq quoting operator, of course:

> perl -ne "print qq[$.:$_]" file1.txt file2.txt

This is all very well, but we can't tell where the first file ends and the second begins. The current file being processed is stored in the scalar variable $ARGV, so we can improve our one-line program still further by including the file name too:

print "$ARGV:$.:$_" while <>;

Note that if we pass more than one file, the script will happily read each of them in turn under the same filehandle. That filehandle does not appear directly in the preceding examples, but it is defined automatically by Perl and made available to us, if we need to refer to it, as ARGV. Perl does all the opening and closing of files read in this way behind the scenes, so all the files are treated as being part of the same file access from our point of view. This is why variables like $. do not reset from one file to the next. Depending on what we want to do, this can be an advantage or a problem, but if we want to fix this issue we can do so with the eof function:

#!/usr/bin/perl
# bettercat.pl
use warnings;
use strict;

while (<>) {
   print "$ARGV:$.:$_";
   close (ARGV) if eof;
}

This works by closing the current file (via the automatically defined ARGV) if there is no more data to read in it. Calling eof without a parameter or parentheses applies it to the file last read, which here happens to be the same file pointed to by ARGV. We could also have said eof(ARGV) to produce the same effect, but note that eof() with empty parentheses is quite different from eof with no parentheses—it will only detect the end of all input, or in other words, the end of the last file.

We can manipulate the @ARGV array before using <> to read the file names in it any way we like, for example, to remove non–file-name parameters. Here is a simple string-matching program in the style of the Unix grep command that does just that. The first argument is the pattern to search for. Anything after that is a file to search in, so we just remove the first argument with shift and let <> see the rest:

#!/usr/bin/perl
# simplegrep1.pl
use warnings;
use strict;

die "Usage: $0 <pattern> <file> [<file> ...] " unless scalar(@ARGV)>1;
my $pattern = shift @ARGV;   # get pattern from first argument
while (<>) {
   print "$ARGV:$. $_" if /$pattern/o; #o - compile pattern once only
   close (ARGV) if eof;
}

Note that when we come to use this program, * will work fine for Unix shells, since they automatically expand the wildcard and pass an actual list of files to our program. On Windows systems, the standard shell is not so smart and just passes the * as is. If we want to trap these instances, we can check for occurrences of *, ?, and so on and use the glob function (covered last chapter) in conjunction with the File::DosGlob module to make up for the shell's shortcomings.
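As a sketch of this, assuming we only care about the * and ? wildcards, we can expand stray patterns ourselves before <> sees them (File::DosGlob is a standard module, and the glob it exports works on any platform):

```perl
#!/usr/bin/perl
# expandargv.pl - expand wildcards the shell did not expand for us
use strict;
use warnings;
use File::DosGlob 'glob';   # DOS-style glob, overrides the built-in here

# replace any argument containing a wildcard with its matching files
@ARGV = map { /[*?]/ ? glob($_) : $_ } @ARGV;

print "Files to process: @ARGV\n" if @ARGV;
```

Arguments without wildcards pass through untouched, so this is safe to run under a Unix shell that has already done the expansion for us.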

@ARGV and Standard Input

In the previous example, the script needs two arguments to perform its function, so die is executed if fewer are received. This ensures the user has entered both a pattern and a file name before the program attempts to carry out its job. But what if we want to have the program read from standard input and not a file?

Handily, Perl can be made to open standard input in place of a file with the special file name -. We can insert this into @ARGV before we use <> in the event that only one argument is passed. The result is that the <> operator always gets a file name to read from, but the file name in this case is really the standard input to the program. Here's a modification to the preceding script that handles this possibility:

die "Usage: $0 <pattern> [<file> ...] " unless @ARGV;
my $pattern = shift @ARGV;
@ARGV = ('-') unless @ARGV;   # not actually necessary - see below

When Perl sees the file name - in @ARGV, it interprets it as a request to read from standard input. We can even supply it in the command line, which allows us to use the script in these (admittedly Unix-like) ways:

> cat filename | simplegrep pattern -
> simplegrep pattern - < filename

In fact, the explicit line to add - to @ARGV in the preceding example is not needed because Perl will do it for us automatically if nothing is present in @ARGV at all. If we print $ARGV, we can see that this is the case. This happens when <> is first used, so as long as @ARGV is empty before we use the readline operator, standard input is taken care of for us, and all we have to do is change the usage line to allow only one argument:

#!/usr/bin/perl
# simplegrep2.pl
use warnings;
use strict;

die "Usage: $0 <pattern> [<file> ...] " unless scalar(@ARGV);
my $pattern = shift @ARGV;   #get pattern from first argument
while (<>) {
   print "$ARGV:$.:$_" if /$pattern/;
   close (ARGV) if eof;
}

We do not get this automatic use of - if @ARGV has any values in it. In these cases, we could push - onto the end of the array to have a script automatically switch to reading standard input once it has exhausted the contents of the files named explicitly beforehand.
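For example, this variation on the earlier cat script always finishes up with standard input after any named files. This is only a sketch: when run interactively with no input redirected, it will wait on the terminal for input like any other filter.

```perl
#!/usr/bin/perl
# filesthenstdin.pl
use strict;
use warnings;

# after the named files (if any), read standard input as well
push @ARGV, '-';

while (<>) {
   print "$ARGV:$.:$_";
   close (ARGV) if eof;
}
```

The <> operator consumes @ARGV as it goes, so by the time the loop finishes, the array is empty and every source, including standard input, has been read.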

Simple Command-Line Processing with Getopt::Std

Hand-coded command-line processing is fine when only a few relatively simple command-line arguments are required, but it becomes tricky to handle a larger number of arguments or a more complex syntax. In these cases, it is a far better idea to make use of one of the Getopt modules to simplify the task. Fortunately, the standard Perl library comes with two modules specifically designed to simplify the task of reading and parsing command-line arguments: Getopt::Std and Getopt::Long. While both modules modify @ARGV in the process of extracting options from it, neither module precludes reading files through <> as described earlier. After the command-line options are processed, anything left in (or inserted into) @ARGV can then be used with <> as before.

The Getopt::Std module is a simpler and lighter-weight module that provides support for single-character arguments, with or without values, in compliance with the POSIX standard. It is also in the style of the getopt feature of Unix shells from which it and its larger sibling derive their names. Parsed arguments are defined as global scalar variables based on the argument name or, if supplied, stored in a hash as key-value pairs.

The Getopt::Long module is a much more comprehensive and larger module that provides support for long argument names, stricter argument value checking, abbreviations, aliases, and other various features. It supports both POSIX-style arguments and (as its name suggests) GNU long arguments, but is possibly overkill for simple scripts. Since it provides a superset of the features of Getopt::Std, we will cover the simpler module first.

Getopt::Std provides two functions to define the list of expected arguments. The first is getopt, which allows us to specify a set of options that take parameters. Any other options (as defined by the fact that they start with -) are considered to be Boolean options that enable something and take no argument. The second is the more versatile getopts, which allows us to explicitly define both Boolean and value options and so can emit a warning about anything that does not appear to be either.

Basic Processing with getopt

The function getopt lets us define a set of arguments that all take an optional value. It is somewhat simplistic in its operation and exists as a compatible Perl equivalent of the getopt command implemented by all Unix shells. To use getopt, we supply it with a list of letters that correspond to the options that we wish to process. Any option that is not in the list is assumed to be a Boolean flag. For example:

use Getopt::Std;
getopt("atef");

This defines the options -a, -t, -e, and -f as arguments that take parameters. Then getopt will accept a value immediately after the argument, or separated by either a space or an equals sign. That is, all of the following are acceptable:

-abc
-a = bc
-a bc

When a program containing this code is called, the command line is parsed and a global scalar variable of the form $opt_X is set, where X is the name of the argument. If we create a script containing the preceding code and feed it some arguments, we can see this in action:

> perl myscript -a bc -e fg -k 99

This creates three global scalar variables, assuming that Perl will allow it. If we have use strict or use strict 'vars' in effect, then we need to predeclare these variables with our (or use vars, prior to Perl 5.6) in order to avoid a compile-time error. The equivalent direct assignments would have been

$opt_a = "bc";  # option a given value bc
$opt_e = "fg";  # option e given value fg
$opt_k = 1;     # 'k' not in list of arguments, therefore Boolean

The moment getopt sees something that is not an option or an option value, it terminates and leaves the remainder of @ARGV untouched. In this example, @ARGV is left holding the trailing argument 99, because it is neither an option nor a value belonging to the preceding valueless -k option.

Creating global scalar variables is inelegant. As an alternative, we can supply getopt with a reference to a hash as its second argument, say %opts. This causes getopt to populate the hash with the parsed values instead; the processed arguments appear as keys and values in %opts. Again, given the same example arguments as before, $opts{'k'} is defined to be 1 and @ARGV ends up containing 99 as the only unprocessed argument. The following script shows this in action and also prints out the parsed arguments and whatever is left in @ARGV afterwards, if anything:

#!/usr/bin/perl
# getopt.pl
use strict;
use warnings;

use Getopt::Std;

my %opts;

getopt("atef",\%opts);

print "Arguments: ";
foreach (keys %opts) {
   print " $_ => $opts{$_} ";
}

print "ARGV: ";
foreach (@ARGV) {
   print " $_ ";
}

Let's execute this script and review the results:

> perl getopt.pl -a bc -e fg -k 99


Arguments:
    e => fg
    a => bc
    k => 1
ARGV:
    99

It is worth noting that if we had put -k 99 as the first argument in the list, the 99 and everything following it, including the -a and -e options and their arguments, would have remained unprocessed in @ARGV.

If we don't specify a value for an argument that takes one, it defaults to 1, so -a and -a1 are effectively equivalent.

Smarter Processing with getopts

The more advanced getopts allows us to define both Boolean and value options. It can also check for invalid options, which immediately makes it more useful than getopt, and permits us to bundle options together. This provides POSIX-compliant command-line processing. It is inspired by and similar to the getopts built-in command of more modern Unix shells like bash.

Like getopt, options are defined by a string containing a list of characters. This time, however, value options are suffixed with a colon, with any letter not so suffixed taken to be Boolean. To define three Boolean flags, a, e, and t, and a value option f, we would use

getopts ("aetf:");   # 'aet' Boolean, 'f' value, defines $opt_X scalars

if ($opt_a) {
    ...
}

Like getopt, getopts takes an optional second parameter of a hash to hold the parsed values; otherwise, it defines global scalar variables of the form $opt_X.

getopts("aetf:",\%opts);   # ditto, puts values in %opts

The order of letters is not important, so the following are all equivalent:

getopts("f:ate");
getopts("af:et");
getopts("taf:e");

Any option that is not specified in the list will cause getopts to emit the warning Unknown option: X, where X is the option in question. Since this is a warning, we can trap it using one of the techniques discussed in Chapter 16, for example, by defining and assigning a subroutine to $SIG{__WARN__} if we want to process the unrecognized option ourselves or make the warning fatal by turning it into a die.

We mentioned bundles a moment ago. Bundling is the term used to describe several single-letter arguments combined into a contiguous string, and it is applicable only if all but the last of the concatenated arguments do not take a value. Unlike getopt, getopts permits bundling (but at the cost of not permitting the value to follow immediately after the option). That is, instead of entering

> perl myscript -a -t -e

we can enter

> perl myscript -ate

A value option can be bundled, but only if it is the last option in the bundle. It follows from this that we can only permit one value option in any given bundle. With the specification f:ate, we can legally enter

> perl myscript -aetf value   # ok
> perl myscript -taef value   # also ok, different order

but not

> perl myscript -fate value   # value does not follow f argument

The -- argument is recognized and processed by Getopt::Std, causing it to cease processing and to leave all remaining arguments in @ARGV. The -- itself is removed. We can pass on the remaining arguments to another program using system, exec, or open, or read them as files using <> if we wish:

> perl myscript -aetf value -- these arguments are not processed

This leaves @ARGV containing these, arguments, are, not, and processed.
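To see the effect of -- without typing a command line, we can simulate one by assigning to @ARGV before calling getopts. A sketch, reusing the option letters from the examples above:

```perl
#!/usr/bin/perl
# stopatdashdash.pl
use strict;
use warnings;

use Getopt::Std;

local @ARGV = ('-a', '-t', '--', '-e', 'file.txt');   # simulated command line
my %opts;
getopts("aetf:", \%opts);

# -a and -t were parsed; the -- itself was removed;
# -e and file.txt remain untouched in @ARGV
print "Parsed: ", join(',', sort keys %opts), "\n";
print "Left over: @ARGV\n";
```

Even though -e is a valid option in the specification, it is left alone because it appears after the --.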

Advanced Command-Line Processing with Getopt::Long

The Getopt::Long module performs the same role as Getopt::Std but with better parsing, error checking, and richer functionality. It handles single-letter options—including bundling—and in addition supports GNU long options.

The key distinguishing feature between the two modules is that Getopt::Long accepts the double-minus–prefixed long-option naming style. To illustrate what we mean, here are two versions of a putative length argument. Only the first can be handled by Getopt::Std, but both can be parsed by Getopt::Long:

-l   # traditional short option
--length=value   # more descriptive long option

The Getopt::Long module is very flexible and implements a number of optional features such as abbreviation, case sensitivity, and strict- or loose-option matching. In order to control the behavior of the module, we can make use of the Getopt::Long::Configure subroutine.

Basic Processing with GetOptions

The module Getopt::Long defines one function, GetOptions, to parse the contents of @ARGV. In its simplest form, it takes a list of options and scalar references, placing the value of each option into the corresponding reference. Without additional qualification, each option is handled as a Boolean flag and the associated scalar is set to 1 if seen. The following code snippet defines two Boolean options, verbose and background:

#!/usr/bin/perl
# definebool.pl
use warnings;
use strict;

use Getopt::Long;

my ($verbose, $background);   # parse 'verbose' and 'background' flags
GetOptions (verbose => \$verbose, background => \$background);

print "Verbose messages on " if $verbose;

After this code is executed, the variables $verbose and $background are either undefined or set to the value 1. We can easily use them in conditions, as illustrated previously.

If the command line was processed successfully, GetOptions returns with a true value; otherwise, it returns undef. We can therefore use it in conditions and terminate the program if all is not well. For example:

# print some help and exit if options are not valid
usage(), exit unless GetOptions (verbose => \$verbose, background => \$bg);

A warning will be raised by GetOptions for anything that it does not understand, so we are saved from the task of having to describe the problem ourselves (although we still have to provide the usage subroutine, presumably to list the valid options and command-line syntax to help the user).

If we do not supply a reference, GetOptions will define a global scalar with the name $opt_<option name> instead, in the same manner as Getopt::Std. This mode of use is generally deprecated on the basis that defining global variables is not good programming practice. GetOptions also accepts a hash reference as its first parameter and will store parsed arguments in it, if present. This is similar to Getopt::Std, but the arguments are inverted compared to getopt or getopts:

#!/usr/bin/perl
# hashref.pl
use warnings;
use strict;

use Getopt::Long;

my %opts;
GetOptions(\%opts, 'verbose', 'background');

One special case bears mentioning here. We might want to handle the case of a single minus (conventionally used to mean "take input from standard input, not a file"), as used by several Unix commands and demonstrated earlier with Getopt::Std. We can do that using an option name of an empty string:

GetOptions('' => \$read_from_stdio);

For the time being we have limited our discussion to scalars, but GetOptions is also capable of handling multiple values in both list and hash forms. We'll see how to do that a little later, after we have dealt with option prefixes and defining option values.

Option Prefixes

With the exception of the single bare minus option, options can be specified with either a single or a double minus, or, if we define it first, any prefix we like. The double-minus prefix is treated as a special case compared to all other prefixes, however. Options that have been prefixed with a double minus are treated as case insensitive by default, whereas all other prefixes are case sensitive, though we can alter case sensitivity using Getopt::Long::Configure. This means that --o and --O both define the option o, whereas -o and -O define the options o and O. The double-minus prefix is also treated differently in option bundling.

The archaic prefix + is also accepted by default but is now deprecated. We can explicitly disallow it, as well as redefine the prefixes we do allow (for instance, the forward slash), by specifying our own prefix. There are two ways to do this, the first of which is to specify the new prefix as the first argument to GetOptions. However, this is deprecated in modern usage. A better way is to use the Getopt::Long::Configure subroutine. To redefine the prefix this way, we put something like

# configure a prefix of '/'
Getopt::Long::Configure ("prefix=/");

The second way to specify a prefix is to configure the prefix_pattern option of Getopt::Long, which allows us to specify a range of prefixes. It takes a regular expression as an argument, so we need to express the prefixes we want to allow in terms of a regular expression. To allow single-minus, double-minus, and forward slash prefixes, we can use (--|-|/), as in this example:

# configure prefixes of --, - or / (but not +)
Getopt::Long::Configure ("prefix_pattern=(--|-|/)");

Note that because prefix_pattern is used in a regular expression, we must use parentheses to encapsulate the options and escape any characters that have special significance for regular expressions.
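As an illustration, here is a sketch of restoring the archaic + prefix by escaping it in the pattern; the assignment to @ARGV simulates a command line purely for demonstration:

```perl
#!/usr/bin/perl
# sketch: '+' is a regex metacharacter, so it must be escaped in prefix_pattern
use warnings;
use strict;

use Getopt::Long;

Getopt::Long::Configure('prefix_pattern=(--|-|\+)');

my $verbose;
@ARGV = ('+verbose');   # simulated command line, for demonstration only
GetOptions('verbose' => \$verbose);
print "verbose: ", defined $verbose ? $verbose : 0, "\n";
```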

An alternative to simply disabling a prefix is to handle it ourselves before calling GetOptions. For example, a few utilities use + to explicitly negate an option. We can handle that by replacing + with --no and defining the option to be Boolean:

#!/usr/bin/perl
# negate.pl
use warnings;
use strict;

use Getopt::Long;
# translate + negation to Getopt::Long compatible --no negation
foreach (@ARGV) {
    s/^\+/--no/;   # substitute elements of @ARGV directly
}

my %opts;
GetOptions (\%opts, 'verbose', 'background');

This requires defining the options we want to be able to negate as negatable Boolean values. We see how to do that, as well as handle other types of option, next.

Defining Option Values

All the options we have seen so far have been Boolean options. By adding extra information to the option name in the form of attributes, we can define (and enforce) different kinds of options, including negatable options, incremental options, and integer value options.

Negatable Boolean Options

A negatable Boolean option is one that can be switched off as well as switched on. The Boolean options we have seen so far are one-way: once set, they cannot be unset. That might seem like a strange thing to want to do, but if we are editing a previous command line or calling an external program from within another, it is often more convenient to disable an option explicitly than to check through the arguments to see whether it was set to start with. Some features also make more sense enabled by default, so we might define options simply to turn them off.

Negatable Boolean options are defined by suffixing the option name with an exclamation mark. We can use this to create an option that is normally on, but which we can turn off by prefixing the option name with no:

my $quiet = 1;
GetOptions ("quiet!" => \$quiet);

This now allows us to specify -noquiet to switch off quiet mode:

> perl myscript -noquiet

And -quiet to turn it on again:

> perl myscript -noquiet -otheroption -quiet

Sometimes it is useful to know whether an option variable is not set because that is its default value or because it was explicitly cleared by the option. Since disabling a negatable option sets the corresponding value to zero, setting the original value to the undefined value allows us to check whether the option was specified on the command line or not:

#!/usr/bin/perl
# check.pl
use warnings;
use strict;
use Getopt::Long;

my $option = undef;   # make it undef explicitly, just to make it clear
GetOptions ("option!" => \$option);

if (defined $option) {
   # the option was seen on the command line
} else {
   # the option was not specified
}

Since undef evaluates as false in a conditional context, we can still determine whether an option has been set in places where we don't care whether the option was specified or not, and still retain that information for use in places where we do. If we are using a hash to define all our options, then there is no need to go this far; we can just test the option name with exists to see if it has been set:

#!/usr/bin/perl
# exists.pl
use warnings;
use strict;

use Getopt::Long;
my %opts;
GetOptions (\%opts, 'option!');
if (exists $opts{'option'}) {   # the option was seen on the command line
}

Incremental Options

Incremental options increase by one each time they are seen on the command line, starting from the original value. A classic case of such an option is a verbose flag, where the level of information a program returns increases according to the level of verbosity set, which is equal to the number of verbose options we use.

In order to prevent Perl returning an undefined value error, the starting value of an incremental option variable should be initialized to a defined value, most probably zero. Here is an example of implementing a verbose option as an incremental option:

#!/usr/bin/perl
# increment.pl
use warnings;
use strict;

use Getopt::Long;

my $verbose = 0;   # default verbosity = off
GetOptions ("verbose+" => \$verbose);

Now, to set different levels of verbosity, we just specify the option the required number of times:

> perl increment.pl -verbose   # $verbose == 1
> perl increment.pl -verbose -verbose   # $verbose == 2
> perl increment.pl -verbose -verbose -verbose   # $verbose == 3

In fact, we can save a lot of typing and just specify -v several times, because GetOptions can also handle abbreviations for us, as we will see. With bundling enabled, we can even write just -vvv.

Integer, Floating-Point, and String Options

To define an option that takes a value, we modify the option name by suffixing it either with an equals sign for a mandatory value or with a colon if the value is optional. Following the equals sign or colon we then specify s, i, or f to acquire a string (that is, anything other than a space), an integer, or a floating-point value:

GetOptions("mandatorystring=s"  => \$option1);
GetOptions("optionalstring:s"   => \$option2);
GetOptions("mandatoryinteger=i" => \$option3);
GetOptions("optionalfloat:f"    => \$option4);

The Getopt::Long module allows options and values to be separated by either a space or an equals sign. In most cases, it does not matter which we use, with the single exception of negative numbers (more about these in a moment):

--mandatorystring=text -nextoption ...
--mandatorystring text -nextoption ...

The distinction between a mandatory and optional value is, of course, that we can omit the value if it is optional. If we specify an option (say, -mandatorystring) but leave out a mandatory value, GetOptions emits a warning:


Option mandatorystring requires an argument

The integer and floating-point variations are similar, but check that the supplied value is an integer or floating-point value. Note that we cannot supply a hexadecimal (or octal or binary) integer. This will cause GetOptions to emit a warning, for example:


Value "0xff" invalid for option integer (number expected)

Options with optional values will parse the following argument only if it does not look like an option itself. This can be important if we want to accept a negative integer as a value. For example, consider the following option and value, as typed on a command line:

> perl myscript -absolutezero -273

If we define absolutezero as a mandatory value (say, an integer) with a name of absolutezero=i, then -273 is interpreted as the value of absolutezero by GetOptions. However, if we make the value optional with absolutezero:i, then GetOptions will interpret the - of -273 as an option prefix and assume that absolutezero has no value.

We can solve this problem in three ways. The first, as we have just seen, is to make the value mandatory by specifying the name with an equals sign. The second is to use = as a separator between the option name and the value. For example:

> perl myscript -absolutezero=-273

The last is simply to disallow the - character as an option prefix by redefining the prefix (or prefixes) that GetOptions will recognize, as we discussed earlier.
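The first solution can be sketched as follows; the option name and the assignment to @ARGV are illustrative only. Because the value is declared mandatory with =, GetOptions takes the next argument as the value even though it starts with a minus:

```perl
#!/usr/bin/perl
# sketch: a mandatory integer value consumes '-273' despite the leading minus
use warnings;
use strict;

use Getopt::Long;

my $temperature;
@ARGV = ('-absolutezero', '-273');               # simulated command line
GetOptions('absolutezero=i' => \$temperature);   # '=' makes the value mandatory
print "absolutezero: $temperature\n";
```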

Abbreviations

GetOptions automatically performs abbreviation matching on its options. That is, if an option can be abbreviated and still be uniquely identified, we can abbreviate it all the way down to a single character, so long as it is still unique. For example, the option verbose, if defined on its own, can be specified as -verbose, -verbos, -verbo, -verb, -ver, -ve, or -v. If we want to prevent this behavior, we use the Configure subroutine to disable it:

Getopt::Long::Configure("no_auto_abbrev");

Abbreviation down to single characters is great, but this doesn't work if we have two options that start with the same letter such as

GetOptions(verbose => \$verbose, visible => \$visible);

To specify either option, we now have to specify at least -ve or -vi respectively. The best way to avoid this problem is simply to give our options more distinct names, but if we can't avoid it, we can optionally define an alias.

Aliases

Aliases take the form of a pipe-separated list of names (|). For example, to provide an internationally friendly color option, we could use

GetOptions("color|colour" => \$color);

The first name in the list is the true name, in the sense that this is the name used to define the $opt_N variable or the key of the options hash (if specified) supplied as the first argument. Any of the names in the list will now set this option.

Similarly, we can use an alias to allow one of two options to be recognized by a single letter if neither can be abbreviated:

GetOptions("verbose|v" => \$verbose, "visible" => \$visible);

Now we can say -v for verbose and -vi for visible. Note that if we want to combine an alias list with an option value specifier, we just put the specifier on the end of the list—we don't need to apply it to every alias. The following short program implements an incrementable verbose option and a negatable visible option:

#!/usr/bin/perl
# visible.pl
use warnings;
use strict;

use Getopt::Long;

my ($verbose, $visible) = (0, -1);


GetOptions(
   "verbose|v+" => \$verbose,
   "visible!"   => \$visible,
);

print "Verbose is $verbose\n";
print "Visible is $visible\n";

Interestingly, since visible is negatable as novisible, and it is the only such option, we can abbreviate it to just -nov. Even just -n will work in this case, as there are no other options that begin with n.
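A sketch of this in action, with a simulated command line (the assignment to @ARGV stands in for real arguments):

```perl
#!/usr/bin/perl
# sketch: abbreviating the negated form of a negatable option
use warnings;
use strict;

use Getopt::Long;

my $visible = 1;                     # visible by default
@ARGV = ('-nov');                    # abbreviation of -novisible (simulated)
GetOptions('visible!' => \$visible);
print "visible: $visible\n";
```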

Handling Option Values

We have already seen how to read option values into scalar variables, and we mentioned at the time that GetOptions can also handle multiple values as lists and hashes. It does this in a rather cunning way by checking the type of reference that we supply for each option and handling it as appropriate. Take the following short program:

#!/usr/bin/perl
# filenames.pl
use strict;
use warnings;

use Getopt::Long;

my @filenames;

GetOptions("file=s" => \@filenames);

print scalar(@filenames)," files entered\n";

foreach (@filenames) {
    print " $_\n";
}

We can specify the file option as many times as we like, the result of which is a list of file names held in the array @filenames. We don't even have to be consistent about the prefix:

> perl filenames.pl -f foo.txt --f bar.doc -file baz.pl --file clunk.txt

This doesn't allow us to pass several values to a single option, however. If we wanted to do that, we could use a comma as a separator and then use split ourselves after GetOptions has done its work. If that seems inconvenient, wait until we come to handling values via code references . . .
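A minimal sketch of the split-it-ourselves approach, again with a simulated @ARGV for demonstration:

```perl
#!/usr/bin/perl
# sketch: split comma-separated values after GetOptions has collected them
use warnings;
use strict;

use Getopt::Long;

my @filenames;
@ARGV = ('-file', 'foo.txt,bar.doc', '-file', 'baz.pl');   # simulated
GetOptions('file=s' => \@filenames);
@filenames = map { split /,/ } @filenames;   # expand each value on commas
print "@filenames\n";
```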

In a similar manner to handling a list of values by supplying an array reference, we can also handle a list of key-value pairs by supplying a hash reference. When GetOptions sees a hash reference, it automatically looks for an equals sign in the value and tries to split it into a key and value:

#!/usr/bin/perl
# config.pl
use warnings;
use strict;

use Getopt::Long;

my %config;

GetOptions("config=s" => \%config);

print scalar(keys %config)," definitions\n";
foreach (sort keys %config) {
    print " $_ => $config{$_}\n";
}

Now we can use the config option several times to build a hash of configuration variables:

> perl config.pl --config verbose=3 --config visible=on

GetOptions also allows a code reference in place of a reference to a scalar, array, or hash. This lets us do in-line processing of values as GetOptions processes them. For example, to allow comma-separated values to define a list, we can define a subroutine to split the supplied value and plug it into the target array:

#!/usr/bin/perl
# splitval1.pl
use warnings;
use strict;

use Getopt::Long;

our @file;   # same name as option; 'use vars qw(@file)' if Perl < 5.6

sub parsevalue {
    # allow symbolic references within this sub only
    no strict 'refs';

    my ($option, $value) = @_;
    push @$option, split(',', $value);
}

GetOptions("file=s" => \&parsevalue);

print scalar(@file)," files entered\n";
foreach (@file) {
    print " $_\n";
}

In this example, we have defined a subroutine parsevalue and given its reference to GetOptions. When it encounters a file option, it passes the name of the option (file) and the value to parsevalue as parameters. In turn, parsevalue splits the value using a comma as the separator and pushes the result onto a variable with the same name as the option. To achieve that, we have used a symbolic reference for which we have to (within the subroutine only) disable strict references. We can now enter file names either one by one with separate file options or in one big comma-separated list:

> perl splitval1.pl --file first --file second,third -f fourth,fifth

The parsevalue subroutine is an example of a generic argument processor. It will work with any option because it uses the name of the option to deduce the array to update. The only catch to this is that we have to define the options as global variables using our rather than my, since symbolic references do not resolve to lexically scoped variables. To avoid symbolic references entirely, we can put most of the processing work in parsevalue but use a temporary anonymous subroutine to assign the result to our variable of choice:

#!/usr/bin/perl
# splitval2.pl
use warnings;
use strict;

use Getopt::Long;

my @file;   # lexical, no longer needs to be the same name

sub parsevalue {
   my ($option, $value) = @_;
   return split(',', $value);
}

GetOptions("file=s" => sub {push @file, parsevalue(@_)});

We can invoke this version of the program in exactly the same way as the first one, as shown previously.

Documenting Options

In the event of an error occurring when parsing the command line, neither Getopt::Std nor Getopt::Long supports displaying usage information beyond warning of specific problems. It is traditional (and polite) to supply the user with some help about what the problem is, and at the very least a brief description of the command-line syntax—the "usage." For example, for a script that takes two optional flags and a list of file names, we might write

unless (GetOptions(\%opts, 'verbose', 'visible')) {
   print "Usage: $0 [-v|-verbose] [-vi|-visible] <filename>\n";
}

We can create a short HERE document to describe each option with brief information on its meaning and use, and for a command-line tool of any seriousness maintaining this document is an essential part of its creation. Here is a subroutine that displays a friendlier usage message:

use File::Basename qw(basename);

sub usage {
    my $tool=basename($0);

    print STDERR "@_\n" if @_;
    print STDERR <<_USAGE_END;
Usage: $tool -h[elp] | [-v[erbose]] [-vi[sible]] <filename>
    -h  | --help        this text
    -v  | --verbose     enable verbose diagnostics
    -vi | --visible     make actions visible

Type 'perldoc $tool' for more information
_USAGE_END
}

This usage subroutine uses File::Basename to print out just the name of our command and not the whole path to it. It also allows us to pass in an additional message, which is printed out first, so we can generate our own usage errors too:

usage("You must specify a filename") unless @ARGV;

Finally, it reminds the user that more documentation is available in the source that can be read with perldoc (this assumes we have any, of course).

We can also use the Pod::Usage module to place the usage information into the source file itself as POD (literally, "Plain Old Documentation"). This can be effective in some situations, although it is also fairly limited in what it allows us to do. See Chapter 18 for more information on POD and the various Pod modules, including Pod::Usage.

Bundling Options

Bundling, as explained earlier, is the combination of several options into one, a part of the POSIX standard that both Getopt::Std and Getopt::Long support. For example, we can specify -a, -b, -c, and -f options with

> perl myscript -abcf filename

The module Getopt::Long supports two kinds of bundling, neither of which is enabled by default. To enable the simplest, we call the Configure subroutine:

#!/usr/bin/perl
# bundle1.pl
use warnings;
use strict;

use Getopt::Long;

Getopt::Long::Configure("bundling");

my ($a, $b, $c, $file);
GetOptions(a => \$a, b => \$b, c => \$c, "file=s" => \$file);

This enables traditional single-letter bundling with the single-minus prefix (in fact, any prefix except the double minus). Any sequence of letters prefixed with a single minus is treated as a collection of single-letter options, not a complete option name or abbreviation:

-abc   # equivalent to -a -b -c, not the long option 'abc'

We can even combine values into the bundle as long as they look like values and not options (this presumes that we defined a, b, and c as value parameters):

-a1b32c80   # equivalent to -a 1 -b 32 -c 80

However, a double minus will never be treated as a bundle, so --abc will always set the option abc.

The second kind of bundling causes GetOptions to try to match single-minus prefixed options to long names first, and only treat them as a bundle if no long option name matches. In this case, -abc would match the abc option just as --abc does. Here is a short example program that uses this form of bundling:

#!/usr/bin/perl
# bundle2.pl
use warnings;
use strict;

use Getopt::Long;

Getopt::Long::Configure("bundling_override");

my ($a, $b, $c, $abc) = (0,0,0,0); # initialize with zero

GetOptions(a => \$a, b => \$b, c => \$c, "abc:s" => \$abc);

print "a: $a\n";
print "b: $b\n";
print "c: $c\n";
print "abc: $abc\n";

Executing this program with various different arguments demonstrates how and when the override takes effect:

-a -b -c  # sets 'a', 'b', and 'c'
-ab -c    # sets 'a', 'b', and 'c'
-abc      # matches 'abc', sets 'abc'
-acb      # doesn't match 'abc' - sets 'a' 'b' and 'c'
--a       # matches 'a' - sets 'a'
--ab      # abbreviation - sets 'abc'
-A        # doesn't match anything, warns of unknown 'A'
--A       # case insensitive - sets 'a'
--abcd    # doesn't match anything, warns of unknown 'abcd'
-abcd     # sets 'a', 'b', and 'c', warns of unknown option 'd'

As the last example illustrates, the long name abc is only matched with a single-minus prefix if we specify it completely and exactly, so the letters a to d are interpreted as bundled options instead. Bundles also disable abbreviations, as in effect abbreviations are no longer uniquely identifiable. For example, -ab sets a and b rather than being interpreted as an abbreviation for -abc. However, we can still abbreviate --abc as --ab if we use a double-minus prefix, since this disables any attempt at bundling.

An upshot of bundling is that we can no longer derive single-letter abbreviations for long options automatically, so if we want to support both a -v and a --verbose option with the same meaning, we now have to spell them out as alternatives. For legibility within our program, it makes sense to put the long name first, so it will control the name of the variable or hash key generated as a result. For example:

Getopt::Long::Configure("bundling");
my %opts;
GetOptions(\%opts, qw[
    help|h
    debug|d+
    verbose|v+
    visible|vi!
    message|msg|m=s
]);

Case Sensitivity

By default, Getopt::Long automatically treats double-minus–prefixed options as case insensitive. This is not the case for any other prefix, most notably the single-minus prefix, which is considered case sensitive. However, we can set the sensitivity of double-minus–prefixed options by configuring ignore_case in our program before we call GetOptions. For example:

Getopt::Long::Configure("ignore_case");      # default behavior
Getopt::Long::Configure("no_ignore_case");   # '--Option' case sensitive

We can set the sensitivity of all options, including double-minus–prefixed ones, with ignore_case_always:

Getopt::Long::Configure("ignore_case_always");
Getopt::Long::Configure("no_ignore_case_always");

Clearing either configuration value with no_ also clears the other, so no_ignore_case and no_ignore_case_always are actually the same. However, no_ignore_case_always sounds better if we then specify ignore_case too. For instance, the default configuration is equivalent to

Getopt::Long::Configure("no_ignore_case_always", "ignore_case");

If we want to reverse the normal state of affairs and make long options case sensitive and short options case insensitive, we could do that with the following two configuration changes:

#!/usr/bin/perl
# casesens.pl
use warnings;
use strict;

use Getopt::Long;

my %opts;

Getopt::Long::Configure("ignore_case_always", "no_ignore_case");
GetOptions(\%opts, 'verbose', 'visible', 'background');

Of course, the point of long options is that their name is descriptive and unique, so the case should not matter. Therefore, it is doubtful that this configuration is actually all that desirable, even if it is possible.

Handling Unrecognized Option and Value Arguments

When GetOptions encounters an option that it does not recognize, it (usually) issues a warning. However, how it reacts to a value that it does not recognize is another matter. If a value is expected as an optional or mandatory suffix to an option, it is easy to verify that it follows whatever format the option was defined with. But a value that is encountered when no value was expected does not fall under these rules.

In fact, GetOptions has three modes of operation for dealing with unexpected situations such as these.

permute Mode

In permute mode (the default unless POSIXLY_CORRECT has been defined in the environment), unexpected value arguments are simply ignored and left in the @ARGV array. Processing continues past the unknown argument, while further options and values are parsed as normal. At the exit of the subroutine, @ARGV contains all the arguments that were not used as either options or values. permute mode is set explicitly by calling

Getopt::Long::Configure('permute');

The permute mode gets its name because its effect is equivalent to permuting the command line by moving all the unrecognized value arguments to the end. That is, the following two command lines are equivalent, assuming that none of the options take a value:

> perl myscript -a one -b two -c three
> perl myscript -a -b -c one two three

Having GetOptions return unrecognized values (if not options) in @ARGV can be useful, for example, in combination with the <> operator, as we discussed earlier in the chapter.
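A brief sketch of this behavior, using a simulated @ARGV and hypothetical options a and b (assuming POSIXLY_CORRECT is not set, so permute mode is in effect):

```perl
#!/usr/bin/perl
# sketch: in permute mode, unrecognized values are left behind in @ARGV
use warnings;
use strict;

use Getopt::Long;

my ($a_opt, $b_opt);
@ARGV = qw(-a one -b two three);      # simulated command line
GetOptions(a => \$a_opt, b => \$b_opt);
print "leftover: @ARGV\n";
```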

However, in permute mode we can handle these unrecognized arguments ourselves by defining a special subroutine and passing a reference to it to GetOptions. This works in a very similar way to the handling of normal options with code references that we looked at earlier. In deference to the fact that this is in the spirit of the <> operator, the name for the option used to trigger this subroutine is <>. The following script simply builds an array called @oob_values containing the unrecognized values; without the subroutine, the @ARGV array would contain these values instead:

#!/usr/bin/perl
# unrecog.pl
use warnings;
use strict;

use Getopt::Long;

my ($verbose, $size, @oob_values);

sub handle_unknown {
    # push extra values onto out-of-band value list
    push @oob_values, @_;
}

GetOptions(
    "verbose+" => \$verbose,         # verbose option
    "size=i"   => \$size,            # size option
    "<>"       => \&handle_unknown,  # unknown values
);

print "Verbose ", $verbose?'on':'off',"\n";
print "Size is ", (defined $size)?$size:'undefined',"\n";
print "Extras: ", join(',', @oob_values),"\n" if @oob_values;

Interestingly, handle_unknown is called as each unknown value is encountered, which means that the values of the other option variables may change from one call to the next. It is the current value of these options that we make use of in the processing. For example, the value of $verbose is 0, 1, and then 2 each time handle_unknown is called in the following command line:

> perl unrecog.pl this -v that -v other

Setting permute mode automatically disables require_order mode. Setting the environment variable POSIXLY_CORRECT to a true value disables permute and enables require_order (see the upcoming section "POSIX mode").

require_order Mode

In require_order mode, the first encounter with an unknown value argument causes GetOptions to cease processing the rest of @ARGV and return immediately—as if a naked double minus, --, had been encountered. This mode is set explicitly by calling

Getopt::Long::Configure("require_order");

Unlike permute mode, it is not possible to define an unknown argument handler in this mode. Setting require_order mode automatically disables permute mode and is the default if the environment variable POSIXLY_CORRECT is defined.

pass_through Mode

In pass_through mode, unrecognized option arguments are passed through untouched in the same way that unrecognized value arguments are passed. This allows unrecognized options and their values to be passed on as arguments to other programs executed from inside Perl. The pass_through mode is not enabled by default but can be set explicitly by calling

Getopt::Long::Configure("pass_through");

The pass_through mode can be combined with either of the require_order or permute modes. In the case of require_order mode, enabling pass_through mode will cause GetOptions to stop processing immediately, but it will not cause GetOptions to emit a warning and will leave the unrecognized option in @ARGV. In the case of permute mode, all unrecognized options and values are collected and left at the end of @ARGV after GetOptions returns. If a <> subroutine has been defined, both unrecognized option and value arguments are passed to it.

Irrespective of which mode is in use, the bare double minus -- always terminates the processing of @ARGV immediately. The -- itself is removed from @ARGV, but the following arguments are left as is. This applies even if we are using permute mode and have defined a <> subroutine to handle unknown value arguments.
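This can be sketched as follows, with a simulated @ARGV and hypothetical options a and b:

```perl
#!/usr/bin/perl
# sketch: a bare -- halts option processing and is itself removed from @ARGV
use warnings;
use strict;

use Getopt::Long;

my ($a_opt, $b_opt);
@ARGV = qw(-a -- -b leftover);        # simulated command line
GetOptions(a => \$a_opt, b => \$b_opt);
print "a set: ", $a_opt ? 'yes' : 'no', ", remaining: @ARGV\n";
```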

POSIX Mode

The Getopt::Long module was written with the POSIX standard for command-line arguments in mind, which is the origin of the double-minus prefix for long option names, among other things. This module is more flexible than the POSIX standard strictly allows, however, which can be very convenient or a nuisance, depending on our aims. In order to satisfy both camps, the module can be put into a POSIX-compliant mode, which disables all the nonstandard features by defining the environment variable POSIXLY_CORRECT:

setenv POSIXLY_CORRECT 1   # Unix (csh)
export POSIXLY_CORRECT=1   # Unix (newer ksh/bash)
POSIXLY_CORRECT=1; export POSIXLY_CORRECT   # Unix (older ksh)
set POSIXLY_CORRECT=1   # Windows

We can also set POSIX mode from within Perl by adding POSIXLY_CORRECT to the %ENV hash. In order for this to work properly, we have to define the variable in a BEGIN block before the use statement, so that the variable is defined before the module is used:

BEGIN {
   $ENV{'POSIXLY_CORRECT'} = 1;
}

use Getopt::Long;

Enabling POSIX mode has the following effects:

  • The archaic + prefix is suppressed. Only - and -- are recognized by default. (The configuration option prefix_pattern is set to (--|-).)
  • Abbreviation matching is disabled. (The configuration option auto_abbrev is unset.)
  • Non-option arguments, that is, arguments that do not start with an option prefix and are not values of a preceding option, may not be freely mixed with options and their values. Processing terminates on encountering the first non-option argument. (The configuration option require_order is set.)

As the preceding shows, the primary effect of POSIXLY_CORRECT is to alter the default values of several of the module's configuration options. We could, of course, configure them ourselves directly, but defining the environment variable is more convenient and will also keep up to date should the module change in the future. We can always alter the configuration afterwards, say to reenable abbreviations, if we choose.
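For example, here is a sketch that adopts the POSIX defaults but selectively restores abbreviation matching; the assignment to @ARGV simulates an abbreviated option for demonstration:

```perl
#!/usr/bin/perl
# sketch: POSIX defaults with abbreviations re-enabled afterwards
use warnings;
use strict;

BEGIN { $ENV{'POSIXLY_CORRECT'} = 1 }     # must precede 'use Getopt::Long'

use Getopt::Long;

Getopt::Long::Configure('auto_abbrev');   # restore abbreviation matching

my $verbose;
@ARGV = ('-verb');                        # abbreviation of -verbose (simulated)
GetOptions('verbose' => \$verbose);
print "verbose: ", defined $verbose ? $verbose : 0, "\n";
```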

Generating Help and Version Messages

As a convenience, Getopt::Long also provides two additional subroutines, VersionMessage and HelpMessage. The first generates a simple version message by extracting the script or module version from $VERSION. The second generates basic usage information by extracting and printing the SYNOPSIS section of the embedded POD documentation, if present. For example:

use Getopt::Long qw(GetOptions HelpMessage VersionMessage);

my %opts;
GetOptions(\%opts,
    help    => sub { HelpMessage() },
    version => sub { VersionMessage(
                       -message => "You are using:",
                       -exitval => 'NOEXIT'
                     ) },
    'flag',
);

The HelpMessage subroutine uses the pod2usage subroutine from Pod::Usage to do the actual work and will accept a numeric exit status, string message, or a list of named options as arguments that are passed down to that function. Specifically, it understands -output to send output to a given file name or filehandle, -message to define an additional prefixed message, and -exitval to define the exit status. If the exit status is NOEXIT, control is returned to the program.

VersionMessage emulates Pod::Usage in that it will accept a string message, numeric exit status, or a list of key-value pairs, but it does not actually make use of the module to carry out its job. Only the options previously noted are handled. See Chapter 18 for more information on Pod::Usage.

Summary of Configuration Options

We have already mentioned the Configure subroutine in Getopt::Long and described most of its options. Most options are Boolean and can be set by specifying their name to Getopt::Long::Configure, or cleared by prefixing their name with no_. The prefix and prefix_pattern options both take values that are specified with an equals sign. The Configure subroutine will accept any number of options at one time. For example, to enable bundling and to change the allowed prefixes to a single minus or a forward slash, we can use

Getopt::Long::Configure("bundling", "prefix_pattern=(-|/)");

In more recent versions of Getopt::Long (since version 2.24), we can also configure the module at the time we first use it. The special token :config separates genuine imports from configuration options:

use Getopt::Long qw[HelpMessage :config bundling prefix_pattern=(-|/)];

Note that it is also perfectly acceptable to call Configure more than once if need be.
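As a brief sketch of Configure in combination with option parsing (the options a, b, and verbose are invented for the example; GetOptionsFromArray, available in modern versions of Getopt::Long, lets us parse a list other than @ARGV):

```perl
#!/usr/bin/perl
# A sketch of bundling in action; the options 'a', 'b', and 'verbose'
# are invented for this example.
use warnings;
use strict;
use Getopt::Long qw(GetOptionsFromArray);

Getopt::Long::Configure("bundling");

my %opt;
# with bundling enabled, -ab means -a -b, while --verbose is still a long option
GetOptionsFromArray([ '-ab', '--verbose' ], \%opt, 'a', 'b', 'verbose');

print "a=$opt{a} b=$opt{b} verbose=$opt{verbose}\n";
```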

Table 14-1 shows a short summary of each option. Options that have a default value and a POSIX default value alter their default behavior if the environment variable POSIXLY_CORRECT is set.

Table 14-1. Getopt::Long::Configure Options

Option Name         Default Value         Action
auto_abbrev         set (POSIX: unset)    Allow long option names to be abbreviated so long as the abbreviation is unique. Not compatible with single-minus options when bundling is in effect.
bundling            unset                 Interpret single-minus option names as bundles of single-character options. Clearing bundling also clears bundling_override.
bundling_override   unset                 Interpret single-minus options as long names if possible, or bundles otherwise. Setting or clearing this option also sets or clears bundling.
default             n/a                   Reset all options to their default values, as modified by POSIXLY_CORRECT.
getopt_compat       set (POSIX: unset)    Allow the archaic + as well as - and -- to start options. A shortcut for prefix_pattern.
ignore_case         set                   Ignore case of long (double-minus-prefixed) options. Clearing this also clears ignore_case_always.
ignore_case_always  unset                 Ignore case of all options, however prefixed. Clearing this also clears ignore_case; however, ignore_case may subsequently be set.
pass_through        unset                 Allow unknown options to pass through as well as values, rather than raise a warning. Used with permute or require_order.
permute             set (POSIX: unset)    Allow unknown values to pass through. Exclusive with require_order.
prefix              n/a                   Set the prefix string for options, for example, - or /. Only one prefix can be specified. To set alternative prefixes, use prefix_pattern.
prefix_pattern      (-|--|+)              Set the list of prefix strings for options. This is a regular
                    (POSIX: (-|--))       expression pattern, so special characters like + must be escaped and the whole list enclosed in parentheses.
require_order       unset (POSIX: set)    Terminate processing on the first unrecognized value (or option if pass_through is set). Exclusive with permute.

Getting and Setting the Program Name

The name of the script for which Perl was invoked is given by the special variable $0, or $PROGRAM_NAME with the English module loaded. This is the script name as it was invoked, usually including a path, so it is frequently shortened to just the basename (sometimes called the leafname) with File::Basename:

use File::Basename qw(basename);
my $program_name = basename($0);

We can assign to $0 to change the name of the program within Perl. On some, but not all, platforms, this will also change the external name of the program, for example, as listed by the ps command on Unix:

$0 = "aliasedname.pl"; # (try to) change our name

Perl knows some tricks for several platforms to enable the external visibility of assignment to $0 to work, even in circumstances where it would not normally work. However, as this varies from one platform to another, the only way to know for certain if a given platform will support it is to try it.

In practice, it is usually possible to find the original name from the operating system, so this is a poor technique for disguising the origins of a program. It is more suitable for providing status information after the program name.
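For example, a long-running program might advertise its progress after its name. This is only a sketch; whether the updated name is externally visible depends on the platform, as noted earlier:

```perl
#!/usr/bin/perl
# A sketch of using $0 for status information; external visibility varies.
use warnings;
use strict;

my $original = $0;
foreach my $stage (qw(loading processing saving)) {
    $0 = "$original [$stage]";
    # ... do the work for this stage ...
}
$0 = $original;   # restore the plain name when done
print "finished as: $0\n";
```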

Reading from the Environment

Command-line arguments are perhaps the most obvious means of passing information into a command-line Perl program, but the environment is equally important.

The special variable %ENV is one of the main sources of information available to a Perl program when it starts. This hash, defined by Perl automatically, contains key-value pairs of the script's environment. This is, for example, the primary mechanism for transmitting details of a client request from a web server to a CGI script run by that server.

Even in programs designed to be executed from a shell, the environment is an excellent way to configure less commonly used features to avoid creating excessive numbers of options and values. For instance, rather than creating a --debug option, we could instead look for an environment variable ENABLE_DEBUG to enable the same feature. Care should be taken over choosing the names of these variables, to minimize the possibility of conflicts with other programs.
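A minimal sketch of this approach, assuming an invented variable name ENABLE_DEBUG (any name unlikely to clash with other programs will do):

```perl
#!/usr/bin/perl
# Treat the (invented) ENABLE_DEBUG environment variable as a Boolean switch.
use warnings;
use strict;

sub debug_enabled {
    return defined $ENV{ENABLE_DEBUG} && $ENV{ENABLE_DEBUG} ne '0';
}

warn "debugging enabled\n" if debug_enabled();
```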

We can dump out the contents of Perl's environment with a short script or directly on the command line:

> perl -we 'foreach (sort keys %ENV) { print "$_ => $ENV{$_}\n" }'

In an xterm window running on a Linux X Window System desktop, this produces something like


DISPLAY => :0.0
ENV => /home/gurgeh/.bashrc
HISTFILESIZE => 1000
HOME => /home/gurgeh
HOSTDISPLAY => localhost.localdomain:0.0
HOSTNAME => localhost.localdomain
HOSTTYPE => i386
LOGNAME => gurgeh
MAIL => /var/spool/mail/gurgeh
OSTYPE => Linux
PATH => /usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:.
SHELL => /bin/bash
SHLVL => 6
TERM => xterm
TZ => Ikroh/Chiark_Orbital
USER => gurgeh
WINDOWID => 62914563

In a Windows DOS or NT shell, we would instead type (because Windows does not understand single quotes)

> perl -e "foreach (sort keys %ENV) { print qq($_ => $ENV{$_}\n); }"

This would produce something like the following:


ALLUSERSPROFILE => C:\Documents and Settings\All Users
APPDATA => C:\Documents and Settings\Ken Wronkiewicz\Application Data
CLASSPATH => C:\WINNT\System32\QTJava.zip
COMMONPROGRAMFILES => C:\Program Files\Common Files
COMPUTERNAME => WIREMONSTER2
COMSPEC => C:\WINNT\system32\cmd.exe
DIRCMD => /a
HOMEDRIVE => C:
HOMEPATH => \
INCLUDE => C:\Program Files\Microsoft Visual Studio\VC98\atl\include;C:\Program
Files\Microsoft Visual Studio\VC98\mfc\include;C:\Program Files\
Microsoft Visual Studio\VC98\include
LIB => C:\Program Files\Microsoft Visual Studio\VC98\mfc\lib;C:\Program Files\
Microsoft Visual Studio\VC98\lib
LOGONSERVER => \\WIREMONSTER2
MSDEVDIR => C:\Program Files\Microsoft Visual Studio\Common\MSDev98
NUMBER_OF_PROCESSORS => 1
OS => Windows_NT
OS2LIBPATH => C:\WINNT\system32\os2dll;
...

The exact contents of %ENV can vary wildly depending on the underlying platform, the operating system, the chosen shell, and user preferences. However, we can usually expect $ENV{PATH} to be defined, as well as (on a Unix system at least) HOME, USER, TERM, SHELL, and OSTYPE (though the last is often better deduced by examining the special variable $^O, or $OSNAME with use English). Keep in mind when using environment variables that no environment variable is guaranteed to exist on every platform, and that the same concept is often embodied in different variables from one platform to another.
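A short sketch of testing the platform with $^O rather than the OSTYPE environment variable:

```perl
#!/usr/bin/perl
# $^O names the platform Perl was built for, e.g., 'linux', 'darwin', 'MSWin32';
# unlike OSTYPE, it is set by Perl itself and is always available.
use warnings;
use strict;

if ($^O eq 'MSWin32') {
    print "Windows conventions apply\n";
} else {
    print "Assuming Unix-like conventions ($^O)\n";
}
```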

Configuring Programs via %ENV

One major reason for examining %ENV is to allow users to create local definitions for our own environment variables. This provides a simple and easy way to configure a script without having to go to the trouble of looking for and reading a configuration file. This sort of configuration is common on Unix systems. For example, to provide a program with a default location for locating scripts but allow that default to be overridden if the environment variable MY_SCRIPTDIR is set, we might write

$default_scriptdir = "/usr/local/myapp/scripts";
$scriptdir = $ENV{MY_SCRIPTDIR} ? $ENV{MY_SCRIPTDIR} : $default_scriptdir;

More creatively, we can scan for any environment variable with a specific prefix, say MY_, and create a configuration hash based on it:

foreach (keys %ENV) {
    # regular expressions are covered in Chapter 11
    /^MY_(.*)/ and $conf{$1} = $ENV{$_};
}

This is an ideal mechanism for establishing defaults, too, if we iterate over a list of keys in a default hash:

%defaults = (
   SCRIPTDIR => '/usr/local/myapp/scripts',
   # other defaults...
);

foreach (keys %defaults) {
   $conf{$_} = (defined $ENV{"MY_$_"}) ? $ENV{"MY_$_"} : $defaults{$_};
}

We can modify, add, or delete entries in %ENV just as we can with any other hash. %ENV is not a copy of the script's environment; it actually is the script's environment. This means that any changes we make to %ENV change the environment for any child processes that are started after the change, for example, with fork. It is not possible to change the environment of the parent, and therefore Perl scripts cannot return information back to the parent via the environment. (Even though Windows platforms emulate fork through threads, this behavior is maintained for consistency.)
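We can see this inheritance at work by setting a variable and then asking a child process to report it back. This sketch uses Unix-style quoting for the backticks, and MY_APP_MODE is a name invented for the example:

```perl
#!/usr/bin/perl
# Demonstrate that a change to %ENV is visible to subsequently started children.
# MY_APP_MODE is an invented variable name; quoting assumes a Unix shell.
use warnings;
use strict;

$ENV{MY_APP_MODE} = "test";

# $^X is the path of the running perl; the child prints the variable back.
# The \$ stops Perl interpolating $ENV in the backticks; the shell's single
# quotes pass the child's -e code through untouched.
my $seen = `$^X -e 'print \$ENV{MY_APP_MODE}'`;
print "child saw: $seen\n";
```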

Handling Tainted Input from %ENV

Taint mode is a security feature that marks data retrieved from an external source as potentially dangerous. If tainted data is passed in an unsafe operation, which primarily means any attempt to run or communicate with an external process or file, Perl will raise a fatal security error. The main use of tainting is in CGI and other server-side applications that may be executed by unknown and unauthenticated third parties. We cover it here because one of the primary sources of input for CGI scripts is the environment, as we noted earlier.

Taint mode is enabled with the -T option, and is automatically switched on if the real and effective user IDs are different, which is typical on Unix-based web servers. The concept of real and effective user IDs doesn't apply to non-Unix platforms, so there the -T option needs to be supplied or specified in Perl's startup configuration (via PERL5OPT, for example).
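One way to see what taint mode marks is to test individual values with the tainted function from Scalar::Util. The following sketch reports on $ENV{PATH}; run under perl -T it reports the value as tainted, while without -T everything is clean:

```perl
#!/usr/bin/perl
# taintcheck.pl - run as 'perl -T taintcheck.pl' to see values reported as tainted
use warnings;
use strict;
use Scalar::Util qw(tainted);

print "PATH is ", (tainted($ENV{PATH}) ? "tainted" : "not tainted"), "\n";
```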

All the values in the %ENV hash fall into the category of insecure input, so attempting to use them in a potentially insecure operation will cause a fatal error. To prevent this, we must either avoid using %ENV, execute the sensitive operations inside a compartment created with the Safe module, or untaint the values explicitly. Regular expressions can be used to untaint data, though this should be done with extreme caution. To untaint DOCUMENT_ROOT, for instance (a variable we might expect to trust, since it is set by the web server and should not change), we could use

$ENV{DOCUMENT_ROOT} =~ /(.*)/ and $docroot = $1;
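Rather than the catch-all (.*), it is safer to untaint by matching only the characters we genuinely expect. Here is a sketch; the permitted character class (an absolute path of word characters, dots, hyphens, and slashes) is an assumption made for the example:

```perl
#!/usr/bin/perl
# Untaint by validation: accept only an absolute path of 'safe' characters.
# The character class below is an assumption made for this sketch.
use warnings;
use strict;

sub untaint_path {
    my $value = shift;
    return undef unless defined $value;
    # capturing from a regular expression produces untainted data
    return $value =~ m{^(/[\w./-]+)$} ? $1 : undef;
}

my $docroot = untaint_path($ENV{DOCUMENT_ROOT});
print defined $docroot ? "document root: $docroot\n"
                       : "no usable document root\n";
```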

Of course, sometimes we might want to avoid untainting data simply because we used a regular expression on it. To avoid this, we can use the re pragmatic module described in the discussion on regular expressions in Chapter 11. More on taint mode and CGI programming can be found in Chapters 15 and 23.

The Env.pm Module

Perl provides a module, Env.pm, that simplifies the handling of the %ENV hash by allowing us to import environment variables into our program as scalar or array variables. In its simplest form of usage, we can use it to pull in several environment variables as scalars:

# import environment variables via Env.pm
use Env qw(PATH HOME TERM);

# environment variables now available as scalars:
print $PATH, $HOME, $TERM;

Note that it does not matter if the environment variable exists yet. As soon as it is defined, either by the imported name or the %ENV hash, the new value will be reflected in both places.

We can also read and write environment variables in arrays if we prefix the variable name with @:

use Env qw(@PATH);          # access path via array
$first_dir = $PATH[0];      # find name of first directory in path
unshift @PATH, $scriptdir;  # add a new directory to start of path

The separator used by the Env module when splitting environment variables is the value of $Config::Config{path_sep}, which is a colon on Unix (and a semicolon on Windows). The colon is the standard separator for most multiple-value environment variables (and path information variables in particular) on Unix. In principle, we can change it to handle other kinds of variables, for example, comma-separated values:

use Env qw(@PATH);
BEGIN { $Config::Config{'path_sep'} = ','; }  # must run before the next 'use'
use Env qw(@MY_CSV_VAR);

Because use statements take effect at compile time, the assignment must itself be placed in a BEGIN block so that it happens before the second import; a plain runtime assignment would come too late. Be aware also that %Config is a read-only tied hash in many versions of Perl, so this assignment may simply raise an error there. Note, finally, that all variables are stored as scalar strings in the %ENV hash underneath, whatever labels we give them. This means that any alteration to an array variable causes the module to rebuild and then resplit the variable to regenerate the array. That will cause problems if we changed the separator in the meantime.

Interestingly, we can access the same variable in both scalar and array form by importing both names:

#!/usr/bin/perl
# config.pl
use warnings;
use strict;

use Env qw($PATH @PATH);
use Config;

my $sep = $Config::Config{'path_sep'};
# add current directory if not already present
unless ($PATH =~ /(^|$sep)\.($sep|$)/) {
    push @PATH, '.';
}

Since both variables access the same underlying environment variable, a change to either (or the underlying $ENV{PATH}) will change the other too.


Tip For the curious, the Env module is a good example of a simple tied object class. Each imported variable is actually an object in disguise that simply accesses the environment variable of the same name. Ties and tied objects are covered in more detail in Chapter 19.


Writing Shells in Perl

Shells are a particular subclass of interactive program that are worth a little special attention. To most people, a shell is what they type commands into. More accurately, a shell is a command interpreter that provides an interface between the user, the operating system, and its services. On a Unix machine, there are many shells to choose from, including the Bourne shell, sh; C shell, csh; Korn shell, ksh; and the Bourne Again shell, bash. Windows has several shells available, the standard one being COMMAND.COM. Windows NT/2000/XP has a (slightly, some would say) improved shell, cmd.exe.

Perl was partly created as a better solution to the various different shells and scripts that existed on Unix systems beforehand. Its major advantage is that, unlike all the shells mentioned previously, scripts written in Perl do not depend on any given Unix or Windows shell being available (though, of course, Perl itself needs to be available).

Perl does not have a shell mode as such, but it is very easy to create one by running Perl with suitable arguments. Perl also comes with a couple of modules that close the gap between Perl and the shell it is running in. The module Shell.pm allows unrecognized functions in Perl code to be evaluated by the underlying shell, effectively integrating the shell's own abilities into Perl. This is interesting, although potentially dangerous too, and manifestly not portable. Conversely, ExtUtils::Command goes the other way, providing emulations of several important Unix commands that will function on Windows platforms, allowing us to use commands like rm, mv, cp, and chmod on non-Unix platforms.

If we simply want a shell to try out Perl commands, then we can use the Perl debugger as a passable shell by typing

> perl -dwe 1

This debugs the program 1, with warnings enabled. In the process, it provides a prompt at which we can define subroutines and evaluate expressions. For more advanced uses, there are several shell programs and modules available from CPAN and elsewhere, two of the most popular being perlsh, available from http://www.bgw.org/projects/perlsh/, and psh, available from http://www.focusresearch.com/gregor/psh/ and also from CPAN in the Psh package.

Simple Perl Shells

Creating a Perl shell is actually remarkably easy, and this shell can be a useful tool to wrap modules of our own devising to provide a flexible interface to their API. If we don't require any particular degree of sophistication, we can generally create a shell script that runs Perl as a shell in a single line. Here is an example, written for a Windows shell, that uses the -n switch to put an implicit while (<>) {...} around the code we specify with -e (on Unix, we would swap the outer double quotes for single ones so that the shell does not interpolate $_):

> perl -nwe "eval $_; print q|perl> |"

To explain this in more detail, the -e switch specifies a line of code for Perl to execute, in this case an eval followed by a prompt. The -w switch enables warnings, which is always a good idea, and -n puts the code specified by -e into a permanent loop. When run, this takes Perl code typed in by the user and evaluates it: a very simple shell. The only catch is that it doesn't display the prompt the first time around. Here's a slightly improved shell that fixes that problem and also adds strict syntax checking for good measure:

> perl -Mstrict -we "while(1) { print q|perl> |; eval <> }"

This is very similar to the previous example, except that we have used an explicit loop, moving the implicit <> inside the loop as an explicit eval after the prompt. Alternatively, we can use a BEGIN block to print the first prompt, as this Unix example shows:

> perl -nwe 'BEGIN { print "perl> " } eval $_; print "perl> "'

On Windows, we exchange the quotes, since cmd.exe does not understand single ones:

> perl -nwe "BEGIN { print q|perl> | } eval $_; print q|perl> |"

The two implementations differ only in their quoting: Unix shells interpolate variables like $_ inside double quotes, so there we quote the whole command with single quotes, whereas Windows shells require the outer quotes to be double, which in turn forces us to write the inner strings with the q|...| operator. While this shell is not very capable or useful, it can provide the foundation for many more focused shell applications. To get a usable generic shell, though, we need to do some more coding.

Writing a More Useful Shell

Here is a simple Perl script that implements a shell using the Term::ReadLine module (for more on this module, see the next chapter). This enables us to take advantage of the readline library on our system to provide features such as a history list or in-line editing to make the user's life easier. If the library isn't present, the script will still work; it just won't be as powerful.

#!/usr/bin/perl
# shell1.pl
use warnings;
use strict;

# create readline object
use Term::ReadLine;
my $term = Term::ReadLine->new("Perl Shell");

# switch off any highlighting
$term->ornaments(0);

# enable autoflush (output appears instantly)
$|=1;

# evaluate entered expressions until 'quit'
while (1) {
    my $input = $term->readline("perl> ");
    last unless defined $input;    # end of input, e.g., Ctrl-D
    print("\n"), last if $input eq "quit";
    eval $input;
}

As this script shows, it is possible to create a reasonably capable Perl shell with only a few lines of Perl code. The biggest drawback with this shell application is that it evaluates each line as we enter it, so it's no good for multiline statements like foreach loops (or indeed the preceding do...while loop) unless we concatenate the statement onto one line.

We can fix this in two ways. First, we can teach the shell to understand the backslash character, \, for line continuation, so it is possible for us to type something like this and get the expected output:

> perl shell1.pl


perl> print "Hello \
 > World\n"
Hello World

Second, we can look for curly braces on the start and end of lines and keep a count of the number of open braces that haven't been closed yet, so we can legitimately type the following:

> perl shell1.pl

and get this output:


perl> for (1..10) {
perl> print "$_\n";
perl> }

Here is an improved version of our first shell that handles both these cases and makes a few other improvements on the way:

#!/usr/bin/perl
# shell2.pl
use warnings;
use strict;

# create readline object
use Term::ReadLine;

my $term = Term::ReadLine->new("Perl Shell");

# switch off any highlighting
$term->ornaments(0);

# enable autoflush (output appears instantly)
$| = 1;

# declare some variables
my $this;          # current line
my $input = "";    # accumulated input
my $bracing = 0;   # number of unclosed open braces

# evaluate entered expressions until 'quit'
while (defined($this = $term->readline("perl> ")) and $this ne "quit") {
    if ($this =~ s/\\$//) {
        # if the line ends with '\', collect more lines
        $input .= $this;
        # keep track of the braces even so
        $bracing += ($this =~ /\{\s*$/);
        $bracing -= ($this =~ /^\s*\}/);
        # get the next line and redo
        $this = $term->readline(" > ");
        redo;
    } else {

        # doesn't end with '\'
        $input .= $this;
        # keep track of the braces
        $bracing += ($this =~ /\{\s*$/);
        $bracing -= ($this =~ /^\s*\}/);
        # if braces outstanding, collect more lines
        if ($bracing) {
            $this = $term->readline("{$bracing} > ");
            redo;
        }
    }

    if ($input =~ s/^!\s*//) {
        # input beginning with '!' is a system command
        system $input;
    } elsif ($input =~ s/^\?\s*//) {
        # input beginning with '?' is a 'perldoc' query
        if ($input =~ /^([A-Z]|perl)/) {
            # straight perldoc if it's capitalized or starts 'perl'
            system "perldoc", $input;
        } else {
            # otherwise assume it's a function
            system "perldoc", "-f", $input;
        }
    } else {
        # evaluate it as Perl code
        eval $input;
        warn($@), undef $@ if $@;
    }

    $input = "";
}

This script contains a few points of interest. First, it uses the redo command to restart the loop without executing the condition in the while loop. This is how the input line is grown without being overwritten at the start of the loop. The backslash continuation (the first clause in the upper if statement) is basically similar to the example we saw back when we discussed loops in Chapter 6. The other clause handles lines that don't end with a backslash and gets another line if there are still braces outstanding. For the sake of simplicity, we don't check for multiple opening or closing braces on the same line, since it is actually quite tricky to handle all possible cases.

Whenever the code cannot immediately be executed, be it because a backslash was used or braces are still outstanding, the shell needs to read another line. It does this by calling readline again, this time with a modified prompt to indicate that the next line is extending previously entered input. In the case of a backslash, we change the prompt from perl> to just > so the user is clued in to the change in behavior. In the case of braces, we indicate the level of nesting by putting the value of $bracing into the prompt. In both cases, we read another line and concatenate it to the input previously read. We then restart the loop with redo, skipping the readline in the while condition.

If there are no outstanding braces or backslashes, we go to the evaluation part of the loop. Here we have embellished things slightly, just to illustrate how features can be added. The second if statement checks the input for a leading ! or ?. Since the conditions are substitutions that replace the ! or ? with nothing, the prefix is stripped in the process of matching. In the case of !, the shell passes the rest of the input to the real shell to execute; this allows us to "break out" of our Perl shell if we want to execute a shell command. In the case of ?, the shell passes the rest of the input to perldoc and provides us with a basic help system. To keep the command flexible but simple, we check the start of the input following the ? and make a guess as to whether it is a manual page (beginning with perl), a module (which almost always begins with a capital letter, with the exception of pragmatic modules like strict and vars), or a function name (none of the above). This isn't perfect, partly for the reasons just given, but it's not bad for a start.

With this shell, we can enter loops and if statements, and even define subroutines line by line and still have the shell understand them:

perl> sub hello {
{1} > print "Hello World\n"
{1} > }
perl> hello()
Hello World
perl>

We can also read in modules with use and then make use of them, for example:

perl> use Term::ReadKey
perl> ReadMode 4
perl> use CGI qw(:standard)
perl> use vars '$cgi';
perl> $cgi = new CGI
perl> ...

The one thing we have to watch out for is that my and our declarations will not last past the current statement, because they are lexically scoped and exist only inside the scope of the eval. To create variables that last from one command to the next, we need to declare them with use vars. This is probably a candidate for a special command if we decided to extend the shell.
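The effect is easy to demonstrate. In the sketch below, each string is evaluated in its own eval, just as each command line is in the shell: the lexical vanishes with its eval, while the package variable declared with use vars persists:

```perl
#!/usr/bin/perl
# Lexicals do not survive from one eval to the next; package variables do.
use strict;

eval 'my $lexical = 42;';                       # dies with the eval's scope
eval 'use vars q($package); $package = 42;';    # persists in package main

print defined $main::lexical ? "lexical survived\n"
                             : "lexical gone\n";
print defined $main::package ? "package variable survived\n"
                             : "package variable gone\n";
```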

Integrating the Shell into Perl

The standard Perl library has long included a module called Shell.pm (recent versions of Perl no longer bundle it, so it may need to be installed from CPAN), which provides the ability for unrecognized function names to be passed to the underlying shell for execution rather than simply raising an error. (Whether or not this is a good idea is highly debatable.)

Here is an example script for a shell that integrates the Unix ls, mv, and rm commands into Perl. It scans the directory supplied as its argument (or the current directory otherwise) and lowercases the file names of all files and directories it finds, deleting any files that end with a tilde. To find the files, it uses ls (the argument -1 makes sure that ls returns a simple list of files, one per line—usually it will do this anyway when talking to a program but it never hurts to be explicit); to rename them it uses mv, and to delete them it uses rm:

#!/usr/bin/perl
# xshell1.pl
use warnings;
use strict;

use Shell;

my $dir = @ARGV ? $ARGV[0] : ".";
chdir $dir or die "Cannot chdir to $dir: $!";
my @files = split "\n", ls('-1');

foreach (@files) {
    print "File $_ ";

    if (/~$/) {
        # delete files ending in ~
        rm($_);
        print "deleted";
    } else {
        # rename to lowercase
        my $newname = lc $_;
        if ($newname ne $_) {
            mv($_, lc $_);
            print "renamed $newname";
        } else {
            print "ok";
        }
    }
    print "\n";
}

When pointed at a directory containing the files File1, FILE2, File3~, fIlE4, and FIle5~, this script, when run, looks like this:

> perl xshell1.pl


File FIle5~ deleted
File File1 mv: 'File1' and 'file1' are the same file
renamed file1
File File3~ deleted
File fIlE4 mv: 'fIlE4' and 'file4' are the same file
renamed file4
File file2 ok
File test.pl ok

The Shell module works regardless of what the underlying shell actually is, though, of course, the underlying shell may support entirely different commands. Consequently, this is not a very portable solution.

Unrestrained access to the underlying shell is also potentially dangerous—we could end up executing all kinds of dangerous commands without meaning to as a result of even a minor bug in our code. A better solution is to restrict the shell commands to those we actually want to allow. We can do that by passing the Shell module a list of the commands we want to access:

use Shell qw(ls mv rm);

Now we can make use of the ls, mv, and rm commands, but nothing else will be interpreted as a shell command. As a bonus, we can omit the parentheses and use the commands as functions rather than subroutines, because importing their names predeclares them:

#!/usr/bin/perl
# xshell2.pl
use warnings;
use strict;

use Shell qw(ls mv rm);

my $dir = @ARGV ? $ARGV[0] : ".";
chdir $dir or die "Cannot chdir to $dir: $!";
my @files = split "\n", ls -1;

foreach (@files) {
    print "File $_ ";
    if (/~$/) {
        # delete files ending in ~
        rm $_;
        print "deleted";
    } else {
        # rename to lowercase
        my $newname = lc $_;
        if ($newname ne $_) {
            mv $_, lc($_);
            print "renamed $newname";
        } else {
            print "ok";
        }
    }
    print "\n";
}

If we set the variable $Shell::capture_stderr, we can also capture the standard error of the shell command and retrieve it along with the normal output of the command (if any). This isn't entirely portable, however, though it should work in most shells. For example, to list a directory that may not exist:

use Shell qw(ls);
$Shell::capture_stderr = 1;
print ls($ARGV[0]);

The catch with this is that should the command generate error output as well as normal output, both will be mixed together. Consequently, this approach is better left to situations where the command either generates normal output or an error message, where the two can be easily distinguished.

Emulating Unix Commands on Windows

Another module related to shell commands that comes as standard with Perl is the ExtUtils::Command module. This provides something of the opposite role to Shell, implementing Unix commands in Perl such that they can be executed on Windows systems. Table 14-2 presents a list of the implemented commands; the ellipsis (. . .) indicates that more than one parameter can be passed.

Table 14-2. ExtUtils::Command Commands

Name Parameters Action
cat file... Type out the contents of the file(s).
mv file... newfile|directory Rename file(s) to newfile or directory.
cp file... newfile|directory Copy file(s) to newfile or directory.
touch file... Update modification time of the file(s).
rm_f file... Delete the file(s).
rm_rf (file|directory)... Recursively delete files/directories.
mkpath directorypath... Create each chain of directories passed.
eqtime srcfile dstfile Give dstfile the same times as srcfile.
chmod mode file... Change the permissions on the file(s).
test_f file Test that file is a file (not a link/directory).

Here's one example of how these commands can be used:

> perl -MExtUtils::Command -e mv filename newfilename

Just because the commands implemented by ExtUtils::Command are designed to work directly from the command line does not mean that we cannot use them as portable file manipulation tools within our own programs too. However, ExtUtils::Command was not written with programmatic use in mind, so all the subroutines in it use @ARGV as the source for their arguments, requiring us to wrap them with local subroutines that convert arguments passed in @_ to a local copy of the @ARGV array.

As an example, here is the script we introduced earlier using the Shell module, rewritten to be portable by using ExtUtils::Command instead:

#!/usr/bin/perl
# xshell3.pl
use warnings;
use strict;

use ExtUtils::Command ();   # empty list - no import

# programmatic wrappers for ExtUtils::Command subroutines
sub mv   { local @ARGV = @_;ExtUtils::Command::mv();   }
sub cp   { local @ARGV = @_;ExtUtils::Command::cp();   }
sub rm_f { local @ARGV = @_;ExtUtils::Command::rm_f(); }

my $dir = @ARGV ? $ARGV[0] : ".";
chdir $dir or die "Cannot chdir to $dir: $!";
my @files = <*>;

foreach (@files) {
    print "File $_ ";
    if (/~$/) {
        # delete files ending in ~
        rm_f $_;
        print "deleted";
    } else {
        # rename to lowercase
        my $newname = lc $_;
        if ($newname ne $_) {
            mv $_, lc($_);
            print "renamed $newname";
        } else {
            print "ok";
        }
    }
    print "\n";
}

A key reason for the existence of this module is to allow Perl modules to compile and build themselves without having to cater for different platforms in their Makefiles. The ExtUtils::Command module makes heavy use of modules in the File:: hierarchy to attempt cross-platform portability.

Summary

In this chapter, we looked at getting information into programs through the command line and the environment. We first looked at the special array @ARGV: how to use it to pass arguments to Perl, how to set variables from it, and how it handles files. We then examined two modules that we can use for processing command-line options. We looked first at the simpler Getopt::Std and its two functions getopt and getopts, before examining in more detail the Getopt::Long module. We saw, among other things, how to define option values, use abbreviations and aliases, document and bundle options, and handle unrecognized options and values.

We followed this up with a look at the special hash %ENV and examined how to use it to read values from the environment. We can also change the contents of %ENV to alter the environment seen by any external programs that we invoke from within our Perl code. We can also use the Env module to wrap %ENV in a more convenient interface.

In the last part of the chapter, we examined many aspects of using shells with Perl. We saw how to write our own simple Perl shells, how to invoke the Perl debugger as a quick-and-dirty shell with no additional programming effort, and how to integrate shell commands directly into Perl using the Shell module. Finally, we covered the ExtUtils::Command module, which allows us to implement Unix commands such that they can be executed on Windows.
