CHAPTER 9

Using Modules

Modules are the basic unit of reusable code in Perl, the equivalent of libraries in other languages. Perl's standard library is almost entirely made up of modules. When we talk about Perl libraries, we usually mean modules that are included as part of the standard Perl library—the collection of Perl code that comes with the standard Perl distribution. Although the words "module" and "library" are frequently used interchangeably, they are actually not quite equivalent because not all libraries are implemented as modules. Modules are closely related to packages, which we have already been exposed to in previous chapters. In many cases a module is the physical container of a package—the file in which the package is defined. As a result, they are often named for the package they implement: the CGI.pm module implements the CGI package, for instance.

A library is simply a file supplied by either the standard Perl library or another package that contains routines that we can use in our own programs. Older Perl libraries were simple collections of routines collected together into a file, generally with a .pl extension. The do and require functions can be used to load this kind of library into our own programs, making the routines and variables they define available in our own code.

Modern Perl libraries are defined as modules and included into programs with the use directive, which requires that the module's file name end in .pm. The use keyword is more sophisticated than do and require, the most notable difference being that the inclusion happens at compile time rather than run time.

Modules come in two distinct flavors, functional and pragmatic. Functional modules are generally just called modules or library modules. They provide functionality in the form of routines and variables that can be used from within our own code. They are what we usually think of as libraries. Conversely, pragmatic modules implement pragmas that modify the behavior of Perl at compile time, adding to or constraining Perl's syntax to permit additional constructs or limit existing ones. They can usually be told apart because pragmatic modules are conventionally named in lowercase and are rarely more than one word long. Functional modules use uppercase letters and often have more than one word in their name, separated by double colons. Since they are modules, pragmas are also loaded using the use directive. The strict, vars, and warnings modules are all examples of pragmatic modules that we frequently use in Perl scripts. Some pragmatic modules also provide routines and variables that can be used at run time, but most do not.

In this chapter, we will examine the different ways of using modules in our scripts. In the next chapter, we will look inside modules, that is, examine them from the perspective of how they work as opposed to how to use them.

Loading Code Using do, require, and use

Perl provides three mechanisms for incorporating code (including modules) found in other files into our own programs. These are the do, require, and use statements, in increasing order of complexity and usefulness. Any code that is loaded by these statements is recorded in the special hash %INC (more about this later).

The simplest is do, which executes the contents of an external file by reading it and then evaling the contents. If that file happens to contain subroutine declarations, then those declarations are evaluated and become part of our program:

do '/home/perl/loadme.pl';

A more sophisticated version of do is the require statement. If the filename is defined within quotes, it is looked for as-is; otherwise it appends a .pm extension and translates any instance of :: into a directory separator:

# include the old-style (and obsolete) getopts library
require 'getopts.pl';

# include the newer Getopt::Std library
# (i.e. PATH/Getopt/Std.pm)
require Getopt::Std;

The first time require is asked to load a file, it checks that the file has not already been loaded by looking in the %INC hash. If it has not been loaded yet, it searches for the file in the paths contained in the special array @INC.
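The bookkeeping described above can be observed directly. The following is a minimal sketch, assuming only that the standard File::Basename module is installed; the variable name is ours:

```perl
use strict;
use warnings;

# First require: unless something has already loaded it, Perl searches @INC
# for File/Basename.pm, loads it, and records where it was found in %INC.
require File::Basename;
my $loaded_from = $INC{'File/Basename.pm'};

# Second require: the key already exists in %INC, so nothing is reloaded.
require File::Basename;

print "File::Basename was loaded from $loaded_from\n";
```

Note that the key in %INC is the relative file name, not the module name.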

More sophisticated still is the use statement. This does exactly the same as require, but it evaluates the included file at compile time, rather than at run time as require does. This allows modules to perform any necessary initializations and to modify the symbol table with subroutine declarations before the main body of the code is compiled. This in turn allows syntax checks to recognize valid symbols defined by the module and flag errors on nonexistent ones. For example, this is how we include the Getopt::Std module at compile time:

# include Getopt::Std at compile time
use Getopt::Std;

Like require, use takes a bare unquoted module name as a parameter, appending a .pm to it and translating instances of :: or the archaic ' into directory separators. Unlike require, use does not permit any filename to be specified with quotes and will flag a syntax error if we attempt to do so. Only true library modules may be included via use.

The traditional way to cause code to be executed at compile time is with a BEGIN block, so this is (almost) equivalent to

BEGIN {
    require Getopt::Std;
}

However, use also attempts to call the import method (an object-oriented subroutine) in the module being included, if present. This provides the module with the opportunity to define symbols in our namespace, making it simpler to access its features without prefixing them with the module's package name. use Module is therefore actually equivalent to

BEGIN {
    require Module;
    Module->import;   # or 'import Module'
}

This one simple additional step is the foundation of Perl's entire import mechanism. There is no more additional complexity or built-in support for handling modules or importing variables and subroutine names. It is all based around a simple function call that happens at compile time. The curious thing about this is that there is no requirement to actually export anything from an import subroutine. Object-oriented modules rarely export symbols and so often commandeer the import mechanism to configure the class data of the module instead. A statement like use strict vars is a trivial but common example of this alternative use in action.

As the preceding expansion of use illustrates, the automatic calling of import presumes that the module actually creates a package of the same name. The import method is searched for in the package with the name of the module, irrespective of what loading the module actually causes to be defined.
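To see the import call happen, we can sketch a module inline. My::Demo and the @calls array below are hypothetical names invented for this illustration. Setting its %INC entry by hand is simply a way to stop use from searching @INC for a file:

```perl
use strict;
use warnings;

our @calls;    # records every call to My::Demo->import

BEGIN {
    package My::Demo;                  # hypothetical module defined inline
    $INC{'My/Demo.pm'} = __FILE__;     # mark it as loaded so 'use' does not search @INC
    sub import {
        my ($class, @args) = @_;
        push @main::calls, join ',', $class, @args;
    }
}

use My::Demo;                  # calls My::Demo->import()
use My::Demo qw(alpha beta);   # calls My::Demo->import('alpha', 'beta')

print "$_\n" for @calls;
```

Because import is called as a class method, the first argument it receives is always the package name; the import list, if any, follows it.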

Import Lists

As we just discussed, the major advantage that use has over require is the concept of importing. While it is true that we can import directly by simply calling import ourselves, it is simpler and more convenient with use.

If we execute a use statement with only a module name as an argument, we cause the import subroutine within the module to be called with no argument. This produces a default response from the module. This may be to do nothing at all, or it may cause the module to import a default set of symbols (subroutines, definitions, and variables) for our use. Object-oriented modules tend not to import anything, since they expect us to use them by calling methods. If a module does not provide an import method, nothing happens of course.

Function-oriented modules often import symbols by default and optionally may import further symbols if we request them in an import list. An import list is a list of items specified after the module name. It can be specified as a comma-separated list within parentheses or as a space-separated list within a qw operator. If we only need to supply one item, we can also supply it directly as a string. However, whatever the syntax, it is just a regular Perl list:

# importing a list of symbols with a comma-separated list:
use Module ('sub1', 'sub2', '$scalar', '@list', ':tagname');

# it is more legible to use 'qw':
use Module qw(sub1 sub2 $scalar @list :tagname);

# a single symbol can be specified as a simple string:
use Module 'sub1';

# if strict subs is not enabled, a bareword can be used:
use strict refs;

The items in the list are interpreted entirely at the discretion of the module. For functional modules, however, they are usually symbols to be exported from the module into our own namespace. Being selective about what we import allows us to constrain imports to only those that we actually need. Symbols can either be subroutine names, variables, or sometimes tags, prefixed by a :, if the module in question supports them. These are a feature of the Exporter module, which is the source of many modules' import mechanisms. We discuss it from the module developer's point of view in Chapter 10.
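As a concrete illustration with real modules, the following sketch imports two named subroutines from List::Util and a tag from Fcntl. Both modules ship with Perl, and the variable names are ours:

```perl
use strict;
use warnings;

use List::Util qw(sum max);   # import only the two subroutines we need
use Fcntl qw(:flock);         # a tag: imports the LOCK_* constants as a group

my $total   = sum(1 .. 4);
my $largest = max(3, 9, 4);

print "total=$total largest=$largest\n";
print "LOCK_EX is available\n" if defined &LOCK_EX;
```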

We cannot import just any symbol or tag into our code; the module must define it to be able to export it. A few modules like the CGI module have generic importing functions that handle anything we pass to them. However, most do not, and we will generate a fatal compile-time error if we attempt to import symbols that the module does not supply.

Suppressing Default Imports

Sometimes we want to be able to use a module without importing anything from it, even by default. To do that, we can specify an empty import list, which is subtly different from supplying no import list at all:

use Module;    # import default symbols
use Module();  # suppress all imports

When used in this second way, the import step is skipped entirely, which can be handy if we wish to make use of a module but not import anything from it, even by default (recall that we can always refer to a package's variables and subroutines by their fully qualified names). The second use statement is exactly equivalent to a require, except that it takes place at compilation time, that is

BEGIN { require Module; }

Note that we cannot suppress the default import list and then import a specific symbol—any import will trigger the default (unless of course the module in question has an import method that will allow this). Remember that the entire import mechanism revolves around a subroutine method called import in the module being loaded.
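The effect of an empty import list can be checked with a standard module. This sketch assumes File::Basename, which normally exports basename by default; the variable names are ours:

```perl
use strict;
use warnings;

use File::Basename ();    # load the module but skip the import step entirely

# basename() was not imported into our namespace...
my $imported = defined &main::basename ? 1 : 0;

# ...but it is still reachable by its fully qualified name
my $name = File::Basename::basename('/home/perl/loadme.pl');

print "imported=$imported name=$name\n";
```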

Disabling Features with no

no, which is the opposite of the use directive, attempts to unimport features imported by use. This concept is entirely module dependent. In reality, it is simply a call to the module's unimport subroutine. Different modules support this in different ways, including not supporting it at all. For modules that do support no, we can unimport a list of symbols with

no Module qw(symbol1 symbol2 :tagname);

This is equivalent to

BEGIN {
    require Module;
    Module->unimport('symbol1', 'symbol2', ':tagname');
}

Unlike use, a no statement absolutely needs the subroutine unimport to exist—there would be no point without it. A fatal error is generated if this is not present.

Whether or not no actually removes the relevant symbols from our namespace or undoes whatever initialization was performed by use depends entirely on the module. It also depends on what its unimport subroutine actually does. Note that even though it supposedly turns off features, no still requires the module if it has not yet been loaded. In general, no happens after a module has been used, so the require has no effect as the module will already be present in %INC.
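The symmetry between use and no can be sketched with another inline module. My::Feature and @log are hypothetical names used only for illustration, and this module merely records what it was asked to do:

```perl
use strict;
use warnings;

our @log;    # records what the hypothetical module was asked to do

BEGIN {
    package My::Feature;                 # hypothetical module defined inline
    $INC{'My/Feature.pm'} = __FILE__;    # mark it as loaded
    sub import   { shift; push @main::log, "import(@_)" }
    sub unimport { shift; push @main::log, "unimport(@_)" }
}

use My::Feature qw(fast);   # calls My::Feature->import('fast')
no  My::Feature qw(fast);   # calls My::Feature->unimport('fast')

print "$_\n" for @log;
```

What a real unimport does with its arguments is entirely up to the module's author.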

The if Pragma

An interesting variation on use is use if, which allows us to conditionally load a module based on arbitrary criteria. Since use happens at compile time, we must make sure to use constructs that are already defined at that time, such as special variables or environment variables. For example:

use if $ENV{WARNINGS_ON},"warnings";

This will enable warnings in our code if the environment variable WARNINGS_ON has a true value. Functional modules can also be loaded the same way:

use if $ENV{USE_XML_PARSER},"XML::Parser";
use if !$ENV{USE_XML_PARSER},"XML::SAX";

The if pragma is, of course, implemented by the if.pm module and is a perfectly ordinary piece of Perl code. To do its magic, it defines an import that loads the specified module with require if the condition is met. From this we can deduce that it works with use, but not require, since that does not automatically invoke import.
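A small sketch makes the behavior concrete. The conditions here are constants purely for demonstration, and No::Such::Module is a deliberately nonexistent, hypothetical name:

```perl
use strict;
use warnings;

# Condition is true: List::Util is loaded and max() is imported.
use if 1, 'List::Util' => qw(max);

# Condition is false: the module name is never passed to require,
# so it does not matter that this (hypothetical) module does not exist.
use if 0, 'No::Such::Module';

my $largest = max(2, 7, 5);
my $skipped = exists $INC{'No/Such/Module.pm'} ? 0 : 1;

print "largest=$largest skipped=$skipped\n";
```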

If we need to specify an import list, we can just tack it on to the end of the statement as usual. The following statement loads and imports a debug subroutine from My::Module::Debug if the program name (stored in $0) has the word debug in its name. If not, an empty debug subroutine is defined, which Perl will then optimize out of the code wherever it is used.

use if $0 =~ /debug/, 'My::Module::Debug' => qw(debug);
*debug = sub { } unless *debug{CODE};

An unimport method is also defined so we can also say no if:

use strict;
no if $ENV{NO_STRICT_REFS} => strict => 'refs';

This switches on all strict modes, but then it switches off strict references (both at compile time) if the environment variable NO_STRICT_REFS has a true value.

Testing for Module Versions and the Version of Perl

Quite separately from their usual usage, both the require and use directives support an alternative syntax, taking a numeric value as the first or only argument. When specified on its own, this value is compared to the version of Perl itself. It causes execution to halt if the comparison reveals that the version of Perl being used is less than that stated by the program. For instance, to require that only Perl version 5.6.0 or higher is used to run a script, we can write any of the following:

require 5.6.0;
use 5.6.0;
require v5.6.0;    # archaic as of Perl 5.8, see Chapter 3

Older versions of Perl used a version resembling a floating-point number. This format is also supported, for compatibility with older versions of Perl:

require 5.001;     # require Perl 5.001 or higher
require 5.005_03;  # require Perl 5.005 patch level 3 or higher

Note that for patch levels (the final part of the version number), the leading zero is important. The underscore is just a way of separating the main version from the patch number and is a standard way to write numbers in Perl, not a special syntax. 5.005_03 is the same as 5.00503, but more legible.
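Both points are easy to verify. The sketch below also uses the special variable $], which holds the running interpreter's version in the same floating-point style; the variable names are ours:

```perl
use strict;
use warnings;

# The underscore is only a digit separator, so these are the same number.
my $same = (5.005_03 == 5.00503) ? 1 : 0;

# $] can be compared numerically at run time.
my $modern = ($] >= 5.006) ? 1 : 0;

print "same=$same modern=$modern (running Perl $])\n";
```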

A version number may also be specified after a module name (and before the import list, if any is present), in which case it is compared to the version defined for the module. For example, to require CGI.pm version 2.36 or higher, we can write

use CGI 2.36 qw(:standard);

If the version of CGI.pm is less than 2.36, this will cause a compile-time error and abort the program. Note that there is no comma between the module name, the version, or the import list. As of Perl 5.8, a module that does not define a version at all will fail if a version number is requested. Prior to this, such modules would always load successfully.

It probably comes as no surprise that, like the import mechanism, this is not built-in functionality. Requesting a version simply calls a subroutine called VERSION() to extract a numeric value for comparison. Unless we override it with a local definition, this subroutine is supplied by the UNIVERSAL module, from which all packages inherit. In turn, UNIVERSAL::VERSION() returns the value of the variable $PackageName::VERSION. This is how most modules define their version number.
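Since VERSION is an ordinary method, we can also call it ourselves. This sketch uses a hypothetical package, My::Versioned, defined inline so that its version is known:

```perl
use strict;
use warnings;

{
    package My::Versioned;    # hypothetical package with a conventional $VERSION
    our $VERSION = '1.02';
}

# With no argument, VERSION simply reports the package's version.
my $version = My::Versioned->VERSION;

# With an argument, it dies unless the package is at least that version.
my $old_enough = eval { My::Versioned->VERSION(1.00); 1 } ? 1 : 0;
my $too_new    = eval { My::Versioned->VERSION(2.00); 1 } ? 1 : 0;

print "version=$version old_enough=$old_enough too_new=$too_new\n";
```

Wrapping the call in eval turns the fatal error into a testable result, which is useful when a program can work with several versions of a module.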

Pragmatic Modules

Pragmatic modules implement pragmas that alter the behavior of the Perl compiler to expand or constrict the syntax of the Perl language itself. One such pragma that should be familiar to us by now is the strict pragma. Others are vars, overload, attributes, and in fact any module with an all-lowercase name: it is conventional for pragmatic modules to be defined using all lowercase letters. Unlike functional modules, their effect tends to be felt at compile time, rather than run time. A few pragmatic modules also define functions that we can call later, but not very many.

It sometimes comes as a surprise to programmers new to Perl that all pragmas are defined in terms of ordinary modules, all of which can be found as files in the standard Perl library. The strict pragma is implemented by strict.pm, for example. Although it is not necessary to understand exactly how this comes about, a short diversion into the workings of pragmatic modules can be educational.

How Pragmatic Modules Work

Many of these modules work their magic by working closely with special variables such as $^H, which provides a bitmask of compiler "hints" to the Perl compiler, or $^W, which controls warnings. A quick examination of the strict module (the documentation for which is much longer than the actual code) illustrates how three different flags within $^H are tied to the use strict pragma:

package strict;
$strict::VERSION = "1.01";

my %bitmask = (
     refs => 0x00000002,
     subs => 0x00000200,
     vars => 0x00000400
);
sub bits {
    my $bits = 0;
    foreach my $s (@_){ $bits |= $bitmask{$s} || 0; };
    $bits;
}

sub import {
    shift;
    $^H |= bits(@_ ? @_ : qw(refs subs vars));
}

sub unimport {
    shift;
    $^H &= ~ bits(@_ ? @_ : qw(refs subs vars));
}
1;

From this, we can see that all the strict module really does is toggle the value of three different bits in the $^H special variable. The use keyword sets them, and the no keyword clears them. The %bitmask hash variable provides the mapping from the names we are familiar with to the numeric bit values they control.

The strict module is particularly simple, which is why we have used it here. The entirety of the code in strict.pm is shown above. Chapter 10 delves into the details of import and unimport methods and should make all of the preceding code clear.
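We can watch the refs bit change by reading $^H inside BEGIN blocks, which run while the surrounding code is being compiled. This is a sketch that assumes the bit value 0x00000002 from the listing above matches the installed strict.pm; the variable names are ours:

```perl
use strict;
use warnings;

my ($bit_with_strict, $bit_without_refs);

# BEGIN blocks run during compilation, so they see the hints in force
# at the point where they appear in the source.
BEGIN { $bit_with_strict = ($^H & 0x00000002) ? 1 : 0 }

{
    no strict 'refs';
    BEGIN { $bit_without_refs = ($^H & 0x00000002) ? 1 : 0 }
}

print "refs bit under 'use strict':       $bit_with_strict\n";
print "refs bit under \"no strict 'refs'\": $bit_without_refs\n";
```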

Scope of Pragmatic Modules

Most pragmatic modules have lexical scope, since they control the manner in which Perl compiles code—by nature a lexical process. For example, this short program illustrates how strict references can be disabled within a subroutine to allow symbolic references:

#!/usr/bin/perl
# pragmascope.pl
use warnings;
use strict;
# a subroutine to be called by name
sub my_sub {
    print @_;
}

# a subroutine to call other subroutines by name
sub call_a_sub {
    # allow symbolic references inside this subroutine only
    no strict 'refs';

    my $sub = shift;
    # call subroutine by name - a symbolic reference
    &$sub(@_);
}

# all strict rules in effect here
call_a_sub('my_sub', "Hello pragmatic world\n");

Running this program produces the following output:

> perl pragmascope.pl


Hello pragmatic world

The exceptions are those pragmas that predeclare symbols, variables, and subroutines in preparation for the run-time phase, or modify the values of special variables, which generally have a file-wide scope.

The Special Hash %INC

As mentioned earlier, any file or module that the do, require, and use statements load is recorded in the special hash %INC, which we can then examine to see what is loaded in memory. The keys of %INC are the names of the modules requested, converted to a relative pathname so that :: becomes a directory separator such as /. The values are the names of the actual files that were loaded as a result, including the path where they were found. Loading a new module updates the contents of this hash as shown in the following example:

#!/usr/bin/perl
# INC.pl
use strict;

print "\%INC contains:\n";
foreach (keys %INC) {
    print "  $INC{$_}\n";
}

require File::Copy;
do '/home/perl/include.pl';

print "\n\%INC now contains:\n";
foreach (keys %INC) {
    print "  $INC{$_}\n";
}

The program execution command and corresponding output follows:

> perl INC.pl


%INC contains:
  /usr/lib/perl5/5.8.5/strict.pm

%INC now contains:
  /usr/lib/perl5/5.8.5/strict.pm
  /usr/lib/perl5/5.8.5/vars.pm
  /usr/lib/perl5/5.8.5/File/Copy.pm
  /usr/lib/perl5/5.8.5/File/Spec/Unix.pm
  /usr/lib/perl5/5.8.5/warnings/register.pm
  /usr/lib/perl5/5.8.5/i586-linux-thread-multi/Config.pm
  /usr/lib/perl5/5.8.5/Exporter.pm
  /usr/lib/perl5/5.8.5/warnings.pm
  /usr/lib/perl5/5.8.5/File/Spec.pm
  /usr/lib/perl5/5.8.5/Carp.pm

Note that %INC contains Exporter.pm and Carp.pm, although we have not loaded them explicitly in our example. The reason for this is that the former is required and the latter is used by Copy.pm, which we required in the example. Loading one module can therefore pull in many others. For instance, the IO module is a convenience module that loads all the members of the IO:: family. Each of these loads further modules. The result is that no fewer than 29 modules are loaded as a consequence of issuing the simple directive use IO.
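To find out exactly which files a given module drags in, we can compare the keys of %INC before and after loading it. This is a minimal sketch with variable names of our own choosing; the list it prints will vary between Perl versions:

```perl
use strict;
use warnings;

my %before = map { $_ => 1 } keys %INC;   # whatever is loaded so far

require File::Copy;                        # loads File::Copy plus its dependencies

my @added = sort grep { !$before{$_} } keys %INC;

print "require File::Copy added:\n";
print "  $_ => $INC{$_}\n" for @added;
```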

It should also be noted that we did not specify in our example the full path to the modules. use and require, as well as modules like ExtUtils::Installed (more on this later in the chapter), look for their modules in the paths specified by the special array @INC.

The Special Array @INC

This built-in array is calculated when Perl is built and is provided automatically to all programs. To find the contents of @INC, we can run a one-line Perl script like the following for a Linux terminal:

> perl -e 'foreach (@INC) { print "$_\n"; }'

On a Linux Perl 5.6 installation, we get the following listing of the pathnames that are tried by default for locating modules:


/usr/local/lib/perl5/5.6.0/i686-linux-thread
/usr/local/lib/perl5/5.6.0
/usr/local/lib/perl5/site_perl/5.6.0/i686-linux-thread
/usr/local/lib/perl5/site_perl/5.6.0
/usr/local/lib/perl5/site_perl

Equivalently for Windows, the Perl script is

> perl -e "foreach (@INC) { print \"$_\n\"; }"


C:/perl/ActivePerl/lib
C:/perl/ActivePerl/site/lib
.

When we issue a require or use to load a module, Perl searches this list of directories for a file with the corresponding name, translating any instances of :: (or the archaic ') into directory separators. The first file that matches is loaded, so the order of the directories in @INC is significant.
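The search itself is simple enough to imitate. The subroutine below, locate_module, is a hypothetical helper written for this illustration; it reports where a module would be found without actually loading it:

```perl
use strict;
use warnings;

# Hypothetical helper: mimic the search that require performs, without loading anything.
sub locate_module {
    my ($module) = @_;
    (my $relative = "$module.pm") =~ s{::}{/}g;   # Getopt::Std -> Getopt/Std.pm
    for my $dir (@INC) {
        next if ref $dir;                          # skip any code references placed in @INC
        my $candidate = "$dir/$relative";
        return $candidate if -f $candidate;        # first match wins
    }
    return;                                        # not found anywhere in @INC
}

my $found = locate_module('Getopt::Std');
print defined $found ? "Getopt::Std would load from $found\n"
                     : "Getopt::Std not found in \@INC\n";
```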

It is not uncommon to want to change the contents of @INC, to include additional directories into the search path or (less commonly) to remove existing directories. We have two basic approaches to doing this—we can either modify the value of @INC from outside the application or modify @INC (directly or with the use lib pragma) from within it.

Modifying @INC Externally

We can augment the default value of @INC with three external mechanisms: the -I command-line option and the PERL5OPT and PERL5LIB environment variables.

The -I option takes a directory as an argument and adds it to the start of @INC. To add several directories, we repeat the option:

> perl -I/home/httpd/perl/lib -I/usr/local/extra/lib/modules -e 'print join "\n",@INC'


/home/httpd/perl/lib
/usr/local/extra/lib/modules
/usr/local/lib/perl5/5.8.5/i686-linux-thread
/usr/local/lib/perl5/5.8.5
/usr/local/lib/perl5/site_perl/5.8.5/i686-linux-thread
/usr/local/lib/perl5/site_perl/5.8.5
/usr/local/lib/perl5/site_perl

We can define the same option to equivalent effect within PERL5OPT, along with any other options we want to pass:

> PERL5OPT="-I/home/httpd/perl/lib -I/usr/local/p5lib" perl -e 'print join "\n",@INC'

The options are written just as they would be on the command line (Perl also tolerates omitting the leading minus here). However, if all we want to do is provide additional search locations, or we want to separate library paths from other options, we should use PERL5LIB instead. This takes a colon-separated list of paths in the same style as the PATH environment variable in Unix shells:

> PERL5LIB="/home/httpd/perl/lib:/usr/local/p5lib" perl -e 'print join "\n",@INC'
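The effect of PERL5LIB can also be checked from within Perl by starting a second interpreter with the variable set. This is a sketch that assumes a Unix-like system where the list form of a piped open is available; the variable names are ours:

```perl
use strict;
use warnings;

my $extra = '/home/httpd/perl/lib';

# Set PERL5LIB for child processes only; our own @INC was fixed at startup.
local $ENV{PERL5LIB} = $extra;

# Run a second interpreter ($^X is the path of the running perl) and
# capture the @INC it starts with, one directory per line.
open my $child, '-|', $^X, '-e', 'print join "\n", @INC'
    or die "cannot run $^X: $!";
chomp(my @child_inc = <$child>);
close $child;

my $present = grep { $_ eq $extra } @child_inc;
print "child \@INC ", ($present ? "includes" : "does not include"), " $extra\n";
```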

Modifying @INC Internally

Since @INC is an array, all of the standard array manipulation functions will work on it:

# add directory to end of @INC
push @INC, "/home/httpd/perl/lib";

# add current directory to start of @INC using the 'getcwd'
# function of the 'Cwd' module
use Cwd;
unshift @INC, getcwd();

However, since the use directive causes modules to be loaded at compile time rather than run time, modifying @INC this way will not work for used modules, only required ones. To modify @INC so that it takes effect at compile time, we must enclose it in a BEGIN block:

# add directory to start of @INC at compile-time
BEGIN {
    unshift @INC, '/home/httpd/perl/lib';
}

use MyModule;   # a module in '/home/httpd/perl/lib'...
...

The use lib Pragma

Since BEGIN blocks are a little clunky, we can instead use the lib pragma to add entries to @INC in a friendlier manner. As well as managing the contents of @INC more intelligently, this module provides both a more legible syntax and a degree of error checking over what we try to add. The use lib pragma takes one or more library paths and integrates them into @INC. This is how we could add the directory /home/httpd/perl/lib using the lib pragma:

use lib '/home/httpd/perl/lib';

This is almost but not quite the same as using an unshift statement inside a BEGIN block, as in the previous example. The difference is that if an architecture-dependent directory exists under the named path and it contains an auto directory, then this directory is assumed to contain architecture-specific modules and is also added, ahead of the path named in the pragma. In the case of the Linux system used as an example earlier, this would attempt to add the directories

/home/httpd/perl/lib/i386-linux
/home/httpd/perl/lib

Note that the first directory is only added if it exists, but the actual path passed to lib is added regardless of whether it exists or not. If it does exist, however, it must be a directory; attempting to add a file to @INC will produce an error from the lib pragma.

We can also remove paths from @INC with the no directive:

no lib '/home/httpd/perl/lib';

This removes the named library path or paths, and it also removes any corresponding auto directories, if any exist.

The lib pragma has two other useful properties that make it a superior solution to a BEGIN block. First, if we attempt to add the same path twice, the second instance is removed. Since paths are added to the front of @INC, this effectively allows us to bump a path to the front:

# search for modules in site_perl directory first
use lib '/usr/lib/perl5/site_perl';

Second, we can restore the original value of @INC as built-in to Perl with the statement

@INC = @lib::ORIG_INC;

Note that the lib pragma only accepts Unix-style paths, irrespective of the platform—this affects Windows in particular.
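The duplicate handling and removal can be sketched as follows. Because use lib and no lib take effect at compile time, the counts are captured in BEGIN blocks placed between them. The variable names are ours, and the directory need not exist:

```perl
use strict;
use warnings;

my ($after_use, $after_no);

use lib '/home/httpd/perl/lib';
use lib '/home/httpd/perl/lib';    # adding the same path again does not duplicate it
BEGIN { $after_use = grep { $_ eq '/home/httpd/perl/lib' } @INC }

no lib '/home/httpd/perl/lib';     # remove it again
BEGIN { $after_no = grep { $_ eq '/home/httpd/perl/lib' } @INC }

print "entries after use lib: $after_use\n";
print "entries after no lib:  $after_no\n";
```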

Locating Libraries Relative to the Script

A common application of adding a library to @INC is to add a directory whose path is related to that of the script being run. For instance, the script might be in /home/httpd/perl/bin/myscript and the library modules that support it in /home/httpd/perl/lib. It is undesirable to have to hard-code this information into the script, however, since then we cannot relocate or install it in a different directory.

One way to solve this problem is to use the getcwd function from Cwd.pm to determine the current directory and calculate the location of the library directory from it. However, we do not need to, because Perl provides the FindBin module for exactly this purpose.

FindBin calculates paths based on the current working directory and generates six variables containing path information, any of which we can either import into our own code or access directly from the module. These variables are listed in Table 9-1.

Table 9-1. FindBin Variables

Variable      Path Information
$Bin          The path to the directory from which the script was run
$Dir          An alias for $Bin
$Script       The name of the script
$RealBin      The real path to the directory from which the script was run, with all symbolic links resolved
$RealDir      An alias for $RealBin
$RealScript   The real name of the script, with all symbolic links resolved

Using FindBin we can add a relative library directory by retrieving the $Bin/$Dir or $RealBin/$RealDir variables and feeding them, suitably modified, to a use lib pragma:

use FindBin qw($RealDir);   # or $Bin, $Dir, or $RealBin ...
use lib "$RealDir/../lib";

Using the FindBin module has significant advantages over trying to do the same thing ourselves with getcwd and its relatives. It handles various special cases for Windows and VMS systems, and it deals with the possibility that the script name was passed to Perl on the command line rather than triggering Perl via a #! header (and of course it's shorter too).
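For inspection purposes, the same variables can be printed. The sketch below builds the sibling library path with File::Spec rather than by string concatenation, which is our choice here and not something FindBin requires:

```perl
use strict;
use warnings;

use FindBin qw($Bin $Script);
use File::Spec;

# Build the sibling lib directory portably rather than concatenating with '/'.
my $lib_dir = File::Spec->catdir($Bin, File::Spec->updir, 'lib');

print "script:    $Script\n";
print "directory: $Bin\n";
print "libraries: $lib_dir\n";
```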

Variations on FindBin are available from CPAN. Two worth mentioning are FindBin::Real and FindBin::libs. FindBin::Real is a functional module with the same features as FindBin but with the variables replaced with subroutines. It handles cases where modules in different directories both attempt to determine the script path, and so is preferred for use in modules. (In typical use, only the script itself needs to find out where it came from.) FindBin::libs takes note of the fact that FindBin is mostly used to locate library directories and combines the two into a single module that first locates library directories relative to the script and then adds them to @INC.

Checking for the Availability of a Module

One way to check if a given module is available is to look in the %INC hash to see if the module is present. We can avoid fatal errors by checking for each module and using it only if already loaded. In the following example, if Module1 is loaded, then we use it, otherwise we look to see if Module2 is loaded:

if ($INC{'Module1.pm'}) {
    # use some functions from Module1
} elsif ($INC{'Module2.pm'}) {
    # use some functions from Module2
}

However, the simplest way would be to try to load it using require. Since this ordinarily produces a fatal error, we use an eval to protect the program from errors:

warn "GD module not available" unless eval {require GD; 1};

In the event that the GD module, which is a Perl interface to the libgd graphics library, is not available, eval returns undef, and the warning is emitted. If it does exist, the 1 at the end of the eval is returned, suppressing the warning. This gives us a way of optionally loading modules if they are present but continuing without them otherwise, so we can enable optional functionality if they are present. In this case, we can generate graphical output if GD is present or resort to text otherwise. The special variable $@ holds the error message, if any, generated by the most recent eval.
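Putting this together, a program might choose its output mode as sketched below. The second require uses a deliberately nonexistent, hypothetical module name to show what $@ contains after a failure; the variable names are ours:

```perl
use strict;
use warnings;

# GD may or may not be installed; choose an output mode accordingly.
my $mode = eval { require GD; 1 } ? 'graphics' : 'text';

# A module that does not exist (hypothetical name), to show what
# ends up in $@ when the require fails.
my $loaded = eval { require No::Such::Module; 1 } ? 1 : 0;
my $error  = $loaded ? '' : $@;

print "output mode: $mode\n";
print "second require failed: $error" unless $loaded;
```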

Note that a serious problem arises with the preceding approach if require is replaced with use. The reason is that eval is a run-time function, whereas use is executed at compile time. So, use GD would be executed before anything else, generating a fatal error if the GD module was not available. To solve this problem, simply enclose the whole thing in a BEGIN block, making sure that the whole block is executed at compile time:

BEGIN {
    foreach ('GD', 'CGI', 'Apache::Session') {
       warn "$_ not available" unless eval "use $_; 1";
    }
}

Finding Out What Modules Are Installed

We can find out which library module packages are installed and available for use with the ExtUtils::Installed module. This works not by scanning @INC for files ending in .pm, but by analyzing the .packlist files left by module distributions during the installation process. Not surprisingly, this may take the module a few moments to complete, especially on a large and heavily extended system, but it is a lot faster than searching the file system. Only additional modules are entered here, however; modules supplied as standard with Perl are not included.

Scanning .packlist files allows the ExtUtils::Installed module to produce more detailed information, for example, the list of files that should be present for a given module package. Conversely, this means that it does not deal with modules that are not installed but are simply pointed to by a modified @INC array. This is one good reason to create properly installable modules, which we discuss later in the chapter. The resulting list is of installed module packages, not modules, and the standard Perl library is collected under the name Perl, so on a standard Perl installation we may expect to see only Perl returned from this module.

To use the module, we first create an ExtUtils::Installed object with the new method:

use ExtUtils::Installed;
$inst = ExtUtils::Installed->new();

On a Unix-based system, this creates an installation object that contains the details of all the .packlist files on the system, as determined by the contents of @INC. If we have modules present in a directory outside the normal directories contained in @INC, then we can include the extra directory by modifying @INC before we create the installation object, as we saw at the start of the previous section.

Once the installation object is created, we can list all available modules in alphabetical order with the modules method. For example, this very short script simply lists all installed modules:

# list all installed modules, one per line
print join "\n", $inst->modules();

On a standard Perl installation this will produce just the word "Perl", or possibly Perl plus one or two other modules in vendor-supplied installations, as standard library modules are not listed individually. A more established Perl installation with additional packages installed might produce something like this:


Apache::DBI
Apache::Session
Archive::Tar
CGI
CPAN
CPAN::WAIT
Compress::Zlib
Curses
DBI::FAQ
Date::Manip
Devel::Leak
Devel::Symdump
Digest::MD5
...

The ExtUtils::Installed module does far more than simply list installed module packages, however. It provides the basics of library package management by letting us list the files and directories that each module distribution created when it was installed, and verify that list against what is currently present. In addition to the new and modules methods that we saw in action earlier, ExtUtils::Installed provides six other methods, which are listed in Table 9-2.

Table 9-2. ExtUtils::Installed Methods

Method          Description

directories
Returns a list of installed directories for the module. For example:
@dirs = $inst->directories($module);
A second optional parameter of prog, doc, or all (the default) may be given to restrict the returned list to directories containing code, manual pages, or both:
directories(module, 'prog'|'doc'|'all');
Further parameters are taken to be a list of directories within which all returned directories must lie:
directories(module, 'prog'|'doc'|'all', @dirs);
For instance, this lists installed directories contained by @locations:
@dirs = $inst->directories($module, 'prog', @locations);

directory_tree
Returns a list of installed directories for the module, in the same way as directories, but also including any intermediate directories between the actual installed directories and the directories given as the third and greater parameters:
directory_tree(module, 'prog'|'doc'|'all', @dirs);
For instance, the following example lists installed directories and parents under /usr:
@dirs = $inst->directory_tree($module, 'all', '/usr');

files
Returns a list of installed files for the module, for example:
@files = $inst->files($module);
A second optional parameter of prog, doc, or all may be given to restrict the returned list to files containing code, documentation, or both:
files(module, 'prog'|'doc'|'all');
Further parameters are taken to be a list of directories within which all returned files must lie:
files(module, 'prog'|'doc'|'all', @dirs);
This is how we list the installed files contained by @dirs:
@files = $inst->files($module, 'prog', @dirs);

packlist
Returns an ExtUtils::Packlist object containing the raw details of the .packlist file for the given module:
packlist(module);
See the ExtUtils::Packlist documentation for more information.

validate
Checks the list of files and directories installed against those currently present, returning a list of all files and directories missing. If nothing is missing, an empty list is returned:
validate(module);
For instance, this sets $valid to 1 if nothing is missing and to 0 otherwise:
$valid = $inst->validate($module) ? 0 : 1;

version
Returns the version number of the module, or undef if the module does not supply one:
version(module);
The CPAN module uses this, for example, when the r command is used to determine which modules need updating.

The ability to distinguish file types is a feature of the extended .packlist format in any recent version of Perl. Note that not every installed module yet provides a packing list that supplies this extra information, so many modules group all their installed files and directories under prog (the assumed default) and nothing under doc. To get a more accurate and reliable split between program and documentation files, we can use additional paths such as /usr/lib/perl5/man as the third and greater parameters.
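To see several of these methods working together, here is a small sketch that queries a single package. It assumes only the methods described in Table 9-2 and uses the package name Perl, under which the standard library is collected. On systems where the core .packlist file is absent, the file and directory lists may simply be empty:

```perl
#!/usr/bin/perl
# queryperl.pl (hypothetical example)
use strict;
use warnings;

use ExtUtils::Installed;

my $inst = ExtUtils::Installed->new;

# 'Perl' is the name under which the standard library is collected
my $package = 'Perl';

my $version   = $inst->version($package);
my @missing   = $inst->validate($package);            # empty list if all present
my @prog_dirs = $inst->directories($package, 'prog');
my @doc_files = $inst->files($package, 'doc');

print "$package version: ", (defined $version ? $version : 'UNDEFINED'), "\n";
print "Missing files or directories: ", scalar(@missing), "\n";
print "Directories containing code: ", scalar(@prog_dirs), "\n";
print "Documentation files: ", scalar(@doc_files), "\n";
```

Note that these methods raise an error if asked about a package that is not installed, so for arbitrary names it is safer to check the list returned by modules first.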

As a more complete example of how we can use the features of ExtUtils::Installed, here is a short script to run on Unix that lists every installed module distribution, the files that it contains, and the version of the package, complete with a verification check:

#!/usr/bin/perl
# installedfiles.pl
use warnings;
use strict;

use ExtUtils::Installed;

my $inst = ExtUtils::Installed->new;

foreach my $package ($inst->modules) {
    # validate returns the missing files, so a non-empty list means failure
    my $valid = $inst->validate($package) ? "Failed" : "OK";
    my $version = $inst->version($package);
    $version = 'UNDEFINED' unless defined $version;

    print "\n --- $package v$version [$valid] --- \n";
    if (my @source = $inst->files($package, 'prog')) {
       print "\t", join("\n\t", @source), "\n";
    }
    if (my @docs = $inst->files($package, 'doc')) {
       print "\t", join("\n\t", @docs), "\n";
    }
}

Postponing Module Loading Until Use with autouse

Modules can be very large, frequently because they themselves use other large modules. It can, therefore, be convenient to postpone loading them until they are actually needed. This allows a program to start faster, and it also allows us to avoid loading a module at all if none of its features are actually used.

We can achieve this objective with the autouse pragmatic module, which we can use in place of a conventional use statement to delay loading the module. To use it, we need to specify the name of the module, followed by a => (since that is more legible than a comma) and a list of functions:

use autouse 'Module' => qw(sub1 sub2 Module::sub3);

This will predeclare the named functions, in the current package if not qualified with a package name, and trigger the loading of the module when any of the named functions are called:

sub1("This causes the module to be loaded");
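The following sketch illustrates the effect with a real module. It assumes the standard Text::ParseWords module, which exports shellwords through Exporter, and inspects %INC before and after the first call. The script name is invented for illustration:

```perl
#!/usr/bin/perl
# autousedemo.pl (hypothetical example)
use strict;
use warnings;

# Predeclare shellwords without loading Text::ParseWords yet
use autouse 'Text::ParseWords' => qw(shellwords);

print "Before: ", (exists $INC{'Text/ParseWords.pm'} ? "loaded" : "not loaded"), "\n";

# The first call triggers the real load, then runs the real function
my @words = shellwords(q{one "two three" four});

print "After : ", (exists $INC{'Text/ParseWords.pm'} ? "loaded" : "not loaded"), "\n";
print "Words : ", join("|", @words), "\n";
```

Provided nothing else in the program has already loaded Text::ParseWords, we would expect the first line to report that it is not loaded and the second to report that it is.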

We can also supply a prototype for the subroutine declaration, as in

use autouse 'Module' => 'sub3($$@)';

However, there is no way for this prototype to be checked against the real subroutine since it has not been loaded, so if it is wrong we will not find out until we attempt to run the program.

There are two important caveats to bear in mind when using the autouse pragma. First, the module will only be loaded when one of the functions named on the autouse line is called. Attempting to call any other function in the module, even with an explicit package name, will cause a run-time error unless the module has already been loaded. For instance, this does not delay loading the Getopt::Long module, because no functions are named and so nothing ever triggers the load:

use autouse 'Getopt::Long';
# ERROR: 'Getopt::Long' is not loaded, so 'GetOptions' is unavailable
GetOptions(option => \$verbose);

But this does:

use autouse 'Getopt::Long' => 'GetOptions';
# OK, 'GetOptions' triggers load of 'Getopt::Long'
GetOptions(option => \$verbose);

Second, autouse only works for modules that use the default import method provided by the Exporter module (see Chapter 10). Modules that provide their own import method, such as the CGI module, cannot be used this way unless they in turn inherit from Exporter. Any module that defines an export tag like :tagname falls into this category. Such modules frequently provide their own specialized loading techniques instead, CGI.pm being one good example.
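During development, one rough way to judge whether a module is a candidate for autouse is to check whether its import method is the one supplied by Exporter. The sketch below is only an approximation of that test, not a guarantee of compatibility. The helper name uses_exporter_import is invented for illustration, and the check has to load the module, so it belongs in a development script rather than in production code:

```perl
#!/usr/bin/perl
# checkimport.pl (hypothetical example)
use strict;
use warnings;

use Exporter ();

# Hypothetical helper: true if the module's import method is Exporter's own
sub uses_exporter_import {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Some::Module -> Some/Module.pm
    require $file;
    my $import = $module->can('import');
    return defined $import && $import == \&Exporter::import;
}

foreach my $module (qw(File::Basename Text::ParseWords)) {
    printf "%-18s %s\n", $module,
        uses_exporter_import($module) ? "Exporter import" : "custom or no import";
}
```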

A significant problem when using autouse is that modules that have been autoused are not initialized until they are needed at run time. Their BEGIN blocks are not executed at compile time, nor are symbols imported. This can cause significant problems, as well as hiding syntax errors that would otherwise be found at compile time. For this reason, it is smart to include modules directly for development and testing purposes and to use autouse only in the production version. We must still test the production version, but we can at least eliminate autouse as a cause of problems in the debugging phase.

Alternatively, use a debug flag to switch between the two. Perl's -s option and the if pragma make a handy way to achieve this:

#!/usr/bin/perl -s -w
# debugorautouse.pl
use strict;
use vars '$debug';
use if  $debug,            'File::Basename' => 'basename';
use if !$debug, autouse => 'File::Basename' => 'basename';

print "Before: ",join(",",keys %INC),"\n";
my $prog=basename($0);
print "After : ",join(",",keys %INC),"\n";

If we execute this program with no arguments, we will see that File::Basename is not loaded before the call to basename but is loaded afterward. If on the other hand we run the program as

> debugorautouse.pl -debug

then the -s option that is specified on the first line causes the global variable $debug to be set. In turn this causes the module to be loaded immediately, so File::Basename appears in the list both before and after the call is made. If we want to actually process command-line arguments, the -s option is not so convenient, but we can just as easily test for a suitable invented environment variable like $ENV{PLEASE_DEBUG_MY_PERL}.
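A minimal sketch of the environment variable approach might look like the following. The script name is invented, and PLEASE_DEBUG_MY_PERL is simply the made-up variable name suggested above:

```perl
#!/usr/bin/perl
# debugorautouse_env.pl (hypothetical variant of debugorautouse.pl)
use strict;
use warnings;

# Load File::Basename immediately when the variable is set to a true value,
# and defer it with autouse otherwise
use if  $ENV{PLEASE_DEBUG_MY_PERL},            'File::Basename' => 'basename';
use if !$ENV{PLEASE_DEBUG_MY_PERL}, autouse => 'File::Basename' => 'basename';

my $prog = basename($0);
print "Running as $prog\n";
```

Because %ENV is populated before the program is compiled, the conditions can be evaluated by the if pragma at compile time, and the command line remains free for ordinary arguments.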

If a module is already present when an autouse declaration is seen, it is translated directly into the equivalent use statement. For example:

use Module;
use autouse 'Module' => qw(sub1 sub2);

is the same as

use Module qw(sub1 sub2);

This means that it does no harm to attempt to autouse a module that is already loaded (something that might commonly happen inside a module, which has no idea what is already loaded), but conversely the autouse provides no benefit.

The autouse module is an attempt to provide load-on-demand based on the requirements of the user. The AUTOLOAD subroutine and the AutoLoader and SelfLoader modules also provide us with the ability to load modules and parts of modules on demand, but as part of the module's design. See Chapter 10 for more details.

Summary

Over the course of this chapter, we examined what Perl modules are and how they relate to files and packages. We started out with an examination of the do, require, and use statements and the differences between them. We then went on to look at the import mechanism provided by Perl and how it can be used to add definitions from modules that we use. We considered the difference between functional and pragmatic modules and found that pragmatic modules turn out to be very much like their functional brethren.

Perl searches for modules using the special array variable @INC and records the details of what was found where in the corresponding special hash variable %INC. We saw how to manipulate @INC in various ways, including modifying it directly and using the use lib pragma. We also found out how to ask Perl what modules have been added to the library that did not originally come with Perl.

Finally, we looked at delaying the loading of modules until they are needed with the autouse pragma. This has powerful possibilities for limiting the impact of a Perl application on memory, but not without drawbacks, notably that if a dependent module is not present we will not find out at compile time. Instead, we will only know the first time the application tries to use something from it, which could be a considerable time after it started.
