Modules are the basic unit of reusable code in Perl, the equivalent of libraries in other languages. Perl's standard library is almost entirely made up of modules. When we talk about Perl libraries, we usually mean modules that are included as part of the standard Perl library, the collection of Perl code that comes with the standard Perl distribution. Although the words "module" and "library" are frequently used interchangeably, they are actually not quite equivalent, because not all libraries are implemented as modules. Modules are closely related to packages, which we have already been exposed to in previous chapters. In many cases a module is the physical container of a package: the file in which the package is defined. As a result, modules are often named for the package they implement; the CGI.pm module implements the CGI package, for instance.
A library is simply a file, supplied either by the standard Perl library or by another source, that contains routines we can use in our own programs. Older Perl libraries were simple collections of routines gathered together into a file, generally with a .pl extension. The do and require functions can be used to load this kind of library into our own programs, making the routines and variables it defines available in our own code.
Modern Perl libraries are defined as modules and included into programs with the use directive, which requires that the module's file name end in .pm. The use keyword adds complexity beyond do and require, the most notable difference being that the inclusion happens at compile time rather than run time.
Modules come in two distinct flavors, functional and pragmatic. Functional modules are generally just called modules or library modules. They provide functionality in the form of routines and variables that can be used from within our own code, and they are what we usually think of as libraries. Pragmatic modules, by contrast, implement pragmas that modify the behavior of Perl at compile time, adding to or constraining Perl's syntax to permit additional constructs or limit existing ones. The two are easy to tell apart, because pragmatic modules are by convention named in lowercase and are rarely more than one word long, whereas functional modules use uppercase letters and often have more than one word in their name, separated by double colons (::). Since pragmatic modules are modules too, they are also loaded using the use directive. The strict, vars, and warnings modules are all examples of pragmatic modules that we frequently use in Perl scripts. Some pragmatic modules also provide routines and variables that can be used at run time, but most do not.
In this chapter, we will examine the different ways of using modules in our scripts. In the next chapter, we will look inside modules, that is, examine them from the perspective of how they work as opposed to how to use them.
Perl provides three mechanisms for incorporating code (including modules) found in other files into our own programs. These are the do, require, and use statements, in increasing order of complexity and usefulness. Any code that is loaded by these statements is recorded in the special hash %INC (more about this later).
The simplest is do, which executes the contents of an external file by reading it and then evaluating the contents, much as eval does with a string. If that file happens to contain subroutine declarations, then those declarations are evaluated and become part of our program:
do '/home/perl/loadme.pl';
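do reports problems through its return value rather than by dying: it returns the value of the last statement in the file, leaves a compile error in $@, and leaves a read failure in $!. The following sketch is modeled on the checking idiom in the perlfunc entry for do. It writes a throwaway library with File::Temp so that it is self-contained; the greet subroutine and the file contents are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# write a small old-style library to a temporary file (hypothetical content)
my ($fh, $file) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print $fh "sub greet { return \"Hello, \$_[0]\" }\n1;\n";
close $fh;

# do FILE returns the value of the last statement in the file, or undef on failure
my $result = do $file;
unless ($result) {
    die "couldn't parse $file: $@" if $@;                 # compile error in the file
    die "couldn't do $file: $!" unless defined $result;   # file could not be read
    die "couldn't run $file";                             # file ran but returned false
}

# the subroutine defined in the file is now part of our program
my $greeting = greet('world');
print "$greeting\n";
```

The trailing 1; in the library is what makes do return a true value when everything works.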
A more sophisticated version of do is the require statement. If its argument is a quoted string, it is treated as a filename and looked for as-is; if it is a bare module name, require appends a .pm extension and translates any instance of :: into a directory separator:
# include the old-style (and obsolete) getopts library
require 'getopts.pl';
# include the newer Getopt::Std library
# (i.e. PATH/Getopt/Std.pm)
require Getopt::Std;
Whenever require is asked to load a file, it first checks whether the file has already been loaded by looking in the %INC hash. If it has not been loaded yet, it searches for the file in the paths contained in the special array @INC. If the file cannot be found, or does not return a true value, require dies with a fatal error.
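To see %INC and @INC at work, this sketch writes a hypothetical module, Hypothetical::Counter, into a temporary directory, puts that directory on @INC, and requires the module twice. The module increments a counter in package main each time its file is evaluated, so we can confirm that the second require is satisfied from %INC without reading the file again.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# create a throwaway module directory containing Hypothetical/Counter.pm
my $dir = tempdir(CLEANUP => 1);
mkdir File::Spec->catdir($dir, 'Hypothetical') or die "mkdir: $!";
my $path = File::Spec->catfile($dir, 'Hypothetical', 'Counter.pm');
open my $fh, '>', $path or die "open $path: $!";
print $fh "package Hypothetical::Counter;\n\$main::loads++;\n1;\n";
close $fh;

unshift @INC, $dir;      # let require search our temporary directory
our $loads = 0;

require Hypothetical::Counter;   # searches @INC for Hypothetical/Counter.pm
require Hypothetical::Counter;   # already recorded in %INC, so not loaded again

print "file evaluated $loads time(s)\n";
print "recorded as: $INC{'Hypothetical/Counter.pm'}\n";
```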
More sophisticated still is the use statement. This does the same as require, but it evaluates the included file at compile time, rather than at run time as require does. This allows modules to perform any necessary initializations and to modify the symbol table with subroutine declarations before the main body of the code is compiled. This in turn allows syntax checks to recognize valid symbols defined by the module and flag errors on nonexistent ones. For example, this is how we include the Getopt::Std module at compile time:
# include Getopt::Std at compile time
use Getopt::Std;
Like require, use takes a bare, unquoted module name as a parameter, appending .pm to it and translating instances of :: (or the archaic ') into directory separators. Unlike require, use does not permit a filename to be specified in quotes and will flag a syntax error if we attempt to do so. Only true library modules may be included via use.
The traditional way to cause code to be executed at compile time is with a BEGIN block, so this is (almost) equivalent to
BEGIN {
require Getopt::Std;
}
However, use also attempts to call the import method (an object-oriented subroutine) in the module being included, if present. This provides the module with the opportunity to define symbols in our namespace, making it simpler to access its features without prefixing them with the module's package name. use Module is therefore actually equivalent to
BEGIN {
require Module;
Module->import; # or 'import Module'
}
This one simple additional step is the foundation of Perl's entire import mechanism. There is no further complexity or built-in support for handling modules or importing variables and subroutine names; it is all based around a simple method call that happens at compile time. The curious thing about this is that there is no requirement to actually export anything from an import subroutine. Object-oriented modules rarely export symbols and so often commandeer the import mechanism to configure the class data of the module instead. A statement like use strict 'vars' is a trivial but common example of this alternative use in action: rather than exporting anything, strict's import method switches on a compiler setting.
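Because import is just a method, we can perform the two steps of use by hand. In this sketch, Hypothetical::Greeter is an invented package defined in the same file; its import method installs whichever subroutines it is asked for into the calling package. Setting its %INC entry first is a trick that makes require treat the package as already loaded, since there is no separate file to find.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A hypothetical package defined inline. Its import() is an ordinary
# subroutine: here it installs the requested subroutines into the caller.
package Hypothetical::Greeter;

sub hello { return "hello from " . __PACKAGE__ }

sub import {
    my ($class, @wanted) = @_;   # class name first, then the import list
    my $caller = caller;         # package that issued the use/import
    no strict 'refs';            # needed for symbolic glob names
    for my $name (@wanted) {
        *{"${caller}::$name"} = \&{"${class}::$name"};
    }
}

package main;

# By hand, what 'use Hypothetical::Greeter qw(hello)' would do
BEGIN {
    $INC{'Hypothetical/Greeter.pm'} = __FILE__;   # pretend the module file was loaded
    require Hypothetical::Greeter;                # satisfied from %INC, no file search
    Hypothetical::Greeter->import('hello');       # the extra step that use adds
}

my $message = hello();
print "$message\n";
```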
As the preceding expansion of use illustrates, the automatic calling of import presumes that the module actually creates a package of the same name. The import method is searched for in the package with the name of the module, irrespective of what loading the module actually causes to be defined.
As we just discussed, the major advantage that use has over require is the concept of importing. While it is true that we can import directly by simply calling import ourselves, it is simpler and more convenient with use.
If we execute a use statement with only a module name as an argument, we cause the import subroutine within the module to be called with no arguments (apart from the package name, which every method call receives). This produces a default response from the module. This may be to do nothing at all, or it may cause the module to import a default set of symbols (subroutines, definitions, and variables) for our use. Object-oriented modules tend not to import anything, since they expect us to use them by calling methods. If a module does not provide an import method, nothing happens, of course.
Function-oriented modules often import symbols by default and may optionally import further symbols if we request them in an import list. An import list is a list of items specified after the module name. It can be specified as a comma-separated list within parentheses or as a space-separated list within a qw operator. If we only need to supply one item, we can also supply it directly as a string. Whatever the syntax, however, it is just a regular Perl list:
# importing a list of symbols with a comma-separated list:
use Module ('sub1', 'sub2', '$scalar', '@list', ':tagname');
# it is more legible to use 'qw':
use Module qw(sub1 sub2 $scalar @list :tagname);
# a single symbol can be specified as a simple string:
use Module 'sub1';
# if strict subs are not in effect, a bareword can be used:
use strict refs;
The items in the list are interpreted entirely at the discretion of the module. For functional modules, however, they are usually symbols to be exported from the module into our own namespace. Being selective about what we import allows us to constrain imports to only those that we actually need. Symbols can be subroutine names, variables, or sometimes tags prefixed by a :, if the module in question supports them. Tags are a feature of the Exporter module, which is the source of many modules' import mechanisms. We discuss it from the module developer's point of view in Chapter 10.
We cannot import just any symbol or tag into our code: the module must define it to be able to export it. A few modules, like the CGI module, have generic importing functions that handle anything we pass to them. However, most do not, and asking for a symbol that the module does not supply will generate a fatal error at compile time.
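The following sketch shows import lists and tags in action. Hypothetical::Shapes is an invented module defined inline, which borrows Exporter's import method; the call inside BEGIN stands in for use Hypothetical::Shapes qw(:all). The last few lines ask for a symbol the module does not export and catch the resulting error with eval.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A hypothetical Exporter-based module, defined inline so the sketch is self-contained.
package Hypothetical::Shapes;
use Exporter qw(import);    # borrow Exporter's import method

our (@EXPORT, @EXPORT_OK, %EXPORT_TAGS);
BEGIN {
    # the export lists must be set at compile time, because the
    # import call in package main below also happens at compile time
    @EXPORT      = qw(area);                 # imported by default
    @EXPORT_OK   = qw(perimeter $UNITS);     # imported only on request
    %EXPORT_TAGS = (all => [qw(area perimeter $UNITS)]);
}
our $UNITS = 'cm';

sub area      { my ($w, $h) = @_; return $w * $h }
sub perimeter { my ($w, $h) = @_; return 2 * ($w + $h) }

package main;

# stands in for 'use Hypothetical::Shapes qw(:all)' with a real module file
BEGIN { Hypothetical::Shapes->import(':all') }

print "area: ", area(3, 4), " $UNITS^2\n";
print "perimeter: ", perimeter(3, 4), " $UNITS\n";

# asking for a symbol the module does not export is a fatal error
my $ok = do {
    local $SIG{__WARN__} = sub { };    # silence Exporter's explanatory warning
    eval { Hypothetical::Shapes->import('volume'); 1 };
};
print "importing 'volume' ", ($ok ? 'succeeded' : 'failed'), "\n";
```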
Sometimes we want to be able to use a module without importing anything from it, even by default. To do that, we can specify an empty import list, which is subtly different from supplying no import list at all:
use Module; # import default symbols
use Module(); # suppress all imports
When used in this second way, the import step is skipped entirely, which can be handy if we wish to make use of a module but not import anything from it, even by default (recall that we can always refer to a package's variables and subroutines by their fully qualified names). The second use statement is exactly equivalent to a require, except that it takes place at compile time, that is
BEGIN { require Module; }
Note that whether naming specific symbols suppresses the default imports or adds to them is, once again, up to the module's import method. Modules built on Exporter, for instance, import only the symbols we name when we supply an import list. Remember that the entire import mechanism revolves around a method called import in the module being loaded.
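The three forms can be compared with List::Util, which ships with Perl. As far as we are aware, it exports nothing by default, so here the empty list only makes the intent explicit; the difference matters for modules that do have default exports. Naming max in the second statement imports that one function, while sum stays reachable only through its fully qualified name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use List::Util ();          # load, but import nothing at all
use List::Util qw(max);     # import just max()

# max is available unqualified; sum was never imported,
# but can still be called by its fully qualified name
print "max: ", max(3, 9, 4), "\n";
print "sum: ", List::Util::sum(3, 9, 4), "\n";
print "sum imported? ", (defined &main::sum ? 'yes' : 'no'), "\n";
```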
no, which is the opposite of the use directive, attempts to unimport features imported by use. This concept is entirely module dependent. In reality, it is simply a call to the module's unimport method. Different modules support this in different ways, including not supporting it at all. For modules that do support no, we can unimport a list of symbols with
no Module qw(symbol1 symbol2 :tagname);
This is equivalent to
BEGIN {
require Module;
Module->unimport('symbol1', 'symbol2', ':tagname');
}
As with import, if the module does not define an unimport method, Perl quietly skips the call, so a no statement for a module that does not support it simply has no effect.
Whether or not no actually removes the relevant symbols from our namespace or undoes whatever initialization was performed by use depends entirely on the module and on what its unimport method actually does. Note that even though it supposedly turns off features, no still requires the module if it has not yet been loaded. In general, no happens after a module has been used, so the require has no effect, as the module will already be present in %INC.
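The expansions of use and no can be observed with an invented package, Hypothetical::Switch, whose import and unimport methods simply record the arguments they receive. Pre-setting its %INC entry lets require treat the inline package as already loaded, so no module file is needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A hypothetical pragma-like package that just records what it was asked to do.
package Hypothetical::Switch;
our @calls;
sub import   { my $class = shift; push @calls, "import(@_)" }
sub unimport { my $class = shift; push @calls, "unimport(@_)" }

package main;

# pretend Hypothetical/Switch.pm has been loaded, so use/no skip the file search
BEGIN { $INC{'Hypothetical/Switch.pm'} = __FILE__ }

use Hypothetical::Switch qw(alpha beta);   # BEGIN { require ...; ...->import(...) }
no  Hypothetical::Switch qw(beta);         # BEGIN { require ...; ...->unimport(...) }

print "$_\n" for @Hypothetical::Switch::calls;
```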
An interesting variation on use is use if, which allows us to conditionally load a module based on arbitrary criteria. Since use happens at compile time, we must make sure to use constructs that are already defined at that time, such as special variables or environment variables. For example:
use if $ENV{WARNINGS_ON},"warnings";
This will enable warnings in our code if the environment variable WARNINGS_ON has a true value. Functional modules can also be loaded the same way:
use if $ENV{USE_XML_PARSER},"XML::Parser";
use if !$ENV{USE_XML_PARSER},"XML::SAX";
The if pragma is, of course, implemented by the if.pm module and is a perfectly ordinary piece of Perl code. To do its magic, it defines an import method that, if the condition is met, loads the specified module with require and then calls that module's import. From this we can deduce that it works with use, but not with require, since require does not automatically invoke import.
If we need to specify an import list, we can just tack it on to the end of the statement as usual. The following statement loads My::Module::Debug and imports a debug subroutine from it if the program name (stored in $0) contains the word debug. If not, an empty debug subroutine is defined instead, so calls to debug elsewhere in the program simply do nothing.
use if $0 =~ /debug/, 'My::Module::Debug' => qw(debug);
*debug = sub { } unless *debug{CODE};
An unimport method is also defined, so we can also say no if:
use strict;
no if $ENV{NO_STRICT_REFS} => strict => 'refs';
This switches on all strict modes, but then it switches off strict references (both at compile time) if the environment variable NO_STRICT_REFS has a true value.
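A minimal sketch of use if with modules that ship with Perl. The conditions test the Perl version in $], which is available at compile time; only the first condition is true, so List::Util is loaded and its sum imported, while POSIX is never loaded at all.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# only the first condition is true on any Perl 5.6 or later
use if $] >= 5.006, 'List::Util' => qw(sum);
use if $] <  5.006, 'POSIX'      => qw(floor);

print "sum imported:   ", (defined &main::sum   ? 'yes' : 'no'), "\n";
print "floor imported: ", (defined &main::floor ? 'yes' : 'no'), "\n";
print "POSIX in %INC:  ", (exists $INC{'POSIX.pm'} ? 'yes' : 'no'), "\n";
```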
Quite separately from their usual usage, both the require and use directives support an alternative syntax, taking a version number as the first or only argument. When specified on its own, this value is compared to the version of Perl itself, and the program halts if the version of Perl being used is less than that stated by the program (at compile time for use, at run time for require). For instance, to require that only Perl version 5.6.0 or higher is used to run a script, we can write any of the following:
require 5.6.0;
use 5.6.0;
require v5.6.0; # archaic as of Perl 5.8, see Chapter 3
Older versions of Perl used a version number resembling a floating-point number. This format is also supported, for compatibility with older versions of Perl:
require 5.001; # require Perl 5.001 or higher
require 5.005_03; # require Perl 5.005 patch level 3 or higher
Note that for patch levels (the final part of the version number), the leading zero is important. The underscore is just a way of separating the main version from the patch number and is a standard way to write numbers in Perl, not a special syntax: 5.005_03 is the same as 5.00503, but more legible.
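The running Perl's version is available at run time in $] (as a decimal number) and $^V (as a version string). Because require VERSION is checked at run time, a failed requirement can be caught with eval. In this sketch, 99.0 is chosen simply as a version that no current Perl will satisfy.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# $] is a decimal number (e.g. 5.008005 for 5.8.5); %vd prints $^V as dotted digits
printf "running Perl %vd (\$] = %s)\n", $^V, $];

# one requirement we are sure to meet, and one we are sure to fail;
# 'require VERSION' is checked at run time, so eval can catch the failure
my $ok_old = eval { require 5.006; 1 };
my $ok_new = eval { require 99.0;  1 };

print "5.006 or later: ", ($ok_old ? 'yes' : 'no'), "\n";
print "99.0 or later:  ", ($ok_new ? 'yes' : 'no'), "\n";
print "error was: $@" unless $ok_new;
```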
A version number may also be specified after a module name (and before the import list, if any is present), in which case it is compared to the version defined for the module. For example, to require CGI.pm version 2.36 or higher, we can write
use CGI 2.36 qw(:standard);
If the version of CGI.pm is less than 2.36, this will cause a compile-time error and abort the program. Note that there is no comma after the module name or after the version. As of Perl 5.8, a module that does not define a version at all will fail if a version number is requested. Prior to this, such modules would always load successfully.
It probably comes as no surprise that, like the import mechanism, this is not built-in functionality. Requesting a version simply calls the module's VERSION() method with the requested version as its argument. Unless we override it with a local definition, this method is supplied by the UNIVERSAL package, from which all packages inherit. In turn, UNIVERSAL::VERSION() compares the requested version against the variable $PackageName::VERSION and dies if the module's version is too low; called with no argument, it simply returns that variable's value. Setting $VERSION is how most modules define their version number.
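The VERSION method can also be called directly, which is a convenient way to check a module's version at run time. In this sketch, Hypothetical::Widget is an invented package that declares its version in $VERSION.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A hypothetical package that declares its version the usual way.
package Hypothetical::Widget;
our $VERSION = '1.02';

package main;

# with no argument, UNIVERSAL::VERSION returns $Hypothetical::Widget::VERSION
print "Widget version: ", Hypothetical::Widget->VERSION, "\n";

# with an argument, it dies if the package's version is lower than requested;
# this is the check behind 'use Module VERSION'
for my $wanted ('1.01', '2.00') {
    my $ok = eval { Hypothetical::Widget->VERSION($wanted); 1 };
    print "need $wanted: ", ($ok ? 'ok' : 'too old'), "\n";
}
```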
Pragmatic modules implement pragmas that alter the behavior of the Perl compiler to expand or constrict the syntax of the Perl language itself. One such pragma that should be familiar to us by now is the strict pragma. Others are vars, overload, attributes, and in fact any module with an all-lowercase name: it is conventional for pragmatic modules to be defined using all lowercase letters. Unlike functional modules, their effect tends to be felt at compile time, rather than run time. A few pragmatic modules also define functions that we can call later, but not very many.
It sometimes comes as a surprise to programmers new to Perl that all pragmas are defined in terms of ordinary modules, all of which can be found as files in the standard Perl library. The strict pragma is implemented by strict.pm, for example. Although it is not necessary to understand exactly how this comes about, a short diversion into the workings of pragmatic modules can be educational.
Many of these modules work their magic by working closely with special variables such as $^H, which provides a bitmask of compiler "hints" to the Perl compiler, or $^W, which controls warnings. A quick examination of the strict module (the documentation for which is much longer than the actual code) illustrates how three different flags within $^H are tied to the use strict pragma:
package strict;
$strict::VERSION = "1.01";
my %bitmask = (
refs => 0x00000002,
subs => 0x00000200,
vars => 0x00000400
);
sub bits {
my $bits = 0;
foreach my $s (@_){ $bits |= $bitmask{$s} || 0; };
$bits;
}
sub import {
shift;
$^H |= bits(@_ ? @_ : qw(refs subs vars));
}
sub unimport {
shift;
$^H &= ~bits(@_ ? @_ : qw(refs subs vars));
}
1;
From this, we can see that all the strict module really does is toggle the value of three different bits in the $^H special variable. The use keyword sets them, and the no keyword clears them. The %bitmask hash variable provides the mapping from the names we are familiar with to the numeric bit values they control.
The strict module is particularly simple, which is why we have used it here; the listing above shows the entirety of the code in strict.pm. Chapter 10 delves into the details of import and unimport methods and should make all of the preceding code clear.
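We can watch strict toggle these bits by reading $^H from BEGIN blocks, which run while the surrounding code is still being compiled. This sketch assumes the refs bit is still 0x00000002, as in the listing above; newer versions of strict.pm set some additional bits, but as far as we know this one is unchanged. Note how the setting also lapses at the end of the enclosing block, which is the lexical scoping discussed next.

```perl
#!/usr/bin/perl
use warnings;

# the bit strict.pm uses for 'refs' in $^H, as in the listing above
use constant STRICT_REFS => 0x00000002;

# filled in at compile time by the BEGIN blocks below; each BEGIN block
# reads $^H as it stands at that point in the compilation
our %refs_on;

{
    use strict 'refs';
    BEGIN { $refs_on{after_use} = ($^H & STRICT_REFS) ? 1 : 0 }

    no strict 'refs';
    BEGIN { $refs_on{after_no} = ($^H & STRICT_REFS) ? 1 : 0 }
}
BEGIN { $refs_on{after_block} = ($^H & STRICT_REFS) ? 1 : 0 }

print "$_: $refs_on{$_}\n" for sort keys %refs_on;
```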
Most pragmatic modules have lexical scope, since they control the manner in which Perl compiles code, which is by nature a lexical process. For example, this short program illustrates how strict references can be disabled within a subroutine to allow symbolic references:
#!/usr/bin/perl
# pragmascope.pl
use warnings;
use strict;
# a subroutine to be called by name
sub my_sub {
print @_;
}
# a subroutine to call other subroutines by name
sub call_a_sub {
# allow symbolic references inside this subroutine only
no strict 'refs';
my $sub = shift;
# call subroutine by name - a symbolic reference
&$sub(@_);
}
# all strict rules in effect here
call_a_sub('my_sub', "Hello pragmatic world\n");
Running this program produces the following output:
> perl pragmascope.pl
Hello pragmatic world
The exceptions to this lexical scoping are pragmas that predeclare symbols, variables, and subroutines in preparation for the run-time phase, or that modify the values of special variables; their effects generally have file-wide scope.
As mentioned earlier, any file or module that the do, require, and use statements load is recorded in the special hash %INC, which we can then examine to see what is loaded in memory. The keys of %INC are the names of the modules requested, converted to relative pathnames so that :: becomes a directory separator (File::Copy is recorded as File/Copy.pm, for example). The values are the names of the actual files that were loaded as a result, including the path where they were found. Loading a new module updates the contents of this hash, as shown in the following example:
#!/usr/bin/perl
# INC.pl
use strict;
print "\%INC contains:\n";
foreach (keys %INC) {
print " $INC{$_}\n";
}
require File::Copy;
do '/home/perl/include.pl';
print "\n\%INC now contains:\n";
foreach (keys %INC) {
print " $INC{$_}\n";
}
The program execution command and corresponding output follows:
> perl INC.pl
%INC contains:
/usr/lib/perl5/5.8.5/strict.pm
%INC now contains:
/usr/lib/perl5/5.8.5/strict.pm
/usr/lib/perl5/5.8.5/vars.pm
/usr/lib/perl5/5.8.5/File/Copy.pm
/usr/lib/perl5/5.8.5/File/Spec/Unix.pm
/usr/lib/perl5/5.8.5/warnings/register.pm
/usr/lib/perl5/5.8.5/i586-linux-thread-multi/Config.pm
/usr/lib/perl5/5.8.5/Exporter.pm
/usr/lib/perl5/5.8.5/warnings.pm
/usr/lib/perl5/5.8.5/File/Spec.pm
/usr/lib/perl5/5.8.5/Carp.pm
Note that %INC contains Exporter.pm and Carp.pm, although we have not loaded them explicitly in our example. The reason for this is that the former is required and the latter is used by Copy.pm, which is itself required in the example. The effect can be much larger: the IO module, for instance, is a convenience module that loads all the members of the IO:: family, each of which loads further modules, with the result that no fewer than 29 modules are loaded as a consequence of issuing the simple directive use IO.
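To see exactly which modules a single require drags in, we can take a snapshot of %INC, load a module, and compare. This sketch prints each new key (the relative name require looked for) alongside its value (the file actually loaded); the exact list will vary with the Perl version and platform.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# snapshot %INC, load one module at run time, and report what appeared
my %before = %INC;
require File::Copy;

my @new = grep { !exists $before{$_} } keys %INC;
for my $key (sort @new) {
    # the key is the relative name require looked for; the value is where it was found
    printf "%-28s => %s\n", $key, $INC{$key};
}
printf "%d entries before, %d after\n", scalar(keys %before), scalar(keys %INC);
```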
It should also be noted that we did not specify the full path to the modules in our example. use and require, as well as modules like ExtUtils::Installed (more on this later in the chapter), look for their modules in the paths specified by the special array @INC.
This built-in array is calculated when Perl is built and is provided automatically to all programs. To find the contents of @INC, we can run a one-line Perl script like the following from a Linux shell:
> perl -e 'foreach (@INC) { print "$_\n"; }'
On a Linux Perl 5.6 installation, we get the following listing of the pathnames that are tried by default for locating modules:
/usr/local/lib/perl5/5.6.0/i686-linux-thread
/usr/local/lib/perl5/5.6.0
/usr/local/lib/perl5/site_perl/5.6.0/i686-linux-thread
/usr/local/lib/perl5/site_perl/5.6.0
/usr/local/lib/perl5/site_perl
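Equivalently for Windows, where the whole script must be enclosed in double quotes (so we use qq() for the inner string), the Perl script is
> perl -e "foreach (@INC) { print qq($_\n); }"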
C:/perl/ActivePerl/lib
C:/perl/ActivePerl/site/lib
.
When we issue a require or use to load a module, Perl searches this list of directories for a file with the corresponding name, translating any instances of :: (or the archaic ') into directory separators. The first file that matches is loaded, so the order of the directories in @INC is significant.
It is not uncommon to want to change the contents of @INC, to include additional directories in the search path or (less commonly) to remove existing directories. We have two basic approaches to doing this: we can either modify the value of @INC from outside the application or modify @INC (directly or with the use lib pragma) from within it.
We can augment the default value of @INC with three external mechanisms: the -I command-line option and the PERL5OPT and PERL5LIB environment variables.
The -I option takes a directory as its argument and adds it to the start of @INC; to add several directories, we repeat the option:
> perl -I/home/httpd/perl/lib -I/usr/local/extra/lib/modules -e 'print join "\n",@INC'
/home/httpd/perl/lib
/usr/local/extra/lib/modules
/usr/local/lib/perl5/5.8.5/i686-linux-thread
/usr/local/lib/perl5/5.8.5
/usr/local/lib/perl5/site_perl/5.8.5/i686-linux-thread
/usr/local/lib/perl5/site_perl/5.8.5
/usr/local/lib/perl5/site_perl
We can define the same options to equivalent effect within PERL5OPT, along with any other options we want to pass:
> PERL5OPT="-I/home/httpd/perl/lib -I/usr/local/p5lib" perl -e 'print join "\n",@INC'
The leading minus on the first switch may be omitted; otherwise this is the same as specifying the options on the command line. However, if all we want to do is provide additional search locations, or we want to separate library paths from other options, we should use PERL5LIB instead. This takes a colon-separated list of paths in the same style as the PATH environment variable in Unix shells:
> PERL5LIB="/home/httpd/perl/lib:/usr/local/p5lib" perl -e 'print join "\n",@INC'
Since @INC is an array, all of the standard array manipulation functions will work on it:
# add directory to end of @INC
push @INC, "/home/httpd/perl/lib";
# add current directory to start of @INC using the 'getcwd'
# function of the 'Cwd' module
use Cwd;
unshift @INC, getcwd();
However, since the use directive causes modules to be loaded at compile time rather than run time, modifying @INC this way will not work for modules loaded with use, only for those loaded with require. To modify @INC so that it takes effect at compile time, we must enclose it in a BEGIN block:
# add directory to start of @INC at compile-time
BEGIN {
unshift @INC, '/home/httpd/perl/lib';
}
use MyModule; # a module in '/home/httpd/perl/lib'...
...
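A self-contained version of this technique: the BEGIN block creates a temporary directory holding a hypothetical module, HypotheticalModule.pm, and adds the directory to @INC before the use statement is compiled. Were the unshift an ordinary run-time statement, the use line would fail at compile time because the module could not be found.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

our $libdir;

BEGIN {
    # create a throwaway library directory containing a hypothetical
    # module, then put it on @INC, all before the 'use' below is compiled
    $libdir = tempdir(CLEANUP => 1);
    my $file = File::Spec->catfile($libdir, 'HypotheticalModule.pm');
    open my $fh, '>', $file or die "open $file: $!";
    print $fh "package HypotheticalModule;\nsub answer { 42 }\n1;\n";
    close $fh;
    unshift @INC, $libdir;
}

use HypotheticalModule;    # found because @INC was changed at compile time

print "answer: ", HypotheticalModule::answer(), "\n";
```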
The use lib Pragma
Since BEGIN blocks are a little clunky, we can instead use the lib pragma to add entries to @INC in a friendlier manner. As well as managing the contents of @INC more intelligently, this module provides both a more legible syntax and a degree of error checking over what we try to add. The use lib pragma takes one or more library paths and integrates them into @INC. This is how we could add the directory /home/httpd/perl/lib using the lib pragma:
use lib '/home/httpd/perl/lib';
This is almost but not quite the same as using an unshift statement inside a BEGIN block, as in the previous example. The difference is that if an architecture-dependent directory exists under the named path and it contains an auto directory, then that architecture-dependent directory is assumed to contain architecture-specific modules and is also added, ahead of the path named in the pragma. In the case of the Linux system used as an example earlier, this would attempt to add the directories
/home/httpd/perl/lib/i686-linux-thread
/home/httpd/perl/lib
Note that the first directory is only added if it (and its auto subdirectory) exists, but the actual path passed to lib is added regardless of whether it exists or not. If it does exist, however, it should be a directory; attempting to add a file to @INC will produce a warning from the lib pragma.
We can also remove paths from @INC with the no directive:
no lib '/home/httpd/perl/lib';
This removes the named library path or paths, and it also removes any corresponding architecture-specific directories, if any exist.
The lib pragma has two other useful properties that make it a superior solution to a BEGIN block. First, if we add a path that is already present, the existing entry is removed, leaving only the new one. Since paths are added to the front of @INC, this effectively allows us to bump a path to the front:
# search for modules in site_perl directory first
use lib '/usr/lib/perl5/site_perl';
Second, we can restore the original value of @INC as built into Perl with the statement
@INC = @lib::ORIG_INC;
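The following sketch exercises use lib and no lib with a hypothetical directory, which does not need to exist, since lib adds the path regardless. A BEGIN block captures the front of @INC between the two statements, because both take effect at compile time.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# a hypothetical directory; 'use lib' adds it even though it does not exist
use lib '/hypothetical/perl/lib';
our $first;
BEGIN { $first = $INC[0] }          # examine @INC at compile time

no lib '/hypothetical/perl/lib';    # remove it again, also at compile time

my $present = grep { $_ eq '/hypothetical/perl/lib' } @INC;
print "first entry while added: $first\n";
print "still present: ", ($present ? 'yes' : 'no'), "\n";
print "original \@INC had ", scalar(@lib::ORIG_INC), " entries\n";
```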
Note that the lib pragma only accepts Unix-style paths, irrespective of the platform; this affects Windows in particular.
A common application of adding a library to @INC is to add a directory whose path is related to that of the script being run. For instance, the script might be in /home/httpd/perl/bin/myscript and the library modules that support it in /home/httpd/perl/lib. It is undesirable to have to hard-code this information into the script, however, since then we cannot relocate or install it in a different directory.
One way to solve this problem is to use the getcwd function from Cwd.pm to determine the current directory and calculate the location of the library directory from it. However, we do not need to, because Perl provides the FindBin module for exactly this purpose.
FindBin calculates paths based on the location of the script being run (resolving it against the current working directory when necessary) and generates six variables containing path information, any of which we can either import into our own code or access directly from the module. These variables are listed in Table 9-1.
Variable | Path Information |
$Bin | The path to the directory from which the script was run |
$Dir | An alias for $Bin |
$Script | The name of the script |
$RealBin | The real path to the directory from which the script was run, with all symbolic links resolved |
$RealDir | An alias for $RealBin |
$RealScript | The real name of the script, with all symbolic links resolved |
Using FindBin, we can add a relative library directory by retrieving the $Bin/$Dir or $RealBin/$RealDir variables and feeding them, suitably modified, to a use lib pragma:
use FindBin qw($RealDir); # or $Bin, $Dir, or $RealBin ...
use lib "$RealDir/../lib";
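A short sketch of the FindBin variables in use. Rather than adding it to @INC, it only prints where a sibling lib directory would be; the bin/lib layout is an assumption about how the script is installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use FindBin qw($Bin $Script $RealBin);
use File::Spec;

# where is this script, and where would a sibling ../lib directory be?
my $libdir = File::Spec->catdir($RealBin, File::Spec->updir, 'lib');

print "script name:   $Script\n";
print "script dir:    $Bin\n";
print "resolved dir:  $RealBin\n";
print "candidate lib: $libdir\n";
```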
Using the FindBin module has significant advantages over trying to do the same thing ourselves with getcwd and its relatives. It handles various special cases for Windows and VMS systems, and it deals with the possibility that the script name was passed to Perl on the command line rather than triggering Perl via a #! header (and of course it's shorter too).
Variations on FindBin are available from CPAN. Two worth mentioning are FindBin::Real and FindBin::libs. FindBin::Real is a functional module with the same features as FindBin, but with the variables replaced by subroutines. It handles cases where modules in different directories both attempt to determine the script path, and so is preferred for use in modules. (In typical use, only the script itself needs to find out where it came from.) FindBin::libs takes note of the fact that FindBin is mostly used to locate library directories and combines the two into a single module that first locates library directories relative to the script and then adds them to @INC.
One way to check whether a given module is already loaded is to look in the %INC hash to see if it is present. Remember that the keys are pathnames, so Module1 is recorded as Module1.pm (and Some::Module as Some/Module.pm). We can avoid fatal errors by checking for each module and using it only if it is already loaded. In the following example, if Module1 is loaded, then we use it; otherwise we look to see if Module2 is loaded:
if ($INC{'Module1.pm'}) {
# use some functions from Module1
} elsif ($INC{'Module2.pm'}) {
# use some functions from Module2
}
However, the simplest way to find out whether a module is available is to try to load it using require. Since this ordinarily produces a fatal error if the module is missing, we use an eval to protect the program from errors:
warn "GD module not available" unless eval {require GD; 1};
In the event that the GD module, which is a Perl interface to the libgd graphics library, is not available, eval returns undef, and the warning is emitted. If it does exist, the 1 at the end of the eval is returned, suppressing the warning. This gives us a way of loading modules if they are present while continuing without them otherwise, so we can enable optional functionality only when it is available. In this case, we can generate graphical output if GD is present or resort to text otherwise. The special variable $@ holds the error message, if any, generated by the last eval.
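These checks can be wrapped in a small helper. have_module is a hypothetical name: it converts a module name to the %INC key form, returns immediately if the module is already loaded, and otherwise tries require inside eval, leaving any error in $@ for the caller. The second module name in the loop is invented and should not exist.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# try to load a module at run time; return true if it is (now) loaded
sub have_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;     # Module::Name -> Module/Name.pm
    return 1 if $INC{$file};                     # already loaded
    return eval { require $file; 1 } ? 1 : 0;    # require dies if it cannot load it
}

# List::Util ships with Perl; the second name is hypothetical
for my $module ('List::Util', 'Hypothetical::Missing::Module') {
    if (have_module($module)) {
        print "$module: available\n";
    } else {
        my ($why) = split /\n/, $@;              # first line of the error message
        print "$module: not available ($why)\n";
    }
}
```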
Note that a serious problem arises with the preceding approach if require is replaced with use. The reason is that eval is a run-time function, whereas use is executed at compile time. So use GD would be executed before anything else, generating a fatal error if the GD module was not available. To solve this problem, we use the string form of eval, which compiles its code only when it runs, and enclose the whole thing in a BEGIN block so that the check is still made at compile time:
BEGIN {
foreach ('GD', 'CGI', 'Apache::Session') {
warn "$_ not available" unless eval "use $_; 1";
}
}
We can find out which library module packages are installed and available for use with the ExtUtils::Installed module. This works not by scanning @INC for files ending in .pm, but by analyzing the .packlist files left by module distributions during the installation process. Unsurprisingly, this may take the module a few moments to complete, especially on a large and heavily extended system, but it is a lot faster than searching the file system. Only additional modules are entered here, however; modules supplied as standard with Perl are not listed individually.
Scanning .packlist files allows the ExtUtils::Installed module to produce more detailed information, for example, the list of files that should be present for a given module package. Conversely, this means that it does not deal with modules that are not installed but are simply pointed to by a modified @INC array. This is one good reason to create properly installable modules, which we discuss later in the chapter. The resulting list is of installed module packages, not modules, and the standard Perl library is collected under the name Perl, so on a standard Perl installation we may expect to see only Perl returned from this module.
To use the module, we first create an ExtUtils::Installed object with the new method:
use ExtUtils::Installed;
my $inst = ExtUtils::Installed->new();
On a Unix-based system, this creates an installation object that contains the details of all the .packlist files on the system, as determined by the contents of @INC. If we have modules present in a directory outside the normal directories contained in @INC, then we can include the extra directory by modifying @INC before we create the installation object, as we saw at the start of the previous section.
Once the installation object is created, we can list all available modules in alphabetical order with the modules method. For example, this very short script simply lists all installed modules:
# list all installed modules
use ExtUtils::Installed;
my $inst = ExtUtils::Installed->new();
print join("\n", $inst->modules()), "\n";
On a standard Perl installation this will produce just the word "Perl", or possibly Perl plus one or two other modules in vendor-supplied installations, as standard library modules are not listed individually. A more established Perl installation with additional packages installed might produce something like this:
Apache::DBI
Apache::Session
Archive::Tar
CGI
CPAN
CPAN::WAIT
Compress::Zlib
Curses
DBI::FAQ
Date::Manip
Devel::Leak
Devel::Symdump
Digest::MD5
...
The ExtUtils::Installed module does far more than simply list installed module packages, however. It provides the basics of library package management by providing us with the ability to list the files and directories that each module distribution created when it was installed, and to verify that list against what is currently present. In addition to the new and modules methods we saw in action earlier, ExtUtils::Installed provides six other methods that are listed in Table 9-2.
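Table 9-2. ExtUtils::Installed Methods
Method                                  Description
files($module, $type, @dirs)            Returns the files installed by $module. $type may be prog, doc, or all (the default); any further arguments restrict the list to files under those directories.
directories($module, $type, @dirs)      Returns the directories that contain files installed by $module, filtered in the same way as files.
directory_tree($module, $type, @dirs)   Like directories, but also includes the intermediate directories leading down to them.
validate($module, $remove)              Checks that every file listed in the .packlist for $module still exists and returns a list of any that are missing. If $remove is true, the missing entries are also removed from the .packlist.
packlist($module)                       Returns the ExtUtils::Packlist object for $module.
version($module)                        Returns the version number of $module, or undef if it does not supply one.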
The ability to distinguish file types is a feature of the extended .packlist format in any recent version of Perl. Note that not every installed module yet provides a .packlist that supplies this extra information, so many modules group all their installed files and directories under prog (the assumed default) and nothing under doc. To get a more accurate and reliable split between program and documentation files, we can pass additional paths such as /usr/lib/perl5/man as the third and subsequent parameters.
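A short sketch of the per-type methods: for each distribution it counts the files recorded as program and documentation files and lists the directories that hold them. It relies only on files and directories with the type names prog, doc, and all; as noted above, many distributions may report everything under prog.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use ExtUtils::Installed;

my $inst = ExtUtils::Installed->new();

# %count maps each distribution to [number of prog files, number of doc files]
my %count;
foreach my $module ($inst->modules) {
    my @prog = $inst->files($module, 'prog');
    my @doc  = $inst->files($module, 'doc');
    $count{$module} = [scalar @prog, scalar @doc];
    printf "%-30s prog=%d doc=%d\n", $module, scalar @prog, scalar @doc;

    # directories() reduces the file list to the directories containing it
    print "    $_\n" foreach $inst->directories($module, 'all');
}
```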
As a more complete example of how we can use the features of ExtUtils::Installed, here is a short script to run on Unix that lists every installed module distribution, the files that it contains, and the version of the package, complete with a verification check:
#!/usr/bin/perl
# installedfiles.pl
use warnings;
use strict;
use ExtUtils::Installed;

my $inst = ExtUtils::Installed->new();
foreach my $package ($inst->modules) {
    # validate() returns the list of missing files; any at all means failure
    my $valid = $inst->validate($package) ? "Failed" : "OK";
    my $version = $inst->version($package);
    $version = 'UNDEFINED' unless defined $version;
    print "\n--- $package v$version [$valid] ---\n";

    # program files first, then documentation files, one per line
    if (my @source = $inst->files($package, 'prog')) {
        print "  ", join("\n  ", @source), "\n";
    }
    if (my @docs = $inst->files($package, 'doc')) {
        print "  ", join("\n  ", @docs), "\n";
    }
}
Modules can be very large, frequently because they themselves use other large modules. It can, therefore, be convenient to postpone loading them until they are actually needed. This allows a program to start faster, and it also allows us to avoid loading a module at all if none of its features are actually used.
We can achieve this objective with the autouse pragmatic module, which we can use in place of a conventional use statement to delay loading the module. To use it, we need to specify the name of the module, followed by a => (since that is more legible than a comma) and a list of functions:
use autouse 'Module' => qw(sub1 sub2 Module::sub3);
This will predeclare the named functions, in the current package if not qualified with a package name, and trigger the loading of the module when any of the named functions are called:
sub1("This causes the module to be loaded");
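To watch this happen, a small sketch can check %INC before and after the first call. It uses File::Basename, which exports its functions through Exporter, and relies on %INC recording a loaded module under its file name, here File/Basename.pm.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Predeclare basename() without loading File::Basename yet
use autouse 'File::Basename' => 'basename';

my $loaded_before = exists $INC{'File/Basename.pm'} ? 1 : 0;

# The first call loads File::Basename, then runs the real basename()
my $name = basename('/usr/local/bin/perl');

my $loaded_after = exists $INC{'File/Basename.pm'} ? 1 : 0;
print "before=$loaded_before after=$loaded_after name=$name\n";
```

Run on its own, this should report that the module was absent before the call and present after it.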
We can also supply a prototype for the subroutine declaration, as in
use autouse 'Module' => 'sub3($$@)';
However, there is no way for this prototype to be checked against the real subroutine since it has not been loaded, so if it is wrong we will not find out until we attempt to run the program.
There are two important caveats to bear in mind when using the autouse pragma. First, the module will only be loaded when one of the functions named on the autouse line is called. Attempting to call another function in the module, even if it is explicitly called with a package name, will cause a run-time error unless the module has already been loaded. For instance, this never loads the File::Basename module, so the call to basename fails:
use autouse 'File::Basename';
# ERROR: 'File::Basename' is not loaded, so 'basename' is unavailable
my $name = basename($0);
But this loads it on demand:
use autouse 'File::Basename' => 'basename';
# OK: calling 'basename' triggers the loading of 'File::Basename'
my $name = basename($0);
Second, autouse only works for modules that use the default import method provided by the Exporter module (see Chapter 10). Modules that provide their own import method, such as the CGI module, cannot be used this way, even if that method eventually hands over to Exporter; autouse refuses them when it tries to load them. Modules whose import method handles tags like :tagname itself, as CGI does, fall into this category. Such modules frequently provide their own specialized loading techniques instead, CGI.pm being one good example.
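If we are unsure whether a module qualifies, a development-time check can compare the module's import method with Exporter's. This sketch assumes, as we understand autouse to require, that a module qualifies only if it has no import method of its own or its import is Exporter::import itself. autousable is a hypothetical helper name, and because it has to load the module to inspect it, it belongs in a test script rather than in the program that autouses the module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Exporter ();

# autousable() is a hypothetical helper: true if $module's import()
# is Exporter's own (or it has none), false if it defines its own.
sub autousable {
    my $module = shift;
    (my $file = $module) =~ s{::}{/}g;
    require "$file.pm";    # loading is required to inspect the method
    my $import = $module->can('import') or return 1;
    return $import == \&Exporter::import ? 1 : 0;
}

foreach my $module ('File::Basename', 'Getopt::Long') {
    printf "%-15s %s\n", $module,
        autousable($module) ? "can be autoused" : "has its own import";
}
```

Getopt::Long is a useful contrast here, since it defines an import method of its own to handle its :config option.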
A significant problem when using autouse is that autoused modules are not initialized until they are needed at run time: their BEGIN blocks are not executed, and their symbols are not imported, until then. This can cause unexpected behavior, and it can also hide syntax errors that would otherwise be found at compile time. For this reason, it is smart to include modules directly for development and testing purposes and to use autouse only in the production version (we must still test that, but we can at least eliminate autouse as a cause of problems during debugging).
Alternatively, we can use a debug flag to switch between the two. Perl's -s option and the if pragma provide a handy way to achieve this:
#!/usr/bin/perl -s -w
# debugorautouse.pl
use strict;
use vars '$debug';    # set by -s when the program is run with -debug

# Load File::Basename immediately when debugging, on demand otherwise
use if $debug, 'File::Basename' => 'basename';
use if !$debug, autouse => 'File::Basename' => 'basename';

print "Before: ", join(",", keys %INC), "\n";
my $prog = basename($0);
print "After : ", join(",", keys %INC), "\n";
If we execute this program with no arguments, we will see that File::Basename is not loaded before the call to basename but is loaded afterward. If, on the other hand, we run the program as
> debugorautouse.pl -debug
then the -s option specified on the first line causes the global variable $debug to be set. In turn, this causes the module to be loaded immediately, so File::Basename (recorded in %INC as File/Basename.pm) appears in the list both before and after the call is made. If we want to process command-line arguments ourselves, the -s option is not so convenient, but we can just as easily test for a suitable invented environment variable like $ENV{PLEASE_DEBUG_MY_PERL}.
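A sketch of the environment variable alternative follows. PLEASE_DEBUG_MY_PERL is the invented name suggested above, not anything Perl itself recognizes, and the script name is likewise hypothetical. Since %ENV is already populated when the program is compiled, the if pragma can test it at compile time just as it tested $debug, and @ARGV is left untouched for our own processing.

```perl
#!/usr/bin/perl
# debugorautouse_env.pl (hypothetical name)
use strict;
use warnings;

# Load immediately when PLEASE_DEBUG_MY_PERL is true, on demand otherwise
use if $ENV{PLEASE_DEBUG_MY_PERL}, 'File::Basename' => 'basename';
use if !$ENV{PLEASE_DEBUG_MY_PERL}, autouse => 'File::Basename' => 'basename';

sub loaded { exists $INC{'File/Basename.pm'} ? "loaded" : "not loaded" }

print "Before: ", loaded(), "\n";
my $prog = basename($0);
print "After : ", loaded(), "\n";
```

On Unix, running it as PLEASE_DEBUG_MY_PERL=1 debugorautouse_env.pl should show the module loaded before the call as well.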
If a module is already present when an autouse declaration is seen, it is translated directly into the equivalent use statement. For example:
use Module;
use autouse 'Module' => qw(sub1 sub2);
is the same as
use Module qw(sub1 sub2);
This means that it does no harm to attempt to autouse a module that is already loaded (something that might commonly happen inside a module, which has no idea what is already loaded), but conversely the autouse provides no benefit.
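We can confirm this with a small sketch: load File::Basename first without importing anything, then autouse it. If the translation happens as described, main::basename should be the real File::Basename::basename rather than a loading stub, which we can check by comparing the two code references.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Load File::Basename now, but import nothing from it
use File::Basename ();

# Already in %INC, so this behaves like: use File::Basename 'basename';
use autouse 'File::Basename' => 'basename';

# Compare the imported sub with the module's own one
my $same = \&main::basename == \&File::Basename::basename ? 1 : 0;
print "basename imported directly: ", ($same ? "yes" : "no"), "\n";
```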
The autouse module is an attempt to provide load-on-demand based on the requirements of the user. The AUTOLOAD subroutine and the AutoLoader and SelfLoader modules also provide us with the ability to load modules and parts of modules on demand, but as part of the module's design. See Chapter 10 for more details.
Over the course of this chapter, we examined what Perl modules are and how they relate to files and packages. We started out with an examination of the do, require, and use statements and the differences between them. We then went on to look at the import mechanism provided by Perl and how it can be used to add definitions from modules that we use. We considered the difference between functional and pragmatic modules and found that pragmatic modules turn out to be very much like their functional brethren.
Perl searches for modules using the special array variable @INC and records where it found each one in the corresponding special hash variable %INC. We saw how to manipulate @INC in various ways, both directly and with the use lib pragma. We also found out how to ask Perl which modules have been added to the library that did not originally come with Perl.
Finally, we looked at delaying the loading of modules until they are needed with the autouse
pragma. This has powerful possibilities for limiting the impact of a Perl application on memory, but not without drawbacks, notably that if a dependent module is not present we will not find out at compile time. Instead, we will only know the first time the application tries to use something from it, which could be a considerable time after it started.