We have already seen how modules work from the user's perspective through the do, require, and use statements. We have also seen the relationship between modules and packages. In this chapter, we examine the internals of implementing modules.
In order for modules to be easily reusable, they need to be well behaved. That means not defining variables and subroutines outside their own package unless explicitly asked to do so. It also means not allowing external definitions to be made unless the design of the module permits it. Exporting definitions from one package into another allows them to be used without prefixing the name of the original package, but it also runs the risk of a namespace collision, so both the module and the application need to be able to cooperate, to control what happens. They can do this through the import mechanism, which defines the interface between the module and the application that uses it.
At the application end, we specify our requirements with the use or require statements, with which we can pass a list of symbols (often, but not necessarily, subroutine names). Conversely, at the module end, we define an import subroutine to control how we respond to import (or, from our point of view, export) requests. The Exporter module provides one such import subroutine that handles most common cases for us. Either way, the interface defined through the import mechanism abstracts the actual module code, making it easier to reuse the module and minimizing the chances of an application breaking if we make changes to the module.
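As a rough illustration of the two ends of this interface, the following sketch defines a module inline and borrows the import subroutine from Exporter. The package My::Greeter and its greet subroutine are hypothetical names invented for the example, and the BEGIN blocks stand in for what a use statement would do if the module lived in its own file.

```perl
use strict;
use warnings;

# My::Greeter is a hypothetical module, defined inline to keep the sketch self-contained
package My::Greeter;
use Exporter 'import';                 # borrow Exporter's import subroutine
our @EXPORT_OK;
BEGIN { @EXPORT_OK = qw(greet) }       # symbols the application may request

sub greet { return "Hello, $_[0]!" }

package main;

# For a module in its own file this would be: use My::Greeter qw(greet);
BEGIN { My::Greeter->import('greet') }

my $message = greet('world');          # no My::Greeter:: prefix needed
print "$message\n";
```

Because greet is listed in @EXPORT_OK rather than @EXPORT, nothing is placed in the application's namespace unless it asks for it by name.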
Perl provides a number of special blocks that can be used in any Perl code but which are particularly useful for packages, and accordingly we spend some time discussing them in this chapter. The BEGIN, END, INIT, and CHECK blocks allow a module to define initialization and cleanup code to be automatically executed at key points during the lifetime of the module. The AUTOLOAD subroutine permits a package to react to unknown subroutine calls and stand in for them.
A package declaration is the naming of a new namespace in which further declarations of variables and subroutines are placed. A module is simply a library file that contains such a declaration and an associated collection of subroutines and variables. The link between package name and module file would therefore appear to be a strong one, but this is not necessarily true. As we saw in the last chapter, the module that is loaded corresponds to the named package, but this does not imply that the module actually defines anything in that package. As an example, many of Perl's pragmatic modules are purely concerned with compile-time semantics and do not contribute anything new to the symbol table.
In fact, a module doesn't have to include a package declaration at all. Any constructs it creates will simply be put into whatever package was in effect at the time it was loaded (main, if none has been declared yet). However, this is unusual. By including a package definition we are able to use many different modules without worrying about clashes between similarly named definitions. The notable exception is libraries that include several modules from one master module; the master defines the namespace and then loads all its children, none of which define a namespace and so place their definitions into the parent's namespace.
Similarly, the package declarations in a module don't have to correspond to the name of the module supplied to the use statement; they just usually do. In some cases, we might use a different package or define symbols in more than one package at the same time. Since a module file usually only contains declarations of subroutines and variables, rather than code that actually does something, executing it has no visible effect. However, subroutines and package variables are added to the symbol table, usually under the namespace defined by the package declaration.
Whatever else it does, in order for a module to be loaded successfully by either require or use, it must return a true value. Unless the module actually contains a return statement outside a subroutine definition, this value must come from the last statement in the file. Since a typical module contains mainly subroutine definitions (which don't return anything), we usually need to add an explicit return value to let Perl know that the module is happy. We can also add initialization code that actively does something and have that return a conditional value. This means we can, for example, programmatically have a module fail to load if an essential resource that it needs, such as a configuration file, is missing.
Most modules do not have any initialization code to return a value, so in general we satisfy Perl by appending a 1 (or in fact any true value) to the end of the module file. As the last statement in the file, this is returned to the use or require that triggered the loading of the module file, which tests the value for truth to determine whether or not the module loaded successfully. Taking all this together, the general form of a module file is simply
package My::Module;
... use other modules ...
... declare global variables ...
... define subroutines ...
1;
Note that here "global variables" can mean either lexically scoped file globals (which are global to the file but not accessible from outside it) or package variables that are in the namespace of the package but accessible from elsewhere by qualifying their name with a prefix of the package name.
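To see the true-value rule in action, here is a small sketch that writes two throwaway module files into a temporary directory and tries to require each of them. The module names Happy and Unhappy are invented for the demonstration; the only difference between them is the final statement.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

# write_module is a hypothetical helper that creates a one-line module file
sub write_module {
    my ($name, $body) = @_;
    open my $fh, '>', "$dir/$name.pm" or die "Cannot write $name.pm: $!";
    print $fh $body;
    close $fh;
}

write_module('Happy',   "package Happy; sub hello { 'hi' } 1;\n");
write_module('Unhappy', "package Unhappy; sub hello { 'hi' } 0;\n");

unshift @INC, $dir;    # let require find the files

# require dies if the file does not return a true value, so trap it with eval
my $ok_happy   = eval { require Happy;   1 } ? 1 : 0;
my $ok_unhappy = eval { require Unhappy; 1 } ? 1 : 0;

print "Happy loaded: $ok_happy\n";
print "Unhappy loaded: $ok_unhappy\n";
```

A module that performs real initialization can use the same mechanism by ending with an expression that is true only when the initialization succeeded.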
Although Perl does not force the file name to follow the package name, this module would most likely be called Module.pm and placed in a directory called My, which in turn can be located anywhere that Perl is told to look for modules. This can be any of the default paths in @INC, a location of our own that we add through the use lib pragma, the -I option, or the PERL5LIB environment variable, or even a path added by other Perl modules.
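The mapping from package name to file name can be sketched in a few lines. The helper module_to_path below is a hypothetical name, not a standard function; it simply mirrors the translation that use and require perform before searching the directories in @INC.

```perl
use strict;
use warnings;
use File::Spec;    # loaded only so that %INC has an entry to inspect

# module_to_path is a hypothetical helper, not part of Perl itself
sub module_to_path {
    my ($package) = @_;
    (my $path = $package) =~ s{::}{/}g;    # My::Module becomes My/Module
    return "$path.pm";
}

my $relative = module_to_path('My::Module');
print "$relative\n";

# %INC records, under the same relative name, where each loaded module was found
my $key = module_to_path('File::Spec');
print "$key => $INC{$key}\n";

# the directories that are searched, in order (skipping any code reference hooks)
print "  $_\n" for grep { !ref } @INC;
```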
The package directive changes the default namespace for variables and subroutine declarations, but we are still free to define our own fully qualified definitions if we choose. For instance, rather than creating a module file containing
package My::Module;
sub mysub {
    return "Eep!\n";
}
1;
we could, with equal effect (but losing some maintainability), declare the subroutine to be in the package explicitly:
sub My::Module::mysub {
    return "Eep!\n";
}
1;
It isn't very likely that we would do this in reality: if the subroutine was copied to a different source file, it would need to be renamed. It has possibilities if we are generating subroutines on the fly, a subject we will cover in more detail when we discuss autoloading, but otherwise a package declaration is far more convenient. The same goes for our and use vars declarations, which are simply shorthand that use the package declaration to omit the full variable name.
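The following sketch puts both points together. The package My::Other and the subroutine peek are hypothetical names chosen for the illustration: the qualified name decides where the subroutine is installed, while the package declaration in effect still governs what the unqualified variable inside it means.

```perl
use strict;
use warnings;

package My::Module;

our $greeting = "Eep!";        # shorthand for $My::Module::greeting

# Installed in My::Other because of the qualified name, but compiled under
# package My::Module, so $greeting still means $My::Module::greeting
sub My::Other::peek {
    return $greeting;
}

package main;

my $direct  = $My::Module::greeting;    # the full name always works
my $via_sub = My::Other::peek();

print "direct: $direct\n";
print "via My::Other::peek: $via_sub\n";
```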
It can be occasionally useful for a subroutine to know the name of the package in which it is defined. Since this is a compile-time issue (package declarations are lexical, even though they affect run time scope), we could manually copy the package name from the top of the module or whichever internal package declaration the subroutine falls under.
However, this is prone to failure if the package name changes at any point. This is a more serious problem than it might at first appear because it will not necessarily lead to a syntax error.
To avoid this kind of problem, we should avoid ever naming the package explicitly except in the package declaration itself. Within the code, we can instead use the special bareword token __PACKAGE__ like so:
sub self_aware_sub {
    print "I am in the ",__PACKAGE__," package.\n";
}
As a more expressive but less functional example, the following series of package declarations shows how the value produced by __PACKAGE__ changes if more than one package is present in a given file:
package My::Module;
print __PACKAGE__,"\n";
package My::Module::Heavy;
print __PACKAGE__,"\n";
package My::Module::Light;
print __PACKAGE__,"\n";
package A::Completely::Different::Package;
print __PACKAGE__,"\n";
Each time the __PACKAGE__ token is printed out, Perl expands it into the current package name, producing My::Module, then My::Module::Heavy, and so on.
When Perl loads and compiles a file containing this token, the interpreter first scans and substitutes the real package name for any instances of __PACKAGE__ it finds before proceeding to the compilation stage. This avoids any potential breakages if the package name should change.
The Symbol module provides subroutines for creating and manipulating variable names with respect to packages without dealing with the package name directly, notably the gensym and qualify subroutines.
The gensym subroutine generates and returns a reference to a fully anonymous typeglob, that is, a typeglob that does not have an entry anywhere in any symbol table. We can use the anonymous typeglob as we like, for example, as a filehandle (though IO::Handle does this better in these more enlightened days, and, as a point of fact, uses gensym underneath). It takes no arguments and just returns the reference:
use Symbol;
my $globref = gensym;
open ($globref, $filename);
...
More useful is the qualify subroutine, which provides a quick and convenient way to generate fully qualified names (and therefore symbolic references) for variables from unqualified ones. It operates on strings only, and with one argument it generates a name in the current package. For example:
#!/usr/bin/perl
# symbol1.pl
use warnings;
use Symbol;
my $fqname = qualify('scalar');
$$fqname = "Hello World\n";
print $scalar; # produces 'Hello World'
Since this is a simple script without a package declaration, the variable created here is actually called $main::scalar. If we supply a package name as a second argument to qualify, it places the variable into that package instead.
#!/usr/bin/perl
# symbol2.pl
use warnings;
use Symbol;
my $fqname = qualify('scalar','My::Module');
$$fqname = "Hello World\n";
print $My::Module::scalar;
In both cases, qualify will only modify the name of the variable passed to it if it is not already qualified. It will correctly qualify special variables and the standard filehandles like STDIN into the main package, since these variables always exist in main, wherever they are used. This makes it safer and simpler than trying to ensure by hand that our symbolic references are correct when we are assembling them from strings.
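These rules are easy to check directly. The short sketch below passes four names through qualify and prints the results; the package names My::Module and Other::Pkg are placeholders only and do not need to exist.

```perl
use strict;
use warnings;
use Symbol qw(qualify);

my @names = (
    qualify('scalar'),                              # current package (main here)
    qualify('scalar', 'My::Module'),                # explicit package
    qualify('Other::Pkg::scalar', 'My::Module'),    # already qualified: unchanged
    qualify('STDIN', 'My::Module'),                 # standard filehandle: forced into main
);

print "$_\n" for @names;
```

Since qualify only manipulates strings, none of these variables is created by the call itself.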
Unfortunately, qualify is not very useful if we have strict references enabled via use strict, since the names it returns can only be used as symbolic references. Instead, we can use qualify_to_ref, which takes a symbolic name and turns it into a typeglob reference for us, using the same rules as qualify to determine the package name:
#!/usr/bin/perl
# symbol3.pl
use warnings;
use strict;
use Symbol;
my $fqref = qualify_to_ref('scalar','My::Module');
${*$fqref} = "Hello World\n"; # assign to the scalar slot of the typeglob
print $My::Module::scalar;
All three of these examples work but produce a warning from Perl that the variable main::scalar (or My::Module::scalar) is only used once, which is true. Perl doesn't see that we defined the variable name through a reference, so it (correctly) points out that we appear to have used a variable we haven't defined. The correct thing to do would be to declare the variable so we can use it without complaint, as this modified example, complete with embedded package, illustrates:
#!/usr/bin/perl
# symbol4.pl
use warnings;
use strict;
use Symbol;
my $fqref = qualify_to_ref('scalar','My::Module');
${*$fqref} = "Hello World\n"; # assign to the scalar slot of the typeglob
print My::Module::get_scalar();
package My::Module;
our $scalar; # provide access to scalar defined above
sub get_scalar {
    return $scalar;
}
While it is rare that we would want to remove a package during the course of a program's execution, it can be done by removing all traces of the package's namespace from the symbol table hierarchy. One reason to do this might be to free up the memory used by the package variables and subroutines of a module no longer required by an application. For example, to delete the My::Module package, we could write
my $table = *{'My::Module::'}{'HASH'};
undef %$table;
my $parent = *{'My::'}{'HASH'};
my $success = delete $parent->{'Module::'};
This is more than a little hairy, but it basically boils down to deleting the entries of the symbol table for My::Module and removing the Module namespace entry from the My namespace. We empty the hash explicitly because we store the result of the delete in a variable, and that variable would otherwise keep the whole symbol table alive: Perl cannot reuse the memory allocated by it, or the references contained in it, while something still holds a reference to it. Emptying the table first means that delete returns an empty hash on success, which is still good for a Boolean test but avoids trailing a complete and unrecycled symbol table along with it.
Fortunately, the Symbol module provides a delete_package function that does much the same thing but hides the gory details. It also allows us more freedom as to how we specify the package name (we don't need the trailing colons, for instance, and it works on any package). To use it, we need to import it specifically, since it is not imported by default:
use Symbol qw(delete_package);
...
print "Deleted!\n" if delete_package('My::Module');
The return value from delete_package is undefined if the delete failed, or a reference to the (now empty) namespace if it succeeded.
If we wanted to create a package that we could remove programmatically, we could do so by combining delete_package with an unimport subroutine; see "Importing and Exporting" later in the chapter for an example.
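As a minimal sketch of delete_package at work, the following script defines a throwaway package inline and checks for its entry in the parent symbol table before and after the call. The package name My::Scratch is hypothetical.

```perl
use strict;
use warnings;
use Symbol qw(delete_package);

# My::Scratch is a hypothetical throwaway package
package My::Scratch;
our $counter = 42;
sub hello { return "hello" }

package main;

# a package's symbol table appears as the entry 'Scratch::' in its parent's table, %My::
my $before = exists $My::{'Scratch::'} ? 1 : 0;
delete_package('My::Scratch');
my $after  = exists $My::{'Scratch::'} ? 1 : 0;

print "before: $before, after: $after\n";
```

The sketch deliberately avoids calling anything in My::Scratch after the deletion, since code compiled before the deletion may still hold references to the old symbols.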
Perl defines four different kinds of special blocks that are executed at different points during the compile or run phases. The most useful of these is BEGIN, which allows us to compile and execute code placed in a file before the main compilation phase is entered. At the other end of the application's life, the END block is called just as the program exits. We can also define CHECK and INIT blocks, which are invoked at the end of the compilation phase and just prior to the execution phase respectively, though these are considerably rarer.
All four blocks look and behave like subroutines, only without the leading sub. Like signal handlers, they are never called directly by code but automatically by the interpreter when it passes from one phase of existence to another. The distinction between the block types is simply that each is executed at a different phase transition. The precise order is
BEGIN
(compile phase)
CHECK
INIT
(run phase)
END
Before we examine each block type in more detail, here is a short program that demonstrates all four blocks in use and also shows how they relate to the main code and a __DIE__ signal handler:
#!/usr/bin/perl
# blocks.pl
use warnings;
use strict;
$SIG{__DIE__} = sub {
    print "Et tu Brute?\n";
};
print "It's alive!\n";
die "Sudden death!\n";
BEGIN {
    print "BEGIN\n";
}
END {
    print "END\n";
}
INIT {
    print "INIT\n"
}
CHECK {
    print "CHECK\n"
}
When run, this program prints out
BEGIN
CHECK
INIT
It's alive!
Et tu Brute?
Sudden death!
END
Note that in Perl versions before 5.6, CHECK blocks are ignored entirely, so we would not see the CHECK line. Apart from this, the program would run perfectly. Of course, if the CHECK block needs to perform vital functions, we may have a problem; therefore CHECK blocks are best used for checks that are better made after compilation but which can also be made, less efficiently perhaps, at run time too.
We can define multiple instances of each block; each one is executed in order, with BEGIN and INIT blocks executing in the order in which they are defined (top to bottom) and CHECK and END blocks executed in reverse order of definition (bottom to top). The logic for END and CHECK blocks executing in reverse is clearer once their purpose is understood. For example, BEGIN blocks allow modules to initialize themselves and may be potentially dependent upon the initialization of prior modules. Corresponding END blocks are executed in the reverse order to allow dependent modules to free their resources before earlier modules free the resources on which they rely: last in, first out.
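The ordering rules can be observed with a short script that defines two blocks of each kind. This is only a sketch: it assumes the code is run as a main program (CHECK and INIT blocks are not run for code compiled later, for example inside a string eval), and the @log array is simply a name chosen for the example.

```perl
use strict;
use warnings;

our @log;    # filled in as each block runs

BEGIN { push @log, 'BEGIN 1' }
BEGIN { push @log, 'BEGIN 2' }
CHECK { push @log, 'CHECK 1' }
CHECK { push @log, 'CHECK 2' }
INIT  { push @log, 'INIT 1' }
INIT  { push @log, 'INIT 2' }
END   { print "END 1\n" }
END   { print "END 2\n" }

# by the time the main code runs, BEGIN, CHECK, and INIT blocks have all executed
my $order = join(', ', @log);
print "$order\n";
```

The BEGIN and INIT entries appear in definition order and the CHECK entries in reverse; at exit, END 2 is printed before END 1.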
As an example, consider a network connection to a remote application: we might open a connection in one BEGIN block and start a new session in another, possibly in a different module. When the application ends, we need to stop the session and then close the connection, in the reverse order. The order in which the modules are loaded means the END blocks will execute in the correct order automatically. The new CHECK block has a similar symmetry with BEGIN, but around the compilation phase only, not the whole lifetime of the application. Likewise, INIT pairs with END across the run-time phase.
Additional blocks read in by do or require are simply added to the respective list at the time they are defined. Then, if we have a BEGIN and END block and we require a module that also has a BEGIN and END block, our BEGIN block is executed first, followed by the module's BEGIN block. At the end of the script, the module's END block is called first, then ours. However, if we include a module with use rather than require, the order of BEGIN blocks is determined by the order of the use relative to our BEGIN block and any other use statements. This is because use creates a BEGIN block of its own, as we have already seen.
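To make the implicit BEGIN block visible, this sketch spells out by hand what a use statement with an import list does, using the standard List::Util module. The @order array is a name chosen for the example.

```perl
use strict;
use warnings;

our @order;

BEGIN { push @order, 'our BEGIN block' }

# written out by hand, "use List::Util qw(sum);" amounts to this
BEGIN {
    require List::Util;
    List::Util->import('sum');
    push @order, 'List::Util loaded, sum imported';
}

push @order, 'run time';

my $total = sum(1, 2, 3, 4);

print "$_\n" for @order;
print "total: $total\n";
```

Both BEGIN blocks run in the order they appear in the source, and both run before any of the ordinary statements.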
Blocks nest too: a BEGIN inside a BEGIN will execute during the compilation phase of the outer block. A chain of use statements, one module including the next at compile time, does this implicitly, and similarly chains the END blocks (if any).
If we need to perform initialization within a module before it is used, we can place code inside the source file to perform whatever tasks we need to do, for example, loading a configuration file:
package My::Module;
return initialize();
sub initialize {
...
}
... other sub and var declarations ...
This module doesn't need a 1 at the end because its success or failure is returned explicitly. However, the initialization only takes place once the module starts to execute; we can't predefine anything before defining critical subroutines. A BEGIN block solves this problem. It forces execution of a module's initialization code before the rest of it compiles.
As an example, here is a module that computes a list of variables to export at compile time and exports them before the code that uses the module compiles. For simplicity, we have used a local hash to store the variable definitions and kept it to scalars, but it is easily extensible:
# My/SymbolExporter.pm
package My::SymbolExporter;
use strict;
BEGIN {
    use vars '@SYMBOLS';
    # temporary local configuration - we could read from a file too
    my %conf = (
        e => 'mc2',
        time => 'money',
        party => 'a good time',
    );
    sub initialize {
        no strict 'refs';
        foreach (keys %conf) {
            # define variable by assigning a scalar reference to the typeglob
            *{__PACKAGE__.'::'.$_} = \$conf{$_};
            # add variable (with leading '$') to export list
            push @SYMBOLS, "\$$_";
        }
        return 1;
    }
    return undef unless initialize;
}
use Exporter;
our @ISA = qw(Exporter);
our @EXPORT = ('@SYMBOLS',@SYMBOLS);
Ordinarily, we'd use the Exporter module or an import method to deal with this sort of problem, but these are really just extensions to the basic BEGIN block. Just to prove it works, here is a script that uses this module and prints out the variables it defines:
#!/usr/bin/perl
# symbolexportertest.pl
use warnings;
use strict;
use My::SymbolExporter;
print "Defined: @SYMBOLS\n";
print "e = $e\n";
print "time = $time\n";
print "party = '$party'\n";
Another use of BEGIN blocks is to preconfigure a module before we use it. For example, the AnyDBM_File module allows us to reconfigure its @ISA array by writing something like the following:
BEGIN {
@AnyDBM_File::ISA = qw(GDBM_File SDBM_File);
}
use AnyDBM_File;
Inside the module, the code simply checks to see if the variable is defined before supplying a default definition:
our @ISA = qw(NDBM_File DB_File GDBM_File SDBM_File ODBM_File) unless @ISA;
It is vital that we put our definition in a BEGIN block so that it is executed and takes effect before the use statement is processed. Without this, the implicit BEGIN block of the use statement would cause the module to be loaded before our definition is established despite the fact it appears first in the source.
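The timing problem can be reproduced without any real module. In the sketch below, each BEGIN block that reads a variable stands in for the implicit BEGIN block of a use statement; the package variables in My::Config are hypothetical names chosen for the example.

```perl
use strict;
use warnings;

our @seen;

# set at compile time, so it is visible to the block that follows
BEGIN { $My::Config::backend = 'custom' }
BEGIN { push @seen, $My::Config::backend // 'default' }     # stands in for "use Module"

# an ordinary assignment is not executed until run time...
$My::Config::fallback = 'custom';
# ...so this block, which runs as soon as it is compiled, does not see it
BEGIN { push @seen, $My::Config::fallback // 'default' }

my $result = "@seen";
print "$result\n";
```

The first stand-in sees the customized value, while the second falls back to its default even though the assignment appears above it in the source.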
The opposite of BEGIN blocks are END blocks. These are called just as Perl is about to exit (even after a __DIE__ handler) and allow a module to perform closing duties like cleaning up temporary files or shutting down network connections cleanly:
END {
unlink $tempfile;
shutdown $socket, 2;
}
The value that the program is going to exit with is already set in the special variable $? when the END blocks are processed, so we can modify $? to change it if we choose. However, END blocks are not run if the program is terminated by a signal that it does not catch and (obviously) not if we use exec to replace the application with a new one.
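Since the effect of changing $? is only visible from outside the program, this sketch runs a one-line child program with the same perl binary (available as $^X) and inspects its exit status. The status value 3 is arbitrary.

```perl
use strict;
use warnings;

# The child asks to exit with status 0, but its END block overrides that
system($^X, '-e', 'END { $? = 3 } exit 0;');

my $status = $? >> 8;    # the child's exit status is in the high byte of $?
print "child exit status: $status\n";
```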
The CHECK and INIT blocks are considerably rarer than BEGIN and END, but they are still occasionally useful.
CHECK blocks execute in reverse order just after the compilation phase ends and correspond to the END blocks, which run at the end of the run phase. Their purpose is to perform any kind of checking that might be required of the compiled source before proceeding with the run phase. (However, they are not available in Perl prior to version 5.6.)
# Perl >= 5.6.0 for CHECK blocks
use 5.6.0;
# check that conditional compilation found at least one implementation
CHECK {
    die "No platform recognized" unless
        defined &Unixsub or
        defined &win32sub or
        defined &macsub or
        defined &os2sub;
}
This block will be called as soon as Perl has finished compiling all the main code (and after all BEGIN blocks have been executed), so it is the ideal point to check for the existence of required entities before progressing to the execution stage. By placing the code in a CHECK block rather than in the module's main source, we give it a chance to object before other modules, which may be used before it, get a chance to run.
The INIT blocks execute just before the run phase and just after the compile phase (after any CHECK blocks that are defined). They execute in order of definition and correspond to BEGIN blocks, which run just before the compile phase. Their purpose is to initialize variables and data structures before the main run phase starts:
# establish a package variable for all modules
INIT {
$My::Module::start_time = time;
}
Both block types have little effect over simply placing code at the top of a file when only one of either type exists. However, if several modules define their own CHECK and INIT blocks, Perl will queue them up and run through them all before commencing execution of the main application code.
Normally when we try to call a nonexistent subroutine, Perl generates an error, if possible at compile time. However, by defining a special subroutine called AUTOLOAD, we can intercept nonexistent calls and deal with them in our own way at run time.
Autoloading is a powerful aspect of Perl. When used wisely, it provides us with some very handy techniques, such as the ability to write one subroutine that handles many different cases and masquerade it as many subroutines each handling a single case. This is a great technique for allowing a module to be powerful and flexible without the expense of creating many possibly redundant routines with a corresponding cost in memory. We can also, with deft usage of the eval and sub keywords, generate new subroutines on demand.
The cost of autoloading is twofold, however: first, calling a subroutine not yet compiled will incur a speed penalty at that point, since Perl must call the AUTOLOAD subroutine to resolve the call. Second, it sidesteps the normal compile-time checks for subroutine existence, since there is no way for Perl to know if the subroutine name is valid or not until an attempt is made to call it during execution.
Several modules in the standard library take advantage of autoloading to delay the compilation of subroutines until the moment they are required. The autouse module introduced in the last chapter even provides a simple generic interface that delays loading an entire module until one of its subroutines is called. However, there is no granularity: when the module is loaded, it is all loaded at once. The AutoSplit and AutoLoader modules solve this problem. AutoSplit carves up a module file into separate subroutines, which the AutoLoader module can subsequently read and compile at the moment each routine is required. These modules are typically used during the distribution and installation of modules, since the extraction of subroutines from the original source by AutoSplit is a manual process. The SelfLoader module provides a simpler but less efficient solution. It allows us to store code as text inside the module file, compiling it at the time it is needed. While not as efficient as AutoLoader, which does not even load the subroutine code if it doesn't need it, it does not need any additional processing steps to work.
Autoloading is automatically enabled in any package in which we define a subroutine called AUTOLOAD. This subroutine will automatically intercept all attempts to call nonexistent subroutines and will receive the arguments for each nonexistent subroutine. At the same time, the name of the missing subroutine is placed in the special package variable $AUTOLOAD. To illustrate, here is a short example that intercepts nonexistent subroutine calls and prints out the name and arguments passed:
#!/usr/bin/perl
# autoload.pl
use warnings;
use strict;
sub AUTOLOAD {
    our $AUTOLOAD; # "use vars '$AUTOLOAD'" for Perl < 5.6
    $" = ',';
    print "You called '$AUTOLOAD(@_)'\n";
}
fee('fie','foe','fum');
testing(1,2,3);
When run, this script should produce
You called 'main::fee(fie,foe,fum)'
You called 'main::testing(1,2,3)'
We use our to declare interest in the package's $AUTOLOAD variable (Perl prior to version 5.6 needs to use use vars instead). Since only the AUTOLOAD subroutine needs to know the value of $AUTOLOAD, we place the our declaration inside the subroutine to define a temporary alias.
In general, creating an autoloader stymies compile-time checkers. But interestingly, defining a prototype for the autoloader is perfectly valid and can help eliminate subroutine calls that are simply a result of mistyping a call to a real subroutine. If all the subroutine calls we want to intercept have the same prototype, then calls whose parameters do not match the prototype will still fail at compile time, since Perl knows that the AUTOLOAD subroutine is not interested in handling them. In the preceding example, both example calls use three scalar arguments, so a prototype of ($$$) would be appropriate. Of course, a mistyped subroutine call can still match the prototype, so this does not completely save us from mistakes.
We can use AUTOLOAD subroutines in a variety of ways that break down into one of two general approaches: use the AUTOLOAD subroutine as a substitute for a collection of subroutines, or use the AUTOLOAD subroutine to define missing subroutines on the fly.
Using an AUTOLOAD Subroutine As a Substitute
The first and simplest use of the autoloader is simply to stand in for another subroutine or collection of similar subroutines. We can define the interface to a module in terms of these other calls but actually implement them in the AUTOLOAD subroutine. The disadvantage of this is that it takes Perl slightly longer to carry out the redirection to the autoloader subroutine (although conversely the compile time is faster). The advantage is that we can replace potentially hundreds of subroutine definitions with just one. This has benefits in maintainability as well as startup time.
Here is a simple example that illustrates the general technique with a few simple statistical calculations that sum, average, and find the biggest and smallest of a list of supplied numeric values:
#!/usr/bin/perl
# autostat.pl
use warnings;
use strict;
use Carp;
sub AUTOLOAD {
    our $AUTOLOAD;
    my $result;
    SWITCH: foreach ($AUTOLOAD) {
        /sum/ and do {
            $result = 0;
            map { $result+= $_ } @_;
            last;
        };
        /average/ and do {
            $result = 0;
            map { $result+= $_ } @_;
            $result/=scalar(@_);
            last;
        };
        /biggest/ and do {
            $result = shift;
            map { $result = ($_ > $result)?$_:$result } @_;
            last;
        };
        /smallest/ and do {
            $result = shift;
            map { $result = ($_ < $result)?$_:$result } @_;
            last;
        }
    }
    croak "Undefined subroutine $AUTOLOAD called" unless defined $result;
    return $result;
}
my @values = (1,4,9,16,25,36);
print "Sum: ",sum(@values),"\n";
print "Average: ",average(@values),"\n";
print "Biggest: ",biggest(@values),"\n";
print "Smallest: ",smallest(@values),"\n";
print "Oddest: ",oddest(@values),"\n";
This AUTOLOAD subroutine supports four different statistical operations and masquerades under four different names. If we call any of these names, then the autoloader performs the requested calculation and returns the result. If we call any other name, it croaks and exits. We use croak from the Carp module, because we want to return an error for the place from which the AUTOLOAD subroutine was called, as that is where the error really is.
This script also illustrates the problem with autoloading: errors in subroutine names are not caught until the call is actually made at run time. Without an autoloader, Perl itself would reject the call to oddest as an undefined subroutine. With this script, nothing is reported until the autoloader is called and discovers that it isn't a name that it recognizes.
The preceding example demonstrates the general principle of substituting for a collection of other subroutines, but it doesn't really provide any benefit; it would be as easy to define the subroutines individually (or indeed just get them from List::Util, but that's beside the point), as the implementations are separate within the subroutine. However, we can be more creative with how we name subroutines. For example, we can use an autoloader to recognize and support the prefix print_ for each operation. Here is a modified version of the previous example that handles both the original four operations and four new variants that print out the result as well:
#!/usr/bin/perl
# printstat.pl
use warnings;
use strict;
use Carp;
sub AUTOLOAD {
    our $AUTOLOAD;
    my $subname; # get the subroutine name
    $AUTOLOAD =~ /([^:]+)$/ and $subname = $1;
    my $print; # detect the 'print_' prefix
    $subname =~ s/^print_// and $print = 1;
    my $result;
    SWITCH: foreach ($subname) {
        /^sum$/ and do {
            $result = 0;
            map { $result+= $_ } @_;
            last;
        };
        /^average$/ and do {
            $result = 0;
            map { $result+= $_ } @_;
            $result/= scalar(@_);
            last;
        };
        /^biggest$/ and do {
            $result = shift;
            map { $result = ($_>$result)?$_:$result } @_;
            last;
        };
        /^smallest$/ and do {
            $result = shift;
            map { $result = ($_<$result)?$_:$result } @_;
            last;
        };
    }
    croak "Undefined subroutine $subname called" unless defined $result;
    print ucfirst($subname),": $result\n" if $print;
    return $result;
}
my @values = (1,4,9,16,25,36);
print_sum(@values);
print_average(@values);
print_biggest(@values);
print_smallest(@values);
The subroutine name actually passed in the $AUTOLOAD variable contains the package prefix, main::, as well. In the previous example, we did not check from the start of the name, so this did not matter. Here we do care though, so we strip all possible package prefixes by extracting from the end of the name as much text as we can, not including a colon. This gives us the unqualified subroutine name.
Now we can detect and remove the print_
prefix. We take advantage of the fact that we are left with just the subroutine name to anchor the regular expressions at the start and end for a little extra efficiency—the first example worked only because we did not use anchors and none of our subroutine names contained another. If we wanted to be even more inventive, we could remove the trailing $
anchors and use a trailing suffix in the subroutine name to further adapt each function.
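The name handling described above can be isolated into a small helper, sketched here. The function name split_autoload_name is hypothetical and is not part of printstat.pl; it only illustrates stripping the package prefix and detecting the print_ prefix.

```perl
use strict;
use warnings;

# Hypothetical helper: split a fully qualified name, as found in $AUTOLOAD,
# into the bare subroutine name and a flag for the 'print_' prefix.
sub split_autoload_name {
    my ($qualified) = @_;
    # keep the longest run of non-colon characters at the end of the name
    my ($subname) = $qualified =~ /([^:]+)$/;
    # s/// is true only when the prefix was present and has been removed
    my $print = ($subname =~ s/^print_//) ? 1 : 0;
    return ($subname, $print);
}

my ($name,  $print)  = split_autoload_name('main::print_sum');
my ($name2, $print2) = split_autoload_name('My::Stats::average');
print "$name $print\n";
print "$name2 $print2\n";
```

Because the match is anchored at the end of the string, the same expression works however many package levels precede the subroutine name.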
Defining Subroutines on the Fly
The run-time performance penalty of using the autoloader can be mitigated by having the autoloader define a new subroutine to perform the requested task, instead of handling the job itself. Any subsequent calls will now pass directly to the new subroutine and not the autoloader.
As an example, here is a simple autoloader that defines subroutines to return HTML syntax, much in the way that the CGI module can. It isn't nearly as feature-rich as that module, but it is a lot smaller too:
#!/usr/bin/perl
# autofly.pl
use warnings;
use strict;
sub AUTOLOAD {
    our $AUTOLOAD;
    my $tag;
    $AUTOLOAD =~ /([^:]+)$/ and $tag = $1;
    SWITCH: foreach ($tag) {
        /^start_(.*)/ and do {
            eval "sub $tag { return \"<$1>\@_\" }";
            last;
        };
        /^end_(.*)/ and do {
            eval "sub $tag { return \"</$1>\" }";
            last;
        };
        # note the escaping with \ of @_ below so it is not
        # expanded before the subroutine is defined
        eval "sub $tag { return \"<$tag>\@_</$tag>\" }";
    }
    no strict 'refs';
    &$tag; # pass @_ directly for efficiency
}
# generate a quick HTML document
print html(
head(title('Autoloading Demo')),
body(ul(
start_li('First'),
start_li('Second'),
start_li('Third'),
))
);
This autoloader supports automatic tag completion, as well as generating the start and end of tags if start_ or end_ is prefixed to the subroutine name. It works by defining a subroutine to generate the new tag, then calling it. The first time start_li is called, the autoloader generates a new subroutine called start_li, then calls it. The second time start_li is called, the subroutine already exists, so Perl calls it directly, and the autoloader is not involved.
A little deftness with interpolation is required for the subroutines to be defined correctly. We want the tag name itself interpolated, both as the subroutine name and inside the returned string, but we want interpolation of the passed arguments delayed until the subroutine is actually called. To achieve that, we put double quotes around the returned string but escape both them and @_ with backslashes so that they are not interpreted when the subroutine is defined; instead they only become active when it is actually called.
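To see the two stages of interpolation separately, the following sketch builds the source text first and only then evaluates it. The variable names $source and $html are illustrative choices, not part of autofly.pl.

```perl
use strict;
use warnings;

my $tag = 'li';

# Stage 1: $tag is interpolated now; the escaped quotes and \@_ survive as
# literal text, so $source holds: sub li { return "<li>@_</li>" }
my $source = "sub $tag { return \"<$tag>\@_</$tag>\" }";
print "$source\n";

# Stage 2: compiling the text defines the subroutine; @_ is interpolated
# each time the new subroutine is called.
eval $source;
die $@ if $@;

my $html = li('First', 'Second');
print "$html\n";
```

Printing the generated source before evaluating it is a convenient way to check that the escaping has produced the definition we intended.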
A variation on the theme of delaying the definition of subroutines and methods until they are first called is to retrieve their definition from somewhere else and compile it at that point. For instance, we may have a large and complex module with many features, of which we may only actually use some. In order to avoid compiling all the subroutines redundantly, we can put off compiling them until they are called. If they are never called, we need never define them.
The essence of this approach is to define a subroutine initially as a stub only, so that the subroutine is defined in the symbol table but does not as yet implement the feature it is intended to provide. The stub does not contain much code, so it is quick to compile and does not occupy much memory. When the stub is actually called, it compiles and replaces itself with the real subroutine. Here is a short program that shows one way to do this:
#!/usr/bin/perl
# autodefine.pl
use warnings;
use strict;
sub my_subroutine {
    print "Defining sub...\n";
    # uncomment next line and remove 'no warnings' for Perl < 5.6
    # local $^W = 0;
    eval 'no warnings; sub my_subroutine { print "Autodefined!\n"; }';
    &my_subroutine;
}
my_subroutine; # calls the stub, which redefines itself
my_subroutine; # calls the newly defined subroutine
Running this program produces
Defining sub...
Autodefined!
Autodefined!
A variant of this approach would be to store all the subroutine definitions in a different file, or after a __DATA__ token, and read the subroutine code from there, which is the approach taken by SelfLoader. Alternatively, we can create a typeglob alias to an evaluated anonymous subroutine, with equal effect to the preceding example:
#!/usr/bin/perl
# globdefine.pl
use warnings;
use strict;
sub my_subroutine {
    print "Defining sub...\n";
    no warnings;
    # remove above and add the following for Perl < 5.6
    # local $^W = 0;
    *my_subroutine = eval {
        sub {
            print "Autodefined!\n";
        }
    };
    &my_subroutine;
}
my_subroutine;
my_subroutine;
In both cases we suppress the redefinition warning by switching off warnings locally with no warnings, or by locally clearing $^W. In this case, we know we want to redefine the subroutine, so we don't need Perl telling us about it.
The drawback of this approach compared to defining an AUTOLOAD subroutine is that we need to define a stub for each subroutine we want to delay compilation for. The advantage is that because a stub is present we don't lose the ability to syntax check subroutine names at compile time. This is particularly useful if we are also providing prototypes for our subroutines, since they clearly cannot be checked at compile time if they are only created at run time (unless they all have the same prototypes and we prototype the autoloader itself, as noted earlier). The contents of the subroutines are only checked at run time, however, which is an unavoidable compromise if we wish to avoid parsing them until they are used.
The Perl standard library provides three modules that implement the strategy of delayed loading of subroutines in two different ways. We already looked at the autouse module in the last chapter, as it is a mechanism for the calling rather than the called module. Of the remaining two, the more complex is AutoLoader, whose AUTOLOAD subroutine loads additional files containing subroutine definitions as required. For this to work, the files must previously have been generated by the AutoSplit module. This implies that the module is split into separate pieces prior to being used; that is, an installation process is required.
The SelfLoader module operates along broadly similar lines, but it keeps all the subroutines to be loaded later inside the source file. The advantage is that we do not need to remember to use AutoSplit. The disadvantage is that we must load all the source code into memory in an uncompiled form so that it can be compiled on demand.
Using the AutoLoader
In order to use the AutoLoader module, we need to adapt our modules to its requirements. The first and most important step is to place the subroutines we want to delay loading after an __END__ token. Anything before it is compiled at compile time; anything after it is compiled at run time on demand. This may require a little reorganization of the source, of course.
Once this is done, we add a use statement to include the AutoLoader module and import its AUTOLOAD subroutine, which does the work of retrieving the subroutines once they are split out. Note that importing the subroutine is important; AutoLoader will not work without it:
use AutoLoader qw(AUTOLOAD);
(Why does AutoLoader not automatically export AUTOLOAD for us? Because we could implement our own AUTOLOAD routine to handle special cases and invoke AutoLoader's from it. This lets us control the autoloading process if we need to.)
The __END__ token causes the Perl interpreter to stop reading the file at this point, so it never sees the subroutines placed after it. To make them available again, we use the AutoSplit module to carve out the subroutines after the __END__ token into separate files placed in an auto directory relative to the module file. This often takes place in installation scripts and typically takes the form of a one-line Perl program. For example, to autosplit a module from the directory in which it is placed, use the following:
> perl -MAutoSplit -e 'autosplit qw(My/AutoModule.pm ./auto)'
This takes a module called My::AutoModule, contained in a file called AutoModule.pm, in a directory called My in the current directory, and splits it into parts inside an auto directory (which is created at the time if it doesn't already exist). Inside it we will now find the directories My/AutoModule. Within these directories we in turn find an index file called autosplit.ix that describes the split-out subroutines. Along with it we find one file for each subroutine split out of the module, named for the subroutine with the suffix .al (for autoload).
Be aware that lexical my variables at file scope are not visible to autoloaded subroutines. This is obvious when we realize that the scope of the file has necessarily changed because we now have multiple files. On the other hand, variables declared with our (or use vars) will be fine, since they are package-scoped.
As an example of how AutoLoader is used, take this simple module file that implements a package called My::AutoModule:
# My/AutoModule.pm
package My::AutoModule;
use strict;
use Exporter;
use AutoLoader qw(AUTOLOAD);
our @ISA = qw(Exporter);
our @EXPORT = qw(one two three);
sub one {
    print "This is always compiled\n";
}
__END__
sub two {
    print "This is sub two\n";
}
sub three {
    print "This is sub three\n";
}
1;
The file, which in this case is named AutoModule.pm and is contained in a directory called My to match the package name, has three subroutines. The first, one, is a regular subroutine and is always compiled. The others, two and three, are actually just text at the end of the file; the __END__ ensures that Perl never sees them and never even reads them in. Note that the only changes from a normal module are the use AutoLoader line and the __END__ token. The trailing 1; is not actually needed any longer, but we retain it in case we ever convert the module back into an unsplit one.
When we split the file, it creates three files, autosplit.ix, two.al, and three.al, all in the auto/My/AutoModule directory. Since we specified ./auto as the installation directory, the new auto directory sits in the current directory, alongside the My directory that holds the original AutoModule.pm file. If we had wanted to split a module that was installed into the Perl standard library tree, we would have used a different path here, according to the position of the file we want to split.
The autosplit.ix file contains the essential information about the subroutines that have been split out:
# Index created by AutoSplit for My/AutoModule.pm
# (file acts as timestamp)
package My::AutoModule;
sub two;
sub three;
1;
Close inspection of this file reveals that it is in fact a snippet of Perl code that predeclares two subroutines, the two that were split out, in the package My::AutoModule. When the module is used in an application, the line use AutoLoader causes the AutoLoader module to be read in and initialized for that module. This has the effect of loading this index file, and thus declaring the subroutines.
The point of this may seem obscure, since the AUTOLOAD subroutine will seek the split-out files regardless, but it allows us to declare prototypes for subroutines and have them checked at compile time. It also allows us to call subroutines without parentheses, in the list operator style. Here is a short script that calls the subroutines defined by this module:
#!/usr/bin/perl
# automoduletest.pl
use warnings;
use strict;
use lib '.';
use My::AutoModule;
one;
two;
three;
The .al files contain the subroutines that were split out. Due to varying locations, slightly different scripts used, and so on, we may have small variations in the actual contents of the .al files obtained, but the following sample provides a rough idea of what can be expected:
# NOTE: Derived from My/AutoModule.pm.
# Changes made here will be lost when autosplit again.
# See AutoSplit.pm.
package My::AutoModule;
#line 18 "My/AutoModule.pm (autosplit into ./auto/My/AutoModule/two.al)"
sub two {
    print "This is sub two\n";
}
# end of My::AutoModule::two
1;
The AutoSplit module is smart enough to check that the AutoLoader module is actually used by a file before it attempts to split it. We can disable this check (if we insist), as well as determine whether old subroutine .al files are removed if they no longer exist, and check to see if the module has actually changed. To do this, we add one or more of three optional Boolean arguments to the autosplit subroutine:
> perl -MAutoSplit -e 'autosplit qw(My/AutoModule.pm ./auto), [keep], [check], [changed]'
Substitute a 0 or 1 for each parameter to unset or set that argument. The arguments have the following effects:
keep: When true, preserves any .al files for subroutines that no longer exist in the module; when false, those files are deleted (files for subroutines that do still exist are overwritten anyway). The default is 0, so stale .al files are removed automatically.
check: When true, causes the autosplit subroutine to verify that the file it is about to split actually contains a use AutoLoader directive before proceeding. The default is 1.
changed: When true, suppresses the split if the timestamp of the original file is not newer than the timestamp of the autosplit.ix file in the directory into which the split files are going to be placed. The default is 1.
For example, the explicit version of the preceding two-argument call would be
> perl -MAutoSplit -e 'autosplit "My/AutoModule.pm","./auto", 0, 1, 1'
Again, the equivalent for Windows is
> perl -MAutoSplit -e "autosplit \"My/AutoModule.pm\",\"./auto\", 0, 1, 1"
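The same call can also be made from a script rather than a one-liner. The following is a sketch only: it writes a cut-down copy of the example module into a temporary directory (the temporary layout and the variable names are assumptions made for illustration) and then splits it with all five arguments given explicitly.

```perl
use strict;
use warnings;
use AutoSplit;
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

# Build a throwaway copy of the example module in a temporary directory.
my $root = tempdir(CLEANUP => 1);
my $auto = File::Spec->catdir($root, 'auto');
make_path(File::Spec->catdir($root, 'My'), $auto);
my $module = File::Spec->catfile($root, 'My', 'AutoModule.pm');

open my $out, '>', $module or die "Cannot write $module: $!";
print {$out} <<'END_OF_MODULE';
package My::AutoModule;
use strict;
use AutoLoader qw(AUTOLOAD);
sub one { print "This is always compiled\n" }
1;
__END__
sub two {
    print "This is sub two\n";
}
sub three {
    print "This is sub three\n";
}
END_OF_MODULE
close $out or die "Cannot close $module: $!";

# Arguments: file, target directory, keep, check, changed
autosplit($module, $auto, 0, 1, 1);

# Report which of the expected files now exist.
my $split_dir = File::Spec->catdir($auto, 'My', 'AutoModule');
my @created = grep { -e File::Spec->catfile($split_dir, $_) }
              qw(autosplit.ix two.al three.al);
print "created: @created\n";
```

Running the split from a script like this is convenient in an installation step, where the file and directory names are usually computed rather than typed.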
We are not obliged to use the AutoLoader module's AUTOLOAD subroutine directly, but we need to use it if we want to load in split files. If we already have an AUTOLOAD subroutine and want to also use AutoLoader, we must not import the AUTOLOAD subroutine from AutoLoader but instead call it from our own AUTOLOAD subroutine:
use AutoLoader;
sub AUTOLOAD {
    our $AUTOLOAD;
    # ... handle our own special cases ...
    # pass up to AutoLoader
    $AutoLoader::AUTOLOAD = $AUTOLOAD;
    goto &AutoLoader::AUTOLOAD;
}
Note the goto: this is needed so that the call stack reflects the correct package names in the right place, or more specifically, doesn't include our own AUTOLOAD subroutine in the stack, which would otherwise confuse the AutoLoader module's AUTOLOAD subroutine. Of course, if we have our own AUTOLOAD subroutine, we might not need the module at all; multiple autoloading strategies in the same module or application are probably getting a little overcomplex.
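The delegation pattern can be tried without any split files by substituting a stand-in for AutoLoader. In this sketch the package My::Fallback and the subroutine names special and other are hypothetical; only the assignment to the other package's $AUTOLOAD followed by goto mirrors the technique described above.

```perl
use strict;
use warnings;

{
    # Hypothetical stand-in for AutoLoader, so the sketch needs no .al files.
    package My::Fallback;
    our $AUTOLOAD;
    sub AUTOLOAD {
        (my $name = $AUTOLOAD) =~ s/.*:://;
        return if $name eq 'DESTROY';
        return "fallback handled $name(@_)";
    }
}

our $AUTOLOAD;
sub AUTOLOAD {
    (my $name = $AUTOLOAD) =~ s/.*:://;
    return if $name eq 'DESTROY';
    # handle our own special case first
    return "special case handled locally" if $name eq 'special';
    # pass everything else up; goto keeps @_ and replaces this stack frame
    $My::Fallback::AUTOLOAD = $AUTOLOAD;
    goto &My::Fallback::AUTOLOAD;
}

my $local  = special();
my $passed = other(1, 2);
print "$local\n$passed\n";
```

The fallback sees the original arguments and the original qualified name, exactly as if it had been called directly.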
Using the SelfLoader
The SelfLoader module is very similar in use to the AutoLoader module, but it avoids the need to split the module into files as a separate step. To use it, we use the SelfLoader module and place the subroutines we want to delay the loading of after a __DATA__ token. Here is a module called My::SelfModule that is modified from the My::AutoModule module given earlier to use SelfLoader instead:
# My/SelfModule.pm
package My::SelfModule;
use strict;
use Exporter;
use SelfLoader;
our @ISA = qw(Exporter);
our @EXPORT = qw(one two three);
sub one {
    print "This is always compiled\n";
}
__DATA__
sub two {
    print "This is sub two\n";
}
sub three {
    print "This is sub three\n";
}
1;
This module is identical to the AutoLoader version except for two alterations. We replace use AutoLoader qw(AUTOLOAD) with use SelfLoader and __END__ with __DATA__. If we also want to place actual data in the module file, we can do so as long as it is read before loading the SelfLoader module, that is, in a BEGIN block prior to the use SelfLoader statement.
Unlike AutoLoader, the SelfLoader module exports its AUTOLOAD subroutine by default, so if we want to define our own and call SelfLoader from it, we need to specify an explicit empty list:
use SelfLoader ();
sub AUTOLOAD {
    our $AUTOLOAD;
    # ... handle cases to be processed here
    # pass up to SelfLoader
    $SelfLoader::AUTOLOAD = $AUTOLOAD;
    goto &SelfLoader::AUTOLOAD;
}
To test this module, we can use a script similar to the one used for My::AutoModule, except that My::SelfModule must be used instead. We also need to add parentheses to the subroutine calls because SelfLoader does not provide declarations (as we discover if we try to run it). To solve this problem, we can make use of the Devel::SelfStubber module to generate the declaration stubs we need to add:
> perl -MDevel::SelfStubber -e 'Devel::SelfStubber->stub("My::SelfModule",".")'
And for Windows:
> perl -MDevel::SelfStubber -e "Devel::SelfStubber->stub(\"My::SelfModule\",\".\")"
This generates the following declarations for our example module, which we can add to the module to solve the problem:
sub My::SelfModule::two ;
sub My::SelfModule::three ;
We can also regenerate the entire module, stubs included, if we first set the variable $Devel::SelfStubber::JUST_STUBS = 0. This gets a little unwieldy for a command line, but it is possible. Take as an example the following command (which should all be typed on one line):
> perl -MDevel::SelfStubber -e '$Devel::SelfStubber::JUST_STUBS = 0; Devel::SelfStubber->stub("My::SelfModule",".")' > My/SelfModule-stubbed.pm
For Windows, because of the different quoting conventions, this becomes
> perl -MDevel::SelfStubber -e "$Devel::SelfStubber::JUST_STUBS = 0; Devel::SelfStubber->stub(\"My::SelfModule\",\".\")" > My/SelfModule-stubbed.pm
This generates a new module, SelfModule-stubbed.pm, which we have named differently just for safety; it is still My::SelfModule inside. If all looks well, we can move or copy SelfModule-stubbed.pm over SelfModule.pm. Note that running this command more than once can generate extra sets of stubs, which may cause problems or at least confusion, and we may even end up with an empty file if we forget to put the __DATA__ token in. For this reason, it is not advisable to attempt to replace a file with a stubbed version in one step.
In the previous chapter, we looked at how to import symbols from one package into our own using the use and import statements. Now we will see the other side of the fence: the perspective of the module.
The term "importing" means taking symbols from another package and adding them to our own. From the perspective of the module being imported from, it is "exporting," of course. Either way, the process consists of taking a symbol visible in the namespace of one package and making it usable without qualifying it with a namespace prefix in another. For instance, even if we can see it, we would rather not refer to a variable called
$My::Package::With::A::Long::Name::scalar
It would be much better if we could refer to this variable simply as $scalar in our own code. From Chapter 5, we know that we can do this explicitly using typeglobs to create aliases:
*scalar = \$My::Package::With::A::Long::Name::scalar;
Likewise, to create an alias for a subroutine:
*localsub = \&My::Package::With::A::Long::Name::packagesub;
This is a simple case of symbol table manipulation, and it isn't all that tricky once we understand it; refer to Chapter 8 for more detail if necessary. However, this is clumsy code. We have to create an alias for every variable or subroutine we want to import. It is also prone to problems in later life, since we are defining the interface—the directly visible symbols—between this package and our own code, in our own code. This is very bad design because the package is not in control of how it is used. At best it is a maintenance nightmare; at worst, if the package is updated, there is a high chance our code will simply break.
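As a minimal, self-contained sketch of this manual aliasing, the long-named package is defined inline below; its contents and the variables $seen and $result are assumptions made purely for illustration.

```perl
use strict;
use warnings;

{
    package My::Package::With::A::Long::Name;
    our $scalar = "original value";
    sub packagesub { return "packagesub called" }
}

our $scalar;    # declare the short name so 'use strict' accepts it
*scalar   = \$My::Package::With::A::Long::Name::scalar;      # alias the scalar
*localsub = \&My::Package::With::A::Long::Name::packagesub;  # alias the sub

# Both names now refer to the same variable, so a change through one
# is visible through the other.
$scalar = "changed through the alias";
my $seen   = $My::Package::With::A::Long::Name::scalar;
my $result = localsub();
print "$seen\n$result\n";
```

Every alias here is written out by hand in the using code, which is exactly the maintenance burden the import mechanism is designed to remove.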
Good programming practice dictates that packages should have a well-defined (and documented) interface and that all dependent code should use that interface to access it. The package, not the user of the package, should dictate what the interface is. Therefore, we need a way to ask the package to create appropriate aliases for us; this is the import mechanism that the use and no declarations invoke automatically. Because responsibility for imports passes to the package, it gets to decide whether or not the request is valid, and reject it if not.
The import mechanism is not all that complex, and a basic understanding of it can help with implementing more complex modules with more involved export requirements. It is also applicable to simpler import mechanisms that, rather than actually exporting symbols, allow us to configure a package using the import list as initialization data. Object-oriented modules, which rarely export symbols, commonly use the import mechanism this way. However, if our requirements are simple, we can for the most part ignore the technicalities of the import mechanism and use the Exporter module to define our interface for us. For the majority of packages, the Exporter can handle all the necessary details. If we just want to export a few subroutines, skip part of the next section of this chapter and head straight to the section titled "The Exporter Module."
Perl's mechanism for importing symbols is simple, elegant, and shockingly ad hoc, all at the same time. In a nutshell, we call a subroutine (actually an object method) called import in the package that we want to import symbols from. It decides what to do, then returns an appropriate value.
The import stage is a secondary stage beyond actually reading and compiling a module file, so it is not handled by the require directive; instead, it is a separate explicit step. Written out explicitly, we could do it like this:
require My::Module; # load in the module
My::Module->import; # call the 'import' subroutine
Since this is a call to an object method, Perl allows us to invert the package and subroutine names, so we can also say
import My::Module;
This doesn't mean we have to start programming everything as objects, however. It is just a convenient use of Perl's object-oriented syntax, just as the print statement is (to the surprise of many programmers). The syntax fools many programmers into thinking that import is actually a Perl keyword, since it looks exactly like require, but in fact it is only a subroutine. This typical import statement appears to be a core Perl feature for importing symbols, but in fact all it does is call the subroutine import in the package My::Module and pass the arguments subone, subtwo, and $scalar to it:
import My::Module qw(subone subtwo $scalar);
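That import is an ordinary method can be checked with a module whose import simply records its arguments. This is a sketch: the inline My::Module package and its @received array are invented for the demonstration.

```perl
use strict;
use warnings;

{
    package My::Module;
    our @received;
    # record whatever import is handed, rather than exporting anything
    sub import { @received = @_ }
}

# The arrow form of the same call; the package name arrives as the
# first argument, followed by the requested symbols.
My::Module->import(qw(subone subtwo $scalar));
print join(', ', @My::Module::received), "\n";
```

The leading package name is why the import subroutines later in this chapter begin by shifting their first argument.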
The import subroutine is rarely invoked directly because the use directive binds up a require and a call to import inside a BEGIN block. For example, use My::Module is (almost) equivalent to
BEGIN {
require My::Module;
import My::Module;
}
Given that use does all the work for us, are there any reasons to need to know how to do the same job explicitly? Loading modules on demand during program execution can be easily achieved by using require and importing without the BEGIN block, as in the first example. This doesn't work with use because it happens at compile time due to the implicit BEGIN, and it disregards the surrounding run-time context.
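A run-time load of this kind might be wrapped up as follows. This is a sketch under stated assumptions: the helper name load_on_demand is hypothetical, and List::Util with its sum function is used only because it is a convenient standard module.

```perl
use strict;
use warnings;

# Hypothetical helper: load a module by name at run time and import
# the requested symbols into the current package.
sub load_on_demand {
    my ($module, @symbols) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # My::Module -> My/Module.pm
    require $file;
    $module->import(@symbols) if $module->can('import');
    return 1;
}

load_on_demand('List::Util', 'sum');

# Parentheses are needed: 'sum' was not known when this line was compiled.
my $total = sum(1, 4, 9);
print "$total\n";
```

Because the import happens at run time, calls to the imported subroutines must use parentheses, since Perl had no declaration for them when the calling code was compiled.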
Note that the preceding import has no parentheses; any arguments passed to use therefore get automatically passed directly to the import subroutine without being copied, as covered in Chapter 9. If there is no import subroutine defined, however, the preceding will complain, whereas use will not. A more correct import statement would be
import My::Module if My::Module->can('import');
# 'can' is a universal method (see Chapter 18)
Similarly, the no directive calls a function called unimport. The sense of no is to be the opposite of use, but this is purely a matter of convention and implementation, since the unimport subroutine is just another subroutine. In this case though, Perl will issue an error if there is no unimport method defined by the module. The no My::Module code is (roughly, with the same proviso as earlier) equivalent to
BEGIN {
require My::Module;
unimport My::Module;
}
It may seem strange that no incorporates a require within it, but there is no actual requirement that we use a module before we no parts of it. Having said that, the module may not work correctly if the import subroutine is not called initially. If use has already pulled in the module, the require inside no will see that the module is already in %INC, and so won't load it again. This means that in most cases no is just a way of calling unimport in the module package at compile time.
In the same way that aliasing can be done with typeglobs, removing aliases can be done by editing an entry out of the symbol table. Here is an example that does just that, using the delete_package subroutine of the Symbol module that we introduced previously:
# Uninstallable.pm
package Uninstallable;
use Symbol qw(delete_package);
our $message = "I'm here\n"; # package global
sub unimport {
delete_package(__PACKAGE__);
}
1;
This module, which for the purposes of testing we shall call Uninstallable.pm (because we can uninstall it, not because we can't install it), defines one variable simply so we can tell whether or not it is present by testing for it. The next short script shows how. Note the BEGIN blocks, which force the print statements to happen at the same time as use; otherwise the package would be uninstalled before the first print executes.
#!/usr/bin/perl
# uninstall.pl
use strict;
BEGIN { print "Now you see me: "; }
use Uninstallable;
BEGIN { print $Uninstallable::message; }
BEGIN { print "Now you don't!\n"; }
no Uninstallable;
BEGIN { print $Uninstallable::message; }
When run, presuming the module and script are both in the current directory:
> perl -I. uninstall.pl
you'll see the following output:
Now you see me: I'm here
Now you don't!
As interesting as this is, it is rare (though not impossible) that we would actually want to delete a package programmatically. Where they are implemented, most unimport subroutines simply clear flags that an import sets. Many of Perl's pragmatic modules like strict and warnings work this way, for example, and are actually very small modules in themselves.
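The flag-setting style can be sketched in a single file. Everything named here is hypothetical: My::Feature is defined inline, and the %INC entry is set by hand so that use and no do not go looking for a file on disk.

```perl
use strict;
use warnings;

BEGIN {
    package My::Feature;
    our $enabled = 0;
    sub import   { $enabled = 1 }    # 'use My::Feature' switches the flag on
    sub unimport { $enabled = 0 }    # 'no My::Feature' switches it off again
    $INC{'My/Feature.pm'} = __FILE__;    # pretend the module file is loaded
}

our @seen;
use My::Feature;
BEGIN { push @seen, $My::Feature::enabled }    # sampled at compile time
no My::Feature;
BEGIN { push @seen, $My::Feature::enabled }

print "@seen\n";
```

The BEGIN blocks sample the flag at compile time, in step with the use and no declarations around them, which is why the two recorded values differ.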
Bypassing import
The use and no directives incorporate one extra trick: if we pass them an explicit empty parameter list, they don't call the import function at all. This means that we can suppress a module's default import if we only want to use some of its features. Take the CGI module as an example:
use CGI; # parse environment, set up variables
use CGI qw(:standard); # import a specific set of features
use CGI (); # just load CGI, don't parse anything
Suppressing the default import by passing an empty list is more useful than it might seem. The CGI module in the previous examples does rather a lot more than simply importing a few symbols by default; it examines the environment and generates a default CGI object for functional programming, as well as automatically generating a number of methods. If we just want to use the CGI module's HTML generation features, we don't need all that, so we can stop the module initializing itself by explicitly passing nothing to it.
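The effect of the empty list can be observed by counting calls to import. In this sketch the inline package My::Counted and its counter are invented for the purpose, and the %INC entry again stands in for a real module file.

```perl
use strict;
use warnings;

BEGIN {
    package My::Counted;
    our $imports = 0;
    sub import { $imports++ }            # count every call made by 'use'
    $INC{'My/Counted.pm'} = __FILE__;    # pretend the module file is loaded
}

use My::Counted;            # calls import with just the package name
use My::Counted ();         # loads the module but never calls import
use My::Counted qw(a b);    # calls import with a list

print "$My::Counted::imports\n";
```

Only two of the three use statements reach import, which confirms that the empty list suppresses the call entirely rather than passing an empty argument list.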
While most modules make use of the Exporter module covered later, they are not compelled to do so. Here is a simple exporting subroutine that illustrates how a module can implement a simple import subroutine:
# default import
sub import {
    my $caller = caller; # get calling package
    no strict 'refs'; # we need symbolic references to do this
    *{"${caller}::mysub"} = \&mysub; # export 'mysub'
    *{"${caller}::myscalar"} = \$myscalar; # export '$myscalar'
    *{"${caller}::myhash"} = \%myhash; # export '%myhash'
}
The principal technique is that we find the caller's package by inspecting the subroutine stack with caller. When called in a scalar context, caller returns just the name of the package from which the current subroutine was called, in other words, the place from which the use was issued. Once we know this, we simply use it to define typeglobs in the calling package filled with references to the variables we want to export.
This import subroutine doesn't pay any attention to the arguments passed to it (the first one of which is the package name). It just exports three symbols explicitly. This isn't very polite, as the calling package might not need all of them, and might even have its own versions of them. Here is a more polite and more versatile import subroutine that exports only the requested subroutines, if they exist:
# export if defined
sub import {
    my $caller = caller; # get the name of the calling package
    my $package = shift; # remove leading package argument from @_
    no strict 'refs'; # we need symbolic references to do this
    foreach (@_) { # for each request
        if (defined &{"${package}::$_"}) { # if we have it...
            *{"${caller}::$_"} = \&{"${package}::$_"}; # ...make an alias
        } else {
            die "Unable to export $_ from $package\n"; # otherwise, abort
        }
    }
}
Usually, the package passed in is the package we are in anyway; we could as easily have said \&{$_} as \&{"${package}::$_"}. However, it is good practice to use the package name in case the import method is inherited by another package. By using the passed name, our import will also serve for any packages that inherit from it (via @ISA). This is exactly how the Exporter module works, in fact.
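The benefit of using the passed package name shows up as soon as a second package inherits the import method. The sketch below is illustrative only: My::Base, My::Derived, and hello are hypothetical names, and a bare caller is used to find the package that invoked import.

```perl
use strict;
use warnings;

{
    package My::Base;
    # generic import: export requested subroutines from whichever
    # package the method was invoked on
    sub import {
        my $package = shift;
        my $caller  = caller;    # package that called import
        no strict 'refs';        # symbolic references are needed here
        foreach my $name (@_) {
            defined &{"${package}::$name"}
                or die "Unable to export $name from $package\n";
            *{"${caller}::$name"} = \&{"${package}::$name"};
        }
    }
}

{
    package My::Derived;
    our @ISA = ('My::Base');     # inherits import, defines nothing else for it
    sub hello { return "hello from " . __PACKAGE__ }
}

My::Derived->import('hello');    # what 'use My::Derived qw(hello)' would do
my $greeting = hello();

my $ok = eval { My::Derived->import('missing'); 1 };
my $error = $ok ? '' : $@;
print $greeting, "\n", $error;
```

Had the import subroutine looked for hello in its own package rather than in $package, the inherited call would have failed, since My::Base defines no such subroutine.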
The preceding example only works for subroutines, so it only constructs subroutine references. A more versatile version would examine (and remove if appropriate) the first character of the symbol and construct a scalar, array, hash, code, or typeglob reference accordingly. Here is an example that does that, though for brevity, we have removed the check for whether the symbol actually exists:
# export arbitrary
sub import {
    my $caller = caller; # get calling package
    my $package = shift; # remove package from arguments
    no strict 'refs'; # we need symbolic references for this
    foreach my $symbol (@_) {
        my $name = $symbol; # work on a copy; the arguments may be read-only
        my $prefix = '';
        $name =~ s/^([&%\$\@*])// and $prefix = $1;
        $prefix eq '$' and *{"${caller}::$name"} = \${"${package}::$name"}, next;
        $prefix eq '%' and *{"${caller}::$name"} = \%{"${package}::$name"}, next;
        $prefix eq '@' and *{"${caller}::$name"} = \@{"${package}::$name"}, next;
        $prefix eq '*' and *{"${caller}::$name"} = *{"${package}::$name"}, next;
        *{"${caller}::$name"} = \&{"${package}::$name"};
    }
}
It is up to the import subroutine whether or not to carry out additional default imports when an explicit list is passed. In general, the answer is no, but it is usual to define a special symbol like :DEFAULT that imports all the default symbols explicitly. This allows the module user maximum flexibility in what to allow into their namespace:
sub import {
    my $package = shift;
    # if an empty import list, use defaults
    return _default_import() unless @_;
    foreach (@_) {
        /^:DEFAULT$/ and _default_import(), next;
        _export_if_present($package,$_);
    }
}
sub _default_import {
    # ... as above ...
}
sub _export_if_present {
    my ($package,$symbol) = @_;
    my $caller = caller(1); # the package that called 'import'
    my $prefix = '';
    $symbol =~ s/^([&%\$\@*])// and $prefix = $1;
    no strict 'refs'; # we need symbolic references for this
    if ($prefix and $prefix ne '&') {
        SWITCH: foreach ($prefix) {
            $_ eq '$' and do {
                if (defined ${"${package}::$symbol"}) {
                    *{"${caller}::$symbol"} = \${"${package}::$symbol"};
                    return;
                }
            };
            $_ eq '@' and do {
                # ... ditto for arrays ...
            };
            $_ eq '%' and do {
                # ... ditto for hashes ...
            };
            $_ eq '*' and do {
                # ... ditto for typeglobs ...
            };
        }
    } elsif (defined &{"${package}::$symbol"}) {
        *{"${caller}::$symbol"} = \&{"${package}::$symbol"};
    } else {
        die "Unable to export $symbol from $package\n";
    }
}
The import method is not obliged to export symbols in response to being called. It can choose to do anything it likes and treat the list of names passed to it in any way it sees fit. As an indication of what else can be done with import lists, here is an import subroutine that invents generators for HTML tags by defining a subroutine for any symbol passed to it that it doesn't recognize. (The CGI module uses exactly this approach, though its HTML methods are a good deal more advanced. It is also a much bigger module than these few lines of code.)
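sub import {
    my $package = shift;
    my $caller = caller; # get calling package
    no strict 'refs'; # we need symbolic references for this
    foreach my $tag (@_) {
        # for each passed symbol, generate a tag subroutine in the
        # caller's package.
        *{"${caller}::$tag"} = sub {
            "<$tag>\n".join("\n",@_)."</$tag>\n";
        };
    }
}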
This is frequently a better way to handle automatic generation of subroutines than autoloading is, since it is more controlled and precise. Also, we have to declare the subroutines we want to use at compile time (since use calls import at that point), where they can be subjected to syntax checking. Autoloading, by contrast, actually disables compile-time checks, since it is perfectly valid for a subroutine not to exist before it is called.
When to Export, When Not to Export
Having shown how to export symbols, it is worth taking a moment to consider whether we should. The point of packages is to increase reusability by restraining the visibility of variables and subroutines. We can write application code in the main package free from worry about name clashes because modules place their variables and subroutines into their own packages. Importing symbols goes against this strategy, and uncontrolled importing of lots of symbols pollutes code with unnecessary definitions that degrade maintainability and may cause unexpected bugs. In general we should take time to consider
These steps are an essential part of defining the interface to the package, and therefore a critical element of designing reusable code.
Object-oriented modules should usually not export anything at all; the entire point of object orientation is to work through the objects themselves, not to bypass them by importing parts of the module class into our own code. Additionally, exporting symbols directly bypasses the inheritance mechanism, which makes code that uses the exported symbols hard to reuse and likely to break. There are a few rare cases where modules provide both functional and object-oriented interfaces, but only in the simplest modules that are not intended to be inherited from is this a viable strategy.
In summary, the export list of a module is far more than just a list of symbols that will/may be imported into another package; it is the functional interface to the module's features, and as such should be designed, not gradually expanded. The Exporter
module helps with this by allowing us to define lists of conditionally exported symbols.
The Exporter Module
The Exporter
module provides a generic import
subroutine that modules can configure to their own taste. It handles almost all possible issues that a traditional exporting module needs to consider, and for many modules it is all they need.
Using the Exporter
To use the Exporter
, a module needs to do three things: use Exporter
, inherit from it, and define the symbols eligible for export. Here is a very short module that demonstrates the basic technique, using fully qualified names for the package variables @ISA
and @EXPORT
:
# My/Module.pm
package My::Module;
use Exporter;
# inherit from it
@My::Module::ISA = qw(Exporter);
# define export symbols
@My::Module::EXPORT = qw(greet_planet);
sub greet_planet {
return "Hello World\n";
}
# a module must return a true value when loaded
1;
Here we have an @ISA array that tells the interpreter that the module is a subclass of Exporter and to refer to it for any methods the module does not provide. Specifically, that means import. We don't need to worry too much about the object-oriented nature of inheriting from Exporter, unless we want to define our own import subroutine and still make use of the one provided by Exporter (we will get to that in a moment).
The @EXPORT
array defines the actual symbols we want to export. When import
is invoked for our module, the call is relayed up to the Exporter
module, which provides the generic import
method. It in turn examines the definition of @EXPORT
in our module, @My::Module::EXPORT
and satisfies or denies the requested import list accordingly.
To illustrate, here's a short script that uses the preceding module, assuming it is in a file named Module.pm
in a directory called My
in the same directory as the script:
#!/usr/bin/perl
# import.pl
use warnings;
use strict;
use lib '.'; #look in current directory for My/Module.pm
use My::Module;
print greet_planet;
Importing from the Exporter
One advantage of the Exporter module
is that the import
method it provides is well developed and handles many different situations for us. Even if we decide to provide our own import
subroutine, we may want to use Exporter
too, just for the richness of the features it provides (and if we don't, we probably ought to document it). For example, it accepts regular expressions as well as literal symbol names, which means that we can define a collection of symbols with similar prefixes and then allow them to be imported together rather than individually. Here is how we can import a collection of symbols all starting with prefix_
from a module that uses the Exporter
module:
use My::Module qw(/^prefix_/);
The Exporter
also understands negations, so we can import all symbols that do not match a given name or regular expression:
# import everything except the subroutine 'greet_planet'
use My::Module qw(!greet_planet);
# import anything not beginning with 'prefix_'
use My::Module qw(!/^prefix_/);
We can also collect symbols together into groups and then import the groups by prefixing the group name with a colon. Again, this isn't a core Perl feature, it is just something that the Exporter
module's import
method does. For example:
use My::Module qw(:mygroup);
We'll see how to actually define a group in a moment.
Default and Conditional Exports
The @EXPORT
variable defines a list of default exports that will be imported into our code if we use the module with no arguments, that is:
use My::Module;
But not either of these:
use My::Module ();
use My::Module qw(symbola symbolb symbolc);
If we give an explicit list of symbols to import, even if it is an empty list, Exporter
will export only those symbols. Otherwise, we get the default, which is entirely up to the module (and hopefully documented).
Since exporting symbols automatically is not actually all that desirable (the application didn't ask for them, so we shouldn't spray it with symbols), Exporter
also allows us to define conditional exports in the @EXPORT_OK
array. Any symbol in this array may be exported if named explicitly, but it will not be exported by default.
In My::Module
(Module.pm
):
# change sub to be exported only on request
@EXPORT_OK = qw(greet_planet);
In application (import.pl
):
# now we must import the sub explicitly
use My::Module qw(greet_planet);
The contents of the @EXPORT
array are also checked when an explicit list is given, so any name or regular expression passed to import
will be imported if it matches a name in either the @EXPORT
or @EXPORT_OK
list. However, any explicit list suppresses the exporting of the default list—which is the point, of course.
We can ask for the default symbols explicitly by using the special export tag :DEFAULT. The advantage is that we can augment it with additional explicit requests. For example, this statement imports all the default symbols and additionally imports two more (presumably on the @EXPORT_OK list):
use My::Module qw(:DEFAULT symbola symbolb);
Alternatively, we can import the default list but skip over selected symbols:
use My::Module qw(:DEFAULT !symbola !symbolb);
Since this is a common case, we can also omit the :DEFAULT
tag and simply put
use My::Module qw(!symbola !symbolb);
In fact, this is the same as the example of negation we gave earlier; in effect, an implicit :DEFAULT
is placed at the front of the list if the first item in the list is negated.
As a working example of the different ways that import lists can be defined, here is a short demonstration module, called TestExport.pm
, and a test script that we can use to import symbols from it in different ways. First the module, which exports two subroutines by default and two if asked:
# TestExport.pm
package TestExport;
use strict;
use Exporter;
our @ISA = qw(Exporter);
our @EXPORT = qw(sym1 sym2);
our @EXPORT_OK = qw(sym3 sym4);
sub sym1 {print "sym1\n";}
sub sym2 {print "sym2\n";}
sub sym3 {print "sym3\n";}
sub sym4 {print "sym4\n";}
1;
The following script contains a number of different use
statements that import different symbols from the module, depending on their argument. To use it, uncomment one (and only one) use
statement, and the script will print out the subroutines that were imported as a result. It also demonstrates a simple way of scanning the symbol table, as well as the use of %INC
to check for a loaded module.
#!/usr/bin/perl
# testexport.pl
use warnings;
use strict;
# :DEFAULT import
#use TestExport;
# no imports
#use TestExport();
# just 'sym1'
#use TestExport qw(sym1);
# the default list minus 'sym1'
#use TestExport qw(!sym1);
# just 'sym3'
#use TestExport qw(sym3);
# the default list minus 'sym3' (which is not in it anyway)
#use TestExport qw(!sym3);
# implicit :DEFAULT
#use TestExport qw(!sym1 sym3);
# no implicit :DEFAULT
#use TestExport qw(sym3 !sym1);
unless (exists $INC{'TestExport.pm'}) {
die "Uncomment a 'use' to see its effect\n";
}
foreach (keys %::) {
print "Imported: $_\n" if /^sym/;
}
Note that in these examples we have concentrated on subroutines, since these are the symbols we most commonly export, though we are equally free to export variables too.
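As an alternative to uncommenting one line at a time, the following sketch tries several import lists in a single run. It assumes an inline copy of the TestExport package, and it imports each list into a fresh, empty package so that the results do not accumulate. The Scratch package names and the imported_with helper are invented for the illustration. Since use TestExport () never calls import at all, that case is not covered here.

```perl
#!/usr/bin/perl
# importlists.pl - a sketch; Scratch* packages and imported_with are invented
use strict;
use warnings;

{
    package TestExport;
    use Exporter;
    our @ISA       = qw(Exporter);
    our @EXPORT    = qw(sym1 sym2);
    our @EXPORT_OK = qw(sym3 sym4);
    sub sym1 { } sub sym2 { } sub sym3 { } sub sym4 { }
}

my $count = 0;

# import the given list into a new, empty package and report what arrived
sub imported_with {
    my @list    = @_;
    my $scratch = 'Scratch' . $count++;
    # the string eval can see @list, and makes $scratch the calling package
    eval "package $scratch; TestExport->import(\@list); 1" or die $@;
    no strict 'refs';
    return join ' ', sort grep { /^sym/ } keys %{"${scratch}::"};
}

my @cases = (
    [],                       # same list as a plain "use TestExport;"
    ['!sym1'],                # implicit :DEFAULT, then remove sym1
    ['sym3'],                 # an explicit list suppresses the defaults
    ['!sym1', 'sym3'],        # implicit :DEFAULT, remove sym1, add sym3
    ['sym3', '!sym1'],        # no implicit :DEFAULT
    [':DEFAULT', 'sym4'],     # the defaults plus one on-request symbol
    ['/^sym[34]/'],           # a regular expression
);

my %result;
foreach my $case (@cases) {
    my $label = "@$case" || '(default)';
    $result{$label} = imported_with(@$case);
    printf "%-16s => %s\n", $label, $result{$label};
}
```

The helper calls import at run time, so it shows which symbols Exporter installs for each list rather than how use behaves at compile time.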
Export Lists
In addition to adding symbol names to @EXPORT
and @EXPORT_OK
, we can define collections of symbols as values in the hash variable %EXPORT_TAGS
. The key is a tag name that refers to the collection. For example:
our (@EXPORT, @EXPORT_OK, %EXPORT_TAGS);
$EXPORT_TAGS{'subs'} = [qw(mysub myothersub subthree yellowsub)];
$EXPORT_TAGS{'vars'} = [qw($scalar @array %hash)];
Or, more succinctly:
our %EXPORT_TAGS = (
subs => [qw(mysub myothersub subthree yellowsub)],
vars => [qw($scalar @array %hash)],
);
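Note that in accordance with the principles of nested data structures, we need to assign an array reference to each tag name key; otherwise the list is flattened or evaluated in scalar context, and the tag ends up holding a single value rather than the collection.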
However, defining a list and assigning it to a tag does not automatically add the names in the list to either @EXPORT
or @EXPORT_OK
; in order for the tag to be imported successfully, the names have to be in one or other of the arrays too. Fortunately, Exporter
makes this simple for us by providing a pair of subroutines to add the symbols associated with a tag to either list automatically. To add a tag to the default export list:
Exporter::export_tags('subs');
To add a tag to the conditional export list:
Exporter::export_ok_tags('vars');
We can now import various permutations of tags and symbol names:
# import two tags
use My::Module qw(:subs :vars);
# import the default list excepting the items in ':subs'
use My::Module qw(:DEFAULT !:subs);
# import ':subs' excepting the subroutine 'myothersub'
use My::Module qw(:subs !myothersub);
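To see what the two tag subroutines do to the export arrays, here is a small sketch that assumes a hypothetical inline package, My::Tagged, with shortened tag lists. It prints the arrays after the calls have been made.

```perl
#!/usr/bin/perl
# tagarrays.pl - a sketch; My::Tagged is a hypothetical package
use strict;
use warnings;

{
    package My::Tagged;
    use Exporter;
    our @ISA = qw(Exporter);
    our (@EXPORT, @EXPORT_OK);
    our %EXPORT_TAGS = (
        subs => [qw(mysub myothersub)],
        vars => [qw($scalar @array)],
    );
    # append the symbols of each tag to the named export array
    Exporter::export_tags('subs');       # adds to @EXPORT
    Exporter::export_ok_tags('vars');    # adds to @EXPORT_OK
}

print "Default:    @My::Tagged::EXPORT\n";
print "On request: @My::Tagged::EXPORT_OK\n";
```

Both subroutines work on the package they are called from, which is why they are called as plain functions inside the module rather than as methods.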
To show tags in action, here is a modified example of the TestExport
module we gave earlier, rewritten to use tags instead. We define the default and on-request export lists using the export_tags
and export_ok_tags
subroutines:
# TestTagExport.pm
package TestTagExport;
use strict;
use Exporter;
our @ISA = qw(Exporter);
our %EXPORT_TAGS = (
onetwo => ['sym1','sym2'],
threefour => ['sym3','sym4'],
onetwothree => [qw(sym1 sym2 sym3)],
all => [qw(sym1 sym2 sym3 sym4)],
);
Exporter::export_tags('onetwo');
Exporter::export_ok_tags('threefour');
sub sym1 {print "sym1\n";}
sub sym2 {print "sym2\n";}
sub sym3 {print "sym3\n";}
sub sym4 {print "sym4\n";}
1;
Here is a script that tests out the export properties of the new module, concentrating on tags rather than symbols, though all the tests that applied to the first module will work with the same effect with this example:
#!/usr/bin/perl
# testtagexport.pl
use warnings;
use strict;
# import tag
#use TestTagExport;
# import symbol plus tag
#use TestTagExport qw(:threefour sym2);
# import tag minus symbol
#use TestTagExport qw(:onetwothree !sym2);
# import one tag minus another
#use TestTagExport qw(:onetwothree !:DEFAULT);
unless (exists $INC{'TestTagExport.pm'}) {
die "Uncomment a 'use' to see its effect\n";
}
foreach (keys %::) {
print "Imported: $_\n" if /^sym/;
}
Versions
The use
and require
directives support a version number syntax in addition to their regular use in module loading. The Exporter
module also allows us to handle this usage by defining a require_version
method that is passed the package name (because it is a method) and the version number requested:
our $VERSION = "1.23";
# this subroutine name has special meaning to Exporter
sub require_version {
my ($pkg,$requested_version) = @_;
return $VERSION ge $requested_version; # true if our version is new enough
}
If we do not supply a require_version
method, then a default definition provided by Exporter
is used instead; this also tests the requested version against the value of $VERSION
defined in the local package (if any is defined), but it uses a numeric comparison, which works well for comparing version number objects/strings (see Chapter 3).
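For comparison, the core VERSION method that every package inherits from UNIVERSAL performs the same kind of check, and it is this method that a use statement with a version number calls. The sketch below is not Exporter's require_version; it only illustrates the check itself, and the package name My::Versioned is invented.

```perl
#!/usr/bin/perl
# versioncheck.pl - a sketch; My::Versioned is a hypothetical package
use strict;
use warnings;

{
    package My::Versioned;
    our $VERSION = '1.23';
}

# VERSION dies if the package's $VERSION is lower than the one requested
my $old_enough = eval { My::Versioned->VERSION('1.20'); 1 } ? 1 : 0;
my $new_enough = eval { My::Versioned->VERSION('2.00'); 1 } ? 1 : 0;
my $error      = $@;    # the message from the failed check

print "1.20 satisfied: $old_enough\n";
print "2.00 satisfied: $new_enough\n";
print "Error was: $error";
```

Wrapping the call in eval lets the script report the failure instead of dying, which is what an unsatisfied use My::Versioned 2.00 would otherwise do at compile time.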
Handling Failed Exports
The Exporter
module automatically causes an application to die
if it attempts to import a symbol that is not legal. However, by defining another array, @EXPORT_FAIL
, we can define a list of symbols to handle specially in the event that Exporter
does not recognize them. For example, to handle cross-platform special cases, we might define three different subroutines:
our (@EXPORT_FAIL);
@EXPORT_FAIL = qw(win32sub macsub Unixsub);
In order to handle symbols named in the failure list, we need to define a subroutine, or rather a method, called export_fail
. The input to this method is a list of the symbols that the Exporter
did not recognize, and the return value should be any symbols that the module was unable to process:
sub export_fail {
my $pkg = shift;
my @fails;
foreach (@_) {
# test each symbol to see if we want to define it
push @fails,$_ unless supported($_);
}
# return list of failed exports (none if success)
return @fails;
}
sub supported {
my $symbol = shift;
my $ok_on_this_platform;
# ... test for special cases ...
return $ok_on_this_platform;
}
If an export_fail
method isn't defined, then Exporter
supplies its own, which returns all the symbols, causing them all to fail as if the @EXPORT_FAIL
array was not defined at all. Note that we cannot have Exporter
call export_fail
for any unrecognized symbol, only those listed in the @EXPORT_FAIL
array. However, if we wanted to handle situations like this ourselves, we can always define our own import
method, which we discuss next.
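The following self-contained sketch puts these pieces together. It assumes a hypothetical inline package, My::Portable, with two subroutines and a deliberately crude platform test based on $^O. Note that the symbols are listed in @EXPORT_OK as well as @EXPORT_FAIL, since Exporter only consults the failure list for symbols that are otherwise exportable.

```perl
#!/usr/bin/perl
# exportfail.pl - a sketch; My::Portable and its subroutines are invented
use strict;
use warnings;

{
    package My::Portable;
    use Exporter;
    our @ISA         = qw(Exporter);
    our @EXPORT_OK   = qw(win32sub unixsub);
    our @EXPORT_FAIL = qw(win32sub unixsub);

    sub win32sub { 'win32' }
    sub unixsub  { 'unix' }

    # return the symbols we cannot provide; an empty list means success
    sub export_fail {
        my $pkg = shift;
        return grep { !supported($_) } @_;
    }

    # crude test, an assumption of this sketch: Windows versus the rest
    sub supported {
        my $symbol     = shift;
        my $on_windows = $^O eq 'MSWin32';
        return $symbol eq ($on_windows ? 'win32sub' : 'unixsub');
    }
}

my ($native, $foreign) = $^O eq 'MSWin32'
    ? qw(win32sub unixsub)
    : qw(unixsub win32sub);

# Exporter warns before it dies; silence the warning for this demonstration
local $SIG{__WARN__} = sub { };

my $native_ok  = eval { My::Portable->import($native);  1 } ? 1 : 0;
my $foreign_ok = eval { My::Portable->import($foreign); 1 } ? 1 : 0;

print "$native imported: $native_ok\n";
print "$foreign imported: $foreign_ok\n";
```

The import calls are made at run time and wrapped in eval so that the failed request can be reported; with an ordinary use statement the failure would stop compilation.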
Using the Exporter with a Local import Method
If a module needs to do its own initialization in addition to using Exporter
, we need to define our own import
method. Since this will override the import
method defined by Exporter
, we will need to take steps to call it explicitly. Fortunately, the Exporter
module has been written with this in mind.
Assuming we're familiar with object-oriented programming, we might guess that calling SUPER::import from our own import subroutine would do the trick, since SUPER:: refers to the named method in the parent package or packages. Unfortunately, although the call itself succeeds, it imports symbols to the wrong package, because Exporter's import method examines the package name of the caller to determine where to export symbols. Since that is the module, and not the user of the module, the export doesn't place anything in the package that issues the use statement. Instead, we use the export_to_level method, which traces back up the calling stack and supplies the correct package name to Exporter's import method. Here's how to use it:
our @ISA = qw(Exporter);
our @EXPORT_OK = qw(mysub myothersub subthree yellowsub);
sub import {
my $package = $_[0];
do_our_own_thing(@_);
$package->export_to_level(1, @_);
}
The first argument to export_to_level
is a call-stack index (identical to that passed to the caller
function). This is used to determine the package to export symbols to, thereby allowing export_to_level
to be completely package independent. Note that because the package information needs to be preserved intact, it is important that we do not remove the package name passed as the first argument, which is why we used $_[0]
and not shift
in the preceding example.
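As a runnable sketch of the same idea, the script below assumes a hypothetical inline package, My::Counting, whose own import method counts how often it is called before handing over to export_to_level. The import is performed at run time here because the package is defined in the same file; a module in its own file would simply be loaded with use.

```perl
#!/usr/bin/perl
# exportlevel.pl - a sketch; My::Counting is a hypothetical package
use strict;
use warnings;

{
    package My::Counting;
    use Exporter;
    our @ISA          = qw(Exporter);
    our @EXPORT_OK    = qw(mysub);
    our $import_calls = 0;

    sub mysub { return "mysub called" }

    sub import {
        my $package = $_[0];    # look at the package name, don't remove it
        $import_calls++;        # our own initialization
        # level 1 is whoever called this import method
        $package->export_to_level(1, @_);
    }
}

My::Counting->import('mysub');

# parentheses are needed because mysub did not exist at compile time
print mysub(), "\n";
print "import was called $My::Counting::import_calls time(s)\n";
```

Had the import method called Exporter's import through SUPER:: instead, mysub would have been installed into My::Counting itself and the call from the main package would have failed.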
The Exporter
module also has a special verbose mode we can use when we are debugging particularly complex import problems. To enable it, define the variable $Exporter::Verbose
before using the module. Note that for this to be successful it needs to be in a BEGIN
block:
BEGIN {
$Exporter::Verbose = 1;
}
Note also that this will produce debug traces for all modules that use Exporter
. Since a very large number of modules use Exporter
, this may produce a lot of output. However, since BEGIN
blocks (including the implicit ones in use
statements) are executed in order, we can plant BEGIN
blocks in between the use
statements to restrain the reporting to just those modules we are interested in:
use Exporter;
use A::Module::Needed::First;
BEGIN { print "Loading...\n"; $Exporter::Verbose = 1;}
use My::Problematic::Exporting::Module;
BEGIN { print "...loaded ok\n"; $Exporter::Verbose = 0;}
use Another::Module;
Package attributes are an extension to the predefined attribute mechanism provided by the attributes module and covered in Chapter 7. Perl only understands four native attributes by default (lvalue, locked, method
, and unique
, of which locked
and method
are now deprecated), but the idea of package attributes is that we can implement our own attributes that work on a package-wide basis. To implement them, we define specially named subroutines within the package. The easiest way to create these is with the Attribute::Handlers
module, although the underlying mechanism is not (as is often the case in Perl) all that complex. The attribute mechanism is still somewhat experimental in Perl 5.8, so some of its more idiosyncratic properties are likely to be smoothed out over time.
Each data type may have a different set of attributes associated with it. For example, a scalar attribute is implemented by writing FETCH_SCALAR_ATTRIBUTES
and MODIFY_SCALAR_ATTRIBUTES
subroutines, and similarly for ARRAY, HASH
, and CODE
. The package may implement the details of storage and retrieval any way it likes based on the arguments passed.
FETCH_
subroutines are called by attributes::get
whenever we use it on a reference of the correct type in the same package. They are passed the package name and a reference to the entity being queried. They should return a list of the attributes defined for that entity.
MODIFY_
subroutines are called during the import stage of compilation. They take a package name and a reference as their first two arguments, followed by a list of the attributes to define. They return a list of unrecognized attributes, which should be empty if all the attributes could be handled.
Both FETCH_
and MODIFY_
subroutines may be accessed by corresponding routines in a package implementing a derived object class. The parent package is called with SUPER::FETCH_TYPE_ATTRIBUTES
and SUPER::MODIFY_TYPE_ATTRIBUTES
. The intent is that a subclass should first call its parent and then deal with any attributes returned. In the case of FETCH_
, it should add its own attributes to the list provided by the parent and return it. In the case of MODIFY_
, it should deal with the list of unrecognized attributes passed back from the parent. This is essentially the same mechanism that the Exporter
module uses.
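To make the underlying mechanism concrete, here is a minimal sketch that implements a single scalar attribute by hand, without Attribute::Handlers. The attribute name Tracked and the storage hash are assumptions made for the illustration; a package is free to store attributes however it likes.

```perl
#!/usr/bin/perl
# rawattr.pl - a sketch; the Tracked attribute and %attrs_of are invented
use strict;
use warnings;
use attributes ();                # for attributes::get
use Scalar::Util qw(refaddr);

my %attrs_of;                     # attribute names, keyed by variable address

# called when a scalar is declared with attributes in this package
sub MODIFY_SCALAR_ATTRIBUTES {
    my ($pkg, $ref, @attrs) = @_;
    my @unknown;
    foreach my $attr (@attrs) {
        if ($attr eq 'Tracked') {
            push @{ $attrs_of{refaddr $ref} }, $attr;
        } else {
            push @unknown, $attr;  # let Perl complain about these
        }
    }
    return @unknown;
}

# called by attributes::get with the package name and the reference
sub FETCH_SCALAR_ATTRIBUTES {
    my ($pkg, $ref) = @_;
    return @{ $attrs_of{refaddr $ref} || [] };
}

my $value : Tracked = 42;
my @found = attributes::get(\$value);

print "value is $value, attributes are: @found\n";
```

Returning an empty list from the MODIFY_ subroutine tells Perl that every attribute was understood; anything returned is reported as an invalid attribute.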
With the Attribute::Handlers
module, we can invent our own attributes and register handlers to be triggered when a variable or subroutine is declared with them, without any need to get involved in defining explicit FETCH_
and MODIFY_
subroutines. Here is a minimal example that shows an attribute handler in action:
#!/usr/bin/perl
use strict;
use warnings;
use Attribute::Handlers;
{
package Time;
sub Now : ATTR(SCALAR) {
my ($pkg,$sym,$ref,$attr,$data,$when)=@_;
$$ref=time;
}
}
my Time $now : Now;
print $now; # produces the time in seconds since 1970/1/1
This creates a handler called Now in the Time package that can be applied to scalar variables; attempting to declare this attribute on an array, hash, or subroutine will cause a syntax error. When a scalar variable is declared and typed to the Time package and then given Now as an attribute, the handler is called. Interesting though this syntax looks, Perl does not really support the typing of variables. Providing a scalar variable with a type is really just a suggestion to Perl to do something with the variable if the opportunity arises. Package attributes are one of the two features that provide a "something," the other being the compile-time checking of hash key accesses in pseudohashes and restricted hashes. The effect of the type is only at compile time; it does not persist into the execution phase.
The handler is passed six values, of which the third, the reference, points to the scalar variable on which the attribute is being defined. The action of the handler is to assign the current time to the dereferenced variable. As a result, when we print the variable out, we find it already has a value of the current time (in seconds since January 1, 1970).
Observant readers will notice that the declaration of the handler subroutine is itself implemented using an attribute called ATTR
. The data value associated with it is SCALAR
, which tells the ATTR
handler—defined in Attribute::Handlers
—how to set up the FETCH_
and MODIFY_
subroutines to call the Now
subroutine.
The other parameters are as follows:
$pkg: The name of the package. In this case, it is Time, but it could also be the name of a package implementing a subclass.
$sym: For package declarations, the symbolic name of the variable or subroutine being defined, qualified by its package. In the case of a lexical variable like the one shown previously, there is no symbol and so no name, so the string LEXICAL is passed.
$attr: The name of the attribute, here Now.
$data: The data passed with the attribute, if any. For ATTR it was SCALAR; for our Now attribute, we didn't pass any.
$when: The phase of execution: BEGIN, INIT, CHECK, or END.
By default a handler is executed during the check phase transition of the interpreter, which is to say Perl compiles it as a CHECK block (see earlier in the chapter for more on what a CHECK block is). We can create handlers that execute at any of the four transition points BEGIN, CHECK, INIT, or END, all of them, or a selection. The following example defines a handler in the UNIVERSAL package that executes at BEGIN, INIT, and CHECK. It records the total startup time of all BEGIN blocks (including use statements) that are declared after it, everything that occurs in the CHECK phase transition, and any INIT handlers that were declared before it. For variety, it also defines an attribute for hash variables:
#!/usr/bin/perl
use strict;
use warnings;
use Attribute::Handlers;
{
package UNIVERSAL;
use Time::HiRes qw(gettimeofday);
# calculate the startup time
sub Startup : ATTR(HASH,BEGIN,INIT,CHECK) {
my ($pkg,$sym,$ref,$attr,$data,$when)=@_;
if ($when eq 'BEGIN') {
# at begin, store current time
my ($secs,$usecs)=gettimeofday();
%$ref=( secs => $secs, usecs => $usecs );
print "Startup BEGIN...\n";
} elsif ($when eq 'INIT') {
# at init, calculate time elapsed
my ($secs,$usecs)=gettimeofday();
$ref->{secs} = $secs - $ref->{secs};
$ref->{usecs} = $usecs - $ref->{usecs};
if ($ref->{usecs} < 0) {
$ref->{usecs} += 1_000_000;
$ref->{secs} -= 1;
}
print "Startup INIT...\n";
} else {
# we could do something time-consuming here
print "Startup CHECK...\n";
}
}
}
our %time : Startup;
BEGIN { print "Beginning...\n"; sleep 1 }; #happens after Startup BEGIN
CHECK { print "Checking...\n"; sleep 1 }; #between Startup BEGIN and INIT
INIT { print "Initialising...\n"; sleep 1 }; #happens after Startup INIT
print "BEGIN+CHECK took ",$time{secs}*1_000_000+$time{usecs},"uS\n";
Why is this handler declared in the UNIVERSAL
package? In this case, mainly because typing a variable (by prefixing it with the name of a package) is an object-oriented mechanism that only works on scalar variables. It works fine for our first example because it is a SCALAR
attribute, but this is a handler for hash variables.
Declaring a handler in UNIVERSAL
has the useful property of making it available to any and all hashes, anywhere. However, it also allows for the possibility of collisions between different modules. Unfortunately, a colon is not a legal character in an attribute name, so we can't create a handler in the package Time
and then declare constructs with it, unless we do so in a package that subclasses from the Time
package via the @ISA
array or use base
pragma.
The preceding handler does not implement a clause for the END
phase transition. This might seem like a useful thing to do—after all, we could time the running time of the program that way. But this won't work, because the hash is a lexically scoped variable. Even though it is declared with our
and so exists as a package variable, the lexical scope ends before the END
block is executed. Consequently, Attribute::Handlers
cannot bind the attribute at this phase. As a consequence, we can only usefully define END
handlers for subroutine declarations.
Attributes for different data classes can coexist peacefully, although we will need to say no warnings 'redefine' to stop Perl complaining that we have more than one subroutine with the same name. While this is true, the Attribute::Handlers module resolves the problem because the attributes remap the subroutine calls into autogenerated FETCH_ and MODIFY_ subroutines. However, we cannot declare more than one attribute handler for the same type of data but at different phases:
use warnings;
no warnings 'redefine';
sub MyAttr : ATTR(SCALAR,BEGIN,INIT) {...} # first attribute handler is defined
sub MyAttr : ATTR(HASH,BEGIN,INIT) {...} # Redefine 'MyAttr', different data type, OK
sub MyAttr : ATTR(HASH,CHECK) {...} # ERROR: same data type again
Without qualification or with the special data type ANY
, a handler will be called for all variables and code references. The ANY
label allows the phase transitions to be specified, otherwise it is no different from the unqualified version. These handlers will execute for any variable or subroutine for which the attribute is declared:
sub MyAttr : ATTR {...}
sub MyAttr : ATTR(ANY) {...}
sub MyAttr : ATTR(ANY,BEGIN,INIT) {...}
The data passed to a handler is natively presented as a string containing the whole text between the opening and closing parentheses; it is not treated as normal Perl syntax. However, Attribute::Handlers
makes some attempt to parse the string if it looks like it might be defining something other than a string. A comma-separated list is not treated specially, but an opening square or curly brace is, if it is matched at the end. This example illustrates several valid ways to pass data arguments that will be parsed into corresponding data structures:
#!/usr/bin/perl
# attrhandler3.pl
use strict;
use warnings;
use Attribute::Handlers;
{
package MyPackage;
sub Set : ATTR(SCALAR) {
my ($pkg,$sym,$ref,$attr,$data,$when)=@_;
$$ref=$data;
}
}
my MyPackage $list : Set(a,b,c);
print "@$list\n"; # produces 'a b c'
my MyPackage $aref : Set([a,b,c]);
print "@$aref\n"; # produces 'ARRAY(0xNNNNNN)'
my MyPackage $string : Set('a,b,c');
print "$string\n"; # produces 'a,b,c'
my MyPackage $href : Set({a=>1,b=>2,c=>3});
print map {
"$_ => $href->{$_}\n"
} keys %$href; # produces 'a => 1' ...
my MyPackage $qwaref : Set(qw[a b c]);
print "@$qwaref\n"; # produces 'a b c'
Handlers also allow ways to make otherwise complex syntax simpler by encapsulating it, for example, the tie mechanism. The following example wraps an interface to tie an arbitrary DBM database with any of the standard DBM implementations inside an attribute handler that hides away the details and awkward syntax of the tie and replaces it with an intuitive attribute instead:
#!/usr/bin/perl
use strict;
use warnings;
use Attribute::Handlers;
{
package UNIVERSAL;
use Fcntl qw(O_RDWR O_CREAT);
sub Database : ATTR(HASH) {
my ($pkg,$sym,$ref,$attr,$data)=@_;
my ($file,$type,$mode,$perm);
if (my $reftype=ref $data) {
die "Data reference not an ARRAY"
unless $reftype eq 'ARRAY';
$file = shift @$data;
$type = shift(@$data) || 'SDBM_File';
$mode = shift(@$data) || O_RDWR|O_CREAT;
$perm = shift(@$data) || 0666;
} else {
$file = $data;
($type,$mode,$perm)=('SDBM_File',O_RDWR|O_CREAT,0666);
}
eval "require ${type}" or
die "${type} not found";
tie %$ref, $type, $file, $mode, $perm;
}
}
my %sdbm : Database(mysdbmfile);
$sdbm{key} = 'value';
my %gdbm : Database('mygdbmfile.dbm',GDBM_File);
$gdbm{key} = 'value';
Since we can be passed either a single string (the database file name) or an array reference (file name plus DBM type, mode, and permissions), the handler needs to check what data type the data parameter actually is. Either way, defaults are filled in if not specified. Other than this, there is not much in the way of real complexity here. Note the quotes on 'mygdbmfile.dbm'
, though—these are needed because without them the dot will be parsed as a string concatenation and silently disappear from the resulting file name.
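For reference, this is roughly the explicit tie syntax that the Database attribute hides. The sketch assumes the SDBM_File module that ships with Perl and uses a temporary directory, so the file name is an arbitrary choice. It writes a value, unties the hash, and ties it again to show that the value was stored in the file.

```perl
#!/usr/bin/perl
# plaintie.pl - a sketch of the tie call wrapped by the handler
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use SDBM_File;
use File::Temp qw(tempdir);

# keep the database in a scratch directory that is removed on exit
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/mysdbmfile";

my %sdbm;
tie %sdbm, 'SDBM_File', $file, O_RDWR|O_CREAT, 0666
    or die "Cannot tie $file: $!";
$sdbm{key} = 'value';
untie %sdbm;

# tie a second hash to the same file to read the value back
my %again;
tie %again, 'SDBM_File', $file, O_RDWR|O_CREAT, 0666
    or die "Cannot tie $file: $!";
my $stored = $again{key};
untie %again;

print "Stored: $stored\n";
```

Every hash tied this way has to repeat the class name, file name, mode, and permissions, which is the repetition the attribute removes.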
If we just want to create a quick and dirty mapping to a tieable module, then we can create handlers automatically with the autotie
and autotieref
keywords, both of which allow us to construct one or more handlers by simply associating handler names with the module to be tied in a hash reference passed as an argument to the use
statement of the Attribute::Handlers
module:
#!/usr/bin/perl
# attrhandlerautotie.pl
use strict;
use warnings;
use Attribute::Handlers autotie => {Database => 'MLDBM'};
use Fcntl qw(O_RDWR O_CREAT);
my %dbm : Database(mydbmfile,O_RDWR|O_CREAT,0666);
$dbm{key} = 'value';
Here we use the MLDBM
module to automatically use the most appropriate underlying DBM implementation (see perldoc MLDBM
for how the selection is made). We lose the ability to supply helpful defaults, but we need to write no code at all to implement the handler.
The autotieref
keyword works identically to autotie
, but it passes the attribute's data arguments to the internally generated tie statement as an array reference rather than as a list of arguments. This is purely to satisfy those modules that actually require an array reference instead of a list; use whichever is appropriate to the circumstance.
An installable module is one that we can bundle up, take somewhere else, and then install by unpacking it and executing an installation script. If we want to make our scripts and modules easily portable between systems, it is far better to automate the process of installation than manually copy files into a library directory. In addition, if we want to distribute the module more widely or upload it to CPAN for the enjoyment of all, we need to make sure that the module is well behaved and has all the right pieces in all the right places. Fortunately, the h2xs
utility supplied with Perl automates a great deal of this process for us, allowing us to concentrate on the actual code. (CPAN also has several modules that aim to expand on the features and ease-of-use of h2xs
that may be worth investigating, for example, Module::Starter
.)
The h2xs
utility is technically designed for creating Perl interfaces to C or C++ libraries, but it is perfectly capable of setting up the basic infrastructure for a pure Perl module as well—we just don't avail ourselves of its more advanced features.
Note that an installable module doesn't have to just be one file. Typically a module distribution contains the main module plus a number of other supporting modules, which may or may not be directly usable themselves. The whole ensemble is "the module," whereas the one we actually load into our applications is the primary interface.
When we are writing modules for our own personal use, we can be fairly lax about how they are structured; a package
declaration at the top and a 1
at the bottom are all we really need. However, a well-written and well-behaved module for general consumption should have some essential attributes:
$VERSION or in a VERSION subroutine that returns a version number.
strict pragma. It must be said that there are several examples of modules in the Perl standard library that do not adhere to these standards. Mostly these are tried and tested modules from early in the development of the standard library that are known to work. For new modules, strict mode is a good idea.
NAME: The package name and brief description
SYNOPSIS: Code example of how the module is used
DESCRIPTION: A description of what the module does
EXPORT: What the module exports
SEE ALSO: Any related modules or Perl documentation
HISTORY: Optionally, a history of changes
Tools are written to look for these sections, such as the podselect utility and the translators that are based on it. These can use properly constructed documentation to extract information intelligently and selectively. Additional optional sections include BUGS, CHANGES, AUTHOR, COPYRIGHT, and SUPPORTED PLATFORMS.
Remembering to do all this can be irritating, which is why h2xs
can create a skeleton module with all of the preceding already defined and in place for us. All we have to do is replace the content of each section with something more meaningful.
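For illustration, a minimal hand-written module with the attributes described above might look like the following sketch. The package name My::Greeting and its greet subroutine are hypothetical, chosen only for this example; this is not the skeleton that h2xs itself generates.

```perl
package My::Greeting;    # hypothetical package name, for illustration only

use strict;
use warnings;

our $VERSION = '0.01';

# Export nothing by default; greet is available on request.
use Exporter 'import';
our @EXPORT_OK = qw(greet);

# Return a greeting for the given name (hypothetical example subroutine).
sub greet {
    my ($name) = @_;
    return "Hello, $name";
}

=head1 NAME

My::Greeting - Produce a simple greeting

=head1 SYNOPSIS

  use My::Greeting qw(greet);
  print greet('World'), "\n";

=head1 DESCRIPTION

Provides a single subroutine, greet, that returns a greeting string.

=head1 EXPORT

None by default. greet may be imported on request.

=head1 SEE ALSO

perlmod, perlpod

=cut

1;
```

The documentation is embedded in the same file as the code, so tools that extract documentation sections can find them without any extra configuration.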
The first and main step to use h2xs
to create an installable module is to create a working directory tree where the module source code will reside. This resembles a local library directory (and indeed we can use the module directly if we add it to @INC
via Perl's -I
option or the use lib
pragma). In its most basic usage, h2xs
creates a directory tree based on the package name we give it and creates an initial module file with all the basic attributes in place. We use -n
to name both module and directory structure and -X
to tell h2xs
not to bother trying to wrap any C or C++ code. For example:
> h2xs -X -n Installable::Module
This creates a directory Installable-Module, inside which are the files and directories listed in Table 10-1.
Table 10-1. Initial Contents of a Distributable Module Directory
The actual donkeywork of creating the makefile is done by a collection of modules in the ExtUtils
family, the principal one being ExtUtils::MakeMaker
. A single call to this module actually takes up the bulk of the Makefile.PL
script:
use 5.008005;
use ExtUtils::MakeMaker;
# See lib/ExtUtils/MakeMaker.pm for details of how to influence
# the contents of the Makefile that is written.
WriteMakefile(
    NAME         => 'Installable::Module',
    VERSION_FROM => 'lib/Installable/Module.pm', # finds $VERSION
    PREREQ_PM    => {}, # e.g., Module::Name => 1.1
    ($] >= 5.005 ? ## Add these new keywords supported since 5.005
      (ABSTRACT_FROM => 'lib/Installable/Module.pm', # retrieve from module
       AUTHOR        => 'You <your@email>') : ()),
);
This is a newer Makefile.PL
; older versions of h2xs
create a slightly different directory structure and a Makefile.PL
without the trailing definitions. However, the format and use is broadly similar.
Notice the use
statement at the start of this file. It requires a Perl version of at least 5.8.5, but only because in this case it was the version of the interpreter that h2xs
used. If we know our module doesn't need such a current version, we can override it with the -b
option to h2xs
. For example, for Perl version 5.005, or 5.5.0 in the new version numbering format, we would use
> h2xs -X -n Installable::Module -b 5.5.0
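The correspondence between the two version numbering formats can be checked with the version module, which is bundled with more recent Perl releases than the one used in this chapter. The following is only a small sketch of that check:

```perl
use strict;
use warnings;
use version;

# Show old-style decimal versions in the dotted three-part format.
for my $old (qw(5.005 5.008005)) {
    my $dotted = version->parse($old)->normal;
    print "$old is $dotted\n";
}
```

This reports 5.005 as v5.5.0 and 5.008005 as v5.8.5, matching the versions mentioned in the text.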
Sometimes we already have a module, and we just want to convert it into an installable one. The best option here is to have h2xs create a new module source file and then copy the existing source code from the old module file into it. This way we get the extra files correctly generated by h2xs for us, each in its proper place, and each containing a valid, structurally correct skeleton to aid in adapting the module to conform with the guidelines.
Either way, once we have the directory set up and the appropriate files created within it, we can create a functional and (preferably) fully documented module.
To create an installable package file from our module source, we only need to create the makefile and then use make dist
(or nmake
or dmake
on a Windows system) to create the distribution file:
> perl Makefile.PL
> make dist
If we have added other modules to our source code or additional files we want to include with the distribution, we add them to the MANIFEST
file. At the start, this file contains just the files generated by h2xs
, that is Changes, MANIFEST, Makefile.PL, Module.pm
, and test.pl
.
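Before building a distribution, it can be useful to confirm that MANIFEST and the directory contents agree. The ExtUtils::Manifest module, from the same family as ExtUtils::MakeMaker, provides manicheck and filecheck for this. The sketch below runs them against a throwaway directory; the file names are hypothetical, and write_file is a helper defined only for this example.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);
use ExtUtils::Manifest qw(manicheck filecheck);

# Build a throwaway directory standing in for a distribution (hypothetical files).
my $start = getcwd();
my $dir   = tempdir(CLEANUP => 1);
chdir $dir or die "Cannot chdir to $dir: $!";

# Write a file with the given content.
sub write_file {
    my ($name, $content) = @_;
    open my $fh, '>', $name or die "Cannot write $name: $!";
    print {$fh} $content;
    close $fh or die "Cannot close $name: $!";
}

write_file('MANIFEST',    "MANIFEST\nMakefile.PL\nChanges\n");  # one file per line
write_file('Makefile.PL', "# placeholder\n");
write_file('Extra.pm',    "1;\n");                              # not yet listed

$ExtUtils::Manifest::Quiet = 1;    # collect results instead of printing warnings

my @missing  = manicheck();    # listed in MANIFEST but absent on disk
my @unlisted = filecheck();    # present on disk but not listed in MANIFEST

print "Missing:  @missing\n";
print "Unlisted: @unlisted\n";

chdir $start or die "Cannot chdir back to $start: $!";
```

In a real distribution directory, only the two check calls are needed; any file reported as unlisted is a candidate for adding to MANIFEST.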
Assuming the make dist executes successfully, we should end up with an archive file whose name comprises the package name (with each double colon replaced by a hyphen) and the version number. On a Unix system, our example module gets turned into Installable-Module-0.01.tar.gz. To test it, we can invoke
> make disttest
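The naming rule for the archive described above can be expressed in a few lines of Perl. This is only an illustration of the rule, not the code MakeMaker uses; dist_file is a hypothetical helper.

```perl
use strict;
use warnings;

# Derive the archive name from a package name and version.
sub dist_file {
    my ($package, $version) = @_;
    (my $dist = $package) =~ s/::/-/g;    # each "::" becomes a hyphen
    return "$dist-$version.tar.gz";
}

print dist_file('Installable::Module', '0.01'), "\n";
```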
We can now take this package to another system and install it with
> gunzip Installable-Module-0.01.tar.gz
> tar -xvf Installable-Module-0.01.tar
Once the source is unpacked, we create the makefile
and run the install
target from it.
> cd Installable-Module-0.01
> perl Makefile.PL
> make
> make test
> su
Password:
# make install
This will install files into the default installation location, which is determined by the INSTALLDIRS setting described shortly. To make sure that the package goes into the site_perl directory under Perl's main installation tree rather than into the standard Perl library, we can use the install_site target:
> su
Password:
# make install_site
Alternatively, we can have install place the module in the site_perl directory automatically by adding a definition for INSTALLDIRS to the key-value pairs passed to WriteMakefile:
use 5.005005;
use ExtUtils::MakeMaker;
# See lib/ExtUtils/MakeMaker.pm for details of how to influence
# the contents of the Makefile that is written.
WriteMakefile(
    'INSTALLDIRS' => 'site',
    ...as before...
);
The valid values for this parameter are perl, site
, and vendor
. Of the three, site
is really the only good choice if we want to keep our own modules from entering the official Perl library. Note that we will need to have permission to actually install the file anywhere under the standard Perl library root. Once the installation is complete, we should be able to see details of it by running perldoc perllocal
.
Alternatively, to install a module into our own separate location, we can supply the LIB
or PREFIX
parameters when we create the makefile
. For example, to install modules into a master library directory lib/perl
in our home directory on a Unix system, we could type
> cd Installable-Module-0.01
> perl Makefile.PL PREFIX=~/lib/perl
> su
Password:
# make install
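The PREFIX parameter overrides the initial part of all installation paths, allowing installation into a different location. The various installation locations for modules, manual pages, and scripts are given sensible defaults derived from this initial path. Individual installation paths can then be overridden if necessary with INSTALL-prefixed parameters such as INSTALLARCHLIB, INSTALLBIN, INSTALLSCRIPT, INSTALLMAN1DIR, and INSTALLMAN3DIR. Each has an INST_ counterpart that names the staging directory into which files of that kind are built before installation: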
INST_ARCHLIB: Architecture-dependent files
INST_LIB: Primary module source files
INST_BIN: Binary executables
INST_SCRIPT: Scripts
INST_MAN1DIR: Section 1 manual pages
INST_MAN3DIR: Section 3 manual pages
PREFIX is therefore ideal for installing a module into a private local directory for testing.
The LIB parameter allows the implementation files of a module to be installed in a nonstandard place, while accompanying files such as scripts and manual pages go to their default locations or to those derived from PREFIX. This makes the module findable by documentation queries (for example, the man command on Unix) while allowing it to reside elsewhere.
The makefile generated by ExtUtils::MakeMaker contains an impressively large number of different make targets. Amongst them is the test target, which executes the test script test.pl generated by h2xs. To add a test stage to our package, we only have to edit this file to add the tests we want to carry out.
Tests are carried out under the aegis of the Test::Harness module, which we will cover in Chapter 17 and which is particularly aimed at testing installable packages. The Test::Harness module expects a particular kind of output, which the pregenerated test.pl satisfies with a redundant, automatically succeeding test. To create a useful test, we need to replace this pregenerated script with one that actually carries out tests and produces output that complies with what the Test::Harness module expects to see.
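As a rough sketch of what such a replacement script might look like, the following uses the standard Test::More module to produce the "ok" and "not ok" lines that Test::Harness reads. The add subroutine is a hypothetical stand-in defined inline so that the example is self-contained; a real test script would instead load Installable::Module and exercise its actual interface.

```perl
use strict;
use warnings;
use Test::More tests => 3;    # declare how many tests we plan to run

# Hypothetical stand-in for a subroutine that our module would provide.
sub add {
    my ($x, $y) = @_;
    return $x + $y;
}

ok(defined &add, 'add is defined');
is(add(2, 3),  5, 'add sums two positive numbers');
is(add(-1, 1), 0, 'add handles a negative operand');
```

Declaring the number of tests up front lets the harness detect a script that dies partway through, since fewer results arrive than were planned.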
Once we have a real test script that carries out genuine tests in place, we can use it by invoking the test
target, as we saw in the installation examples earlier:
> make test
By default the install
target does not include test
as a dependent target, so we do need to run it separately if we want to be sure the module works. The CPAN
module automatically carries out the test stage before the install stage, however, so when we install modules using it we don't have to remember the test stage.
Once a module has been successfully turned into a package (and preferably reinstalled, tested, and generally proven), it is potentially a candidate for CPAN. Uploading a module to CPAN allows it to be shared among other Perl programmers, commented on, improved, and made part of the library of Perl modules available to all within the Perl community.
This is just the functional stage of creating a module for general distribution, however. Packages cannot be uploaded to CPAN arbitrarily. First we need to get registered so we have an upload directory to upload things into. It also helps to discuss modules with other programmers and see what else is already available that might do a similar job. It definitely helps to choose a good package name and to discuss the choice first. Remember that Perl is a community as well as a language; for contributions to be accepted (and indeed, noticed at all), it helps to talk about them.
Information on registration and other aspects of contributing to CPAN is detailed on the Perl Authors Upload Server (PAUSE) page at http://cpan.org/modules/04pause.html (or any mirror). The module distribution list is at http://cpan.org/modules/01modules.index.html, while details of all the modules currently held by CPAN and its many mirrors are in http://cpan.org/modules/03modlist.data.gz.
In this chapter, we explored the insides of modules and packages and how to write our own modules. We saw how packages affect the symbol table and looked at a few ways to take advantage of this knowledge to examine and even manipulate package contents programmatically.
We then looked at Perl's special phase transition blocks, BEGIN, CHECK, INIT
, and END
, and how we can use them to create modules that can initialize themselves and carry out various kinds of checks between phases of the interpreter's operation.
The next main topic discussed was the autoloading mechanism, which allows us to intercept calls to subroutines that do not exist and define them on the fly if we want to. From there we looked at importing and exporting, completing the discussion started in the previous chapter from the viewpoint of the module being imported from. We looked at the basics of the import mechanism, how we can use it to do other things than importing, and how to use the Exporter
module to handle many common import and export requirements.
We also looked at package attributes and implementing our own attributes for subroutines and variables with the Attribute::Handlers module. Like the import/export mechanism, this completes, from the perspective of the implementing module, the discussion started in Chapter 7, where we introduced Perl's built-in attributes.
Finally, we went through the process of creating an installable module, including the use of h2xs
to create the initial working directory and files, bundling the completed module into a distributable archive, and then installing the archive on another platform.