CHAPTER 10

Inside Modules and Packages

We have already seen how modules work from the user's perspective through the do, require, and use statements. We have also seen the relationship between modules and packages. In this chapter, we examine the internals of implementing modules.

In order for modules to be easily reusable, they need to be well behaved. That means not defining variables and subroutines outside their own package unless explicitly asked to do so. It also means not allowing external definitions to be made unless the design of the module permits it. Exporting definitions from one package into another allows them to be used without prefixing the name of the original package, but it also runs the risk of a namespace collision, so both the module and the application need to be able to cooperate, to control what happens. They can do this through the import mechanism, which defines the interface between the module and the application that uses it.

At the application end, we specify our requirements with the use or require statements, with which we can pass a list of symbols (often, but not necessarily, subroutine names). Conversely, at the module end, we define an import subroutine to control how we respond to import (or, from our point of view, export) requests. The Exporter module provides one such import subroutine that handles most common cases for us. Either way, the interface defined through the import mechanism abstracts the actual module code, making it easier to reuse the module, and minimizing the chances of an application breaking if we make changes to the module.

Perl provides a number of special blocks that can be used in any Perl code but which are particularly useful for packages, and accordingly we spend some time discussing them in this chapter. The BEGIN, END, INIT, and CHECK blocks allow a module to define initialization and cleanup code to be automatically executed at key points during the lifetime of the module. The AUTOLOAD subroutine permits a package to react to unknown subroutine calls and stand in for them.

Modules and Packages

A package declaration is the naming of a new namespace in which further declarations of variables and subroutines are placed. A module is simply a library file that contains such a declaration and an associated collection of subroutines and variables. The link between package name and module file would therefore appear to be a strong one, but this is not necessarily true. As we saw in the last chapter, the module that is loaded corresponds to the named package, but this does not imply that the module actually defines anything in that package. As an example, many of Perl's pragmatic modules are purely concerned with compile-time semantics and do not contribute anything new to the symbol table.

In fact, a module doesn't have to include a package declaration at all. Any constructs it creates will simply be put into whatever package was in effect at the time it was loaded (main, if none has been declared yet). However, this is unusual. By including a package declaration we are able to use many different modules without worrying about clashes between similarly named definitions. The notable exceptions are libraries that include several modules from one master module; the master defines the namespace and then loads all its children, none of which declare a namespace of their own and so place their definitions into the parent's namespace.

Similarly, the package declarations in a module don't have to correspond to the name of the module supplied to the use statement—they just usually do. In some cases, we might use a different package or define symbols in more than one package at the same time. Since a module file usually only contains declarations of subroutines and variables, rather than code that actually does something, executing it has no visible effect. However, subroutines and package variables are added to the symbol table, usually under the namespace defined by the package declaration.

Whatever else it does, in order for a module to be loaded successfully by either require or use, it must return a true value. Unless the module actually contains a return statement outside a subroutine definition, this must be the last statement in the file. Since a typical module contains mainly subroutine definitions (which don't return anything), we usually need to add an explicit return value to let Perl know that the module is happy. We can also add code for initialization that does do something actively and have that return a conditional value. This means we can, for example, programmatically have a module fail compilation if, say, an essential resource like a configuration file that it needs is missing.

Most modules do not have any initialization code to return a value, so in general we satisfy Perl by appending a 1—or in fact any true value—to the end of the module file. As the last statement in the file, this is returned to the use or require that triggered the loading of the module file, which tests the value for truth to determine whether or not the module loaded successfully. Taking all this together, the general form of a module file is simply

package My::Module;

... use other modules ...
... declare global variables ...
... define subroutines ...

1;

Note that here "global variables" can mean either lexically scoped file globals (which are global to the file but not accessible from outside it) or package variables that are in the namespace of the package but accessible from elsewhere by qualifying their name with a prefix of the package name.

Although Perl does not force the file name to follow the package name, this module would most likely be called Module.pm and placed in a directory called My, which in turn can be located anywhere that Perl is told to look for modules. This can be any of the paths in @INC, in our own personal location provided to Perl at run time through the use lib pragma, the -I option, or the PERL5LIB environment variable, or even added by other Perl modules.
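The true-value rule described earlier in this section can be seen in action with a small experiment. The following is only a sketch: the module names HappyDemo and GrumpyDemo, and the write_module helper, are invented for illustration, and the two module files are written to a temporary directory that is added to @INC at run time.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Temporary directory that Perl will search for modules.
my $dir = tempdir(CLEANUP => 1);
unshift @INC, $dir;

# Hypothetical helper: write a module source file into the temporary directory.
sub write_module {
    my ($file, $source) = @_;
    my $path = File::Spec->catfile($dir, $file);
    open(my $out, '>', $path) or die "Cannot write $path: $!";
    print {$out} $source;
    close($out) or die "Cannot close $path: $!";
}

# One module ends with a true value, the other with a false one.
write_module('HappyDemo.pm',  "package HappyDemo;\nsub hello { 'hi' }\n1;\n");
write_module('GrumpyDemo.pm', "package GrumpyDemo;\nsub hello { 'hi' }\n0;\n");

# require dies if the file does not return true, so trap it with eval.
my $happy_ok  = eval { require HappyDemo;  1 } ? 1 : 0;
my $grumpy_ok = eval { require GrumpyDemo; 1 } ? 1 : 0;
my $error     = $@;

print "HappyDemo loaded:  $happy_ok\n";
print "GrumpyDemo loaded: $grumpy_ok\n";
print "Error: $error";
```

The second require fails with a message saying that the file did not return a true value, even though the module compiled without problems.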

Manipulating Packages

The package directive changes the default namespace for variables and subroutine declarations, but we are still free to define our own fully qualified definitions if we choose. For instance, rather than creating a module file containing

package My::Module;

sub mysub {
    return "Eep!\n";
}

1;

we could, with equal effect (but losing some maintainability), declare the subroutine to be in the package explicitly:

sub My::Module::mysub {
    return "Eep!\n";
}

1;

It isn't very likely that we would do this in reality—if the subroutine was copied to a different source file, it would need to be renamed. It has possibilities if we are generating subroutines on the fly, a subject we will cover in more detail when we discuss autoloading, but otherwise a package declaration is far more convenient. The same goes for our and use vars declarations, which are simply shorthand that use the package declaration to omit the full variable name.
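As a rough illustration of generating subroutines under fully qualified names, the following sketch installs three subroutines into a package without ever using a package declaration. The package name My::Paint and the colour names are hypothetical, chosen only for this example.

```perl
use strict;
use warnings;

# Hypothetical package and colour names, chosen only for illustration.
for my $colour (qw(red green blue)) {
    no strict 'refs';    # we deliberately build the glob name from a string
    *{"My::Paint::$colour"} = sub {
        my ($text) = @_;
        return "$colour: $text";
    };
}

# The generated subroutines are called by their fully qualified names.
my $door = My::Paint::red('door');
my $sky  = My::Paint::blue('sky');

print "$door\n";
print "$sky\n";
```

Each anonymous subroutine is a closure over its own $colour, so one loop body stands in for three nearly identical definitions.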

Finding Package Names Programmatically

It is occasionally useful for a subroutine to know the name of the package in which it is defined. Since this is a compile-time issue (package declarations are lexical, even though they affect run-time scope), we could manually copy the package name from the top of the module or whichever internal package declaration the subroutine falls under.

However, this is prone to failure if the package name changes at any point. This is a more serious problem than it might at first appear because it will not necessarily lead to a syntax error.
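To make the danger concrete, here is a small sketch of a package that has been renamed while one hard-coded reference to the old name was left behind. The names My::Renamed and My::Original, and both subroutines, are hypothetical. The stale name is perfectly legal Perl; it simply refers to a different variable.

```perl
use strict;
use warnings;

package My::Renamed;    # hypothetical: this package used to be My::Original

our $count = 0;

sub bump_stale {
    no warnings 'once';          # silence the "used only once" hint
    $My::Original::count++;      # stale hard-coded name: a different variable
    return $count;               # the real counter has not moved
}

sub bump_current {
    $count++;                    # refers to $My::Renamed::count
    return $count;
}

package main;

my $after_stale   = My::Renamed::bump_stale();
my $after_current = My::Renamed::bump_current();

print "after stale bump:   $after_stale\n";
print "after current bump: $after_current\n";
```

The program compiles and runs without error under use strict, yet the first call silently increments a variable in a package that no longer has any other contents.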

To avoid this kind of problem, we should avoid ever naming the package explicitly except in the package declaration itself. Within the code, we can instead use the special bareword token __PACKAGE__ like so:

sub self_aware_sub {
    print "I am in the ",__PACKAGE__," package.\n";
}

As a more expressive but less functional example, the following series of package declarations shows how the value produced by __PACKAGE__ changes if more than one package is present in a given file:

package My::Module;
print __PACKAGE__,"\n";
package My::Module::Heavy;
print __PACKAGE__,"\n";
package My::Module::Light;
print __PACKAGE__,"\n";
package A::Completely::Different::Package;
print __PACKAGE__,"\n";

Each time the __PACKAGE__ token is printed out, Perl expands it into the current package name, producing My::Module, then My::Module::Heavy, and so on.

When Perl loads and compiles a file containing this token, the interpreter first scans and substitutes the real package name for any instances of __PACKAGE__ it finds before proceeding to the compilation stage. This avoids any potential breakages if the package name should change.

Manipulating Package Names Programmatically

The Symbol module provides subroutines for creating and manipulating variable names with respect to packages without dealing with the package name directly, notably the gensym and qualify subroutines.

The gensym subroutine generates and returns a reference to a fully anonymous typeglob—that is, a typeglob that does not have an entry anywhere in any symbol table. We can use the anonymous typeglob as we like, for example, as a filehandle (though IO::Handle does this better in these more enlightened days, and, as a point of fact, uses gensym underneath). It takes no arguments and just returns the reference:

use Symbol;

my $globref = gensym;
open ($globref, $filename);
...

More useful is the qualify subroutine, which provides a quick and convenient way to generate fully qualified names (and therefore symbolic references) for variables from unqualified ones. It operates on strings only, and with one argument it generates a name in the current package. For example:

#!/usr/bin/perl
# symbol1.pl
use warnings;

use Symbol;

my $fqname = qualify('scalar');
$$fqname = "Hello World\n";
print $scalar;   # produces 'Hello World'

Since this is a simple script without a package declaration, the variable created here is actually called $main::scalar. If we supply a package name as a second argument to qualify, it places the variable into that package instead.

#!/usr/bin/perl
# symbol2.pl
use warnings;

use Symbol;

my $fqname = qualify('scalar','My::Module');
$$fqname = "Hello World\n";
print $My::Module::scalar;

In both cases, qualify will only modify the name of the variable passed to it if it is not already qualified. It will correctly qualify special variables and the standard filehandles like STDIN into the main package, since these variables always exist in main, wherever they are used. This makes it a safer and simpler way than trying to make sure our symbolic references are correct and in order when we are assembling them from strings.
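The qualification rules just described can be checked with a short sketch. The variable name counter and the package names are arbitrary; only the qualify subroutine itself comes from the Symbol module.

```perl
use strict;
use warnings;
use Symbol qw(qualify);

my @names = (
    qualify('counter'),                      # unqualified: current package
    qualify('counter', 'My::Module'),        # unqualified: named package
    qualify('Other::counter', 'My::Module'), # already qualified: unchanged
    qualify('STDIN', 'My::Module'),          # standard filehandle: always main
    qualify('ENV', 'My::Module'),            # special variable: always main
);

print "$_\n" for @names;
```

Run as a plain script (that is, from package main), this prints main::counter, My::Module::counter, Other::counter, main::STDIN, and main::ENV, one per line.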

Unfortunately, qualify is not very useful if we have strict references enabled via use strict, since the names it returns can only be used as symbolic references. Instead, we can use qualify_to_ref, which takes a symbolic name and turns it into a reference to the corresponding typeglob for us, using the same rules as qualify to determine the package name:

#!/usr/bin/perl
# symbol3.pl
use warnings;
use strict;

use Symbol;

my $fqref = qualify_to_ref('scalar','My::Module');
${*$fqref} = "Hello World\n";   # scalar slot of the referenced typeglob
print $My::Module::scalar;

All three of these examples work but produce a warning from Perl that the variable main::scalar (or My::Module::scalar) is only used once, which is true. Perl doesn't see that we defined the variable name through a reference, so it (correctly) points out that we appear to have used a variable we haven't defined. The correct thing to do would be to declare the variable so we can use it without complaint, as this modified example, complete with embedded package, illustrates:

#!/usr/bin/perl
# symbol4.pl
use warnings;
use strict;

use Symbol;

my $fqref = qualify_to_ref('scalar','My::Module');
${*$fqref} = "Hello World\n";   # scalar slot of the referenced typeglob
print My::Module::get_scalar();

package My::Module;

our $scalar;   # provide access to scalar defined above

sub get_scalar {
    return $scalar;
}

Removing a Package

While it is rare that we would want to remove a package during the course of a program's execution, it can be done by removing all traces of the package's namespace from the symbol table hierarchy. One reason to do this might be to free up the memory used by the package variables and subroutines of a module no longer required by an application. For example, to delete the My::Module package, we could write

my $table = *{'My::Module::'}{'HASH'};
undef %$table;
my $parent = *{'My::'}{'HASH'};
my $success = delete $parent->{'Module::'};

This is more than a little hairy, but it basically boils down to deleting the entries of the symbol table for My::Module and removing the Module namespace entry from the My namespace. We empty the hash explicitly because we store the result of the delete in a variable, and would otherwise keep the symbol table alive along with it: Perl cannot reuse the memory allocated by the table, or by the references contained in it, while something still holds a reference to it. Emptying the table first means that delete returns an empty table on success, which is still good for a Boolean test but avoids trailing a complete and unrecycled symbol table along with it.

Fortunately, the Symbol module provides a delete_package function that does much the same thing but hides the gory details. It also allows us more freedom as to how we specify the package name (we don't need the trailing colons, for instance, and it works on any package). To use it, we need to import it specifically, since it is not imported by default:

use Symbol qw(delete_package);

...

print "Deleted!\n" if delete_package('My::Module');

The return value from delete_package is undefined if the delete failed, or a reference to the (now empty) namespace if it succeeded.

If we wanted to create a package that we could remove programmatically, we could do so by combining delete_package with an unimport subroutine; see "Importing and Exporting" later in the chapter for an example.
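The effect of delete_package can be observed with a throwaway package. This is a minimal sketch: My::Scratch and the has_sub helper are hypothetical names, and the package is created with a string eval so that nothing else in the program refers to it at compile time.

```perl
use strict;
use warnings;
use Symbol qw(delete_package);

# Build a throwaway package at run time. My::Scratch is a hypothetical name.
eval q{
    package My::Scratch;
    sub hello { return "hello" }
    1;
} or die $@;

# Hypothetical helper: is a fully qualified subroutine name currently defined?
sub has_sub {
    my ($name) = @_;
    no strict 'refs';
    return defined(&{$name}) ? 1 : 0;
}

my $before = has_sub('My::Scratch::hello');
delete_package('My::Scratch');
my $after  = has_sub('My::Scratch::hello');

print "defined before delete: $before\n";
print "defined after delete:  $after\n";
```

Code that was compiled while the package still existed may keep its own references to the package's contents, so this technique is safest for packages that are only ever reached by name at run time.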

BEGIN Blocks, END Blocks, and Other Animals

Perl defines four different kinds of special blocks that are executed at different points during the compile or run phases. The most useful of these is BEGIN, which allows us to compile and execute code placed in a file before the main compilation phase is entered. At the other end of the application's life, the END block is called just as the program exits. We can also define CHECK and INIT blocks, which are invoked at the end of the compilation phase and just prior to the execution phase respectively, though these are considerably rarer.

All four blocks look and behave like subroutines, only without the leading sub. Like signal handlers, they are never called directly by code but by the interpreter when it passes from one phase of existence to another. The distinction between the block types is simply that each is executed at a different phase transition. The precise order is

BEGIN
(compile phase)
CHECK
INIT
(run phase)
END

Before we examine each block type in more detail, here is a short program that demonstrates all four blocks in use and also shows how they relate to the main code and a __DIE__ signal handler:

#!/usr/bin/perl
# blocks.pl
use warnings;
use strict;

$SIG{__DIE__} = sub {
    print "Et tu Brute?\n";
};

print "It's alive!\n";
die "Sudden death!\n";

BEGIN {
    print "BEGIN\n";
}

END {
    print "END\n";
}

INIT {
    print "INIT\n";
}

CHECK {
    print "CHECK\n";
}

When run, this program prints out


BEGIN
CHECK
INIT
It's alive!
Et tu Brute?
Sudden death!
END

Note that in Perl versions before 5.6, CHECK blocks are ignored entirely, so we would not see the CHECK line. Apart from this, the program would run perfectly. Of course, if the CHECK block needs to perform vital functions, we may have a problem; therefore CHECK blocks are best used for checks that are better made after compilation but which can also be made, less efficiently perhaps, at run time too.

We can define multiple instances of each block; each one is executed in order, with BEGIN and INIT blocks executing in the order in which they are defined (top to bottom) and CHECK and END blocks executed in reverse order of definition (bottom to top). The logic for END and CHECK blocks executing in reverse is clearer once their purpose is understood. For example, BEGIN blocks allow modules to initialize themselves and may be potentially dependent upon the initialization of prior modules. Corresponding END blocks are executed in the reverse order to allow dependent modules to free their resources before earlier modules free the resources on which they rely—last in, first out.

As an example, consider a network connection to a remote application—we might open a connection in one BEGIN block and start a new session in another, possibly in a different module. When the application ends, we need to stop the session and then close the connection—the reverse order. The order in which the modules are loaded means the END blocks will execute in the correct order automatically. The new CHECK block has a similar symmetry with BEGIN, but around the compilation phase only, not the whole lifetime of the application. Likewise, INIT pairs with END across the run-time phase.

Additional blocks read in by do or require are simply added to the respective list at the time they are defined. Then, if we have a BEGIN and END block and we require a module that also has a BEGIN and END block, our BEGIN block is executed first, followed by the module's BEGIN block. At the end of the script, the module's END block is called first, then ours. However, if we include a module with use rather than require, the order of BEGIN blocks is determined by the order of the use relative to our BEGIN block and any other use statements. This is because use creates a BEGIN block of its own, as we have already seen.

Blocks nest too—a BEGIN inside a BEGIN will execute during the compilation phase of the outer block. A chain of use statements, one module including the next at compile time, does this implicitly, and similarly chains the END blocks (if any).
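The ordering rules for multiple blocks of the same type can be checked with a short script. This is a sketch that assumes the code is run as an ordinary program file, since CHECK and INIT blocks are not run for code compiled after the main program has started. The labels are arbitrary.

```perl
use strict;
use warnings;

our @order;    # package variable, so compile-time blocks can fill it

BEGIN { push @order, 'BEGIN 1' }
CHECK { push @order, 'CHECK 1' }
INIT  { push @order, 'INIT 1'  }
BEGIN { push @order, 'BEGIN 2' }
CHECK { push @order, 'CHECK 2' }
INIT  { push @order, 'INIT 2'  }
END   { print "END 1\n" }
END   { print "END 2\n" }

# By the time the main code runs, BEGIN, CHECK, and INIT are all done.
print join(', ', @order), "\n";
```

The main code prints BEGIN 1, BEGIN 2, CHECK 2, CHECK 1, INIT 1, INIT 2, and the END blocks then report END 2 followed by END 1 as the program exits.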

BEGIN Blocks

If we need to perform initialization within a module before it is used, we can place code inside the source file to perform whatever tasks we need to do, for example, loading a configuration file:

package My::Module;

return initialize();

sub initialize {
    ...
}

... other sub and var declarations ...

This module doesn't need a 1 at the end because its success or failure is returned explicitly. However, the initialization only takes place once the module starts to execute; we can't predefine anything before defining critical subroutines. A BEGIN block solves this problem. It forces execution of a module's initialization code before the rest of it compiles.

As an example, here is a module that computes a list of variables to export at compile time and exports them before the code that uses the module compiles. For simplicity, we have used a local hash to store the variable definitions and kept it to scalars, but it is easily extensible:

# My/SymbolExporter.pm

package My::SymbolExporter;

use strict;

BEGIN {
    use vars '@SYMBOLS';
    # temporary local configuration - we could read from a file too
    my %conf = (
        e => 'mc2',
        time => 'money',
        party => 'a good time',
    );

    sub initialize {
        no strict 'refs';
        foreach (keys %conf) {
            # define variable by assigning a scalar reference to the typeglob
            *{__PACKAGE__.'::'.$_} = \$conf{$_};

            # add variable (with leading '$') to export list
            push @SYMBOLS, "\$$_";
        }
        return 1;
    }

    return undef unless initialize;
}

use Exporter;
our @ISA = qw(Exporter);
our @EXPORT = ('@SYMBOLS',@SYMBOLS);

1;

Ordinarily, we'd use the Exporter module or an import method to deal with this sort of problem, but these are really just extensions to the basic BEGIN block. Just to prove it works, here is a script that uses this module and prints out the variables it defines:

#!/usr/bin/perl
# symbolexportertest.pl
use warnings;
use strict;

use My::SymbolExporter;

print "Defined: @SYMBOLS\n";

print "e = $e\n";
print "time = $time\n";
print "party = '$party'\n";

Another use of BEGIN blocks is to preconfigure a module before we use it. For example, the AnyDBM_File module allows us to reconfigure its @ISA array by writing something like the following:

BEGIN {
    @AnyDBM_File::ISA = qw(GDBM_File SDBM_File);
}

use AnyDBM_File;

Inside the module, the code simply checks to see if the variable is defined before supplying a default definition:

our @ISA = qw(NDBM_File DB_File GDBM_File SDBM_File ODBM_File) unless @ISA;

It is vital that we put our definition in a BEGIN block so that it is executed and takes effect before the use statement is processed. Without this, the implicit BEGIN block of the use statement would cause the module to be loaded before our definition is established despite the fact it appears first in the source.
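The timing difference can be demonstrated without loading any real module. In this sketch, the final BEGIN block stands in for a later use statement, and all three variable names are hypothetical.

```perl
use strict;
use warnings;

our (@plain, @early, @seen_at_compile_time);

@plain = ('run-time value');                 # executes in the run phase
BEGIN { @early = ('compile-time value') }    # executes as soon as it is parsed

# This block stands in for a later "use" statement, which is also
# executed at compile time. It records how many elements each array
# holds at that moment.
BEGIN { @seen_at_compile_time = (scalar(@plain), scalar(@early)) }

print "plain assignment visible at compile time: $seen_at_compile_time[0]\n";
print "BEGIN assignment visible at compile time: $seen_at_compile_time[1]\n";
```

The plain assignment is reported as 0 and the BEGIN assignment as 1: even though the plain assignment appears first in the source, it has not yet run when the stand-in for use executes.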

END Blocks

The opposite of BEGIN blocks are END blocks. These are called just as Perl is about to exit (even after a __DIE__ handler) and allow a module to perform closing duties like cleaning up temporary files or shutting down network connections cleanly:

END {
    unlink $tempfile;
    shutdown $socket, 2;
}

The value that the program is going to exit with is already set in the special variable $? when the END blocks are processed, so we can modify $? to change it if we choose. However, END blocks are not run if we terminate on an uncaught signal and (obviously) not if we use exec to replace the application with a new one.
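As a small illustration of changing the exit value from an END block, the following sketch runs a one-line child program with the same Perl interpreter (found through $^X) and inspects its exit status. The child program and the status value 3 are arbitrary choices for the example.

```perl
use strict;
use warnings;

# The child asks to exit with status 0, but its END block overrides that.
my $child = 'END { $? = 3 } exit 0;';

# List-form system avoids the shell; $^X is the running Perl interpreter.
system($^X, '-e', $child);

# The child's exit status is in the high byte of $?.
my $status = $? >> 8;
print "child exited with status $status\n";
```

The parent reports a status of 3, not the 0 passed to exit.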

The CHECK and INIT Blocks

The CHECK and INIT blocks are considerably rarer than BEGIN and END, but they are still occasionally useful.

CHECK blocks execute in reverse order just after the compilation phase ends and correspond to the END blocks, which run at the end of the run phase. Their purpose is to perform any kind of checking that might be required of the compiled source before proceeding with the run phase. (However, they are not available in Perl prior to version 5.6.)

# Perl >= 5.6.0 for CHECK blocks
use 5.6.0;

# check that conditional compilation found at least one implementation
CHECK {
    die "No platform recognized" unless
        defined &Unixsub or
        defined &win32sub or
        defined &macsub or
        defined &os2sub;
}

This block will be called as soon as Perl has finished compiling all the main code (and after all BEGIN blocks have been executed), so it is the ideal point to check for the existence of required entities before progressing to the execution stage. By placing the code in a CHECK block rather than in the module's main source, we give it a chance to object before other modules—which may be used before it—get a chance to run.

The INIT blocks execute just before the run phase and just after the compile phase (and after any CHECK blocks, if any are defined). They execute in order of definition and correspond to BEGIN blocks, which run just before the compile phase. Their purpose is to initialize variables and data structures before the main run phase starts:

# establish a package variable for all modules
INIT {
    $My::Module::start_time = time;
}

Both block types offer little advantage over simply placing code at the top of a file when only one of either type exists. However, if several modules define their own CHECK and INIT blocks, Perl will queue them up and run through them all before commencing execution of the main application code.

Autoloading

Normally when we try to call a nonexistent subroutine, Perl reports an error: at compile time where it can (for example, a bareword call under use strict), and otherwise at run time when the call is reached. However, by defining a special subroutine called AUTOLOAD, we can intercept nonexistent calls and deal with them in our own way at run time.

Autoloading is a powerful aspect of Perl. When used wisely, it provides us with some very handy techniques, such as the ability to write one subroutine that handles many different cases and masquerade it as many subroutines each handling a single case. This is a great technique for allowing a module to be powerful and flexible without the expense of creating many possibly redundant routines with a corresponding cost in memory. We can also, with deft usage of the eval and sub keywords, generate new subroutines on demand.

The cost of autoloading is twofold, however: first, calling a subroutine not yet compiled will incur a speed penalty at that point, since Perl must call the AUTOLOAD subroutine to resolve the call. Second, it sidesteps the normal compile-time checks for subroutine existence, since there is no way for Perl to know if the subroutine name is valid or not until an attempt is made to call it during execution.

Several modules in the standard library take advantage of autoloading to delay the compilation of subroutines until the moment they are required. The autouse module introduced in the last chapter even provides a simple generic interface that delays loading an entire module until one of its subroutines is called. However, there is no granularity: when the module is loaded, it is all loaded at once. The AutoSplit and AutoLoader modules solve this problem. AutoSplit carves up a module file into separate subroutines, which the AutoLoader module can subsequently read and compile at the moment each routine is required. These modules are typically used during the distribution and installation of modules, since the extraction of subroutines from the original source by AutoSplit is a manual process. The SelfLoader module provides a simpler but less efficient solution. It allows us to store code as text inside the module file, compiling it at the time it is needed. While not as efficient as AutoLoader, which does not even load the subroutine code if it doesn't need it, it does not need any additional processing steps to work.

Autoloading Subroutines

Autoloading is automatically enabled in any package in which we define a subroutine called AUTOLOAD. This subroutine will automatically intercept all attempts to call nonexistent subroutines and will receive the arguments for each nonexistent subroutine. At the same time, the name of the missing subroutine is placed in the special package variable $AUTOLOAD. To illustrate, here is a short example that intercepts nonexistent subroutine calls and prints out the name and arguments passed:

#!/usr/bin/perl
# autoload.pl
use warnings;
use strict;

sub AUTOLOAD {
    our $AUTOLOAD;   # "use vars '$AUTOLOAD'" for Perl < 5.6
    $" = ',';
    print "You called '$AUTOLOAD(@_)'\n";
}

fee('fie','foe','fum');
testing(1,2,3);

When run, this script should produce


You called 'main::fee(fie,foe,fum)'
You called 'main::testing(1,2,3)'

We use our to declare interest in the package's $AUTOLOAD variable (Perl prior to version 5.6 needs to use use vars instead). Since only the AUTOLOAD subroutine needs to know the value of $AUTOLOAD, we place the our declaration inside the subroutine to define a temporary alias.

In general, creating an autoloader stymies compile-time checkers. But interestingly, defining a prototype for the autoloader is perfectly valid and can help eliminate subroutine calls that are simply a result of mistyping a call to a real subroutine. If all the subroutine calls we want to intercept have the same prototype, then calls whose parameters do not match the prototype will still fail at compile time, since Perl knows that the AUTOLOAD subroutine is not interested in handling them. In the preceding example, both example calls use three scalar arguments, so a prototype of ($$$) would be appropriate. Of course, a mistyped subroutine call can still match the prototype, so this does not completely save us from mistakes.

We can use AUTOLOAD subroutines in a variety of ways that break down into one of two general approaches: use the AUTOLOAD subroutine as a substitute for a collection of subroutines, or use the AUTOLOAD subroutine to define missing subroutines on the fly.

Using an AUTOLOAD Subroutine As a Substitute

The first and simplest use of the autoloader is simply to stand in for another subroutine or collection of similar subroutines. We can define the interface to a module in terms of these other calls but actually implement them in the AUTOLOAD subroutine. The disadvantage of this is that it takes Perl slightly longer to carry out the redirection to the autoloader subroutine (although conversely the compile time is faster). The advantage is that we can replace potentially hundreds of subroutine definitions with just one. This has benefits in maintainability as well as startup time.

Here is a simple example that illustrates the general technique with a few simple statistical calculations that sum, average, and find the biggest and smallest of a list of supplied numeric values:

#!/usr/bin/perl
# autostat.pl
use warnings;
use strict;

use Carp;

sub AUTOLOAD {
    our $AUTOLOAD;
    my $result;
    SWITCH: foreach ($AUTOLOAD) {
        /sum/ and do {
            $result = 0;
            map { $result+= $_ } @_;
            last;
        };
        /average/ and do {
            $result = 0;
            map { $result+= $_ } @_;
            $result/=scalar(@_);
            last;
        };
        /biggest/ and do {
            $result = shift;
            map { $result = ($_ > $result)?$_:$result } @_;
            last;
        };
        /smallest/ and do {
            $result = shift;
            map { $result = ($_ < $result)?$_:$result } @_;
            last;
        }
    }
    croak "Undefined subroutine $AUTOLOAD called" unless defined $result;
    return $result;
}

my @values = (1,4,9,16,25,36);

print "Sum: ",sum(@values),"\n";
print "Average: ",average(@values),"\n";
print "Biggest: ",biggest(@values),"\n";
print "Smallest: ",smallest(@values),"\n";
print "Oddest: ",oddest(@values),"\n";

This AUTOLOAD subroutine supports four different statistical operations and masquerades under four different names. If we call any of these names, then the autoloader performs the requested calculation and returns the result. If we call any other name, it croaks and exits. We use croak from the Carp module, because we want to return an error for the place from which the AUTOLOAD subroutine was called, as that is where the error really is.

This script also illustrates the problem with autoloading: errors in subroutine names are left for the autoloader to catch at run time. Without an AUTOLOAD subroutine, Perl itself would reject the call to oddest as an undefined subroutine (and would do so at compile time if it were written as a bareword under use strict). With this script, it isn't caught until the autoloader is actually called and discovers that it isn't a name that it recognizes.

The preceding example demonstrates the general principle of substituting for a collection of other subroutines, but it doesn't really provide any benefit; it would be as easy to define the subroutines individually (or indeed just get them from List::Util, but that's beside the point), as the implementations are separate within the subroutine. However, we can be more creative with how we name subroutines. For example, we can use an autoloader to recognize and support the prefix print_ for each operation. Here is a modified version of the previous example that handles both the original four operations and four new variants that print out the result as well:

#!/usr/bin/perl
# printstat.pl
use warnings;
use strict;

use Carp;

sub AUTOLOAD {
    our $AUTOLOAD;

    my $subname; # get the subroutine name
    $AUTOLOAD =~ /([^:]+)$/ and $subname = $1;

    my $print; # detect the 'print_' prefix
    $subname =~ s/^print_// and $print = 1;

    my $result;
    SWITCH: foreach ($subname) {
        /^sum$/ and do {
            $result = 0;
            map { $result+= $_ } @_;
            last;
        };
        /^average$/ and do {
            $result = 0;
            map { $result+= $_ } @_;
            $result/= scalar(@_);
            last;
        };
        /^biggest$/ and do {
            $result = shift;
            map { $result = ($_>$result)?$_:$result } @_;
            last;
        };
        /^smallest$/ and do {
            $result = shift;
            map { $result = ($_<$result)?$_:$result } @_;
            last;
        };
    }
    croak "Undefined subroutine $subname called" unless defined $result;
    print ucfirst($subname),": $result\n" if $print;
    return $result;
}

my @values = (1,4,9,16,25,36);

print_sum(@values);
print_average(@values);
print_biggest(@values);
print_smallest(@values);

The subroutine name actually passed in the $AUTOLOAD variable contains the package prefix, main::, as well. In the previous example, we did not check from the start of the name, so this did not matter. Here we do care though, so we strip all possible package prefixes by extracting from the end of the name as much text as we can, not including a colon. This gives us the unqualified subroutine name.

Now we can detect and remove the print_ prefix. We take advantage of the fact that we are left with just the subroutine name to anchor the regular expressions at the start and end for a little extra efficiency—the first example worked only because we did not use anchors and none of our subroutine names contained another. If we wanted to be even more inventive, we could remove the trailing $ anchors and use a trailing suffix in the subroutine name to further adapt each function.
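The same name handling can also be separated from the arithmetic. The following is a minimal sketch, not part of the original script: the parse_name helper and the %operation dispatch table are names invented here for illustration, and only two of the four operations are shown.

```perl
#!/usr/bin/perl
# dispatchstat.pl - hypothetical dispatch-table variant of printstat.pl
use warnings;
use strict;

use Carp;

# hypothetical dispatch table: unqualified name => implementation
my %operation = (
    sum     => sub { my $total = 0; $total += $_ foreach @_; $total },
    average => sub { my $total = 0; $total += $_ foreach @_; $total / @_ },
);

# split a qualified name such as 'main::print_sum' into its parts
sub parse_name {
    my $qualified = shift;
    my ($package, $name) = $qualified =~ /^(.*)::([^:]+)$/;
    my $print = ($name =~ s/^print_//) ? 1 : 0;
    return ($package, $name, $print);
}

sub AUTOLOAD {
    our $AUTOLOAD;
    my ($package, $name, $print) = parse_name($AUTOLOAD);

    my $code = $operation{$name}
        or croak "Undefined subroutine $name called";
    my $result = $code->(@_);
    print ucfirst($name), ": $result\n" if $print;
    return $result;
}

print_sum(1, 4, 9);
print_average(2, 4, 6);
```

Because the lookup is an exact hash match on the unqualified name, no regular expression anchors are needed, and adding an operation means adding one entry to the table.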

Defining Subroutines on the Fly

The run-time performance penalty of using the autoloader can be mitigated by having the autoloader define a new subroutine to perform the requested task, instead of handling the job itself. Any subsequent calls will now pass directly to the new subroutine and not the autoloader.

As an example, here is a simple autoloader that defines subroutines to return HTML syntax, much in the way that the CGI module can. It isn't nearly as feature-rich as that module, but it is a lot smaller too:

#!/usr/bin/perl
# autofly.pl
use warnings;
use strict;

sub AUTOLOAD {
    our $AUTOLOAD;

    my $tag;
    $AUTOLOAD =~ /([^:]+)$/ and $tag = $1;

    SWITCH: foreach ($tag) {
        /^start_(.*)/ and do {
            eval "sub $tag { return \"<$1>\@_\" }";
            last;
        };
        /^end_(.*)/ and do {
            eval "sub $tag { return \"</$1>\" }";
            last;
        };
        # note the escaping with \ of @_ below so it is not
        # expanded before the subroutine is defined
        eval "sub $tag { return \"<$tag>\@_</$tag>\" }";
    }
    }
    no strict 'refs';
    &$tag; # pass @_ directly for efficiency
}

# generate a quick HTML document
print html(
    head(title('Autoloading Demo')),
    body(ul(
        start_li('First'),
        start_li('Second'),
        start_li('Third'),
    ))
);

This autoloader supports automatic tag completion, as well as generating the start and end of tags if start_ or end_ is prefixed to the subroutine name. It works by defining a subroutine to generate the new tag, then calling it. The first time start_li is called, the autoloader generates a new subroutine called start_li, then calls it. The second time start_li is called, the subroutine already exists, so Perl calls it directly, and the autoloader is not involved.

A little deftness with interpolation is required for the subroutines to be defined correctly. We want the tag name itself interpolated, both as the subroutine name and inside the returned string, but we want interpolation of the passed arguments delayed until the subroutine is actually called. To achieve that, we put double quotes around the returned string but escape both them and @_ so that they are not interpreted when the subroutine is defined—instead they only become active when it is actually called.
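If the escaping feels fragile, one alternative is to build each generator as a closure and install it through a typeglob, so that no string needs to be evaluated. This is a sketch of a different technique from the one used in autofly.pl, offered under the assumption that the generated subroutines only need to capture the tag name; the file name autoclosure.pl is invented here.

```perl
#!/usr/bin/perl
# autoclosure.pl - hypothetical closure-based variant of autofly.pl
use warnings;
use strict;

sub AUTOLOAD {
    our $AUTOLOAD;
    my ($tag) = $AUTOLOAD =~ /([^:]+)$/;

    # the tag name is captured now; @_ is only read when the
    # generated subroutine is eventually called
    my $code;
    if ($tag =~ /^start_(.*)/) {
        my $name = $1;
        $code = sub { "<$name>@_" };
    } elsif ($tag =~ /^end_(.*)/) {
        my $name = $1;
        $code = sub { "</$name>" };
    } else {
        $code = sub { "<$tag>@_</$tag>" };
    }

    no strict 'refs';
    *{$AUTOLOAD} = $code;   # install under the fully qualified name
    goto &$code;            # then run it with the current @_
}

print html(body(start_li('First'), end_li())), "\n";
```

As before, the second call to any tag bypasses the autoloader, because the typeglob assignment has defined a real subroutine of that name.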

Self-Defining Instead of Autoloading Subroutines

A variation on the theme of delaying the definition of subroutines and methods when they are first called is to retrieve their definition from somewhere else and compile it when they are first called. For instance, we may have a large and complex module with many features, of which we may only actually use some. In order to avoid compiling all the subroutines redundantly, we can put aside compiling them until they are called. If they are never called, we need never define them.

The essence of this approach is to define a subroutine initially as a stub only, so that the subroutine is defined in the symbol table but does not as yet implement the feature it is intended to provide. The stub does not contain much code, so it is quick to compile and does not occupy much memory. When the stub is actually called, it compiles and replaces itself with the real subroutine. Here is a short program that shows one way to do this:

#!/usr/bin/perl
# autodefine.pl
use warnings;
use strict;

sub my_subroutine {
    print "Defining sub...\n";

    # uncomment next line and remove 'no warnings' for Perl < 5.6
    # local $^W = 0;
    eval 'no warnings; sub my_subroutine { print "Autodefined!\n"; }';

    &my_subroutine;
}

my_subroutine;  # calls autoloader
my_subroutine;  # calls defined subroutine

Running this program produces


Defining sub...
Autodefined!
Autodefined!

A variant of this approach would be to store all the subroutine definitions in a different file, or after a __DATA__ token, and read the subroutine code from there, which is the approach taken by SelfLoader. Alternatively, we can create a typeglob alias to an evaluated anonymous subroutine, with equal effect to the preceding example:

#!/usr/bin/perl
# globdefine.pl
use warnings;
use strict;
sub my_subroutine {
    print "Defining sub...\n";
    no warnings;
    # remove above and add the following for Perl < 5.6
    # local $^W = 0;

    *my_subroutine = eval {
        sub {
            print "Autodefined!\n";
        }
    };

    &my_subroutine;
}

my_subroutine;
my_subroutine;

In both cases we suppress the redefinition warning by switching off warnings locally with no warnings, or by locally clearing $^W. In this case, we know we want to redefine the subroutine, so we don't need Perl telling us about it.

The drawback of this approach compared to defining an AUTOLOAD subroutine is that we need to define a stub for each subroutine we want to delay compilation for. The advantage is that because a stub is present we don't lose the ability to syntax check subroutine names at compile time. This is particularly useful if we are also providing prototypes for our subroutines, since they clearly cannot be checked at compile time if they are only created at run time (unless they all have the same prototypes and we prototype the autoloader itself, as noted earlier). The contents of the subroutines are only checked at run time, however, an unavoidable compromise if we wish to avoid parsing them until they are used.
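When there are many subroutines to delay, the stubs themselves can be generated in a loop. The sketch below assumes the real source is held as text in a hash; the %source table and the subroutine names double and square are invented for illustration. Each stub compiles its real body on first call, replaces itself, and passes control on.

```perl
#!/usr/bin/perl
# stubtable.pl - hypothetical example of generating self-replacing stubs
use warnings;
use strict;

BEGIN {
    # source text for the real subroutines, compiled only on first use
    my %source = (
        double => 'sub { return 2 * shift }',
        square => 'sub { my $n = shift; return $n * $n }',
    );

    # install one stub per name while the program is still compiling
    no strict 'refs';
    foreach my $name (keys %source) {
        *{"main::$name"} = sub {
            my $real = eval $source{$name}
                or die "Cannot compile $name: $@";
            no strict 'refs';
            no warnings 'redefine';
            *{"main::$name"} = $real;   # replace the stub
            goto &$real;                # and run the real code
        };
    }
}

print double(21), "\n";
print square(7), "\n";
```

Because the stubs are installed inside a BEGIN block, the names exist by the time the rest of the file is compiled, which preserves the compile-time checking described above.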

AutoLoader and SelfLoader

The Perl standard library provides three modules that implement the strategy of delayed loading of subroutines in different ways. We already looked at the autouse module in the last chapter, as it is a mechanism for the calling rather than the called module. Of the remaining two, the more complex is AutoLoader, which uses an AUTOLOAD subroutine to load additional files containing subroutine definitions as required. For this to work, those files must previously have been generated by the AutoSplit module. This implies that the module is split into separate pieces prior to being used; that is, an installation process is required.

The SelfLoader module operates along broadly similar lines, but it keeps all the subroutines to be loaded later inside the source file. The advantage is that we do not need to remember to use AutoSplit. Conversely, we must load all the source code into memory in an uncompiled form so that it can be compiled on demand.

Using the AutoLoader

In order to use the AutoLoader module, we need to adapt our modules to its requirements. The first and most important step is to place the subroutines we want to delay loading after an __END__ token. Anything before is compiled at compile time, anything after is compiled at run time on demand. This may require a little reorganization of the source, of course.

Once this is done, we add a use statement to include the AutoLoader module and import its AUTOLOAD subroutine, which does the work of retrieving the subroutines once they are split out. Note that importing the subroutine is important—the AutoLoader will not work without it:

use AutoLoader qw(AUTOLOAD);

(Why does AutoLoader not automatically export AUTOLOAD for us? Because we could implement our own AUTOLOAD routine to handle special cases and invoke AutoLoader's from it. This lets us control the autoloading process if we need to.)

The __END__ token causes the Perl interpreter to stop reading the file at this point, so it never sees the subroutines placed after it. To make them available again, we use the AutoSplit module to carve out the subroutines after the __END__ token into separate files placed in an auto directory relative to the module file. This often takes place in installation scripts and typically takes the form of a one-line Perl program. For example, to autosplit a module from the directory in which it is placed, use the following:

> perl -MAutoSplit -e 'autosplit qw(My/AutoModule.pm ./auto)'

This takes a module called My::AutoModule, contained in a file called AutoModule.pm, in a directory called My in the current directory, and splits it into parts inside an auto directory (which is created at the time if it doesn't already exist). Inside auto we will now find the directory path My/AutoModule, and within that an index file called autosplit.ix that describes the split-out subroutines. Along with it we find one file for each subroutine split out of the module, named for the subroutine with the suffix .al (for autoload).

Be aware that lexical my variables at file scope are not visible to autoloaded subroutines. This is obvious when we realize that the scope of the file has necessarily changed because we now have multiple files. On the other hand, variables declared with our (or use vars) will be fine, since they are package-scoped.

As an example of how AutoLoader is used, take this simple module file that implements a package called My::AutoModule:

# My/AutoModule.pm

package My::AutoModule;

use strict;
use Exporter;
use AutoLoader qw(AUTOLOAD);

our @ISA = qw(Exporter);
our @EXPORT = qw(one two three);

sub one {
    print "This is always compiled\n";
}

__END__

sub two {
    print "This is sub two\n";
}

sub three {
    print "This is sub three\n";
}

1;

The file, which in this case is named AutoModule.pm and is contained in a directory called My to match the package name, has three subroutines. The first, one, is a regular subroutine—it is always compiled. The others, two and three, are actually just text at the end of the file—the __END__ ensures that Perl never sees them and never even reads them in. Note that the only changes from a normal module are the use AutoLoader line and the __END__ token. The trailing 1; is not actually needed any longer, but we retain it in case we ever convert the module back into an unsplit one.

When we split the file, AutoSplit creates three files, autosplit.ix, two.al, and three.al, all in the auto/My/AutoModule directory. Since we specified ./auto as the destination directory, the new auto directory sits immediately alongside the My directory that holds the original AutoModule.pm file. If we had wanted to split a module that was installed into the Perl standard library tree, we would have used a different path here, according to the position of the file we want to split.

The autosplit.ix file contains the essential information about the subroutines that have been split out:

# Index created by AutoSplit for My/AutoModule.pm
#   (file acts as timestamp)
package My::AutoModule;
sub two;
sub three;
1;

Close inspection of this file reveals that it is in fact a snippet of Perl code that predeclares two subroutines, the two that were split out, in the package My::AutoModule. When the module is used in an application, the line use AutoLoader causes the AutoLoader module to be read in and initialized for that module. This has the effect of loading this index file, and thus declaring the subroutines.

The point of this may seem obscure, since the AUTOLOAD subroutine will seek the split-out files regardless, but it allows us to declare prototypes for subroutines and have them checked at compile time. It also allows us to call subroutines without parentheses, in the list operator style. Here is a short script that calls the subroutines defined by this module:

#!/usr/bin/perl
# automoduletest.pl
use warnings;
use strict;

use lib '.';
use My::AutoModule;

one;
two;
three;
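The effect of the predeclarations can be seen in isolation, without AutoLoader at all. In the following sketch the AUTOLOAD subroutine is a stand-in written for this illustration, and add2 is an invented name; the point is only that a forward declaration is enough for Perl to accept a call without parentheses and to enforce a prototype, even though the body arrives later.

```perl
#!/usr/bin/perl
# predeclare.pl - hypothetical illustration of forward declarations
use warnings;
use strict;

sub two;          # forward declaration, as found in autosplit.ix
sub add2($$);     # a declaration may also carry a prototype

# stand-in autoloader; AutoLoader would load the .al files instead
sub AUTOLOAD {
    our $AUTOLOAD;
    my ($name) = $AUTOLOAD =~ /([^:]+)$/;
    return "This is sub two" if $name eq 'two';
    return $_[0] + $_[1]     if $name eq 'add2';
    die "Undefined subroutine $name called\n";
}

my $text = two;            # no parentheses needed: 'two' is declared
print "$text\n";
print add2(1, 2), "\n";    # arguments are checked against the prototype
```

Without the sub two; line, the bareword call would be rejected under use strict before the autoloader ever had a chance to run.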

The .al files contain the subroutines that were split out. Due to varying locations, slightly different scripts used, and so on, we may have small variations in the actual contents of the .al files obtained, but the following sample provides a rough idea of what can be expected:

# NOTE: Derived from My/AutoModule.pm.
# Changes made here will be lost when autosplit again.
# See AutoSplit.pm.
package My::AutoModule;

#line 18 "My/AutoModule.pm (autosplit into ./auto/My/AutoModule/two.al)"
sub two {
    print "This is sub two\n";
}

# end of My::AutoModule::two
1;

The AutoSplit module is smart enough to check that the AutoLoader module is actually used by a file before it attempts to split it. We can disable this check (if we insist), as well as determine whether old subroutine .al files are removed if they no longer exist, and check to see if the module has actually changed. To do this, we add one or more of three optional Boolean arguments to the autosplit subroutine:

> perl -MAutoSplit -e 'autosplit qw(My/AutoModule.pm ./auto), [keep], [check], [changed]'

Substitute a 0 or 1 for the parameters to set or unset that argument. If any of these Boolean arguments are true, then the following actions occur:

  • keep: Preserves any .al files for subroutines that no longer exist in the module (ones that do still exist are overwritten anyway). The default is 0, so obsolete .al files are automatically deleted.
  • check: Causes the autosplit subroutine to verify that the file it is about to split actually contains a use AutoLoader directive before proceeding. The default is 1.
  • changed: Suppresses the split if the timestamp of the original file is not newer than the timestamp of the autosplit.ix file in the directory into which the split files are going to be placed. The default is 1.

For example, the explicit version of the preceding two-argument call would be

> perl -MAutoSplit -e 'autosplit "My/AutoModule.pm","./auto", 0, 1, 1'

Again, the equivalent for Windows is

> perl -MAutoSplit -e "autosplit \"My/AutoModule.pm\",\"./auto\", 0, 1, 1"

We are not obliged to use the AutoLoader module's AUTOLOAD subroutine directly, but we need to use it if we want to load in split files. If we already have an AUTOLOAD subroutine and want to also use AutoLoader, we must not import the AUTOLOAD subroutine from AutoLoader but instead call it from our own AUTOLOAD subroutine:

use AutoLoader;

sub AUTOLOAD {
    our $AUTOLOAD;
    # ... handle our own special cases ...

    # pass up to AutoLoader
    $AutoLoader::AUTOLOAD = $AUTOLOAD;
    goto &AutoLoader::AUTOLOAD;
}

Note the goto—this is needed so that the call stack reflects the correct package names in the right place, or more specifically, doesn't include our own AUTOLOAD subroutine in the stack, which will otherwise confuse the AutoLoader module's AUTOLOAD subroutine. Of course, if we have our own AUTOLOAD subroutine, we might not need the module at all—multiple autoloading strategies in the same module or application is probably getting a little overcomplex.
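The difference that goto makes to the call stack can be checked with a small experiment that does not involve AutoLoader at all. In this sketch all subroutine names are invented for illustration; target simply reports the name of whichever subroutine appears to have called it.

```perl
#!/usr/bin/perl
# gotoframe.pl - hypothetical demonstration of goto &sub
use warnings;
use strict;

# report the name of the subroutine that called us
sub target { return (caller(1))[3] }

sub via_call { return target(@_) }   # ordinary call: adds a stack frame
sub via_goto { goto &target }        # goto: replaces the current frame

sub outer_call { return via_call() }
sub outer_goto { return via_goto() }

print "called from: ", outer_call(), "\n";
print "called from: ", outer_goto(), "\n";
```

The first line reports main::via_call, while the second reports main::outer_goto: after the goto, the intermediate subroutine has vanished from the stack, which is the behavior we rely on when handing control to another AUTOLOAD.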

Using the SelfLoader

The SelfLoader module is very similar in use to the AutoLoader module, but it avoids the need to split the module into files as a separate step. To use it, we use the SelfLoader module and place the subroutines we want to delay the loading of after a __DATA__ token. Here is a module called My::SelfModule that is modified from the My::AutoModule module given earlier to use SelfLoader instead:

# My/SelfModule.pm

package My::SelfModule;
use strict;
use Exporter;
use SelfLoader;

our @ISA = qw(Exporter);
our @EXPORT = qw(one two three);

sub one {
    print "This is always compiled\n";
}

__DATA__

sub two {
    print "This is sub two\n";
}
sub three {
    print "This is sub three\n";
}

1;

This module is identical to the AutoLoader version except for two alterations: we replace use AutoLoader qw(AUTOLOAD) with use SelfLoader, and __END__ with __DATA__. If we also want to place actual data in the module file, we can do so as long as it is read before loading the SelfLoader module, that is, in a BEGIN block prior to the use SelfLoader statement.

The SelfLoader module exports its AUTOLOAD subroutine by default, however, so if we want to define our own and call SelfLoader from it, we need to specify an explicit empty list:

use SelfLoader ();

sub AUTOLOAD {
    # ... handle cases to be processed here

    # pass up to SelfLoader
    $SelfLoader::AUTOLOAD = $AUTOLOAD;
    goto &SelfLoader::AUTOLOAD;
}

To test this module, we can use a script similar to the one used for My::AutoModule, except that My::SelfModule must be used instead. We also need to add parentheses to the subroutine calls because SelfLoader does not provide declarations (as we discover if we try to run it). To solve this problem, we can make use of the Devel::SelfStubber module to generate the declaration stubs we need to add:

> perl -MDevel::SelfStubber -e 'Devel::SelfStubber->stub("My::SelfModule",".")'

And for Windows:

> perl -MDevel::SelfStubber -e "Devel::SelfStubber->stub(\"My::SelfModule\",\".\")"

This generates the following declarations for our example module, which we can add to the module to solve the problem:


sub My::SelfModule::two ;
sub My::SelfModule::three ;

We can also regenerate the entire module, stubs included, if we first set the variable $Devel::SelfStubber::JUST_STUBS = 0. This gets a little unwieldy for a command line, but it is possible. Take as an example the following command (which should all be typed on one line):

> perl -MDevel::SelfStubber -e '$Devel::SelfStubber::JUST_STUBS
  = 0; Devel::SelfStubber->stub("My::SelfModule",".")' > My/SelfModule-stubbed.pm

For Windows, because of the different quoting conventions, this becomes

> perl -MDevel::SelfStubber -e "$Devel::SelfStubber::JUST_STUBS
  = 0; Devel::SelfStubber->stub(\"My::SelfModule\",\".\")" > My/SelfModule-stubbed.pm

This generates a new module, SelfModule-stubbed.pm, which we have named differently just for safety; it is still My::SelfModule inside. If all looks well, we can move or copy SelfModule-stubbed.pm over SelfModule.pm. Note that running this command more than once can generate extra sets of stubs, which may cause problems or at least confusion, and we may even end up with an empty file if we forget to put the __DATA__ token in. For this reason, it is not advisable to attempt to replace a file with a stubbed version in one step.

Importing and Exporting

In the previous chapter, we looked at how to import symbols from one package into our own using the use and import statements. Now we will see the other side of the fence—the perspective of the module.

The term "importing" means taking symbols from another package and adding them to our own. From the perspective of the module being imported from, it is "exporting," of course. Either way, the process consists of taking a symbol visible in the namespace of one package and making it usable without qualifying it with a namespace prefix in another. For instance, even if we can see it, we would rather not refer to a variable called

$My::Package::With::A::Long::Name::scalar

It would be much better if we could refer to this variable simply as $scalar in our own code. From Chapter 5, we know that we can do this explicitly using typeglobs to create aliases:

*scalar = \$My::Package::With::A::Long::Name::scalar;

Likewise, to create an alias for a subroutine:

*localsub = \&My::Package::With::A::Long::Name::packagesub;

This is a simple case of symbol table manipulation, and it isn't all that tricky once we understand it; refer to Chapter 8 for more detail if necessary. However, this is clumsy code. We have to create an alias for every variable or subroutine we want to import. It is also prone to problems in later life, since we are defining the interface—the directly visible symbols—between this package and our own code, in our own code. This is very bad design because the package is not in control of how it is used. At best it is a maintenance nightmare; at worst, if the package is updated, there is a high chance our code will simply break.
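To see the aliasing in action, the following self-contained sketch defines the long-named package inline rather than in a separate file, which is an assumption made purely so that the example runs on its own. Assigning a reference to a typeglob replaces only the matching slot, so the scalar and the subroutine are aliased independently.

```perl
#!/usr/bin/perl
# aliasdemo.pl - hypothetical demonstration of manual aliasing
use warnings;
use strict;

{
    package My::Package::With::A::Long::Name;
    our $scalar = "original value";
    sub packagesub { return "called packagesub(@_)" }
}

our $scalar;    # declare the short name so 'strict vars' accepts it

# alias the scalar slot and the code slot of our own typeglobs
*scalar   = \$My::Package::With::A::Long::Name::scalar;
*localsub = \&My::Package::With::A::Long::Name::packagesub;

# both names now refer to the same variable
$scalar = "changed through the alias";
print "$My::Package::With::A::Long::Name::scalar\n";
print localsub(1, 2), "\n";
```

Writing through the short name changes the original variable, which shows that this is a true alias and not a copy.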

Good programming practice dictates that packages should have a well-defined (and documented) interface and that all dependent code should use that interface to access it. The package, not the user of the package, should dictate what the interface is. Therefore, we need a way to ask the package to create appropriate aliases for us; this is the import mechanism that the use and no declarations invoke automatically. By passing responsibility for imports to the package, it gets to decide whether or not the request is valid, and reject it if not.

The import mechanism is not all that complex, and a basic understanding of it can help with implementing more complex modules with more involved export requirements. It is also applicable to simpler import mechanisms that, rather than actually exporting symbols, allow us to configure a package using the import list as initialization data. Object-oriented modules, which rarely export symbols, commonly use the import mechanism this way. However, if our requirements are simple, we can for the most part ignore the technicalities of the import mechanism and use the Exporter module to define our interface for us. For the majority of packages, the Exporter can handle all the necessary details. If we just want to export a few subroutines, skip part of the next section of this chapter and head straight to the section titled "The Exporter Module."

The Import Mechanism

Perl's mechanism for importing symbols is simple, elegant, and shockingly ad hoc, all at the same time. In a nutshell, we call a subroutine (actually an object method) called import in the package that we want to import symbols from. It decides what to do, then returns an appropriate value.

The import stage is a secondary stage beyond actually reading and compiling a module file, so it is not handled by the require directive; instead, it is a separate explicit step. Written out explicitly, we could do it like this:

require My::Module;   # load in the module
My::Module->import;   # call the 'import' subroutine

Since this is a call to an object method, Perl allows us to invert the package and subroutine names, so we can also say

import My::Module;

This doesn't mean we have to start programming everything as objects, however. It is just a convenient use of Perl's object-oriented syntax, just as the print statement is (to the surprise of many programmers). The syntax fools many programmers into thinking that import is actually a Perl keyword, since it looks exactly like require, but in fact it is only a subroutine. This typical import statement appears to be a core Perl feature for importing symbols, but in fact all it does is call the subroutine import in the package My::Module and pass the arguments subone, subtwo, and $scalar to it:

import My::Module qw(subone subtwo $scalar);

The import subroutine is rarely invoked directly because the use directive binds up a require and a call to import inside a BEGIN block. For example, use My::Module is (almost) equivalent to

BEGIN {
    require My::Module;
    import My::Module;
}

Given that use does all the work for us, are there any reasons to need to know how to do the same job explicitly? Loading modules on demand during program execution can be easily achieved by using require and importing without the BEGIN block, as in the first example. This doesn't work with use because it happens at compile time due to the implicit BEGIN, and it disregards the surrounding run-time context.

Note that the preceding import has no parentheses; any arguments passed to use therefore get automatically passed directly to the import subroutine without being copied, as covered in Chapter 9. If there is no import subroutine defined, however, the preceding will complain, whereas use will not. A more correct import statement would be

import My::Module if My::Module->can('import');
# 'can' is a universal method (see Chapter 18)
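Put together, loading a module on demand at run time might look like the following sketch. List::Util is used only because it is widely available, and the $need_sum flag is invented here to stand in for whatever condition the program really tests.

```perl
#!/usr/bin/perl
# ondemand.pl - hypothetical example of loading a module at run time
use warnings;
use strict;

my $need_sum = 1;    # stands in for some condition known only at run time

if ($need_sum) {
    require List::Util;                  # load at run time
    List::Util->import('sum')            # then import explicitly,
        if List::Util->can('import');    # guarding against a missing import
}

# parentheses are needed here: 'sum' does not exist yet when this line
# is compiled, so Perl cannot treat it as a list operator
print sum(1, 2, 3), "\n";
```

The price of loading at run time is visible in the last line: since the import happens after compilation, the imported name has to be called with parentheses.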

Similarly, the no directive calls a function called unimport. The sense of no is to be the opposite of use, but this is a matter purely of convention and implementation, since the unimport subroutine is just another subroutine. In this case though, Perl will issue an error if there is no unimport method defined by the module. The no My::Module code is (roughly, with the same proviso as earlier) equivalent to

BEGIN {
   require My::Module;
   unimport My::Module;
}

It may seem strange that no incorporates a require within it, but there is no actual requirement that we use a module before we no parts of it. Having said that, the module may not work correctly if the import subroutine is not called initially. If use has already pulled in the module, the require inside no will see that the module is already in %INC, and so won't load it again. This means that in most cases no is just a way of calling unimport in the module package at compile time.

In the same way that aliasing can be done with typeglobs, removing aliases can be done by editing an entry out of the symbol table. Here is an example that does just that, using the delete_package subroutine of the Symbol module that we introduced previously:

# Uninstallable.pm
package Uninstallable;

use Symbol qw(delete_package);

our $message = "I'm here\n"; # package global

sub unimport {
    delete_package(__PACKAGE__);
}

1;

This module, which for the purposes of testing we shall call Uninstallable.pm (because we can uninstall it, not because we can't install it), defines one variable simply so we can tell whether or not it is present by testing for it. The next short script shows how. Note the BEGIN blocks to force the print statements to happen at the same time as use—otherwise the package would be uninstalled before the first print executes.

#!/usr/bin/perl
# uninstall.pl
use strict;

BEGIN { print "Now you see me: "; }
use Uninstallable;
BEGIN { print $Uninstallable::message; }

BEGIN { print "Now you don't!\n"; }
no Uninstallable;
BEGIN { print $Uninstallable::message; }

When run, presuming the module and script are both in the current directory:

> perl -I. uninstall.pl

you'll see the following output:


Now you see me: I'm here
Now you don't!

As interesting as this is, it is rare (though not impossible) that we would actually want to delete a package programmatically. Where they are implemented, most unimport subroutines simply clear flags that an import sets. Many of Perl's pragmatic modules like strict and warnings work this way, for example, and are actually very small modules in themselves.
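A flag-based pair of import and unimport subroutines can be sketched as follows. The My::Verbose package is invented for this illustration and is defined inline so the example is self-contained. It also keeps its flags in an ordinary package hash, which is a simplification: strict and warnings record their state in compile-time hints so that the setting is lexically scoped.

```perl
#!/usr/bin/perl
# flagdemo.pl - hypothetical module whose import and unimport toggle a flag
use warnings;
use strict;

{
    package My::Verbose;     # invented for this illustration

    our %enabled;            # calling package => flag

    sub import   { my $caller = caller; $enabled{$caller} = 1 }
    sub unimport { my $caller = caller; $enabled{$caller} = 0 }

    # report the setting for whichever package asks
    sub report {
        my $caller = caller;
        return $enabled{$caller} ? "verbose" : "quiet";
    }
}

# an inline package has no file to load, so we call the methods directly;
# with a real module file these would be 'use My::Verbose' and 'no My::Verbose'
My::Verbose->import;
print My::Verbose::report(), "\n";

My::Verbose->unimport;
print My::Verbose::report(), "\n";
```

Nothing is exported or deleted here; the two methods simply record a per-package setting that the rest of the module can consult.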

Bypassing import

The use and no directives incorporate one extra trick: if we pass them an explicit empty parameter list, they don't call the import function at all. This means that we can suppress a module's default import if we only want to use some of its features. Take the CGI module as an example:

use CGI;                # parse environment, set up variables
use CGI qw(:standard);  # import a specific set of features
use CGI ();             # just load CGI, don't parse anything

Suppressing the default import by passing an empty list is more useful than it might seem. The CGI module in the previous examples does rather a lot more than simply importing a few symbols by default; it examines the environment and generates a default CGI object for functional programming, as well as automatically generating a number of methods. If we just want to use the CGI module's HTML generation features, we don't need all that, so we can stop the module initializing itself by explicitly passing nothing to it.
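The effect of the empty list is easy to verify with any module that exports symbols by default. The sketch below uses Carp, which normally exports croak among others, purely as a convenient stand-in; it is not meant to say anything about how CGI behaves.

```perl
#!/usr/bin/perl
# noimport.pl - hypothetical check that an empty list suppresses import
use warnings;
use strict;

use Carp ();      # load the module, but do not call its import

# nothing called 'croak' has appeared in our own package
print defined(&main::croak) ? "croak imported\n" : "croak not imported\n";

# the subroutine is still reachable through its full name
eval { Carp::croak("reachable by its full name") };
print "caught: $@";
```

The module is fully loaded either way; only the step that populates our namespace has been skipped.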

Exporting

While most modules make use of the Exporter module covered later, they are not compelled to do so. Here is a simple exporting subroutine that illustrates how a module can implement a simple import subroutine:

# default import
sub import {
    my $caller = caller(1);                   # get calling package
    no strict 'refs';                         # we need symbolic references to do this
    *{"${caller}::mysub"}    = \&mysub;       # export 'mysub'
    *{"${caller}::myscalar"} = \$myscalar;    # export '$myscalar'
    *{"${caller}::myhash"}   = \%myhash;      # export '%myhash'
}

The principal technique is that we find the caller's package by inspecting the subroutine stack with caller. It so happens that when called in a scalar context, caller returns just the package name, so caller(1) returns the package of the caller—in other words, the place from which the use was issued. Once we know this, we simply use it to define typeglobs in the calling package filled with references to the variables we want to export.

This import subroutine doesn't pay any attention to the arguments passed to it (the first one of which is the package name). It just exports three symbols explicitly. This isn't very polite, as the calling package might not need all of them, and might even have its own versions of them. Here is a more polite and more versatile import subroutine that exports only the requested subroutines, if they exist:

# export if defined
sub import {
    my $caller = caller(1);   # get the name of the calling package
    my $package = shift;      # remove leading package argument from @_
    no strict 'refs';         # we need symbolic references to do this

    foreach (@_) {                                      # for each request
        if (defined &{"${package}::$_"}) {              # if we have it...
            *{"${caller}::$_"} = \&{"${package}::$_"};  # ...make an alias
        } else {
            die "Unable to export $_ from $package\n";  # otherwise, abort
        }
    }
}

Usually, the package passed in is the package we are in anyway, so we could as easily have said \&{$_} as \&{"${package}::$_"}. However, it is good practice to use the package name in case the import method is inherited by another package. By using the passed name, our import will also serve for any packages that inherit from it (via @ISA). This is exactly how the Exporter module works, in fact.
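To try a hand-written import end to end without creating a module file, we can define the package inline and call import from a BEGIN block, which is what use would do for us. Everything about My::Greeter below is invented for illustration. This sketch also asks plain caller for the calling package, since here import is invoked directly from the importing code.

```perl
#!/usr/bin/perl
# politeimport.pl - hypothetical end-to-end test of a hand-written import
use warnings;
use strict;

{
    package My::Greeter;     # invented for this illustration

    sub hello   { return "hello @_" }
    sub goodbye { return "goodbye @_" }

    sub import {
        my $package = shift;     # 'My::Greeter', or an inheriting package
        my $caller  = caller;    # the package that called import
        no strict 'refs';

        foreach my $name (@_) {
            defined &{"${package}::$name"}
                or die "Unable to export $name from $package\n";
            *{"${caller}::$name"} = \&{"${package}::$name"};
        }
    }
}

# stands in for 'use My::Greeter qw(hello)' with a real module file
BEGIN { My::Greeter->import('hello') }

print hello('world'), "\n";
print defined(&main::goodbye) ? "goodbye imported\n" : "goodbye not imported\n";
```

Only the requested name arrives in the calling package, and a request for a name the module does not define is refused.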

The preceding example only works for subroutines, so it only constructs subroutine references. A more versatile version would examine (and remove if appropriate) the first character of the symbol and construct a scalar, array, hash, code, or typeglob reference accordingly. Here is an example that does that, though for brevity, we have removed the check for whether the symbol actually exists:

# export arbitrary
sub import {
    my $caller = caller(1);   # get calling package
    my $package = shift;      # remove package from arguments
    no strict 'refs';         # we need symbolic references for this

    foreach my $symbol (@_) {
        my $name = $symbol;   # work on a copy; the arguments may be read-only
        my $prefix = '';
        $name =~ s/^([&%\$\@*])// and $prefix = $1;

        $prefix eq '$' and *{"${caller}::$name"} = \${"${package}::$name"}, next;
        $prefix eq '%' and *{"${caller}::$name"} = \%{"${package}::$name"}, next;
        $prefix eq '@' and *{"${caller}::$name"} = \@{"${package}::$name"}, next;
        $prefix eq '*' and *{"${caller}::$name"} =  *{"${package}::$name"}, next;
        *{"${caller}::$name"} = \&{"${package}::$name"};
    }
}

It is up to the import subroutine whether or not to carry out additional default imports when an explicit list is passed. In general, the answer is no, but it is usual to define a special symbol like :DEFAULT that imports all the default symbols explicitly. This allows the module user maximum flexibility in what to allow into their namespace:

sub import {
    my $package = shift;

    # if an empty import list, use defaults
    return _default_import() unless @_;

    foreach (@_) {
        /:DEFAULT/ and _default_import(), next;
        _export_if_present($package,$_);
    }
}

sub _default_import {
    # ... as above ...
}

sub _export_if_present {
    my ($package,$symbol) = @_;
    my $caller = caller(1);   # the package that called 'import'
    my $prefix = '';
    $symbol =~ s/^([&%\$\@*])// and $prefix = $1;
    no strict 'refs';         # we need symbolic references for this

    if ($prefix and $prefix ne '&') {
        SWITCH: foreach ($prefix) {
            m'\$' and do {
                if (defined ${"${package}::$symbol"}) {
                    *{"${caller}::$symbol"} = \${"${package}::$symbol"};
                    return;
                }
            };
            m'@' and do {
                # ... ditto for arrays ...
            };
            m'%' and do {
                # ... ditto for hashes ...
            };
            m'\*' and do {
                # ... ditto for typeglobs ...
            };
        }
    } elsif (defined &{"${package}::$symbol"}) {
        *{"${caller}::$symbol"} = \&{"${package}::$symbol"};
    } else {
        die "Unable to export $symbol from $package\n";
    }
}

The import method is not obliged to export symbols in response to being called. It can choose to do anything it likes and treat the list of names passed to it in any way it sees fit. As an indication of what else can be done with import lists, here is an import subroutine that invents generators for HTML tags by defining a subroutine for any symbol passed to it that it doesn't recognize. (The CGI module uses exactly this approach, though its HTML methods are a good deal more advanced. It is also a much bigger module than these few lines of code.)

sub import {
    my $package = shift;
    my $caller = caller;   # the package that issued the use
    no strict 'refs';      # we need symbolic references for this

    foreach (@_) {
       # for each passed symbol, generate a tag subroutine in the
       # caller's package.
       my $tag = $_;       # private copy for the closure to capture
       *{"${caller}::$tag"} = sub {
          "<$tag>\n".join("\n",@_)."</$tag>\n";
       };
    }
}

This is frequently a better way to handle automatic generation of subroutines than autoloading is, since it is more controlled and precise. Also we have to declare the subroutines we want to use at compile time (as use calls import then) where they can be subjected to syntax checking. Autoloading, by contrast, actually disables compile-time checks, since it is perfectly valid for a subroutine not to exist before it is called.
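To make the compile-time point concrete, here is a small runnable sketch of an import method that generates tag subroutines. It assumes a hypothetical My::Tags package defined in the same file, so import is called from a BEGIN block rather than through use. The generated subroutines exist before the rest of the script is compiled:

```perl
#!/usr/bin/perl
# tagdemo.pl - hypothetical single-file demonstration
use strict;
use warnings;

{
    package My::Tags;   # hypothetical module, defined inline

    sub import {
        my $package = shift;
        my $caller  = caller;   # where the generated subs should go
        no strict 'refs';       # symbolic references needed

        foreach my $tag (@_) {
            # each closure captures its own $tag
            *{"${caller}::$tag"} = sub {
                return "<$tag>" . join(" ", @_) . "</$tag>";
            };
        }
    }
}

# equivalent in effect to 'use My::Tags qw(b em)' for a module on disk
BEGIN { My::Tags->import(qw(b em)) }

my $html = em("very", b("important"));
print "$html\n";
```

Only the tags named in the import list are created. A call to any other tag name is an ordinary call to an undefined subroutine.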

When to Export, When Not to Export

Having shown how to export symbols, it is worth taking a moment to consider whether we should. The point of packages is to increase reusability by restraining the visibility of variables and subroutines. We can write application code in the main package free from worry about name clashes because modules place their variables and subroutines into their own packages. Importing symbols goes against this strategy, and uncontrolled importing of lots of symbols pollutes code with unnecessary definitions that degrade maintainability and may cause unexpected bugs. In general we should take time to consider:

  • What should and should not be exported by default from a module (as little as possible)
  • What should be allowed to be exported
  • What should be denied export

These steps are an essential part of defining the interface to the package, and therefore a critical element of designing reusable code.

Object-oriented modules should usually not export anything at all; the entire point of object orientation is to work through the objects themselves, not to bypass them by importing parts of the module class into our own code. Additionally, exporting symbols directly bypasses the inheritance mechanism, which makes code that uses the exported symbols hard to reuse and likely to break. There are a few rare cases where modules provide both functional and object-oriented interfaces, but only in the simplest modules that are not intended to be inherited from is this a viable strategy.

In summary, the export list of a module is far more than just a list of symbols that will/may be imported into another package; it is the functional interface to the module's features, and as such should be designed, not gradually expanded. The Exporter module helps with this by allowing us to define lists of conditionally exported symbols.

The Exporter Module

The Exporter module provides a generic import subroutine that modules can configure to their own taste. It handles almost all possible issues that a traditional exporting module needs to consider, and for many modules it is all they need.

Using the Exporter

To use the Exporter, a module needs to do three things: use Exporter, inherit from it, and define the symbols eligible for export. Here is a very short module that demonstrates the basic technique, using fully qualified names for the package variables @ISA and @EXPORT:

# My/Module.pm
package My::Module;

use Exporter;

# inherit from it
@My::Module::ISA = qw(Exporter);

# define export symbols
@My::Module::EXPORT = qw(greet_planet);

sub greet_planet {
    return "Hello World\n";
}

1;   # a module must return a true value

Here we have an @ISA array that tells the interpreter that the module is a subclass of Exporter and to refer to it for any methods the module does not provide. Specifically that means import, of course. We don't need to worry too much about the object-oriented nature of inheriting from Exporter, unless we want to define our own import subroutine and still make use of the one provided by Exporter (we will get to that in a moment).

The @EXPORT array defines the actual symbols we want to export. When import is invoked for our module, the call is relayed up to the Exporter module, which provides the generic import method. It in turn examines the definition of @EXPORT in our module, @My::Module::EXPORT, and satisfies or denies the requested import list accordingly.

To illustrate, here's a short script that uses the preceding module, assuming it is in a file named Module.pm in a directory called My in the same directory as the script:

#!/usr/bin/perl
# import.pl
use warnings;
use strict;

use lib '.'; #look in current directory for My/Module.pm
use My::Module;

print greet_planet;

Importing from the Exporter

One advantage of the Exporter module is that the import method it provides is well developed and handles many different situations for us. Even if we decide to provide our own import subroutine, we may want to use Exporter too, just for the richness of the features it provides (and if we don't, we probably ought to document it). For example, it accepts regular expressions as well as literal symbol names, which means that we can define a collection of symbols with similar prefixes and then allow them to be imported together rather than individually. Here is how we can import a collection of symbols all starting with prefix_ from a module that uses the Exporter module:

use My::Module qw(/^prefix_/);

The Exporter also understands negations, so we can import all symbols that do not match a given name or regular expression:

# import everything except the subroutine 'greet_planet'
use My::Module qw(!greet_planet);

# import anything not beginning with 'prefix_'
use My::Module qw(!/^prefix_/);

We can also collect symbols together into groups and then import the groups by prefixing the group name with a colon. Again, this isn't a core Perl feature, it is just something that the Exporter module's import method does. For example:

use My::Module qw(:mygroup);

We'll see how to actually define a group in a moment.
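The following sketch shows these import-list forms side by side. It assumes a hypothetical My::Prefixed module defined inline, and imports into three throwaway packages so the results can be compared. Note that a negated first item starts from the default list in @EXPORT, not from every exportable symbol:

```perl
#!/usr/bin/perl
# patterndemo.pl - hypothetical single-file demonstration
use strict;
use warnings;

BEGIN {
    package My::Prefixed;   # hypothetical module, defined inline
    use Exporter;
    our @ISA       = qw(Exporter);
    our @EXPORT    = qw(greet_planet);
    our @EXPORT_OK = qw(prefix_one prefix_two other_sub);

    sub greet_planet { "Hello World" }
    sub prefix_one   { 1 }
    sub prefix_two   { 2 }
    sub other_sub    { 3 }
}

# each BEGIN block stands in for a 'use My::Prefixed qw(...)' line
{ package Test::Regex;     BEGIN { My::Prefixed->import(qw(/^prefix_/))   } }
{ package Test::Negated;   BEGIN { My::Prefixed->import(qw(!greet_planet)) } }
{ package Test::NotPrefix; BEGIN { My::Prefixed->import(qw(!/^prefix_/))  } }

my @names = qw(greet_planet prefix_one prefix_two other_sub);

# list which of the module's subroutines a package received
sub imported_by {
    my $pkg = shift;
    return grep { $pkg->can($_) } @names;
}

my @regex     = imported_by('Test::Regex');
my @negated   = imported_by('Test::Negated');
my @notprefix = imported_by('Test::NotPrefix');

print "regex:     @regex\n";
print "negated:   @negated\n";
print "notprefix: @notprefix\n";
```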

Default and Conditional Exports

The @EXPORT variable defines a list of default exports that will be imported into our code if we use the module with no arguments, that is:

use My::Module;

But not either of these:

use My::Module ();
use My::Module qw(symbola symbolb symbolc);

If we give an explicit list of symbols to import, even if it is an empty list, Exporter will export only those symbols. Otherwise, we get the default, which is entirely up to the module (and hopefully documented).

Since exporting symbols automatically is not actually all that desirable (the application didn't ask for them, so we shouldn't spray it with symbols), Exporter also allows us to define conditional exports in the @EXPORT_OK array. Any symbol in this array may be exported if named explicitly, but it will not be exported by default.

In My::Module (Module.pm):

# change sub to be exported only on request
@EXPORT_OK = qw(greet_planet);

In application (import.pl):

# now we must import the sub explicitly
use My::Module qw(greet_planet);

The contents of the @EXPORT array are also checked when an explicit list is given, so any name or regular expression passed to import will be imported if it matches a name in either the @EXPORT or @EXPORT_OK list. However, any explicit list suppresses the exporting of the default list—which is the point, of course.

We can ask for the default symbols explicitly by using the special export tag :DEFAULT. The advantage is that we can augment it with additional explicit requests. For example, this statement imports all the default symbols and additionally imports two more (presumably on the @EXPORT_OK list):

use My::Module qw(:DEFAULT symbola symbolb);

Alternatively, we can import the default list but skip over selected symbols:

use My::Module qw(:DEFAULT !symbola !symbolb);

Since this is a common case, we can also omit the :DEFAULT tag and simply put

use My::Module qw(!symbola !symbolb);

In fact, this is the same as the example of negation we gave earlier; in effect, an implicit :DEFAULT is placed at the front of the list if the first item in the list is negated.

As a working example of the different ways that import lists can be defined, here is a short demonstration module, called TestExport.pm, and a test script that we can use to import symbols from it in different ways. First the module, which exports two subroutines by default and two if asked:

# TestExport.pm

package TestExport;

use strict;
use Exporter;

our @ISA = qw(Exporter);
our @EXPORT = qw(sym1 sym2);
our @EXPORT_OK = qw(sym3 sym4);

sub sym1 {print "sym1\n";}
sub sym2 {print "sym2\n";}
sub sym3 {print "sym3\n";}
sub sym4 {print "sym4\n";}

1;

The following script contains a number of different use statements that import different symbols from the module, depending on their argument. To use it, uncomment one (and only one) use statement, and the script will print out the subroutines that were imported as a result. It also demonstrates a simple way of scanning the symbol table, as well as the use of %INC to check for a loaded module.

#!/usr/bin/perl
# testexport.pl
use warnings;
use strict;

# :DEFAULT import
#use TestExport;

# no imports
#use TestExport();

# just 'sym1'
#use TestExport qw(sym1);

# everything but 'sym1'
#use TestExport qw(!sym1);

# just 'sym3'
#use TestExport qw(sym3);

# everything but 'sym3'
#use TestExport qw(!sym3);

# implicit :DEFAULT
#use TestExport qw(!sym1 sym3);

# no implicit :DEFAULT
#use TestExport qw(sym3 !sym1);

unless (exists $INC{'TestExport.pm'}) {
    die "Uncomment a 'use' to see its effect\n";
}

foreach (keys %::) {
    print "Imported: $_\n" if /^sym/;
}

Note that in these examples we have concentrated on subroutines, since these are the symbols we most commonly export, though we are equally free to export variables too.

Export Lists

In addition to adding symbol names to @EXPORT and @EXPORT_OK, we can define collections of symbols as values in the hash variable %EXPORT_TAGS. The key is a tag name that refers to the collection. For example:

our (@EXPORT, @EXPORT_OK, %EXPORT_TAGS);

$EXPORT_TAGS{'subs'} = [qw(mysub myothersub subthree yellowsub)];
$EXPORT_TAGS{'vars'} = [qw($scalar @array %hash)];

Or, more succinctly:

our %EXPORT_TAGS = (
    subs => [qw(mysub myothersub subthree yellowsub)],
    vars => [qw($scalar @array %hash)],
);

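Note that in accordance with the principles of nested data structures, we need to assign an array reference to each tag name key. A plain list would either be flattened into the surrounding hash definition or, in a scalar assignment, be reduced to a single value.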
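A short sketch makes the difference visible. The hash names here are purely illustrative and have no meaning to Exporter:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# correct: each tag maps to a reference to an array of names
my %right = (
    subs => [qw(mysub myothersub)],
    vars => [qw($scalar @array)],
);

# incorrect: the lists are flattened into alternating keys and values
my %wrong = (
    subs => qw(mysub myothersub),
    vars => qw($scalar @array),
);

my $right_keys = join ",", sort keys %right;
my $wrong_keys = join ",", sort keys %wrong;

print "right: $right_keys\n";
print "wrong: $wrong_keys\n";
```

In the second hash, 'myothersub' and '$scalar' have become keys in their own right, and the 'vars' tag has ended up as a value.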

However, defining a list and assigning it to a tag does not automatically add the names in the list to either @EXPORT or @EXPORT_OK; in order for the tag to be imported successfully, the names have to be in one or other of the arrays too. Fortunately, Exporter makes this simple for us by providing a pair of subroutines to add the symbols associated with a tag to either list automatically. To add a tag to the default export list:

Exporter::export_tags('subs');

To add a tag to the conditional export list:

Exporter::export_ok_tags('vars');

We can now import various permutations of tags and symbol names:

# import two tags
use My::Module qw(:subs :vars);

# import the default list excepting the items in ':subs'
use My::Module qw(:DEFAULT !:subs);

# import ':subs' excepting the subroutine 'myothersub'
use My::Module qw(:subs !myothersub);
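To see what the two helper subroutines actually do, this sketch defines a hypothetical My::Tagged module inline, inspects @EXPORT and @EXPORT_OK after the helpers have run, and then imports into two throwaway packages. All names are illustrative:

```perl
#!/usr/bin/perl
# tagarrays.pl - hypothetical single-file demonstration
use strict;
use warnings;

BEGIN {
    package My::Tagged;   # hypothetical module, defined inline
    use Exporter;
    our @ISA = qw(Exporter);
    our %EXPORT_TAGS = (
        onetwo    => [qw(sym1 sym2)],
        threefour => [qw(sym3 sym4)],
    );

    # copy the tagged names into @EXPORT and @EXPORT_OK respectively
    Exporter::export_tags('onetwo');
    Exporter::export_ok_tags('threefour');

    sub sym1 { 1 }
    sub sym2 { 2 }
    sub sym3 { 3 }
    sub sym4 { 4 }
}

my $default_list = "@My::Tagged::EXPORT";
my $ok_list      = "@My::Tagged::EXPORT_OK";

# each BEGIN block stands in for a 'use My::Tagged ...' line
{ package Test::Default; BEGIN { My::Tagged->import() } }
{ package Test::Some;    BEGIN { My::Tagged->import(qw(:threefour !sym4)) } }

my @names   = qw(sym1 sym2 sym3 sym4);
my @default = grep { 'Test::Default'->can($_) } @names;
my @some    = grep { 'Test::Some'->can($_) }    @names;

print "\@EXPORT:    $default_list\n";
print "\@EXPORT_OK: $ok_list\n";
print "default import:   @default\n";
print ":threefour !sym4: @some\n";
```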

To show tags in action, here is a modified example of the TestExport module we gave earlier, rewritten to use tags instead. We define the default and on-request export lists using the export_tags and export_ok_tags subroutines:

# TestTagExport.pm

package TestTagExport;

use strict;
use Exporter;

our @ISA = qw(Exporter);
our %EXPORT_TAGS = (
   onetwo => ['sym1','sym2'],
   threefour => ['sym3','sym4'],
   onetwothree => [qw(sym1 sym2 sym3)],
   all => [qw(sym1 sym2 sym3 sym4)],
);

Exporter::export_tags('onetwo');
Exporter::export_ok_tags('threefour');

sub sym1 {print "sym1\n";}
sub sym2 {print "sym2\n";}
sub sym3 {print "sym3\n";}
sub sym4 {print "sym4\n";}

1;

Here is a script that tests out the export properties of the new module, concentrating on tags rather than symbols, though all the tests that applied to the first module will work with the same effect with this example:

#!/usr/bin/perl
# testtagexport.pl
use warnings;
use strict;

# import tag
#use TestTagExport;

# import symbol plus tag
#use TestTagExport qw(:threefour sym2);

# import tag minus symbol
#use TestTagExport qw(:onetwothree !sym2);

# import one tag minus another
#use TestTagExport qw(:onetwothree !:DEFAULT);

unless (exists $INC{'TestTagExport.pm'}) {
    die "Uncomment a 'use' to see its effect\n";
}

foreach (keys %::) {
    print "Imported: $_\n" if /^sym/;
}

Versions

The use and require directives support a version number syntax in addition to their regular use in module loading. The Exporter module also allows us to handle this usage by defining a require_version method that is passed the package name (because it is a method) and the version number requested:

our $VERSION = "1.23";

# this subroutine name has special meaning to Exporter
sub require_version {
    my ($pkg,$requested_version) = @_;
    # succeed only if our version is at least the one requested
    return $VERSION ge $requested_version;
}

If we do not supply a require_version method, then a default definition provided by Exporter is used instead; this also tests the requested version against the value of $VERSION defined in the local package (if any is defined), but it uses a numeric comparison, which works well for comparing version number objects/strings (see Chapter 3).
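Independently of Exporter, a statement such as use My::Module 1.20 asks Perl to call the module's VERSION method, which every package inherits from UNIVERSAL. The sketch below calls that method directly on a hypothetical inline package, and also shows why a string comparison like the one in our require_version example should be used with care:

```perl
#!/usr/bin/perl
# versiondemo.pl - hypothetical single-file demonstration
use strict;
use warnings;

{
    package My::Versioned;   # hypothetical module, defined inline
    our $VERSION = '1.23';
}

# 'use My::Versioned 1.20' would make this call at compile time
my $old_enough = eval { My::Versioned->VERSION('1.20'); 1 } ? 1 : 0;

# asking for a newer version than we have raises an exception
my $too_new = eval { My::Versioned->VERSION('2.00'); 1 } ? 1 : 0;

# string and numeric comparison can disagree about version order
my $as_strings = ('10.0' ge '9.5') ? 1 : 0;   # character by character
my $as_numbers = (10.0 >= 9.5)     ? 1 : 0;   # numeric values

print "1.20 satisfied: $old_enough, 2.00 satisfied: $too_new\n";
print "'10.0' ge '9.5': $as_strings, 10.0 >= 9.5: $as_numbers\n";
```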

Handling Failed Exports

The Exporter module automatically causes an application to die if it attempts to import a symbol that is not legal. However, by defining another array, @EXPORT_FAIL, we can define a list of symbols to handle specially in the event that Exporter does not recognize them. For example, to handle cross-platform special cases, we might define three different subroutines:

our (@EXPORT_FAIL);

@EXPORT_FAIL = qw(win32sub macsub Unixsub);

In order to handle symbols named in the failure list, we need to define a subroutine, or rather a method, called export_fail. The input to this method is a list of the symbols that the Exporter did not recognize, and the return value should be any symbols that the module was unable to process:

sub export_fail {
    my $pkg = shift;

    my @fails;
    foreach (@_) {
        # test each symbol to see if we want to define it
        push @fails,$_ unless supported($_);
    }

    # return list of failed exports (none if success)
    return @fails;
}

sub supported {
    my $symbol = shift;
    my $ok_on_this_platform;
    # ... test for special cases and set $ok_on_this_platform ...
    return $ok_on_this_platform;
}

If an export_fail method isn't defined, then Exporter supplies its own, which returns all the symbols, causing them all to fail as if the @EXPORT_FAIL array were not defined at all. Note that we cannot have Exporter call export_fail for any unrecognized symbol, only those listed in the @EXPORT_FAIL array. However, if we want to handle situations like this ourselves, we can always define our own import method, which we discuss next.
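Here is a minimal runnable sketch of a platform-dependent export_fail method. The My::Platform package and its two subroutines are hypothetical. The sketch also lists the symbols in @EXPORT_OK, on the assumption that Exporter only considers a symbol for export_fail once it is otherwise exportable:

```perl
#!/usr/bin/perl
# exportfail.pl - hypothetical single-file demonstration
use strict;
use warnings;

BEGIN {
    package My::Platform;   # hypothetical module, defined inline
    use Exporter;
    our @ISA         = qw(Exporter);
    our @EXPORT_OK   = qw(win32sub unixsub);
    our @EXPORT_FAIL = qw(win32sub unixsub);   # these may fail to export

    sub win32sub { "running on Windows" }
    sub unixsub  { "running on something Unix-like" }

    # return the symbols we cannot provide on this platform
    sub export_fail {
        my $pkg = shift;
        my $unsupported = $^O eq 'MSWin32' ? 'unixsub' : 'win32sub';
        return grep { $_ eq $unsupported } @_;
    }
}

my ($good, $bad) = $^O eq 'MSWin32' ? qw(win32sub unixsub)
                                    : qw(unixsub win32sub);

# importing the supported subroutine succeeds
my $good_ok = eval { package Test::Good; My::Platform->import($good); 1 } ? 1 : 0;

# importing the unsupported one makes Exporter raise an exception
# (it may also print a warning on standard error)
my $bad_ok = eval { package Test::Bad; My::Platform->import($bad); 1 } ? 1 : 0;

print "$good imported: $good_ok\n";
print "$bad imported: $bad_ok\n";
```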

Using the Exporter with a Local import Method

If a module needs to do its own initialization in addition to using Exporter, we need to define our own import method. Since this will override the import method defined by Exporter, we will need to take steps to call it explicitly. Fortunately, the Exporter module has been written with this in mind.

Assuming we're familiar with object-oriented programming, we might guess that calling SUPER::import from our own import subroutine would do the trick, since SUPER:: refers to the named method in the parent package or packages. Unfortunately, although this works, it imports symbols to the wrong package, because Exporter's import method examines the package name of the caller to determine where to export symbols. Since that is the module, and not the user of the module, the export doesn't place anything in the package that issues the use statement. Instead, we use the export_to_level method, which traces back up the calling stack and supplies the correct package name to Exporter's import method. Here's how to use it:

our @ISA = qw(Exporter);
our @EXPORT_OK = qw(mysub myothersub subthree yellowsub);

sub import {
    my $package = $_[0];
    do_our_own_thing(@_);
    $package->export_to_level(1, @_);
}

The first argument to export_to_level is a call-stack index (identical to that passed to the caller function). This is used to determine the package to export symbols to, thereby allowing export_to_level to be completely package independent. Note that because the package information needs to be preserved intact, it is important that we do not remove the package name passed as the first argument, which is why we used $_[0] and not shift in the preceding example.
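As a runnable sketch of the same pattern, the following script defines a hypothetical My::Logged module inline. Its import method does its "own thing" by recording each request in a package array (an invented stand-in for do_our_own_thing) before handing over to export_to_level:

```perl
#!/usr/bin/perl
# exportlevel.pl - hypothetical single-file demonstration
use strict;
use warnings;

BEGIN {
    package My::Logged;   # hypothetical module, defined inline
    use Exporter;
    our @ISA       = qw(Exporter);
    our @EXPORT_OK = qw(mysub myothersub);
    our @requests;        # our "own thing": a record of import requests

    sub mysub      { "mysub called" }
    sub myothersub { "myothersub called" }

    sub import {
        my $package = $_[0];               # look, but do not shift
        my $caller  = caller;
        my @wanted  = @_[1 .. $#_];
        push @requests, "$caller asked for @wanted";

        # level 1 is the caller of this import method
        $package->export_to_level(1, @_);
    }
}

# stands in for 'use My::Logged qw(mysub)'
BEGIN { My::Logged->import(qw(mysub)) }

my $result    = mysub();
my $has_other = main->can('myothersub') ? 1 : 0;

print "$result\n";
print "$My::Logged::requests[0]\n";
print "myothersub imported: $has_other\n";
```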

Debugging Exports

The Exporter module also has a special verbose mode we can use when we are debugging particularly complex import problems. To enable it, define the variable $Exporter::Verbose before using the module. Note that for this to be successful it needs to be in a BEGIN block:

BEGIN {
    $Exporter::Verbose = 1;
}

Note also that this will produce debug traces for all modules that use Exporter. Since a very large number of modules use Exporter, this may produce a lot of output. However, since BEGIN blocks (including the implicit ones in use statements) are executed in order, we can plant BEGIN blocks in between the use statements to restrain the reporting to just those modules we are interested in:

use Exporter;
use A::Module::Needed::First;

BEGIN { print "Loading...\n"; $Exporter::Verbose = 1;}
use My::Problematic::Exporting::Module;

BEGIN { print "...loaded ok\n"; $Exporter::Verbose = 0;}
use Another::Module;

Package Attributes

Package attributes are an extension to the predefined attribute mechanism provided by the attributes module and covered in Chapter 7. Perl only understands four native attributes by default (lvalue, locked, method, and unique, of which locked and method are now deprecated), but the idea of package attributes is that we can implement our own attributes that work on a package-wide basis. To implement them, we define specially named subroutines within the package. The easiest way to create these is with the Attribute::Handlers module, although the underlying mechanism is not (as is often the case in Perl) all that complex. The attribute mechanism is still somewhat experimental in Perl 5.8, so some of its more idiosyncratic properties are likely to be smoothed out over time.

Each data type may have a different set of attributes associated with it. For example, a scalar attribute is implemented by writing FETCH_SCALAR_ATTRIBUTES and MODIFY_SCALAR_ATTRIBUTES subroutines, and similarly for ARRAY, HASH, and CODE. The package may implement the details of storage and retrieval any way it likes based on the arguments passed.

FETCH_ subroutines are called by attributes::get whenever we use it on a reference of the correct type in the same package. They are called as methods, so after the package name they are passed a single argument, which is a reference to the entity being queried. They should return a list of the attributes defined for that entity.

MODIFY_ subroutines are called during the import stage of compilation. They take a package name and a reference as their first two arguments, followed by a list of the attributes to define. They return a list of unrecognized attributes, which should be empty if all the attributes could be handled.

Both FETCH_ and MODIFY_ subroutines may be accessed by corresponding routines in a package implementing a derived object class. The parent package is called with SUPER::FETCH_TYPE_ATTRIBUTES and SUPER::MODIFY_TYPE_ATTRIBUTES. The intent is that a subclass should first call its parent and then deal with any attributes returned. In the case of FETCH_, it should add its own attributes to the list provided by the parent and return it. In the case of MODIFY_, it should deal with the list of unrecognized attributes passed back from the parent. This is essentially the same mechanism that the Exporter module uses.
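The following sketch implements the two scalar subroutines by hand, without Attribute::Handlers. The Tagged package, the Important attribute, and the %registry hash used for storage are all invented for illustration. A real implementation is free to store attributes any way it likes:

```perl
#!/usr/bin/perl
# rawattrs.pl - hypothetical single-file demonstration
use strict;
use warnings;
use attributes;

package Tagged;   # hypothetical package holding handlers and variables

our %registry;    # maps a stringified reference to its attribute names

# called when a scalar in this package is declared with attributes
sub MODIFY_SCALAR_ATTRIBUTES {
    my ($pkg, $ref, @attrs) = @_;
    my @known   = grep { $_ eq 'Important' } @attrs;
    my @unknown = grep { $_ ne 'Important' } @attrs;
    push @{ $registry{$ref} }, @known;
    return @unknown;   # anything returned here is reported as invalid
}

# called by attributes::get for scalars queried from this package
sub FETCH_SCALAR_ATTRIBUTES {
    my ($pkg, $ref) = @_;
    return @{ $registry{$ref} || [] };
}

my $value : Important = 42;
my @found = attributes::get(\$value);
print "value $value has attributes: @found\n";

# an attribute we do not recognize makes the declaration fail
my $bogus_ok = eval q{ my $other : Bogus; 1 } ? 1 : 0;
print "Bogus accepted: $bogus_ok\n";
```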

Attribute Handlers

With the Attribute::Handlers module, we can invent our own attributes and register handlers to be triggered when a variable or subroutine is declared with them, without any need to get involved in defining explicit FETCH_ and MODIFY_ subroutines. Here is a minimal example that shows an attribute handler in action:

#!/usr/bin/perl
use strict;
use warnings;
use Attribute::Handlers;

{
    package Time;

    sub Now : ATTR(SCALAR) {
        my ($pkg,$sym,$ref,$attr,$data,$when)=@_;
        $$ref=time;
    }
}

my Time $now : Now;
print $now; # produces the time in seconds since 1970/1/1

This creates a handler called Now in the Time package that can be applied to scalar variables; attempting to declare this attribute on an array, hash, or subroutine will cause a syntax error. When a scalar variable is declared and typed to the Time package and then given Now as an attribute, the handler is called. Interesting though this syntax looks, Perl does not really support the typing of variables. Providing a scalar variable with a type is really just a suggestion to Perl to do something with the variable if the opportunity arises. Package attributes are one of the two features that provide a "something," the other being the compile-time checking of hash key accesses in pseudohashes and restricted hashes. The effect of the type is only at compile time; it does not persist into the execution phase.

The handler is passed six values, of which the third, the reference, points to the scalar variable on which the attribute is being defined. The action of the handler is to assign the current time to the dereferenced variable. As a result, when we print the variable out, we find it already has a value of the current time (in seconds since January 1, 1970).

Observant readers will notice that the declaration of the handler subroutine is itself implemented using an attribute called ATTR. The data value associated with it is SCALAR, which tells the ATTR handler—defined in Attribute::Handlers—how to set up the FETCH_ and MODIFY_ subroutines to call the Now subroutine.

The other parameters are as follows:

  • $pkg: The name of the package. In this case, it is Time, but it could also be the name of a package implementing a subclass.
  • $sym: For package declarations, the symbolic name of the variable or subroutine being defined, qualified by its package. In the case of a lexical variable like the one shown previously, there is no symbol and so no name, so the string LEXICAL is passed.
  • $attr: The name of the attribute, here Now.
  • $data: The data passed with the attribute, if any. For ATTR it was SCALAR; for our Now attribute, we didn't pass any.
  • $when: The phase of execution—BEGIN, INIT, CHECK, or END.

By default a handler is executed during the check phase transition of the interpreter, which is to say Perl compiles it as a CHECK block (see earlier in the chapter for more on what a CHECK block is). We can create handlers that execute at any of the four transition points BEGIN, CHECK, INIT, or END, all of them, or a selection. The following example defines a handler in the UNIVERSAL package that executes at BEGIN, INIT, and CHECK. It records the total startup time of all BEGIN blocks (including use statements) that are declared after it, everything that occurs in the CHECK phase transition, and any INIT handlers that were declared before it. For variety, it also defines an attribute for hash variables:

#!/usr/bin/perl
use strict;
use warnings;
use Attribute::Handlers;

{
    package UNIVERSAL;
    use Time::HiRes qw(gettimeofday);

    # calculate the startup time
    sub Startup : ATTR(HASH,BEGIN,INIT,CHECK) {
        my ($pkg,$sym,$ref,$attr,$data,$when)=@_;

        if ($when eq 'BEGIN') {
            # at begin, store current time
            my ($secs,$usecs)=gettimeofday();
            %$ref=( secs => $secs, usecs => $usecs );

            print "Startup BEGIN...\n";
        } elsif ($when eq 'INIT') {
            # at init, calculate time elapsed
            my ($secs,$usecs)=gettimeofday();
            $ref->{secs} = $secs - $ref->{secs};
            $ref->{usecs} = $usecs - $ref->{usecs};
            if ($ref->{usecs} < 0) {
                $ref->{usecs} += 1_000_000;
                $ref->{secs} -= 1;
            }

            print "Startup INIT...\n";
        } else {
            # we could do something time-consuming here
            print "Startup CHECK...\n";
        }
    }

}

our %time : Startup;

BEGIN { print "Beginning...\n";    sleep 1 }; #happens after Startup BEGIN
CHECK { print "Checking...\n";     sleep 1 }; #between Startup BEGIN and INIT
INIT  { print "Initialising...\n"; sleep 1 }; #happens after Startup INIT

print "BEGIN+CHECK took ",$time{secs}*1_000_000+$time{usecs},"uS\n";

Why is this handler declared in the UNIVERSAL package? In this case, mainly because typing a variable (by prefixing it with the name of a package) is an object-oriented mechanism that only works on scalar variables. It works fine for our first example because it is a SCALAR attribute, but this is a handler for hash variables.

Declaring a handler in UNIVERSAL has the useful property of making it available to any and all hashes, anywhere. However, it also allows for the possibility of collisions between different modules. Unfortunately, a colon is not a legal character in an attribute name, so we can't create a handler in the package Time and then declare constructs with it, unless we do so in a package that subclasses from the Time package via the @ISA array or use base pragma.

The preceding handler does not implement a clause for the END phase transition. This might seem like a useful thing to do—after all, we could time the running time of the program that way. But this won't work, because the hash is a lexically scoped variable. Even though it is declared with our and so exists as a package variable, the lexical scope ends before the END block is executed. Consequently, Attribute::Handlers cannot bind the attribute at this phase. As a consequence, we can only usefully define END handlers for subroutine declarations.

Attributes for different data classes can coexist peacefully, although we will need to say no warnings 'redefine' to stop Perl complaining that we have more than one subroutine with the same name. While this is true, the Attribute::Handlers module resolves the problem because the attributes remap the subroutine calls into autogenerated FETCH_ and MODIFY_ subroutines. However, we cannot declare more than one attribute handler for the same type of data but at different phases:

use warnings;
no warnings 'redefine';
sub MyAttr : ATTR(SCALAR,BEGIN,INIT) {...} # first attribute handler is defined
sub MyAttr : ATTR(HASH,BEGIN,INIT) {...}   # Redefine 'MyAttr', different data type, OK
sub MyAttr : ATTR(HASH,CHECK) {...}        # ERROR: same data type again

Without qualification or with the special data type ANY, a handler will be called for all variables and code references. The ANY label allows the phase transitions to be specified, otherwise it is no different from the unqualified version. These handlers will execute for any variable or subroutine for which the attribute is declared:

sub MyAttr : ATTR {...}
sub MyAttr : ATTR(ANY)  {...}
sub MyAttr : ATTR(ANY,BEGIN,INIT) {...}

The data passed to a handler is natively presented as a string containing the whole text between the opening and closing parentheses; it is not treated as normal Perl syntax. However, Attribute::Handlers makes some attempt to parse the string if it looks like it might be defining something other than a string. A comma-separated list is not treated specially, but an opening square or curly brace is, if it is matched at the end. This example illustrates several valid ways to pass data arguments that will be parsed into corresponding data structures:

#!/usr/bin/perl
# attrhandler3.pl
use strict;
use warnings;
use Attribute::Handlers;

{
    package MyPackage;

    sub Set : ATTR(SCALAR) {
        my ($pkg,$sym,$ref,$attr,$data,$when)=@_;
        $$ref=$data;
    }
}
my MyPackage $list : Set(a,b,c);
print "@$list\n";   # produces 'a b c'
my MyPackage $aref : Set([a,b,c]);
print "@$aref\n";   # produces 'ARRAY(0xNNNNNN)'
my MyPackage $string : Set('a,b,c');
print "$string\n";  # produces 'a,b,c'
my MyPackage $href : Set({a=>1,b=>2,c=>3});
print map {
    "$_ => $href->{$_}\n"
} keys %$href;      # produces 'a => 1' ...
my MyPackage $qwaref : Set(qw[a b c]);
print "@$qwaref\n"; # produces 'a b c'

Handlers also allow ways to make otherwise complex syntax simpler by encapsulating it, for example, the tie mechanism. The following example wraps an interface to tie an arbitrary DBM database with any of the standard DBM implementations inside an attribute handler that hides away the details and awkward syntax of the tie and replaces it with an intuitive attribute instead:

#!/usr/bin/perl
use strict;
use warnings;
use Attribute::Handlers;

{
    package UNIVERSAL;
    use Fcntl qw(O_RDWR O_CREAT);

    sub Database : ATTR(HASH) {
        my ($pkg,$sym,$ref,$attr,$data)=@_;

        my ($file,$type,$mode,$perm);
        if (my $reftype=ref $data) {
            die "Data reference not an ARRAY"
                unless $reftype eq 'ARRAY';
            $file = shift @$data;
            $type = shift(@$data) || 'SDBM_File';
            $mode = shift(@$data) || (O_RDWR|O_CREAT);
            $perm = shift(@$data) || 0666;
        } else {
            $file = $data;
            ($type,$mode,$perm)=('SDBM_File',O_RDWR|O_CREAT,0666);
        }

        eval "require ${type}" or
            die "${type} not found";

        tie %$ref, $type, $file, $mode, $perm;
    }
}

my %sdbm : Database(mysdbmfile);
$sdbm{key} = 'value';

my %gdbm : Database('mygdbmfile.dbm',GDBM_File);
$gdbm{key} = 'value';

Since we can be passed either a single string (the database file name) or an array reference (file name plus DBM type, mode, and permissions), the handler needs to check what data type the data parameter actually is. Either way, defaults are filled in if not specified. Other than this, there is not much in the way of real complexity here. Note the quotes on 'mygdbmfile.dbm', though: these are needed because without them the dot will be parsed as a string concatenation and silently disappear from the resulting file name.

If we just want to create a quick and dirty mapping to a tieable module, then we can create handlers automatically with the autotie and autotieref keywords, both of which allow us to construct one or more handlers by simply associating handler names with the module to be tied in a hash reference passed as an argument to the use statement of the Attribute::Handlers module:

#!/usr/bin/perl
# attrhandlerautotie.pl
use strict;
use warnings;
use Attribute::Handlers autotie => {Database => 'MLDBM'};
use Fcntl qw(O_RDWR O_CREAT);

my %dbm : Database(mydbmfile,O_RDWR|O_CREAT,0666);
$dbm{key} = 'value';

Here we use the MLDBM module, which layers storage of nested data structures over an underlying DBM implementation (SDBM_File by default; see perldoc MLDBM for how a different one is selected). We lose the ability to supply helpful defaults, but we need to write no code at all to implement the handler.

The autotieref keyword works identically to autotie, but it additionally passes a reference to the variable being tied as the first argument of the internally generated tie statement, ahead of the attribute's data arguments. This is purely to satisfy those tie classes that need access to the variable they are tied to; use whichever is appropriate to the circumstance.

Creating Installable Modules

An installable module is one that we can bundle up, take somewhere else, and then install by unpacking it and executing an installation script. If we want to make our scripts and modules easily portable between systems, it is far better to automate the process of installation than manually copy files into a library directory. In addition, if we want to distribute the module more widely or upload it to CPAN for the enjoyment of all, we need to make sure that the module is well behaved and has all the right pieces in all the right places. Fortunately, the h2xs utility supplied with Perl automates a great deal of this process for us, allowing us to concentrate on the actual code. (CPAN also has several modules that aim to expand on the features and ease-of-use of h2xs that may be worth investigating, for example, Module::Starter.)

The h2xs utility is technically designed for creating Perl interfaces to C or C++ libraries, but it is perfectly capable of setting up the basic infrastructure for a pure Perl module as well—we just don't avail ourselves of its more advanced features.

Note that an installable module doesn't have to just be one file. Typically a module distribution contains the main module plus a number of other supporting modules, which may or may not be directly usable themselves. The whole ensemble is "the module," whereas the one we actually load into our applications is the primary interface.

Well-Written Modules

When we are writing modules for our own personal use, we can be fairly lax about how they are structured; a package declaration at the top and a 1 at the bottom are all we really need. However, a well-written and well-behaved module for general consumption should have some essential attributes:

  • Its own unique package name: In the case of modules designed for wider distribution, this should be not only chosen wisely but also checked against other modules already available from CPAN to see if it fits well with existing nomenclature. For modules destined for CPAN, consult the module distribution list at http://cpan.org/modules/01modules.index.html and the exhaustive list of uploaded modules in http://cpan.org/modules/03modlist.data.gz.
  • A version number: The main module file should have a version number defined inside it, either in the package variable $VERSION or in a VERSION subroutine that returns a version number.
  • Strict mode: No Perl code should really be without the strict pragma. It must be said that there are several examples of modules in the Perl standard library that do not adhere to these standards. Mostly these are tried and tested modules from early in the development of the standard library that are known to work. For new modules, strict mode is a good idea.
  • Documentation: All subroutine calls, exported and exportable symbols, and configuration details should be written up and distributed along with the module, preferably in the form of Plain Old Documentation (POD) within the main module file (see Chapter 18 for more on POD documents). It is not necessary for every module to be documented if some modules only support other modules and are not intended to be used directly, but all salient features should be there. To be properly structured, the POD document should contain at least the following sections:
    • NAME: The package name and brief description
    • SYNOPSIS: Code example of how the module is used
    • DESCRIPTION: A description of what the module does
    • EXPORT: What the module exports
    • SEE ALSO: Any related modules or Perl documentation
    • HISTORY: Optionally, a history of changes

Tools such as the podselect utility, and the translators that are based on it, are written to look for these sections. They can use properly constructed documentation to extract information intelligently and selectively. Additional optional sections include BUGS, CHANGES, AUTHOR, COPYRIGHT, and SUPPORTED PLATFORMS.

Remembering to do all this can be irritating, which is why h2xs can create a skeleton module with all of the preceding already defined and in place for us. All we have to do is replace the content of each section with something more meaningful.
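As a rough illustration of the checklist above, a minimal hand-written module with these attributes might look like the following. The package name Installable::Example and the greet subroutine are hypothetical placeholders, and the skeleton that h2xs generates differs in detail.

```perl
package Installable::Example;   # hypothetical package name

use strict;
use warnings;

our $VERSION = '0.01';

use Exporter 'import';          # borrow Exporter's import method
our @EXPORT_OK = qw(greet);     # exported only on request

# greet - return a greeting for the given name (placeholder functionality)
sub greet {
    my ($name) = @_;
    return "Hello, $name";
}

1;

=head1 NAME

Installable::Example - Illustrative skeleton of a well-behaved module

=head1 SYNOPSIS

  use Installable::Example qw(greet);
  print greet('world'), "\n";

=head1 DESCRIPTION

Provides a single placeholder subroutine, C<greet>.

=head1 EXPORT

None by default. C<greet> is exported on request.

=head1 SEE ALSO

L<perlmod>, L<perlpod>

=cut
```

Keeping the POD in the same file as the code means the documentation travels with the module and can be read with perldoc once it is installed.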

Creating a Working Directory

The first and main step to use h2xs to create an installable module is to create a working directory tree where the module source code will reside. This resembles a local library directory (and indeed we can use the module directly if we add it to @INC via Perl's -I option or the use lib pragma). In its most basic usage, h2xs creates a directory tree based on the package name we give it and creates an initial module file with all the basic attributes in place. We use -n to name both module and directory structure and -X to tell h2xs not to bother trying to wrap any C or C++ code. For example:

> h2xs -X -n Installable::Module

This creates a directory Installable-Module, inside which are the files and directories listed in Table 10-1.

Table 10-1. Initial Contents of a Distributable Module Directory

File | Description
lib/Installable/Module.pm | This is the Perl module itself.
Makefile.PL | This is a Perl script that generates a makefile for the module. The makefile is generated by simply running Perl on this file (perl Makefile.PL). In turn, the makefile defines various targets, notably dist, which creates a distribution file, and install, which builds and installs the module.
t/Installable-Module.t | A test script to test the module's functionality, which is compatible with the Test::Harness module and which is executed (using Test::Harness) by the test makefile target. We can create our own tests using the Test or Test::Simple/Test::More modules and add them to this script.
Changes | A Changes file that documents the module's history. This file is suppressed by the -C option.
MANIFEST | A list of the files in the distribution. By adding files to this list, we can add them to the distribution that is created by the dist target.

The actual donkeywork of creating the makefile is done by a collection of modules in the ExtUtils family, the principal one being ExtUtils::MakeMaker. A single call to this module actually takes up the bulk of the Makefile.PL script:

use 5.008005;
use ExtUtils::MakeMaker;
# See lib/ExtUtils/MakeMaker.pm for details of how to influence
# the contents of the Makefile that is written.
WriteMakefile(
    NAME              => 'Installable::Module',
    VERSION_FROM      => 'lib/Installable/Module.pm', # finds $VERSION
    PREREQ_PM         => {}, # e.g., Module::Name => 1.1
    ($] >= 5.005 ?     ## Add these new keywords supported since 5.005
      (ABSTRACT_FROM  => 'lib/Installable/Module.pm', # retrieve from module
       AUTHOR         => 'You <your@email>') : ()),
);

This is a newer Makefile.PL; older versions of h2xs create a slightly different directory structure and a Makefile.PL without the trailing definitions. However, the format and use is broadly similar.

Notice the use statement at the start of this file. It requires a Perl version of at least 5.8.5, but only because in this case it was the version of the interpreter that h2xs used. If we know our module doesn't need such a current version, we can override it with the -b option to h2xs. For example, for Perl version 5.005, or 5.5.0 in the new version numbering format, we would use

> h2xs -X -n Installable::Module -b 5.5.0
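The two numbering styles can be compared with the core version module, which is a handy sanity check before choosing a value for -b. This is only a sketch, and it assumes the version module is available (it ships with Perl 5.10 and later).

```perl
use strict;
use warnings;
use version;

# 5.008005 is the decimal form that h2xs wrote into Makefile.PL
my $decimal = version->parse('5.008005');
print "5.008005 in dotted form: ", $decimal->normal, "\n";

# 5.5.0 (dotted) and 5.005 (decimal) name the same release
my $same = version->parse('5.5.0') == version->parse('5.005');
print "5.5.0 and 5.005 are ", ($same ? "the same" : "different"), " version\n";
```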

Sometimes we already have a module, and we just want to convert it into an installable one. The best option here is to create a new module source file and then copy the existing source code from the old module file into it. This way we get the extra files correctly generated by h2xs for us, each in its proper place, and each containing a valid, structurally correct skeleton to aid in adapting the module to conform with the guidelines.

Either way, once we have the directory set up and the appropriate files created within it, we can create a functional and (preferably) fully documented module.
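Before building anything, the module can be exercised straight from the working tree by adding its lib directory to @INC, as mentioned earlier. The sketch below builds a throwaway stand-in for the h2xs tree in a temporary directory so that it runs on its own; in practice we would simply point use lib (or -I) at Installable-Module/lib.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);
use File::Spec;

# Stand-in for the tree h2xs creates: <root>/lib/Installable/Module.pm
my $root   = tempdir(CLEANUP => 1);
my $libdir = File::Spec->catdir($root, 'lib');
make_path(File::Spec->catdir($libdir, 'Installable'));

my $pm = File::Spec->catfile($libdir, 'Installable', 'Module.pm');
open my $fh, '>', $pm or die "Cannot write $pm: $!";
print {$fh} "package Installable::Module;\nour \$VERSION = '0.01';\n1;\n";
close $fh or die "Cannot close $pm: $!";

# Run-time equivalent of "use lib 'Installable-Module/lib'" or perl -I
require lib;
lib->import($libdir);

require Installable::Module;
print "Loaded version ", Installable::Module->VERSION,
      " from $INC{'Installable/Module.pm'}\n";
```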

Building an Installable Package

To create an installable package file from our module source, we only need to create the makefile and then use make dist (or nmake or dmake on a Windows system) to create the distribution file:

> perl Makefile.PL
> make dist

If we have added other modules to our source code or additional files we want to include with the distribution, we add them to the MANIFEST file. At the start, this file contains just the files generated by h2xs, that is, Changes, MANIFEST, Makefile.PL, README, lib/Installable/Module.pm, and t/Installable-Module.t (older versions of h2xs list Module.pm and test.pl instead).
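To check what a MANIFEST currently lists, we can read it with the maniread function of the standard ExtUtils::Manifest module. The following sketch writes a throwaway MANIFEST to a temporary directory so that it is self-contained; the Helper.pm entry is a hypothetical supporting module added by hand.

```perl
use strict;
use warnings;
use ExtUtils::Manifest qw(maniread);
use File::Temp qw(tempdir);
use File::Spec;

# Build a throwaway MANIFEST; a real one lives at the top of the distribution
my $dir      = tempdir(CLEANUP => 1);
my $manifest = File::Spec->catfile($dir, 'MANIFEST');
open my $fh, '>', $manifest or die "Cannot write $manifest: $!";
print {$fh} <<'END';
Changes
MANIFEST
Makefile.PL
lib/Installable/Module.pm
lib/Installable/Helper.pm   hypothetical supporting module added by hand
t/Installable-Module.t
END
close $fh or die "Cannot close $manifest: $!";

# maniread returns a hash reference of file name => trailing comment
my $files = maniread($manifest);
for my $file (sort keys %$files) {
    my $note = $files->{$file} ? "  ($files->{$file})" : '';
    print "$file$note\n";
}
```

Anything after the file name on a MANIFEST line is treated as a comment, which makes it a convenient place to note why a file was added.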

Assuming the make dist executes successfully, we should end up with an archived installation file comprising the package name (with colons replaced by hyphens) and the version number. On a Unix system, our example module gets turned into Installable-Module-0.01.tar.gz. To test it, we can invoke

> make disttest
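The naming rule for the archive is simple enough to express in a few lines, which can be useful in release scripts that need to locate the file afterward. This is a sketch only: dist_archive_name is a hypothetical helper, and the .tar.gz suffix assumes the Unix defaults described above.

```perl
use strict;
use warnings;

# Derive the archive name that "make dist" is expected to produce:
# package name with :: replaced by -, then the version, then .tar.gz
sub dist_archive_name {
    my ($package, $version) = @_;
    (my $dist = $package) =~ s/::/-/g;
    return "$dist-$version.tar.gz";
}

print dist_archive_name('Installable::Module', '0.01'), "\n";
```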

We can now take this package to another system and unpack it with

> gunzip Installable-Module-0.01.tar.gz
> tar -xvf Installable-Module-0.01.tar

Once the source is unpacked, we create the makefile and run the install target from it.

> cd Installable-Module-0.01
> perl Makefile.PL
> make
> make test
> su
Password:
# make install

This will install files into the default installation location, which is determined by the INSTALLDIRS setting (current versions of ExtUtils::MakeMaker default to site_perl for modules that are not part of the Perl core). We can also explicitly request installation into the site_perl directory under Perl's main installation tree with the install_site target:

> su
Password:
# make install_site

Alternatively, we can have the install target place the module in the site_perl directory automatically by adding a definition for INSTALLDIRS to the key-value pairs passed to WriteMakefile:

use 5.005005;
use ExtUtils::MakeMaker;
# See lib/ExtUtils/MakeMaker.pm for details of how to influence
# the contents of the Makefile that is written.
WriteMakefile(
    'INSTALLDIRS' => 'site',
    ...as before...
);

The valid values for this parameter are perl, site, and vendor. Of the three, site is really the only good choice if we want to keep our own modules from entering the official Perl library. Note that we will need to have permission to actually install the file anywhere under the standard Perl library root. Once the installation is complete, we should be able to see details of it by running perldoc perllocal.

Alternatively, to install a module into our own separate location, we can supply the LIB or PREFIX parameters when we create the makefile. For example, to install modules into a master library directory lib/perl in our home directory on a Unix system, we could type

> cd Installable-Module-0.01
> perl Makefile.PL PREFIX=~/lib/perl
> make install

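The PREFIX parameter overrides the initial part of all installation paths, allowing installation into a different location. The various installation locations for modules, manual pages, and scripts are given sensible defaults derived from this initial path. Individual paths can then be overridden specifically if necessary with the INSTALL* family of parameters; the similarly named INST_* parameters set the staging directories used during the build (under blib), not the final locations. For the perl set of directories, the parameters are as follows (the site and vendor sets use corresponding names such as INSTALLSITELIB and INSTALLVENDORLIB):

  • INSTALLARCHLIB: Architecture-dependent files
  • INSTALLPRIVLIB: Primary module source files
  • INSTALLBIN: Binary executables
  • INSTALLSCRIPT: Scripts
  • INSTALLMAN1DIR: Section 1 manual pages
  • INSTALLMAN3DIR: Section 3 manual pages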

PREFIX is therefore ideal for installing a module into a private local directory for testing.

The LIB parameter allows the implementation files of a module to be installed in a nonstandard place, but with accompanying files such as scripts and manual pages sent to a default location or those derived from PREFIX. This makes the module findable by documentation queries (for example, the man command on Unix) while allowing it to reside elsewhere.

Adding a Test Script

The makefile generated by ExtUtils::MakeMaker contains an impressively large number of different make targets. Amongst them is the test target, which executes the test script generated by h2xs (t/Installable-Module.t in our example, or test.pl with older versions of h2xs). To add a test stage to our package, we only have to edit this file to add the tests we want to carry out.

Tests are carried out under the aegis of the Test::Harness module, which we will cover in Chapter 17, and which is particularly aimed at testing installable packages. The Test::Harness module expects a particular kind of output, which the pregenerated test script satisfies with a redundant automatically succeeding test. To create a useful test, we need to replace this pregenerated script with one that actually carries out tests and produces output that complies with what the Test::Harness module expects to see.
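A replacement test script might be sketched as follows using Test::More, which produces the "ok"/"not ok" lines that Test::Harness parses. The add subroutine is hypothetical, and the inline package merely stands in for use Installable::Module so that the sketch runs on its own.

```perl
#!/usr/bin/perl
# A sketch of t/Installable-Module.t; the add subroutine is hypothetical
use strict;
use warnings;
use Test::More;

# Stand-in for "use Installable::Module;" so the sketch is self-contained
{
    package Installable::Module;
    our $VERSION = '0.01';
    sub add { my ($x, $y) = @_; return $x + $y }
}

# Each call prints an "ok N" or "not ok N" line for Test::Harness to parse
my $has_version = ok(defined(Installable::Module->VERSION),
                     'module has a version');
my $adds        = is(Installable::Module::add(2, 3), 5,
                     'add() sums its arguments');

# Emits the plan line that tells the harness how many tests were run
done_testing();
```

Using done_testing rather than a fixed plan means the test count does not have to be updated by hand each time a test is added.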

Once we have a real test script that carries out genuine tests in place, we can use it by invoking the test target, as we saw in the installation examples earlier:

> make test

By default the install target does not include test as a dependent target, so we do need to run it separately if we want to be sure the module works. The CPAN module automatically carries out the test stage before the install stage, however, so when we install modules using it we don't have to remember the test stage.

Uploading Modules to CPAN

Once a module has been successfully turned into a package (and preferably reinstalled, tested, and generally proven), it is potentially a candidate for CPAN. Uploading a module to CPAN allows it to be shared among other Perl programmers, commented on, improved, and made part of the library of Perl modules available to all within the Perl community.

This is just the functional stage of creating a module for general distribution, however. Packages cannot be uploaded to CPAN arbitrarily. First we need to get registered so we have an upload directory to upload things into. It also helps to discuss modules with other programmers and see what else is already available that might do a similar job. It definitely helps to choose a good package name and to discuss the choice first. Remember that Perl is a community as well as a language; for contributions to be accepted (and indeed, noticed at all), it helps to talk about them.

Information on registration and other aspects of contribution to CPAN is detailed on the Perl Authors Upload Server (PAUSE) page at http://cpan.org/modules/04pause.html (or any mirror). The module distribution list is at http://cpan.org/modules/01modules.index.html, while details of all the modules currently held by CPAN and its many mirrors are in http://cpan.org/modules/03modlist.data.gz.

Summary

In this chapter, we explored the insides of modules and packages and how to write our own modules. We saw how packages affect the symbol table and looked at a few ways to take advantage of this knowledge to examine and even manipulate package contents programmatically.

We then looked at Perl's special phase transition blocks, BEGIN, CHECK, INIT, and END, and how we can use them to create modules that can initialize themselves and carry out various kinds of checks between phases of the interpreter's operation.

The next main topic discussed was the autoloading mechanism, which allows us to intercept calls to subroutines that do not exist and define them on the fly if we want to. From there we looked at importing and exporting, completing the discussion started in the previous chapter from the viewpoint of the module being imported from. We looked at the basics of the import mechanism, how we can use it to do other things than importing, and how to use the Exporter module to handle many common import and export requirements.

We also looked at package attributes and implementing our own attributes for subroutines and variables with the Attribute::Handlers module. Like the import/export discussion, this completes, from the perspective of the implementing module, the discussion started in Chapter 7, where we introduced Perl's built-in attributes.

Finally, we went through the process of creating an installable module, including the use of h2xs to create the initial working directory and files, bundling the completed module into a distributable archive, and then installing the archive on another platform.
