Chapter 17. Modules

Any fool can make things bigger, more complex, and more violent. It takes a touch of genius—and a lot of courage—to move in the opposite direction.

Albert Einstein

Code reuse is a core best practice, and modules are Perl's principal large-scale mechanism for code reuse. They are also at the heart of Perl's greatest software asset: the CPAN.

Refactoring source code into modules will not only increase the reusability of that code, it is also likely to make the code cleaner[113] and easier to maintain. If nothing else, the programs from which the original code is removed will become shorter, better abstracted, and consequently more maintainable.

The keys to good module design and implementation are: designing the interface first, keeping that interface small and functional, using a standard implementation template, and not reinventing the wheel. The guidelines in this chapter explore these issues.

Interfaces

Design the module's interface first.

The most important aspect of any module is not how it implements the facilities it provides, but the way in which it provides those facilities in the first place. If the module's API is too awkward, or too complex, or too extensive, or too fragmented, or even just poorly named, developers will avoid using it. They'll write their own code instead.

In that way, a poorly designed module can actually reduce the overall maintainability of a system.

Designing module interfaces requires both experience and creativity. The easiest way to work out how an interface should work is to "play test" it: to write examples of code that will use the module before the module itself is implemented.[114] The key is to write that code as if the module were already available, and to write it the way you'd most like the module to work.

Once you have some idea of the interface you want to create, convert your "play tests" into actual tests (see Chapter 18). Then it's just a Simple Matter Of Programming to make the module work the way that the code examples and tests want it to.

Of course, it may not be possible for the module to work the way you'd most like, in which case attempting to implement it that way will help you determine what aspects of your API are not practical, and allow you to work out what might be an acceptable alternative.

For example, when the IO::Prompt module (see Chapter 10) was being designed, having potential clients write hypothetical code fragments quickly made it obvious that what was needed was a drop-in replacement for the <> input operator. That is, to replace:

CMD:
while (my $cmd = <>) {
    chomp $cmd;
    last CMD if $cmd =~ m/\A (?: q(?:uit)? | bye ) \z/xms;

    my $args;
    if ($takes_arg{$cmd}) {
        $args = <>;
        chomp $args;
    }

    exec_cmd($cmd, $args);
}

with:

CMD:
while (my $cmd = prompt 'Cmd: ') {
    chomp $cmd;
    last CMD if $cmd =~ m/\A (?: q(?:uit)? | bye ) \z/xms;

    my $args;
    if ($takes_arg{$cmd}) {
        $args = prompt 'Args: ';
        chomp $args;
    }

    exec_cmd($cmd, $args);
}

But to make this work, prompt( ) would have to reproduce the special test that a while (<>) performs on the result of the readline operation. That is, the result of a prompt( ) call had to automatically test for definedness in a boolean context, rather than for simple truth. Otherwise, a user typing in a zero or an empty line would cause the loop to terminate. This requirement constrained the prompt( ) subroutine to return an object with an overloaded boolean test method, rather than a simple string.
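
For example, the value returned by prompt( ) might be built along the following lines. This is an illustrative sketch only: the class name and internal representation here are hypothetical, not IO::Prompt's actual implementation.

package IO::Prompt::ReturnVal;

use overload (
    # Defined input (even q{0} or an empty line) is true; EOF (undef) is false...
    q{bool} => sub {
        my ($self) = @_;
        return defined ${$self};
    },

    # In string contexts, act like the raw input (or an empty string at EOF)...
    q{""} => sub {
        my ($self) = @_;
        return defined ${$self} ? ${$self} : q{};
    },

    fallback => 1,
);

sub new {
    my ($class, $input) = @_;    # $input is the result of a readline call
    return bless \$input, $class;
}

1; # Magic true value required at end of module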

The module didn't exist at that point but, by programming with it anyway, the interface it would require had started to become clear.

Examining the code examples soon made it obvious that virtually every call to prompt( ) was going to be immediately followed by a chomp on the result. So it seemed obvious that prompted values should be automatically chomped. Except that there was one developer who submitted a sample code fragment that didn't chomp the input after prompting:

# Print only unique lines (retaining their order)...
INPUT:
while (my $line = prompt '> ') {
    next INPUT if $seen{$line};
    print $line;
    $seen{$line} = 1;
}

This result initially suggested that the IO::Prompt module's interface needed a separate prompt_line( ) subroutine as well:

# Print only unique lines (retaining their order)...
INPUT:
while (my $line = prompt_line '> ') {
    next INPUT if $seen{$line};
    print $line;
    $seen{$line} = 1;
}

However, in further play-testing, prompt_line( ) proved to have exactly the same set of options as prompt( ) and exactly the same behaviour in every respect except for autochomping. There seemed no justification for doubling the size of the interface, when the same effect could be achieved merely by adding an extra -line option to prompt( ):

# Print only unique lines (retaining their order)...
INPUT:
while (my $line = prompt -line, '> ') {
    next INPUT if $seen{$line};
    print $line;
    $seen{$line} = 1;
}

This last decision was a consequence of a more general module design principle. If a module accomplishes a single "composable" task (e.g., prompt for input with some combination of echo-control, chomping, menu-generation, input constraints, default values), then it's better to provide that functionality through a single subroutine with multiple options, as IO::Prompt provides. On the other hand, if a module handles several related but distinct tasks (for example, find the unique elements in a list, find the maximum of a set of strings, sum a list of numbers), then those facilities are better supplied via separate functions, as List::Util does.
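
The difference shows up clearly in client code. The combined prompt( ) call here is hypothetical (an illustration of the option-based style, not a verbatim IO::Prompt incantation); the List::Util calls are that module's real interface.

# One composable task: a single subroutine, configured by options...
my $choice = prompt 'Choose: ', -line, -menu => @answers;

# Several related but distinct tasks: separate, individually imported functions...
use List::Util qw( first max sum );

my $highest = max @scores;                 # Maximum of a list of numbers
my $total   = sum @scores;                 # Sum of a list of numbers
my $passed  = first { $_ >= 90 } @scores;  # First element satisfying a test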

In one particular hypothetical program, the programmers had wanted to build a menu of items and then prompt for a choice. They had written a utility subroutine using prompt( ):

sub menu {
    my ($prompt_str, @choices) = @_;

    # Starting at a, list the options in a menu...
    my $letter = 'a';
    print "$prompt_str
";
    for my $alternative (@choices) {
        print "	", $letter++, ". $alternative
";
    }

    CHOICE:
    while (1) {
        # Take the first key pressed...
        my $choice = prompt 'Choose: ';

        # Reject any choice outside the valid range...
        redo CHOICE if $choice lt 'a' || $choice ge $letter;

        # Translate choice back to an index; return the corresponding data...
        return $choices[ ord($choice)-ord('a') ];
    }
}

# and later...

my $answer = menu('Which is the most correct answer: ', @answers);

This seemed likely to be a common requirement, so a more sophisticated version of this menu( ) subroutine was added into the proposed IO::Prompt interface:

my $answer = prompt 'Choose the most correct answer: ',
                    -menu => @answers;

All of these decisions, and many others, were reached before the first version of the module was implemented, and many of the interface requirements that were uncovered through this play-testing were not part of the original design.

Some of them made the implementation code more complex than it otherwise would have been, but the result was that the "natural" and "obvious" code submitted by the play-testers eventually worked exactly as they had imagined. That, in turn, makes it far more likely that they will use the actual module.

Refactoring

Place original code inline.

Place duplicated code in a subroutine.

Place duplicated subroutines in a module.

The first time you're tempted to copy-paste-and-modify a piece of code:

package Process::Queue;
use Carp;
{
    use overload (
        # Type coercions don't make sense for process queues...
        q{""}   => sub {
            croak q{Can't stringify a Process::Queue};
        },
        q{0+}   => sub {
            croak q{Can't numerify a Process::Queue };
        },
        q{bool} => sub {
            croak q{Can't get the boolean value of a Process::Queue };
        },
    );
}
# and later...

package Socket;
use Carp;
{
    use overload (
        # Type coercions don't make sense for sockets...
        q{""}   => sub {
            croak q{Can't convert a Socket to a string};
        },
        q{0+}   => sub {
            croak q{Can't convert a Socket to a number};
        },
        q{bool} => sub {
            croak q{Can't get the boolean value of a Socket };
        },
    );
}

…don't do it!

Instead, convert the code into a subroutine, parameterize the parts you would have modified, and then replace both the original and duplicated code with calls to that subroutine:

use Carp;

sub _Class::cannot {
    # What kind of coercion cannot be done?
    my ($coerce) = @_;

    # Build a subroutine with the corresponding error message...
    return sub {
        my ($self) = @_;
        croak sprintf qq{Can't $coerce}, ref $self;
    };
}

# and later...

package Process::Queue;
{
    use overload (
        # Type coercions don't make sense for process queues...
        q{""}   => _Class::cannot('stringify a %s'),
        q{0+}   => _Class::cannot('numerify a %s'),
        q{bool} => _Class::cannot('get the boolean value of a %s'),
    );
}

# and later still...

package Socket;
{
    use overload (
        # Type coercions don't make sense for sockets...
        q{""}   => _Class::cannot('stringify a %s'),
        q{0+}   => _Class::cannot('numerify a %s'),
        q{bool} => _Class::cannot('get the boolean value of a %s'),
    );
}

This refactoring might produce slightly more code, but that code will be cleaner, more self-documenting, and easier to maintain. And the next time you need the same functionality, the total amount of code will almost certainly be less than cutting and pasting would have produced.

Note that factoring out the messages still left a chunk of repeated code in each class. In such cases you should re-refactor the code:

use Carp;

sub _Class::cannot {
    # What kind of coercion cannot be done?
    my ($coerce) = @_;
    # Build a subroutine with the corresponding error message...
    return sub {
        my ($self) = @_;
        croak sprintf qq{Can't $coerce}, ref $self;
    };
}

sub _Class::allows_no_coercions {
    return (
        q{""}   => _Class::cannot('stringify a %s'),
        q{0+}   => _Class::cannot('numerify a %s'),
        q{bool} => _Class::cannot('get the boolean value of a %s'),
    );
}

# and later...


package Process::Queue;
{
    # Type coercions don't make sense for process queues...
    use overload  _Class::allows_no_coercions( );
}

# and later still...

package Socket;
{
    # Type coercions don't make sense for sockets...
    use overload _Class::allows_no_coercions( );
}

The first time you're tempted to copy and paste a subroutine definition into some other file, program, or system…don't do that either! Instead, place the subroutine in a module and export it:

package Coding::Toolkit::Coercions;
use Carp;

sub _Class::cannot {
    # What kind of coercion cannot be done?
    my ($coerce) = @_;

    # Build a subroutine with the corresponding error message...
    return sub {
        my ($self) = @_;
        croak sprintf qq{Can't $coerce}, ref $self;
    };
}

sub _Class::allows_no_coercions {
    return (
        q{""}   => _Class::cannot('stringify a %s'),
        q{0+}   => _Class::cannot('numerify a %s'),
        q{bool} => _Class::cannot('get the boolean value of a %s'),
    );
}

1; # Magic true value required at the end of any module

Then import it wherever you need it:

use Coding::Toolkit::Coercions;

package Process::Queue;
{
    # Type coercions don't make sense for process queues...
    use overload _Class::allows_no_coercions( );
}

Version Numbers

Use three-part version numbers.

When specifying the version number of a module, don't use vstrings:

our $VERSION = v1.0.3;

They will break your code when it's run under older (pre-5.8.1) versions of Perl. They will also break it under newer versions of Perl, as they're deprecated in the 5.9 development branch and will be removed in the 5.10 release.

They're being removed because they're error-prone; in particular, because they're actually just weirdly specified character strings. For example, v1.0.3 is just shorthand for the character string "\x{1}\x{0}\x{3}". So vstrings don't compare correctly under numeric comparison.
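
A small demonstration of the trap:

my $version = v1.0.3;    # Actually the three-character string "\x{1}\x{0}\x{3}"

# Under numeric comparison, that string numifies to zero (with an
# "isn't numeric" warning), so this test can never succeed...
if ($version > 1.0) {
    print "This line is never reached\n";
}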

Don't use floating-point version numbers, either:

our $VERSION = 1.000_03;

It's too easy to get them wrong, as the preceding example does: it's equivalent to 1.0.30, not 1.0.3.

Instead, use the version CPAN module and the qv(...) version-object constructor:


use version; our $VERSION = qv('1.0.3');

The resulting version objects are much more robust. In particular, they compare correctly under either numeric or string comparisons.

Note that, in the previous example, the use version statement and the $VERSION assignment were written on the same line. Loading and using the module in a single line is important, because it's likely that many users of your module will install it using either the ExtUtils::MakeMaker module or the Module::Build module. Each of these modules will attempt to extract and then evaluate the $VERSION assignment line in your module's source code, in order to ascertain the module's version number. But neither of them supports qv'd version numbers directly.[115] By placing the $VERSION assignment on the same line as the use version, you ensure that when that line is extracted and executed, the qv( ) subroutine is correctly loaded from version.pm.

The module also supports the common CPAN practice of marking unstable development releases by appending an underscored alpha count to the version number of the previous stable release:

# This is the 12th alpha built on top of the 1.5 release...
use version; our $VERSION = qv('1.5_12');

These "alpha versions" will also compare correctly. That is, qv('1.5_12') compares greater than qv('1.5') but less than qv('1.6').

Version Requirements

Enforce your version requirements programmatically.

Telling future maintainers about a module's version requirements is certainly a good practice:

package Payload;
# Only works under 5.6.1 and later

use IO::Prompt;                 # must be 0.2.0 or better, but not 0.3.1
use List::Util qw( max );       # must be 1.13 or better
use Benchmark qw( cmpthese );   # but no later than version 1.52

# etc.

But telling Perl itself about these constraints is an even better practice, as the compiler can then enforce those requirements.

Perl has a built-in mechanism to do (some of) that enforcement for you. If you call use with a decimal number instead of a module name, the compiler will throw an exception if Perl's own version number is less than the one you specified:

package Payload;
use 5.006001;           # Only works under 5.6.1 and later

Unfortunately, that version number has to be an old-style decimal version. You can't use the version module's qv( ) subroutine (as recommended in the previous guideline), because the compiler interprets the qv identifier as the name of a module to be loaded:

package Payload;
use version;
use qv('5.6.1');        # Tries to load qv.pm

If you load a module with a normal use, but place a decimal version number after its name and before any argument list, then the compiler calls the module's VERSION method, which defaults to throwing an exception if the module's $VERSION variable is less than the version number that was specified:

use IO::Prompt  0.002;              # must be 0.2.0 or better
use List::Util  1.13   qw( max );   # must be 1.13 or better

Note that there are no commas on either side of the version number; that's how the compiler knows it's a version restriction, rather than just another argument to the module's import( ) subroutine.
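
That version-checking form of use is roughly equivalent to the following sketch (the default VERSION( ) method is inherited from the UNIVERSAL class):

# What  use List::Util 1.13 qw( max );  effectively means...
BEGIN {
    require List::Util;
    List::Util->VERSION(1.13);      # Throws an exception if $List::Util::VERSION < 1.13
    List::Util->import(qw( max ));
}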

Once again, the version number has to be an old-style decimal version. A qv( ) isn't recognized:

use IO::Prompt qv('0.2.0') qw( prompt );    # Syntax error

Perl doesn't provide a built-in way to specify "no later version than… " or "any version except… ", apart from testing those conditions explicitly:

package Payload;
use version;
use Carp;

use IO::Prompt qw( prompt );
use Benchmark qw( cmpthese );

# Version compatibility...
BEGIN {
    # Test against compiler version in $]
    # (there's no nice use English name for it)
    croak 'Payload only works under 5.6.1 and later, but not 5.8.0'
        if $] < qv('5.6.1') || $] == qv('5.8.0'),

    croak 'IO::Prompt must be 0.2.0 or better, but not 0.3.1 to 0.3.3'
        if $IO::Prompt::VERSION < qv('0.2.0')
        || $IO::Prompt::VERSION >= qv('0.3.1')
           && $IO::Prompt::VERSION <= qv('0.3.3'),

    croak 'Benchmark must be no later than version 1.52'
        if $Benchmark::VERSION > qv('1.52') ;
}

This approach is tedious, repetitive, and error-prone, so naturally there's a module on the CPAN to simplify the process of loading a module and verifying that its version number is acceptable. The module is named only and it can be used (under Perl 5.6.1 and later) like so:

package Payload;

# Works only under Perl 5.6.1 and later, but not 5.8.0
use only q{ 5.6.1-  !5.8.0 };

# IO::Prompt must be 0.2.0 or better, but not 0.3.1 to 0.3.3
use only 'IO::Prompt' => q{ 0.2-  !0.3.1-0.3.3 },  qw( prompt );

# Benchmark must be no later than version 1.52
use only Benchmark => q{ -1.52 },  qw( cmpthese );

That is, you write use only, followed by the module name, followed by a single string that specifies the range of versions that are acceptable. The use only first loads the module you requested, then checks whether the version it loaded matches the range of versions you specified.

You can specify ranges of acceptable versions ('1.2.1-1.2.8'), or a minimum acceptable version ('2.7.3-'), or a maximum acceptable version ('-1.9.17'). You can negate any of these, to specify unacceptable versions ('!1.2.7', '!3.2-3.2.9'). Most importantly, you can combine these different types of specification to provide "ranges with holes". For example, in the previous example:

use only 'IO::Prompt' => '0.2-  !0.3.1-0.3.3',  qw( prompt );

this means "any version at or above 0.2, except those in the range 0.3.1 to 0.3.3".

The only module has other, even more powerful features. It provides exceptional support for legacy applications that may rely on outdated versions of particular modules. You can install multiple versions of the same module, and then use only will select the most appropriate available version of the module for each program.

Exporting

Export judiciously and, where possible, only by request.

As with classes (see Chapter 15), modules should aim for an optimal interface, rather than a minimal one. In particular, you should provide any non-fundamental utility subroutines that client coders will frequently need, and are therefore likely to (re-)write themselves.

On the other hand, it's also important to minimize the number of subroutines that are exported by default. Especially if those subroutines have common names. For example, if you're writing a module to support software testing, then you might want to provide subroutines like ok( ), skip( ), pass( ), and fail( ):

package Test::Utils;

use base qw( Exporter );
our @EXPORT = qw( ok skip pass fail );    # Will export these by default

# [subroutine definitions here]

But exporting those subroutines by default can make the module more difficult to use, because the names of those subroutines may collide with subroutine or method definitions in the software you're testing:

use Perl6::Rules;   # CPAN module implements a subset of Perl 6 regexes
use Test::Utils;    # Let's test it...

my ($matched)
    = 'abc' =~ m{ ab {ok 1} d    # Test nested code blocks in regexes
                | {ok 2; fail}   # Test explicit failure of alternatives
                | abc {ok 3}     # Test successful matches
                }xms;

if ($matched) {
    ok(4);
}

Unfortunately, both the Perl6::Rules and Test::Utils modules export a fail( ) subroutine by default. As a result, the example test is subtly broken, because the Test::Utils::fail( ) subroutine has been exported "over the top of" the previously exported Perl6::Rules::fail( ) subroutine. So the fail( ) call inside the regex isn't invoking the expected fail( ).

The point is that both modules are behaving badly. Neither of them ought to be exporting a subroutine with a name like fail( ) by default; they should be allowing those subroutines to be exported only by explicit request. For example:

package Test::Utils;

use base qw( Exporter );
our @EXPORT_OK = qw( ok skip pass fail );    # Can export these, on request

# [subroutine definitions here]

In that case, both of them would then have to be loaded like so:

use Perl6::Rules qw( fail );
use Test::Utils  qw( ok skip );

and there would no longer be a conflict. Or, if they were going to collide, then the conflict would be immediately obvious:

use Perl6::Rules qw( fail );
use Test::Utils  qw( ok skip fail );

So make the interface of a module exportable on request, rather than exported by default.

The only exception to this guideline would be a module whose central purpose is always to make certain subroutines available (as IO::Prompt does with the prompt( ) subroutine, or Perl6::Slurp does with slurp( )—see Chapter 10). If a module will always be used because programmers definitely want one particular subroutine it provides, then it's cleaner to export that subroutine by default:

package Perl6::Slurp;

use base qw( Exporter );

our @EXPORT = qw( slurp );    # The whole point of loading this module
                              # is to then call slurp( )

Declarative Exporting

Consider exporting declaratively.

The Exporter module has served Perl well over many years, but it's not without its flaws.

For a start, its interface is ungainly and hard to remember, which leads to unsanitary cutting and pasting. That interface also relies on subroutine names stored as strings in package variables. This design imposes all the inherent problems of using package variables, as well as the problems of symbolic references (see Chapters 5 and 11).

It's also redundant: you have to name each subroutine at least twice—once in its declaration and again in one (or more) of the export lists. And if those disadvantages weren't enough, there's also the ever-present risk of not successfully naming a particular subroutine twice, by misspelling it in one of the export lists.

Exporter also allows you to export variables from a module. Using variables as part of your interface is a bad interface practice (see the following guideline, "Interface Variables"), but actually aliasing them into another package is even worse. For a start, exported variables are ignored by use strict, so they may mask other problems in your code. But more importantly, exporting a module's state variables exposes that module's internal state in such a way that it can be modified without the module's name even appearing in the assignment:

use Serialize ($depth);

# and much later...

$depth = -20;        # Change the internal state of the Serialize module

That's neither obvious, nor robust, nor comprehensible, nor easy to maintain.

To set up a module with a full range of export facilities, including default exports, exports-by-request, and tagged export sets, you have to write something like this:

package Test::Utils;

use base qw( Exporter );

our @EXPORT    = qw( ok );                # Default export
our @EXPORT_OK = qw( skip pass fail );    # By explicit request only
our %EXPORT_TAGS = (
    ALL  => [@EXPORT, @EXPORT_OK],        # Everything if :ALL tagset requested
    TEST => [qw( ok pass fail )],         # These if :TEST tagset requested
    PASS => [qw( ok pass )],              # These if :PASS tagset requested
);

sub ok   {...}
sub pass {...}
sub fail {...}
sub skip {...}

The large amount of infrastructure code required to set up this interface can obscure what's actually being accomplished, which makes it harder to know if what's been accomplished is what was supposed to be accomplished.

A cleaner alternative is to use the Perl6::Export::Attrs CPAN module. With this module there is no separate specification of the export list. Instead, you just annotate the subroutines that you want exported, saying how you want them exported: by default, by request, or as part of particular tagsets.

Using Perl6::Export::Attrs, the export behaviour set up in the previous example could be specified with just:

package Test::Utils;
use Perl6::Export::Attrs;

sub ok   :Export( :DEFAULT, :TEST, :PASS ) {...}
sub pass :Export(           :TEST, :PASS ) {...}
sub fail :Export(           :TEST        ) {...}
sub skip :Export                           {...}

These annotated definitions specify precisely the same behaviour as the earlier Exporter-based code. Namely that:

  • ok( ) will be exported when requested by name, or when the :TEST or :PASS tagset is requested. It will also be exported by default when no exports are explicitly requested.

  • pass( ) will be exported when requested by name, or when the :TEST or :PASS tagset is requested.

  • fail( ) will be exported when requested by name, or when the :TEST tagset is requested.

  • skip( ) will be exported only when specifically requested by name.

  • Every subroutine marked :Export will automatically be exported if the :ALL tagset is requested.
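
Client code requests these exports in the usual Exporter-like way:

use Test::Utils;                     # Imports ok( ) only (the default)
use Test::Utils qw( skip );          # Imports skip( ) only
use Test::Utils qw( :TEST );         # Imports ok( ), pass( ), and fail( )
use Test::Utils qw( skip :PASS );    # Imports skip( ), ok( ), and pass( )
use Test::Utils qw( :ALL );          # Imports every subroutine marked :Export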

Interface Variables

Never make variables part of a module's interface.

Variables make highly unsatisfactory interface components. They offer no control over who accesses their values, or how those values are changed. They expose part of the module's internal state information to the client code, and they provide no easy way to later impose constraints on how that state is used or modified.

This, in turn, forces every component of the module to re-verify any interface variable whenever it's used. For example, consider the parts of a module for serializing Perl data structures[116] shown in Example 17-1.

Example 17-1. Variables as a module's interface

package Serialize;
use Carp;
use Readonly;
use Perl6::Export::Attrs;
use List::Util qw( max );

Readonly my $MAX_DEPTH => 100;

# Package variables that specify shared features of the module...
our $compaction = 'none';
our $depth      = $MAX_DEPTH;

# Table of compaction tools...
my %compactor = (
  # Value of      Subroutine returning
  # $compaction   compacted form of arg
      none     =>   sub { return shift },
      zip      =>   &compact_with_zip,
      gzip     =>   &compact_with_gzip,
      bz       =>   &compact_with_bz,
      # etc.
);

# Subroutine to serialize a data structure, passed by reference...
sub freeze : Export {
    my ($data_structure_ref) = @_;

    # Check whether the $depth variable has a sensible value...
    $depth = max(0, $depth);

    # Perform actual serialization...
    my $frozen = _serialize($data_structure_ref);

    # Check whether the $compaction variable has a sensible value...
    croak "Unknown compaction type: $compaction"
        if ! exists $compactor{$compaction};

    # Return the compacted form...
    return $compactor{$compaction}->($frozen);
}

# and elsewhere...

use Serialize qw( freeze );

$Serialize::depth      = -20;        # oops!
$Serialize::compaction = 1;          # OOPS!!!

# and later...

my $frozen_data = freeze($data_ref);     # BOOM!!!

Because the serialization depth and compaction mode are set via variables, the freeze( ) subroutine has to check those variables every time it's called. Moreover, if the variables are incorrectly set (as they are in the previous example), that fact will not be detected until freeze( ) is actually called. That might be hundreds of lines later, or in a different subroutine, or even in a different module entirely. That's going to make tracking down the source of the error very much harder.

The cleaner, safer, more future-proof alternative is to provide subroutines via which the client code can set state information, as illustrated in Example 17-2. By verifying the new state as it's set, errors such as negative depths and invalid compaction schemes will be detected and reported where and when they occur. Better still, those errors can sometimes be corrected on the fly, as the set_depth( ) subroutine demonstrates.

Example 17-2. Accessor subroutines instead of interface variables

package Serialize;
use Carp;
use Readonly;
use Perl6::Export::Attrs;

Readonly my $MAX_DEPTH => 100;

# Lexical variables that specify shared features of the module...
my $compaction = 'none';
my $depth      = $MAX_DEPTH;

# Table of compaction tools...
my %compactor = (
  # Value of       Subroutine returning
  # $compaction    compacted form of arg
      none     =>   sub { return shift },
      zip      =>   &compact_with_zip,
      gzip     =>   &compact_with_gzip,
      bz       =>   &compact_with_bz,
      # etc.
);

# Accessor subroutines for state variables...

sub set_compaction {
    my ($new_compaction) = @_;

    # Has to be a compaction type from the table...
    croak "Unknown compaction type ($new_compaction)"
        if !exists $compactor{$new_compaction};

    # If so, remember it...
    $compaction = $new_compaction;

    return;
}

sub set_depth {
    my ($new_depth) = @_;

    # Any non-negative depth is okay...
    if ($new_depth >= 0) {
        $depth = $new_depth;
    }
    # Any negative depth is an error, so fix it and report...
    else {
        $depth = 0;
        carp "Negative depth ($new_depth) interpreted as zero";
    }

    return;
}

# Subroutine to serialize a data structure, passed by reference...
sub freeze : Export {
    my ($data_structure_ref) = @_;

    return $compactor{$compaction}->( _serialize($data_structure_ref) );
}

# and elsewhere...

use Serialize qw( freeze );

Serialize::set_depth(-20);         # Warning issued and value normalized to zero
Serialize::set_compaction(1);      # Exception thrown here

# and later...

my $frozen_data = freeze($data_ref);

Note that although subroutines are undoubtedly safer than raw package variables, you are still modifying non-local state information through them. Any change you make to a package's internal state can potentially affect every user of that package, at any point in your program.

Often, a better solution is to recast the module as a class. Then any code that needs to alter some internal configuration or state can create its own object of the class, and modify that object's internal state instead. Using that approach, the package shown in Example 17-2 would be rewritten as shown in Example 17-3.

Example 17-3. Objects instead of accessor subroutines

package Serialize;
use Class::Std;
use Carp;
{
    my %compaction_of : ATTR( :default('none') );
    my %depth_of      : ATTR( :default(100)    );

    # Table of compaction tools...
    my %compactor = (
      # Value of       Subroutine returning
      # $compaction    compacted form of arg
          none     =>   sub { return shift },
          zip      =>   &compact_with_zip,
          gzip     =>   &compact_with_gzip,
          bz       =>   &compact_with_bz,
          # etc.
    );

    # Accessor subroutines for state variables...
    sub set_compaction {
        my ($self, $new_compaction) = @_;

        # Has to be a compaction type from the table...
        croak "Unknown compaction type ($new_compaction)"
            if !exists $compactor{$new_compaction};

        # If so, remember it...
        $compaction_of{ident $self} = $new_compaction;

        return;
    }

    sub set_depth {
        my ($self, $new_depth) = @_;

        # Any non-negative depth is okay...
        if ($new_depth >= 0) {
            $depth_of{ident $self} = $new_depth;
        }
        # Any negative depth is an error, so fix it and report...
        else {
            $depth_of{ident $self} = 0;
            carp "Negative depth ($new_depth) interpreted as zero";
        }

        return;
    }

    # Method to serialize a data structure, passed by reference...
    sub freeze {
        my ($self, $data_structure_ref) = @_;

        my $compactor = $compactor{$compaction_of{ident $self}};

        return $compactor->( _serialize($data_structure_ref) );
    }

    # etc.
}

# and elsewhere...

# Create a new interface to the class...
use Serialize;
my $serializer = Serialize->new();

# Set up the state of that interface as required...
$serializer->set_depth(20);
$serializer->set_compaction('zip'),

# and later...

my $frozen_data = $serializer->freeze($data_ref);

Creating Modules

Build new module frameworks automatically.

The "bones" of every new module are basically the same:

package <MODULE NAME>;

use version; our $VERSION = qv('0.0.1');
use warnings;
use strict;
use Carp;

# Module implementation here

1; # Magic true value required at end of module
__END__

=head1 NAME

<MODULE NAME> - [One line description of module's purpose here]

=head1 VERSION

This document describes <MODULE NAME> version 0.0.1

=head1 SYNOPSIS

    use <MODULE NAME>;

    # And the rest of the documentation template here

    # (as described in Chapter 7)

So it makes sense to create each new module automatically, reusing the same templates for each. This rule applies not just to the .pm file itself, but also to the other standard components of a module distribution: the MANIFEST file, the Makefile.PL, the Build.PL, the README, the Changes file, and the lib/ and t/ subdirectories.

The easiest way to create all those components consistently is to use the Module::Starter CPAN module. After installing Module::Starter and setting up a minimal ~/.module-starter/config file:

author:  Yurnaam Heere
email:   [email protected]

you can then simply type:

> module-starter --module=New::Module::Name

on the command line. Module::Starter will immediately construct a new subdirectory named New-Module-Name/ and populate it with the basic files that are needed to create a complete module.
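
For example, the generated distribution will typically look something like this (the exact file list varies with the Module::Starter version and configuration):

New-Module-Name/
    Build.PL
    Changes
    MANIFEST
    Makefile.PL
    README
    lib/New/Module/Name.pm
    t/00-load.t
    t/pod.t
    t/pod-coverage.t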

Better still, Module::Starter has a simple plug-in architecture that allows you to specify how it creates each new module directory and its contents. For example, you can use the Module::Starter::PBP plugin (also on the CPAN) to cause Module::Starter to use the module templates, documentation proformas, and testing tools recommended in this book.

After installing the Module::Starter::PBP module, you can type:

> perl -MModule::Starter::PBP=setup

on the command line and the plug-in will automatically configure itself, prompting for any information it needs in order to do so. Once Module::Starter::PBP is set up, you can easily edit the standard templates it will have installed, to customize the boilerplate code to your own needs.

The Standard Library

Use core modules wherever possible.

It's definitely best practice to avoid unnecessary work, and code reuse is a primary example of that. Perl has two main software libraries of reusable code: the standard Perl library and the CPAN. It's almost always a serious mistake to start hacking on a solution without at least exploring whether your problem has already been solved.

The library of modules that come standard with every Perl distribution is the ideal place to start. There are no issues of availability: if a core module solves your problem, then that solution will already have been installed anywhere that Perl itself is available. There are no issues of authorization either: if Perl has been approved for production use in your organization, the library modules will almost certainly be acceptable too.

Another major advantage is that the standard library contains some of the most heavily used Perl modules available. Frequent use means they're also some of the most strenuously stress-tested—and therefore more likely to be both reliable and efficient.

Perl's standard library contains modules for creating declaration attributes; optimizing the loading of modules; using arbitrary precision numbers, complex numbers, and a full range of trigonometric functions; adding I/O layers to the standard streams; interfacing with flat-file and relational databases; verifying and debugging Perl code; benchmarking and profiling program performance; CGI scripting; accessing the CPAN; serializing and deserializing data structures; calculating message digests; dealing with different character encodings; accessing system error constants; imposing exception semantics on failure-returning functions and subroutines; processing filenames in a filesystem-independent manner; searching for, comparing, and copying files; filtering source code; command-line argument processing; performing common operations on scalars, arrays, and hashes; internationalizing and localizing programs; setting up and using pipes and sockets; interacting with network protocols (including FTP, NNTP, ping, POP3, and SMTP); encoding and decoding MIME; data caching and subroutine memoization; accessing the complete POSIX function library; processing POD documentation; building software test suites; text processing; thread programming; acquiring and manipulating time and date information; and using Unicode.

There is rarely a good reason to roll your own code for any of those common tasks. For example, if you need a temporary filename, it may be tempting to throw the necessary code together yourself:

# Range of acceptable replacements for placeholders in template...
my @letter = ('A'..'Z');

# Given a template, fill it in randomly, making sure the file doesn't exist...
sub tempfile {
    my ($template) = @_;
    my $filename;

    ATTEMPT:
    while (1) {
        $filename = $template;
        $filename =~ s{ X }{$letter[rand @letter]}gexms;
        last ATTEMPT if ! -e $filename;
    }

    return $filename;
}

my $filename = tempfile('.myapp_XXXXXX');
open my $fh, '>', $filename
    or croak "Couldn't open temp file: $filename";

But that's a waste of time and effort, when the necessary functionality is better implemented, more thoroughly tested, and already sitting there waiting on your system:

use File::Temp qw( tempfile );

my ($fh, $filename) = tempfile('.myapp_XXXXXX');

The perlmodlib documentation is a good place to start exploring the Perl Standard Library.

CPAN

Use CPAN modules where feasible.

The Comprehensive Perl Archive Network (CPAN) is often referred to as Perl's killer app, and rightly credited with much of Perl's success in recent years. It is a truly vast repository of code, providing solutions for just about every programming task you might commonly encounter.

As with Perl's standard library, many of the modules on the CPAN are heavily relied-upon—and severely stress-tested—by the global Perl community. This makes CPAN modules like DBI, DateTime, Device::SerialPort, HTML::Mason, POE, Parse::RecDescent, Spreadsheet::ParseExcel, Template::Toolkit, Text::Autoformat, and XML::Parser extremely reliable and powerful tools. Extremely reliable and powerful free tools.

Of course, not all the code archived on CPAN is equally reliable. There is no centralized quality control mechanism for the archive; that's not its purpose. There is an integrated ratings system for CPAN modules, but it is voluntary and many modules remain unrated. So it's important to carefully assess any modules you may be considering.

Nevertheless, if your organization allows it, always check the CPAN (http://search.cpan.org) before you try to solve a new problem yourself. An hour or so of searching, investigation, quality assessment, and prototyping will frequently save days or weeks of development effort. Even if you decide not to use an existing solution, those modules may give you ideas that will help you design and implement your own in-house version.

Of course, many organizations are wary of any external software, especially if it's open source. One way to encourage your organization to allow you to use the enormous resources of the CPAN is to explain it properly. In particular, don't characterize your intent as "importing unknown software"; characterize it as "exporting known development delays, testing requirements, and maintenance costs".

Another resource that may help sway your local Powers That Be is the "Perl Success Stories" archive (http://perl.oreilly.com/news/success_stories.html). Companies like Hewlett Packard, Amazon.com, Barclays Bank, Oxford University Press, and NBC are leveraging the resources of CPAN to better compete in their respective markets. Exploring their successes may cast Perl's software archive in a new and attractive pecuniary light for your boss.



[113] Because revisiting any existing piece of code is likely to make it cleaner, once you get over the involuntary twitching.

[114] These examples will not be wasted when the design is complete. They can usually be recycled into demos, documentation examples, or the core of a test suite.

[115] At least, not as of the publication of this book. Patches correcting the problem have been submitted for both modules and the issue may have been resolved by now—in which case you should go back to putting each statement on a separate line (as recommended in Chapter 2).

[116] There are several such modules on the CPAN: Data::Dumper, YAML, FreezeThaw, and Storable.
