Chapter 5. Variables

Variables won't. Constants aren't.

Osborn's Law

Compared to most mainstream languages, Perl has an embarrassingly rich variety of built-in variables. The largest group of these are the global punctuation variables ($_, $/, $|, @_, @+, %!, %^H), which control a wide range of fundamental program behaviours, and which are largely responsible for Perl's unwarranted reputation as "executable line-noise". Other standard variables have more obvious names (@ARGV, %SIG, ${^TAINT}), but are still global in their scope, and in their effects as well.

Perl also provides self-declaring package variables. These will silently spring into existence the first time they're referred to, helpfully converting typos into valid, but incorrect, code.

This chapter presents a series of coding practices that can minimize the problems associated with Perl's sometimes over-helpful built-in variables. It also offers some techniques for making the most efficient use of variables you create yourself.

Lexical Variables

Avoid using non-lexical variables.

Stick to using only lexical variables (my), unless you genuinely need the functionality that only a package or punctuation variable can provide.

Using non-lexical variables increases the "coupling" of your code. If two otherwise unrelated sections of code both use a package variable, those two pieces of code can interact with each other in very subtle ways, just by the way they each interact with that shared variable. In other words, without full knowledge of every other piece of code that is called from a particular statement, it is impossible to know whether the value of a given non-lexical variable will somehow be changed by executing that statement.
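The coupling problem can be seen in a few lines. The following is only a minimal sketch: the variable $verbose and the subroutines audit() and summarize() are hypothetical names invented for this illustration.

```perl
use strict;
use warnings;

# Hypothetical package variable shared by two otherwise unrelated subroutines.
our $verbose = 0;

sub audit {
    $verbose = 1;            # Side effect: silently changes the shared state
    return 'audit done';
}

sub summarize {
    return $verbose ? 'long summary' : 'short summary';
}

my $before = summarize();    # Sees $verbose == 0
audit();                     # Looks unrelated to summarize()...
my $after  = summarize();    # ...but has changed what summarize() returns

print "$before / $after\n";
```

Nothing in the call to audit() hints that it alters the behaviour of summarize(); you would have to read the body of every subroutine that is called to find out.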

Some of Perl's built-in non-lexical variables, such as $_, @ARGV, $AUTOLOAD, or $a and $b, are impossible to avoid. But most of the rest are not required in general programming, and there are usually better alternatives. Table 5-1 lists the commonly used Perl built-in variables and what you should use instead. Note that prior to Perl 5.8, you may need to specify use IO::Handle explicitly before using the suggestions that involve method calls on filehandles.

Table 5-1. Alternatives to built-in variables

Variable: $1, $2, $3, etc.
Purpose: Store substrings captured from the previous regex match
Alternative: Assign captures directly using list context regex matching, or unpack them into lexical variables immediately after the match (see Chapter 12). Note that these variables are still acceptable in the replacement string of a substitution, because there is no alternative. For example:

s{($DD)/($MMM)/($YYYY)}{$3-$2-$1}xms

Variable: $&
Purpose: Stores the complete substring most recently matched by a regex
Alternative: Place an extra set of capturing parentheses around the entire regex, or use Regexp::MatchContext (see the "Match Variables" guideline later in this chapter).

Variable: $`
Purpose: Stores the substring that preceded the most recent successful regex match
Alternative: Place a ((?s).*?) at the beginning of the regex to capture everything up to the start of the pattern you are actually interested in, or use Regexp::MatchContext.

Variable: $'
Purpose: Stores the substring that followed the most recent successful regex match
Alternative: Place a ((?s).*) at the end of the regex to capture everything after the pattern you are actually interested in, or use Regexp::MatchContext.

Variable: $*
Purpose: Controls newline matching in regexes
Alternative: Use the /m regex modifier.

Variable: $.
Purpose: Stores the current line number of the current input stream
Alternative: Use $fh->input_line_number().

Variable: $|
Purpose: Controls autoflushing of the current output stream
Alternative: Use $fh->autoflush().

Variable: $"
Purpose: Array element separator when interpolating into strings
Alternative: Use an explicit join.

Variable: $%, $=, $-, $~, $^, $:, $^L, $^A
Purpose: Control various features of Perl's format mechanism
Alternative: Use Perl6::Form::form instead (see Chapter 19).

Variable: $[
Purpose: Determines the starting index of arrays and strings
Alternative: Never change the starting index from zero.

Variable: @F
Purpose: Stores the result of autosplitting the current line
Alternative: Don't use the -a command-line flag when invoking perl.

Variable: $^W
Purpose: Controls warnings
Alternative: Under Perl 5.6.1 and later, specify use warnings instead.
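A few of the alternatives in Table 5-1 can be sketched together as follows. The sample date, the in-memory filehandle, and all of the variable names are assumptions made purely for illustration.

```perl
use strict;
use warnings;
use IO::Handle;    # Needed explicitly only on older perls

# Instead of $1, $2, $3: assign the captures directly in list context.
my $date = '25/Dec/2005';
my ($day, $month, $year) = $date =~ m{\A (\d+) / (\w+) / (\d+) \z}xms;

# Instead of $. : ask the filehandle itself for its line number.
my $text = "first\nsecond\nthird\n";
open my $fh, '<', \$text
    or die "Can't open in-memory file: $!";
my $first      = <$fh>;
my $second     = <$fh>;
my $line_count = $fh->input_line_number();    # Two lines read so far
close $fh
    or die "Can't close in-memory file: $!";

# Instead of $| : request autoflushing on one specific handle.
STDOUT->autoflush(1);

# Instead of $" : spell out the separator with an explicit join.
my $joined = join q{-}, $year, $month, $day;

print "$joined (after reading $line_count lines)\n";
```

Each alternative names the thing it affects (a particular match, a particular filehandle, a particular list), so none of them relies on global state.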

Package Variables

Don't use package variables in your own development.

Even if you're occasionally forced to use Perl's built-in non-lexical variables, there's no reason to use ordinary package variables in your own development.

For example, don't use package variables to store state inside a module:

package Customer;

use Perl6::Export::Attrs;    # See Chapter 17

# State variables...
our %customer;
our %opt;

sub list_customers : Export {
    for my $id (sort keys %customer) {
        if ($opt{terse}) {
            print "$customer{$id}{name}\n";
        }
        else {
            print $customer{$id}->dump();
        }
    }
    return;
}

# and later in...
package main;
use Customer qw( list_customers );

$Customer::opt{terse} = 1;

list_customers();

Lexical variables are a much better choice. And if they need to be accessed outside the package, provide a separate subroutine to do that:

package Customer;

use Perl6::Export::Attrs;

# State variables...
my %customer;
my %opt;

sub set_terse {
    $opt{terse} = 1;
    return;
}

sub list_customers : Export {
    for my $id (sort keys %customer) {
        if ($opt{terse}) {
            print "$customer{$id}{name}\n";
        }
        else {
            print $customer{$id}->dump();
        }
    }
    return;
}

# and elsewhere...

package main;
use Customer qw( list_customers );

Customer::set_terse();
list_customers();

If you never use package variables, there's no possibility that people using your module could accidentally corrupt its internal state. Developers who are using your code simply cannot access the lexical state variables outside your module, so there is no possibility of incorrectly assigning to them.

Using a subroutine call like Customer::set_terse() to store or retrieve module state means that you (the module writer) retain control over how state variables are modified. For example, later in the development cycle it might be necessary to integrate a more general reporting package into the source code:

package Customer;

use Perl6::Export::Attrs;

# State variables...
my %customer;
my %opt;

use Reporter;

sub set_terse {
    return Reporter::set_terseness_for({ name => 1 });
}

sub list_customers : Export {
    for my $id (sort keys %customer) {
        Reporter::report({ name => $customer{$id} });
    }
    return;
}

Note that, although there is no longer a $opt{terse} variable inside the package, any code that calls Customer::set_terse() will continue to work without change. If $opt{terse} had been a package variable, you would now have to either track down every assignment to it and change that code to call Reporter::set_terseness_for(), or replace $opt{terse} with a tied variable (see Chapter 19).

Generally speaking, it's bad practice to use variables anywhere in a module's interface. Chapter 17 discusses this point further.

Localization

If you're forced to modify a package variable, localize it.

Occasionally you will have no choice but to use a package variable, usually because some other developer has made it part of the module's public interface. But if you change the value of that variable, you're making a permanent decision for every other piece of code in your program that uses the module:

use YAML;
$YAML::Indent = 4;       # Indent is 4 hereafter, everywhere that YAML is used

By using a local declaration when making that change, you restrict its effects to the dynamic scope of the declaration:

use YAML;
local $YAML::Indent = 4; # Indent is 4 until control exits current scope

That is, by prefacing the assignment with the word local, you can temporarily replace the package variable $YAML::Indent until control reaches the end of the current scope. So any calls to the various subroutines in the YAML package from within the current scope will see an indent value of 4. And after the scope is exited, the previous indent value (whatever it was) will be restored.
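The effect of dynamic scoping can be checked with a self-contained sketch. The package Demo::Formatter and its $Indent variable below are hypothetical stand-ins for a real module such as YAML; they are not part of any actual library.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a module that exposes a package variable.
package Demo::Formatter;
our $Indent = 2;

sub indent_text {
    my ($text) = @_;
    return (q{ } x $Indent) . $text;
}

package main;

sub report_deeply {
    local $Demo::Formatter::Indent = 4;    # In effect until this sub returns
    return Demo::Formatter::indent_text('inner');
}

my $inner = report_deeply();                          # Indented by 4
my $outer = Demo::Formatter::indent_text('outer');    # Indented by 2 again

print "[$inner]\n[$outer]\n";
```

The call made inside report_deeply() sees the localized value, even though indent_text() is defined in another package. Once report_deeply() returns, the original value is back in place.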

This is much more neighbourly behaviour. Rather than imposing your personal preferences on the rest of the program, you're imposing them only on your small corner of the code.

Initialization

Initialize any variable you localize.

Many people seem to think that a localized variable keeps its pre-localization value. It doesn't. Whenever a variable is localized, its value is reset to undef[22].

So this probably won't work as expected:

use YAML;
# Localize the current value...    (No it doesn't!)
local $YAML::Indent;

# Then change it, if necessary...
if (defined $config{indent}) {
    $YAML::Indent = $config{indent};
}

Unless the if statement executes, the localized copy of $YAML::Indent will retain its post-localization value of undef.

To correctly localize a package variable but still retain its pre-localization value, you need to write this instead:

use YAML;
# Localize the current value...
local $YAML::Indent = $YAML::Indent;

# Then change it, if necessary...
if (defined $config{indent}) {
    $YAML::Indent = $config{indent};
}

This version might look odd, redundant, and very possibly wrong, but it's actually both correct and necessary[23]. As with any other assignment, the righthand side of the localized assignment is evaluated first, yielding the original value of $YAML::Indent. Then the variable is localized, which installs a new container inside $YAML::Indent. Finally, the assignment—of the old value to the new container—is performed.
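The difference between the two forms is easy to verify. In this minimal sketch, $indent is a hypothetical package variable and the two subroutine names are made up for the demonstration.

```perl
use strict;
use warnings;

our $indent = 2;    # Hypothetical package variable

sub value_after_bare_local {
    local $indent;               # The localized value starts out undefined
    return defined $indent ? $indent : 'undef';
}

sub value_after_self_assignment {
    local $indent = $indent;     # Right-hand side is read before localizing
    return defined $indent ? $indent : 'undef';
}

my $bare = value_after_bare_local();
my $kept = value_after_self_assignment();

print "bare local: $bare, self-assigned local: $kept\n";
```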

Of course, you may not have wanted to preserve the former indentation value, in which case you probably needed something like:

Readonly my $DEFAULT_INDENT => 4;

# and later...

use YAML;
local $YAML::Indent = $DEFAULT_INDENT;

Even if you specifically did want that variable to be undefined, it's better to say so explicitly:

use YAML;
local $YAML::Indent = undef;

That way any readers of the code can immediately see that the lack of definition is intentional, rather than wondering whether it's an oversight.

Punctuation Variables

use English for the less familiar punctuation variables.

Avoiding punctuation variables completely is, unfortunately, not a realistic option. For a few of the less commonly used variables, there is no good alternative. Or you may be maintaining code that is already structured around the extensive use of these variables, and reworking that code is impractical.

For example:

local $| = 1;          # Autoflush output
local $" = qq{\0};     # Hash subscript separator
local $; =  q{, };     # List separator
local $, =  q{, };     # Output field separator
local $\ = qq{\n};     # Output record separator

eval {
    open my $pipe, '-|', '/cdrom/install'
        or croak "open failed: $!";

    @external_results = <$pipe>;

    close $pipe
        or croak "close failed: $?, $!";
};

carp "Internal error: $@" if $@;

In such cases, the best practice is to use the "long" forms of the variables instead, as provided by use English. The English.pm module gives readable identifiers to most of the punctuation variables. With it, you could greatly improve the readability and robustness of the previous example:

use English qw( -no_match_vars );   # See the "Match Variables" guideline later

local $OUTPUT_AUTOFLUSH         = 1;
local $SUBSCRIPT_SEPARATOR      = qq{\0};
local $LIST_SEPARATOR           =  q{, };
local $OUTPUT_FIELD_SEPARATOR   =  q{, };
local $OUTPUT_RECORD_SEPARATOR  = qq{\n};

eval {
    open my $pipe, '-|', '/cdrom/install'
        or croak "open failed: $OS_ERROR";

    @external_results = <$pipe>;

    close $pipe
        or croak "close failed: $CHILD_ERROR, $OS_ERROR";
};

carp "Internal error: $EVAL_ERROR"    if $EVAL_ERROR;

The readability improvement is easy to see, but the greater robustness is perhaps less obvious. Take another look at the localization of the five variables:

local $OUTPUT_AUTOFLUSH         = 1;
local $SUBSCRIPT_SEPARATOR      = qq{\0};
local $LIST_SEPARATOR           =  q{, };
local $OUTPUT_FIELD_SEPARATOR   =  q{, };
local $OUTPUT_RECORD_SEPARATOR  = qq{\n};

and compare it with the non-English version:

local $| = 1;          # Autoflush output
local $" = qq{\0};     # Hash subscript sep
local $; =  q{, };     # List separator
local $, =  q{, };     # Output field sep
local $\ = qq{\n};     # Output record sep

Did you spot the mistake in the "punctuated" version? The comment on the second assignment claims that it is setting the hash subscript separator variable. But in fact the code is setting $", which is the list separator variable. Meanwhile, the third line's comment claims to be setting the list separator, whereas it's actually setting the hash-subscript separator variable: $;.

Somehow during development or maintenance those two variables were switched. Unfortunately, the values being assigned to them weren't swapped, nor were the comments. But, because these particular punctuation variables are relatively uncommon, it's easy to just trust the comments[24], which can blind you to the actual problem.

In comparison, the use English version doesn't even have comments. It doesn't need them. The long variable names document the purpose of each variable directly. And it's unlikely you'll mistakenly assign a value meant for the hash-subscript separator to the list separator instead. No matter how bad your spelling, the odds of accidentally typing $SUBSCRIPT_SEPARATOR when you meant to type $LIST_SEPARATOR are very slight.

There is one exception to this rule that all inescapable punctuation variables ought to be replaced with their use English synonyms. That exception is the $ARG variable:

@danger_readings = grep { $ARG > $SAFETY_LIMIT } @reactor_readings;

Using $ARG is likely to make your code less clear to the average reader, compared with its original punctuation form:

@danger_readings = grep { $_ > $SAFETY_LIMIT } @reactor_readings;

The general principle here is simple: If you had to look up a punctuation variable in the perlvar documentation when you were writing or maintaining the code, then most people will have to look it up when they read the code. And they'll probably have to look that variable up every single time they read the code.

So the first time you have to look up a punctuation variable in perlvar, replace it with the alternative construct suggested in Table 5-1, or else with its use English equivalent.

Localizing Punctuation Variables

If you're forced to modify a punctuation variable, localize it.

The problems described earlier under "Localization" can also crop up whenever you're forced to change the value in a punctuation variable (often in I/O operations). All punctuation variables are global in scope. They provide explicit control over what would be completely implicit behaviours in most other languages: output buffering, input line numbering, input and output line endings, array indexing, et cetera.

It's usually a grave error to change a punctuation variable without first localizing it. Unlocalized assignments can potentially change the behaviour of code in entirely unrelated parts of your system, even in modules you did not write yourself but are merely using.

Using local is the cleanest and most robust way to temporarily change the value of a global variable. It should always be applied in the smallest possible scope, so as to minimize the effects of any "ambient behaviour" the variable might control:

Readonly my $SPACE => q{ };

if (@ARGV) {
    local $INPUT_RECORD_SEPARATOR  = undef;   # Slurp mode
    local $OUTPUT_RECORD_SEPARATOR = $SPACE;  # Autoappend a space to every print
    local $OUTPUT_AUTOFLUSH        = 1;       # Flush buffer after every print

    # Slurp, mutilate, and spindle...
    $text = <>;
    $text =~ s/\n/[EOL]/gxms;
    print $text;
}

A common mistake is to use unlocalized global variables, saving and restoring their original values at either end of the block, like so:

Readonly my $SPACE => q{ };

if (@ARGV) {
    my $prev_irs = $INPUT_RECORD_SEPARATOR;
    my $prev_ors = $OUTPUT_RECORD_SEPARATOR;
    my $prev_af  = $OUTPUT_AUTOFLUSH;

    $INPUT_RECORD_SEPARATOR  = undef;
    $OUTPUT_RECORD_SEPARATOR = $SPACE;
    $OUTPUT_AUTOFLUSH        = 1;

    $text = <>;
    $text =~ s/\n/[EOL]/gxms;
    print $text;

    $INPUT_RECORD_SEPARATOR  = $prev_irs;
    $OUTPUT_RECORD_SEPARATOR = $prev_ors;
    $OUTPUT_AUTOFLUSH        = $prev_af;
}

This way is slower and far less readable. It's prone to cut-and-paste errors, mistyping, mismatched assignments, forgetting to restore one of the variables, or one of the other classic blunders. Use local instead.
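One further hazard of manual save-and-restore, beyond those listed above, is an exception thrown between the save and the restore. The following minimal sketch illustrates it; the variable $separator and both subroutine names are hypothetical.

```perl
use strict;
use warnings;

our $separator = q{,};    # Hypothetical global setting

sub risky_manual {
    my $prev   = $separator;
    $separator = q{;};
    die "something failed\n";    # The restore below is never reached
    $separator = $prev;
    return;
}

sub risky_localized {
    local $separator = q{;};
    die "something failed\n";    # local still restores on the way out
}

eval { risky_localized() };
my $after_local = $separator;     # Restored to the original value

eval { risky_manual() };
my $after_manual = $separator;    # Left holding the temporary value

print "after local: $after_local, after manual: $after_manual\n";
```

Because local ties the restoration to scope exit rather than to a statement that has to be reached, it cannot be skipped by an early return or an exception.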

Match Variables

Don't use the regex match variables.

Whenever you use English, it's important to load the module with a special argument:

use English qw( -no_match_vars );

This argument prevents the module from creating the three "match variables": $PREMATCH (or $`), $MATCH (or $&), and $POSTMATCH (or $'). Whenever these variables appear anywhere in a program, they force every regular expression in that program to save three extra pieces of information: the substring the match initially skipped (the "prematch"), the substring it actually matched (the "match"), and the substring that followed the match (the "postmatch").

Every regex has to do this every time any pattern match succeeds, because these punctuation variables are global in scope, and hence available everywhere. So the regex that sets them might not be in the same lexical scope, the same package, or even the same file as the code that next uses them. The compiler can't know which regex will have been the most recently successful at any point, so it has to play it safe and set the match variables every time any regex anywhere matches, in case that particular match is the one that precedes the use of one of the match variables.

This particular problem neatly illustrates why all non-lexical variables cause difficulties. The presence of $`, $&, or $' immediately couples a particular piece of code to (potentially) every single regex in your program. Leaving aside the extra workload that connection imposes on every pattern match, this also means that debugging pattern matches can be potentially much more difficult. If one of the match variables doesn't contain what you expected, it's possible that's because it was actually set by some pattern match other than the one you thought was setting it. And that pattern match could be anywhere in your source code.

Don't ever use the match variables:

use English;

my ($name, $birth_year)
    = $manuscript =~ m/(\S+) \s+ was \s+ born \s+ in \s+ (\d{4})/xms;

if ($name) {
    print $PREMATCH,
          qq{<born date="$birth_year" name="$name">},
          $MATCH,
          q{</born>},
          $POSTMATCH;
}

It's better to use extra capturing parentheses to retain the required context information:

my ($prematch, $match, $name, $birth_year, $postmatch)
    = $manuscript =~ m{ (\A.*?)    # capture prematch from start
                        (          # then capture entire match...
                            (\S+) \s+ was \s+ born \s+ in \s+ (\d{4})
                        )
                        (.*\z)     # then capture postmatch to end
                      }xms;

if ($name) {
    print $prematch,
          qq{<born date="$birth_year" name="$name">},
          $match,
          q{</born>},
          $postmatch;
}

This solution avoids imposing a performance penalty on every regex match when you're only using the match variables from one. However, it does penalize this particular regex in another way: by making it much uglier, and burying the significant part of the regex under a mound of extra parentheses. It can also be tricky to remember that the entire match is now the second capture, and so the $match variable has to be declared ahead of $name and $birth_year. Indeed, having the entire match captured ahead of parts of the match may seem counterintuitive to subsequent readers of the code.

A cleaner solution is to use the Regexp::MatchContext CPAN module. This module extends the Perl regex syntax with a new metasyntactic construct: (?p). The module also exports three subroutines named PREMATCH(), MATCH(), and POSTMATCH(). These subroutines return those respective parts of the match context of the most recent regex with a (?p) marker anywhere inside it.

You could simplify the previous example by rewriting it like this:

use Regexp::MatchContext;

my ($name, $birth_year)
    = $manuscript =~ m/(?p) (\S+) \s+ was \s+ born \s+ in \s+ (\d{4})/xms;

if ($name) {
    print PREMATCH(),
          qq{<born date="$birth_year" name="$name">},
          MATCH(),
          q{</born>},
          POSTMATCH();
}

Note how close this example is to the original version of the code. Apart from using three subroutines instead of three global variables, the only change from the original version is that you have to put a (?p) marker in the regex. That's a tiny bit more work, but it confers several significant advantages. For a start, it explicitly marks which regex is capturing the match variables, so it's easier to work out which code to debug when a match variable goes wrong.

Better still, unlike English, the Regexp::MatchContext module does the extra match-variable-preservation work only for those particular regexes that have a (?p) marker, so there's no longer an overhead imposed on all the other regexes in your program. And even in those regexes that do set the match variables, Regexp::MatchContext does most of the extra work lazily. That is, the information is extracted only when you actually use one of the match variables, not when the regex is originally matched.

Yet another advantage to using Regexp::MatchContext is that the subroutines it exports return a genuine substr-like substring, rather than a read-only copy. You can assign a value to MATCH() and that assignment will change the corresponding sections of the original string. For example, you could rework the following slightly obscure substitution:

$html =~ s{.*? (<body> .* </body>) .*}      # Locate components of page
          {   $STD_HEADER                   # Ensure standard header is used
            . verify_body($1)               # Check contents
            . '</html>'                     # Remove any trailing extras
          }exms;

replacing it with a more readable match-and-reassign version:

use Regexp::MatchContext;

if ($html =~ m{(?p) <body> .* </body>}xms) {   # Locate body of page (with context)
    PREMATCH()  = $STD_HEADER;                  # Ensure standard header is used
    MATCH()     = verify_body( MATCH() );       # Check contents
    POSTMATCH() = '</html>';                    # Remove any trailing extras
}

Dollar-Underscore

Beware of any modification via $_.

One particularly easy way to introduce subtle bugs is to forget that $_ is often an alias for some other variable. Any assignment to $_ or any other form of transformation on it, such as a substitution or transliteration, is probably changing some other variable. So any change applied to $_ needs to be scrutinized particularly carefully.

This problem can be especially insidious when $_ isn't actually being named explicitly. For example, suppose you needed a subroutine that would return a copy of any string passed to it, with the leading and trailing whitespace trimmed from the copy. And suppose you also want that subroutine to default to trimming $_ if no explicit argument is provided (just as the built-in chomp does). You might write such a subroutine like this:

sub trimmed_copy_of {
    # Trim explicit arguments...
    if (@_ > 0) {
        my ($string) = @_;
        $string =~ s{\A \s* (.*?) \s* \z}{$1}xms;
        return $string;
    }
    # Otherwise, trim the default argument (i.e. $_)...
    else {
        s{\A \s* (.*?) \s* \z}{$1}xms;
        return $_;
    }
}

and then use it like so:

print trimmed_copy_of($error_mesg);

for (@diagnostics) {
    print trimmed_copy_of;
}

Unfortunately, that implementation of trimmed_copy_of() is fatally flawed. After using the function in the previous code, the contents of $error_mesg are unchanged (as they should be), but each of the elements of @diagnostics has been unexpectedly shaved. That's because trimmed_copy_of() correctly deals with explicit arguments by copying them into a separate variable and then changing that copy:

if (@_ > 0) {
    my ($string) = @_;
    $string =~ s{\A \s* (.*?) \s* \z}{$1}xms;
    return $string;
}

But the subroutine applies its substitution directly to the (implicit) $_, without first copying its contents:

else {
    s{\A \s* (.*?) \s* \z}{$1}xms;
    return $_;
}

Within the for loop, the $_ variable is sequentially aliased to each element of the array:

for (@diagnostics) {
    print trimmed_copy_of;
}

which means that the substitution applied to $_ inside trimmed_copy_of() will alter the original array elements.

Something has clearly gone wrong in the design or the implementation. Either trimmed_copy_of() should never change the string it's trimming, or it should always change it. If it should never trim the original, the subroutine needs to be written:

sub trimmed_copy_of {
    my $string = (@_ > 0) ? shift : $_;
    $string =~ s{\A \s* (.*?) \s* \z}{$1}xms;
    return $string;
}
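To see the difference concretely, the following sketch runs a flawed implementation and a copying implementation side by side. The name trimmed_copy_flawed() and the sample data are made up for this illustration.

```perl
use strict;
use warnings;

# Flawed version: the implicit-argument branch edits $_ in place.
sub trimmed_copy_flawed {
    if (@_ > 0) {
        my ($string) = @_;
        $string =~ s{\A \s* (.*?) \s* \z}{$1}xms;
        return $string;
    }
    s{\A \s* (.*?) \s* \z}{$1}xms;
    return $_;
}

# Copying version: always work on a separate lexical.
sub trimmed_copy_of {
    my $string = (@_ > 0) ? shift : $_;
    $string =~ s{\A \s* (.*?) \s* \z}{$1}xms;
    return $string;
}

my @flawed_input = ('  disk full  ');
my @fixed_input  = ('  disk full  ');
my @results;

for (@flawed_input) { push @results, trimmed_copy_flawed(); }
for (@fixed_input)  { push @results, trimmed_copy_of();     }

print "flawed input is now: [$flawed_input[0]]\n";    # Whitespace is gone
print "fixed input is now:  [$fixed_input[0]]\n";     # Whitespace intact
```

Both subroutines return the same trimmed string, but only the copying version leaves the caller's array elements untouched.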

On the other hand, if the intention was that the subroutine consistently modify its (explicit or implicit) argument, then it should have been written like so:

sub trim_str {
    croak 'Useless use of trim_str() in non-void context'
        if defined wantarray;

    for my $orig_arg ( @_ ? @_ : $_ ) {               # all args or just $_
        $orig_arg =~ s{\A \s* (.*?) \s* \z}{$1}xms;   # change the actual args
    }

    return;
}

in which case it would be used differently, too:

for my $warning ($error_mesg, @diagnostics) {
    trim_str $warning;
    print $warning;
}

There are several features of this second version of the subroutine that are worth noting. First, because the behaviour of the subroutine changed, its name also needs to change. trimmed_copy_of() returns a trimmed copy, so it's named with a past participle that describes how the argument was modified. trim_str() does something to its actual argument, so it's named with an imperative verb indicating the action to be carried out.

Next, there's the rather unusual test and exception in this second version:

croak 'Useless use of trim_str() in non-void context'
    if defined wantarray;

You're probably more familiar with exceptions that warn about the useless use of constructs in void contexts, but here the subroutine dies if the context specifically isn't void. That's because the trim_str() subroutine exists solely to modify its arguments. It doesn't return a useful value, so anyone using it in a scalar context:

$tidy_text = trim_str $raw_text;

or a list context:

print trim_str $message;

is making a mistake. Killing them for it immediately is probably a kindness.
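The three values that wantarray can return are easiest to understand by recording them. In this minimal sketch, note_context() and @seen are hypothetical names used only for the demonstration.

```perl
use strict;
use warnings;

my @seen;

# Record the context in which this subroutine was called.
sub note_context {
    my $context = wantarray;
    push @seen, !defined $context ? 'void'
              : $context          ? 'list'
              :                     'scalar';
    return;
}

note_context();                 # Void context: wantarray returns undef
my $scalar = note_context();    # Scalar context: defined but false
my @list   = note_context();    # List context: true

print join(q{ }, @seen), "\n";
```

The test defined wantarray is therefore true in both scalar and list contexts, which is exactly the set of contexts in which trim_str() should refuse to run.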

Finally, the heart of the trimming operation is:

    for my $orig_arg ( @_ ? @_ : $_ ) {              # all args or just $_
        $orig_arg =~ s{\A \s* (.*?) \s* \z}{$1}xms;
    }

In other words, if there is at least one element in the subroutine's argument list (@_), then iterate through those arguments, changing each of them. Otherwise, iterate through only $_, changing it. The use of (@_ ? @_ : $_) to generate the for loop's list is sufficiently unusual and line-noisy that it warrants clarification with an end-of-line comment.

Note too that that loop could have been written as:

    for ( @_ ? @_ : $_ ) {                          # all args or just $_
        s{\A \s* (.*?) \s* \z}{$1}xms;
    }

but it would then almost certainly have been harder to comprehend and maintain. In that version, the implicit $_ alias within the for loop would be aliased either sequentially to the elements of @_ (which are themselves aliases to the subroutine's actual arguments) or to whatever the $_ outside the loop was aliased to. At which point your brain explodes.

Similar problems caused by unintended modifications via $_ can also crop up within the block of a map or grep. See "List Processing Side Effects" in Chapter 6 for specific advice on avoiding that particular kind of pedesagittry.

Array Indices

Use negative indices when counting from the end of an array.

The last, second last, third last, nth last elements of an array can be accessed by counting backwards from the length of the array, like so:

# Replace broken frames...
$frames[@frames-1] = $active{top};         # Final frame
$frames[@frames-2] = $active{prev};        # Penultimate frame
$frames[@frames-3] = $active{backup};      # Prepenultimate frame

Alternatively, you can work backwards from the final index ($#array_name), like so:

# Replace broken frames...
$frames[$#frames  ] = $active{top};        # Final frame
$frames[$#frames-1] = $active{prev};       # Penultimate frame
$frames[$#frames-2] = $active{backup};     # Prepenultimate frame

However, Perl provides a much cleaner notation for accessing the terminal elements of an array. Whenever an array access is specified with a negative number, that number is taken as an ordinal position in the array, counting backwards from the last element.

The preceding assignments are much better written as:

# Replace broken frames...
$frames[-1] = $active{top};                # 1st-last frame (i.e., final frame)
$frames[-2] = $active{prev};               # 2nd-last frame
$frames[-3] = $active{backup};             # 3rd-last frame

Using negative indices is good practice, because the leading minus sign makes the index stand out as unusual, forcing the reader to think about what that index means and marking any "from the end" indices with an obvious prefix.

Equally importantly, the negative indices are unobscured by any repetition of the variable name within the square brackets. In the previous two versions, notice how similar the three indices are (in that all three start with either [@frames-... or [$#frames...). Each index differs by only around 20%: two characters out of nine or ten. In contrast, in the negative-index version, every index differs by 50%, making those differences much easier to detect visually.

Using negative indices consistently also increases the robustness of your code. Suppose @frames contains only two elements. If you wrote:

$frames[@frames-1] = $active{top};         # Final frame
$frames[@frames-2] = $active{prev};        # Penultimate frame
$frames[@frames-3] = $active{backup};      # Prepenultimate frame

you'd be assigning values to $frames[1] (the last element), $frames[0] (the first element), and $frames[-1] (the last element again!). On the other hand, using -1, -2, and -3 as indices causes the interpreter to throw an exception when you try to assign to a nonexistent element:

Modification of non-creatable array value attempted, subscript -3 at frames.pl line 33.
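Both behaviours can be reproduced with a two-element array. The following is a minimal sketch using made-up data; the eval is there only so that the script can report the exception instead of terminating.

```perl
use strict;
use warnings;

my @frames = ('a', 'b');    # Only two elements

# Counting back from the length silently wraps around to the last element.
my @wrapped = @frames;
$wrapped[@wrapped-3] = 'backup';    # Index is -1, so 'b' is overwritten

# A literal negative index that reaches past the start is a fatal error.
my $assigned = eval { $frames[-3] = 'backup'; 1 };
my $error    = $@;

print 'wrapped: ', join(q{,}, @wrapped), "\n";
print "error: $error" if !$assigned;
```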

Slicing

Take advantage of hash and array slicing.

The previous examples would be even less cluttered (and hence more readable) using an array slice and a hash slice:

@frames[-1,-2,-3]
    = @active{'top', 'prev', 'backup'};

An array slice is a syntactic shortcut that allows you to specify a list of array elements, without repeating the array name for each one. A slice looks similar to a regular array access, except that the array keeps its leading @ and you're then allowed to specify more than one index in the square brackets. An array slice like:

@frames[-1,-2,-3]

is exactly the same as:

($frames[-1], $frames[-2], $frames[-3])

just much less work to type in, or read. There's a similar syntax for accessing several elements of a hash: you change the leading $ of a regular hash access to @, then add as many keys as you like. The slice:

@active{'top', 'prev', 'backup'}

is exactly the same as:

($active{'top'}, $active{'prev'}, $active{'backup'})

The sliced version of the frames assignment will be marginally faster than three separate scalar assignments, though the difference in performance is probably not significant unless you're doing hundreds of millions of repetitions. The real benefit is in comprehensibility and extensibility.
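As a quick check of that equivalence, here is a minimal sketch with made-up contents for @frames and %active:

```perl
use strict;
use warnings;

my @frames = ('f0', 'f1', 'f2', 'f3');
my %active = (top => 'T', prev => 'P', backup => 'B');

# One slice-to-slice assignment replaces three scalar assignments.
@frames[-1, -2, -3] = @active{'top', 'prev', 'backup'};

my $result = join q{,}, @frames;
print "$result\n";
```

The last three elements are replaced, counting backwards from the end, and the first element is left alone.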

Be careful, though. This version:

@frames[-1..-3]
    = @active{'top', 'prev', 'backup'};

is not identical in behaviour. In fact it's a no-op, since the -1..-3 range generates an empty list, just like any other range whose final value is less than its initial value. So the "negative range" actually selects an empty slice, which makes the previous code equivalent to:

() = @active{'top', 'prev', 'backup'};

To successfully use a range of negative numbers in an array slice, you would need to reverse the order, and remember to reverse the order of keys in the hash slice, too:

@frames[-3..-1]
    = @active{'backup', 'prev', 'top'};

That's subtle enough that it's almost certainly not worth the effort. In slices, ranges that include negative indices are generally more trouble than they're worth.
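
A short sketch of both behaviours, using invented four-element arrays and placeholder values:

```perl
use strict;
use warnings;

# The range operator only counts upwards...
my @descending = (-1..-3);                 # Empty list
my @ascending  = (-3..-1);                 # (-3, -2, -1)

# Hypothetical sample data, for illustration only...
my %active = ( top => 'T', prev => 'P', backup => 'B' );

# Descending range: the slice is empty, so nothing is assigned...
my @untouched = (0) x 4;
@untouched[-1..-3] = @active{'top', 'prev', 'backup'};

# Ascending range: works, but the keys have to be reversed to match...
my @frames = (0) x 4;
@frames[-3..-1] = @active{'backup', 'prev', 'top'};    # (0, 'B', 'P', 'T')
```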

Slice Layout

Use a tabular layout for slices.

A slice-to-slice assignment like:

@frames[-1,-2,-3]
    = @active{'top', 'prev', 'backup'};

can also be written as:

  @frames[ -1,    -2,     -3     ]
= @active{'top', 'prev', 'backup'};

This second version makes it immediately apparent which hash entry is being assigned to which array element. Unfortunately, this approach is useful only when the number of keys/indices in the slices is small. As soon as either list exceeds a single line, the readability of the resulting code is made much worse by vertical alignments:

  @frames[ -1,    -2,     -3,      -4,          -5,      -6,
           -7,          -8     ]
= @active{'top', 'prev', 'backup', 'emergency', 'spare', 'rainy day',
          'alternate', 'default'};

Slice Factoring

Factor large key or index lists out of their slices.

As the final example in the previous guideline demonstrates, slices can quickly become unwieldy as the number of indices/keys increases.

A more readable and more scalable approach in such cases is to factor out the index/key equivalences in a separate tabular data structure:

Readonly my %CORRESPONDING => (
  # Key of         Index of
  # %active...     @frames...
    'top'        =>  -1,
    'prev'       =>  -2,
    'backup'     =>  -3,
    'emergency'  =>  -4,
    'spare'      =>  -5,
    'rainy day'  =>  -6,
    'alternate'  =>  -7,
    'default'    =>  -8,
);

@frames[ values %CORRESPONDING ] = @active{ keys %CORRESPONDING };

Each key in %CORRESPONDING is one of the keys of %active, and each value in %CORRESPONDING is the corresponding index of @frames. So the righthand side of the assignment (@active{ keys %CORRESPONDING }) is a hash slice of %active that includes all the entries whose keys are listed in %CORRESPONDING. Similarly, @frames[ values %CORRESPONDING ] is an array slice of @frames that includes all the corresponding indices listed in %CORRESPONDING. That means that the assignment copies entries from %active to the corresponding elements of @frames, with the correspondence being specified by the key/value pairs in %CORRESPONDING.

Storing that key/value correspondence in a hash works because the values and keys functions always traverse the entries of a hash in the same order (provided the hash is not modified in between, which is never the case for a Readonly hash), so the Nth value returned by values will always be the value of the Nth key returned by keys. Because the two builtins preserve the order of the entries of %CORRESPONDING, the assignment between the two slices copies $active{'top'} into $frames[-1], $active{'prev'} into $frames[-2], $active{'backup'} into $frames[-3], etc.
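
That ordering guarantee can be checked directly. The sketch below is a cut-down version of the table; it uses a plain lexical hash rather than Readonly so that it needs no non-core module, and the values in %active are placeholders:

```perl
use strict;
use warnings;

# Cut-down table (plain "my" instead of Readonly, to avoid a non-core module)...
my %CORRESPONDING = ( top => -1, prev => -2, backup => -3 );

my @keys   = keys   %CORRESPONDING;
my @values = values %CORRESPONDING;

# Whatever order the hash uses internally, the Nth value belongs to the Nth key...
my $mismatches
    = grep { $CORRESPONDING{ $keys[$_] } != $values[$_] } 0..$#keys;

# Hypothetical sample data, for illustration only...
my %active = ( top => 'T', prev => 'P', backup => 'B' );
my @frames = (0) x 4;

@frames[ values %CORRESPONDING ] = @active{ keys %CORRESPONDING };
```

The traversal order itself may differ from run to run, but $mismatches is always zero, so @frames always receives 'B', 'P', and 'T' in its last three elements.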

This approach improves the readability of the code, as the %CORRESPONDING hash very clearly and prominently lists the mapping from %active keys to @frames indices. The actual assignment statement is also made considerably simpler. In addition, factoring out the correspondence between keys and indices makes the code much easier to maintain. Adding an extra assignment now requires only listing an extra key/index pair; changing a key or index requires only updating an existing pair.

And, of course, this technique is not restricted to negative indices, nor do the indices need to be specified in any particular order. If there are a large number of fields being transferred, it can be useful to arrange the keys alphabetically, to make them easier for humans to look up. For example:

Readonly my %CORRESPONDING => (
    age        => 1,
    comments   => 6,
    fraction   => 8,
    hair       => 9,
    height     => 2,
    name       => 0,
    occupation => 5,
    office     => 11,
    shoe_size  => 4,
    started    => 7,
    title      => 10,
    weight     => 3,
);

@staff_member_details[ values %CORRESPONDING ]
    = @personnel_record{ keys %CORRESPONDING };

Simple arrays can also be useful when refactoring the keys or indices of a single slice:

# This is the order in which stat() returns its information:
Readonly my @STAT_FIELDS
    => qw( dev ino mode nlink uid gid rdev size atime mtime ctime blksize blocks );

sub status_for {
    my ($file) = @_;

    # The hash to be returned...
    my %stat_hash = ( file => $file );

    # Load each stat datum into an appropriately named entry of the hash...
    @stat_hash{@STAT_FIELDS} = stat $file;

    return \%stat_hash;
}

# and later...

warn 'File was last modified at ', status_for($file)->{mtime};

This kind of table-driven programming is highly scalable and particularly easy to maintain. Numerous variations on this technique will be advocated in subsequent chapters.



[22] Or, more accurately, the storage associated with the variable's name is temporarily replaced by a new, uninitialized storage.

[23] Okay, so it still looks odd. Two out of three ain't bad.

[24] Don't. Ever.
