Chapter 6. Control Structures

Nothing is more difficult, and therefore more precious, than to be able to decide.

Napoleon I

Control structures are all about choosing: choosing whether to do something, choosing between two or more alternatives, choosing how often to repeat something. As in real life, much programming grief springs either from making the wrong choice or from using the wrong approach when making a choice.

This chapter looks at a range of programming practices that can help to make your code's decision making less error-prone, more efficient, and easier to verify.

The basic principles are simple: make the decision stand out; make the consequences of any decision stand out; base the decision on as few criteria as possible; don't phrase the decision negatively; avoid flag variables and count variables; and make it very easy to detect variations in the flow of control.

If Blocks

Use block if, not postfix if.

One of the most effective ways to make decisions and their consequences stand out is to avoid using the postfix form of if. For example, it's easier to detect the decision and consequences in:

if (defined $measurement) {
    $sum += $measurement;
}

than in:

$sum += $measurement if defined $measurement;

Moreover, postfix tests don't scale well as the consequences increase. For example:

$sum += $measurement
and $count++
and next SAMPLE
    if defined $measurement;

and:

do {
    $sum += $measurement;
    $count++;
    next SAMPLE;
} if defined $measurement;

are both much harder to comprehend than:

if (defined $measurement) {
    $sum += $measurement;
    $count++;
    next SAMPLE;
}

So always use the block form of if.

Postfix Selectors

Reserve postfix if for flow-of-control statements.

The only exception to the previous guideline comes about because of another of the principles enumerated at the start of this chapter: "make it very easy to detect variations in the flow of control".

Such variations come about when a next, last, redo, return, goto, die, croak, or throw[25] occurs in the middle of other code. These commands break up the orderly downward flow of execution, so it is critical that they are easy to detect. And, although they are usually associated with some conditional test, the fact that they may potentially interrupt the control flow is more important than the conditions under which they are doing so.

Hence it's better to place the next, last, redo, return, goto, die, croak, and throw keywords in the most prominent position on their code line. In other words, they should appear as far to the left as possible (as discussed in the "Keep Left" sidebar in Chapter 2).

If an if is being used solely to determine whether to invoke a flow-control statement, use the postfix form. Don't hide the action over on the right:

sub find_anomalous_sample_in {
    my ($samples_ref) = @_;

    MEASUREMENT:
    for my $measurement (@{$samples_ref}) {
        if ($measurement < 0) { last MEASUREMENT; }

        my $floor = int($measurement);
        if ($floor == $measurement) { next MEASUREMENT; }

        my $allowed_inaccuracy = scale($EPSILON, $floor);
        if ($measurement-$floor > $allowed_inaccuracy) {
            return $measurement;
        }
    }
    return;
}

Be "up front" about it:

sub find_anomalous_sample_in {
    my ($samples_ref) = @_;

    MEASUREMENT:
    for my $measurement (@{$samples_ref}) {
        last MEASUREMENT if $measurement < 0;

        my $floor = int($measurement);
        next MEASUREMENT if $floor == $measurement;

        my $allowed_inaccuracy = scale($EPSILON, $floor);
        return $measurement
            if $measurement-$floor > $allowed_inaccuracy;
    }
    return;
}

Other Postfix Modifiers

Don't use postfix unless, for, while, or until.

The special dispensation to use postfix if in flow-control statements doesn't extend to any other types of statements. Nor does it extend to any of the other postfix statement modifiers.

The postfix looping modifiers create particular maintenance problems because they place the control flow (i.e., the loop specifier) to the right of the statement it controls. For example, a loop like:

print for grep {defined $_} @generated_lines;

makes it harder to notice the looped flow of control, especially if you also have statements like:

print $fh grep {defined $_} @generated_lines;

A proper for loop makes the iteration much more obvious:

for my $line (grep {defined $_} @generated_lines) {
    print $line;
}

Note too that it's not possible to give a readable name to the iterator variable of a postfix loop, nor to easily nest conditional tests inside such a loop. Instead of being able to write the code in a straightforward, explicit, easy-to-follow, and extensible way:

for my $line (@generated_lines) {
    if (defined $line) {
        print lc $line;
    }
}

you're forced to rely on boolean operations, and tempted by default behaviours:

defined and print lc for @generated_lines;

Worse still, using a postfix loop will sometimes make it necessary to use an explicit $_, which makes the resulting code much harder to understand:

$_ = lc for @generated_lines;

The same code is much clearer in block form:

for my $line (@generated_lines) {
    $line = lc $line;
}

This disparity in readability grows greater as the number of statements to be iterated increases:

defined
and print lc
and (s{\A cmd}{}xms or 1)
and push @non_cmds, $_
    for @generated_lines;

at which point most people would surely switch over to:

for my $line (@generated_lines) {
    if (defined $line) {
        print lc $line;
        $line =~ s{\A cmd}{}xms;
        push @non_cmds, $line;
    }
}

So why not start out that way? Starting with the postfix form means that you will almost inevitably have to rewrite some existing code using the block form. And rewriting existing code is a great way to introduce new bugs.

Negative Control Statements

Don't use unless or until at all.

Perl is unusual amongst programming languages in that it provides not only positive conditional tests (if and while), but also their negative counterparts (unless and until). Some people find that these keywords can make certain control structures read more naturally to them:

RANGE_CHECK:
until ($measurement > $ACCEPTANCE_THRESHOLD) {
    $measurement = get_next_measurement();
    redo RANGE_CHECK unless defined $measurement;
    # etc.
}

However, for many other developers, the relative unfamiliarity of these negated tests actually makes the resulting code harder to read than the equivalent "positive" version:

RANGE_CHECK:
while ($measurement <= $ACCEPTANCE_THRESHOLD) {
    $measurement = get_next_measurement();
    redo RANGE_CHECK if !defined $measurement;
    # etc.
}

More importantly, the negative tests don't scale well. They almost always become much harder to comprehend as soon as their condition has two or more components, especially if any of those components is itself expressed negatively. For example, most people have significantly more difficulty understanding the double negatives in:

VALIDITY_CHECK:
until ($measurement > $ACCEPTANCE_THRESHOLD && ! $is_exception{$measurement}) {
    $measurement = get_next_measurement();
    redo VALIDITY_CHECK unless defined $measurement && $measurement ne '[null]';
    # etc.
}

So unless and until are inherently harder to maintain. In particular, whenever a negative control statement is extended to include a negative operator, it will have to be re-cast as a positive control, which requires you to change both the keyword and the conditional:

VALIDITY_CHECK:
while ($measurement < $ACCEPTANCE_THRESHOLD && $is_exception{$measurement}) {
    $measurement = get_next_measurement();
    redo VALIDITY_CHECK if !defined $measurement || $measurement eq '[null]';
    # etc.
}

Not only is that extra effort, it's error-prone effort. Reworking conditionals in that way is an excellent opportunity to introduce new and subtle bugs into your program…just as the previous example did. The original until condition was:

until ($measurement > $ACCEPTANCE_THRESHOLD && ! $is_exception{$measurement}) {

The corresponding "positive" version should have been:

while ($measurement <= $ACCEPTANCE_THRESHOLD || $is_exception{$measurement}) {

This kind of mistake is unfortunately common, but very easy to avoid. If you never use an unless or until, then you'll never have to rewrite it as an if or while. Moreover, if your control statements are always "positive", your code will generally be more comprehensible to all readers[26], regardless of the complexity of its logical expressions. Even in the simple cases, something like:

croak "$value is missing" if !$found;

is not appreciably harder to comprehend than:

croak "$value is missing" unless $found;

unless or until are minor conveniences that can easily become major liabilities. They're never necessary, and frequently counterproductive. Don't use them.

C-Style Loops

Avoid C-style for statements.

The three-part for statements that Perl inherits from C are needed only for unusual loop control behaviour, such as iterating by twos, or in an irregular sequence. But even in such cases, these C-style loops provide that unusual behaviour in an obscure and harder-to-maintain way.

That's because the iterative behaviour of a three-part for statement is emergent, rather than explicit. In other words, the only way to know what a loop like:

for (my $n=4; $n<=$MAX; $n+=2) {
    print $result[$n];
}

is going to do is to sit down and work out the abstract logic of the three components:

"Let's see: n starts at 4, and continues up to MAX, incrementing by two each time. So the sequence is 4, 6, 8, etc. So the loop iterates through all the even n's from 4 up to and including MAX (if MAX itself is even)."

But you could write the same loop without the C-style for, like this:

RESULT:
for my $n (4..$MAX) {
    next RESULT if odd($n);
    print $result[$n];
}

The advantage with this version is that subsequent readers of the code no longer have to work out the logic of the loop. The code itself says explicitly:

"n from 4 to MAX, skipping values that are odd."

The code is clearer, which means it's more maintainable and less susceptible to subtle bugs or nasty edge-cases.

The usual counter-argument is that this second example has to iterate twice as many times for the same effect, and has to call a subroutine (odd()) each of those times. Should $MAX become large, that additional cost might become prohibitive.

In practice, many loops don't iterate enough times for those overheads to matter. And often the actual work done by the loop will swamp the costs of iteration anyway. But, if benchmarking indicates that the clearer-but-slower code is having a noticeable effect on performance, then a better solution is usually to "inline" the call to odd(), replacing it with a direct check on each $n value:

RESULT:
for my $n (4..$MAX) {
    next RESULT if $n % 2;    # $n%2!=0 when $n is odd

    print $result[$n];
}
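If even that inlined test proves too costly and the stride is perfectly regular, a further option (a sketch, not from the original text) is to generate the stepped sequence up front, so the loop only ever sees the values it wants. The $MAX value here is hypothetical, purely for illustration:

```perl
my $MAX = 12;                                       # Hypothetical limit

# Even values from 4 up to and including $MAX (when $MAX is even)...
my @even_ns = map { 2 * $_ } (2 .. int($MAX/2));    # 4, 6, 8, 10, 12
```

Each value in @even_ns can then be used to index @result directly, with no per-iteration test at all.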

In a simple example like this one, it can be hard to see the benefits of the more readable non-C-style for loops. But those benefits become clearer as the complexity of the loop control increases. Take a few moments to work out what the following three-part for loop does:

SALE:
for (my ($sales, $seats)=(0,$CAPACITY);
     $sales < $TARGET && $seats > $RESERVE;
     $sales += sell_ticket(), $seats--
) {
    prompt -yn, "[$seats seats remaining] Sell another? "
        or last SALE;
}

The fact that working out what that loop does really does take a few moments is a good indicator that the code is not sufficiently readable—especially when compared with the following non-C-style version:

my $sales = 0;
my $seats = $CAPACITY;

SALE:
while ($sales < $TARGET && $seats > $RESERVE) {
    prompt -yn, "[$seats seats remaining] Sell another? "
        or last SALE;

    $sales += sell_ticket();
    $seats--;
}

The three-part for loop is rarely used as a pure for loop (i.e., to iterate a fixed number of times). More often, it's used as a complex while loop, which means that using an actual while loop in such cases is a much better practice.

Unnecessary Subscripting

Avoid subscripting arrays or hashes within loops.

Unless you actually need to know the indices of the array elements you're processing, iterate the values of an array directly:

for my $client (@clients) {
    $client->tally_hours();
    $client->bill_hours();
    $client->reset_hours();
}

Iterating the indices and then doing repeated array accesses is significantly slower, and less readable:

for my $n (0..$#clients) {
    $clients[$n]->tally_hours();
    $clients[$n]->bill_hours();
    $clients[$n]->reset_hours();
}

Repeated indexing is repeated computation; duplicated effort that incurs an extra cost but provides no added benefit. Iterating indices is also prone to off-by-one errors. For example:

for my $n (1..@clients) {
    $clients[$n]->tally_hours();
    $clients[$n]->bill_hours();
    $clients[$n]->reset_hours();
}
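Running that faulty index range over toy data (the names here are invented for illustration) shows both halves of the bug at once: the first element is never visited, and one nonexistent element is:

```perl
my @clients = ('Ada', 'Bob', 'Cyn');       # Toy data; real code would hold objects

# The range 1..@clients is 1, 2, 3 -- but the valid indices are 0..2...
my @seen = map { defined $clients[$_] ? $clients[$_] : '<missing>' }
               (1..@clients);

# @seen is ('Bob', 'Cyn', '<missing>'): 'Ada' skipped, plus one overrun
```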

Likewise, if you're processing the entries of a hash and you need only the values of those entries, don't iterate the keys and then look up the values repeatedly:

for my $original_word (keys %translation_for) {
   if ( $translation_for{$original_word} =~ m/ $PROFANITY /xms) {
       $translation_for{$original_word} = '[DELETED]';
   }
}

Repeated hash look-ups are even more costly than repeated array indexing. Just iterate the hash values directly:

for my $translated_word (values %translation_for) {
   if ( $translated_word =~ m/ $PROFANITY /xms) {
       $translated_word = '[DELETED]';
   }
}

Note that this last example works correctly because, in Perl 5.6 and later, the values function returns a list of aliases to the actual values of the hash, rather than just a list of copies (see "Hash Values" in Chapter 8). So if you change the iterator variable (for example, assigning '[DELETED]' to $translated_word), you're actually changing the corresponding original value inside the hash.
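A small self-contained check (toy data, invented for illustration) confirms that assigning through the iterator variable really does update the hash:

```perl
my %translation_for = ( greeting => 'bonjour', curse => 'zut' );

# values() returns aliases, so this assignment writes into the hash...
for my $translated_word (values %translation_for) {
    $translated_word = '[DELETED]' if $translated_word eq 'zut';
}

# %translation_for is now ( greeting => 'bonjour', curse => '[DELETED]' )
```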

The only situation where iterating values doesn't work correctly is when you need to delete the entries from the hash:

for my $translated_word (values %translation_for) {
   if ( $translated_word =~ m/ $PROFANITY /xms) {
       delete $translated_word;                    # Compile-time error
   }
}

Here, aliasing isn't enough, because the delete builtin needs to know the key as well, so it will only accept actual hash look-ups as arguments. The correct solution is to use a hash slice instead (see Chapter 5):

my @unacceptable_words
    = grep {$translation_for{$_} =~ m/ $PROFANITY /xms}
           keys %translation_for;

delete @translation_for{@unacceptable_words};

The grep collects all those keys whose values must be removed, and stores that list in @unacceptable_words. The list of keys is then used to create a slice of the original hash (i.e., a list of hash look-ups), which can be passed to delete.
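Here is the whole idiom end-to-end on a toy hash, with a stand-in $PROFANITY pattern (both invented for illustration):

```perl
my $PROFANITY = qr/zut/xms;                            # Stand-in pattern
my %translation_for = ( greeting => 'bonjour', curse => 'zut alors' );

# Collect the keys of offending entries...
my @unacceptable_words
    = grep {$translation_for{$_} =~ m/$PROFANITY/xms}
           keys %translation_for;

# ...then delete them all via a hash slice...
delete @translation_for{@unacceptable_words};

# Only the clean entry survives: greeting => 'bonjour'
```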

Necessary Subscripting

Never subscript more than once in a loop.

Sometimes you have no choice: you really do need to know the index of each value you're iterating over, as well as the value itself. But, even when it is necessary to iterate indices or keys, be sure to extract the value only once:

for my $agent_num (0..$#operatives) {                        # Iterate indices
    my $agent = $operatives[$agent_num];                     # Extract value once

    print "Checking agent $agent_num\n";                     # Use index
    if ($on_disavowed_list{$agent}) {                        # Use value
        print "\t...$agent disavowed!\n";                    # Use value again
    }
}

Never extract it repeatedly in the same iteration:

for my $agent_num (0..$#operatives) {                        # Iterate indices
    print "Checking agent $agent_num\n";                     # Use index
    if ($on_disavowed_list{$operatives[$agent_num]}) {       # Extract value
        print "\t...$operatives[$agent_num] disavowed!\n";   # Extract value again
    }
}

Apart from the fact that repeated array look-ups are repeatedly expensive, they also clutter the code, and increase the maintenance effort if either the array name or the name of the iterator variable subsequently has to be changed.

Occasionally a mere copy of the value won't do, because you need to iterate both indices and values, and still be able to modify the values. It's easy to do that too—just use the Data::Alias CPAN module[27]:

use Data::Alias;

for my $agent_num (0..$#operatives) {           # Iterate indices
   alias my $agent = $operatives[$agent_num];   # Rename value

   print "Checking agent $agent_num\n";         # Use index

   if ($on_disavowed_list{$agent}) {            # Use value

       $agent = '[DISAVOWED]';                  # Change value

   }
}

The technique here is the same as before: iterate indices, then look up the value once. But in this case, instead of copying the value, the loop block creates a lexically scoped alias to it.

The alias function takes a variable as its only argument, and converts that variable into an alias variable. Assigning directly to the alias variable as you declare it:

alias my $agent                     # Alias variable
    = $operatives[$agent_num];      # Assigned expression

causes the alias variable to become another name for the assigned expression (usually another variable).

Once the aliasing is accomplished, anything that is done to the alias variable (including subsequent assignments) is actually being done to the variable to which it was aliased. Thus assigning to $agent actually assigns to $operatives[$agent_num]. The only difference is that there's no array element look-up involved, so accessing the alias is actually faster.
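If Data::Alias isn't available, the same write-through effect can be had in pure core Perl, because a for loop aliases its iterator variable to each element it visits. A one-element loop is a serviceable (if less elegant) substitute for alias, not the module's own interface:

```perl
my @operatives = ('Smart', 'Hunt', 'Bond');

for my $agent ($operatives[1]) {     # for aliases $agent to the element...
    $agent = '[DISAVOWED]';          # ...so this writes through to the array
}

# @operatives is now ('Smart', '[DISAVOWED]', 'Bond')
```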

Exactly the same approach can be taken when it's necessary to iterate the keys and values of a hash:

for my $name (keys %client_named) {                             # Iterate keys
    alias my $client_info = $client_named{$name};               # Rename value

    print "Checking client $name\n";                            # Use key
    if ($client_info->inactivity() > $ONE_YEAR) {                   # Use value
        $client_info                                            # Change value...
            = Client::Moribund->new({ from => $client_info });  # ...using value
    }
}

Note that you can't use Perl's built-in each function for this kind of thing:

while (my ($name, $client_info) = each %client_named) {         # Iterate key/value
    print "Checking client $name\n";                            # Use key
    if ($client_info->inactivity() > $ONE_YEAR) {       # Use value
        $client_info                                            # Change copy (!)
            = Client::Moribund->new({ from => $client_info });  # ...using value
    }
}

If you use each, the $client_info variable receives only a copy of each hash value, not an alias to it. So changing $client_info has no effect on the original values inside %client_named.
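That copying behaviour is easy to verify with toy data (invented for illustration):

```perl
my %client_named = ( alice => 'active', bob => 'idle' );

while (my ($name, $client_info) = each %client_named) {
    $client_info = 'moribund';       # Changes only the lexical copy
}

# %client_named is untouched: alice => 'active', bob => 'idle'
```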

Extracting or renaming values is also a good practice if those values are nested inside a deeper data structure, or are the result of a subroutine call. For example, if the previous code needed to choose different responses for clients with different periods of inactivity, it would be cleaner and more efficient to extract that information only once:

for my $name (keys %client_named) {                    # Iterate keys
    alias my $client_info = $client_named{$name};      # Rename value
    my $inactive = $client_info->inactivity();         # Extract value once

    print "Checking client $name\n";                   # Use key

    $client_info
        # Reuse value many times...   To decide new status of client...
        = $inactive > $ONE_YEAR   ? Client::Moribund->new({ from => $client_info })
        : $inactive > $SIX_MONTHS ?   Client::Silver->new({ from => $client_info })
        : $inactive > $SIX_WEEKS  ?     Client::Gold->new({ from => $client_info })
        :                           Client::Platinum->new({ from => $client_info })
        ;
}

Iterator Variables

Use named lexicals as explicit for loop iterators.

From a readability standpoint, $_ is a terrible name for a variable, especially for an iterator variable. It conveys nothing about the nature or purpose of the values it stores, except that they're currently being iterated in the innermost enclosing loop. For example:

for (@candidates) {
    if (m/\[ NO \] \z/xms) {
        $_ = reconsider($_);

        $have_reconsidered{lc()}++;
    }
    else {
        print "New candidate: $_\n";

        $_ .= accept_or_reject($_);

        $have_reconsidered{lc()} = 0;
    }
}

This piece of code starts off well enough: "For each of these candidates, if it[28] matches a certain pattern…". But things go downhill very quickly from there.

On the third line, the call to lc has its argument omitted, so the function defaults to using $_. And the maintainability of the code immediately suffers. Whoever wrote the code obviously knew that lc defaults to $_ in this way; in fact, that's probably part of the reason they used $_ as the loop iterator in the first place. But will future maintainers of the code know about that default behaviour? If not, they'll have to look up lc to check, which makes their job just a little harder. Unnecessarily harder.

The usual reply at this point is that those maintainers should know Perl well enough to know that lc defaults to lowercasing $_. But that's the crumbling edge of a very slippery slope. Which of the following built-in functions also default to $_?

abs          close           printf        sleep
chdir        die             require       -t
chroot       localtime       select        -T

Even if you knew[29], and were confident that you knew, are you equally confident that your teammates know?

Taking advantage of the default behaviour of any builtin forces the reader to "fill in the gaps", and therefore makes the resulting code harder to read. And much harder to debug. Saying explicitly what you mean takes only a little extra coding effort—effort that is repaid many times over when the code is being maintained.

Meanwhile, the example gets worse:

else {
    print "New candidate: $_\n";

    $_ .= accept_or_reject($_);

    $have_reconsidered{lc()} = 0;
}

Because $_ isn't a meaningful name in the context, it's hard to tell at a glance exactly what information is being printed by the first line of the else block. Likewise, it's impossible to tell what's being accepted or rejected by the following line, or what the result is being appended to.

All the problems of maintainability described so far can be eliminated simply by giving the iterator variable a human-readable name:

for my $name (@candidates) {
    if ($name =~ m/\[ NO \] \z/xms) {
        $name = reconsider($name);

        $have_reconsidered{lc($name)}++;
    }
    else {
        print "New candidate: $name\n";

        $name .= accept_or_reject($name);

        $have_reconsidered{lc($name)} = 0;
    }
}

In this version it's clear that the name of each candidate in the array is matched against a pattern. If the match succeeds, then the candidate's name has been (case-insensitively) reconsidered one more time. Otherwise, something is appended to their name, and it's noted that the name has not yet been reconsidered.

Now the reader doesn't need to remember default behaviours, because the code can't use them (as the iterator variable is no longer $_). Nor is there any need to mentally keep track of what the iterator variable actually contains, because it now has an explicit name that describes its contents.

Note that the recommendation to avoid $_ applies even if $_ is entirely implicit in the loop, and regardless of the type of loop:

while (<$bookmarks>) {
    if (m/phoenix|firebird/xms) {
        s/\s* \z/ (see firefox)\n/xms;
    }
    print;
}

For maintainability, that's still better written as:

while (my $record = <$bookmarks>) {
    if ($record =~ m/phoenix|firebird/xms) {
        $record =~ s/\s* \z/ (see firefox)\n/xms;
    }
    print $record;
}

Non-Lexical Loop Iterators

Always declare a for loop iterator variable with my.

When using an explicit iterator variable in a for loop, make sure it's explicitly declared as a lexical variable, using the my keyword. That is, never write a for loop like this:

my $client;

SEARCH:
for $client (@clients) {
    last SEARCH if $client->holding();
}

if ($client) {
    $client->resume_conversation();
}

If you leave off the my, Perl doesn't reuse the lexical variable declared above the loop. Instead, it silently declares a new lexical variable (which is also named $client) as the iterator variable. That new lexical is always scoped to the loop block, and it hides any variable of the same name from any outer scope.

This behaviour is contrary to all reasonable expectation. Everywhere else in Perl, when you declare a lexical variable, it's visible throughout the remainder of its scope, unless another explicit my declaration hides it. So it's natural to expect that the $client variable used in the for loop is the same lexical $client variable that was declared before the loop.

But it isn't. The previous example is actually equivalent to:

my $client;

SEARCH:
for my $some_other_variable_also_named_client (@clients) {
    last SEARCH if $some_other_variable_also_named_client->holding();
}

if ($client) {
    $client->resume_conversation();
}

Writing it that way makes the logical error in the code much more obvious. The loop isn't setting the outermost lexical $client to the first client who's on hold. It's setting an inner lexical variable (which is also named $client in the original version). Then it's throwing that variable away at the end of the loop. The outer lexical $client retains its original undefined value, and the if block is never executed.

Unfortunately, the first version shown doesn't make that error obvious at all. It looks like it ought to work. It would work if the loop construct was anything other than a for. And that's the problem. Finding this particular bug is made very much more difficult by the counter-intuitive semantics of loop iterators.

Fortunately, if you always use an explicitly declared lexical iterator instead, the problem never arises, because it's obvious that there are two distinct $client variables:

my $client;

SEARCH:
for my $client (@clients) {
    last SEARCH if $client->holding();
}

if ($client) {
    $client->resume_conversation();
}

Of course, the code is still broken. But now the declaration of a second lexical $client makes the problem obvious. Best practice isn't only about coding in a way that doesn't introduce errors. Sometimes it's also about coding in a way that doesn't conceal errors.

It's simply impossible to use the final value of a loop iterator variable after its for loop has terminated; iterator variables always cease to exist after the loop exits. So the correct solution here is to stop trying to reuse the iterator outside its loop and use a separate variable entirely:

my $holding_client;

SEARCH:
for my $client (@clients) {
    if ($client->holding()) {
         $holding_client = $client;
         last SEARCH;
    }
}

if ($holding_client) {
    $holding_client->resume_conversation();
}

Or better still, factor the search out into a subroutine:

sub get_next_holding_client {
    # Search for and return any client on hold...
    for my $client (@_) {
        return $client if $client->holding();
    }

    # Fail if no clients on hold...
    return;
}

# and later...

my $client_on_hold = get_next_holding_client(@clients);

if ($client_on_hold) {
    $client_on_hold->resume_conversation();
}

Or, best of all, just use a core utility[30]:

use List::Util qw( first );

my $client_on_hold = first {$_->holding()} @clients;

if ($client_on_hold) {
    $client_on_hold->resume_conversation();
}

The first function is exactly like a short-circuiting version of grep. You give grep a block and a list of values, and it returns all those values for which the block evaluates true. You also give first a block and a list of values, but then it returns only the first value for which the block is true (and doesn't bother to check the rest of the values).
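That relationship is easy to see when both are run over the same data (values invented for illustration):

```perl
use List::Util qw( first );

my @measurements = (3, 7, 10, 14, 21);

my @all_evens  = grep  { $_ % 2 == 0 } @measurements;    # (10, 14)
my $first_even = first { $_ % 2 == 0 } @measurements;    # 10; later values never tested
```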

List Generation

Use map instead of for when generating new lists from old.

A for loop is so convenient that it's natural to reach for it in any situation where a fixed number of list elements is to be processed. For example:

my @sqrt_results;
for my $result (@results) {
    push @sqrt_results, sqrt($result);
}

But code like that can be very inefficient, because it has to perform a separate push for every transformed element. Those pushes usually require a series of internal memory reallocations, as the @sqrt_results array repeatedly fills up. It is possible to preallocate space in @sqrt_results, but the syntax to do that is a little obscure, which doesn't help readability:

my @sqrt_results;

# Preallocate as many elements as @results already has...
$#sqrt_results = $#results;

for my $next_sqrt_result (0..$#results) {
    $sqrt_results[$next_sqrt_result] = sqrt $results[$next_sqrt_result];
}

You also have to use an explicit counter if you preallocate. You can't use push, because you just gave the array some number of preallocated elements, so push would put each new value after them.

The alternative is to use Perl's built-in map function. This function is specifically aimed at those situations when you want to process a list of values, to create some kind of related list. For example, to produce a list of square roots from a list of numbers:

my @sqrt_results = map { sqrt $_ } @results;

Some of the benefits of this approach are very obvious. For a start, there's less code, so (provided you know what map does) the code is significantly easier to understand. Less code also means there are likely to be fewer bugs, as there are fewer places for things to go wrong.

There are a couple of other advantages that aren't quite as obvious. For example, when you use map, most of your looping and list generation is being done in heavily optimized compiled C code, not in interpreted Perl. So it's usually being done considerably faster.

In addition, the map knows in advance exactly how many elements it will eventually process, so it can preallocate sufficient space in the list it's returning. Or rather it can usually preallocate sufficient space. If the map's block returns more than one value for each element of the original list, then extra allocations will still be necessary. But, even then, not as many as the equivalent series of push statements would require.
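For instance, a map block that returns two values per element produces a flattened result twice the length of its input, which no up-front allocation could have predicted exactly:

```perl
# Each input element yields two output elements...
my @value_and_square = map { ($_, $_ ** 2) } (1..3);

# @value_and_square is (1, 1, 2, 4, 3, 9) -- six elements from three inputs
```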

Finally, on a more abstract level, a map is almost always used to transform a sequence of data, so seeing a map immediately suggests to the reader that a data transformation is intended. And the syntax of the function makes it easy to visually locate both the transformation itself (what's in the braces) and the data it's being applied to (what's after the braces).

List Selections

Use grep and first instead of for when searching for values in a list.

The same principles apply when you want to refine a list by removing unwanted elements. Instead of a for loop:

# Identify candidates who are unfit for the cut-and-thrust of politics...
my @disqualified_candidates;
for my $name (@candidates) {
    if (cannot_tell_a_lie($name)) {
        push @disqualified_candidates, $name;
    }
}

just use a grep:

# Identify candidates who are unfit for the cut-and-thrust of politics...
my @disqualified_candidates
    = grep {cannot_tell_a_lie($_)} @candidates;

Likewise, don't use a for when you're searching a list for a particular element:

# Victimize someone at random...
my $scapegoat = $disqualified_candidates[rand @disqualified_candidates];

# Unless there's a juicier story...
SEARCH:
for my $name (@disqualified_candidates) {
    if (chopped_down_cherry_tree($name)) {
        $scapegoat = $name;
        last SEARCH;
    }
}

# Publish and be damned...
print {$headline} "Disgraced $scapegoat Disqualified From Election!!!\n";

Using the first function often results in code that is both more comprehensible and more efficient:

use List::Util qw( first );

# Find a juicy story...
my $scapegoat
    = first { chopped_down_cherry_tree($_) }  @disqualified_candidates;

# Otherwise victimize someone at random...
if (!defined $scapegoat) {
    $scapegoat = $disqualified_candidates[rand @disqualified_candidates];
}

# Publish and be damned...

print {$headline} "Disgraced $scapegoat Disqualified From Election!!!\n";

List Transformation

Use for instead of map when transforming a list in place.

There is, however, a particular case where map and grep are not better than an explicit for loop: when you're transforming an array in situ. In other words, when you have an array of elements or a list of lvalues and you want to replace each of them with a transformed version of the original.

For example, suppose you have a series of temperature measurements in Fahrenheit, and you need them in Kelvin instead. You could accomplish that transformation by applying a map to the data and then assigning it back to the original container:

@temperature_measurements = map { F_to_K($_) } @temperature_measurements;

But the map statement has to allocate extra memory to store the transformed values and then assign that temporary list back to the original array. That process could become expensive if the list is large or the transformation is repeated many times.

In contrast, the equivalent for block can simply reuse the existing memory in the array:

for my $measurement (@temperature_measurements) {
    $measurement = F_to_K($measurement);
}

Note that this second version also makes it slightly more obvious that elements of the array are being replaced. To detect that fact in the map version, you have to compare the array names at both ends of a long assignment statement. In the for-loop version, the more compact statement:

    $measurement = F_to_K($measurement);

makes it easier to see that each measurement is being replaced with some transformed version of its original value.
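Here is a self-contained sketch of that in-place idiom. Note that F_to_K() is assumed by the text but not defined there, so a plausible Fahrenheit-to-Kelvin conversion is supplied purely to make the example runnable:

```perl
use strict;
use warnings;

# Hypothetical conversion (not defined in the text); included so the
# example is self-contained...
sub F_to_K {
    my ($fahrenheit) = @_;
    return ($fahrenheit - 32) / 1.8 + 273.15;
}

my @temperature_measurements = (32, 212);

# $measurement aliases each element, so the array is updated in place...
for my $measurement (@temperature_measurements) {
    $measurement = F_to_K($measurement);
}

# @temperature_measurements now holds Kelvin values (approx. 273.15 and 373.15)
```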

Complex Mappings

Use a subroutine call to factor out complex list transformations.

When a map, grep, or first is applied to a list, the block performing the transformation or conditional test can sometimes become quite complex[31]. For example:

use List::Util qw( max );

Readonly my $JITTER_FACTOR => 0.01;   # Jitter by a maximum of 1%

my @jittered_points
    = map { my $x = $_->{x};
            my $y = $_->{y};

            my $max_jitter = max($x, $y) * $JITTER_FACTOR;
            { x => $x + gaussian_rand({mean=>0, dev=>0.25, scale=>$max_jitter}),
              y => $y + gaussian_rand({mean=>0, dev=>0.25, scale=>$max_jitter}),
            }
          } @points;

This large block is very hard to read, especially since the final anonymous hash constructor looks more like a nested block. So the temptation is to use a for instead:

my @jittered_points;
for my $point (@points) {
    my $x = $point->{x};
    my $y = $point->{y};

    my $max_jitter = max($x, $y) * $JITTER_FACTOR;

    my $jittered_point = {
        x => $x + gaussian_rand({ mean=>0, dev=>0.25, scale=>$max_jitter }),
        y => $y + gaussian_rand({ mean=>0, dev=>0.25, scale=>$max_jitter }),
    };

    push @jittered_points, $jittered_point;
}

That certainly does help the overall readability, but it's still far from optimal. A better solution is to factor out the complex calculation into a separate subroutine, then call that subroutine within a now much simpler and more readable map expression:

my @jittered_points = map { jitter($_) } @points;

# and elsewhere...

# Add a random Gaussian perturbation to a point...
sub jitter {
    my ($point) = @_;
    my $x = $point->{x};
    my $y = $point->{y};

    my $max_jitter = max($x, $y) * $JITTER_FACTOR;

    return {
        x => $x + gaussian_rand({ mean=>0, dev=>0.25, scale=>$max_jitter }),
        y => $y + gaussian_rand({ mean=>0, dev=>0.25, scale=>$max_jitter }),
    };
}

List Processing Side Effects

Never modify $_ in a list function.

One particular feature of the way the map, grep, and first functions work can easily become a source of subtle errors. These functions all use the $_ variable to pass each list element into their associated block. But, for better efficiency, these functions alias $_ to each list value they're iterating, rather than copying each value into $_.

You probably don't often think of map, grep, and first as creating aliases. You probably just think of those functions as taking a list and returning a second, independent list. And, most importantly, you almost certainly don't expect them to change the original list.

However, if the block you give to a map, grep, or first modifies $_ in any way, then it's actually modifying an alias to some element of the function's list. That means it's actually modifying the original element itself, which is almost certainly an error.
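A tiny demonstration of the trap (the word data here is purely illustrative): a block that only produces a value leaves the original list alone, but a block that modifies $_ silently rewrites it:

```perl
use strict;
use warnings;

my @words = qw( foo bar );

# uc returns an uppercased copy; @words is untouched...
my @safe = map { uc } @words;

# tr/// operates directly on $_, which aliases each element of @words...
my @oops = map { tr/a-z/A-Z/; $_ } @words;

# @words has now been clobbered: it contains ('FOO', 'BAR')
```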

This kind of mistake commonly occurs in code like this:

# Select .pm files for which no corresponding .pl file exists...
@pm_files_without_pl_files
    = grep { s/ \.pm \z /.pl/xms && !-e } @pm_files;

The intention here is almost certainly virtuous. The thought process was probably something like:

The implicit $_ successively holds a copy of each of the filenames in @pm_files. I'll replace the .pm suffix of that copy with .pl, then see if the resulting file exists. If it doesn't, then the original (.pm) filename will be passed through the grep to be collected in @pm_files_without_pl_files.

The mistake is simple, but deadly: $_ doesn't successively hold a copy of anything. It successively holds aliases. So the actual effect of the grep is far more sinister. $_ is an alias—that is, just another name—for each of the filenames in @pm_files. So the substitution in the grep block replaces the .pm suffix of each original filename with .pl; then the -e checks whether the resulting file exists. If the file doesn't exist, then the filename (now ending in .pl) will be passed through to @pm_files_without_pl_files. And, regardless of whether that name is passed through or not, the block will have modified the original element in @pm_files.

Oops!

Not only did that grep statement unintentionally mess up the contents of @pm_files, it didn't even do the job it was supposed to do. Because it changes each $_ on the way through, what you actually get back are the names of the .pl files that were M.I.A., not the .pm files that were looking for them.

This kind of error can occur anywhere that the block of any list-processing function uses any of Perl's numerous $_-modifying features. For example:

# Find the first "chunk" that spans more than one line
$next_multi_line_chunk
    = first { chomp; m/\n/xms; } @file_chunks;

Here, the first block chomps the actual elements of @file_chunks, because the raw chomp chomps $_, which is successively aliased to each element of @file_chunks. But the first stops calling its block as soon as one of those post-chomped elements is found to still have a newline in it (m/\n/xms).

So, after this assignment statement executes, the first will have surreptitiously chomped each element in @file_chunks, up to and including the first element that contained a newline. But, because first will have stopped checking at that point, none of the elements after the first match will have been modified at all. So @file_chunks is left in a state that is simultaneously unexpected (it's not obvious that the array was being modified at all), inconsistent (only part of the array has been modified), and unpredictable (how much was modified depends on the contents of the array).

Of course, there is no limit to human iniquity, and occasionally that kind of subtle nastiness is actually intentional. For example:

use List::MoreUtils qw( uniq );

# Remove directory pathnames from filenames and collect separately...
@dir_paths = uniq map { s{ \A (.*/) }{}xms ? $1 : './' } @file_paths;

In this case, the sneaky substitution within the map block is deliberate. The implementer genuinely wants to chop off any leading 'some/path/here/' from each element of @file_paths, and at the same time collect those off-cuts into the @dir_paths array. Ten-out-of-ten for Perl savoir-faire, but minus several million for maintainability.

The rule here is simple: no map, grep, or first block should ever have a side effect. In particular, no map, grep, or first block should ever modify $_.

If your block really does need to modify a copy of each list element, then create the copy explicitly within the block:

@pm_files_without_pl_files
    = grep {
          my $file = $_;
          $file =~ s/ \.pm \z /.pl/xms;
          !-e $file;
      } @pm_files;

In this version, the substitution is applied to an explicit copy of the filename (in $file), so the original strings in @pm_files will be unchanged, and the filenames that flow through to @pm_files_without_pl_files will retain their original .pm suffixes.

On the other hand, if you find you really do need side effects in your map, grep, or first block, then don't use map or grep or first at all. Rewrite the code as a for loop instead. That way, the side-effects in the loop can easily be detected and understood:

# Track directory paths to ensure uniqueness...
my %seen_dir;

FILE_PATH:
for my $file (@file_paths) {
    # Default to current directory...
    my $dir_path = './';
    # Capture and remove any actual directory path and use it as the path...
    if ($file =~ s{ \A (.*/) }{}xms) {
        $dir_path = $1;
    }

    # Reject repeated directory paths...
    next FILE_PATH if $seen_dir{$dir_path}++;

    # Record the extracted path...
    push @dir_paths, $dir_path;
}

Multipart Selections

Avoid cascading an if.

Avoid cascades of if-elsif-elsif-else statements wherever possible. They tend to produce code with poor readability that is also expensive to execute.

The readability of an if cascade suffers because the blocks associated with each alternative have to be placed between the alternatives themselves. That can easily cause the entire construct to expand beyond a single screen or page. Any kind of code that extends over a visual boundary is very much more difficult to understand, because the reader is then forced to mentally cache parts of the construct as they scroll through it.

Even if the code doesn't cause a mental page fault, the alternation of condition-action-condition-action-condition-action can make it difficult to compare the conditions and hence to verify that the logic you're implementing is correct. For example, it can be hard to verify that, collectively, your conditions cover all the important alternatives. It can also be difficult to ensure that they are mutually exclusive.

Likewise, if the actions are very similar (e.g., assigning different values to the same variable), it's relatively easy to induce errors (mistyping the variable name in one branch, for example) or to introduce subtleties (such as deliberately using a different variable name in one branch).

The performance of an if cascade can also be suboptimal. Unless you are able to put the most common cases first, a cascaded if is going to have to test, on average, one-half of its alternative conditions before it can execute any of its blocks. And often it's simply not possible to put the common cases first, either because you don't know which cases will be the common ones or because you specifically need to check the special cases first.

The following guidelines examine specific types of cascaded if, and suggest alternative code structures that are more robust, readable, and efficient.

Value Switches

Use table look-up in preference to cascaded equality tests.

Sometimes an if cascade selects its action by testing the same variable against a fixed number of predefined values. For example:

sub words_to_num {
    my ($words) = @_;

    # Treat each sequence of non-whitespace as a word...
    my @words = split /\s+/, $words;

    # Translate each word to the appropriate number...
    my $num = $EMPTY_STR;
    for my $word (@words) {
        if ($word =~ m/ zero | zéro /ixms) {
            $num .= '0';
        }
        elsif ($word =~ m/ one | un | une /ixms) {
            $num .= '1';
        }
        elsif ($word =~ m/ two | deux /ixms) {
            $num .= '2';
        }
        elsif ($word =~ m/ three | trois /ixms) {
            $num .= '3';
        }
        # etc. etc. until...
        elsif ($word =~ m/ nine | neuf /ixms) {
            $num .= '9';
        }
        else {
            # Ignore unrecognized words
        }
    }

    return $num;
}

# and later...

print words_to_num('one zero eight neuf');    # prints: 1089

A cleaner and more efficient solution is to use a hash as a look-up table, like so:

my %num_for = (
#   English       Français        Française
   'zero' => 0,   'zéro' => 0,
    'one' => 1,     'un' => 1,    'une' => 1,
    'two' => 2,   'deux' => 2,
  'three' => 3,  'trois' => 3,
#        etc.           etc.
   'nine' => 9,   'neuf' => 9,
);

sub words_to_num {
    my ($words) = @_;

    # Treat each sequence of non-whitespace as a word...
    my @words = split /\s+/, $words;

    # Translate each word to the appropriate number...
    my $num = $EMPTY_STR;
    for my $word (@words) {
        my $digit = $num_for{lc $word};
        if (defined $digit) {
             $num .= $digit;
        }
    }

    return $num;
}

# and later...

print words_to_num('one zero eight neuf');    # prints: 1089

In this second version, words_to_num() looks up the lowercase form of each word in the %num_for hash and, if that look-up provides a defined result, appends it to the number being created.

The primary advantage here is that the code in the for loop never need change, no matter how many extra words you subsequently add to the look-up table. For example, if you wished to cater for Hindi digits as well, you'd need to change the cascaded-if version in 10 separate places:

for my $word (@words) {
    if ($word =~ m/ zero | zéro | shunya /ixms) {
        $num .= '0';
    }
    elsif ($word =~ m/ one | un | une | ek /ixms) {
        $num .= '1';
    }
    elsif ($word =~ m/ two | deux | do /ixms) {
        $num .= '2';
    }
    elsif ($word =~ m/ three | trois | teen /ixms) {
        $num .= '3';
    }
    # etc.
    elsif ($word =~ m/ nine | neuf | nau /ixms) {
        $num .= '9';
    }
    else {
        # Ignore unrecognized words
    }
}

But, in the look-up table version, the only change would be to add an extra column to the table itself:

my %num_for = (
#    English       Français         Française          Hindi
   'zero' => 0,    'zéro' => 0,                   'shunya' => 0,
    'one' => 1,      'un' => 1,    'une' => 1,        'ek' => 1,
    'two' => 2,    'deux' => 2,                       'do' => 2,
  'three' => 3,   'trois' => 3,                     'teen' => 3,
#        etc.            etc.                             etc.
   'nine' => 9,    'neuf' => 9,                      'nau' => 9,
);

Factoring the translations out into a table also improves the readability of the code, both because the code is more compact, and because tables are a familiar and comprehensible way to structure information.

The values to be looked up in a table don't have to be scalar constants. For example, here's a simple module that installs a debug() function, whose behaviour can be configured when the module is loaded:

package Debugging;

use Carp;
use Log::Stdlog  { level => 'debug' };

# Choice of actions when debugging...
my %debug_mode = (
  # MODE           DEBUGGING ACTION
    off    => sub {},
    logged => sub { return print {*STDLOG} debug =>  @_; },
    loud   => sub {                  carp 'DEBUG: ', @_; },
    fatal  => sub {                 croak 'DEBUG: ', @_; },
);

# Change debugging behaviour whenever module is used...
sub import {
    my $package = shift;
    my $mode    = @_ > 0 ? shift : 'loud';   # Default to carping

    # Locate appropriate behaviour, or die trying...
    my $debugger = $debug_mode{$mode};
    croak "Unknown debugging mode ('$mode')" if !defined $debugger;

    # Install new behaviour...
    use Sub::Installer;
    caller()->reinstall_sub(debug  => $debugger);

    return;
}

The module's import() subroutine (which is called whenever the module is loaded) takes a string that specifies how the newly created debug() subroutine should behave. For example:

use Debugging qw( logged );   # debug() logs messages

That string is unpacked into $mode within the import subroutine, and then used as a look-up key into the %debug_mode hash. The look-up returns an anonymous subroutine that is then installed (via the Sub::Installer module) as the caller's debug() subroutine.

Once again, the advantage is that the logic of the import() subroutine doesn't have to change when additional debugging alternatives become available. You can simply add the new behaviour (as an anonymous subroutine) to the %debug_mode table. For example, to provide a debugging mode that counts messages:

# Choice of actions when debugging...
my %debug_mode = (
  # MODE           DEBUGGING ACTION
    off     => sub {},
    logged  => sub { return print {*STDLOG} debug => @_;   },
    loud    => sub {                 carp 'DEBUG: ', @_; },
    fatal   => sub {                croak 'DEBUG: ', @_; },
    counted =>  do {
                       my $count = 1;     # Private variable for sub

                       sub { carp "DEBUG: [$count] ", @_; $count++; }
                   },
);

Tabular Ternaries

When producing a value, use tabular ternaries.

Hash-based table look-ups aren't always feasible. Sometimes decisions have to be made based on a series of tests, rather than on a particular value. However, if each alternative course of action results in a simple value, then it's still possible to avoid explicit cascaded ifs and preserve a tabular layout in your code. The trick is to use the ternary operator (?:) instead.

For example, to produce a suitable string for a salutation in a form letter, you might write something like:

my $salute;
if ($name eq $EMPTY_STR) {
    $salute = 'Dear Customer';
}
elsif ($name =~ m/\A ((?:Sir|Dame) \s+ \S+) /xms) {
    $salute = "Dear $1";
}
elsif ($name =~ m/ ([^\n]*), \s+ Ph[.]?D \z /xms) {
    $sa1ute = "Dear Dr $1";
}
else {
    $salute = "Dear $name";
}

The repeated assignments to $salute suggest that a cleaner solution, using only a single assignment, may be possible. Indeed, you could build a simple tabular structure to determine the correct salutation, by cascading ternaries instead of ifs, like so:

             # Name format...                               # Salutation...
my $salute = $name eq $EMPTY_STR                            ? 'Dear Customer'
           : $name =~ m/ \A ((?:Sir|Dame) \s+ \S+) /xms     ? "Dear $1"
           : $name =~ m/ (.*), \s+ Ph[.]?D \z      /xms     ? "Dear Dr $1"
           :                                                  "Dear $name"
           ;

The efficiency of this series of tests will be exactly the same as the preceding cascaded-if version, so there's no advantage in that respect. The advantages of this approach are in terms of readability and comprehensibility. For a start, it's very obvious that this extended construct is, despite the many alternatives it considers, really just a single assignment statement. And it's very easy to confirm that the correct variable is being assigned to[32].

A second advantage of the ternary version is that (if you squint a little) it looks like a table: one column of tests on $name and a second column listing the corresponding salutations. It even has column borders of a kind: the vertical rows of colons and question marks.

The ternary version is also considerably more compact, and requires two-thirds fewer lines than the equivalent cascaded if. That makes it far easier to keep the code on one screen as additional alternatives are added.

The final advantage of using ternaries instead of an if cascade is that the syntax of the ternary operator is much stricter. In a regular cascaded if statement, it's easy to accidentally leave off the final unconditional else. For example:

my $salute;
if ($name eq $EMPTY_STR) {
    $salute = 'Dear Customer';
}
elsif ($name =~ m/\A ((?:Sir|Dame) \s+ \S+) /xms) {
    $salute = "Dear $1";
}
elsif ($name =~ m/ (.*), \s+ Ph[.]?D \z /xms) {
    $salute = "Dear Dr $1";
}

In which case, $salute might sometimes unexpectedly remain undefined.

However, there is no way to make the same mistake using a ternary cascade:

             # Name format...                               # Salutation...
my $salute = $name eq $EMPTY_STR                            ? 'Dear Customer'
           : $name =~ m/ \A ((?:Sir|Dame) \s+ \S+) /xms     ? "Dear $1"
           : $name =~ m/ (.*), \s+ Ph[.]?D \z      /xms     ? "Dear Dr $1"
           ;

If you do, Perl will immediately (and lethally) inform you that leaving out the final alternative is a syntax error.
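As a compact runnable sketch of the idiom (sign_word() is a hypothetical example, not from the text), note how the tests line up in one column and the produced values in another, and how the construct is a single return of a single expression:

```perl
use strict;
use warnings;

# Classify a number using a tabular ternary...
sub sign_word {
    my ($n) = @_;

               # Test...       # Value produced...
    return $n < 0            ? 'negative'
         : $n == 0           ? 'zero'
         :                     'positive'
         ;
}

# sign_word(-7) is 'negative'; sign_word(0) is 'zero'; sign_word(42) is 'positive'
```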

do-while Loops

Don't use do…while loops.

Like any other postfix looping construct, a do…while loop is intrinsically hard to read, because it places the controlling condition at the end of the loop, rather than at the beginning.

More importantly, in Perl a do…while loop isn't a "first-class" loop at all. Specifically, you can't use the next, last, or redo commands within a do…while. Or, worse still, you can use those control directives; they just won't do what you expect.

For example, the following code looks like it should work:

sub get_big_int {
    my $int;

    TRY:
    do {
        # Request an integer...
        print 'Enter a large integer: ';
        $int = <>;

        # That's not an integer!...
        next TRY if $int !~ /\A [-+]? \d+ \n? \z/xms;

        # Otherwise tidy it up a little...
        chomp $int;
    } while $int < 10;   # Until the input is more than a single digit

    return $int;
}

# and later...
for (1..$MAX_NUMBER_OF_ATTEMPTS) {
    print sqrt get_big_int(), "\n";
}

That looks okay, but it isn't. Specifically, if a non-integer is ever entered and the next TRY command is invoked, that next starts looking for an appropriately labeled loop to re-iterate. But the do…while isn't actually a loop; it's a postfix-modified do block. So the next ignores the TRY: label attached to the do. Control passes out of the do block, and then out of the subroutine call (a subroutine isn't a loop either), until it returns to the for loop. But the for loop isn't labeled TRY:, so control passes outwards again, this time right out of the program.

In other words, if the user ever enters a value that isn't a pure integer, the entire application will immediately terminate—not a very robust or graceful way to respond to errors. That kind of bug is particularly hard to find too, because it's one of those rare cases of a Perl construct not doing what you mean. It looks right, but it works wrong.

The best practice is to avoid do…while entirely. The simple way to do that is to use a regular while loop instead, but to "counter-initialize" the $int variable, to guarantee that the loop executes at least once:

sub get_big_int {
    my $int = 0;   # Small value so the while loop has to iterate at least once

    TRY:
    while ($int < 10) {
        print 'Enter a large integer: ';
        $int = <>;

        next TRY if $int !~ /\A [-+]? \d+ \n? \z/xms;

        chomp $int;
    }

    return $int;
}

Sometimes, however, the condition to be met is too complex to permit counter-initialization, or perhaps no counter-initial value is possible. This most commonly occurs when the test is performed by a separate subroutine. In such cases, either use a flag:

sub get_big_int {
    my $tried = 0;
    my $int;

    while (!$tried || !is_big($int)) {
        print 'Enter a valid integer: ';
        $int = <>;

        chomp $int;

        $tried = 1;
    }

    return $int;
}

or, better still, use a return to explicitly escape from an infinite loop:

sub get_big_int {
    while (1) {
        print 'Enter a valid integer: ';
        my $int = <>;

        chomp $int;

        return $int if is_big($int);
    }

    return;
}

Linear Coding

Reject as many iterations as possible, as early as possible.

Chapter 2 recommends the practice of "coding in paragraphs" as a way to chunk code and improve its comprehensibility. Taking this idea one step further, it is also good practice to "process in paragraphs". That is, don't wait until you have all your data assembled before you start checking it. It's more efficient, and often more comprehensible, to verify as you go.

Checking data as soon as it's available means that you can short-circuit sooner if the data is unacceptable. More importantly, the resulting "paragraphs" of code are then specific to each piece of data, rather than to one phase of the processing. That means your code chunks are better focused on the distinct elements of the problem domain, rather than on the more complex interactions between those elements.

For example, instead of:

for my $client (@clients) {
    # Compute current and future client value...
    my $value     = $client->{volume} * $client->{rate};
    my $projected = $client->{activity} * $value;

    # Verify client is active, worth watching, and worth keeping...
    if ($client->{activity}
        && $value >= $WATCH_LEVEL
        && $projected >= $KEEP_LEVEL
    ) {
        # If so, add in the client's expected contribution...
        $total += $projected * $client->{volatility};
    }
}

you can generate-and-test each datum sequentially, like so:

CLIENT:
for my $client (@clients) {
    # Verify active client...
    next CLIENT if !$client->{activity};

    # Compute current client value and verify client is worth watching...
    my $value = $client->{volume} * $client->{rate};
    next CLIENT if $value < $WATCH_LEVEL;

    # Compute likely client future value and verify client is worth keeping...
    my $projected = $client->{activity} * $value;
    next CLIENT if $projected < $KEEP_LEVEL;

    # Add in client's expected contribution...
    $total += $projected * $client->{volatility};
}

Note that this second version deals with each part of the decision separately, instead of lumping them together in one bloated, multiline conditional test. This sequential approach makes it easier to see the different criteria that are being tested, as you can focus on one criterion—and its associated data—at a time. Linear coding also typically reduces the number of nested blocks that are required, which can further improve readability.

Better still, the second version is potentially much more efficient. The first version computes $value and $projected for every single client, whether or not the loop eventually uses that data. The second version does no work at all on inactive clients, because its first next statement terminates each iteration as soon as torpor is detected. Similarly, the loop block does only half as much work on the clients who are active but not worth watching.

Distributed Control

Don't contort loop structures just to consolidate control.

The bloated conditional tests mentioned in the previous guideline can also appear in the conditions of loop structures, where they usually indicate the (mis)application of structured programming techniques.

Proponents of structured programming usually insist that every loop should have only a single exit point: the conditional expression that's controlling the loop. The very laudable intent of that rule is to make it easier to determine the correctness of the loop by consolidating all information about its termination behaviour in a single place.

Unfortunately, blind adherence to this principle frequently produces code that looks like this:

Readonly my $INTEGER => qr/\A [+-]? \d+ \n? \z/xms;

my $int   = 0;
my $tries = 0;
my $eof   = 0;

while (!$eof
       && $tries < $MAX_TRIES
       && ( $int !~ $INTEGER || $int < $MIN_BIG_INT )
) {
    print 'Enter a big integer: ';
    $int = <>;
    if (defined $int) {
        chomp $int;

        if ($int eq $EMPTY_STR) {
            $int = 0;
            $tries--;
        }
    }
    else {
        $eof = 1;
    }
    $tries++;
}

The loop conditional typically contains a mixture of positive and negative tests on several flag variables. The block itself then contains multiple nested if tests, mainly to set the termination flags or to pre-empt further execution if an exit condition is encountered within the block.

When a loop has been contorted in this manner, it's often extremely difficult to understand. Take a moment to work through the previous example code and determine exactly what it does.

Now compare that convoluted code with the following version (which provides exactly the same behaviour):

Readonly my $INTEGER => qr/\A [+-]? \d+ \n? \z/xms;

my $int;

INPUT:
for my $attempt (1..$MAX_TRIES) {
    print 'Enter a big integer: ';
    $int = <>;

    last INPUT if not defined $int;
    redo INPUT if $int eq "\n";
    next INPUT if $int !~ $INTEGER;

    chomp $int;
    last INPUT if $int >= $MIN_BIG_INT;
}

This version requires no flag variables. It has fewer lines of code. It has no nested conditionals or multipart tests. You can easily work out what it does on end-of-file or when the input is an empty line or when given a non-integer simply by working through the linear sequence of tests within the block.

Herding loop flags into a single location only gives the illusion of consolidating control. A complex exit condition still relies on other tests within the loop to set the appropriate flags, so the actual control is still implicitly distributed.

Perl provides clean ways to distribute control explicitly throughout a loop. Use them.

Redoing

Use for and redo instead of an irregularly counted while.

In the final version of the input code shown in the previous guideline, a while loop plus a count variable ($tries) was replaced by a for loop. This is a good practice in any situation where a while loop is controlled by a variable that is linearly incremented on each iteration. Using a for makes explicit your intention to loop a fixed number of times. It also eliminates both the count variable and the need to explicitly test that variable against some maximal value. That, in turn, removes the possibility of forgetting to increment the variable and the risk of off-by-one errors in the explicit test.

However, this kind of loop refactoring is satisfactory only when the count variable is uniformly incremented on every iteration. There are plenty of situations where that is not quite the case; where the count is usually incremented each time, but not always. Such exceptions obviously create a serious problem in a fixed-repetition for loop.

For example, the previous example didn't count an empty input line as a legitimate "try". That was easy to accommodate in the "while ($tries < $MAX_TRIES)" version; you simply don't increment $tries in that case. But, in a for loop, the expected number of iterations is fixed before the loop even starts, and you have no control over the incrementing of the loop variable. So it would seem that a for loop is contraindicated whenever the iteration-counting is irregular.

Fortunately, the redo statement allows a loop to have its cake (i.e., be a for loop instead of a while) and eat it too (by still discounting certain iterations). That's because a redo sends the execution back to the start of the current iteration of the loop block: "Do not pass for. Do not collect another iterated value."

Using a redo allows you to take advantage of the fixed-iteration semantics of a for loop (with their cleaner and more maintainable syntax), but still allows controlled deviations from a fixed number of iterations when necessary. It also makes the program's control flow more obvious and comprehensible. There is no longer any need to decode the implicit behaviour caused by (non-)changes to the value of a count variable. Instead, every exception to the fixed number of repetitions promised by the for loop is explicitly marked with a redo keyword.

For these reasons it's a good practice to replace any "counted" while loop with a for loop, and then to use a redo in any situation where the count should not be incremented for that particular iteration.
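The effect of a redo on the iteration count can be seen in this self-contained sketch, which simulates the input stream with an array rather than reading from <> (the names are hypothetical):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simulated input lines (a real program would read from <>)...
my @inputs = (q{}, 'abc', q{}, '42');

my $MAX_TRIES = 2;
my @accepted;

TRY:
for my $try (1..$MAX_TRIES) {
    my $line = shift @inputs;

    last TRY if !defined $line;   # Ran out of input entirely

    # An empty line doesn't use up a try: restart this same
    # iteration, with $try unchanged...
    redo TRY if $line eq q{};

    push @accepted, "try $try: $line";
}

print "$_\n" for @accepted;
# try 1: abc
# try 2: 42
```

The loop body runs four times, but because each empty line triggers a redo, only two of those executions count as tries.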

Unfortunately, this practice does not generalize to situations where the count must sometimes be decreased, or increased by more than one. In such cases, a while loop plus an explicit $count variable is the correct solution.
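For instance, suppose some hypothetical retry budget in which a fatal response consumes two tries at once. No fixed-count for loop can express that, so an explicit count is the right tool:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $MAX_TRIES  = 5;
my $tries_used = 0;
my @log;

# Hypothetical responses: a 'fatal' response costs two tries...
my @responses = ('minor', 'fatal', 'minor');

while ($tries_used < $MAX_TRIES && @responses) {
    my $response = shift @responses;

    # Irregular increment: not expressible as a fixed-count for loop...
    $tries_used += $response eq 'fatal' ? 2 : 1;

    push @log, "$response ($tries_used used)";
}

print "$_\n" for @log;
# minor (1 used)
# fatal (3 used)
# minor (4 used)
```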

Loop Labels

Label every loop that is exited explicitly, and use the label with every next, last, or redo.

The next, last, and redo statements make it much easier to specify sophisticated flow of control in a readable manner. And that readability is further enhanced if the reader doesn't have to puzzle out which particular loop a given next, last, or redo is controlling.

The easiest way to accomplish that is to label every loop in which a next, last, or redo is used. Then use the same label on each next, last, and redo in that loop. The reader can then match up the name on the keyword against the labels on the surrounding loops to determine which loop's flow of control is being altered.

So you should write:

INPUT:
for my $try (1..$MAX_TRIES) {
    print 'Enter an integer: ';
    $int = <>;

    last INPUT if not defined $int;
    redo INPUT if $int eq "\n";

    chomp $int;
    last INPUT if $int =~ $INTEGER;
}

instead of:

for my $try (1..$MAX_TRIES) {
    print 'Enter an integer: ';
    $int = <>;

    last if not defined $int;
    redo if $int eq "\n";

    chomp $int;
    last if $int =~ $INTEGER;
}

Another, less obvious benefit of following this guideline is that the presence of the label at the start of any loop alerts the reader to the fact that the loop has embedded flow control.

Place the label on the line preceding the loop keyword, at the same level of indentation, and with an empty line (or a paragraph comment) above it. That way, the label helps the loop stand out, but leaves the actual loop keyword on the left margin, where it's easy to see.

When you're labeling a loop, choose a label that helps to document the purpose of the loop, and of the flow control statements. In particular, don't name loops LOOP:

LOOP:
for my $try (1..$MAX_TRIES) {
    print 'Enter an integer: ';
    $int = <>;

    last LOOP if not defined $int;
    redo LOOP if $int eq "\n";

    chomp $int;
    last LOOP if $int =~ $INTEGER;
}

That's as bad as naming a variable $var or calling a subroutine func().

Labeling loops is especially important for maintainability. A typical mistake is to initially write code that correctly exits from a loop like so:

while (my $client_ref = get_client()) {
    # Retrieve phone number...
    my $phone = $client_ref->{phone};

    # Skip client if "do not call" was requested...
    next if $phone =~ m/do \s+ not \s+ call/ixms;

    # Profit!
    send_sms_to($phone, $advert);
}

Later, a change of internal data structure may make it necessary to add an inner loop, at which point the flow of control can easily go awry:

while (my $client_ref = get_client()) {
    my $preferred_phone;

    # Retrieve phone number (clients can now have more than one)...
    for my $phone ( @{ $client_ref->{phones} } ) {
        # Skip client if "do not call" was requested...
        next if $phone =~ m/do \s+ not \s+ call/ixms;
        # Select phone number...
        $preferred_phone = $phone;
        last;
    }
    # Profit!
    send_sms_to($preferred_phone, $advert);
}

The problem here is that the intention was to give up trying to contact clients if any of their numbers is marked "Do Not Call". But moving the next if… inside a nested for loop means that the next no longer moves the loop on to the next client, just on to the next phone number of the current client. Apart from causing you to annoy clients who have specifically asked you not to, this error also introduces the possibility that $preferred_phone will be undefined when it's finally passed to send_sms_to().

In contrast, if your policy is to always label every loop that has a flow control statement inside it:

CLIENT:
while (my $client_ref = get_client()) {
    # Retrieve phone number...
    my $phone = $client_ref->{phone};

    # Skip client if "do not call" was requested...
    next CLIENT if $phone =~ m/do \s+ not \s+ call/ixms;

    # Profit!
    send_sms_to($phone, $advert);
}

then either the updated code will be automatically correct:

CLIENT:
while (my $client_ref = get_client()) {
    my $preferred_phone;

    # Retrieve phone number (clients can now have more than one)...
    PHONE_NUMBER:
    for my $phone ( @{ $client_ref->{phones} } ) {
        # Skip client if "do not call" was requested...
        next CLIENT if $phone =~ m/do \s+ not \s+ call/ixms;

        # Select phone number...
        $preferred_phone = $phone;
        last PHONE_NUMBER;
    }

    # Profit!
    send_sms_to($preferred_phone, $advert);
}

or the error will be obvious, as the comment and the target of the next will then be manifestly inconsistent:

# Skip client if "do not call" was requested...
next PHONE_NUMBER if $phone =~ m/do \s+ not \s+ call/ixms;


[25] See Chapter 13 for an explanation of the throw() method.

[26] Including yourself in six months' time…

[27] Or the Lexical::Alias module if you're running under a version of Perl earlier than 5.8.1.

[28] In Perl, the best way to pronounce $_ is usually "it".

[29] Of the 12 builtins listed, only abs, chroot, require, and -T default to operating on $_.

[30] Core in 5.8 and later. Available on the CPAN for earlier versions of Perl.

[31] map blocks rarely start out complex, but the complexity of any critical piece of code will tend to grow over time. This fall from grace seems to be caused not so much by the Second Law of Thermodynamics as by the First Law of Murphy.

[32] Did you notice the cascaded-if version has a bug in its third alternative? That branch doesn't assign to $salute; it assigns to $sa1ute.
