CHAPTER 6

Structure, Flow, and Control

In this chapter, we look at Perl's control structures, starting with the basic syntax of the language, expressions, and statements. We will then build these into more complex structures such as compound statements (also known as blocks) and conditional statements.

We will consider Perl's conditional statements and then move on to loops. We will look at the various statements that we can use to create loops and how to use them, in particular with lists, arrays, and hashes. We will also look into the modifiers provided by Perl to change the behavior of loops.

Declarations, Statements, Expressions, and Blocks

A Perl program consists of a mixture of statements, declarations, and comments. Statements are executed by the Perl interpreter at run time. Declarations, on the other hand, are directives that affect the way that the program is compiled. Therefore, a declaration's effect ends after compilation, whereas a statement will affect the program each time that section of code is run. The sub, my, and use keywords are the most obvious examples of declarations.

Statements are made up of expressions and can be combined into a block, or compound statement. An expression is any piece of Perl code that produces a value when it is evaluated; we saw plenty of these in the preceding chapters. For example, the number 3 is an expression because it produces 3 when evaluated. Operators combine expressions to form larger expressions: 3+6 is an expression that produces the number 9. In Perl, however, the distinction between statements and expressions is rather arbitrary. Most statements are simply expressions whose return values are discarded rather than used. The distinction between statements and declarations is also somewhat arbitrary. Like Lisp, Perl allows code to be run during the compilation phase, prior to execution. This blurs the line between compilation and execution, and between declarations and statements.

Blocks are simply aggregations of statements, delimited by curly braces. A block defines its own scope, so variables declared inside a block last only as long as that block is being executed. More technically, blocks create a new stack frame, and variables allocated within it last only as long as that frame exists, unless a reference to them is retained by something outside the block. In that case, the variable persists (via the reference) for as long as the reference does.
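The following short sketch illustrates the last point. The variable names are chosen only for the example: a reference taken inside a block keeps the block's lexical variable alive after the block has finished executing.

```perl
use strict;
use warnings;

my $ref;
{
    my $counter = 42;    # lexical variable, scoped to this block
    $ref = \$counter;    # a reference to it escapes the block
}
# $counter is out of scope here, but its value lives on through $ref
print "Still alive: $$ref\n";
```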

Declarations

Declarations take effect at compile time, rather than at run time. An important class of declaration is the inclusion of modules or source code with the use keyword. This includes both pragmatic modules like use integer and functional modules like use CGI. Another is declaring a subroutine ahead of its definition with sub. Other kinds of declaration include my and our statements and format definitions. The following are all examples of declarations:

sub mysubroutine ($);   # declare a subroutine with one scalar argument

my $scalar;   # declare a lexical variable (at compile-time)

# define a format for STDOUT
format =
@<<<< = @>>>>
$key, $value
.

use warnings;   # use pragmatic module

use strict;   # use pragmatic module

use CGI qw (:standard);   # use CGI module

BEGIN {
   print "This is a compile-time statement";
}

The BEGIN block is an interesting example, because it is the most obvious case of code being executed during the compilation rather than the execution of a program. The block behaves like a declaration, but the contents are statements. use also falls into this category; it is really a require statement and an import statement wrapped inside a BEGIN block that executes at compile time. So, arguably, use is not a declaration, but like BEGIN, a mechanism to execute code during the compile phase. Whether a BEGIN block is a declaration or not is a moot point, but it happens at the compile phase, so it is certainly not a regular block.

All of these examples demonstrate features of Perl covered later in the book, so for now we will just note their existence and move on. For the curious, subroutines are covered in Chapter 7; my and our in Chapter 8; use, require, and import in Chapter 9; and the BEGIN block (along with siblings END, CHECK, and INIT) is covered in more detail in Chapter 10. Formats can be found in Chapter 18.

Expressions and Simple Statements

An expression in Perl is any construct that returns a value (be it scalar, list, or even an undefined value). It can be anything from a literal number to a complex expression involving multiple operators, functions, and subroutine calls. That value can then be used in larger expressions or form part of a statement. Syntactically, statements are separated from each other by semicolons. The more important distinction between a statement and an expression is that a statement either does not return a value (for example, an if statement) or returns a value that is ignored (for example, a print statement).

This second kind of statement is really just an expression in disguise—Perl will detect that we don't use the return value and optimize the code accordingly, but we could still change our mind and use it if we choose to. In Perl terminology, these expressions are evaluated in void context. For example, the statement $b = $a is also an expression, which returns the value of the assignment ($a). We can see this in action by considering the following two statements:

$b = $a;        # $b = $a is a statement
$c = $b = $a;   # $b = $a is now an expression, whose value ($a) is assigned to $c

Another way of looking at this is to say that a statement is an expression that is executed primarily because it performs a useful action rather than returns a result. A print statement is probably the most common example:

print "Hello World";

This is a statement by virtue of the fact that the return value from print is not used. Indeed, it is easy to forget that print returns a value at all: it returns a true value if it succeeds and a false value if, for example, the filehandle to which it is printing is invalid. We rarely bother to check for this when printing to standard output, but if we wanted to, we could turn this statement into an expression by writing

$success = print "Hello World";

Now we have a print expression whose value is used in an assignment statement. For standard output this is usually not necessary, but if we were printing to a filehandle, and especially to one that could become invalid outside our control, like a network socket, this becomes more important.
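As a minimal sketch of checking print's return value, the following example prints to an in-memory filehandle (a filehandle opened on a scalar, which assumes Perl 5.8 or later) and then tries to print again after the handle has been closed. The buffer and handle names are illustrative only.

```perl
use strict;
use warnings;

# Write into an in-memory scalar so the example needs no real file
my $buffer = '';
open(my $fh, '>', \$buffer) or die "Cannot open in-memory handle: $!";

my $first = print $fh "Hello World";   # print used as an expression
close($fh);

my $second;
{
    no warnings 'closed';              # silence "print() on closed filehandle"
    $second = print $fh "Goodbye";     # fails: the handle is now closed
}

print "first print ",  ($first  ? "succeeded" : "failed"), "\n";
print "second print ", ($second ? "succeeded" : "failed"), "\n";
```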

Blocks and Compound Statements

The block is a Perl construct that allows several statements (which in turn may be simple statements or further blocks) to be logically grouped together into a compound statement and executed as a unit. Blocks are defined by enclosing the statements within curly braces with the final statement optionally terminated by a semicolon. The general form of a block is therefore

{ STATEMENT; STATEMENT; ... ; STATEMENT[;] }

This is the most obvious form of block. Less obvious is that a block is also created by the limits of a source file. A simple Perl script is a block that starts at the top of the file and ends at the bottom (or a __DATA__ or __END__ token, if one is present in the file; see Chapter 12 for more on these). Likewise, an included file that has been read in using require also defines a block corresponding to the included file.

The definition of a block is important because in addition to grouping statements together logically, a block also defines a new scope in which variables can be declared and used. Following are two short programs, one of which executes the other via require. Both files define their own scope, and in addition the explicit block in the second program, child.pl, defines a third. Here is the parent program:

#!/usr/bin/perl
# parent.pl
use warnings;
use strict;

my $text = "This is the parent";
require 'child.pl';
print "$text\n";   # produces "This is the parent"

and here is the child program:

#!/usr/bin/perl
# child.pl
use warnings;
use strict;

my $text = "This is the child";
{
   my $text = "This is block scoped";
   print "$text\n";   # produces "This is block scoped"
}
print "$text\n";   # produces "This is the child"

Variables that are defined within a particular scope exist only as long as that block is being executed and are not visible or usable in the wider scope outside the block. This has significant implications, as we will see.

Blocks in Perl Statements

Almost all of Perl's control structures (such as if statements, for and while loops, and subroutine declarations) can accept a block in their definitions, and many require it. For example, the if statement requires a block as the action that follows its condition; a simple statement will not do, and using one causes a syntax error.

if (EXPRESSION) { STATEMENT; STATEMENT; ... STATEMENT[;] }

Or, put more simply:

if (EXPRESSION) BLOCK

Note that a block is not the equivalent of a statement. As we just saw, blocks are accepted in places where simple statements are not. Also, unlike the statements inside it, a block does not require a terminating semicolon after its closing brace. Significantly, in some contexts blocks can also return a value.

Naked Blocks

Although it is their most common application, blocks do not have to belong to a larger statement. They can exist entirely on their own, purely for the purposes of defining a scope. The following example shows a block in which several scalar variables are defined using my. The variables exist for the lifetime of the block's execution and then cease to exist.

#!/usr/bin/perl
# time.pl
use warnings;

# a bare block definition
{
   # define six scalars in new block scope:
   my ($sec, $min, $hour, $day, $month, $year) = localtime();
   # variables exist and can be used inside block
   print "The time is: $hour: $min. $sec\n";
   $month++;
   $year += 1900;
   print "The date is: $year/ $month/ $day\n";
   # end of block - variable definitions cease to exist
}

# produces 'uninitialized value' warning - $sec does not exist here
print "$sec seconds\n";

The output from this is

Name "main::sec" used only once: possible typo at time.pl line 18.
The time is: 2: 30. 5
The date is: 2000/ 12/ 15
Use of uninitialized value in concatenation (.) at time.pl line 18.
 seconds

Note that adding use strict would turn the preceding warning into a compile-time error, because strict vars requires all variables to be declared (or fully qualified with their package name).

If we prefix a bare block with the sub keyword, it instead defines an anonymous subroutine and returns a reference to it, a subject we will cover in Chapter 7.

Defining the Main Program As a Block

An interesting use of blocks is to put the main program code into a block within the source file. This helps to distinguish the actual program from any declarations or initialization code (in the shape of use statements and so forth) that precede it. It also allows us to restrict variables needed by the main program to the scope of the main program only, rather than turning them into global variables, which should be avoided. Consider the following simple but illustrative program:

#!/usr/bin/perl
# blockmain.pl

# Declarations First
use strict;
use warnings;

# Initialization code, global scope

my $global_variable = "All the World can see Me";
use constant MY_GLOBAL_CONSTANT => "Global Constant";

# Here is the main program code

MAIN: {
   # variable defined in the main program scope, but not global
   my $main_variable = "Not visible outside main block";
   print_variables ($main_variable);
}

# No one here but us subroutines...

sub print_variables {
   print $global_variable, " ", MY_GLOBAL_CONSTANT, "\n";
   # print $main_variable, "\n";   # error!
   print $_[0], "\n";   # passed from main block, ok now
}

We have used the label MAIN: to mark the start of the main program block and make it stand out. The choice of MAIN: is entirely arbitrary; we could just as easily have said MY_PROGRAM_STARTS_NOW:. However, MAIN: is friendlier to those coming from a C programming background, where a main function is required. We could also create a real main subroutine instead, but then we need to remember to call it.

The issue of scoping variables so they are invisible from subroutines is not a minor one. If we had failed to enable warnings and strict mode, and if we had uncommented the second line of print_variables, Perl would have happily accepted the undefined variable $main_variable and printed out a blank line. By placing otherwise global variables inside the scope of a main block, we prevent them from being accidentally referred to inside subroutines, which should not be able to see them.
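For comparison, here is a sketch of the "real main subroutine" alternative. The names main and helper are chosen for illustration: main's lexical variables are as private as those in a MAIN: block, but nothing happens until the subroutine is explicitly called.

```perl
use strict;
use warnings;

use constant MY_GLOBAL_CONSTANT => "Global Constant";

# A real main subroutine: its lexicals are private to it, as in MAIN: {...}
sub main {
    my $main_variable = "Not visible outside main";
    return helper($main_variable);
}

# Hypothetical helper; it sees only what main passes to it
sub helper {
    my ($value) = @_;
    return join(" ", MY_GLOBAL_CONSTANT, $value);
}

# Unlike a MAIN: block, the subroutine does nothing until we call it
my $output = main();
print "$output\n";
```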

Blocks As Loops

Bare blocks can sometimes be treated as loops, which are discussed in detail later in the chapter. A block that is neither syntactically required (for example, by an if statement) nor part of a loop statement is treated as a loop that executes only once and then exits. This means that loop control statements like next, last, and redo will work in such a block. Because a bare block is a one-shot loop, next and last have effectively the same result: both leave the block. However, redo will re-execute the block.

In short, these three loops all do the same thing, one with a while, one with a foreach, and one with a bare block and a redo:

#!/usr/bin/perl
# while.pl
use warnings;
use strict;

my $n = 0;

print "With a while loop:\n";
while (++$n < 4) { print "Hello $n\n"; }

print "With a foreach loop:\n";
foreach my $n (1..3) { print "Hello $n\n"; }

print "With a bare block and redo:\n";
$n = 1;
{
   print "Hello $n\n";
   last if (++$n > 3);
   redo;
}
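The loops above use redo to repeat a bare block. The complementary use of last is as an early exit. The following sketch (classify is a hypothetical helper written for this illustration) relies on the fact that if blocks are not loops, so a last inside them leaves the enclosing bare block.

```perl
use strict;
use warnings;

# Hypothetical helper: labels a number using a bare block as a one-shot loop
sub classify {
    my ($n) = @_;
    my $label = 'positive';
    {
        # "last" skips the if blocks and leaves this bare block
        if ($n == 0) { $label = 'zero';     last; }
        if ($n < 0)  { $label = 'negative'; last; }
        # reached only when $n is greater than zero
    }
    return $label;
}

print "$_ is ", classify($_), "\n" for (-2, 0, 3);
```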

The block of an if statement is required syntactically, and if is not a loop statement, so the redo statement here will not work:

#!/usr/bin/perl
# badblockloop.pl
use warnings;
use strict;

if (defined(my $line = <>)) {
   last if $line =~ /quit/;
   print "You entered: $line";
   $line = <>;
   redo;
}
print "Bye!\n";

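The fact that redo, next, and last do not work in if blocks is actually a blessing. If they did, a last inside an if block would leave the if rather than the enclosing loop, which would make it hard, albeit not impossible, to break out of a loop conditionally. Instead, the program stops with a run-time error as soon as the redo is reached: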

Can't "redo" outside a loop block at ./badblockloop.pl line 10, <> line 2.

However, we can nest blocks inside each other, so by adding an extra bare block we can fix the preceding program so that it will work.

#!/usr/bin/perl
# blockloop.pl
use warnings;
use strict;

if (defined(my $line = <>)) { { # <- note the extra block
   last if $line =~ /quit/;
   print "You entered: $line";
   $line = <>;
   last unless defined $line;   # stop at end of input
   redo;
} }
print "Bye!\n";

Using blocks as loops is an interesting approach to solving problems, but it is not always the simplest or the easiest to understand. The preceding script could more easily be fixed by replacing the if with a while. This makes more sense and does not require an extra block because while is a looping statement:

#!/usr/bin/perl
# blockwhile.pl
use warnings;
use strict;

while (my $line = <>) {
   last if $line =~ /quit/;
   print "You entered: $line";
}
print "Bye!\n";

We cover loops in more detail later in the chapter.

The do Block

Blocks do not normally return a value; they are compound statements, not expressions. They also provide a void context, which applies to the last statement in the block. This causes its value to be discarded, just as all the statements before it are. However, the do keyword allows blocks to return their values as if they were expressions, the value being derived from the last statement. Let's consider an example:

@words = do {
    @text = ("is", "he", "last");
    sort @text;
};

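In this example, a list is generated and returned by the sort function, and it becomes the value of the do block. Note that we cannot make this more explicit by adding a return, as we might in a subroutine: inside a do block, return exits from the enclosing subroutine (and is an error outside one), so the value of a do block is always the value of its last statement. (In subroutines, return is optional too, but it certainly adds clarity.)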

Because prefixing do to a block turns it into an expression, it often needs to be followed by a semicolon when used as the final part of a statement. Omitting the final semicolon from statements like the preceding one is a common mistake, because in any other context a block does not require a following semicolon.
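A common practical use of a do block is to compute a value inside a temporary scope. The following sketch reads an entire input in one go by localizing the input record separator $/ inside the block; an in-memory filehandle stands in for a real file, and the data is made up for the example.

```perl
use strict;
use warnings;

my $data = "line one\nline two\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory handle: $!";

# The value of the do block is the value of its last statement
my $slurped = do {
    local $/;    # undefine the record separator for this block only
    <$fh>;       # so one read returns everything that is left
};               # note the semicolon: do BLOCK is part of an expression
close($fh);

print length($slurped), " characters read\n";
```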

There is another, syntactic, reason for needing a do to return the value of a block. Without do, Perl would have a hard time telling a bare block apart from an anonymous hash constructor.

$c = do { $a = 3, $b = 6 };   # a block, $c = 6
{ $a = 3; $b = 6 }   # has a semicolon, therefore a block
# an anonymous hash: $c is a reference to {3 => 6}; test with 'print keys %{$c}'
$c = { $a = 3, $b = 6 };

Regarding the use of blocks as loops, do blocks are not considered loops by Perl, because the block is syntactically required by the do. Loop-control statements will therefore not work inside a do block. However, a do block can be suffixed with a loop condition such as while or until, in which case the block is executed repeatedly. Even then, Perl does not treat it as a true loop, so next, last, and redo still do not work inside it.

do { chomp($line = <>); $input .= $line } until $line =~ /^stop/;

The block is executed before the condition is tested, so in this example the word stop will be added to the end of $input before the loop terminates.
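The following sketch replays that loop with a hypothetical array of input lines in place of <>, so it can run without a terminal. It also shows one way to leave a do-until loop early: since the do block is not a loop, last needs an enclosing labeled bare block to exit from.

```perl
use strict;
use warnings;

# Hypothetical input lines, standing in for <> so the sketch runs unattended
my @pending = ("alpha ", "beta ", "stop here", "never read");
my $input = '';
my $line;

do {
    $line = shift @pending;
    $input .= $line;
} until $line =~ /^stop/;    # tested after each pass, so "stop here" is kept

# do-until is not a true loop, so "last" needs an enclosing bare block
my $count = 0;
OUTER: {
    do {
        $count++;
        last OUTER if $count == 3;   # leaves the labeled bare block
    } until $count > 10;
}

print "input: $input\n";
print "count: $count\n";
```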

BEGIN and END Blocks

BEGIN and END blocks are special blocks that are executed outside the normal order of execution. We can use them in any application, though they are mostly used in modules, and accordingly we cover them in detail in Chapter 10. They are worth a brief examination here, because apart from the circumstances of their execution, they have a lot in common with regular bare blocks.

BEGIN blocks are executed during the compilation phase as they are encountered by the interpreter, so their contents are compiled and run before the rest of the source code is even compiled. We can define multiple BEGIN blocks, which are executed in the order the interpreter encounters them. This is especially relevant when we consider that the use statement uses an implicit BEGIN block to allow modules to export definitions before the main program code is compiled.

END blocks are the inverse of BEGIN blocks; they are executed by the interpreter after the application exits and before the exit of the interpreter itself. They are useful for "cleanup" duties such as closing database connections, resetting terminal properties, or deleting temporary files. We can also define multiple END blocks, in which case they are executed in reverse order of definition.

The following is a short script that shows both BEGIN and END blocks in action:

#!/usr/bin/perl
# begend.pl
use warnings;
use strict;

END {
    print "Exiting...\n";
}

print "Running!\n";

fun();

sub fun {
    print "Inside fun\n";
}

BEGIN {
    print "Compiling...\n";
    # can't call 'fun' - not compiled yet
    # fun();
}

When run, this program prints out the following:

> perl begend.pl
Compiling...
Running!
Inside fun
Exiting...

As the output shows, the BEGIN block was executed first. Since the fun subroutine has not yet been compiled when the BEGIN block is executed, attempting to call fun from inside the BEGIN block would cause an error. On the other hand, the END block, which is defined first in the program, is executed last.
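When there are several BEGIN or END blocks, their ordering rules can be seen in a small sketch like the following. The @order array is just an illustration device: the BEGIN blocks record their execution order at compile time, before the ordinary push runs, and the END blocks print their messages at exit in reverse order of definition.

```perl
use strict;
use warnings;

our @order;                         # shared by the blocks below

push @order, "run";                 # ordinary statement: runs at run time

BEGIN { push @order, "BEGIN 1" }    # runs during compilation
BEGIN { push @order, "BEGIN 2" }    # runs next, also during compilation

END { print "END 1 (defined first, runs last)\n" }
END { print "END 2 (defined second, runs first)\n" }

print join(", ", @order), "\n";     # prints "BEGIN 1, BEGIN 2, run"
```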

Perl defines several special blocks, though only BEGIN and END are in widespread use. Two others are CHECK and INIT, which take place just after the compile phase and just before the run phase, respectively. Though these are rarely used in practice, we cover them in Chapter 10 also. A related special case is DESTROY, which is strictly a specially named subroutine rather than a block; it is called when an object is destroyed and is used in the object-oriented modules covered in Chapter 19.

Conditional Statements

Conditional statements execute the body of the statement (sometimes known as a branch or branch of execution) only if a given Boolean condition is met. The condition is an expression whose value is used to determine the course of execution. Perl's primary mechanism for conditional execution is the if statement and its related keywords, unless, else, and elsif. However, Perl being as flexible as it is, there are other ways we can write conditions too.

Multiple-branch conditions are implemented in other languages using special multiple-branch conditional statements like switch or case. Perl has no such equivalent, because it does not need one. As we will see, there are already plenty of ways to write a multiple-branch condition in Perl. However, for those who really must have a dedicated switch statement, Perl provides the Switch module.

Before embarking on a detailed look at these statements, it is worth taking a brief diversion to discuss the nature of truth in Perl.

What Is Truth?

Perl has a very broad-minded view of the meaning of true and false—in general, anything that has a "non-zero" value is true. Anything else is false. By "non-zero" we mean that the value is in some sense "set." Even 0 is information of a sort, especially compared to undef.

There are a few special cases. The string "0" is considered false even though it has a value, as a convenience to calculations that are performed in string context. The empty string and the undefined value also evaluate to false for the purposes of conditions. However, a string of spaces is true, as is the string "00". The examples in Table 6-1 illustrate various forms of truth and falsehood.

Table 6-1. True and False Values

Value           True/False
1               True
-1              True
"abc"           True
0               False
"0"             False
""              False
" "             True
"00"            True
"0E0"           True (returned by some Perl libraries)
"0 but true"    True (ditto)
()              False (empty list)
undef           False
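To check entries of Table 6-1 directly, a small script can run each value through a Boolean test. This is a sketch; the labels are only there to make the printed output readable.

```perl
use strict;
use warnings;

# Pairs of [label, value] taken from Table 6-1
my @samples = (
    [ '1', 1 ],      [ '-1', -1 ],     [ '"abc"', "abc" ],
    [ '0', 0 ],      [ '"0"', "0" ],   [ '""', "" ],
    [ '" "', " " ],  [ '"00"', "00" ], [ '"0E0"', "0E0" ],
    [ '"0 but true"', "0 but true" ],  [ 'undef', undef ],
);

my %truth;
for my $sample (@samples) {
    my ($label, $value) = @$sample;
    $truth{$label} = $value ? 'True' : 'False';   # Boolean test of the value
    printf "%-14s %s\n", $label, $truth{$label};
}

# An empty array in Boolean context is false, like the empty list
my @empty = ();
$truth{'()'} = @empty ? 'True' : 'False';
printf "%-14s %s\n", '()', $truth{'()'};
```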

To distinguish between the undefined value and other values that evaluate to false, we can use the defined function; for instance:

if (defined $var) {
    print "$var is defined";
}

The ability to handle undef as distinct from true and false is very useful. For example, it allows functions and subroutines that return a Boolean result to indicate an "error" by returning undef. If we want to handle the error, we can do so by checking for undef. If we do not care or need to know, we can just check for truth instead.

if (defined($var) && $var) {
    print "true\n";
}
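To make the undef-as-error convention concrete, here is a sketch of a function that returns 1 or 0 as its Boolean answer and undef when it cannot answer. The is_even function is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical function: 1 or 0 as a Boolean answer, undef on error
sub is_even {
    my ($n) = @_;
    return undef unless defined $n && $n =~ /^-?\d+$/;   # error: not an integer
    return $n % 2 == 0 ? 1 : 0;
}

for my $arg (4, 7, "seven") {
    my $answer = is_even($arg);
    if (!defined $answer) {          # we care about the error case
        print "$arg: error, not an integer\n";
    } elsif ($answer) {              # otherwise just test for truth
        print "$arg: even\n";
    } else {
        print "$arg: odd\n";
    }
}
```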

if, else, and elsif

As we have already seen, basic conditions can be written using an if statement. The basic form of an if statement is as follows (note that a trailing semicolon is not required):

if (EXPRESSION) BLOCK

Here EXPRESSION is any Perl expression, and BLOCK is a compound statement: one or more Perl statements enclosed by curly braces. BLOCK is executed only if EXPRESSION is true. For instance, in the preceding example, the block that contains the print statement would be executed only if the expression (defined($var) && $var) evaluates to true, that is, only if $var is defined and true.

We can also invert the syntax of an if statement and put the condition after the code it controls. In this case, we can omit the parentheses around the condition, but the controlled code must be a simple statement (or a list of statements joined by commas) rather than a bare block; to control a block this way, we prefix it with do. The following forms of if statement are all legal in Perl:

do BLOCK if EXPRESSION;
STATEMENT if EXPRESSION;
STATEMENT, STATEMENT, STATEMENT if EXPRESSION;

For example:

print "Equal" if $a eq $b;

print (STDERR "Illegal Value"), return "Error" if $not_valid;

close FILE, print ("Done"), exit if $no_more_lines;

return if $a ne $b;

The use of the comma operator here deserves a little attention. In list context, the comma operator generates lists. However, that is not how it is being used here. In scalar or void context, it evaluates its left-hand operand, discards the result, and then evaluates and returns its right-hand operand. This makes it a handy way to combine several statements into one, and relies on the fact that most statements are also expressions.
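A minimal sketch of the two behaviors of the comma operator follows; the push into @log exists only to show that the left-hand operand really is evaluated even though its value is discarded.

```perl
use strict;
use warnings;

my @log;

# Scalar context: the left operand is evaluated and its value discarded;
# the right operand becomes the value of the whole expression
my $value = (push(@log, "left side ran"), "right side");

# List context: the same operator builds a list
my @pair = ("first", "second");

print "value: $value\n";
print "log: @log\n";
print "pair has ", scalar(@pair), " elements\n";
```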

The inverted syntax is more suitable for some conditions than others. As Perl's motto has it, there is more than one way to do it (so long as the program remains legible). In the preceding examples, only the last return statement is really suited to this style; the others would probably be better off as normal if statements.

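Beware of declaring a variable in an inverted conditional statement. The declaration takes effect at compile time, but the assignment happens only if the condition succeeds, so the variable may be left undefined. (Perl's own documentation goes further and states that the behavior of a my declaration modified by a condition is undefined.) This can lead to unexpected warnings if we have warnings enabled and unexpected bugs otherwise.

use warnings;

my $arg = $ARGV[0] if @ARGV;
if ($arg eq "help" ) {   # $arg may be undefined here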
   print "Usage: ";
   ...
}
...

We would be unlikely to leave $arg undefined if we had written a conventional if statement because the declaration would be inside the block, making it obvious that the scope of the variable is limited. However, the inverted syntax can fool us into thinking that it is a declaration with wider scope.
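One way to avoid the problem, sketched below with a hypothetical choose_arg helper and an assumed default of "help", is to declare the variable unconditionally and let only its value depend on the condition.

```perl
use strict;
use warnings;

# Hypothetical helper: the variable is always declared and always
# initialized; only its value depends on the condition
sub choose_arg {
    my @args = @_;
    my $arg = @args ? $args[0] : "help";   # "help" is an assumed default
    return $arg;
}

my $arg = choose_arg(@ARGV);
if ($arg eq "help") {
    print "Usage: ...\n";
}
```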

If-then-else conditions are implemented with the else keyword:

if (EXPRESSION) BLOCK else BLOCK

For example:

# First 'if else' tests whether $var is defined
if (defined $var) {
    # If $var is defined, the second 'if else' tests whether $var is true
    if ($var) {
        print "true\n";
    } else {
        print "false\n";
    }
} else {
    print "undefined\n";
}

However, it is not legal (and not elegant, even if it were) to invert an if statement and then add an else clause.

# ERROR!
return if $not_valid else { print "ok" };

If we have multiple mutually exclusive conditions, then we can chain them together using the elsif keyword, which may occur more than once and may or may not be followed by an else.

if (EXPRESSION) BLOCK elsif (EXPRESSION) BLOCK elsif...
if (EXPRESSION) BLOCK elsif (EXPRESSION) BLOCK else BLOCK

For example, to compare strings using just if and else we might write

if ($a eq $b) {
    print "Equal";
} else {
    if ($a gt $b) {
        print "Greater";
    } else {
        print "Less";
    }
}

The equivalent code written using elsif is simpler to understand, shorter, and avoids a second level of nesting:

if ($a eq $b) {
    print "Equal";
} elsif ($a gt $b) {
    print "Greater";
} else {
    print "Less";
}

Note that the else if construct, while legal in other languages such as C, is not legal in Perl and will cause a syntax error. In Perl, use elsif instead. Also note that if $a is less than $b most of the time, then we would be better off rewriting this statement to test $a lt $b first, then $a gt $b or $a eq $b second. It pays to work out the most likely eventuality and then make that the fastest route through our code.

If the conditions are all testing the same expression with different values, then there are more efficient ways to do this. See "Switches and Multibranched Conditions" later in the chapter for some examples.

The if, unless, and elsif keywords all permit a variable to be declared in their conditions. For example:

if (my @lines = <HANDLE>) {   # true if anything was read from HANDLE
    ...do something to file contents...
} else {
    print "Nothing to process\n";
}

The scope of a variable declared in this fashion extends from its declaration to the end of the whole conditional statement, including any elsif and else clauses, but no further. Here @lines can therefore be used in both the if and else clauses, but not after the end of the statement.
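The following sketch, using a hypothetical describe function, shows a variable declared in an if condition remaining visible in the elsif and else clauses of the same statement.

```perl
use strict;
use warnings;

# Hypothetical function for illustration
sub describe {
    my @items = @_;
    if ((my $count = @items) > 1) {    # $count declared in the condition
        return "$count items";
    } elsif ($count == 1) {            # still in scope here...
        return "one item";
    } else {
        return "no items ($count)";    # ...and here
    }
}

print describe(qw(a b c)), "\n";
```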

unless

If we replace the if in an if statement with unless, the condition is inverted. This is handy for testing a condition that we want to act on if it evaluates to false, such as trapping error conditions.

# unless the file $filename is successfully opened, return a failure message
unless (open FILE, $filename) {
    return "Failed to open $filename: $!";
}

We can also invert the syntax of an unless statement, just as we can with if.

return "Failed to open $filename: $!" unless (open FILE, $filename);

This is exactly the same as inverting the condition inside the parentheses but reads a little better than using an if and not:

if (not open FILE, $filename) {
    return "Failed to open $filename: $!";
}

It is perfectly legal, though possibly a little confusing, to combine unless with an else or elsif as in the following:

unless (open FILE, $filename) {
    return "Failed to open $filename: $!";
} else {
    @lines = <FILE>;
    foreach (0..$#lines)  {
        print "This is a line\n";
    }
    close FILE;
}

In this case, it is probably better to write an if-not expression or to invert the clauses, since unless-else is not a natural English construct.

Writing Conditions with Logical Operators

Perl's logical operators automatically short-circuit to avoid doing unnecessary work whenever possible, a feature Perl shares with most languages derived from or inspired by C (see Chapter 4). Take the following example:

$result = try_first() || try_second() || try_third();

If try_first returns a true value, then Perl has no need to even call the try_second or try_third functions, since their results cannot change the outcome: the value of the whole expression, and therefore of $result, is the value returned by try_first. So Perl takes a shortcut and does not call them at all. (We use || rather than or here because or has lower precedence than the assignment, a point we return to shortly.)
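The short-circuit behavior can be observed with stub versions of the three functions that record when they are called. The function bodies here are placeholders invented for the illustration.

```perl
use strict;
use warnings;

my @called;   # records which functions actually ran

# Placeholder implementations for illustration only
sub try_first  { push @called, 'first';  return 0 }        # fails
sub try_second { push @called, 'second'; return 'ok-2' }   # succeeds
sub try_third  { push @called, 'third';  return 'ok-3' }   # never reached

# || binds more tightly than =, so $result gets the first true value
my $result = try_first() || try_second() || try_third();

print "result: $result; called: @called\n";
```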

We can use this feature to write conditional statements using logical operators instead of if and unless. For example, a very common construct for exiting a program on a fatal error uses the die function, which prints out an error message and ends the program:

open (FILE, $filename) or die "Cannot open file: $!";

This is equivalent to, but more direct than, the more conventional

unless (open FILE, $filename) {
   die "Cannot open file: $!";
}

We can also provide a list of statements (separated by commas) or a do block for the condition to execute on failure. Here is an example that supplies a list:

open (FILE, $filename) or
    print (LOG "$filename failed: $!"), die "Cannot open file:$!";

Not every programmer likes using commas to separate statements, so we can instead use a do block. This also avoids the need to use parentheses to delineate the arguments to print.

open (FILE, $filename) or do {
   print LOG "$filename failed: $!";
   die "Cannot open file: $!";
};

When writing conditions with logical operators, it is good practice to use the low-precedence and, or, and not operators, instead of the higher-precedence &&, ||, and !. This prevents precedence from changing the meaning of our condition. If we were to change the previous example to

# ERROR! This statement ...
open (FILE, $filename)
    || print (LOG "$filename failed: $!"), die "Cannot open file:$!";

Perl's precedence rules would cause it to interpret this as a list containing a condition and a die statement.

# ERROR! ... actually means this
(open (FILE, $filename) || print (LOG "$filename failed: $!")),
    die "Cannot open file:$!";

As a result, this statement will always cause the program to die with a "Cannot open file" message, regardless of whether the open failed or succeeded.
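The precedence difference between or and || is easy to see in isolation. In the following sketch, which uses two trivial helper functions made up for the example, the same expression assigns different values depending on which operator is used.

```perl
use strict;
use warnings;

sub nothing { return 0 }   # a false result
sub backup  { return 5 }   # a true fallback

my ($low, $high);
$low  = nothing() or backup();   # parsed as ($low = nothing()) or backup()
$high = nothing() || backup();   # parsed as $high = (nothing() || backup())

print "with or: $low, with ||: $high\n";
```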

Using a do block avoids all these problems and also makes the code easier to comprehend. Either a || or an or will work fine in this rewritten example:

open (FILE, $filename) || do {
   print LOG "$filename failed: $!";
   die "Cannot open file:$!";
};

Whether this is better than the original if form is questionable, but it does emphasize the condition in cases where the condition is actually the point of the exercise. In this case, the open is the most significant thing happening in this statement, so writing the condition in this way helps to emphasize and draw attention to it.

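The drawback of these kinds of conditions is that they do not lend themselves easily to else-type clauses. The following is legal, but tends toward illegibility:

open (FILE, $filename) and $text = <FILE> or die "Cannot open file: $!";

It would also fail if we used || instead of or: || has higher precedence than the assignment, so Perl would read the statement as open (FILE, $filename) and $text = (<FILE> || die ...), testing the result of reading from FILE and not the open. Note too that even the version using or dies if the first line read is false (for example, if the file is empty), not only when the open fails.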

The Ternary Operator

The ternary operator, ?:, is a variant of the standard if-style conditional statement that works as an expression and returns a value that can be assigned or used in other expressions. It works identically to the ternary operator in C. The operator evaluates the first expression: if that expression is true, it returns the value of the second expression; and if the first expression was false, then the operator returns the value of the third expression. This is what it looks like:

result = expression1 ? expression2 : expression3

The ternary operator is very convenient when the purpose of a test is to return one of two values rather than follow one of two paths of execution. For example, the following code snippet adds a plural s conditionally, using a conventional if-else condition:

#!/usr/bin/perl
# plural_if.pl
use warnings;
use strict;

my @words = split (/\s+/, <>);   # read some text and split on whitespace

my $count = scalar (@words);

print "There ";
if ($count == 1) {
   print "is";
} else {
   print "are";
}
print " $count word";

unless ($count == 1) {
   print "s";
}

print " in the text\n";

Running this program and entering some text produces messages like

There are 0 words in the text
There is 1 word in the text
There are 4 words in the text

The same code rewritten using the ternary operator is considerably simpler.

#!/usr/bin/perl
# plural_ternary.pl
use warnings;
use strict;
my @words = split (/\s+/, <>);   # read some text and split on whitespace
my $words = scalar (@words);

print "There ", ($words == 1) ? "is" : "are", " $words word",
    ($words == 1) ? "" : "s", " in the text\n";

We can also nest ternary operators, though doing this more than once can produce code that is hard to read. The following example uses two ternary operators to compute a value based on a string comparison using cmp, which can return −1, 0, or 1:

#!/usr/bin/perl
# comparison.pl
use warnings;
use strict;

my @words = split (/\s+/, <>);
die "Enter two words\n" unless scalar(@words) == 2;

my $result = $words[0] cmp $words[1];
print "The first word is ", $result ? $result > 0 ? "greater than" :
    "less than" : "equal to", " the second\n";

This program checks that we have entered exactly two words, and if so it prints out one of the following three messages:

The first word is less than the second
The first word is greater than the second
The first word is equal to the second

Perl knows which ? belongs with which :, but the result is not legible code. To improve upon this, the last line is probably better written with parentheses:

print "The first word is ", $result
    ? ($result > 0 ? "greater than" : "less than")
    : "equal to", " the second\n";

This makes it much simpler to see which expression belongs to which condition.
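Wrapping the comparison in a subroutine makes the parenthesized nested ternary easy to exercise without reading standard input. The subroutine name describe is an illustrative assumption; the logic is the same cmp-based test as comparison.pl, with the pieces joined by . and each ternary parenthesized.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Describe how two words compare, using parenthesized nested ternaries
sub describe {
    my ($first, $second) = @_;
    my $result = $first cmp $second;      # -1, 0, or 1
    return "The first word is "
         . ($result
              ? ($result > 0 ? "greater than" : "less than")
              : "equal to")
         . " the second";
}

print describe("apple", "banana"), "\n";
print describe("pear",  "apple"),  "\n";
print describe("fig",   "fig"),    "\n";
```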

Be careful when combining the ternary operator into larger expressions. The precedence of operators can sometimes cause Perl to group the parts of an expression in ways we did not intend, as in the following example:

#!/usr/bin/perl
# plural_message.pl
use warnings;
use strict;

my @words = split (/\s+/, <>);
my $words = scalar (@words);
#ERROR!
my $message = "There ". ($words==1) ? "is" :
    "are". " $words word".
    ($words == 1)?"" : "s". " in the text\n";

print $message;

This appears to do much the same as the previous example, except that it stores the resulting message in an intermediate variable before printing it. But, unlike the comma operator, the concatenation operator . has higher precedence than the ternary ?:, so the meaning of the statement is entirely changed. With explicit parentheses, the print statement in plural_ternary.pl is equivalent to

"There ", (($words==1)? "is" : "are"), " $words word",
    (($words == 1)?"" : "s"), " in the text\n";

But with the concatenation operator, what we actually get is

("There ". ($words==1)) ? "is"
    : (("are". " $words word". ($words == 1)) ? "" : ("s". " in the text\n"));

The expression ("There ". ($words==1)) always evaluates to a true value, so the result of running this program will always be to print the word "is" regardless of the input we give it.
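The fix is to parenthesize each ternary expression so that concatenation cannot reach into it. The following sketch rebuilds the message from plural_message.pl that way; word_message is a hypothetical name, and the word count is passed in rather than read from standard input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the message with . but parenthesize each ternary so that
# concatenation cannot capture the condition or the branches
sub word_message {
    my $words = shift;
    return "There " . ($words == 1 ? "is" : "are")
         . " $words word" . ($words == 1 ? "" : "s")
         . " in the text";
}

print word_message($_), "\n" for 0, 1, 4;
```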

One final trick that we can perform with the ternary operator is to use it with expressions that return an lvalue (that is, an assignable value). An example of such an expression is the substr function.

#!/usr/bin/perl
# fix.pl
use warnings;
use strict;
my $word = "mit";
my $fix = "re";
my $before = do { no warnings; int(<>) };   # no warnings in case we enter nonnumeric text or nothing

($before ? substr($word, 0, 0): substr ($word, length($word), 0)) = $fix;
print $word, "\n";

In this program the contents of $fix are either prefixed or postfixed to the contents of the variable $word. The ternary operator evaluates to either the beginning or the end of the value in $word as returned from substr. This value is then assigned the value of $fix, modifying the contents of $word, which is then printed out.

The result of this program is either the word remit, if we enter a nonzero number (such as 1), or mitre, if we enter nothing, 0, or nonnumeric text, all of which int converts to zero.
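For experimenting without standard input, the same lvalue trick can be packaged in a small subroutine. This is a sketch: attach is a hypothetical name, and the choice between prefix and suffix is passed in as an argument instead of being read with <>.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Attach $fix before or after $word by assigning to one of two
# zero-length substr() lvalues chosen by the ternary operator
sub attach {
    my ($word, $fix, $before) = @_;
    ($before ? substr($word, 0, 0) : substr($word, length($word), 0)) = $fix;
    return $word;
}

print attach("mit", "re", 1), "\n";   # prefix
print attach("mit", "re", 0), "\n";   # suffix
```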

Switches and Multibranched Conditions

A switch is a conditional statement that contains multiple branches of execution. It can be thought of as a rotary switch with several different positions. A simple but crude way to implement a switch is with an if...elsif...else statement, as we have already seen.

if ($value == 1) {
    print "First Place";
} elsif ($value == 2) {
    print "Second Place";
} elsif ($value == 3) {
    print "Third Place";
} else {
    print "Try Again";
}

The problem with this kind of structure is that after a few conditions it becomes hard to understand. Perl does not have a built-in multiple-branch conditional statement like C or Java, but it does not really need one as there are many ways to achieve the same effect, including the Switch module for those who disagree. Here are two ways of writing the same set of conditions in a block:

SWITCH: {
    if ($value == 1) { print "First Place" };
    if ($value == 2) { print "Second Place" };
    if ($value == 3) { print "Third Place" };
    if ($value > 3)  { print "Try Again" };
}
SWITCH: {
    $value == 1 and print "First Place";
    $value == 2 and print "Second Place";
    $value == 3 and print "Third Place";
    $value > 3 and print "Try Again";
}

Here the block does not actually do anything useful except to allow us to group the conditions together for clarity. The SWITCH: label that prefixes the block likewise has no function except to indicate that the block contains a multiple-branch condition. Both of these examples are also less efficient than the original example because all conditions are tested, even if an earlier one matches. But as we saw earlier, bare blocks can be considered loops, so we can use the last loop control statement to break out of the block after the correct match.

SWITCH: {
   $value == 1 and print ("First Place"), last;
   $value == 2 and print ("Second Place"), last;
   $value == 3 and print ("Third Place"), last;
   print "Try Again";   # default case
}

As a bonus, the use of last, like break in C, guarantees that we cannot go on to match more than one condition, which in turn allows us to express later conditions a little more loosely, since they do not have to worry about avoiding matches against values now catered for by earlier cases.

We can also make use of the label to make our last statements more explicit.

SWITCH: {
   $value == 1 and print ("First Place"), last SWITCH;
   $value == 2 and print ("Second Place"), last SWITCH;
   $value == 3 and print ("Third Place"), last SWITCH;
   print "Try Again";   # default case
}

Here the meaning of last is clear enough, but the label can be very useful in longer clauses and particularly in multiple nested blocks and conditions.

If the cases we want to execute have only one or two statements and are similar, it is fine just to write them as a comma-separated list, as in this example. If the cases are more complex, however, this rapidly becomes illegible. A better solution in this case might be to use do blocks.

SWITCH: {
   $value == 1 and do {
      print "First Place";
      last;
   };

   $value == 2 and do {
      print "Second Place";
      last;
   };

   $value == 3 and do {
      print "Third Place";
      last;
   };

   print "Try Again";
}

Note that a do block does not count as a loop, so the last statements still apply to the SWITCH block that encloses them. This is fortunate; otherwise we would have to say last SWITCH to ensure that the right block is referred to. Of course, we can still use the label for clarity if we wish, as noted previously.
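To see the do block switch in action for several values, the sketch below wraps it in a hypothetical run_switch subroutine and appends each message to a string instead of printing it directly. The plain last inside each do block exits the bare SWITCH block, as just described.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub run_switch {
    my ($value) = @_;
    my $out = '';
    SWITCH: {
        # last leaves the bare SWITCH block, since do is not a loop
        $value == 1 and do { $out .= "First Place";  last; };
        $value == 2 and do { $out .= "Second Place"; last; };
        $value == 3 and do { $out .= "Third Place";  last; };
        $out .= "Try Again";   # default case
    }
    return $out;
}

print run_switch($_), "\n" for 1 .. 4;
```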

If we are testing the value of a string rather than an integer, we can reproduce the preceding techniques but just replace the conditions with string equality tests.

SWITCH: {
   $value eq "1" and print ("First Place"), last;
   $value eq "2" and print ("Second Place"), last;
   $value eq "3" and print ("Third Place"), last;
   print "Try Again";
}

Having said this, if our strings are numeric, we can do a numeric comparison if need be. In this example, $value eq "1" and $value == 1 give the same result whenever $value holds a plain integer string such as "1" or "10", thanks to Perl's automatic string-to-number conversion. They part company for strings such as "1.0" or "01", which are numerically equal to 1 but are not the string "1".
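The following sketch shows where the two kinds of test agree and where they differ. The sample inputs are arbitrary choices for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %same;   # does the string test agree with the numeric test?
for my $value ("1", "1.0", "01", "10") {
    my $str = ($value eq "1") ? 1 : 0;
    my $num = ($value == 1)   ? 1 : 0;
    $same{$value} = ($str == $num);
    printf "%-4s eq: %s  ==: %s\n", $value,
        $str ? "yes" : "no", $num ? "yes" : "no";
}
```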

We can also use regular expression matching.

SWITCH: {
   $value =~ /^1$/ and print("First Place"), last;
   $value =~ /^2$/ and print("Second Place"), last;
   $value =~ /^3$/ and print("Third Place"), last;
   print "Try Again";
}

This might not seem much of an improvement, but regular expressions have the useful feature that if they are not associated with a value, then they use the contents of the special variable $_ that Perl provides internally. As we mentioned earlier, it is the "default variable" that many functions read from or write to if no alternative variable is given. We will see in "Using foreach with Multibranched Conditions" how to use this with foreach to rewrite our switch.

The Switch Module

The Switch module gives Perl a bona fide switch and case statement, allowing us to write multibranch conditions in a similar style to languages that provide them natively.

use Switch;

switch (int rand 10) {   # a random integer from 0 to 9
    case 1 { print "First Place" }
    case 2 { print "Second Place" }
    case 3 { print "Third Place" }
    else   { print "...Also Ran" }
}

As this example illustrates, a default branch can also be created with else, which works just the same as it would in an if...else statement.

An advantage of this switch statement is that by default cases do not fall through, that is, once a given case matches the value, no further cases are considered. This differs from both C, where we must explicitly break out of the switch, and the examples earlier, where we had to use last. If we actually want to fall through to other cases, we can explicitly do so with next.

    ...
    case 4 { print "Fourth Place"; next }
    ...

If the switch is given the value 4, it will now output Fourth Place...Also Ran.

Alternatively, to have all cases fall through by default (in the style of C), append 'fallthrough' to the use statement. To break out of the switch, we must now request it explicitly with last, as in the previous examples. The next example is equivalent to the previous one, with the additional case 4, but with fall through enabled as the default:

use Switch 'fallthrough';

switch (int rand 10) {
    case 1 { print "First Place"; last }
    case 2 { print "Second Place"; last }
    case 3 { print "Third Place"; last }
    case 4 { print "Fourth Place" }
    else   { print "...Also Ran" }
}

The conditions can be almost anything and will be evaluated in the most appropriate way. We can use numbers, strings, and regular expressions. We can also use hashes and hash references (true if the value being tested exists as a key in the hash), array references (true if the value is in the array), code blocks, and subroutines. Here is an example that exercises most of the possibilities available:

#!/usr/bin/perl
# bigswitch.pl
use strict;
use warnings;
use Switch;

my $perl = "Perl";
my %hash = ( "pErl" => 2, "peRl" => 3 );
my $cref  = sub { $_[0] eq "pERl" };
sub testcase { $_[0] eq "peRL" };
my @array = (2..4);

my @values=qw[
    1 perl Perl 3 6 pErl PerL pERL pERl peRL PERL php
];

foreach my $input (@values) {
    switch ($input) {
        case 1                       { print "1 literal number" }
        case "perl"                  { print "2 literal string" }
        case ($perl)                 { print "3 string variable" }
        case (\@array)               { print "4 array variable reference" }
        case [5..9]                  { print "5 literal array reference" }
        case (\%hash)                { print "6 hash key" }
        case { "PerL" => "Value" }   { print "7 hash reference key" }
        case { $_[0] eq "pERL" }     { print "8 anonymous sub" }
        case ($cref)                 { print "9 code reference (anonymous)" }
        case (\&testcase)            { print "A code reference (named)" }
        case /^perl/i                { print "B regular expression" }
        else                         { print "C not known at this address" }
    }
    print "\n";
}

The seventh and eighth cases in the previous example, hash reference and anonymous subroutine, bear a little closer examination. Both are delimited by curly braces, but the switch can tell them apart because of the operator in use (=> versus eq). This prescience is possible because the Switch module is a source filter: it parses the source code before Perl compiles it and works out the most sensible thing to do based on what it sees.

The anonymous subroutine also bears examination because it refers to the variable $_[0], which is not otherwise defined in this program. What is actually going on here is hinted at by the fact that this case is called "anonymous subroutine." The block { $_[0] eq "pERL" } is actually a subroutine defined in place within the case statement, and $_[0] simply accesses the first argument passed to it, which is the value of $input. It is therefore exactly equivalent to the ninth and tenth "code reference" cases, just more concise.

Interestingly, the switch value can also be a code reference or subroutine, in which case the case tests are applied to it instead. There are limitations to this method, since there is no way to pass a conventional text value. Instead it must be written explicitly into the subroutine.

#!/usr/bin/perl -w
# switchonsub.pl
use strict;
use Switch;

my $input;

sub lessthan { $input < $_[0] };

$input=int(<>);
switch ( \&lessthan ) {   # switch on a reference to the subroutine
    case 10             { print "less than 10" }
    case (100-$input)   { print "less than 50" }
    case 100            { print "less than 100" }
}

There are friendlier ways to handle this kind of situation using a closure (see Chapter 7), and not every subroutine-based switch necessarily needs to reference a global variable the way this one does, but in a lot of cases there is likely to be a better way to express the problem, for instance with explicit case conditions like case { $_[0] < 10 }.

Perl 6 will provide a native multibranch statement, but using given and when in place of switch and case. The Switch module can be told to use Perl 6 terminology by appending 'Perl6' to the use statement.

use Switch 'Perl6';


given ($value) {
    when 1 { print "First Place" }
    when 2 { print "Second Place" }
    when 3 { print "Third Place" }
}

Returning Values from Multibranched Conditions

Simple if and unless statements do not return a value, but this is not a problem since we can write a conditional expression using the ternary operator. For multiple-branch conditions, we have to be more inventive, but again Perl provides several ways for us to achieve this goal. One way to go about it is with logical operators using a do block.

print do {
   $value == 1 && "First Place" ||
   $value == 2 && "Second Place" ||
   $value == 3 && "Third Place" ||
   "Try again"
}, "\n";

If this approach does not suit our purposes, we can always resort to a subroutine and use return to return the value to us.

sub placing  {
   $_[0] == 1 and return "First Place";
   $_[0] == 2 and return "Second Place";
   $_[0] == 3 and return "Third Place";
   return "Try Again";
}
print placing ($value), "\n";

Or, using the ternary operator:

sub placing {
   return $_[0] == 1 ? "First Place"  :
          $_[0] == 2 ? "Second Place" :
          $_[0] == 3 ? "Third Place"  :
                       "Try Again";
}

While this works just fine, it does not scale indefinitely. For situations more complex than this, it can be easier to decant the conditions and return values into the keys and values of a hash, then test for the hash key. Finally, there is another solution involving using foreach, which we will also consider in "Using foreach with Multibranched Conditions."
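As a sketch of the hash approach, assuming the same placings as above, the conditions become keys and the return values become values. The hash %placing is an illustrative name; exists is used so that a key whose value happened to be false would still be found.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Conditions become hash keys; return values become hash values
my %placing = (
    1 => "First Place",
    2 => "Second Place",
    3 => "Third Place",
);

sub placing {
    my ($value) = @_;
    return exists $placing{$value} ? $placing{$value} : "Try Again";
}

print placing($_), "\n" for 1 .. 4;
```

Adding a new case is now a matter of adding a hash entry rather than another condition.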

Loops and Looping

A loop is a block of code that is executed repeatedly, according to the criteria of the loop's controlling conditions. Perl provides two kinds of loop:

  • Iterating loops, provided by for and foreach
  • Conditional loops, provided by while and until

The distinction between the two types is in the way the controlling conditions are defined.

The for and foreach loops iterate over a list of values given either explicitly or generated by a function or subroutine. The sense of the loop is "for each of these values, do something." Each value in turn is fed to the body of the loop for consideration. When the list of values runs out, the loop ends.

The while and until loops, on the other hand, test a condition each time around the loop. The sense of the loop is "while this condition is satisfied, keep doing something." If the condition succeeds, the loop body is executed once more. If it fails, the loop ends. There is no list of values and no new value for each iteration, unless it is generated in the loop body itself.

Both kinds of loop can be controlled using statements like next, last, and redo. These statements allow the normal flow of execution in the body of a loop to be restarted or terminated, which is why they are also known as loop modifiers. We have already talked about loop modifiers briefly in Chapter 2, but will learn more about them later in this chapter.

Because Perl is such a versatile language, there are also ways to create loop-like effects without actually writing a loop. Perl provides functions such as map and grep that can often be used to produce the same effect as a foreach or while loop—but more efficiently. In particular, if the object of a loop is to process a list of values and convert them into another list of values, map may be a more effective solution than an iterative foreach loop.
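As a simple illustration of the last point, here is the same list transformation written both ways. Which version is faster depends on the Perl version and the work done per element, so treat the efficiency claim as a rule of thumb; the main gain is that map expresses the intent in a single expression.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @numbers = (1 .. 5);

# Iterative version: build the new list one element at a time
my @squares_loop;
foreach my $n (@numbers) {
    push @squares_loop, $n * $n;
}

# map version: the transformation is a single expression
my @squares_map = map { $_ * $_ } @numbers;

print "@squares_loop\n@squares_map\n";
```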

Writing C-Style Loops with for

The for and foreach keywords are actually synonyms, and typically differ only in how they get used. for is used, by convention, for loops that imitate the structure of the for loop in C. Here's how a for loop can be used to count from nine to zero:

for ($n = 9; $n >= 0; $n--) {
   print $n, " ";
}

Any C programmer will recognize this syntax as being identical to C, with the minor exception of the dollar sign of Perl's scalar data type syntax. Similarly, to count from zero to nine we could write

for ($n = 0; $n < 10; $n++) {
   print $n, " ";
   sleep 1;
}
print "Liftoff!\n";

The parenthesized part of the for loop contains three statements: an initialization statement, a condition, and a continuation statement. These are usually (but not always) used to set up and check a loop variable, $n in both examples. The initialization statement (here $n=0) is executed before the loop starts. Just before each iteration of the loop the condition $n<10 is tested. If true the loop is executed; if false the loop finishes. After each completion of the loop body, the continuation statement $n++ is executed. When $n reaches 10, the condition fails and the loop exits without executing the loop body, making 9 the last value of $n to be printed and giving $n the value 10 after the loop has finished.
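A quick way to confirm this sequence of events is to record each value the body sees and then inspect the loop variable afterward. This sketch declares $n outside the loop on purpose so that its final value can still be examined.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @printed;
my $n;                                  # declared outside so it survives the loop
for ($n = 0; $n < 10; $n++) {
    push @printed, $n;                  # the body runs only while $n < 10
}
print "printed: @printed\n";
print "after the loop \$n is $n\n";     # the condition failed when $n became 10
```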

In the preceding example, we end up with the scalar variable $n still available, even though it is only used inside the loop. It would be better to declare the variable so that it only exists where it is needed. Perl allows the programmer to declare the loop variable inside the for statement. A variable declared this way has its scope limited to the body of the for loop, so it exists only within the loop statement:

for (my $n = 0; $n < 10; $n++) {
   print $n, ' is ', ($n % 2) ? 'odd' : 'even', "\n";
}

In this example, we declare $n lexically with my, so it exists only within the for statement itself. (For why this is a good idea and other scoping issues, see Chapter 8.)

As an aside, the for loop can happily exist with nothing supplied for the first or last statement in the parentheses. Remember, however, that the semicolons are still required to get C-style semantics, since for and foreach are synonyms. The following is thus really just a while loop in disguise:

for (; !eof(FILE) ;) {
   print scalar <FILE>;   # read and print one line per iteration
}

While we are on the subject and jumping ahead for a moment, the optional continue block is really the same construct as the last statement of a C-style for loop, just with a different syntax. Here is the equivalent of the earlier for loop written using while:

$n = 0;
while ($n < 10) {
   print $n, ' is ', ($n % 2) ? 'odd' : 'even', "\n";
} continue {
   $n ++;
}
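One detail worth checking is what next does in each form. In this sketch, skipping the value 2 with next still runs the increment, both in the third part of the C-style for and in the continue block of the while, so the two loops collect the same results.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my (@from_for, @from_while);

for (my $i = 0; $i < 5; $i++) {
    next if $i == 2;                      # next still runs $i++
    push @from_for, $i % 2 ? 'odd' : 'even';
}

my $j = 0;
while ($j < 5) {
    next if $j == 2;                      # next still runs the continue block
    push @from_while, $j % 2 ? 'odd' : 'even';
} continue {
    $j++;                                 # plays the role of the for loop's third part
}

print "for:   @from_for\nwhile: @from_while\n";
```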

Writing Better Loops with foreach

The C-style for loop is familiar to C programmers, but it is often unnecessarily complicated. For instance, one of the most common uses of a for loop in C is to iterate over the contents of an array using a loop variable to index the array. In the following example, the loop variable is $n, and it is used to index the elements of an array (presumed to already exist) called @array. The first element is at index 0, and the highest is given by $#array.

for (my $n = 0; $n <= $#array; $n++) {
   print $array[$n], " ";
}

However, we do not need to use an index variable. We can just iterate directly over the contents of the array instead. Although in practice for is usually used for the C style and foreach for the Perl style, the two keywords are actually synonyms, and either keyword may be used with either syntax. The convention of using each in its allotted place is not enforced by Perl but is generally considered good practice anyway. Here is the foreach (i.e., Perl-style) version of the preceding loop:

my $element;
foreach $element (@array) {
   print $element, " ";
}

Even better, foreach allows us to declare the loop variable in the loop. This saves a line because no separate declaration is needed. More importantly, it restricts the scope of the variable to the loop, just as with the for loop earlier. This means that if the variable did not exist beforehand, neither will it after.

foreach my $element (@array) {
   print $element," ";
}
# $element does not exist here

If the loop variable already happens to exist and we don't use my, Perl localizes the variable when it is used as a loop variable, equivalent to using the local keyword. When the loop finishes, the old value of the variable is reestablished.

#!/usr/bin/perl
# befaft.pl
use warnings;

$var = 42;
print "Before: $var\n";
foreach $var (1..5) {
   print "Inside: $var\n";
}
print "After: $var\n";   # prints '42', not '5'

This localization means that we cannot accidentally overwrite an existing variable, but it also means we cannot return the last value used in a foreach loop as we would be able to in C. If we need to do this, we may be better off using a while loop or a map. Of course, giving a loop variable the same name as a variable that already exists is confusing, prone to error, and generally a bad idea anyway.

If we really want to index an array by element number, we can still do that with foreach. A foreach loop needs a list of values, and we want to iterate from zero to the highest element in the array. So we need to generate a list from zero to the highest element index and supply that to the foreach loop. We can achieve that easily using the range operator and the $#array notation to retrieve the highest index:

foreach my $element (0..$#array) {
   print "Element $element is $array[$element]\n";
}

Using a range is easier to read, but in versions of Perl prior to 5.005 it is less efficient than using a loop variable in a for (or while) loop, for the simple reason that the range operator creates a list of all the values between its two endpoints. For a range of zero to one hundred million, this involves the creation of a list containing one-hundred-million integers, which requires at least four-hundred-million bytes of storage. Of course, it is unlikely that we are handling an array with one-hundred-million values in the first place. However, the principle holds true, so be wary of creating large temporary arrays when you can avoid them. From Perl 5.005 onwards, a range used as the list of a foreach loop has been optimized to produce its values one at a time (rather like each), so it is now generally faster than a C-style loop with an explicit loop variable. This can be considered a reason to upgrade as much as a programming point, of course.

If no loop variable is supplied, Perl uses the "default" variable $_ to hold the current loop value.

foreach (@array) {
   print "$_ ";
}

This is very convenient, especially with functions that default to using $_ if no argument is supplied, like the regular expression operators.

foreach (@array) {
   /match_text/ and print "$_ contains a match! ";
}

A final, somewhat unusual, form of the for/foreach loop inverts the loop to place the body before the for. This is the same syntax as the inverted if, but applied to a loop instead. For example:

/match_text/ and print ("$_ contains a match! ") foreach @array;

This syntax can be convenient for very short loop bodies, but it is not really suitable once the body is long enough to obscure the foreach at the end. The preceding example is borderline legible, for example, and a map or the earlier block form would probably be better.
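For comparison, here is the same task written with grep to select the matching elements and map to build the messages. This is one possible alternative rather than the only good one, and the sample data in @array is invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @array = ("no luck here", "a match_text line", "more match_text", "nothing");

# grep keeps the matching elements; map turns each one into a message
my @messages = map { $_ . " contains a match!\n" } grep { /match_text/ } @array;
print @messages;
```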

Using foreach with Multibranched Conditions

We have already mentioned that, when used with switches and multibranched conditions, regular expressions have the particularly useful feature of using $_ when they are not associated with a value. By combining this with a foreach loop, we can remove the test variable altogether. Without a defined loop variable, foreach assigns each value that it is given in turn to $_ inside the block that follows it. This means we can rewrite this statement:

SWITCH: {
   $value =~ /^1$/ and print("First Place"), last;
   $value =~ /^2$/ and print("Second Place"), last;
   $value =~ /^3$/ and print("Third Place"), last;
   print "Try Again";
}

like this:

SWITCH: foreach ($value) {
   /^1$/ and print ("First Place"), last;
   /^2$/ and print ("Second Place"), last;
   /^3$/ and print ("Third Place"), last;
   print "Try Again";
}

Note that the SWITCH label helps to remind us that this isn't a foreach loop in the usual sense, but it is not actually necessary.

We have also seen how to return a value from multibranched conditions using a do block, subroutine, or the ternary operator. However, foreach also comes in very handy here when used with logical operators:

foreach ($value) {
   $message = /^1$/ && "First Place" ||
              /^2$/ && "Second Place" ||
              /^3$/ && "Third Place" ||
              "Try Again";
   print "$message\n";
}

Here we use a foreach to alias $value to $_, then test with regular expressions. Because $value is a scalar, not a list, the loop only executes once, but the aliasing still takes place. The short-circuit behavior of the logical operators ensures that the first matching expression returns the string attached to its && operator. Note that if we were writing more complex cases, parentheses would be in order; for this simple example we don't need them.

This approach works only so long as the resulting values are all true. In this case we are returning one of the strings First Place...Try Again, so there is no problem. For situations involving zero, an undefined value, or an empty string (all of which evaluate to false), we can make use of the ternary operator to produce a similar effect.

foreach ($value) {
   $message = /^1$/? "First Place":
              /^2$/? "Second Place":
              /^3$/? "Third Place":
                 "Try Again";
   print "$message\n";
}

The regular expressions in this example are testing against $_, which is aliased from $value by the foreach.
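The difference between the two forms shows up as soon as one of the results is false. In this hypothetical scoring scheme, third place is worth 0 points: the ||-chained version skips past the 0 to the default, while the ternary version returns it correctly. The subroutine names points_or and points_ternary are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical scoring: third place is worth 0 points, -1 means no placing
sub points_or {
    my ($value) = @_;
    my $points;
    foreach ($value) {
        $points = /^1$/ && 10 ||
                  /^2$/ && 5  ||
                  /^3$/ && 0  ||      # 0 is false, so evaluation carries on
                  -1;
    }
    return $points;
}

sub points_ternary {
    my ($value) = @_;
    my $points;
    foreach ($value) {
        $points = /^1$/ ? 10 :
                  /^2$/ ? 5  :
                  /^3$/ ? 0  :
                          -1;
    }
    return $points;
}

print "third place: ", points_or(3), " vs ", points_ternary(3), "\n";
```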

Variable Aliasing in foreach Loops

If we are iterating over a real array (as opposed to a list of values), then the loop variable is not a copy but a direct alias for the corresponding array element. If we change the value of the loop variable, then we also change the corresponding array element. This can be a source of problems in Perl programs if we don't take this into account, but it can also be very useful. This example uses aliasing to convert a list of strings into a consistent capitalized form.

#!/usr/bin/perl
# capitalize.pl
use warnings;
use strict;

my @array = ("onE", "two", "THREE", "fOUR", "FiVe");
foreach (@array) {
   # lc turns the word into lowercase, ucfirst then capitalizes the first letter
   $_ = ucfirst lc;   # lc uses $_ by default with no argument
}
print join(',', @array), "\n";

Sometimes we might want to avoid the aliasing feature and instead modify a copy of the original array. The simplest way to do that is to copy the original array before we start.

foreach (my @tmparray = @array) {
   $_ =~ tr/a-z/A-Z/;
   print;
}

The assignment to a lexically scoped variable declared with my creates a temporary array, which can be modified without affecting the original array. It is also disposed of at the end of the loop.
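The following sketch confirms that only the copy is changed. Here @upper is declared before the loop so that its contents can be printed afterward; the sample words are arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @array = ("one", "two", "three");

# The list assignment returns the elements of @upper, so $_ aliases them
my @upper;
foreach (@upper = @array) {
    tr/a-z/A-Z/;                # tr works on $_, an alias into @upper
}
print "original: @array\ncopy:     @upper\n";
```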

Conditional Loops—while, until, and do

The while and until loops test a condition and continue to execute the loop for as long as that condition holds. The only difference is the sense of the test: a while loop keeps going as long as its condition is true, and an until loop keeps going as long as its condition is false (that is, until it becomes true). Here is an example of counting from 1 to 10 using a while loop rather than a for or foreach loop:

#!/usr/bin/perl
# count10.pl
use warnings;
use strict;

# count from 1 to 10 (note the post-increment in the condition)
my $n = 0;
while ($n++ < 10) {
   print $n, " ";
}

The while and until loops are well suited to tasks where we want to repeat an action continuously until a condition that we can have no advance knowledge of occurs, such as reaching the end of a file. The following example shows a while loop being used to read the contents of a file line by line. When the end of the file is reached, the readline operator returns undef and the loop terminates. (Perl implicitly tests such an assignment with defined, so a final line consisting of just "0" does not end the loop early.)

open FILE, "file.txt";
while ($line = <FILE>) {
    print $line;
}
close FILE;

If we replace while with until, the meaning of the condition is reversed, in the same way that unless reverses the condition of an if statement. This makes more sense when the nature of the question asked by the Boolean test implies that we are looking for a "no" answer. The eof function is a good example; it returns true when there is no more data.

open FILE, "file.txt";
until (eof(FILE)) {
    $line = <FILE>;
    print $line;
}

Variable Aliasing with while

while loops do not alias their conditions the way that a foreach loop aliases its controlling list, because there is no loop variable to alias with. However, a few Perl operators, most notably the readline operator, are treated specially when they appear on their own in the condition of a while loop: each value read is automatically assigned to $_, as if we had written while (defined($_ = <FILE>)). This means we can write a loop to read the lines of a file one by one without a loop variable.

open FILE, "file.txt";
while (<FILE>) {
    print "$.: $_";
}

Or, more tersely:

print "$.: $_" while <FILE>;
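The sketch below runs the same kind of loop without needing file.txt on disk, by opening a filehandle on a string (a feature available from Perl 5.8 onward). The sample text is invented for illustration, and each line is chomped before being stored with the line number taken from $. .

```perl
#!/usr/bin/perl
use strict;
use warnings;

# An in-memory filehandle stands in for file.txt
my $text = "alpha\nbeta\ngamma\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @numbered;
while (<$fh>) {                 # same as while (defined($_ = <$fh>))
    chomp;                      # chomp also defaults to $_
    push @numbered, "$.: $_";   # $. holds the current line number
}
close $fh;
print "$_\n" for @numbered;
```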

Looping Over Lists and Arrays with while

We can loop over the contents of an array with while if we don't mind destroying the array as we do it.

while ($element = shift @array) {
   print $element, " ";
}
# @array is empty here

On the face of it, this construct does not appear to have any advantage over a more intuitive foreach loop. In addition, it destroys the array in the process of iterating through it, since the removed elements are discarded. However, it can have some advantages. One performance-related use is to discard large memory-consuming values (like image data) as soon as we have finished with them. This allows Perl to release memory that much faster.
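One caveat applies to this idiom: the loop condition is the value of the assignment, so the loop also stops at the first element that is false, such as 0 or an empty string, even if more elements remain. The sketch below shows the problem with some sample data and a variant that tests the array itself instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @values = (3, 2, 0, 1);

# Pitfall: the loop stops at 0 because the assignment's value is false
my @copy = @values;
my @seen_naive;
while (my $element = shift @copy) {
    push @seen_naive, $element;
}

# Testing the array itself keeps going until it is genuinely empty
@copy = @values;
my @seen_all;
while (@copy) {
    my $element = shift @copy;
    push @seen_all, $element;
}

print "naive: @seen_naive\nall:   @seen_all\n";
```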

There can also be computational advantages. Assume we have a list of unique strings and we want to discard every entry before a particular "start" entry. This is easy to achieve with a while loop because we discard each nonmatching string as we test it.

#!/usr/bin/perl
# startexp.pl
use warnings;
use strict;
# define a selection of strings one of which is "start"
my @lines = ("ignored", "data", "start", "the data", "we want");

# discard lines until we see the "start" marker
while (my $line = shift @lines) {
   last if $line eq 'start';
}

# print out the remaining elements using interpolation ($")
print "@lines\n";

Looping on Self-Modifying Arrays

We can use array functions like push, pop, shift, and unshift to modify the array even while we are processing it. This lets us create some interesting variations on a standard loop that are otherwise hard to achieve.

As an example, the following program oscillates indefinitely between two values. It works by shifting elements off an array one by one and adding them to the other end after subtracting each value from the highest value in the range, plus 1:

#!/usr/bin/perl
# oscillator.pl
use warnings;
use strict;
my $max = 20;
my @array = (1..$max-1);

while (my $element = shift @array) {
   push (@array, $max - $element);
   sleep 1;   # delay the print for one second to see the output
   print '*' x $element, "\n";   # multiply single '*' to get a bar of '*'s
}

A slight variation of this program produces a loop that counts from one to a maximum value, then back to one again, and terminates. The principal difference is that the array ranges from one to $max rather than from one to $max-1:

#!/usr/bin/perl
# upanddown.pl
use warnings;
use strict;

my $max = 6;
my @array = (1..$max);

while (my $element = shift @array) {
   push (@array,$max - $element);
   print $element, " : ", join(",", @array), "\n";
}

Why should such a trivial difference cause the loop to terminate? This program produces the following output, which shows us why it terminates after passing through the array only twice:


1 : 2,3,4,5,6,5
2 : 3,4,5,6,5,4
3 : 4,5,6,5,4,3
4 : 5,6,5,4,3,2
5 : 6,5,4,3,2,1
6 : 5,4,3,2,1,0
5 : 4,3,2,1,0,1
4 : 3,2,1,0,1,2
3 : 2,1,0,1,2,3
2 : 1,0,1,2,3,4
1 : 0,1,2,3,4,5

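We can see from this what is actually going on. Each value shifted off the front of the array is replaced at the end by $max minus that value. Because the array now includes $max itself, the element 6 is replaced by 6 - 6, which is 0. On the second pass through the array, that 0 reaches the front, and the result of the shift is a false value, because 0 is false. So the loop terminates. In oscillator.pl the array stops at $max-1, so the smallest value ever pushed is 1 and the loop never ends.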

These particular examples are chosen for simplicity and could also be implemented using simpler loops, for example, using an increment variable that oscillates between +1 and −1 at each end of the number range. While we have only used an ordered list for clarity, the oscillator will work even if the array does not contain ordered numbers.
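The following rough sketch shows the simpler alternative mentioned above: a counter bounces between 1 and a maximum, and a step variable flips sign at each end. $max, $step, and the fixed number of iterations are assumptions made for illustration. The original oscillator runs forever.

```perl
use strict;
use warnings;

my $max   = 5;
my $value = 1;
my $step  = 1;        # +1 while counting up, -1 while counting down
my @trace;

for (1 .. 12) {       # a fixed number of steps instead of an endless loop
    push @trace, $value;
    # reverse direction if the next step would leave the range 1..$max
    $step = -$step if ($value + $step > $max || $value + $step < 1);
    $value += $step;
}
print "@trace\n";     # 1 2 3 4 5 4 3 2 1 2 3 4
```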

Looping Over Hashes with while

We can iterate over a hash with while instead of foreach using the each function, which in a list context returns the next key-value pair in the hash, in the same order that keys and values return the keys and values, respectively. When there are no more key-value pairs, each returns an empty list. The list assignment in the loop condition then evaluates to 0 (the number of values assigned), which is false, so each is suitable for use in the condition of a while loop.

while (($key, $value) = each(%hash)) {
   print "$key => $value\n";
}

Using foreach and keys or while and each for this kind of task is mostly a matter of personal preference. However, foreach is generally more flexible, as it allows sorting keys and aliasing with $_, neither of which is possible in a while/each loop. On the other hand, while avoids extracting the entire key list at the start of the loop and is preferable if we intend to quit the loop once a condition is met. This is particularly true if the hash happens to be tied to something that is resource-heavy (in comparison to an in-memory hash), like a DBM database.
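There is one caution when leaving a while/each loop early. The hash's internal iterator stays where the loop left it, so the next each on that hash resumes partway through instead of starting over. According to perlfunc, calling keys on the hash, even in void context, resets the iterator. The following minimal sketch illustrates this with a made-up %stock hash.

```perl
use strict;
use warnings;

my %stock = (apples => 3, pears => 0, plums => 5);

# Stop as soon as we find an item that is out of stock
my $empty;
while (my ($item, $count) = each %stock) {
    if ($count == 0) {
        $empty = $item;
        last;            # the iterator is left partway through %stock
    }
}
keys %stock;             # reset the iterator before walking the hash again

my $pairs = 0;
while (my ($item, $count) = each %stock) {
    $pairs++;            # sees every pair because of the reset
}
print "out of stock: $empty; pairs visited: $pairs\n";
```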

Note that a foreach loop is a much safer option if we want to alter the contents of the array or hash we are iterating over. In particular, the internal iterator that each uses can get confused if the hash is modified during the course of the loop.

do . . . while and do . . . until

One problem with while and until loops is that they test the condition first and execute the loop body only if the test succeeds. This means that if the test fails on the first pass, the loop body is never executed. Sometimes, however, we want to ensure that the body is executed at least once. Fortunately, we can invert while and until loops by appending them to a do block to produce a do...while or do...until loop.

do {
   $input = <>;   # read a line from standard input
   print "You typed: $input";
} while (defined $input and $input !~ /^quit/);

The last line can be rewritten to use until to equal effect.

} until !defined $input or $input =~ /^quit/;

Note that parentheses around the condition are optional in an inverted while or until loop, just as they are in an inverted if.
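The difference in when the condition is tested shows up clearly with a condition that is false from the start. The while loop body never runs, but the do...while body runs exactly once. The following minimal sketch uses illustrative counter variables.

```perl
use strict;
use warnings;

my ($while_runs, $do_runs) = (0, 0);
my $keep_going = 0;      # condition is false from the start

while ($keep_going) {    # tested first: body is skipped
    $while_runs++;
}

do {                     # body runs before the test
    $do_runs++;
} while ($keep_going);

print "while: $while_runs, do...while: $do_runs\n";   # while: 0, do...while: 1
```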

Interestingly, this inverted loop structure applies to all the looping statements, even foreach:

# this works, but is confusing. Don't do it.
do {
   print;
} foreach (@array);

However, there is little point in doing this for foreach, first because it will not work except using $_, second because the loop body does not execute first as it needs the loop value to proceed, and third because it's just plain confusing. We mention it only because Perl allows it, and it is conceivable that we may encounter it in code.

Note that in the inverted form we cannot declare a variable in the conditional expression. We also cannot use loop control statements to control the loop's execution as these are not permitted in a do block—see "The Trouble with do" later in the chapter.

Controlling Loop Execution

Ordinarily a loop will execute according to its controlling criteria. Frequently, however, we want to alter the normal flow of execution from within the loop body itself, depending on conditions that arise as the loop body is executed. Perl provides three statements for this, collectively known as loop modifiers: next, which advances to the next iteration (retesting the loop condition); last, which immediately exits the loop; and redo, which restarts the current iteration (without retesting the loop condition).

The next statement forces the loop immediately onto the next iteration, skipping any remaining code in the loop body but executing the continue block if it is present. It is most often used when all the tasks necessary for a given iteration have been completed or the loop variable value for the current iteration is not applicable.

The following code snippet reads configuration parameters from the user, consisting of lines of name = value pairs. It uses next to skip past empty lines, comments (lines beginning with a #), and lines without an equals sign.

#!/usr/bin/perl
# config.pl
use warnings;
use strict;
my %config = ();
while (<>) {
    chomp;   #strip linefeed

    next if /^\s*$/;   #skip to the next iteration on empty lines
    next if /^\s*#/;   #skip to the next iteration on comments
    my ($param, $value) = split("=", $_, 2);   #split on first '='
    unless ($value) {
        print "No value for parameter '$_'\n";
        next;
    }

    $config{$param} = $value;
}

foreach (sort keys %config) {
   print "$_ => $config{$_}\n";
}

The last statement forces a loop to exit immediately, as if the loop had naturally reached its last iteration. A last is most often used when the task for which the loop was written has been completed, such as searching for a given value in an array—once found, no further processing is necessary. It can also be used in foreach loops pressed into service as multibranch conditions as we saw earlier. Here is a more conventional use of last that copies elements from one array to another until it hits an undefined element or reaches the end of the source array:

#!/usr/bin/perl
# last.pl
use warnings;
use strict;

my @array = ("One", "Two", "Three", undef, "Five", "Six");

#copy array up to the first undefined element
my @newarray = ();
foreach my $element (@array) {
    last unless defined ($element);
    push @newarray, $element;
}

foreach (@newarray) {
    print $_."\n";   # prints One, Two, Three
}

The redo statement forces the loop to execute the current iteration over again. At first sight this appears similar to next. The distinction is that with redo the loop condition is not retested, and the continue block, if present, is not executed. In the case of a foreach loop, that means that the loop variable retains the value of the current iteration rather than advancing to the next. In the case of a while or until loop, the code in the conditional clause is not reexecuted, and any functions in it are not called. A redo is most often used when more than one iteration may be needed before the main body of a loop can be executed, for example, reading files with multiple-line statements.

#!/usr/bin/perl
# backslash.pl
use warnings;
use strict;
my @lines = ();
while (<>) {
   chomp;
   if (s/\\$//) {   #check for and remove a trailing backslash character
      my $line = <>;
      $_ .= $line, redo;   # goes to the 'chomp' above
   }
   push @lines, $_;
}

foreach (0..$#lines) {
   print "$_ : $lines[$_]\n";
}

In this example, the while statement reads a line of input with <> and assigns it to $_. The chomp removes the trailing newline, and the remainder of the line is checked for a trailing backslash, which is removed if present. If one is found, another line is read and appended to $_.

Inside the if statement, redo is called to pass execution back up to the chomp statement. Because redo does not reexecute the while condition, the value of $_ is not overwritten, and the chomp is performed on the value of $_ that was assembled inside the if statement. This process continues so long as we continue to enter lines ending with a backslash.

All of the loop control statements next, last, and redo can be used in any kind of loop (for, foreach, while, until). The exceptions are the do...while and do...until loops, because loops built around do blocks do not behave quite the way we expect, as we will see shortly.

The continue Clause

All of Perl's loops can accept an additional continue clause. Code placed into the block of a continue clause is executed after the main body of the loop. Ordinarily this has the same effect as adding the code to the end of the main loop body, unless the loop body contains a next statement, in which case the continue block is executed before returning to the top of the loop. This makes a continue block a suitable place to increment a loop variable.

my $n = 0;

while ($n < 10) {
   next if ($n % 2);
   print $n, " ";

} continue {
   # 'next' comes here
   $n++;
}

# 'last' comes here

Note, however, that a last statement will not execute the continue block before exiting the loop. Similarly, redo will not execute the continue block because it reexecutes the loop body on the same iteration, rather than continuing to the next.
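To see which statements pass through the continue block, the following sketch counts how often the block runs. The loop is illustrative: it skips odd numbers with next and stops with last when $n reaches 6.

```perl
use strict;
use warnings;

my @printed;
my $continue_runs = 0;
my $n = 0;

while ($n < 10) {
    last if $n == 6;          # leaves the loop without running 'continue'
    next if $n % 2;           # runs the 'continue' block first
    push @printed, $n;
} continue {
    $continue_runs++;
    $n++;
}
print "printed: @printed; continue ran $continue_runs times; n = $n\n";
# printed: 0 2 4; continue ran 6 times; n = 6
```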

There are few, if any, instances where a continue block is actually necessary, since most loops with a continue clause can easily be rewritten to avoid one. As we mentioned earlier, the continue clause is an explicit way to write the third part of a for loop, which deals with next, last, and redo in the same way as the while...continue loop earlier.
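For comparison, the following sketch rewrites the while...continue loop shown earlier as a C-style for loop. Here the increment in the third part also runs after next. The values are collected into arrays instead of printed (an addition made for checking) to show that both loops produce the same sequence.

```perl
use strict;
use warnings;

my (@from_while, @from_for);

my $n = 0;
while ($n < 10) {
    next if ($n % 2);
    push @from_while, $n;
} continue {
    $n++;                          # runs after 'next' as well
}

for (my $i = 0; $i < 10; $i++) {   # '$i++' plays the role of 'continue'
    next if ($i % 2);
    push @from_for, $i;
}

print "@from_while\n@from_for\n";  # both lines: 0 2 4 6 8
```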

Controlling Nested Loops

So far we have just seen how to use loop control statements to affect the execution of the current loop. However, the next, last, and redo statements all accept an optional loop label as an argument. This allows us to jump to the start or end of an outer loop, so long as that loop has a name. To give a loop a name, we just prefix it with a label.

my @lines = ();
LINE: foreach (<>) {
   chomp;
   next LINE if /^$/;   #skip blank lines
   push @lines, $_;
}

Even in a simple loop this can enable us to write more legible code. Since the label indicates the purpose of the loop and of the control statements inside it, next LINE literally means "do the next line." More importantly, if we have two nested loops, labeling the outer loop allows us to jump to the next iteration of the outer loop from inside the inner one using next.

OUTER: foreach my $outer (@array) {
   INNER: foreach my $inner (@{$outer}) {
      next OUTER unless defined $inner;
   }
   # 'last' or 'last INNER' would come here
}

This is very similar to using a last statement, except that it will jump to the top of the outer loop rather than the end of the inner loop. If the outer loop contains more code after the inner loop, next will avoid it while last will execute it.

Similarly, we can use last to exit both loops simultaneously. This is much simpler and clearer than exiting each loop individually, for example by setting a flag in the inner loop and testing it in the outer one.

LINE: foreach my $line (<>) {
   chomp $line;
   ITEM: foreach (split /, /, $line) {
      last LINE if /^_END_/;   #abort both loops on token
      next LINE if /^_NEXT_/;  #skip remaining items on token
      next ITEM if /^\s*$/;    #skip empty columns
      #process item
      print "Got: $_\n";
   }
}

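Only the outer loop actually needs to be labeled, since an unlabeled loop control statement always applies to the innermost enclosing loop; the label lets next and last target the outer loop instead. The ITEM label is there only to make the intent clear.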
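Because the LINE/ITEM example above reads from <>, the following self-contained sketch shows the same control flow using an in-memory list of lines. The sample data and the @got array are illustrative assumptions.

```perl
use strict;
use warnings;

my @input = ("a, b, , c", "d, _NEXT_, e", "f, _END_, g", "h");
my @got;

LINE: foreach my $line (@input) {
    ITEM: foreach (split /, /, $line) {
        last LINE if /^_END_/;    # abandon both loops
        next LINE if /^_NEXT_/;   # skip the rest of this line
        next ITEM if /^\s*$/;     # skip empty columns
        push @got, $_;
    }
}
print "@got\n";   # a b c d f
```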

Perl allows labels to be defined multiple times. When a label is used, the label definition that is closest in scope is taken to be the target. For loop control statements, the first matching loop label in the stack of loops surrounding the statement is used. In general, we do not expect to be giving two loops the same name if one is inside the other, so it is always clear which label a loop control statement is referring to. Reusing labels is also handy for switch-style conditional statements and any other constructs where we want to make the purpose of the construct clear.

Strangely, we can jump to a loop label of an outer loop, even if there is a subroutine call in the way. This is really a very bad idea and is almost certainly not intended, so with warnings enabled Perl will warn us if we do it inadvertently:

Exiting subroutine via next at ...

Tip Although we would not expect to do this normally, it is possible to mistype the name of a label, especially if we copy and paste carelessly.
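The following minimal sketch triggers the warning shown above on purpose and captures it with a $SIG{__WARN__} handler so that it can be inspected. skip_item is a hypothetical helper name used only for illustration.

```perl
use strict;
use warnings;

my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };   # collect warnings instead of printing them

sub skip_item { next ITEM }    # hypothetical helper: jumps out of the sub into the caller's loop

my @kept;
ITEM: foreach my $n (1 .. 5) {
    skip_item() if $n % 2;     # leaves skip_item() via 'next ITEM'
    push @kept, $n;
}

print "kept: @kept\n";                   # kept: 2 4
print "first warning: $warnings[0]";     # begins "Exiting subroutine via next"
```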


The Trouble with do

The fact that loop modifiers do not work in do...while or do...until loops may seem strange. The reason for this is slightly obscure, but it comes about because unlike a normal while or until loop, the while and until conditions in a do...while or do...until loop are considered modifiers, which modify the behavior of the do block immediately before them. The do block is not considered to be a loop, so loop control statements do not work in them.

It is possible, though not terribly elegant, to get a next statement to work in a do...while loop through the addition of an extra bare block inside the do block, as in this example:

#!/usr/bin/perl
# even.pl
use warnings;
use strict;

# print out even numbers with a do...while loop
my $n = 0;
do { {
   next if ($n % 2);
   print $n, " ";
} } while ($n++ < 10);

Unfortunately, while this works for next, it does not work for last, because both next and last operate within the bounds of the inner block. All last does in this case is take us to the end of the inner block, where the while condition is still in effect. In addition, this is ugly and nonintuitive code. The better solution at this point is to find a way to rephrase this code as a normal while, until, or foreach loop and avoid the whole problem.

foreach my $n (0..10) {
   next if ($n % 2);
   print $n, " ";
}
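For completeness, perlsyn describes a related trick for last: put a labeled bare block outside the do block, so that last leaves the whole construct. The following sketch reuses the even-numbers example with an arbitrary cut-off at 6, added only to show the exit.

```perl
use strict;
use warnings;

my @evens;
my $n = 0;
LOOP: {
    do {
        last LOOP if $n > 6;              # exits the outer bare block, and so the loop
        push @evens, $n unless $n % 2;
    } while ($n++ < 10);
}
print "@evens\n";   # 0 2 4 6
```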

The goto Statement

The goto statement has two basic modes of operation. The simpler and more standard use allows execution to jump to an arbitrary labeled point in the code, just as in C and many other languages.

($lines, $empty, $comment, $code) = (0, 0, 0, 0);

while (<>) {
   /^$/ and $empty++, goto CONTINUE;
   /^#/ and $comment++, goto CONTINUE;
   $code++, goto CONTINUE;
CONTINUE:
   $lines++;
}

There are few, if any, reasons to use a goto with a label. In this case, we would be better off replacing goto with next statements and putting the continue code into a continue block.

while (<>) {
   /^$/ and $empty++, next;
   /^#/ and $comment++, next;
   $code++;
} continue {
   $lines++;
}
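The following self-contained sketch runs the next/continue version over an in-memory list instead of <>; the sample lines are made up, and foreach replaces while because the input is a list. It confirms that every line is counted once in $lines, while each line lands in exactly one of the other counters.

```perl
use strict;
use warnings;

my @input = ("# a comment\n", "\n", "print 1;\n", "\n", "# another\n", "x++;\n");
my ($lines, $empty, $comment, $code) = (0, 0, 0, 0);

foreach (@input) {
    /^$/ and $empty++, next;      # empty line
    /^#/ and $comment++, next;    # comment line
    $code++;
} continue {
    $lines++;                     # runs for every line, including after 'next'
}
print "lines=$lines empty=$empty comment=$comment code=$code\n";
# lines=6 empty=2 comment=2 code=2
```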

A goto statement can also take an expression as its argument. The result of the expression should be a label that execution can jump to. This gives us another, albeit rather ugly, way to write a compound switch statement.

$selection = int(3*rand);   # random integer between 0 and 2

@selections = ("ZERO", "ONE", "TWO");

{   goto $selections[$selection];   # jump to a label inside this block
  ZERO:
    print "None";
    next;
  ONE:
    print "One";
    next;
  TWO:
    print "Two";
    next;
}

print "...done\n";

Again, there are better ways to write compound statements. We covered these earlier, so we should not have to resort to goto here.

The second and more interesting use of goto is to call subroutines. When used in this context, the new subroutine entirely replaces the context of the calling one, so that on return from the second subroutine, execution is returned directly to the caller of the first subroutine. The primary use of this form is in autoloaded functions, which will be covered in Chapter 10.

It can also be used for tail recursion. This is where a subroutine can call itself recursively many times without causing Perl to create an ever-growing stack of subroutine calls. The final call returns directly to the original caller instead of returning a value through all of the intermediate subroutine calls. We will cover this in Chapter 7.
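The following minimal sketch previews the idea. sum_to (a hypothetical name) accumulates a running total. Instead of calling itself, it rewrites @_ and uses goto &sum_to, so the current call is replaced rather than stacked. Written as ordinary recursion, a depth of 100,000 would trigger Perl's "Deep recursion" warning under use warnings. With goto, the call depth never grows.

```perl
use strict;
use warnings;

sub sum_to {
    my ($n, $total) = @_;
    return $total if $n == 0;
    @_ = ($n - 1, $total + $n);   # arguments for the replacement call
    goto &sum_to;                 # replace this call instead of nesting a new one
}

my @warnings;
$SIG{__WARN__} = sub { push @warnings, @_ };

my $sum = sum_to(100_000, 0);
print "sum = $sum, warnings = ", scalar(@warnings), "\n";   # sum = 5000050000, warnings = 0
```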

map and grep

The map and grep functions convert one list into another, applying a transform or condition to each element of the source list in turn. If the goal of a foreach or while loop is to generate a new list, we might be able to do the job better using map or grep. The syntax of map (and grep) takes one of two equivalent forms:

map EXPRESSION, LIST        grep EXPRESSION, LIST
map BLOCK LIST              grep BLOCK LIST

In each case the EXPRESSION or BLOCK is executed for each value of LIST, with the results returned as a new list.

The purpose of map is to convert the elements of a list one by one and to produce a new list as a result. The expression or block performs the conversion, so map is conceptually related to a foreach loop. Similarly, the purpose of grep is to return a list containing a subset of the original list. The expression or block is evaluated to a true or false value to see if the element is eligible for inclusion, so grep is conceptually related to a while loop. Both functions perform aliasing to $_ in the same way that foreach does.

map

To illustrate how map works, here is an example. Assume that we have a list of integers representing ASCII values, and we want to turn it into a list of character strings. We can do that with a foreach loop like this:

my @numbers = (80, 101, 114, 108);
my @characters;

foreach (@numbers) {
   push @characters, chr $_;
}

print @characters;

With map we can replace the loop with

my @characters = map (chr $_, @numbers);

Or, using the block syntax:

my @characters = map {chr $_} @numbers;

Even better, we can feed the list returned by the map into a join, then print the result in a single operation.

print join ('-', map {chr $_} @numbers);   # displays 'P-e-r-l'

Another common use for map is to construct a hash of values so that we can quickly determine whether a given value exists. With an array we would have to compare each element in turn.

#!/usr/bin/perl -w
# insequencemap.pl
use strict;

my @sequence=(1,1,2,3,5,8,13,21,34,55,89);
my %insequence=map { $_ => 1 } @sequence;
my $number=<>;
chomp $number;   # remove the newline so the hash lookup can match
print "$number is ",($insequence{$number}?"":"NOT "),"in sequence\n";
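For comparison, the following sketch places the array-scanning approach next to the hash lookup. A grep in scalar context counts matches by testing every element, while the hash answers with a single lookup. The test numbers and the %found hash are illustrative.

```perl
use strict;
use warnings;

my @sequence = (1,1,2,3,5,8,13,21,34,55,89);
my %insequence = map { $_ => 1 } @sequence;

my %found;
for my $number (21, 22) {
    my $by_hash = exists $insequence{$number} ? 1 : 0;          # one hash lookup
    my $by_scan = (grep { $_ == $number } @sequence) ? 1 : 0;   # scans the whole array
    $found{$number} = "$by_hash$by_scan";
    print "$number: hash=$by_hash scan=$by_scan\n";
}
```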

Unlike foreach, map cannot choose to use an explicit loop variable and must use $_ within the block or expression. It also cannot make use of loop control statements, for the same reasons that a do block cannot, as we saw earlier.

In void context, the return value of map is discarded. Prior to Perl 5.8.1, this was perfectly functional code but not very efficient; the list of values to return would be constructed and then discarded again. From version 5.8.1 onwards Perl is smart enough to notice when it is asked to evaluate a map whose return value is not used and will optimize the map to avoid the redundant work. The following two statements are therefore equivalent and equally efficient. Each assigns to $_, which is aliased to the elements of @words, so both modify the array in place.

$_ = ucfirst foreach @words; # foreach style self-modify list
map { $_ = ucfirst } @words; # map style self-modify list

From 5.8.4, assigning a map to a scalar value (which counts the number of values present) will also be optimized.

my $howmany = map { ucfirst } @words; # more efficient in Perl >= 5.8.4

Void context aside, map is usually used to convert one list into another of equal length, with one new value in the result for each value in the input. However, it is possible to persuade map not to return a new value by having it return ().

print map { ($_>110) ? () : chr($_) } @numbers; # displays 'Pel'

If the only object of the map is to selectively return values, however, we should really be using grep.
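For instance, the filtering in the previous example can be handed to grep, which leaves map with a one-to-one conversion. The following short sketch uses the same @numbers list and compares the two approaches.

```perl
use strict;
use warnings;

my @numbers = (80, 101, 114, 108);

# map doing the filtering itself by returning () for unwanted values
my $with_map  = join '', map { ($_ > 110) ? () : chr($_) } @numbers;

# grep does the filtering, map only converts
my $with_grep = join '', map { chr($_) } grep { $_ <= 110 } @numbers;

print "$with_map $with_grep\n";   # Pel Pel
```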

grep

The grep function gets its name from the Unix grep command, which scans text files and returns lines from them that match a given regular expression search pattern. The Perl grep function is similar in concept in that it returns a list containing a subset of the original list, though it does not directly have anything to do with regular expressions.

The syntax of grep is identical to map, but while the expression or block in a map statement is used to transform each value in a list, the corresponding expression or block in a grep statement is evaluated as a condition to determine if the value should be included in the returned list.

For example, the following while loop reads a list of lines from standard input and builds up a list of the lines that start with a digit:

my @numerics = ();
while (<>) {
   push @numerics, $_ if /^\d/;
}
print "@numerics ";

We can simplify the preceding to just

@numerics = grep {/^\d/} <>;

Here we have used a regular expression as the condition, in keeping with the spirit of the Unix grep command, which works on a similar basis. However, since grep accepts any expression or block to test the condition, we can use any kind of condition we like.

Just because grep tests each value rather than manipulating it does not mean that it has to leave the value untouched. Just as map can be made to act like grep by returning (), grep can be made to act like map by assigning a new value to $_. However, doing this alters the original list. This is fine if we intend to discard the original list, but it can lead to problematic code if we forget. Here is an example where the source list is generated by reading from standard input. We can't subsequently access this list from anywhere else, so there is no risk of making a mistake.

@numerics = grep { s/^(\d+)/Line $1:/ } <>;

The substitution modifies $_, and therefore the list element it is aliased to, for each matching line. The return value of the substitution is true if a match was made and false otherwise, so only the lines that were transformed are returned by the grep.
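The risk mentioned above is easy to demonstrate. Because $_ is an alias for each element, a substitution inside grep also rewrites the source array. The following sketch uses an illustrative @source array and then shows one common workaround: filter first, then transform a copy of each element inside map.

```perl
use strict;
use warnings;

my @source = ("1 apple", "pear", "22 plums");

# grep with a substitution: returns the changed lines AND edits @source
my @numbered = grep { s/^(\d+)/Line $1:/ } @source;
print "numbered: @numbered\n";
print "source:   @source\n";     # the numbered elements were rewritten too

# Non-destructive version: filter first, then transform a copy of each element
my @original = ("1 apple", "pear", "22 plums");
my @copies = map { (my $line = $_) =~ s/^(\d+)/Line $1:/; $line }
             grep { /^\d/ } @original;
print "copies:   @copies\n";
print "original: @original\n";   # unchanged
```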

Just like map, grep can be used in a void context to change an original array without creating a new list of values. Also like map, from version 5.8.1 Perl will optimize such a grep so that no output values are generated to then get immediately discarded.

Chaining map and grep Functions Together

Both map and grep take lists as input and produce lists as output, so we can chain them together. The following example again reads a list of lines from standard input, and returns a list of all lines that were exactly five characters long (including the terminating linefeed), with each line lowercased and the first character capitalized (assuming it can be). Both ucfirst and lc will use $_ if given no explicit argument. We can write

@numerics = map { ucfirst } map { lc } grep { length($_)==5 } <>;

A chain like this can be a powerful way to quickly and concisely manipulate a list through several different stages, more so when the bodies are more complex (e.g., call subroutines) than the simplistic example given here. The drawback is that to make sense of the code we have to read it from back to front, which is a little counterintuitive.

This example also illustrates a typical situation where the block syntax of map and grep is much clearer than the expression syntax, which would require three sets of nested parentheses.
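To try the chain without typing input, the same pipeline can be run over an in-memory list. The sample words are illustrative, and each carries a trailing newline, as lines read from <> would.

```perl
use strict;
use warnings;

my @input = ("PERL\n", "rules\n", "ok\n", "mAp\n", "grep\n");

# Read from right to left: keep 5-character lines, lowercase them, then capitalize
my @result = map { ucfirst } map { lc } grep { length($_) == 5 } @input;
print @result;   # prints "Perl" and "Grep" on separate lines
```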

Summary

We started this chapter by exploring the basic structures of Perl. We covered statements, declarations, expressions, and blocks. We then looked in particular at the properties and facilities provided by blocks and the various ways in which they can be expressed and used. In particular, treating blocks as loops, defining do blocks, and working with BEGIN and END blocks were all discussed.

We covered Perl's conditional statements, if, else, elsif, and unless. After a short discussion on the nature of truth, we also looked in detail at how to create loops with for and foreach, while, until, do, do...while, and do...until and how to control loops with next, last, redo, and continue.

The chapter ended with a short discussion of the uses and disadvantages of goto, followed by a look at the map and grep functions, which turn a list of input values into a new list. We saw how to use both map and grep to implement code that both transforms and selectively removes values from the input list, plus how to chain multiple map and grep expressions together to achieve more complex kinds of list manipulation.
