In this chapter, we look at Perl's control structures, starting with the basic syntax of the language, expressions, and statements. We will then build these into more complex structures such as compound statements (also known as blocks) and conditional statements.
We will consider Perl's conditional statements, and then move on to read about loops in Perl. We will look at the various statements that we can use to create loops and how to use them in particular with lists, arrays, and hashes. We will also look into the modifiers provided by Perl to change the behavior of loops.
A Perl program consists of a mixture of statements, declarations, and comments. Statements are executed by the Perl interpreter at run time. Declarations, on the other hand, are directives that affect the way that the program is compiled. Therefore, a declaration's effect ends after compilation, whereas a statement will affect the program each time that section of code is run. The sub, my, and use keywords are the most obvious types of declarations.
Statements are made up of expressions, and can be combined together into a block or compound statement. An expression is any piece of Perl code that produces a value when it is evaluated; we saw plenty of these in the preceding chapters. For example, the number 3 is an expression because it produces 3 when evaluated. Operators combine expressions to form larger expressions. For example, 3+6 is an expression that produces the number 9. In Perl, however, the distinction between statements and expressions is rather arbitrary. Most statements are simply expressions whose return values are discarded rather than used. The distinction between statements and declarations is also somewhat arbitrary. Perl, like many other languages inspired by or derived from Lisp, allows code to be run during the compilation phase, prior to execution. This blurs the line between compilation and execution and between declarations and statements.
Blocks are just aggregations of statements, defined with the use of curly braces. A block defines its own scope, so variables declared inside a block only last so long as that block is being executed. More technically, blocks create a new stack frame, and variables allocated within it last only so long as that frame exists, unless a reference to them is retained by something outside the block. In that case, the variable persists (via the reference) so long as the reference does.
Declarations take effect at compile time, rather than at run time. An important class of declaration is the inclusion of modules or source code with the use keyword. This includes both pragmatic modules like use integer and functional modules like use CGI. Another is declaring a subroutine ahead of its definition with sub. Other kinds of declaration include my and our statements and format definitions. The following are all examples of declarations:
sub mysubroutine ($); # declare a subroutine with one scalar argument
my $scalar; # declare a lexical variable (at compile-time)
# define a format for STDOUT
format =
@<<<< = @>>>>
$key, $value
.
use warnings; # use pragmatic module
use strict; # use pragmatic module
use CGI qw (:standard); # use CGI module
BEGIN {
print "This is a compile-time statement";
}
The BEGIN block is an interesting example, because it is the most obvious case of code being executed during the compilation rather than the execution of a program. The block behaves like a declaration, but the contents are statements. use also falls into this category; it is really a require statement and an import statement wrapped inside a BEGIN block that executes at compile time. So, arguably, use is not a declaration, but like BEGIN, a mechanism to execute code during the compile phase. Whether a BEGIN block is a declaration or not is a moot point, but it happens at the compile phase, so it is certainly not a regular block.
All of these examples demonstrate features of Perl covered later in the book, so for now we will just note their existence and move on. For the curious, subroutines are covered in Chapter 7; my and our in Chapter 8; use, require, and import in Chapter 9; and the BEGIN block (along with siblings END, CHECK, and INIT) is covered in more detail in Chapter 10. Formats can be found in Chapter 18.
An expression in Perl is any construct that returns a value (be it scalar, list, or even an undefined value). It can be anything from a literal number to a complex expression involving multiple operators, functions, and subroutine calls. That value can then be used in larger expressions or form part of a statement. Syntactically, statements differ from expressions only in that they are separated from each other by semicolons. In terms of behavior, the chief distinction is that a statement does not return a value (for example, an if statement) or returns a value that is ignored (for example, a print statement).
This second kind of statement is really just an expression in disguise: Perl will detect that we don't use the return value and optimize the code accordingly, but we could still change our mind and use it if we choose to. In Perl terminology, these expressions are evaluated in void context. For example, the statement $b = $a is also an expression, which returns the value of the assignment ($a). We can see this in action by considering the following two statements:
$b = $a; # $b = $a is a statement
$c = $b = $a; # $b = $a is now an expression, and its value is assigned to $c
Another way of looking at this is to say that a statement is an expression that is executed primarily because it performs a useful action rather than because it returns a result. A print statement is probably the most common example:
print "Hello World";
This is a statement by virtue of the fact that the return value from print is not used. Indeed, it is easy to forget that print does in fact return a value: a true value if it succeeds and a false value if it fails, for example because the filehandle to which it is printing is invalid. We rarely bother to check for this when printing to standard output, but if we wanted to, we could turn this statement into an expression by writing
$success = print "Hello World";
Now we have a print expression whose value is used in an assignment statement. For standard output this is usually not necessary, but if we were printing to a filehandle, and especially to one that could become invalid outside our control, like a network socket, this becomes more important.
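To see the return value of print change, the following minimal sketch prints once to standard output and once to a filehandle that has already been closed. The in-memory filehandle and the variable names are assumptions made purely for illustration; a real program would more likely be dealing with a file or socket.

```perl
#!/usr/bin/perl
# printcheck.pl - hypothetical example, not part of the original listings
use strict;
use warnings;

# print to standard output and keep the return value
my $ok_stdout = print "Hello World\n";

# open an in-memory filehandle, close it, then try to print to it
open(my $fh, '>', \my $buffer) or die "Cannot open in-memory file: $!";
close($fh);

my $ok_closed;
{
    no warnings 'closed';   # silence "print() on closed filehandle"
    $ok_closed = print $fh "This goes nowhere\n";
}

print "print to STDOUT ", ($ok_stdout ? "succeeded" : "failed"), "\n";
print "print to closed handle ", ($ok_closed ? "succeeded" : "failed"), "\n";
```

The second print returns a false value, which is exactly the case worth testing for when the filehandle is outside our control.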
The block is a Perl construct that allows several statements (which in turn may be simple statements or further blocks) to be logically grouped together into a compound statement and executed as a unit. Blocks are defined by enclosing the statements within curly braces with the final statement optionally terminated by a semicolon. The general form of a block is therefore
{ STATEMENT; STATEMENT; ... ; STATEMENT[;] }
This is the most obvious form of block. Less obvious is that a block is also created by the limits of a source file. A simple Perl script is a block that starts at the top of the file and ends at the bottom (or at a __DATA__ or __END__ token, if one is present in the file; see Chapter 12 for more on these). Likewise, an included file that has been read in using require also defines a block corresponding to the included file.
The definition of a block is important because in addition to grouping statements together logically, a block also defines a new scope in which variables can be declared and used. Following are two short programs, one of which executes the other via require. Both files define their own scope, and in addition the explicit block in the second program, child.pl, defines a third. Here is the parent program:
#!/usr/bin/perl
# parent.pl
use warnings;
use strict;
my $text = "This is the parent";
require './child.pl'; # explicit path, since "." is not in @INC from Perl 5.26 onward
print "$text\n"; # produces "This is the parent"
and here is the child file:
#!/usr/bin/perl
# child.pl
use warnings;
use strict;
my $text = "This is the child";
{
my $text = "This is block scoped";
print "$text\n"; # produces "This is block scoped"
}
print "$text\n"; # produces "This is the child"
Variables that are defined within a particular scope only exist as long as that block is being executed and are not seen or usable by the wider scope outside the block. This has a lot of significant implications, as we will see.
Almost all of Perl's control structures (such as if statements, for and while loops, and subroutine declarations) can accept a block in their definitions, and many require it. For example, the if statement requires a block to encapsulate the action that follows its condition; a simple statement will not do and will cause Perl to generate a syntax error.
if (EXPRESSION) { STATEMENT; STATEMENT; ... STATEMENT[;] }
Or, put more simply:
if (EXPRESSION) BLOCK
Note that a block is not the equivalent of a statement. As we just saw, blocks are accepted in places where simple statements are not. Also, blocks do not require a terminating semicolon after the closing brace, unlike the statements inside it. Significantly, in some contexts blocks can also return a value.
Naked Blocks
Although it is their most common application, blocks do not have to belong to a larger statement. They can exist entirely on their own, purely for the purposes of defining a scope. The following example shows a block in which several scalar variables are defined using my. The variables exist for the lifetime of the block's execution and then cease to exist.
#!/usr/bin/perl
# time.pl
use warnings;
# a bare block definition
{
# define six scalars in new block scope:
my ($sec, $min, $hour, $day, $month, $year) = localtime();
# variables exist and can be used inside block
print "The time is: $hour: $min. $sec\n";
$month++;
$year += 1900;
print "The date is: $year/ $month/ $day\n";
# end of block - variable definitions cease to exist
}
# produces 'uninitialized value' warning - $sec does not exist here
print "$sec seconds\n";
The output from this is
Name "main::sec" used only once: possible typo at d.pl line 18.
The time is: 2: 30. 5
The date is: 2000/ 12/ 15
Use of uninitialized value in concatenation (.) at d.pl line 18.
seconds
Note that adding use strict would turn the preceding warning into a compile-time error, as strictness requires declaring all variables.
If we prefix a bare block with the sub keyword, it defines an anonymous subroutine and returns a reference to it, a subject we will cover in Chapter 7.
Defining the Main Program As a Block
An interesting use of blocks is to put the main program code into a block within the source file. This helps to distinguish the actual program from any declarations or initialization code (in the shape of use statements and so forth) that may occur previously. It also allows us to restrict variables needed by the main program to the scope of the main program only, rather than turning them into global variables, which should be avoided. Consider the following simple but illustrative program:
#!/usr/bin/perl
# blockmain.pl
# Declarations First
use strict;
use warnings;
# Initialization code, global scope
my $global_variable = "All the World can see Me";
use constant MY_GLOBAL_CONSTANT => "Global Constant";
# Here is the main program code
MAIN: {
# variable defined in the main program scope, but not global
my $main_variable = "Not visible outside main block";
print_variables ($main_variable);
}
# No one here but us subroutines...
sub print_variables {
print $global_variable, "\n", MY_GLOBAL_CONSTANT, "\n";
# print $main_variable, "\n"; #error!
print $_[0], "\n"; # passed from main block, ok now
}
We have used a label MAIN: to prefix the start of the main program block to make it stand out. The use of the label MAIN: is entirely arbitrary; we could as easily have said MY_PROGRAM_STARTS_NOW:. However, MAIN: is friendlier to those coming from a C programming background where a main function is required. Of course, we could also create a real main subroutine, but then we would need to make sure that we call it.
The issue of scoping variables so they are invisible from subroutines is not a minor one. If we had failed to enable warnings and strict mode, and if we had uncommented the second line of print_variables, Perl would have happily accepted the undefined variable $main_variable and printed out a blank line. By placing otherwise global variables inside the scope of a main block, we prevent them from being accidentally referred to inside subroutines, which should not be able to see them.
Blocks As Loops
Bare blocks can sometimes be treated as loops, which are discussed in detail later in the chapter. A block that is neither syntactically required (for example, by an if statement) nor part of a loop statement can be treated as a loop that executes only once and then exits. This means that loop control statements like next, last, and redo will work in a block. Because blocks are one-shot loops, next and last are effectively the same. However, redo will reexecute the block.
In short, these three loops all do the same thing, one with a while, one with a foreach, and one with a bare block and a redo:
#!/usr/bin/perl
# while.pl
use warnings;
use strict;
my $n = 0;
print "With a while loop:\n";
while (++$n < 4) { print "Hello $n\n"; }
print "With a foreach loop:\n";
foreach my $n (1..3) { print "Hello $n\n"; }
print "With a bare block and redo:\n";
$n = 1; { print "Hello $n\n"; last if (++$n > 3); redo; }
The block of an if statement is required syntactically, and if is not a loop statement, so the redo statement here will not work:
#!/usr/bin/perl
# badblockloop.pl
use warnings;
use strict;
if (defined(my $line = <>)) {
last if $line =~ /quit/;
print "You entered: $line";
$line = <>;
redo;
}
print "Bye!\n";
The fact that redo, next, and last do not work in if blocks is actually a blessing. Otherwise it would be hard, albeit not impossible, to break out of a loop conditionally. Instead we get a run-time error.
Can't "redo" outside a loop block at ./badblockloop.pl line 10, <> line 2.
However, we can nest blocks inside each other, so by adding an extra bare block we can fix the preceding program so that it will work.
#!/usr/bin/perl
# Blockloop.pl
use warnings;
use strict;
if (defined(my $line = <>)) { { # <- note the extra block
last if $line =~ /quit/;
print "You entered: $line";
$line = <>;
redo;
} }
print "Bye!\n";
Using blocks as loops is an interesting approach to solving problems, but it is not always the simplest or easiest to understand. The preceding script could more easily be fixed simply by replacing the if with a while. This makes more sense and does not require an extra block because while is a looping statement:
#!/usr/bin/perl
# blockwhile.pl
use warnings;
use strict;
while (my $line = <>) {
last if $line =~ /quit/;
print "You entered: $line";
}
print "Bye!\n";
We cover loops in more detail later in the chapter.
The do Block
Blocks do not normally return a value; they are compound statements, not expressions. They also provide a void context, which applies to the last statement in the block. This causes its value to be discarded, just as all the statements before it are. However, the do keyword allows blocks to return their values as if they were expressions, the value being derived from the last statement. Let's consider an example:
@words = do {
@text = ("is", "he", "last");
sort @text;
};
In this example, a list is generated and returned by the sort function. Unlike in a subroutine, we cannot make this more explicit by adding a return beforehand: inside a do block, return exits the enclosing subroutine rather than the block itself. The value of the block is always that of the last statement evaluated.
Because prefixing do to a block turns it into an expression, it often needs to be followed by a semicolon when used as the final part of a statement. Omitting the final semicolon from statements like the preceding one is a common mistake, because in any other context a block does not require a following semicolon.
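As a small illustration of a do block used as an expression, the following sketch assigns from one do block in list context and from another in scalar context. The variable names are assumptions chosen for the example; note the semicolon after each closing brace.

```perl
#!/usr/bin/perl
# doblock.pl - hypothetical example
use strict;
use warnings;

my @text = ("is", "he", "last");

# the value of the do block is the value of its last statement
my @words = do {
    my @copy = @text;    # @copy is private to the block
    sort @copy;
};                       # semicolon required: the do block ends a statement

# a do block works in scalar context too
my $longest = do {
    my $max = 0;
    for my $word (@words) {
        $max = length($word) if length($word) > $max;
    }
    $max;
};

print "@words (longest word has $longest characters)\n";
```

Variables declared with my inside each block vanish when the block ends, so a do block is a convenient way to compute a value without leaving temporary variables behind.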
There is another, syntactic, reason for needing a do to return the value of a block. Without do, Perl would have a hard time telling apart a bare block from a hash definition.
$c = do { $a = 3, $b = 6 }; # a block, $c = 6
{ $a = 3; $b = 6 } # has a semicolon, therefore a block
# a hash definition, $c = {3 => 6}, test with 'print keys %{$c}'
$c = { $a = 3, $b = 6 };
Regarding the use of blocks as loops, do blocks are not considered loops by Perl, because the block is syntactically required by the do. Loop-control statements will therefore not work inside a do block. However, a do block can be suffixed with a loop condition such as while or until, in which case it is transformed into a loop.
do { chomp($line = <>); $input .= $line } until $line =~ /^stop/;
The block is executed before the condition is tested, so in this example the word stop will be added to the end of $input before the loop terminates.
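The same behavior can be tried without typing any input. This sketch replaces the standard input of the preceding one-liner with a hypothetical @queue array, so that the effect of testing the condition after the body is easy to observe.

```perl
#!/usr/bin/perl
# dountil.pl - hypothetical example
use strict;
use warnings;

my @queue = ("first", "second", "stop", "never read");
my $input = "";
my $line;

# the body runs before the condition is tested
do {
    $line = shift @queue;
    $input .= "$line ";
} until $line =~ /^stop/;

# "stop" has been appended, but "never read" is still in the queue
print "Collected: $input\n";
print "Left in queue: @queue\n";
```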
BEGIN and END blocks are special blocks that are executed outside the normal order of execution. We can use them in any application, though they are mostly used in modules, and accordingly we cover them in detail in Chapter 10. They are worth a brief examination here, because apart from the circumstances of their execution, they have a lot in common with regular bare blocks.
BEGIN blocks are executed during the compilation phase as they are encountered by the interpreter, so their contents are compiled and run before the rest of the source code is even compiled. We can define multiple BEGIN blocks, which are executed in the order the interpreter encounters them. This is especially relevant when we consider that the use statement uses an implicit BEGIN block to allow modules to export definitions before the main program code is compiled.
END blocks are the inverse of BEGIN blocks; they are executed by the interpreter after the application exits and before the exit of the interpreter itself. They are useful for "cleanup" duties such as closing database connections, resetting terminal properties, or deleting temporary files. We can also define multiple END blocks, in which case they are executed in reverse order of definition.
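The ordering rules for multiple blocks can be checked with a short sketch such as the following. The package variable @order and the messages are assumptions for illustration only; BEGIN blocks record their execution in the array, while the END blocks simply print.

```perl
#!/usr/bin/perl
# blockorder.pl - hypothetical example
use strict;
use warnings;

our @order;   # declared at compile time, so BEGIN blocks can use it

BEGIN { push @order, "first BEGIN" }
END   { print "first END defined, runs last\n" }

push @order, "main program";

BEGIN { push @order, "second BEGIN" }
END   { print "second END defined, runs first\n" }

# both BEGIN blocks ran before any run-time statement
print join(", ", @order), "\n";
```

Even though the second BEGIN block appears after the run-time push in the source, its entry precedes "main program" in the array, and the two END messages appear in the reverse of their order of definition.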
The following is a short script that shows both BEGIN and END blocks in action:
#!/usr/bin/perl
# begend.pl
use warnings;
use strict;
END {
print "Exiting...\n";
}
print "Running!\n";
fun();
sub fun {
print "Inside fun\n";
}
BEGIN {
print "Compiling...\n";
# can't call 'fun' - not compiled yet
# fun();
}
When run, this program prints out the following:
> perl begend.pl
Compiling...
Running!
Inside fun
Exiting...
As the output shows, the BEGIN block was executed first. Since the fun subroutine had not yet been compiled when the BEGIN block was executed, attempting to call fun from inside the BEGIN block would cause an error. On the other hand, the END block, which is defined first in the program, is executed last.
Perl actually defines five special block types, though only BEGIN and END are in widespread use. Two others are CHECK and INIT, which take place just after the compile phase and just before the run phase, respectively. Though these are rarely used in practice, we cover them in Chapter 10 also. The final special block is DESTROY (strictly speaking a specially named method rather than a block), and it is used in object-oriented modules covered in Chapter 19.
Conditional statements execute the body of the statement (sometimes known as a branch or branch of execution) only if a given Boolean condition is met. The condition is an expression whose value is used to determine the course of execution. Perl's primary mechanism for conditional execution is the if statement and its related keywords, unless, else, and elsif. However, Perl being as flexible as it is, there are other ways we can write conditions too.
Multiple-branch conditions are implemented in other languages using special multiple-branch conditional statements like switch or case. Perl has no such equivalent, because it does not need one. As we will see, there are already plenty of ways to write a multiple-branch condition in Perl. However, for those who really must have a dedicated switch statement, Perl provides the Switch module.
Before embarking on a detailed look at these statements, it is worth taking a brief diversion to discuss the nature of truth in Perl.
Perl has a very broad-minded view of the meaning of true and false. In general, anything that has a nonempty, non-zero value is true. Anything else is false: the number 0, the empty string, the empty list, and the undefined value. Although 0 is false, it is still information of a sort, especially compared to undef, and the two can be told apart when we need to.
There are a few special cases. The string "0" is considered false even though it has a value, as a convenience to calculations that are performed in string context. The undefined value also evaluates to false for the purposes of conditions. However, a string of spaces is true, as is the string "00". The examples in Table 6-1 illustrate various forms of truth and falsehood.
Table 6-1. True and False Values
Value | True/False |
1 | True |
-1 | True |
"abc" | True |
0 | False |
"0" | False |
"" | False |
" " | True |
"00" | True |
"0E0" | True (this is returned by some Perl libraries) |
"0 but true" | True (ditto) |
() | False (empty list) |
undef | False |
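The scalar entries in the table are easy to verify. The following sketch uses a hypothetical helper subroutine, truth, that reports how Perl evaluates each value in a Boolean context.

```perl
#!/usr/bin/perl
# truth.pl - hypothetical example
use strict;
use warnings;

# return the word "true" or "false" for any scalar value
sub truth {
    my ($value) = @_;
    return $value ? "true" : "false";
}

my @tests = (1, -1, "abc", 0, "0", "", " ", "00", "0E0", "0 but true", undef);
for my $value (@tests) {
    # undef cannot be interpolated without a warning, so label it by hand
    my $label = defined $value ? "'$value'" : "undef";
    printf "%-14s %s\n", $label, truth($value);
}
```

Testing undef for truth does not itself produce a warning; only using it as a string or number does, which is why the label is built separately.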
To distinguish between the undefined value and other values that evaluate to false, we can use the defined function; for instance:
if (defined $var) {
print "$var is defined";
}
The ability to handle undef as distinct from true and false is very useful. For example, it allows functions and subroutines that return a Boolean result to indicate an "error" by returning undef. If we want to handle the error, we can do so by checking for undef. If we do not care or need to know, we can just check for truth instead.
if (defined($var) && $var) {
print "true\n";
}
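As a sketch of this three-way convention, the hypothetical subroutine is_even below returns 1 or 0 for a valid integer and undef when its argument is not an integer at all. The caller can then choose whether to distinguish the error case.

```perl
#!/usr/bin/perl
# threeway.pl - hypothetical example
use strict;
use warnings;

# returns 1 for even, 0 for odd, and undef if the argument is not an integer
sub is_even {
    my ($n) = @_;
    return undef unless defined $n && $n =~ /^-?\d+$/;
    return $n % 2 == 0 ? 1 : 0;
}

for my $input (4, 7, "seven") {
    my $result = is_even($input);
    if (not defined $result) {
        print "$input: error, not an integer\n";
    } elsif ($result) {
        print "$input: even\n";
    } else {
        print "$input: odd\n";
    }
}
```

A caller that does not care about the error case can simply write if (is_even($input)), in which case undef is treated the same as false.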
As we have already seen, basic conditions can be written using an if statement. The basic form of an if statement is as follows (note that a trailing semicolon is not required):
if (EXPRESSION) BLOCK
Here EXPRESSION is any Perl expression, and BLOCK is a compound statement: one or more Perl statements enclosed by curly braces. BLOCK is executed only if EXPRESSION is true. For instance, in the preceding example, the block that contains print "true\n" would be executed only if the expression (defined($var) && $var) evaluates to true, that is, only if $var is defined and true.
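We can invert the syntax of an if statement and put the action first. In this case, we can omit the parentheses of the condition, and the action becomes a bare statement or a comma-separated list of statements rather than a block. If we do want a block in this position, it must be prefixed with do, since a bare block cannot take a trailing condition. The following forms of if statement are all legal in Perl:
do BLOCK if EXPRESSION;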
STATEMENT if EXPRESSION;
STATEMENT, STATEMENT, STATEMENT if EXPRESSION;
For example:
print "Equal" if $a eq $b;
print (STDERR "Illegal Value"), return "Error" if $not_valid;
close FILE, print ("Done"), exit if $no_more_lines;
return if $a ne $b;
The use of the comma operator here deserves a little attention. In a list context (that is, when placed between parentheses), the comma operator generates lists. However, that is not how it is being used here. In this context, it simply returns the right-hand side, discarding the left, so it becomes a handy way to combine several statements into one and relies on the fact that most statements are also expressions.
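The following sketch shows both effects of the comma operator in this role: joining two expressions under a single trailing condition, and yielding its right-hand value in scalar context. The array @log and the flag variables are assumptions introduced for the example.

```perl
#!/usr/bin/perl
# comma.pl - hypothetical example
use strict;
use warnings;

my @log;
my ($valid, $invalid) = (1, 0);

# the comma joins two expressions into one statement governed by a single 'if'
push(@log, "checked"), push(@log, "accepted") if $valid;
push(@log, "checked"), push(@log, "rejected") if $invalid;

# in scalar context the comma operator evaluates its left-hand side,
# discards the result, and yields its right-hand side
my $value = (push(@log, "side effect"), "right-hand side");

print "Log: @log\n";
print "Value: $value\n";
```

Only the first pair of push calls runs, because the condition applies to the whole comma-separated expression and not just to the final part.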
The inverted syntax is more suitable for some conditions than others. As Perl's motto has it, there is more than one way to do it (so long as the program remains legible). In the preceding examples, only the last return statement is really suited to this style; the others would probably be better off as normal if statements.
Beware of setting up a variable in an inverted conditional statement, since the variable will only receive a value if the condition succeeds. This can lead to unexpected warnings if we have warnings enabled and unexpected bugs otherwise.
use warnings;
$arg = $ARGV[0] if @ARGV;
if ($arg eq "help" ) { # $arg may be undefined here
print "Usage:\n";
...
}
...
We would be unlikely to leave $arg undefined if we had written a conventional if statement, because the assignment would be inside the block, making it obvious that it only happens conditionally. The inverted syntax, however, can fool us into reading it as an unconditional assignment.
If-then-else conditions are implemented with the else keyword.
if (EXPRESSION) BLOCK else BLOCK
For example:
# First 'if else' tests whether $var is defined
if (defined $var) {
# If $var is defined, the second 'if else' tests whether $var is true
if ($var) {
print "true\n";
} else {
print "false\n";
}
} else {
print "undefined\n";
}
However, it is not legal (and would not be elegant even if it were) to invert an if statement and then add an else clause.
# ERROR!
return if $not_valid else { print "ok" };
If we have multiple mutually exclusive conditions, then we can chain them together using the elsif keyword, which may occur more than once and may or may not be followed by an else.
if (EXPRESSION) BLOCK elsif (EXPRESSION) BLOCK elsif...
if (EXPRESSION) BLOCK elsif (EXPRESSION) BLOCK else BLOCK
For example, to compare strings using just if and else we might write
if ($a eq $b) {
print "Equal";
} else {
if ($a gt $b) {
print "Greater";
} else {
print "Less";
}
}
The equivalent code written using elsif is simpler to understand, shorter, and avoids a second level of nesting:
if ($a eq $b) {
print "Equal";
} elsif ($a gt $b) {
print "Greater";
} else {
print "Less";
}
Note that the else if construct, while legal in other languages such as C, is not legal in Perl and will cause a syntax error. In Perl, use elsif instead. Also note that if $a is less than $b most of the time, then we would be better off rewriting this statement to test $a lt $b first, then $a gt $b or $a eq $b second. It pays to work out the most likely eventuality and then make that the fastest route through our code.
If the conditions are all testing the same expression with different values, then there are more efficient ways to do this. See "Switches and Multibranched Conditions" later in the chapter for some examples.
The if, unless, and elsif keywords all permit a variable to be declared in their conditions. For example:
if (my @lines = <HANDLE>) { # test whether any lines could be read from HANDLE
# ...do something to file contents...
} else {
print "Nothing to process\n";
}
The scope of variables declared in this fashion extends from the declaration through the rest of the conditional statement, including any elsif and else clauses, but not beyond it. Here @lines can therefore be used in both the if clause and the else clause, but not after the end of the statement.
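The reach of such a declaration can be checked directly. According to Perl's own documentation (perlsyn), a variable declared in a condition remains visible through the rest of that conditional, including any elsif and else clauses. The following sketch, with hypothetical names @queue, $count, and $report, reads the variable from inside the else clause under use strict.

```perl
#!/usr/bin/perl
# condscope.pl - hypothetical example
use strict;
use warnings;

my @queue = ();
my $report;

if ((my $count = scalar @queue) > 0) {
    $report = "$count items to process";
} else {
    # $count is still in scope in the else clause
    $report = "nothing to process (count is $count)";
}

# $count is out of scope here; mentioning it would be a compile-time
# error under use strict
print "$report\n";
```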
If we replace the if in an if statement with unless, the condition is inverted. This is handy for testing a condition that we want to act on if it evaluates to false, such as trapping error conditions.
# unless file filename is successfully opened then return a failure message
unless (open FILE, $filename) {
return "Failed to open $filename: $!";
}
We can also invert the syntax of an unless statement, just as we can with if.
return "Failed to open $filename: $!" unless (open FILE, $filename);
This is exactly the same as inverting the condition inside the parentheses but reads a little better than using an if and not:
if (not open FILE, $filename) {
return "Failed to open $filename: $!";
}
It is perfectly legal, though possibly a little confusing, to combine unless with an else or elsif as in the following:
unless (open FILE, $filename) {
return "Failed to open $filename: $!";
} else {
@lines = <FILE>;
foreach (0..$#lines) {
print "This is a line\n";
}
close FILE;
}
In this case, it is probably better to write an if-not expression or to invert the clauses, since unless-else is not a natural English construct.
Perl's logical operators automatically execute a shortcut to avoid doing unnecessary work whenever possible, a feature Perl shares with most languages derived from or inspired by C (see Chapter 4). Take the following example:
$result = try_first() || try_second() || try_third();
If try_first returns a true value, then clearly Perl has no need to even call the try_second or try_third functions, since their results will not be used; $result takes only one value, and that would be the value returned by try_first. So Perl takes a shortcut and does not call them at all.
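We can watch the shortcut happen by making each function record that it was called. In this sketch the three functions and the @called array are hypothetical stand-ins; the || operator is used so that the assignment receives the first true value in the chain.

```perl
#!/usr/bin/perl
# shortcut.pl - hypothetical example
use strict;
use warnings;

my @called;

# each function notes that it ran, then returns a value
sub try_first  { push @called, "first";  return 0 }
sub try_second { push @called, "second"; return "second value" }
sub try_third  { push @called, "third";  return "third value" }

my $result = try_first() || try_second() || try_third();

print "Result: $result\n";
print "Called: @called\n";
```

Because try_second returns a true value, try_third is never called, and its name never appears in @called.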
We can use this feature to write conditional statements using logical operators instead of if and unless. For example, a very common construct to exit a program on a fatal error uses the die function, which prints out an error message and terminates the program.
open (FILE, $filename) or die "Cannot open file: $!";
This is equivalent to, but more direct than, the more conventional
unless (open FILE, $filename) {
die "Cannot open file: $!";
}
We can also provide a list of statements (separated by commas) or a do block for the condition to execute on failure. Here is an example that supplies a list:
open (FILE, $filename) or
print (LOG "$filename failed: $!"), die "Cannot open file:$!";
Not every programmer likes using commas to separate statements, so we can instead use a do block. This also avoids the need to use parentheses to delineate the arguments to print.
open (FILE, $filename) or do {
print LOG "$filename failed: $!";
die "Cannot open file: $!";
};
When writing conditions with logical operators, it is good practice to use the low-precedence and, or, and not operators, instead of the higher-precedence &&, ||, and !. This prevents precedence from changing the meaning of our condition. If we were to change the previous example to
# ERROR! This statement ...
open (FILE, $filename)
|| print (LOG "$filename failed: $!"), die "Cannot open file:$!";
Perl's precedence rules would cause it to interpret this as a list containing a condition and a die statement.
# ERROR! ... actually means this
(open (FILE, $filename) || print (LOG "$filename failed: $!")),
die "Cannot open file:$!";
As a result, this statement will always cause the program to die with a "Cannot open file" message, regardless of whether the open failed or succeeded.
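The difference in grouping is easy to demonstrate without opening any files. In this sketch, succeeds is a hypothetical stand-in for a successful open, and the two arrays record which of the "failure" actions were actually run.

```perl
#!/usr/bin/perl
# precedence.pl - hypothetical example
use strict;
use warnings;

sub succeeds { return 1 }   # stands in for an operation that works

my (@with_or, @with_bars);

# 'or' binds more loosely than the comma:
#   succeeds() or (push ..., push ...)
succeeds() or push(@with_or, "logged"), push(@with_or, "handled");

# '||' binds more tightly than the comma:
#   (succeeds() || push ...), push ...
succeeds() || push(@with_bars, "logged"), push(@with_bars, "handled");

print "with or: ", scalar(@with_or),   " action(s) run\n";
print "with ||: ", scalar(@with_bars), " action(s) run: @with_bars\n";
```

With or, nothing runs because the operation succeeded. With ||, the second action runs unconditionally, which is the same mistake that makes the earlier example always die.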
Using a do block avoids all these problems and also makes the code easier to comprehend. Either a || or an or will work fine in this rewritten example:
open (FILE, $filename) || do {
print LOG "$filename failed: $!";
die "Cannot open file:$!";
};
Whether this is better than the original if form is questionable, but it does emphasize the condition in cases where the condition is actually the point of the exercise. In this case, the open is the most significant thing happening in this statement, so writing the condition in this way helps to emphasize and draw attention to it.
The drawback of these kinds of conditions is that they do not lend themselves easily to else-type clauses. The following is legal, but tends toward illegibility:
open (FILE, $filename), $text = <FILE> or die "Cannot open file: $!";
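Note that the value tested here is the result of the read rather than of the open: or has lower precedence than the comma, so it sees the value of the whole comma expression, which is that of $text = <FILE>. If the open fails, the read is attempted on a closed filehandle. Using || instead of or would be worse still; it has higher precedence than both the comma and the assignment, so the die would be attached to <FILE> alone.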
The ternary operator, ?:, is a variant of the standard if-style conditional statement that works as an expression and returns a value that can be assigned or used in other expressions. It works identically to the ternary operator in C. The operator evaluates the first expression: if that expression is true, it returns the value of the second expression; and if the first expression was false, then the operator returns the value of the third expression. This is what it looks like:
result = expression1 ? expression2 : expression3
The ternary operator is very convenient when the purpose of a test is to return one of two values rather than follow one of two paths of execution. For example, the following code snippet adds a plural s conditionally, using a conventional if-else condition:
#!/usr/bin/perl
# plural_if.pl
use warnings;
use strict;
my @words = split (/\s+/, <>); #read some text and split on whitespace
my $count = scalar (@words);
print "There ";
if ($count == 1) {
print "is";
} else {
print "are";
}
print " $count word";
unless ($count == 1) {
print "s";
}
print " in the text\n";
Running this program and entering some text produces messages like
There are 0 words in the text
There is 1 word in the text
There are 4 words in the text
The same code rewritten using the ternary operator is considerably simpler.
#!/usr/bin/perl
# plural_ternary.pl
use warnings;
use strict;
my @words = split (/\s+/, <>); #read some text and split on whitespace
my $words = scalar (@words);
print "There ", ($words == 1)?"is":"are"," $words word",
($words == 1)?"":"s"," in the text\n";
We can also nest ternary operators, though doing this more than once can produce code that is hard to read. The following example uses two ternary operators to compute a value based on a string comparison using cmp, which can return −1, 0, or 1:
#!/usr/bin/perl
# comparison.pl
use warnings;
use strict;
my @words = split (/\s+/,<>);
die "Enter two words\n" unless scalar(@words) == 2;
my $result = $words[0] cmp $words[1];
print "The first word is ", $result ? $result>0 ? "greater than" :
"less than" : "equal to"," the second\n";
This program checks that we have entered exactly two words, and if so it prints out one of the following three messages:
The first word is less than the second
The first word is greater than the second
The first word is equal to the second
The nested ternary operators know which ? and : belong where, but it does not make for legible code. To improve upon this, the last line is probably better written with parentheses.
print "The first word is ", $result
? ($result > 0 ? "greater than" : "less than")
: "equal to", " the second\n";
This makes it much simpler to see which expression belongs to which condition.
Be careful when combining the ternary operator into larger expressions. The precedence of operators can sometimes cause Perl to group the parts of an expression in ways we did not intend, as in the following example:
#!/usr/bin/perl
# plural_message.pl
use warnings;
use strict;
my @words = split (/\s+/, <>);
my $words = scalar (@words);
#ERROR!
my $message = "There ". ($words==1) ? "is" :
"are". " $words word".
($words == 1)?"" : "s". " in the text\n";
print $message;
This appears to do much the same as the previous example, except it stores the resulting message in an intermediate variable before printing it. But (unlike the comma operator) the precedence of the concatenation operator, ., is greater than that of the ternary ? or :, so the meaning of the statement is entirely changed. Using explicit parentheses, the earlier version that used commas is equivalent to
"There ", (($words==1)? "is" : "are"), " $words word",
(($words == 1)?"" : "s"), " in the text\n";
But with the concatenation operator, what we actually get is
("There ". ($words==1))? "is" : (("are". " $words word".
($words == 1))? "" : ("s". " in the text\n"));
The expression ("There ". ($words==1)) always evaluates to a true value, so the result of running this program will always be to print the word "is" regardless of the input we give it.
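One way to repair the statement is to wrap each ternary in its own parentheses, so that the concatenation operator cannot absorb its condition. The following is a minimal sketch, assuming a hypothetical helper named plural_message that takes the word count as its argument:

```perl
use strict;
use warnings;

# plural_message is a hypothetical helper introduced for illustration.
# Each ternary is parenthesized so that '.' cannot capture its condition.
sub plural_message {
    my ($words) = @_;
    return "There " . ($words == 1 ? "is" : "are") . " $words word"
         . ($words == 1 ? "" : "s") . " in the text\n";
}

print plural_message($_) foreach (0, 1, 4);
```

With the parentheses in place, the concatenation and comma versions build the same message.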
One final trick that we can perform with the ternary operator is to use it with expressions that return an lvalue (that is, an assignable value). An example of such an expression is the substr function.
#!/usr/bin/perl
# fix.pl
use warnings;
use strict;
my $word = "mit";
my $fix = "re";
my $before = int(<>); #no warnings in case we enter no numeric text
($before ? substr($word, 0, 0): substr ($word, length($word), 0)) = $fix;
print $word, "\n";
In this program the contents of $fix are either prefixed or postfixed to the contents of the variable $word. The ternary operator evaluates to either the beginning or the end of the value in $word as returned from substr. This value is then assigned the value of $fix, modifying the contents of $word, which is then printed out.
The result of this program is either the word remit, if we enter any kind of true value (such as 1), or mitre, if we enter either nothing or a string that evaluates to false (such as "0", or a nonnumeric value).
A switch is a conditional statement that contains multiple branches of execution. It can be thought of as a rotary switch with several different positions. A simple but crude way to implement a switch is with an if...elsif...else statement, as we have already seen.
if ($value == 1) {
print "First Place";
} elsif ($value == 2) {
print "Second Place";
} elsif ($value == 3) {
print "Third Place";
} else {
print "Try Again";
}
The problem with this kind of structure is that after a few conditions it becomes hard to understand. Perl does not have a built-in multiple-branch conditional statement like C or Java, but it does not really need one as there are many ways to achieve the same effect, including the Switch module for those who disagree. Here are two ways of writing the same set of conditions in a block:
SWITCH: {
if ($value == 1) { print "First Place" };
if ($value == 2) { print "Second Place" };
if ($value == 3) { print "Third Place" };
if ($value > 3) { print "Try Again" };
}
SWITCH: {
$value == 1 and print "First Place";
$value == 2 and print "Second Place";
$value == 3 and print "Third Place";
$value > 3 and print "Try Again";
}
Here the block does not actually do anything useful except to allow us to group the conditions together for clarity. The SWITCH: label that prefixes the block likewise has no function except to indicate that the block contains a multiple-branch condition. Both of these examples are also less efficient than the original example because all conditions are tested, even if an earlier one matches. But as we saw earlier, bare blocks can be considered loops, so we can use the last loop control statement to break out of the block after the correct match.
SWITCH: {
$value == 1 and print ("First Place"), last;
$value == 2 and print ("Second Place"), last;
$value == 3 and print ("Third Place"), last;
print "Try Again"; # default case
}
As a bonus, the use of last, like break in C, guarantees that we cannot go on to match more than one condition, which in turn allows us to express later conditions a little more loosely, since they do not have to worry about avoiding matches against values now catered for by earlier cases.
We can also make use of the label to make our last statements more explicit.
SWITCH: {
$value == 1 and print ("First Place"), last SWITCH;
$value == 2 and print ("Second Place"), last SWITCH;
$value == 3 and print ("Third Place"), last SWITCH;
print "Try Again"; # default case
}
Here the meaning of last is clear enough, but the label can be very useful in longer clauses and particularly in multiple nested blocks and conditions.
If the cases we want to execute have only one or two statements and are similar, it is fine just to write them as a comma-separated list, as in this example. If the cases are more complex, however, this rapidly becomes illegible. A better solution in this case might be to use do blocks.
SWITCH: {
$value == 1 and do {
print "First Place";
last;
};
$value == 2 and do {
print "Second Place";
last;
};
$value == 3 and do {
print "Third Place";
last;
};
print "Try Again";
}
Note that a do block does not count as a loop, so the last statements still apply to the switch block that encloses them. This is fortunate; otherwise we would have to say last SWITCH to ensure that the right block is referred to. Of course, we can still use the label for clarity if we choose, as noted previously.
If we are testing the value of a string rather than an integer, we can reproduce the preceding techniques but just replace the conditions with string equality tests.
SWITCH: {
$value eq "1" and print ("First Place"), last;
$value eq "2" and print ("Second Place"), last;
$value eq "3" and print ("Third Place"), last;
print "Try Again";
}
Having said this, if our strings are numeric, we can do a numeric comparison if need be. In this example, $value eq "1" and $value == 1 have precisely the same result, thanks to Perl's automatic string-to-number conversion. Of course, this only holds so long as the strings are in canonical form: "1.0" and "01" are numerically equal to 1 but are not the same string as "1".
We can also use regular expression matching.
SWITCH: {
$value =~ /^1$/ and print("First Place"), last;
$value =~ /^2$/ and print("Second Place"), last;
$value =~ /^3$/ and print("Third Place"), last;
print "Try Again";
}
This might not seem much of an improvement, but regular expressions have the useful feature that if they are not associated with a value, then they use the contents of the special variable $_ that Perl provides internally. As we mentioned earlier, it is the "default variable" that functions read from or write to if no alternative variable is given. We will see in "Using foreach with Multibranched Conditions" how to use this with foreach to rewrite our switch.
The Switch Module
The Switch module gives Perl a bona fide switch and case statement, allowing us to write multibranch conditions in a similar style to languages that provide them natively.
use Switch;
switch (int(10 * rand)) {
case 1 { print "First Place" }
case 2 { print "Second Place" }
case 3 { print "Third Place" }
else { print "...Also Ran" }
}
As this example illustrates, a default branch can also be created with else, which works just the same as it would in an if...else statement.
An advantage of this switch statement is that by default cases do not fall through, that is, once a given case matches the value, no further cases are considered. This differs from both C, where we must explicitly break out of the switch, and the examples earlier, where we had to use last. If we actually want to fall through to other cases, we can explicitly do so with next.
...
case 4 { print "Fourth Place"; next }
...
If the switch is given the value 4, it will now output Fourth Place...Also Ran.
Alternatively, to have all cases fall through by default (in the style of C), append 'fallthrough' to the use statement. To break out of the switch, we must now request it explicitly with last, as in the previous examples. The next example is equivalent to the previous one, with the additional case 4, but with fall through enabled as the default:
use Switch 'fallthrough';
switch (int(10 * rand)) {
case 1 { print "First Place"; last }
case 2 { print "Second Place"; last }
case 3 { print "Third Place"; last }
case 4 { print "Fourth Place" }
else { print "...Also Ran" }
}
The conditions can be almost anything and will be evaluated in the most appropriate way. We can use numbers, strings, and regular expressions. We can also use hashes and hash references (true if the value being tested exists as a key in the hash), array references (true if the value is in the array), code blocks, and subroutines. Here is an example that exercises most of the possibilities available:
#!/usr/bin/perl
# bigswitch.pl
use strict;
use warnings;
use Switch;
my $perl = "Perl";
my %hash = ( "pErl" => 2, "peRl" => 3 );
my $cref = sub { $_[0] eq "pERl" };
sub testcase { $_[0] eq "peRL" };
my @array = (2..4);
my @values=qw[
1 perl Perl 3 6 pErl PerL pERL pERl peRL PERL php
];
foreach my $input (@values) {
switch ($input) {
case 1 { print "1 literal number" }
case "perl" { print "2 literal string" }
case ($perl) { print "3 string variable" }
case (\@array) { print "4 array variable reference" }
case [5..9] { print "5 literal array reference" }
case (\%hash) { print "6 hash key" }
case { "PerL" => "Value" } { print "7 hash reference key" }
case { $_[0] eq "pERL" } { print "8 anonymous sub" }
case ($cref) { print "9 code reference (anonymous)" }
case (\&testcase) { print "A code reference (named)" }
case /^perl/i { print "B regular expression" }
else { print "C not known at this address" }
}
print "\n";
}
The seventh and eighth cases in the previous example, hash reference and anonymous subroutine, bear a little closer examination. Both are delimited by curly braces, but the switch can tell them apart because of the operator in use (=> versus eq). This prescience is possible because the Switch module actually parses the source code just prior to execution and works out the most sensible thing to do based on what it sees.
The anonymous subroutine also bears examination because it refers to the variable $_[0], which is not otherwise defined in this program. What is actually going on here is hinted at by the fact that this case is called "anonymous subroutine." The block { $_[0] eq "pERL" } is actually a subroutine defined in place within the case statement, and $_[0] simply accesses the first argument passed to it, which is the value of $input. It is therefore exactly equivalent to the ninth and tenth "code reference" cases, just more concise.
Interestingly, the switch value can also be a code reference or subroutine, in which case the case tests are applied to it instead. There are limitations to this method, since there is no way to pass a conventional text value. Instead it must be written explicitly into the subroutine.
#!/usr/bin/perl -w
# switchonsub.pl
use strict;
use Switch;
my $input;
sub lessthan { $input < $_[0] };
$input=int(<>);
switch ( \&lessthan ) {
case 10 { print "less than 10" }
case (100-$input) { print "less than 50" }
case 100 { print "less than 100" }
}
There are friendlier ways to handle this kind of situation using a closure (see Chapter 7), and not every subroutine-based switch necessarily needs to reference a global variable the way this one does, but in a lot of cases there is likely to be a better way to express the problem, for instance with explicit case conditions like case { $_[0] < 10 }.
Perl 6 will provide a native multibranch statement, but using given and when in place of switch and case. The Switch module can be told to use Perl 6 terminology by appending 'Perl6' to the use statement.
use Switch 'Perl6';
given ($value) {
when 1 { print "First Place" }
when 2 { print "Second Place" }
when 3 { print "Third Place" }
}
Simple if and unless statements do not return a value, but this is not a problem since we can write a conditional expression using the ternary operator. For multiple-branch conditions, we have to be more inventive, but again Perl provides several ways for us to achieve this goal. One way to go about it is with logical operators using a do block.
print do {
$value == 1 && "First Place" ||
$value == 2 && "Second Place" ||
$value == 3 && "Third Place" ||
"Try again"
}, "\n";
If this approach does not suit our purposes, we can always resort to a subroutine and use return to return the value to us.
sub placing {
$_[0] == 1 and return "First Place";
$_[0] == 2 and return "Second Place";
$_[0] == 3 and return "Third Place";
return "Try Again";
}
print placing ($value), "\n";
Or, using the ternary operator:
sub placing {
return $_[0] == 1? "First place" :
$_[0] == 2? "Second place" :
$_[0] == 3? "Third place" :
"Try Again";
}
While this works just fine, it does not scale indefinitely. For situations more complex than this, it can be easier to decant the conditions and return values into the keys and values of a hash, then test for the hash key. Finally, there is another solution involving foreach, which we will also consider in "Using foreach with Multibranched Conditions."
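The hash-based approach can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the hash name %place_for is an assumption made for the example, and placing is rewritten here to look its argument up in that hash.

```perl
use strict;
use warnings;

# %place_for is a hypothetical lookup table: conditions become keys,
# return values become values.
my %place_for = (
    1 => "First Place",
    2 => "Second Place",
    3 => "Third Place",
);

sub placing {
    my ($value) = @_;
    # exists tests for the key itself, so a stored value that happened
    # to be false would still be found
    return defined $value && exists $place_for{$value}
        ? $place_for{$value}
        : "Try Again";
}

print placing($_), "\n" foreach (1, 3, 7);
```

Adding a new case is now a matter of adding a key-value pair, and the default is handled in one place.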
A loop is a block of code that is executed repeatedly, according to the criteria of the loop's controlling conditions. Perl provides two kinds of loop:
for and foreach
while and until
The distinction between the two types is in the way the controlling conditions are defined.
The for and foreach loops iterate over a list of values given either explicitly or generated by a function or subroutine. The sense of the loop is "for each of these values, do something." Each value in turn is fed to the body of the loop for consideration. When the list of values runs out, the loop ends.
The while and until loops, on the other hand, test a condition each time around the loop. The sense of the loop is "while this condition is satisfied, keep doing something." If the condition succeeds, the loop body is executed once more. If it fails, the loop ends. There is no list of values and no new value for each iteration, unless it is generated in the loop body itself.
Both kinds of loop can be controlled using statements like next, last, and redo. These statements allow the normal flow of execution in the body of a loop to be restarted or terminated, which is why they are also known as loop modifiers. We have already talked about loop modifiers briefly in Chapter 2, but will learn more about them later in this chapter.
Because Perl is such a versatile language, there are also ways to create loop-like effects without actually writing a loop. Perl provides functions such as map and grep that can often be used to produce the same effect as a foreach or while loop, often more concisely and sometimes more efficiently. In particular, if the object of a loop is to process a list of values and convert them into another list of values, map may be a more effective solution than an iterative foreach loop.
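As a rough comparison, here is the same list transformation written both ways. The helper names capitalize_loop and capitalize_map are hypothetical, chosen only for this sketch.

```perl
use strict;
use warnings;

# Both subs are hypothetical helpers that capitalize a list of words.

# foreach version: build the result one element at a time
sub capitalize_loop {
    my @result;
    foreach my $word (@_) {
        push @result, ucfirst lc $word;
    }
    return @result;
}

# map version: the same transformation written as a single expression
sub capitalize_map {
    return map { ucfirst lc $_ } @_;
}

my @words = ("onE", "two", "THREE", "fOUR");
print join(",", capitalize_loop(@words)), "\n";
print join(",", capitalize_map(@words)), "\n";

# grep keeps only the elements for which its block is true
print join(",", grep { length($_) > 3 } capitalize_map(@words)), "\n";
```

Both subs return the same list; which reads better is largely a matter of how complicated the per-element work is.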
The for and foreach keywords are actually synonyms, and typically differ only in how they get used. for is used, by convention, for loops that imitate the structure of the for loop in C. Here's how a for loop can be used to count from nine to zero:
for ($n = 9; $n >= 0; $n--) {
print $n;
}
Any C programmer will recognize this syntax as being identical to C, with the minor exception of the dollar sign of Perl's scalar data type syntax. Similarly, to count from zero to nine we could write
for ($n = 0; $n < 10; $n++) {
print $n, "\n";
sleep 1;
}
print "Liftoff!\n";
The parenthesized part of the for loop contains three statements: an initialization statement, a condition, and a continuation statement. These are usually (but not always) used to set up and check a loop variable, $n in these examples. The initialization statement (here $n=0) is executed before the loop starts. Just before each iteration of the loop the condition $n<10 is tested. If true the loop is executed; if false the loop finishes. After each completion of the loop body, the continuation statement $n++ is executed. When $n reaches 10, the condition fails and the loop exits without executing the loop body, making 9 the last value of $n to be printed and giving $n the value 10 after the loop has finished.
In the preceding example, we end up with the scalar variable $n still available, even though it is only used inside the loop. It would be better to declare the variable so that it only exists where it is needed. Perl allows the programmer to declare the loop variable inside the for statement. A variable declared this way has its scope limited to the body of the for loop, so it exists only within the loop statement:
for (my $n = 0; $n < 10; $n ++) {
print $n,' is ', ($n % 2)? 'odd' : 'even';
}
In this example, we declare $n lexically with my, so it exists only within the for statement itself. (For why this is a good idea and other scoping issues, see Chapter 8.)
As an aside, the for loop can happily exist with nothing supplied for the first or last statement in the parentheses. Remember, however, that the semicolons are still required to get C-style semantics since for and foreach are synonyms. The following is thus a funny-looking while loop:
for (; !eof (FILE) ;) {
print scalar(<FILE>);
}
While we are on the subject and jumping ahead for a moment, the optional continue block is really the same construct as the last statement of a C-style for loop, just with a different syntax. Here is the equivalent of the earlier for loop written using while:
$n = 0;
while ($n < 10) {
print $n, ' is ', ($n % 2)? 'odd': 'even';
} continue {
$n ++;
}
The C-style for loop is familiar to C programmers, but it is often unnecessarily complicated. For instance, one of the most common uses of a for loop in C is to iterate over the contents of an array using a loop variable to index the array. In the following example, the loop variable is $n, and it is used to index the elements of an array (presumed to already exist) called @array. The first element is at index 0, and the highest is given by $#array.
for (my $n = 0; $n <= $#array; $n++) {
print $array[$n], "\n";
}
However, we do not need to use an index variable. We can just iterate directly over the contents of the array instead. Although in practice for is usually used for the C style and foreach for the Perl style, the two keywords are actually synonyms, and both may be used in either the C or the Perl syntax. The convention of using each in its allotted place is not enforced by Perl but is generally considered good practice anyway. Here is the foreach (i.e., Perl-style) version of the preceding loop:
my $element;
foreach $element (@array) {
print $element, "\n";
}
Even better, foreach allows us to declare the loop variable in the loop. This saves a line because no separate declaration is needed. More importantly, it restricts the scope of the variable to the loop, just as with the for loop earlier. This means that if the variable did not exist beforehand, neither will it after.
foreach my $element (@array) {
print $element,"\n";
}
# $element does not exist here
If the loop variable already happens to exist and we don't use my, Perl localizes the variable when it is used as a loop variable, equivalent to using the local keyword. When the loop finishes, the old value of the variable is reestablished.
#!/usr/bin/perl
# befaft.pl
use warnings;
$var = 42;
print "Before: $var\n";
foreach $var (1..5) {
print "Inside: $var\n";
}
print "After: $var\n"; # prints '42', not '5'
This localization means that we cannot accidentally overwrite an existing variable, but it also means we cannot return the last value used in a foreach loop as we would be able to in C. If we need to do this, we may be better off using a while loop or a map. Of course, giving a loop variable the same name as a variable that already exists is confusing, prone to error, and generally a bad idea anyway.
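Another option, if the last value visited is needed after the loop, is to copy it into a variable declared outside the loop. The following is a minimal sketch, assuming a hypothetical helper named last_below that stops at the first value that reaches a limit:

```perl
use strict;
use warnings;

# last_below is a hypothetical helper: it returns the last value visited
# before the loop stopped, or undef if no value was below the limit.
sub last_below {
    my ($limit, @values) = @_;
    my $last_seen;                 # declared outside, so it survives the loop
    foreach my $value (@values) {
        last if $value >= $limit;
        $last_seen = $value;       # copy, because $value is gone after the loop
    }
    return $last_seen;
}

print last_below(4, 1 .. 10), "\n";
```

The loop variable itself stays private to the loop; only the copy escapes.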
If we really want to index an array by element number, we can still do that with foreach. A foreach loop needs a list of values, and we want to iterate from zero to the highest element in the array. So we need to generate a list from zero to the highest element index and supply that to the foreach loop. We can achieve that easily using the range operator and the $#array notation to retrieve the highest index:
foreach my $element (0..$#array) {
print "Element $element is $array[$element]\n";
}
Using a range is easier to read, but in versions of Perl prior to 5.005 it is less efficient than using a loop variable in a for (or while) loop, for the simple reason that the range operator creates a list of all the values between the two endpoints. For a range of zero to one hundred million, this involves the creation of a list containing one hundred million integers, which requires at least four hundred million bytes of storage. Of course, it is unlikely that we are handling an array with one hundred million values in the first place. However, the principle holds true, so be wary of creating large temporary arrays when you can avoid them. From Perl 5.005 onwards the range operator has been optimized to return values iteratively (rather like each) in foreach loops, so no temporary list is built and the memory cost disappears. This can be considered a reason to upgrade as much as a programming point, of course.
If no loop variable is supplied, Perl uses the "default" variable $_ to hold the current loop value.
foreach (@array) {
print "$_\n";
}
This is very convenient, especially with functions that default to using $_ if no argument is supplied, like the regular expression operators.
foreach (@array) {
/match_text/ and print "$_ contains a match!\n";
}
A final, somewhat unusual, form of the for/foreach loop inverts the loop to place the body before the for. This is the same syntax as the inverted if, but applied to a loop instead. For example:
/match_text/ and print ("$_ contains a match!\n") foreach @array;
This syntax can be convenient for very short loop bodies, but it is not really suitable if the foreach becomes obscured. The preceding example is borderline legible, for example, and a map or the former version would probably be better.
Using foreach with Multibranched Conditions
We have already mentioned that, when used with switches and multibranched conditions, regular expressions have the particularly useful feature of using $_ when they are not associated with a value. By combining this with a foreach loop, we can remove the test variable altogether. Without a defined loop variable, foreach assigns each value that it is given in turn to $_ inside the block that follows it. This means we can rewrite this statement:
SWITCH: {
$value =~ /^1$/ and print("First Place"), last;
$value =~ /^2$/ and print("Second Place"), last;
$value =~ /^3$/ and print("Third Place"), last;
print "Try Again";
}
like this:
SWITCH: foreach ($value) {
/^1$/ and print ("First Place"), last;
/^2$/ and print ("Second Place"), last;
/^3$/ and print ("Third Place"), last;
print "Try Again";
}
Note that the SWITCH label helps to remind us that this isn't a foreach loop in the usual sense, but it is not actually necessary.
We have also seen how to return a value from multibranched conditions using a do block, subroutine, or the ternary operator. However, foreach also comes in very handy here when used with logical operators:
foreach ($value) {
$message = /^1$/ && "First Place" ||
/^2$/ && "Second Place" ||
/^3$/ && "Third Place" ||
"Try Again";
print "$message\n";
}
Here we use a foreach to alias $value to $_, then test with regular expressions. Because $value is a scalar, not a list, the loop only executes once, but the aliasing still takes place. The short-circuit behavior of logical operators will ensure that the first matching expression will return the string attached to the && operator. Note that if we were writing more complex cases, parentheses would be in order; for this simple example we don't need them.
This approach works only so long as the resulting values are all true. In this case we are returning one of the strings First Place...Try Again, so there is no problem. For situations involving zero, an undefined value, or an empty string (all of which evaluate to false), we can make use of the ternary operator to produce a similar effect.
foreach ($value) {
$message = /^1$/? "First Place":
/^2$/? "Second Place":
/^3$/? "Third Place":
"Try Again";
print "$message\n";
}
The regular expressions in this example are testing against $_, which is aliased from $value by the foreach.
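The difference between the two forms shows up as soon as one of the results is false. The following is a minimal sketch, assuming two hypothetical subs, count_logical and count_ternary, that are meant to map the words 'none' and 'one' to 0 and 1 and anything else to -1:

```perl
use strict;
use warnings;

# Both subs are hypothetical; each is meant to return 0 for 'none',
# 1 for 'one', and -1 for anything else.

# Logical-operator chain: a false result (0) falls through to the next test
sub count_logical {
    my ($word) = @_;
    return $word eq 'none' && 0 ||
           $word eq 'one'  && 1 ||
           -1;
}

# Ternary chain: the selected branch is returned whether it is true or false
sub count_ternary {
    my ($word) = @_;
    return $word eq 'none' ? 0 :
           $word eq 'one'  ? 1 :
           -1;
}

print count_logical('none'), " ", count_ternary('none'), "\n";
```

Here count_logical('none') falls through to -1 because the intended result, 0, is false, whereas count_ternary('none') returns 0 as intended.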
Variable Aliasing in foreach Loops
If we are iterating over a real array (as opposed to a list of values), then the loop variable is not a copy but a direct alias for the corresponding array element. If we change the value of the loop variable, then we also change the corresponding array element. This can be a source of problems in Perl programs if we don't take this into account, but it can also be very useful. This example uses aliasing to convert a list of strings into a consistent capitalized form.
#!/usr/bin/perl
# capitalize.pl
use warnings;
use strict;
my @array = ("onE", "two", "THREE", "fOUR", "FiVe");
foreach (@array) {
# lc turns the word into lowercase, ucfirst then capitalizes the first letter
$_ = ucfirst lc; # lc uses $_ by default with no argument
}
print join(',', @array);
Sometimes we might want to avoid the aliasing feature and instead modify a copy of the original array. The simplest way to do that is to copy the original array before we start.
foreach (my @tmparray = @array) {
$_ =~ tr/a-z/A-Z/;
print;
}
The assignment to a lexically scoped variable creates a temporary array, which can be modified without affecting the original array. It is also disposed of at the end of the loop.
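The same idea can be packaged in a subroutine when the modified copy is needed afterward. This is a minimal sketch, assuming a hypothetical helper named uppercase_copy; it relies only on the fact that assigning one array to another copies the elements.

```perl
use strict;
use warnings;

# uppercase_copy is a hypothetical helper introduced for illustration.
sub uppercase_copy {
    my @copy = @_;               # copying breaks the alias to the caller's data
    $_ = uc $_ foreach @copy;    # $_ aliases elements of @copy only
    return @copy;
}

my @array = ("one", "two", "three");
my @upper = uppercase_copy(@array);
print "@array\n";    # the original array is unchanged
print "@upper\n";
```

Modifying $_ inside the loop changes @copy, which is then returned, while the caller's array is left alone.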
The while and until loops test a condition and continue to execute the loop for as long as that condition holds. The only difference is that a while loop continues for as long as its condition is true, whereas an until loop continues for as long as its condition is false (i.e., until it becomes true). Here is an example of counting from 1 to 10 using a while loop rather than a for or foreach loop:
#!/usr/bin/perl
# count10.pl
use warnings;
use strict;
# count from 1 to 10 (note the post-increment in the condition)
my $n = 0;
while ($n++ < 10) {
print $n, "\n";
}
The while and until loops are well suited to tasks where we want to repeat an action continuously until a condition that we can have no advance knowledge of occurs, such as reaching the end of a file. The following example shows a while loop being used to read the contents of a file line by line. When the end of the file is reached, the readline operator returns false and the loop terminates.
open FILE, "file.txt";
while ($line = <FILE>) {
print $line;
}
close FILE;
If we replace while with until, the meaning of the condition is reversed, in the same way that unless reverses the condition of an if statement. This makes more sense when the nature of the question asked by the Boolean test implies that we are looking for a "no" answer. The eof function is a good example; it returns true when there is no more data.
open FILE, "file.txt";
until (eof(FILE)) {
$line = <FILE>;
print $line;
}
Variable Aliasing with while
while loops do not alias their conditions the way that a foreach loop does its controlling list, because there is no loop variable to alias with. However, a few Perl functions will alias their values to $_ if placed in the condition of a while loop. One of them is the readline operator. This means we can write a loop to read the lines of a file one by one without a loop variable.
open FILE, "file.txt";
while (<FILE>) {
print "$.: $_";
}
Or, more tersely:
print "$.: $_" while <FILE>;
Looping Over Lists and Arrays with while
We can loop over the contents of an array with while if we don't mind destroying the array as we do it.
while ($element = shift @array) {
print $element, "\n";
}
# @array is empty here
On the face of it, this construct does not appear to have any advantage over a more intuitive foreach loop. In addition, it destroys the array in the process of iterating through it, since the removed elements are discarded. However, it can have some advantages. One performance-related use is to discard large memory-consuming values (like image data) as soon as we have finished with them. This allows Perl to release memory that much faster.
There can also be computational advantages. Assume we have a list of unique strings and we want to discard every entry before a particular "start" entry. This is easy to achieve with a while loop because we discard each nonmatching string as we test it.
#!/usr/bin/perl
# startexp.pl
use warnings;
use strict;
# define a selection of strings one of which is "start"
my @lines = ("ignored", "data", "start", "the data", "we want");
# discard lines until we see the "start" marker
while (my $line = shift @lines) {
last if $line eq 'start';
}
# print out the remaining elements using interpolation ($")
print "@lines";
Looping on Self-Modifying Arrays
We can use array functions like push, pop, shift, and unshift to modify the array even while we are processing it. This lets us create some interesting variations on a standard loop that are otherwise hard to achieve.
As an example, the following program oscillates indefinitely between two values. It works by shifting elements off an array one by one and adding them to the other end after subtracting each value from the highest value in the range, plus 1:
#!/usr/bin/perl
# oscillator.pl
use warnings;
use strict;
my $max = 20;
my @array = (1..$max-1);
while (my $element = shift @array) {
push (@array, $max - $element);
sleep 1; # delay the print for one second to see the output
print '*' x $element, "\n"; # multiply single '*' to get a bar of '*'s
}
A slight variation of this program produces a loop that counts from one to a maximum value, then back to one again, and terminates. The principal difference is that the array ranges from one to $max, not one to $max-1:
#!/usr/bin/perl
# upanddown.pl
use warnings;
use strict;
my $max = 6;
my @array = (1..$max);
while (my $element = shift @array) {
    push (@array, $max - $element);
    print $element, " : ", join(",", @array), "\n";
}
Why should such a trivial difference cause the loop to terminate? This program produces the following output, which shows us why it terminates after passing through the array only twice:
1 : 2,3,4,5,6,5
2 : 3,4,5,6,5,4
3 : 4,5,6,5,4,3
4 : 5,6,5,4,3,2
5 : 6,5,4,3,2,1
6 : 5,4,3,2,1,0
5 : 4,3,2,1,0,1
4 : 3,2,1,0,1,2
3 : 2,1,0,1,2,3
2 : 1,0,1,2,3,4
1 : 0,1,2,3,4,5
We can see from this what is actually going on. Each value shifted off the array is replaced at the other end with $max minus that value. Since the array now includes $max itself, the last element, 6, is replaced with 0. When that 0 comes around on the second pass, the result of the shift is a false value, because 0 is false. So the loop terminates.
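The termination above depends on Perl treating 0 as false. When false values such as 0 or the empty string are legitimate data, a common variation is to test the result of shift with defined instead. The following is a minimal sketch under that assumption; the @values array and the script name are made up for illustration and are not part of the programs above.

```perl
#!/usr/bin/perl
# definedshift.pl (hypothetical example name)
use warnings;
use strict;

my @values = (3, 0, 7);   # illustrative data containing a false value

# truth test: stops as soon as a false value (here 0) is shifted off
my @by_truth;
my @copy = @values;
while (my $element = shift @copy) {
    push @by_truth, $element;
}

# defined test: stops only when the array is empty and shift returns undef
my @by_defined;
@copy = @values;
while (defined(my $element = shift @copy)) {
    push @by_defined, $element;
}

print "truth test:   @by_truth\n";
print "defined test: @by_defined\n";
```

Note that upanddown.pl relies on the truth test to stop. Switching it to a defined test would keep the array permanently nonempty, and the loop would never end.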
These particular examples are chosen for simplicity and could also be implemented using simpler loops, for example, using an increment variable that oscillates between +1 and −1 at each end of the number range. While we have only used an ordered list for clarity, the oscillator will work even if the array does not contain ordered numbers.
Looping Over Hashes with while
We can iterate over a hash with while instead of foreach using the each function, which in a list context returns the next key-value pair in the hash, in the same order that keys and values return the keys and values, respectively. When there are no more key-value pairs, each returns an empty list (or undef in a scalar context). An empty list makes the list assignment evaluate to false, which makes each suitable for use in the condition of a while loop.
while (my ($key, $value) = each(%hash)) {
    print "$key => $value\n";
}
Using foreach and keys or while and each for this kind of task is mostly a matter of personal preference. However, foreach is generally more flexible as it allows sorting keys and aliasing with $_, neither of which is possible in a while/each loop. On the other hand, while avoids extracting the entire key list at the start of the loop and is preferable if we intend to quit the loop once a condition is met. This is particularly true if the hash happens to be tied to something that is resource-heavy (in comparison to an in-memory hash) like a DBM database.
Note that a foreach loop is a much safer option if we want to alter the contents of the array or hash we are iterating over. In particular, the internal iterator that each uses can get confused if the hash is modified during the course of the loop.
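To make the trade-off concrete, here is a small sketch of both styles side by side. The %stock hash and its contents are invented for illustration. The final call to keys in void context resets the hash's internal iterator, which is worth doing after leaving a while/each loop early so that a later each starts from the beginning.

```perl
#!/usr/bin/perl
# hashloops.pl (hypothetical example name)
use warnings;
use strict;

my %stock = (apples => 3, bananas => 7, cherries => 5);   # illustrative data

# foreach with keys: the order can be controlled by sorting
my @report;
foreach my $key (sort keys %stock) {
    push @report, "$key => $stock{$key}";
}
print "$_\n" foreach @report;

# while with each: leave as soon as a condition is met
my $found;
while (my ($key, $value) = each %stock) {
    if ($value > 6) {
        $found = $key;
        last;
    }
}
keys %stock;   # reset the iterator, since we may have left the loop early

print "first item with more than 6 in stock: $found\n";
```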
do...while and do...until
One problem with while and until loops is that they test the condition first and execute the loop body only if the test succeeds. This means that if the test fails on the first pass, the loop body is never executed. Sometimes, however, we want to ensure that the body is executed at least once. Fortunately, we can invert while and until loops by appending them to a do block to produce a do...while or do...until loop.
my $input;
do {
    $input = <>; # read a line from standard input
    print "You typed: $input\n";
} while ($input !~ /^quit/);
The last line can be rewritten to use until to equal effect.
} until $input =~ /^quit/;
Note that parentheses around the condition are optional in an inverted while or until loop, just as they are in an inverted if.
Interestingly, this inverted loop structure applies to all the looping statements, even foreach:
# this works, but is confusing. Don't do it.
do {
    print;
} foreach (@array);
However, there is little point in doing this for foreach, first because it will not work except using $_, second because the loop body does not execute first as it needs the loop value to proceed, and third because it's just plain confusing. We mention it only because Perl allows it, and it is conceivably possible that we may encounter it in code.
Note that in the inverted form we cannot declare a variable in the conditional expression. We also cannot use loop control statements to control the loop's execution as these are not permitted in a do block; see "The Trouble with do" later in the chapter.
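The difference between testing first and testing last is easiest to see when the condition is false from the outset. The following sketch uses a deliberately empty array (an invented @queue) in place of user input, so that it can be run without typing anything.

```perl
#!/usr/bin/perl
# atleastonce.pl (hypothetical example name)
use warnings;
use strict;

my @queue = ();   # deliberately empty, so the condition is false at once

# ordinary while: the condition is tested first, so the body never runs
my $while_passes = 0;
while (@queue) {
    $while_passes++;
    shift @queue;
}

# do...while: the body runs once before the condition is first tested
my $do_passes = 0;
do {
    $do_passes++;
    shift @queue;
} while (@queue);

print "while body ran $while_passes time(s)\n";
print "do...while body ran $do_passes time(s)\n";
```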
Ordinarily a loop will execute according to its controlling criteria. Frequently, however, we want to alter the normal flow of execution from within the loop body itself, depending on conditions that arise as the loop body is executed. Perl provides three statements for this, collectively known as loop modifiers: next, which advances to the next iteration (retesting the loop condition); last, which immediately exits the loop; and redo, which restarts the current iteration (without retesting the loop condition).
The next statement forces the loop immediately onto the next iteration, skipping any remaining code in the loop body but executing the continue block if it is present. It is most often used when all the tasks necessary for a given iteration have been completed or the loop variable value for the current iteration is not applicable.
The following code snippet reads configuration parameters from the user, consisting of lines of name = value pairs. It uses next to skip past empty lines, comments (lines beginning with a #), and lines without an equals sign.
#!/usr/bin/perl
# config.pl
use warnings;
use strict;
my %config = ();
while (<>) {
    chomp; # strip linefeed
    next if /^\s*$/; # skip to the next iteration on empty lines
    next if /^\s*#/; # skip to the next iteration on comments
    my ($param, $value) = split("=", $_, 2); # split on first '='
    unless (defined $value and length $value) { # a value of 0 is still a value
        print "No value for parameter '$_'\n";
        next;
    }
    $config{$param} = $value;
}
foreach (sort keys %config) {
    print "$_ => $config{$_}\n";
}
The last statement forces a loop to exit immediately, as if the loop had naturally reached its last iteration. A last is most often used when the task for which the loop was written has been completed, such as searching for a given value in an array; once found, no further processing is necessary. It can also be used in foreach loops pressed into service as multibranch conditions as we saw earlier. Here is a more conventional use of last that copies elements from one array to another until it hits an undefined element or reaches the end of the source array:
#!/usr/bin/perl
# last.pl
use warnings;
use strict;
my @array = ("One", "Two", "Three", undef, "Five", "Six");
#copy array up to the first undefined element
my @newarray = ();
foreach my $element (@array) {
    last unless defined ($element);
    push @newarray, $element;
}
foreach (@newarray) {
    print $_."\n"; # prints One, Two, Three
}
The redo statement forces the loop to execute the current iteration over again. At first sight this appears similar to next. The distinction is that with redo the loop condition is not retested, and the continue block, if present, is not executed. In the case of a foreach loop, that means that the loop variable retains the value of the current iteration rather than advancing to the next. In the case of a while or until loop, the code in the conditional clause is not reexecuted, and any functions in it are not called. A redo is most often used when more than one iteration may be needed before the main body of a loop can be executed, for example, reading files with multiple-line statements.
#!/usr/bin/perl
# backslash.pl
use warnings;
use strict;
my @lines = ();
while (<>) {
    chomp;
    if (s/\\$//) { # check for and remove a trailing backslash character
        my $line = <>;
        $_ .= $line, redo; # goes to the 'chomp' above
    }
    push @lines, $_;
}
foreach (0..$#lines) {
    print "$_ : $lines[$_]\n";
}
In this example, the while statement reads a line of input with <> and assigns it to $_. The chomp removes the trailing newline, and the remainder of the line is checked for a trailing backslash. If one is found, another line is read and appended to $_.
Inside the if statement, the redo is called to pass execution back up to the chomp statement. Because redo does not reexecute the while condition, the value of $_ is not overwritten, and the chomp is performed on the value of $_ that was assigned inside the if statement. This process continues so long as we continue to enter lines ending with a backslash.
All of the loop control statements next, last, and redo can be used in any kind of loop (for, foreach, while, until). The exceptions are the do...while and do...until loops. This is because loops built around do blocks do not behave quite the way we expect, as we will see shortly.
The continue Clause
Perl's while, until, and foreach loops (as well as bare blocks) can accept an additional continue clause. The C-style for loop cannot, since its third expression already plays that role. Code placed into the block of a continue clause is executed after the main body of the loop. Ordinarily this has no different effect from just adding the code to the end of the main loop, unless the loop body contains a next statement, in which case the continue block is executed before returning to the top of the loop. This makes a continue block a suitable place to increment a loop variable.
my $n = 0;
while ($n < 10) {
    next if ($n % 2);
    print $n, "\n";
} continue {
    # 'next' comes here
    $n++;
}
# 'last' comes here
Note, however, that a last statement will not execute the continue block before exiting the loop. Similarly, redo will not execute the continue block because it reexecutes the loop body on the same iteration, rather than continuing to the next.
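A counter inside the continue block makes this behavior visible. In the sketch below, which uses invented variable names, the continue block runs after every normal pass and after every next, but not after the last that ends the loop.

```perl
#!/usr/bin/perl
# continuecount.pl (hypothetical example name)
use warnings;
use strict;

my @printed;         # values that reach the end of the loop body
my $continued = 0;   # how many times the continue block runs
my $n = 0;

while ($n < 10) {
    last if $n == 6;     # exits at once; the continue block is skipped
    next if $n % 2;      # skips the push, but still runs the continue block
    push @printed, $n;
} continue {
    $continued++;
    $n++;
}

print "printed: @printed\n";
print "continue block ran $continued times, loop ended with n = $n\n";
```

Had last triggered the continue block, $n would have ended at 7 rather than 6.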
There are few, if any, instances where a continue block is strictly necessary, since most loops with a continue clause can be easily rewritten to avoid one. As we mentioned earlier, the continue clause is an explicit way to write the third part of a for loop, which deals with next, last, and redo in the same way as the while...continue loop earlier.
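The correspondence can be checked by writing the same loop both ways. This sketch collects the even numbers below ten into two arrays, whose names are chosen here purely for illustration, once with a C-style for loop and once with while...continue.

```perl
#!/usr/bin/perl
# forcontinue.pl (hypothetical example name)
use warnings;
use strict;

# C-style for: the third expression runs after each pass, including after 'next'
my @from_for;
for (my $n = 0; $n < 10; $n++) {
    next if $n % 2;
    push @from_for, $n;
}

# while...continue: the continue block plays the part of the third expression
my @from_while;
my $n = 0;
while ($n < 10) {
    next if $n % 2;
    push @from_while, $n;
} continue {
    $n++;
}

print "for:   @from_for\n";
print "while: @from_while\n";
```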
Controlling Nested Loops
So far we have just seen how to use loop control statements to affect the execution of the current loop. However, the next, last, and redo statements all accept an optional loop label as an argument. This allows us to jump to the start or end of an outer loop, so long as that loop has a name. To give a loop a name, we just prefix it with a label.
my @lines = ();
LINE: foreach (<>) {
    chomp;
    next LINE if /^$/; #skip blank lines
    push @lines, $_;
}
Even in a simple loop this can enable us to write more legible code. Since the label indicates the purpose of the loop and of the control statements inside it, next LINE literally means "do the next line." However, if we have two nested loops, labeling the outer loop allows us to jump to the next iteration of the outer loop using next.
OUTER: foreach my $outer (@array) {
    INNER: foreach my $inner (@{$outer}) {
        next OUTER unless defined $inner;
    }
    # 'last' or 'last INNER' would come here
}
This is very similar to using a last statement, except that it will jump to the top of the outer loop rather than the end of the inner loop. If the outer loop contains more code after the inner loop, next will avoid it while last will execute it.
Similarly, we can use last to exit both loops simultaneously. This is a much more efficient way to exit nested loops than exiting each loop individually.
LINE: foreach my $line (<>) {
    chomp $line; # the line is in $line, not $_
    ITEM: foreach (split /, /, $line) {
        last LINE if /^_END_/; #abort both loops on token
        next LINE if /^_NEXT_/; #skip remaining items on token
        next ITEM if /^\s*$/; #skip empty columns
        #process item
        print "Got: $_\n";
    }
}
Only the outer loop actually needs to be labeled, so loop control statements can apply themselves to the outer loop and not to the inner loop.
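As a self-contained illustration that does not depend on standard input, the following sketch searches a small two-dimensional array for the first negative number and leaves both loops the moment it finds one. The @grid data and the ROW label are assumptions made for this example.

```perl
#!/usr/bin/perl
# gridsearch.pl (hypothetical example name)
use warnings;
use strict;

my @grid = ([1, 2, 3], [4, -1, 6], [7, 8, 9]);   # illustrative data

my @seen;       # values examined before the search ended
my $found_at;   # "row,column" of the first negative value, if any

ROW: foreach my $row (0 .. $#grid) {
    foreach my $col (0 .. $#{ $grid[$row] }) {
        if ($grid[$row][$col] < 0) {
            $found_at = "$row,$col";
            last ROW;   # leave the inner and outer loops together
        }
        push @seen, $grid[$row][$col];
    }
}

print "examined: @seen\n";
print "first negative value at $found_at\n" if defined $found_at;
```

Only the outer loop carries a label here; an unlabeled last in the same position would end the inner loop only, and the search would carry on with the next row.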
Perl allows labels to be defined multiple times. When a label is used, the label definition that is closest in scope is taken to be the target. For loop control statements, the first matching loop label in the stack of loops surrounding the statement is used. In general, we do not expect to be giving two loops the same name if one is inside the other, so it is always clear which label a loop control statement is referring to. Reusing labels is also handy for switch-style conditional statements and any other constructs where we want to make the purpose of the construct clear.
Strangely, we can jump to a loop label of an outer loop, even if there is a subroutine call in the way. This is really a very bad idea and is almost certainly not intended, so Perl will warn us if we do it inadvertently.
Exiting subroutine via next at ...
Tip Although we would not expect to do this normally, it is possible to mistype the name of a label, especially if we copy and paste carelessly.
The Trouble with do
The fact that loop modifiers do not work in do...while or do...until loops may seem strange. The reason for this is slightly obscure, but it comes about because unlike a normal while or until loop, the while and until conditions in a do...while or do...until loop are considered modifiers, which modify the behavior of the do block immediately before them. The do block is not considered to be a loop, so loop control statements do not work in it.
It is possible, though not terribly elegant, to get a next statement to work in a do...while loop through the addition of an extra bare block inside the do block, as in this example:
#!/usr/bin/perl
# even.pl
use warnings;
use strict;
# print out even numbers with a do...while loop
my $n = 0;
do { {
    next if ($n % 2);
    print $n, "\n";
} } while ($n++ < 10);
Unfortunately, while this works for next, it does not work for last, because both next and last operate within the bounds of the inner block. All last does in this case is take us to the end of the inner block, where the while condition is still in effect. In addition, this is ugly and nonintuitive code. The better solution at this point is to find a way to rephrase this code as a normal while, until, or foreach loop and avoid the whole problem.
$n = 0;
while (++$n <= 10) {
    next if ($n % 2);
    print $n, "\n";
}
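For completeness, last can be made to work with a do...while loop by the mirror image of the trick used for next: instead of a bare block inside the do block, we wrap the whole statement in a labeled bare block and use last to leave that. Perl's own syntax documentation describes this arrangement. The sketch below, with an invented LOOP label and cutoff value, is offered to show the mechanics rather than as a recommendation, and the advice above to use an ordinary loop still stands.

```perl
#!/usr/bin/perl
# dolast.pl (hypothetical example name)
use warnings;
use strict;

my @collected;
my $n = 0;

LOOP: {
    do {
        last LOOP if $n > 6;               # leaves the enclosing bare block
        push @collected, $n unless $n % 2;
    } while ($n++ < 10);
}

print "collected: @collected\n";
```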
The goto statement has two basic modes of operation. The simpler and more standard use allows execution to jump to an arbitrary labeled point in the code, just as in C and many other languages.
my ($lines, $empty, $comment, $code) = (0, 0, 0, 0);
while (<>) {
    /^$/ and $empty++, goto CONTINUE;
    /^#/ and $comment++, goto CONTINUE;
    $code++, goto CONTINUE;
    CONTINUE:
    $lines++;
}
There are few, if any, reasons to use a goto with a label. In this case, we would be better off replacing goto with next statements and putting the continue code into a continue block.
while (<>) {
    /^$/ and $empty++, next;
    /^#/ and $comment++, next;
    $code++;
} continue {
    $lines++;
}
A goto statement can also take an expression as its argument. The result of the expression should be a label that execution can jump to. This gives us another, albeit rather ugly, way to write a compound switch statement.
my $selection = int(3*rand); # random integer between 0 and 2
my @selections = ("ZERO", "ONE", "TWO");
goto $selections[$selection];
{ ZERO:
    print "None";
    next;
  ONE:
    print "One";
    next;
  TWO:
    print "Two";
    next;
}
print "...done\n";
Again, there are better ways to write compound statements. We covered these earlier, so we should not have to resort to goto here.
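One goto-free way to express the same selection is a dispatch table: a hash that maps each name to an anonymous subroutine. This is a common Perl idiom offered here as an alternative sketch, not necessarily one of the techniques referred to above, and the %action hash is a name invented for this example.

```perl
#!/usr/bin/perl
# dispatch.pl (hypothetical example name)
use warnings;
use strict;

# map each selection name to the code that handles it
my %action = (
    ZERO => sub { return "None" },
    ONE  => sub { return "One"  },
    TWO  => sub { return "Two"  },
);

my @selections = ("ZERO", "ONE", "TWO");
my $selection  = int(3*rand);   # random integer between 0 and 2

# look up the handler by name and call it
my $result = $action{ $selections[$selection] }->();
print "$result...done\n";
```

Adding a new case is then a matter of adding a new key, with no labels or jumps involved.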
The second and more interesting use of goto is to call subroutines. When used in this context, the new subroutine entirely replaces the context of the calling one, so that on return from the second subroutine, execution is returned directly to the caller of the first subroutine. The primary use of this form is in autoloaded functions, which will be covered in Chapter 10.
It can also be used for so-called tail recursion. This is where a subroutine can call itself recursively many times without causing Perl to create an ever-growing stack of subroutine calls. The final call returns directly to the original caller instead of returning a value through all of the intermediate subroutine calls. We will cover this in Chapter 7.
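As a brief preview, the following sketch sums the integers from 1 to a given number using goto with a subroutine. The subroutine name sum_to and its running-total argument are assumptions made for this illustration. Each pass replaces the contents of @_ with the arguments for the next step and then jumps to the subroutine again, so no new call frame is stacked up.

```perl
#!/usr/bin/perl
# tailsum.pl (hypothetical example name)
use warnings;
use strict;

# sum the integers 1..$n, carrying the running total as a second argument
sub sum_to {
    my ($n, $total) = @_;
    $total = 0 unless defined $total;
    return $total if $n <= 0;      # returns straight to the original caller
    @_ = ($n - 1, $total + $n);    # arguments for the next step
    goto &sum_to;                  # replace this call rather than nesting a new one
}

print sum_to(100), "\n";
```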
The map and grep functions convert one list into another, applying a transform or condition to each element of the source list in turn. If the goal of a foreach or while loop is to generate a new list, we might be able to do the job better using map or grep. The syntax of map (and grep) takes one of two equivalent forms:
map EXPRESSION, LIST      grep EXPRESSION, LIST
map BLOCK LIST            grep BLOCK LIST
In each case the EXPRESSION or BLOCK is executed for each value of LIST, with the results returned as a new list.
The purpose of map is to convert the elements of a list one by one and to produce a new list as a result. The expression or block performs the conversion, so map is conceptually related to a foreach loop. Similarly, the purpose of grep is to return a list containing a subset of the original list. The expression or block is evaluated to a true or false value to see if the element is eligible for inclusion, so grep is conceptually related to a while loop. Both functions perform aliasing to $_ in the same way that foreach does.
map
To illustrate how map works, here is an example. Assume that we have a list of integers representing ASCII values, and we want to turn it into a list of character strings. We can do that with a foreach loop like this:
my @numbers = (80, 101, 114, 108);
my @characters;
foreach (@numbers) {
    push @characters, chr $_;
}
print @characters;
With map we can replace the loop with
my @characters = map (chr $_, @numbers);
Or, using the block syntax:
my @characters = map {chr $_} @numbers;
Even better, we can feed the list returned by the map into a join, then print the result in a single operation.
print join ('-', map {chr $_} @numbers); # displays 'P-e-r-l'
Another common use for map is to construct a hash map of values to quickly determine if a given value exists or not. With an array we would have to compare each element in turn.
#!/usr/bin/perl -w
# insequencemap.pl
use strict;
my @sequence=(1,1,2,3,5,8,13,21,34,55,89);
my %insequence=map { $_ => 1 } @sequence;
chomp(my $number = <>); # strip the newline so the hash lookup can match
print "$number is ", ($insequence{$number} ? "" : "NOT "), "in sequence\n";
Unlike foreach, map cannot choose to use an explicit loop variable and must use $_ within the block or expression. It also cannot make use of loop control statements, for the same reasons that a do block cannot, as we saw earlier.
In void context, the return value of map is discarded. Prior to Perl 5.8, this was perfectly functional code but not very efficient; the list of values to return would be constructed and then discarded again. From version 5.8.1 onwards Perl is smart enough to notice when it is asked to evaluate a map whose return value is not used and will optimize the map to avoid the redundant work. The following two statements are therefore equivalent and equally efficient. Note that ucfirst returns the capitalized string rather than changing its argument, so we must assign the result back to $_ to modify the list.
$_ = ucfirst($_) foreach @words; # foreach style self-modify list
map { $_ = ucfirst($_) } @words; # map style self-modify list
From 5.8.4, assigning a map to a scalar value (which counts the number of values present) will also be optimized.
my $howmany = map { ucfirst } @words; # more efficient in Perl >= 5.8.4
Void context aside, map is usually used to convert one list into another of equal length, with one new value in the result for each value in the input. However, it is possible to persuade map not to return a new value by having it return ().
print map { ($_>110) ? () : chr($_) } @numbers; # displays 'Pel'
If the only object of the map is to selectively return values, however, we should really be using grep.
grep
The grep function gets its name from the Unix grep command, which scans text files and returns lines from them that match a given regular expression search pattern. The Perl grep function is similar in concept in that it returns a list containing a subset of the original list, though it does not directly have anything to do with regular expressions.
The syntax of grep is identical to map, but while the expression or block in a map statement is used to transform each value in a list, the corresponding expression or block in a grep statement is evaluated as a condition to determine if the value should be included in the returned list.
For example, the following while loop reads a list of lines from standard input and builds up a list of the lines that started with a digit:
my @numerics = ();
while (<>) {
    push @numerics, $_ if /^\d/;
}
print "@numerics\n";
We can simplify the preceding to just
@numerics = grep { /^\d/ } <>;
Here we have used a regular expression as the condition, in keeping with the spirit of the Unix grep command, which works on a similar basis. However, since grep accepts any expression or block to test the condition, we can use any kind of condition we like.
Just because grep tests each value rather than manipulating it does not mean that it has to leave the value untouched. Just as map can be made to act like grep by returning (), grep can be made to act like map by assigning a new value to $_. However, doing this alters the original list. This is fine if we intend to discard the original list, but it can lead to problematic code if we forget. Here is an example where the source list is generated by reading from standard input. We can't subsequently access this list from anywhere else, so there is no risk of making a mistake.
@numerics = grep { s/^(\d+)/Line $1:/ } <>;
In this example the substitution modifies $_ in place for each matching line. The return value of the substitution is true if a match was made and false otherwise, so only the lines that were transformed are returned by the grep.
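The risk of altering the original list becomes real as soon as the source is a named array rather than standard input. The sketch below uses invented sample data to show the side effect, followed by one possible way to avoid it: a map that copies each element before substituting and returns () for lines that do not match.

```perl
#!/usr/bin/perl
# grepalias.pl (hypothetical example name)
use warnings;
use strict;

# grep with a substitution: $_ is an alias, so @lines itself is changed
my @lines  = ("12 apples", "pears", "7 plums");   # illustrative data
my @tagged = grep { s/^(\d+)/Line $1:/ } @lines;

# a nondestructive variant: work on a copy and return () for nonmatches
my @source = ("12 apples", "pears", "7 plums");
my @safe   = map {
    my $copy = $_;
    $copy =~ s/^(\d+)/Line $1:/ ? $copy : ();
} @source;

print "tagged: ", join(" | ", @tagged), "\n";
print "lines:  ", join(" | ", @lines),  "\n";   # the original was altered
print "source: ", join(" | ", @source), "\n";   # the original is intact
```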
Just like map, grep can be used in a void context to change an original array without creating a new list of values. Also like map, from version 5.8.1 Perl will optimize such a grep so that no output values are generated to then get immediately discarded.
Chaining map and grep Functions Together
Both map and grep take lists as input and produce lists as output, so we can chain them together. The following example again reads a list of lines from standard input, and returns a list of all lines that were exactly five characters long (including the terminating linefeed), with each line lowercased and the first character capitalized (assuming it can be). Both ucfirst and lc will use $_ if given no explicit argument. We can write
@numerics = map { ucfirst } map { lc } grep { length($_)==5 } <>;
A chain like this can be a powerful way to quickly and concisely manipulate a list through several different stages, more so when the bodies are more complex (e.g., call subroutines) than the simplistic example given here. The drawback is that to make sense of the code we have to read it from back to front, which is a little counterintuitive.
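One way to get comfortable with the back-to-front reading order is to write the same chain as separate stages. The sketch below uses a made-up @input array in place of standard input, with intermediate array names chosen for illustration, and checks that the staged and chained forms agree.

```perl
#!/usr/bin/perl
# chainstages.pl (hypothetical example name)
use warnings;
use strict;

my @input = ("PERL\n", "python\n", "RuBy\n", "go\n");   # illustrative data

# the chained form: evaluated right to left (grep, then lc, then ucfirst)
my @chained = map { ucfirst } map { lc } grep { length($_) == 5 } @input;

# the same work as explicit stages, in the order it actually happens
my @five   = grep { length($_) == 5 } @input;   # keep five-character lines
my @lower  = map { lc } @five;                   # lowercase each line
my @staged = map { ucfirst } @lower;             # capitalize the first letter

print @chained;
print "same result\n" if "@chained" eq "@staged";
```

The staged version costs two temporary arrays but reads top to bottom, which can be the better choice when each step is more involved.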
This example also illustrates a typical situation where the block syntax of map and grep is much clearer than the expression syntax, which would require three sets of nested parentheses.
We started this chapter by exploring the basic structures of Perl. We covered statements, declarations, expressions, and blocks. We then looked in particular at the properties and facilities provided by blocks and the various ways in which they can be expressed and used, including treating blocks as loops, defining do blocks, and working with BEGIN and END blocks.
We covered Perl's conditional statements, if, else, elsif, and unless. After a short discussion on the nature of truth, we also looked in detail at how to create loops with for and foreach, while, until, do, do...while, and do...until and how to control loops with next, last, redo, and continue.
The chapter ended with a short discussion of the uses and disadvantages of goto, followed by a look at the map and grep functions, which turn a list of input values into a new list. We saw how to use both map and grep to implement code that both transforms and selectively removes values from the input list, plus how to chain multiple map and grep expressions together to achieve more complex kinds of list manipulation.