We've touched on a few idioms in the preceding sections (not to mention the preceding chapters), but there are many other idioms you'll commonly see if you read programs by accomplished Perl programmers. When we speak of idiomatic Perl in this context, we don't just mean a set of arbitrary Perl expressions with fossilized meanings. Rather, we mean Perl code that shows an understanding of the flow of the language, what you can get away with when, and what that buys you. And when to buy it.
We can't hope to list all the idioms you might see; that would take a book as big as this one. Maybe two. (See the Perl Cookbook, for instance.) But here are some of the important idioms, where "important" might be defined as "that which induces hissy fits in people who think they already know just how computer languages ought to work".
Use => in place of a comma anywhere you think it improves readability:
return bless $mess => $class;
This reads, "Bless this mess into the specified class." Just be careful not to use it after a word that you don't want autoquoted:
sub foo () { "FOO" }
sub bar () { "BAR" }
print foo => bar;    # prints fooBAR, not FOOBAR
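One way to sidestep the trap is to give the word on the left explicit parentheses, since => autoquotes only a bare identifier. A minimal sketch reusing the two constant subroutines from above (the array names are illustrative):

```perl
use strict;
use warnings;

sub foo () { "FOO" }
sub bar () { "BAR" }

# => quotes the bare identifier on its left, so foo becomes the string "foo".
my @quoted = (foo => bar);

# foo() is no longer a bare identifier, so it is called like any other sub.
my @called = (foo() => bar);

print @quoted, "\n";    # prints fooBAR
print @called, "\n";    # prints FOOBAR
```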
Another good place to use => is near a literal comma that might get confused visually:
join(", " => @array);
Perl provides you with more than one way to do things so that you can exercise your ability to be creative. Exercise it!
Use the singular pronoun to increase readability:
for (@lines) {
    $_ .= "\n";
}
The $_ variable is Perl's version of a pronoun, and it essentially means "it". So the code above means "for each line, append a newline to it." Nowadays you might even spell that:
$_ .= "\n" for @lines;
The $_ pronoun is so important to Perl that its use is mandatory in grep and map. Here is one way to set up a cache of common results of an expensive function:
%cache = map { $_ => expensive($_) } @common_args;
$xval = $cache{$x} || expensive($x);
Omit the pronoun to increase readability even further.[1]
Use loop controls with statement modifiers.
while (<>) {
    next if /^=for\s+(index|later)/;
    $chars += length;
    $words += split;
    $lines += y/\n//;
}
This is a fragment of code we used to do page counts for this book. When you're going to be doing a lot of work with the same variable, it's often more readable to leave out the pronouns entirely, contrary to common belief.
The fragment also demonstrates the idiomatic use of next with a statement modifier to short-circuit a loop.
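The fragment reads from <>, so to try it in isolation, here is a sketch that wraps the same loop body in a hypothetical page_counts subroutine and feeds it an in-memory filehandle instead of real files. The word count assumes split in scalar context returns the number of fields, which is how Perl 5.12 and later behave.

```perl
use strict;
use warnings;

# Hypothetical helper: counts characters, words, and lines in a string,
# skipping "=for index" and "=for later" lines as the fragment above does.
sub page_counts {
    my ($text) = @_;
    open my $fh, '<', \$text or die "can't open in-memory file: $!";
    my ($chars, $words, $lines) = (0, 0, 0);
    while (<$fh>) {
        next if /^=for\s+(index|later)/;   # loop control via a statement modifier
        $chars += length;                  # length of $_
        $words += split;                   # split $_ on whitespace; scalar context yields the count
        $lines += y/\n//;                  # count newlines in $_ without changing it
    }
    close $fh;
    return ($chars, $words, $lines);
}

my ($chars, $words, $lines) = page_counts("one two\n=for index foo\nthree\n");
print "chars = $chars, words = $words, lines = $lines\n";
```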
The $_ variable is always the loop control variable in grep and map, but the program's reference to it is often implicit:
@haslen = grep { length } @random;
Here we take a list of random scalars and only pick the ones that have a length greater than 0.
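A small sketch with made-up data shows the implicit $_ in both grep and map. Note that "0" survives the grep: it is false, but its length is 1.

```perl
use strict;
use warnings;

my @random = ("", "apple", "0", "", "pear");

# grep aliases $_ to each element in turn; length with no argument uses $_.
my @haslen = grep { length } @random;

# map also sets $_, so the pronoun can stay implicit here too.
my @lengths = map { length } @haslen;

print "@haslen\n";     # prints apple 0 pear
print "@lengths\n";    # prints 5 1 4
```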
Use for to set the antecedent for a pronoun:
for ($episode) {
    s/fred/barney/g;
    s/wilma/betty/g;
    s/pebbles/bambam/g;
}
So what if there's only one element in the loop? It's a convenient way to set up "it", that is, $_.
Linguistically, this is known as topicalization. It's not cheating, it's communicating.
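Because for aliases $_ to the variable rather than copying it, the substitutions change $episode itself. A runnable sketch; the sample string is made up for illustration:

```perl
use strict;
use warnings;

my $episode = "fred and wilma watch pebbles";

# $_ is an alias for $episode inside the loop, so each s/// edits $episode.
for ($episode) {
    s/fred/barney/g;
    s/wilma/betty/g;
    s/pebbles/bambam/g;
}

print "$episode\n";    # prints barney and betty watch bambam
```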
Implicitly reference the plural pronoun, @_.
Use control flow operators to set defaults:
sub bark {
    my Dog $spot = shift;
    my $quality  = shift || "yapping";
    my $quantity = shift || "nonstop";
    …
}
Here we're implicitly using the other Perl pronoun, @_, which means "them". The arguments to a function always come in as "them". Inside a subroutine, the shift operator knows to operate on @_ if you omit it, just as the ride operator at Disneyland might call out "Next!" without specifying which queue is supposed to shift. (There's no point in specifying, because there's only one queue that matters.)
The || can be used to set defaults despite its origins as a Boolean operator, since it returns its left operand if that is true and its right operand otherwise. Perl programmers often manifest a cavalier attitude toward the truth; the code above would break if, for instance, you tried to specify a quantity of 0. But as long as you never want to set either $quality or $quantity to a false value, the idiom works great. There's no point in getting all superstitious and throwing in calls to defined and exists all over the place. You just have to understand what it's doing. As long as it won't accidentally be false, you're fine.
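To watch the quantity-of-0 case break, here is a sketch with a hypothetical bark_args subroutine that returns its defaulted arguments instead of barking (it also drops the Dog object, since nothing here needs it):

```perl
use strict;
use warnings;

# Hypothetical stand-in for bark: reports the values it ends up with.
sub bark_args {
    my $quality  = shift || "yapping";   # shift with no argument works on @_
    my $quantity = shift || "nonstop";
    return "$quality/$quantity";
}

print bark_args(), "\n";             # prints yapping/nonstop
print bark_args("woof", 3), "\n";    # prints woof/3
print bark_args("woof", 0), "\n";    # prints woof/nonstop: the 0 is lost
```

If you ever do need to pass a 0, Perl 5.10 and later provide the defined-or operator, //, which substitutes the default only when the argument is undefined.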
Use assignment forms of operators, including control flow operators:
$xval = $cache{$x} ||= expensive($x);
Here we don't initialize our cache at all. We just rely on the ||= operator to call expensive($x) and assign it to $cache{$x} only if $cache{$x} is false. The result of that is whatever the new value of $cache{$x} is. Again, we take the cavalier approach towards truth, in that if we cache a false value, expensive($x) will get called again. Maybe the programmer knows that's okay, because expensive($x) isn't expensive when it returns false. Or maybe the programmer knows that expensive($x) never returns a false value at all. Or maybe the programmer is just being sloppy. Sloppiness can be construed as a form of creativity.
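To see the recalculation of false values happen, here is a sketch in which a hypothetical expensive function counts how often it really runs and returns 0 for the argument 0 (the cached wrapper is also illustrative):

```perl
use strict;
use warnings;

my %cache;
my $calls = 0;

# Hypothetical expensive function: tallies real calls, returns a false value for 0.
sub expensive {
    my $x = shift;
    $calls++;
    return $x * 2;
}

# ||= fills the cache slot only when it is false, and returns the new value.
sub cached {
    my $x = shift;
    return $cache{$x} ||= expensive($x);
}

cached(5) for 1 .. 3;    # computed once, then served from %cache
cached(0) for 1 .. 3;    # 0 is false, so it is recomputed every time

print "calls = $calls\n";    # prints calls = 4
```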
Use loop controls as operators, not just as statements. And...
Use commas like small semicolons:
while (<>) {
    $comments++, next if /^#/;
    $blank++, next    if /^\s*$/;
    last              if /^__END__/;
    $code++;
}
print "comment = $comments\nblank = $blank\ncode = $code\n";
This shows an understanding that statement modifiers modify statements, while next is a mere operator. It also shows the comma being idiomatically used to separate expressions much like you'd ordinarily use a semicolon. (The difference being that the comma keeps the two expressions as part of the same statement, under the control of the single statement modifier.)
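To run the loop without real input files, here is a sketch that wraps it in a hypothetical count_lines subroutine reading from an in-memory filehandle; the sample text is made up:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the loop above: reads a string instead of <>.
sub count_lines {
    my ($text) = @_;
    open my $fh, '<', \$text or die "can't open in-memory file: $!";
    my ($comments, $blank, $code) = (0, 0, 0);
    while (<$fh>) {
        $comments++, next if /^#/;      # the comma keeps both expressions
        $blank++, next    if /^\s*$/;   # under one statement modifier
        last              if /^__END__/;
        $code++;
    }
    close $fh;
    return ($comments, $blank, $code);
}

my ($comments, $blank, $code) =
    count_lines("# a comment\n\nprint 1;\nprint 2;\n__END__\nnot counted\n");
print "comment = $comments\nblank = $blank\ncode = $code\n";
```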
Use flow control to your advantage:
while (<>) {
    /^#/       and $comments++, next;
    /^\s*$/    and $blank++, next;
    /^__END__/ and last;
    $code++;
}
print "comment = $comments\nblank = $blank\ncode = $code\n";
Here's the exact same loop again, only this time with the patterns out in front. The perspicacious Perl programmer understands that it compiles down to exactly the same internal codes as the previous example. The if modifier is just a backward and (or &&) conjunction, and the unless modifier is just a backward or (or ||) conjunction.
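A small check of the behavioral side of that claim: each modifier form and its and/or counterpart update separate counters over the same inputs. This only compares results; it doesn't inspect the compiled opcodes (the core B::Concise module, run as perl -MO=Concise, is one way to look at those).

```perl
use strict;
use warnings;

my ($by_modifier, $by_operator) = (0, 0);

for my $n (1 .. 10) {
    # "EXPR if COND" behaves like "COND and EXPR"
    $by_modifier++ if $n % 2;
    $n % 2 and $by_operator++;

    # "EXPR unless COND" behaves like "COND or EXPR"
    $by_modifier += 10 unless $n > 3;
    $n > 3 or $by_operator += 10;
}

print "$by_modifier $by_operator\n";    # prints 35 35
```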
Use the implicit loops provided by the -n and -p switches.
Don't put a semicolon at the end of a one-line block:
#!/usr/bin/perl -n
$comments++, next LINE if /^#/;
$blank++, next LINE    if /^\s*$/;
last LINE              if /^__END__/;
$code++;
END { print "comment = $comments\nblank = $blank\ncode = $code\n" }
This is essentially the same program as before. We put an explicit LINE label on the loop control operators because we felt like it, but we didn't really need to, since the implicit LINE loop supplied by -n is the innermost enclosing loop. We used an END block to get the final print statement outside the implicit main loop, just as in awk.
Use here docs when the printing gets ferocious.
Use a meaningful delimiter on the here doc:
END { print <<"COUNTS" }
comment = $comments
blank = $blank
code = $code
COUNTS
Rather than using multiple prints, the fluent Perl programmer uses a multiline string with interpolation. And despite our calling it a Common Goof earlier, we've brazenly left off the trailing semicolon because it's not necessary at the end of the END block. (If we ever turn it into a multiline block, we'll put the semicolon back in.)
Do substitutions and translations en passant on a scalar:
($new = $old) =~ s/bad/good/g;
Since lvalues are lvaluable, so to speak, you'll often see people changing a value "in passing" while it's being assigned. This could actually save a string copy internally (if we ever get around to implementing the optimization):
chomp($answer = <STDIN>);
Any function that modifies an argument in place can do the en passant trick. But wait, there's more!
Don't limit yourself to changing scalars en passant:
for (@new = @old) { s/bad/good/g }
Here we copy @old into @new, changing everything in passing (not all at once, of course; the block is executed repeatedly, one "it" at a time).
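A runnable sketch of both en passant forms, with made-up data, confirming that only the copies change. Under use strict the scalar copy needs its own my, which can sit inside the parentheses:

```perl
use strict;
use warnings;

my $old = "bad dog, bad cat";
(my $new = $old) =~ s/bad/good/g;     # copy first, then substitute on the copy

my @old = ("bad one", "not bad", "fine");
my @new;
for (@new = @old) { s/bad/good/g }    # $_ aliases each element of @new in turn

print "$old | $new\n";    # prints bad dog, bad cat | good dog, good cat
print "@old | @new\n";    # prints bad one not bad fine | good one not good fine
```

In Perl 5.14 and later, the /r modifier returns the modified copy directly, as in my $new = $old =~ s/bad/good/gr;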
Pass named parameters using the fancy => comma operator.
Rely on assignment to a hash to do even/odd argument processing:
sub bark {
    my Dog $spot = shift;
    my %parm = @_;
    my $quality  = $parm{QUALITY}  || "yapping";
    my $quantity = $parm{QUANTITY} || "nonstop";
    …
}
$fido->bark( QUANTITY => "once",
             QUALITY  => "woof" );
Named parameters are often an affordable luxury. And with Perl, you get them for free, if you don't count the cost of the hash assignment.
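To try the named-parameter version end to end, here is a sketch with a hypothetical Dog class: the new constructor is invented for the example, and bark returns a string instead of barking. It also omits the optional class annotation on $spot.

```perl
use strict;
use warnings;

package Dog;

# Hypothetical constructor, just so there is an object to call bark on.
sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}

sub bark {
    my $spot = shift;
    my %parm = @_;                      # even/odd pairs become keys/values
    my $quality  = $parm{QUALITY}  || "yapping";
    my $quantity = $parm{QUANTITY} || "nonstop";
    return "$quality ($quantity)";
}

package main;

my $fido = Dog->new(name => "Fido");
print $fido->bark(QUANTITY => "once", QUALITY => "woof"), "\n";   # prints woof (once)
print $fido->bark(), "\n";                                        # prints yapping (nonstop)
```

Because the arguments land in a hash, callers can give them in any order and leave any of them out.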
Repeat Boolean expressions until false.
Use minimal matching when appropriate.
Use the /e modifier to evaluate a replacement expression:
#!/usr/bin/perl -p
1 while s/^(.*?)(\t+)/$1 . ' ' x (length($2) * 4 - length($1) % 4)/e;
This program fixes any file you receive from someone who mistakenly thinks they can redefine hardware tabs to occupy 4 spaces instead of 8. It makes use of several important idioms.
First, the 1 while idiom is handy when all the work you want to do in the loop is actually done by the conditional. (Perl is smart enough not to warn you that you're using 1 in a void context.) We have to repeat this substitution because each time we substitute some number of spaces in for tabs, we have to recalculate the column position of the next tab from the beginning.
The (.*?) matches the smallest string it can up until the first tab, using the minimal matching modifier (the question mark). In this case, we could have used an ordinary greedy * like this: ([^\t]*). But that only works because a tab is a single character, so we can use a negated character class to avoid running past the first tab. In general, the minimal matcher is much more elegant, and doesn't break if the next thing that must match happens to be longer than one character.
The /e modifier does a substitution using an expression rather than a mere string. This lets us do the calculations we need right when we need them.
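To check the arithmetic on a few sample lines, here is a sketch that moves the substitution into a hypothetical expand_tabs subroutine (tab stops every 4 columns, as in the program above). For a prefix of length L followed by n tabs, it inserts 4n - L % 4 spaces, which lands the next character on a multiple of 4.

```perl
use strict;
use warnings;

# Hypothetical wrapper: the -p one-liner applied to a single string.
sub expand_tabs {
    local $_ = shift;
    1 while s/^(.*?)(\t+)/$1 . ' ' x (length($2) * 4 - length($1) % 4)/e;
    return $_;
}

print expand_tabs("ab\tc"), "|\n";        # 2 spaces bring "c" to column 4
print expand_tabs("\t\tx"), "|\n";        # 8 spaces, then "x"
print expand_tabs("abcd\te\tf"), "|\n";   # 4 spaces, then 3 more after "e"
```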
Use creative formatting and comments on complex substitutions:
#!/usr/bin/perl -p
1 while s{
    ^               # anchor to beginning
    (               # start first subgroup
        .*?         # match minimal number of characters
    )               # end first subgroup
    (               # start second subgroup
        \t+         # match one or more tabs
    )               # end second subgroup
}
{
    my $spacelen = length($2) * 4;  # account for full tabs
    $spacelen -= length($1) % 4;    # account for the uneven tab
    $1 . ' ' x $spacelen;           # make correct number of spaces
}ex;
This is probably overkill, but some people find it more impressive than the previous one-liner. Go figure.
Go ahead and use $` if you feel like it:
1 while s/(\t+)/' ' x (length($1) * 4 - length($`) % 4)/e;
Here's the shorter version, which uses $`, a variable known to impact performance. Except that we're only using the length of it, so it doesn't really count as bad.
Use the offsets directly from the @- (@LAST_MATCH_START) and @+ (@LAST_MATCH_END) arrays:
1 while s/\t+/' ' x (($+[0] - $-[0]) * 4 - $-[0] % 4)/e;
This one's even shorter. (If you don't see any arrays there, try looking for array elements instead.) See @- and @+ in Chapter 28.
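To see what those array elements hold, and to check that the three variants agree, here is a sketch; the subroutine names via_groups, via_prematch, and via_offsets are hypothetical wrappers around the three substitutions above.

```perl
use strict;
use warnings;

my $line = "ab\t\tc";
if ($line =~ /\t+/) {
    # $-[0] is where the whole match starts; $+[0] is just past its end.
    print "start = ", $-[0], ", end = ", $+[0], "\n";    # prints start = 2, end = 4
}

sub via_groups {
    local $_ = shift;
    1 while s/^(.*?)(\t+)/$1 . ' ' x (length($2) * 4 - length($1) % 4)/e;
    return $_;
}

sub via_prematch {
    local $_ = shift;
    1 while s/(\t+)/' ' x (length($1) * 4 - length($`) % 4)/e;
    return $_;
}

sub via_offsets {
    local $_ = shift;
    1 while s/\t+/' ' x (($+[0] - $-[0]) * 4 - $-[0] % 4)/e;
    return $_;
}

for my $s ("ab\t\tc", "abcd\te\tf", "\tx") {
    printf "%s|%s|%s|\n", via_groups($s), via_prematch($s), via_offsets($s);
}
```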
Use eval with a constant return value:
sub is_valid_pattern {
    my $pat = shift;
    return eval { "" =~ /$pat/; 1 } || 0;
}
The eval {} operator doesn't have to return a value computed by the code it protects. Here we always return 1 if it gets to the end. However, if the pattern contained in $pat blows up, the eval catches it and returns undef to the Boolean conditional of the || operator, which turns it into a defined 0 (just to be polite, since undef is also false but might lead someone to believe that the is_valid_pattern subroutine is misbehaving, and we wouldn't want that, now would we?).
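A quick usage sketch of the same subroutine with a few sample patterns; the unbalanced parenthesis fails to compile at run time, which the eval turns into 0:

```perl
use strict;
use warnings;

sub is_valid_pattern {
    my $pat = shift;
    # eval yields 1 if the match compiles and runs, undef if it dies;
    # || 0 turns that undef into a defined 0.
    return eval { "" =~ /$pat/; 1 } || 0;
}

for my $pat ('a+b', '(unclosed', '[a-z]') {
    printf "%-10s %d\n", $pat, is_valid_pattern($pat);
}
```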
Use modules to do all the dirty work.
Use object factories.
Use callbacks.
Use stacks to keep track of context.
Use negative subscripts to access the end of an array or string:
use XML::Parser;
$p = new XML::Parser Style => 'subs';
setHandlers $p Char => sub { $out[-1] .= $_[1] };
push @out, "";
sub literal {
    $out[-1] .= "C<";
    push @out, "";
}
sub literal_ {
    my $text = pop @out;
    $out[-1] .= $text . ">";
}
…
This is a snippet from the 250-line program we used to translate the XML version of the old Camel book back into pod format so we could edit it for this edition with a Real Text Editor.
The first thing you'll notice is that we rely on the XML::Parser module (from CPAN) to parse our XML correctly, so we don't have to figure out how. That cuts a few thousand lines out of our program right there (presuming we're reimplementing in Perl everything XML::Parser does for us,[2] including translation from almost any character set into UTF-8).
XML::Parser uses a high-level idiom called an object factory. In this case, it's a parser factory. When we create an XML::Parser object, we tell it which style of parser interface we want, and it creates one for us. This is an excellent way to build a testbed application when you're not sure which kind of interface will turn out to be the best in the long run. The subs style is just one of XML::Parser's interfaces. In fact, it's one of the oldest interfaces, and probably not even the most popular one these days.
The setHandlers line shows a method call on the parser, not in arrow notation, but in "indirect object" notation, which lets you omit the parens on the arguments, among other things. The line also uses the named parameter idiom we saw earlier.
The line also shows another powerful concept, the notion of a callback. Instead of us calling the parser to get the next item, we tell it to call us. For named XML tags like <literal>, this interface style will automatically call a subroutine of that name (or the name with an underscore on the end for the corresponding end tag). But the data between tags doesn't have a name, so we set up a Char callback with the setHandlers method.
Next we initialize the @out array, which is a stack of outputs. We put a null string into it to represent that we haven't collected any text at the current tag embedding level (0 initially).
Now is when that callback comes back in. Whenever we see text, it automatically gets appended to the final element of the array, via the $out[-1] idiom in the callback.
At the outer tag level, $out[-1] is the same as $out[0], so $out[0] ends up with our whole output. (Eventually. But first we have to deal with tags.)
Suppose we see a <literal> tag. Then the literal subroutine gets called, appends some text to the current output, then pushes a new context onto the @out stack. Now any text up until the closing tag gets appended to that new end of the stack. When we hit the closing tag, we pop the $text we've collected back off the @out stack, and append the rest of the transmogrified data to the new (that is, the old) end of stack, the result of which is to translate the XML string, <literal>text</literal>, into the corresponding pod string, C<text>.
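Since XML::Parser comes from CPAN, here is a module-free sketch of just the stack discipline. The event sequence is hand-written to mimic what the subs style would deliver for one literal tag, and on_text is a hypothetical name standing in for the Char handler (which in the real interface receives the parser object first, hence $_[1] above).

```perl
use strict;
use warnings;

# A stack of partial outputs: $out[-1] is always the innermost level.
my @out = ("");

sub on_text  { $out[-1] .= shift }                   # append text at the current level
sub literal  { $out[-1] .= "C<"; push @out, "" }     # open a new level
sub literal_ {                                       # close it and fold it back in
    my $text = pop @out;
    $out[-1] .= $text . ">";
}

# Simulated events for: Use <literal>grep</literal> here.
on_text("Use ");
literal();
on_text("grep");
literal_();
on_text(" here.");

print "$out[0]\n";    # prints Use C<grep> here.
```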
The subroutines for the other tags are just the same, only different.
Use my without assignment to create an empty array or hash.
Split the default string on whitespace.
Assign to lists of variables to collect however many you want.
Use autovivification of undefined references to create them.
Autoincrement undefined array and hash elements to create them.
Use autoincrement of a %seen hash to determine uniqueness.
Assign to a handy my temporary in the conditional.
Use the autoquoting behavior of braces.
Use an alternate quoting mechanism to interpolate double quotes.
Use the ?: operator to switch between two arguments to a printf.
Line up printf args with their % field:
my %seen;
while (<>) {
    my ($a, $b, $c, $d) = split;
    print unless $seen{$a}{$b}{$c}{$d}++;
}
if (my $tmp = $seen{fee}{fie}{foe}{foo}) {
    printf qq(Saw "fee fie foe foo" [sic] %d time%s.\n),
                                          $tmp,  $tmp == 1 ? "" : "s";
}
These nine lines are just chock full of idioms. The first line makes an empty hash because we don't assign anything to it. We iterate over input lines setting "it", that is, $_, implicitly, then using an argumentless split which splits "it" on whitespace. Then we pick off the first four words with a list assignment, throwing any subsequent words away. Then we remember the first four words in a four-dimensional hash, which automatically creates (if necessary) the first three reference elements and final count element for the autoincrement to increment. (Under use warnings, the autoincrement will never warn that you're using undefined values, because autoincrement is an accepted way to define undefined values.) We then print out the line if we've never seen a line starting with these four words before, because the autoincrement is a postincrement, which, in addition to incrementing the hash value, will return the old true value if there was one.
After the loop, we test %seen again to see if a particular combination of four words was seen. We make use of the fact that we can put a literal identifier into braces and it will be autoquoted. Otherwise, we'd have to say $seen{"fee"}{"fie"}{"foe"}{"foo"}, which is a drag even when you're not running from a giant.
We assign the result of $seen{fee}{fie}{foe}{foo} to a temporary variable even before testing it in the Boolean context provided by the if. Because assignment returns its left value, we can still test the value to see if it was true. The my tells your eye that it's a new variable, and we're not testing for equality but doing an assignment. It would also work fine without the my, and an expert Perl programmer would still immediately notice that we used one = instead of two ==. (A semiskilled Perl programmer might be fooled, however. Pascal programmers of any skill level will foam at the mouth.)
Moving on to the printf statement, you can see the qq() form of double quotes we used so that we could interpolate ordinary double quotes as well as a newline. We could've directly interpolated $tmp there as well, since it's effectively a double-quoted string, but we chose to do further interpolation via printf. Our temporary $tmp variable is now quite handy, particularly since we don't just want to interpolate it, but also test it in the conditional of a ?: operator to see whether we should pluralize the word "time". Finally, note that we lined up the two fields with their corresponding % markers in the printf format. If an argument is too long to fit, you can always go to the next line for the next argument, though we didn't have to in this case.
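The same postincrement trick is most often seen on a flat list, where a %seen hash filters out duplicates while keeping first-seen order. A minimal sketch with a made-up word list:

```perl
use strict;
use warnings;

my @words = qw(fee fie fee foe fie foo fee);

# Postincrement returns the old count, so !$seen{$_}++ is true
# only the first time each word turns up.
my %seen;
my @unique = grep { !$seen{$_}++ } @words;

print "@unique\n";                       # prints fee fie foe foo
print "fee seen $seen{fee} times\n";     # prints fee seen 3 times
```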
Whew! Had enough? There are many more idioms we could discuss, but this book is already sufficiently heavy. But we'd like to talk about one more idiomatic use of Perl, the writing of program generators.