4.1. Syntactical Sugaring

Perl has an unjustified reputation as a write-only language. We suspect this is because programmers who have never seen it before think they should be able to understand a statement the first time they lay eyes on it, the way they could with a BASIC program. But Perl provides many concise idioms and syntactic shortcuts that make programs much shorter than they would otherwise be, and all you need to read them is enough familiarity with the idioms.

4.1.1. The Variable That Wasn't There

A frequent source of bafflement for newcomers is the way that Perl programs seem to be talking to themselves:

while (<STDIN>)
   {
   chomp;
   s/#.*//;
   next unless /\S/;
   # ...
   }

This apparently defies the usual paradigm of functions operating on variables. What on earth is being operated on?

The answer is the special variable $_, which is used mainly in its absence. Because it is a default variable for the chomp, s///, and m// functions, and also gets set to the next input line when a readline operator (<>) is the only thing inside a while clause, that code fragment (which appears to be parsing some data format that can contain embedded comments) can do all kinds of things with and to $_ without ever having to mention it.
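To make the invisible visible, here is a minimal sketch that runs the same loop twice: once leaning on $_, and once with every reference spelled out. The sample text, the in-memory filehandles, and the names $data, $line, @implicit, and @explicit are our own assumptions for illustration; reading from a string lets the sketch run without any input on STDIN.

```perl
use strict;
use warnings;

# Hypothetical sample input; an in-memory filehandle stands in for STDIN.
my $data = "alpha # first\n\n# only a comment\nbeta\n";

# Version 1: every operation works on $_ without naming it.
open my $in1, '<', \$data or die "Cannot open in-memory file: $!";
my @implicit;
while (<$in1>) {
    chomp;               # same as chomp($_)
    s/#.*//;             # same as $_ =~ s/#.*//
    next unless /\S/;    # same as $_ =~ /\S/
    push @implicit, $_;  # push has no default, so $_ must be named
}
close $in1;

# Version 2: the same loop with a named variable throughout.
open my $in2, '<', \$data or die "Cannot open in-memory file: $!";
my @explicit;
while (defined(my $line = <$in2>)) {
    chomp $line;
    $line =~ s/#.*//;
    next unless $line =~ /\S/;
    push @explicit, $line;
}
close $in2;

print "Both loops kept the same lines\n" if "@implicit" eq "@explicit";
```

Both loops keep only the lines that still contain something after the comments have been stripped.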

There are plenty of times you do have to type $_ , of course. If the previous code needed to add the current line to an array, it would go on to say push @commands, $_; since push doesn't take a default second argument. As a rule of thumb, the number of implicit references to $_ should make the explicit references worthwhile; if you have too many explicit references to $_, then you should probably make them all explicit references to a variable with a useful name. Otherwise you're increasing confusion without having saved enough typing to make it worth it.

If you have many explicit references to the same instance of $_, use a named variable instead.


There are also times when it's feasible to use $_ but not worth the trouble. For instance, consider nesting one loop that uses $_ as the loop variable around another:

my @list = qw(alpha);
$_ = "aleph";
foreach (@list)
   {
   while (<STDIN>)
      {
      # do something
      }
   }

When the smoke clears from feeding values to the loop reading lines from STDIN, what value will $_ have? If you answered "aleph", reasoning that $_ should have been saved while you were going around the loops, congratulations, you're right. But before you head off to celebrate, take a look at the contents of @list. It still contains one element, but its value is now undef instead of "alpha". Oops.

The reason is that the foreach loop aliased $_ to each element of its list in turn. Therefore anything done to $_ would get done to that array element.[1] And while the foreach loop saved (or localized) its loop variable of $_, the while loop didn't. So when it read a line from STDIN, it did so into the same $_ in use by foreach, which overwrote the array element.

[1] Because it's writable. Trying to change an aliased list element that isn't an lvalue triggers the error “Modification of a read-only value attempted.”
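If you would rather watch this happen than take our word for it, the following sketch reproduces the effect. It assumes an in-memory filehandle in place of STDIN, and the variable $data and its contents are invented for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical input; an in-memory filehandle stands in for STDIN.
my $data = "line one\nline two\n";
open my $in, '<', \$data or die "Cannot open in-memory file: $!";

my @list = qw(alpha);
$_ = "aleph";
foreach (@list) {
    # Here $_ is an alias for $list[0]. Each read assigns to it, and
    # the final read at end of input assigns undef.
    while (<$in>) {
        # do something
    }
}
close $in;

print '$_ is ', $_, "\n";
print '$list[0] is ', (defined $list[0] ? $list[0] : 'undef'), "\n";
```

The foreach loop restores the outer $_ when it finishes, but the array element it aliased has been overwritten.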

Yuck! What can we do about this? Is the solution to say that you shouldn't nest one loop that uses $_ as the loop variable inside another? Consider the following:

sub proc
   {
   while (<STDIN>)
      {
      # do something
      }
   }

Now do we have to check every place that calls proc() and its ancestors to see if it's inside a foreach loop? Hardly.

What we can do is remember that while is the only loop statement that doesn't localize $_, so we just need to do it ourselves whenever we construct a while statement that might end up in a context where $_ is already in use:

sub proc
   {
   local $_;
   while (<STDIN>)
      {
      # do something
      }
   }

Now you can safely call proc() from within a foreach loop to your heart's content.
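As a quick check, here is a sketch of a localizing proc() called from inside a foreach loop. To keep it runnable on its own, we assume this version takes its input as a string argument and reads it through an in-memory filehandle instead of STDIN; the names $text, $data, $count, and $total are ours.

```perl
use strict;
use warnings;

# Hypothetical variant of proc(): reads lines from a string, not STDIN.
sub proc {
    my ($text) = @_;
    local $_;    # give the while loop its own $_ to overwrite
    open my $in, '<', \$text or die "Cannot open in-memory file: $!";
    my $count = 0;
    while (<$in>) {
        $count++;    # do something
    }
    close $in;
    return $count;
}

my $data  = "line one\nline two\n";
my @list  = qw(alpha beta);
my $total = 0;
foreach (@list) {
    $total += proc($data);    # $_ is aliased to an element of @list here
}

print "Read $total lines; list is still (@list)\n";
```

Because the while loop now works on a localized $_, the elements of @list come through unscathed.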

When using a while loop that sets $_, localize $_ first if it might be called from elsewhere.


Note that when we say localize, this really does call for the local operator, which we otherwise prefer to avoid. Reason: you can't make a lexical $_ with my, because it's a special pseudo-global variable that just won't stand for possessiveness.

4.1.2. The Case of the Vanishing Parentheses

Unlike practically every other procedural language in existence, when you call a function or procedure in Perl (both of which are just “subroutines” in Perl parlance), it is not always necessary to put its arguments inside parentheses. If Perl can tell what the arguments are without the parentheses, you don't have to put them in. This makes for more readable code, not to mention less typing; for instance

print join ' ', reverse split /:/;

is arguably more aesthetic than

print(join(' ', reverse(split(/:/))));

and stops you from feeling like you're typing LISP.

On the other hand, be careful not to leave them out when they're necessary:

my %userdb = (pjscott => "Peter Scott",
              ewright => "Ed Wright");
print "Usernames: ", join " ", sort keys %userdb, "\n";

results in

Usernames:
 ewright pjscott

with no newline on the end. This happens because the syntax for sort is

sort optional_subroutine_or_block LIST

and so how is Perl to know where the LIST ends? As long as there are terms that look like list elements, they'll get included, and that means the "\n" is sorted along with the hash keys. So you'll actually have to bend your fingers over to the parenthesis keys in this case and type

print "Usernames: ", join " ", sort (keys %userdb), "\n";

In fact, there's more than one way to do this, and it's instructive to look at some of the other possibilities:

print "Usernames: ", join " ", (sort keys %userdb), "\n";
print "Usernames: ", join (" ", sort keys %userdb), "\n";
print "Usernames: ", (join " ", sort keys %userdb), "\n";

We're using two different ways of limiting the scope of the sort arguments here. One is to put the arguments to a function in parentheses; the other is to put the function name along with its arguments in parentheses. We get the same result by two different means: When the function arguments are contained in parentheses, Perl exercises a parsing rule called “if-it-looks-like-a-function-then-it-is-a-function,” and the parentheses delineate the function argument list. When the function and its arguments are contained within parentheses, the parentheses create a term, keeping the sort from consuming any arguments outside the term (see Table 4-1 in section 4.2).
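A small sketch can confirm that the two kinds of parentheses rein in sort equally well. We build strings with join rather than printing them so that the results can be compared; the variable names $greedy, $args, and $term are our own.

```perl
use strict;
use warnings;

my %userdb = (pjscott => "Peter Scott",
              ewright => "Ed Wright");

# No parentheses: sort swallows the "\n" along with the keys, and
# since a newline sorts before any letter it ends up in front.
my $greedy = join " ", sort keys %userdb, "\n";

# Parentheses around sort's arguments: looks like a function call.
my $args = join " ", sort(keys %userdb), "\n";

# Parentheses around sort and its arguments: a term.
my $term = join " ", (sort keys %userdb), "\n";

print "Newline sorted to the front\n" if $greedy =~ /^\n/;
print "The two parenthesized forms agree\n" if $args eq $term;
```

In both parenthesized forms the newline is still an argument to join, which is why it arrives with a space in front of it.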

What about this:

print ("Usernames: ", join " ", sort keys %userdb), "\n";

If you try this without the -w flag, you'll see that the newline isn't printed. If you put the -w flag in, you find out why:

print (...) interpreted as function at line 1.
Useless use of a constant in void context at line 1.

The reason for this is apparent from the rules we've already stated: Perl is exercising the if-it-looks-like-a-function-then-it-is-a-function rule, which means that the arguments to the print function are contained within the parentheses, and what follows is a comma operator and a constant. Since that constant isn't being assigned or passed to anything, it serves no purpose, and Perl warns us that we're putting a constant in a place where it does nothing.[2]

[2] If you're wondering why Perl doesn't raise a similar objection to the “1” that you should put at the end of a module that gets used or required, the answer is that it is at the end of a notional block—the file itself—and therefore gets parsed as a return value.

If you find yourself confronting that warning but you really did mean what you said and you want the warning to go away, you can put a + in front of the parentheses:

print +("Usernames: ", join " ", sort keys %userdb), "\n";

which doesn't change what gets printed but does create the interpretation you want. Usually, however, nothing so unintuitive is called for. If you want to execute multiple statements without having to put them in a block, the low-precedence logical and is perfect:

warn "We're out of $food\n" and return
     unless exists $menu{$food};

This doesn't need any parentheses at all. (But make sure the statement before the and is bound to evaluate as true.)

Here's another example of the same problem: trying to populate a hash from an array with this code:

my @nums = 1..100;
my %square = map ($_, $_ * $_), @nums;

causes the complaint Useless use of private array in void context because while the programmer wanted the map expression to return a two-element list, the looks-like-a-function rule made the first argument $_ and the second argument $_ * $_, thereby leaving poor @nums out in the cold, hard space of void context. This also can be cured by placing a + in front of the parentheses.
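Here is a sketch of the cure, alongside the block form of map that many programmers would reach for instead. The names %square_plus and %square_block are ours, and the block form is our own suggested alternative to the + fix.

```perl
use strict;
use warnings;

my @nums = 1..100;

# The unary plus stops the parentheses from being read as map's
# argument list, so @nums is once again the list being mapped over.
my %square_plus = map +($_, $_ * $_), @nums;

# An alternative: hand map a block that returns the two-element list.
my %square_block = map { ($_ => $_ * $_) } @nums;

print "12 squared is $square_plus{12}\n";
```

Either way, each pass through map returns a key and its value, and the hash ends up with one entry per number.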

One place it helps to put parentheses in is where you're passing no arguments to a function that can take some. Consider a subroutine for converting Celsius to Fahrenheit:

use constant RATIO => 9/5;
sub c2f
   {
   return shift * RATIO + 32;
   }

Compile this and you'll see

Type of arg 1 to shift must be array (not addition (+))

Incidentally, if you parenthesize it as return (shift * RATIO) + 32; you get an even more interesting error,

Type of arg 1 to shift must be array (not ref-to-glob cast)

What's happening is that Perl thinks that what follows shift may be an argument for it, since it looks like the beginning of a token that could be one.

With experience—or table 3-3, “Ambiguous Characters” in the third edition of Programming Perl (Wall, et al., O'Reilly, 2000)—you'll learn where you don't need to apply this solution:

Put empty parentheses after a function that could take arguments but doesn't.[3]

[3] There is one ugly exception to this rule, the eof function. If you ever need to use it, look in perlfunc first to see what we mean.


In particular, it's always safe to leave them off if the function call is the last thing in the statement.
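Applied to the temperature converter, the tip looks like the following sketch. The c2f subroutine is the one from above with the empty parentheses added; the reverse conversion f2c is a hypothetical companion we've added to show a shift that is safe to leave bare.

```perl
use strict;
use warnings;
use constant RATIO => 9/5;

sub c2f {
    # The empty parentheses tell Perl that shift takes no explicit
    # argument, so the * can only be multiplication.
    return shift() * RATIO + 32;
}

# Hypothetical companion subroutine.
sub f2c {
    my $fahrenheit = shift;    # last thing in the statement: safe bare
    my $celsius = ($fahrenheit - 32) / RATIO;
    return $celsius;
}

printf "%.1f C is %.1f F\n", 100, c2f(100);
printf "%.1f F is %.1f C\n", 212, f2c(212);
```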

4.1.3. The Many Faces of {}

When you think about the myriad uses of braces,[4] it's amazing that Perl can keep them all straight. They're used for delineating code blocks, hash keys, anonymous hash references, and regular expression quantifiers, and can also be used as delimiters for the q, qq, qr, qw, qx, m//, s///, and tr// operators. Those code blocks are used in many places, like subroutine declarations, sort or map routines, or even part of a variable identifier:

[4] In Perl. We don't want to know what else you use for holding your pants up or your teeth together.

print "Three largest cities:
      @{$state_info{NY}{Megalopoles}}[0..2]";

The amazing thing is that Perl has so many uses for braces yet hardly ever needs help telling which one you mean. Because there are so many ways in which braces are valid in Perl, though, incorrect use may generate no message or something cryptic. The most common brace mistake for newcomers is hash initializations:

my %queen = { Britain     => 'Elizabeth',
              Netherlands => 'Beatrix' };  # Wrong!

This is so common that it is specifically checked for by -w, which gives the warning Reference found where even-sized list expected.

(No message is output without -w; before you get much further in this book you will be convinced that no Perl program should be allowed to run without it.)

This isn't an outright error, because it is, in fact, syntactically valid: it creates a hash with one key for which the value is undef. That key is the stringified form of a reference to an anonymous hash, so it looks something like “HASH(0xbd864).” The anonymous hash it refers to is the one the newcomer intended to create but got confused about when to use braces in connection with hashes. (The right-hand side should be a list, and literal lists are surrounded by parentheses.)
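The following sketch puts the right and wrong forms side by side. The names $queen_ref, %oops, and $only_key are ours, and we silence the warning in a small block (assuming it belongs to the misc warnings category) only so that the mistake can be inspected quietly.

```perl
use strict;
use warnings;

# A hash is initialized from a list, so the right-hand side takes
# parentheses.
my %queen = (Britain     => 'Elizabeth',
             Netherlands => 'Beatrix');

# Braces build an anonymous hash and return a reference to it,
# which belongs in a scalar.
my $queen_ref = { Britain     => 'Elizabeth',
                  Netherlands => 'Beatrix' };

# The newcomer's mistake, with its warning turned off for the demo.
my %oops;
{
    no warnings 'misc';
    %oops = { Britain => 'Elizabeth' };
}
my ($only_key) = keys %oops;

print "Hash lookup:      $queen{Britain}\n";
print "Reference lookup: $queen_ref->{Netherlands}\n";
print "Accidental key:   $only_key\n";
```

The accidental key is only a string; the anonymous hash it once pointed at can no longer be reached through it.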

The most common place where necessary braces are omitted is in dereferencing compound constructs such as the “three largest cities” example. If we'd left out one set of braces and written the variable as

@$state_info{NY}{Megalopoles}[0..2]

we would have triggered the compilation error

Can't use subscript on hash slice

which gives us a clue that Perl is trying to parse it as though it were

@{$state_info}{NY}{Megalopoles}[0..2]

which means, “Treat $state_info as a reference to a hash, return the slice of that hash with the single key NY, then take the element of that with the key Megalopoles—whoa, the thing we're trying to subscript is not a hash!”
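For comparison, here is a sketch of the fully braced form at work. The contents of %state_info are invented purely to give the expression something to chew on, and the second approach, which copies the reference into a scalar named $cities first, is our own suggestion.

```perl
use strict;
use warnings;

# Hypothetical data, just enough structure to exercise the expression.
my %state_info = (
    NY => { Megalopoles => [ 'New York', 'Buffalo', 'Rochester', 'Yonkers' ] },
);

# The braces enclose the whole expression that yields the array
# reference; the leading @ and trailing [0..2] then slice that array.
my @top3 = @{ $state_info{NY}{Megalopoles} }[0..2];

# Taking the reference in a separate step can be easier on the eyes.
my $cities = $state_info{NY}{Megalopoles};
my @same   = @$cities[0..2];

print "Three largest cities: @top3\n";
```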
