Just as there are numerous ways to create references, there are also several ways to use, or dereference, a reference. There is just one overriding principle: Perl does no implicit referencing or dereferencing.[4] When a scalar is holding a reference, it always behaves like a simple scalar. It doesn't magically start being an array or hash or subroutine; you have to tell it explicitly to do so, by dereferencing it.
When you encounter a scalar like $foo, you should be thinking "the scalar value of foo." That is, there's a foo entry in the symbol table, and the $ funny character is a way of looking at whatever scalar value might be inside. If what's inside is a reference, you can look inside that (dereferencing $foo) by prepending another funny character. Or looking at it the other way around, you can replace the literal foo in $foo with a scalar variable that points to the actual referent. This is true of any variable type, so not only is $$foo the scalar value of whatever $foo refers to, but @$bar is the array value of whatever $bar refers to, %$glarch is the hash value of whatever $glarch refers to, and so on. The upshot is that you can put an extra funny character on the front of any simple scalar variable to dereference it:
$foo         = "three humps";
$scalarref   = \$foo;         # $scalarref is now a reference to $foo
$camel_model = $$scalarref;   # $camel_model is now "three humps"
Here are some other dereferences:
$bar = $$scalarref;
push(@$arrayref, $filename);
$$arrayref[0] = "January";                     # Set the first element of @$arrayref
@$arrayref[4..6] = qw/May June July/;          # Set several elements of @$arrayref
%$hashref = (KEY => "RING", BIRD => "SING");   # Initialize whole hash
$$hashref{KEY} = "VALUE";                      # Set one key/value pair
@$hashref{"KEY1","KEY2"} = ("VAL1","VAL2");    # Set two more pairs
&$coderef(1,2,3);
print $handleref "output\n";
This form of dereferencing can only make use of a simple scalar variable (one without a subscript). That is, dereferencing happens before (or binds tighter than) any array or hash lookups. Let's use some braces to clarify what we mean: an expression like $$arrayref[0] is equivalent to ${$arrayref}[0] and means the first element of the array referred to by $arrayref. That is not at all the same as ${$arrayref[0]}, which is dereferencing the first element of the (probably nonexistent) array named @arrayref. Likewise, $$hashref{KEY} is the same as ${$hashref}{KEY}, and has nothing to do with ${$hashref{KEY}}, which would be dereferencing an entry in the (probably nonexistent) hash named %hashref. You will be miserable until you understand this.
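To make the distinction concrete, here is a small sketch. The data is made up for illustration: we deliberately declare both a scalar $arrayref and an unrelated array @arrayref so that both spellings have something to look at.

```perl
use strict;
use warnings;

# Hypothetical data: a scalar holding an array reference, and a separate
# array of the same base name whose first element is a scalar reference.
my $arrayref = [ "January", "February" ];
my @arrayref = ( \"not the same thing" );

my $via_ref   = $$arrayref[0];       # same as ${$arrayref}[0]
my $via_array = ${ $arrayref[0] };   # dereferences element 0 of @arrayref

print "$via_ref\n";     # element 0 of the array that $arrayref refers to
print "$via_array\n";   # the string that $arrayref[0] refers to
```

The two expressions differ only in where the braces sit, yet they consult entirely different variables.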
You can achieve multiple levels of referencing and dereferencing by concatenating the appropriate funny characters. The following prints "howdy":
$refrefref = \\\"howdy";
print $$$$refrefref;
You can think of the dollar signs as operating right to left. But the beginning of the chain must still be a simple, unsubscripted scalar variable. There is, however, a way to get fancier, which we already sneakily used earlier, and which we'll explain in the next section.
Not only can you dereference a simple variable name, you can also dereference the contents of a BLOCK. Anywhere you'd put an alphanumeric identifier as part of a variable or subroutine name, you can replace the identifier with a BLOCK returning a reference of the correct type. In other words, the earlier examples could all be disambiguated like this:
$bar = ${$scalarref};
push(@{$arrayref}, $filename);
${$arrayref}[0] = "January";
@{$arrayref}[4..6] = qw/May June July/;
${$hashref}{"KEY"} = "VALUE";
@{$hashref}{"KEY1","KEY2"} = ("VAL1","VAL2");
&{$coderef}(1,2,3);
not to mention:
$refrefref = \\\"howdy";
print ${${${$refrefref}}};
Admittedly, it's silly to use the braces in these simple cases, but the BLOCK can contain any arbitrary expression. In particular, it can contain subscripted expressions. In the following example, $dispatch{$index} is assumed to contain a reference to a subroutine (sometimes called a "coderef"). The example invokes the subroutine with three arguments.
&{ $dispatch{$index} }(1, 2, 3);
Here, the BLOCK is necessary. Without that outer pair of braces, Perl would have treated $dispatch as the coderef instead of $dispatch{$index}.
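A dispatch table of this kind might be set up as in the following sketch. The keys add and max and their subroutine bodies are our own invention; only the &{ $dispatch{$index} }(...) call form comes from the text.

```perl
use strict;
use warnings;

# Hypothetical dispatch table: each value is a reference to an
# anonymous subroutine.
my %dispatch = (
    add => sub { my $sum = 0; $sum += $_ for @_; return $sum },
    max => sub { my $m = shift; for (@_) { $m = $_ if $_ > $m } return $m },
);

my $index = "add";

# The BLOCK evaluates the subscripted expression first, and only then
# does the leading & dereference the resulting coderef.
my $total   = &{ $dispatch{$index} }(1, 2, 3);
my $largest = &{ $dispatch{max} }(1, 2, 3);

print "$total $largest\n";
```

Written without the braces, &$dispatch{$index}(1, 2, 3) would look for a coderef in a scalar named $dispatch, which does not exist here.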
For references to arrays, hashes, or subroutines, a third method of dereferencing involves the use of the -> infix operator. This form of syntactic sugar makes it easier to get at individual array or hash elements, or to call a subroutine indirectly.
The type of the dereference is determined by the right operand, that is, by what follows directly after the arrow. If the next thing after the arrow is a bracket or brace, the left operand is treated as a reference to an array or a hash, respectively, to be subscripted by the expression on the right. If the next thing is a left parenthesis, the left operand is treated as a reference to a subroutine, to be called with whatever parameters you supply in the parentheses on the right.
Each of these next trios is equivalent, corresponding to the three notations we've introduced. (We've inserted some spaces to line up equivalent elements.)
$  $arrayref  [2] = "Dorian";         #1
${ $arrayref }[2] = "Dorian";         #2
   $arrayref->[2] = "Dorian";         #3

$  $hashref  {KEY} = "F#major";       #1
${ $hashref }{KEY} = "F#major";       #2
   $hashref->{KEY} = "F#major";       #3

&  $coderef  (Presto => 192);         #1
&{ $coderef }(Presto => 192);         #2
   $coderef->(Presto => 192);         #3
You can see that the initial funny character is missing from the third notation in each trio. The funny character is guessed at by Perl, which is why it can't be used to dereference complete arrays, complete hashes, or slices of either. As long as you stick with scalar values, though, you can use any expression to the left of the ->, including another dereference, because multiple arrow operators associate left to right:
print $array[3]->{"English"}->[0];
You can deduce from this expression that the fourth element of @array is intended to be a hash reference, and the value of the "English" entry in that hash is intended to be an array reference.
Note that $array[3] and $array->[3] are not the same. The first is talking about the fourth element of @array, while the second one is talking about the fourth element of the (possibly anonymous) array whose reference is contained in $array.
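The following sketch shows the two side by side. The contents of @array and $array are invented for the purpose; the only point is that the two variables are unrelated despite sharing a name.

```perl
use strict;
use warnings;

# Hypothetical data: a named array, and an unrelated scalar that holds
# a reference to a different (anonymous) array.
my @array = ( "zero", "one", "two", { English => [ "January", "February" ] } );
my $array = [ "w", "x", "y", "z" ];

my $from_named_array = $array[3]->{"English"}->[0];   # starts from @array
my $from_reference   = $array->[3];                   # starts from $array

# The three notations described above all reach the same element.
my @same = ( $$array[3], ${$array}[3], $array->[3] );

print "$from_named_array $from_reference @same\n";
```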
Suppose now that $array[3] is undefined. The following statement is still legal:
$array[3]->{"English"}->[0] = "January";
This is one of those cases mentioned earlier in which references spring into existence (or "autovivify") when used as an lvalue (that is, when a value is being assigned to it). If $array[3] was undefined, it's automatically defined as a hash reference so that we can set a value for $array[3]->{"English"} in it. Once that's done, $array[3]->{"English"} is automatically defined as an array reference so that we can assign something to the first element in that array. Note that rvalues are a little different: print $array[3]->{"English"}->[0] only defines $array[3] and $array[3]->{"English"}, not $array[3]->{"English"}->[0], since the final element is not an lvalue. (The fact that it defines the first two at all in an rvalue context could be considered a bug. We may fix that someday.)
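A short sketch can confirm both halves of that description. The second array, @other, is our own addition, used only so that the rvalue case starts from a clean slate.

```perl
use strict;
use warnings;

# Lvalue use: every missing level is created on the way down.
my @array;
$array[3]->{"English"}->[0] = "January";
my $outer_type = ref $array[3];                  # a hash reference now
my $inner_type = ref $array[3]->{"English"};     # an array reference now

# Rvalue use: the two intermediate levels are still created,
# but the final element is not.
my @other;
my $value        = $other[3]->{"English"}->[0];  # merely reading
my $rvalue_outer = ref $other[3];
my $rvalue_inner = ref $other[3]->{"English"};
my $inner_count  = scalar @{ $other[3]->{"English"} };

print "$outer_type $inner_type $rvalue_outer $rvalue_inner $inner_count\n";
```

If the side effects of merely reading a nested structure bother you, test each level with exists or defined before descending into it.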
The arrow is optional between brackets or braces, or between a closing bracket or brace and a parenthesis for an indirect function call. So you can shrink the previous code down to:
$dispatch{$index}(1, 2, 3);
$array[3]{"English"}[0] = "January";
In the case of ordinary arrays, this gives you multidimensional arrays that are just like C's arrays:
$answer[$x][$y][$z] += 42;
Well, okay, not entirely like C's arrays. For one thing, C doesn't know how to grow its arrays on demand, while Perl does. Also, some constructs that are similar in the two languages parse differently. In Perl, the following two statements do the same thing:
$listref->[2][2] = "hello";    # Pretty clear
$$listref[2][2]  = "hello";    # A bit confusing
The second of these statements may disconcert the C programmer, who is accustomed to using *a[i] to mean "what's pointed to by the ith element of a". But in Perl, the five characters ($ @ * % &) effectively bind more tightly than braces or brackets.[5] Therefore, it is $$listref and not $listref[2] that is taken to be a reference to an array. If you want the C behavior, either you have to write ${$listref[2]} to force the $listref[2] to get evaluated before the leading $ dereferencer, or you have to use the -> notation:
$listref[2]->[$greeting] = "hello";
If a reference happens to be a reference to an object, then the class that defines that object probably provides methods to access the innards of the object, and you should generally stick to those methods if you're merely using the class (as opposed to implementing it). In other words, be nice, and don't treat an object like a regular reference, even though Perl lets you when you really need to. Perl does not enforce encapsulation. We are not totalitarians here. We do expect some basic civility, however.
In return for this civility, you get complete orthogonality between objects and data structures. Any data structure can behave as an object when you want it to. Or not, when you don't.
A pseudohash is any reference to an array whose first element is a reference to a hash. You can treat the pseudohash reference as either an array reference (as you would expect) or a hash reference (as you might not expect). Here's an example of a pseudohash:
$john = [ {age => 1, eyes => 2, weight => 3}, 47, "brown", 186 ];
The underlying hash in $john->[0] defines the names ("age", "eyes", "weight") of the array elements that follow (47, "brown", 186). Now you can access an element with both hash and array notations:
$john->{weight}     # Treats $john as a hashref
$john->[3]          # Treats $john as an arrayref
Pseudohash magic is not deep; it only knows one "trick": how to turn a hash dereference into an array dereference. When adding another element to a pseudohash, you have to explicitly tell the underlying mapping hash where the element will reside before you can use the hash notation:
$john->[0]{height} = 4;         # height is to be element 4
$john->{height} = "tall";       # Or $john->[4] = "tall"
Perl raises an exception if you try to delete a key from a pseudohash, although you can always delete keys from the mapping hash. Perl also raises an exception if you try to access a nonexistent key, where "existence" means presence in the mapping hash:
delete $john->[0]{height};      # Deletes from the underlying hash only
$john->{height};                # This now raises an exception
$john->[4];                     # Still prints "tall"
Don't try to splice the array unless you know what you're doing. If the array elements move around, the mapping hash values will still refer to the old element positions, unless you change those explicitly, too. Pseudohash magic is not deep.
To avoid inconsistencies, you can use the fields::phash function provided by the use fields pragma to create a pseudohash:
use fields;
$ph = fields::phash(age => 47, eyes => "brown", weight => 186);
print $ph->{age};
There are two ways to check for the existence of a key in a pseudohash. The first is to use exists, which checks whether the given field has ever been set. It acts this way to match the behavior of a real hash. For instance:
use fields;
$ph = fields::phash([qw(age eyes weight)], [47]);
$ph->{eyes} = undef;

print exists $ph->{age};        # True, 'age' was set in declaration.
print exists $ph->{weight};     # False, 'weight' has not been used.
print exists $ph->{eyes};       # True, your 'eyes' have been touched.
The second way is to use exists on the mapping hash sitting in the first array element. This checks whether the given key is a valid field for that pseudohash:
print exists $ph->[0]{age};     # True, 'age' is a valid field
print exists $ph->[0]{name};    # False, 'name' can't be used
Unlike what happens in a real hash, calling delete on a pseudohash element deletes only the array value corresponding to the key, not the real key in the mapping hash. To delete the key, you have to explicitly delete it from the mapping hash. Once you do that, you may no longer use that key name as a pseudohash subscript:
print delete $ph->{age};        # Removes and returns $ph->[1], 47
print exists $ph->{age};        # Now false
print exists $ph->[0]{age};     # True, 'age' key still usable
print delete $ph->[0]{age};     # Now 'age' key is gone
print $ph->{age};               # Run-time exception
You've probably begun to wonder what could possibly have motivated this masquerade of arrays prancing about in hashes' clothing. Arrays provide faster lookups and more efficient storage, while hashes offer the convenience of naming (instead of numbering) your data; pseudohashes provide the best of both worlds. But it's not until you consider Perl's compilation phase that the greatest benefit becomes apparent. With the help of a pragma or two, the compiler can verify proper access to valid fields, so you can find out about nonexistent subscripts (or spelling errors) before your program starts to run.
Pseudohashes' properties of speed, efficiency, and compile-time access checking (you might even think of it as type safety) are especially handy for creating efficient and robust class modules. See the discussion of the use fields pragma in Chapter 12 and Glossary.
Pseudohashes are a new and relatively experimental feature; as such, the underlying implementation may well change in the future. To protect yourself from such changes, always go through the fields module's documented interface via its phash and new functions.
As mentioned earlier, the backslash operator is usually used on a single referent to generate a single reference, but it doesn't have to be. When used on a list of referents, it produces a list of corresponding references. The second line of the following example does the same thing as the first line, since the backslash is automatically distributed throughout the whole list.
@reflist = (\$s, \@a, \%h, \&f);    # List of four references
@reflist = \($s, @a, %h, &f);       # Same thing
If a parenthesized list contains exactly one array or hash, then all of its values are interpolated and references to each returned:
@reflist = \(@x);                   # Interpolate array, then get refs
@reflist = map { \$_ } @x;          # Same thing
This also occurs when there are internal parentheses:
@reflist = \(@x, (@y));             # But only single aggregates expand
@reflist = (\@x, map { \$_ } @y);   # Same thing
If you try this with a hash, the result will contain references to the values (as you'd expect), but references to copies of the keys (as you might not expect).
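The distribution rules for a backslashed list can be checked with a sketch like this one. The variable names are placeholders of our own choosing.

```perl
use strict;
use warnings;

my ($s, @a, %h);

# Several referents in the list: one reference per referent.
my @reflist = \($s, @a, %h);
my @types   = map { ref($_) } @reflist;

# Exactly one array in the parentheses: a reference to each element.
my @x    = (1, 2, 3);
my @refs = \(@x);
${ $refs[0] } = 10;         # writes through to $x[0]

print "@types\n";
print scalar(@refs), " element references; \@x is now @x\n";
```

Because the element references point at the real elements rather than copies, assigning through one of them changes the original array.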
Since array and hash slices are really just lists, you can backslash a slice of either of these to get a list of references. Each of the next three lines does exactly the same thing:
@envrefs = \@ENV{'HOME', 'TERM'};           # Backslashing a slice
@envrefs = \( $ENV{HOME},  $ENV{TERM} );    # Backslashing a list
@envrefs = ( \$ENV{HOME}, \$ENV{TERM} );    # A list of two references
Since functions can return lists, you can apply a backslash to them. If you have more than one function to call, first interpolate each function's return values into a larger list and then backslash the whole thing:
@reflist = \fx();
@reflist = map { \$_ } fx();                # Same thing

@reflist = \( fx(), fy(), fz() );
@reflist = ( \fx(), \fy(), \fz() );         # Same thing
@reflist = map { \$_ } fx(), fy(), fz();    # Same thing
The backslash operator always supplies a list context to its operand, so those functions are all called in list context. If the backslash is itself in scalar context, you'll end up with a reference to the last value of the list returned by the function:
@reflist = \localtime();        # Ref to each of nine time elements
$lastref = \localtime();        # Ref to whether it's daylight savings time
In this regard, the backslash behaves like the named Perl list operators, such as print, reverse, and sort, which always supply a list context on their right no matter what might be happening on their left. As with named list operators, use an explicit scalar to force what follows into scalar context:
$dateref = \scalar localtime();    # \"Sat Jul 16 11:42:18 2000"
You can use the ref operator to determine what a reference is pointing to. Think of ref as a "typeof" operator that returns true if its argument is a reference and false otherwise. The value returned depends on the type of thing referenced. Built-in types include SCALAR, ARRAY, HASH, CODE, GLOB, REF, LVALUE, IO, IO::Handle, and Regexp. Here, we use it to check subroutine arguments:
sub sum {
    my $arrayref = shift;
    warn "Not an array reference" if ref($arrayref) ne "ARRAY";
    return eval join("+", @$arrayref);
}
If you use a hard reference in a string context, it'll be converted to a string containing both the type and the address: SCALAR(0x1fc0e). (The reverse conversion cannot be done, since reference count information is lost during stringification--and also because it would be dangerous to let programs access a memory address named by an arbitrary string.)
You can use the bless operator to associate a referent with a package functioning as an object class. When you do this, ref returns the class name instead of the internal type. An object reference used in a string context returns a string with the external and internal types, and the address in memory: MyType=HASH(0x20d10) or IO::Handle=IO(0x186904). See Chapter 12 for more details about objects.
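As a quick illustration of what ref reports, consider the sketch below. The Doggie class name is borrowed from a later example purely as a label; no such package needs to exist for bless to accept it.

```perl
use strict;
use warnings;

# ref on a variety of unblessed references.
my @kinds = map { ref($_) }
    ( \"text", [ 1, 2 ], { a => 1 }, sub { 1 }, \\"nested", qr/x/ );

# ref on something that is not a reference yields a false value.
my $not_a_ref = ref("just a string");

# After bless, ref reports the class name instead of the internal type,
# and the stringified form shows both.
my $obj    = bless { name => "Lucky" }, "Doggie";
my $class  = ref($obj);
my $string = "$obj";

print "@kinds\n";
print "$class $string\n";
```

The address printed in the stringified form will differ from run to run, so don't rely on its value.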
Since the way in which you dereference something always indicates what sort of referent you're looking for, a typeglob can be used the same way a reference can, despite the fact that a typeglob contains multiple referents of various types. So ${*main::foo} and ${\$main::foo} both access the same scalar variable, although the latter is more efficient.
Here's a trick for interpolating the return value of a subroutine call into a string:
print "My sub returned @{[ mysub(1,2,3) ]} that time.\n";
It works like this. At compile time, when the @{…} is seen within the double-quoted string, it's parsed as a block that returns a reference. Within the block, there are square brackets that create a reference to an anonymous array from whatever is in the brackets. So at run time, mysub(1,2,3) is called in list context, and the results are loaded into an anonymous array, a reference to which is then returned within the block. That array reference is then immediately dereferenced by the surrounding @{…}, and the array value is interpolated into the double-quoted string just as an ordinary array would be. This chicanery is also useful for arbitrary expressions, such as:
print "We need @{ [$n + 5] } widgets!\n";
Be careful though: square brackets supply a list context to their expression. In this case it doesn't matter, although the earlier call to mysub might care. When it does matter, use an explicit scalar to force the context:
print "mysub returns @{ [scalar mysub(1,2,3)] } now.\n";
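To watch the context change, you could define mysub so that it reports how it was called. This definition is a stand-in of ours, built on the standard wantarray function; the text does not say what the real mysub does.

```perl
use strict;
use warnings;

# Hypothetical mysub that reports the context it was called in.
sub mysub { return wantarray ? "list" : "scalar" }

my $n = 3;

my $default = "mysub saw @{[ mysub(1, 2, 3) ]} context";
my $forced  = "mysub saw @{[ scalar mysub(1, 2, 3) ]} context";
my $widgets = "We need @{[ $n + 5 ]} widgets!";

print "$default\n$forced\n$widgets\n";
```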
Earlier we talked about creating anonymous subroutines with a nameless sub {}. You can think of those subroutines as defined at run time, which means that they have a time of generation as well as a location of definition. Some variables might be in scope when the subroutine is created, and different variables might be in scope when the subroutine is called.
Forgetting about subroutines for a moment, consider a reference that refers to a lexical variable:
{
    my $critter = "camel";
    $critterref = \$critter;
}
The value of $$critterref will remain "camel" even though $critter disappears after the closing curly brace. But $critterref could just as well have referred to a subroutine that refers to $critter:
{
    my $critter = "camel";
    $critterref = sub { return $critter };
}
This is a closure, which is a notion out of the functional programming world of LISP and Scheme.[6] It means that when you define an anonymous function in a particular lexical scope at a particular moment, it pretends to run in that scope even when later called from outside that scope. (A purist would say it doesn't have to pretend--it actually does run in that scope.)
In other words, you are guaranteed to get the same copy of a lexical variable each time, even if other instances of that lexical variable have been created before or since for other instances of that closure. This gives you a way to set values used in a subroutine when you define it, not just when you call it.
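A counter generator is perhaps the smallest demonstration that each closure keeps its own copy of a lexical. The make_counter function here is a hypothetical example of ours, not something defined elsewhere in this chapter.

```perl
use strict;
use warnings;

# Each call creates a fresh $count, and the returned closure holds on
# to that particular instance.
sub make_counter {
    my $count = shift;
    return sub { return $count++ };
}

my $c1 = make_counter(10);
my $c2 = make_counter(100);

# $c1 and $c2 advance independently of each other.
my @seen = ( $c1->(), $c1->(), $c2->(), $c1->() );

print "@seen\n";
```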
You can also think of closures as a way to write a subroutine template without using eval. The lexical variables act as parameters for filling in the template, which is useful for setting up little bits of code to run later. These are commonly called callbacks in event-based programming, where you associate a bit of code with a keypress, mouse click, window exposure, and so on. When used as callbacks, closures do exactly what you expect, even if you don't know the first thing about functional programming. (Note that this closure business only applies to my variables. Global variables work as they've always worked, since they're neither created nor destroyed the way lexical variables are.)
Another use for closures is within function generators; that is, functions that create and return brand new functions. Here's an example of a function generator implemented with closures:
sub make_saying {
    my $salute = shift;
    my $newfunc = sub {
        my $target = shift;
        print "$salute, $target!\n";
    };
    return $newfunc;            # Return a closure
}
$f = make_saying("Howdy");      # Create a closure
$g = make_saying("Greetings");  # Create another closure
# Time passes…
$f->("world");
$g->("earthlings");
This prints:
Howdy, world!
Greetings, earthlings!
Note in particular how $salute continues to refer to the actual value passed into make_saying, despite the fact that the my $salute has gone out of scope by the time the anonymous subroutine runs. That's what closures are all about. Since $f and $g hold references to functions that, when called, still need access to the distinct versions of $salute, those versions automatically stick around. If you now overwrite $f, its version of $salute would automatically disappear. (Perl only cleans up when you're not looking.)
Perl doesn't provide references to object methods (described in Chapter 12) but you can get a similar effect using a closure. Suppose you want a reference not just to the subroutine the method represents, but one which, when invoked, would call that method on a particular object. You can conveniently remember both the object and the method as lexical variables bound up inside a closure:
sub get_method_ref {
    my ($self, $methodname) = @_;
    my $methref = sub {
        # the @_ below is not the same as the one above!
        return $self->$methodname(@_);
    };
    return $methref;
}

my $dog = new Doggie::
            Name => "Lucky",
            Legs => 3,
            Tail => "clipped";

our $wagger = get_method_ref($dog, 'wag');
$wagger->("tail");        # Calls $dog->wag('tail').
Not only can you get Lucky to wag what's left of his tail now, even once the lexical $dog variable has gone out of scope and Lucky is nowhere to be seen, the global $wagger variable can still get him to wag his tail, wherever he is.
Using a closure as a function template allows you to generate many functions that act similarly. Suppose you want a suite of functions that generate HTML font changes for various colors:
print "Be ", red("careful"), "with that ", green("light"), "!!!";
The red and green functions would be very similar. We'd like to name our functions, but closures don't have names since they're just anonymous subroutines with an attitude. To get around that, we'll perform the cute trick of naming our anonymous subroutines. You can bind a coderef to an existing name by assigning it to a typeglob of the name of the function you want. (See Section 10.1 in Chapter 10.) In this case, we'll bind it to two different names, one uppercase and one lowercase:
@colors = qw(red blue green yellow orange purple violet);
for my $name (@colors) {
    no strict 'refs';       # Allow symbolic references
    *$name = *{uc $name} = sub { "<FONT COLOR='$name'>@_</FONT>" };
}
Now you can call functions named red, RED, blue, BLUE, and so on, and the appropriate closure will be invoked. This technique reduces compile time and conserves memory, and is less error-prone as well, since syntax checks happen during compilation. It's critical that any variables in the anonymous subroutine be lexicals in order to create a closure. That's the reason for the my above.
This is one of the few places where giving a prototype to a closure makes sense. If you wanted to impose scalar context on the arguments of these functions (probably not a wise idea for this example), you could have written it this way instead:
*$name = sub ($) { "<FONT COLOR='$name'>$_[0]</FONT>" };
That's almost good enough. However, since prototype checking happens during compile time, the run-time assignment above happens too late to be of much use. You could fix this by putting the whole loop of assignments within a BEGIN block, forcing it to occur during compilation. (More likely, you'd put it out in a module that you use at compile time.) Then the prototypes will be visible during the rest of the compilation.
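Here is a minimal sketch of the BEGIN block variant, cut down to two colors. The @words array and the call at the end are our own additions, there only to make the effect of the ($) prototype observable.

```perl
use strict;
use warnings;

BEGIN {
    for my $name (qw(red green)) {
        no strict 'refs';       # Allow symbolic references
        # Installed at compile time, so the ($) prototype is already
        # known when the calls further down are compiled.
        *$name = sub ($) { "<FONT COLOR='$name'>$_[0]</FONT>" };
    }
}

my @words = ("careful", "now");

# The prototype imposes scalar context on the argument, so the array
# is replaced by its element count.
my $html = red(@words);

print "$html\n";
```

As the text warns, imposing scalar context is probably not what you want for these functions; the sketch just shows that the prototype really is in force.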
If you are accustomed (from other programming languages) to using subroutines nested within other subroutines, each with their own private variables, you'll have to work at it a bit in Perl. Named subroutines do not nest properly, although anonymous ones do.[7] Anyway, we can emulate nested, lexically scoped subroutines using closures. Here's an example:
sub outer {
    my $x = $_[0] + 35;
    local *inner = sub { return $x * 19 };
    return $x + inner();
}
Now inner can only be called from within outer, because of the temporary assignments of the closure. But when it is, it has normal access to the lexical variable $x from the scope of outer.
This has the interesting effect of creating a function local to another function, something not normally supported in Perl. Because local is dynamically scoped, and because function names are global to their package, any other function that outer called could also call the temporary version of inner. To prevent that, you'd need an extra level of indirection:
sub outer {
    my $x = $_[0] + 35;
    my $inner = sub { return $x * 19 };
    return $x + $inner->();
}
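The leak that motivates the lexical $inner can be observed with a sketch like the following, which goes back to the local *inner version. The helper function is hypothetical; it stands for any subroutine that outer happens to call.

```perl
use strict;
use warnings;

# Hypothetical callee: it calls inner() if such a function currently exists.
sub helper {
    return defined(&inner) ? inner() : "inner is not defined";
}

sub outer {
    my $x = $_[0] + 35;
    local *inner = sub { return $x * 19 };
    return helper();        # helper can reach the temporary inner()
}

my $during = outer(1);      # helper ran while inner() was installed
my $after  = helper();      # by now local has restored the old glob

print "$during\n$after\n";
```

With the coderef held in a my variable instead, helper would have no way to reach it unless outer passed it along explicitly.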
[4] We already confessed that this was a small fib. We're not about to do so again.
[5] But not because of operator precedence. The funny characters in Perl are not operators in that sense. Perl's grammar simply prohibits anything more complicated than a simple variable or block from following the initial funny character, for various funny reasons.
[6] In this context, the word "functional" should not be construed as an antonym of "dysfunctional".
[7] To be more precise, globally named subroutines don't nest. Unfortunately, that's the only kind of named subroutine declaration we have. We haven't yet implemented lexically scoped, named subroutines (known as my subs), but when we do, they should nest correctly.