For both practical and philosophical reasons, Perl has always been biased in favor of flat, linear data structures. And for many problems, this is just what you want.
Suppose you wanted to build a simple table (two-dimensional array) showing vital statistics--age, eye color, and weight--for a group of people. You could do this by first creating an array for each individual:
@john = (47, "brown", 186);
@mary = (23, "hazel", 128);
@bill = (35, "blue", 157);
You could then construct a single, additional array consisting of the names of the other arrays:
@vitals = ('john', 'mary', 'bill');
To change John's eyes to "red" after a night on the town, we want a way to change the contents of the @john array given only the simple string "john". This is the basic problem of indirection, which various languages solve in various ways. In C, the most common form of indirection is the pointer, which lets one variable hold the memory address of another variable. In Perl, the most common form of indirection is the reference.
In our example, $vitals[0] has the value "john". That is, it contains a string that happens to be the name of another (global) variable. We say that the first variable refers to the second, and this sort of reference is called a symbolic reference, since Perl has to look up @john in a symbol table to find it. (You might think of symbolic references as analogous to symbolic links in the filesystem.) We'll talk about symbolic references later in this chapter.
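As a rough sketch of what such a symbolic lookup can look like, the following reuses the arrays from the example above. It assumes the arrays are package variables (declared here with our) and that strict 'refs' is switched off for the lookup; both choices are ours for illustration, not something the text above prescribes.

```perl
use strict;
use warnings;

# Package (global) variables: a symbolic reference can only find names
# that live in a symbol table, so lexical "my" arrays would not work here.
our @john = (47, "brown", 186);
our @mary = (23, "hazel", 128);
our @bill = (35, "blue",  157);

my @vitals = ('john', 'mary', 'bill');

{
    no strict 'refs';          # symbolic references are disallowed under strict
    my $name = $vitals[0];     # just the plain string "john"
    ${$name}[1] = "red";       # Perl looks up @john by name and sets element 1
}

print "Eye color for john: $john[1]\n";   # prints: Eye color for john: red
```

The string in $vitals[0] never stops being an ordinary string; it only acts as a reference at the moment it is used where a variable name is expected.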
The other kind of reference is a hard reference, and this is what most Perl programmers use to accomplish their indirections (if not their indiscretions). We call them hard references not because they're difficult, but because they're real and solid. If you like, think of hard references as real references and symbolic references as fake references. It's like the difference between true friendship and mere name-dropping. When we don't specify which type of reference we mean, it's a hard reference. Figure 8.1 depicts a variable named $bar referring to the contents of a scalar named $foo, which has the value "bot".
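The relationship in the figure might be written as follows. This is a minimal sketch that uses the backslash operator to create the hard reference; the lexical (my) declarations and the replacement value "bat" are our own additions for illustration.

```perl
use strict;
use warnings;

my $foo = "bot";
my $bar = \$foo;        # $bar now holds a hard reference to $foo's value

print "$$bar\n";        # dereference $bar; prints: bot

$$bar = "bat";          # assigning through the reference changes the referent
print "$foo\n";         # prints: bat
```

Nothing in $bar records the name "foo"; it refers directly to the underlying value.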
Unlike a symbolic reference, a real reference refers not to the name of another variable (which is just a container for a value) but to an actual value itself, some internal glob of data. There's no good word for that thing, but when we have to, we'll call it a referent. Suppose, for example, that you create a hard reference to a lexically scoped array named @array. This hard reference, and the referent it refers to, will continue to exist even after @array goes out of scope. A referent is only destroyed when all the references to it are eliminated.
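To make the lifetime rule concrete, here is a small sketch in which the name @array disappears at the end of a block while its referent lives on. The variable $ref and the values stored are hypothetical, chosen only for the demonstration.

```perl
use strict;
use warnings;

my $ref;
{
    my @array = (1, 2, 3);   # lexically scoped to this block
    $ref = \@array;          # a second reference to the same referent
}
# The name @array is gone here, but the referent survives because
# $ref still refers to it.

push @$ref, 4;
print "@$ref\n";             # prints: 1 2 3 4

undef $ref;                  # last reference eliminated; Perl can now free the data
```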
A referent doesn't really have a name of its own, apart from the references to it. To put it another way, every Perl variable name lives in some kind of symbol table, holding one hard reference to its underlying (otherwise nameless) referent. That referent might be simple, like a number or string, or complex, like an array or hash. Either way, there's still exactly one reference from the variable to its value. You might create additional hard references to the same referent, but if so, the variable doesn't know (or care) about them.[1]
A symbolic reference is just a string that happens to name something in a package symbol table. It's not so much a distinct type as it is something you do with a string. But a hard reference is a different beast entirely. It is the third of the three kinds of fundamental scalar data types, the other two being strings and numbers. A hard reference doesn't need to know something's name just to refer to it, and it's actually completely normal for there to be no name to use in the first place. Such totally nameless referents are called anonymous; we discuss them in "Anonymous Data" below.
To reference a value, in the terminology of this chapter, is to create a hard reference to it. (There's a special operator for this creative act.) The reference so created is simply a scalar, which behaves in all familiar contexts just like any other scalar. To dereference this scalar means to use the reference to get at the referent. Both referencing and dereferencing occur only when you invoke certain explicit mechanisms; implicit referencing or dereferencing never occurs in Perl. Well, almost never.
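Applied to the vital-statistics table from the start of the chapter, explicit referencing and dereferencing might look like the sketch below. It assumes the backslash operator for referencing and uses two common dereferencing spellings; storing references in @vitals instead of names is our adaptation of the earlier example.

```perl
use strict;
use warnings;

my @john = (47, "brown", 186);
my @mary = (23, "hazel", 128);
my @bill = (35, "blue",  157);

# Referencing: each element of @vitals is a scalar holding a hard reference.
my @vitals = (\@john, \@mary, \@bill);

# Dereferencing, block form: change John's eye color through the reference.
${ $vitals[0] }[1] = "red";
print "$john[1]\n";                 # prints: red

# Dereferencing, arrow form: read Mary's weight.
my $weight = $vitals[1]->[2];
print "$weight\n";                  # prints: 128

# The reference itself is an ordinary scalar; ref() reports what it refers to.
print ref($vitals[2]), "\n";        # prints: ARRAY
```

Because the references are plain scalars, they can be copied, stored, and passed around like any other scalar until something explicitly dereferences them.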
A function call can use implicit pass-by-reference semantics--if it has a prototype declaring it that way. If so, the caller of the function doesn't explicitly pass a reference, although you still have to dereference it explicitly within the function. See Section 6.4 in Chapter 6. And to be perfectly honest, there's also some behind-the-scenes dereferencing happening when you use certain kinds of filehandles, but that's for backward compatibility and is transparent to the casual user. Finally, two built-in functions, bless and lock, each take a reference for their argument but implicitly dereference it to work their magic on what lies behind. But those confessions aside, the basic principle still holds that Perl isn't interested in muddling your levels of indirection.
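The prototype case can be sketched roughly as follows. The function append_item and its (\@$) prototype are hypothetical, invented for this illustration; the point is only that the caller writes a plain array while the function receives a reference to it.

```perl
use strict;
use warnings;

# Hypothetical helper. The (\@$) prototype tells Perl to pass the first
# argument, which must be written as an array, as a reference to that array.
# The sub has to be declared before the call for the prototype to apply.
sub append_item (\@$) {
    my ($array_ref, $item) = @_;
    push @$array_ref, $item;          # explicit dereference inside the function
    return scalar @$array_ref;        # new element count
}

my @colors = ("brown", "hazel");
my $count  = append_item(@colors, "blue");   # no backslash at the call site

print "$count: @colors\n";            # prints: 3: brown hazel blue
```

Without the prototype, @colors would be flattened into the argument list and the function would have no way to modify the caller's array.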
A reference can point to any data structure. Since references are scalars, you can store them in arrays and hashes, and thus build arrays of arrays, arrays of hashes, hashes of arrays, arrays of hashes and functions, and so on. There are examples of these in Chapter 9.
Keep in mind, though, that Perl arrays and hashes are internally one-dimensional. That is, their elements can hold only scalar values (strings, numbers, and references). When we use a phrase like "array of arrays", we really mean "array of references to arrays", just as when we say "hash of functions" we really mean "hash of references to subroutines". But since references are the only way to implement such structures in Perl, it follows that the shorter, less accurate phrase is not so inaccurate as to be false, and therefore should not be totally despised, unless you're into that sort of thing.
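A short sketch can show that the elements of such structures really are references. The names @matrix, %op, double, and square are made up for the example, and ref() is used only to report what each element refers to.

```perl
use strict;
use warnings;

my @row0 = (1, 2, 3);
my @row1 = (4, 5, 6);

# An "array of arrays" is really an array of references to arrays.
my @matrix = (\@row0, \@row1);
print ref($matrix[0]), "\n";        # prints: ARRAY
print $matrix[1]->[2], "\n";        # prints: 6

sub double { return 2 * $_[0] }
sub square { return $_[0] ** 2 }

# A "hash of functions" is really a hash of references to subroutines.
my %op = (double => \&double, square => \&square);
print ref($op{square}), "\n";       # prints: CODE
print $op{square}->(7), "\n";       # prints: 49
```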
[1] If you're curious, you can determine the underlying refcount with the Devel::Peek module, bundled with Perl.