We've talked about storing values in variables, but the variables themselves (their names and their associated definitions) also need to be stored somewhere. In the abstract, these places are known as namespaces. Perl provides two kinds of namespaces, which are often called symbol tables and lexical scopes.[6] You may have an arbitrary number of symbol tables or lexical scopes, but every name you define gets stored in one or the other. We'll explain both kinds of namespaces as we go along. For now we'll just say that symbol tables are global hashes that happen to contain symbol table entries for global variables (including the hashes for other symbol tables). In contrast, lexical scopes are unnamed scratchpads that don't live in any symbol table, but are attached to a block of code in your program. They contain variables that can only be seen by the block. (That's what we mean by a scope. The lexical part just means "having to do with text", which is not at all what a lexicographer would mean by it. Don't blame us.)
Within any given namespace (whether global or lexical), every variable type has its own subnamespace, determined by the funny character. You can, without fear of conflict, use the same name for a scalar variable, an array, or a hash (or, for that matter, a filehandle, a subroutine name, a label, or your pet llama). This means that $foo and @foo are two different variables. Together with the previous rules, it also means that $foo[1] is an element of @foo totally unrelated to the scalar variable $foo. This may seem a bit weird, but that's okay, because it is weird.
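A minimal sketch of those separate subnamespaces. The values are made up for illustration; the point is only that the one name foo is used four ways without any conflict.

```perl
use strict;
use warnings;

# Hypothetical data: a scalar, an array, a hash, and a subroutine all named "foo".
my $foo = "scalar";
my @foo = ("zero", "one");
my %foo = (key => "hash value");
sub foo { return "sub result" }

# $foo[1] indexes the array @foo and $foo{key} indexes the hash %foo.
# Neither one touches the scalar $foo.
my $element = $foo[1];
my $lookup  = $foo{key};
my $called  = foo();

print "$foo / $element / $lookup / $called\n";
```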
Subroutines may be named with an initial &, although the funny character is optional when calling the subroutine. Subroutines aren't generally considered lvalues, though recent versions of Perl allow you to return an lvalue from a subroutine and assign to that, so it can look as though you're assigning to the subroutine.
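Here is a small sketch of both points, assuming a Perl recent enough to support the :lvalue attribute (5.6 or later). The names counter and $counter are invented for the example.

```perl
use strict;
use warnings;

my $counter = 10;    # hypothetical variable that backs the subroutine

# The :lvalue attribute lets the value of the last expression be assigned to.
# Note that an lvalue sub should not use an explicit return.
sub counter :lvalue {
    $counter;
}

counter() = 42;      # looks like assigning to the sub; really sets $counter

# The & funny character is optional on a call.
my $with_ampersand    = &counter();
my $without_ampersand = counter();

print "$counter $with_ampersand $without_ampersand\n";
```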
Sometimes you just want a name for "everything named foo" regardless of its funny character. So symbol table entries can be named with an initial *, where the asterisk stands for all the other funny characters. These are called typeglobs, and they have several uses. They can also function as lvalues. Assignment to typeglobs is how Perl implements importing of symbols from one symbol table to another. More about that later too.
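As a rough preview of typeglobs used as lvalues, the sketch below aliases one name to another. The variable names are illustrative only, and our is used so that the names live in a symbol table (typeglobs know nothing about my variables).

```perl
use strict;
use warnings;

our $foo = "scalar foo";
our @foo = (1, 2, 3);
our ($bar, @bar, @baz);    # declared only to keep "use strict" happy

# Assigning a whole typeglob makes every type named "bar" an alias for "foo".
*bar = *foo;

# Assigning a reference to a typeglob aliases just that one type.
*baz = \@foo;              # @baz is now @foo; $baz is unaffected

$bar = "changed through bar";    # same storage as $foo
push @baz, 4;                    # same storage as @foo

print "$foo / @foo\n";
```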
Like most computer languages, Perl has a list of reserved words that it recognizes as special keywords. However, because variable names always start with a funny character, reserved words don't actually conflict with variable names. Certain other kinds of names don't have funny characters, though, such as labels and filehandles. With these, you do have to worry (a little) about conflicting with reserved words. Since most reserved words are entirely lowercase, we recommend that you pick label and filehandle names that contain uppercase letters. For example, if you say open(LOG, "logfile") rather than the regrettable open(log, "logfile"), you won't confuse Perl into thinking you're talking about the built-in log operator (which does logarithms, not tree trunks). Using uppercase filehandles also improves readability[7] and protects you from conflict with reserved words we might add in the future. For similar reasons, user-defined modules are typically named with initial capitals so that they'll look different from the built-in modules known as pragmas, which are named in all lowercase. And when we get to object-oriented programming, you'll notice that class names are usually capitalized for the same reason.
As you might deduce from the preceding paragraph, case is significant in identifiers--FOO, Foo, and foo are all different names in Perl. Identifiers start with a letter or underscore and may be of any length (for values of "any" ranging between 1 and 251, inclusive) and may contain letters, digits, and underscores. This includes Unicode letters and digits. Unicode ideographs also count as letters, but we don't recommend you use them unless you can read them. See Chapter 15.
Names that follow funny characters don't have to be identifiers, strictly speaking. They can start with a digit, in which case they may only contain more digits, as in $123. Names that start with anything other than a letter, digit, or underscore are (usually) limited to that one character (like $? or $$), and generally have a predefined significance to Perl. For example, just as in the Bourne shell, $$ is the current process ID and $? the exit status of your last child process.
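A short sketch of these non-identifier names in use. It assumes the platform lets the script launch a child process; the child here is just another copy of the running perl (found through $^X), asked to exit with an arbitrary status.

```perl
use strict;
use warnings;

# All-digit names hold the captures from the last successful pattern match.
my ($first, $second);
if ("llama 42" =~ /^(\w+)\s+(\d+)$/) {
    ($first, $second) = ($1, $2);
}

# $$ is the ID of the current process.
my $pid = $$;

# $? holds the status of the last child; the exit code is in the high byte.
system($^X, "-e", "exit 3");
my $exit_code = $? >> 8;

print "captures: $first, $second; pid: $pid; child exit code: $exit_code\n";
```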
As of version 5.6, Perl also has an extensible syntax for internal variable names. Any variable of the form ${^NAME} is a special variable reserved for use by Perl. All these non-identifier names are forced to be in the main symbol table. See Chapter 28 for some examples.
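To illustrate, the sketch below reads one such variable, ${^TAINT} (available in Perl 5.8 and later, so an assumption about your version), and shows that a punctuation variable like $_ stays in main even when the current package is something else. The package name Menagerie is made up.

```perl
use strict;
use warnings;

package Menagerie;    # hypothetical package, just to leave main

# ${^TAINT} reports whether taint mode is on for this run.
my $taint = ${^TAINT};

# $_ is not an identifier, so it lives in main even though we are
# compiling in Menagerie. No declaration is needed under strict.
$_ = "set inside Menagerie";
my $same = $main::_;

print "taint: $taint; \$main::_ is: $same\n";
```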
It's tempting to think of identifiers and names as the same thing, but when we say name, we usually mean a fully qualified name, that is, a name that says which symbol table it lives in. Such names may be formed of a sequence of identifiers separated by the :: token:
$Santa::Helper::Reindeer::Rudolph::nose
That works just like the directories and filenames in a pathname:
/Santa/Helper/Reindeer/Rudolph/nose
In the Perl version of that notion, all the leading identifiers are the names of nested symbol tables, and the last identifier is the name of the variable within the most deeply nested symbol table. For instance, in the variable above, the symbol table is named Santa::Helper::Reindeer::Rudolph::, and the actual variable within that symbol table is $nose. (The value of that variable is, of course, "red".)
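The nesting can be poked at directly, since each symbol table is a hash whose name ends in ::. This is only a sketch for exploration; ordinary programs rarely need to look inside symbol tables this way.

```perl
use strict;
use warnings;

# A fully qualified name needs no declaration, even under strict.
$Santa::Helper::Reindeer::Rudolph::nose = "red";

# The innermost symbol table has an entry named "nose".
my $has_nose = exists $Santa::Helper::Reindeer::Rudolph::{nose};
my $nose     = $Santa::Helper::Reindeer::Rudolph::nose;

# Nested tables hang off their parents, much like directories:
# the Santa:: table holds an entry named "Helper::".
my $nested = exists $Santa::{"Helper::"};

print "nose is $nose\n" if $has_nose && $nested;
```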
A symbol table in Perl is also known as a package, so these are often called package variables. Package variables are nominally private to the package in which they exist, but are global in the sense that the packages themselves are global. That is, anyone can name the package to get at the variable; it's just hard to do this by accident. For instance, any program that mentions $Dog::bert is asking for the $bert variable within the Dog:: package. That is an entirely separate variable from $Cat::bert. See Chapter 10.
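A sketch of two packages that each keep their own $bert. The speak subroutines and the stored strings are invented for the example.

```perl
use strict;
use warnings;

{
    package Dog;
    our $bert = "woof";
    sub speak { return $bert }    # unqualified: means $Dog::bert here
}
{
    package Cat;
    our $bert = "meow";
    sub speak { return $bert }    # unqualified: means $Cat::bert here
}

# Back in main, anyone can reach either variable by naming its package.
my $dog = $Dog::bert;
my $cat = $Cat::bert;

# Changing one does not disturb the other.
$Dog::bert = "grr";
my $dog_now = Dog::speak();
my $cat_now = Cat::speak();

print "$dog $cat $dog_now $cat_now\n";
```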
Variables attached to a lexical scope are not in any package, so lexically scoped variable names may not contain the :: sequence. (Lexically scoped variables are declared with a my declaration.)
So the question is, what's in a name? How does Perl figure out what you mean if you just say $bert? Glad you asked. Here are the rules the Perl parser uses while trying to understand an unqualified name in context:
1. First, Perl looks earlier in the immediately enclosing block to see whether the variable is declared in that same block with a my (or our) declaration (see those entries in Chapter 29, as well as Section 4.8 in Chapter 4). If there is a my declaration, the variable is lexically scoped and doesn't exist in any package--it exists only in that lexical scope (that is, in the block's scratchpad). Because lexical scopes are unnamed, nobody outside that chunk of program can even name your variable.[8]
2. If that doesn't work, Perl looks for the block enclosing that block and tries again for a lexically scoped variable in the larger block. Again, if Perl finds one, the variable belongs only to the lexical scope from the point of declaration through the end of the block in which it is declared--including any nested blocks, like the one we just came from in step 1. If Perl doesn't find a declaration, it repeats step 2 until it runs out of enclosing blocks.
3. When Perl runs out of enclosing blocks, it examines the whole compilation unit for declarations as if it were a block. (A compilation unit is just the entire current file, or the string currently being compiled by an eval STRING operator.) If the compilation unit is a file, that's the largest possible lexical scope, and Perl will look no further for lexically scoped variables, so we go to step 4. If the compilation unit is a string, however, things get fancier. A string compiled as Perl code at run time pretends that it's a block within the lexical scope from which the eval STRING is running, even though the actual boundaries of the lexical scope are the limits of the string containing the code rather than any real braces. So if Perl doesn't find the variable in the lexical scope of the string, we pretend that the eval STRING is a block and go back to step 2, only this time starting with the lexical scope of the eval STRING operator instead of the lexical scope inside its string.
4. If we get here, it means Perl didn't find any declaration (either my or our) for your variable. Perl now gives up on lexically scoped variables and assumes that your variable is a package variable. If the strict pragma is in effect, you will now get an error, unless the variable is one of Perl's predefined variables or has been imported into the current package. This is because that pragma disallows the use of unqualified global names. However, we aren't done with lexical scopes just yet. Perl does the same search of lexical scopes as it did in steps 1 through 3, only this time it searches for package declarations instead of variable declarations. If it finds such a package declaration, it knows that the current code is being compiled for the package in question and prepends the declared package name to the front of the variable.
5. If there is no package declaration in any surrounding lexical scope, Perl looks for the variable name in the unnamed top-level package, which happens to have the name main when it isn't going around without a name tag. So in the absence of any declarations to the contrary, $bert means the same as $::bert, which means the same as $main::bert. (But because main is just another package in the top-level unnamed package, it's also $::main::bert, and $main::main::bert, $::main::main::bert and so on. This could be construed as a useless fact. But see Section 10.1 in Chapter 10.)
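The search rules above can be traced with a small sketch. Every name in it ($bert, $ernie, the Muppet package, and the @seen log) is hypothetical; each push records which declaration an unqualified name resolved to at that point.

```perl
use strict;
use warnings;

our $bert = "package main";          # $main::bert, plus a lexical alias for it

my @seen;
{
    my $bert = "outer block";
    {
        push @seen, $bert;           # nothing declared yet in this block,
                                     # so the enclosing block's $bert is found
        my $bert = "inner block";    # in scope from here to the closing brace
        push @seen, $bert;           # found in the immediately enclosing block

        # The string is compiled as if it were a block nested right here.
        push @seen, eval q{ $bert };
    }
    push @seen, $bert;               # inner scope is gone: outer one again
}
push @seen, $bert;                   # only the our declaration is left

{
    package Muppet;
    no strict 'vars';                # otherwise the next line won't compile
    $ernie = "package Muppet";       # no my or our anywhere: $Muppet::ernie
}
push @seen, $Muppet::ernie;

print "$_\n" for @seen;
```

Running the sketch prints the six resolutions in order: outer block, inner block, inner block, outer block, package main, package Muppet.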
There are several implications to these search rules that might not be obvious, so we'll make them explicit.
Because the file is the largest possible lexical scope, a lexically scoped variable can never be visible outside the file in which it's declared. File scopes do not nest.
Any particular bit of Perl is compiled in at least one lexical scope and exactly one package scope. The mandatory lexical scope is, of course, the file itself. Additional lexical scopes are provided by each enclosing block. All Perl code is also compiled in the scope of exactly one package, and although the declaration of which package you're in is lexically scoped, packages themselves are not lexically constrained. That is, they're global.
An unqualified variable name may therefore be searched for in many lexical scopes, but only one package scope, whichever one is currently in effect (which is lexically determined).
A variable name may only attach to one scope. Although at least two different scopes (lexical and package) are active everywhere in your program, a variable can only exist in one of those scopes.
An unqualified variable name can therefore resolve to only a single storage location, either in the first enclosing lexical scope in which it is declared, or else in the current package--but not both. The search stops as soon as that storage location is resolved, and any storage location that it would have found had the search continued is effectively hidden.
The location of the typical variable name can be completely determined at compile time.
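The hiding described in these implications can be seen in a few lines. In this sketch (names invented), a my declaration hides a package variable of the same name, yet the package variable is still there for anyone who spells out its full name.

```perl
use strict;
use warnings;

our $bert = "global";            # stored in the main package

my ($unqualified, $qualified);
{
    my $bert = "lexical";        # the search stops here, hiding $main::bert
    $unqualified = $bert;        # resolves to the lexical
    $qualified   = $main::bert;  # a qualified name bypasses the search
}

print "$unqualified / $qualified\n";
```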
Now that you know all about how the Perl compiler deals with names, you sometimes have the problem that you don't know the name of what you want at compile time. Sometimes you want to name something indirectly; we call this the problem of indirection. So Perl provides a mechanism: you can always replace an alphanumeric variable name with a block containing an expression that returns a reference to the real data. For instance, instead of saying:
$bert
you might say:
${ some_expression() }
and if the some_expression() function returns a reference to variable $bert (or even the string, "bert"), it will work just as if you'd said $bert in the first place. On the other hand, if the function returns a reference to $ernie, you'll get his variable instead. The syntax shown is the most general (and least legible) form of indirection, but we'll cover several convenient variations in Chapter 8.
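Here is one way the block form might be used. The body of some_expression() and the $grover variable are made up for the sketch. Note that the string form is a symbolic reference, which can only find package variables and is refused while strict refs is in effect.

```perl
use strict;
use warnings;

my $bert  = "Bert's value";
my $ernie = "Ernie's value";

# Hypothetical stand-in for some_expression(): returns a reference
# to one variable or the other, chosen at run time.
sub some_expression {
    my ($who) = @_;
    return $who eq "bert" ? \$bert : \$ernie;
}

my $got_bert  = ${ some_expression("bert") };
my $got_ernie = ${ some_expression("ernie") };

# The block form can be assigned to as well.
${ some_expression("bert") } = "changed";

# A string instead of a reference names a package variable symbolically.
our $grover = "Grover's value";
my $by_name = do { no strict 'refs'; ${ "grover" } };

print "$got_bert / $got_ernie / $bert / $by_name\n";
```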
[6] We also call them packages and pads when we're talking about Perl's specific implementations, but those longer monikers are the generic industry terms, so we're pretty much stuck with them. Sorry.
[7] One of the design principles of Perl is that different things should look different. Contrast this with languages that try to force different things to look the same, to the detriment of readability.
[8] If you use an our declaration instead of a my declaration, this only declares a lexically scoped alias (a nickname) for a package variable, rather than declaring a true lexically scoped variable the way my does. Outside code can still get at the real variable through its package, but in all other respects an our declaration behaves like a my declaration. This is handy when you're trying to limit your own use of globals with the use strict pragma (see the strict pragma in Glossary). But you should always prefer my if you don't need a global.