We touched on the concept of scope briefly in Chapter 2 and mentioned it in following chapters from time to time without going into the full details. The scope of a variable is simply the range of places in code from which the variable can be accessed. However, there are two fundamentally different types of variable scope in Perl: package scope (also called dynamic scope) and lexical scope. Which one we use depends on how variables are declared. Named subroutines, on the other hand, always have package scope, as they are defined within a package. This is why we can call a subroutine in another package, and also why it makes no sense to define one subroutine inside another. Anonymous subroutine references are scalars, though, so they can be scoped like any other variable.
Package and lexical scope work in fundamentally different ways, and using both at the same time is frequently the source of much confusion. One useful rule-of-thumb for understanding the distinction is that package variables have a scope that is determined at run time from the current contents of the symbol table, which along with subroutine definitions represents the entirety of the Perl program in memory, whereas lexical variables have a scope that is determined at compile time based on the structure of the source code itself.
A package is defined as a namespace in which declarations of variables and subroutines are placed, confining their scope to that particular package. A great deal of the time we can avoid defining package variables at all by declaring all variables as lexical variables. This lets us define local variables in the way they are normally understood in other languages. However, we often use package variables without being aware of it; for example, the variables $_, @_, %ENV, and @ARGV are all package variables in the main package.
Packages are defined by the package keyword. In and of itself, a package declaration does nothing, but it states that all further declarations of subroutines or package variables will be placed into the package designated by the declaration, until further notice. The notice in this case is either the end of the file, the end of the current block (if the package declaration was made inside one), or less commonly but perfectly legally, another package declaration. The package keyword takes one argument: a list of namespaces separated by double colons. For example:
package My::Package;
This declares that the variables and subroutines that follow will be in the Package namespace, inside the My namespace.
Package variables have the scope of the package in which they were declared. A package variable can be accessed from other packages by specifying the full package name, in the same way that we can refer to any file on a hard disk from the current directory by starting at the root directory and writing out the full path of the file. This code snippet creates and accesses package variables, both local to the current package and in other packages through their full name. It also defines two subroutines, one in the current package and one in a different package:
package My::Package;
$package_variable = "implicitly in 'My::Package'";
$My::Package::another_variable = "explicitly declared in 'My::Package'";
$Another::Package::variable = "explicitly declared in 'Another::Package'";
sub routine { print "subroutine in my package" }
sub Another::Package::routine { print "subroutine in another package" }
{
package Yet::Another::Package;
$yet_another_package_variable="blocks limit scope of package declaration";
}
$back_in_my_package = "back in 'My::Package' again";
The effect of a package declaration is to cause all unqualified package variables and (more significantly) subroutines that follow it to be located in the namespace that it defines. We don't have to use a package declaration though; we can define variables and subroutines with an explicit package name, as we showed in the preceding example. A package declaration just makes writing code a bit simpler.
Package variables are defined simply by using them, as in the preceding example. No extra syntax or qualifier is needed. Unfortunately, this makes them very easy to define accidentally by misspelling the name of another variable. Perl prevents us from making this mistake with the use strict 'vars' pragma, introduced in Chapter 2 and covered in more detail later.
A package is not the same as a module, however, and certainly not the same as a file. In simple terms, a package defines a namespace in a data structure known as the symbol table. Whenever a new variable or subroutine is defined, a new entry is added to the table for the appropriate package (the current package if one is not explicitly prefixed to the variable or subroutine name). The same variable name in a different package is not confused with it because the namespaces of the two variables are different and they are defined in different parts of the symbol table.
Package scope is not the same as file scope. A package is most often defined in a single file (a module) to improve maintainability, but Perl does not enforce this and allows packages to be spread across many files. While uncommon, this is one approach we can take if we want to split up subroutines or methods in a package into separate functional files. We define one main module with the same name as the package, translated into a pathname, and use or require all the other supporting modules from it:
package My::Module;
require My::Module::Input;
require My::Module::Output;
require My::Module::Process;
The modules Input.pm, Output.pm, and Process.pm in this scheme would all contain a first line of package My::Module; rather than a package that reflects the actual name of the file, such as My::Module::Input for My/Module/Input.pm. This allows them to add additional subroutine definitions to the My::Module namespace directly. Since there is no familial relationship between namespaces, adding definitions to My::Module::Input has no effect on the My::Module package, or vice versa. This is more efficient than the alternative, using the Exporter module to export all the names explicitly.
It follows from this that package variables must be global variables. In practice, what we usually think of as global variables are package variables that are accessed from within their own package, where the package prefix is not required. For instance, the variable $package_variable in the first example earlier is global within the package My::Package. We can access it from anywhere in our code, so long as the contents of My::Package have been compiled and executed by the interpreter. Within My::Package we can refer to it as $package_variable. From any other package, we merely have to fully qualify it as $My::Package::package_variable.
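As a minimal sketch of this point, the following script sets a package variable inside My::Package and then reads it from the main package through its fully qualified name. The script name, the show subroutine, and the string value are illustrative assumptions; strict is left off so that the unqualified assignment works as in the earlier example.

```perl
#!/usr/bin/perl
# qualified.pl - illustrative sketch only
use warnings;

package My::Package;

# unqualified: lands in the current package, My::Package
$package_variable = "set inside My::Package";

sub show {
    # within the package, no prefix is needed
    return "inside: $package_variable";
}

package main;

# from any other package, the full name is required
print My::Package::show(), "\n";
print "outside: $My::Package::package_variable\n";
```

Both print statements report the same value, because both names refer to the same entry in the symbol table.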
New package variables can be created by executing code, with existing variables given local temporary values. Both of these events modify the contents of the symbol table as the program executes. As a result, package scope is ultimately determined at run time and is dependent on the structure of the data in the symbol tables. This differs from lexical variables, whose scope is determined at compile time when the source code is parsed.
All subroutines and package variables must reside in a package, in the same way that all files in a filing system must have a directory, even if it is the root. In Perl, the "root" namespace is main
, so any subroutine or package variable we define without an explicit package name or prior package declaration is part of the main package. In effect, the top of every source file is prefixed by an implicit
package main;
Perl's special variables, as well as the filehandles STDIN, STDOUT, and STDERR, are exempt from the normal rules of package scope visibility. All of these variables are automatically placed in the main package, wherever they are used, irrespective of the package declaration that may be in effect. There is therefore only one @_ array, only one default argument $_, and only one @ARGV or @INC. However, these variables have a special status in Perl because they are provided directly by the language; for our own variable declarations and subroutines, the normal rules of packages and namespaces apply.
Since it is easy to accidentally define a new global variable without meaning to, Perl provides the strict module. With either of use strict or use strict 'vars', a compile-time syntax check is enabled that requires variables to either be declared lexically or be declared explicitly as package variables. This is generally a good idea, because without this check it is very easy to introduce hard-to-spot bugs, either by forgetting to define a variable before using it, creating a new variable instead of accessing an existing one, or by accessing a global variable when we actually meant to create and use a local copy.
With strict variables in effect, Perl will no longer allow us to define a variable simply by assigning to it. For instance:
#!/usr/bin/perl
# package.pl
use warnings;
use strict;
package MyPackage;
$package_variable = "This variable is in 'MyPackage'"; # ERROR
If we attempt to run this code fragment, Perl will complain with an error.
Global symbol "$package_variable" requires explicit package name at ...
Usually the simplest fix for this problem is to prefix the variable with my, turning the variable into a file-scoped lexical variable. However, if we actually want to define a global package variable, we now need to say so explicitly.
The traditional meaning of global variable is a variable that can be seen from anywhere: it has global visibility. In Perl, the distinction between package and lexical variables means that there are two different kinds of global variables, package-global and file-global.
A package variable is global within the package in which it is defined, as we have just seen. Any code in the same package can access the variable directly without qualifying it with a package prefix, except when the variable has been hidden by another variable with the same name (defined in a lower scope). By contrast, a lexical variable (declared with my) is a global variable if it is defined at the file level, outside a block, which makes it a global variable within the file. Only package variables are truly global, because they can always be accessed by referring to them via their full package-prefixed name in the symbol table.
The simplest way to define a package variable is to write it out in full, including the package name, for example:
$MyPackage::package_variable = "explicitly defined with package";
However, this can be inconvenient to type each time we want to define a new package variable and also causes a lot of editing if we happen to change the name of the package. We can avoid having to write the package name by using either use vars or our. These both define package variables in the current package, but work in slightly different ways.
To define a package variable with the vars pragma, we pass it a list containing the variables we want to declare. This is a common application of qw:
use vars ('$package_variable','$another_var','@package_array');
# using qw is often clearer
use vars qw($package_variable $another_var @package_array);
This defines the named variables in the symbol table for the current package. Therefore the variables are directly visible anywhere the package is defined, and they can be accessed from anywhere by specifying the full variable name including the package.
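A short sketch of the declaration in use, assuming a package called MyPackage and two illustrative variable names: with use vars in place, strict accepts the unqualified names inside the package, while code in other packages still uses the full name.

```perl
#!/usr/bin/perl
# usevars.pl - illustrative sketch only
use strict;
use warnings;

package MyPackage;
use vars qw($package_variable @package_array);

# unqualified use is now accepted by strict 'vars'
$package_variable = "declared with use vars";
@package_array    = (1, 2, 3);

package main;

# other packages still need the full name
print "$MyPackage::package_variable\n";
print "array has ", scalar(@MyPackage::package_array), " elements\n";
```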
Because use takes effect at compile time, variables declared with use vars are added to the symbol table for the current package at compile time. Also, use pays no attention to enclosing declarations like subroutine definitions. As package variables know nothing about lexical scope, the following script defines the variable $package_variable at compile time and makes it immediately visible anywhere within the package, despite the fact the use is in a subroutine. The value is only assigned once the subroutine is executed, however:
#!/usr/bin/perl
# globpack.pl
use warnings;
use strict;
sub define_global {
use vars qw($package_variable);
$package_variable = "defined in subroutine";
}
print $package_variable; # visible here but not yet defined
define_global;
print $package_variable; # visible here and now defined
What we probably intended to do here was create a local package variable with our, which was introduced into Perl 5.6 for precisely this sort of occasion. Since understanding our requires an understanding of lexical scope, we'll leave discussing it until after we look at purely lexical my declarations.
Package variables can be temporarily localized inside subroutines and other blocks with the local keyword. This hides an existing package variable by masking it with a temporary value that exists until execution leaves the block that encloses the local statement. As a statement, local takes either a single variable name or a list enclosed in parentheses and optionally an assignment to one or more values:
local $hero;
local ($zip, @boing, %yatatata);
local @list = (1, 2, 3, 4);
local ($red, $blue, $green, $yellow) = ("red", "blue", "green");
The local keyword does not, despite what its name suggests, create a local variable; that is actually the job of my. The local keyword only operates on an existing variable, which can be either a global or a variable defined in the calling context. If no such variable exists, and we have use strict enabled, Perl will issue a compile-time error at us. Many programs simply avoided strict in order to use local, but now that our exists there is no longer a compelling reason to do this. In any event, most of the time when we want to create a local variable, we really should be using my instead.
Localized variables are visible inside subroutine calls, just as the variable they are masking would be if they had not been defined. They are global from the perspective of the subroutines in the call-chain below their scope, so they are not visible outside the subroutine call. This is because localization happens at run time and persists for the scope of the local statement. In this respect, they differ from lexical variables, which are also limited by the enclosing scope but which are not visible in called subroutines.
The following demonstration script illustrates how local works, as well as the differences and similarities between my, our, and use vars:
#!/usr/bin/perl
# scope-our.pl
use warnings;
use strict;
package MyPackage;
my $my_var = "my-var"; # file-global lexical variable
our $our_var = "our-var"; # global to be localized with 'our'
our $local_var = "global-var"; # global to be localized with 'local'
use vars qw($use_var); # define 'MyPackage::use_var' which exists
# only in this package
$use_var = "use-var";
package AnotherPackage;
print "Outside, my_var is '$my_var'\n"; # display 'my-var'
print "Outside, our_var is '$our_var'\n"; # display 'our-var'
print "Outside, local_var is '$local_var'\n"; # display 'global-var'
#-----
sub sub1 {
    my $my_var = "my_in_sub1";
    our $our_var = "our_in_sub1";
    local $local_var = "local_in_sub1";
    print "In sub1, my_var is '$my_var'\n"; # display 'my_in_sub1'
    print "In sub1, our_var is '$our_var'\n"; # display 'our_in_sub1'
    print "In sub1, local_var is '$local_var'\n"; # display 'local_in_sub1'
    sub2();
}
sub sub2 {
    print "In sub2, my_var is '$my_var'\n"; # display 'my-var'
    print "In sub2, our_var is '$our_var'\n"; # display 'our-var'
    print "In sub2, local_var is '$local_var'\n"; # display 'local_in_sub1'
}
#-----
sub1();
print "Again outside, my_var is '$my_var'\n"; # display 'my-var'
print "Again outside, our_var is '$our_var'\n"; # display 'our-var'
print "Again outside, local_var is '$local_var'\n"; # display 'global-var'
Although it is often not the right tool for the job, there are a few instances when only local will do what we want. One is if we want to create a local version of one of Perl's built-in variables. For example, if we want to temporarily alter the output field separator $, in a subroutine, we would do it with local like this:
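#!/usr/bin/perl -w
# localbuiltinvar.pl
use strict;
sub printwith {
    my ($separator, @stuff) = @_;
    local $, = $separator; # create temporary local $,
    local $\ = "\n"; # temporary output record separator ends the line
    print @stuff; # $, goes between items only, $\ is appended once
}
printwith("... ","one","two","three");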
The output of this program is
one... two... three
This is also the correct approach for variables such as @ARGV and %ENV. The special variables defined automatically by Perl are all package variables. Perl rejects a my declaration of a punctuation variable such as $, at compile time, and although a lexical named @ARGV or %ENV can be declared, it would be an entirely separate variable, visible only to our own code and totally ignored by Perl's built-in functions. To get the desired effect, we need to use local to create a temporary version of the global variable that will be seen by the subroutines and built-in functions we call.
Another use for local is for creating local versions of filehandles. For example, this subroutine replaces STDOUT with a different filehandle, MY_HANDLE, presumed to have been opened previously. Because we have used local, both the print statement that follows and any print statements in called subroutine a_sub_that_calls_print() will go to MY_HANDLE. In case MY_HANDLE is not a legal filehandle, we check the result of print and die on a failure:
sub print_to_me {
local *STDOUT = *MY_HANDLE;
die unless print @_;
a_sub_that_calls_print();
}
Because local is dynamically scoped, the redirection also applies to the subroutines we call. A lexical declaration is not an alternative here, since neither my nor our can be applied to a typeglob. At the end of the subroutine the local definition of STDOUT vanishes and the original filehandle is restored. Note that since STDOUT always exists, the use of local here is safe, and we do not need to worry about whether or not it exists prior to using local.
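The same technique can be tried without a previously opened MY_HANDLE by redirecting STDOUT to an in-memory filehandle. The sketch below is one possible arrangement rather than the only one: capture_output is a hypothetical helper invented for this illustration, and it relies on Perl's ability to open a filehandle on a scalar reference.

```perl
#!/usr/bin/perl
# capture.pl - illustrative sketch only
use strict;
use warnings;

sub a_sub_that_calls_print {
    print "printed by a called subroutine\n";
}

# hypothetical helper: run some code with STDOUT redirected into a string
sub capture_output {
    my ($code) = @_;
    my $buffer = '';
    open(my $handle, '>', \$buffer) or die "cannot open in-memory file: $!";
    {
        local *STDOUT = $handle;   # dynamic scope: applies to called subs too
        $code->();
    }                              # the original STDOUT is restored here
    close $handle;
    return $buffer;
}

my $captured = capture_output(\&a_sub_that_calls_print);
print "captured: $captured";
```

The final print goes to the real STDOUT again, since the localized typeglob was restored when the inner block ended.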
Curiously, we can localize not just whole variables, but in the case of arrays and hashes, elements of them as well. This allows us to temporarily mask over an array element with a new value, as in this example:
#!/usr/bin/perl
# localelement.pl
our @array=(1,2,3);
{
    local $array[1]=4;
    print @array,"\n"; # produces '143'
}
print @array,"\n"; # produces '123'
There might not immediately seem to be any useful applications for this, but as it turns out, there are plenty. For example, we can locally extend the value of @INC to search for modules within a subroutine, locally alter the value of an environment variable in %ENV before executing a subprogram, or install a temporary local signal handler into %SIG:
sub execute_specialpath_catchint {
    my ($cmd, @args) = @_;
    local $ENV{PATH} = "/path/to/special/bin:".$ENV{PATH};
    local $SIG{INT} = \&catch_sigint; # a code reference, not a call
    system $cmd => @args;
}
A word of warning concerning localizing variables that have been tied to an object instance with tie. While it is possible and safe to localize a tied scalar, attempting to localize a tied array or hash is not currently possible and will generate an error (this limitation may be removed in future).
Perl automatically localizes variables for us in certain situations. The most obvious example is the @_ array in subroutines. Each time a subroutine is called, a fresh local copy of @_ is created, temporarily hiding the existing one until the end of the subroutine. When the subroutine call returns, the old @_ reappears. This allows chains of subroutines to call each other, each with its own local @_, without overwriting the @_ of the caller.
Other instances of automatically localized variables include loop variables, including $_ if we do not specify one explicitly. Although the loop variable might (and with strict variables in effect, must) exist, when it is used for a loop, the existing variable is localized and hidden for the duration of the loop:
#!/usr/bin/perl
# autolocal.pl
use warnings;
use strict;
my $var = 42;
my $last;
print "Before: $var\n";
foreach $var (1..5) {
    print "Inside: $var\n"; # print "Inside: 1", "Inside: 2" ...
    $last = $var;
}
print "After: $var\n"; # prints '42'
print $last;
It follows from this that we cannot find the last value of a foreach loop variable if we exit the loop on the last statement without first assigning it to something with a scope outside the loop, like we have done for $last in the preceding example.
Lexical variables have the scope of the file, block, or eval statement in which they were defined. Their scope is determined at compile time by the structure of the source code, and their visibility is limited by the syntax that surrounds them. Unlike package variables, a lexical variable is not added to the symbol table and so cannot be accessed through it. It cannot be accessed from anywhere outside its lexical scope, even by subroutines that are called within the scope of the variable. When the end of the variable's scope is reached, it simply ceases to exist. (The value of the variable, on the other hand, might persist if a reference to it was created and stored elsewhere. If not, Perl reclaims the memory the value was using at the same time as it discards the variable.)
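The absence of lexical variables from the symbol table can be checked directly, because the symbol table of the main package is available as the hash %main::. In this sketch, in_symbol_table is a hypothetical helper and the two variable names are arbitrary; a package variable is declared alongside the lexical one purely for comparison.

```perl
#!/usr/bin/perl
# notinsymtab.pl - illustrative sketch only
use strict;
use warnings;

our $package_var = "package";   # entered into the symbol table of main
my  $lexical_var = "lexical";   # not entered into any symbol table

# hypothetical helper: look a name up in main's symbol table hash
sub in_symbol_table {
    my ($name) = @_;
    return exists $main::{$name} ? "yes" : "no";
}

print "package_var in symbol table? ", in_symbol_table("package_var"), "\n";
print "lexical_var in symbol table? ", in_symbol_table("lexical_var"), "\n";
```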
In this section, we are concerned with my. While the similar-sounding our also declares variables lexically, it declares package variables whose visibility is therefore greater than their lexical scope. We will come back to the our keyword once we have looked at my.
The following is a short summary of all the different ways in which we can declare lexical variables with my, most of which should already be familiar:
my $scalar; # simple lexical scalar
my $assignedscalar = "value"; # assigned scalar
my @list = (1, 2, 3, 4); # assigned lexical array
my ($red, $blue, $green); # list of scalars
my ($left, $right, $center) = (1, 2, 0); # assigned list of scalars
my ($param1, $param2) = @_; # inside subroutines
All these statements create lexical variables that exist for the lifetime of their enclosing scope and are only visible inside it. If placed at the top of a file, the scope of the variable is the file. If defined inside an eval statement, the scope is that of the evaluated code. If placed in a block or subroutine (or indeed inside curly braces of any kind), the scope of the variable is from the opening brace to the closing one:
#!/usr/bin/perl
# scope-my.pl
use warnings;
use strict;
my $file_scope = "visible anywhere in the file";
print $file_scope, "\n";
sub topsub {
    my $top_scope = "visible in 'topsub'";
    if (rand > 0.5) {
        my $if_scope = "visible inside 'if'";
        # $file_scope, $top_scope, $if_scope ok here
        print "$file_scope, $top_scope, $if_scope\n";
    }
    bottomsub();
    # $file_scope, $top_scope ok here
    print "$file_scope, $top_scope\n";
}
sub bottomsub {
    my $bottom_scope = "visible in 'bottomsub'";
    # $file_scope, $bottom_scope ok here
    print "$file_scope, $bottom_scope\n";
}
topsub();
# only $file_scope ok here
print $file_scope, "\n";
In the preceding script, we define four lexical variables, each of which is visible only within the file or curly braces that enclose it. Both subroutines can see $file_scope because it has the scope of the file in which the subroutines are defined. Likewise, the body of the if statement can see both $file_scope and $top_scope. However, $if_scope ceases to exist as soon as the if statement ends and so is not visible elsewhere in topsub. Similarly, $top_scope only exists for the duration of topsub, and $bottom_scope only exists for the duration of bottomsub. Once the subroutines exit, the variables and whatever content they contain cease to exist.
While it is generally true that the scope of a lexical variable is bounded by the block or file in which it is defined, there are some common and important exceptions. Specifically, lexical variables defined in the syntax of a loop or condition are visible in all the blocks that form part of the syntax:
#!/usr/bin/perl
# ifscope.pl
use strict;
use warnings;
if ( (my $toss=rand) > 0.5 ) {
    print "Heads ($toss)\n";
} else {
    print "Tails ($toss)\n";
}
In this if statement, the lexical variable $toss is visible in both the immediate block and the else block that follows. The same principle holds for elsif blocks, and in the case of while and foreach, the continue block.
Normally a lexically defined variable (either my or our) ceases to exist when its scope ends. However, this is not always the case. In the earlier example, it happens to be true because there are no references to the variables other than the one created by the scope itself. Once that ends, the variable is unreferenced and so is consumed by Perl's garbage collector. The variable $file_scope appears to be persistent only because it drops out of scope at the end of the script, where issues of scope and persistence become academic.
However, if we take a reference to a lexically scoped variable and pass that reference back to a higher scope, the reference keeps the variable alive for as long as the reference exists. In other words, so long as something, somewhere, is pointing to the variable (or to be more precise, the memory that holds the value of the variable), it will persist even if its scope ends. The following short script illustrates the point:
#!/usr/bin/perl
# persist.pl
use warnings;
use strict;
sub definelexical {
    my $lexvar = "the original value";
    return \$lexvar; # return reference to variable
}
sub printlexicalref {
    my $lexvar = ${$_[0]}; # dereference the reference
    print "The variable still contains $lexvar\n";
}
my $ref = definelexical();
printlexicalref($ref);
In the subroutine definelexical, the scope of the variable $lexvar ends once the subroutine ends. Since we return a reference to the variable, and because that reference is assigned to the variable $ref, the underlying scalar remains in existence, even though it can no longer be accessed as $lexvar. We pass this reference to a second subroutine, printlexicalref, which defines a second $lexvar and assigns it the value to which the passed reference points. It is important to realize that the two $lexvar variables are entirely different, each existing only in its own scope; the second holds a copy of the value obtained by dereferencing the reference to the first. When executed, this script will print out
The variable still contains the original value
Tip In this particular example, there is little point in returning a reference. Passing the string as a value is simpler and would work just as well. However, complex data structures can also be preserved by returning a reference to them, rather than making a copy as would happen if we returned them as a value.
The fact that a lexical variable exists so long as a reference to it exists can be extended to include "references to references" and "references to references to references." So long as the "top" reference is stored somewhere, the lexical variable can be hidden at the bottom of a long chain of references, each one being kept alive by the one above. This is in essence how lexical arrays of arrays and hashes of hashes work. The component arrays and hashes are kept alive by having their references stored in the parent array or hash.
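A small sketch of such a chain, using a hypothetical make_table subroutine: the two row arrays and the table array are all lexical to the subroutine, yet the data survives because each is referenced from the level above and the top reference is returned to the caller.

```perl
#!/usr/bin/perl
# nested.pl - illustrative sketch only
use strict;
use warnings;

sub make_table {
    my @first_row  = (1, 2, 3);
    my @second_row = (4, 5, 6);
    my @table      = (\@first_row, \@second_row);
    return \@table;   # the names go out of scope, the data does not
}

my $table = make_table();
print "row 1, column 2: $table->[1][2]\n";   # prints 'row 1, column 2: 6'
```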
The our keyword is a partial replacement for use vars, with improved semantics but not quite the same meaning. It allows us to define package variables with a lexical scope in the same manner as my does. This can be a little tricky to understand, since traditionally lexical and package scope are usually entirely different concepts.
To explain, our works like use vars in that it adds a new entry to the symbol table for the current package. However, unlike use vars the variable can be accessed without a package prefix from any package so long as its lexical scope exists. This means that a package variable, declared with our at the top of the file, is accessible throughout the file using its unqualified name even if the package changes.
Another way to look at our is to think of it as causing Perl to rewrite accesses to the variable in other packages (in the same file) to include the package prefix before it compiles the code. For instance, we might create a Perl script containing the following four lines:
package MyPackage;
our $scalar = "value"; # defines $MyPackage::scalar
package AnotherPackage;
print $scalar;
When Perl parses this, it sees the lexically scoped variable $scalar and invisibly rewrites all other references to it in the same file to point to the definition in MyPackage, that is, $MyPackage::scalar.
Using our also causes variables to disappear at the end of their lexical scope. Note that this does not mean it removes the package variable from the symbol table. It merely causes access through the unqualified name to disappear. Similarly, if the same variable is redeclared in a different package, the unqualified name is realigned to refer to the new definition. This example demonstrates both scope changes:
#!/usr/bin/perl -w
use strict;
package First;
our $scalar = "first"; # defines $First::scalar
print $scalar; # prints $First::scalar, produces 'first'
package Second;
print $scalar; # prints $First::scalar, produces 'first'
our $scalar = "second";
print $scalar; # prints $Second::scalar, produces 'second'
package Third;
{
    our $scalar = "inner"; # declaration contained in block
    print $scalar; # prints $Third::scalar, produces 'inner'
}
print $scalar; # prints $Second::scalar, produces 'second'
A variable named in an our declaration may, but is not required to, exist already. This is unlike local, which under strict vars will only localize a variable that exists already. our differs from use vars in that a variable declared with use vars can be used by its unqualified name throughout the package, even when declared inside a subroutine. With our, the unqualified name goes out of scope when the subroutine exits, although the package variable itself, and any value the subroutine gave it, remains in the symbol table.
The our keyword behaves like my in its scoping, except that the name it introduces refers to an entry in the symbol table rather than to a new lexical variable (more correctly, it tells Perl to define the symbol in advance, since it happens at compile time). When the scope ends, only the unqualified name is removed; the symbol table entry stays. Like my, the unqualified name is not visible in called subroutines defined outside that scope, since it is lexically scoped. See the "Declaring Lexical Variables" section for more details on how my works.
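The difference between the unqualified name and the package variable behind it can be seen in a few lines. This is a sketch with assumed names (set_counter and $counter): the our declaration is confined to the subroutine body, so the short name is unavailable afterward, but the value is still reachable through the fully qualified name.

```perl
#!/usr/bin/perl
# our-in-sub.pl - illustrative sketch only
use strict;
use warnings;

sub set_counter {
    our $counter = 10;   # unqualified name is valid only inside this block
    return $counter;
}

set_counter();

# print $counter;        # would not compile under strict: name not in scope here
print "still there: $main::counter\n";   # the package variable itself persists
```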
The unique Attribute
The our keyword takes one special attribute, unique, that allows a variable to be shared between interpreters when more than one exists concurrently. Note that this is not the same as a threaded Perl application. In Perl 5.10.0 and later the attribute has no effect.
Multiple interpreters generally come about only when forking processes under threaded circumstances: Windows, which emulates fork, is the most common occurrence. Another is where a Perl interpreter is embedded into a multithreaded application. In both these cases, we can choose to have a package variable shared between interpreters, in the manner of threads, or kept separate within each interpreter, in the manner of forked processes:
# one value per interpreter
our $unshared_data = "every interpreter for itself";
# all interpreters see the same value
our $shared_data : unique = "one for all and all for one";
Note that this mechanism doesn't allow us to communicate between interpreters. After the first fork, the shared value becomes strictly read-only. See Chapters 20 and 21 for more on embedding Perl and threaded programming, respectively.
We have seen that package variables are entered into the symbol table for the package in which they are defined, but they can be accessed from anywhere by their fully qualified name. This works because the symbol tables of packages are jointly held in a master symbol table, with the main:: package at the top and all other symbol tables arranged hierarchically below. Although for most practical purposes we can ignore the symbol table most of the time and simply let it do its job, a little understanding of its workings can be informative and even occasionally useful.
Perl implements its symbol table in a manner that we can easily comprehend with a basic knowledge of data types: it is really a hash of typeglobs. Each key is the name of the typeglob, and therefore the name of the scalar, array, hash, subroutine, filehandle, and report associated with that typeglob. The value is a typeglob containing the references or a hash reference to another symbol table, which is how Perl's hierarchical symbol table is implemented. In fact, the symbol table is the origin of typeglobs and the reason for their existence. This close relationship between typeglobs and the symbol table means that we can examine and manipulate the symbol table through the use of typeglobs.
Whenever we create a global variable in Perl (one declared with our or use vars, but not my), we cause a typeglob to be entered into the symbol table and a reference for the data type we just defined to be placed into the typeglob. Consider the following example:
our $variable = "This is a global variable";
What we are actually doing here is creating a typeglob called variable in the main package and filling its scalar reference slot with a reference to the string "This is a global variable". The name of the typeglob is stored as a key in the symbol table, which is essentially just a hash, with the typeglob itself as the value, and the scalar reference in the typeglob defines the existence and value of $variable. Whenever we refer to a global variable, Perl looks up the relevant typeglob in the symbol table and then looks for the appropriate reference, depending on what kind of variable we asked for.
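The lookup just described can be made visible with a short script. This is a minimal sketch rather than a picture of Perl's internals: the file name and the extra @variable array are illustrative assumptions, and the *name{THING} syntax is used only to peek at individual typeglob slots.

```perl
#!/usr/bin/perl
# globslots.pl (hypothetical name) - peek at the slots of a typeglob
use warnings;
use strict;

our $variable = "This is a global variable";
our @variable = (1, 2, 3);    # illustrative extra slot, same typeglob

# the symbol table of 'main' is the hash %main::, keyed by the bare name
print "entry exists\n" if exists $main::{variable};

# each slot of the typeglob holds a reference to one kind of data
my $scalar_ref = *variable{SCALAR};
my $array_ref  = *variable{ARRAY};

print "scalar slot: $$scalar_ref\n";
print "array slot : @$array_ref\n";

# the slot reference and an ordinary reference address the same storage
print "same storage\n" if $scalar_ref == \$variable;
```

One typeglob serves both $variable and @variable; which slot Perl consults depends only on the kind of variable we ask for.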
The only thing other than a typeglob that can exist in a symbol table is another symbol table. This is the basis of Perl's package hierarchy, and the reason we can access a variable in one package from another. Regardless of which package our code is in, we can always access a package variable by traversing the symbol table tree from the top.
The default package is main, the root of the symbol table hierarchy, so any package variable declared without an explicit package prefix or preceding package declaration automatically becomes part of the main package:
our $scalar; # defines $main::scalar
Since main is the root table for all other symbol tables, the following all refer to the same variable:
package MyPackage;
our $scalar;
package main::MyPackage;
our $scalar;
# our does not accept a package-qualified name, so these are plain references
$MyPackage::scalar;
$main::MyPackage::scalar;
Strangely, since every package must have main as its root, the main package is defined as an entry in its own symbol table. The following is also quite legal and refers to the same variable as the preceding names, if somewhat bizarre:
$main::main::main::main::main::main::main::MyPackage::scalar;
This is more a point of detail than a useful fact, of course, but if we write a script to traverse the symbol table, then this is a special case we need to look out for.
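The self-reference is easy to confirm. The following is a small sketch; the package name MyPackage and the value assigned are placeholders chosen for illustration.

```perl
#!/usr/bin/perl
# mainloop.pl (hypothetical name) - main is an entry in its own table
use warnings;
use strict;

package MyPackage;
our $scalar = "found me";

package main;

# the root table holds an entry for itself under the key 'main::'
print "main is its own entry\n" if exists $main::{'main::'};

# any number of leading main:: prefixes resolves to the same variable
print "$MyPackage::scalar\n";
print "$main::MyPackage::scalar\n";
print "$main::main::main::MyPackage::scalar\n";

# all of these names refer to the same storage
print "same storage\n"
    if \$MyPackage::scalar == \$main::main::main::MyPackage::scalar;
```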
In general, we do not need to use the main package unless we want to define a package variable explicitly without placing it into its own package. This is a rare thing to do, so most of the time we can ignore the main package. It does, however, allow us to make sense of warning messages like
Name "main::a" used only once: possible typo at....
Whenever we define a new package variable in a new package, we cause Perl to create symbol tables to hold the variable. In Perl syntax, package names are separated by double colons, ::, in much the same way that directories are separated by / or \, and domain names by a dot. For the same reason, the colons define a location in a hierarchical naming system.
For example, if we declare a package with three package elements, we create three symbol tables, each containing an entry to the one below:
package World::Country::City;
our $variable = "value";
This creates a chain of symbol tables. The World symbol table, created as an entry of main, contains no actual variables. However, it does contain an entry for the Country symbol table, which therefore has the fully qualified name World::Country. In turn, Country contains an entry for a symbol table called City. City does not contain any symbol table entries, but it does contain an entry for a typeglob called *variable, which contains a scalar reference to the string "value". Put together, this gives us the package variable:
$main::World::Country::City::variable;
Since main is always the root of the symbol table tree, we never need to specify it explicitly. In fact, it usually only turns up as a result of creating a symbolic reference for a typeglob as we saw earlier. So we can also just say
$World::Country::City::variable;
This is the fully qualified name of the package variable. The fact that we can omit the package names when we are actually in the World::Country::City package is merely a convenience. There is no actual variable called $variable, unless we declare it lexically. Even if we were in the main package, the true name of the variable would be $main::variable.
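The chain of symbol tables can be followed link by link, since each table is an ordinary hash whose child tables sit under keys ending in ::. The sketch below reuses the World::Country::City example; the file name is an assumption.

```perl
#!/usr/bin/perl
# chain.pl (hypothetical name) - follow the chain of symbol tables
use warnings;
use strict;

package World::Country::City;
our $variable = "value";

package main;

# child symbol tables are stored under keys that end in '::'
print "main    holds World::\n"   if exists $main::{'World::'};
print "World   holds Country::\n" if exists $World::{'Country::'};
print "Country holds City::\n"    if exists $World::Country::{'City::'};

# the innermost table holds the typeglob for 'variable'
print "City    holds *variable\n"
    if exists $World::Country::City::{variable};

# with or without the leading main::, the name reaches the same variable
print "$World::Country::City::variable\n";
print "$main::World::Country::City::variable\n";
```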
All global variables are really package variables in the main package, and they in turn occupy the relevant slot in the typeglob with that name in the symbol table. Similarly, subroutines are just code references stored in typeglobs. Without qualification or a package declaration, a typeglob is automatically in the main package, so the following two assignments are the same:
*subglob = \&mysubroutine;
*main::subglob = \&main::mysubroutine;
Either way we can now call mysubroutine by either name (with optional main:: prefixes if we really felt like it):
mysubroutine(); # original name
subglob(); # aliased name
We can, of course, alias to a different package too, to make a subroutine defined in one package visible in another. This is actually how Perl's import mechanism works underneath the surface when we say use module and get subroutines that we can call without qualification in our own programs:
# import 'subroutine' into our namespace
*main::subroutine = \&My::Module::subroutine;
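To make the connection with importing concrete, here is a bare-bones sketch of an import method that performs this aliasing for whichever package calls it. It is an illustration only: real modules normally rely on the standard Exporter module, and the names My::Module and subroutine are placeholders.

```perl
#!/usr/bin/perl
# tinyimport.pl (hypothetical name) - a hand-rolled import by aliasing
use warnings;
use strict;

package My::Module;

sub subroutine { return "called My::Module::subroutine(@_)" }

# alias our subroutine into the package that called import
sub import {
    my $class  = shift;
    my $caller = caller;
    no strict 'refs';    # the glob name is built from a string
    *{"${caller}::subroutine"} = \&subroutine;
}

package main;

# 'use My::Module' would call import at compile time; we call it by hand
# here because the package lives in the same file
My::Module->import;

my $result = subroutine(1, 2, 3);
print "$result\n";
```

Because the alias shares the code reference rather than copying it, both names always invoke the same subroutine.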
Typeglob assignment works for any of the possible types that a typeglob can contain, including filehandles and code references. So we can create an alias to a filehandle this way:
*OUT = *STDOUT;
print OUT "this goes to standard output";
This lets us do things like choose from a selection of subroutines at runtime:
# choose a subroutine to call and alias it to a local name
*aliassub = $debug ? *mydebugsub : *mysubroutine;
# call the chosen subroutine via its alias
aliassub("Testing", 1, 2, 3);
All this works because package typeglobs are actually entries in the symbol table itself. Everything else is held in a typeglob of the same name. Once we know how it all works, it is relatively easy to see how Perl does the same thing itself. For example, when we define a named subroutine, we are really causing Perl to create a code reference, then assign it to a typeglob of that name in the symbol table. To prove it, here is a roundabout way to define a subroutine:
#!/usr/bin/perl
# anonsub.pl
use warnings;
use strict;
our $anonsub = sub {print "Hello World"};
*namedsub = \&{$anonsub};
namedsub();
Here we have done the same job as defining a named subroutine, but in explicit steps: first creating a code reference, then assigning that code reference to a typeglob. The subroutine is defined in code rather than as a declaration, but the net effect is the same.
We can create aliases to scalars, arrays, and hashes in a similar way. As this example shows, with an alias both variables point to exactly the same underlying storage:
#!/usr/bin/perl -w
# scalaralias.pl
use strict;
our $scalar1="one";
*scalar2=\$scalar1;
our $scalar2="two";
print $scalar1; # produces "two";
We use our to declare the variables here, in order to keep Perl happy under strict vars. As this is a manipulation of the symbol table, my won't do. Also note that the assignment to the typeglob does not require any declaration, and the declaration of $scalar2 actually comes after it. Typeglobs cannot be declared, and it would not make much sense to try, but if we want to access the scalar variable afterwards without qualifying it with main::, we need to ask permission to avoid running afoul of the strict pragma.
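Arrays and hashes are aliased in the same way as scalars. In the following sketch the names original and alias are purely illustrative. Each reference assignment fills only the matching slot of the typeglob, so both can be assigned to the same glob without interfering with each other.

```perl
#!/usr/bin/perl
# aggregatealias.pl (hypothetical name) - alias an array and a hash
use warnings;
use strict;

our @original = (1, 2, 3);
our %original = (one => 1);

# assigning a reference fills only the matching slot of the typeglob
*alias = \@original;
*alias = \%original;
our (@alias, %alias);

# changes made through the alias appear in the original variables
push @alias, 4;
$alias{two} = 2;

print "@original\n";                          # 1 2 3 4
print join(",", sort keys %original), "\n";   # one,two
```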
We can also create a constant scalar variable by assigning a reference to a literal value to the typeglob (but see also "Constants" in Chapter 5 for more approaches):
*constantstring=\"I will not be moved";
our $constantstring="try to change";
# Error! 'Modification of a read-only value attempted...'
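Since the attempted assignment is a fatal error, a script that wants to carry on has to trap it. The following sketch wraps the assignment in eval; the exact wording of the message may differ between Perl versions, so the code only reports whatever text Perl supplies.

```perl
#!/usr/bin/perl
# constalias.pl (hypothetical name) - a read-only scalar via a typeglob
use warnings;
use strict;

*constantstring = \"I will not be moved";
our $constantstring;

# trap the fatal error so that the script can report it and continue
my $ok = eval { $constantstring = "try to change"; 1 };
print "assignment failed: $@" unless $ok;

# the original value is untouched
print "$constantstring\n";
```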
Be wary of assigning to a typeglob things other than references or other typeglobs. For example, assigning a string does have an interesting but not entirely expected effect. We might suppose the following statement creates a variable called $hello with the value world:
*hello = "world";
However, if we try to print $hello, we find that it does not exist. If we print out *hello, we find that it has become aliased instead:
print *hello; # produces '*main::world'
In other words, the string has been taken as a symbolic reference to a typeglob name, and the statement is actually equivalent to
*hello = *world;
This can be very useful, especially since the string can be a scalar variable:
*hello = $name_to_alias_to;
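A short sketch of this name-based aliasing follows. The variables $world and @world and their contents are made up for illustration; the point is that every slot of *world becomes reachable through *hello.

```perl
#!/usr/bin/perl
# stringalias.pl (hypothetical name) - alias a typeglob by name
use warnings;
use strict;

our $world = "the world's scalar";
our @world = ("the", "world's", "array");

my $name_to_alias_to = "world";

# a plain string is taken as the name of a typeglob to alias to
*hello = $name_to_alias_to;
our ($hello, @hello);

print *hello, "\n";    # *main::world
print "$hello\n";
print "@hello\n";
```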
However, it is also a potential source of confusion, especially as it is easily done by forgetting to include a backslash to create a reference. Assigning other things to typeglobs has less useful effects. An array or hash, for example, will be assigned in scalar context and alias the typeglob to a typeglob whose name is a number:
@array = (1, 2, 3);
*hello = @array;
print *hello; # produces '*main::3' since @array has three elements
This is unlikely to be what we wanted, and we probably meant to say *hello = \@array in this case. Assigning the result of a subroutine call aliases the typeglob to the value returned by the subroutine. If that's a string, it's useful; otherwise it probably isn't:
*hello = subroutine_that_returns_name_to_alias_to(@args);
Interestingly, the symbol table itself can be accessed in Perl by referring to the name of the package with a trailing ::. Since symbol tables are hashes, and the hashes are stored in a typeglob with the same name, the hash that defines the main symbol table can be accessed with %{*main::}, or simply %{*::}, as this short script demonstrates:
#!/usr/bin/perl
# dumpmain.pl
use warnings;
use strict;
foreach my $name (sort keys %{*::}) {
    # skip the root table's entry for itself (child table keys end in '::')
    next if $name eq 'main::';
    print "Symbol '$name' =>\n";
    # extract the glob from the symbol table
    my $glob = ${*::}{$name};
    # define local package variables through alias
    local *entry = *{$glob};
    # make sure we can access them in 'strict' mode
    our ($entry, @entry, %entry);
    # extract scalar, array, and hash via alias
    print "  Scalar: $entry\n" if defined $entry;
    print "  Array : [@entry]\n" if @entry;
    # flatten the hash to a list; undefined values are shown as 'undef'
    print "  Hash  : {", join(" ", map { defined $_ ? $_ : 'undef' } %entry), "}\n"
        if %entry;
    # check for subroutine and handle via glob
    print "  Sub '$name' defined\n" if *entry{CODE};
    print "  Handle '$name' (", fileno(*entry), ") defined\n"
        if *entry{IO};
}
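The script above looks only at the top level. To descend into nested packages we have to recurse, and this is where the entry for main inside its own table matters, since following it would loop forever. The following is a minimal sketch of such a traversal. The subroutine name list_packages and the World::Country::City test package are illustrative assumptions.

```perl
#!/usr/bin/perl
# listpackages.pl (hypothetical name) - recursive symbol table walk
use warnings;
use strict;

package World::Country::City;
our $variable = "value";

package main;

# recursively collect the names of all symbol tables below $package
sub list_packages {
    my ($package, $found) = @_;
    no strict 'refs';    # symbol tables are looked up by name
    foreach my $key (sort keys %{"${package}::"}) {
        next unless $key =~ /^(\w+)::$/;    # child tables end in '::'
        my $child = $1;
        # main contains itself: skip it to avoid infinite recursion
        next if $package eq 'main' and $child eq 'main';
        my $full = $package eq 'main' ? $child : "${package}::$child";
        push @$found, $full;
        list_packages($full, $found);
    }
    return $found;
}

# show only the packages created by this script
print "$_\n" foreach grep { /^World/ } @{ list_packages('main', []) };
```

Only the keys of each table are examined here, so the sketch does not depend on what kind of value a particular Perl version stores for each entry.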
The Dumpvalue module provides a more convenient interface to the symbol table and forms a core part of the Perl debugger. It does essentially the same thing as the preceding example, but more thoroughly and with a more elegant output. The following script builds a hierarchy of symbol tables and variables and then uses the Dumpvalue module to print them out:
#!/usr/bin/perl
# dumpval.pl
use warnings;
use strict;
use Dumpvalue;
# first define some variables
{
# no warnings to suppress 'usage' messages
no warnings;
package World::Climate;
our $weather = "Variable";
package World::Country::Climate;
our %weather = (
England => 'Cloudy'
);
package World::Country::Currency;
our %currency = (
England => 'Sterling',
France => 'Franc',
Germany => 'Mark',
USA => 'US Dollar',
);
package World::Country::City;
our @cities = ('London', 'Paris', 'Bremen', 'Phoenix');
package World::Country::City::Climate;
our %cities = (
London => 'Foggy and Cold',
Paris => 'Warm and Breezy',
Bremen => 'Intermittent Showers',
Phoenix => 'Horrifyingly Sunny',
);
package World::Country::City::Sights;
our %sights = (
London => ['Tower of London','British Museum'],
Paris => ['Eiffel Tower','The Louvre'],
Bremen => ['Town Hall','Becks Brewery'],
Phoenix => ['Arcosanti'],
);
}
my $dumper = Dumpvalue->new(globPrint => 1);
$dumper->dumpValue(\*World::);
While Dumpvalue can be pressed into service this way, it is worth considering the Symbol::Table module, available from CPAN, which provides a more focused interface.
Scope and visibility are important concepts in any programming language. Perl has two distinct kinds of scope, package scope and lexical scope, each of which has its own rules and reasons for being. A discussion of scope therefore becomes a discussion of package variables, and declarations, versus lexical variables.
We began with a discussion of package variables and their scoping rules, including defining them under strict, the distinction between package and global variables, declaring package variables lexically with our, overriding them temporarily with local, and why we probably meant to use my instead. We then talked about lexical variables, declaring them with my, and how they differ from package variables and variables declared with our.
We finished off with a look at the symbol table, which is the underlying structure in which not just package variables but subroutine declarations and file handles live. As it turns out, the symbol table is really just a big nested collection of typeglobs, so we also saw how to create new entries in the symbol table, how to create aliases for existing package variables and subroutines, and finally how to walk through the symbol table and examine its contents programmatically.