CHAPTER 8

Scope and Visibility

We touched on the concept of scope briefly in Chapter 2 and have mentioned it from time to time in later chapters without going into the full details. The scope of a variable is simply the range of places in code from which the variable can be accessed. Perl, however, has two fundamentally different kinds of variable scope: package scope (also called dynamic scope) and lexical scope. Which one applies depends on how a variable is declared. Named subroutines, on the other hand, always have package scope, because they are defined within a package. This is why we can call a subroutine in another package, and also why it makes no sense to define one named subroutine inside another. Anonymous subroutine references are scalars, though, so they can be scoped like any other variable.

Package and lexical scope work in fundamentally different ways, and using both at the same time is a frequent source of confusion. One useful rule of thumb is this: the scope of a package variable is determined at run time from the current contents of the symbol table (which, along with the subroutine definitions, represents the entirety of the Perl program in memory), whereas the scope of a lexical variable is determined at compile time from the structure of the source code itself.

Package Variables

A package is defined as a namespace in which declarations of variables and subroutines are placed, confining their scope to that particular package. A great deal of the time we can avoid defining package variables at all by declaring all variables as lexical variables. This lets us define local variables in the way they are normally understood in other languages. However, we often use package variables without being aware of it. For example, the variables $_, @_, %ENV, and @ARGV are all package variables in the main package.

Defining Package Variables

Packages are defined by the package keyword. In and of itself, a package declaration does nothing, but it states that all further declarations of subroutines or package variables will be placed into the package designated by the declaration, until further notice. The notice in this case is either the end of the file, the end of the current block (if the package declaration was made inside one), or less commonly but perfectly legally, another package declaration. The package keyword takes one argument: a list of namespaces separated by double colons. For example:

package My::Package;

This declares that the variables and subroutines that follow will be in the Package namespace, inside the My namespace.

Package variables have the scope of the package in which they were declared. A package variable can be accessed from other packages by specifying the full package name, in the same way that we can refer to any file on a hard disk from the current directory by starting at the root directory and writing out the full path of the file. This code snippet creates and accesses package variables, both local to the current package and in other packages through their full name. It also defines two subroutines, one in the current package and one in a different package:

package My::Package;

$package_variable = "implicitly in 'My::Package'";
$My::Package::another_variable = "explicitly declared in 'My::Package'";
$Another::Package::variable = "explicitly declared in 'Another::Package'";

sub routine { print "subroutine in my package" }
sub Another::Package::routine { print "subroutine in another package" }

{
    package Yet::Another::Package;
    $yet_another_package_variable="blocks limit scope of package declaration";
}

$back_in_my_package = "back in 'My::Package' again";

The effect of a package declaration is to cause all unqualified package variables and (more significantly) subroutines that follow it to be located in the namespace that it defines. We don't have to use a package declaration though; we can define variables and subroutines with an explicit package name. As we showed in the preceding example, it just makes writing code a bit simpler if we do.
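As a quick check of these rules, the following sketch records the value of the built-in __PACKAGE__ token, which gives the name of the package in effect at that point in the source. The package names follow the earlier example; the @seen array and the where() subroutine are hypothetical additions for illustration. Note that a subroutine declared with an explicit package name is placed in that package, but its body is still compiled under the package declaration in effect where it is written.

```perl
#!/usr/bin/perl
# packagedecl.pl - where package declarations take effect (illustrative sketch)
use strict;
use warnings;

my @seen;                              # file-scoped lexical, visible in every package below

package My::Package;
push @seen, __PACKAGE__;               # My::Package

{
    package Yet::Another::Package;
    push @seen, __PACKAGE__;           # Yet::Another::Package, inside the block only
}

push @seen, __PACKAGE__;               # My::Package again after the block

# Explicitly named: the subroutine lands in Another::Package, but its body is
# still compiled under the package declaration in effect here
sub Another::Package::where { return __PACKAGE__ }

package main;
print "$_\n" for @seen;
print Another::Package::where(), "\n";  # My::Package
```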

Package variables are defined simply by using them, as in the preceding example. No extra syntax or qualifier is needed. Unfortunately, this makes them very easy to define accidentally by misspelling the name of another variable. Perl prevents us from making this mistake with the use strict 'vars' pragma, introduced in Chapter 2 and covered in more detail later.

A package is not the same as a module, however, and certainly not the same as a file. In simple terms, a package defines a namespace in a data structure known as the symbol table. Whenever a new variable or subroutine is defined, a new entry is added to the table for the appropriate package (the current package if one is not explicitly prefixed to the variable or subroutine name). The same variable name in a different package is not confused with it because the namespaces of the two variables are different and they are defined in different parts of the symbol table.

Package scope is not the same as file scope. A package is most often defined in a single file (a module) to improve maintainability, but Perl does not enforce this and allows packages to be spread across many files. While uncommon, this is one approach we can take if we want to split up subroutines or methods in a package into separate functional files. We define one main module with the same name as the package, translated into a pathname, and use or require all the other supporting modules from it:

package My::Module;
require My::Module::Input;
require My::Module::Output;
require My::Module::Process;

The modules Input.pm, Output.pm, and Process.pm in this scheme would all contain a first line of package My::Module;, rather than a package that reflects the actual name of the file, such as My::Module::Input for My/Module/Input.pm. This allows them to add additional subroutine definitions to the My::Module namespace directly. Since there is no familial relationship between namespaces, adding definitions to My::Module::Input has no effect on the My::Module package, or vice versa. This is more efficient than the alternative, using the Exporter module to export all the names explicitly.
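The arrangement can be simulated in a single file to see the effect. In this sketch each package My::Module; section stands in for one of the separate files described above (a real layout would split them into My/Module.pm, My/Module/Input.pm, and so on); the subroutine names process, read_input, and unrelated are hypothetical.

```perl
#!/usr/bin/perl
# splitpackage.pl - simulating one package spread over several files
use strict;
use warnings;

package My::Module;                   # stands in for My/Module.pm
sub process { return "processing " . read_input() }

package My::Module;                   # stands in for My/Module/Input.pm, which
sub read_input { return "input" }     # adds its subroutine to My::Module directly

package My::Module::Input;            # a genuinely separate namespace
sub unrelated { return "not in My::Module" }

package main;

print My::Module::process(), "\n";                                 # processing input
print My::Module->can('unrelated') ? "shared\n" : "separate\n";    # separate
```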

It follows that package variables are global variables. In practice, what we usually think of as global variables are package variables accessed from within their own package, where the package prefix is not required. For instance, the variable $package_variable in the first example is global within the package My::Package. We can access it from anywhere in our code, so long as the contents of My::Package have been compiled and executed by the interpreter. Within My::Package we can refer to it as $package_variable; from any other package, we merely have to fully qualify it as $My::Package::package_variable.
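A minimal sketch of both forms of access, with each package confined to its own block so that the short name cannot leak from one to the other. The subroutines show and peek and the package name Other::Package are hypothetical.

```perl
#!/usr/bin/perl
# qualified.pl - reaching a package variable from outside its package
use strict;
use warnings;

{
    package My::Package;
    our $package_variable = "global within My::Package";
    sub show { return $package_variable }    # unqualified: we are in the package
}

{
    package Other::Package;
    # Under strict vars an unqualified $package_variable would not compile here,
    # but the fully qualified name always works
    sub peek { return $My::Package::package_variable }
}

print My::Package::show(), "\n";      # global within My::Package
print Other::Package::peek(), "\n";   # global within My::Package
```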

New package variables can be created by executing code, with existing variables given local temporary values. Both of these events modify the contents of the symbol table as the program executes. As a result, package scope is ultimately determined at run time and is dependent on the structure of the data in the symbol tables. This differs from lexical variables, whose scope is determined at compile time when the source code is parsed.
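To see a symbol table change at run time, this sketch uses symbolic references (which require no strict 'refs') to create package variables whose names are only computed while the loop runs, and then lists the new entries in the symbol table of the package. The package name Counters and the total_N names are hypothetical.

```perl
#!/usr/bin/perl
# runtimevars.pl - package variables created while the program runs
use strict;
use warnings;

for my $n (1 .. 3) {
    no strict 'refs';                     # symbolic references build names at run time
    ${"Counters::total_$n"} = $n * 10;    # creates $Counters::total_1 and so on
}

# The new entries now exist in the symbol table of package Counters
my @created = sort grep { /^total_\d+$/ } keys %Counters::;
print "@created\n";                       # total_1 total_2 total_3
```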

All subroutines and package variables must reside in a package, in the same way that all files in a filing system must have a directory, even if it is the root. In Perl, the "root" namespace is main, so any subroutine or package variable we define without an explicit package name or prior package declaration is part of the main package. In effect, the top of every source file is prefixed by an implicit

package main;
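A small sketch of this default, assuming nothing beyond core Perl: with no package declaration, an our variable lands in main, and an empty package name, as in $::greeting, is shorthand for main. The variable name is hypothetical.

```perl
#!/usr/bin/perl
# mainpkg.pl - code with no package declaration lives in main
use strict;
use warnings;

our $greeting = "hello";          # no package declared, so this is $main::greeting

print __PACKAGE__, "\n";          # main
print "$main::greeting\n";        # hello
print "$::greeting\n";            # hello: an empty package name also means main
```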

Perl's special variables, as well as the filehandles STDIN, STDOUT, and STDERR, are exempt from the normal rules of package scope visibility. Wherever they are used, they always refer to the main package, irrespective of any package declaration that may be in effect. There is therefore only one @_ array, only one default argument $_, and only one @ARGV, %ENV, or @INC. These variables have this special status because they are provided directly by the language; for our own variables and subroutines, the normal rules of packages and namespaces apply.
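The following sketch confirms this by comparing references: inside a hypothetical package Some::Other::Package, the unqualified names $_ and %ENV refer to exactly the same variables as $main::_ and %main::ENV.

```perl
#!/usr/bin/perl
# specialvars.pl - special variables always belong to main
use strict;
use warnings;

package Some::Other::Package;

$_ = "hello";                              # unqualified $_ is still $main::_
my $same_underscore = \$_ == \$main::_;    # compare the addresses of the two names
my $same_env        = \%ENV == \%main::ENV;

print "same \$_: ",  ($same_underscore ? "yes" : "no"), "\n";   # yes
print "same %ENV: ", ($same_env ? "yes" : "no"), "\n";          # yes
```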

Using strict Variables

Since it is easy to accidentally define a new global variable without meaning to, Perl provides the strict module. With either of use strict or use strict 'vars', a compile-time syntax check is enabled that requires variables to either be declared lexically or be declared explicitly as package variables. This is generally a good idea, because without this check it is very easy to introduce hard-to-spot bugs, either by forgetting to define a variable before using it, creating a new variable instead of accessing an existing one, or by accessing a global variable when we actually meant to create and use a local copy.

With strict variables in effect, Perl will no longer allow us to define a variable simply by assigning to it. For instance:

#!/usr/bin/perl
# package.pl
use warnings;
use strict;

package MyPackage;

$package_variable = "This variable is in 'MyPackage'";   # ERROR

If we attempt to run this code fragment, Perl will complain with an error:

Global symbol "$package_variable" requires explicit package name at ...

Usually the simplest fix for this problem is to prefix the variable with my, turning the variable into a file-scoped lexical variable. However, if we actually want to define a global package variable, we now need to say so explicitly.
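Both remedies can be sketched by extending package.pl. This is an illustrative variant rather than the original script; the variable $lexical_variable is a hypothetical addition.

```perl
#!/usr/bin/perl
# strictfix.pl - two ways to satisfy strict vars
use strict;
use warnings;

package MyPackage;

my $lexical_variable = "a file-scoped lexical";     # fix 1: declare it with my
$MyPackage::package_variable = "fully qualified";   # fix 2: qualify the name

print "$lexical_variable\n";
print "$MyPackage::package_variable\n";
```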

Declaring Global Package Variables

The traditional meaning of global variable is a variable that can be seen from anywhere—it has global visibility. In Perl, the distinction between package and lexical variables means that there are two different kinds of global variables, package-global and file-global.

A package variable is global within the package in which it is defined, as we have just seen. Any code in the same package can access the variable directly without qualifying it with a package prefix, except where it has been hidden by another variable of the same name defined in an inner scope. By contrast, a lexical variable declared with my at file level, outside any block, is global only within that file. Only package variables are truly global, because they can always be reached through their full package-qualified name in the symbol table.

The simplest way to define a package variable is to write it out in full, including the package name, for example:

$MyPackage::package_variable = "explicitly defined with package";

However, this can be inconvenient to type each time we want to define a new package variable and also causes a lot of editing if we happen to change the name of the package. We can avoid having to write the package name by using either use vars or our. These both define package variables in the current package, but work in slightly different ways.

Declaring Global Package Variables with use vars

To define a package variable with the vars pragma, we pass it a list containing the variables we want to declare. This is a common application of qw:

use vars ('$package_variable','$another_var','@package_array');
# using qw is often clearer
use vars qw($package_variable $another_var @package_array);

This defines the named variables in the symbol table for the current package. Therefore the variables are directly visible anywhere the package is defined, and they can be accessed from anywhere by specifying the full variable name including the package.
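A minimal sketch under strict: once use vars has declared the names in MyPackage, they can be used unqualified there, while code in another package still needs the full name. The array contents are illustrative.

```perl
#!/usr/bin/perl
# usevars.pl - declaring package variables under strict with use vars
use strict;
use warnings;

package MyPackage;
use vars qw($package_variable @package_array);   # now legal to use unqualified

$package_variable = "declared with use vars";
@package_array    = (1, 2, 3);

package main;
# From another package the full name is still required
print "$MyPackage::package_variable\n";
print scalar(@MyPackage::package_array), "\n";   # 3
```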

Because use takes effect at compile time, variables declared with use vars are added to the symbol table for the current package at compile time. Also, use pays no attention to enclosing declarations like subroutine definitions. As package variables know nothing about lexical scope, the following script defines the variable $package_variable at compile time and makes it immediately visible anywhere within the package, even though the use statement is inside a subroutine. The value is only assigned once the subroutine is executed, however:

#!/usr/bin/perl
# globpack.pl
use warnings;
use strict;

sub define_global {
    use vars qw($package_variable);
    $package_variable = "defined in subroutine";
}

print $package_variable;   # visible here but not yet defined
define_global;
print $package_variable;   # visible here and now defined

What we probably intended to do here was create a local package variable with our, which was introduced in Perl 5.6 for precisely this sort of occasion. Since understanding our requires an understanding of lexical scope, we'll leave discussing it until after we look at purely lexical my declarations.

Localizing Package Variables with local

Package variables can be temporarily localized inside subroutines and other blocks with the local keyword. This hides an existing package variable by masking it with a temporary value that exists for as long as the local statement remains in lexical scope. As a statement, local takes either a single variable name or a list enclosed in parentheses and optionally an assignment to one or more values:

local $hero;
local ($zip, @boing, %yatatata);
local @list = (1, 2, 3, 4);
local ($red, $blue, $green, $yellow) = ("red", "blue", "green");

Despite what its name suggests, the local keyword does not create a local variable; that is actually the job of my. The local keyword only operates on an existing package variable (possibly one already localized further up the call chain), temporarily replacing its value. If the variable has not been declared and use strict is enabled, Perl reports a compile-time error. Many programs simply avoided strict in order to use local, but now that our exists there is no longer a compelling reason to do this. In any event, most of the time when we want a local variable, we really should be using my instead.

Localized variables are visible inside subroutine calls, just as the variable they mask would be if it had not been localized. To any subroutine called while the local is in effect, the localized value simply is the global value. Once the enclosing block or subroutine exits, the original value is restored, so the temporary value is never visible outside that call chain. This is because localization happens at run time and lasts until the enclosing block is exited. In this respect, localized variables differ from lexical variables, which are also limited by the enclosing scope but which are not visible in called subroutines.
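The contrast fits in a few lines. In this sketch (the names $level, report, with_local, and with_my are hypothetical), report() reads the package variable $level; a value set with local is seen by report(), while a my variable of the same name is not.

```perl
#!/usr/bin/perl
# dynamic.pl - local values are seen by called subroutines, my values are not
use strict;
use warnings;

our $level = "global";

sub report { return $level }       # refers to the package variable $main::level

sub with_local {
    local $level = "localized";    # temporary value until this call returns
    return report();
}

sub with_my {
    my $level = "lexical";         # a new, unrelated variable hiding the name
    return report();
}

print with_local(), "\n";   # localized
print with_my(), "\n";      # global
print report(), "\n";       # global: the localized value was restored on exit
```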

The following demonstration script illustrates how local works, as well as the differences and similarities between my, our, and use vars:

#!/usr/bin/perl
# scope-our.pl
use warnings;
use strict;

package MyPackage;

my  $my_var    = "my-var";       # file-global lexical variable
our $our_var   = "our-var";      # global to be localized with 'our'
our $local_var = "global-var";   # global to be localized with 'local'
use vars qw($use_var);           # define 'MyPackage::use_var' which exists
                                 # only in this package
$use_var = "use-var";

package AnotherPackage;

print "Outside, my_var is '$my_var'\n";         # display 'my-var'
print "Outside, our_var is '$our_var'\n";       # display 'our-var'
print "Outside, local_var is '$local_var'\n";   # display 'global-var'

#-----

sub sub1 {
    my $my_var       = "my_in_sub1";
    our $our_var     = "our_in_sub1";     # declares $AnotherPackage::our_var
    local $local_var = "local_in_sub1";

    print "In sub1, my_var is '$my_var'\n";       # display 'my_in_sub1'
    print "In sub1, our_var is '$our_var'\n";     # display 'our_in_sub1'
    print "In sub1, local_var is '$local_var'\n"; # display 'local_in_sub1'
    sub2();
}

sub sub2 {
    print "In sub2, my_var is '$my_var'\n";       # display 'my-var'
    print "In sub2, our_var is '$our_var'\n";     # display 'our-var'
    print "In sub2, local_var is '$local_var'\n"; # display 'local_in_sub1'
}

#-----

sub1();

print "Again outside, my_var is '$my_var'\n";       # display 'my-var'
print "Again outside, our_var is '$our_var'\n";     # display 'our-var'
print "Again outside, local_var is '$local_var'\n"; # display 'global-var'

Although local is often not the right tool for the job, there are a few cases where only local will do what we want. One is when we want a local version of one of Perl's built-in variables. For example, to temporarily change the output field separator $, within a subroutine, we would use local like this:

#!/usr/bin/perl -w
# localbuiltinvar.pl
use strict;

sub printwith {
    my ($separator, @stuff) = @_;
    local $, = $separator;      # create temporary local $,
    print @stuff;               # $, is inserted between the items
    print "\n";                 # a single item, so no separator is added
}

printwith("... ","one","two","three");

The output of this program is


one... two... three

This is also the correct approach for variables such as @ARGV and %ENV. The special variables defined automatically by Perl are all package variables. A lexical variable of the same name declared with my (where Perl allows it at all; it refuses for punctuation variables such as $,) would affect only our own code and would be completely ignored by built-in functions like print. To get the desired effect, we need local, which gives the global variable a temporary value that is seen by the subroutines and built-in functions we call.
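Another built-in variable that responds to local in the same way is $", the separator Perl places between array elements interpolated into a double-quoted string. In this sketch the subroutine name joined_with is hypothetical; Perl would refuse to compile my $" outright, so local is the only option here.

```perl
#!/usr/bin/perl
# localsep.pl - built-in behaviour follows a localized special variable
use strict;
use warnings;

sub joined_with {
    my ($separator, @items) = @_;
    local $" = $separator;    # separator used when arrays are interpolated
    return "@items";          # interpolation honours the temporary value
}

print joined_with("-", 1, 2, 3), "\n";   # 1-2-3
my @after = (1, 2, 3);
print "@after\n";                        # 1 2 3: $" is a single space again
```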

Another use for local is for creating local versions of filehandles. For example, this subroutine replaces STDOUT with a different filehandle, MY_HANDLE, presumed to have been opened previously. Because we have used local, both the print statement that follows and any print statements in called subroutine a_sub_that_calls_print() will go to MY_HANDLE. In case MY_HANDLE is not a legal filehandle, we check the result of print and die on a failure:

sub print_to_me {
    local *STDOUT = *MY_HANDLE;
    die unless print @_;
    a_sub_that_calls_print();
}

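Because STDOUT is localized, the change lasts only until the subroutine exits, at which point the original STDOUT is restored. A lexically scoped alternative, such as printing explicitly to a filehandle held in a my variable, would affect only the print statements written in this subroutine and not those in a_sub_that_calls_print() (our cannot be used here at all, since it does not apply to typeglobs). Since STDOUT always exists, the use of local here is safe, and we do not need to worry about whether or not it exists before localizing it.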
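The same technique can capture output instead of redirecting it. This sketch assumes a Perl with PerlIO in-memory filehandles (opening a reference to a scalar), which is the default in modern versions; a_sub_that_calls_print() here is a hypothetical stand-in for the subroutine mentioned above, and capture_output is likewise illustrative.

```perl
#!/usr/bin/perl
# capture.pl - redirecting STDOUT for a whole call chain with local
use strict;
use warnings;

sub a_sub_that_calls_print { print "from the called subroutine\n" }

sub capture_output {
    my ($code) = @_;
    my $buffer = "";
    {
        local *STDOUT;                   # hide the real STDOUT
        open(STDOUT, '>', \$buffer) or die "cannot open in-memory handle: $!";
        $code->();                       # prints in here land in $buffer
        close STDOUT;
    }                                    # the real STDOUT is restored here
    return $buffer;
}

my $captured = capture_output(\&a_sub_that_calls_print);
print "captured: $captured";    # captured: from the called subroutine
```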

Curiously, we can localize not just whole variables, but in the case of arrays and hashes, elements of them as well. This allows us to temporarily mask over an array element with a new value, as in this example:

#!/usr/bin/perl
# localelement.pl
our @array=(1,2,3);
{
    local $array[1]=4;
    print @array,"\n";   # produces '143'
}
print @array,"\n";       # produces '123'

There might not immediately seem to be any useful applications for this, but as it turns out, there are plenty. For example, we can locally extend the value of @INC to search for modules within a subroutine, locally alter the value of an environment variable in %ENV before executing a subprogram, or install a temporary local signal handler into %SIG:

sub execute_specialpath_catchint {
    my ($cmd, @args) = @_;
    local $ENV{PATH} = "/path/to/special/bin:" . $ENV{PATH};
    local $SIG{INT}  = \&catch_sigint;   # a reference to the handler, not a call
    system $cmd => @args;
}
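One detail worth knowing about localized hash elements: if the element did not exist before it was localized, it is deleted again when the scope ends, rather than being left behind with an undefined value. A minimal sketch, using a hypothetical %config hash:

```perl
#!/usr/bin/perl
# localelem.pl - localized hash elements are restored, or removed, on exit
use strict;
use warnings;

our %config = (colour => "blue");

sub show { return join ",", map { "$_=$config{$_}" } sort keys %config }

{
    local $config{colour} = "red";     # existing element: old value saved
    local $config{size}   = "large";   # new element: created temporarily
    print show(), "\n";                # colour=red,size=large
}
print show(), "\n";                    # colour=blue
```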

A word of warning concerning localizing variables that have been tied to an object instance with tie. While it is possible and safe to localize a tied scalar, attempting to localize a tied array or hash is not currently possible and will generate an error (this limitation may be removed in future).

Automatic Localization in Perl

Perl automatically localizes variables for us in certain situations. The most obvious example is the @_ array in subroutines. Each time a subroutine is called, a fresh local copy of @_ is created, temporarily hiding the existing one until the end of the subroutine. When the subroutine call returns, the old @_ reappears. This allows chains of subroutines to call each other, each with its own local @_, without overwriting the @_ of the caller.
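A short sketch of this behavior, with hypothetical subroutines inner and outer: the call to inner gets its own @_, and outer's arguments are intact when it returns.

```perl
#!/usr/bin/perl
# argslocal.pl - every call gets its own @_
use strict;
use warnings;

sub inner { return "inner sees (@_)" }

sub outer {
    my $from_inner = inner("x", "y");    # inner receives a fresh @_
    return "$from_inner; outer still sees (@_)";
}

print outer(1, 2, 3), "\n";   # inner sees (x y); outer still sees (1 2 3)
```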

Loop variables are also localized automatically, and this includes $_ when we do not name a loop variable explicitly. Even when a variable of the same name already exists (and with strict variables in effect it must, unless we declare it with my in the loop itself), it is localized and hidden for the duration of the loop:

#!/usr/bin/perl
# autolocal.pl
use warnings;
use strict;

my $var = 42;
my $last;
print "Before: $var\n";
foreach $var (1..5) {
    print "Inside: $var\n";   # prints "Inside: 1", "Inside: 2" ...
    $last = $var;
}
print "After: $var\n";   # prints '42'
print "$last\n";         # prints '5'

It follows that once a foreach loop has finished, whether it ran to completion or was exited early with last, the loop variable no longer holds the final value. To keep it, we must first assign it to a variable whose scope lies outside the loop, as we did with $last in the preceding example.
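The same approach applies when leaving a loop early. This sketch (the list of numbers and the variable $found are illustrative) copies the value out before calling last:

```perl
#!/usr/bin/perl
# lastvalue.pl - saving a foreach variable before leaving the loop
use strict;
use warnings;

my $found;                      # declared outside the loop so it survives it
foreach my $n (3, 7, 10, 13, 20) {
    if ($n % 2 == 0) {
        $found = $n;            # copy the value out before 'last'
        last;
    }
}
print defined $found ? "first even number: $found\n" : "none found\n";   # 10
```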

Lexical Variables

Lexical variables have the scope of the file, block, or eval statement in which they were defined. Their scope is determined at compile time by the structure of the source code, and their visibility is limited by the syntax that surrounds them. Unlike package variables, a lexical variable is not added to the symbol table and so cannot be accessed through it. It cannot be accessed from anywhere outside its lexical scope, even by subroutines that are called within the scope of the variable. When the end of the variable's scope is reached, it simply ceases to exist. (The value of the variable, on the other hand, might persist if a reference to it was created and stored elsewhere. If not, Perl reclaims the memory the value was using at the same time as it discards the variable.)
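We can check the claim about the symbol table directly, since the symbol table of main is available as the hash %main::. In this sketch the variable names are hypothetical; only the our variable has an entry.

```perl
#!/usr/bin/perl
# nosymbol.pl - lexical variables never appear in the symbol table
use strict;
use warnings;

my  $lexical_only = "private to this file";
our $in_table     = "a package variable";

# %main:: is the symbol table of the main package
print exists $main::{in_table}     ? "in_table: yes\n"     : "in_table: no\n";      # yes
print exists $main::{lexical_only} ? "lexical_only: yes\n" : "lexical_only: no\n";  # no
```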

In this section, we are concerned with my. While the similar-sounding our also declares variables lexically, it declares package variables whose visibility is therefore greater than their lexical scope. We will come back to the our keyword once we have looked at my.

Declaring Lexical Variables

The following is a short summary of all the different ways in which we can declare lexical variables with my, most of which should already be familiar:

my $scalar;                              # simple lexical scalar
my $assignedscalar = "value";            # assigned scalar
my @list = (1, 2, 3, 4);                 # assigned lexical array
my ($red, $blue, $green);                # list of scalars
my ($left, $right, $center) = (1, 2, 0); # assigned list of scalars
my ($param1, $param2) = @_;              # inside subroutines

All these statements create lexical variables that exist for the lifetime of their enclosing scope and are only visible inside it. If placed at the top of a file, the scope of the variable is the file. If defined inside an eval statement, the scope is that of the evaluated code. If placed in a block or subroutine (or indeed inside curly braces of any kind), the scope of the variable is from the opening brace to the closing one:

#!/usr/bin/perl
# scope-my.pl
use warnings;
use strict;

my $file_scope = "visible anywhere in the file";
print $file_scope, "\n";

sub topsub {
    my $top_scope = "visible in 'topsub'";
    if (rand > 0.5) {
        my $if_scope = "visible inside 'if'";
        # $file_scope, $top_scope, $if_scope ok here
        print "$file_scope, $top_scope, $if_scope\n";
    }
    bottomsub();
    # $file_scope, $top_scope ok here
    print "$file_scope, $top_scope\n";
}

sub bottomsub {
    my $bottom_scope = "visible in 'bottomsub'";
    # $file_scope, $bottom_scope ok here
    print "$file_scope, $bottom_scope\n";
}

topsub();

# only $file_scope ok here
print $file_scope, "\n";

In the preceding script, we define four lexical variables, each of which is visible only within the enclosing curly braces. Both subroutines can see $file_scope because it has the scope of the file in which the subroutines are defined. Likewise, the body of the if statement can see both $file_scope and $top_scope. However, $if_scope ceases to exist as soon as the if statement ends and so is not visible elsewhere in topsub. Similarly, $top_scope only exists for the duration of topsub, and $bottom_scope only exists for the duration of bottomsub. Once the subroutines exit, the variables and whatever content they contain cease to exist.

While it is generally true that the scope of a lexical variable is bounded by the block or file in which it is defined, there are some common and important exceptions. Specifically, lexical variables defined in the syntax of a loop or condition are visible in all the blocks that form part of the syntax:

#!/usr/bin/perl
# ifscope.pl
use strict;
use warnings;

if ( (my $toss=rand) > 0.5 ) {
    print "Heads ($toss)\n";
} else {
    print "Tails ($toss)\n";
}

In this if statement, the lexical variable $toss is visible in both the immediate block and the else block that follows. The same principle holds for elsif blocks, and in the case of while and foreach, the continue block.
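A sketch of the while case, using a hypothetical @queue: the variable declared in the loop condition is visible in both the loop body and the continue block, but not after the loop.

```perl
#!/usr/bin/perl
# whilescope.pl - a variable declared in a while condition
use strict;
use warnings;

my @queue = qw(alpha beta gamma);
my @log;

while (defined(my $item = shift @queue)) {
    push @log, "got $item";
} continue {
    push @log, "done $item";     # $item is still in scope in the continue block
}
# $item is not visible here; using it would be a compile-time error under strict
print "$_\n" for @log;
```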

Preserving Lexical Variables Outside Their Scope

Normally a lexical variable declared with my ceases to exist when its scope ends. However, this is not always the case. In the earlier example, it happens to be true because there are no references to the variables other than the one held by the scope itself. Once the scope ends, the variable is unreferenced, and Perl's reference-counting memory management reclaims it. The variable $file_scope appears to be persistent only because it drops out of scope at the end of the script, where issues of scope and persistence become academic.

However, if we take a reference to a lexically scoped variable and pass that reference back to a higher scope, the reference keeps the variable alive for as long as the reference exists. In other words, so long as something, somewhere, is pointing to the variable (or to be more precise, the memory that holds the value of the variable), it will persist even if its scope ends. The following short script illustrates the point:

#!/usr/bin/perl
# persist.pl
use warnings;
use strict;

sub definelexical {
    my $lexvar = "the original value";
    return \$lexvar;   # return reference to variable
}

sub printlexicalref {
    my $lexvar = ${$_[0]};   # dereference the reference
    print "The variable still contains $lexvar.\n";
}

my $ref = definelexical();
printlexicalref($ref);

In the subroutine definelexical, the scope of the variable $lexvar ends once the subroutine ends. Since we return a reference to the variable, and because that reference is assigned to the variable $ref, the variable remains in existence, even though it can no longer be accessed as $lexvar. We pass this reference to a second subroutine, printlexicalref, which declares a second variable, also called $lexvar, and copies into it the value to which the passed reference points. It is important to realize that the two $lexvar variables are entirely different, each existing only in its own scope: the first is the scalar kept alive by the reference, while the second merely holds a copy of its value. When executed, this script will print out


The variable still contains the original value.

Tip In this particular example, there is little point in returning a reference. Passing the string as a value is simpler and would work just as well. However, complex data structures can also be preserved by returning a reference to them, rather than making a copy as would happen if we returned them as a value.


The fact that a lexical variable exists so long as a reference to it exists can be extended to include "references to references" and "references to references to references." So long as the "top" reference is stored somewhere, the lexical variable can be hidden at the bottom of a long chain of references, each one being kept alive by the one above. This is in essence how lexical arrays of arrays and hashes of hashes work. The component arrays and hashes are kept alive by having their references stored in the parent array or hash.
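The following sketch builds such a structure inside a subroutine (build_table is a hypothetical name). Each row is a lexical array that would vanish at the end of its loop iteration, but the reference stored in the hash keeps it alive, and the returned reference in turn keeps the hash alive.

```perl
#!/usr/bin/perl
# nested.pl - inner arrays survive because the outer structure refers to them
use strict;
use warnings;

sub build_table {
    my %table;                                   # lexical hash, local to this call
    for my $row (1 .. 3) {
        my @cells = map { $row * $_ } 1 .. 3;    # a new lexical array each time
        $table{$row} = \@cells;                  # the hash keeps each array alive
    }
    return \%table;                              # the returned reference keeps the hash alive
}

my $table = build_table();
print join(" ", @{ $table->{$_} }), "\n" for sort keys %$table;
# 1 2 3
# 2 4 6
# 3 6 9
```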

Lexically Declaring Global Package Variables with our

The our keyword is a partial replacement for use vars, with improved semantics but not quite the same meaning. It allows us to define package variables with a lexical scope in the same manner as my does. This can be a little tricky to understand, since lexical and package scope are otherwise entirely different concepts.

To explain, our works like use vars in that it adds a new entry to the symbol table for the current package. However, unlike use vars the variable can be accessed without a package prefix from any package so long as its lexical scope exists. This means that a package variable, declared with our at the top of the file, is accessible throughout the file using its unqualified name even if the package changes.

Another way to look at our is to think of it as causing Perl to rewrite accesses to the variable in other packages (in the same file) to include the package prefix before it compiles the code. For instance, we might create a Perl script containing the following four lines:

package MyPackage;
our $scalar = "value";   # defines $MyPackage::scalar

package AnotherPackage;
print $scalar;

When Perl parses this, it sees the lexically scoped declaration of $scalar, and our effectively rewrites every later unqualified use of $scalar within that lexical scope (here, the rest of the file) to refer to the definition in MyPackage, that is, $MyPackage::scalar.

Using our also causes variables to disappear at the end of their lexical scope. Note that this does not mean it removes the package variable from the symbol table. It merely causes access through the unqualified name to disappear. Similarly, if the same variable is redeclared in a different package, the unqualified name is realigned to refer to the new definition. This example demonstrates both scope changes:

#!/usr/bin/perl -w
use strict;

package First;
our $scalar = "first";     # defines $First::scalar
print $scalar;             # prints $First::scalar, produces 'first'

package Second;
print $scalar;             # prints $First::scalar, produces 'first'
our $scalar = "second";
print $scalar;             # prints $Second::scalar, produces 'second'

package Third;
{
    our $scalar = "inner"; # declaration contained in block
    print $scalar;         # prints $Third::scalar, produces 'inner'
}
print $scalar;             # prints $Second::scalar, produces 'second'

Unlike local, which under strict vars can only localize a variable that has already been declared, our does not require the package variable to exist beforehand. our also differs from use vars: a variable declared with use vars can be used by its unqualified name throughout the package, even when it is declared inside a subroutine. With our, the unqualified name is visible only within the subroutine; the package variable itself, and any value the subroutine gives it, remain in the symbol table after the subroutine exits.
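A minimal sketch of the last point, with a hypothetical set_counter subroutine: after the call, the short name $counter is out of scope, but the value is still there under its full name.

```perl
#!/usr/bin/perl
# ourinsub.pl - an our declaration inside a subroutine
use strict;
use warnings;

sub set_counter {
    our $counter = 42;        # $counter is short for $main::counter, in here only
}

set_counter();
# Outside the sub the short name is out of scope, but the package variable remains
print "$main::counter\n";     # 42
```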

Apart from this, our behaves like my: the unqualified name it introduces is lexically scoped. The difference is that the name refers to a package variable in the symbol table (entered in advance, at compile time) rather than to a new private variable, and at the end of the scope it is the name that disappears, not the symbol table entry. Like a my variable, the unqualified name is not visible inside subroutines defined outside its scope, even when they are called from within it. See the "Declaring Lexical Variables" section for more details on how my works.

The unique Attribute

The our keyword takes one special attribute that allows a variable to be shared between interpreters when more than one exists concurrently. Note that this is not the same as a threaded Perl application.

Multiple interpreters generally come about only when forking processes under threaded circumstances: Windows, which emulates fork, is the most common occurrence. Another is where a Perl interpreter is embedded into a multithreaded application. In both these cases, we can choose to have a package variable shared between interpreters, in the manner of threads, or kept separate within each interpreter, in the manner of forked processes:

# one value per interpreter
our $unshared_data = "every interpreter for itself";
# all interpreters see the same value
our $shared_data : unique = "one for all and all for one";

Note that this mechanism doesn't allow us to communicate between interpreters. After the first fork, the shared value becomes strictly read-only. See Chapters 20 and 21 for more on embedding Perl and threaded programming, respectively.

The Symbol Table

We have seen that package variables are entered into the symbol table for the package in which they are defined, but they can be accessed from anywhere by their fully qualified name. This works because the symbol tables of packages are jointly held in a master symbol table, with the main:: package at the top and all other symbol tables arranged hierarchically below. Although for most practical purposes we can ignore the symbol table and simply let it do its job, a little understanding of its workings can be informative and even occasionally useful.

Perl implements its symbol table in a manner that we can easily comprehend with a basic knowledge of data types: it is really a hash of typeglobs. Each key is the name of a typeglob, and therefore the name of the scalar, array, hash, subroutine, filehandle, and format (report) associated with that typeglob. The value is either the typeglob itself, which holds references to those items, or, for a key naming a nested package (such keys end in ::), an entry that leads to another symbol table; this is how Perl's hierarchical symbol table is implemented. In fact, the symbol table is the origin of typeglobs and the reason for their existence. This close relationship between typeglobs and the symbol table means that we can examine and manipulate the symbol table through the use of typeglobs.

Whenever we create a global (declared with our or use vars but not my) variable in Perl we cause a typeglob to be entered into the symbol table and a reference for the data type we just defined placed into the typeglob. Consider the following example:

our $variable = "This is a global variable";

What we are actually doing here is creating a typeglob called variable in the main package and filling its scalar reference slot with a reference to the string "This is a global variable". The name of the typeglob is stored as a key in the symbol table, which is essentially just a hash, with the typeglob itself as the value, and the scalar reference in the typeglob defines the existence and value of $variable. Whenever we refer to a global variable, Perl looks up the relevant typeglob in the symbol table and then looks for the appropriate reference, depending on what kind of variable we asked for.
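The lookup Perl performs can be retraced by hand. In this sketch the key variable is looked up in the main symbol table hash %main:: to obtain the typeglob, and the *glob{SCALAR} syntax then extracts the reference held in its scalar slot, which turns out to be the very same storage as $variable.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our $variable = "This is a global variable";

# the main symbol table is the hash %main:: (also spelled %::)
my $glob = $main::{variable};    # the typeglob *main::variable

# *glob{SCALAR} returns the reference stored in the typeglob's scalar slot
my $scalar_ref = *{$glob}{SCALAR};

print "$$scalar_ref\n";          # This is a global variable
print "same storage\n" if $scalar_ref == \$variable;
```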

The only thing other than a typeglob that can exist in a symbol table is another symbol table. This is the basis of Perl's package hierarchy, and the reason we can access a variable in one package from another. Regardless of which package our code is in, we can always access a package variable by traversing the symbol table tree from the top.

The main Package

The default package is main, the root of the symbol table hierarchy, so any package variable declared without an explicit package prefix or preceding package declaration automatically becomes part of the "main package":

our $scalar;   # defines $main::scalar

Since main is the root table for all other symbol tables, the following all refer to the same variable:

package MyPackage;
our $scalar;
package main::MyPackage;
our $scalar;

# our rejects qualified names; fully qualified names need no declaration
$MyPackage::scalar;
$main::MyPackage::scalar;

Strangely, since every package must have main as its root, the main package is defined as an entry in its own symbol table. The following is also quite legal and equivalent to the preceding, if somewhat bizarre:

$main::main::main::main::main::main::main::MyPackage::scalar;

This is more a point of detail than a useful fact, of course, but if we write a script to traverse the symbol table, then this is a special case we need to look out for.
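The self-reference is easy to confirm. The following sketch checks that %main:: contains a main:: entry, that %main::main:: is the very same hash, and that a long chain of main:: prefixes still reaches an ordinary package variable (the value assigned to $MyPackage::scalar is hypothetical).

```perl
#!/usr/bin/perl
use strict;
use warnings;

package MyPackage;
our $scalar = "shared";

package main;

# the main symbol table holds an entry for itself, with a trailing ::
print "main:: contains main::\n" if exists $main::{'main::'};

# following that entry leads back to the same hash
print "same symbol table\n" if \%main:: == \%main::main::;

# so any number of main:: prefixes resolves to the same variable
print "$main::main::main::MyPackage::scalar\n";    # shared
```

A script that walks the symbol table recursively therefore has to skip the main:: entry, or it will loop forever.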

In general, we do not need to use the main package unless we want to define a package variable explicitly without placing it into its own package. This is a rare thing to do, so most of the time we can ignore the main package. It does, however, allow us to make sense of error messages like

Name "main::a" used only once: possible typo at....

The Symbol Table Hierarchy

Whenever we define a new package variable in a new package, we cause Perl to create symbol tables to hold the variable. In Perl syntax, package names are separated by double colons, ::, in much the same way that directories are separated by / or \, and domain names by a dot. Like those separators, the colons define a location in a hierarchical naming system.

For example, if we declare a package with three package elements, we create three symbol tables, each containing an entry to the one below:

package World::Country::City;
our $variable = "value";

This creates a chain of symbol tables. The World symbol table created as an entry of main contains no actual variables. However, it does contain an entry for the Country symbol table, which therefore has the fully qualified name World::Country. In turn, Country contains an entry for a symbol table called City. City does not contain any symbol table entries, but it does contain an entry for a typeglob called *variable, which contains a scalar reference to the string "value". Put together, this gives us the package variable:

$main::World::Country::City::variable;

Since main is always the root of the symbol table tree, we never need to specify it explicitly. In fact, it usually only turns up as a result of creating a symbolic reference for a typeglob as we saw earlier. So we can also just say

$World::Country::City::variable;

This is the fully qualified name of the package variable. The fact that we can omit the package names when we are actually in the World::Country::City package is merely a convenience. There is no actual variable called $variable, unless we declare it lexically. Even if we were in the main package, the true name of the variable would be $main::variable.
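The chain of symbol tables described above can be inspected directly, since each nested table appears in its parent as a key ending in ::. This sketch reuses the World::Country::City example and tests each link with exists, then reads the variable with and without the leading main::.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package World::Country::City;
our $variable = "value";

package main;

# each level of the package name is a key ending in '::' in its parent table
print "main has World::\n"     if exists $main::{'World::'};
print "World has Country::\n"  if exists $World::{'Country::'};
print "Country has City::\n"   if exists $World::Country::{'City::'};
print "City has variable\n"    if exists $World::Country::City::{variable};

# the fully qualified name works with or without the leading main::
print "$World::Country::City::variable\n";          # value
print "$main::World::Country::City::variable\n";    # value
```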

Manipulating the Symbol Table

All global variables are really package variables in the main package, and they in turn occupy the relevant slot in the typeglob with that name in the symbol table. Similarly, subroutines are just code references stored in typeglobs. Without qualification or a package declaration, a typeglob is automatically in the main package, so the following two assignments are the same:

*subglob = \&mysubroutine;
*main::subglob = \&main::mysubroutine;

Either way we can now call mysubroutine with either name (with optional main:: prefixes if we really felt like it):

mysubroutine(); # original name
subglob();      # aliased name

We can, of course, alias to a different package too, to make a subroutine defined in one package visible in another. This is actually how Perl's import mechanism works underneath the surface when we say use module and get subroutines that we can call without qualification in our own programs:

# import 'subroutine' into our namespace
*main::subroutine = \&My::Module::subroutine;
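Here is a self-contained sketch of that import-style aliasing. The package My::Module and its subroutine are hypothetical stand-ins for a real module. Assigning a code reference into *main::subroutine is essentially what the standard Exporter module does on our behalf, although Exporter also handles export lists and other details omitted here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package My::Module;
sub subroutine { return "My::Module::subroutine called with (@_)" }

package main;

# fill the CODE slot of *main::subroutine with the module's code reference
*main::subroutine = \&My::Module::subroutine;

# the same code is now reachable under both names
print subroutine(1, 2), "\n";    # My::Module::subroutine called with (1 2)
print "same code\n" if \&subroutine == \&My::Module::subroutine;
```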

Typeglob assignment works for any of the possible types that a typeglob can contain, including filehandles and code references. So we can create an alias to a filehandle this way:

*OUT = *STDOUT;
print OUT "this goes to standard output";

Typeglob aliasing also lets us do things like choosing from a selection of subroutines at runtime:

# choose a subroutine to call and alias it to a local name
local *aliassub = $debug ? *mydebugsub : *mysubroutine;

# call the chosen subroutine via its alias
aliassub("Testing", 1, 2, 3);
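A runnable version of that idea might look like the following. The names mysubroutine, mydebugsub, and aliassub come from the fragment above, but the subroutine bodies, the call_chosen wrapper, and its $debug argument are hypothetical. Using local rather than a plain assignment means the alias is undone when call_chosen returns.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub mysubroutine { return "normal: @_" }
sub mydebugsub   { return "debug: @_" }

sub call_chosen {
    my ($debug, @args) = @_;

    # alias *aliassub to one of the two typeglobs for the duration of this call
    local *aliassub = $debug ? *mydebugsub : *mysubroutine;

    # call the chosen subroutine via its alias
    return aliassub(@args);
}

print call_chosen(0, "Testing", 1, 2, 3), "\n";    # normal: Testing 1 2 3
print call_chosen(1, "Testing", 1, 2, 3), "\n";    # debug: Testing 1 2 3
```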

All this works because package typeglobs are actually entries in the symbol table itself, and every package variable, subroutine, and filehandle is held in the appropriate slot of the typeglob with the same name. Once we know how it all works, it is relatively easy to see how Perl does the same thing itself. For example, when we define a named subroutine, we are really causing Perl to create a code reference, then assign it to a typeglob of that name in the symbol table. To prove it, here is a roundabout way to define a subroutine:

#!/usr/bin/perl
# anonsub.pl
use warnings;
use strict;

our $anonsub = sub {print "Hello World\n"};

*namedsub = \&{$anonsub};
namedsub();

Here we have done the same job as defining a named subroutine, but in explicit steps: first creating a code reference, then assigning that code reference to a typeglob. The subroutine is defined in code rather than as a declaration, but the net effect is the same.

We can create aliases to scalars, arrays, and hashes in a similar way. As this example shows, with an alias both variables point to exactly the same underlying storage:

#!/usr/bin/perl -w
# scalaralias.pl
use strict;

our $scalar1="one";
*scalar2=\$scalar1;
our $scalar2="two";
print $scalar1; # produces "two";

We use our to declare the variables here, in order to keep Perl happy under strict vars. As this is a manipulation of the symbol table, my won't do. Also note that the assignment to the typeglob does not require any declaration, and the declaration of $scalar2 actually comes after it. Typeglobs cannot be declared, and it would not make much sense to try, but if we want to access the scalar variable afterwards without qualifying it with main::, we need to ask permission to avoid running afoul of the strict pragma.
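The same technique works for arrays and hashes, because assigning a reference fills only the matching slot of the typeglob. In this sketch the names @list, %table, @alias_list, and %alias_table are hypothetical; changes made through either name are visible through the other.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our @list  = (1, 2, 3);
our %table = (a => 1);

# assigning references aliases just the ARRAY and HASH slots
*alias_list  = \@list;
*alias_table = \%table;

# declare the aliases so strict vars accepts the unqualified names
our (@alias_list, %alias_table);

push @alias_list, 4;     # also extends @list
$alias_table{b} = 2;     # also adds a key to %table

print "@list\n";                                                    # 1 2 3 4
print join(",", map { "$_=$table{$_}" } sort keys %table), "\n";    # a=1,b=2
```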

We can also create a constant scalar variable by assigning a reference to a literal value, since literals are read-only (but see also "Constants" in Chapter 5 for more approaches):

*constantstring=\"I will not be moved";
our $constantstring="try to change";
# Error! 'Modification of a read-only value attempted...'
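Since the error occurs at run time, it can be trapped with eval, which makes the read-only behavior easy to observe. This is a minimal sketch; the exact wording of the message left in $@ may vary between Perl versions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# alias the scalar slot to a reference to a literal, which is read-only
*constantstring = \"I will not be moved";
our $constantstring;

print "$constantstring\n";    # I will not be moved

# attempting to assign through the alias dies, so trap it with eval
my $ok = eval { $constantstring = "try to change"; 1 };
print "assignment failed: $@" unless $ok;
```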

Be wary of assigning to a typeglob things other than references or other typeglobs. For example, assigning a string does have an interesting but not entirely expected effect. We might suppose the following statement creates a variable called $hello with the value world:

*hello = "world";

However, if we try to print $hello, we find that it does not exist. If we print out *hello, we find that it has become aliased instead:

print *hello;   # produces '*main::world'

In other words, the string has been taken as a symbolic reference to a typeglob name, and the statement is actually equivalent to

*hello = *world;

This can be very useful, especially since the string can be a scalar variable:

*hello = $name_to_alias_to;
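A short sketch of aliasing by name, using the $name_to_alias_to variable from the fragment above; the variables $world and $hello and the value stored in $world are hypothetical. After the assignment both names share the same scalar.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our $world = "value of \$world";
our $hello;

# the string is looked up as a typeglob name, so this acts like *hello = *world
my $name_to_alias_to = "world";
*hello = $name_to_alias_to;

print "$hello\n";                              # value of $world
print "same scalar\n" if \$hello == \$world;
```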

However, it is also a potential source of confusion, especially as it is easily done by forgetting to include a backslash to create a reference. Assigning other things to typeglobs has less useful effects. An array or hash, for example, will be assigned in scalar context and alias the typeglob to a typeglob whose name is a number:

@array = (1, 2, 3);
*hello = @array;
print *hello;   # produces '*main::3' since @array has three elements

This is unlikely to be what we wanted, and we probably meant to say *hello = \@array in this case. Assigning the result of a subroutine call aliases the typeglob to the name returned by the subroutine. If that is a string naming a typeglob, this is useful; otherwise it probably isn't:

*hello = subroutine_that_returns_name_to_alias_to(@args);

Examining the Symbol Table Directly

Interestingly, the symbol table itself can be accessed in Perl by referring to the name of the package with a trailing ::. Since symbol tables are hashes, and the hashes are stored in a typeglob with the same name, the hash that defines the main symbol table can be accessed with %{*main::}, or simply %{*::}, as this short script demonstrates:

#!/usr/bin/perl
# dumpmain.pl
use warnings;
use strict;

foreach my $name (sort keys %{*::}) {
    # the main symbol table contains itself under the key 'main::'
    next if $name eq 'main::';

    # extract the typeglob stored under this name in %::
    my $glob = $::{$name};

    # newer Perls may store something other than a glob here; skip it
    next unless ref(\$glob) eq 'GLOB';

    print "Symbol '$name' => ";

    # define local package variables through alias
    local *entry = $glob;
    # make sure we can access them in 'strict' mode
    our ($entry, @entry, %entry);

    # extract scalar, array, and hash via alias
    print " Scalar: $entry " if defined $entry;
    print " Array : [@entry] " if @entry;
    print " Hash  : {", join(" ", %entry), "} " if %entry;

    # check for subroutine and handle via glob
    print " Sub '$name' defined " if *entry{CODE};
    print " Handle '$name' (", fileno(*entry), ") defined "
        if *entry{IO};
    print "\n";
}

The Dumpvalue module provides a more convenient interface to the symbol table and forms a core part of the Perl debugger. It does essentially the same thing as the preceding example, but more thoroughly and with a more elegant output. The following script builds a hierarchy of symbol tables and variables and then uses the Dumpvalue module to print them out:

#!/usr/bin/perl
# dumpval.pl
use warnings;
use strict;

use Dumpvalue;

# first define some variables
{
    # no warnings to suppress 'usage' messages
    no warnings;

    package World::Climate;
    our $weather = "Variable";

    package World::Country::Climate;
    our %weather = (
        England => 'Cloudy'
    );

    package World::Country::Currency;
    our %currency = (
        England => 'Sterling',
        France => 'Franc',
        Germany => 'Mark',
        USA => 'US Dollar',
    );
    package World::Country::City;
    our @cities = ('London', 'Paris', 'Bremen', 'Phoenix');

    package World::Country::City::Climate;
    our %cities = (
        London => 'Foggy and Cold',
        Paris => 'Warm and Breezy',
        Bremen => 'Intermittent Showers',
        Phoenix => 'Horrifyingly Sunny',
    );

    package World::Country::City::Sights;
    our %sights = (
        London => ['Tower of London', 'British Museum'],
        Paris => ['Eiffel Tower', 'The Louvre'],
        Bremen => ['Town Hall', 'Becks Brewery'],
        Phoenix => ['Arcosanti'],
    );
}

my $dumper = Dumpvalue->new(globPrint => 1);
$dumper->dumpValue(*World::);

While Dumpvalue can be pressed into service this way, it is worth considering the Symbol::Table module, available from CPAN, which provides a more focused interface.

Summary

Scope and visibility are important concepts in any programming language. Perl has two distinct kinds of scope, package scope and lexical scope, each of which has its own rules and reasons for being. A discussion of scope is therefore a discussion of package variables and their declarations versus lexical variables.

We began with a discussion of package variables and their scoping rules, including defining them under strict, the distinction between package and global variables, declaring package variables lexically with our, overriding them temporarily with local, and why we probably meant to use my instead. We then talked about lexical variables, declaring them with my, and how they differ from package variables and variables declared with our.

We finished off with a look at the symbol table, which is the underlying structure in which not just package variables but subroutine declarations and filehandles live. As it turns out, the symbol table is really just a big nested collection of typeglobs, so we also saw how to create new entries in the symbol table, how to create aliases for existing package variables and subroutines, and finally how to walk through the symbol table and examine its contents programmatically.
