Chapter 11. References

Pointers are like jumps, leading wildly from one part of the data structure to another. Their introduction into high-level languages has been a step backwards from which we may never recover.

Charles Hoare

References in Perl are much safer than raw pointers (such as those available in C or C++). Perl references cannot be left dangling towards a scalar that has been garbage-collected, and they cannot be coerced into pretending that a hash is an array.

Semantically they're very robust, but sometimes their syntax lets them down, making code that uses references confusing or misleading. In certain configurations, they can also interfere with the garbage collector.

Symbolic references have far more problems. It's entirely possible for them to dangle, and they can easily be used to access the wrong type of referent. They also subvert the pre-eminence of lexically scoped variables. All in all, they're more trouble than they're worth.

Fortunately, every one of these problems can be avoided by following a small number of simple guidelines…

Dereferencing

Wherever possible, dereference with arrows.

Use the -> notation in preference to "circumfix" dereferencing. In other words, when you're accessing references to containers, use the arrow syntax:

print 'Searching from ', $list_ref->[0] ,  "\n",
      '            to ', $list_ref->[-1] , "\n";

This style results in much cleaner code than explicit wrap-and-prefix dereferencing:

print 'Searching from ', ${$list_ref}[0],  "\n",
      '            to ', ${$list_ref}[-1], "\n";

Note that the arrow syntax also interpolates correctly into strings, so the previous example would be better written:

print "Searching from $list_ref->[0]\n",
      "            to $list_ref->[-1]\n";

Explicit dereferencing is prone to two specific mistakes, which can be hard to detect if use strict is not in effect. The first error is simply forgetting to wrap-and-prefix at all:

print 'Searching from ', $list_ref[0],  "\n",
      '            to ', $list_ref[-1], "\n";

The second mistake is wrapping-and-prefixing correctly, but accidentally leaving off the reference variable's own sigil (i.e., the one inside the braces):

print 'Searching from ', ${list_ref}[0],  "\n",
      '            to ', ${list_ref}[-1], "\n";

In both cases, the array accesses are accessing the variable @list_ref instead of the array referred to by the reference in $list_ref.

Of course, if you need to access more than one element of a container (i.e., to slice it) via a reference to that container, there's no choice except to use the wrap-and-prefix syntax:

my ($from, $to) = @{$list_ref}[0, -1];

Attempting to use the arrow notation to achieve the same effect doesn't work:

my ($from, $to) = $list_ref->[0, -1];

Because the access expression ($list_ref->[0, -1]) begins with a $ sigil, it is a single-element access, so the contents of the square brackets are evaluated in scalar context. In scalar context, the comma operator discards its left operand and returns its right operand, so the result is just the final index. So the previous example is equivalent to:

my ($from, $to) = ($list_ref->[-1], undef);
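The difference is easy to check with a short sketch. The array contents below are invented purely for illustration:

```perl
use strict;
use warnings;
no warnings 'void';    # In case perl complains about the discarded index

# Hypothetical data, for illustration only
my $list_ref = [ 'alpha', 'beta', 'gamma' ];

# Slice via wrap-and-prefix: both indices are used
my ($from, $to) = @{$list_ref}[0, -1];

# Arrow with a list of indices: only the final index (-1) is used
my ($wrong_from, $wrong_to) = $list_ref->[0, -1];

print "Slice: $from .. $to\n";
print 'Arrow: ', $wrong_from, ' .. ',
      (defined $wrong_to ? $wrong_to : 'undef'), "\n";
```

The slice yields the first and last elements, whereas the arrow form yields only the last element, leaving the second variable undefined.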

Braced References

Where prefix dereferencing is unavoidable, put braces around the reference.

You can dereference a reference without first putting it in braces:

push @$list_ref, @results;

print substr($$str_ref, 0, $max_cols);

my $first = $$list_ref[0];
my @rest  = @$list_ref[1..$MAX];

my $first_name = $$name_ref{$first};
my ($initial, $last_name) = @$name_ref{$middle, $last};

print @$$ref_to_list_ref[1..$MAX];

All of these work correctly, but they may also produce intense uncertainty and anxiety on the part of future readers of your code, who will fret about the relative precedences of the multiple sigils, and of the indexing brackets and braces. Or they will misread the leading $$... sequence as being related to the $$ (a.k.a. $PID) variable—especially in string interpolations:

print "Your current ID is: JAPH_$$_ID_REF\n";

Braced references are always visually unambiguous:

print "Your current ID is: JAPH_${$_ID_REF}\n";

And they give the reader better clues as to the internal structure of dereference:

push @{$list_ref}, @results;

print substr(${$str_ref}, 0, $max_cols);

my $first = ${$list_ref}[0];
my @rest  = @{$list_ref}[1..$MAX];

my $first_name = ${$name_ref}{$first};
my ($initial, $last_name) = @{$name_ref}{$middle, $last};

print @{${$ref_to_list_ref}}[1..$MAX];

In some cases, bracketing can prevent subtle errors caused by the ambiguity of human expectations:

my $result = $$$stack_ref[0];

By which the programmer may have intended:

my $result = ${${$stack_ref[0]}};

or:

my $result = ${${$stack_ref}[0]};

or:

my $result = ${${$stack_ref}}[0];

If you're not entirely sure which of those three alternatives the unbracketed $$$stack_ref[0] is actually equivalent to [60], that illustrates precisely how important it is to use the explicit braces. Or, better still, to unpack the reference in stages:

my $direct_stack_ref = ${$stack_ref};
my $result = $direct_stack_ref->[0];
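As a rough check, the following sketch builds a reference to a reference to an array (the data is hypothetical) and compares the unbraced form, its fully braced equivalent, and the staged version:

```perl
use strict;
use warnings;

# Hypothetical data: a reference to a reference to an array
my @stack     = ('top', 'middle', 'bottom');
my $stack_ref = \\@stack;

# The unbraced form and its fully braced equivalent
my $unbraced = $$$stack_ref[0];
my $braced   = ${${$stack_ref}}[0];

# Unpacking the reference in stages
my $direct_stack_ref = ${$stack_ref};
my $staged           = $direct_stack_ref->[0];

print "$unbraced $braced $staged\n";
```

All three expressions retrieve the same element, but only the staged version makes the two levels of indirection obvious at a glance.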

Symbolic References

Never use symbolic references.

If use strict 'refs' isn't in effect, a string containing the name of a variable can be used to access that variable:

my $hash_name = 'tag';

${$hash_name}{nick}   = ${nick};
${$hash_name}{rank}   = ${'rank'}[-1];     # Most recent rank
${$hash_name}{serial} = ${'serial_num'};

You can even use the arrow notation on a plain string to get the same effect:

my $hash_name = 'tag';

$hash_name->{nick}   = ${nick};
$hash_name->{rank}   = 'rank'->[-1];
$hash_name->{serial} = ${'serial_num'};

A string used in this way is known as a symbolic reference. It's called that because when Perl encounters a string where it was expecting a reference, it treats the string as a name, looks in the current package's symbol table, and finds the entry for the variable of that name.

Hence the previous examples (assuming they are in package main) are both equivalent to:

(*{$main::{$hash_name}}{HASH})->{nick}   = ${*{$main::{'nick'}}{SCALAR}};
(*{$main::{$hash_name}}{HASH})->{rank}   = *{$main::{'rank'}}{ARRAY}->[-1];
(*{$main::{$hash_name}}{HASH})->{serial} = ${*{$main::{'serial_num'}}{SCALAR}};

(For the viewers at home, the breakdown of that first line is shown in Figure 11-1. "Breakdown" being the operative word here.)

Figure 11-1. Symbolic reference breakdown

You'd never willingly write complex, unreadable code like that. So don't write code that's surreptitiously equivalent to it.

The deconstructed example illustrates that a symbolic reference looks up the name of a variable in the current package's symbol table. That means that a symbolic reference can only ever refer to a package variable. And since you won't be using package variables in your own development (see Chapter 5), that will only lead to confusion. For example:

# Create help texts...
Readonly my $HELP_CD  => 'change directory';
Readonly my $HELP_LS  => 'list directory';
Readonly my $HELP_RM  => 'delete file';
Readonly my $NO_HELP  => 'No help available';

# Request and read in next topic...
while (my $topic = prompt 'help> ') {
    # Prepend "HELP_", find the corresponding variable (symbolically),
    # and display the help text it contains...
    if (defined ${"HELP_\U$topic"}) {
        print ${"HELP_\U$topic"}, "\n";
    }
    # Otherwise, display an unhelpful message...
    else {
        print "$NO_HELP\n";
    }
}

The ${"HELP_\U$topic"} expression interpolates the requested topic ($topic) into a string, uppercasing the topic as it does so (\U$topic). It then uses the resulting string as the name of a variable and looks up that variable in the current symbol table.

Unfortunately, the desired help text won't ever be in the current symbol table; all the help texts were assigned to lexical variables, which don't live in symbol table entries.

The use of symbolic references almost always indicates a misdesign of the program's data structures. Rather than package variables located via symbolic references, what is almost always needed is a simple, lexical hash:

# Create table of help texts and default text...
Readonly my %HELP => (
    CD => 'change directory',
    LS => 'list directory',
    RM => 'delete file',
);

Readonly my $NO_HELP => 'No help available';

# Request and read in next topic...
while (my $topic = prompt 'help> ') {
    # Look up requested topic in help table and display it...
    if (exists $HELP{uc $topic}) {
        print $HELP{uc $topic}, "\n";
    }
    # Otherwise, be helpless...
    else {
        print "$NO_HELP\n";
    }
}
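If a symbolic reference does slip into your code, use strict 'refs' will at least catch it at runtime. The following minimal sketch wraps the attempt in an eval purely so the error message can be inspected:

```perl
use strict;
use warnings;

my $hash_name = 'tag';

# Under strict 'refs', using a string as a reference is a fatal
# runtime error, so trap it in an eval to capture the message...
my $survived = eval {
    my $nick = ${$hash_name}{nick};
    1;
};
my $error = $@;

my $message = $survived ? "No error\n" : "Caught: $error";
print $message;
```

The captured message reports that a string cannot be used as a HASH ref while "strict refs" is in use, which points directly at the offending line.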

Cyclic References

Use weaken to prevent circular data structures from leaking memory.

Actual circular linked lists are quite rare in most Perl applications, mainly because they're generally not an efficient solution. Nor are they particularly easy to implement. Generally speaking, a standard Perl array with a little added "modulo length" logic is a cleaner, simpler, and more robust solution. For example:

use List::Util qw( max );    # Needed for max( ) below

{
    # Make variables "private" by declaring them in a limited scope
    my @buffer;
    my $next = -1;

    # Get the next element stored in our cyclic buffer...
    sub get_next_cyclic {
        $next++;                   # ...increment cursor
        $next %= @buffer;          # ...wrap around if at end of array
        return $buffer[$next];     # ...return next element
    }

    # Grow the cyclic buffer by inserting new element(s)...
    sub insert_cyclic {
        # At next pos (or start): remove zero elems, then insert args...
        splice @buffer, max(0,$next), 0, @_;

        return;
    }

    # etc.
}

However, circular data structures are still surprisingly easy to create. The commonest way is to have "owner" back-links in a hierarchical data structure. That is, if container nodes have references to the data nodes they own, and each data node has a reference back to the node that owns it, then you have cyclic references.

Non-hierarchical data can also easily develop circularities. Many kinds of bidirectional data relationships (such as peer/peer, supplier/consumer, client/server, or event callbacks) are modeled with links in both directions, to provide convenient and efficient navigation within the data structure.

Sometimes the cycle may not even be explicitly set up (or even intentional); sometimes it may just "fall out" of a natural arrangement of data [61]. For example:

# Create a new bank account...
sub new_account {
    my ($customer, $id, $type) = @_;

    # Account details are stored in anonymous hashes...
    my $new_account = {
        customer  => $customer,
        id        => generate_account_num( ),
        type      => $type,
        user_id   => $id,
        passwd    => generate_initial_passwd( ),
    };

    # The new account is then added to the customer's list of accounts...
    push @{$customer->{accounts}}, $new_account;

    return $new_account;
}

In the resulting data structure, each customer ($customer) is really a reference to a hash, in which the "accounts" entry ($customer->{accounts}) is a reference to an array, in which the most recently added element ($customer->{accounts}[-1]) is a reference to a hash, in which the value for the "customer" entry ($customer->{accounts}[-1]{customer}) is a reference back to the original customer hash. The great and mystical Circle Of Banking.

But even if it's stored in a lexical $customer variable, the allocated memory for this data structure will not be reclaimed when that variable goes out of scope. When $customer ceases to exist, the customer's hash will still be referred to by an entry in the hash that is referred to by an element of the array that is referred to by an entry in the customer's hash itself. The reference count of each of these variables is still (at least) one, so none of them is garbage collected.

Fortunately, in Perl 5.6 and later, it's easy to "weaken" references in order to break this chain of mutual dependency:

use Scalar::Util qw( weaken );

# Create a new bank account...
sub new_account {
    my ($customer, $id, $type) = @_;

    # Account details are stored in anonymous hashes...
    my $new_account = {
        customer  => $customer,
        id        => generate_account_num( ),
        type      => $type,
        user_id   => $id,
        passwd    => generate_initial_passwd( ),
    };

    # The new account is then added to the customer's list of accounts...
    push @{$customer->{accounts}},
         $new_account;

    # Make the backlink in the customer's newest account
    # invisible to the garbage collector...
    weaken $customer->{accounts}[-1]{customer};

    return $new_account;
}

The weaken function can be exported from Scalar::Util under Perl 5.8 and later (or from the WeakRef CPAN module if you're running under Perl 5.6). You pass weaken one argument, which must be a reference. It marks that reference as no longer contributing to the garbage collector's reference count for whatever referent the reference refers to. In other words, weaken artificially reduces the reference count of a referent by one, but without removing the reference itself.

In the second version of the example, the customer hash originally had a reference count of 2 (one reference to it in the $customer variable itself, and another reference in the nested hash entry $customer->{accounts}[-1]{customer}). Executing the line:

    weaken $customer->{accounts}[-1]{customer};

reduces that reference count from two to one, but leaves the second reference still in the nested hash. It's now as if only $customer referred to the hash, with $customer->{accounts}[-1]{customer} being a kind of "stealth reference", cruising along undetected below the garbage collector's radar.

That means that when $customer goes out of scope, the reference count of the customer hash will be decremented as normal, but now it will decrement to zero, at which point the garbage collector will immediately reclaim the hash. Whereupon each hash reference stored in the array in $customer->{accounts} will also cease to exist, so the reference counts of those hashes will also be decremented, once again, to zero. So they're cleaned up too.

Note also that a weakened reference automatically converts itself to an undef when its referent is garbage-collected, so there is no chance of creating a nasty "dangling pointer" if the weakened reference happens to be in some external variable that isn't garbage-collected along with the data structure.

By weakening any backlinks in your data structures, you keep the advantages of bidirectional navigation, but also retain the benefits of proper garbage collection.
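The behaviour described above can be observed in a small, self-contained sketch. The customer record here is a simplified, hypothetical stand-in for the bank account example, and $backlink_copy is an extra variable introduced only to watch what happens to a weak reference that outlives the data structure:

```perl
use strict;
use warnings;
use Scalar::Util qw( weaken isweak );

my $backlink_copy;

{
    # Hypothetical customer with one account that links back to it...
    my $customer = { name => 'Example Customer', accounts => [] };
    push @{$customer->{accounts}}, { customer => $customer };

    # Weaken the backlink so it no longer counts towards the
    # customer hash's reference count...
    weaken $customer->{accounts}[-1]{customer};

    # Keep a second weak reference in an outer variable...
    $backlink_copy = $customer;
    weaken $backlink_copy;

    print 'Backlink is weak: ',
          (isweak($customer->{accounts}[-1]{customer}) ? 'yes' : 'no'), "\n";
    print 'Inside scope:  ',
          (defined $backlink_copy ? 'defined' : 'undef'), "\n";
}

# $customer has gone out of scope, so the hash has been reclaimed...
print 'Outside scope: ',
      (defined $backlink_copy ? 'defined' : 'undef'), "\n";
```

Inside the block the outer weak reference is still defined. Once $customer goes out of scope, the hash is reclaimed and the outer weak reference becomes undef, rather than dangling.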



[60] $$$stack_ref[0] is the same as ${${$stack_ref}}[0]. Indexing brackets are of lower precedence than sigils.

[61] So it's something you need to watch for whenever you're setting up complex data structures.
