CHAPTER 5

Arrays, Hashes, References, and Typeglobs

Having introduced scalars in Chapter 3, we now turn to the remaining data types: arrays, hashes, references, and typeglobs. We will also examine the more complex data that can be created by mixing data types. Later in the chapter, we will see how to define scalar, list, and hash constants, as well as check for their existence. Finally, we discuss one other special value, the undefined value.

Lists and Arrays

A list is a compound value that may hold any number of scalar values (including none at all). Each value, or element, in the list is ordered and indexed: it is always in the same place, and we can refer to it by its position in the list. An array provides dynamic storage for a list, and so can be grown, shrunk, and manipulated by altering its values. In Perl, we often use the terms "array" and "list" interchangeably, but the difference can be important.

Creating Lists and Arrays

In Perl, lists are written out using parentheses and the comma operator. For example:

(1, 2, 3, 4, 5, 6)

A list is simply a sequence of scalar values; we can copy it about, store it in arrays, and index it, but we can't alter its contents because it is not stored anywhere—the preceding list is just an expression in Perl code. To manipulate a list of values, we need to store the list in an array.

An array variable is prefixed with an at-sign, @, in the same way that scalar variables are prefixed with a dollar sign.

# define a six-element array from a six-element list
my @array = (1, 2, 3, 4, 5, 6);

The usual way of defining lists is with the comma operator, which concatenates scalars together to produce list values. We tend to take the comma for granted because it is so obvious, but it is in fact an operator performing an important function. However, defining arrays of strings can get a little awkward.

my @strings = ('one', 'two', 'three', 'four', 'five');

That's a lot of quotes and commas, an open invitation for typographic errors. A better way to define a list like this is with the list quoting operator, qw, which we briefly mentioned in Chapter 3. Here's the same list defined more legibly with qw:

my @strings = qw(one two three four five);

Or, defined with tabs and newlines:

my @strings = qw(
  one two
  three four
  five
);

As well as assigning lists to array variables, we can also assign them to a list of scalar variables.

my ($one, $two, $three) = (1, 2, 3);   # $one is now 1, $two 2 and $three 3

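If there are too few variables to assign all the values, any remaining values will be discarded. Conversely, if there are too few values, the leftover variables are left undefined. An array at the end of the list of variables absorbs all the values that remain.

List assignment of this kind is a very common sight inside subroutines, where we will often encounter a first line like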

my ($arg1, $arg2, @listarg) = @_;
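
The rules for mismatched list assignment can be sketched as follows. This is only an illustration; the variable names are invented for the example and do not come from any particular program.

```perl
use strict;
use warnings;

# More values than variables: the extra values (3 and 4) are discarded.
my ($first, $second) = (1, 2, 3, 4);

# More variables than values: the leftover variable is undefined.
my ($x, $y, $z) = (1, 2);

# An array in the list absorbs everything that remains.
my ($head, @rest) = (1, 2, 3, 4);

print "first=$first second=$second\n";
print "z is ", (defined $z ? "defined" : "undefined"), "\n";
print "head=$head rest=@rest\n";
```

Because the array takes all remaining values, it only makes sense as the last variable in the list.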

When we declare an array with my or our, Perl will automatically create the array with zero elements unless we assign some at the time of declaration. An initial value of () explicitly gives the new array zero elements (and in Perl 5.8 such an assignment is silently optimized away), so the following are equivalent declarations:

my @array=(); #explicit empty list
my @array; #implicit empty list

However, the following creates an array with a single undefined value, which may not be what we intended:

my @array=undef; # same as 'my @array=(undef)'

A mistake like this is relatively obvious when written out explicitly, but we can easily get tripped up if, for example, an array is assigned the return value of a subroutine that returns a list of zero or more values on success, but undef to indicate failure.

@array=a_subroutine_that_might_return_undef(); # beware!
die "failed!" if scalar(@array)==1
    and not defined $array[0]; # must check for undefined return value
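
To see why the check is needed, it may help to compare a subroutine that returns undef with one that uses a bare return. Both subroutines below are hypothetical and exist only for this sketch; a bare return is one possible way to signal failure with an empty list in list context, not something the text above prescribes.

```perl
use strict;
use warnings;

# Hypothetical subroutines, named here only for illustration.
sub fails_with_undef { return undef }   # yields (undef) in list context
sub fails_with_empty { return }         # yields () in list context

my @from_undef = fails_with_undef();
my @from_empty = fails_with_empty();

print "return undef gives ", scalar(@from_undef), " element(s)\n";
print "bare return gives ", scalar(@from_empty), " element(s)\n";
```

With the bare return the array is simply empty, so an ordinary test on the number of elements is enough to detect the failure.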

Accessing Lists and Arrays

The array variable is a handle that we can use to access the values inside it, also known as array elements. Each element has an index number that corresponds to its position in the list. The index starts at zero, so the index number of an element is always one less than its place in the list. To access it, we supply the index number after the array in square brackets.

my @array = (1, 2, 3, 4, 5, 6);
# print the value of the fifth element (index 4, counting from 0)
print "The fifth element is $array[4]\n";

We can also place an index on the end of a list, for example:

print "The fifth element is ", (1, 2, 3, 4, 5, 6)[4];   # outputs 5

Of course, there isn't much point in writing down a list and then only using one value from it, but we can use the same approach with lists returned by functions like localtime, where we only want some of the values that the list contains.

$year = (localtime)[5] + 1900;

For the curious, the parentheses around localtime prevent the [5] from being interpreted as an anonymous array and passed to localtime as an argument. A drawback of Perl's flexible syntax rules is that sometimes precedence and associativity can bite in unexpected ways. Because the year value is given in years since 1900, we add 1900 to get the actual year.

The values of an array are scalars (though these may include references), so the correct way to refer to an element is with a dollar prefix, not an @ sign. It is the type of the returned value that is important to Perl, not where it was found.

print "The first element is $array[0]\n";

Given the @array earlier, this will reveal 1 to be the value of the first element.

If we specify a negative index, Perl rather smartly counts from the end of the array.

print "The last element is $array[-1]\n";

This accesses the last element of the @array defined earlier, which has a positive index of 5 and a value of 6.

We can also extract a list from an array by specifying a range of indices or a list of index numbers, also known as a slice.

print "The third to fifth elements: @array[2..4]\n";

This prints out the elements at indices 2, 3, and 4 with values 3, 4, and 5.

There is no need for a slice to be contiguous or defined as a range, and we can freely mix positive and negative indices.

print "The first two and last two elements: @array[0, 1, -2, -1]\n";

In our example six-element array, this prints out the values 1, 2, 5, and 6. Note that if we only had an array of three values, indices 1 and -2 would refer to the same element and we would see its value printed twice.

We can also retrieve the same index multiple times.

# replace array with first three elements, in triplicate
@array = @array[0..2, 0..2, 0..2];

Arrays can only contain scalars, but scalars can be numbers, strings, or references to other values like more arrays, which is exactly how Perl implements multidimensional arrays. They can also contain the undefined value, which is and isn't a scalar, depending on how we look at it.

Manipulating Arrays

Arrays are flexible creatures. We can modify them, extend them, truncate them, and extract elements from them in many different ways. We can add or remove elements from an array at both ends, and even in the middle.

Modifying the Contents of an Array

Changing the value of an element is simple; just assign a new value to the appropriate index of the array.

$array[4] = "The Fifth Element";

We are not limited to changing a single element at a time, however. We can assign to more than one element at once using a list or range in just the same way that we can read multiple elements. Because this is a selection of several elements, we use the @ prefix, since we are manipulating an array value:

@array[3..5, 7, -1] = ("4th", "5th", "6th", "8th", "Last");

We can even copy parts of an array to itself, including overlapping slices.

@array = (1, 2, 3, 4, 5, 6);
@array[2..4] = @array[0..2];
print "@array\n"; # @array is now (1, 2, 1, 2, 3, 6)

We might expect that if we supply a different number of elements to the number we are replacing, then we could change the number of elements in the array, replacing one element with three, for example. However, this is not the case. If we supply too many elements, then the later ones are simply ignored. If we supply too few, then the elements left without values are filled with the undefined value. There is a logic to this, however, as the following example shows:

# assign first three elements to @array_a, and the rest to @array_b
(@array_a[0..2], @array_b) = @array;
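
The behavior of slice assignment with too many or too few values can be checked with a short sketch. The values used here are arbitrary and chosen only to make the result easy to read.

```perl
use strict;
use warnings;

my @array = (1, 2, 3, 4, 5, 6);

# Three slots, five values: 'd' and 'e' are ignored.
@array[0..2] = ('a', 'b', 'c', 'd', 'e');
my $after_surplus = "@array";                  # a b c 4 5 6

# Three slots, one value: the other two slots become undefined.
@array[3..5] = ('x');
my $length        = @array;                    # the length is still 6
my $defined_count = grep { defined } @array;   # only 4 elements hold a value

print "$after_surplus\n";
print "length $length, defined elements $defined_count\n";
```

In neither case does the number of elements in the array change.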

Sometimes we do want to change the number of elements being replaced. Luckily, there is a function that does replace parts of arrays with variable-length lists. Appropriately enough it is called splice, and takes an array, a starting index, a number of elements, and a replacement list as its arguments.

splice @array, $from, $quantity, @replacement;

As a practical example, to replace element three of a six-element list with three new elements (creating an eight-element list), we would write something like

#!/usr/bin/perl
# splice1.pl
use strict;
use warnings;

my @array = ('a', 'b', 'c', 'd', 'e', 'f');
# replace third element with three new elements
my $removed = splice @array, 2, 1, (1, 2, 3);
print "@array\n";   # produces 'a b 1 2 3 d e f'
print "$removed\n";   # produces 'c'

This starts splicing from element 3 (index 2), removes one element, and replaces it with the list of three elements. The removed value is returned from splice and stored in $removed. If we were removing more than one element, we would assign the result to an array instead.

#!/usr/bin/perl
# splice2.pl
use strict;
use warnings;

my @array = ('a', 'b', 'c', 'd', 'e', 'f');
# replace three elements 2, 3, and 4 with a different three
my @removed = splice @array, 2, 3, (1, 2, 3);
print "@array\n";   # produces 'a b 1 2 3 f'
print "@removed\n";   # produces 'c d e'

If we only want to remove elements without adding new ones, we just leave out the replacement list, shrinking the array.

#!/usr/bin/perl
# splice3.pl
use strict;
use warnings;

my @array = ('a', 'b', 'c', 'd', 'e', 'f');
# remove elements 2, 3 and 4
my @removed = splice @array, 2, 3;
print "@array\n";   # produces 'a b f'
print "@removed\n";   # produces 'c d e'

Leaving out the length as well removes everything from the specified index to the end of the list. We can also specify a negative number as an index, just as we can for accessing arrays, so combining these two facts we can do operations like this:

#!/usr/bin/perl
# splice4.pl
use strict;
use warnings;

my @array = ('a', 'b', 'c', 'd', 'e', 'f');
# remove last three elements
my @last_3_elements = splice @array, -3;
print "@array\n";   # produces 'a b c'
print "@last_3_elements\n";   # produces 'd e f'

splice is a very versatile function and forms the basis for several other, simpler array functions like pop and push. We'll be seeing it a few more times before we are done with arrays.

Counting an Array

If we use an array in a scalar context, Perl will return the number of elements (including undefined ones, if any) in the array. This gives us a simple way to count array elements. (A literal list behaves differently: in scalar context the comma operator returns its last value rather than a count.)

$count = @array;

It also lets us write conditions for testing whether or not an array holds the number of values we expect, like this:

die "Usage: $0 <arg1> <arg2>\n" unless @ARGV == 2;

This said, accidentally using an array in scalar context is a common cause of errors in Perl. If we really mean to count an array, we are often better off using the scalar function, even though it is redundant in scalar context, just to make it clear that we are doing what we intended to do.

$count = scalar(@array);
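
A brief sketch of how context changes the result of assigning from an array follows. The variable names are made up for the example; the point is that a pair of parentheses on the left-hand side turns a count into a list assignment.

```perl
use strict;
use warnings;

my @array = ('a', 'b', 'c');

my $count    = @array;           # array in scalar context: 3
my $explicit = scalar(@array);   # same result, with the intent spelled out
my ($first)  = @array;           # list assignment: the first element, 'a'

print "count=$count explicit=$explicit first=$first\n";
```

The third line is the kind of accident that the explicit scalar function helps us to avoid, or at least to spot.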

We can find the index of the last element of the array using the special prefix $#. As indices start at zero, the highest index is one less than the number of elements in the list.

$highest = $#array; # $highest = scalar(@array)-1

This is useful for looping over ranges and iterating over arrays by index rather than by element when the position of an element is also important.

#!/usr/bin/perl
# byindex.pl
use strict;
use warnings;

my @array = ("First", "Second");
foreach (0..$#array) {
    print "Element number $_ contains $array[$_]\n";
}

Executing the code produces the following output:


Element number 0 contains First
Element number 1 contains Second

Adding Elements to an Array

Extending an array is also simple—we just assign to an element that doesn't exist yet.

#!/usr/bin/perl
# add1.pl
use strict;
use warnings;

my @array = ('a', 'b', 'c', 'd', 'e', 'f');
print "@array\n";   # produces 'a b c d e f'
$array[6] = "g";
print "@array\n";   # produces 'a b c d e f g'

We aren't limited to just adding directly to the end of the array; any missing elements in the array between the current highest index and the new value are automatically added and assigned undefined values. For instance, adding $array[10] = "k"; to the end of the preceding example would cause Perl to create all of the elements with indices 7 to 9 (although only notionally—no actual memory is allocated to hold the elements) as well as assign the value k to the element with index 10.
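
As a small illustration of assigning beyond the end of an array, the following sketch counts the elements afterward and lists the indices that were filled in with undefined values. It uses a range to build the starting array purely for brevity.

```perl
use strict;
use warnings;

my @array = ('a' .. 'g');        # indices 0 to 6
$array[10] = 'k';                # indices 7 to 9 now exist but are undefined

my $count = @array;              # 11
my @gaps  = grep { !defined $array[$_] } 0 .. $#array;

print "count=$count\n";
print "undefined indices: @gaps\n";
```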

To assign to the next element, we could find the number of elements and then assign to the array using that number as the array index. We find the number of elements by finding the scalar value of the array.

$array[scalar(@array)] = "This extends the array by one element";

However, it is much simpler to use the push function, which does the same thing without the arithmetic.

push @array, "This also extends the array by one element";

We can feed as many values as we like to push, including more scalars, arrays, lists, and hashes. All of them will be added in turn to the end of the array passed as the first argument. Alternatively we can add elements to the start of the array using unshift.

unshift @array, "This becomes the zeroth element";

With unshift the original indices of the existing elements are increased by the number of new elements added, so the element at index five moves to index six, and so on.

push and unshift are actually just special cases of the splice function. Here are their equivalents using splice:

# These are equivalent
push @array, @more;
splice @array, @array, 0, @more;

# These are equivalent
unshift @array, @more;
splice @array, 0, 0, @more;

Passing @array to splice twice might seem a bizarre way to push values onto the end, but the second argument is evaluated as a scalar by splice, so this is actually equivalent to writing scalar(@array). As we saw earlier, this is the number of elements in the array and one more than the current highest index. Even though we do not need it, using scalar explicitly may be a good idea anyway for reasons of legibility.
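
We can confirm the equivalence with a short side-by-side sketch. The array names are invented for the comparison, and scalar is written out explicitly for legibility.

```perl
use strict;
use warnings;

my @more = ('x', 'y');

my @pushed  = (1, 2, 3);
my @spliced = (1, 2, 3);
push   @pushed,  @more;
splice @spliced, scalar(@spliced), 0, @more;   # same effect as push

my @unshifted = (1, 2, 3);
my @spliced2  = (1, 2, 3);
unshift @unshifted, @more;
splice  @spliced2, 0, 0, @more;                # same effect as unshift

print "@pushed | @spliced\n";
print "@unshifted | @spliced2\n";
```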

Resizing and Truncating an Array

Interestingly, assigning to $#array actually changes the size of the array in memory. This allows us to both extend an array without assigning to a higher element and also to truncate an array that is larger than it needs to be, allowing Perl to return memory to the operating system.

$#array = 999;  # extend @array to 1000 elements
$#array = 3;    # remove elements 4+ from @array

Truncating an array destroys all elements above the new index, so the last example is a more efficient way to do the following:

@array = @array[0..3];

This assignment also truncates the array, but by reading out values and then reassigning them. Altering the value of $#array avoids the copy.
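
A minimal sketch of resizing through $#array follows. It also shows that elements destroyed by truncation do not come back if the array is later extended again.

```perl
use strict;
use warnings;

my @array = (1 .. 10);
$#array = 3;                 # keep indices 0 to 3
my $truncated = "@array";    # 1 2 3 4

$#array = 5;                 # grow again: indices 4 and 5 are undefined
my $count = @array;          # 6
my $old_values_gone = !defined $array[4];

print "$truncated\n";
print "count=$count\n";
```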

Removing Elements from an Array

The counterparts of push and unshift are pop and shift, which remove elements from the array at the end and beginning, respectively.

#!/usr/bin/perl
# remove1.pl
use strict;
use warnings;
my @array = (1, 2, 3, 4, 5, 6);
push @array, '7';   # add '7' to the end
print "@array\n";   # array is now (1, 2, 3, 4, 5, 6, 7)
my $last = pop @array;   # retrieve '7' and return array to six elements
print "$last\n";   # print 7
unshift @array, -1, 0;
print "@array\n";   # array is now (-1, 0, 1, 2, 3, 4, 5, 6)
shift @array;   # remove the first element of the array
shift @array;   # remove the first element of the array
print "@array\n";   # array is now again (1, 2, 3, 4, 5, 6)

While the push and unshift functions will add any number of new elements to the array, their counterparts are strictly scalar in operation: they only remove one element at a time. If we want to remove several at once, we can use the splice function. In fact, pop and shift are directly equivalent to specific cases of splice.

splice(@array, -1);   # equivalent to 'pop @array'

splice(@array, 0, 1); # equivalent to 'shift @array'

From this we can deduce that the pop function actually performs an operation very similar to the following example:

# read last element and then truncate array by one - that's a 'pop'
$last_element = $array[$#array--];

Extending this principle, here is how we can do a multiple pop operation without pop or splice.

@last_20_elements = @array[-20..-1];
$#array-=20;

The simpler way of writing this is just

@last_20_elements = splice(@array, -20);
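
The two approaches can be compared on a smaller scale. This sketch removes the last three elements of a ten-element array, first by slicing and truncating, and then with splice.

```perl
use strict;
use warnings;

my @by_hand = (1 .. 10);
my @last_by_hand = @by_hand[-3 .. -1];   # read the last three elements
$#by_hand -= 3;                          # then truncate the array by three

my @by_splice = (1 .. 10);
my @last_by_splice = splice(@by_splice, -3);

print "@last_by_hand | @by_hand\n";
print "@last_by_splice | @by_splice\n";
```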

Both undef and delete will remove the value from an array element, replacing it with the undefined value, but neither will actually remove the element itself, and higher elements will not slide down one place. This would seem to be a shame, since delete removes a hash key just fine. Hashes, however, are not ordered and indexed like arrays.
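
The difference between undefining an element and actually removing it can be seen in a short sketch. Only undef is shown here, since it is enough to illustrate that the slot itself remains in place.

```perl
use strict;
use warnings;

my @undefined = ('a', 'b', 'c', 'd');
undef $undefined[1];                 # the value is gone, the slot remains
my $count_after_undef = @undefined;  # still 4

my @spliced = ('a', 'b', 'c', 'd');
splice(@spliced, 1, 1);              # the slot is gone, later elements move down
my $count_after_splice = @spliced;   # 3

print "after undef:  $count_after_undef elements\n";
print "after splice: $count_after_splice elements (@spliced)\n";
```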

To remove elements from the middle of an array, we also use the splice function, omitting a replacement list.

@removed = splice(@array, $start, $quantity);

For example, to remove elements 2 to 5 (four elements in total) from an array, we would use

@removed = splice(@array, 2, 4);

Of course, if we don't want to keep the removed elements, we don't have to assign them to anything.

As a slightly more creative example, here is how we can move elements from the end of the list to the beginning, using a splice and an unshift:

unshift @array, splice(@array, -3, 3);

Or, in the reverse direction:

push @array, splice(@array, 0, 3);
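
As a quick check of both directions, here is a sketch that rotates a six-element array by two places and then rotates it back. The intermediate strings are kept only so that the effect of each step is visible.

```perl
use strict;
use warnings;

my @array = (1 .. 6);

unshift @array, splice(@array, -2, 2);   # last two move to the front
my $rotated_right = "@array";            # 5 6 1 2 3 4

push @array, splice(@array, 0, 2);       # first two move to the back
my $rotated_back = "@array";             # 1 2 3 4 5 6

print "$rotated_right\n";
print "$rotated_back\n";
```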

Removing All or Many Elements from an Array

To destroy an array completely, we can undefine it using the undef function. This is a different operation to undefining just part of an array as we saw previously.

undef @array;   # destroy @array

This is equivalent to assigning an empty list to the array:

@array = ();

It follows that assigning a new list to the array also destroys the existing contents. We can use that to our advantage if we want to remove lines from the start of an array without removing all of them.

@array = @array[-100..-1];   # truncate @array to its last one hundred lines

This is simply another way of saying

splice(@array, 0, @array - 100);   # remove all but the last one hundred elements
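
The arithmetic is easy to get wrong by one, so a small-scale sketch may help. Here $keep is a hypothetical variable standing in for the number of trailing elements to retain; the number to remove from the front is the element count minus $keep.

```perl
use strict;
use warnings;

my $keep = 3;

my @by_slice = (1 .. 10);
@by_slice = @by_slice[-$keep .. -1];         # copy out the last $keep elements

my @by_splice = (1 .. 10);
splice(@by_splice, 0, @by_splice - $keep);   # remove all but the last $keep

print "@by_slice | @by_splice\n";
```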

Reversing and Sorting Lists and Arrays

Perl supplies two additional functions for generating differently ordered sequences of elements from an array or list. The reverse function simply returns a list in reverse order.

# reverse the elements of an array
@array = reverse @array;

# reverse elements of a list
@ymd = reverse((localtime)[3..5]);   # return in year/month/day order

This is handy for all kinds of things, especially for reversing the result of an array slice made using a range. reverse allows us to make up for the fact that ranges can only be given in ascending order.
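
For instance, a descending range produces an empty list, whereas reversing an ascending range gives the descending slice we want. The sketch below uses a small array of letters for clarity.

```perl
use strict;
use warnings;

my @array = ('a' .. 'f');

my @empty      = @array[4 .. 2];           # a descending range is empty
my @descending = @array[reverse 2 .. 4];   # e d c

print "empty slice has ", scalar(@empty), " elements\n";
print "@descending\n";
```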

The sort function allows us to perform arbitrary sorts on a list of values. With only a list as its argument, it performs a standard alphabetical sort.

@words = ('here', 'are', 'some', 'words');
@alphabetical = sort @words;
print "@alphabetical";   # produces 'are here some words'

sort is much more versatile than this, however. By supplying a code or subroutine reference, we can sort the list in any way we like. sort automatically defines two special variables, $a and $b, that represent the values being sorted in custom sorting algorithms, so we can specify our own sort like this:

@alphabetical = sort {$a cmp $b} @words;

The $a and $b notation is unique to sort and originates in early versions of Perl, which is the reason for its slightly incongruous use here. The subroutine itself is defined by the opening and closing curly braces, and its return value is the result of the cmp operation.

The preceding sort subroutine using cmp is actually the behavior of the default sort algorithm that Perl uses when we specify no explicit algorithm of our own. In Perl 5.8, there are actually two underlying implementations, the older quicksort algorithm that was present in all Perl versions up to version 5.8, and the new default mergesort, which is more efficient, on average, for a larger range of cases. In some very special cases—typically where the same values appear many times in the list to be sorted—it might be more efficient to switch to the older algorithm. To do this, use

use sort '_quicksort';

To reinstate the newer mergesort:

use sort '_mergesort';

In the vast majority of cases, the distinction between these two algorithms is close to undetectable and will only be of interest to programmers who are critically concerned with performance and have datasets with many repeated values in them. Note also that this notation is likely to be transient while a better means of selecting the correct implementation is implemented, at which point the use sort pragma may go away again.

In order to be a correct and proper algorithm, the code must return -1 if $a is less than $b (however we define that), 0 if they are equal, and 1 if $a is greater than $b. This is exactly what cmp does for strings, and the numerical comparison operator <=> does for numbers.

We should take care to never alter $a or $b either, since they are aliases for the real values being sorted. At best this can produce an inconsistent result, at worst it may cause the sort to lose values or fail to return a result. The best sorts are the simple ones, such as

@ignoring_case = sort {lc($a) cmp lc($b)} @words;
@reversed = sort {$b cmp $a} @words;
@numerically = sort {$a <=> $b} @numbers;
@alphanumeric = sort {int($a) <=> int($b) or $a cmp $b} @mixed;

The last example is worth explaining. It first compares $a and $b as integers, forcing them into numeric values with int. If the result of that comparison is nonzero, then at least one of the values has a numeric value. If however the result is zero, which will be the case if $a and $b are both nonnumeric strings, the second comparison is used to compare the values as strings. We can chain any number of criteria together like this. Parentheses are not required because or has a very low precedence, lower than any of the comparison operators.
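
The same chaining technique works for any pair of criteria. As a further sketch, with a word list invented for the purpose, the following sorts by length first and falls back to alphabetical order for words of equal length.

```perl
use strict;
use warnings;

my @words = qw(pear fig apple kiwi date);

# Shortest first; words of equal length fall back to alphabetical order.
my @sorted = sort { length($a) <=> length($b) or $a cmp $b } @words;

print "@sorted\n";
```

The three four-letter words compare as equal on length, so the cmp after the or decides their relative order.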

Note that it is possible (though not necessarily sensible) to use a sort inside the comparison function of another sort. This will only work for Perl version 5.6.1 onwards, however. In earlier versions, the inner sort will interfere with the outer ones.

We can also use a named subroutine to sort with. For example, we can create a subroutine named reversed that allows us to invent a sort reversed syntax.

sub reversed {$b cmp $a};
@reversed = sort reversed @words;

Similarly, here is a subroutine called numerically that also handles floating point numbers, presented as a complete example:

#!/usr/bin/perl
# numericsort.pl
use strict;
use warnings;

# force interpretation of $a and $b as floating point numbers
sub numerically {$a*1.0 <=> $b*1.0 or $a cmp $b};

my @words = qw(1e2 2e1 3);
print 'Normal  sort: ', join(', ', sort @words), "\n";
print 'Numeric sort: ', join(', ', sort numerically @words), "\n";

Running this program results in the output

Normal  sort: 1e2, 2e1, 3
Numeric sort: 3, 2e1, 1e2

Note, however, that all these sort routines must be defined in the same package as they are used in order to work, since the variables $a and $b are actually package variables automatically defined by the sort function. Similarly, we should never declare $a and $b with my since these will hide the global variables. Alternatively, we can define a prototype, which provokes sort into behaving differently.

sub backwards ($$) {$_[1] cmp $_[0]};

The prototype requires that two scalars be passed to the sort routine. Perl sees this and passes the values to be compared through the special variable @_ instead of via $a and $b. This will allow the sort subroutine to live in any package, for example, a fictional Order package containing a selection of sort algorithms.

use Order;
@reversed = sort Order::reversed @words;

We'll see how to create a package like this in Chapter 9.
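
In the meantime, a prototyped sort subroutine in another package can be sketched in a single file. The Order package here is the fictional one mentioned above, declared inline purely for illustration rather than loaded with use.

```perl
use strict;
use warnings;

# 'Order' is a fictional package, defined inline here
# rather than loaded from a separate file.
package Order;
sub reversed ($$) { my ($left, $right) = @_; return $right cmp $left }

package main;
my @words    = qw(here are some words);
my @reversed = sort Order::reversed @words;

print "@reversed\n";
```

Because of the ($$) prototype, the values arrive in @_, so the subroutine does not depend on the $a and $b of any particular package.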

First, Max, Min, Sum, Shuffle, and Reduce

Perl provides several list manipulation routines as part of the List::Util package. These perform common functions that are useful but not fundamental enough to be eligible for native support within the Perl interpreter itself.

  • first: Return the first element of the list for which a condition is true.
  • max, min: Return the highest and lowest numerical value, respectively.
  • maxstr, minstr: Return the highest and lowest alphabetical value, respectively.
  • sum: Return the sum of the list evaluated numerically.
  • shuffle: Return the list in a random order.
  • reduce: Reduce a list to a single value by arbitrary criteria.

The first function looks like grep, but instead returns only the first value in the list for which the condition is true.

print first { $_ > 10 } (5,15,11,21);    # produces '15'.

The max, min, maxstr, minstr, and sum functions all return a single value according to their purpose. For the preceding example list, replacing first and its block with min would return 5, while minstr would return 11 (since 11 is alphabetically before 5). Similarly, max would return 21 but maxstr would return 5 (since 5 is alphabetically after 21), and sum would return 52.
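
The following sketch applies each of these functions to the example list so that the numeric and string results can be compared directly. The comments give the value each call returns.

```perl
use strict;
use warnings;
use List::Util qw(first min max minstr maxstr sum);

my @values = (5, 15, 11, 21);

my $first  = first { $_ > 10 } @values;   # 15
my $min    = min @values;                 # 5
my $max    = max @values;                 # 21
my $minstr = minstr @values;              # 11: "11" sorts before "15", "21", "5"
my $maxstr = maxstr @values;              # 5:  "5" sorts after the other strings
my $sum    = sum @values;                 # 52

print "first=$first min=$min max=$max\n";
print "minstr=$minstr maxstr=$maxstr sum=$sum\n";
```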

The reduce function resembles sort, complete with a code block containing the implicit variables $a and $b. Here, however, $a represents the result so far, while $b represents each value of the list in turn. Each time the block is executed, the result is assigned to $a ready for the next iteration, until the list is processed. For example, the following reduce subroutine behaves like sum:

print reduce { $a + $b } (5,15,11,21);

This is just a convenient Perl-like way to express in a short statement what would otherwise take a loop. The preceding statement has the same effect as the short code snippet that follows using foreach. This example is slightly expanded to use a transient $a and $b in the same manner as the reduce statement:

print do {
    my $a=0;
    foreach my $b (5,15,11,21) { $a = $a + $b }
    $a; #value returned to print
};
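
Since the block can contain any expression, reduce is not limited to sums. As a sketch, the same list can be reduced to its largest value or to a joined string; these particular blocks are our own examples rather than anything List::Util provides ready-made.

```perl
use strict;
use warnings;
use List::Util qw(reduce);

my @values = (5, 15, 11, 21);

my $sum     = reduce { $a + $b } @values;             # 52
my $largest = reduce { $a > $b ? $a : $b } @values;   # 21
my $joined  = reduce { "$a-$b" } @values;             # 5-15-11-21

print "$sum $largest $joined\n";
```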

Changing the Starting Index Value

Perl allows us to change the starting index value from 0 to something else by assigning to the special variable $[. This is not a recommended practice, however, not least because the change affects all array accesses, not just the arrays we want to modify. For example, to have our lists and arrays start at index 1 (as Pascal would) instead of 0, we would write

$[=1;
@array = (11, 12, 13, 14, 15, 16);
print $array[3];   # produces 13 (not 14)

The scope of $[ is limited to the file that it is specified in, so subroutines and object methods called in other files will not be affected by the altered value of $[. Even so, messing with this special variable is dangerous and discouraged.

Converting Lists and Arrays into Scalars

Since lists and arrays contain compound values, they have no direct scalar representation—that's the point of a compound value. Other than counting an array by assigning it in scalar context, there are two ways that we can get a scalar representation of a list or array. First, we can create a reference to the values in the list or, in the case of an array, generate a direct reference. Second, we can convert the values into a string format. Depending on our requirements, this string may or may not be capable of being transformed back into the original values again.

Taking References

An array is a defined area of storage for list values, so we can generate or "take" a reference to it with the backslash operator.

$arrayref = \@array;

This produces a reference through which the original array can be accessed and manipulated. Alternatively, we can make a copy of an array and assign that to a reference by using the array reference constructor (also known as the anonymous array constructor), [...].

$copyofarray = [@array];

Both methods give us an array reference that we can assign to, delete from, and modify, but only the second refers to an anonymous array. The distinction between the two is important, because one will produce a reference that points to the original array, and so can be used to pass it to subroutines for manipulations on the original data, whereas the other will create a copy that can be modified separately.
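
The practical consequence is easiest to see by modifying the array through each kind of reference. This is a minimal sketch using the same variable names as above.

```perl
use strict;
use warnings;

my @array = (1, 2, 3);
my $arrayref    = \@array;     # refers to @array itself
my $copyofarray = [@array];    # refers to a new, separate array

push @$arrayref, 4;            # changes @array
push @$copyofarray, 99;        # changes only the copy

print "original: @array\n";
print "copy:     @$copyofarray\n";
```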

Converting Lists into Formatted Strings

The other way to turn an array into a scalar is via the join, sprintf, or pack function.

join is the counterpart to split, which we covered under strings in Chapter 3. It takes a simple string as its separator argument, not a regular expression like split. It creates a string from the contents of the array, optionally separated by the separator string.

# join values into comma-separated-value string
$string = join ',', @array;

# concatenate values together
$string = join '', @array;

join is not a complicated function, but if we want to join the values in an array together with a space, we can instead use interpolation to equal effect.

# equivalent join and interpolating string
$string = join ' ', @array;
$string = "@array";

The sprintf and pack functions both take a format string and a list of values, and return a string created from the values in the list rendered according to the specified format. Both functions are covered in detail in Chapter 3, so here we will just briefly recap. Here's an example of sprintf being used to generate a custom date string from localtime, which returns a list of values:

# get current date and time into array
@date = (localtime)[5, 4, 3, 2, 1, 0];  # Y, M, D, h, m, s
$date[0]+=1900; $date[1]++;             # fix year and month

# generate time string using sprintf
$date = sprintf "%4d/%02d/%02d %2d:%02d:%02d", @date;

The following example uses pack to construct the string "Perl" from the ordinal value of the characters expressed as numbers:

@codes = (80, 101, 114, 108);
$word = pack 'C*', @codes;
print $word;   # produces 'Perl'

Of course, there are many more applications of pack and sprintf than this, depending on what we have in mind. Refer to Chapter 3 for more details and examples.

Converting Lists and Arrays into Hashes

By contrast with scalars, converting a list or array into a hash of key-value pairs is extremely simple; just assign the array to the hash.

%hash = @array;

The values extracted from the array are assigned to the hash in pairs, with even elements (starting at index 0) as the keys and odd elements (starting at index 1) as their values. If the array contains an odd number of elements, then the last key to be assigned to the hash will end up with an undefined value as its value; if we have warnings enabled (as we should), Perl warns against this with

Odd number of elements in hash assignment ...

To understand what a hash is and why this warning occurs, we need to look at hashes in more detail.
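
Before doing so, the pairing of even and odd elements described above can be sketched with an array that holds an even number of elements, so that no warning is produced. The keys are sorted on output only to make the order predictable.

```perl
use strict;
use warnings;

my @pairs = ('Mouse', 'Jerry', 'Cat', 'Tom');
my %hash  = @pairs;            # even indices become keys, odd indices values

print "$_ => $hash{$_}\n" for sort keys %hash;
```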

Hashes

Hashes, also known as associative arrays, are Perl's other compound data type. While lists and arrays are ordered and accessed by index, hashes are unordered and accessed by a descriptive key. There is no first or last element in a hash like there is in an array (the hash does have an internal order, but it reflects how Perl stores the contents of the hash for efficient access, and cannot be controlled by the programmer).

Creating Hashes

Hashes are defined in terms of keys and values, or key-value pairs to use an alternative expression. They are stored differently from arrays internally, in order to allow for more rapid lookups by name, so there is no "value" version of a hash in the same way that a list is a "value" version of an array. Instead, lists can be used to define either arrays or hashes, depending on how we use them.

The following list of key-value pairs illustrates the contents of a potential hash, but at this point it is still just a list:

('Mouse', 'Jerry', 'Cat', 'Tom', 'Dog', 'Spike')

Because hashes always consist of paired values, Perl provides the => operator as an alternative to the comma. This helps differentiate the keys and values and makes it clear to anyone reading our source code that we are actually talking about hash data and not just a list. Hash values can be any scalar, just like array elements, but hash keys can only be strings, so the => operator also allows us to omit the quotes by treating its left-hand side as a constant string. The preceding list would thus be better written as

(Mouse => 'Jerry', Cat => 'Tom', Dog => 'Spike')

To turn this into a hash, we need to assign it to a hash variable. Hashes, like arrays and scalars, have their own special prefix, in this case the % symbol. So, to create and populate a hash with the preceding list we would write

my %hash = (Mouse => 'Jerry', Cat => 'Tom', Dog => 'Spike');

When this assignment is made, Perl accepts the keys and values supplied in the list and stores them in an internal format that is optimized for retrieving the values by key. To achieve this, Perl requires that the keys of a hash be string values, which is why we can omit the quotes when we use =>, even with use strict in operation. This doesn't stop us from using a variable to supply the key name, as Perl will evaluate it in string context, or a subroutine call, if we follow the name with parentheses. However, it does mean that we must use quotes if we want to use a literal string containing spaces or other characters meaningful to Perl, such as literal $, @, or % characters.

# using variables to supply hash keys
my ($mouse, $cat, $dog) = ('Souris', 'Chat', 'Chien');
my %hash = ($mouse => 'Jerry', $cat => 'Tom', $dog => 'Spike');

# using quotes to use nontrivial strings as keys (with and without interpolation)
%hash = ('Exg Rate' => 1.656, '%age commission' => 2, "The $mouse" => 'Jerry');

Tip This restriction on keys also means that if we try to use a nonstring value as a key, we will get unexpected results; in particular, if we try to use a reference as a key, it will be converted into a string, which cannot be converted back into the original reference. Therefore, we cannot store pairs of references as keys and values unless we use a symbolic reference as the key (see "References" later in the chapter for more on this subject).
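The effect described in this tip can be seen in a short sketch. The array reference and the value stored against it are hypothetical, chosen only to show the stringification.

```perl
use strict;
use warnings;

my $aref = [1, 2, 3];
my %hash = ($aref => 'payload');   # $aref is evaluated, then stringified to form the key

my ($key) = keys %hash;
print "stored key: $key\n";        # something like ARRAY(0x...)
print "still a reference? ", (ref($key) ? 'yes' : 'no'), "\n";
```

The key that comes back from keys is an ordinary string, so it can no longer be dereferenced to reach the original array.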


Alternatively, we can use the qw operator and separate the keys and values with whitespace. A sensible layout for a hash might be

%hash = qw(
  Mouse   Jerry
  Cat   Tom
  Dog   Spike
);

Accessing Hashes

Note how this is very similar to creating an array. In fact, the assignment is identical, but the type of the variable means that Perl organizes the data differently in memory. We can now access elements of the hash, which we do by providing a key after the hash in curly brackets. Since the key must be a string, we can again omit the quotes even if use strict is in effect.

print "The mouse is ", $hash{Mouse};

This is similar in concept to how we index an array. A simple identifier inside the curly brackets is automatically treated as a string, just as it is on the left-hand side of =>, so the quotes can be omitted even when use strict is in effect; a key containing spaces or other special characters must still be quoted. Note that just like an array, a hash only stores scalar values. Consequently, the prefix for a hash key access is $, not %, just as it is for array elements.

We can also specify multiple keys to extract multiple values.

@catandmouse = @hash{'Cat', 'Mouse'};

This will return the list ('Tom', 'Jerry') into the array @catandmouse. Once again, note that the returned value is a list so we use the @ prefix.

We can even specify a range, but this is only useful if the keys are incremental strings, which typically does not happen too often; we would probably be better off using a list if our keys are that predictable. For example, if we had keys with names AA, AB . . . BY, BZ inclusive (and possibly others), then we could use

@aabz_values = @hash{'AA'..'BZ'};

We cannot access the first or last elements of a hash, since hashes have no concept of first or last. We can, however, return a list of keys with the keys function, which returns a list of the keys in the hash.

@keys = keys %hash;

The order of the keys returned is random (or rather, it is determined by how Perl chooses to store the hash internally), so we would normally sort the keys into a more helpful order if we wanted to display them. To sort lexically, we can just use sort keys %hash like this:

print "The keys are:";
print join(",", sort keys %hash);

We can also treat the keys as a list and feed it to a foreach loop.

# dump out contents of a hash
print "$_ => $hash{$_}\n" foreach sort keys %hash;
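Pulling these access forms together, here is a small self-contained sketch that reuses the cartoon hash from earlier in the section; the variable names are ours.

```perl
use strict;
use warnings;

my %hash = (Mouse => 'Jerry', Cat => 'Tom', Dog => 'Spike');

my $mouse       = $hash{Mouse};             # single value: $ prefix
my @catandmouse = @hash{'Cat', 'Mouse'};    # slice: @ prefix, returns a list
my @sorted_keys = sort keys %hash;          # keys in lexical order

print "$mouse\n";                           # Jerry
print "@catandmouse\n";                     # Tom Jerry
print join(",", @sorted_keys), "\n";        # Cat,Dog,Mouse
```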

Hash Key Order

The order in which hash keys are stored and returned by keys and each is officially random, and should not be relied on to return keys in any expected fashion. Prior to Perl 5.8, however, it was true that repeatedly running a program that created a hash with the same keys would always return the keys in the same order. The order was highly dependent on the platform and build of Perl, so different Perl interpreters might give different results, but the same Perl executable would always create a hash the same way each time. The problem with this was that it opened up Perl applications to potential security issues due to the predictability of the key order. From Perl 5.8 onwards this is no longer true, and hash keys are always returned in a random order.

While this is not usually a problem, it is worth noting that for applications where we just want to know what order we added the keys in, we can make use of the Tie::IxHash module (Ix is short for IndexedHash). This module allows us to create hashes that internally record the order of hash keys so that we can retrieve it later. It is slower than a native hash since it is really a tied object pretending to be a hash (see Chapter 19), but other than the key order, it behaves just like a normal hash.

#!/usr/bin/perl
# orderedhash.pl
use strict;
use warnings;
use Tie::IxHash;

my %hash;
tie %hash, 'Tie::IxHash';
%hash = (one => 1, two => 2, three => 3);
print join(",",keys(%hash)),"\n"; # *always* produces 'one,two,three'

The semantics of this hash is identical to a normal hash, the only difference being that the return value of keys and each is now known and reproducible. The Tie::IxHash module also provides an object-oriented interface that, amongst other things, allows hash key-value pairs to be pushed, popped, shifted, and unshifted like an array, and also if necessary set to an entirely new order.

my $hashobj=tie %hash, 'Tie::IxHash';
...
$hashobj->Push(four => 4);
print join("=>",$hashobj->Shift()),"\n"; # produces 'one=>1'
$hashobj->Reorder('four','one','two');
print join(",",keys(%hash)),"\n"; # produces 'four,three,two'

Legacy Perl code may happen to depend on the order of keys in a hash—older versions of the Data::Dumper module have this problem, for example. For these cases, we can control the ordering of keys explicitly, so long as we appreciate that this may make a program vulnerable to hostile intents. First, we can set the environment variable PERL_HASH_SEED, which sets the initial seed of the pseudo-random number generator, to a constant value such as zero.

PERL_HASH_SEED=0 hashseed.pl

To find the seed with which Perl was initialized, use the hash_seed function from the Hash::Util module.

#!/usr/bin/perl -w
# hashseed.pl
use Hash::Util qw(hash_seed);
print hash_seed();

Setting PERL_HASH_SEED to this value will cause any subsequent invocation of Perl to create hashes in a reproducible way. Alternatively, we can build a Perl interpreter from source, specifying -DNO_HASH_SEED during the configuration step. This will permanently disable the random initial seed.

Before overriding the seed, bear in mind that overriding, storing, or passing the seed value elsewhere sidesteps the security purpose of randomizing the hash in the first place. This feature is provided only for the sake of older Perl applications that require a predictable ordering of hash keys. Perl has always officially had random key orders, so such applications should ultimately be rewritten to remove their dependency on a predictable order.

Manipulating Hashes

We can manipulate hashes in all the same ways that we can manipulate arrays, with the odd twist due to their associative nature. Accessing hashes is a little more interesting than accessing arrays, however. Depending on what we want to do with them, we can use the keys and values functions, sort them in various different ways, or use the each iterator if we want to loop over them.

Adding and Modifying Hash Values

We can manipulate the values in a hash through their keys. For example, to change the value of the key Cat, we could use

$hash{'Cat'} = 'Sylvester';

If the key already exists in the hash, then its value is overwritten. Otherwise, it is added as a new key.

$hash{'Bird'} = 'Tweety';

Assigning an array (or another hash) to a single key evaluates it in scalar context, which for an array produces a count of the elements, as we have seen in the past. However, we can assign multiple keys and values at once by specifying multiple keys and assigning a list, in much the same way that we can extract a list from a hash.

@hash{'Cat', 'Mouse'} = ('Sylvester', 'Speedy Gonzales');

We can also use arrays to create and expand hashes.

@hash{@keys} = @values;

We can even use ranges to generate multiple keys at once; for example, the following assignment creates key-value pairs ranging from A=>1 to Z=>26:

@lettercodes{'A'..'Z'} = 1..26;
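The two slice assignments above can be tried out with a short sketch. The contents of @keys and @values are assumptions made for illustration.

```perl
use strict;
use warnings;

my @keys   = ('Cat', 'Mouse', 'Dog');
my @values = ('Tom', 'Jerry', 'Spike');

my %hash;
@hash{@keys} = @values;            # pair up two parallel arrays

my %lettercodes;
@lettercodes{'A'..'Z'} = 1..26;    # a range on each side of the slice assignment

print "$hash{Mouse}\n";                        # Jerry
print scalar(keys %lettercodes), "\n";         # 26
print "$lettercodes{A} $lettercodes{Z}\n";     # 1 26
```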

Keys and values are added to the hash one by one, in the order that they are supplied, so our previous example of

@hash{'Cat', 'Mouse'} = ('Sylvester', 'Speedy Gonzales');

is equivalent to

$hash{'Cat'} = 'Sylvester';
$hash{'Mouse'} = 'Speedy Gonzales';

This can be an important point to keep in mind, since it allows us to overwrite the values associated with hash keys, both deliberately and accidentally. For example, this code snippet defines a default set of keys and values and then selectively overrides them with a second set of keys and values, held in a second input hash. Any key in the second hash with the same name as one in the first overwrites the corresponding value in the resulting hash. Any keys not defined in the second hash keep their default values.

#!/usr/bin/perl
# hash1.pl
use strict;
use warnings;

# define a default set of hash keys and values
my %default_animals = (Cat => 'Tom', Mouse => 'Jerry');

# get another set of keys and values
my %input_animals = (Cat => 'Ginger', Mouse => 'Jerry');

# providing keys and values from second hash after those
# in default hash overwrites them in the result
my %animals = (%default_animals, %input_animals);
print "$animals{Cat}\n"; # prints 'Ginger'

Removing Hash Keys and Values

Removing elements from a hash is easier, but less flexible, than removing them from a list. Lists are ordered, so we can play a lot of games with them using the splice function amongst other things. Hashes do not have an order (or at least, not one that is meaningful to us), so we are limited to using undef and delete to remove individual elements.

The undef function removes the value of a hash key, but leaves the key intact in the hash.

undef $hash{'Bird'};   # 'Bird' still exists as a key

The delete function removes the key and value entirely from the hash.

delete $hash{'Bird'};   # 'Bird' removed

This distinction can be important, particularly because there is no way to tell the difference between a hash key that doesn't exist and a hash key that happens to have an undefined value as its value simply by looking at the result of accessing it.

print $hash{'Bird'};   # produces 'Use of uninitialized value in print ...'

It is for this reason that Perl provides two functions for testing hash keys, defined and exists.
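A brief sketch of the difference between the two tests, using the same undef and delete operations as above on a hypothetical two-key hash:

```perl
use strict;
use warnings;

my %hash = (Cat => 'Tom', Bird => 'Tweety');

undef $hash{Bird};     # key kept, value now undefined
delete $hash{Cat};     # key and value both removed

foreach my $key ('Bird', 'Cat') {
    my $exists  = exists  $hash{$key} ? 'yes' : 'no';
    my $defined = defined $hash{$key} ? 'yes' : 'no';
    print "$key: exists=$exists defined=$defined\n";
}
# prints:
#   Bird: exists=yes defined=no
#   Cat: exists=no defined=no
```

Only exists can tell the two cases apart; defined reports false for both.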

Reversing Hashes

One special trick that is worth mentioning while we are on the subject of hashes is how to reverse the keys and values, so that the values become the keys and vice versa. This at first might seem to be a difficult, or at least a nontrivial task involving code similar to the following:

#!/usr/bin/perl
# reverse.pl
use strict;
use warnings;

my %hash = ('Key1' => 'Value1', 'Key2' => 'Value2');
print "$hash{Key1}\n";   # print 'Value1'
foreach (keys %hash) {
   # invert key-value pair
   $hash{$hash{$_}} = $_;

   # remove original key
   delete $hash{$_};
}
print "$hash{Value1}\n";   # print 'Key1'

Reversing, or transposing, a hash offers plenty of problems. If the values are references, turning them into keys will convert them into strings, which cannot be converted back into references. Also, if two keys have the same value, we end up with only one of them making it into the reversed hash, since we cannot have two identical keys.

We can't fix the problem with duplicate keys, since hashes do not allow them, but we can reverse the keys and values much more simply than the preceding code, and without endangering identical key-value pairs, by converting the hash into a list, reversing the list, and then assigning it back to the hash again.

# this does the same as the previous example!
%hash = reverse %hash;

We have to look closely to see the list in this example; it is returned by the %hash because reverse is a function that gives its argument(s) a list context. There is no such thing as hash context in Perl, for the same reason that there is no such thing as a hash value, as we noted at the start of this discussion. The reverse then reverses the list, which also happens to reverse the orientation of the keys and values, and then the reversed list is assigned back to the hash.

If more than one key has the same value, then this trick will preserve the first one to be found. Since that's entirely random (because we cannot sort the list), we cannot determine which key will be preserved as a value in the new hash. If we want to handle that, we will have to either process the hash the slow way, find a way to eliminate duplicates first, or use a different storage strategy. For simple hashes without duplicates, though, this is a very simple way to achieve the desired end.
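One possible "different storage strategy" is to keep every original key in an array reference under its value, so that duplicates are collected rather than lost. This is only a sketch; the hash contents and the name %by_value are assumptions.

```perl
use strict;
use warnings;

my %hash = (Tom => 'Cat', Sylvester => 'Cat', Jerry => 'Mouse');

# collect all keys that share a value instead of letting one overwrite another
my %by_value;
foreach my $key (sort keys %hash) {
    push @{ $by_value{ $hash{$key} } }, $key;
}

print "Cat: @{ $by_value{Cat} }\n";       # Cat: Sylvester Tom
print "Mouse: @{ $by_value{Mouse} }\n";   # Mouse: Jerry
```

Sorting the keys first makes the order within each collected list predictable.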

Accessing and Iterating over Hashes

The simplest, or at least the most common, way of iterating across a hash is to use the keys function to return a list of the keys. This list is actually a copy of the hash keys, so we cannot alter the key names through this list. However, it provides a very simple way to iterate across a hash.

#!/usr/bin/perl
# iterate.pl
use strict;
use warnings;

my %hash = ('Key1' => 'Value1', 'Key2' => 'Value2');
# dump of hash
print "$_ => $hash{$_}\n" foreach keys %hash;

If we want a list for output, we probably want to sort it too.

# sorted dump of hash
print "$_ => $hash{$_}\n" foreach sort keys %hash;

We can also access the values directly through the values function.

@values = values %hash;

This provides a convenient way to process a hash when we do not care about the keys, with the caveat that we cannot easily find the keys if we need them. Hashes are one way: there is no "look up key by value" syntax.

# print list of sorted values
foreach (sort values %hash) {
   print "Value: $_\n";
}

Assigning the result of values to an array, as we did with @values, gives us a copy of the values, so changing the array does not alter the hash. If we want to alter the original values in the hash, we can do so with a loop over a hash slice like this:

# increment all hash values by one
foreach (@hash{keys %hash}) {
   $_++;
}

This example makes use of aliasing, where the default argument variable $_ becomes a direct alias for, rather than a copy of, the value that it refers to.
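To make the distinction between aliases and copies concrete, here is a minimal sketch on a hypothetical two-key hash. It relies on the documented behavior that the elements returned by values, like those of a slice, are aliased by foreach.

```perl
use strict;
use warnings;

my %hash = (a => 1, b => 2);

$_++     foreach @hash{keys %hash};   # slice elements are aliases: a=2, b=3
$_ *= 10 foreach values %hash;        # values are aliased too:     a=20, b=30

my @copy = values %hash;              # assigning to an array makes a copy
$_ = 0 foreach @copy;                 # the hash is unaffected

print join(",", map { "$_=$hash{$_}" } sort keys %hash), "\n";   # a=20,b=30
```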

The catch with foreach is that it pulls all of the keys (or values) out of the hash at one time, and then works through them. This is inefficient in terms of memory usage, especially if the hash is large. An alternative approach is offered by the each function, which returns the next key-value pair each time it is used. It is ideal for use in while loops.

while (my ($key, $value) = each %hash) {
   print "$key => $value\n";
   $hash{$key}++;
}

The order of the key-value pairs produced by each is the same as that produced by keys and values. It works by moving an internal iterator through the hash, so that each subsequent call to each returns the next key-value pair. We cannot access the iterator directly, however. The iterator is reset after we reach the last key, or if we use keys to return the whole list.
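The reset behavior can be observed with a small sketch: we take one pair with each, call keys to reset the iterator, and then take a pair again. The hash contents are hypothetical.

```perl
use strict;
use warnings;

my %hash = (one => 1, two => 2, three => 3);

my ($first) = each %hash;     # advance the iterator by one pair, keep the key
keys %hash;                   # calling keys resets the iterator
my ($again) = each %hash;     # so each starts from the beginning again
keys %hash;                   # leave the iterator reset for any later loops

print $first eq $again ? "same key both times\n" : "different keys\n";
```

Since the hash is not modified between the two calls, the same key comes back both times, whichever key the internal order happens to place first.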

Sorting and Indexing

If we want to generate an ordered list of hash keys and values, we can do so with the sort function. A simple alphabetical list of keys can be produced with sort keys %hash as we saw earlier. However, sort is a versatile function and we can play all sorts of tricks with it. One not so clever trick is simply to sort the values directly, as we saw earlier.

# print list of sorted values
foreach (sort values %hash) {
   print "Got $_\n";
}

The catch with this is that we can't easily get back to the keys if we want to. The solution to this problem is to give sort a subroutine that accesses the values via the keys.

# sort a hash by values
foreach (sort { $hash{$a} cmp $hash{$b} } keys %hash) {
   print "$hash{$_} <= $_\n";
}

This is also useful if we want to change the values in the hash, since with the key in hand we can assign to $hash{$_} directly.
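The same technique works for numeric values if we swap cmp for the <=> operator. The following sketch sorts a hypothetical hash of scores from highest to lowest, falling back to the key name to break ties.

```perl
use strict;
use warnings;

my %score = (Tom => 3, Jerry => 10, Spike => 3);

# highest score first; equal scores are ordered alphabetically by key
my @ranked = sort { $score{$b} <=> $score{$a} or $a cmp $b } keys %score;

print "$_: $score{$_}\n" foreach @ranked;
# prints:
#   Jerry: 10
#   Spike: 3
#   Tom: 3
```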

Creative uses of sort give us other possibilities too. For instance, we can create a hash with an index by replacing the values with references to two-element arrays or hashes containing an index value and the original value. This is an example of a complex data structure, which we cover later, so we'll just give a simple example of defining and then sorting such a hash.

#!/usr/bin/perl
# indexhash.pl
use warnings;
use strict;
# create a hash with integrated index
my %hash = (
   Mouse => { Index => 0, Value => 'Jerry'},
   Cat   => { Index => 1, Value => 'Tom'},
   Dog   => { Index => 2, Value => 'Spike'}
);
# sort a hash by integrated index
foreach (sort { $hash{$a}{'Index'} <=> $hash{$b}{'Index'} } keys %hash) {
   print "$hash{$_}{'Value'} <= $_\n";
}

The only catch with this is that we will need to keep track of the index numbers ourselves, since unlike an array we don't get it done for us automatically. However, see tie and the Tie::Hash module in Chapter 19 for another way to create an indexed hash that solves some of these problems.

Named Arguments

Perl does not offer an official mechanism for passing named arguments to subroutines, but hashes allow us to do exactly this if we write our subroutines to use them.

sub animate {
    my %animals = @_;
    # rest of subroutine...
}

animate(Cat => 'Tom', Mouse => 'Jerry');
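Because later pairs in a list overwrite earlier ones on assignment, a subroutine written this way can also supply default values. The following is a sketch only; the subroutine name animate_with_defaults and its defaults are hypothetical.

```perl
use strict;
use warnings;

sub animate_with_defaults {
    # defaults come first, so any named arguments in @_ override them
    my %animals = (Cat => 'Tom', Mouse => 'Jerry', @_);
    return join ', ', map { "$_=$animals{$_}" } sort keys %animals;
}

print animate_with_defaults(Cat => 'Sylvester', Bird => 'Tweety'), "\n";
# prints: Bird=Tweety, Cat=Sylvester, Mouse=Jerry
```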

Some existing modules in the Perl library allow this and also adapt between ordinary or named arguments by prefixing the key names with a minus sign. Here is a quick example of how we can do it ourselves:

#!/usr/bin/perl
# arguments.pl
use warnings;
use strict;

# list form takes mouse, cat, dog as arguments, fixed order.
animate('Jerry', 'Tom', 'Spike');

# hash form takes animals in any order using '-' prefix to identify type,
# also allows other animal types
animate(-Cat => 'Sylvester', -Bird => 'Tweety', -Mouse => 'Speedy Gonzales');

# and the subroutine...
sub animate {
   my %animals;

   # check first element of @_ for leading minus...
   if ($_[0] !~ /^-/) {
      # it's a regular argument list, use fixed order
      @animals{'-Mouse', '-Cat', '-Dog'} = @_;
   } else {
      # it's named argument list, just assign it.
      %animals = @_;
   }
   # rest of subroutine...
   foreach (keys %animals) {
      print "$_ => $animals{$_}\n";
   }
}

See Chapter 7 for more on this theme, as well as some improved examples that check arguments more closely.

Converting Hashes into Scalars

Evaluating a hash in scalar context returns 0 (false) if the hash is empty. If it contains data, older versions of Perl return a string containing a numeric ratio of the form N/M that describes in approximate terms how efficiently Perl has been able to store the keys and values in the hash. Loosely speaking, the numbers can be read as a fraction: the higher the first relative to the second, the more efficient the storage of the hash. From Perl 5.26 onwards, a nonempty hash in scalar context instead returns the number of keys it contains.

#!/usr/bin/perl
# convert1.pl
use warnings;
use strict;

my %hash = (one => 1, two => 2, three => 3, four => 4, five => 5);

# check the hash has data
if (%hash) {
   # find out how well the hash is being stored
   print scalar(%hash);   # produces '4/8' (or the key count, 5, from Perl 5.26 onwards)
}

Counting Hashes

While this is interesting if we are concerned with how well Perl is storing our hash data, it is unlikely to be of much use otherwise. We might have expected to get a count of the elements in the hash, or possibly the keys, but we can't count a hash in the same way that we can an array, simply by referring to it in scalar context. To count a hash we can use either keys or values and evaluate the result in scalar context. For example:

# count the keys of a hash
$elements = scalar(keys %hash);

If we really wanted to know the total number of elements, counting keys and values separately, we would only need to multiply this result by 2.
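A short sketch of counting in practice, using the same five-key hash as convert1.pl; the variable names are ours.

```perl
use strict;
use warnings;

my %hash = (one => 1, two => 2, three => 3, four => 4, five => 5);
my %empty;

my $count = keys %hash;     # assigning to a scalar supplies the scalar context
my @flat  = %hash;          # keys and values flattened into one list

print "keys: $count\n";                        # keys: 5
print "elements: ", scalar(@flat), "\n";       # elements: 10
print "empty hash is false\n" unless %empty;
```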

Taking References

For a more useful scalar conversion, we can create a reference (also called taking a reference) to the hash with the backslash operator.

$hashref = \%hash;

Dereferencing a hash reference is very much like dereferencing an array reference, only with a key instead of an index.

$dog = $hashref->{'Dog'};

Alternatively, we can dereference the entire hash with a % prefix.

%hash = %$hashref;

We can also create a hash reference with the {...} constructor, which creates a brand new anonymous hash with the same contents as the old one. This is different from, and produces a different result to, the array reference constructor, [...], because the reference points to an anonymous hash that is therefore organized and stored like one.

$hashref = {Mouse => 'Jerry', Cat => 'Tom', Dog => 'Spike'};

Since the contents of the constructor are just a list, we can also create a hash reference to an anonymous hash with the contents of an array, and vice versa.

$hashref = {@array};
$arrayref = [%hash];

Both constructors take lists as their arguments, but organize them into different kinds of anonymous compound value.
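The difference between taking a reference to an existing hash and constructing a new anonymous one is easy to demonstrate. In this sketch, $copyref is a name we have introduced for illustration.

```perl
use strict;
use warnings;

my %hash = (Mouse => 'Jerry', Cat => 'Tom', Dog => 'Spike');

my $hashref = \%hash;       # refers to the original hash
my $copyref = { %hash };    # new anonymous hash with copied contents

$hashref->{Dog} = 'Butch';  # changes %hash itself

print "$hash{Dog}\n";       # Butch
print "$copyref->{Dog}\n";  # Spike, since the copy is independent
```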

Converting Hashes into Arrays

Converting a hash into a list or array is very simple; we just assign it.

@array = %hash;

This retrieves all the keys and values from the hash in pairs, the order of which is determined by the internal structure of the hash. Alternatively, we can extract the hash as two lists, one of keys, and one of values.

@keys = keys %hash;
@values = values %hash;

This gives us two arrays with corresponding indices, so we can look up the value by the index of the key (and vice versa, something we cannot do with a hash).

A final option that is sometimes worth considering is turning the hash into an array of arrays or array of hashes, in order to create an index but preserve the key-value pairs in a single variable. Here is one way to do that:

my @array;
foreach (keys %hash) {
   push @array, { $_ => $hash{$_} };
}

This creates an array of hashes, each hash with precisely one key-value pair in it. Of course there are other, and arguably better, ways to create indexed hashes, one of which we covered earlier in this section. Again, it's a matter of preference, depending on whether we want to be able to look up the index and value by key, or the key and value by index.
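The array of arrays variant mentioned above might look like the following sketch, in which each element is a two-element array holding a key and its value. Sorting the keys first is our own choice, made so that the indices are predictable.

```perl
use strict;
use warnings;

my %hash = (Mouse => 'Jerry', Cat => 'Tom', Dog => 'Spike');

# one [key, value] pair per element, in sorted key order
my @pairs = map { [ $_, $hash{$_} ] } sort keys %hash;

print "$pairs[1][0] => $pairs[1][1]\n";   # Dog => Spike
print scalar(@pairs), " pairs\n";         # 3 pairs
```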

Pseudohashes

A pseudohash is, essentially, an array that is pretending to be a hash. The object of the exercise is to have the flexibility of key-based access with the speed of an indexed array, while at the same time restricting the hash to only a specified set of keys. Deprecated in Perl 5.8, and scheduled to be removed entirely in Perl 5.10, pseudohashes have now been replaced by restricted hashes. However, old code may still be using pseudohashes, and so it is worth taking a quick look at what they are and how they work.

Properly speaking, pseudohashes should be called pseudohash references, since the feature only works through a reference. To create a pseudohash, we create a reference to an array whose first element is a hash describing the mapping of keys to array indices. Since the first element is taken up by the hash, the actual values start at index one. Here is a simple example:

my $pseudo=[{ one => 1, two => 2, three => 3}, 'first','second','third'];

The value 'first' can now be accessed by index or by key.

print $pseudo->{one};  # produces 'first'
print $pseudo->[1];    # also produces 'first'

Attempting to access or set a value in the hash with a key that is not in the index hash will cause an error, No such pseudo-hash field "key" at ....

It can be irritating and error-prone to get the right values associated with the right keys, so instead we can use the fields pragmatic module, which gives us more options in how we set up the pseudohash. It has two main routines: phash and new. Note that in Perl 5.10 onwards only new will be supported.

For example, to set up the same pseudohash as earlier, we can instead write

use fields;
my $pseudo_from_list=fields::phash(one=>'first',
                                   two=>'second', three=>'third');
my $pseudo_from_2arefs=fields::phash([qw(one two three)],[qw(first second third)]);
my $no_values_pseudo=fields::phash([qw(one two three)]);

The semantics of pseudohashes are almost identical to ordinary hashes; we can use keys, values, each, and delete in the same way as usual.

my @keys=keys %$pseudo;
my @values=values %$pseudo;
while (my ($key,$value)=each %$pseudo) {
    print "$key => '$value'\n";
}

Deleting a pseudohash key is slightly more complex. Deleting the key directly only frees up the value; we have to delete the key in the embedded hash as well to remove it from the hash.

delete $pseudo->{three}; # free value
delete $pseudo->[0]{three}; # remove key

If we have code that makes extensive use of pseudohashes and we want to be able to continue to use it without a rewrite after pseudohashes are removed from the language, there is another option. The Class::PseudoHash module from CPAN reimplements the semantics of pseudohashes so that code that needs them will continue to run (albeit not as fast, since the native support has been removed).

# for Perl 5.10 +
use fields;
use Class::PseudoHash;

my $pseudo=fields::phash(key => 'value'); # still works

We can also create a pseudohash with new. This will create a new pseudohash (or from Perl 5.10 onwards, a restricted hash, described in the next section) with the fields specified to the fields pragma in the specified package.

{
    package My::Package;
    use fields qw(one two three _secret_four);
}

my $pseudo=fields::new('My::Package');
$pseudo->{one}   = 'first';
$pseudo->{two}   = 'second';
$pseudo->{three} = 'third';

This creates a regular pseudohash, as described previously, but blessed into the My::Package class. However, a more natural way to access this feature is via a typed scalar, which we will touch on in a moment as it works just as well with restricted hashes.

Restricted Hashes

Restricted hashes are the replacement for pseudohashes, and the tools to manipulate them are provided by the Hash::Util module. From Perl 5.10 onwards, the fields pragma will also create restricted hashes rather than pseudohashes, but with the same usage semantics (apart from delete, which will now work as usual). The feature exists in Perl 5.8 and up, and the equivalent of a pseudohash can be created as a restricted hash using Hash::Util with the lock_keys function.

#!/usr/bin/perl
# restrictedhash.pl
use strict;
use warnings;
use Hash::Util qw(lock_keys);

my %hash1=(one =>'first', two=>'second', three=>'third');
lock_keys(%hash1);                  # lock hash to pre-existing keys

Whereas a pseudohash provided us with a hash with fixed keys and changeable values, restricted hashes also let us lock values, making them read-only, and we can choose to lock and unlock individual values or the whole hash at will. The functions to do this are unlock_keys, lock_value, and unlock_value. Adding these to the use statement and continuing the preceding example, we can lock and unlock individual keys and values with

unlock_keys(%hash1);                # unlock all keys

my %hash2;
lock_keys(%hash2,'one','two','six'); # lock empty hash with the specified keys

lock_value(%hash2,$_) foreach ('one','two'); # lock the values of keys 'one' and 'two'
unlock_value(%hash2,'two');         # unlock the value of key 'two'

Notice that we can lock keys and values that do not yet exist. Here, locking the keys of an empty hash (%hash2 in this example) to a list supplied to lock_keys means that only those keys can subsequently be created, but as yet, they do not exist in the hash. Locking the value of a nonexistent key causes the key to be created and given a value of undef.

Deleting a key from a hash whose keys are locked removes the entry and returns its value (assuming the key is present) as usual, but it does not alter the set of allowed keys, so the key can be assigned again later. Attempting to add a new key or change a locked value will provoke an error, of course. Attempting to lock a nonempty hash with a list of keys that does not include all the keys currently in the hash will also cause an error, but if we want to lock the hash with additional keys, we can just add them to the list at the time the hash is locked.

my %hash3=(oldkey1=>1, oldkey2=>2);
lock_keys(%hash3,keys(%hash3),'newkey1','newkey2');

There is currently no function to lock more than one value at a time, but if we just want to lock the entire hash, we can use the lock_hash function. Similarly, to unlock all values and keys, we can use unlock_hash.

lock_hash(%hash3);                   # lock hash keys and make all values read-only
unlock_hash(%hash3);                 # turn back into a regular unrestricted hash

Note that it is not currently possible to lock values in a hash whose keys are not locked; Hash::Util deems this to be of little point. Interestingly, using lock_hash followed by unlock_keys leaves the hash mutable but all its values read-only, proving that it is indeed possible.
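To see the errors that restricted hashes raise, we can trap them with eval. This is a minimal sketch; the %config hash and the deliberately misspelled key are hypothetical.

```perl
use strict;
use warnings;
use Hash::Util qw(lock_keys lock_hash unlock_hash);

my %config = (host => 'localhost', port => 8080);   # hypothetical data
lock_keys(%config);

$config{port} = 9090;                        # allowed: existing key, value not locked

my $added     = eval { $config{prot} = 1; 1 };   # misspelled key is rejected
my $add_error = $@;

lock_hash(%config);                          # now the values are read-only too
my $changed      = eval { $config{port} = 1; 1 };
my $change_error = $@;
unlock_hash(%config);

print "add failed: $add_error"       unless $added;
print "change failed: $change_error" unless $changed;
```

Both failures are ordinary run-time exceptions, so a program can choose to catch and report them as shown rather than terminate.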

To do the same thing directly in Perl 5.8 onwards we could use Internals::SvREADONLY, which does all the hard lifting for Hash::Util.

Internals::SvREADONLY($hash3{oldkey1} => 1); # read-only value
Internals::SvREADONLY($hash3{oldkey1} => 0); # writable again

Intriguing though this is, it is also a little arcane. Another way to create an immutable scalar in any version of Perl is with tie, as we will see in Chapter 19.

Compile-Time Checking of Restricted Hashes

Perl has a special syntax for lexically declared scalar variables (that is, scalars declared with my or our) that allows them to be associated with a specific package. Perl does not have a strong sense of type on the basis that it is largely redundant, so giving a type to a lexical scalar merely acts to coax some extra features out of the Perl interpreter. Compile-time checking of restricted hashes and pseudohashes is one of those features (the other is package attributes, covered in Chapter 10).

Giving a type to a lexical scalar associated with a pseudohash or restricted hash allows the interpreter to check the validity of accesses to the hash with literal keys at compile time. In the case of a pseudohash, any hash-like access of the variable can then be silently converted into simple array accesses of the underlying array on which the pseudohash is based, with a corresponding improvement in run-time performance.

For this to work, we must make use of the package-based pseudohash via the fields::new function, since typing is a package-based syntax. Here is a simple one-line example:

# declare a typed scalar to cause compile-time optimization
my My::Package $pseudo_converts_to_array=fields::new('My::Package');

This idiom is not usually seen in application-level code, but it is quite common in object classes, including several provided by the standard Perl library. Since we won't cover objects in detail for a while, we will restrict ourselves to a simple example to illustrate the general point:

#!/usr/bin/perl
# typedscalar.pl
use strict;
use warnings;

{
    package My::Package;

    use fields qw(one two three);

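    sub new {
        my $class = shift;
        my $self = fields::new($class);   # hash restricted to the declared fields
        $self->{one} = 1; $self->{two} = 2; $self->{three} = 3;
        return $self;
    }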
}

print "This happens first?\n";
my My::Package $obj=new My::Package;
#my $obj=new My::Package;

$obj->{one}=5;  # Ok, exists
$obj->{four}=4; # Bad key

The creation of a new object happens in the call to new My::Package. When this is assigned to a typed scalar, we get a syntax error at the last line of the file during compilation. If we comment out this line and enable the untyped scalar assignment below it, the error is only detected at run time; we see "This happens first?" appear, and only then does Perl tell us we've tried to use an invalid key.

The only thing this particular object class does is generate restricted hashes with the keys one, two, and three, so if we need to create many restricted hashes with the same keys, this can be a good way to implement that hash. Unfortunately, typing in Perl really is no more than an invitation to the interpreter to do some optimization if possible. It won't cause Perl to complain if we assign something different to the scalar, even objects of a different class, but it will catch attempts to use literal invalid keys at compile time.
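For comparison, the run-time side of this protection can be sketched without any object class at all, using the lock_keys function of the standard Hash::Util module (available from Perl 5.8). This is only an illustration: the hash %settings and its keys are invented for the example.

```perl
#!/usr/bin/perl
# restricted.pl - illustrative sketch; %settings and its keys are hypothetical
use strict;
use warnings;
use Hash::Util qw(lock_keys);

my %settings = (one => 1, two => 2, three => 3);
lock_keys(%settings);        # only the existing keys are now permitted

$settings{one} = 5;          # fine, the key exists

# an invalid key is only caught when the statement actually runs
my $ok = eval { $settings{four} = 4; 1 };
my $error = $ok ? '' : $@;
print "Caught at run time: $error" unless $ok;
```

Because %settings is an ordinary untyped hash, Perl has no opportunity to check the key at compile time; the typed scalar in the earlier example is what moves the check forward.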

Typed scalars and restricted hashes are also used under the covers to implement other parts of Perl's syntax. A good example is the implementation of package-based attributes, in collusion with the attributes and Attribute::Handlers modules. We will take a look at implementing attributes in Chapter 10.

References

Rather than referring to a variable directly, Perl lets us refer to it by a reference—a pointer to the real value stored somewhere else. There are two kinds of references in Perl: hard references, which are immutable values in the style of C++ references, and symbolic references, which are simply strings containing the name of a variable or subroutine, minus the leading punctuation.

Of the two, hard references are by far the most common, and are the basis for complex data structures like arrays of arrays. Internally they are memory pointers, and we can access the value that they point to by following or dereferencing the reference. Perl provides a flexible syntax for doing this, involving the backslash and arrow operators.

Conversely, symbolic references are actually banned by use strict (more accurately, use strict refs) because they are a common source of bugs due to their malleable nature and resistance to compile-time error checking—by changing the contents of the string we change the thing that it points to. It is also possible to accidentally create a symbolic reference when we didn't mean to, especially if we fail to turn on warnings as well. Having made these points, symbolic references can be useful in the right places, so long as we are careful.

Hard References

Hard references, usually just called references, are not really a data type but just a kind of scalar value. They differ from integer, floating-point, or string values because they are pointers to other values, and are not malleable—unlike C, we cannot perform operations to change the value of a reference to make it point to something else. We can assign a new reference to a scalar variable, but that is all. Worldly programmers generally consider this a good thing.

Creating References

To create a reference for an existing value or variable, we use the backslash operator. This will convert any value or data type, be it scalar, array, hash, subroutine, and so forth, and create a scalar reference that points to it.

# references to values
$numberref = \42;
$messageref = \"Don't Drink The Wine!";
@listofrefs = \(1, 4, 9, 16, 25);

# references to variables
$scalarref = \$number;
$arrayref = \@array;
$hashref = \%hash;
$globref = \*typeglob;   # typeglobs are introduced later in the chapter

# reference to anonymous subroutine
$subref = sub { return "This is an anonymous subroutine" };

# reference to named subroutine
$namedsubref = \&mysubroutine;

If we pass a list to the backslash operator, it returns a second list of references, each one pointing to an element of the original list.

@reflist = \(1, 2, 3);

This is identical to, but shorter than

@reflist = (\1, \2, \3);

We can declare a reference and initialize it to point to an empty list or hash with [] and {}.

my $arrayref=[]; # reference to empty array
my $hashref={};  # reference to empty hash

Note that both references are "true" because they point to something real, albeit empty.

References have implicit knowledge of the type of thing that they are pointing to, so an array reference is always an array reference, and we can demonstrate this by attempting to print a reference. For example, this is what we might get if we attempted to print $scalarref:

SCALAR(0x8141f78)

A common mistake in Perl is to try to use the backslash operator to create a reference to an existing list, but as we showed previously, this is not what backslash does. In order to create an array reference from a list, we must first place the list into an array. This causes Perl to allocate an array structure for the values, which we can then create a reference for—the original list is not stored as an array, so it cannot be referenced. This is essentially what the [...] construct does.

The [...] and {...} constructors also create a reference to an array or hash. These differ from the backslash operator in that they create a copy of their contents and return a reference to it, not a reference to the original.

$samearrayref = \@array;
$copyarrayref = [@array];
$samehashref = \%hash;
$copyhashref = {%hash};
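The difference between a reference to the original and a reference to a copy is easiest to see by modifying each in turn. The following is a minimal sketch; the array contents are arbitrary.

```perl
#!/usr/bin/perl
# sameorcopy.pl - illustrative sketch
use strict;
use warnings;

my @array = (1, 2, 3);

my $samearrayref = \@array;     # points at @array itself
my $copyarrayref = [@array];    # points at a new anonymous copy

push @$samearrayref, 4;         # changes @array
push @$copyarrayref, 99;        # changes only the copy

print "original: @array\n";          # original: 1 2 3 4
print "copy:     @$copyarrayref\n";  # copy:     1 2 3 99
```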

The [..] and {..} constructors are not strictly operators and have the precedence of terms (like variable names, subroutines, and so on) which is the highest precedence of all. The contents of the constructors are always evaluated before they are used in other expressions.

The hash reference constructor constructs a hash, which requires key-value pairs, and so spots things like odd numbers of elements. Nor can we create a hash reference by applying the backslash operator to a list of key-value pairs; the backslash needs an actual hash variable to point to. But that's why we have the {...} constructor.

Confusing constructors with lists is a very common Perl error, especially as Perl is quite happy for us to do the following:

# this does not do what it might appear to
@array = [1, 2, 3, 4];

What this probably meant to do was assign @array a list of four elements. What it actually does is assign @array one element containing a reference to an anonymous array of four elements, i.e., it is actually the same as

@inner_array = (1, 2, 3, 4);
@array = \@inner_array;

When arrays and hashes do not appear to contain the values that they should, this is one of the first things to check. The warning Reference found where even-sized list expected ... is a clue that this may be happening during a hash definition, but for arrays we are on our own.
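Since Perl gives no warning in the array case, a quick check of the element count and of the type of the first element is often enough to reveal the mistake. A small sketch, with the array names @wrong and @right chosen purely for illustration:

```perl
#!/usr/bin/perl
# oneelement.pl - illustrative sketch
use strict;
use warnings;

my @wrong = [1, 2, 3, 4];    # one element: a reference to an anonymous array
my @right = (1, 2, 3, 4);    # four elements

print scalar(@wrong), "\n";      # 1
print ref($wrong[0]), "\n";      # ARRAY
print scalar(@right), "\n";      # 4
```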

Perl sometimes creates references automatically, in order to satisfy assignments to complex data structures. This saves what would otherwise be a lot of monotonous construction work on our part. For instance, the following statements create several hash references and automatically chain them together to form a composite structure, a process known immemorially as autovivification:

my %hash;
$hash{'name'}{'address'}{'street'}{'number'} = 88;
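We can confirm what autovivification has built by inspecting the intermediate levels. The sketch below also shows a well-known side effect: merely testing a deeply nested key creates the levels above it. The hash %other and its keys are hypothetical.

```perl
#!/usr/bin/perl
# autoviv.pl - illustrative sketch
use strict;
use warnings;

my %hash;
$hash{'name'}{'address'}{'street'}{'number'} = 88;

# each intermediate level was created on demand as a hash reference
print ref($hash{'name'}), "\n";                       # HASH
print ref($hash{'name'}{'address'}{'street'}), "\n";  # HASH
print join(',', keys %{ $hash{'name'} }), "\n";       # address

# merely testing a deep key also creates the level above it
my %other;
my $found = exists $other{'a'}{'b'};
print exists $other{'a'} ? "a was created\n" : "a is absent\n";  # a was created
```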

Comparing References

References to the same underlying value are equal, but only if they point to the same actual value (literally, the same location in memory):

#!/usr/bin/perl
# ref1.pl
use warnings;
use strict;

my $text = "This is a value";

my $ref1 = \$text;
my $ref2 = \$text;

print $ref1 == $ref2;   # produces '1'

$$ref1 = 'New value';
print $$ref2;   # produces 'New value'

Pointing to two values that happen to be equal will not result in equal references:

#!/usr/bin/perl
# ref2.pl
use warnings;
use strict;

my $text1 = "This is a value";
my $text2 = "This is a value";

my $ref1 = \$text1;
my $ref2 = \$text2;

print $ref1 == $ref2;   # produces ''

$$ref1 = 'New value';
print $$ref2;   # produces 'This is a value'

Dereferencing

A reference is only useful if we can access the underlying value, a process called dereferencing. We can extract the value and assign it to a variable, or we can simply work through the reference, a little like keyhole surgery.

Dereferencing is dependent on the type of the reference; we can only get a scalar from a scalar reference, and we can only get an array from an array reference. However, since all references, regardless of type, are scalars, Perl cannot perform compile-time syntax checks to ascertain whether a reference is being dereferenced with the correct type. This compels us to take a little care when using references, since incorrectly using a reference may only show up as a run-time error.

Dereferencing any reference can be done by prefixing the reference with the symbol appropriate for the underlying data type; the previous comparison example includes four scalar dereferences using $$. As a more complete example, here is how we can copy out the value pointed to by a scalar, array, hash, and typeglob reference into a new variable:

$value = $$ref;
@array = @$arrayref;
%hash = %$hashref;
*glob = *$globref;

Similarly, we can call a subroutine through a code reference like this (we will come back to code references in Chapter 7):

&$coderef(@args);

We cannot dereference with impunity: attempting to access an array or hash reference as a scalar produces a run-time error.

Not a SCALAR reference ...

Similarly, while a statement like @a=21 will create an array with one element (with the value 21), and might conceivably be what we intended, Perl is skeptical that we would ever want to create such an array by dereferencing, and so produces a run-time error if we say

@a = @$scalarref;

If we want to use the values held by a hash reference in the manner of an array, we have to re-create the reference (or generate a new one), because hashes are not organized in the same way as arrays. So the values must be extracted and stored in the other format.

$ref = {a=>1, b=>2, c=>3};
print %$ref;   # produces a1b2c3 (dependent on internal ordering of hash)
print @$ref;   # run-time error 'Not an ARRAY reference ...'

$ref = [ %$ref ];   # convert '$ref' from hash to array reference

print %$ref;   # run-time error 'Can't coerce array into hash ...' ('Not a HASH reference ...' in later versions of Perl)
print @$ref;   # produces a1b2c3 (dependent on order of hash)

Working with References

Instead of just pulling out the value from a reference and assigning it to something else, we can work directly through the reference. For example, to access a scalar value in an array or hash value, we would use

$element_2 = $$arrayref[1];
$hashvalue = $$hashref{'key_name'};

If we mentally replace the $arrayref and $hashref with array and hash, we can see that these are really just conventional array and hash accesses, just being done through a reference (the keyhole). Similarly, we can get an array slice via a reference.

@slice = @$arrayref[6..9];

This works well when we are accessing a scalar containing an array reference, but it can cause problems if we try to access an array containing array references. For example, consider the following nested array:

@array = (1, [2, 3, 4], 5);

This array contains an array reference as its second element (note that if we had not used an array reference constructor and just used parentheses, we would have ended up with a plain old five-element array). We might try to access that array with

@subarray = @$array[1];

Unfortunately this does not give us the three elements 2, 3, 4. Prefixes bind more closely than indices, so the dereference is applied to $array before the [1] is considered: Perl reads the expression as a one-element slice taken through a scalar reference $array, not as a dereference of the element $array[1]. The preceding is therefore actually equivalent to

@subarray = ($$array[1]);

This explains why we never get the inner array: the index is applied to whatever the scalar $array refers to, and the result is at most a single element (the inner array reference itself, if $array happens to refer to our nested array). In order to get the index to happen first, we need to use curly braces to apply the dereferencing operation to the array element instead of to the array.

@subarray = @{$array[1]};

This more explicit dereferencing syntax also has its scalar, hash, code, and typeglob counterparts, for example:

%subhash = %{$hashofhashes{$hashkey}};

An alternative technique for dereferencing is the arrow or dereference operator. This is often more legible than the double prefix syntax.

$element_2 = $arrayref->[1];
$hashvalue = $hashref -> {'key_name'};

Multidimensional arrays and hashes can omit the arrow, since Perl is smart enough to translate adjacent indices or hash keys into an implicit dereference. The following are therefore equivalent, but the first is easier to read:

$value = $threedeepreference[9]{'four'}[1];
$value = $threedeepreference[9] -> {'four'} -> [1];

This only applies to the second and subsequent indices or hash keys, however. If we are accessing a reference, we still need to use the first arrow so Perl knows that we are going via a reference and not accessing an element or hash value directly.
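The rule about the first arrow can be illustrated by holding the same structure once in a scalar reference and once in a real array. This is only a sketch; the names $data and @list and their contents are invented for the example.

```perl
#!/usr/bin/perl
# arrows.pl - illustrative sketch
use strict;
use warnings;

my $data = [ 'zero', { four => [ 'a', 'b' ] } ];   # scalar holding a reference
my @list = ( 'zero', { four => [ 'a', 'b' ] } );   # real array

print $data->[1]{four}[1], "\n";     # b - first arrow required, rest implied
print $$data[1]{four}[1], "\n";      # b - same access using the prefix syntax
print $list[1]{four}[1], "\n";       # b - no arrow at all: @list is a real array
print $list[1]->{four}->[1], "\n";   # b - the implied arrows written out
```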

Passing References to Subroutines

One of the major advantages of hard references is that they allow us to package up a compound value like an array or hash into a scalar. This allows us to create complex data structures, and it also allows us to pass arrays and hashes into subroutines, keeping them intact.

As we observed earlier, if we combine lists directly, then they merge together. This is handy if we want to create a combined list, but problematic if we want to pass, say, a couple of arrays to a subroutine, since inside the subroutine we will be unable to tell one from the other.

mysub (@array1, @array2);

sub mysub {
   my @combinedarray = @_;

   foreach (@combinedarray) {
      ...
   }
}

References solve this problem by replacing the arrays with array references.

mysub (\@array1, \@array2);

sub mysub {
   my ($arrayref1, $arrayref2) = @_;

   foreach (@$arrayref1) {
      ...
   }
   foreach (@$arrayref2) {
      ...
   }
}

Not only does this solve the problem, but it is also more efficient if the arrays happen to be large ones, because we pass two scalars, and not an indefinite number of values.
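As a complete, runnable illustration of the same idea, here is a minimal sketch. The subroutines longer and count_args are hypothetical helpers written for this example only.

```perl
#!/usr/bin/perl
# passrefs.pl - illustrative sketch
use strict;
use warnings;

# hypothetical helper: returns a reference to the longer of two arrays
sub longer {
    my ($arrayref1, $arrayref2) = @_;
    return @$arrayref1 >= @$arrayref2 ? $arrayref1 : $arrayref2;
}

# hypothetical helper: reports how many arguments it received
sub count_args { return scalar @_ }

my @array1 = (1, 2, 3);
my @array2 = ('a', 'b', 'c', 'd');

my $result = longer(\@array1, \@array2);
print "Longer array: @$result\n";             # Longer array: a b c d

# passed directly, the arrays flatten into seven scalars with no boundary
print count_args(@array1, @array2), "\n";     # 7
print count_args(\@array1, \@array2), "\n";   # 2
```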

However, see the section "Typeglobs" later in this chapter, and also subroutine prototypes in Chapter 7, for two alternative approaches to passing arrays and hashes without using references. Each has its merits and drawbacks.

Finding the Type of a Reference

Perl cannot perform syntax checks to ensure that references are being dereferenced with the correct prefix because the content of a scalar variable is defined at run time, and can change during the lifetime of a program. Consequently, it is occasionally useful to be able to check the type of a reference before we access it. Fortunately, we can find out the type of a reference with the ref function. This is analogous to a "type of" function, but only for references. Since nonreferences are implicitly typed by their prefix, this is all we need.

ref takes a single reference as an argument, or uses $_ if no argument is supplied. It returns a string containing the reference type, or an empty string (a false value) if the argument is not a reference.

$ref = [1, 2, 3];
print "The reference type of $ref is '", ref($ref),"'\n";

When executed, these lines produce a message of the form

The reference type of ARRAY(0x8250290) is 'ARRAY'

The string representation of a reference is the reference type followed by the memory address it points to. While useful for debugging, we cannot convert this back into a reference again, so it is rarely useful otherwise. Conversely, ref returns a string description of the reference type, which is more useful as well as being easier to use.

The values returned for references are strings containing the name of the reference. These are the same names produced when we print a reference, for example, SCALAR, and include those listed in Table 5-1.

Table 5-1. Return Values of ref

Value                Meaning
SCALAR               A scalar reference
ARRAY                An array reference
HASH                 A hash reference
CODE                 A reference to a subroutine, named or anonymous
GLOB                 A reference to a typeglob
IO (or IO::Handle)   A filehandle reference
REF                  A reference to another reference
LVALUE               A reference to an assignable value that isn't a SCALAR, ARRAY, or HASH (e.g., the return value from substr)

In general, the first three reference types on this list are by far the most commonly encountered; see "Complex Data Structures" later in the chapter for an example that uses ref to recursively explore an arbitrarily complex structure of scalars, arrays, and hashes.
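A common use of ref is to choose how to treat a value according to what kind of reference it is. The following is a minimal, non-recursive sketch of that pattern; the subroutine describe is a hypothetical helper, not part of Perl.

```perl
#!/usr/bin/perl
# describe.pl - illustrative sketch
use strict;
use warnings;

# hypothetical helper: describe a value according to its reference type
sub describe {
    my $value = shift;
    my $type = ref $value;    # empty string for a nonreference

    return "plain scalar"                              if $type eq '';
    return "scalar ref to '$$value'"                   if $type eq 'SCALAR';
    return "array ref with " . @$value . " elements"   if $type eq 'ARRAY';
    return "hash ref with keys " . join(',', sort keys %$value)
                                                       if $type eq 'HASH';
    return "code ref returning " . $value->()          if $type eq 'CODE';
    return "other reference type: $type";
}

print describe($_), "\n"
    foreach (42, \"text", [1, 2, 3], {b => 2, a => 1}, sub { 'ok' }, \\"x");
```

The final argument is a reference to a reference, so it falls through to the last line and is reported with the type REF.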

Finding the Type of a Blessed Reference

Blessed references are a very specific and important subclass of hard references, being the primary mechanism by which Perl implements objects and object-oriented programming. They are created by using the bless function on a hard reference to assign a package name to it, converting it from an ordinary reference into an object of the class defined by the package.

The ref function will return the name of the blessed class when called on an object, rather than the type of the underlying reference. In general, this is what we want, because the point of objects is that we treat them as opaque values that hide the details of their implementation from us. In the rare cases that we do want to know the underlying reference type (perhaps because we want to dump out the object's state or save it to a file on disk), we can use the reftype function, which can be found in both the Scalar::Util and attributes modules (attributes is automatically included if we use an attribute, Scalar::Util is the more natural choice if we aren't using attributes—see Chapters 7 and 10 for information on what they do).

#!/usr/bin/perl
# reftype.pl
use warnings;
use strict;

use Scalar::Util qw(reftype);

die "Usage: $0 <object module> ...\n" unless @ARGV;

foreach (@ARGV) {
   my $filename = $_;
   $filename =~ s|::|/|g;
   require "$filename.pm";
   my $obj = new $_;

   print "Object class ", ref($obj), " uses underlying data type ", reftype($obj),
   "\n";
}
}

We can use this script like this:

> perl reftype.pl CGI


Object class CGI uses underlying data type HASH
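The same distinction can be seen without loading any module by blessing a reference ourselves. This sketch also uses the blessed function from Scalar::Util, which returns the class name for an object and undef otherwise; the class name My::Counter is made up for the example.

```perl
#!/usr/bin/perl
# blessedtype.pl - illustrative sketch
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

# My::Counter is a hypothetical class built on a hash
my $obj   = bless { count => 0 }, 'My::Counter';
my $plain = { count => 0 };

print ref($obj), "\n";        # My::Counter
print reftype($obj), "\n";    # HASH
print blessed($obj), "\n";    # My::Counter
print ref($plain), "\n";      # HASH
print defined(blessed($plain)) ? "object\n" : "not an object\n";   # not an object
```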

Reference Counting, Garbage Collection, and Weak References

Perl keeps a count of how many places a reference is stored, and will delete the item being referenced only when no more references to that item exist. This process is called garbage collection, and it is fundamental to Perl's memory management. Unfortunately, if two data structures contain references to each other, then neither of the references will ever reach a count of zero and the data structures will become immortal, at least until the program terminates.

my $node1={next => undef, last => undef, value => "First item" };
my $node2={next => undef, last => $node1, value => "Next Item"};
$node1->{next}=$node2; # create a reference loop
$node1=undef; $node2=undef; # memory leak!

Here $node1 references a hash that at the end of this code contains a reference to the hash pointed to by $node2. At the same time, $node2 contains a reference to the hash pointed to by $node1. The result is that even if we undefine both variables, the hashes will continue to exist in memory. Since we can no longer access them, the memory they are using cannot be reclaimed—a memory leak.

In general, the solution to this problem is to avoid creating loops between referenced data structures, but this is not always a practical solution. In the case of structures like doubly-linked lists (of which the preceding code is a very compressed example), a loop might even be unavoidable. We could use a symbolic reference in place of the hard reference in one or both directions, since symbolic references aren't included in the reference count, but symbolic references are much slower and require that we turn off strict references.

Fortunately, there is a better solution. To help prevent memory leaks in cases like this, we can weaken a hard reference using the weaken routine provided by the Scalar::Util package. Similarly, we can test a reference to see whether it is weak or not with the isweak routine.

#!/usr/bin/perl -w
# weakreference.pl
use Scalar::Util qw(weaken isweak);

my $node1={next => undef, last => undef, value => "First item"};
my $node2={next => undef, last => $node1, value => "Next Item"};
$node1->{next}=$node2; # create a reference loop

weaken $node1->{next};
print "node1->next is ",
    (isweak($node1->{next})?"weak":"hard"),"\n"; # produces 'node1->next is weak'
print "node2 is ",(isweak($node2)?"weak":"hard"),"\n"; # produces 'node2 is hard'
$node1=undef; $node2=undef; # no more memory leak

Now when the variables $node1 and $node2 are changed to undef so that neither of the hashes they point to have any external references, the hash that $node2 originally pointed to has no more strong references to it, so Perl can garbage-collect it. This destroys the one remaining reference to the hash originally pointed to by $node1, so it is now also garbage-collected. Note that we need to weaken the copy of the reference stored in the hash, not the scalar variable $node2, for this to work. It is the reference, not the thing the reference points to, that is made weak. As a side note, we could make the preceding more efficient by weakening the reference at the moment we copy it.

weaken($node1->{next}=$node2);

For more on garbage collection, see Chapter 19 and the special method DESTROY provided by Perl for object instances.
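To see that weakening really does allow the nodes to be reclaimed, we can give them a DESTROY method that records when each one is freed. This is a sketch under stated assumptions: the class My::Node and the @destroyed array exist only for this illustration.

```perl
#!/usr/bin/perl
# weakdestroy.pl - illustrative sketch
use strict;
use warnings;
use Scalar::Util qw(weaken);

my @destroyed;    # records the value of each node as it is freed

{
    package My::Node;    # hypothetical class, exists only to report destruction
    sub new     { my ($class, $value) = @_; return bless { value => $value }, $class }
    sub DESTROY { push @destroyed, $_[0]{value} }
}

{
    my $node1 = My::Node->new("First item");
    my $node2 = My::Node->new("Next item");
    $node2->{last} = $node1;     # strong link backward
    $node1->{next} = $node2;     # link forward, completing the loop
    weaken($node1->{next});      # the forward link no longer counts
}   # both variables go out of scope here

print "Nodes destroyed: ", scalar(@destroyed), "\n";    # Nodes destroyed: 2
```

If the call to weaken is removed, neither DESTROY method runs when the block ends, because each hash is still holding a counted reference to the other.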

Symbolic References

Symbolic references, as opposed to hard references, are simply descriptions of variables represented as text. More accurately, they contain a label that holds the name of a typeglob, which in turn provides access to the scalar, array, hash, filehandle, or subroutine with that name.

For instance, the symbolic reference for the variable @array is the string "array". Here is an example of a symbolic reference in action:

#!/usr/bin/perl
# symbolic_ref.pl
use warnings;
use strict;
no strict 'refs';
our @array = (1, 2, 3);   # only package variables allowed
my $symref = 'array';
my $total = $#$symref;
$total++;
print "$symref has $total elements\n";
foreach (@$symref) {
   print "Got: $_\n";
}

The notation for symbolic references is exactly the same as it is for hard references—in both cases we say things like @$arrayref to dereference the reference. The distinction is that in the case of a hard reference, the scalar variable contains an immutable pointer, whereas in the case of a symbolic reference, it contains an all-too-mutable string. We can even construct the reference name from pieces and evaluate the result, using braces to disambiguate the assembled reference from the surrounding code.

our %hash = ( "key" => "value" );   # must be a package variable, not a lexical
my $value=${"ha".lc('S').chr(ord('g')+1)}{key}; #assigns 'value';

A symbolic reference can only refer to a global variable, or to be more technical, a variable that exists in the symbol table, though the reference itself can be lexical. We cannot therefore refer to variables that have been declared with my. This is a significant caveat. If the symbolic reference is unqualified, it is presumed to be a reference to a variable in the current package; otherwise it refers to the variable in the named package.

my $symbolrefinotherpackage = 'My::Other::Package::variable';

Be careful if assembling a symbolic reference that contains double colons, though, especially if interpolating variables into a string—Perl will parse a double-colon the moment it sees it and will dereference a name within the string if it looks valid. Use backslashes to escape the colons to prevent this from happening, or construct the string in another way. Consider this faulty attempt to access a variable %hash in a package provided via a string variable, and the three different valid ways that follow it:

my $class="main";
my $value=${"$class::hash"}{value};    # ERROR - will dereference '$class::hash'
                                       # within the string

my $value=${"$class\:\:hash"}{value};  # no longer looks like symbolic ref, OK
my $value=${$class."::hash"}{value};   # concatenate strings, also OK
my $value=${"${class}::hash"}{value};  # disambiguate with braces
                                       # also OK but hard to read
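The three valid forms can be checked side by side in a short script. This is a minimal sketch, assuming a package hash %hash in main with a single key named value; the @results array is only there to collect the answers.

```perl
#!/usr/bin/perl
# symrefcolons.pl - illustrative sketch
use strict;
use warnings;

our %hash = (value => 'found');    # package variable %main::hash
my $class = 'main';
my @results;

{
    no strict 'refs';              # symbolic references allowed in this block only
    push @results, ${ $class . '::hash' }{value};    # concatenation
    push @results, ${ "${class}::hash" }{value};     # braces around the name
    push @results, ${ "$class\::hash" }{value};      # escaped colon
}

print "@results\n";    # found found found
```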

Since symbolic references do not have a type, we can dereference any variable whose name matches the reference by prefixing it with the appropriate symbol.

my $symref = "Package::variable";

my $scalar = $$symref;
my @array  = @$symref;
my %hash   = %$symref;
my $result = &$symref();   # calls the subroutine of that name
*glob      = *$symref;     # typeglobs cannot be declared with my

Because symbolic references are mutable, they are not counted as references for the purpose of garbage collection (see the discussion of weak references earlier) and are banned by the strict module by default.

use strict;   # strict 'vars', 'subs' and 'refs'

To enable symbolic references, we therefore have to make special dispensation.

no strict 'refs';

Since this is not in general an advisable idea (we should generally use the strict pragma unless we are writing "throwaway" code), it is best to do this inside a subroutine or other lexically limited scope, where the range of permissibility of symbolic references is clearly defined.
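One way to confine symbolic references is to relax strict refs only within the subroutine that needs them. The sketch below assumes a package array @array in main; count_elements is a hypothetical helper written for this example.

```perl
#!/usr/bin/perl
# limitedrefs.pl - illustrative sketch
use strict;
use warnings;

our @array = (1, 2, 3);    # package variable, visible via the symbol table

# hypothetical helper: counts the elements of a package array given its name
sub count_elements {
    my $name = shift;
    no strict 'refs';      # permitted only until the end of this subroutine
    return scalar @{"main::$name"};
}

print count_elements('array'), "\n";    # 3

# outside the subroutine, strict refs still applies
my $symref = 'array';
my $ok = eval { my @copy = @$symref; 1 };
my $error = $ok ? '' : $@;
print "Still protected: $error" unless $ok;
```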

The reason for restricting the use of symbolic references is that it is very easy to accidentally create a symbolic reference where we did not mean to, especially if we have warnings disabled (which we should never do globally, but might do temporarily inside a subroutine). However, a few places do allow symbolic references as special cases, e.g., functions that take filehandles as arguments (like print).

Complex Data Structures

Combining lists and hashes with references allows us to create arbitrarily complex data structures such as lists of lists, hashes of hashes, and lists of hashes of lists of lists, and so on. However, Perl lacks the ability to explicitly declare things like multidimensional arrays, because lists and hashes can only contain scalar values.

The Problem with Nesting—My Lists Went Flat!

One consequence of not being able to declare multidimensional arrays explicitly is that nesting lists does not work the way we might expect it to. A seemingly obvious way to store one list in another would be to write

@inner = (3, 4);
@outer = (1, 2, @inner, 5, 6);

We would then like to be able to access the inner list by writing $outer[2] and then access its elements with something like $outer[2][1]. Unfortunately, this does not work because the preceding example does not produce a list containing another list. Instead the lists are "flattened," the inner list being converted into its elements and integrated into the outer list. The preceding example actually results in this:

@outer = (1, 2, 3, 4, 5, 6);

While this is a perfectly acceptable way to merge lists together, it does not produce the nested data structure that we actually intended. The heart of the problem is that Perl does not allow lists and hashes to store other lists or hashes as values directly. Instead we must store a reference (which is a scalar value) to the hash or list we want to nest.

We can fix the flattening problem using either of the modified examples that follow, the first using square brackets to construct a reference and the second using a backslash to get the reference to the original array.

@outer = (1, 2, [@inner], 5, 6);   # using square brackets
@outer = (1, 2, \@inner, 5, 6);    # using a backslash

Note that the second example avoids duplicating the inner array by taking a direct reference but assumes we only do this once. In a loop this would cause duplicated references, which we probably did not intend. For more on this issue, see "Creating Complex Data Structures Programmatically" later in this chapter.
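The duplicated-reference problem is easy to reproduce. In this sketch the same @inner array is refilled on each pass of a loop; the arrays @shared and @copied are illustrative names.

```perl
#!/usr/bin/perl
# loopdup.pl - illustrative sketch
use strict;
use warnings;

my (@shared, @copied);
my @inner;

foreach my $n (1 .. 3) {
    @inner = ($n, $n * 10);
    push @shared, \@inner;     # every entry refers to the same @inner
    push @copied, [@inner];    # each entry is a fresh anonymous copy
}

my $shared = join ' ', map { join('-', @$_) } @shared;
my $copied = join ' ', map { join('-', @$_) } @copied;

print "$shared\n";    # 3-30 3-30 3-30
print "$copied\n";    # 1-10 2-20 3-30
```

All three entries of @shared point at the one array, so they all show whatever it held last; the square brackets take a snapshot each time.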

Now we know how to construct complex data structures, we can go on to create more complex animals like lists of lists and hashes of hashes.

Lists of Lists and Multidimensional Arrays

The way to create a list of lists is to create a list of list references, either with the square bracket notation, [...], or using the backslash operator. Defining a list of lists is actually quite simple. The following example shows a list of lists defined using square brackets:

@array = (
   ["One", "Two", "Three"],
   ["Red", "Yellow", "Blue"],
   ["Left", "Middle", "Right"],
);

The important point to note about this is that the outer array contains a list of references—one for each inner list. The result is, in effect, a two-dimensional array that we can access using two sets of indices.

print $array[0][2];   #displays third element of first row - 'Three'
print $array[2][1];   #displays second element of third row - 'Middle'

This is actually a piece of Perl shorthand, in deference to languages like C where real multidimensional arrays can be laid out in memory as a contiguous block and accessed using multiple indices (which translate into pointer arithmetic). In Perl an index is just a count into an array, so the value of $array[0] is in fact a reference, which we should not be able to tack a [2] onto. In other words, we would expect to have to write

print $array[0] -> [2];

This does indeed work, because this is exactly what happens internally. Perl is clever enough to automatically spot multiple indices and do the additional dereferencing without having to be told explicitly.

We can retrieve an individual array row by using one index, which will give us a scalar array reference, as we just observed.

$second_row = $array[1];

We can dereference this reference to get an array.

@second_row = @{$array[1]};

There is an important difference between using $second_row and @second_row, however. $second_row is a reference to the second row of the original multidimensional array, so if we modify the array that $second_row points to, we are actually affecting the original array.

print $array[1][1];   #prints 'Yellow'
$second_row->[1] = "Green";
print $array[1][1];   #prints 'Green'

By contrast, @second_row contains a copy of the second row (because the assignment is actually a straight array copy), so modifying it does not affect @array. This distinction can be very important when working with complex data structures since we can affect values we did not mean to, or conversely, not modify the contents of arrays that we meant to.
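Both behaviors can be seen together in a short script. This is a minimal sketch using a two-row version of the array; the replacement value "Purple" is arbitrary.

```perl
#!/usr/bin/perl
# rowcopy.pl - illustrative sketch
use strict;
use warnings;

my @array = (
    ["One", "Two", "Three"],
    ["Red", "Yellow", "Blue"],
);

my $second_row = $array[1];        # reference to the original row
my @second_row = @{ $array[1] };   # independent copy of the row

$second_row[1] = "Purple";         # changes the copy only
print $array[1][1], "\n";          # Yellow

$second_row->[1] = "Green";        # changes the original row
print $array[1][1], "\n";          # Green
print $second_row[1], "\n";        # Purple
```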

Instead of defining a straight list of lists, we can also define a reference to a list of lists, in which case we just have to modify the outer array definition by replacing the parentheses with square brackets and changing the variable type to a scalar, like so:

$arrayref = [
   ["One", "Two", "Three"],
   ["Red", "Yellow", "Blue"],
   ["Left", "Middle", "Right"],
];

Accessing the elements of this array can be done either by dereferencing the outer array reference $arrayref, or by using the dereferencing operator, ->, to access the underlying array, which is somewhat clearer to read.

print $$arrayref[0][2];
print $arrayref -> [0][2];   #using '->' is clearer

Accessing a row is similar to before, but with an extra layer of dereferencing. Either of the following will do the trick, though again the second is clearer:

$second_row = $$arrayref[1];
$second_row = $arrayref->[1];

Hashes of Hashes and Other Variations

Creating a hash of hashes is similar to creating a list of lists, differing only in our use of syntax. Here is an example of a three-deep nested hash of hashes:

%identities = (
   JohnSmith => {
      Name => { First=>"John", Last=>"Smith" },
      Phone => { Home=>"123 4567890", Work=>undef },
      Address => {
         Street => "13 Acacia Avenue",
         City => "Arcadia City",
         Country => "El Dorado",
      },
   },
   AlanSmithee => {
      Name => { First=>"Alan", Last=>"Smithee" },
      Phone => { Work=>"not applicable" },
   },
);

Accessing this structure is similar too, and again Perl allows us to omit the dereferencing operator for consecutive hash keys.

$alans_first_name = $identities{'AlanSmithee'}{'Name'}{'First'};

Since nesting data structures is just a case of storing references, we can also create lists of hashes, hashes of lists, and anything in between.

#!/usr/bin/perl
# lists.pl
use warnings;
use strict;

my (@list_of_hashes, %hash_of_lists, %mixed_bag, $my_object);
my @my_list = (1,2,3,4,5);

@list_of_hashes = (
   { Monday=>1, Tuesday=>2, Wednesday=>3, Thursday=>4, Friday=>5 },
   { Red=>0xff0000, Green=>0x00ff00, Blue=>0x0000ff },
);
print "Tuesday is the $list_of_hashes[0]{Tuesday}nd day of the week.", "\n";

%hash_of_lists = (
   List_1 => [1, 2, 3],
   List_2 => ["Red", "Yellow", "Blue"],
);
print "The second element of List_1 is: $hash_of_lists{List_1}[1]", "\n";

%mixed_bag = (
   Scalar1 => 3,
   Scalar2 => "Hello World",
   List1 => [1, 2, 3],
   Hash1 => { A => 'Horses', C => 'Miles' },
   List2 => ['Hugh','Pugh',
             ['Barley-McGrew','Cuthbert'],
             'Dibble', 'Grubb'],
   Scalar3 => $my_object,
   Hash2 => { Time => [ gmtime ],
              Date => scalar(gmtime),
   },
   List3 => [ @my_list[0..2] ],   # brackets stop the slice flattening into the hash
);

print $mixed_bag{List2}[2][1]; # produces 'Cuthbert'

Adding to and Modifying Complex Data Structures

Manipulating nested data structures is essentially no different to manipulating simple ones; we just have to be sure to modify the correct thing in the right way. For example, to add a new row to our two-dimensional array, we can either define the row explicitly, or use the push function to add it. In either case we have to be sure to add a reference, not the list itself, or we will end up adding the list contents to the outer array instead.

# Right - adds a reference
$array[2] = \@third_row;   #backslash operator creates reference to array
push @array, ["Up", "Level", "Down"];   #explicit reference
push @array, \@third_row;   #backslashed reference

# ERROR: this is probably not what we want
$array[2] = (8, 9, 10);   # $array[2] becomes 10, the 8 and 9 are discarded
push @array, @third_row;   # contents of @third_row added to @array

In the first wrong example, we will get a warning from Perl about the useless use of a constant in void context. The second example, which is perfectly legal Perl, will not generate any warnings. This is consequently one of the commonest sources of bugs when manipulating complex data structures. The way to avoid it is to be extremely clear and consistent about the structure of the data, and to avoid complicated mixtures of scalars, lists, and hashes unless their use is transparent and obvious.

Modifying the contents of nested lists and hashes is likewise simple. We have already seen how to replace a row in a list of lists, but we can also replace individual elements and array slices.

# Right
$array[2][1] = 9;   #replace an individual element
$array[2][12] = 42;   #grow the list by adding an element

@{$array[2]} = (8, 9, 10);   #replace all the elements
@{$array[2]}[1..2] = (9, 10);   #replace elements 2 and 3, keeping 1

# ERROR: Wrong
$array[2][1..2] = (9, 10);   #cannot take a slice of a list reference

The essential point to remember is that this is no different from manipulating simple lists and hashes, so long as we remember that we are really working through references. Perl allows a shorthand for indices when accessing elements, but this doesn't extend to array slices or more complex manipulations, so we need to handle the reference ourselves in these cases.
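The following short sketch illustrates handling the reference ourselves when slicing. The data and the variable names @tail and $arrayref are assumptions made up for the illustration:

```perl
#!/usr/bin/perl
# slices.pl - illustrative sketch only
use warnings;
use strict;

my @array = ( [1, 2, 3], [4, 5, 6], [7, 7, 7] );

# read a slice of the third row by dereferencing the row first
my @tail = @{ $array[2] }[1..2];

# assign to a slice of a row in the same way
@{ $array[2] }[1..2] = (9, 10);

# the same idea when we start from a reference to the outer array
my $arrayref = \@array;
@{ $arrayref->[0] }[0, 1] = ("a", "b");

print "@{ $array[2] }\n";   # 7 9 10
print "@{ $array[0] }\n";   # a b 3
```

In each case the row reference is dereferenced inside @{ } before the slice subscript is applied.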

Creating Complex Data Structures Programmatically

Explicitly writing the code to define a complex structure is one way to achieve our goal, but we might also want to generate things like lists of lists programmatically. This is actually straightforward, but a couple of nasty traps lurk for the unwary Perl programmer. Here is a loop that appears to create a list of lists, but actually constructs a list of integers:

#!/usr/bin/perl
# complex1.pl
use warnings;
use strict;

my (@outer, @inner);
foreach my $element (1..3) {
   @inner = ("one", "two");
   $outer[$element] = @inner;
}
print '@outer is ', "@outer \n";

Running this program produces the following output:

> perl complex1.pl


Use of uninitialized value in join at complex1.pl line 11.
@outer is  2 2 2

Although this might appear correct, we are in fact assigning an array in scalar context. An array evaluated in scalar context yields its number of elements, so each pass through the loop stores the count of @inner, which is 2, in an element of @outer. This is why @outer consists of three twos rather than three references to arrays containing the elements one and two. The warning arises because the loop starts at index 1, which leaves $outer[0] undefined.

The following variant is also defective—it suffers from list flattening, so the contents of all the inner arrays will be merged into the outer array:

#!/usr/bin/perl
# complex2.pl
#ERROR: list flattening
use warnings;
use strict;
my (@outer, @inner);
foreach my $element (1..3) {
   @inner = ("one", "two");
   push @outer, @inner;
}
print '@outer is ', "@outer \n";

If we run this program we see the following output:

> perl complex2.pl


@outer is one two one two one two

The correct thing to do is to assign references, not lists. The following loop does the task we actually wanted. Note the additional square brackets.

#!/usr/bin/perl
# complex3.pl
use warnings;
use strict;

my (@outer, @inner);
foreach my $element (1..3) {
   @inner = ("one", "two");
   push @outer, [@inner];   #push reference to copy of @inner
}
print '@outer is ', "@outer \n";

Running this program produces output like this:

> perl complex3.pl


@outer is ARRAY(0x176f0d0) ARRAY(0x176505c) ARRAY(0x17650bc)

Note that @outer consists of three different arrays despite the fact that @inner didn't change. The reason is that the square brackets build a new anonymous array from the contents of @inner on each iteration, and each of these copies has its own address.

We have already referred to the important distinction between creating a reference with square brackets and using the backslash operator to take a reference to the list. In the preceding code, the brackets make a copy of the contents of @inner and return a reference to the copy, which is pushed onto the end of @outer. By contrast, a backslash returns a reference to the original list, so the following apparently equivalent code would not work:

#!/usr/bin/perl
# complex4.pl
use warnings;
use strict;

my (@outer, @inner);
foreach my $element (1..3) {
   @inner = ("one", "two");
   push @outer, \@inner;   #push reference to @inner
}
print '@outer is ', "@outer \n";

When run, this program produces output like the following:

> perl complex4.pl


@outer is ARRAY(0x1765188) ARRAY(0x1765188) ARRAY(0x1765188)

What actually happens is that the @outer array is filled with the same reference to the @inner array three times. Each time around the loop the @inner array is filled with a new pair of elements, but the elements of @outer all point to the same list, the current contents of @inner. At the end of the loop, all the elements of @outer are identical and only two different elements are actually stored in total.

Another way to approach this task, avoiding the pitfalls of accidentally creating duplicate references or counting lists we meant to assign as references, is to use references explicitly. This makes it much harder to make a mistake, and also saves a list copy.

#!/usr/bin/perl
# complex5.pl
use warnings;
use strict;

my (@outer, $inner_ref);
foreach my $element (1..3) {
   $inner_ref = ["one", "two"];
   push @outer, $inner_ref;   #push scalar reference
}
print '@outer is ', "@outer \n";

Running this program results in

> perl complex5.pl


@outer is ARRAY(0x176f0ac) ARRAY(0x1765044) ARRAY(0x17650a4)

Rather than redefining a list, this time we redefine a list reference, so we are guaranteed not to assign the same reference more than once. Finally, another way to ensure that we don't assign the same array is to create a new array each time by declaring @inner inside the loop.

#!/usr/bin/perl
# complex6.pl
use warnings;
use strict;

my @outer;
foreach my $element (1..3) {
   my @inner = ("one", "two");
   push @outer, \@inner;   #push reference to @inner
}
print '@outer is ', "@outer \n";

If we run this program, we see

> perl complex6.pl


@outer is ARRAY(0x17651b8) ARRAY(0x176f0d0) ARRAY(0x1765074)

Here @inner is declared each time around the loop, and remains in scope for that iteration only. At the start of each new iteration, the old definition of @inner is discarded and replaced by a new one (note that while the elements of @inner don't change, their addresses change). As with the explicit reference example, this is also more efficient than using square brackets since no additional array copy takes place; however, it is more prone to bugs if we omit warnings since there is nothing programmatically wrong with assigning the same reference multiple times, even if it wasn't what we actually intended.

Although we have only discussed lists of lists in this section, exactly the same principles also apply to any other kind of complex data structure such as hashes of hashes or hybrid structures; just substitute braces, {}, for square brackets and percent signs for @ signs where appropriate.
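As a sketch of how the same principles carry over to hashes, the following builds a hash of hashes in a loop. The key names and the %shared hash are illustrative assumptions:

```perl
#!/usr/bin/perl
# complexhash.pl - illustrative sketch only
use warnings;
use strict;

my %outer;
foreach my $name ('A', 'B', 'C') {
   my %inner = (one => 1, two => 2);   # a new hash on every iteration
   $outer{$name} = \%inner;            # so each stored reference is distinct
}

# alternatively, braces copy the contents into a new anonymous hash
my %shared = (one => 1, two => 2);
$outer{D} = { %shared };

$outer{A}{one} = 100;   # affects only the hash stored under 'A'
print "$outer{A}{one} $outer{B}{one} $outer{D}{one} $shared{one}\n";   # 100 1 1 1
```

Because %inner is declared inside the loop and the braces copy %shared, no two entries of %outer refer to the same underlying hash.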

Traversing Complex Data Structures

Iterating over simple data structures is easy, as we saw when we covered arrays and hashes earlier. Traversing more complex structures is also simple if they are homogeneous (that is, each level of nesting contains the same type of reference and we don't have other data types like scalars or undefined values lurking). Here's a simple loop that iterates over a list of lists:

#!/usr/bin/perl
# simple1.pl
use warnings;
use strict;

my @outer = (['a1', 'a2', 'a3'], ['b1', 'b2', 'b3'], ['c1', 'c2', 'c3']);

foreach my $outer_el (@outer) {
   foreach (@{$outer_el}) {
      print "$_ ";
   }
   print "\n";
}

And here's one that iterates over a hash of hashes:

#!/usr/bin/perl
# simple2.pl
use warnings;
use strict;

my %outer = (A=> {a1=>1, a2=>2, a3=>3}, B=> {b1=>4, b2=>5, b3=>6},
             C=> {c1=>7,c2=>8, c3=>9});

foreach my $outer_key (keys %outer) {
   print "$outer_key => \n";
   foreach (keys %{$outer{$outer_key}} ) {
      print "\t$_ => $outer{$outer_key}{$_} \n";
   }
   print "\n";
}

Finally, here is another list-of-lists loop that also prints out the indices and catches undefined rows:

#!/usr/bin/perl
# simple3.pl
use warnings;
use strict;

my @outer;
@outer[1, 2, 5] = (['First', 'Row'], ['Second', 'Row'], ['Last', 'Row']);

for my $outer_elc (0..$#outer) {
   if ($outer[$outer_elc]) {
      my $inner_elcs = $#{ $outer[$outer_elc] };
      print "$outer_elc : ", $inner_elcs+1, " elements \n";
      for my $inner_elc (0..$inner_elcs) {
         print "\t$inner_elc : $outer[$outer_elc][$inner_elc] \n";
      }
   } else {
      print "Row $outer_elc undefined \n";
   }
}

Traversing other structures is just a matter of extending these examples in the relevant direction. Things become more complex, however, if our structures contain a mixture of different data types. In most cases when we have structures like this, it is because different parts of the structure have different purposes, and we would therefore not normally want to traverse the whole structure. But it can be useful for debugging purposes; so in order to handle structures that could contain any kind of data, we can resort to the ref function. The following recursive subroutine will print out nested scalars (which includes objects), lists, and hashes to any level of depth, using ref to determine what to do at each stage:

#!/usr/bin/perl
# print_struct.pl
use warnings;
use strict;

my $mixed = [
    'scalar', ['a', 'list', ['of', 'many'], 'values'],
    {And=>{'A Hash'=>'Of Hashes'}}, \'plus a scalar ref'
];

print_structure($mixed);

sub print_structure {
   my ($data, $depth) = @_;

   $depth=0 unless defined $depth; #for initial call

   foreach (ref $data) {
      /^$/ and print($data,"\n"), next;
      /^SCALAR/ and print('-> ', $$data, "\n"), next;
      /^HASH/ and do {
         print "\n";
         foreach my $key (keys %{$data}) {
            print "\t" x $depth, "$key => ";
            print_structure ($data->{$key}, $depth+1);
         }
         next;
      };
      /^ARRAY/ and do {
         print "\n";
         for my $elc (0..$#{$data}) {
            print "\t" x $depth, "[$elc] : ";
            print_structure ($data->[$elc], $depth+1);
         }
         next;
      };
      # it is something else - an object, filehandle or typeglob
      print "?$data?";
   }
}

If all we are interested in is debugging data structures, then we can have the Perl debugger do it for us, as this example demonstrates (there is much more on the Perl debugger in Chapter 17):

> perl -d -e 1;


Default die handler restored.

Loading DB routines from perl5db.pl version 1.07
Editor support available. Enter h or 'h h' for help, or 'man perldebug' for more
help.

main::(-e:1):   1
DB<1> $hashref={a=>1,b=>2,h=>{c=>3,d=>4},e=>[6,7,8]}

DB<2> x $hashref
0  HASH(0x82502dc)
   'a' => 1
   'b' => 2
   'e' => ARRAY(0x8250330)
      0  6
      1  7
      2  8
   'h' => HASH(0x80f6a1c)
      'c' => 3
      'd' => 4
DB<3>

Here we have just used the debugger as a kind of shell, created a hash containing an array and another hash, and used the x command of the Perl debugger to print it out in a nice legible way for us.

Several Perl modules perform similar functions. Notably, the Data::Dumper module generates a string containing a formatted Perl declaration that, when executed, constructs the passed data structure.

#!/usr/bin/perl
# datadumper.pl
use warnings;

use Data::Dumper;

my $hashref = {a=>1, b=>2, h=>{c=>3, d=>4}, e=>[6, 7, 8]};

print Dumper($hashref);

Running this program produces the output that follows:

> perl datadumper.pl


$VAR1 = {
   'e' => [
      6,
      7,
      8
   ],
   'h' => {
      'c' => 3,
      'd' => 4
   },
   'a' => 1,
   'b' => 2
};

Note that the output of Data::Dumper is actually Perl code. We can also configure it in a variety of ways, most notably by setting the value of $Data::Dumper::Indent (which ranges from 0 to 3, each producing an increasing level of formatting, with 2 being the default) to control the style of output. Finally, if we want to store complex data structures in a file, then we will also want to look at modules like Data::Dumper, FreezeThaw, and Storable, and possibly also the MLDBM module or DBM::Deep.
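Because the dump is Perl code, it can be evaluated to rebuild the structure. The following is a minimal sketch of that round trip. It assumes the default variable name $VAR1 that Data::Dumper writes into its output, and declares a lexical of that name so that the evaluated code compiles under use strict:

```perl
#!/usr/bin/perl
# roundtrip.pl - illustrative sketch only
use warnings;
use strict;
use Data::Dumper;

my $hashref = {a=>1, b=>2, h=>{c=>3, d=>4}, e=>[6, 7, 8]};

$Data::Dumper::Indent   = 1;   # fixed, more compact indentation
$Data::Dumper::Sortkeys = 1;   # emit hash keys in sorted order

my $dumped = Dumper($hashref);   # a string of Perl code: '$VAR1 = {...};'

my $VAR1;       # the name Data::Dumper uses by default
eval $dumped;   # run the code, which assigns to our $VAR1
die "Could not rebuild structure: $@" if $@;

print "Rebuilt element: $VAR1->{e}[2]\n";   # Rebuilt element: 8
```

Evaluating a string runs whatever code it contains, so this approach is only appropriate for dumps that come from a trusted source.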

Typeglobs

The typeglob is a composite data type that contains one instance of each of the other data types; it is an amalgam (or in Perl-speak, glob) of all Perl's data types, from which it gets its name. It is a sort of super reference whose value is not a single reference to something but six slots that can contain six different references, all at once:

scalar   A reference to a scalar
array    A reference to an array
hash     A reference to a hash
code     A code reference to a subroutine
handle   A file or directory handle
format   A format definition

Typeglob programming is a little obscure and rather lower level than many programmers are entirely happy with. It is actually quite possible (and even recommended) to avoid typeglobs in everyday Perl programming, and there are now few reasons to use typeglobs in Perl programs. In ancient days, before references were invented, typeglobs were the only way to pass arguments into subroutines by reference (so they could be assigned to) instead of by value.

The other common use of typeglobs was to pass filehandles around, since filehandles have no specific syntax of their own and so cannot be passed directly. The IO::Handle, IO::File, and IO::Dir modules have largely replaced typeglobs for dealing with filehandles, but since the IO:: family of modules is comparatively bulky next to a fundamental built-in datatype, typeglobs are still a popular choice for dealing with filehandles and will often be seen in older Perl code.

Defining Typeglobs

Typeglobs are defined using an asterisk prefix, in exactly the same way as scalars are prefixed with a dollar sign, or arrays with an at-sign. To create a typeglob, we need only assign a value to it. The most obvious example is assigning a typeglob from another typeglob:

*glob = *anotherglob;

This copies all the six references (which need not all be defined) held in anotherglob to the typeglob glob. For example:

$message = "some text";
*missive = *message;
print $missive;   # produces "some text"

Alternatively, we can assign references individually.

*glob = \$scalar;

This creates a new typeglob containing a defined scalar reference, and an undefined value for the other five. We can access this new scalar value with

print $glob;   # access typeglob scalar reference

Assigning a scalar reference to a typeglob creates a new variable called $glob that contains the same value as the original scalar.

Interestingly, we can then fill other slots of the typeglob without affecting the ones currently defined (unless of course we overwrite one). Perl treats glob assignments intelligently, and only overwrites the part of the glob that corresponds to the reference being assigned to it, a property unique amongst Perl's data types. The following statement fills the array reference slot, but leaves the scalar reference slot alone:

*glob = \@array;

By filling in the array slot, we create a variable called @glob, which points to the same values as the original @array; changing either variable will cause the other to see the same changes. The same applies to our earlier $glob variable. Changing the value of $glob also changes the value of $scalar, and vice versa. This is called variable aliasing, and we can use it to great effect in several ways on variables, subroutines, and filehandles.
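The aliasing described above can be sketched as follows. The example assumes package variables declared with our, since typeglobs live in the symbol table and lexical (my) variables cannot be aliased this way:

```perl
#!/usr/bin/perl
# alias.pl - illustrative sketch only
use warnings;
use strict;

our ($scalar, @array) = ("original", 1, 2, 3);
our ($glob, @glob);   # declare the names we are about to alias

*glob = \$scalar;   # fills only the scalar slot
*glob = \@array;    # fills only the array slot, leaving the scalar slot alone

$glob = "changed via alias";   # also changes $scalar
push @glob, 4;                 # also changes @array

print "$scalar\n";   # changed via alias
print "@array\n";    # 1 2 3 4
```

Both names now refer to the same underlying storage, so a change made through either one is visible through the other.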

The upshot of this is that we rarely need to access a typeglob's slots directly, since we can simply access the relevant variable (the exception is, of course, filehandles, which do not have their own syntax for direct access), but we can play some interesting tricks by assigning to typeglobs.

Manipulating Typeglobs

We have already seen how we can create aliases for scalars and arrays (the same applies to hashes too, of course):

*glob = \$scalar;   # create $glob as alias for $scalar
*glob = \@array;    # create @glob as alias for @array
*glob = \%hash;     # create %glob as alias for %hash

If we assign the typeglob to a new name, we copy all three references. For example, the following statement invents the variables $glob2, @glob2, and %glob2, all of which point to the same underlying values as the originals:

*glob2 = *glob;

So far we have considered only the three standard variable types, but typeglobs also contain a code reference slot, which is how Perl defines subroutine names. A roundabout way to define a named subroutine is to assign a code reference to a typeglob.

*subglob = sub {return "An anonymous subroutine?"};

or

*subglob = \&mysubroutine;

Both of these assignments cause a subroutine called subglob to spring into existence. The first demonstrates that the only difference between a named and an anonymous subroutine (see Chapter 7 for more on subroutines) is a typeglob entry. The second creates an alias for the subroutine mysubroutine, so we can now call mysubroutine or subglob with equal effect.

# these two statements are identical
print mysubroutine(@args);
print subglob(@args);

Both typeglobs contain the same code reference, so the two names are simply two different ways to refer to the same thing.
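A brief sketch of subroutine aliasing follows. The body of mysubroutine is invented purely for the illustration:

```perl
#!/usr/bin/perl
# subalias.pl - illustrative sketch only
use warnings;
use strict;

sub mysubroutine { return "called with: @_" }

*subglob = \&mysubroutine;   # install an alias in the code slot

my $via_alias    = subglob(1, 2);
my $via_original = mysubroutine(1, 2);

print "$via_alias\n";   # called with: 1 2
print "same code\n" if \&subglob == \&mysubroutine;
```

Comparing the two code references numerically confirms that both names lead to the same subroutine.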

Accessing Typeglobs

If we want to access the different parts of a typeglob, we can do so by casting it into the appropriate form. For example:

# assign a new KEY to %glob
${*glob}{$KEY} = $value;

The same approach works for ${*glob}, @{*glob}, and &{*glob}, which access the scalar, array, and subroutine parts of the typeglob, respectively. However, we cannot do the same for filehandles or report formats, since they do not have a prefix.

We can also access the different parts of a typeglob directly. This uses a notation similar to hashes, but with a typeglob rather than a scalar prefix. There are five slots in a typeglob that can be accessed (reports being the exception). Each has its own specific key that returns the appropriate reference, or undef if the slot is not defined.

$scalarref = *glob{SCALAR};
$arrayref = *glob{ARRAY};
$hashref = *glob{HASH};
$subref = *glob{CODE};
$fhref = *glob{IO};

We can also generate a reference to the typeglob itself with

$globref = *glob{GLOB};

The unqualified name of the typeglob, without any package prefix, is available via NAME.

$globname = *glob{NAME}; # returns the string 'glob'

Much of the time we do not need to access the contents of a typeglob this way. Scalar, array, hash, and code references are all more easily accessed directly. Perl's file handling functions are also smart, in that they can spot a typeglob and extract the filehandle from it automatically.

print STDOUT "This goes to standard output";

print *STDOUT "The same thing, only indirectly";
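The same behavior lets us hand a filehandle to a subroutine. The sketch below uses a hypothetical subroutine, write_line, and passes STDOUT both as a typeglob and as a reference to a typeglob:

```perl
#!/usr/bin/perl
# passhandle.pl - illustrative sketch only
use warnings;
use strict;

# write_line is a made-up helper: it prints its remaining arguments,
# followed by a newline, to the handle passed as the first argument
sub write_line {
   my ($handle, @text) = @_;
   return print {$handle} @text, "\n";   # print returns true on success
}

my $ok_glob    = write_line(*STDOUT,  "via the typeglob itself");
my $ok_globref = write_line(\*STDOUT, "via a reference to the typeglob");
```

In both calls print extracts the filehandle from what it is given, so the subroutine does not need to know which form was passed.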

Assigning a typeglob to anything other than another typeglob causes it to be interpreted like a reference; that is, the name of the typeglob, complete with package specifier and asterisk prefix, is written into the scalar.

$globname = *glob;
print $globname;   # produces '*main::glob'

This is basically just a way to create a symbolic reference to a typeglob, which is getting dangerously abstract and obscure, and is exactly the sort of thing that use strict was implemented to prevent.

*$globname = *anotherglob;   # aliases '*anotherglob' to '*glob'

However, it does have one use, which comes about because we can refer to filehandles via their typeglobs, coupled with the fact that Perl's file handling functions accept the name of a filehandle (in a string) as a substitute for the filehandle itself.

We can take a reference to a typeglob in the usual manner, and then access it via the reference.

my $globref = \*glob;

$scalarref = *{$globref}{SCALAR};

Since a glob reference is very much like any other reference, a scalar, we can store it in an array element, a hash value, or even another glob.

*parentglob = $globref;

The Undefined Value

The undefined value is a curious entity, being neither a scalar, list, hash, nor any other data type. Although it isn't strictly speaking a datatype, it can be helpful to think of it as a special datatype with only one possible value. It isn't any of the other data types, and so cannot be confused for them. We can assign an undefined value to a scalar variable, or anywhere else a literal value may live, so the undefined value can also be considered a special case of a scalar value. Conveniently, it evaluates to an empty string (or zero, numerically), which is a false value, so we can ignore its special properties in Boolean tests if we wish, or check for it and handle it specially if we need to. This dual nature makes the undefined value particularly useful.

It is common to initialize a scalar variable with undef to indicate that it is not meant to have a defined value initially. This is technically not necessary because when we declare a variable, Perl automatically initializes its value to undef unless we provide one (from Perl 5.8.4 onwards the assignment is even optimized away at compile time for efficiency). The following statements are therefore equivalent:

my $undefinedtostartwith=undef; # explicitly undefined
my $undefinedtostartwith;       # implicitly undefined

The concept of a value-that-is-not-a-value is common to many languages. In Perl, the undef function returns an undefined value, performing the same role as NULL does in C—it also undefines variable arguments passed to it, freeing the memory used to store their values. If we declare a variable without initializing it, it automatically takes on the undefined value too. Perl also provides the defined function that tests for the undefined value and allows us to distinguish it from an empty string or numeric zero.

$a = undef;           # assign undefined value to $a
$b;                   # $b has never been assigned, so it is undefined too
$a = 1;               # define $a
print defined($a);    # produces '1'
undef $a;             # undefine $a
print defined($a);    # produces '' (the empty string, which is false)

The undefined value is returned by many of Perl's built-in functions to indicate an error or an operation that did not complete. Since many operations cannot legally return any other value for failure, undef becomes a useful way to indicate failure because it is not a real value. We can distinguish between undef and zero with the defined function, as the following example demonstrates. The main code passes a filename to a subroutine called get_results and handles three different possible outcomes, one "success" and two different kinds of "failure."

#!/usr/bin/perl
# undef.pl
use warnings;
use strict;

# get a filename
my $file = $ARGV[0] or die "Usage $0 <result file> \n";

# process and return result
my $result = get_results($file);
# test result
if ($result) {
   print "Result of computation on '$file' is $result \n";
} elsif (defined $result) {
   print "No results found in file \n";
} else {
   print "Error - could not open file: $! \n";
}

# and the subroutine...
sub get_results {
   # return 'undef' to indicate error
   open RESULTS, $_[0] or return undef;

   # compute result (simple sum)
   my $file_result = 0;
   while (<RESULTS>) {
      $file_result += $_;
   }

   # return result, 0 if file empty
   return $file_result;
}

The get_results subroutine uses undef to distinguish between two different but equally possible kinds of nonresult. It is designed to read results from a file and performs a calculation on them (for simplicity, we've just used a simple sum), returning the result. It is possible that there are no results, so the calculation returns zero. This isn't actually an error, just a lack of result. If the results file is missing, however, that is an error. By passing back undef rather than zero for an error, we can distinguish between these two possible results of calling the subroutine and act accordingly. If we did not care about the reason for the nonresult, we could simplify our code to

if ($result) { # true
   print "Result of computation on '$file' is $result \n";
} else { # false or undef
   print "No results \n";
}

Without an argument to undefine, undef is very much like a value that happens to be undefined; we can treat it almost as a number with no value. However, it is always distinct from a scalar because it returns false when given to the defined function. Having said that, the undefined value does have some things in common with a scalar: it is a single value (in a manner of speaking), and we can even take a reference to it, just like a scalar or list.

my $undefref = \undef;
print defined($$undefref);   # produces '' (false)

Tests of Existence

The defined function tests a value to see if it is undefined, or has a real value. The number 0 and the empty string are both empty values, and test false in many conditions, but they are defined values unlike undef. The defined function allows us to tell the difference.

print "It is defined!" if defined $scalar;

defined comes up short when we use it on hashes, however, since it cannot tell the difference between a nonexistent key and a key with an undefined value, as noted previously. All it does is convert undef to an empty value ('' or 0, depending on the context) and everything else to 1. In order to test for the existence of a hash key, we instead use the exists function.

my %hash = ('A Key' => 'A Value', 'Another Key' => 'Another Value');
print "It exists!" if exists $hash{'A Key'};

Or, in a fuller example that tests for definition as well:

#!/usr/bin/perl
# exists.pl
use strict;
use warnings;

my %hash = ('Key1' => 'Value1', 'Key2' => 'Value2');
my $key = 'Key1';

# the first if tests for the presence of the key 'Key1'
# the second if checks whether the key 'Key1' is defined
if (exists $hash{$key}) {
   if (defined $hash{$key}) {
      print "$key exists and is defined as $hash{$key} \n";
   } else {
      print "$key exists but is not defined \n";
   }
} else {
   print "$key does not exist \n";
}

In a sense, defined is the counterpart of undef and exists is the counterpart of delete (at least for hashes). For arrays, delete and undef are actually the same thing, and exists tests for array elements that have never been assigned to. exists is not applicable to scalar values; use defined for them.

Using the Undefined Value

If we do not define a variable before using it, Perl will emit a warning, if warnings are enabled.

my $a;
print "The value is $a \n";   # produces 'Use of uninitialized value ...'

If warnings are not enabled, undef simply evaluates to an empty string. A loop like the following will also work correctly, even if we do not predeclare the count variable beforehand, because on the first iteration the undefined variable will be evaluated as 0:

#!/usr/bin/perl
# no_warnings.pl;
# warnings not enabled...

while ($a<100) {
   print $a++, "\n";
}

Leaving warnings disabled globally is not good programming, but if we know what we are doing, we can disable them locally to avoid warnings when we know we may be using undefined values. In this case, we should really declare the loop variable, but for illustrative purposes we could use a warnings pragma, or a localized copy of $^W to disable warnings temporarily like this:

# warnings enabled here ...
{
   no warnings;   # use 'local $^W = 0' for Perl < 5.6
   while ($a < 100) {
      print $a++, "\n";
   }
}
# ... and here

Perl is smart enough to let some uses of the undefined value pass, if they seem to be sensible ones. For example, if we try to increment the value of an undefined key in a hash variable, Perl will automatically define the key and assign it a value without complaining about it. This allows us to write counting hashes that contain keys only for items that were actually found, as this letter counting program illustrates:

#!/usr/bin/perl
# frequency.pl
use warnings;
use strict;

sub frequency {
   my $text = join('', @_);
   my %letters;
   foreach (split //, $text) {
      $letters{$_}++;
   }
   return %letters;
}
my $text = "the quick brown fox jumps over the lazy dog";

my %count = frequency($text);

print "'$text' contains: \n";
foreach (sort keys %count) {
   print "\t", $count{$_}, " '$_", ($count{$_} == 1)? "'": "'s", "\n";
}

This will create a hash of letter keys with the frequency of each letter's occurrence as their values. Of note is the split statement, which uses an empty pattern //. This is a special case of split that returns characters one at a time by using a delimiter of nothing at all.

The trick to this program lies in the line $letters{$_}++. To start with, there are no keys in the hash, so the first occurrence of any letter causes a new key and value to be entered into the hash. Perl allows this, even though the increment implies an existing value. If a letter does not appear at all, there won't even be a key in the hash for it, eliminating redundant entries.

Using undef As a Function

Although we often use undef as if it were a value by assigning it or returning it from subroutines, it is in fact a function that returns the undefined value (for which there is no written equivalent). When used on variables, undef undefines them, destroying the value. The variable remains intact, but now returns undef when it is accessed. For example:

undef $scalar;

This is essentially the same as

$scalar = undef;

If the undef function is used on an array or a hash variable, it destroys the entire contents of the variable, turning it into an empty array or hash. The following two statements are therefore equivalent:

undef @array;
@array = ();

Undefining an array element, a slice of an array, or a hash key, undefines the value, but not the array element or hash key, which continues to exist.

undef $hash{'key'};   # undefine value of key 'key'
my $value = $hash{'key'};   # $value is now 'undef'

Similarly:

my @array = (1, 2, 3, 4, 5);   # define a five element array
@array[1..3] = undef;   # @array contains (1, undef, undef, undef, 5)

To actually remove the element or hash key, we use the delete function.

my @array = (1, 2, 3, 4, 5);   # define a five-element array
delete @array[1..3];   # no more second, third, and fourth elements
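The difference between undefining a value and deleting its key can be checked with exists and defined. The following is a minimal sketch using a made-up hash:

```perl
#!/usr/bin/perl
# undef_delete.pl - illustrative sketch only
use warnings;
use strict;

my %hash = (one => 1, two => 2, three => 3);

undef $hash{one};     # the key remains, but its value is now undefined
delete $hash{two};    # the key and its value are both removed

my %state;
foreach my $key (qw(one two three)) {
   $state{$key} = !exists($hash{$key})  ? "does not exist"
                : !defined($hash{$key}) ? "exists but is undefined"
                :                         "exists and is defined";
   print "$key $state{$key}\n";
}
```

Only the deleted key fails the exists test; the undefined one is still present in the hash.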

Constants

A constant is a value that remains unchanged throughout the lifetime of a program. By defining a named constant and then using it rather than the value, we can avoid retyping, and potentially mistyping, the value. In addition, it makes our code more legible. A good example of a constant is the value of pi, 3.14159265358979.... Clearly it would be preferable to just type PI in our programs than reel out a string of digits each time. A second reason for using a constant is that we can, if we wish, change it. By defining it in one place and then using the definition in every other place in our code, we can easily alter the value throughout the application from a single definition.

One simple but not very satisfactory way to define a constant is with a scalar variable. By convention, constants use fully capitalized names, for example:

# define constant '$PI'
$PI = 3.1415926;

# use it
$deg = 36;
print "$deg degrees is ", $PI*($deg/180), " radians";

However, this constant is constant in typography only. It's still a regular scalar variable and can be assigned a new value as normal. A more reliable way to define a scalar constant is by assigning a value, by reference, to a typeglob. Here is how we could define the constant $PI using this approach:

# define constant: note the backslash, which takes a reference to the literal
*PI = \3.1415926;

This causes Perl to create the variable $PI, since the assigned reference is to a scalar. Because the reference is to a literal value rather than a variable, it cannot be redefined and so the scalar "variable" $PI is read-only, a true constant. Attempting to assign a new value to it will provoke an error from Perl.

# A more rational, if inaccurate, value of PI
$PI = 3;   # produces 'Modification of a read-only value attempted ...'
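To see the read-only behavior without aborting the program, we can trap the error with eval. This is a small sketch rather than a recommended idiom; the our declaration is an addition needed only because use strict is in effect, and the variable names $succeeded and $error are purely illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our $PI;              # declare the package scalar so that strict is satisfied
*PI = \3.1415926;     # alias $PI to a read-only literal value

# eval traps the fatal error so that we can inspect it
my $succeeded = eval { $PI = 3; 1 };
my $error     = $succeeded ? '' : $@;

print "PI is still $PI\n";
print "Error: $error" unless $succeeded;
```

The assignment inside the eval fails, and $PI keeps its original value.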

However, this still does not reinforce the fact that PI is supposed to be constant, because it looks like a regular scalar variable, even if we are prevented from altering it. What we would ideally like is constants that look constant, without any variable prefix character, which is what the constant pragma provides us with.

Declaring Scalar Constants with the constant Pragma

The constant pragmatic module allows us to define scalar constants that both look and behave like constants. Like any module, we use it through a use statement, providing the name and value of the constant we wish to define. To define a value for PI, we could write

use constant PI => 3.1415926;

This notation is an immediate improvement over using a scalar variable or a typeglob since it legibly declares to the reader, as well as to Perl, that we are defining a constant. The use of => is optional. We could equally have used a comma, but in this context => makes sense since we are defining an association. It also allows us to omit the quotes we would otherwise need if use strict is in effect, which is elegant since the result of this statement is to define a constant PI, which we can use like this:

print "$deg degrees is ", PI*($deg/180);

This is an immediate improvement over the first example, since PI is clearly a constant, not a variable like $PI. It also cannot be assigned to, since it is no longer a scalar variable. (For the curious, it is actually a subroutine that takes no arguments and returns the value we supplied, defined on the fly by the module. This makes a surprisingly effective constant even though it is not actually a built-in feature of the language.)

Constants are a good place to perform one-off calculations too. The definition of pi shown previously is adequate for most purposes, but it is not the best that we can do. We saw earlier that we can calculate pi easily using the expression 4*atan2(1, 1). We can use this expression to define our constant PI:

use constant PI => 4 * atan2(1, 1);

Although this is more work than just defining the value explicitly, Perl only evaluates it once, and we end up with the best possible value of pi that can be handled on any architecture that we run the program on without needing to rewrite the code.

Calculating constants is also useful for clarity and avoiding errors; it is easier to get the preceding expression right because it is shorter to type and errors are more obvious. Detecting one wrong digit in a 15-digit floating point number is not so simple. Similarly, computed values such as the number of seconds in a year look better like this:

use constant SECONDS_IN_YEAR => 60 * 60 * 24 * 365;

than this:

use constant SECONDS_IN_YEAR => 31536000;

Constants are conventionally defined in entirely uppercase, to enable them to be easily distinguished from functions that happen to take no arguments. This is not an enforced rule, but following it improves the legibility of our code.

Expressions used to define constants are evaluated in a list context. That means that if we want the scalar result of a calculation, we need to say so explicitly. For example, the gmtime function returns a list of date values in list context, but in a scalar context it instead returns a nicely formatted string containing the current date. To get the nicely formatted string, we need to use scalar to force gmtime into a scalar context.

use constant START_TIME => scalar(gmtime);

As a final note on scalar constants, we can also define a constant to be undefined.

use constant TERMINATION_DATE => undef;
use constant OVERRIDE_LIST => ();

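These two statements are not quite identical. TERMINATION_DATE is a scalar constant whose value is undef, so it evaluates to undef in a scalar context and to a one-element list containing undef in a list context. OVERRIDE_LIST is an empty list constant, which evaluates to () in a list context and to undef in a scalar context.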
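One subtlety worth checking is how many elements each constant yields when it is assigned to an array. The following sketch makes that check; the array and scalar names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use constant TERMINATION_DATE => undef;
use constant OVERRIDE_LIST    => ();

my @date = TERMINATION_DATE;   # list context: a single undef element
my @list = OVERRIDE_LIST;      # list context: no elements at all
my $date = TERMINATION_DATE;   # scalar context: undef

print "TERMINATION_DATE yields ", scalar(@date), " element(s)\n";
print "OVERRIDE_LIST yields ", scalar(@list), " element(s)\n";
print "TERMINATION_DATE is undefined in scalar context\n" unless defined $date;
```

The distinction matters if the constant is interpolated into a larger list, where an undef constant contributes one element and an empty list constant contributes none.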

Declaring Constant Scalar Variables

It can sometimes be useful to create a variable that has a constant value. One particular use is in interpolation—we can embed the variable into a double-quoted string. We cannot do this with constants created by the constant pragma, and must resort to concatenation instead.
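As a brief illustration of the interpolation problem, the sketch below shows a constant failing to interpolate, the concatenation workaround, and one further idiom that is not described above: wrapping the constant in an anonymous array that is dereferenced inside the string. The variable names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use constant PI => 4 * atan2(1, 1);

my $broken  = "PI is PI";            # no interpolation: the literal text 'PI is PI'
my $joined  = "PI is " . PI;         # concatenation
my $inlined = "PI is @{[ PI ]}";     # anonymous array idiom (an addition, not from the text)

print "$broken\n";
print "$joined\n";
print "$inlined\n";
```

The anonymous array form works because the code inside the square brackets is evaluated as an ordinary expression, but it is arguably less readable than a constant scalar variable.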

A constant scalar variable (which is admittedly twisting the term "variable") can be created in a number of ways. Perhaps the simplest is to assign a reference to a constant string to a typeglob.

#!/usr/bin/perl
use warnings;
use strict;

use vars qw($constantstring); # declare use of package scalar
*constantstring=\"immutable"; # assign reference to constant string to scalar slot of glob
print $constantstring;        # produces 'immutable'
$constantstring='no!';        # Error

Alternatively, from Perl 5.8 we can use the built-in function Internals::SvREADONLY. As its name suggests, this function is somewhat secret and technically deemed to be unofficial. It is likely a friendlier and more official face will be put on it in the future since it is the underlying foundation of restricted hashes as provided by the Hash::Util module. For now:

my $constantstring='immutable';
Internals::SvREADONLY($constantstring => 1);
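A short sketch of the effect follows, trapping the error with eval so that the program continues. It also assumes that passing a false second argument clears the read-only flag again; treat that as an observation about this unofficial function rather than a documented guarantee.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $constantstring = 'immutable';
Internals::SvREADONLY($constantstring => 1);     # make the scalar read-only

my $ok = eval { $constantstring = 'no!'; 1 };    # attempt to modify it
my $error = $ok ? '' : $@;
print "value is still '$constantstring'\n";
print "error: $error" unless $ok;

Internals::SvREADONLY($constantstring => 0);     # assumption: a false value unlocks it
$constantstring = 'mutable again';
print "value is now '$constantstring'\n";
```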

Finally, we can create a tied object class that overrides write operations, and we will see such a class in Chapter 21.

Declaring List and Hash Constants

Unlike the typeglob definition of constants, which only works for literal values and hence only defines scalar constants, the constant pragma also allows us to define constant arrays and constant hashes. Both of these work in essentially the same way as a scalar constant, with the values to be made constant passed to the module and a subroutine that is defined behind the scenes to implement the resulting constant. Here is how we can define a constant list of weekdays that we can use to retrieve the day names by index:

use constant WEEKDAYS=>('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday');

Accessing the individual elements of a constant array can be tricky though, because the constant returns a list of values to us, not an array. Because of this, we cannot simply use an index to retrieve an element.

print "The third day is ", WEEKDAYS[2];   #ERROR: syntax error

To solve this problem, we only need to add parentheses to make the returned list indexable.

print "The third day is ", (WEEKDAYS)[2];   # works ok

A similar technique can be used to create hash constants, though the values are stored and returned as a list, so they cannot be accessed through a key without first being transferred into a real hash.

use constant WEEKABBR => (
    Monday=>'Mon', Tuesday=>'Tue', Wednesday=>'Wed',
    Thursday=>'Thu', Friday=>'Fri'
);
my %abbr = WEEKABBR;
my $day = 'Wednesday';
print "The abbreviation for $day is ", $abbr{$day};

Because of this limitation, constant hashes are better defined via a reference, which can hold a real hash as its value, rather than a simple list of keys and values that happen to resemble a hash. Given that, however, if we really want a reference to a constant hash, a pseudohash or restricted hash may be a better solution to the same problem.
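As a sketch of the reference-based approach, the following defines the abbreviations as a constant hash reference and looks a value up directly through it. The final step, which locks the underlying hash with lock_hash from Hash::Util, is an optional addition for illustration and turns the hash into a restricted hash.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Hash::Util qw(lock_hash);

use constant WEEKABBR => {
    Monday   => 'Mon', Tuesday => 'Tue', Wednesday => 'Wed',
    Thursday => 'Thu', Friday  => 'Fri',
};

my $day = 'Wednesday';
print "The abbreviation for $day is ", WEEKABBR->{$day}, "\n";

# Optionally restrict the underlying hash so its keys and values are frozen too
lock_hash(%{ WEEKABBR() });
my $ok = eval { WEEKABBR->{Monday} = 'Lun'; 1 };
print "Modification refused: $@" unless $ok;
```

Note the explicit parentheses in WEEKABBR(); without them, Perl would read the braces as naming a hash variable called %WEEKABBR.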

Constant References

Since references are scalars, we can also define constant references. As a simple example, the preceding array could also have been declared as

use constant WEEKDAYS=>[ 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'];

Because the constant is a reference to an array, we must dereference it to access the elements, which is marginally more attractive (and certainly more legible) than adding parentheses.

print "The third day is ", WEEKDAYS->[2];

However, all that is being defined here is a constant reference. We cannot assign a new value to WEEKDAYS, but we can still alter the values in the array through the existing reference. The list is not truly constant, though it still looks like a constant.

WEEKDAYS->[0]='Lundi';   #this is perfectly legal

Depending on our programming goals, this might actually be a good thing, allowing us to secretly customize the value of a constant inside a package while presenting it as an unchanging and unmodifiable value to the outside world. However, this kind of behavior should be exercised with caution.

Listing and Checking for the Existence of Constants

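To check for the existence of a constant, we can make use of the declared hash in the constant package, whose keys are the fully qualified names of the constants. Note that use is processed at compile time, so a use constant statement placed inside a run-time conditional defines the constant regardless of the test. To define a constant only when it is missing, we perform both the check and the definition in a BEGIN block.

BEGIN {
   unless (exists $constant::declared{'main::MY_CONSTANT'}) {
      require constant;
      constant->import(MY_CONSTANT => "My value");
   }
}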

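We can also dump out a list of all the currently declared constants by iterating over the keys of the hash. The values in the hash merely record that a declaration took place; they are not the values of the constants themselves, so we print only the names.

foreach (sort keys %constant::declared) {
   print "Constant $_ is declared\n";
}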
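To see the values as well as the names, one approach is to call each constant by name, since each is really a subroutine of no arguments. The sketch below assumes the constants of interest live in package main and collects the results in a hypothetical %found hash.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use constant PI    => 4 * atan2(1, 1);
use constant DEBUG => 0;

my %found;
foreach my $name (sort keys %constant::declared) {
    next unless $name =~ /^main::/;    # skip constants declared by other packages
    no strict 'refs';                  # permit calling a subroutine by name
    my @values = &{\&{$name}}();       # call the constant in list context
    $found{$name} = "@values";
    print "Constant $name is defined as '@values'\n";
}
```

Calling in list context means that list constants are reported in full rather than reduced to a single value.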

To detect a constant scalar such as a locked value in a restricted hash, we can use the readonly function from Scalar::Util.

#!/usr/bin/perl -w
# testforconstantscalar.pl
use Scalar::Util qw(readonly);
my $constant="immutable";
print "scalar is ",(readonly($constant)?"constant":"variable"),"\n";
Internals::SvREADONLY($constant => 1);
print "scalar is ",(readonly($constant)?"constant":"variable"),"\n";

Summary

We began this chapter by looking at lists and arrays. Specifically, we saw how to manipulate and modify them; count the number of elements; add, resize, and remove elements; and sort arrays. We also noted the essential difference between lists and arrays—lists are passed values, arrays are storage for lists.

We then took a similar look at hashes, and also saw how to convert them into scalars and arrays. We then covered two ways to create hashes with fixed and unchangeable keys: pseudohashes, now deprecated, and their replacement, restricted hashes. We also covered the fields pragma and the use of typed scalar variables to provide compile-time checks and optimization where pseudohashes or restricted hashes are in use.

From there we discussed references, both hard and symbolic, and learned how to create references and then dereference them. We also covered weak references and garbage collection, passing data to subroutines by reference, and finding the type of a reference.

Armed with this information, we dug into complex data structures, including problems inherent with nesting. We learned how to construct hashes of hashes, arrays of hashes, and more esoteric constructions. From there we learned how to create complex data structures programmatically and then navigate them. We also covered typeglobs and saw how to define and manipulate them. We looked at the undefined value, and, amongst other things, explored its use as a function.

Finally, we examined constants and put the constant pragma to use. We saw how to declare constant scalars, lists, hashes, and references, and how to detect constants in code.
