Chapter 15. Objects

Object-oriented programming offers a sustainable way to write spaghetti code. It lets you accrete programs as a series of patches.

Paul Graham, "The Hundred-Year Language"

Perl's approach to object orientation is almost excessively Perlish: there are far too many ways to do it.

There are at least a dozen different ways to build an object (from a hash, from an array, from a subroutine, from a string, from a database, from a memory-mapped file, from an empty scalar variable, etc., etc.). Then there are scores of ways to implement the behaviour of the associated class. On top of that, there are also hundreds of different techniques for access control, inheritance, method dispatch, operator overloading, delegation, metaclasses, generics, and object persistence.[91] And, of course, many developers also make use of one or more of the over 400 "helper" modules from the CPAN's Class:: and Object:: namespaces.

There are just so many possible combinations of implementation, structure, and semantics that it's quite rare to find two unrelated class hierarchies that use precisely the same style of Perl OO.

That diversity creates a huge problem. The dizzying number of possible OO implementations makes it very much harder to comprehend any particular implementation, because the reader might not encounter a single familiar code structure by which to navigate the class definitions.

There is no guarantee of what a class declaration will look like, nor how it will specify its attributes and methods, nor where it will store its data, nor how its methods will mediate access to that data, nor what the class constructor will be called, nor what a method call will look like, nor how inheritance relationships will be declared, nor just about anything else.

You can't even assume that there will be a class declaration (see the Class::Classless module, for example), or that the attributes or methods are specified at all (as in Class::Tables), or that object data isn't stored outside the program completely (like Cache::Mmap does), or that methods don't magically redefine themselves in derived classes when called a certain special way (as happens under Class::Data::Inheritable).

This cornucopia of alternatives rarely results in robust or efficient code. Of the several dozen OO approaches most frequently used in Perl development, none of them scales well to the demands of large systems. That includes the consensus first choice: hash-based classes.

This chapter summarizes a better approach: one that produces concise, readable classes, ensures reliable object behaviour, prevents several common errors, and still manages to maintain near-optimal performance.

Using OO

Make object orientation a choice, not a default.

There are plenty of excellent reasons to use object orientation: to achieve cleaner encapsulation of data; to better decouple the components of a system; to take advantage of hierarchical type relationships using polymorphism; or to ensure better long-term maintainability.

There are also plenty of reasons not to use object orientation: because it tends to result in poorer overall performance; because large numbers of method calls can reduce syntactic diversity and make your code less readable; or just because object orientation is simply a poor fit for your particular problem, which might be better solved using a procedural, functional, data flow, or constraint-based approach[92].

Make sure you choose to use OO because of the pros and despite the cons, not just because it's the big, familiar, comfortable hammer in your toolset.

Criteria

Choose object orientation using appropriate criteria.

When deciding whether to use object orientation, look for features of the problem—or of the proposed solution—that suggest that OO might be a good fit. For example, object orientation might be the right approach in any of the following situations:

The system being designed is large, or is likely to become large

Object orientation helps in large systems, because it breaks them down into smaller decoupled systems (called "classes"), which are generally still simple enough to fit in a single brain—unlike the large system as a whole.

The data can be aggregated into obvious structures, especially if there's a large amount of data in each aggregate

Object orientation is about classifying data into coherent chunks (called "objects") and then specifying how those chunks can interact and change over time. If there are natural clusterings in the data to be handled by your system, then the natural place for those clusterings is probably inside an object. And the larger the amount of data in each chunk, the more likely it is that you're going to need to think of those chunks at some higher, more abstract level. It's also more likely that you'll need to control access to that data more tightly to ensure it remains consistent.

The various types of data aggregate form a natural hierarchy that facilitates the use of inheritance and polymorphism

Object orientation provides a way to capture, express, and take advantage of the abstract relationships between chunks of data in your code. If one kind of data is a special form of another kind of data (a restriction, or elaboration, or some other variation), then organizing that data into class hierarchies can minimize the amount of nearly identical code that has to be written.

You have a piece of data on which many different operations are applied

You're going to be calling many different subroutines on the same data. But Perl subroutines don't type-check their arguments in any way, so it's very easy to send the wrong type of data to the wrong subroutine. If that happens, Perl's otherwise helpful implicit behaviours[93] can conspire to mask the mistake, making it vastly harder to detect and correct. In contrast, if the data is an object, then only those subroutines that are methods of the appropriate class can ever be called on that data.
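
For example, here's a minimal sketch of that difference in failure modes, using a hypothetical Invoice class that's invented purely for illustration. An operation that the class doesn't actually provide simply can't be called on its objects:

package Invoice;

sub new   { my ($class, $total) = @_; return bless { total => $total }, $class; }
sub total { my ($self) = @_;          return $self->{total};                    }

package main;

my $invoice = Invoice->new(100);

print $invoice->total( ), "\n";    # Fine: total( ) is a method of Invoice

# $invoice->despatch( );           # Would die immediately:
                                   # Can't locate object method "despatch" via package "Invoice"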

You need to perform the same general operations on related types of data, but with slight variations depending on the specific type of data the operations are applied to

For example, if every piece of equipment needs to have check( ), register( ), deploy( ), and activate( ) applied to it, but the process of checking, registration, deployment, and activation differs for each type of equipment, then you have the textbook conditions for using polymorphic method calls.
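
For instance, here's a minimal sketch of that situation. The equipment classes are hypothetical, invented purely for illustration, but the dispatch behaviour is exactly as described above:

package Equipment::Radio;
sub new    { return bless {}, shift;                             }
sub check  { print "Checking battery and frequencies\n"; return; }
sub deploy { print "Raising antenna\n";                  return; }

package Equipment::Tent;
sub new    { return bless {}, shift;                             }
sub check  { print "Checking poles and canvas\n";        return; }
sub deploy { print "Pitching tent\n";                    return; }

package main;

# The same loop handles every kind of equipment; Perl dispatches each call
# to the appropriate class's method at run time...
for my $equipment ( Equipment::Radio->new( ), Equipment::Tent->new( ) ) {
    $equipment->check( );
    $equipment->deploy( );
}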

It's likely you'll have to add new data types later

New data types will usually be related in some way to existing data types. If those existing data types are in class hierarchies, you'll be able to use inheritance to create the new type with minimal extra effort, and little or no duplication of code. Better still, the new data type will then be usable in existing code, without the need to modify that code in any way.
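
Continuing the hypothetical equipment sketch from the previous criterion, adding a new type then requires very little code, and no changes at all to the loop that processes the equipment:

# A new class that inherits everything from the earlier (hypothetical)
# Equipment::Radio class, overriding only the behaviour that differs...
package Equipment::SatelliteRadio;
use base qw( Equipment::Radio );

sub deploy {
    my ($self) = @_;
    print "Aligning dish with satellite\n";
    return;
}

# Existing client code can now use the new type without being modified:
#     push @equipment_list, Equipment::SatelliteRadio->new( );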

The typical interactions between pieces of data are best represented by operators

Perl operators can be overloaded only when at least one of their operands is an object.
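
For example, here's a minimal sketch of that requirement in action, using a hypothetical Money class (the class and its behaviour are invented for illustration; only the overload pragma itself is standard):

package Money;

# Overloading '+' and stringification is possible because at least one
# operand in every such expression will be a blessed Money object...
use overload (
    '+'  => \&add,
    '""' => \&as_string,
);

sub new       { my ($class, $cents) = @_; return bless { cents => $cents }, $class;       }
sub add       { my ($x, $y) = @_;         return Money->new( $x->{cents} + $y->{cents} ); }
sub as_string { my ($self) = @_;          return sprintf '$%.2f', $self->{cents} / 100;   }

package main;

my $total = Money->new(150) + Money->new(275);

print "$total\n";    # Prints: $4.25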

The implementation of individual components of the system is likely to change over time

Proper encapsulation of objects can ensure that any code that uses those objects is isolated from the details of how the objects' data is stored and manipulated. This means that no source code outside your control will ever rely on those details, so you're free to change how the object is implemented whenever necessary, without having to rewrite huge amounts of client code.

The system design is already object-oriented

If the designers have designed a huge, complex, awkward nail, then the big, familiar, comfortable hammer of OO will almost certainly be the best tool for the job.

Large numbers of other programmers will be using your code modules

Object-oriented modules tend to have interfaces that are more clearly defined, which often makes them easier to understand and use correctly. Unlike procedural APIs that export subroutines into the caller's symbol table, classes never pollute client namespaces, and therefore are much less likely to clash with other modules. And the need to use object constructors and accessors can often improve the integrity of data as well, because it's easy to embed vetting procedures into the initialization or access methods.

Pseudohashes

Don't use pseudohashes.

Pseudohashes were a mistake. Their goal—better compile-time type-checking, leading to comparatively faster run-time access—was entirely laudable. But they achieved that goal by actually slowing down all normal hash and array accesses.

They can also double both the memory footprint and the access-time for objects, unless they're used in exactly the right way. They're particularly inefficient if you ever forget to give their container variables a type (which is pretty much guaranteed, since you never have to give any other Perl variable a type, so you're not in the habit). Pseudohashes are also prone to very hard-to-fathom errors when used in inheritance hierarchies[94].

Don't use them. If you're currently using them, plan to remove them from your code. They don't work with Perl releases prior to Perl 5.005, they're deprecated in Perl 5.8, and will be removed from the language entirely in 5.10.

Restricted Hashes

Don't use restricted hashes.

Restricted hashes were developed as a mechanism to partially replace pseudohashes. An ordinary hash can be converted into a restricted hash simply by calling one or more of the lock_keys( ), lock_value( ), or lock_hash( ) subroutines provided by the Hash::Util module, which is standard in Perl 5.8 and later.

If the keys of a hash are locked with lock_keys( ), that hash is prevented from creating entries for keys other than the keys that existed at the time the hash keys were locked. If a hash value is locked with lock_value( ), the value for that particular hash entry is made constant. And if the entire hash is locked with lock_hash( ), neither its keys nor their associated values can be altered.

If you build a hash-based object and then lock its keys, no-one can accidentally access $self->{Name} when the object's attribute is supposed to be in $self->{name} instead. That's a valuable form of consistency checking. If you also lock the values before the constructor returns the object, then no-one outside the class can mess with the contents of your object, so you also get encapsulation. And as they're still just regular hashes, you don't lose any appreciable performance.
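
For example, here's a minimal sketch of that scheme, using a hypothetical hash-based Soldier class (the Hash::Util calls are real; the class itself is invented for illustration):

package Soldier;
use Hash::Util qw( lock_keys );

sub new {
    my ($class, $arg_ref) = @_;

    my $new_object = bless {
        name => $arg_ref->{name},
        rank => $arg_ref->{rank},
    }, $class;

    # From here on, only the keys that already exist may be accessed...
    lock_keys( %{$new_object} );

    return $new_object;
}

package main;

my $soldier = Soldier->new({ name => 'MacArthur', rank => 'General' });

$soldier->{rank} = 'Colonel';     # Fine: 'rank' is an existing key

# $soldier->{Rank} = 'Colonel';   # Would die:
                                  # Attempt to access disallowed key 'Rank' in a restricted hash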

The problem is that like the now-deprecated pseudohashes, restricted hashes still offer only voluntary security[95]. The Hash::Util module also provides unlock_keys( ), unlock_value( ), and unlock_hash( ) subroutines, with which all that pesky consistency checking and annoying attribute encapsulation can be instantly circumvented.

Encapsulation

Always use fully encapsulated objects.

The voluntary nature of the security that restricted hashes offer is a genuine problem. Lack of encapsulation is one of the reasons why plain, unrestricted hashes aren't a suitable basis for objects either. Objects without effective encapsulation are vulnerable. Instead of politely respecting their public interface, like so:

# Use our company's proprietary OO file system interface...
use File::Hierarchy;

# Make an object representing the user's home directory...
my $fs = File::Hierarchy->new('~');

# Ask for the list of files in it...
for my $file ( $fs->get_files( ) ) {
    # ...then ask for the name of each file, and print it...
    print $file->get_name( ), "\n";
}

some clever client coder inevitably will realize that it's marginally faster to interact directly with the underlying implementation:

# Use our company's proprietary OO file system interface...
use File::Hierarchy;

# Make an object representing the user's home directory...
my $fs = File::Hierarchy->new('~');

# Then poke around inside the (array-based) object
# and pull out its embedded file objects...
for my $file (@{$fs->{files}}) {
    # Then poke around inside each (hash-based) file object,
    # pull out its name, and print it...
    print $file->{name}, "\n";
}

From the moment someone does that, your class is no longer cleanly decoupled from the code that uses it. You can't be sure that any bugs in your class are actually caused by the internals of your class, and are not the result of some kind of monkeying by the client code. And to make matters worse, now you can't ever change those internals without the risk of breaking some other part of the system.

Of course, if the client programmers have deliberately flouted the (unenforced) encapsulation of your objects, and your subsequent essential class modifications unavoidably and necessarily break several thousands of errant lines of their malignant code, surely that's just instant justice, isn't it? Unfortunately, your pointy-haired boss will probably only hear that "your sub . . . essential class modifications un . . . necessarily break . . . thousands of . . . lines of . . . code". Now, guess who's going to have to fix it all.

So you have to be aggressively pre-emptive about enforcing object encapsulation. If the first attempt to circumvent your interface fails, there won't be a second. Or a thousandth. From the very start, you need to enforce the encapsulation of your class rigorously; fatally, if possible. Fortunately, that's not difficult in Perl.

There is a simple, convenient, and utterly secure way to prevent client code from accessing the internals of the objects you provide. Happily, that approach guards against misspelling attribute names, as well as being just as fast as—and often more memory-efficient than—ordinary hash-based objects.

That approach is referred to by various names—flyweight scalars, warehoused attributes, inverted indices—but is most commonly known as: inside-out objects.

They're aptly named, too, because they reverse all of Perl's standard object-oriented conventions. For example, instead of storing the collected attributes of an object in an individual hash, inside-out objects store the individual attributes of an object in a collection of hashes. And, rather than using the object's attributes as separate keys into an object hash, they use each object as a key into separate attribute hashes.

That description might sound horribly convoluted, but the technique itself certainly isn't. For example, consider the two typical hash-based Perl classes shown in Example 15-1. Each declares a constructor named new( ), which blesses an anonymous hash to produce a new object. The constructor then initializes the attributes of the nascent object by assigning values to appropriate keys within the blessed hash. The other methods defined in the classes (get_files( ) and get_name( )) then access the state of the object using the standard hash look-up syntax: $self->{attribute}.

Example 15-1. Typical hash-based Perl classes

package File::Hierarchy;

# Objects of this class have the following attributes...

#     'root'   - The root directory of the file hierarchy
#     'files'  - An array storing an object for each file in the root directory

# Constructor takes path of file system root directory...
sub new {
    my ($class, $root) = @_;

    # Bless a hash to instantiate the new object...
    my $new_object = bless {}, $class;

    # Initialize the object's "root" attribute...
    $new_object->{root} = $root;

    return $new_object;
}

# Retrieve files from root directory...
sub get_files {
    my ($self) = @_;

    # Load up the "files" attribute, if necessary...
    if (!exists $self->{files}) {
        $self->{files} 
            = File::System->list_files($self->{root});
    }

    # Flatten the "files" attribute's array to produce a file list...
    return @{$self->{files}};
}

package File::Hierarchy::File;

# Objects of this class have the following attributes...
#     'name' - the name of the file

# Constructor takes name of file...
sub new {
    my ($class, $filename) = @_;

    # Bless a hash to instantiate the new object...
    my $new_object = bless {}, $class;

    # Initialize the object's "name" attribute...
    $new_object->{name} = $filename;

    return $new_object;
}

# Retrieve name of file...
sub get_name {
    my ($self) = @_;

    return $self->{name};
}

Example 15-2 shows the same two classes, reimplemented using inside-out objects. The first thing to note is that the inside-out version of each class requires exactly the same number of lines of code as the hash-based version[96]. Moreover, the structure of each class is line-by-line identical to that of its previous version, with only minor syntactic differences on a few corresponding lines.

Example 15-2. Atypical inside-out Perl classes

package File::Hierarchy;
use Class::Std::Utils;
{
    # Objects of this class have the following attributes...

    my %root_of;   # The root directory of the file hierarchy
    my %files_of;  # An array storing an object for each file in the root directory

    # Constructor takes path of file system root directory...
    sub new {
        my ($class, $root) = @_;

        # Bless a scalar to instantiate the new object...
        my $new_object = bless \do{my $anon_scalar}, $class;

        # Initialize the object's "root" attribute...
        $root_of{ident $new_object} = $root;

        return $new_object;
    }

    # Retrieve files from root directory...
    sub get_files {
        my ($self) = @_;

        # Load up the "files" attribute, if necessary...
        if (!exists $files_of{ident $self}) {
            $files_of{ident $self}
                = File::System->list_files($root_of{ident $self});
        }

        # Flatten the "files" attribute's array to produce a file list...
        return @{ $files_of{ident $self} };
    }
}

package File::Hierarchy::File;
use Class::Std::Utils;
{  
    # Objects of this class have the following attributes...
    my %name_of;  # the name of the file

    # Constructor takes name of file...
    sub new {
        my ($class, $filename) = @_;

        # Bless a scalar to instantiate the new object...
        my $new_object = bless \do{my $anon_scalar}, $class;

        # Initialize the object's "name" attribute...
        $name_of{ident $new_object} = $filename;

        return $new_object;
    }

    # Retrieve name of file...
    sub get_name {
        my ($self) = @_;

        return $name_of{ident $self};
    }
}

But although those few differences are minor and syntactic, their combined effect is enormous, because they make the resulting classes significantly more robust, completely encapsulated, and considerably more maintainable[97].

The first difference between the two approaches is that, unlike the hash-based classes, each inside-out class is specified inside a surrounding code block:

package File::Hierarchy;
{
    # [Class specification here]
}

package File::Hierarchy::File;
{
    # [Class specification here]
}

That block is vital, because it creates a limited scope, to which any lexical variables that are declared as part of the class will automatically be restricted. The benefits of that constraint will be made apparent shortly.

Speaking of lexical variables, the next difference between the two versions of the classes is that the descriptions of attributes in Example 15-1:

# Objects of this class have the following attributes...
#     'root'   - The root directory of the file hierarchy
#     'files'  - An array storing an object for each file in the root directory

have become declarations of attributes in Example 15-2:

# Objects of this class have the following attributes...
my %root_of;   # The root directory of the file hierarchy
my %files_of;  # An array storing an object for each file in the root directory

This is an enormous improvement. By telling Perl what attributes you expect to use, you enable the compiler to check—via use strict—that you do indeed use only those attributes.

That's possible because of the third difference in the two approaches. Each attribute of a hash-based object is stored in an entry in the object's hash: $self->{name}. In other words, the name of a hash-based attribute is symbolic: specified by the string value of a hash key. In contrast, each attribute of an inside-out object is stored in an entry of the attribute's hash: $name_of{ident $self}. So the name of an inside-out attribute isn't symbolic; it's a hard-coded variable name.

With hash-based objects, if an attribute name is accidentally misspelled in some method:

sub set_name {
    my ($self, $new_name) = @_;

    $self->{naem} = $new_name;             # Oops!

    return;
}

then the $self hash will obligingly—and silently!—create a new entry in the hash, with the key 'naem', then assign the new name to it. But since every other method in the class correctly refers to the attribute as $self->{name}, assigning the new value to $self->{naem} effectively makes that assigned value "vanish".

With inside-out objects, however, an object's "name" attribute is stored as an entry in the class's lexical %name_of hash. If the attribute name is misspelled, then you're attempting to refer to an entirely different hash: %naem_of. Like so:

sub set_name {
    my ($self, $new_name) = @_;

    $naem_of{ident $self} = $new_name;     # Kaboom!

    return;
}

But, because there's no such hash declared in the scope, use strict will complain (with extreme prejudice):

Global symbol "%naem_of" requires explicit package name at Hierarchy.pm line 86

Not only is that consistency check now automatic, it's also performed at compile time.

The next difference is even more important and beneficial. Instead of blessing an empty anonymous hash as the new object:

my $new_object = bless {}, $class;

the inside-out constructor blesses an empty anonymous scalar:

my $new_object = bless \do{my $anon_scalar}, $class;

That odd-looking \do{my $anon_scalar} construct is needed because there's no built-in syntax in Perl for creating a reference to an anonymous scalar; you have to roll your own (see the upcoming "Nameless Scalars" sidebar for details). Alternatively, you may prefer to avoid the oddity and just use the anon_scalar( ) function that's provided by the Class::Std::Utils CPAN module:

use Class::Std::Utils;

# and later...
my $new_object = bless anon_scalar( ), $class;

Whichever way the anonymous scalar is created, it's immediately passed to bless, which anoints it as an object of the appropriate class. The resulting object reference is then stored in $new_object.

Once the object exists, it's used to create a unique key (ident $new_object) under which each attribute that belongs to the object will be stored (e.g., $root_of{ident $new_object} or $name_of{ident $self}). The ident( ) utility that produces this unique key is provided by the Class::Std::Utils module and is identical in effect to the refaddr( ) function in the standard Scalar::Util module. That is, ident($obj) simply returns the memory address of the object as an integer. That integer is guaranteed to be unique to the object, because only one object can be stored at any given memory address. You could use refaddr( ) directly to get the address if you prefer, but the Class::Std::Utils gives it a shorter, less obtrusive name, which makes the resulting code more readable.

To recap: every inside-out object is a blessed scalar, and has—intrinsic to it—a unique identifying integer. That integer can be obtained from the object reference itself, and then used to access a unique entry for the object in each of the class's attribute hashes.
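
Here's a minimal sketch of those two pieces of machinery in isolation, assuming nothing beyond the standard Scalar::Util module (the Demo class name is just a placeholder):

use Scalar::Util qw( refaddr );

# A roll-your-own version of anon_scalar( ): each call creates a brand-new
# nameless scalar and returns a reference to it...
sub anon_scalar {
    return \do{ my $anon_scalar };
}

my $object_a = bless anon_scalar( ), 'Demo';
my $object_b = bless anon_scalar( ), 'Demo';

# Each object's identifying integer is just its memory address, so two
# distinct objects are guaranteed to produce two distinct hash keys...
print refaddr($object_a), "\n";
print refaddr($object_b), "\n";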

But why is that so much better than just using hashes as objects? Because it means that every inside-out object is nothing more than an uninitialized scalar. When your constructor passes a new inside-out object back to the client code, all that comes back is an empty scalar, which makes it impossible for that client code to gain direct access to the object's internal state.

Oh, sure, the client code could pass an object reference to refaddr( ) or ident( ) to obtain the unique identifier under which that object's state is stored. But that won't help. The client code is outside the block that surrounds the object's class. So, by the time the client code gets hold of an object, the lexical attribute hashes inside the class block (such as %name_of and %files_of) will be out of scope. The client code won't even be able to see them, let alone access them.

At this point you might be wondering: if those attribute hashes are out of scope, why didn't they cease to exist? As explained in the "Nameless Scalars" sidebar, variables are garbage-collected only when nothing refers to them anymore. But the attribute hashes in each class are permanently referred to—by name—in the code of the various methods of the class. It's those references that keep the hashes "alive" even after their scope ends. Interestingly, that also means that if you declare an attribute hash and then don't actually refer to it in any of the class's methods, that hash will be garbage-collected as soon as the declaration scope finishes. So you don't even pay a storage penalty for attributes you mistakenly declare but never use.

With a hash-based object, object state is protected only by the client coder's self-discipline and sense of honour (that is, not at all):

# Find the user's videos...
$vid_lib = File::Hierarchy->new('~/videos');

# Replace the first three with titles that aren't
# actually in the directory (bwah-ha-ha-hah!!!!)...
$vid_lib->{files}[0]  = q{Phantom Menace};
$vid_lib->{files}[1]  = q{The Man Who Wasn't There};
$vid_lib->{files}[2]  = q{Ghost};

But if the File::Hierarchy constructor returns an inside-out object instead, then the client code gets nothing but an empty scalar, and any attempt to mess with the object's internal state by treating the object as a raw hash will now produce immediate and fatal results:

Not a HASH reference at client_code.pl line 6

By implementing all your classes using inside-out objects from the very beginning, you can ensure that client code never has the opportunity to rely on the internals of your class—as it will never be given access to those internals. That guaranteed isolation of internals from interface makes inside-out objects intrinsically more maintainable, because it leaves you free to make changes to the class's implementation whenever you need to.

Of the several popular methods of reliably enforcing encapsulation in Perl[98], inside-out objects are also by far the cheapest. The run-time performance of inside-out classes is effectively identical to that of regular hash-based classes. In particular, in both schemes, every attribute access requires only a single hash look-up. The only appreciable difference in speed occurs when an inside-out object is destroyed (see the "Destructors" guideline later in this chapter).

The relative memory overheads of the two schemes are a little more complex to analyze. Hash-based classes require one hash per object (obviously). On the other hand, inside-out classes require one (empty) scalar per object, plus one hash per declared attribute (i.e., %name_of, %files_of, and so on). Both schemes also need one scalar per attribute per object (the actual storage for their data inside the various hashes), but that cancels out in the comparison and can be ignored. All of which means that, given the relative sizes of an empty hash and an empty scalar (about 7.7 to 1), inside-out objects are more space-efficient than hash-based objects whenever the number of objects to be created is at least 15% higher than the number of attributes per object (N hash-based objects cost the equivalent of about 7.7N empty scalars, whereas N inside-out objects with A attributes cost about N + 7.7A, and the second total is smaller once N exceeds 7.7A/6.7, or roughly 1.15A). In practical terms, inside-out classes scale better than hash-based classes as the total number of objects increases.

The only serious drawback of inside-out objects stems directly from their greatest benefit: encapsulation. Because their internals cannot be accessed outside their class, you can't use Data::Dumper (or any other serialization tool) to help you debug the structure of your objects. "Automating Class Hierarchies" in Chapter 16 describes a simple means of overcoming this limitation.

Constructors

Give every constructor the same standard name.

Specifically, name the constructor of every class you write: new( ). It's short, accurate, and standard across many OO languages.

If every constructor uses the same name, the developers using your classes will always be able to guess correctly what method they should call to create an object, which will save them time and frustration looking up the fine manual—yet again—to remind themselves which obscurely named method call is required to instantiate objects of each particular class.

More importantly, using a standard constructor will make it easier for the maintainers of your code to understand what a particular method call is doing. Specifically, if the call is to new( ), then it will definitely be creating an object.

Constructors with clever names are cute and may sometimes even improve readability:

my $port = Port->named($url);
my $connection = Socket->connected_to($port);

But constructors with standard names make the resulting code easier to write correctly, and possible to comprehend in six months' time:

my $port = Port->new({ name => $url });

my $connection = Socket->new({ connect_to => $port });

Cloning

Don't let a constructor clone objects.

If you overload your constructors to also clone objects, it's too hard to tell the difference between construction and copying in client code:

$next_obj = $requested->new(\%args);     # New object or copy?

Methods that create new objects and methods that clone existing objects have a large amount of overlap in their behaviour. They both have to create a new data structure, bless it into an object, locate and verify the data to initialize its attributes, initialize its attributes, and finally return the new object. The only significant difference between construction and cloning is where the attribute data originates: externally in the case of a constructor, and internally in the case of a clone method.

The natural temptation is to combine the two methods into a single method. And the usual mental leap at that point is that Perl methods can always be called either as class methods or as instance methods. So, hey, why not simply have new( ) act like a constructor if it's called as a class method:

$new_queue = Queue::Priority->new({ selector => \&most_urgent });

and then act like a cloning method if it's called on an existing object:

$new_queue = $curr_queue->new( );

Because that can be achieved by adding only a single "paragraph" at the start of the existing constructor, as Example 15-3 illustrates. Cool!

Example 15-3. A constructor that also clones

sub new {
    my ($invocant, $arg_ref) = @_;

    # If method called on an object (i.e., a blessed reference)...
    if (ref $invocant) {
        # ...then build the argument list by copying the data from the object...
        $arg_ref = {
            selector => $selector_of{ident $invocant},
            data     => [ @{$data_of{ident $invocant} } ],
        };
    }

    # Work out the actual class name...
    my $class = ref($invocant)||$invocant;

    # Build the object...
    my $new_object = bless anon_scalar( ), $class;

    # And initialize its attributes...
    $selector_of{ident $new_object} = $arg_ref->{selector};
    $data_of{ident $new_object}     = $arg_ref->{data};

    return $new_object;
}

A variation on this idea is to allow constructor calls on objects, but have them still act like ordinary constructors, creating a new object of the same class as the object on which they're called:

sub new {
    my ($invocant, $arg_ref) = @_;

    # Work out the actual class name...
    my $class = ref($invocant)||$invocant;

    # Build the object...
    my $new_object = bless anon_scalar( ), $class;

    # And initialize its attributes...
    $selector_of{ident $new_object} = $arg_ref->{selector};
    $data_of{ident $new_object}     = $arg_ref->{data};

    return $new_object;
}

Unfortunately, there are several flaws in either of these approaches. The most obvious is that it suddenly becomes impossible to be sure what a given call to new( ) is actually doing. That is, there's no way to tell whether a statement like:

$next_possibility->new( \%defaults );

is creating a new object or copying an existing one. At least, no way to tell without first determining what's in $next_possibility. If, for example, the call to new( ) is part of a processing loop like:

# Investigate alternative storage mechanisms...
for my $next_possibility ( @possible_container_classes ) {
    push @active_queues, $next_possibility->new( \%defaults );
    # etc.
}

then it's (probably) a constructor, but if it's part of a loop like:

# Examine possible data sources...
for my $next_possibility ( @active_queues ) {
    push @phantom_queues, $next_possibility->new( \%defaults );
    # etc.
}

then it's likely to be cloning. The point is, you can no longer tell what's happening just by looking at the code where it's happening. You can't even really tell by looking at the array that's being iterated, until you trace back further and work out what kind of values that array is actually storing.

In contrast, if new( ) only ever constructs, and cloning is always done with a method called clone( ), then the very same method call:

$next_possibility->new( \%defaults );

is now clearly and unambiguously a constructor, regardless of context. Had it been intended to be a cloning operation, it would—equally unambiguously—have been written:

$next_possibility->clone( \%defaults );

Apart from not being able to say precisely what you mean, multipurpose constructors create a second maintenance problem: as Example 15-3 illustrates, adding cloning support needlessly complicates the constructor code itself. Especially when a separate clone( ) method can often be implemented far more cleanly in fewer lines of code, without modifying new( ) at all:

sub clone {
    my ($self) = @_;

    # Work out the object's class (and verify that it actually has one)...
    my $class = ref $self
        or croak( qq{Can't clone non-object: $self} );

    # Construct a new object,
    # copying the current object's state into the constructor's argument list...
    return $class->new({
        selector => $selector_of{ident $self},
        data     => [ @{ $data_of{ident $self} } ],
    });
}

Separating your new( ) and clone( ) methods makes it possible to accurately encode your intentions in any code that creates new objects. That, in turn, makes understanding and debugging that code very much easier. Separate creation methods also make your class's own code cleaner and more maintainable.

Note that the same reasoning and advice applies in any situation where you're tempted to overload the behaviour of a single method or subroutine to provide two or more related functions. Resist that urge.

Destructors

Always provide a destructor for every inside-out class.

The many advantages of inside-out classes described earlier come at almost no performance cost. Almost. The one respect in which they are marginally less efficient is their destructor requirements.

Hash-based classes often don't even have a destructor requirement. When the object's reference count decrements to zero, the hash is automatically reclaimed, and any data structures stored inside the hash are likewise cleaned up. This technique works so well that many OO Perl programmers find that they never need to write a DESTROY( ) method; Perl's built-in garbage collection handles everything just fine.

The only time that hash-based classes do need a destructor is when their objects are managing resources that are external to the objects themselves: databases, files, system processes, hardware devices, and so on. Because the resources aren't inside the objects (or inside the program, for that matter), they aren't affected by the object's garbage collection. Their "owner" has ceased to exist, but they remain: still reserved for the use of the program in question, but now completely unbeknownst to it.

So the general rule for Perl classes is: always provide a destructor for any object that manages allocated resources that are not actually located inside the object.
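
For example, here's a minimal sketch of a hash-based class that follows that rule. The Scratch::File class is hypothetical, but it illustrates the point: the file it creates on disk is external to the object, so only an explicit destructor will clean it up:

package Scratch::File;
use Carp;

sub new {
    my ($class, $filename) = @_;

    # Create the external resource: a real file on disk...
    open my $fh, '>', $filename
        or croak "Can't create scratch file '$filename': $!";

    return bless { fh => $fh, filename => $filename }, $class;
}

# The file on disk outlives the object unless it's explicitly removed,
# so clean it up when the object is garbage-collected...
sub DESTROY {
    my ($self) = @_;

    close $self->{fh};
    unlink $self->{filename};

    return;
}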

But the whole point of an inside-out object is that its attributes are stored in allocated hashes that are not actually located inside the object. That's precisely how it achieves secure encapsulation: by not sending the attributes out into the client code.

Unfortunately, that means when an inside-out object is eventually garbage-collected, the only storage that is reclaimed is the single blessed scalar implementing the object. The object's attributes are entirely unaffected by the object's deallocation, because the attributes are not inside the object, nor are they referred to by it in any way.

Instead, the attributes are referred to by the various attribute hashes in which they're stored. And because those hashes will continue to exist until the end of the program, the defunct object's orphaned attributes will likewise continue to exist, safely nestled inside their respective hashes, but now untended by any object. In other words, when an inside-out object dies, its associated attribute hashes leak memory.

The solution is simple. Every inside-out class has to provide a destructor that "manually" cleans up the attributes of the object being destructed. Example 15-4 shows the necessary addition to the File::Hierarchy class from Example 15-2.

Example 15-4. An inside-out class with its necessary destructor

package File::Hierarchy;
use Class::Std::Utils;
{

    # Objects of this class have the following attributes...
    my %root_of;   # The root directory of the file hierarchy
    my %files_of;  # An array storing an object for each file in the root directory

    # Constructor takes path of file system root directory...
    sub new {
         # [As in Example 15-2]
    }

    # Retrieve files from root directory...
    sub get_files {
        # [As in Example 15-2]
    }

    # Clean up attributes when object is destroyed...
    sub DESTROY {
        my ($self) = @_;

        delete $root_of{ident $self};
        delete $files_of{ident $self};

        return;
    }
}

The obligation to provide a destructor like this in every inside-out class can be mildly irritating, but it is still a very small price to pay for the considerable benefits that the inside-out approach otherwise provides for free. And the irritation can easily be eliminated by using the appropriate class construction tools, as explained under "Automating Class Hierarchies" in Chapter 16.

Methods

When creating methods, follow the general guidelines for subroutines.

Despite their obvious differences in dispatch semantics, methods and subroutines are similar in most respects. From a coding point of view, about the only significant difference between the two is that methods tend to have fewer parameters[99].

When you're writing methods, use the same approach to layout (Chapter 2), and the same naming conventions (Chapter 3), and the same argument-passing mechanisms and return behaviours (Chapter 9), and the same error-handling techniques (Chapter 13) as for subroutines.

The only exception to that advice concerns naming. Specifically, the "Homonyms" guideline in Chapter 9 doesn't apply to methods. Unlike subroutines, it's acceptable for a method to have the same name as a built-in function. That's because methods are always called with a distinctive syntax, so there's no possible ambiguity between:

$size = length $target;     # Stringify target object; take length of string

and:

$size = $target->length( );  # Call length( ) method on target object

It's important to be able to use builtin names for methods, because one of the commonest uses of object-oriented Perl is to create new data types, which often need to provide the same kinds of behaviours as Perl's built-in data types. If that's the case, then those behaviours ought to be named the same as well. For instance, the class in Example 15-5 is a kind of queue, so code that uses that class will be easier to write, and later comprehend, if the queue objects push and shift data using push( ) and shift( ) methods:

my $waiting_list = FuzzyQueue->new( );

# Load client names...
while (my $client = prompt 'Client: ') {
    $waiting_list->push($client);
}

# Then rotate the contents of the queue (approximately) one notch...
$waiting_list->push(  $waiting_list->shift( ) );

Naming those same methods append( ) and next( ) makes it slightly harder to work out what's going on (as you can't reason by analogy to Perl's builtins):

my $waiting_list = FuzzyQueue->new( );
# Load client names...
while (my $client = prompt('Client: ')) {
    $waiting_list->append($client);
}

# Then rotate the contents of the queue (approximately) one notch...
$waiting_list->append(  $waiting_list->next( ) );

Example 15-5. A mildly stochastic queue

# Implement a queue that's slightly blurry about where it adds new elements...
package FuzzyQueue;
use Class::Std::Utils;
use List::Util qw( max );
{
    # Attributes...
    my %contents_of;     # The array storing each fuzzy queue's data
    my %vagueness_of;    # How fuzzy should the queue be?

    # The usual inside-out constructor...
    sub new {
        my ($class, $arg_ref) = @_;

        my $new_object = bless anon_scalar( ), $class;

        $contents_of{ident $new_object} = [];
        $vagueness_of{ident $new_object}
            = exists $arg_ref->{vagueness} ? $arg_ref->{vagueness} : 1;

        return $new_object;
    }

    # Push each element somewhere near the end of queue...
    sub push {
        my ($self) = shift;

        # Unpack contents of queue...
        my $queue_ref = $contents_of{ident $self};

        # Grab each datum...
        for my $datum (@_) {
            # Scale the random fuzziness to the amount specified for this queue...
            my $fuzziness = rand $vagueness_of{ident $self};

            # Squeeze the datum into the array, using a negative number
            # to count (fuzzily) back from the end, but making sure not
            # to run off the front...
            splice @{$queue_ref}, max(-@{$queue_ref}, -$fuzziness), 0, $datum;
        }

        return;
    }

    # Grab the object's data and shift off the first datum (in a non-fuzzy way)...
    sub shift {
        my ($self) = @_;
        return shift @{ $contents_of{ident $self} };
    }
}

Accessors

Provide separate read and write accessors.

Most developers who write classes in Perl provide access to an object's attributes in the way that's demonstrated in Example 15-6.

That is, they write a single method[100] for each attribute, giving that method the same name as the attribute. Each accessor method always returns the current value of its corresponding attribute, and each can be called with an extra argument, in which case it also updates the attribute to that new value. For example:

# Create the new military record...
my $dogtag = Dogtag->new({ serial_num => 'AGC10178B' });

$dogtag->name( 'MacArthur', 'Dee' );    # Called with args, so store name attr
$dogtag->rank( 'General' );             # Called with arg, so store rank attr

# Called without arg, so just retrieve attribute values...
print 'Your new commander is: ', 
      $dogtag->rank(), $SPACE, $dogtag->name( )->{surname},
      "
";

print 'Her serial number is:  ', $dogtag->serial_num( ), "
";

This approach has the advantage of requiring only a single, obviously named method per attribute, which means less code to maintain. It also has the advantage that it's a widely known convention, used both throughout Perl's OO-related manpages and in numerous books.

However, despite those features, it's clearly not the best way to write accessor methods.

Example 15-6. The usual way accessor methods are implemented

package Dogtag;
use Class::Std::Utils;
{
    # Attributes...
    my %name_of;
    my %rank_of;
    my %serial_num_of;

    # The usual inside-out constructor...
    sub new {
        my ($class, $arg_ref) = @_;

        my $new_object = bless anon_scalar( ), $class;

        $serial_num_of{ident $new_object} =  $arg_ref->{serial_num};

        return $new_object;
    }

    # Control access to the name attribute...
    sub name {
        my ($self, $new_surname, $new_first_name) = @_;
        my $ident = ident($self);          # Factor out repeated calls to ident( )

        # No argument means return the current value...
        return $name_of{$ident} if @_ == 1;

        # Otherwise, store the two components of the new value...
        $name_of{$ident}{surname}    = $new_surname;
        $name_of{$ident}{first_name} = $new_first_name;

        return;
    }


    # Same deal for accessing the rank attribute...
    sub rank {
        my ($self, $new_rank) = @_;

        return $rank_of{ident $self} if @_ == 1;

        $rank_of{ident $self} = $new_rank;

        return;
    }

    # Serial numbers are read-only, so this accessor is much simpler...
    sub serial_num {
        my ($self) = @_;

        return $serial_num_of{ident $self};
    }

    # [Other methods of the class here]

    sub DESTROY {
        my ($self) = @_;
        my $ident = ident($self);     # Factor out repeated calls to ident( )

        for my $attr_ref (\%name_of, \%rank_of, \%serial_num_of) {
            delete $attr_ref->{$ident};
        };

        return;
    }
}

For a start, these dual-purpose methods suffer from some of the same drawbacks as the dual-purpose constructors that were advised against earlier (see the "Cloning" guideline). For example, this might or might not change the dogtag's name:

$dogtag->name(@curr_soldier);

depending on whether @curr_soldier is empty. That might sometimes be very desirable behaviour, but it can also mask some very subtle bugs if it's not what was intended. Either way, a dual-purpose accessor doesn't always give you the ability to encode your intentions unambiguously.

The combined store/retrieve methods are also marginally less efficient than they could be, as they have to perform an extra conditional test every time they're called, in order to work out what they're supposed to do. Comparisons of this kind are very cheap, so it's not a big deal—at least, not until your system scales to the point where you're doing a very large number of accesses.

The final problem with this approach is subtler and more profound; in fact, it's psychological. There's actually a nasty flaw in the code of one of the accessors shown in Example 15-6. It's comparatively hard to see because it's a sin of omission. It bites developers because of the way they naturally think.

The problem is in the serial_num( ) method: unlike the other two accessors, it isn't dual-purpose. The consistent get/set behaviour of the name( ) and rank( ) methods[101] sets up and then reinforces a particular expectation: pass an argument, update the attribute.

So it's natural to expect that the following will also work as intended:

# convert from old serial numbers to the new prefixed scheme...
for my $dogtag (@division_personnel) {
    my $old_serial_num = $dogtag->serial_num( );
    $dogtag->serial_num( $division_code . $old_serial_num );
}

But, of course, it doesn't work at all. Worse, it fails silently. The call to serial_num( ) completely ignores any arguments passed to it, and quietly goes about its sole task of returning the existing serial number, which is then silently thrown away. Debugging these kinds of problems can be exceptionally difficult, because your brain gets in the way. Having subliminally recognized the "pass argument; set attribute" pattern, your brain will have filed that belief away as one of the axioms of the class, and when it later sees:

$dogtag->serial_num( $division_code . $old_serial_num );

it automatically excludes the possibility that that statement could possibly be the cause of the program's misbehaviour. You're passing an argument, so it must be updating the attribute. That's a given. The problem must be somewhere else.

Of course, none of this happens at a conscious level. You just automatically ignore the offending line and start debugging the hard way, tracing the data back to see where it "got corrupted" and then forward to see where it "gets erased". Finally, after a couple of fruitless, frustrating hours some weedy intern on his very first day, being shown around by your boss, will glance over your shoulder, look straight at the serial_num( ) call, and point out your "obvious" error.

The real problem here isn't your brain's psychological blind-spot; the real problem is that the rest of the dual-purpose accessors are guessing your intention from the data sent to them. But the single-purpose serial_num( ) doesn't need to guess; it always knows exactly what to do. The natural, human response is to rejoice in that certainty and simply code for what you know the method should always do, rather than catering for what others might potentially think it could do.

The problem isn't hard to solve, of course. You simply rewrite serial_num( ) to anticipate and avoid the inevitable psychological trap:

# Serial numbers are read-only, so this accessor is much simpler...
sub serial_num {
    my ($self) = @_;

    croak q{Can't update serial number} if @_ > 1;

    return $serial_num_of{ident $self};
}

Unfortunately, very few developers ever do that. It's easier not to write the extra line. And it's much easier not to have to ponder the gestalt psychodynamic ramifications of the class on the collective developer consciousness in order to work out that you needed to write that extra line in the first place.

Under the dual-purpose accessor idiom, the natural inclination to omit that "unnecessary" code leaves the interpreter unable to diagnose a common mistake. Fortunately, it isn't difficult to turn those consequences around, so that leaving unnecessary code out causes the interpreter to diagnose the mistake. All you need to do is split the two distinct access tasks into two distinct methods, as shown in Example 15-7.

Example 15-7. A better way to implement class accessors

    # Control access to the name attribute...
    sub set_name {
        my ($self, $new_surname, $new_first_name) = @_;

        # Check that all arguments are present and accounted for...
        croak( 'Usage: $obj->set_name($new_surname, $new_first_name)' )
            if @_ < 3;

        # Store components of new value in a hash...
        $name_of{ident $self}{surname}    = $new_surname;
        $name_of{ident $self}{first_name} = $new_first_name;

        return;
    }

    sub get_name {
        my ($self) = @_;
        return $name_of{ident $self};
    }

    # Same deal for accessing the rank attribute...

    sub set_rank {
        my ($self, $new_rank) = @_;

        $rank_of{ident $self} = $new_rank;

        return;
    }

    sub get_rank {
        my ($self) = @_;
        return $rank_of{ident $self};
    }

    # Serial numbers are read-only, so there's no set_serial_num( ) accessor...
    sub get_serial_num {
        my ($self) = @_;
        return $serial_num_of{ident $self};
    }

Here, each accessor that returns a value just returns that value, whereas each accessor that stores a value expects a second argument (the new value), uses it to update the attribute, and then returns nothing.

Any code that uses these accessors will now explicitly record the developer's intention for each accessor call:

# Create the new military record...
my $dogtag = Dogtag->new( {serial_num => 'AGC10178B'} );

$dogtag->set_name( 'MacArthur', 'Dee' );
$dogtag->set_rank( 'General' );

# Retrieve attribute values...
print 'Your new commander is: ',
      $dogtag->get_rank(), $SPACE, $dogtag->get_name( )->{surname}, "\n";

print 'Her serial number is:  ', $dogtag->get_serial_num( ), "\n";

The code is also now slightly easier to read, because you can tell at a glance whether a particular accessor call is updating or retrieving an attribute value. So the former "reminder" comments (# Called with arg, so store name attr) are no longer necessary; the code is now self-documenting in that respect.

More importantly, no-one is ever going to mistakenly write:

$dogtag->get_serial_num( $division_code . $old_serial_num );

Human brains don't misbehave that particular way—which means you don't have to remember to have get_serial_num( ) test for that possibility.

That's not to say that developers who use the class won't still misgeneralize the getting-vs-storing axiom. They will. But now, having successfully called set_name( ) and set_rank( )[102], the rule they'll mistakenly devise is: "call set_whatever( ); update an attribute". Hence when they erroneously try to update the serial number, what they'll write is:

$dogtag->set_serial_num( $division_code . $old_serial_num );

At which point the interpreter will immediately shoot to kill:

Can't locate object method "set_serial_num" via package "Dogtag"
at rollcall.pl line 99

Now the natural programmer tendency to leave out extraneous code is actually working in your favour. By not implementing set_serial_num( ), you've ensured that any erroneous attempts to use it are automatically detected, and loudly reported.

Implementing separate "get" and "set" accessors for attributes offers a significant improvement in readability and self-documentation, and even a marginal boost in performance. By using distinct method names for distinct operations, you can better encode your intentions in your source code, use one human frailty (under-exertion) to guard against another (overgeneralization) and—best of all—convince the compiler to debug your colleagues' miswired brains for you.

Lvalue Accessors

Don't use lvalue accessors.

Since Perl 5.6, it has been possible to specify a subroutine that returns a scalar result as an lvalue, which can then be assigned to. So another popular approach to implementing attribute accessor methods has arisen: using lvalue subroutines, as in Example 15-8.

Example 15-8. Another way to implement accessor methods

# Provide access to the name attribute...
sub name :lvalue {
    my ($self) = @_;
    return $name_of{ident $self};
}

sub rank :lvalue {
    my ($self) = @_;
    return $rank_of{ident $self};
}

# Serial numbers are read-only, so not lvalue...
sub serial_num  {
    my ($self) = @_;
    return $serial_num_of{ident $self};
}

The resulting code is certainly much more concise. And, perhaps surprisingly, the return to a single accessor per attribute doesn't reinstate the problems of uncertain intention leading to invisible errors, because the accessors would now be used differently, with a clear syntactic distinction between storing and retrieving:

# Create the new military record...
my $dogtag = Dogtag->new( {serial_num => 'AGC10178B'} );

# Store attribute values...
$dogtag->name = {surname=>'MacArthur', first_name=>'Dee'}; 
$dogtag->rank = 'General';

# Retrieve attribute values...
print 'Your new commander is: ', 
      $dogtag->rank(), $SPACE, $dogtag->name( )->{surname}, "\n";

print 'Her serial number is:  ',
      $dogtag->serial_num( ), "\n";

And, now, if overgeneralization again leads to a misguided attempt to update the serial number:

$dogtag->serial_num( ) = $division_code . $old_serial_num;

the compiler will again detect and report the problem:

Can't modify non-lvalue subroutine call at rollcall.pl line 99

This certainly looks like a viable alternative to separate getting and storing. It requires less code and handles the psychology just as well. Unfortunately, lvalue methods are less reliable and less maintainable.

They're unreliable because they remove all your carefully crafted encapsulation from around the object, by granting direct and unrestricted access to its attributes. That is, a call such as $obj->name( ) is now identical to a direct access like $name_of{$obj}. So you can no longer guarantee that your Dogtag objects store their name information under the correct keys, or even in a hash at all.

For example, the set_name( ) method in Example 15-7 ensures that both names are passed and then stored in a hash in the appropriate attribute entry, so a misuse like this:

$dogtag->set_name('Dee MacArthur');

throws an immediate exception:

Usage: $obj->set_name($new_surname, $new_first_name) at 'promote.pl' line 33

But using the equivalent lvalue name( ) accessor from Example 15-8 doesn't do any data validation; it just returns the attribute storage, with which client code can then have its wicked way:

$dogtag->name = 'Dee MacArthur';

That string is assigned directly to the internal $name_of{ident $dogtag} attribute, which is supposed to store only a hash reference. So any other methods that rely on $name_of{ident $self} being a hash reference:

# much later...
$dogtag->log_orders($orders);

are going to produce unexpected and hard-to-debug errors, because the object's internal state is no longer as expected:

Can't use string ("Dee MacArthur") as a HASH ref 
while "strict refs" in use at 'promote.pl' line 702

Lvalue accessors also make it very much harder to extend or improve your class. Get/set accessors retain control over how attributes are accessed, so if you need to add some sanity checking when military ranks are updated, that's relatively easy to accommodate. For example, you might create a look-up table of known military ranks and a utility subroutine to verify that its argument is a known rank (or die trying):

# Create look-up table of known ranks...
Readonly my @KNOWN_RANKS => (
#   Enlisted...           Commissioned...
    'Private',            'Lieutenant',
    'PFC',                'Captain',
    'Corporal',           'Colonel',
    'Sergeant',           'General',
    # etc.                etc.
);
Readonly my %IS_KNOWN_RANK => map { $_ => 1 } @KNOWN_RANKS;

# Utility subroutine to vet new "rank" values....
sub _check_rank {
    my ($rank) = @_;

    return $rank if $IS_KNOWN_RANK{$rank};

    croak "Can't set unknown rank ('$rank')";}

It would then be trivial to modify the set_rank( ) accessor from Example 15-7 to apply that check every time a dogtag's rank attribute is updated:

sub set_rank {
    my ($self, $new_rank) = @_;

    # New rank now checked first...
    $rank_of{ident $self} = _check_rank($new_rank);  

    return;
}

On the other hand, there's no way to add this same check to the lvalue rank( ) accessor from Example 15-8, except by resorting to a tied variable (which is not an acceptable solution—see Chapter 19).
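To see why, consider the obvious (and futile) attempt, sketched below. Any check placed inside the lvalue accessor runs when the accessor is called, which is before the client code performs its assignment, so the check can never see the incoming value:

# A naive attempt at validation in an lvalue accessor
# (a sketch only; it does not actually validate anything)...
sub rank :lvalue {
    my ($self) = @_;

    # This runs *before* the client's assignment, so it can only
    # ever inspect the old rank, never the new one...
    _check_rank( $rank_of{ident $self} );

    # The caller then assigns directly into the returned storage,
    # bypassing any validation entirely...
    return $rank_of{ident $self};
}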

Indirect Objects

Don't use the indirect object syntax.

Quite simply: indirect object syntax is ambiguous. Whereas an "arrowed" method call is certain to call the corresponding method:

my $female_parent = $family->mom( );
my $male_parent   = $family->pop( );

with an indirect object call, the outcome is not at all certain:

my $female_parent = mom $family;    # Sometimes the same as: $family->mom( )
my $male_parent   = pop $family;    # Never the same as: $family->pop( )

The pop( ) case is fairly obvious: Perl assumes you're calling the built-in pop function . . . and then complains that it's not being applied to an array[103]. The potential problem in the mom( ) case is a little more subtle: if there's a mom( ) subroutine declared in the package in which mom $family is called, then Perl will interpret that call as mom($family) instead (that is, as a subroutine call, rather than as a method call).
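A stray subroutine declaration is all it takes to trigger that reinterpretation. For example (a sketch, using a hypothetical mom( ) subroutine):

# Declared earlier in the current package...
sub mom {
    my ($family) = @_;
    return 'Definitely not the Family::mom( ) method';
}

# ...which means this "indirect method call"...
my $female_parent = mom $family;

# ...is silently compiled as the subroutine call: mom($family)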

Unfortunately, that particular problem often bites under the most common use of the indirect object syntax: constructor calls. Many programmers who would otherwise never write indirect object method calls will happily call their constructors that way:

my $uniq_id = new Unique::ID;

The problem is that they often do this kind of thing in the method of some other class. For example, they might decide to improve the Dogtag class by using Unique::ID objects as serial numbers:

package Dogtag;
use Class::Std::Utils;
{
    # Attributes...
    my %name_of;
    my %rank_of;
    my %serial_num_of;

    # The usual inside-out constructor...
    sub new {
        my ($class, $arg_ref) = @_;

        my $new_object = bless anon_scalar( ), $class;

        # Now using special objects to ensure serial numbers are unique...
        $serial_num_of{ident $new_object} = new Unique::ID;

        return $new_object;
    }

That approach works fine, until they decide they need to factor it out into a separate class method:

    # The usual inside-out constructor...
    sub new {
        my ($class, $arg_ref) = @_;

        my $new_object = bless anon_scalar( ), $class;

        # Now allocating serial numbers polymorphically...
        $serial_num_of{ident $new_object} =  $class->_allocate_serial_num( );

        return $new_object;
    }

    # Override this method in any derived class that needs a
    # different serial number allocation mechanism...
    sub _allocate_serial_num {
        return new Unique::ID;
    }

As soon as they make this change, the first call to Dogtag->new( ) produces the exception:

Can't locate object method "_allocate_serial_num" via package "Unique::ID"
at Dogtag.pm line 17.

where line 17 is (mysteriously) the assignment:

        $serial_num_of{ident $new_object} =  $class->_allocate_serial_num( );

What happened? Previously, when the new Unique::ID call was still directly inside new( ), that call had to be compiled before new( ) itself could be completely defined. Thus, when the compiler looked at the call, there was—as yet—no subroutine named new( ) defined in the current package, so Perl interpreted new Unique::ID as an indirect method call.

But once the new Unique::ID call has been factored out into a method that's defined after new( ), then the call will be compiled after the compilation of new( ) is complete. So, this time, when the compiler looks at that call, there is a subroutine named new( ) already defined in the current package. So Perl interprets new Unique::ID as a direct unparenthesized subroutine call (to the subroutine Dogtag::new( )) instead. Which means that it immediately calls Dogtag::new( ) again, this time passing the string 'Unique::ID' as the sole argument. And when that recursive call to new( ) reaches line 17 again, $class will now contain the 'Unique::ID' string, so the $class->_allocate_serial_num( ) call will attempt to call the non-existent method Unique::ID::_allocate_serial_num( ), and the mysterious exception will be thrown.
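If you're ever unsure how Perl has actually parsed a call like this, the standard B::Deparse module can show you the compiler's interpretation directly. For example (a diagnostic aside, assuming the class lives in Dogtag.pm):

# Regenerate the source as the compiler understood it, with explicit parentheses...
perl -MO=Deparse,-p Dogtag.pm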

That code is hard enough to debug, but it could also have gone wrong in a much more subtle and silent way. Suppose the Unique::ID class actually did happen to have its own _allocate_serial_num( ) method. In that case, the recursive call from Dogtag::_allocate_serial_num( ) back into the Dogtag constructor wouldn't fail; it would instead put whatever value was returned by the call to Unique::ID->_allocate_serial_num( ) into the $serial_num_of{ident $new_object} attribute of the object being created by the recursive Dogtag constructor call, and then return that object. Back in the original constructor call, that Dogtag object would then be assigned to yet another $serial_num_of{ident $new_object} attribute: this time the one for the object created in the non-recursive constructor call. The outermost constructor would also succeed and return its own Dogtag object.

But, now, instead of having a Unique::ID object for its serial number, that final Dogtag object would possess a serial number that consisted of a (nested) Dogtag object, whose own serial number attribute would contain whatever kind of value Unique::ID::_allocate_serial_num( ) happened to return: perhaps a Unique::ID object, or possibly a raw string, or maybe even just undef (if Unique::ID::_allocate_serial_num( ) happened to be a mutator method that merely updates its own object and doesn't return a value at all).

Pity the poor maintenance programmer who has to unravel that mess[104].

Indirect object method calls are ambiguous, brittle, fickle, and extremely context-sensitive. They can be broken simply by moving them about within a file, or by declaring an entirely unrelated subroutine somewhere else in the current package. They can lead to complex and subtle bugs. Don't use them.
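The arrowed equivalents are always available and are never ambiguous. For instance, the constructor call and the factored-out allocator from the earlier example could simply be written with arrows instead:

# Unambiguous arrowed constructor call...
my $uniq_id = Unique::ID->new( );

# Override this method in any derived class that needs a
# different serial number allocation mechanism...
sub _allocate_serial_num {
    return Unique::ID->new( );
}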

Class Interfaces

Provide an optimal interface, rather than a minimal one.

When it comes to designing the interface of a class, developers are often advised to follow Occam's Razor and avoid multiplying their methods unnecessarily. The result is all too often a class that offers only the absolute minimal set of functionality, as in Example 15-9.

Example 15-9. A bit-string class with the smallest possible interface

package Bit::String;
use Class::Std::Utils;
{
    Readonly my $BIT_PACKING => 'b*';    # i.e. vec( ) compatible binary
    Readonly my $BIT_DENSITY => 1;       # i.e. 1 bit/bit

    # Attributes...
    my %bitset_of;

    # Internally, bits are packed eight-to-the-character...
    sub new {
        my ($class, $arg_ref) = @_;

        my $new_object = bless anon_scalar( ), $class;

        $bitset_of{ident $new_object} 
            = pack $BIT_PACKING, join $EMPTY_STR, map { $_ ? 1 : 0 } @{$arg_ref->{bits}};

        return $new_object;
    }

    # Retrieve a specified bit...
    sub get_bit {
        my ($self, $bitnum) = @_;

        return vec($bitset_of{ident $self}, $bitnum, $BIT_DENSITY);
    }

    # Update a specified bit...
    sub set_bit {
        my ($self, $bitnum, $newbit) = @_;

        vec($bitset_of{ident $self}, $bitnum, $BIT_DENSITY) = $newbit ? 1 : 0;

        return 1;
    }
}

Rather than enhancing maintainability, classes like that often reduce it, because they force developers who are using the class to invent their own sets of utility subroutines for frequent tasks:

# Convenience subroutine to flip individual bits...
sub flip_bit_in {
    my ($bitset_obj, $bitnum) = @_;

    my $bit_val = $bitset_obj->get_bit($bitnum);
    $bitset_obj->set_bit( $bitnum, !$bit_val );

    return;
}

# Convenience subroutine to provide a string representation of the bits...
sub stringify {
    my ($bitset_obj) = @_;

    my $bitstring = $EMPTY_STR;
    my $next_bitnum = 0;

    RETRIEVAL :
    while (1) {
        my $nextbit = $bitset_obj->get_bit($next_bitnum++);
        last RETRIEVAL if !defined $nextbit;

        $bitstring .= $nextbit;
    }

    return $bitstring;
}

And that's definitely "sets" (plural), because it's highly likely that every developer—or at least every project team—will develop a separate set of these utility subroutines. And it's also likely—because of the strong encapsulation provided by inside-out objects—that every one of those sets of utility subroutines will be just as inefficient as the ones shown earlier.

Don't be afraid to provide optimized methods for the common usages. Implementing frequently used procedures internally, as in Example 15-10, often makes those utilities far more efficient, as well as making the class itself more useful and user-friendly.

Example 15-10. A bit-string class with a more useful interface

package Bit::String;
use Class::Std::Utils;
{
    Readonly my $BIT_PACKING => 'b*';    # i.e. vec( ) compatible binary
    Readonly my $BIT_DENSITY => 1;       # i.e. 1 bit/bit

    # Attributes...
    my %bitset_of;

    sub new {
        # [As in Example 15-9]
    }

    sub get_bit {
        # [As in Example 15-9]
    }

    sub set_bit {
        # [As in Example 15-9]
    }

    # Convenience method to flip individual bits...
    sub flip_bit {
        my ($self, $bitnum) = @_;

        vec($bitset_of{ident $self}, $bitnum, $BIT_DENSITY)
            = !vec($bitset_of{ident $self}, $bitnum, $BIT_DENSITY);

        return;
    }


    # Convenience method to provide a string representation of the bits...
    sub as_string {
        my ($self) = @_;

        return join $EMPTY_STR, unpack $BIT_PACKING, $bitset_of{ident $self};
    }
}

Convenience methods can also dramatically improve the readability and self-documentation of the resulting client code:

$curr_state->flip_bit($VERBOSITY_BIT);

print 'The current state is: ', $curr_state->as_string( ), "\n";

Because, if they aren't provided, the developers may not choose to devise their own utility subroutines, preferring instead to cut and paste nasty, incomprehensible fragments like:

$curr_state->set_bit($_, !$curr_state->get_bit($_)) for $VERBOSITY_BIT;

print 'The current state is: ', 
      do {
          my @bits;
          while (defined(my $bit = $curr_state->get_bit(scalar @bits))) {
              push @bits, $bit;
          }
          @bits;
      },
      "\n";

Operator Overloading

Overload only the isomorphic operators of algebraic classes.

Operator overloading is very tempting. It offers the prospect of being able to express operations of your new data type in a compact and syntactically distinctive way. Unfortunately, overloading operators more often produces code that is both hard to comprehend and vastly less maintainable. For example:

# Special string class with useful operators...

package OpString;
{
    use overload (
        '+'   => 'concatenate',
        '-'   => 'up_to',
        '/'   => 'either_or',
        '<=>' => 'swap_with',
        '~'   => 'optional',

        # Use Perl standard behaviours for other operations...
        fallback => 1,
    );
}

# And later...

$search_for = $MR/$MRS + ~$first_name + $family_name;
 
$allowed_pet_range = $CAT-$DOG;

$home_phone <=> $work_phone;

Though the resulting client code is compact, the non-standard usages of the various operators make it much harder to understand and maintain, compared to:

package OpString;
{
    use overload (
        '.'   => 'concatenate',

        # Use Perl standard behaviours for other operations...
        fallback => 1,
    );
}

# And later...

$search_for = $MR->either_or($MRS) . $first_name->optional( ) . $family_name;

$allowed_pet_range = $CAT->up_to($DOG);

$home_phone->swap_with($work_phone);

Note that overloading the "dot" operator was perfectly acceptable here, as it (presumably) works just like Perl's built-in string concatenator.

Overloading other operators can make good sense (and good code), provided two conditions are met. First, the operators you choose to overload must match the standard algebraic notation within the problem's native domain: the set of operators that the domain experts routinely use. Second, the standard domain-specific notation you're recreating in your Perl class must conform to the Perlish precedences and associativities of the operators you're overloading.

Together, those two conditions ensure that the appearance of the selected Perl operator mirrors that of the desired problem domain operator, and that the algebraic properties (precedence and associativity) of the problem domain operator mirror those of the selected Perl operator. In other words, there must be a one-to-one correspondence of form and function: the two notations must be isomorphic.

For example, if your domain experts use the operators +, ., and ! on certain types of values, then it may be appropriate to overload those Perl operators for the corresponding class. However, if those domain experts treat . as being of higher precedence than + (as many mathematicians do), then overloading the corresponding Perl operators isn't appropriate, because . and + have the same precedence in Perl. That kind of mismatch between expectation and reality always leads to hard-to-find bugs.
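To see the problem concretely, Perl parses + and . at the same precedence level, associating to the left, so (with $x, $y, and $z standing in for objects of the hypothetical overloaded class):

my $result = $x + $y . $z;      # Perl parses this as: ($x + $y) . $z

# ...whereas a domain in which . binds more tightly than + would expect:
my $expected = $x + ($y . $z);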

On the other hand, if the domain experts use ±, ·, and ¬, it's definitely inappropriate to overload the Perl operators +, ., and ! to represent them. The notation isn't the same, so it won't help those who understand the domain to understand your code. In fact, the mismatch of algebraic syntax is far more likely to get in the way.

Coercions

Always consider overloading the boolean, numeric, and string coercions of objects.

When an object reference is used as a boolean, it always evaluates to true by default, so:

croak( q{Can't use non-zero value} ) if $fuzzynum;

always throws an exception, even when $fuzzynum contains 0±0.

An even more serious problem arises when object references are treated as numbers: by default, they numerify to the integer value of their memory address. That means that a statement like:

$constants[$fuzzynum] = 42;

is really something like:

$constants[0x256ad1f3] = 42;

which is:

$constants[627757555] = 42;

which will almost certainly segfault when it tries to allocate six hundred million elements in the @constants array.

A similar problem arises if an object is used where a string is expected:

my $fuzzy_pi = Num::Fuzzy->new({val => 3.1, plus_or_minus => 0.0416});

# And later...

print "Pi is $fuzzy_pi
";     # $fuzzy_pi expected to interpolate a string

In a string context, the object's reference is converted to a debugging value that specifies the class of the object, its underlying data type, and its hexadecimal memory address. So the previous print statement would print something like:

Pi is Num::Fuzzy=SCALAR(0x256ad1f3)

The developer was probably hoping for something more like:

Pi is 3.1±0.0416

All of these problems occur because objects in Perl are almost always accessed via references. And those references behave like objects only when they're specifically used like objects (i.e., when methods are called on them). When they're used like values (as in the examples), they behave like reference values. The resulting bugs can be particularly hard to discover, and even harder to diagnose once they're noticed.

It's good practice to overload the boolean, numeric, and string coercion behaviour of objects to do something useful and expected. For example:

package Num::Fuzzy;
use charnames qw( :full );
{
    use overload (
        # Ignore the error range when converting to a number...
        q{0+} => sub {
            my ($self) = @_;
            return $self->get_value( );
        },

        # Only true if the range of possible values doesn't include zero...
        q{bool} => sub {
            my ($self) = @_;
            return ! $self->range_includes(0);
        },

        # Convert to a string showing the value and its error range...
        q{""} => sub {
            my ($self) = @_;
            return $self->get_value( )
                   . "\N{PLUS-MINUS SIGN}"
                   . $self->get_fuzziness( );
        },

        # Use Perl standard behaviours for other operations...
        fallback => 1,
    );

    # etc.
}

In many classes, the most useful thing to do is simply to signal that attempting the coercion was a bad idea:

package Process::Queue;
use Carp;
{
    use overload (
        # Type coercions don't make sense for process queues...
        q{0+} => sub {
            croak( q{Can't numerify a Process::Queue } );
        },

        q{bool} => sub {
            croak( q{Can't get the boolean value of a Process::Queue } );
        },

        q{""} => sub {
            croak( q{Can't get the string value of a Process::Queue } );
        },

        # Use Perl standard behaviours for other operations...
        fallback => 1,
    );

    # etc.
}

This last example, suitably adapted, makes an excellent default for any class.



[91] Object Oriented Perl (Manning, 1999) gives a comprehensive overview of the main techniques.

[92] An excellent starting point for exploring these alternatives in Perl is the book Higher-Order Perl, by Mark Jason Dominus (Morgan Kaufmann, 2005).

[93] Such as the automatic interconversion of strings and numbers, and the autovivification of non-existent hash and array elements.

[94] For details of the numerous problems with the pseudohash construct, see Chapters 4 and 6 of Object Oriented Perl (Manning, 1999).

[95] You know, the type of safety measures that are effective only against well-meaning, law-abiding folk, for whom they're not actually needed. Like airport security.

[96] Okay, so there's a small fudge there: the hash-based versions could each save three lines by leaving out the comments describing the class attributes. Of course, in that case the two versions, although still functionally identical, would no longer be identically maintainable.

[97] They can be made thread-safe, too, provided each attribute hash is declared as being :shared and the attribute entries themselves are consistently passed to lock( ) before each attribute access. See the perlthrtut documentation for more details.

[98] Including subroutine-based objects, "flyweight" objects, and the Class::Securehash module—see Chapter 11 of Object Oriented Perl (Manning, 1999).

[99] If that proves not to be the case, you should probably re-evaluate your design. Do certain combinations of arguments regularly appear together? Perhaps they ought to be encapsulated in an object of their own that's then passed to the method. Or maybe they ought to be attributes of the invocant itself.

[100] Sometimes referred to as a mutator.

[101] And of the future billet( ) and company( ) and platoon( ) and assignment( ) and service_history( ) and fitrep( ) and medical_record( ) and citations( ) and shoesize( ) methods.

[102] And set_billet( ) and set_company( ) and set_platoon( ) and . . . aw, you get the idea.

[103] Even that helpful message can be confusing when you're working in a method-call mindset: "I thought methods could be called only on scalars? And why would the Family::pop( ) method require a polygamous array of families anyway?"

[104] If you got lost reading the explanation of this problem, you can no doubt imagine how hard it would be to debug the error in live code.
