Chapter 2. Code Layout

Most people's […] programs should be indented six feet downward and covered with dirt

Blair P. Houghton

Formatting. Indentation. Style. Code layout. Whatever you choose to call it, it's one of the most contentious aspects of programming discipline. More and bloodier wars have been fought over code layout than over just about any other aspect of coding.

So what is the best practice here? Should you use classic Kernighan & Ritchie (K&R) style? Or go with BSD code formatting? Or adopt the layout scheme specified by the GNU project? Or conform to the Slashcode coding guidelines?

Of course not! Everyone knows that [insert your personal coding style here] is the One True Layout Style, the only sane choice, as ordained by [insert your favorite Programming Deity here] since Time Immemorial! Any other choice is manifestly absurd, willfully heretical, and self-evidently a Work of Darkness!!!

And that's precisely the problem. When deciding on a layout style, it's hard to decide where rational choices end and rationalized habits begin.

Adopting a coherently designed approach to code layout, and then applying that approach consistently across all your coding, is fundamental to best practice programming. Good layout can improve the readability of a program, help detect errors within it, and make the structure of your code much easier to comprehend. Layout matters.

But most coding styles—including the four mentioned earlier—confer those benefits almost equally well. So while it's true that having a consistent code layout scheme matters very much indeed, the particular code layout scheme you ultimately decide upon … does not matter at all!

All that matters is that you adopt a single, coherent style; one that works for your entire programming team. And, having agreed upon that style, that you then apply it consistently across all your development.

The layout guidelines suggested in this chapter have been carefully and consciously selected from many alternatives, in a deliberate attempt to construct a coding style that is self-consistent and concise, that improves the readability of the resulting code, that makes it easy to detect coding mistakes, and that works well for a wide range of programmers in a wide range of development environments.

Undoubtedly, there will be some layout guideline here that you disagree with. Probably violently. When you find it, come back and reread the last five words of the earlier paragraph that begins "But most coding styles". Then decide whether the reasons for your disagreement outweigh the reasons given for the guideline. If they do, then not following that particular guideline won't matter at all.

Bracketing

Brace and parenthesize in K&R style.

When setting out a code block, use the K&R[5] style of bracketing. That is, place the opening brace at the end of the construct that controls the block. Then start the contents of the block on the next line, and indent those contents by one indentation level. Finally, place the closing brace on a separate line, at the same indentation level as the controlling construct.

Likewise, when setting out a parenthesized list over multiple lines, put the opening parenthesis at the end of the controlling expression; arrange the list elements on the subsequent lines, indented by one level; and place the closing parenthesis on its own line, outdenting it back to the level of the controlling expression. For example:

my @names = (
    'Damian',    # Primary key
    'Matthew',   # Disambiguator
    'Conway',    # General class or category
);

for my $name (@names) {
    for my $word ( anagrams_of(lc $name) ) {
        print "$word\n";
    }
}

Don't place the opening brace or parenthesis on a separate line, as is common under the BSD and GNU styles of bracketing:

# Don't use BSD style...
my @names =
(
    'Damian',    # Primary key
    'Matthew',   # Disambiguator
    'Conway',    # General class or category
);

for my $name (@names)
{
    for my $word (anagrams_of(lc $name))
    {
        print "$word\n";
    }
}

# And don't use GNU style either...

for my $name (@names)
  {
    for my $word (anagrams_of(lc $name))
      {
        print "$word\n";
      }
  }

The K&R style has one obvious advantage over the other two styles: it requires one fewer line per block, which means one more line of actual code will be visible at any time on your screen. If you're looking at a series of blocks, that might add up to three or four extra code lines per screen.

The main counter-argument in favour of the BSD and GNU styles is usually that having the opening bracket[6] on its own line makes it easier to visually match up the start and end of a block or list. But this argument ignores the fact that it's equally easy to match them up under K&R style. You just scroll upwards until you "bump your head" on the overhanging control construct, then scroll right to the end of the line.

Or, more likely, you'd just hit whatever key your editor uses to bounce between matched brackets. In vi that's %. Emacs doesn't have a native "bounce" command, but it's easy to create one by adding the following to your .emacs file[7]:

;; Use % to match various kinds of brackets...
(global-set-key "%" 'match-paren)
(defun match-paren (arg)
  "Go to the matching paren if on a paren; otherwise insert %."
  (interactive "p")
  (let ((prev-char (char-to-string (preceding-char)))
        (next-char (char-to-string (following-char))))
    (cond ((string-match "[[{(<]" next-char) (forward-sexp 1))
          ((string-match "[]})>]" prev-char) (backward-sexp 1))
          (t (self-insert-command (or arg 1))))))

More importantly, finding the matching brace or parenthesis is rarely a goal in itself. Most often you're interested in the closing bracket only because you need to determine where the current construct (for loop, if statement, or subroutine) ends. Or you want to determine which construct a particular closing bracket terminates. Both those tasks are marginally easier under K&R style. To find the end of a construct, just look straight down from the construct's keyword; to find what construct a particular bracket terminates, scan straight up until you hit the construct's keyword.

In other words, the BSD and GNU styles make it easy to match the syntax of brackets, whereas K&R makes it easy to match the semantics of brackets. That being said, there is nothing wrong with the BSD or GNU styles of bracketing. If you, and your fellow developers, find that vertically aligned brackets improve your comprehension of code, then use them instead. What matters most is that all the members of your programming team agree on a single style and use it consistently.

Keywords

Separate your control keywords from the following opening bracket.

Control structures regulate the dynamic behaviour of a program, so the keywords of control structures are amongst the most critical components of a program. That's why it's important that those keywords stand out clearly in the source code.

In Perl, most control structure keywords are immediately followed by an opening parenthesis, which can make it easy to confuse them with subroutine calls. It's important to distinguish the two. To do this, use a single space between a keyword and the following brace or parenthesis:

for my $result (@results) {
    print_sep();
    print $result;
}

while ($min < $max) {
    my $try = int(($min + $max) / 2);
    if ($value[$try] < $target) {
        $min = $try + 1;
    }
    else {
        $max = $try;
    }
}

Without the intervening space, it's harder to pick out the keyword, and easier to mistake it for the start of a subroutine call:

for(@results) {
    print_sep();
    print;
}

while($min < $max) {
    my $try = int(($min + $max) / 2);
    if($value[$try] < $target) {
        $min = $try + 1;
    }
    else{
        $max = $try;
    }
}

Subroutines and Variables

Don't separate subroutine or variable names from the following opening bracket.

In order for the previous rule to work properly, it's important that subroutines and variables not have a space between their names and any following brackets. Otherwise, it's too easy to mistake a subroutine call for a control structure, or misread the initial part of an array element as an independent scalar variable.

So cuddle subroutine calls and variable names against their trailing parentheses or braces:

my @candidates = get_candidates($marker);

CANDIDATE:
for my $i (0..$#candidates) {
    next CANDIDATE if open_region($i);

    $candidates[$i]
        = $incumbent{ $candidates[$i]{region} };
}

Spacing them out only makes them harder to recognize:

my @candidates = get_candidates ($marker);

CANDIDATE:
for my $i (0..$#candidates) {
    next CANDIDATE if open_region ($i);

    $candidates [$i]
        = $incumbent {$candidates [$i] {region}};
}

Builtins

Don't use unnecessary parentheses for builtins and "honorary" builtins.

Perl's many built-in functions are effectively keywords of the language, so they can legitimately be called without parentheses, except where it's necessary to enforce precedence.

Calling builtins without parentheses reduces clutter in your code, and thereby enhances readability. The lack of parentheses also helps to visually distinguish between subroutine calls and calls to builtins:

while (my $record = <$results_file>) {
    chomp $record;
    my ($name, $votes) = split "\t", $record;
    print 'Votes for ',
          substr($name, 0, 10),        # Parens needed for precedence
          ": $votes (verified)\n";
}

Certain imported subroutines, usually from modules in the core distribution, also qualify as "honorary" builtins, and may be called without parentheses. Typically these will be subroutines that provide functionality that ought to be in the language itself but isn't. Examples include carp and croak (from the standard Carp module—see Chapter 13), first and max (from the standard List::Util module—see Chapter 8), and prompt (from the IO::Prompt CPAN module—see Chapter 10).

Note, however, that whenever you do need parentheses on a builtin, they should follow the rules for subroutines, not those for control keywords. That is, treat the builtin as a subroutine, with no space between its name and the opening parenthesis:

while (my $record = <$results_file>) {
    chomp( $record );
    my ($name, $votes) = split("\t", $record);
    print(
        'Votes for ',
        substr($name, 0, 10),
        ": $votes (verified)\n"
    );
}

Don't treat them as control keywords (by adding a trailing space):

while (my $record = <$results_file>) {
    chomp ($record);
    my ($name, $votes) = split ("\t", $record);
    print (
        'Votes for ',
        substr ($name, 0, 10),
        ": $votes (verified)\n"
    );
}
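The trailing space is more than a readability problem: Perl treats parentheses that follow a builtin's name as that builtin's complete argument list, with or without an intervening space. The minimal sketch below shows the classic trap; the in-memory filehandle is only there so the printed output can be inspected afterwards.

```perl
use strict;
use warnings;

# Capture printed output in a string via an in-memory filehandle...
open my $out, '>', \my $captured
    or die "Can't open in-memory filehandle: $!";
my $previous_fh = select $out;

{
    # Silence the "print (...) interpreted as function" and
    # "Useless use of multiplication" warnings this line provokes...
    no warnings;

    # Looks like print((1 + 2) * 3), but parses as (print(1 + 2)) * 3...
    print (1 + 2) * 3;
}

# Restore the original output handle and flush the captured text...
select $previous_fh;
close $out;

print "Printed: $captured\n";    # Printed: 3
```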

Keys and Indices

Separate complex keys or indices from their surrounding brackets.

When accessing elements of nested data structures (hashes of hashes of arrays of whatever), it's easy to produce a long, complex, and visually dense expression, such as:

$candidates[$i] = $incumbent{$candidates[$i]{get_region()}};

That's especially true when one or more of the indices are themselves indexed variables. Squashing everything together without any spacing doesn't help the readability of such expressions. In particular, it can be difficult to detect whether a given pair of brackets is part of the inner or outer index.

Unless an index is a simple constant or scalar variable, it's much clearer to put spaces between the indexing expression and its surrounding brackets:

$candidates[$i] = $incumbent{ $candidates[$i]{ get_region() } };

Note that the determining factors here are both the complexity and the overall length of the index. Occasionally, "spacing-out" an index makes sense even if that index is just a single constant or scalar. For example, if that simple index is unusually long, it's better written as:

print $incumbent{ $largest_gerrymandered_constituency };

rather than:

print $incumbent{$largest_gerrymandered_constituency};

Operators

Use whitespace to help binary operators stand out from their operands.

Long expressions can be hard enough to comprehend without adding to their complexity by jamming their various components together:

my $displacement=$initial_velocity*$time+0.5*$acceleration*$time**2;

my $price=$coupon_paid*$exp_rate+(($face_val+$coupon_val)*$exp_rate**2);

Give your binary operators room to breathe, even if it requires an extra line to do so:

my $displacement
    = $initial_velocity * $time  +  0.5 * $acceleration * $time**2;

my $price
    = $coupon_paid * $exp_rate  +  ($face_val + $coupon_val) * $exp_rate**2;

Choose the amount of whitespace according to the precedence of the operators, to help the reader's eyes pick out the natural groupings within the expression. For example, you might put additional spaces on either side of the lower-precedence + to visually reinforce the higher precedence of the two multiplicative subexpressions surrounding it. On the other hand, it's quite appropriate to sandwich the ** operator tightly between its operands, given its very high precedence and its longer, more easily identified symbol.
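Spacing changes nothing about how Perl evaluates an expression; it only helps the reader see the grouping that precedence already imposes. A quick sketch (the sample values are arbitrary) confirming that the dense, the spaced, and a fully parenthesized version of the displacement formula all agree:

```perl
use strict;
use warnings;

# Arbitrary sample values...
my ($initial_velocity, $time, $acceleration) = (2, 3, 4);

# Dense, spaced, and explicitly parenthesized versions of one formula...
my $dense  = $initial_velocity*$time+0.5*$acceleration*$time**2;
my $spaced = $initial_velocity * $time  +  0.5 * $acceleration * $time**2;
my $explicit
    = ($initial_velocity * $time) + (0.5 * $acceleration * ($time**2));

print "$dense $spaced $explicit\n";    # 24 24 24
```

The extra spaces around + in the spaced version sit exactly where the explicit version puts its outer parentheses.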

A single space is always sufficient whenever you're also using parentheses to emphasize (or to vary) precedence:

my $velocity
    = $initial_velocity + ($acceleration * ($time + $delta_time));

my $future_price
    = $current_price
      * exp(($rate - $dividend_rate_on_index) * ($delivery - $now));

Symbolic unary operators should always be kept with their operands:

my $spring_force = !$hyperextended ? -$spring_constant * $extension : 0;

my $payoff = max(0, -$asset_price_at_maturity + $strike_price);

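Named unary operators should be treated like builtins, and spaced from their operands appropriately. Bear in mind that they bind less tightly than the arithmetic operators, so parenthesize wherever the intended grouping requires it:

my $tan_theta = (sin $theta) / cos $theta;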

my $forward_differential_1_year = $delivery_price * exp -$interest_rate;
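Because named unary operators such as sin and cos bind less tightly than /, an unparenthesized sin $theta / cos $theta actually computes sin($theta / cos $theta), not the tangent. A minimal sketch comparing the two forms (0.5 radians is an arbitrary test angle):

```perl
use strict;
use warnings;

my $theta = 0.5;    # arbitrary test angle, in radians

# Named unary operators bind more loosely than '/', so this parses
# as sin( $theta / cos($theta) ), which is not the tangent...
my $misparsed = sin $theta / cos $theta;

# Parenthesizing the first operand restores the intended grouping...
my $tan_theta = (sin $theta) / cos $theta;

printf "%.4f versus %.4f\n", $misparsed, $tan_theta;
```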

Semicolons

Place a semicolon after every statement.

In Perl, semicolons are statement separators, not statement terminators, so a semicolon isn't required after the very last statement in a block. Put one in anyway, even if there's only one statement in the block:

while (my $line = <>) {
    chomp $line;
    if ( $line =~ s{\A (\s*) -- (.*)}{$1#$2}xms ) {
        push @comments, $2;
    }

    print $line;
}

The extra effort to do this is negligible, and that final semicolon confers two very important advantages. It signals to the reader that the preceding statement is finished, and (perhaps more importantly) it signals to the compiler that the statement is finished. Telling the compiler is more important than telling the reader, because the reader can often work out what you really meant, whereas the compiler reads only what you actually wrote.

Leaving out the final semicolon usually works fine when the code is first written (i.e., when you're still paying proper attention to the entire piece of code):

while (my $line = <>) {
    chomp $line;

    if ( $line =~ s{\A (\s*) -- (.*)}{$1#$2}xms ) {
        push @comments, $2
    }

    print $line
}

But, without the semicolons, there's nothing to prevent later additions to the code from causing subtle problems:

while (my $line = <>) {
    chomp $line;

    if ( $line =~ s{\A (\s*) -- (.*)}{$1#$2}xms ) {
        push @comments, $2
        /shift/mix
    }

    print $line
    $src_len += length;
}

The problem is that those two additions don't actually add new statements; they just absorb the existing ones. So the previous code actually means:

while (my $line = <>) {
    chomp $line;

    if ( $line =~ s{\A (\s*) -- (.*)}{$1#$2}xms ) {
        push @comments, $2 / shift() / mix()
    }

    print $line ($src_len += length);
}

This is a very common mistake, and an understandable one. When extending existing code, you will naturally focus on the new statements you're adding, on the assumption that the existing ones will continue to work correctly. But, without its terminating semicolon, an existing statement may be assimilated into the new one instead.

Note that this rule does not apply to the block of a map or grep if that block consists of only a single statement. In that case, it's better to omit the terminator:

my @sqrt_results
    = map { sqrt $_ } @results;

because putting a semicolon in the block makes it much more difficult to detect where the full statement ends:

my @sqrt_results
    = map { sqrt $_; } @results;

Note that this exception to the stated rule is not excessively error-prone, as having more than one statement in a map or grep is relatively unusual, and often a sign that a map or grep was not the right choice in the first place (see "Complex Mappings" in Chapter 6).

Commas

Place a comma after every value in a multiline list.

Just as semicolons act as separators in a block of statements, commas act as separators in a list of values. That means that exactly the same arguments apply in favour of treating them as terminators instead.

Adding an extra trailing comma (which is perfectly legal in any Perl list) also makes it much easier to reorder the elements of the list. For example, it's much easier to convert:

my @dwarves = (
    'Happy',
    'Sleepy',
    'Dopey',
    'Sneezy',
    'Grumpy',
    'Bashful',
    'Doc',
);

to:

my @dwarves = (
    'Bashful',
    'Doc',
    'Dopey',
    'Grumpy',
    'Happy',
    'Sleepy',
    'Sneezy',
);

You can manually cut and paste lines or even feed the list contents through sort.

Without that trailing comma after 'Doc', reordering the list would introduce a bug (in this case a syntax error, because Perl does not accept two adjacent string literals):

my @dwarves = (
    'Bashful',
    'Doc'
    'Dopey',
    'Grumpy',
    'Happy',
    'Sleepy',
    'Sneezy',
);

Of course, that's a trivial mistake to find and fix, but why not adopt a coding style that eliminates the very possibility of such problems?
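A minimal sketch of both points: the trailing comma adds no extra element and lets the list be re-sorted freely, whereas a missing comma between two string literals is rejected at compile time. The string eval is only there so the failure can be observed without aborting the script.

```perl
use strict;
use warnings;

my @dwarves = (
    'Happy',
    'Sleepy',
    'Dopey',
    'Sneezy',
    'Grumpy',
    'Bashful',
    'Doc',
);

# The trailing comma does not create an eighth (empty) element...
print scalar(@dwarves), " dwarves\n";    # 7 dwarves

# Reordering is trivial...
my @sorted_dwarves = sort @dwarves;
print "@sorted_dwarves\n";

# Whereas a missing comma between two strings doesn't even compile...
my $compiled = eval q{ my @pair = ('Doc' 'Dopey'); 1 };
print 'Missing comma: ', ($compiled ? 'compiled' : 'syntax error'), "\n";
```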

Line Lengths

Use 78-column lines.

In these modern days of high-resolution 30-inch screens, anti-aliased fonts, and laser eyesight correction, it's entirely possible to program in a terminal window that's 300 columns wide.

Please don't.

Given the limitations of printed documents, legacy VGA display devices, presentation software, and unreconstructed managerial optics, it isn't reasonable to format code to a width greater than 80 columns. And even an 80-column line width is not always safe, given the text-wrapping characteristics of some terminals, editors, and mail systems.

Setting your right margin at 78 columns maximizes the usable width of each code line whilst ensuring that those lines appear consistently on the vast majority of display devices.

In vi, you can set your right margin appropriately by adding:

set textwidth=78

to your configuration file. For Emacs, use:

(setq-default fill-column 78)
(setq-default auto-fill-function 'do-auto-fill)
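Editor settings only constrain lines as you type them. For an existing file, a small checker can flag offenders. The following is a minimal sketch (the subroutine name overlong_lines is hypothetical) that reports the line number and width of every line wider than 78 columns. It assumes the lines contain no tabs, since a tab's displayed width depends on the viewer.

```perl
use strict;
use warnings;

my $MAX_WIDTH = 78;

# Return "line_number: width" for each line wider than $MAX_WIDTH...
sub overlong_lines {
    my @lines = @_;
    my @report;

    LINE:
    for my $index (0 .. $#lines) {
        my $line = $lines[$index];
        chomp $line;
        next LINE if length $line <= $MAX_WIDTH;
        push @report, sprintf '%d: %d', $index + 1, length $line;
    }

    return @report;
}

# Usage: perl check_width.pl MyModule.pm
if (@ARGV) {
    my @lines = <>;
    print "$_\n" for overlong_lines(@lines);
}
```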

Another advantage of this particular line width is that it ensures that any code fragment sent via email can be quoted at least once without wrapping:

From: boss@headquarters
To: you@saltmines
Subject: Please explain

I came across this chunk of code in your latest module.
Is this your idea of a joke???

> $;=$/;seek+DATA,undef$/,!$s;$_=<DATA>;$s&&print||(*{q;::;
> ;}=sub{$d=$d-1?$d:$0;s;';	#$d#;,$_})&&$g&&do{$y=($x||=20)*($y||8);sub
> i{sleep&f}sub'p{print$;x$=,join$;,$b=~/.{$x}/g,$;}sub'f{pop||1}sub'n{substr($b
> ,&f%$y,3)=~tr,O,O,}sub'g{@_[@_]=@_;--($f=&f);$m=substr($b,&f,1);($w,$w,$m,O)
> [n($f-$x)+n($x+$f)-(${m}eq+O=>)+n$f]||$w}$w="40";$b=join'',@ARGV?<>:$_,$w
> x$y;$b=~s).)$&=~/w/?O:$w)gse;substr($b,$y)=q++;$g='$i=0;$i?$b:$c=$b;
> substr+$c,$i,1,g$i;$g=~s?d+?($&+1)%$y?e;$i-$y+1?eval$g:do{$b=$c;p;i}';
> sub'e{eval$g;&e};e}||eval||die+No.$;

Please see me at once!!

Y.B.

Indentation

Use four-column indentation levels.

Indentation depth is far more controversial than line width. Ask four programmers the right number of columns per indentation level and you'll get four different answers: two-, three-, four-, or eight-column indents. You'll usually also get a heated argument.

The ancient coding masters, who first cut code on teletypes or hardware terminals with fixed tabstops, will assert that eight columns per level of indentation is the only acceptable ratio, and support that argument by pointing out that most printers and software terminals still default to eight-column tabs. Eight columns per indentation level ensures that your code looks the same everywhere:

while (my $line = <>) {
        chomp $line;
        if ( $line =~ s{\A (\s*) -- ([^\n]*) }{$1#$2}xms ) {
                push @comments, $2;
        }
        print $line;
}

Yes (agree many younger hackers), eight-column indents ensure that your code looks equally ugly and unreadable everywhere! Instead, they insist on no more than two or three columns per indentation level. Smaller indents maximize the number of levels of nesting available across a fixed-width display: about a dozen levels under a two- or three-column indent, versus only four or five levels with eight-column indents. Shallower indentation also reduces the horizontal distance the eye has to track, thereby keeping indented code in the same vertical sight-line and making the context of any line of code easier to ascertain:

while (my $line = <>) {
  chomp $line;
  if ( $line =~ s{\A (\s*) -- ([^\n]*) }{$1#$2}xms ) {
    push @comments, $2;
  }
  print $line;
}

The problem with this approach (cry the ancient masters) is that it can make indentations impossible to detect for anyone whose eyes are older than 30, or whose vision is worse than 20/20. And that's the crux of the problem. Deep indentation enhances structural readability at the expense of contextual readability; shallow indentation, vice versa. Neither is ideal.

The best compromise[8] is to use four columns per indentation level. This is deep enough that the ancient masters can still actually see the indentation, but shallow enough that the young hackers can still nest code to eight or nine levels[9] without wrapping:

while (my $line = <>) {
    chomp $line;
    if ( $line =~ s{\A (\s*) -- (.*)}{$1#$2}xms ) {
        push @comments, $2;
    }
    print $line;
}

Tabs

Indent with spaces, not tabs.

Tabs are a bad choice for indenting code, even if you set your editor's tabspacing to four columns. Tabs do not appear the same when printed on different output devices, or pasted into a word-processor document, or even just viewed in someone else's differently tabspaced editor. So don't use tabs alone or (worse still) intermix tabs with spaces:

sub addarray_internal {
»   my ($var_name, $need_quotemeta) = @_;

»   $raw .= $var_name;

»   my $quotemeta = $need_quotemeta ? q{ map {quotemeta $_} }
»   »     »    »    »    :          $EMPTY_STR
»   .............;

....my $perl5pat
....»   = qq{(??{join q{|}, $quotemeta @{$var_name}})};

»  push @perl5pats, $perl5pat;

»  return;
}

The only reliable, repeatable, transportable way to ensure that indentation remains consistent across viewing environments is to indent your code using only spaces. And, in keeping with the previous rule on indentation depth, that means using four space characters per indentation level:

sub addarray_internal {
....my ($var_name, $need_quotemeta) = @_;

....$raw .= $var_name;

....my $quotemeta = $need_quotemeta ? q{ map {quotemeta $_} }
..................:...................$EMPTY_STR
..................;

....my $perl5pat
........= qq{(??{join q{|}, $quotemeta @{$var_name}})};

....push @perl5pats, $perl5pat;

....return;
}

Note that this rule doesn't mean you can't use the Tab key to indent your code; only that the result of pressing that key can't actually be a tab. That's usually very easy to ensure under modern editors, most of which can easily be configured to convert tabs to spaces. For example, if you use vim, you can include the following directives in your .vimrc file:

set tabstop=4      "An indentation level every four columns"
set expandtab      "Convert all tabs typed into spaces"
set shiftwidth=4   "Indent/outdent by four columns"
set shiftround     "Always indent/outdent to a multiple of shiftwidth"

Or in your .emacs initialization file (using "cperl" mode):

(defalias 'perl-mode 'cperl-mode)

;; Never insert tab characters when indenting
(setq-default indent-tabs-mode nil)

;; 4 space indents in cperl mode
(custom-set-variables
 '(cperl-close-paren-offset -4)
 '(cperl-continued-statement-offset 4)
 '(cperl-indent-level 4)
 '(cperl-indent-parens-as-block t)
 '(cperl-tab-always-indent t))
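Editor settings keep new tabs out, but existing files may already contain them. The core Text::Tabs module can expand them. The sketch below assumes the existing tabs were meant as four-column stops (if they were typed under eight-column stops, set the tabstop to 8 instead, or the expanded layout will shift). Note that it expands every tab, including any raw tab inside a string literal, which would change that string's value; replace such tabs with \t first. The subroutine name detab_lines is hypothetical.

```perl
use strict;
use warnings;
use Text::Tabs;    # core module; exports expand() and unexpand()

# Assumption: the file's existing tabs were intended as 4-column stops...
$Text::Tabs::tabstop = 4;

# Expand every tab in each line into the equivalent number of spaces...
sub detab_lines {
    my @lines = @_;
    return expand(@lines);
}

# Usage: perl detab.pl Tabbed.pm > Spaced.pm
if (@ARGV) {
    my @expanded = detab_lines(<>);
    print @expanded;
}
```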

Ideally, your code should not contain a single instance of the tab character. In your layout, they should have been transformed to spaces; in your literal strings, they should all be specified using \t (see Chapter 4).
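A tiny sketch that enforces this by reporting which lines of a file still contain a literal tab character (the subroutine name lines_with_tabs is hypothetical):

```perl
use strict;
use warnings;

# Return the (1-based) numbers of any lines containing a tab character...
sub lines_with_tabs {
    my @lines = @_;
    return grep { $lines[$_ - 1] =~ m/\t/xms } 1 .. scalar @lines;
}

# Usage: perl find_tabs.pl MyModule.pm
if (@ARGV) {
    my @lines = <>;
    print "Tab on line $_\n" for lines_with_tabs(@lines);
}
```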

Blocks

Never place two statements on the same line.

If two or more statements share one line, each of them becomes harder to comprehend:

RECORD:
while (my $record = <$inventory_file>) {
    chomp $record; next RECORD if $record eq $EMPTY_STR;
    my @fields = split $FIELD_SEPARATOR, $record; update_sales(@fields);$count++;
}

You're already saving vertical space by using K&R bracketing; use that space to improve the code's readability, by giving each statement its own line:

RECORD:
while (my $record = <$inventory_file>) {
    chomp $record;
    next RECORD if $record eq $EMPTY_STR;
    my @fields = split $FIELD_SEPARATOR, $record;
    update_sales(@fields);
    $count++;
}

Note that this guideline applies even to map and grep blocks that contain more than one statement. You should write:

my @clean_words
    = map {
          my $word = $_;
          $word =~ s/$EXPLETIVE/[DELETED]/gxms;
          $word;
      } @raw_words;

not:

my @clean_words
    = map { my $word = $_; $word =~ s/$EXPLETIVE/[DELETED]/gxms; $word } @raw_words;
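A runnable version of the preferred layout, with $EXPLETIVE given a hypothetical value for illustration. The copy into $word matters: inside a map block, $_ is an alias to the original element, so substituting on $_ directly would also modify @raw_words.

```perl
use strict;
use warnings;

# Hypothetical pattern, for illustration only...
my $EXPLETIVE = qr/\b darn \b/xms;

my @raw_words = ('darn it', 'fine', 'darned');

# Each statement of the multi-statement map block on its own line...
my @clean_words
    = map {
          my $word = $_;
          $word =~ s/$EXPLETIVE/[DELETED]/gxms;
          $word;
      } @raw_words;

print join(' | ', @clean_words), "\n";
```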

Chunking

Code in paragraphs.

A paragraph is a collection of statements that accomplish a single task: in literature, it's a series of sentences conveying a single idea; in programming, it's a series of instructions implementing a single step of an algorithm.

Break each piece of code into sequences that achieve a single task, placing a single empty line between each sequence. To further improve the maintainability of the code, place a one-line comment at the start of each such paragraph, describing what the sequence of statements does. Like so:

# Process an array that has been recognized...
sub addarray_internal {
    my ($var_name, $needs_quotemeta) = @_;

    # Cache the original...
    $raw .= $var_name;

    # Build meta-quoting code, if requested...
    my $quotemeta = $needs_quotemeta ?  q{map {quotemeta $_} } : $EMPTY_STR;

    # Expand elements of variable, conjoin with ORs...
    my $perl5pat = qq{(??{join q{|}, $quotemeta @{$var_name}})};

    # Insert debugging code if requested...
    my $type = $quotemeta ? 'literal' : 'pattern';
    debug_now("Adding $var_name (as $type)");
    add_debug_mesg("Trying $var_name (as $type)");

    return $perl5pat;
}

Paragraphs are useful because humans can focus on only a few pieces of information at once[10]. Paragraphs are one way of aggregating small amounts of related information, so that the resulting "chunk" can fit into a single slot of the reader's limited short-term memory. Paragraphs enable the physical structure of a piece of writing to reflect and emphasize its logical structure. Adding comments at the start of each paragraph further enhances the chunking by explicitly summarizing the purpose[11] of each chunk.

Note, however, that the contents of paragraphs are only of secondary importance here. It is the vertical gaps separating each paragraph that are critical. Without them, the readability of the code declines dramatically, even if the comments are retained:

sub addarray_internal {
    my ($var_name, $needs_quotemeta) = @_;
    # Cache the original...
    $raw .= $var_name;
    # Build meta-quoting code, if requested...
    my $quotemeta = $needs_quotemeta ?  q{map {quotemeta $_} } : $EMPTY_STR;
    # Expand elements of variable, conjoin with ORs...
    my $perl5pat = qq{(??{join q{|}, $quotemeta @{$var_name}})};
    # Insert debugging code if requested...
    my $type = $quotemeta ? 'literal' : 'pattern';
    debug_now("Adding $var_name (as $type)");
    add_debug_mesg("Trying $var_name (as $type)");
    return $perl5pat;
}

Elses

Don't cuddle an else.

A "cuddled" else looks like this:

} else {

An uncuddled else looks like this:

}
else {

Cuddling saves an additional line per alternative, but ultimately it works against the readability of code in other ways, especially when that code is formatted using K&R bracketing. A cuddled else keyword is no longer in vertical alignment with its controlling if, nor with its own closing bracket. This misalignment makes it harder to visually match up the various components of an if-else construct.

More importantly, the whole point of an else is to mark an alternative course of action, and cuddling the else blurs that distinction. For a start, it removes the near-empty line provided by the closing brace of the preceding if, which reduces the visual gap between the if and else blocks. Squashing the two blocks together in that way undermines the paragraphing inside them (see the previous guideline, "Chunking"), especially when their contents are themselves properly paragraphed with empty lines between chunks.

Cuddling also moves the else from the leftmost position on its line, which means that the keyword is harder to locate when you are scanning down the code. On the other hand, an uncuddled else improves both the vertical separation of your code and the identifiability of the keyword:

if ($sigil eq '$') {
    if ($subsigil eq '?') {
        $sym_table{ substr($var_name,2) } = delete $sym_table{$var_name};

        $internal_count++;
        $has_internal{$var_name}++;
    }
    else {
        ${$var_ref} = q{$sym_table{$var_name}};

        $external_count++;
        $has_external{$var_name}++;
    }
}
elsif ($sigil eq '@' && $subsigil eq '?') {
    @{ $sym_table{$var_name} }
        = grep {defined $_} @{$sym_table{$var_name}};
}
elsif ($sigil eq '%' && $subsigil eq '?') {
    delete $sym_table{$var_name}{$EMPTY_STR};
}
else {
    ${$var_ref} = q{$sym_table{$var_name}};
}

In contrast, a cuddled else or elsif reduces readability by obscuring both the chunking of the blocks and the visibility of the keywords:

if ($sigil eq '$') {
    if ($subsigil eq '?') {
        $sym_table{ substr($var_name,2) } = delete $sym_table{$var_name};

        $internal_count++;
        $has_internal{$var_name}++;
    } else {
        ${$var_ref} = q{$sym_table{$var_name}};

        $external_count++;
        $has_external{$var_name}++;
    }
} elsif ($sigil eq '@' && $subsigil eq '?') {
    @{$sym_table{$var_name}}
        = grep {defined $_} @{$sym_table{$var_name}};
} elsif ($sigil eq '%' && $subsigil eq '?') {
    delete $sym_table{$var_name}{$EMPTY_STR};
} else {
    ${$var_ref} = q{$sym_table{$var_name}};
}

Vertical Alignment

Align corresponding items vertically.

Tables are another familiar means of chunking related information, and of using physical layout to indicate logical relationships. When setting out code, it's often useful to align data in a table-like series of columns. Consistent indentation can suggest equivalences in structure, usage, or purpose.

For example, initializers for non-scalar variables are often much more readable when laid out neatly using extra whitespace. The following array and hash initializations are very readable in tabular layout:

my @months = qw(
    January   February   March
    April     May        June
    July      August     September
    October   November   December
);

my %expansion_of = (
    q{it's}    => q{it is},
    q{we're}   => q{we are},
    q{didn't}  => q{did not},
    q{must've} => q{must have},
    q{I'll}    => q{I will},
);

Compressing them into lists saves lines, but also significantly reduces their readability:

my @months = qw(
    January February March April May June July August September
    October November December
);

my %expansion_of = (
    q{it's} => q{it is}, q{we're} => q{we are}, q{didn't} => q{did not},
    q{must've} => q{must have}, q{I'll} => q{I will},
);

Take a similar tabular approach with sequences of assignments to related variables, by aligning the assignment operators:

$name   = standardize_name($name);
$age    = time - $birth_date;
$status = 'active';

rather than:

$name = standardize_name($name);
$age = time - $birth_date;
$status = 'active';

Alignment is even more important when assigning to a hash entry or an array element. In such cases, the keys (or indices) should be aligned in a column, with the surrounding braces (or square brackets) also aligned. That is:

$ident{ name   } = standardize_name($name);
$ident{ age    } = time - $birth_date;
$ident{ status } = 'active';

Notice how this tabular layout emphasizes the keys of the entries being accessed, and thereby highlights the purpose of each assignment. Without that layout, your attention is drawn instead to the "column" of $ident prefixes, and the keys are consequently much harder to discern:

$ident{name} = standardize_name($name);
$ident{age} = time - $birth_date;
$ident{status} = 'active';

Aligning the assignment operators but not the hash keys is better than not aligning either, but still not as readable as aligning both:

$ident{ name }   = standardize_name($name);
$ident{ age }    = time - $birth_date;
$ident{ status } = 'active';

Breaking Long Lines

Break long expressions before an operator.

When an expression at the end of a statement gets too long, it's common practice to break that expression after an operator and then continue the expression on the following line, indenting it one level. Like so:

push @steps, $steps[-1] +
    $radial_velocity * $elapsed_time +
    $orbital_velocity * ($phase + $phase_shift) -
    $DRAG_COEFF * $altitude;

The rationale is that the operator that remains at the end of the line acts like a continuation marker, indicating that the expression continues on the following line.

Using the operator as a continuation marker seems like an excellent idea, but there's a serious problem with it: people rarely look at the right edge of code. Most of the semantic hints in a program—such as keywords—appear on the left side of that code. More importantly, the structural cues for understanding code—for example, indenting—are predominantly on the left as well (see the upcoming "Keep Left" sidebar). This means that indenting the continued lines of the expression actually gives a false impression of the underlying structure, a misperception that the eye must travel all the way to the right margin to correct.

A cleaner solution is to break long lines before an operator. That approach ensures that each line of the continued expression will start with an operator, which is unusual in Perl code. That way, as the reader's eye scans down the left margin of the code, it's immediately obvious that an indented line is merely the continuation of the previous line, because it starts with an operator.

The indenting of the second and subsequent lines of the expression is also critical. Continued lines should not simply be indented to the next indentation level. Instead, they should be indented to the starting column of the expression to which they belong. That is, instead of:

push @steps, $steps[-1]
    + $radial_velocity * $elapsed_time
    + $orbital_velocity * ($phase + $phase_shift)
    - $DRAG_COEFF * $altitude
    ;

you should write:

push @steps, $steps[-1]
             + $radial_velocity * $elapsed_time
             + $orbital_velocity * ($phase + $phase_shift)
             - $DRAG_COEFF * $altitude
             ;

This style of layout has the added advantage that it keeps the two arguments of the push visually separated horizontally, and thereby makes them easier to distinguish.

When a broken expression is continued over multiple lines, it is good practice to place the terminating semicolon on a separate line, indented to the same column as the start of the continued expression. As the reader's eye scans down through the leading operators on each line, encountering a semicolon instead makes it very clear that the continued expression is now complete.

Non-Terminal Expressions

Factor out long expressions in the middle of statements.

The previous guideline applies only if the long expression to be broken is the last value in a statement. If the expression appears in the middle of a statement, it is better to factor that expression out into a separate variable assignment. For example:

my $next_step = $steps[-1]
                + $radial_velocity * $elapsed_time
                + $orbital_velocity * ($phase + $phase_shift)
                - $DRAG_COEFF * $altitude
                ;
add_step( \@steps, $next_step, $elapsed_time);

rather than:

add_step( \@steps, $steps[-1]
                   + $radial_velocity * $elapsed_time
                   + $orbital_velocity * ($phase + $phase_shift)
                   - $DRAG_COEFF * $altitude
                   , $elapsed_time);

Breaking by Precedence

Always break a long expression at the operator of the lowest possible precedence.

As the examples in the previous two guidelines show, when breaking an expression across several lines, each line should be broken before a low-precedence operator. Breaking at operators of higher precedence encourages the unwary reader to misunderstand the computation that the expression performs. For example, the following layout might surreptitiously suggest that the additions and subtractions happen before the multiplications:

push @steps, $steps[-1] + $radial_velocity
             * $elapsed_time + $orbital_velocity
             * ($phase + $phase_shift) - $DRAG_COEFF
             * $altitude
             ;

If you're forced to break on an operator of less-than-minimal precedence, indent the broken line one additional level relative to the start of the expression, like so:

push @steps, $steps[-1]
             + $radial_velocity * $elapsed_time
             + $orbital_velocity
                 * ($phase + $phase_shift)
             - $DRAG_COEFF * $altitude
             ;

This strategy has the effect of keeping the subexpressions of the higher precedence operation visually "together".
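Layout never changes how Perl evaluates an expression; it only changes what the reader expects. The following sketch plugs hypothetical sample values (invented purely so the arithmetic is easy to check) into both layouts from this guideline, confirming that the misleadingly broken version and the precedence-respecting version compute the same result, because `*` binds more tightly than `+` and `-` wherever the line breaks fall:

```perl
use strict;
use warnings;

# Hypothetical sample values, chosen only so the arithmetic is easy to check.
my $DRAG_COEFF       = 0.5;
my @steps            = (10);
my $radial_velocity  = 2;
my $elapsed_time     = 3;
my $orbital_velocity = 4;
my $phase            = 1;
my $phase_shift      = 1;
my $altitude         = 6;

# Broken at higher-precedence operators: the layout suggests the wrong grouping.
my $misleading = $steps[-1] + $radial_velocity
                 * $elapsed_time + $orbital_velocity
                 * ($phase + $phase_shift) - $DRAG_COEFF
                 * $altitude
                 ;

# Broken at the lowest-precedence operators: the layout matches the parse.
my $clear = $steps[-1]
            + $radial_velocity * $elapsed_time
            + $orbital_velocity * ($phase + $phase_shift)
            - $DRAG_COEFF * $altitude
            ;

# Both evaluate 10 + 2*3 + 4*(1+1) - 0.5*6, i.e. 10 + 6 + 8 - 3
print "misleading=$misleading clear=$clear\n";    # misleading=21 clear=21
```

Only the second layout lets a reader see that answer at a glance.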

Assignments

Break long assignments before the assignment operator.

Often, the long statement that needs to be broken will be an assignment. The preceding rule does work in such cases, but leads to code that's unaesthetic and hard to read:

$predicted_val = $average
                 + $predicted_change * $fudge_factor
                 ;

A better approach when breaking assignment statements is to break before the assignment operator itself, leaving only the variable being assigned to on the first line. Then indent one level, and place the assignment operator at the start of the next line—once again indicating a continued statement:

$predicted_val
    = $average + $predicted_change * $fudge_factor;

Note that this approach often allows the entire righthand side of an assignment to be laid out on a single line, as in the preceding example. However, if the righthand expression is still too long, break it again at a low-precedence operator, as suggested in the previous guideline:

$predicted_val
    = ($minimum + $maximum) / 2
      + $predicted_change * max($fudge_factor, $local_epsilon);

A commonly used alternative layout for broken assignments is to break after the assignment operator, like so:

$predicted_val =
    $average + $predicted_change * $fudge_factor;

This approach suffers from the same difficulty described earlier: it's impossible to detect the line continuation without scanning all the way to the right of the code, and the "unmarked" indentation of the second line can mislead the casual reader. This problem of readability is most noticeable when the variable being assigned to is itself quite long:

$predicted_val{$current_data_set}[$next_iteration] =
    $average + $predicted_change * $fudge_factor;

which, of course, is precisely when such an assignment would most likely need to be broken. Breaking before the assignment operator makes long assignments much easier to identify, by keeping the assignment operator visually close to the start of the variable being assigned to:

$predicted_val{$current_data_set}[$next_iteration]
    = $average + $predicted_change * $fudge_factor;

Ternaries

Format cascaded ternary operators in columns.

One operator that is particularly prone to creating long expressions is the ternary operator. Because the ? and : of a ternary have very low precedence, a straightforward interpretation of the expression-breaking rule doesn't work well in this particular case, since it produces something like:

my $salute = $name eq $EMPTY_STR ? 'Customer'
             : $name =~ m/\A((?:Sir|Dame) \s+ \S+)/xms ? $1
             : $name =~ m/(.*), \s+ Ph[.]?D \z/xms ? "Dr $1" : $name;

which is almost unreadable.

The best way to lay out a series of ternary selections is in two columns, like so:

             # When their name is...                    Address them as...
my $salute = $name eq $EMPTY_STR                      ? 'Customer'
           : $name =~ m/\A((?:Sir|Dame) \s+ \S+) /xms ? $1
           : $name =~ m/(.*), \s+ Ph[.]?D \z     /xms ? "Dr $1"
           :                                            $name
           ;

In other words, break a series of ternary operators before every colon, aligning the colons with the operator preceding the first conditional. Doing so will cause the conditional tests to form a column. Then align the question marks of the ternaries so that the various possible results of the ternary also form a column. Finally, indent the last result (which has no preceding question mark) so that it too lines up in the results column.

This special layout converts the typical impenetrably obscure ternary sequence into a simple look-up table: for a given condition in column one, use the corresponding result from column two.

You can use the tabular layout even if you have only a single ternary:

my $name = defined $customer{name} ? $customer{name}
         :                           'Sir or Madam'
         ;

Starting out this way makes it easier for maintainers to subsequently add new alternatives to the table. This idea is explored further in the "Tabular Ternaries" guideline in Chapter 6.
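The table works because Perl's ternary operator is right-associative: each `:` simply hands control to the next row, so the rows are tested from top to bottom and the first matching condition supplies the result. A minimal runnable sketch wraps the salutation table from this guideline in a subroutine; the subroutine name `salute_for` and the definition of `$EMPTY_STR` are assumptions made for illustration, while the regexes are the ones from the example:

```perl
use strict;
use warnings;

my $EMPTY_STR = q{};    # assumed definition of the empty-string constant

# Hypothetical helper: returns the salutation to use for a given name.
sub salute_for {
    my ($name) = @_;

                 # When their name is...                    Address them as...
    my $salute = $name eq $EMPTY_STR                      ? 'Customer'
               : $name =~ m/\A((?:Sir|Dame) \s+ \S+) /xms ? $1
               : $name =~ m/(.*), \s+ Ph[.]?D \z     /xms ? "Dr $1"
               :                                            $name
               ;

    return $salute;
}

for my $name ($EMPTY_STR, 'Dame Judi Dench', 'Jones, Ph.D', 'Alice') {
    print salute_for($name), "\n";
}
```

Adding a new case, say another honorific, means inserting one more row above the final default, which is exactly the maintenance benefit the tabular layout is meant to provide.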

Lists

Parenthesize long lists.

The comma operator is really an operator only in scalar contexts. In lists, the comma is an item separator. Consequently, commas in multiline lists are best treated as item terminators. Moreover, multiline lists are particularly easy to confuse with a series of statements, as there is very little visual difference between a , and a ;.

Given the potential for confusion, it's important to clearly mark a multiline list as being a list. So, if you need to break a list across multiple lines, place the entire list in parentheses. The presence of an opening parenthesis highlights the fact that the subsequent expressions form a list, and the closing parenthesis makes it immediately apparent that the list is complete.
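The risk is more than cosmetic. Because `=` binds more tightly than the comma, an unparenthesized list may not be the list you think it is; and in scalar context the comma really is an operator, so a parenthesized series of values yields only its last value. A short sketch (with illustrative variable names) shows both effects:

```perl
use strict;
use warnings;

my @unparenthesized;
{
    # This demonstration deliberately triggers "Useless use of a constant
    # in void context" warnings, so silence them within this block only.
    no warnings 'void';

    # Parsed as: (@unparenthesized = 'January'), 'February', 'March';
    @unparenthesized = 'January', 'February', 'March';
}

# With parentheses, the whole list is assigned.
my @parenthesized = ('January', 'February', 'March');

# In scalar context the comma evaluates and discards its left operand,
# then returns its right operand.
my $last_only;
{
    no warnings 'void';
    $last_only = ('January', 'February', 'March');
}

print scalar(@unparenthesized), "\n";    # 1
print scalar(@parenthesized),   "\n";    # 3
print "$last_only\n";                    # March
```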

When laying out a statement containing a multiline list, place the opening parenthesis on the same line as the preceding portion of the statement. Then break the list after every comma, placing the same number of list elements on each separate line and indenting those lines one level deeper than the surrounding statement. Finally, outdent the closing parenthesis back to the same level as the statement. Like so:

my @months = qw(
    January   February   March
    April     May        June
    July      August     September
    October   November   December
);

for my $item (@requested_items) {
    push @items, (
        "A brand new $item",
        "A fully refurbished $item",
        "A ratty old $item",
    );
}

print (
    'Processing ',
    scalar(@items),
    ' items at ',
    time,
    "\n",
);

Note that the final item in the list should still have a comma, even though it isn't required syntactically.
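The trailing comma is purely a maintenance convenience: Perl ignores a comma immediately before the closing parenthesis, so it adds no empty element, and it lets you append, delete, or reorder lines without touching the punctuation of any adjacent line. A quick check, using a hypothetical `$item` value:

```perl
use strict;
use warnings;

my $item = 'widget';    # hypothetical item name

my @items = (
    "A brand new $item",
    "A fully refurbished $item",
    "A ratty old $item",
);

# Adding a fourth alternative adds one line; no existing line changes.
my @more_items = (
    "A brand new $item",
    "A fully refurbished $item",
    "A ratty old $item",
    "A slightly used $item",
);

print scalar(@items), ' ', scalar(@more_items), "\n";    # 3 4
```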

When writing multiline lists, always use parentheses (with K&R-style bracketing), keep to the same number of items on each line, and remember that in list contexts a comma isn't an operator, so the "break-before-an-operator rule" doesn't apply. In other words, not like this:

my @months = qw( January   February   March   April   May   June   July   August
                 September   October   November   December
                );

for my $item (@requested_items) {
    push @items, "A brand new $item"
               , "A fully refurbished $item"
               , "A ratty old $item"
               ;
}

print 'Processing '
      , scalar(@items)
      , ' items at '
      , time
      , "\n"
      ;

The "Thin Commas" guideline in Chapter 4 presents several other good reasons for parenthesizing lists.

Automated Layout

Enforce your chosen layout style mechanically.

In the long term, it's best to train yourself and your team to code in a consistent, rational, and readable style such as the one suggested earlier. However, the time and commitment necessary to accomplish that aren't always available. In such cases, a reasonable compromise is to prescribe a standard code-formatting tool that must be applied to all code before it's committed, reviewed, or otherwise displayed in public.

There is now an excellent code formatter available for Perl: perltidy. It's freely available from SourceForge at http://perltidy.sourceforge.net and provides an extensive range of user-configurable options for indenting, block delimiter positioning, column-like alignment, and comment positioning.

Using perltidy, you can convert code like this:

if($sigil eq '$'){
    if($subsigil eq '?'){
        $sym_table{substr($var_name,2)}=delete $sym_table{locate_orig_var($var)};
        $internal_count++;$has_internal{$var_name}++
    } else {
        ${$var_ref} =
            q{$sym_table{$var_name}}; $external_count++; $has_external{$var_name}++;
}} elsif ($sigil eq '@'&&$subsigil eq '?') {
    @{$sym_table{$var_name}} = grep
        {defined $_} @{$sym_table{$var_name}};
} elsif ($sigil eq '%' && $subsigil eq '?') {
delete $sym_table{$var_name}{$EMPTY_STR}; } else
{
${$var_ref}
=
q{$sym_table{$var_name}}
}

into something readable:

if ( $sigil eq '$' ) {
    if ( $subsigil eq '?' ) {
        $sym_table{ substr( $var_name, 2 ) }
            = delete $sym_table{ locate_orig_var($var) };
        $internal_count++;
        $has_internal{$var_name}++;
    }
    else {
        ${$var_ref} = q{$sym_table{$var_name}};
        $external_count++;
        $has_external{$var_name}++;
    }
}
elsif ( $sigil eq '@' && $subsigil eq '?' ) {
    @{ $sym_table{$var_name} }
        = grep {defined $_} @{ $sym_table{$var_name} };
}
elsif ( $sigil eq '%' && $subsigil eq '?' ) {
    delete $sym_table{$var_name}{$EMPTY_STR};
}
else {
    ${$var_ref} = q{$sym_table{$var_name}};
}

Notice how closely the tidied version follows the various formatting guidelines in this chapter. To achieve that result, you need to configure your .perltidyrc file like this:

-l=78   # Max line width is 78 cols
-i=4    # Indent level is 4 cols
-ci=4   # Continuation indent is 4 cols
-st     # Output to STDOUT
-se     # Errors to STDERR
-vt=2   # Maximal vertical tightness
-cti=0  # No extra indentation for closing brackets
-pt=1   # Medium parenthesis tightness
-bt=1   # Medium brace tightness
-sbt=1  # Medium square bracket tightness
-bbt=1  # Medium block brace tightness
-nsfs   # No space before semicolons
-nolq   # Don't outdent long quoted strings
-wbb="% + - * / x != == >= <= =~ !~ < > | & >= < = **= += *= &= <<= &&= -= /= |= >>= ||= .= %= ^= x="  # Break before all operators

Mandating that everyone use a common tool to format their code can also be a simple way of sidestepping the endless objections, acrimony, and dogma that always surround any discussion on code layout. If perltidy does all the work for them, then it will cost developers almost no effort to adapt to the new guidelines. They can simply set up an editor macro that will "straighten" their code whenever they need to.
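To make the formatter a gate rather than a suggestion, a pre-commit or code-review script can check whether code is already tidy. The sketch below assumes the CPAN module Perl::Tidy (the library behind the perltidy command) is installed, and calls its `Perl::Tidy::perltidy` function with in-memory source and destination strings. The helper name `is_tidy` and the option strings are illustrative assumptions; in practice you would pass your team's agreed options or point perltidy at the shared .perltidyrc:

```perl
use strict;
use warnings;

# Hypothetical helper: returns true if $source is left unchanged by perltidy
# when formatted with the options in $options. Requires Perl::Tidy (CPAN).
sub is_tidy {
    my ($source, $options) = @_;

    require Perl::Tidy;

    my ($tidied, $stderr, $errors) = (q{}, q{}, q{});
    my $failed = Perl::Tidy::perltidy(
        argv        => "-npro $options",    # -npro: ignore any personal .perltidyrc
        source      => \$source,
        destination => \$tidied,
        stderr      => \$stderr,
        errorfile   => \$errors,
    );
    die "perltidy reported a problem:\n$stderr$errors" if $failed;

    # Tidy code is code that perltidy would not change.
    return $tidied eq $source;
}

for my $snippet ("my \$x=1;\n", "my \$x = 1;\n") {
    my $verdict = is_tidy($snippet, '-i=4 -l=78') ? 'tidy' : 'needs tidying';
    print "$verdict: $snippet";
}
```

A check like this rejects untidy code without rewriting anyone's files, leaving developers to run their own "straighten" macro before resubmitting.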



[5] "K&R" are Brian Kernighan and Dennis Ritchie, authors of the book The C Programming Language (Prentice Hall, 1988).

[6] Throughout this book, the word "bracket" will be used as a generic term to refer to any of the four types of paired delimiters: "braces" ({}), "parentheses" (()), "square brackets" ([]), and "angle brackets" (<>).

[7] The editor configurations suggested throughout this book are collected in Appendix C, Editor Configurations. They are also available to download from http://www.oreilly.com/catalog/perlbp

[8] According to the research reported in "Program Indentation and Comprehensibility" (Communications of the ACM, Vol. 26, No. 11, pp. 861-867).

[9] But don't do that! If you need more than four or five levels of indentation, you almost certainly need to factor some of that nested code out into a subroutine or module. See Chapter 9 and Chapter 17.

[10] An idea made famous in 1956 by George A. Miller in "The Magical Number Seven, Plus or Minus Two" (The Psychological Review, 1956, Vol. 63, pp. 81-97).

[11] The purpose, not the actions. Paragraph comments need to explain why the code is needed, not merely paraphrase what it's doing.
