While most of the work of programming may be simply getting your program working properly, you may find yourself wanting more bang for the buck out of your Perl program. Perl's rich set of operators, data types, and control constructs are not necessarily intuitive when it comes to speed and space optimization. Many trade-offs were made during Perl's design, and such decisions are buried in the guts of the code. In general, the shorter and simpler your code is, the faster it runs, but there are exceptions. This section attempts to help you make it work just a wee bit better.
If you want it to work a lot better, you can play with the Perl compiler backend described in Chapter 18, or rewrite your inner loop as a C extension as illustrated in Chapter 21.
Note that optimizing for time may sometimes cost you in space or programmer efficiency (indicated by conflicting hints below). Them's the breaks. If programming was easy, they wouldn't need something as complicated as a human being to do it, now would they?
Use hashes instead of linear searches. For example, instead of searching through @keywords to see if $_ is a keyword, construct a hash with:
    my %keywords;
    for (@keywords) {
        $keywords{$_}++;
    }
Then you can quickly tell if $_ contains a keyword by testing $keywords{$_} for a nonzero value.
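As a rough illustration of the difference, the sketch below answers the same question both ways. The keyword list and the two helper subroutines (is_keyword_linear and is_keyword_hashed) are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical keyword list, for illustration only.
my @keywords = qw(if else while for foreach);

# Build the lookup hash once; each keyword maps to a true value.
my %keywords;
$keywords{$_}++ for @keywords;

# Linear search: scans the whole array on every query.
sub is_keyword_linear {
    my ($word) = @_;
    return scalar grep { $_ eq $word } @keywords;
}

# Hash lookup: one probe per query, regardless of list size.
sub is_keyword_hashed {
    my ($word) = @_;
    return $keywords{$word} ? 1 : 0;
}

for my $word (qw(while until)) {
    printf "%-6s linear=%d hashed=%d\n",
        $word, is_keyword_linear($word), is_keyword_hashed($word);
}
```

The hash costs a little memory and one pass to build, so it pays off only when you make more than a handful of lookups.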
Avoid subscripting when a foreach or list operator will do. Not only is subscripting an extra operation, but if your subscript variable happens to be in floating point because you did arithmetic, an extra conversion from floating point back to integer is necessary. There's often a better way to do it. Consider using foreach, shift, and splice operations. Consider saying use integer.
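A minimal sketch of the same loop written with and without a subscript. The @prices data is invented for the example.

```perl
use strict;
use warnings;

my @prices = (3, 5, 8);   # hypothetical data

# Subscripted version: one index operation per element.
my $total_indexed = 0;
for (my $i = 0; $i < @prices; $i++) {
    $total_indexed += $prices[$i];
}

# foreach version: each element is aliased to $_ directly, no subscript.
my $total_foreach = 0;
$total_foreach += $_ foreach @prices;

print "$total_indexed $total_foreach\n";
```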
Avoid goto. It scans outward from your current location for the indicated label.
Avoid $& and its two buddies, $` and $'. Any occurrence in your program causes all matches to save the searched string for possible future reference. (However, once you've blown it, it doesn't hurt to have more of them.)
Avoid using eval on a string. An eval of a string (although not of a BLOCK) forces recompilation every time through. The Perl parser is pretty fast for a parser, but that's not saying much. Nowadays there's almost always a better way to do what you want anyway. In particular, any code that uses eval merely to construct variable names is obsolete since you can now do the same directly using symbolic references:
    no strict 'refs';
    $name = "variable";
    $$name = 7;          # Sets $variable to 7
Avoid eval STRING inside a loop. Put the loop into the eval instead, to avoid redundant recompilations of the code. See the study operator in Chapter 29 for an example of this.
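One way to picture the difference is sketched below. The expression in $expr is a hypothetical stand-in for code your program builds at run time; it is assumed to come from a trusted source.

```perl
use strict;
use warnings;

# Hypothetical run-time expression (assumed trusted), applied to each element.
my $expr  = '$_ * 2 + 1';
my @input = (1, 2, 3);

# eval inside the loop: the string is recompiled for every element.
my @slow = map { eval $expr } @input;

# Loop inside the eval: the string, loop included, is compiled once.
my @fast = eval "map { $expr } \@input";
die $@ if $@;

print "@slow\n@fast\n";
```

Both versions compute the same list; only the number of compilations differs.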
Avoid run-time-compiled patterns. Use the /pattern/o (once only) pattern modifier to avoid pattern recompilation when the pattern doesn't change over the life of the process. For patterns that change occasionally, you can use the fact that a null pattern refers back to the last successfully matched pattern, like this:
    "foundstring" =~ /$currentpattern/;   # Dummy match (must succeed).
    while (<>) {
        print if //;
    }
Alternatively, you can precompile your regular expression using the qr quote construct. You can also use eval to recompile a subroutine that does the match (if you only recompile occasionally). That works even better if you compile a bunch of matches into a single subroutine, thus amortizing the subroutine call overhead.
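The qr alternative might look like the following sketch. The pattern text and the sample lines are invented; in a real program the pattern would typically arrive at run time.

```perl
use strict;
use warnings;

# Hypothetical pattern source, e.g. a config file or the command line.
my $current_pattern = 'h[aeiou]mp';

# Compile once, outside the loop.
my $re = qr/$current_pattern/;

my @lines   = ("one-hump camel", "llama", "two-hump camel");
my @matches = grep { $_ =~ $re } @lines;

print "$_\n" for @matches;
```

If the pattern changes, assign a fresh qr// object to $re; the matching code itself stays the same.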
Short-circuit alternation is often faster than the corresponding regex. So:
    print if /one-hump/ || /two/;
is likely to be faster than:
    print if /one-hump|two/;
at least for certain values of one-hump and two. This is because the optimizer likes to hoist certain simple matching operations up into higher parts of the syntax tree and do very fast matching with a Boyer-Moore algorithm. A complicated pattern tends to defeat this.
Reject common cases early with next if. As with simple regular expressions, the optimizer likes this. And it just makes sense to avoid unnecessary work. You can typically discard comment lines and blank lines even before you do a split or chop:
    while (<>) {
        next if /^#/;
        next if /^$/;
        chop;
        @piggies = split(/,/);
        …
    }
Avoid regular expressions with many quantifiers or with big {MIN,MAX} numbers on parenthesized expressions. Such patterns can result in exponentially slow backtracking behavior unless the quantified subpatterns match on their first "pass". You can also use the (?>…) construct to force a subpattern to either match completely or fail without backtracking.
Try to maximize the length of any nonoptional literal strings in regular expressions. This is counterintuitive, but longer patterns often match faster than shorter patterns. That's because the optimizer looks for constant strings and hands them off to a Boyer-Moore search, which benefits from longer strings. Compile your pattern with Perl's -Dr debugging switch to see what Dr. Perl thinks the longest literal string is.
Avoid expensive subroutine calls in tight loops. There is overhead associated with calling subroutines, especially when you pass lengthy parameter lists or return lengthy values. In order of increasing desperation, try passing values by reference, passing values as dynamically scoped globals, inlining the subroutine, or rewriting the whole loop in C. (Better than all of those solutions is if you can define the subroutine out of existence by using a smarter algorithm.)
Avoid getc for anything but single-character terminal I/O. In fact, don't use it for that either. Use sysread.
Avoid frequent substrs on long strings, especially if the string contains UTF-8. It's okay to use substr at the front of a string, and for some tasks you can keep the substr at the front by "chewing up" the string as you go with a four-argument substr, replacing the part you grabbed with "":
    while (length $buffer) {    # length, so a leftover "0" isn't treated as empty
        process(substr($buffer, 0, 10, ""));
    }
Use substr as an lvalue rather than concatenating substrings. For example, to replace the fourth through seventh characters of $foo with the contents of the variable $bar, don't do this:
    $foo = substr($foo,0,3) . $bar . substr($foo,7);
Instead, simply identify the part of the string to be replaced and assign into it, as in:
    substr($foo, 3, 4) = $bar;
But be aware that if $foo is a huge string and $bar isn't exactly the length of the "hole", this can do a lot of copying too. Perl tries to minimize that by copying from either the front or the back, but there's only so much it can do if the substr is in the middle.
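To check that the two forms really are interchangeable, here is a small sketch with made-up sample strings. It also shows the four-argument substr, which performs the same replacement without lvalue syntax.

```perl
use strict;
use warnings;

my $original = "abcdefghij";   # sample data for illustration
my $bar      = "WXYZ";

# Concatenation: builds a whole new string from three pieces.
my $by_concat = substr($original, 0, 3) . $bar . substr($original, 7);

# Lvalue substr: assigns into characters four through seven in place.
my $by_lvalue = $original;
substr($by_lvalue, 3, 4) = $bar;

# Four-argument substr: same replacement, written as a function call.
my $by_four_arg = $original;
substr($by_four_arg, 3, 4, $bar);

print "$by_concat\n$by_lvalue\n$by_four_arg\n";
```

Because $bar is exactly four characters long here, the in-place forms don't have to shift the rest of the string.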
Use s/// rather than concatenating substrings. This is especially true if you can replace one constant with another of the same size. This results in an in-place substitution.
Use statement modifiers and the equivalent logical operators ("and" and "or") instead of full-blown conditionals. Statement modifiers (like $ring = 0 unless $engaged) and logical operators avoid the overhead of entering and leaving a block. They can often be more readable too.
Use $foo = $a || $b || $c. This is much faster (and shorter to say) than:
    if ($a) {
        $foo = $a;
    }
    elsif ($b) {
        $foo = $b;
    }
    elsif ($c) {
        $foo = $c;
    }
Similarly, set default values with:
    $pi ||= 3;
Group together any tests that want the same initial string. When testing a string for various prefixes in anything resembling a switch structure, put together all the /^a/ patterns, all the /^b/ patterns, and so on.
Don't test things you know won't match. Use last or elsif to avoid falling through to the next case in your switch statement.
Use special operators like study, logical string operations, pack 'u', and unpack '%' formats.
Beware of the tail wagging the dog. Misstatements resembling (<STDIN>)[0] can cause Perl much unnecessary work. In accordance with Unix philosophy, Perl gives you enough rope to hang yourself.
Factor operations out of loops. The Perl optimizer does not attempt to remove invariant code from loops. It expects you to exercise some sense.
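A small sketch of hoisting an invariant by hand. The %config hash and the word list are hypothetical.

```perl
use strict;
use warnings;

my %config = (prefix => "fruit");        # hypothetical settings
my @words  = qw(apple banana cherry);    # hypothetical data

# Invariant inside the loop: uc() and the hash lookup run once per element.
my @slow = map { uc($config{prefix}) . ":" . $_ } @words;

# Invariant factored out: computed once, reused on every pass.
my $tag  = uc($config{prefix}) . ":";
my @fast = map { $tag . $_ } @words;

print "@fast\n";
```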
Strings can be faster than arrays.
Arrays can be faster than strings. It all depends on whether you're going to reuse the strings or arrays and which operations you're going to perform. Heavy modification of each element implies that arrays will be better, and occasional modification of some elements implies that strings will be better. But you just have to try it and see.
my variables are faster than local variables.
Sorting on a manufactured key array may be faster than using a fancy sort subroutine. A given array value will usually be compared multiple times, so if the sort subroutine has to do much recalculation, it's better to factor out that calculation to a separate pass before the actual sort.
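One possible shape for this is sketched below, using invented records of the form "name number". The keys are extracted in a separate pass, and the sort then compares only the precomputed values.

```perl
use strict;
use warnings;

# Hypothetical records: a name followed by a numeric field.
my @lines = ("carol 30", "alice 25", "bob 9");

# Fancy sort subroutine: re-splits both records on every comparison.
my @slow = sort { (split ' ', $a)[1] <=> (split ' ', $b)[1] } @lines;

# Manufactured key array: extract each number once, up front...
my @keys = map { (split ' ', $_)[1] } @lines;

# ...then sort the indices by key and take an array slice.
my @fast = @lines[ sort { $keys[$a] <=> $keys[$b] } 0 .. $#lines ];

print "$_\n" for @fast;
```

With three records the difference is negligible; the separate pass starts to matter as the list grows and the key extraction gets more expensive.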
If you're deleting characters, tr/abc//d is faster than s/[abc]//g.
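The two forms produce the same result, as this sketch with a made-up sample string shows:

```perl
use strict;
use warnings;

my $text = "a bad cab backs away";   # sample data for illustration

# Copy the string, then delete every a, b, or c with tr///d.
(my $via_tr = $text) =~ tr/abc//d;

# Same deletion with a character class and s///g.
(my $via_s = $text) =~ s/[abc]//g;

print "[$via_tr]\n[$via_s]\n";
```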
print with a comma separator may be faster than concatenating strings. For example:
    print $fullname{$name} . " has a new home directory " . $home{$name} . " ";
has to glue together the two hash values and the two fixed strings before passing them to the low-level print routines, whereas:
    print $fullname{$name}, " has a new home directory ", $home{$name}, " ";
doesn't. On the other hand, depending on the values and the architecture, the concatenation may be faster. Try it.
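One way to "try it" is the standard Benchmark module. The sketch below is only a starting point: the hash contents are invented, the iteration count is arbitrary, and output goes to the null device so that terminal speed doesn't dominate the timing.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Spec;

# Hypothetical data, for illustration only.
my %fullname = (larry => "Larry Wall");
my %home     = (larry => "/home/larry");
my $name     = "larry";

# Discard the printed text; we only care about the timing.
open(my $null, '>', File::Spec->devnull)
    or die "Can't open null device: $!";

# Run each variant a fixed number of times and print a comparison table.
my $rows = cmpthese(500_000, {
    concat => sub {
        print {$null} $fullname{$name} . " has a new home directory " . $home{$name} . "\n";
    },
    comma => sub {
        print {$null} $fullname{$name}, " has a new home directory ", $home{$name}, "\n";
    },
});

close($null);
```

Results vary from run to run and machine to machine, so treat any single table as a hint rather than a verdict.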
Prefer join("", …) to a series of concatenated strings. Multiple concatenations may cause strings to be copied back and forth multiple times. The join operator avoids this.
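For instance, with some made-up pieces:

```perl
use strict;
use warnings;

my @parts = ("alpha", "beta", "gamma");   # hypothetical pieces

# A series of concatenations: intermediate strings may be built and copied.
my $by_concat = $parts[0] . $parts[1] . $parts[2];

# A single join: the pieces are assembled in one operation.
my $by_join = join("", @parts);

print "$by_concat\n$by_join\n";
```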
split on a fixed string is generally faster than split on a pattern. That is, use split(/ /, …) rather than split(/ +/, …) if you know there will only be one space. However, the patterns /\s+/, /^/, and / / are specially optimized, as is the special split on whitespace.
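The forms are not interchangeable when the input can contain runs of spaces, which is why the advice is conditional on knowing there is only one. A sketch with an invented input line:

```perl
use strict;
use warnings;

my $line = "a b  c";   # note the double space before "c"

# Fixed single space: the double space yields an empty field.
my @single = split(/ /, $line);

# One or more spaces: runs of spaces collapse into one separator.
my @multi = split(/ +/, $line);

# Special whitespace split: also skips leading whitespace.
my @ws = split(' ', "  a b  c ");

print scalar(@single), " ", scalar(@multi), " ", scalar(@ws), "\n";
```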
Pre-extending an array or string can save some time. As strings and arrays grow, Perl extends them by allocating a new copy with some room for growth and copying in the old value. Pre-extending a string with the x operator or an array by setting $#array can prevent this occasional overhead and reduce memory fragmentation.
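A minimal sketch of both techniques. The sizes are arbitrary, and whether the string buffer is actually retained after being reset is an implementation detail of your Perl build, so measure before relying on it.

```perl
use strict;
use warnings;

my $n = 1000;   # arbitrary size for illustration

# Pre-extend the array to $n slots (all undef), then fill them in.
my @squares;
$#squares = $n - 1;
$squares[$_] = $_ * $_ for 0 .. $n - 1;

# Pre-extend a string with the x operator, then reset it to empty.
# The allocated buffer is typically kept for later growth.
my $buffer = " " x 4096;
$buffer = "";
$buffer .= "line $_\n" for 1 .. 3;

print scalar(@squares), " elements, last is $squares[-1]\n";
```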
Don't undef long strings and arrays if they'll be reused for the same purpose. This helps prevent reallocation when the string or array must be re-extended.
Prefer "