Perl lets you define your own functions to be called like Perl's built-in functions. Consider push(@array, $item), which must tacitly receive a reference to @array, not just the list values held in @array, so that the array can be modified. Prototypes let you declare subroutines to take arguments just like many of the built-ins, that is, with certain constraints on the number and types of arguments. We call them "prototypes", but they work more like automatic templates for the calling context than like what C or Java programmers would think of as prototypes. With these templates, Perl will automatically add implicit backslashes, or calls to scalar, or whatever else it takes to get things to show up in a way that matches the template. For instance, if you declare:
sub mypush (\@@);
then mypush takes arguments exactly like push does. For this to work, the declaration of the function to be called must be visible at compile time. The prototype only affects the interpretation of function calls when the & character is omitted. In other words, if you call it like a built-in function, it behaves like a built-in function. If you call it like an old-fashioned subroutine, then it behaves like an old-fashioned subroutine. The & suppresses prototype checks and associated contextual effects.
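To make the difference between the two calling styles concrete, here is a minimal sketch. The body of mypush and the @queue array are our own illustration, not a definitive implementation; only the prototype comes from the declaration above.

```perl
use strict;
use warnings;

# Hypothetical push work-alike; the \@ makes Perl pass a reference
# to the caller's array as the first argument.
sub mypush (\@@) {
    my $aref = shift;
    push @$aref, @_;
    return scalar @$aref;
}

my @queue = (1, 2);
mypush @queue, 3, 4;       # prototype applies: \@queue is passed implicitly
&mypush(\@queue, 5);       # & bypasses the prototype: pass the reference yourself

print "@queue\n";          # prints "1 2 3 4 5"
```

With the & form, forgetting the explicit backslash would flatten @queue into the argument list, and the function would try to dereference its first element.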
Since prototypes are taken into consideration only at compile time, it naturally falls out that they have no influence on subroutine references like \&foo or on indirect subroutine calls like &{$subref} or $subref->(). Method calls are not influenced by prototypes, either. That's because the actual function to be called is indeterminate at compile time, depending as it does on inheritance, which is dynamically determined in Perl.
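A short sketch of that compile-time-only behavior, assuming a made-up count_args function that simply reports how many arguments it received:

```perl
use strict;
use warnings;

sub count_args ($) { return scalar @_ }   # hypothetical helper

my @items = ('a', 'b', 'c');
my $subref = \&count_args;

my $direct   = count_args @items;     # prototype forces scalar(@items): one argument
my $indirect = $subref->(@items);     # no prototype effect: @items is flattened

print "$direct $indirect\n";          # prints "1 3"
```

The same function sees one argument or three, depending only on whether the compiler could apply the prototype at the call site.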
Since the intent is primarily to let you define subroutines that work like built-in functions, here are some prototypes you might use to emulate the corresponding built-ins:
Declared as | Called as |
---|---|
sub mylink ($$) | mylink $old, $new |
sub myreverse (@) | myreverse $a,$b,$c |
sub myjoin ($@) | myjoin ":",$a,$b,$c |
sub mypop (\@) | mypop @array |
sub mysplice (\@$$@) | mysplice @array,@array,0,@pushme |
sub mykeys (\%) | mykeys %{$hashref} |
sub mypipe (**) | mypipe READHANDLE, WRITEHANDLE |
sub myindex ($$;$) | myindex &getstring, "substr" |
 | myindex &getstring, "substr", $start |
sub mysyswrite (*$;$$) | mysyswrite OUTF, $buf |
 | mysyswrite OUTF, $buf, length($buf)-$off, $off |
sub myopen (*;$@) | myopen HANDLE |
 | myopen HANDLE, $name |
 | myopen HANDLE, "-|", @cmd |
sub mygrep (&@) | mygrep { /foo/ } $a,$b,$c |
sub myrand ($) | myrand 42 |
sub mytime () | mytime |
Any backslashed prototype character (shown between parentheses in the left column above) represents an actual argument (exemplified in the right column), which absolutely must start with that character. Just as the first argument to keys must start with %, so too must the first argument to mykeys.
A semicolon separates mandatory arguments from optional arguments. (It would be redundant before @ or %, since lists can be null.) Unbackslashed prototype characters have special meanings. Any unbackslashed @ or % eats all the rest of the actual arguments and forces list context. (It's equivalent to LIST in a syntax description.) An argument represented by $ has scalar context forced on it. An & requires a reference to a named or anonymous subroutine.
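Here is a small sketch of a backslashed prototype character and of the semicolon at work. The function bodies are our own guesses at how one might emulate keys and index, and %color is made up for the example.

```perl
use strict;
use warnings;

# \% demands an argument that starts with %, and passes a reference to it.
sub mykeys (\%) {
    my $href = shift;
    return sort keys %$href;
}

# Two mandatory scalars, then one optional scalar after the semicolon.
sub myindex ($$;$) {
    my ($str, $substr, $start) = @_;
    return index($str, $substr, defined $start ? $start : 0);
}

my %color = (red => 1, green => 2);
my @names = mykeys %color;                  # ("green", "red")
my $first = myindex "banana", "an";         # 1
my $next  = myindex "banana", "an", 2;      # 3
```

Unlike the real keys, this mykeys sorts its result so that the order is predictable.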
A * allows the subroutine to accept anything in that slot that would be accepted by a built-in as a filehandle: a bare name, a constant, a scalar expression, a typeglob, or a reference to a typeglob. The value will be available to the subroutine either as a simple scalar or (in the latter two cases) as a reference to the typeglob. If you wish to always convert such arguments to a typeglob reference, use Symbol::qualify_to_ref as follows:
use Symbol 'qualify_to_ref';
sub foo (*) {
    my $fh = qualify_to_ref(shift, caller);
    …
}
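As a sketch of how such a function might be used, the following hypothetical log_to accepts either a lexical handle or a bare handle name. The function name, the "[log]" tag, and the in-memory file are assumptions made for illustration.

```perl
use strict;
use warnings;
use Symbol 'qualify_to_ref';

# Hypothetical helper: prints a tagged line to whatever handle it is given.
sub log_to (*@) {
    my $fh = qualify_to_ref(shift, caller);
    print {$fh} "[log] ", @_, "\n";
}

my $buffer = '';
open(my $mem, '>', \$buffer) or die "can't open in-memory handle: $!";

log_to $mem, "lexical handle";     # a scalar holding a glob reference
log_to STDOUT, "bareword handle";  # a bare name, as with print STDOUT ...
close $mem;
```

Because of the * in the prototype, the bare STDOUT is accepted even under use strict, just as it would be by print.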
Note how the last three examples in the table are treated specially by the parser. mygrep is parsed as a true list operator, myrand is parsed as a true unary operator with unary precedence the same as rand, and mytime is truly argumentless, just like time. That is, if you say:
mytime +2;
you'll get mytime() + 2, not mytime(2), which is how it would be parsed without the prototype, or with a unary prototype.
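A minimal sketch of both parsing effects, using two made-up functions (fixed_time stands in for time, and twice is an arbitrary unary operator):

```perl
use strict;
use warnings;

sub fixed_time () { 1000 }          # hypothetical stand-in for time
sub twice ($)     { 2 * $_[0] }     # hypothetical unary operator

my $later = fixed_time +2;          # parsed as fixed_time() + 2
my @pair  = (twice 3, 4);           # parsed as (twice(3), 4)

print "$later @pair\n";             # prints "1002 6 4"
```

Without the ($) prototype, twice would be a list operator and would swallow both 3 and 4.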
The mygrep example also illustrates how & is treated specially when it is the first argument. Ordinarily, an & prototype would demand an argument like \&foo or sub{}. When it is the first argument, however, you can leave off the sub of your anonymous subroutine, and just pass a bare block in the "indirect object" slot (with no comma after it). So one nifty thing about the & prototype is that you can generate new syntax with it, provided the & is in the initial position:
sub try (&$) {
    my ($try, $catch) = @_;
    eval { &$try };
    if ($@) {
        local $_ = $@;
        &$catch;
    }
}
sub catch (&) { $_[0] }
try {
    die "phooey";
}                   # not the end of the function call!
catch {
    /phooey/ and print "unphooey\n";
};
This prints "unphooey". What happens is that try is called with two arguments, the anonymous function {die "phooey";} and the return value of the catch function, which in this case is nothing but its own argument, the entire block of yet another anonymous function. Within try, the first function argument is called while protected within an eval block to trap anything that blows up. If something does blow up, the second function is called with a local version of the global $_ variable set to the raised exception.[3] If this all sounds like pure gobbledygook, you'll have to read about die and eval in Chapter 29, and then go check out anonymous functions and closures in Chapter 8. On the other hand, if it intrigues you, you might check out the Error module on CPAN, which uses this to implement elaborately structured exception handling with try, catch, except, otherwise, and finally clauses.
Here's a reimplementation of the grep operator (the built-in one is more efficient, of course):
sub mygrep (&@) {
    my $coderef = shift;
    my @result;
    foreach $_ (@_) {
        push(@result, $_) if &$coderef;
    }
    return @result;
}
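The same (&@) pattern works for any block-taking list operator you care to invent. As a sketch, here is a hypothetical count_if, which is our own example rather than a built-in:

```perl
use strict;
use warnings;

# Hypothetical helper: counts the elements for which the block is true.
sub count_if (&@) {
    my $test = shift;
    my $n = 0;
    for (@_) { $n++ if $test->() }   # $_ is aliased to each element in turn
    return $n;
}

my $evens = count_if { $_ % 2 == 0 } 1 .. 10;
print "$evens\n";                    # prints "5"
```

As with grep, the block reads each element through $_, and no comma follows the block.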
Some folks would prefer to see full alphanumeric prototypes. Alphanumerics have been intentionally left out of prototypes for the express purpose of someday adding named, formal parameters. (Maybe.) The current mechanism's main goal is to let module writers enforce a certain amount of compile-time checking on module users.
Functions prototyped with (), meaning that they take no arguments at all, are parsed like the time built-in. More interestingly, the compiler treats such functions as potential candidates for inlining. If the result of that function, after Perl's optimization and constant-folding pass, is either a constant or a lexically scoped scalar with no other references, then that value will be used in place of calls to that function. Calls made using &NAME are never inlined, however, just as they are not subject to any other prototype effects. (See the use constant pragma in Glossary, for an easy way to declare such constants.)
Both versions of these functions to compute π will be inlined by the compiler:
sub pi () { 3.14159 }           # Not exact, but close
sub PI () { 4 * atan2(1, 1) }   # As good as it gets
In fact, all of the following functions are inlined because Perl can determine everything at compile time:
sub FLAG_FOO   () { 1 << 8 }
sub FLAG_BAR   () { 1 << 9 }
sub FLAG_MASK  () { FLAG_FOO | FLAG_BAR }
sub OPT_GLARCH () { (0x1B58 & FLAG_MASK) == 0 }
sub GLARCH_VAL () {
    if (OPT_GLARCH) { return 23 }
    else            { return 42 }
}
sub N () { int(GLARCH_VAL) / 3 }
BEGIN {                 # compiler runs this block at compile time
    my $prod = 1;       # persistent, private variable
    for (1 .. N) { $prod *= $_ }
    sub NFACT () { $prod }
}
In the last example, the NFACT function is inlined because it has a void prototype and the variable it returns is not changed by that function, and furthermore can't be changed by anyone else, since it's in a lexical scope. So the compiler replaces uses of NFACT with that value, which was precomputed at compile time because of the surrounding BEGIN.
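You can't easily watch inlining happen from inside a program, but you can at least check for the void prototype that makes a function a candidate. The sketch below uses the prototype built-in for that; the function named plain is made up for contrast, and nothing here demonstrates the inlining itself.

```perl
use strict;
use warnings;
use constant PI => 4 * atan2(1, 1);

sub FLAG_FOO () { 1 << 8 }
sub plain       { 1 << 8 }           # no prototype at all

# prototype returns the prototype string, or undef if there is none.
my $const_proto = prototype(\&PI);        # ""
my $flag_proto  = prototype(\&FLAG_FOO);  # ""
my $no_proto    = prototype(\&plain);     # undef
```

Both the hand-written FLAG_FOO and the constant created by use constant report an empty prototype, while plain reports none.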
If you redefine a subroutine that was eligible for inlining, you'll get a mandatory warning. (You can use this warning to tell whether the compiler inlined a particular subroutine.) The warning is considered severe enough not to be optional, because previously compiled invocations of the function will still use the old value of the function. If you need to redefine the subroutine, ensure that it isn't inlined either by dropping the () prototype (which changes calling semantics, so beware) or by thwarting the inlining mechanism in some other way, such as:
sub not_inlined () { return 23 if $$; }
See Chapter 18 for more about what happens during the compilation and execution phases of your program's life.
It's probably best to put prototypes on new functions, not retrofit prototypes onto older ones. These are context templates, not ANSI C prototypes, so you must be especially careful about silently imposing a different context. Suppose, for example, you decide that a function should take just one parameter, like this:
sub func ($) {
    my $n = shift;
    print "you gave me $n\n";
}
That makes it a unary operator (like the rand built-in) and changes how the compiler determines the function's arguments. With the new prototype, the function consumes just one, scalar-context argument instead of many arguments in list context. If someone has been calling it with an array or list expression, even if that array or list contained just a single element, where before it worked, now you've got something completely different:
func @foo;               # counts @foo elements
func split /:/;          # counts number of fields returned
func "a", "b", "c";      # passes "a" only, discards "b" and "c"
func("a", "b", "c");     # suddenly, a compiler error!
You've just supplied an implicit scalar in front of the argument list, which can be more than a bit surprising. The old @foo that used to hold one thing doesn't get passed in. Instead, 1 (the number of elements in @foo) is now passed to func. And the split, being called in scalar context, scribbles all over your @_ parameter list. In the third example, because func has been prototyped as a unary operator, only "a" is passed in; then the return value from func is discarded as the comma operator goes on to evaluate the next two items and return "c". In the final example, the user now gets a syntax error at compile time on code that used to compile and run just fine.
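The first of those surprises is easy to reproduce. In this sketch, func is reduced to a hypothetical version that just returns its argument, so that the old and new behaviors can be compared side by side:

```perl
use strict;
use warnings;

sub func ($) { return $_[0] }      # hypothetical: returns what it was given

my @foo = ("only");
my $with_proto    = func @foo;     # implicit scalar(@foo): receives 1
my $without_proto = &func(@foo);   # & bypasses the prototype: receives "only"

print "$with_proto $without_proto\n";   # prints "1 only"
```

The & call shows what callers used to get before the prototype was retrofitted.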
If you're writing new code and would like a unary operator that takes only a scalar variable, not any old scalar expression, you could prototype it to take a scalar reference:
sub func (\$) {
    my $nref = shift;
    print "you gave me $$nref\n";
}
Now the compiler won't let anything by that doesn't start with a dollar sign:
func @foo;             # compiler error, saw @, want $
func split/:/;         # compiler error, saw function, want $
func $s;               # this one is ok -- got real $ symbol
func $a[3];            # and this one
func $h{stuff}[-1];    # or even this
func 2+5;              # scalar expr still a compiler error
func ${ \(2+5) };      # ok, but is the cure worse than the disease?
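One practical payoff of the \$ prototype is that the function can modify the caller's variable. A minimal sketch, with a made-up bump function and made-up variables:

```perl
use strict;
use warnings;

# Hypothetical helper: increments the caller's variable in place.
sub bump (\$) {
    my $nref = shift;
    $$nref++;
    return $$nref;
}

my $count = 41;
my @slots = (0, 0, 0, 7);

bump $count;       # passes \$count
bump $slots[3];    # passes \$slots[3]

print "$count $slots[3]\n";   # prints "42 8"
```

The caller never writes a backslash; the prototype supplies it.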
If you aren't careful, you can get yourself into trouble with prototypes. But if you are careful, you can do a lot of neat things with them. This is all very powerful, of course, and should only be used in moderation to make the world a better place.
[3] Yes, there are still unresolved issues having to do with the visibility of @_. We're ignoring that question for the moment. But if we make @_ lexically scoped someday, as already occurs in the experimental threaded versions of Perl, those anonymous subroutines can act like closures.