Many of the following function names are annotated with icons. Here are their meanings:
Uses $_ ($ARG) as a default variable.
Sets $! ($OS_ERROR) on syscall errors.
Raises exceptions; use eval to trap $@ ($EVAL_ERROR).
Sets $? ($CHILD_ERROR) when a child process exits.
Taints returned data.
Taints returned data under some system, locale, or handle settings.
Raises an exception if given an argument of inappropriate type.
Raises an exception if modifying a read-only target.
Raises an exception if fed tainted data.
Raises an exception if unimplemented on current platform.
Functions that return tainted data when fed tainted data are not marked, since that's most of them. In particular, if you use any function on %ENV or @ARGV, you'll get tainted data.
Functions marked as raising an exception on inappropriate arguments do so when they require, but do not receive, an argument of a particular type (such as filehandles for I/O operations, references for blessing, etc.).
Functions marked as raising an exception on read-only targets sometimes need to alter their arguments. If they can't modify the argument because it's marked read-only, they'll raise an exception. Examples of read-only variables are the special variables containing data captured during a pattern match and variables that are really aliases to constants.
Functions marked as platform-dependent may not be implemented on all platforms. Although many of these are named after functions in the Unix C library, don't assume that just because you aren't running Unix, you can't call any of them. Many are emulated, even those you might never expect to see, such as fork on Win32 systems, which works as of the 5.6 release of Perl. For more information about the portability and behavior of system-specific functions, see the perlport manpage, plus any platform-specific documentation that came with your Perl port.
Functions that raise other miscellaneous exceptions are marked as well, including math functions that throw range errors, such as sqrt(-1).
abs VALUE
abs
This function returns the absolute value of its argument.
$diff = abs($first - $second);
Note: here and in subsequent examples, good style (and the use strict pragma) would dictate that you add a my modifier to declare a new lexically scoped variable, like this:
my $diff = abs($first - $second);
However, we've omitted my from most of our examples for clarity. Just assume that any such variable was declared earlier, if that cranks your rotor.
accept SOCKET, PROTOSOCKET
This function is used by server processes that wish to listen for socket connections from clients. PROTOSOCKET must be a filehandle already opened via the socket operator and bound to one of the server's network addresses or to INADDR_ANY. Execution is suspended until a connection is made, at which point the SOCKET filehandle is opened and attached to the newly made connection. The original PROTOSOCKET remains unchanged; its only purpose is to be cloned into a real socket. The function returns the connected address if the call succeeds, false otherwise. For example:
unless ($peer = accept(SOCK, PROTOSOCK)) {
    die "Can't accept a connection: $!\n";
}
On systems that support it, the close-on-exec flag will be set for the newly opened file descriptor, as determined by the value of $^F ($SYSTEM_FD_MAX). See accept(2). See also the example in Section 16.5 in Chapter 16.
alarm EXPR
alarm
This function sends a SIGALRM signal to the current process after EXPR seconds. Only one timer may be active at once. Each call disables the previous timer, and an EXPR of 0 may be supplied to cancel the previous timer without starting a new one. The return value is the amount of time remaining on the previous timer.
print "Answer me within one minute, or die: ";
alarm(60);               # kill program in one minute
$answer = <STDIN>;
$timeleft = alarm(0);    # clear alarm
print "You had $timeleft seconds remaining\n";
It is usually a mistake to intermix alarm and sleep calls, because many systems use the alarm(2) syscall mechanism to implement sleep(3). On older machines, the elapsed time may be up to one second less than you specified because of how seconds are counted. Additionally, a busy system may not get around to running your process immediately. See Chapter 16 for information on signal handling.
For alarms of finer granularity than one second, you might be able to use the syscall function to access setitimer(2) if your system supports it. The CPAN module Time::HiRes also provides functions for this purpose.
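As a quick sketch of the sub-second approach (assuming a system that delivers SIGALRM; the busy loop is just filler to be interrupted):

```perl
use Time::HiRes qw(ualarm);

my $iterations = 0;
eval {
    local $SIG{ALRM} = sub { die "timeout\n" };
    ualarm(100_000);            # schedule SIGALRM in 0.1 seconds
    $iterations++ while 1;      # busy work until the alarm fires
};
ualarm(0);                      # cancel any pending timer
print "interrupted: $@" if $@ eq "timeout\n";
```

The eval traps the exception thrown by the handler, just as with a whole-second alarm.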
atan2 Y, X
This function returns the principal value of the arc tangent of Y/X in the range -π to π. A quick way to get an approximate value of π is to say:
$pi = atan2(1,1) * 4;
For the tangent operation, you may use the tan function from either the Math::Trig or the POSIX modules, or just use the familiar relation:
sub tan { sin($_[0]) / cos($_[0]) }
bind SOCKET, NAME
This function attaches an address (a name) to an already opened socket specified by the SOCKET filehandle. The function returns true if it succeeded, false otherwise. NAME should be a packed address of the proper type for the socket.
use Socket;
$port_number = 80;   # pretend we want to be a web server
$sockaddr = sockaddr_in($port_number, INADDR_ANY);
bind SOCK, $sockaddr
    or die "Can't bind $port_number: $!\n";
See bind(2). See also the examples in Section 16.5 in Chapter 16.
binmode FILEHANDLE, DISCIPLINES
binmode FILEHANDLE
This function arranges for the FILEHANDLE to have the semantics specified by the DISCIPLINES argument. If DISCIPLINES is omitted, binary (or "raw") semantics are applied to the filehandle. If FILEHANDLE is an expression, the value is taken as the name of the filehandle or a reference to a filehandle, as appropriate.
The binmode function should be called after the open but before any I/O is done on the filehandle. The only way to reset the mode on a filehandle is to reopen the file, since the various disciplines may have treasured up various bits and pieces of data in various buffers. This restriction may be relaxed in the future.
In the olden days, binmode was used primarily on operating systems whose run-time libraries distinguished text from binary files. On those systems, the purpose of binmode was to turn off the default text semantics. However, with the advent of Unicode, all programs on all systems must take some cognizance of the distinction, even on Unix and Mac systems. These days there is only one kind of binary file (as far as Perl is concerned), but there are many kinds of text files, which Perl would also like to treat in a single way. So Perl has a single internal format for Unicode text, UTF-8. Since there are many kinds of text files, text files often need to be translated upon input into UTF-8, and upon output back into some legacy character set, or some other representation of Unicode. You can use disciplines to tell Perl how exactly (or inexactly) to do these translations.[2]
For example, a discipline of ":text" will tell Perl to do generic text processing without telling Perl which kind of text processing to do. But disciplines like ":utf8" and ":latin1" tell Perl which text format to read and write. On the other hand, the ":raw" discipline tells Perl to keep its cotton-pickin' hands off the data. For more on how disciplines work (or will work), see the open function. The rest of this discussion describes what binmode does without the DISCIPLINES argument, that is, the historical meaning of binmode, which is equivalent to:
binmode FILEHANDLE, ":raw";
Unless instructed otherwise, Perl will assume your freshly opened file should be read or written in text mode. Text mode means that "\n" (newline) will be your internal line terminator. All systems use "\n" as the internal line terminator, but what that really represents varies from system to system, device to device, and even file to file, depending on how you access the file. In such legacy systems (including MS-DOS and VMS), what your program sees as a "\n" may not be what's physically stored on disk. The operating system might, for example, store text files with \cM\cJ sequences that are translated on input to appear as "\n" to your program, and have "\n" from your program translated back to \cM\cJ on output to a file. The binmode function disables this automatic translation on such systems.
In the absence of a DISCIPLINES argument, binmode has no effect under Unix or Mac OS, both of which use "\n" to end each line and represent that as a single character. (It may, however, be a different character: Unix uses \cJ and older Macs use \cM. Doesn't matter.)
The following example shows how a Perl script might read a GIF image from a file and print it to the standard output. On systems that would otherwise alter the literal data into something other than its exact physical representation, you must prepare both handles. While you could use a ":raw" discipline directly in the GIF open, you can't do that so easily with pre-opened filehandles like STDOUT:
binmode STDOUT;
open(GIF, "vim-power.gif")    or die "Can't open vim-power.gif: $!\n";
binmode GIF;
while (read(GIF, $buf, 1024)) {
    print STDOUT $buf;
}
bless REF, CLASSNAME
bless REF
This function tells the referent pointed to by reference REF that it is now an object in the CLASSNAME package--or the current package if no CLASSNAME is specified. If REF is not a valid reference, an exception is raised. For convenience, bless returns the reference, since it's often the last function in a constructor subroutine. For example:
$pet = Beast->new(TYPE => "cougar", NAME => "Clyde");
# then in Beast.pm:
sub new {
    my $class = shift;
    my %attrs = @_;
    my $self = { %attrs };
    return bless($self, $class);
}
You should generally bless objects into CLASSNAMEs that are mixed case. Namespaces with all lowercase names are reserved for internal use as Perl pragmata (compiler directives). Built-in types (such as "SCALAR", "ARRAY", "HASH", etc., not to mention the base class of all classes, "UNIVERSAL") all have uppercase names, so you may wish to avoid such package names as well.
Make sure that CLASSNAME is not false; blessing into false packages is not supported and may result in unpredictable behavior.
It is not a bug that there is no corresponding curse operator. (But there is a sin operator.) See also Chapter 12 for more about the blessing (and blessings) of objects.
caller EXPR
caller
This function returns information about the stack of current subroutine calls and such. Without an argument, it returns the package name, filename, and line number that the currently executing subroutine was called from:
($package, $filename, $line) = caller;
Here's an example of an exceedingly picky function, making use of the special tokens __PACKAGE__ and __FILE__ described in Chapter 2:
sub careful {
    my ($package, $filename) = caller;
    unless ($package eq __PACKAGE__ && $filename eq __FILE__) {
        die "You weren't supposed to call me, $package!\n";
    }
    print "called me safely\n";
}
sub safecall {
    careful();
}
When called with an argument, caller evaluates EXPR as the number of stack frames to go back before the current one. For example, an argument of 0 means the current stack frame, 1 means the caller, 2 means the caller's caller, and so on. The function also reports additional information as shown here:
$i = 0;
while (($package, $filename, $line, $subroutine, $hasargs,
        $wantarray, $evaltext, $is_require, $hints, $bitmask)
        = caller($i++))
{
    …
}
If the frame is a subroutine call, $hasargs is true if it has its own @_ array (not one borrowed from its caller). Otherwise, $subroutine may be "(eval)" if the frame is not a subroutine call, but an eval. If so, additional elements $evaltext and $is_require are set: $is_require is true if the frame is created by a require or use statement, and $evaltext contains the text of the eval EXPR statement.
In particular, for an eval BLOCK statement, $filename is "(eval)", but $evaltext is undefined. (Note also that each use statement creates a require frame inside an eval EXPR frame.) The $hints and $bitmask are internal values; please ignore them unless you're a member of the thaumatocracy.
In a fit of even deeper magic, caller also sets the array @DB::args to the arguments passed in the given stack frame--but only when called from within the DB package. See Chapter 20.
chdir EXPR
chdir
This function changes the current process's working directory to EXPR, if possible. If EXPR is omitted, the caller's home directory is used. The function returns true upon success, false otherwise.
chdir "$prefix/lib" or die "Can't cd to $prefix/lib: $!\n";
See also the Cwd module, described in Chapter 32, which lets you keep track of your current directory automatically.
chmod LIST
This function changes the permissions of a list of files. The first element of the list must be the numerical mode, as in the chmod(2) syscall. The function returns the number of files successfully changed. For example:
$cnt = chmod 0755, 'file1', 'file2';
will set $cnt to 0, 1, or 2, depending on how many files were changed. Success is measured by lack of error, not by an actual change, because a file may have had the same mode before the operation. An error probably means you lacked sufficient privileges to change its mode because you were neither the file's owner nor the superuser. Check $! to find the actual reason for failure.
Here's a more typical usage:
chmod(0755, @executables) == @executables
    or die "couldn't chmod some of @executables: $!";
If you need to know which files didn't allow the change, use something like this:
@cannot = grep {not chmod 0755, $_} 'file1', 'file2', 'file3';
die "$0: could not chmod @cannot\n" if @cannot;
This idiom makes use of the grep function to select only those elements of the list for which the chmod function failed.
When using nonliteral mode data, you may need to convert an octal string to a number using the oct function. That's because Perl doesn't automatically assume a string contains an octal number just because it happens to have a leading "0".
$DEF_MODE = 0644;   # Can't use quotes here!
PROMPT: {
    print "New mode? ";
    $strmode = <STDIN>;
    exit unless defined $strmode;    # test for eof
    if ($strmode =~ /^\s*$/) {       # test for blank line
        $mode = $DEF_MODE;
    }
    elsif ($strmode !~ /^\d+$/) {
        print "Want numeric mode, not $strmode\n";
        redo PROMPT;
    }
    else {
        $mode = oct($strmode);       # converts "755" to 0755
    }
    chmod $mode, @files;
}
This function works with numeric modes much like the Unix chmod(2) syscall. If you want a symbolic interface like the one the chmod(1) command provides, see the File::chmod module on CPAN.
You can also import the symbolic S_I* constants from the Fcntl module:
use Fcntl ':mode';
chmod S_IRWXU|S_IRGRP|S_IXGRP|S_IROTH|S_IXOTH, @executables;
Some people consider that more readable than 0755. Go figure.
chomp VARIABLE
chomp LIST
chomp
This function (normally) deletes a trailing newline from the end of a string contained in a variable. This is a slightly safer version of chop (described next) in that it has no effect upon a string that doesn't end in a newline. More specifically, it deletes the terminating string corresponding to the current value of $/, and not just any last character.
Unlike chop, chomp returns the number of characters deleted. If $/ is "" (in paragraph mode), chomp removes all trailing newlines from the selected string (or strings, if chomping a LIST). You cannot chomp a literal, only a variable. For example:
while (<PASSWD>) {
    chomp;                # avoid \n on last field
    @array = split /:/;
    …
}
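And here's a small sketch of the paragraph-mode behavior mentioned above (the sample string is made up):

```perl
local $/ = "";                        # paragraph mode
my $text = "line one\nline two\n\n\n";
my $count = chomp $text;              # strips every trailing newline
print "removed $count newlines\n";    # removed 3 newlines
```

Afterward, $text ends with "two" and no newline at all, not just one fewer.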
With version 5.6, the meaning of chomp changes slightly in that input disciplines are allowed to override the value of the $/ variable and mark strings as to how they should be chomped. This has the advantage that an input discipline can recognize more than one variety of line terminator (such as Unicode paragraph and line separators), but still safely chomp whatever terminates the current line.
chop VARIABLE
chop LIST
chop
This function chops off the last character of a string variable and returns the character chopped. The chop operator is used primarily to remove the newline from the end of an input record, and is more efficient than using a substitution. If that's all you're doing, then it would be safer to use chomp, since chop always shortens the string no matter what's there, and chomp is more selective.
You cannot chop a literal, only a variable.
If you chop a LIST of variables, each string in the list is chopped:
@lines = `cat myfile`;
chop @lines;
You can chop anything that is an lvalue, including an assignment:
chop($cwd = `pwd`);
chop($answer = <STDIN>);
This is different from:
$answer = chop($tmp = <STDIN>); # WRONG
which puts a newline into $answer because chop returns the character chopped, not the remaining string (which is in $tmp). One way to get the result intended here is with substr:
$answer = substr <STDIN>, 0, -1;
But this is more commonly written as:
chop($answer = <STDIN>);
In the most general case, chop can be expressed in terms of substr:
$last_char = chop($var);
$last_char = substr($var, -1, 1, "");   # same thing
Once you understand this equivalence, you can use it to do bigger chops. To chop more than one character, use substr as an lvalue, assigning a null string. The following removes the last five characters of $caravan:
substr($caravan, -5) = "";
The negative subscript causes substr to count from the end of the string instead of the beginning. If you wanted to save the characters so removed, you could use the four-argument form of substr, creating something of a quintuple chop:
$tail = substr($caravan, -5, 5, "");
chown LIST
This function changes the owner and group of a list of files. The first two elements of the list must be the numeric UID and GID, in that order. A value of -1 in either position is interpreted by most systems to leave that value unchanged. The function returns the number of files successfully changed. For example:
$cnt = chown $uidnum, $gidnum, 'file1', 'file2';
will set $cnt to 0, 1, or 2, depending on how many files got changed (in the sense that the operation succeeded, not in the sense that the owner was different afterward). Here's a more typical usage:
chown($uidnum, $gidnum, @filenames) == @filenames
    or die "can't chown @filenames: $!";
Here's a subroutine that accepts a username, looks up the user and group IDs for you, and does the chown:
sub chown_by_name {
    my($user, @files) = @_;
    chown((getpwnam($user))[2,3], @files) == @files
        or die "can't chown @files: $!";
}
chown_by_name("fred", glob("*.c"));
However, you may not want the group changed as the previous function does, because the /etc/passwd file associates each user with a single group even though that user may be a member of many secondary groups according to /etc/group. An alternative is to pass a -1 for the GID, which leaves the group of the file unchanged. If you pass a -1 as the UID and a valid GID, you can set the group without altering the owner.
On most systems, you are not allowed to change the ownership of the file unless you're the superuser, although you should be able to change the group to any of your secondary groups. On insecure systems, these restrictions may be relaxed, but this is not a portable assumption. On POSIX systems, you can detect which rule applies like this:
use POSIX qw(sysconf _PC_CHOWN_RESTRICTED);
# only try if we're the superuser or on a permissive system
if ($> == 0 || !sysconf(_PC_CHOWN_RESTRICTED)) {
    chown($uidnum, -1, $filename)
        or die "can't chown $filename to $uidnum: $!";
}
chr NUMBER
chr
This function returns the character represented by that NUMBER in the character set. For example, chr(65) is "A" in either ASCII or Unicode, and chr(0x263a) is a Unicode smiley face. For the reverse of chr, use ord.
If you'd rather specify your characters by name than by number (for example, "\N{WHITE SMILING FACE}" for a Unicode smiley), see charnames in the Glossary.
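For example, pairing chr with its inverse ord (the charnames pragma ships with Perl):

```perl
print chr(65), "\n";                 # prints "A"
print ord("A"), "\n";                # prints 65

use charnames ':full';               # enables \N{...} by name
my $smiley = "\N{WHITE SMILING FACE}";
printf "U+%04X\n", ord($smiley);     # prints "U+263A"
</imports>
```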
chroot FILENAME
chroot
If successful, FILENAME becomes the new root directory for the current process--the starting point for pathnames beginning with "/". This directory is inherited across exec calls and by all subprocesses forked after the chroot call. There is no way to undo a chroot. For security reasons, only the superuser can use this function. Here's some code that approximates what many FTP servers do:
chroot((getpwnam('ftp'))[7])
    or die "Can't do anonymous ftp: $!\n";
This function is unlikely to work on non-Unix systems. See chroot(2).
close FILEHANDLE
close
This function closes the file, socket, or pipe associated with FILEHANDLE. (It closes the currently selected filehandle if the argument is omitted.) It returns true if the close is successful, false otherwise. You don't have to close FILEHANDLE if you are immediately going to do another open on it, since the next open will close it for you. (See open.) However, an explicit close on an input file resets the line counter ($.), while the implicit close done by open does not.
FILEHANDLE may be an expression whose value can be used as an indirect filehandle (either the real filehandle name or a reference to anything that can be interpreted as a filehandle object).
If the filehandle came from a piped open, close will return false if any underlying syscall fails or if the program at the other end of the pipe exited with nonzero status. In the latter case, the close forces $! ($OS_ERROR) to zero. So if a close on a pipe returns a nonzero status, check $! to determine whether the problem was with the pipe itself (nonzero value) or with the program at the other end (zero value). In either event, $? ($CHILD_ERROR) contains the wait status value (see its interpretation under system) of the command associated with the other end of the pipe. For example:
open(OUTPUT, '| sort -rn | lpr -p')    # pipe to sort and lpr
    or die "Can't start sortlpr pipe: $!";
print OUTPUT @lines;                   # print stuff to output
close OUTPUT                           # wait for sort to finish
    or warn $! ? "Syserr closing sortlpr pipe: $!"
               : "Wait status $? from sortlpr pipe";
A filehandle produced by dup(2)ing a pipe is treated as an ordinary filehandle, so close will not wait for the child on that filehandle. You have to wait for the child by closing the original filehandle. For example:
open(NETSTAT, "netstat -rn |")
    or die "can't run netstat: $!";
open(STDIN, "<&NETSTAT")
    or die "can't dup to stdin: $!";
If you close STDIN above, there is no wait, but if you close NETSTAT, there is.
If you somehow manage to reap an exited pipe child on your own, the close will fail. This could happen if you had a $SIG{CHLD} handler of your own that got triggered when the pipe child exited, or if you intentionally called waitpid on the process ID returned from the open call.
closedir DIRHANDLE
This function closes a directory opened by opendir and returns the success of that operation. See the examples under readdir. DIRHANDLE may be an expression whose value can be used as an indirect dirhandle, usually the real dirhandle name.
connect SOCKET, NAME
This function initiates a connection with another process that is waiting at an accept. The function returns true if it succeeded, false otherwise. NAME should be a packed network address of the proper type for the socket. For example, assuming SOCK is a previously created socket:
use Socket;
my ($remote, $port) = ("www.perl.com", 80);
my $destaddr = sockaddr_in($port, inet_aton($remote));
connect SOCK, $destaddr
    or die "Can't connect to $remote at port $port: $!";
To disconnect a socket, use either close or shutdown. See also the examples in Section 16.5 in Chapter 16. See connect(2).
cos EXPR
cos
This function returns the cosine of EXPR (expressed in radians). For example, the following script will print a cosine table of angles measured in degrees:
# Here's the lazy way of getting degrees-to-radians.
$pi = atan2(1,1) * 4;
$piover180 = $pi/180;
# Print table.
for ($deg = 0; $deg <= 90; $deg++) {
    printf "%3d %7.5f\n", $deg, cos($deg * $piover180);
}
For the inverse cosine operation, you may use the acos() function from the Math::Trig or POSIX modules, or use this relation:
sub acos { atan2( sqrt(1 - $_[0] * $_[0]), $_[0] ) }
crypt PLAINTEXT, SALT
This function computes a one-way hash of a string exactly in the manner of crypt(3). This is somewhat useful for checking the password file for lousy passwords,[3] although what you really want to do is prevent people from adding the bad passwords in the first place.
crypt is intended to be a one-way function, much like breaking eggs to make an omelette. There is no (known) way to decrypt an encrypted password apart from exhaustive, brute-force guessing.
When verifying an existing encrypted string, you should use the encrypted text as the SALT (like crypt($plain, $crypted) eq $crypted). This allows your code to work with the standard crypt, and with more exotic implementations, too.
When choosing a new SALT, you minimally need to create a random two-character string whose characters come from the set [./0-9A-Za-z] (like join '', ('.', '/', 0..9, 'A'..'Z', 'a'..'z')[rand 64, rand 64]). Older implementations of crypt only needed the first two characters of the SALT, but code that only gives the first two characters is now considered nonportable. See your local crypt(3) manpage for interesting details.
Here's an example that makes sure that whoever runs this program knows their own password:
$pwd = (getpwuid($<))[1];   # Assumes we're on Unix.
system "stty -echo";        # or look into Term::ReadKey on CPAN
print "Password: ";
chomp($word = <STDIN>);
print "\n";
system "stty echo";
if (crypt($word, $pwd) ne $pwd) {
    die "Sorry…\n";
}
else {
    print "ok\n";
}
Of course, typing in your own password to whoever asks for it is unwise.
Shadow password files are slightly more secure than traditional password files, and you might have to be a superuser to access them. Because few programs should run under such powerful privileges, you might have the program maintain its own independent authentication system by storing the crypt strings in a different file than /etc/passwd or /etc/shadow.
The crypt function is unsuitable for encrypting large quantities of data, not least of all because you can't get the information back. Look at the by-module/Crypt and by-module/PGP directories on your favorite CPAN mirror for a slew of potentially useful modules.
dbmclose HASH
This function breaks the binding between a DBM (database management) file and a hash. dbmclose is really just a call to untie with the proper arguments, but is provided for backward compatibility with ancient versions of Perl.
dbmopen HASH, DBNAME, MODE
This binds a DBM file to a hash (that is, an associative array). (DBM stands for database management, and consists of a set of C library routines that allow random access to records via a hashing algorithm.) HASH is the name of the hash (including the %). DBNAME is the name of the database (without any .dir or .pag extension). If the database does not exist and a valid MODE is specified, the database is created with the protection specified by MODE, as modified by the umask. To prevent creation of the database if it doesn't exist, you may specify a MODE of undef, and the function will return false if it can't find an existing database. Values assigned to the hash prior to the dbmopen are not accessible.
The dbmopen function is really just a call to tie with the proper arguments, but is provided for backward compatibility with ancient versions of Perl. You can control which DBM library you use by using the tie interface directly or by loading the appropriate module before you call dbmopen.
Here's an example that works on some systems for versions of DB_File similar to the version in your Netscape browser:
use DB_File;
dbmopen(%NS_Hist, "$ENV{HOME}/.netscape/history.dat", undef)
    or die "Can't open netscape history file: $!";
while (($url, $when) = each %NS_Hist) {
    next unless defined($when);
    chop ($url, $when);    # kill trailing null bytes
    printf "Visited %s at %s.\n", $url,
        scalar(localtime(unpack("V",$when)));
}
If you don't have write access to the DBM file, you can only read the hash variables, not set them. If you want to test whether you can write, either use a file test like -w $file, or try setting a dummy hash entry inside an eval {}, which will trap the exception.
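Here is one way that probe might look; the database name is invented, and whichever DBM library dbmopen selects on your system does the actual storage:

```perl
dbmopen(%probe, "/tmp/probe_db_$$", 0644)
    or die "Can't open probe database: $!";
my $writable = eval {
    $probe{"__probe__"} = 1;         # dies here if the DBM is read-only
    delete $probe{"__probe__"};
    1;
};
print $writable ? "writable\n" : "read-only\n";
dbmclose(%probe);
unlink glob("/tmp/probe_db_$$*");    # remove whatever files the DBM made
```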
Functions such as keys and values may return huge list values when used on large DBM files. You may prefer to use the each function to iterate over large DBM files so that you don't load the whole thing into memory at once.
Hashes bound to DBM files have the same limitations as the type of DBM package you're using, including restrictions on how much data you can put into a bucket. If you stick to short keys and values, it's rarely a problem. See also the DB_File module in Chapter 32.
Another thing you should bear in mind is that many existing DBM databases contain null-terminated keys and values because they were set up with C programs in mind. The Netscape history file and the old sendmail aliases file are examples. Just use "$key\0" when pulling out a value, and remove the null from the value.
$alias = $aliases{"postmaster\0"};
chop $alias;    # kill the null
There is currently no built-in way to lock a generic DBM file. Some would consider this a bug. The GDBM_File module does attempt to provide locking at the granularity of the entire file. When in doubt, your best bet is to use a separate lock file.
defined EXPR
defined
This function returns a Boolean value saying whether EXPR has a defined value or not. Most of the data you deal with is defined, but a scalar that contains no valid string, numeric, or reference value is said to contain the undefined value, or undef for short. Initializing a scalar variable to a particular value will define it, and it will stay defined until you assign an undefined value to it or explicitly call the undef function on that variable.
Many operations return undef under exceptional conditions, such as at end-of-file, when using an uninitialized variable's value, an operating system error, etc. Since undef is just one kind of false value, a simple Boolean test does not distinguish between undef, numeric zero, the null string, and the one-character string, "0"--all of which are equally false. The defined function allows you to distinguish between an undefined null string and a defined null string when you're using operators that might return a real null string.
Here is a fragment that tests a scalar value from a hash:
print if defined $switch{D};
When used on a hash element like this, defined only tells you whether the value is defined, not whether the key has an entry in the hash. It's possible to have a key whose value is undefined; the key itself, however, still exists. Use exists to determine whether the hash key exists.
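A short illustration of the distinction (the hash contents are made up):

```perl
my %switch = (D => 1, q => undef);

print defined $switch{D} ? "defined\n" : "undefined\n";   # defined
print defined $switch{q} ? "defined\n" : "undefined\n";   # undefined
print exists  $switch{q} ? "exists\n"  : "absent\n";      # exists
print exists  $switch{v} ? "exists\n"  : "absent\n";      # absent
```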
In the next example we exploit the convention that some operations return the undefined value when you run out of data:
print "$val\n" while defined($val = pop(@ary));
And in this one, we do the same thing with the getpwent function for retrieving information about the system's users.
setpwent();
while (defined($name = getpwent())) {
    print "<<$name>>\n";
}
endpwent();
The same thing goes for error returns from syscalls that could validly return a false value:
die "Can't readlink $sym: $!" unless defined($value = readlink $sym);
You may also use defined to see whether a subroutine has been defined yet. This makes it possible to avoid blowing up on nonexistent subroutines (or subroutines that have been declared but never given a definition):
indir("funcname", @arglist);
sub indir {
    my $subname = shift;
    no strict 'refs';     # so we can use subname indirectly
    if (defined &$subname) {
        &$subname(@_);    # or $subname->(@_);
    }
    else {
        warn "Ignoring call to invalid function $subname";
    }
}
Use of defined on aggregates (hashes and arrays) is deprecated. (It used to report whether memory for that aggregate had ever been allocated.) Instead, use a simple Boolean test to see whether the array or hash has any elements:
if (@an_array) { print "has array elements\n" }
if (%a_hash)   { print "has hash members\n" }
See also undef
and
exists
.
delete EXPR
This function deletes an element (or a slice of
elements) from the specified hash or array. (See
unlink
if you want to delete a file.) The deleted
elements are returned in the order specified, though this behavior
is not guaranteed for tied variables such as DBM files. After the
delete operation, the exists
function will return
false on any deleted key or index. (In contrast, after the
undef
function, the exists
function continues to return true, because the
undef
function only undefines the value of the
element, but doesn't delete the element itself.)
Deleting from the %ENV
hash modifies the
environment. Deleting from a hash that is bound to a (writable) DBM
file deletes the entry from that DBM file.
Historically, you could only delete from a hash, but with Perl
version 5.6 you may also delete from an array. Deleting from an
array causes the element at the specified position to revert to a
completely uninitialized state, but it doesn't close up the gap,
since that would change the positions of all the subsequent entries.
Use a splice
for that. (However, if you delete the final element of an array, the
array shrinks so that its last element is the highest-indexed element
that still exists--or to nothing, if no elements remain.)
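A short sketch of this behavior (the array contents are just for illustration; note that deleting individual array elements is discouraged in modern Perls):

```perl
my @a = (1, 2, 3, 4);

delete $a[1];               # leaves a hole; the array doesn't close up
print scalar(@a), "\n";     # still 4 elements long
print exists $a[1] ? "yes" : "no", "\n";    # no

delete $a[3];               # deleting the final element shrinks the array
print scalar(@a), "\n";     # now 3: the last existing element is $a[2]
```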
EXPR
can be arbitrarily
complicated, provided that the final operation is a hash or array
lookup:
# set up array of array of hash
$dungeon[$x][$y] = \%properties;

# delete one property from hash
delete $dungeon[$x][$y]{"OCCUPIED"};

# delete three properties all at once from hash
delete @{ $dungeon[$x][$y] }{ "OCCUPIED", "DAMP", "LIGHTED" };

# delete reference to %properties from array
delete $dungeon[$x][$y];
The following naïve example inefficiently deletes all the
values of a %hash
:
foreach $key (keys %hash) {
    delete $hash{$key};
}
And so does this:
delete @hash{keys %hash};
But both of these are slower than just assigning the empty list or undefining it:
%hash = ();     # completely empty %hash
undef %hash;    # forget %hash ever existed
Likewise for arrays:
foreach $index (0 .. $#array) {
    delete $array[$index];
}
and:
delete @array[0 .. $#array];
are less efficient than either of:
@array = ();    # completely empty @array
undef @array;   # forget @array ever existed
die LIST
die
Outside an eval
, this function
prints the concatenated value of LIST
to
STDERR
and exits with the current value of
$!
(the C-library errno
variable). If $!
is 0, it exits with the value of
$? >> 8
(which is the status of the last
reaped child from a system
,
wait
, close
on a pipe, or
`command`
). If $? >> 8
is 0, it exits with 255.
Within an eval
, the function sets the
$@
variable to the error message that would have
otherwise been produced, then aborts the eval
,
which returns undef
. The die
function can thus be used to raise named exceptions that can be
caught at a higher level in the program. See eval
later in this chapter.
If LIST
is a single object
reference, that object is assumed to be an exception object and is
returned unmodified as the exception in
$@
.
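Here's a sketch of how that's typically used, with a hypothetical MyError class standing in for a real exception class:

```perl
package MyError;    # hypothetical exception class, for illustration only
sub new     { my ($class, %args) = @_; return bless { %args }, $class }
sub message { return $_[0]{message} }

package main;

eval { die MyError->new(message => "out of cheese") };
my $err = $@;       # the object comes back unmodified in $@
if (ref $err and $err->isa("MyError")) {
    print "caught: ", $err->message, "\n";    # caught: out of cheese
}
```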
If LIST
is empty and
$@
already contains a string value (typically
from a previous eval) that value is reused after
appending "\t...propagated". This is useful for
propagating (reraising) exceptions:
eval { ... };
die unless $@ =~ /Expected exception/;
If LIST
is empty and
$@
already contains an exception object, the
$@->PROPAGATE
method is called to determine
how the exception should be propagated.
If LIST
is empty and
$@
is empty, then the string "Died" is used.
If the final value of LIST
does not
end in a newline (and you're not passing an exception object), the
current script filename, line number, and input line number (if any)
are appended to the message, as well as a newline. Hint: sometimes
appending ", stopped
" to your message will cause
it to make better sense when the string "at scriptname line
123
" is appended. Suppose you are running script
canasta; consider the difference between the
following two ways of dying:
die "/usr/games is no good";
die "/usr/games is no good, stopped";
which produce, respectively:
/usr/games is no good at canasta line 123.
/usr/games is no good, stopped at canasta line 123.
If you want your own error messages reporting the filename and
line number, use the __FILE__
and
__LINE__
special tokens:
die '"', __FILE__, '", line ', __LINE__, ", phooey on you!\n";
This produces output like:
"canasta", line 38, phooey on you!
One other style issue--consider the following equivalent examples:
die "Can't cd to spool: $!\n" unless chdir '/usr/spool/news';

chdir '/usr/spool/news' or die "Can't cd to spool: $!\n";
Because the important part is the chdir
,
the second form is generally preferred.
See also exit
, warn
,
%SIG
, and the Carp
module.
do BLOCK
The do
BLOCK
form executes the sequence of
statements in the BLOCK
and returns the
value of the last expression evaluated in the block. When modified
by a while
or until
statement
modifier, Perl executes the BLOCK
once
before testing the loop condition. (On other statements the loop
modifiers test the conditional first.) The do
BLOCK
itself does
not count as a loop, so the loop control
statements next
, last
, or
redo
cannot be used to leave or restart the
block. See Section 4.5 in
Chapter 4, for
workarounds.
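For example, a do BLOCK modified by while always runs its body once, even when the condition is false from the start:

```perl
my $n = 10;

do {
    print "n = $n\n";    # body runs once before the condition is tested
    $n++;
} while ($n < 5);        # false immediately, so no second pass

# an ordinary while with the same condition never runs its body at all
print "never printed\n" while $n < 5;
```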
do FILE
The do
FILE
form uses the value of
FILE
as a filename and executes the
contents of the file as a Perl script. Its primary use is (or rather
was) to include subroutines from a Perl subroutine library, so
that:
do 'stat.pl';
is rather like:
scalar eval `cat stat.pl`; # `type stat.pl` on Windows
except that do
is more efficient, more
concise, keeps track of the current filename for error messages,
searches all the directories listed in the @INC
array, and updates %INC
if the file is found.
(See Chapter 28.) It also
differs in that code evaluated with do
FILE
cannot see lexicals in the enclosing
scope, whereas code in eval
FILE
does. It's the same, however, in
that it reparses the file every time you call it--so you might not
want to do this inside a loop unless the filename itself changes at
each loop iteration.
If do
can't read the file, it returns
undef
and sets $!
to the
error. If do
can read the file but can't compile
it, it returns undef
and sets an error message in
$@
. If the file is successfully compiled,
do
returns the value of the last expression
evaluated.
Inclusion of library modules (which have a mandatory
.pm suffix) is better done with the
use
and require
operators,
which also do error checking and raise an exception if there's a
problem. They also offer other benefits: they avoid duplicate
loading, help with object-oriented programming, and provide hints to
the compiler on function prototypes.
But do
FILE
is
still useful for such things as reading program configuration files.
Manual error checking can be done this way:
# read in config files: system first, then user
for $file ("/usr/share/proggie/defaults.rc",
           "$ENV{HOME}/.someprogrc")
{
    unless ($return = do $file) {
        warn "couldn't parse $file: $@"  if $@;
        warn "couldn't do $file: $!"     unless defined $return;
        warn "couldn't run $file"        unless $return;
    }
}
A long-running daemon could periodically examine the timestamp
on its configuration file, and if the file has changed since it was
last read in, the daemon could use do
to reload
that file. This is more tidily accomplished with
do
than with require
or
use
.
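Here's one way such a daemon might arrange that; the mtime cache and the return conventions are illustrative, not canonical:

```perl
use strict;
use warnings;

my %last_loaded;    # file => mtime at last successful load

# Reload $file with do() only if it has changed since we last read it.
sub maybe_reload {
    my $file  = shift;
    my $mtime = (stat $file)[9];
    return 0 unless defined $mtime;                     # vanished or unreadable
    return 0 if ($last_loaded{$file} || 0) >= $mtime;   # unchanged; skip
    my $return = do $file;
    unless ($return) {
        warn "couldn't parse $file: $@" if $@;
        warn "couldn't do $file: $!"    unless defined $return;
        return 0;
    }
    $last_loaded{$file} = $mtime;
    return 1;
}
```

A daemon's main loop would call maybe_reload($config_file) every few seconds, or from a SIGHUP handler.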
do SUBROUTINE(LIST)
The do SUBROUTINE(LIST) syntax
is a deprecated form of a subroutine call. An exception is raised if
the SUBROUTINE
is undefined. See Chapter 6.
dump LABEL
dump
This function causes an immediate core dump.
Primarily this is so that you can use the
undump program (not supplied) to turn your core
dump into an executable binary after having initialized all your
variables at the beginning of the program. When the new binary is
executed it will begin by executing a goto
LABEL
(with all the restrictions that
goto
suffers). Think of it as a goto with an
intervening core dump and reincarnation. If
LABEL
is omitted, the program is
restarted from the top. Warning: any files opened at the time of the
dump will not be open any more when the program
is reincarnated, with possible resulting confusion on the part of
Perl. See also the -u
command-line option in
Chapter 19.
This function is now largely obsolete, partly because it's difficult in the extreme to convert a core file into an executable in the general case, and because various compiler backends for generating portable bytecode and compilable C code have superseded it.
If you're looking to use dump
to speed up
your program, check out the discussion of efficiency matters in
Chapter 24, as well the Perl
native-code generator in Chapter
18. You might also consider autoloading or selfloading, which
at least make your program appear to run
faster.
each HASH
This function steps through a hash one key/value pair
at a time. When called in list context, each
returns a two-element list consisting of the key and value for the
next element of a hash, so that you can iterate over it. When called
in scalar context, each
returns just the key for
the next element in the hash. When the hash is entirely read, the
empty list is returned, which when assigned produces a false value
in scalar context, such as a loop test. The next call to
each
after that will start iterating again. The
typical use is as follows, using predefined %ENV
hash:
while (($key, $value) = each %ENV) {
    print "$key=$value\n";
}
Internally, a hash maintains its own entries in an apparently
random order. The each
function iterates through
this sequence because every hash remembers which entry was last
returned. The actual ordering of this sequence is subject to change
in future versions of Perl, but is guaranteed to be in the same
order as the keys
(or values
)
function would produce on the same (unmodified) hash.
There is a single iterator for each hash, shared by all
each
, keys
, and
values
function calls in the program; it can be
reset by reading all the elements from the hash, or by evaluating
keys %hash
or values %hash
. If
you add or delete elements of a hash while you're iterating over it,
the resulting behavior is not well-defined: entries might get
skipped or duplicated.
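In scalar context each returns just the next key, and evaluating keys on the hash resets the shared iterator (hash contents here are illustrative):

```perl
my %size = (small => 1, medium => 2, large => 3);

my $first = each %size;    # scalar context: one key; iterator advances
print "started at: $first\n";

keys %size;                # in void context, just resets the iterator
my @pairs;
while (my ($k, $v) = each %size) {
    push @pairs, "$k=$v";
}
print join(",", sort @pairs), "\n";    # large=3,medium=2,small=1
```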
See also keys
, values
,
and sort
.
eof FILEHANDLE
eof()
eof
This function returns true if the next read on
FILEHANDLE
would return end-of-file, or
if FILEHANDLE
is not open.
FILEHANDLE
may be an expression whose
value gives the real filehandle, or a reference to a filehandle
object of some sort. An eof
without an argument
returns the end-of-file status for the last file read. An
eof()
with empty parentheses
()
tests the ARGV
filehandle
(most commonly seen as the null filehandle in
<>
). Therefore, inside a while
(<>)
loop, an eof()
with
parentheses will detect the end of only the last of a group of
files. Use eof
(without the parentheses) to test
each file in a while
(<>)
loop. For example, the following code inserts
dashes just before the last line of the last
file:
while (<>) {
    if (eof()) {
        print "-" x 30, "\n";
    }
    print;
}
On the other hand, this script resets line numbering on each input file:
# reset line numbering on each input file
while (<>) {
    next if /^\s*#/;       # skip comments
    print "$. $_";
}
continue {
    close ARGV if eof;     # Not eof()!
}
Like "$" in a sed program, eof
tends to show up in line number
ranges. Here's a script that prints lines from
/pattern/
to end of each input file:
while (<>) {
    print if /pattern/ .. eof;
}
Here, the flip-flop operator (..) evaluates
the pattern match for each line. Until the pattern matches, the
operator returns false. When it finally matches, the operator starts
returning true, causing the lines to be printed. When the
eof
operator finally returns true (at the end of
the file being examined), the flip-flop operator resets, and starts
returning false again for the next file in
@ARGV
.
Warning: The eof
function reads a byte and
then pushes it back on the input stream with
ungetc (3), so it is not useful in an
interactive context. In fact, experienced Perl programmers rarely
use eof
, since the various input operators
already behave politely in while
-loop
conditionals. See the example in the description of
foreach
in Chapter
4.
eval BLOCK
eval EXPR
eval
The eval
keyword serves two
distinct but related purposes in Perl. These purposes are
represented by two forms of syntax, eval
BLOCK
and eval
EXPR
. The first form traps run-time
exceptions (errors) that would otherwise prove fatal, similar to the
"try block" construct in C++ or Java. The second form compiles and
executes little bits of code on the fly at run time, and also
(conveniently) traps any exceptions just like the first form. But
the second form runs much slower than the first form, since it must
parse the string every time. On the other hand, it is also more
general. Whichever form you use, eval
is the
preferred way to do all exception handling in Perl.
For either form of eval
, the value returned
from an eval
is the value of the last expression
evaluated, just as with subroutines. Similarly, you may use the
return
operator to return a value from the middle
of the eval
. The expression providing the return
value is evaluated in void, scalar, or list context, depending on
the context of the eval
itself. See
wantarray
for more on how the evaluation context
can be determined.
If there is a trappable error (including any produced by the
die
operator), eval
returns
undef
and puts the error message (or object) in
$@
. If there is no error, $@
is guaranteed to be set to the null string, so you can test it
reliably afterward for errors. A simple Boolean test
suffices:
eval { ... };      # trap run-time errors
if ($@) { ... }    # handle error
The eval
BLOCK
form is syntax-checked at compile time, so it is quite efficient.
(People familiar with the slow eval
EXPR
form are occasionally confused on
this issue.) Since the code in the BLOCK
is compiled at the same time as the surrounding code, this form of
eval
cannot trap syntax errors.
The eval
EXPR
form can trap syntax errors because it parses the code at run time.
(If the parse is unsuccessful, it places the parse error in
$@
, as usual.) Otherwise, it executes the value
of EXPR
as though it were a little Perl
program. The code is executed in the context of the current Perl
program, which means that it can see any enclosing lexicals from a
surrounding scope, and that any non-local variable settings remain
in effect after the eval
is complete, as do any
subroutine or format definitions. The code of the
eval
is treated as a block, so any locally scoped
variables declared within the eval
last only
until the eval
is done. (See
my
and local
.) As with any
code in a block, a final semicolon is not required.
Here is a simple Perl shell. It prompts the user to enter a string of arbitrary Perl code, compiles and executes that string, and prints whatever error occurred:
print "\nEnter some Perl code: ";
while (<STDIN>) {
    eval;
    print $@;
    print "\nEnter some more Perl code: ";
}
Here is a rename program to do a mass renaming of files using a Perl expression:
#!/usr/bin/perl
# rename - change filenames
$op = shift;
for (@ARGV) {
    $was = $_;
    eval $op;
    die if $@;
    # next line calls the built-in function,
    # not the script by the same name
    rename($was, $_) unless $was eq $_;
}
You'd use that program like this:
$ rename 's/\.orig$//' *.orig
$ rename 'y/A-Z/a-z/ unless /^Make/' *
$ rename '$_ .= ".bad"' *.f
Since eval
traps errors that would
otherwise prove fatal, it is useful for determining whether
particular features (such as fork
or
symlink
) are implemented.
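A common idiom probes for a feature by attempting it inside an eval BLOCK and seeing whether an "unimplemented" exception results:

```perl
# True if this platform implements symlink: on systems without it,
# the call raises an exception, so the eval never reaches the final 1
# and returns undef instead.
my $has_symlink = eval { symlink("", ""); 1 };

print $has_symlink ? "symlink is available\n"
                   : "no symlink here: $@";
```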
Because eval
BLOCK
is syntax-checked at compile time,
any syntax error is reported earlier. Therefore, if your code is
invariant and both eval
EXPR
and eval
BLOCK
will suit your purposes equally
well, the BLOCK
form is preferred. For
example:
# make divide-by-zero nonfatal
eval { $answer = $a / $b; };
warn $@ if $@;

# same thing, but less efficient if run multiple times
eval '$answer = $a / $b';
warn $@ if $@;

# a compile-time syntax error (not trapped)
eval { $answer = };       # WRONG

# a run-time syntax error
eval '$answer =';         # sets $@
Here, the code in the BLOCK
has to
be valid Perl code to make it past the compile phase. The code in
the EXPR
doesn't get examined until run
time, so it doesn't cause an error until run time.
The block of eval
BLOCK
does not count
as a loop, so the loop control statements next
,
last
, or redo
cannot be used
to leave or restart the block.
exec PATHNAME LIST
exec LIST
The exec
function terminates the
current program and executes an external command and never
returns! Use system
instead of
exec
if you want to recover control after the
command exits. The exec
function fails and
returns false only if the command does not exist
and if it is executed directly instead of via
your system's command shell (discussed below).
If there is only one scalar argument, the argument is checked for shell metacharacters. If metacharacters are found, the entire argument is passed to the system's standard command interpreter (/bin/sh under Unix). If there are no metacharacters, the argument is split into words and executed directly, since in the interests of efficiency this bypasses all the overhead of shell processing. It also gives you more control of error recovery should the program not exist.
If there is more than one argument in
LIST
, or if
LIST
is an array with more than one
value, the system shell will never be used. This also bypasses any
shell processing of the command. The presence or absence of
metacharacters in the arguments doesn't affect this list-triggered
behavior, which makes it the preferred form in security-conscious
programs that do not wish to expose themselves to potential shell
escapes.
This example causes the currently running Perl program to replace itself with the echo program, which then prints out the current argument list:
exec 'echo', 'Your arguments are: ', @ARGV;
This example shows that you can exec
a
pipeline, not just a single program.
exec "sort $outfile | uniq"
    or die "Can't do sort/uniq: $!\n";
Ordinarily, exec
never returns--if it does
return, it always returns false, and you should check
$!
to find out what went wrong. Be aware that in
older releases of Perl, exec
(and
system
) did not flush your output buffer, so you
needed to enable command buffering by setting $|
on one or more filehandles to avoid lost output in the case of
exec
, or misordered output in the case of
system
. This situation was largely remedied in
the 5.6 release of Perl.
When you ask the operating system to execute a new
program within an existing process (as Perl's
exec
function does), you tell the system the
location of the program to execute, but you also tell the new
program (through its first argument) the name under which the
program was invoked. Customarily, the name you tell it is just a
copy of the location of the program, but it doesn't necessarily have
to be, since there are two separate arguments at the level of the C
language. When it is not a copy, you have the odd result that the
new program thinks it's running under a name that may be totally
different from the actual pathname where the program resides. Often
this doesn't matter to the program in question, but some programs do
care and adopt a different persona depending on what they think
their name is. For example, the vi editor looks
to see whether it was called as "vi" or as "view". If invoked as "view",
it automatically enables read-only mode, just as though it was
called with the -R
command-line
option.
This is where exec
's optional
PATHNAME
parameter comes into play.
Syntactically, it goes in the indirect-object slot like the
filehandle for print
or
printf
. Therefore, it doesn't take a comma after
it, because it's not exactly part of the argument list. (In a sense,
Perl takes the opposite approach from the operating system in that
it assumes the first argument is the important one, and lets you
modify the pathname if it differs.) For example:
$editor = "/usr/bin/vi";
exec $editor "view", @files    # trigger read-only mode
    or die "Couldn't execute $editor: $!\n";
As with any other indirect object, you can also replace the simple scalar holding the program name with a block containing arbitrary code, which simplifies the previous example to:
exec { "/usr/bin/vi" } "view", @files    # trigger read-only mode
    or die "Couldn't execute $editor: $!\n";
As we mentioned earlier, exec
treats a
discrete list of arguments as an indication that it should bypass
shell processing. However, there is one place where you might still
get tripped up. The exec
call (and
system
, too) will not distinguish between a
single scalar argument and an array containing only one
element.
@args = ("echo surprise");    # just one element in list
exec @args                    # still subject to shell escapes
    or die "exec: $!";        # because @args == 1
To avoid this, you can use the
PATHNAME
syntax, explicitly duplicating
the first argument as the pathname, which forces the rest of the
arguments to be interpreted as a list, even if there is only one of
them:
exec { $args[0] } @args    # safe even with one-argument list
    or die "can't exec @args: $!";
The first version, the one without the curlies, runs the
echo program, passing it
"surprise" as an argument. The second version
doesn't--it tries to run a program literally called echo
surprise, doesn't find it (we hope), and sets
$!
to a nonzero value indicating failure.
Because the exec
function is most often
used shortly after a fork
, it is assumed that
anything that normally happens when a Perl process terminates should
be skipped. Upon an exec
, Perl will not call your
END
blocks, nor will it call any
DESTROY
methods associated with any objects.
Otherwise, your child process would end up doing the cleanup you
expected the parent process to do. (We wish that were the case in
real life.)
Because it's such a common mistake to use
exec
instead of system
, Perl
warns you if there is a following statement that isn't
die
, warn
, or
exit
when run with the popular
-w
command-line option, or if you've used the
use warnings qw(exec syntax)
pragma. If you
really want to follow an exec
with some other
statement, you can use either of these styles to avoid the
warning:
exec ('foo')   or print STDERR "couldn't exec foo: $!";
{ exec ('foo') };  print STDERR "couldn't exec foo: $!";
As the second line above shows, a call to
exec
that is the last statement in a block is
exempt from this warning.
See also system
.
exists EXPR
This function returns true if the specified hash key or array index exists in its hash or array. It doesn't matter whether the corresponding value is true or false, or whether the value is even defined.
print "True\n"    if $hash{$key};
print "Defined\n" if defined $hash{$key};
print "Exists\n"  if exists $hash{$key};

print "True\n"    if $array[$index];
print "Defined\n" if defined $array[$index];
print "Exists\n"  if exists $array[$index];
An element can be true only if it's defined, and can be defined only if it exists, but the reverse doesn't necessarily hold.
EXPR
can be arbitrarily
complicated, provided that the final operation is a hash key or
array index lookup:
if (exists $hash{A}{B}{$key}) { … }
Although the last element will not spring into
existence just because its existence was tested, intervening ones
will. Thus $$hash{"A"}
and
$hash{"A"}->{"B"}
will both spring into
existence. This is not a function of exists
,
per se; it happens anywhere the arrow operator
is used (explicitly or implicitly):
undef $ref;
if (exists $ref->{"Some key"}) { }
print $ref;    # prints HASH(0x80d3d5c)
Even though the "Some key
" element didn't
spring into existence, the previously undefined
$ref
variable did suddenly come to hold an
anonymous hash. This is a surprising instance of
autovivification in what does not at first--or
even second--glance appear to be an lvalue context. This behavior is
likely to be fixed in a future release. As a workaround, you can
nest your calls:
if ($ref and
    exists $ref->[$x] and
    exists $ref->[$x][$y] and
    exists $ref->[$x][$y]{$key} and
    exists $ref->[$x][$y]{$key}[2] )
{
    ...
}
If EXPR
is the name of a
subroutine, the exists
function will return true
if that subroutine has been declared, even if it has not yet been
defined. The following will just print "Exists":
sub flub;
print "Exists\n"  if exists  &flub;
print "Defined\n" if defined &flub;
Using exists
on a subroutine name can be
useful for an AUTOLOAD
subroutine that needs to
know whether a particular package wants a particular subroutine to
be defined. The package can indicate this by declaring a stub
sub
like flub
.
exit EXPR
exit
This function evaluates
EXPR
as an integer and exits immediately
with that value as the final error status of the program. If
EXPR
is omitted, the function exits with
0
status (meaning "no error"). Here's a fragment
that lets a user exit the program by typing x
or
X
:
$ans = <STDIN>;
exit if $ans =~ /^[Xx]/;
You shouldn't use exit
to abort a
subroutine if there's any chance that someone might want to trap
whatever error happened. Use die
instead, which
can be trapped by an eval
. Or use one of
die
's wrappers from the Carp
module, like croak
or
confess
.
We said that the exit
function exits
immediately, but that was a bald-faced lie. It exits as soon as
possible, but first it calls any defined END
routines for at-exit handling. These routines cannot abort the exit,
although they can change the eventual exit value by setting the
$?
variable. Likewise, any class that defines a
DESTROY
method will invoke that method on behalf
of all its objects before the real program exits. If you really need
to bypass exit processing, you can call the POSIX
module's _exit
function to avoid all
END
and destructor processing. And if
POSIX
isn't available, you can exec "/bin/false" or some such.
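Here's a sketch that makes the difference visible by running two child interpreters, one exiting each way (Unix-style shell quoting assumed):

```perl
use strict;
use warnings;

# Each child installs an END block; only a normal exit runs it.
my $prog = 'END { print "cleanup" } ';

my $with_exit  = `$^X -e '${prog}exit(0)'`;
my $with_posix = `$^X -MPOSIX=_exit -e '${prog}_exit(0)'`;

print "exit():         ran END? ", ($with_exit  eq "cleanup" ? "yes" : "no"), "\n";
print "POSIX::_exit(): ran END? ", ($with_posix eq "cleanup" ? "yes" : "no"), "\n";
```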
exp EXPR
exp
This function returns e to the
power of EXPR
. To get the value of
e, just use exp(1)
. For
general exponentiation of different bases, use the
**
operator we stole from FORTRAN:
use Math::Complex;
print -exp(1) ** (i * pi);    # prints 1
fcntl FILEHANDLE, FUNCTION, SCALAR
This function calls your operating system's file
control functions, as documented in the fcntl
(2) manpage. Before you call fcntl
,
you'll probably first have to say:
use Fcntl;
to load the correct constant definitions.
SCALAR
will be read or written (or
both) depending on the FUNCTION
. A
pointer to the string value of SCALAR
will be passed as the third argument of the actual
fcntl call. (If
SCALAR
has no string value but does have
a numeric value, that value will be passed directly rather than
passing a pointer to the string value.) See the
Fcntl
module for a description of the more common
permissible values for FUNCTION
.
The fcntl
function will raise an exception
if used on a system that doesn't implement
fcntl (2). On systems that do
implement it, you can do such things as modify the close-on-exec
flags (if you don't want to play with the $^F
($SYSTEM_FD_MAX
) variable), modify the
nonblocking I/O flags, emulate the lockf
(3) function, and arrange to receive the
SIGIO
signal when I/O is pending.
Here's an example of setting a filehandle named
REMOTE
to be nonblocking at the system level.
This makes any input operation return immediately if nothing is
available when reading from a pipe, socket, or serial line that
would otherwise block. It also works to cause output operations that
normally would block to return a failure status instead. (For those,
you'll likely have to negotiate $|
as
well.)
use Fcntl qw(F_GETFL F_SETFL O_NONBLOCK);

$flags = fcntl(REMOTE, F_GETFL, 0)
    or die "Can't get flags for the socket: $!\n";

$flags = fcntl(REMOTE, F_SETFL, $flags | O_NONBLOCK)
    or die "Can't set flags for the socket: $!\n";
The return value of fcntl
(and
ioctl
) is as follows:
    OS returns:       Perl returns:
    -1                undefined value
    0                 string "0 but true"
    anything else     that number
Thus Perl returns true on success and false on failure, yet you can still easily determine the actual value returned by the operating system:
$retval = fcntl(...) || -1;
printf "fcntl actually returned %d\n", $retval;
Here, even the string "0 but true"
prints as 0, thanks to the %d
format. This string
is true in Boolean context and 0
in numeric
context. (It is also happily exempt from the normal warnings on
improper numeric conversions.)
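A quick sketch of that string's dual nature:

```perl
use warnings;

my $v = "0 but true";

print "boolean: ", ($v ? "true" : "false"), "\n";   # true
print "numeric: ", $v + 0, "\n";                    # 0
printf "as %%d:   %d\n", $v;    # 0, with no "isn't numeric" warning
```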
fileno FILEHANDLE
This function returns the file descriptor underlying
a filehandle. If the filehandle is not open,
fileno
returns undef
. A
file descriptor is a small, non-negative
integer like 0 or 1, in contrast to filehandles like
STDIN
and STDOUT
, which are
symbols. Unfortunately, the operating system doesn't know about your
cool symbols. It only thinks of open files in terms of these small
file numbers, and although Perl will usually do the translations for
you automatically, occasionally you have to know the actual file
descriptor.
So, for example, the fileno
function is
useful for constructing bitmaps for select
and
for passing to certain obscure system calls if
syscall (2) is implemented. It's also
useful for double-checking that the open
function
gave you the file descriptor you wanted and for determining whether
two filehandles use the same system file descriptor.
if (fileno(THIS) == fileno(THAT)) {
    print "THIS and THAT are dups\n";
}
If FILEHANDLE
is an expression, the
value is taken as an indirect filehandle, generally its name or a
reference to something resembling a filehandle object.
One caution: don't count on the association of a Perl
filehandle and a numeric file descriptor throughout the life of the
program. If a file has been closed and reopened, the file descriptor
may change. Perl takes a bit of trouble to try to ensure that
certain file descriptors won't be lost if an open
on them fails, but it only does this for file descriptors that don't
exceed the current value of the special $^F
($SYSTEM_FD_MAX
) variable (by default, 2).
Although filehandles STDIN
,
STDOUT
, and STDERR
start out
with file descriptors of 0, 1, and 2 (the Unix standard convention),
even they can change if you start closing and opening them with wild
abandon. You can't get into trouble with 0, 1, and 2 as long as you
always reopen immediately after closing. The basic rule on Unix
systems is to pick the lowest available descriptor, and that'll be
the one you just closed.
flock FILEHANDLE, OPERATION
The flock
function is Perl's
portable file-locking interface, although it locks only entire
files, not records. The function manages locks on the file
associated with FILEHANDLE
, returning
true for success and false otherwise. To avoid the possibility of
lost data, Perl flushes your FILEHANDLE
before locking or unlocking it. Perl might implement its
flock
in terms of flock
(2), fcntl (2),
lockf (3), or some other
platform-specific lock mechanism, but if none of these is available,
calling flock
raises an exception. See the
section "File Locking" in Chapter
16.
OPERATION
is one of
LOCK_SH
, LOCK_EX
, or
LOCK_UN
, possibly ORed with
LOCK_NB
. These constants are traditionally valued
1
, 2
, 8
,
and 4
, but you can use the symbolic names if you
import them from the Fcntl
module, either
individually or as a group using the :flock
tag.
LOCK_SH
requests a shared lock, so it's
typically used for reading. LOCK_EX
requests an
exclusive lock, so it's typically used for writing.
LOCK_UN
releases a previously requested lock;
closing the file also releases any locks. If the
LOCK_NB
bit is used with
LOCK_SH
or LOCK_EX
,
flock
returns immediately rather than waiting for
an unavailable lock. Check the return status to see whether you got
the lock you asked for. If you don't use LOCK_NB
,
you might wait indefinitely for the lock to be granted.
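For example, here's a sketch of a nonblocking attempt at an exclusive lock (the lock-file path is illustrative):

```perl
use Fcntl qw(:flock);

open(my $fh, ">>", "/tmp/proggie.lock")
    or die "can't open lock file: $!";

if (flock($fh, LOCK_EX | LOCK_NB)) {
    print "got the lock\n";
    # ... critical section ...
    flock($fh, LOCK_UN);
}
else {
    print "another process holds the lock; doing something else\n";
}
```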
Another nonobvious but traditional aspect of
flock
is that its locks are merely
advisory. Discretionary locks are more flexible but offer
fewer guarantees than mandatory ones. This means that files locked
with flock
may be modified by programs that do
not also use flock
. Cars that stop for red lights
get on well with each other, but not with cars that don't stop for
red lights. Drive defensively.
Some implementations of flock
cannot lock
things over the network. While you could in theory use the more
system-specific fcntl
for that, the jury (having
sequestered itself on the case for a decade or so) is still out on
whether this is (or even can be) reliable.
Here's a mailbox appender for Unix systems that use flock (2) to lock mailboxes:
use Fcntl qw/:flock/;  # import LOCK_* constants

sub mylock {
    flock(MBOX, LOCK_EX)
        or die "can't lock mailbox: $!";
    # in case someone appended while we were waiting
    # and our stdio buffer is out of sync
    seek(MBOX, 0, 2)
        or die "can't seek to the end of mailbox: $!";
}

open(MBOX, ">>/usr/spool/mail/$ENV{'USER'}")
    or die "can't open mailbox: $!";
mylock();
print MBOX $msg, "\n\n";
close MBOX
    or die "can't close mailbox: $!";
On systems that support a real
flock (2) syscall, locks are inherited
across fork
calls. Other implementations are not
so lucky, and are likely to lose the locks across forks. See also
the DB_File
module in Chapter 32 for other
flock
examples.
fork
This function creates two processes out of one by
invoking the fork (2) syscall. If it
succeeds, the function returns the new child process's ID to the
parent process and 0 to the child process. If the system doesn't
have sufficient resources to allocate a new process, the call fails
and returns undef
. File descriptors (and
sometimes locks on those descriptors) are shared, while everything
else is copied—or at least made to look that way.
In versions of Perl prior to 5.6, unflushed buffers
remain unflushed in both processes, which means you may need to set
$|
on one or more filehandles earlier in the
program to avoid duplicate output.
A nearly bulletproof way to launch a child process while checking for "cannot fork" errors would be:
use Errno qw(EAGAIN);

FORK: {
    if ($pid = fork) {
        # parent here
        # child process pid is available in $pid
    }
    elsif (defined $pid) {  # $pid is zero here if defined
        # child here
        # parent process pid is available with getppid
    }
    elsif ($! == EAGAIN) {
        # EAGAIN is the supposedly recoverable fork error
        sleep 5;
        redo FORK;
    }
    else {
        # weird fork error
        die "Can't fork: $!\n";
    }
}
These precautions are not necessary
on operations that do an implicit fork (2),
such as system
, backticks, or opening a process
as a filehandle, because Perl automatically retries a fork on a
temporary failure when it's doing the fork
for
you. Be careful to end the child code with an
exit
, or else your child will inadvertently leave
the conditional block and start executing code intended only for the
parent process.
If you fork
without ever waiting on your
children, you will accumulate zombies (exited processes whose
parents haven't waited on them yet). On some systems, you can avoid
this by setting $SIG{CHLD}
to
"IGNORE
"; on most, you must
wait
for your moribund children. See the
wait
function for examples of doing this, or see
the "Signals" section of Chapter
16 for more on SIGCHLD
.
If a forked child inherits system file descriptors like
STDIN
and STDOUT
that are
connected to a remote pipe or socket, you may have to reopen these
in the child to /dev/null. That's because even
when the parent process exits, the child will live on with its
copies of those filehandles. The remote server (such as, say, a CGI
script or a background job launched from a remote shell) will appear
to hang because it's still waiting for all copies to be closed.
Reopening the system filehandles to something else fixes
this.
On most systems supporting fork (2),
great care has gone into making it extremely efficient (for example,
using copy-on-write technology on data pages), making it the
dominant paradigm for multitasking over the last few decades. The
fork
function is unlikely to be implemented
efficiently, or perhaps at all, on systems that don't resemble Unix.
For example, Perl 5.6 emulates a proper fork
even
on Microsoft systems, but no assurances can be made on performance
at this point. You might have more luck there with the
Win32::Process
module.
format NAME =
picture line
value list
…
.
This function declares a named sequence of picture
lines (with associated values) for use by the
write
function. If
NAME
is omitted, the name defaults to
STDOUT
, which happens to be the default format
name for the STDOUT
filehandle. Since, like a
sub
declaration, this is a package-global
declaration that happens at compile time, any variables used in the
value list need to be visible at the point of the format's
declaration. That is, lexically scoped variables must be declared
earlier in the file, while dynamically scoped variables merely need
to be set at the time write
is called. Here's an
example (which assumes we've already calculated
$cost
and $quantity
):
my $str = "widget";     # Lexically scoped variable.
format Nice_Output =
Test: @<<<<<<<< @||||| @>>>>>
      $str,     $%,    '$' . int($num)
.

local $~ = "Nice_Output";          # Select our format.
local $num = $cost * $quantity;    # Dynamically scoped variable.
write;
Like filehandles, format names are identifiers that
exist in a symbol table (package) and may be fully qualified by
package name. Within the typeglobs of a symbol table's entries,
formats reside in their own namespace, which is distinct from
filehandles, directory handles, scalars, arrays, hashes, and
subroutines. Like those other six types, however, a format named
Whatever
would also be affected by a
local
on the *Whatever
typeglob. In other words, a format is just another gadget contained
in a typeglob, independent of the other gadgets.
Section 7.1 in
Chapter 7 contains numerous
details and examples of their use. Chapter 28 describes the internal
format-specific variables, and the English
and
IO::Handle
modules provide easier access to
them.
formline PICTURE, LIST
This is an internal function used by
format
s, although you may also call it yourself.
It always returns true. It formats a list of values according to the
contents of PICTURE
, placing the output
into the format output accumulator, $^A
(or
$ACCUMULATOR
if you use the
English
module). Eventually, when a
write
is done, the contents of
$^A
are written to some filehandle, but you could
also read $^A
yourself and then set
$^A
back to "". A format typically does one
formline
per line of form, but the
formline
function itself doesn't care how many
newlines are embedded in the PICTURE
.
This means that the ~
and ~~
tokens will treat the entire PICTURE
as a
single line. You may therefore need to use multiple
formlines
to implement a single record-format,
just as the format compiler does internally.
Be careful if you put double quotes around the
picture, since an @
character may be taken to
mean the beginning of an array name. See "Formats" in Chapter 6 for example uses.
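Here's a tiny sketch of calling formline directly (the picture and values are made up):

```perl
# Call formline by hand and inspect the accumulator $^A.
$^A = "";                                     # start clean
formline "@<<<<<<<<<< @>>>>>", "left", "right";
my $line = $^A;    # "left" left-justified, "right" right-justified
$^A = "";          # reset for any later write
```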
getc FILEHANDLE
getc
This function returns the next byte from the input
file attached to FILEHANDLE
. It returns
undef
at end-of-file, or if an I/O error was
encountered. If FILEHANDLE
is omitted,
the function reads from STDIN
.
This function is somewhat slow, but occasionally useful for single-character (byte, really) input from the keyboard--provided you manage to get your keyboard input unbuffered. This function requests unbuffered input from the standard I/O library. Unfortunately, the standard I/O library is not so standard as to provide a portable way to tell the underlying operating system to supply unbuffered keyboard input to the standard I/O system. To do that, you have to be slightly more clever, and in an operating-system-dependent fashion. Under Unix you might say this:
if ($BSD_STYLE) { system "stty cbreak </dev/tty >/dev/tty 2>&1"; } else { system "stty", "-icanon", "eol", " 01"; } $key = getc; if ($BSD_STYLE) { system "stty -cbreak </dev/tty >/dev/tty 2>&1"; } else { system "stty", "icanon", "eol", "^@"; # ASCII NUL } print " ";
This code puts the next character (byte) typed on the
terminal in the string $key
. If your
stty program has options like
cbreak
, you'll need to use the code where
$BSD_STYLE
is true. Otherwise, you'll need to use
the code where it is false. Determining the options for
stty (1) is left as an exercise to the
reader.
The POSIX
module provides a more
portable version of this using the POSIX::getattr
function. See also the Term::ReadKey
module from
your nearest CPAN site for a more portable and flexible
approach.
getgrent
setgrent
endgrent
These routines iterate through your
/etc/group file (or maybe someone else's
/etc/group file, if it's coming from a server
somewhere). The return value from getgrent
in
list context is:
($name, $passwd, $gid, $members)
where $members
contains a space-separated
list of the login names of the members of the group. To set up a
hash for translating group names to GIDs, say this:
while (($name, $passwd, $gid) = getgrent) {
    $gid{$name} = $gid;
}
In scalar context, getgrent
returns only the group name. The standard
User::grent
module supports a by-name interface
to this function. See getgrent (3).
getgrgid GID
This function looks up a group file entry by group number. The return value in list context is:
($name, $passwd, $gid, $members)
where $members
contains a space-separated
list of the login names of the members of the group. If you want to
do this repeatedly, consider caching the data in a hash using
getgrent
.
In scalar context, getgrgid
returns only
the group name. The User::grent
module supports a
by-name interface to this function. See
getgrgid (3).
getgrnam NAME
This function looks up a group file entry by group name. The return value in list context is:
($name, $passwd, $gid, $members)
where $members
contains a space-separated
list of the login names of the members of the group. If you want to
do this repeatedly, consider caching the data in a hash using
getgrent
.
In scalar context, getgrnam
returns only
the numeric group ID. The User::grent
module
supports a by-name interface to this function. See
getgrnam (3).
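Here's a sketch that round-trips one real group entry through the by-name and by-number lookups (Unix-ish systems; assumes your group file has at least one entry):

```perl
# Read one real group entry, then look it up again both ways.
setgrent();
my ($name, $passwd, $gid, $members) = getgrent();
endgrent();

my $gid_again  = getgrnam($name);   # scalar context: numeric GID
my $name_again = getgrgid($gid);    # scalar context: group name
```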
gethostbyaddr ADDR, ADDRTYPE
This function translates addresses into names (and
alternate addresses). ADDR
should be a
packed binary network address, and
ADDRTYPE
should in practice usually be
AF_INET
(from the Socket
module). The return value in list context is:
($name, $aliases, $addrtype, $length, @addrs) = gethostbyaddr($packed_binary_address, $addrtype);
where @addrs
is a list of packed binary
addresses. In the Internet domain, each address is (historically)
four bytes long, and can be unpacked by saying something
like:
($a, $b, $c, $d) = unpack('C4', $addrs[0]);
Alternatively, you can convert directly to dot vector notation
with the v
modifier to
sprintf
:
$dots = sprintf "%vd", $addrs[0];
The inet_ntoa
function from the
Socket
module is useful for producing a printable
version. This approach will become important if and when we all ever
manage to switch over to IPv6.
use Socket;
$printable_address = inet_ntoa($addrs[0]);
In scalar context, gethostbyaddr
returns
only the host name.
To produce an ADDR
from a dot
vector, say this:
use Socket;
$ipaddr = inet_aton("127.0.0.1");    # localhost
$claimed_hostname = gethostbyaddr($ipaddr, AF_INET);
Interestingly, with version 5.6 of Perl you can skip
the inet_aton()
and use the new v-string notation
that was invented for version numbers but happens to work for IP
addresses as well:
$ipaddr = v127.0.0.1;
See Section 16.5 in Chapter 16 for
more examples. The Net::hostent
module supports a
by-name interface to this function. See
gethostbyaddr (3).
gethostbyname NAME
This function translates a network hostname to its corresponding addresses (and other names). The return value in list context is:
($name, $aliases, $addrtype, $length, @addrs) = gethostbyname($remote_hostname);
where @addrs
is a list of raw addresses. In
the Internet domain, each address is (historically) four bytes long,
and can be unpacked by saying something like:
($a, $b, $c, $d) = unpack('C4', $addrs[0]);
You can convert directly to vector notation with the
v
modifier to sprintf
:
$dots = sprintf "%vd", $addrs[0];
In scalar context, gethostbyname
returns
only the host address:
use Socket;
$ipaddr = gethostbyname($remote_host);
printf "%s has address %s\n", $remote_host, inet_ntoa($ipaddr);
See Section 16.5
in Chapter 16 for another
approach. The Net::hostent
module supports a
by-name interface to this function. See also
gethostbyname (3).
gethostent
sethostent STAYOPEN
endhostent
These functions iterate through your
/etc/hosts file and return each entry one at a
time. The return value from gethostent
is:
($name, $aliases, $addrtype, $length, @addrs)
where @addrs
is a list of raw addresses. In
the Internet domain, each address is four bytes long, and can be
unpacked by saying something like:
($a, $b, $c, $d) = unpack('C4', $addrs[0]);
Scripts that use gethostent
should
not be considered portable. If a machine uses a name server, it
would have to interrogate most of the Internet to try to satisfy a
request for all the addresses of every machine on the planet. So
gethostent
is unimplemented on such machines. See
gethostent (3) for other
details.
The Net::hostent
module supports a by-name
interface to this function.
getlogin
This function returns the current login name if
found. On Unix systems, this is read from the
utmp (5) file. If it returns false,
use getpwuid
instead. For example:
$login = getlogin() || (getpwuid($<))[0] || "Intruder!!";
getnetbyaddr ADDR, ADDRTYPE
This function translates a network address to the corresponding network name or names. The return value in list context is:
use Socket;
($name, $aliases, $addrtype, $net) = getnetbyaddr(127, AF_INET);
In scalar context, getnetbyaddr
returns
only the network name. The Net::netent
module
supports a by-name interface to this function. See
getnetbyaddr (3).
getnetbyname NAME
This function translates a network name to its corresponding network address. The return value in list context is:
($name, $aliases, $addrtype, $net) = getnetbyname("loopback");
In scalar context, getnetbyname
returns
only the network address. The Net::netent
module
supports a by-name interface to this function. See
getnetbyname (3).
getnetent
setnetent STAYOPEN
endnetent
These functions iterate through your /etc/networks file. The return value in list context is:
($name, $aliases, $addrtype, $net) = getnetent();
In scalar context, getnetent
returns only
the network name. The Net::netent
module supports
a by-name interface to this function. See
getnetent (3).
The concept of network names seems rather quaint these days; most IP addresses are on unnamed (and unnameable) subnets.
getpeername SOCKET
This function returns the packed socket address of
the other end of the SOCKET
connection.
For example:
use Socket;
$hersockaddr = getpeername(SOCK);
($port, $heraddr) = sockaddr_in($hersockaddr);
$herhostname = gethostbyaddr($heraddr, AF_INET);
$herstraddr  = inet_ntoa($heraddr);
getpgrp PID
This function returns the current process group for
the specified PID
(use a
PID
of 0
for the
current process). Invoking getpgrp
will raise an
exception if used on a machine that doesn't implement
getpgrp (2). If
PID
is omitted, the function returns the
process group of the current process (the same as using a
PID
of 0
). On systems
implementing this operator with the POSIX
getpgrp (2) syscall,
PID
must be omitted or, if supplied, must
be 0
.
getppid
This function returns the process ID of the parent process. On the typical Unix system, if your parent process ID changes to 1, it means your parent process has died and you've been adopted by the init (8) program.
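A quick sketch combining getppid with getpgrp (Unix-ish systems):

```perl
# Every process can ask for its parent's PID and its own process group.
my $ppid = getppid();
my $pgrp = getpgrp(0);     # 0 means "the current process"
printf "parent: %d, process group: %d\n", $ppid, $pgrp;
```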
getpriority WHICH, WHO
This function returns the current priority for a
process, a process group, or a user. See
getpriority (2). Invoking
getpriority
will raise an exception if used on a
machine that doesn't implement getpriority
(2).
The BSD::Resource
module from CPAN provides
a more convenient interface, including the
PRIO_PROCESS
, PRIO_PGRP
, and
PRIO_USER
symbolic constants to supply for the
WHICH
argument. Although these are
traditionally set to 0
, 1
, and
2
respectively, you really never know what may
happen within the dark confines of C's #include
files.
A value of 0
for
WHO
means the current process, process
group, or user, so to get the priority of the current process,
use:
$curprio = getpriority(0, 0);
getprotobyname NAME
This function translates a protocol name to its corresponding number. The return value in list context is:
($name, $aliases, $protocol_number) = getprotobyname("tcp");
When called in scalar context,
getprotobyname
returns only the protocol number.
The Net::protoent module supports a by-name
interface to this function. See getprotobyname
(3).
getprotobynumber NUMBER
This function translates a protocol number to its corresponding name. The return value in list context is:
($name, $aliases, $protocol_number) = getprotobynumber(6);
When called in scalar context,
getprotobynumber
returns only the protocol name.
The Net::protoent module supports a by-name
interface to this function. See
getprotobynumber (3).
getprotoent
setprotoent STAYOPEN
endprotoent
These functions iterate through the
/etc/protocols file. In list context, the
return value from getprotoent
is:
($name, $aliases, $protocol_number) = getprotoent();
When called in scalar context, getprotoent
returns only the protocol name. The Net::protoent
module supports a by-name interface to this function. See
getprotoent (3).
getpwent
setpwent
endpwent
These functions conceptually iterate through your /etc/passwd file, though this may involve the /etc/shadow file if you're the superuser and are using shadow passwords, or NIS (née YP) or NIS+ if you're using either of those. The return value in list context is:
($name,$passwd,$uid,$gid,$quota,$comment,$gcos,$dir,$shell) = getpwent();
Some machines may use the quota and comment fields for other than their named purposes, but the remaining fields will always be the same. To set up a hash for translating login names to UIDs, say this:
while (($name, $passwd, $uid) = getpwent()) {
    $uid{$name} = $uid;
}
In scalar context, getpwent
returns only
the username. The User::pwent
module supports a
by-name interface to this function. See
getpwent (3).
getpwnam NAME
This function translates a username to the corresponding /etc/passwd file entry. The return value in list context is:
($name,$passwd,$uid,$gid,$quota,$comment,$gcos,$dir,$shell) = getpwnam("daemon");
On systems that support shadow passwords, you will have to be the superuser to retrieve the actual password. Your C library should notice that you're suitably empowered and open the /etc/shadow file (or wherever it keeps the shadow file). At least, that's how it's supposed to work. Perl will try to do this if your C library is too stupid to notice.
For repeated lookups, consider caching the data in a hash
using getpwent
.
In scalar context, getpwnam
returns only
the numeric user ID. The User::pwent
module
supports a by-name interface to this function. See
getpwnam (3) and
passwd (5).
getpwuid UID
This function translates a numeric user ID to the corresponding /etc/passwd file entry. The return value in list context is:
($name,$passwd,$uid,$gid,$quota,$comment,$gcos,$dir,$shell) = getpwuid(2);
For repeated lookups, consider caching the data in a hash
using getpwent
.
In scalar context, getpwuid
returns the
username. The User::pwent
module supports a
by-name interface to this function. See
getpwnam (3) and
passwd (5).
getservbyname NAME, PROTO
This function translates a service (port) name to its
corresponding port number. PROTO
is a
protocol name such as "tcp
". The return value in
list context is:
($name, $aliases, $port_number, $protocol_name) = getservbyname("www", "tcp");
In scalar context, getservbyname
returns
only the service port number. The Net::servent
module supports a by-name interface to this function. See
getservbyname (3).
getservbyport PORT, PROTO
This function translates a service (port) number to
its corresponding names. PROTO
is a
protocol name such as "tcp
". The return value in
list context is:
($name, $aliases, $port_number, $protocol_name) = getservbyport(80, "tcp");
In scalar context, getservbyport
returns
only the service name. The Net::servent
module
supports a by-name interface to this function. See
getservbyport (3).
getservent
setservent STAYOPEN
endservent
These functions iterate through the /etc/services file or its equivalent. The return value in list context is:
($name, $aliases, $port_number, $protocol_name) = getservent();
In scalar context, getservent
returns only
the service port name. The Net::servent
module
supports a by-name interface to this function. See
getservent (3).
getsockname SOCKET
This function returns the packed socket address of
this end of the SOCKET
connection. (And
why wouldn't you know your own address already? Maybe because you
bound an address containing wildcards to the server socket before
doing an accept
and now you need to know what
interface someone used to connect to you. Or you were passed a
socket by your parent process--inetd, for
example.)
use Socket;
$mysockaddr = getsockname(SOCK);
($port, $myaddr) = sockaddr_in($mysockaddr);
$myname = gethostbyaddr($myaddr, AF_INET);
printf "I am %s [%vd]\n", $myname, $myaddr;
getsockopt SOCKET, LEVEL, OPTNAME
This function returns the socket option requested, or
undef
if there is an error. See
setsockopt
for more information.
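As a sketch, here's one option you can reliably query on most systems: a freshly made TCP socket's SO_TYPE (constants from the standard Socket module):

```perl
use Socket;

# Create a TCP socket and ask the OS what type it is.
socket(my $sock, PF_INET, SOCK_STREAM, getprotobyname("tcp"))
    or die "socket: $!";
my $packed = getsockopt($sock, SOL_SOCKET, SO_TYPE)
    or die "getsockopt: $!";
my $type = unpack("i", $packed);   # should equal SOCK_STREAM
```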
glob EXPR
glob
This function returns the value of
EXPR
with filename expansions such as a
shell would do. This is the internal function implementing the
<*>
operator.
For historical reasons, the algorithm matches the
csh (1)'s style of expansion, not the
Bourne shell's. Versions of Perl before the 5.6 release used an
external process, but 5.6 and later perform globs internally. Files
whose first character is a dot (".") are ignored unless this
character is explicitly matched. An asterisk
("*
") matches any sequence of any character
(including none). A question mark ("?
") matches
any one character. A square bracket sequence
("[
...]
") specifies a simple
character class, like "[chy0-9]
". Character
classes may be negated with a circumflex, as in
"*.[^oa]
", which matches any non-dot files whose
names contain a period followed by one character which is neither an
"a" nor an "o" at the end of the name. A tilde
("~
") expands to a home directory, as in
"~/.*rc
" for all the current user's "rc" files,
or "~jane/Mail/*
" for all of Jane's mail files.
Braces may be used for alternation, as in
"~/.{mail,ex,csh,twm,}rc
" to get those particular
rc files.
If you want to glob filenames that might contain whitespace,
you'll need to use the File::Glob
module
directly, since glob
grandfathers the use of
whitespace to separate multiple patterns such as <*.c
*.h>
. For details, see File::Glob
in
Chapter 32. Calling
glob
(or the <*>
operator) automatically uses that module, so if
the module mysteriously vaporizes from your library, an exception is
raised.
When you call open
, Perl does not expand
wildcards, including tildes. You need to glob
the
result first.
open(MAILRC, "~/.mailrc")              # WRONG: tilde is a shell thing
    or die "can't open ~/.mailrc: $!";
open(MAILRC, (glob("~/.mailrc"))[0])   # expand tilde first
    or die "can't open ~/.mailrc: $!";
The glob
function is not related to the
Perl notion of typeglobs, other than that they both use a
*
to represent multiple items.
See also Section 2.11.3 of Chapter 2.
gmtime EXPR
gmtime
This function converts a time as returned by the
time
function to a nine-element list with the
time correct for the Greenwich time zone (a.k.a. GMT, or UTC, or
even Zulu in certain cultures, not including the Zulu culture, oddly
enough). It's typically used as follows:
#  0    1    2     3     4    5     6     7     8
($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = gmtime;
If, as in this case, the EXPR
is
omitted, it does gmtime(time())
. The Perl library
module Time::Local
contains a subroutine,
timegm
, that can convert the list back into a
time value.
All list elements are numeric and come straight out of a
struct tm
(that's a C programming
structure--don't sweat it). In particular this means that
$mon
has the range 0..11
with
January as month 0, and $wday
has the range
0..6
with Sunday as day 0
. You
can remember which ones are zero-based because those are the ones
you're always using as subscripts into zero-based arrays containing
month and day names.
For example, to get the current month in London, you might say:
$london_month = (qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec))[(gmtime)[4]];
$year
is the number of years since 1900;
that is, in year 2023, $year
is
123
, not simply
23
. To get the 4-digit year, just say
$year + 1900
. To get the 2-digit year (for
example "01" in 2001), use sprintf("%02d", $year %
100)
.
In scalar context, gmtime
returns a
ctime (3)-like string based on the GMT
time value. The Time::gmtime
module supports a
by-name interface to this function. See also
POSIX::strftime()
for a more fine-grained
approach to formatting times.
This scalar value is not locale dependent
but is instead a Perl built-in. Also see the
Time::Local
module and the
strftime (3) and
mktime (3) functions available via the
POSIX
module. To get somewhat similar but
locale-dependent date strings, set up your locale environment
variables appropriately (please see the
perllocale manpage), and try:
use POSIX qw(strftime);
$now_string = strftime "%a %b %e %H:%M:%S %Y", gmtime;
The %a
and %b
escapes,
which represent the short forms of the day of the week and the month
of the year, may not necessarily be three characters wide in all
locales.
goto LABEL
goto EXPR
goto &NAME
goto LABEL finds the statement labeled with
LABEL
and resumes execution there. If the
LABEL
cannot be found, an exception is
raised. It cannot be used to go into any construct that requires
initialization, such as a subroutine or a foreach
loop. It also can't be used to go into a construct that is optimized
away. It can be used to go almost anywhere else within the dynamic
scope,[4] including out of subroutines, but for that purpose
it's usually better to use some other construct such as
last
or die
. The author of
Perl has never felt the need to use this form of
goto
(in Perl, that is--C is another
matter).
Going to even greater heights of orthogonality (and depths of
idiocy), Perl allows goto
EXPR
, which expects
EXPR
to evaluate to a label name, whose
location is guaranteed to be unresolvable until
run time since the label is unknown when the statement is compiled.
This allows for computed goto
s per FORTRAN, but
isn't necessarily recommended[5] if you're optimizing for maintainability:
goto +("FOO", "BAR", "GLARCH")[$i];
The unrelated goto
&
NAME
is highly
magical, substituting a call to the named subroutine for the
currently running subroutine. This construct may be used without
shame by AUTOLOAD
subroutines that wish to load
another subroutine and then pretend that this new subroutine--and
not the original one--had been called in the first place (except
that any modifications to @_
in the original
subroutine are propagated to the replacement subroutine). After the
goto
, not even caller
will be
able to tell that the original AUTOLOAD
routine
was called first.
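Here's a hypothetical AUTOLOAD sketch (the howdy subroutine and its message are invented for illustration):

```perl
# Define the missing sub on the fly, then tail into it with
# goto &NAME so the caller never knows AUTOLOAD ran.
our $AUTOLOAD;
sub AUTOLOAD {
    (my $name = $AUTOLOAD) =~ s/.*:://;    # strip the package part
    no strict 'refs';
    *$name = sub { "hello from $name" };   # install the real sub
    goto &$name;                           # replace this call frame
}
my $greeting = howdy();    # howdy() doesn't exist until AUTOLOAD makes it
```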
grep EXPR, LIST
grep BLOCK LIST
This function evaluates
EXPR
or BLOCK
in Boolean context for each element of
LIST
, temporarily setting
$_
to each element in turn, much like the
foreach
construct. In list context, it returns a
list of those elements for which the expression is true. (The
operator is named after a beloved Unix program that extracts lines
out of a file that match a particular pattern. In Perl, the
expression is often a pattern, but doesn't have to be.) In scalar
context, grep
returns the number of times the
expression was true.
If @all_lines
contains lines of code, this
example weeds out comment lines:
@code_lines = grep !/^\s*#/, @all_lines;
Because $_
is an implicit alias to each
list value, altering $_
will modify the elements
of the original list. While this is useful and supported, it can
occasionally cause bizarre results if you aren't expecting it. For
example:
@list = qw(barney fred dino wilma);
@greplist = grep { s/^[bfd]// } @list;
@greplist
is now
"arney
", "red
",
"ino
", but @list
is now
"arney
", "red
",
"ino
", "wilma
"! Ergo, Caveat
Programmor.
See also map
. The following two statements
are functionally equivalent:
@out = grep { EXPR } @in;
@out = map  { EXPR ? $_ : () } @in;
hex EXPR
hex
This function interprets
EXPR
as a hexadecimal string and returns
the equivalent decimal value. A leading "0x
" is
ignored, if present. To interpret strings that might start with any
of 0
, 0b
, or
0x
, see oct
. The following
code sets $number
to 4,294,906,560:
$number = hex("ffff12c0");
To do the inverse function, use
sprintf
:
sprintf "%lx", $number; # (That's an ell, not a one.)
Hex strings may only represent integers. Strings that would cause integer overflow trigger a warning.
import CLASSNAME LIST
import CLASSNAME
There is no built-in import
function. It is merely an ordinary class method defined (or
inherited) by modules that wish to export names to another module
through the use
operator. See
use
for details.
index STR, SUBSTR, OFFSET
index STR, SUBSTR
This function searches for one string within another.
It returns the position of the first occurrence of
SUBSTR
in STR
.
The OFFSET
, if specified, says how many
characters from the start to skip before beginning to look.
Positions are based at 0 (or whatever you've set the subscript base
$[
variable to--but don't do that). If the
substring is not found, the function returns one less than the base,
ordinarily -1
. To work your way through a string,
you might say:
$pos = -1;
while (($pos = index($string, $lookfor, $pos)) > -1) {
    print "Found at $pos\n";
    $pos++;
}
int EXPR
int
This function returns the integer portion of
EXPR
. If you're a C programmer, you're
apt to forget to use int
in conjunction with
division, which is a floating-point operation in Perl:
$average_age = 939/16;       # yields 58.6875 (58 in C)
$average_age = int 939/16;   # yields 58
You should not use this function for generic rounding, because
it truncates towards 0 and because machine representations of
floating-point numbers can sometimes produce counterintuitive
results. For example,
int(-6.725/0.025)
produces
-268
rather than the correct
-269
; that's because the value is really more
like -268.99999999999994315658
. Usually, the
sprintf
, printf
, or the
POSIX::floor
and POSIX::ceil
functions will serve you better than will
int
.
$n = sprintf("%.0f", $f); # round (not trunc) to nearest integer
ioctl FILEHANDLE, FUNCTION, SCALAR
This function implements the ioctl (2) syscall which controls I/O. To get the correct function definitions, first you'll probably have to say:
require "sys/ioctl.ph"; # perhaps /usr/local/lib/perl/sys/ioctl.ph
If sys/ioctl.ph doesn't exist or doesn't
have the correct definitions, you'll have to roll your own based on
your C header files such as sys/ioctl.h. (The
Perl distribution includes a script called h2ph
to help you do this, but running it is nontrivial.)
SCALAR
will be read or written (or both)
depending on the FUNCTION
--a pointer to
the string value of SCALAR
will be passed
as the third argument of the actual ioctl
(2) call. (If SCALAR
has no
string value but does have a numeric value, that value will be
passed directly rather than a pointer to the string value.) The
pack
and unpack
functions are
useful for manipulating the values of structures used by
ioctl
. The following example determines how many
bytes are available for reading using the
FIONREAD
ioctl
:
require 'sys/ioctl.ph';
$size = pack("L", 0);
ioctl(FH, FIONREAD(), $size)
    or die "Couldn't call ioctl: $!\n";
$size = unpack("L", $size);
If h2ph wasn't installed or doesn't work for you, you can grep the include files by hand or write a small C program to print out the value.
The return value of ioctl
(and
fcntl
) is as follows:
    if the OS returns:    then Perl returns:
    -1                    undefined value
    0                     string "0 but true"
    anything else         that number
Thus Perl returns true on success and false on failure, yet you can still easily determine the actual value returned by the operating system:
$retval = ioctl(…) || -1;
printf "ioctl actually returned %d\n", $retval;
The special string "0 but true" is exempt
from -w
complaints about improper numeric
conversions.
Calls to ioctl
should not be considered
portable. If, say, you're merely turning off echo once for the whole
script, it's more portable to say:
system "stty -echo"; # Works on most Unix boxen.
Just because you can do something in Perl doesn't mean you ought to. To quote the Apostle Paul, "Everything is permissible--but not everything is beneficial."
For still better portability, you might look at the
Term::ReadKey
module from CPAN.
join EXPR, LIST
This function joins the separate strings of
LIST
into a single string with fields
separated by the value of EXPR
, and
returns the string. For example:
$rec = join ':', $login,$passwd,$uid,$gid,$gcos,$home,$shell;
To do the opposite, see split
. To join
things together into fixed-position fields, see
pack
. The most efficient way to concatenate many
strings together is to join
them with a null
string:
$string = join "", @array;
Unlike split
, join
doesn't take a pattern as its first argument, and will produce a
warning if you try.
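As a quick illustration of join and its inverse, this sketch (with invented field values) joins a record and then splits it back apart:

```perl
# join the fields with colons, then split to recover them
my $rec    = join ':', 'larry', '1000', '/home/larry';  # "larry:1000:/home/larry"
my @fields = split /:/, $rec;                           # ('larry', '1000', '/home/larry')
```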
keys HASH
This function returns a list consisting of all the
keys of the indicated HASH
. The keys are
returned in an apparently random order, but it is the same order
produced by either the values
or
each
function (assuming the hash has not been
modified between calls). As a side effect, it resets
HASH
's iterator. Here is a (rather
cork-brained) way to print your environment:
@keys   = keys %ENV;    # keys are in the same order as
@values = values %ENV;  # values, as this demonstrates
while (@keys) {
    print pop(@keys), '=', pop(@values), "\n";
}
You're more likely to want to see the environment sorted by keys:
foreach $key (sort keys %ENV) {
    print $key, '=', $ENV{$key}, "\n";
}
You can sort the values of a hash directly, but
that's somewhat useless in the absence of any way to map the values
back to the keys. To sort a hash by value, you generally need to
sort the keys
by providing a comparison function
that accesses the values based on the keys. Here's a descending
numeric sort of a hash by its values:
foreach $key (sort { $hash{$b} <=> $hash{$a} } keys %hash) {
    printf "%4d %s\n", $hash{$key}, $key;
}
Using keys
on a hash bound to a largish DBM
file will produce a largish list, causing you to have a largish
process. You might prefer to use the each
function here, which will iterate over the hash entries one by one
without slurping them all into a single gargantuan list.
In scalar context, keys
returns the number
of elements of the hash (and resets the each
iterator). However, to get this information for tied hashes,
including DBM files, Perl must walk the entire hash, so it's not
efficient then. Calling keys
in a void context
helps with that.
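A minimal sketch of keys in scalar context (the hash here is invented):

```perl
my %pets = (cat => 1, dog => 2, fish => 3);
my $count = keys %pets;    # scalar context, so $count is 3
print "I have $count kinds of pet\n";
```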
Used as an lvalue, keys
increases the
number of hash buckets allocated for the given hash. (This is
similar to pre-extending an array by assigning a larger number to
$#array
.) Pre-extending your hash can gain a
measure of efficiency if you happen to know the hash is going to get
big, and how big it's going to get. If you say:
keys %hash = 1000;
then %hash
will have at least 1000 buckets
allocated for it (you get 1024 buckets, in fact, since it rounds up
to the next power of two). You can't shrink the number of buckets
allocated for the hash using keys
in this way
(but you needn't worry about doing this by accident, as trying has
no effect). The buckets will be retained even if you do
%hash = ()
. Use undef %hash
if
you want to free the storage while %hash
is still
in scope.
See also each
, values
,
and sort
.
kill SIGNAL, LIST
This function sends a signal to a list of processes.
For SIGNAL
, you may use either an integer
or a quoted signal name (without a "SIG
" on the
front). Trying to use an unrecognized
SIGNAL
name raises an exception. The
function returns the number of processes successfully signalled. If
SIGNAL
is negative, the function kills
process groups instead of processes. (On SysV, a negative process
number will also kill process groups, but that's not portable.) A
PID of zero sends the signal to all processes of the same group ID
as the sender. For example:
$cnt = kill 1, $child1, $child2;
kill 9, @goners;
kill 'STOP', getppid        # Can *so* suspend my login shell…
    unless getppid == 1;    # (But don't taunt init(8).)
A SIGNAL
of 0
tests whether a process is still alive and that you still have
permission to signal it. No signal is sent. This way you can check
whether the process is still alive and hasn't changed its
UID.
use Errno qw(ESRCH EPERM);
if (kill 0 => $minion) {
    print "$minion is alive!\n";
} elsif ($! == EPERM) {              # changed UID
    print "$minion has escaped my control!\n";
} elsif ($! == ESRCH) {
    print "$minion is deceased.\n";  # or zombied
} else {
    warn "Odd; I couldn't check on the status of $minion: $!\n";
}
See Section 16.1 in Chapter 16.
last LABEL
last
The last
operator immediately
exits the loop in question, just like the break
statement in C or Java (as used in loops). If the
LABEL
is omitted, the operator refers to
the innermost enclosing loop. The continue
block,
if any, is not executed.
LINE: while (<MAILMSG>) {
    last LINE if /^$/;    # exit when done with header
    # rest of loop here
}
last
cannot be used to exit a block which
returns a value, such as eval {}
, sub
{}
, or do {}
, and should not be used to
exit a grep
or map
operation.
With warnings enabled, Perl will warn you if you
last
out of a loop that's not in your current
lexical scope, such as a loop in a calling subroutine.
A block by itself is semantically identical to a loop
that executes once. Thus last
can be used to
effect an early exit out of such a block.
See also Chapter 4 for
illustrations of how last
,
next
, redo
, and
continue
work.
lc EXPR
lc
This function returns a lowercased version of
EXPR
. This is the internal function
implementing the \L
escape in double-quoted
strings. Your current LC_CTYPE
locale is
respected if use locale
is in effect, though how
locales interact with Unicode is still a topic of ongoing research,
as they say. See the perllocale manpage for the
most recent results.
lcfirst EXPR
lcfirst
This function returns a version of
EXPR
with the first character lowercased.
This is the internal function implementing the \l
escape in double-quoted strings. Your current
LC_CTYPE
locale is respected if you use
locale
and if we figure out how that relates to
Unicode.
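A quick sketch of the difference between the two (sample string invented):

```perl
my $shout = "HELLO, WORLD";
my $lower = lc $shout;        # "hello, world" -- whole string lowercased
my $first = lcfirst $shout;   # "hELLO, WORLD" -- only the first character
```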
length EXPR
length
This function returns the length in characters of the
scalar value EXPR
. If
EXPR
is omitted, it returns the length of
$_
. (But be careful that the next thing doesn't
look like the start of an EXPR
, or Perl's
lexer will get confused. For example, length <
10
won't compile. When in doubt, use parentheses.)
Do not try to use length
to find the size
of an array or hash. Use scalar @array
for the
size of an array, and scalar keys %hash
for the
number of key/value pairs in a hash. (The scalar
is typically omitted when redundant.)
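A sketch contrasting the three (sample data invented):

```perl
my $string = "camel";
my @array  = (1, 2, 3, 4);
my %hash   = (humps => 2, legs => 4);

my $chars = length $string;  # 5: characters in the string
my $elems = @array;          # 4: scalar context counts elements
my $pairs = keys %hash;      # 2: key/value pairs, via keys in scalar context
```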
To find the length of a string in bytes rather than characters, say:
$blen = do { use bytes; length $string; };
or:
$blen = bytes::length($string); # must use bytes first
link OLDFILE, NEWFILE
This function creates a new filename linked to the
old filename. The function returns true for success, false
otherwise. See also symlink
later in this
chapter. This function is unlikely to be implemented on
non-Unix-style filesystems.
listen SOCKET, QUEUESIZE
This function tells the system that you're going to
be accepting connections on this SOCKET
and that the system can queue the number of waiting connections
specified by QUEUESIZE
. Imagine having
call-waiting on your phone, with up to 17 callers queued. (Gives me
the willies!) The function returns true if it succeeded, false
otherwise.
use Socket;
listen(PROTOSOCK, SOMAXCONN)
    or die "cannot set listen queue on PROTOSOCK: $!";
See accept
. See also Section 16.5 in Chapter 16. See
listen (2).
local EXPR
This operator does not create a local variable; use
my
for that. Instead, it localizes existing
variables; that is, it causes one or more global variables to have
locally scoped values within the innermost enclosing block,
eval
, or file. If more than one variable is
listed, the list must be placed in parentheses because the operator
binds more tightly than commas. All listed variables must be legal
lvalues, that is, something you could assign to; this can include
individual elements of arrays or hashes.
This operator works by saving the current values of the
specified variables on a hidden stack and restoring them upon
exiting the block, subroutine, eval
, or file.
After the local
is executed, but before the scope
is exited, any subroutines and executed formats will see the local,
inner value, instead of the previous, outer value because the
variable is still a global variable, despite having a localized
value. The technical term for this is "dynamic scoping". See Section 4.8 in Chapter 4.
The EXPR
may be assigned to if
desired, which allows you to initialize your variables as you
localize them. If no initializer is given, all scalars are
initialized to undef
, and all arrays and hashes
to ()
. As with ordinary assignment, if you use
parentheses around the variables on the left (or if the variable is
an array or hash), the expression on the right is evaluated in list
context. Otherwise, the expression on the right is evaluated in
scalar context.
In any event, the expression on the right is evaluated before the localization, but the initialization happens after localization, so you can initialize a localized variable with its nonlocalized value. For instance, this code demonstrates how to make a temporary modification to a global array:
if ($sw eq '-v') {
    # init local array with global array
    local @ARGV = @ARGV;
    unshift @ARGV, 'echo';
    system @ARGV;
}                       # @ARGV restored
You can also temporarily modify global hashes:
# temporarily add a couple of entries to the %digits hash
if ($base12) {
    # (NOTE: We're not claiming this is efficient!)
    local(%digits) = (%digits, T => 10, E => 11);
    parse_num();
}
You can use local
to give
temporary values to individual elements of arrays and hashes, even
lexically scoped ones:
if ($protected) {
    local $SIG{INT} = 'IGNORE';
    precious();    # no interrupts during this function
}                  # previous handler (if any) restored
You can also use local
on
typeglobs to create local filehandles without loading any bulky
object modules:
local *MOTD;                 # protect any global MOTD handle
my $fh = do { local *FH };   # create new indirect filehandle
(As of the 5.6 release of Perl, a plain my
$fh;
is good enough, because if you give an undefined
variable where a real filehandle is expected, like the first
argument to open
or socket
,
Perl now autovivifies a brand new filehandle for you.)
But in general, you usually want to use my
instead of local
, because
local
isn't really what most people think of as
"local", or even "lo-cal". See my
.
localtime EXPR
localtime
This function converts the value returned by
time
to a nine-element list with the time
corrected for the local time zone. It's typically used as
follows:
#  0     1     2      3      4     5      6      7      8
($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime;
If, as in this case, EXPR
is
omitted, it does localtime(time())
.
All list elements are numeric and come straight out of a
struct tm
. (That's a bit of C programming
lingo--don't worry about it.) In particular, this means that
$mon
has the range 0..11
with
January as month 0, and $wday
has the range
0..6
with Sunday as day 0
. You
can remember which ones are zero-based because those are the ones
you're always using as subscripts into zero-based arrays containing
month and day names.
For example, to get the name of the current day of the week:
$thisday = (Sun,Mon,Tue,Wed,Thu,Fri,Sat)[(localtime)[6]];
$year
is the number of years since 1900,
that is, in year 2023, $year
is
123
, not simply
23
. To get the 4-digit year, just say
$year + 1900
. To get the 2-digit year (for
example "01" in 2001), use sprintf("%02d", $year %
100)
.
The Perl library module
Time::Local
contains a subroutine,
timelocal
, that can convert in the opposite
direction.
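For example, a round trip through localtime and timelocal (timelocal takes the same first six values that localtime returns, with the same zero-based month and 1900-based year):

```perl
use Time::Local;

my $now = time;
# take the time apart...
my ($sec,$min,$hour,$mday,$mon,$year) = localtime($now);
# ...and put it back together again
my $rebuilt = timelocal($sec,$min,$hour,$mday,$mon,$year);
# $rebuilt equals $now
```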
In scalar context, localtime
returns a ctime (3)-like string. For
example, the date (1) command can be
(almost) emulated with:
perl -le 'print scalar localtime'
See also the standard POSIX
module's strftime
function for a more
fine-grained approach to formatting times. The
Time::localtime
module supports a by-name
interface to this function.
lock THING
The lock
function places a lock on
a variable, subroutine, or object referenced by
THING
until the lock goes out of scope.
For backward compatibility, this function is a built-in only if your
version of Perl was compiled with threading enabled, and if you've
said use Threads
. Otherwise, Perl will assume
this is a user-defined function. See Chapter 17.
log EXPR
log
This function returns the natural logarithm (that is,
base e) of EXPR
. If
EXPR
is negative, it raises an exception.
To get the log of another base, use basic algebra: the
base-N log of a number is equal to the natural
log of that number divided by the natural log of
N. For example:
sub log10 {
    my $n = shift;
    return log($n)/log(10);
}
For the inverse of log
, see
exp
.
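Generalizing log10 to any base (the name log_base is our own invention):

```perl
sub log_base {
    my ($base, $n) = @_;
    return log($n) / log($base);
}

my $bits = log_base(2, 1024);   # 10, give or take floating-point fuzz
```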
lstat EXPR
lstat
This function does the same thing as Perl's
stat
function (including setting the special
_
filehandle), but if the last component of the
filename is a symbolic link, it stat
s the
symbolic link itself instead of the file that the symbolic link
points to. (If symbolic links are unimplemented on your system, a
normal stat
is done instead.)
/PATTERN/
m/PATTERN/
This is the match operator, which interprets
PATTERN
as a regular expression. The
operator is parsed as a double-quoted string rather than as a
function. See Chapter 5.
map BLOCK LIST
map EXPR, LIST
This function evaluates the
BLOCK
or EXPR
for each element of LIST
(locally setting
$_
to each element) and returns the list
comprising the results of each such evaluation. It evaluates
BLOCK
or EXPR
in list context, so each element of LIST
may map to zero, one, or more elements in the returned value. These
are all flattened into one list. For instance:
@words = map { split ' ' } @lines;
splits a list of lines into a list of words. But often there is a one-to-one mapping between input values and output values:
@chars = map chr, @nums;
translates a list of numbers to the corresponding characters. And here's an example of a one-to-two mapping:
%hash = map { genkey($_) => $_ } @array;
which is just a funny functional way to write this:
%hash = ();
foreach $_ (@array) {
    $hash{genkey($_)} = $_;
}
Because $_
is an alias (implicit reference)
into the list's values, this variable can be used to modify the
elements of the array. This is useful and supported, although it can
cause bizarre results if the LIST
is not
a named array. Using a regular foreach
loop for this purpose may be clearer. See also
grep
; map
differs from
grep
in that map
returns a
list consisting of the results of each successive evaluation of
EXPR
, whereas grep
returns a list consisting of each value of
LIST
for which
EXPR
evaluates to true.
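The contrast in one sketch (input list invented):

```perl
my @nums = (1, 2, 3, 4, 5);

# map transforms each element (and may return zero or more per input)
my @squares = map { $_ * $_ } @nums;       # (1, 4, 9, 16, 25)

# grep merely selects the elements for which the expression is true
my @evens   = grep { $_ % 2 == 0 } @nums;  # (2, 4)
```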
mkdir FILENAME, MASK
mkdir FILENAME
This function creates the directory specified by
FILENAME
, giving it permissions specified
by the numeric MASK
as modified by the
current umask
. If the operation succeeds, it
returns true; otherwise, it returns false.
If MASK
is omitted, a mask of
0777
is assumed, which is almost always what you
want anyway. In general, creating directories with permissive
MASK
s (like 0777
) and
letting the user modify that with their umask
is
better than supplying a restrictive MASK
and giving the user no way to be more permissive. The exception to
this rule is when the file or directory should be kept private (mail
files, for instance). See umask
.
If the mkdir (2) syscall is not built into your C library, Perl emulates it by calling the mkdir (1) program for each directory. If you are creating a long list of directories on such a system, it'll be more efficient to call the mkdir program yourself with the list of directories than it is to start zillions of subprocesses.
msgctl ID, CMD, ARG
This function calls the System V IPC
msgctl (2) syscall; see
msgctl (2) for more details. You may
have to use
IPC::SysV
first to
get the correct constant definitions. If
CMD
is IPC_STAT
, then
ARG
must be a variable that will hold the
returned msqid_ds
C structure. Return values are
like ioctl
and fcntl
:
undef
for error, "0 but true" for zero, or the actual return value otherwise.
This function is available only on machines supporting System V IPC, which turns out to be far fewer than those supporting sockets.
msgget KEY, FLAGS
This function calls the System V IPC
msgget (2) syscall. See
msgget (2) for details. The function
returns the message queue ID, or undef
if there
is an error. Before calling, you should use
IPC::SysV
.
This function is available only on machines supporting System V IPC.
msgrcv ID, VAR, SIZE, TYPE, FLAGS
This function calls the msgrcv
(2) syscall to receive a message from message queue
ID
into variable
VAR
with a maximum message size of
SIZE
. See msgrcv
(2) for details. When a message is received, the message
type will be the first thing in VAR
, and
the maximum length of VAR
is
SIZE
plus the size of the message type.
The function returns true if successful, or false if there is an
error. Before calling, you should use
IPC::SysV
.
This function is available only on machines supporting System V IPC.
msgsnd ID, MSG, FLAGS
This function calls the msgsnd
(2) syscall to send the message
MSG
to the message queue
ID
. See msgsnd
(2) for details. MSG
must begin
with the long integer message type. You can create a message like
this:
$msg = pack "L a*", $type, $text_of_message;
The function returns true if successful, or false if there is
an error. Before calling, use
IPC::SysV
.
This function is available only on machines supporting System V IPC.
my TYPE EXPR : ATTRIBUTES
my EXPR : ATTRIBUTES
my TYPE EXPR
my EXPR
This operator declares one or more private variables
to exist only within the innermost enclosing block, subroutine,
eval
, or file. If more than one variable is
listed, the list must be placed in parentheses because the operator
binds more tightly than commas. Only simple scalars or complete
arrays and hashes may be declared this way.
The variable name cannot be package qualified, because package
variables are all globally accessible through their corresponding
symbol table, and lexical variables are unrelated to any symbol
table. Unlike local
, then, this operator has
nothing to do with global variables, other than hiding any other
variable of the same name from view within its scope (that is, where
the private variable exists). A global variable can always be
accessed through its package-qualified form, however, or through a
symbolic reference.
A private variable's scope does not start until the statement after its declaration. The variable's scope extends into any enclosed blocks thereafter, up to the end of the scope of the variable itself.
However, this means that any subroutines you call from within the scope of a private variable cannot see the private variable unless the block that defines the subroutine itself is also textually enclosed within the scope of that variable. That sounds complicated, but it's not once you get the hang of it. The technical term for this is lexical scoping, so we often call these lexical variables. In C culture, they're sometimes called "auto" variables, since they're automatically allocated and deallocated at scope entry and exit.
The EXPR
may be assigned to if
desired, which allows you to initialize your lexical variables. (If
no initializer is given, all scalars are initialized to the
undefined value and all arrays and hashes to the empty list.) As
with ordinary assignment, if you use parentheses around the
variables on the left (or if the variable is an array or hash), the
expression on the right is evaluated in list context. Otherwise, the
expression on the right is evaluated in scalar context. For example,
you can name your formal subroutine parameters with a list
assignment, like this:
my ($friends, $romans, $countrymen) = @_;
But be careful not to omit the parentheses indicating list assignment, like this:
my $country = @_; # right or wrong?
This assigns the length of the array (that is, the number of
the subroutine's arguments) to the variable, since the array is
being evaluated in scalar context. You can profitably use scalar
assignment for a formal parameter though, as long as you use the
shift
operator. In fact, since object methods are
passed the object as the first argument, many method subroutines
start off by "stealing" the first argument:
sub simple_as {
    my $self = shift;      # scalar assignment
    my ($a,$b,$c) = @_;    # list assignment
    …
}
If you attempt to declare a lexically scoped
subroutine with my sub
, Perl will die with the
message that this feature has not been implemented yet. (Unless, of
course, this feature has been implemented
yet.)
The TYPE
and
ATTRIBUTES
are optional, which is just as
well, since they're both considered experimental. Here's what a
declaration that uses them might look like:
my Dog $spot :ears(short) :tail(long);
The TYPE
, if specified,
indicates what kind of scalar or scalars are declared in
EXPR
, either directly as one or more
scalar variables, or indirectly through an array or hash. If
TYPE
is the name of the class, the
scalars will be assumed to contain references to objects of that
type, or to objects compatible with that type. In particular,
derived classes are considered compatible. That is, assuming
Collie
is derived from Dog
,
you might declare:
my Dog $lassie = new Collie;
Your declaration claims that you will use the
$lassie
object consistently with its being a
Dog
object. The fact that it's actually a
Collie
object shouldn't matter as long as you
only try to do Dog
things. Through the magic of
virtual methods, the implementation of those Dog
methods might well be in the Collie
class, but
the declaration above is only talking about the interface, not the
implementation. In theory.
Interestingly, up through version 5.6.0, the only
time Perl pays attention to the TYPE
declaration is when the corresponding class has declared fields with
the use fields
pragma. Together, these
declarations allow the pseudohash implementation of a class to "show
through" to code outside the class, so that hash lookups can be
optimized by the compiler into array lookups. In a sense, the
pseudohash is the interface to such a class, so
our theory remains intact, if a bit battered. For more on
pseudohashes, see Section
8.3.5 in Chapter
8.
In the future, other types of classes may interpret the
TYPE
differently. The
TYPE
declaration should be considered a
generic type interface that might someday be instantiated in various
ways depending on the class. In fact, the
TYPE
might not even be an official class
name. We're reserving the lowercase type names for Perl, because one
of the ways we'd like to extend the type interface is to allow
optional low-level type declarations such as int
,
num
, str
, and
ref
. These declarations will not be for the
purpose of strong typing; rather, they'll be hints to the compiler
telling it to optimize the storage of the variable with the
assumption that the variable will be used mostly as declared. The
semantics of scalars will stay pretty much the same—you'll still be
able to add two str
scalars, or print an
int
scalar, just as though they were the ordinary
polymorphic scalars you're familiar with. But with an
int
declaration Perl might decide to store only
the integer value and forget about caching the resulting string as
it currently does. Loops with int
loop variables
might run faster, particularly in code compiled down to C. In
particular, arrays of numbers could be stored much more compactly.
As a limiting case, the built-in vec
function
might even become obsolete when we can write declarations such
as:
my bit @bitstring;
The ATTRIBUTES
declaration
is even more experimental. We haven't done much more than reserve
the syntax and prototype the internal interface; see the
use attributes
pragma in Glossary for more on that. The first
attribute we'll implement is likely to be
constant
:
my num $PI : constant = atan2(1,1) * 4;
But there are many other possibilities, such as establishing default values for arrays and hashes, or letting variables be shared among cooperating interpreters. Like the type interface, the attribute interface should be considered a generic interface, a kind of workbench for inventing new syntax and semantics. We do not know how Perl will evolve in the next 10 years. We only know that we can make it easier on ourselves by planning for that in advance.
See also local
, our
, and
Section 4.8 in Chapter 4.
new CLASSNAME LIST
new CLASSNAME
There is no built-in new
function.
It is merely an ordinary constructor method (that is, a user-defined
subroutine) that is defined or inherited by the
CLASSNAME
class (that is, package) to let
you construct objects of type CLASSNAME
.
Many constructors are named "new", but only by convention, just to
trick C++ programmers into thinking they know what's going on.
Always read the documentation of the class in question so you know
how to call its constructors; for example, the constructor that
creates a list box in the Tk widget set is just called
Listbox()
. See Chapter 12.
next LABEL
next
The next
operator is like the
continue
statement in C: it starts the next
iteration of the loop designated by
LABEL
:
LINE: while (<STDIN>) {
    next LINE if /^#/;    # discard comments
    …
}
If there were a continue
block in this
example, it would be executed immediately following the invocation
of next
. When LABEL
is
omitted, the operator refers to the innermost enclosing loop.
A block by itself is semantically identical to a loop that
executes once. Thus, next
will exit such a block
early (via the continue
block, if there is
one).
next
cannot be used to exit a block that
returns a value, such as eval {}
, sub
{}
, or do {}
, and should not be used to
exit a grep
or map
operation.
With warnings enabled, Perl will warn you if you
next
out of a loop not in your current lexical
scope, such as a loop in a calling subroutine. See Section 4.4 in
Chapter 4.
no MODULE LIST
See the use
operator, which is the
opposite of no
, kind of. Most standard modules do
not unimport anything, making no
a no-op, as it
were. The pragmatic modules tend to be more obliging here. If the
MODULE
cannot be found, an exception is
raised.
oct EXPR
oct
This function interprets
EXPR
as an octal string and returns the
equivalent decimal value. If EXPR
happens
to start with "0x
", it is interpreted as a
hexadecimal string instead. If EXPR
starts off with "0b
", it is interpreted as a
string of binary digits. The following will properly convert to
numbers any input strings in decimal, binary, octal, and hex bases
written in standard C or C++ notation:
$val = oct $val if $val =~ /^0/;
To perform the inverse function, use
sprintf
with an appropriate format:
$perms = (stat("filename"))[2] & 07777;
$oct_perms = sprintf "%lo", $perms;
The oct
function is commonly used when a
data string such as "644
" needs to be converted
into a file mode, for example. Although Perl will automatically
convert strings into numbers as needed, this automatic conversion
assumes base 10.
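A few conversions spelled out, with the values written in C notation:

```perl
my $dec = oct "644";     # 420: "644" taken as octal
my $hex = oct "0x1f";    # 31:  leading "0x" means hexadecimal
my $bin = oct "0b101";   # 5:   leading "0b" means binary
```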
open FILEHANDLE, MODE, LIST
open FILEHANDLE, EXPR
open FILEHANDLE
The open
function associates an
internal FILEHANDLE
with an external file
specification given by EXPR
or
LIST
. It may be called with one, two, or
three arguments (or more if the third argument is a command, and
you're running at least version 5.6.1 of Perl). If three or more
arguments are present, the second argument specifies the access
MODE
in which the file should be opened,
and the third argument (LIST
) supplies
the actual filename or the command to execute, depending on the
mode. In the case of a command, additional arguments may be supplied
if you wish to invoke the command directly without involving a
shell, much like system
or
exec
. Or the command may be supplied as a single
argument (the third one), in which case the decision to invoke the
shell depends on whether the command contains shell metacharacters.
(Don't use more than three arguments if the arguments are ordinary
filenames; it won't work.) If the MODE
is
not recognized, open
raises an exception.
If only two arguments are present, the mode and filename/command are assumed to be combined in the second argument. (And if you don't specify a mode in the second argument, just a filename, then the file is opened read-only to be on the safe side.)
With only one argument, the package scalar variable of the
same name as the FILEHANDLE
must contain
the filename and optional mode:
$LOG = ">logfile";   # $LOG must not be declared my!
open LOG or die "Can't open logfile: $!";
But don't do that. It's not stylin'. Forget we mentioned it.
The open
function returns true when it
succeeds and undef
otherwise. If the
open
starts up a pipe to a child process, the
return value will be the process ID of that new process. As with any
syscall, always check the return value of open
to
make sure it worked. But this isn't C or Java, so don't use an
if
statement when the or
operator will do. You can also use ||
, but if you
do, use parentheses on the open
. If you choose to
omit parentheses on the function call to turn it into a list
operator, be careful to use "or die
" after the
list rather than "|| die
", because the precedence
of ||
is higher than list operators like
open
, and the ||
will bind to
your last argument, not the whole open
:
open LOG, ">logfile" || die "Can't create logfile: $!";  # WRONG
open LOG, ">logfile" or die "Can't create logfile: $!";  # ok
That looks rather intense, but typically you'd introduce some whitespace to tell your eye where the list operator ends:
open LOG, ">logfile"
    or die "Can't create logfile: $!";
As that example shows, the
FILEHANDLE
argument is often just a
simple identifier (normally uppercase), but it may also be an
expression whose value provides a reference to the actual
filehandle. (The reference may be either a symbolic reference to the
filehandle name or a hard reference to any object that can be
interpreted as a filehandle.) This is called an indirect
filehandle, and any function that takes a
FILEHANDLE
as its first argument can
handle indirect filehandles as well as direct ones. But
open
is special in that if you supply it with an
undefined variable for the indirect filehandle, Perl will
automatically define that variable for you, that is, autovivifying
it to contain a proper filehandle reference. One advantage of this
is that the filehandle will be closed automatically when there are
no further references to it, typically when the variable goes out of
scope:
{
    my $fh;                # (uninitialized)
    open($fh, ">logfile")  # $fh is autovivified
        or die "Can't create logfile: $!";
    …                      # do stuff with $fh
}                          # $fh closed here
The my
$fh
declaration can be readably incorporated into
the open
:
open my $fh, ">logfile" or die …
The >
symbol
you've been seeing in front of the filename is an example of a mode.
Historically, the two-argument form of open
came
first. The recent addition of the three-argument form lets you
separate the mode from the filename, which has the advantage of
avoiding any possible confusion between the two. In the following
example, we know that the user is not trying to open a filename that
happens to start with ">
". We can be sure that
they're specifying a MODE
of
">
", which opens the file named in
EXPR
for writing, creating the file if it
doesn't exist and truncating the file down to nothing if it already
exists:
open(LOG, ">", "logfile") or die "Can't create logfile: $!";
In the shorter forms, the filename and mode are in the same string. The string is parsed much as the typical shell processes file and pipe redirections. First, any leading and trailing whitespace is removed from the string. Then the string is examined, on either end if need be, for characters specifying how the file is to be opened. Whitespace is allowed between the mode and the filename.
The modes that indicate how to open a file are shell-like redirection symbols. A list of these symbols is provided in Table 29-1. (To access a file with combinations of open modes not covered by this table, see the low-level sysopen function.)
Table 29-1. Modes for open
| Mode | Read Access | Write Access | Append Only | Create Nonexisting | Clobber Existing |
|---|---|---|---|---|---|
| < PATH | Y | N | N | N | N |
| > PATH | N | Y | N | Y | Y |
| >> PATH | N | Y | Y | Y | N |
| +< PATH | Y | Y | N | N | N |
| +> PATH | Y | Y | N | Y | Y |
| +>> PATH | Y | Y | Y | Y | N |
| \| COMMAND | N | Y | n/a | n/a | n/a |
| COMMAND \| | Y | N | n/a | n/a | n/a |
If the mode is "<" or nothing, an existing file is opened for input. If the mode is ">", the file is opened for output, which truncates existing files and creates nonexistent ones. If the mode is ">>", the file is created if needed and opened for appending, and all output is automatically placed at the end of the file. If a new file is created because you used a mode of ">" or ">>" and the file did not previously exist, access permissions will depend on the process's current umask under the rules described for that function.
Here are common examples:
open(INFO,    "datafile")   || die("can't open datafile: $!");
open(INFO,    "< datafile") || die("can't open datafile: $!");
open(RESULTS, "> runstats") || die("can't open runstats: $!");
open(LOG,     ">> logfile") || die("can't open logfile: $!");
If you prefer the low-punctuation version, you can write:
open INFO,    "datafile"   or die "can't open datafile: $!";
open INFO,    "< datafile" or die "can't open datafile: $!";
open RESULTS, "> runstats" or die "can't open runstats: $!";
open LOG,     ">> logfile" or die "can't open logfile: $!";
When opened for reading, the special filename "-" refers to STDIN. When opened for writing, the same special filename refers to STDOUT. Normally, these are specified as "<-" and ">-", respectively.
open(INPUT,  "-" ) or die;   # re-open standard input for reading
open(INPUT,  "<-") or die;   # same thing, but explicit
open(OUTPUT, ">-") or die;   # re-open standard output for writing
This way the user can supply a program with a filename that will use the standard input or the standard output, but the author of the program doesn't have to write special code to know about this.
You may also place a "+" in front of any of these three modes to request simultaneous read and write. However, whether the file is clobbered or created and whether it must already exist is still governed by your choice of less-than or greater-than signs. This means that "+<" is almost always preferred for read/write updates, as the dubious "+>" mode would first clobber the file before you could ever read anything from it. (Use that mode only if you want to reread only what you just wrote.)
open(DBASE, "+< database") or die "can't open existing database in update mode: $!";
You can treat a file opened for update as a random-access database and use seek to move to a particular byte number, but the variable-length records of regular text files usually make it impractical to use read-write mode to update such files. See the -i command-line option in Chapter 19 for a different approach to updating.
If the leading character in EXPR is a pipe symbol, open fires up a new process and connects a write-only filehandle to the command. This way you can write into that handle, and what you write will show up on that command's standard input. For example:
open(PRINTER, "| lpr -Plp1") or die "can't fork: $!";
print PRINTER "stuff\n";
close(PRINTER)               or die "lpr/close failed: $?/$!";
If the trailing character in EXPR is a pipe symbol, open again launches a new process, but this time with a read-only filehandle connected to it. This allows whatever the command writes to its standard output to show up on your handle for reading. For example:
open(NET, "netstat -i -n |") or die "can't fork: $!";
while (<NET>) { … }
close(NET) or die "can't close netstat: $!/$?";
Explicitly closing any piped filehandle causes the parent process to wait for the child to finish and returns the status code in $? ($CHILD_ERROR). It's also possible for close to set $! ($OS_ERROR). See the examples under close and system for how to interpret these error codes.
Any pipe command containing shell metacharacters such as wildcards or I/O redirections is passed to your system's canonical shell (/bin/sh on Unix), so those shell-specific constructs can be processed first. If no metacharacters are found, Perl launches the new process itself without calling the shell.
You may also use the three-argument form to start up pipes. Using that style, the equivalent of the previous pipe opens would be:
open(PRINTER, "|-", "lpr -Plp1")     or die "can't fork: $!";
open(NET,     "-|", "netstat -i -n") or die "can't fork: $!";
Here the minus in the second argument represents the command in the third argument. These commands don't happen to invoke the shell, but if you want to guarantee no shell processing occurs, new versions of Perl let you say:
open(PRINTER, "|-", "lpr", "-Plp1")        or die "can't fork: $!";
open(NET,     "-|", "netstat", "-i", "-n") or die "can't fork: $!";
If you use the two-argument form to open a pipe to or from the special command "-",[7] an implicit fork is done first. (On systems that can't fork, this raises an exception. Microsoft systems did not support fork prior to the 5.6 release of Perl.) In this case, the minus represents your new child process, which is a copy of the parent. The return value from this forking open is the process ID of the child when examined from the parent process, 0 when examined from the child process, and the undefined value undef if the fork fails--in which case, there is no child. For example:
defined($pid = open(FROM_CHILD, "-|"))
    or die "can't fork: $!";
if ($pid) {
    @parent_lines = <FROM_CHILD>;   # parent code
}
else {
    print STDOUT @child_lines;      # child code
}
The filehandle behaves normally for the parent, but for the child process, the parent's input (or output) is piped from (or to) the child's STDOUT (or STDIN). The child process does not see the parent's filehandle opened. (This is conveniently indicated by the 0 PID.)
Typically you'd use this construct instead of the normal piped open when you want to exercise more control over just how the pipe command gets executed (such as when you are running setuid) and don't want to have to scan shell commands for metacharacters. The following piped opens are roughly equivalent:
open FH, "| tr 'a-z' 'A-Z'";                      # pipe to shell command
open FH, "|-", 'tr', 'a-z', 'A-Z';                # pipe to bare command
open FH, "|-" or exec 'tr', 'a-z', 'A-Z' or die;  # pipe to child
as are these:
open FH, "cat -n 'file' |";                        # pipe from shell command
open FH, "-|", 'cat', '-n', 'file';                # pipe from bare command
open FH, "-|" or exec 'cat', '-n', 'file' or die;  # pipe from child
For more elaborate uses of fork open, see Section 16.3.2 in Chapter 16 and Section 23.1.2 in Chapter 23.
When starting a command with open, you must choose either input or output: "cmd|" for reading or "|cmd" for writing. You may not use open to start a command that pipes both in and out, as the (currently) illegal notation "|cmd|" might appear to indicate. However, the standard IPC::Open2 and IPC::Open3 library routines give you a close equivalent. For details on double-ended pipes, see Section 16.3.3 in Chapter 16.
You may also, in the Bourne shell tradition, specify an EXPR beginning with >&, in which case the rest of the string is interpreted as the name of a filehandle (or file descriptor, if numeric) to be duplicated using the dup2(2) syscall.[8] You may use & after >, >>, <, +>, +>>, and +<. (The specified mode should match the mode of the original filehandle.)
One reason you might want to do this would be if you already had a filehandle open and wanted to make another handle that's really a duplicate of the first one.
open(SAVEOUT, ">&SAVEERR") or die "couldn't dup SAVEERR: $!";
open(MHCONTEXT, "<&4")     or die "couldn't dup fd4: $!";
That means that if a function is expecting a filename, but you don't want to give it a filename because you already have the file open, you can just pass the filehandle with a leading ampersand. It's best to use a fully qualified handle though, just in case the function happens to be in a different package:
somefunction("&main::LOGFILE");
Another reason to "dup" filehandles is to temporarily redirect an existing filehandle without losing track of the original destination. Here is a script that saves, redirects, and restores STDOUT and STDERR:
#!/usr/bin/perl
open SAVEOUT, ">&STDOUT";
open SAVEERR, ">&STDERR";
open STDOUT, ">foo.out" or die "Can't redirect stdout";
open STDERR, ">&STDOUT" or die "Can't dup stdout";
select STDERR; $| = 1;       # enable autoflush
select STDOUT; $| = 1;       # enable autoflush
print STDOUT "stdout 1\n";   # these I/O streams propagate to
print STDERR "stderr 1\n";   # subprocesses too
system("some command");      # uses new stdout/stderr
close STDOUT;
close STDERR;
open STDOUT, ">&SAVEOUT";
open STDERR, ">&SAVEERR";
print STDOUT "stdout 2\n";
print STDERR "stderr 2\n";
If the filehandle or descriptor number is preceded by a &= combination instead of a simple &, then instead of creating a completely new file descriptor, Perl makes the FILEHANDLE an alias for the existing descriptor using the fdopen(3) C library call. This is slightly more parsimonious of system resources, although that's less of a concern these days.
$fd = $ENV{"MHCONTEXTFD"};
open(MHCONTEXT, "<&=$fd")
    or die "couldn't fdopen descriptor $fd: $!";
Filehandles STDIN, STDOUT, and STDERR always remain open across an exec. Other filehandles, by default, do not. On systems supporting the fcntl function, you may modify the close-on-exec flag for a filehandle.
use Fcntl qw(F_GETFD F_SETFD);
$flags = fcntl(FH, F_SETFD, 0)
    or die "Can't clear close-on-exec flag on FH: $!\n";
See also the special $^F ($SYSTEM_FD_MAX) variable in Chapter 28.
With the one- or two-argument form of open, you have to be careful when you use a string variable as a filename, since the variable may contain arbitrarily weird characters (particularly when the filename has been supplied by arbitrarily weird characters on the Internet). If you're not careful, parts of the filename might get interpreted as a MODE string, ignorable whitespace, a dup specification, or a minus. Here's one historically interesting way to insulate yourself:
$path =~ s#^(\s)#./$1#;
open (FH, "< $path\0")
    or die "can't open $path: $!";
But that's still broken in several ways. Instead, just use the three-argument form of open to open any arbitrary filename cleanly and without any (extra) security risks:
open(FH, "<", $path) or die "can't open $path: $!";
On the other hand, if what you're looking for is a true, C-style open(2) syscall with all its attendant belfries and whistle-stops, then check out sysopen:
use Fcntl;
sysopen(FH, $path, O_RDONLY)
    or die "can't open $path: $!";
If you're running on a system that distinguishes between text and binary files, you may need to put your filehandle into binary mode--or forgo doing so, as the case may be--to avoid mutilating your files. On such systems, if you use text mode on a binary file, or binary mode on a text file, you probably won't like the results.
Systems that need the binmode function are distinguished from those that don't by the format used for text files. Those that don't need it terminate each line with a single character that corresponds to what C thinks is a newline, "\n". Unix and Mac OS fall into this category. VMS, MVS, MS-whatever, and S&M operating systems of other varieties treat I/O on text files and binary files differently, so they need binmode.
Or its equivalent. As of the 5.6 release of Perl, you can specify binary mode in the open function without a separate call to binmode. As part of the MODE argument (but only in the three-argument form), you may specify various input and output disciplines. To do the equivalent of a binmode, use the three-argument form of open and stuff a discipline of :raw in after the other MODE characters:
open(FH, "<:raw", $path) or die "can't open $path: $!";
Since this is a very new feature, there will certainly be more disciplines by the time you read this than there were when we wrote it. However, we can reasonably predict that there will in all likelihood be disciplines resembling some or all of the ones in Table 29.2.
Table 29-2. I/O Disciplines
| Discipline | Meaning |
|---|---|
| :raw | Binary mode; do no processing |
| :text | Default text processing |
| :def | Default declared by "use open" |
| :latin1 | File should be ISO-8859-1 |
| :ctype | File should be LC_CTYPE |
| :utf8 | File should be UTF-8 |
| :utf16 | File should be UTF-16 |
| :utf32 | File should be UTF-32 |
| :uni | Intuit Unicode (UTF-*) |
| :any | Intuit Unicode/Latin1/LC_CTYPE |
| :xml | Use encoding specified in file |
| :crlf | Intuit newlines |
| :para | Paragraph mode |
| :slurp | Slurp mode |
You'll be able to stack disciplines that make sense to stack, so, for instance, you could say:
open(FH, "<:para:crlf:uni", $path)
    or die "can't open $path: $!";
while ($para = <FH>) { … }
That would set up disciplines to:
read in some form of Unicode and translate to Perl's internal UTF-8 format if the file isn't already in UTF-8,
look for variants of line-ending sequences, translating them all to "\n", and
process the file into paragraph-sized chunks, much as $/ = "" does.
If you want to set the default open mode (:def) to something other than :text, you can declare that at the top of your file with the open pragma:
use open IN => ":any", OUT => ":utf8";
In fact, it would be really nice if that were the default :text discipline someday. It perfectly captures the spirit of "Be liberal in what you accept, and strict in what you produce."
opendir DIRHANDLE, EXPR
This function opens a directory named EXPR for processing by readdir, telldir, seekdir, rewinddir, and closedir. The function returns true if successful. Directory handles have their own namespace separate from filehandles.
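For example, a typical readdir loop over a directory (the directory name and dotfile filter here are just illustrative) looks like this:

```perl
opendir(DIR, ".") or die "can't opendir .: $!";
while (defined(my $file = readdir(DIR))) {
    next if $file =~ /^\./;   # skip ".", "..", and other dotfiles
    print "$file\n";
}
closedir(DIR);
```

Note that readdir returns bare entry names, so prepend the directory name before running file tests on them.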
ord EXPR
ord
This function returns the numeric value (ASCII, Latin-1, or Unicode) of the first character of EXPR. The return value is always unsigned. If you want a signed value, use unpack('c', EXPR). If you want all the characters of the string converted to a list of numbers, use unpack('U*', EXPR) instead.
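A minimal illustration of those three cases:

```perl
print ord("A"), "\n";                  # 65: numeric value of the first character
print ord("ABC"), "\n";                # also 65; only the first character counts
print unpack('c', "\xFF"), "\n";       # -1: the signed interpretation of 0xFF
print join(",", unpack('U*', "ABC")), "\n";   # 65,66,67: every character
```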
our TYPE EXPR : ATTRIBUTES
our EXPR : ATTRIBUTES
our TYPE EXPR
our EXPR
An our declares one or more variables to be valid globals within the enclosing block, file, or eval. That is, our has the same rules as a my declaration for determination of visibility, but does not create a new private variable; it merely allows unfettered access to the existing package global. If more than one value is listed, the list must be placed in parentheses.
The primary use of an our declaration is to hide the variable from the effects of a use strict "vars" declaration; since the variable is masquerading as a my variable, you are permitted to use the declared global variable without qualifying it with its package. However, just like the my variable, this only works within the lexical scope of the our declaration. In this respect, it differs from use vars, which affects the entire package and is not lexically scoped.
our is also like my in that you are allowed to declare variables with a TYPE and with ATTRIBUTES. Here is the syntax:
our Dog $spot :ears(short) :tail(long);
As of this writing, it's not entirely clear what that will mean. Attributes could affect either the global or the local interpretation of $spot. On the one hand, it would be most like my variables for attributes to warp the current local view of $spot without interfering with other views of the global in other places. On the other hand, if one module declares $spot to be a Dog, and another declares $spot to be a Cat, you could end up with meowing dogs or barking cats. This is a subject of ongoing research, which is a fancy way to say we don't know what we're talking about yet. (Except that we do know what to do with the TYPE declaration when the variable refers to a pseudohash--see "Managing Instance Data" in Chapter 12.)
Another way in which our is like my is in its visibility. An our declaration declares a global variable that will be visible across its entire lexical scope, even across package boundaries. The package in which the variable is located is determined at the point of the declaration, not at the point of use. This means the following behavior holds and is deemed to be a feature:
package Foo;
our $bar;      # $bar is $Foo::bar for rest of lexical scope
$bar = 582;
package Bar;
print $bar;    # prints 582, just as if "our" had been "my"
However, the distinction between my creating a new, private variable and our exposing an existing, global variable is important, especially in assignments. If you combine a run-time assignment with an our declaration, the value of the global variable does not disappear once the our goes out of scope. For that, you need local:
($x, $y) = ("one", "two");
print "before block, x is $x, y is $y\n";
{
    our $x = 10;
    local our $y = 20;
    print "in block, x is $x, y is $y\n";
}
print "past block, x is $x, y is $y\n";
That prints out:
before block, x is one, y is two
in block, x is 10, y is 20
past block, x is 10, y is two
Multiple our declarations in the same lexical scope are allowed if they are in different packages. If they happen to be in the same package, Perl will emit warnings if you ask it to.
use warnings;
package Foo;
our $bar;        # declares $Foo::bar for rest of lexical scope
$bar = 20;
package Bar;
our $bar = 30;   # declares $Bar::bar for rest of lexical scope
print $bar;      # prints 30
our $bar;        # emits warning
See also local, my, and Section 4.8 in Chapter 4.
pack TEMPLATE, LIST
This function takes a LIST of ordinary Perl values and converts them into a string of bytes according to the TEMPLATE and returns this string. The argument list will be padded or truncated as necessary. That is, if you provide fewer arguments than the TEMPLATE requires, pack assumes additional null arguments. If you provide more arguments than the TEMPLATE requires, the extra arguments are ignored. Unrecognized format elements in TEMPLATE will raise an exception.
The template describes the structure of the string as a sequence of fields. Each field is represented by a single character that describes the type of the value and its encoding. For instance, a format character of N specifies an unsigned four-byte integer in big-endian byte order.
Fields are packed in the order given in the template. For example, to pack an unsigned one-byte integer and a single-precision floating-point value into a string, you'd say:
$string = pack("Cf", 244, 3.14);
The first byte of the returned string has the value 244. The remaining bytes are the encoding of 3.14 as a single-precision float. The particular encoding of the floating point number depends on your computer's hardware.
Some important things to consider when packing are:
the type of data (such as integer or float or string),
the range of values (such as whether your integers will fit into one, two, four, or maybe even eight bytes; or whether you're packing 8-bit or Unicode characters),
whether your integers are signed or unsigned, and
the encoding to use (such as native, little-endian, or big-endian packing of bits and bytes).
Table 29.3 lists the format characters and their meanings. (Other characters can occur in formats as well; these are described later.)
Table 29-3. Template Characters for pack/unpack
| Character | Meaning |
|---|---|
| a | A null-padded string of bytes |
| A | A space-padded string of bytes |
| b | A bit string, in ascending bit order inside each byte (like vec) |
| B | A bit string, in descending bit order inside each byte |
| c | A signed char (8-bit integer) value |
| C | An unsigned char (8-bit integer) value; see U for Unicode |
| d | A double-precision floating-point number in native format |
| f | A single-precision floating-point number in native format |
| h | A hexadecimal string, low nybble first |
| H | A hexadecimal string, high nybble first |
| i | A signed integer value, native format |
| I | An unsigned integer value, native format |
| l | A signed long value, always 32 bits |
| L | An unsigned long value, always 32 bits |
| n | A 16-bit short in "network" (big-endian) order |
| N | A 32-bit long in "network" (big-endian) order |
| p | A pointer to a null-terminated string |
| P | A pointer to a fixed-length string |
| q | A signed quad (64-bit integer) value |
| Q | An unsigned quad (64-bit integer) value |
| s | A signed short value, always 16 bits |
| S | An unsigned short value, always 16 bits |
| u | A uuencoded string |
| U | A Unicode character number |
| v | A 16-bit short in "VAX" (little-endian) order |
| V | A 32-bit long in "VAX" (little-endian) order |
| w | A BER compressed integer |
| x | A null byte (skip forward a byte) |
| X | Back up a byte |
| Z | A null-terminated (and null-padded) string of bytes |
| @ | Null-fill to absolute position |
You may freely place whitespace and comments in your TEMPLATEs. Comments start with the customary # symbol and extend up through the first newline (if any) in the TEMPLATE.
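For instance, a template can be annotated field by field (the field meanings here are invented for illustration):

```perl
# A commented template and its uncommented equivalent pack identically.
my $rec = pack "N    # 32-bit big-endian sequence number
                n    # 16-bit flag word
                A8   # space-padded tag",
               57, 3, "widget";
print length($rec), "\n";   # 14 bytes: 4 + 2 + 8
```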
Each letter may be followed by a number indicating the count, interpreted as a repeat count or length of some sort, depending on the format. With all formats except a, A, b, B, h, H, P, and Z, count is a repeat count, so pack gobbles up that many values from the LIST. A * for the count means however many items are left.
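To see the difference between a repeat count and a length (the values here are arbitrary):

```perl
my $s = pack("C3", 65, 66, 67);   # repeat count: three one-byte values, "ABC"
my $t = pack("A6", "hello");      # length: one value padded to 6 bytes, "hello "
my $u = pack("C*", 72, 73);       # * gobbles all remaining values, "HI"
```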
The a, A, and Z formats gobble just one value, but pack it as a byte string of length count, padding with nulls or spaces as necessary. When unpacking, A strips trailing spaces and nulls, Z strips everything after the first null, and a returns the literal data unmolested. When packing, a and Z are equivalent.
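A quick sketch of the padding and stripping behavior:

```perl
pack("a5", "hi");            # "hi\0\0\0"  (null padding)
pack("A5", "hi");            # "hi   "     (space padding)
unpack("A5", "hi\0\0\0");    # "hi"        (trailing nulls and spaces stripped)
unpack("a5", "hi\0\0\0");    # "hi\0\0\0"  (literal data, unmolested)
unpack("Z5", "hi\0xx");      # "hi"        (everything after first null dropped)
```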
Similarly, the b and B formats pack a string count bits long. Each byte of the input field generates 1 bit of the result based on the least-significant bit of each input byte (that is, on ord($byte) % 2). Conveniently, that means the characters "0" and "1" generate bits 0 and 1. Starting from the beginning of the input string, each 8-tuple of bytes is converted to a single byte of output. If the length of the input string is not divisible by 8, the remainder is packed as if padded by 0's. Similarly, during unpacking any extra bits are ignored. If the input string is longer than needed, extra bytes are ignored. A * for the count means to use all bytes from the input field. On unpacking, the bits are converted to a string of 0's and 1's.
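The two bit orders look like this on the byte 0x01:

```perl
unpack("b8", "\x01");     # "10000000"  (ascending: least-significant bit first)
unpack("B8", "\x01");     # "00000001"  (descending: most-significant bit first)
pack("B8", "01000001");   # "A"         (0b01000001 is 65, the letter A)
```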
The h and H formats pack a string of count nybbles (4-bit groups often represented as hexadecimal digits).
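For example, on the byte 0xAB:

```perl
unpack("H2", "\xAB");     # "ab"    (high nybble first)
unpack("h2", "\xAB");     # "ba"    (low nybble first)
pack("H*", "64656164");   # "dead"  (each hex-digit pair becomes one byte)
```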
The p format packs a pointer to a null-terminated string. You are responsible for ensuring the string is not a temporary value (which can potentially get deallocated before you get around to using the packed result). The P format packs a pointer to a structure of the size indicated by count. A null pointer is created if the corresponding value for p or P is undef.
The / character allows packing and unpacking of strings where the packed structure contains a byte count followed by the string itself. You write length-item/string-item. The length-item can be any pack template letter, and describes how the length value is packed. The ones likely to be of most use are integer-packing ones like n (for Java strings), w (for ASN.1 or SNMP), and N (for Sun XDR). The string-item must, at present, be A*, a*, or Z*. For unpack, the length of the string is obtained from the length-item, but if you put in the *, it will be ignored.
unpack 'C/a', "\04Gurusamy";           # gives 'Guru'
unpack 'a3/A* A*', '007 Bond  J ';     # gives (' Bond', 'J')
pack 'n/a* w/a*', 'hello,', 'world';   # gives "\000\006hello,\005world"
The length-item is not returned explicitly from unpack. Adding a count to the length-item letter is unlikely to do anything useful, unless that letter is A, a, or Z. Packing with a length-item of a or Z may introduce null (