A pipe is a unidirectional I/O channel that can transfer a stream of bytes from one process to another. Pipes come in both named and nameless varieties. You may be more familiar with nameless pipes, so we'll talk about those first.
Perl's open function opens a pipe instead of a file when you append or prepend a pipe symbol to the second argument to open. This turns the rest of the arguments into a command, which will be interpreted as a process (or set of processes) that you want to pipe a stream of data either into or out of. Here's how to start up a child process that you intend to write to:
open SPOOLER, "| cat -v | lpr -h 2>/dev/null"
    or die "can't fork: $!";
local $SIG{PIPE} = sub { die "spooler pipe broke" };
print SPOOLER "stuff\n";
close SPOOLER or die "bad spool: $! $?";
This example actually starts up two processes, the first of which (running cat) we print to directly. The second process (running lpr) then receives the output of the first process. In shell programming, this is often called a pipeline. A pipeline can have as many processes in a row as you like, as long as the ones in the middle know how to behave like filters; that is, they read standard input and write standard output.
Perl uses your default system shell (/bin/sh on Unix) whenever a pipe command contains special characters that the shell cares about. If you're only starting one command, and you don't need--or don't want--to use the shell, you can use the multi-argument form of a piped open instead:
open SPOOLER, "|-", "lpr", "-h"    # requires 5.6.1
    or die "can't run lpr: $!";
If you reopen your program's standard output as a pipe to another program, anything you subsequently print to STDOUT will be standard input for the new program. So to page your program's output,[8] you'd use:
if (-t STDOUT) {             # only if stdout is a terminal
    my $pager = $ENV{PAGER} || 'more';
    open(STDOUT, "| $pager") or die "can't fork a pager: $!";
}
END { close(STDOUT)          or die "can't close STDOUT: $!" }
When you're writing to a filehandle connected to a pipe, always explicitly close that handle when you're done with it. That way your main program doesn't exit before its offspring.
Here's how to start up a child process that you intend to read from:
open STATUS, "netstat -an 2>/dev/null |"
    or die "can't fork: $!";
while (<STATUS>) {
    next if /^(tcp|udp)/;
    print;
}
close STATUS or die "bad netstat: $! $?";
You can open a multistage pipeline for input just as you can for output. And as before, you can avoid the shell by using an alternate form of open:
open STATUS, "-|", "netstat", "-an"    # requires 5.6.1
    or die "can't run netstat: $!";
But then you don't get I/O redirection, wildcard expansion, or multistage pipes, since Perl relies on your shell to do those.
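If you need the redirection but not the rest of the shell, one workaround, sketched below, is the forking form of open covered later in this chapter: fork first, perform the redirection yourself in the child, then exec the command directly. (To keep the sketch self-contained and testable, the child here is just another perl, reached through $^X, standing in for a real command.)

```perl
# Get the effect of 2>/dev/null without a shell: fork-open a reader,
# redirect the child's STDERR ourselves, then exec the command directly.
my $pid = open(my $fh, "-|");
die "can't fork: $!" unless defined $pid;
unless ($pid) {                              # child
    open(STDERR, ">", "/dev/null") or die "can't redirect: $!";
    exec($^X, "-e", 'warn "noise\n"; print "clean output\n"')
        or die "can't exec: $!";
}
my $got = <$fh>;             # only the child's STDOUT comes up the pipe
close($fh);
print $got;
```

The child's warn goes to /dev/null, so the parent sees only the line printed to standard output.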
You might have noticed that you can use backticks to accomplish the same effect as opening a pipe for reading:
print grep { !/^(tcp|udp)/ } `netstat -an 2>&1`;
die "bad netstat" if $?;
While backticks are extremely handy, they have to read the whole thing into memory at once, so it's often more efficient to open your own piped filehandle and process the file one line or record at a time. This gives you finer control over the whole operation, letting you kill off the child process early if you like. You can also be more efficient by processing the input as it's coming in, since computers can interleave various operations when two or more processes are running at the same time. (Even on a single-CPU machine, input and output operations can happen while the CPU is doing something else.)
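Here's a sketch of that early kill: read only the first line, then dismiss a child that would otherwise produce a million more. (As above, the child is another perl via $^X purely to make the example self-contained.)

```perl
# Process piped input a line at a time and stop the child early.
my $pid = open(my $fh, "-|", $^X, "-e", 'print "line $_\n" for 1 .. 1_000_000')
    or die "can't fork: $!";
my $first = <$fh>;           # take only what we need
kill 'TERM', $pid;           # then dismiss the child early
close($fh);                  # close collects the child's exit status
print $first;
```

With backticks, all million lines would have been slurped into memory before we could look at the first one.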
Because you're running two or more processes concurrently, disaster can strike the child process any time between the open and the close. This means that the parent must check the return values of both open and close. Checking the open isn't good enough, since that will only tell you whether the fork was successful, and possibly whether the subsequent command was successfully launched. (It can tell you this only in recent versions of Perl, and only if the command is executed directly by the forked child, not via the shell.) Any disaster that happens after that is reported from the child to the parent as a nonzero exit status. When the close function sees that, it knows to return a false value, indicating that the actual status value should be read from the $? ($CHILD_ERROR) variable. So checking the return value of close is just as important as checking open. If you're writing to a pipe, you should also be prepared to handle the PIPE signal, which is sent to you if the process on the other end dies before you're done sending to it.
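That status in $? packs the child's exit value and the signal that terminated it into one integer, using the traditional Unix wait-status encoding. A short sketch of pulling it apart:

```perl
# Run a child and decode its wait status from $?.
system($^X, "-e", "exit 3");        # child exits with status 3
my $wait_status = $?;
my $exit   = $wait_status >> 8;     # the child's exit value
my $signal = $wait_status & 127;    # signal that killed it, if any
my $core   = $wait_status & 128;    # true if the child dumped core
print "exit=$exit signal=$signal core=$core\n";
```

A child that exits normally with status 3 yields exit=3, signal=0, and no core dump.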
Another approach to IPC is to make your program talk to itself, in a manner of speaking. Actually, your process talks over pipes to a forked copy of itself. It works much like the piped open we talked about in the last section, except that the child process continues executing your script instead of some other command.
To represent this to the open function, you use a pseudocommand consisting of a minus. So the second argument to open looks like either "-|" or "|-", depending on whether you want to pipe from yourself or to yourself. As with an ordinary fork command, the open function returns the child's process ID in the parent process but 0 in the child process.
Another asymmetry is that the filehandle named by the open is used only in the parent process. The child's end of the pipe is hooked to either STDIN or STDOUT as appropriate. That is, if you open a pipe to minus with |-, you can write to the filehandle you opened, and your kid will find this in STDIN:
if (open(TO, "|-")) {
    print TO $fromparent;
}
else {
    $tochild = <STDIN>;
    exit;
}
If you open a pipe from minus with -|, you can read from the filehandle you opened, which will return whatever your kid writes to STDOUT:
if (open(FROM, "-|")) {
    $toparent = <FROM>;
}
else {
    print STDOUT $fromchild;
    exit;
}
One common application of this construct is to bypass the shell when you want to open a pipe from a command. You might want to do this because you don't want the shell to interpret any possible metacharacters in the filenames you're trying to pass to the command. If you're running release 5.6.1 or greater of Perl, you can use the multi-argument form of open to get the same result.
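On older Perls, the classic shell-free idiom is a fork-open followed by exec in the child. Here's a sketch, using /bin/echo so that the metacharacter demonstrably reaches the command untouched:

```perl
# Open a pipe from a command without the shell: fork-open, then exec.
my $pid = open(my $kid, "-|");
die "cannot fork: $!" unless defined $pid;
unless ($pid) {                              # child
    exec("echo", "unexpanded * stays literal")
        or die "can't exec echo: $!";
}
chomp(my $line = <$kid>);                    # the shell never saw the *
close($kid);
print "$line\n";
```

Because exec is handed a list, each argument goes to the command verbatim; there is no shell around to glob the asterisk.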
Another use of a forking open is to safely open a file or command even while you're running under an assumed UID or GID. The child you fork drops any special access rights, then safely opens the file or command and acts as an intermediary, passing data between its more powerful parent and the file or command it opened. Examples can be found in Section 23.1.3 in Chapter 23.
One creative use of a forking open is to filter your own output. Some algorithms are much easier to implement in two separate passes than they are in just one pass. Here's a simple example in which we emulate the Unix tee(1) program by sending our normal output down a pipe. The agent on the other end of the pipe (one of our own subroutines) distributes our output to all the files specified:
tee("/tmp/foo", "/tmp/bar", "/tmp/glarch");
while (<>) {
    print "$ARGV at line $. => $_";
}
close(STDOUT) or die "can't close STDOUT: $!";

sub tee {
    my @output  = @_;
    my @handles = ();
    for my $path (@output) {
        my $fh;     # open will fill this in
        unless (open($fh, ">", $path)) {
            warn "cannot write to $path: $!";
            next;
        }
        push @handles, $fh;
    }

    # reopen STDOUT in parent and return
    return if my $pid = open(STDOUT, "|-");
    die "cannot fork: $!" unless defined $pid;

    # process STDIN in child
    while (<STDIN>) {
        for my $fh (@handles) {
            print $fh $_ or die "tee output failed: $!";
        }
    }
    for my $fh (@handles) {
        close($fh) or die "tee closing failed: $!";
    }
    exit;   # don't let the child return to main!
}
This technique can be applied repeatedly to push as many filters on your output stream as you wish. Just keep calling functions that fork-open STDOUT, and have the child read from its parent (which it sees as STDIN) and pass the massaged output along to the next function in the stream.
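Here's a sketch of such stacking with a small generic helper (the filter subroutine and its name are our own invention). Note the ordering: each new fork-open splices in closest to the main program, so the filter pushed last is applied first.

```perl
# Stack output filters by repeatedly fork-opening STDOUT.
sub filter {
    my $munge = shift;
    my $pid   = open(STDOUT, "|-");
    die "cannot fork: $!" unless defined $pid;
    return if $pid;                  # parent resumes, now writing to the child
    while (<STDIN>) { print $munge->($_) }
    close(STDOUT);
    exit;                            # never let the child fall back into main
}

filter(sub { "$.: $_[0]" });         # pushed first, so applied last
filter(sub { uc $_[0] });            # pushed last, so applied first
print "hello\n";                     # emerges as "1: HELLO"
print "world\n";                     # emerges as "2: WORLD"
close(STDOUT) or die "can't close STDOUT: $!";
```

Each line the main program prints passes through the upcasing child, then the numbering child, before reaching the real standard output.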
Another interesting application of talking to yourself with fork-open is to capture the output from an ill-mannered function that always splats its results to STDOUT. Imagine if Perl only had printf and no sprintf. What you'd need would be something that worked like backticks, but with Perl functions instead of external commands:
badfunc("arg");                          # drat, escaped!
$string = forksub(\&badfunc, "arg");     # caught it as string
@lines  = forksub(\&badfunc, "arg");     # as separate lines

sub forksub {
    my $kidpid = open my $self, "-|";
    defined $kidpid     or die "cannot fork: $!";
    shift->(@_), exit   unless $kidpid;
    local $/            unless wantarray;
    return <$self>;     # closes on scope exit
}
We're not claiming this is efficient; a tied filehandle would probably be a good bit faster. But it's a lot easier to code up if you're in more of a hurry than your computer is.
Although using open to connect to another command over a pipe works reasonably well for unidirectional communication, what about bidirectional communication? The obvious approach doesn't actually work:
open(PROG_TO_READ_AND_WRITE, "| some program |") # WRONG!
and if you forget to enable warnings, then you'll miss out entirely on the diagnostic message:
Can't do bidirectional pipe at myprog line 3.
The open function doesn't allow this because it's rather prone to deadlock unless you're quite careful. But if you're determined, you can use the standard IPC::Open2 library module to attach two pipes to a subprocess's STDIN and STDOUT. There's also an IPC::Open3 module for tridirectional I/O (allowing you to also catch your child's STDERR), but this requires either an awkward select loop or the somewhat more convenient IO::Select module. But then you'll have to avoid Perl's buffered input operations like <> (readline).

Here's an example using open2:
use IPC::Open2;
local (*Reader, *Writer);
$pid = open2(*Reader, *Writer, "bc -l");
$sum = 2;
for (1 .. 5) {
    print Writer "$sum * $sum\n";
    chomp($sum = <Reader>);
}
close Writer;
close Reader;
waitpid($pid, 0);
print "sum is $sum\n";
You can also autovivify lexical filehandles:
my ($fhread, $fhwrite);
$pid = open2($fhread, $fhwrite, "cat -u -n");
The problem with this in general is that standard I/O buffering is really going to ruin your day. Even though your output filehandle is autoflushed (the library does this for you) so that the process on the other end will get your data in a timely manner, you can't usually do anything to force it to return the favor. In this particular case, we were lucky: bc expects to operate over a pipe and knows to flush each output line. But few commands are so designed, so this seldom works out unless you yourself wrote the program on the other end of the double-ended pipe. Even simple, apparently interactive programs like ftp fail here because they won't do line buffering on a pipe. They'll only do it on a tty device.
The IO::Pty and Expect modules from CPAN can help with this because they provide a real tty (actually, a real pseudo-tty, but it acts like a real one). This gets you line buffering in the other process without modifying its program.
If you split your program into several processes and want these to all have a conversation that goes both ways, you can't use Perl's high-level pipe interfaces, because these are all unidirectional. You'll need to use two low-level pipe function calls, each handling one direction of the conversation:
pipe(FROM_PARENT, TO_CHILD)   or die "pipe: $!";
pipe(FROM_CHILD,  TO_PARENT)  or die "pipe: $!";
select((select(TO_CHILD),  $| = 1)[0]);   # autoflush
select((select(TO_PARENT), $| = 1)[0]);   # autoflush

if ($pid = fork) {
    close FROM_PARENT;  close TO_PARENT;
    print TO_CHILD "Parent Pid $$ is sending this\n";
    chomp($line = <FROM_CHILD>);
    print "Parent Pid $$ just read this: `$line'\n";
    close FROM_CHILD;   close TO_CHILD;
    waitpid($pid, 0);
}
else {
    die "cannot fork: $!" unless defined $pid;
    close FROM_CHILD;   close TO_CHILD;
    chomp($line = <FROM_PARENT>);
    print "Child Pid $$ just read this: `$line'\n";
    print TO_PARENT "Child Pid $$ is sending this\n";
    close FROM_PARENT;  close TO_PARENT;
    exit;
}
On many Unix systems, you don't actually have to make two separate pipe calls to achieve full-duplex communication between parent and child. The socketpair syscall provides bidirectional connections between related processes on the same machine. So instead of two pipes, you only need one socketpair.
use Socket;
socketpair(Child, Parent, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

# or letting perl pick filehandles for you
my ($kidfh, $dadfh);
socketpair($kidfh, $dadfh, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
After the fork, the parent closes the Parent handle, then reads and writes via the Child handle. Meanwhile, the child closes the Child handle, then reads and writes via the Parent handle.
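Putting those pieces together, here's a sketch of one full round trip over a single socketpair (the lexical filehandle names are our own):

```perl
# Full-duplex parent/child conversation over one socketpair.
use Socket;
socketpair(my $child_fh, my $parent_fh, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
select((select($child_fh),  $| = 1)[0]);   # autoflush both ends
select((select($parent_fh), $| = 1)[0]);

my $reply;
defined(my $pid = fork) or die "cannot fork: $!";
if ($pid) {                      # parent: drop its Parent end, talk on Child
    close $parent_fh;
    print $child_fh "ping\n";
    chomp($reply = <$child_fh>);
    print "parent got: $reply\n";
    close $child_fh;
    waitpid($pid, 0);
}
else {                           # child: drop its Child end, talk on Parent
    close $child_fh;
    chomp(my $msg = <$parent_fh>);
    print $parent_fh "pong ($msg)\n";
    close $parent_fh;
    exit;
}
```

Each process closes the end it doesn't use, then reads and writes on the one it keeps, exactly as described above.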
If you're looking into bidirectional communications because the process you'd like to talk to implements a standard Internet service, you should usually just skip the middleman and use a CPAN module designed for that exact purpose. (See Section 16.5 later for a list of some of these.)
A named pipe (often called a FIFO) is a mechanism for setting up a conversation between unrelated processes on the same machine. The names in a "named" pipe exist in the filesystem, which is just a funny way to say that you can put a special file in the filesystem namespace that has another process behind it instead of a disk.[9]
A FIFO is convenient when you want to connect a process to an unrelated one. When you open a FIFO, your process will block until there's a process on the other end. So if a reader opens the FIFO first, it blocks until the writer shows up--and vice versa.
To create a named pipe, use the POSIX mkfifo function--if you're on a POSIX system, that is. On Microsoft systems, you'll instead want to look into the Win32::Pipe module, which, despite its possible appearance to the contrary, creates named pipes. (Win32 users create anonymous pipes using pipe just like the rest of us.)
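As a self-contained sketch of those blocking semantics, here's a tiny round trip: make a FIFO in a temporary directory, fork a writer, and read from it. Both open calls block until the other end shows up. (The file and handle names are our own invention.)

```perl
# Create a FIFO, fork a writer, and read from the named pipe.
use POSIX qw(mkfifo);
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/demo.fifo";
mkfifo($path, 0700) or die "mkfifo $path: $!";

defined(my $pid = fork) or die "cannot fork: $!";
unless ($pid) {                     # child: the writer
    open(my $w, ">", $path)         # blocks until a reader opens the FIFO
        or die "open for write: $!";
    print $w "hello through the fifo\n";
    close($w);
    exit;
}
open(my $r, "<", $path)             # blocks until the writer opens its end
    or die "open for read: $!";
my $got = <$r>;
close($r);
waitpid($pid, 0);
print $got;
```

Neither open completes until both a reader and a writer have the FIFO open, which is what makes FIFOs handy for rendezvous between unrelated processes.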
For example, let's say you'd like to have your .signature file produce a different answer each time it's read. Just make it a named pipe with a Perl program on the other end that spits out random quips. Now every time any program (like a mailer, newsreader, finger program, and so on) tries to read from that file, that program will connect to your program and read in a dynamic signature.
In the following example, we use the rarely seen -p file test operator to determine whether anyone (or anything) has accidentally removed our FIFO.[10] If they have, there's no reason to try to open it, so we treat this as a request to exit. If we'd used a simple open function with a mode of "> $fpath", there would have been a tiny race condition that would have risked accidentally creating the signature as a plain file if it disappeared between the -p test and the open. We couldn't use a "+< $fpath" mode, either, because opening a FIFO for read-write is a nonblocking open (this is only true of FIFOs). By using sysopen and omitting the O_CREAT flag, we avoid this problem by never creating a file by accident.
use Fcntl;              # for sysopen
chdir;                  # go home
$fpath = '.signature';
$ENV{PATH} .= ":/usr/games";

unless (-p $fpath) {    # not a pipe
    if (-e _) {         # but a something else
        die "$0: won't overwrite .signature\n";
    }
    else {
        require POSIX;
        POSIX::mkfifo($fpath, 0666)
            or die "can't mknod $fpath: $!";
        warn "$0: created $fpath as a named pipe\n";
    }
}

while (1) {
    # exit if signature file manually removed
    die "Pipe file disappeared" unless -p $fpath;
    # next line blocks until there's a reader
    sysopen(FIFO, $fpath, O_WRONLY)
        or die "can't write $fpath: $!";
    print FIFO "John Smith ([email protected])\n", `fortune -s`;
    close FIFO;
    select(undef, undef, undef, 0.2);  # sleep 1/5th second
}
The short sleep after the close is needed to give the reader a chance to read what was written. If we just immediately loop back up around and open the FIFO again before our reader has finished reading the data we just sent, then no end-of-file is seen because there's once again a writer. We'll both go round and round until during one iteration, the writer falls a little behind and the reader finally sees that elusive end-of-file. (And we were worried about race conditions?)