To a first approximation, Sparc programs only run on Sparc machines, Intel programs only run on Intel machines, and Perl programs only run on Perl machines. A Perl machine possesses those attributes that a Perl program would find ideal in a computer: memory that is automatically allocated and deallocated, fundamental data types that are dynamic strings, arrays, and hashes, and have no size limits, and systems that all behave pretty much the same way. The job of the Perl interpreter is to make whatever computer it happens to be running on appear to be one of these idealized Perl machines.
This fictitious machine presents the illusion of a computer specially designed to do nothing but run Perl programs. Each opcode produced by the compiler is a fundamental command in this emulated instruction set. Instead of a hardware program counter, the interpreter just keeps track of the current opcode to execute. Instead of a hardware stack pointer, the interpreter has its own virtual stack. This stack is very important because the Perl virtual machine (which we refuse to call a PVM) is a stack-based machine. Perl opcodes are internally called PP codes (short for "push-pop codes") because they manipulate the interpreter's virtual stack to find all operands, process temporary values, and store all results.
If you've ever programmed in Forth or PostScript, or used an HP scientific calculator with RPN ("Reverse Polish Notation") entry, you know how a stack machine works. Even if you haven't, the concept is simple: to add 3 and 4, you do things in the order 3 4 + instead of the more conventional 3 + 4. What this means in terms of the stack is that you push 3 and then 4 onto the stack, and + then pops both arguments off the stack, adds them, and pushes 7 back onto the stack, where it will sit until you do something else with it.
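To make the push-and-pop order concrete, here is a minimal sketch of an RPN evaluator written in Perl itself. The name rpn_eval and the operator table are our own inventions for illustration; nothing like them exists inside the interpreter, which does this work in C.

```perl
use strict;
use warnings;

# Evaluate a whitespace-separated RPN expression using an explicit stack.
# rpn_eval is a hypothetical helper for illustration, not part of Perl.
sub rpn_eval {
    my ($expr) = @_;
    my @stack;
    my %binop = (
        '+' => sub { $_[0] + $_[1] },
        '-' => sub { $_[0] - $_[1] },
        '*' => sub { $_[0] * $_[1] },
        '/' => sub { $_[0] / $_[1] },
    );
    for my $token (split ' ', $expr) {
        if (my $op = $binop{$token}) {
            die "stack underflow at '$token'\n" if @stack < 2;
            my $right = pop @stack;    # operands come off in reverse order
            my $left  = pop @stack;
            push @stack, $op->($left, $right);
        }
        else {
            push @stack, $token;       # anything else is treated as a number
        }
    }
    die "expected exactly one result\n" unless @stack == 1;
    return $stack[0];
}

print rpn_eval("3 4 +"), "\n";    # prints 7
```

The result stays on the stack until something consumes it, which is why a longer expression such as "2 3 4 * +" needs no parentheses.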
Compared with the Perl compiler, the Perl interpreter is a straightforward, almost boring, program. All it does is step through the compiled opcodes, one at a time, and dispatch them to the Perl run-time environment, that is, the Perl virtual machine. It's just a wad of C code, right?
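That stepping-and-dispatching loop can be sketched in a few lines. What follows is a toy model in Perl, not the real C code: the hash-based ops, the pp_const and pp_add routines, and run_ops are all hypothetical names. The one idea it assumes is that each op's routine does its work on a shared stack and then returns the next op to run.

```perl
use strict;
use warnings;

my @stack;    # the virtual operand stack shared by all ops

# Each hypothetical "pp" routine works on @stack and returns the next op.
sub pp_const {
    my ($op) = @_;
    push @stack, $op->{value};
    return $op->{next};
}

sub pp_add {
    my ($op) = @_;
    my $right = pop @stack;
    my $left  = pop @stack;
    push @stack, $left + $right;
    return $op->{next};
}

# Build the op chain for "3 4 +" back to front, so that each op can
# point at its successor.
sub build_chain {
    my $add   = { pp => \&pp_add,   next  => undef };
    my $four  = { pp => \&pp_const, value => 4, next => $add };
    my $three = { pp => \&pp_const, value => 3, next => $four };
    return $three;
}

# The run loop: the "program counter" is just the current op.
sub run_ops {
    my ($op) = @_;
    @stack = ();
    $op = $op->{pp}->($op) while $op;
    return @stack;
}

my @result = run_ops(build_chain());
print "@result\n";    # prints 7
```

The loop itself is a single statement; everything interesting happens inside the routines it dispatches to.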
Actually, it's not boring at all. A Perl virtual machine keeps track of a great deal of dynamic context on your behalf so that you don't have to. Perl maintains quite a few stacks, which you don't have to understand, but which we'll list here anyway just to impress you:
save stack: Where localized values are saved pending restoration. Many internal routines also localize values without your knowing it.
scope stack: The lightweight dynamic context that controls when the save stack should be "popped".
context stack: The heavyweight dynamic context; who called whom to get where you are now. The caller function traverses this stack. Loop-control functions scan this stack to find out which loop to control. When you peel back the context stack, the scope stack gets peeled back appropriately, which restores all your local variables from the save stack, even if you left the earlier context by nefarious methods such as raising an exception and longjmp(3)ing out.
jumpenv stack: The stack of longjmp(3) contexts that allows us to raise exceptions or exit expeditiously.
mark stack: Where the current variadic argument list on the operand stack starts.
recursive lexical pad stacks: Where the lexical variables and other "scratch register" storage is kept when subroutines are called recursively.
And of course, there's the C stack on which all the C variables are stored. Perl actually tries to avoid relying on C's stack for the storage of saved values, since longjmp(3) bypasses the proper restoration of such values.
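You can watch two of these stacks at work from ordinary Perl code. The sketch below is only an illustration, and the names call_chain, inner, outer, and $level are made up for it. It uses caller to walk the calling contexts, then leaves a scope by raising an exception to show that a localized value is still restored.

```perl
use strict;
use warnings;

our $level = "outer";

# Walk the call frames with caller() and collect the subroutine names.
sub call_chain {
    my @names;
    my $depth = 0;
    while (my @frame = caller($depth++)) {
        push @names, $frame[3];    # element 3 is the fully qualified sub name
    }
    return @names;
}

sub inner {
    local $level = "inner";        # old value is saved until this scope exits
    my @chain = call_chain();
    die "leaving early: @chain\n"; # exit the scope by raising an exception
}

sub outer { inner() }

eval { outer() };
print "caught: $@";
print "level is back to '$level'\n";    # prints: level is back to 'outer'
```

Nothing in inner explicitly undoes the local; the restoration happens as the contexts are peeled back on the way out to the eval.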
All this is to say that the usual view of an interpreter, a program that interprets another program, is really woefully inadequate to describe what's going on here. Yes, there's some C code implementing some opcodes, but when we say "interpreter", we mean something more than that, in the same way that when we say "musician", we mean something more than a set of DNA instructions for turning notes into sounds. Musicians are real, live organisms and have "state". So do interpreters.
Specifically, all this dynamic and lexical context, along with the global symbol tables, plus the parse trees, plus a thread of execution, is what we call an interpreter. As a context for execution, an interpreter really starts its existence even before the compiler starts, and can run in rudimentary form even as the compiler is building up the interpreter's context. In fact, that's precisely what's happening when the compiler calls into the interpreter to execute BEGIN blocks and such. And the interpreter can turn around and use the compiler to build itself up further. Every time you define another subroutine or load another module, the particular virtual Perl machine we call an interpreter is redefining itself. You can't really say that either the compiler or the interpreter is in control, because they're cooperating to control the bootstrap process we commonly call "running a Perl script". It's like bootstrapping a child's brain. Is it the DNA doing it or is it the neurons? A little of both, we think, with some input from external programmers.
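A small script shows the back-and-forth. This is just a sketch; @log and late_arrival are names invented for it. The BEGIN block runs while the file is still being compiled, and the string eval calls the compiler back in while the program is already running.

```perl
use strict;
use warnings;

our @log;

push @log, "run time: first statement";

# The compiler hands this block to the interpreter as soon as it has been
# parsed, which is before the statement above has run.
BEGIN { push @log, "compile time: BEGIN block" }

# Compiling a sub definition at run time extends the running interpreter.
eval q{ sub late_arrival { return "defined during run time" } 1; } or die $@;
push @log, "run time: " . late_arrival();

print "$_\n" for @log;
```

Although the BEGIN block appears second in the file, its entry is the first one in @log, and late_arrival does not exist at all until the eval has run.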
It's possible to run multiple interpreters in the same process; they may or may not share parse trees, depending on whether they were started by cloning an existing interpreter or by building a new interpreter from scratch. It's also possible to run multiple threads in a single interpreter, in which case they share not only parse trees but also global symbols--see Chapter 17.
But most Perl programs use only a single Perl interpreter to execute their compiled code. And while you can run multiple, independent Perl interpreters within one process, the current API for this is only accessible from C.[5] Each individual Perl interpreter serves the role of a completely separate process, but doesn't cost as much to create as a whole new process does. That's how Apache's mod_perl extension gets such great performance: when you launch a CGI script under mod_perl, that script has already been compiled into Perl opcodes, eliminating the need for recompilation--but more importantly, eliminating the need to start a new process, which is the real bottleneck. Apache initializes a new Perl interpreter in an existing process and hands that interpreter the previously compiled code to execute. Of course, there's much more to it than that--there always is. For more about mod_perl, see Writing Apache Modules with Perl and C (O'Reilly, 1999).
Many other applications such as nvi, vim, and innd can embed Perl interpreters; we can't hope to list them all here. There are a number of commercial products that don't even advertise that they have embedded Perl engines. They just use it internally because it gets their job done in style.
[5] With one exception, so far: revision 5.6.0 of Perl can do cloned interpreters in support of fork emulation on Microsoft Windows. There may well be a Perl API to "ithreads", as they're called, by the time you read this.