As we discussed in Chapter 18, perl (the program) contains both a compiler and an interpreter for programs written in Perl (the language). The Perl compiler/interpreter is itself written in C. In this chapter, we'll sketch how that C program works from the perspective of someone who wants either to extend or to embed Perl. When you extend Perl, you're putting a chunk of C code (called the extension) under the control of Perl, and when you embed Perl you're putting a Perl interpreter[1] under the control of a larger C program.
The brief coverage we provide here is no substitute for the online documentation of Perl's innards: see the documentation for perlguts, perlxs, perlxstut, perlcall, perlapi, and h2xs, all bundled with Perl. Again, unless you're extending or embedding Perl, you will never need to know any of this stuff.
Presuming you need to know, what you need to know first is a bit about Perl's guts. You'll also need to know C for most of what follows. You'll need a C compiler to run the examples. If your end goal is to create a module for other people to use, they'll need a C compiler too. Many of these examples will only run on Unix-like systems. Oh, and this material is subject to change in future releases of Perl.
In other words, here be dragons.
When the Perl compiler is fed a Perl program, the first task it performs is lexical analysis: breaking down the program into its basic syntactic elements (often called tokens). If the program is:
print "Hello, world!\n";
the lexical analyzer breaks it down into three tokens: print, "Hello, world!\n", and the final semicolon. The token sequence is then parsed, fixing the relationship between the tokens. In Perl, the boundary between lexical analysis and parsing is blurred more than in other languages. (Other computer languages, that is. If you think about all the different meanings new Critter might have depending on whether there's a Critter package or a subroutine named new, you'll understand why. On the other hand, we disambiguate these kinds of things all the time in English.)
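To make the new Critter case concrete, here is a minimal sketch of one situation the parser has to sort out. The Critter class and its trivial constructor are stand-ins invented for illustration. The sketch also assumes a perl on which the indirect-object syntax is still enabled, which is the default unless you ask for a newer feature bundle such as use v5.36.

```perl
use strict;
use warnings;

package Critter;

# A trivial constructor, just enough to have something to call.
sub new {
    my ($class) = @_;
    return bless {}, $class;
}

package main;

# Indirect-object syntax. The Critter package has already been seen at
# compile time and there is no sub named "new" in this package, so the
# parser treats this as a method call on the class Critter.
my $pet = new Critter;

# The arrow form requests the same method call without relying on any
# guessing by the parser.
my $other = Critter->new;

print ref($pet), " ", ref($other), "\n";
```

Both variables end up holding Critter objects. The arrow form is generally the safer habit, precisely because it gives the tokener nothing to guess about.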
Once a program has been parsed and (presumably) understood, it is compiled into a tree of opcodes representing low-level operations, and finally that tree of operations is executed--unless you invoked Perl with the -c ("check syntax") switch, which exits upon completing the compilation phase. It is during compilation, not execution, that BEGIN blocks, CHECK blocks, and use statements are executed.
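A small experiment can make the split between the two phases visible. The following is only a sketch, and the @order array is a name we made up for the occasion. It records when each piece of code runs, with the blocks deliberately placed after the ordinary statement they precede in time.

```perl
use strict;
use warnings;

our @order;    # records the order in which each piece of code runs

# An ordinary statement: this waits for the execution phase.
push @order, 'main code';

# Queued while compiling, run once compilation of the file is finished.
CHECK { push @order, 'CHECK' }

# Run as soon as the parser reaches the closing brace.
BEGIN { push @order, 'BEGIN' }

print join(' -> ', @order), "\n";
```

Run as an ordinary script, this should report BEGIN first, then CHECK, then the main code, even though the source lists them the other way around. If you run the same file under the -c switch, the BEGIN and CHECK blocks still fire, but the push and print statements at file level never do.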
[1] While we are careful to distinguish the compiler from the interpreter when that distinction is important, it gets a bit wearisome to keep saying "compiler/interpreter", so we often just shorten that to "interpreter" to mean the whole glob of C code and data that functions like one instance of perl (the program); when you're embedding Perl, you can have multiple instances of the interpreter, but each behaves like its own little perl.