The three current backends that convert Perl opcodes into some other format are all emphatically experimental. (Yes, we said this before, but we don't want you to forget.) Even when they happen to produce output that runs correctly, the resulting programs may take more disk space, more memory, and more CPU time than they would ordinarily. This is an area of ongoing research and development. Things will get better.
The B::Bytecode module writes the parse tree's opcodes out in a platform-independent encoding. You can take a Perl script compiled down to bytecodes and copy the result to any other machine with Perl installed on it.
The standard but currently experimental perlcc(1) command knows how to convert Perl source code into a byte-compiled Perl program. All you have to do is:
% perlcc -b -o pbyscript srcscript
And now you should be able to directly "execute" the resulting pbyscript. The start of that file looks somewhat like this:
#!/usr/bin/perl
use ByteLoader 0.03;
^C^@^E^A^C^@^@^@^A^F^@^C^@^@^@^B^F^@^C^@^@^@^C^F^@^C^@^@^@
B^@^@^@^H9^A8M-^?M-^?M-^?M-^?7M-^?M-^?M-^?M-^?6^@^@^@^A6^@
^G^D^D^@^@^@^KR^@^@^@^HS^@^@^@^HV^@M-2W<^FU^@^@^@^@X^Y@Z^@
…
There you find a small script header followed by purely binary data. This may seem like deep magic, but its dweomer, er, dwimmer is at most a minor one. The ByteLoader module uses a technique called a source filter to alter the source code before Perl gets a chance to see it. A source filter is a kind of preprocessor that applies to everything below it in the current file. Unlike macro processors such as cpp(1) and m4(1), which are limited to simplistic transformations, a source filter has no constraints on what it may do. Source filters have been used to augment Perl's syntax, to compress or encrypt source code, even to write Perl programs in Latin. E perlibus unicode; cogito, ergo substr; carp dbm, et al. Er, caveat scriptor.
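To make the idea concrete, here is a minimal sketch of a source filter built on Filter::Util::Call, the core module that most Perl filters are built on. The package name Latinish and its two-word vocabulary are hypothetical, invented for this example, and the package is defined inline in a BEGIN block only so the whole thing fits in one file; a real filter would live in its own .pm file. Once `use Latinish;` has run, every later line of the file passes through the closure before Perl's tokenizer sees it, so `adde` and `scribe` reach the compiler as `+` and `print`:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical filter package, defined inline so the example is one file.
# A real filter would normally live in its own Latinish.pm.
BEGIN {
    package Latinish;
    use Filter::Util::Call qw(filter_add filter_read);

    sub import {
        # Hypothetical "Latin" words and the Perl they stand for.
        my %vocab = (scribe => 'print', adde => '+');
        my $word  = join '|', map { quotemeta($_) } keys %vocab;

        # Runs at compile time via "use Latinish;". The closure is attached to
        # the file being compiled and sees each line after the use statement.
        filter_add(sub {
            my $status = filter_read();    # appends the next line to $_
            s/\b($word)\b/$vocab{$1}/g if $status > 0;
            return $status;                # >0 more input, 0 EOF, <0 error
        });
    }

    # Mark the module as loaded so require() does not search @INC for it.
    $INC{'Latinish.pm'} = __FILE__;
}

use Latinish;

my $summa = 2 adde 3;        # the filter turns this into: 2 + 3
scribe "summa: $summa\n";    # ...and this into a print statement
```

Because the substitution is purely textual, it would just as happily rewrite those words inside strings and comments. That blindness is the price of a filter's freedom, and one more reason for the caveat.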
The ByteLoader module is a source filter that knows how to disassemble the serialized opcodes produced by B::Bytecode to reconstruct the original parse tree. The reconstituted Perl code is spliced into the current parse tree without using the compiler. When the interpreter hits those opcodes, it just executes them as though they'd been there waiting for it all along.
The remaining code generators, B::C and B::CC, both produce C code instead of serialized Perl opcodes. The code they generate is far from readable, and if you try to read it you'll just go blind. It's not something you can use to plug little translated Perl-to-C bits into a larger C program. For that, see Chapter 21.
The B::C module just writes out the C data structures needed to recreate the entire Perl run-time environment. You get a dedicated interpreter with all the compiler-built data structures pre-initialized. In some sense, the code generated is like what B::Bytecode produces. Both are a straight translation of the opcode trees that the compiler built, but where B::Bytecode lays them out in symbolic form to be recreated later and plugged into a running Perl interpreter, B::C lays those opcodes down in C.
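All three backends are B:: modules: rather than reparsing your program, they read the op tree the compiler has already built, through the B module. The following is a minimal sketch of that kind of introspection, not code taken from any backend; add_pair and ops_in_run_order are hypothetical names. It lists a subroutine's ops in the order the interpreter will run them, roughly the run-time order that B::CC follows when it lays out its C code:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use B ();

# Hypothetical sample subroutine whose compiled ops we want to inspect.
sub add_pair {
    my ($x, $y) = @_;
    return $x + $y;
}

# Collect a sub's op names in execution order: begin at START (the first op
# the interpreter runs) and follow each op's "next" link. The chain ends at
# a null op, which B hands back as a B::NULL object whose address is 0.
sub ops_in_run_order {
    my ($coderef) = @_;
    my $op = B::svref_2object($coderef)->START;
    my @names;
    while ($$op) {
        push @names, $op->name;
        $op = $op->next;
    }
    return @names;
}

my @ops = ops_in_run_order(\&add_pair);
print join(' -> ', @ops), "\n";
```

The exact op names printed vary between Perl versions, so treat the output as illustrative rather than fixed.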
When you compile this C code with your C compiler and link in the
Perl library, the resulting program won't need a Perl interpreter
installed on the target system. (It might need some shared
libraries, though, if you didn't link everything statically.)
However, this program isn't really any different from the regular
Perl interpreter that runs your script. It's just precompiled into a
standalone executable image.
The B::CC module, however, tries to do more than that. The beginning of the C source file it generates looks pretty much like what B::C produced,[6] but eventually, any similarity ends. In the B::C code, you have a big opcode table in C that's manipulated just as the interpreter would do on its own, whereas the C code generated by B::CC is laid out in the order corresponding to the run-time flow of your program. It even has a C function corresponding to each function in your program. B::CC also performs some optimization based on variable types; a few benchmarks can run twice as fast as in the standard interpreter. This is the most ambitious of the current code generators, the one that holds the greatest promise for the future. By no coincidence, it is also the least stable of the three.
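The difference between the two C backends can be caricatured in Perl itself. The toy below is an analogy, not a picture of what B::C or B::CC actually emit, and all its names are hypothetical. run_ops walks a chain of op structures and dispatches on each op's type, much as the interpreter (and hence a B::C executable) does at run time, while run_straight performs the same steps written out in execution order with no dispatch, which is roughly the shape B::CC aims for:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Shared value stack, loosely echoing Perl's own run-time argument stack.
my @stack;

# Toy "pp" handlers, one per op type (names are hypothetical).
my %pp = (
    const => sub { push @stack, $_[0]{value} },
    add   => sub { my $r = pop @stack; my $l = pop @stack; push @stack, $l + $r },
    mul   => sub { my $r = pop @stack; my $l = pop @stack; push @stack, $l * $r },
);

# Op structures for (2 + 3) * 4 in execution order, each linked to the next,
# loosely echoing how Perl threads its ops together.
my @ops = (
    { type => 'const', value => 2 },
    { type => 'const', value => 3 },
    { type => 'add' },
    { type => 'const', value => 4 },
    { type => 'mul' },
);
$ops[$_]{next} = $ops[ $_ + 1 ] for 0 .. $#ops - 1;

# Table-driven: a generic loop follows the chain and dispatches on each op's
# type at run time, as an interpreter's run loop does.
sub run_ops {
    my ($op) = @_;
    @stack = ();
    while ($op) {
        $pp{ $op->{type} }->($op);
        $op = $op->{next};
    }
    return pop @stack;
}

# Straight-line: the same steps written out in run-time order, with no op
# structures and no dispatch consulted while it runs.
sub run_straight {
    @stack = ();
    push @stack, 2;
    push @stack, 3;
    { my $r = pop @stack; my $l = pop @stack; push @stack, $l + $r }
    push @stack, 4;
    { my $r = pop @stack; my $l = pop @stack; push @stack, $l * $r }
    return pop @stack;
}

my $interpreted = run_ops($ops[0]);
my $straight    = run_straight();
print "$interpreted $straight\n";    # prints "20 20"
```

Both versions compute the same result; what changes is how much bookkeeping happens at run time. A real code generator working from the straight-line form could go further, for instance by keeping intermediate values in typed C variables instead of on a stack, which is the kind of type-based optimization described above.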
Computer science students looking for graduate thesis projects need look no further. There are plenty of diamonds in the rough waiting to be polished off here.