One of the principles underlying Perl's design is that simple things should be simple, and hard things should be possible. Documentation should be simple.
Perl supports a simple text markup format called pod that can stand on its own or be freely intermixed with your source code to create embedded documentation. Pod can be converted to many other formats for printing or viewing, or you can just read it directly, because it's plain.
Pod is not as expressive as languages like XML, LaTeX, troff(1), or even HTML. This is intentional: we sacrificed that expressiveness for simplicity and convenience. Some text markup languages make authors write more markup than text, which makes writing harder than it has to be, and reading next to impossible. A good format, like a good movie score, stays in the background without causing distraction.
Getting programmers to write documentation is almost as hard as getting them to wear ties. Pod was designed to be so easy to write that even a programmer could do it--and would. We don't claim that pod is sufficient for writing a book, although it was sufficient for writing this one.
Most document formats require the entire document to be in that format. Pod is more forgiving: you can embed pod in any sort of file, relying on pod translators to extract the pod. Some files consist entirely of 100% pure pod. But other files, notably Perl programs and modules, may contain dollops of pod sprinkled about wherever the author feels like it. Perl simply skips over the pod text when parsing the file for execution.
The Perl lexer knows to begin skipping when, at a spot where it would ordinarily find a statement, it instead encounters a line beginning with an equal sign and an identifier, like this:
    =head1 Here There Be Pods!
That text, along with all remaining text up through and including a line beginning with =cut, will be ignored. This allows you to intermix your source code and your documentation freely, as in:
    =item snazzle

    The snazzle() function will behave in the most spectacular
    form that you can possibly imagine, not even excepting
    cybernetic pyrotechnics.

    =cut

    sub snazzle {
        my $arg = shift;
        ….
    }

    =item razzle

    The razzle() function enables autodidactic epistemology generation.

    =cut

    sub razzle {
        print "Epistemology generation unimplemented on this platform.\n";
    }
For more examples, look at any standard or CPAN Perl module. They're all supposed to come with pod, and nearly all do, except for the ones that don't.
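The skipping rule described above can be sketched in a few lines of Perl. This is only an illustration, not how the Perl lexer is implemented: the helper name split_pod is made up for this example, and it simplifies the rule by treating any line that starts with an equal sign and a letter as the start of pod, without checking that a statement could begin there. The lines it sets aside as pod are the ones a pod translator would keep.

```perl
use strict;
use warnings;

# Hypothetical helper: split source text into the lines Perl would
# compile and the lines it would skip as pod. Simplified on purpose;
# the real lexer also checks that a statement could start at that spot.
sub split_pod {
    my ($source) = @_;
    my (@code, @pod);
    my $in_pod = 0;
    for my $line (split /\n/, $source) {
        # A line beginning with "=" and an identifier switches to pod mode.
        $in_pod = 1 if !$in_pod && $line =~ /^=[a-zA-Z]/;
        if ($in_pod) {
            push @pod, $line;
            # The =cut line is itself skipped, then code resumes.
            $in_pod = 0 if $line =~ /^=cut\b/;
        }
        else {
            push @code, $line;
        }
    }
    return (\@code, \@pod);
}

# Single-quoted strings keep the \n inside each print statement literal.
my $source = join "\n",
    'print "got 1\n";',
    '',
    '=head1 Here There Be Pods!',
    '',
    'Some documentation.',
    '',
    '=cut',
    '',
    'print "got 2\n";';

my ($code, $pod) = split_pod($source);
print "CODE: $_\n" for grep { /\S/ } @$code;
print "POD:  $_\n" for grep { /\S/ } @$pod;
```

Running the sketch lists the two print statements as code and the three nonblank lines from =head1 through =cut as pod.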
Since pod is recognized by the Perl lexer and thrown out, you may also use an appropriate pod directive to quickly comment out an arbitrarily large section of code. Use a =for pod block to comment out one paragraph, or a =begin/=end pair for a larger section. We'll cover the syntax of those pod directives later. Remember, though, that in both cases, you're still in pod mode afterwards, so you need to =cut back to the compiler.
    print "got 1\n";

    =for commentary
    This paragraph alone is ignored by anyone except the
    mythical "commentary" translator.  When it's over, you're
    still in pod mode, not program mode.

    print "got 2\n";

    =cut

    # ok, real program again
    print "got 3\n";

    =begin comment

    print "got 4\n";

    all of this stuff
    here will be ignored
    by everyone

    print "got 5\n";

    =end comment

    =cut

    print "got 6\n";
This will print out that it got 1, 3, and 6. Remember that these pod directives can't go just anywhere. You have to put them only where the parser is expecting to see a new statement, not just in the middle of an expression or at other arbitrary locations.
From the viewpoint of Perl, all pod markup is thrown out, but from the viewpoint of pod translators, it's the code that is thrown out. Pod translators view the remaining text as a sequence of paragraphs separated by blank lines. All modern pod translators parse pod the same way, using the standard Pod::Parser module. They differ only in their output, since each translator specializes in one output format.
There are three kinds of paragraphs: verbatim paragraphs, command paragraphs, and prose paragraphs.
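As a rough preview of how the three kinds are told apart, here is a small Perl sketch that looks only at how each paragraph begins. The helper name classify_paragraph and the sample text are made up for this illustration; a real translator does considerably more work than this.

```perl
use strict;
use warnings;

# Hypothetical helper: decide the kind of one pod paragraph from the
# way it begins.
sub classify_paragraph {
    my ($para) = @_;
    return 'command'  if $para =~ /\A=[a-zA-Z]/;  # "=" followed by an identifier
    return 'verbatim' if $para =~ /\A[ \t]/;      # begins with a space or tab
    return 'prose';                               # anything else is flowed text
}

# Sample pod, built from single-quoted strings so $x is left alone.
my $pod = join "\n",
    '=head1 NAME',
    '',
    'This is flowed text',
    'spanning two lines.',
    '',
    '    my $x = 1;   # literal code',
    '';

# Paragraphs are separated by one or more blank lines.
my @paragraphs = grep { length } split /\n(?:[ \t]*\n)+/, $pod;
my @kinds      = map  { classify_paragraph($_) } @paragraphs;

# Show each kind next to the first line of its paragraph.
print "$kinds[$_]: ", (split /\n/, $paragraphs[$_])[0], "\n" for 0 .. $#kinds;
```

The sample contains one paragraph of each kind, in the order command, prose, verbatim.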
Verbatim paragraphs are used for literal text that you want to appear as is, such as snippets of code. A verbatim paragraph must be indented; that is, it must begin with a space or tab character. The translator should reproduce it exactly, typically in a constant-width font, with tabs assumed to be on eight-column boundaries. There are no special formatting escapes, so you can't play font games to italicize or embolden. A < character means a literal <, and nothing else.
All pod directives start with = followed by an identifier. This may be followed by any amount of arbitrary text that the directive can use however it pleases. The only syntactic requirement is that the text must all be one paragraph. Currently recognized directives (sometimes called pod commands) are:
=head1
=head2
The =head1, =head2,... directives produce headings at the level specified. The rest of the text in the paragraph is treated as the heading description. These are similar to the .SH and .SS section and subsection headers in man(7), or to <H1>...</H1> and <H2>...</H2> tags in HTML. In fact, that's exactly what those translators convert these directives into.
=cut
The =cut directive indicates the end of a stretch of pod. (There might be more pod later in the document, but if so it will be introduced with another pod directive.)
=pod
The =pod directive does nothing beyond telling the compiler to lay off parsing code through the next =cut. It's useful for adding another paragraph to the document if you're mixing up code and pod a lot.
=over NUMBER
=item SYMBOL
=back
The =over directive starts a section specifically for the generation of a list using the =item directive. At the end of your list, use =back to end it. The NUMBER, if provided, hints to the formatter how many spaces to indent. Some formatters aren't rich enough to respect the hint, while others are too rich to respect it, insofar as it's difficult when working with proportional fonts to make anything line up merely by counting spaces. (However, four spaces is generally construed as enough room for bullets or numbers.)
The actual type of the list is indicated by the SYMBOL on the individual items.
Here is a bulleted list:
    =over 4

    =item *

    Mithril armor

    =item *

    Elven cloak

    =back
And a numbered list:
    =over 4

    =item 1.

    First, speak "friend".

    =item 2.

    Second, enter Moria.

    =back
And a named list:
    =over 4

    =item armor()

    Description of the armor() function

    =item chant()

    Description of the chant() function

    =back
You may nest lists of the same or different types, but some basic rules apply: don't use =item outside an =over/=back block; use at least one =item inside an =over/=back block; and perhaps most importantly, keep the type of the items consistent within a given list. Either use =item * for each item to produce a bulleted list, or =item 1., =item 2., and so on to produce a numbered list, or use =item foo, =item bar, and so on to produce a named list. If you start with bullets or numbers, stick with them, since formatters are allowed to use the first =item type to decide how to format the list.
As with everything in pod, the result is only as good as the translator. Some translators pay attention to the particular numbers (or letters, or Roman numerals) following the =item, and others don't. The current pod2html translator, for instance, is quite cavalier: it strips out the sequence indicators entirely without looking at them to infer what sequence you're using, then wraps the entire list inside <OL> and </OL> tags so that the browser can display it as an ordered list in HTML. This is not to be construed as a feature; it may eventually be fixed.
=for TRANSLATOR
=begin TRANSLATOR
=end TRANSLATOR
=for, =begin, and =end let you include special sections to be passed through unaltered, but only to particular formatters. Formatters that recognize their own names, or aliases for their names, in TRANSLATOR pay attention to that directive; any others completely ignore it. The directive =for specifies that just the rest of this paragraph is destined for a particular translator.
    =for html <p> This is a <flash>raw</flash> <small>HTML</small> paragraph </p>
The paired =begin and =end directives work similarly to =for, but instead of accepting a single paragraph only, they treat all text between matched =begin and =end as destined for a particular translator. Some examples:
    =begin html

    <br>Figure 1.<IMG SRC="figure1.png"><br>

    =end html

    =begin text

      ---------------
      |  foo        |
      |        bar  |
      ---------------

    ^^^^ Figure 1. ^^^^

    =end text
Values of TRANSLATOR commonly accepted by formatters include roff, man, troff, nroff, tbl, eqn, latex, tex, html, and text. Some formatters will accept some of these as synonyms. No translator accepts comment--that's just the customary word for something to be ignored by everybody. Any unrecognized word would serve the same purpose. While writing this book, we often left notes for ourselves under the directive =for later.
Note that =begin and =end do nest, but only in the sense that the outermost matched set causes everything in the middle to be treated as nonpod, even if it happens to contain other =word directives. That is, as soon as any translator sees =begin foo, it will either ignore or process everything down to the corresponding =end foo.
The third type of paragraph is simply "flowed" text. That is, if a paragraph doesn't start with either whitespace or an equals sign, it's taken as a plain paragraph: regular text that's typed in with as few frills as possible. Newlines are treated as equivalent to spaces. It's largely up to the translator to make it look nice, because programmers have more important things to do. It is assumed that translators will apply certain common heuristics--see the section "Pod Translators and Modules" later in this chapter.
You can do some things explicitly, however. Inside either ordinary paragraphs or heading/item directives (but not in verbatim paragraphs), you may use special sequences to adjust the formatting. These sequences always start with a single capital letter followed by a left angle bracket, and extend through the matching (not necessarily the next) right angle bracket. Sequences may contain other sequences.
Here are the sequences defined by pod:
I<text>
Italicized text, used for emphasis, book titles, names of ships, and manpage references such as "perlpod (1)".
B<text>
Emboldened text, used almost exclusively for command-line switches and sometimes for names of programs.
C<text>
Literal code, probably in a fixed-width font like Courier. Not needed on simple items that the translator should be able to infer as code, but you should use it anyway.
S<text>
Text with nonbreaking spaces. Often surrounds other sequences.
L<name>
A cross reference (link) to a name:
    L<name>          Manual page
    L<name/ident>    Item in manual page
    L<name/"sec">    Section in other manual page
    L<"sec">         Section in this manual page (the quotes are optional)
    L</"sec">        Ditto
The next five sequences are the same as those above, but the output will be only text, with the link information hidden as in HTML:
    L<text|name>
    L<text|name/ident>
    L<text|name/"sec">
    L<text|"sec">
    L<text|/"sec">
The text cannot contain the characters / and |, and should contain < or > only in matched pairs.
F<pathname>
Used for filenames. This is traditionally rendered the same as I<>.
X<entry>
An index entry of some sort. As always, it's up to the translator to decide what to do. The pod specification doesn't dictate that.
E<escape>
A named character, similar to HTML escapes:
    E<lt>        A literal < (optional except in other interior sequences and when preceded by a capital letter)
    E<gt>        A literal > (optional except in other interior sequences)
    E<sol>       A literal / (needed in L<> only)
    E<verbar>    A literal | (needed in L<> only)
    E<NNN>       Character number NNN, probably in ISO-8859-1, but maybe Unicode. Shouldn't really matter, in the abstract...
    E<entity>    Some nonnumeric HTML entity, such as E<Agrave>.
Z<>
A zero-width character. This is nice for putting in front of sequences that might confuse something. For example, if you had a line in regular prose that had to start with an equals sign, you could write that as:
    Z<>=can you see
or for something with a "From" in it, so the mailer doesn't put a > in front:
    Z<>From here on out…
Most of the time, you'll need only a single set of angle brackets to delimit one of these pod sequences. Sometimes, however, you will want to put a < or > inside a sequence. (This is particularly common when using a C<> sequence to provide a constant-width font for a snippet of code.) As with all things in Perl, there is more than one way to do it. One way is to simply represent the closing bracket with an E sequence:
    C<$a E<lt>=E<gt> $b>
This produces "$a <=> $b".
A more readable, and perhaps more "plain" way, is to use an alternate set of delimiters that doesn't require the angle brackets to be escaped. Doubled angle brackets (C<< stuff >>) may be used, provided there is whitespace immediately following the opening delimiter and immediately preceding the closing one. For example, the following will work:
    C<< $a <=> $b >>
You may use as many repeated angle-brackets as you like so long as you have the same number of them on both sides, and you make sure that whitespace immediately follows the last < of the left side and immediately precedes the first > of the right side. So the following will also work:
    C<<< $a <=> $b >>>
    C<<<< $a <=> $b >>>>
All these end up spitting out $a <=> $b in a constant-width font.
The extra whitespace inside on either end goes away, so you should leave whitespace on the outside if you want it. Also, the two inside chunks of extra whitespace don't overlap, so if the first thing being quoted is >>, it isn't taken as the closing delimiter:
    The C<< >> >> right shift operator.
This produces "The >> right shift operator."
Note that pod sequences do nest. That means you can write "The I<Santa MarE<iacute>a> left port already" to produce "The Santa María left port already", or "B<touch> S<B<-t> I<time>> I<file>" to produce "touch -t time file", and expect this to work properly.
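One way to see how the doubled delimiters and nested sequences come out is to push a short pod string through a translator. The sketch below assumes the Pod::Text module that ships with recent Perl releases, and it uses the output_string and parse_string_document methods that Pod::Text inherits from Pod::Simple. The sample pod is made up for this illustration, and the exact decoration of the result (quotation marks around code, asterisks around italics) depends on the installed version.

```perl
use strict;
use warnings;
use Pod::Text;

# Sample pod in a single-quoted heredoc, so Perl leaves $a and $b alone.
my $pod = <<'END_POD';
=pod

Sort numerically with C<< $a <=> $b >> in the comparison block.

The C<< >> >> right shift operator.

Run B<touch> S<B<-t> I<time>> I<file> to set a timestamp.

=cut
END_POD

my $parser = Pod::Text->new(width => 72);
$parser->output_string(\my $text);      # collect output in $text, not STDOUT
$parser->parse_string_document($pod);

print $text;
```

Whatever decoration the translator adds, the comparison operator and the shift operator should come through with their angle brackets intact and without the padding whitespace from inside the delimiters.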