In English, as in many other spoken languages, we’re used to distinguishing between singular and plural. As a computer language designed by a human linguist, Perl is similar. As a general rule, when Perl has just one of something, that’s a scalar.[1]
A scalar is the simplest kind of data that Perl manipulates. Most scalars are either a number (like 255 or 3.25e20) or a string of characters (like hello[2] or the Gettysburg Address). Although you may think of numbers and strings as very different things, Perl uses them nearly interchangeably.
A scalar value can be acted upon with operators (like addition or concatenation), generally yielding a scalar result. A scalar value can be stored into a scalar variable. Scalars can be read from files and devices, and can be written out as well.
Although a scalar is most often either a number or a string, it’s useful to look at numbers and strings separately for the moment. We’ll cover numbers first, and then move on to strings.
As you’ll see in the next few paragraphs, you can specify both integers (whole numbers, like 255 or 2001) and floating-point numbers (real numbers with decimal points, like 3.14159, or 1.35 × 10^25). But internally, Perl computes with double-precision floating-point values.[3] This means that there are no integer values internal to Perl: an integer constant in the program is treated as the equivalent floating-point value.[4] You probably won’t notice the conversion (or care much), but you should stop looking for distinct integer operations (as opposed to floating-point operations), because there aren’t any.[5]
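As a minimal sketch of what this means in practice (the variable names are only illustrative, and the exact number of digits printed for the quotient depends on how your perl was built), an integer literal compares equal to its floating-point spelling, and division is not truncated:

```perl
use strict;
use warnings;

# An integer literal and the equivalent floating-point literal
# denote the same value.
my $same = (2001 == 2001.0) ? "equal" : "different";

# Division keeps its fractional part; it is not truncated to an integer.
my $quotient = 10 / 3;

print "2001 and 2001.0 are $same\n";
print "10 / 3 is $quotient\n";
```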
A literal is the way a value is represented in the source code of the Perl program. A literal is not the result of a calculation or an I/O operation; it’s data written directly into the source code.
Perl’s floating-point literals should look familiar to you. Numbers with and without decimal points are allowed (including an optional plus or minus prefix), and you can append a power-of-10 indicator (exponential notation) with E notation. For example:
1.25
255.000
255.0
7.25e45   # 7.25 times 10 to the 45th power (a big number)
-6.5e24   # negative 6.5 times 10 to the 24th
          # (a big negative number)
-12e-24   # negative 12 times 10 to the -24th
          # (a very small negative number)
-1.2E-23  # another way to say that - the E may be uppercase
Integer literals are also straightforward, as in:
0
2001
-40
255
61298040283768
That last one is a little hard to read. Perl allows underscores for clarity within integer literals, so we can also write that number like this:
61_298_040_283_768
It’s the same value; it merely looks different to us human beings. You might have thought that commas should be used for this purpose, but commas are already used for a more-important purpose in Perl (as we’ll see in the next chapter).
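A quick way to convince yourself, sketched here with two illustrative variable names, is to compare the two spellings directly:

```perl
use strict;
use warnings;

my $plain      = 61298040283768;
my $underscore = 61_298_040_283_768;

# Underscores inside a numeric literal are ignored,
# so both spellings produce the same number.
my $match = ($plain == $underscore) ? "same value" : "different values";

print "$match\n";   # prints "same value"
```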
Like many other programming languages, Perl allows you to specify numbers in other than base 10 (decimal). Octal (base 8) literals start with a leading 0, hexadecimal (base 16) literals start with a leading 0x, and binary (base 2) literals start with a leading 0b.[6] The hex digits A through F (or a through f) represent the conventional digit values of ten through fifteen. For example:
0377        # 377 octal, same as 255 decimal
0xff        # FF hex, also 255 decimal
0b11111111  # also 255 decimal (available in version 5.6 and later)
Although these values look different to us humans, they’re all three the same number to Perl. It makes no difference to Perl whether you write 0xFF or 255.000, so choose the representation that makes the most sense to you and your maintenance programmer (by which we mean the poor chap who gets stuck trying to figure out what you meant when you wrote your code; most often, this poor chap is you, and you can’t recall why you did what you did three months ago).
When a non-decimal literal is more than about four characters long, it may be hard to read. For this reason, starting in version 5.6, Perl allows underscores for clarity within these literals:
0x1377_0b77
0x50_65_72_7C
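The following sketch, assuming Perl 5.6 or later for the binary literal and using variable names chosen only for illustration, prints the octal, hex, and binary spellings of 255 in decimal and checks that underscores leave a hex literal's value unchanged:

```perl
use strict;
use warnings;

my $octal  = 0377;
my $hex    = 0xff;
my $binary = 0b11111111;

# printf's %d format shows each value in decimal.
printf "%d %d %d\n", $octal, $hex, $binary;   # prints "255 255 255"

# Underscores only aid the reader; the value is unchanged.
my $grouped   = 0x50_65_72_7C;
my $ungrouped = 0x5065727C;
print "grouped and ungrouped literals match\n" if $grouped == $ungrouped;
```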
Perl provides the typical ordinary addition, subtraction, multiplication, and division operators, and so on. For example:
2 + 3       # 2 plus 3, or 5
5.1 - 2.4   # 5.1 minus 2.4, or 2.7
3 * 12      # 3 times 12 = 36
14 / 2      # 14 divided by 2, or 7
10.2 / 0.3  # 10.2 divided by 0.3, or 34
10 / 3      # always floating-point divide, so 3.3333333...
Perl also supports a modulus operator (%). The value of the expression 10 % 3 is the remainder when ten is divided by three, which is one. Both values are first reduced to their integer values, so 10.5 % 3.2 is computed as 10 % 3.[7]
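Here is a small sketch of both cases with positive operands; the variable names are only for illustration:

```perl
use strict;
use warnings;

# Remainder when ten is divided by three.
my $remainder = 10 % 3;

# Fractional operands are reduced to integers first,
# so this is computed as 10 % 3.
my $reduced = 10.5 % 3.2;

print "10 % 3 is $remainder\n";     # prints "10 % 3 is 1"
print "10.5 % 3.2 is $reduced\n";   # prints "10.5 % 3.2 is 1"
```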
Additionally, Perl provides the FORTRAN-like exponentiation operator, which many have yearned for in Pascal and C. The operator is represented by the double asterisk, such as 2**3, which is two to the third power, or eight.[8]
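A brief sketch with small positive integer operands (the second expression and the variable names are added here only for illustration):

```perl
use strict;
use warnings;

my $cube     = 2 ** 3;    # two to the third power
my $thousand = 10 ** 3;   # ten to the third power

print "2 ** 3 is $cube\n";        # prints "2 ** 3 is 8"
print "10 ** 3 is $thousand\n";   # prints "10 ** 3 is 1000"
```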
In addition, there are other numeric operators, which we’ll introduce as we need them.
Strings are sequences of characters (like hello). Strings may contain any combination of any characters.[9]
The shortest possible string has no characters. The longest string fills all of your available memory (although you wouldn’t be able to do much with that). This is in accordance with the principle of “no built-in limits” that Perl follows at every opportunity. Typical strings are printable sequences of letters and digits and punctuation in the ASCII 32 to ASCII 126 range. However, the ability to have any character in a string means you can create, scan, and manipulate raw binary data as strings—something with which many other utilities would have great difficulty. For example, you could update a graphical image or compiled program by reading it into a Perl string, making the change, and writing the result back out.
Like numbers, strings have a literal representation, which is the way you represent the string in a Perl program. Literal strings come in two different flavors: single-quoted string literals and double-quoted string literals.
A single-quoted string literal is a sequence of characters enclosed in single quotes. The single quotes are not part of the string itself—they’re just there to let Perl identify the beginning and the ending of the string. Any character other than a single quote or a backslash between the quote marks (including newline characters, if the string continues onto successive lines) stands for itself inside a string. To get a backslash, put two backslashes in a row, and to get a single quote, put a backslash followed by a single quote. In other words:
'fred'    # those four characters: f, r, e, and d
'barney'  # those six characters
''        # the null string (no characters)
'Don\'t let an apostrophe end this string prematurely!'
'the last character of this string is a backslash: \\'
'hello\n' # hello followed by backslash followed by n
'hello
there'    # hello, newline, there (11 characters total)
'\'\\'    # single quote followed by backslash
Note that the \n within a single-quoted string is not interpreted as a newline, but as the two characters backslash and n. Only when the backslash is followed by another backslash or a single quote does it have special meaning.
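One way to check these rules, sketched here with the built-in length function and illustrative variable names, is to count the characters each literal actually produces:

```perl
use strict;
use warnings;

my $with_n     = 'hello\n';        # backslash and n stay as two characters
my $apostrophe = 'Don\'t';         # \' yields a single quote
my $backslash  = 'ends with \\';   # \\ yields one backslash

print length($with_n), "\n";       # prints 7
print length($apostrophe), "\n";   # prints 5
print length($backslash), "\n";    # prints 11
```

The "\n" passed to print is a double-quoted string, which is where the backslash-n pair does become a newline.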
A double-quoted string literal is similar to the strings you may have seen in other languages. Once again, it’s a sequence of characters, although this time enclosed in double quotes. But now the backslash takes on its full power to specify certain control characters, or even any character at all through octal and hex representations. Here are some double-quoted strings:
"barney"        # just the same as 'barney'
"hello world\n" # hello world, and a newline
"The last character of this string is a quote mark: \""
"coke\tsprite"  # coke, a tab, and sprite
Note that the double-quoted literal string "barney" means the same six-character string to Perl as does the single-quoted literal string 'barney'. It’s like what we saw with numeric literals, where we saw that 0377 was another way to write 255.0. Perl lets you write the literal in the way that makes more sense to you. Of course, if you wish to use a backslash escape (like \n to mean a newline character), you’ll need to use the double quotes.
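The difference between the two quoting styles can be sketched by comparing strings and counting characters. The eq string comparison operator and the length function are standard Perl; the variable names are only illustrative:

```perl
use strict;
use warnings;

# Without escapes, the two quoting styles give identical strings.
my $quotes_agree = ('barney' eq "barney") ? "yes" : "no";

my $double = "hello world\n";   # 11 characters plus one newline
my $single = 'hello world\n';   # 11 characters plus backslash and n
my $tabbed = "coke\tsprite";    # coke, one tab character, sprite

print "same string: $quotes_agree\n";   # prints "same string: yes"
print length($double), "\n";            # prints 12
print length($single), "\n";            # prints 13
print length($tabbed), "\n";            # prints 11
```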
The backslash can precede many different characters to mean different things (generally called a backslash escape). The nearly complete[10] list of double-quoted string escapes is given in Table 2-1.