CHAPTER 2

Basic Concepts

In the next few chapters, Chapters 3 to 11, we cover the fundamentals of the Perl programming language in depth. This chapter is an introduction to the basics of Perl that newcomers to the language can use to familiarize themselves before we get into the details.

All the subjects covered here are examined in detail in later chapters, so this chapter is only really required reading for those totally new to Perl, or who need a quick refresher without getting bogged down in too much detail early on.

The topics covered here are as follows:

  • Variables and Values
  • Comments and Whitespace
  • Operators and Built-In Functions
  • Expressions and Statements
  • Perl's Data Types
  • Evaluation Context
  • Perl's Special Variables
  • String Interpolation
  • Matching, Substitution, and Transliteration
  • Blocks, Loops, and Conditions
  • Subroutines
  • Modules and Packages
  • Enabling Warnings and Strictness
  • Variable Declarations, Scope, and Visibility

The following is a short example Perl program that, once we have mastered the basics covered in this chapter, we will be fully able to understand:

#!/usr/bin/perl
# hello.pl
use strict;
use warnings;

sub hello {
    my ($argument)=@_;
    # the traditional salutation
    print "Hello $argument World!\n";
}
$ARGV[0] = "Brave New" unless scalar(@ARGV)>0;

hello($ARGV[0]);

If we save this program as hello.pl, we should be able to run it like this:

> hello.pl Perl
Hello Perl World!

Programmers who already have some experience in Perl will discover things they didn't know in the chapters that follow, but can nonetheless safely skip this introduction.

Values and Variables

A value is any single item of data that can be used in an expression to produce a new value or generate some kind of result. The following are all values:

1066
3.1415926
"Hello World"

These are literal or anonymous values, which means they are written directly into the source code. They are not associated with any label or identifier—they just are. As such, they are constants that we cannot manipulate. In order to do that, we need variables.

Variables are named identifiers that store a value. The value of a variable can be read or changed through the variable's name. In Perl, the data type of the variable is indicated by prefixing the name with a punctuation symbol. This is different from languages like C, which rely on a type specification in the declaration of a variable. However, Perl is not concerned with the precise type of a variable, whether it is an integer or a string or a reference and so on, but only its storage type: scalar, array, or hash.

Scalar variables store a single value, such as the preceding ones, and are prefixed with a $. Consider this example:

$counter

Array and hash variables look similar, but can store multiple values. We'll return to Perl's other data types shortly, so here are some examples in the meantime:

@array
%hash

A variable name can consist of any mixture of alphanumeric characters, including the underscore character, up to a total of 251 characters. From Perl 5.8 onward variables may even have accented and Unicode characters in their names. However, there are some limitations, notably that a variable name cannot start with a number. Here are some valid scalar variable names:

$A_Scalar_Variable
$scalarNo8
$_private_scalar

These, on the other hand, will cause Perl to complain:

$64bitint            (leading numbers not legal)
$file-handle         (minus sign not legal)
$exchangerateto£     (pound symbol not legal)

Assigning values to variables is done with the assignment operator (i.e., an equals sign).

$variable = 42; # a Perl statement

This is a Perl statement, consisting of a variable, a value, an operator, and terminated by a semicolon to mark the end of the statement. It uses whitespace to improve the legibility of the statement, and is followed by a descriptive comment. Let's look at each of these concepts next, starting with comments and whitespace.

Comments and Whitespace

In the preceding examples, we surrounded the assignment with space characters. While not a requirement, it does improve readability. Whitespace is so called because it refers to anything that is just filler—literally, the white parts of a printed page. Perl is very forgiving when it comes to the use of whitespace (tabs, spaces, and newlines), so use it wherever it improves legibility.

Comments, indicated with a #, are segments of text that have no effect on the program at either compile or run time. They are provided to—hopefully—describe what the program is doing at that point. They are discarded by Perl during the compilation of the code and do not affect program execution in any way.

We can use whitespace and comments to improve the legibility of our code. As these two equivalent statements illustrate, whitespace can be used to help visually separate out matching parentheses in a complex expression. A comment allows us to document what the code actually does.

  print 1+2*(3*(rand(4)-5))+6;
  print 1 + 2 * ( 3*(rand(4)-5) ) + 6; # print expression

Perl's tendency to use punctuation for almost everything can result in often illegible code—see the section "Special Variables" later in this chapter. Judicious use of whitespace can make the difference between maintainable code and an illegible mess.

However, use of whitespace is not without restrictions. Specifically, whitespace cannot be used in places where it can confuse the interpreter. For instance, we cannot use whitespace in variable names, so the following variable name is not legal:

$sca lar;        # ERROR: we may use $scalar or $sca_lar instead.

In Perl, comments are also considered whitespace from the perspective of parsing source code. That means that whenever we can split a statement across multiple lines we can place a comment at the end of any line.

$string="one".   # first part
        "two".   # second part
        "three"; # after the statement

Variables and values are not very useful unless we have a means to manipulate them, such as the print function and arithmetic operators we used earlier. We will take a look at operators and functions next.

Operators and Functions

Operators are the fundamental tools with which we can process values and variables to produce new values. Numerical operators include addition, subtraction, multiplication, division, modulus, and raising powers (exponentiation) as shown in the following examples:

$number = 4 + 5;      # 4 plus 5 is 9
$number = 4 / 5;      # 4 divided by 5 is 0.8
$number = 4 % 5;      # 4 modulus 5 is 4
$number = 4 ** 0.5;   # 4 to the power of 0.5 is 2
$number = -4;         # negative 4 is -4

String operators include the concatenation operator:

$string = "P" . "e" . "r" . "l";

We should also not forget the comparison operators like <, >=, and == for numbers (and their equivalents lt, ge, and eq for strings).

$under_a_hundred = $number < 100;
$middle_or_higher = $string ge "middle";

The values that operators manipulate are called operands. Binary operators like addition have a left operand and a right operand. Unary operators like the unary minus earlier have a single right operand.

Perl has many built-in functions that can be used like operators. Some functions are unary operators, also called scalar operators, because they take a single value to process. Others are list operators that take either a fixed or arbitrary number of values. Either way, these values are always on the right-hand side. The print function is a good example of an arbitrary list operator, since it will print any quantity of values given to it:

print "A","list","of",5,"values","\n";

In short, every built-in function is either a list operator or a scalar operator, and scalar operator is just another term for unary operator.

Operators have precedence and associativity, which together control the order in which Perl applies them. Precedence decides which of two different operators binds more tightly, and associativity decides how operators of equal precedence are grouped. Multiplication has a higher precedence than addition, while addition and subtraction share the same precedence and are left associative (meaning that a chain of them is evaluated from left to right). For example, in this calculation the multiplication happens first, then the subtraction and addition, in that order:

$result = 1 - 2 + 3 * 4;      # 3*4 gives 12, 1-2 gives -1, -1+12 gives 11

Parentheses can be used to force a precedence other than the default.

$result = (1 - (2 + 3)) * 4;  # 2+3 gives 5, 1-5 gives -4, -4*4 gives -16
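
As a small sketch of associativity at work, the following compares a left-associative operator (subtraction) with Perl's right-associative exponentiation operator. The variable names are purely illustrative.

```perl
use strict;
use warnings;

# Subtraction is left associative: grouped as (10 - 4) - 3
my $left = 10 - 4 - 3;

# Exponentiation is right associative: grouped as 2 ** (3 ** 2)
my $right = 2 ** 3 ** 2;

# Parentheses override the default grouping
my $forced = (2 ** 3) ** 2;

print "$left $right $forced\n";   # prints "3 512 64"
```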

The assignment operator = also returns a result, in this case the value assigned to the left-hand side. It is perfectly normal to ignore this result because it is generally a side effect of the assignment. Even print returns a result to indicate whether or not it printed successfully, so it too qualifies as an operator.

We will read more on operators in Chapter 4. For now, we can take our knowledge of values, variables, and operators, and use it to start creating expressions and statements.

Expressions and Statements

An expression is any language construct that returns a value. Every example involving an operator in the previous section is an expression. A statement is just an expression whose return value is not used. Instead, it produces some other effect.

Values and variables are the simplest form of expression, but any combination of values, variables, and operators is also an expression. Since expressions return values, they can be combined into even larger expressions. For instance, this code is made up of several expressions:

  print 1 + 2 / ( 3*(rand(4)-5) ) + 6; # combine values to make expressions

This example demonstrates the use of parentheses to explicitly control how a compound expression is evaluated. Working outwards, 4 is an expression, rand(4) is also an expression, as is rand(4)-5. The parentheses group the operator and its operands, making them an expression that excludes the 3. The outer set of parentheses then groups the multiplication by 3, and returns its own value, by which 2 is then divided. The outermost expression consists of 1 plus the result of that division, plus 6. Well, not quite the outermost. That's technically the print function, but its return value is ignored.

A statement produces an effect other than returning the result of a calculation. The preceding code has an effect: it prints out the result of its calculation to standard output. Statements are separated from each other by semicolons, which tell Perl where one statement ends and the next begins. Here is the same code again, this time rewritten in multiple statements:

  $number = rand(4)-5;
  $number = 3 * $number;
  $number = 1 + 2 / $number + 6;
  print $number;

As Perl does not care what kind of whitespace is used to separate expressions, or even if any is used, the preceding could be put all on one line instead:

$number = rand(4)-5; $number=3*$number; $number=1+2/$number+6; print $number

The very last statement of all does not need a semicolon because nothing comes after it. However, including the semicolon after each statement is good practice, just in case we add more code afterward. Combining statements into one line like this is valid Perl but not as legible as putting them on separate lines. In general, only consider it for very short statements where the meaning is clear.

A statement is an expression whose value is ignored. We call this evaluating the expression in a void context. We'll discuss context in a moment, after we consider the different ways Perl allows us to store data through its built-in data types.

Data Types

Perl is commonly described as defining three basic data types: scalars, arrays, and hashes (also known as associative arrays in other languages). These three data types cover most of the kinds of data that a Perl program will manipulate. However, this is not the whole story. Perl also understands the concept of filehandles, typeglobs (an amalgam of all the other types), and the undefined value, all of which are fundamental to the language. Scalars are categorized by the kind of value they store—integer, floating point number, text string, or reference (which also includes objects). While references and objects are both technically types of scalar value, the manner in which Perl treats them means that they should be considered separately.

Scalars

Scalars are solitary values, such as a number, or a string of text. Unlike more strongly typed languages like C, Perl makes no distinction between numbers and strings and provides no mechanism to define the "type" of a scalar. Instead Perl freely and automatically converts between different scalar types when they are used, caching the conversion for later reuse. The following are legal assignments of simple values to scalar variables:

$scalarint = 42;
$scalarfp  = 1.01;
$scalarstr = "A string is a scalar";
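
The automatic conversion between strings and numbers can be seen in a short sketch. The operator decides how a scalar is treated, not the way the value was written; the variable names here are illustrative only.

```perl
use strict;
use warnings;

my $text   = "10";          # a string that happens to look like a number
my $sum    = $text + 5;     # + treats its operands as numbers
my $joined = $text . 5;     # . treats its operands as strings

# == compares numerically, eq compares as strings
my $numeric = ("1.0" == 1)   ? "equal" : "different";
my $string  = ("1.0" eq "1") ? "equal" : "different";

print "$sum $joined $numeric $string\n";   # prints "15 105 equal different"
```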

Although we will read more about numeric and string scalars in Chapter 3, we will cover references later in this chapter.

Arrays

The second data type is the array. An array is an indexed list of values with a consistent order. Names of arrays are prefixed with @. These are examples of arrays:

@first_array =  (1, 2, 3, 4);
@second_array =  ('one', '2', 'three', '4', '5');

To access an element of an array, we use square brackets around the index of the element we want, counting from zero. Notice that the data type of the element is scalar, even though it is in an array, so the correct prefix for the array element access is a dollar sign, not an at-sign:

$fifth_element = $array[4];

Being ordered lists of values, the following arrays are not the same:

@array1 =  (1, 2, 3, 4);
@array2 =  (1, 2, 4, 3);

Perl provides many functions for manipulating arrays, including pop, push, shift, unshift, splice, and reverse. All these and more are covered in Chapter 5.
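
As a brief illustration of some of these functions, the following sketch adds and removes elements at both ends of an array. The array and variable names are made up for the example.

```perl
use strict;
use warnings;

my @queue = (2, 3, 4);

push    @queue, 5;              # add to the end:        2 3 4 5
unshift @queue, 1;              # add to the start:      1 2 3 4 5
my $last  = pop   @queue;       # remove from the end:   returns 5
my $first = shift @queue;       # remove from the start: returns 1

my @backwards = reverse @queue; # a reversed copy; @queue itself is unchanged

print "@queue | @backwards | $first $last\n";   # prints "2 3 4 | 4 3 2 | 1 5"
```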

Hashes

Hashes, also called associative arrays, are tables of key-value pairs. Names of hashes are prefixed with %. For example:

%hash = ('Mouse', 'Jerry', 'Cat', 'Tom', 'Dog', 'Spike');

Here Mouse is a key whose value is Jerry, Tom is the value for the key Cat, and Dog is the key of the value Spike. Since hash keys can only be strings, Perl lets us omit the quotes around a key that is a simple word. The => operator is an alternative to the comma that automatically quotes a simple word on its left, so we can use it to separate each key from its value and leave the key unquoted.

%hash = (Mouse => 'Jerry', Cat => 'Tom', Dog => 'Spike');

To access a value in a hash we must provide its key. Just as with arrays, the value is scalar (even if it is a reference to an array or hash) so we prefix the accessed element with a dollar sign.

$canine = $hash{Dog};

As hash keys are always strings, we can omit the quotes here too. If we use a variable or expression as the key, Perl will evaluate it as a string and use that as the key.

Unlike elements in arrays, the key-value pairs in a hash are not ordered. So, while we can say that the "first" element in @array1 has the index 0 and the value 1, we cannot talk about the "first" key-value pair. We can access a list of the keys in a hash with the keys function, and a list of the values with values, but the order in which we receive them is not defined.
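
Because the order of keys is not defined, a common approach when the order of output matters is to sort the keys first. The following is a minimal sketch of that idea; the hash %pets and the array @lines are names chosen for this example.

```perl
use strict;
use warnings;

my %pets = (Mouse => 'Jerry', Cat => 'Tom', Dog => 'Spike');

# keys returns the keys in no particular order, so sort them
# to get a predictable sequence
my @lines;
foreach my $animal (sort keys %pets) {
    push @lines, "$animal is $pets{$animal}";
}

print "$_\n" foreach @lines;
# prints:
# Cat is Tom
# Dog is Spike
# Mouse is Jerry
```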

References

References are not strictly a separate data type, merely a kind of scalar value. A reference is an immutable pointer to a value in memory, which may be any data type: scalar, array, hash, and so on. Perl references are not so different from references in other languages like C++ or Java, and differ from C-style pointers in that we cannot perform arithmetic on them to change where they point.

We create a reference to an existing variable or value by putting a backslash in front of it, as in the following examples:

$scalarref = \$scalar;
$arrayref = \@array;
$hashref = \%hash;
$reftostring = \"The referee";

To get to the original value, we dereference the reference by prefixing it with the symbol of the underlying data type.

print $$reftostring;
@new_array = @$arrayref;

We can use braces to indicate exactly what we are dereferencing, or simply to make the code clearer. In this example, we use braces to indicate that we want to dereference an array reference, and then extract the element at index 4:

$fifth_element = ${$array}[4];

Perl reads $$array[4] the same way, because the dereference applies to $array before the subscript does. The braces become essential when we want the other meaning: ${$array[4]} accesses the element at index 4 of an array called @array, then dereferences the element value, quite a different meaning. A better way to keep this clear is to use the -> dereference operator.

$fifth_element = $array->[4];
$value = $hashref->{key};

References give us the ability to create hashes of hashes and arrays of arrays, and other more complex data structures. Arrays, hashes, and references are investigated in detail in Chapter 5.
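
As a minimal sketch of such structures, the following builds an array of arrays and a hash of hashes using only the backslash operator shown earlier, then reads values back with the -> operator. All variable names are illustrative.

```perl
use strict;
use warnings;

my @row1 = (1, 2, 3);
my @row2 = (4, 5, 6);
my @grid = (\@row1, \@row2);        # an array of array references

my %tom  = (species => 'Cat', foe => 'Jerry');
my %cast = (Tom => \%tom);          # a hash of hash references

my $cell = $grid[1]->[2];           # second row, third column
my $foe  = $cast{Tom}->{foe};       # value stored in the inner hash
my @copy = @{ $grid[0] };           # dereference a whole row into a new array

print "$cell $foe @copy\n";         # prints "6 Jerry 1 2 3"
```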

Another very important property of references is that they are the basis of objects in Perl, through the bless keyword. An object in Perl is a reference that has been marked to be in a particular Perl package. So, it is a specialized type of reference, which is in turn a specialized form of scalar. Objects, therefore, are really scalars. Perl also enables objects to be treated as scalars for the purposes of operations like addition and concatenation. There are two mechanisms for doing this: using Perl's tie function to conceal the object behind a normal-seeming scalar variable, and overloading Perl's operators so the object can be treated like a scalar in some contexts. Objects are covered in Chapter 19.

The Undefined Value

The undefined value, which is also not strictly a data type, represents a lack of value or data type. It is neither a scalar, list, hash, nor any other data type, though if it had to be categorized it would be a scalar since it is by nature the simplest possible value—it certainly doesn't have elements. We can assign the undefined value to a variable explicitly using the undef function, or implicitly by simply declaring it but not initializing.

$a = undef;
$a;

The undef keyword is also a function, or to be more accurate, it is always a function and we sometimes just use its return value, which is undefined, which seems only appropriate. Another equivalent way to write the preceding statements is

undef $a;

Be wary of trying to empty an array by setting it to undef—it won't empty the array. Instead, we get an array with a single undefined element, which is a different thing entirely.
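
The difference is easy to check by counting the elements after each assignment. This is a small sketch with illustrative names; assigning the empty list is the usual way to empty an array.

```perl
use strict;
use warnings;

my @array = (1, 2, 3);

@array = undef;                     # a one-element list containing undef
my $after_undef = scalar(@array);

@array = ();                        # assign the empty list instead
my $after_empty = scalar(@array);

print "$after_undef $after_empty\n";   # prints "1 0"
```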

Typeglobs

The typeglob is a strange beast unique to Perl. It is a kind of super-value, an amalgam of exactly one of each of the other primary data types: scalar, array, hash, filehandle. In addition a typeglob can hold a code reference that is a pointer to a piece of Perl code, for example, a subroutine. It can also hold a format, which is a template that can be used with the write function to generate formatted and paginated output. The typeglob is not a single reference to something, rather it has six slots that can contain six different references all at once. It is prefixed with a *, so the typeglob *name contains the values of $name, @name, and %name, and other values as well.

Typeglobs are most often used for their ability to hold file and directory handles, since these are sometimes difficult to manipulate otherwise.

The remaining Perl data type is the filehandle. A filehandle represents a channel for input and/or output within a Perl program, for instance, an open file or a network connection. Unlike the previous data types, filehandles are not prefixed with a special character. One way to create a filehandle is to use the open function:

open FILEHANDLE, $filename;

This example opens a filehandle called FILEHANDLE to the file named by $filename, so we can now manipulate this file using its handle.

We will read more about the undefined value, typeglobs, and filehandles in Chapter 5.

Context

An important concept in Perl is the idea of evaluation context. This is the context in which a Perl variable or section of Perl code is evaluated, and indicates the type of value that is wanted in a given situation.

Perl has three different contexts: scalar, list, and void. Which one applies depends on the way in which the value is being used (as opposed to what it actually is). In scalar context, a single scalar value is required. In list context, a list of values (zero or more) is required. Finally, when no value of any kind is required, we call this void context.

To illustrate the difference, here is an example of a list of three values being used in an assignment:

$count = (1,2,3);  # scalar context - the comma operator yields the last value, 3
@array = (1,2,3);  # list context - assigns all three values to @array

The list of three values is evaluated in scalar context in the first assignment and list context in the second. In both cases, the results of the assignments themselves are in void context. To make this clearer, in the next example, the leftmost assignment is in a void context but the rightmost is in list context (it is evaluated first and its result used by the leftmost assignment):

@array2 = @array1 = (1,2,3);
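
Context matters just as much for array variables. As a short sketch (with illustrative names), an array evaluated in scalar context yields its number of elements, while parentheses around the left-hand side of an assignment supply list context:

```perl
use strict;
use warnings;

my @list = (4, 5, 6);

my $count   = @list;            # scalar context: the number of elements
my ($first) = @list;            # list context: the first element is assigned
my $total   = scalar(@list);    # scalar() forces scalar context explicitly

print "$count $first $total\n"; # prints "3 4 3"
```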

Several of Perl's built-in functions react differently when called in different contexts. For example, localtime returns a preformatted string containing the current time and date when used in scalar context, but a list of values (seconds, minutes, hours, and so on) when used in list context.

To force scalar context, we can use the scalar function, which is handy for things like printing out the time from localtime where the context from print would otherwise be list context:

print "The time is ",scalar(localtime),"\n";

Subroutines also have a context in which they are called, which they can detect with the built-in Perl function wantarray. We discuss that in Chapter 7.
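
As a brief preview, here is a sketch of a subroutine that reports its calling context. wantarray returns a true value in list context, a false but defined value in scalar context, and the undefined value in void context. The subroutine name context is hypothetical.

```perl
use strict;
use warnings;

# context() is a hypothetical helper that reports how it was called
sub context {
    return "list"   if wantarray;
    return "scalar" if defined(wantarray);
    return "void";
}

my @in_list   = context();      # called in list context
my $in_scalar = context();      # called in scalar context

print "$in_list[0] $in_scalar\n";   # prints "list scalar"
```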

Special Variables

Perl provides a number of special variables that are always available to any Perl script without declaration or qualification.

Some are set by context during program execution: $_ is used as the default loop control variable set by foreach, map, and grep loops, $. holds the current line number for the most recently read file, and @_ is defined to be the list of passed parameters on entry to subroutines. $! contains the most recent system error (if any). Others relate to Perl's environment: %ENV is a hash variable that contains environment variables in key-value pairs. We can both read it to find out how Perl was started, and alter it to control the environment of processes executed by Perl. @INC controls where Perl searches for reusable modules, and %INC stores the details of modules that were found and loaded. $0 contains the program name, while $$ contains the process ID.

Finally, several variables control how the interpreter works or manipulate aspects of the runtime environment: $| controls the buffering behavior of the current output filehandle, and $/ controls the input line terminator.
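
To see two of these variables change as a program runs, the following sketch reads three lines and records $. for each one. To keep the example self-contained it reads from an in-memory filehandle opened on a string, which is an assumption of this sketch rather than something covered in this chapter; an ordinary file would behave the same way.

```perl
use strict;
use warnings;

my $text = "alpha\nbeta\ngamma\n";

# Open a filehandle that reads from the string instead of a file on disk
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";

my @seen;
while (my $line = <$fh>) {
    chomp $line;                        # remove the trailing line terminator ($/)
    push @seen, join(":", $., $line);   # $. is the current input line number
}
close $fh;

print "@seen\n";   # prints "1:alpha 2:beta 3:gamma"
```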

To see a special variable in action, issue the following command, which prints the default set of paths that Perl will search for libraries (Windows users need to replace the single quotes with double quotes, and replace "\n" with \"\n\"):

> perl -e 'foreach (@INC){print $_, "\n"}'


/usr/local/lib/perl5/5.8.5/i686-linux-thread-multi
/usr/local/lib/perl5/5.8.5
/usr/local/lib/perl5/site_perl/5.8.5/i686-linux-thread-multi
/usr/local/lib/perl5/site_perl/5.8.5
/usr/local/lib/perl5/site_perl
.
>

Likewise, the following example prints the value of the environment variable PATH. While the variable is a hash, the values in the hash are scalars, so the correct prefix to retrieve it is $:

> perl -e 'print $ENV{PATH}'


/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin>

Variables like %ENV and @ARGV, which hold the command-line arguments with which a program was executed, are reasonably self-explanatory. However, the majority of Perl's special variable names consist of punctuation marks, like $/. In theory, most of these are based on mnemonics, but since there are over 50 of them and only so much punctuation to go around, the intuitiveness of the mnemonic hints for some variables is a little stretched.

However, there is the English module, which provides a longer, descriptive name for each special variable. While we are not covering modules yet, this one is useful enough for newcomers to learn about now. After the English module is loaded with 'use English', $_ is given the alias $ARG, $. becomes $INPUT_LINE_NUMBER, and so on for every special variable Perl provides. Here is an example of how we can use it:

#!/usr/bin/perl -w
# whoami.pl

use English '-no_match_vars';

print 'I am ', $PROGRAM_NAME, ' (', $PROCESS_ID, ')';

which is a lot more understandable than

print 'I am ', $0, ' (', $$, ')';

NOTE The -no_match_vars argument tells Perl not to provide aliases for special variables involved in regular expression matching. It is time consuming for Perl to manage these variables, so by not giving them English names we avoid making Perl do extra work. It is almost always a good idea to include -no_match_vars.


For a list of all of Perl's special variables and their English names, see Appendix B.

String Interpolation

Interpolation is a very useful property of strings in Perl. When a variable name is seen in a double-quoted string, Perl replaces that variable with its value. For instance, the print statement in the following example prints the value of the variable $today, which is Friday, rather than the characters that constitute the variable name, which are $today:

#!/usr/bin/perl
# interpolate.pl

$today = "Friday";
print "It is $today\n";

Next we'll execute the code, which in turn generates one line of output.

> perl interpolate.pl


It is Friday

However, if we wanted to see the actual characters $today, we can prevent interpolation by escaping the variable name with a backslash:

print "It is \$today\n";

Now if we run this, we get It is $today. Alternatively, we could use noninterpolating single quotes to prevent the variable from being interpolated. In that case, our print statement would look like this:

print 'It is $today\n';

The output we get from this is slightly different.


It is $today\n>

Notice that Perl printed out the \n literally and that there is no linefeed, causing the command prompt (we assume > here) to appear on the end of the text. That is because \n represents an end-of-line sequence only when interpolated. Perl has a number of special character combinations that are interpolated to their meanings when placed inside double quotes (or the equivalent qq operator). Perl supports the standard conventions for special characters like tabs and newlines first established by C. All of these are converted into the appropriate characters when they are seen by Perl in a double-quoted string. For example, for tabs and returns we use \t and \r.

Interestingly, interpolating a string containing the name of an array variable will cause each element of the array to be converted into text and placed into the resulting string, separated by spaces.

my @count=reverse (1..9);
print "Counting: @count ...";
# produces 'Counting: 9 8 7 6 5 4 3 2 1 ...'

This does not work for hash variables, though. Only scalar and array variables can be interpolated.
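
A short sketch makes the distinction concrete. A single hash element is a scalar and so interpolates normally, while the name of a whole hash is left as literal text. The example also uses the special variable $", the separator Perl places between interpolated array elements (a space by default). The variable names are illustrative.

```perl
use strict;
use warnings;

my %pets  = (Cat => 'Tom');
my @count = (1, 2, 3);

my $element = "The cat is $pets{Cat}";   # a hash element is a scalar
my $whole   = "Hash: %pets";             # %pets stays as literal text

my $joined;
{
    local $" = '-';                      # change the list separator in this block only
    $joined = "Counting: @count";
}

print "$element\n$whole\n$joined\n";
# prints:
# The cat is Tom
# Hash: %pets
# Counting: 1-2-3
```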

There is a lot more to interpolation than this, however, and it is in fact a more complex and versatile part of Perl than many people realize. Accordingly, we devote a good chunk of Chapter 11 to covering the subject of interpolation in depth, including inserting character codes, interpolating source code, and interpolating text more than once. Meanwhile, now that we have simple statements under control, it is time to introduce some structure to them.

Matching, Substitution, and Transliteration

One of Perl's best known features is its regular expression engine, which provides us with the ability to match input text against complex criteria, defined as regular expression strings, also called patterns. We can match text to see whether or not a given sequence of characters is present using the match operator m//, or just //.

my $input_text = "two pints of milk and a pot of yoghurt";
if ($input_text =~ /milk/) {
    # got milk...
}

We can also substitute one string for another.

$input_text =~ s/yoghurt/cream/;

Both matching and substitution can use a wide range of match criteria to search for arbitrarily complex sequences, return parts of the matched text, and match multiple times. As a taste of what is possible, this extracts all two- and three-letter words:

@words = $input_text =~ /\b(\w{2,3})\b/g;

Here \b means "match on a word boundary," and the parentheses cause the text that matches their contents to be returned. \w means a word character, any alphanumeric character plus underscores. It is qualified as \w{2,3}, which means match at least two but no more than three times.

Regular expressions are related to interpolation in two ways. Firstly, Perl interpolates regular expression patterns before it applies them, so we can use variables and special characters in patterns just like double-quoted strings. Secondly, regular expressions add their own special characters that look like interpolation special characters; \b, shown earlier, is an example. Regular expressions are a large subject, and we cover them in detail along with interpolation in Chapter 11.
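
The following sketch pulls these ideas together: a global match that returns captured text, a pattern built by interpolating a variable, and a substitution applied to a copy of the text. Names such as $item and $changed are chosen for illustration.

```perl
use strict;
use warnings;

my $input_text = "two pints of milk and a pot of yoghurt";

# In list context with /g, a match returns every captured substring
my @short_words = $input_text =~ /\b(\w{2,3})\b/g;

# Patterns are interpolated, so a variable can supply part of the pattern
my $item     = "milk";
my $got_item = ($input_text =~ /\b$item\b/) ? "yes" : "no";

# Substitute on a copy so the original text is preserved
my $changed = $input_text;
$changed =~ s/yoghurt/cream/;

print "@short_words\n";   # prints "two of and pot of"
print "$got_item\n";      # prints "yes"
print "$changed\n";       # prints "two pints of milk and a pot of cream"
```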

Transliteration looks a lot like substitution. It allows us to exchange individual characters. For example, to capitalize all lowercase a, b, or c characters:

$input_text =~ tr/abc/ABC/;

or remove the t in Boston:

$input_text = "Boston";
$input_text =~ tr/t//d;

Transliteration really has nothing to do with regular expressions, but since the syntax is similar, we also cover it at the end of Chapter 11.
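
As a small sketch of transliteration in use, the following works on copies of a string so that each result can be compared with the original. It also relies on the fact that tr returns the number of characters it matched, and that an empty replacement list without the d modifier leaves the string unchanged. The variable names are illustrative.

```perl
use strict;
use warnings;

my $city = "Boston";

my $upper = $city;
$upper =~ tr/a-z/A-Z/;              # map each lowercase letter to uppercase

my $short = $city;
$short =~ tr/t//d;                  # the d modifier deletes matched characters

my $count = ($city =~ tr/o//);      # counts the o characters; $city is unchanged

print "$upper $short $count $city\n";   # prints "BOSTON Boson 2 Boston"
```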

Blocks, Conditions, and Loops

A block in Perl is a unit that consists of several statements and/or smaller blocks. Blocks can exist either on their own, where they are known as bare blocks, or form the body of a control statement such as an if condition or a foreach loop.

Blocks are defined by enclosing their contents in curly braces, as in the following example:

#!/usr/bin/perl
# block.pl
use warnings;

{
    print "This is a first level block.\n";
    {
        print "    This is a second level block.\n";
    }
}

We do not need to end the block with a semicolon; it isn't a statement, but a way to group them. Blocks feature heavily in Perl programming: files are implicit blocks, and several of Perl's built-in functions can take blocks as arguments—map, grep, and sort being the most obvious examples.

@capitalised = map { ucfirst lc } ("some","MIXED","Case","sTrInGs");
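
As a further sketch, the same block syntax works with grep and sort. Inside a map or grep block, $_ holds the current element; inside a sort block, $a and $b hold the two elements being compared. The array names are chosen for this example.

```perl
use strict;
use warnings;

my @words = ("some", "MIXED", "Case", "sTrInGs");

my @capitalised = map  { ucfirst lc }      @words;        # transform each element
my @long        = grep { length($_) > 4 }  @capitalised;  # keep elements that pass a test
my @ordered     = sort { $a cmp $b }       @capitalised;  # order by string comparison

print "@capitalised\n";   # prints "Some Mixed Case Strings"
print "@long\n";          # prints "Mixed Strings"
print "@ordered\n";       # prints "Case Mixed Some Strings"
```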

Perl provides us with a number of constructs that we can use to control how our program behaves under a given condition. These control constructs make use of the concept of blocks.

Conditional Blocks: if, else, and unless

The if statement allows us to execute the statements inside a block if a particular condition is met, as demonstrated in this example:

#!/usr/bin/perl
# if1.pl

$input=<>;
if ($input >= 5 ) {
    print "The input number is equal to or greater than 5\n";
}

We use the operator >= to test whether our input was 5 or greater. If so, the block containing the print statement is executed. Otherwise, the program doesn't execute the block.

Note that we have used what is known as the readline operator, also called the diamond operator (<>), in the preceding example. This operator allows us to read a line at a time from a given filehandle. Normally a filehandle resides between the angle brackets, but if we are reading from standard input, we can omit it, leading to the diamond appearance.

We can create a more flexible version of if by combining it with else, as shown in the new version of our previous example.

#!/usr/bin/perl
# ifelse.pl

$input=<>;
if ($input >= 5 ) {
    print "The input number is equal to or greater than 5\n";
} else {
    print "The input number is less than 5\n";
}

The opposite of if is unless, which simply inverts the sense of the condition. It is a linguistic aid to clarity when the body of the statement should be executed if the condition is false rather than true.

unless ($input >= 5) {
    print "The input number is less than 5\n";
}
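
To make the equivalence concrete, here is a small sketch in which one subroutine tests a negated condition with if and the other tests the plain condition with unless. The subroutine names are hypothetical, and the two are written to give identical answers.

```perl
#!/usr/bin/perl
# unless.pl (hypothetical file name)
use strict;
use warnings;

# Uses if with an explicitly negated condition
sub size_with_if {
    my ($n) = @_;
    if (!($n >= 5)) { return "small"; }
    return "large";
}

# Uses unless with the original condition
sub size_with_unless {
    my ($n) = @_;
    unless ($n >= 5) { return "small"; }
    return "large";
}

foreach my $n (3, 5, 8) {
    print "$n: ", size_with_if($n), " ", size_with_unless($n), "\n";
}
```

Which form to use is a matter of readability; the unless version avoids the extra negation.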

Looping Blocks: foreach and while

The foreach statement loops through a list, executing a block for each value in that list.

#!/usr/bin/perl
# foreach1.pl
use warnings;

@array = ("one", "two", "three");
foreach $iterator (@array) {
    print "The value of the iterator is now $iterator\n";
}

When we run this program, we get

> perl foreach1.pl

The value of the iterator is now one
The value of the iterator is now two
The value of the iterator is now three

Earlier we mentioned the special variable $_ and noted that many functions read from this variable and write to it in the absence of any other variable. So, let's see how we can modify our previous example to use $_:

#!/usr/bin/perl
# foreach2.pl

@array = ("one", "two", "three", "four");
foreach (@array) {
    print "The value of the iterator is now $_\n";
}

Because we have not named an iterator explicitly, Perl has used $_ as the iterator, something that we can test by printing $_.

Perl's other main loop statement is while, which repeatedly executes a block until its control expression becomes false. For example, this short program counts from 10 down to 1. The loop exits when the counter variable reaches zero.

#!/usr/bin/perl
# while.pl
$count=10;
while ($count > 0) {
    print "$count... ";
    $count = $count - 1;
}

Like if, while has a counterpart, until, which simply inverts the sense of the condition. It loops while the condition is false and ends when it becomes true.
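
As a sketch of how until mirrors while, the countdown can be rewritten as follows. The @seen array is our own addition, used only to collect the values so that they can be printed in one go.

```perl
#!/usr/bin/perl
# until.pl (hypothetical file name)
use strict;
use warnings;

my $count = 10;
my @seen;                      # values visited by the loop

until ($count == 0) {          # same as: while ($count != 0)
    push @seen, $count;
    $count = $count - 1;
}

print join("... ", @seen), "...\n";
```

The loop body runs while the condition is false, so it stops as soon as $count reaches zero.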

All the blocks described so far are executed as soon as Perl sees them. One important class of block that we have not yet covered is the subroutine, a block that is not executed on sight but which instead allows us to label the code inside for future reuse. We consider subroutines next; blocks, loops, and conditions are explored in depth in Chapter 6.

Subroutines

A subroutine is a block that is declared for reuse with the sub keyword. Unlike a regular bare block, which is executed as soon as Perl encounters it, a subroutine block is stored away for future use under the name used to declare it. This provides the ability to reuse the same code many times, a core value of good program design. Code can now call the subroutine by its name.

sub red_october {
    print "A simple sub\n";
}

To call this subroutine, we can now just write any of the following:

red_october;
&red_october;
red_october();

The first and third examples here are equivalent, but the first works only because the subroutine has already been declared; as we will see at the end of the chapter, a bare name that Perl has not yet seen declared as a subroutine is rejected if we enable "strict" subroutines. The second example shows an older way to call subroutines that is rarely seen in modern Perl but occasionally crops up in older code. (Written without parentheses, it also has the special property of passing on the parameters of the subroutine in which the call appears, but that is a detail we will come back to in Chapter 6.)
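
The special property of the & form can be seen in a short sketch. The subroutine names inner and outer are hypothetical; the point is only that &inner without parentheses sees the caller's arguments, whereas inner() with empty parentheses sees none.

```perl
#!/usr/bin/perl
# ampersand.pl (hypothetical file name)
use strict;
use warnings;

sub inner {
    return "inner saw: @_";
}

sub outer {
    my $implicit = &inner;    # no parentheses: inner receives outer's @_
    my $explicit = inner();   # empty parentheses: inner receives nothing
    return ($implicit, $explicit);
}

my ($implicit, $explicit) = outer("a", "b");
print "$implicit\n";   # inner saw: a b
print "$explicit\n";   # inner saw:
```

This implicit sharing of arguments is easy to overlook, which is one reason the parenthesized form is generally preferred.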

This simple example neither accepts parameters nor returns a value. More commonly subroutines are passed values and variables in order to perform a task and then return another value reflecting the result. For example, this subroutine calculates the factorial of the number passed to it and then returns the answer. Notice that there is no need to define the subroutine before we call it—Perl will figure it out for us.

#!/usr/bin/perl
# factorial.pl

$number=<>;    # read a number from the keyboard
chomp $number; # remove linefeed

# Call the subroutine with $number
$factorial=factorial($number);

# The subroutine
sub factorial {
    $input = shift; # read passed argument
    # 0! is 1 by definition, so return immediately if given 0 as input
    return 1 if $input==0;
    # otherwise do the calculation
    $result=1;
    foreach (1 .. $input) { # '..' generates a range
        $result *= $_;
    }
    return $result;
}

print "$number factorial is $factorial\n";

Subroutines can also be given prototypes, to control the type and number of parameters passed to them, and attributes, metadata that influences how the subroutine behaves. These and other details of subroutines can be found in Chapter 7.

All the variables in the preceding program are global package variables, because we did not declare any of them before using them. As a result, this program is vulnerable to both misspelling of variable names and the use of variables outside their intended context. In order to protect ourselves from potential problems like these, we should apply some warnings and strictness.

Modules and Packages

A module is a Perl library, a reusable collection of subroutines and variables. We can load a module into our program with use, which searches for a requested module in the paths present in the special variable @INC. For example, to load the English module, which provides English aliases for Perl's special variables, we write

use English;

A package is a Perl namespace, a logical subdivision of compiled code in which variables and subroutines can reside. Two variables, or subroutines, can exist with the same name at the same time, so long as they are in different packages. Even a simple Perl script has a namespace, the default package main.
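
A minimal sketch of two packages holding a variable and a subroutine of the same name might look like this. The package names Red and Blue are hypothetical, and the our keyword used to declare the variables is explained later in this chapter.

```perl
#!/usr/bin/perl
# packages.pl (hypothetical file name)
use strict;
use warnings;

{
    package Red;                       # hypothetical package
    our $name = "red";                 # this is $Red::name
    sub describe { return "I am $name" }
}

{
    package Blue;                      # hypothetical package
    our $name = "blue";                # this is $Blue::name
    sub describe { return "I am $name" }
}

# Back in the default package, main
print Red::describe(), "\n";           # I am red
print Blue::describe(), "\n";          # I am blue
print "$Red::name $Blue::name\n";      # red blue
```

The two $name variables and the two describe subroutines never collide, because each lives in its own namespace and is reached through its package name.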

Usually, a module creates a package with the same name as the module, with nested namespaces separated by double colons (::). For the purposes of using the module, therefore, we can often treat the terms "module" and "package" as synonymous, or at least connected. For example:

use Scalar::Util;

The actual module file is Scalar/Util.pm, or Scalar\Util.pm on Windows, and is located in Perl's standard library. (We can find out exactly where with perldoc -l Scalar::Util.)

The require keyword also loads modules, but it simply loads code at run-time. The use keyword by contrast loads a module at compile time, before other code is compiled, and calls an import routine in the just-loaded module to import variables and subroutines into the package of the caller. While require does have uses, we generally want use. Take this example use statement:

use File::Basename 'basename','dirname';

After this statement we can use the subroutines basename() and dirname() in our program, because they have been imported from the File::Basename package, which is defined in the File::Basename module. We can also call a subroutine or refer to a variable directly in its original package.

$scriptdir=File::Basename::dirname($0); #find directory of script
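
The following sketch puts both styles of call side by side. The path is a made-up example, and the results noted in the comments assume a Unix-like system.

```perl
#!/usr/bin/perl
# basename.pl (hypothetical file name)
use strict;
use warnings;
use File::Basename 'basename', 'dirname';

my $path = "/usr/local/lib/example.txt";     # hypothetical path

my $file = basename($path);                  # imported name
my $dir  = File::Basename::dirname($path);   # fully qualified name

print "$file\n";   # example.txt
print "$dir\n";    # /usr/local/lib
```

The fully qualified form works whether or not the name was imported, so long as the module has been loaded.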

There is a lot more to use than this of course, and we cover the subject of using modules in detail in Chapter 9. We can create our own modules too, which is the subject of Chapter 10.
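
To illustrate the run-time nature of require mentioned earlier, here is a sketch that attempts to load a module only when asked and reports whether it succeeded. The helper try_load and the deliberately nonexistent module name are our own inventions; eval is used simply to trap the error that require raises when a module cannot be found.

```perl
#!/usr/bin/perl
# tryload.pl (hypothetical file name)
use strict;
use warnings;

# Try to load a module at run time; return 1 on success, 0 on failure
sub try_load {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Scalar::Util -> Scalar/Util.pm
    return eval { require $file; 1 } ? 1 : 0;
}

print "Scalar::Util: ", (try_load("Scalar::Util") ? "loaded" : "missing"), "\n";
print "No::Such::Module::Here: ",
      (try_load("No::Such::Module::Here") ? "loaded" : "missing"), "\n";

# require does not import anything, so we use the fully qualified name
print "42 looks like a number\n" if Scalar::Util::looks_like_number("42");
```

Because nothing happens until try_load is actually called, a program can decide at run time whether an optional module is worth loading at all.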

Perl notionally divides modules into two kinds, which can be differentiated by their case. Functional modules start with an uppercase letter, and simply provide subroutines or define object classes. The IO::File and CGI modules are both examples of functional modules, as are Scalar::Util, File::Basename, Math::Trig, and so on. By contrast, pragmatic modules are all lowercase, and modify the behavior of Perl itself in some way. The warnings and strict modules are the most important of these, and we take a look at them next.

Warnings and Strictness

Most of our short examples so far have not enabled either warnings or strict compile checks. For short demonstrations this might be OK, but in general it is highly recommended to enable both in order to maintain code quality and catch programming errors.

Warnings can be enabled in three ways: through the -w option, the special variable $^W, or the use warnings pragma. The first two control the same global setting that enables or disables all warnings in all code within the application, be it in the main script or loaded modules. The pragma allows finer grained control and only affects the file or block it is placed in.

Here's how you can specify the -w option on the command line:

> perl -w myscript.pl

This has the same effect as writing the following at the top of myscript.pl:

#!/usr/bin/perl
$^W=1;

It is standard on Unix systems to specify the name of the interpreter to use as the first line of the file—the so-called hash-bang line (because it starts with #!).

A common sight in many Perl scripts is the following, which enables warnings automatically:

#!/usr/bin/perl -w

Windows platforms will generally understand this convention too. If the file has a recognized file extension (such as .pl) and is passed to Perl to execute, the options specified on the end of the line are extracted and applied even though the path itself is not applicable.

We don't always want or need to enable warnings everywhere, though. The use warnings pragma is lexically scoped, so it only affects the file (or block) in which it appears.

#!/usr/bin/perl
use warnings;
use A::Module; #warnings not enabled inside loaded module

This is handy when we are trying to diagnose problems in our own code and want to ignore warnings being generated by modules we are just using. That doesn't mean the warnings aren't important, but it allows us to be selective. We can be more selective by switching on or off different categories of warnings, which we will discover how to do in Chapter 16.
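
The lexical scope of the pragma can be demonstrated within a single file. In this sketch we install a handler in the special hash %SIG to collect warnings rather than print them, which is an assumption made only so that the effect can be counted; the variable names are hypothetical.

```perl
#!/usr/bin/perl
# lexwarn.pl (hypothetical file name)
use strict;
use warnings;

my @caught;
$SIG{__WARN__} = sub { push @caught, $_[0] };   # collect warnings instead of printing

my $undefined;
my $first = "value: " . $undefined;       # warns: use of uninitialized value
{
    no warnings;                          # switched off for this block only
    my $second = "value: " . $undefined;  # silent
}
my $third = "value: " . $undefined;       # warns again

print scalar(@caught), " warnings caught\n";   # 2 warnings caught
```

The statement inside the block is identical to those outside it, yet only the two outside generate a warning.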

Perl also provides the strict pragma. This enables additional checks that our code must pass at compile time before the interpreter will execute it. This is almost always a good thing to do and there are few reasons not to, except in very small scripts and one-liners. In fact there are three separate strict modes: vars, refs, and subs. A use strict without arguments gives us all three, but we can enable or disable each mode separately if we wish.

use strict;                    #enable all strict checks
use strict qw(vars refs subs); #same thing, explicitly

no strict qw(refs);            #allow symbolic references

The vars mode enforces the declaration of variables before or at their first use; we touch on it some more in just a moment. The refs mode disables symbolic references, while the subs mode prevents us from using subroutines without parentheses or a & prefix, where their meaning and context can be ambiguous. We tackle these two subjects in Chapters 5 and 7, respectively.
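
As a sketch of the refs mode in action, the following tries to use a string as though it were a reference, first with strict refs enabled and then with it disabled inside a block. The refs check is made as the code runs, so we can trap the failure with eval. The variable names are hypothetical, and our is explained in the next section.

```perl
#!/usr/bin/perl
# strictrefs.pl (hypothetical file name)
use strict;
use warnings;

our $colour = "red";     # a package variable, $main::colour
my $name = "colour";     # a string holding that variable's name

# With strict refs in effect, this dies at run time
my $strict_result = eval { ${$name} };
my $error = $@;

my $loose_result;
{
    no strict 'refs';          # allow symbolic references in this block only
    $loose_result = ${$name};  # resolves to $main::colour
}

print "with strict refs: ", ($error ? "error" : "ok"), "\n";   # error
print "without strict refs: $loose_result\n";                  # red
```

Keeping the no strict declaration inside the smallest possible block limits the unchecked code to the one line that needs it.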

Variable Declarations

One of the effects of use strict is to enforce variable declarations. Perl has two different kinds of variable, package and lexical, and several different ways to declare them. Although Perl has a keyword called local, it doesn't actually declare a local variable in the normally accepted sense of the term. In most cases, we want to use my:

use strict 'vars';
my $variable = "value";

With strict variables in effect, leaving off the my will cause a compile-time error. The my tells Perl that this is a lexical variable, which only exists within the file or block in which it is declared. Declared at the top of a file, this means that it can be used from anywhere inside the file, including from within subroutines, but can't be accessed from code outside it, not even if that code was called from the place where it was declared. As this example shows, we can make an assignment to the variable at the same time we declare it.
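
One way to watch strict variables catch a misspelled name without aborting the whole program is to compile a fragment at run time with a string eval, as in this sketch. The variable names and the deliberate typo are ours; in ordinary code the same error would simply stop the program before it started.

```perl
#!/usr/bin/perl
# strictvars.pl (hypothetical file name)
use strict;
use warnings;

my $total = 10;

# The string is compiled when eval runs, under the same strict settings.
# Note the misspelled variable name $totl.
my $ok = eval q{ $totl = $total + 1; 1 };
my $error = $@;

print "compiled cleanly: ", ($ok ? "yes" : "no"), "\n";   # no
print "error was: $error";
```

Without use strict, the misspelled $totl would silently become a new package variable and the mistake would go unnoticed.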

The other kind of variable is a package variable, which is visible from anywhere as soon as it is given a value. If we do not enable strict variables and simply use a variable without declaring it, a package variable is the result. If we are not using packages, then the variable is in the default main package and is what we would normally think of as a global variable.

With strictness enabled, we can no longer create package variables just by using them. We must declare them with the older vars pragma or the more modern our keyword introduced in Perl 5.6.

The vars pragma merely tells Perl that we are going to be making use of the named package variable; it has no useful effect if strict variables are not enabled. Unfortunately, it does not allow us to declare and assign to the variable at the same time.

use vars '$variable';
$variable = "value";

The our keyword does, however. It declares a package variable, but only makes it visible lexically, just like my.

our $variable = "value";

The our keyword is intended as an improved, more intuitive replacement for use vars, but we often see the latter in older Perl code and code designed to run on a wide range of Perl versions.

So what about local? It provides the ability to temporarily hide the value of a package variable behind a different value. Somewhat counterintuitively, its scope is dynamic rather than lexical: the temporary value persists only while the interpreter is executing code within the file or block in which it is declared, but it is visible from anywhere so long as this remains true, notably within called subroutines. This isn't what most people are expecting, and is why my is usually what we want to use. In fact, local is most useful for temporarily overriding Perl's built-in variables, which are really special cases of package variables.
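
The difference between local and my is easiest to see from inside a called subroutine. In this sketch, with hypothetical names throughout, show_level always reads the package variable; one caller overrides it with local and the other declares an unrelated lexical with my.

```perl
#!/usr/bin/perl
# localmy.pl (hypothetical file name)
use strict;
use warnings;

our $level = "global";

sub show_level { return $level }    # always reads the package variable

sub with_local {
    local $level = "localized";     # temporary value, seen by called subroutines
    return show_level();
}

sub with_my {
    my $level = "lexical";          # a new variable, invisible to called subroutines
    return show_level();
}

print with_local(), "\n";   # localized
print with_my(), "\n";      # global
print show_level(), "\n";   # global
```

Once with_local returns, the original value of the package variable is restored automatically, as the final line shows.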

Scope and visibility are important concepts, so it is worth taking a moment to look at them a little more closely before the end of the chapter.

Scope and Visibility

The scope and visibility of a variable are determined by its nature, package or lexical, and where in a piece of code it resides. As we mentioned, Perl has two distinct types of variable in terms of scope: package variables, which are visible from anywhere in a program, and lexical variables, whose scope and visibility are constrained to the file or block in which they are first declared.

In the following example, the first mention of $scalar is a package variable. Because it exists at the top of the program and is not in a declared package, it is also what we traditionally think of as a global variable. Inside the block, we have a second $scalar. This one is declared to be a lexical variable through the use of the my keyword. The lexical variable obscures the package global within the block, but as soon as the block finishes, the original package variable reappears.

#!/usr/bin/perl
# scope.pl

our $scalar = "global";
print "\$scalar is a $scalar variable\n";
{
      my $scalar = "lexical";
      print "\$scalar is now a $scalar variable\n";
}
print "\$scalar is a $scalar variable again\n";

When we run this program, we get the following output:

> perl scope.pl

$scalar is a global variable
$scalar is now a lexical variable
$scalar is a global variable again

The subject of package and lexical variables is simple at first glance but rather more complex on closer examination. We can declare package variables lexically with our or temporarily hide the value of a package variable using local, which differs from the my example earlier because the temporary value stays visible everywhere, including in called subroutines, until execution leaves the block. We tackle all this in more detail in Chapter 8.

Summary

In this chapter, we introduced Perl's basic concepts. In the following chapters, 3 to 11, we will expand on all of these subjects in detail. The purpose of this chapter is to provide just enough information on each area that any of these chapters can be dipped into without recourse to the others. We started with the fundamentals of the language, values and variables, then passed through whitespace and comments, operators and functions, and expressions and statements. We then looked at Perl's data types (scalars, arrays, hashes, filehandles, and the special undefined value), with a quick glance at references and typeglobs, and also considered the context in which Perl evaluates expressions. We examined some of the many special variables Perl provides and saw how Perl can expand variables into strings using variable interpolation. We then took a brief look at the very large subject of regular expressions, and saw the match, substitution, and transliteration operations in action.

After a look at Perl's block constructs, including loops and conditional statements and expressions, we saw how to declare and use subroutines in Perl, then use modules, Perl's implementation of libraries, to make use of subroutines and variables defined externally to our program script.

All well-written Perl programs make use of warnings and strict checking, both of which were briefly covered. Finally, armed with this information, we saw how to properly declare the variables that up until now we had simply used by writing them down.
