A SAS program can be
any combination of the following elements:
-
-
- Structured Query Language (SQL) code
-
When you submit a SAS
program, the code is copied to a memory location called the input
stack. The presence of text in the input stack triggers
a component called the word scanner to
begin its work.
The word scanner has
two major functions. First, it pulls the raw text from the input stack
character by character and transforms it into tokens. Second, it sends
the tokens to the compiler and the macro processor for processing. This
process of separating a program into tokens is called tokenization.
There are four types of tokens: name, number, special, and literal.
To build a token, the
word scanner extracts characters until it reaches a delimiter, or
until the next character does not meet the rules of the current token.
A delimiter is any whitespace character such as a space, tab, or end-of-line
character.
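As a rough illustration of these rules, the sketch below marks in comments how two short statements would be split into tokens. The data set name work.demo and the variable names are hypothetical, and the token breakdown is reasoned from the rules described here rather than taken from any scanner output.

```sas
/* Hypothetical sample data: x1 through x10 plus z */
data work.demo;
   array x{10} x1-x10;
   do i = 1 to 10;
      x{i} = i;
   end;
   z = 99;
   drop i;
run;

/* Three tokens: title  'Report for May'  ;                       */
/* The quoted string is one literal token, blanks included.       */
title 'Report for May';

proc print data=work.demo;
   /* Six tokens: var  x1  -  x10  z  ;                           */
   /* Blanks end var, x10, and z. No blank follows x1, but a      */
   /* hyphen cannot be part of a name token, so x1 ends there.    */
   var x1-x10 z ;
run;

title;   /* clear the title */
```

The VAR statement shows both ways a token can end: at a delimiter, or at a character that breaks the rules of the token being built.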
Name tokens consist
of a maximum of 32 characters, must begin with a letter or underscore,
and can include only letters, digits, and underscores.
Number tokens define
a SAS floating-point numeric value. They can consist of digits, a
decimal point, a leading sign, and an exponent indicator (e or E). Date,
time, and datetime constants also become number tokens (for example: '29APR2019'd,
'14:05:32.1't, and '29APR2019 14:05:32.1'dt).
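The following minimal sketch assigns a few values that, according to the description above, would each be scanned as a single number token. The variable names are hypothetical, and the comments restate the rules from this section rather than documented scanner output.

```sas
data _null_;
   a = 23;             /* digits only */
   b = -4.5;           /* leading sign and decimal point */
   c = 5e8;            /* exponent indicator */
   d = '29APR2019'd;   /* date constant: quotation marks plus the d suffix */
   t = '14:05:32.1't;  /* time constant */
   put a= b= c=;       /* write the plain numeric values to the log */
   put d= t=;          /* unformatted: the stored numeric values */
   put d date9. +1 t time10.1;  /* the same values with formats applied */
run;
```

Writing d and t both with and without a format is meant to show that the constants are stored as ordinary numbers, which fits their treatment as number tokens instead of literal tokens.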
Literal tokens consist
of a string of any characters enclosed in single or double quotation
marks. They can contain up to 32,767 characters and are handled as
a single unit.
Special tokens are made
up of any character or group of characters that has special meaning
in the SAS language. Examples include * / + - ; ( ) . & %
Knowing how tokenization
works helps you understand how the various parts of SAS and the macro
processor work together. Understanding differences in timing between
macro processing and SAS code compilation and execution is especially
important.
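A small sketch of that timing difference follows. The macro variable name year and the DATA step variable names are hypothetical. The point it tries to show is that a macro variable reference is resolved before the DATA step is compiled, and that a reference inside a single-quoted literal is left alone.

```sas
/* Macro statement: handled by the macro processor, not the compiler */
%let year = 2019;

data _null_;
   /* &year is resolved before compilation, so the compiler is
      handed the number token 2019 instead of the text &year */
   cutoff = &year;

   /* In a double-quoted literal the reference is resolved */
   resolved = "Report for &year";

   /* In a single-quoted literal the text is left as written */
   unresolved = 'Report for &year';

   put cutoff= / resolved= / unresolved=;
run;
```

When the step executes, the macro variable reference is already gone from the code. The compiled step contains only the substituted text, which is why changes made to a macro variable during DATA step execution do not affect code that has already been compiled.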