When processing a WHERE expression, SAS determines which
of the following access methods is likely to be most efficient:
sequential access | SAS examines all observations sequentially in their physical order.
direct access | SAS uses an index to access specific observations directly. Using an index to process a WHERE expression is referred to as optimizing the WHERE expression.
Using an index to process
a WHERE expression improves performance in some situations but not
in others. For example, it is more efficient to use an index to select
a small subset than a large subset. In addition, an index conserves
some resources at the expense of others.
Although SAS decides whether to use an index, you also play a role in
determining which access method SAS can use. When your program contains
a WHERE expression, you should determine whether sequential access or
direct access is likely to be more efficient. If direct access is likely
to be more efficient, you can make sure that an index is available by
creating a new index or by maintaining an existing index.
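As an illustration of making an index available for a WHERE expression, the following sketch builds a small test data set, creates a simple index, and asks SAS to report index usage in the log. The data set WORK.ORDERS and the variables ORDER_ID, CUST_ID, and AMOUNT are hypothetical names invented for this example, not taken from this topic.

```sas
/* Hypothetical data: 100,000 orders spread across 1,000 customer IDs. */
data work.orders;
   do order_id = 1 to 100000;
      cust_id = mod(order_id, 1000) + 1;
      amount  = round(ranuni(12345) * 500, 0.01);
      output;
   end;
run;

/* Create a simple index on CUST_ID so that direct access is possible. */
proc datasets library=work nolist;
   modify orders;
   index create cust_id;
quit;

/* MSGLEVEL=I requests INFO notes in the log, including index usage. */
options msglevel=i;

/* The WHERE expression selects a small subset (about 0.1% of the rows). */
data work.one_customer;
   set work.orders;
   where cust_id = 42;
run;
```

With MSGLEVEL=I in effect, the log should indicate whether SAS selected the index for the WHERE expression. To compare against sequential access on the same data, one option is to rerun the last step with the IDXWHERE=NO data set option on WORK.ORDERS and compare the resource statistics in the log.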
To help you make a more
effective decision about whether to create an index, this topic and
the next few topics provide you with a closer look at the following:
- steps that SAS performs for sequential access and direct access
- benefits and costs of index usage
- steps that SAS performs to determine which access method is most efficient
- factors affecting resource usage for indexed access
- guidelines for deciding whether to create, use, and maintain an index
Note: SAS can also use an index
to process a BY statement. BY processing enables you to process observations
in a specific order according to the values of one or more variables
that are specified in a BY statement. Indexing a data file enables
you to use a BY statement without sorting the data file. When you
specify a BY statement, SAS checks the value of the Sorted indicator.
If the Sorted indicator is set to NO, then SAS looks for an appropriate
index. If an appropriate index exists, the software automatically
retrieves the observations from the data file in indexed order. Using
an index to process a BY statement might not always be more efficient
than simply sorting the data file. Therefore, using an index for a
BY statement is generally for convenience, not for performance.
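The BY-statement behavior described in this note can be sketched as follows. The data set WORK.STORES and its variables are hypothetical and exist only for illustration. The data set is deliberately left unsorted, and the INDEX= data set option creates a simple index on REGION when the data set is written.

```sas
/* Hypothetical unsorted data set with a simple index on REGION. */
data work.stores(index=(region));
   input region $ store_id sales;
   datalines;
West 101 2500
East 102 1800
South 103 2100
East 104 3200
West 105 1500
;
run;

/* Request INFO notes so that the log reports index usage. */
options msglevel=i;

/* No PROC SORT step: SAS can use the index to return observations */
/* in REGION order for BY-group processing.                        */
proc print data=work.stores;
   by region;
run;
```

Because no PROC SORT step runs, this approach is convenient, but retrieving every observation through the index is not guaranteed to be faster than sorting the data set first.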