Typically, data comes
from multiple sources and might be in different formats. Many applications
require input data to be in a specific format before the data can
be processed. Although application requirements vary, there are common
factors for all applications that access, combine, and process data.
You can identify these common factors for your data. Here are tasks
to help you start:
- Determine how the input data is related.
- Ensure that the data is properly sorted or indexed, if necessary.
- Select the appropriate access method to process the input data.
- Select the appropriate SAS tools to complete the task.
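As a minimal sketch of the sorting and indexing task, the following program sorts a data set by a common variable and then creates a simple index on that variable. The data set WORK.EMPLOYEES and the variables EmpID, Name, and Dept are hypothetical names chosen for illustration.

```sas
/* Hypothetical input data set, deliberately out of EmpID order */
data work.employees;
   input EmpID Name $ Dept $;
   datalines;
103 Chen HR
101 Alvarez IT
102 Brown IT
;
run;

/* Sort the data set in place by the common variable */
proc sort data=work.employees;
   by EmpID;
run;

/* Create a simple index on the same variable */
proc datasets library=work nolist;
   modify employees;
      index create EmpID;
run;
quit;
```

The sort comes before the index here because SAS does not sort an indexed data set in place unless you specify the FORCE option. In practice you often need only one of the two: either the sort order or the index can satisfy a BY statement.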
You can use the CONTENTS,
DATASETS, and PRINT procedures to review the structure of your data.
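A short sketch of how these three procedures might be used together follows. It assumes a small hypothetical data set, WORK.EMPLOYEES, which the first step creates so that the example runs on its own.

```sas
/* Hypothetical data set to inspect */
data work.employees;
   input EmpID Name $ Dept $;
   datalines;
101 Alvarez IT
102 Brown IT
103 Chen HR
;
run;

/* Describe variables, types, lengths, sort information, and indexes */
proc contents data=work.employees;
run;

/* PROC DATASETS can report the same descriptor information */
proc datasets library=work nolist;
   contents data=employees;
run;
quit;

/* Print the first few observations to inspect the actual values */
proc print data=work.employees (obs=5);
run;
```

PROC CONTENTS and PROC DATASETS describe the structure of a data set; PROC PRINT shows the values themselves, which helps when you check whether common variables really share values.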
Relationships among
multiple sources of input data exist when each of the sources contains
common data, either at the physical or logical level. For example,
employee data and department data could be related through an employee
ID variable that shares common values. Another data set could contain
numeric sequence numbers whose partial values logically relate it
to a separate data set by observation number.
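The employee and department example can be sketched as two small data sets that share the variable EmpID and are combined with a match-merge. The data set names, variable names, and values are hypothetical; the point is only that the common variable carries the physical relationship between the two sources.

```sas
/* Hypothetical employee data: one observation per employee */
data work.employees;
   input EmpID Name $;
   datalines;
101 Alvarez
102 Brown
103 Chen
;
run;

/* Hypothetical department data that shares the EmpID variable */
data work.deptdata;
   input EmpID Dept $;
   datalines;
101 IT
102 IT
103 HR
;
run;

/* Both inputs are already in EmpID order, so they can be match-merged */
data work.combined;
   merge work.employees work.deptdata;
   by EmpID;
run;
```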
You must be able to
identify the existing relationships in your data. This knowledge is
crucial for understanding how to process input data in order to produce
desired results. All related data falls into one of these four categories,
characterized by how observations relate among the data sets:
- one-to-one
- one-to-many
- many-to-one
- many-to-many
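To decide which category a pair of data sets falls into, one approach is to count how many observations share each value of the common variable in each data set. The sketch below does this with PROC SQL for two hypothetical data sets, WORK.DEPTS and WORK.STAFF, related through a hypothetical variable named Dept; the macro variable names are also illustrative.

```sas
/* Hypothetical inputs: one observation per department in DEPTS,
   possibly several employees per department in STAFF */
data work.depts;
   input Dept $ DeptName $;
   datalines;
HR Personnel
IT Systems
;
run;

data work.staff;
   input Dept $ EmpID;
   datalines;
HR 103
IT 101
IT 102
;
run;

proc sql noprint;
   /* Largest number of observations that share one Dept value */
   select max(n) into :max_depts trimmed
      from (select Dept, count(*) as n from work.depts group by Dept);
   select max(n) into :max_staff trimmed
      from (select Dept, count(*) as n from work.staff group by Dept);
quit;

%put DEPTS: at most &max_depts observation(s) per Dept value;
%put STAFF: at most &max_staff observation(s) per Dept value;
```

Here DEPTS has at most one observation per Dept value and STAFF has up to two, so the relationship from DEPTS to STAFF is one-to-many. If both maximums were 1, the relationship would be one-to-one; if both exceeded 1, it would be many-to-many.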
Finally, to obtain the
desired results, you should understand how each method of combining data sets
combines observations and how each treats duplicate, missing, or unmatched
values of common variables. Some of the methods require that you preprocess
your data sets by sorting them or creating indexes. Testing your program on a
small sample of your data is a good first step.
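As a sketch of such a test, the program below match-merges two small hypothetical data sets and uses the IN= data set option to separate matched observations from unmatched ones. EmpID 104 has no department record and EmpID 105 has no employee record, so you can see how the merge fills the missing side with missing values. All names and values are assumptions for illustration.

```sas
/* Hypothetical inputs, both already sorted by EmpID */
data work.employees;
   input EmpID Name $;
   datalines;
101 Alvarez
102 Brown
104 Diaz
;
run;

data work.deptdata;
   input EmpID Dept $;
   datalines;
101 IT
102 IT
105 HR
;
run;

/* IN= variables show which input contributed to each observation */
data work.matched work.unmatched;
   merge work.employees (in=inEmp) work.deptdata (in=inDept);
   by EmpID;
   if inEmp and inDept then output work.matched;
   else output work.unmatched;
run;

/* Unmatched observations have missing values for the other input's variables */
proc print data=work.unmatched;
run;
```

Running a check like this on a few observations before processing the full data sets makes it easier to confirm that the chosen method handles unmatched values the way you expect.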