When you submit a DATA step with a MERGE, UPDATE, or SET statement
and the output data set already exists, SAS creates a second copy of
the output data set. Once execution is complete, SAS deletes the original
copy of the data set. As a result, the original data set is replaced
by the new data set with the same name. The new data set can contain
a different set of variables than the original data set. The attributes
of the variables in the new data set can be different from those of
the original data set.
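The replacement behavior described above can be sketched as follows. The data set WORK.INVENTORY and its variables are hypothetical names chosen only for illustration.

```sas
/* Hypothetical data set, made up for this sketch */
data work.inventory;
   input partno $ qty;
   datalines;
A100 10
B200 25
;
run;

/* Output name matches an existing data set: SAS builds a new copy
   and replaces the original when the step completes successfully */
data work.inventory;
   set work.inventory;
   length status $ 8;                     /* new variable */
   status = ifc(qty < 20, 'LOW', 'OK');   /* written to the new copy */
run;
```

After the second step, WORK.INVENTORY should contain the added variable STATUS, because the data set was rewritten rather than updated in place.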
In contrast, when you submit a DATA step with a MODIFY statement,
the input and output data sets must be the same. SAS does not create
a second copy of the data, but updates the data set in place. New
variables can be added to the program data vector (PDV), but they
are not written to the data set. Therefore, the set of variables in
the data set does not change when the data is modified.
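As a minimal sketch of in-place updating, assuming a hypothetical data set WORK.INVENTORY with the variables PARTNO and QTY:

```sas
/* Hypothetical data set, made up for this sketch */
data work.inventory;
   input partno $ qty;
   datalines;
A100 10
B200 25
;
run;

/* Input and output are the same data set; no second copy is created */
data work.inventory;
   modify work.inventory;
   qty = qty + 5;    /* existing variable: updated in place */
   checked = 1;      /* new variable: exists only in the PDV */
run;
```

Under the behavior described above, QTY changes in every observation, while CHECKED does not appear in WORK.INVENTORY after the step.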
When you use the MODIFY statement, there is an implied REPLACE
statement at the bottom of the DATA step instead of an OUTPUT
statement. Using the MODIFY statement, you can update the following:
- every observation in a data set
- observations using a transaction data set and a BY statement
- observations located using an index
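The last two forms in the list above might look like the following sketch. The data sets WORK.STOCK and WORK.ADDINV and their variables are hypothetical, and every transaction key is assumed to exist in the master data set; real programs usually also check the automatic variable _IORC_ to handle unmatched keys.

```sas
/* Hypothetical master data set with a simple index on PARTNO */
data work.stock(index=(partno));
   input partno $ instock;
   datalines;
A100 10
B200 25
C300 40
;
run;

/* Hypothetical transaction data set */
data work.addinv;
   input partno $ nwstock;
   datalines;
A100 5
C300 2
;
run;

/* Matching access: transaction data set plus a BY statement */
data work.stock;
   modify work.stock work.addinv;
   by partno;
   instock = instock + nwstock;
run;

/* Indexed access: read a transaction, then locate the master
   observation through the index on PARTNO */
data work.stock;
   set work.addinv;
   modify work.stock key=partno;
   instock = instock + nwstock;
run;
```

Both steps rely on the implied REPLACE at the bottom of the DATA step to write the changed observation back to WORK.STOCK.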
CAUTION:
If the system terminates abnormally while a DATA step that is using
the MODIFY statement is processing, you can lose data and possibly
damage your master data set. You can recover from the failure by
doing the following:
- restoring the master data set from a backup and restarting the
  step, or
- keeping an audit trail file and using it to determine which master
  observations have been updated.
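One way to keep an audit trail is the AUDIT statement in PROC DATASETS. The following is a sketch under assumptions, not a complete recovery procedure: WORK.STOCK is a hypothetical master data set, and the audit trail must be initiated before the MODIFY step runs.

```sas
/* Hypothetical master data set, made up for this sketch */
data work.stock;
   input partno $ instock;
   datalines;
A100 10
B200 25
;
run;

/* Start logging changes to WORK.STOCK in an audit file */
proc datasets library=work nolist;
   audit stock;
   initiate;
run;
quit;

/* In-place updates made from now on are recorded in the audit file */
data work.stock;
   modify work.stock;
   instock = instock + 1;
run;

/* Read the audit file to see which observations were updated */
proc print data=work.stock(type=audit);
run;
```

The audit file carries the data set variables plus audit variables whose names begin with _AT, such as _ATOPCODE_ and _ATDATETIME_, which can help identify the observations that were changed before a failure.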
First, we consider using the MODIFY statement to modify all the
observations in a data set.