Using SAS DATA Step Views to Conserve Data Storage Space

Overview

Another way to save disk space is to leave your data in its original location and use a SAS data view to access it. First we examine a SAS data view and how it compares to a SAS data file.
A SAS data file and a SAS data view are both types of SAS data sets. A SAS data file contains both descriptor information and the data. By contrast, a SAS data view, contains only descriptor information and instructions on how to retrieve data stored elsewhere.
SAS data view
The main difference between SAS data files and SAS data views is where the data is stored. A SAS data file contains data, and a SAS data view does not contain data. Data views can be particularly useful if you are working with data that changes often.
Suppose you have a flat file that you read into a SAS data file. If the data in the flat file changes, you need to update the data file to reflect those changes. However, suppose you use a SAS data view instead of a SAS data file to access the flat file. You do not need to update the SAS data view when the data in your flat file changes, because each time you reference the view that it accesses the most recent data in your flat file.
In most cases, you can use a SAS data view as if it were a SAS data file, although there are a few things to keep in mind when you are working with data views.
Note: There are multiple types of SAS data views. This chapter discusses only DATA step views. To learn more about PROC SQL views, see Creating and Managing Views Using PROC SQL. For more information about SAS data views and SAS data files, see the SAS documentation.
Now look at DATA step views.

DATA Step Views

A DATA step view contains a partially compiled DATA step that can read data from a variety of sources, such as these:
  • raw data files
  • SAS data files
  • PROC SQL views
  • SAS/ACCESS views
  • DB2, ORACLE, or other DBMS data
A DATA step view can be created only in a DATA step. A DATA step view cannot contain global statements, host-specific data set options, or most host-specific FILE and INFILE statement options. Also, a DATA step view cannot be indexed or compressed.
You can use DATA step views to do the following:
  • always access the most current data in changing files
  • avoid storing a copy of a large data file
  • combine data from multiple sources
The compiled DATA step does not use much room for storage, so you can create DATA step views to conserve disk space. On the other hand, use of DATA step views can increase CPU usage because SAS must execute the stored DATA step each time you use the view.
To create a DATA step view, specify the VIEW= option after the final DATA set name in the DATA statement.
General form, DATA step to create a DATA step view:
DATA SAS-data-view <SAS-data-file-1 ... SAS data-file-n> /
VIEW=SAS-data-view;
<SAS statements>
RUN;
Here is an explanation of the syntax:
SAS-data-view
names the data view to be created.
SAS-data-file-1 ... SAS-data-file-n
is an optional list that names any data files to be created.
SAS statements
includes other DATA step statements to create the data view and any data files that are listed in the DATA statement.
The VIEW= option tells SAS to compile, but not to execute, the source program and to store the compiled code in the DATA step view that is named in the option.
Note: If you specify additional data files in the DATA statement, SAS creates these data files when the view is processed in a subsequent DATA or PROC step. Therefore, you need to reference the data view before you attempt to reference the data files in later steps.

Example

The following program creates a DATA step view named Company.Newdata that reads from the file that is referenced by the fileref in the INFILE statement.
data company.newdata / view=company.newdata;
   infile <fileref>;
   <DATA step statements>
run;

The DESCRIBE Statement

DATA step views retain source statements. You can retrieve these statements by using the DESCRIBE statement. The following example uses the DESCRIBE statement in a DATA step to write a copy of the source code for the data view Company.Newdata to the SAS log:
data view=company.newdata;
   describe;
run;

Creating and Referencing a SAS DATA Step View

In order to use DATA step views successfully, you need to understand what happens when you create and reference one.
When you create a DATA step view, the following actions occur:
  • The DATA step is partially compiled.
  • The intermediate code is stored in the specified SAS library with a member type of VIEW.
Company Library
You reference a DATA step view in the same way that you reference a data file. When you reference the view in a subsequent DATA or PROC step, the following actions occur:
  • The compiler resolves the intermediate code and generates executable code for the host environment.
  • The generated code is executed as the DATA or PROC step requests observations.
Extended file
You can use a DATA step view as you would use any other SAS data set, with the exception that you cannot write to the view except under very specific circumstances. Also, you should keep in mind that a SAS data view reads from its source files each time it is used. Therefore, if the data changes, the results change. Likewise, if the structure of the data that a view accesses changes, you probably need to alter the view in order to account for this change.
Note: The OBSBUF= data set option enables you to specify how many observations to read at one time from the source data for the DATA step view. The default size of the view buffer is 32K, which means that the number of observations that can be read into the view buffer at one time depends on the observation length. If the observation length is larger than 32K, then only one observation can be read into the buffer at a time.
Remember that although data views conserve data storage space, processing them can require more resources than processing a data file. Look at a few situations where using a data view can adversely affect processing efficiency.

Referencing a Data View Multiple Times in One Program

SAS executes a view each time it is referenced, even within one program. Therefore, if data is used many times in one program, it is more efficient to create and reference a temporary SAS data file than to create and reference a view.

Example

Instead of referencing a data view in each step in the program, you could add a DATA step to the beginning of the program to create a temporary data file and read the data view into it. Then you could reference the temporary data set in each of the subsequent steps. By referencing the temporary data file rather than the data view in each of the PROC steps, SAS executes the data view only once instead of multiple times.
There are other reasons why extracting data to a temporary data file is a good idea. Suppose you submit this code and it takes a long time to run. If a flat file that is referenced by a view changes while your code is running, you have inconsistent results unless you create a SAS data file before submitting the PROC PRINT, PROC FREQ, and PROC MEANS steps, and reference the data file in your program.

Making Multiple Passes through Data in a Data View

Expect a degradation in performance when you use a SAS data view with a procedure that requires multiple passes through the data. When multiple passes are requested, the view must build a cache (spill file) that contains all generated observations. Then SAS reads the data in the spill file on each of the multiple passes through the data in order to ensure that subsequent passes read the same data that was read by previous passes.
For example, the UNIFORM option of the PRINT statement makes all the columns consistent from page to page by determining the longest value of each variable. In order to do this, SAS must make two passes through the data: one pass to find the longest value of each variable, and a second pass to print the data. If you use the UNIFORM option to print a data view, SAS creates a spill file as it generates observations from the view. Then SAS makes two passes through the observations in the spill file.
Spill file
Note: Some statistical procedures pass through the data more than once.

Creating Data Views on Unstable Data

Avoid creating views on files whose structures often change. If the view describes the structure of a raw data file, you need to change the view each time the file changes.
For example, suppose you create a view that combines the data file Company.Roster with the data file Company.Demog. Roster contains the variables LastName and FirstName, and Company.Demog contains the variables LastName, Address, and Age, as shown below.
Company.Roster
Suppose that both Company.Roster and Company.Demog are sorted by LastName. You could use a MERGE statement to combine these two data files into a view named Company.Roster_View, as shown below.
data company.roster_view/view=company.roster_view;
   merge company.roster company.demog;
   by lastname;
run;
Now suppose Company.Roster changes so that LastName is named Surname. Your data view must also be updated.
Company.Roster
data roster_view/view=roster_view;
   merge company.roster company.demog(rename=(LastName=Surname));
   by lastname;
run;
If Company.Roster changed again so that Surname and FirstName were combined into one variable called FullName, the code for your data view would need additional changes. Although this is a simple example, you can see that a data view that is based on unstable data requires additional maintenance work.

Comparative Example: Creating and Reading a SAS Data View

Overview

Suppose you have two SAS data sets, Retail.Custview and Retail.Custdata, that have been created from the same raw data file. Retail.Custview is a DATA step view, and Retail.Custdata is a data file. You can use these two data sets to compare the disk space that is required for each as well as the resources that are used to read from each view or file.
The following sample programs compare two techniques. You can use these samples as models for creating benchmark programs in your own environment. Your results might vary depending on the structure of your data, your operating environment, and the resources that are available at your site.

Programming Techniques

1 Data View
This program reads data from a raw data file, creates a SAS DATA step view named Retail.Custview, and then reads from the new DATA step view. The first DATA step creates the data view Retail.Custview. The second DATA step reads from the DATA step view.
data retail.custview / view = retail.custview;
   infile flat1;
   input @1   Customer_ID         12.
         @13  Country             $2.
         @15  Gender              $1.
         @16  Personal_ID        $15.
         @31  Customer_Name      $40.
         @71  Customer_FirstName $20.
         @91  Customer_LastName  $30.
         @121 Birth_Date       date9.
         @130 Customer_Address   $45.
         @175 Street_ID           12.
         @199 Street_Number       $8.
         @207 Customer_Type_ID     8.;
 run;
	 
data _null_;
   set retail.custview;
run;
2 Data File
This program reads data from a raw data file, creates a SAS data file named Retail.Custdata, and reads from the new SAS data file. The first DATA step creates the data file Retail.Custdata. The second DATA step reads from the data file.
data retail.custdata;
   infile flat1;
   input @1   Customer_ID         12.
         @13  Country             $2.
         @15  Gender              $1.
         @16  Personal_ID        $15.
         @31  Customer_Name      $40. 
         @71  Customer_FirstName $20.
         @91  Customer_LastName  $30.
         @121 Birth_Date       date9.
         @130 Customer_Address   $45.
         @175 Street_ID           12.
         @199 Street_Number       $8.
         @207 Customer_Type_ID     8.;
run;   
	 
data _null_;
   set retail.custdata;
run;

General Recommendations

  • Create a SAS DATA step view to avoid storing a SAS copy of a raw data file.
  • Use a SAS DATA step view if the content, but not the structure, of the flat file is dynamic.
  • Create a DATA step view to combine multiple SAS data sets with a merge or concatenation.
  • Create a DATA step view to access frequently used subsets.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset