Chapter 20. Controlling Memory Usage

Overview

Introduction

As you have learned, there is no single set of programming techniques that is most efficient or appropriate in all situations. However, if reducing execution time is an important consideration in your computing environment, one way of achieving that goal is to reduce the number of times SAS has to read from or write to the storage medium.

In this chapter you learn to use options and a statement to control the size and number of data buffers, which in turn can affect your programs' execution times by reducing the number of I/O operations that SAS must perform.

Note

This chapter does not cover the SAS Scalable Performance Data Engine (SAS SPD Engine), which is a SAS 9.1 technology for threaded processing. For details about using the SAS SPD Engine to improve performance, see the SAS documentation. Δ

Objectives

In this chapter, you learn to

  • control the amount of data that is loaded into memory with each I/O transfer

  • reduce I/O by holding a SAS data file in memory through multiple steps of a program.

Prerequisites

Before beginning this chapter, you should complete the following chapters:

  • Part 1: SQL Processing with SAS

    • Chapter 1, "Performing Queries Using PROC SQL"

    • Chapter 2, "Performing Advanced Queries Using PROC SQL"

    • Chapter 3, "Combining Tables Horizontally Using PROC SQL"

    • Chapter 4, "Combining Tables Vertically Using PROC SQL"

    • Chapter 5, "Creating and Managing Tables Using PROC SQL"

    • Chapter 6, "Creating and Managing Indexes Using PROC SQL"

    • Chapter 7, "Creating and Managing Views Using PROC SQL"

    • Chapter 8, "Managing Processing Using PROC SQL"

  • Part 3: Advanced SAS Programming Techniques

    • Chapter 13, "Creating Samples and Indexes"

    • Chapter 14, "Combining Data Vertically"

    • Chapter 15, "Combining Data Horizontally"

    • Chapter 16, "Using Lookup Tables to Match Data"

    • Chapter 17, "Formatting Data"

    • Chapter 18, "Modifying SAS Data Sets and Tracking Changes"

  • Part 4: Optimizing SAS Programs

    • Chapter 19, "Introduction to Efficient SAS Programming."

Controlling Page Size and the Number of Buffers

Measuring I/O

Improvement in I/O can come at the cost of increased memory consumption. In order to understand the relationship between I/O and memory, it is helpful to know when data is copied to a buffer and where I/O is measured. When you create a SAS data set using a DATA step,

  1. SAS copies the data from the input data set to a buffer in memory

  2. one observation at a time is loaded into the program data vector

  3. each observation is written to an output buffer when processing is complete

  4. the contents of the output buffer are written to the disk when the buffer is full.
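
As a rough illustration, the four steps above can be mapped onto a minimal DATA step. This is only a sketch: Work.Orders_copy is a hypothetical output name, and Company.Orders_fact is assumed to exist as it does in the examples later in this chapter.

```sas
/* Steps 3 and 4: each finished observation goes to the output buffer, */
/* and the buffer is written to disk when it is full.                  */
data work.orders_copy;
   /* Steps 1 and 2: pages of the input data set are copied to a       */
   /* buffer, and one observation at a time is loaded into the program */
   /* data vector.                                                     */
   set company.orders_fact;
run;
```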

The process for reading external files is similar. However, each record is first read into the input buffer before the data is parsed and read into the program data vector.

In both cases, I/O is measured when the input data is copied to the buffer in memory and when it is written from the output buffer to the output data set.

Page Size

Think of a buffer as a container in memory that is big enough for only one page of data. A page

  • is the unit of data transfer between the storage device and memory

  • is fixed in size when the data set is created, either to a default value or to a user-specified value.

The amount of data that can be transferred to one buffer in a single I/O operation is referred to as page size. Page size is analogous to buffer size for SAS data sets.

A larger page size can reduce execution time by reducing the number of times SAS has to read from or write to the storage medium. However, the improvement in execution time comes at the cost of increased memory consumption.

Reporting Page Size

You can use the CONTENTS procedure or the CONTENTS statement in the DATASETS procedure to report the page size and the number of pages.

The total number of bytes that a data file occupies equals the page size multiplied by the number of pages. For example, the page size for Company.Order_fact is 8192 and the number of pages is 9423. Therefore, the data file occupies 77,193,216 bytes.
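
The following sketch shows one way to produce this report and to check the arithmetic. It assumes that the Company library is already assigned and contains Order_fact. The layout of the report, and the exact labels used for page size and number of pages, depend on your operating environment.

```sas
/* Report data set attributes, including page size and number of pages */
proc contents data=company.order_fact;
run;

/* A similar report from the CONTENTS statement in PROC DATASETS */
proc datasets library=company nolist;
   contents data=order_fact;
quit;

/* Check the calculation: page size multiplied by number of pages */
data _null_;
   page_size   = 8192;
   n_pages     = 9423;
   total_bytes = page_size * n_pages;   /* 77,193,216 */
   put total_bytes= comma14.;
run;
```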

Note

The information that is available from PROC CONTENTS depends on the operating environment. Δ

Note

Page size is analogous to buffer size for SAS data sets. Δ

Note

In uncompressed data files, there is a 16-byte overhead at the beginning of each page and a 1-bit per observation overhead (rounded up to the nearest byte), used to denote an observation's status as deleted or not deleted, at the end of each page.

You can learn about the structure of uncompressed and compressed data files in Chapter 21, "Controlling Data Storage Space," on page 705. Δ

Using the BUFSIZE= Option

To select a default page size, SAS uses an algorithm that is based on observation length, engine, and operating environment. The default page size is optimal for most SAS activities, especially on computers that are supporting multiple SAS jobs concurrently. However, in some cases, choosing a page/buffer size that is larger than the default can speed up execution time by reducing the number of times that SAS must read from or write to the storage medium.

You can use the BUFSIZE= system option or data set option to control the page size of an output SAS data set. BUFSIZE= specifies not only the page size (in bytes), but also the size of each buffer that is used for reading or writing the SAS data set. The new buffer size is a permanent attribute of the data set. After it is specified, it is used whenever the data set is processed.

Caution

The MIN setting for BUFSIZE= might cause unexpected results and should be avoided. Use BUFSIZE=0 to reset the buffer page size to the default value in your operating environment. Δ

Note

The syntax that is shown here applies to the OPTIONS statement. On the command line or in a configuration file, the syntax is specific to your operating environment. For details, see the SAS documentation for your operating environment. Δ

Only certain page/buffer size values are valid for each operating environment. If you request an invalid value for your operating environment, SAS automatically rounds up to the next valid page/buffer size. BUFSIZE=0 is interpreted as a request for the default page/buffer size.

In the following program, the BUFSIZE= system option specifies a page size of 30720 bytes.

options bufsize=30720;
filename orders 'c:orders.dat';
data company.orders_fact;
      infile orders;
      /* more SAS code */
run;

Before you change the default page size, it is important to consider the access pattern for the data as well as the I/O transfer rate of the underlying hardware. In some cases, increasing the page size might degrade performance, particularly when the data is processed using direct (random) access.

Note

The default value for BUFSIZE= is determined by your operating environment and is set to optimize sequential access. To improve performance for direct access, you should change the value for BUFSIZE=. For the default setting and possible settings for direct access, see the BUFSIZE= system option in the SAS documentation for your operating environment. Δ

Note

You can override the BUFSIZE= system option by using the BUFSIZE= data set option. Δ
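
As a minimal sketch of this override, the following program sets a session-wide value and then requests a different page size for one output data set. Work.Orders_big is a hypothetical name, and the value 61440 is only an illustration. If it is not valid in your operating environment, SAS rounds it up to the next valid size.

```sas
options bufsize=30720;                 /* default for data sets created in this session */

/* The data set option takes precedence for this output data set only */
data work.orders_big (bufsize=61440);
   set company.orders_fact;
run;

options bufsize=0;                     /* return to the operating environment default */
```

Because page size is a permanent attribute, you can confirm the result by running PROC CONTENTS on the new data set.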

Caution

If you use the COPY procedure to copy a data set to a library that is accessed via a different engine, the original page/buffer size is not necessarily retained. Δ

Using the BUFNO= Option

You can use the BUFNO= system or data set option to control the number of buffers that are available for reading or writing a SAS data set. By increasing the number of buffers, you can control how many pages of data are loaded into memory with each I/O transfer.

Note

Increasing the number of buffers might not affect performance under the Windows and UNIX operating environments, especially when you work with large data sets. By default, the Windows and UNIX operating environments read one buffer at a time. Under the SAS 9 Windows environment, you can override this default by turning on the SGIO system option when you invoke SAS. For details on the SGIO system option, see the SAS documentation for the Windows operating environment. Δ

The following techniques might help to minimize I/O consumption:

  • When you work with a small data set, allocate as many buffers as there are pages in the data set so that the entire data set can be loaded into memory. This technique is most effective if you read the same observations several times during processing.

  • Under the z/OS operating environment, increase the number of buffers allocated, rather than the size of each buffer, as the size of the data set increases.

Caution

The recommended maximum value for the BUFNO= option is 10. Δ

Note

The syntax that is shown here applies to the OPTIONS statement. On the command line or in a configuration file, the syntax is specific to your operating environment. For details, see the SAS documentation for your operating environment. Δ

In the following program, the BUFNO= system option specifies that 4 buffers are available.

options bufno=4;
filename orders 'c:orders.dat';
data company.orders_fact;
      infile orders;
      /* more SAS code */
run;
proc print data=company.orders_fact;
run;

The buffer number is not a permanent attribute of the data set and is valid only for the current step or SAS session.

Figure 20.1. Current SAS Session

Note

You can override the BUFNO= system option by using the BUFNO= data set option. Δ
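
A short sketch of this override follows. The BUFNO= data set option applies only to the step in which it appears, so the second step falls back to the system option value. The OBS=20 data set option is added here only to keep the listing short.

```sas
options bufno=4;                       /* session-wide setting */

/* BUFNO=10 applies only while this step reads the data set */
proc print data=company.orders_fact (bufno=10 obs=20);
run;

/* This step uses the system option value, BUFNO=4 */
proc means data=company.orders_fact;
run;
```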

Note

In SAS 9 and later, the BUFNO= option has no effect on thread-enabled procedures under the z/OS operating environment. Δ

The product of BUFNO= and BUFSIZE=, rather than the specific value of either option, determines how much data can be transferred in one I/O operation. Increasing the value of either option increases the amount of data that can be transferred in one I/O operation.

BUFSIZE    BUFNO    Bytes Transferred in One I/O Operation
6144       2        12,288
6144       10       61,440
30,720     2        61,440
30,720     10       307,200
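
The values in this table can be reproduced with a short DATA step that multiplies each buffer size by each number of buffers. Work.Transfer_size is a hypothetical data set name used only for this illustration.

```sas
/* One row per combination of buffer size and number of buffers */
data work.transfer_size;
   do bufsize = 6144, 30720;
      do bufno = 2, 10;
         bytes_per_io = bufsize * bufno;
         output;
      end;
   end;
run;

proc print data=work.transfer_size noobs;
   format bytes_per_io comma12.;
run;
```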

The number of buffers and the buffer size have a minimal effect on CPU usage.

Comparative Example: Using the BUFSIZE= Option and the BUFNO= Option

Suppose you want to compare the resource usage when a data set is read using different buffer sizes and a varying number of buffers. The sample programs compare the following combinations of settings for the BUFSIZE= option and the BUFNO= option.

  1. BUFSIZE=6144, BUFNO=2

  2. BUFSIZE=6144, BUFNO=5

  3. BUFSIZE=6144, BUFNO=10

  4. BUFSIZE=12288, BUFNO=2

  5. BUFSIZE=12288, BUFNO=5

  6. BUFSIZE=12288, BUFNO=10

You can use these samples as models for creating benchmark programs in your own environment. Your results might vary depending on the structure of your data, your operating environment, and the resources that are available at your site. General recommendations for controlling page size and the number of buffers are also provided.

Note

6144 bytes is the default page size under the z/OS operating environment. Δ

Programming Techniques
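
The following is one possible sketch of such a benchmark, not the original sample programs. It assumes that Company.Orders_fact exists. The names Work.Orders_6k, Work.Orders_12k, and the macro %readtest are hypothetical. Because page size is fixed when a data set is created, the sketch first builds one copy of the data for each BUFSIZE= value and then reads each copy with each BUFNO= value. The FULLSTIMER system option writes detailed resource statistics for each step to the SAS log.

```sas
options fullstimer;   /* detailed resource statistics in the SAS log */

/* Build one copy of the data per page size */
data work.orders_6k  (bufsize=6144)
     work.orders_12k (bufsize=12288);
   set company.orders_fact;
run;

/* Read a data set sequentially with a given number of buffers */
%macro readtest(ds=, bufno=);
   data _null_;
      set &ds (bufno=&bufno);
   run;
%mend readtest;

%readtest(ds=work.orders_6k,  bufno=2)    /* setting 1 */
%readtest(ds=work.orders_6k,  bufno=5)    /* setting 2 */
%readtest(ds=work.orders_6k,  bufno=10)   /* setting 3 */
%readtest(ds=work.orders_12k, bufno=2)    /* setting 4 */
%readtest(ds=work.orders_12k, bufno=5)    /* setting 5 */
%readtest(ds=work.orders_12k, bufno=10)   /* setting 6 */
```

Compare the statistics that the log reports for each DATA _NULL_ step. Running each test more than once helps to separate the effect of the settings from normal variation on your system.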

General Recommendations

  • To reduce I/O operations on a small data set, allocate as many buffers as there are pages in the data set so that the entire data set can be loaded into memory. This technique is most effective if you read the same observations several times during processing.

  • Under the z/OS operating environment, as the size of the data set increases, increase the number of buffers allocated, rather than the size of each buffer, to minimize I/O consumption.

Using the SASFILE Statement

Another way of improving performance is to use the SASFILE statement to hold a SAS data file in memory so that the data is available to multiple program steps. Keeping the data file open reduces open/close operations, including the allocation and freeing of memory for buffers.

The SASFILE statement opens a SAS data file and allocates enough buffers to hold the entire file in memory. Once the data file is read, the data is held in memory, and it is available to subsequent DATA and PROC steps or applications until either

  • a SASFILE CLOSE statement frees the buffers and closes the file

  • the program ends, which automatically frees the buffers and closes the file.

In the following program, the SASFILE statement opens the SAS data file Company.Sales, allocates the buffers, and reads the data into memory.

sasfile company.sales load; 
proc print data=company.sales;
      var Customer_Age_Group;
run;
proc tabulate data=company.sales;
      class Customer_Age_Group;
      var Customer_BirthDate;
      table Customer_Age_Group,Customer_BirthDate*(mean median);
run;
sasfile company.sales close; 

Note

The SASFILE statement can also be used to reduce CPU time and I/O in SAS programs that repeatedly read one or more SAS data views. Use a DATA step to create a SAS data file in the Work library that contains the view's result set. Then use the SASFILE statement to load that data file into memory. Δ
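
A minimal sketch of this technique follows. Company.Sales_v is a hypothetical SAS data view, Work.Sales_snapshot is a hypothetical name for the data file that holds its result set, and the view is assumed to contain the variable Customer_Age_Group.

```sas
/* Execute the view once and store its result set as a data file */
data work.sales_snapshot;
   set company.sales_v;
run;

/* Hold the data file in memory for the steps that follow */
sasfile work.sales_snapshot load;

proc means data=work.sales_snapshot;
run;

proc freq data=work.sales_snapshot;
   tables Customer_Age_Group;
run;

sasfile work.sales_snapshot close;
```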

Note

Though a file that is opened with the SASFILE statement can be used for subsequent input or update processing, it cannot be used for subsequent utility or output processing. For example, you cannot replace the file or rename its variables. Δ

Guidelines for Using the SASFILE Statement

When the SASFILE statement executes, SAS allocates the number of buffers based on the number of pages for the data file and index file. If the file in memory increases in size during processing because of changes or additions to the data, the number of buffers also increases.

It is important to note that I/O processing is reduced only if there is sufficient real memory. If there is not sufficient real memory, the operating environment might

  • use virtual memory

  • use the default number of buffers.

If SAS uses virtual memory, there might be a degradation in performance.

If you need to repeatedly process part of a SAS data file and the entire file will not fit into memory, use a DATA step with the SASFILE statement to create a subset of the file that does fit into memory, and then process that subset repeatedly. This saves CPU time in the processing steps because those steps will read a smaller file, in addition to the benefit of the file being resident in memory.
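
The following sketch illustrates the subsetting approach with the Company.Sales file from the earlier example. Work.Sales_subset is a hypothetical name. The WHERE value '15-30 years' is also hypothetical and assumes that Customer_Age_Group is a character variable.

```sas
/* Keep only the rows and columns that the reports need */
data work.sales_subset;
   set company.sales (keep=Customer_Age_Group Customer_BirthDate);
   where Customer_Age_Group = '15-30 years';
run;

/* Load the smaller file into memory and process it repeatedly */
sasfile work.sales_subset load;

proc print data=work.sales_subset;
run;

proc means data=work.sales_subset;
   var Customer_BirthDate;
run;

sasfile work.sales_subset close;
```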

Note

When using a SASFILE statement, monitor the paging activity (the I/O activity that is done by the virtual memory management subsystem of your operating environment) while your program runs. If the paging activity increases substantially, consider keeping less data in memory and using techniques described elsewhere in this course to reduce memory requirements. Δ

Comparative Example: Using the SASFILE Statement

Suppose you want to create multiple reports from SAS data files that vary in size. Using small, medium, and large data files, you can compare the resource usage when the PRINT, TABULATE, MEANS, and FREQ procedures are used with and without the SASFILE statement to create reports.

Name of Data File    Number of Rows    Page Size    Number of Pages    Number of Bytes
Retail.Small         45,876            24,576       540                13,279,232
Retail.Medium        458,765           24,576       5,398              132,669,440
Retail.Large         4,587,654         24,576       53,973             1,326,448,640

  1. Small Data File without the SASFILE Statement

  2. Medium Data File without the SASFILE Statement

  3. Large Data File without the SASFILE Statement

  4. Small Data File with the SASFILE Statement

  5. Medium Data File with the SASFILE Statement

  6. Large Data File with the SASFILE Statement.

The sample programs for this comparison apply each of these techniques. You can use these samples as models for creating benchmark programs in your own environment. Your results might vary depending on the structure of your data, your operating environment, and the resources that are available at your site. General recommendations for using the SASFILE statement are also provided.

Programming Techniques

General Recommendations

  • If you need to repeatedly process a SAS data file that will fit entirely in memory, use the SASFILE statement to reduce I/O and some CPU usage.

  • If you use the SASFILE statement and the SAS data file will not fit entirely in memory, the code will execute, but there might be a degradation in performance.

  • If you need to repeatedly process part of a SAS data file and the entire file will not fit into memory, use a DATA step with the SASFILE statement to create a subset of the file that does fit into memory, and then process that subset repeatedly. This saves CPU time in the processing steps because those steps will read a smaller file, in addition to the benefit of the file being resident in memory.

Additional Features

Using the IBUFSIZE= System Option

Beginning with SAS 9, you can use the IBUFSIZE= system option to specify the page size for an index file. Typically, you do not need to specify an index page size. However, you might need to use the IBUFSIZE= option if

  • there are many levels in the index

  • the length of an index value is very large.

The main resource that is saved when reducing levels in the index is I/O. If your application is experiencing a lot of I/O in the index file, increasing the page size might help. However, you must re-create the index file after increasing the page size. The number of pages that are required for the index varies with the page size, the length of the index value, and the values themselves.

Note

The IBUFSIZE=MIN setting should be avoided. Δ

When an index is used to process a request, such as for WHERE processing, SAS searches the index file in order to rapidly locate the requested record(s). The page size affects the number of levels in the index. The more pages there are, the more levels in the index. The more levels, the longer the index search takes. Increasing the page size allows more index values to be stored on each page, thus reducing the number of pages (and the number of levels).

Use IBUFSIZE=0 to reset the index page size to the default value in your operating environment.
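
As a sketch only, the following program increases the index page size and then re-creates an index so that the new size takes effect. It assumes that Company.Sales already has a simple index on Customer_Age_Group. The value 8192 is an arbitrary illustration, not a recommendation.

```sas
options ibufsize=8192;                 /* page size for index files created from now on */

/* Re-create the index so that it is built with the new page size */
proc datasets library=company nolist;
   modify sales;
   index delete Customer_Age_Group;
   index create Customer_Age_Group;
quit;

options ibufsize=0;                    /* return to the operating environment default */
```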

Note

For details on using the IBUFSIZE= system option, see the SAS documentation. Δ

Summary

Controlling Page Size and the Number of Buffers

When you read a SAS data set or an external file, I/O is measured when the input data is copied to the buffer in memory and when it is written from the output buffer to the output data set.

A page is the unit of data transfer between the storage device and memory. When you create a SAS data set, SAS takes the data and copies it to a buffer. Each buffer can hold one page of data.

The amount of data that can be transferred to one buffer in a single I/O operation is referred to as the page size. Increasing the page size can speed up execution time by reducing the number of times SAS has to read from or write to the storage medium. You can use the CONTENTS procedure to report the page size and the number of pages.

You can use the BUFSIZE= system option or data set option to control the page size of an output SAS data set. The new buffer size is permanent. After it is specified, it is used whenever the data set is processed.

You can use the BUFNO= system or data set option to control how many buffers are available for reading or writing a SAS data set. By increasing the number of buffers, you can control how many pages of data are loaded into memory with each I/O transfer.

The product of BUFNO= and BUFSIZE=, rather than the specific value of either option, determines how much data can be transferred in one I/O operation. Increasing either option increases the amount of data that can be transferred in one I/O operation. However, the improvement in I/O comes at the cost of increased memory consumption.

Review the related comparative example:

  • "Comparative Example: Using the BUFSIZE= Option and the BUFNO= Option" on page 694.

Using the SASFILE Statement

Another way of improving performance is to use the SASFILE statement to hold a SAS data file in memory so that the data is available to multiple program steps. Keeping the data set open reduces open/close operations, including the allocation and freeing of memory for buffers.

When the SASFILE statement executes, SAS allocates the number of buffers based on the number of pages for the data file and index file. If the file in memory increases in size during processing because of changes or additions to the data, the number of buffers also increases.

It is important to note that I/O processing is reduced only if there is sufficient real memory. If SAS uses virtual memory, there can be a degradation in performance.

Review the related comparative example:

  • "Comparative Example: Using the SASFILE Statement" on page 697.

Additional Features

The IBUFSIZE= system option specifies the page size for an index file. Typically, you do not need to specify an index page size. However, you might need to use the IBUFSIZE= option if

  • there are many levels in the index

  • the length of an index value is very large.

The main resource that is saved when reducing levels in the index is I/O. If your application is experiencing a lot of I/O in the index file, increasing the page size might help. However, you must re-create the index file after increasing the page size. The number of pages that are required for the index varies with the page size, the length of the index value, and the values themselves.

Quiz

Select the best answer for each question. After completing the quiz, check your answers using the answer key in the appendix.

  1. Which of the following statements is true regarding the BUFNO= option?

    1. The BUFNO= option specifies the size of each buffer that is used for reading or writing a SAS data set.

    2. The BUFNO= option can improve execution time by limiting the number of input/output operations that are required.

    3. Using the BUFNO= option results in permanent changes to the data set.

    4. Using the BUFNO= option to increase the number of buffers results in decreased memory consumption.

  2. Which of the following statements is not true regarding a page?

    1. A page is the unit of data transfer between the storage device and memory.

    2. A page includes the number of bytes that are used by the descriptor portion, the data values, and the overhead.

    3. The size of a page is analogous to buffer size.

    4. The size of a page can be changed at any time.

  3. The total number of bytes occupied by a data set equals...?

    1. the page size multiplied by the number of pages.

    2. the page size multiplied by the number of observations.

    3. the sum of the page size and the number of pages.

    4. the number of pages multiplied by the number of variables.

  4. Which statement opens the file Work.Quarter1, allocates enough buffers to hold the entire file in memory, and reads the data into memory?

    1. sasfile work.quarter1 open;

    2. sasfile work.quarter1 load;

    3. sasfile work.quarter1 bufno=max;

    4. sasfile work.quarter1 bufsize=max;

  5. Which of the following statements is true regarding a file that is opened with the SASFILE statement?

    1. The file is available to subsequent DATA and PROC steps or applications until a SASFILE CLOSE statement is executed or until the program ends.

    2. The file is available to subsequent DATA and PROC steps or applications until a SASFILE END statement is executed.

    3. The file is available for subsequent utility or output processing until the program ends.

    4. If the file increases in size during processing, the number of buffers remains the same.
