Chapter 15. Timing Computations and the Performance of Algorithms

  • Contents

  • 15.1 Overview of Timing Computations

  • 15.2 Timing a Computation

  • 15.3 Comparing the Performance of Algorithms

    • 15.3.1 Two Algorithms That Delete Missing Values

    • 15.3.2 Performance as the Size of the Data Varies

    • 15.3.3 Performance as Characteristics of the Data Vary

  • 15.4 Replicating Timings: Measuring Mean Performance

  • 15.5 Timing Algorithms in PROC IML

  • 15.6 Tips for Timing Algorithms

  • 15.7 References

15.1 Overview of Timing Computations

In a complex language such as the SAS/IML language, there is often more than one way to accomplish a given task. Furthermore, in the statistical literature there are often competing algorithms for computing the same statistic. In choosing which algorithm to implement, you can consider various characteristics:

  • the time and effort required to implement each algorithm

  • the memory requirements for each algorithm

  • the performance of the algorithm

In this chapter, the performance of an algorithm means the time required for it to run on typical data, and also how that time changes with the size or characteristics of the data. In many cases, the memory requirements are either unimportant or are equivalent between algorithms. Therefore, given two or more algorithms implemented in the SAS/IML language that perform the same task, it is useful to know how to measure their performance.

15.2 Timing a Computation

If a program is taking more time to run than you think it should, you can measure the time required by various portions of the program. This is sometimes called profiling the program. Suppose you have written a program that generates a random 1000 × 1000 matrix, inverts it, and uses the inverse to solve a linear system, as shown in the following statements:

/* use the INV function to solve a linear system; not efficient */
n = 1000;                            /* size of matrix              */
x = rannor(j(n, n, 1));              /* n × n matrix                */
y = rannor(j(n, 1, 1));              /* n × 1 vector                */

xInv = inv(x);                       /* compute inverse of x        */
b = xInv*y;                          /* solve linear equation x*b=y */

It takes several seconds to run this program, so you might wonder where the bulk of the time is being spent: generating the random matrices or solving the linear system. The key to timing a computation is the TIME function in Base SAS software. The TIME function returns the time of day as a SAS time value, which is the number of seconds since midnight. Therefore, if you call the TIME function immediately before and after a computation, the difference between the two values is the elapsed time (in seconds) required for the computation. The following statements employ this technique to time each portion of the computation:

/* time portions of the program to identify where time is spent */
n = 1000;                            /* size of matrix              */

/* measure time spent generating matrices */
t0 = time();                         /* begin timing RANNOR         */
x = rannor(j(n, n, 1));              /* n × n matrix                */
y = rannor(j(n, 1, 1));              /* n × 1 vector                */
t1 = time() - t0;                    /* end timing                  */
print "Elapsed time RANNOR (n=1000):" t1;

/* measure time spent solving linear system with the INV function   */
t0 = time();                         /* begin timing INV            */
xInv = inv(x);                       /* compute inverse of x        */
b = xInv*y;                          /* solve linear equation x*b=y */
t2 = time() - t0;                    /* end timing                  */
print "Elapsed time INV (n=1000):" t2;

Figure 15.1. Times Required by Two Parts of a Program

Figure 15.1 shows that the time required to allocate the matrices and fill them with random numbers is small compared with the time required to solve an n × n linear system for n = 1000. Therefore, if you want to improve the performance of this program, you should focus your attention on the statements that solve the linear system.

After you identify the portion of the program that takes the most time, you can focus on optimizing that portion. Sometimes you can improve performance by rewriting the way that you implement an algorithm; other times you might decide to change algorithms. In the previous program, you can replace the INV function with a call to the SOLVE function to dramatically decrease the running time, as shown in Figure 15.2:

/* measure time spent solving linear system with the SOLVE function */
t0 = time();                      /* begin timing SOLVE */
b = solve(x, y);                  /* solve linear equation directly */
t3 = time() - t0;                 /* end timing */
print "Elapsed time SOLVE (n=1000):" t3;

Figure 15.2. Time Required to Solve a Linear System with the SOLVE Function

Figure 15.2 shows that the SOLVE function solves the linear system in about 40% of the time required for the INV function.

You can use this technique to determine the elapsed time for most computations. Under the Windows operating system, however, the technique is limited to timing computations that take more than 15 milliseconds. This is the interval at which the Windows operating system updates a certain time stamp (see Nilsson (2004) and references therein). Consequently, on Windows operating systems the TIME function returns the time of day only to the nearest 15 milliseconds.
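If you are curious about the timer granularity on your own system, you can estimate it by calling the TIME function in a loop until the returned value changes. The following statements are a minimal sketch of this idea (not from the text); the reported value depends on your operating system:

/* sketch: estimate the resolution of the TIME function */
t0 = time();
t1 = t0;
do while (t1 = t0);                  /* spin until the clock ticks    */
   t1 = time();
end;
print "Approximate clock resolution (seconds):" (t1 - t0);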

For example, you can try to time how long it takes to solve a linear system with 50 rows and columns by setting n=50 in the previous program. The resulting output is shown in Figure 15.3:


Figure 15.3. Time to Solve a 50 × 50 Linear System

Because the TIME function measures the time of day to the nearest 15 milliseconds, if a computation finishes too quickly, the interval between calls to the TIME function is too short to measure, as shown in Figure 15.3. A solution to this problem is to repeat the computation many times (say, 1000 times) so that the total elapsed time is large enough to measure, as shown in the following statements:

/* measure time spent solving linear system 1000 times */
n = 50;                           /* size of matrix */
x = rannor(j(n, n, 1));           /* n × n matrix                   */
y = rannor(j(n, 1, 1));           /* n × 1 vector                   */

t0 = time();                      /* begin timing                   */
do i = 1 to 1000;                 /* repeat computations many times */
   b = solve(x, y);               /* solve linear equation directly */
end;
t3 = time() - t0;                 /* end timing                     */
print "Elapsed time 1000 SOLVE (n=50):" t3;
print "Average time for SOLVE (n=50):" (t3/1000);

Figure 15.4. Time Required to Solve a 50 × 50 Linear System 1000 Times

The elapsed time for solving a 50 × 50 linear system 1000 times is shown in Figure 15.4. If necessary, you can determine the average time required for each solution by dividing the cumulative time by 1000.

15.3 Comparing the Performance of Algorithms

Some algorithms are more efficient than others. For example, the previous section showed that less time is required to solve a linear equation by using the SOLVE function than by using the INV function. You can use this same timing technique to compare the performance among different algorithms. For example, Chapter 13, "Sampling and Simulation," describes several algorithms for sampling and simulation, and mentions that some are faster than others.

This section describes how to compare two or more algorithms and how to investigate how the algorithms perform as you systematically modify characteristics of the data. However, keep in mind that sometimes there is a trade-off between speed and accuracy, or between speed and ease of programming. How fast an algorithm runs is only one criterion for determining the best way to write a statistical program.

15.3.1 Two Algorithms That Delete Missing Values

Section 3.3.4 describes how to remove missing values from a SAS/IML vector. The SAS/IML program that creates Figure 3.11 consists of two main steps: find observations in a vector that are not missing, and use the subscript operator to extract those values into a smaller vector. However, there is an alternative way to form a vector of the nonmissing values: find the observations that are missing, and use the REMOVE function to create the vector that results from deleting them.

The following statements define two SAS/IML modules that perform the same task: given a vector x of length n that contains k missing values, return the vector, y, of length n − k that results from deleting the missing values from x. These modules will be used to illustrate how to time algorithms and profile performance.

/* Remove missing values from a vector. */
/* Investigate the relative speeds of each algorithm. */
/* Algorithm 1: use subscripts to remove missing values */
start DeleteMissBySubscript(x);
   do i = 1 to 1000;            /* repeat computation 1000 times    */
      idx = loc(x^=.);          /* find elements NOT missing        */
      if ncol(idx)>0 then
         y = x[idx];            /* extract them                     */
   end;
   return ( y );
finish;

/* Algorithm 2: use the REMOVE function to remove missing values */
start DeleteMissByRemove(x);
   do i = 1 to 1000;            /* repeat computation 1000 times    */
      idx = loc(x=.);           /* find elements that ARE missing   */
      if ncol(idx)>0 then
         y = remove(x, idx);    /* remove them; assign what is left */
   end;
   return ( y );
finish;

Because the subscript operator and the REMOVE function are so fast, each module performs the computation 1000 times before returning. Consequently, if you pass a sufficiently large data vector to these modules, the time required for each module call should be greater than 0.1 seconds. (If not, increase the number of iterations in the modules.) Although these modules are not very sophisticated, they are useful for describing how to compare the performance of algorithms. The algorithms are simple, easy to understand, and quick to run, yet analyzing them illustrates basic ideas that can be applied to more complicated statistical algorithms.

The statements below create a vector, x, of length n = 10,000 and assign missing values to the first 5% of the elements. This vector is passed to the two modules, and the time for each module call is recorded. The times are shown in Figure 15.5.

/* compare time required by each algorithm (5% missing values) */
n = 1e4;                            /* length of data vector          */
x = j(n, 1, 1);                     /* constant data vector           */
pct = 0.05;                         /* percentage of missing values   */

NumMissing = ceil(n*pct);           /* number of missing values       */
x[1:NumMissing] = .;                /* assign missing values          */

print pct[r="PCT Missing:"];
t0 = time();                        /* begin timing                   */
y = DeleteMissBySubscript(x);       /* run the first algorithm        */
t1 = time() - t0;                   /* end timing                     */
print t1[r="Elapsed time SUBSCRIPT:"];

t0 = time();                        /* begin timing                   */
y = DeleteMissByRemove(x);          /* run the second algorithm       */
t2 = time() - t0;                   /* end timing                     */
print t2[r="Elapsed time REMOVE:"];

Figure 15.5. Times Required for Two Algorithms

It appears from Figure 15.5 that using the REMOVE function is faster than using the subscript operator for these data.

15.3.2 Performance as the Size of the Data Varies

A common question in statistical programming is "How does the time required by an algorithm scale with the size of the input data?" To answer this question, you can systematically increase the size of the input data and graph the time required by an algorithm as a function of the data size. The graph will indicate whether the time required by your implementation is linear in the size of the data, is quadratic, or has some other relationship.

The time required to run the DeleteMissBySubscript and DeleteMissByRemove modules certainly depends on the length of the input vector x: more data means more time is required to search for and delete missing values. Figure 15.6 shows the relationship between the time required by these modules and the size of the input data. The statements that create Figure 15.6 are not shown, but they are a simple variation of the technique that is described in the next section. As expected, the performance of each module is roughly linear in the number of observations.
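Although the statements that create Figure 15.6 are not shown in the text, the following sketch indicates one way to produce such timings; the sizes in the sizeList vector and the layout of the results matrix are illustrative assumptions:

/* sketch: time each algorithm as the size of the data varies */
pct = 0.05;                          /* fix the pct of missing values */
sizeList = do(2e3, 2e4, 2e3);        /* list of vector lengths        */
results = j(ncol(sizeList), 3);      /* columns: N, Subscript, Remove */
do k = 1 to ncol(sizeList);
   n = sizeList[k];
   x = j(n, 1, 1);                   /* constant data vector          */
   x[1:ceil(n*pct)] = .;             /* assign missing values         */

   t0 = time();
   y = DeleteMissBySubscript(x);     /* run the first algorithm       */
   t1 = time() - t0;

   t0 = time();
   y = DeleteMissByRemove(x);        /* run the second algorithm      */
   t2 = time() - t0;

   results[k,] = n || t1 || t2;
end;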


Figure 15.6. Time Versus the Size of the Input Vector (5% Missing Values)

15.3.3 Performance as Characteristics of the Data Vary

Figure 15.5 shows that the DeleteMissByRemove algorithm is faster than the DeleteMissBySubscript algorithm when about 5% of the data are missing. However, does the performance of the modules also depend on the percentage of missing values in the input vector? This section examines that question.

You can modify the program in Section 15.3.1 to answer this question. Instead of running the modules a single time for a single value of the pct variable, you can use the DO function to define a list of percentages. The following program loops over a list of percentages, creates an input vector that contains the specified percentage of missing values, and then calls the modules:

/* compare time spent by each algorithm as a parameter is varied */
n = 1e4;                            /* length of data vector         */
x0 = j(n, 1, 1);                    /* constant data vector          */

/* define list of percentages */
pctList = 0.01 || 0.05 || do(0.1, 0.9, 0.1) || 0.95;

/* allocate 3 columns for results: "PctMissing" "Subscript" "Remove" */
results = j(ncol(PctList), 3);
do k = 1 to ncol(pctList);
   pct = pctList[k];                /* percentage of missing values  */
   NumMissing = ceil(n*pct);        /* number of missing values      */
   x = x0;                          /* copy original data            */
   x[1:NumMissing] = .;             /* assign missing values         */

   t0 = time();
   y = DeleteMissBySubscript(x);    /* run the first algorithm       */
   t1 = time() - t0;

   t0 = time();
   y = DeleteMissByRemove(x);       /* run the second algorithm      */
   t2 = time() - t0;

   results[k,] = pct || t1 || t2;
end;

The results matrix records the times that are associated with each value of the pctList vector. When the iterations complete, you can print or graph the results. The following IMLPlus statements create a line plot of the results by graphing each algorithm's time versus the percentage of missing values in the input vector. The graph is shown in Figure 15.7.

/* create a line plot of the timing results as a parameter is varied */
declare DataObject dobj;
dobj = DataObject.Create("Timing",
                        {"PctMissing" "Subscript" "Remove"}, results);

declare LinePlot line;
line = LinePlot.Create(dobj, "PctMissing", {"Subscript" "Remove"});
line.SetTitleText("Timing Results for n="+strip(putn(n,"BEST.")), true);

The vertical axis of the figure shows the number of seconds required by each module call. The figure shows that the performance of the two algorithms differs markedly as you vary the percentage of missing values in the input vector. The time required by the DeleteMissBySubscript module is negatively correlated with the percentage of missing values, whereas the time required by the DeleteMissByRemove module is positively correlated. Both relationships can be modeled as linear, assuming that the bumps in Figure 15.7 indicate noise in measuring the times. When the percentage of missing values is small, the DeleteMissByRemove module is faster than the DeleteMissBySubscript module; when the percentage is large, the opposite is true.


Figure 15.7. Time for Two Algorithms Versus the Percentage of Missing Values

Graphs such as Figure 15.7 can help you determine which algorithm is best for a particular application. If you know a priori that your data contain a small percentage of missing values, then you might decide to choose the DeleteMissByRemove module. Alternatively, you might look at the scale of the vertical axis and decide that the total time required for either algorithm is so small that it does not matter which algorithm you choose.

15.4 Replicating Timings: Measuring Mean Performance

Every time you run the program that creates Figure 15.7, you will get a slightly different graph. At first this might seem surprising. After all, there are no random numbers being used in the computation; the numerical computations are exactly the same every time you run the program. Nevertheless, if you run the program again on the same computer, you will observe different times.

The resolution to this paradox is to recall that the operating system of a modern computer enables many applications to run "simultaneously." However, the applications are not really running at the same time. Instead the operating system dynamically switches between the applications, spending a fraction of a second processing the computational needs of one application before moving on to the next.

When you run a program such as the one that creates Figure 15.7, the times that you see for each trial are only estimates of some theoretical "minimum time" that the program could achieve on a dedicated CPU. In reality, the CPU is multitasking while the program runs. The operating system might be checking to see if you have received new e-mail. Or it might be scanning for viruses or performing any one of a dozen system administration tasks that occur in the background while you are working. In short, the computer is not devoting all of its computational resources to running your SAS/IML program, and is, in fact, busy running other programs as well. This fact makes the precise timing of algorithms difficult.

This section describes one way to deal with the imprecise measurements: repeat the measurements several times so that you obtain multiple times associated with each trial. Then you can use statistical models to predict the mean time for each trial.

The basic idea is simple: to obtain multiple times for each trial, put a loop around the program in the previous section so that it runs several times. In practice, it is more efficient to put the new loop around the calls to the modules, because then you can reuse the input vector for each call. This idea is implemented by the following statements, which run each trial five times:

/* time each algorithm multiple times as a parameter is varied */
NReplicates = 5;                    /* run each trial several times  */
n = 1e4;                            /* length of data vector         */
x0 = j(n, 1, 1);                    /* constant data vector          */

/* define list of percentages */
pctList = 0.01 || 0.05 || do(0.1, 0.9, 0.1) || 0.95;

NTrials = NReplicates * ncol(PctList);
/* allocate 3 columns for results: "PctMissing" "Replicate" "Time"   */
results = j(2*NTrials, 3);
cnt = 1;                            /* index to store results        */
do k = 1 to ncol(pctList);
   pct = pctList[k];                /* percentage of missing values  */
   NumMissing = ceil(n*pct);        /* number of missing values      */
   x = x0;                          /* copy original data            */
   x[1:NumMissing] = .;             /* assign missing values         */

   do repl = 1 to NReplicates;      /* repeat multiple times         */
      t0 = time();
      y = DeleteMissBySubscript(x); /* run the first algorithm       */
      t1 = time() - t0;
      results[cnt,] = pct || repl || t1;
      cnt = cnt+1;

      t0 = time();
      y = DeleteMissByRemove(x);    /* run the second algorithm      */
      t2 = time() - t0;
      results[cnt,] = pct || repl || t2;
      cnt = cnt+1;
   end;
end;

The results matrix contains the timings for both modules: the times for the DeleteMissBySubscript module are contained in the odd rows, whereas the times for the DeleteMissByRemove module are contained in the even rows. The following statements create a data object from these data and add a variable that indicates the algorithm. The markers for the even rows are set to red triangles.

/* create scatter plot of Time versus PctMissing for each trial */
declare DataObject dobj;
dobj = DataObject.Create("Timing",
                  {"PctMissing" "Replicate" "Time"}, results);
algorithm = {"Subscript", "Remove"};    /* odd rows are "Subscript"    */
algVector = repeat(algorithm, NTrials); /* even rows are "Remove"      */
dobj.AddVar("Algorithm", algVector);    /* add this variable to data   */

evenIdx = do(2, 2*NTrials,2);           /* generate even row numbers   */
dobj.SetMarkerColor(evenIdx, RED);      /* color them red and...       */
dobj.SetMarkerShape(evenIdx, MARKER_TRIANGLE); /* mark as triangles    */

The data were purposely structured so that the timings for both modules can be displayed in a single scatter plot of the Time variable versus the PctMissing variable, as shown in Figure 15.8, which is created by the following statements:

declare ScatterPlot p;
p = ScatterPlot.Create(dobj, "PctMissing", "Time");
p.SetMarkerSize(5);
p.SetTitleText("Timing Results for n="+strip(putn(n,"BEST.")), true);

/* add a legend */
color = BLACK || RED;
style = SOLID || DASHED;
shape = MARKER_SQUARE || MARKER_TRIANGLE;
run DrawLegend(p, algorithm, 12, color, style, shape, -1, "ILB");

Figure 15.8. Timing Data with Replicates

Notice that the replicated timings are consistent with the timings shown in Figure 15.7, but the individual timings vary by up to 0.05 seconds from one another.

If you want to model the mean time as a function of the parameter (the percentage of missing values), you can compute a scatter plot smoother for each algorithm. SAS/IML software provides several smoothers such as a cubic spline smoother (the SPLINE subroutine) and a thin-plate spline smoother (the TPSPLINE subroutine). In addition, you can call SAS/STAT procedures such as the LOESS and GAM procedures from IMLPlus. The following statements call the TPSPLINE subroutine to compute a smoother for the timings for each algorithm. The resulting curves are overlaid on the scatter plot and shown in Figure 15.9:

/* compute a scatter plot smoother: a thin-plate smoothing spline */
p.DrawUseDataCoordinates();
lambda = -2;                              /* parameter for TPSPLINE */
do i = 1 to nrow(algorithm);              /* for each algorithm     */
   idx = loc(algVector=algorithm[i]);
   x = results[idx, 1];
   y = results[idx, 3];                   /* extract the data */
   call tpspline(fit, coef, adiag, gcv, x, y, lambda);

   p.DrawSetPenAttributes(color[i], style[i], 1);
   p.DrawLine(x, fit);                    /* draw the smoother */
end;

Figure 15.9. Smoothers Added to Timing Data

The TPSPLINE subroutine computes the unique design points (that is, the unique values of PctMissing) and uses the mean value at each design point to fit the spline smoother. The figure shows that the performance of the DeleteMissBySubscript module is almost linear and decreases with the percentage of missing values. In contrast, the time used by the DeleteMissByRemove module tends to increase with the percentage of missing values in the input vector.

15.5 Timing Algorithms in PROC IML

All of the numerical techniques in this chapter are valid for measuring the performance of algorithms in PROC IML. The only difference is that the IMLPlus graphics shown in this chapter do not run in PROC IML. However, you can write the data to a SAS data set and use the SGPLOT procedure to visualize the results.

For example, to create Figure 15.9, you can use the CREATE and APPEND statements in SAS/IML to write the timing data to a data set named DeleteMissing. You can then call the SGPLOT procedure on these data. Although the SGPLOT procedure does not support thin-plate spline smoothers, you can use the PBSPLINE statement to fit a penalized B-spline through the data. The following statements demonstrate one way to visualize these data outside of SAS/IML Studio. Figure 15.10 shows the graph that is produced.
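The following statements are a sketch of the CREATE and APPEND step, assuming that the results matrix and the algVector vector from Section 15.4 are still in memory; the column extraction mirrors the layout used there:

/* sketch: write the timing data to a SAS data set */
PctMissing = results[,1];            /* percentage of missing values */
Time = results[,3];                  /* elapsed time for each trial  */
Algorithm = algVector;               /* character vector of names    */
create DeleteMissing var {"PctMissing" "Algorithm" "Time"};
append;
close DeleteMissing;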

proc sgplot data=DeleteMissing;
   pbspline x=PctMissing y=Time / group=Algorithm;
run;

Figure 15.10. ODS Graphics Plot of Smoothers and Data

15.6 Tips for Timing Algorithms

The following tips can help you to make accurate assessments of the performance of algorithms:

  • Encapsulate the algorithm into a module in order to make it easier to call and to change parameters in the algorithm.

  • Use small quantities of data when you are developing and debugging your program.

  • Maintain a quiet machine when timing an algorithm. Close your e-mail client and Web browser, and do not start or run other applications during the timing (unless you purposely want to examine how the algorithm's timing varies under realistic conditions).

  • For long-running algorithms (say, greater than 15 seconds), it is not necessary to repeat each timing multiple times as described in Section 15.4. The variation in times will usually be small compared with the time required by the algorithm.

  • Notice that the nonparametric model used in Section 15.4 is a thin-plate spline, and not a loess model. The loess model is not a good choice for modeling replicated measurements because of the way the loess algorithm chooses points to use for local estimation.

  • You can wrap the TIME function around a SUBMIT statement to measure the time required to call a SAS procedure or to call R functions.

  • The way that an algorithm's performance varies with the size of the data is not always easy to predict. The performance of an algorithm varies from CPU to CPU, depending on the size of certain memory caches, buffers, and page sizes.
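For example, the tip about wrapping the TIME function around a SUBMIT statement can be sketched as follows; PROC MEANS and the Sashelp.Class data set are placeholders for whatever procedure you want to time:

/* sketch: time a procedure call by wrapping TIME around SUBMIT */
t0 = time();                         /* begin timing                 */
submit;
   proc means data=Sashelp.Class;
   run;
endsubmit;
t = time() - t0;                     /* end timing                   */
print "Elapsed time for PROC MEANS:" t;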

15.7 References

Nilsson, J. (2004), "Implement a Continuously Updating, High-Resolution Time Provider for Windows," MSDN Magazine. URL http://msdn.microsoft.com/en-us/magazine/cc163996.aspx
