Chapter 5: Methods

Defining Methods

System-Defined Methods

User-Defined Methods

Recursion

Defining Methods

Methods are the home of executable DS2 code. Executable statements in DS2 look like executable statements in the DATA step. That is, you add, subtract, multiply, divide, and so on, the same way in both. The important difference in DS2 is that you can put code into logical groupings called methods.

Methods act like SAS functions. A method can take arguments (parameters) and return an answer (result). Like SAS functions, methods can be saved and used in other DS2 programs.

In DS2, there are two fundamental types of methods:

•   System-defined methods

•   User-defined methods

System-Defined Methods

Because all executable code resides in methods and methods are invoked like SAS functions, you might ask, “How do I start a program?” The answer is simple—use system-defined methods. Here are the system-defined methods:

1.   init()

2.   run()

3.   term()

Every DS2 program must have at least one of these defined (if you want your program to actually run!). A method can take arguments and return results, with one caveat—it has to be a user-defined method. System-defined methods cannot take arguments and they do not return results. System-defined methods are always present—if you do not explicitly add code to a system-defined method, PROC DS2 creates an empty method for you.

Based on their names, you can surmise that the init() method initializes, the term() method summarizes, and the run() method processes. Let’s look at the following DATA step program that has common features:

DATA   showDS (drop= runStart runEnd elapsed)

SummaryDS (keep= runStart runEnd elapsed);

    if _n_ = 1                         

    then

      do;

        runStart = datetime();

        retain runStart;

      end;

    set inpData end=done;              

/* processing */

    output showDS;

    if done                          

    then

      do;

        runEnd = datetime();

        elapsed = runEnd - runStart;

        /* other termination statements */

       output summaryDS;

      end;

run;

The if _n_ = 1 block executes once, at the beginning of processing. This is usually code that sets starting values.

The set statement reads rows from the input table. When the last row is read, the end= option sets the variable done to 1. For all other rows, the variable done has the value 0.

When the last row of the input table is read, done is 1 and the code in the if done block is executed. This code is executed only once.

Here is the equivalent code in DS2 with the same points highlighted:

proc ds2;

data   showDS2 (drop= (runStart runEnd elapsed) overwrite=yes)

       SummaryDS2 (keep= (runStart runEnd elapsed) overwrite=yes);

    declare double runStart runEnd elapsed;  

    retain runStart;

    method init();  

        runStart = datetime();

    end;

    method run();

        set inpData;  

        /* processing */

       output showDS2;

    end;

    method term();  

        runEnd  = datetime();

        elapsed = runEnd - runStart;

     /* other termination statements */

       output summaryDS2;

    end;

enddata;

run;

quit;

The init() method is run automatically, once, at the start of the program. This replaces the if _n_ = 1 block in the DATA step.

The set statement in run() reads each row of the input table. There is no end= option on the set statement.

The term() method is run automatically, once, after method run() has completed its processing. This replaces the if done block.

The declare statement declares the variables that you want to write out. If there are other computed variables to be written to the output data set, they should also be declared.

Although this is a simple example, it clarifies the order of processing: init(), run(), term(). In addition, although the statements in the if _n_ = 1 and the if done blocks are each processed only once, the IF statements themselves are evaluated for each row in the input. If there are 500 million rows in the input data, that means one billion evaluations are made for two executions. Although there are other ways to structure a DATA step to handle initialization and termination processing, the previous example is probably the most common.

If there is no need to initialize variables or if there is no code that needs to be run once at the start of the program, the init() method can be omitted. Similarly, if there is no code that needs to be executed once after all of the input rows have been read, the term() method can be omitted.
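As a small sketch of this (illustrative only, not part of the original example), the following program defines init() and run() but omits term(), so PROC DS2 supplies an empty one. The log messages are made up for the illustration. Because run() contains no set statement here, it is expected to execute a single time.

```sas
proc ds2;
data _null_;
    /* init() runs once, before run() */
    method init();
        put 'init: start-up code';
    end;

    /* run() is the main processing method; with no SET statement
       it is expected to execute once */
    method run();
        put 'run: main processing';
    end;

    /* term() is omitted on purpose; PROC DS2 creates an empty one */
enddata;
run;
quit;
```

The log should show the init message followed by the run message, which matches the init(), run(), term() order described above.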

As the previous example shows, a DS2 program can be written by using only the three system-defined methods. However, if you stop there, you are not taking advantage of encapsulation and modularization.

User-Defined Methods

Suppose you need to calculate the patient time in a study. Or, you need to age accounts. Or, you need to accrue interest. Or, you need to determine sales commission. If you use SAS, you know you can code calculations that range from very simple to very complex. In DS2, you can use methods to encapsulate the logic of these calculations.

Exploring User-Defined Methods

A user-defined method is a block of DS2 code that supports parameters, performs transformations, and returns results. The method is called within your DS2 program to perform these actions. When it is used in a DS2 program, a user-defined method call looks like a SAS function call.

Here is a user-defined method that calculates body mass index (BMI):

                                  

method bmi(double height, double weight) returns double;

    declare double b;

    b = round(weight/(height*height), 0.1);

    return (b);

end;

The method keyword is followed by the method name and the argument list in parentheses.

The argument list shows two parameters (height and weight). Both are type double. You must specify the data type of each argument.

The returns keyword is followed by the data type to be returned. This method returns a value with a data type of double.

The BMI is calculated and rounded to one decimal place.

The calculated value of BMI is returned in the return statement.

The complete method statement (the first line of the definition) is called the method signature.

The method is defined within the data program in which it is used.

proc ds2;

data bmi (overwrite = yes);

declare double bmi;

method bmi(double height, double weight) returns double;

declare double b;

b = round(weight/(height*height), 0.1);

    return (b);

end;

method run();

   set class;

   bmi = bmi(height, weight);     

end;

enddata;

run;

quit;

In method run(), the bmi method is invoked and the value that it returns is assigned to the variable bmi. Note that a scalar variable can have the same name as a method.

Let’s look at the result set, work.bmi. There are some problems, indicated by circles:

Figure 5.1: work.bmi Result Set

image

Method Overload

A method is overloaded if it has the same name as another method but a different signature. A signature includes the argument list and the return data type. There can be many reasons to overload a method, ranging from a difference in requirements to a difference in the data. Looking at the BMI example, you see that the formula is based on the International System of Units, metric. But, some of the height and weight measures are in the older Imperial system of pounds and inches. In addition, some rows have a blank (missing) value for unit. You need to make some changes to the bmi method to accommodate these differences. However, because the bmi method works correctly when it has the correct measures, you do not want to change it. In this case, you overload the bmi method with a new definition instead.

method bmi(double height, double weight, char(1) unit) returns double;

    declare double b;

    b = if      unit = 'M' then round(weight/(height * height), 0.1)   

        else if unit = 'I' then round(703.*(weight/(height*height)), 0.1)

        else if missing(unit) and height < 5.0 then round((weight/(height*height)), 0.1)

        else NULL;

    return (b);

end;

A new argument, unit, is added to the argument list. This bmi method, although it has the same name as the original bmi method, has a different signature, so both are allowed.

An if expression is used to compute BMI based on the value of unit. The code infers that unit is metric if there is no value for unit and height is less than 5. You could augment the code to determine whether unit is Imperial.

The complete DS2 program follows. Although you have two methods called bmi, they have different signatures. As a result, when DS2 sees the method call, it knows which one to invoke.

proc ds2;

data bmi (overwrite = yes);

declare double bmi;

method bmi(double height, double weight) returns double;               

    declare double b;

    b = round(weight/(height*height), 0.1);

    return (b);

end;

method bmi(double height, double weight, char(1) unit) returns double;

    declare double b;

    b = if      unit = 'M' then round(weight/(height*height), 0.1)

        else if unit = 'I' then round(703.*(weight/(height*height)), 0.1)

        else if missing(unit) and height < 5.0 then round((weight/(height*height)), 0.1)

        else NULL;

    return (b);

end;

method run();

   set class;

   bmi  = bmi(height, weight, unit);                                   

end;

enddata;

run;

quit;

The bmi method has been overloaded. There are two method definitions with the same name, but they have different signatures.

PROC DS2 matches the arguments in the method call in run() to a method signature to call the right method.

The results look better, except that Jeffrey has no BMI. Changing the program to correctly infer his unit to calculate his BMI is something that you can do later. Here are the revised results:

Figure 5.2: Revised work.bmi Result Set

image

Modularity

The previous code is simple—you encapsulated a little bit of logic (with the IF statement) with a basic formula. Now you can call this method anywhere in the DS2 program. If you need to augment the method to handle more complexity (for example, to determine whether a unit is in Imperial when it is missing), you change only the method, not the multiple places in the program where BMI is calculated. You can also encapsulate more complex logic into a module.

When you start a new project, you often start with some form of pseudocode. For example, suppose you need to examine all of your subscribers and calculate a base (or regular) premium and possibly other premiums that have to be applied. You could start by writing out something like the following steps:

For each subscriber:

•   Compute previous regular premium.

•   Compute the regular increase.

•   Compute the C extra premium.

•   Compute the A extra premium.

•   Compute the B extra premium.

You think about it some more and start coding a DS2 program:

method run();

         set subscribers;

         regFee = regularPremium();

         regIncrease = regularIncrease();

         C_Fee  = C_Premium();

         A_Fee  = A_Premium();

         B_Fee  = B_Premium();

         FeeTotal = sum(0, regFee,

                            regIncrease,

                            A_Fee,

                            B_Fee,

                            C_Fee

                         );

     end;

When someone new has to determine why the B_Fee is incorrect, it is easy to see not only where in the process B_Fee is calculated, but also which code block calculates the B_Fee.

Forward Reference

Sometimes you want to invoke a method before it has been defined. DS2 cannot compile a program where a method is invoked before it is defined. To get around this requirement, you can use forward referencing.

proc ds2;

data adjustments (overwrite = yes);

/* FORWARD declare the methods - they follow method run() in the code */

      FORWARD regularPremium;

      FORWARD regularIncrease;

      FORWARD A_Premium;

      FORWARD B_Premium;

      FORWARD C_Premium;

method run();

         set subscribers;

         regFee = regularPremium();

         regIncrease = regularIncrease();

         C_Fee  = C_Premium();

         A_Fee  = A_Premium();

         B_Fee  = B_Premium();

         FeeTotal = sum(0, regFee,

                           regIncrease,

                            A_Fee,

                           B_Fee,

                           C_Fee

                         );

     end;

method regularPremium();

  /* more code */

end;

  /* more code */

enddata;

run;

quit;

You can use forward referencing to provide an initial list of all the methods that have been defined. It ends up being a useful form of code documentation.

By Value Parameters

By default, DS2 passes method parameters by value. This means that a copy of the variable is passed to the method. Any changes to the parameter in the method are discarded when the method completes because by-value parameters are implicitly declared as local variables. The exception to this default behavior is a parameter that is an array. Arrays are always passed by reference.
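The by-value behavior can be sketched with a small program. The method name addOne and the variables a and b are hypothetical, chosen only for this illustration. The method changes its parameter, but the change is made to the method's own copy.

```sas
proc ds2;
data _null_;
    /* x is passed by value, so the method works on a local copy */
    method addOne(double x) returns double;
        x = x + 1;        /* changes only the local copy of x */
        return (x);
    end;

    method run();
        declare double a b;
        a = 10;
        b = addOne(a);    /* a itself is not modified by the call */
        put a= b=;
    end;
enddata;
run;
quit;
```

Under these assumptions, the log should show a still equal to 10 and b equal to 11.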

By Reference Parameters

When a parameter is passed by reference, the memory address of the variable is passed to the method. Any changes to the parameter in the method change the value of the variable in the calling method. In other words, changes are not discarded. To tell DS2 that a parameter is being passed by reference, you use the IN_OUT argument modifier. Here is what the bmi method looks like when its result is passed back through a by-reference parameter:

method bmi(double height, double weight, char(1) unit, IN_OUT double outbmi) ;  

    outbmi = if      unit = 'M' then round(weight/(height*height), 0.1)   

        else if unit = 'I' then round(703.*(weight/(height*height)), 0.1)

        else if missing(unit) and height < 5.0 then round((weight/(height*height)), 0.1)

        else NULL;

end;

The IN_OUT modifier on outbmi tells DS2 that this parameter can be changed. Note that there is no RETURNS clause in the definition. A method with an IN_OUT parameter does not return values.

The IN_OUT parameter outbmi is updated in the bmi method.

When you use the IN_OUT parameter, the method cannot explicitly return a value. The value to be returned is part of the parameter list. Here is what the call to the method looks like:

bmi = 0.0;

bmi(height, weight, unit, bmi);

DS2 knows by the method signature which of the bmi methods to invoke.

The use of the IN_OUT parameter is even better when you have a method that can return multiple values. For example, suppose you need to get the last day of the month for every billing date. A simple method to return the date is easy:

method lastOfMonth(double inDate) returns double;

     declare double endDate;

     endDate = intnx('month', inDate, 0, 'E');

     return(endDate);

end;

Later, you discover that you also need the number of days in the month and you occasionally need the first day in the month. The first day of the month is also simple to program:

method firstOfMonth(double inDate) returns double;

     declare double startDate;

     startDate = intnx('month', inDate, 0, 'B');

     return(startDate);

end;

These two methods could be collapsed into one method:

method dateOfMonth(double inDate, char(1) whereInMonth) returns double;

     declare double whichDate;

     whichDate = if upcase(whereInMonth) in ('B', 'E')

                  then intnx('month', inDate, 0, whereInMonth)

                  else NULL;

     return(whichDate);

end;

However, you might prefer separate methods if the call reads more clearly:

enddate   = lastOfMonth(thisDate);

Otherwise, you have this:

   enddate   = dateOfMonth(thisDate, 'e');

To get the number of days in a month, you do some simple arithmetic:

enddate   = lastOfMonth(thisDate);

startdate  = firstOfMonth(thisDate);

daysInMonth = endDate - startDate + 1;

Because you need both the start date and the end date to get the number of days, you can create a method that returns all three:

method datesInMonth(double inDate,  IN_OUT double startDate,

                                    IN_OUT double endDate,

                                    IN_OUT integer days);   

     startDate = firstOfMonth(inDate);   

     endDate = lastOfMonth(inDate);

     days = endDate - startDate + 1;

end;

method run();

   declare double thisDate startDate endDate having format yymmdd10.;

   declare integer numDays;

   thisDate = to_Double(date'2015-01-15');

   datesInMonth(thisDate, startDate, endDate, numDays);   

end;

enddata;

run;

quit;

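The datesInMonth method has three IN_OUT parameters and no return value.

The firstOfMonth and lastOfMonth methods defined earlier are used to calculate the values.

The datesInMonth method is invoked from run(), which fills startDate, endDate, and numDays.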

The obvious first question about the firstOfMonth and lastOfMonth methods is, “Why not just use the SAS date functions directly?” The answer is, “It’s not that simple.” For example, there can be more than one last day of the month. Maybe it is the last calendar day. Maybe it is the last working day. Using a method that calls the SAS date functions makes it easier to modify if the last calendar day is not the right answer.
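To make the last point concrete, here is a hedged sketch of a variant that returns the last working day of the month. The method name lastWorkDayOfMonth is hypothetical, and the sketch assumes that working days are Monday through Friday and ignores holidays. It relies on the weekday function, which returns 1 for Sunday and 7 for Saturday.

```sas
proc ds2;
data _null_;
    /* Hypothetical variant: last weekday (Mon-Fri) of the month */
    method lastWorkDayOfMonth(double inDate) returns double;
        declare double d;
        d = intnx('month', inDate, 0, 'E');    /* last calendar day */
        /* weekday() returns 1 for Sunday and 7 for Saturday */
        if weekday(d) = 1 then d = d - 2;      /* Sunday: back to Friday */
        else if weekday(d) = 7 then d = d - 1; /* Saturday: back to Friday */
        return (d);
    end;

    method run();
        declare double d having format yymmdd10.;
        d = lastWorkDayOfMonth(to_double(date'2016-01-15'));
        put d=;
    end;
enddata;
run;
quit;
```

January 31, 2016 fell on a Sunday, so under these assumptions the log should show 2016-01-29. Callers that use a method instead of calling intnx directly would pick up a change like this without being edited.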

Methods and Scope

All variables must be declared. Where they can be accessed (scope) is determined by where they are declared. Variables declared within a method are local to that method. That is, they are accessible only within the method and only while the method is executing. When the method stops executing, the local variables disappear. If there is a global variable and a local variable with the same name, the local variable takes precedence:

proc ds2;

data _NULL_;

DECLARE double startDate endDate having format yymmdd10.;  

retain startDate endDate;

method firstOfMonth(double inDate) returns double;

     declare double startDate having format yymmdd10.;  

     put '2.--> In First Of Month  ' startDate=;

     inDate = inDate + 1;

     put '3.--> After change       ' inDate= yymmdd10.;  

     startDate = intnx('month', inDate, 0, 'B');

     return(startDate);

end;

method lastOfMonth(double inDate) returns double;

     declare double endDate having format yymmdd10.;

     put 'In last Of Month   ' endDate=;

     endDate = intnx('month', inDate, 0, 'E');

     return(endDate);

end;

method run();

     declare double inDate endDate having format yymmdd10.;  

     inDate = to_Double(date'2016-02-15');

     startDate = 1;

     put '5.--> Before Method Call ' startDate= inDate=;  

     startDate = firstOfMonth(inDate);

     put '6.--> After Method Call  ' startDate= inDate= ;  

     endDate = 1;

     put 'Before Method Call ' endDate=;

     endDate = lastOfMonth(inDate); 

     put 'After Method Call  ' endDate= inDate= ;

end;

method term();

   put '7.--> In Term ' startDate= endDate=;  

end;

enddata;

run;

quit;

1.   startDate and endDate are declared outside of all methods, so they are global variables.

2.   startDate is declared within method firstOfMonth. It is local to the method and takes precedence over the global variable of the same name. This is evident by the value printed to the log.

3.   inDate is a parameter. All parameters that are not modified by IN_OUT are implicitly declared as local. Any changes to inDate are discarded.

4.   inDate and endDate are declared within method run(), so they are local to the run method. In method run(), endDate takes precedence over the global variable of the same name.

5.   The global variable startDate is initialized. startDate and inDate are displayed in the log.

6.   A new value is assigned to startDate. The dates are displayed in the log.

7.   In the term method, the final values of the global variables startDate and endDate are displayed in the log.

Here are the results:

5.--> Before Method Call    startDate=1960-01-02 inDate=2016-02-15

2.--> In First Of Month        startDate=         .

3.--> After change             inDate=2016-02-16

6.--> After Method Call     startDate=2016-02-01 inDate=2016-02-15

      Before Method Call       endDate=1960-01-02

      In last Of Month         endDate=         .

      After Method Call        endDate=2016-02-29 inDate=2016-02-15

7.--> In Term               startDate=2016-02-01 endDate=         .

The line labeled 5 shows inDate to be 2016-02-15, as was assigned, and the global startDate to be 1960-01-02 (the value 1 displayed as a date).

The lines labeled 2 and 3 show the results from within the method firstOfMonth().

The line labeled 2 shows a missing value for startDate, not 1960-01-02, because you are now accessing the variable local to firstOfMonth().

The line labeled 3 shows that the value of the parameter has been increased by one day.

The line labeled 6 shows that startDate has the correct date, 2016-02-01, and that inDate still has its value from before the method call, 2016-02-15.

The line labeled 7 shows that startDate has the correct value. However, the global variable endDate is missing. In method run(), the endDate that was updated was local to method run(), so its value has been discarded.

What if you wanted to access a global variable in a method that also had a local variable of the same name? You can use THIS.expression. It tells DS2 to access the global variable, not the local variable:

method firstOfMonth(double inDate) returns double;

     declare double startDate having format yymmdd10.;

     inDate = inDate + 1;

     put '1.--> Local inDate        ' inDate= yymmdd10.;

     put '2.--> Global inDate       ' THIS.inDate= yymmdd10.;

     startDate = intnx('month', inDate, 0, 'B');

     return(startDate);

end;

The THIS. prefix tells DS2 to use the global variable.
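A self-contained sketch of the same idea follows. The method name addTo and the variable total are hypothetical, chosen for this illustration. The method declares a local total that hides the global one, and uses the THIS. prefix to update the global variable anyway.

```sas
proc ds2;
data _null_;
    declare double total;                    /* global variable */

    method addTo(double amount);
        declare double total;                /* local; hides the global total */
        total = amount;                      /* assigns the local variable */
        this.total = this.total + amount;    /* updates the global variable */
    end;

    method run();
        total = 0;                           /* no local total here, so this is the global */
        addTo(5);
        addTo(7);
        put total=;
    end;
enddata;
run;
quit;
```

If THIS. behaves as described above, the log should show total=12: the assignments to the local total inside addTo are discarded when the method ends, while the updates made through THIS. persist.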

Recursion

DS2 methods can be called recursively. Recursion is a topic beyond the scope of this book. Although recursion is often elegant, it is commonly inefficient.
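For readers who want to see the shape of a recursive method, here is a minimal sketch that computes a factorial. The method name factorial is hypothetical and the example is illustrative only. It assumes small, non-negative arguments and that a method can call itself without a FORWARD statement.

```sas
proc ds2;
data _null_;
    /* Recursive factorial: n! = n * (n-1)!, with 1 for n <= 1 */
    method factorial(integer n) returns double;
        if n <= 1 then return (1);           /* base case stops the recursion */
        return (n * factorial(n - 1));       /* recursive call on a smaller value */
    end;

    method run();
        declare double f;
        f = factorial(5);
        put f=;
    end;
enddata;
run;
quit;
```

Under these assumptions, f should come out as 120. A simple loop would compute the same value with less overhead, which illustrates the efficiency concern mentioned above.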
