Chapter 12
Automation

Cajeros automáticos—automatic teller machines, or ATMs.

When you're backpacking from Guatemala to Patagonia, there's nothing like a cajero automático to save the day.

I recall backpacking “back in the day” before the ubiquity of ATMs and before global acceptance of credit cards. Running out of money was a real threat and replenishing the stash always a hassle.

The production began with staking out a bank that had reasonable lines. On Monday mornings, for example, many Central and South American bank lines wrap around the block, easily surpassing a two-hour wait. So you definitely didn't want to run out of green stuff over the weekend.

Once inside, past the posted sentries with shotguns, I always seemed to end up in the wrong line. I'd reach the counter, utter my trite phrase “Necessito cambiar algun dinero, por favor” (I need to change some money), and often be shuffled off to another line or some obscure corner desk.

Bank officers are some of the most patient people I've ever met, dealing with broken Spanish all day as tourists attempt to exchange currency. Their first request is always for the passport, so out it comes, instinctually, like your driver's license when you look in the rearview mirror and watch the fuzz walking up.

After your credentials have been validated, you have to dig out the stack of U.S. bills or traveler's checks or some combination thereof from whatever coffer or compartment you've hidden them, often half undressing in the process while trying to avoid eye contact.

The banks are extremely finicky about foreign currency—subtle marks, maculae, tears, or even wrinkles will cause a bill to be rejected. And then you're left carrying useless bills for the duration of your trek or until some hapless individual or bank finally accepts them. Oh, but try rejecting some of their bills sometime and wait for the contemptuous looks!

After examining each bill—front and back, literally holding it to the light—the bank officer counts, double counts, and triple counts the stash. After your rejected bills have been returned, the total of the thrice-counted bills goes into an adding machine.

In a whir of calculations, after a minute you're finally shown a number. With a nod of approval, you think you're about to get paid, but no—this is roughly the halfway point.

Next, forms are printed—sometimes from a computer, but often on a typewriter—that describe the transaction, showing the cambio (exchange rate), as well as associated bank fees that were charged, and finally the amount in the local currency.

The forms are always produced in triplicate or quadruplicate. When the bank officer breaks out a large desk stamp and moistens it with ink, every page is stamped multiple times with so much force that the desk shakes.

Finally the cash drawer is unlocked, which sometimes requires an additional trip to the back office by the officer to retrieve extra currency or a supervisor. The currency is counted—again in triplicate, and again with the precision and handling of Las Vegas poker dealers.

But the bills are never counted at once. Each denomination is counted individually after which the figure is entered into the adding machine, and this back-and-forth continues until the last pesos or centavos have been added. And the count occurs—you guessed it—at least three times.

At long last, after the hullabaloo has concluded, you're handed the stack of currency and a copy of the receipt. No effort is ever made to count the money to the customer; the fanfare is only to demonstrate to the bank officer that the correct amount has been received and disbursed.

At this point, having spent an hour exchanging money for the week, you're finally on your way. With this significant level of effort, it's easy to understand the appeal that cajeros automáticos have for tourists backpacking through countless countries, who can in seconds securely withdraw local currency from their foreign accounts.


The inefficiency of and delays in Central and South American banks are notorious, but these are made more frustrating because a simpler, more efficient solution exists for cash withdrawal—ATMs. While the end result—some pesos, lempiras, or quetzales—is identical, the process of changing money in person can be painful.

Just as ATMs are taken for granted in the United States—and increasingly so in Central and South America—automation is taken for granted in software applications development. User-focused applications are compiled, encrypted, and run with the press of a button or the click of a mouse.

SAS software, however, is often run manually by opening the SAS application and executing a program. While necessary during development, testing, and empirical research, manual execution should be eliminated in most production programs that are executed regularly.

Banking in person can be slow and frustrating, but typically that's where the criticisms end. The bank officers are polite and professional, the transactions accurate, and the shotgun-wielding guards imbue patrons with a definite sense of security. But shirking automation in SAS not only can increase software execution time but also can introduce defects, unintended (or unapproved) modification, and ultimately variability that can cripple reliability.

Banking in person requires numerous checks and balances to ensure and validate that the correct amount is disbursed, all of which require time and effort. ATMs conversely automate these validation processes, and while I still count the bills I receive, I've never once received an incorrect amount. SAS software executed manually similarly must rely on manual inspection of the SAS log to validate process success or demonstrate failure, whereas a hallmark of automated software is its ability to identify and communicate automatically when exceptions or errors occur.

On occasion, there are banking activities that do require visiting a physical bank branch (although I can't think of any right now, I'm sure they exist). But for routine banking activities such as weekly withdrawals to ensure you can buy daily pupusa rations or bribe a Belizean zoo guard to look the other way while you scale a fence into an exhibit, ATMs are a necessity. Similarly, not all SAS software is destined for automation, but where practicable, automation can be a tremendous advantage to software performance.

DEFINING AUTOMATION

The International Organization for Standardization (ISO) defines automatic as “pertaining to a process or equipment that, under specified conditions, functions without human intervention.”1 Similar to the automatic transmission in a car, which allows a driver to accelerate and decelerate without manual shifting, automated software performs necessary functions without developer intervention. This typically includes not only execution, but also detection of exceptions and errors, communication of failures to stakeholders, and ultimately validation of process success. In data analytic development, a common further objective of software automation is the facilitation of software that can be scheduled to run recurrently.

Automation is taken for granted in traditional software development because third-party software users typically obtain software whose code is stable, compiled, encrypted, and impenetrable. Thus, although I'm interacting with Microsoft Word now as I type this paragraph, this interaction occurs only within a graphical user interface (GUI), and I'm prevented from interacting with or modifying the underlying code. End-user development environments differ substantially because they encourage interaction with underlying code, which decreases code stability and increases the unfortunate reliance on SAS log parsing to validate software success. However, where stable SAS software must exhibit high reliability and availability, automated, scheduled batch jobs can facilitate these objectives.

This chapter introduces and differentiates SAS interactive and batch modes, the latter of which can be used to execute and schedule software directly from the operating system (OS) environment. Because batch jobs can also be initiated from within SAS software, techniques are demonstrated that enable software to spawn simultaneous batch jobs in separate SAS sessions through parallel processing. Through the implementation of automation principles, SAS practitioners not only gain independence from software during its execution but also can more creatively and efficiently prescribe the program flow of modularized software.

AUTOMATION IN SAS SOFTWARE

From a software applications perspective, software that is being developed represents the final product and ultimate business value to a user. For example, end users purchase Microsoft Office understanding that they'll neither have access to nor interact with its source code. However, they also trust that the high reliability and customizability of Microsoft Word—coupled with the complexity of its source code—ensure that no additional benefit would be provided through access to source code. Consumers thus expect to receive an impenetrable product that is reliable, secure, stable, and high quality.

As discussed in chapter 1, “Introduction,” data analytic software development differs substantially from applications development in that the final product bearing ultimate business value is not software itself but rather a data product, data analysis, information, knowledge, or data-driven decision. Because data models must often be repetitively refined and, even once stabilized, often need to be accessible to SAS practitioners, the paragon of quality SAS software is transparent and frequently more malleable than software applications intended for third-party users.

The downside of the accessibility and malleability prevalent in data analytic software is the lack of stabilization this encourages. Software is less likely to be hardened or finalized because it can be modified in seconds and rerun, contributing to a preference in many environments to execute SAS software through interactive sessions rather than stable, automated batch jobs. While continually modified software has its place in empirical research, data modeling, and other applications of Base SAS, central and critical data processes should be built through software development best practices within the SDLC that necessitate stabilizing and automating software. With this automation can come increased reliability and efficiency as developers are freed from painful, repetitive software execution and manual process validation.

SAS PROCESSING MODES

The Step-by-Step Programming with Base SAS® Software distinguishes two processing modes for SAS software:2

  • Foreground processing (AKA the SAS Windowing Environment)—This includes the interactive mode and non-interactive mode, the latter of which is not discussed. Manually running programs in the SAS Display Manager, SAS Enterprise Guide, or SAS Studio (including the SAS University Edition) demonstrates the interactive mode.
  • Background processing—This includes batch jobs that can be initiated from within software, or initiated or scheduled directly from the OS environment without manually starting the SAS application. SAS Display Manager, SAS Enterprise Guide, and SAS Studio (among others) all have batch processing available, but only SAS Display Manager batch processing is demonstrated in this chapter.

Because only interactive foreground processing is discussed, all foreground processing is referenced as the interactive mode, or running SAS interactively. Interactive mode is arguably the most common environment in which SAS is learned and in which development and testing occur, while batch mode best facilitates software automation. Interactive mode includes the familiar SAS GUIs (SAS Display Manager, SAS Enterprise Guide, and SAS Studio) from which code can be typed and executed, with results viewable in the SAS log, output window, or data set viewer. Batch mode instead spawns a new SAS session and executes a SAS program before terminating, at which point results can be viewed externally. This chapter demonstrates methods to run SAS software in batch mode within the SAS Display Manager, emphasizing increased performance gained through automated parallel processing.

Introduction to Interactive Mode

To initiate SAS interactive mode, the SAS executable file must be executed through the Windows start menu, a desktop icon, the command prompt, or other method. For example, in a Windows environment, the SAS Display Manager is launched with SAS.exe and Enterprise Guide is launched with SEGuide.exe. Figure 12.1 demonstrates the SAS Display Manager startup window, from which SAS practitioners can open, write, edit, or run code, or perform a multitude of menu-driven functions that require no software development whatsoever.3


Figure 12.1 SAS Display Manager at Startup

Other common interactive modes include SAS Enterprise Guide and SAS Studio, the latter of which provides the backbone for the SAS University Edition. While the look and feel differ for these SAS interfaces, the functionality of Base SAS software is largely equivalent across all SAS interfaces. The “SAS Interface Portability” section in chapter 10, “Portability,” identifies very few differences when interactively running SAS. However, because batch processing inherently involves interacting directly with the OS environment to spawn SAS sessions, significant variations in functionality exist, such as the inability to spawn SAS sessions with the SYSTASK statement in SAS Studio. For this reason, use of batch is demonstrated solely within the SAS Display Manager.

Introduction to Batch

Batch mode automates the spawning, execution, and termination of not only the SAS application (e.g., SAS.exe) but also SAS programs that are run within it. Because SAS programs are always compiled at the time of execution (except when using the SAS Stored Compiled Macro Facility, which is briefly introduced in chapter 11, “Security”), the SAS application must be opened before SAS software can be executed. Thus, batch jobs effectively combine the actions of spawning a new SAS session and executing SAS software into one seamless activity.

SAS Display Manager batch jobs can be spawned from interactive mode (with the SYSTASK statement, which can execute OS command line syntax) or launched in batch mode directly from the OS (from the command prompt or Windows Task Scheduler). With scheduling enabled, batch jobs can be fully automated for regular, reliable, recurrent execution without human interaction, freeing developers to pursue far more interesting endeavors. Both the OS and SAS programs enable batch jobs to be run in series or parallel. Both methods of spawning batch jobs can also implement business logic that enables program flow to pause until one or more batch jobs have completed.

Unlike running SAS software interactively, in which the SAS log can be viewed in real-time within the SAS interface, batch mode executes in the background and thus saves—rather than displays—all logs, data sets, and other output. To facilitate quality assurance and exception handling, batch jobs launched from either an interactive SAS session or the OS environment provide return codes that can demonstrate exceptions, warnings, and runtime errors. Given that production software aims to be self-sufficient and should not require independent, manual review of logs to determine program success, exception handling and program validation should be integral components of all batch processing.

Synchronicity and Asynchronicity

Data analytic programs typically execute in a serialized manner in which processes have clear prerequisites, dependencies, and ordered program flow. Serial program flow is sometimes described as synchronous because processes occur one after another. For example, the following code demonstrates a data transformation module (simulated with the DATA step) that must be run before an analytic module (simulated with the MEANS procedure). The MEANS procedure will not execute until after the DATA step has completed.

data final;
   length char $10 num 8;
   do num=1 to 100;
      char=put(num,10.);
      output;
      end;
run;
proc means data=final;
   var num;
run;

While some SAS software will require serialized steps, in other cases processes or programs can be run in parallel to reduce execution time and dramatically improve performance. In this example, if a separate analytic module (represented by the FREQ procedure) is appended to the program flow, the new module can be run in parallel with the MEANS procedure because each requires read-only access to the Final data set. While the following code demonstrates a serialization of the three modules, the two analytic modules in theory could be run in parallel:

data final;
   length char $10 num 8;
   do num=1 to 100;
      char=put(num,10.);
      output;
      end;
run;
proc means data=final;
   var num;
run;
proc freq data=final;
   tables char;
run;

In synchronous execution, the FREQ procedure will not execute until the MEANS procedure terminates, a fact we take for granted in the Base SAS language. However, because the MEANS and FREQ procedures in theory could be executed at the same time (but are not), their serialization represents a false dependency, described in the “False Dependencies” and “Parallel Processing” sections in chapter 8, “Efficiency.” An asynchronous solution, on the other hand, would execute the MEANS procedure and immediately thereafter execute the FREQ procedure, enabling them to run in parallel.

The following code implements an asynchronous solution that launches and executes two modules (Means.sas and Freq.sas, demonstrated in the later “Modularity” section) in parallel via the SYSTASK statement:

%let perm=c:\perm;
libname perm "&perm";
data perm.final;
   length char $10 num 8;
   do num=1 to 100;
      char=put(num,10.);
      output;
      end;
run;
systask command """%sysget(SASROOT)\sas.exe"" -sysin ""&perm\means.sas"" -log ""&perm\means.log"" -print ""&perm\means.lst""" status=rc_means;
systask command """%sysget(SASROOT)\sas.exe"" -sysin ""&perm\freq.sas"" -log ""&perm\freq.log"" -print ""&perm\freq.lst""" status=rc_freq;
waitfor _all_;

In this revised example, the DATA step executes synchronously because it must wait for the LIBNAME statement to complete before starting. However, the two SYSTASK statements execute asynchronously because they don't require subsequent processes to wait for them to finish before starting. Thus, the return code &RC_MEANS hasn't been generated yet when the second SYSTASK statement executes, because the Means program is still executing. The asynchronicity of SYSTASK is described further in the “Decomposing SYSTASK” section, including exception handling that should accompany batch jobs both to detect failure and validate success.

This example demonstrates the tremendous flexibility in combining SAS interactive and batch modes. The unnamed program is run in interactive mode through an open SAS session. However, the interactive session is able to spawn two batch jobs with SYSTASK, each of which completes in batch mode while the unnamed program also continues to execute. If the unnamed program were instead saved to a SAS program C:\perm\ETL.sas and run from the operating system as a batch job, the code would not change but its processing would represent a program (ETL.sas) running in batch mode, spawning two additional batch jobs, Means.sas and Freq.sas. This versatility is demonstrated further in the following sections.

STARTING IN INTERACTIVE MODE

As demonstrated in the “Synchronicity and Asynchronicity” section, software executing in interactive mode can spawn batch jobs (with the SYSTASK statement) that execute in batch mode. When the interactive mode launches a batch job, the interactive session and software remain open while the batch job independently opens, executes, and terminates a separate SAS session in which the batch job is executed. This independence is beneficial because it ensures that the new SAS session will be fresh, thus not contaminated by modified system options, global macro variables, open data streams, files in the SAS WORK library, or other artifacts that could cause unwanted variability during execution.

While this independence is required to facilitate consistency and reliability of production jobs, it also limits communication between parent and child processes. For example, a parent process running interactively could spawn a child process via SYSTASK that will execute in batch mode. However, macro variables are not inherited by the child, and the child cannot pass macro variables back to the parent; the parent receives only the return code captured through the SYSTASK STATUS option. SAS practitioners accustomed to reviewing SAS logs after software execution within the log window will also quickly discover that batch process logs are saved instead to log files. Notwithstanding these complexities of communication, the benefits of spawning batch jobs from the interactive mode are profound.

A Serialized Program Flow Example

The SYSTASK COMMAND statement executes OS environment commands from SAS software.4 However, SYSTASK examples in this text only demonstrate execution of the SAS executable file (SAS.exe) to spawn a SAS session and batch job. The SYSTASK statement by definition spawns only asynchronous tasks. Thus, ten consecutive SYSTASK statements would attempt to run ten separate programs in ten parallel SAS sessions. In high-resource environments, this might be an effective solution that provides substantially increased performance, but low-resource environments might choke and demonstrate slowed processing, inefficiency, or outright failure.

To implement more advanced program control, SYSTASK includes a WAIT option that pauses the parent process until one or more child processes have completed, thus effectively prescribing synchronicity. With the WAITFOR and SYSTASK KILL statements, additional dynamic processing can be implemented. Both WAITFOR and KILL are demonstrated in the “Euthanizing Programs” section in chapter 11, “Security.”
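As a minimal sketch of the WAIT option, the following statement launches the Means module (Means.sas, shown in the “Modularity” section) and holds the parent until that batch job ends. The folder C:\perm, the file names, and the macro variable name RC_MEANS follow this chapter's other examples and are assumptions about the environment rather than requirements:

```sas
%let perm=c:\perm;
/* WAIT makes this one SYSTASK call synchronous, so the parent pauses here */
systask command """%sysget(SASROOT)\sas.exe"" -noterminal -sysin ""&perm\means.sas"" -log ""&perm\means.log"" -print ""&perm\means.lst""" wait status=rc_means;
/* the child has finished, so the return code has already been assigned */
%put RC_MEANS: &rc_means;
```

Because WAIT applies to a single SYSTASK statement, it suits a step that later code depends on, whereas WAITFOR is the better fit when several jobs should run in parallel before the parent continues.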

The following serialized code simulates ETL software in which one program includes separate Trans, Freq, and Means modules:

%let perm=c:\perm;
libname perm "&perm";
* transformation;
data perm.final;
   length char $10 num 8;
   do num=1 to 100;
      char=put(num,10.);
      output;
      end;
run;
* means;
proc means data=perm.final;
   var num;
run;
* freq;
proc freq data=perm.final;
   tables char;
run;

As demonstrated in the “Synchronicity and Asynchronicity” section, this code includes a false dependency because the FREQ procedure is required to wait for the MEANS procedure to complete, but in theory these could complete in parallel. To execute these two modules in parallel, the PERM.Final data set must be accessible by both child processes. Thus, referenced data sets should always be in shared libraries and never in the WORK library, which will be inaccessible to other SAS sessions. Because the PERM library is utilized, each child process must utilize the LIBNAME statement to initialize the PERM library (unless it is globally defined), as demonstrated in the following section on “Modularity.” Modular software design is often the first step toward software automation that can support parallel processing models, so these advantages and techniques are demonstrated in the following section.

Modularity

A technical requirement should first exist before embarking on a crusade to needlessly infuse existing software with batch processing. That requirement should demonstrate what function or performance will be improved through batch processing, balancing this additional value with the inherent complexity that batch brings. In the previous example, two analytic modules (Means and Freq) could in theory be run in parallel because each requires only that the prior DATA step has completed and that a shared file lock is available on the PERM.Final data set. To increase performance, the original monolithic program could be enhanced with modular software design to include a parent process that asynchronously spawns two SAS batch jobs (children): Means and Freq. With this vision of how parallel processing will drive higher software performance, technical requirements can be modified and development can commence.

The first step toward parallel processing through automation must modularize code into discrete chunks, after which prerequisites, dependencies, inputs, and outputs can be identified. Modularity is discussed throughout chapter 14, “Modularity,” while its central role in functional decomposition of software (to facilitate critical path analysis and eventual parallel processing) is described throughout chapter 7, “Execution Efficiency.” One common paradigm is to maintain one parent program that operates as the engine (AKA controller or driver) and dynamically spawns and validates all subsequent batch children. This program flow facilitates flexible exception handling that can detect and dynamically respond to exceptions and runtime errors not only in the parent but also its child processes.

Child processes essentially suffer from amnesia, having no awareness of their parent environment, macro variables, library assignments, or other alterations to the environment. If the PERM library is not globally or permanently defined, it must be assigned in each child process. SAS system options in the parent session may also need to be established in batch sessions. The now independent Means module is saved as C:\perm\means.sas:

* saved as c:\perm\means.sas;
%let perm=c:\perm;
libname perm "&perm";
%put SYSENV: &sysenv;
proc means data=perm.final;
   var num;
run;

The global macro variable &SYSENV, which demonstrates the SAS application processing mode, has also been added to the code to distinguish batch from interactive modes. In interactive mode (or whenever the default SAS system option TERMINAL is active), &SYSENV will be FORE, representing that SAS is running in the foreground. When a SAS program runs in batch mode with the NOTERMINAL option, &SYSENV will be BACK, representing that SAS is running in the background. Without the NOTERMINAL option explicitly specified, the &SYSENV value will indicate FORE whether software is executing interactively or in batch mode.
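A module could also branch on &SYSENV rather than only printing it. The following is a minimal sketch; the macro name REPORT_MODE and the message text are hypothetical and not part of the chapter's modules:

```sas
/* hypothetical helper that reports the processing mode to the log */
%macro report_mode;
   %if &sysenv=BACK %then %put NOTE: Running in the background (batch mode with NOTERMINAL).;
   %else %put NOTE: Running in the foreground (interactive, or batch without NOTERMINAL).;
%mend report_mode;

%report_mode;
```

The same test could guard steps that only make sense with a user present, such as opening a viewer, so that they are skipped when the program runs unattended.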

The Freq module is similarly saved as C:\perm\freq.sas. It, too, requires definition of the PERM library, demonstrating the benefits of global library assignments.

* saved as c:\perm\freq.sas;
%let perm=c:\perm;
libname perm "&perm";
%put SYSENV: &sysenv;
proc freq data=perm.final;
   tables char;
run;

The final component enabling asynchronous program flow is the parent process, saved to C:\perm\etl_engine.sas.

* saved as c:\perm\etl_engine.sas;
%let perm=c:\perm;
libname perm "&perm";
data perm.final;
   length char $10 num 8;
   do num=1 to 100;
      char=put(num,10.);
      output;
      end;
run;
%put SYSENV: &sysenv;
systask command """%sysget(SASROOT)\sas.exe"" -noterminal -sysin ""&perm\means.sas"" -log ""&perm\means.log"" -print ""&perm\means.lst""" status=rc_means taskname=task_means;
systask command """%sysget(SASROOT)\sas.exe"" -sysin ""&perm\freq.sas"" -log ""&perm\freq.log"" -print ""&perm\freq.lst""" status=rc_freq taskname=task_freq;
waitfor _all_ task_means task_freq;
%put RC_MEANS: &rc_means;
%put RC_FREQ: &rc_freq;

When the ETL_engine.sas is executed, it first creates the PERM.Final data set, after which the Means and Freq modules now execute in parallel. While the twin return codes—&RC_MEANS and &RC_FREQ—demonstrate the highest warning or error code generated in the respective child process, at this point, no actual exception handling has been implemented in either the parent or the child processes. In the next section, the functionality of the SYSTASK and WAITFOR statements is explored.

Decomposing SYSTASK

While the examples of SYSTASK in this chapter demonstrate only calling SAS.exe from the OS, SYSTASK is far more robust and can be utilized to pass other instructions to the OS. For example, in a Windows environment, SYSTASK can implement the DIR command to pipe the contents of a directory to a text file. The following statement compiles a list of all SAS programs in the C:\perm folder and pipes this list to the text file Allmysas.txt:

systask command "dir c:\perm\*.sas /b > c:\perm\allmysas.txt";

In this example, the SYSTASK statement includes instruction only for the OS, but more complex implementation can additionally include instruction for the parent and child process. For example, the first SYSTASK statement from the “Modularity” section example shells to the OS, opens the SAS application (SAS.exe), spawns the Means module (C:\perm\means.sas), and terminates SAS.

systask command """%sysget(SASROOT)\sas.exe"" -noterminal -sysin ""&perm\means.sas"" -log ""&perm\means.log"" -print ""&perm\means.lst""" status=rc_means taskname=task_means;

The triple quotation marks signify the beginning and end of instruction to the OS (on how to invoke SAS.exe) while the STATUS and TASKNAME parameters are instructions to the parent process to create macro variables representing the batch job return code and task name, respectively. If the SYSPARM parameter is present (as demonstrated in the “Passing Parameters” section), it represents instruction for the child process. The previous SYSTASK statement is equivalent to typing the following statement at the command prompt:

"C:\Program Files\SASHome\SASFoundation\9.4\sas.exe" -noterminal -sysin "c:\perm\means.sas" -log "c:\perm\means.log" -print "c:\perm\means.lst"

The %SYSGET macro function extracts the SASROOT operating environment variable, the location of the SAS.exe executable file. Two double quotations are used in the SYSTASK invocation to represent one double quotation within the OS. Quotations are required only when spaces exist in folder or file names; thus, the following quotation-free invocation from the command prompt is equivalent because “Progra~1” is the 8.3 legacy filename format for the “Program Files” folder:

C:\progra~1\SASHome\SASFoundation\9.4\sas.exe -noterminal -sysin c:\perm\means.sas -log c:\perm\means.log -print c:\perm\means.lst
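To confirm what %SYSGET contributes to these invocations, the SASROOT value can be written to the log before any batch job is spawned. This is a minimal sketch that assumes a Windows SAS session in which the SASROOT environment variable is defined, as the SYSTASK examples in this chapter already assume:

```sas
/* write the SAS installation folder held in the SASROOT environment variable */
%put SASROOT: %sysget(SASROOT);
/* write the full executable path that the SYSTASK examples hand to the OS */
%put Executable: %sysget(SASROOT)\sas.exe;
```

If the first line prints an empty value, SASROOT is not defined in that session, and the SYSTASK examples would need an explicit path to SAS.exe instead.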

SAS system options such as NOTERMINAL are preceded with a dash, as are the batch parameters SYSIN, LOG, and PRINT, regardless of whether implemented from within a SAS program or from the command prompt. The SYSTASK COMMAND statement itself, as well as SYSTASK options such as STATUS, TASKNAME, or WAIT, are never used at the command prompt. Rather, they are used by the parent process to initiate, control, and track the status of the batch job.

The SYSIN parameter represents the SAS program that will be executed, the LOG parameter its log, and the PRINT parameter its output. If either the LOG or PRINT parameters are omitted, the respective log or output files are created with the same name and in the same directory as the SAS program being executed. Thus, both the LOG and PRINT parameters could be omitted from the previous SYSTASK statement to produce identical functionality.

Omitting the LOG and PRINT parameters is typically acceptable unless a SAS program is intended to be run concurrently—that is, multiple instances of the same program run in different SAS sessions in parallel. This complexity is also demonstrated in the “Complexities of Concurrency” section in chapter 11, “Security.” With the identical program launched through two or more SYSTASK statements, each batch session will default to identically named log and output files, thus causing data access collisions because both sessions cannot write to the same file. For example, if two SYSTASK statements attempt to launch two concurrent instances of the Means.sas program, the LOG and PRINT parameters are required and need to specify uniquely named files between the two SYSTASK statements. The STATUS and TASKNAME parameters would also need to be unique between statements:

systask command """%sysget(SASROOT)\sas.exe"" -noterminal -sysin ""&perm\means.sas"" -log ""&perm\means1.log"" -print ""&perm\means1.lst""" status=rc_means1 taskname=task_means1;
systask command """%sysget(SASROOT)\sas.exe"" -noterminal -sysin ""&perm\means.sas"" -log ""&perm\means2.log"" -print ""&perm\means2.lst""" status=rc_means2 taskname=task_means2;

The STATUS option is not passed to the OS; it names a macro variable that receives the return code of the spawned child process. SYSTASK STATUS return codes (in this example represented by &RC_MEANS and &RC_FREQ) behave differently than the more common automatic macro variables such as &SYSERR and &SYSCC. An empty status code indicates that the return code has not yet been generated, while 0 indicates that no warning or error was encountered. A value of 1 indicates a warning and 2 a runtime error. As further demonstrated in the later “Batch Exception Handling” section, STATUS return codes are created as local macro variables if the SYSTASK statement is inside a macro definition and as global macro variables if not.

Because SYSTASK executes batch jobs asynchronously in the background unless the WAIT option is specified, the WAITFOR statement, which causes the parent process to halt until child processes have completed, must be invoked before the value of return codes can be assessed. Without the WAITFOR statement, both the Freq and Means modules will still be executing when &RC_MEANS and &RC_FREQ are assessed, so the values of each will be empty. For example, the following code demonstrates when the STATUS macro variable is created and when it is assigned:

%macro test;
%if %symexist(new_status)=0 %then %put NEW_STATUS does not exist before;
systask command """%sysget(SASROOT)\sas.exe"" -noterminal -sysin ""&perm\means.sas"" -log ""&perm\means.log"" -print ""&perm\means.lst""" status=new_status taskname=new_task;
%if %symexist(new_status)=0 %then %put NEW_STATUS does not exist after;
%put STATUS Before: &new_status;
waitfor _all_ new_task;
%put STATUS After: &new_status;
%mend;
%test;

The output demonstrates that the STATUS return code &NEW_STATUS (a local macro variable) does not exist before the SYSTASK statement, exists but is not assigned after the SYSTASK statement, and is assigned only after the WAITFOR statement has determined that the child process has terminated:

%test;
NEW_STATUS does not exist before
STATUS Before:
STATUS After: 0

The TASKNAME option names the asynchronous task, also known as the batch job or child process. TASKNAME can be finicky because if not closed with either a WAITFOR or KILL statement, the task will remain open in the SAS parent session, even if the child process has long since completed. For example, invoking the following duplicate SYSTASK statements in series produces an error that the “Specified taskname is already in use”:

systask command """%sysget(SASROOT)\sas.exe"" -noterminal -sysin ""&perm\means.sas"" -log ""&perm\means.log"" -print ""&perm\means.lst""" status=new_status taskname=new_task;
systask command """%sysget(SASROOT)\sas.exe"" -noterminal -sysin ""&perm\means.sas"" -log ""&perm\means.log"" -print ""&perm\means.lst""" status=new_status taskname=new_task;

The WAITFOR statement references task names of processes for which the parent must wait while the SYSTASK KILL statement terminates referenced processes. The KILL statement is demonstrated in the “Batch Exception Handling” section later in the chapter.
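A minimal sketch of the remedy follows, assuming the &PERM macro variable and the Means.sas program used in the earlier examples. Issuing WAITFOR between the two invocations closes the first task, so, based on the behavior described above, the second SYSTASK statement should be able to reuse the same task name without the “already in use” error:

```sas
%let perm=c:\perm;

* first invocation opens the task NEW_TASK;
systask command """%sysget(SASROOT)\sas.exe"" -noterminal -sysin ""&perm\means.sas"" -log ""&perm\means.log"" -print ""&perm\means.lst""" status=new_status taskname=new_task;

* waiting for the child to finish closes NEW_TASK in the parent session;
waitfor new_task;
%put First run STATUS: &new_status;

* the task name is now free to be reused;
systask command """%sysget(SASROOT)\sas.exe"" -noterminal -sysin ""&perm\means.sas"" -log ""&perm\means.log"" -print ""&perm\means.lst""" status=new_status taskname=new_task;
waitfor new_task;
%put Second run STATUS: &new_status;
```

Because the two runs execute in series here, they can safely share the same log and output files; concurrent runs would still need uniquely named files.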

The NOTERMINAL SAS system option specifies that dialog boxes are not displayed during the SAS session.5 For example, when running automated processes overnight, you don't want to discover that some window has popped up, halting software execution. NOTERMINAL prevents popups during batch jobs, and is also required so that the &SYSENV automatic macro variable will accurately represent BACK for background processing. With the default option TERMINAL enabled, &SYSENV will always reflect FORE, even when batch jobs are running.

Other SAS system options can be included in SYSTASK invocations as necessary. System options that are modified programmatically in the parent process will not percolate to a child process unless explicitly invoked in its SYSTASK statement. Thus, as with global SAS library assignments that are preferred to redundant, manual assignments in child processes, the extent to which SAS options can be standardized will benefit automated software while reducing error-prone, manual invocations of SAS system options via SYSTASK.
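As an illustration of an option that does not percolate on its own, the following sketch repeats two options on the SYSTASK command line. LINESIZE and NOCENTER are chosen purely as examples, and the &PERM location and Means.sas program are assumed from the earlier listings:

```sas
%let perm=c:\perm;

* options set here affect only the parent session;
options linesize=100 nocenter;

* the same options are repeated on the command line so that the child also applies them;
systask command """%sysget(SASROOT)\sas.exe"" -noterminal -linesize 100 -nocenter -sysin ""&perm\means.sas"" -log ""&perm\means.log"" -print ""&perm\means.lst""" status=rc_means taskname=task_means;
waitfor task_means;
%put RC_MEANS: &rc_means;
```

If many options must be shared, placing them in a configuration file common to parent and child may be easier to maintain than lengthening every SYSTASK statement.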

Passing Parameters with SYSPARM

The Means program (demonstrated in the “Modularity” section) simulates an analytical module that can be optimized and automated through parallel processing. Modules are often constructed as SAS macros, so they regularly include parameters that are passed during macro invocation to facilitate dynamic processing. Because a new SAS session is spawned when SYSTASK invokes a batch job, neither global nor local macro variables in the parent process can be passed directly to the child. Rather, a single command line parameter—SYSPARM—can pass a text string. Through creative engineering, tokenization, and parsing, multiple macro variables and other information can be passed via SYSPARM to a child process.

Consider a process that runs the MEANS procedure and requires two levels of titles, Title1 and Title2. This is a straightforward task in static code not performed in batch mode:

title1 'Title One';
title2 'Title Two';
proc means data=perm.final;
   var num;
run;

To create these titles dynamically, macro parameters for Title1 and Title2 can be passed through the %MEANS invocation. The output (not shown) is identical to that produced by the previous code but now can be varied by modifying the macro parameters at invocation:

%macro means(tit1=, tit2=);
title1 "&tit1";
title2 "&tit2";
proc means data=perm.final;
   var num;
run;
%mend;
%means(tit1=Title One, tit2=Title Two);

The dynamic macro works well when run interactively, but if it is saved as a separate program and invoked in batch mode, dynamism is lost because the necessary parameters can no longer be passed. To overcome this limitation, the SYSPARM command option is implemented within SYSTASK and used to pass the parameters.

systask command """%sysget(SASROOT)\sas.exe"" -noterminal -sysin ""&perm\means.sas"" -log ""&perm\means.log"" -print ""&perm\means.lst"" -sysparm ""tit1=Title One, tit2=Title Two""" status=rc_means taskname=task_means;

To execute the SYSTASK statement, the updated Means module is saved as C:\perm\means.sas:

%let perm=c:\perm;
libname perm "&perm";
%macro means(tit1=, tit2=);
title1 "&tit1";
title2 "&tit2";
proc means data=perm.final;
   var num;
run;
%mend;
%means(&sysparm);

When the Means module is spawned via SYSTASK, it interprets the &SYSPARM value, parses this into separate macro variables &TIT1 and &TIT2, and dynamically produces the output. While this solution is functional, it lacks flexibility, which diminishes the solution's reusability and extensibility. The %GETPARM macro represents a more reusable solution that accepts a comma-delimited list of parameters and assigns these parameters to dynamically named global macro variables. If the &SYSPARM macro variable is empty (e.g., if the SYSPARM parameter was not passed), exception handling directs the macro to terminate via the %RETURN statement.

The %GETPARM macro does pose some security risks, discussed in the “Variable Variable Names” section in chapter 11, “Security.” Despite these risks, its reusable design is beneficial, and the following code is saved to the SAS program C:\perm\getparm.sas:

* accepts a comma-delimited list of parameters;
* in VAR1=var 1, VAR2=var 2 format;
%macro getparm;
%local i;
%let i=1;
%if %length(&sysparm)=0 %then %return;
%do %while(%length(%scan(%quote(&sysparm),&i,','))>1);
   %let var=%scan(%scan(%quote(&sysparm),&i,','),1,=);
   %let val=%scan(%scan(%quote(&sysparm),&i,','),2,=);
   %global &var;
   %let &var=&val;
   %let i=%eval(&i+1);
   %end;
%mend;
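
One way to exercise %GETPARM without spawning a batch job is to set the SYSPARM system option in an interactive session before calling the macro. The sketch below assumes the macro has been saved to the location named above; the title values are only illustrative:

```sas
* simulate the string that the SYSPARM command option would deliver to a batch job;
options sysparm='tit1=Title One, tit2=Title Two';

%include 'c:\perm\getparm.sas';
%getparm;

* each name=value pair should now exist as a global macro variable;
%put TIT1: &tit1;
%put TIT2: &tit2;
```

Resetting the option afterward (for example, with an empty quoted string) keeps a leftover value from influencing later interactive runs.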

The Means module (saved as C:\perm\means.sas) can now be updated to include a reference to the %GETPARM macro. This dynamism enables the SYSTASK statement to pass a virtually unlimited number of parameters to the Means module or other software:

%let perm=c:\perm;
libname perm "&perm";
%put SYSENV: &sysenv;
%let tit1=;
%let tit2=;
%include 'c:\perm\getparm.sas';
%getparm;
title1 "&tit1";
title2 "&tit2";
proc means data=perm.final;
   var num;
run;

However, the Means module is now dependent on the &SYSPARM value; as a result, the code is no longer backward compatible with the interactive mode. In other words, if SAS practitioners need to execute the Means module manually during further development, debugging, or testing, the values for &TIT1 and &TIT2 will be missing when run interactively. This issue is discussed and a solution demonstrated in the “Batch Backward Compatibility” section later in the chapter.

Recovering Return Codes

In production software, return codes are necessary to demonstrate successful completion of macros and programs and to identify exceptions, warnings, or runtime errors that were encountered. Because Base SAS does not provide inherent return code functionality within macros, as discussed in the “Faking It in SAS” section in chapter 1, “Introduction,” global macro variables are commonly utilized to pass return codes and other information from child processes back to the parent process that invoked them. However, when the child is invoked as a batch job in a separate SAS session, macro variables cannot be passed back to the parent, creating a huge obstacle in exception handling and exception inheritance within batch processing.

Very rudimentary exception handling can be achieved by altering the value of the global macro variable &SYSCC within a child process because this value is communicated back to the optional STATUS parameter invoked in the SYSTASK statement. To demonstrate this exception inheritance from child to parent, the following single-line SAS program is saved as C:\perm\child_exception.sas:

%let syscc=0;

The child process is called by the parent, saved as C:\perm\parent_exception.sas:

libname perm 'c:\perm';
systask command """%sysget(SASROOT)\sas.exe"" -noterminal -sysin c:\perm\child_exception.sas -log c:\perm\child_exception.log" status=RC taskname=task;
waitfor _all_;
systask kill task;
%put RC: &rc;

When &SYSCC is 0 (representing no warnings or runtime errors) in the child process, this communicates successful child completion to the parent, causing the STATUS option to set the &RC return code to 0, as demonstrated in the following log:

systask command """%sysget(SASROOT)\sas.exe"" -noterminal -sysin c:\perm\child_exception.sas -log c:\perm\child_exception.log" status=RC taskname=task;
NOTE: There are no active tasks/transactions.
waitfor _all_;
systask kill task;
%put RC: &rc;
RC: 0
NOTE: Task "task" produced no LOG/Output.

However, if &SYSCC instead is assigned a value of 1 through 4 in the child process, STATUS will be 1 in the parent process, representing that a warning occurred in the child. Assigning &SYSCC a value greater than 4 will change the STATUS value to 2, representing a runtime error in the child process. Thus, three nondescript return codes—representing successful completion, warning, or runtime error—can be inherited from child to parent without much difficulty.
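A slightly fuller child program might set &SYSCC only when a precondition fails. The following is a sketch under the assumption that the mapping just described holds; the %CHECK_INPUT macro and its message are hypothetical:

```sas
/* child program: hypothetical precondition check run before any analysis */
libname perm 'c:\perm';
%macro check_input;
%if %sysfunc(exist(perm.final))=0 %then %do;
   %put ERROR: PERM.FINAL does not exist;
   /* per the mapping above, a value greater than 4 should surface as STATUS 2 in the parent */
   %let syscc=8;
   %end;
%mend;
%check_input;
```

The parent still learns only that a runtime error occurred, not which precondition failed.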

However, many SAS modules may need to provide much more descriptive information through exception handling return codes. For example, a module testing the existence of a data set might create one return code value if the data set is missing, another if it is locked, and a third if the data set is present and unlocked but has an invalid format. For this level of specificity, Base SAS is unfortunately ill-equipped to transmit information from child processes back to their parents.

In addition to return codes, actual data may need to be passed from a child process back to its parent. SAS macros are often utilized as functions, effectively performing some operation to return some value or values (or dynamically generated SAS code) as one or more global macro variables. This works well when both parent and child exist in the same SAS session; however, when a parent process calls a child batch process, the parent cannot retrieve information or data directly from that child, just as it cannot retrieve descriptive return codes.

The workaround for returning information from a child process to its parent is to write those data either to a SAS data set or to a text file. The data set or text file acts as a control table, storing the information from the child and enabling the parent process to query the control table to receive the information. While messy, this methodology greatly extends the functionality of SAS batch jobs, ensuring bidirectional communication between parent and child processes and apt identification and handling of exceptions and runtime errors. This functionality is discussed in the “Control Tables” section in chapter 3, “Communication.”
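The control table idea can be sketched as follows. The data set name PERM.CONTROL, its variables, and the return code value 3 are all hypothetical and chosen only for illustration; the first step belongs in the child program and the second in the parent, after WAITFOR has confirmed that the child has ended:

```sas
/* child side: record a descriptive return code in a control table */
libname perm 'c:\perm';
data perm.control;
   length module $32 rc 8 message $100;
   module='MEANS';
   rc=3;
   message='PERM.FINAL is present but has an invalid format';
run;

/* parent side: read the control table into global macro variables */
data _null_;
   set perm.control;
   where module='MEANS';
   call symputx('means_rc',rc,'g');
   call symputx('means_msg',message,'g');
run;
%put MEANS_RC: &means_rc;
%put MEANS_MSG: &means_msg;
```

When several children run in parallel, giving each its own control table (or its own row, guarded against simultaneous writes) avoids the file collisions described earlier for shared logs.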

Batch Backward Compatibility

While software testing is depicted as a single phase of the SDLC, software testability is a quality characteristic that should be considered throughout the entire SDLC. As code is hardened and operationalized through automated batch jobs, batch-specific techniques (such as passing parameters via SYSPARM) may have been incorporated. These techniques can sometimes reduce the testability of SAS programs, especially when they prevent them from running interactively. Because development and testing occur in the interactive—not batch—mode, all automated processes that incorporate batch processing should ensure backward compatibility to the interactive mode to facilitate inevitable maintenance and testing activities.

The Means.sas program depicted in the “Passing Parameters with SYSPARM” section might at some point need to be modified, requiring development and testing in the SAS interactive mode. But when the Means module is opened and executed interactively, it fails because &SYSPARM is empty, as the SYSPARM parameter has not been passed.

One way to overcome this obstacle is to first test the length of &SYSPARM and, if this is zero, to assign default values that allow the Means module to execute in test mode. The updated Means module now first tests the content of &SYSPARM and, if it is empty, assigns test values to both &TIT1 and &TIT2:

* saved as c:\perm\means.sas;
%let perm=c:\perm;
libname perm "&perm";
%let tit1=;
%let tit2=;
%macro parm;
%if %length(&sysparm)>0 %then %do;
   %include 'c:\perm\getparm.sas';
   %getparm;
   %end;
%else %do;
   %let tit1=Test 1;
   %let tit2=Test 2;
   %end;
%mend;
%parm;
title1 "&tit1";
title2 "&tit2";
proc means data=perm.final;
   var num;
run;

When the updated code is saved to C:\perm\means.sas, it can now run in both batch and interactive modes with values supplied for the title statements either through the actual SYSPARM parameter or through the test data. It is critical, however, that test values can be distinguished from actual values. The risk, of course, is that with these modifications, the program could now execute even if the SYSPARM were accidentally omitted from a batch invocation, which should cause program failure. However, this risk could easily be overcome through exception handling that further limits functionality in test mode, for example, by causing the program to terminate by throwing a specific exception.

Another method to mitigate the risk of failure caused by accidental invocation without the SYSPARM option would be to assess the &SYSENV automatic macro variable, discussed earlier in the “Modularity” section. For example, if &SYSENV is FORE, this indicates that the program is executing interactively rather than in batch mode, which could signal the code is being developed or tested. And if the value of &SYSENV is BACK, this indicates the program is running in a production environment as a batch job. Thus, programmatic assessment of &SYSENV could also be used to assign test values to macro variables that otherwise would have been assigned via the SYSPARM parameter.
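The following sketch combines both checks, assuming the child is launched with NOTERMINAL so that &SYSENV reports BACK in batch mode. The &PARM_OK flag and the %RUN_MEANS wrapper are hypothetical names introduced only for this illustration:

```sas
%let perm=c:\perm;
libname perm "&perm";
%let tit1=;
%let tit2=;
%let parm_ok=1;
%macro parm;
%if %length(&sysparm)>0 %then %do;
   %include 'c:\perm\getparm.sas';
   %getparm;
   %end;
%else %if &sysenv=FORE %then %do;
   /* interactive session: fall back to clearly labeled test values */
   %let tit1=Test 1;
   %let tit2=Test 2;
   %end;
%else %do;
   /* batch session without SYSPARM: flag the failure for the parent */
   %put ERROR: SYSPARM must be supplied in batch mode;
   %let syscc=8;
   %let parm_ok=0;
   %end;
%mend;
%parm;
%macro run_means;
/* skip the analysis entirely when the parameters were not supplied */
%if &parm_ok=0 %then %return;
title1 "&tit1";
title2 "&tit2";
proc means data=perm.final;
   var num;
run;
%mend;
%run_means;
```

Setting &SYSCC in the failure branch lets the parent see a nonzero STATUS, in line with the behavior described in the “Recovering Return Codes” section.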

To be clear, however, testing thus far has described only unit testing in which the batch job is tested in isolation, irrespective of the parent process calling it. Because the Means.sas program is ultimately intended to be run in production as a batch job (as opposed to interactively), integration testing should also be performed that assesses the performance of the batch job in respect to the parent process. Unit testing and integration testing are described and demonstrated in chapter 16, “Testability.”

Batch Exception Handling

The prior-referenced invocations of SYSTASK allude to, yet don't demonstrate, exception handling to validate software success. Because errors can occur inside both the parent and child processes, exception handling must separately address these risks. As with all exception handling routines, business rules first must specify what courses of action to follow when exceptions or runtime errors occur. For example, the following sample business rules prescribe program flow under exceptional circumstances:

  • If an exception occurs in the invocation of the first SYSTASK statement (calling Means.sas), the second SYSTASK statement should not be executed and the parent process should terminate.
  • If an exception occurs in the invocation of the second SYSTASK statement (calling Freq.sas), the first SYSTASK statement should be stopped and the parent process should terminate.
  • If an exception occurs inside the Means module (but its SYSTASK invocation succeeds), the second SYSTASK statement should execute and the parent process should continue.
  • If an exception occurs inside the Freq module (but its SYSTASK invocation succeeds), the first SYSTASK statement should continue to execute and the parent process should continue.

These rules demonstrate a common program flow paradigm in which the parent process acts as an engine and spawns separate child processes (batch jobs) that execute in parallel. Inherent in this program flow complexity, however, is a commensurately complex exception handling framework that must dynamically detect and handle exceptions not only in the parent process but also in each of its children. The sample business rules demonstrate this complexity and also attempt to maximize business value in the face of potential failure. For example, if something goes awry inside the Means.sas program, the Freq.sas program may be unaffected, so it continues to execute. The best practice of maximizing business value despite exceptions or failures is discussed throughout chapter 6, “Robustness.” However, when a more serious exception occurs in the parent process (while invoking the SYSTASK statement itself), the parent process is terminated.

The following parent process implements the sample business rules and represents a more reliable and robust solution:

%let perm=c:\perm;
libname perm "&perm";
%global rc_means;
%let rc_means=;
%global rc_freq;
%let rc_freq=;
data perm.final;
   length char $10 num 8;
   do num=1 to 10000000;
      char=put(num,10.);
      output;
      end;
run;
%macro spawn_analytic_modules;
systask command """%sysget(SASROOT)\sas.exe"" -noterminal -sysin ""&perm\means.sas"" -log ""&perm\means.log"" -print ""&perm\means.lst""" status=rc_means taskname=task_means;
%let rc_systask_means=&sysrc;
%if &sysrc^=0 %then %put failed1;
systask command """%sysget(SASROOT)\sas.exe"" -noterminal -sysin ""&perm\freq.sas"" -log ""&perm\freq.log"" -print ""&perm\freq.lst""" status=rc_freq taskname=task_freq;
%put SYSRC: &sysrc;
%if &sysrc^=0 %then %put failed2;
data _null_;
   put 'before sleep';
   call sleep(1,1);
   put 'after sleep';
run;
%put RC_MEANS INSIDE: &rc_means;
%put RC_FREQ INSIDE: &rc_freq;
%if %eval(&rc_means>0) or %eval(&rc_freq>0) %then %do;
   systask list;
   systask kill task_means task_freq;
   %end;
waitfor _all_ task_means task_freq;
%mend;
%spawn_analytic_modules;
%put RC_MEANS OUTSIDE: &rc_means;
%put RC_FREQ OUTSIDE: &rc_freq;
%macro test_analytic_modules;
%if %length(&rc_means)=0 %then %put The MEANS module did not complete;
%else %if &rc_means=1 %then %put A warning occurred in the MEANS module;
%else %if &rc_means>1 %then %put An error occurred in the MEANS module;
%if %length(&rc_freq)=0 %then %put The FREQ module did not complete;
%else %if &rc_freq=1 %then %put A warning occurred in the FREQ module;
%else %if &rc_freq>1 %then %put An error occurred in the FREQ module;
%if &rc_means=0 and &rc_freq=0 %then %do;
   %put Yay! Everything worked!;
   %end;
%mend;
%test_analytic_modules;

The DATA _NULL_ step is required to wait a second so that the return code from the two STATUS options will have had time to be generated. Without this convention, STATUS return codes will always be blank unless they are assessed after a WAITFOR statement, because the STATUS values will not have been generated.

The %TEST_ANALYTIC_MODULES macro prints text descriptions of various warnings and errors that could have occurred. In actual production software, this logic would dynamically alter program flow rather than printing less-than-useful messages (i.e., exception reports) to the SAS log. The benefits of exception handling over exception reporting are discussed in the “Exception Handling, Not Exception Reporting!” section in chapter 6, “Robustness.”
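As one possible way to alter program flow rather than report on it, the sketch below applies the first two sample business rules directly to &SYSRC. It assumes, as the listing above presumes, that &SYSRC is nonzero when a SYSTASK statement itself fails; the macro name %SPAWN_WITH_RULES is hypothetical:

```sas
%let perm=c:\perm;
%global rc_means rc_freq;
%let rc_means=;
%let rc_freq=;
%macro spawn_with_rules;
systask command """%sysget(SASROOT)\sas.exe"" -noterminal -sysin ""&perm\means.sas"" -log ""&perm\means.log"" -print ""&perm\means.lst""" status=rc_means taskname=task_means;
%if &sysrc ne 0 %then %do;
   /* rule 1: MEANS could not be launched, so FREQ is never started */
   %put ERROR: SYSTASK invocation of MEANS failed;
   %return;
   %end;
systask command """%sysget(SASROOT)\sas.exe"" -noterminal -sysin ""&perm\freq.sas"" -log ""&perm\freq.log"" -print ""&perm\freq.lst""" status=rc_freq taskname=task_freq;
%if &sysrc ne 0 %then %do;
   /* rule 2: FREQ could not be launched, so the running MEANS task is stopped */
   %put ERROR: SYSTASK invocation of FREQ failed;
   systask kill task_means;
   %return;
   %end;
/* rules 3 and 4: both children were launched, so each runs to completion */
waitfor _all_ task_means task_freq;
%mend;
%spawn_with_rules;
%put RC_MEANS: &rc_means;
%put RC_FREQ: &rc_freq;
```

Returning early from the macro stands in for terminating the parent; production code would likely also record the failure so that downstream steps can be skipped.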

Although not demonstrated, the WAITFOR statement TIMEOUT option is useful in exception handling. TIMEOUT limits the number of seconds that the parent process waits for a SYSTASK job, so a job whose execution time exceeds a parameterized threshold can be detected and then terminated (along with its associated SAS session and SAS program) with SYSTASK KILL. The TIMEOUT option can thus be used to initiate a fail-safe path that ends rogue or runaway batch jobs that have exceeded execution time thresholds, preserving the integrity (and control) of the parent process. TIMEOUT is demonstrated in the “Euthanizing Programs” section in chapter 11, “Security.”
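A minimal sketch of the idea, assuming a STATUS variable that remains empty until the child has ended (as shown earlier) and a hypothetical threshold of 600 seconds; the %WAIT_WITH_TIMEOUT macro is introduced only for illustration:

```sas
%let perm=c:\perm;
%global rc_means;
%let rc_means=;
systask command """%sysget(SASROOT)\sas.exe"" -noterminal -sysin ""&perm\means.sas"" -log ""&perm\means.log"" -print ""&perm\means.lst""" status=rc_means taskname=task_means;

%macro wait_with_timeout(seconds=600);
/* the parent resumes when the child ends or when the time limit elapses */
waitfor _all_ task_means timeout=&seconds;
%if %length(&rc_means)=0 %then %do;
   /* no STATUS value yet, so the child is presumed to still be running */
   %put WARNING: MEANS exceeded &seconds seconds and will be terminated;
   systask kill task_means;
   %end;
%mend;
%wait_with_timeout(seconds=600);
```

The explicit SYSTASK KILL is included so that the sketch does not depend on how a given SAS release treats tasks that outlive the timeout.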

Starting Batch from Enterprise Guide

Batch jobs can be spawned by Enterprise Guide projects just as easily as from SAS programs. To demonstrate this, open the Enterprise Guide environment, drag the SAS program C:\perm\etl_engine.sas into the Process Flow window, and run the process. In many environments, this will execute identically to running the program in the SAS application. In other environments, based on local policy set by SAS administrators, the SYSTASK statement may be prohibited because it shells directly to the OS. SAS practitioners should consult their SAS administrator if failures occur during attempted SYSTASK invocation within the SAS Display Manager or SAS Enterprise Guide.

STARTING IN BATCH MODE

Throughout this text, batch jobs represent the SAS programs (child processes) that execute in batch mode. As demonstrated in “Starting in Interactive Mode,” batch jobs can be spawned through the SYSTASK statement within software executing interactively. While only one batch mode exists, various paths lead to it. Several methods to spawn batch jobs are discussed in the SAS® 9.4 Companion for Windows, Fourth Edition, including:6

  • From SAS software executing interactively, with the SYSTASK or X statements
  • From SAS software executing in batch mode, with the SYSTASK or X statements
  • From the command prompt, by invoking SAS.exe and referencing the specific SAS program to be run with the SYSIN parameter
  • From a saved batch file, by double-clicking the icon, typing the file name at the command prompt, or by otherwise executing it
  • From the OS, by right-clicking a SAS program file (or icon) and selecting “Batch Submit with SAS 9.4”

Thus far, only the first method of spawning batch jobs has been demonstrated. This list also does not include additional methods that can run batch jobs in SAS Enterprise Guide or SAS Studio. The tremendous advantage of the first two methods—initiating batch jobs with SYSTASK—is the ability to incorporate information and macro variables from the parent process and pass them via the SYSPARM parameter to child processes. Another advantage is the ability of SYSTASK statements to be dynamically generated and invoked. For example, as demonstrated in the “Salting Your Engine” section in chapter 11, “Security,” a parameterized number of SYSTASK statements can be dynamically invoked through a macro loop that iteratively calls SYSTASK. While the SYSTASK statement can spawn batch jobs from both Windows and UNIX environments, some installations of SAS Enterprise Guide locally restrict use of SYSTASK and X statements for security reasons, eliminating the ability to run batch jobs by invoking SYSTASK.
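Both advantages can be sketched together: a macro loop that spawns a parameterized number of Means batch jobs and builds each SYSPARM string from values known to the parent. The %SPAWN_MEANS macro, its N parameter, and the title text are hypothetical, and each job receives uniquely named log, output, STATUS, and TASKNAME values to avoid the collisions described earlier:

```sas
%let perm=c:\perm;
%macro spawn_means(n=2);
%local i tasks;
%do i=1 %to &n;
   /* STATUS variables are declared global so they survive the macro */
   %global rc_means&i;
   %let rc_means&i=;
   systask command """%sysget(SASROOT)\sas.exe"" -noterminal -sysin ""&perm\means.sas"" -log ""&perm\means&i..log"" -print ""&perm\means&i..lst"" -sysparm ""tit1=Run &i, tit2=Spawned by parent""" status=rc_means&i taskname=task_means&i;
   %let tasks=&tasks task_means&i;
   %end;
/* wait for every spawned task before reading any return code */
waitfor _all_ &tasks;
%do i=1 %to &n;
   %put RC_MEANS&i: &&rc_means&i;
   %end;
%mend;
%spawn_means(n=2);
```

Because the SYSPARM string is assembled inside double quotation marks, any macro variable available to the parent resolves before the command is handed to the OS.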

The remaining three methods of spawning batch jobs each initiate a SAS session from scratch—that is, directly from the OS rather than from within an open SAS session. Thus, the primary limitation of these methods is the inability to pass macro variables and other information dynamically from the parent process to the child. A secondary limitation is the inability to execute SYSTASK statements dynamically, for example, by invoking a parameterized number of SYSTASK statements in a macro loop. However, parameters can still be passed to batch jobs via the SYSPARM option, which is demonstrated in the “Passing Parameters with SYSPARM” section. The principal advantage of initiating SAS batch jobs directly from the OS is the ability to schedule execution through the OS, eliminating the need for SAS practitioners to be present when production software is executed or to babysit it while it runs.

While syntax differences between batch job spawning methods exist, these are minor, and the capabilities of each are similar. Each method provides a way to invoke batch jobs in series or parallel, can halt a spawned job until other processes have completed, and can generate return codes that can be utilized to facilitate quality assurance through an exception handling framework. Notwithstanding these largely equivalent capabilities, the extent to which program flow business logic can be included within SAS software (as opposed to command syntax found in batch files) will add versatility to program flow and exception handling paths.

Starting Batch from Command Prompt

Batch mode is always invoked when SAS is executed from the command prompt by calling SAS.exe. In the interactive mode, the %SYSGET(SASROOT) macro function supplies the location of the SAS executable file, but when launching from the OS, it must be supplied manually. In SAS 9.4 for Windows, the default location for the SAS executable follows:

C:\Program Files\SASHome\SASFoundation\9.4\sas.exe

Because “Program Files” contains a space, quotes must surround the command when invoked. If no other options or parameters follow the command, the SAS application opens in interactive mode. Specifying the NOTERMINAL option enables batch mode, causing the SAS Display Manager not to open and modifying the &SYSENV value from FORE (foreground) to BACK (background). Additional useful options include NOSPLASH, NOSTATUSWIN, and NOICON, which can prevent windows from opening unnecessarily while in batch mode.

Other command parameters are identical to those discussed throughout the previous “Starting in Interactive Mode” sections. SYSIN references the SAS program being executed, LOG its log file, and PRINT its output file. In interactive mode, each of these parameters could be generated through macro code, such as dynamically referencing the folder or program name. When launching batch jobs directly from the OS, however, these attributes each must be hardcoded, as demonstrated in the following command statement, which launches the Means.sas program from the command prompt:

"C:\Program Files\SASHome\SASFoundation\9.4\sas.exe" -noterminal -sysin c:\perm\means.sas -log c:\perm\means.log -print c:\perm\means.lst

This batch job invocation from the OS does not differ from launching the batch job from within the SAS application (via SYSTASK), as demonstrated throughout the “Starting in Interactive Mode” sections. The main differences are the lack of STATUS and TASKNAME options used by the SYSTASK statement itself to control program flow and enable exception handling. However, because Means.sas is now spawned directly from the OS, when it completes, program control returns to the OS. Thus, if warnings or runtime errors are encountered in the batch job, because no SAS engine is driving program control, no exception handling framework exists to validate program success or detect its failure. For this reason, to the extent possible, program flow should be initiated by the OS but controlled by SAS software.

Had the Means.sas program failed with a warning or runtime error, this would have been communicated to the log file Means.log, although no analysis of this file would have occurred. Because one of the primary responsibilities of automated software is automatic detection and handling of exceptions and runtime errors, while the previous command line invocation does spawn a batch process, it cannot truly be said to be automated. A more reliable method of invoking a batch job from the command line entails executing a parent process that can act as an engine by spawning, controlling, and validating subsequent child batch processes. The following command line statement runs the parent process ETL_engine.sas (which in turn invokes the Means.sas batch process via SYSTASK) from the OS:

"C:\Program Files\SASHome\SASFoundation\9.4\sas.exe" -noterminal -sysin c:\perm\etl_engine.sas -log c:\perm\etl_engine.log -print c:\perm\etl_engine.lst

The parent process (ETL_engine.sas) is a much better candidate to be invoked as a batch job directly from the OS because it contains exception handling routines that monitor not only its own exceptions but also exceptions generated by its child processes. This method also enables children to be terminated if necessary when they exceed execution time thresholds, for example, by pairing the WAITFOR statement TIMEOUT option with SYSTASK KILL.
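When a batch job has been launched without such a parent, its log can still be analyzed after the fact. The following sketch counts lines in Means.log that begin with ERROR or WARNING; the macro variable names are hypothetical, and the approach assumes that log messages of interest start in the first column:

```sas
%let log_errors=0;
%let log_warnings=0;
data _null_;
   infile 'c:\perm\means.log' truncover end=eof;
   input line $char256.;
   /* count log lines that begin with ERROR or WARNING */
   if line=:'ERROR' then errors+1;
   else if line=:'WARNING' then warnings+1;
   if eof then do;
      call symputx('log_errors',errors,'g');
      call symputx('log_warnings',warnings,'g');
      end;
run;
%put Errors found: &log_errors;
%put Warnings found: &log_warnings;
```

A count of zero is meaningful only if the log file exists and was read, so production code would also confirm that the DATA step itself completed without error.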

Starting Batch from a Batch File

Within this text, batch jobs represent SAS programs that are run in batch mode, regardless of whether they are invoked interactively through SAS software, through SAS software running in batch mode, or directly from the OS. Batch files, on the other hand, represent text files that contain command line syntax interpreted by the OS environment, not by the SAS application. Thus, while batch jobs are always SAS programs, batch files are never SAS programs but can invoke the SAS application via command line statements to run SAS programs. Batch files typically have the .bat file extension.

Starting a batch job from a batch file can overcome some of the limitations that occur when batch jobs are initiated directly from the command prompt. For example, because multiple child processes can be spawned from a single batch file, a batch file can effect rudimentary control over its children, such as by halting batch file execution until child processes have completed. Batch files can also prescribe whether child processes execute in series (synchronously) or in parallel (asynchronously).

The downside of launching batch jobs from a batch file, like launching batch jobs from the command prompt, is the lack of dynamic program control and robust exception handling that are available within SAS software. The following batch file spawns both the Means and Freq modules and should be saved as C:\perm\etl_engine.bat:

start "job1" "c:\program files\sashome\sasfoundation\9.4\sas.exe" -noterminal -sysin c:\perm\means.sas -log c:\perm\means_batch.log -print c:\perm\means_batch.lst
start "job2" "c:\program files\sashome\sasfoundation\9.4\sas.exe" -noterminal -sysin c:\perm\freq.sas -log c:\perm\freq_batch.log -print c:\perm\freq_batch.lst

This batch file can be executed in several ways. The file name (including .bat extension) can be typed at the command prompt, or, from a Windows environment, the batch file can be double-clicked or right-clicked with “Run as administrator” selected. The START command signals that the batch jobs should be run asynchronously; that is, the Freq module does not wait for the Means module to complete. Because START interprets its first quoted argument as a window title, a title must be supplied whenever the command path is itself quoted; thus the first batch job is titled Job1 and the second Job2.

To instead run these processes in series (i.e., synchronously), the following code should be saved to the batch file C:\perm\etl_engine.bat and executed:

"c:\program files\sashome\sasfoundation\9.4\sas.exe" -noterminal -sysin c:\perm\means.sas -log c:\perm\means_batch.log -print c:\perm\means_batch.lst
"c:\program files\sashome\sasfoundation\9.4\sas.exe" -noterminal -sysin c:\perm\freq.sas -log c:\perm\freq_batch.log -print c:\perm\freq_batch.lst

In this revised example, the Freq.sas program now executes only after Means.sas completes. Additional program control can be implemented with the WAIT option that can be specified after START, but the level of sophistication still does not begin to approximate what can be accomplished when the parent process is a SAS program rather than a batch file.
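The WAIT option can be sketched as follows. This is a minimal illustration rather than a recommended design; it assumes the same SAS installation path and C:\perm files used in the listings above, and the SAS variable and the final ECHO message are added purely for illustration.

```batch
@echo off
rem Sketch only: the SAS path assumes a default SAS 9.4 installation.
set "SAS=C:\Program Files\SASHome\SASFoundation\9.4\sas.exe"

rem Job1 is spawned asynchronously; the batch file continues immediately.
start "job1" "%SAS%" -noterminal -sysin C:\perm\means.sas -log C:\perm\means_batch.log -print C:\perm\means_batch.lst

rem /WAIT pauses the batch file here until Job2 terminates.
start "job2" /wait "%SAS%" -noterminal -sysin C:\perm\freq.sas -log C:\perm\freq_batch.log -print C:\perm\freq_batch.lst

rem Reached only after Job2 ends; Job1 may still be running.
echo Freq module finished.
```

Note that the batch file learns only that Job2 has ended, not whether Job1 has completed or whether either job succeeded, which illustrates how rudimentary this control is.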

To further illustrate the inherent limitations in batch files, consider the business rules prescribed in the earlier “Batch Exception Handling” section. These rules require handling of exceptions generated not only in the parent process but also in child processes. Yet none of these business rules are operationalized in the previous batch file invocation, again because the parent process is the OS itself (now a batch file and not the command prompt) rather than SAS software. To eliminate these limitations and deliver a solution that maximizes the benefits of both the OS environment and SAS software, the following batch file demonstrates the best practice of invoking a SAS parent process from a batch file and should be saved as C:\perm\etl_engine.bat:

"c:\program files\sashome\sasfoundation\9.4\sas.exe" -noterminal -sysin c:\perm\etl_engine.sas -log c:\perm\etl_engine.log -print c:\perm\etl_engine.lst

In this final solution, the parent process ETL_engine.sas is invoked directly from the OS via the batch file. This enables the batch file to be scheduled through the OS so it can be executed as necessary without further human interaction. However, the dynamic program flow inside ETL_engine.sas is preserved, ensuring that prescribed business rules are followed. Thus, as the respective child processes execute, they can do so dynamically from within the parent process, ensuring that success is validated and runtime errors are identified and handled. This parent–child paradigm is arguably the most commonly implemented method to achieve robust, reliable automation in SAS software.
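Although exception handling belongs inside ETL_engine.sas, the batch file can still record how the parent process ended. The following sketch assumes the SAS exit-code convention under Windows (0 for success, 1 for warnings, 2 or higher for errors or aborts), which should be confirmed against the SAS documentation for the installed release; the status file C:\perm\etl_engine_status.txt is a hypothetical name.

```batch
@echo off
set "SAS=C:\Program Files\SASHome\SASFoundation\9.4\sas.exe"
set "STATUS=C:\perm\etl_engine_status.txt"

"%SAS%" -noterminal -sysin C:\perm\etl_engine.sas -log C:\perm\etl_engine.log -print C:\perm\etl_engine.lst
rem Capture the exit code immediately, before another command resets it.
set "RC=%ERRORLEVEL%"

rem Assumption: 0 = success, 1 = warnings, 2 or higher = errors or abort.
if %RC% GEQ 2 (
    >>"%STATUS%" echo %DATE% %TIME% ETL_engine failed with exit code %RC%
    exit /b %RC%
)
>>"%STATUS%" echo %DATE% %TIME% ETL_engine ended with exit code %RC%
exit /b %RC%
```

The redirection is written before ECHO so that a trailing digit in the message is not misread as a file handle. Passing the exit code back through EXIT /B lets the OS scheduler see whether the run succeeded.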

AUTOMATION IN THE SDLC

If the decision to automate is made during software planning, this will tremendously influence software design and development. SAS practitioners will be able to understand whether a central engine or parent process is necessary to drive program flow and what parameters, options, or other information will need to be passed to child processes via SYSPARM. Because substantial performance gains can sometimes be achieved through executing child processes as parallel batch jobs, this performance may spur the decision to modularize software into discrete chunks that can be spawned as concurrent batch jobs.
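As a brief sketch of passing information to a child process, the SYSPARM system option can be supplied on the command line and read inside the SAS program through the &SYSPARM automatic macro variable. The value dsn=perm.final below is a hypothetical example, and how the child parses it is left to the program's design.

```batch
@echo off
set "SAS=C:\Program Files\SASHome\SASFoundation\9.4\sas.exe"

rem The quoted string after -sysparm becomes the value of &SYSPARM in means.sas.
rem "dsn=perm.final" is a made-up value used only for illustration.
"%SAS%" -noterminal -sysin C:\perm\means.sas -sysparm "dsn=perm.final" -log C:\perm\means_batch.log -print C:\perm\means_batch.lst
```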

In other cases, the decision to automate software is made after the software is already in production. This decision is sometimes made due to unexpected success of software that has become increasingly critical or valuable over time and thus requires commensurate reliability and availability. To achieve this increased performance, including automated exception handling and validation of process success, business rules should prescribe program flow under normal and exceptional conditions and can be implemented via a robust exception handling framework. While it is always easier to design and develop software initially with automation in mind, the transition is commonly made, and in some cases will require only a few lines of extra code to implement.

Requiring Automation

Regardless of when the decision is made to automate SAS software, automation requirements should clearly state the intent and degree of automation. The intent is essentially the business case for what value should be gained through automation, which should surpass the cost of additional code complexity inherent in automation. For example, one intent might state that SYSTASK is being implemented so that parallel processes can be asynchronously spawned to facilitate faster parallel processing. This differs substantially from another common intent that aims to create batch jobs and batch files that can be scheduled to run recurrently on the SAS server. The first intent inherently requires modularization of software and critical path analysis, while the second intent could be achieved with a one-line batch file that invokes a SAS program from the OS.

The intent in the first example aims to increase software performance by reducing execution time, a metric that can be measured against pre-automation baselines (if those exist) and contrasted against the effort involved in automating the process. In the second example, the intent aims instead to decrease personnel workload during O&M activities, so these decreased operational hours can be measured against the development hours expended to automate and test the software. As the frequency with which software is intended to be run increases, the relative benefit gained from automation increases because the cumulative effects of greater software performance (or decreased personnel workload to support software operation) are multiplied with every execution. In other words, the more often stable software is executed, the more valuable automation of that software becomes.

The degree of automation should also accompany automation intent. For example, if child processes are being automated with SYSTASK so that they can be run in parallel, this doesn't necessitate that the parent process must also be automated. Thus, the SAS application might still have to be started manually, the parent process opened and executed, and the various parent and child logs reviewed for runtime errors. While somewhat inefficient, the software would still benefit from the limited automation of its child processes, despite not being a fully automated solution. Another common partial automation paradigm involves full automation of software execution, but no (or limited) automation of exception handling. Thus, software might be scheduled through the OS and run automatically each morning at 6 AM before SAS practitioners have arrived at work, yet still require someone to review SAS logs manually later in the day.
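The 6 AM schedule described above could be registered with the Windows SCHTASKS command, as in the following sketch. The task name ETL_engine_daily is hypothetical, and the commands may need to be run from an elevated command prompt depending on local security policy.

```batch
rem Register a daily task that runs the batch file at 06:00 (24-hour clock).
schtasks /create /tn "ETL_engine_daily" /tr "C:\perm\etl_engine.bat" /sc daily /st 06:00

rem Display the registered task to confirm the schedule.
schtasks /query /tn "ETL_engine_daily"
```

This automates execution only; reviewing the resulting SAS logs remains a manual step unless further automation is added.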

In other cases, automation truly denotes the total lack of human interaction during recurrent software operation. The OS environment executes a batch file at scheduled intervals, which in turn runs a SAS program. That program may in turn execute subsequent batch children either in series or parallel before terminating. Exception handling detects exceptions, warnings, and runtime errors during execution—in both parent and child processes—and a separate process may validate the various SAS logs after execution to provide additional assurance that execution was successful. Because stakeholders can pick and choose which aspects of software to automate and which to perform manually, and because they can increase or decrease automation levels throughout a software's lifespan, automation will differ vastly between organizations and specific software products.
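A separate log-validation step of the kind mentioned above might be sketched as follows. It assumes that all relevant logs reside in C:\perm and that problems appear as log lines beginning with ERROR or WARNING; a production check would likely also look for specific notes, such as uninitialized variables.

```batch
@echo off
set "FAILED=0"

rem Scan every log for lines that begin with ERROR or WARNING.
for %%F in ("C:\perm\*.log") do (
    findstr /r /c:"^ERROR" /c:"^WARNING" "%%F" >nul && (
        echo Review needed: %%F
        set "FAILED=1"
    )
)

rem Exit code 1 signals that at least one log requires review.
exit /b %FAILED%
```

Saved as its own batch file (for example, a hypothetical C:\perm\check_logs.bat), this could be scheduled to run after the ETL process so that a nonzero exit code flags the run for attention.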

Measuring Automation

Assessing automation performance depends on the degree to which software was intended to be automated. One of the most telling metrics of software automation is the degree to which exception handling and failure communication are automated. A first grader could schedule a batch file that runs a SAS program recurrently. However, if automation only automates execution and does nothing to automate quality controls and process validation, automation can actually make software less reliable when SAS log files are overwritten or ignored, or when scheduled tasks execute despite failures of prerequisite tasks. Therefore, while automation does not necessarily imply that software should be more reliable or robust than manually executed software, automation should not make software less reliable or robust by masking exceptions, runtime errors, and possible invalid data products.
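One way to keep scheduled runs from overwriting earlier logs is to place a timestamp in the log file name. The sketch below obtains the timestamp from PowerShell, which is assumed to be available on the server; the STAMP variable and the file naming pattern are illustrative choices only.

```batch
@echo off
set "SAS=C:\Program Files\SASHome\SASFoundation\9.4\sas.exe"

rem Build a sortable timestamp such as 20240131_060000 without relying on locale settings.
for /f %%i in ('powershell -NoProfile -Command "Get-Date -Format yyyyMMdd_HHmmss"') do set "STAMP=%%i"

rem Each run writes its own log and output file instead of replacing the previous ones.
"%SAS%" -noterminal -sysin C:\perm\etl_engine.sas -log "C:\perm\etl_engine_%STAMP%.log" -print "C:\perm\etl_engine_%STAMP%.lst"
```

Because logs now accumulate, some retention policy for older files would also be needed.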

When scheduled jobs are executed in short succession, a performance failure might occur if one instance of the software takes too long to run, causing functional failures when the next scheduled run—quick on the heels of the first—executes before the first has completed. In fully automated software, robust processes should predict and appropriately handle these common exceptions through business rules and dynamic exception handling. In many cases, the performance of fully automated software can be assessed by evaluating software reliability and availability metrics, given that software failure often incurs human interaction to recover and restart software. Reliability and availability are discussed throughout chapter 4, “Reliability.”
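The overlapping-run scenario can be guarded against, at least crudely, with a lock file, as in the following sketch. The lock file name is hypothetical, the existence check is not atomic, and a crash would leave a stale lock behind, so this is an illustration of the idea rather than a robust solution; more rigorous handling belongs inside the SAS parent process.

```batch
@echo off
set "SAS=C:\Program Files\SASHome\SASFoundation\9.4\sas.exe"
set "LOCK=C:\perm\etl_engine.lock"

rem If an earlier run still holds the lock, skip this run rather than overlap it.
if exist "%LOCK%" (
    echo Previous run still in progress. Skipping this run.
    exit /b 1
)

rem Create the lock, run the parent process, then release the lock.
>"%LOCK%" echo started %DATE% %TIME%
"%SAS%" -noterminal -sysin C:\perm\etl_engine.sas -log C:\perm\etl_engine.log -print C:\perm\etl_engine.lst
set "RC=%ERRORLEVEL%"
del "%LOCK%"
exit /b %RC%
```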

Recoverability metrics, described in chapter 5, “Recoverability,” can also be utilized to measure the degree and success of automation. The TEACH mnemonic states that recovery should be autonomous: recovery should not only be automated to the extent possible but also facilitate intelligent recovery to an efficient checkpoint that maximizes performance. Thus, while some software will be expected to run without human interaction, SAS practitioners may be expected to play a large role in software recovery from failure. In other cases, even software failure detection and recovery should be automated (to the extent possible), and automation additionally will be measured by the success and speed with which software recovers from failure without human intervention.

WHAT'S NEXT?

Software automation is typically the last step in hardening production software, so this chapter concludes Part I, which describes dynamic performance requirements that can be observed and measured during software execution. What occurs beneath the software covers, however, can be just as relevant to software quality as dynamic characteristics. While not directly observable during execution, static performance requirements aim to ensure that code can be easily maintained and, through modular software design, that code is organized sufficiently to benefit dynamic performance techniques such as parallel processing. The next six chapters demonstrate static performance requirements that should be incorporated to improve the maintainability of production software.

