Chapter 6
Robustness

More coca, please.

Off-roading at 16,000 feet across the Bolivian Altiplano (high plain), your head is literally in the clouds. Mine was also figuratively in the clouds as I munched on a Diamox and coca candy cocktail, trying to stave off altitude sickness while taking in the extraterrestrial landscape.

While my head wasn't prepared for the three-day trek through the heavens, my guides fortunately had thought of everything. Those old black-and-white photos of “Okies” escaping the Dust Bowl, setting out with high hopes and humble means on Route 66 for a new life in California, their Model Ts overburdened, all their earthly wares tied haphazardly to anything that would hold a rope—our two Land Cruisers looked no different as we departed from Uyuni, Bolivia, overflowing with bags and backpacks, and crowned with tires and tarps. The only discernible difference was that we were driving into our Dust Bowl.

Carving trails into the famed Salar de Uyuni (at more than 4,000 square miles, the largest salt flats in the world), we raced across in blinding sunlight that reflected off the brilliant white landscape. You touched the brim on your polarized sunglasses every couple minutes just to prove to yourself they were still in place despite the necessity to squint. I had trekked to Badwater Basin, Death Valley, to visit the California salt flats and lowest point in North America—yet Bolivian brine was like nothing I had ever seen. You can succumb to sunburn in minutes here, so when we disembarked for an hour to go play in the salt, our guides fortunately provided sunscreen for the tourists who had neglected to bring any.

The caravan continued across the salt flats, crushing the hexagonal plates of salt that burgeoned forth beneath our tires. The next destination was Isla Incahuasi, a rugged, rocky “island” outpost that perforates the otherwise bleak incandescence. For the guys—all over 6 feet and crammed into the Beverly Hillbillies roadster—it offered a chance to unfold our bodies and, for our lone female companion, a baño (bathroom) provided a welcome relief.

Seconds after she had darted into the primitive cobblestone structure, she raced back to our SUVs, her look of utter dejection turning to elation as our guide's outstretched arm held toilet paper. They had fortunately thought of that, too.

By the appearance of the overladen Land Cruisers, you'd have thought we were pitching tents on the Salar, but no, we were booked into a hotel constructed entirely from salt blocks perched somewhere in the mountains. However, getting there would prove to be the true adventure.

As we continued on, the salt dissipated, giving way to mud, then sand, then gravel, then outright rocks, as driving conditions continued to deteriorate. The first flat tire—exposing the more infamous meaning of the salt flats—came in the afternoon. Our guides were able to repair it quickly with an air compressor they had fortunately brought.

As we departed the basin and the terrain turned more mountainous, several other flats would follow which were met with more compressor sessions, aerosol tire sealant, and finally full replacement, all of which were adroitly handled by the guides. We made it to our briny, barebones hotel by dusk to dine sumptuously on a salt table and sleep on salt beds.

Day two was a trek through the Reserva Nacional de Fauna Andina Eduardo Abaroa, a federal nature preserve of untold beauty and often austerity. We chased flamingos, jumped through sulfuric clouds of off-gassing geysers, relaxed with cold beers in geothermal hot springs, and of course made countless stops for recurring flat tires endemic to the rough terrain.

At one point, having limped as far as we could on a bandaged spare, the guides pulled into a remote service station, where the tire was professionally patched—of course by a crew with whom the guides were fortunately besties.

But throughout the adventure, at every turn the guides had our backs—and fortunately an endless supply of coca leaves to chew to stave off the ever encroaching altitude sickness.

I've intentionally overused “fortunately” throughout the scenario to illustrate that fortune in fact had nothing to do with our successful trek. The guides were pros. They'd taken this exact route hundreds of times. They knew the terrain. They knew their vehicles. They knew their equipment. The only variability seemed to be the occasional curveball that a tourist would throw, but after so many expeditions, even that variability was subtle and predictable.

Robust software should be equally equipped to detect and handle predicted threats and vulnerabilities. You don't go off-roading in the wilderness for days without a spare tire (or two) because you know the terrain is unforgiving and flat tires can occur. You similarly don't assume that a data set will be available when you try to run a MEANS procedure, because you can predict that it could be missing, locked, or possibly corrupted.

Over time, additional threats or vulnerabilities may be identified that weren't expected at the outset. The guides initially hadn't carried sunscreen but, after a couple early trips in which trekkers got horribly sunburned on the flats, the safety in a bottle became a small measure they could take to avoid a huge risk. Software exception handling routines similarly can be modified over time to address new or unpredicted risks to facilitate smooth performance.

The trek also demonstrated the flexibility that should exist in a risk management framework. When our tire initially went flat, they pumped it immediately. When leaks worsened, they applied tire sealant. When we'd actually found a patch of level ground, they took the opportunity to more closely examine and often exchange failing tires. Finally, the desert mechanics patched a tire when we'd nearly exhausted our nine lives.

SAS practitioners should also understand the full array of tools and techniques that can eliminate or mitigate risk to software. Exception handling aims not only to identify threats but also to enable software to continue despite encountering threats, thus maximizing business value.

The guides never had to use it, but they had one final Hail Mary in their pockets in the event of extreme injury, sickness, or other calamity—a satellite phone charging in the glovebox. Even the most robust software will encounter exceptions, errors, or environmental conditions that cannot be overcome and which signal imminent failure. The Hail Mary in software development should be a fail-safe path that facilitates graceful termination.

We never had to use the sat phone, but its presence was reassuring in case we ever ran out of tires…or coca.

DEFINING ROBUSTNESS

Robustness is “the degree to which a system or component can function correctly in the presence of invalid inputs or stressful environmental conditions.”1 This is distinguished from fault tolerance, defined as “the degree to which the software product can maintain a specified level of performance in cases of software faults or of infringement of its specified interface.”2 The terms are often commingled even in technical discussions, but robustness speaks to overcoming variability external to software, whereas fault-tolerance aims to overcome unwanted (and sometimes unpredictable) variability within software.

The main objective of robust and fault-tolerant software is to detect and overcome variability so that business value can be delivered. A secondary objective, however, is to enable software to fail safe when software execution cannot continue. Fail safe is defined as “pertaining to a system or component that automatically places itself in a safe operating mode in the event of a failure.”3 Failing safe is colloquially known as failing gracefully or graceful termination, and the fail-safe path is the route that terminates program flow and safely and securely shuts down software when all hope has been lost. In other words, software will do its best to overcome challenges but, when faced with insurmountable obstacles, will exit in a manner that doesn't damage itself, its data products, other output, or other aspects of the environment.

This chapter introduces defensive programming techniques that require the creative wherewithal to imagine potential vulnerabilities in and threats to software, as well as the technical prowess to design solutions that mitigate or eliminate those vulnerabilities or threats before they occur or as they are occurring. The chapter also introduces exception handling, which enables software to detect variability such as exceptions, runtime errors, or other faults and to respond dynamically based on prescribed business rules. In some cases, full business value can still be delivered but, in cases in which only partial business value can be delivered, it is nevertheless done in a planned manner. Through this chapter, SAS practitioners will gain an understanding of how to build robust software that benefits from exception handling and facilitates more reliable software performance.

ROBUSTNESS TOWARD RELIABILITY

The journey toward robustness is ultimately one toward reliability. The intent of equipping your Land Cruiser with off-road shocks and all-terrain tires is not to impress the neighbors, but ultimately to get somewhere reliably despite the adversity or obstacles you may face along the way. Making software more robust does not remove the variability that reliability loathes, but instead improves the adaptability of software by allowing it to deliver consistent performance despite inconsistent inputs or environmental elements.

Consider the example of SAS extract-transform-load (ETL) software designed to ingest third-party data from an external source. Because SAS practitioners cannot control the quality of third-party data input, they must validate data quality when data are ingested. Thus, robust ETL software is able to continue executing even if some values, observations, or data sets must be deleted or modified because they are missing, duplicate, corrupt, or otherwise invalid. For example, if 19 data sets were received correctly, but one was corrupt, robust ETL software should be able to process the valid data when ingestion can still deliver partial business value, thus maximizing reliability to the extent possible.
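The partial-delivery idea in the ETL example can be sketched outside of SAS. The following Python sketch is illustrative only: the validate rule, the IngestResult class, and the data set names are all hypothetical, and a real ingestion process would apply whatever quality rules its requirements specify. It shows one way to load the 19 valid data sets while recording, rather than failing on, the corrupt one.

```python
from dataclasses import dataclass, field


@dataclass
class IngestResult:
    """Outcome of one ingestion run: what loaded and what was rejected."""
    loaded: dict = field(default_factory=dict)
    rejected: dict = field(default_factory=dict)


def validate(rows):
    """Hypothetical quality rule: a data set must be a non-empty list of
    dicts that each carry a non-missing 'id' value."""
    if not isinstance(rows, list) or not rows:
        raise ValueError("data set is empty or unreadable")
    for row in rows:
        if not isinstance(row, dict) or row.get("id") is None:
            raise ValueError("observation is missing its id")


def ingest(datasets):
    """Load every data set that passes validation and record the rest,
    rather than letting one bad input stop the whole run."""
    result = IngestResult()
    for name, rows in datasets.items():
        try:
            validate(rows)
        except ValueError as err:
            result.rejected[name] = str(err)  # keep going with the others
        else:
            result.loaded[name] = rows
    return result


# 19 valid data sets and one corrupt one, mirroring the example in the text
incoming = {f"ds{i:02d}": [{"id": i}] for i in range(1, 20)}
incoming["ds20"] = None  # simulates a corrupt delivery
outcome = ingest(incoming)
print(len(outcome.loaded), "loaded;", len(outcome.rejected), "rejected")
```

Recording the rejected data sets alongside the reason for rejection keeps the partial result honest: downstream processes can see that business value was delivered for only part of the input.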

In another example, software intended to be run in different environments (such as in both SAS Display Manager and the SAS University Edition) will need to be robust to variability in those environments. File naming conventions, file structure, system options, and other functionality do subtly differ between the environments, and the inability to recognize these differences could result in failure. In this sense, the aim of achieving software portability is to ensure that the software is robust enough to function and perform equivalently across SAS environments in which the SAS interface, installation type, or other aspects vary. Thus, software portability, discussed in chapter 10, “Portability,” aims to make software robust and reliable despite specific sources of environmental variability.

While the primary function of robustness is to enable software to continue functioning despite the bumps in the road (and thus deliver full or partial business value), the secondary function is the facilitation of fail-safe termination. In data analytic development, the most important aspect of software reliability is the generation of an accurate solution or data product, without which the software is worthless. Robust software can prevent inaccurate or invalid results by ensuring that software terminates when it encounters an environment or condition that would cause functional failure, just as experienced off-roaders can read the terrain and realize when it's time to turn back toward safer ground. Thus, as a last resort, even the most elaborate exception handling routines should provide a fail-safe path to graceful software termination.

DEFENSIVE PROGRAMMING

Defensive programming is another name for developing robust software. By identifying known threats and vulnerabilities, developers defend against failure by making software more robust and reliable. Defensive programming is greatly facilitated by an awareness of specific threats and vulnerabilities, whether external (such as server outage) or internal (such as unexpectedly locked SAS data sets), that can cause runtime errors. These threats are unexpected in the sense that no one knew the data set would be locked at 10 AM or the server would fail at 3 PM, but they are nevertheless predictable because these exceptions are readily identifiable. Defensive programming assumes the mind-set that every data set is locked or missing until proven otherwise and every SAS function, statement, and procedure has failed until otherwise validated.

Risk identification is the first step in defensive programming. Risks typically include software that terminates early with a failure, or software that completes but with functional or performance failures. Each of these situations reduces or eliminates business value. Thus, the second step is to identify specific vulnerabilities within software (or requirements) that could be exploited to produce a failure. These vulnerabilities comprise the risk register, described in the “Risk Register” section in chapter 1, “Introduction,” and should be used to identify which vulnerabilities should be eliminated or mitigated through exception handling. In software that does not demand high reliability or robustness, the decision is often made to forgo exception handling and accept all risks; however, robust software should incorporate an exception handling framework that identifies and handles exceptions and runtime errors that occur.

Specific Threats

The following SAS code demonstrates the “data-set-run” mentality—the unfortunate reality of how most SAS practitioners are taught to code through instruction, in literature, and by their peers. This straightforward representation may suffice in end-user development environments or when code doesn't need to demonstrate robustness, but it lacks sufficient quality for production environments:

data final;
   set original;
run;

The code creates the Final data set from the Original data set and is the most basic implementation of the DATA step. However, in complex, variable environments, defensive programmers must ask themselves two questions about every line of code:

  • What are the specific threats that could cause this line of code to fail?
  • What occurs if the software fails for any reason at this line of code?

The first question addresses robustness while the second question addresses fault-tolerance and risk. Although the degree to which robustness and fault-tolerance are integrated into software should depend on the required level of software reliability, defensive programming techniques teach developers to eschew error-prone assumptions—except to assume that if it can fail, it will.

Some threats to the previous DATA step include:

  • The Original data set may not exist.
  • The Original data set may exist but be exclusively locked so it cannot be read from.
  • The Original data set may exist but in fact be the wrong data set.
  • The Original data set may exist but be corrupt.
  • The Original data set may exist but be missing data, variables, or observations or otherwise lack completeness or accurate structure.
  • The Original data set may be too large to duplicate in the SAS application.
  • The Original data set may be too large to create in the WORK library.
  • The Final data set may exist but be exclusively locked so it cannot be overwritten.
  • The SAS application may run out of memory processing the DATA step.
  • The DATA step may incur other unspecified warnings or errors.

The first threat (a missing file) is a specific software threat that can exploit a specific software vulnerability—the failure to validate data set existence. If SAS software requires reliable code, but fails to validate data set existence and availability, these vulnerabilities represent errors (human mistakes) as well as code defects that must be remedied to achieve the required level of robustness. When this vulnerability is exploited, the following failure is produced:

data final;
   set original;
ERROR: File WORK.ORIGINAL.DATA does not exist.
run;
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.FINAL may be incomplete.  When this step was stopped there were 0 observations and 0 variables.

While the previous threats are all possible, the risk they pose will vary substantially from one environment to the next and even from one DATA step to the next. For example, the DATA step reads the Original data set from the WORK library, which normally is inaccessible to other users and processes. Thus, while it is technically possible on some networks to manually assign a library reference to another SAS practitioner's temporary WORK library, in general, this constitutes an extremely low risk; thus data sets in the WORK library do not need to be protected against the risk of inaccessibility.

However, the following modified DATA step is inherently riskier than the previous because it references a shared library PERM that could be accessed by other developers; therefore, testing both the availability and the file lock status of PERM.Original before the DATA step would be more valuable than in the prior process.

data final;
   set perm.original;
run;

Separate threats do not necessarily require separate remedies. For example, to test for data set existence, the EXIST function (often operationalized with the %SYSFUNC macro function) is commonly used. The following code now first determines whether PERM.Original exists before continuing:

libname perm 'c:\perm';
%macro doit();
%if %sysfunc(exist(perm.original)) %then %do;
   data final;
      set perm.original;
   run;
   %end;
%mend;
%doit;

An unrelated threat occurs when the PERM.Original data set is exclusively locked and the SET statement fails. This vulnerability can be eliminated by first testing the lock status of the data set with the OPEN function, but as it turns out, the OPEN function will also fail if the data set is missing. Thus, the following code ensures that the data set is both existent and accessible before the DATA step execution, effectively killing two threats with one stone:

%macro doit();
%local dsid close;
%let dsid=%sysfunc(open(perm.original));
%if %eval(&dsid>0) %then %do;
   data final;
      set perm.original;
   run;
   %let close=%sysfunc(close(&dsid));
   %end;
%mend;
%doit;

This code now defends against the first two enumerated threats, lack of existence and accessibility. However, in defending against failure, SAS practitioners must always remember to ask the second, general question: What happens if my code fails here? In other words, if an unknown exception or runtime error occurs (possibly due to a latent defect) on line 5, what will happen? Will the program continue in a safe, predictable manner? Or, if the power fails while line 5 is being processed, what will occur, and will data products or the environment be damaged in the process, either during failure or resultant recovery?

As demonstrated throughout later sections, exception handling can make software much more reliable and robust. However, it can also introduce additional, unintended errors if done improperly or incompletely, providing SAS practitioners and other stakeholders with a false sense of security. Some vulnerabilities can be introduced through exception handling routines and cannot be overcome, but SAS practitioners should at least be made aware of them. For example, the following code represents a slightly more complex implementation of the %DOIT macro, which contains an error (and latent defect) immediately after the DATA step.

%macro doit();
%local dsid;
%let i=1;
%let dsid=%sysfunc(open(perm.original));
%if %eval(&dsid>0) %then %do;
   data final;
      set perm.original;
   run;
   %if &i=1 %then growl; *defect;
   %let close=%sysfunc(close(&dsid));
   %end;
%mend;
%doit;

In this example, the macro variable &I simulates a loop or some other variable construct that was not thoroughly tested. Thus, when &I is not equal to 1, no runtime error occurs and the defect goes undetected, but when &I is 1, the SAS application doesn't know how to “growl” and fails. More important to the exception handling that has been implemented, the failure to growl also prevents the CLOSE function from executing, which leaves open the data stream that the OPEN function created. Thus, after the initial failure, even after the growl line is removed, the code will perform incorrectly because it will open a second data stream to the Original data set with the OPEN function. This failure, while not producing a runtime error, will effectively prevent the software from closing the first stream, which it will unnecessarily maintain until the SAS session is terminated.

Thus, in implementing use of %SYSFUNC(OPEN) to validate data set existence and accessibility, SAS practitioners must understand the new risk posed by a failure to close the associated data stream. Fault-tolerant software that fails safe would ensure that this stream is closed as part of terminating a SAS macro or program, as demonstrated in the “Closing Data Streams” section in chapter 11, “Security.” It's impossible to eliminate all risk from software, and accepting risk is a necessary component of all software development. So while developers might do nothing to mitigate or eliminate the vulnerability that exists when CLOSE fails to execute, they should still be aware of it. In general, however, software is significantly less risky and more reliable with exception handling in place.
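For contrast, languages with built-in exception handling (such as Python, introduced later in this chapter) provide a construct that guarantees cleanup whether or not a failure occurs. The following Python sketch is offered only as an illustration of that guarantee, not as something Base SAS can do directly; the DataStream class and the do_it function are hypothetical stand-ins for an opened data stream and the %DOIT macro.

```python
class DataStream:
    """Hypothetical stand-in for a handle returned by an OPEN-style call."""
    open_count = 0  # number of streams currently open

    def __init__(self, name):
        self.name = name
        DataStream.open_count += 1

    def close(self):
        DataStream.open_count -= 1


def do_it(fail):
    """Open a stream, do some work, and always close the stream."""
    stream = DataStream("perm.original")
    try:
        if fail:
            raise RuntimeError("growl")  # simulates the latent defect
        return "ok"
    finally:
        stream.close()  # runs on success and on failure alike


try:
    do_it(fail=True)
except RuntimeError:
    pass  # the failure still surfaces, but the stream was closed first
print(DataStream.open_count)
```

Because the cleanup lives in the finally clause, a defect between opening and closing the stream cannot leave the stream open, which is the behavior a fail-safe path in SAS has to approximate by other means.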

General Threats

General threats represent unpredictable or unknown threats to software functionality. For example, occasionally SAS will unexpectedly throw a runtime error whose description references a Java error—something completely outside the Base SAS world—which must be tracked down. For example, what do you do when your program is executing confidently, and suddenly you're embroiled in this morass?

ERROR:  An exception has been encountered.
Please contact technical support and provide them with the following traceback information:
The SAS task name is [Submit]
ERROR:  Read Access Violation Submit
Exception occurred at (04A3E47A)
Task Traceback
Address   Frame     (DBGHELP API Version 4.0 rev 5)
0000000004A3E47A  000000000737ED00  sasxkern:tkvercn1+0xBD43A
00000000049D1074  000000000737ED80  sasxkern:tkvercn1+0x50034
00000000049E382C  000000000737EF30  sasxkern:tkvercn1+0x627EC
00000000049E2AC0  000000000737F020  sasxkern:tkvercn1+0x61A80
00000000049E56B8  000000000737F290  sasxkern:tkvercn1+0x64678
00000000049E098C  000000000737F3C0  sasxkern:tkvercn1+0x5F94C
00000000049E053C  000000000737F450  sasxkern:tkvercn1+0x5F4FC
0000000004DA2761  000000000737F458  sasxshel:tkvercn1+0x1721
0000000004DA19D5  000000000737F610  sasxshel:tkvercn1+0x995
0000000004DCAF20  000000000737F750  sasxshel:tkvercn1+0x29EE0
0000000004DF4341  000000000737FAB0  sasxshel:tkvercn1+0x53301
0000000004DF7BAC  000000000737FBA0  sasxshel:tkvercn1+0x56B6C
0000000004DFA4A0  000000000737FBF0  sasxshel:tkvercn1+0x59460
00000000034F89DB  000000000737FBF8  sashost:Main+0x10EBB
00000000034FE62D  000000000737FF50  sashost:Main+0x16B0D
00000000770259BD  000000000737FF58  kernel32:BaseThreadInitThunk+0xD
000000007715A2E1  000000000737FF88  ntdll:RtlUserThreadStart+0x21

The reality is that the SAS application itself is software and susceptible to its own defects and errors, and, while rare, they inexplicably rear their ugly heads from time to time. Fault-tolerant code would not be expected to identify the specific threat of the previous error, but it should appropriately handle the general threat that SAS code at any point could experience a similar failure. For this reason, testing for general code failures with automatic macro variables such as &SYSERR and &SYSCC is a critical component of fault-tolerant design.

A second interpretation of general threats includes any threat not explicitly identified and communicated within an exception handling framework. Failure to detect and handle specific threats can occur for several reasons, including:

  • Ignorance—SAS practitioners unfamiliar with exception handling, or possibly with Base SAS in general, may not realize the importance of exception handling or understand the nature of a specific exception or error.
  • Negligence—This can occur when robustness isn't coded into software (when requirements direct it should be) or when robustness is not prioritized in software requirements. Choosing not to implement robustness should always be an intentional decision based on calculated risk and knowledge of vulnerabilities.
  • Acceptance—By accepting the risk of a specific threat, software robustness and reliability may be diminished. But when threats pose only negligible risk, they often are accepted. For example, the “data-set-run” mind-set accepts en masse all of the previously enumerated threats.

To illustrate the difference between specific and general threats, the simple DATA step is reprised from the “Specific Threats” section:

data final;
   set original;
run;

This time, instead of Original being missing or some other predictable runtime error occurring, Base SAS informs you “ERROR: An exception has been encountered.” and throws some unfriendly hex your way (that's hexadecimal and not an actual curse, to be clear). But it may feel like witchcraft, because there's no way to predict the failure or to recover from it. Fault-tolerant SAS software will instead test for any general failure (which includes the hex) by assessing &SYSERR immediately after a procedure or DATA step or &SYSCC before exiting a macro or program. The use of automatic variables as return codes is demonstrated throughout chapter 3, “Communication.”

The following revised code resets the &SYSCC automatic macro variable to 0 and then assesses it after the DATA step to detect any warnings or runtime errors that may have been produced:

%macro doit();
%let syscc=0;
%local dsid;
%let dsid=%sysfunc(open(perm.original));
%if %eval(&dsid>0) %then %do;
   data final;
      set perm.original;
   run;
   %let close=%sysfunc(close(&dsid));
   %end;
%if &syscc>0 %then %return;
* otherwise do other stuff;
%mend;
%doit;

The code now terminates the %DOIT macro using the %RETURN statement if warnings or runtime errors are encountered. This framework detects not only the threat of a missing or inaccessible data set but also other threats that cause warnings or runtime errors, including the previous hex. The %DOIT macro is not made more robust by this additional post hoc exception handling that checks &SYSCC, because robustness implies that the code is able to continue toward the happy trail and produce some ultimate business value, but it is made more fault-tolerant because code that otherwise would follow the DATA step will be prevented from executing when errors are detected. However, the software as a whole could have been made more robust because the failure of the %DOIT module can now be detected and used to alter program flow dynamically to some other process to deliver full or partial business value.

Notwithstanding, a zero-value &SYSERR or &SYSCC does not necessarily denote process success, just as a clean log does not necessarily indicate that software has not failed. Some failures occur but are displayed only in SAS notes within the log. In other cases, return codes such as &SYSFILRC or &SYSLIBRC must be individually assessed to determine if statements or functions failed. These automatic macro variables are discussed in chapter 3, “Communication.” Failures can also occur due to errors in business logic and produce neither warnings nor runtime errors. While exception handling represents the most important tool to facilitate robust software, it shouldn't replace common sense and code inspection.

EXCEPTION HANDLING

Exception handling is the primary defensive programming method to facilitate robust and fault-tolerant software. These three terms are sometimes used interchangeably, although exception handling always represents the technical implementation or tool to further robustness. Exception handling describes the identification and dynamic resolution of adverse, unexpected, or untimely events or environmental states during software execution.4 Resolution doesn't imply that the exception is eliminated or overcome—only that it is handled. In software literature, exception handling is also referenced as error handling, error processing, error trapping, event handling, event trapping, defensive programming, or fault-tolerant programming.

The goal of exception handling is always to reroute program flow back to the happy trail—that is, the originally intended process path that delivers full business value. This is illustrated in the “Happy Trail” section later in the chapter. Thus, exception handling is always implemented to alter program flow dynamically. The “Exception Handling, Not Exception Reporting!” section distinguishes exception reporting, which only produces static reports that demonstrate exceptions. Thus, while exception handling is a quality assurance tool intended not only to detect but also to overcome exceptions and runtime errors, exception reporting is a quality control mechanism that alerts the developer or user to aberration or failure.

While exception handling is inherent in languages such as Java, Python, and even the SAS Component Language (SCL), no inherent exception handling functionality exists in Base SAS.5 You have to fake it in SAS, which is unfortunate considering the tremendous role that exception handling plays in facilitating software robustness and reliability. Despite the numerous weaknesses of this approach, faking it is better than leaving exceptions unhandled, which can cause abrupt or unpredictable failures in production software.

Exception Handling Elsewhere

To understand classic exception handling theory and function, you need to step outside the SAS world and explore languages that have inherent exception handling functionality. The following Python code uses the PRINT function to print the variable X:

print(X)

But what if the variable X doesn't exist? A runtime error results, the program terminates, and the following output is displayed:

print(X)
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
NameError: name 'X' is not defined

This outcome is unacceptable in robust software, because the software should instead display a message to the user, prompt the user to initialize a value for X, continue to another process, or perform any of a host of other activities. But because the exception is unhandled, the code abruptly terminates. Without exception handling, that blue screen of death that 1990s software so frequently exhibited would today still be common. Exception handling allows software to encounter variability, including errors and faults, and to adapt through prescribed channels.

The following modified Python code demonstrates exception handling that catches the exception and transfers control to an exception handler (the EXCEPT statement) for further processing. The underlying functionality (the PRINT function) remains unchanged and has only been wrapped in an exception handling block:

import sys

try:
   print(X)
except NameError:
   sys.exit()

Now when the code executes, if the variable X is missing, the SYS.EXIT function terminates the program. This is not the only outcome of exception handling, but one of many that are discussed in the section titled “Exception Handling Pathways.” Because the exception is intercepted, no runtime error occurs and the software can continue undaunted or, in this example, terminate gracefully. Note that because NameError (a specific type of error) is specified, the exception handling block only detects that type of exception, thus eliminating the specific threat of a missing variable. To handle general threats (as discussed previously in the “General Threats” section), NameError can be removed so that the EXCEPT statement catches any exception raised within the TRY block. The revised code, for example, would also handle a TypeError or ZeroDivisionError raised by a more complex expression passed to the PRINT function. (A misspelled PRINT function would itself raise a NameError at run time, whereas a true syntax error is reported when Python compiles the code, before the TRY block can execute.)

try:
   print(X)
except:
   sys.exit()
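
As a minimal sketch of what the unqualified EXCEPT changes, the following code classifies several failures by the exception each one raises. The helper name classify and the sample calls are hypothetical and not part of the original example. The sketch also assumes `except Exception` in place of a bare EXCEPT, a common compromise that still handles any ordinary runtime exception without also intercepting SystemExit or KeyboardInterrupt. In this sketch, a misspelled function name surfaces as a NameError:

```python
def classify(action):
    """Run action() and return the name of the exception it raised, or 'ok'."""
    try:
        action()
    except Exception as err:          # handles any ordinary runtime exception
        return type(err).__name__
    return "ok"


results = {
    "missing variable": classify(lambda: X_undefined),     # name never assigned
    "misspelled print": classify(lambda: prnt("hello")),   # prnt does not exist
    "bad conversion": classify(lambda: int("ninety-nine")),
    "division by zero": classify(lambda: 1 / 0),
    "no exception": classify(lambda: len("fine")),
}

for label, outcome in results.items():
    print(f"{label}: {outcome}")
```

Because every failure funnels into one handler, the general form trades diagnostic precision for coverage; naming specific exceptions first and a general handler last is one way to keep both.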

Any number of Python statements can be included within an exception handling block, demonstrating the tremendous efficiency of implementing exception handling. In the following example, five successive PRINT functions attempt to print the variables A through E. If A and B exist but C does not, A and B will be printed, after which the exception—the missing variable C—will be detected and program flow will transfer to the EXCEPT block. If all five variables exist, no exception occurs and the EXCEPT block is never executed:

try:
   print(A)
   print(B)
   print(C)
   print(D)
   print(E)
except NameError:
   print("Something funky here...")
   sys.exit()
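
The transfer of control can be observed directly. The following sketch is a variation on the example above under stated assumptions: it records each value in a list named printed (hypothetical) instead of calling PRINT, and it sets a flag rather than calling SYS.EXIT so that the outcome can be inspected afterward. The variable C is deliberately never defined:

```python
A, B = "alpha", "bravo"     # illustrative values
D, E = "delta", "echo"      # defined, but never reached in this run

printed = []                # values that were processed before the exception
handled = False             # becomes True only if the EXCEPT block runs
message = ""

try:
    printed.append(A)
    printed.append(B)
    printed.append(C)       # C is undefined, so NameError is raised here
    printed.append(D)       # skipped
    printed.append(E)       # skipped
except NameError as err:
    handled = True
    message = str(err)

print(printed, handled, message)
```

Only the statements before the failing one take effect; D and E are never attempted, which is the behavior the SAS examples later in this chapter struggle to reproduce.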

In languages that offer inherent exception handling functionality, the full capabilities of exception handling extend far beyond this abridged introduction. In the following section, Base SAS exception handling is differentiated and demonstrated.

Faking It in SAS

Because Base SAS provides no inherent exception handling capabilities, SAS practitioners can only fake it by testing detectable exceptional states, warnings, and runtime errors. Thus, major limitations exist that distinguish exception handling contrived through Base SAS from the real deal:

  • Exception handling cannot be used within DATA steps or SAS procedures—Thus, the two primary constructs of SAS software offer no ability to test or handle exceptions. For example, when the DATA step encounters most exceptions or errors, it immediately terminates with a runtime error, offering no ability to handle it dynamically. In SAS, while some exceptions can be avoided through a priori detection, many exceptions immediately produce runtime errors when they occur. There are few opportunities to catch exceptions because Base SAS typically produces only runtime errors, not exceptions. In other languages, a handled exception does not become a runtime error.
  • Faults cannot be detected through exception handling—For example, real exception handling catches not only faulty or exceptional data or environmental states but also syntax errors and other defects that may lie dormant in code. When SAS encounters a syntax error, this triggers a runtime error, and no exception handling can occur. If Python code perceives a falling basket of eggs, it identifies the exception and swoops in to rescue the eggs before they can hit the floor; post hoc detection handling in SAS instead provides a bucket that the basket can be dropped into to minimize the mess, but does nothing to stop the impact.
  • Exception handling can be invoked only during specific lines of code that constitute exception handling routines—For example, to test exceptions over multiple lines of code, multiple exception handling routines are required. In other languages, a single exception handling block (like the TRY-EXCEPT block demonstrated previously) handles any exception immediately as it occurs.

Despite all the ways that Base SAS exception handling is limited, it is still beneficial when more robust and reliable software is required. This section recreates the functionality of the Python exception handling demonstrated throughout the “Exception Handling Elsewhere” section.

The following SAS code uses the %PUT macro statement to print the macro variable &X:

%put &x;

But what if the macro variable &X doesn't exist? A warning results and the log displays the following output:

%put &x;
WARNING: Apparent symbolic reference X not resolved.
&x

This is unacceptable in production software because logs are not manually monitored, so some mechanism should detect the exception and do something about it. Two automatic macro variables, &SYSERR and &SYSCC, record the warnings and runtime errors that occur and are essential to approximating exception handling functionality. These variables and their implementation within exception handling are described in chapter 3, “Communication.”

The following code now detects warnings and runtime errors and dynamically alters program flow to terminate the macro when they occur. While not all exception handling in SAS requires use of the macro language, in more complex code and logic this is nearly always a requirement:

%macro meh();
%let syscc=0;
%put &X;
%if &syscc>0 %then %return;
%mend;
%meh;

This code tests for warnings and runtime errors, rather than testing solely for a runtime error that occurs when a variable is missing, which Python can accomplish. Already, at this most basic level, SAS has deviated from exception handling functionality native in other languages. In SAS, while some errors such as the general memory error are separately distinguished, many runtime errors are lumped together and produce only a vague 1012 return code. All warnings, similarly, receive the return code 4 regardless of their nature, making them impossible to distinguish. But, at least Base SAS is able to detect and handle general threats, even if they often cannot be distinguished programmatically.

A second deviation already apparent in Base SAS is its inability to catch a general exception before it becomes a warning or runtime error. For example, in Python, once the exception of a missing variable is detected, program flow immediately shifts to the EXCEPT block. In Base SAS, however, the presence of a non-zero value in &SYSERR or &SYSCC reflects that a warning or runtime error has already occurred. For this reason, even with effective exception handling in place, the SAS log will often be littered with warnings and runtime errors.

In other languages, any number of statements can be included within one exception handling block, but not so in Base SAS. Suppose you want to print five macro variables, &A through &E, but if any of these is missing, you want to immediately shift program flow elsewhere to handle this exception. The following SAS code attempts (and fails) to replicate this business logic:

%macro meh();
%let syscc=0;
%put &A;
%put &B;
%put &C;
%put &D;
%put &E;
%if &syscc>0 %then %do;
   %put Something funky here...;
   %return;
   %end;
%mend;
%meh;

When the missing macro variable &C is encountered, the code continues to attempt to print &D and &E rather than immediately transferring control to the exception block. Rather than assessing the value of &SYSCC automatically after each statement, &SYSCC is only assessed once and thus fails to halt execution in time. If &D were dependent on the successful execution of the previous line of code, then the &D line would also fail. These types of failures are discussed later in the “Cascading Failures” section.
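
To make the timing problem concrete, the following Python sketch imitates the SAS behavior rather than reproducing it. The function put and the symbols dictionary are hypothetical stand-ins for %PUT and the macro symbol table, and status stands in for &SYSCC under the assumption that it only rises until it is reset. The first routine inspects the status once at the end, and the second inspects it after every statement:

```python
WARNING = 4   # the return code the text associates with any SAS warning


def put(symbols, name, log):
    """Mimic %PUT: log the value, or log a warning and return code 4."""
    if name in symbols:
        log.append(symbols[name])
        return 0
    log.append(f"WARNING: Apparent symbolic reference {name} not resolved.")
    return WARNING


def check_once(symbols, names):
    """Inspect the status a single time, after every statement has run."""
    log, status = [], 0
    for name in names:
        status = max(status, put(symbols, name, log))
    return status, log


def check_each(symbols, names):
    """Inspect the status after each statement and stop at the first problem."""
    log, status = [], 0
    for name in names:
        status = max(status, put(symbols, name, log))
        if status > 0:
            log.append("Something funky here...")
            break
    return status, log


symbols = {"A": "a", "B": "b", "D": "d", "E": "e"}   # C is missing
names = ["A", "B", "C", "D", "E"]
once_status, once_log = check_once(symbols, names)
each_status, each_log = check_each(symbols, names)
print(once_log)
print(each_log)
```

Both routines end with the same status, but only the second stops before D and E are attempted, which is why the SAS attempts that follow repeat the &SYSCC test after every statement.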

A second attempt to deliver the necessary functionality and exception handling creates the EXCEPT label to which successive %GOTO statements can transfer program flow when an exception is detected. However, this is not only redundant, but also terribly faulty. Because SAS labels cannot be nested within %DO, %WHILE, or %UNTIL conditional logic, they essentially are always executed unless explicitly leapfrogged over through additional conditional logic. The following code demonstrates equivalent functionality and exception handling but is an utter disaster and demonstrates why procedural %GOTO logic is abhorred in software development.

%macro meh();
%let syscc=0;
%put &A;
%if &syscc>0 %then %goto except;
%put &B;
%if &syscc>0 %then %goto except;
%put &C;
%if &syscc>0 %then %goto except;
%put &D;
%if &syscc>0 %then %goto except;
%put &E;
%if &syscc>0 %then %goto except;
* other logic would go here;
%if &syscc=0 %then %return; * successful run;
%EXCEPT: %put Something funky here...;
%mend;
%meh;

Because the code has inherent dependencies—for example, &B should be printed only if &A printed successfully—another way to conceptualize the program flow is through nested conditional logic. The following code, while arguably as convoluted as the previous %GOTO logic, could be preferred because it doesn't require leapfrogging over SAS labels. Notwithstanding, it still requires redundant testing of &SYSCC.

%macro meh();
%let syscc=0;
%put &A;
%if &syscc=0 %then %do;
   %put &B;
   %if &syscc=0 %then %do;
      %put &C;
      %if &syscc=0 %then %do;
         %put &D;
         %if &syscc=0 %then %do;
            %put &E;
            %end;
         %end;
      %end;
   %end;
%if &syscc>0 %then %do;
   %put Something funky here...;
   %return;
   %end;
* other logic would go here;
%mend;
%meh;

Exception handling in Base SAS sadly doesn't get any better than this. And bear in mind that the above functionality was successfully demonstrated in Python in only nine lines of code without conditional nested logic! SAS exception handling will always make code more lengthy and convoluted while often failing to provide functionality achievable in other languages. For example, because exception handling plays no role inside DATA steps or SAS procedures, the scope and capability of SAS exception handling is tremendously reduced. Notwithstanding all these technical limitations, exception handling is so critical to achieving reliability, robustness, and fault-tolerance in software design that the remaining sections of this chapter demonstrate best practices for implementing exception handling routines within Base SAS.

Asking for Forgiveness versus Permission

While traditional exception handling requires an exception to have occurred, been detected, and handled, some equivalent robustness can be delivered in SAS when specific threats to software are known. The “Specific Threats” section, as discussed earlier, includes examples such as a missing data set or locked data set that are predictable, identifiable, and, by implementing exception handling, sometimes also preventable. Thus, where causes of failure can be detected before they occur, the corresponding failures often can be prevented through a priori exception handling. This contrasts with post hoc exception handling, which typifies traditional exception handling that detects and handles an exception that has actually occurred, not just a condition or state that would have caused an exception.

These two methods—a priori and post hoc—are often characterized as asking for permission and asking for forgiveness, respectively. Thus, traditional exception handling (as seen in object-oriented languages) is likened to the adage, It's better to ask for forgiveness than for permission, because post hoc detection can identify exceptions the instant they occur. However, because of the limitations of Base SAS, an exception handling framework in SAS typically consists of both a priori and post hoc exception handling routines, with significant emphasis on a priori detection to identify specific threats, more so than in software languages that provide inherent exception handling capabilities.

A priori exception handling is sometimes referred to in literature as error proofing, although this should not be misconstrued to represent that software is exception-free or error-free, only that it is made more resistant to exceptions or errors. Post hoc exception handling is conversely referred to as error testing because tests are conducted after some process to determine if exceptions or errors occurred. This highlights the important distinction that while both a priori and post hoc exception handling can improve the reliability of a process, neither can guarantee process success, and only post hoc routines can demonstrate or validate process success.
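
In Python circles the two styles are often labeled LBYL (look before you leap) and EAFP (easier to ask for forgiveness than permission). A minimal sketch, assuming a plain dictionary named symbols (hypothetical) in place of real variables, shows that both routes can deliver the same return code:

```python
symbols = {"A": "alpha"}   # hypothetical symbol table; X is deliberately absent


def read_with_permission(table, name):
    """A priori: test for the specific threat before touching the value."""
    if name not in table:
        return "missing variable", None
    return "", table[name]


def read_with_forgiveness(table, name):
    """Post hoc: attempt the lookup and handle the exception if it occurs."""
    try:
        return "", table[name]
    except KeyError:
        return "missing variable", None


for name in ("A", "X"):
    print(name, read_with_permission(symbols, name),
          read_with_forgiveness(symbols, name))
```

The a priori version can only guard against the threat it names, whereas the post hoc version reacts to whatever the lookup actually raises, provided the handler lists that exception.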

The differences between a priori and post hoc exception handling are further described in the following two sections in which exception handling routines in Base SAS and Python are demonstrated and contrasted.

Forgiveness

You've been trapped on an intercontinental flight to parts unknown, South America, for hours and, despite the turbulence and the insistence by flight attendants to “Please take your seats,” you hasten down the aisle anyway, slam the tiny Ocupado latch, and apologize afterward. Often it's just easier in life and software development to make an attempt and, if it fails, ask for forgiveness afterward. In software development, however, forgiveness requires apt, post hoc exception handling routines that detect exceptions and dynamically redirect software based on business rules. Because post hoc exception handling was demonstrated in the “Exception Handling Elsewhere” and “Faking It in SAS” sections, it is only briefly reprised here.

The following Python code attempts to print a variable but, if the variable does not exist, the program exits safely without runtime error:

import sys   # provides sys.exit

try:
   print(X)
except NameError:
   sys.exit()

Similar SAS code also attempts to print a macro variable, but if the variable does not exist, the program exits safely without runtime error. The global macro variable &SYSCC is set to 4, representing the warning that occurs when the referenced macro variable &X does not exist:

%macro meh();
%let syscc=0;
%put &X;
%if &syscc>0 %then %return;
%mend;
%meh;

But what occurs when a variable is referenced in a DATA step and is missing? As it turns out, a note is printed to the SAS log, but no return code is generated that can be programmatically evaluated.

In the following example, the DATA step relies on third-party data that should (but erroneously does not) contain the variable VERYIMPORTANT:

data test;
   set thirdpartydata;
   a=veryimportant; * does not exist so this represents an exception;
run;
NOTE: Variable veryimportant is uninitialized.
NOTE: The data set WORK.TEST has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
%put SYSERR: &syserr   SYSCC: &syscc;
SYSERR: 0   SYSCC: 0

Because the exception cannot be detected programmatically, it also cannot be handled programmatically, so SAS post hoc exception handling is powerless and not robust to this type of common data variability. However, because the exception represents a specific, predictable threat, it can be handled through a priori exception handling, as depicted in the “Permission” section.
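
By way of contrast, and only as a sketch, a missing field in Python is detectable post hoc because the lookup itself raises an exception. The rows below are hypothetical stand-ins for the third-party data, and copy_important is an illustrative name rather than an established routine:

```python
def copy_important(rows):
    """Return (return_code, new_rows); an empty return code signals success."""
    result = []
    for row in rows:
        try:
            # Equivalent in spirit to: a = veryimportant
            result.append({**row, "a": row["veryimportant"]})
        except KeyError as err:
            # The missing field is reported by name and no partial output is kept.
            return f"missing variable: {err.args[0]}", []
    return "", result


incomplete = [{"id": 1}]                         # veryimportant is absent
complete = [{"id": 1, "veryimportant": 5}]
print(copy_important(incomplete))
print(copy_important(complete))
```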

Another common failure of post hoc exception handling occurs in SAS because many exceptions can only be captured as runtime errors. For example, the following code and log demonstrate a DATA step that fails because the Doesnotexist data set does not exist:

%macro moremeh();
%let syscc=0;
%global moremehRC;
%let moremehRC=GENERAL FAILURE;
data test;
   set doesnotexist; * missing data set;
   * transformations go here;
run;
%if &syscc>0 %then %do;
   %let moremehRC=you broke me!;
   %return;
   %end;
%else %let moremehRC=;
%mend;
%moremeh;
ERROR: File WORK.DOESNOTEXIST.DATA does not exist.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TEST may be incomplete.  When this step was stopped there were 0 observations and 0 variables.
WARNING: Data set WORK.TEST was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
%put RC: &moremehRC;
RC: you broke me!

However, because automatic macro variables like &SYSCC and &SYSERR do not update until a step boundary (like a RUN statement), it's impossible to implement post hoc exception handling that catches the exception (the missing data set) before the exception is recognized as a runtime error. Paradoxical to the very intent of exception handling, as soon as the SAS processor recognizes the exception, it is already too late to prevent the subsequent runtime error, so the exception and runtime error occur as a single event.

Thus, while asking for forgiveness is more closely associated with fault-tolerant software design and forms the underpinning of exception handling in most software languages, to be tolerant of faults, those faults must be programmatically detectable. Many specific threats to SAS software either cannot be identified programmatically or will be identified too late (i.e., after failure) to do anything useful. The only remaining solution is to detect the specific cause of the exception—the missing data set—before it is referenced, which lies at the heart of asking permission through a priori exception handling.

Permission

Asking for permission squarely addresses specific threats to software execution by removing vulnerabilities in code. By detecting causes of exceptions before they occur, the resultant exceptions and runtime errors can be eliminated and robustness and reliability are improved.

The “Forgiveness” section demonstrates that in some cases, exceptions that occur in the DATA step produce SAS notes but provide no programmatic feedback to facilitate exception handling. In other cases, as soon as an exception is detected (like a missing data set referenced by a SET statement), a runtime error is produced, which also defeats post hoc exception handling methods. Given these limitations in the Base SAS language, a priori exception handling is required to detect specific threats and reroute program flow.

The “Forgiveness” section depicts the %MEH macro in which post hoc exception handling detects the warning that occurs when a nonexistent SAS macro variable is referenced. Equivalent functionality can be accomplished through a priori testing, which provides the additional benefit of not producing warnings or runtime errors. For example, although the macro variable &X should exist, when &X does not exist, the exception is detected and handled before the variable is referenced:

%macro test;
%if %symexist(X) %then %put &X;
%else %return;
%mend;

As demonstrated in the “Forgiveness” section, sometimes an exception in SAS—like referencing a nonexistent variable in a DATA step—displays a note to the log but offers no return code (like &SYSCC) that can be programmatically assessed. This challenge can be overcome with a priori exception handling and is demonstrated in the following “Happy Trail” section, which uses the ATTRN function to determine whether a variable exists in a data set.

A final weakness of post hoc exception handling in SAS is its inability to be used effectively within the DATA step when exceptions immediately cause runtime errors that terminate the DATA step. Because a missing data set is a specific threat that can be identified programmatically, a priori exception handling routines can be emplaced before the DATA step (or SAS procedure) and can validate whether all prerequisites have been met. The %MOREMEH macro from the “Forgiveness” section is reprised and modified so that it validates data set existence before attempted use:

%macro moremeh();
%let syscc=0;
%global moremehRC;
%let moremehRC=GENERAL FAILURE;
%if %sysfunc(exist(doesnotexist))=0 %then %do;
   %let moremehRC=missing data set;
   %return;
   %end;
%else %do;
   data test;
      set doesnotexist; * missing data set;
      * transformations go here;
   run;
   %end;
%if &syscc>0 %then %do;
   %let moremehRC=you broke me!;
   %return;
   %end;
%else %let moremehRC=;
%mend;
%moremeh;
%put RC: &moremehRC;

In the revised example, the post hoc exception handling is not removed. While the newly implemented a priori exception handling can ensure that the DATA step does not execute if the Doesnotexist data set is missing, as noted earlier, a priori routines cannot guarantee process success; they can prevent failure only from specific threats. For example, the DATA step could still run out of memory while executing, which would cause a runtime error detected by the post hoc analysis of &SYSCC. Thus, in SAS, robustness and reliability can often be achieved by asking for both permission and forgiveness.
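
The same pairing of permission and forgiveness can be sketched in Python. The function load_lines and the file names are hypothetical; the return codes simply echo those of %MOREMEH. The a priori test handles the one threat that is predictable, and the TRY block handles what the test cannot foresee (here, a file whose bytes are not valid UTF-8):

```python
import os
import tempfile


def load_lines(path):
    """Return (return_code, lines); an empty return code signals success."""
    # A priori: test for the specific, predictable threat before acting.
    if not os.path.isfile(path):
        return "missing data set", []
    # Post hoc: handle failures that the a priori test could not foresee.
    try:
        with open(path, encoding="utf-8") as handle:
            lines = [line.rstrip("\n") for line in handle]
    except (OSError, UnicodeDecodeError) as err:
        return f"you broke me! {type(err).__name__}", []
    return "", lines


with tempfile.TemporaryDirectory() as folder:
    good = os.path.join(folder, "good.txt")
    bad = os.path.join(folder, "bad.txt")
    with open(good, "w", encoding="utf-8") as handle:
        handle.write("first\nsecond\n")
    with open(bad, "wb") as handle:
        handle.write(b"\xff\xfe\xfa")      # bytes that are not valid UTF-8

    rc_missing, _ = load_lines(os.path.join(folder, "doesnotexist.txt"))
    rc_good, lines_good = load_lines(good)
    rc_bad, _ = load_lines(bad)

print(rc_missing, rc_good, lines_good, rc_bad)
```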

Exception Handling Framework

Exception handling gets you around or through a specific exception (or exception cluster), but what happens next? The %MOREMEH macro in the “Permission” section avoids the DATA step with a priori exception handling if the required data set is missing and subsequently terminates the macro. Post hoc routines additionally terminate the macro if any warnings or runtime errors are detected after the DATA step. When exception handling is demonstrated in SAS literature, however, this is often where the demonstration concludes—either a return code is generated that reflects the exception, or, more commonly and significantly less useful in production software, an exception report is produced that prints the exception to the log or other static location.

Contrary to examples demonstrated throughout literature, a dynamic exception handling framework is required in robust production software and utilizes exception management to operationalize business rules and logic through exception handling routines. Exception handling frameworks are typically not demonstrated in software literature for the same reason that exception handling is absent unless it is the focus of the discussion—they add complexity, diminish readability, and require the added context of business rules to be fully understood. For example, to demonstrate robustness rather than only fault-tolerance, the %MOREMEH macro would need to show through an exception handling framework how full or partial business value was achieved despite encountering the exception.

Because of this added complexity (and necessary context) to demonstrate an exception handling framework, while exception handling is demonstrated throughout this text, exception handling frameworks are demonstrated only in the following sections. For example, in literature the %RETURN statement often depicts termination of a process flow, and return codes are often generated to imply—if not demonstrate—subsequent dynamic program flow based on their values. Where this text references “exception handling frameworks,” however, these following sections can serve as a guide to demonstrate the complexity and control necessary to achieve robust software.

Exception Inheritance

Exception inheritance requires that exceptions are propagated from prerequisite processes to dependent processes and from child processes to parent processes. Without exception inheritance, subsequent processes can fail when previous processes encounter exceptions. This will typically cause cascading failures in which failure begets failure until the software terminates. Inheritance links the success or failure of one process to the next, providing validation from software outset to end.

The “Modularity” section in chapter 18, “Reusability,” demonstrates a DATA step that calls the %GOBIG macro, which in turn calls the %FINDVARS macro. When all three modules execute successfully, this provides a reusable solution that dynamically capitalizes every character variable in the parameterized data set. However, because the code includes no exception handling, a more robust example is demonstrated in this section and handles two specific exceptions that can occur. The concepts of black-box and white-box testing, referenced later in this section, are introduced in the “Functional Testing” and “Unit Testing” sections, respectively, in chapter 16, “Testability.”

The %ENGINE macro capitalizes all character variables in the Sample data set. Embracing modular design, however, %ENGINE actually calls %GOBIG to write the capitalization code dynamically; thus %ENGINE must check the &GOBIGRC return code to ensure that %GOBIG was successful:

data sample;
   length char1 $20 char2 $20 num1 8 num2 8;
   char1="I love SAS";
   char2="SAS loves me";
run;
%macro engine();
%global engineRC;
%let engineRC=GENERAL FAILURE;
data uppersample;
   set sample;
   %gobig(dsn=sample);
   num1=99;
run;
%if %length(&gobigRC)>0 %then %do;
   %let engineRC=something broke! GOBIG: &gobigrc;
   %return;
   %end;
%else %let engineRC=;
%mend;
%engine;

Because %GOBIG is encapsulated (essentially inside a black box), %ENGINE is unaware of how %GOBIG creates the capitalization code for each variable. The %ENGINE macro relies on receiving two things from %GOBIG—the capitalization code and the return code &GOBIGRC—and has no idea that %GOBIG has in fact subcontracted out much of its work to its own child process, the %FINDVARS macro. Only through a white-box inspection of %GOBIG is this apparent:

* dynamically changes all character variables in a data set to upper case;
%macro gobig(dsn= /* old data set in LIB.DSN or DSN format */);
%global gobigRC;
%let gobigRC=GENERAL FAILURE;
%local i;
%findvars(dsn=&dsn, type=CHAR);
/* BQUOTE masks the OR inside return codes such as: missing or locked data set */
%if %length(&findvarsRC)=0 or %bquote(&findvarsRC)=%str(no variables) %then %do;
   %let i=1;
   %do %while(%length(%scan(&varlist,&i,,S))>0);
      %scan(&varlist,&i,,S)=upcase(%scan(&varlist,&i,,S));;
      %let i=%eval(&i+1);
      %end;
   %let gobigRC=;
   %end;
%else %do;
   %let gobigRC=FINDVARSRC: &findvarsRC;
   %end;
%mend;

While %GOBIG writes the dynamic code to capitalize variables (with the %SCAN function), it relies on the %FINDVARS macro to provide the list of variables. The %FINDVARS macro is unaware of %ENGINE, and %ENGINE conversely is unaware of %FINDVARS, but because %ENGINE is indirectly dependent on %FINDVARS functionality (since %GOBIG directly relies on %FINDVARS), a method must exist to pass the %FINDVARS return code (&FINDVARSRC) directly to %GOBIG as well as indirectly to %ENGINE. This is inheritance.

The following %FINDVARS macro creates the space-delimited list of relevant variables (i.e., character variables, in this example) and, if an exception occurs, the macro passes it within the return code &FINDVARSRC:

* creates a space-delimited macro variable VARLIST of the variables in data set DSN;
%macro findvars(dsn= /* data set in LIB.DSN or DSN format */,
   type= /* ALL, CHAR, or NUM to retrieve those types of variables */);
%global findvarsRC;
%let findvarsRC=GENERAL FAILURE;
%local dsid;
%local vars;
%local vartype;
%global varlist;
%let varlist=;
%local i;
%local vartot;
%let vartot=0;
%let dsid=%sysfunc(open(&dsn,i));
%if &dsid<1 %then %do;
   %let findvarsRC=missing or locked data set;
   %return;
   %end;
%let vars=%sysfunc(attrn(&dsid, nvars));
%do i=1 %to &vars;
   %let vartype=%sysfunc(vartype(&dsid,&i));
   %if %upcase(&type)=ALL or (&vartype=N and %upcase(&type)=NUM) or
         (&vartype=C and %upcase(&type)=CHAR) %then %do;
      %let varlist=&varlist %sysfunc(varname(&dsid,&i));
      %let vartot=%eval(&vartot+1);
      %end;
   %end;
%if &vartot=0 %then %let findvarsRC=no variables;
%else %let findvarsRC=;
%let close=%sysfunc(close(&dsid));
%mend;

Because Base SAS does not inherently provide return code functionality, return codes must be faked by initializing global macro variables within macros. While these global macro variables can be accessed from anywhere inside a SAS session, inheritance should be used to pass return codes so that the chain of custody for software performance can be demonstrated. In other words, while %ENGINE could directly assess the value of &FINDVARSRC, don't get lazy and allow it to—any feedback from %FINDVARS must be passed through %GOBIG. Due to encapsulation and loose coupling, which are discussed and demonstrated throughout chapter 14, “Modularity,” the %ENGINE macro should have no knowledge that the %FINDVARS macro (or its respective return code &FINDVARSRC) even exists!
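
For comparison, languages with inherent exception handling can express the same chain of custody by translating a child's exception at each level. The following Python sketch is illustrative only: the names find_vars, go_big, engine, FindVarsError, and GoBigError are hypothetical analogs of the SAS macros, and a single dictionary stands in for a data set:

```python
class FindVarsError(Exception):
    """Raised by the child process (find_vars)."""


class GoBigError(Exception):
    """Raised by the middle process (go_big)."""


def find_vars(row, kind):
    """Return the names of the columns in row that match kind (CHAR, NUM, ALL)."""
    if row is None:
        raise FindVarsError("missing or locked data set")
    wanted = {"CHAR": (str,), "NUM": (int, float), "ALL": (str, int, float)}
    return [name for name, value in row.items()
            if isinstance(value, wanted[kind.upper()])]


def go_big(row):
    """Upper-case every character column; translate child failures."""
    try:
        names = find_vars(row, "CHAR")
    except FindVarsError as err:
        # Re-raise as this module's own exception, keeping the cause attached.
        raise GoBigError(f"FINDVARS: {err}") from err
    # An empty list is valid here: there is simply nothing to capitalize.
    return {name: value.upper() if name in names else value
            for name, value in row.items()}


def engine(row):
    """Return (return_code, result); engine knows only about go_big."""
    try:
        return "", go_big(row)
    except GoBigError as err:
        return f"something broke! GOBIG: {err}", None


sample = {"char1": "I love SAS", "char2": "SAS loves me", "num1": 99}
ok_rc, ok_result = engine(sample)
bad_rc, bad_result = engine(None)
print(ok_rc, ok_result)
print(bad_rc, bad_result)
```

The engine function never mentions FindVarsError, yet the original cause remains reachable through the chained exception.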

To put the complexity of exception handling and inheritance in perspective, the original functionality that this code is producing is demonstrated in the following five lines of code:

data uppersample;
   set sample;
   char1=upcase(char1);
   char2=upcase(char2);
run;

Notwithstanding its additional complexity, the exception handling framework improves reliability and robustness by testing for known threats, and improves static performance by representing a more modular, reusable, extensible, and stable solution that can improve the efficiency of future software development and maintenance. However, because of the inherent complexities of modular software design and the necessity to propagate and inherit return codes, it's critical that modules not only return their own exceptions but also detect and return the exceptions of child processes upon which they rely. Figure 6.1 demonstrates inheritance of the two exceptions (missing data set and no relevant variables) from the %FINDVARS macro to the %GOBIG macro, and the subsequent inheritance to the %ENGINE macro.

Figure 6.1 Exception Inheritance

The importance of inheritance becomes clearer when the intent of modular, reusable code is understood. For example, the %FINDVARS macro can be reused for any purpose that requires a list of all variables (or of only character or only numeric variables) in a data set. For some purposes, such as the %GOBIG macro, if %FINDVARS produces an empty list (that is, no relevant variables exist in the data set), this could be valid and not represent an exception. This relationship is demonstrated in Figure 6.1.

However, because modules are intended to be reused in different situations and thus called by different parent processes, in other cases, if %FINDVARS finds no relevant variables, this could represent an exception in the parent process that would need to be subsequently handled. This more expanded relationship is demonstrated in Figure 6.2.

Figure 6.2 Exception Inheritance in Software Reuse

By implementing inheritance methods, processes can ensure that all prerequisite processes—including both child processes as well as previous serialized processes—have completed without error, or sufficiently to allow subsequent program flow to continue. Exception inheritance overcomes cascading failures that can occur in SAS when unhandled exceptions and runtime errors occur, as demonstrated in the following section. Exception inheritance is also demonstrated in the “Data Governors” section in chapter 9, “Scalability.”

Cascading Failures

Cascading failures are caused by unhandled exceptions, errors, or software defects and occur when failures in a child process are not communicated to the parent process, or when failures in a prerequisite process are not communicated to subsequent, dependent processes. The following code and output illustrate a DATA step that fails, which causes a cascading failure in the subsequent MEANS procedure:

data final;
   length char1 $10 num1 8;
   char1='sas is rad';
   num1=99;
   oopsy!;
         -----
         180
ERROR 180-322: Statement is not valid or it is used out of proper order.
run;
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.FINAL may be incomplete.  When this step was stopped there were 0 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
proc means data=final;
   var num1;
run;
NOTE: No observations in data set WORK.FINAL.
NOTE: PROCEDURE MEANS used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

While the software defect (oopsy!) lies in the DATA step, it causes a cascading failure in the MEANS procedure. Recall that failures do not necessarily denote warnings, runtime errors, or software termination, but occur whenever an invalid solution or data product is generated or requirements are not met. Thus, although the MEANS procedure executes, it produces no output because the failed DATA step creates a spurious data set with two variables but no observations. Even more critical, because SAS notes cannot be programmatically detected, if the original defect is not detected and handled in the DATA step, it's more difficult to determine programmatically during the MEANS procedure that a failure has occurred. Validation could be performed by inspecting the data set produced with the OUT= option of the OUTPUT statement in the MEANS procedure, but it is much better to have caught the failure before the MEANS procedure is ever invoked.

The following revised code now uses post hoc exception handling to detect if warnings or runtime errors occurred in the DATA step and, if something goes awry, the MEANS procedure is skipped, preventing the cascading failure:

%macro oopsy();
%let syscc=0;
%global oopsyRC;
%let oopsyRC=GENERAL FAILURE;
data final;
   length char1 $10 num1 8;
   char1='sas is rad';
   num1=99;
   oopsy!;
run;
%if &syscc>0 %then %do;
   %let oopsyRC=data step had an OOPSY!;
   %return;
   %end;
proc means data=final;
   var num1;
run;
%if &syscc>0 %then %do;
   %let oopsyRC=proc means had an OOPSY!;
   %return;
   %end;
%else %let oopsyRC=;
%mend;
%oopsy;
%put RC: &oopsyRC;

This code makes no attempt to deliver partial or delayed business value, but rather is focused on detection of general failures to facilitate fault tolerance. A vulnerability in the above code still exists, however, because the Final data set is created despite the failure, as demonstrated in the following SAS log:

%oopsy;
NOTE: Line generated by the invoked macro "OOPSY".
data final;     length char1 $10 num1 8;     char1='sas is rad'; num1=99;     oopsy!;
-----
180
1  ! run;
ERROR 180-322: Statement is not valid or it is used out of proper order.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.FINAL may be incomplete.  When this step was stopped there were 0 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
%put RC: &oopsyRC;
RC: data step had an OOPSY!

Because failure of the DATA step generates a spurious, empty data set, the exception handling routine should embrace security (and recoverability) principles and delete Final so that it is not accidentally utilized. Either the DELETE procedure or the DATASETS procedure should be used to delete Final when warnings or runtime errors are detected during its creation.
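The cleanup described above might be sketched as follows. The macro name %OOPSY_CLEAN is hypothetical, and the sketch assumes that Final resides in the WORK library and that the session is interactive; in batch sessions SAS may enter syntax check mode after the defect, in which case the cleanup step might not execute as shown.

```sas
%macro oopsy_clean();
%let syscc=0;
%global oopsyRC;
%let oopsyRC=GENERAL FAILURE;
data final;
   length char1 $10 num1 8;
   char1='sas is rad';
   num1=99;
   oopsy!;
run;
%if &syscc>0 %then %do;
   %let oopsyRC=data step had an OOPSY!;
   * delete the spurious data set so it cannot be used downstream;
   %if %sysfunc(exist(work.final)) %then %do;
      proc datasets library=work nolist;
         delete final;
      quit;
      %end;
   %return;
   %end;
%let oopsyRC=;
%mend;
%oopsy_clean;
%put RC: &oopsyRC;
```

The EXIST function guards the deletion so that the cleanup itself does not raise a new error when Final was never created.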

Happy Trail

Exception handling has been introduced and demonstrated, but, in every example thus far, the goal has been to detect a failed state and terminate subsequent processes. While this describes the secondary goal of robustness—to gracefully terminate through a fail-safe path—it only scratches the surface of exception handling, and it fails to demonstrate the primary goal of robustness—maximization of business value in the face of software variability.

Thus, the goal of exception handling should always be to detect variability and reroute process control back to the happy trail—the path that delivers full business value or, after exceptions or errors have occurred, the maximum attainable business value. For example, in some cases, a missing data set signals that an entire SAS process or program should be terminated, but in other cases, functionality still can be salvaged. In the “External Issues” section in chapter 5, “Recoverability,” SAS software was modified so that it could flexibly adapt when individual tables were missing within a third-party database, thus eliminating runtime errors while improving software reliability.

Despite the challenges of implementing exception handling within Base SAS software, as a data analytic language SAS does have one huge advantage: data analytic software is inherently serialized. Where user-focused software often places a user at some central location from which endless program flow paths may follow (based on user input), data analytic development typically transforms data from one form to another over a series of sequenced processes. In many instances, the failure of earlier processes precipitates failure of later processes, thus demonstrating the clear dependencies that exist. In some cases, however, full or partial business value still can be achieved despite exceptions or failures in earlier processes, thus allowing program flow to return to the happy trail.

For example, the following SAS code simulates software that ingests and transforms a data set before performing some analysis, represented by the MEANS procedure. The LIBNAME statement and DATA step precede the code to provide the necessary reference and input:

libname perm 'c:\perm';
data perm.original;
   length char1 $10 var1 8;
   var1=5;
run;
* PROCESS STARTS HERE;
data final;
   set perm.original;
   * transformations;
run;
proc means data=final;
   var var1;
run;

To understand the happy trail for specific software, it's first necessary to describe threats that can derail software from that path of full business value. Table 6.1 demonstrates an abbreviated risk register that lists processes, their specific threats, and respective exception handling pathways. For example, the business rule in the first line indicates that the risk of attempting to access the Final data set when it is locked should be accepted. In other words, because Final is maintained in the WORK library and the risk of other processes locking it is negligible, the decision has been made not to test for data set accessibility before creating the data set and to assume that it is unlocked.

Table 6.1 Abbreviated Risk Register

Process      Threats                          Solution
DATA Final   Final is exclusively locked      ACCEPT
DATA Final   Original does not exist          Switch to backup
DATA Final   Original is exclusively locked   Switch to backup
PROC MEANS   Original has no observations     ACCEPT
PROC MEANS   Original has no VAR1             Terminate program

The second business rule in Table 6.1 demonstrates that if the Original data set does not exist, program flow should use a backup data set that will have to be maintained. Recall, however, that at this early planning phase in the SDLC, activities are needs-focused rather than software- or solutions-focused, so a specific method to achieve this backup won't necessarily have been selected or even discussed.

Where SAS practitioners either fail to identify threats to software or fail to implement solutions that eliminate vulnerabilities, they tacitly accept higher risks and may reduce the robustness and reliability of their software. Whereas a risk register typically records proposed solutions to specific known software vulnerabilities, Table 6.1 depicts business rules that have been accepted for implementation to make software more robust. Given these business rules, the flowchart in Figure 6.3 depicts program flow, including the happy trail.

Figure 6.3 Program Flow and Happy Trail

Once business rules within the risk management strategy are established that specify how specific threats and vulnerabilities will be remedied, the actual technical solutions can be designed and developed. In this scenario, to make the process more robust, a mirrored SAS library PERM2 is created on a separate drive. When the PERM.Original data set is missing or exclusively locked by a user or process, the program will automatically switch to the backup. While the backup may have some outdated data, in this scenario, day-old donuts are better than no donuts at all, and using day-old data can still maximize business value given the exception of a missing or locked primary data set.

The input data set is also now tested before the DATA step to ensure that the variable VAR1 exists, so that the MEANS procedure does not later fail on a missing variable. And, finally, post hoc exception handling is added to ensure that if either the DATA step or the MEANS procedure produces warnings or runtime errors, the macro terminates via the %RETURN statement:

%macro makemeans();
%let syscc=0;
%global makemeansRC;
%let makemeansRC=GENERAL FAILURE;
%local dsid;
%local close;
%local dsn;
%local vars;
%local i;
%local found;
%let found=NO;
%let dsn=perm.original;
* test availability and access;
%let dsid=%sysfunc(open(&dsn));
%if %eval(&dsid<1) %then %do;
   * primary is missing or locked so switch to the backup;
   %let dsn=perm2.original;
   %let dsid=%sysfunc(open(&dsn));
   %if %eval(&dsid<1) %then %do;
      %let makemeansRC=data locked or missing;
      %return;
      %end;
   %end;
* confirm that VAR1 exists in the input data set;
%let vars=%sysfunc(attrn(&dsid, nvars));
%do i=1 %to &vars;
   %if %upcase(%sysfunc(varname(&dsid,&i)))=VAR1 %then %let found=YES;
   %end;
* close the data set before any exit so it is not left open;
%let close=%sysfunc(close(&dsid));
%if &found=NO %then %do;
   %let makemeansRC=variable not found;
   %return;
   %end;
data final;
   set &dsn;
   * transformations;
run;
%if &syscc>0 %then %do;
   %let makemeansRC=data step failure;
   %return;
   %end;
proc means data=final;
   var var1;
run;
%if &syscc>0 %then %do;
   %let makemeansRC=proc means failure;
   %return;
   %end;
%else %let makemeansRC=;
%mend;
%makemeans;
%put RC: &makemeansRC;

The revised %MAKEMEANS macro now also produces a return code, &MAKEMEANSRC, that demonstrates program success or failure, which should be validated by subsequent or dependent processes. In this example, a blank return code represents success while other return codes demonstrate specific threats that lead to failure. While the current return code does not include any indication of whether the backup data set is used rather than the primary, this logic could be added to provide additional performance metrics.
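Validation of a return code by a parent process might look something like the following sketch. The names %CHILD_STUB, %PARENT, &CHILDRC, and &PARENTRC are hypothetical; %CHILD_STUB merely stands in for a macro such as %MAKEMEANS so that the example runs on its own, and its FAIL parameter exists only to simulate a failure.

```sas
* stand-in for a child macro that sets a global return code;
%macro child_stub(fail=NO);
%global childRC;
%let childRC=GENERAL FAILURE;
%if &fail=YES %then %do;
   %let childRC=data locked or missing;
   %return;
   %end;
%let childRC=;
%mend;

%macro parent(fail=NO);
%global parentRC;
%let parentRC=GENERAL FAILURE;
%child_stub(fail=&fail);
* a blank child return code represents success;
%if %length(&childRC)>0 %then %do;
   %let parentRC=child failed: &childRC;
   %return;
   %end;
* dependent processes would run here;
%let parentRC=;
%mend;
%parent(fail=YES);
%put RC: &parentRC;
```

Because the parent initializes its own return code to a failure value and clears it only at the very end, any unexpected exit still registers as a failure to whatever process called the parent.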

The revised program flow now reflects two different exception handling pathways—methods or techniques used to handle specific failure types. If the PERM.Original data set is missing, the process flow switches to a backup data set that can deliver full or nearly complete functionality. When other exceptions or errors are encountered, however, the process flow terminates the macro and writes the exception in the &MAKEMEANSRC return code. The following “Exception Handling Pathways” section describes these and other pathways that SAS developers should keep in their tool belt to ensure maximum value is achieved from software regardless of variability that is encountered.

To be clear, the revised %MAKEMEANS macro is intentionally coded without attention paid to static performance attributes to more clearly demonstrate exception handling logic. This exposes the frequent tradeoff between software readability and other quality characteristics that occurs not only in production software but predominantly throughout software development literature. Thus, the static performance characteristics of this code can be substantially improved through increased focus on maintainability, modularity, stability, testability, and reusability principles.

Exception Handling Pathways

When an exception occurs, software must first detect and subsequently handle it. As most commonly demonstrated in SAS literature, once a runtime error or exception is detected, the process produces an exception report (demonstrated and disparaged in the next section), the macro is terminated via the %RETURN statement, or the software is terminated with the %ABORT, ABORT, or ENDSAS statement. The use of these methods in literature is not intended to imply that the exception would be handled similarly in production software, but is intended to improve readability; thus, the handling portion of exception handling is rarely demonstrated in literature. This practice is true throughout this text, where %RETURN is commonly implemented to simulate return to some parent process that is not depicted.

Despite the overuse of %RETURN to simulate dynamic exception handling, several exception handling pathways can be harnessed, many of which allow delivery of full or partial functionality despite exceptional conditions. The following exception handling pathways are introduced and demonstrated in detail in a separate text by the author: Ushering SAS® Emergency Medicine into the 21st Century: Toward Exception Handling Objectives, Actions, Outcomes, and Comms:6

  • Undetected Success—The user is unaware of the exception, and full functionality is delivered.
  • Rerouted Success—Program flow is rerouted back to the happy trail, so full functionality is delivered but possibly with some delay.
  • Reattempted Success—An exception occurs initially, but after waiting some time period, the process is reattempted and succeeds.
  • Prompt User—More common in applications development, the user is notified of the exception and provides some input that allows the process flow to continue.
  • Partial Success—Program flow is rerouted around exceptions but some functionality is compromised.
  • Process Termination—The process must be terminated but program flow can transfer to independent, unrelated processes.
  • Program Termination—The program must be terminated but exits gracefully.

In many cases, more dynamic (and beneficial) exception handling will require multiple, redundant exception handling pathways. This concept is demonstrated in the %MAKEMEANS macro in the “Happy Trail” section. For example, a backup of the PERM.Original data set may exist as PERM2.Original. If the first data set is locked or missing, access to the backup data set is attempted. If successful, this second attempt is an example of rerouted success, in which full functionality is delivered, or possibly partial success, in the case that the backup data set represents day-old data.

Regardless of the type and number of exception handling pathways that are implemented within a code module, however, a fail-safe path should always provide a route terminating the process (or program) should all other exception handling strategies fail. For example, because the backup data set could also be missing, robust exception handling would also need to account for this risk. This represents the Hail Mary—the emergency satellite phone tucked away in the glove compartment that you hope never to need.
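As one illustration of these pathways, the following sketch combines reattempted success with a fail-safe path: it tries to open a data set, waits and retries if the attempt fails, and gives up after a fixed number of tries. The macro name %OPENWAIT, its parameters (DSN, MAXTRIES, WAITSEC), and the &OPENWAITRC return code are all hypothetical, and the Demo data set is created only so that the example is self-contained.

```sas
data work.demo;
   x=1;
run;

%macro openwait(dsn=, maxtries=3, waitsec=5);
%global openwaitRC;
%let openwaitRC=GENERAL FAILURE;
%local dsid try rc;
%let try=1;
%let dsid=0;
%do %until(&dsid>0 or &try>&maxtries);
   %let dsid=%sysfunc(open(&dsn));
   * wait before the next attempt unless this was the last one;
   %if &dsid<1 and &try<&maxtries %then %let rc=%sysfunc(sleep(&waitsec,1));
   %let try=%eval(&try+1);
   %end;
%if &dsid>0 %then %do;
   * data set is available so release it and report success;
   %let rc=%sysfunc(close(&dsid));
   %let openwaitRC=;
   %end;
%else %let openwaitRC=data locked or missing;
%mend;
%openwait(dsn=work.demo, maxtries=3, waitsec=5);
%put RC: &openwaitRC;
```

Note that the data set is closed before the macro ends, so another process could still lock it before a subsequent DATA step runs; post hoc checks of &SYSCC remain necessary after the data set is actually used.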

Exception Handling, Not Exception Reporting!

Exception reporting is a quality control method that detects and reports exceptions that occur but stops short of actually handling software exceptions. The SAS log is an example of an exception report when it includes notes, warnings, and runtime errors that occur during execution. Thus, while the log is static and doesn't alter program flow, it represents a historical record of some exception types that occur during execution. Other exception reporting can include additional comments, data, and information conveyed to the log through %PUT or PUT statements, to ODS output through PRINT, REPORT, and other procedures, and to SAS data sets or external files.

Exception handling is a quality assurance method that can improve the functionality and performance of software. Commonly used to detect exceptions, runtime errors, and software defects, exception handling is distinguished from exception reporting because exception handling not only detects but also responds to exceptions by dynamically altering program flow. Exception reporting, in contrast, is a static quality control method; it only tells you that an exception was encountered, but does nothing to fix or mitigate it. Exception reporting is more commonly used to detect exceptions that occur in data sets. Especially in data analytic languages, exception handling is more closely associated with software quality while exception reporting is associated with data quality.

In SAS literature, exception reporting is often demonstrated in place of exception handling because of ease of implementation and increased readability of code. For example, the following SAS code prints a note (i.e., a very tiny exception report) to the SAS log if the Original data set is missing:

%macro reporting();
%let syscc=0;
data final;
   set original;
run;
%if &syscc>0 %then %put data set failed!;
%mend;

If Original is missing, however, the exception reporting does little more than the out-of-the-box SAS runtime errors that are printed alongside it. In fact, it's nearly lost in the fray!

%reporting;
ERROR: File WORK.ORIGINAL.DATA does not exist.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.FINAL may be incomplete.  When this step was stopped there were 0 observations and 0 variables.
WARNING: Data set WORK.FINAL was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.08 seconds
      cpu time            0.01 seconds
data set failed!

Not all exceptions create runtime errors and, especially where exceptional data represent outlier, missing, or other invalid data, exception reports are commonly created to record these data mishaps. The following code identifies exceptional data (invalid months) that occur in the Months data set:

data months;
   infile datalines delimiter=',';
   length month $10;
   input month $;
   datalines;
January
Januarry
February
March
march
April
april
May
Mayy
June
;
run;
%macro check();
data final exceptions;
   set months;
   if upcase(month) in
      ('JANUARY','FEBRUARY','MARCH','APRIL','MAY','JUNE','JULY','AUGUST',
      'SEPTEMBER','OCTOBER','NOVEMBER','DECEMBER') then do;
      * do some transformations;
      output final;
      end;
   else output exceptions;
run;
%mend;
%check;

Although exception reporting should never be used in isolation to facilitate increased software quality, it can be extremely useful in supporting data quality. As demonstrated in the previous code, especially where quality controls not only demonstrate exceptional data (through exception reporting) but also remove them from data sets, this quality control function can prevent subsequent software failure. For example, by removing invalid months from the data set, a subsequent REPORT procedure might be able to execute with more reliability and accuracy. In this sense, exception reporting that also removes or modifies exceptional data can be considered to be a quality control wrapped within a quality assurance framework.

While often demonstrated as a proxy for exception handling throughout SAS literature, exception reporting should never replace exception handling in production software. Moreover, where exception handling is implemented within a robust system, exceptional data should be reported to a dynamic construct (such as a SAS data set) rather than the SAS log. In saving exceptional data to a data set, subsequent programmatic actions can assess, modify, delete, or further communicate the exceptional data as necessary.
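A subsequent programmatic action of this kind might be sketched as follows: the number of observations in an exceptions data set is counted, and program flow is altered when too many exceptional values are found. The macro name %ROUTEEXCEPTIONS, the MAXBAD threshold, and the &ROUTERC return code are hypothetical, and the small Exceptions data set is recreated here only so that the example runs on its own.

```sas
data exceptions;
   length month $10;
   input month $;
   datalines;
Januarry
Mayy
;
run;

%macro routeexceptions(dsn=exceptions, maxbad=5);
%global routeRC;
%let routeRC=GENERAL FAILURE;
%local dsid nbad close;
%let dsid=%sysfunc(open(&dsn));
%if &dsid<1 %then %do;
   %let routeRC=exception data set unavailable;
   %return;
   %end;
* count the exceptional observations and release the data set;
%let nbad=%sysfunc(attrn(&dsid,nlobs));
%let close=%sysfunc(close(&dsid));
%if &nbad>&maxbad %then %do;
   %let routeRC=too many exceptional values (&nbad);
   %return;
   %end;
* few enough exceptions to continue on the happy trail;
%let routeRC=;
%mend;
%routeexceptions(dsn=exceptions, maxbad=5);
%put RC: &routeRC;
```

With two exceptional values and a threshold of five, the return code is blank and dependent processes could proceed; lowering MAXBAD to 1 would instead terminate the macro with a descriptive return code.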

ROBUSTNESS IN THE SDLC

The differentiation between software reliability and robustness is one that should be made early in software planning and design, and that can spur realism within later technical requirements. Perfect reliability hopes to achieve functional and performance objectives all the time, but robustness and fault-tolerance acknowledge that internal and external sources of variability (to software) do exist, will be encountered, and will cause failure. For example, reliability metrics record a customer's desire to have an ETL process complete in under an hour every day while robustness—codified through risk management business rules—tacitly acknowledges some defeat, stating that delayed or partial results (or business value) are better than none at all. Moreover, robustness also demonstrates that preventing invalid results may be as valued as facilitating valid results.

While reliability metrics will be more commonly included in software requirements, the risk management and realism inherent in robust design must be discussed and should also influence requirements. As demonstrated previously in the “Happy Trail” section, achieving robustness capable of diverting program flow to achieve partial business value can be exceedingly complex, but can also endear software products to stakeholders because they will succeed where lesser programs fail. Because of the high cost, a risk assessment should always demonstrate that the increased reliability warrants the effort, and a failure log should be maintained to ensure that where identified exceptions or failures occur, they follow business rules prescribed through the risk management framework.

Requiring Robustness

Robustness is often specified through use of a risk register, as depicted in the “Risk Register” section in chapter 1, “Introduction.” For example, Table 6.1 in the “Happy Trail” section demonstrates specific threats to software, such as missing, locked, or invalid data sets. During planning and design, stakeholders must decide which risks to accept and which risks to eliminate to achieve the desired level of robustness and reliability. Risk management and risk resolution are introduced in the “Risk Management” section in chapter 1, “Introduction.”

Some specific threats to software are the focus of later chapters. Portability, for example, describes the robustness of software to function across different environments. Scalability describes the robustness of software to function efficiently when confronted with big data or too many users. These threats also can be recorded in technical requirements or a risk register. For example, stating that software is only intended to operate in the SAS Display Manager 9.4 for Windows defines the landscape in which SAS practitioners must develop software. While software that performs too slowly but completes accurately does not diminish robustness, when software fails to scale to big data and terminates with runtime errors, these functional failures do make software less robust.

One of the simplest ways of facilitating some degree of robustness is to require that software identify general failures by requiring fail-safe post hoc exception handling. Thus, by checking the value of &SYSERR, &SYSCC, and other automatic macro variables at the close of every child process, module, macro, or program, software can detect exceptions that require termination. Implementation of this fail-safe path alone will often not provide additional business value, but it can ensure that software failures don't beget subsequent cascading failures.

Measuring Robustness

It's difficult to measure robustness without measuring reliability, and in fact the primary goal of robustness is inherently to improve software reliability and availability, thus maximizing business value. Unlike reliability, however, robustness can be measured only through analysis of exceptions, warnings, and errors that occur during operation. Thus, if software executes reliably despite incurring variability along the road, robustness has succeeded; however, if software executes reliably because no variability is encountered, then no assessment of robustness can be made.

Robustness is most readily assessed by comparing the failure log to the risk register (or other technical requirements that specify risk management business rules that software should follow). If business rules state that software should be robust to missing or locked data sets, then the failure log should not demonstrate failures caused by these exception types. If those failure types are apparent in the failure log, then the required level of robustness has not been achieved, because business rules were not followed. While this analysis may sound like an onerous process, where the risk register and failure log are both maintained in a standardized format (such as a SAS data set), this type of robustness analysis can be achieved through a repeatable, programmatic solution.
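A repeatable comparison of this kind might be sketched as follows. The data set names (Riskregister, Failurelog, Robustnessgaps), their variables, and every value in them are hypothetical and exist only for illustration; the sketch further assumes that the failure log records only events that ended in failure and that threats are described with the same text in both data sets.

```sas
* hypothetical risk register with one business rule per threat;
data riskregister;
   infile datalines delimiter=',';
   length threat $40 solution $20;
   input threat $ solution $;
   datalines;
data set locked,Switch to backup
data set missing,Switch to backup
no observations,ACCEPT
;
run;

* hypothetical failure log with one row per recorded failure;
data failurelog;
   infile datalines delimiter=',';
   length threat $40;
   input dt :date9. threat $;
   format dt date9.;
   datalines;
01JAN2016,data set locked
02JAN2016,no observations
03JAN2016,disk full
;
run;

* keep failures that were not accepted risks or that have no rule at all;
proc sql;
   create table robustnessgaps as
   select f.dt, f.threat, r.solution
   from failurelog f left join riskregister r
   on upcase(f.threat)=upcase(r.threat)
   where r.solution is missing or upcase(r.solution) ne 'ACCEPT';
quit;
```

In this made-up example, the locked data set appears in the result because business rules required a switch to the backup yet a failure was still logged, and the full disk appears because no business rule addresses it; the accepted risk is excluded.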

WHAT'S NEXT?

Robustness can't guarantee reliability, but it can guard against specific, predictable threats as well as general threats that may be unpredictable or unpreventable. Even robust, reliable software may be useless, however, if it fails to meet other performance objectives, such as execution time thresholds. The next two chapters introduce execution efficiency (i.e., software speed) and efficiency—arguably the most sought-after performance objectives.

NOTES
