Chapter 16
Testability

I didn't know it yet, but I'd torn my right meniscus to shreds.

Four days, 26 miles, gear in tow, summiting a 13,000-foot peak—the Inca Trail is no marathon, but it's also no joke.

We had booked the trek through SAS Travel—naturally—a stellar tour company albeit with no connection to Cary, North Carolina. Despite a leisurely backpacker's lifestyle for a few months, I thought I had trained adequately for the jaunt.

While in Guatemala, I'd joined the Antigua Gym, a converted 17th-century Spanish Colonial house. Even in the open courtyard, tarps had been pitched and staked to a central mast like the big top of an itinerant circus, the now-quiet, centuries-old plaster fountain oddly the centerpiece of modern fitness equipment.

Despite my high-energy start, more than 4,000 miles spent on buses from Guatemala to Peru had tempered my fitness regimen. I knew I was in trouble when in Quito, Ecuador, I was out of breath just walking around town at 12,000 feet.

The trek itself was indescribable, up one mountain and down the next, winding along cliffs, bridges, worn stone steps, and muddy paths rutted from centuries of use. Hypnotized by the trail, placing one foot steadfastly ahead of the other like our pack mule companions, we plodded along, pausing only for photos, snacks, and jungle restrooms.

We wore rain ponchos at first, believing somehow that the ephemeral plastic would keep us dry—yet even during the few, brief rainless respites, the sempiternal fog cloaked us in moisture. By the second day, I'd given up even wearing a shirt.

And the porters wore less, often bounding up and down the stepped landscape barefoot carrying packs two-thirds their size, the pitter-patter of their feet signaling weary gringos to briefly step aside.

And finally, day four: the end of the line. Emerging into Machu Picchu, the landscape is so overwhelming you don't even know where to point your camera. And when you do, the images don't just look surreal—they look fake.

Although my knees were screaming, one final summit remained—Huayna Picchu—the jagged peak jutting behind Machu Picchu in every classic photo. There would be a week of painkillers and ice to come—and surgeries to follow—but, undaunted, we set out and conquered Huayna.


In planning for the Inca Trail, I thought I'd adequately trained through jogging, stair climbers, and a pretty extensive daily workout routine. Training not only strengthens your body but also enables you to better understand your weaknesses—testing those vulnerabilities that might more easily be exploited through fatigue, stress, or injury.

Software development is no different and, while software doesn't get stronger simply by being executed, it can garner increased trust of stakeholders when testing demonstrates reliability. Moreover, testing can expose vulnerabilities that pose operational risks. If your legs are sore after a mile jog uphill, there's a pretty good chance you're not going to survive hiking seven miles a day.

An important component of any training (or testing) regimen is to ensure it's done in a realistic environment. While jogging and stair climbers had been necessary cardiovascular training, the actual terrain I faced was tremendously varied—there was nothing routinized about the grade, texture, or speed of the trail. Not having trained for that variability, my knees were not happy.

In other cases, risks are identified that are accepted. For example, in Quito I realized that I would have some issues with the altitude of the trek but, rather than strapping on an elevation mask for hypoxic training while chicken busing, I chose to deal with the elevation when I encountered it, thus accepting the risk. Similarly, in software development there may be components that are too difficult to test or for which the costs outweigh the benefits, in which case those risks are accepted.

In addition to variability, there was the sheer duration of the hike. An hour at the gym—even if a machine could somehow replicate the arduous climbing environment—still would only have been an hour. Muscle training and strength training, yes; endurance training that prepares a body to climb up and down mountains all day, no. Data analytic testing, similarly, must account for not only the variability of data but also the size. The degree to which you can replicate the anticipated operational environment will make testing that much more effective.

At least I'd thoroughly tested my equipment, having worn my backpack for months. The weight was perfectly distributed, straps snug, and I was barely cognizant I was wearing it. Others, you could tell, either had just purchased their packs or were not accustomed to them, and were constantly stopping to tug straps, readjust, and redistribute items.

In software development you also have to test the equipment you build. Does a module perform well individually? Does it perform well when integrated with other software? The hikers who initially struggled with their packs didn't have poor-quality packs—but they had never tested the packs on their backs as a combined unit.

DEFINING TESTABILITY

Testability is the “degree of effectiveness and efficiency with which test criteria can be established for a system, product, or component and tests can be performed to determine whether those criteria have been met.”1 Testability defines the ease with which functional testing, performance testing, load testing, stress testing, unit testing, regression testing, and other testing can be implemented successfully to demonstrate technical requirements. In many cases, both positive and negative testing are required, in that testing must demonstrate not only that software is doing what it should do, but also that software is not doing what it shouldn't do.

Test plans are commonly created during software planning and design in conjunction with technical requirements that are being generated. Through the inclusion of test cases, stakeholders effectively enumerate not only the functional and performance requirements that must be met for software acceptance, but also specifically how those requirements will be unambiguously demonstrated and measured before software release. Testability principles such as modularity and readability facilitate testing clarity and can yield tests that more effectively demonstrate software success and adherence to test plan and test case requirements.

Because software testing—and especially testing done through formalized test plans—is egregiously absent from many end-user development environments, this chapter describes, differentiates, and demonstrates multiple software testing modalities. It further describes the benefits of formalized test plans, including their critical role in software quality assurance plans, as well as their usefulness in creating a battery of test cases that can collectively demonstrate software success. While testability remains the quality characteristic that high-performing software should demonstrate, a culture of software testing must first exist to ensure benefits will be derived from delivering more testable software.

SOFTWARE TESTING

Software testing is “the dynamic verification of the behavior of a program on a finite set of test cases, suitably selected from the usually infinite executions domain, against the expected behavior.”2 Testing does two things: it demonstrates correct function and performance, and it identifies software vulnerabilities and the threats that could exploit them. Because of the critical role of testing in software development, the SDLC always contains a testing phase that validates software against requirements.

The primary role of testing is to validate whether software meets functional and performance objectives. Testing doesn't include the refinement and refactoring that occur naturally during development as SAS practitioners strive to correct defects and improve performance to meet software objectives. For example, a SAS practitioner might first write a process that relies on the SORT procedure to order a data set but, after “testing” the process, might refactor it to order the data with a data set index instead. This type of empirical testing can be integral to the development process, but it does not represent the testing phase of the SDLC. Testing instead examines software or software modules that are believed to be complete and validates them.
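
The sort-to-index refactoring described above could be sketched as follows, assuming a hypothetical PERM.Claims data set keyed on a PATIENT_ID variable:

* initial approach: physically sort the data set by the key;
proc sort data=perm.claims;
   by patient_id;
run;
* refactored approach: create an index so that BY-group processing;
* can occur without a physical sort;
proc datasets library=perm nolist;
   modify claims;
   index create patient_id;
quit;
data claims_by_patient;
   set perm.claims;
   by patient_id;   * the index satisfies the BY statement without sorting;
run;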

The secondary role of testing is to ensure that software is robust against sources of failure; thus, testing often requires both normal and exceptional injects and environmental characteristics. In other words, testing aims to uncover all vulnerabilities, including defects, software errors, and errors in technical requirements, that pose a risk to software. It's not enough to know that software is doing the right thing—it's equally important to be confident it's not doing the wrong thing.

Testers

Too frequently in software development literature, software testing seems to be performed by some elite warrior class of testers who operate as part of an independent quality assurance team. While these environments do exist, in reality, much software testing is performed by developers with concomitant software testing responsibilities. This cross-functionality, a commonly sought objective within Agile development, can effectively increase the skillset of developers and the team as a whole.

To be effective testers, however, developers must espouse a new perspective. Rather than focusing on developing quality software, their focus must turn to ways to undermine, antagonize, and otherwise threaten that software. When a team completes software, they should believe that they've produced a masterpiece—an amalgam of perfectly poised statements that will thrill the customer and meet all needs and requirements for the expected lifespan of the software. If latent defects or vulnerabilities do exist when the software is released, the developers should have documented these within the code or in a risk register, and the respective risks they pose should be known to and accepted by the customer.

Dedicated software testers, on the other hand, receive code, review software requirements, and seek to ensure that the former fulfills the latter, doing nothing more and nothing less. Demonstration of positive (or correct) behavior can be straightforward given adequate technical specifications, but testers must also identify negative behavior. In doing so, they must attempt to break software—assume that vulnerabilities exist, find them, exploit them, document them, and possibly suggest remedies. Rather than trying to create something beautiful, testers are trying to tear it apart. But their role and unique attitude are absolutely essential.

Developers can make excellent testers, but not necessarily for the software they have developed. Because testing immediately after development would require SAS practitioners to attack a masterpiece they had just created, a cooling-off period between development and testing can facilitate more objectivity. However, because this delay can thwart business value and deadlines, peer code reviews offer one form of testing in which developers swap code (and respective requirements documentation) for a more immediate quality assurance review.

Within end-user development environments, SAS practitioners themselves are both software developers and users. These environments often diverge sharply from SDLC best practices, for example, in teams for which no formal software testing phase or protocols exist, whose developers often “fix it on the fly.” Moreover, in end-user development environments, software integrity is more likely to be conveyed to customers and other stakeholders through faith in and friendship with developers than through formalized testing and test documentation. Even within end-user development environments, however, SAS practitioners can espouse and benefit from formalized software testing plans and methods. Thus, in addition to wearing the hats of developer and user simultaneously, end-user developers should additionally incorporate software tester into the litany of roles they perform.

Test Plans

A test plan is a “document describing the scope, approach, resources, and schedule of intended testing activities.”3 A test plan is beneficial because it formalizes the importance of testing activities within the SDLC, demonstrating a commitment to quality to and from all stakeholders. While technical requirements convey the objectives that software must achieve, a formalized test plan conveys the methods and metrics to determine whether those objectives were achieved.

Comprehensive test plans should reference specific software requirements. For example, a requirement might state that a software module should be able to process 10 million observations efficiently while not exceeding a threshold level of system resources. In development, the module would have been executed, but not necessarily with the volume or variability of data throughput expected in software operation. Specifying within a test plan that load testing is required with a realistic volume of 12 million observations allows testers to demonstrate to stakeholders that software will sufficiently scale as required.

Many test plans require automated tests, which are essentially additional SAS programs that test and validate the actual software being developed. Automated testing is preferred because results represent a repeatable, defensible product that can be submitted to the customer with the final software product. Where test plans are required to be submitted to a customer or saved for posterity, test data and test cases should also be archived with the automated testing software.

Test Cases

A test case is a “set of inputs, execution conditions, and expected results developed for a particular objective, such as to exercise a particular program path or to verify compliance with a specific requirement.”4 Test cases are used to test the infinite with the finite. Therefore, while the aggregate variability facing software may be multifaceted and fairly limitless (when environmental factors and data variability are included), a finite set of test cases should be extracted that, when cumulatively executed, adequately ensures that all aspects of the software have been tested.

Test cases allow developers to maintain autonomy over the design and development of software, but provide an enumeration of standards against which that software will ultimately be judged. A test plan often comprises an enumeration of test cases that must be demonstrated for the software to be validated and accepted. For example, because SAS comments can begin with an asterisk, the following test plan excerpt demonstrates test cases that can be checked off as software developers or testers complete the test plan:

* TEST: demonstrate an existent data set succeeds with ALL type;
* TEST: demonstrate an existent data set succeeds with CHAR type;
* TEST: demonstrate an existent data set succeeds with NUM type;
* TEST: demonstrate an existent data set succeeds with the type missing;
* TEST: demonstrate an existent data set fails with GIBBERISH type;
* TEST: demonstrate a missing data set fails;

In the following “Unit Testing” section, these test cases are expanded with test code that demonstrates success or failure of each test. Some formalized test plans additionally include specific return codes that should be created for successful and failed conditions.
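
For example, a return code convention might resemble the following sketch, which reuses the %VARLIST macro from chapter 15 and assumes both the Test_dsn data set created later in the “Unit Testing” section and a convention in which 0 indicates pass and 1 indicates fail:

%macro test_all_type;
%global test_all_type_RC;
* 0 indicates pass and 1 indicates fail (assumed convention);
%let test_all_type_RC=1;
%varlist(dsn=test.test_dsn, type=ALL);
%if &varlist=char1 char2 num1 num2 %then %let test_all_type_RC=0;
%mend;
%test_all_type;
%put TEST (ALL type) return code: &test_all_type_RC;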

Test Data

While test cases represent positive and negative conditions that software must successfully navigate to validate technical requirements, test data are the data that software evaluates to validate its function and performance during testing. In user-focused software applications, these data often represent user input as well as other information passed inside software. Thus, parameters passed to a SAS macro module or return codes from a child process also represent test data that could be used to validate test cases. For example, to validate a macro with one parameter, several test data might be required, each representing different parameter conditions—a correct value, an incorrect value, an invalid value, and a missing parameter.

In data analytic development environments, test data more commonly represent data sets or input that are ingested and processed. In some environments, especially those with low-volume data, actual data may be used as test data. This is common, for example, in clinical trials environments that have relatively small sample sizes. One risk of using real data to test software is the likelihood that real data, especially in smaller quantities, will not demonstrate the full variability of valid data described through data models or other business rules. For example, business logic might prescribe that SAS software take certain actions when an observation is encountered for a patient with epilepsy. However, if no epileptic patients exist in a data set, then the data are insufficient to validate test cases.

A second risk of using real data to test software is the likelihood that sufficient invalid data (i.e., those representing negative test cases) will not exist. For example, if a business rule specifies that a patient's sex must be either M or F in a data set, and if data constraints enforce this rule through conditional SAS logic, that logic can only be fully tested when valid (M and F) and invalid (H and J) test data are processed. Although actual data may be a good substrate on which to build test data sets, additional data must often be added to ensure that all positive and negative test cases are sufficiently represented. Caution should also be exercised when real and fictitious data are commingled to ensure that test data are never confused or integrated with actual data.
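
A minimal sketch of such a test data set follows, seeding both valid (M and F) and invalid (H, J, and missing) values of the SEX variable in a hypothetical Patients_test data set:

* test data seeded with valid and invalid values of the SEX variable;
data patients_test;
   length patient_id 8 sex $1;
   input patient_id sex $;
   datalines;
1 M
2 F
3 H
4 J
5 .
;
run;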

While Base SAS does not directly support data relationships indicative of relational databases, data sets are often constructed to represent relational database tables. Thus, integrity constraints and other business rules may effectively create an equivalent relational structure among SAS flat data sets. Where complex relationships do exist among data sets, test data will be commensurately complex. For example, if a one-to-many relationship exists between a primary key and foreign key in two data sets, test data should not only represent valid and invalid values but also valid and invalid relationships between the respective data sets to ensure that quality controls appropriately flag, expunge, delete, or modify the invalid values.
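
For example, a one-to-many relationship between hypothetical Patients and Visits data sets could be declared with integrity constraints, as in the following sketch in which the PERM library, key variable, and constraint names are assumptions:

* declares a primary key on PATIENTS and a foreign key on VISITS;
proc datasets library=perm nolist;
   modify patients;
      ic create prim_key=primary key (patient_id);
   modify visits;
      ic create for_key=foreign key (patient_id) references perm.patients
         on delete restrict on update restrict;
quit;

Test data for these two data sets should then include visits that reference both existent and nonexistent patients so that testing can demonstrate that the constraint rejects orphaned foreign key values.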

Test data sets tend to accrete over time as additional variability is discovered or incorporated within operational data sets. For example, a development team might not consider during initial planning and design how to handle duplicate (i.e., invalid) values in a particular data set. When a duplicate value is first encountered, the error is detected, stakeholders establish a business rule to prescribe error handling, a test case is created, and test data should be modified to include the exceptional data so that updated software can demonstrate that it identifies and handles duplicate data. Because test cases and test data tend to increase over time—even after software release—it can be beneficial to archive test code, test cases, test data, and test results with software whenever production software is validated. This versioning enables stakeholders, if necessary during a review or audit, to recreate historical testing requirements, conditions, and results.
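
As a sketch of this accretion, the hypothetical Patients_test data set shown earlier could later be extended with a duplicate observation once stakeholders establish the deduplication business rule:

* appends a duplicate of the final observation to the test data;
data patients_test;
   set patients_test end=eof;
   output;
   if eof then output;
run;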

Testing Modalities

A multitude of software testing modalities exist, many of which overlap each other in intent, function, and technique. The majority of testing activities understandably occur within the testing phase of the SDLC. Some testing also continues into software operation, especially where development and test environments can't fully replicate the production environment. As defined by the International Organization for Standardization (ISO), common types of software testing include:

  • Informal Testing—“Testing conducted in accordance with test plans and procedures that have not been reviewed and approved by a customer, user, or designated level of management.”5
  • Formal Testing—“Testing conducted in accordance with test plans and procedures that have been reviewed and approved by a customer, user, or designated level of management.”6
  • Automated Testing—Any type of software testing performed in which tests are conducted by repeatable programs that validate or invalidate functionality or performance of the original software.
  • Unit Test—“1. Testing of individual routines and modules by the developer or an independent tester. 2. A test of individual programs or modules in order to ensure that there are no analysis or programming errors.”7
  • Branch (Path) Testing—“Testing designed to execute each outcome of each decision point in a computer program.”8
  • Functional Testing—“Testing that ignores the internal mechanism of a system or component and focuses solely on the outputs generated in response to selected inputs and execution conditions; testing conducted to evaluate the compliance of a system or component with specified functional requirements.”9
  • Performance Testing—“Testing conducted to evaluate the compliance of a system or component with specified performance requirements.”10
  • Load Testing—Testing to evaluate a component at the anticipated load, demand, or usage level. While the ISO definition of stress testing incorporates load testing, many software texts differentiate load testing as testing at but not exceeding the anticipated volume of data, demand, users, or other attributes.
  • Stress Testing—“Testing conducted to evaluate a system or component at or beyond the limits of its specified requirements.”11
  • Integration Testing—“Testing in which software components, hardware components, or both are combined and tested to evaluate the interaction among them.”12
  • Regression Testing—“Selective retesting of a system or component to verify that modifications have not caused unintended effects and that the system or component still complies with its specified requirements; testing required to determine that a change to a system component has not adversely affected functionality, reliability, or performance and has not introduced additional defects.”13
  • Acceptance (Operational) Testing—“Testing conducted to determine whether a system satisfies its acceptance criteria and to enable the customer to determine whether to accept the system.”14

Informal Testing

The ISO definition of informal testing differs from formal testing only in that informal test plans and testing procedures have not been “reviewed and approved by a customer, user, or designated level of management.”15 But this supposes that test plans and procedures do exist, which may not be the case in some environments. Informal testing, especially in end-user development environments, can include developers reviewing the output of some process under ideal conditions and with ideal data injects. This type of manual inspection essentially provides face validity but not integrity. That is, the software prima facie appears to perform as it should and is believed to be correct—until demonstrated otherwise.

Informal testing is adequate for some purposes and environments, but its convenience underlies its vulnerabilities and detractors. When informal testing is not documented, developers, customers, and other stakeholders must essentially take someone's word that the code “looks right” or “ran correctly,” but this informal process is neither repeatable nor defensible. For example, as demonstrated in the “Plan to Get Hit by a Bus” section in chapter 15, “Readability,” when poor Thomas fails to show up to work one day, because his test plans and test results are documented nowhere, his quality assurance efforts cannot be demonstrated and testing must redundantly be performed again. When no documentation of testing exists, stakeholders are essentially placing their trust only in the developer performing the testing rather than appropriately in the integrity of the SDLC and formal testing procedures.

The ISO distinction between informal and formal testing is the review and approval process of testing procedures. The specification of technical requirements during software planning is intended to ensure that all stakeholders have a clear, unambiguous vision of required functionality and performance. However, if a performance requirement specifies that software must efficiently scale to handle 10 million observations and be portable between the SAS Display Manager and SAS Studio environments, then a test plan should exist that details how those requirements will be demonstrated to the customer. Test plans do not have to be lengthy documents; in this example, a single statement could indicate that the software would be required to produce identical results and performance in the two SAS environments. In formal testing environments, test cases would be required to state specifically how equivalent function and performance would be both defined and measured.

Formal Testing

Formal testing specifies not only that a test plan and procedures exist, but also that the plan has been accepted by stakeholders. Where software technical requirements exist and customer acceptance of software is required, formal software testing can facilitate the acceptance process by providing concrete methods and metrics against which performance is tested. Moreover, those performing the testing will understand that their work is part of an organized, empirical process as they carry out the test plan. Thus, the formal testing process ensures that developers, customers, and other stakeholders agree on the methodology that will evaluate whether software is complete—often referred to anecdotally as the definition of done.

Automated Testing

Automated testing (or test automation) describes software that tests other software. This includes third-party open source or commercial-off-the-shelf (COTS) test software, as well as test software developed internally by developers and testers themselves. The benefits of automated testing are numerous, and within many development environments, “testing” is synonymous with “automated testing” in that all tests must be automated. Automated testing can be run with little effort and provides structured, repeatable, defensible results that can validate software or uncover defects or vulnerabilities. Moreover, automated tests can be archived with code and test data so that test conditions (and results) can be replicated if necessary for an audit.

Automated testing is the standard for third-generation languages (3GLs) such as Java or Python. Despite its prevalence in other languages, and in part because third-party testing software is not widely available for Base SAS, automated testing is rarely described in SAS literature and likely to be implemented on a limited scale in few environments. The quality of SAS software can be substantially improved through automated testing, however, in part because of the vulnerabilities that tests can uncover, and in part because automated testing encourages SAS practitioners to embrace a testability mind-set throughout planning, design, and development. SAS practitioners who develop within environments that espouse automated testing typically know what tests their software will have to pass even before they begin developing software, which helps frame design and development activities.

Two of the most common automated tests are unit testing and functional testing. Unit testing inspects an individual module of software, typically irrespective of other software elements or functionality. For example, a unit test for a SAS macro might validate how the module handled valid and invalid data and parameter input. Especially where the unit being tested represents a SAS macro, unit tests can be conceptualized as fake parent processes that call the SAS macro to validate specific test cases. Functional testing conversely examines only the big picture—software inputs and outputs—irrespective of what steps occur in the middle. In a sense, functional testing describes a unit test at the software level. Automated unit testing and functional testing are demonstrated later in the “Unit Testing” and “Functional Testing” sections, respectively.

Automated tests can be written before software is developed, as is central in test-first development and test-driven development (TDD) environments. Automated tests can also be written in tandem with or immediately after code within the development phase, or after software is completed in a separate testing phase. Especially in Agile and other rapid-development environments, writing automated tests often occurs with or immediately following development because software components must be designed, developed, tested, accepted, and released all within a time-boxed iteration. This proximity of development and automated testing contrasts with phase-gate Waterfall environments in which automated tests may be written weeks or months after software development has occurred.

Regardless of the manner in which automated testing is implemented, all automated tests should be functional when software is validated and accepted. In environments that utilize automated testing, the degree to which software morphs throughout the SDLC will require that automated tests are also refined to ensure they accurately represent software behavior before software acceptance. For example, if the functional intent of a SAS macro changes after production or the macro later incorporates additional parameters, these changes should be reflected in commensurately updated automated unit testing.

Unit Testing

Unit testing tests the smallest functional module of software. Under the principles of software modularity, modules should be functionally discrete, loosely coupled, and encapsulated. Thus, testing a unit in theory is straightforward, because it tests a small portion of code that should lack complexity. Unit testing is often referred to as white-box testing because the inner mechanics of modules are exposed and validated, as contrasted with functional testing, referred to as black-box testing. To be clear, however, white-box testing can also refer to other invasive species of testing, such as branch testing, integration testing, or regression testing.

In SAS development, unit testing typically refers to testing a single macro, DATA step, or batch job, so long as modular design principles were espoused. If your macro is 150 lines, however, this may include composite—not discrete—functionality, more closely representing monolithic than modular design. While it is possible to functionally test monolithic code, it is definitively impossible to unit test a complex software module with composite functionality. Just as modular software is expected to be loosely coupled, unit tests should also have as few interdependencies with other unit tests as possible. Modular software design and functional decomposition to identify and isolate discrete functionality are discussed throughout chapter 14, “Modularity.”

Consider the %VARLIST macro, demonstrated in the “Macro Comments” section in chapter 15, “Readability”:

* creates a space-delimited list of all variables within a data set;
***** does not validate existence of the data set;
***** assumes a shared lock can be obtained on the data set;
***** does not validate the TYPE parameter;
%macro varlist(dsn= /* data set name in LIB.dataset or dataset format */,
   type= /* ALL, NUM, or CHAR, the type of variables returned */);
%local vars;
%local dsid;
%local i;
%global varlist;
%let varlist=;
%let dsid=%sysfunc(open(&dsn, i));
%let vars=%sysfunc(attrn(&dsid, nvars));
%do i=1 %to &vars;
   %let vartype=%sysfunc(vartype(&dsid,&i));
   %if %upcase(&type)=ALL or (&vartype=N and %upcase(&type)=NUM) or
         (&vartype=C and %upcase(&type)=CHAR) %then %do;
      %let varlist=&varlist %sysfunc(varname(&dsid,&i));
      %end;
   %end;
%let close=%sysfunc(close(&dsid));
%mend;

Unit testing specifies that the smallest module should be tested, but it doesn't specify the type of test that is being performed. For example, if conditional logic branching occurs within a unit, branch testing might be performed as part of a unit test. Functional testing is also typically included because units should have some function or purpose, even if they are only called as a child process. Unit testing, like many other types of testing, overlaps significantly with other testing modalities but specifies that each module should be tested thoroughly and severally.

The following code demonstrates three unit tests that show the expected output from the %VARLIST macro:

* TEST GROUP: pass existent data set with valid and invalid data types;
libname test 'c:\perm';
%macro test_varlist;
data test.test_dsn;
   length char1 $10 char2 $10 num1 8 num2 8;
run;
* TEST: demonstrate an existent data set with ALL type;
%varlist(dsn=test.test_dsn,type=ALL);
%if &varlist=char1 char2 num1 num2 %then %put ALL: PASS;
%else %put ALL: FAIL;
* TEST: demonstrate an existent data set with NUM type;
%varlist(dsn=test.test_dsn,type=NUM);
%if &varlist=num1 num2 %then %put NUM: PASS;
%else %put NUM: FAIL;
* TEST: demonstrate an existent data set with missing type;
%varlist(dsn=test.test_dsn,type=);
%if %length(&varlist)=0 %then %put Missing TYPE: PASS;
%else %put Missing TYPE: FAIL;
%mend;
%test_varlist;

If any of the tests produce a FAIL, this indicates an error in the module. In this example, three test cases are demonstrated. The first two occur under normal operation (i.e., positive test cases) while the third represents an exceptional event (i.e., negative test case) in which the TYPE parameter was omitted. Other unit tests would be required to test this module fully, but this represents a step in the right direction.
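
For example, the CHAR test case enumerated in the earlier test plan excerpt could be sketched as a fourth test and added inside the %TEST_VARLIST macro:

* TEST: demonstrate an existent data set with CHAR type;
%varlist(dsn=test.test_dsn,type=CHAR);
%if &varlist=char1 char2 %then %put CHAR: PASS;
%else %put CHAR: FAIL;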

Unit testing is invaluable because it tests discrete modules of software functionality and, in theory, unit tests should be sufficiently decoupled from each other to enable them to follow the code they validate. For example, if a modular SAS macro is developed and is flexible enough that it warrants inclusion in a reuse library—discussed in the “Reuse Library” section in chapter 18, “Reusability”—not only the macro but also its separate unit testing code and logs can be included in the reuse library to benefit future reuse and repurposing. Unit testing code and unit testing results are sometimes provided (with completed software) to customers and other stakeholders to validate functionality during software acceptance.

Branch Testing

Branch or path testing tests the sometimes-complex reticulations of conditional logic statements. In theory, all possible branches should be tested, but this can be difficult where branches represent environmental attributes rather than parameters that are passed. For example, if one branch of a program executes only when the software detects it is running in a UNIX environment, this can be difficult to fake when testing is occurring within Windows.
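
One way to make such branches testable is to parameterize the environmental check so that each branch can be faked during testing, as in the following hypothetical %CHECK_OS sketch, which defaults to the &SYSSCP automatic macro variable:

%macro check_os(os=&sysscp /* defaults to the actual operating system */);
%if %upcase(&os)=WIN %then %put Windows branch executed;
%else %put UNIX branch executed;
%mend;
* branch tests that fake each environment by overriding the parameter;
%check_os(os=WIN);
%check_os(os=LIN X64);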

To demonstrate branch testing for the %VARLIST macro described in the “Unit Testing” section, two additional unit tests can be added to demonstrate program flow when the TYPE parameter is either missing or invalid:

* TEST: demonstrate an existent data set with the type missing;
%varlist(dsn=test.test_dsn,type=);
%if %length(&varlist)=0 %then %put Missing: PASS;
%else %put Missing: FAIL;
* TEST: demonstrate an existent data set with an invalid type;
%varlist(dsn=test.test_dsn,type=BLAH);
%if %length(&varlist)=0 %then %put Invalid: PASS;
%else %put Invalid: FAIL;

When these additional tests are run within the structure of the %TEST_VARLIST macro, they demonstrate that either an empty TYPE parameter or an invalid TYPE parameter will produce a global macro variable &VARLIST that is empty. Coupled with the prior three unit tests, these five tests collectively demonstrate that the branch testing has passed.

One of the most critical uses for branch testing can be to test the fail-safe path to ensure that software fails gracefully in a secure manner. The fail-safe path is discussed in the “Fail-Safe Path” sections in chapter 11, “Security,” and is required when robust software encounters a runtime error or other exception from which it cannot recover. For example, when software fails, data sets may be explicitly locked, file streams may be open, or temporary or invalid data sets may have been created—each of which can require some action in the fail-safe path to return software to a secure state. Branch testing the fail-safe path is the only way to ensure that when software does fail, it does so as prescribed through business rules.
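
A fail-safe branch test might resemble the following sketch, in which every name is hypothetical: an %ETL_PROCESS module is forced to fail by referencing a nonexistent data set, after which the test verifies that the fail-safe path removed the temporary data set it is responsible for cleaning up:

%macro test_failsafe;
* force the exception by referencing a nonexistent data set;
%etl_process(dsn=test.does_not_exist);
%if %sysfunc(exist(work.etl_temp))=0 %then %put FAIL-SAFE: PASS;
%else %put FAIL-SAFE: FAIL;
%mend;
%test_failsafe;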

Functional Testing

Functional testing takes into account software inputs and outputs but does not test what occurs between these two extremes. For this reason, functional testing is often referred to as black-box testing because software (or, in theory, even developers) cannot view the module interior—only inputs and outputs are observable. The test cases in both the “Unit Testing” and “Branch Testing” sections are examples of functional testing at the module level. In general, however, functional testing refers to testing significantly larger software products having complex, composite functionality.

In the following example, the %GET_MEANS macro runs the MEANS procedure and saves the mean value of one or more variables to a data set. The macro calls the %VARLIST macro, which provides input for the VAR statement in the MEANS procedure, after which output is generated to the Means_temp data set:

data final;
   length num1 8 num2 8 char $10;
   char="n/a";
   do num1=1 to 10;
      num2=num1+10;
      output;
      end;
run;
* saves mean values of one or more variables to a data set;
%macro get_means(dsn=);
%varlist(dsn=&dsn,type=num);
proc means data=&dsn noprint;
   var &varlist;
   output out=means_temp;
run;
%mend;
%get_means(dsn=final);

To functionally test the %GET_MEANS macro, it's not necessary to directly test the inputs, outputs, or any other aspect of the %VARLIST macro. The functional test for %GET_MEANS should only evaluate the input (Final data set) and output (Means_temp data set) for the macro itself. The %VARLIST macro and other functionality inside %GET_MEANS essentially occur within the black box, as demonstrated in the following functional test:

* TEST GET_MEANS;
* show means in output data set are accurate;
proc means data=final noprint;
   var num1 num2;
   output out=test_get_means;
run;
%get_means(dsn=final);
proc compare base=test_get_means compare=means_temp;
run;

The COMPARE procedure validates that the MEANS procedure produces identical results to the %GET_MEANS macro. As demonstrated, when the functionality being tested is simple, such as that in the %GET_MEANS macro, the test itself may be as long as or longer than the code being tested. As the complexity of larger programs increases, however, functional testing code will remain relatively small because it relies only on primary inputs and outputs and does not reference or utilize secondary (or intermediary) inputs or outputs.

Some functions specified through technical requirements can only be tested and validated by inspecting software or its products. In data analytic development, for example, a data product such as a SAS report can be examined in part through automated testing if the REPORT procedure utilizes the OUT statement to create a testable data set. However, other functional requirements of the report such as formatting and aesthetic attributes can only be validated through manual inspection.
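
Reusing the Final data set created earlier, a minimal sketch of this technique directs the REPORT procedure output to a data set that the COMPARE procedure can then validate:

* validates report content through the OUT= data set;
proc report data=final nowd out=report_out;
   column num1 num2;
   define num1 / display;
   define num2 / display;
run;
proc compare base=final(keep=num1 num2) compare=report_out(drop=_break_);
run;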

Performance Testing

Performance testing tests performance rather than function, and includes load testing, stress testing, and typically all tests of speed, efficiency, and scalability. Performance testing is especially useful in data analytic development due to data variability, including fundamental differences that can exist between test data and production data. Load testing and stress testing are described later in their respective sections while methods to facilitate greater speed, efficiency, and scalability are described in chapter 7, “Execution Efficiency,” chapter 8, “Efficiency,” and chapter 9, “Scalability.”

Performance testing often involves testing software against a boundary value—that is, a value “that corresponds to a minimum or maximum input, internal, or output value specified for a system or component.”16 Boundary values can be used to facilitate secure software, for example, by detecting erroneous inputs that could lead to buffer overflows or other failures. More relevant to data analytic development, boundary values are used in technical requirements to specify data volume or velocity capabilities required by software. Load testing is distinguished from stress testing in that the former performs testing at boundary limits while the latter performs testing beyond boundary limits. In other words, successful load testing should be demonstrated before software acceptance, whereas stress testing places software in situations or environments to elicit when likely or inevitable failure will occur.

Performance testing is sometimes conceptualized to include reliability testing, although this lies in a grey zone that spans functionality and performance testing. If software functions but does so unreliably, its functionality will also be diminished when availability of the data product, output, or solution is compromised. For example, the %GET_MEANS macro demonstrated in the earlier “Functional Testing” section fails if the data set is exclusively locked by another user or process, which threatens the robustness and reliability of the software.

To detect the exception—a locked data set—an additional module (%DSN_AVAILABLE) is developed that tests the existence and availability of a data set. The OPEN function returns a 0 value for the &DSID macro variable if the data set does not exist or is exclusively locked:

* tests existence and shared lock availability of a data set;
%macro dsn_available(dsn= /* data set in LIB.DSN or DSN format */);
%global dsn_available_RC;
%let dsn_available_RC=;
%local dsid;
%local close;
%let dsid=%sysfunc(open(&dsn));
%if &dsid^=0 %then %do;
   %let dsn_available_RC=TRUE;
   %let close=%sysfunc(close(&dsid));
   %end;
%else %let dsn_available_RC=FALSE;
%mend;
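
Before the module is integrated, a brief unit test can confirm both return code branches. The following sketch assumes that the Final data set created earlier exists and that no Does_not_exist data set resides in the WORK library:

%macro test_dsn_available;
%dsn_available(dsn=final);
%if &dsn_available_RC=TRUE %then %put Existent data set: PASS;
%else %put Existent data set: FAIL;
%dsn_available(dsn=work.does_not_exist);
%if &dsn_available_RC=FALSE %then %put Missing data set: PASS;
%else %put Missing data set: FAIL;
%mend;
%test_dsn_available;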

By implementing the macro within the %GET_MEANS program and testing the value of the &DSN_AVAILABLE_RC return code immediately before attempting the MEANS procedure, robustness is improved because the MEANS procedure cannot be invoked on a missing or locked data set. Nevertheless, software reliability is not really improved because the software still terminates and business value is lost when a data set is locked. While software has been made robust to one type of exception, to be made more reliable, additional exception handling would be required to wait for the data set to be unlocked, provide a duplicate data set to fail over to, or provide some other mechanism through which business value could still be achieved.

The following revised %GET_MEANS macro includes the %DSN_AVAILABLE quality control:

* saves mean values of one or more variables to a data set;
%macro get_means(dsn= /* data set in LIB.DSN or DSN format */);
%dsn_available(dsn=&dsn);
%if &dsn_available_RC=TRUE %then %do;
%varlist(dsn=&dsn,type=num);
   proc means data=&dsn noprint;
      var &varlist;
      output out=means_temp;
   run;
   %end;
%else %put DATA SET MISSING OR LOCKED;
%mend;
%get_means(dsn=final);

After integrating the new module into the %GET_MEANS macro, integration testing should ensure the software functions as intended. This is demonstrated in the “Integration Testing” section later in the chapter, which—spoiler alert—uncovers a logic error in the %GET_MEANS macro.

Load Testing

Load testing describes testing software against data volume or demand (usage) described in software requirements. While load (demand) testing describes testing a software application to ensure it functions effectively with 50 simultaneous users (if this represents the expected user demand), for purposes of this chapter, only load testing that tests the predicted volume or velocity of data throughput is discussed. For example, if an extract-transform-load (ETL) system is being developed and is required to process a maximum of 20 million observations per day, at some point the software would need to be load tested against this predicted volume. During development and early testing, however, it's common to run and test software with less voluminous data to facilitate faster performance and test results.

Load testing should not, however, be misconstrued to represent testing at only the expected data levels specified in requirements. For example, if 20 million observations is the expected data volume, load testing might test the software at 125 percent of this value to demonstrate how software would handle this variability. In other cases, load testing is utilized if the predicted load is expected to increase over time. For example, other requirements might not only specify an initial volume of 20 million observations per day but also specify that an additional 5 million observations should be expected per year thereafter. In this second scenario, load testing might be conducted to the expected five-year mark—that is, 45 million observations—to determine how the software will handle the eventual load. This additional load testing beyond current load volume enables stakeholders to plan programmatic and nonprogrammatic solutions for the future if test results demonstrate that software will not meet performance requirements as data attempt to scale.
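
Because production-sized data rarely exist in development environments, load test data are often fabricated. The following sketch, in which the PERM library, variables, and 45-million-observation volume are assumptions, generates a data set at the projected five-year volume:

* fabricates test data at the projected five-year volume;
data perm.load_test;
   length patient_id 8 sex $1 visit_date 8;
   format visit_date date9.;
   do patient_id=1 to 45000000;
      sex=ifc(mod(patient_id,2),'M','F');
      visit_date='01JAN2016'd + mod(patient_id,365);
      output;
      end;
run;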

Load testing often demonstrates the ability of software to scale—with respect to data volume and velocity—from a test environment to a production environment, and possibly to an additional theoretical future environment. By ISO definition, load testing is subsumed under stress testing and not differentiated. However, because this differentiation is commonly made throughout software development literature, it is made throughout this text. As stated in the “Test Data” section, test data should also be sufficiently variable to demonstrate the breadth of both positive and negative test cases; however, variability is typically not the focus of load testing.

Stress Testing

Stress testing demonstrates the performance of software not only within expected boundaries but also beyond. Where load testing and stress testing are differentiated (as throughout this text), however, stress testing refers more selectively to testing software beyond its operational capacity, typically with the intent to expose vulnerabilities, performance failure, or functional failure. In some cases, stress testing is conducted up to a statistical threshold determined from requirements specifications. For example, if requirements state that software must process 20 million observations per day, stress testing test cases might specify testing at 500 percent of that value.

In other cases, stress testing denotes incremental testing until the software experiences functional or performance failure. For example, stress testing might create test data sets of increasingly larger size until some component of the program runs out of memory or produces other runtime errors. This equates to functional stress testing, because it tests the software product as a whole against one or more stressors such as data volume. In chapter 8, “Efficiency,” the “Memory” section demonstrates stress testing of the SORT procedure until it begins to perform inefficiently and fails. And, in the “Inefficiency Elbow” section in chapter 9, “Scalability,” the use of FULLSTIMER performance metrics (to aid in performance and functional failure prediction) is demonstrated.

In applications development, stress testing is a critical component of facilitating software security because malicious parties can attack and potentially exploit software by stressing it, for example, by overwriting software with code injection or buffer overflow attacks. While software attacks are beyond the scope of this text, stress testing remains a critical step toward ensuring and validating that software will not fail when grossly gratuitous inputs, injects, or data throughput are processed. Stress testing in part aims to test the limits of data validation and other quality controls to ensure they are adequate.

Environmental stress testing changes aspects of the environment to see how software responds. In hardware stress testing, this could involve testing how equipment performs in extreme heat, cold, moisture, or in the face of some other physical threat. In software stress testing, however, more common environmental threats include the decrease of system resources, such as low-memory conditions. For example, stress testing might halve the amount of memory available to the SAS application to test how software performs in a low-memory environment. A complete stress test would continue decreasing available memory until functional or performance failure resulted, thus generating a threshold that could be used to deter and detect future failures of this type.
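
For example, the following sketch spawns batch sessions with successively smaller MEMSIZE values; the decrement schedule and the hypothetical C:\perm\memtest.sas program (which would contain the process being stressed) are assumptions:

%macro memory_stress;
%local m;
%do m=2048 %to 256 %by -256;
   * spawn a batch session limited to &m megabytes of memory;
   systask command """%sysget(SASROOT)\sas.exe"" -noterminal -nosplash
      -memsize &m.M -sysin ""c:\perm\memtest.sas""
      -log ""c:\perm\memtest_&m..log"""
      status=mem_status taskname=mem_task&m;
   waitfor _all_ mem_task&m;
   %end;
%mend;
%memory_stress;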

True stress testing stands alone as the one type of test that software should often not be able to pass. Especially in cases where stress testing is designed to elicit software failure, it's understood that failure will be evident, and testing is intended to document the boundary at which failure occurs. In some cases, where stress tests reveal that failure thresholds are too proximal to predicted data loads, additional refactoring may be required to ensure that software does not fail under normal utilization. In other cases, stress testing will yield thresholds so high (or low) that they will never be achieved in software operation and thus present no risk.

In some cases, developers are actually testing the limits of the software language itself. For example, SAS documentation describes the SYSPARM command-line option but fails to specify any upper limits, such as the maximum length of the string that can be passed. Understanding this limit can be crucial for peace of mind, so the following child process can be saved as C:\perm\stress.sas, which assesses the length of the &SYSPARM macro variable after it is passed from a parent process:

libname perm 'c:\perm';
%macro test;
%if %sysfunc(exist(perm.stress))=0 %then %do;
   data perm.stress;
      length text $10000 len 8;
   run;
   %end;
data perm.stress;
   set perm.stress end=eof;
   output;
   if eof then do;
      text=strip("&sysparm");
      len=length(text);
      output;
      end;
run;
%mend;
%test;

When the following parent process is executed, it tests the lengths of SYSPARM iteratively from 1 to 100 characters:

%macro sysparm_stress_test();
%local var;
%do i=1 %to 100;
   %put &i;
   %let var=&var.1;
   systask command """%sysget(SASROOT)\sas.exe"" -noterminal -nosplash
      -sysparm ""&var"" -sysin ""c:\perm\stress.sas"" -log
      ""c:\perm\stress.log"" -print ""c:\perm\stress.lst"""
      status=stress_status taskname=stress_task;
   waitfor _all_ stress_task;
   %end;
%mend;

The partial results demonstrate that at the 83rd iteration, one threshold of the SYSTASK statement itself (rather than the SYSPARM option, which was the intended test subject) was reached, as the length of the quoted string used by the SYSTASK COMMAND statement exceeded the Base SAS threshold:

82
NOTE: Task "stress_task" produced no LOG/Output.
83
WARNING: The quoted string currently being processed has become more than 262 characters long.
         You might have unbalanced quotation marks.
NOTE: Task "stress_task" produced no LOG/Output.

Despite the warning message, examination of the PERM.Stress data set reveals that the SYSTASK statement continued to execute and the SYSPARM length continued to increment, so this warning poses no actual limitation or threat. Further stress testing (not demonstrated) reveals that at a length of 8,076 characters, the SYSTASK statement itself begins to fail, producing a STATUS return code of 104 but no warnings or runtime errors in the log. This demonstrates the value not only in stress testing but also in exception handling to validate the success of critical processes. The SYSTASK statement and SYSPARM option are discussed further throughout chapter 12, “Automation.” Armed with this new information (and system boundary) achieved through stress testing, SAS practitioners can more confidently implement SYSTASK and the SYSPARM parameter.

Integration Testing

Once modules of code have been independently tested and integrated into a larger body of software, it's critical to understand how all the pieces fit together. Integration testing accomplishes this by demonstrating that modules work together cohesively and do not create additional vulnerabilities in their collective implementation. The %GET_MEANS macro demonstrated earlier in the “Performance Testing” section includes a logic error that makes it vulnerable to either missing or locked data sets—the very two vulnerabilities that the additional exception handling sought to alleviate!

While the OPEN function can demonstrate that a data set exists and is not exclusively locked, the %DSN_AVAILABLE macro subsequently closes the data set with the CLOSE function. Once closed, a separate process running in another session of SAS could sneak in and either delete or exclusively lock the data set. Therefore, although the &DSN_AVAILABLE_RC return code would indicate TRUE, by the time the MEANS procedure would have been executed, the data set could have been locked or deleted, causing the MEANS procedure to fail. This demonstrates the importance of integration testing, because at face value, the %DSN_AVAILABLE macro does determine data set existence and availability, its only two objectives. However, when integrated within %GET_MEANS, it fails to eliminate the risk.

To solve this problem, the I/O stream to the data set must remain open while the MEANS procedure executes, thus preventing a concurrent session of SAS from deleting or exclusively locking the data set. Once the MEANS procedure terminates, the CLOSE function should be executed from within the %GET_MEANS module rather than from within the %DSN_AVAILABLE macro. This modification also requires that the &DSID macro variable be changed from a local to a global macro variable so that it can be read by the %GET_MEANS macro:

* tests existence and shared lock availability of a data set;
%macro dsn_available(dsn= /* data set in LIB.DSN or DSN format */);
%global dsn_available_RC;
%let dsn_available_RC=;
%global dsid;
%let dsid=%sysfunc(open(&dsn));
%if &dsid^=0 %then %let dsn_available_RC=TRUE;
%else %let dsn_available_RC=FALSE;
%mend;
* saves mean values of one or more variables to a data set;
%macro get_means(dsn= /* data set in LIB.DSN or DSN format */);
%dsn_available(dsn=&dsn);
%if &dsn_available_RC=TRUE %then %do;
%varlist(dsn=&dsn,type=num);
   proc means data=&dsn noprint;
      var &varlist;
      output out=means_temp;
   run;
   %let close=%sysfunc(close(&dsid));
   %end;
%else %put DATA SET MISSING OR LOCKED;
%mend;

While the updated code now achieves the functional intent and is robust against the possibility of a missing or exclusively locked data set, software modularity has unfortunately been decreased. Having to open the file stream inside the child process but close it in the parent process diminishes loose coupling and encapsulation. The child process performs a risky task—opening a file stream that requires a later %SYSFUNC(CLOSE) function—but has no way to guarantee that the file stream is closed by the parent process. Nevertheless, this tradeoff—the loss of static performance (modularity) to gain dynamic performance (robustness)—is warranted if the process needs to be protected from incursions from external SAS sessions. Loose coupling and encapsulation are discussed within chapter 14, “Modularity.”

Regression Testing

Integration testing adds a module to a larger body of code and then seeks to ensure that the composite software still functions as required. As demonstrated in the “Integration Testing” section, however, this can require subtle or not-so-subtle changes to modules, such as when the %DSN_AVAILABLE macro had to be overhauled to correct a logic error. Regression testing, on the other hand, occurs after modules have been modified to ensure that current functionality or performance is not diminished. For example, in stable production software, a reusable module such as the %DSN_AVAILABLE macro might have been saved in the SAS Autocall Macro Facility or to an external SAS program and referenced with an %INCLUDE statement. This stability of a tested module facilitates its reuse and generalizability to other software and solutions.

However, because the macro had to be modified, all other software that referenced it would need to be tested to ensure that the macro still functioned in those diverse use cases. And, because the CLOSE function had to be removed, in all likelihood the modifications to the %DSN_AVAILABLE macro would have caused it to fail within software that referenced its original version. Regression testing thus seeks to demonstrate backward compatibility of modules or software after modification to ensure that past uses are not corrupted or invalidated.

A more secure (and backward compatible) solution instead modifies the %DSN_AVAILABLE macro for use in %GET_MEANS while not sacrificing its original design (which includes the CLOSE function). This more robust modification passes regression testing by enabling backward compatibility:

* tests existence and shared lock availability of a data set;
%macro dsn_available(dsn= /* data set in LIB.DSN or DSN format */,
   close=YES /* default closes the data set, NO keeps the stream open */);
%global dsn_available_RC;
%let dsn_available_RC=;
%global dsid;
%let dsid=%sysfunc(open(&dsn));
%if &dsid^=0 %then %do;
   %let dsn_available_RC=TRUE;
   %if &close=YES %then %let close=%sysfunc(close(&dsid));
   %end;
%else %let dsn_available_RC=FALSE;
%mend;

The macro is now backward compatible because the invocation is overloaded with an extra parameter, CLOSE, which indicates whether %DSN_AVAILABLE should close the data stream. Older code can still call the macro with the following invocation, which closes the I/O stream before the macro exits (by defaulting to CLOSE=YES):

%dsn_available(dsn=&dsn);

Or, as is required by the %GET_MEANS macro, the %DSN_AVAILABLE macro can be overloaded by including the CLOSE=NO parameter, which will keep the I/O stream open so that it can be closed by the parent process (rather than the child) after the MEANS procedure has completed:

* saves mean values of one or more variables to a data set;
%macro get_means(dsn=);
%dsn_available(dsn=&dsn, close=NO);
%if &dsn_available_RC=TRUE %then %do;
   %varlist(dsn=&dsn,type=num);
   proc means data=&dsn noprint;
      var &varlist;
      output out=means_temp;
   run;
   %let close=%sysfunc(close(&dsid));
   %end;
%else %put DATA SET MISSING OR LOCKED;
%mend;
%get_means(dsn=final);

Especially where a development team or an organization maintains libraries of stable code used throughout production software, every effort should be made to ensure that modules remain backward compatible when modified so that existing calls to and uses of those modules do not also have to be individually modified. Regression testing is an important aspect of software maintainability because it effectively reduces the scope of system-wide modifications that are necessary when only small components of software must be modified.
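
As a minimal illustration (a sketch, not from the text), a lightweight regression test can invoke the modified %DSN_AVAILABLE macro with its legacy signature and confirm that the original behavior (returning TRUE and releasing the I/O stream) is preserved. The Regtest data set and the %REGRESSION_CHECK macro are names invented for this demonstration:

* hypothetical regression test of the legacy invocation (CLOSE defaults to YES);
data regtest;
   x=1;
run;
%dsn_available(dsn=regtest);
%macro regression_check();
%if &dsn_available_RC ne TRUE %then %put REGRESSION TEST FAILED: return code is &dsn_available_RC;
%else %do;
   * if the macro failed to close its stream, this DATA step cannot replace the data set;
   data regtest;
      x=2;
   run;
   %if &syserr=0 %then %put REGRESSION TEST PASSED: stream closed and legacy behavior preserved;
   %else %put REGRESSION TEST FAILED: file stream left open;
   %end;
%mend;
%regression_check;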

Regression Testing of Batch Jobs

Batch jobs are automated SAS programs spawned in self-contained sessions of SAS, launched from the command prompt, the SAS interactive environment, or other batch jobs. As demonstrated in the “Passing Parameters with SYSPARM” section in chapter 12, “Automation,” use of SYSPARM enables one or more parameters to be passed from the parent process (or OS environment) to the batch job. Inside the batch job, the &SYSPARM automatic macro variable receives the SYSPARM parameter, which can be parsed into one or multiple parameters or values.

A batch job typically represents the culmination of SAS production software because it has been tested, validated, automated, and often scheduled to ensure reliable execution. Even the most reliable batch jobs, however, must be maintained and occasionally modified to ensure they remain relevant to shifting needs, requirements, or variability in the environment. Furthermore, because development and testing occur in an interactive environment rather than the batch environment, batch jobs should retain backward compatibility with the interactive environment, which can be facilitated by performing regression testing: running the batch job manually from the interactive environment.

To demonstrate a lack of backward compatibility, the following plain text batch file is saved as C:\perm\freq.bat:

"c:program filessashomesasfoundation9.4sas.exe" -noterminal -nosplash -sysin c:permfreq.sas -log C:permfreq.log -print c:permfreq.lst -sysparm "dsn=perm.final, table_var=char1"

The batch file calls the SAS batch job C:\perm\freq.sas, which runs the FREQ procedure on the data set specified in the SYSPARM internal DSN parameter. To ensure the data set is available (only in this simulation), the LIBNAME statement assigns the PERM library while the DATA step generates the PERM.Final data set. The following code should be saved as C:\perm\freq.sas:

libname perm 'c:\perm';
data perm.final; * produces required data set;
   length char1 $10;
   do i=1 to 10;
      char1="obs" || strip(put(i,8.));
      output;
      end;
run;
* accepts a comma-delimited list of parameters in VAR1=parameter one, VAR2=parameter two format;
%macro getparm();
%local i;
%let i=1;
%if %length(&sysparm)=0 %then %return;
%do %while(%length(%scan(%quote(&sysparm),&i,','))>1);
   %let var=%scan(%scan(%quote(&sysparm),&i,','),1,=);
   %let val=%scan(%scan(%quote(&sysparm),&i,','),2,=);
   %global &var;
   %let &var=&val;
   %let i=%eval(&i+1);
   %end;
%mend;
* retrieves the parameters DSN (data set name) and TABLE_VAR (variable name to be analyzed);
%getparm;
proc freq data=&dsn;
   tables &table_var;
run;

This Freq.sas program runs from batch but, when executed interactively—including from either the SAS Display Manager or SAS Enterprise Guide—the software fails because the &SYSPARM macro variable is empty, not having been passed a SYSPARM parameter. To ensure backward compatibility, the software should be able to run manually in interactive mode as well as in batch. The following inserted code tests the &SYSPARM macro variable and, if it is empty, assigns it a default value so that the %GETPARM macro can subsequently create the global macro variables that otherwise would have been built from the batch SYSPARM parameter:

* REMOVE IN PRODUCTION! * initializes SYSPARM for testing environment;
%macro testparm();
%if %length(&sysparm)=0 %then %let sysparm=dsn=perm.final, table_var=char1;
%mend;
%testparm; * REMOVE IN PRODUCTION!;

Inserting this code between the %MEND statement and the %GETPARM invocation assigns &SYSPARM a default value, thus allowing the %GETPARM macro to be executed interactively. The inserted code does have to be removed in production because it could mask errors that would occur, for example, if the batch job were called erroneously without the SYSPARM parameter. However, this succinct test code allows developers to run, modify, and test the program within an interactive environment. This discussion continues in the “Batch Backward Compatibility” section in chapter 12, “Automation.”
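
Assembled in this order, the tail of the C:\perm\freq.sas program would read as follows (a sketch; the body of the %GETPARM macro is unchanged and abbreviated here):

%macro getparm();
/* ...parameter-parsing logic unchanged... */
%mend;
* REMOVE IN PRODUCTION! * initializes SYSPARM for testing environment;
%macro testparm();
%if %length(&sysparm)=0 %then %let sysparm=dsn=perm.final, table_var=char1;
%mend;
%testparm; * REMOVE IN PRODUCTION!;
* retrieves the parameters DSN (data set name) and TABLE_VAR (variable name to be analyzed);
%getparm;
proc freq data=&dsn;
   tables &table_var;
run;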

Acceptance Testing

Acceptance testing in many environments is less actual software testing and more a dog-and-pony show that demonstrates software to the customer, users, or other stakeholders. Acceptance testing primarily demonstrates software function and performance and, during formal acceptance testing, stakeholders may scrutinize requirements documentation in conjunction with software to validate that software has demonstrated all technical specifications. By validating the enumerated list of test cases, the customer or other stakeholders can be confident not only that software meets functional and performance requirements, but also that it meets formal testing requirements.

But in data analytic development environments, business value is typically conveyed not through software products (or the archetypal “releasable code” espoused in Agile literature), but through resultant data products. Moreover, unlike user-focused applications that may include a graphical user interface (GUI) that begs stakeholder interaction, Base SAS software products derive value by being accurate and fast, not flashy. In fact, because a common end-goal is to automate SAS production software as batch jobs that unobtrusively run in the background, in many cases there won't even be a SAS log that a customer can pretend to view as it scrolls by at an indecipherable pace.

The fundamental differences between software applications development and data analytic development may manifest in such a stark contrast that customers have little to no interest in reviewing software functional and performance requirements during acceptance testing. In cases in which business value is conveyed solely through a derivative data product—such as a report that must be written, derived from data generated through SAS software—some customers may not even acknowledge the milestone of software completion because software alone provides no business value. For example, after four long days of work, I once remarked to a customer, “I finished the ETL software for the new data stream,” to signal that he could review the software product. “Let me know when you finish the report,” was his only response, because only the derivative data product conveyed value to him.

In other cases, customers may recognize the intrinsic value in software that has been created, but only show interest in functional objectives that were met. Still other customers may be interested in function and dynamic performance, yet have no interest in static performance attributes. Regardless of the environment in which software is developed and from where business value is derived, if software performance and quality are important enough to discuss during planning and design for inclusion in software requirements, then they should be commensurately important enough to acknowledge and validate when software is completed as part of software acceptance.

When Testing Fails

Tests will fail. Sometimes tests fail expectedly because some functional or performance element was not specified in software requirements or was intentionally omitted. For example, if software requirements say nothing about testing for data set existence (before referencing the data set), some degree of robustness is lost, but this may be acceptable in software in which it is unlikely that the data set would be missing or in which the missing data set would pose little risk. Thus, it would be a waste of time to create an automated test to demonstrate what occurs when a data set is missing in software not required to demonstrate this type of robustness. When specific risks are accepted by stakeholders, there is no need to develop tests to demonstrate those failures—they can be chronicled in the failure log if necessary.

Sometimes tests fail unexpectedly when an unforeseen vulnerability is detected through software testing. After all, the intent of software testing is to identify both known and unknown vulnerabilities. The identified error or vulnerability should be discussed but, if it requires extensive modification of code, and depending on the software development environment and methodology espoused, the required maintenance might need to be authorized by a customer or other stakeholders. Where new risks are identified, they can be eliminated by correcting software defects, but customers also commonly accept the results of failed tests because the exposed vulnerabilities pose little overall risk to the software, or because additional functionality or the software release schedule are prioritized over maintenance.

While in many cases adequate software testing will identify previously unknown defects in software that must be remedied through corrective maintenance, in other cases the defects will be recorded in a risk register, accepted, and effectively ignored thereafter. The quality assurance aspect of testing is not intended to force unnecessary performance into software, but rather to identify vulnerabilities that exist and to ensure that software is being released with the desired level of functionality and performance and an acceptable level of risk.

TESTABILITY

The shift toward embracing testability first and foremost requires a commitment to software testing. Where software testing is not a priority within a team or organization and the testing phase of the SDLC is absent, testability principles may still benefit software in other ways, but they won't suddenly spur stakeholders to implement testing. The decision to implement formalized testing will be made only when its benefits are understood and its costs are not prohibitive. Thus, where high-performing software is demanded, and especially where software audits are likely or expected, embracing testability principles will facilitate higher quality software.

Testability Principles

Testability principles facilitate higher quality software because software can more readily and reliably be shown to be functional and free of defects. The previous enumeration of formalized testing methods may be foreign to some SAS development environments that rely on less formal, manual code review to facilitate software validation. Notwithstanding the breadth of differences in intensity or formality with which testing can be approached, all software testing—from formalized test plans to informal code review—benefits from testability principles.

SAS data analytic development environments are less likely to implement formalized software testing in part due to the practice of building monolithic software—large, single-program software products. This design practice in turn encourages programs that are more complex and less comprehensible to SAS practitioners, and thus not only more error-prone but also more difficult to debug, test, and validate. By deciphering smaller chunks of software, not only is understanding improved, but software function and performance can more easily be compared against technical requirements.

Technical Requirements

All testing begins with the definition and documentation of technical requirements that ideally should specify not only what software should do but also what it shouldn't do. With this groundwork in place, whether a formal test plan is created or not, subsequent tests will be able to be measured against those technical specifications. In software development environments that lack technical requirements, it is virtually impossible to test software because no functional or performance expectations exist, so no definition of done or software acceptance criteria can be validated.
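
As a minimal illustration (hypothetical, not drawn from a formal test plan), a single technical requirement can be paired with a test whose outcome is unambiguous. The requirement wording and the %ASSERT_REQUIREMENT macro are invented for this sketch, which assumes the %DSN_AVAILABLE macro from earlier sections has been compiled:

/* requirement (hypothetical): when the referenced data set does not exist, */
/* the macro must set its return code to FALSE without halting the program  */
%dsn_available(dsn=doesnotexist); * data set intentionally absent;
%macro assert_requirement();
%if &dsn_available_RC=FALSE %then %put TEST PASSED: requirement satisfied;
%else %put TEST FAILED: return code is &dsn_available_RC;
%mend;
%assert_requirement;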

Modularity

As demonstrated earlier in the “Unit Testing” section, even a small code module that is functionally discrete and relatively simple can have multiple vulnerabilities that should be tested, documented, and potentially resolved. While testing alone will not mitigate or eliminate vulnerabilities, it can uncover them and bring them to the attention of developers and stakeholders to ensure they are documented in a risk register and prioritized for possible future corrective maintenance. Without modular software design, testing will be significantly more difficult because of the software's inherent complexity, while unit testing will be all but impossible where composite functionality exists and test cases cannot be disentangled.

Readability

Software readability improves the ability to test software efficiently and effectively. Readable software, described in chapter 15, “Readability,” refers primarily to the ease with which code can be comprehended. But where requirements documentation, project documentation, data documentation, or other relevant information can more fully describe software, these artifacts can also further comprehension and may be required in data analytic software. When software requirements convey the objectives that software should achieve, and code is clear and concise, software testing is facilitated because test cases often follow naturally from requirements or software use cases. By eliminating confusion or ambiguity from both requirements and software, the assessment of how well software meets those requirements can be made much more readily.

WHAT'S NEXT?

The goal of software testing is to demonstrate the validity of software as assessed against technical requirements. Once tested, validated, and accepted, software can be released into production and, at that point, software should be sufficiently stable that it can perform without frequent modifications or maintenance. In the next chapter, the benefits of software stability are described and demonstrated, which can in turn facilitate greater code reuse and repurposing.

NOTES
