Chapter 18
Reusability

Roasting marshmallows in flowing lava…bucket list—CHECK!

Antigua, Guatemala. Back in my home away from home, my friend Lisa and I had already toured every cathedral in town, so we decided to take a day trip to the Pacaya volcano, the active volcano most accessible from the city.

Pacaya last gained fame when it erupted in March 2014, sending plumes of fire and smoke thousands of feet in the air and raining down ash throughout central Guatemala. But when not erupting, it's a great place for a hike and picnic.

After we bought our bus tickets, I did some research online while we waited for the bus and came across an offhand blog post from someone claiming to have roasted marshmallows in one of Pacaya's accessible lava flows. Hmm…

This seemed outlandish—certainly nothing the U.S. Park Service had allowed me to do in Hawaii—but on the off chance, we darted over to the supermercado (supermarket) and found not only marshmallows, but also graham crackers and chocolate bars. Sulfur s'mores, here we come!

The school bus (a chicken bus in training) took an hour and a half to wind its way toward the base of the mountain, where we disembarked and set out on the gravelly path. All signs of life and vegetation gave way to rough, volcanic terrain, seemingly smooth in the distance, but painfully unforgiving to the touch. And then, over a crest, distinct patches of red—lava!

There must have been a warning about melting the soles of your feet, that lava can be sharp as glass, that sulfur clouds can kill you, but I was too busy playing with the dog that had bounded up beside me, no doubt because he thought I was pulling treats out of my pack. I was—but the treats weren't for him.

Finding a stick for the marshmallows proved easy, but standing a couple feet from an active lava flow was a feat of endurance. I would wonder later—looking at the photos while I still nursed my blistered knuckles—why I hadn't thought to wrap my shirt (tied around my waist) around my hand, which was in danger of spontaneously combusting.

I burned enough marshmallows for Lisa and me and, as the only s'more representatives of the day, distributed the remaining supplies to the gaggle of tourists.

With a mix of disappointment and horror, we watched as every other tourist paid unaccompanied local children to go burn their hands (rather than roasting marshmallows themselves)—but the kids did get their fill of s'mores, and all survived.


Reuse comes in all shapes and sizes, requiring various levels of effort and input to implement. I only reused a crazy notion from a blog—taking s'more fixin's to an active volcano—to achieve a goal I didn't realize was even possible. In software development, code reuse often similarly occurs where technical methods or capabilities are abstracted—if not actually copied—from external sources including texts, white papers, and SAS documentation.

Other reuse can require less or no effort at all. The lazy tourists benefited from the s'more fixin's that Lisa and I had purchased and hauled to Pacaya. Moreover, in a classic build-versus-buy (or build-versus-burn) decision, the tourists further outsourced marshmallow roasting to seven-year-olds. Especially where reusable software modules are maintained within an organization, the objective of software reuse is to facilitate painless implementation of past functionality into future software.

Although a couple tourists did seem discombobulated by the notion of squishing mallow, chocolate, and graham into unwholesome goodness, most immediately began salivating at the sight of the ingredients—you know s'mores are incredible because you've tasted (tested) them your entire life. Software testing also greatly encourages software reuse because software integrity and popularity increase as stakeholders understand and expect its strengths and weaknesses. And s'mores have no weaknesses!

Modularity also benefits reuse. I hadn't brought s'mores to the party; I had brought s'more fixin's, the primordial ingredients of mallow, chocolate, and graham. So when one guy only wanted a roasted marshmallow, and a woman just wanted to chomp on raw chocolate (rather anticlimactically, given the opportunity at hand), these independent needs could also be fulfilled. Modular software design similarly increases reusability because smaller, functionally discrete components are more likely to be reused and repurposed.

As code reuse becomes more common within an organization and the clear benefits of reuse more recognizable, incorporation of reusability principles into software design begins to occur naturally. In my travels, I've similarly learned always to consult travel blogs for other unique experiences that I can borrow. Here's a hint—smuggle bananas into Machu Picchu if you want a herd of llamas to befriend and follow you relentlessly! Llamas love bananas!

DEFINING REUSABILITY

Reusability is the “degree to which an asset can be used in more than one system, or in building other assets.”1 The International Organization for Standardization (ISO) goes on to define reusable software as “a software product developed for one use but having other uses, or one developed specifically to be usable on multiple projects or in multiple roles on one project.” Primary software reuse incorporates internally developed code into future software products by the same developer, team, or organization. Secondary reuse further encompasses adoption and use of external software, including code and technical methods derived from textbooks, white papers, software documentation, and external teams and organizations. Therefore, software that exhibits reusability principles will encourage primary reuse by its own development team or organization, and secondary reuse by external teams or organizations.

Software reuse is a commonly sought objective in software development because it can substantially improve the speed and efficiency of the development process. Where tested, stable modules of code exist that can be immediately dropped into software, significant design, development, and testing effort can be eliminated and invested elsewhere. The aim of reusability is to develop software that is intended to be reused, rather than software that is reused solely through ad hoc or serendipitous implementation. Software reusability typically incurs a higher cost up front to design, develop, test, and validate. While this additional effort may not benefit the initial implementation of software, it represents an investment that pays dividends through future software reuse. In many cases, however, reusability principles—such as modular software design—can immediately benefit functional and performance objectives in the initial software use.

This chapter introduces software reuse, including two knowledge management artifacts—the reuse library and reuse catalog—that can benefit a team or organization by documenting and organizing reusable code and reuse implementations. It further explores the requirements of reusability and demonstrates through successive SAS examples how flexibility, modularity, readability, testability, and stability can individually and collectively improve software reusability. Moreover, where code modules are centrally located but universally implemented, the chapter demonstrates how software maintainability is benefited when a single version of software can be maintained while implemented across multiple software products.

REUSE

Software reuse occurs constantly and oftentimes unconsciously in software development. Whole products and snippets of code, as well as knowledge of their underlying concepts, design, techniques, and logic, are regularly incorporated into software as it's developed. When external sources of code such as Base SAS documentation, white papers, or other technical sources are all considered, it becomes virtually impossible to develop software without some reuse component. While code reuse alone is often touted as an objective of software development, the net value of code reuse will depend on the extent to which code can be efficiently and effectively incorporated.

Software reuse in fact can be a painful, inefficient, and ungratifying experience when the quality of software or its documentation is lacking, or when reuse is implemented without planning. By incorporating reusability principles into software, developers incur a greater up-front cost due to increased focus on dimensions of quality, but can be rewarded with painless future reuse of software modules. For example, by encouraging software testability during development, SAS practitioners can more effortlessly and effectively reuse code because it will have been more thoroughly tested. Reusability principles often benefit not only the software for which the code was originally written, but also future software that can reuse the modules. To be clear, however, a significant portion of software is functionally specific and not adaptable for software reuse.

Software use is an inherent prerequisite of software reuse. Thus, before software can be reused, it should have already passed through the software development life cycle (SDLC) design, development, testing, and validation phases. When software is reused, it can bypass much of the SDLC, making second and subsequent implementations more efficient than the original. A second prerequisite of software reuse is an awareness of the initial use, which can include an understanding of software objectives, strengths, weaknesses, vulnerabilities, demonstrated quality, testing results, and performance metrics. Armed with this information, developers can intelligently compare the benefits and risks of software reuse with the costs of new design and development.

From the User Perspective

From the user's perspective, software reuse describes the adoption or purchase of whole software products, such as open-source, commercial off-the-shelf (COTS), or government off-the-shelf (GOTS) software. Some software products are domain-specific, and users understand their limited scope and use. For example, a customer buys architectural software so he can remodel his kitchen but, once the remodel is complete, may believe he has no future use for the software. However, when he whimsically decides to dig a 10,000-gallon pond, he realizes that the software also has landscape design functionality and can be reused. Thus, even in niche markets, the extent to which software can flexibly meet diverse needs of users will increase its reuse potential.

With the exception of open-source products that end-users intend to modify, most users purchase software “as is” with the understanding that its function is controlled solely by software developers. Periodic software maintenance—released to users through patches or updates—can further facilitate software reuse by extending the software lifespan. Moreover, software reuse is encouraged where regular upgrades expand software functionality, responding to shifting customer needs and competing for market share against other new and expanding software products.

From the Developer Perspective

From the developer's perspective, software reuse describes the incorporation of existing modules of code into software as it's being developed. Rather than accepting or rejecting software as a whole, developers are able to realize and benefit from its composite nature, retaining and reusing useful elements and discarding the chaff. The extent to which existing software can be incorporated into later software will be determined by its functionality in addition to its flexibility, generalizability, reliability, and other performance attributes. Furthermore, software documentation, including demonstration of thorough and successful software testing, can instill confidence in developers that software will be of sufficient quality for its intended reuse.

The user and developer perspectives of reuse differ substantially. Users view architectural software products in terms of their ultimate functionality—the ability to create architectural designs and, in the case of some products, to additionally design superfluous water features. Developers instead view software functionality as a set of discrete components. For example, the architectural software may include a list of the nine most recently used floorplan files and display these in a drop-down menu. A developer designing unrelated music editing software might be able to reuse this recent file functionality to recreate an identical menu item in the music software. Thus, while end users might perceive no similarities between the overall functionality of these wildly divergent software applications, a developer can view disparate software products and functionally decompose them into the component level to extract functionality and code that can be reused in dissimilar software products. Functional decomposition is further discussed in the “Functional Decomposition” section in chapter 14, “Modularity.”

Reuse of External Code

Code can be acquired from external sources, such as textbooks, technical white papers, software documentation, blogs, or user forums. If you're lucky, the developers had reuse in mind when they designed and developed the software and instilled reusability principles. Perhaps they included documentation that outlines software functionality as well as vulnerabilities, enabling subsequent developers to better understand the intent and quality of the software and thus make a more informed decision about whether and how to reuse it. Software modularity also supports reuse because subsequent developers are able to pick and choose functionally discrete components without having to separate them like conjoined twins.

Knowledge of reusability principles benefits SAS practitioners as they incorporate external code because they can more accurately assess the level of effort required to modify, integrate, test, and implement the code. Imagine the experienced general contractor that you've contacted to give you an estimate for a bathroom remodel. With ease, he should be able to establish which components can remain, which can be modified, and which must be replaced. It might be possible to keep the antique claw-footed tub in place and remodel around it, but in other instances, this could prove to be so awkward and inefficient that it would be less expensive to jackhammer it, remove the pieces, and install a replacement.

Thus, while developers can't improve the quality or reusability of software that has already been developed externally, their knowledge of reusability principles can benefit code incorporation and reuse. As skilled artisans, savvy SAS practitioners should be able to review external code and more accurately decide whether and how to best reuse software. Moreover, to those who have suffered through the incorporation of software that lacked reusability principles, the distinction and benefits of reusable software will be evident and valued.

Reuse of Internal Code

In each of the previous reuse examples, software developers evaluated and reused code but were powerless to influence its reusability because the software had already been produced by some third party. However, as SAS practitioners develop new software, incorporation of reusability principles encourages reuse and can benefit not only current but also future software products.

The primary benefits of reusing internal code are the same as those of reusing external code: a reduction in design, development, and testing effort, since the code has already passed through the SDLC. But a benefit unique to internal code reuse is the ability to plan and anticipate the reuse. External reuse, by comparison, is often unplanned: when a SAS practitioner searches online for a solution to a complex REPORT procedure that he is trying to create but doesn't want to engineer from scratch, he might find the solution spread across multiple white papers. He ingests the knowledge and techniques and, in an ad hoc fashion, formulates and empirically tests the software until the requirements are met.

This ad hoc reuse method contrasts sharply with intentional code reuse, in which anticipated future use of code may be planned even before the initial development or use has occurred. For example, in a planning and design meeting, a team might discuss the development of a specific data cleaning algorithm for use in an extract-transform-load (ETL) infrastructure. One SAS practitioner might realize that he could also utilize the cleaning module in a separate SAS analytic program being developed, and thus request that the module be developed to be as dynamic as possible. While this extra dynamism might not directly benefit the ETL software for which it was originally intended, it could dramatically increase the efficiency with which the later analytic software is developed.

In other cases, software modules are developed and reused within the same software, thus enabling reuse to immediately benefit the original software product for which the module was developed. This occurs most commonly with low-level functionality, such as basic I/O functions or basic data operations. For example, if during software planning it's decided that SAS software will need to be robust to missing or locked data sets, a reusable macro could be developed to confirm data set existence and availability. Especially by functionally decomposing software objectives into discrete, modular chunks, SAS practitioners will be better poised to identify software functionality that can benefit from reuse (from existing modules) as well as to implement reusability principles within software to support eventual reuse.
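As a minimal sketch of such a low-level module, the following macro checks that a data set exists and can be opened. The macro name %DSN_AVAILABLE and the global flag &DSNOK are hypothetical, and the sketch assumes that a data set that exists but cannot be opened for input (OPEN returns 0) is locked or otherwise unavailable.

```sas
* hypothetical reusable module that confirms a data set exists and can be;
* opened for input, setting global macro variable DSNOK to 1 or 0;
%macro dsn_available(dsn= /* data set in LIB.DSN or DSN format */);
%local dsid rc;
%global dsnok;
%let dsnok=0;
%if %sysfunc(exist(&dsn)) %then %do;
   %* OPEN returns 0 when the data set cannot be opened, e.g., when locked;
   %let dsid=%sysfunc(open(&dsn,i));
   %if &dsid>0 %then %do;
      %let dsnok=1;
      %let rc=%sysfunc(close(&dsid));
      %end;
   %else %put NOTE: &dsn exists but is unavailable: %sysfunc(sysmsg());
   %end;
%else %put NOTE: &dsn does not exist.;
%mend;

* example: WORK.SAMPLE exists and is not locked, so DSNOK is set to 1;
data sample;
   x=1;
run;
%dsn_available(dsn=sample);
%put DSNOK=&dsnok;
```

Because the module only reports availability, each parent process remains free to decide whether to wait, skip, or abort, which keeps the module reusable across programs with different robustness requirements.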

Reuse Artifacts

While this text focuses on software quality rather than the quality of the software development environment or specific software development methodologies, some development artifacts and best practices are so beneficial that they beg inclusion. Software reuse libraries and reuse catalogs provide a proven method to organize, discover, and archive software modules within an organization, facilitating security, reusability, and overall maintainability of code. For example, without some knowledge management method to track in which software products and in what ways modules of code have been reused, it becomes impossible to maintain the intent and integrity of code modules as their function may morph or expand over time.

Reuse Library

A reuse library is “a classified collection of assets that allows searching, browsing, and extracting.”2 An asset can represent software itself, accompanying documentation or, more abstractly, even software frameworks or templates. The library is not classified in the national security sense (requiring a polygraph to peruse it), but rather in the sense that it is structured and organized—think Dewey Decimal, not Treadstone or Blackbriar. Thus, a reuse library will include instances of modular, readable, tested, stable software that are sufficiently organized and documented to ensure rapid discovery and comprehension. Additional accompanying documentation such as test plans, test cases, or test results may also be included.

Myriad knowledge management software applications exist that organize software in reuse libraries. They range from exceedingly complex systems, to more generic mid-range solutions such as SharePoint, to well-organized folder structures saved on a server. In a SAS environment, a reuse library could be as simple as a segregated portion of the directory structure that ensures sufficient security and stability of its contents, so long as it supports search and retrieval. If SAS practitioners don't have at least read-only access to the reuse library or can't locate software efficiently within it, the utility of the library is eliminated.

In some cases, production code could be run directly from a reuse library, but in a development environment that espouses software modularity, most modules of code won't be intended to be run in isolation, but rather to be called by parent processes. Thus, you wouldn't expect to find an ETL system codified within the reuse library, because it would have been developed for a specific purpose. However, that ETL software might utilize multiple SAS macros that are contained within and called from the reuse library. For example, a macro that converts a comma-delimited text file into a SAS format would have value outside of the ETL software, so it ideally would be written dynamically enough to be inducted into the reuse library. Once the macro module has been completed, tested, stabilized, and saved as a distinct SAS program file, it can be included in the reuse library to benefit future software.
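Assuming the reuse library is a folder on the SAS server (the path below is hypothetical), a parent process might make its modules available in either of two ways: through the SAS Autocall Macro Facility, which searches the listed folders for a file named after each macro the first time that macro is invoked, or by compiling a single module explicitly with %INCLUDE. A minimal sketch:

```sas
* hypothetical location of the reuse library on the SAS server;
%let reuselib=/sas/reuse/library;

* option 1, the Autocall Macro Facility, searches the reuse library first;
* and then the default SAS autocall folders referenced by SASAUTOS;
options mautosource sasautos=("&reuselib" sasautos);

* option 2 compiles one module explicitly, shown here commented out;
/* %include "&reuselib/ingest_format.sas"; */
```

Autocall keeps parent programs free of hardcoded file references, but it expects each macro to be stored in its own file named after the macro (for example, ingest_format.sas), with lowercase file names on UNIX hosts.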

The following code, reprised from the “SAS Formatting” section in chapter 17, “Stability,” creates SAS character formats from comma-delimited text files:

*<DESC> creates a SAS format that categorizes or bins character data;
* each line of the file must include at least two comma-delimited items;
* the first item is the formatted value (label), and the second and any;
* additional items are raw values that will be converted to the first;
%macro ingest_format(fil= /* logical location of format file */,
   form= /* SAS character format being created */);
filename zzzzzzzz "&fil";
data ingest_format (keep=FMTNAME START LABEL);
   length FMTNAME $32 START $50 LABEL $50;
   fmtname="&form";
   f=fopen("zzzzzzzz");
   rc1=fsep(f,",");
   do while(fread(f)=0);
      * the first token on each line becomes the format label;
      rc3=fget(f,label);
      label=strip(label);
      * each remaining token becomes a START value mapped to that label;
      do while(fget(f,start)=0);
         start=strip(start);
         output;
         end;
      end;
   * release the file handle before the step ends;
   rc4=fclose(f);
run;
proc format library=work cntlin=ingest_format;
run;
filename zzzzzzzz clear;
%mend;
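To illustrate how a parent process might reuse the module, the following sketch writes a small comma-delimited file to the WORK folder and then ingests it as the character format $COLOR. The file name, format name, and values are illustrative only.

```sas
* write a small comma-delimited format file to the WORK folder;
%let workpath=%sysfunc(pathname(work));
data _null_;
   file "&workpath/colors.txt";
   put "Red,crimson,scarlet,maroon";
   put "Blue,navy,azure";
run;

* ingest the file as the character format $COLOR;
%ingest_format(fil=&workpath/colors.txt, form=$color);

* apply the format to bin each shade into its color family;
data shades;
   length shade $20 family $20;
   input shade $;
   family=put(shade,$color.);
   datalines;
crimson
navy
azure
;
run;
```

Each line of the file yields one label (its first item) and one or more raw values, so the three shades should be binned as Red, Blue, and Blue.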

Avoiding duplicated development is paramount because it reduces developer effort. First, it streamlines initial development because code only needs to be written and tested once; second, it substantially improves maintainability because only one copy of the code needs to be modified, retested, and reintegrated into software when maintenance is required. In an end-user development environment that does not implement separate knowledge management software to organize a formal reuse library, teams can facilitate code organization and search by including code descriptions in module headers. Where software header formats are standardized and parsed by automated processes, software reusability will be substantially improved.

For example, if comments are prefaced with *<DESC> or other symbolism, rather than using only a nondescript single asterisk, a code parser can quickly ingest and parse all software within a reuse library and automatically create sufficient documentation to allow rapid search, discovery, and comprehension of all modules. In this way, developers are able to differentiate comments that should be included in the reuse library to facilitate search and retrieval from those that should be viewed only from within the software itself. This type of automated comment parsing is demonstrated in the “Automated Comment Extraction” section in chapter 15, “Readability.”
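A minimal sketch of such a parser follows. The macro name %EXTRACT_DESC is hypothetical, and the sketch assumes that the host supports wildcards in INFILE paths (Windows and UNIX hosts do) and that each *<DESC> comment fits on a single line. It reads every .sas file in a folder and keeps only the tagged comments, along with the program in which each appears.

```sas
* hypothetical module that catalogs the tagged DESC comments of every SAS;
* program file in folder DIR, writing them to data set OUT;
%macro extract_desc(dir= /* folder holding SAS program files */,
   out=reuse_desc /* data set created */);
data &out (keep=program description);
   length fname program $256 line $200 description $200;
   * wildcards in INFILE paths are assumed to be supported by the host;
   infile "&dir/*.sas" filename=fname truncover;
   input line $char200.;
   if upcase(substr(left(line),1,7))="*<DESC>" then do;
      program=fname;
      * remove the tag and everything from the first semicolon onward;
      description=strip(scan(substr(left(line),8),1,";"));
      output;
      end;
run;
%mend;
* example invocation against a hypothetical reuse library folder;
%extract_desc(dir=/sas/reuse/library);
```

Because ordinary comments lack the tag, they are skipped by the parser and remain visible only within the programs themselves.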

A reuse library can include not only code but also templates such as common software requirements that are regularly implemented within code. For example, each time a data set is referenced in Base SAS, multiple assumptions must be met, such as the existence of the data set, accessibility of the data set, and valid structure of the data set. Threats exist when these assumptions fail to be met; a list of common threats to the DATA step is enumerated in the “Specific Threats” section in chapter 6, “Robustness.” By identifying and preserving these threats as threat templates within a reuse library, SAS practitioners can reuse these templates to identify threats in future software products.

Less common residents of reuse libraries include test data sets used in test plans to demonstrate that software meets functional and performance requirements. By formalizing common test data sets, test plans can reference these test data sets, thus ensuring that they adequately cover necessary use cases for data, and that developers do not waste time recreating test data sets. Moreover, if vulnerabilities are later discovered in test data sets, such as the identification of software use cases that were not adequately represented, all software that relied on those data sets for testing can be retested to ensure they still meet test requirements when the additional test data are utilized.
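As a sketch of a formalized test data set, the following step creates edge cases that a character-processing module such as %GOBIG would need to handle. The libref TESTLIB and the data set name are hypothetical; for illustration TESTLIB points to WORK, whereas in practice it would point to a shared, read-only folder within the reuse library.

```sas
* for illustration TESTLIB points to the WORK folder, whereas a real reuse;
* library would use a shared, read-only folder of test data sets;
libname testlib "%sysfunc(pathname(work))";
* edge cases include a one-character variable name, mixed case, a value;
* already in uppercase, a missing character value, and a numeric variable;
data testlib.char_edge_cases;
   length a $20 longer_name $20 n 8;
   infile datalines dsd truncover;
   input a $ longer_name $ n;
   datalines;
mixed Case,ALREADY UPPER,1
,lower only,2
;
run;
```

Test plans for any character-processing module can then reference TESTLIB.CHAR_EDGE_CASES, and if a missing use case is later discovered, adding it here extends the coverage of every plan that relies on the data set.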

Reuse Catalog

A reuse catalog is “a set of descriptions of assets with a reference or pointer to where the assets are actually stored.”3 When the asset (typically a software module) is stored in a reuse library, the reuse catalog provides a brief description of the intent and content of the module and enumerates every instance in which the module is implemented within software. Reuse catalogs often contain basic information, including:

  • Code module name
  • Description
  • Requirements (if module is coupled with other modules or data)
  • Inputs (if coupled to parameters or data sets)
  • Vulnerabilities
  • Authors
  • File location
  • Creation date
  • Modification date
  • Test date
  • Program file checksum
  • List of all programs or modules that utilize the module
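Assuming the catalog is kept as SAS data sets rather than in separate knowledge management software, one possible layout splits it into a module table holding the fields listed above and a usage table holding one observation per module and calling program, so the list of users can grow without restructuring. The data set and variable names below are hypothetical.

```sas
* hypothetical REUSE_CATALOG data set with one observation per module;
data reuse_catalog;
   length module $32 description $200 requirements $200 inputs $200
      vulnerabilities $200 authors $100 location $256
      created modified tested 8 checksum $64;
   format created modified tested date9.;
   call missing(of _all_);
   * STOP ends the step before any observation is written;
   stop;
run;
* hypothetical REUSE_USAGE data set with one observation per module and;
* calling program, so every program affected by a change can be listed;
data reuse_usage;
   length module $32 program $256;
   call missing(of _all_);
   stop;
run;
```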

Software creation dates, modification dates, and test dates lend credibility to code modules by demonstrating adherence to the SDLC and testing methods. Combined with a description of the software and known vulnerabilities (if included), this information enables SAS practitioners to search and investigate existing code modules quickly to determine not only their relevance but also their relative quality and risk. The optional checksum utilizes a hash function to create a cryptographic representation of the SAS program file. In addition to inspecting modification and test dates, SAS practitioners can further compare checksum values between the software reuse catalog and other copies of the code that may exist, thus further validating software integrity. This quality control method is discussed in the “Checksums” section in chapter 11, “Security.”

One of the most important aspects of the reuse catalog is the comprehensive listing of all SAS software programs that implement each code module. This listing not only provides a metric by which primary software reuse can be calculated within a team or organization, but also ensures that when a module must be modified, all known uses are identified. Without this record, SAS practitioners might modify a core software module for one purpose and program, yet disrupt that module's prior function in unrelated software products. Thus, by tracking all uses (and reuses) of shared and centrally managed software modules, SAS practitioners can ensure that modifications maintain backward compatibility. And again, with thorough and standardized commenting within software modules, reuse libraries and catalogs can (and should) be generated automatically through programs that ingest and parse SAS program files.
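Assuming every program that reuses a module invokes it by name, the usage list can be generated rather than maintained by hand. The hypothetical %FIND_USES macro below scans every .sas file in a folder for the text of an invocation (a percent sign, the macro name, and an opening parenthesis), assuming the host supports wildcards in INFILE paths. Because it is a plain text search, an invocation that appears only inside a comment is also counted.

```sas
* hypothetical module that lists every SAS program in DIR invoking MODULE;
%macro find_uses(dir= /* folder holding SAS program files */,
   module= /* macro name, without the percent sign */,
   out=uses /* data set created, one observation per program */);
data &out (keep=program);
   length fname program $256 line $500 target $40;
   * wildcards in INFILE paths are assumed to be supported by the host;
   infile "&dir/*.sas" filename=fname truncover;
   input line $char500.;
   * the search target is a percent sign, the module name, and a parenthesis;
   target=cats('%',"&module",'(');
   * modifiers i and t ignore case and trim trailing blanks;
   if find(line,target,'it') then do;
      program=fname;
      output;
      end;
run;
* a program that invokes the module more than once is listed only once;
proc sort data=&out nodupkey;
   by program;
run;
%mend;
* example invocation against a hypothetical folder of production programs;
%find_uses(dir=/sas/software, module=ingest_format);
```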

REUSABILITY

Software reuse can make the software development process more efficient because redundant software design, development, and testing are eliminated. Reuse, however, too often occurs in an unplanned manner without central management. For example, chunks of software might be internally plundered from other existing software but, rather than a single copy of the software being maintained within a central repository, no record is kept to link the two uses. If improvements or other modifications are made to one copy, they won't benefit its twin, and divergence creeps into the environment, thwarting maintainability.

Because software reuse involves and can benefit multiple software products, reuse is more commonly implemented at the team or organizational levels. Thus, reuse artifacts such as the reuse library or reuse catalog may be adopted as a best practice within an organization to ensure that SAS practitioners both register reusable code modules and investigate the reuse library (to see if any existing modules would be beneficial) before beginning to develop new software.

Reusability principles further ensure that, to the extent possible, software modules can be efficiently reused within either the original program or future software. By infusing these principles into software design and development, and in coordination with reuse artifacts, SAS practitioners can author software that will be more likely to be reused without software modification. These principles incorporate other static performance attributes and are discussed in the following sections.

Flexibility

Reusable code must be flexible—rigid, hardcoded SAS software is more difficult if not impossible to reuse. Within Base SAS, the SAS macro language most commonly facilitates flexibility by enabling macro code that dynamically writes Base SAS code. Code that writes code for you—that sounds like an idea that those lazy, lava-fearing tourists could get behind!

Consider the scenario in which a developer needs to convert all character variables in a specific data set to uppercase. The Sample data set is created in the following code, which includes character variables Char1 and Char2, after which a hardcoded DATA step manually converts the variables to uppercase:

data sample;
   length char1 $20 char2 $20 num1 8 num2 8;
   char1="I love SAS";
   char2="SAS loves me";
run;
data uppersample;
   set sample;
   char1=upcase(char1);
   char2=upcase(char2);
run;

The code is straightforward, sufficient and, at least on this scale, efficient to write. But, as the number of variables increases, developers might want to implement a more scalable solution that dynamically converts all variables to uppercase. A dynamic solution, moreover, would be valuable if the variable names were subject to change. And, central to reusability, a dynamic solution would enable future software products to utilize the module irrespective of the number or names of variables or the name of the data set.

The %GOBIG macro dynamically determines all variable types and converts character variables to uppercase.

%macro gobig(dsnin= /* old data set in LIB.DSN or DSN format */,
   dsnout= /* updated data set in LIB.DSN or DSN format */);
%local dsid;
%local vars;
%local vartype;
%local varlist;
%local i;
%local close;
%let varlist=;
%* collect the names of all character variables in the input data set;
%let dsid=%sysfunc(open(&dsnin,i));
%let vars=%sysfunc(attrn(&dsid, nvars));
%do i=1 %to &vars;
   %let vartype=%sysfunc(vartype(&dsid,&i));
   %if &vartype=C %then %let varlist=&varlist %sysfunc(varname(&dsid,&i));
   %end;
%let close=%sysfunc(close(&dsid));
%put VARLIST: &varlist;
data &dsnout;
   set &dsnin;
   %* generate one UPCASE assignment for each name in VARLIST;
   %let i=1;
   %do %while(%length(%scan(&varlist,&i,,S))>0);
      %scan(&varlist,&i,,S)=upcase(%scan(&varlist,&i,,S));
      %let i=%eval(&i+1);
      %end;
run;
%mend;
%gobig(dsnin=sample, dsnout=uppersample);

The %GOBIG macro can be run on any data set, irrespective of the quantity and names of variables. This represents a much more flexible solution, but it still can be improved by infusing additional reusability principles, as demonstrated in the following sections.

Modularity

The smaller the module, the more likely it is to be reused. One of the central principles of modular software design is discrete functionality: the objective that a module should do one and only one thing. This principle indirectly limits the size of the modules, because one thing can only occupy so much space. Moreover, discrete functionality maximizes module generalizability because a smaller component can be flexibly placed into more software.

The %GOBIG macro is somewhat modular because it can be removed from SAS code, saved to a SAS program file, and included via the SAS Autocall Macro Facility or an %INCLUDE statement. However, its functionality is not discrete because %GOBIG opens an I/O stream to the data set, creates a list of character variables, closes the stream, and finally performs the DATA step. The variable-listing functionality could be further decomposed into a separate module that only creates a list of all character variables within a data set. This module could subsequently be used to populate the variables in a DROP or KEEP statement dynamically.

To improve modularity by making %GOBIG more functionally discrete, the following code uncouples the function that identifies all character variables from the function that converts them to uppercase. Moreover, flexibility can also be improved by enabling &VARLIST to be created with all numeric variables, all character variables, or all variables of any type:

* creates a space-delimited global macro variable VARLIST of variable names in data set DSN;
%macro findvars(dsn= /* data set in LIB.DSN or DSN format */,
   type=ALL /* ALL, CHAR, or NUM to retrieve those variables */);
%local dsid;
%local vars;
%local vartype;
%global varlist;
%let varlist=;
%local i;
%let dsid=%sysfunc(open(&dsn,i));
%let vars=%sysfunc(attrn(&dsid, nvars));
%do i=1 %to &vars;
   %let vartype=%sysfunc(vartype(&dsid,&i));
   %if %upcase(&type)=ALL or (&vartype=N and %upcase(&type)=NUM) or
         (&vartype=C and %upcase(&type)=CHAR) %then %do;
      %let varlist=&varlist %sysfunc(varname(&dsid,&i));
      %end;
   %end;
%let close=%sysfunc(close(&dsid));
%mend;
* dynamically changes all character variables in a data set to upper case;
%macro gobig(dsnin= /* old data set in LIB.DSN or DSN format */,
   dsnout= /* updated data set in LIB.DSN or DSN format */);
%local i;
%findvars(dsn=&dsnin, type=CHAR);
data &dsnout;
   set &dsnin;
   %let i=1;
   %do %while(%length(%scan(&varlist,&i,,S))>0);
      %scan(&varlist,&i,,S)=upcase(%scan(&varlist,&i,,S));
      %let i=%eval(&i+1);
      %end;
run;
%mend;
%gobig(dsnin=sample, dsnout=uppersample);

The %GOBIG macro invocation presumably occurs within some larger body of code—a parent process that calls %GOBIG as its child. Thus, an added benefit of the increased modularity is backward compatibility, since the macro invocation is identical to that in the previous “Flexibility” section. The functionality of the %FINDVARS macro has also been improved and now reads the TYPE parameter to determine whether all character, all numeric, or all variables of any type will be saved to the &VARLIST macro variable.

The second principle of modular design specifies that software modules should be loosely coupled. The %GOBIG macro, however, includes a DATA step that restricts the ways in which the module can be implemented. For example, consider that the initial objective of the software was not only to convert character variables to uppercase but also to perform other functions within a DATA step. The following DATA step simulates additional functionality by arbitrarily assigning NUM1 the value of 99:

data uppersample;
   set sample;
   char1=upcase(char1);
   char2=upcase(char2);
   num1=99;
run;

Because the current %GOBIG module includes its own DATA step, the num1=99 statement cannot be placed inside it and must instead be performed in a separate, gratuitous DATA step. In other words, any macro that includes a DATA step cannot be run from within another DATA step. The following code demonstrates the lack of loose coupling and the gratuitous DATA step:

%gobig(dsnin=sample, dsnout=uppersample);
data uppersample;
   set uppersample;
   num1=99;
run;

A more efficient and less coupled solution would enable %GOBIG to function anywhere, removing its dependency on the DATA step. This is demonstrated in the following %GOBIG macro, which now dynamically creates UPCASE statements rather than performing a DATA step.

* dynamically changes all character variables in a data set to upper case;
%macro gobig(dsn= /* old data set in LIB.DSN or DSN format */);
%local i;
%findvars(dsn=&dsn, type=CHAR);
%let i=1;
%do %while(%length(%scan(&varlist,&i,,S))>0);
   %scan(&varlist,&i,,S)=upcase(%scan(&varlist,&i,,S));
   %let i=%eval(&i+1);
   %end;
%mend;
data sample;
   length char1 $20 char2 $20 num1 8 num2 8;
   char1="I love SAS";
   char2="SAS loves me";
run;
data uppersample;
   set sample;
   %gobig(dsn=sample);
   num1=99;
run;

This revised code illustrates a much more flexible solution that supports reusability principles. Because of their versatility, both the %GOBIG and %FINDVARS macros could be saved as SAS program files and included within reuse libraries for subsequent use in software products. In a production environment, exception handling would likely be included to support more robust execution, as demonstrated in the “Exception Handling Framework” section in chapter 6, “Robustness.”

Readability

Software readability improves reusability because software that is easily understood is more likely to be implemented. When SAS practitioners are confused about the intent, logic, or vulnerabilities of syntax, they will be less likely to reuse it and more likely to find or develop another solution. As code is made more modular, software inherently becomes smaller and more comprehensible. However, when modules are updated, as simulated by the successive modifications made to the %GOBIG macro in the “Modularity” section, software comments should be updated as well. Note that the last two instances of the %GOBIG macro contained identical headers, yet the first instance performed a DATA step while the second only generated dynamic code. Two entirely different outcomes unfortunately share the same comment:

* dynamically changes all character variables in a data set to upper case;

Readability should be improved with additional comments that clarify how each version applies the UPCASE function: coupled to its own DATA step, or uncoupled so that it must be invoked from within a parent DATA step. Furthermore, as development and testing reveal vulnerabilities in software modules, these should be articulated in the code as well so that those seeking to reuse it can gain insight into its anticipated quality. Readability can also be improved through automated parsers that ingest standardized comments (such as program headers) within software, as described earlier in the “Reuse Library” section and as demonstrated in the “Automated Comment Extraction” section in chapter 15, “Readability.” When in doubt, it is always better to include no software comments than to include outmoded or incorrect comments that decrease software readability by confusing stakeholders.

Testability

Software testing is intended to demonstrate that software meets established customer needs and requirements. Moreover, it can expose defects and vulnerabilities, allowing developers to decide whether to eliminate or mitigate risk of software failure. Software that is thoroughly tested and that has been documented through a formalized test plan is more reusable because SAS practitioners who seek to use the software will understand the degree of quality to which it was written as well as threats to its functionality or performance.

This is not to say that all or any vulnerabilities must be resolved, only that their identification furthers software reuse. For example, within the %FINDVARS macro in the “Modularity” section, the OPEN function will fail if the data set is exclusively locked by another user or SAS session. To eliminate the risk of failure due to a locked data set, the lock status would need to be tested, passed through a return code to the %GOBIG macro, and in turn passed to the parent DATA step, thus requiring an extensive exception handling framework to detect and eliminate this single threat.
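As a minimal sketch of only the first link in that chain, the hypothetical %CANOPEN macro below (the macro name and its CANOPENRC return code are assumptions, not part of the original code) relies on the OPEN function returning 0 when a data set cannot be opened. Rather than querying lock status directly, it treats any OPEN failure, whether from an exclusive lock or a missing data set, as a failure and records the outcome in a global macro variable that a parent process could inspect:

```sas
/* hypothetical macro: tests whether a data set can be opened for input;
   CANOPENRC is an assumed global return code: 0 = opened, 1 = not opened */
%macro canopen(dsn= /* data set in LIB.DSN or DSN format */);
%local dsid close;
%global canopenrc;
%let dsid=%sysfunc(open(&dsn,i));
%if &dsid=0 %then %do;
   /* OPEN returns 0 when it fails, for example when the data set
      is exclusively locked or does not exist */
   %let canopenrc=1;
   %put CANOPEN: &dsn could not be opened;
   %end;
%else %do;
   %let canopenrc=0;
   %let close=%sysfunc(close(&dsid));
   %end;
%mend;

data test;
   length char1 $10;
   char1='taco';
run;
%canopen(dsn=test);
%put CANOPEN: test --> &canopenrc;
%canopen(dsn=work.no_such_data_set);
%put CANOPEN: work.no_such_data_set --> &canopenrc;
```

A parent such as %GOBIG could then check &CANOPENRC before calling %FINDVARS; passing that result on to the parent DATA step would still require the broader exception handling framework described above.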

If the original requirements that led developers to build the %FINDVARS module greatly prioritized function over performance, thus demanding neither reliability nor robustness, this additional exception handling would not be warranted because it would provide no added business value. For example, if the code were intended to be run by end-user developers who could quickly remedy a common failure (such as a locked data set), then this increased robustness would be wasted. Regardless of the degree of performance required by software, however, positive and negative test cases could still be created to validate module functionality.

For example, because the %FINDVARS macro is now functionally discrete, a unit test can validate the TYPE parameter when set to CHAR. The following positive unit test validates partial functionality of %FINDVARS:

* TEST FINDVARS: test TYPE=CHAR with one variable;
data test;
   length char1 $10;
   char1='taco';
run;
%findvars(dsn=test, type=char);
%put FINDVARS: char1 --> &varlist;

The output demonstrates that the variable CHAR1 was discovered as expected. Even if the macro were later refactored to improve performance or repurposed to provide additional functionality, the same test case could be reused to show consistent, correct functionality. In more robust macros that include exception handling (for example, quality controls that validate macro parameters), negative unit tests should also be incorporated. A negative test might invoke %FINDVARS with the TYPE parameter set to the invalid value “other” to elicit a return code demonstrating the failure. Unit and other tests are discussed throughout chapter 16, “Testability.”
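To make that negative test concrete, the following sketch assumes a hypothetical version of %FINDVARS that validates TYPE before opening the data set and reports the outcome through an assumed global return code, FINDVARSRC (0 for success, 1 for an invalid TYPE). Neither the validation nor FINDVARSRC exists in the %FINDVARS macro from the “Modularity” section:

```sas
/* hypothetical FINDVARS variant with TYPE validation;
   FINDVARSRC is an assumed global return code: 0 = success, 1 = invalid TYPE */
%macro findvars(dsn= /* data set in LIB.DSN or DSN format */,
   type=ALL /* ALL, CHAR, or NUM to retrieve those variables */);
%local dsid vars vartype i close;
%global varlist findvarsrc;
%let varlist=;
%let findvarsrc=0;
%let type=%upcase(&type);
/* reject any TYPE value other than ALL, CHAR, or NUM before touching the data set */
%if &type ne ALL and &type ne CHAR and &type ne NUM %then %do;
   %let findvarsrc=1;
   %put FINDVARS: invalid TYPE value &type, expected ALL, CHAR, or NUM;
   %return;
   %end;
%let dsid=%sysfunc(open(&dsn,i));
%let vars=%sysfunc(attrn(&dsid,nvars));
%do i=1 %to &vars;
   %let vartype=%sysfunc(vartype(&dsid,&i));
   %if &type=ALL or (&vartype=N and &type=NUM) or
         (&vartype=C and &type=CHAR) %then %do;
      %let varlist=&varlist %sysfunc(varname(&dsid,&i));
      %end;
   %end;
%let close=%sysfunc(close(&dsid));
%mend;

* TEST FINDVARS: negative test, TYPE=OTHER should fail validation;
data test;
   length char1 $10 num1 8;
   char1='taco';
run;
%findvars(dsn=test, type=other);
%put FINDVARS: rc=&findvarsrc varlist=&varlist;
```

When run, the %PUT statement should show rc=1 and an empty VARLIST, because %RETURN exits the macro before the OPEN function is ever called. The positive test shown above would still pass against this version, since valid TYPE values follow the original logic.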

Stability

I once had a team of developers waiting for a SAS module I was completing so that it could be included in their respective programs. We were collectively developing a comprehensive ETL infrastructure to process vast amounts of data, and my module was a self-contained quality control that would facilitate higher data quality. Because of the diversity of data being processed, the module was being built to be used by multiple programs. With software reuse imminent, I was aware of and focused on reusability principles, including software testing and testability.

I finished the code and did some preliminary testing, but in the interest of expediting the delivery of the code to coworkers, I checked it into our central repository before all testing was complete. Thereafter, I continued to test the module and of course was forced to make a substantive change—the generation of an additional return code. When I replaced the preliminary code with the thoroughly tested code, my team wasn't pleased because my subtle modification spurred subsequent modifications that each developer had to make in his respective software.

Code stability is essential where software reuse is likely or intended. Whenever possible, modifications to code that has been reused should ensure backward compatibility so that existing parent processes that rely on a software module will not have to be modified when a child process requires maintenance. Without code stability, modifications will undermine software reuse and reusability in software—SAS practitioners will be leery of reusing centralized code modules for justifiable fear of their instability. Moreover, if reuse is not supported within an environment, reusability principles may add little value to software.

Stability is increased when formal SDLC procedures are in place. In my premature release of code, all software testing was ad hoc, as no formalized test plan or test cases existed. Had a formalized test plan been established, accepted by stakeholders during software planning, and included in requirements specifications, I would have had a concrete enumeration of boxes to check before even considering peer review or release of the code into our central repository. Without these quality controls in place, however, it's much easier for even well-meaning SAS practitioners to circumvent best practices and hastily deliver a solution that may lack sufficient quality and require later (or even imminent) modification that thwarts stability.

FROM REUSABILITY TO EXTENSIBILITY

Software reuse is often epitomized as whole software modules that can be plucked effortlessly from a reuse library or other repository and dropped into (or referenced by) software with no modification to the code. In fact, where the %INCLUDE statement or the SAS Autocall Macro Facility is used to reference SAS program files or macros, reuse requires only a pointer to the software module and is nearly effortless. This archetypal reuse also provides security: because the module is maintained centrally in a secure location, it cannot be accidentally modified, so the integrity established by testing carries over to its reuse in later software.

In other cases, however, reuse resembles pillaging and plundering, as existing code is picked apart and only partially absorbed into new software. While this still constitutes code reuse, it is more commonly distinguished as software repurposing, in which functionality is modified, advanced, or extended. For example, consider an initial %FINDVARS macro intended solely to retrieve the list of variables in a data set:

%macro findvars(dsn= /* data set in LIB.DSN or DSN format */);
%local dsid;
%local vars;
%local vartype;
%global varlist;
%let varlist=;
%local i;
%let dsid=%sysfunc(open(&dsn,i));
%let vars=%sysfunc(attrn(&dsid, nvars));
%do i=1 %to &vars;
   %let vartype=%sysfunc(vartype(&dsid,&i));
   %let varlist=&varlist %sysfunc(varname(&dsid,&i));
   %end;
%let close=%sysfunc(close(&dsid));
%mend;

When executed, the %FINDVARS macro saves the names of all variables in the parameterized data set to the global macro variable &VARLIST, which the %PUT statement then writes to the log:

data mydata;
   length char1 $10 char2 $10 num1 8;
run;
%findvars(dsn=mydata);
%put VARLIST: &varlist;
VARLIST: char1 char2 num1

However, if a subsequent software product requires similar functionality but also needs to differentiate between character and numeric variable types, a solution could be developed that repurposes the module while maintaining backward compatibility with existing uses. The repurposed %FINDVARS module is demonstrated in the “Modularity” section earlier in the chapter. Because the default value for the TYPE parameter is ALL, the updated module can be invoked with or without the TYPE parameter. Repurposing software while ensuring backward compatibility facilitates maintainability, because only one version of the software needs to be maintained.

The ability to repurpose software for added functionality describes software extensibility, a component of software reuse often described as a separate performance characteristic. Because of the modular, structured, dynamic nature of the original %FINDVARS macro in this section, it was easily transformed with a couple of changes to deliver the additional functionality. In the next sections, extensibility is described further, including methods that can facilitate divergent reuse of software.

Defining Extensibility

Extensibility is defined by the ISO as a synonym of extendibility, “the ease with which a system or component can be modified to increase its storage or functional capacity.”4 Only the second sense, increasing functional capacity, is discussed here, demonstrating the repurposing of software to extend functionality. Thus, extensibility builds upon an existing code base to meet new or shifting needs and requirements. In some cases, additional functionality can be incorporated into existing software when backward compatibility can be achieved. In other cases, extensibility represents not only reuse of some existing code but also a further divergence from that code to produce additional functionality.

Extensibility can be viewed as software dynamism with an eye to the future. Code reusability principles aim to make software flexible so it can deliver the same functionality in multiple situations or environments and, relative to data analytic environments, to diverse and possibly unpredictable data injects. Code extension, rather, delivers new or variable functionality to meet new objectives or functional requirements. Thus, extensibility principles aim to create software nimble enough to be repurposed through minimal or sometimes no change to the existing code base.

Facilitating Extensibility

Extensibility doesn't require you to predict how software will be repurposed in the future, but imagination does benefit this endeavor. Since extensibility is often difficult to pin down, an analogy captures the principal motivation. A friend of mine built a three-story townhome a couple of years ago, and it was exciting to watch him work through the design process as each week he made choices about flooring, lighting, cabinetry, and the general design of the house. In the basement, the builders gave him the choice of installing an optional bathroom, an optional kitchenette, an optional master suite, or a default “recreation room” that was an undefined rectangular space.

He chose the rec room because it was the most extensible option. Despite its simplicity, the rec room had been designed so that homeowners could easily customize or upgrade the room in the future. In one closet, supply and drainage plumbing had been run where a toilet and shower could eventually be installed. On a separate wall, 30-amp electric service had been installed, just in case a washer/dryer unit was relocated from the first floor to the basement. And, in one corner, gas, plumbing, and electrical connections could someday service a kitchenette.

In fact, without substantial construction, a bathroom, bedroom, and kitchen could be added so that the entire basement could be inhabited or rented as a separate dwelling. The builders had designed and built with extensibility in mind, employing a little extra cost and labor up front so that the homeowners could more easily modify and repurpose the space in the future. Rather than having to tear up floors to run PVC or walls to run gas and electric lines, tremendous future functionality—including functionality imagined yet not selected or prescribed—could be added with minimal effort.

In this scenario, the homebuilders had an advantage because they not only knew what common upgrades homeowners were likely to request but also had already drawn up architectural designs for several options. In software development, SAS practitioners might not know the specific scope or extent of all future functionality or performance of software, but typically have the experience to be able to anticipate ways in which software might be repurposed or extended in the future.

For example, if a rudimentary ETL process is designed but does not include a quality control module to clean ingested data, this represents a likely candidate for future functional extension if the process becomes more critical over time. Similarly, if a rudimentary child process doesn't initially require robustness, the creation of a placeholder return code—one that initializes but doesn't assign or validate the global macro variable—could facilitate future performance extension. In either case, the subtle hint of possible future functionality or performance provides some basic substrate upon which later improvements can be constructed.
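A minimal sketch of such a placeholder, assuming a hypothetical child macro named %LOADDATA and an assumed return code named LOADDATARC: the macro creates and initializes the global macro variable so that parent processes can already reference it, but nothing yet assigns it a meaningful value or validates it.

```sas
/* hypothetical child process with a placeholder return code;
   LOADDATARC is created and initialized to blank, but no exception
   handling assigns or validates it yet */
%macro loaddata(dsnin= /* input data set in LIB.DSN or DSN format */,
   dsnout= /* output data set in LIB.DSN or DSN format */);
%global loaddatarc;
%let loaddatarc=;
data &dsnout;
   set &dsnin;
run;
%mend;

data sample;
   length char1 $20;
   char1="I love SAS";
run;
%loaddata(dsnin=sample, dsnout=loaded);
%put LOADDATARC: [&loaddatarc];
```

Because the return code already exists, exception handling could later be added inside %LOADDATA to assign LOADDATARC without changing how parent processes invoke the macro.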

WHAT'S NEXT?

Identification, valuation, and implementation of software quality.

Quality Identification. Multiple dimensions of software quality, representing both dynamic and static performance characteristics, have been described and demonstrated. An awareness and understanding of these respective dimensions is the critical first step toward advancing software quality; the next step is identification. Understanding the software quality landscape and nomenclature facilitates the identification and differentiation of software performance, providing a structured quality model—a lens through which to assess and measure software quality characteristics. With this knowledge—including critical yet complex pairings such as reliability and robustness, speed and efficiency, and modular and monolithic design—stakeholders can identify the degree to which software performance characteristics have been included or excluded, not only in software but also in software requirements.

Quality Valuation. The value of software quality is never assessed in a vacuum, but rather against competing priorities of project scope (including functionality), schedule, and cost. Especially in data analytic development environments, software quality must also be weighed against data quality and data product quality. Whereas the identification of software performance leans toward objectivity, asking the question Is this software reliable?, the valuation of software performance asks the more subjective (and substantive) question: How reliable should this software be made? Valuation forces stakeholders to acknowledge the benefits of software performance inclusion and the risks of software performance exclusion, and to prescribe realistic technical requirements that specify what functional and performance objectives software should achieve.

Quality Implementation. You're aware of software quality. You can identify dimensions of software quality. You've prioritized software quality—functionality and performance—into software requirements. Implementation of quality is the final step—that is, software development that achieves and measurably demonstrates functional and performance objectives. While the focus of this text has been on demonstrating a software product quality model to identify and advance these objectives, brief forays have introduced software development best practices (such as Agile development methodologies, risk management, and the SDLC) and software development artifacts (such as risk registers, failure logs, reuse libraries, reuse catalogs, test plans, and test data) that can further facilitate achieving software quality objectives. Software quality can and should be allowed to flourish in any environment.

NOTES
