Chapter 9. Test Execution

Executing the test plan.

image

At this stage, the test team has addressed test design (Chapter 7) and test development (Chapter 8). Test procedures are now ready to be executed to exercise the application-under-test (AUT). Also, as discussed in Chapter 8, test environment setup planning and implementation were addressed, consistent with the test requirements and guidelines provided within the test plan.

With the test plan in hand and the test environment now operational, it is time to execute the tests defined for the test program. When carrying out these test procedures, the test team will need to comply with a test procedure execution schedule, as discussed in Chapter 8. The test procedure execution schedule implements the strategy defined within the test plan. Plans for unit, integration, system, and user acceptance testing are executed. Together, these testing phases make up the steps that test the system as a whole.

After test execution ends, the test results need to be evaluated. Section 9.1 covers test outcome evaluation procedures and their documentation. These procedures describe the steps that should be completed after a test has been executed. For example, the actual result (outcome) of execution is compared with the expected result. Commonly, during any test phase, many discrepancies between expected and actual results are identified. Not every discrepancy, however, indicates a problem with the AUT. Instead, a problem could relate to the test data or other items unrelated to the AUT. Section 9.1 therefore outlines what to look for in order to avoid these false negatives.

Once the test team has determined that a discrepancy between the expected result and actual result derives from a problem with the AUT, a defect or software problem report (SPR) is generated. Ideally, the SPR is documented and tracked within a defect tracking tool. Section 9.2 addresses defect tracking and also describes the characteristics to look for within a defect tracking tool and how the defect tracking tool can be used.

After a test procedure has been executed, the test team must undertake additional administrative activity. For example, test procedure execution status needs to be documented and maintained. The test team needs to be able to identify the status of test progress. Status reporting includes the ability to determine whether any particular test procedure has been executed, and if so, whether the test passed or failed. Section 9.3 covers test program status reporting.

To be able to assess the quality of test progress, the test team needs to collect and analyze various measurements. The ratio of the number of defects identified to the number of test procedures executed provides the test team with a measurement of the effectiveness of the test effort. In some cases when a high defect rate is observed, adjustments to the test schedule or test plan may become necessary.

As discussed in Section 9.3, the collection and evaluation of metrics is an activity associated with test program execution and status reporting. The metrics that receive extra attention during the test execution phase include test acceptance criteria, earned value/progress measurements, quality of application, and quality of the test life-cycle process. In addition to tracking test execution status, the test team must clarify when test execution should begin and when test execution has been completed. Test execution generally concludes when the test team verifies that the defined acceptance criteria have been satisfied, as outlined within the test plan.

Implementation of the ATLM constitutes the adoption of a set of best practices. Appendix E summarizes recommendations and suggestions that constitute a set of best practices for the development and execution of automated test procedures. These best practices are aimed at helping the test team avoid the kind of test program missteps that consume test engineer time and increase test program effort.

9.1 Executing and Evaluating Test Phases

At this stage, the test team is ready to execute and evaluate the test procedures, as defined in the test plan for each of the various test phases. It therefore implements and analyzes the results of integration, system, and user acceptance testing. The primary input for each test phase is the associated suite of test procedures. The output of each test phase consists of the achieved or modified acceptance criteria, as defined in the test plan. Software problem reports are documented throughout each test phase, fixes are implemented and documented, and automated test scripts are baselined in the integration test phase and later reused in the system test phase. Figure 9.1 depicts a detailed test execution flow that incorporates the tracking of software problem reports.

Figure 9.1. Test Execution Flow

image

The level of formality required for each test phase depends on the particular organization of the test team or the specifications detailed in customer or end-user requirements. The test team needs to clearly identify all required test program documentation within the test plan and then ensure that the documentation is produced according to schedule. Test program documentation may need to comply with specific industry or regulatory standards. These standards, when imposed, specify the level of detail for which the documentation must be produced.

9.1.1 Unit Test Execution and Evaluation

Unit tests should be performed in accordance with the test plan and should remain consistent with the detailed development schedule, as discussed in Chapter 8. Test procedures should consist of inputs and expected results to facilitate automated checking of test outcomes. At the white-box testing level, test procedures focus on the smallest collection of code that can be usefully tested. Because unit testing requires a detailed understanding of the code, it is generally more efficient to have application developers execute the unit tests than to have independent test engineers perform these tests. During unit testing, test engineers formally assigned to perform system-level testing can help by documenting and witnessing unit tests. This involvement allows developers to focus their attention on developing the utilities or tools needed to automate and execute the unit tests.

Where possible, an individual other than the developer responsible for a unit of code should execute tests on the particular unit of code. Such a person will perform testing more objectively because the code will be new to him or her and because developers (and people in general) are often blind to their own mistakes. It is also best to execute unit tests shortly after the code has been created and before any code integration has occurred. This approach will allow software defects to be found early and will help prevent such errors from being carried forward.

Also during unit testing, static analysis can be performed. For example, when the development team creates applications in C++, tools such as PC Lint can perform static analysis of the code to look for anything suspect. For more information on the various tools described in this section, refer to Appendix B.

During the unit testing phase, code profiling can be performed. Many profiling tools are available, including Visual Quantify, which carries out runtime profiling. Traditionally, profiling is a tuning process that determines, for example, whether an algorithm is inefficient or a function is being called too frequently. Profiling can discover improper scaling of algorithms, instantiations, and resource utilization. For instance, code profiling activities may identify the need to change from a bubble sort to a quick sort. In other cases, profiling can highlight a slow search routine in a third-party library or redundant API calls that cause performance delays. These problems may remain invisible during initial implementation but can cause catastrophic failures in production.
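To make the idea concrete, the following sketch uses Python's standard cProfile module on a deliberately inefficient sort; the routine names are hypothetical, and a commercial profiler such as Visual Quantify reports the same kind of call-count and timing data.

```python
# A minimal profiling sketch (hypothetical routine names); it illustrates the
# kind of tuning data a profiler reports: call counts and cumulative time.
import cProfile
import random

def bubble_sort(items):
    """Deliberately inefficient O(n^2) sort used to make the hot spot obvious."""
    data = list(items)
    for i in range(len(data)):
        for j in range(len(data) - 1 - i):
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
    return data

def generate_report(accounts):
    """Hypothetical unit under test: sorts accounts before reporting."""
    return bubble_sort(accounts)

if __name__ == "__main__":
    accounts = [random.randint(0, 100000) for _ in range(3000)]
    # The profile output shows bubble_sort dominating the runtime, which is
    # the cue to switch to a faster algorithm (e.g., the built-in sort).
    cProfile.run("generate_report(accounts)", sort="cumulative")
```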

Most code development test tools come with their own debuggers. Yet, these debuggers offer little help during unit testing when the individual does not have access to the source code. Several development test tools, such as Purify and Quantify, may pinpoint problems in object files and libraries while providing good stack tracebacks. Purify, for example, can identify memory leaks in the application as well as in third-party components. Quantify may identify resource leaks. The various types of tools that can be employed in support of unit and integration-level testing are noted in Chapter 3.

When unit testing uncovers problems, such defects need to be documented and tracked. Test documentation is more the exception than the rule when it comes to the performance of unit testing. Commonly, an application developer executes a test, notes any defects, and then immediately proceeds with troubleshooting activities to fix the defect, without documenting it. For metric collection and process improvement purposes, development teams may find it worthwhile to record and track defects observed during unit testing. Code and unit test scripts need to be refined to permit this activity. Unit test activities can be documented within a software development folder (SDF) associated with the particular software unit, depending on the required level of formality. Selected unit testing evaluation criteria follow.

Unit Testing Evaluation Criteria List

image

Once the unit test phase is complete, the software application code is baselined and the test team evaluates test results and prepares a test report that summarizes test activities. Table 9.1 gives a sample unit test evaluation report. Test results need to satisfy unit completion criteria, and the test team may need to obtain sign-off from management, end users, and the QA department after each phase before initiating the next test phase.

Table 9.1. Unit Test Evaluation Report

image

9.1.2 Integration Test Execution and Evaluation

Integration testing can be conducted either by developers or by the test group, depending upon the decision made during test planning with regard to the allocation of funding for test activities. Integration testing resembles system testing, but concentrates on the application internals more than system testing does. During integration testing, units are incrementally integrated and tested together based upon control flow. Because units may consist of other units, some integration testing (also called module testing) may take place during unit testing.

Integration test procedures are based on an integration test design addressed in Chapter 7. After tests have been executed, the test team performs a careful analysis (see Section 9.1.3 for further discussion of the detailed evaluation activities). For process improvement purposes, defect reports (that is, SPRs) need to be documented and tracked. Section 9.2 provides further detail on software problem reporting and defect tracking.

The development team must generate software fixes to resolve problem reports, and integration test procedures subsequently need to be refined. When the test team takes responsibility for executing integration tests, the test engineers can enhance the developers’ understanding of system and software problems and help replicate a problem when necessary. Each defect report will be classified in a range of 1 to 4, based upon its degree of priority. Section 9.2 addresses the classification of defects in further detail.

Test engineers may participate in engineering review boards, as applicable, to review and discuss outstanding defect reports. Following the development effort intended to mitigate defect reports, test engineers perform regression tests to verify closure of the problems. During integration testing, some test scripts can be automated successfully for reuse during system testing. Given this reuse opportunity, the automated scripts employed in integration testing need to be baselined following their successful execution. After integration testing ends, the test team prepares a report that summarizes test activities and evaluates test results. End-user approval of the test report constitutes the conclusion of unit and integration testing.

9.1.3 System Test Execution and Evaluation

System testing is another form of integration testing, albeit one conducted at a higher level. During system testing, the test engineer examines the integration of the parts that make up the entire system. System-level tests are usually performed by a separate test team that implements the test procedure execution schedule and the system test plan. These tests may require a large number of individual test procedures to verify all necessary combinations of input, process rules, and output associated with a program function.

9.1.3.1 False Negatives

Once test procedures supporting system test have been executed, the test team compares the expected result for each test procedure with the actual result. If the actual result differs from the expected result, the delta (that is, the discrepancy) must be further diagnosed. A failed test procedure does not necessarily indicate a problem with the AUT. The problem could be a false negative, meaning that the test failed even though no problem exists with the AUT. Incidents of false-negative outcomes can be caused by a necessary change in the application, test setup errors, test procedure errors, user errors, automated test script logic errors, or test environment setup errors. Test environment setup problems, for example, may stem from the installation of the wrong version of the application software.

The test team needs to be able to replicate the problem and must ensure that the problem did not result from a user error. For example, a test engineer might expect a specific data outcome after a test procedure has executed; in reality, this outcome might not be possible unless specific data setup activities have occurred. To prevent this type of user error, the test procedure description should contain the necessary level of detail, and test procedures must be properly verified. See Section 8.1.10 for further information on test procedure reviews.

Additionally, when investigating a failure of an automated test script, test engineers must ensure that the problem does not result from a procedural error. For example, a screen menu item may have been changed based upon user input, but the automated script, which was created against a previous software version, may not reflect the change. The automated script will indicate a failure when executed, but this failure does not reflect a problem with the AUT.

To document the various failures that occur, a table such as Table 9.2 can be generated to facilitate the evaluation of test outcomes. This table outlines the various test outcome evaluation activities that can be conducted and documented for test metric collection purposes.

Table 9.2. Test Outcome Evaluation Activities

image

The results depicted in Table 9.2 can be used to collectively analyze reported problems. For example, when a number of false negatives point to a fault in the test procedure development, an improvement in the test procedure creation process is warranted. Table 9.3 comprises a legend that delineates the reason codes, troubleshooting activity codes, and solution codes possible when developing a table like Table 9.2.

Table 9.3. Test Outcome Report Legend

image

9.1.3.2 False Positives

Even when test execution results match the expected results, the test team must ensure that the results are not based upon false positives, where a test procedure appears to have executed successfully but a problem actually exists with the AUT. The test team needs to stay alert for false positives caused by automated test tools that are not sensitive enough to the nuances of the AUT. Given the possibility of this kind of condition, it is important that test procedure walkthroughs be conducted prior to test execution. In addition to conducting test procedure walkthroughs or peer reviews, the test team should evaluate and spot-check the correctness of the test result, even if the script has passed the first time. If the expected test result does not match the actual test result, because of a problem in the AUT rather than because of a false positive or a false negative, the test team needs to create a software problem report to document the defect.

9.1.4 Test Results Analysis of Regression Tests

When the test group receives a new application baseline, build release notes should accompany the new build. These release notes should address all new functionality added and the defects fixed in the revision. Additionally, once the test team receives the new build, a smoke test should be executed to verify that the major functionality from the previous build still functions properly in the new build. When the smoke test flags discrepancies, the new build should not be accepted for system testing. When the smoke test passes, the new build is accepted for system testing and incremental regression testing is performed.
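The build acceptance gate described above might be expressed as in the following sketch. The smoke checks and build identifier are hypothetical placeholders for whatever major functionality the previous build supported.

```python
# Illustrative smoke-test gate for a new build (hypothetical checks).
# Each check exercises a major function that worked in the previous build;
# any failure means the build is rejected for system testing.

def check_login():            # placeholder for a real UI or API check
    return True

def check_create_account():   # placeholder
    return True

def check_generate_report():  # placeholder
    return True

SMOKE_CHECKS = {
    "login": check_login,
    "create account": check_create_account,
    "generate report": check_generate_report,
}

def accept_build(build_id):
    failures = [name for name, check in SMOKE_CHECKS.items() if not check()]
    if failures:
        print(f"Build {build_id} rejected for system testing; failed: {failures}")
        return False
    print(f"Build {build_id} accepted; proceed with incremental regression testing")
    return True

if __name__ == "__main__":
    accept_build("3.2.0-rc1")
```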

Regression testing can consist of running a specific selection of automated tests that reexercise high-risk and potentially affected areas of code after defects have been fixed. Regression test result analysis ensures that previously working system functionality has not been affected by the software modifications implemented to correct defects. The test team therefore needs to perform regression tests against both the modified code and code that was not changed but could have been affected by the change. When the test engineer encounters a large number of errors associated with functionality that previously worked, it can be inferred that the application developers may have been careless in implementing changes. These findings should be documented as part of the metrics collection process, discussed in Section 9.3.2.

When errors are observed for a system functionality that previously worked, the test team needs to identify other functional areas that are most likely to have an effect on the functionality where the error occurred. Based on the results of such analysis, a greater regression testing emphasis can be placed on the selected functionality. Regression testing is further performed on the problem functionality area to verify closure of the open defects, once the development team has implemented its fixes.

The test team also performs analysis to identify particular components or functionality that are experiencing a greater relative number of problem reports. This analysis may reveal that additional test procedures and test effort need to be assigned to the components. If developers indicate that a particular functional area is now fixed, but regression testing uncovers problems for the particular software, then the test engineer needs to ascertain whether an environment issue is the culprit or whether poor implementation of the software correction is at fault.

Analysis of test results can also confirm whether executed test procedures are adept at identifying errors. This analysis also helps to identify the functionality where most defects have been uncovered and suggests where further test and repair efforts require further focus. The test team may therefore need to consider reallocation of test engineer effort and reassessment of application risk allocation.

System testing is completed once the system acceptance criteria have been met. For more information, see the acceptance criteria metric described in Section 9.3.2.

9.1.5 User Acceptance Test Execution and Evaluation

The test team may need to perform a user acceptance test (UAT) that involves end-user participation. The UAT commonly consists of a subset of the suite of tests performed at the system test level. The specific suite of tests planned must be defined and communicated to the customer or end user for approval. Acceptance testing will be performed in a defined test environment.

Defects observed during UAT are documented via an SPR and assigned a priority rating. SPRs that cannot be readily fixed during the scheduled UAT timeframe may be referred to an engineering review board (ERB) for further evaluation and engineering review. Depending on the user acceptance criteria, system acceptance could be achieved, for example, following resolution of all level 1 and level 2 problem reports.

Following the performance of UAT or any other testing phase, the test team prepares a report that provides a summary of test activities and includes an evaluation of test results. The satisfactory resolution of all level 1 and level 2 SPRs and approval of the test report generally constitute the conclusion of UAT.

Site acceptance tests may be warranted for some tasks and projects, when required by a customer and specified within approved test plans. These tests usually consist of the same set of test procedures and scripts used during UAT, minus any tests that may not apply for a specific site. The same process for resolving software problem reports applies. Following the performance of site testing, the test team may prepare another test report.

9.2 Defect Tracking and New Build Process

Test engineers will need to help developers understand and replicate system and software problems, when necessary. Each defect is commonly classified in a range of 1 to 4 based upon degree of priority. Test engineers will need to participate in ERBs, as applicable, to review and discuss outstanding defect reports. Following development effort to correct identified defects, test engineers perform regression testing on the applicable software to verify closure of problem reports.

Each test team needs to perform problem reporting operations in compliance with a defined process. Typically, the test engineer creates the SPR within a defect tracking system. Following the creation of the SPR, an automatic e-mail notification is forwarded to cognizant members of the configuration management (CM) group and the application development team to advise them that an SPR has been generated. Once the SPR has been corrected and unit testing has been carried out to the satisfaction of the software development team, the new software code is checked in via a software CM tool. Once a number of software problem reports have been corrected, the development team issues a new software build and software updates are made to the test environment.

One action required during defect tracking is the assignment of a level of priority for the defect. The test engineer must assess the importance of the solution to the successful operation of the system. The most critical defects cause the software to fail and prevent the continuation of the test activity. In contrast, high-priority defects need to be fixed soon but generally do not prevent the test activity from continuing. A common classification of defect priority levels follows.

  1. Fatal. Operation of the application is interrupted, and testing cannot be continued.
  2. High priority. A significant problem, but the application is still operational.
  3. Medium priority. The problem has little impact on operation of the application.
  4. Low priority. The problem has no impact on the operation of the application.

Defects that cannot be readily fixed are referred to an ERB for further evaluation and disposition. The ERB may confirm that the SPR is valid and possibly adjust its priority level. In other cases, the ERB may determine that the SPR is not valid and cancel it. An SPR that represents an enhancement will be reclassified as a change request. Figure 9.2 provides an example of a typical defect tracking procedure.

Figure 9.2. Defect Tracking Procedure

image

The documentation and tracking of defects are greatly facilitated by an automated tool. An automated defect tracking tool helps to ensure that reported defects receive the proper attention. Without such a tool, some problem reports may not be assigned to a developer for proper corrective action. In other cases, application developers may inadvertently close defect reports without proper verification by test personnel. Automated defect tracking tools generally support the maintenance of a central defect tracking repository; this repository is accessible by all project team members.

Several basic characteristics can be assessed to determine the value of a defect tracking tool (see Table 9.4 for more details). For example, the tool should be able to perform the following tasks:

• Identify the priority of a defect

• Assign a unique identifier to each defect

• Link each defect to the applicable test procedure as well as to a particular application build

• Log the date on which the defect was reported

• Log the date on which the defect was assigned to an application developer

• Log the date on which the defect was updated

• Identify the developer assigned to the defect

• Identify the test engineer who reported the defect

• Log and track the status of the defect, including values such as new, open, assigned, fixed, retest, and closed

Table 9.4. Defect Tracking Tool Evaluation Criteria

image
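As a rough illustration of the characteristics listed above and of the priority levels defined earlier in this section, a defect record might be modeled as in the following sketch. The field and class names are illustrative only and do not correspond to any particular defect tracking tool.

```python
# A minimal sketch of the defect (SPR) record implied by the characteristics
# listed above; field names are illustrative, not those of any specific tool.
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional

class Priority(Enum):
    FATAL = 1    # operation interrupted; testing cannot continue
    HIGH = 2     # significant problem, but the application is still operational
    MEDIUM = 3   # little impact on operation of the application
    LOW = 4      # no impact on operation of the application

class Status(Enum):
    NEW = "new"
    OPEN = "open"
    ASSIGNED = "assigned"
    FIXED = "fixed"
    RETEST = "retest"
    CLOSED = "closed"

@dataclass
class SoftwareProblemReport:
    spr_id: int                      # unique identifier
    title: str
    priority: Priority
    test_procedure_id: str           # link to the test procedure that found it
    build_id: str                    # link to the application build
    reported_by: str                 # test engineer who reported the defect
    date_reported: date
    assigned_to: Optional[str] = None
    date_assigned: Optional[date] = None
    date_updated: Optional[date] = None
    status: Status = Status.NEW

if __name__ == "__main__":
    spr = SoftwareProblemReport(
        spr_id=101,
        title="Account search returns stale results",
        priority=Priority.HIGH,
        test_procedure_id="TP-SYS-042",
        build_id="2.3.1",
        reported_by="jdoe",
        date_reported=date.today(),
    )
    print(spr.spr_id, spr.priority.name, spr.status.value)
```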

The test management tool should permit the automatic validation of as many of the test results as possible. Some test tools, such as Rational’s TestStudio, allow the anticipated results of a test to be hard-coded within the test procedure when a specific response is expected. As a result, when a test script fails, the test engineer can be confident that the actual result does not meet the specific result expected from the test. A specific result cannot always be programmed into the tool, however, especially when numerous transactions are performed against the application’s database and results are highly dynamic. The sequential order of the transactions may vary, and transactions occurring prior to the test may affect the results. In this situation, the test results can be evaluated by querying the database directly using SQL statements and then comparing the query results and the application-generated results to the expected results.
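As an illustration of this approach, the sketch below queries a hypothetical transactions table directly and compares the database value with both the application-reported result and the expected result. The table layout and the use of Python's sqlite3 module are assumptions made for the example; the same pattern applies to whatever database the AUT actually uses.

```python
# Illustrative verification of a dynamic result by querying the database
# directly and comparing it with both the application-reported value and the
# expected value. Table and column names are hypothetical.
import sqlite3

def verify_account_total(db_path, customer_id, app_reported_total, expected_total):
    """Return True if the database, the application, and the expectation agree."""
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.execute(
            "SELECT SUM(amount) FROM transactions WHERE customer_id = ?",
            (customer_id,),
        )
        db_total = cur.fetchone()[0] or 0
    finally:
        conn.close()
    return db_total == app_reported_total == expected_total
```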

Most automated test tools maintain test results and allow for the automatic generation of a test log. The test log maintains information such as pass or fail status, the name of each test procedure executed, and the start and end times of each test execution. Test tools vary in the level of sophistication that they provide with regard to test result analysis. The more test result attributes a test tool can document, the more information the test engineer has for analyzing results. Some tools, for example, may record the state of the application and the state of the system.

As with any tool to be applied on a project, the selection of a defect tracking tool involves the consideration of a number of characteristics. For a defect tracking tool, as for any other tool, the responsible person should go through the steps described in Chapter 3 to reach a decision. First, the responsible person develops a table of tool characteristics to be considered, like that depicted in Table 9.4. Next, he or she assigns a weight to each characteristic.

Defects may be discovered throughout the entire testing life cycle. Therefore, it is recommended that the test team generate and classify SPRs according to the life-cycle phase or product in which the defect emerged. Table 9.5 provides an example of the possible categories for software problem reports.

Table 9.5. SPR Categories

image

9.2.1 Defect Life-Cycle Model

When using a defect tracking tool, the test team will need to define and document the defect life-cycle model, also called the defect workflow. In some organizations, the configuration management group takes responsibility for the defect life cycle; in other organizations, it is a test team responsibility. Figure 9.3 provides an example of a defect life-cycle model.

Figure 9.3. Defect Life-Cycle Model
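A defect workflow such as the one depicted in Figure 9.3 can be captured as a simple state-transition table, as in the following sketch. The states and transitions shown are plausible examples only; the actual workflow must match whatever the test team or CM group defines.

```python
# A sketch of a defect life-cycle (workflow) model as a state-transition table.
# States and allowed transitions are illustrative; each organization defines its own.
ALLOWED_TRANSITIONS = {
    "new":       {"open", "cancelled"},   # an ERB may cancel an invalid SPR
    "open":      {"assigned", "cancelled"},
    "assigned":  {"fixed"},
    "fixed":     {"retest"},
    "retest":    {"closed", "open"},      # a failed retest reopens the SPR
    "closed":    set(),
    "cancelled": set(),
}

def transition(current_state, new_state):
    """Validate a requested defect state change against the workflow."""
    if new_state not in ALLOWED_TRANSITIONS.get(current_state, set()):
        raise ValueError(f"Illegal transition: {current_state} -> {new_state}")
    return new_state

# Example: a defect is verified closed only after a successful retest.
state = "new"
for step in ("open", "assigned", "fixed", "retest", "closed"):
    state = transition(state, step)
```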

9.3 Test Program Status Tracking

The test team manager is responsible for ensuring that tests are executed according to schedule, and that test personnel are allocated and redirected when necessary to handle problems that arise during the test effort. To perform this oversight function effectively, the test manager must conduct test program status tracking and management reporting.

Throughout the testing phase, the test engineer will need to provide meaningful reports based on the measures and metrics defined within the test plan and outlined in this section. As part of this effort, the test engineer produces test logs and test coverage reports (see Section 9.3.2). Test logs can be used to verify that all SPRs have been documented and corrected (by checking the status of each SPR). The test engineer reviews the test coverage report to ascertain whether complete (100%) test procedure execution coverage has been achieved. In addition, he or she determines whether test coverage criteria have been met or whether these criteria should be modified. The test team further needs to decide whether additional test requirements and test procedures are needed to satisfy test coverage or test completion criteria. Several reports produced by the test team will prove especially valuable, including software problem report summaries (or individual problem reports) as well as defect density and defect trend analysis reports. This section discusses the various metrics that the test engineer can produce and report to management.

To effectively monitor testing progress and report to senior management, the test manager needs to implement an earned value approach to tracking test progress. Implementing an earned value management system (EVMS) is one of the best ways of tracking test program status [1]; this section provides examples of how to implement an EVMS. Similarly, the test manager needs to collect other measurements of test performance, such as those related to test coverage, predictions of the time to release the AUT, and the quality of the software at the time of release. Although an abundant number of test metrics can be collected, time limitations often restrict the test team’s ability to collect, track, and analyze such measurements.

9.3.1 Earned Value Management System

This section outlines an approach for using earned value calculations as a test program status tracking method, and presents a case study of a high-level implementation of an EVMS. Earned value analysis involves tracking the value of completed work and comparing it to planned costs and actual costs so as to provide a true measure of schedule and cost status and to enable the creation of effective corrective actions. The earned value process includes four steps:

  1. Identify short tasks (for example, a functional test phase).
  2. Schedule each task (task start date and end date).
  3. Assign a budget to each task (for example, the task will require 3,100 hours using four test engineers).
  4. Measure the progress of each task, enabling the engineer to calculate schedule and cost variance.

The use of earned value calculations requires the collection of performance measurements—for example, assessments of test program progress relative to pre-determined goals or objectives. This approach also helps to recast quantified objectives in terms of technical, schedule, resource, or cost/profit parameters. Two key earned value calculations pertain to the assessment of cost and schedule variance:

Earned value for work completed − planned budget = schedule variance

Earned value for work completed − actual cost = cost variance
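Under these definitions, the two variance calculations can be sketched as follows, using hours as the cost unit; the task figures are hypothetical.

```python
# Earned value sketch: schedule and cost variance for a test task, using
# hours as the cost unit. The figures are hypothetical.
def schedule_variance(earned_value, planned_value):
    return earned_value - planned_value   # negative => behind schedule

def cost_variance(earned_value, actual_cost):
    return earned_value - actual_cost     # negative => over budget

# Example task: budgeted at 3,100 hours; 40% of the work is complete,
# although 50% was planned to be complete by now, and 1,400 hours were spent.
budget_at_completion = 3100
earned = 0.40 * budget_at_completion     # value of work actually completed
planned = 0.50 * budget_at_completion    # value of work planned to date
actual = 1400                            # hours actually expended

print("Schedule variance:", schedule_variance(earned, planned))  # -310 hours
print("Cost variance:", cost_variance(earned, actual))           # -160 hours
```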

9.3.2 Test Metrics Collection and Analysis

Test metrics can provide the test manager with key indicators of test coverage, progress, and the quality of the test effort. Many test metrics can be collected to monitor test program progress, so the test team needs to choose the set of metrics that best serves its performance concerns. Gathering and analyzing too many test metrics can become time-consuming and reduce the number of hours actually spent performing test activities. For example, Ivar Jacobson [2] has noted that the most important measurement during system development is “the rate of change.” The team should take care to measure whether the rate of change in one particular area (such as in requirements, components, or modules) is much larger than that observed in other areas. Improvement activities then focus on those areas with the highest rate of change.

This section identifies some of the more important metrics to collect when the test engineer does not have an elaborate metrics management system in place and has minimal time to collect data. Just as the test design effort was broken down in Chapter 7 into white-box and black-box test design techniques, the effort to manage metrics can be separated into white-box and black-box testing efforts. Before discussing this breakdown, however, it is beneficial to address the scope of the metric collection and analysis process. Basic elements and prerequisites of a software metric process are structured as follows [3]:

• Goals and objectives are set relative to the product and software (test) management process.

• Measurements are defined and selected to ascertain the degree to which the goals and objectives are being met.

• The data collection process and recording mechanism are defined and used.

• Measurements and reports are part of a closed-loop system that provides current (operational) and historical information to technical staff and management.

• Data on post-software product life measurement are retained for analysis that could lead to improvements for future product and process management.

9.3.2.1 White-Box Testing Metrics

White-box testing techniques target the application’s internal workings; similarly, white-box metrics collection has the same focus. During white-box testing, the test engineer measures the depth of testing by collecting data related to path coverage and test coverage. This white-box testing metric is called coverage analysis, which is described in detail in Section 7.2.2.1.

Source code analysis and code profiling help discern the code quality. As already mentioned, code profiling is a tuning process that determines whether an algorithm works inefficiently or whether a function is being called too frequently. Many tools are available to accomplish this task by identifying coding and development errors, such as out-of-range indices, unused code (dead code), and unreachable code. These tools help focus the efforts on those parts of the code that have the greatest potential for defects. An example of such a tool is Rational Quantify.

The objective of source code complexity analysis is to identify complex areas of the source code. High-complexity areas of source code can be sources of high risk. Unnecessary code complexity can decrease code reusability and increase code maintenance. Consequently, testing efforts need to focus on high-complexity code. McCabe’s cyclomatic complexity measurement helps identify high-complexity code and thus error-prone software [4].

Another white-box test metric of interest is fault density [5]. The test team can predict the remaining faults by comparing the measured fault density with the expected fault density and thereby determine whether the amount of testing is sufficient. Fault density is calculated per thousand source lines of code (KSLOC) using the equation Fd = Nd/KSLOC, where Nd is the number of defects found and KSLOC is the number of noncomment source lines of code, expressed in thousands.
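A minimal worked example of the fault density equation, with hypothetical figures:

```python
# Fault density per the equation above: Fd = Nd / KSLOC.
def fault_density(defects_found, noncomment_sloc):
    ksloc = noncomment_sloc / 1000.0
    return defects_found / ksloc

# Example: 45 defects found in 30,000 noncomment source lines => 1.5 defects/KSLOC.
print(fault_density(45, 30000))
```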

Design complexity metrics measure the number of ways that a module can call other modules. They can serve as an indicator of the integration testing effort required for a set of modules.

9.3.2.2 Black-Box Testing Metrics

During black-box testing, metrics collection focuses on the breadth of testing, such as the amount of demonstrated functionality and the amount of testing that has been performed. Black-box testing techniques are based on the application’s external considerations. As a result, test procedures are based upon system requirements or use cases as described in Chapter 7.

Table 9.8 and the remainder of this section describe the various testing metrics to be collected during the black-box testing phase and their purposes. Each metric is assigned to one of three categories: coverage, progress, or quality.

Table 9.8. Sample Black-Box Test Metrics

image

image

Coverage Metrics

Test Coverage

This measurement divides the total number of test procedures developed by the total number of defined test requirements. It provides the test team with a barometer for gauging the depth of test coverage. The depth of test coverage is usually based on the defined acceptance criteria. When testing a mission-critical system, such as an operational medical system, the test coverage indicator would need to be high relative to the depth of test coverage for non-mission-critical systems. The depth of test coverage for a commercial software product that will be used by millions of end users may also be high relative to that for a government information system that will serve a few hundred end users.

System Coverage Analysis

System coverage analysis measures the amount of coverage at the system interface level. This measurement is collected automatically by SRI’s TCAT tool. It expresses test coverage as the percentage of function call pairs that the tests exercise in relation to the total number of function calls in the system. Appendix B contains more information on the TCAT tool.

Functional Test Coverage

This metric can measure test coverage prior to software delivery. It indicates the percentage of the software tested at any point during the test effort [6]. The functional test coverage metric is calculated by dividing the number of test requirements that were supported by test procedures by the total number of test requirements.
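The ratio-based coverage measures described above can be computed directly, as in the following sketch; all figures are hypothetical.

```python
# Coverage metric sketches, per the definitions above.
def test_coverage(test_procedures_developed, test_requirements_defined):
    """Test coverage: total test procedures developed / total defined test requirements."""
    return test_procedures_developed / test_requirements_defined

def functional_test_coverage(requirements_with_procedures, total_test_requirements):
    """Functional test coverage: requirements supported by test procedures / total requirements."""
    return requirements_with_procedures / total_test_requirements

# Hypothetical example: 280 procedures against 200 requirements, 180 of which
# are supported by at least one test procedure.
print(f"Test coverage: {test_coverage(280, 200):.2f}")                        # 1.40
print(f"Functional test coverage: {functional_test_coverage(180, 200):.0%}")  # 90%
```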

Progress Metrics

During black-box testing, test engineers collect data that help identify test progress, so that the test team can predict the release date for the AUT. Progress metrics are collected iteratively during various stages of the test life cycle, such as weekly or monthly. Several progress metrics are described below.

Test Procedure Execution Status

This execution status measurement divides the number of test procedures already executed by the total number of test procedures planned. By reviewing this metric value, the test team can ascertain the number of test procedures remaining to be executed. This metric, by itself, does not indicate the quality of the application; it provides information about the extent of the test effort rather than its success.

Some test management tools, such as Rational’s Test Manager, allow test engineers to automatically keep track of test procedure execution status. In Test Manager, test engineers can enter test requirements and then link these requirements to the automated tests that have been created with the SQA Robot test tool. The test tool identifies which test procedures have executed successfully, which were unsuccessful, and which were not executed.

Error Discovery Rate

This measurement divides the total number of documented defects by the number of test procedures executed. Test team review of the error discovery rate metric supports trend analysis and helps forecast product release dates.

Defect Aging

Another important metric in determining progress status is the turnaround time for a defect fix: the time from when the defect was identified to when it was resolved. Defect aging pertains to the turnaround time required for an SPR to be corrected. Using defect aging data, the test team can conduct trend analysis. For example, 100 defects may be recorded on a project. When documented past experience indicates that the development team can fix as many as 20 defects per day, then the turnaround time for these problem reports may be only one work week. In this case, the defect aging statistic would be an average of 5 days. When the defect aging measure reaches 10 to 15 days, the slower response time by the developers to make corrections may affect the ability of the test team to meet scheduled deadlines.

When evaluating the defect aging measure, the test team also needs to take the priority of the SPRs into consideration. A defect aging measure of 2 to 3 days may be appropriate for level 1 SPRs, while 5 to 10 days may be appropriate for level 3 SPRs. Under this type of rule of thumb, defect aging measurement is not always appropriate and needs to be modified to take into account the complexity of the AUT, among other criteria.

Defect Fix Retest

This metric provides a measure of whether the test team is retesting the corrections at an adequate rate. It is calculated by measuring the time between when the defect was fixed in a new build and when the defect was retested.

Defect Trend Analysis

Defect trend analysis can help to determine the trend of defects found. Is the trend improving as the system testing phase winds down or is the trend worsening? This metric compares the total number of defects found with the number of test procedures executed over time.
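Several of the progress metrics described above reduce to simple ratios and date arithmetic, as in the following sketch; all inputs are hypothetical.

```python
# Progress metric sketches, per the definitions above; all inputs are hypothetical.
from datetime import date

def execution_status(executed, planned):
    """Test procedure execution status: procedures executed / total planned."""
    return executed / planned

def error_discovery_rate(defects_documented, procedures_executed):
    """Error discovery rate: documented defects / test procedures executed."""
    return defects_documented / procedures_executed

def defect_aging(date_identified, date_resolved):
    """Defect aging: turnaround time, in days, from identification to resolution."""
    return (date_resolved - date_identified).days

def defect_fix_retest_lag(date_fixed_in_build, date_retested):
    """Defect fix retest: days between the fix appearing in a build and its retest."""
    return (date_retested - date_fixed_in_build).days

print(execution_status(150, 400))                          # 0.375
print(error_discovery_rate(60, 150))                       # 0.4
print(defect_aging(date(2000, 3, 1), date(2000, 3, 6)))    # 5 days
```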

Quality Metrics

Test Success Index

This measurement, also known as the current quality ratio, is computed by dividing the number of test procedures executed and passed by the total number of test procedures executed. It provides the test team with further insight into the amount of functionality that has been successfully demonstrated.

Quality of Fixes (1) = Total Number of Defects Reopened / Total Number of Defects Fixed

The value obtained from this calculation provides a measure of the quality of the software corrections implemented in response to software problem reports. When this value is high, then the test team may need to notify the developers of this problem.

Quality of Fixes (2) = Previously Working Functionality versus New Errors Introduced

This metric aids the test team in determining the degree to which previously working functionality has been adversely affected by software corrections.

Defect Density

The defect density metric is calculated by taking the total number of defects found and dividing this value by the number of test procedures executed for a specific functionality or use case. If a high defect density appears in a specific functionality, a causal analysis should be conducted. Is this functionality so complex that a high defect density is to be expected? Is there a problem with the design or implementation of the functionality? Were the wrong resources (or not enough resources) assigned to the functionality because an inaccurate risk level had been given to it? It also could be inferred that the developer responsible for this specific functionality needs more training.

Additionally, when evaluating defect density, the test team should consider the priority of the SPRs. For example, one application requirement may have as many as 50 low-priority SPRs, while the acceptance criteria have been satisfied. Another requirement might have one open high-priority SPR that prevents the acceptance criteria from being satisfied.

Defect Trend Analysis

Defect trend analysis is calculated by dividing the total number of defects found by the number of test procedures executed over time. For example, if a large number of defects was found at the beginning of test execution and the number of new defects decreases after all test procedures have been executed once, the test engineer can see an improving trend.
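The quality metrics described above can likewise be computed as simple ratios; the following sketch uses hypothetical figures.

```python
# Quality metric sketches, per the definitions above; all figures are hypothetical.
def test_success_index(executed_and_passed, executed):
    """Current quality ratio: procedures executed and passed / procedures executed."""
    return executed_and_passed / executed

def quality_of_fixes(defects_reopened, defects_fixed):
    """Quality of fixes (1): total defects reopened / total defects fixed."""
    return defects_reopened / defects_fixed

def defect_density(defects_found, procedures_executed_for_function):
    """Defect density for one functional area or use case."""
    return defects_found / procedures_executed_for_function

print(f"Test success index: {test_success_index(130, 150):.0%}")  # 87%
print(f"Quality of fixes:   {quality_of_fixes(6, 50):.2f}")       # 0.12 (lower is better)
print(f"Defect density:     {defect_density(12, 20):.2f}")        # 0.60
```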

Test Effectiveness

Test effectiveness needs to be assessed statistically to determine how well the test data have exposed defects contained in the product. In some cases, test results may have received inadequate analysis. The test team should solicit the assistance of personnel who are experienced in the use of the application, so as to review test results and determine their correctness.

Problem Report—Acceptance Criteria Metric

The acceptance criteria metric (that is, the number of SPRs classified by priority level) needs to be defined during the test planning phase, before test execution begins. The acceptance criteria will stipulate the conditions under which the system is ready to be shipped or implemented at a customer location. The test engineer must ascertain whether an AUT satisfies these criteria, which are stipulated by the customer and defined in the test plan. For example, the acceptance criteria for a simple application might include one of the following statements:

• The system is acceptable providing that all level 1, 2, and 3 (fatal, high, and medium) SPRs documented as a result of testing have been resolved.

• The system is acceptable providing that all level 1 and 2 (fatal and high) SPRs documented as a result of testing have been resolved.

• The system is acceptable providing that all level 1 and 2 (fatal and high) SPRs documented as a result of testing have been resolved, and that 90% of level 3 problem reports have been resolved.

Test Automation Metric

It is important to generate a metric that calculates the value of automation, especially the first time that the project uses an automated testing approach. The test team will need to measure the time spent developing and executing test scripts and compare it with the results that the scripts produced. For example, the test team could compare the number of hours required to develop and execute test procedures with the number of defects documented that would not likely have been revealed during manual testing.

Sometimes it is difficult to quantify or measure the automation benefits, because automated test tools may reveal defects that manual test execution could not have discovered. For example, during stress testing, 1,000 virtual users execute a specific functionality and the system crashes. It would be very difficult to discover this problem manually using 1,000 test engineers. An automated test tool can also be applied to data entry or record setup. In this case, the metric measures the time required to set up the needed records manually versus the time required to set up the records using an automated tool.

Consider the test effort associated with the following system requirement: “The system shall allow the addition of 10,000 new accounts.” Imagine having to manually enter 10,000 accounts into a system to test this requirement! An automated test script can easily handle this requirement by reading account information from a file through the use of a looping construct. The data file can quickly be generated via a data generator. Verifying this system requirement with automation tools requires far fewer man-hours than performing such a test using manual methods.

In another case, once the test script has entered the 10,000 accounts, it may be desirable to delete all of these records from the test database and reset the database to its original condition. A simple SQL script can quickly and easily manage the deletion of these records. Now imagine that the test team wants to delete only those specific accounts added by the original script. The test team would simply create an automated script that accesses the existing data file, queries for the matching record in the AUT database, and then deletes each corresponding record. Once again, automation of this activity involves a significantly lower number of man-hours than manual testing.
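The looping construct and the SQL cleanup described in the two preceding paragraphs might look like the following sketch. The data file layout, the accounts table, and the use of Python's sqlite3 module are assumptions made for illustration; a test tool's scripting language would follow the same pattern of reading records from a generated data file, adding them in a loop, and deleting only the records the script added.

```python
# Data-driven sketch of the 10,000-account scenario described above.
# The file layout, table, and database (sqlite3) are illustrative assumptions.
import csv
import sqlite3

def add_accounts(db_path, data_file):
    """Read generated account data from a file and add each account in a loop."""
    added_ids = []
    with sqlite3.connect(db_path) as conn, open(data_file, newline="") as fh:
        for row in csv.DictReader(fh):
            conn.execute(
                "INSERT INTO accounts (account_id, name, balance) VALUES (?, ?, ?)",
                (row["account_id"], row["name"], row["balance"]),
            )
            added_ids.append(row["account_id"])
    return added_ids

def delete_accounts(db_path, account_ids):
    """Reset the test database by deleting only the accounts the script added."""
    with sqlite3.connect(db_path) as conn:
        conn.executemany(
            "DELETE FROM accounts WHERE account_id = ?",
            [(acct,) for acct in account_ids],
        )

if __name__ == "__main__":
    ids = add_accounts("aut_test.db", "generated_accounts.csv")
    print(f"Added {len(ids)} accounts")
    delete_accounts("aut_test.db", ids)
```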

Chapter Summary

• When executing test procedures, the test team will need to comply with a test procedure execution schedule. Following test execution, test outcome evaluations are performed and test result documentation is prepared.

• Plans for unit, integration, system, and user acceptance testing together make up the steps that are required to test the system as a whole. During the unit testing phase, code profiling can be performed. Traditionally, profiling is a tuning process that determines whether an algorithm is inefficient or a function is called too frequently. It can uncover improper scaling of algorithms, instantiations, and resource utilization.

• Integration testing focuses on the application’s internal workings. During integration testing, units are incrementally integrated and tested together based on control flow. Because units may consist of other units, some integration testing (also called module testing) may take place during unit testing.

• During system testing, the test engineer examines the integration of the parts that make up the entire system. System-level tests are usually performed by a separate test team. The test team implements the test procedure execution schedule and the system test plan.

• The test team must perform analysis to identify particular components or functionality that are generating a greater relative number of problem reports. As a result of this analysis, additional test procedures and test effort may need to be assigned to the components. Test results analysis can also confirm whether executed test procedures are worthwhile in terms of identifying errors.

• Each test team must perform problem reporting operations in compliance with a defined process. The documentation and tracking of software problem reports are greatly facilitated by an automated defect tracking tool.

• The test team manager is responsible for ensuring that tests are executed according to schedule and that test personnel are allocated and redirected when necessary to handle problems that arise during the test effort. To perform this oversight function effectively, the test manager needs to perform test program status tracking and management reporting.

• Test metrics provide the test manager with key indicators of the test coverage, progress, and the quality of the test effort. During white-box testing the test engineer measures the depth of testing by collecting data about path coverage and test coverage. During black-box testing, metrics collection focuses on the breadth of testing, including the amount of demonstrated functionality and the amount of testing that has been performed.

References

1. Software Program Management. Laguna Hills, CA: Humphreys and Associates, 1998.

2. Jacobson, I. “Proven Best Practices of Software Development.” Rational ’99 Worldwide Software Symposium, Washington, DC, January 26, 1999.

3. Florac, W.A., et al. Software Quality Measurement: A Framework for Counting Problems and Defects. Technical Report, CMU/SEI-92-TR-22, ESC-TR-92-022. Software Engineering Institute, Pittsburgh, PA, September 1992.

4. McCabe, T.J. Structured Testing: A Software Testing Methodology Using the Cyclomatic Complexity Metric. NBS Special Publication 500-99. Washington, DC: U.S. Department of Commerce/National Institute of Standards and Technology, 1982.

5. ANSI/IEEE Standard 982.2-1988.

6. See note 5.
