15

Test and Evaluation of Distributed Data and Information Fusion Systems and Processes

James Llinas, Christopher Bowman and Kedar Sambhoos

CONTENTS

15.1  Brief Remarks on Abstract Concepts Related to Test and Evaluation

15.2  Understanding Distributed Fusion System Concepts and Implications for Test and Evaluation

15.2.1  Implications for Test and Evaluation

15.2.2  Measures and Metrics in the Network Value Chain

15.2.3  Fusion Estimates and Truth States

15.2.4  Notion of a Performance Evaluation Tree

15.2.5  Complexities in Error Audit Trails

15.2.6  Formal Experimental Design and Statistical Analyses

15.3  Summarizing Impacts to and Strategies for Distributed Fusion System T&E

15.4  Remarks from a DDIFS Use Case

15.5  Summary and Conclusions

References

15.1  BRIEF REMARKS ON ABSTRACT CONCEPTS RELATED TO TEST AND EVALUATION

Since both the first and second editions of this handbook on multisensor data fusion have addressed several of the fundamental and abstract concepts related to test and evaluation (T&E henceforth), we will only briefly summarize some of these still-important notions (see Hall and Llinas [2001], Chapter 20, and Liggins et al. [2009], Chapter 25, for more extended remarks).

A still-valid and important assertion is that generalized methods and procedures, as well as formalized statistical/mathematical analysis techniques, for improved T&E of data and information fusion systems and processes remain very understudied. Partly as a result of this deficiency, transition of seemingly capable data fusion systems has been problematical. Among the factors needing additional study to smooth the path toward reliable operational use are new, user-oriented techniques, for example, new ideas on metrics such as trustworthiness metrics, as well as techniques for dealing with the inherent uncertainties of fusion processes so that a clear understanding of their effects can be realized by typical users. As the field of multisensor data fusion moves its focus to the higher levels of abstraction and inference (Levels 2, 3, and 4 of the Revised JDL Model; see Figure 15.1 [Steinberg et al. 1999, Bowman and Steinberg 2001, Bowman 2004, Llinas et al. 2004, Steinberg and Bowman 2004]), efforts have been made toward defining new types of metrics for these fusion levels (Blasch 2003; Blasch et al. 2004, 2010; Haith and Bowman 2010), with extensions for the dual resource management levels, but much more needs to be done regarding efficient T&E techniques for high-level state estimates, since these are much more complex and of higher dimensionality than Level 1 states.

Following Hall and Llinas (2001) and Liggins et al. (2009), we still emphasize the remarks having to do with test philosophy and context. A philosophy is that line of thinking that establishes or emphasizes a particular point of view for the tests and/or evaluations that follow. Philosophies primarily establish points of view or perspectives for T&E that are consistent with, and can be traced to, the goals and objectives: they establish the purpose of investing in the T&E process. T&E philosophies, while generally stated in nonfinancial terms, do in fact establish economic philosophies for the commitment of funds and resources to the T&E process. The simplest example of this notion is reflected in the so-called black-box or white-box viewpoints for T&E, from which either external (I/O) behaviors or internal (procedure-execution) behaviors are examined. Another point of view revolves around the research or development goals established for the program. The philosophy establishes the high-level statement of the context for testing and is closely intertwined with the program goals and objectives. Assessments of delivered value for defense or other critical systems must be judged in light of system or program goals and objectives. In the design and development of such systems, many translations of the stated goals and objectives occur as a result of the systems engineering process, which both analyzes (decomposes) the goals into functional and performance requirements and synthesizes (reassembles) system components intended to perform in accordance with these requirements. Throughout this process, however, the program goals and objectives must be kept in view because they establish the context in which value will be judged.


FIGURE 15.1 Dual functional levels of the data fusion and resource management (DF&RM) dual node network (DNN) technical architecture. (Data from Bowman, C.L., The dual node network (DNN) DF&RM architecture, AIAA Intelligent Systems Conference, Chicago, IL, 2004; Bowman, C.L. and A.N. Steinberg, A systems engineering approach for implementing data fusion systems, in Handbook of Multisensor Data Fusion, D. Hall and J. Llinas (Eds.), Chapter 16, Boca Raton, FL: CRC Press, 2001; Steinberg, A.N. et al., Proceedings of SPIE Conference Sensor Fusion: Architectures, Algorithms, and Applications III, 3719, 430, 1999; Steinberg, A. and C.L. Bowman, Rethinking the JDL data fusion levels, National Symposium on Sensor and Data Fusion (NSSDF), Johns Hopkins Applied Physics Lab (JHAPL), Laurel, MD, 2004; Llinas, J. et al., Revisiting the JDL data fusion model II, International Conference on Information Fusion, Stockholm, Sweden, 2004.)

Context, therefore, reflects what the program (i.e., the DF&RM process or a function within it) is trying to achieve, for example, the research or developmental goals (the purposes of building the system at hand) or the goals of a learning, intelligent DF&RM system. Such goals are typically reflected in the program name, such as a “Proof of Concept” program or “Production Prototype” program. Many recent programs involve “demonstrations” or “experiments” of some type or other, with these words reflecting in part the nature of such program goals or objectives.

Once one or another of these philosophies has been espoused, there exists a perspective from which to select various criteria, which will collectively provide a basis for evaluation. There is, in the most general case, a functional relationship of the form

Criterion = fct[Measure_i = fct(Metric_i, Metric_j, …), Measure_j = fct(Metric_k, Metric_l, …), …]

that defines how each criterion is dependent on certain measures that are, in turn, derived as hierarchical functions of metrics (e.g., from probability of mission success on down), the metrics being the quantities that are (importantly) observable in an experiment. Each metric, measure, and criterion also has a scale that must be considered. Moreover, the scales are often incongruent, so that some type of normalized figure-of-merit approach may be necessary in order to integrate metrics on disparate scales and construct a unified, quantitative parameter for making judgments.
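To make this rollup concrete, the following minimal sketch (in Python, with hypothetical metric names, ranges, and weights not drawn from any specific program) normalizes raw metrics onto a common 0 to 1 scale and combines them into measures and a single criterion-level figure of merit.

```python
# Minimal sketch of a metric -> measure -> criterion rollup (hypothetical
# metric names, ranges, and weights; illustrative only).

def normalize(value, lo, hi, higher_is_better=True):
    """Map a raw metric onto a common 0-1 scale."""
    x = (value - lo) / (hi - lo)
    x = min(max(x, 0.0), 1.0)
    return x if higher_is_better else 1.0 - x

# Observed metrics from a single test run (hypothetical values).
metrics = {
    "track_position_rmse_m": 42.0,   # lower is better
    "track_purity_pct":      91.0,   # higher is better
    "track_latency_s":       1.8,    # lower is better
}

# Measures are functions of metrics, evaluated on normalized scales.
accuracy_measure = (0.7 * normalize(metrics["track_position_rmse_m"], 0, 100, False)
                    + 0.3 * normalize(metrics["track_purity_pct"], 0, 100, True))
timeliness_measure = normalize(metrics["track_latency_s"], 0, 5, False)

# The criterion is, in turn, a weighted function of the measures.
criterion = 0.6 * accuracy_measure + 0.4 * timeliness_measure
print(f"accuracy={accuracy_measure:.2f} timeliness={timeliness_measure:.2f} "
      f"criterion={criterion:.2f}")
```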

Another important element of the T&E framework is the approach element of the T&E process. In this sense, approach means a set of activities, which are both procedural and analytical, that generates the “measure” results of interest (via analytical operations on the observed metrics) as well as provides the mechanics by which decisions are made based on those measures and in relation to the criteria. The approach consists of two components as described in the following:

•  A procedure, which is a metric-gathering paradigm; it is an experimental procedure.

•  An experimental design, which defines (1) the test cases, (2) the standards for evaluation, and (3) the analytical framework for assessing the results.

Aspects of experimental design include the formal methods of classical statistical experimental design. Few if any fusion T&E research efforts in the literature have applied this type of formal strategy, presumably as a result of cost limitations or other unstated factors. Nevertheless, there are serious questions of sample size and confidence intervals for estimates, among others, to deal with in the formulation of any T&E program, since simple comparisons of mean values, etc. under unstructured test conditions may not have much statistical significance when measured against the formal requirements of a rigorous experimental design. Any fusion-based T&E program, because all fusion systems and processes inherently deal with random variables and stochastic behaviors, should at least recognize the risks associated with such simplified analyses.
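As a small illustration of the risk of comparing bare mean values, the sketch below generates synthetic Monte Carlo replications of a position-error MOP for two hypothetical fusion configurations and applies a Welch t-test; the configuration names, sample sizes, and error statistics are all assumed for illustration.

```python
# Sketch: why a bare comparison of mean MOP values can mislead.
# Synthetic Monte Carlo replications for two hypothetical configurations.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
mop_config_a = rng.normal(loc=50.0, scale=12.0, size=20)  # position error (m)
mop_config_b = rng.normal(loc=46.0, scale=12.0, size=20)

print(f"mean A = {mop_config_a.mean():.1f} m, mean B = {mop_config_b.mean():.1f} m")

# Welch's t-test: is the observed difference in means statistically significant?
t_stat, p_value = stats.ttest_ind(mop_config_a, mop_config_b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
if p_value > 0.05:
    print("Difference not significant at the 0.05 level; more replications needed.")
```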

The T&E process contains a Level 4 fusion performance evaluation (PE) process as per the DNN technical architecture; see Haith and Bowman (2010), Bowman (2008), Gelfand et al. (2009), and Bowman et al. (2009). The PE process architecture typically involves a network of interlaced PE fusion and T&E Process Management (PM) DF&RM nodes. Each PE node performs data preparation, data association, and state estimation, where the data are DF&RM outputs (e.g., track estimates) and estimates of truth, and the output state estimates are the Measures of Performance (MOPS) described earlier.

Much more is said in Hall and Llinas (2001) and Liggins et al. (2009) on the general considerations of T&E for data and information fusion systems, and the interested reader is directed to those sources for additional commentary and insight, along with many references.

15.2  UNDERSTANDING DISTRIBUTED FUSION SYSTEM CONCEPTS AND IMPLICATIONS FOR TEST AND EVALUATION

Testing and evaluating anything requires that the item to be tested, the “test article,” be clearly defined. It is also important to understand the role or purpose of the test article in the context of its use or setting in a larger system framework. To stimulate this discussion, let us characterize what a Distributed Data or Information Fusion System (DDIFS) is; it is appreciated that other chapters in this book may have other characterizations of a DDIFS, but we feel it is important to review these, even if redundant, in relation to developing thoughts about testing and evaluating such systems and functions. So, our local characterization describes a DDIFS as follows:

1.  It is first of all “distributed,” meaning that its components (which immediately implies that it comprises a number of components) are spread apart somehow; very often this is a geographical separation, or for defense/security applications, a platform separation where DDIFS components are hosted on an aircraft or ship or satellite, etc. (that could, in turn, be geographically separated)—thus we can also have a kind of local distribution embedded in a larger distributed-system context.

2.  The components are interconnected (informationally) according to the design of a specified communication/datalinking network, and share information and/or processing results according to some supported protocol.

a. Note that this makes the components interdependent in some way.

3.  The components also may have local resources of various description or type to include sensors, processors, and manageable resources; representative component functionalities can include sensor nodes, processing nodes, fusion nodes, communication nodes, etc.—not every component in a DDIFS is necessarily a fusion node in the sense of producing state estimates, as some may perform functions that contribute to the formation of estimates.

4.  In this framework, the components can only fuse two things: local (“organic”) information from resources that they “own” (i.e., for which they have control and design authority), and information that comes to them “somehow” (i.e., according to the inter-component information-sharing strategy (ISS) or protocol) from other components in the networked system (we interchange the terms distributed and networked).

5.  Metadata must also be shared across components along with the shared information in order that sender-components can appropriately inform receiver-components of certain information necessary to subsequent processing of the sent “message” or data-parcel by receiver-components. (It can also be the case that receiver-components can request information from various other components, and such requests may have metadata and the requesting-component may ask that certain metadata be contained in the reply.)

a. Another reason for metadata is due to the generally large size of most distributed systems that prevents any given component from knowing much about “distant” components and their (dynamic) status.

6.  The topology of the DDIFS is very important since it affects a number of overall system properties to include connectivity, failure vulnerability, etc. Table 15.1, drawn largely from Durrant-Whyte (2000) and Utete (1994), shows a subjective characterization of some DDIFS properties as a function of topological type.

7.  It can be expected that in larger, complex systems any given fusion components or nodes may have to have two fusion processes operating, one to process local, organic data as described earlier—since these data are best understood by the local node, allowing optimal fusion processes to be developed—and one to process received network information, about which only the metadata are known, restricting the realization of optimal methods for this “external” data. Such separation may also be required because of distinct differences in the nature of the fusion operations, requiring different algorithmic techniques.

TABLE 15.1
Subjectively Judged Properties of a DDIFS as a Function of Topology-Class

DDIFS Topology | Inherent Redundancy/Failure Protection | Scalability | Ability to Manage Redundant Information (Double Counting) | Practicality
Fully connected | Good | Very poor | Good | Poor
Tree | Poor; branch failures can lead to tree splitting | Poor | Limited to most recent transaction | Reasonable
Decentralized | Moderate | Good | Possible but requires careful design | Good
Dynamically managed | Good | Moderate | Moderate | Complex

Source: Adapted from Durrant-Whyte, H.F., A beginner’s guide to decentralized data fusion, Technical report, Australian Centre for Field Robotics, University of Sydney, Sydney, New South Wales, Australia, 2000; Utete, S., Network management in decentralized sensing systems, PhD thesis, The University of Oxford, Oxford, U.K., 1994.
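One way to make the failure-protection column of Table 15.1 concrete is to count how often a single node failure splits the network; the sketch below does this for an illustrative five-node fully connected mesh and an illustrative five-node tree (both topologies are assumed examples, not drawn from the chapter).

```python
# Sketch: single-node-failure vulnerability for two DDIFS topologies
# (illustrative 5-node examples).

def connected(nodes, edges):
    """Breadth-first check that the surviving nodes form one component."""
    if not nodes:
        return True
    adj = {n: set() for n in nodes}
    for a, b in edges:
        if a in adj and b in adj:
            adj[a].add(b)
            adj[b].add(a)
    seen, frontier = set(), [next(iter(nodes))]
    while frontier:
        n = frontier.pop()
        if n not in seen:
            seen.add(n)
            frontier.extend(adj[n] - seen)
    return seen == set(nodes)

def failure_vulnerability(nodes, edges):
    """Fraction of single-node failures that disconnect the remaining network."""
    failures = 0
    for dead in nodes:
        survivors = [n for n in nodes if n != dead]
        if not connected(survivors, edges):
            failures += 1
    return failures / len(nodes)

nodes = [0, 1, 2, 3, 4]
full_mesh = [(i, j) for i in nodes for j in nodes if i < j]
tree = [(0, 1), (0, 2), (1, 3), (1, 4)]

print("fully connected:", failure_vulnerability(nodes, full_mesh))  # 0.0
print("tree:", failure_vulnerability(nodes, tree))                  # > 0 (branch failures split the tree)
```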

We will restrict our discussion to DDIFSs that are coherently designed toward some bounded set of overarching purposes and capabilities, as distinct from loosely coupled sets of components that may operate opportunistically or in some ad hoc manner, in the fashion of a federated system. Such a restriction makes development of a T&E scheme or plan more controllable, but it nevertheless requires a clear partitioning of requirements and capability specifications across the parts of the DDIFS. Note that this framework is not unlike the system design approach characterized by Bowman for centralized and distributed fusion systems (Bowman 1994).

There is another feature of DDIFSs that needs to be mentioned, although this attribute is applicable to all information fusion systems and to many intelligence and surveillance systems that may not even employ IF methods. That attribute is that the data, and often even the knowledge, employed in the design and operation of any IF system or DDIFS have a stochastic quality.* This immediately raises the question of how to define and develop a T&E approach, in the sense of how to account for and measure statistically assured confidence in the test results. More is said about this in the later sections of the chapter.

15.2.1  IMPLICATIONS FOR TEST AND EVALUATION

There are various implications that the features of a DDIFS impute onto the nature of and methods for T&E. For example, there are two broad types of testing used in the development of defense systems: developmental test & evaluation (DT&E) and operational test & evaluation (OT&E). DT&E is oriented to a bounded system as a test article, the “system under test or SUT,” and verifies that the system’s design is satisfactory and that all technical specifications and contract requirements have been met. It is kind of a check-list process of examining whether defined SUT requirements have been met, one-by-one, as determined by T&E processes that address these requirements either singly or in combination. As noted, it is typically a process that is checking that the delivered system satisfies contractual requirements and so is closely related to the acquisition process. DT&E is usually managed by the governmental client but can be conducted by the government, by the contractor, or by a combined test team with representatives from both government and industry. Most early DT&E in a program will likely be done at the contractor’s facilities under controlled, laboratory conditions. OT&E follows DT&E and validates that the SUT can satisfactorily execute its mission in a realistic operational environment including typical operators and representative threats. The difference between DT&E and OT&E is that DT&E verifies that the system is built correctly in accordance with the specification and contract, and OT&E validates that the system can successfully accomplish its mission in a realistic operational environment. Another way to think of these differences is that DT&E is concerned chiefly with attainment of engineering design goals, whereas OT&E focuses on the system’s operational effectiveness, suitability, and survivability.

For DDIFSs, it can be seen that these differences can become cloudy and problematical, due to the underlying nature of various interdependencies between nodes or platforms in such a system. To define a SUT in a DDIFS, one must cut the connectivity to the network at some points so that a standalone, bounded system can be tested as an integrated deliverable within a contract framework. We will later in this chapter discuss our work in supporting the U.S. Major Test Range at Edwards Air Force Base, California, in their preparations for testing new tactical aircraft that have embedded datalinking and data fusion capabilities. These platforms are designed to share sensors and data, as well as locally computed parameters and target tracks, for example. The fundamental mission sortie envisions multiple aircraft flying cooperatively together in the execution of a mission. However, Edwards has historically been a DT&E facility, testing aircraft against single-platform requirements. With the evolution of fusion-capable aircraft and purposefully cooperative mission plans, the nature of what comprises a SUT and how to do DT&E gets muddy. It may be that some new type of T&E activity that bridges between DT&E and OT&E will need to be defined and developed. Such issues also raise the question of the costs of such boundary activities, for example, the very high cost of flying multi-aircraft “SUTs,” or the corresponding technical challenge of developing real-time capable surrogate aircraft simulation capabilities as virtual wingmen as one alternative strategy for a cost-effective approach. So it can be seen that there are some subtle but nontrivial issues to deal with when deciding on a scheme for DDIFS DT&E and OT&E.

We are discussing here automated DDIFSs, where the core technical and functional capabilities are enabled in software, so another core issue in thinking about DDIFS T&E is the domain of software testing. By and large, software testing is the process of executing a program or system with the intent of finding errors. Software is not unlike other physical or functional processes in that inputs are received and outputs are produced; where software differs is in the manner in which it fails. Most physical systems fail in a fixed and bounded set of ways. By contrast, software, ironically because of a wide variety of interdependencies (analogous to DDIFSs in the large), can fail in many bizarre ways. Detecting all of the different failure modes of software is generally infeasible because the complexity of software is generally intractable. Unlike most physical systems, most of the defects in software are design errors, and once the software is shipped, these design defects, or bugs, will remain buried and latent until activated.

The transition to network-centric capabilities has introduced new T&E challenges. Network functional capabilities can reside in both nodes and links, and various common system capabilities can reside in, for example, service-oriented architecture (SOA) infrastructures. The T&E of capabilities in this type of framework involving specialized and common functionalities requires new thinking and a new strategy; this is another SUT-defining challenge. In the same way that using live/real nodes or platforms in testing adds great expense as was discussed previously, evaluating the performance of the software network itself is probably not going to be accomplished without extensive use of modeling and simulation because the expense of adding live nodes in a laboratory increases dramatically with the number of nodes added to the test apparatus. A T&E strategy that mitigates risk in the development of a network infrastructure that will support network-centric warfare requires a balance of theoretical analysis and laboratory testing.

15.2.2  MEASURES AND METRICS IN THE NETWORK VALUE CHAIN

Chapter 3 addressed the topic of the network-centric value chain. The value chain has three major quality dimensions: data/information quality, quality of share-ability/reachability, and quality of interactions. All of these dimensions will occur to varying degrees in any net-centric operation (NCO), and the degrees to which they occur form the basis for the wide range of metrics suggested in Garstka and Alberts (2004).

This viewpoint is shown in Figure 15.2, from Garstka and Alberts (2004). One way then to develop a measures and metrics framework for a DDIFS is to simply shift the labeling from the NCO application to DDIFS, as there are more or less one-to-one equivalencies in the applicability of these notions as a basis for T&E and a basis of measurement. One distinction would be that fusion processes do not inherently yield a Sensemaking capability but they can be key to realizing such capability. The fusion–Sensemaking interdependency is expressed and actualized via well-designed human–computer interfaces.

In the same way that in Section 15.1 we defined criteria, their dependency on measures, and the dependency of measures on metrics (the ultimate parameters measured in a T&E experiment or trial), Garstka and Alberts (2004) define top-level concepts (the three we have discussed), the attributes upon which they depend, and the measures and metrics used to quantify them. These dependencies are shown in Figure 15.3, from Garstka and Alberts (2004). Four categories of attributes are defined (excerpted literally from Garstka and Alberts [2004]):


FIGURE 15.2 The NCO framework with quality and degree measures. (Adapted from Garstka, J. and Alberts, D., Network Centric Operations Conceptual Framework Version 2.0, U.S. Office of Force Transformation and Office of the Assistant Secretary of Defense for Networks and Information Integration, Vienna, VA, 2004.)

Objective Attributes measure quality in reference to criteria that are independent of the situation. For example, the currency of a given data element indicates the age of the information available and can be expressed in units like minutes, hours, days, etc.

Fitness-for-Use Attributes measure quality in reference to criteria that are determined by the situation. For example, the timeliness of a given data element indicates the extent to which the information is received in a time that is appropriate for its intended use. What is appropriate is context dependent. In some contexts a currency of two hours is adequate, whereas in other contexts a currency of two minutes is what is needed. Fitness-for-use attributes allows one to capture information that is context dependent.

Agility Attributes measure the aspects of agility across the six dimensions. These attributes inherently are comparative, i.e., agility implies an ability to change over time and, as such, the values of the metrics for these attributes have to be compared to some baseline values.

Concept Specific Attributes measure unique aspects of some concepts. For instance, synchronicity is an attribute of the Quality of Interactions concept that measures the extent to which C2 processes are effective across time (synchronous vs. asynchronous) and space (collocated vs. distributed). This attribute is appropriate in determining the extent to which elements in a C2 organization can interact simultaneously in time and space but is not necessarily relevant to other concepts.

In the same way that we ported the NCO evaluation concepts to the DDIFS application, these attribute categories can also be ported to DDIFS applicability.


FIGURE 15.3 Concepts-to-quality/degree dimensions-to-attributes and measures/metrics. (Adapted from Garstka, J. and Alberts, D., Network Centric Operations Conceptual Framework Version 2.0, U.S. Office of Force Transformation and Office of the Assistant Secretary of Defense for Networks and Information Integration, Vienna, VA, 2004.)

15.2.3  FUSION ESTIMATES AND TRUTH STATES

Information fusion can generally be thought of as an association and estimation process, yielding estimates ranging from attributes of an entity to an estimate of a complex, dynamic, multi-entity situational picture. The entities can be physical objects, events, relationships, and courses of action (COAs). The entity estimates are based upon the association of the data together over space, time, type, etc. as an entity. A rational basis for evaluating the performance of such an estimation process is to compare the estimates to the underlying truth states of either the entity attributes or situations, or whatever estimation product is sought from the fusion process. When the fusion-based estimation process involves multiple entities of various types (physical objects, events, behaviors, informational entities, etc.), there can be a combinatoric complexity in determining which fused estimate should be compared with which truth entity; this is an issue that is well known in the fusion community and typically called the “track-to-truth” problem, as the question arose in the application of evaluating multitarget tracking systems. The problem gives rise to essentially a separate fusion-type problem, in which an adjunct data association function is required to reconcile which estimate-to-truth associations are correct in order to support subsequent computation of estimation errors. These PE functions are part of the Level 4 Process Assessment Fusion, which lies outside the L0–3 Fusion SUT.


FIGURE 15.4 Multiple data fusion/association processes in fusion process performance evaluation.

The idea is shown in Figure 15.4 for a multitarget tracking application. On the top we have the SUT fusion process (where we have removed the data preparation/common referencing processes for both the SUT and PE for simplicity), involving the traditional three-step data association process supporting the production of the SUT state estimates and computed multitarget tracks. Below it we have the PE data association process that determines, using an association score and an assignment algorithm, which SUT tracks should be compared to which truth tracks. The assertion of these associations is a core functionality that supports, in turn, the computation of evaluation metrics. It can be seen that the specific details and nature of the evaluation metrics are clearly dependent on the methodological details of the PE data association process. Such considerations become yet more complex in the distributed fusion (DDIFS) application, since different state estimates are being produced at various nodes and shared and further fused across the internodal network.
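A minimal sketch of such a PE (track-to-truth) association step is given below; it assumes simple 2-D position states, a Euclidean association score, and a fixed gate, and uses SciPy's linear_sum_assignment (the Hungarian method, one of the association algorithms considered later in this chapter) for the assignment. All numerical values are illustrative.

```python
# Sketch: track-to-truth association for PE (hypothetical 2-D positions).
import numpy as np
from scipy.optimize import linear_sum_assignment

sut_tracks = np.array([[10.0, 12.0], [55.0, 40.0], [90.0, 90.0]])   # SUT track estimates
truth = np.array([[11.0, 11.5], [54.0, 41.0], [30.0, 30.0]])        # truth states
gate = 5.0   # maximum distance (same units) for a valid association

# Association score: Euclidean distance between every SUT track and every truth object.
cost = np.linalg.norm(sut_tracks[:, None, :] - truth[None, :, :], axis=2)

# Optimal (minimum total cost) assignment via the Hungarian method.
rows, cols = linear_sum_assignment(cost)

for r, c in zip(rows, cols):
    if cost[r, c] <= gate:
        print(f"SUT track {r} -> truth {c}, error = {cost[r, c]:.2f}")
    else:
        print(f"SUT track {r} unassociated (nearest truth beyond gate)")
```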

The DNN technical architecture specified PE Level 4 fusion node is shown in Figure 15.5.

15.2.4  NOTION OF A PERFORMANCE EVALUATION TREE

These PE nodes are organized into networks (e.g., trees) that are interlaced with the SUT DF&RM nodes, such as shown in Figure 15.6 for RF, electronic support measures (ESM), and EO/IR SUT fusion nodes. These PE nodes can also be interlaced with PM nodes that manage how these nodes are applied over time, mission, space, etc.


FIGURE 15.5 Exploded view of PE node processes.


FIGURE 15.6 PE process in context of data fusion and resource management architecture.

The DNN technical architecture helps to break the PE and PM processes into more manageable design steps as follows for a PE process architecture.

Step 1 is PE role optimization. This step defines the role for PE as a black box, including all its inputs, outputs, and measures of success. An example of the role for PE in a T&E system is shown in Figure 15.7. In this example, there is a SUT with multiple subsystems, the fusion subsystem being only one of them. The role for fusion here is to support certain SUT Effector systems such as weapon systems, and to support a user, say a pilot. The evaluation focus is on the fusion system in this particular context or role only. This is also the step where baseline accuracy and timing MOPS based on PE requirements are established. By providing an objective evaluation, the MOPS help


FIGURE 15.7 Sample role for PE in a T&E system.

•  Determine whether the fusion algorithms meet engineering and operational requirements

•  Compare alternative algorithms and approaches

•  Optimize configuration parameters for a given algorithm

Step 2 is PE fusion network optimization. This step determines how to divide and conquer the PE problem over space, mission, time, etc.

Step 3 is the PE fusion node optimization. While the details of the PE algorithms are tailored to each data analysis step, the overall process follows the canonical fusion node steps, namely data preparation, data association, and state estimation as described earlier. Examples of these processes are given in the remaining sections in this chapter.
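As a structural sketch only, a single PE node following these three canonical steps might be organized as below; the class name, the Euclidean scoring, the gate value, and the two MOPS computed (position RMSE and fraction of unassociated tracks) are illustrative assumptions, not the chapter's specification.

```python
# Sketch of a single PE node: data preparation, data association, MOP estimation.
# Real PE nodes add common referencing, classification scoring, timing MOPS, etc.
import numpy as np
from scipy.optimize import linear_sum_assignment

class PENode:
    def __init__(self, gate):
        self.gate = gate

    def prepare(self, tracks, truth):
        # Data preparation: put both sides in a common frame (assumed already common here).
        return np.asarray(tracks, float), np.asarray(truth, float)

    def associate(self, tracks, truth):
        # Data association: score and assign SUT tracks to truth, then gate.
        cost = np.linalg.norm(tracks[:, None, :] - truth[None, :, :], axis=2)
        rows, cols = linear_sum_assignment(cost)
        return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= self.gate]

    def estimate_mops(self, tracks, truth, pairs):
        # MOP state estimation: position RMSE over associated pairs,
        # plus the fraction of SUT tracks left unassociated.
        errs = [np.linalg.norm(tracks[r] - truth[c]) for r, c in pairs]
        rmse = float(np.sqrt(np.mean(np.square(errs)))) if errs else float("nan")
        pct_unassociated = 1.0 - len(pairs) / max(len(tracks), 1)
        return {"rmse": rmse, "pct_unassociated": pct_unassociated}

    def evaluate(self, tracks, truth):
        t, g = self.prepare(tracks, truth)
        pairs = self.associate(t, g)
        return self.estimate_mops(t, g, pairs)

node = PENode(gate=5.0)
print(node.evaluate([[10, 12], [90, 90]], [[11, 11.5], [30, 30]]))
```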

Building upon the previous remarks, it can be seen that in a DDIFS there are estimates being produced at different nodes, and these estimates are also evolving in time. In turn, additional fusion operations occur at certain receiving nodes that combine the estimates sent to them from various sending nodes. Thus, there is also a temporal dimension to the T&E functions (true for most fusion processes whether distributed or not, so long as the problem space is dynamic), and there can be a need to compute both the evolving real-time performance and the cumulative performance. Hence, in the same way that a typical fusion process can be viewed as a kind of tree or, in general, a network (see Bowman [1994]), one can also envision a PE Tree, as introduced earlier. The PE Tree will have various computational modules, nodes that accumulate evaluation-related computations, and a network of such nodes that gather the computations in a coordinated way according to the PE Tree design, framed to satisfy the overall role defined for the PE/T&E process. As one example, the PE nodes could be the places where the evaluation calculations for a given platform in a multiplatform DDIFS are gathered. The rationale for arranging or batching the PE nodes can be drawn from the same considerations given to batching fusion processes in multisensor systems; this idea is shown in Figure 15.8, where one could think of batching the PE nodal processes according to individual sources or sensors (these could also be thought of as nodes of a given type in a DDIFS), according to a PE sampling time, or according to important events from an evaluative point of view. Conceptually, any of the nodes in a PE Tree can be performing the SUT-to-truth calculations shown in Figure 15.4.

A simple, time-batched PE Tree is shown in Figure 15.9 (Rawat 2003) for a notional, simple three-node DDIFS performing target tracking, being tested in a simulation environment. At each time slice, the network simulation data are sent to the fusion/tracking nodes according to whatever data-to-node and internodal communication protocol exists (these details are not shown), and each node computes its track estimates accordingly. As mentioned earlier, the PE process for each node would have a track-to-truth association process that determines the locally best associations for PE at the given time, according to whatever MOPS are being used. Typically, cumulative performance is also desired, and separate PE functions perform these calculations as shown. This is a simple case, but it can be appreciated that PE Tree (network) design can be relatively complex for more complex network topologies; and when the network information flow protocols (the ISSs discussed previously) are more complex, along with further complexities such as separate local and network fusion operations being performed at any network node, a fair (accurate, unbiased) yet affordable PE is needed. Engineering guidelines for achieving the knee-of-the-curve in PE “fairness versus complexity” have been developed based upon the DNN technical architecture. Namely,


FIGURE 15.8 Alternative strategies for computing evaluative metrics in a DDIFS PE process.


FIGURE 15.9 Notional time-based PE Tree.

•  The PE solution space is organized as a network of PE functional nodes.

•  Each PE node performs fusion and truth data preparation, data association, and MOP state estimation.

In addition, the PE process “fairness” can be improved by the Level 4 PM function as described in Haith and Bowman (2010). For example, the distributed fusion test article and the PE process functional parameters can be optimized for each test scenario to ensure a “fair” comparison of alternative distributed fusion systems under test. This optimization can be based upon the top-level Measure of Success (e.g., probability of meeting mission requirements) or on the selected MOPS. In the latter case, a “Pareto optimal front” (Haith and Bowman 2010) of parameter values can be derived (i.e., a boundary in parameter space such that any other parameter values will yield lesser performance in at least one MOPS).
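A minimal sketch of deriving such a Pareto-optimal front is shown below; the candidate parameter settings echo the gate-size and association-algorithm factors used later in the use case, but the MOP values are hypothetical, and both MOPS are assumed to be minimized.

```python
# Sketch: Pareto front over candidate SUT/PE parameter settings,
# scored on two hypothetical MOPS (both to be minimized).

candidates = {
    "gate=5,  assoc=Hungarian": (3.1, 0.12),   # (position RMSE, false-track fraction)
    "gate=15, assoc=Hungarian": (2.6, 0.20),
    "gate=5,  assoc=Vogel":     (3.4, 0.11),
    "gate=15, assoc=Vogel":     (3.3, 0.22),
}

def dominates(a, b):
    """True if MOP vector a is at least as good as b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

pareto_front = {
    name: mops
    for name, mops in candidates.items()
    if not any(dominates(other, mops) for other in candidates.values() if other != mops)
}
print(pareto_front)   # every setting not dominated by some other setting
```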

15.2.5  COMPLEXITIES IN ERROR AUDIT TRAILS

T&E is performed in part to understand the causes of errors, and it is typical that an error audit trail would be developed to understand where improvements can or need to be made, i.e., to discern the error-producing operation and how to repair it. In the same way that pedigree metadata tags are needed for certain DDIFS fusion functions, it may be necessary to incorporate pedigree tagging to track certain network processing operations for the purpose of error tracking. At the design level, there is both a complexity and a tension, in developing an optimized DDIFS design, between the two major functions of a DDIFS: the nodal fusion operations and the network ISS. Similarly, tracking causal errors is also problematical since, for example, the fusion processes at receiving nodes can only operate on the data sent to them, so asserting the cause of a fusion deficiency can be difficult; that is, it can be hard to determine whether the deficiency stems from a lack of appropriate data sent to a node or from a defect in the nodal association/estimation processes, which if nothing else adds to T&E analysis complexity. This is not very different from the error audit trail complexities in other fusion systems that have any type of adaptive operation, such as dynamic sensor management.
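One lightweight way to support such an error audit trail is to carry pedigree metadata with every shared data parcel; in the sketch below (field names and node labels are purely illustrative), each node appends a processing record rather than overwriting history, so a questionable fused estimate can be traced back through its chain.

```python
# Sketch: pedigree tagging of shared data parcels for error audit trails.
# Field names and node behaviors are illustrative only.
import time

def make_parcel(payload, source_node):
    return {"payload": payload,
            "pedigree": [{"node": source_node, "op": "originate", "time": time.time()}]}

def process(parcel, node, op, new_payload):
    """Each node appends a pedigree record instead of overwriting history."""
    parcel = dict(parcel)
    parcel["payload"] = new_payload
    parcel["pedigree"] = parcel["pedigree"] + [{"node": node, "op": op, "time": time.time()}]
    return parcel

def audit(parcel):
    """Trace the processing chain behind a questionable fused estimate."""
    for rec in parcel["pedigree"]:
        print(f"{rec['time']:.3f}  {rec['node']:>18}  {rec['op']}")

p = make_parcel({"track_id": 7, "pos": (10.0, 12.0)}, "platform-1/radar")
p = process(p, "platform-1/fusion", "local track fusion", {"track_id": 7, "pos": (10.2, 11.9)})
p = process(p, "platform-2/fusion", "network track fusion", {"track_id": 7, "pos": (10.4, 11.7)})
audit(p)
```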

15.2.6  FORMAL EXPERIMENTAL DESIGN AND STATISTICAL ANALYSES

There is usually little argument that any fusion process produces estimates and that those estimates have a stochastic character. This is because, in the strictest sense, the inputs to the fusion processes are statistically noisy sensor or other data having stochastic properties. These features have yet other implications for the T&E methodology, namely that the stochastic nature of the process needs to be recognized and dealt with in any T&E approach. At least when conducting any simulation-based T&E, this implies that (1) the experiments should be designed through the employment of the methods of statistical experimental design (a.k.a. design of experiments or “DOE”) and (2), in conjunction with this, that Monte Carlo replications of any given test condition should be performed. Further, given the execution of such planned experiments, the analysis processes should employ methods that can frame the statistical quality of the analysis results, such as methods from analysis of variance (ANOVA), as well as other formal statistical techniques.

It is recognized, by the way, that such rigor comes at a price, even when using simulations, and especially when doing field tests and the like. It is likely that there has been limited application of these formal methods because of the cost implications. However, DDIFSs are used in life-critical and other important applications, and it would seem that the cost of rigorous testing is a price that should be paid to assure that the best understanding of system performance is achieved. It is only through the use of such methods that assertions about the computed metrics can be made with statistical confidence. These remarks apply not only to DDIFSs but to any fusion system.

At any given phase in both the SUT fusion process design and the PE Tree process design, there is the consideration of the independent variables or Factors in that layer’s design, and the Effects of each of those Factors, or perhaps even the composite Effect of certain Factor combinations that might be of interest to evaluate.* It is convenient to think of Factors as independent variables related in part to the “problem space”; in tracking problems, for example, these can relate to the nature and behaviors of the targets or to the tracking environment in the wide sense, meaning both weather (which affects the nature of the sensor observations used for tracking) and clutter, such as the nature and extent of “confuser” objects. Factors or independent variables can also be related to the “solution space,” meaning the Factors that affect the performance of particular fusion algorithms (e.g., the nature and number of models in an interacting multiple model tracker) or, in the case of a DDIFS, the choice of topological structure. Thirdly, and peculiar to the nature of the overall PE process being suggested here, there are Factors involved in the PE approach itself, such as the choice of technique for track-to-truth assignment, or the Factors upon which a specific PE Tree might be partitioned. Thus, in this overall approach, there are three classes of Factors around which the PE process revolves: Problem-space Factors, Solution-space Factors, and PE-process Factors. Said otherwise, Factors are those parameters whose influence on performance is sought; in nonfusion applications the PE Factors would not normally be present, but note that here we have a new class of Factors of interest. The influence of any Factor on a performance/effectiveness measure is labeled the “Effect” in the statistical literature, and is in essence defined explicitly by a given measure or metric. The notion of an Effect can be thought of as the change in response (i.e., in an MOP) resulting from a change in the level of that Factor. For example, we might inquire as to the Effect on a given MOP of a change in SUT nodal tracker type from Kalman to Alpha-Beta, or the difference in an MOP resulting from different inter-target spacing. At any given level of a Factor, we conduct a number of Monte Carlo replications, so we really examine whether there is a statistically significant difference in a mean MOP value resulting from these changes in Factor levels, as reflected in the outcome of a statistical hypothesis test. It can happen of course that combinations of Factors cause Effects; this is called an “interaction” in the statistical literature. Interactions among Factors can occur in a combinatoric sense; if there are three Factors, say a, b, and c, then there are three 2-way interactions and one 3-way interaction (ab, ac, bc, and abc, assuming order is unimportant, as is usual).

What is sought in determining a PE approach is a statistically sound yet cost-effective way to gather the metrics and/or measures. The statistical DOE is a formal and highly quantitative way to develop a test plan that gathers the metrics in a provably cost-effective manner. That is, a DOE-based test or experimental plan extracts the maximum statistically significant information from the minimum number of test runs. DOE is a quite-mature area of study in the field of statistics, and its specific use to perform the PE function in the overall PE Tree methodology can yield the best rigorous framework for T&E.

We believe there are two major reasons for formal experimental designs and formal methods of data analysis: statistical validation of a nominated DDIFS fusion solution for some important real-world application or statistical validation of some knowledge gained about fusion processes in a range of applications for the advancement of science in an in-depth sense (i.e., “laws” as validated, explainable empirical generalizations). This latter rationale can in fact be important for empirically learning design laws for DDIFSs, and we argue in fact that the only way to develop design guidelines for DDIFSs is empirically, due to the combinatorial complexities in choosing design variables.

Designed experiments reflect a notion of a phased learning process, in which a succession of hypotheses are confirmed or denied and knowledge is gained sequentially. The need for a phased process is typically driven by the “curse of dimensionality” and the qualification problem, i.e., that there are too many Factors whose Effects need to be understood or isolated, so that a divide-and-conquer type approach must be employed to achieve in-depth understanding. The details of a phased approach, i.e., the staging of hypotheses of inquiry, are a case-dependent choice, and are of course influenced by the stage-by-stage outcomes.

The dominant analysis methodology for statistically designed experiments is analysis of variance or ANOVA. ANOVA is an analysis technique that determines whether the mean values of an MOP or MOE for the several “treatments” or sets of experimental conditions (as depicted in the Factor-level combinations of the set of independent variables in both the problem space and the solution [or fusion-process] space) are equal or not, by examining the estimated population variances across these conditions, often using Fisher’s F-statistic (the “test statistic” can change in various cases). The treatments can be the result of changing problem-domain independent variables, design-domain (fusion process) variables, or PE design variables, and/or the associated levels of each variable, or, as noted previously, the Factors that influence the nature of the PE approach. The F-statistic is based on estimates of the population variance as drawn from the sample variance of the data. ANOVA basically compares two estimates of this variance, one estimate drawn from the variance exhibited within (all) treatment conditions. That is, for any given treatment, say a given tracker design for a given problem condition, the variance of, say, position errors across the “n” Monte Carlo replications for this treatment condition is a “within-treatment” variance, and exists only because of the collective errors in this tracker estimation process. As a result, this variance is called the variance due to error in the statistical DOE literature. When these within-treatment variances are properly pooled across all treatments of the experiment,* they form a pooled estimate of the (supposedly) common variance within each of the treatments. The other estimate is drawn from the variance exhibited between (all) treatment conditions; if we were concerned with position error, for example, this estimate would be built from the deviations of each tracker’s mean position error from the global mean position error. These two estimates of variance are equal, in expectation, only if there is in fact no difference in the mean position errors of the trackers. The ANOVA process and the F-statistic are the means by which a hypothesis test of the equality of these variance estimates, and hence of the treatment means, is performed.
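The sketch below works that logic through numerically for two hypothetical trackers: the within-treatment (error) variance is pooled across the Monte Carlo replications of each treatment, the between-treatment estimate is built from the treatment means, and their ratio is the F-statistic; the result is cross-checked against SciPy's one-way ANOVA. The tracker names, sample sizes, and error statistics are assumed.

```python
# Sketch: one-way ANOVA "by hand" for the position-error MOP of two
# hypothetical trackers, each run over n Monte Carlo replications.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2)
kalman = rng.normal(loc=48.0, scale=10.0, size=25)       # position error (m)
alpha_beta = rng.normal(loc=55.0, scale=10.0, size=25)
groups = [kalman, alpha_beta]

grand_mean = np.mean(np.concatenate(groups))
# Between-treatment variance estimate (built from the treatment means).
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
df_between = len(groups) - 1
# Within-treatment ("error") variance estimate, pooled across all treatments.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
df_within = sum(len(g) for g in groups) - len(groups)

F = (ss_between / df_between) / (ss_within / df_within)
p = stats.f.sf(F, df_between, df_within)
print(f"F = {F:.2f}, p = {p:.4f}")
print(stats.f_oneway(kalman, alpha_beta))   # same F and p from SciPy
```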

When employing DOE test-planning methods, one issue that can arise is the complexity involved in designing efficient test plans when there are many independent variables (Factors) whose Effects on the DDIFS process under test (or SUT) are of interest. Using traditional DOE experimental designs, the number of runs that have to be made grows exponentially when the number of Factors is large and the number of “levels” (specific value settings of the Factors) is large; the run count goes as the number of levels raised to the number of Factors, or L^F. This exponential growth is associated with the type of experimental design being employed, called a “factorial” design, which not only allows the so-called main effects to be discerned from the experiments but also what are called “interaction” effects, where knowledge is gained about the Effects on the metrics of interest due to interacting Effects among the Factors. A representative 2^k factorial DOE design of test runs, for a case involving studying the Effects of target maneuverability, tracker type, target spacing, track-truth association technique, and error in truth tracks, is shown in Table 15.2; recall that these combinations of test run conditions represent the most cost-effective strategy to gain the information desired.
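The run-count growth is easy to see by simply enumerating the design; the short sketch below builds the full 2^5 factorial for the five example Factors of Table 15.2 (the level labels are illustrative placeholders) and reports the number of runs.

```python
# Sketch: enumerating a full 2^k factorial design for the five example Factors
# (level labels are illustrative placeholders, not the chapter's exact settings).
from itertools import product

factors = {
    "target_maneuverability":  ["low", "high"],
    "tracker_type":            ["Kalman", "Alpha-Beta"],
    "target_spacing":          ["close", "wide"],
    "track_truth_association": ["Hungarian", "Vogel"],
    "truth_track_error":       ["none", "perturbed"],
}

runs = [dict(zip(factors, levels)) for levels in product(*factors.values())]
print(f"{len(factors)} factors at 2 levels -> {len(runs)} runs (L**F = 2**5)")
print(runs[0])   # one treatment combination
```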

If the desire to learn about the interaction Effects is relaxed, using a type of experimental design called a “fractional factorial” design, the severity of the exponential growth is lessened but can still be an issue to deal with. One notion of a phased but still DOE-based approach is shown in Figure 15.10, where the fractional designs are used initially as a screening step to determine those Factors which are most influential on the metrics, and then the factorial designs to better understand the main and interaction Effects of the key variables and, if necessary, what are called “response surface” methods to understand the broad Effects of the Factors across the levels of interest for the application.

TABLE 15.2
Table Showing DOE Experimental Design for Two Levels of Each Factor (“2^k Factorial Design”)


Various alternative strategies may be possible, since there are many types of DOE techniques, each designed for environments involving varying numbers of Factors and where prior information/knowledge may suggest the level of concern for expected interaction Effects. Kleijnen et al. (2005) present a plot of DOE techniques (many of which bear people’s names) as a function of the number of Factors and the expected degree of Factor interaction (or analysis complexity, shown as response-surface complexity); it is reproduced here as Figure 15.11. So, in addition to a phased/layered approach as in Figure 15.10, a direct approach using these special DOE designs can be an alternative.


FIGURE 15.10 Notional layered experimental design/DOE strategy for large numbers of Factors and levels.


FIGURE 15.11 Suggested DOE strategies according to numbers of factors and interaction complexity. (Adapted from Kleijnen, J.P.C. et al., INFORMS J. Comput., 17(3), 263, 2005.)

15.3  SUMMARIZING IMPACTS TO AND STRATEGIES FOR DISTRIBUTED FUSION SYSTEM T&E

Much more could be said about these various high-level thoughts regarding how to approach the topic of T&E for distributed fusion systems; there is a large body of literature that can be accessed to further explore the ideas offered here, as well as yet more issues on this topic. It can be seen that there are some very basic issues that need to be addressed; just defining the test article or the SUT may not be so easy. In the practical world, where a team of contractors may have come together to build a DDIFS, establishing responsibilities for various parts of a DDIFS during the T&E phase, and understanding causal effects and audit trails of errors to determine responsibility (and imputed costs) for corrective actions, can be problematical and can create complexities in writing equitable contracts. The Network Centric Operations Conceptual Framework of Garstka and Alberts (2004) forms one reasonable basis from which to develop top-level ideas on DDIFS T&E, but as always the devil is in the details. Many of the subtleties, such as the fused estimate-to-truth association issue and the various statistical aspects discussed here, are often not adequately addressed in the fusion literature. Any given R&D or development program of course has only a limited amount of resources, and the role for and value of the T&E phase of the program has to be weighed in terms of overall cost-effectiveness, but the ramifications of poor/inadequate T&E are poor transition and receptivity of any fusion prototype. The worst outcome, of course, is that poor/inadequate T&E results in some type of disastrous consequence, possibly involving loss of life.

15.4  REMARKS FROM A DDIFS USE CASE

This section describes the ideas and a number of details of the project the authors were involved with for the U.S. Edwards Air Force Base (EAFB) that formed a basis for T&E of advanced tactical fighter aircraft that had integrated Information Fusion capabilities and were linked to concepts of employment that set them in a networked/distributed mission context. EAFB is nominally a DT&E test facility, but staff there have agreed that there is an issue in DDIFS applications as to the atypical nature of DT&E and the tendency toward what is more like an OT&E test environment, as we have previously remarked. EAFB is a large test range in the California desert where prototype tactical aircraft are tested in near-operational conditions. To explore some of the T&E issues and ideas, a simple use case involving two friendly aircraft in a test scenario was defined; each platform has three on-board sensors: Radar, ESM, and IRST (infrared search and track). The focus was on target tracking and threat estimation or fusion Levels 1 and 2 type capability in a two-node network where the aircraft exchanged tracking estimates, as shown in Figure 15.12; “CTP” in the figure means common tactical (track) picture.

The problem scenario was suggested by staff at EAFB and comprised a two-versus-six offensive sweep problem as shown in Figure 15.13. During the scenario, there are simulated missile launches and various flight dynamics emulating a plausible scenario of this type. The PE Tree for this problem was defined to have seven PE nodes, performing the following evaluative operations: three individual-sensor PE nodes, two ownship PE nodes (friendlies only), one distributed fusion track-to-truth PE node, and one internetted-platforms track-to-track PE node. The PE nodes described in Section 15.2.4 perform three necessary fusion functions:


FIGURE 15.12 Use case two-aircraft configuration.


FIGURE 15.13 Two versus six offensive sweep scenario.

(1) data preparation, (2) data association, and (3) estimation of the metrics or MOPS. During data preparation the PE node puts tracks and truth information in [x, y] coordinates and common time. Data association performs deterministic track-to-truth association and track-to-track association.

In this case study there are two platforms, each of which has its own view of the truth picture, composed of “common” and “unique” tracks. The common tracks are seen by both platforms, while the unique tracks are seen only by platform 1 or platform 2, respectively. From the point of view of supporting the tactical mission, one critical issue of course is whether there is a consistent “track picture” across the two aircraft. It can be seen in Figure 15.12 that there are typically differences in the local target track pictures on each platform, which need to be reconciled for the mission application. In this study, then, one focus of analysis was the fused track picture consistency as a function of certain factors, looking at both track-to-truth and track-to-track consistency metrics. The platforms exchange their track files, and data fusion is performed upon receipt of this information at each platform. We explain below how this information is exchanged, where we assume that there are no bandwidth limitations in communication. The baseline distributed fusion output is the common tactical picture (CTP). The sensor track file “consistency” is computed at each time point as the average over time of the percentage of matching CTP tracks in the track files of each platform. In addition, the following metrics are computed:

1.  Track-to-track consistency

2.  Track-to-truth consistency

3.  Percentage of tracks from first (or second) platform that are not associated with truth tracks (PFT) (this is just a track-to-truth MOP)

4.  Percentage of tracks from the first platform that are not associated with tracks from the second platform (PFT1)

5.  Percentage of tracks from the second platform that are not associated with tracks from the first platform (PFT2)

6.  The average number of standard deviations of error in the associated tracks at each time point

7.  The average location error standard deviation of associated tracks at each time point

8.  Percentage of correct classification for both platforms

9.  Range to correct ID for both platforms

In addition to the above consistency PE metrics, the corresponding performance metrics of each of the platform track files relative to truth are computed.
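As a minimal sketch of how two of the consistency MOPS listed above might be computed at a single time point, the code below takes a hypothetical pair of local track files and an already-asserted set of cross-platform associations and reports PFT1, PFT2, and one simple track-to-track consistency percentage; the chapter's actual metrics are averaged over time and tied to the CTP, so this is illustrative only.

```python
# Sketch: track-to-track consistency and PFT1/PFT2 at a single time point,
# given hypothetical track IDs and an already-computed cross-platform association.
platform1_tracks = {"T1", "T2", "T3", "T4"}
platform2_tracks = {"U1", "U2", "U3"}
# Track-to-track associations asserted by the PE node: (platform 1 ID, platform 2 ID).
associations = [("T1", "U1"), ("T2", "U3")]

assoc_p1 = {a for a, _ in associations}
assoc_p2 = {b for _, b in associations}

pft1 = 100.0 * (1 - len(assoc_p1) / len(platform1_tracks))   # P1 tracks with no P2 match
pft2 = 100.0 * (1 - len(assoc_p2) / len(platform2_tracks))   # P2 tracks with no P1 match
# One simple consistency measure: matched tracks over the larger local track file.
consistency = 100.0 * len(associations) / max(len(platform1_tracks), len(platform2_tracks))

print(f"PFT1 = {pft1:.0f}%, PFT2 = {pft2:.0f}%, track-to-track consistency = {consistency:.0f}%")
```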

In relation to Figure 15.12, we defined three “tiers” of processing:

•  Tier 0: here, each friendly platform generates fusion-based but sensor-specific tracks; that is, each of the radar, ESM, and IRST sensor data streams are locally associated and used to generate tracks.

•  Tier 1: here, the above sensor-specific tracks are associated (track-to-track association) and fused, but this fused picture is still local to the “ownship,” or unique tracks as seen by the particular friendly aircraft.

•  Tier 2: here, each of the ownship Tier 1 track files are fused at each Tier 1 track file update time (again, track-to-track association and fusion).

Within this framework, we defined a simple but executable statistical experimental design or DOE (partially driven by scope limitations) that was a 2^k-type full factorial design with three main Factors at two levels each, as shown in the following:

DOE Factors and Levels:

Factor | Level 1 | Level 2
PE factors: Association algorithm | Vogel approximation algorithm | Hungarian algorithm
PE factors: Gating factor | 3 | 5
SUT design factors: Gating factor | 5 | 15

1.  PE Factors:

a.  Two alternative track-to-truth association schemes

b.  Two alternative association gate sizes

2.  SUT design factors:

a.  Two alternative association gate sizes

The factorial experiment is analyzed using ANOVA in the MINITAB statistical analysis package. The factors and interactions that are significant for the various MOPS are denoted by “S” in Table 15.3. In Tier 0, we have three sensors on each of two platforms, and they do not fuse any data within or across platforms. Hence we only have to analyze track-to-truth associations for each of the MOPS. The summary of the results is shown in Table 15.3. For each MOP we have the normal probability plot and the Pareto chart, which recapitulate the significant factors. Then, for the significant factors, we plot the main-effects plot, which shows how a change in factor level affects the MOP. For the significant interactions we plot the interaction plot, which shows the effect of changes in factor-level combinations on the MOPS. From summary Table 15.3, we can say that the SUT design gating factor is comparatively more significant than the PE gating factor and the PE association algorithm; it appears as a significant factor in nearly all the Tier 0 DOE runs. So at Tier 0 we must be sensitive to the selection of the SUT design gating factor.
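The analysis reported here was performed in MINITAB; purely as a sketch of the same kind of computation in code, a three-factor full factorial ANOVA with all interactions could be set up with the statsmodels package as below, using the factor levels reconstructed above and entirely synthetic MOP data.

```python
# Sketch: three-factor (2^3) factorial ANOVA with interactions, analogous to the
# MINITAB analysis described in the text. MOP data are synthetic placeholders.
from itertools import product
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(seed=3)
rows = []
for pe_assoc, pe_gate, sut_gate in product(["Vogel", "Hungarian"], [3, 5], [5, 15]):
    for _ in range(10):                       # Monte Carlo replications per treatment
        mop = 3.0 + 0.4 * (sut_gate == 15) + rng.normal(scale=0.3)
        rows.append({"pe_assoc": pe_assoc, "pe_gate": pe_gate,
                     "sut_gate": sut_gate, "mop": mop})
df = pd.DataFrame(rows)

model = ols("mop ~ C(pe_assoc) * C(pe_gate) * C(sut_gate)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))        # main effects and all interactions
```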

In Tier 1, we have three sensors on each of two platforms, and they fuse data within each platform (not across platforms). So we have to analyze track-to-truth and track-to-track associations for each of the MOPS. The summary of the results is shown in Table 15.4. From summary Table 15.4, we can say that all three factors are very significant; all three appear as significant in nearly all the Tier 1 DOE runs. The interaction between the SUT design gating factor and the PE gating factor is significant for most of the MOPS.

In Tier 2, the three sensors on the two platforms fuse data both within and across platforms, so again both track-to-truth and track-to-track associations must be analyzed for each MOP. The results are summarized in Table 15.5, which shows that none of the three main factors is significant; only some of the two- and three-way interactions are significant, suggesting that fusing data across platforms reduces the discrepancies in the input data.

In addition to these DOE runs, we ran another set of full factorial runs to examine the effect of the communication tiers on the various MOPs, adding a fourth factor, (D) Tier, with two levels: Tier 1 and Tier 2. Table 15.6 shows the significant factors and interactions for the various MOPs; factor D is significant for all of the MOPs, which confirms the intuitive result that fusing data across platforms reduces the inconsistency of the input data.

TABLE 15.3
Tier 0 Analysis of Variance (ANOVA) Results

Image

TABLE 15.4
Tier 1 Analysis of Variance (ANOVA) Results

Image

15.5  SUMMARY AND CONCLUSIONS

Moving information fusion processes and algorithms into the context of a distributed or networked architecture has many potential operational benefits but can add considerable complexity to the framing of a T&E activity. This chapter has offered some discussion on these complicating factors, to include

•  The fundamental question of defining what is being tested, i.e., the test article or system under test

•  The fuzzification of the boundary between DT&E and OT&E

•  The question of functional boundaries between application functions or services and the generic-service infrastructure, for example, in an SOA

•  The question of degree of investment in supporting test facilities and simulation environments

•  The need to think about architecting a PE Tree structure to support analysis of the various and many types of functions, processes, and metrics involved in DDIFSs

•  The challenge of employing statistically rigorous experimental designs and post-test data analysis techniques, to improve not only statistical sophistication but, more importantly, the effectiveness and efficiency of insights into how a DDIFS is functioning

and some other considerations. The user and R&D communities need to come to grips with the challenges of DDIFS T&E and to carefully examine how to allocate funding and resources to find the most cost-effective path through these challenges in achieving a “fair” PE system. In support of this, we offer the DNN technical architecture, which provides problem-to-solution-space guidance for developing distributed DF&RM PE systems as a Level 4 fusion process comprising PE functional components, interfaces, and an engineering methodology; see also Bowman (2004), Bowman and Steinberg (2001), Steinberg et al. (1999), Haith and Bowman (2010), and Bowman et al. (2009).

TABLE 15.5
Tier 2 Analysis of Variance (ANOVA) Results

Image

TABLE 15.6
Inter-Tier (Tiers 1 and 2) Analysis of Variance (ANOVA) Results

Image

REFERENCES

Blasch, E.P. 2003. Performance metrics for fusion evaluation. Proceedings of the MSS National Symposium on Sensor and Data Fusion (NSSDF), Cairns, Queensland, Australia.

Blasch, E.P., M. Pribilski, B. Daughtery, B. Roscoe, and J. Gunsett. 2004. Fusion metrics for dynamic situation analysis. Proceedings of the SPIE, 5429:428–438.

Blasch, E.P., P. Valin, and E. Bossé. 2010. Measures of effectiveness for high-level fusion. International Conference on Info Fusion—Fusion10, Edinburgh, U.K.

Bowman, C. 1994. The data fusion tree paradigm and its dual. Proceedings of 7th National Symposium on Sensor Fusion, Invited paper, Sandia Labs, NM, March.

Bowman, C.L. 2004. The dual node network (DNN) DF&RM architecture. AIAA Intelligent Systems Conference, Chicago, IL.

Bowman, C.L. 2008. Space situation awareness and response testbed performance assessment (PA) framework. Technical Report for AFRL/RV, January.

Bowman, C.L. and A.N. Steinberg. 2001. A systems engineering approach for implementing data fusion systems. In Handbook of Multisensor Data Fusion, D. Hall and J. Llinas (Eds.), Chapter 16. Boca Raton, FL: CRC Press.

Bowman, C., P. Zetocha, and S. Harvey. 2009. The role for context assessment and concurrency adjudication for adaptive automated space situation awareness. AIAA Conference Intelligence Systems, Seattle, WA.

Durrant-Whyte, H.F. 2000. A beginner’s guide to decentralized data fusion. Technical report, Australian Centre for Field Robotics, University of Sydney, Sydney, New South Wales, Australia.

Garstka, J. and D. Alberts. 2004. Network Centric Operations Conceptual Framework Version 2.0, U.S. Office of Force Transformation and Office of the Assistant Secretary of Defense for Networks and Information Integration, Vienna, VA.

Gelfand, A., C. Smith, M. Colony, and C. Bowman. 2009. Performance evaluation of distributed estimation systems with uncertain communications. International Conference on Information Fusion, Seattle, WA.

Haith, G. and C. Bowman. 2010. Data-driven performance assessment and process management for space situational awareness. Journal of Aerospace Computing, Information, and Communication (JACIC) and 2010 AIAA Infotech@Aerospace Conference, Atlanta, GA.

Hall, D.L. and J. Llinas, Eds. 2001. Handbook of Multisensor Data Fusion. Boca Raton, FL: CRC Press.

Kleijnen, J.P.C. et al. 2005. State-of-the-art review: A user’s guide to the brave new world of designing simulation experiments. INFORMS Journal on Computing, 17(3):263–289.

Liggins, M.E., D.L. Hall, and J. Llinas, Eds. 2009. Handbook of Multisensor Data Fusion: Theory and Practice, 2nd edn. Boca Raton, FL: CRC Press.

Llinas, J. et al. 2004. Revisiting the JDL data fusion model II. International Conference on Information Fusion, Stockholm, Sweden.

Rawat, S., J. Llinas, and C. Bowman. 2003. Design of a performance evaluation methodology for data fusion-based multiple target tracking systems. Presented at the SPIE Aerosense Conference, Orlando, FL.

Steinberg, A. and C.L. Bowman. 2004. Rethinking the JDL data fusion levels, National symposium on sensor and data fusion (NSSDF). Johns Hopkins Applied Physics Lab (JHAPL), Laurel, MD.

Steinberg, A.N., C.L. Bowman, and F.E. White. 1999. Revisions to the JDL data fusion model. Proceedings of SPIE Conference Sensor Fusion: Architectures, Algorithms, and Applications III, 3719:430–441.

Utete, S. 1994. Network management in decentralized sensing systems. PhD thesis, The University of Oxford, Oxford, U.K.

*  Essentially every sensor has embedded thermal noise and other factors that attach randomness to the measured/observed data it produces; much system design knowledge is imperfect and draws on world models that have inherent stochastic features, and some such knowledge is drawn from knowledge elicitation from humans, which of course involves imperfect and random effects.

*  We capitalize the words “Factor” and “Effect” purposely here as we are soon to introduce the language of statistically designed experiments and the associated analysis processes.

*  Meaning, in a two-tracker comparative experiment, the within-treatment variances for both trackers across the varied problem conditions.
