8

Toward a Theoretical Foundation for Distributed Fusion

Ronald Mahler

CONTENTS

8.1    Introduction

8.2    Single-Target Distributed Fusion: Review

8.2.1    Single-Target Bayes Filter

8.2.2    T2F with Independent Sources

8.2.3    T2F with Known Double-Counting

8.2.4    Covariance Intersection

8.2.5    Exponential Mixture Fusion

8.3    Finite-Set Statistics: Review

8.3.1    Multitarget Recursive Bayes Filter

8.3.2    Multitarget Calculus

8.3.3    PHD Filter

8.3.4    CPHD Filter

8.3.5    Significant Recent Developments

8.4    General Multitarget Distributed Fusion

8.4.1    Multitarget T2F of Independent Sources

8.4.2    Multitarget T2F with Known Double-Counting

8.4.3    Multitarget XM Fusion

8.5    CPHD/PHD Filter Distributed Fusion

8.5.1    CPHD Filter T2F of Independent Sources

8.5.2    PHD Filter T2F of Independent Sources

8.5.3    CPHD Filter T2F with Known Double-Counting

8.5.4    PHD Filter T2F with Known Double-Counting

8.5.5    CPHD Filter XM Fusion

8.5.6    PHD Filter XM Fusion

8.6    Computational Issues

8.6.1    Implementation: Exact T2F Formulas

8.6.1.1    Case: GM-PHD Tracks

8.6.1.2    Case: Particle-PHD Tracks

8.6.2    Implementation: XM T2F Formulas

8.6.2.1    Case: GM-PHD Tracks

8.6.2.2    Case: Particle-PHD Tracks

8.7    Mathematical Derivations

8.7.1    Proof: CPHD T2F Fusion—Independent Sources

8.7.2    Proof: PHD T2F Fusion—Independent Sources

8.7.3    Proof: CPHD Filter with Double-Counting

8.7.4    Proof: CPHD Filter XM Fusion

8.7.5    Proof: PHD Filter XM Fusion

8.7.6    Proof: PHD Filter Chernoff Information

8.7.7    Proof: XM Implementation

8.8    Conclusions

References

8.1    INTRODUCTION

Measurement-to-track fusion (MTF) refers to the process of collecting measurement data and then using it to improve the accuracy of the most recent estimates of the numbers and states of targets. Over the last two decades, both the theory and the practice of MTF have become increasingly mature. But, in parallel, another development has occurred: the increasing prevalence of physically dispersed sensors connected by communications networks, ad hoc or otherwise. One response to this development might be to try to apply MTF techniques to such situations. But because transmission links are often bandwidth-limited, it is often not possible to transmit raw measurements in a timely fashion, if at all. Consequently, emphasis has shifted to the transmission of track data and to track-to-track fusion, hereafter abbreviated as “T2F.” Most commonly, the term “track data” refers to target state estimates and their associated error-covariance matrices—as supplied, for example, by a radar equipped with an extended Kalman filter (EKF). T2F refers to the process of merging single- or multi-target track data from multiple sensor sources, with the aim of achieving more accurate localization, increased track continuity, and fewer false tracks.

T2F is fundamentally different from MTF. In particular, it cannot be addressed by processing tracks in the same way as measurements. Both MTF theory and practice are commonly based on two independence assumptions: first, that measurements are statistically independent from time-step to time-step; and second, that measurements generated by different sensor sources are statistically independent.

However, single-target track data is the output of some recursive filtering process, such as an EKF, and is therefore inherently time-correlated. If it is processed in the same way as measurements, spuriously optimistic target localization estimates will result. “Tracklet” approaches [1], such as inverse Kalman filters, decorrelate tracks so that they can be processed in the same way as measurements. However, such techniques cannot be effectively applied when targets are rapidly maneuvering, since decorrelation must be performed over some extended time-window.

Furthermore, multisource track data (like multisource measurement data) in distributed networks can be corrupted by “double counting” [2]. A simple example: data from node A is passed to nodes X and Y, which then pass it to node B. If node B processes this data as though it were independent, then spuriously optimistic target localization will again result. Many T2F solutions have been devised for networks with pre-specified topologies. But such methods will not be applicable to ad hoc networks. “Pedigree” techniques have been proposed to address this challenge, by having every node “stamp” the tracks with suitable metadata before passing them on. In a large network, however, accumulated metadata can eventually greatly exceed the size of the track data that it documents. This problem can be sidestepped through node-to-node querying methods—but at the cost of increased bandwidth requirements. (A more practical difficulty: the large number of legacy networks makes it unlikely that any pedigree convention will be accepted, standardized, and implemented across all or even some of them.)

In part because of such issues, T2F theory is probably as underdeveloped now as MTF theory was two or three decades ago. The goal of this chapter is to try to remedy this situation by proposing the elements of a general theoretical foundation for T2F, building on ideas that I first suggested in 2000 [3]. These ideas have recently been greatly refined, especially by Daniel Clark and his associates [4–6].

The methodology will be the same as that which I have previously applied to MTF and which has been described in Statistical Multisource-Multitarget Information Fusion [7]:

1.  Model an entire multisensor-multitarget system as a single, evolving stochastic process using the theory of random finite sets.

2.  Formulate an optimal solution to the problem at hand—typically in the form of some kind of multisource-multitarget recursive Bayes filter.

3.  Recognize that one way to accomplish this is to find an optimal solution to the corresponding single-sensor, single-target problem and then generalize it to the multisensor-multitarget case.

4.  Recognize that this optimal solution will almost always be computationally intractable, and thus that principled statistical approximations of it must be formulated.

The principled approximation methods that I have most frequently advocated are as follows:

1.  Probability hypothesis density (PHD) filters, in which the multitarget process is approximated as an evolving Poisson process [7, chapter 16].

2.  Cardinalized PHD (CPHD) filters, in which it is approximated as an evolving independent, identically distributed cluster (i.i.d.c.) process [7, chapter 16].

3.  Multi-Bernoulli filters, in which it is approximated as an evolving multi-Bernoulli process [7, chapter 17].

In what follows I will consider only the first two approximation methods, which will be applied to three successively more difficult multisource-multitarget track fusion challenges:

1.  Exact T2F of independent track sources.

2.  Exact T2F of track sources with known double-counting.

3.  Approximate T2F of track sources having unknown correlations, using multitarget generalizations of Uhlmann and Julier’s covariance intersection (CI) approach.

In each of these cases I proceed by formulating a general approach to multisource-multitarget T2F and then by deriving more computationally tractable approximations using CPHD and PHD filters in the manner proposed by Clark et al.

The chapter is organized as follows:

1.  Section 8.2: Review of single-target T2F theory.

2.  Section 8.3: Review of those aspects of finite-set statistics (FISST) required to understand the chapter.

3.  Section 8.4: Direct generalization of single-target T2F to multitarget T2F.

4.  Section 8.5: Approximation of this general approach using CPHD and PHD filters.

5.  Section 8.6: A discussion of possible implementation approaches.

6.  Section 8.7: Mathematical derivations.

7.  Section 8.8: Summary and conclusions.

8.2    SINGLE-TARGET DISTRIBUTED FUSION: REVIEW

In this section, I summarize some major aspects of single-target T2F that will be required for what follows:

1.  Section 8.2.1: The single-target recursive Bayes filter is the foundation of the material in this section. I summarize the basic elements of this filter and define the concept of a “track” in general.

2.  Section 8.2.2: Single-target T2F when the track sources are independent. Approach: the track-merging formula of Chong et al. and its special case, Bayes parallel combination.

3.  Section 8.2.3: Single-target T2F when the track sources are dependent because of known double-counting. Approach: the generalized track-merging formula of Chong et al.

4.  Section 8.2.4: Single-target T2F when the track sources are linear-Gaussian but their correlations are completely unknown. Approach: the CI method of Uhlmann and Julier.

5.  Section 8.2.5: Single-target T2F when the track sources are arbitrary and their correlations are completely unknown. Approach: Mahler’s generalized CI method, rechristened by Julier and Uhlmann as “exponential mixture” (XM) fusion.

8.2.1    SINGLE-TARGET BAYES FILTER

The approach in this section is based on the Bayesian theoretical foundation for single-target tracking, the single-target Bayes nonlinear filter (see Chapter 2 of [7]). This filter propagates a Bayes posterior distribution $f_{k|k}(\mathbf{x}|Z^k)$ through time

$$f_{k|k}(\mathbf{x}|Z^k) \xrightarrow{\ \text{predictor}\ } f_{k+1|k}(\mathbf{x}|Z^k) \xrightarrow{\ \text{corrector}\ } f_{k+1|k+1}(\mathbf{x}|Z^{k+1})$$

(8.1)

where

$\mathbf{x}$ is the single-target state-vector

$Z^k: \mathbf{z}_1,\ldots,\mathbf{z}_k$ is the time-sequence of measurements collected by the sensor at times $t_1,\ldots,t_k$

The Bayes filter presumes the existence of models for the sensor and for the interim target motion, for example the additive models

$$\mathbf{X}_{k+1|k} = \boldsymbol{\varphi}_k(\mathbf{x}) + \mathbf{W}_k, \qquad \mathbf{Z}_{k+1} = \boldsymbol{\eta}_{k+1}(\mathbf{x}) + \mathbf{V}_{k+1},$$

(8.2)

where (1) $\mathbf{x}$ is the target state, (2) the deterministic motion model $\boldsymbol{\varphi}_k(\mathbf{x})$ is a nonlinear function of $\mathbf{x}$, (3) $\mathbf{W}_k$ is a zero-mean random vector (the “plant noise”), (4) the deterministic measurement model $\boldsymbol{\eta}_{k+1}(\mathbf{x})$ is a nonlinear function of $\mathbf{x}$, and (5) $\mathbf{V}_{k+1}$ is a zero-mean random vector (the sensor measurement noise). Given these models one can construct a Markov transition density and a likelihood function. For the additive models, for example, these have the form

$$f_{k+1|k}(\mathbf{x}|\mathbf{x}') = f_{\mathbf{W}_k}(\mathbf{x} - \boldsymbol{\varphi}_k(\mathbf{x}')), \qquad f_{k+1}(\mathbf{z}|\mathbf{x}) = f_{\mathbf{V}_{k+1}}(\mathbf{z} - \boldsymbol{\eta}_{k+1}(\mathbf{x})).$$

(8.3)

The single-target recursive Bayes filter is defined by the time-update and measurement-update equations

$$f_{k+1|k}(\mathbf{x}|Z^k) = \int f_{k+1|k}(\mathbf{x}|\mathbf{x}')\, f_{k|k}(\mathbf{x}'|Z^k)\, d\mathbf{x}'$$

(8.4)

$$f_{k+1|k+1}(\mathbf{x}|Z^{k+1}) = \frac{f_{k+1}(\mathbf{z}_{k+1}|\mathbf{x})\, f_{k+1|k}(\mathbf{x}|Z^k)}{f_{k+1}(\mathbf{z}_{k+1}|Z^k)}$$

(8.5)

where the Bayes normalization factor is

$$f_{k+1}(\mathbf{z}_{k+1}|Z^k) = \int f_{k+1}(\mathbf{z}_{k+1}|\mathbf{x})\, f_{k+1|k}(\mathbf{x}|Z^k)\, d\mathbf{x}.$$

(8.6)

Information of interest—target position, velocity, type, etc.—can be extracted from $f_{k|k}(\mathbf{x}|Z^k)$ using a Bayes-optimal state estimator. The maximum a posteriori (MAP) estimator, for example, determines the most probable target state:

$$\mathbf{x}^{\mathrm{MAP}}_{k+1|k+1} = \arg\sup_{\mathbf{x}}\, f_{k+1|k+1}(\mathbf{x}|Z^{k+1}).$$

(8.7)

Multisensor, single-target MTF with independent sensors is accomplished by applying Equation 8.5 successively for each sensor. Suppose, for example, that there are s sensors. Their respective, simultaneously collected measurements $\mathbf{z}^1,\ldots,\mathbf{z}^s$ are mediated by likelihood functions $f^1_{k+1}(\mathbf{z}^1|\mathbf{x}),\ldots,f^s_{k+1}(\mathbf{z}^s|\mathbf{x})$. By applying Equation 8.5 first using $f^1_{k+1}(\mathbf{z}^1|\mathbf{x})$, then using $f^2_{k+1}(\mathbf{z}^2|\mathbf{x})$, and so on, the measurements $\mathbf{z}^1,\ldots,\mathbf{z}^s$ are not only fused, but differences in sensor noise, sensor geometry, sensor obscurations, etc., are taken into account. Equivalently, one can apply Equation 8.5 to the joint likelihood function

$$f_{k+1}(Z|\mathbf{x}) = f^1_{k+1}(\mathbf{z}^1|\mathbf{x}) \cdots f^s_{k+1}(\mathbf{z}^s|\mathbf{x})$$

(8.8)

where $Z = \{\mathbf{z}^1,\ldots,\mathbf{z}^s\}$ denotes the set of multisensor measurements.

When motion and measurement models are linear-Gaussian, the Bayes filter reduces to the Kalman filter. Likewise, the multisensor Bayes filter (for independent sensors) reduces to the multisensor Kalman filter. In either case, a “track” can mean any of the following: (1) an instantaneous state-estimate xk+1 | k+1, (2) xk+1 | k+1 together with its error covariance matrix Pk+1 | k+1, (3) a labeled time-sequence of state-estimates, or (4) a labeled time-sequence of state-estimates and error covariance matrices.

Remark 1: Since my goal is to develop a more general T2F theory, in what follows a “track” at a particular time-step k will refer to the entire distribution fk | k(x | Zk), rather than to the estimates xk | k or (xk | k, Pk | k) extracted from it. Also, for the sake of notational simplicity, I will typically suppress measurement-dependence and employ the abbreviation

$$f_{k|k}(\mathbf{x}) \overset{\mathrm{abbr}}{=} f_{k|k}(\mathbf{x}|Z^k).$$

(8.9)
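
To make the predictor-corrector cycle of Equations 8.4 through 8.6 concrete, the following is a minimal sketch of one cycle on a one-dimensional grid, so that the integrals become sums. The scalar random-walk motion model, the Gaussian likelihood, and all numerical values are illustrative assumptions, not part of the preceding text.

```python
import numpy as np

# One predictor-corrector cycle of the single-target Bayes filter
# (Equations 8.4 through 8.6) on a 1-D grid. Models are illustrative:
# random-walk motion with plant-noise std q, Gaussian likelihood with std r.
x = np.linspace(-10.0, 10.0, 401)           # state grid
dx = x[1] - x[0]

def gauss(u, sigma):
    return np.exp(-0.5 * (u / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

f_post = gauss(x, 2.0)                      # prior f_{k|k}(x), Gaussian here

# Time update (Equation 8.4): f_{k+1|k}(x) = int f_{k+1|k}(x|x') f_{k|k}(x') dx'
q = 1.0
markov = gauss(x[:, None] - x[None, :], q)  # Markov density f_{k+1|k}(x|x')
f_pred = markov @ f_post * dx

# Measurement update (Equations 8.5 and 8.6), with measurement z_{k+1} = 1.5:
z, r = 1.5, 1.0
lik = gauss(z - x, r)                       # likelihood f_{k+1}(z|x)
f_new = lik * f_pred
f_new /= np.sum(f_new) * dx                 # Bayes normalization (Equation 8.6)

print("MAP estimate (Equation 8.7):", x[np.argmax(f_new)])
```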

8.2.2    T2F WITH INDEPENDENT SOURCES

Suppose that a single target is being tracked and that s independent sources, relying on their own dedicated local sensors, provide track data about this target to a T2F site. The jth source's sensor suite collects a time-sequence $Z^{k,j}: Z^j_1,\ldots,Z^j_k$, where $Z^j_l$ denotes the set of measurements supplied by the jth source's sensors at time $t_l$. The source does not pass its measurements directly to the fusion site. Rather, it passes the following information:

•  Measurement-updated, single-target track data, in the form of posterior distributions $f^j_{k|k}(\mathbf{x}) \overset{\mathrm{abbr}}{=} f^j_{k|k}(\mathbf{x}|Z^{k,j})$

•  Time-updated, single-target track data, in the form of distributions $f^j_{k+1|k}(\mathbf{x}) \overset{\mathrm{abbr}}{=} f^j_{k+1|k}(\mathbf{x}|Z^{k,j})$

Let $f_{k|k}(\mathbf{x}) \overset{\mathrm{abbr}}{=} f_{k|k}(\mathbf{x}|Z^k)$ be the fusion node's determination of the target state, given the accumulated track data $Z^k$ supplied by all of the sensor sources. Then Chong et al. [2] noted that the fused data at time-step k + 1 is exactly specified by the following track-merging formula:

$$f_{k+1|k+1}(\mathbf{x}) = K^{-1}\, \frac{f^1_{k+1|k+1}(\mathbf{x})}{f^1_{k+1|k}(\mathbf{x})} \cdots \frac{f^s_{k+1|k+1}(\mathbf{x})}{f^s_{k+1|k}(\mathbf{x})}\, f_{k+1|k}(\mathbf{x})$$

(8.10)

where the normalization constant is

$$K = \int \frac{f^1_{k+1|k+1}(\mathbf{x})}{f^1_{k+1|k}(\mathbf{x})} \cdots \frac{f^s_{k+1|k+1}(\mathbf{x})}{f^s_{k+1|k}(\mathbf{x})}\, f_{k+1|k}(\mathbf{x})\, d\mathbf{x}.$$

(8.11)

This formula also applies to the asynchronous-sensor case. If each source has its own data rate, then the measurement-collection times t1,…,tk can be taken to refer to the arrival times of data from all of the sources, taken collectively. If at time tl only sl of the sources provide data, then Equation 8.10 is replaced by the corresponding formula for those sources only.

Equation 8.10 is an immediate consequence of Bayes' rule. Let $f_{k+1}(Z^j|\mathbf{x})$ be the joint likelihood function for the jth source's local sensors. Then

$$f_{k+1|k+1}(\mathbf{x}) \propto f_{k+1}(Z^1_{k+1}|\mathbf{x}) \cdots f_{k+1}(Z^s_{k+1}|\mathbf{x})\, f_{k+1|k}(\mathbf{x})$$

(8.12)

and thus Equation 8.10 follows from the fact that $f^j_{k+1|k+1}(\mathbf{x}) \propto f_{k+1}(Z^j_{k+1}|\mathbf{x})\, f^j_{k+1|k}(\mathbf{x})$ for all j = 1,…,s.

Suppose, now, that the sources do not pass on their time-updated track data $f^j_{k+1|k}(\mathbf{x})$ but, rather, only their measurement-updated track data $f^j_{k+1|k+1}(\mathbf{x})$. (This is what happens with radars equipped with EKFs, for example.) In this case, Equation 8.10 can no longer be constructed, and some approximation must be devised.

One approach is to presume that all of the sources employ identical target motion models. That is, the sources' Markov densities are identical to the fusion site's Markov density: $f^j_{k+1|k}(\mathbf{x}|\mathbf{x}') = f_{k+1|k}(\mathbf{x}|\mathbf{x}')$ for all j = 1,…,s. Under this assumption, the fusion site can itself construct time-updated track data for the sources, using the prediction integral

$$f^j_{k+1|k}(\mathbf{x}) = \int f_{k+1|k}(\mathbf{x}|\mathbf{x}')\, f^j_{k|k}(\mathbf{x}')\, d\mathbf{x}',$$

(8.13)

and then apply Equation 8.10.

A second but more restrictive approximation is also possible. It is based on the presumption that the sources' time-updated track data is identical to the fusion site's: $f^j_{k+1|k}(\mathbf{x}) = f_{k+1|k}(\mathbf{x})$ for all j = 1,…,s. In this case, Equation 8.10 reduces to

$$f_{k+1|k+1}(\mathbf{x}) \propto f^1_{k+1|k+1}(\mathbf{x}) \cdots f^s_{k+1|k+1}(\mathbf{x})\, f_{k+1|k}(\mathbf{x})^{1-s}.$$

(8.14)

This formula is known as “Bayes parallel combination” [7, p. 137].
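
For linear-Gaussian tracks the products and quotients in Equations 8.10 and 8.14 have closed forms: multiplying (dividing) Gaussian densities adds (subtracts) their information matrices $P^{-1}$ and information vectors $P^{-1}\mathbf{x}$. The sketch below implements Equation 8.10 under that assumption; the variable names and numbers are mine, and no guard is included against the fused information matrix losing positive-definiteness.

```python
import numpy as np

def merge_gaussian_tracks(updated, predicted, fusion_prior):
    """Track-merging formula (Equation 8.10) for Gaussian tracks.

    updated      : list of (x, P), measurement-updated tracks f^j_{k+1|k+1}
    predicted    : list of (x, P), time-updated tracks f^j_{k+1|k}
    fusion_prior : (x, P), the fusion node's prediction f_{k+1|k}
    Gaussian multiplication/division = addition/subtraction of
    information matrices P^{-1} and information vectors P^{-1} x.
    """
    x0, P0 = fusion_prior
    Y = np.linalg.inv(P0)       # running information matrix
    y = Y @ x0                  # running information vector
    for (xu, Pu), (xp, Pp) in zip(updated, predicted):
        Yu, Yp = np.linalg.inv(Pu), np.linalg.inv(Pp)
        Y += Yu - Yp            # multiply by f^j_{k+1|k+1}, divide by f^j_{k+1|k}
        y += Yu @ xu - Yp @ xp
    P = np.linalg.inv(Y)
    return P @ y, P             # fused estimate and covariance

# Two sources, scalar state (illustrative numbers):
upd  = [(np.array([1.0]), np.array([[0.5]])), (np.array([1.4]), np.array([[0.8]]))]
pred = [(np.array([0.8]), np.array([[2.0]])), (np.array([1.0]), np.array([[2.0]]))]
x_f, P_f = merge_gaussian_tracks(upd, pred, (np.array([0.9]), np.array([[2.0]])))
print(x_f, P_f)
```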

8.2.3    T2F WITH KNOWN DOUBLE-COUNTING

In the previous section, it was assumed that each data source is equipped with its own suite of dedicated sensors—that is, the sources share no sensors in common. Expressed with greater mathematical precision: let $Z^i_1,\ldots,Z^i_k$ be the time-sequence of measurement-sets for the ith source and let $Z^j_1,\ldots,Z^j_k$ be the time-sequence of measurement-sets for the jth source. Then $Z^i_l \cap Z^j_l = \emptyset$ whenever $i \ne j$, for all l = 1,…,k.

If, on the other hand, $Z^i_l \cap Z^j_l \ne \emptyset$, then the sources share at least some sensors and double-counting of measurements occurs. Chong et al. [2] generalized Equation 8.10 to this case—assuming that one knows, a priori, which sensors are being shared by which sources. Define $Z_{k+1} = Z^1_{k+1} \cup \cdots \cup Z^s_{k+1}$. Let

•  $Z^{12}_{k+1}$ be the measurements supplied to the second source that are not in $Z^1_{k+1}$

•  $Z^{13}_{k+1}$ be the measurements supplied to the third source that are not in $Z^1_{k+1} \cup Z^{12}_{k+1}$

•  $Z^{14}_{k+1}$ be the measurements supplied to the fourth source that are not in $Z^1_{k+1} \cup Z^{12}_{k+1} \cup Z^{13}_{k+1}$

and so on. Define

$$Z^{(j)}_{k+1} = Z^j_{k+1} \setminus Z^{1j}_{k+1}.$$

(8.15)

Then Equation 8.10 generalizes to

$$f_{k+1|k+1}(\mathbf{x}) \propto \frac{f^1_{k+1|k+1}(\mathbf{x})}{f^1_{k+1|k}(\mathbf{x})} \cdot \frac{f^2_{k+1|k+1}(\mathbf{x})}{f^2_{k+1|k}(\mathbf{x}|Z^{(2)})} \cdots \frac{f^s_{k+1|k+1}(\mathbf{x})}{f^s_{k+1|k}(\mathbf{x}|Z^{(s)})} \cdot f_{k+1|k}(\mathbf{x}).$$

(8.16)

If Equation 8.16 is to be applied, the jth source must know which sensors it shares with each of sources 1,…,j − 1, and must pass on $f^j_{k+1|k}(\mathbf{x}|Z^{(j)})$ in addition to $f^j_{k+1|k}(\mathbf{x})$. Clearly, as the number of sensors increases, the problem becomes more complex, in terms of both computational cost and communications requirements.

Equation 8.16 is, once again, an immediate consequence of Bayes’ rule:

$$f_{k+1|k+1}(\mathbf{x}) \propto f_{k+1}(Z^1_{k+1}|\mathbf{x})\, f_{k+1}(Z^{12}_{k+1}|\mathbf{x}) \cdots f_{k+1}(Z^{1s}_{k+1}|\mathbf{x})\, f_{k+1|k}(\mathbf{x})$$

(8.17)

$$= f_{k+1}(Z^1_{k+1}|\mathbf{x})\, \frac{f_{k+1}(Z^2_{k+1}|\mathbf{x})}{f_{k+1}(Z^{(2)}_{k+1}|\mathbf{x})} \cdots \frac{f_{k+1}(Z^s_{k+1}|\mathbf{x})}{f_{k+1}(Z^{(s)}_{k+1}|\mathbf{x})}\, f_{k+1|k}(\mathbf{x})$$

(8.18)

$$\propto \frac{f^1_{k+1|k+1}(\mathbf{x})}{f^1_{k+1|k}(\mathbf{x})} \cdot \frac{f^2_{k+1|k+1}(\mathbf{x})}{f^2_{k+1|k}(\mathbf{x}|Z^{(2)})} \cdots \frac{f^s_{k+1|k+1}(\mathbf{x})}{f^s_{k+1|k}(\mathbf{x}|Z^{(s)})} \cdot f_{k+1|k}(\mathbf{x}).$$

(8.19)

As an example, set s = 2 and suppose that $f_{k+1|k}(\mathbf{x}) = f^1_{k+1|k}(\mathbf{x})$. Then Equation 8.16 reduces to the following formula of Chong et al. [2]:

$$f_{k+1|k+1}(\mathbf{x}) \propto \frac{f^1_{k+1|k+1}(\mathbf{x})\, f^2_{k+1|k+1}(\mathbf{x})}{f^2_{k+1|k}(\mathbf{x}|Z^1_{k+1} \cap Z^2_{k+1})}$$

(8.20)

For in this case, $Z^{12}_{k+1} = Z^2_{k+1} \setminus (Z^1_{k+1} \cap Z^2_{k+1})$ and so $Z^{(2)}_{k+1} = Z^2_{k+1} \setminus Z^{12}_{k+1} = Z^1_{k+1} \cap Z^2_{k+1}$. Thus

$$f_{k+1|k+1}(\mathbf{x}) \propto \frac{f^1_{k+1|k+1}(\mathbf{x})}{f^1_{k+1|k}(\mathbf{x})} \cdot \frac{f^2_{k+1|k+1}(\mathbf{x})}{f^2_{k+1|k}(\mathbf{x}|Z^{(2)})} \cdot f_{k+1|k}(\mathbf{x})$$

(8.21)

$$= \frac{f^1_{k+1|k+1}(\mathbf{x})}{f^1_{k+1|k}(\mathbf{x})} \cdot \frac{f^2_{k+1|k+1}(\mathbf{x})}{f^2_{k+1|k}(\mathbf{x}|Z^1_{k+1} \cap Z^2_{k+1})} \cdot f_{k+1|k}(\mathbf{x})$$

(8.22)

$$= \frac{f^1_{k+1|k+1}(\mathbf{x})\, f^2_{k+1|k+1}(\mathbf{x})}{f^2_{k+1|k}(\mathbf{x}|Z^1_{k+1} \cap Z^2_{k+1})}.$$

(8.23)
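
In the linear-Gaussian case Equation 8.20 likewise reduces to information arithmetic: multiply the two measurement-updated tracks and divide out the track that carries the shared information $Z^1_{k+1} \cap Z^2_{k+1}$. A minimal sketch under that assumption (the names and numbers are mine):

```python
import numpy as np

def fuse_removing_common_info(track1, track2, common):
    """Equation 8.20 for Gaussian tracks: f proportional to f1 * f2 / f_common.

    track1, track2 : (x, P), the two measurement-updated source tracks
    common         : (x, P), the density conditioned on the shared
                     (double-counted) measurements Z^1 intersect Z^2
    Assumes the resulting information matrix stays positive-definite.
    """
    (x1, P1), (x2, P2), (xc, Pc) = track1, track2, common
    Y1, Y2, Yc = (np.linalg.inv(P) for P in (P1, P2, Pc))
    Y = Y1 + Y2 - Yc                 # divide out the common information
    y = Y1 @ x1 + Y2 @ x2 - Yc @ xc
    P = np.linalg.inv(Y)
    return P @ y, P

# Illustrative scalar example:
x_f, P_f = fuse_removing_common_info(
    (np.array([1.0]), np.array([[0.5]])),
    (np.array([1.2]), np.array([[0.6]])),
    (np.array([1.1]), np.array([[1.5]])))
print(x_f, P_f)
```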

8.2.4    COVARIANCE INTERSECTION

Sections 8.2.2 and 8.2.3 address situations in which enough a priori knowledge is available to make exact track merging possible. In general, however, this will not be the case—either because not enough a priori information is available or because, even if it is available, it cannot be effectively utilized. This situation is, in part, what the CI method of Uhlmann and Julier [8–10] is intended to address.

Suppose that a single target is being observed by two track sources. At time-step k, the first source provides a track $(\mathbf{x}^0_{k|k}, P^0_{k|k})$ and the second source provides a track $(\mathbf{x}^1_{k|k}, P^1_{k|k})$. CI is a method for merging $(\mathbf{x}^0_{k|k}, P^0_{k|k})$ and $(\mathbf{x}^1_{k|k}, P^1_{k|k})$ into a single track $(\mathbf{x}_{k|k}, P_{k|k})$ that is robust with respect to ambiguity. This means, in particular, that the uncertainty $P_{k|k}$ in $\mathbf{x}_{k|k}$ is neither too small (over-confidence) nor too large (under-confidence). Let 0 ≤ ω ≤ 1 and define $(\mathbf{x}^\omega_{k|k}, P^\omega_{k|k})$ by

$$(P^\omega_{k|k})^{-1} = (1-\omega)\,(P^0_{k|k})^{-1} + \omega\,(P^1_{k|k})^{-1}$$

(8.24)

$$(P^\omega_{k|k})^{-1}\,\mathbf{x}^\omega_{k|k} = (1-\omega)\,(P^0_{k|k})^{-1}\,\mathbf{x}^0_{k|k} + \omega\,(P^1_{k|k})^{-1}\,\mathbf{x}^1_{k|k}.$$

(8.25)

The matrix $P^\omega_{k|k}$ is positive-definite regardless of the value of ω, and $(\mathbf{x}^\omega_{k|k}, P^\omega_{k|k})$ reduces to $(\mathbf{x}^0_{k|k}, P^0_{k|k})$ when ω = 0 and to $(\mathbf{x}^1_{k|k}, P^1_{k|k})$ when ω = 1.

Suppose that

$$(\mathbf{x} - \mathbf{x}^0_{k|k})^T (P^0_{k|k})^{-1} (\mathbf{x} - \mathbf{x}^0_{k|k}) \le \sigma^2$$

(8.26)

$$(\mathbf{x} - \mathbf{x}^1_{k|k})^T (P^1_{k|k})^{-1} (\mathbf{x} - \mathbf{x}^1_{k|k}) \le \sigma^2$$

(8.27)

are the error hyper-ellipsoids of size σ associated with the tracks $(\mathbf{x}^0_{k|k}, P^0_{k|k})$ and $(\mathbf{x}^1_{k|k}, P^1_{k|k})$. Then it can be shown that, for any 0 ≤ ω ≤ 1 and any σ > 0, every $\mathbf{x}$ satisfying both inequalities also satisfies

$$(\mathbf{x} - \mathbf{x}^\omega_{k|k})^T (P^\omega_{k|k})^{-1} (\mathbf{x} - \mathbf{x}^\omega_{k|k}) \le \sigma^2.$$

(8.28)

That is, the error hyper-ellipsoid of the merged track always contains the intersection of the interiors of the error hyper-ellipsoids of the original tracks.

Intuitively speaking, ω should be chosen so that the hypervolume of the hyper-ellipsoid $(\mathbf{x} - \mathbf{x}^\omega_{k|k})^T (P^\omega_{k|k})^{-1} (\mathbf{x} - \mathbf{x}^\omega_{k|k}) = \sigma^2$ is as small as possible. That is, the merged hyper-ellipsoid should have the best possible fit to the intersection-region of the two original hyper-ellipsoids. Uhlmann and Julier proposed choosing $\omega = \hat\omega$ so as to minimize either the trace $\mathrm{tr}\, P^\omega_{k|k}$ or the determinant $\det P^\omega_{k|k}$. They demonstrated that this approach yields an approximation of the exact merged track that is unbiased and whose degree of uncertainty is not overstated.

Fränken and Hüpper [11] subsequently proposed a more computationally tractable “fast CI” approximation. Here, ω is chosen according to the formula

$$1 - \omega = \frac{\det\!\big((P^0_{k|k})^{-1} + (P^1_{k|k})^{-1}\big) - \det (P^1_{k|k})^{-1} + \det (P^0_{k|k})^{-1}}{2\,\det\!\big((P^0_{k|k})^{-1} + (P^1_{k|k})^{-1}\big)}.$$

(8.29)

These authors also proposed the following generalization. Consider the multisource CI problem defined by

$$(P^\omega_{k|k})^{-1} = \omega_1\,(P^1_{k|k})^{-1} + \cdots + \omega_n\,(P^n_{k|k})^{-1}$$

(8.30)

$$(P^\omega_{k|k})^{-1}\,\mathbf{x}^\omega_{k|k} = \omega_1\,(P^1_{k|k})^{-1}\,\mathbf{x}^1_{k|k} + \cdots + \omega_n\,(P^n_{k|k})^{-1}\,\mathbf{x}^n_{k|k}$$

(8.31)

with ω1 + ⋯ + ωn = 1. Then their proposed approximation is

$$\omega_i = \frac{\det P_{k|k}^{-1} - \det\!\big(P_{k|k}^{-1} - (P^i_{k|k})^{-1}\big) + \det (P^i_{k|k})^{-1}}{n\,\det P_{k|k}^{-1} + \sum_{j=1}^{n}\big[\det (P^j_{k|k})^{-1} - \det\!\big(P_{k|k}^{-1} - (P^j_{k|k})^{-1}\big)\big]}$$

(8.32)

where

$$P_{k|k}^{-1} = \sum_{i=1}^{n} (P^i_{k|k})^{-1}.$$

Much research has been devoted to determining the effectiveness of CI. The emerging consensus seems to be that CI tends to produce estimates of the fused track that are pessimistic. That is, the fused target-localizations are significantly worse than what one would get from an exact fused solution. This behavior is exactly what one would expect, given that, by design, CI must address worst-case situations in which to-be-fused tracks could be highly correlated.
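	
The following sketch fuses two Gaussian tracks via Equations 8.24 and 8.25, choosing ω with the fast-CI formula of Equation 8.29; replacing that closed form with a one-dimensional search minimizing $\det P^\omega_{k|k}$ or $\mathrm{tr}\, P^\omega_{k|k}$ recovers Uhlmann and Julier's original proposal. The numbers are illustrative.

```python
import numpy as np

def fast_ci(x0, P0, x1, P1):
    """Covariance intersection (Equations 8.24 and 8.25) with the
    fast-CI weight of Equation 8.29. A sketch only; production code
    would guard the inversions and determinants."""
    Y0, Y1 = np.linalg.inv(P0), np.linalg.inv(P1)
    dS, d0, d1 = (np.linalg.det(M) for M in (Y0 + Y1, Y0, Y1))
    w = 1.0 - (dS - d1 + d0) / (2.0 * dS)              # Equation 8.29
    Yw = (1.0 - w) * Y0 + w * Y1                       # Equation 8.24
    Pw = np.linalg.inv(Yw)
    xw = Pw @ ((1.0 - w) * (Y0 @ x0) + w * (Y1 @ x1))  # Equation 8.25
    return xw, Pw, w

# Illustrative 2-D tracks with complementary uncertainty directions:
x0, P0 = np.array([0.0, 0.0]), np.array([[4.0, 0.0], [0.0, 1.0]])
x1, P1 = np.array([1.0, 0.5]), np.array([[1.0, 0.0], [0.0, 4.0]])
xw, Pw, w = fast_ci(x0, P0, x1, P1)
print("omega =", w, "\nfused state:", xw, "\nfused covariance:\n", Pw)
```

Note that when the two covariances are equal, the fast-CI weight reduces to ω = 1/2, and as one source becomes far more informative the weight shifts toward it, consistent with the intuition above.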

8.2.5    EXPONENTIAL MIXTURE FUSION

The CI method addresses the merging of only linear-Gaussian track sources. How might it be generalized to more general sources? In 2000 [3], I observed that the following identity is true:

$$\frac{N_{P^0_{k|k}}(\mathbf{x} - \mathbf{x}^0_{k|k})^{1-\omega}\; N_{P^1_{k|k}}(\mathbf{x} - \mathbf{x}^1_{k|k})^{\omega}}{\int N_{P^0_{k|k}}(\mathbf{y} - \mathbf{x}^0_{k|k})^{1-\omega}\; N_{P^1_{k|k}}(\mathbf{y} - \mathbf{x}^1_{k|k})^{\omega}\, d\mathbf{y}} = N_{P^\omega_{k|k}}(\mathbf{x} - \mathbf{x}^\omega_{k|k})$$

(8.33)

where, in general, $N_{P_0}(\mathbf{x} - \mathbf{x}_0)$ denotes a multidimensional Gaussian distribution with mean $\mathbf{x}_0$ and covariance matrix $P_0$. That is, CI can be expressed entirely in terms of density functions rather than covariance matrices. I proposed, therefore, that the following definition be taken as the obvious generalization of the CI merging formula to arbitrary track sources:

$$f^\omega_{k+1|k+1}(\mathbf{x}) = \frac{f^0_{k+1|k+1}(\mathbf{x})^{1-\omega}\; f^1_{k+1|k+1}(\mathbf{x})^{\omega}}{\int f^0_{k+1|k+1}(\mathbf{y})^{1-\omega}\; f^1_{k+1|k+1}(\mathbf{y})^{\omega}\, d\mathbf{y}}.$$

(8.34)

Hurley independently proposed Equation 8.34 in 2002 [12]. He also justified its theoretical reasonableness on the basis of its similarity to Chernoff information, which is defined as follows:

$$C(f^1_{k+1|k+1}; f^0_{k+1|k+1}) = \sup_{0 \le \omega \le 1}\left( -\log \int f^0_{k+1|k+1}(\mathbf{x})^{1-\omega}\; f^1_{k+1|k+1}(\mathbf{x})^{\omega}\, d\mathbf{x} \right).$$

(8.35)

As it turns out, Equation 8.34 is a special case of “logarithmic opinion pooling,” when the opinions of only two experts are being pooled [13]. This means that CI is itself a special case of logarithmic opinion pooling, given that the opinions of two linear-Gaussian experts are being pooled. Julier and Uhlmann have described Equation 8.34 as an “XM model” for track fusion [14,15]. (It has also been given the name “Chernoff fusion” [16].) I will adopt their terminology in what follows, abbreviating it as “XM fusion.” (Julier has also suggested approximations for computing the XM fusion formula when the original distributions $f^0_{k+1|k+1}(\mathbf{x})$ and $f^1_{k+1|k+1}(\mathbf{x})$ are Gaussian mixtures [14].)

The XM fusion density has several appealing properties. First, and perhaps most importantly, Julier has shown that it is invariant with respect to double counting [17]. That is, suppose that the distributions $f^0_{k+1|k+1}(\mathbf{x})$ and $f^1_{k+1|k+1}(\mathbf{x})$ have double-counted information in the sense of Section 8.2.3. Then $f^\omega_{k+1|k+1}(\mathbf{x})$ incorporates the double-counted information only once, in the same sense as does Equation 8.20.

Second, for all 0 ≤ ω ≤ 1 [9]:

$$\min\{f^0_{k+1|k+1}(\mathbf{x}),\, f^1_{k+1|k+1}(\mathbf{x})\} \le f^\omega_{k+1|k+1}(\mathbf{x}) \quad (\text{all } \mathbf{x})$$

(8.36)

$$\min\{f^0_{k+1|k+1}(\mathbf{x}_0),\, f^1_{k+1|k+1}(\mathbf{x}_0)\} < f^\omega_{k+1|k+1}(\mathbf{x}_0) \quad (\text{for some } \mathbf{x}_0).$$

(8.37)

The first inequality indicates that $f^\omega_{k+1|k+1}(\mathbf{x})$ does not reduce information (as compared to the original distributions), whereas the second one indicates that it can also increase it.

In Ref. [3], I proposed the following as the most theoretically reasonable procedure for optimizing ω:

$$\hat\omega = \arg\sup_{\omega}\, \sup_{\mathbf{x}}\, f^\omega_{k+1|k+1}(\mathbf{x}),$$

(8.38)

in which case $f^\omega_{k+1|k+1}(\mathbf{x})$ with $\omega = \hat\omega$ yields the best choice of merged track. That is, the optimal value of ω is the one that results in the largest MAP estimate. (Note that Equation 8.38 can be approximated by computing the covariance matrix $P^\omega_{k|k}$ of $f^\omega_{k+1|k+1}(\mathbf{x})$ and minimizing its determinant or trace, as originally proposed by Uhlmann and Julier [8–10].)

Julier has proposed [14] that, rather than Equation 8.38, a more theoretically principled optimization procedure would be to choose ω as the maximizing value in Equation 8.35:

$$\omega = \arg\inf_{\omega} \int f^0_{k+1|k+1}(\mathbf{x})^{1-\omega}\; f^1_{k+1|k+1}(\mathbf{x})^{\omega}\, d\mathbf{x}.$$

(8.39)

This has the effect of minimizing the degree of overlap between the distributions $f^0_{k+1|k+1}(\mathbf{x})^{1-\omega}$ and $f^1_{k+1|k+1}(\mathbf{x})^{\omega}$. His reasoning is as follows. First, ω reflects the information contained in the distribution $f^\omega_{k+1|k+1}(\mathbf{x})$ as an entirety—rather than just the information contained at a single point, the MAP estimate. Second, $f^\omega_{k+1|k+1}(\mathbf{x})$ can be shown to be equally distant from $f^0_{k+1|k+1}(\mathbf{x})$ and $f^1_{k+1|k+1}(\mathbf{x})$ in a Kullback–Leibler information-theoretic sense.

Nevertheless, I argue that Equation 8.38 is a more justifiable theoretical choice, for two reasons:

1.  In target tracking, a track distribution $f_{k|k}(\mathbf{x})$ is of little interest unless one can extract from it an accurate estimate of target state. Using the entire distribution $f_{k|k}(\mathbf{x})$ for this purpose is typically a bad idea. For example, in practical application, most of the modes of $f_{k|k}(\mathbf{x})$ will be minor modes caused by clutter returns, along with (if SNR is large enough) a single larger target-associated mode. Thus an estimator that employs all of $f_{k|k}(\mathbf{x})$—the expected value $\bar{\mathbf{x}}_{k|k}$ of $f_{k|k}(\mathbf{x})$, for example—can produce unstable and very inaccurate estimates. The MAP estimator, Equation 8.7, is usually more appropriate for practical application, since it tends to produce more stable and accurate state estimates.

2.  Abstract information-theoretic distances should be treated with caution when isolated from physical intuition. There is a literal infinitude of information-based distance concepts—most obviously, the Csiszár-divergence family [18,19]

$$K_c(f^1_{k+1|k+1}; f^0_{k+1|k+1}) = \int f^0_{k+1|k+1}(\mathbf{x})\; c\!\left(\frac{f^1_{k+1|k+1}(\mathbf{x})}{f^0_{k+1|k+1}(\mathbf{x})}\right) d\mathbf{x}$$

(8.40)

and its multitarget generalizations [20], where c(x) is some nonnegative convex function. For example, choose the convex kernel c(x) to be $c_\omega(x) = (1-\omega)x + \omega - x^\omega$. Then Chernoff information can be expressed in terms of $K_{c_\omega}$, which is

$$K_{c_\omega}(f^1_{k+1|k+1}; f^0_{k+1|k+1}) = 1 - \int f^0_{k+1|k+1}(\mathbf{x})^{1-\omega}\; f^1_{k+1|k+1}(\mathbf{x})^{\omega}\, d\mathbf{x}.$$

(8.41)

In addition to these, there are many distance metrics on probability distributions, such as Wasserstein distance. Which of these is “best,” why is it best, and what might its physical interpretation be?

The reasoning behind Equation 8.38, by way of contrast, arises inherently from the practical goal of achieving the most accurate and stable state estimates possible. For each ω, consider the following statements about $f^\omega_{k+1|k+1}(\mathbf{x})$:

1.  It is the distribution of the merged track.

2.  The MAP estimate for this track is $\mathbf{x}^\omega_{k+1|k+1} = \arg\sup_{\mathbf{x}} f^\omega_{k+1|k+1}(\mathbf{x})$.

3.  The larger the value of $\sup_{\mathbf{x}} f^\omega_{k+1|k+1}(\mathbf{x})$, the more probable—and therefore the more sharply localized—$\mathbf{x}^\omega_{k+1|k+1}$ will be.

4.  Thus one should choose the value $\hat\omega$ of ω that corresponds to the most-probable (best-localized) MAP estimate.

The necessity of this line of reasoning will become apparent when I propose multitarget generalizations of XM fusion later in the chapter. In that situation, concepts such as covariance or trace can no longer even be defined. Concepts such as Chernoff information and Csiszár discrimination can still be defined, but their physical meaning is even less evident than in the single-target case. The primary difficulty is a practical one, namely that in multitarget problems the computability of Equation 8.38 will be questionable. Thus computational tractability will usually be the primary motivation for choosing information-theoretic or other optimization approaches in preference to Equation 8.38.
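
As a concrete illustration of Equations 8.34 and 8.38, the sketch below performs XM fusion of two one-dimensional track densities on a grid and selects ω by a direct search for the largest MAP peak. The Gaussian-mixture inputs and all numbers are illustrative assumptions; swapping the selection criterion for the integral in Equation 8.39 gives Julier's Chernoff-based choice instead.

```python
import numpy as np

# XM fusion (Equation 8.34) on a 1-D grid, with omega chosen by
# Equation 8.38 (the value giving the largest MAP peak).
x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]

def gauss(u, s):
    return np.exp(-0.5 * (u / s) ** 2) / (np.sqrt(2.0 * np.pi) * s)

# Illustrative source densities f^0_{k+1|k+1} and f^1_{k+1|k+1}:
f0 = 0.7 * gauss(x - 1.0, 1.0) + 0.3 * gauss(x + 4.0, 1.5)
f1 = 0.8 * gauss(x - 1.5, 0.8) + 0.2 * gauss(x + 5.0, 2.0)

def xm(w, eps=1e-300):
    g = np.maximum(f0, eps) ** (1.0 - w) * np.maximum(f1, eps) ** w
    return g / (np.sum(g) * dx)               # Equation 8.34

omegas = np.linspace(0.0, 1.0, 101)
w_hat = omegas[int(np.argmax([np.max(xm(w)) for w in omegas]))]
f_fused = xm(w_hat)
print("omega-hat:", w_hat, " MAP estimate:", x[np.argmax(f_fused)])
```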

8.3    FINITE-SET STATISTICS: REVIEW

In this section, I briefly review basic elements of finite-set statistics (FISST) [7,21,22] that are required for the material that follows:

1.  Section 8.3.1: The multisensor-multitarget recursive Bayes filter. This is the foundation for the approach to T2F that will be introduced shortly.

2.  Section 8.3.2: A brief summary of the basic elements of the FISST differential and integral multitarget calculus, including Poisson processes and i.i.d.c. processes.

3.  Section 8.3.3: The PHD filter. This is the first computational approximation of the multitarget Bayes filter.

4.  Section 8.3.4: The CPHD filter. This is the second computational approximation of the multitarget Bayes filter.

5.  Section 8.3.5: A brief summary of significant recent advances involving PHD and CPHD filters.

8.3.1    MULTITARGET RECURSIVE BAYES FILTER

My approach to multisource-multitarget T2F is based on the multisensor-multitarget recursive Bayes filter [7, chapter 14]. Let Z(k): Z1,…,Zk be a time-sequence of multisensor-multitarget measurement-sets Zi collected at times t1,…,tk. That is, each Zi consists of the measurements collected by all available sensors at or near time-step i. They can have the form $Z_i = \emptyset$ (no measurements collected); $Z_i = \{\mathbf{z}_1\}$ (one measurement $\mathbf{z}_1$ collected); $Z_i = \{\mathbf{z}_1, \mathbf{z}_2\}$ (two measurements $\mathbf{z}_1, \mathbf{z}_2$ collected); and so on. Given this, the multitarget Bayes filter propagates a multitarget posterior distribution $f_{k|k}(X|Z^{(k)})$ through time:

$$f_{k|k}(X|Z^{(k)}) \xrightarrow{\ \text{predictor}\ } f_{k+1|k}(X|Z^{(k)}) \xrightarrow{\ \text{corrector}\ } f_{k+1|k+1}(X|Z^{(k+1)})$$

(8.42)

Here, X is the multitarget state-set—i.e., $X = \emptyset$ if no targets are present, $X = \{\mathbf{x}_1\}$ if a single target with state $\mathbf{x}_1$ is present, $X = \{\mathbf{x}_1, \mathbf{x}_2\}$ if two targets with states $\mathbf{x}_1, \mathbf{x}_2$ are present, etc. The “cardinality distribution”

$$p_{k+1|k+1}(n|Z^{(k+1)}) = \int_{|X|=n} f_{k+1|k+1}(X|Z^{(k+1)})\, \delta X$$

(8.43)

defines the posterior probability that the multitarget scene contains n targets, where $\int \cdot\; \delta X$ indicates the multitarget “set integral” defined in Section 8.3.2.

The multitarget Bayes filter presumes the existence of multitarget motion and measurement models, for example:

$$\Xi_{k+1|k} = S_k(X) \cup B_k, \qquad \Sigma_{k+1} = T_{k+1}(X) \cup C_{k+1}$$

(8.44)

where

Sk(X) is the random finite subset (RFS) of persisting targets

Bk is the RFS of appearing targets

Tk+1(X) is the RFS of target-generated measurements

Ck+1 is the RFS of clutter measurements

Given these models, using multitarget calculus (Section 8.3.2) one can construct a multitarget Markov transition density and a multitarget likelihood function

$$f_{k+1|k}(X|X'), \qquad f_{k+1}(Z|X)$$

(8.45)

(see Chapters 12 and 13 of [7]). Because of this systematic specification of models, at any given time-step the distribution $f_{k|k}(X|Z^{(k)})$ encapsulates all relevant information regarding the presumed strengths and weaknesses of the targets, and the known strengths and weaknesses of the sensors.

The multitarget Bayes filter is defined by the predictor and corrector equations

$$f_{k+1|k}(X|Z^{(k)}) = \int f_{k+1|k}(X|X')\, f_{k|k}(X'|Z^{(k)})\, \delta X'$$

(8.46)

$$f_{k+1|k+1}(X|Z^{(k+1)}) = \frac{f_{k+1}(Z_{k+1}|X)\, f_{k+1|k}(X|Z^{(k)})}{f_{k+1}(Z_{k+1}|Z^{(k)})}$$

(8.47)

where

$$f_{k+1}(Z_{k+1}|Z^{(k)}) = \int f_{k+1}(Z_{k+1}|X)\, f_{k+1|k}(X|Z^{(k)})\, \delta X.$$

(8.48)

In what follows, I will abbreviate, for all k ≥ 0,

$$f_{k|k}(X) \overset{\mathrm{abbr}}{=} f_{k|k}(X|Z^{(k)})$$

(8.49)

$$f_{k+1|k}(X) \overset{\mathrm{abbr}}{=} f_{k+1|k}(X|Z^{(k)}).$$

(8.50)

Information of interest—number of targets, the positions, velocities, and types of the targets, etc.—can be jointly extracted from fk | k(X | Z(k)) using a Bayes-optimal multitarget state estimator (see Section 14.5 of [7]). For example, the joint multitarget (JoM) estimator is defined by

$$X^{\mathrm{JoM}}_{k+1|k+1} = \arg\sup_{X}\, f_{k+1|k+1}(X|Z^{(k+1)})\, \frac{c^{|X|}}{|X|!}$$

(8.51)

where c is a fixed constant which has the same units of measurement as the single-target state x.

Remark 2: Generally speaking, c should be approximately equal to the accuracy to which the state is to be estimated, as long as the following inequality is satisfied [7, p. 500]: $f_{k+1|k+1}(X|Z^{(k+1)})\, c^{\hat{n}} \le 1$ for all X, where $\hat{n}$ is the MAP estimate derived from the cardinality distribution.

8.3.2    MULTITARGET CALCULUS

The finite-set statistics multitarget integral-differential calculus is central to the approach that I advocate. Functional derivatives and set derivatives [7, chapter 11] are key to the construction of “true” multitarget Markov densities and multitarget likelihood functions. They are also key to the construction of principled approximations of the multitarget Bayes filter, such as the PHD and CPHD filters.

A set integral accounts for random variability in target number as well as in target state. Let f(X) be a multitarget probability distribution. Then its set integral has the form

$$\int f(X)\, \delta X = f(\emptyset) + \sum_{n=1}^{\infty} \frac{1}{n!} \int f(\{\mathbf{x}_1,\ldots,\mathbf{x}_n\})\, d\mathbf{x}_1 \cdots d\mathbf{x}_n.$$

(8.52)

Let F[h] be any functional—i.e., a scalar-valued function whose argument h is a function h(x). Then the functional derivative of F with respect to any finite set X = {x1,…,xn} with |X| = n ≥ 0 is given by

$$\frac{\delta F}{\delta X}[h] = \frac{\delta}{\delta \mathbf{x}_1} \cdots \frac{\delta}{\delta \mathbf{x}_n} F[h]$$

(8.53)

$$\frac{\delta}{\delta \mathbf{x}} F[h] = \lim_{\varepsilon \searrow 0} \frac{F[h + \varepsilon\, \delta_{\mathbf{x}}] - F[h]}{\varepsilon}$$

(8.54)

where $\delta_{\mathbf{x}}(\mathbf{x}')$ denotes the Dirac delta function concentrated at $\mathbf{x}$. Functional derivatives and set integrals are inverse operations, in the sense that

$$F[h] = \int h^X\, \frac{\delta F}{\delta X}[0]\, \delta X$$

(8.55)

$$\left[ \frac{\delta}{\delta X} \int h^Y f(Y)\, \delta Y \right]_{h=0} = f(X).$$

(8.56)

Here, for any function h(x),

$$h^X = \begin{cases} 1 & \text{if } X = \emptyset \\ \prod_{\mathbf{x} \in X} h(\mathbf{x}) & \text{otherwise}. \end{cases}$$

(8.57)
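
As a simple worked example of Equations 8.54 through 8.56, take the linear functional $F[h] = \int h(\mathbf{y})\, f(\mathbf{y})\, d\mathbf{y}$ for a fixed density f. Then

$$\frac{\delta}{\delta \mathbf{x}} F[h] = \lim_{\varepsilon \searrow 0} \frac{F[h + \varepsilon\, \delta_{\mathbf{x}}] - F[h]}{\varepsilon} = \int \delta_{\mathbf{x}}(\mathbf{y})\, f(\mathbf{y})\, d\mathbf{y} = f(\mathbf{x}),$$

and every higher-order derivative vanishes. Consequently $\frac{\delta F}{\delta \emptyset}[0] = F[0] = 0$, $\frac{\delta F}{\delta \{\mathbf{x}\}}[0] = f(\mathbf{x})$, and $\frac{\delta F}{\delta X}[0] = 0$ whenever $|X| \ge 2$. Substituting these into Equation 8.55 and expanding the set integral via Equation 8.52 recovers $F[h] = \int h(\mathbf{x})\, f(\mathbf{x})\, d\mathbf{x}$, as it must.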

In this chapter, we will require frequent use of two special multitarget processes. Suppose that f(X) is a multitarget probability distribution. Then it is the distribution of

•  A Poisson process (Poisson RFS) if

$$f(X) = e^{-N} D^X$$

(8.58)

where

$N = \int D(\mathbf{x})\, d\mathbf{x}$

D(x) is the PHD, or “intensity function,” of the process

•  An independent, identically distributed cluster (i.i.d.c.) process (i.i.d.c. RFS) if

$$f(X) = |X|!\; p(|X|)\; s^X$$

(8.59)

where

s(x) is the spatial density

p(n) is the cardinality distribution of the process

Equation 8.58 is the special case of Equation 8.59 with $p(n) = e^{-N} N^n / n!$ and $s(\mathbf{x}) = N^{-1} D(\mathbf{x})$.

As an example, one can verify that Equation 8.58 defines a multitarget probability distribution:

$$\int f(X)\, \delta X = e^{-N} D^{\emptyset} + e^{-N} \sum_{n=1}^{\infty} \frac{1}{n!} \int D(\mathbf{x}_1) \cdots D(\mathbf{x}_n)\, d\mathbf{x}_1 \cdots d\mathbf{x}_n$$

(8.60)

$$= e^{-N} + e^{-N} \sum_{n=1}^{\infty} \frac{N^n}{n!} = e^{-N} e^{N} = 1.$$

(8.61)

Likewise, Equation 8.59 defines a multitarget probability distribution:

$$\int f(X)\, \delta X = 0!\; p(0)\; s^{\emptyset} + \sum_{n=1}^{\infty} \frac{n!\; p(n)}{n!} \int s(\mathbf{x}_1) \cdots s(\mathbf{x}_n)\, d\mathbf{x}_1 \cdots d\mathbf{x}_n$$

(8.62)

$$= p(0) + \sum_{n=1}^{\infty} p(n) = 1.$$

(8.63)
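
The computation in Equations 8.60 and 8.61 can also be checked numerically: each n-fold integral of $D(\mathbf{x}_1) \cdots D(\mathbf{x}_n)$ factors into $N^n$, so the set integral collapses to the exponential series. A minimal sketch, with an illustrative Gaussian-shaped PHD and a truncated cardinality sum:

```python
from math import exp, factorial
import numpy as np

# Numerical check of Equations 8.60 and 8.61: the set integral of a
# Poisson multitarget density f(X) = e^{-N} D^X equals 1.
x = np.linspace(-10.0, 10.0, 1001)
dx = x[1] - x[0]
N = 2.3                                               # expected target number
D = N * np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)  # illustrative PHD D(x)

I = np.sum(D) * dx   # integral of D(x); each n-fold integral equals I**n
total = exp(-N) * sum(I ** n / factorial(n) for n in range(60))
print(total)         # approximately 1, reproducing Equation 8.61
```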

8.3.3    PHD FILTER

Constant-gain Kalman filters—the alpha-beta filter, for example—provide the most computationally tractable approximation of the single-target Bayes filter. A constant-gain Kalman filter propagates the first statistical moment (the posterior expectation) $\hat{\mathbf{x}}_{k|k}$ in place of $f_{k|k}(\mathbf{x}|Z^k)$, using alternating predictor steps $\hat{\mathbf{x}}_{k|k} \to \hat{\mathbf{x}}_{k+1|k}$ and corrector steps $\hat{\mathbf{x}}_{k+1|k} \to \hat{\mathbf{x}}_{k+1|k+1}$.

The PHD filter mimics this basic idea, but at a more abstract, statistical level [7, Chapter 16] [23]. It propagates a first-order multitarget moment of the multitarget posterior fk | k(X | Z(k)) instead of fk | k(X | Z(k)) itself:

$$D_{k|k}(\mathbf{x}|Z^{(k)}) \xrightarrow{\ \text{predictor}\ } D_{k+1|k}(\mathbf{x}|Z^{(k)}) \xrightarrow{\ \text{corrector}\ } D_{k+1|k+1}(\mathbf{x}|Z^{(k+1)})$$

(8.64)

This moment, the PHD, is the density function on single-target states x defined by

$$D_{k|k}(\mathbf{x}) \overset{\mathrm{abbr}}{=} D_{k|k}(\mathbf{x}|Z^{(k)}) = \int f_{k|k}(X \cup \{\mathbf{x}\}\,|\,Z^{(k)})\, \delta X.$$

(8.65)

It is not a probability density, since its integral is in general not 1. Rather, $N_{k|k} = \int D_{k|k}(\mathbf{x})\, d\mathbf{x}$ is the total expected number of targets in the scenario. Intuitively speaking, $D_{k|k}(\mathbf{x})$ is the track density at $\mathbf{x}$. The peaks of $D_{k|k}(\mathbf{x})$ are approximately at the locations of the most likely target states. So, one way of estimating the number $\hat{n}$ and states $\hat{\mathbf{x}}_1,\ldots,\hat{\mathbf{x}}_{\hat{n}}$ of the tracks is to take $\hat{n}$ to be the integer nearest to $N_{k|k}$ and then determine the $\hat{n}$ highest peaks of $D_{k|k}(\mathbf{x})$.

The PHD can be propagated through time using the following predictor (time-update) and corrector (data-update) equations. Neglecting the spawning of targets by other targets, these are

$$D_{k+1|k}(\mathbf{x}) = N^B_{k+1|k}\, s^B_{k+1|k}(\mathbf{x}) + \int p_S(\mathbf{x}')\, f_{k+1|k}(\mathbf{x}|\mathbf{x}')\, D_{k|k}(\mathbf{x}')\, d\mathbf{x}'$$

(8.66)

$$\frac{D_{k+1|k+1}(\mathbf{x})}{D_{k+1|k}(\mathbf{x})} = 1 - p_D(\mathbf{x}) + \sum_{\mathbf{z} \in Z_{k+1}} \frac{p_D(\mathbf{x})\, L_{\mathbf{z}}(\mathbf{x})}{\lambda_{k+1}\, c_{k+1}(\mathbf{z}) + \tau_{k+1}(\mathbf{z})}.$$

(8.67)

Here,

•  $N^B_{k+1|k}$ is the expected number, and $s^B_{k+1|k}(\mathbf{x})$ the spatial distribution, of newly appearing targets.

•  $p_S(\mathbf{x}') \overset{\mathrm{abbr}}{=} p_{S,k+1|k}(\mathbf{x}')$ is the probability that a target with state $\mathbf{x}'$ at time-step k will survive into time-step k + 1.

•  $f_{k+1|k}(\mathbf{x}|\mathbf{x}')$ is the single-target Markov transition density.

•  $p_D(\mathbf{x}) \overset{\mathrm{abbr}}{=} p_{D,k+1}(\mathbf{x})$ is the probability that a target with state $\mathbf{x}$ at time-step k + 1 will generate a measurement.

•  $L_{\mathbf{z}}(\mathbf{x}) \overset{\mathrm{abbr}}{=} f_{k+1}(\mathbf{z}|\mathbf{x})$ is the single-target likelihood function.

•  $\lambda_{k+1}$ is the clutter rate and $c_{k+1}(\mathbf{z})$ is the spatial distribution of the Poisson clutter process, where

$$\tau_{k+1}(\mathbf{z}) = \int p_D(\mathbf{x})\, L_{\mathbf{z}}(\mathbf{x})\, D_{k+1|k}(\mathbf{x})\, d\mathbf{x}.$$

(8.68)

One can get an intuitive understanding of how the PHD filter works by noticing that the measurement-updated expected number of targets is

$$N_{k+1|k+1} = \int D_{k+1|k+1}(\mathbf{x})\, d\mathbf{x} = N^{ND}_{k+1|k+1} + \sum_{\mathbf{z} \in Z_{k+1}} N^{D}_{k+1|k+1}(\mathbf{z})$$

(8.69)

where

$$N^{ND}_{k+1|k+1} = \int (1 - p_D(\mathbf{x}))\, D_{k+1|k}(\mathbf{x})\, d\mathbf{x}$$

(8.70)

$$N^{D}_{k+1|k+1}(\mathbf{z}) = \frac{\tau_{k+1}(\mathbf{z})}{\lambda_{k+1}\, c_{k+1}(\mathbf{z}) + \tau_{k+1}(\mathbf{z})} \le 1.$$

(8.71)

The nondetection term $N^{ND}_{k+1|k+1}$ is an estimate of the number of targets that have not been detected. The detection ratio $N^{D}_{k+1|k+1}(\mathbf{z})$ assesses whether $\mathbf{z}$ originated with clutter or with a target. If $N^{D}_{k+1|k+1}(\mathbf{z}) > 1/2$—that is, if $\tau_{k+1}(\mathbf{z}) > \lambda_{k+1} c_{k+1}(\mathbf{z})$—then $\mathbf{z}$ is “target-like.” If $N^{D}_{k+1|k+1}(\mathbf{z}) < 1/2$, then it is “clutter-like.”

The derivation of Equation 8.67 requires the following simplifying assumption: the predicted target process is approximately Poisson. As is evident from Equation 8.67, the PHD filter does not require explicit measurement-to-track association. It has computational order O(mn), where m is the current number of measurements and n is the current number of targets. It tends to produce inaccurate (high variance) instantaneous estimates Nk | k of target number. Thus it is typically necessary to average Nk | k over some time window.

The PHD filter can be implemented using either sequential Monte Carlo (SMC, a.k.a. particle-system) approximation or Gaussian-mixture approximation. In the first case, it is called a “particle-PHD filter” and in the second case a “GM-PHD filter” (see Chapter 16 of [7] and [45–47]).
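
As a concrete illustration of Equations 8.67 and 8.68, the sketch below performs a single PHD-filter measurement update on a one-dimensional grid. The constant $p_D$, Gaussian likelihood, uniform Poisson clutter, and all numbers are illustrative assumptions.

```python
import numpy as np

# One PHD measurement update (Equation 8.67) on a 1-D grid.
x = np.linspace(0.0, 100.0, 1001)
dx = x[1] - x[0]
D_pred = np.full_like(x, 2.0 / 100.0)      # predicted PHD: 2 expected targets
pD, r = 0.9, 1.0                           # detection probability, meas. noise std
lam, c = 5.0, 1.0 / 100.0                  # clutter rate, uniform clutter density
Z = [25.0, 60.0, 61.5]                     # measurement-set Z_{k+1}

def L(z):                                  # single-target likelihood f_{k+1}(z|x)
    return np.exp(-0.5 * ((z - x) / r) ** 2) / (np.sqrt(2.0 * np.pi) * r)

D_upd = (1.0 - pD) * D_pred                # nondetection term of Equation 8.67
for z in Z:
    tau = np.sum(pD * L(z) * D_pred) * dx  # Equation 8.68
    D_upd += pD * L(z) * D_pred / (lam * c + tau)
    print(f"z = {z}: detection ratio = {tau / (lam * c + tau):.2f}")  # Equation 8.71

print("updated expected target number:", np.sum(D_upd) * dx)
```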

8.3.4    CPHD FILTER

The CPHD filter generalizes the PHD filter [7, chapter 16] [23]. It admits more general false alarm models (called “independent, identically distributed cluster” [i.i.d.c.] models) than the Poisson models assumed in the PHD filter. It propagates two things: a spatial distribution $s_{k|k}(\mathbf{x})$ and a cardinality distribution $p_{k|k}(n) \overset{\mathrm{abbr}}{=} p_{k|k}(n|Z^{(k)})$ on the target number n:

$$\left\{ \begin{array}{l} s_{k|k}(\mathbf{x}|Z^{(k)}) \\ p_{k|k}(n|Z^{(k)}) \end{array} \right. \xrightarrow{\ \text{predictor}\ } \left\{ \begin{array}{l} s_{k+1|k}(\mathbf{x}|Z^{(k)}) \\ p_{k+1|k}(n|Z^{(k)}) \end{array} \right. \xrightarrow{\ \text{corrector}\ } \left\{ \begin{array}{l} s_{k+1|k+1}(\mathbf{x}|Z^{(k+1)}) \\ p_{k+1|k+1}(n|Z^{(k+1)}) \end{array} \right.$$

(8.72)

If $N_{k|k} = \sum_{n \ge 0} n\, p_{k|k}(n|Z^{(k)})$ is the expected number of targets, then $D_{k|k}(\mathbf{x}|Z^{(k)}) = N_{k|k}\, s_{k|k}(\mathbf{x}|Z^{(k)})$ is the corresponding PHD. Or, equivalently, $s_{k|k}(\mathbf{x}|Z^{(k)}) = N_{k|k}^{-1}\, D_{k|k}(\mathbf{x}|Z^{(k)})$.

CPHD Filter Time-Update Equations. The predictor equations for the CPHD filter are

$$D_{k+1|k}(\mathbf{x}) = b_{k+1|k}(\mathbf{x}) + \int p_S(\mathbf{x}')\, f_{k+1|k}(\mathbf{x}|\mathbf{x}')\, D_{k|k}(\mathbf{x}')\, d\mathbf{x}'$$

(8.73)

$$p_{k+1|k}(n) = \sum_{n' \ge 0} p_{k+1|k}(n|n')\, p_{k|k}(n')$$

(8.74)

where $p^B_{k+1|k}(n)$ is the cardinality distribution of the birth process and where

$$p_{k+1|k}(n|n') = \sum_{j=0}^{n'} p^B_{k+1|k}(n - j)\; C_{n',j}\; \psi_k^{\,j}\; (1 - \psi_k)^{n'-j}$$

(8.75)

$$\psi_k = \int p_S(\mathbf{x})\, s_{k|k}(\mathbf{x})\, d\mathbf{x}$$

(8.76)

$$N_{k+1|k} = N^B_{k+1|k} + \int p_S(\mathbf{x})\, D_{k|k}(\mathbf{x})\, d\mathbf{x}$$

(8.77)

$$N^B_{k+1|k} = \int b_{k+1|k}(\mathbf{x})\, d\mathbf{x}$$

(8.78)

$$C_{n,j} = \begin{cases} \dfrac{n!}{j!\,(n-j)!} & \text{if } 0 \le j \le n \\ 0 & \text{otherwise}. \end{cases}$$

(8.79)

CPHD Filter Measurement-Update Equations. If $m = |Z_{k+1}|$, where $Z_{k+1} = \{\mathbf{z}_1,\ldots,\mathbf{z}_m\}$ is the newly collected measurement-set, then the corrector equations for the CPHD filter are

$$\frac{D_{k+1|k+1}(\mathbf{x})}{s_{k+1|k}(\mathbf{x})} = (1 - p_D(\mathbf{x}))\, E^{ND}_{k+1} + \sum_{\mathbf{z} \in Z_{k+1}} p_D(\mathbf{x})\, L_{\mathbf{z}}(\mathbf{x})\, E^{D}_{k+1}(\mathbf{z})$$

(8.80)

$$\frac{p_{k+1|k+1}(n)}{p_{k+1|k}(n)} = \frac{\displaystyle\sum_{j=0}^{\min\{m,n\}} (m-j)!\; p^{\kappa}_{k+1}(m-j)\; P_{n,j}\; \phi_k^{\,n-j}\; \sigma_j(Z_{k+1})}{\displaystyle\sum_{l=0}^{m} (m-l)!\; p^{\kappa}_{k+1}(m-l)\; \sigma_l(Z_{k+1})\; G^{(l)}_{k+1|k}(\phi_k)}$$

(8.81)

where

$$E^{ND}_{k+1} = \frac{\displaystyle\sum_{j=0}^{m} (m-j)!\; p^{\kappa}_{k+1}(m-j)\; \sigma_j(Z_{k+1})\; G^{(j+1)}_{k+1|k}(\phi_k)}{\displaystyle\sum_{l=0}^{m} (m-l)!\; p^{\kappa}_{k+1}(m-l)\; \sigma_l(Z_{k+1})\; G^{(l)}_{k+1|k}(\phi_k)}$$

(8.82)

$$E^{D}_{k+1}(\mathbf{z}) = \frac{1}{c_{k+1}(\mathbf{z})} \cdot \frac{\displaystyle\sum_{j=0}^{m-1} (m-j-1)!\; p^{\kappa}_{k+1}(m-j-1)\; \sigma_j(Z_{k+1} \setminus \{\mathbf{z}\})\; G^{(j+1)}_{k+1|k}(\phi_k)}{\displaystyle\sum_{l=0}^{m} (m-l)!\; p^{\kappa}_{k+1}(m-l)\; \sigma_l(Z_{k+1})\; G^{(l)}_{k+1|k}(\phi_k)}$$

(8.83)

and where

$$\sigma_j(Z_{k+1}) = \sigma_{m,j}\!\left( \frac{\tau_{k+1}(\mathbf{z}_1)}{c_{k+1}(\mathbf{z}_1)}, \ldots, \frac{\tau_{k+1}(\mathbf{z}_m)}{c_{k+1}(\mathbf{z}_m)} \right)$$

(8.84)

$$G^{(l)}_{k+1|k}(\phi_k) = \sum_{n \ge l} P_{n,l}\; p_{k+1|k}(n)\; \phi_k^{\,n-l}$$

(8.85)

$$G^{(j+1)}_{k+1|k}(\phi_k) = \sum_{n \ge j+1} P_{n,j+1}\; p_{k+1|k}(n)\; \phi_k^{\,n-j-1}$$

(8.86)

$$\phi_k = \int (1 - p_D(\mathbf{x}))\, s_{k+1|k}(\mathbf{x})\, d\mathbf{x}$$

(8.87)

$$\tau_{k+1}(\mathbf{z}) = \int p_D(\mathbf{x})\, L_{\mathbf{z}}(\mathbf{x})\, s_{k+1|k}(\mathbf{x})\, d\mathbf{x}$$

(8.88)

where $P_{n,i} = n!/(n-i)!$ is the permutation coefficient and $\sigma_{m,j}(x_1,\ldots,x_m) = \sum_{1 \le i_1 < \cdots < i_j \le m} x_{i_1} \cdots x_{i_j}$ is the elementary symmetric function of degree j in m variables.

The corrector equations for the CPHD filter require the following simplifying assumption: that the predicted target process is approximately an i.i.d.c. process. The CPHD filter has computational order O(m3n), though this can be reduced to O(m2n) using special numerical techniques.

The CPHD filter can be implemented using either particle approximation or Gaussian-mixture approximation. In the first case, it is called a “particle-CPHD filter” and in the second case a “GM-CPHD filter.”
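
The combinatorial core of the corrector is the computation of the elementary symmetric functions $\sigma_{m,j}$ appearing in Equation 8.84. They need not be enumerated over all $\binom{m}{j}$ subsets: expanding the polynomial $\prod_{i=1}^m (1 + x_i t) = \sum_j \sigma_{m,j}\, t^j$ yields all of them in $O(m^2)$ operations, as in the sketch below (the numerical arguments are illustrative).

```python
import numpy as np

def esf(xi):
    """All elementary symmetric functions sigma_{m,j}, j = 0,...,m, of the
    arguments xi, via the expansion prod_i (1 + xi_i t) = sum_j sigma_j t^j."""
    sigma = np.zeros(len(xi) + 1)
    sigma[0] = 1.0
    for v in xi:
        sigma[1:] += v * sigma[:-1]   # RHS is evaluated before the in-place update
    return sigma

# Illustrative arguments xi_i = tau_{k+1}(z_i) / c_{k+1}(z_i) for m = 3:
print(esf(np.array([0.4, 1.3, 0.7])))   # -> [1.0, 2.4, 1.71, 0.364]
```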

8.3.5    SIGNIFICANT RECENT DEVELOPMENTS

The theory and practice of random set filters has developed rapidly in recent years. In this section, I briefly summarize a few of the most recent advances:

1.  Track-before-detect filtering in pixelized images without preprocessing. Most multitarget tracking algorithms using pixelized image data rely on some kind of image preprocessing step to extract detection-type features: threshold detectors, edge detectors, blob detectors, etc. In Ref. [24], Vo, Vo, and Pham have demonstrated a computationally tractable multitarget detection and tracking algorithm that does not require such preprocessing. It is based on a suitable modification of the “multi-Bernoulli filter” introduced in Ref. [7, chapter 17] and then corrected and implemented in Ref. [25].

2.  Simultaneous localization and mapping (SLAM). When neither GPS nor terrain maps are available, a robotic platform must detect landmarks, use them to construct a terrain map on the fly, and simultaneously orient the platform with respect to that map. The current state-of-the-art in SLAM is the FastSLAM approach, which employs measurement-to-track association, in conjunction with heuristic procedures for clutter rejection and initiation and termination of landmarks. Mullane, Vo, Adams, and Vo have shown that a PHD filter-based SLAM filter significantly outperforms FastSLAM in regard to the accuracy of both platform trajectory estimation and landmark detection and localization [26,27]. Clark has devised an even faster and more accurate SLAM-PHD filter based on a cluster-process formulation [28].

3.  “Background agnostic” (BAG) CPHD filters. The “classical” CPHD filter relies on an a priori model λk+1,ck+1(z),pk+1κ(m) of the clutter process and on an a priori model pD(x) of the state-dependent probability of detection. In 2009, I initiated a study of PHD and CPHD filters that do not require a priori clutter models but, rather, are capable of estimating them, on the fly, directly from the measurements. In Refs. [29,30], the clutter process was assumed to be a finite superposition of Poisson clutter processes, each with an intensity function of the form κ(z) = λ · θc(z) with clutter rate 0 ≤ λ ≤ 1 and spatial distribution θc(z) parameterized by c. Unfortunately, the resulting PHD/CPHD filters are combinatorially complex. Subsequently, in Ref. [31], I derived computationally tractable versions of these CPHD filters. In this case, the clutter process is assumed to be an infinite superposition of Bernoulli clutter processes, each with an intensity function of the form κ(z) = λ · θc(z) with 0 ≤ λ ≤ 1. Then, in Ref. [32], I showed how to further extend these filters when both the clutter process and pD(x) are unknown. This filter has been implemented in certain special cases and shown to perform reasonably well under simulated conditions [33,34].

4.  “Background agnostic” multi-Bernoulli filters. Vo, Vo, Hoseinnezhad, and Mahler have generalized the just-mentioned approach to nonlinear situations, via a particle-filter implementation of a background-agnostic multi-Bernoulli filter [35–37].

5.  Principled, tractable multisensor CPHD/PHD filters. The PHD/CPHD filter measurement-update steps described in Sections 8.3.3 and 8.3.4 are inherently single-sensor formulas. What of the multisensor case? In practical application, the de facto approach has been to employ the “iterated corrector” approximation. That is, apply the measurement-update equations successively, once for each sensor. It is well known that this approach is not invariant to changes in the order of the sensors. Moreover, for the PHD filter (but apparently not for the CPHD filter) it turns out that the iterated-corrector approach leads to performance degradation when the probabilities of detection for the sensors are significantly different [38]. In Ref. [39], I introduced a new approximation that leads to principled, order-invariant, computationally tractable multisensor PHD and CPHD filters. Nagappa et al. have shown that this approximation outperforms the iterated-corrector approach and, for the PHD filter, is also a good approximation of the theoretically correct two-sensor PHD filter [40].

6.  Joint multisensor-multitarget tracking and sensor-bias estimation. Current multitarget detection and tracking algorithms presume that all sensors are spatially registered—i.e., that all sensor states are precisely specified with respect to some common coordinate system. In actuality, any particular sensor’s observations may be contaminated by spatial misregistration biases that may take translational, rotational, and other forms. In Ref. [41], I proposed an approach that leverages any unknown targets that may be in the scene, if there are enough of them present, to estimate the spatial biases of the sensors while simultaneously detecting and tracking the targets. Ristić and Clark have implemented a cluster-process variant of this approach for a specific kind of spatial misregistration, and found that it performs well [42].
