8

Toward a Theoretical Foundation for Distributed Fusion

Ronald Mahler

CONTENTS

8.1    Introduction

8.2    Single-Target Distributed Fusion: Review

8.2.1    Single-Target Bayes Filter

8.2.2    T2F with Independent Sources

8.2.3    T2F with Known Double-Counting

8.2.4    Covariance Intersection

8.2.5    Exponential Mixture Fusion

8.3    Finite-Set Statistics: Review

8.3.1    Multitarget Recursive Bayes Filter

8.3.2    Multitarget Calculus

8.3.3    PHD Filter

8.3.4    CPHD Filter

8.3.5    Significant Recent Developments

8.4    General Multitarget Distributed Fusion

8.4.1    Multitarget T2F of Independent Sources

8.4.2    Multitarget T2F with Known Double-Counting

8.4.3    Multitarget XM Fusion

8.5    CPHD/PHD Filter Distributed Fusion

8.5.1    CPHD Filter T2F of Independent Sources

8.5.2    PHD Filter T2F of Independent Sources

8.5.3    CPHD Filter T2F with Known Double-Counting

8.5.4    PHD Filter T2F with Known Double-Counting

8.5.5    CPHD Filter XM Fusion

8.5.6    PHD Filter XM Fusion

8.6    Computational Issues

8.6.1    Implementation: Exact T2F Formulas

8.6.1.1    Case: GM-PHD Tracks

8.6.1.2    Case: Particle-PHD Tracks

8.6.2    Implementation: XM T2F Formulas

8.6.2.1    Case: GM-PHD Tracks

8.6.2.2    Case: Particle-PHD Tracks

8.7    Mathematical Derivations

8.7.1    Proof: CPHD T2F Fusion—Independent Sources

8.7.2    Proof: PHD T2F Fusion—Independent Sources

8.7.3    Proof: CPHD Filter with Double-Counting

8.7.4    Proof: CPHD Filter XM Fusion

8.7.5    Proof: PHD Filter XM Fusion

8.7.6    Proof: PHD Filter Chernoff Information

8.7.7    Proof: XM Implementation

8.8    Conclusions

References

8.1    INTRODUCTION

Measurement-to-track fusion (MTF) refers to the process of collecting measurement data and then using it to improve the accuracy of the most recent estimates of the numbers and states of targets. Over the last two decades, both the theory and the practice of MTF have become increasingly mature. But, in parallel, another development has occurred: the increasing prevalence of physically dispersed sensors connected by communications networks, ad hoc or otherwise. One response to this development might be to try to apply MTF techniques to such situations. But because transmission links are often bandwidth-limited, it is often not possible to transmit raw measurements in a timely fashion, if at all. Consequently, emphasis has shifted to the transmission of track data and to track-to-track fusion, hereafter abbreviated as “T2F.” Most commonly, the term “track data” refers to target state estimates and their associated error-covariance matrices—as supplied, for example, by a radar equipped with an extended Kalman filter (EKF). T2F refers to the process of merging single- or multi-target track data from multiple sensor sources, with the aim of achieving more accurate localization, increased track continuity, and fewer false tracks.

T2F is fundamentally different from MTF. In particular, it cannot be addressed by processing tracks in the same way as measurements. Both MTF theory and practice are commonly based on two independence assumptions: first, that measurements are statistically independent from time-step to time-step; and second, that measurements generated by different sensor sources are statistically independent.

However, single-target track data is the output of some recursive filtering process, such as an EKF, and is therefore inherently time-correlated. If it is processed in the same way as measurements, spuriously optimistic target localization estimates will result. “Tracklet” approaches [1], such as inverse Kalman filters, decorrelate tracks so that they can be processed in the same way as measurements. However, such techniques cannot be effectively applied when targets are rapidly maneuvering, since decorrelation must be performed over some extended time-window.

Furthermore, multisource track data (like multisource measurement data) in distributed networks can be corrupted by “double counting” [2]. A simple example: data from node A is passed to nodes X and Y, which then pass it to node B. If node B processes this data as though it were independent, then spuriously optimistic target localization will again result. Many T2F solutions have been devised for networks with pre-specified topologies. But such methods will not be applicable to ad hoc networks. “Pedigree” techniques have been proposed to address this challenge, by having every node “stamp” the tracks with suitable metadata before passing them on. In a large network, however, accumulated metadata can eventually greatly exceed the size of the track data that it documents. This problem can be sidestepped through node-to-node querying methods—but at the cost of increased bandwidth requirements. (A more practical difficulty: the large number of legacy networks makes it unlikely that any pedigree convention will be accepted, standardized, and implemented across all or even some of them.)

In part because of such issues, T2F theory is probably as underdeveloped now as MTF theory was two or three decades ago. The goal of this chapter is to try to remedy this situation by proposing the elements of a general theoretical foundation for T2F, building on ideas that I first suggested in 2000 [3]. These ideas have recently been greatly refined, especially by Daniel Clark and his associates [4–6].

The methodology will be the same as that which I have previously applied to MTF and which has been described in Statistical Multisource-Multitarget Information Fusion [7]:

1.  Model an entire multisensor-multitarget system as a single, evolving stochastic process using the theory of random finite sets.

2.  Formulate an optimal solution to the problem at hand—typically in the form of some kind of multisource-multitarget recursive Bayes filter.

3.  Recognize that one way to accomplish this is to find an optimal solution to the corresponding single-sensor, single-target problem and then generalize it to the multisensor-multitarget case.

4.  Recognize that this optimal solution will almost always be computationally intractable, and thus that principled statistical approximations of it must be formulated.

The principled approximation methods that I have most frequently advocated are as follows:

1.  Probability hypothesis density (PHD) filters, in which the multitarget process is approximated as an evolving Poisson process [7, chapter 16].

2.  Cardinalized PHD (CPHD) filters, in which it is approximated as an evolving independent, identically distributed cluster (i.i.d.c.) process [7, chapter 16].

3.  Multi-Bernoulli filters, in which it is approximated as an evolving multi-Bernoulli process [7, chapter 17].

In what follows I will consider only the first two approximation methods, which will be applied to three successively more difficult multisource-multitarget track fusion challenges:

1.  Exact T2F of independent track sources.

2.  Exact T2F of track sources with known double-counting.

3.  Approximate T2F of track sources having unknown correlations, using multitarget generalizations of Uhlmann and Julier’s covariance intersection (CI) approach.

In each of these cases I proceed by formulating a general approach to multisource-multitarget T2F and then by deriving more computationally tractable approximations using CPHD and PHD filters in the manner proposed by Clark et al.

The chapter is organized as follows:

1.  Section 8.2: Review of single-target T2F theory.

2.  Section 8.3: Review of those aspects of finite-set statistics (FISST) required to understand the chapter.

3.  Section 8.4: Direct generalization of single-target T2F to multitarget T2F.

4.  Section 8.5: Approximation of this general approach using CPHD and PHD filters.

5.  Section 8.6: A discussion of possible implementation approaches.

6.  Section 8.7: Mathematical derivations.

7.  Section 8.8: Summary and conclusions.

8.2    SINGLE-TARGET DISTRIBUTED FUSION: REVIEW

In this section, I summarize some major aspects of single-target T2F that will be required for what follows:

1.  Section 8.2.1: The single-target recursive Bayes filter is the foundation of the material in this section. I summarize the basic elements of this filter and define the concept of a “track” in general.

2.  Section 8.2.2: Single-target T2F when the track sources are independent. Approach: the track-merging formula of Chong et al. and its special case, Bayes parallel combination.

3.  Section 8.2.3: Single-target T2F when the track sources are dependent because of known double-counting. Approach: the generalized track-merging formula of Chong et al.

4.  Section 8.2.4: Single-target T2F when the track sources are linear-Gaussian but their correlations are completely unknown. Approach: the CI method of Uhlmann and Julier.

5.  Section 8.2.5: Single-target T2F when the track sources are arbitrary and their correlations are completely unknown. Approach: Mahler’s generalized CI method, rechristened by Julier and Uhlmann as “exponential mixture” (XM) fusion.

8.2.1    SINGLE-TARGET BAYES FILTER

The approach in this section is based on the Bayesian theoretical foundation for single-target tracking, the single-target Bayes nonlinear filter (see Chapter 2 of [7]). This filter propagates a Bayes posterior distribution $f_{k|k}(\mathbf{x}|Z^k)$ through time

$$f_{k|k}(\mathbf{x}|Z^k) \xrightarrow{\ \text{predictor}\ } f_{k+1|k}(\mathbf{x}|Z^k) \xrightarrow{\ \text{corrector}\ } f_{k+1|k+1}(\mathbf{x}|Z^{k+1})$$

(8.1)

where

$\mathbf{x}$ is the single-target state-vector

$Z^k: \mathbf{z}_1,\ldots,\mathbf{z}_k$ is the time-sequence of measurements collected by the sensor at times $t_1,\ldots,t_k$

The Bayes filter presumes the existence of models for the sensor and for the interim target motion, for example the additive models

$$\mathbf{X}_{k+1|k} = \boldsymbol{\varphi}_k(\mathbf{x}) + \mathbf{W}_k, \qquad \mathbf{Z}_{k+1} = \boldsymbol{\eta}_{k+1}(\mathbf{x}) + \mathbf{V}_{k+1},$$

(8.2)

where (1) $\mathbf{x}$ is the target state, (2) the deterministic motion model $\boldsymbol{\varphi}_k(\mathbf{x})$ is a nonlinear function of $\mathbf{x}$, (3) $\mathbf{W}_k$ is a zero-mean random vector (the “plant noise”), (4) the deterministic measurement model $\boldsymbol{\eta}_{k+1}(\mathbf{x})$ is a nonlinear function of $\mathbf{x}$, and (5) $\mathbf{V}_{k+1}$ is a zero-mean random vector (the sensor measurement noise). Given these models one can construct a Markov transition density and a likelihood function. For the additive models, for example, these have the form

$$f_{k+1|k}(\mathbf{x}|\mathbf{x}') = f_{\mathbf{W}_k}(\mathbf{x} - \boldsymbol{\varphi}_k(\mathbf{x}')), \qquad f_{k+1}(\mathbf{z}|\mathbf{x}) = f_{\mathbf{V}_{k+1}}(\mathbf{z} - \boldsymbol{\eta}_{k+1}(\mathbf{x})).$$

(8.3)

The single-target recursive Bayes filter is defined by the time-update and measurement-update equations

$$f_{k+1|k}(\mathbf{x}|Z^k) = \int f_{k+1|k}(\mathbf{x}|\mathbf{x}')\, f_{k|k}(\mathbf{x}'|Z^k)\, d\mathbf{x}'$$

(8.4)

$$f_{k+1|k+1}(\mathbf{x}|Z^{k+1}) = \frac{f_{k+1}(\mathbf{z}_{k+1}|\mathbf{x})\, f_{k+1|k}(\mathbf{x}|Z^k)}{f_{k+1}(\mathbf{z}_{k+1}|Z^k)}$$

(8.5)

where the Bayes normalization factor is

$$f_{k+1}(\mathbf{z}_{k+1}|Z^k) = \int f_{k+1}(\mathbf{z}_{k+1}|\mathbf{x})\, f_{k+1|k}(\mathbf{x}|Z^k)\, d\mathbf{x}.$$

(8.6)

Information of interest—target position, velocity, type, etc.—can be extracted from $f_{k|k}(\mathbf{x}|Z^k)$ using a Bayes-optimal state estimator. The maximum a posteriori (MAP) estimator, for example, determines the most probable target state:

$$\mathbf{x}^{\mathrm{MAP}}_{k+1|k+1} = \arg\sup_{\mathbf{x}}\, f_{k+1|k+1}(\mathbf{x}|Z^{k+1}).$$

(8.7)

Multisensor, single-target MTF with independent sensors is accomplished by applying Equation 8.5 successively for each sensor. Suppose, for example, that there are s sensors. Their respective, simultaneously collected measurements $\mathbf{z}^1,\ldots,\mathbf{z}^s$ are mediated by likelihood functions $f^1_{k+1}(\mathbf{z}^1|\mathbf{x}),\ldots,f^s_{k+1}(\mathbf{z}^s|\mathbf{x})$. By applying Equation 8.5 first using $f^1_{k+1}(\mathbf{z}^1|\mathbf{x})$, then using $f^2_{k+1}(\mathbf{z}^2|\mathbf{x})$, and so on, the measurements $\mathbf{z}^1,\ldots,\mathbf{z}^s$ are not only fused, but differences in sensor noise, sensor geometry, sensor obscurations, etc., are taken into account. Equivalently, one can apply Equation 8.5 to the joint likelihood function

$$f_{k+1}(Z|\mathbf{x}) = f^1_{k+1}(\mathbf{z}^1|\mathbf{x}) \cdots f^s_{k+1}(\mathbf{z}^s|\mathbf{x})$$

(8.8)

where $Z = \{\mathbf{z}^1,\ldots,\mathbf{z}^s\}$ denotes the set of multisensor measurements.

When motion and measurement models are linear-Gaussian, the Bayes filter reduces to the Kalman filter. Likewise, the multisensor Bayes filter (for independent sensors) reduces to the multisensor Kalman filter. In either case, a “track” can mean any of the following: (1) an instantaneous state-estimate xk+1 | k+1, (2) xk+1 | k+1 together with its error covariance matrix Pk+1 | k+1, (3) a labeled time-sequence of state-estimates, or (4) a labeled time-sequence of state-estimates and error covariance matrices.

Remark 1: Since my goal is to develop a more general T2F theory, in what follows a “track” at a particular time-step k will refer to the entire distribution fk | k(x | Zk), rather than to the estimates xk | k or (xk | k, Pk | k) extracted from it. Also, for the sake of notational simplicity, I will typically suppress measurement-dependence and employ the abbreviation

$$f_{k|k}(\mathbf{x}) \overset{\mathrm{abbr}}{=} f_{k|k}(\mathbf{x}|Z^k).$$

(8.9)
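
To make the predictor-corrector cycle of Equations 8.4 through 8.6 concrete, the following is a minimal sketch of one cycle on a one-dimensional grid, so that the integrals become sums. The scalar random-walk motion model, the Gaussian likelihood, and all numerical values are illustrative assumptions, not part of the preceding text.

```python
import numpy as np

# One predictor-corrector cycle of the single-target Bayes filter
# (Equations 8.4 through 8.6) on a 1-D grid. Models are illustrative:
# random-walk motion with plant-noise std q, Gaussian likelihood with std r.
x = np.linspace(-10.0, 10.0, 401)           # state grid
dx = x[1] - x[0]

def gauss(u, sigma):
    return np.exp(-0.5 * (u / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

f_post = gauss(x, 2.0)                      # prior f_{k|k}(x), Gaussian here

# Time update (Equation 8.4): f_{k+1|k}(x) = int f_{k+1|k}(x|x') f_{k|k}(x') dx'
q = 1.0
markov = gauss(x[:, None] - x[None, :], q)  # Markov density f_{k+1|k}(x|x')
f_pred = markov @ f_post * dx

# Measurement update (Equations 8.5 and 8.6), with measurement z_{k+1} = 1.5:
z, r = 1.5, 1.0
lik = gauss(z - x, r)                       # likelihood f_{k+1}(z|x)
f_new = lik * f_pred
f_new /= np.sum(f_new) * dx                 # Bayes normalization (Equation 8.6)

print("MAP estimate (Equation 8.7):", x[np.argmax(f_new)])
```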

8.2.2    T2F WITH INDEPENDENT SOURCES

Suppose that a single target is being tracked and that s independent sources, relying on their own dedicated local sensors, provide track data about this target to a T2F site. The jth source's sensor suite collects a time-sequence $Z^{k,j}: Z^j_1,\ldots,Z^j_k$, where $Z^j_l$ denotes the set of measurements supplied by the jth source's sensors at time $t_l$. The source does not pass its measurements directly to the fusion site. Rather, it passes the following information:

•  Measurement-updated, single-target track data, in the form of posterior distributions $f^j_{k|k}(\mathbf{x}) \overset{\mathrm{abbr}}{=} f^j_{k|k}(\mathbf{x}|Z^{k,j})$

•  Time-updated, single-target track data, in the form of distributions $f^j_{k+1|k}(\mathbf{x}) \overset{\mathrm{abbr}}{=} f^j_{k+1|k}(\mathbf{x}|Z^{k,j})$

Let $f_{k|k}(\mathbf{x}) \overset{\mathrm{abbr}}{=} f_{k|k}(\mathbf{x}|Z^k)$ be the fusion node's determination of the target state, given the accumulated track data $Z^k$ supplied by all of the sensor sources. Then Chong et al. [2] noted that the fused data at time-step k + 1 is exactly specified by the following track-merging formula:

$$f_{k+1|k+1}(\mathbf{x}) = K^{-1}\, \frac{f^1_{k+1|k+1}(\mathbf{x})}{f^1_{k+1|k}(\mathbf{x})} \cdots \frac{f^s_{k+1|k+1}(\mathbf{x})}{f^s_{k+1|k}(\mathbf{x})}\, f_{k+1|k}(\mathbf{x})$$

(8.10)

where the normalization constant is

$$K = \int \frac{f^1_{k+1|k+1}(\mathbf{x})}{f^1_{k+1|k}(\mathbf{x})} \cdots \frac{f^s_{k+1|k+1}(\mathbf{x})}{f^s_{k+1|k}(\mathbf{x})}\, f_{k+1|k}(\mathbf{x})\, d\mathbf{x}.$$

(8.11)

This formula also applies to the asynchronous-sensor case. If each source has its own data rate, then the measurement-collection times t1,…,tk can be taken to refer to the arrival times of data from all of the sources, taken collectively. If at time tl only sl of the sources provide data, then Equation 8.10 is replaced by the corresponding formula for those sources only.

Equation 8.10 is an immediate consequence of Bayes' rule. Let $f_{k+1}(Z^j|\mathbf{x})$ be the joint likelihood function for the jth source's local sensors. Then

$$f_{k+1|k+1}(\mathbf{x}) \propto f_{k+1}(Z^1_{k+1}|\mathbf{x}) \cdots f_{k+1}(Z^s_{k+1}|\mathbf{x})\, f_{k+1|k}(\mathbf{x})$$

(8.12)

and thus Equation 8.10 follows from the fact that $f^j_{k+1|k+1}(\mathbf{x}) \propto f_{k+1}(Z^j_{k+1}|\mathbf{x})\, f^j_{k+1|k}(\mathbf{x})$ for all j = 1,…,s.

Suppose, now, that the sources do not pass on their time-updated track data $f^j_{k+1|k}(\mathbf{x})$ but, rather, only their measurement-updated track data $f^j_{k+1|k+1}(\mathbf{x})$. (This is what happens with radars equipped with EKFs, for example.) In this case, Equation 8.10 can no longer be constructed, and some approximation must be devised.

One approach is to presume that all of the sources employ identical target motion models. That is, the sources' Markov densities are identical to the fusion site's Markov density: $f^j_{k+1|k}(\mathbf{x}|\mathbf{x}') = f_{k+1|k}(\mathbf{x}|\mathbf{x}')$ for all j = 1,…,s. Under this assumption, the fusion site can itself construct time-updated track data for the sources, using the prediction integral

$$f^j_{k+1|k}(\mathbf{x}) = \int f_{k+1|k}(\mathbf{x}|\mathbf{x}')\, f^j_{k|k}(\mathbf{x}')\, d\mathbf{x}',$$

(8.13)

and then apply Equation 8.10.

A second but more restrictive approximation is also possible. It is based on the presumption that the sources' time-updated track data is identical to the fusion site's: $f^j_{k+1|k}(\mathbf{x}) = f_{k+1|k}(\mathbf{x})$ for all j = 1,…,s. In this case, Equation 8.10 reduces to

$$f_{k+1|k+1}(\mathbf{x}) \propto f^1_{k+1|k+1}(\mathbf{x}) \cdots f^s_{k+1|k+1}(\mathbf{x})\, f_{k+1|k}(\mathbf{x})^{1-s}.$$

(8.14)

This formula is known as “Bayes parallel combination” [7, p. 137].
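
For linear-Gaussian tracks the products and quotients in Equations 8.10 and 8.14 have closed forms: multiplying (dividing) Gaussian densities adds (subtracts) their information matrices $P^{-1}$ and information vectors $P^{-1}\mathbf{x}$. The sketch below implements Equation 8.10 under that assumption; the variable names and numbers are mine, and no guard is included against the fused information matrix losing positive-definiteness.

```python
import numpy as np

def merge_gaussian_tracks(updated, predicted, fusion_prior):
    """Track-merging formula (Equation 8.10) for Gaussian tracks.

    updated      : list of (x, P), measurement-updated tracks f^j_{k+1|k+1}
    predicted    : list of (x, P), time-updated tracks f^j_{k+1|k}
    fusion_prior : (x, P), the fusion node's prediction f_{k+1|k}
    Gaussian multiplication/division = addition/subtraction of
    information matrices P^{-1} and information vectors P^{-1} x.
    """
    x0, P0 = fusion_prior
    Y = np.linalg.inv(P0)       # running information matrix
    y = Y @ x0                  # running information vector
    for (xu, Pu), (xp, Pp) in zip(updated, predicted):
        Yu, Yp = np.linalg.inv(Pu), np.linalg.inv(Pp)
        Y += Yu - Yp            # multiply by f^j_{k+1|k+1}, divide by f^j_{k+1|k}
        y += Yu @ xu - Yp @ xp
    P = np.linalg.inv(Y)
    return P @ y, P             # fused estimate and covariance

# Two sources, scalar state (illustrative numbers):
upd  = [(np.array([1.0]), np.array([[0.5]])), (np.array([1.4]), np.array([[0.8]]))]
pred = [(np.array([0.8]), np.array([[2.0]])), (np.array([1.0]), np.array([[2.0]]))]
x_f, P_f = merge_gaussian_tracks(upd, pred, (np.array([0.9]), np.array([[2.0]])))
print(x_f, P_f)
```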

8.2.3    T2F WITH KNOWN DOUBLE-COUNTING

In the previous section, it was assumed that each data source is equipped with its own suite of dedicated sensors—that is, the sources share no sensors in common. Expressed with greater mathematical precision: let $Z^i_1,\ldots,Z^i_k$ be the time-sequence of measurement-sets for the ith source and let $Z^j_1,\ldots,Z^j_k$ be the time-sequence of measurement-sets for the jth source. Then $Z^i_l \cap Z^j_l = \emptyset$ whenever $i \ne j$, for all l = 1,…,k.

If, on the other hand, $Z^i_l \cap Z^j_l \ne \emptyset$, then the sources share at least some sensors and double-counting of measurements occurs. Chong et al. [2] generalized Equation 8.10 to this case—assuming that one knows, a priori, which sensors are being shared by which sources. Define $Z_{k+1} = Z^1_{k+1} \cup \cdots \cup Z^s_{k+1}$. Let

•  $Z^{12}_{k+1}$ be the measurements supplied to the second source that are not in $Z^1_{k+1}$

•  $Z^{13}_{k+1}$ be the measurements supplied to the third source that are not in $Z^1_{k+1} \cup Z^{12}_{k+1}$

•  $Z^{14}_{k+1}$ be the measurements supplied to the fourth source that are not in $Z^1_{k+1} \cup Z^{12}_{k+1} \cup Z^{13}_{k+1}$

and so on. Define

$$Z^{(j)}_{k+1} = Z^j_{k+1} \setminus Z^{1j}_{k+1}.$$

(8.15)

Then Equation 8.10 generalizes to

$$f_{k+1|k+1}(\mathbf{x}) \propto \frac{f^1_{k+1|k+1}(\mathbf{x})}{f^1_{k+1|k}(\mathbf{x})} \cdot \frac{f^2_{k+1|k+1}(\mathbf{x})}{f^2_{k+1|k}(\mathbf{x}|Z^{(2)})} \cdots \frac{f^s_{k+1|k+1}(\mathbf{x})}{f^s_{k+1|k}(\mathbf{x}|Z^{(s)})} \cdot f_{k+1|k}(\mathbf{x}).$$

(8.16)

If Equation 8.16 is to be applied, the jth source must know which sensors it shares with each of sources 1,…,j − 1, and must pass on $f^j_{k+1|k}(\mathbf{x}|Z^{(j)})$ in addition to $f^j_{k+1|k}(\mathbf{x})$. Clearly, as the number of sensors increases, the problem becomes more complex, in terms of both computational cost and communications requirements.

Equation 8.16 is, once again, an immediate consequence of Bayes’ rule:

$$f_{k+1|k+1}(\mathbf{x}) \propto f_{k+1}(Z^1_{k+1}|\mathbf{x})\, f_{k+1}(Z^{12}_{k+1}|\mathbf{x}) \cdots f_{k+1}(Z^{1s}_{k+1}|\mathbf{x})\, f_{k+1|k}(\mathbf{x})$$

(8.17)

$$= f_{k+1}(Z^1_{k+1}|\mathbf{x})\, \frac{f_{k+1}(Z^2_{k+1}|\mathbf{x})}{f_{k+1}(Z^{(2)}_{k+1}|\mathbf{x})} \cdots \frac{f_{k+1}(Z^s_{k+1}|\mathbf{x})}{f_{k+1}(Z^{(s)}_{k+1}|\mathbf{x})}\, f_{k+1|k}(\mathbf{x})$$

(8.18)

$$\propto \frac{f^1_{k+1|k+1}(\mathbf{x})}{f^1_{k+1|k}(\mathbf{x})} \cdot \frac{f^2_{k+1|k+1}(\mathbf{x})}{f^2_{k+1|k}(\mathbf{x}|Z^{(2)})} \cdots \frac{f^s_{k+1|k+1}(\mathbf{x})}{f^s_{k+1|k}(\mathbf{x}|Z^{(s)})} \cdot f_{k+1|k}(\mathbf{x}).$$

(8.19)

As an example, set s = 2 and suppose that $f_{k+1|k}(\mathbf{x}) = f^1_{k+1|k}(\mathbf{x})$. Then Equation 8.16 reduces to the following formula of Chong et al. [2]:

$$f_{k+1|k+1}(\mathbf{x}) \propto \frac{f^1_{k+1|k+1}(\mathbf{x})\, f^2_{k+1|k+1}(\mathbf{x})}{f^2_{k+1|k}(\mathbf{x}|Z^1_{k+1} \cap Z^2_{k+1})}$$

(8.20)

For in this case, $Z^{12}_{k+1} = Z^2_{k+1} \setminus (Z^1_{k+1} \cap Z^2_{k+1})$ and so $Z^{(2)}_{k+1} = Z^2_{k+1} \setminus Z^{12}_{k+1} = Z^1_{k+1} \cap Z^2_{k+1}$. Thus

$$f_{k+1|k+1}(\mathbf{x}) \propto \frac{f^1_{k+1|k+1}(\mathbf{x})}{f^1_{k+1|k}(\mathbf{x})} \cdot \frac{f^2_{k+1|k+1}(\mathbf{x})}{f^2_{k+1|k}(\mathbf{x}|Z^{(2)})} \cdot f_{k+1|k}(\mathbf{x})$$

(8.21)

$$= \frac{f^1_{k+1|k+1}(\mathbf{x})}{f^1_{k+1|k}(\mathbf{x})} \cdot \frac{f^2_{k+1|k+1}(\mathbf{x})}{f^2_{k+1|k}(\mathbf{x}|Z^1_{k+1} \cap Z^2_{k+1})} \cdot f_{k+1|k}(\mathbf{x})$$

(8.22)

$$= \frac{f^1_{k+1|k+1}(\mathbf{x})\, f^2_{k+1|k+1}(\mathbf{x})}{f^2_{k+1|k}(\mathbf{x}|Z^1_{k+1} \cap Z^2_{k+1})}.$$

(8.23)
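
In the linear-Gaussian case Equation 8.20 likewise reduces to information arithmetic: multiply the two measurement-updated tracks and divide out the track that carries the shared information $Z^1_{k+1} \cap Z^2_{k+1}$. A minimal sketch under that assumption (the names and numbers are mine):

```python
import numpy as np

def fuse_removing_common_info(track1, track2, common):
    """Equation 8.20 for Gaussian tracks: f proportional to f1 * f2 / f_common.

    track1, track2 : (x, P), the two measurement-updated source tracks
    common         : (x, P), the density conditioned on the shared
                     (double-counted) measurements Z^1 intersect Z^2
    Assumes the resulting information matrix stays positive-definite.
    """
    (x1, P1), (x2, P2), (xc, Pc) = track1, track2, common
    Y1, Y2, Yc = (np.linalg.inv(P) for P in (P1, P2, Pc))
    Y = Y1 + Y2 - Yc                 # divide out the common information
    y = Y1 @ x1 + Y2 @ x2 - Yc @ xc
    P = np.linalg.inv(Y)
    return P @ y, P

# Illustrative scalar example:
x_f, P_f = fuse_removing_common_info(
    (np.array([1.0]), np.array([[0.5]])),
    (np.array([1.2]), np.array([[0.6]])),
    (np.array([1.1]), np.array([[1.5]])))
print(x_f, P_f)
```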

8.2.4    COVARIANCE INTERSECTION

Sections 8.2.2 and 8.2.3 address situations in which enough a priori knowledge is available to make exact track merging possible. In general, however, this will not be the case—either because not enough a priori information is available or because, even if it is available, it cannot be effectively utilized. This situation is, in part, what the CI method of Uhlmann and Julier [8–10] is intended to address.

Suppose that a single target is being observed by two track sources. At time-step k, the first source provides a track $(\mathbf{x}^0_{k|k}, P^0_{k|k})$ and the second source provides a track $(\mathbf{x}^1_{k|k}, P^1_{k|k})$. CI is a method for merging $(\mathbf{x}^0_{k|k}, P^0_{k|k})$ and $(\mathbf{x}^1_{k|k}, P^1_{k|k})$ into a single track $(\mathbf{x}_{k|k}, P_{k|k})$ that is robust with respect to ambiguity. This means, in particular, that the uncertainty $P_{k|k}$ in $\mathbf{x}_{k|k}$ is neither too small (over-confidence) nor too large (under-confidence). Let 0 ≤ ω ≤ 1 and define $(\mathbf{x}^\omega_{k|k}, P^\omega_{k|k})$ by

$$(P^\omega_{k|k})^{-1} = (1-\omega)\,(P^0_{k|k})^{-1} + \omega\,(P^1_{k|k})^{-1}$$

(8.24)

$$(P^\omega_{k|k})^{-1}\,\mathbf{x}^\omega_{k|k} = (1-\omega)\,(P^0_{k|k})^{-1}\,\mathbf{x}^0_{k|k} + \omega\,(P^1_{k|k})^{-1}\,\mathbf{x}^1_{k|k}.$$

(8.25)

The matrix $P^\omega_{k|k}$ is positive-definite regardless of the value of ω, and $(\mathbf{x}^\omega_{k|k}, P^\omega_{k|k})$ reduces to $(\mathbf{x}^0_{k|k}, P^0_{k|k})$ when ω = 0 and to $(\mathbf{x}^1_{k|k}, P^1_{k|k})$ when ω = 1.

Suppose that

$$(\mathbf{x} - \mathbf{x}^0_{k|k})^T (P^0_{k|k})^{-1} (\mathbf{x} - \mathbf{x}^0_{k|k}) \le \sigma^2$$

(8.26)

$$(\mathbf{x} - \mathbf{x}^1_{k|k})^T (P^1_{k|k})^{-1} (\mathbf{x} - \mathbf{x}^1_{k|k}) \le \sigma^2$$

(8.27)

are the error hyper-ellipsoids of size σ associated with the tracks $(\mathbf{x}^0_{k|k}, P^0_{k|k})$ and $(\mathbf{x}^1_{k|k}, P^1_{k|k})$. Then it can be shown that, for any 0 ≤ ω ≤ 1 and any σ > 0, every $\mathbf{x}$ satisfying both inequalities also satisfies

$$(\mathbf{x} - \mathbf{x}^\omega_{k|k})^T (P^\omega_{k|k})^{-1} (\mathbf{x} - \mathbf{x}^\omega_{k|k}) \le \sigma^2.$$

(8.28)

That is, the error hyper-ellipsoid of the merged track always contains the intersection of the interiors of the error hyper-ellipsoids of the original tracks.

Intuitively speaking, ω should be chosen so that the hypervolume of the hyper-ellipsoid $(\mathbf{x} - \mathbf{x}^\omega_{k|k})^T (P^\omega_{k|k})^{-1} (\mathbf{x} - \mathbf{x}^\omega_{k|k}) = \sigma^2$ is as small as possible. That is, the merged hyper-ellipsoid should have the best possible fit to the intersection-region of the two original hyper-ellipsoids. Uhlmann and Julier proposed choosing $\omega = \hat\omega$ so as to minimize either the trace $\mathrm{tr}\, P^\omega_{k|k}$ or the determinant $\det P^\omega_{k|k}$. They demonstrated that this approach yields an approximation of the exact merged track that is unbiased and whose degree of uncertainty is not overstated.

Fränken and Hüpper [11] subsequently proposed a more computationally tractable “fast CI” approximation. Here, ω is chosen according to the formula

$$1 - \omega = \frac{\det\!\big((P^0_{k|k})^{-1} + (P^1_{k|k})^{-1}\big) - \det (P^1_{k|k})^{-1} + \det (P^0_{k|k})^{-1}}{2\,\det\!\big((P^0_{k|k})^{-1} + (P^1_{k|k})^{-1}\big)}.$$

(8.29)

These authors also proposed the following generalization. Consider the multisource CI problem defined by

$$(P^\omega_{k|k})^{-1} = \omega_1\,(P^1_{k|k})^{-1} + \cdots + \omega_n\,(P^n_{k|k})^{-1}$$

(8.30)

$$(P^\omega_{k|k})^{-1}\,\mathbf{x}^\omega_{k|k} = \omega_1\,(P^1_{k|k})^{-1}\,\mathbf{x}^1_{k|k} + \cdots + \omega_n\,(P^n_{k|k})^{-1}\,\mathbf{x}^n_{k|k}$$

(8.31)

with ω1 + ⋯ + ωn = 1. Then their proposed approximation is

$$\omega_i = \frac{\det P_{k|k}^{-1} - \det\!\big(P_{k|k}^{-1} - (P^i_{k|k})^{-1}\big) + \det (P^i_{k|k})^{-1}}{n\,\det P_{k|k}^{-1} + \sum_{j=1}^{n}\big[\det (P^j_{k|k})^{-1} - \det\!\big(P_{k|k}^{-1} - (P^j_{k|k})^{-1}\big)\big]}$$

(8.32)

where

$$P_{k|k}^{-1} = \sum_{i=1}^{n} (P^i_{k|k})^{-1}.$$

Much research has been devoted to determining the effectiveness of CI. The emerging consensus seems to be that CI tends to produce estimates of the fused track that are pessimistic. That is, the fused target-localizations are significantly worse than what one would get from an exact fused solution. This behavior is exactly what one would expect, given that, by design, CI must address worst-case situations in which to-be-fused tracks could be highly correlated.
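	
The following sketch fuses two Gaussian tracks via Equations 8.24 and 8.25, choosing ω with the fast-CI formula of Equation 8.29; replacing that closed form with a one-dimensional search minimizing $\det P^\omega_{k|k}$ or $\mathrm{tr}\, P^\omega_{k|k}$ recovers Uhlmann and Julier's original proposal. The numbers are illustrative.

```python
import numpy as np

def fast_ci(x0, P0, x1, P1):
    """Covariance intersection (Equations 8.24 and 8.25) with the
    fast-CI weight of Equation 8.29. A sketch only; production code
    would guard the inversions and determinants."""
    Y0, Y1 = np.linalg.inv(P0), np.linalg.inv(P1)
    dS, d0, d1 = (np.linalg.det(M) for M in (Y0 + Y1, Y0, Y1))
    w = 1.0 - (dS - d1 + d0) / (2.0 * dS)              # Equation 8.29
    Yw = (1.0 - w) * Y0 + w * Y1                       # Equation 8.24
    Pw = np.linalg.inv(Yw)
    xw = Pw @ ((1.0 - w) * (Y0 @ x0) + w * (Y1 @ x1))  # Equation 8.25
    return xw, Pw, w

# Illustrative 2-D tracks with complementary uncertainty directions:
x0, P0 = np.array([0.0, 0.0]), np.array([[4.0, 0.0], [0.0, 1.0]])
x1, P1 = np.array([1.0, 0.5]), np.array([[1.0, 0.0], [0.0, 4.0]])
xw, Pw, w = fast_ci(x0, P0, x1, P1)
print("omega =", w, "\nfused state:", xw, "\nfused covariance:\n", Pw)
```

Note that when the two covariances are equal, the fast-CI weight reduces to ω = 1/2, and as one source becomes far more informative the weight shifts toward it, consistent with the intuition above.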

8.2.5    EXPONENTIAL MIXTURE FUSION

The CI method addresses the merging of only linear-Gaussian track sources. How might it be generalized to more general sources? In 2000 [3], I observed that the following identity is true:

$$\frac{N_{P^0_{k|k}}(\mathbf{x} - \mathbf{x}^0_{k|k})^{1-\omega}\; N_{P^1_{k|k}}(\mathbf{x} - \mathbf{x}^1_{k|k})^{\omega}}{\int N_{P^0_{k|k}}(\mathbf{y} - \mathbf{x}^0_{k|k})^{1-\omega}\; N_{P^1_{k|k}}(\mathbf{y} - \mathbf{x}^1_{k|k})^{\omega}\, d\mathbf{y}} = N_{P^\omega_{k|k}}(\mathbf{x} - \mathbf{x}^\omega_{k|k})$$

(8.33)

where, in general, $N_{P_0}(\mathbf{x} - \mathbf{x}_0)$ denotes a multidimensional Gaussian distribution with mean $\mathbf{x}_0$ and covariance matrix $P_0$. That is, CI can be expressed entirely in terms of density functions rather than covariance matrices. I proposed, therefore, that the following definition be taken as the obvious generalization of the CI merging formula to arbitrary track sources:

$$f^\omega_{k+1|k+1}(\mathbf{x}) = \frac{f^0_{k+1|k+1}(\mathbf{x})^{1-\omega}\; f^1_{k+1|k+1}(\mathbf{x})^{\omega}}{\int f^0_{k+1|k+1}(\mathbf{y})^{1-\omega}\; f^1_{k+1|k+1}(\mathbf{y})^{\omega}\, d\mathbf{y}}.$$

(8.34)

Hurley independently proposed Equation 8.34 in 2002 [12]. He also justified its theoretical reasonableness on the basis of its similarity to Chernoff information, which is defined as follows:

$$C(f^1_{k+1|k+1}; f^0_{k+1|k+1}) = \sup_{0 \le \omega \le 1}\left( -\log \int f^0_{k+1|k+1}(\mathbf{x})^{1-\omega}\; f^1_{k+1|k+1}(\mathbf{x})^{\omega}\, d\mathbf{x} \right).$$

(8.35)

As it turns out, Equation 8.34 is a special case of “logarithmic opinion pooling,” when the opinions of only two experts are being pooled [13]. This means that CI is itself a special case of logarithmic opinion pooling, given that the opinions of two linear-Gaussian experts are being pooled. Julier and Uhlmann have described Equation 8.34 as an “XM model” for track fusion [14,15]. (It has also been given the name “Chernoff fusion” [16].) I will adopt their terminology in what follows, abbreviating it as “XM fusion.” (Julier has also suggested approximations for computing the XM fusion formula when the original distributions $f^0_{k+1|k+1}(\mathbf{x})$ and $f^1_{k+1|k+1}(\mathbf{x})$ are Gaussian mixtures [14].)

The XM fusion density has several appealing properties. First, and perhaps most importantly, Julier has shown that it is invariant with respect to double counting [17]. That is, suppose that the distributions $f^0_{k+1|k+1}(\mathbf{x})$ and $f^1_{k+1|k+1}(\mathbf{x})$ have double-counted information in the sense of Section 8.2.3. Then $f^\omega_{k+1|k+1}(\mathbf{x})$ incorporates the double-counted information only once, in the same sense as does Equation 8.20.

Second, for all 0 ≤ ω ≤ 1 [9]:

$$\min\{f^0_{k+1|k+1}(\mathbf{x}),\, f^1_{k+1|k+1}(\mathbf{x})\} \le f^\omega_{k+1|k+1}(\mathbf{x}) \quad (\text{all } \mathbf{x})$$

(8.36)

$$\min\{f^0_{k+1|k+1}(\mathbf{x}_0),\, f^1_{k+1|k+1}(\mathbf{x}_0)\} < f^\omega_{k+1|k+1}(\mathbf{x}_0) \quad (\text{for some } \mathbf{x}_0).$$

(8.37)

The first inequality indicates that $f^\omega_{k+1|k+1}(\mathbf{x})$ does not reduce information (as compared to the original distributions), whereas the second one indicates that it can also increase it.

In Ref. [3], I proposed the following as the most theoretically reasonable procedure for optimizing ω:

$$\hat\omega = \arg\sup_{\omega}\, \sup_{\mathbf{x}}\, f^\omega_{k+1|k+1}(\mathbf{x}),$$

(8.38)

in which case $f^\omega_{k+1|k+1}(\mathbf{x})$ with $\omega = \hat\omega$ yields the best choice of merged track. That is, the optimal value of ω is the one that results in the largest MAP estimate. (Note that Equation 8.38 can be approximated by computing the covariance matrix $P^\omega_{k|k}$ of $f^\omega_{k+1|k+1}(\mathbf{x})$ and minimizing its determinant or trace, as originally proposed by Uhlmann and Julier [8–10].)

Julier has proposed [14] that, rather than Equation 8.38, a more theoretically principled optimization procedure would be to choose ω as the maximizing value in Equation 8.35:

$$\omega = \arg\inf_{\omega} \int f^0_{k+1|k+1}(\mathbf{x})^{1-\omega}\; f^1_{k+1|k+1}(\mathbf{x})^{\omega}\, d\mathbf{x}.$$

(8.39)

This has the effect of minimizing the degree of overlap between the distributions $f^0_{k+1|k+1}(\mathbf{x})^{1-\omega}$ and $f^1_{k+1|k+1}(\mathbf{x})^{\omega}$. His reasoning is as follows. First, ω reflects the information contained in the distribution $f^\omega_{k+1|k+1}(\mathbf{x})$ as an entirety—rather than just the information contained at a single point, the MAP estimate. Second, $f^\omega_{k+1|k+1}(\mathbf{x})$ can be shown to be equally distant from $f^0_{k+1|k+1}(\mathbf{x})$ and $f^1_{k+1|k+1}(\mathbf{x})$ in a Kullback–Leibler information-theoretic sense.

Nevertheless, I argue that Equation 8.38 is a more justifiable theoretical choice, for two reasons:

1.  In target tracking, a track distribution $f_{k|k}(\mathbf{x})$ is of little interest unless one can extract from it an accurate estimate of target state. Using the entire distribution $f_{k|k}(\mathbf{x})$ for this purpose is typically a bad idea. For example, in practical application, most of the modes of $f_{k|k}(\mathbf{x})$ will be minor modes caused by clutter returns, along with (if SNR is large enough) a single larger target-associated mode. Thus an estimator that employs all of $f_{k|k}(\mathbf{x})$—the expected value $\bar{\mathbf{x}}_{k|k}$ of $f_{k|k}(\mathbf{x})$, for example—can produce unstable and very inaccurate estimates. The MAP estimator, Equation 8.7, is usually more appropriate for practical application, since it tends to produce more stable and accurate state estimates.

2.  Abstract information-theoretic distances should be treated with caution when isolated from physical intuition. There is a literal infinitude of information-based distance concepts—most obviously, the Csiszár-divergence family [18,19]

$$K_c(f^1_{k+1|k+1}; f^0_{k+1|k+1}) = \int f^0_{k+1|k+1}(\mathbf{x})\; c\!\left(\frac{f^1_{k+1|k+1}(\mathbf{x})}{f^0_{k+1|k+1}(\mathbf{x})}\right) d\mathbf{x}$$

(8.40)

and its multitarget generalizations [20], where c(x) is some nonnegative convex function. For example, choose the convex kernel c(x) to be $c_\omega(x) = (1-\omega)x + \omega - x^\omega$. Then Chernoff information can be expressed in terms of $K_{c_\omega}$, which is

$$K_{c_\omega}(f^1_{k+1|k+1}; f^0_{k+1|k+1}) = 1 - \int f^0_{k+1|k+1}(\mathbf{x})^{1-\omega}\; f^1_{k+1|k+1}(\mathbf{x})^{\omega}\, d\mathbf{x}.$$

(8.41)

In addition to these, there are many distance metrics on probability distributions, such as Wasserstein distance. Which of these is “best,” why is it best, and what might its physical interpretation be?

The reasoning behind Equation 8.38, by way of contrast, arises inherently from the practical goal of achieving the most accurate and stable state estimates possible. For each ω, consider the following statements about $f^\omega_{k+1|k+1}(\mathbf{x})$:

1.  It is the distribution of the merged track.

2.  The MAP estimate for this track is $\mathbf{x}^\omega_{k+1|k+1} = \arg\sup_{\mathbf{x}} f^\omega_{k+1|k+1}(\mathbf{x})$.

3.  The larger the value of $\sup_{\mathbf{x}} f^\omega_{k+1|k+1}(\mathbf{x})$, the more probable—and therefore the more sharply localized—$\mathbf{x}^\omega_{k+1|k+1}$ will be.

4.  Thus one should choose the value $\hat\omega$ of ω that corresponds to the most-probable (best-localized) MAP estimate.

The necessity of this line of reasoning will become apparent when I propose multitarget generalizations of XM fusion later in the chapter. In that situation, concepts such as covariance or trace can no longer even be defined. Concepts such as Chernoff information and Csiszár discrimination can still be defined, but their physical meaning is even less evident than in the single-target case. The primary difficulty is a practical one, namely that in multitarget problems the computability of Equation 8.38 will be questionable. Thus computational tractability will usually be the primary motivation for choosing information-theoretic or other optimization approaches in preference to Equation 8.38.
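
As a concrete illustration of Equations 8.34 and 8.38, the sketch below performs XM fusion of two one-dimensional track densities on a grid and selects ω by a direct search for the largest MAP peak. The Gaussian-mixture inputs and all numbers are illustrative assumptions; swapping the selection criterion for the integral in Equation 8.39 gives Julier's Chernoff-based choice instead.

```python
import numpy as np

# XM fusion (Equation 8.34) on a 1-D grid, with omega chosen by
# Equation 8.38 (the value giving the largest MAP peak).
x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]

def gauss(u, s):
    return np.exp(-0.5 * (u / s) ** 2) / (np.sqrt(2.0 * np.pi) * s)

# Illustrative source densities f^0_{k+1|k+1} and f^1_{k+1|k+1}:
f0 = 0.7 * gauss(x - 1.0, 1.0) + 0.3 * gauss(x + 4.0, 1.5)
f1 = 0.8 * gauss(x - 1.5, 0.8) + 0.2 * gauss(x + 5.0, 2.0)

def xm(w, eps=1e-300):
    g = np.maximum(f0, eps) ** (1.0 - w) * np.maximum(f1, eps) ** w
    return g / (np.sum(g) * dx)               # Equation 8.34

omegas = np.linspace(0.0, 1.0, 101)
w_hat = omegas[int(np.argmax([np.max(xm(w)) for w in omegas]))]
f_fused = xm(w_hat)
print("omega-hat:", w_hat, " MAP estimate:", x[np.argmax(f_fused)])
```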

8.3    FINITE-SET STATISTICS: REVIEW

In this section, I briefly review basic elements of finite-set statistics (FISST) [7,21,22] that are required for the material that follows:

1.  Section 8.3.1: The multisensor-multitarget recursive Bayes filter. This is the foundation for the approach to T2F that will be introduced shortly.

2.  Section 8.3.2: A brief summary of the basic elements of the FISST differential and integral multitarget calculus, including Poisson processes and i.i.d.c. processes.

3.  Section 8.3.3: The PHD filter. This is the first computational approximation of the multitarget Bayes filter.

4.  Section 8.3.4: The CPHD filter. This is the second computational approximation of the multitarget Bayes filter.

5.  Section 8.3.5: A brief summary of significant recent advances involving PHD and CPHD filters.

8.3.1    MULTITARGET RECURSIVE BAYES FILTER

My approach to multisource-multitarget T2F is based on the multisensor-multitarget recursive Bayes filter [7, chapter 14]. Let Z(k): Z1,…,Zk be a time-sequence of multisensor-multitarget measurement-sets Zi collected at times t1,…,tk. That is, each Zi consists of the measurements collected by all available sensors at or near time-step i. They can have the form $Z_i = \emptyset$ (no measurements collected); $Z_i = \{\mathbf{z}_1\}$ (one measurement $\mathbf{z}_1$ collected); $Z_i = \{\mathbf{z}_1, \mathbf{z}_2\}$ (two measurements $\mathbf{z}_1, \mathbf{z}_2$ collected); and so on. Given this, the multitarget Bayes filter propagates a multitarget posterior distribution $f_{k|k}(X|Z^{(k)})$ through time:

$$f_{k|k}(X|Z^{(k)}) \xrightarrow{\ \text{predictor}\ } f_{k+1|k}(X|Z^{(k)}) \xrightarrow{\ \text{corrector}\ } f_{k+1|k+1}(X|Z^{(k+1)})$$

(8.42)

Here, X is the multitarget state-set—i.e., $X = \emptyset$ if no targets are present, $X = \{\mathbf{x}_1\}$ if a single target with state $\mathbf{x}_1$ is present, $X = \{\mathbf{x}_1, \mathbf{x}_2\}$ if two targets with states $\mathbf{x}_1, \mathbf{x}_2$ are present, etc. The “cardinality distribution”

$$p_{k+1|k+1}(n|Z^{(k+1)}) = \int_{|X|=n} f_{k+1|k+1}(X|Z^{(k+1)})\, \delta X$$

(8.43)

defines the posterior probability that the multitarget scene contains n targets, where $\int \cdot\; \delta X$ indicates the multitarget “set integral” defined in Section 8.3.2.

The multitarget Bayes filter presumes the existence of multitarget motion and measurement models, for example:

$$\Xi_{k+1|k} = S_k(X) \cup B_k, \qquad \Sigma_{k+1} = T_{k+1}(X) \cup C_{k+1}$$

(8.44)

where

Sk(X) is the random finite subset (RFS) of persisting targets

Bk is the RFS of appearing targets

Tk+1(X) is the RFS of target-generated measurements

Ck+1 is the RFS of clutter measurements

Given these models, using multitarget calculus (Section 8.3.2) one can construct a multitarget Markov transition density and a multitarget likelihood function

$$f_{k+1|k}(X|X'), \qquad f_{k+1}(Z|X)$$

(8.45)

(see Chapters 12 and 13 of [7]). Because of this systematic specification of models, at any given time-step the distribution $f_{k|k}(X|Z^{(k)})$ encapsulates all relevant information regarding the presumed strengths and weaknesses of the targets, and the known strengths and weaknesses of the sensors.

The multitarget Bayes filter is defined by the predictor and corrector equations

$$f_{k+1|k}(X|Z^{(k)}) = \int f_{k+1|k}(X|X')\, f_{k|k}(X'|Z^{(k)})\, \delta X'$$

(8.46)

$$f_{k+1|k+1}(X|Z^{(k+1)}) = \frac{f_{k+1}(Z_{k+1}|X)\, f_{k+1|k}(X|Z^{(k)})}{f_{k+1}(Z_{k+1}|Z^{(k)})}$$

(8.47)

where

$$f_{k+1}(Z_{k+1}|Z^{(k)}) = \int f_{k+1}(Z_{k+1}|X)\, f_{k+1|k}(X|Z^{(k)})\, \delta X.$$

(8.48)

In what follows, I will abbreviate, for all k ≥ 0,

$$f_{k|k}(X) \overset{\mathrm{abbr}}{=} f_{k|k}(X|Z^{(k)})$$

(8.49)

$$f_{k+1|k}(X) \overset{\mathrm{abbr}}{=} f_{k+1|k}(X|Z^{(k)}).$$

(8.50)

Information of interest—number of targets, the positions, velocities, and types of the targets, etc.—can be jointly extracted from fk | k(X | Z(k)) using a Bayes-optimal multitarget state estimator (see Section 14.5 of [7]). For example, the joint multitarget (JoM) estimator is defined by

$$X^{\mathrm{JoM}}_{k+1|k+1} = \arg\sup_{X}\, f_{k+1|k+1}(X|Z^{(k+1)})\, \frac{c^{|X|}}{|X|!}$$

(8.51)

where c is a fixed constant which has the same units of measurement as the single-target state x.

Remark 2: Generally speaking, c should be approximately equal to the accuracy to which the state is to be estimated, as long as the following inequality is satisfied [7, p. 500]: $f_{k+1|k+1}(X|Z^{(k+1)})\, c^{\hat{n}} \le 1$ for all X, where $\hat{n}$ is the MAP estimate derived from the cardinality distribution.

8.3.2    MULTITARGET CALCULUS

The finite-set statistics multitarget integral-differential calculus is central to the approach that I advocate. Functional derivatives and set derivatives [7, chapter 11] are key to the construction of “true” multitarget Markov densities and multitarget likelihood functions. They are also key to the construction of principled approximations of the multitarget Bayes filter, such as the PHD and CPHD filters.

A set integral accounts for random variability in target number as well as in target state. Let f(X) be a multitarget probability distribution. Then its set integral has the form

$$\int f(X)\, \delta X = f(\emptyset) + \sum_{n=1}^{\infty} \frac{1}{n!} \int f(\{\mathbf{x}_1,\ldots,\mathbf{x}_n\})\, d\mathbf{x}_1 \cdots d\mathbf{x}_n.$$

(8.52)

Let F[h] be any functional—i.e., a scalar-valued function whose argument h is a function h(x). Then the functional derivative of F with respect to any finite set X = {x1,…,xn} with |X| = n ≥ 0 is given by

$$\frac{\delta F}{\delta X}[h] = \frac{\delta}{\delta \mathbf{x}_1} \cdots \frac{\delta}{\delta \mathbf{x}_n} F[h]$$

(8.53)

$$\frac{\delta}{\delta \mathbf{x}} F[h] = \lim_{\varepsilon \searrow 0} \frac{F[h + \varepsilon\, \delta_{\mathbf{x}}] - F[h]}{\varepsilon}$$

(8.54)

where $\delta_{\mathbf{x}}(\mathbf{x}')$ denotes the Dirac delta function concentrated at $\mathbf{x}$. Functional derivatives and set integrals are inverse operations, in the sense that

$$F[h] = \int h^X\, \frac{\delta F}{\delta X}[0]\, \delta X$$

(8.55)

$$\left[ \frac{\delta}{\delta X} \int h^Y f(Y)\, \delta Y \right]_{h=0} = f(X).$$

(8.56)

Here, for any function h(x),

$$h^X = \begin{cases} 1 & \text{if } X = \emptyset \\ \prod_{\mathbf{x} \in X} h(\mathbf{x}) & \text{otherwise}. \end{cases}$$

(8.57)
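
As a simple worked example of Equations 8.54 through 8.56, take the linear functional $F[h] = \int h(\mathbf{y})\, f(\mathbf{y})\, d\mathbf{y}$ for a fixed density f. Then

$$\frac{\delta}{\delta \mathbf{x}} F[h] = \lim_{\varepsilon \searrow 0} \frac{F[h + \varepsilon\, \delta_{\mathbf{x}}] - F[h]}{\varepsilon} = \int \delta_{\mathbf{x}}(\mathbf{y})\, f(\mathbf{y})\, d\mathbf{y} = f(\mathbf{x}),$$

and every higher-order derivative vanishes. Consequently $\frac{\delta F}{\delta \emptyset}[0] = F[0] = 0$, $\frac{\delta F}{\delta \{\mathbf{x}\}}[0] = f(\mathbf{x})$, and $\frac{\delta F}{\delta X}[0] = 0$ whenever $|X| \ge 2$. Substituting these into Equation 8.55 and expanding the set integral via Equation 8.52 recovers $F[h] = \int h(\mathbf{x})\, f(\mathbf{x})\, d\mathbf{x}$, as it must.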

In this chapter, we will require frequent use of two special multitarget processes. Suppose that f(X) is a multitarget probability distribution. Then it is the distribution of

•  A Poisson process (Poisson RFS) if

$$f(X) = e^{-N} D^X$$

(8.58)

where

$N = \int D(\mathbf{x})\, d\mathbf{x}$

D(x) is the PHD, or “intensity function,” of the process

•  An independent, identically distributed cluster (i.i.d.c.) process (i.i.d.c. RFS) if

$$f(X) = |X|!\; p(|X|)\; s^X$$

(8.59)

where

s(x) is the spatial density

p(n) is the cardinality distribution of the process

Equation 8.58 is the special case of Equation 8.59 with $p(n) = e^{-N} N^n / n!$ and $s(\mathbf{x}) = N^{-1} D(\mathbf{x})$.

As an example, one can verify that Equation 8.58 defines a multitarget probability distribution:

$$\int f(X)\, \delta X = e^{-N} D^{\emptyset} + e^{-N} \sum_{n=1}^{\infty} \frac{1}{n!} \int D(\mathbf{x}_1) \cdots D(\mathbf{x}_n)\, d\mathbf{x}_1 \cdots d\mathbf{x}_n$$

(8.60)

$$= e^{-N} + e^{-N} \sum_{n=1}^{\infty} \frac{N^n}{n!} = e^{-N} e^{N} = 1.$$

(8.61)

Likewise, Equation 8.59 defines a multitarget probability distribution:

$$\int f(X)\, \delta X = 0!\; p(0)\; s^{\emptyset} + \sum_{n=1}^{\infty} \frac{n!\; p(n)}{n!} \int s(\mathbf{x}_1) \cdots s(\mathbf{x}_n)\, d\mathbf{x}_1 \cdots d\mathbf{x}_n$$

(8.62)

$$= p(0) + \sum_{n=1}^{\infty} p(n) = 1.$$

(8.63)
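
The computation in Equations 8.60 and 8.61 can also be checked numerically: each n-fold integral of $D(\mathbf{x}_1) \cdots D(\mathbf{x}_n)$ factors into $N^n$, so the set integral collapses to the exponential series. A minimal sketch, with an illustrative Gaussian-shaped PHD and a truncated cardinality sum:

```python
from math import exp, factorial
import numpy as np

# Numerical check of Equations 8.60 and 8.61: the set integral of a
# Poisson multitarget density f(X) = e^{-N} D^X equals 1.
x = np.linspace(-10.0, 10.0, 1001)
dx = x[1] - x[0]
N = 2.3                                               # expected target number
D = N * np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)  # illustrative PHD D(x)

I = np.sum(D) * dx   # integral of D(x); each n-fold integral equals I**n
total = exp(-N) * sum(I ** n / factorial(n) for n in range(60))
print(total)         # approximately 1, reproducing Equation 8.61
```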

8.3.3    PHD FILTER

Constant-gain Kalman filters—the alpha-beta filter, for example—provide the most computationally tractable approximation of the single-target Bayes filter. A constant-gain Kalman filter propagates the first statistical moment (the posterior expectation) $\hat{\mathbf{x}}_{k|k}$ in place of $f_{k|k}(\mathbf{x}|Z^k)$, using alternating predictor steps $\hat{\mathbf{x}}_{k|k} \to \hat{\mathbf{x}}_{k+1|k}$ and corrector steps $\hat{\mathbf{x}}_{k+1|k} \to \hat{\mathbf{x}}_{k+1|k+1}$.

The PHD filter mimics this basic idea, but at a more abstract, statistical level [7, Chapter 16] [23]. It propagates a first-order multitarget moment of the multitarget posterior fk | k(X | Z(k)) instead of fk | k(X | Z(k)) itself:

$$D_{k|k}(\mathbf{x}|Z^{(k)}) \xrightarrow{\ \text{predictor}\ } D_{k+1|k}(\mathbf{x}|Z^{(k)}) \xrightarrow{\ \text{corrector}\ } D_{k+1|k+1}(\mathbf{x}|Z^{(k+1)})$$

(8.64)

This moment, the PHD, is the density function on single-target states x defined by

$$D_{k|k}(\mathbf{x}) \overset{\mathrm{abbr}}{=} D_{k|k}(\mathbf{x}|Z^{(k)}) = \int f_{k|k}(X \cup \{\mathbf{x}\}\,|\,Z^{(k)})\, \delta X.$$

(8.65)

It is not a probability density, since its integral is in general not 1. Rather, $N_{k|k} = \int D_{k|k}(\mathbf{x})\, d\mathbf{x}$ is the total expected number of targets in the scenario. Intuitively speaking, $D_{k|k}(\mathbf{x})$ is the track density at $\mathbf{x}$. The peaks of $D_{k|k}(\mathbf{x})$ are approximately at the locations of the most likely target states. So, one way of estimating the number $\hat{n}$ and states $\hat{\mathbf{x}}_1,\ldots,\hat{\mathbf{x}}_{\hat{n}}$ of the tracks is to take $\hat{n}$ to be the integer nearest to $N_{k|k}$ and then determine the $\hat{n}$ highest peaks of $D_{k|k}(\mathbf{x})$.

The PHD can be propagated through time using the following predictor (time-update) and corrector (data-update) equations. Neglecting the spawning of targets by other targets, these are

$$D_{k+1|k}(\mathbf{x}) = N^B_{k+1|k}\, s^B_{k+1|k}(\mathbf{x}) + \int p_S(\mathbf{x}')\, f_{k+1|k}(\mathbf{x}|\mathbf{x}')\, D_{k|k}(\mathbf{x}')\, d\mathbf{x}'$$

(8.66)

$$\frac{D_{k+1|k+1}(\mathbf{x})}{D_{k+1|k}(\mathbf{x})} = 1 - p_D(\mathbf{x}) + \sum_{\mathbf{z} \in Z_{k+1}} \frac{p_D(\mathbf{x})\, L_{\mathbf{z}}(\mathbf{x})}{\lambda_{k+1}\, c_{k+1}(\mathbf{z}) + \tau_{k+1}(\mathbf{z})}.$$

(8.67)

Here,

•  $N^B_{k+1|k}$ is the expected number, and $s^B_{k+1|k}(\mathbf{x})$ the spatial distribution, of newly appearing targets.

•  $p_S(\mathbf{x}') \overset{\mathrm{abbr}}{=} p_{S,k+1|k}(\mathbf{x}')$ is the probability that a target with state $\mathbf{x}'$ at time-step k will survive into time-step k + 1.

•  $f_{k+1|k}(\mathbf{x}|\mathbf{x}')$ is the single-target Markov transition density.

•  $p_D(\mathbf{x}) \overset{\mathrm{abbr}}{=} p_{D,k+1}(\mathbf{x})$ is the probability that a target with state $\mathbf{x}$ at time-step k + 1 will generate a measurement.

•  $L_{\mathbf{z}}(\mathbf{x}) \overset{\mathrm{abbr}}{=} f_{k+1}(\mathbf{z}|\mathbf{x})$ is the single-target likelihood function.

•  $\lambda_{k+1}$ is the clutter rate and $c_{k+1}(\mathbf{z})$ is the spatial distribution of the Poisson clutter process, where

$$\tau_{k+1}(\mathbf{z}) = \int p_D(\mathbf{x})\, L_{\mathbf{z}}(\mathbf{x})\, D_{k+1|k}(\mathbf{x})\, d\mathbf{x}.$$

(8.68)

One can get an intuitive understanding of how the PHD filter works by noticing that the measurement-updated expected number of targets is

$$N_{k+1|k+1} = \int D_{k+1|k+1}(\mathbf{x})\, d\mathbf{x} = N^{ND}_{k+1|k+1} + \sum_{\mathbf{z} \in Z_{k+1}} N^{D}_{k+1|k+1}(\mathbf{z})$$

(8.69)

where

$$N^{ND}_{k+1|k+1} = \int (1 - p_D(\mathbf{x}))\, D_{k+1|k}(\mathbf{x})\, d\mathbf{x}$$

(8.70)

$$N^{D}_{k+1|k+1}(\mathbf{z}) = \frac{\tau_{k+1}(\mathbf{z})}{\lambda_{k+1}\, c_{k+1}(\mathbf{z}) + \tau_{k+1}(\mathbf{z})} \le 1.$$

(8.71)

The nondetection term $N^{ND}_{k+1|k+1}$ is an estimate of the number of targets that have not been detected. The detection ratio $N^{D}_{k+1|k+1}(\mathbf{z})$ assesses whether $\mathbf{z}$ originated with clutter or with a target. If $N^{D}_{k+1|k+1}(\mathbf{z}) > 1/2$—that is, if $\tau_{k+1}(\mathbf{z}) > \lambda_{k+1} c_{k+1}(\mathbf{z})$—then $\mathbf{z}$ is “target-like.” If $N^{D}_{k+1|k+1}(\mathbf{z}) < 1/2$, then it is “clutter-like.”

The derivation of Equation 8.67 requires the following simplifying assumption: the predicted target process is approximately Poisson. As is evident from Equation 8.67, the PHD filter does not require explicit measurement-to-track association. It has computational order O(mn), where m is the current number of measurements and n is the current number of targets. It tends to produce inaccurate (high variance) instantaneous estimates Nk | k of target number. Thus it is typically necessary to average Nk | k over some time window.

The PHD filter can be implemented using either sequential Monte Carlo (SMC, a.k.a. particle-system) approximation or Gaussian-mixture approximation. In the first case, it is called a “particle-PHD filter” and in the second case a “GM-PHD filter” (see Chapter 16 of [7] and [45–47]).
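
As a concrete illustration of Equations 8.67 and 8.68, the sketch below performs a single PHD-filter measurement update on a one-dimensional grid. The constant $p_D$, Gaussian likelihood, uniform Poisson clutter, and all numbers are illustrative assumptions.

```python
import numpy as np

# One PHD measurement update (Equation 8.67) on a 1-D grid.
x = np.linspace(0.0, 100.0, 1001)
dx = x[1] - x[0]
D_pred = np.full_like(x, 2.0 / 100.0)      # predicted PHD: 2 expected targets
pD, r = 0.9, 1.0                           # detection probability, meas. noise std
lam, c = 5.0, 1.0 / 100.0                  # clutter rate, uniform clutter density
Z = [25.0, 60.0, 61.5]                     # measurement-set Z_{k+1}

def L(z):                                  # single-target likelihood f_{k+1}(z|x)
    return np.exp(-0.5 * ((z - x) / r) ** 2) / (np.sqrt(2.0 * np.pi) * r)

D_upd = (1.0 - pD) * D_pred                # nondetection term of Equation 8.67
for z in Z:
    tau = np.sum(pD * L(z) * D_pred) * dx  # Equation 8.68
    D_upd += pD * L(z) * D_pred / (lam * c + tau)
    print(f"z = {z}: detection ratio = {tau / (lam * c + tau):.2f}")  # Equation 8.71

print("updated expected target number:", np.sum(D_upd) * dx)
```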

8.3.4    CPHD FILTER

The CPHD filter generalizes the PHD filter [7, chapter 16] [23]. It admits more general false alarm models (called “independent, identically distributed cluster” [i.i.d.c.] models) than the Poisson models assumed in the PHD filter. It propagates two things: a spatial distribution $s_{k|k}(\mathbf{x})$ and a cardinality distribution $p_{k|k}(n) \overset{\mathrm{abbr}}{=} p_{k|k}(n|Z^{(k)})$ on the target number n:

$$\left\{ \begin{array}{l} s_{k|k}(\mathbf{x}|Z^{(k)}) \\ p_{k|k}(n|Z^{(k)}) \end{array} \right. \xrightarrow{\ \text{predictor}\ } \left\{ \begin{array}{l} s_{k+1|k}(\mathbf{x}|Z^{(k)}) \\ p_{k+1|k}(n|Z^{(k)}) \end{array} \right. \xrightarrow{\ \text{corrector}\ } \left\{ \begin{array}{l} s_{k+1|k+1}(\mathbf{x}|Z^{(k+1)}) \\ p_{k+1|k+1}(n|Z^{(k+1)}) \end{array} \right.$$

(8.72)

If $N_{k|k} = \sum_{n \ge 0} n\, p_{k|k}(n|Z^{(k)})$ is the expected number of targets, then $D_{k|k}(\mathbf{x}|Z^{(k)}) = N_{k|k}\, s_{k|k}(\mathbf{x}|Z^{(k)})$ is the corresponding PHD. Or, equivalently, $s_{k|k}(\mathbf{x}|Z^{(k)}) = N_{k|k}^{-1}\, D_{k|k}(\mathbf{x}|Z^{(k)})$.

CPHD Filter Time-Update Equations. The predictor equations for the CPHD filter are

$$D_{k+1|k}(\mathbf{x}) = b_{k+1|k}(\mathbf{x}) + \int p_S(\mathbf{x}')\, f_{k+1|k}(\mathbf{x}|\mathbf{x}')\, D_{k|k}(\mathbf{x}')\, d\mathbf{x}'$$

(8.73)

$$p_{k+1|k}(n) = \sum_{n' \ge 0} p_{k+1|k}(n|n')\, p_{k|k}(n')$$

(8.74)

where $p^B_{k+1|k}(n)$ is the cardinality distribution of the birth process and where

$$p_{k+1|k}(n|n') = \sum_{j=0}^{n'} p^B_{k+1|k}(n - j)\; C_{n',j}\; \psi_k^{\,j}\; (1 - \psi_k)^{n'-j}$$

(8.75)

$$\psi_k = \int p_S(\mathbf{x})\, s_{k|k}(\mathbf{x})\, d\mathbf{x}$$

(8.76)

$$N_{k+1|k} = N^B_{k+1|k} + \int p_S(\mathbf{x})\, D_{k|k}(\mathbf{x})\, d\mathbf{x}$$

(8.77)

$$N^B_{k+1|k} = \int b_{k+1|k}(\mathbf{x})\, d\mathbf{x}$$

(8.78)

$$C_{n,j} = \begin{cases} \dfrac{n!}{j!\,(n-j)!} & \text{if } 0 \le j \le n \\ 0 & \text{otherwise}. \end{cases}$$

(8.79)

CPHD Filter Measurement-Update Equations. If $m = |Z_{k+1}|$, where $Z_{k+1} = \{\mathbf{z}_1,\ldots,\mathbf{z}_m\}$ is the newly collected measurement-set, then the corrector equations for the CPHD filter are

$$\frac{D_{k+1|k+1}(\mathbf{x})}{s_{k+1|k}(\mathbf{x})} = (1 - p_D(\mathbf{x}))\, E^{ND}_{k+1} + \sum_{\mathbf{z} \in Z_{k+1}} p_D(\mathbf{x})\, L_{\mathbf{z}}(\mathbf{x})\, E^{D}_{k+1}(\mathbf{z})$$

(8.80)

$$\frac{p_{k+1|k+1}(n)}{p_{k+1|k}(n)} = \frac{\displaystyle\sum_{j=0}^{\min\{m,n\}} (m-j)!\; p^{\kappa}_{k+1}(m-j)\; P_{n,j}\; \phi_k^{\,n-j}\; \sigma_j(Z_{k+1})}{\displaystyle\sum_{l=0}^{m} (m-l)!\; p^{\kappa}_{k+1}(m-l)\; \sigma_l(Z_{k+1})\; G^{(l)}_{k+1|k}(\phi_k)}$$

(8.81)

where

$$E^{ND}_{k+1} = \frac{\displaystyle\sum_{j=0}^{m} (m-j)!\; p^{\kappa}_{k+1}(m-j)\; \sigma_j(Z_{k+1})\; G^{(j+1)}_{k+1|k}(\phi_k)}{\displaystyle\sum_{l=0}^{m} (m-l)!\; p^{\kappa}_{k+1}(m-l)\; \sigma_l(Z_{k+1})\; G^{(l)}_{k+1|k}(\phi_k)}$$

(8.82)

$$E^{D}_{k+1}(\mathbf{z}) = \frac{1}{c_{k+1}(\mathbf{z})} \cdot \frac{\displaystyle\sum_{j=0}^{m-1} (m-j-1)!\; p^{\kappa}_{k+1}(m-j-1)\; \sigma_j(Z_{k+1} \setminus \{\mathbf{z}\})\; G^{(j+1)}_{k+1|k}(\phi_k)}{\displaystyle\sum_{l=0}^{m} (m-l)!\; p^{\kappa}_{k+1}(m-l)\; \sigma_l(Z_{k+1})\; G^{(l)}_{k+1|k}(\phi_k)}$$

(8.83)

and where

$$\sigma_j(Z_{k+1}) = \sigma_{m,j}\!\left( \frac{\tau_{k+1}(\mathbf{z}_1)}{c_{k+1}(\mathbf{z}_1)}, \ldots, \frac{\tau_{k+1}(\mathbf{z}_m)}{c_{k+1}(\mathbf{z}_m)} \right)$$

(8.84)

$$G^{(l)}_{k+1|k}(\phi_k) = \sum_{n \ge l} P_{n,l}\; p_{k+1|k}(n)\; \phi_k^{\,n-l}$$

(8.85)

$$G^{(j+1)}_{k+1|k}(\phi_k) = \sum_{n \ge j+1} P_{n,j+1}\; p_{k+1|k}(n)\; \phi_k^{\,n-j-1}$$

(8.86)

$$\phi_k = \int (1 - p_D(\mathbf{x}))\, s_{k+1|k}(\mathbf{x})\, d\mathbf{x}$$

(8.87)

$$\tau_{k+1}(\mathbf{z}) = \int p_D(\mathbf{x})\, L_{\mathbf{z}}(\mathbf{x})\, s_{k+1|k}(\mathbf{x})\, d\mathbf{x}$$

(8.88)

where $P_{n,i} = n!/(n-i)!$ is the permutation coefficient and $\sigma_{m,j}(x_1,\ldots,x_m) = \sum_{1 \le i_1 < \cdots < i_j \le m} x_{i_1} \cdots x_{i_j}$ is the elementary symmetric function of degree j in m variables.

The corrector equations for the CPHD filter require the following simplifying assumption: that the predicted target process is approximately an i.i.d.c. process. The CPHD filter has computational order O(m3n), though this can be reduced to O(m2n) using special numerical techniques.

The CPHD filter can be implemented using either particle approximation or Gaussian-mixture approximation. In the first case, it is called a “particle-CPHD filter” and in the second case a “GM-CPHD filter.”
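
The combinatorial core of the corrector is the computation of the elementary symmetric functions $\sigma_{m,j}$ appearing in Equation 8.84. They need not be enumerated over all $\binom{m}{j}$ subsets: expanding the polynomial $\prod_{i=1}^m (1 + x_i t) = \sum_j \sigma_{m,j}\, t^j$ yields all of them in $O(m^2)$ operations, as in the sketch below (the numerical arguments are illustrative).

```python
import numpy as np

def esf(xi):
    """All elementary symmetric functions sigma_{m,j}, j = 0,...,m, of the
    arguments xi, via the expansion prod_i (1 + xi_i t) = sum_j sigma_j t^j."""
    sigma = np.zeros(len(xi) + 1)
    sigma[0] = 1.0
    for v in xi:
        sigma[1:] += v * sigma[:-1]   # RHS is evaluated before the in-place update
    return sigma

# Illustrative arguments xi_i = tau_{k+1}(z_i) / c_{k+1}(z_i) for m = 3:
print(esf(np.array([0.4, 1.3, 0.7])))   # -> [1.0, 2.4, 1.71, 0.364]
```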

8.3.5    SIGNIFICANT RECENT DEVELOPMENTS

The theory and practice of random set filters has developed rapidly in recent years. In this section, I briefly summarize a few of the most recent advances:

1.  Track-before-detect filtering in pixelized images without preprocessing. Most multitarget tracking algorithms using pixelized image data rely on some kind of image preprocessing step to extract detection-type features: threshold detectors, edge detectors, blob detectors, etc. In Ref. [24], Vo, Vo, and Pham have demonstrated a computationally tractable multitarget detection and tracking algorithm that does not require such preprocessing. It is based on a suitable modification of the “multi-Bernoulli filter” introduced in Ref. [7, chapter 17] and then corrected and implemented in Ref. [25].

2.  Simultaneous localization and mapping (SLAM). When neither GPS nor terrain maps are available, a robotic platform must detect landmarks, use them to construct a terrain map on the fly, and simultaneously orient the platform with respect to that map. The current state-of-the-art in SLAM is the FastSLAM approach, which employs measurement-to-track association, in conjunction with heuristic procedures for clutter rejection and initiation and termination of landmarks. Mullane, Vo, Adams, and Vo have shown that a PHD filter-based SLAM filter significantly outperforms FastSLAM in regard to the accuracy of both platform trajectory estimation and landmark detection and localization [26,27]. Clark has devised an even faster and more accurate SLAM-PHD filter based on a cluster-process formulation [28].

3.  “Background agnostic” (BAG) CPHD filters. The “classical” CPHD filter relies on an a priori model λk+1,ck+1(z),pk+1κ(m) of the clutter process and on an a priori model pD(x) of the state-dependent probability of detection. In 2009, I initiated a study of PHD and CPHD filters that do not require a priori clutter models but, rather, are capable of estimating them, on the fly, directly from the measurements. In Refs. [29,30], the clutter process was assumed to be a finite superposition of Poisson clutter processes, each with an intensity function of the form κ(z) = λ · θc(z) with clutter rate 0 ≤ λ ≤ 1 and spatial distribution θc(z) parameterized by c. Unfortunately, the resulting PHD/CPHD filters are combinatorially complex. Subsequently, in Ref. [31], I derived computationally tractable versions of these CPHD filters. In this case, the clutter process is assumed to be an infinite superposition of Bernoulli clutter processes, each with an intensity function of the form κ(z) = λ · θc(z) with 0 ≤ λ ≤ 1. Then, in Ref. [32], I showed how to further extend these filters when both the clutter process and pD(x) are unknown. This filter has been implemented in certain special cases and shown to perform reasonably well under simulated conditions [33,34].

4.  “Background agnostic” multi-Bernoulli filters. Vo, Vo, Hoseinnezhad, and Mahler have generalized the just-mentioned approach to nonlinear situations, via a particle-filter implementation of a background-agnostic multi-Bernoulli filter [35–37].

5.  Principled, tractable multisensor CPHD/PHD filters. The PHD/CPHD filter measurement-update steps described in Sections 8.3.3 and 8.3.4 are inherently single-sensor formulas. What of the multisensor case? In practical application, the de facto approach has been to employ the “iterated corrector” approximation. That is, apply the measurement-update equations successively, once for each sensor. It is well known that this approach is not invariant to changes in the order of the sensors. Moreover, for the PHD filter (but apparently not for the CPHD filter) it turns out that the iterated-corrector approach leads to performance degradation when the probabilities of detection for the sensors are significantly different [38]. In Ref. [39], I introduced a new approximation that leads to principled, order-invariant, computationally tractable multisensor PHD and CPHD filters. Nagappa et al. have shown that this approximation outperforms the iterated-corrector approach and, for the PHD filter, is also a good approximation of the theoretically correct two-sensor PHD filter [40].

6.  Joint multisensor-multitarget tracking and sensor-bias estimation. Current multitarget detection and tracking algorithms presume that all sensors are spatially registered—i.e., that all sensor states are precisely specified with respect to some common coordinate system. In actuality, any particular sensor’s observations may be contaminated by spatial misregistration biases that may take translational, rotational, and other forms. In Ref. [41], I proposed an approach that leverages any unknown targets that may be in the scene, if there are enough of them present, to estimate the spatial biases of the sensors while simultaneously detecting and tracking the targets. Ristić and Clark have implemented a cluster-process variant of this approach for a specific kind of spatial misregistration, and found that it performs well [42].
