7

Decentralized Data Fusion

Formulation and Algorithms

Paul Thompson, Eric Nettleton, and Hugh Durrant-Whyte

CONTENTS

7.1    Decentralized Data Fusion Introduction

7.2    Information Form Introduction

7.3    Decentralized Fusion and Communication

7.3.1    Tree Network Topology, Channel Cache

7.3.2    Related Channel Filter Approaches

7.3.3    Summary

7.4    Dynamic Systems

7.4.1    State Dynamics

7.4.1.1     State Dynamic Model

7.4.1.2    Trajectory Information Approach

7.4.1.3    Equivalence to the Conventional Approach

7.4.1.4    Multiple Trajectory States

7.4.2    Dynamics in Decentralized Data Fusion

7.4.2.1    Common Process Noise Problem

7.4.2.2    Delayed and Asequent Observations

7.4.2.3    Burst Communications

7.4.2.4    Solution Using Trajectory States

7.4.2.5    Filtering the Trajectory State System

7.4.2.6    Filtering with Stored Filter Estimates

7.4.2.7    Operation of Channel Caches with Trajectory States

7.4.3    Summary

7.5    k-Tree Topologies for Redundant and Dynamic Networks

7.5.1    Decentralized Data Fusion on k-Trees

7.5.2    Data-Tagging Sets

7.5.3    Separator and Neighborhood Properties

7.5.3.1    Separator Property

7.5.3.2    Local Neighborhood Property

7.5.4    k-Tree Communications Algorithm

7.5.4.1    Data-Tag Set Elimination

7.5.5    Link and Node Failure Robustness

7.5.6    Summary

7.6    Conclusion

7.A   Appendix

7.A.1    Marginalization in the Information Form

7.A.2    Trajectory Information Form Equivalence

Acknowledgments

References

7.1    DECENTRALIZED DATA FUSION INTRODUCTION

A decentralized data fusion (DDF) system consists of a network of sensing and computing nodes that aim to cooperatively estimate a common state [12]. Fusion occurs on each node using locally obtained observations and communications from neighboring nodes, without relying on a centralized decision or fusion system. This chapter summarizes and builds on previous research in DDF, including [18,22].

DDF systems have been characterized by three constraints [9,12]:

1.  There should be no single central fusion center; no single node should be central to the successful operation of the network.

2.  There is no common communication facility; nodes cannot broadcast results and communication must be kept on a strictly node-to-node basis.

3.  Sensor nodes do not have any global knowledge of the network topology; nodes should only know about connections in their own neighborhood.

The resulting estimates in the decentralized system can be compared to an equivalent centralized estimator operating with the same observations and modeling assumptions. The focus of this chapter is on exact solutions for DDF, which are equivalent to centralized data fusion, in the following sense:

•  The use of consistent fusion of information terms that are conditionally independent given the state, as opposed to methods that double count or miss information terms, or that resort to conservative fusion.

•  The use of direct solution methods as opposed to iterative or convergent methods.

This chapter is organized as follows. Section 7.2 introduces the information form, which is used in this chapter as the expression for fusion operations. Section 7.3 discusses the fusion update and communication aspects of DDF and discusses the operation of DDF on tree topology networks. Section 7.4 introduces the trajectory state formulation for dynamics and uses this to operate decentralized networks for dynamic systems, including handling delayed, asequent and burst communications issues. Section 7.5 extends the tree topology to k-tree topologies for redundant and dynamic decentralized topologies.

7.2    INFORMATION FORM INTRODUCTION

For the decentralized algorithms presented in this chapter, it is convenient to express the fusion operations in terms of information by reformulating multiplication of probability as summation of log-probability.

The main properties that motivate the use of the information form are as follows:

1.  Additivity of fusion and observation updates

2.  Sparsity of the information matrix

Consider a random variable x with prior probability density function (PDF) described by a Gaussian PDF, together with a linear observation, described by a Gaussian likelihood:

p(x) = \frac{1}{b}\exp\left(-\frac{1}{2}(x-\hat{x})^T P^{-1}(x-\hat{x})\right)

(7.1)

p(z \mid x) = \frac{1}{c}\exp\left(-\frac{1}{2}(Hx-z)^T R^{-1}(Hx-z)\right)

(7.2)

Where the observation is modeled as

z = Hx + w, \qquad E[w] = 0, \qquad E[ww^T] = R

(7.3)

Under Bayes’ rule, p(x | z) = p(z | x)p(x)/p(z), the posterior PDF given the prior and observation is

p(x \mid z) = \frac{1}{d}\exp\left(-\frac{1}{2}(x-\hat{x})^T P^{-1}(x-\hat{x}) - \frac{1}{2}(Hx-z)^T R^{-1}(Hx-z)\right)

(7.4)

= \frac{1}{d}\exp\left(-\frac{1}{2}(x-\hat{x}^+)^T (P^+)^{-1}(x-\hat{x}^+)\right)

(7.5)

The two expressions for the posterior must agree; equating the exponents (up to terms independent of x) gives

(x-\hat{x}^+)^T (P^+)^{-1}(x-\hat{x}^+) = (x-\hat{x})^T P^{-1}(x-\hat{x}) + (Hx-z)^T R^{-1}(Hx-z)

(7.6)

By matching first and second derivatives of each side with respect to x, this results in

(P^+)^{-1} = P^{-1} + H^T R^{-1} H

(7.7)

(P^+)^{-1}\hat{x}^+ = P^{-1}\hat{x} + H^T R^{-1} z

(7.8)

The information form is defined by these terms, P^{-1} and P^{-1}\hat{x}:

Y \triangleq -\frac{\partial^2 \log\{p(x)\}}{\partial x^2} = P^{-1}, \qquad \hat{y} \triangleq \left.\frac{\partial \log\{p(x)\}}{\partial x}\right|_{x=0} = P^{-1}\hat{x}

(7.9)

Consequently, given Y and y^, the estimate is recovered by the solution of the linear system

Y\hat{x} = \hat{y}

(7.10)

The estimate x^ will be identical to that obtained by a covariance-based Gaussian estimator such as a Kalman filter operating under identical assumptions.

So, given a prior PDF described by information matrix Y and information vector y, the posterior following the observation is

Y^+ = Y + H^T R^{-1} H

(7.11)

\hat{y}^+ = \hat{y} + H^T R^{-1} z

(7.12)

It is convenient to label the observation as contributing observation information in the form

I = H^T R^{-1} H

(7.13)

i = H^T R^{-1} z

(7.14)

In general, the fusion of multiple, statistically independent terms is a straightforward addition:

Y^+ = \sum_i Y_i, \qquad \hat{y}^+ = \sum_i \hat{y}_i

(7.15)
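
As a concrete illustration of Equations 7.7 through 7.15, the following Python/numpy sketch (the variable names are illustrative, not from the source) converts a Gaussian prior to information form, forms the observation information terms, fuses them additively, and recovers the posterior estimate by solving Equation 7.10.

```python
import numpy as np

def to_information(x_hat, P):
    # Information form of a Gaussian: Y = P^-1, y = P^-1 x_hat (Equation 7.9).
    Y = np.linalg.inv(P)
    return Y, Y @ x_hat

def observation_information(H, R, z):
    # Observation information I = H^T R^-1 H, i = H^T R^-1 z (Equations 7.13 and 7.14).
    Rinv = np.linalg.inv(R)
    return H.T @ Rinv @ H, H.T @ Rinv @ z

# Prior on a two-state system and a single linear observation of the first state.
x_hat = np.array([1.0, 0.0])
P = np.diag([4.0, 9.0])
H = np.array([[1.0, 0.0]])
R = np.array([[0.25]])
z = np.array([1.4])

Y, y = to_information(x_hat, P)
I, i = observation_information(H, R, z)

# Fusion of independent information terms is additive (Equations 7.11, 7.12, and 7.15).
Y_post, y_post = Y + I, y + i

# Recover the posterior estimate by solving Y x = y (Equation 7.10).
x_post = np.linalg.solve(Y_post, y_post)
```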

7.3    DECENTRALIZED FUSION AND COMMUNICATION

This section discusses DDF with a focus on the fusion and update steps resulting from communication and observations. These aspects are highlighted by considering DDF on a static state variable. Section 7.4 extends this discussion by considering system dynamics and temporal aspects.

The static case highlights the basic properties of the fusion component of the decentralized system, since the present formulation of DDF is built on the additive properties of the information fusion operations.

The system then consists of a common static state x, which is to be estimated on the multiple platforms. The decentralized system is required to obtain an estimate in exact agreement with an equivalent centralized estimator using the same observations and modeling assumptions. In a static system, the posterior information is identically the sum of the individual independent observation information terms

Y = \sum_k H_k^T R_k^{-1} H_k, \qquad \hat{y} = \sum_k H_k^T R_k^{-1} z_k

(7.16)

\hat{x} = Y^{-1}\hat{y}

(7.17)

In essence, the DDF nodes communicate to obtain the Y and y^ sums in Equation 7.16. The required globally agreeing estimate is then obtained through the solution of Equation 7.17 separately at each node.

Different algorithms on different topologies use different methods for obtaining the sums in Equation 7.16:

•  In a centralized estimator, each H_k^T R_k^{-1} H_k and H_k^T R_k^{-1} z_k is communicated to the central estimator, which performs the sum in Equation 7.16.

•  In a fully connected decentralized topology, each node transmits H_k^T R_k^{-1} H_k and H_k^T R_k^{-1} z_k to each other node. Each node is then able to separately perform the sum in Equation 7.16.

•  In a tree-connected decentralized topology, nodes accumulate partial sums of Equation 7.16 and communicate in a tree to obtain the global sums. This topology is discussed further later.

7.3.1    TREE NETWORK TOPOLOGY, CHANNEL CACHE

This section considers the singly connected or tree decentralized topology. Under this topology, the graph properties of the tree and the distributivity of the addition are exploited in order to perform the required summation in Equation 7.16.

A tree topology has no cycles. This means that for each node any communications to a neighbor cannot affect any other neighbor. Also, any communications from a neighbor cannot be affected by any other neighbor. This occurs because at any node, a, the neighbors of the neighbors of a, excluding a, are disjoint.

For every node a in a tree: \{\, N(N(a)) \,\}_{\setminus a} are disjoint

(7.18)

where

N(a) is the set of neighbors of node a

S_{\setminus a} is the set S excluding a

This means that the sum in Equation 7.16 can be written as a hierarchy of partial sums over disjoint subsets:

Y_a = I_a + \sum_{i \in N(a)} \left\{ I_i + \sum_{j \in \{N(i)\}_{\setminus a}} \left\{ I_j + \sum_{k \in \{N(j)\}_{\setminus i}} \left\{ I_k + \cdots \right\} \right\} \right\}

(7.19)

The tree topology guarantees that the terms inside each summation are disjoint (independent from each other) and therefore prevents double counting of observation information.

The algorithm developed in this section will be referred to as the channel cache algorithm. The channel cache algorithm is a variant of the well-known channel filter algorithm [6,9,12,18-20,22]. The channel cache algorithm is also inspired by the junction tree algorithm for inference in graphical models [16,21].

The operation of a tree topology decentralized network is illustrated in Figure 7.1, which shows a branch in a tree network.

Each node stores its own observation information (dark gray) in the form I = H^T R^{-1} H and i = H^T R^{-1} z. These correspond to the I terms in Equation 7.19. These observation information terms are required to be statistically independent and unique to one node.

The communicated term from a node i to a node a is an information matrix C_{i→a} (and its information vector counterpart). C_{i→a} consists of the transmitting node's own independent observation information plus the sum of all communicated terms received from the "upstream" part of the tree network:


FIGURE 7.1 DDF with channel caches. Four nodes are arranged in the topology shown in the lower right. Each node stores its own fused observations (dark gray). Each node caches the received communication term from each of its neighbors (light gray). The total fused information at each node is the sum of each stack, since each layer consists of independent information. The transmitted communication term is the sum of the stack excluding the destination’s cache term.

C_{i \to a} = I_i + \sum_{j \in \{N(i)\}_{\setminus a}} C_{j \to i}

(7.20)

= I_i + \sum_{j \in \{N(i)\}_{\setminus a}} \left\{ I_j + \sum_{k \in \{N(j)\}_{\setminus i}} C_{k \to j} \right\}

(7.21)

Each node locally caches the received communication term from each of its neighbors (light gray) in a so-called channel cache. All of the channel cache, C, and the observation, I, information terms are statistically independent.

Transmission of a communication term has no effect at the transmitting node, so transmissions can be lost without breaking the consistency of the estimates. On reception of a communication term, the received term is simply stored in the channel cache. This means that duplicate transmissions and/or duplicate receptions are acceptable.

Each node can obtain the total network sum, Y, by summing its own observation information together with all the locally cached communication terms, e.g., at node a:

Y_a = I_a + \sum_{i \in N(a)} C_{i \to a}

(7.22)

The net result is that the network computes a series of partial sums, with each node obtaining the sum as in Equation 7.19. For each node, the evaluation of Equation 7.19 operates as a series of messages propagating inward on the tree toward that node.

Nodes initialize their observation and communication cache information terms to zero, I = 0, C = 0, such that nodes can produce estimates even before the network has finished propagating terms across the network span.
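
The bookkeeping just described can be sketched as a small class (a minimal Python illustration; the class and method names are hypothetical, and Y and y are numpy arrays). Each node stores its own fused observation information, caches one contributed term per neighbor, and derives both the transmitted term (Equation 7.20) and the total (Equation 7.22) as sums.

```python
import numpy as np

class ChannelCacheNode:
    """One DDF node on a tree network: a sketch of the channel cache bookkeeping."""

    def __init__(self, dim, neighbors):
        self.own_Y = np.zeros((dim, dim))   # I_i: own fused observation information
        self.own_y = np.zeros(dim)
        # C_{j->i}: cached contribution received from each neighbor, initialized to zero
        self.cache = {n: (np.zeros((dim, dim)), np.zeros(dim)) for n in neighbors}

    def observe(self, I, i):
        # Fuse a local, independent observation: I_i += I, i_i += i.
        self.own_Y += I
        self.own_y += i

    def message_to(self, dest):
        # Transmit term C_{i->dest}: own info plus all cached terms except dest's (Equation 7.20).
        Y = self.own_Y + sum(C for n, (C, _) in self.cache.items() if n != dest)
        y = self.own_y + sum(c for n, (_, c) in self.cache.items() if n != dest)
        return Y, y

    def receive(self, src, Y, y):
        # Reception simply overwrites the cache entry for that neighbor.
        self.cache[src] = (Y, y)

    def total(self):
        # Total fused information Y_a, y_a (Equation 7.22).
        Y = self.own_Y + sum(C for C, _ in self.cache.values())
        y = self.own_y + sum(c for _, c in self.cache.values())
        return Y, y
```

Because reception only overwrites the cached entry for the sending neighbor, duplicate or lost messages cannot corrupt the totals, which matches the robustness argument above.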

7.3.2    RELATED CHANNEL FILTER APPROACHES

The previous section presented the channel cache algorithm for tree topology networks. The channel cache is closely related to the channel filter algorithm, which has been discussed in various papers [6,9,12,18, 19, 20,22].

The approach used in a channel filter is to maintain the total information estimate at each node and maintain the common information between pairs of nodes on a tree network. The channel filter’s use of the common information is motivated by the following equation for the fusion of a local YA and a received communication YB from a remote node:

Y_{A \cup B} = Y_A + Y_B - Y_{A \cap B}

(7.23)

Each primary operation of the channel filter is described next (referring to operations at a node i, transmitting to a node a and receiving from a node j). Table 7.1 summarizes the channel algorithms. The channel filter consists of the following operations:

•  Observation: Y_i += I. Observation information I adds directly into the total information Y_i without affecting any channel common information.

•  Transmit: The node's current total information is transmitted, C_{i→a} = Y_i. It is assumed that the destination node a will successfully receive the communication; therefore, the common information is set Y_{i∩a} = Y_i.

•  Receive: The received information is C_{j→i}, so given the existing common information Y_{i∩j}, the total information is updated: Y_i += C_{j→i} − Y_{i∩j}. The nodes now share the communicated information, so the node sets Y_{i∩j} = C_{j→i}.

•  Result: The total information is maintained in Yi.

TABLE 7.1
Summary of the Primary Operations for the Channel Cache and Channel Filter Algorithms

Obs update:
  Channel cache:   I_i += I
  Channel filter:  Y_i += I

Transmit:
  Channel cache:   C_{i→a} = I_i + Σ_{j∈{N(i)\a}} C_{j→i}
  Channel filter:  C_{i→a} = Y_i, then set Y_{i∩a} = Y_i

Receive:
  Channel cache:   store the received C_{j→i}
  Channel filter:  Y_i += C_{j→i} − Y_{i∩j}, then set Y_{i∩j} = C_{j→i}

Result:
  Channel cache:   Y_i = I_i + Σ_{j∈N(i)} C_{j→i}
  Channel filter:  use Y_i

Both the channel filter and the channel cache algorithms are designed to exploit a tree topology network. At each node, both the channel filter and channel cache algorithms store an information matrix and vector for each neighbor, intended to ensure correct consistent interaction with that neighbor. The basic difference between the channel filter and the channel cache algorithms is as follows:

•  The channel filter algorithm maintains the common information between the local total information and each neighbor’s total information.

•  The channel cache algorithm maintains the contributed information from each neighbor.

Using the common information requires both nodes to maintain identical copies of the common information, which is vulnerable to failure if the two copies differ (cases in which this can happen are discussed later). The common information maintained at both nodes on a channel is required to be identical, since the common information, Y_{i∩j}, is symmetric between the two nodes, i.e., Y_{i∩j} = Y_{j∩i} [12].

By contrast, the channel cache algorithm maintains a local record of the contributed information from the neighbor. This decouples the communication between nodes such that the changes only occur locally when information is received, not when it is transmitted. Table 7.2 shows a decentralized communication transaction between two nodes, showing both the contributed information and the common information, in the case of an ideal communication. The communication must update the common information at both nodes, which requires an assumption of successful communication for the sender. On the other hand, the communication only needs to update the contributed information record once (at the destination node) and only upon an actual successful communication.


Table 7.2 also shows that the two contributed information terms sum to the common information:

Y_{A\ \mathrm{contributed\ to}\ B} + Y_{B\ \mathrm{contributed\ to}\ A} = Y_{A \cap B}

(7.24)

The channel filter algorithm can fall into cases where the two common-information records can differ due to miscommunication:

•  Asynchronous operation. If nodes send messages which “cross over,” then their common-information records can become misaligned. An example is shown in Table 7.3. Consider a pair of nodes i,j which transmit almost simultaneously at times ti, tj and receive at times ri, rj. This asynchronous case arises if rj > tj or ri > ti.

•  Lost transmissions. The channel filter algorithm can also become misaligned in the case of transmissions which are lost. This occurs if a node completes the “transmit” update to its channel filter but the destination fails to receive the message. An example is shown in Table 7.4.

7.3.3    SUMMARY

This section described DDF with a focus on the observation update and the decentralized communication, particularly in tree topology networks.

The fusion of independent observation and/or communicated information is additive when performed in the log-likelihood or information form. Therefore, the problem of forming a decentralized estimate which is identical to a centralized equivalent reduces down to a decentralized algorithm for forming a correct sum of the observation information terms.

When applied to tree topology networks, it suffices to maintain a local node information matrix and vector and one information matrix and vector for each neighbor in the network.


This section presented the channel cache algorithm for handling the local node information and the decentralized communications operations. The channel cache algorithm handles imperfect communications, such as asynchronous and lost transmissions, in a simple manner, operating on records of the contributed information from each neighbor. This is in addition to the capabilities previously available with channel filters on tree networks, in particular the avoidance of double counting and of conservative fusion, while achieving global agreement among nodes in a decentralized network.

The aforementioned discussion has focused on the observation update and the decentralized communication. The following section extends the discussion of DDF into dynamic systems.

7.4    DYNAMIC SYSTEMS

The observation and communication updates, as described in the previous section, were discussed with respect to a static system, i.e., a single-state vector x. This section extends the discussion into dynamic systems and reviews the smoothing or trajectory state formulation of dynamic systems to formulate DDF for estimation of dynamic systems. This trajectory state formulation of dynamics is then applied to address the issues of delayed and asequent observations and burst communications in DDF.


When the decentralized system has observation and/or communication interruptions and delays, it becomes important to decide when and where the dynamic propagation of the estimate is to be applied. Furthermore, at each decentralized node, there are stored communication terms relating to other nodes, and so it is also necessary to consider the dynamic propagation of these.

This section presents the trajectory state approach to representing system dynamics. The trajectory state approach expands the state for a dynamic system into a joint state consisting of a sequence (trajectory) of states. The tools for manipulating joint probabilities in several dimensions and tools for manipulating probabilities and decentralization of static states then become applicable to the dynamic system.

The trajectory state approach relates to smoothing methods used in Kalman smoothing [17]. It is also known as delayed states and has been used to account for delayed decision making in estimation such as delayed associations [15]. The use of delayed states in the information form, with the resulting sparse structure, has been applied in localization and mapping [8,10]. Delayed states have more recently been applied to DDF as a tool for delayed measurements [1] and for delayed and asequent measurements and communications [4].

This section focuses on correct approaches to dealing with delayed, asequent, and burst communications with dynamic models that are known and can be applied by each node. The trajectory state approach can be extended to allow the decentralized system to distribute dynamic models which originate from one node (known as model distribution). The issue of dynamic communication topologies in DDF is discussed separately in Section 7.5.

7.4.1    STATE DYNAMICS

This section explains the state dynamics and trajectory state form, in general. Section 7.4.2 applies these to DDF specifically.

7.4.1.1    State Dynamic Model

We consider a basic, linear discrete time state dynamic model in the form

x_{k+1} = F x_k + B u_k + G v_k

(7.25)

where

xk is the state vector at instant k

F is the state transition matrix

v_k is unknown, zero-mean, white noise, E[v_k] = 0, E[v_k v_k^T] = Q

uk is a known control signal, if available

The conventional treatment is to form a prediction by using a dynamic transformation of the estimate [2]

\hat{x}_{k+1} = F\hat{x}_k + B u_k

(7.26)

P_{k+1} = F P_k F^T + G Q G^T

(7.27)

where this is considered to be a transformation of the estimate, replacing the estimate for time k by that for time k + 1, as a discrete operation.

7.4.1.2    Trajectory Information Approach

Equation 7.27 can actually be considered (see later) to consist of an augmentation of the estimate into the latter timestep k + 1, followed immediately by a marginalization to remove timestep k. In this way, the prediction operation in Equation 7.27 moves the estimate forward in time but removes the state components for the past timestep. This removal of the past timestep makes it impossible (or difficult) to fuse late observations or communicated information. This transformative prediction approach thus requires observations to be fused in at the appropriate timestep.

This section describes an alternative approach, known as the delayed state or trajectory state approach, which is designed to address the aforementioned issues. In the trajectory state approach, instead of considering prediction equations that explicitly transform the estimate from time k to k + 1, we consider the trajectory described by a joint state X = [x_k x_{k+1}].

The key reason why this is useful is as follows: If we operate with a joint trajectory state vector X = [x_k x_{k+1} ... x_{k+n}], then observations and communications at any time from k to k + n, and the dynamic model, all act additively on the joint state X. Thus the methods of Section 7.3 remain applicable, since they are designed to exploit additive operations over decentralized networks.

Therefore, we rearrange Equation 7.25 to focus on the joint trajectory state X = [xkxk+1]:

Bu = -F x_k + I x_{k+1} - G v

(7.28)

Bu = \begin{bmatrix} -F & I \end{bmatrix} \begin{bmatrix} x_k \\ x_{k+1} \end{bmatrix} - G v

(7.29)

where I denotes the identity matrix. We can then consider Equation 7.29 in the form of an observation, as in Equation 7.3:

z = Hx + w, \quad E[ww^T] = R \qquad \longleftrightarrow \qquad Bu = \begin{bmatrix} -F & I \end{bmatrix} \begin{bmatrix} x_k \\ x_{k+1} \end{bmatrix} - Gv, \quad E[Gvv^TG^T] = GQG^T

(7.30)

Considering the dynamic model in the form of an observation requires the following replacements:

H \to \begin{bmatrix} -F & I \end{bmatrix}, \qquad R \to GQG^T, \qquad z \to Bu

(7.31)

By analogy with Equation 7.13, the dynamic model can then be represented as an information matrix and vector in the joint trajectory state X:

I = H^T R^{-1} H, \qquad i = H^T R^{-1} z

(7.32)

I = \begin{bmatrix} F^T Q^{-1} F & -F^T Q^{-1} \\ -Q^{-1} F & Q^{-1} \end{bmatrix}, \qquad i = \begin{pmatrix} -F^T Q^{-1} B u \\ Q^{-1} B u \end{pmatrix}

(7.33)

where Q ≜ GQG^T.
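
The substitution in Equation 7.31 can be written out directly. The following Python/numpy sketch (illustrative names; it assumes GQG^T is invertible) forms the information contribution of the dynamic model on the joint state [x_k, x_{k+1}], as in Equation 7.33, by treating the dynamics as an observation.

```python
import numpy as np

def dynamic_model_information(F, G, Q, B, u):
    # Information contribution of x_{k+1} = F x_k + B u + G v on the joint [x_k, x_{k+1}].
    n = F.shape[0]
    H = np.hstack([-F, np.eye(n)])      # H <- [-F  I]   (Equation 7.31)
    Qbar = G @ Q @ G.T                  # R <- G Q G^T   (assumed invertible)
    z = B @ u                           # z <- B u
    Qinv = np.linalg.inv(Qbar)
    I_dyn = H.T @ Qinv @ H              # [[F^T Qbar^-1 F, -F^T Qbar^-1], [-Qbar^-1 F, Qbar^-1]]
    i_dyn = H.T @ Qinv @ z              # [-F^T Qbar^-1 B u, Qbar^-1 B u]  (Equation 7.33)
    return I_dyn, i_dyn
```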

7.4.1.3    Equivalence to the Conventional Approach

We next show the equivalence of the trajectory state approach to existing prediction equations in the information and covariance forms. To show the equivalence, we setup the same initial conditions and steps as for a prediction:

1.  Define some prior information in the earlier timestep x_k: Y_k and \hat{y}_k satisfying Y_k \hat{x}_k = \hat{y}_k.

2.  Allow no prior information in the latter xk+1.

3.  Apply the dynamic model to the joint xk and xk+1.

4.  Evaluate the marginal information in the latter timestep xk+1.

The posterior information, Y and y, after step 3 (i.e., given the prior and the dynamic model) is

Y = \begin{pmatrix} Y_k & 0 \\ 0 & 0 \end{pmatrix} + H^T R^{-1} H, \qquad y = \begin{pmatrix} \hat{y}_k \\ 0 \end{pmatrix} + H^T R^{-1} z

(7.34)

Y = \begin{pmatrix} Y_k + F^T Q^{-1} F & -F^T Q^{-1} \\ -Q^{-1} F & Q^{-1} \end{pmatrix}, \qquad y = \begin{pmatrix} \hat{y}_k - F^T Q^{-1} B u \\ Q^{-1} B u \end{pmatrix}

(7.35)

Equation 7.35 is equivalent to the conventional prediction in Equation 7.27 (proofs are provided in the appendix):

•  The joint \hat{X} satisfies Y\hat{X} = y, with \hat{X} = \begin{pmatrix} \hat{x}_k \\ \hat{x}_{k+1} \end{pmatrix} = \begin{pmatrix} \hat{x}_k \\ F\hat{x}_k + Bu \end{pmatrix}.

•  The xk marginal of Y remains as the given Yk and yk.

•  The xk+1 marginal of Y yields known expressions [2,19] for the prediction in covariance and information forms:

Y_{k+1|k} = \left\{ F P_{k|k} F^T + Q \right\}^{-1}

(7.36)

= M - M G \Sigma^{-1} G^T M

(7.37)

y_{k+1|k} = \left[ I - M G \Sigma^{-1} G^T \right] F^{-T} \hat{y}_{k|k} + Y_{k+1|k} B u

(7.38)

\left( \Sigma = G^T M G + Q^{-1} \right)

(7.39)

\left( M = F^{-T} Y_{k|k} F^{-1} \right)

(7.40)
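
The equivalence is easy to verify numerically. The sketch below (Python/numpy with illustrative values; it reuses the hypothetical dynamic_model_information helper sketched above) builds the joint information of Equation 7.35, marginalizes out x_k via the Schur complement, and compares the result with the conventional prediction of Equation 7.27 expressed in information form.

```python
import numpy as np

n = 2
F = np.array([[1.0, 0.1], [0.0, 1.0]])
G = np.eye(n)
Q = 0.05 * np.eye(n)
B = np.eye(n)
u = np.array([0.0, 0.2])

# Prior at time k.
P_k = np.array([[0.5, 0.1], [0.1, 0.3]])
Y_k = np.linalg.inv(P_k)

I_dyn, _ = dynamic_model_information(F, G, Q, B, u)   # as sketched above

# Joint information over [x_k, x_{k+1}] (Equation 7.35): prior only in the first block.
Y_joint = np.zeros((2 * n, 2 * n))
Y_joint[:n, :n] = Y_k
Y_joint += I_dyn

# Marginal information in x_{k+1}: Schur complement of the x_k block.
Yaa, Yab, Ybb = Y_joint[:n, :n], Y_joint[:n, n:], Y_joint[n:, n:]
Y_pred_trajectory = Ybb - Yab.T @ np.linalg.solve(Yaa, Yab)

# Conventional prediction (Equation 7.27) converted to information form.
Y_pred_conventional = np.linalg.inv(F @ P_k @ F.T + G @ Q @ G.T)

assert np.allclose(Y_pred_trajectory, Y_pred_conventional)
```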

7.4.1.4    Multiple Trajectory States

Earlier we discussed the formation of a pair of joint successive dynamic states, xk and xk+1. Now consider a longer sequence of trajectory states. The original discrete time dynamic model in Equation 7.25 holds for each pair of successive dynamic states; therefore, each successive pair has the dynamic model information added, as in Equation 7.33. The information matrix and vector for the dynamic model between any successive states k and k + 1 is

I_k^{\mathrm{dyn}} = \begin{pmatrix} F^T Q^{-1} F & -F^T Q^{-1} \\ -Q^{-1} F & Q^{-1} \end{pmatrix}, \qquad i_k^{\mathrm{dyn}} = \begin{pmatrix} -F^T Q^{-1} B u \\ Q^{-1} B u \end{pmatrix}

(7.41)

To save space, we later rewrite I_k^{\mathrm{dyn}} = \begin{pmatrix} A & D \\ D^T & C \end{pmatrix} and i_k^{\mathrm{dyn}} = \begin{pmatrix} a & c \end{pmatrix}^T.

Over a sequence of trajectory states, these pairwise Idyn blocks add up to form a sparse banded matrix:

\sum_{k \in [1,7]} I_k^{\mathrm{dyn}} = \begin{pmatrix}
A & D & 0 & 0 & 0 & 0 & 0 & 0 \\
D^T & C+A & D & 0 & 0 & 0 & 0 & 0 \\
0 & D^T & C+A & D & 0 & 0 & 0 & 0 \\
0 & 0 & D^T & C+A & D & 0 & 0 & 0 \\
0 & 0 & 0 & D^T & C+A & D & 0 & 0 \\
0 & 0 & 0 & 0 & D^T & C+A & D & 0 \\
0 & 0 & 0 & 0 & 0 & D^T & C+A & D \\
0 & 0 & 0 & 0 & 0 & 0 & D^T & C
\end{pmatrix}

(7.42)

The benefit of the trajectory state formulation is that observations of states within the trajectory appear additively in the information matrix. For example, the total information for the trajectory system from times 1 to 8, including a prior Y1 | 1 at time k = 1, an observation I5 at time k = 5, and the dynamic model information between each time is given by

Y_{1:8} = Y_{1|1} + I_5 + \sum_{k \in [1,7]} I_k^{\mathrm{dyn}} = \begin{pmatrix}
Y_{1|1}+A & D & 0 & 0 & 0 & 0 & 0 & 0 \\
D^T & C+A & D & 0 & 0 & 0 & 0 & 0 \\
0 & D^T & C+A & D & 0 & 0 & 0 & 0 \\
0 & 0 & D^T & C+A & D & 0 & 0 & 0 \\
0 & 0 & 0 & D^T & C+A+I_5 & D & 0 & 0 \\
0 & 0 & 0 & 0 & D^T & C+A & D & 0 \\
0 & 0 & 0 & 0 & 0 & D^T & C+A & D \\
0 & 0 & 0 & 0 & 0 & 0 & D^T & C
\end{pmatrix}

y_{1:8} = \begin{pmatrix} \hat{y}_{1|1}+a & c+a & c+a & c+a & c+a+i_5 & c+a & c+a & c \end{pmatrix}^T

(7.43)

where the observation information I5 appears as an addition on the diagonal of Y corresponding to the state at the observed time.
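
As an illustration of this additive structure, the banded system of Equations 7.42 and 7.43 can be assembled programmatically. The Python/numpy sketch below uses illustrative names: I_dyn and i_dyn are the per-pair blocks of Equation 7.41 (here assumed identical for every step), and diag_terms holds any prior or observation information to add on a particular timestep's diagonal block.

```python
import numpy as np

def trajectory_information(n_states, dim, I_dyn, i_dyn, diag_terms):
    # Assemble the banded trajectory information matrix and vector (Equations 7.42 and 7.43).
    N = n_states * dim
    Y = np.zeros((N, N))
    y = np.zeros(N)
    # Add the dynamic-model block for each successive pair (k, k+1).
    for k in range(n_states - 1):
        s = slice(k * dim, (k + 2) * dim)
        Y[s, s] += I_dyn
        y[s] += i_dyn
    # Priors and observations add on their own timestep's diagonal block.
    for k, (I_k, i_k) in diag_terms.items():
        s = slice(k * dim, (k + 1) * dim)
        Y[s, s] += I_k
        y[s] += i_k
    return Y, y
```

For example, with n_states = 8 and diag_terms containing a prior at index 0 and an observation at index 4, this reproduces the pattern of Equation 7.43.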

To propagate the trajectory state system forward in time (maintaining a fixed duration trajectory), there are two steps:

1.  Augmenting the system with the additional timestep. This requires expanding the state vector for the new timestep (k + 1) and adding the dynamic model information I_k^{dyn} and i_k^{dyn}.

2.  Marginalizing away the earliest timestep.

The system following propagation by one timestep is now given by

Y_{2:9} = \begin{pmatrix}
A+Y_{2|1} & D & 0 & 0 & 0 & 0 & 0 & 0 \\
D^T & C+A & D & 0 & 0 & 0 & 0 & 0 \\
0 & D^T & C+A & D & 0 & 0 & 0 & 0 \\
0 & 0 & D^T & C+A+I_5 & D & 0 & 0 & 0 \\
0 & 0 & 0 & D^T & C+A & D & 0 & 0 \\
0 & 0 & 0 & 0 & D^T & C+A & D & 0 \\
0 & 0 & 0 & 0 & 0 & D^T & C+A & D \\
0 & 0 & 0 & 0 & 0 & 0 & D^T & C
\end{pmatrix}

y_{2:9} = \begin{pmatrix} a+\hat{y}_{2|1} & c+a & c+a & c+a+i_5 & c+a & c+a & c+a & c \end{pmatrix}^T

(7.44)

where Y2|1 and y2|1 are

Y_{2|1} = C - D^T (Y_{1|1} + A)^{-1} D

(7.45)

= Q^{-1} - Q^{-1} F (Y_{1|1} + F^T Q^{-1} F)^{-1} F^T Q^{-1}

(7.46)

\hat{y}_{2|1} = c - D^T (Y_{1|1} + A)^{-1} (\hat{y}_{1|1} + a)

(7.47)

= Q^{-1} B u + Q^{-1} F (Y_{1|1} + F^T Q^{-1} F)^{-1} (\hat{y}_{1|1} - F^T Q^{-1} B u)

(7.48)

Y_{2|1} is actually the same expression as the predicted Y_{k+1|k} in Equation 7.35. This is proven in the appendix. The earlier prior information, Y_{1|1}, clearly resides in a nonadditive form, in the expression D^T(Y_{1|1} + A)^{-1}D.

In summary:

•  The augmentation process, which extends the system to further timesteps, continues the same sparse banded pattern in the information matrix.

•  The fusion of observations within the duration of the trajectory states is a straightforward addition in the information matrix and vector.

•  Marginalization of the earliest timestep in a succession of trajectory states follows the same pattern as for information filtering prediction, leaving any observations or prior information in the removed timestep k in a nonadditive form.

7.4.2    DYNAMICS IN DECENTRALIZED DATA FUSION

The previous section discussed the state dynamics generally, resulting in the formation of a trajectory state system. The key advantage of using a sequence of trajectory states is that for observations of states within the trajectory states, the observation information is additive, just as for observations of a static state. This additivity of observation information applies regardless of the timing or sequence of observations, as long as the state at the observed time exists in the current set of trajectory states.

This section describes the application of the trajectory state approach for handling timing issues in DDF. In particular, we consider the following problem cases:

•  Delayed and asequent data fusion, in which an observation from an earlier time becomes available after a prediction step has been performed (delayed) or after other data have been fused for later times (asequent). Delayed and asequent observations usually refer to local sensor node observations.

•  Burst communication, which occurs when decentralized communications is resumed after a period of interruption. The communications that occurs after the interruption is referred to as burst communication, since it aims to deliver a large amount of information in a short time (or single message) to re-establish agreement between the nodes. Burst communications can also be thought of as delayed/asequent fusion across multiple decentralized nodes.

The key issue behind the aforementioned difficulties is the additivity of observation and communicated information, and the fact that states which have been replaced by predictions cannot then be fused additively with other, separately predicted terms. This is explained in further detail next.

7.4.2.1    Common Process Noise Problem

The underlying issue of concern is the common process noise problem, so called because separately predicted terms ignore their common use of the same process noise. This can also be expressed as the problem that the fusion of predicted information is unequal to the prediction of fused information:

\mathrm{Predict}(\mathrm{Fuse}(A,B)) \neq \mathrm{Fuse}(\mathrm{Predict}(A),\ \mathrm{Predict}(B))

(7.49)

Consider a case where a fused estimate is predicted forward. The fused estimate at time k is obtained as the sum of two independent information terms, e.g., Yk,a and Yk,b.

Y_k = Y_{k,a} + Y_{k,b}

(7.50)

The correct expression for the predicted information for time k + 1 requires the prediction of the sum

Y_{k+1}^{\mathrm{exact}} = \mathrm{Predict}(Y_k)

(7.51)

= \left\{ F Y_k^{-1} F^T + Q \right\}^{-1}

(7.52)

If, however, the term Yk,a has already been predicted forward, a common approximation to Yk+1 is to take

Y_{k+1}^{\mathrm{approx}} = \mathrm{Predict}(Y_{k,a}) + \mathrm{Predict}(Y_{k,b})

(7.53)

= \left\{ F Y_{k,a}^{-1} F^T + Q \right\}^{-1} + \left\{ F Y_{k,b}^{-1} F^T + Q \right\}^{-1}

(7.54)

The approximate form is not generally equal to the exact form, Y_{k+1}^{approx} ≠ Y_{k+1}^{exact}. The approximate form ignores the fact that there is only one underlying process; hence, the two prediction instances share common process noise, v (with E[vv^T] = Q). To consider the approximation further, consider a simpler worst case where Y_a = Y_b:

\frac{Y_{k+1}^{\mathrm{approx}}}{Y_{k+1}^{\mathrm{exact}}} = \frac{\left\{ F Y_a^{-1} F^T + Q \right\}^{-1} + \left\{ F Y_b^{-1} F^T + Q \right\}^{-1}}{\left\{ F (Y_a + Y_b)^{-1} F^T + Q \right\}^{-1}}

(7.55)

= \left\{ I + 2a \right\} \left\{ I + a \right\}^{-1}

(7.56)

where

a = F^{-T} Y_a F^{-1} Q = \left( F Y_a^{-1} F^T \right)^{-1} Q

(7.57)

for which it can be seen that

\lim_{a \to 0} \frac{Y_{k+1}^{\mathrm{approx}}}{Y_{k+1}^{\mathrm{exact}}} = I, \qquad \lim_{a \to \infty} \frac{Y_{k+1}^{\mathrm{approx}}}{Y_{k+1}^{\mathrm{exact}}} = 2I

(7.58)

Y_{k+1}^{\mathrm{exact}} \leq Y_{k+1}^{\mathrm{approx}} \leq 2\, Y_{k+1}^{\mathrm{exact}}

(7.59)

This shows that Y_{k+1}^{approx} is always overconfident: it remains close to Y_{k+1}^{exact} for small Q, but for large Q it can approach 2Y_{k+1}^{exact} in the worst case. Predicting Y_{k,a} and Y_{k,b} independently is equivalent to claiming that there are two independent process models available.
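
The behavior in Equations 7.55 through 7.59 can be reproduced with a small scalar sketch (Python; the values are illustrative only):

```python
F, Q = 1.0, 0.5          # scalar state transition and process noise
Ya = Yb = 2.0            # two independent information terms at time k

def predict(Y):
    # Scalar information prediction: {F Y^-1 F^T + Q}^-1
    return 1.0 / (F * (1.0 / Y) * F + Q)

exact = predict(Ya + Yb)             # Predict(Fuse(A, B))
approx = predict(Ya) + predict(Yb)   # Fuse(Predict(A), Predict(B))

# approx > exact: the separately predicted terms are overconfident,
# and the ratio lies between 1 and 2 as in Equation 7.59.
print(exact, approx, approx / exact)
```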

7.4.2.2    Delayed and Asequent Observations

A delayed observation occurs when an observation from an earlier time becomes available after a prediction step has been performed [19]. The problem of delayed observations can occur in any form of estimator, not only decentralized estimators. Delayed observations can occur as a result of processing and/or communication delays before observations are available at the estimator.

An asequent observation occurs when an observation from an earlier time becomes available after other data have been fused for later times [19]. Asequent observations may occur if multiple sensors are used locally on a single node, and these sensors have differing observation delays. The case of asequent observations occurring on distinct decentralized nodes is similar, but since it involves the communication aspect it is more similar to the burst communications case discussed later.

The problem with delayed or asequent data fusion is that once the estimator has predicted the local state forward to time k + 1, the (late) incoming information for time k needs to be considered. If the late arriving information is predicted forward separately, the common process noise problem applies (as discussed in Section 7.4.2.1) and the result will be approximate and overconfident.

The problem with delayed and asequent data fusion is basically caused by the filter architecture destructively predicting estimates forward. That is, applying the prediction equations in a way that replaces a local estimate.

The proposed solution instead applies the trajectory state approach to avoid destructively predicting estimates until after a window of time has passed, while still obtaining correct current-time filter estimates given all available past observations.

The trajectory information matrix is constructed as in Section 7.4.1:

Y_{k:k+4} = \begin{pmatrix}
A + Y_{k|k-1} + I_k & D & 0 & 0 & 0 \\
D^T & C+A+I_{k+1} & D & 0 & 0 \\
0 & D^T & C+A+I_{k+2} & D & 0 \\
0 & 0 & D^T & C+A+I_{k+3} & D \\
0 & 0 & 0 & D^T & C+I_{k+4}
\end{pmatrix}

(7.60)

where Yk:k+4 is written with observation information on each timestep, indicating how current, delayed, and/or asequent observations can be fused additively in the trajectory information matrix and vector at their appropriate timestep, as long as that timestep is available within the trajectory state system.

Given the trajectory state system, the estimate solution for the current (latest) timestep will be equivalent to a filtered solution, correctly accounting for the late and asequent observations. Methods for obtaining the solution are discussed in Sections 7.4.2.4 through 7.4.2.6.

The trajectory state approach with N timesteps of trajectory states defers the destructive prediction of the earliest state by N timesteps, allowing delayed and asequent observations within that duration. However, very late observations, from more than N timesteps ago, are still subject to the same common process noise problem, which prevents their use. Such very late observations are expected to occur less frequently and to be less informative to the present estimate, and should be discarded (which is conservative). The intention of the trajectory state method is to choose N such that the system still benefits from the slightly delayed observations, which are both very likely to occur and very beneficial to the present estimate.

7.4.2.3    Burst Communications

Burst communication occurs when decentralized communications are resumed after a period of interruption. The communications that occurs after the interruption is referred to as burst communication since it aims to deliver a large amount of information in a short time (or single message) to re-establish agreement between the nodes.

The problem with burst communications occurs when the estimator predicts the local state forward during a period of interrupted communications. The problem is that other decentralized nodes will also perform the same prediction on their local estimates. When the nodes re-connect and communicate, the common process noise problem arises, since the information from each node will have been separately predicted.

The problem with burst communications is again caused by the filter architecture destructively predicting estimates forward. That is, applying the prediction equations in a way that replaces a local estimate.

The proposed solution, as for asequent observations, involves using trajectory states in order to maintain a window of some duration in which communications can be late, but still fuse additively into states in the trajectory window. Estimates for the current time can still be obtained from the system, conditioned on all the available past observations.

Referring to Equation 7.60, the decentralized system can communicate the diagonal matrix consisting of the Ik blocks:

I_{k:k+4} = \begin{pmatrix}
I_k & 0 & 0 & 0 & 0 \\
0 & I_{k+1} & 0 & 0 & 0 \\
0 & 0 & I_{k+2} & 0 & 0 \\
0 & 0 & 0 & I_{k+3} & 0 \\
0 & 0 & 0 & 0 & I_{k+4}
\end{pmatrix}

(7.61)

This becomes equivalent to a sequence of static decentralized problems, one for each timestep. The band structure corresponding to the dynamic model can be applied locally at each node.

In normal operation, with frequent communications, the nodes transmit their current I_k block. But if communications are blocked for an interval of time and later resumed, the resulting burst communication contains the diagonal blocks I_{j:k} of fused observations for the blocked interval.
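
Concretely, a burst message can be thought of as the collection of per-timestep diagonal blocks accumulated during the outage. A minimal Python sketch with hypothetical data structures (own_obs keyed by timestep, channel_cache keyed by (neighbor, timestep) pairs) could be:

```python
def burst_message(own_obs, j_start, k):
    # Collect the per-timestep observation blocks (I_j, i_j) accumulated while the link was down.
    return {j: own_obs[j] for j in range(j_start, k + 1) if j in own_obs}

def receive_burst(channel_cache, neighbor, burst):
    # Store each received block against that neighbor and timestep; it then adds on the
    # diagonal block for time j of the trajectory system, as a non-delayed message would.
    for j, (I_j, i_j) in burst.items():
        channel_cache[(neighbor, j)] = (I_j, i_j)
```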

The methods for obtaining the solution are discussed in the following sections.

7.4.2.4    Solution Using Trajectory States

The cases of delayed and asequent observations and burst communications presented earlier can be addressed using a set of trajectory states. This section focuses on how to solve the resulting trajectory state system:

Y_{k-n:k} = \begin{pmatrix}
A + Y_{k-n|k-n-1} + I_{k-n} & D & 0 & 0 & 0 \\
D^T & C+A+I_{k-n+1} & D & 0 & 0 \\
0 & D^T & \ddots & D & 0 \\
0 & 0 & D^T & C+A+I_{k-1} & D \\
0 & 0 & 0 & D^T & C+I_k
\end{pmatrix}

(7.62)

y_{k-n:k} = \begin{pmatrix} a + \hat{y}_{k-n|k-n-1} + i_{k-n} \\ a + c + i_{k-n+1} \\ \vdots \\ a + c + i_{k-1} \\ c + i_k \end{pmatrix}

(7.63)

The system in Equation 7.62 is a block tridiagonal sparse linear system. Such a system can be solved very efficiently in O(n) time, for n trajectory states [11]. It is also possible to obtain smoothing estimates for the duration of the trajectory states by solving the joint system fully. This basically corresponds to solving for the latest estimate as described later, together with back-substitution for the smoothed estimates of the earlier states.

7.4.2.5    Filtering the Trajectory State System

The solution process for the filtered estimate of a trajectory state system is very similar to an online filtering process. In that case, the dynamic model in the trajectory state system need only be defined implicitly, leaving only the diagonal blocks (observations and prior) to be explicitly stored. So the current filtering estimate can be obtained basically by running the information filtering prediction cycles, starting from the prior information in the start of the trajectory and using the stored fused observation information at each time. Note that we only need to run this when requiring an estimate for the present state. Multiple observations can be added into the trajectory system without requiring this solve process. This approach is less general than the next, which will be described in greater detail.

7.4.2.6    Filtering with Stored Filter Estimates

In most cases, it is likely that observations and decentralized communications will arrive with only a small delay, and thus only affect the latter part of the trajectory state system. In that case, it is inefficient to process the entire trajectory state system for its whole duration. Also, re-processing only the affected portion of the trajectory may allow a longer trajectory system to be used. When an estimate of the present state is required, it is only necessary to process forward from timestep k − n to the present, where timestep k − n is the earliest changed state in the trajectory. (Changes include local observations and decentralized communications.) In this way, the computational cost of re-processing following delayed or asequent observations or newly arrived burst communications depends on how far back the change occurs, such that the normal case of short delays proceeds with little overhead.

This method is similar to that described for asequent data fusion in Ref. [19]. The difference is that we store the observations Ik in the trajectory state approach, not only the filtered estimates. This allows the handling of burst communications and also simplifies the case for asequent observations.

This method corresponds to filtering, but stores in memory the filtered estimates for a few key timesteps in the trajectory duration. The system needs to store Ij for each timestep, and Yj|j−1 and yj|j−1 for a few key timesteps.

The I_j terms for each j and the filtered Y_{k−n|k−n−1} for the earliest trajectory state timestep, k − n, are statistically independent of each other. These are regarded as "source" information. The stored filter estimates Y_{j|j−1} for the other timesteps are not independent of each other, nor of the I_j or Y_{k−n|k−n−1} terms. These are to be regarded as a computational aid, storing partial results. These filter estimates Y_{j|j−1} could instead be re-processed from the initial Y_{k−n|k−n−1} and the observation information terms I_j.

This occurs as follows:

1.  The forward filtering can start from time k − n, where timestep k − n is the earliest changed state in the trajectory, or wherever starting or stored prior information exists. Starting from time k − n, define a current information matrix and vector of the size of the state at a single time:

Y_c = Y_{k-n|k-n-1}, \qquad y_c = \hat{y}_{k-n|k-n-1}

(7.64)

2.  For each time j = k − n, ..., k:

a.  Fuse observations for time j:

Y_c = Y_{j|j} = Y_{j|j-1} + I_j, \qquad y_c = \hat{y}_{j|j} = \hat{y}_{j|j-1} + i_j

(7.65)

b.  Predict the current Yj|j and yj|j to time j + 1 using Equation 7.37 (except at the last time k):

Y_c = Y_{j+1|j} = M - M G \Sigma^{-1} G^T M

(7.66)

y_c = \hat{y}_{j+1|j} = \left[ I - M G \Sigma^{-1} G^T \right] F^{-T} \hat{y}_{j|j} + Y_c B u

(7.67)

\left( \Sigma = G^T M G + Q^{-1} \right)

(7.68)

\left( M = F^{-T} Y_{j|j} F^{-1} \right)

(7.69)

c.  The resulting Yc = Yj+1|j and yc = yj+1|j can be stored if desired, so processing can resume from time j + 1 later.

3.  The resulting Y_c = Y_{k|k} and y_c = \hat{y}_{k|k} is the filtered information for the state at time k. The estimate \hat{x}_k is obtained from

\hat{x}_k = Y_{k|k}^{-1} \hat{y}_{k|k}

(7.70)
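
A minimal sketch of this stored-observation forward pass is given below (Python/numpy, illustrative names). For brevity the prediction step uses the covariance-form equivalent of Equations 7.66 through 7.69, namely Y_{j+1|j} = {F Y_{j|j}^{-1} F^T + Q}^{-1}, rather than the M, Σ form.

```python
import numpy as np

def filter_trajectory(Y0, y0, obs, F, Qbar, B, u):
    # Y0, y0 : prior information at the earliest (re)start timestep.
    # obs    : list of (I_j, i_j) stored observation information, one entry per
    #          timestep, ordered from that timestep up to the current time k.
    Yc, yc = Y0.copy(), y0.copy()
    for step, (I_j, i_j) in enumerate(obs):
        # (a) Fuse the stored observation information for time j (Equation 7.65).
        Yc, yc = Yc + I_j, yc + i_j
        # (b) Predict to time j + 1, except at the last time k.
        if step < len(obs) - 1:
            P = np.linalg.inv(Yc)
            x_pred = F @ (P @ yc) + B @ u
            Yc = np.linalg.inv(F @ P @ F.T + Qbar)
            yc = Yc @ x_pred
    x_hat_k = np.linalg.solve(Yc, yc)   # Equation 7.70
    return Yc, yc, x_hat_k
```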


FIGURE 7.2 Illustration of combined trajectory state and channel cache–based DDF at a single node. The lower portion shows the whole system propagated forward by one timestep.

7.4.2.7    Operation of Channel Caches with Trajectory States

Figure 7.2 shows the combined trajectory state and channel cache–based DDF node. The node stores the observations I_j and i_j for each timestep k − n to k in the trajectory state window for each channel. These are the channel cache contributed information terms from the neighbors. The node similarly stores its own observations for each timestep. The node also stores the prior Y_{k−n|k−n} and \hat{y}_{k−n|k−n}. The complete trajectory state system is formed by summing all the entries for each timestep, including the dynamic model information.

To shift the combined system forward by one timestep, at the back of the trajectory state window (earliest timestep), the prior Y_{k−n|k−n} is propagated forward as in Equation 7.43. The observations I_{k−n+1} are added into Y_{k−n+1|k−n}, leaving the next prior Y_{k−n+1|k−n+1}. The observations for time k − n + 1 are then popped out of the trajectory system, so that the guaranteed conditional independence of all information terms is maintained. The front end (latest and current timesteps) of the trajectory state window is extended into timestep k + 1, ready for new observations or channel terms.

7.4.3    SUMMARY

This section presented the smoothing or trajectory state formulation of dynamic systems to formulate DDF for estimation of dynamic systems. This trajectory state formulation of dynamics was then applied to address the issues of delayed and asequent observations and burst communications in DDF.

The solution method for the trajectory state formulation requires nodes to store the fused observation information for each timestep for a finite duration, allowing for observation and communication delays. For efficient solving it is also useful to store the results of filter estimates.

7.5    K-TREE TOPOLOGIES FOR REDUNDANT AND DYNAMIC NETWORKS

The algorithms presented in Section 7.3.1 relate to tree topology networks. Exact decentralized estimation has, in the past, largely been restricted to singly connected tree networks [7,9,18,20]. The key point relating to tree networks is that the DDF problem can be reduced down to a problem of finding a global sum of information terms, which is performed in an efficient local manner on a tree network. In tree networks, there is only one path between any two nodes. This is used in the decentralized algorithm to ensure exact fusion, especially avoiding cases of double counting or rumor propagation in the network. However, the single path property of tree networks also means that tree networks are vulnerable to the failure of nodes and links, since the failure of any nonleaf node or link would leave the network in multiple disconnected pieces. Tree networks include both “star” and chain topologies as well as branching trees.

This section presents an extension beyond tree communications network topologies into so-called k-tree network topologies. The k-tree topologies are more general than tree topologies but are more specialized for scalability than arbitrary topologies. The presentation of this section is based on Ref. [24]. The k-tree is an extension beyond tree topologies, which keeps an overall strict tree-like pattern on a large scale (N nodes ≫ k), as shown in Figure 7.3f, but allows redundant, looped, dynamic topologies or other subsets of full connection within groups of at most k + 1 nodes. The costs in storage and communication grow with k but not with N, the total number of nodes in the network. The k-tree topologies are intended to be used with as small k as possible.

The motivation behind using k-tree topologies is to improve redundancy and dynamism while maintaining scalability and correctness. For redundancy and fault robustness, it is desirable to allow the network to include multiple redundant paths such that some links or nodes can fail without disconnecting the network topology. It is also desirable to improve dynamism so that some topology changes are able to allow for link failures and re-connections, especially for mobile decentralized networks. The dynamic topology capability is closely related to the link redundancy capability, because once the algorithm is capable of handling multiple paths redundantly, then the network can pick and choose among them dynamically. These capabilities are obtained while ensuring the scalability and correctness of the DDF network.

The k-tree topology is used to define an allowable topology; the allowable set of links for the decentralized network. Once this k-tree allowable topology is established, it defines which nodes can communicate on which links and establishes what each node needs to store and communicate in order to ensure correct and exact DDF, as will be described later in this section. This use of a defined restricted topology is similar to how spanning-tree algorithms can be used to define an allowable tree of links in an otherwise unstructured network for DDF [16,18]. A spanning-tree can then be used with tree-topology decentralized networks (as in Section 7.3.1), but these do not offer a simple, exact method for handling the data fusion aspects of changing topology or dealing with link failures.


FIGURE 7.3 Example k-tree topologies. Each line is an allowable decentralized communication link, each vertex is a decentralized node. (a) A complete two-tree topology, (b) a mixed one-two-tree topology, (c) a one-tree topology over the same nodes, (d) a ring topology (black lines) is a subset of a two-tree (gray dashed lines), (e) a complete three-tree network, (f) a larger two-tree example showing the broad scale tree topology for N ≫ k.

In a k-tree allowable topology (and using the k-tree algorithms presented here), nodes can communicate dynamically on all/any links within the k-tree topology, even if there are multiple redundant paths or loops and links fail and reconnect unpredictably.

A k-tree allowable topology becomes an arbitrary, unrestricted topology as k approaches N, the number of nodes. This would allow completely arbitrary dynamic and redundant decentralized communications, but would however result in expensive storage and communication.

The treewidth of a graph is well known for its role in limiting the complexity of algorithms in graph theory [3,13,14], graphical models [5], and sparse linear algebra [21]. Given the strong effect of the treewidth on the complexity of the algorithms and network, we considered generalizations of one-tree topologies into k-tree topologies, focusing in particular on the next-highest k, k = 2, since it is the simplest topology that demonstrates the novel properties of the k-tree approach. It is notable that arbitrarily large ring networks can be expressed as a two-tree network. Example k-tree topologies are shown in Figure 7.3.

A complete k-tree graph is made up of cliques of k + 1 nodes [13,14]. Each adjacent pair of cliques overlaps at k nodes (a junction or separator). The overall graph of connections between the cliques is a tree. A k-tree graph has treewidth of k, so called because the separators are made of k nodes.

Table 7.5 shows the number of links in various k-tree topologies compared with those of a fully connected topology. This shows that the number of k-tree links grows as O(n^2) up until n = k + 1 (when the first clique of k + 1 nodes is formed), after which each additional node adds only k extra links. Hence, the number of k-tree links ultimately grows as O(n). By contrast, the fully connected topology always grows as O(n^2).


7.5.1    DECENTRALIZED DATA FUSION ON K-TREES

The decentralized algorithm defines what each node needs to store and communicate such that each node can obtain the global fused information. The algorithm actively limits the data sizes communicated and stored, leading to the scalable performance of the system. The goal of the topology and message passing is to produce a set of terms pi(x), such that the fusion of these is a consistent estimate for the state x:

p(x) = \frac{1}{c} \prod_i p_i(x)

(7.71)

These pi(x) are probabilities which are conditionally independent of each other given x, or equivalently, that they have independent errors.

As shown earlier, it is convenient to express this as a sum of information terms:

Y = \sum_i Y_i

(7.72)

y = \sum_i y_i

(7.73)

7.5.2    DATA-TAGGING SETS

The approach used here guarantees against double counting of information by using explicit “data-tagging” sets. A data-tagging set is a set of separate information terms, Yi, each with a unique identifier. Each data-tagging set stores only conditionally independent terms, so Equation 7.72 can be used on all items in a data-tagging set to recover a consistent fused estimate. Fusion of two or more sets is performed as a set union followed by Bayesian fusion (Equation 7.72). The set union step identifies any terms with matching labels and ensures that these are counted only once in the Bayesian fusion. Thus data-tagging avoids double counting of information.
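
A data-tag set can be sketched as a mapping from identifiers to information terms; the set union then keeps exactly one copy per identifier, and the additive fusion of Equations 7.72 and 7.73 counts each term once. A minimal Python sketch (illustrative names) follows.

```python
def union_tag_sets(set_a, set_b):
    # Union of two data-tag sets: a term with a matching identifier is kept once, not twice.
    merged = dict(set_a)
    merged.update(set_b)
    return merged

def fuse_tag_set(tag_set):
    # Bayesian fusion of the conditionally independent terms (Equations 7.72 and 7.73).
    Y = sum(Yi for Yi, _ in tag_set.values())
    y = sum(yi for _, yi in tag_set.values())
    return Y, y
```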

The approach used here ensures scalability by summarizing every stored or communicated data-tagging set into a minimal size. This summarization process exploits the global k-tree property and uses the local topology around the sending and receiving nodes. The necessary local topology properties are guaranteed by designing the global network topology as a k-tree.

The proposed approach uses an efficient, minimal form of data-tagging. This is in contrast to the inefficient full data-tagging approach. In the full data-tagging method, each node maintains a set of independent information terms (conditionally independent of each other, given the true state), including its own sensor observations. In communicating out to any neighbor, the full set of information terms is sent. In receiving communication from a neighbor, the received set is merged (unioned) into the local set. The full data-tagging approach guarantees avoidance of double counting in arbitrary network topology and allows arbitrary dynamism, but is expensive for large-scale networks. Eventually every node's storage and every communicated set have the full list of the conditionally independent information terms arising from every other node. In the full data-tagging approach, the node storage and communication size is O(n) for n nodes in the whole network. This increasing storage and communication size limits the scalability of the network for large n. The full data-tagging approach is equivalent to a k-tree operating with k approaching n, for n nodes in the network.

The proposed k-tree approach is obtained by reducing the data-tagging sets to exploit the tree nature of the communications network. The communications and storage scheme proposed achieves correct operation in k-tree networks without using full data-tagging, thus obtaining a decentralized algorithm which is scalable in the number of nodes.

The “stack” of channel cache terms, in Section 7.3.1, Figure 7.1 is actually a minimal data-tag set for the tree network. Each node has n + 1 entries corresponding to the n neighbors and a single entry for itself.

7.5.3    SEPARATOR AND NEIGHBORHOOD PROPERTIES

Before explaining the k-tree decentralized algorithm, it is necessary to discuss some properties of the k-tree.

7.5.3.1    Separator Property

An important k-tree property is the existence of tree separators, as shown in Figure 7.4. In a k-tree any k-clique is a separator. Each separator divides the network into distinct parts. Within each part, the effect of all other parts can be summarized into the separator. Separators enable efficient summarization of entire branches of the k-tree network. Separators use the k-tree separator property: in a k-tree, if any path between any two nodes i,k passes through the separator, then all paths between nodes i,k pass through the separator.

These separators are used at the borders of the local neighborhood L to summarize the fused total of the rest of the network beyond the local neighborhood. For example in Figure 7.4, the total information in each half can be expressed as

Image

FIGURE 7.4 Illustration of the separator property. In a k-tree any k-clique is a separator. Each separator divides the network into two parts. In this figure, bd is the separator. The two parts and the intersection are shown. Within each part, the effect of the other part can be summarized into the separator.

Y_{\mathrm{rhs}} = \left\{ Y_{\mathrm{interior}} \right\} + \left[ Y_{\mathrm{separator}} \right]

(7.74)

= \left\{ Y_b + Y_e + Y_h + Y_d + Y_f \right\} + \left[ Y_{bd} \right]

(7.75)

Y_{\mathrm{lhs}} = \left\{ Y_a + Y_b + Y_c + Y_d + Y_g \right\} + \left[ Y_{bd} \right]

(7.76)

where Ybd represents information in the separator bd.

The identifiers in the data-tagging sets are used to identify which node or branch of the tree network the information originates from. This means that the identifier should be a set of node labels, to allow reference to one neighbor, or k neighbors on a k-tree branch separator.

7.5.3.2    Local Neighborhood Property

A consequence of the separator property is that the local neighborhood around a node becomes a sufficient representation for that node’s interaction with the whole rest of the network. The k-tree networks allow an efficient decentralized and local neighborhood representation to serve as the only required topology awareness at the nodes. This is important for scalability, allowing the representation of a global network with only small local neighborhood representations. The local neighborhood is therefore an important data structure used in the algorithm proposed in this chapter.

At any node, Vi, the local neighborhood subgraph consists of Vi, the neighbors of Vi and the links and cliques between them, as shown in Figure 7.5.

The local neighborhood representation is motivated by the k-tree “junction path covering property”: in a k-tree, if any path between any two nodes i,k passes through the local neighborhood of a node j, then all paths between nodes i,k pass through the local neighborhood of j.

This junction path covering property means that the local neighborhood around a node j has control over how any messages can pass from one side to the other. The local neighborhood encodes which neighbors to communicate with, which information terms must be maintained separately in data-tag sets (for correctness), and which terms can be fused into others (for scalability).


FIGURE 7.5 Illustration of the local neighborhood representation, L. In k-tree networks, the local neighborhood L is an efficient local summary of the relevant parts of the global topology: (a) global network topology, (b) local neighborhood representations, L, at a, b, e respectively.

For one-tree topologies used in prior works, the local neighborhood representation is simply the list of neighboring vertices and list of the corresponding edges to those neighbors.

7.5.4    K-TREE COMMUNICATIONS ALGORITHM

This section explains the decentralized communications algorithm for k-trees. We explain the algorithm in the case that the complete k-tree is present. Note, however, that the full set of links is not required.

The algorithm will be described by referring to the sending node, Vt (“transmitting vertex”) and the receiving node Vd (“destination vertex”). The transmitting node knows the topology of the allowable links within its own neighborhood of the allowable k-tree topology, denoted as L. The sending node has an existing data-tag set. The objective of the algorithm is to calculate a reduced data-tag set to send to the destination, Vd.

The communications algorithm is simply stated as follows:

•  The data-tag set is reduced into the intersection of the local and destination neighborhoods.

The communications algorithm is given in Algorithm 1 and illustrated in Figure 7.6. In step 1, the algorithm initially copies the local data-tag set to the output data-tag set. This corresponds to the full data-tagging solution. The subsequent steps erase and/or summarize some of the entries, thus ensuring scalability. For step 2, data-tag terms involving the destination vertex are redundant and can be explicitly deleted. Step 3 eliminates any data-tag terms that do not involve neighbors of the destination vertex. This is explained in the following section.


FIGURE 7.6 Summary of the communications algorithm. The local neighborhood at the source node is summarized into the neighborhood separator for the destination. This summarized separator set is sent to the destination and merged into the local set. (a) The full network, highlighting neighborhoods of Vh and Vm and their intersection. (b) At Vh the network beyond the immediate neighbors is already summarized within the neighborhood. (c) To prepare a communication set, the source Vh can summarize its local neighborhood set into the intersection with the destination neighbor Vm (resulting in the left hand set). This set is communicated to Vm. Vm keeps the union of the received set (left) with its local set (right).


7.5.4.1    Data-Tag Set Elimination

This section explains the elimination (marginalization) process in Algorithm 1, which summarizes nonlocal information.

Elimination proceeds at each step by eliminating a so-called leaf vertex, which reduces the size of the data-tag set. A leaf vertex in an ordinary tree is any vertex with exactly one edge. More generally, there are k-tree leaves, defined as follows: a vertex which is part of exactly one clique of k + 1 vertices is a k-tree leaf.
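
As a minimal sketch (assuming cliques are stored as frozensets of vertex labels, and using the hypothetical helper name is_ktree_leaf), the leaf test can be written as:

```python
from typing import FrozenSet, Set

def is_ktree_leaf(v: str, cliques: Set[FrozenSet[str]]) -> bool:
    """A vertex is a k-tree leaf if it belongs to exactly one (k + 1)-clique."""
    return sum(1 for c in cliques if v in c) == 1

# Example 2-tree built from the 3-cliques {a,b,e}, {b,c,e}, {c,d,e}:
# 'a' and 'd' are 2-tree leaves; 'b', 'c', 'e' are not.
cliques = {frozenset("abe"), frozenset("bce"), frozenset("cde")}
print([v for v in "abcde" if is_ktree_leaf(v, cliques)])   # ['a', 'd']
```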

Eliminating a k-tree leaf reduces its former clique of k + 1 vertices to a clique of k vertices. To eliminate a vertex v in a (k + 1)-clique:

•  The result data-tag term r is the term whose identifier contains the k node labels of the neighbors of v.

•  For each data-tag term t whose identifier contains v: add t into r and remove t from the set.

Examples of leaf vertex elimination for k ≤ 3 are shown in Table 7.6.
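
A minimal sketch of this elimination step is shown below, under the simplifying assumptions that data-tag identifiers are frozensets of node labels and that the information terms are scalars (in practice they are information matrix and vector pairs); the function name eliminate_leaf is illustrative.

```python
from typing import Dict, FrozenSet, Set

# Identifier -> information term. Scalars stand in for the information
# matrix/vector pairs that would be stored in practice.
DataTagSet = Dict[FrozenSet[str], float]

def eliminate_leaf(v: str, neighbors_of_v: Set[str], tags: DataTagSet) -> None:
    """Eliminate k-tree leaf v: fuse every data-tag term whose identifier
    contains v into the result term r identified by the k neighbors of v."""
    r = frozenset(neighbors_of_v)          # identifier of the result term
    tags.setdefault(r, 0.0)
    for ident in [i for i in tags if v in i]:
        tags[r] += tags.pop(ident)         # add t into r and remove t from the set

# Example: eliminate leaf 'a', whose k = 2 neighbors in its single clique are {b, e}.
tags = {frozenset({"a"}): 1.0, frozenset({"a", "b"}): 2.0, frozenset({"b", "e"}): 3.0}
eliminate_leaf("a", {"b", "e"}, tags)
print(tags)   # {frozenset({'b', 'e'}): 6.0}
```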

7.5.5    LINK AND NODE FAILURE ROBUSTNESS

The key properties of the proposed approach are correct fusion, scalability, and robustness against node and link loss. These properties are achieved simultaneously by using the bounded-treewidth (k-tree) network topology.

ALGORITHM 1: K-TREE DECENTRALIZED COMMUNICATIONS

Compute the communication output data-tag set to send

Input: L: a copy of the local neighborhood graph

Input: Vt: this transmitting node in L

Input: Vd: the destination neighbor in L

Input: localTags: the local data-tag set

Result: destTags: the output data-tag set to send

1. Starting case: No summarization:

Copy destTags ← localTags

2. Delete terms involving Vd:

Erase term Vd from destTags

Erase any terms for Vd separators from destTags

3. Summarize away parts not local to Vd:

Determine the region to summarize out, S:

S is all vertices in L except Vd and its neighbors

while S is not empty do

Find a leaf vertex V1 of L in S

Eliminate V1, updating destTags

Erase V1 from S
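
The following is a minimal Python sketch of Algorithm 1 under simplifying assumptions: the local neighborhood L is given as a set of (k + 1)-cliques (frozensets of node labels), data-tag terms are scalars rather than information matrix/vector pairs, and the function name communication_set is illustrative.

```python
from typing import Dict, FrozenSet, Set

Clique = FrozenSet[str]
DataTagSet = Dict[FrozenSet[str], float]   # identifier -> information term (scalar here)

def communication_set(cliques: Set[Clique], Vt: str, Vd: str,
                      local_tags: DataTagSet) -> DataTagSet:
    """Sketch of Algorithm 1: reduce the data-tag set of the sender Vt into the
    intersection of its local neighborhood L with the neighborhood of Vd.
    (Vt is implicit: it owns L and local_tags; it is kept to mirror the inputs.)"""
    L = set(cliques)                       # working copy of the local neighborhood graph
    # Step 1: start from the full local data-tag set (no summarization).
    dest_tags = dict(local_tags)
    # Step 2: delete terms involving Vd (redundant at the destination).
    for ident in [i for i in dest_tags if Vd in i]:
        del dest_tags[ident]
    # Step 3: summarize away the region S of L that is not local to Vd.
    near_Vd = {v for c in L if Vd in c for v in c}
    S = {v for c in L for v in c} - near_Vd - {Vd}
    while S:
        # A k-tree leaf of L inside S: a vertex lying in exactly one clique.
        leaf = next(v for v in S if sum(1 for c in L if v in c) == 1)
        (clique,) = [c for c in L if leaf in c]
        r = clique - {leaf}                # result identifier: the k neighbors of the leaf
        dest_tags.setdefault(r, 0.0)
        for ident in [i for i in dest_tags if leaf in i]:
            dest_tags[r] += dest_tags.pop(ident)   # fuse terms containing the leaf into r
        L.discard(clique)                  # the (k + 1)-clique reduces to the k-clique r;
        if not any(r <= c for c in L):     # keep L as a set of maximal cliques
            L.add(r)
        S.discard(leaf)
    return dest_tags

# Example: 2-tree with 3-cliques {a,b,e}, {b,c,e}, {c,d,e}; node e prepares a set for c.
cliques = {frozenset("abe"), frozenset("bce"), frozenset("cde")}
tags = {frozenset("e"): 5.0, frozenset("a"): 1.0, frozenset("be"): 2.0,
        frozenset("c"): 3.0, frozenset("de"): 4.0}
print(communication_set(cliques, "e", "c", tags))
# {frozenset({'e'}): 5.0, frozenset({'b', 'e'}): 3.0, frozenset({'d', 'e'}): 4.0}
```

The returned set corresponds to the intersection of the source and destination neighborhoods, as illustrated in Figure 7.6.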

The proposed method is robust against link and node failures simply because it can send information terms along multiple paths. This still yields correct and consistent fusion, since the method uses data-tagging to avoid double counting, and hence the need for conservative fusion. Furthermore, the method still yields a scalable solution for large networks, since the multiple-path communication and data-tagging are only performed within the nodes and separators of the (k + 1)-cliques.

Figure 7.7 shows the pattern of communication of individual information terms in the data-tag sets. In various cases, there are multiple sources redundantly communicating the same term. The receiving node always stores each incoming information term into its corresponding data-tag set entry. The receive process has no effect other than storing the information, so it is acceptable to receive the same term multiple times over different paths.
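
A minimal sketch of this receive step is shown below (again with scalar terms standing in for information matrix/vector pairs; the function name receive is an assumption). Because receipt simply stores the term under its identifier, duplicate deliveries over different paths have no effect.

```python
from typing import Dict, FrozenSet

DataTagSet = Dict[FrozenSet[str], float]   # identifier -> information term (scalar here)

def receive(local_tags: DataTagSet, incoming: DataTagSet) -> None:
    """Store each incoming term into its data-tag set entry. Because the term is
    stored (not added), receiving the same term again over another path is harmless."""
    for ident, term in incoming.items():
        local_tags[ident] = term

local = {frozenset({"a"}): 1.0}
msg = {frozenset({"b"}): 2.0, frozenset({"b", "c"}): 0.5}
receive(local, msg)
receive(local, msg)   # duplicate delivery over a second path: no effect
print(local)          # {frozenset({'a'}): 1.0, frozenset({'b'}): 2.0, frozenset({'b', 'c'}): 0.5}
```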

7.5.6    SUMMARY

This section presented an algorithm for scalable DDF based on k-tree topologies. The k-tree topologies are more densely connected than 1-trees, but still have an overall sparse (k-)tree structure, which gives scalability for large networks. The k-tree topologies have some redundancy in the topology, which makes them more robust to node or link failures than 1-tree topologies. The k-tree topologies also allow dynamic changes to the communications topology within subsets of links in the k-tree. Finally, k-tree topologies transition into the fully connected topology and the fully data-tagged decentralized algorithm as k increases to N, the number of nodes in the network. Thus k-trees allow a trade-off via k between the tree-based approaches (k ≪ N) and the unstructured approaches (k → N).


FIGURE 7.7 Diagrams showing the individual data-tag terms which would be stored and communicated in the given topology. At each node the diagram shows the stored terms at that node (cluster of labels), including its own independent information (circled labels). Each arrow indicates the communication of an individual data-tag term. Terms which originate from each node are shown in different shades. Communications which result from the fusion of multiple terms are shown with dashed lines. Communication is strictly with nearest neighbors only, but the sum of all data-tag terms at each node equals the global sum of independent information.

7.6    CONCLUSION

This chapter presented and reviewed methods for DDF. This chapter focused on the channel cache algorithm for DDF in tree topologies and robustness to imperfect communications. In the second part, this chapter reviewed the trajectory state formulation of dynamic systems to formulate DDF for the estimation of dynamic systems. This trajectory state formulation of DDF was applied to address the issues of delayed and asequent observations and burst communications in DDF. In the final part, this chapter extended the operation of DDF on tree topology networks into so-called k-tree topologies. The k-tree topologies are tree-like on the broad scale, which gives good scalability for large networks of nodes. The k-tree topologies allow loops, dense connections, and hence redundancy and dynamic changes among groups of up to k + 1 nodes.

Taken together, these algorithms contribute significantly toward achieving DDF that is robust to communications latencies and failures, yet still yields centralized-equivalent estimator performance and scales to larger networks.

7.A APPENDIX

7.A.1 MARGINALIZATION IN THE INFORMATION FORM

This appendix states the expressions required for marginalization in the information form. Consider an information matrix partitioned into state variables $x_{a}$ and $x_{c}$:

$$Y = \begin{pmatrix} A & B \\ B^{T} & C \end{pmatrix} \qquad \hat{x} = \begin{pmatrix} \hat{x}_{a} \\ \hat{x}_{c} \end{pmatrix} \qquad \hat{y} = \begin{pmatrix} a \\ c \end{pmatrix}$$

(7.77)

where these satisfy $Y\hat{x} = \hat{y}$.

Then the marginal information matrix $Y_{a}$ and marginal information vector $\hat{y}_{a}$, which satisfy $Y_{a}\hat{x}_{a} = \hat{y}_{a}$ (and similarly for $Y_{c}$), are

$$Y_{a} = A - B C^{-1} B^{T} \qquad \hat{y}_{a} = a - B C^{-1} c \qquad Y_{a}\hat{x}_{a} = \hat{y}_{a}$$

(7.78)

$$Y_{c} = C - B^{T} A^{-1} B \qquad \hat{y}_{c} = c - B^{T} A^{-1} a \qquad Y_{c}\hat{x}_{c} = \hat{y}_{c}$$

(7.79)
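
As a quick numerical illustration of Equation 7.78 (Equation 7.79 is analogous), the following is a sketch only, using numpy with randomly generated values, checking the information-form marginal against the corresponding block of the covariance form.

```python
import numpy as np

rng = np.random.default_rng(0)
na, nc = 2, 3
# Random joint information matrix Y (symmetric positive definite) and vector y.
R = rng.standard_normal((na + nc, na + nc))
Y = R @ R.T + (na + nc) * np.eye(na + nc)
y = rng.standard_normal(na + nc)

A, B, C = Y[:na, :na], Y[:na, na:], Y[na:, na:]
a, c = y[:na], y[na:]

# Information-form marginal over x_a (Equation 7.78).
Ya = A - B @ np.linalg.inv(C) @ B.T
ya = a - B @ np.linalg.inv(C) @ c

# Check against the covariance form: the x_a block of P = Y^{-1} equals Ya^{-1},
# and the x_a block of x_hat = Y^{-1} y equals Ya^{-1} ya.
P = np.linalg.inv(Y)
x_hat = np.linalg.solve(Y, y)
assert np.allclose(np.linalg.inv(Ya), P[:na, :na])
assert np.allclose(np.linalg.solve(Ya, ya), x_hat[:na])
```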

7.A.2 TRAJECTORY INFORMATION FORM EQUIVALENCE

As stated earlier, the trajectory information form

$$Y = \begin{pmatrix} Y_{k} + F^{T} Q^{-1} F & -F^{T} Q^{-1} \\ -Q^{-1} F & Q^{-1} \end{pmatrix} \qquad y = \begin{pmatrix} y_{k} - F^{T} Q^{-1} B u \\ Q^{-1} B u \end{pmatrix}$$

and the conventional prediction

$$P_{k+1} = F P_{k} F^{T} + G Q G^{T} \qquad \hat{x}_{k+1} = F \hat{x}_{k} + B u_{k}$$

are equivalent.

This can be shown in a few ways:

•  The joint $\hat{X}$ satisfies $Y\hat{X} = y$, with $\hat{X} = \begin{pmatrix} \hat{x}_{k} \\ \hat{x}_{k+1} \end{pmatrix} = \begin{pmatrix} \hat{x}_{k} \\ F\hat{x}_{k} + Bu \end{pmatrix}$:

$$Y\hat{X} = \begin{pmatrix} Y_{k} + F^{T} Q^{-1} F & -F^{T} Q^{-1} \\ -Q^{-1} F & Q^{-1} \end{pmatrix}\begin{pmatrix} \hat{x}_{k} \\ F\hat{x}_{k} + Bu \end{pmatrix}$$

(7.80)

$$= \begin{pmatrix} Y_{k}\hat{x}_{k} + F^{T} Q^{-1} F\hat{x}_{k} - F^{T} Q^{-1}\{ F\hat{x}_{k} + Bu \} \\ -Q^{-1} F\hat{x}_{k} + Q^{-1}\{ F\hat{x}_{k} + Bu \} \end{pmatrix}$$

(7.81)

$$= \begin{pmatrix} y_{k} - F^{T} Q^{-1} Bu \\ Q^{-1} Bu \end{pmatrix}$$

(7.82)

$$= y$$

(7.83)

•  The $x_{k}$ marginal of $Y$ is equal to the prior $Y_{k}$ and $y_{k}$. This means that augmenting a predicted state $x_{k+1}$ onto a given $x_{k}$ system (including the addition of the dynamic model information) does not alter the marginal PDF for $x_{k}$. This is shown as follows.

•  We write the $x_{k}$ marginal of $Y$ as $Y_{k}^{\mathrm{marg}}$, leaving $Y_{k}$ to mean the prior information matrix of timestep $k$:

$$Y_{k}^{\mathrm{marg}} = \{ Y_{k} + F^{T} Q^{-1} F \} - F^{T} Q^{-1}\{ Q^{-1} \}^{-1} Q^{-1} F$$

(7.84)

$$= Y_{k}$$

(7.85)

•  The $x_{k+1}$ marginal of $Y$ yields known expressions [2,19] for the prediction in covariance and information forms:

$$Y_{k+1} = \{ F P_{k} F^{T} + G Q G^{T} \}^{-1}$$

(7.86)

$$= M - MG(G^{T} M G + Q^{-1})^{-1} G^{T} M$$

(7.87)

$$(M = F^{-T} Y_{k} F^{-1})$$

(7.88)

The $x_{k+1}$ marginal of $Y$, using Equation 7.79, is

$$Y_{k+1} = Q^{-1} - Q^{-1} F\left[ Y_{k} + F^{T} Q^{-1} F \right]^{-1} F^{T} Q^{-1}$$

(7.89)

Using the matrix inversion lemma:

$$\left[ BCD + A \right]^{-1} = A^{-1} - A^{-1} B\left[ C^{-1} + D A^{-1} B \right]^{-1} D A^{-1}$$

(7.90)

With $A \equiv Q$, $B \equiv F$, $C \equiv Y_{k}^{-1}$, $D \equiv F^{T}$:

$$Y_{k+1} = \left[ F Y_{k}^{-1} F^{T} + Q \right]^{-1}$$

(7.91)

$$= \left[ F P_{k} F^{T} + G Q G^{T} \right]^{-1}$$

(7.92)

which is the inverse of the covariance form prediction equation. The step from Equation 7.91 to Equation 7.92 uses $Y_{k}^{-1} = P_{k}$ and identifies the trajectory-form $Q$ with the process noise covariance expressed in the state space, $G Q G^{T}$.

The information form prediction equation is obtained by a different use of the matrix inversion lemma, using

$A \equiv F Y_{k}^{-1} F^{T}$, $B \equiv G$, $C \equiv Q$, $D \equiv G^{T}$:

$$Y_{k+1} = \left[ F Y_{k}^{-1} F^{T} + G Q G^{T} \right]^{-1}$$

(7.93)

$$= M - MG(Q^{-1} + G^{T} M G)^{-1} G^{T} M$$

(7.94)

where

$$M = \left[ F Y_{k}^{-1} F^{T} \right]^{-1} = F^{-T} Y_{k} F^{-1}$$

Equations 7.89, 7.92, and 7.94 can also be found systematically from the following augmented system [23]:

$$\begin{pmatrix} Y_{k} & 0 & -F^{T} & 0 \\ 0 & 0 & I & 0 \\ -F & I & 0 & -G \\ 0 & 0 & -G^{T} & Q^{-1} \end{pmatrix}\begin{pmatrix} x_{k} \\ x_{k+1} \\ \nu \\ v_{k} \end{pmatrix} = \begin{pmatrix} y_{k} \\ 0 \\ B u_{k} \\ 0 \end{pmatrix}$$

(7.95)

where $\nu$ is a vector of Lagrange multipliers [23] and $v_{k}$ is the (unknown) process noise, as in Equation 7.25.

•  Marginalizing (7.95) in the ordering ($v_{k}$, $\nu$, then $x_{k}$) results in the predicted information marginal:

$$Y_{k+1} = Q^{-1} - Q^{-1} F(Y_{k} + F^{T} Q^{-1} F)^{-1} F^{T} Q^{-1}$$

(7.96)

•  Marginalizing (7.95) in the ordering ($x_{k}$, $\nu$, then $v_{k}$) results in the same predicted information marginal, in the form conventionally used in information filtering:

$$Y_{k+1} = M - MG(Q^{-1} + G^{T} M G)^{-1} G^{T} M$$

(7.97)

$$M = F^{-T} Y_{k} F^{-1}$$

(7.98)

•  Marginalizing (7.95) in the ordering ($x_{k}$, $v_{k}$, then $\nu$) results in the inverse of the expression used in covariance form Kalman filtering:

$$Y_{k+1} = (F P_{k} F^{T} + G Q G^{T})^{-1}$$

(7.99)
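
As a numerical sanity check (a sketch only, with randomly generated system matrices, and interpreting the trajectory-form $Q$ in Equation 7.96 as the state-space noise covariance $GQG^{T}$, as discussed above), the following verifies that Equations 7.96 through 7.99 yield the same predicted information matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
F = rng.standard_normal((n, n))     # invertible with probability one
G = rng.standard_normal((n, n))     # square here so that G Q G^T is invertible
R1 = rng.standard_normal((n, n))
Yk = R1 @ R1.T + n * np.eye(n)      # prior information matrix (SPD)
R2 = rng.standard_normal((n, n))
Q = R2 @ R2.T + n * np.eye(n)       # process noise covariance
Pk = np.linalg.inv(Yk)
inv = np.linalg.inv

Qs = G @ Q @ G.T                    # noise covariance in the state space (trajectory-form "Q")

# Equation 7.96: marginal of the augmented/trajectory form.
Y1 = inv(Qs) - inv(Qs) @ F @ inv(Yk + F.T @ inv(Qs) @ F) @ F.T @ inv(Qs)

# Equations 7.97 and 7.98: conventional information filter prediction.
M = inv(F).T @ Yk @ inv(F)
Y2 = M - M @ G @ inv(inv(Q) + G.T @ M @ G) @ G.T @ M

# Equation 7.99: inverse of the covariance form prediction.
Y3 = inv(F @ Pk @ F.T + G @ Q @ G.T)

assert np.allclose(Y1, Y2) and np.allclose(Y2, Y3)
print("Equations 7.96, 7.97, and 7.99 agree.")
```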

ACKNOWLEDGMENTS

The work in this chapter was supported by the Australian Centre for Field Robotics. The authors would also like to thank Tim Bailey and Chris Lloyd for related technical discussions.

REFERENCES

1.  T. Bailey and H. Durrant-Whyte. Decentralised data fusion with delayed states for consistent inference in mobile ad hoc networks. Technical report, 2007, http://www.personal.acfr.usyd.edu.au/tbailey/papers/delayedstated.pdf

2.  Y. Bar-Shalom, X. Rong Li, and T. Kirubarajan. Estimation with Applications to Tracking and Navigation. Wiley, Hoboken, NJ, 2001.

3.  B. Bollobás. Modern Graph Theory. Springer, New York, 1998.

4.  J. Capitan, L. Merino, F. Caballero, and A. Ollero. Decentralized delayed-state information filter (DDSIF): A new approach for cooperative decentralized tracking. Robotics and Autonomous Systems, 59(6):376–388, 2011.

5.  V. Chandrasekaran, N. Srebro, and P. Harsha. Complexity of inference in graphical models. In 24th Conference on Uncertainty in Artificial Intelligence and Statistics, Helsinki, Finland, pp. 70–78, 2008.

6.   K.C. Chang, C. Chong, and S. Mori. On scalable distributed sensor fusion. In Proceedings of the 11th International Conference on Information Fusion, Cologne, Germany, pp. 1–8, 2008.

7.  K. Chang, C.-Y. Chong, and S. Mori. Analytical and computational evaluation of scalable distributed fusion algorithms. IEEE Transactions on Aerospace and Electronic Systems, 46(4):2022–2034, October 2010.

8.  F. Dellaert and M. Kaess. Square root SAM: Simultaneous localization and mapping via square root information smoothing. International Journal of Robotics Research, 25(12):1181–1203, 2006.

9.  H. Durrant-Whyte, M. Stevens, and E. Nettleton. Data fusion in decentralised sensing networks. In 4th International Conference on Information Fusion, Montreal, Quebec, Canada, pp. 302–307, 2001.

10.  R. Eustice, H. Singh, and J. Leonard. Exactly sparse delayed-state filters. In Proceedings of the 2005 IEEE International Conference on Robotics and Automation, Barcelona, Spain, pp. 2417–2424, 2005.

11.  G.H. Golub and C.F. Van Loan. Matrix Computations, 3rd edn. The Johns Hopkins University Press, Baltimore, MD, 1996.

12.  S. Grime and H.F. Durrant-Whyte. Data fusion in decentralized sensor networks. Control Engineering Practice, 2(5):849–863, 1994.

13.  T. Kloks. Treewidth: Computations and Approximations. Lecture Notes in Computer Science, Vol. 842. Springer, Berlin, Germany, 1994.

14.  E. Korach and N. Solel. Tree-width, path-width and cut-width. Discrete Applied Mathematics, 43(1):97–101, 1993.

15.  J.J. Leonard and R.J. Rikoski. Incorporation of delayed decision making into stochastic mapping. In International Symposium on Experimental Robotics, Montreal, Quebec, Canada, pp. 533–542, 2000.

16.  A. Makarenko, A. Brooks, T. Kaupp, H. Durrant-Whyte, and F. Dellaert. Decentralised data fusion: A graphical model approach. In Proceedings of the 12th International Conference on Information Fusion, Seattle, WA, pp. 545–554, July 2009.

17.  P.S. Maybeck. Stochastic Models, Estimation and Control, Vol. 1. Mathematics in Science and Engineering. Academic Press, New York, 1979.

18.  E. Nettleton. Decentralised architectures for tracking and navigation with multiple flight vehicles. PhD thesis, Australian Centre for Field Robotics, Department of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, Sydney, Australia, 2003.

19.  E.W. Nettleton and H.F. Durrant-Whyte. Delayed and asequent data in decentralised sensing networks. Proceedings of SPIE—The International Society for Optical Engineering, 4571:1–9, 2001.

20.  D. Nicholson, C.M. Lloyd, S.J. Julier, and J.K. Uhlmann. Scalable distributed data fusion. In Information Fusion, 2002. Proceedings of the Fifth International Conference, Annapolis, MD, Vol. 1, pp. 630–635, 2002.

21.  M.A. Paskin and G.D. Lawrence. Junction tree algorithms for solving sparse linear systems. Technical Report UCB/CSD-03-1271, University of California, Berkeley, CA, 2003.

22.  S. Sukkarieh, E. Nettleton, J.-H. Kim, M. Ridley, A. Goktogan, and H. Durrant-Whyte. The ANSER project: Data fusion across multiple uninhabited air vehicles. International Journal of Robotics Research, 22(7–8):505–539, 2003.

23.  P. Thompson. A novel augmented graph approach for estimation in localisation and mapping. PhD thesis, The University of Sydney, Sydney, Australia, March 2009.

24.  P. Thompson and H. Durrant-Whyte. Decentralised data fusion in 2-tree sensor networks. In Proceedings of the 13th International Conference on Information Fusion, Edinburgh, U.K., pp. 1–8, July 2010.
