A second problem is posed by outliers that are located outside the channel representation: two channel representations might have the same coefficient vectors, but be generated from different numbers of samples. These additional samples must influence the posterior distribution.
The first problem requires a de-correlation of the coefficient vector, which is basically the problem solved in Section 5.3. The calculation of the coefficient vector in (5.22) consists mainly of solving the linear system with the channel correlation matrix as system matrix. Consequently, the obtained de-correlated coefficient vector $\tilde{c}$ is a good approximation of independent events in the sense of a histogram. Thus, we replace $c$ with $\tilde{c}$ in (6.11).
Regarding the outliers, we simply add one further dimension corresponding to the range that is not covered by the channel representation. The de-correlation cannot be solved easily in this case, but we can assume that this additional dimension is independent of the other coefficients in the representation, and it can therefore be added directly to the de-correlated vector as $\tilde{c}_0$. The corresponding concentration parameter $\alpha_0$ is not necessarily identical to the other $\alpha_n$. For notational simplicity, we however stick to the direct formulation in terms of channel coefficients $c_n$, $n = 1, \dots, N$, for the subsequent sections.
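As a minimal sketch of these two steps (assuming the channel correlation matrix R from Section 5.3 is available and an outlier count c0 has been collected from samples outside the covered range; all names here are illustrative, not the book's notation):

```python
import numpy as np

def decorrelate_with_outlier_bin(c, R, c0):
    """De-correlate a channel coefficient vector and append the outlier bin.

    The de-correlation solves the linear system with the channel
    correlation matrix R as system matrix, cf. (5.22); negative values
    are clipped, since the result is interpreted as histogram-like
    counts. The outlier count c0 is assumed independent of the other
    coefficients and is appended directly to the de-correlated vector.
    """
    c_tilde = np.maximum(np.linalg.solve(R, c), 0.0)
    return np.concatenate(([c0], c_tilde))
```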
6.2 COMPARING CHANNEL REPRESENTATIONS
In many applications, the estimated distribution of measurements is only an intermediate step
in some processing chain. In parameter regression problems, the subsequent step is to extract
the modes of one distribution (see Chapter 5). In matching problems, two or more distributions
need to be compared to produce matching scores.
The latter is, for instance, used in tracking as suggested by Felsberg [2013], where an appearance model is built over time, consisting of a channel coded feature map (see Chapter 4) that represents the empirical distribution of gray values over the tracking window. When a new frame is processed, candidate windows in the vicinity of the predicted position are compared to the existing model. The candidate window with the best score is chosen as the new object location; see Figure 6.2.
In the predecessor to the work of Felsberg [2013], Sevilla-Lara and Learned-Miller [2012] suggest using the $L_1$-distance to compare (smoothed) histograms, which is also applied for the channel-based tracker.
Obviously, it makes much more sense to use the $L_1$-distance between two channel vectors $c$ and $c'$,

$$d_1(c, c') = \sum_{n=1}^{N} |c_n - c'_n| \qquad (6.12)$$

or their Hellinger distance ($|c|$ denotes the $L_1$-norm of $c$),

$$H_{1/2}(c, c') = \frac{1}{2} \sum_{n=1}^{N} \left( \sqrt{c_n} - \sqrt{c'_n} \right)^2 = \frac{|c| + |c'|}{2} - \sum_{n=1}^{N} \sqrt{c_n\, c'_n}, \qquad (6.13)$$

instead of their $L_2$-distance, because channel coefficients are non-negative.
Figure 6.2: Applying channel representations for tracking as proposed by Felsberg [2013]. Left
panel: current frame with tracked bounding box. Right panel, top: contents of the bounding box.
Right panel, bottom: current model decoded (note the coarser representation in the occluded
part).
The Hellinger distance has been introduced to channel representations through the use of the Bhattacharyya coefficient by Jonsson [2008], similar to the square-root transform on Fisher vectors [Sánchez et al., 2013].
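As a minimal sketch (assuming the channel vectors are given as non-negative NumPy arrays; the function names are ours), both distances are straightforward to compute:

```python
import numpy as np

def l1_distance(c, c_prime):
    """L1-distance (6.12) between two channel vectors."""
    return np.sum(np.abs(c - c_prime))

def hellinger_distance(c, c_prime):
    """Hellinger distance (6.13); channel coefficients are
    non-negative, so the square roots are well defined."""
    return 0.5 * np.sum((np.sqrt(c) - np.sqrt(c_prime)) ** 2)
```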
The generalization of the Hellinger distance leads to comparisons by means of divergences; see Section 6.3. Before moving on to divergences, where we consider the model coefficients and the new coefficients as representations of two distributions, we first consider the new coefficients in terms of a further likelihood term.
Similar to (6.11), we obtain a new Dirichlet distribution, which is then integrated out to obtain the posterior predictive (here computed for the uniform prior $\alpha = 1$)

$$p(c' \mid c) = \int p(c' \mid P)\, \mathrm{Dir}(P \mid c + 1)\, \mathrm{d}P \qquad (6.14)$$

$$= \frac{\Gamma(|c'| + 1)\, \Gamma(|c| + N)}{\prod_{n=1}^{N} \Gamma(c'_n + 1)\, \Gamma(c_n + 1)} \int \prod_{n=1}^{N} P_n^{(c_n + c'_n + 1) - 1}\, \mathrm{d}P \qquad (6.15)$$

$$= \frac{\Gamma(|c'| + 1)\, \Gamma(|c| + N)}{\Gamma(|c| + |c'| + N)} \prod_{n=1}^{N} \frac{\Gamma(c_n + c'_n + 1)}{\Gamma(c'_n + 1)\, \Gamma(c_n + 1)} \qquad (6.16)$$

$$= \frac{\Gamma(|c'| + 1)}{\Gamma(|c'| + N)} \cdot \frac{B(c + c' + 1)}{B(c' + 1)\, B(c + 1)}, \qquad (6.17)$$

where $B(\cdot)$ denotes the $N$-dimensional Beta function

$$B(c) = \frac{\prod_{n=1}^{N} \Gamma(c_n)}{\Gamma(|c|)}. \qquad (6.18)$$
$B(\cdot)$ is obtained by recursively applying the definition of the ordinary Beta function (see Evans et al. [2000]), which exists in many scientific computing libraries.
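In code, the logarithm of (6.18) follows directly from the log-Gamma function; a minimal sketch (the helper name betaln_vec is ours, scipy.special.gammaln is the standard log-Gamma implementation):

```python
import numpy as np
from scipy.special import gammaln

def betaln_vec(c):
    """Log of the N-dimensional Beta function (6.18):
    log B(c) = sum_n gammaln(c_n) - gammaln(|c|),
    where |c| is the sum of the non-negative vector c.
    """
    c = np.asarray(c, dtype=float)
    return np.sum(gammaln(c)) - gammaln(np.sum(c))
```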
The first factor in (6.17) depends only on the number of bins and the number of draws, i.e., it is independent of the distribution and can be precomputed. The second factor is preferably computed in the logarithmic domain to avoid numerical problems.
Since the probability (6.17) is non-negative and bounded between 0 and 1, the negative logarithm results in a suitable distance measure

$$d_p(c', c) = -\log p(c' \mid c) \qquad (6.19)$$

$$= \mathrm{gammaln}(|c'| + N) - \mathrm{gammaln}(|c'| + 1) + \mathrm{betaln}(c' + 1) + \mathrm{betaln}(c + 1) - \mathrm{betaln}(c + c' + 1), \qquad (6.20)$$
where $\mathrm{gammaln}(\cdot)$ is the log-Gamma function and $\mathrm{betaln}(\cdot)$ is the log-Beta function. Both exist in many scientific computing libraries.
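A minimal sketch of (6.20) in this spirit, reusing the betaln_vec helper from the previous sketch (the function name is ours):

```python
import numpy as np
from scipy.special import gammaln

def posterior_predictive_distance(c_prime, c):
    """Negative log posterior predictive d_p(c', c), cf. (6.20).

    All terms are evaluated in the logarithmic domain, so the Gamma
    functions do not overflow for large coefficient sums.
    """
    c = np.asarray(c, dtype=float)
    c_prime = np.asarray(c_prime, dtype=float)
    N = c.size
    return (gammaln(np.sum(c_prime) + N) - gammaln(np.sum(c_prime) + 1)
            + betaln_vec(c_prime + 1) + betaln_vec(c + 1)
            - betaln_vec(c + c_prime + 1))
```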
In practice, the observed data often contains outliers. If we assume that the coefficient vector $c$ represents a convex combination of the true distribution and a uniform distribution, with ratio parameter $\beta$, (6.17) is modified according to

$$p'(c' \mid c) = \beta\, \frac{\Gamma(|c'| + 1)\, \Gamma(N)}{\Gamma(|c'| + N)} + (1 - \beta)\, p(c' \mid c) \qquad (6.21)$$
and the distance function (6.20) is modified accordingly. As expected, the outlier part depends only on the number of drawn samples in $c'$ and the number of bins.
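A sketch of the correspondingly modified distance, continuing the functions above (the name and the log-domain mixture formulation are our choices; np.logaddexp combines the two terms without underflow):

```python
import numpy as np
from scipy.special import gammaln

def robust_distance(c_prime, c, beta):
    """Modified distance -log p'(c'|c), cf. (6.21).

    The outlier term depends only on |c'| and N and can therefore be
    precomputed; the mixture is combined via logaddexp for stability.
    """
    c_prime = np.asarray(c_prime, dtype=float)
    N = c_prime.size
    log_outlier = (np.log(beta) + gammaln(np.sum(c_prime) + 1)
                   + gammaln(N) - gammaln(np.sum(c_prime) + N))
    log_inlier = np.log1p(-beta) - posterior_predictive_distance(c_prime, c)
    return -np.logaddexp(log_outlier, log_inlier)
```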
Instead of the posterior predictive, we can also adopt a symmetric setting and use the divergence of the posterior distributions estimated from the two channel vectors; see Section 6.3.