A second problem is posed by outliers that are located outside the channel representation: two channel representations might have the same coefficient vectors, but be generated from different numbers of samples. These additional samples must influence the posterior distribution.
The first problem requires a de-correlation of the coefficient vector, which is basically the problem solved in Section 5.3. The calculation of the coefficient vector in (5.22) consists mainly of solving the linear system with the channel correlation matrix as system matrix. Consequently, the obtained de-correlated coefficient vector $\tilde{c}$ is a good approximation of independent events in the sense of a histogram. Thus, we replace $c$ with $\tilde{c}$ in (6.11).
Regarding the outliers, we simply add one further dimension corresponding to the range that is not covered by the channel representation. The de-correlation cannot be solved easily in this case, but we can assume that this additional dimension is independent of the other coefficients in the representation, and it can therefore be added directly to the de-correlated vector as $\tilde{c}_0$. The corresponding concentration parameter $\alpha_0$ is not necessarily identical to the other $\alpha_n$. For notational simplicity, we however stick to the direct formulation in terms of channel coefficients $c_n$, $n = 1, \dots, N$, for the subsequent sections.
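As a minimal sketch of these two steps (assuming the channel correlation matrix R from Section 5.3 is available and an outlier count c0 has been collected from samples outside the covered range; all names here are illustrative, not the book's notation):

```python
import numpy as np

def decorrelate_with_outlier_bin(c, R, c0):
    """De-correlate a channel coefficient vector and append the outlier bin.

    The de-correlation solves the linear system with the channel
    correlation matrix R as system matrix, cf. (5.22); negative values
    are clipped, since the result is interpreted as histogram-like
    counts. The outlier count c0 is assumed independent of the other
    coefficients and is appended directly to the de-correlated vector.
    """
    c_tilde = np.maximum(np.linalg.solve(R, c), 0.0)
    return np.concatenate(([c0], c_tilde))
```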
6.2 COMPARING CHANNEL REPRESENTATIONS
In many applications, the estimated distribution of measurements is only an intermediate step
in some processing chain. In parameter regression problems, the subsequent step is to extract
the modes of one distribution (see Chapter 5). In matching problems, two or more distributions
need to be compared to produce matching scores.
The latter is, for instance, used in tracking as suggested by Felsberg [2013], where an appearance model is built over time, consisting of a channel coded feature map (see Chapter 4) that represents the empirical distribution of gray values over the tracking window. When a new frame is processed, candidate windows in the vicinity of the predicted position are compared to the existing model. The candidate window with the best score is chosen as the new object location; see Figure 6.2.
In the predecessor to the work of Felsberg [2013], Sevilla-Lara and Learned-Miller [2012] suggest using the $L_1$-distance to compare (smoothed) histograms, which is also applied for the channel-based tracker.
Obviously, it makes much more sense to use the $L_1$-distance between two channel vectors $c$ and $c'$,

$$d_1(c, c') = \sum_{n=1}^{N} |c_n - c'_n| \qquad (6.12)$$

or their Hellinger distance ($|c|$ denotes the $L_1$-norm of $c$),

$$H_{1/2}(c, c') = \frac{1}{2} \sum_{n=1}^{N} \left( \sqrt{c_n} - \sqrt{c'_n} \right)^2 = \frac{|c| + |c'|}{2} - \sum_{n=1}^{N} \sqrt{c_n\, c'_n}, \qquad (6.13)$$

instead of their $L_2$-distance, because channel coefficients are non-negative.
Figure 6.2: Applying channel representations for tracking as proposed by Felsberg [2013]. Left
panel: current frame with tracked bounding box. Right panel, top: contents of the bounding box.
Right panel, bottom: current model decoded (note the coarser representation in the occluded
part).
The Hellinger distance has been introduced to channel representations through the use of the Bhattacharyya coefficient by Jonsson [2008], similar to the square-root transform on Fisher vectors [Sánchez et al., 2013].
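As a minimal sketch (assuming the channel vectors are given as non-negative NumPy arrays; the function names are ours), both distances are straightforward to compute:

```python
import numpy as np

def l1_distance(c, c_prime):
    """L1-distance (6.12) between two channel vectors."""
    return np.sum(np.abs(c - c_prime))

def hellinger_distance(c, c_prime):
    """Hellinger distance (6.13); channel coefficients are
    non-negative, so the square roots are well defined."""
    return 0.5 * np.sum((np.sqrt(c) - np.sqrt(c_prime)) ** 2)
```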
The generalization of the Hellinger distance leads to comparisons by means of divergences; see Section 6.3. Before moving on to divergences, where we consider the model coefficients and the new coefficients as representations of two distributions, we first consider the new coefficients in terms of a further likelihood term.
Similar to (6.11), we obtain a new Dirichlet distribution, which is then integrated out to obtain the posterior predictive (here computed for the uniform prior $\alpha = 1$)

$$p(c' \mid c) = \int p(c' \mid P)\, \mathrm{Dir}(P \mid c + 1)\, \mathrm{d}P \qquad (6.14)$$

$$= \frac{\Gamma(|c'| + 1)\, \Gamma(|c| + N)}{\prod_{n=1}^{N} \Gamma(c'_n + 1)\, \Gamma(c_n + 1)} \int \prod_{n=1}^{N} P_n^{(c_n + c'_n + 1) - 1}\, \mathrm{d}P \qquad (6.15)$$

$$= \frac{\Gamma(|c'| + 1)\, \Gamma(|c| + N)}{\Gamma(|c| + |c'| + N)} \prod_{n=1}^{N} \frac{\Gamma(c_n + c'_n + 1)}{\Gamma(c'_n + 1)\, \Gamma(c_n + 1)} \qquad (6.16)$$

$$= \frac{\Gamma(|c'| + 1)}{\Gamma(|c'| + N)} \cdot \frac{B(c + c' + 1)}{B(c' + 1)\, B(c + 1)}, \qquad (6.17)$$

where $B(\cdot)$ denotes the $N$-dimensional Beta function

$$B(c) = \frac{\prod_{n=1}^{N} \Gamma(c_n)}{\Gamma(|c|)}. \qquad (6.18)$$
$B(\cdot)$ is obtained by recursively applying the definition of the ordinary Beta function (see Evans et al. [2000]), which exists in many scientific computing libraries.
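In code, the logarithm of (6.18) follows directly from the log-Gamma function; a minimal sketch (the helper name betaln_vec is ours, scipy.special.gammaln is the standard log-Gamma implementation):

```python
import numpy as np
from scipy.special import gammaln

def betaln_vec(c):
    """Log of the N-dimensional Beta function (6.18):
    log B(c) = sum_n gammaln(c_n) - gammaln(|c|),
    where |c| is the sum of the non-negative vector c.
    """
    c = np.asarray(c, dtype=float)
    return np.sum(gammaln(c)) - gammaln(np.sum(c))
```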
The first factor in (6.17) depends only on the number of bins and the number of draws, i.e., it is independent of the distribution and can be precomputed. The second factor is preferably computed in the logarithmic domain to avoid numerical problems.
Since the probability (6.17) is non-negative and bounded between 0 and 1, the negative logarithm results in a suitable distance measure

$$d_p(c', c) = -\log p(c' \mid c) \qquad (6.19)$$

$$= \mathrm{gammaln}(|c'| + N) - \mathrm{gammaln}(|c'| + 1) + \mathrm{betaln}(c' + 1) + \mathrm{betaln}(c + 1) - \mathrm{betaln}(c + c' + 1), \qquad (6.20)$$
where $\mathrm{gammaln}(\cdot)$ is the log-Gamma function and $\mathrm{betaln}(\cdot)$ is the log-Beta function. Both exist in many scientific computing libraries.
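A minimal sketch of (6.20) in this spirit, reusing the betaln_vec helper from the previous sketch (the function name is ours):

```python
import numpy as np
from scipy.special import gammaln

def posterior_predictive_distance(c_prime, c):
    """Negative log posterior predictive d_p(c', c), cf. (6.20).

    All terms are evaluated in the logarithmic domain, so the Gamma
    functions do not overflow for large coefficient sums.
    """
    c = np.asarray(c, dtype=float)
    c_prime = np.asarray(c_prime, dtype=float)
    N = c.size
    return (gammaln(np.sum(c_prime) + N) - gammaln(np.sum(c_prime) + 1)
            + betaln_vec(c_prime + 1) + betaln_vec(c + 1)
            - betaln_vec(c + c_prime + 1))
```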
In practice, the observed data often contains outliers. If we assume that the coefficient vector $c$ represents a convex combination of the true distribution and a uniform distribution, with ratio parameter $\beta$, (6.17) is modified according to

$$p'(c' \mid c) = \beta\, \frac{\Gamma(|c'| + 1)\, \Gamma(N)}{\Gamma(|c'| + N)} + (1 - \beta)\, p(c' \mid c) \qquad (6.21)$$
and the distance function (6.20) is modified accordingly. As expected, the outlier part depends only on the number of drawn samples in $c'$ and the number of bins.
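A sketch of the correspondingly modified distance, continuing the functions above (the name and the log-domain mixture formulation are our choices; np.logaddexp combines the two terms without underflow):

```python
import numpy as np
from scipy.special import gammaln

def robust_distance(c_prime, c, beta):
    """Modified distance -log p'(c'|c), cf. (6.21).

    The outlier term depends only on |c'| and N and can therefore be
    precomputed; the mixture is combined via logaddexp for stability.
    """
    c_prime = np.asarray(c_prime, dtype=float)
    N = c_prime.size
    log_outlier = (np.log(beta) + gammaln(np.sum(c_prime) + 1)
                   + gammaln(N) - gammaln(np.sum(c_prime) + N))
    log_inlier = np.log1p(-beta) - posterior_predictive_distance(c_prime, c)
    return -np.logaddexp(log_outlier, log_inlier)
```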
Instead of the posterior predictive, we can also adopt a symmetric setting and use the divergence of the posterior distributions estimated from the two channel vectors; see Section 6.3.