and the channel coefficients are linear features of the density that we are interested in. This
property will be used in the next chapter for decoding channel representations.
Another consequence of the choice of kernels and their placement is that
$$\sum_{n=1}^{N} c_n = 1 \quad \text{if all } x^{(m)} \in [0;\bar{N}) \,. \tag{3.16}$$
The proof is easy and left to the reader. Another property that is easy to show is that cos²-kernels
also have a constant sum of squares, independent of x [Nordberg et al., 1994]:
$$\sum_{n=1}^{N} K_C^2\left(x^{(m)} - n + 3/2\right) = 1/2 \quad \text{for all } x^{(m)} \in [0;\bar{N}) \,. \tag{3.17}$$
Note that this property does not hold for linear B-splines or any other continuous positive-
definite kernel [Felsberg et al., 2015], but obviously it holds for the rectangular kernel at nearly
all x.
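Both properties can be checked numerically. The sketch below assumes the normalized cos² kernel $K_C(x) = \tfrac{2}{3}\cos^2(\pi x/3)$ for $|x| < 3/2$ (zero elsewhere). This is the normalization under which the sum of squares equals 1/2 and the plain sum equals 1. It also assumes a small hypothetical $N = 8$, for which full kernel overlap holds on $[0, N-2]$. For comparison, it encodes the same samples with linear B-splines centred at $n-1$, a placement chosen only for illustration.

```python
import numpy as np

def cos2_kernel(x):
    # Assumed cos^2 kernel of support width 3: (2/3) cos^2(pi x / 3) for |x| < 3/2, else 0.
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < 1.5, (2.0 / 3.0) * np.cos(np.pi * x / 3.0) ** 2, 0.0)

def linear_bspline(x):
    # Linear B-spline (hat function): 1 - |x| for |x| < 1, else 0.
    return np.maximum(0.0, 1.0 - np.abs(np.asarray(x, dtype=float)))

N = 8
n = np.arange(1, N + 1)
xs = np.linspace(0.0, N - 2.0, 241)  # range with full kernel overlap for this placement

# Rows: samples x; columns: channels n = 1..N.
C = cos2_kernel(xs[:, None] - n + 1.5)     # K_C(x - n + 3/2), as in (3.17)
B = linear_bspline(xs[:, None] - n + 1.0)  # hypothetical placement: hats centred at n - 1

print("cos^2 sum:       ", C.sum(axis=1).min(), C.sum(axis=1).max())
print("cos^2 squares:   ", (C ** 2).sum(axis=1).min(), (C ** 2).sum(axis=1).max())
print("B-spline squares:", (B ** 2).sum(axis=1).min(), (B ** 2).sum(axis=1).max())
```

Up to rounding, the cos² sums are constant at 1 and the sums of squares are constant at 1/2. The linear B-spline sum of squares varies between 1/2 (midway between two centres) and 1 (at a centre), even though its plain sum is also 1.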
So far, we have only encoded one-dimensional samples x, but before we consider multi-
dimensional encodings, we will focus on one application: visual object tracking.
3.2 ENHANCED DISTRIBUTION FIELD TRACKING
Local averaging of channel representations within a bounding box has been used successfully in
object tracking [Danelljan et al., 2015, Felsberg, 2013, Sevilla-Lara and Learned-Miller, 2012],
i.e., in video analysis. Visual object tracking is defined as the causal, model-free sequential
detection of a single object in an image sequence [Kristan, Leonardis, Matas, Felsberg, Pflugfelder,
et al., 2016]. The only available information about the object to be tracked is its bounding box
in the first frame; this is why the task is called generic or model-free. The tracking method may
build an internal model based on the bounding-box contents in the first frame and may also
update its internal model on the fly, but only information from the previous frames and the current
frame may be used (causality).
Enhanced distribution field tracking (EDFT, Felsberg [2013]) builds on distribution field
tracking (DFT) by Sevilla-Lara and Learned-Miller [2012], a visual object tracking method
that is based on comparing smoothed local histograms of the image patch inside the bounding
box. The original DFT algorithm was developed independently of the concept of channel
representations and differs from it mainly in one property, namely that the bins are smoothed
after accumulation.
The image intensity (gray scale) $I(i,j)$ is considered a stochastic variable, and its distribution
is estimated by spatially weighted histograms using $K_R(x)$ and $N = 16$, i.e., $x^{(i,j)} = I(i,j)/16$
(see (3.4)):
$$c_n = \sum_{i,j} w^{(i,j)} c_n^{(i,j)} = \sum_{i,j} w^{(i,j)} K_R\left(x^{(i,j)} - n + 17/32\right) \,, \tag{3.18}$$
where the offset has been computed according to (3.6) and $w^{(i,j)}$ denotes the spatial weights, a
2D Gaussian kernel $h_\sigma(i,j)$ with standard deviation $\sigma$.
The original DFT approach is a multi-scale algorithm, typically using three different standard
deviations, but to simplify the subsequent arguments, we ignore this detail and stick to the
single-scale case.
The next step of DFT, after accumulating the histogram values locally, is to smooth the coefficients
$c_n$ along the channel index $n$ using a 1D Gaussian kernel, which results in the DF $d(i,j,n)$.
This is the post-smoothing step mentioned previously.
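A single-scale DF can be sketched as follows. The image is exploded into $N = 16$ binary layers (rectangular binning). Each layer is then smoothed spatially with a Gaussian, which evaluates (3.18) at every pixel position. Finally, the result is smoothed along the channel index. The values of $\sigma_s$ and $\sigma_n$, the truncation at $3\sigma$, and the zero padding at the borders are assumptions of this sketch and are not DFT's published settings.

```python
import numpy as np

def gaussian_1d(sigma):
    # Sampled 1D Gaussian truncated at 3 sigma, normalised to sum 1.
    r = int(np.ceil(3.0 * sigma))
    t = np.arange(-r, r + 1)
    g = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    return g / g.sum()

def smooth_axis(a, sigma, axis):
    # Convolve `a` with a 1D Gaussian along `axis`, using zero padding at the borders.
    g = gaussian_1d(sigma)
    r = len(g) // 2
    a = np.moveaxis(a, axis, -1)
    padded = np.pad(a, [(0, 0)] * (a.ndim - 1) + [(r, r)], mode="constant")
    out = np.zeros(a.shape)
    for k, gk in enumerate(g):  # symmetric kernel, so correlation equals convolution
        out += gk * padded[..., k:k + a.shape[-1]]
    return np.moveaxis(out, -1, axis)

def distribution_field(image, sigma_s, sigma_n, N=16):
    # Single-scale DF d(i, j, n) for 8-bit gray values.
    layers = (image.astype(int)[..., None] // 16 == np.arange(N)).astype(float)  # rectangular bins
    d = smooth_axis(smooth_axis(layers, sigma_s, 0), sigma_s, 1)  # local accumulation, cf. (3.18)
    return smooth_axis(d, sigma_n, 2)                              # post-smoothing along n

image = np.full((21, 21), 100, dtype=np.uint8)  # constant patch: gray value 100 lies in bin index 6
d = distribution_field(image, sigma_s=2.0, sigma_n=1.0)
print(d[10, 10].round(3))
```

For the constant test patch, the DF at an interior pixel is a sampled Gaussian bump centred on bin index 6. The post-smoothing spreads the single occupied bin over its neighbours, independently of where inside the bin the gray value actually lay.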
During tracking, the DF of the internal model, $d_{\mathrm{model}}$, is compared to the DF of a
bounding box in the current frame, $d_f$, within a local search window. The distance measure used
is the sum of absolute differences, i.e.,
$$L_1(d_{\mathrm{model}}, d_f) = \sum_{i,j,n} \left| d_{\mathrm{model}}(i,j,n) - d_f(i,j,n) \right| \,. \tag{3.19}$$
The displacement is estimated by a local search for the minimum $L_1$ error within a window
of maximum displacement, a further parameter of the method, which is set to 30 pixels following
Sevilla-Lara and Learned-Miller [2012].
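The local search can be sketched as an exhaustive scan over integer positions. The sketch assumes a square window of $\pm$`max_disp` pixels around the previous top-left corner, clipped to the frame, and DFs stored as arrays of shape (rows, columns, channels). Both conventions are assumptions for illustration.

```python
import numpy as np

def l1_distance(d_model, d_f):
    # Sum of absolute differences over i, j, n, as in (3.19).
    return np.abs(d_model - d_f).sum()

def local_search(d_model, df_frame, top_left, max_disp=30):
    # Exhaustive search for the top-left position that minimises the L1 error.
    # Square window and integer-pixel grid are assumptions of this sketch.
    h, w = d_model.shape[:2]
    H, W = df_frame.shape[:2]
    r0, c0 = top_left
    best, best_pos = np.inf, top_left
    for r in range(max(0, r0 - max_disp), min(H - h, r0 + max_disp) + 1):
        for c in range(max(0, c0 - max_disp), min(W - w, c0 + max_disp) + 1):
            err = l1_distance(d_model, df_frame[r:r + h, c:c + w])
            if err < best:
                best, best_pos = err, (r, c)
    return best_pos, best

rng = np.random.default_rng(1)
frame = rng.random((60, 80, 16))       # synthetic DF of the current frame
model = frame[12:22, 17:29].copy()     # template that is present at (12, 17)
pos, err = local_search(model, frame, (10, 10))
print(pos, err)
```

With a 30-pixel limit, up to $61 \times 61$ candidate positions are evaluated per frame, which is why the window size is a relevant parameter of the method.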
When the best-fitting position has been found, the current template $d_{\mathrm{model},t}$ is updated
with the current DF $d_f$ using linear weights $\lambda = 0.95$ for the previous template and
$(1-\lambda) = 0.05$ for the novel patch:

$$d_{\mathrm{model},t+1}(i,j,n) = \lambda\, d_{\mathrm{model},t}(i,j,n) + (1-\lambda)\, d_f(i,j,n) \,. \tag{3.20}$$
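Unrolling (3.20) shows that the weight of the initial template decays as $\lambda^t$. With $\lambda = 0.95$, the weight halves roughly every 13.5 frames. The toy example below assumes an initially empty template and an object that shows the same DF in every frame. The array shapes are arbitrary.

```python
import numpy as np

def linear_update(d_model, d_f, lam=0.95):
    # (3.20): d_model,t+1 = lam * d_model,t + (1 - lam) * d_f
    return lam * d_model + (1.0 - lam) * d_f

d_model = np.zeros((4, 4, 16))  # empty initial template (toy assumption)
d_f = np.ones((4, 4, 16))       # identical DF observed in every frame
for t in range(10):
    d_model = linear_update(d_model, d_f)

# After t updates, the initial template keeps weight lam**t; frames until it halves:
half_life = np.log(0.5) / np.log(0.95)
print(d_model[0, 0, 0], half_life)
```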
Due to the density-based comparison, the method is robust against outliers, and due to the
template update, it can also deal with continuous changes in object aspect and lighting
[Sevilla-Lara and Learned-Miller, 2012].
EDFT enhances DFT in various ways [Felsberg, 2013], but most importantly, the post-smoothing
of histograms (rectangular kernels) is replaced with a quadratic B-spline channel representation,
i.e., pre-smoothing before binning, which results in significant improvements. In later
versions of EDFT, cos²-kernels have been used instead, giving further improvements [Öfjäll and
Felsberg, 2014b], which also stem from a modified model update (3.20) and distance measure (3.19).
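The difference between pre-smoothing and post-smoothing can be illustrated with a single sample. The sketch uses the standard quadratic B-spline and places channels at hypothetical integer centres $0, \dots, N-1$ rather than using the offset of (3.6). The post-smoothing kernel $(1/4, 1/2, 1/4)$ is also a hypothetical choice. Two values in the same rectangular bin yield identical post-smoothed histograms. The B-spline channel vectors differ, and their centroid reproduces the encoded value.

```python
import numpy as np

def bspline2(x):
    # Quadratic B-spline: 3/4 - x^2 for |x| <= 1/2, (|x| - 3/2)^2 / 2 for 1/2 < |x| < 3/2, else 0.
    a = np.abs(np.asarray(x, dtype=float))
    return np.where(a <= 0.5, 0.75 - a ** 2, np.where(a < 1.5, 0.5 * (a - 1.5) ** 2, 0.0))

def presmoothed_channels(x, N):
    # Pre-smoothing: encode x directly with B-splines centred at 0..N-1 (hypothetical placement).
    return bspline2(x - np.arange(N))

def postsmoothed_histogram(x, N, g=(0.25, 0.5, 0.25)):
    # Post-smoothing: rectangular bin first, then blur along the channel index with kernel g.
    h = np.zeros(N)
    h[int(np.floor(x + 0.5))] = 1.0
    return np.convolve(h, np.asarray(g), mode="same")

N = 8
for x in (3.1, 3.4):  # both values fall into rectangular bin 3
    pre = presmoothed_channels(x, N)
    post = postsmoothed_histogram(x, N)
    print(x, pre.round(3), "centroid", (np.arange(N) * pre).sum().round(3), post)
```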
The model update in (3.20) has been replaced with a power-update rule

$$c^{(i,j)}_{\mathrm{model},t+1,n} = \left( \lambda \left(c^{(i,j)}_{\mathrm{model},t,n}\right)^{q} + (1-\lambda)\left(c^{(i,j)}_{f,n}\right)^{q} \right)^{1/q} \tag{3.21}$$
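For $q = 1$, (3.21) reduces to the linear update (3.20). For $q > 1$, the power mean lies above the linear mean, and as $q \to \infty$ it tends to the maximum. The sketch below compares a channel that appears in the new patch with one that disappears from it. It uses hypothetical values $\lambda = 0.95$ and $q \in \{1, 2\}$. It illustrates only the arithmetic and does not reproduce the settings of the cited work.

```python
import numpy as np

def power_update(c_model, c_f, lam=0.95, q=2.0):
    # (3.21), elementwise: (lam * c_model^q + (1 - lam) * c_f^q)^(1/q); q = 1 gives (3.20).
    c_model = np.asarray(c_model, dtype=float)
    c_f = np.asarray(c_f, dtype=float)
    return (lam * c_model ** q + (1.0 - lam) * c_f ** q) ** (1.0 / q)

for q in (1.0, 2.0):
    appearing = power_update(0.0, 1.0, q=q)     # channel empty in the model, present in the patch
    disappearing = power_update(1.0, 0.0, q=q)  # channel present in the model, empty in the patch
    print(q, appearing, disappearing)
```

With $q = 2$, the appearing channel jumps to $\sqrt{0.05}$ instead of $0.05$, and the disappearing channel keeps $\sqrt{0.95}$ instead of $0.95$. New mass is thus adopted faster and old mass is retained longer than with the linear rule.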
and a coherence weight has been introduced into the distance measure (3.19)

$$L^{w}_{1}(c_{\mathrm{model}}, c_f) = \sum_{i,j,n} \operatorname{coh}\left(c^{(i,j)}\right) \left| c^{(i,j)}_{\mathrm{model},n} - c^{(i,j)}_{f,n} \right| \,. \tag{3.22}$$
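Structurally, (3.22) is a per-location weighting of the $L_1$ terms. The coherence measure is only defined in Chapter 5, so the sketch below takes it as a precomputed array `coh` of shape (rows, columns). This array is a hypothetical input, and which distribution it is computed from is left open here. With all weights equal to one, the measure reduces to (3.19). Locations with zero weight do not contribute at all.

```python
import numpy as np

def weighted_l1(c_model, c_f, coh):
    # (3.22): sum_{i,j,n} coh(i,j) * |c_model(i,j,n) - c_f(i,j,n)|.
    # `coh` is a hypothetical precomputed (H, W) array of coherence weights.
    return (coh[..., None] * np.abs(c_model - c_f)).sum()

rng = np.random.default_rng(2)
a = rng.random((5, 6, 16))
b = rng.random((5, 6, 16))
print(weighted_l1(a, b, np.ones((5, 6))), np.abs(a - b).sum())  # identical: unit weights give (3.19)
```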
The coherence measure will be formally introduced in Chapter 5, but it can easily be explained
using Figure 3.3. Coherent regions imply that the information in the distribution is highly
discriminative. Incoherent regions imply that the distribution is uninformative. The discriminative