Orientation Scores as Channel Representations



and the channel coefficients are linear features of the density that we are interested in. This property will be used in the next chapter for decoding channel representations.
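To illustrate the linearity, here is a minimal sketch. It assumes that the channel vector of a sample set is the average of the per-sample encodings, $c_n = \frac{1}{M}\sum_m K(x^{(m)} - n + 3/2)$, and it assumes the $\cos^2$ kernel $K(y) = \frac{2}{3}\cos^2(\pi y/3)$ for $|y| < 3/2$ (the factor 2/3 is our assumption, chosen so that the channels sum to one). Because the encoding is an average, the channel vector of a pooled sample set is the sample-count-weighted mean of the channel vectors of its parts, i.e., it depends linearly on the empirical density. The function names are hypothetical.

```python
import numpy as np


def cos2_kernel(y):
    """Assumed cos^2 kernel: (2/3) cos^2(pi*y/3) for |y| < 3/2, zero elsewhere."""
    y = np.asarray(y, dtype=float)
    return np.where(np.abs(y) < 1.5, (2.0 / 3.0) * np.cos(np.pi * y / 3.0) ** 2, 0.0)


def encode(samples, N):
    """Average channel vector c_n = mean_m K(x^(m) - n + 3/2) for n = 1..N (assumed form)."""
    x = np.asarray(samples, dtype=float)[:, None]  # shape (M, 1)
    n = np.arange(1, N + 1)                         # channel indices 1..N
    return cos2_kernel(x - n + 1.5).mean(axis=0)    # shape (N,)


a = [0.3, 1.2, 4.7]
b = [2.0, 2.5]
pooled = encode(a + b, N=8)
# Weighted mean of the two partial encodings, weights = sample counts.
mixed = (len(a) * encode(a, N=8) + len(b) * encode(b, N=8)) / (len(a) + len(b))
print(np.allclose(pooled, mixed))  # True
```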

Another consequence of the choice of kernels and their placement is that

$$\sum_{n=1}^{N} c_n = 1 \quad \text{if all } x^{(m)} \in [0; N-2) . \qquad (3.16)$$

The proof is easy and left to the reader. Another property that is easy to show is that $\cos^2$-kernels also have a constant sum of squares independent of $x$ [Nordberg et al., 1994]:

$$\sum_{n=1}^{N} K^2\!\left(x^{(m)} - n + 3/2\right) = 1/2 \quad \text{for all } x^{(m)} \in [0; N-2) . \qquad (3.17)$$

Note that this property does not hold for linear B-splines or any other continuous positive-definite kernel [Felsberg et al., 2015], but it obviously holds for the rectangular kernel at nearly all $x$.
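Both identities, the constant sum and the constant sum of squares of the $\cos^2$ channels, are easy to check numerically. The sketch below assumes $N = 8$, the channel offset 3/2 of the $\cos^2$ identity, the same assumed kernel $K(y) = \frac{2}{3}\cos^2(\pi y/3)$ on $|y| < 3/2$, and samples $x \in [0, N-2]$, where no channel is missing at the borders for this offset. For comparison it also encodes with a linear B-spline (hat function, offset 1) and a rectangular kernel (offset 1/2); these offsets and the sampling grid are our choices for illustration.

```python
import numpy as np

N = 8
n = np.arange(1, N + 1)                      # channel indices 1..N
x = np.linspace(0.0, N - 2, 1001)[:, None]   # samples in [0, N-2], one per row


def cos2_kernel(y):
    # assumed cos^2 kernel with support |y| < 3/2
    return np.where(np.abs(y) < 1.5, (2.0 / 3.0) * np.cos(np.pi * y / 3.0) ** 2, 0.0)


def linear_bspline(y):
    # hat function with support |y| < 1
    return np.maximum(0.0, 1.0 - np.abs(y))


def rect_kernel(y):
    # rectangular kernel on [-1/2, 1/2)
    return ((y >= -0.5) & (y < 0.5)).astype(float)


c_cos2 = cos2_kernel(x - n + 1.5)
c_lin = linear_bspline(x - n + 1.0)
c_rect = rect_kernel(x - n + 0.5)

print(np.allclose(c_cos2.sum(axis=1), 1.0))         # constant sum: True
print(np.allclose((c_cos2 ** 2).sum(axis=1), 0.5))  # constant sum of squares: True
print(np.ptp((c_lin ** 2).sum(axis=1)))             # close to 0.5, i.e., not constant
print(np.allclose((c_rect ** 2).sum(axis=1), 1.0))  # True
```

For the linear B-spline, the sum of squares is $(1-t)^2 + t^2$ with $t$ the fractional position between two channel centers, so it varies between 1/2 (midway) and 1 (at a center).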

So far, we have only encoded one-dimensional samples $x$, but before we consider multi-dimensional encodings, we will focus on one application: visual object tracking.

3.2 ENHANCED DISTRIBUTION FIELD TRACKING

Local averaging of channel representations within a bounding box has been used successfully in object tracking [Danelljan et al., 2015, Felsberg, 2013, Sevilla-Lara and Learned-Miller, 2012], i.e., for video analysis. Visual object tracking is defined as the causal, model-free sequential detection of a single object in an image sequence [Kristan, Leonardis, Matas, Felsberg, Pflugfelder, et al., 2016]. The only available information about the object to be tracked is its bounding box in the first frame; this is why the task is called generic or model-free. The tracking method may build an internal model from the bounding-box contents in the first frame and may also update this model on the fly, but only information from the previous frames and the current frame may be used (causality).

Enhanced distribution field tracking (EDFT, Felsberg [2013]) builds on distribution field tracking (DFT) by Sevilla-Lara and Learned-Miller [2012], a visual object tracking method based on comparing smoothed local histograms of the image patch inside the bounding box. The original DFT algorithm was developed independently of the concept of channel representations and differs from it mainly in one property, namely that the bins are smoothed after the accumulation.

The image intensity (gray scale) $I(i,j)$ is considered a stochastic variable, and its distribution is estimated by spatially weighted histograms using rectangular kernels $K(x)$ and $N = 16$ channels, where the channel coordinate $x^{(i,j)}$ is obtained from the intensity $I(i,j)$ (see (3.4)):

$$c_n = \sum_{i,j} w^{(i,j)} \, K\!\left(x^{(i,j)} - n + 17/32\right) , \qquad (3.18)$$

22 3. CHANNEL CODING OF FEATURES

where the offset has been computed according to (3.6) and $w^{(i,j)}$ denotes the spatial weights, a 2D Gaussian kernel $h_\sigma(i,j)$ with standard deviation $\sigma$.
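A minimal NumPy sketch of the spatially weighted histograms in (3.18). It assumes 8-bit intensities mapped to channel coordinates by $x^{(i,j)} = I(i,j)/16$; this mapping is our assumption, and with it the offset 17/32 reproduces the usual 16-bin histogram, bin $n$ collecting the intensities $16(n-1), \ldots, 16n-1$. Evaluating (3.18) at every pixel position, i.e., shifting the Gaussian weights across the image, amounts to Gaussian filtering of each binary channel layer. The value of $\sigma$, the zero padding at the image border, and all function names are illustrative assumptions.

```python
import numpy as np


def gaussian_kernel1d(sigma):
    # Sampled, normalized 1D Gaussian truncated at 3 sigma.
    radius = int(np.ceil(3.0 * sigma))
    t = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (t / sigma) ** 2)
    return k / k.sum()


def blur2d(img, sigma):
    # Separable Gaussian filtering with zero padding; the image must be
    # larger than the kernel along both axes.
    k = gaussian_kernel1d(sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)


def channel_field(I, N=16, sigma=2.0):
    """Spatially weighted rectangular-kernel histograms at every pixel: shape (H, W, N)."""
    x = I.astype(float) / 16.0          # assumed mapping of 8-bit intensities
    n = np.arange(1, N + 1)
    y = x[..., None] - n + 17.0 / 32.0  # argument of K in (3.18)
    layers = ((y >= -0.5) & (y < 0.5)).astype(float)  # rectangular kernel K
    return np.stack([blur2d(layers[..., ch], sigma) for ch in range(N)], axis=-1)


rng = np.random.default_rng(0)
I = rng.integers(0, 256, size=(40, 40))
c = channel_field(I)
print(c.shape)  # (40, 40, 16)
```

Away from the image border, the coefficients at each pixel sum to one, because every intensity falls into exactly one bin and the Gaussian weights are normalized.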

The original DFT approach is a multi-scale algorithm, typically using three different standard deviations, but to simplify the subsequent arguments, we ignore this detail and stick to the single-scale case.

The next step of DFT after accumulating histogram values locally is to smooth the coefficients $c_n$ along the channel index $n$ using a 1D Gaussian kernel, which results in the distribution field (DF) $d(i,j,n)$. This is the post-smoothing step mentioned previously.
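The post-smoothing step can be sketched as a 1D Gaussian convolution along the last (channel) axis of a field of coefficients with shape (H, W, N). The standard deviation along $n$ is a hypothetical parameter here, since no value is given in this section, and the zero padding beyond the first and last channel is our simplification.

```python
import numpy as np


def smooth_channels(c, sigma_n=1.0):
    """Post-smoothing as in DFT: convolve the coefficients along the channel axis n."""
    radius = int(np.ceil(3.0 * sigma_n))
    t = np.arange(-radius, radius + 1)
    g = np.exp(-0.5 * (t / sigma_n) ** 2)
    g /= g.sum()
    # Zero padding beyond channels 1 and N; N must exceed the kernel length.
    return np.apply_along_axis(lambda v: np.convolve(v, g, mode="same"), -1, c)


c = np.zeros((2, 2, 16))
c[..., 8] = 1.0           # a single occupied bin at every position
d = smooth_channels(c)    # the distribution field d(i, j, n)
print(np.round(d[0, 0, 5:12], 3))
```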

During tracking, the DF of the internal model, $d_{\text{model}}$, is compared to the DF of a bounding box in the current frame, $d_f$, within a local search window. The distance measure used is the sum of absolute differences, i.e.,

$$\delta(d_{\text{model}}, d_f) = \sum_{i,j,n} \left| d_{\text{model}}(i,j,n) - d_f(i,j,n) \right| . \qquad (3.19)$$

The displacement is estimated by a local search for the minimum $L_1$ error within a window of maximum displacement, a further parameter of the method, chosen as 30 pixels according to Sevilla-Lara and Learned-Miller [2012].
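The local search can be sketched as an exhaustive scan over all integer displacements up to the maximum displacement (30 pixels in the text), evaluating the sum of absolute differences (3.19) for each candidate box. The square search window, the skipping of boxes that leave the frame, and the function names are our assumptions for illustration.

```python
import numpy as np


def l1_distance(d_model, d_patch):
    # Sum of absolute differences over i, j and n, as in (3.19).
    return float(np.abs(d_model - d_patch).sum())


def local_search(d_frame, d_model, top_left, max_disp=30):
    """Return the top-left corner that minimizes the L1 error, and that error."""
    h, w, _ = d_model.shape
    H, W, _ = d_frame.shape
    r0, c0 = top_left
    best_err, best_pos = np.inf, top_left
    for dr in range(-max_disp, max_disp + 1):
        for dc in range(-max_disp, max_disp + 1):
            r, c = r0 + dr, c0 + dc
            if r < 0 or c < 0 or r + h > H or c + w > W:
                continue  # candidate box would leave the frame
            err = l1_distance(d_model, d_frame[r:r + h, c:c + w])
            if err < best_err:
                best_err, best_pos = err, (r, c)
    return best_pos, best_err


rng = np.random.default_rng(1)
d_frame = rng.random((60, 60, 16))      # DF of the current frame
d_model = d_frame[20:30, 25:35].copy()  # template cut out at the true position
pos, err = local_search(d_frame, d_model, top_left=(17, 22), max_disp=5)
print(pos, err)  # (20, 25) 0.0
```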

When the best-fitting position has been found, the current template $d_{\text{model},t}$ is updated with the current DF $d_f$ using the linear weights $\lambda = 0.95$ for the previous template and $1 - \lambda = 0.05$ for the novel patch:

$$d_{\text{model},t+1}(i,j,n) = \lambda \, d_{\text{model},t}(i,j,n) + (1 - \lambda) \, d_f(i,j,n) . \qquad (3.20)$$
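The linear update (3.20) amounts to exponential forgetting of old templates: after $k$ updates with the same patch, the initial template retains the weight $\lambda^k$. A minimal sketch with hypothetical names:

```python
import numpy as np


def linear_update(d_model, d_current, lam=0.95):
    """Template update (3.20): lam for the old template, 1 - lam for the new DF."""
    return lam * d_model + (1.0 - lam) * d_current


d_model = np.zeros((4, 4, 16))
d_current = np.ones((4, 4, 16))
for _ in range(10):
    d_model = linear_update(d_model, d_current)
# The remaining weight of the initial (all-zero) template is 0.95**10.
print(np.allclose(d_model, 1.0 - 0.95 ** 10))  # True
```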

Due to the density-based comparison, the method is robust against outliers, and due to the template update, it can also deal with continuous changes of object aspect and lighting [Sevilla-Lara and Learned-Miller, 2012].

EDFT enhances DFT in various ways [Felsberg, 2013], but most importantly, the post-smoothing of histograms (rectangular kernels) is replaced with a quadratic B-spline channel representation, i.e., pre-smoothing before binning, which results in significant improvements. In later versions of EDFT, $\cos^2$-kernels have been used instead, giving further improvements [Öfjäll and Felsberg, 2014b] that also rely on a modified model update (3.20) and distance measure (3.19).
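The difference between post-smoothing and pre-smoothing shows up for two values that fall into the same histogram bin: after binning, both yield the same histogram, so any subsequent smoothing along $n$ gives identical results, whereas a quadratic B-spline channel encoding still distinguishes them. The sketch assumes $N = 10$ channels, a rectangular kernel with offset 1/2, a quadratic B-spline with offset 3/2, and a hypothetical 3-tap filter standing in for the Gaussian post-smoothing.

```python
import numpy as np

N = 10
n = np.arange(1, N + 1)  # channel indices 1..N


def rect_kernel(y):
    # rectangular kernel on [-1/2, 1/2)
    return ((y >= -0.5) & (y < 0.5)).astype(float)


def bspline2(y):
    # quadratic B-spline with support |y| < 3/2
    a = np.abs(y)
    return np.where(a < 0.5, 0.75 - a ** 2,
                    np.where(a < 1.5, 0.5 * (a - 1.5) ** 2, 0.0))


def post_smoothed(x, taps=(0.25, 0.5, 0.25)):
    # histogram first (rectangular kernel), then smoothing along n
    return np.convolve(rect_kernel(x - n + 0.5), taps, mode="same")


def pre_smoothed(x):
    # smooth kernel applied before binning: quadratic B-spline channels
    return bspline2(x - n + 1.5)


x1, x2 = 4.05, 4.45  # both fall into bin n = 5
print(np.allclose(post_smoothed(x1), post_smoothed(x2)))  # True
print(np.allclose(pre_smoothed(x1), pre_smoothed(x2)))    # False
```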

The model update in (3.20) has been replaced with a power-update rule

$$c^{(i,j)}_{\text{model},t+1,n} = \left( \lambda \left(c^{(i,j)}_{\text{model},t,n}\right)^{q} + (1 - \lambda) \left(c^{(i,j)}_{f,n}\right)^{q} \right)^{1/q} \qquad (3.21)$$
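A sketch of the power update (3.21) for non-negative channel coefficients. The exponent $q$ is a free parameter here (this section does not state its value), and the names are hypothetical. For $q = 1$ the rule reduces to the linear update (3.20); if the model already equals the new coefficients, it is left unchanged for any $q$; and for $q > 1$ the result is never below the linear update (power-mean inequality).

```python
import numpy as np


def power_update(c_model, c_current, lam=0.95, q=2.0):
    """Power-update rule (3.21), applied elementwise to non-negative coefficients."""
    return (lam * c_model ** q + (1.0 - lam) * c_current ** q) ** (1.0 / q)


rng = np.random.default_rng(2)
c_model = rng.random((4, 4, 16))
c_current = rng.random((4, 4, 16))

linear = 0.95 * c_model + 0.05 * c_current
print(np.allclose(power_update(c_model, c_current, q=1.0), linear))  # True
print(np.allclose(power_update(c_model, c_model, q=3.0), c_model))   # True
```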

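and a coherence weight has been introduced into the distance measure (3.19):

$$\delta(c_{\text{model}}, c_f) = \sum_{i,j,n} \operatorname{coh}\!\left(c^{(i,j)}\right) \left| c^{(i,j)}_{\text{model},n} - c^{(i,j)}_{f,n} \right| . \qquad (3.22)$$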
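Since the coherence measure is only defined in Chapter 5, the following sketch of (3.22) takes the coherence weights as a given non-negative array with one value per position $(i, j)$; the array `coh` and the function name are placeholders, not the book's definitions. With all weights equal to one, (3.22) reduces to the plain sum of absolute differences (3.19), and a position with zero weight does not influence the distance at all.

```python
import numpy as np


def coherence_weighted_l1(c_model, c_frame, coh):
    """Distance (3.22): per-position weights coh[i, j] times |c_model - c_frame|, summed."""
    diff = np.abs(c_model - c_frame)  # shape (H, W, N)
    return float((coh[..., None] * diff).sum())


rng = np.random.default_rng(3)
c_model = rng.random((8, 8, 16))
c_frame = rng.random((8, 8, 16))

ones = np.ones((8, 8))
print(np.isclose(coherence_weighted_l1(c_model, c_frame, ones),
                 np.abs(c_model - c_frame).sum()))  # True

coh = np.ones((8, 8))
coh[0, 0] = 0.0                  # treat position (0, 0) as uninformative
before = coherence_weighted_l1(c_model, c_frame, coh)
c_frame[0, 0] += 10.0            # large change at the ignored position
after = coherence_weighted_l1(c_model, c_frame, coh)
print(np.isclose(before, after))  # True
```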

The coherence measure will be formally introduced in Chapter 5, but it can easily be explained using Figure 3.3. Coherent regions imply that the information in the distribution is highly discriminative. Incoherent regions imply that the distribution is uninformative. The discriminative
