A necessary condition for obtaining invariant scalar products is that the output space of the operator f forms a tight frame, i.e., the generalized Parseval identity applies. In certain cases, this can only be achieved approximately due to effects of the pixel grid, and the deviations from the invariance property can then be used to optimize discrete operators on the grid.
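In standard frame-theory notation, a family of analysis functions $\{f_k\}$ is a tight frame with frame bound $A$ if

$$\sum_k \left|\langle x, f_k\rangle\right|^2 = A\,\|x\|^2 \quad \text{for all } x,$$

and by polarization the scalar product is then preserved up to the constant $A$:

$$\langle x, y\rangle = \frac{1}{A}\sum_k \langle x, f_k\rangle\,\overline{\langle y, f_k\rangle}.$$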
For instance, the Scharr filter [Scharr et al., 1997] has been optimized by minimizing the anisotropy of the structure tensor. Note that different assumptions in the formulation of the scalar product, and thus different weights in the optimization, lead to different results; the Scharr filter might therefore not be the most isotropic choice [Felsberg, 2011].
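As an illustration, the following is a minimal NumPy sketch of the 3x3 Scharr kernels, built as the outer product of a derivative part and a smoothing part; the smoothing weights (3, 10, 3) are the result of the anisotropy-minimizing optimization mentioned above, while the use of scipy.signal.convolve2d and the symmetric boundary mode are incidental choices for the example:

    import numpy as np
    from scipy.signal import convolve2d

    # 3x3 Scharr kernels: outer product of the central-difference part
    # [1, 0, -1]/2 and the smoothing part [3, 10, 3]/16, whose weights
    # minimize the anisotropy of the resulting structure tensor.
    d = np.array([1.0, 0.0, -1.0]) / 2.0
    s = np.array([3.0, 10.0, 3.0]) / 16.0
    scharr_x = np.outer(s, d)   # derivative along x (columns), smoothing along y
    scharr_y = np.outer(d, s)   # derivative along y (rows), smoothing along x

    img = np.random.rand(64, 64)                                 # toy image
    gx = convolve2d(img, scharr_x, mode="same", boundary="symm")
    gy = convolve2d(img, scharr_y, mode="same", boundary="symm")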
Further practical problems, besides grid effects, are caused by the finite extent of image data. Global scale invariance can only be achieved if data is available on an infinite domain. Since this is impossible in practice, the image domain has to be extended by tricks such as periodic repetition (Fourier transform) or reflective boundaries (discrete cosine transform, DCT) of a rectangular domain. However, these tricks hamper proper rotation invariance. Using a circular domain with reflective boundaries theoretically solves all issues, but becomes infeasible to compute [Duits et al., 2003].
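The two extension tricks can be made concrete with NumPy's pad modes (a toy 1D sketch; the correspondence between DCT variants and half- vs. whole-sample even extension is the standard one):

    import numpy as np

    row = np.array([1.0, 2.0, 3.0, 4.0])
    np.pad(row, 2, mode="wrap")       # [3 4 1 2 3 4 1 2]: periodic (DFT)
    np.pad(row, 2, mode="symmetric")  # [2 1 1 2 3 4 4 3]: half-sample even (DCT-II)
    np.pad(row, 2, mode="reflect")    # [3 2 1 2 3 4 3 2]: whole-sample even (DCT-I)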
2.3 SPARSE REPRESENTATIONS, HISTOGRAMS, AND SIGNATURES
Obviously, useful features need to be feasible to compute. Depending on the application, the selection of features might be limited by real-time constraints, and this is actually one area where deep features are still problematic. Also, the space complexity of features, i.e., their memory consumption, might be decisive for the design. In the past, paradigms have regularly shifted back and forth between compact features and sparse features [Granlund, 2000a].
The kernel trick in support vector machines (SVMs) and Gaussian processes are examples of implicit high-dimensional spaces that are computationally dealt with in low-dimensional, nonlinear domains. In contrast, channel representations and convolutional networks generate explicit high-dimensional spaces. The community holds conflicting opinions in this respect, but recently, compactification of originally sparse and explicit features seems to be the most promising approach, as also confirmed by findings on deep features [Danelljan et al., 2017].
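A minimal sketch of this contrast for the degree-2 polynomial kernel: the implicit evaluation in the two-dimensional input space agrees exactly with an explicit three-dimensional feature map:

    import numpy as np

    x = np.array([1.0, 2.0])
    y = np.array([0.5, -1.0])

    # Implicit: kernel evaluated in the low-dimensional input space.
    k_implicit = (x @ y) ** 2

    # Explicit: the corresponding feature map phi(v) = (v1^2, sqrt(2) v1 v2, v2^2).
    def phi(v):
        return np.array([v[0] ** 2, np.sqrt(2.0) * v[0] * v[1], v[1] ** 2])

    k_explicit = phi(x) @ phi(y)
    assert np.isclose(k_implicit, k_explicit)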
Another strategy to improve computational feasibility and the memory footprint is to use
feedback loops in the feature extraction. Whereas deep features are typically feed-forward and
thus mostly do not exploit feedback, adaptive [Knutsson et al., 1983] or steerable [Freeman
and Adelson, 1991] filters are a well-established approach in designed feature extractors. In
relation to equivariance properties and factorized representations, adaptive filters often exploit
projections of the equivariant part of the representation, e.g., orientation vectors or structure
tensors.
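First-order Gaussian derivative filters illustrate the principle: a filter at an arbitrary orientation is a linear combination of the x- and y-derivative kernels, so an adaptive loop only has to feed back the angle, not a new filter. A minimal NumPy sketch (scale and kernel size are arbitrary choices for the example):

    import numpy as np

    def gauss_deriv_basis(sigma=1.5, radius=4):
        # x- and y-derivatives of a Gaussian as the steerable basis pair.
        ax = np.arange(-radius, radius + 1)
        xx, yy = np.meshgrid(ax, ax)
        g = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
        return -xx / sigma ** 2 * g, -yy / sigma ** 2 * g

    def steer(gx, gy, theta):
        # Derivative filter at orientation theta as a linear combination
        # of the two basis kernels [Freeman and Adelson, 1991].
        return np.cos(theta) * gx + np.sin(theta) * gy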
Alternatively, iterative methods such as diffusion filtering can be applied [Weickert, 1996], which potentially open up more efficient feature extraction using recurrent networks. The relationship between recurrent schemes and robust, unbiased feature extraction has been identified,
for instance for channel representations [Felsberg et al., 2015]. That work also makes explicit the connection to population codes and their readout [Denève et al., 1999].
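The recurrent flavor of such schemes can be indicated with the explicit iteration for plain linear diffusion, where every update plays the role of one recurrent step (a simplified sketch with periodic boundaries via np.roll; the nonlinear and anisotropic generalizations are the subject of Weickert's work):

    import numpy as np

    def diffuse(u, steps=10, tau=0.2):
        # Explicit scheme for u_t = laplace(u); stable for tau <= 0.25.
        u = u.astype(float).copy()
        for _ in range(steps):
            lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
                   np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4.0 * u)
            u += tau * lap
        return u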
Channel representations and population codes combine properties of a signature-based descriptor with those of a histogram-based descriptor; see Figure 2.2. Signature-based descriptors, e.g., speeded-up robust features (SURF) and 3D SURF as proposed by Bay et al. [2008] and Knopp et al. [2010], respectively, consist of an array of feature values¹ indexed over coordinates. Histogram-based descriptors, e.g., bag of visual words (BOV) and fast point feature histograms (FPFH) as proposed by Sivic and Zisserman [2003] and Rusu et al. [2009], respectively, contain the cardinality (counts) of feature values in dictionaries or histogram bins.
Figure 2.2: The two main classes of descriptors: signatures and histograms (here in a local 2D reference frame, RF). Figure based on Salti et al. [2014].
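The distinction can be made concrete with a toy patch of per-pixel trait values: a signature samples the values over coordinates and thus preserves spatial layout, whereas a histogram only counts how often values fall into bins and discards the coordinates (a sketch with invented data):

    import numpy as np

    rng = np.random.default_rng(0)
    patch = rng.uniform(0, np.pi, size=(16, 16))  # toy per-pixel trait, e.g., orientation

    # Signature: trait values indexed over (coarse) coordinates.
    signature = patch[::4, ::4].ravel()           # 4x4 spatial grid of samples

    # Histogram: counts of trait values in bins; coordinates are discarded.
    histogram, _ = np.histogram(patch, bins=8, range=(0, np.pi))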
In this text, the main difference in the use of histograms and dictionaries (visual vocabularies) is that the former are regularly placed in some space, whereas the latter are irregularly placed.

¹ In their terminology, Salti et al. [2014] refer to traits for what is called features elsewhere in this text.
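A hypothetical sketch of this difference: histogram bins tile the feature space on a regular grid, whereas the "bins" of a visual vocabulary are cluster centers learned from the data and hence irregularly placed (scikit-learn's KMeans is an incidental choice here):

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    feats = rng.normal(size=(500, 2))             # toy local feature vectors

    # Histogram: bins placed on a regular grid of the feature space.
    hist, _ = np.histogramdd(feats, bins=(8, 8))

    # Dictionary (visual vocabulary): bin centers learned by clustering,
    # hence irregularly placed in the feature space.
    km = KMeans(n_clusters=16, n_init=10).fit(feats)
    bov = np.bincount(km.labels_, minlength=16)   # BOV count vector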