A necessary condition for obtaining invariant scalar products is that the output space of the operator f forms a tight frame, i.e., the generalized Parseval identity applies. In certain cases, this can only be achieved approximately due to effects of the pixel grid, and the deviations from the invariance property can then be used to optimize discrete operators on the grid.
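In standard frame-theory notation, a family of analysis functions $\{f_k\}$ is a tight frame with frame bound $A$ if

$$\sum_k \left|\langle x, f_k\rangle\right|^2 = A\,\|x\|^2 \quad \text{for all } x,$$

and by polarization the scalar product is then preserved up to the constant $A$:

$$\langle x, y\rangle = \frac{1}{A}\sum_k \langle x, f_k\rangle\,\overline{\langle y, f_k\rangle}.$$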
For instance, the Scharr filter [Scharr et al., 1997] has been optimized by minimizing the anisotropy of the structure tensor. Note that different assumptions in the formulation of the scalar product, and thus different weights in the optimization, lead to different results; the Scharr filter might therefore not be the most isotropic choice [Felsberg, 2011].
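As an illustration, the following is a minimal NumPy sketch of the 3x3 Scharr kernels, built as the outer product of a derivative part and a smoothing part; the smoothing weights (3, 10, 3) are the result of the anisotropy-minimizing optimization mentioned above, while the use of scipy.signal.convolve2d and the symmetric boundary mode are incidental choices for the example:

    import numpy as np
    from scipy.signal import convolve2d

    # 3x3 Scharr kernels: outer product of the central-difference part
    # [1, 0, -1]/2 and the smoothing part [3, 10, 3]/16, whose weights
    # minimize the anisotropy of the resulting structure tensor.
    d = np.array([1.0, 0.0, -1.0]) / 2.0
    s = np.array([3.0, 10.0, 3.0]) / 16.0
    scharr_x = np.outer(s, d)   # derivative along x (columns), smoothing along y
    scharr_y = np.outer(d, s)   # derivative along y (rows), smoothing along x

    img = np.random.rand(64, 64)                                 # toy image
    gx = convolve2d(img, scharr_x, mode="same", boundary="symm")
    gy = convolve2d(img, scharr_y, mode="same", boundary="symm")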
Further practical problems, besides grid effects, are caused by the finite extent of image data. Global scale invariance can only be achieved if data is available on an infinite domain. Since this is impossible in practice, the image domain has to be extended by tricks such as periodic repetition (Fourier transform) or reflective boundaries (discrete cosine transform, DCT) of a rectangular domain. However, these tricks hamper proper rotation invariance. Using a circular domain with reflective boundaries theoretically solves all issues, but becomes infeasible to compute [Duits et al., 2003].
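The two extension tricks can be made concrete with NumPy's pad modes (a toy 1D sketch; the correspondence between DCT variants and half- vs. whole-sample even extension is the standard one):

    import numpy as np

    row = np.array([1.0, 2.0, 3.0, 4.0])
    np.pad(row, 2, mode="wrap")       # [3 4 1 2 3 4 1 2]: periodic (DFT)
    np.pad(row, 2, mode="symmetric")  # [2 1 1 2 3 4 4 3]: half-sample even (DCT-II)
    np.pad(row, 2, mode="reflect")    # [3 2 1 2 3 4 3 2]: whole-sample even (DCT-I)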
2.3 SPARSE REPRESENTATIONS, HISTOGRAMS, AND SIGNATURES
Obviously, useful features need to be feasible to compute. Depending on the application, the selection of features might be limited by real-time constraints, and this is actually one area where deep features are still problematic. Also, the space complexity of features, i.e., their memory consumption, might be decisive for the design. In the past, paradigms have regularly shifted back and forth between compact features and sparse features [Granlund, 2000a].
The kernel trick in support vector machines (SVMs) and Gaussian processes are examples of implicit high-dimensional spaces that are computationally dealt with in low-dimensional, nonlinear domains. In contrast, channel representations and convolutional networks generate explicit high-dimensional spaces. The community holds conflicting opinions in this respect, but recently, compactification of originally sparse and explicit features seems to be the most promising approach, as also confirmed by findings on deep features [Danelljan et al., 2017].
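A minimal sketch of this contrast for the degree-2 polynomial kernel: the implicit evaluation in the two-dimensional input space agrees exactly with an explicit three-dimensional feature map:

    import numpy as np

    x = np.array([1.0, 2.0])
    y = np.array([0.5, -1.0])

    # Implicit: kernel evaluated in the low-dimensional input space.
    k_implicit = (x @ y) ** 2

    # Explicit: the corresponding feature map phi(v) = (v1^2, sqrt(2) v1 v2, v2^2).
    def phi(v):
        return np.array([v[0] ** 2, np.sqrt(2.0) * v[0] * v[1], v[1] ** 2])

    k_explicit = phi(x) @ phi(y)
    assert np.isclose(k_implicit, k_explicit)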
Another strategy to improve computational feasibility and the memory footprint is to use
feedback loops in the feature extraction. Whereas deep features are typically feed-forward and
thus mostly do not exploit feedback, adaptive [Knutsson et al., 1983] or steerable [Freeman
and Adelson, 1991] filters are a well-established approach in designed feature extractors. In
relation to equivariance properties and factorized representations, adaptive filters often exploit
projections of the equivariant part of the representation, e.g., orientation vectors or structure
tensors.
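First-order Gaussian derivative filters illustrate the principle: a filter at an arbitrary orientation is a linear combination of the x- and y-derivative kernels, so an adaptive loop only has to feed back the angle, not a new filter. A minimal NumPy sketch (scale and kernel size are arbitrary choices for the example):

    import numpy as np

    def gauss_deriv_basis(sigma=1.5, radius=4):
        # x- and y-derivatives of a Gaussian as the steerable basis pair.
        ax = np.arange(-radius, radius + 1)
        xx, yy = np.meshgrid(ax, ax)
        g = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
        return -xx / sigma ** 2 * g, -yy / sigma ** 2 * g

    def steer(gx, gy, theta):
        # Derivative filter at orientation theta as a linear combination
        # of the two basis kernels [Freeman and Adelson, 1991].
        return np.cos(theta) * gx + np.sin(theta) * gy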
Alternatively, iterative methods such as diffusion filtering can be applied [Weickert, 1996], which potentially open up more efficient feature extraction using recurrent networks. The relationship between recurrent schemes and robust, unbiased feature extraction has been identified,
for instance for channel representations [Felsberg et al., 2015]. That work also makes explicit the connection to population codes and their readout [Denève et al., 1999].
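The recurrent flavor of such schemes can be indicated with the explicit iteration for plain linear diffusion, where every update plays the role of one recurrent step (a simplified sketch with periodic boundaries via np.roll; the nonlinear and anisotropic generalizations are the subject of Weickert's work):

    import numpy as np

    def diffuse(u, steps=10, tau=0.2):
        # Explicit scheme for u_t = laplace(u); stable for tau <= 0.25.
        u = u.astype(float).copy()
        for _ in range(steps):
            lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
                   np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4.0 * u)
            u += tau * lap
        return u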
Channel representations and population codes combine properties of a signature-based descriptor with those of a histogram-based descriptor; see Figure 2.2. Signature-based descriptors, e.g., speeded-up robust features (SURF) and 3D SURF as proposed by Bay et al. [2008] and Knopp et al. [2010], respectively, consist of an array of feature values¹ indexed over coordinates. Histogram-based descriptors, e.g., bag of visual words (BOV) and fast point feature histograms (FPFH) as proposed by Sivic and Zisserman [2003] and Rusu et al. [2009], respectively, contain the cardinality (counts) of feature values in dictionaries or histogram bins.
Figure 2.2: The two main classes of descriptors: signatures and histograms (here in a local 2D reference frame, RF). Figure based on Salti et al. [2014].
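The distinction can be made concrete with a toy patch of per-pixel trait values: a signature samples the values over coordinates and thus preserves spatial layout, whereas a histogram only counts how often values fall into bins and discards the coordinates (a sketch with invented data):

    import numpy as np

    rng = np.random.default_rng(0)
    patch = rng.uniform(0, np.pi, size=(16, 16))  # toy per-pixel trait, e.g., orientation

    # Signature: trait values indexed over (coarse) coordinates.
    signature = patch[::4, ::4].ravel()           # 4x4 spatial grid of samples

    # Histogram: counts of trait values in bins; coordinates are discarded.
    histogram, _ = np.histogram(patch, bins=8, range=(0, np.pi))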
In this text, the main difference in the use of histograms and dictionaries (visual vocabularies) is that the former are regularly placed in some space, whereas the latter are irregularly placed.

¹ In their terminology, Salti et al. [2014] refer to traits for what is called features elsewhere in this text.
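A hypothetical sketch of this difference: histogram bins tile the feature space on a regular grid, whereas the "bins" of a visual vocabulary are cluster centers learned from the data and hence irregularly placed (scikit-learn's KMeans is an incidental choice here):

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    feats = rng.normal(size=(500, 2))             # toy local feature vectors

    # Histogram: bins placed on a regular grid of the feature space.
    hist, _ = np.histogramdd(feats, bins=(8, 8))

    # Dictionary (visual vocabulary): bin centers learned by clustering,
    # hence irregularly placed in the feature space.
    km = KMeans(n_clusters=16, n_init=10).fit(feats)
    bov = np.bincount(km.labels_, minlength=16)   # BOV count vector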