where $\star$ denotes the convolution (thus scalar products over all translations), $R_\alpha$ a rotation by $\alpha$, and $\psi$ the kernel of the orientation score, the wavelet.
In order to enable well-posed reconstruction, $\psi$ must fulfill an energy constraint based on
$$M_\psi(u,v) = \frac{1}{N}\sum_{n=0}^{N-1} \bigl|\mathcal{F}(R_{2\pi n/N}\,\psi)(u,v)\bigr|^2, \qquad (3.24)$$
where $\mathcal{F}$ denotes the Fourier transform from $(i,j)$ to $(u,v)$.
If $M_\psi = 1$, the original function $x$ can be reconstructed as
$$\tilde{x}(i,j) = \sum_{n=0}^{N-1} U_\psi[x](i,j,n)\,. \qquad (3.25)$$
Orientation scores can also be extended to the similitude group, as suggested by Sharma and Duits [2015]. The approach is very similar to calculating a channel representation in the log-polar domain, see e.g., Öfjäll and Felsberg [2017]. For this approach, multi-dimensional channel coding is required.
3.4 MULTI-DIMENSIONAL CODING
In previous sections, only one-dimensional random variables x have been considered. The channel coefficients that are obtained by coding one-dimensional variables are basically stochastically independent, except for those with overlapping kernels, i.e., the two neighbors in case of cos²-kernels. In the subsequent considerations in this section, the effect of the overlapping kernels is ignored and all coefficients are considered independent, which is strictly true only for the rectangular kernel.
If multi-dimensional random vectors with independent components are encoded, all dimensions can be encoded and stored independently. After summing over all observed samples, the densities of the marginal distributions are obtained, and since those are independent, an estimate of the joint density is obtained by multiplying these marginal estimates. For two-dimensional samples $(x_1, x_2)$ with channel coefficients $(c_{1,n_1}, c_{2,n_2})$, we obtain by (3.15)
$$\mathrm{E}[c_{1,n_1}]\,\mathrm{E}[c_{2,n_2}] = \int p(x_1)\,K(x_1 - \xi_{1,n_1})\,\mathrm{d}x_1 \int p(x_2)\,K(x_2 - \xi_{2,n_2})\,\mathrm{d}x_2 \qquad (3.26)$$
$$= \iint p(x_1, x_2)\,K(x_1 - \xi_{1,n_1})\,K(x_2 - \xi_{2,n_2})\,\mathrm{d}x_1\,\mathrm{d}x_2 \qquad (3.27)$$
$$= \mathrm{E}[c_{1,n_1}\,c_{2,n_2}]\,, \qquad (3.28)$$
where $\xi_{1,n_1}$ and $\xi_{2,n_2}$ denote the respective channel centers.
One such case is color and orientation in a local region, see Figure 3.7.
In general, one might assume that color and orientation are mutually independent and
the joint distribution can be computed by multiplying the marginal distributions of color and
orientation.
Figure 3.7: Simultaneous encoding of color and orientation in a local image region. Figure taken from Jonsson [2008] courtesy Erik Jonsson. The red circle on the left indicates the considered local region and the color and orientation patterns on the right represent the channels of the marginal distributions. In the present example, color and orientation are not completely independent, such that the channel matrix, represented by the black blobs, cannot be factorized.
If the components of the random vector are not mutually independent, for instance be-
cause the observed objects, such as bananas or skyscrapers, lead to color-dependent orientation
distributions, the identity leading to (3.27) breaks. This effect has also been observed in the case of texture classification by Khan et al. [2013].
In the general case, the expectation of the product of channel coefficients can thus not be
computed from the product of their expectations,
$$\mathrm{E}[c_{1,n_1}]\,\mathrm{E}[c_{2,n_2}] \neq \mathrm{E}[c_{1,n_1}\,c_{2,n_2}]\,, \qquad (3.29)$$
implying that summation over the sample index must happen after the outer product of channel
vectors.
One such example is optical flow vectors, where moving edges and lines have highly correlated spatial displacements [Spies and Forssén, 2003]. Obviously, this leads to an exponential growth of the number of coefficients with the number of dimensions. If all d dimensions are encoded with the same number of channels N, the total number of coefficients becomes N^d.
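Continuing the sketch given after (3.28) (it reuses encode, rng, N, and centers from there), the following lines use an artificially coupled pair of variables as a stand-in for correlated flow components, form the outer product per sample before summing over the sample index, and show that the product of the marginals no longer matches the joint estimate, cf. (3.29).

```python
# Dependent components: x2 follows x1 up to noise (an illustrative coupling).
x1 = rng.uniform(1.0, N - 2.0, size=10_000)
x2 = np.clip(x1 + rng.normal(0.0, 0.3, size=10_000), 0.0, N - 1.0)

C1 = encode(x1, centers)                                  # (samples, N)
C2 = encode(x2, centers)

# Outer product per sample, then summation over samples: estimate of E[c_{1,n1} c_{2,n2}].
joint = (C1[:, :, None] * C2[:, None, :]).mean(axis=0)

# Product of the marginals discards the correlation structure, cf. (3.29).
factorized = np.outer(C1.mean(axis=0), C2.mean(axis=0))
print(np.abs(joint - factorized).max())                   # clearly non-zero here
```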
In practice, however, this is seldom a problem. In many cases, not all product terms stemming from the outer products are required, e.g., Granlund [2000b] suggests using only second-order terms, or the mutual dependency can be ignored altogether without compromising the results [Wallenberg et al., 2011].
In the latter work, RGB color and depth gradients (∆D), which both have a two-dimensional correlation structure, are channel coded and concatenated. This approach leads to improved segmentation performance, see Figure 3.8. Presumably, the segmentation performance could be further improved by replacing RGB values with color names, the previously mentioned 11-dimensional learned soft-assignment suggested by Van De Weijer et al. [2009], which also shares similarities with channel representations, but this pathway has not been further explored.
[Figure 3.8, bottom panel: bar chart of consensus scores (0–0.75) for plants 1–6, comparing Depth, Color, ∆depth, RGB+∆D, and channel-coded (CC) RGB+∆D.]
Figure 3.8: Top from left to right: plant 5 from the plant data set (Wallenberg et al. [2011]) RGB, depth, and ground truth segmentation. Bottom: consensus scores by various feature combinations. Images indicated with asterisks were used for parameter tuning of all compared methods. Note that although RGB+∆D and channel-coded RGB+∆D have similar results on the tuning images, channel-coded RGB+∆D has a higher score on all evaluation images.
Even if all possible combinations of channel coefficients are taken into account, the total
number of non-zero products has an upper bound which is linear in the number of observed
samples, as observed by Felsberg and Granlund [2006].
In that work, three performance experiments have been conducted where a number M of samples in spaces of dimension d = 1, ..., 9 have been encoded with N channels in each dimension, see Figure 3.9. The comparison of time consumption for M = 5,000 and M = 10,000 samples and N = 10 shows a proportional increase with the number of samples. The comparison of N = 10 and N = 20 for M = 10,000 samples shows no increase of time consumption. Thus, if sparse data structures are used to store the coefficient matrices, a brute-force outer product of all dimensions is feasible and still allows for real-time performance [Felsberg and Hedborg, 2007b, Pagani et al., 2009].
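A minimal sketch of such a sparse accumulation is given below, assuming cos²-kernels and samples that are already mapped to channel coordinates; the dictionary keyed by channel-index tuples is one possible sparse data structure, not necessarily the one used in the cited implementations. Since each sample activates at most three channels per dimension, the number of stored products per sample is bounded by 3^d, so the total number of non-zero entries grows at most linearly with the number of samples.

```python
import numpy as np
from collections import defaultdict

def cos2_kernel(d):
    return np.where(np.abs(d) < 1.5, (2.0 / 3.0) * np.cos(np.pi * d / 3.0) ** 2, 0.0)

def sparse_joint_encode(X, N):
    """Accumulate the d-dimensional outer-product channel representation of the
    samples X (shape (M, d), in channel coordinates) in a dictionary keyed by
    channel-index tuples; only non-zero products are ever stored."""
    centers = np.arange(N, dtype=float)
    coeffs = defaultdict(float)
    for x in X:
        w = [cos2_kernel(xi - centers) for xi in x]       # per-dimension channel weights
        active = [np.nonzero(wk)[0] for wk in w]          # at most 3 active channels each
        grids = np.meshgrid(*active, indexing="ij")       # outer product of active indices
        for idx in zip(*(g.ravel() for g in grids)):
            key = tuple(int(i) for i in idx)
            coeffs[key] += float(np.prod([w[k][i] for k, i in enumerate(idx)]))
    return coeffs

# Example: 1,000 samples in d = 4 dimensions with N = 10 channels per dimension;
# a dense array would hold 10^4 coefficients, the dictionary only the observed ones.
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 8.0, size=(1_000, 4))
coeffs = sparse_joint_encode(X, N=10)
print(len(coeffs))
```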
Interestingly, the absence of observations for most of the products takes on completely different meanings in a frequentist and a Bayesian interpretation: The former simply assumes zero probability, whereas the latter computes a non-zero probability depending on the assumed prior
Figure 3.9: Time consumption for encoding, implemented in Matlab on a Powerbook with 1 GHz PPC. Ordinate: dimensions d = 1, ..., 9, abscissa: time in seconds (averaged over 10 runs). Dashed line: 5,000 samples, 10^d channels. Solid line: 10,000 samples, 10^d channels. Dotted line: 10,000 samples, 20^d channels. Figure from Felsberg and Hedborg [2007b].
(e.g., a Dirichlet distribution, see also Chapter 6) and the total number of observations falling
into other bins. However, in either case, the estimate for the density can be computed from the
sparse data structure.
With P-channels [Felsberg and Granlund, 2006], a computationally more efficient approximation of linear B-spline kernels has been introduced. Besides these computational considerations, explicit spatial pooling has also been suggested.
Instead of computing local averages over image regions and maintaining the image res-
olution, as e.g., suggested by Felsberg and Granlund [2003], the channel representations are
simultaneously low-pass filtered and down-sampled, reducing the computational effort further.
This spatial pooling is explicit, because the local coordinates are also considered as part of the random vector. The vector now contains a part with spatial coordinates and a feature part, together forming the spatio-featural space proposed by Jonsson and Felsberg [2009]. This approach will be further considered in the subsequent chapter.
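As a rough illustration of such pooling (the box filter, pooling factor, and feature map below are assumptions; the cited works may use a different low-pass filter), a per-pixel channel encoding of a scalar feature map can be averaged and subsampled per channel plane. Block averaging corresponds to encoding the spatial coordinates with rectangular kernels; smoother spatial kernels would make neighboring cells overlap.

```python
import numpy as np

def cos2_kernel(d):
    return np.where(np.abs(d) < 1.5, (2.0 / 3.0) * np.cos(np.pi * d / 3.0) ** 2, 0.0)

def channel_planes(feature_map, N):
    """Per-pixel channel encoding of a scalar feature map, assumed to be scaled
    to channel coordinates [0, N-1]; result has shape (H, W, N)."""
    centers = np.arange(N, dtype=float)
    return cos2_kernel(feature_map[..., None] - centers)

def pool_and_downsample(planes, factor=4):
    """Simultaneous low-pass filtering and down-sampling of the channel planes.
    Box averaging is used here as a simple stand-in for the low-pass filter."""
    H, W, N = planes.shape
    H2, W2 = H // factor, W // factor
    cropped = planes[:H2 * factor, :W2 * factor]
    return cropped.reshape(H2, factor, W2, factor, N).mean(axis=(1, 3))

# Example: a 64x64 "orientation" map encoded with 8 channels and pooled by 4,
# giving a 16x16 grid of channel vectors (a spatio-featural representation).
rng = np.random.default_rng(0)
fmap = rng.uniform(0.0, 7.0, size=(64, 64))
pooled = pool_and_downsample(channel_planes(fmap, N=8), factor=4)
print(pooled.shape)   # (16, 16, 8)
```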
SUMMARY
In this chapter we have introduced the channel encoding of visual features. The resulting representation is an efficient approximation of a kernel density estimator for the distribution of
the encoded features. We have related the approach to population coding, DFs, and orientation
scores. In the subsequent chapter we will now focus on spatial maps of channel encoded features,
i.e., CCFMs and their relation to popular 2D features such as SIFT and HOG and 3D features
such as SHOT.