Similarly, we can compute a closed-form solution for the Hellinger divergence of Dirichlet distributions
\[
H_\alpha\bigl(p(P \mid c)\,\big\|\,p(P \mid c')\bigr)
= \frac{1}{\alpha-1}\left(\frac{B(c')^{\alpha-1}\,B(c_\alpha)}{B(c)^{\alpha}} - 1\right),
\qquad (6.26)
\]
but for numerical reasons, the Rényi divergence is preferable, as it can be computed using log-Beta functions $\mathrm{betaln}(\cdot)$
\[
R_\alpha\bigl(p(P \mid c)\,\big\|\,p(P \mid c')\bigr)
= \mathrm{betaln}(c') + \frac{1}{\alpha-1}\,\mathrm{betaln}(c_\alpha) - \frac{\alpha}{\alpha-1}\,\mathrm{betaln}(c).
\qquad (6.27)
\]
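To make the numerical argument concrete, the following Python sketch evaluates (6.26) and (6.27) with SciPy. The helper betaln_vec and the function names are illustrative assumptions: SciPy's betaln is bivariate, so the multivariate log-Beta is assembled from gammaln, and $c_\alpha$ is taken to be the component-wise mixture $\alpha c + (1-\alpha)c'$, an interpretation assumed here and consistent with (6.28) below.

```python
import numpy as np
from scipy.special import gammaln

def betaln_vec(c):
    """Multivariate log-Beta: log B(c) = sum_n log Gamma(c_n) - log Gamma(|c|)."""
    c = np.asarray(c, dtype=float)
    return np.sum(gammaln(c)) - gammaln(np.sum(c))

def renyi_dirichlet(c, c_prime, alpha=0.5):
    """Renyi divergence R_alpha(Dir(c) || Dir(c')) as in (6.27), for alpha != 1."""
    c, c_prime = np.asarray(c, float), np.asarray(c_prime, float)
    c_alpha = alpha * c + (1.0 - alpha) * c_prime   # assumed component-wise mixture
    return (betaln_vec(c_prime)
            + betaln_vec(c_alpha) / (alpha - 1.0)
            - alpha / (alpha - 1.0) * betaln_vec(c))

def hellinger_dirichlet(c, c_prime, alpha=0.5):
    """Hellinger divergence H_alpha as in (6.26); the explicit exponential can
    over-/underflow for large channel vectors, unlike the log-domain form (6.27)."""
    c, c_prime = np.asarray(c, float), np.asarray(c_prime, float)
    c_alpha = alpha * c + (1.0 - alpha) * c_prime
    log_ratio = ((alpha - 1.0) * betaln_vec(c_prime)
                 + betaln_vec(c_alpha)
                 - alpha * betaln_vec(c))
    return (np.exp(log_ratio) - 1.0) / (alpha - 1.0)
```

For large channel vectors, hellinger_dirichlet can overflow or underflow in the exponential, whereas renyi_dirichlet stays entirely in the log domain, which is the numerical advantage referred to above.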
A further reason to prefer the Rényi divergence over the Hellinger divergence is the respective sensitivity to small perturbations. The derivative of the Hellinger divergence with respect to single channel coefficients scales with the divergence itself, which leads to a lack of robustness. In contrast,
\[
\frac{\partial R_\alpha\bigl(p(P \mid c)\,\big\|\,p(P \mid c')\bigr)}{\partial c'_n}
= \psi(c'_n) - \psi(|c'|) + \psi(|c_\alpha|) - \psi(c_{\alpha,n}),
\qquad (6.28)
\]
where $\psi(c_n) = \Gamma'(c_n)/\Gamma(c_n)$ is the digamma function; see, e.g., Van Trees et al. [2013, p. 104].
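As a quick sanity check of (6.28), the gradient can be written with scipy.special.digamma and compared against a finite difference of the betaln-based (6.27). The function name renyi_grad_cprime and the test vectors are made up for illustration; the sketch reuses renyi_dirichlet from the example after (6.27).

```python
import numpy as np
from scipy.special import digamma

def renyi_grad_cprime(c, c_prime, alpha=0.5):
    """Gradient of R_alpha(Dir(c) || Dir(c')) with respect to c', cf. (6.28)."""
    c, c_prime = np.asarray(c, float), np.asarray(c_prime, float)
    c_alpha = alpha * c + (1.0 - alpha) * c_prime
    return (digamma(c_prime) - digamma(c_prime.sum())
            + digamma(c_alpha.sum()) - digamma(c_alpha))

# Finite-difference check on made-up channel vectors
c  = np.array([2.0, 3.0, 1.5])
cp = np.array([1.0, 4.0, 2.0])
n, eps = 1, 1e-6
cp_eps = cp.copy()
cp_eps[n] += eps
numeric  = (renyi_dirichlet(c, cp_eps) - renyi_dirichlet(c, cp)) / eps
analytic = renyi_grad_cprime(c, cp)[n]
print(numeric, analytic)   # the two values should agree to about 1e-6
```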
Thus, the Rényi divergences of the posteriors estimated from $c$ and $c'$ are candidates for suitable distance measures. Unfortunately, robustness against outliers is still limited, and the introduction of an outlier process as in (6.25) is analytically cumbersome.
Also, the fully symmetric setting is less common in practice, but occurs, e.g., in the computation of affinity matrices for spectral clustering. Most cases aim at the comparison of a new measurement with previously acquired ones, for which the posterior predictive (6.25) is more suitable.
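For the fully symmetric setting, one possible way to build an affinity matrix for spectral clustering from channel vectors is sketched below: pairwise Rényi divergences, symmetrized and mapped through an exponential kernel. The kernel choice, the scale sigma, and the function name are assumptions for illustration, not a prescription from the text; renyi_dirichlet is reused from the sketch after (6.27).

```python
import numpy as np

def affinity_matrix(C, alpha=0.5, sigma=1.0):
    """Pairwise affinities between the rows of C (one channel vector per row),
    from symmetrized Renyi divergences passed through an exponential kernel
    (illustrative choice of kernel and scale)."""
    C = np.asarray(C, dtype=float)
    K = len(C)
    A = np.eye(K)                       # zero self-divergence -> affinity 1 on the diagonal
    for i in range(K):
        for j in range(i + 1, K):
            d = 0.5 * (renyi_dirichlet(C[i], C[j], alpha)
                       + renyi_dirichlet(C[j], C[i], alpha))
            A[i, j] = A[j, i] = np.exp(-d / sigma)
    return A
```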
The proposed distances have been discussed assuming one-dimensional distributions, but the results generalize to higher dimensions, both for independent and dependent stochastic variables. Obviously, it is an advantage to have uniform marginals, and in that case, dependent joint distributions correspond to a non-constant copula distribution; see Section 6.4.
6.4 UNIFORMIZATION AND COPULA ESTIMATION
As mentioned in Section 6.1, we often assume a uniform prior for the channel vector. However,
if we compute the marginal distribution from the posterior distribution for a large dataset, the
components of a channel vector might be highly unbalanced. This issue can be addressed by
placing channels in a non-regular way, according to the marginal distribution, i.e., with high
channel density where samples are likely.
This placement is obtained by mapping samples through the cumulative distribution function of the distribution from which the samples are drawn. The cumulative distribution function can be computed from the estimated distribution as obtained from maximum entropy decoding; see Section 5.3.
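A minimal sketch of this uniformization is given below, assuming the estimated density is available as values on a grid; the grid representation, the trapezoidal CDF, and the function name uniformize are illustrative choices, not the maximum entropy decoding procedure of Section 5.3 itself.

```python
import numpy as np

def uniformize(samples, density, grid):
    """Map samples to [0, 1] through the CDF of a density given as values on a grid,
    e.g. a density estimate such as the one obtained by maximum entropy decoding."""
    samples, density, grid = (np.asarray(a, dtype=float) for a in (samples, density, grid))
    # Cumulative distribution function by trapezoidal integration of the density
    cdf = np.concatenate(([0.0],
                          np.cumsum(0.5 * (density[1:] + density[:-1]) * np.diff(grid))))
    cdf /= cdf[-1]                        # enforce CDF(grid[-1]) = 1
    return np.interp(samples, grid, cdf)  # uniformized sample positions
```

Regularly spaced channels applied to the uniformized samples then correspond to non-regularly placed channels in the original domain, with high channel density where samples are likely.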
The subsequent procedure has been proposed by Öfjäll and Felsberg [2017].