is the continuous generalization of the factorial function $(s-1)!$ for $s \in \mathbb{N}$.
Obviously, we are interested in the posterior rather than the likelihood and therefore we
also have to consider the prior of the multinomial distribution. The conjugate prior of the
multinomial distribution is the Dirichlet distribution
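\[
p(\mathbf{P}) = \mathrm{Dir}(\mathbf{P}|\boldsymbol{\alpha}) = \frac{\Gamma\left(\sum_{n=1}^{N} \alpha_n\right)}{\prod_{n=1}^{N} \Gamma(\alpha_n)} \prod_{n=1}^{N} P_n^{\alpha_n - 1} , \tag{6.6}
\]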
where the concentration parameters $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_N)$ are positive reals and small values prefer
sparse distributions [Hutter, 2013]. Since we do not have a reason to assume different initial
concentrations for different bins, we consider the symmetric Dirichlet distribution as prior, i.e.,
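$\alpha_1 = \ldots = \alpha_N = \alpha$,
\[
\mathrm{Dir}(\mathbf{P}|\alpha) = \frac{\Gamma(\alpha N)}{\Gamma(\alpha)^{N}} \prod_{n=1}^{N} P_n^{\alpha - 1} . \tag{6.7}
\]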
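The remark that small concentration values prefer sparse distributions can be illustrated numerically. The following is a minimal sketch, not part of the original text: it draws samples from the symmetric Dirichlet distribution by normalizing Gamma variates (a standard construction) and reports the average size of the largest bin. The function names, the number of bins, and the two values of $\alpha$ are hypothetical choices made for illustration only.

```python
import random

def sample_symmetric_dirichlet(alpha, n_bins, rng):
    # Normalizing independent Gamma(alpha, 1) variates yields a Dir(P | alpha) sample.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in range(n_bins)]
    total = sum(gammas)
    return [g / total for g in gammas]

def mean_largest_bin(alpha, n_bins, n_samples=2000, seed=0):
    # Average of max_n P_n over many draws; values near 1 indicate sparse P.
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_samples):
        acc += max(sample_symmetric_dirichlet(alpha, n_bins, rng))
    return acc / n_samples

# Illustrative (assumed) settings: 8 bins, one small and one large alpha.
sparse = mean_largest_bin(alpha=0.1, n_bins=8)   # small alpha: mass concentrates in few bins
dense = mean_largest_bin(alpha=10.0, n_bins=8)   # large alpha: samples stay close to uniform
print(f"alpha=0.1: {sparse:.2f}, alpha=10: {dense:.2f}")
```

For small $\alpha$ the largest bin typically carries most of the probability mass, whereas for large $\alpha$ it stays close to $1/N$.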
Thus, the posterior distribution $p(\mathbf{P}|\mathbf{c})$ is proportional to
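\begin{align}
p(\mathbf{c}|\mathbf{P})\,\mathrm{Dir}(\mathbf{P}|\alpha) &= \frac{\Gamma(M+1)}{\prod_{n=1}^{N} \Gamma(c_n+1)} \prod_{n=1}^{N} P_n^{c_n} \, \frac{\Gamma(\alpha N)}{\Gamma(\alpha)^{N}} \prod_{n=1}^{N} P_n^{\alpha-1} \tag{6.8} \\
&= \frac{\Gamma(M+1)}{\prod_{n=1}^{N} \Gamma(c_n+1)} \, \frac{\Gamma\left(\sum_{n=1}^{N} \alpha\right)}{\prod_{n=1}^{N} \Gamma(\alpha)} \prod_{n=1}^{N} P_n^{c_n+\alpha-1} \tag{6.9} \\
&\propto \frac{\Gamma\left(\sum_{n=1}^{N} (c_n+\alpha)\right)}{\prod_{n=1}^{N} \Gamma(c_n+\alpha)} \prod_{n=1}^{N} P_n^{c_n+\alpha-1} \tag{6.10} \\
&= \mathrm{Dir}(\mathbf{P}|\mathbf{c}+\alpha) , \tag{6.11}
\end{align}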
and the posterior distribution is a Dirichlet distribution with concentration parameter vector
$\mathbf{c} + \alpha$. The posterior distribution is useful because it allows us to compute divergences between
histograms in a statistically correct sense, from a Bayesian point of view; see Section 6.3.
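The conjugacy argument can be checked numerically. The sketch below is an illustration under assumptions, not the author's implementation: it evaluates the logarithm of the multinomial likelihood times the symmetric Dirichlet prior and subtracts the logarithm of a Dirichlet density with concentration vector $\mathbf{c} + \alpha$. If the two are proportional, this difference must not depend on $\mathbf{P}$. The example histogram, the value of $\alpha$, and all function names are hypothetical; $M$ is taken to be the sum of the bin counts.

```python
import math

def log_dirichlet(p, conc):
    # Log density of Dir(P | conc) for a general concentration vector.
    return (math.lgamma(sum(conc))
            - sum(math.lgamma(a) for a in conc)
            + sum((a - 1.0) * math.log(x) for a, x in zip(conc, p)))

def log_multinomial(counts, p):
    # Log multinomial likelihood p(c | P), assuming M = sum of the counts.
    m = sum(counts)
    return (math.lgamma(m + 1)
            - sum(math.lgamma(c + 1) for c in counts)
            + sum(c * math.log(x) for c, x in zip(counts, p)))

counts = [3, 0, 5, 2]                        # hypothetical histogram c
alpha = 0.5                                  # hypothetical symmetric concentration
prior = [alpha] * len(counts)
posterior = [c + alpha for c in counts]      # concentration vector c + alpha

def log_ratio(p):
    # log[ p(c|P) Dir(P|alpha) ] - log Dir(P|c+alpha); constant in P if conjugacy holds.
    return log_multinomial(counts, p) + log_dirichlet(p, prior) - log_dirichlet(p, posterior)

r_uniform = log_ratio([0.25, 0.25, 0.25, 0.25])
r_skewed = log_ratio([0.1, 0.2, 0.3, 0.4])
print(abs(r_uniform - r_skewed) < 1e-9)
```

The constant difference is the logarithm of the normalization that was dropped when moving to the proportionality.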
From the posterior distribution we can compute the posterior predictive $p(\mathbf{c}'|\mathbf{c})$ by
integrating the product of likelihood and posterior distribution over the probability simplex $S_N$,
resulting in a predicted histogram; see Section 6.2.
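For a multinomial likelihood and a Dirichlet posterior, this integral has a well-known closed form, the Dirichlet-multinomial distribution. The following sketch assumes that form and is not taken from Section 6.2; the function name, the example counts, and the value of $\alpha$ are hypothetical. For a single new observation the predictive probability of bin $n$ reduces to $(c_n + \alpha)/(M + N\alpha)$.

```python
import math
from itertools import product

def log_posterior_predictive(new_counts, counts, alpha):
    # Dirichlet-multinomial log probability of new_counts given observed counts,
    # using the posterior concentration vector c + alpha.
    conc = [c + alpha for c in counts]
    conc_sum = sum(conc)
    m_new = sum(new_counts)
    return (math.lgamma(m_new + 1)
            - sum(math.lgamma(k + 1) for k in new_counts)
            + math.lgamma(conc_sum) - math.lgamma(conc_sum + m_new)
            + sum(math.lgamma(k + a) - math.lgamma(a) for k, a in zip(new_counts, conc)))

counts = [3, 0, 5]   # hypothetical observed histogram c, so M = 8 and N = 3
alpha = 0.5          # hypothetical symmetric concentration

# Predictive probabilities for one new sample falling into each bin.
single = [math.exp(log_posterior_predictive(e, counts, alpha))
          for e in ([1, 0, 0], [0, 1, 0], [0, 0, 1])]

# Sanity check: probabilities of all histograms with two new samples sum to one.
total = sum(math.exp(log_posterior_predictive(k, counts, alpha))
            for k in product(range(3), repeat=3) if sum(k) == 2)
print(single, total)
```

Note that the empty second bin still receives nonzero predictive probability, which is the effect of the prior concentration $\alpha$.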
Finally, by integrating the posterior distribution over the probability simplex $S_N$, we
obtain the marginal distribution of the data, relevant for determining the correct Copula (in a
Bayesian sense); see Section 6.4.
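As a worked equation, and reading "posterior" here as the unnormalized product of likelihood and symmetric Dirichlet prior, the integral over the simplex can be evaluated with the standard Dirichlet normalization. The result below is the usual Dirichlet-multinomial marginal, stated as a sketch under that assumption and with $M = \sum_{n=1}^{N} c_n$; it is not quoted from Section 6.4.

```latex
\[
p(\mathbf{c}) = \int_{S_N} p(\mathbf{c}|\mathbf{P})\,\mathrm{Dir}(\mathbf{P}|\alpha)\,\mathrm{d}\mathbf{P}
= \frac{\Gamma(M+1)}{\prod_{n=1}^{N} \Gamma(c_n+1)} \,
  \frac{\Gamma(\alpha N)}{\Gamma(\alpha)^{N}} \,
  \frac{\prod_{n=1}^{N} \Gamma(c_n+\alpha)}{\Gamma(M+\alpha N)}
\]
```

The last factor is the inverse of the normalization constant of a Dirichlet distribution with concentration parameter vector $\mathbf{c} + \alpha$.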
So far, we have only been considering histograms, i.e., rectangular kernel functions.
Following the arguments of Scott [1992], the same statistical properties are obtained for linear
interpolation between histogram bins. In the most general case, however, and in particular for
the most useful kernels such as $\cos^2$, the correlation between neighboring bins violates the
assumption about independent events for the respective bins.