The previous example uses the Gaussian distribution for features that are essentially binary (UP = 1 and DOWN = 0), representing the direction of the change in value. The mean value is computed as the ratio of the number of observations for which x_i = UP to the total number of observations.
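As a minimal sketch of that mean computation, assume the observations of one feature are stored as a Seq[Int] of 0/1 values. The names Direction, BinaryMean, and bernoulliMean are hypothetical and do not come from the original code.

```scala
// Hypothetical encoding of the direction of change of a technical indicator
object Direction {
  val UP: Int = 1
  val DOWN: Int = 0
}

object BinaryMean {
  import Direction._

  // Mean of a binary feature: number of observations with x_i = UP
  // divided by the total number of observations.
  def bernoulliMean(observations: Seq[Int]): Double = {
    require(observations.nonEmpty, "bernoulliMean: no observations")
    require(observations.forall(x => x == UP || x == DOWN),
      "bernoulliMean: values must be UP or DOWN")
    observations.count(_ == UP).toDouble / observations.size
  }
}
```

For example, bernoulliMean(Seq(1, 0, 1, 1)) returns 0.75. Because the values are 0 or 1, this ratio is simply the sample mean.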
As stated in the first section, the Gaussian distribution is more appropriate for continuous features, or for binary features only when the labeled dataset is very large. Because its features are binary, this example is a natural candidate for the Bernoulli model.
The Bernoulli model differs from the Naïve Bayes classifier in that it penalizes a feature x that has no observation, whereas the Naïve Bayes classifier ignores it [5:10].
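To make the difference concrete, the following sketch scores a binary observation vector against per-feature probabilities in two ways. The first is the Bernoulli log-likelihood, which adds log(1 - p_i) for every absent feature. The second is a scoring that only accounts for present features, as the text describes for the Naïve Bayes classifier. The object and function names and the log-space formulation are assumptions made for illustration.

```scala
object AbsentFeatures {
  // Bernoulli log-likelihood: every feature contributes. Present features
  // (x_i = 1) add log(p_i); absent features (x_i = 0) add log(1 - p_i).
  def bernoulliLogLikelihood(p: Seq[Double], x: Seq[Int]): Double = {
    require(p.size == x.size, "bernoulliLogLikelihood: size mismatch")
    p.zip(x).map { case (pi, xi) =>
      if (xi == 1) math.log(pi) else math.log(1.0 - pi)
    }.sum
  }

  // Scoring that ignores absent features: only x_i = 1 contributes log(p_i).
  def presentOnlyLogLikelihood(p: Seq[Double], x: Seq[Int]): Double = {
    require(p.size == x.size, "presentOnlyLogLikelihood: size mismatch")
    p.zip(x).collect { case (pi, 1) => math.log(pi) }.sum
  }
}
```

With p = Seq(0.9, 0.8) and x = Seq(1, 0), the Bernoulli score is log(0.9) + log(0.2), while the present-only score is just log(0.9). The missing second feature lowers the Bernoulli score because that feature was expected with probability 0.8.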
The implementation of the Bernoulli model consists of modifying the score function in the Likelihood class to use the Bernoulli density method, bernoulli, defined in the Stats object:
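```scala
object Stats {
  // Direct form: returns mean when p = 1 (UP) and 1 - mean when p = 0 (DOWN)
  def bernoulli(mean: Double, p: Int): Double = mean*p + (1-mean)*(1-p)
  // Density signature (Double*) => Double: x(0) is the mean, x(1) the binary value
  def bernoulli(x: Double*): Double = bernoulli(x(0), x(1).toInt)
  …
}
```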
The first version of bernoulli is a direct implementation of the mathematical formula M8. The second version has the signature of the Density type, (Double*) => Double.
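The benefit of the uniform signature is that a caller can invoke any density the same way. The sketch below reproduces the two bernoulli overloads and adds a hypothetical gauss density with the same varargs convention. Representing a density as a Seq[Double] => Double function value is an assumption here, not necessarily how the original Density type is declared.

```scala
object DensitySketch {
  // Direct implementation: mean if p == 1, (1 - mean) if p == 0
  def bernoulli(mean: Double, p: Int): Double = mean * p + (1 - mean) * (1 - p)

  // Uniform signature: x(0) is the mean, x(1) the binary value (1.0 or 0.0)
  def bernoulli(x: Double*): Double = bernoulli(x(0), x(1).toInt)

  // Hypothetical Gaussian density with the same uniform signature:
  // x(0) = mean, x(1) = standard deviation, x(2) = value
  def gauss(x: Double*): Double = {
    val (mean, stdDev, value) = (x(0), x(1), x(2))
    val z = (value - mean) / stdDev
    math.exp(-0.5 * z * z) / (stdDev * math.sqrt(2.0 * math.Pi))
  }

  // A caller that accepts any density through the same calling convention
  def evaluate(density: Seq[Double] => Double, args: Double*): Double = density(args)

  // Function values wrapping the varargs methods
  val bernoulliDensity: Seq[Double] => Double = xs => bernoulli(xs: _*)
  val gaussDensity: Seq[Double] => Double = xs => gauss(xs: _*)
}
```

For example, evaluate(bernoulliDensity, 0.75, 1.0) returns 0.75, and swapping in gaussDensity requires no change to evaluate. Overload resolution selects bernoulli(mean, p) for a call such as bernoulli(0.75, 1). A call whose second argument is a Double, such as bernoulli(0.75, 0.0), goes through the varargs version.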
The mean value is the same as in the Gaussian density function. The binary feature is implemented as an Int, with the value UP = 1 for the upward direction of the financial technical indicator and DOWN = 0 for the downward direction.
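The sketch below shows how a per-class score could use the Bernoulli density with this encoding. BernoulliLikelihood, its fields, and classify are hypothetical stand-ins for the Likelihood class and its score function. Working in log space and omitting smoothing are also assumptions: a mean of exactly 0 or 1 yields log(0), which is negative infinity.

```scala
object BernoulliScoring {
  val UP: Int = 1
  val DOWN: Int = 0

  def bernoulli(mean: Double, p: Int): Double = mean * p + (1 - mean) * (1 - p)

  // Hypothetical per-class likelihood: prior probability of the class and,
  // for each binary feature, the fraction of training observations equal to UP
  final case class BernoulliLikelihood(label: String, prior: Double, means: Vector[Double]) {
    // log(prior) + sum over features of log(bernoulli(mean_i, x_i));
    // no smoothing, so a mean of exactly 0 or 1 can produce -Infinity
    def score(obs: Vector[Int]): Double = {
      require(obs.size == means.size, "score: dimension mismatch")
      means.zip(obs).foldLeft(math.log(prior)) { case (acc, (mean, x)) =>
        acc + math.log(bernoulli(mean, x))
      }
    }
  }

  // Select the label of the class with the highest score
  def classify(likelihoods: Seq[BernoulliLikelihood], obs: Vector[Int]): String =
    likelihoods.maxBy(_.score(obs)).label
}
```

For example, with a class whose means are Vector(0.8, 0.7) and another whose means are Vector(0.2, 0.3), both with prior 0.5, the observation Vector(UP, UP) is assigned to the first class and Vector(DOWN, DOWN) to the second.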