The hidden Markov model has numerous applications related to speech recognition, face identification (biometrics), and pattern recognition in pictures and videos [7:3].
A hidden Markov model consists of a Markov process (also known as a Markov chain) for observations with a discrete time. The main difference with the Markov processes is that the states are not observable. A new observation is emitted with a probability known as the emission probability each time the state of the system or model changes.
There are two sources of randomness, which are as follows:
Let's reuse the boxes and balls example. If the boxes are hidden states (nonobservable), then the user draws the balls whose color is not visible. The emission probability is the probability bik =p(ot = colork|qt =Si) to retrieve a ball of the color k from a hidden box I, as shown in the following diagram:
In this example, we do not assume that all the boxes contain balls of different colors. We cannot make any assumptions on the order as defined by the transition aij. The HMM does not assume that the number of colors (observations) is identical to the number of boxes (states).
Keep in mind that the observations, in this case the color of the balls, are the only tangible data available to the experimenter. From this example, we can conclude that a formal HMM has three components:
A Lambda model, λ, is composed of initial probabilities π, the probabilities of state transitions as defined by the matrix A, and the probabilities of states emitting one or more observations, as shown in the following diagram:
The preceding diagram illustrates that, given a sequence of observations, the HMM tackles the following three problems known as canonical forms:
The solution to these three problems uses dynamic programming techniques. However, we need to clarify the notations prior to diving into the mathematical foundation of the hidden Markov model.
One of the challenges of describing the hidden Markov model is the mathematical notation that sometimes differs from author to author. From now on, we will use the following notation:
Description |
Formulation | |
---|---|---|
N |
The number of hidden states | |
S |
A finite set of N hidden states |
S = {S0, S1, … SN-1} |
M |
The number of observation symbols | |
qt |
The state at time or step t | |
Q |
A time sequence of states |
Q = {q0, q1, … qn-1} = Q0:n-1 |
T |
The number of observations | |
ot |
The observation at time t | |
O |
A finite sequence of T observations |
O = {o0, o1, … oT-1} = O0:T-1 |
A |
The state transition probability matrix |
aji = p(qt+1=Si| qt=Sj) |
B |
The emission probability matrix |
bjk = p(ot=Ok| qt=Sj) |
π |
The initial state probability vector |
πi = p(q0=Sj) |
λ |
The hidden Markov model |
λ = (π, A, B) |
For convenience, let's simplify the notation of the sequence of observations and states using the condensed form: p(O0:T, qt| λ) = p(O0, O1, … OT, qt| λ). It is quite common to visualize a hidden Markov model with a lattice of states and observations, which is similar to our description of the boxes and balls examples, as shown here:
The state Si is observed as Ok at time t, before being transitioned to the state Sj observed as Om at the time t+1. The first step in the creation of our HMM is the definition of the class that implements the lambda model λ = (π, A, B) [7:4].
The three canonical forms of the hidden Markov model rely heavily on manipulation and operations on matrices and vectors. For convenience, let's define an HMMConfig
class that contains the dimensions used in the HMM:
class HMMConfig(val numObs: Int, val numStates: Int, val numSymbols: Int, val maxIters: Int, val eps: Double) extends Config
The input parameters for the class are as follows:
numObs
: This is the number of observationsnumStates
: This is the number of hidden statesnumSymbols
: This is the number of observation symbols or featuresmaxIters
: This is the maximum number of iterations required for the HMM trainingeps
: This is the convergence criteria for the HMM trainingConsistency with a mathematical notation
The implementation uses numObs
(with respect to numStates
and numSymbols
) to represent programmatically the number of observations T
(with respect to the N
hidden states and M
features). As a general rule, the implementation reuses the mathematical symbols as much as possible.
The HMMConfig
companion object defines the operations on ranges of index of matrix rows and columns. The foreach
(line 1
), foldLeft
(/:
) (line 2
), and maxBy
(line 3
) methods are regularly used in each of the three canonical forms:
object HMMConfig { def foreach(i: Int, f: Int => Unit): Unit = Range(0, i).foreach(f) //1 def /:(i: Int, f: (Double, Int) => Double, zero: Double) = Range(0, i)./:(zero)(f) //2 def maxBy(i: Int, f: Int => Double): Int = Range(0,i).maxBy(f) //3 … }
The λ notation
The λ model in the HMM should not be confused with the regularization factor discussed in the Ln roughness penalty section in Chapter 6, Regression and Regularization.
As mentioned earlier, the lambda model is defined as a tuple of the transition probability matrix A, emission probability matrix B, and the initial probability π. It is easily implemented as an HMMModel
class using the DMatrix
class, as defined in the Utility classes section in the Appendix A, Basic Concepts. The simplest constructor for the HMMModel
class is invoked in the case where the state-transition probability matrix, the emission probability matrix, and the initial states are known, as shown in the following code:
class HMMModel( val A: DMatrix, val B: DMatrix, var pi: DblArray, val numObs: Int) { //4 val numStates = A.nRows val numSymbols = B.nCols def setAlpha(obsSeqNum: Vector[Int]): DMatrix def getAlphaVal(a: Double, i: Int, obsId: Int): Double def getBetaVal(b: Double, i: Int, obsId: Int): Double def update(gamma: Gamma, diGamma: DiGamma, obsSeq: Vector[Int]) def normalize: Unit }
The constructor of the HMMModel
class has the following four arguments (line 4
):
A
: This is the state transition probabilities matrixB
: This is the omission probabilities matrixpi
: This is the initial probability for the statesnumObs
: This is the number of observationsThe number of states and symbols are extracted from the dimension of the A
and B
matrices.
The HMMModel
class has several methods that will be described in detail whenever they are required for the execution of the model. The probabilities for the pi
initial states are unknown, and therefore, they are initialized with a random generator of values [0, 1].
The other two components of the HMM are the sequence of observations and the sequence of hidden states.
The canonical forms of the HMM are implemented through dynamic programming techniques. These techniques rely on variables that define the state of the execution of the HMM for any of the canonical forms:
Alpha
(the forward pass): The probability of observing the first t < T observations for a specific state at Si for the observation t is αt(i) = p(O0:t, qt=Si|λ)Beta
(the backward pass): The probability of observing the remainder of the sequence qt for a specific state is βt(i) =p(Ot+1:T-1|qt=Si,λ)Gamma
: The probability of being in a specific state given a sequence of observations and a model is γt(i) =p(qt=Si|O0:T-1, λ)Delta
: This is the sequence that has the highest probability path for the first i observations defined for a specific test δt(i)Qstar
: This is the optimum sequence q* of states Q0:T-1DiGamma
: The probability of being in a specific state at t and another defined state at t + 1 given the sequence of observations and the model is γt(i,j) = p(qt=Si,qt+1=Sj|O0:T-1, λ)Each of the parameters is described mathematically and programmatically in the section related to each specific canonical form. The Gamma
and DiGamma
classes are used and described in the evaluation canonical form. The DiGamma
singleton is described as part of the Viterbi algorithm to extract the sequence of states with the highest probability given a λ model and a set of observations.
The list of dynamic programming-related algorithms used in any of the three canonical forms is visualized through the class hierarchy of our implementation of the HMM:
The UML diagram omits the utility traits and classes such as Monitor
or the Apache Commons Math components.
The λ model, the HMM state, and the sequence of observations are all the elements needed to implement the three canonical cases. Each class is described as needed in the description of the three canonical forms of HMM. It is time to dive into the implementation details of each of the canonical forms, starting with the evaluation.
The execution of any of the three canonical forms relies on dynamic programming techniques (refer to the Overview of dynamic programming section in the Appendix A, Basic Concepts) [7:5]. The simplest of the dynamic programming techniques is a single traversal of the observations/state chain.
The objective is to compute the probability (or likelihood) of the observed sequence Ot given a λ model. A dynamic programming technique is used to break down the probability of the sequence of observations into two probabilities (M1):
The likelihood is computed by marginalizing over all the hidden states {Si} [7:6] (M2):
If we use the notation introduced in the previous chapter for alpha and beta variables, the probability for the observed sequence Ot given a λ model can be expressed as follows (M3):
The product of the α and β probabilities can potentially underflow. Therefore, it is recommended that you use the log of the probabilities instead of the probabilities.
The computation of the probability of observing a specific sequence given a sequence of hidden states and a λ model relies on a two-pass algorithm. The alpha algorithm consists of the following steps:
For those with some inclination toward mathematics, the computation of the alpha matrix is defined in the following information box.
Let's take a look at the implementation of the alpha class in Scala, using the referenced number of the mathematical expressions of the alpha class. The alpha and beta values have to be normalized [M3], and therefore, we define an HMMTreillis
base class for the alpha and beta algorithms that implements the normalization:
class HMMTreillis(numObs: Int, numStates: Int){ //5 var treillis: DMatrix = _ //6 val ct = Array.fill(numObs)(0.0) def normalize(t: Int): Unit = { //7 ct.update(t, /:(numStates, (s, n) => s + treillis(t, n))) treillis /= (t, ct(t)) } def getTreillis: DMatrix = treillis }
The HMMTreillis
class has two configuration parameters: the number of observations, numObs
, and the number of states, numStates
(line 5
). The treillis
variable represents the scaling matrix used in the alpha (or forward) and beta (or backward) passes (line 6
).
The normalization method, normalize
, implements the M6 formula by recomputing the ct
scaling factor (line 7
).
The computation of the alpha
variable in the Alpha
class follows the same computation flow as defined in the M4, M5, and M6 mathematical expressions:
class Alpha(lambda: HMMModel, obsSeq: Vector[Int]) //8 extends HMMTreillis(lambda.numObs, lambda.numStates) { val alpha: Double = Try { treillis = lambda.setAlpha(obsSeq) //9 normalize(0) //10 sumUp //11 }.getOrElse(Double.NaN) override def isInitialized: Boolean = alpha != Double.NaN val last = lambda.numObs-1 def sumUp: Double = { foreach(1, lambda.numObs, t => { updateAlpha(t) //12 normalize(t) //13 }) /:(lambda.numStates, (s,k) => s + treillis(last, k)) } def updateAlpha(t: Int): Unit = foreach(lambda.numStates, i => { //14 val newAlpha = lambda.getAlphaVal(treillis(t-1, i) treillis += (t, i, newAlpha, i, obsSeq(t))) }) def logProb: Double = /:(lambda.numObs, (s,t) => //15 s + Math.log(ct(t)), Math.log(alpha)) }
The Alpha
class has two arguments: the lambda
model and the obsSeq
sequence of observations (line 8
). The definition of the scaling factor alpha
initializes the treillis
scaling matrix using the HMMModel.setAlpha
method (line 9
), normalizes the initial value of the matrix by invoking the HMMTreillis.normalize
method for the first observation (line 10
), and sums the matrix element to return the scaling factor by invoking sumUp
(line 11
).
The setAlpha
method implements the mathematical expression M4 as follows:
def setAlpha(obsSeq: Array[Int]): DMatrix =
Range(0,numStates)./:(DMatrix(numObs, numStates))((m,j) =>
m += (0, j, pi(j)*B(j, obsSeq.head)))
}
The fold generates an instance of the DMatrix
class, as described in the Utility classes section in the Appendix A, Basic Concepts.
The sumUp
method implements the mathematical expression M6 as follows:
treillis
matrix of the scaling factor in the updateAlpha
method (line 12
)13
)The updateAlpha
method updates the treillis
scaling matrix by computing all the alpha
factors for all states (line 14
). The logProb
method implements the mathematical expression M7. It computes the logarithm of the probability of observing a specific sequence, given the sequence of states and a predefined λ model (line 15
).
The computation of beta values is similar to the Alpha
class except that the iteration executes backward on the sequence of states.
The implementation of Beta
is similar to the alpha class:
The definition of the Beta
class is very similar to the Alpha
class:
class Beta(lambda: HMMModel, obsSeq: Vector[Int]) extends HMMTreillis(lambda.numObs, lambda.numStates) { val initialized: Boolean //16 override def isInitialized: Boolean = initialized def sumUp: Unit = //17 (lambda.numObs-2 to 0 by -1).foreach(t => { //18 updateBeta(t) //19 normalize(t) }) def updateBeta(t: Int): Unit = foreach(lambda.numStates, i => { val newBeta = lambda.getBetaVal(treillis(t+1, i) treillis += (t, i, newBeta, i, obsSeq(t+1))) //20 }) }
Contrary to the Alpha
class, the Beta
class does not generate an output value. The Beta
class has an initialized
Boolean attribute to indicate whether the constructor has executed successfully (line 16
). The constructor updates and normalizes the beta matrix by traversing the sequence of observations backward from before the last observation to the first:
val initialized: Boolean = Try { treillis = DMatrix(lambda.numObs, lambda.numStates) treillis += (lambda.numObs-1, 1.0) //21 normalize(lambda.numObs-1) //22 sumUp //23 }._toBoolean("Beta initialization failed")
The initialization of the treillis
beta scaling matrix of the DMatrix
type assigns the value 1.0
to the last observation (line 21
) and normalizes the beta values for the last observation, as defined in M8 (line 22
). It implements the mathematical expressions M9 and M10 by invoking the sumUp
method (line 23
).
The sumUp
method is similar to Alpha.sumUp
(line 17
). It traverses the sequence of observations backward (line 18
) and updates the beta scaling matrix, as defined in the mathematical expression M9 (line 19
). The implementation of the mathematical expression M10 in the updateBeta
method is similar to the alpha pass: it updates the treillis
scaling matrix with the newBeta
values computed in the lambda
model (line 20
).
What is the value of a model if it cannot be created? The next canonical form CF2 leverages dynamic programming and recursive functions to extract the λ model.
The objective of this canonical form is to extract the λ model given a set of observations and a sequence of states. It is similar to the training of a classifier. The simple dependency of a current state on the previous state enables an implementation using an iterative procedure, known as the Baum-Welch estimator or expectation-maximization (EM).
At its core, the algorithm consists of three steps and an iterative method, which is similar to the evaluation canonical form:
The algorithm uses the digamma and summation gamma classes.
The Baum-Welch algorithm
M11: The joint probability of the state qi at t and qj at t+1 (digamma) is defined as:
M12: The initial probabilities vector N−1 and sum of joint probabilities for all the states (gamma)are defined as:
M13: The update of the transition probabilities matrix is defined as:
M14: The update of the emission probabilities matrix is defined as:
The Baum-Welch algorithm is implemented in the BaumWelchEM
class and requires the following two inputs (line 24
):
lambda
, computed from the config
configurationobsSeq
sequence (vector) of observationsThe code will be as follows:
class BaumWelchEM(config: HMMConfig, obsSeq: Vector[Int]) { //24 val lambda = HMMModel(config) val diGamma = new DiGamma(lambda.numObs,lambda.numStates)//25 val gamma = new Gamma(lambda.numObs, lambda.numStates) //26 val maxLikelihood: Option[Double] //27 }
The DiGamma
class defines the joint probabilities for any consecutive states (line 25
):
class DiGamma(numObs: Int, numStates: Int) { val diGamma = Array.fill(numObs-1)(DMatrix(numStates)) def update(alpha: DMatrix, beta: DMatrix, A: DMatrix, B: DMatrix, obsSeq: Array[Int]): Try[Int] }
The diGamma
variable is an array of matrices that represents the joint probabilities of two consecutive states. It is initialized through an invocation of the update
method, which implements the mathematical expression M11.
The Gamma
class computes the sum of the joint probabilities across all the states (line 26
):
class Gamma(numObs: Int, numStates: Int) { val gamma = DMatrix(numObs, numStates) def update(alpha: DMatrix, beta: DMatrix): Unit }
The update
method of the Gamma
class implements the mathematical expression M12.
The maximum likelihood, maxLikelihood
, for the sequence of states given an existing lambda model and a sequence of observations (line 27
) is computed using the getLikelihood
tail recursive method, as follows:
val maxLikelihood: Option[Double] = Try { @tailrec def getLikelihood(likelihood: Double, index: Int): Double ={ lambda.update(gamma, diGamma, obsSeq) //28 val _likelihood = frwrdBckwrdLattice //29 val diff = likelihood - _likelihood if( diff < config.eps ) _likelihood //30 else if (index >= config.maxIters) //31 throw new IllegalStateException(" … ") else getLikelihood(_likelihood, index+1) } val max = getLikelihood(frwrdBckwrdLattice, 0) lambda.normalize //32 max }._toOption("BaumWelchEM not initialized", logger)
The maxLikelihood
value implements the mathematical expressions M13 and M14. The getLikelihood
recursive method updates the lambda model matrices A and B and initial state probabilities pi (line 28
). The likelihood for the sequence of states is recomputed using the forward-backward lattice algorithm implemented in the frwrBckwrdLattice
method (line 29
).
The core of the Baum-Welch expectation maximization is the iterative forward and backward update of the lattice of states and observations between time t and t + 1. The lattice-based iterative computation is illustrated in the following diagram:
The code will be as follows:
def frwrdBckwrdLattice: Double = { val _alpha = Alpha(lambda, obsSeq) //33 val beta = Beta(lambda, obsSeq).getTreillis //34 val alphas = _alpha.getTreillis gamma.update(alphas, beta) //35 diGamma.update(alphas, beta, lambda.A, lambda.B, obsSeq) _alpha.alpha }
The forward-backward algorithm uses the Alpha
class for the computation/update of the lambda
model in the forward pass(line 33
) and the Beta
class for the update of lambda
in the backward pass (line 34
). The joint probabilities-related gamma
and diGamma
matrices are updated at each recursion (line 35
), reflecting the iteration of the mathematical expressions M11 to M14.
The recursive computation of maxLikelihood
exists if the algorithm converges (line 30
). It throws an exception if the maximum number of recursions is exceeded (line 31
).
This last canonical form consists of extracting the most likely sequence of states {qt} given a set of observations Ot and a λ model. Solving this problem requires, once again, a recursive algorithm.
The extraction of the best state sequence (the sequence of a state that has the highest probability) is very time consuming. An alternative consists of applying a dynamic programming technique to find the best sequence {qt} through iteration. This algorithm is known as the Viterbi algorithm. Given a sequence of states {qt} and sequence of observations {oj}, the probability δt(i) for any sequence to have the highest probability path for the first T observations is defined for the state Si [7:7].
The ViterbiPath
class implements the Viterbi algorithm whose purpose is to compute the optimum sequence (or path) of states given a set of observations and a λ model. The optimum sequence or path of states is computed by maximizing the delta function.
The constructors for the ViterbiPath
class have the same arguments as the forward, backward, and Baum-Welch algorithm: the lambda
model and the set of observations obsSeq
:
class ViterbiPath(lambda: HMMModel
, obsSeq: Vector[Int]) {
val nObs = lambda.numObs
val nStates = lambda.numStates
val psi = Array.fill(nObs)(Array.fill(nStates)(0)) //35
val qStar = new QStar(nObs, nStates) //36
val delta = { //37
Range(0, nStates)./:(DMatrix(nObs, nStates))((m,n) => {
psi(0)(n) = 0
m += (0, n, lambda.pi(n) * lambda.B(n,obsSeq.head))
})
val path = HMMPrediction(viterbi(1), qStar()) //38
}
As seen in the preceding information box containing the mathematical expressions for the Viterbi algorithm, the following matrices have to be defined:
psi
: This is the matrix of indices of nObs
observations by indices of nStates
states (line 35
).qStar
: This is the optimum sequence of states at each recursion of the Viterbi algorithm (line 36
).delta
: This is the sequence that has the highest probability path for the first n observations. It also sets the psi
values for the first observation to 0 (line 37
).All members of the ViterbiPath
class are private except path
that defines the optimum sequence or path of states given the obsSeq
observations (line 38
).
The matrix that defines the maximum probability delta
of any sequence of states given the lambda
model and the obsSeq
observation is initialized using the mathematical expression M13 (line 37
). The predictive model returns the path or optimum sequence of states as an instance of HMMPrediction
:
case class HMMPrediction(likelihood: Double, states: Array[Int])
The first argument of likelihood
is computed by the viterbi
recursive method. The indices of the states in the states
optimum sequence is computed by the QStar
class (line 38
).
Let's take a look under the hood of the Viterbi recursive method:
@tailrec def viterbi(t: Int): Double = { Range(0, numStates).foreach( updateMaxDelta(t, _)) //39 if( t == obsSeq.size-1) { //40 val idxMaxDelta = Range(0, numStates) .map(i => (i, delta(t, i))).maxBy(_._2) //41 qStar.update(t+1, idxMaxDelta._1) //42 idxMaxDelta._2 } else viterbi(t+1) //43 }
The recursion started on the second observation as the qStar
, psi
, and delta
parameters have already been initialized in the constructor. The recursive implementation invokes the updateMaxDelta
method to update the psi
indexing matrix and the highest probability for any state, as follows:
def updateMaxDelta(t: Int, j: Int): Unit = {. val idxDelta = Range(0, nStates) .map(i => (i, delta(t-1, i)*lambda.A(i, j))) .maxBy(_._2) //44 psi(t)(j) = idxDelta._1 delta += (t, j, idxDelta._2) //45 }
The updateMaxDelta
method implements the mathematical expression M14 that extracts the index of the state that maximizes psi
(line 44
). The delta
probability matrix and the psi
indexing matrix are updated accordingly (line 45
).
The viterbi
method is called recursively for the remaining observations except the last one (line 43
). At the last observation of the obsSeq.size-1
index, the algorithm executes the mathematical expression M15, which is implemented in the QStar
class (line 42
).
This implementation of the decoding form of the hidden Markov model completes the description of the hidden Markov model and its implementation in Scala. Now, let's put this knowledge into practice.
The main HMM
class implements the three canonical forms. A view bound to an array of integers is used to parameterize the HMM
class. We assume that a time series of continuous or pseudo-continuous values is quantized into discrete symbol values.
The @specialized
annotation ensures that the byte code is generated for the Array[Int]
primitive without executing the conversion implicitly declared by the bound view.
There are two modes that execute any of the three canonical forms of the hidden Markov model:
ViterbiPath
class: The constructor initializes/trains a model similar to any other learning algorithm, as described in the Design template for immutable classifiers section of the Appendix A, Basic Concepts. The constructor generates the model by executing the Baum-Welch algorithm. Once the model is successfully created, it can be used for decoding or evaluation.ViterbiPath
object: The companion provides the decode
and evaluate
methods for the decoding and evaluation of the sequence of observations using HMM.The two modes of operations are described in the following diagram:
Let's complete our implementation of the HMM with the definition of its class. The HMM
class is defined as a data transformation using a model implicitly generated from an xt
training set, as described in the Monadic data transformation section in Chapter 2, Hello World! (line 46
):
class HMM[@specialized(Double) T <: AnyVal]( config: HMMConfig, xt: XVSeries[T], form: HMMForm) (implicit quantize: Array[T] => Int, f: T => Double) extends ITransform[Array[T]](xt) with Monitor[Double] {//46 type V = HMMPrediction //47 val obsSeq: Vector[Int] = xt.map(quantize(_)) //48 val model: Option[HMMModel] = train //49 override def |> : PartialFunction[U, Try[V]] //50 }
The HMM
constructor takes the following four arguments (line 46
):
config
: This is the configuration of the HMM that is the dimension of lambda
model and execution parametersxt
: This is the multidimensional time series of observations whose features have the T
typeform
: This is the canonical form to be used once the model is generated (evaluation or decoding)quantize
: This is the quantization function that converts an observation of the Array[T]
type to an Int
typef
: This is the implicit conversion from the T
type to Double
The constructor has to override the V
type (HMMPrediction
) of the output data (line 47
) declared in the ITransform
abstract class. The structure of the HMMPrediction
class has been defined in the previous section.
The Monitor
trait is used to collect the profiling information during training (refer to the Monitor section under Utility classes in the Appendix A, Basic Concepts).
The time series of xt
observations is converted into a vector of obsSeq
observed states by applying the quantize
quantization function to each observation (line 48
).
As with any supervised learning technique, the model is created through training (line 49
). Finally, the |>
polymorphic predictor invokes either the decode
method or the evaluate
method (line 50
).
The train
method consists of the execution of the Baum-Welch algorithm and returns the lambda
model:
def train: Option[HMMModel] = Try {
BaumWelchEM(config, obsSeq).lambda }.toOption
Finally. the |>
predictor is a simple wrapper to the evaluation form (evaluate
) and the decoding form (decode
):
override def |> : PartialFunction[U, Try[V]] = { case x: Array[T] if(isModel && x.length > 1) => form match { case _: EVALUATION => evaluation(model.get, Vector[Int](quantize(x)) case _: DECODING => decoding(model.get, Vector[Int](quantize(x)) } }
The protected evaluation
method of the HMM
companion object is a wrapper around the Alpha
computation:
def evaluation(model: HMMModel, obsSeq: Vector[Int]): Try[HMMPrediction] = Try { HMMPrediction(-Alpha(model,obsSeq).logProb, obsSeq.toArray) }
The evaluate
method of the HMM
object exposes the evaluation canonical form:
def evaluate[T <: AnyVal]( model: HMMModel, xt: XVSeries[T])(implicit quantize: Array[T] => Int, f: T => Double): Option[HMMPrediction] = evaluation(model, xt.map(quantize(_))).toOption
The decoding
method wraps the Viterbi algorithm to extract the optimum sequence of states:
def decoding( model: HMMModel, obsSeq: Vector[Int]): Try[HMMPrediction] = Try { ViterbiPath(model, obsSeq).path }
The decode
method of the HMM
object exposes the decoding canonical form:
def decode[T <: AnyVal](model: HMMModel, xt: XVSeries[T])(implicit quantize: Array[T] => Int, f: T => Double): Option[HMMPrediction] = decoding(model, xt.map(quantize(_))).toOption
Normalized probabilities input
You need to make sure that the input probabilities for the λ model for evaluating and decoding canonical forms are normalized—the sum of the probabilities of all the states for the π vector and A and B matrices are equal to 1. This validation code is omitted in the example code.
Our first test case is to train an HMM to predict the sentiment of investors as measured by the weekly sentiment survey of the members of the American Association of Individual Investors (AAII) [7:8]. The goal is to compute the transition probabilities matrix A, the emission probabilities matrix B, and the steady state probability distribution π, given the observations and hidden states (training canonical forms).
We assume that the change in investor sentiments is independent of time, as required by the hidden Markov model.
The AAII sentiment survey grades the bullishness on the market in terms of percentage:
The sentiment of investors is known as a contrarian indicator of the future direction of the stock market. Refer to the Terminology section in the Appendix A, Basic Concepts.
Let's select the ratio of the percentage of investors that are bullish over the percentage of investors that are bearish. The ratio is then normalized. The following table lists this:
Time |
Bullish |
Bearish |
Neutral |
Ratio |
Normalized Ratio |
---|---|---|---|---|---|
t0 |
0.38 |
0.15 |
0.47 |
2.53 |
1.0 |
t1 |
0.41 |
0.25 |
0.34 |
1.68 |
0.53 |
t2 |
0.25 |
0.35 |
0.40 |
0.71 |
0.0 |
… |
… |
… |
… |
… |
…. |
The sequence of nonnormalized observations (the ratio of bullish sentiments over bearish sentiments) is defined in a CSV file as follows:
val OBS_PATH = "resources/data/chap7/obsprob.csv" val NUM_SYMBOLS = 6 val NUM_STATES = 5 val EPS = 1e-4 val MAX_ITERS = 150 val observations = Vector[Double]( 0.01, 0.72, 0.78, 0.56, 0.61, 0.56, 0.45, … ) val quantize = (x: DblArray) => (x.head* (NUM_STATES+1)).floor.toInt //51 val xt = observations.map(Array[Double](_)) val config = HMMConfig(xt.size, NUM_STATES, NUM_SYMBOLS, MAX_ITERS, EPS) val hmm = HMM[Array[Int]](config, xt) //52 show(s"Training): ${hmm.model.toString}")
The constructor for the HMM
class requires a T => Array[Int]
implicit conversion, which is implemented by the quantize
function (line 51
). The hmm.model
model is created by instantiating an HMM
class with a predefined configuration and an obsSeq
sequence of observed states (line 52
).
The training of the HMM generates the following state transition probabilities matrix:
A |
1 |
2 |
3 |
4 |
5 |
---|---|---|---|---|---|
1 |
0.090 |
0.026 |
0.056 |
0.046 |
0.150 |
2 |
0.094 |
0.123 |
0.074 |
0.058 |
0.0 |
3 |
0.093 |
0.169 |
0.087 |
0.061 |
0.056 |
4 |
0.033 |
0.342 |
0.017 |
0.031 |
0.147 |
5 |
0.386 |
0.47 |
0.314 |
0.541 |
0.271 |
The emission matrix is as follows:
B |
1 |
2 |
3 |
4 |
5 |
6 |
---|---|---|---|---|---|---|
1 |
0.203 |
0.313 |
0.511 |
0.722 |
0.264 |
0.307 |
2 |
0.149 |
0.729 |
0.258 |
0.389 |
0.324 |
0.471 |
3 |
0.305 |
0.617 |
0.427 |
0.596 |
0.189 |
0.186 |
4 |
0.207 |
0.312 |
0.351 |
0.653 |
0.358 |
0.442 |
5 |
0.674 |
0.520 |
0.248 |
0.294 |
0.259 |
0.03 |
The objective of the evaluation is to compute the probability of the xt
observed data given a λ model (A0
, B0
, and PI0
):
val A0 = Array[Array[Double]]( Array[Double](0.21, 0.13, 0.25, 0.06, 0.11, 0.24), Array[Double](0.31, 0.17, 0.18, 0.04, 0.19, 0.11), …. ) val B0 = Array[Array[Double]]( Array[Double](0.61, 0.39), Array[Double](0.54, 0.46), … ) val PI0 = Array[Double]( 0.26, 0.04, 0.11, 0.26, 0.19, 0.14) val xt = Vector[Double]( 0.0, 1.0, 21.0, 1.0, 30.0, 0.0, 1.0, 0.0, … ).map(Array[Double](_)) val max = data.max val min = data.min implicit val quantize = (x: DblArray) => ((x.head/(max - min) + min)*(B0.head.length-1)).toInt //55 val lambda = HMMModel( DMatrix(A0), DMatrix(B0), PI0, xt.length) //53 evaluation(lambda, xt).map( _.toString).map(show(_)) //54
The model is created directly by converting the A0
state-transition probabilities and B0
emission probabilities as matrices of the DMatrix
type (line 53
). The evaluation method generates an HMMPrediction
object, which is stringized, and then displays it in the standard output (line 54
).
The quantization
method consists of normalizing the input data over the number (or range) of symbols associated with the lambda
model. The number of symbols is the size of the rows of the emission probabilities matrix B. In this case, the range of the input data is [0.0, 3.0]. The range is normalized using the linear transform f(x) = x/(max – min) + min, then adjusted for the number of symbols (or values for states) (line 55
).
The quantize
quantization function has to be explicitly defined before invoking the evaluation method.
The evaluation form of the hidden Markov model is very suitable for filtering data for discrete states. Contrary to time series filters such as the Kalman filter introduced in the The discrete Kalman filter section in Chapter 3, Data Preprocessing, the HMM requires data to be stationary in order to create a reliable model. However, the hidden Markov model overcomes some of the limitations of analytical time series analysis. Filters and smoothing techniques assume that the noise (frequency mean, variance, and covariance) is known and usually follows a Gaussian distribution.
The hidden Markov model does not have such a restriction. Filtering techniques, such as moving averaging techniques, discrete Fourier transforms, and Kalman filters apply to both discrete and continuous states while the HMM does not. Moreover, the extended Kalman filter can estimate nonlinear states.