The hidden Markov model

The hidden Markov model has numerous applications related to speech recognition, face identification (biometrics), and pattern recognition in pictures and videos [7:3].

A hidden Markov model consists of a discrete-time Markov process (also known as a Markov chain) that generates observations. The main difference from an ordinary Markov process is that the states are not observable. Each time the state of the system or model changes, a new observation is emitted with a probability known as the emission probability.

There are two sources of randomness, which are as follows:

  • Transition between states
  • Emission of an observation when a state is given

Let's reuse the boxes and balls example. If the boxes are hidden states (nonobservable), then the user draws balls whose colors are visible but whose box of origin is not. The emission probability is the probability bik = p(ot = colork | qt = Si) of retrieving a ball of the color k from a hidden box i, as shown in the following diagram:

The hidden Markov model for the balls and boxes example

In this example, we do not assume that all the boxes contain balls of different colors. We cannot make any assumptions on the order as defined by the transition aij. The HMM does not assume that the number of colors (observations) is identical to the number of boxes (states).
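
To make these two sources of randomness concrete, the following minimal sketch encodes a toy λ model for the boxes and balls example with plain Scala arrays instead of the DMatrix class introduced later; the two boxes, three colors, and all probability values are purely illustrative assumptions.

// Initial probabilities pi(i) = p(q0 = Si): which box the first draw comes from
val pi = Array(0.6, 0.4)

// Transition probabilities A(i)(j) = p(q(t+1) = Sj | q(t) = Si) between boxes
val A = Array(
  Array(0.7, 0.3),      // from box 0
  Array(0.4, 0.6)       // from box 1
)

// Emission probabilities B(i)(k) = p(o(t) = color k | q(t) = Si)
val B = Array(
  Array(0.5, 0.4, 0.1), // box 0: red, green, blue
  Array(0.1, 0.3, 0.6)  // box 1: red, green, blue
)

// Probability of starting in box 0 and drawing a green ball (color index 1)
val pGreenFromBox0 = pi(0) * B(0)(1)   // 0.6 * 0.4 = 0.24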

Note

Time invariance

Contrary to the Kalman filter, for example, the hidden Markov model requires that the transition elements, aji, be independent of time. This property is known as the stationarity or homogeneity restriction.

Keep in mind that the observations, in this case the color of the balls, are the only tangible data available to the experimenter. From this example, we can conclude that a formal HMM has three components:

  • A set of observations
  • A sequence of hidden states
  • A model that maximizes the joint probability of the observations and hidden states, known as the Lambda model

A Lambda model, λ, is composed of initial probabilities π, the probabilities of state transitions as defined by the matrix A, and the probabilities of states emitting one or more observations, as shown in the following diagram:

The visualization of the HMM key components

The preceding diagram illustrates that, given a sequence of observations, the HMM tackles the following three problems known as canonical forms:

  • CF1 (evaluation): This evaluates the probability of a given sequence of observations Ot, given a model λ = (π, A, B)
  • CF2 (training): This identifies (or learns) a model λ = (π, A, B), given a set of observations O
  • CF3 (decoding): This estimates the state sequence Q with the highest probability to generate a given set of observations O and a model λ

The solution to these three problems uses dynamic programming techniques. However, we need to clarify the notations prior to diving into the mathematical foundation of the hidden Markov model.

Notations

One of the challenges of describing the hidden Markov model is the mathematical notation that sometimes differs from author to author. From now on, we will use the following notation:

 

Symbol | Description | Formulation
N | The number of hidden states | 
S | A finite set of N hidden states | S = {S0, S1, … SN-1}
M | The number of observation symbols | 
qt | The state at time or step t | 
Q | A time sequence of states | Q = {q0, q1, … qn-1} = Q0:n-1
T | The number of observations | 
ot | The observation at time t | 
O | A finite sequence of T observations | O = {o0, o1, … oT-1} = O0:T-1
A | The state transition probability matrix | aji = p(qt+1=Si|qt=Sj)
B | The emission probability matrix | bjk = p(ot=Ok|qt=Sj)
π | The initial state probability vector | πi = p(q0=Si)
λ | The hidden Markov model | λ = (π, A, B)

Note

Variance in the notation

Some authors use the symbol z to represent the hidden states instead of q and x to represent the observations O.

For convenience, let's simplify the notation of the sequence of observations and states using the condensed form: p(O0:T, qt| λ) = p(O0, O1, … OT, qt| λ). It is quite common to visualize a hidden Markov model with a lattice of states and observations, which is similar to our description of the boxes and balls example, as shown here:

The formal HMM-directed graph

The state Si is observed as Ok at time t, before transitioning to the state Sj, which is observed as Om at time t+1. The first step in the creation of our HMM is the definition of the class that implements the lambda model λ = (π, A, B) [7:4].

The lambda model

The three canonical forms of the hidden Markov model rely heavily on manipulation and operations on matrices and vectors. For convenience, let's define an HMMConfig class that contains the dimensions used in the HMM:

class HMMConfig(val numObs: Int, val numStates: Int, 
    val numSymbols: Int, val maxIters: Int, val eps: Double) 
    extends Config

The input parameters for the class are as follows:

  • numObs: This is the number of observations
  • numStates: This is the number of hidden states
  • numSymbols: This is the number of observation symbols or features
  • maxIters: This is the maximum number of iterations required for the HMM training
  • eps: This is the convergence criteria for the HMM training
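
As a usage sketch, a configuration for a training run might be instantiated as follows; the dimensions and convergence values are arbitrary placeholders, not values prescribed by the implementation.

// Hypothetical configuration: 120 observations, 5 hidden states,
// 6 observation symbols, at most 200 training iterations and a
// convergence threshold of 1e-4
val config = new HMMConfig(
  numObs = 120, 
  numStates = 5, 
  numSymbols = 6, 
  maxIters = 200, 
  eps = 1e-4)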

Note

Consistency with a mathematical notation

The implementation uses numObs (with respect to numStates and numSymbols) to represent programmatically the number of observations T (with respect to the N hidden states and M features). As a general rule, the implementation reuses the mathematical symbols as much as possible.

The HMMConfig companion object defines the operations on ranges of indices of matrix rows and columns. The foreach (line 1), foldLeft (/:) (line 2), and maxBy (line 3) methods are regularly used in each of the three canonical forms:

object HMMConfig {
  def foreach(i: Int, f: Int => Unit): Unit =
      Range(0, i).foreach(f)  //1
  def /:(i: Int, f: (Double, Int) => Double, zero: Double) = 
        Range(0, i)./:(zero)(f) //2
  def maxBy(i: Int, f: Int => Double): Int = 
      Range(0,i).maxBy(f)   //3
   … 
}
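
As a quick illustration of these helpers, independently of the HMM itself, the following sketch applies foreach, /:, and maxBy to a small array of arbitrary values.

val values = Array(0.2, 0.7, 0.1, 0.9, 0.5)

// foreach: visit each index of the array
HMMConfig.foreach(values.length, i => println(s"$i -> ${values(i)}"))

// /: : fold over the indices to sum the values, starting from 0.0
val sum = HMMConfig./:(values.length, (s, i) => s + values(i), 0.0)

// maxBy: index of the largest value, 3 in this case
val idxMax = HMMConfig.maxBy(values.length, values(_))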

Note

The λ notation

The λ model in the HMM should not be confused with the regularization factor discussed in the Ln roughness penalty section in Chapter 6, Regression and Regularization.

As mentioned earlier, the lambda model is defined as a tuple of the transition probability matrix A, emission probability matrix B, and the initial probability π. It is easily implemented as an HMMModel class using the DMatrix class, as defined in the Utility classes section in the Appendix A, Basic Concepts. The simplest constructor for the HMMModel class is invoked in the case where the state-transition probability matrix, the emission probability matrix, and the initial states are known, as shown in the following code:

class HMMModel( val A: DMatrix, val B: DMatrix, var pi: DblArray, 
    val numObs: Int) {   //4
  val numStates = A.nRows
  val numSymbols = B.nCols

  def setAlpha(obsSeqNum: Vector[Int]): DMatrix
  def getAlphaVal(a: Double, i: Int, obsId: Int): Double
  def getBetaVal(b: Double, i: Int, obsId: Int): Double
  def update(gamma: Gamma, diGamma: DiGamma, 
      obsSeq: Vector[Int])
  def normalize: Unit
}

The constructor of the HMMModel class has the following four arguments (line 4):

  • A: This is the state transition probabilities matrix
  • B: This is the emission probabilities matrix
  • pi: This is the initial probability for the states
  • numObs: This is the number of observations

The number of states and symbols are extracted from the dimension of the A and B matrices.

The HMMModel class has several methods that will be described in detail whenever they are required for the execution of the model. The probabilities for the pi initial states are unknown, and therefore, they are initialized with randomly generated values in the range [0, 1].

Note

Normalization

Input states and observation data may have to be normalized and converted to probabilities before we initialize the A and B matrices.
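
A minimal sketch of such a normalization, assuming raw non-negative counts or weights held in plain Scala arrays rather than the DMatrix type: each row is divided by its sum so that it becomes a probability distribution.

// Normalize each row of a matrix of non-negative weights into probabilities
def normalizeRows(m: Array[Array[Double]]): Array[Array[Double]] =
  m.map { row =>
    val sum = row.sum
    if (sum > 0.0) row.map(_ / sum) else row
  }

// Example: raw transition counts between two hidden states
val rawA = Array(Array(8.0, 2.0), Array(3.0, 9.0))
val normalizedA = normalizeRows(rawA)  // Array(Array(0.8, 0.2), Array(0.25, 0.75))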

The other two components of the HMM are the sequence of observations and the sequence of hidden states.

Design

The canonical forms of the HMM are implemented through dynamic programming techniques. These techniques rely on variables that define the state of the execution of the HMM for any of the canonical forms:

  • Alpha (the forward pass): The probability of observing the first t < T observations while being in the state Si at observation t is αt(i) = p(O0:t, qt=Si|λ)
  • Beta (the backward pass): The probability of observing the remainder of the sequence, given the state Si at time t, is βt(i) = p(Ot+1:T-1|qt=Si, λ)
  • Gamma: The probability of being in a specific state given a sequence of observations and a model is γt(i) = p(qt=Si|O0:T-1, λ)
  • Delta: The highest probability of a path of states that accounts for the first t observations and ends in the state Si, denoted δt(i)
  • Qstar: This is the optimum sequence q* of states Q0:T-1
  • DiGamma: The probability of being in a specific state at t and another defined state at t + 1, given the sequence of observations and the model, is γt(i,j) = p(qt=Si, qt+1=Sj|O0:T-1, λ)

Each of the parameters is described mathematically and programmatically in the section related to each specific canonical form. The Gamma and DiGamma classes are used and described in the evaluation canonical form. The DiGamma singleton is described as part of the Viterbi algorithm to extract the sequence of states with the highest probability given a λ model and a set of observations.

The list of dynamic programming-related algorithms used in any of the three canonical forms is visualized through the class hierarchy of our implementation of the HMM:

Scala classes' hierarchy for HMM (the UML class diagram)

The UML diagram omits the utility traits and classes such as Monitor or the Apache Commons Math components.

The λ model, the HMM state, and the sequence of observations are all the elements needed to implement the three canonical cases. Each class is described as needed in the description of the three canonical forms of HMM. It is time to dive into the implementation details of each of the canonical forms, starting with the evaluation.

The execution of any of the three canonical forms relies on dynamic programming techniques (refer to the Overview of dynamic programming section in the Appendix A, Basic Concepts) [7:5]. The simplest of the dynamic programming techniques is a single traversal of the observations/state chain.

Evaluation – CF-1

The objective is to compute the probability (or likelihood) of the observed sequence Ot given a λ model. A dynamic programming technique is used to break down the probability of the sequence of observations into two probabilities (M1):

Evaluation – CF-1

The likelihood is computed by marginalizing over all the hidden states {Si} [7:6] (M2):

Evaluation – CF-1

If we use the notation introduced in the previous chapter for alpha and beta variables, the probability for the observed sequence Ot given a λ model can be expressed as follows (M3):

Evaluation – CF-1

The product of the α and β probabilities can potentially underflow. Therefore, it is recommended that you use the log of the probabilities instead of the probabilities.

Alpha – the forward pass

The computation of the probability of observing a specific sequence given a sequence of hidden states and a λ model relies on a two-pass algorithm. The alpha algorithm consists of the following steps:

  1. Compute the initial alpha value [M4]. The value is then normalized by the sum of the alpha values across all the hidden states [M5].
  2. Compute the alpha value iteratively for the remaining observations, and then normalize it by the sum of the alpha values for all states [M6].
  3. The final step is to compute the log of the probability of observing the sequence [M7] (a plain-Scala sketch of this forward pass follows the list).
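
The following self-contained sketch of the forward pass uses plain Scala arrays and an arbitrary toy λ model, with A(j)(i) read as the transition probability from the state j to the state i, as in the notation table. It mirrors the steps above but is not the book's DMatrix-based implementation, which follows.

def forwardLogProb(
    pi: Array[Double],
    A: Array[Array[Double]],
    B: Array[Array[Double]],
    obs: Array[Int]): Double = {

  val nStates = pi.length

  // Initial alpha values [M4], normalized by their sum over the states [M5]
  var alpha = Array.tabulate(nStates)(i => pi(i) * B(i)(obs.head))
  var logProb = math.log(alpha.sum)
  alpha = { val s = alpha.sum; alpha.map(_ / s) }

  // Iterate over the remaining observations, normalizing at each step [M6]
  obs.tail.foreach { o =>
    val next = Array.tabulate(nStates) { i =>
      (0 until nStates).map(j => alpha(j) * A(j)(i)).sum * B(i)(o)
    }
    logProb += math.log(next.sum)
    alpha = { val s = next.sum; next.map(_ / s) }
  }
  logProb   // log of the probability of the observed sequence [M7]
}

val logP = forwardLogProb(
  Array(0.6, 0.4),
  Array(Array(0.7, 0.3), Array(0.4, 0.6)),
  Array(Array(0.5, 0.4, 0.1), Array(0.1, 0.3, 0.6)),
  Array(0, 1, 2, 1))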

Note

Performance consideration

A direct computation of the probability of observing a specific sequence requires in the order of 2T·N^T multiplications. The iterative alpha and beta classes reduce the number of multiplications to the order of N^2·T.

For those with some inclination toward mathematics, the computation of the alpha matrix is defined in the following information box.

Note

Alpha (the forward pass)

M4: Initialization is defined as:

Alpha – the forward pass

M5: Normalization of initial values N – 1 is defined as:

Alpha – the forward pass

M6: Normalized summation is defined as:

Alpha – the forward pass

M7: The probability of observing a sequence given a lambda model and states is defined as:

Alpha – the forward pass

Let's take a look at the implementation of the Alpha class in Scala, using the reference numbers of the mathematical expressions listed in the preceding information box. The alpha and beta values have to be normalized [M3]; therefore, we define an HMMTreillis base class for the alpha and beta algorithms that implements the normalization:

class HMMTreillis(numObs: Int, numStates: Int){ //5
  var treillis: DMatrix = _   //6
  val ct = Array.fill(numObs)(0.0) 

   def normalize(t: Int): Unit = { //7
     ct.update(t, /:(numStates, (s, n) => s + treillis(t, n)))
     treillis /= (t, ct(t))
   }
   def getTreillis: DMatrix = treillis
}

The HMMTreillis class has two configuration parameters: the number of observations, numObs, and the number of states, numStates (line 5). The treillis variable represents the scaling matrix used in the alpha (or forward) and beta (or backward) passes (line 6).

The normalization method, normalize, implements the M6 formula by recomputing the ct scaling factor (line 7).

Note

Computation efficiency

Scala's reduce, fold, and foreach methods are far more efficient iterators than the for loop. You need to keep in mind that the main purpose of the for loop in Scala is the monadic composition of the map and flatMap operations.
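
As a concrete illustration of this style, the two equivalent summations below contrast the fold used throughout the HMM code with an imperative for loop; the array of probabilities is arbitrary.

val probs = Array(0.1, 0.2, 0.4, 0.3)

// Folding over the indices, in the style used by the HMM classes
val total = (0 until probs.length)./:(0.0)((s, i) => s + probs(i))

// Equivalent imperative for loop
var acc = 0.0
for (i <- 0 until probs.length) acc += probs(i)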

The computation of the alpha variable in the Alpha class follows the same computation flow as defined in the M4, M5, and M6 mathematical expressions:

class Alpha(lambda: HMMModel, obsSeq: Vector[Int]) //8
    extends HMMTreillis(lambda.numObs, lambda.numStates) {

  val alpha: Double = Try { 
    treillis = lambda.setAlpha(obsSeq) //9
    normalize(0)  //10
    sumUp  //11
  }.getOrElse(Double.NaN)
  
  override def isInitialized: Boolean = !alpha.isNaN

  val last = lambda.numObs-1
  def sumUp: Double = {
    foreach(1, lambda.numObs, t => {
      updateAlpha(t) //12
      normalize(t)  //13
    })
    /:(lambda.numStates, (s,k) => s + treillis(last, k))
  }

  def updateAlpha(t: Int): Unit = 
    foreach(lambda.numStates, i => { //14
      val newAlpha = lambda.getAlphaVal(treillis(t-1, i), i, obsSeq(t))
      treillis += (t, i, newAlpha) 
    })
  
  def logProb: Double = /:(lambda.numObs, (s,t) => //15
    s + Math.log(ct(t)), Math.log(alpha))
}

The Alpha class has two arguments: the lambda model and the obsSeq sequence of observations (line 8). The definition of the scaling factor alpha initializes the treillis scaling matrix using the HMMModel.setAlpha method (line 9), normalizes the initial values of the matrix by invoking the HMMTreillis.normalize method for the first observation (line 10), and sums the matrix elements to return the scaling factor by invoking sumUp (line 11).

The setAlpha method implements the mathematical expression M4 as follows:

def setAlpha(obsSeq: Array[Int]): DMatrix = 
  Range(0, numStates)./:(DMatrix(numObs, numStates))((m, j) => 
      m += (0, j, pi(j)*B(j, obsSeq.head)))

The fold generates an instance of the DMatrix class, as described in the Utility classes section in the Appendix A, Basic Concepts.

The sumUp method implements the mathematical expression M6 as follows:

  • Update the treillis matrix of the scaling factor in the updateAlpha method (line 12)
  • Normalize all the scaling factors for all the remaining observations (line 13)

The updateAlpha method updates the treillis scaling matrix by computing all the alpha factors for all states (line 14). The logProb method implements the mathematical expression M7. It computes the logarithm of the probability of observing a specific sequence, given the sequence of states and a predefined λ model (line 15).

Note

The log probability

The logProb method computes the logarithm of the probability instead of the probability itself. The summation of the logarithm of probabilities is less likely to cause an underflow than the product of probabilities.
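
A quick illustration of the underflow issue, with arbitrary values: the product of a few hundred small probabilities collapses to 0.0 in Double arithmetic, whereas the sum of their logarithms remains perfectly usable.

val probs = Seq.fill(400)(0.01)      // 400 arbitrary small probabilities

val product = probs.product          // 0.0: the product underflows
val logSum = probs.map(math.log).sum // about -1842.1: still well defined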

Beta – the backward pass

The computation of beta values is similar to the Alpha class except that the iteration executes backward on the sequence of states.

The implementation of Beta is similar to the Alpha class:

  1. Compute (M8) and normalize (M9) the value of beta at t = T-1 across the states.
  2. Compute and normalize iteratively the beta value at time t, which is updated from its value at t + 1, going backward from time T-2 to 0 (M10).

Note

Beta (the backward pass)

M8: Initialization of beta is defined as βT-1(i) = 1.

M9: Normalization of initial beta values is defined as:

Beta – the backward pass

M10: Normalized summation of beta is defined as:

Beta – the backward pass

The definition of the Beta class is very similar to the Alpha class:

class Beta(lambda: HMMModel, obsSeq: Vector[Int]) 
     extends HMMTreillis(lambda.numObs, lambda.numStates) {

  val initialized: Boolean  //16

  override def isInitialized: Boolean = initialized
  def sumUp: Unit =   //17
    (lambda.numObs-2 to 0 by -1).foreach(t => { //18
      updateBeta(t)  //19
      normalize(t) 
    })
   
   def updateBeta(t: Int): Unit =
     foreach(lambda.numStates, i => { 
       val newBeta = lambda.getBetaVal(treillis(t+1, i), i, obsSeq(t+1))
       treillis += (t, i, newBeta) //20
     })
}

Contrary to the Alpha class, the Beta class does not generate an output value. The Beta class has an initialized Boolean attribute to indicate whether the constructor has executed successfully (line 16). The constructor updates and normalizes the beta matrix by traversing the sequence of observations backward from before the last observation to the first:

val initialized: Boolean = Try {
  treillis = DMatrix(lambda.numObs, lambda.numStates)
  treillis += (lambda.numObs-1, 1.0) //21
  normalize(lambda.numObs-1)  //22
  sumUp  //23
}._toBoolean("Beta initialization failed")

The initialization of the treillis beta scaling matrix of the DMatrix type assigns the value 1.0 to the last observation, as defined in M8 (line 21), and normalizes the beta values for the last observation, as defined in M9 (line 22). It implements the mathematical expression M10 by invoking the sumUp method (line 23).

The sumUp method is similar to Alpha.sumUp (line 17). It traverses the sequence of observations backward (line 18) and updates the beta scaling matrix, as defined in the mathematical expression M9 (line 19). The implementation of the mathematical expression M10 in the updateBeta method is similar to the alpha pass: it updates the treillis scaling matrix with the newBeta values computed in the lambda model (line 20).

Note

Constructors and initialization

The alpha and beta values are computed within the constructors of their respective classes. The client code has to validate these instances by invoking isInitialized.

What is the value of a model if it cannot be created? The next canonical form CF2 leverages dynamic programming and recursive functions to extract the λ model.

Training – CF-2

The objective of this canonical form is to extract the λ model given a set of observations and a sequence of states. It is similar to the training of a classifier. The simple dependency of a current state on the previous state enables an implementation using an iterative procedure, known as the Baum-Welch estimator or expectation-maximization (EM).

The Baum-Welch estimator (EM)

At its core, the algorithm consists of three steps and an iterative method, which is similar to the evaluation canonical form:

  1. Compute the initial probabilities π (the gamma values at t = 0) (M12).
  2. Compute and normalize the state transition probabilities matrix A (M13).
  3. Compute and normalize the matrix of emission probabilities B (M14).
  4. Repeat steps 2 and 3 until the change of likelihood is insignificant.

The algorithm uses the digamma and summation gamma classes.

Note

The Baum-Welch algorithm

M11: The joint probability of the state qi at t and qj at t+1 (digamma) is defined as:

The Baum-Welch estimator (EM)
The Baum-Welch estimator (EM)

M12: The initial probabilities vector and the sum of the joint probabilities over all the states (gamma) are defined as:

The Baum-Welch estimator (EM)

M13: The update of the transition probabilities matrix is defined as:

The Baum-Welch estimator (EM)

M14: The update of the emission probabilities matrix is defined as:

The Baum-Welch estimator (EM)

The Baum-Welch algorithm is implemented in the BaumWelchEM class and requires the following two inputs (line 24):

  • The λ model, lambda, computed from the config configuration
  • The obsSeq sequence (vector) of observations

The code will be as follows:

class BaumWelchEM(config: HMMConfig, obsSeq: Vector[Int]) { //24
  val lambda = HMMModel(config)
  val diGamma = new DiGamma(lambda.numObs,lambda.numStates)//25
  val gamma = new Gamma(lambda.numObs, lambda.numStates) //26
  val maxLikelihood: Option[Double] //27
}

The DiGamma class defines the joint probabilities for any consecutive states (line 25):

class DiGamma(numObs: Int, numStates: Int) {
  val diGamma = Array.fill(numObs-1)(DMatrix(numStates))
  def update(alpha: DMatrix, beta: DMatrix, A: DMatrix, 
  B: DMatrix, obsSeq: Array[Int]): Try[Int]
}

The diGamma variable is an array of matrices that represents the joint probabilities of two consecutive states. It is initialized through an invocation of the update method, which implements the mathematical expression M11.

The Gamma class computes the sum of the joint probabilities across all the states (line 26):

class Gamma(numObs: Int, numStates: Int) {
  val gamma = DMatrix(numObs, numStates)
  def update(alpha: DMatrix, beta: DMatrix): Unit
}

The update method of the Gamma class implements the mathematical expression M12.

Note

Source code for Gamma and DiGamma

The Gamma and DiGamma classes implement the mathematical expressions for the Baum-Welch algorithm. The update method uses simple linear algebra and is not described; refer to the documented source code for details.
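
For reference, a plain-Scala restatement of the gamma update, assuming alpha and beta hold the (possibly scaled) forward and backward values: γt(i) is proportional to αt(i)·βt(i), normalized over the states. This is only a sketch of the formula, not the library's DMatrix-based implementation.

// gamma(t)(i) = alpha(t)(i) * beta(t)(i) / sum over j of alpha(t)(j) * beta(t)(j)
def computeGamma(
    alpha: Array[Array[Double]],
    beta: Array[Array[Double]]): Array[Array[Double]] =
  alpha.zip(beta).map { case (a, b) =>
    val weights = a.zip(b).map { case (x, y) => x * y }
    val norm = weights.sum
    weights.map(_ / norm)
  }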

The maximum likelihood, maxLikelihood, for the sequence of states given an existing lambda model and a sequence of observations (line 27) is computed using the getLikelihood tail recursive method, as follows:

val maxLikelihood: Option[Double] = Try {

  @tailrec
  def getLikelihood(likelihood: Double, index: Int): Double ={
    lambda.update(gamma, diGamma, obsSeq) //28
    val _likelihood = frwrdBckwrdLattice   //29
    val diff = likelihood - _likelihood
      
    if( diff < config.eps ) _likelihood    //30
    else if (index >= config.maxIters)  //31
      throw new IllegalStateException(" … ")
    else getLikelihood(_likelihood, index+1) 
  }

  val max = getLikelihood(frwrdBckwrdLattice, 0)
  lambda.normalize   //32
  max
}._toOption("BaumWelchEM not initialized", logger)

The maxLikelihood value implements the mathematical expressions M13 and M14. The getLikelihood recursive method updates the lambda model matrices A and B and the initial state probabilities pi (line 28). The likelihood for the sequence of states is recomputed using the forward-backward lattice algorithm implemented in the frwrdBckwrdLattice method (line 29).

Note

Update of lambda model

The update method of the HMMModel object uses simple linear algebra and is not described; refer to the documented source code for details.

The core of the Baum-Welch expectation maximization is the iterative forward and backward update of the lattice of states and observations between time t and t + 1. The lattice-based iterative computation is illustrated in the following diagram:

The visualization of the HMM graph lattice for the Baum-Welch algorithm

The code will be as follows:

def frwrdBckwrdLattice: Double  = {
  val _alpha = Alpha(lambda, obsSeq) //33
  val beta = Beta(lambda, obsSeq).getTreillis //34
  val alphas = _alpha.getTreillis
  gamma.update(alphas, beta) //35
  diGamma.update(alphas, beta, lambda.A, lambda.B, obsSeq)
  _alpha.alpha
}

The forward-backward algorithm uses the Alpha class for the computation/update of the lambda model in the forward pass (line 33) and the Beta class for the update of lambda in the backward pass (line 34). The joint probabilities-related gamma and diGamma matrices are updated at each recursion (line 35), reflecting the iteration of the mathematical expressions M11 to M14.

The recursive computation of maxLikelihood exits if the algorithm converges (line 30). It throws an exception if the maximum number of recursions is exceeded (line 31).

Decoding – CF-3

This last canonical form consists of extracting the most likely sequence of states {qt} given a set of observations Ot and a λ model. Solving this problem requires, once again, a recursive algorithm.

The Viterbi algorithm

The extraction of the best state sequence (the sequence of states that has the highest probability) by exhaustive search is very time consuming. An alternative consists of applying a dynamic programming technique to find the best sequence {qt} through iteration. This algorithm is known as the Viterbi algorithm. Given a sequence of states {qt} and a sequence of observations {oj}, the probability δt(i) of the path with the highest probability over the first t observations, ending in the state Si, is defined in the following information box [7:7].
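
Before walking through the book's ViterbiPath class, here is a compact, self-contained Viterbi sketch on plain Scala arrays. It works in log space for numerical stability, whereas the class described below manipulates the probabilities directly; the λ values and observation indices in the call are illustrative, and A(i)(j) is read as the transition probability from the state i to the state j.

// Returns the log probability of the best path and the best state sequence
def viterbiPath(
    pi: Array[Double],
    A: Array[Array[Double]],
    B: Array[Array[Double]],
    obs: Array[Int]): (Double, Array[Int]) = {

  val nStates = pi.length
  val T = obs.length
  val delta = Array.ofDim[Double](T, nStates)  // highest log probability so far
  val psi = Array.ofDim[Int](T, nStates)       // back pointers

  for (i <- 0 until nStates)
    delta(0)(i) = math.log(pi(i)) + math.log(B(i)(obs(0)))

  for (t <- 1 until T; j <- 0 until nStates) {
    val (best, iBest) = (0 until nStates)
      .map(i => (delta(t - 1)(i) + math.log(A(i)(j)), i)).max
    delta(t)(j) = best + math.log(B(j)(obs(t)))
    psi(t)(j) = iBest
  }

  // Backtrack from the most likely final state
  val path = new Array[Int](T)
  path(T - 1) = (0 until nStates).maxBy(delta(T - 1)(_))
  for (t <- T - 2 to 0 by -1) path(t) = psi(t + 1)(path(t + 1))
  (delta(T - 1)(path(T - 1)), path)
}

val (bestLogProb, bestPath) = viterbiPath(
  Array(0.6, 0.4),
  Array(Array(0.7, 0.3), Array(0.4, 0.6)),
  Array(Array(0.5, 0.4, 0.1), Array(0.1, 0.3, 0.6)),
  Array(0, 1, 2, 1))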

Note

The Viterbi algorithm

M12: The definition of the delta function is as follows:

The Viterbi algorithm

M13: Initialization of delta is defined as:

The Viterbi algorithm

M14: Recursive computation of delta is defined as:

The Viterbi algorithm

M15: The computation of the optimum state sequence {q} is defined as:

The Viterbi algorithm

The ViterbiPath class implements the Viterbi algorithm whose purpose is to compute the optimum sequence (or path) of states given a set of observations and a λ model. The optimum sequence or path of states is computed by maximizing the delta function.

The constructor for the ViterbiPath class has the same arguments as the forward, backward, and Baum-Welch algorithms: the lambda model and the obsSeq set of observations:

class ViterbiPath(lambda: HMMModel, obsSeq: Vector[Int]) {
  val nObs = lambda.numObs
  val nStates = lambda.numStates
  val psi = Array.fill(nObs)(Array.fill(nStates)(0)) //35
  val qStar = new QStar(nObs, nStates) //36
  
  val delta = { //37
    Range(0, nStates)./:(DMatrix(nObs, nStates))((m,n) => {
      psi(0)(n) = 0
      m += (0, n, lambda.pi(n) * lambda.B(n, obsSeq.head))
    })
  }
  val path = HMMPrediction(viterbi(1), qStar()) //38
}

As seen in the preceding information box containing the mathematical expressions for the Viterbi algorithm, the following matrices have to be defined:

  • psi: This is the matrix of indices of nObs observations by indices of nStates states (line 35).
  • qStar: This is the optimum sequence of states at each recursion of the Viterbi algorithm (line 36).
  • delta: This is the matrix of the highest probabilities of a path of states ending in a given state for the first t observations. Its initialization also sets the psi values for the first observation to 0 (line 37).

All members of the ViterbiPath class are private except path that defines the optimum sequence or path of states given the obsSeq observations (line 38).

The matrix that defines the maximum probability delta of any sequence of states given the lambda model and the obsSeq observation is initialized using the mathematical expression M13 (line 37). The predictive model returns the path or optimum sequence of states as an instance of HMMPrediction:

case class HMMPrediction(likelihood: Double, states: Array[Int])

The first argument, likelihood, is computed by the viterbi recursive method. The indices of the states in the optimum sequence are computed by the QStar class (line 38).

Let's take a look under the hood of the Viterbi recursive method:

@tailrec
def viterbi(t: Int): Double = {
  Range(0, nStates).foreach( updateMaxDelta(t, _)) //39

  if( t == obsSeq.size-1) {  //40
    val idxMaxDelta = Range(0, nStates)
                .map(i => (i, delta(t, i))).maxBy(_._2) //41
    qStar.update(t+1, idxMaxDelta._1)  //42
    idxMaxDelta._2
  }
  else viterbi(t+1)  //43
}

The recursion starts at the second observation because the qStar, psi, and delta parameters have already been initialized in the constructor. The recursive implementation invokes the updateMaxDelta method to update the psi indexing matrix and the highest probability for any state, as follows:

def updateMaxDelta(t: Int, j: Int): Unit = {
   val idxDelta = Range(0, nStates)
        .map(i => (i, delta(t-1, i)*lambda.A(i, j)))
        .maxBy(_._2)   //44
   psi(t)(j) = idxDelta._1
   delta += (t, j, idxDelta._2)  //45
}

The updateMaxDelta method implements the mathematical expression M14: it extracts the index of the previous state that maximizes the probability of reaching the state j (line 44). The delta probability matrix and the psi indexing matrix are updated accordingly (line 45).

The viterbi method is called recursively for the remaining observations except the last one (line 43). At the last observation of the obsSeq.size-1 index, the algorithm executes the mathematical expression M15, which is implemented in the QStar class (line 42).

Note

The QStar class

The QStar class and its update method use linear algebra and are not described here; refer to the documented source code and Scaladocs files for details.

This implementation of the decoding form of the hidden Markov model completes the description of the hidden Markov model and its implementation in Scala. Now, let's put this knowledge into practice.

Putting it all together

The main HMM class implements the three canonical forms. A view bound to an array of integers is used to parameterize the HMM class. We assume that a time series of continuous or pseudo-continuous values is quantized into discrete symbol values.

The @specialized annotation ensures that the byte code is generated for the Array[Int] primitive without executing the conversion implicitly declared by the bound view.

There are two modes that execute any of the three canonical forms of the hidden Markov model:

  • The HMM class: The constructor initializes/trains a model similar to any other learning algorithm, as described in the Design template for immutable classifiers section of the Appendix A, Basic Concepts. The constructor generates the model by executing the Baum-Welch algorithm. Once the model is successfully created, it can be used for decoding or evaluation.
  • The HMM object: The companion object provides the decode and evaluate methods for the decoding and evaluation of the sequence of observations using the HMM.

The two modes of operations are described in the following diagram:

The computational flow for the hidden Markov model

Let's complete our implementation of the HMM with the definition of its class. The HMM class is defined as a data transformation using a model implicitly generated from an xt training set, as described in the Monadic data transformation section in Chapter 2, Hello World! (line 46):

class HMM[@specialized(Double) T <: AnyVal](
    config: HMMConfig,
    xt: XVSeries[T], 
    form: HMMForm)
    (implicit quantize: Array[T] => Int, f: T => Double) 
  extends ITransform[Array[T]](xt) with Monitor[Double] {//46

  type V = HMMPrediction  //47
  val obsSeq: Vector[Int] = xt.map(quantize(_)) //48

  val model: Option[HMMModel] = train  //49
  override def |> : PartialFunction[U, Try[V]] //50
}

The HMM constructor takes three explicit arguments and two implicit arguments (line 46):

  • config: This is the configuration of the HMM that is the dimension of lambda model and execution parameters
  • xt: This is the multidimensional time series of observations whose features have the T type
  • form: This is the canonical form to be used once the model is generated (evaluation or decoding)
  • quantize: This is the quantization function that converts an observation of the Array[T] type to an Int type
  • f: This is the implicit conversion from the T type to Double

The constructor has to override the V type (HMMPrediction) of the output data (line 47) declared in the ITransform abstract class. The structure of the HMMPrediction class has been defined in the previous section.

The Monitor trait is used to collect the profiling information during training (refer to the Monitor section under Utility classes in the Appendix A, Basic Concepts).

The time series of xt observations is converted into a vector of obsSeq observed states by applying the quantize quantization function to each observation (line 48).
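
For instance, a quantization function for observations whose single feature lies in a known range might look like the following sketch, written against Array[Double]; the [0.0, 1.0] range and the six symbols are illustrative assumptions.

// Map a single-feature observation in [0.0, 1.0] to one of numSymbols buckets
val numSymbols = 6
implicit val quantize: Array[Double] => Int = 
  (x: Array[Double]) => math.min((x.head * numSymbols).toInt, numSymbols - 1)

// quantize(Array(0.05)) == 0,  quantize(Array(0.73)) == 4,  quantize(Array(1.0)) == 5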

As with any supervised learning technique, the model is created through training (line 49). Finally, the |> polymorphic predictor invokes either the decode method or the evaluate method (line 50).

The train method consists of the execution of the Baum-Welch algorithm and returns the lambda model:

def train: Option[HMMModel] = Try {
  BaumWelchEM(config, obsSeq).lambda }.toOption

Finally, the |> predictor is a simple wrapper around the evaluation form (evaluate) and the decoding form (decode):

override def |> : PartialFunction[U, Try[V]] = {
  case x: Array[T] if(isModel && x.length > 1) => 
    form match {
      case _: EVALUATION => 
        evaluation(model.get, Vector[Int](quantize(x)))
      case _: DECODING => 
        decoding(model.get, Vector[Int](quantize(x)))
    }
}

The protected evaluation method of the HMM companion object is a wrapper around the Alpha computation:

def evaluation(model: HMMModel, 
    obsSeq: Vector[Int]): Try[HMMPrediction] = Try {
  HMMPrediction(-Alpha(model,obsSeq).logProb, obsSeq.toArray) 
}

The evaluate method of the HMM object exposes the evaluation canonical form:

def evaluate[T <: AnyVal]( model: HMMModel, 
    xt: XVSeries[T])(implicit quantize: Array[T] => Int, 
      f: T => Double): Option[HMMPrediction] =  
  evaluation(model, xt.map(quantize(_))).toOption

The decoding method wraps the Viterbi algorithm to extract the optimum sequence of states:

def decoding( model: HMMModel, obsSeq: Vector[Int]): 
     Try[HMMPrediction] = Try { 
  ViterbiPath(model, obsSeq).path
}

The decode method of the HMM object exposes the decoding canonical form:

def decode[T <: AnyVal](model: HMMModel, 
    xt: XVSeries[T])(implicit quantize: Array[T] => Int,
    f: T => Double): Option[HMMPrediction] =
  decoding(model, xt.map(quantize(_))).toOption

Note

Normalized probabilities input

You need to make sure that the input probabilities of the λ model used for the evaluation and decoding canonical forms are normalized: the probabilities of all the states in the π vector and in each row of the A and B matrices sum to 1. This validation code is omitted in the example code.
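
A minimal validation sketch of this property on plain arrays, within a small tolerance (this check is not part of the example code):

def isNormalized(v: Array[Double], eps: Double = 1e-6): Boolean =
  math.abs(v.sum - 1.0) < eps

// pi must sum to 1, and so must every row of the A and B matrices
def isValidLambda(
    pi: Array[Double],
    A: Array[Array[Double]],
    B: Array[Array[Double]]): Boolean =
  isNormalized(pi) && A.forall(isNormalized(_)) && B.forall(isNormalized(_))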

Test case 1 – training

Our first test case is to train an HMM to predict the sentiment of investors as measured by the weekly sentiment survey of the members of the American Association of Individual Investors (AAII) [7:8]. The goal is to compute the transition probabilities matrix A, the emission probabilities matrix B, and the steady state probability distribution π, given the observations and hidden states (training canonical forms).

We assume that the change in investor sentiments is independent of time, as required by the hidden Markov model.

The AAII sentiment survey grades the bullishness on the market in terms of percentage:

The weekly AAII market sentiment (reproduced by courtesy from AAII)

The sentiment of investors is known as a contrarian indicator of the future direction of the stock market. Refer to the Terminology section in the Appendix A, Basic Concepts.

Let's select the ratio of the percentage of investors that are bullish over the percentage of investors that are bearish. The ratio is then normalized. The following table lists this:

Time | Bullish | Bearish | Neutral | Ratio | Normalized Ratio
t0 | 0.38 | 0.15 | 0.47 | 2.53 | 1.0
t1 | 0.41 | 0.25 | 0.34 | 1.68 | 0.53
t2 | 0.25 | 0.35 | 0.40 | 0.71 | 0.0
…
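
The ratio and its normalization can be reproduced with a short sketch; the three sample rows below are taken from the table, the min-max bounds are derived from them, and small differences from the published figures may come from the rounding of the survey percentages.

case class Sentiment(bullish: Double, bearish: Double, neutral: Double)

val samples = Vector(
  Sentiment(0.38, 0.15, 0.47),
  Sentiment(0.41, 0.25, 0.34),
  Sentiment(0.25, 0.35, 0.40))

// Ratio of bullish over bearish sentiment for each week
val ratios = samples.map(s => s.bullish / s.bearish)

// Min-max normalization of the ratio into [0, 1]
val (lo, hi) = (ratios.min, ratios.max)
val normalized = ratios.map(r => (r - lo) / (hi - lo))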

The sequence of nonnormalized observations (the ratio of bullish sentiments over bearish sentiments) is defined in a CSV file as follows:

val OBS_PATH = "resources/data/chap7/obsprob.csv"
val NUM_SYMBOLS = 6
val NUM_STATES = 5
val EPS = 1e-4
val MAX_ITERS = 150
val observations = Vector[Double](
   0.01, 0.72, 0.78, 0.56, 0.61, 0.56, 0.45, …  )

implicit val quantize = (x: DblArray) => 
      (x.head*(NUM_STATES+1)).floor.toInt  //51
val xt = observations.map(Array[Double](_))

val config = HMMConfig(xt.size, NUM_STATES, NUM_SYMBOLS, 
    MAX_ITERS, EPS)
val hmm = HMM[Double](config, xt) //52
show(s"Training: ${hmm.model.toString}")

The constructor for the HMM class requires an implicit quantization function of the Array[T] => Int type, which is implemented by the quantize function (line 51). The hmm.model model is created by instantiating the HMM class with a predefined configuration and the xt observations, which are quantized internally into the obsSeq sequence of observed states (line 52).

The training of the HMM generates the following state transition probabilities matrix:

A | 1 | 2 | 3 | 4 | 5
1 | 0.090 | 0.026 | 0.056 | 0.046 | 0.150
2 | 0.094 | 0.123 | 0.074 | 0.058 | 0.0
3 | 0.093 | 0.169 | 0.087 | 0.061 | 0.056
4 | 0.033 | 0.342 | 0.017 | 0.031 | 0.147
5 | 0.386 | 0.47 | 0.314 | 0.541 | 0.271

The emission matrix is as follows:

B | 1 | 2 | 3 | 4 | 5 | 6
1 | 0.203 | 0.313 | 0.511 | 0.722 | 0.264 | 0.307
2 | 0.149 | 0.729 | 0.258 | 0.389 | 0.324 | 0.471
3 | 0.305 | 0.617 | 0.427 | 0.596 | 0.189 | 0.186
4 | 0.207 | 0.312 | 0.351 | 0.653 | 0.358 | 0.442
5 | 0.674 | 0.520 | 0.248 | 0.294 | 0.259 | 0.03

Test case 2 – evaluation

The objective of the evaluation is to compute the probability of the xt observed data given a λ model (A0, B0, and PI0):

val A0 = Array[Array[Double]](
  Array[Double](0.21, 0.13, 0.25, 0.06, 0.11, 0.24),
  Array[Double](0.31, 0.17, 0.18, 0.04, 0.19, 0.11),
  …. 
) 
val B0 =  Array[Array[Double]](
  Array[Double](0.61, 0.39),
  Array[Double](0.54, 0.46),
  …  
)
val PI0 = Array[Double](
  0.26, 0.04, 0.11, 0.26, 0.19, 0.14)

val data = Vector[Double](
  0.0, 1.0, 2.0, 1.0, 3.0, 0.0, 1.0, 0.0, …
)
val xt = data.map(Array[Double](_))
val max = data.max
val min = data.min
implicit val quantize = (x: DblArray) => 
  ((x.head/(max - min) + min)*(B0.head.length-1)).toInt   //55
val lambda = HMMModel(
  DMatrix(A0), DMatrix(B0), PI0, xt.length) //53
evaluation(lambda, xt).map( _.toString).map(show(_)) //54

The model is created directly by converting the A0 state-transition probabilities and the B0 emission probabilities into matrices of the DMatrix type (line 53). The evaluation method generates an HMMPrediction object, which is converted to a string and displayed in the standard output (line 54).

The quantization method consists of normalizing the input data over the number (or range) of symbols associated with the lambda model. The number of symbols is the size of the rows of the emission probabilities matrix B. In this case, the range of the input data is [0.0, 3.0]. The range is normalized using the linear transform f(x) = x/(max – min) + min, then adjusted for the number of symbols (or values for states) (line 55).

The quantize quantization function has to be explicitly defined before invoking the evaluation method.

Note

Test case for decoding

Refer to the source code and the API documents for the test case related to the decoding form.

HMM as a filtering technique

The evaluation form of the hidden Markov model is very suitable for filtering data for discrete states. Contrary to time series filters such as the Kalman filter introduced in the The discrete Kalman filter section in Chapter 3, Data Preprocessing, the HMM requires data to be stationary in order to create a reliable model. However, the hidden Markov model overcomes some of the limitations of analytical time series analysis. Filters and smoothing techniques assume that the noise (frequency, mean, variance, and covariance) is known and usually follows a Gaussian distribution.

The hidden Markov model does not have such a restriction. Filtering techniques such as the moving average, the discrete Fourier transform, and the Kalman filter apply to both discrete and continuous states, whereas the HMM requires discrete states. Moreover, the extended Kalman filter can estimate nonlinear states.
