Chapter 7. Sequential Data Models

The universe of Markov models is vast and encompasses computational concepts such as the Markov decision process, discrete Markov chains, Markov chain Monte Carlo for Bayesian networks, and hidden Markov models.

Markov processes, and more specifically, the hidden Markov model (HMM), are commonly used in speech recognition, language translation, text classification, document tagging, and data compression and decoding.

The first section of this chapter introduces and describes the hidden Markov model with the full implementation of the three canonical forms of the hidden Markov model using Scala. This section details the different dynamic programming techniques used in the evaluation, decoding, and training of the hidden Markov model. The design of the classifier follows the same pattern as the logistic and linear regression.

The second and last section of the chapter is dedicated to a discriminative (labels conditional to observation) alternative to the hidden Markov model: conditional random fields. The open source CRF Java library authored by Sunita Sarawagi from the Indian Institute of Technology, Bombay, is used to create a predictive model using conditional random fields [7:1].

Markov decision processes

This first section describes the basic concepts you need to know in order to understand, develop, and apply the hidden Markov model. The foundation of the Markovian universe is the concept known as the Markov property.

The Markov property

The Markov property is a characteristic of a stochastic process where the conditional probability distribution of a future state depends only on the current state and not on its past states. When the transitions between states occur at discrete times, such a process is known as a discrete Markov chain.

The first-order discrete Markov chain

The following example is taken from Introduction to Machine Learning by E. Alpaydin [7:2].

Let's consider the following use case. N balls of different colors are hidden in N boxes (one each). The balls can have only three colors: {Blue, Red, Green}. The state of the discovery process is defined by the color of the latest ball drawn from one of the boxes: S0 = Blue, S1 = Red, and S2 = Green.

Let {π0, π1, π2} be the initial probabilities of drawing a ball of each color (Blue, Red, and Green, respectively) in the first attempt.

Let qt denote the color of the ball drawn at the time t. The probability of drawing a ball of the color Sk at the time t after drawing a ball of the color Sj at the time t-1 is defined as p(qt= Sk| qt-1= Sj) = ajk. The probability of drawing a red ball in the first attempt is p(q0= S1) = π1. The probability of drawing a red ball in the first attempt followed by a blue ball in the second attempt is p(q0= S1) p(q1= S0|q0= S1) = π1 a10. The process is repeated to create a sequence of states {St} = {Red, Blue, Blue, Green, …} with the following probability:

p({St}) = p(q0= S1) p(q1= S0|q0= S1) p(q2= S0|q1= S0) p(q3= S2|q2= S0) … = π1 a10 a00 a02 …
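The chain probability can be sketched in Scala as a fold over consecutive pairs of states. The object name, the initial probability vector pi, and the values of the transition matrix a below are illustrative assumptions, not taken from the book's implementation:

```scala
// Minimal sketch of a first-order discrete Markov chain for the
// ball and boxes example. States: S0 = Blue, S1 = Red, S2 = Green.
object MarkovChainSketch {
  // Initial probabilities pi(k) = p(q0 = Sk) (assumed values)
  val pi: Array[Double] = Array(0.5, 0.3, 0.2)

  // Transition matrix a(j)(k) = p(qt = Sk | qt-1 = Sj) (assumed values)
  val a: Array[Array[Double]] = Array(
    Array(0.6, 0.2, 0.2),
    Array(0.3, 0.4, 0.3),
    Array(0.2, 0.3, 0.5)
  )

  // p({St}) = pi(s0) * a(s0)(s1) * a(s1)(s2) * ... for a sequence
  // of at least two state indices
  def sequenceProbability(states: Seq[Int]): Double =
    states.sliding(2).foldLeft(pi(states.head)) {
      case (p, Seq(j, k)) => p * a(j)(k)
      case (p, _)         => p
    }
}
```

For the sequence {Red, Blue, Blue, Green}, encoded as Seq(1, 0, 0, 2), the method multiplies π1 by the transitions a10, a00, and a02.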

The sequence of states/colors can be represented as follows:


Illustration of the ball and boxes example

Let's estimate the probabilities p using historical data (learning phase):

  1. The estimation of the probability of drawing a red ball (S1) in the first attempt is π1, which is computed as the number of sequences starting with S1 (red) / total number of sequences.
  2. The estimation of the probability of drawing a blue ball after a red ball is a10, which is computed as the number of times a blue ball is drawn immediately after a red ball / total number of transitions from a red ball, and so on.
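The two counting steps above can be sketched in Scala as follows; the object MarkovEstimator and its estimate method are hypothetical names for illustration, not part of the book's library:

```scala
// Estimates the initial probabilities pi and the transition matrix a
// of a first-order Markov chain by counting over observed sequences
// of state indices (the learning phase described above).
object MarkovEstimator {
  def estimate(sequences: Seq[Seq[Int]],
               numStates: Int): (Array[Double], Array[Array[Double]]) = {
    val piCounts = Array.fill(numStates)(0.0)
    val aCounts = Array.fill(numStates, numStates)(0.0)

    sequences.foreach { seq =>
      piCounts(seq.head) += 1.0          // count sequences starting in each state
      seq.sliding(2).foreach {
        case Seq(j, k) => aCounts(j)(k) += 1.0  // count transitions j -> k
        case _ =>                               // length-1 sequence: no transition
      }
    }
    // Normalize: pi by the number of sequences, each row of a by the
    // total number of transitions out of that state
    val pi = piCounts.map(_ / sequences.size)
    val a = aCounts.map { row =>
      val total = row.sum
      if (total > 0) row.map(_ / total) else row
    }
    (pi, a)
  }
}
```

For example, with the three observed sequences Seq(1, 0, 0, 2), Seq(1, 0, 2), and Seq(0, 0, 1), the estimate of π1 is 2/3 (two of the three sequences start with red), and the estimate of a02 is 2/5 (two of the five transitions out of blue lead to green).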

Note

Nth-order Markov

The Markov property is popular mainly because of its simplicity. As you will discover while studying the hidden Markov model, having a state solely dependent on the previous state allows us to apply efficient dynamic programming techniques. However, some problems require dependencies on more than one previous state. These models are known as Markov random fields.

Although the discrete Markov process can be applied to trial-and-error types of applications, its applicability is limited to problems for which the observations do not depend on hidden states. Hidden Markov models are a commonly applied technique to meet such a challenge.
