Chapter 15. Discrete Channel Models

In this chapter we take a distinctly different approach to the problem of channel modeling. In our previous work, the channel was defined in terms of the noise, interference, and other disturbances that combine with the transmitted signal to produce a distorted and noisy waveform at the input to the receiver. The transmitted signal, as well as the noise, interference, and other channel disturbances, were all represented by samples of waveforms. The result was a waveform-level simulation that processed data on a sample-by-sample basis. We now consider partitioning the system in a way that eliminates much of the need for simulating at the waveform level. The result is a discrete channel model in which simulations operate on a symbol-by-symbol basis. The motivation for replacing a waveform-level channel model by a discrete channel model is speed of simulation. We will see that the discrete channel model is an abstraction of the physical (waveform) channel in which the channel is completely characterized by a small set of parameters. Determining these parameters constitutes an important part of the modeling process, and must be accomplished through measurements made on the physical channel or through the use of a single waveform-level simulation.

Introduction

The basic model of a communication system, illustrated in Figure 15.1, typically consists of a discrete data source, a channel coder for error control, a modulator and transmitter, a channel, a receiver, and a decoder. Depending on the application for the system and the detail of the simulation, other elements such as equalizers, interleavers, deinterleavers, carrier synchronizers, and symbol synchronizers, may be part of the system model. As we know, the modulator maps a symbol, or a sequence of symbols, at the modulator input onto a waveform at the modulator output. The waveform is subjected to a number of degrading effects in the channel. Among these effects are noise, bandlimiting, interference, and fading. All of these effects can be characterized by waveforms. The input to the receiver is also a waveform, which is the transmitted waveform combined with the degrading effects of the channel. The role of the receiver is to observe the input waveform over a symbol period, or over a sequence of symbols, and determine the transmitted symbols. The channel in this case is referred to as a waveform channel, since both the channel input and the channel output are characterized as waveforms. It is important to note that all system elements, with the exception of the channel, are characterized by deterministic mappings. The channel performs a stochastic, or random, mapping of the channel input to the channel output.

Communication system with discrete channel model.

Figure 15.1. Communication system with discrete channel model.

The terminology “discrete channel model” (DCM) is used to denote all the elements of a communication system that lie between the two points A and B in the system, where the input at point A is a vector of discrete symbols (input sequence), denoted X = [x1,x2, . . . ,xk, . . . ], and the output at B is another vector of discrete symbols (output sequence), denoted Y = [y1,y2, . . . , yk, . . . ]. Usually, point A will be the output of the channel encoder or, equivalently, the modulator input, and point B will be the input to the decoder as shown in Figure 15.1. Note that, with the partitioning shown in Figure 15.1, the modulator and transmitter, the waveform channel, and the receiver are part of the discrete channel model. The relationship between the discrete channel input vector X and the output vector Y will be affected by the system components, such as filters, and by the random disturbances induced in the physical (waveform) channel. In a binary communication system with hard decision receivers, the elements of both X and Y will be binary sequences. In the case of soft-decision receivers, the elements of Y will be from an M-ary alphabet. Transmission errors resulting from imperfections in the system elements between points A and B in Figure 15.1, including the physical channel, will cause elements of X and Y to be different, at least occasionally.

Discrete channel models describe the error-generation mechanism probabilistically. These models fall into two categories. The first class of these models, referred to as “memoryless” channel models, are used to model the transmission errors or the transitions from the channel input to the channel output under the assumption that there is no temporal correlation in the transition mechanism. That is, the input-output transition probabilities for the nth channel input symbol are not affected by what happened to any other input symbol. Such models are applicable to channels in which there is no inter-symbol interference (ISI) or fading, and the noise is AWGN. In a hard decision binary system, this assumption implies that bit errors are uncorrelated. Discrete channel models for memoryless channels are usually trivial to derive.

The second and more interesting class of discrete models apply to situations in which the transitions from the input symbols to the output symbols are temporally correlated. In this case the probability of error for the nth symbol depends on whether or not an error occurred in the transmission of previous symbols. The fading channel commonly encountered in wireless communications is a good example of a channel having correlated errors. Errors will tend to occur in bursts when the radio channel is in a deep fade, and these channels are referred to as burst error channels or channels with memory.

Discrete channel models are probabilistic models that are computationally more efficient than waveform channel models. The increased efficiency comes from two factors. Discrete models are simulated at the symbol rate, whereas waveform level models are typically simulated at 8 to 16 times the symbol rate. This alone reduces the computational burden by roughly an order of magnitude. While each individual block is simulated in detail in a waveform-level model, there is a high level of abstraction in the discrete channel model. This level of abstraction, as we will see, further reduces the computational burden. These two factors may well contribute to several orders of magnitude savings in the time required to execute a simulation.

In simple cases, discrete channel models can be derived analytically from the models of the underlying components between the discrete channel input A and the discrete channel output B. However, in most cases, discrete channel models are derived from simulated or measured error patterns between the points A and B. Discrete channel models are used to design and analyze the impact of components outside the portion of the system between points A and B. This includes, for example, error control coders, source encoders, and interleavers.

Modeling discrete memoryless channels is a straightforward process. For example, in the case of a discrete memoryless symmetric channel with binary input and binary output, all we need to know to characterize the channel is a single number, namely, the bit error probability. Simulation of this channel involves drawing a single random number and comparing that number to a threshold in order to decide whether or not a given bit will suffer a transmission error or be received without error. Thus, in order to simulate the transmission of a million bits through a given channel, all we have to do is generate a million independent uniformly distributed random numbers and perform the required threshold comparisons. The entire system represented by this channel is therefore simulated efficiently. (Compare this with a waveform-level simulation, which involves generating millions of samples representing the waveforms present in the system being simulated, and processing these samples though all of the functional blocks in the system.)

Discrete channels with memory are more difficult to model. This temporally correlated error generation mechanism is usually modeled by a discrete-time Markov sequence [1, 2, 3, 4] in which a state model is used to characterize the various states of the channel and a set of transition probabilities are used to capture the progression of the channel states. Each state will also have associated with it a set of input-to-output symbol transition probabilities. Thus, the model is now more complex and requires more parameters than the uncorrelated channel. Model structure and parameter values are estimated from either simulated or measured error patterns. Simulation of the discrete channel consists of generating a random number prior to the transmission of each symbol to determine the channel state, and then drawing another random number to determine the input-to-output transition. While the model and parameter estimation procedures are complex, simulation of the Markov model is very efficient. We will elaborate on these ideas in the following sections.

Discrete Memoryless Channel Models

In a discrete memoryless channel, the mapping of input to output is instantaneous and is described by a set of transition probabilities. The simplest discrete memoryless channel model is the binary symmetric channel (BSC), which is illustrated in Figure 15.2(a). The input to the discrete channel is a sequence of binary symbols, denoted by the vector X. The kth component of this vector, xk, corresponds to the kth channel input, and pk corresponds to the probability of error on the kth symbol transmission. For a memoryless channel, pk is independent of k, so the error on all symbols is effected by the channel in the same manner. (If the channel has memory, pk will, in general, be a function of pk−1.) A sequence of symbols of length M is processed through the memoryless channel by invoking the channel model M times in succession. For the kth symbol, a binary input of “0” is received correctly as “0” with a probability 1 − pk, and incorrectly as “1” with an error probability of pk. The kth channel output is denoted by yk and the M-symbol sequence of outputs is denoted by the vector Y. The channel is symmetric in that zeroes and ones are affected by the channel in the same manner. Note that one consequence of symmetry is that the error probability is independent of the transmitted symbol so that the error source can be simulated separately from the information (data) source.

Binary channel models.

Figure 15.2. Binary channel models.

For a binary channel, the input-output relationship can be expressed as

Equation 15.1. 

in which X and Y are the input and output data vectors, respectively, denotes the XOR operation, and E is the error vector. Specifically, E = {e1,e2,e3, . . .} is a binary vector or sequence having elements {0,1} in which ek = 0 denotes that the kth element of X, xk, is received correctly (yk = xk), and ek = 1 denotes that the kth element of X is received in error (yk ≠ xk). The parameter defining the performance of the binary symmetric channel model is the probability of error PE, which can be easily estimated from measurements or from a system simulation performed at the waveform level. Note that, for a binary system, the Monte Carlo estimate of the error probability is the Hamming weight of the error vector E divided by N, the number of elements in the vector E.[1]

Note that once pk is known, the error sequence corresponding to N transmitted symbols can be generated by making N calls to a uniform random number generator and comparing the random number to the threshold p. Specifically:

Equation 15.2. 

where Uk denotes the number obtained from the kth call to the random number generator. It is important to realize that the discrete channel model will not reproduce the original error pattern used to originally determine p. It will, however, reproduce a statistically equivalent error pattern. The simplicity of (15.2) illustrates why substitution of a DCM for the portion of the system between points A and B in Figure 15.1 results in a significant savings in computational complexity.

A slightly more complicated model for the binary channel is the binary nonsymmetric channel model in which the probability of error may be different for zeroes and ones, as in some optical communication systems. This model is illustrated in Figure 15.2(b). This model is characterized by a set of four input-to-output transition probabilities, but only two of these probabilities, αk and βk, are independent. The channel is nonsymmetric if αk ≠ βk.

Extension of this model to the case in which the input and output belong to an M-ary alphabet is straightforward and is shown in Figure 15.3 for M = 3. The channel is characterized by a set of probabilities αij, i,j = 0,1,2, in which αij represents the conditional probability

Equation 15.3. 

Discrete channel model with three inputs and three outputs.

Figure 15.3. Discrete channel model with three inputs and three outputs.

If the set of probabilities αij are constant, and therefore independent of k, the kth channel output depends only on the kth channel input. For this case, the model shown in Figure 15.3 is memoryless. The transition probabilities are estimated using a detailed waveform-level simulation in which the transitions from the ith input to the jth output are counted. In these models, the transition probabilities are estimated from simulated or measured data as

Equation 15.4. 

where Ni is the number of occurrences of the ith input symbol, and nij is the number of times the ith input symbol transitions to the jth output. Another example, shown in Figure 15.4, is used for memoryless channels in which the output of the channel is quantized to a different number of levels from the number of input levels, as in soft decision decoding.

<source>Source: M. C. Jeruchim, P. Balaban, and K. S. Shanmugan, Simulation of Communications Systems, 2nd ed., New York: Kluwer Academic/Plenum Publishers, 2000. </source>
Discrete channel model with two inputs and four outputs.

Figure 15.4. Discrete channel model with two inputs and four outputs.

Markov Models for Discrete Channels with Memory

For channels with memory, the most commonly used model is the discrete-time, finite-state Markov model (MM). There are several reasons for the popularity of Markov models. These models are analytically tractable, their theory is well established in the statistical literature, and they have been applied successfully to a variety of important communication problems. Markov models have been used to model the output of discrete information sources such as English text. (The occurrence of a sequence of letters of the English alphabet in a typical passage can be modeled as a Markov sequence.) Similarly, the sampled values of an audio or video waveform can also be modeled by an MM. Markov modeling techniques are directly applicable to the modeling and analysis of discrete communication channels [1]. Also, Markov models can be used to evaluate the capacity of a discrete channel and for the design of optimal error control coding techniques. Markov models have been successfully used to characterize fading channels in wireless communication systems [510]. Most importantly, computationally efficient techniques are available for estimating the parameters of Markov sequences from simulated or measured error patterns. As an introduction to the Markov Model, we first consider a simple two-state model. The results are then generalized to an N-state model.

Two-State Model

To set the stage for Markov channel models, consider a fading channel in which the received signal strength is above an acceptable performance threshold part of the time, and below the threshold during a deep fade. If we have interest only in the “above threshold” or “below threshold” conditions, we can model the channel to be in either one of the two states as shown in Figure 15.5. In Figure 15.5 we have a good state, g, in which the system performance is acceptable (PE < 10−3) and a bad state, b, in which the received signal level is so low that the probability of error is unacceptably high (PE > 10−3).

Assumed receiver input signal level and two-state model.

Figure 15.5. Assumed receiver input signal level and two-state model.

Thus, in the simple two-state model shown in Figure 15.5, the states can be represented by the set

Equation 15.5. 

As time progresses, the channel goes from good state to bad state, and vice versa. The rate of transition and the length of stay in each of the two states will depend on the temporal correlation of the fading process. If time is measured in increments of a symbol (bit) time, then we can construct a discrete channel model as follows.

At the beginning of each symbol (bit) interval, the channel is in one of the two states. If the channel is in a good state, the probability of transmission error is negligible. On the other hand, if the channel is in a bad state, the probability of transmission error is unacceptably high. Prior to the transmission of each new bit, the channel may change state or remain in the current state. This transition between states takes place with a set of transition probabilities, aij. These are conditional probabilities, and for the two-state model there are four probabilities of interest. Let St and St+1 denote the state of the channel at time t and at time t+1, respectively. (Note that t+1 represents time one increment greater than t, and not one second greater than t, so that t can be viewed as a discrete time index.) With this notation, we define the four transition probabilities to be

Equation 15.6. 

This can be represented by the state transition matrix

Equation 15.7. 

Note that since we are considering a two-state model, given that St = g, we must have St+1 = g (the channel remains in the good state) or St+1 = b (the channel transitions to the bad state). Thus, each row of the state transition matrix must sum to 1. A graphical representation of the state transition diagram of the two-state model is shown in Figure 15.6.

Two-state Markov model.

Figure 15.6. Two-state Markov model.

In the work to follow we will assume that the channel model is stationary and therefore the state state transition matrix A(t) is fixed so that A(t) = A. However, if a simulation is performed in which the initial state probability distribution Π0 is different from the steady-state distribution, some time is required for the state distribution to evolve to the steady-state value Πss. The probability of finding the model in a given state is of interest. Define Πt as the state probability distribution at time t. Specifically:

Equation 15.8. 

in which πt,g and πt,b represent the probabilities of finding the channel in the good state or in the bad state at time t, respectively.[2] By definition of the state transition matrix, the state distribution at time t+1 is given by

Equation 15.9. 

In a similar manner, the state distribution at time t+2 is

Equation 15.10. 

Thus, in general

Equation 15.11. 

where Ak represents the k step transition matrix.

Most, but not all, Markov processes settle to a steady-state probability distribution as time evolves.[3] Assuming that the Markov process converges to a steady-state value distribution Πt+k = Πt for sufficiently large t and arbitrary k, it follows that

Equation 15.12. 

for arbitrary k. It is easily shown that for sufficiently large values of k, the rows of Ak give Πss [1]. The convergence of Π0 to Πss is illustrated in the following example.

Example 15.1. 

As a simple example let

Equation 15.13. 

with the initial state distribution

Equation 15.14. 

The following MATLAB program illustrates the convergence of Π0 to Πss:

% File:  c15_MMtransient.m
N = 100;
pie = zeros(N,2);
A = [0.98 0.02; 0.05 0.95];
pie(1,:)  = [0.50 0.50];
for k=2:N
    pie(k,:)  = pie(k-1,:)*A;
end
kk = 1:N;
plot(kk,pie(:,1),'k-',kk,pie(:,2),'k:')
xlabel('Iteration')
ylabel('Probability')
text1 = ['The steady-state probabilities are ',...
        num2str(pie(N,1)),' and ' ,num2str(pie(N,2)),'.'];
legend('State 1','State 2',2)
disp(text1)
disp(' ')
disp('The value of A^N is'), A^N
% End of script file.

Executing the program results in the script

≫ c15_MMtransient
The steady-state probabilities are 0.71412 and 0.28588.

The value of A^N is
ans =
0.7145 0.2855
0.7138 0.2862

We see that calculating the value of Πss iteratively using (15.9) and by raising A to a high value gives, as expected, consistent results. The manner in which the probability distribution converges to the steady-state value is shown in Figure 15.7.

Convergence of the state probability distribution to the steady-state values.

Figure 15.7. Convergence of the state probability distribution to the steady-state values.

 

Before leaving the two-state model, we need to discuss the error generation matrix, defined as

Equation 15.15. 

in which “C” denotes that a correct decision is made and “E” denotes that an error is made. By simple matrix multiplication it follows that the unconditional probability of a correct decision PC and error PE are given by

Equation 15.16. 

where Πss is the steady-state state distribution matrix and BT is the transpose of the error generation matrix B.

Note that, if all elements of B are nonzero, either state can produce an error although the error probabilities may be quite different for different states. Thus, upon observation of an error, the state that produced the error cannot be identified. It is for this reason that we call the model “hidden”. These models are referred to as hidden Markov models (HMMs).

Example 15.2. 

In this example, the simulation of the channel using the Markov model is demonstrated and the system error probability is computed. Assume that errors can be produced in either state where, obviously, the probability of error in the good state will be much less than the error probability in the bad state. Specifically we define the conditional error probabilities as

Equation 15.17. 

and

Equation 15.18. 

For these error probabilities, the error generation matrix B takes the form

Equation 15.19. 

In addition, the Markov chain is defined by the transition matrix

Equation 15.20. 

as in the previous example. The MATLAB program for simulating the channel is

% File:  c15_hmm2.m
N = 100000;                           % number of iterations
state = 'Good';                       % initial state
A = [0.98 0.02; 0.05 0.95];           % state transition matrix
B = [0.0005 0.1000];                  % second row of B
out = zeros(1,N);                     % initialize matrix
errors = 0;
for i=1:N
    error = 0;                        % initialize error counter
    y = rand(1);                      % RV for state transition
    err = rand(1);                    % RV for error given state
    if state=='Good'                  % test for Good state
        if y<A(1,1)
            state='Good';             % remain in Good state
            if err<B(1);              % test for error
                error = 1;            % record an error
            end
        else
            state='Bad ';             % transition to Bad state
            if err<B(2);              % test for error
                error = 1;            % record an error
            end
        end
    else                              % state = Bad
        if y<A(2,2);
            state='Bad ';             % remain in Bad state
            if err<B(2);              % test for error
                error = 1;            % record an error
            end
        else
            state='Good';             % transition to Good state
            if err<B(1);              % test for error
                error = 1;            % record an error
            end
        end
    end
    errors = errors + error;          % increment error counter
end
PE = errors/N                         % calculate error probability
% End of script file.

Executing the program yields the following MATLAB result:

≫ c15_hmm2
PE =
0.0285

Thus, we have

Equation 15.21. 

Note that PE computed in this manner is a random variable. We now “sanity-check” this simulation result.

From the previous example, the steady-state probabilities are

Equation 15.22. 

From (15.16) we have

Equation 15.23. 

This gives

Equation 15.24. 

which is very close (PE = 0.0289) to the value given by the simulation. Thus, the simulation result appears reasonable.

 

A more general version of the two-state Markov model can be constructed with a larger number of states, with many good and bad states with varying degrees of “goodness” or “badness” corresponding to increasingly more severity of fading and or noise. The result is the N-state Markov model, which is a generalization of the two-state model considered in this section.

N-state Markov Model

An N-state Markov (processes) model for a discrete communication channel is defined by a number of parameters. The set of states S is denoted by

Equation 15.25. 

and St, as before, denotes the state at time t. Thus, St ranges over the set S. We have interest in the probability, Πt,i, that the Markov model will be in state i at time t. We denote this as

Equation 15.26. 

The transition probabilities, aij, denote the probability of going from state i at time t(St = i) to state j at time t+1 (St+1 = j). In equation form

Equation 15.27. 

This transition normally takes place in a time increment equal to a bit or symbol duration. Finally we have the input-to-output transition (error symbol) probabilities. The error symbols for an M-ary symbol alphabet are denoted by the set

Equation 15.28. 

For the familiar binary case

Equation 15.29. 

where 0 denotes no error and 1 denotes an error. The probability that error symbol ek occurs given that the model is in state i is denoted bi(ek). In equation form we have

Equation 15.30. 

In binary hard decision applications, the parameters bi(ek) are represented by a matrix having two rows and N columns, where N represents the number of states in the model. Thus, the error generation matrix takes the form

Equation 15.31. 

where b1i denotes the probability of a correct decision given that the model is in state i, and b2i represents the probability of error given that the system is in state i. With this formulation, note that the columns of B must sum to one. For nonbinary applications, or binary soft-decision applications, the error generation matrix B will have more than two rows.

These parameters define a discrete-time Markov process operating at a transition rate equal to the symbol rate of the communication system, and the output of the process consists of two sequences: the sequence of states {St}, and a sequence of error symbols {Et}, where t is the time index that can be indexed over the integer set {0,1,2, . . .}. Normally, only the input and the output of the channel and hence the error sequence can be observed. The state sequence itself cannot be observed. Hence the state sequence is “hidden” or not visible from external observations and the Markov model is labeled as a hidden Markov model.

First-Order Markov Process

The Markov process of order m (or memory m), defined as the probability of the state at time t+1, St+1, is given by

Equation 15.32. 

which states that the probability is dependent upon the previous m states. For a first-order Markov process, the probability of the state at time t+1 is a function only of the previous state. In other words:

Equation 15.33. 

For channel models we will use a first-order Markov model. For modeling data sources, higher-order models may be required. As an example, modeling the symbol sequences in an English-language text may require a second-order or third-order Markov model, since the probability of occurrence of a given letter may depend on the previous two or three letters.

Stationarity

Since stationarity is usually assumed in modeling the analog portion of the communication channel, it is common to assume stationarity of the Markov model for discrete channels also. Stationarity implies that the parameters of the model, that is, the probabilities Πt,i, aij(t), and bi(ek) do not depend on t. In this case, we have

Equation 15.34. 

where the terms aki are the elements in the ith column of the state transition matrix

Equation 15.35. 

and Πk are the elements of the state probability distribution vector

Equation 15.36. 

As in the case of a two-state model, the n-stage transition matrix is given by An. As was demonstrated in Example 15.1 for a two-state model, (15.34) along with the constraint that

Equation 15.37. 

implies that the state probabilities, πi, are uniquely determined from the transition probabilities, aij. Hence the Markov model is completely defined by the matrix A of the state transition probabilities and B, where B is the matrix of the input-to-output symbol transition (i.e., error) probabilities

Equation 15.38. 

as previously discussed.

Simulation of the Markov Model

Once the discrete Markov model is derived, simulation of the model is relatively easy. We saw this in Example 15.2 for a two-state model, and the technique used in Example 15.2 is now generalized for an N-state model.

Assume that the model is in state k at time t and that both the next state and the corresponding component of the error vector, ek+1, is to be simulated. The first step is to generate two uniformly distributed random variables, U1 and U2. The first random variable, U1, is used to determine the new state. Given that the model is in state k, the transition probabilities are defined by the kth row of the state transition matrix A. The kth row of A is designated by the vector Ak where

Equation 15.39. 

The cumulative probabilities associated with Ak are denoted βk and are given by

Equation 15.40. 

Note that the MATLAB library function cumsum maps the vector Ak into the a vector having elements βk. The probability of making a transition from state k to state i is, by definition, aki, and is given by

Equation 15.41. 

This probability is represented by the shaded area in Figure 15.8(a).

After the new state is established, the next step is to determine whether or not an error is made in the new state. In order to accomplish this, we compare the second uniform random variable U2 with the threshold γ shown in Figure 15.8(b). The probability of a correct decision is Pr{U2 < γ}, and the probability of error is Pr{U2 > γ}, which is the shaded region in Figure 15.8(b). Since the current state is i, the value of γ is given by the element b1i(row 1, column i) in the state observation matrix B.

Steps in the simulation of a Markov model.

Figure 15.8. Steps in the simulation of a Markov model.

The implementation of the simulation procedure is realized using the following MATLAB code:

u1 = rand(1);                      % get uniform RV 1
cum_sum = [0 cumsum(A(state,:))];
for i=1:total_states               % loop to determine new state
    if u1 > = cum_sum(i) & u1 < cum_sum(i+1);
        state = i;                 % assign new state
    end
end
state_seq(t) = state;              % new state
u2 = rand(1);                      % get uniform RV 2
if u2 > B(1,state)
    out(t) = 1;                    % record error
end

We now demonstrate this process in the following example.

Example 15.3. 

In this example, a binary sequence of length N = 20,000 is generated representing symbol errors on a channel. This error sequence is generated using a three-state Markov model. If a given element in the sequence is a binary “one,” a channel symbol error is recorded for the position corresponding to the given element. A binary “zero” denotes that an error did not occur in the corresponding position.

We assume that errors can occur in any of the three states. Specifically we assume

Equation 15.42. 

Equation 15.43. 

and

Equation 15.44. 

which corresponds to the error probability matrix

Equation 15.45. 

The state transition matrix for the Markov model is assumed to be

Equation 15.46. 

We assume that the model is initially in state 1.

MATLAB code c15_errvector.m, for generating the error sequence, is given in Appendix A. Execution of the program c15_errvector requires as input data the parameters N and the matrices B and A. The default values are N = 20,000, and the matrices B and A are defined by (15.45) and (15.46), respectively. The program produces two outputs. These are the error vector out, which gives the error positions in a sequence of N transmissions, and the state sequence vector state_seq, which details the state transitions. The probability of finding the model in a given state can be computed from the state sequence vector, and the simulated error probability can be computed from the error vector. These calculations are carried out in the MATLAB program c15_hmmtest.m. Execution of these two programs using the default values produces the following output, in which the dialog relating to the acceptance or rejection of default values is suppressed:

≫ c15_errvector
≫ c15_hmmtest
Simulation results:
The probability of State 1 is 0.2432.
The probability of State 2 is 0.1634.
The probability of State 3 is 0.5934.
The error probability is 0.01445.

Note that the simulated values given for the state probabilities and the error probability are random variables. The variance of these random variables is reduced by increasing the length of the error vector.

A “sanity check” on the above values can be obtained by calculating the state probabilities and the error probability directly from the matrices B and A. The simple MATLAB code, together with the output, is as follows:

≫ A100 = A^100
A100 =
0.2353 0.1765 0.5882
0.2353 0.1765 0.5882
0.2353 0.1765 0.5882
≫ PE = B*A100'
PE =
0.9851 0.9851 0.9851
0.0149 0.0149 0.0149

Note that the theoretical state probabilities S1, S2, and S3 are 0.2353, 0.1765, and 0.5882, respectively, and that the theoretical error probability is 0.0149. These obviously differ slightly from the simulated values, which, as previously mentioned, are random variables. The variance of these random variables can obviously be reduced, as the student should verify, by choosing a larger value of N .

Example HMMs—Gilbert and Fritchman Models

The characterization of discrete channels using discrete-time, finite-state Markov sequences has been proposed in the past by Gilbert, Fritchman, and others [11, 12]. The Gilbert model is a two-state model with a good error-free state, and a bad state with an error probability of p. This model was used by Gilbert to calculate the capacity of a channel with burst errors. Parameters of the model can be estimated from measured or simulated data using the procedures that will be explored in the following section.

The Fritchman model, first proposed in 1967 [12], is now receiving considerable attention, since the Fritchman model appears to be suitable for modeling burst errors in mobile radio channels and also because it is relatively easy to estimate the parameters of the Fritchman model from burst error distributions. For binary channels, Fritchman’s framework divides the state space into k good states and N − k bad states. The good states represent error-free transmissions, and the bad states always produce a transmission error. Hence the entries in the B matrix are zeros and ones and they need not be estimated. A Fritchman model with three good states and one bad state is illustrated in Figure 15.9. This model has the state transition matrix

Equation 15.47. 

The B matrix for this case takes the very simple form

Equation 15.48. 

<source>Source: M. C. Jeruchim, P. Balaban, and K. S. Shanmugan, Simulation of Communications Systems, 2nd ed., New York: Kluwer Academic/Plenum Publishers, 2000. </source>
Fritchman model with three good states and one bad state.

Figure 15.9. Fritchman model with three good states and one bad state.

In general, the state transition matrix A for the Fritchmann model can be partitioned as

Equation 15.49. 

where the submatrices represent the transition probabilities between various good and bad states. For this model, it is possible to derive analytically the expressions for burst error distributions in terms of the model parameters and use these expressions to estimate the parameters of the model from empirical (measured or simulated) burst error distributions.

Let {O1 ,O2, . . . , OT} be an error sequence with Ok = 1 indicating that the kth transmitted bit suffered a transmission error, and Ok = 0 indicating error-free transmission of the kth symbol. Also, let the notation (0m|1) denote the event of observing m or more consecutive error-free transmissions following an error, and (1m|0) represent the event of observing m or more consecutive errors following an error-free transmission. Fritchman showed that the probabilities of occurrence of these two events are given by the sum of weighted exponentials

Equation 15.50. 

and

Equation 15.51. 

where λi, i = 1, 2, . . ., k, and λi, i = k + 1, k + 2 , . . . ,N, are the eigenvalues of AGG and ABB, respectively, and the corresponding values of fi are functions of aij. From (15.50) and (15.51) we can obtain the probability of obtaining exactly m zeros (or ones) as

Equation 15.52. 

and

Equation 15.53. 

The Fritchman model can be interpreted as equivalent to a Markov process with a state transition probability matrix

Equation 15.54. 

where Λgg and Λbb are diagonal matrices given by

Equation 15.55. 

and

Equation 15.56. 

Note that in this equivalent model there are no transitions within the set of k good states and no transitions within the set of N − k bad states. Such transitions are indistinguishable from the observed error sequence, since a transition from one good state to another good state still produces no errors and the transition is not observable from the output. This structure will be revisited when semi-Markov models are considered in a later section.

The Fritchman model is not unique except when there is only one bad state. This is so because, in the general case, the error-free run distribution Pr(0m|1) and the error burst distribution Pr(1m|0) do not specify the statistical dependence of the error-free runs and the error bursts. In the case of a single error state model, as in Figure 15.9, Fritchman showed that [12]

Equation 15.57. 

From (15.57) it can be seen that, in the case of a single error state model, the error-free run uniquely specifies the 2(N − 1) model parameters. An empirical procedure is used to fit an N− 1 mixture of exponentials as in (15.50) and (15.51) to the (simulated or measured) error-free run distribution.

While Fritchman’s model is applicable to discrete channels with simple burst error distributions, it may not be adequate to characterize very complex burst error patterns that will require more than one error state in the model. In such cases it will be very difficult to estimate the model parameters from the burst error distributions alone. Fortunately, it is possible to find a maximum likelihood estimate of the parameters using iterative techniques.

Estimation of Markov Model Parameters

The Markov model for a discrete channel is described by the N × N state transition matrix A and the M × N error probability generation matrix B. An iterative procedure for estimating these parameters γ = {A, B} from a given error sequence, obtained through simulation or measurement, Estimation of Markov Model Parameters = {O1,, ... ,Ot, ... OT} is based on the Baum-Welch algorithm [13]. This iterative algorithm is designed to converge to the maximum likelihood estimator of Γ = {A,B} that maximizes Pr(Estimation of Markov Model Parameters|Γ).

The goal is to compute estimates of the elements of the state-transition matrix A. These are given by

Equation 15.58. 

We also require the estimates of the elements of the error generation matrix, which are given by

Equation 15.59. 

The calculations required to implement the Baum-Welch algorithm are described in the following steps. In addition, the MATLAB program for implementing the Baum-Welch algorithm is developed. Two versions of the MATLAB code for implementing the detailed calculations are given. The first version, denoted the “basic code,” contains a number of looping operations and is therefore inefficient. It is, however, easier to relate the basic code to the defining equations. The second version, denoted the “more efficient version,” eliminates some or all of the looping operations in order to take advantage of MATLAB’s ability to implement vector operations. It should be noted that in several instances the code is not fully vectorized. This was done in order to have more readable code. The version of the Baum-Welch algorithm given in Appendix B was developed by combining the vectorized code segments.

  1. Start with an initial (assumed) model Γ= {A,B}.

  2. With Γ = {A,B} as the model, we first compute the “forward variables”

    Equation 15.60. 

    and the “backward variables”

    Equation 15.61. 

    for t = 1, 2, ..., T and i = 1, 2, ..,N. Details of these calculations follow.

    Forward variables Calculation of the forward variables involves three steps: initialization, induction, and termination.

    • Initialization:

      Equation 15.62. 

    • Induction:

      Equation 15.63. 

    • Termination:

      Equation 15.64. 

      Note that

      Equation 15.65. 

      The basic MATLAB code for performing these calculations follows. (Note the use of the scaling factor. The scaling factor is described in the following section.)

      alpha = zeros(len,states);            % memory allocation
      for column = 1:states
          alpha(1,column) = pye(1, column)*b(1, column);
      hspace*50em% initialization
      end
      scale(1) = sum(alpha(1,:));           % normalizing factor
      alpha(1,:)  = alpha(1,:)/scale(1,:);   % normalization
      sum1 = 0;
      for t = 1:(len - 1)                   % induction
          for j = 1:states
              for i = 1:states
                  inner_sum = alpha(t,i)*p(i,j);
                  sum1 = sum1 + inner_sum;
              end
              alpha(t+1,j) = sum1*b(out(t+1)+1,j);
              sum1 = 0;
          end
          scale(t+1) = sum(alpha(t+1,:));   % normalizing factor
          alpha(t+1,:)  = alpha(t+1,:)/scale(t+1); % normalization
      end

      More efficient MATLAB code results by eliminating several of the looping operations. This yields

      alpha = zeros(len,states);            % memory allocation
      alpha(1,:)  = pye.*b(1,:);            % initialization
      scale(1) = sum(alpha(1,:));           % normalizing factor
      alpha(1,:)  = alpha(1,:)/scale(1);    % normalization
      for t = 1:len-1                       % induction
          alpha(t+1,:)  = (alpha(t,:)*p).*b(out(t+1)+1,:);
          scale(t+1) = sum(alpha(t+1,:));   % normalizing factor
          alpha(t+1,:)  = alpha(t+1,:)/scale(t+1); % normalization
      end

    Backward variables Calculation of the backward variables involves two steps: initialization and induction.

    • Initialization:

      Equation 15.66. 

    • Induction:

      Equation 15.67. 

      The details of these calculations are illustrated in Figure 15.10. The basic MATLAB code for this segment follows:

      beta = zeros(len, states);      % memory allocation
      beta(len,:)  = 1/scale(len);     % initialization and scaling
      for t = (len-1):-1:1            % induction
          for i = 1:states
              for j = 1:states
                  inner_sum = p(i,j)...
                      * b(out(t+1)+1,j) * beta(t+1,j);
                  sum2 = sum2 + inner_sum;
              end
              beta(t,i) = sum2;
      
                  sum2 = 0;
              end
              beta(t,:)  = beta(t,:)/scale(t);   % scaling
          end
      <source>Source: M. C. Jeruchim, P. Balaban, and K. S. Shanmugan, Simulation of Communications Systems, 2nd ed., New York: Kluwer Academic/Plenum Publishers, 2000. </source>
      Parameter estimation for the HMM.

      Figure 15.10. Parameter estimation for the HMM.

      The more efficient (partially vectorized) version follows:

      beta = zeros(len, states);            % memory allocation
      beta(len,:)  = 1/scale(len);           % initialization
      for t = len-1:-1:1                    % induction
          beta(t,:)  = (beta(t+1,:).*b(out(t+1)+1,:))...
              *(p')/scale(t);
      end
  3. The next step is to compute γt(i) according to

    Equation 15.68. 

    The basic MATLAB code for implementing this calculation follows:

    gamma = zeros(len,states);             % memory allocation
    for i = 1:len
    for j = 1:states
    gamma(i,j) = alpha(i,j)*beta(i,j); % calculation of gamma
                                       %    variable
    end
    gamma(i,:  = gamma(i,:)/sum(gamma(i.:));
    end

    We observe that we don’t need to keep all of these values in memory, since they can be summed for the time indices. The following MATLAB code explains this further:

    gamma_sum = zeros(1,states);             % memory allocation
    for t = 1:len
    gamma_sum = gamma_sum + alpha(t,:).*beta(t,:);
    end

    The quantity ξt(i, j) is defined by

    Equation 15.69. 

    which can be calculated as follows:

    prob_model_given_seq = zeros(1,len); % memory allocation
                                         %    for model
    eta = zeros(states,states,len);      % memory allocation for eta
    for t = 1:len                        % start the loop
        for i = 1:states
            prob_model_given_seq = zeros(1,len); % memory allocation
            temp(i) = alpha(t,i)*beta(t,i);
        end
        prob_model_given_seq(t) = sum(temp); % probability of model
    end
    for i = 1:states
        for j = 1:states
            for t = 1:(len-1)
                eta(i,j,t)=((alpha(t,i)*p(i,j)...
                    *b(out(t+1)+1,j)*beta(t+1,j)));
            end
        end
    end

    Careful MATLAB programming indicates that we don’t need to save this variable for all the time indices. Rather we can compute

    eta = zeros(states,states);        % memory allocation
    sum_eta = zeros(states,states)     % memory allocation
    for t = 1:(len-1)
        for i = 1:states
            eta(i,:)= ((alpha(t,i)*(p(i,:).   ...
                *(b(out(t+1)+1,:))).*beta(t+1,:)));
        end
        sum_eta = sum_eta + eta; % adds the values of eta
    end

    We now determine the new state state transition probability âij according to

    Equation 15.70. 

    The basic MATLAB code is

    for i = 1:states
        for j = 1:states
            p_estimate(i,j) = sum(eta(i,j,:))/(sum(gamma(:,i))
                -gamma(len,i));
        end
        p_estimate(i,:)  = p_estimate(i,:)/sum...
            (p_estimate(i,:)); % normalization
    end

    Alternatively:

    for i = 1:states
        for j = 1:states
            p_estimate(i,j) = sum_eta(i,j)...
                /(gamma_sum(i)-alpha(len,i).*beta(len,i)...
                /(sum(alpha(len,:).*beta(len,:))));
        end
        p_estimate(i,:)  = p_estimate(i,:)/sum(p_estimate(i,:));
    end

    Next, , defined by

    Equation 15.71. 

    is computed. The basic MATLAB code is

    for j = 1:states
        i = find(out==0); % find indices of correct bits
        for k = 1:length(i)
            sum_gamma = sum_gamma +gamma(i(k),j);
        end
        b(1,j) = sum_gamma/sum(gamma(:,j));
        sum_gamma = 0;
    end
    for j = 1:states
        ii = find(out==1); % find the indices of errors
        for k = 1:length(ii)
            sum_gamma = sum_gamma +gamma(ii(k),j);
        end
        b(2,j) = sum_gamma/sum(gamma(:,j));
        sum_gamma = 0;
    end
    for i = 1:states
        b(:,i) = b(:,i)/sum(b(:,i));
    end

    Using more efficient computations, the output symbol probability matrix can be estimated as

    out_0 = find(out == 0);              % find correct bits indices
    out_1 = find(out == 1);              % find indices of errors
    sum_0 = zeros(1,states);
    sum_1 = zeros(1,states);
    gamma_sum = sum(gamma);
    for i = 1:length(out_0)
        sum_0 = sum_0 + gamma(out_0(i),:);   % adds correct bits
    end
    for i = 1:length(out_1)
        sum_1 = sum_1 + gamma(out_1(i),:);   % adds error bits
    end
    for i = 1:states
        for j = 1:2
            if j == 1
                b(j,i) = sum_0(i)/gamma_sum(i);
                                       % elements b correct bits
            end
            if j == 2
                b(j,i) = sum_1(i)/gamma_sum(i);
                                       % elements b error bits
            end
        end
    end
    for i = 1:states
        b(:,i) = b(:,i)/sum(b(:,i));   % normalize the b matrix
    end

    We can also compute

    Equation 15.72. 

  4. Go back to Step 1 with the new values of or equivalently obtained in Step 2 and repeat until the desired level of convergence, as discussed in a section to follow, is reached.

Scaling

The forward and backward vectors tend to zero exponentially for large data size and must be scaled properly in order to prevent numerical underflow. The scaling constant, the use of which is implemented in the MATLAB code given in Appendix B, is first defined as

Equation 15.73. 

The scaled values of αt(j), denoted , are given by

Equation 15.74. 

This, of course, implies that

Equation 15.75. 

The values of Ct are saved and used to scale the backward variables. The scaled values of βt(i), denoted are given by

Equation 15.76. 

with the initialization

where 1 denotes the column vector containing all ones. The gamma variable can also be normalized if desired, but scaling the gamma variables is not necessary. Turin [1] discusses scaling in more detail.

Convergence and Stopping Criteria

Since the Baum-Welch algorithm is iterative, the number of iterations to be performed for a required level of model accuracy must be determnined. Perhaps the best way to accomplish this is to display the estimates of A and B as the algorithm is executing. If one desires each element of A and B to be accurate to a given number of significant figures, execution of the algorithm is allowed to continue until the elements of A and B no longer change, within the given accuracy, from iteration to iteration. The algorithm is then terminated manually. This technique has considerable appeal, since the level of accuracy is known. Also, based on previous knowledge, one may simply perform a given number of iterations.

Another commonly used method for determining convergence is to continue the iterations until successive values of Pr[Convergence and Stopping Criteria|Γ] differ very little. (The Baum-Welch algorithm is guaranteed to converge to a maximum likelihood solution. A proof of this statement is given in [1].) The value of Pr[Convergence and Stopping Criteria|Γ] is determined in terms of the scaling constant Ct in (15.73). Specifically

Equation 15.77. 

For T large, this number will be very small and is usually expressed as

Equation 15.78. 

This is referred to as the log-likelihood ratio and is illustrated in Example 15.4.

It should be pointed out that the estimates of A and B, for a given set of data, are not unique unless the initial estimates of A, B, and Π are specified. Since the error vector is a function of both A and B, various combinations may produce statistically equivalent results and a specific result will depend on the initial conditions.

Block Equivalent Markov Models

The Baum-Welch algorithm is one of many reestimation algorithms available for the computation of model parameters based on an error vector. This error sequence could involve many thousands, or even millions, of symbols. Note that long observation periods are required to accurately estimate small values of aij. In communication systems, these small values are often critical for estimating system performance. The computational burden associated with the Baum-Welch algorithm is quite high when the error vector is long, since the forward and backward variables are computed for each symbol in the given error sequence. In addition, convergence is sometimes slow, which adds to the computational burden. For low error probability applications, the error vector has long runs of zeros. In addition, the error vector must contain a significant number of error events in order for the error vector to accurately define the model. In these situations the Baum-Welch algorithm is especially inefficient and faster algorithms are needed.

One method, proposed by Turin [14], for dealing with this problem, involves the computation of the forward and backward variables using a block matrix version of the form

Equation 15.79. 

The relationship between the terms in the matrix defined by (15.79) are illustrated in Figure 15.11. Note that the powers of submatrices involved in the computations can be precomputed and reused in order to reduce the overall computational burden.

Computations for the sequence 0000011100 (Version 1).

Figure 15.11. Computations for the sequence 0000011100 (Version 1).

Another modification proposed by Sivaprakasam and Shanmugan [6] is based on the fact that for a general Markov model there is a statistically equivalent Fitchman-like model with k good states and N − k bad states and an A matrix having the form

Equation 15.80. 

where Λ00 and Λ11 are diagonal matrices. These models are often referred to as block diagonal Markov models. With this model, the channel remains in the same state during an error burst and changes state only at the end of a burst. As a result, all variables are computed only at time steps involving the change of error symbol. In other words, variables are computed at the beginning of each burst of errors rather than once every symbol. With this modification, the computations take the form illustrated in Figure 15.12. Note that the computations during long bursts of ones and zeros now involve raising the power of a diagonal matrix rather than raising the power of arbitrary matrices, and matrix multiplications occur only at the transition from one burst to another. Thus, the computations are very efficient, and long error bursts can be processed without excessive storage and computational requirements.

Computations for the sequence 0000011100 (Version 2).

Figure 15.12. Computations for the sequence 0000011100 (Version 2).

Using this model we have interest only in the run lengths of zeros and ones. Thus, the error vector can be placed in a very compact form. As a simple illustration, consider an error sequence

Equation 15.81. 

The run-length vector corresponding to this error sequence can be written as

Equation 15.82. 

The MATLAB routine c15_seglength.m, given in Appendix D, generates the run-length vector defined by (15.83) from the error vector defined by (15.81).

The development of reestimation algorithms based on block equivalent Markov models will not be discussed here. They have, however, been extensively studied in the literature [1, 14, 15]. A MATLAB program, based on the work of Sivaprakasam and Shanmugan [15] is given in Appendix C. An example illustrating the use of this algorithm is presented in a following section.

Irrespective of the algorithms used, the Markov model for a discrete channel must be computed for different values of the parameters of the underlying physical channel. If the underlying physical channel is changed in any way, the discrete channel model must be reestimated. For example, if a discrete channel model is developed for a given channel, then the Markov model must be developed from measurements or waveform-level simulations for different values of the parameters of the channel including S/N. Thus a parametrized set of Markov models for the underlying channel can be developed and used for the design and analysis of error control coders, interleavers, etc.

Two Examples

We conclude this chapter with two examples that demonstrate the determination of a Markov channel and the determination of a semi-Markov model. The flow of these examples is as follows:

  1. In the first example (Example 15.4), the error vector generated in Example 15.3 is used as input to the Baum-Welch algorithm. The result is an estimated channel model. The error probability and the run-length distribution resulting from this model are determined and are then compared to the error probability and run-length distribution of the original sequence generated in the first example.

  2. The second example (Example 15.5) is similar to the first example. The essential difference is that a block equivalent semi-Markov model is used rather than a Markov model. The motivation for using a semi-Markov model is that the semi-Markov model leads to a substantial reduction in the time required to derive the model from measured or simulated data.

Example 15.4. In this example we generate an error sequence of length N = 20,000 with a known two-state model defined by

Equation 15.83. 

and

Equation 15.84. 

The Baum-Welch algorithm is then used to estimate the model using the generated error sequence. Note that we assume a three-state model when the Baum-Welch model is applied. This may at first seem strange but, given simulation or measurement data, the channel model that produced the data is not known. The MATLAB dialog, with some of the nonessential dialog eliminated, is as follows:

≫ c15_errvector
Accept default values?
Enter y for yes or n for no > n
    Enter N, the number of points to be generated > 20000
    Enter A, the state transition matrix > [0.95 0.05; 0.1 0.9]
    Enter B, the error distribution matrx > [0.95 0.5; 0.05 0.5]
≫ out1 = out;
≫ c15_bwa(20,3,out)
Enter the initial state transition matrix P >
        [0.9 0.05 0.05; 0.1 0.8 0.1; 0.1 0.2 0.7]
Enter the initial state probability vector pye > [0.3 0.3 0.4]
Enter the initial output symbol probability matrix...
    B > [0.9 0.8.7; 0.1 0.2 0.3]

After one iteration we have[4]

Also, after 10 iterations, the estimated state transition matrix is

At the 20th iteration, which is the termination point, the estimated state transition matrix is

The log likelihood function is illustrated in Figure 15.13. It can be seen that the log likelihood function converges in about 10 iterations. Note that this conclusion is consistent with the preceding computations of Ak and Bk for k = 10 and k = 20. Note also that, as discussed previously, the likelihood numbers are very small.

Log likelihood function for Example 15.4.

Figure 15.13. Log likelihood function for Example 15.4.

We now generate a second sequence, out2, using the model estimated by the Baum-Welch algorithm. This gives, with the dialog dealing with default values suppressed:

≫ a = [0.9437 0.0299 0.0263; 0.1173 0.7425 0.1402;...
    0.0663 0.1324 0.8013];
≫ b = [0.9522 0.6637 0.4313; 0.0478 0.3363 0.5687];
≫ c15_errvector
Accept default values?
Enter y for yes or n for no > n
    Enter N, the number of points to be generated > 20000
    Enter A, the state transition matrix > a
    Enter B, the error distribution matrix > b
≫ out2 = out;

In order to calculate and plot Pr{0m|1} for both out1 and out2, we execute the following lines of MATLAB code:

≫ runcode1 = c15_seglength(out1);
≫ runcode2 = c15_seglength(out2);
≫ c15_intervals2(runcode1,runcode2)

This produces the plots illustrated in Figure 15.14. The top frame illustrates Pr{0m|1} for the original data and the bottom frame illustrates Pr{0m|1} using the data generated by the model estimated by the Baum-Welch algorithm.

Pr(0m|1) for the original data and the data resulting from the estimated model.

Figure 15.14. Pr(0m|1) for the original data and the data resulting from the estimated model.

There are a few obvious differences between the two plots illustrated in Figure 15.14, especially in the neighborhood of m = 10. Note, however, that the error probabilities are given by

≫ p1 = sum(out1)/N
p1 =
0.0038
≫ p2 = sum(out2)/N
p2 =
0.0035

 

Example 15.5. In this final example, we consider a test case used by Sivaprakasam and Shanmugan [6] to illustrate the block diagonal Markov model. Assume a three-state model having two good states and one bad state, defined by[5]

Equation 15.85. 

Since the good states are error-free and errors are always produced in the bad state, the error observation matrix is defined by

Equation 15.86. 

Note that this model is semi-hidden. If an error occurs, we know that the error was generated by state 3. If, however, no error is generated, the state cannot be identified.

First, we generate an error vector, having length 20,000, for the state transition matrix A and the error observation matrix B. The appropriate MATLAB commands, with the nonessential dialog supressed, follows:

≫ a1 = [0.90 0.02 0.08; 0.10 0.50 0.40; 0.10 0.20 0.70];
≫ b1 = [1 1 0; 0 0 1];
≫ c15_errvector
Accept default values?
Enter y for yes or n for no > n
Enter N, the number of points to be generated > 20000
Enter A, the state transition matrix > a1
Enter B, the error distribution matrix > b1
≫ out1 = out;
≫ runcode1 = c15_seglength(out1);
≫ [A matrix, pi_est] = c15_semiMarkov(runcode1,100,[2 1])

Note that we must run c15_seglength prior to running c15_semiMarkov in order to generate the run-length vector.

After 50 iterations, the estimate of the state transition matrix is

Equation 15.87. 

Note that this is close to, but not exactly equal to, the result obtained by Sivaprakasam and Shanmugan [6]. There are several explanations for the differences. First, the error vector upon which the estimated model is based is a sample function of a stochastic process and, for practical purposes, no two error vectors will be identical. Also, error vectors of different lengths will result in slightly different models. Finally, we fixed the number of iterations, and Sivaprakasam and Shanmugan used a different stopping criteria.

The next step is to generate a second error vector based on the estimated model and plot Pr{0m|1} for both the original error vector and for the error vector generated by the estimated model. This is accomplished using the following MATLAB code:

≫ a2 = [0.9043 0.0 0.0957; 0.0 0.4911 0.45089; 0.1402 0.1515 0.7083];
≫ b2 = [1 1 0; 0 0 1];
≫ c15_errvector
Accept default values?
Enter y for yes or n for no > n
Enter N, the number of points to be generated > 20000
Enter A, the state transition matrix > a2
Enter B, the error distribution matrix > b2
≫ out2 = out;
≫ runcode2 = c15_seglength(out2);
≫ intervals2(runcode1,runcode2);

The results are illustrated in Figure 15.15. Note that the results are in close agreement even though the original model, defined by A, and the estimated model, defined by Â, are quite different. Obviously we knew from the start of this problem that A and  would be different, since  is constrained to have a block diagonal form.

Pr {0m|1} for Example 15.5.

Figure 15.15. Pr {0m|1} for Example 15.5.

It is also instructive to check the error probabilities. In order to accomplish this we compute

≫ pe1 = sum(out1)/20000
pe1
= 0.3617
≫ pe2 = sum(out2)/20000
pe2 =
0.3538

Once again we have close agreement.

 

Summary

The focus of this chapter was the development of discrete channel models. Discrete channel models are attractive, since their use greatly reduces the computational complexity associated with the execution of a simulation. The parameters of the model may be determined from either measured data or from the results of a waveform-level simulation. Discrete channel models have found widespread use in wireless communication systems, since they can be used to effectively model fading channels. Discrete channel models are an abstraction of waveform-level models in that they characterize the input-output characteristics of the channel but do not model the physical functionality of the channel. It is through this abstraction that the computational burden is reduced.

A two-state model was initially considered in order to establish the concept of a state-transition matrix, the state distribution vector, and the error generation matrix. Several different techniques for determining the steady-state distribution matrix were considered. These concepts were then extended to the N-state model including the development of a technique for efficiently simulating a channel based on a discrete channel model.

We next considered the two-state Gilbert model and the N-state Fritchman model as examples of a hidden Markov model. The Fritchman model is attractive from a computational point of view, since the state-transition matrix contains a large number of zeros (relatively sparse). The Fritchman model has also been shown to effectively model the fading channel environments encountered in wireless communication systems.

The heart of this chapter dealt with the development of discrete channel models based on measured data or data obtained from a waveform-level simulation. The basic tool used for parameter estimation is the Baum-Welch algorithm. The development of the Baum-Welch algorithm was briefly outlined and the development of a MATLAB implementation of the Baum-Welch algorithm was presented. Scaling to prevent underflow was discussed, as was the block equivalent Markov model, which can be estimated more efficiently than the general Markov model.

The chapter concluded with three examples summarizing techniques for estimating both Markov models and block-equivalent Markov models. The first example focuses on the generation of an error vector given a Markov model. Validation of the result is based on the probability of state occupancy and the probability of error. The second example uses the Baum-Welch algorithm to estimate a channel model. Validation includes a comparison of the run-length statistic Pr{0m|1} for both the original error vector and the error vector produced by the estimated model. The third example is similar to the second example except that a block equivalent Markov model is used. These three examples should serve as a guide to the use of the tools developed in this chapter.

Further Reading

The development and use of Markov models for burst error channels (channels with memory) are currently an active area of research and new papers appear frequently. The interested reader may find recent articles on this topic in IEEE Transactions on Communications, IEEE Transactions on Vehicular Technology, and other related journals and conference proceedings. Markov models are also being developed for burst errors at the higher layers of communication networks. For example, packet errors at the transport layer in a network. The book by Turin [1] is recommended as a comprehensive treatment of the subject.

References

Problems

15.1

Assume that a binary symmetric channel is defined by the error probability p = 10−2. Using the technique defined by (15.2), with pk independent of k, simulate the transmission of N symbols through the channel for N = 100, N = 1,000 and N = 10,000. In each case count the number of errors that actually occur and calculate the BER resulting from the simulation. Repeat this experiment ten times for each value of N. What do you conclude?

15.2

Rework Example 15.1 assuming that

Problems

15.3

Repeat the preceding problem assuming that the initial state probability distribution

Problems

and the state transition matrix

Problems

15.4

A Markov process has the state transition matrix

Problems

Determine the steady-state distribution Πss using the technique illustrated in Example 15.1. Verify the result by raising A to a sufficiently high power.

15.5

We know that Πss can be determined by raising the state transition probability matrix to a high power. As yet another method for solving this problem, recognize that Πss = ΠssA implies that Πss is an eignvector of A. Using this observation, develop a MATLAB program for determining Πss given the state transition matrix A. Apply this technique to

Problems

15.6

Develop a MATLAB program to generate a Markov sequence based on the state transition matrix

Problems

Assuming that the system is initially in state 1, generate 200 state transitions and develop a figure that illustrates the state of the channel for each time step. Next generate 40,000 state transitions and, from the simulation results, compute the steady-state probabilities of each state. Verify the result by computing Ak for sufficiently large k.

15.7

Consider the channel model illustrated in Figure 15.16. The probability of error given that the channel is in the good state is 0 and the probability of error given that the channel is in the bad state is 1.

Channel model for Problem 15.7.

Figure 15.16. Channel model for Problem 15.7.

  1. Generate a 10,000-bit sequence with p = 0.01 and q = 0.99. Construct a histogram of the burst length and the inter-error gap (error-free run) distributions.

  2. Discuss how p and q can be estimated emperically from a histogram of (simulated) burst length and inter-error gap distributions. Estimate p and q.

  3. Use the Baum-Welch algorithm to estimate p and q. Compare with the empirical estimates obtained from the histograms.

15.8

Develop a figure similar to Figure 15.9 for the Fritchman model

Channel model for Problem 15.7.

15.9

Generalize c15_hmmtest.m for a N-state model.

15.10

Rework Example 15.4 by estimating a two-state model. Compare the estimated model with the original two-state model. Plot the estimated parameters as a function of the number of iterations up to 20 iterations. Discuss the convergence.

15.11

Rework Example 15.4 using, as initial conditions to the Baum-Welch algorithm, the matrices A and B that were used to generate the original error vector. Discuss the convergence.

15.12

Rework Example 15.4 with N = 4, 3, and 2, where N is the number of states in the stimated model. Compare the likelihood metric after convergence is reached. Use as input to the Baum-Welch algorithm the data generated by the two-state model used in Example 15.2. Work this problem several times using different initial conditions for the Baum-Welch algorithm and discuss the results.

15.13

Modify the Baum-Welch algorithm given in Appendix A (c15_bwa.m) so that one does not have to enter initial guesses for the matrices A, B, and π. Make the elements of A and B random but reasonable. (For example, the on-diagonal elements of the A matrix are typically larger than the off-diagonal terms.) For the initial π matrix make all values equal. Test your resulting program by reworking Example 15.4 several times. What do you observe?

15.14

Modify the MATLAB code for Example 15.4 so that the CPU time spent in the model estimation algorithm can be measured. After this is accomplished, rework Example 15.4 for error vectors having length N = 20,000 and 100,000. Compare the times required to estimate the model. Also compare the error probabilities.

15.15

Rework Example 15.5 assuming two good states and two bad states.

15.16

Assume that a discrete channel is defined by the state transition matrix

Channel model for Problem 15.7.

Determine an equivalent block diagonal Markov model having two good states and two bad states. (Note: This channel model represents another test case presented by Sivaprakasam and Shanmugan [6]. The student should compare the result with that found by Sivaprakasam and Shanmugan.)

Appendix A: Error Vector Generation

Program: c15_errvector.m

% File: c15_errvector.m
disp(' ')
disp('Default values are:')
N = 20000                                       % default N
A = [0.8 0.1 0.1; 0.2 0.6 0.2; 0.02 0.08 0.90]  % default A
B = [0.999 0.95 0.99; 0.001 0.05 0.01]          % default B
disp(' ')
disp('Accept default values?')
dtype = input('Enter y for yes or n for no > ','s'),
if dtype == 'n'
    N = input(' Enter N, the number of points to be generated > '),
    A = input(' Enter A, the state transition matrix > '),
    B = input(' Enter B, the error distribution matrix > '),
end
state = 1; % initial state
total_states = size(A,1);
out = zeros(1,N); % initialize error vector
state_seq = zeros(1,N); % initialize state sequence
h = waitbar(0,'Calculating Error Vector'),
%
ran_b = rand(1);                      % get random number
if ran_b>B(1,state)                   % test for error
    out(1) = 1;                       % record error
end
state_seq(1) = state;                 % record state
for t=2:N
    u1 = rand(1);                     % get random number
    cum_sum = [0 cumsum(A(state,:))];
    for i=1:total_states              % loop to determine new state
        if u1 >= cum_sum(i) & u1 < cum_sum(i+1);
            state = i;                % assign new state
        end
    end
    state_seq(t) = state;             % new record state
    u2 = rand(1);                     % get random number
   if u2 > B(1,state)}
        out(t) = 1;                   % record error
    end
 waitbar(t/N)
end
close(h)
% End of script file.

Program: c15_hmmtest.m

Note: This program is for a three-state model. It is easily generalized for an arbitrary number of states.

% File: c15_hmmtest.m
pe = sum(out)/N;
state_sum = zeros(1,total_states);
for k=1:N
    if state_seq(k)==1
        state_sum(1)=state_sum(1)+1;
    end
    if state_seq(k)==2
       state_sum(2)=state_sum(2)+1;
    end
    if state_seq(k)==3
        state_sum(3)=state_sum(3)+1;
    end
end
a = ['The probability of State 1 is ',num2str(state_sum(1)/N),'.'];
b = ['The probability of State 2 is ',num2str(state_sum(2)/N),'.'];
c = ['The probability of State 3 is ',num2str(state_sum(3)/N),'.'];
d = ['The error probability is ',num2str(pe),'.'];
disp('Simulation results:')
disp(a) % display probability of state 1
disp(b) % display probability of state 2
disp(c) % display probability of state 3
disp(d) % display error probability
% End script file.

Appendix B: The Baum-Welch Algorithm

% File: c15_bwa.m
function [p, pye, b] = c15_bwa(iterations,states,out);
len = length(out);
p = input('Enter the initial state transition matrix P > '),
pye = input('Enter the initial state probability vector pye >'),
b = input('Enter the initial output symbol probability matrix B > '),
alpha=zeros(len,states); beta=zeros(len,states);
eta=zeros(states,states); gamma=zeros(1,states); scale=zeros(len,1);
log_likelihood = zeros(1,iterations);
iplot = 1;                                 % plot switch
%
p                                          % display initial p
pye                                        % display initial pye
%
pye_rec = zeros(states,1);
pye_rec(:,1) = pye';
sum_gamma = 0;
sum_eta = 0;
%
for cycle = 1:iterations
    cycle                                  % display iteration index
    %
    % alpha generation
    %
    alpha(1,:) = pye.*b(1,:);
    scale(1) = sum(alpha(1,:));
    alpha(1,:) = alpha(1,:)/scale(1);
    for t = 2:len
        alpha(t,:) = (alpha(t-1,:)*p).*b(out(t)+1,:);
        scale(t) = sum(alpha(t,:));
        alpha(t,:) = alpha(t,:)/scale(t);
    end
    %
    % beta generation
    %
    beta(len,:) = 1/scale(len);
    for t = len-1:-1:1
        beta(t,:) = (beta(t+1,:).*b(out(t+1)+1,:))*(p')/scale(t);
    end
    %
    % eta generation
    %
    sum_eta = zeros(states);
    for t = 1:len-1
        for i = 1:states
            eta(i,:) = ...
                ((alpha(t,i)*(p(i,:).*(b(out(t+1)+1,:))). ...
                    *beta(t+1,:)))...
                /sum(alpha(t,:).*beta(t,:));
        end
        sum_eta = sum_eta + eta;
    end
    %
    % gamma generation
    %
    gamma_sum = zeros(1,states);
    for t = 1:len
        gamma_sum = gamma_sum + alpha(t,:).*beta(t,:);
    end
    %
    % calculate and display the log_likelihood function
    %
    loglikelihood = sum(log10(scale));
    log_likelihood(cycle) = loglikelihood;
    loglikelihood                                   % display result
    %
    % Re-estimation of the intial state probability vector pye
    %
    pye(1,:) = alpha(1,:).*beta(1,:)/sum(alpha(1,:).*beta(1,:));
    %
    pye % display pye
    %
    pye_rec(:,cycle+1) = pye';                     % Save for plot
    %
    % Re-estimation of the state transition matrix P
    %
    for i = 1:states
        for j = 1:states
            p_estimate(i,j) = ...
                sum_eta(i,j)/(gamma_sum(i)-alpha(len,i). ...
                    *beta(len,i)...
                /(sum(alpha(len,:).*beta(len,:))));
            end
        p_estimate(i,:) = p_estimate(i,:)/sum(p_estimate(i,:));
    end
    %
    p = p_estimate                               % display p
    %
    % Re-estimation of output symbol probability matrix B
    %
    out_0 = find(out == 0);
    out_1 = find(out == 1);
    sum_0 = zeros(1,states);
    sum_1 = zeros(1,states);
    for i = 1:length(out_0)
        sum_0 = sum_0 + alpha(out_0(i),:).*beta(out_0(i),:)...
            /sum(alpha(out_0(i),:).*beta(out_0(i),:));
    end
    for i = 1:length(out_1)
        sum_1 = sum_1 + alpha(out_1(i),:).*beta(out_1(i),:)...
            /sum(alpha(out_1(i),:).*beta(out_1(i),:));
    end
    for i = 1:states
        for j = 1:2
            if j == 1
                b(j,i) = sum_0(i)/gamma_sum(i);
            end
            if j == 2
                b(j,i) = sum_1(i)/gamma_sum(i);
            end
        end
    end
    for i = 1:states
        b(:,i) = b(:,i)/sum(b(:,i));
    end
    b                                            % display b
end
if iplot==1
    plot(1:iterations,log_likelihood)
    xlabel('iterations')
    ylabel('log likelihood')
end
% End of function file.

Appendix C: The Semi-Hidden Markov Model

% File: c15_semiMarkov.m
function [A_matrix, pi_est] = c15_semiMarkov(runlength,cycles,...
    partition)
% runlength = runlength code
% cycles = number of iterations
% partition = 1 by 2 vector = [good states, bad states]
%
[symbols len_symbols] = size(runlength); % gets size of runlength
                                         %     vector
m = runlength(1,:);      % the first bit received is error free
u = runlength(2,:);      % arbitrary number of elements
C = len_symbols;         % total length of runlength vector
%
A = cell(length(partition));   % a 2x2 array only 2 symbols
pye = rand(1,sum(partition));  % initialize the initial state vector
pye = pye(1,:)/sum(pye(1,:));  % normalize
pi_u1 = pye(1:partition(1));
%
% initialize A matrix
%
A = cell(partition);      % allocate memeory for the A matrix
A00 = diag(1 - abs(randn(partition(1),1)/1000));   % initialize A00
                                                   %     matrix
A10 = rand(partition(2),partition(1));             % initialize A10
                                                   %     matrix
A01 = rand(partition(1),partition(2));             % initialize the
                                                   %     A01 matrix
A11 = diag(1 - abs(randn(partition(2),1)/1000));   % initialize A11
                                                   %     matrix
A{1,1} = A00; A{1,2} = A01; A{2,1} = A10; A{2,2} = A11;
A_matrix = [A{1,1} A{1,2};A{2,1} A{2,2}];          % cell-array in
                                                   %     matrix form
%
for i = 1:sum(partition)
    A_matrix(i,:) = A_matrix(i,:)/sum(A_matrix(i,:)); % normalize the
                                                      %     A matrix
end
A_matrix;
%
A{1} = A_matrix(1:partition(1),1:partition(1));
A{2} = A_matrix(partition(1)+1:partition(1)+partition(2),1:...
    partition(1));
A{3} = A_matrix(1:partition(1),partition(1)+1:partition(1)...
    +partition(2));
A{4} = A_matrix(partition(1)+1:partition(1)+...
    partition(2),partition(1)+1:partition(1)+partition(2));
%
for iterations = 1:cycles
    %
    % alpha generation
    %
    alpha{1} = pi_u1*(A{u(1)+1,u(1)+1}.^(m(1)-1)); scale(1)= ...
        sum(alpha{1});
    alpha{1}= alpha{1}/scale(1); % normalization
    for c = 2:C
        alpha{c}= alpha{c-1}*A{u(c-1)+1,u(c)+1}*A{u(c)+1,...
            u(c)+1}^(m(c)-1);
        scale(c)= sum(alpha{c}); % scaling factor
        alpha{c}= alpha{c}/scale(c); % normalize alpha
    end;
    %
    % beta generation
    %
    beta{C}= ones(partition(u(C)+1),1)/scale(C); % last element of
                                                 %     beta
    for(c= C-1:-1:1)
        beta{c}= A{u(c)+1,u(c+1)+1}*(A{u(c+1)+1,u(c+1)+1}...
            ^(m(c+1)-1))*beta{c+1}/scale(c);
    end;
    %
    % gamma generation
    %
    Gamma{1} = alpha{1}.*beta{1}';
    Gamma{2} = alpha{2}.*beta{2}';
    %
    sum_Tii_00s = diag(zeros(partition(1),1));      % initialization
                                                    %     of A00
    sum_Tii_11s = diag(zeros(partition(2),1));      % initialization
                                                    %     of A11
    sum_Tij_01s = zeros(partition(1),partition(2)); % initialization
                                                    %     of A01
    sum_Tij_10s = zeros(partition(2),partition(1)); % initialization
                                                    %     of A10
    %
    % re-estimation for the A00 matrix
    %
    for c=1:2:C-1
        if ( c == 1)
            Tii_00s = diag((m(1)-1)*(pi_u1)'.*(diag(A{u(1)+1,...
                u(1)+1}).^(m(1)-1)).*beta{1});
        else
            Tii_00s = diag((m(c)-1)*((alpha{c-1}*A{u(c-1)+1,...
                u(c)+1})'.*(diag(A{u(c)+1,u(c)+1}^(m(c)-1)))). ...
                    *beta{c});
        end
        sum_Tii_00s = sum_Tii_00s + Tii_00s;        % sum elements
                                                    %     of A00
    end
    %
    % re-estimation for the A11 matrix
    %
    for c=2:2:C-1
        Tii_11s = diag((m(c)-1)*((alpha{c-1}*A{u(c-1)+1,u(c)+1})'...
            .*(diag(A{u(c)+1,u(c)+1}^(m(c)-1)))).*beta{c});
        sum_Tii_11s = sum_Tii_11s + Tii_11s;        % sum elements
                                                    %     of A11
    end
    %
    % re-estimation for the A01 matrix
    %
    for c=1:2:C-1
        Tij_01s = (alpha{c}'*((A{u(c+1)+1,u(c+1)+1}^(m(c+1)-1))...
            *beta{c+1})').*A{u(c)+1,u(c+1)+1};
        sum_Tij_01s = sum_Tij_01s + Tij_01s;         % sum elements
                                                     %     of A01
    end
    %
    % re-estimation for the A10 matrix
    %
    for c=2:2:C-1
    Tij_10s = (alpha{c}'*((A{u(c+1)+1,u(c+1)+1}^(m(c+1)-1))...
            *beta{c+1})').*A{u(c)+1,u(c+1)+1};
        sum_Tij_10s = sum_Tij_10s + Tij_10s; % sums elements of A10
    end
    %
    A_matrix = [sum_Tii_00s sum_Tij_01s; sum_Tij_10s sum_Tii_11s];
    %
    for i = 1:sum(partition)
        A_matrix(i,:) = A_matrix(i,:)/sum(A_matrix(i,:)); % normalize
                                                          % A
    end
    %
    A{1} = A_matrix(1:partition(1),1:partition(1));
    A{2} = A_matrix(partition(1)+1:partition(1)+partition(2),1:...
        partition(1));
    A{3} = A_matrix(1:partition(1),partition(1)+1:partition(1)...
        +partition(2));
    A{4} = A_matrix(partition(1)+1:partition(1)+partition(2),...
        partition(1)+1:partition(1)+partition(2));
    %
    pi_est = [Gamma{1} Gamma{2}]; % re-estimated initial state vector
    pi_est = pi_est/sum(pi_est); % normalized initial state vector
    pi_rec(iterations,:) = pi_est;
    pi_u1 = pi_est(1:partition(1));
    iterations                        % display current iteration
    A_matrix                          % display estimated A matrix
end
% End of function file.

Appendix D: Run-Length Code Generation

% File: c15_seglength.m
function runcode=c15_seglength(errvect)
% Produces a two-row matrix of error intervals and error-free
% intervals. Row 1 specifies the interval length and row 2
% specifies the interval class (error(1) or no error(0)).
%
len = length(errvect);                  % length of input vector
j = 1;                                  % initialize index of m
count = 1;                              % initialize counter
for i=1:(len-1)
    if errvect(i+1) == errvect(i);      % compare elements
        count = count+1;                % on match increment count
    else
        m(j) = count;                   % record count
        j = j+1;                        % increment index of m
        count = 1;                      % reset counter
    end
end
%
runcode = zeros(2,length(m));           % allocate memory
runcode(1,:) = m;                       % assign counts to row 1
%
if errvect(1)==0
    runcode(2,2:2:length(m)) = 1;       % even index error count
else
    runcode(2,1:2:length(m)) = 1;       % odd index error count
end
% End of function file.

Appendix E: Determination of Error-Free Distribution

In this appendix two MATLAB programs are given for determing a plotting the error-free distribution. The first program is designed for a single data file, and the second program is for the comparison of two data files.

c15_intervals1.m

% File: c15_intervals1.m
function [] = c15_Intervals1(r1);
start = find(r1(2,:)==0);                       % index of first
                                                %     interval
maxLength_1 = max(r1(1,start(1):2:length(r1))); % maximum length
                                                %     of interval
interval_1 = r1(1,start(1):2:length(r1));       % get the intervals
for i = 1:maxLength_1
    rec_1(i) = length(find(interval_1>=i));     % record the
                                                %     intervals
end
int1out = rec_1/max(rec_1);
figure;
plot(1:maxLength_1,int1out)
grid;
ylabel('Pr(0m|1)'),
xlabel('Length of intervals m'),
% End of function file.

c15_intervals2.m

% File: c15_intervals2.m
function [] = c15_intervals2(r1,r2);
start1 = find(r1(2,:)==0);                       % index of first
                                                 %     interval
maxLength_1 = max(r1(1,start1(1):2:length(r1))); % maximum length
                                                 %     of interval
interval_1 = r1(1,start1(1):2:length(r1));       % get the
                                                 %     intervals
for i = 1:maxLength_1
    rec_1(i) = length(find(interval_1>=i));      % record the
                                                 %     intervals
end
start2 = find(r2(2,:)==0);                       % index of first
                                                 %     interval
maxLength_2 = max(r2(1,start2(1):2:length(r2))); % maximum length
                                                 %     of interval
interval_2 = r2(1,start2(1):2:length(r2));       % get the
                                                 %     intervals
for i = 1:maxLength_2
    rec_2(i) = length(find(interval_2>=i));      % record the
                                                 %     intervals
end
subplot(2,1,1)
plot(1:maxLength_1,rec_1/max(rec_1))
v = axis;
grid;
ylabel('Pr(0m|1)'),
xlabel('Original sequence - Length of intervals m'),
subplot(2,1,2)
plot(1:maxLength_2,rec_2/max(rec_2))
axis([v])
grid;
ylabel('Pr(0m|1)'),
xlabel('Regenerated sequence - Length of intervals m'),
% End of function file.


[1] Recall that the Hamming weight of a binary vector, consisting only of the elements “0” and “1,” is equal to the number of binary ones in the vector.

[2] Note that in this formulation we are not using t as a continuous variable but as a time index that ranges over a set of integers.

[3] We have interest only in those process for which the state distribution converges to a steady-state value.

[4] It should be mentioned that rerunning this program may result in estimates for A and B that are different from the results given here, even though the same parameters and initial conditions are used. The reason for this lies in the fact that the error vector, upon which the estimates of A and B are based, is a sample function of a random process. In addition, the rows of  and the columns of may not sum to one when the elements of  and are represented with finite precision.

[5] Note that A represents a physical channel and does not have a block diagonal form. The estimated state transition matrix  will have a block diagonal form.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset