In the examples we have seen so far, we have mainly focused on variable-based models, that is, models whose structure is defined in terms of the variables they represent. In our restaurant example, for instance, we could use the same network structure for multiple restaurants because they all share the same variables; only the states of those variables differed from restaurant to restaurant.
Let's take a more complex example. Say we want to model the state of a robot traveling over some trajectory. In this case, the states of the variables change with time, and the states of some variables at an instance $t$ might depend on the state of the robot at instance $t-1$. Clearly, we can't model such a situation with a variable-based model. For such problems, we generally use dynamic Bayesian networks (DBNs).
Before discussing the simplifying assumptions that DBNs make, let's first see the notation that we are going to use in the case of DBNs. As DBNs are defined over a range of time, with each time instance having the same variables, we will use $X^{(t)}$ to represent the instantiation of a random variable $X$ at a time instance $t$. The variable $X$ is now known as a template variable, as it can't take any values itself. This template variable is instantiated at various time instances, and at each instance $t$, the variable $X^{(t)}$ can take values from $Val(X)$. Also, for a set of random variables $X$, we use $X^{(t_1:t_2)}$, where $t_1 \le t_2$, to denote the set of variables $\{X^{(t)} : t \in [t_1, t_2]\}$. Similarly, we use the notation $x^{(t_1:t_2)}$ to denote the assignments to this set of variables.
As we can see, over any considerable time span the number of variables becomes huge, and hence our joint distribution over such trajectories will be very complex. Therefore, we make some assumptions to simplify the distribution.
The first simplifying assumption that we make is to have a discrete timeline rather than a continuous one. So, the measurements of the states of the random variables are taken at some predetermined time interval $\Delta$. With this assumption, the random variable $X^{(t)}$ represents the value of the variable $X$ at the time instance $t \cdot \Delta$.
Using this assumption, we can now write the distribution over the variables over a time period $0$ to $T$ as follows:

$$P\left(X^{(0:T)}\right) = P\left(X^{(0)}\right) \prod_{t=0}^{T-1} P\left(X^{(t+1)} \mid X^{(0:t)}\right)$$
Therefore, the distribution over trajectories is the product of the conditional distributions over the variables at each time instance, given all the variables at the past instances.
The second assumption that we make is as follows:

$$\left(X^{(t+1)} \perp X^{(0:t-1)} \mid X^{(t)}\right)$$
Putting this in simple words, the variables at time $t+1$ can directly depend only on the variables at time $t$ and are thus independent of all the variables $X^{(t')}$ for $t' < t$. Any system that satisfies this condition is known as Markovian. This assumption reduces the earlier joint distribution equation to the following:

$$P\left(X^{(0:T)}\right) = P\left(X^{(0)}\right) \prod_{t=0}^{T-1} P\left(X^{(t+1)} \mid X^{(t)}\right)$$
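To see what this factorization buys us computationally, here is a minimal sketch (plain Python, not pgmpy) that computes the probability of a full trajectory under the Markov assumption. The two-state weather model, its probabilities, and the function name are hypothetical, chosen only for illustration:

```python
# P(X^(0:T)) = P(X^(0)) * prod over t of P(X^(t+1) | X^(t)).
# The states and probabilities below are illustrative placeholders.

initial = {'rain': 0.3, 'sun': 0.7}          # P(X^(0))
transition = {                                # P(X^(t+1) | X^(t))
    'rain': {'rain': 0.6, 'sun': 0.4},
    'sun':  {'rain': 0.2, 'sun': 0.8},
}

def trajectory_probability(states):
    """Probability of a trajectory x^(0), ..., x^(T) under the
    Markov assumption: one initial factor, then one transition
    factor per consecutive pair of states."""
    prob = initial[states[0]]
    for prev, curr in zip(states, states[1:]):
        prob *= transition[prev][curr]
    return prob

print(trajectory_probability(['sun', 'sun', 'rain']))  # 0.7 * 0.8 * 0.2
```

Note that each additional time step multiplies in a single conditional factor, so the representation stays compact no matter how long the trajectory grows.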
In other words, this assumption also constrains our network, such that the variables in $X^{(t+1)}$ can't have incoming edges from any variable in $X^{(0:t-1)}$.
However, the problem with this assumption is that it may not hold in all cases. Let's take an example to show this. Suppose we want to model the location of a car, predicting its future location given observations about its past. Let's assume that we have only two random variables, $\{L, O\}$, with $L$ representing the location of the car and $O$ representing the observed location. Here, we might think that our model satisfies the Markov assumption, as the location at time $t+1$ depends only on the location at time $t$ and is independent of the locations at $t'$ for $t' < t$. However, this intuition might turn out to be wrong, because we don't know the velocity or the direction of travel of the car. Had we known the previous locations of the car, we could easily have estimated both the direction and the velocity. So, in such cases, to bring our model closer to satisfying the Markov assumption, we can add the variables direction and velocity to our model. Now, at each instance of time, if we know the velocity and direction of motion of the car, we can predict the next instance using just the values of the previous instance. To account for changes in the velocity and direction, we can also add variables such as weather conditions and road conditions. With the addition of these extra variables, our model is now close to being Markovian.
The Markov assumption and the independence assumption that we saw in the previous section allow us to represent the joint distribution very compactly, even over infinite trajectories. All we need to define is the distribution over the initial state, $P\left(X^{(0)}\right)$, and a transition model, $P\left(X' \mid X\right)$. We can represent the preceding car example using a network as shown in Fig 7.4, Fig 7.5, and Fig 7.6.
The following flowchart depicts the network structure at time t = 0:
The following figure is the flowchart that shows the unrolled DBN over a two-time slice:
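The claim that an initial distribution plus a transition model suffice, even for arbitrarily long trajectories, can be illustrated with a small forward-sampling sketch. The two-state car model below (states, probabilities, and function names) is entirely hypothetical and unrelated to any pgmpy API:

```python
import random

# Forward sampling from a Markovian model: given only P(X^(0)) and
# the transition model P(X' | X), we can generate a trajectory of
# any length. All values here are illustrative placeholders.

initial = {'slow': 0.5, 'fast': 0.5}                 # P(X^(0))
transition = {'slow': {'slow': 0.7, 'fast': 0.3},    # P(X' | X)
              'fast': {'slow': 0.2, 'fast': 0.8}}

def sample(dist):
    """Draw one value from a {value: probability} distribution."""
    r, acc = random.random(), 0.0
    for value, p in dist.items():
        acc += p
        if r < acc:
            return value
    return value  # guard against floating-point round-off

def sample_trajectory(T):
    """Sample states X^(0), ..., X^(T), one slice at a time."""
    states = [sample(initial)]
    for _ in range(T):
        states.append(sample(transition[states[-1]]))
    return states

print(sample_trajectory(4))
```

Each new slice is generated from the previous one alone, which is exactly what the Markov assumption licenses.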
Also, we define the interface variables $X_I$ as the variables whose values at time $t$ have a direct effect on the variables at time $t+1$. Therefore, only the variables in $X_I^{(t)}$ can be parents of the variables in $X^{(t+1)}$. The preceding car example is an example of a two-time slice Bayesian network (2-TBN). We define a 2-TBN for a process over $X$ as a conditional Bayesian network over $X'$, given $X_I$, where $X_I \subseteq X$ is a set of interface variables. In our example, all the variables are interface variables, except for $O$.
Overall, this 2-TBN represents the following conditional distribution:

$$P\left(X' \mid X_I\right) = \prod_{i} P\left(X_i' \mid \mathrm{Pa}_{X_i'}\right)$$
For each template variable $X_i$, the CPD $P\left(X_i' \mid \mathrm{Pa}_{X_i'}\right)$ is known as the template factor. This template factor is instantiated multiple times in the network, once for each time instance $t > 0$.
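The idea of instantiating a template for every slice can be sketched in a few lines of plain Python. The variable names loosely follow the car example ($L$: location, $V$: velocity, $D$: direction, $O$: observation), but the edge lists are illustrative assumptions, not the book's exact network:

```python
# A 2-TBN template: inter-slice edges connect a parent in slice t to
# a child in slice t+1; intra-slice edges are copied into every slice.
# Both edge lists are hypothetical, loosely based on the car example.
inter_edges = [('L', 'L'), ('V', 'L'), ('D', 'D'), ('V', 'V')]
intra_edges = [('L', 'O')]

def unroll(T):
    """Instantiate the template over slices 0..T, producing the edge
    list of a plain (unrolled) DBN. Each node is a (name, t) pair."""
    edges = []
    for t in range(T + 1):
        edges += [((u, t), (v, t)) for u, v in intra_edges]
        if t < T:
            edges += [((u, t), (v, t + 1)) for u, v in inter_edges]
    return edges

print(unroll(1))
```

In the same way, each template CPD would be copied once per slice, so the unrolled network over any horizon is fully determined by the two-slice template.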
Currently, none of the Python libraries for PGMs has a concrete implementation for working with DBNs. However, the pgmpy developers are currently working on it, so it should soon be available in pgmpy.