MCMC for probabilistic graphical models in R

In fact, this section could be the title of book. As a matter of fact, there are several books entirely devoted to this specific topic. Research in this field is extremely active, with many new algorithms coming every year.

There are numerous packages implementing MCMC algorithms for different types of algorithms. There are also more generic frameworks such as the famous BUGS (and its open source implementation OpenBUGS) and a new, even more powerful framework called Stan. Historically, BUGS was the first framework to popularize MCMC inference in Bayesian statistics and literally led to a revolution in this field, as everyone could benefit from Bayesian statistics right out of the box.

Making an introduction to each of them and showing all the possibilities of MCMC for a few specific graphical models would require another book as big as this one. In this section, we will therefore focus on a programming environment available in R. In fact, it works also with C++, Python, Matlab, Julia, Stata, and even on the command line! It allows the implementation of all sorts of Bayesian models, as we have seen so far. Stan mainly uses MCMC algorithms to perform inferences.

Installing Stan and RStan

This procedure is detailed on the following web page: https://github.com/stan-dev/rstan/wiki/Rstan-Getting-Started.

Thus we will just recall the basic steps in order to install Stan and RStan:

Sys.setenv(MAKEFLAGS = "-j4")
install.packages("rstan", dependencies = TRUE)

Prepare yourself for a long installation, as Stan will need numerous packages. Finally, depending on your R installation, you might have to restart R, but in general it is not necessary.

Finally, load RStan like any other package:

library(rstan)

You should see an introductory message such as this, telling you Stan is ready:

Loading required package: ggplot2
rstan (Version 2.9.0, packaged: 2016-01-05 16:17:47 UTC, GitRev: 05c3d0058b6a)
For execution on a local, multicore CPU with excess RAM we recommend calling
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

A simple example in RStan

We show here some basic possibilities in RStan. The reader is encouraged to read more about it and try more examples.

RStan is based on a probabilistic programming language used to describe the Bayesian models. For example, we can make a simple univariate Gaussian model:

parameters
{
        real y;
}

model
{
        y ~ normal(0,1);
}

This code is Stan code, not R code. Then, in R, we can simulate this model, by doing:

fit = stan(file='example.stan')

The model will be simulated using an MCMC algorithm and the results are displayed using:

print(fit)
Inference for Stan model: example.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
                                                                     
      mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
y     0.00    0.03 0.98 -1.93 -0.63 -0.01  0.64  1.98  1191    1
lp__ -0.48    0.02 0.69 -2.35 -0.63 -0.20 -0.05  0.00  1718    1
                                                                                  
Samples were drawn using NUTS(diag_e)
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1).

It's interesting to see that this package has indeed a warm-up procedure as we saw before and it tries to compute an estimate of the effective sample size for a simple Gaussian. The value of 1191 (the n_eff column) is not too far from the examples we saw in the previous section.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset