Rejection sampling

Suppose we want to sample from a distribution that is not a simple one. Let's call this distribution p(x) and let's assume we can evaluate p(x) for any given value x, up to a normalizing constant Z, that is:

p(x) = p̃(x) / Z

where p̃(x) is easy to evaluate but the normalizing constant Z is unknown.

In this context, p(x) is too complex to sample from but we have another simpler distribution q(x) from which we can draw samples. Next, we assume there exists a constant k such that kq(x) ≥ p̃(x) for all values of x. The function kq(x) is called the comparison function, as shown in the following figure:

[Figure: the comparison function kq(x) lying everywhere above the unnormalized target p̃(x)]

The distribution p(x) in this figure has been generated from a simple mixture of two Gaussians:

0.6*dnorm(x,1)+0.4*dnorm(x,5)
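For reference, a curve like the one in the figure can be reproduced with base R graphics (a minimal sketch; the plotting range is our choice):

curve(0.6*dnorm(x, 1) + 0.4*dnorm(x, 5), from = -3, to = 9,
      ylab = "density") # two-mode target density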

The rejection sampling algorithm is based on the following idea:

  • Draw a sample z0 from q(z), the proposal distribution
  • Draw a second sample u0 from a uniform distribution on [0, kq(z0)]
  • If u0 > p̃(z0), then the pair is rejected; otherwise z0 is accepted

In the following figure, the pair (z0, u0) is rejected if it lies in the gray area. The accepted pairs are uniformly distributed under the curve of p̃(z), and therefore the z values are distributed according to p(z):

[Figure: the gray rejection region between kq(z) and p̃(z); pairs (z0, u0) falling in it are rejected]

The probability of such a pair being accepted is:

p(accept) = ∫ [p̃(z) / (k q(z))] q(z) dz = (1/k) ∫ p̃(z) dz = Z/k

Because this probability is inversely proportional to k, the constant k should be chosen as small as possible, which means the proposal distribution must be as close as possible to the real distribution; otherwise the algorithm will converge very slowly and be practically useless.

This algorithm is very simple and easy to implement. However, it suffers from a severe problem related to the dimensionality of the problem: in a probabilistic graphical model, the dimension quickly becomes very large. Rejection sampling is usually a good idea in one or two dimensions, but the rejection rate grows exponentially with the number of dimensions. Nevertheless, it can be used as a subroutine in more advanced algorithms to generate samples from simple distributions (for example, at the level of a single node if needed).
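As an illustration (a sketch with assumed parameters, not an example from this chapter), suppose the target is a d-dimensional standard Gaussian and the proposal is a Gaussian that is only 10% wider in each dimension. The smallest valid constant is then k = sigma^d, so the acceptance rate 1/k collapses exponentially with d:

sigma <- 1.1                  # proposal only 10% wider than the target
d <- c(1, 2, 10, 50, 100)     # dimensions
data.frame(dimension = d, acceptance.rate = sigma^(-d))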

An implementation in R

We consider the problem of sampling from a mixture of Gaussians using a normal distribution as the proposal. The mixture of Gaussians and the proposal distribution are shown in the following figure, where the proposal distribution, in red, has been scaled by k=3.1.

The mixture of Gaussians is in black and has two modes:

[Figure: the target mixture of Gaussians (black, two modes) and the scaled proposal kq(x) (red), with k=3.1]

In R, we define the proposal and target distributions as follows:

q  <- function(x) dnorm(x, 0, 0.5)  # proposal density
rq <- function() rnorm(1, 0, 0.5)   # sampler for the proposal

p <- function(x) 0.6*dnorm(x, 0, 0.1) + 0.4*dnorm(x, 0.6, 0.2)  # target density

The parameters are arbitrary. We have a proposal distribution q centered on 0 with a standard deviation of 0.5. The target distribution is a mixture of Gaussians with two components.
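With these definitions, the previous figure can be reproduced as follows (a sketch; the grid is our choice):

t <- seq(-2, 2, 0.01)
plot(t, 3.1*q(t), type='l', col=2, ylab='density') # scaled proposal kq(t) in red
lines(t, p(t))                                     # target mixture in black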

The rejection algorithm is as follows:

rejection <- function(N,k,p,q,rq)
{
  accept <- logical(N)
  x <- numeric(N)

  for(i in 1:N)
  {
    z0 <- rq() # draw one point from the proposal distribution
    u0 <- runif(1,0,1) # draw one point from the uniform on [0,1]

    if(u0 < p(z0)/(k*q(z0))) # rejection test: accept with probability p(z0)/(k*q(z0))
      accept[i] <- TRUE
    else accept[i] <- FALSE

    x[i] <- z0
  }

  data.frame(x=x,accept=accept)
}

The parameters are as follows:

  • N: This is the number of samples.
  • k: This is the coefficient for the proposal distribution.
  • p: This is the distribution to estimate. You must pass a function that takes one parameter.
  • q: This is the proposal distribution (with the same remark as earlier).
  • rq: This is a sampler for the proposal distribution.

This algorithm draws N samples and accepts or rejects each of them inside a for loop. The result is stored in a data.frame. We keep all the samples so that we can compare the accepted and rejected ones. The first column contains the samples, and the second column is a binary value indicating whether each sample has been accepted or not.

The algorithm works as described in the theoretical part:

  1. We first create two vectors, accept and x, for storing the results.
  2. We start a loop in which:
    1. We sample z0 from the proposal distribution.
    2. We sample u0 from a uniform on [0,1].
    3. We accept or reject the value and store the result.
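As a side note, because p and q are built on dnorm and are therefore vectorized, the same logic can be written without an explicit loop. The following rejection.vec is our own sketch, not part of the original code:

rejection.vec <- function(N,k,p,q,rq)
{
  x <- replicate(N, rq()) # N draws from the proposal
  u <- runif(N)           # N uniform draws on [0,1]
  data.frame(x=x, accept=(u < p(x)/(k*q(x)))) # vectorized rejection test
}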

Let's do a few experiments to understand the behavior of this important algorithm. In the experiments, in order for the reader to be able to reproduce exactly the same results, we will use a fixed random seed by doing:

set.seed(600)

Moreover, we will use a scaling factor k of 3.1 as:

k <- 3.1
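Before running the sampler, it is worth checking numerically that kq(x) ≥ p(x) everywhere. A quick check on a grid (the bounds are our choice) gives a maximum ratio of about 3.01, which is indeed below 3.1:

t <- seq(-2, 2, 0.001)
max(p(t)/q(t)) # about 3.01, so k = 3.1 is a valid bound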

So, the first experiment is to run the algorithm with 100 samples:

x <- rejection(100,k,p,q,rq)

The results are stored in the data.frame x and the head of this data.frame is as follows:

head(x)
            x accept
1 -0.56007075  FALSE
2 -0.18000011  FALSE
3 -0.07572593   TRUE
4 -0.72502107  FALSE
5 -0.60916359  FALSE
6  0.97963839  FALSE

We can see that not all the points have been accepted: in our example, only 47 points out of 100 were accepted. Looking at the histogram of the accepted values, we see that we are still far from the target distribution, which means that we need to run the algorithm for longer:

t <- seq(-2,2,0.01)
hist(x$x[x$accept],freq=F,breaks=200,col='grey')
lines(t,p(t),col=2,lwd=2)

[Figure: histogram of the accepted samples (100 draws) with the target density in red]

On this graph, we can see that the accepted samples cluster in the regions of high probability mass, but running with such a small number of samples is not enough. The red curve is our target distribution.

We now run the algorithm with 5,000 samples:

x <- rejection(5000,k,p,q,rq)
hist(x$x[x$accept],freq=F,breaks=200,col='grey')
lines(t,p(t),col=2,lwd=2)

In this second run, we expect to see a better concentration of the accepted samples in the regions of high probability of the target distribution.

The following graph shows the result:

[Figure: histogram of the accepted samples (5,000 draws) with the target density in red]

And indeed the graph looks better now. The histogram follows the true distribution but it is still not perfect. In fact, the number of accepted samples is still not that high:

sum(x$accept)
[1] 1581
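Since p is normalized here (Z = 1), the theoretical acceptance probability is 1/k. We can compare it with the empirical rate (a quick check):

mean(x$accept) # empirical acceptance rate: 1581/5000 = 0.3162
1/k            # theoretical acceptance rate: about 0.3226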

If we run the algorithm for longer, we will obtain a better sample set and approach the target distribution very closely.

So, now we run the algorithm with 50,000 samples. After running it, we find that 16,158 of them have been accepted. And the result is far better of course:

[Figure: histogram of the accepted samples (50,000 draws); both modes of the target are captured]

The two modes of the distribution have been correctly captured and the empirical distribution follows precisely the target distribution. This is at the expense of running the algorithm for longer.
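This run follows the same pattern as before (a sketch of the commands; the acceptance count depends on the state of the random seed):

x <- rejection(50000,k,p,q,rq)
sum(x$accept) # 16,158 accepted in the run reported above
hist(x$x[x$accept],freq=F,breaks=200,col='grey')
lines(t,p(t),col=2,lwd=2)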

If we draw the histogram of all the points sampled from the proposal distribution (accepted or not), they follow precisely the proposal distribution as expected:

hist(x$x,freq=F,breaks=200,col='grey')
lines(t,q(t),col=2,lwd=2)

[Figure: histogram of all sampled points with the proposal density in red]

Finally, it is also interesting to look at the behavior of this algorithm from the point of view of the number of accepted samples. We run a simple function, as follows:

N <- sapply(seq(1000, 50000, 1000),
            function(n) {
              x <- rejection(n, k, p, q, rq)
              sum(x$accept)
            })

And we plot the result with:

plot(N, type='o')

[Figure: number of accepted samples versus the number of draws, growing linearly]

The result is not surprising: the number of accepted samples grows linearly with the number of draws, because the acceptance rate is constant (approximately 1/k, as computed earlier). So running the algorithm for longer does improve the results, simply by increasing the number of accepted samples.
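As a sanity check, the slope of this line should be close to the acceptance rate 1/k (a quick sketch):

n <- seq(1000, 50000, 1000)
coef(lm(N ~ n))[2] # slope of the fitted line, close to 1/k (about 0.32)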

But the problem with rejection sampling is that we need to sample many points in order to have good results. Rejection sampling is still a good algorithm and can be used in many situations. In the next section we will explore an improvement on rejection sampling, called importance sampling, in which all the sample points are accepted.
