With Bayesian methods, we present an alternative approach to statistical inference. We first introduce Bayes theorem, the fundamental equation from which all Bayesian inference is derived.
A couple of definitions about probability are in order: P(A) denotes the probability that event A occurs, P(AB) denotes the joint probability that both events A and B occur, and P(A|B) denotes the conditional probability that A occurs, given that B has occurred.
We begin with the basic assumption, as follows:

P(AB) = P(A|B) * P(B)
The preceding equation relates the joint probability P(AB) to the conditional probability P(A|B) and what is also known as the marginal probability P(B). If we rewrite the equation, we have the expression for conditional probability as follows:

P(A|B) = P(AB) / P(B)
This is somewhat intuitive—that the probability of A given B is obtained by dividing the probability of both A and B occurring by the probability that B occurred. The idea is that B is given, so we divide by its probability. A more rigorous treatment of this equation can be found at http://bit.ly/1bCYXRd, which is titled Probability: Joint, Marginal and Conditional Probabilities.
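As a quick numeric illustration of this formula (the two-dice setup here is our own, not from the text), we can compute a conditional probability in Python by simply counting outcomes:

from itertools import product

# All 36 equally likely outcomes of rolling two fair dice
outcomes = list(product(range(1, 7), repeat=2))

A = {o for o in outcomes if o[0] == 6}          # the first die shows 6
B = {o for o in outcomes if o[0] + o[1] >= 10}  # the total is at least 10

p_B = len(B) / len(outcomes)                    # P(B)  = 6/36
p_AB = len(A & B) / len(outcomes)               # P(AB) = 3/36
print(p_AB / p_B)                               # P(A|B) = 0.5

Conditioning on B simply shrinks the sample space to the six outcomes in B, three of which also lie in A.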
Similarly, by symmetry, we have P(AB) = P(B|A) * P(A). Thus, P(A|B) * P(B) = P(B|A) * P(A). By dividing both sides of the expression by P(B) and assuming P(B) != 0, we obtain this:

P(A|B) = P(B|A) * P(A) / P(B)
The preceding equation is referred to as Bayes theorem, the bedrock for all of Bayesian statistical inference. In order to link Bayes theorem to inferential statistics, we will recast the equation into what is called the diachronic interpretation, as follows:

P(H|D) = P(D|H) * P(H) / P(D)
where H represents a hypothesis.
D represents an event that has already occurred, which we use in our statistical study; it is also referred to as the data.
Then, P(H) is the probability of our hypothesis before we observe the data. This is known as the prior probability. The use of prior probability is often touted as an advantage by Bayesian statisticians, since prior knowledge or previous results can be used as input for the current model, resulting in increased accuracy. For more information on this, refer to http://www.bayesian-inference.com/advantagesbayesian.
P(D) is the probability of obtaining the data that we observe regardless of the hypothesis. This is called the normalizing constant. The normalizing constant doesn't always need to be calculated explicitly, especially with many popular algorithms such as MCMC, which we will examine later in this chapter.
P(H|D) is the probability that the hypothesis is true, given the data that we observe. This is called the posterior.
P(D|H) is the probability of obtaining the data, given our hypothesis. This is called the likelihood.
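To make the diachronic update concrete, here is a minimal Python sketch; the function name and the dictionary of hypotheses are our own illustration, not from the text:

# Diachronic update: the posterior is proportional to prior * likelihood,
# normalized by P(D), which is summed over all competing hypotheses.
def update(priors, likelihoods):
    """priors[h] = P(H); likelihoods[h] = P(D|H); returns P(H|D) for each h."""
    unnorm = {h: priors[h] * likelihoods[h] for h in priors}
    p_data = sum(unnorm.values())        # P(D), the normalizing constant
    return {h: v / p_data for h, v in unnorm.items()}

# Two competing hypotheses with equal priors but different likelihoods
print(update({'H1': 0.5, 'H2': 0.5}, {'H1': 0.8, 'H2': 0.4}))
# {'H1': 0.666..., 'H2': 0.333...}

Note how P(D) is obtained by summing over all the hypotheses in the denominator, which is why it is called the normalizing constant.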
Thus, Bayesian statistics amounts to applying Bayes rule to solve problems in inferential statistics with H representing our hypothesis and D the data.
A Bayesian statistical model is cast in terms of parameters, and the uncertainty in these parameters is represented by probability distributions. This is different from the Frequentist approach, where the parameter values are regarded as deterministic. An alternative representation is as follows:

P(θ|x) = P(x|θ) * P(θ) / P(x)

where θ represents our unknown parameters and x represents our observed data.
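As a minimal sketch of parameters being represented by distributions (the coin-flipping setup is our own illustration, not from the text), consider the standard Beta-Binomial conjugate pair: a Beta(a, b) prior on a coin's unknown bias θ, after observing k heads in n flips, yields a Beta(a + k, b + n - k) posterior in closed form:

from scipy import stats

a, b = 2, 2        # Beta(2, 2) prior on the unknown bias theta
k, n = 7, 10       # observed data: 7 heads in 10 flips

# Conjugate update: the posterior is Beta(a + k, b + n - k) = Beta(9, 5)
posterior = stats.beta(a + k, b + n - k)
print(posterior.mean())            # ~0.643, the posterior mean of theta
print(posterior.interval(0.95))    # a 95 percent credible interval

The posterior here is a full distribution over θ rather than a single point estimate, which is exactly the contrast with the Frequentist treatment described above.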
In Bayesian statistics, we make assumptions about the prior distribution and use the likelihood to update it to the posterior probability using Bayes rule. As an illustration, let us consider the following problem, a classic case of what is commonly known as the urn problem: say that there are two urns of colored balls, where urn one contains 50 red and 50 blue balls, urn two contains 30 red and 70 blue balls, and one of the two urns is picked at random (with equal probability) before a ball is drawn from it.
If a red ball is drawn, what is the probability that it came from urn one? We want P(U1|R), that is, the probability that the ball came from urn one, given that a red ball was drawn.
Here, U1 denotes that the ball is drawn from urn one, and R denotes that the drawn ball is red:

P(U1|R) = P(R|U1) * P(U1) / P(R)
We know that P(R|U1) = 0.5, P(R|U2) = 0.3, and P(U1) = P(U2) = 0.5, so that P(R) = P(R|U1) * P(U1) + P(R|U2) * P(U2) = 0.5 * 0.5 + 0.3 * 0.5 = 0.4.
Hence, we conclude that P(U1|R) = (0.5 * 0.5) / 0.4 = 0.625.
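The same calculation takes only a few lines of Python (a sketch using the example numbers above):

p_u1, p_u2 = 0.5, 0.5        # priors: each urn is equally likely
p_r_u1, p_r_u2 = 0.5, 0.3    # likelihoods: P(R|U1) and P(R|U2)

p_r = p_r_u1 * p_u1 + p_r_u2 * p_u2   # P(R), the normalizing constant
print(p_r_u1 * p_u1 / p_r)            # P(U1|R) = 0.625, by Bayes rule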
Bayes theorem can sometimes be represented in a more natural and convenient form by using an alternative formulation of probability called odds. Odds are generally expressed in terms of ratios and are used heavily in gambling. Odds of 3 to 1 (often written as 3:1) on a horse winning a race represent the fact that the horse is expected to win with 75 percent probability.
Given a probability p, the odds can be computed as odds = p / (1 - p), which in the case of p = 0.75 becomes 0.75 : 0.25, that is, 3:1.
We can rewrite Bayes theorem by using odds as follows:

o(H|D) = o(H) * P(D|H) / P(D|~H)

where o(H) = P(H) / P(~H) is the prior odds in favor of the hypothesis H, ~H denotes the hypothesis being false, and o(H|D) is the posterior odds after observing the data.
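Here is a short sketch of these conversions and of the odds-form update, reusing the urn example numbers from above (the helper names are our own):

def to_odds(p):
    """Convert a probability into odds in favor, p / (1 - p)."""
    return p / (1 - p)

def to_prob(o):
    """Convert odds in favor back into a probability."""
    return o / (1 + o)

# Posterior odds = prior odds * likelihood ratio.
# Urn example: P(U1) = 0.5, P(R|U1) = 0.5, P(R|U2) = 0.3.
prior_odds = to_odds(0.5)                    # 1.0, that is, 1:1
posterior_odds = prior_odds * (0.5 / 0.3)    # multiply by the likelihood ratio
print(to_prob(posterior_odds))               # 0.625, matching the earlier result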
Bayesian statistics can be applied to many of the problems that we encounter in classical statistics, such as parameter estimation, prediction, and hypothesis testing.
There are many compelling reasons for studying Bayesian statistics, one of them being the ability to use prior information to better inform the current model. The Bayesian approach works with probability distributions rather than point estimates, thus producing more realistic predictions. Bayesian inference bases a hypothesis on the available data, that is, it estimates P(hypothesis|data); the Frequentist approach, by contrast, evaluates how well the data fit an assumed hypothesis. It can be argued that the Bayesian approach is the more logical and empirical one, as it tries to base its beliefs on the facts rather than the other way round. For more information on this, refer to http://www.bayesian-inference.com/advantagesbayesian.