BOX 11.4 REVEREND THOMAS BAYES (1702–1761)
In 1719, Bayes matriculated at the University of Edinburgh, where he studied logic and theology and then trained for the Presbyterian ministry. In 1733, he became a minister of the Presbyterian chapel in Tunbridge Wells, 35 miles southeast of London.
Thomas Bayes was a strong Newtonian in his scientific outlook. His early work appears to have been related mainly to infinite series, which was one of the paths followed by British mathematicians in the 18th century.
Bayes’ interest in probability has several origins. First, Bayes learned probability from Abraham de Moivre. Next, Bayes became interested in probability after reviewing a publication of Thomas Simpson on a special case of the law of large numbers: the mean of a set of observations is a better estimate of a location parameter than a single observation.
Bayes set out his theory of probability in “Essay Towards Solving a Problem in the Doctrine of Chances,” published in the Philosophical Transactions of the Royal Society of London in 1764.
Bayes defined the problem as follows:
Given the number of times in which an unknown event has hap-
pened and failed: Required the chance that the probability of
its happening in a single trial lies somewhere between any two
degrees of probability that can be named.
Bayes solved this problem by considering an experiment on a table (could
have been a billiards table).
A ball is thrown across the table in such a way that it is equally likely to come to rest anywhere on the table. Through the point that it comes to rest on the table, draw a line. Then throw the ball n times and count the number of times it falls on either side of the line. These are the successes and failures. Under this physical model one can now find the chance that the probability of success is between two given numbers.
It was Bayes’ friend Richard Price who communicated the paper to the
Royal Society two years after Bayes’ death in 1761. Bayes’ fame rests on this
result [3].
Bayes Theorem
What we have seen so far are the classical probability theories championed in the 17th century in France.
There is another system of probability, invented and advanced by Bayes in the 18th century in England (see Box 11.4 for a short biography).
In Bernoulli's system, the future is predicted by current probability derived from current data. In the Bayesian system of thinking, the probability of a future event is influenced by history too. Future probability is a product of current and historic probabilities. Extending this further, future probability is a product of the probability derived from data and the theoretical probability derived from knowledge. Bayes boldly combined soft (subjective) and hard (derived from data) probabilities, a notion that remained unacceptable to many statisticians for years but is widely adopted now. Bayes used the notion of conditional probability.
We can define conditional probability in terms of absolute probabilities: P(A|B) =
P(A and B)/P(B); that is, the probability that A and B are both true divided by the
probability that B is true.
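As a quick illustration of this definition, the following Python sketch estimates P(A|B) from a small list of joint outcomes; the events and counts are made up for the example.

```python
# Minimal sketch (hypothetical data): estimate P(A|B) = P(A and B) / P(B)
# from joint outcomes of two events A and B.
outcomes = [
    (True, True), (False, True), (True, False), (False, False),
    (True, True), (False, True), (True, True), (False, False),
]

p_b = sum(1 for a, b in outcomes if b) / len(outcomes)
p_a_and_b = sum(1 for a, b in outcomes if a and b) / len(outcomes)
p_a_given_b = p_a_and_b / p_b

print(f"P(B) = {p_b:.3f}, P(A and B) = {p_a_and_b:.3f}, P(A|B) = {p_a_given_b:.3f}")
```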
Bayes used some special terms. Future probability is known as posterior prob-
ability. Historic probability is known as prior probability. Future probability can
only be a likelihood, an expression of chance softer than the rigorous term prob-
ability. Future probability is a conditional probability.
A Clinical Lab Example
A simple illustration of the Bayes analysis is provided by Trevor Lohrbeer in Bayesian Maths for Dummies [4]. The gist of this analysis is as follows:
A person tests positive in a lab. The lab has a reputation of 99% correct diagnosis but also has a false alarm probability of 5%. There is background information that the disease occurs in 1 in 1000 people (0.1% probability). Intuitively, one would expect the probability that the person has the disease to be 99%, based on the lab's reputation. Two other probabilities are at work in this problem: a background probability of 0.1% and a false alarm probability of 5%. The Bayes theorem allows us to combine all three probabilities and predict the chance of the person having the disease as 1.94%. This is dramatically less than the intuitive guess.
The Bayesian breakthrough is that general truth (or disease history) prevails upon fresh laboratory evidence. Data 11.1 presents the following three probabilities that define the situation:
P1: probability of correct diagnosis
P2: probability of false alarm
P3: prevalent disease probability (background history)
The question is “What is the probability of a person who tests positive having the disease?” This probability is denoted by P0 in Data 11.1. P1 and P2 are fixed, and P3 is varied. The associated P0 is calculated according to the Bayes theorem:
P0 = (P1 × P3)/[P1 × P3 + P2 × (1 − P3)]    (11.13)
In this formula, probabilities are expressed in fractions.
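A minimal Python sketch of Equation 11.13 follows; the function name is ours, and the inputs are the lab characteristics from the example above (P1 = 99%, P2 = 5%, P3 = 0.1%).

```python
# Sketch of Equation 11.13: P0 = P1*P3 / (P1*P3 + P2*(1 - P3)),
# with all probabilities expressed as fractions.
def bayes_posterior(p1: float, p2: float, p3: float) -> float:
    """P0, the probability of having the disease given a positive test.

    p1: probability of correct diagnosis
    p2: false alarm probability
    p3: prevalent disease probability (background history)
    """
    return (p1 * p3) / (p1 * p3 + p2 * (1.0 - p3))

# Clinical lab example: P1 = 99%, P2 = 5%, P3 = 0.1%
print(round(bayes_posterior(0.99, 0.05, 0.001), 4))  # ~0.0194, i.e., 1.94%
```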
This is a way of understanding how the probability that a hypothesis is true is affected by a new piece of evidence. It is used to clarify the relationship between theory and evidence.
The role played by the false alarm probability in the estimation of P0 can also be calculated in a similar way. By keeping P3 (disease history) constant in the above example, we can vary P2, the false alarm probability, and see the impact on the estimation (see Data 11.2).
As the false alarm probability P2 decreases, the probability of the subject having the disease P0 increases, tending toward the probability of correct diagnosis P1.
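Both sweeps, the one behind Data 11.1 (varying P3) and the one behind Data 11.2 (varying P2), can be reproduced with the bayes_posterior helper sketched above; the sampled points below are a subset of the tables.

```python
# Data 11.1: vary disease probability P3, keeping P1 = 0.99 and P2 = 0.05 fixed.
for p3_percent in [0.1, 1, 10, 20, 50, 90]:
    p0 = bayes_posterior(0.99, 0.05, p3_percent / 100)
    print(f"P3 = {p3_percent:5.1f}%  ->  P0 = {100 * p0:5.1f}%")

# Data 11.2: vary false alarm probability P2, keeping P3 = 0.001 fixed.
for p2 in [0.05, 0.01, 0.001, 0.0001, 0.00001]:
    p0 = bayes_posterior(0.99, p2, 0.001)
    print(f"P2 = {p2:.5f}  ->  P0 = {p0:.5f}")
```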
Data 11.1 Bayes Estimation with Variable Disease Probability

Given lab characteristics:
  P1: Reputation of correct diagnosis  99%
  P2: False alarm probability           5%

Question: What is the probability P0 of a person who tests positive having the disease?

  Disease Probability P3 (%)    Bayes Estimation P0, Chance of Having Disease (%)
   0.1                           1.9
   1.0                          16.7
  10.0                          68.8
  20.0                          83.2
  30.0                          89.5
  40.0                          93.0
  50.0                          95.2
  60.0                          96.7
  70.0                          97.9
  80.0                          98.8
  90.0                          99.4

Note: It may be seen that posterior probability depends on prior probability.
The above example illustrates the application of conditional probability and how it can modify our judgment for the better.
Application of Bayes Theorem in Software Development
Chulani et al. [5] applied the Bayes theorem to software development. The Bayes theorem is elegantly applied to software cost models.
The Bayesian approach provides a formal process by which a-priori expert judgment can be combined with sampling information (data) to produce a robust a-posteriori model

Posterior = Sample × Prior

In the above equation “Posterior” refers to the posterior density function summarizing all the information. “Sample” refers to the sample information (or collected data) and is algebraically equivalent to the likelihood function. “Prior” refers to the prior information summarizing the expert judgment. In order to determine the Bayesian posterior mean and variance, we need to determine the mean and precision of the prior information and the sampling information.
Data 11.2 Bayes Estimation with Variable False Alarm Probability

Constants:
  P3: 0.001  Disease history
  P1: 0.99   Probability of correct diagnosis

Variables:
  P2: False alarm probability
  P0: Probability of the subject having the disease

Question: What is the probability P0 of a person who tests positive having the disease?

Bayes estimation:
  P2         P0
  0.05000    0.01943
  0.01000    0.09016
  0.00100    0.49774
  0.00010    0.90834
  0.00001    0.99001
Chulani et al. have used the Bayesian paradigm to calibrate the Constructive Cost Model (COCOMO), combining expert judgment with empirical data. This illustration has great significance and holds great promise. It makes us think differently about data: in a Bayesian sense, data include an intuitive guess. The study of Chulani et al. shows that a healthy collaboration between empirical data and an intuitive guess, such as the one available in the Bayesian approach, is a practical solution to a hitherto unsolved problem.
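The following sketch is not Chulani et al.'s actual COCOMO calibration; it only illustrates, with hypothetical numbers, the precision-weighted combination of a prior (expert) estimate and a sample (data-driven) estimate of a single model coefficient, as described in the quotation above.

```python
# Minimal sketch (hypothetical numbers): combine a prior expert estimate and a
# data-driven estimate of a coefficient, weighting each by its precision
# (the reciprocal of its variance).
def combine_prior_and_sample(prior_mean, prior_var, sample_mean, sample_var):
    prior_precision = 1.0 / prior_var
    sample_precision = 1.0 / sample_var
    posterior_precision = prior_precision + sample_precision
    posterior_mean = (prior_precision * prior_mean +
                      sample_precision * sample_mean) / posterior_precision
    return posterior_mean, 1.0 / posterior_precision

# Expert judgment says a cost-driver coefficient is about 1.10 (variance 0.04);
# regression on project data gives 1.30 (variance 0.01).
mean, var = combine_prior_and_sample(1.10, 0.04, 1.30, 0.01)
print(f"posterior mean = {mean:.3f}, posterior variance = {var:.4f}")
```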
Fenton [6] used Bayesian belief networks (BBNs), a commendable expansion of the Bayesian paradigm, to predict software reliability.
Bibi et al. [7] applied BBNs as a software productivity estimation tool. They find that BBN is a promising method whose results can be confirmed intuitively. BBNs are easily interpreted, allow flexibility in estimation, can support expert judgment, and can create models that consider all the information lying in a data set by including all productivity factors in the final model.
Wagner [8] used BBNs inside a framework of activity-based quality models in
studying the problem of assessing and predicting the complex concept of software
quality. He observes,
The use of Bayesian networks opens many possibilities. Most interestingly, after building a large Bayesian network, a sensitivity analysis of that network can be performed. This can answer the practically very relevant question which of the factors are the most important ones. It would allow to reduce the measurement efforts significantly by concentrating on these most influential facts.
A Comparison of Application of the Four
Distributions and Bayes Theorem
In the case of the binomial distribution, the trials are independent of one another.
Trials are done with replacement.
The hypergeometric distribution arises when sampling is performed from a finite population without replacement, thus making trials dependent on one another.
In the NBD, the number of trials is not fixed; trials continue until a specified number of successes is obtained.
The geometric distribution is a special case of the NBD in which trials are observed until the first success is achieved.
The Bayes theorem provides a way to combine a historical distribution with fresh evidence.
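The contrast can be made concrete with a short sketch using scipy.stats (assumed available); the parameters below are purely illustrative.

```python
# Illustrative sketch (hypothetical parameters) contrasting the four discrete
# distributions discussed above, using scipy.stats.
from scipy.stats import binom, hypergeom, nbinom, geom

p, n = 0.2, 10          # success probability, number of trials

# Binomial: probability of 3 successes in 10 independent trials (with replacement).
print(binom.pmf(3, n, p))

# Hypergeometric: 3 defectives in a sample of 10 drawn without replacement
# from a population of 50 items containing 10 defectives.
print(hypergeom.pmf(3, 50, 10, 10))

# Negative binomial: number of failures before the 3rd success (trials not fixed).
print(nbinom.pmf(7, 3, p))   # 7 failures, i.e., the 3rd success on the 10th trial

# Geometric: first success occurs on the 4th trial (special case of the NBD).
print(geom.pmf(4, p))
```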