BOX 11.4 REVEREND THOMAS BAYES (1702–1761)
In 1719, Bayes matriculated at the University of Edinburgh, where he studied logic and theology and then trained for the Presbyterian ministry. In 1733, he became a minister of the Presbyterian chapel in Tunbridge Wells, 35 miles southeast of London.
Thomas Bayes was a strong Newtonian in his scientific outlook. His early work appears to have been related mainly to infinite series, which was one of the paths followed by British mathematicians in the 18th century.
Bayes’ interest in probability has several origins. First, Bayes learned probability from Abraham de Moivre. Next, Bayes became interested in probability after reviewing a publication of Thomas Simpson on a special case of the law of large numbers: the mean of a set of observations is a better estimate of a location parameter than a single observation.
Bayes set out his theory of probability in “Essay Towards Solving a Problem in the Doctrine of Chances,” published in the Philosophical Transactions of the Royal Society of London in 1764.
Bayes defined the problem as follows:
Given the number of times in which an unknown event has hap-
pened and failed: Required the chance that the probability of
its happening in a single trial lies somewhere between any two
degrees of probability that can be named.
Bayes solved this problem by considering an experiment on a table (could
have been a billiards table).
A ball is thrown across the table in such a way that it is equally likely to come to rest anywhere on the table. Through the point that it comes to rest on the table, draw a line. Then throw the ball n times and count the number of times it falls on either side of the line. These are the successes and failures. Under this physical model one can now find the chance that the probability of success is between two given numbers.
It was Bayes’ friend Richard Price who communicated the paper to the
Royal Society two years after Bayes’ death in 1761. Bayes’ fame rests on this
result [3].
Bayes Theorem
What we have seen so far are the classical probability theories championed in the 17th century in France.
There is another system of probability, invented and advanced by Bayes in the 18th century in England (see Box 11.4 for a short biography).
In Bernoulli's system, the future is predicted by current probability derived from current data. In the Bayesian system of thinking, the probability of a future event is influenced by history too. Future probability is a product of current and historic probabilities. Extending this further, future probability is a product of the probability derived from data and the theoretical probability derived from knowledge. Bayes boldly combined soft (subjective) and hard (derived from data) probabilities, a notion that remained unacceptable to many statisticians for years but is widely adopted now. Bayes used the notion of conditional probability.
We can define conditional probability in terms of absolute probabilities: P(A|B) =
P(A and B)/P(B); that is, the probability that A and B are both true divided by the
probability that B is true.
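As a quick illustration of this definition, the following Python sketch estimates P(A|B) from a small list of joint outcomes; the events and counts are made up for the example.

```python
# Minimal sketch (hypothetical data): estimate P(A|B) = P(A and B) / P(B)
# from joint outcomes of two events A and B.
outcomes = [
    (True, True), (False, True), (True, False), (False, False),
    (True, True), (False, True), (True, True), (False, False),
]

p_b = sum(1 for a, b in outcomes if b) / len(outcomes)
p_a_and_b = sum(1 for a, b in outcomes if a and b) / len(outcomes)
p_a_given_b = p_a_and_b / p_b

print(f"P(B) = {p_b:.3f}, P(A and B) = {p_a_and_b:.3f}, P(A|B) = {p_a_given_b:.3f}")
```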
Bayes used some special terms. Future probability is known as posterior prob-
ability. Historic probability is known as prior probability. Future probability can
only be a likelihood, an expression of chance softer than the rigorous term prob-
ability. Future probability is a conditional probability.
A Clinical Lab Example
A simple illustration of the Bayes analysis is provided by Trevor Lohrbeer in Bayesian Maths for Dummies [4]. The gist of this analysis is as follows:
A person tests positive in a lab. The lab has a reputation of 99% correct diagnosis but also has a false alarm probability of 5%. There is background information that the disease occurs in 1 in 1000 people (0.1% probability). Intuitively, one would expect the probability that the person has the disease to be 99%, based on the lab's reputation. Two other probabilities are at work in this problem: a background probability of 0.1% and a false alarm probability of 5%. The Bayes theorem allows us to combine all three probabilities and predict the chance of the person having the disease as 1.94%. This is dramatically less than the intuitive guess.
The Bayesian breakthrough is that general truth (or disease history) prevails upon fresh laboratory evidence. Data 11.1 presents the following three probabilities that define the situation:
P1: probability of correct diagnosis
P2: probability of false alarm
P3: prevalent disease probability (background history)
The question is “What is the probability of a person who tests positive having the disease?” This probability is denoted by P0 in Data 11.1. P1 and P2 are fixed, and P3 is varied. The associated P0 is calculated according to the Bayes theorem:
P0 = (P1 × P3)/[P1 × P3 + P2 × (1 − P3)]    (11.13)
In this formula, probabilities are expressed in fractions.
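A minimal Python sketch of Equation 11.13 follows; the function name is ours, and the inputs are the lab characteristics from the example above (P1 = 99%, P2 = 5%, P3 = 0.1%).

```python
# Sketch of Equation 11.13: P0 = P1*P3 / (P1*P3 + P2*(1 - P3)),
# with all probabilities expressed as fractions.
def bayes_posterior(p1: float, p2: float, p3: float) -> float:
    """P0, the probability of having the disease given a positive test.

    p1: probability of correct diagnosis
    p2: false alarm probability
    p3: prevalent disease probability (background history)
    """
    return (p1 * p3) / (p1 * p3 + p2 * (1.0 - p3))

# Clinical lab example: P1 = 99%, P2 = 5%, P3 = 0.1%
print(round(bayes_posterior(0.99, 0.05, 0.001), 4))  # ~0.0194, i.e., 1.94%
```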
This is a way of understanding how the probability that a hypothesis is true is affected by a new piece of evidence. It is used to clarify the relationship between theory and evidence.
The role played by the false alarm probability in the estimation of P0 can also be calculated in a similar way. By keeping P3 (disease history) constant in the above example, we can vary P2, the false alarm probability, and see the impact on the estimation (see Data 11.2).
As the false alarm probability P2 decreases, the probability of the subject having the disease P0 increases, tending toward the probability of correct diagnosis P1.
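Both sweeps, the one behind Data 11.1 (varying P3) and the one behind Data 11.2 (varying P2), can be reproduced with the bayes_posterior helper sketched above; the sampled points below are a subset of the tables.

```python
# Data 11.1: vary disease probability P3, keeping P1 = 0.99 and P2 = 0.05 fixed.
for p3_percent in [0.1, 1, 10, 20, 50, 90]:
    p0 = bayes_posterior(0.99, 0.05, p3_percent / 100)
    print(f"P3 = {p3_percent:5.1f}%  ->  P0 = {100 * p0:5.1f}%")

# Data 11.2: vary false alarm probability P2, keeping P3 = 0.001 fixed.
for p2 in [0.05, 0.01, 0.001, 0.0001, 0.00001]:
    p0 = bayes_posterior(0.99, p2, 0.001)
    print(f"P2 = {p2:.5f}  ->  P0 = {p0:.5f}")
```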
Data 11.1 Bayes Estimation with Variable Disease Probability

Given lab characteristics:
  P1: Reputation of correct diagnosis  99%
  P2: False alarm probability           5%

Question: What is the probability P0 of a person who tests positive having the disease?

  Disease Probability P3 (%)    Bayes Estimation P0, Chance of Having Disease (%)
   0.1                           1.9
   1.0                          16.7
  10.0                          68.8
  20.0                          83.2
  30.0                          89.5
  40.0                          93.0
  50.0                          95.2
  60.0                          96.7
  70.0                          97.9
  80.0                          98.8
  90.0                          99.4

Note: It may be seen that posterior probability depends on prior probability.
The above example illustrates the application of conditional probability and how it can modify our judgment for the better.
Application of Bayes Theorem in Software Development
Chulani et al. [5] applied the Bayes theorem to software development. The Bayes theorem is elegantly applied to software cost models.
The Bayesian approach provides a formal process by which a-priori expert judgment can be combined with sampling information (data) to produce a robust a-posteriori model

Posterior = Sample × Prior

In the above equation “Posterior” refers to the posterior density function summarizing all the information. “Sample” refers to the sample information (or collected data) and is algebraically equivalent to the likelihood function. “Prior” refers to the prior information summarizing the expert judgment. In order to determine the Bayesian posterior mean and variance, we need to determine the mean and precision of the prior information and the sampling information.
Data 11.2 Bayes Estimation with Variable False Alarm Probability

Constants:
  P3: 0.001  Disease history
  P1: 0.99   Probability of correct diagnosis

Variables:
  P2: False alarm probability
  P0: Probability of the subject having the disease

Question: What is the probability P0 of a person who tests positive having the disease?

Bayes estimation:
  P2         P0
  0.05000    0.01943
  0.01000    0.09016
  0.00100    0.49774
  0.00010    0.90834
  0.00001    0.99001
Chulani et al. have used the Bayesian paradigm to calibrate the Constructive Cost Model (COCOMO), combining expert judgment with empirical data. This illustration has great significance and holds great promise. It makes us think differently about data: in a Bayesian sense, data include an intuitive guess. The study of Chulani et al. shows that a healthy collaboration between empirical data and an intuitive guess, such as the one available in the Bayesian approach, is a practical solution to a hitherto unsolved problem.
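The following sketch is not Chulani et al.'s actual COCOMO calibration; it only illustrates, with hypothetical numbers, the precision-weighted combination of a prior (expert) estimate and a sample (data-driven) estimate of a single model coefficient, as described in the quotation above.

```python
# Minimal sketch (hypothetical numbers): combine a prior expert estimate and a
# data-driven estimate of a coefficient, weighting each by its precision
# (the reciprocal of its variance).
def combine_prior_and_sample(prior_mean, prior_var, sample_mean, sample_var):
    prior_precision = 1.0 / prior_var
    sample_precision = 1.0 / sample_var
    posterior_precision = prior_precision + sample_precision
    posterior_mean = (prior_precision * prior_mean +
                      sample_precision * sample_mean) / posterior_precision
    return posterior_mean, 1.0 / posterior_precision

# Expert judgment says a cost-driver coefficient is about 1.10 (variance 0.04);
# regression on project data gives 1.30 (variance 0.01).
mean, var = combine_prior_and_sample(1.10, 0.04, 1.30, 0.01)
print(f"posterior mean = {mean:.3f}, posterior variance = {var:.4f}")
```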
Fenton [6] used Bayesian belief networks (BBNs), a commendable expansion of the Bayesian paradigm, to predict software reliability.
Bibi et al. [7] applied BBNs as a software productivity estimation tool. They find that BBN is a promising method whose results can be confirmed intuitively. BBNs are easily interpreted, allow flexibility in estimation, can support expert judgment, and can create models that consider all the information lying in a data set by including all productivity factors in the final model.
Wagner [8] used BBNs inside a framework of activity-based quality models in
studying the problem of assessing and predicting the complex concept of software
quality. He observes,
The use of Bayesian networks opens many possibilities. Most interestingly, after building a large Bayesian network, a sensitivity analysis of that network can be performed. This can answer the practically very relevant question which of the factors are the most important ones. It would allow to reduce the measurement efforts significantly by concentrating on these most influential facts.
A Comparison of Application of the Four
Distributions and Bayes Theorem
In the case of the binomial distribution, the trials are independent of one another.
Trials are done with replacement.
The hypergeometric distribution arises when sampling is performed from a finite population without replacement, thus making trials dependent on one another.
In the NBD, the number of trials is not fixed; trials continue until a specified number of successes is obtained.
The geometric distribution is a special case of the NBD in which trials are observed until the first success is achieved.
The Bayes theorem provides a way to combine a historical distribution with fresh evidence.
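The contrast can be made concrete with a short sketch using scipy.stats (assumed available); the parameters below are purely illustrative.

```python
# Illustrative sketch (hypothetical parameters) contrasting the four discrete
# distributions discussed above, using scipy.stats.
from scipy.stats import binom, hypergeom, nbinom, geom

p, n = 0.2, 10          # success probability, number of trials

# Binomial: probability of 3 successes in 10 independent trials (with replacement).
print(binom.pmf(3, n, p))

# Hypergeometric: 3 defectives in a sample of 10 drawn without replacement
# from a population of 50 items containing 10 defectives.
print(hypergeom.pmf(3, 50, 10, 10))

# Negative binomial: number of failures before the 3rd success (trials not fixed).
print(nbinom.pmf(7, 3, p))   # 7 failures, i.e., the 3rd success on the 10th trial

# Geometric: first success occurs on the 4th trial (special case of the NBD).
print(geom.pmf(4, p))
```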