Chapter 19

More on Probability

IN THIS CHAPTER

Understanding the beta version

Pursuing Poisson

Grappling with gamma

Speaking exponentially

In Chapter 18, I delve into probability in a semiformal way, and introduce distributions of random variables. The binomial distribution is the starting point. In this chapter, I examine additional distributions.

One of the symbols on the pages of this book (and other books in the For Dummies series) lets you know that “technical stuff” follows. It might have been a good idea to hang that symbol above this chapter’s title. So here’s a small note of caution: Some mathematics follows. I put the math in to help you understand what you’re doing when you work with the dialog boxes of the Excel functions I describe.

Are these functions on the esoteric side? Well … yes. Will you ever have occasion to use them? Well … you just might.

Discovering Beta

The beta distribution (not to be confused with beta, the probability of a Type 2 error) is a sort of chameleon in the world of distributions. It takes on a wide variety of appearances, depending on the circumstances. I won’t give you all the mathematics behind the beta distribution, because the full treatment involves calculus.

The beta distribution connects with the binomial distribution, which I discuss in Chapter 18. The connection is this: In the binomial, the random variable x is the number of successes in N trials with p as the probability of a success. N and p are constants. In the beta distribution, the random variable x is the probability of a success, with N and the number of successes as constants.

Why is this useful? In the real world, you usually don’t know the value of p, and you’re trying to find it. Typically, you conduct a study, find the number of successes in a set of trials, and then you have to estimate p. Beta shows you the likelihood of possible values of p for the number of trials and successes in your study.

Some of the math is complicated, but I can at least show you the rule that generates the density function for N trials with r successes, when N and r are whole numbers:

images

The vertical bar in the parentheses on the left means “given.” So this density function is for specific values of N and r. Calculus enters the picture when N and r aren’t whole numbers. (Density function? “Given”? Refer to Chapter 18.)

To give you an idea of what this function looks like, I used Excel to generate and graph the density function for four successes in ten trials. Figure 19-1 shows the data and the graph. Each value on the x-axis is a possible value for the probability of a success. The curve shows probability density. As I point out in Chapter 18, probability density is what makes the area under the curve correspond to probability. The curve’s maximum point is at x = .4, which is what you would expect for four successes in ten trials. I selected cell C3 so that the Formula bar shows how I used BETA.DIST to calculate the probabilities.

image

FIGURE 19-1: The Beta Density function for four successes in ten trials.

Suppose I toss a die (one of a pair of dice) and I define a success as any toss that results in a 3. I assume I’m tossing a fair die, so I assume that p = pr(3) = 1/6. Suppose I toss a die ten times and get four 3’s. How good does that fair-die assumption look?

The graph in Figure 19-1 gives you a hint: The area to the left of .16667 (the decimal equivalent of 1/6) is a pretty small proportion of the total area, meaning that the probability that p is 1/6 or less is pretty low.

Now, if you have to go to all the trouble of creating a graph and then guesstimate proportions of area to come up with an answer like “pretty low,” you’re doing a whole lot of work for very little return. Fortunately, Excel has a better way.

BETA.DIST

BETA.DIST eliminates the need for all the graphing and guesstimating. This function enables you to work with the cumulative beta distribution to determine the probability that p is less than or equal to some value. Considering the complexity of beta, BETA.DIST is surprisingly easy to work with.

tip In the BETA.DIST Function Arguments dialog box, and in the BETA.DIST Help file, you see Alpha and Beta. The dialog box tells you each one is a “parameter to the distribution,” and the Help file tells you that each is “a parameter of the distribution.” Aside from altering the preposition, neither one is much help — at least, not in any way that helps you apply Alpha and Beta.

So here are the nuts and bolts: For the example you’re working through, Alpha is the number of successes, and Beta is the number of failures.

When you put the density function in terms of Alpha (α) and Beta (β), it’s

images

Again, this applies only when α and β are both whole numbers. If that’s not the case, you need calculus to compute f(x).

The steps are:

  1. Select a cell for BETA.DIST’s answer.
  2. From the Statistical Functions menu, select BETA.DIST to open its Function Arguments dialog box. (See Figure 19-2.)
  3. In the Function Arguments dialog box, type the appropriate values for the arguments.

    The X box holds the probability of a success. For this example, the probability of a success is images.

    Excel refers to Alpha and Beta (coming up next) as parameters to the distribution. I treat them as “number of successes” and “number of failures.” So I enter 4 in the Alpha box and 6 in the Beta box.

    In the Cumulative box, I typed TRUE. This gives the area under the Beta function curve between 0 and 1/6. If I type FALSE, it gives the height of the Beta function at the value of X. As the Formula bar in Figure 19-1 shows, I typed FALSE to create the chart.

    The A box is an evaluation limit for the value in the X box. In English, that means a lower bound for the value. It isn’t relevant for this type of example. I left this box blank, which by default sets A = 0. Incidentally, the Help file refers to an optional B box that sets an upper bound on X. As you can see, no B box is here. The Help file is referring to something in an earlier version of this function.

    After all the entries, the answer appears in the dialog box.

    The answer for this example is .048021492. “Pretty low” indeed. With four successes in ten tosses, you’d intuitively expect that p is greater than 1/6.

  4. Click OK to put the answer into the selected cell.
image

FIGURE 19-2: The BETA.DIST Function Arguments dialog box.

The beta distribution has wider applicability than I show you here. Consequently, you can put all kinds of numbers (within certain restrictions) into the boxes. For example, the value you put into the X box can be greater than 1.00, and you can enter values that aren’t whole numbers into the Alpha box and the Beta box.

BETA.INV

This one is the inverse of BETA.DIST. If you enter a probability and values for successes and failures, it returns a value for p. For example, if you supply it with .048021492, four successes, and six failures, it returns 0.1666667 — the decimal equivalent of 1/6.

tip BETA.INV has a more helpful application. You can use it to find the confidence limits for the probability of a success.

Suppose you’ve found r successes in N trials and you’re interested in the 95 percent confidence limits for the probability of a success. The lower limit is

images

The upper limit is

images
  1. Select a cell for BETA.INV’s answer.
  2. From the Statistical Functions menu, select BETA.INV to open its Function Arguments dialog box. (See Figure 19-3.)
  3. In the Function Arguments dialog box, enter the appropriate values for the arguments.

    The X box holds a cumulative probability. For the lower bound of the 95 percent confidence limits, the probability is .025.

    In the Alpha box, I entered the number of successes. For this example, that’s 4.

    In the Beta box, I entered the number of failures (not the number of trials). The number of failures is 6.

    The A box and the B box are evaluation limits for the value in the X box. These aren’t relevant for this type of example. I left them blank, which by default sets A = 0 and B=1.

    With the entries for X, Alpha, and Beta, the answer appears in the dialog box. The answer for this example is .13699536.

  4. Click OK to put the answer into the selected cell.
image

FIGURE 19-3: The BETA.INV Function Arguments dialog box.

Entering .975 in the X box gives .700704575 as the result. So the 95 percent confidence limits for the probability of a success are .137 and .701 (rounded off) if you have four successes in ten trials.

With more trials, of course, the confidence limit narrows. For 40 successes in 100 trials, the confidence limits are .307 and .497.

Poisson

If you have the kind of process that produces a binomial distribution and you have an extremely large number of trials and a very small number of successes, the Poisson distribution approximates the binomial. The equation of the Poisson is

images

In the numerator, μ is the mean number of successes in the trials, and e is 2.71828 (and infinitely more decimal places), a constant near and dear to the hearts of mathematicians.

Here’s an example. FarKlempt Robotics, Inc., produces a universal joint for its robots’ elbows. The production process is under strict computer control, so that the probability a joint is defective is .001. What is the probability that in a sample of 1,000, one joint is defective? What’s the probability that two are defective? Three?

Named after 19th century mathematician Siméon-Denis Poisson, this distribution is computationally easier than the binomial — or at least it was when mathematicians had no computational aids. With Excel, you can easily use BINOM.DIST to do the binomial calculations.

First, I apply the Poisson distribution to the FarKlempt example. If π = .001 and N = 1000, the mean is

images

(Refer to Chapter 18 for an explanation of μ = N π.)

Now for the Poisson. The probability that one joint in a sample of 1,000 is defective is

images

For two defective joints in 1000, it’s

images

And for three defective joints in 1,000:

images

remember It may seem odd that I refer to a defective item as a “success.” It’s just a way of labeling a specific event.

POISSON.DIST

Here are the steps for using Excel’s POISSON.DIST for the preceding example:

  1. Select a cell for POISSON.DIST’s answer.
  2. From the Statistical Functions menu, select POISSON.DIST to open its Function Arguments dialog box. (See Figure 19-4.)
  3. In the Function Arguments dialog box, enter the appropriate values for the arguments.

    In the X box, I entered the number of events for which I’m determining the probability. I’m looking for pr(1), so I entered 1.

    In the Mean box, I entered the mean of the process. That’s N π, which for this example is 1.

    In the Cumulative box, it’s either TRUE for the cumulative probability or FALSE for just the probability of the number of events. I entered FALSE.

    With the entries for X, Mean, and Cumulative, the answer appears in the dialog box. The answer for this example is .367879441.

  4. Click OK to put the answer into the selected cell.
image

FIGURE 19-4: The POISSON.DIST Function Arguments dialog box.

In the example, I show you the probability for two defective joints in 1,000 and the probability for three. To follow through with the calculations, I’d type 2 into the X box to calculate pr(2), and 3 to find pr(3).

As I mention earlier, in the 21st century it’s pretty easy to calculate the binomial probabilities directly. Figure 19-5 shows you the Poisson and the binomial probabilities for the numbers in column B and the conditions of the example. I graphed the probabilities so you can see how close the two really are. I selected cell D3, so the Formula bar shows you how I used BINOM.DIST to calculate the binomial probabilities.

image

FIGURE 19-5: Poisson probabilities and binomial probabilities.

Although the Poisson’s usefulness as an approximation is outdated, it has taken on a life of its own. Phenomena as widely disparate as reaction time data in psychology experiments, degeneration of radioactive substances, and scores in professional hockey games seem to fit Poisson distributions. This is why business analysts and scientific researchers like to base models on this distribution. (“Base models on”? What does that mean? I tell you all about it in Chapter 20.)

Working with Gamma

You may recall from Chapter 18 that the number of ways of arranging N objects in a sequence is N! (“N factorial”). You might also recall that N! = N(N-1)(N-2)…(2)(1). Obviously, the factorial only works for whole numbers, right?

The gamma function and GAMMA

Not so fast. Mathematicians (some pretty famous ones) have extended the factorial concept to include non-integers and even negative numbers (which gets very hairy). This extension is called the gamma function. When gamma’s argument is a positive whole number — let’s call it N — the result is (N-1)!. Otherwise, gamma returns the result of a calculus-based equation.

Rather than go into all the calculus, I’ll just give you an example: 4! = 24 and 5! = 120. So the factorial of 4.3 (whatever that would mean) should be somewhere between 24 and 120. Because of the N-1 I just mentioned, you’d find this factorial by letting gamma loose on 5.3 (rather than 4.3). And gamma(5.3) = 38.08.

GAMMA is the worksheet function for gamma. GAMMA takes a single argument. Feed it a number and you get back its gamma-function value. For example,

=GAMMA(5.3)

returns 38.08.

The gamma distribution and GAMMA.DIST

All the preceding is mostly within the realm of theoretical mathematics. Things get more interesting (and more useful) when you tie gamma to a probability distribution. This marriage is called the gamma distribution.

The gamma distribution is related to the Poisson distribution in the same way the negative binomial distribution is related to the binomial. The negative binomial tells you the number of trials until a specified number of successes in a binomial distribution. The gamma distribution tells you how many samples you go through to find a specified number of successes in a Poisson distribution. Each sample can be a set of objects (as in the FarKlempt Robotics universal joint example), a physical area, or a time interval.

The probability density function for the gamma distribution is

images

Again, this works when α is a whole number. If it’s not, you guessed it — calculus. (By the way, when this function has only whole-number values of α, it’s called the Erlang distribution, just in case anybody ever asks you.) The letter e, once again, is the constant 2.7818 I mention earlier.

Don’t worry about the exotic-looking math. As long as you understand what each symbol means, you’re in business. Excel does the heavy lifting for you.

So here’s what the symbols mean. For the FarKlempt Robotics example, α is the number of successes and β corresponds to μ the Poisson distribution. The variable x tracks the number of samples. So, if x is 3, α is 2, and β is 1, you’re talking about the probability density associated with finding the second success in the third sample, if the average number of successes per sample (of 1,000) is 1. (Where does 1 come from, again? That’s 1,000 universal joints per sample multiplied by .001, the probability of producing a defective one.)

To determine probability, you have to work with area under the density function. This brings me to the Excel worksheet function designed for the gamma distribution.

GAMMA.DIST gives you a couple of options. You can use it to calculate the probability density, and you can use it to calculate probability. Figure 19-6 shows how I used the first option to create a graph of the probability density so you can see what the function looks like. Working within the context of the preceding example, I set Alpha to 2 and Beta to 1, and calculated the density for the values of x in column D.

image

FIGURE 19-6: The density function for gamma, with Alpha = 2 and Beta =1.

The values in column E show the probability densities associated with finding the second defective universal joint in the indicated number of samples of 1,000. For example, cell E5 holds the probability density for finding the second defective joint in the third sample.

In real life, you work with probabilities rather than densities. Next, I show you how to use GAMMA.DIST to determine the probability of finding the second defective joint in the third sample.

Here are the steps:

  1. Select a cell for GAMMA.DIST’s answer.
  2. From the Statistical Functions menu, select GAMMA.DIST to open its Function Arguments dialog box. (See Figure 19-7.)
  3. In the Function Arguments dialog box, enter the appropriate values for the arguments.

    The X box holds the number of samples for which I’m determining the probability. I’m looking for pr(3), so I entered 3.

    In the Alpha box, I entered the number of successes. I want to find the second success in the third sample, so I entered 2.

    In the Beta box, I entered the average number of successes that occur within a sample. For this example, that’s 1.

    In the Cumulative box, the choices are TRUE for the cumulative distribution or FALSE to find the probability density. I want to find the probability, not the density, so I entered TRUE.

    With values entered for X, Alpha, Beta, and Cumulative, the answer — .800851727 — appears in the dialog box.

  4. Click OK to put the answer into the selected cell.
image

FIGURE 19-7: The GAMMA.DIST Function Arguments dialog box.

GAMMA.INV

If you want to know, at a certain level of probability, how many samples it takes to observe a specified number of successes, this is the function for you.

GAMMA.INV is the inverse of GAMMA.DIST. Enter a probability along with Alpha and Beta and it returns the number of samples. Its Function Arguments dialog box has a Probability box, an Alpha box, and a Beta box. Figure 19-8 shows what happens if you enter the cell that holds the answer for the preceding section (I stored it in cell A1) into the Probability box and the same numbers used earlier to get that answer for Alpha and Beta: The answer is 3.

image

FIGURE 19-8: The GAMMA.INV Function Arguments dialog box.

Exponential

If you’re dealing with the gamma distribution and you have Alpha = 1, you have the exponential distribution. This gives the probability that it takes a specified number of samples to get to the first success.

What does the density function look like? Excuse me … I’m about to go mathematical on you for a moment. Here, once again, is the density function for gamma:

images

If α = 1, it looks like this:

images

Statisticians like substituting λ (the Greek letter lambda) for images, so here’s the final version:

images

I bring this up because Excel’s EXPON.DIST Function Arguments dialog box has a box for LAMBDA, and I want you to know what it means.

EXPON.DIST

Use EXPON.DIST to determine the probability that it takes a specified number of samples to get to the first success in a Poisson distribution. Here, I work once again with the universal joint example. I show you how to find the probability that you’ll see the first success in the third sample. Here are the steps:

  1. Select a cell for EXPON.DIST’s answer.
  2. From the Statistical Functions menu, select EXPON.DIST to open its Function Arguments dialog box. (See Figure 19-9.)
  3. In the Function Arguments dialog box, enter the appropriate values for the arguments.

    In the X box, I entered the number of samples for which I’m determining the probability. I’m looking for pr(3), so I typed 3.

    In the Lambda box, I entered the average number of successes per sample. This goes back to the numbers I give you in the example — the probability of a success (.001) times the number of universal joints in each sample (1,000). That product is 1, so I entered 1 in this box.

    In the Cumulative box, the choices are TRUE for the cumulative distribution or FALSE to find the probability density. I want to find the probability, not the density, so I entered TRUE.

    With values entered for X, Lambda, and Cumulative, the answer appears in the dialog box. The answer for this example is .950212932.

  4. Click OK to put the answer into the selected cell.
image

FIGURE 19-9: The EXPON.DIST Function Arguments dialog box.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset