Introduction

CLASSIC PUZZLES FROM THE PAST

I.1 A GAMBLING PUZZLE OF GOMBAUD AND PASCAL

The birth of probability theory is generally dated by historians of science to 1654. That was the year the French writer and philosopher Antoine Gombaud (1607–1684), better known today by his literary name, Chevalier de Méré, presented the French mathematician Blaise Pascal (1623–1662) with some questions concerning games of chance.1 This is a bit of an artificial milestone, as it refers to the start of an analytical approach. Probabilistic thinking can be traced back to well before 1654. A hundred years earlier, for example, the Italian Gerolamo Cardano (1501–1575) had pondered probability, and wrote his work up in a 15-page paper, Liber de ludo aleae (The Book on Games of Chance). It was written in 1564 but not published until many years later (1663), a decade after Gombaud and Pascal. Furthermore, Cardano thought random behavior was the result of the influence of external, supernatural forces, what he mysteriously called the “authority of the Prince.” (This is not the way to a true, mathematical understanding of probability!) One might even make a (weak) case for an even earlier interest in probability, perhaps going all the way back to ancient times in the Holy Land.2

For now, however, let’s be traditional and go with the Gombaud-Pascal interaction as “the beginning.” In particular, Gombaud’s befuddlement over two particular games involving dice gives us an interesting peek into how, just a few hundred years ago, the theory of probabilities was still very new. Gombaud correctly understood that on a single toss of a fair die he would observe a 6 with probability 1/6. He also knew that, given two fair dice, a single toss of the pair would show a double 6 with probability 1/36. From that starting point, Gombaud was puzzled when he came to compare two then-popular dice games. In the first game one tosses a single die n1 times; the question for this game is: what should n1 be to first make the probability of seeing at least one 6 greater than 1/2? That is, what should n1 be to make an “even money” bet on seeing at least one 6 an advantageous bet? In the second game one tosses a pair of dice n2 times, and the question is: what should n2 be to first make the probability of seeing at least one double 6 greater than 1/2?

Gombaud believed in strict proportionality, and so he asserted that it should be true that

n1/6 = n2/36, (I1.1)

since in the first game there are six ways for a die to fall, and 36 ways for a pair of dice to fall in the second game. In fact, however, very careful observation of the outcomes of a large number of both games had convinced Gombaud (correctly) that strict proportionality did not hold. In his letter to Pascal, Gombaud had strongly expressed his outrage at the failure of proportionality; as Pascal wrote (in a letter dated July 29, 1654) to his fellow Frenchman, the now world-famous Pierre de Fermat (1601–1665), Gombaud thought this failure “was a great scandal which made him proclaim loudly that … Arithmetic belied herself.” Pascal was able to properly calculate the values of n1 and n2, and that is why historians have focused on the 1654 date as the beginning of probability theory.

To calculate the values of n1 and n2, write P1 and P2, respectively, as the probabilities of seeing at least one 6 in n1 tosses of a fair die, and of seeing at least one double 6 in n2 tosses of two such fair dice. For P1, Pascal observed that, in n1 tosses of a die there are 6^n1 ways it can land, with 5^n1 of them having not a single 6. So, 6^n1 − 5^n1 is the number of ways at least one of the tosses will be a 6. We therefore have

P1 = (6^n1 − 5^n1)/6^n1 = 1 − (5/6)^n1 > 1/2,

and so n1 = 4.

A modern mathematician would argue his or her way to the same result using just slightly different language, observing first that the event “no 6 in n1 tosses” and the event “at least one 6 in n1 tosses” are complementary events; that is, the two events are mutually exclusive (if one occurs when the die is tossed, then the other doesn’t) and exhaustive (one of the two events must occur). Since the probability of throwing a 6 on a single toss is 1/6, the probability of not getting a 6 is 5/6, and so, again,

P1 = 1 − (5/6)^n1 > 1/2.

In the same way,

P2 = 1 − (35/36)^n2 > 1/2,

and so n2 = 25, and not the 24 that Gombaud thought it should be from (I1.1).
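Both thresholds are easy to confirm by brute force. Here is a short sketch of the computation in Python (my choice of language; the book’s own code examples use MATLAB), searching for the smallest advantageous n in each game:

```python
# Smallest n for which P(at least one success in n tries) = 1 - (1 - p)**n
# first exceeds 1/2, given a per-try success probability p.
def first_advantageous_n(p):
    n = 1
    while 1 - (1 - p) ** n <= 0.5:
        n += 1
    return n

n1 = first_advantageous_n(1 / 6)   # at least one 6 with a single die
n2 = first_advantageous_n(1 / 36)  # at least one double 6 with a pair
print(n1, n2)  # 4 25
```

With n2 = 24, the value Gombaud’s proportionality argument predicts, the success probability is 1 − (35/36)^24 ≈ 0.491, just short of an even-money bet; that near miss is exactly the failure of proportionality that outraged him.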

Our modern mathematician would explain Pascal’s approach as the calculation of the ratio of the number of “favorable” possibilities (that is, those realizing either “at least one 6” or “at least one double 6”) to the total number of possibilities. The set of all the possibilities is called the sample space of the experiment (“tossing a die n1 times” for P1 and “tossing two dice n2 times” for P2). The tacit assumption behind taking a ratio is that all of the possibilities in the sample space are equally likely.3

I.2 GALILEO’S DICE PROBLEM

Years before Pascal answered Gombaud’s puzzle, the Italian mathematical physicist Galileo Galilei (1564–1642), better known today for dropping balls off the top of the Leaning Tower of Pisa, successfully answered a different dice-tossing question from a gambling friend. The friend was puzzled by the following fact: even though with a single toss of three dice the totals of 9 and 10 can each be achieved with the same number (six) of different sums (for example, 9 = 4 + 3 + 2 and 10 = 4 + 4 + 2), experience shows that 10 is (slightly) more likely to be the result than is 9. Galileo’s correct answer was that of the 6^3 = 216 possible ways for the three dice to fall, slightly more of those ways give a total sum of 10 than the number of ways that give a total sum of 9.

Galileo answered his friend’s question with a laborious hand enumeration of the 216 possibilities, but for us it is far easier to write a computer program to run through all the grubby arithmetic. A MATLAB® code that generates all the possible ways three dice can fall on a single toss, and that keeps track of how many ways each possible total is generated, is shown below as the code galileo.m. When run, it gives total(9) = 25 while total(10) = 27. (The total of 11 also appears 27 times.)

galileo.m

total=zeros(1,18);
for i=1:6
    for j=1:6
        for k=1:6
            s=i+j+k;
            total(s)=total(s)+1;
        end
    end
end
total

One attractive feature of this code is the ease with which it can be modified to examine any other number of dice. For example, in the case of four dice (with 6^4 = 1,296 possibilities, a lot to do by hand), simply add another for/end loop for the variable l (that is, change the command for s to s=i+j+k+l), and change the first line to total=zeros(1,24). The result is that the most likely total for a single toss of four dice is 14 (it can occur in 146 ways).
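That modification can be sketched in Python as well (again my translation, not the book’s MATLAB), with the number of dice as a parameter rather than hard-coded loops:

```python
from itertools import product

# Count how many of the 6**k equally likely outcomes of tossing k fair
# dice give each possible total (the same bookkeeping as galileo.m).
def totals(k):
    counts = {}
    for faces in product(range(1, 7), repeat=k):
        s = sum(faces)
        counts[s] = counts.get(s, 0) + 1
    return counts

t3 = totals(3)
print(t3[9], t3[10], t3[11])        # 25 27 27
t4 = totals(4)
print(max(t4, key=t4.get), t4[14])  # 14 146
```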

I.3 ANOTHER GOMBAUD-PASCAL PUZZLE

Beyond Galileo’s problem, and beyond his own “single 6” and “double 6” questions, Gombaud posed another question to Pascal, one that would have enormous consequences for the history of mathematics. It’s called the problem of points. Imagine two players, A and B, who continually flip a fair coin. When heads shows, A wins a point, while when tails shows, B gets a point. Each has put an equal amount of money into a pot, with the entire pot to go to the first one to win n points. At some instant during this process, before either player has won, they decide to call it quits and to split the pot “fairly” based on their present point totals. (In a 1603 Italian text on arithmetic, for example, A and B are two boys who play a series of games involving a ball; before arriving at an overall winner, they lose the ball!) If A has a points and B has b points, how should they split the pot?

This is a very old problem. Its history can be traced back, in Italy, to at least 1380, while Ore (see note 1) conjectured it has an even more ancient, Arabic origin. Before (and after, too!) Pascal finally solved this problem, it was rightfully considered to be one of very great difficulty.4 The Italian mathematician Niccolo Fontana (1500–1557), for example, better known in mathematical history as Tartaglia (“the stammerer”) and famous for discovering how to solve cubic equations, declared in his 1556 book, General Trattato (General Treatise on Number and Measure), the problem of points to be a question that “is judicial rather than mathematical, so that whatever way the division is made there will be cause for litigation.” Tartaglia was a most able analyst, but those words sound (to me) just a bit like a justification for not being able to solve the problem!

Pondering the problem of points is what led Pascal into his correspondence with Fermat, which in turn developed into a deeper consideration by both men of the mathematics of chance. In particular, in 1656 Pascal posed a new problem to Fermat, one so difficult he wondered if Fermat could actually solve it. This was the earliest version of the now famous gambler’s ruin problem, which can be found in virtually all modern probability texts. Imagine our two players, A and B, again playing a series of identical games. A wins any one of these games with probability p (and so, of course, B wins any individual game with probability q = 1 − p). A “game” could be, for example, as simple as flipping a coin, once, with heads showing with probability p (with A winning with heads). Between the two players there is a total, fixed wealth of a + b dollars; at the start of the series of games A has a dollars and B has b dollars. The loser of an individual game gives one dollar to the winner. The games continue until one of the players has lost all of his money (or, as mathematicians put it, is ruined). The original, fundamental question associated with the gambler’s ruin problem was that of calculating the probability A is ruined, which of course is one minus the probability that B is ruined, since one of them is certainly ruined in the end.

That calculation was first done by Pascal, using his method of expected values and the mathematics of difference equations,5 an approach discussed in one of my earlier books so I won’t repeat it here.6 (We’ll later encounter difference equations—one of them nonlinear—in the last sections of this introduction, in two entirely different probability problems.) What I will show you, in the next section, is a different, far more ingenious derivation of the ruin probabilities. The work of Pascal and Fermat eventually expanded to include that of the Dutch mathematical physicist Christiaan Huygens (1629–1695), who added the new twist of game duration, that is, the probability one of the players is ruined within n games. All of this work by these three men remained known only to them until Huygens published his “De ratiociniis in ludo aleae” (“On calculations in games of chance”) in 1657.7

I.4 GAMBLER’S RUIN AND DE MOIVRE

In this section I’ll show how the French-born English mathematician Abraham de Moivre (1667–1754) derived the ruin probabilities. We can then use those probabilities to find the average number of individual games played until one of the players is ruined. De Moivre’s derivation is elegant and incredibly clever,8 but it is not the one typically presented in modern probability textbooks.9 After reading through it, however, I predict you’ll fully agree with his well-known friend, Isaac Newton (1642–1727), who, when asked about some mathematical question, would often reply, “Go to Mr. de Moivre, he knows these things better than I do.” Pretty impressive praise, indeed, coming from one of the inventors of calculus!

To start, suppose we already have the ruin probabilities in hand. That is, we have

PA = probability that A is ruined

and

PB = probability that B is ruined,

where, of course,

PA + PB = 1. (I4.1)

On any individual game, A “expects” (here we see the influence of Pascal) to win one dollar with probability p and “expects” to lose one dollar with probability q. His expected gain per game is therefore given by p(+1) + q(−1) = p − q. A’s expected total gain is given by bPB − aPA because A wins all of B’s starting money (b dollars) if B is ruined and loses all his own starting money (a dollars) if he himself is ruined. If the average number of games played until somebody is ruined is m, then m times the expected gain per game for A must equal his total expected gain, and so

m(p − q) = bPB − aPA,

or, the average number of games played until ruin occurs is

m = (bPB − aPA)/(p − q). (I4.2)

My gosh, we’ve suddenly solved the problem of the average duration of the gambler’s ruin problem! But of course, (I4.2) isn’t of much use until we know the ruin probabilities, and so we can’t put off their calculation any longer.

Forget, for now, money being valued in dollars. Let’s simply say that A starts with a chips and that B starts with b chips. When it suits us, we can say that a chip is valued at a dollar, but for now a chip is simply a round piece of plastic or cardboard. Imagine that A and B each stack their chips in a pile and that, when one loses a game, they take the top chip of their pile and place it on the top of the other’s pile. Imagine further that A assigns a value of q/p to his bottom chip, of (q/p)^2 to the next chip up, of (q/p)^3 to the next chip up, and so on, so that at the start of play his top chip has value (q/p)^a. And finally, imagine that B starts the valuation of his chips from where A left off, but now going top down. That is, before the start of play the top chip in B’s pile has value (q/p)^(a+1), the chip immediately below the top chip has value (q/p)^(a+2), and so on down to the bottom chip, which has value (q/p)^(a+b). These chip values remain associated with each chip, no matter what position or which pile they happen to be in as play continues.

In the first game played, A wins B’s top chip with probability p, and loses his own top chip with probability q. A’s expected gain from the first game is then

p(q/p)^(a+1) − q(q/p)^a = q(q/p)^a − q(q/p)^a = 0.

Since A’s gain is B’s loss (and vice versa) in any individual game, B’s expected gain from the first game is also zero (zero having the nice property of being its own negative). Now, before we continue, we need an explanation for this perhaps surprising result. How, you may wonder, can A’s expected gain from the first game be zero when, just moments before, we calculated A’s expected gain for any game (including the first one) to be p − q? Since p − q = 0 only for the single, special case of p = q = 1/2, don’t we have a conflict?

Well, no. There is actually no contradiction here because our first calculation took the value of every chip to be the same, one dollar, while in our second calculation the chips have variable value. In an actual playing of gambler’s ruin we would of course use the first valuation, and p − q is indeed A’s expected gain in the real world. We use the second valuation of the chips to calculate the ruin probabilities in the mathematical world, however, precisely because of the zero expectation for A’s gain (and not only for the first game but on all subsequent games, too, as you’ll soon see). The actual gain does, of course, depend on how we value the chips, but it should be clear that the probabilities of them all ending up with one player or the other are independent of how we value the chips. So, we’ll use a particular chip valuation that is particularly helpful for calculating probabilities. It is that zero result for the expected gain per game, true only for de Moivre’s particular valuation of the chips, that is the genius of his derivation. Okay, back to business.

A and B start the second game either (if A won the first game) with A having a + 1 chips and B having b − 1 chips, or (if A lost the first game) with A having a − 1 chips and B having b + 1 chips. In the first case A’s top chip has value (q/p)^(a+1) and B’s top chip has value (q/p)^(a+2), and in the second case A’s top chip has value (q/p)^(a−1) and B’s top chip has value (q/p)^a. Notice carefully that in both cases the chip that B has at risk in the second game is worth q/p times the chip A has at risk, just as in the first game. Thus, A’s expected gain in the second game is again zero (as is B’s). Indeed, if you continue this line of reasoning on to the third game and beyond, you should see that the expected gain for both A and B, for every individual game, is zero.
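This zero-expectation property is easy to confirm numerically. In the sketch below (Python, my addition), whenever A’s chip at risk is worth (q/p)^k and B’s is worth (q/p)^(k+1), A’s expected gain p(q/p)^(k+1) − q(q/p)^k comes out to zero for every p and every chip position k:

```python
# A's expected gain on a game in which his chip at risk is worth (q/p)**k
# and B's is worth (q/p)**(k+1); algebraically p*(q/p)**(k+1) = q*(q/p)**k,
# so the difference is identically zero.
def expected_gain(p, k):
    q = 1 - p
    r = q / p
    return p * r ** (k + 1) - q * r ** k

for p in (0.3, 0.5, 0.62):
    for k in range(1, 9):
        assert abs(expected_gain(p, k)) < 1e-9
print("expected gain is zero for every chip position")
```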

This is all pretty clever, but now comes the real heart of de Moivre’s derivation. Since A expects to gain zero at every game, then A’s total expected gain, no matter how many games the entire gambler’s ruin process may require, is zero, since zero expected gain per game times any number of games played is zero. (Remember, this wonderful property is because of de Moivre’s particular valuation of the chips.) This conclusion, of zero total gain, is the key to solving for the ruin probabilities PA and PB. That’s because we can also write A’s expected total gain in terms of PA and PB, and then set that equal to zero. Here’s how to do that.

A’s total expected gain is: all of B’s starting chips if B is ruined, minus all of A’s own starting chips if he himself is ruined. That is, A’s total expected gain is

PB[(q/p)^(a+1) + (q/p)^(a+2) + ⋯ + (q/p)^(a+b)] − PA[(q/p) + (q/p)^2 + ⋯ + (q/p)^a] = 0.

Summing the two geometric series in the brackets and simplifying, we get (we do have to assume that p ≠ q, that is, q/p ≠ 1, to avoid division by zero issues)

PA/PB = (q/p)^a[(q/p)^b − 1]/[(q/p)^a − 1]. (I4.3)

Combining (I4.3) with (I4.1), a little algebra then gives us the ruin probabilities:

PA = (q/p)^a[(q/p)^b − 1]/[(q/p)^(a+b) − 1] (I4.4)

and

PB = [(q/p)^a − 1]/[(q/p)^(a+b) − 1]. (I4.5)

I’ll leave it to you to verify, with L’Hospital’s limit rule, that for q/p = p/q = 1 (that is, for p = q = 1/2), the ruin probabilities reduce to

PA = b/(a + b) (I4.6)

and

PB = a/(a + b). (I4.7)

You should confirm from (I4.2), (I4.4), and (I4.5) that lim_(p→1) m = b and lim_(p→0) m = a, which makes sense because if p = 1, then it takes A exactly b games to ruin B, and if p = 0, it takes B exactly a games to ruin A.
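Those limits, and the complementarity PA + PB = 1, can be checked numerically with a few lines of Python (my sketch; it simply evaluates the standard closed forms for the ruin probabilities and the mean duration, writing r for q/p):

```python
# Ruin probabilities and mean duration of gambler's ruin for p != q, with
# r = q/p: PA = r^a (r^b - 1)/(r^(a+b) - 1), PB = (r^a - 1)/(r^(a+b) - 1),
# and m = (b*PB - a*PA)/(p - q).
def ruin(a, b, p):
    q = 1 - p
    r = q / p
    PA = r ** a * (r ** b - 1) / (r ** (a + b) - 1)  # A is ruined
    PB = (r ** a - 1) / (r ** (a + b) - 1)           # B is ruined
    m = (b * PB - a * PA) / (p - q)                  # mean number of games
    return PA, PB, m

PA, PB, m = ruin(a=4, b=6, p=0.6)
print(round(PA + PB, 12))  # 1.0: one of the two players is certainly ruined
PA, PB, m = ruin(a=3, b=7, p=0.999999)
print(m)                   # approaches b = 7 as p approaches 1
```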

I.5 MONTE CARLO SIMULATION OF GAMBLER’S RUIN

De Moivre’s analysis of gambler’s ruin is devilishly clever. But what if you are not a de Moivre, and even if given several years (or even decades!) to think about gambler’s ruin, you know deep in your heart that you would never come up with his variable chip-valuation idea? What then? Are you doomed to never know the ruin probabilities, or anything about the duration of play until ruin occurs? The answers were probably yes and yes in de Moivre’s day, but not today. Today we have computers, random number generators, and sophisticated number-crunching software like MATLAB®.

The following code, called gr.m, is surprisingly brief: after being given the values of a, b, and p, it simulates the play of gambler’s ruin 100,000 times, all the while keeping track of how long each simulation lasts and which player is ruined in each simulation.

gr.m

a=input('What is a?');
b=input('What is b?');
p=input('What is p?');
LS=0;AR=0;
for loop=1:100000
    A=a;B=b;X=0;
    while A>0&&B>0
        if rand<p
            A=A+1;B=B-1;
        else
            A=A-1;B=B+1;
        end
        X=X+1;
    end
    LS=LS+X;
    if A==0
        AR=AR+1;
    end
end
LS/100000
AR/100000

The workings of gr.m are pretty straightforward. After being informed of the values of a, b, and p, the variables LS (for “length of game sequence”) and AR (for “A is ruined”) are each initialized to zero. When gr.m is finished, LS will be the total number of individual games played in 100,000 simulations of gambler’s ruin, and AR will be the total number of times it was A who was ruined during those simulations. At the start of each gambler’s ruin simulation the variables A and B are initialized to a and b, respectively, and the variable X (the current number of games played in the current gambler’s ruin simulation) is set equal to zero. An individual gambler’s ruin is played in the while loop as long as both A and B remain greater than zero. At the end of each game X is increased by one. When the while loop is terminated (because either A or B reached zero), LS is increased by X, and AR is increased by one if it was A reaching zero that triggered the while loop termination. Then another gambler’s ruin simulation is performed. The last two commands in gr.m are obvious, giving the average number of games played per gambler’s ruin simulation and A’s probability of being ruined.
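For readers without MATLAB, here is a rough Python port of gr.m (the seed and the smaller number of simulations are my choices, made so the run is fast and reproducible):

```python
import random

# Simulate gambler's ruin N times; return A's empirical ruin probability
# and the average number of games per simulation (a port of gr.m).
def simulate(a, b, p, N=20_000, seed=1):
    rng = random.Random(seed)
    a_ruined, games = 0, 0
    for _ in range(N):
        A, B, x = a, b, 0
        while A > 0 and B > 0:
            if rng.random() < p:
                A, B = A + 1, B - 1
            else:
                A, B = A - 1, B + 1
            x += 1
        games += x
        if A == 0:
            a_ruined += 1
    return a_ruined / N, games / N

PA_hat, m_hat = simulate(5, 5, 0.5)
print(PA_hat, m_hat)  # theory: PA = b/(a + b) = 0.5 and m = ab = 25
```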

Table I.5.1 shows some simulation results for PA and m (the average number of games played until A or B is ruined) for various values of a, b, and p.

In Table I.5.2 the theoretical values for m (equal to ab if p = q) and PA are shown for the same values of a, b, and p, as calculated from (I4.1), (I4.2), (I4.4), and (I4.6).

The agreement between theory and the simulation code is remarkable, but don’t think that Monte Carlo simulation is infallible. Suppose, for example, that B is infinitely rich, which means that it is simply impossible for A to ruin B. On the other hand, B could certainly ruin A if p < q. If p > q, however, it isn’t hard to show that A is not necessarily ruined even by the infinitely rich B. So, to simulate gambler’s ruin with B having arbitrarily huge wealth (with p > q) may, with non-zero probability, result in a simulation that never terminates. This is a very practical concern!

Table I.5.1. Simulation results


Table I.5.2. Theoretical values


I.6 NEWTON’S PROBABILITY PROBLEM

In the last section I mentioned that de Moivre was a friend of Newton’s. A natural question to ask now is, what did Newton think of probability? Newton was one of the world’s greatest intellects, with his contributions to science and mathematics everywhere, too numerous to list other than in a full-length biography. And yet, with just two exceptions, he ignored probability. The first exception was a collection of minor contributions to actuarial science (the theory of insurance annuities). The second came only at the prompting of another.

That prompting arrived in the form of three letters, the first dated November 22, 1693, from Samuel Pepys (1633–1703).10 Pepys is remembered today as the author of a posthumously published Diary that gives much insight into life in England in the 1660s. When his letter arrived, Newton knew nothing of Pepys’s still private Diary, but nevertheless he knew who Pepys was; during the years 1684–1686 Pepys was president of the Royal Society of London, just when Newton’s masterpiece, Principia, was being prepared for publication in 1687 by the Royal Society. Indeed, Pepys’s name, as president of the Royal Society, appears on the title page of Principia.

Just why Newton seemingly avoided probability theory is still not entirely clear to historians, but one possibility is that the only major applications (in Newton’s day) other than in insurance seemed to be those of interest to gamblers. That is, the pursuit of a “calculus of probabilities” would not be one of seeking after pure truth but rather one of mere grubby speculation. Many devout people thought, then, and think now, of gambling as both a failure of good judgment and a diversion of time from more worthy activities. Even worse, gambling is thought by such folks to be downright immoral since it explicitly substitutes for reasoning ability (given to humans by God) a submission to random chance. To gamble is to hope to get something for nothing,11 an attitude shared with lowlife thieves!

Pepys’s own Diary (in an entry dated January 1, 1668) gives us a dramatic description of a common gambling scene of the time. There we read of men “cursing and swearing” and “muttering and grumbling,” of “gentlemen drunk,” and of one man “being to throw seven if he could and failing to do it after a great many throws, [crying] he would be damned if ever he flung seven more while he lived.” It is simply impossible to imagine the pious Newton swearing, muttering, and cursing at such a coarse gathering of dice-tossing drunks.12 So in his initial letter to Newton, Pepys was quite careful to pose his question in a way acceptable to a man of faith, as “an application after truth”.

Newton at first found Pepys’s original wording of his question to be, as Newton put it, “ill stated.” Eventually, however, the two men agreed to the following formulation: of the three events A, B, and C, defined as

A = {a fair die is tossed six times and at least one 6 appears},

B = {a fair die is tossed twelve times and at least two 6s appear},

C = {a fair die is tossed eighteen times and at least three 6s appear},

which of the three events has the greatest chance of occurring?

Newton correctly calculated the chances for each event, and indeed, Galileo fifty years earlier could have solved this problem too, using just the elementary mathematics available to him (simple enumeration). What makes this problem most interesting to us is that while Newton’s mathematics is impeccable (no surprise there), at one point his logical reasoning is flawed. Oddest of all, this misstep by Newton remained unremarked upon by mathematicians and historians alike until 2006!13

First, let me go through the way a modern analyst would answer Pepys’s question; in the process we’ll derive some equations that will let us easily see the flaw in Newton’s response to Pepys. If we write p for the probability a die (fair or otherwise) shows a 6 on a single toss, then 1 − p is the probability the die doesn’t show a 6, and so PA, the probability of event A, is 1 minus the probability no 6s appear in six tosses:

PA = 1 − (1 − p)^6.

In the same way, PB, the probability of event B, is 1 minus the probability no 6s appear in twelve tosses, minus the probability exactly one 6 appears in twelve tosses:

PB = 1 − (1 − p)^12 − 12p(1 − p)^11,

or, with a little arithmetic,

PB = 1 − (1 + 11p)(1 − p)^11.

And finally, PC, the probability of event C, is 1 minus the probability no 6s appear in eighteen tosses, minus the probability exactly one 6 appears in eighteen tosses, minus the probability exactly two 6s appear in eighteen tosses:

PC = 1 − (1 − p)^18 − 18p(1 − p)^17 − 153p^2(1 − p)^16,

or, with some arithmetic,

PC = 1 − (1 + 16p + 136p^2)(1 − p)^16.

If we put p = 1/6 into these expressions we get PA = 0.6651, PB = 0.6187, and PC = 0.5973. So, for a fair die, PA > PB > PC, and this is just what Newton told Pepys. This was valuable information for Pepys because he had initially believed that event C was the most likely, and in a letter (dated February 14, 1694) to another of his mathematically minded acquaintances, Pepys revealed that his original motivation for asking his question was, in fact, a gambling concern. There he wrote that it was good he now had the actual chances for the three events as he was “upon the brink of a wager [for 10 pounds, not a trivial amount of money in 1694] upon my former belief.” Pepys did have the correct chances now, but it was clear that he still didn’t understand what Newton had told him because, in that same letter, he goes on to write that he is afraid of getting involved with future (different) bets, and makes the admission to his friend that Newton’s calculations were “beyond my depth.”

Perhaps appreciating Pepys’s lack of mathematical ability, Newton had at one point attempted to forget the detailed math and to formulate a simple argument that would let Pepys see the result PA > PB > PC in a flash. And it is with that argument that we see Newton stumble. In his first letter (dated November 26, 1693) in reply to Pepys’s original query, Newton wrote,

it appears by an easy computation that the expectation of A is greater than that of B or C, that is, the task of A is the easiest. And the reason is because A has all the chances of sixes on his [dice] for his expectation, but B and C have not all the chances on theirs. For when B throws a single six or C but one or two sixes, they miss of their expectations.

In other words, event A occurs if even a lone six appears on any toss, while both event B and event C require multiple appearances of a six.

Newton’s argument makes no mention of the die being fair, and so, if true, it must be true even for a loaded die. But that isn’t true! Specifically, let’s assume that the die is loaded to make the 6 more likely—say, p = 1/4. I’ll leave it for you to confirm, but if you plug this value for p into our three earlier equations you’ll find that PA = 0.8220, PB = 0.8416, and PC = 0.8647. So now PA < PB < PC, the reverse of the conclusion for a fair die. Newton’s misstep was in considering only the number of possibilities, and not their probability.
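Both the fair-die and loaded-die numbers are quick to reproduce; the Python sketch below (mine, not part of the original text) computes the probability of at least k sixes in 6k tosses as a binomial tail sum:

```python
from math import comb

# Probability of at least k sixes in 6k tosses of a die that shows a 6
# with probability p on each toss (1 minus the binomial probabilities of
# 0, 1, ..., k-1 sixes).
def pepys(k, p):
    n = 6 * k
    return 1 - sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k))

fair = [pepys(k, 1 / 6) for k in (1, 2, 3)]
loaded = [pepys(k, 1 / 4) for k in (1, 2, 3)]
print([round(x, 4) for x in fair])    # decreasing, as Newton found
print([round(x, 4) for x in loaded])  # increasing: the ordering reverses
```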

When, years later, de Moivre wrote the preface to the first edition of his Doctrine of Chances (dedicated, you’ll recall from note 9, to Newton), perhaps he had this affair in mind when he said,

Some of the Problems about Chance having a great appearance of Simplicity, the Mind is easily drawn into a belief, that their Solution May be attained by the meer Strength of natural good Sense; which generally proving otherwise and the Mistakes occasioned thereby being not unfrequent, ’tis presumed that a Book of this Kind, which teaches to distinguish Truth from what seems so nearly to resemble it, will be looked upon as a help to good Reasoning.

I.7 A DICE PROBLEM THAT WOULD HAVE SURPRISED NEWTON

At the risk of sounding trite, let me start this section by observing that Newton was a pretty smart fellow. There were probably not many dice puzzles around in his day that he wouldn’t have been able (eventually) to crack. In 1959, however, a very curious mathematical result was published that is the basis for a disarmingly simple-appearing dice-tossing problem that might have astonished even Newton.

Imagine that we have three independent random variables, X, Y, and Z. If you had asked mathematicians before 1959 whether it would be possible to have P (X > Y) > 1/2, P (Y > Z) > 1/2, and P (Z > X) > 1/2, almost surely a majority would have said no. That’s because even mathematicians can easily be seduced by the “naturalness” of transitivity. For example, if we denote the event “football team A outscores football team B” by A > B, then P (X > Y) > 1/2 means football team X beats football team Y most of the time, and similarly for the other two probability inequalities. Most people, including mathematicians, intuitively think that if X beats Y and Y beats Z, then surely X beats Z. In fact, that’s not so: it is possible to have three such inequalities. This result is so surprising it is called the Steinhaus and Trybuła paradox, after the two Polish mathematicians who discovered it in 1959, Hugo Steinhaus (1887–1972) and Stanislaw Trybuła (1932–2008).

Here’s how the paradox appears in the form of what are called nontransitive dice. Imagine that we have three identical, perfect cubes. Instead of the usual dots one finds on the faces of ordinary dice, however, the faces of our cubes are inscribed with the numbers 1 through 18, each number appearing on just one face. Let’s call our three cubes A, B, and C, and imagine that the numbers are distributed as follows:

A: 18, 9, 8, 7, 6, 5

B: 17, 16, 15, 4, 3, 2

C: 14, 13, 12, 11, 10, 1

By examining each of the 36 ways that A and B can fall, we see that A shows a larger number than does B in 21 of those ways: 18 beats all six of B’s faces, and 9 beats three of B’s faces, as do 8, 7, 6, and 5. If we assume A and B are fair, with each pair of faces having probability 1/36, then P(A > B) = 21/36 > 1/2. In the same way, we can verify that P(B > C) = 21/36 > 1/2 and P(C > A) = 25/36 > 1/2.
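These three counts can be verified by a brute-force enumeration of the 36 face pairs in each matchup; here is a short Python sketch (my addition):

```python
from itertools import product

# Number of the 36 equally likely (x, y) face pairs in which die x shows
# a larger number than die y.
def wins(x, y):
    return sum(1 for i, j in product(x, y) if i > j)

A = (18, 9, 8, 7, 6, 5)
B = (17, 16, 15, 4, 3, 2)
C = (14, 13, 12, 11, 10, 1)
print(wins(A, B), wins(B, C), wins(C, A))  # 21 21 25, all more than 18
```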

It’s perhaps hard to accept, but there it is, and which are you going to believe, the cold, hard math or your lying intuition? It is the mathematical equivalent of M. C. Escher’s famous Ascending and Descending (1960) and Waterfall (1961) drawings in which the paradox of people and water endlessly traveling around and around in a three-dimensional loop is illustrated. Unlike the situation in Escher’s drawings, however, our three dice actually exist.

One can easily imagine how Pepys would have drooled, dreaming of all the suckers he could have relieved of their money if only he had known of nontransitive dice! Newton, on the other hand, with his piety, would no doubt have been simply appalled at how God could have let such an outrageous thing happen.

I.8 A COIN-FLIPPING PROBLEM

As I mentioned earlier, difference equations often appear in probability, and here’s one quick example of that (a second example is given in the next section). This problem is too short to warrant a full entry in the book but also too good to leave out. (It would be very interesting to know what Newton would have done with it.) If you flip a coin n times, with probability p of heads on each independent flip, what is the probability of getting an even number of heads? This is an easy question to answer for small n; for n = 1 the answer is 1 − p, since the single flip produces a tail with that probability (and so the number of heads is zero, and that’s even), and for n = 2 the answer is (1 − p)² + p², since the double flip produces two tails (zero heads) or two heads with that probability. For larger n, however, enumeration quickly becomes overwhelming. We need a better approach.

Let P(n) be defined as the probability of the event “even number of heads in n flips.” Then 1 − P(n) is the probability of the complementary event, “odd number of heads in n flips.” Now, every sequence of heads (H) and tails (T) of length n symbols starts either with an H or a T. If it starts with a T (with probability 1 − p), then we have n − 1 symbols left with which to get an even number of heads. If it starts with an H (with probability p), then we have n − 1 symbols left with which to get an odd number of heads. So,

P(n) = (1 − p) P(n − 1) + p[1 − P(n − 1)].

That is, we have the first-order, linear, inhomogeneous difference equation

P(n)= p + (1 − 2p) P(n − 1), P(1) = 1 − p.

Notice that this says

P(1) = p + (1 − 2p) P(0) = 1 − p.

And so P(0) = 1, which makes sense. That is, if we haven’t flipped the coin at all, we are absolutely guaranteed to have zero heads—which is even!

So, how do we solve our difference equation? In general, with a constant term (the p) on the right-hand side, the general solution will have the form of a constant plus a power term. So, let’s assume

P(n) = k + Caⁿ

where k, C, and a are all constants. Substituting this assumed solution into the difference equation, we have

k + Caⁿ = p + (1 − 2p)[k + Caⁿ⁻¹].

Equating the constants on each side of this equation, and equating the power terms on each side of this equation, we have

k = p + (1 − 2p) k

and

Caⁿ = (1 − 2p) Caⁿ⁻¹.

These are easily solved to give k = 1/2 and a = 1 − 2p. Thus,

P(n) = 1/2 + C(1 − 2p)ⁿ.

Finally, using P(1) = 1 − p, we find that C = 1/2. So, the answer to our question is that

P(n) = 1/2 + (1/2)(1 − 2p)ⁿ.

For a fair coin the power term vanishes and P(n) = 1/2 for any n, while for a biased coin P(n) generally varies with n as long as p ≠ 0 (if p = 0, then P(n) = 1, always, since in that case you always get tails and so zero heads). If p = 1 (the coin always shows heads) then P(n) is 0 for odd n and 1 for even n. And for any p other than 0 or 1, P(n) → 1/2 as n → ∞.
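As a check, the closed form can be compared with a direct sum over the binomial probabilities of each even number of heads. Here is a quick sketch in Python (the function names are my own):

```python
from math import comb

def even_heads_exact(n, p):
    # sum the binomial probabilities of 0, 2, 4, ... heads
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(0, n + 1, 2))

def even_heads_formula(n, p):
    # the solution of the difference equation: 1/2 + (1/2)(1 - 2p)^n
    return 0.5 + 0.5 * (1 - 2 * p)**n

print(even_heads_formula(9, 0.1))  # about 0.5671
print(even_heads_formula(9, 0.9))  # about 0.4329
```

The two functions agree to machine precision, and the two printed values are the theoretical probabilities quoted below for the n = 9 cases.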

This particular coin-flipping problem is a random process that is easy to simulate, as the code flip.m demonstrates. The operation of the code is straightforward. Once the values of p and n are set, the variable even is initialized to zero (when the code terminates, even will equal the number of n-flip simulations, out of one million, that resulted in an even number of heads).

flip.m

p=0.1; n=9; even=0;
for loop1=1:1000000              % one million simulated games of n flips
    for loop2=1:n
        result=rand;             % uniform random number from 0 to 1
        if result<p
            f(loop2)=1;          % heads
        else
            f(loop2)=0;          % tails
        end
    end
    total=sum(f);                % the number of heads in this game
    test=2*floor(total/2);       % total rounded down to an even integer
    if total==test               % the number of heads is even
        even=even+1;
    end
end
even/1000000

A simulation of n flips of the coin is done by generating n numbers, uniformly random from 0 to 1, and storing a 1 in vector f for each number that is less than p and a 0 for each number that is greater than p. A summation of the elements of f will give the number of heads in the n flips. This value (the variable total) is tested for “evenness” by simply dividing it by two and rounding the result down (with the command floor: floor(3.3) = 3, while floor(3) = 3), and then multiplying that by two to get test. Only if the number of heads is even will total equal test. If that’s the case, then even is incremented by one, and then another n flip simulation is performed. When all one million simulations are done, the last line of flip.m prints the code’s estimate for the probability of an even number of heads. If n = 9 and p = 0.1, the theoretical probability is 0.5671, while the code’s result was 0.5669. If n = 9 and p = 0.9, the theoretical probability is 0.4329, while the code’s result was 0.4326. Pretty good agreement.

In one of the puzzles later in this book I’ll remind you of what we did here, and I think you’ll find it quite helpful.

I.9 SIMPSON’S PARADOX, RADIO-DIRECTION FINDING, AND THE SPAGHETTI PROBLEM

To end this introduction, I’ll leave “old history” and tell you three “new history” stories. The first one has to do with statistics, a very close relative of probability. Statistical issues really make no appearance in this book, which some may view as a disappointment, so perhaps this section will earn me some partial forgiveness. Imagine two hypothetical baseball players, A and B, who have just completed their first two years as professionals. Their at-bats and hitting numbers for each of those two years and their two-year combined totals are as follows (with their batting averages in bold):

[Table: each player’s at-bats, hits, and batting averages, for each of the two years and for the two years combined]

Notice that B has the higher batting average in each of the two years, but nonetheless the lower average when the two years are combined!

This sort of behavior is generally thought by most people to be quite strange; even mathematicians, who are used to seeing some pretty pathological behaviors, have tagged it as a paradox. Specifically, it is an example of Simpson’s paradox, named after the British statistician Edward Hugh Simpson (born 1922), who wrote of it in a 1951 paper (even though the Scottish statistician George Udny Yule (1871–1951) had much earlier commented on it, in 1903). Simpson’s paradox typically occurs when data sets of unequal size are combined into one large set, as is often done in medical studies. The results often are conflicting conclusions concerning the value of various medical treatments being studied, which typically leads to confusion rather than edification. My next example of Simpson’s paradox shows, quite dramatically, how that can happen.

Suppose a new, previously unknown infection has appeared for which two new drugs are being tested at two different hospitals. Hospital 1 gives the drugs to a total of 230 men, with the success/failure outcome as follows:

Drug    Success    Failure
  1        60         20
  2       100         50

With these results, drug 1 has a success rate of 60/80 = 0.75, while drug 2 has a success rate of 100/150 = 0.67.

Hospital 2 gives the drugs to a total of 160 women, with the outcomes as follows:

Drug    Success    Failure
  1        40         80
  2        10         30

With these results, drug 1 has a success rate of 40/120 = 0.33, while drug 2 has a success rate of 10/40 = 0.25.

Both hospitals therefore reach the same conclusion: drug 1 is the better drug. But is it?

Suppose we combine the two tests, to have a total of 390 people. With drug 1 we have 100 successes and 100 failures (a success rate of 100/200 = 0.5), and with drug 2 we have 110 successes and 80 failures (a success rate of 110/190 = 0.579). Now drug 2 appears to be the better drug. What should a person conclude? Take two aspirin and go to bed!
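The arithmetic of the reversal is easy to reproduce. Here is a minimal Python sketch using the counts from the two tables (the variable names are my own):

```python
def rate(successes, failures):
    return successes / (successes + failures)

# (successes, failures) for each drug at each hospital, from the text
hospital1 = {1: (60, 20), 2: (100, 50)}   # 230 men
hospital2 = {1: (40, 80), 2: (10, 30)}    # 160 women

for drug in (1, 2):
    s = hospital1[drug][0] + hospital2[drug][0]
    f = hospital1[drug][1] + hospital2[drug][1]
    # drug 1 wins at each hospital separately, drug 2 wins combined
    print(drug, round(rate(*hospital1[drug]), 2),
          round(rate(*hospital2[drug]), 2), round(rate(s, f), 3))
```

The reversal happens because drug 1 was mostly given to the group (the men) with the higher overall success rate, so the unequal group sizes dominate the combined totals.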

For my second story, I’ll start by reminding you of a scene we’ve all seen at least once in some action/drama movie of World War II. The scene opens in a tiny room barely lit by the flame of a flickering candle—perhaps it’s a basement closet or an attic—with a man hunched over a radio transmitter Morse code key. He is rapidly sending an encrypted message from the underground resistance in (pick a city, pick a country) to incoming British SAS commandos concerning the sudden shift in the whereabouts of some important Nazi official who has been flagged for “termination with extreme prejudice.”

Suddenly the scene shifts to the local Gestapo headquarters, and we hear the excited yelling of a man wearing headphones who has just picked up the underground’s radio signal. His boss, a brutal-looking SS colonel wearing a suit and black gloves, strides over to a detailed street map of the city that is hanging on a wall beneath portraits of Adolf Hitler and Heinrich Himmler. “Sergeant,” he barks at the radio man, “tell the hunter-vans to give me fixes. Quickly!”

The scene shifts again, to a speeding, black Gestapo radio intercept van, and we see it suddenly pull over and stop, and then a circular antenna on the roof begins to slowly rotate. There is a close-up shot of the antenna’s angular-bearing dial inside the van. We listen in as the sergeant back at Gestapo headquarters receives the report, “Intercept bearing from (pick a street) of 72 degrees.” He tells the colonel. The colonel, still in front of his map, grunts an acknowledgment, then pins one end of a thin red thread to the map at the van’s location and stretches it tight across the map at an angle of 72 degrees. Then a second radio intercept van reports in with a new angle from a different street location, and a second thread is similarly pinned to the map. The colonel taps the map where the threads intersect, chuckles evilly, and says, slowly and ominously, “Now we have you.”

The final scene is of a troop carrier roaring up to a building, discharging two dozen soldiers, each armed with a nasty-looking MP-40 machine pistol. Things look pretty bleak for the hero.

Would this little drama actually play out as I’ve described? Sort of, but not exactly. In Figure I.9.1 the point P is the actual location of the underground’s transmitter. When the first intercept van reports in, small but unavoidable errors in both the van’s location and the measured bearing angle will result in the colonel’s first thread (shown in the figure as line A) almost certainly not passing through P but rather having some small displacement. When the second van reports, the same situation results, giving line B also not passing through P, and so the intersection point of A and B is almost certainly not P. What might be done is to have a third van report, giving line C in the figure, a line similarly displaced from P. The Gestapo therefore hasn’t located P exactly, but it does seem to have at least put P somewhere inside the small triangle abc. That would mean soldiers could certainly cordon off that entire area and then begin a detailed search inside the triangle.


Figure I.9.1. The geometry of radio-direction finding.

But is P really inside the triangle? Maybe, but it’s actually more likely that P isn’t there! This was shown in a 1946 paper by the physicist Samuel Goudsmit (1902–1978), who presented a very simple probability argument to show the surprising result that there is “a three-to-one chance that the true position is outside” the triangle.14 Here’s how Goudsmit reasoned.

As shown in the figure, unavoidable errors have caused A to fall to the right of P. There is an equal probability that B falls on the right or the left of P (I’ve drawn it to the left). The position of C shows that P can be inside the triangle only if B passes P on the left; the probability of that is 1/2. And finally, when C is drawn you can see that P is inside the triangle only if C passes above P. That has, again, probability 1/2. Thus, the probability that P is inside the triangle is 1/4, and so the probability is 3/4 that P is actually outside the triangle, three times the probability that it is inside. Maybe our underground radio operator survives after all!
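Goudsmit’s 1/4 is easy to test numerically. The sketch below (in Python; all the names, and the particular bearing angles, are my own choices) drops three bearing lines with fixed directions past a transmitter at the origin, each displaced sideways by an independent, zero-mean Gaussian error, and counts how often the origin lands inside the resulting triangle:

```python
import math
import random

def inside_fraction(trials=100_000, seed=1):
    random.seed(seed)
    # three fixed bearing-line directions (arbitrary choices)
    angles = [0.2, 1.3, 2.5]
    normals = [(-math.sin(a), math.cos(a)) for a in angles]

    def cross(p, q):
        # cross product of (q - p) with (origin - p): its sign tells
        # which side of edge pq the origin is on
        return (q[0] - p[0]) * (-p[1]) - (q[1] - p[1]) * (-p[0])

    inside = 0
    for _ in range(trials):
        # line i is the set of points X with dot(X, n_i) = eps_i, so each
        # line misses the true position (the origin) by a random amount
        eps = [random.gauss(0.0, 1.0) for _ in range(3)]
        verts = []
        for i, j in ((0, 1), (1, 2), (2, 0)):
            (a1, b1), (a2, b2) = normals[i], normals[j]
            det = a1 * b2 - a2 * b1
            verts.append(((eps[i] * b2 - eps[j] * b1) / det,
                          (a1 * eps[j] - a2 * eps[i]) / det))
        sides = [cross(verts[k], verts[(k + 1) % 3]) for k in range(3)]
        if all(s > 0 for s in sides) or all(s < 0 for s in sides):
            inside += 1
    return inside / trials

print(inside_fraction())  # close to 0.25
```

The fraction comes out very close to 1/4, in agreement with Goudsmit’s argument.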

My final tale is of a recently solved probability question with a most surprising answer. Once again, a difference equation will be the key to our solution. Imagine n strands of spaghetti in a boiling pot of water. At all times, all you can see are the free ends of the spaghetti strands sticking up through the water’s surface. You choose two of the visible ends at random, tie them together (this is a pure math problem, so realism isn’t a top-priority issue here), and drop the connection back into the water (where it is not visible). You keep doing this until no more free ends are visible. You then pour the pot of water through a strainer. How many spaghetti loops do you expect to find in the strainer? If we call the answer E(n), then obviously E(1) = 1. If you do a little exhaustive enumeration, you should be able to convince yourself that E(2) = 4/3. For n > 2, however, enumeration becomes very difficult, very quickly (just for fun, try your hand at the n = 3 case). So, let’s derive a theoretical expression for E(n) and evaluate it for n = 100, for n = 1,000, and for n = 10,000. It should be physically obvious that E(n) → ∞ as n → ∞, but I think you’ll be surprised by how small the results are for even large n.

This problem and its solution have an interesting origin. In December 2009 I flew out to Monterey, California, to give a talk at a math conference. Just before the talk I came across the spaghetti problem (posed as possibly an unresolved problem) in Steve Strogatz’s new book, The Calculus of Friendship (Princeton 2009), and I made passing mention of it during my presentation. Two days after I returned home, I received the following elegantly simple solution in an e-mail from Matt Davis (on the math faculty at Chabot College), who had been in the audience.15

Starting with n strands of spaghetti, there are 2n ends sticking up through the water’s surface. Randomly choosing one of those ends, we then have 2n − 1 ends remaining for the second choice. There are now two possibilities. First, with probability 1/(2n − 1), the second choice is the other end of the same strand as the first choice. We then have a completed loop, and so n − 1 strands are left in the pot. With this outcome we have the expected number of loops as 1 + E(n − 1). The second possibility, with probability (2n − 2)/(2n − 1), is that the second choice is an end from a strand different from the first strand. After we tie this second choice to the first choice we simply have n − 1 pieces of spaghetti in the pot (one is now longer than all the others, but that’s irrelevant—all we see are the ends). So, with this outcome we have the expected number of loops as E(n − 1).

Weighting each of these two expectations of the number of loops by their probabilities to get the overall expected number of loops, we (remembering that E(1) = 1) arrive at the nonlinear difference equation

E(n) = (1/(2n − 1))[1 + E(n − 1)] + ((2n − 2)/(2n − 1)) E(n − 1)

or

E(n) = E(n − 1) + 1/(2n − 1), E(1) = 1.

Even though nonlinear, this is actually easier to solve than was the linear difference equation in the previous section’s coin-flipping problem. So,

E(2) = E(1) + 1/3 = 1 + 1/3 = 4/3, E(3) = E(2) + 1/5 = 1 + 1/3 + 1/5 = 23/15,

and so on. In general, E(n) = 1 + 1/3 + 1/5 + ⋯ + 1/(2n − 1). Thus, E(n) grows even more slowly than do the partial sums of the harmonic series, which themselves increase very slowly. Specifically, E(100) = 3.28, E(1,000) = 4.44, and E(10,000) = 5.59, numbers that I personally find counterintuitive in their “smallness.” Because of that, in fact, the spaghetti problem is a good example of a well-defined random physical process that is not a good candidate for study by computer simulation. A change in E(n), even for a “big” change in n, will be swamped by statistical sampling fluctuations.
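With the general formula in hand, the quoted values are a one-line computation. Here is a Python sketch (iterating the difference equation rather than simulating the pot; the function name is my own):

```python
def expected_loops(n):
    # E(1) = 1; E(n) = E(n-1) + 1/(2n - 1),
    # i.e., E(n) = 1 + 1/3 + 1/5 + ... + 1/(2n - 1)
    e = 1.0
    for k in range(2, n + 1):
        e += 1.0 / (2 * k - 1)
    return e

for n in (2, 100, 1000, 10000):
    print(n, round(expected_loops(n), 2))  # 1.33, 3.28, 4.44, 5.59
```

Multiplying n by ten adds only about (ln 10)/2 ≈ 1.15 expected loops, which is why even n = 10,000 strands yield fewer than six loops on average.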

Nearly a hundred years ago an out-of-sorts physicist grumbled in a paper that “Mathematicians of today are like moles, each in his little burrow finding his way about apparently by a sixth sense and turning up [his own] rubbish heap.” It’s easy to understand that frustration, as so much of modern math is understandable only to specialists who alone can appreciate what their subject is even about. There will be no doubt in your mind about what the following probability problems are about, however, and I think even that long-ago physicist would have had fun with what is in this book. Enjoy!

NOTES AND REFERENCES

1. The relationship between Gombaud and Pascal is often misstated in probability textbooks, as is carefully and entertainingly discussed in Oystein Ore, “Pascal and the Invention of Probability Theory” (American Mathematical Monthly, May 1960, pp. 409–419). Pascal is famous in the philosophical community for the so-called Pascal’s wager, which appeared in his posthumously published (1669) Pensées (Thoughts). This was his use of mathematical expectation to conclude it was in the interest of even otherwise confirmed atheists to proclaim a belief in God. This argument does have the glaring flaw of ignoring the fact that an omniscient being would certainly know of the spiritual emptiness of the proclamation! You can find more on Cardano in Ore’s book, Cardano, the Gambling Scholar (Princeton, N.J.: Princeton University Press, 1953).

2. Oscar Sheynin, “Stochastic Thinking in the Bible and the Talmud” (Annals of Science, April 1998, pp. 185–198). The Bible contains many references to lotteries in both the Old and the New Testaments. For example, when a choice had to be made between Barsabas and Matthias as the successor to the apostle Judas Iscariot, it wasn’t done by direction from God but instead by lot (Acts 1:23–26). An even more dramatic example is found in the four gospels, which are not infrequently in conflict. On one point, however, they do agree: when the Roman soldiers present at the Crucifixion divided the garments of Jesus, they did so by lot (Matthew 27:35, Mark 15:24, Luke 23:34, and John 19:23–24).

3. This assumption is a circular one, as it implicitly assumes that one already knows what a probability is even as we use it to define probability! Nonetheless, it is in that famous category of “you know what I mean,” and so the circular objection is usually taken seriously only by the purest of the pure.

4. A.W.F. Edwards, “Pascal and the Problem of Points” (International Statistical Review, December 1982, pp. 259–266).

5. A.W.F. Edwards, “Pascal’s Problem: The ‘Gambler’s Ruin’” (International Statistical Review, April 1983, pp. 73–79). This problem is equivalent to another with a less provocative, more serious name: the “random walk with two absorbing barriers.”

6. Paul J. Nahin, Mrs. Perkins’s Electric Quilt and Other Intriguing Stories of Mathematical Physics (Princeton, N.J.: Princeton University Press, 2009, pp. 241–245). My difference equation discussion is for a special case; for a more general difference equation analysis see volume 1 of William Feller, An Introduction to Probability Theory and Its Applications, 3rd ed. (New York: John Wiley, 1968, pp. 344–349).

7. Eddie Shoesmith, “Huygens’s Solution to the Gambler’s Ruin Problem” (Historia Mathematica, May 1986, pp. 157–164).

8. A. R. Thatcher, “A Note on the Early Solutions of the Problem of Duration of Play” (Biometrika, December 1957, pp. 515–518).

9. The usual textbook derivation of the ruin probabilities is the one that appears in the famous, posthumously published 1713 book, Ars conjectandi (The Art of Conjecturing), by James (aka Jacob) Bernoulli (1654–1705), using the difference equation approach (see note 6 again). De Moivre’s derivation was actually in print first, however, appearing in the essay “De Mensura Sortis” (“On the Measurement of Chance”) in a 1711 issue of the Philosophical Transactions of the Royal Society of London, and then again in de Moivre’s 1718 book, Doctrine of Chances. (The first edition of Doctrine is dedicated to Newton.) De Moivre’s work gets a nice tutorial review in the paper by the Danish mathematician Anders Hald (1913–2007), “A. de Moivre: ‘De Mensura Sortis’ or ‘On the Measurement of Chance,’ ” International Statistical Review, December 1984, pp. 229–262.

10. You can find these letters reprinted in Florence N. David, “Mr. Newton, Mr. Pepys & DYSE: A Historical Note” (Annals of Science, September 1957, pp. 137–147), and Emil D. Schell, “Samuel Pepys, Isaac Newton, and Probability” (American Statistician, October 1960, pp. 27–30). If that “DYSE” in David’s Annals of Science title is puzzling you, that’s part of the reason why it’s not always easy to read letters that are hundreds of years old—that’s the old-fashioned spelling for “dice.” David (1909–1993), late professor of statistics at the University of California, Berkeley, wrote a very nice book-length history (up to the time of Newton) of probability in 1962 that is well worth the effort to locate today (Games, Gods and Chance).

11. There is a vivid example of what I mean by this in Lorraine Daston, Classical Probability in the Enlightenment (Princeton, N.J.: Princeton University Press, 1988, p. 160), which describes the “motley following” of lotteries: “one stream of Coachmen, Footmen, Prentice Boys, and Servant Wenches flowing one way, with wonderful hopes of getting an estate for three pence. Knights, Esquires, Gentlemen and Traders, Marry’d Ladies, Virgin Madams, Jilts, etc.; moving on Foot, in Sedans, Chariots, and Coaches another way; with a pleasing Expectation of getting Six Hundred a Year for a Crown.” The original letter Pepys wrote to Newton was actually prompted by Pepys’s involvement with such a speculator, John Smith, who was writing master at Christ’s Hospital in 1693. It was Smith who originally formulated the question, and when he asked Pepys (who was a governor of the hospital) about it, Pepys in turn asked Newton. You can find more on this in the paper “John Smith’s Problem,” by T. W. Chaundy and J. E. Bullard (Mathematical Gazette, December 1960, pp. 253–260).

12. Gambling in Newton’s day did have the interesting feature of bringing together, at least for a while, all manner of loose, shady characters from vastly different walks of life and classes of society. As Pepys reported in his Diary (in an entry dated December 21, 1663), while watching a cockfight, “Lord! To see the strange variety of people, from [a] Parliament man … to the poorest ’prentices, bakers, brewers, butchers, draymen, and what not; and all these fellows one with another cursing and betting.”

13. Stephen M. Stigler, “Isaac Newton as a Probabilist” (Statistical Science, August 2006, pp. 400–403).

14. S. A. Goudsmit, “Accuracy of Position Finding Using Three or Four Lines of Position” (Navigation, June 1946, pp. 34–35). An editorial introduction to this paper told readers that Goudsmit was in charge of the theoretical group at the MIT Radiation Laboratory (working on radar) “during the early part of the war, until a confidential mission took him to the European theater.” Nothing more was said, but it was later revealed that Goudsmit’s mission was as scientific head of Alsos, the cover name for a group of analysts who immediately followed the Normandy invasion forces after the D-Day landings on the coast of France in an attempt to discover the details of the German atom bomb project. (This seemingly odd name was actually an inside joke: alsos is the Greek word for groves, in honor of General Leslie Groves, who was head of the American atom bomb project!)

15. The original statement of the spaghetti problem (with strings in a box instead of spaghetti strands in a pot), and its solution, was made by two Australian electrical engineers: see their paper, “How Many Loops in the Box?” (Mathematical Gazette, March 1998, pp. 115–118), by Dushy Tissainayagam and Brendan Owen.
