2

The Twins

2.1 THE PROBLEM

In February 2008 I received a very interesting e-mail from Bruce C. Taylor, a professor of biomedical engineering at the University of Akron. Bruce had just been reading my book, Duelling Idiots (Princeton 2002), and that prompted him to write to me. Here’s what Bruce wrote:

I have an interesting probability problem that I have not been able to solve and I am just curious to see if you can come up with a solution. The problem came up when in one of our classes here I was assigning lab groups using a random number generator. As it turns out the class had 20 students, two of whom were related (twin sisters). Well, as luck would have it, the two sisters ended up in the same lab group of four. I had divided the class into five groups of four students. I, and a colleague, got to wondering what was the probability that the two sisters would end up in the same group. I originally thought that this would be a trivial problem but so far it has beaten me. I did write a MATLAB® program to solve the problem via a probabilistic model and I came up with a probability of 0.16 after 100,000 repetitions. I think that this is the correct answer but I can’t, for the life of me, arrive anywhere near the same answer analytically. I thought maybe you’d like to take a crack at it.

Well, who could resist that?

After a bit of thought I did arrive at a theoretical result, a rational fraction approximately equal to 0.1579, and so I wrote back to Bruce to ask, “You said the [Monte Carlo] estimate was 0.16. Was it actually somewhat less?” Back came Bruce’s response: “I ran the simulation three times at 100,000 reps. each and came up with the following: (1) 0.1591, (2) 0.1570, (3) 0.1557.” Not too bad an agreement with my fraction. I then wrote my own MATLAB® simulation code, ran it for ten million repetitions, and got an estimate of 0.1579092, an even better agreement with my theoretical fraction.

2.2 THEORETICAL ANALYSIS

To theoretically derive the answer to Bruce’s question, here’s what I sent him, where Images is, as in the first problem, the binomial coefficient x!/(xy)!y!, with x and y both non-negative integers and yx.

First, to find the total number of ways (TNW) to randomly place 20 students into 5 groups of 4 each, imagine 5 bins. In the first bin we place 4 from 20, then 4 from the remaining 16 in the second bin, then 4 from the remaining 12 in the third bin, and so on. Thus, Images.

Next, to find the total number of ways that the twins are together (TNWTT) in the same bin, we first imagine that the twins are glued together. When we select a twin, we automatically select the other one, too. There are 5 ways to place the glued twins into one of the bins, leaving 18 students. There are Images ways to select the 2 students who join the twins, leaving 16 students. We then finish the analysis as before, that is Images. The probability we are after is

Images

Now, as easy as the above analysis may appear, an early reviewer of this book (Nick Hobson) pointed out to me that there is an even easier way to see the result in a flash. A total of 20 lab slots are to be filled, with 4 slots in each lab section. One of the twins, of course, has to be in some lab section, leaving 3 slots in that section still available out of the 19 total slots that are still available. So, the probability that our second twin gets one of those 3 slots (and so joins her sister) is 3/19. That’s it!

2.3 COMPUTER SIMULATION

To write a Monte Carlo simulation, I found the following imagery helpful. (I wrote my simulation code before receiving Nick’s clever observation, so perhaps there is a better way to simulate—I’ll leave that for you to explore!) I started by visualizing the 20 students lined up in front of me in some (random) order, standing in a row, shoulder to shoulder. Each holds a slip of paper. These slips each have a single number on them; there’s a 2 on each twin’s slip, while all the other students have a 1 on their slips. Starting at the far left (student 1), the first four students are assigned to lab section 1, the next four students to lab section 2, and so on, with students 17 through 20 assigned to lab section 5. To simulate the placement of the twins into their lab sections, all we need do is randomly generate two different integers from 1 to 20, integers that determine the positions where the twins stand in the shoulder-to-shoulder row.

The simulation code can determine if the two twins have been assigned to the same lab section by simply adding up the numbers, in each lab section, on the paper slips held by the students in that section. If a lab section has neither twin, the group sum will be 4, while if a lab section has one twin, the group sum will be 5. A group sum of 6, however, means we have a lab section that contains both twins. This is the decision logic behind the simulation code twins.m. I make no claims that twins.m is a superoptimal (in some sense) code, just that it is easily understood and executes in a reasonably short time (ten million repetitions on my quite ordinary, bottom-of-the-line computer required less than 23 seconds to run). After the code listing, I’ll give you a quick walkthrough of what each line is doing (the line numbers at the far left are not part of the code but are included simply as reference tags for the walkthrough).

twins.m

01

together=0;

02

for loop1=1:10000000

03

lab=ones(1,20);

04

twin1=floor(20*rand)+1;

05

twin2=twin1;

06

while twin1==twin2

07

twin2=floor(20*rand)+1;

08

end

09

lab(twin1)=2;

10

lab(twin2)=2;

11

groupsum=zeros(1,5);

12

for loop2=1:5

13

x=4*(loop2-1);

14

for loop3=1:4

15

groupsum(loop2)=groupsum(loop2)+lab(x+loop3);

16

end

17

end

18

for loop4=1:5

19

if groupsum(loop4)==6

20

together=together+1;

21

end

22

end

23

end

24

together/10000000

Line 01 initializes the variable together to zero; at the end of ten million simulations together will be the number of simulations in which the twins were assigned to the same lab section. Lines 02 and 23 define the outer for/end loop that cycles the code through the ten million simulations. Line 03 defines the row vector lab, with all of its 20 elements initially equal to 1. The value lab(k) is the number written on the slip of paper held by the student in the kth row position. Initially, then, all 20 students have a 1 on their individual slips of paper. Line 04 assigns twin1 equal to an integer value selected at random from 1 to 20, and line 05 assigns the same integer to twin2. Since the two twins can’t, of course, have the same position in lab, lines 06 through 08 then continually assign twin2 a new random integer value until twin1 and twin2 have different integer values. Lines 09 and 10 write a 2 on the slip of paper each twin holds, leaving the other 18 students holding slips of paper each with a 1. Line 11 initializes all five elements of the row vector groupsum to zero. The two nested loops defined by lines 12 through 17 run through the 20 elements of lab, four at a time, from left to right, and generate the five element values of groupsum. Finally, the two nested loops defined by lines 18 through 22 check each element of groupsum and, if a value of 6 is detected (indicating both twins are in the same section), then together is incremented by one. Once the ten million simulations are finished, line 24 prints the code’s estimate of the probability of the twins being in the same lab section (0.1579092), an estimate very close to the theoretical value.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset