CHAPTER THREE

DIVERSITY BONUSES: THE LOGIC

He who loves practice without theory is like the sailor who boards a ship without a rudder and compass and never knows where he may cast.

LEONARDO DA VINCI

I now turn to my primary aim: to explain the logic of diversity bonuses. I do so by applying the formal characterization of cognitive repertoires to predicting, creating and innovating, problem solving, integrating knowledge, and strategic decision making. I first analyze the contributions of diversity to each activity separately and then describe how many decisions and actions require a combination of these activities. A team in charge of building a new airport terminal must predict demand, evaluate designs, and choose among locations. Each task will benefit from different types of diversity.

My central finding will be that diversity bonuses become more likely and more significant on complex tasks. Complexity increases in a problem’s dimensionality and interdependencies. On complex tasks, no single person’s repertoire will be sufficient, so teams will be needed, and those teams must be diverse.

The claim that we need teams is noncontroversial and backed up by mountains of data. However, the necessity of teams alone need not imply a diversity bonus. The best team could consist of the best individuals. To show the contribution of diversity in those teams, I rely on a collection of models. The models reveal mechanistic logic. In some cases, the models produce testable hypotheses. When data align with those hypotheses, we do not prove the models to be true so much as we demonstrate a resonance of the logic of the models with the real world.1

The models also provide guidance about whom to hire, admit, or include. Should Hollywood include more Muslims? Should colleges allocate spots to Native Americans? Do we make sure every academic conference panel includes one woman? To address those questions we must understand the causal mechanisms that can make two (different) heads better than one.

The field of chemistry provides an apt analogy. Diverse elements combine to form useful compounds like water, salt, sugar, baking soda, and bleach. These compounds exhibit properties distinct from those of their constituent elements. Chemists attempting to create new compounds, whether for use in household cleaning solutions or as lifesaving molecules, do not randomly drop chemicals in a tube, stir, and hope for a miracle. They select chemical components based on understandings of their properties. Theory tells them where to look and keeps them from wasting time trying to turn base metals into gold. Theory provides rudder and compass.

To explain how diversity bonuses arise requires connecting cognitive repertoires to outcomes on specific tasks and then examining whether repertoire diversity improves outcomes. For any task or activity, the relevant parts of the repertoire differ. In predicting, a person’s information, categorizations of that information, and models determine her individual accuracy. In innovating, diverse knowledge, representations, and heuristics result in more ideas. In problem solving, people also apply representations and heuristics. In knowledge integration, people leverage their knowledge sets. And in strategic play, people apply representations and heuristics, along with models of how competitors will respond.

For each of these tasks, how and why diversity bonuses arise varies. In prediction, diversity produces bonuses through negatively correlated errors. In problem solving, heuristics build off one another. Where one person gets stuck, another person can find an improvement. On creative tasks, diverse representations create more possibilities. In knowledge integration, diverse understandings reduce the set of possible truths. In strategic play, diversity means less correlation in mistakes and better choices.

PREDICTIVE TASKS

In prediction, a group estimates a numerical value or range of possible values. These might be a future event (next year’s unemployment rate), the result of an intervention (the effect of a regimen of low-dosage anti-inflammatories on a patient’s heart rate), or a backcast of a historical event (the height of the ash plume from the Mount Vesuvius eruption in AD 79).2 Inferring hidden values—such as the relative proportions of luck and skill in soccer and tennis—is also a form of prediction.3

Governments, nonprofits, and businesses devote billions of dollars to making predictions. NASA scientists and the Intergovernmental Panel on Climate Change predict the climatic effects of greenhouse gases using environmental data. The Federal Open Market Committee predicts future inflation, growth, and unemployment rates using economic data. The Ford Motor Company predicts vehicle sales based on attributes of its fleet and those of its competitors. Google predicts the number of click-throughs based on the size of banner ads. These predictions inform actions. They assist NASA in setting thresholds for particulates, the Federal Open Market Committee in deciding on interest rates, Ford in determining production targets for cars, and Google in configuring ads.

People make predictions in a variety of ways. Some predictions derive from sophisticated causal models. Einstein’s general theory of relativity predicted that light would bend in a gravitational field. During the 1919 solar eclipse, Arthur Eddington and Frank Watson Dyson confirmed that prediction by measuring the deflection of starlight at two distant spots on the earth.

Scientific models like Einstein’s predict and explain. Such models represent an ideal. More often, we lack rich theories and make predictions that extrapolate from past patterns. Machine learning algorithms make predictions in this way. They identify patterns based on training data and test those patterns on other data.

These algorithms bin the data into multiple categories and identify patterns. The internal logic of the predictive models is impenetrable to the human mind. We cannot know why the algorithms predict what they do. We may not care. When we wake in the morning and see a 90 percent chance of rain in the afternoon, we do not care about the details of the meteorological model. We care about the forecast’s accuracy. We want to know whether to pack an umbrella.4

Similar thinking applies to predictions of surgical interventions. If a machine learning algorithm applied to tens of thousands of cases finds that older women who receive titanium hip replacements and perform mild exercise at least three times daily exhibit fewer long-term complications than those who do not, you should tell your aunt Deloris to exercise. And she should listen to you, even though the algorithm provides no reason.

All else equal, we would prefer to have an explanation and an understanding of a phenomenon. If doctors knew the determinants of hip replacement success, perhaps increased tendon flexibility, they could develop better rehabilitation protocols. Nevertheless, the millions of people receiving hip implants, including your aunt Deloris, benefit now from the accurate prediction. They need not wait for an understanding of why exercise improves outcomes. The same pragmatism applies to predictions of the recidivism of criminals, the success of a new sales hire, or the length of an economic recession. We can take useful actions with accurate predictions even if we lack a causal understanding.5

Machine learning techniques require lots of data. When we have less data, we use people and simple models. For easier predictions, a single person or model may suffice. We do not need a diverse team to predict tomorrow’s sunrise in Anchorage, Alaska. Diversity begins to matter and becomes significant on difficult predictions, where no one model will be correct. In those cases, predictions based on different information, categories, and models will make different errors. These less correlated errors produce a more accurate collective prediction. In the ideal case, to borrow Richard Levins’s phrasing, the truth will lie “at the intersection of our independent lies.”

That said, a diverse group offers no guarantee of perfect accuracy. However, as I show next, diversity does improve accuracy.

The Diversity Prediction Theorem

Diversity bonuses in prediction can be quantified with a mathematical identity: the diversity prediction theorem.6 That identity reveals that ability (measured by accuracy) and diversity contribute equally to collective accuracy. Thus, the best predictive groups will include both accurate and diverse predictors.

The formal statement of the diversity prediction theorem requires four statistics calculated from the predictions of a group of people or models.7

Prediction Error: The square of the difference between a prediction and the true value.

Average Error: The average of the group members’ prediction errors.

Collective Error: The prediction error of the average of the group members’ predictions.

Predictive Diversity: The variance of the group members’ predictions.

A low prediction error corresponds to high ability. A group with a lower average error has higher average ability. Collective error corresponds to the ability of the group. Finally, predictive diversity corresponds to the variation in the predictions.

The diversity prediction theorem states that collective error equals average error minus predictive diversity (see box). In other words, the accuracy of the group depends equally on the average accuracy of its members and their collective diversity. No tradeoff between diversity and ability exists. Accurate collective predictions depend on both.

The diversity prediction theorem describes a mathematical identity. It holds true for any collection of predictions. Those could be guesses of the number of jelly beans in a jar by third graders at Angell Elementary School or predictions of interest rates by economists working for the United States Federal Reserve.8

Diversity Prediction Theorem

Collective Error = Average Error − Predictive Diversity

(Group Ability = Average Ability + Diversity)
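Readers who want to check the identity can do so in a few lines of code. The sketch below, in Python, computes the four statistics for an invented set of predictions (the guesses and the true value are illustrative only) and confirms that collective error equals average error minus predictive diversity.

```python
# A minimal check of the diversity prediction theorem.
# The predictions and the true value are invented for illustration.

def collective_error(predictions, truth):
    """Squared error of the average (collective) prediction."""
    crowd = sum(predictions) / len(predictions)
    return (crowd - truth) ** 2

def average_error(predictions, truth):
    """Average of the individual squared errors."""
    return sum((p - truth) ** 2 for p in predictions) / len(predictions)

def predictive_diversity(predictions):
    """Variance of the predictions around their own average."""
    crowd = sum(predictions) / len(predictions)
    return sum((p - crowd) ** 2 for p in predictions) / len(predictions)

predictions = [90, 140, 160, 200, 60]   # hypothetical guesses
truth = 120                             # hypothetical true value

ce = collective_error(predictions, truth)
ae = average_error(predictions, truth)
pd = predictive_diversity(predictions)

print(ce, ae, pd)                       # 100.0 2580.0 2480.0
assert abs(ce - (ae - pd)) < 1e-9       # the identity holds for any predictions
```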

Numbers from a real case make the logic clearer. I once asked sixty students to guess the number of ridges on a US quarter. The correct answer is 120. The average of the predictions was a little over 137. Some guessed fewer than 50 ridges. Others guessed more than 250. Plugging their predictions into the diversity prediction theorem produced the following equation:

US Quarter Predictions

Collective Error (296) = Average Error (6726) − Predictive Diversity (6430)

In interpreting these numbers, recall that we measure errors as squared differences. The collective prediction missed the mark by a little more than 17 ridges (17² ≈ 296), while the typical individual missed by more than 80 (82² ≈ 6,726). The average error far exceeds the collective error because the predictions were diverse. Some students predicted above 120, while others predicted below 120. The high and low guesses cancel out.

The same students predicted the length in miles of the London Tube. In this case, the collective prediction (249 miles) came within one mile of the true value (250 miles), even though most students individually made wildly inaccurate predictions, as shown in the equation below:

London Tube Predictions

Collective Error (1) = Average Error (63,698) − Predictive Diversity (63,697)

In this case, most students guessed fewer than 200 miles, while two people guessed over 750 miles. Each of those two people produced squared errors in excess of 250,000. These two students knew that the Tube covered most of London and overestimated. Most of the other students had not been to London and assumed limited public transportation systems.

From these two examples, it might appear that the diversity prediction theorem guarantees a diversity bonus. That is not true. Adding an inaccurate prediction can increase average error by more than it increases predictive diversity. If so, the net accuracy falls.9

To see how adding wildly inaccurate predictions can make the group less accurate, consider the extreme case of an easy prediction in which each person makes an accurate prediction. Collective error, average error, and diversity all equal zero. Adding an inaccurate, and, by definition, diverse prediction, makes the collective prediction incorrect. Though the bad prediction increases diversity, it increases average error even more.10
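A few lines of arithmetic, again with invented numbers, make the point concrete: start with a group whose members all predict perfectly and then add one wild prediction.

```python
# Invented numbers: adding one wildly inaccurate prediction raises average
# error by more than it raises diversity, so collective error increases.

def stats(predictions, truth):
    crowd = sum(predictions) / len(predictions)
    collective = (crowd - truth) ** 2
    average = sum((p - truth) ** 2 for p in predictions) / len(predictions)
    diversity = sum((p - crowd) ** 2 for p in predictions) / len(predictions)
    return collective, average, diversity

truth = 100
perfect_group = [100, 100, 100]        # everyone exactly right
with_outlier = perfect_group + [200]   # add one bad (but diverse) prediction

print(stats(perfect_group, truth))     # (0.0, 0.0, 0.0)
print(stats(with_outlier, truth))      # (625.0, 2500.0, 1875.0)
```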

The diversity prediction theorem has three corollaries that bear keeping in mind. The first corollary states that crowd error cannot be larger than average error. This holds because diversity cannot be negative. In the students’ predictions of the length of the London Tube, the crowd error was substantially less than average error. A group that lacks diversity will not be much more accurate than any one predictor. Large predictive error will therefore be more likely among groups that think alike. To the same extent that the wisdom of crowds depends on diversity, the madness of crowds depends on a lack of it. A price bubble depends on inaccurate price estimates that all err in the same direction.

Three Features of Predictive Groups

  (i). A diverse group will always be more accurate than the average of its members.

 (ii). A group’s accuracy depends in equal parts on the average accuracy of its members and their diversity.

(iii). Adding a more accurate predictor or a more diverse predictor can make a group more accurate.

The idea behind the second corollary—that the collective will be more accurate because of diversity—is not new. It goes back to at least Aristotle and was echoed and emended by Friedrich A. Hayek in the context of the economy.11 The fact that accuracy and diversity contribute equally follows from the algebra. That equal weighting leads to counterintuitive insights.

Imagine that we had to select three students out of a class of twenty to make a group prediction. Assume that we know the accuracy of each student on past predictions and we know the diversity for all groups of three students.

Our first intuition might be to select the three most accurate students. That group would have the lowest average error. Recall, though, that the group’s error depends in equal measure on its diversity. By choosing the group with the smallest average error, we ignore diversity. We ignore half of the equation. If the three most accurate people all think the same way, then we have no diversity. Selecting the group of three people with the largest diversity makes even less sense. Because diversity can never exceed average error (collective error cannot be negative), a group with high diversity must have an even higher average error. The best approach is to search among all 1,140 possible groups of three for the one whose predictive diversity comes closest to its average error, that is, the one with the smallest collective error.
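A brute-force search makes the comparison concrete. In the sketch below each of the twenty students is reduced to a single randomly generated past prediction, which is a simplification; the point is only to compare the three individually most accurate predictors against the group of three with the smallest collective error.

```python
# Compare the three individually most accurate predictors with the best
# group of three found by exhaustive search. Predictions are randomly
# generated stand-ins for the students' past performance.
import itertools
import random

random.seed(1)
truth = 100.0
predictions = [random.gauss(truth + random.uniform(-20, 20), 10) for _ in range(20)]

def collective_error(group):
    crowd = sum(group) / len(group)
    return (crowd - truth) ** 2

most_accurate_three = sorted(predictions, key=lambda p: (p - truth) ** 2)[:3]
best_group = min(itertools.combinations(predictions, 3), key=collective_error)

print("three most accurate:", round(collective_error(most_accurate_three), 2))
print("best of 1,140 groups:", round(collective_error(best_group), 2))
```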

The final corollary reveals two routes to creating a more accurate group prediction: we can add someone more accurate or someone diverse who is not horribly inaccurate. In the Netflix Prize competition, BellKor was the most accurate, so they had no one better to add. Here again, we see why problem difficulty correlates with diversity bonuses. On difficult predictions, developing a different predictive model is often easier than developing a more accurate one.

The same logic that applies to groups applies to individuals. Individuals who excel at predicting keep many models in their heads. They mimic a diverse crowd. Individuals who employ a single model or way of thinking are less accurate than many-model thinkers. In fact, a single-model thinker will be less accurate than random selection in predicting whether a trend will increase, decrease, or stay the same.12

Information, Categories, and Mental Models

Within a single person or a group, diverse predictions have similar causes. People make predictions, whether formal or informal, using their cognitive repertoires. They apply models to categories of information. Different information, different categorizations of that information, or different models will produce diverse predictions. These causes of diversity apply individually or in any combination.

Suppose, for example, that two people possess different information but categorize the world similarly and use similar models. Those two people will make different predictions even if they use the same category and the same model.

That different information could come from different life experiences. Two students trying to determine the number of miles in the London Tube might both use the same category, major international city public transportation systems, and the same model, public transportation systems have similar numbers of miles of track per resident. The students would make different predictions if they based them on different information. One student may have been to Paris and Amsterdam. Paris has a population of 2.2 million residents. The Paris Metro has 113 miles of track, or 1 mile of track per 20,000 residents. Amsterdam has a population of 750,000 residents and 25 miles of track in its metro system, an average of 1 mile per 30,000 residents. Taking the average of the two cities gives a predictive model of 1 mile of track per 25,000 residents. Using that estimate, the student would predict 350 miles of track for London’s 8.75 million residents.

The second student could apply the same model but use information from Rome and Vienna. Rome’s metro has 37 miles of track and Vienna’s U-Bahn has 49 miles of track. Given Rome’s 2.7 million residents and Vienna’s 1.7 million, she calculates that Rome has 1 mile per 70,000 residents and Vienna 1 mile per 35,000 residents. If the student estimates 1 mile per 50,000 residents, she then predicts that the London Tube will have 175 miles of track.13
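The arithmetic behind the two estimates is simple enough to write down. The short sketch below reproduces it from the rounded per-resident figures quoted above; the function name and structure are mine.

```python
# Reproducing the two students' back-of-the-envelope Tube estimates
# from the rounded figures quoted above.
LONDON_POPULATION = 8_750_000

def tube_estimate(residents_per_mile_by_city):
    """Average the residents-per-mile figures across visited cities,
    then apply that rate to London's population."""
    rate = sum(residents_per_mile_by_city) / len(residents_per_mile_by_city)
    return LONDON_POPULATION / rate

print(round(tube_estimate([20_000, 30_000])))  # Paris, Amsterdam -> 350 miles
print(round(tube_estimate([70_000, 35_000])))  # Rome, Vienna -> about 167 miles
# The second student rounds her rate to 1 mile per 50,000 residents,
# which gives the 175-mile estimate in the text.
```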

The two students make different predictions because they have visited different cities. Those experiences could connect to identity diversity: One student may have visited Paris because of her French ancestry. The other might have visited Rome because of her Italian mother. Or, the experiences could have come about for some other reason. What is relevant is that their different experiences caused them to rely on different information when making a prediction.

Category diversity can also produce diverse predictions. Functional categorizations group items with relevant features. To predict the quality of restaurants in Chicago, categorizing restaurants by their first three letters would not capture relevant features. That categorization would lump Chicago’s Tre Kronor, an upscale Swedish restaurant; Tres Amigos, a low-priced Mexican restaurant; and Tre Soldi, a trattoria and pizzeria, into the same category: restaurants beginning with Tre.

A more useful categorization would classify restaurants by their distance from the Loop (the city center). This would be a relevant categorization provided that restaurants nearer the Loop have higher quality. A second useful categorization would distinguish between restaurants in posh hotels and restaurants not in those hotels. The corresponding model would be that hotel restaurants have higher quality. A third categorization could distinguish restaurants by their longevity. The associated model would predict that Chicago’s Berghoff restaurant, which has been in continuous operation since 1898, would be of high quality.

Each of the models based on these categorizations produces better-than-random predictions. Each also makes mistakes. Chicago’s Billy Goat Tavern, famous for its cheeseburgers, has been in operation since 1934. Contrary to the longevity model, it is not of high quality. The model based on hotels will classify the Billy Goat correctly, though it does err on the Berghoff. The other two models classify the Berghoff correctly. In addition to being around for more than one hundred years, it is also near the city center.

People who rely on diverse categorizations will make distinct predictions and, therefore, by the diversity prediction theorem will be more accurate as a group than they are on average. The potential for diverse categorizations increases in the dimensionality of the data or experiences. Restaurants can be categorized in dozens if not hundreds of relevant ways. Each of those categorizations can create a diverse model. Thus, we see a link between dimensionality and the potential for diversity bonuses.

The linkage between diverse categorizations and diverse models also reveals why more data implies more potential for diversity bonuses. A group of analysts fed only a small amount of data cannot construct many different models. Analysts with access to rich data sources can construct any number of plausible models. Those diverse models can combine to make a more accurate prediction, and they can offer up more accurate worst- and best-case scenarios.

Diverse models provide a third source of diverse predictions, even if people have access to the same data and rely on similar, if not identical, categories. On difficult predictions, that is, in contexts in which the relationship between the data and outcomes is more complicated, models will be more likely to differ. The same holds for models with more variables. If two people fit the model y = mx + b and have the same data for x and y, they will arrive at the same values of m and b. Two climatologists building complex climate change models with thousands of variables will not choose the same functional forms and therefore will not arrive at the same coefficients. Each model will make different predictions. And we know that a combination of those models will be more accurate than the average model. For this reason, the Intergovernmental Panel on Climate Change combines twenty-three distinct atmosphere-ocean general circulation models to make its predictions.

We should interpret this diversity of models as a method for handling the complexity of the predictive task and not as a lack of understanding. Climate change is far too complex to predict exactly. However, if the different models based on the same data are approximately equal in their accuracy, then collectively they will be more accurate than any of the models.

We already saw how that occurred in the Netflix Prize competition. The two final models had equal accuracy and differed. Therefore, their average had to be more accurate. Figure 3.1 shows the same phenomenon occurring in models of genotype identification. The graph measures discordance, so lower lines correspond to more accurate models. The top line represents the Eli Broad Institute at MIT, and the next two interwoven lines represent the Sanger Institute and Gonçalo Abecasis’s lab at the University of Michigan. To make these identifications, the biostatisticians rely on DNA, so they necessarily use the same information. The diversity that arises must therefore arise from different categorizations and models.


Figure 3.1  Collective Identification of Genotypes

In this example, all three models have roughly the same discordance. The bottom line in the graph represents the discordance of a collective classification based on majority voting. The collective outperforms each model. Given that the three models differ and have approximately the same accuracy, that must be true.
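A simulation illustrates the logic, though not the actual genotype data. The sketch below assumes three classifiers, each 85 percent accurate with fully independent mistakes; real models’ errors are only partly independent, so the true gain from majority voting is smaller, but the direction is the same.

```python
# Toy simulation: three classifiers, each 85% accurate with independent
# errors, combined by majority vote. The accuracy figure is illustrative.
import random

random.seed(0)
TRIALS = 100_000
INDIVIDUAL_ACCURACY = 0.85

individual_correct = [0, 0, 0]
majority_correct = 0

for _ in range(TRIALS):
    votes = [random.random() < INDIVIDUAL_ACCURACY for _ in range(3)]
    for i, correct in enumerate(votes):
        individual_correct[i] += correct
    majority_correct += sum(votes) >= 2

print("individual discordance:", [round(1 - c / TRIALS, 3) for c in individual_correct])
print("majority discordance:  ", round(1 - majority_correct / TRIALS, 3))
# With independent errors the expected majority discordance is
# 3(0.15^2)(0.85) + 0.15^3, about 0.061, versus 0.15 for each classifier alone.
```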

These results and those of the Netflix Prize competition both show how everyone can be a winner. In each case, an ensemble of all the participants outperformed the individual winner. The Eli Broad Institute’s model, though least accurate, contributes to the winning ensemble.

So far I have considered predictions of outcomes that can be measured in numbers like interest rates and rainfall. People also make qualitative predictions such as whether a foreign-policy action will lead to greater or less stability or whether painting an interior wall a darker color warms a room. A related logic applies in those cases as well. Even if those predictions cannot be averaged to a single number, they can be coalesced into a richer understanding.

In sum, whether analyzing data in a scientific laboratory, predicting the weather or climate, or estimating future downloads of a new application, diverse information, categorizations, and models create diverse predictions, and those diverse predictions combine to produce accurate collective predictions.

The diverse information, categorizations, and models that create these predictions have various sources. Information may come from experiences or training. Identity and education may well affect the categories we use and our experiences. In addition, our formal training and our background knowledge can influence the models we apply.

The weight we might place on identity varies by case. If predicting consumer demand for the Dyson Supersonic hair dryer, we would probably want gender and racial diversity. Given the product’s cost ($399), we might prefer wealthier consumers; that is, not want income diversity. Nor might we seek out religious diversity.

In contrast, to gauge the appeal of a political candidate or the usefulness of a new student loan program, we would want diversity of wealth and religion, as borrowing practices vary by religion. In making predictions about the likely side effects of a drug, identity might play a small role, if any. But if predicting the ability of a diverse set of people to follow a treatment regimen that includes the drug, we would again want to cast a wide net.

As a rule, we should seek out diverse people. We should contemplate how and why identity diversity might matter. In any given case, the choice of whom to include requires careful thought. We should not blindly pursue identity diversity any more than we should necessarily take the best, that is, most accurate, people.

The diversity prediction theorem shows why and how diversity bonuses occur. It also shows why adding diversity without forethought offers no magical bonus. If we ask a random person on the street how much oceans will rise in the next fifty years or what will happen to interest rates in Greece in the next five years, we will not experience a diversity bonus. Those people will increase diversity but reduce accuracy.

Accurate groups consist of people with distinct information sets, diverse and informative categorizations, and diverse accurate models. No hard and fast rule exists for whom to add to a group. If deciding to add the opinion of another person or to add a second predictive model, data and theory support a rule of thumb called the not half-bad rule: the second person should be included so long as he or she is not more than 50 percent less accurate.14

Artificial Intelligence: Boosting Diversity

The same types of machine learning algorithms that forecast medical outcomes have revolutionized the field of artificial intelligence. Over the past three decades, the field of artificial intelligence has shifted away from single, sophisticated algorithms to ensembles of diverse predictors. For example, the state-of-the-art Google Translate software relies on ensembles of predictors that scan the web for words, phrases, and sentences and learn patterns. The program constructs translations by predicting sentences and structures based on writings it finds on the web. This results in more natural sentence structures. In the past, programs that translated French into English would contain internal French-English dictionaries and would apply built-in rules of syntax and grammar.

Computer scientists describe these individual classifiers as weak learners. For the ensemble as a whole to predict accurately, the classifiers within the ensemble must differ.15 They must look at unique feature sets or assume different relationships between features. The weak learners range from relatively sophisticated neural networks to the simple if-then decision rules used in random forest algorithms.16

The diversity prediction theorem provides a logic for the accuracy of these ensembles: the average of diverse predictions must be more accurate than the average prediction. That same logic applies to groups of people. A puzzle therefore remains as to why these ensembles of algorithms outperform groups of people by such a wide margin. The theorem provides intuition for that as well.

The collective accuracy of an ensemble of people or machines depends on the accuracy of each predictor and their diversity. Ideally, we would restrict the ensemble to accurate and diverse predictors. That would make the ensemble accurate. Machine learning programs do exactly that. The aforementioned random forest algorithms begin with a collection of random decision-tree predictors.17 Using a sample of the data to determine accuracy, the algorithm separates out the accurate decision trees. This procedure kicks out the inaccurate predictors. We cannot always do that with groups of people. We may not be able to evaluate people over past cases and boot out less accurate predictors.

Furthermore, ensemble methods train their predictors on subsets of data. This all but guarantees predictors that are accurate. We have no guarantee that people make reasonable inferences. People often apply woefully inaccurate models. Thus, the machine learning ensembles have a built-in accuracy advantage.

The algorithmic ensembles also build in diversity. They do so through bagging and boosting. Bagging trains predictors on randomly drawn subsets of examples, so the predictors learn from different experiences. This all but guarantees diversity. The analogy to the benefits of including people with different experiences holds, with a small caveat. The machine learning algorithms choose the size of the bags, that is, the number of experiences, to be large enough and representative enough that the predictors will be accurate, yet small enough to ensure some diversity. A person’s experiences may not be representative, and that could lead to an inaccurate prediction. For a new, diverse predictor to increase accuracy, it must be relatively accurate.

The second technique, boosting, engineers diversity by adding predictors that are accurate when the ensemble makes mistakes. Suppose that we want to train an ensemble of predictors to classify bank loans as either successes or failures and that we have one thousand cases for which we know the outcome. We can use those to train our predictors. We might first generate eighty random predictors. For each, we might choose four hundred random cases (recall that this is bagging) and only keep those predictors that classified more than 55 percent of their cases correctly. We next have those predictors collectively classify the one thousand cases. We might find that the ensemble predicts correctly on eight hundred cases. Call the other two hundred cases the difficult cases.

To perform boosting, we create a new random set of predictors. We train those predictors on the two hundred cases for which the previous ensemble made mistakes. We add to the ensemble those predictors that classify the two hundred cases correctly more than half the time. Given that the original ensemble classified those cases incorrectly, these new predictors must be diverse. Adding them to the ensemble increases accuracy.18 Thus, boosting, like bagging, generates diverse predictors and helps to produce an accurate ensemble.
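The recipe in the last two paragraphs can be written out directly. The sketch below uses synthetic “loan” data and shallow decision trees as the weak learners; the data, the tree depth, and the library calls are my choices rather than any particular production system, but the numbers (one thousand cases, eighty initial predictors, bags of four hundred, the 55 percent cutoff, retraining on the difficult cases) follow the text.

```python
# Bagging, filtering, and a boosting-style pass on the difficult cases,
# mirroring the bank-loan example above. Synthetic data; decision trees
# of depth two stand in for the "random predictors."
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

def weak_learner(X_sub, y_sub, seed):
    return DecisionTreeClassifier(max_depth=2, max_features=3,
                                  random_state=seed).fit(X_sub, y_sub)

def ensemble_predict(ensemble, X):
    votes = np.mean([m.predict(X) for m in ensemble], axis=0)
    return (votes >= 0.5).astype(int)

# Bagging: eighty predictors, each trained on four hundred randomly drawn
# cases; keep only those above 55 percent accuracy on their own bag.
ensemble = []
for seed in range(80):
    idx = rng.choice(len(X), size=400, replace=True)
    model = weak_learner(X[idx], y[idx], seed)
    if model.score(X[idx], y[idx]) > 0.55:
        ensemble.append(model)

# Boosting, in the spirit of the text: retrain on the cases the ensemble
# currently gets wrong and keep predictors that classify more than half
# of those difficult cases correctly.
wrong = ensemble_predict(ensemble, X) != y
if wrong.any():
    X_hard, y_hard = X[wrong], y[wrong]
    for seed in range(80, 120):
        model = weak_learner(X_hard, y_hard, seed)
        if model.score(X_hard, y_hard) > 0.5:
            ensemble.append(model)

print("ensemble accuracy:", round((ensemble_predict(ensemble, X) == y).mean(), 3))
```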

Prediction Markets

In light of the awe-inspiring breakthroughs of artificial intelligence that leverage the combined power of accuracy and diversity, we might ask whether we cannot achieve similar success with groups of people. We know the end goal: an ensemble of smart, diverse people. One route mimics the machine learning algorithms and applies the logic of the diversity prediction theorem. Begin with a large set of predictors, test them, and choose accurate predictors that are diverse. Evidence supports that approach. Teams of the best predictors predict with high accuracy, as do teams chosen by the diversity of their predictions.19

These approaches rely on a top-down organization to construct the ensemble. Prediction markets provide a novel bottom-up approach to obtain similar results. In a prediction market, individuals place bets on outcomes and receive payouts based on their success. Prediction markets ensure a degree of accuracy through market forces. Predictors who consistently lose money will leave the market. People who make money, those who are more accurate, will remain.

Prediction markets create incentives for diversity because predictions that disagree with the majority earn higher payoffs if correct. In a winner-take-all prediction market, people may buy assets that pay one dollar if a specific outcome occurs. That outcome could be the winner of an election or a sporting event or the release of a new product by a certain date. If people do not think an event will occur, then the price of the corresponding stock will be low. A bargain, in a prediction market, is a stock corresponding to a likely event that has a low price. The large payoffs from these bargains create an incentive to identify systematic biases in prices. In theory, that incentive mimics a boosting algorithm, resulting in accurate aggregate predictions.
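A toy calculation shows the incentive at work. The prices and beliefs below are hypothetical; in a winner-take-all market, a contract pays one dollar if the event occurs, so its price can be read as the market’s implied probability.

```python
# Hypothetical winner-take-all market: a contract pays $1 if the event occurs.

def expected_profit(price, believed_probability, payout=1.0):
    """Expected profit per contract for a trader with the given belief."""
    return believed_probability * payout - price

market_price = 0.30   # the market implies a 30 percent chance

# A trader whose model says the event is far more likely sees a bargain...
print(expected_profit(market_price, believed_probability=0.60))  # 0.30 per contract

# ...while a trader who agrees with the market sees no edge.
print(expected_profit(market_price, believed_probability=0.30))  # 0.0
```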

Though theoretically appealing, prediction markets have not been widely implemented. One obvious reason is the lack of a population of engaged accurate predictors. Unlike the classifiers included in ensemble methods, the models applied by individuals in prediction markets need not be accurate. To make accurate predictions, a prediction market needs intelligent, informed, diverse predictors. If we ran a market of fifth graders who were to predict the price of the Japanese yen against the euro in three months’ time, we should not expect accuracy.

Prediction markets suffer from two other potential shortcomings. If the stakes are too small, people may not take them seriously or experiment with manipulating the market to exploit trends. Also, under some conditions the implicit probabilities produced by a prediction market do not align with the participants’ beliefs of those probabilities.20

These potential shortcomings aside, several large companies including Ford, Google, HP, Best Buy, and General Electric have experimented with or operate internal prediction markets. Rarely are the predictions the sole reason for taking an action. Instead, their predictions are combined with or contrasted with other methods. They become part of an ensemble of predictors. HP and Best Buy have relied on prediction markets and forecasts of experts. By having two diverse methods for predicting, they hope to obtain more accurate predictions. Corporate prediction markets also promote a culture of inclusion. They provide a mechanism for employees to give voice to their opinions, albeit in numerical form.

CREATIVE TASKS

When we predict, we apply what we know to estimate something unknown. We organize our information and experience into categories and apply models. Those models range from elaborate statistical estimates to gut intuitions. In either case, we know the form those predictions will take. The outcome of an election will be one of a handful of candidates, a mean temperature will be a number between 40 and 120 degrees, and a stock price will be a positive number. We know the arrival time of a flight to Los Angeles will not be “banana” or “Luxembourg.”

When we create, we try to come up with a novel idea or solution. We may lack any conception of the form a solution or idea will take. The mechanical clock was a long-shot winner in the contest to develop a method for determining a ship’s longitude.21 Sometimes, creative ideas are constructed from whole cloth. They can also derive from existing artifacts and products. Apple’s iPod is a miniature personal jukebox, the television remote control is a repurposed garage-door opener, and the 1970s mood ring is thermotropic medical tape transformed into jewelry.22

We can think of a creative act as pulling an idea or thought from a set of possibilities. What a person imagines that set of possibilities to be depends on how she represents or categorizes the world, and on her information and knowledge. Thus, diversity in those parts of our repertoires will produce diverse, creative ideas.

New ideas also arise from recombinations of items: the combustion and steam engine combined parts sitting on the factory floor, Hormel’s Spam mixed ham and pork in a perfect ratio, and the fax machine combined the copier and the telephone. Recombination produces superadditivity. One plus one equals three because each new idea contributes on its own and in combination with the others.23

Owing to the potential for repurposing and recombining, the value of a new idea or an innovation will not be revealed for some time. The practical applications of the laser, the combustion engine, and the wheel all rolled out over time.24 The team that developed small, user-controlled drones probably did not anticipate their use by realtors to capture aerial shots of properties, by journalists to view political uprisings and crime scenes from above, or their long-term potential for pizza delivery.

Creativity involves a mixture of intelligence, perseverance, and serendipity. A group’s creativity will also be enhanced by its cognitive diversity. We already saw how diverse perspectives of physical space produce distinct adjacent possibles. That result provides a hint as to why the most creative groups need not consist of the most creative individuals. The people must also bring diverse repertoires.

Measuring Creativity

To demonstrate the contribution of diversity to the creativity of a team requires a measure of creativity. Psychologists measure creativity by the ability to generate original, diverse, and useful ideas. One common measure equals the number of ideas. More sophisticated measures include the depth and variety of ideas.

Regardless of the measure we use, we should expect a person’s creativity to correlate with her richness of experiences, and studies do find that multicultural experiences increase creativity.25 We should also expect a person’s creativity to increase with the number of perspectives or categories she can apply to a task and with the granularity or fineness of her categorizations. A furniture builder who distinguishes among hundreds of types of dressers should be able to create more new designs for a dresser than a layperson who categorizes them by the number of drawers.

Psychological tests used to measure creativity pose open-ended questions. In the tourist problem, subjects generate ideas for increasing tourism to the United States.26 In the fame problem, subjects must conceive of a route for a person with no special talents to achieve fame.27

The canonical creativity test, the Alternative Uses Task, presents a subject with a common object, often a brick. The subject must describe as many uses of the brick as possible within a fixed time period.28 A person’s score on this brick test can be measured by the number of ideas, by the originality of those ideas, by the amount of detail elaborated, or by the number of categories of uses.

A noncreative subject might respond that a brick could be used as a boat anchor, to build a wall, to build a house, or to break a window. This list consists of four unoriginal ideas, expressed with little detail. These answers could be placed in three common categories: brick as weight, brick as construction material, and brick as weapon. None of his answers would be classified as original.

A higher-scoring subject might offer that a brick could secure a tablecloth during a windy picnic or a tarp during a roof repair, displace water in a toilet tank, represent a brick building in a model train display, hammer tent stakes, serve as a Medieval knight’s coaster, level a table with uneven legs, sharpen knives, or, if broken into pieces, make arrowheads, knives, jewelry, chalk, or screwdrivers. These answers add new categories: brick as tool, brick as representation, and brick as composition.

The Alternative Uses Task maps directly to real-world situations. A chef must think of what to do with an excess of day-old halibut, a developer must think of development options for a piece of land, and a theater owner must think of events that will draw crowds on Saturday morning.

Not all creative tasks involve thinking up uses for a single object. Some tasks require the opposite: they require people to come up with multiple objects for a single use. Designing emojis is an example of an opposite task. A successful emoji will be fun, informative, and nonoffensive. In the October 2015 release of iOS 9.1, Apple added 184 new emojis, including a lion, a snowboard, a taco, a unicorn, a rolled-up newspaper, and the popular face with rolling eyes.

Diversity Bonuses on Creative Tasks

On a creative task, we can define a person’s diversity relative to a group as the number of unique ideas that she adds.29 Using this measure of diversity, it follows that the creativity of a group will depend on both the creativity of individuals and their diversity. Walking through the algebra of an example shows why. Imagine that four people take the brick test. The two most creative people each generate ten ideas. The two least creative people each generate six ideas.

The creativity of the group of the two most creative people equals twenty minus the overlap in their ideas. If they have no diversity, then their joint creativity will be no greater than their individual creativity. If they have maximal diversity, that is, if they have no overlap in their ideas, then the two of them generate twenty ideas. Similarly, the number of unique ideas from the two least creative people equals twelve minus the overlap in their ideas. Barring almost complete overlap between the two most creative people, those two will be more creative as a team than the least creative two.

There also exist two groups consisting of one creative person and one noncreative person. Each of those groups generates at most sixteen ideas. If one of these groups has little to no overlap, it could form the most creative group. The key intuition builds on the logic developed in the stylized tool model. If the creative people both think of ideas using the same line of thought, that is, if their ideas arise in the same order, as in the train trip analogy, they will have complete overlap and produce no bonuses. If they generate ideas along different routes, as in the analogy of the trip to the zoo, they will produce bonuses.
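The arithmetic of the example can be laid out in a few lines. The sketch below uses made-up idea sets for four hypothetical people, two with ten ideas and two with six, and counts a group’s creativity as the number of unique ideas its members produce.

```python
# Group creativity counted as unique ideas. The idea sets are invented;
# the labels stand in for brick-test responses.
from itertools import combinations

ideas = {
    "A": {f"common_{i}" for i in range(10)},                             # 10 typical ideas
    "B": {f"common_{i}" for i in range(8)} | {"b_novel_1", "b_novel_2"}, # 10 ideas, 8 shared with A
    "C": {f"c_novel_{i}" for i in range(6)},                             # 6 ideas, none shared
    "D": {f"common_{i}" for i in range(6)},                              # 6 typical ideas
}

def group_creativity(members):
    return len(set().union(*(ideas[m] for m in members)))

for pair in combinations(ideas, 2):
    print(pair, group_creativity(pair))
# The two most creative individuals, A and B, overlap heavily and produce
# only 12 unique ideas; pairing A with the less creative C produces 16.
```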

We can define the problem of finding the most creative group as follows:

The Most Creative Group Problem: Given a set of people, choose a group of a given size that has the highest creativity.

For the reasons just mentioned, the most creative group need not consist of the most creative people. To gain insights for when it will and when it will not, we can again use the brick test. There exist thousands of possible uses for a brick. It could function as a stand for a dog bowl. It could represent a couch in a Barbie play scene. Imagine writing each possible idea on a piece of paper and placing all of those pieces of paper in a giant box.

We can then think of an individual who generates an idea as reaching into that box. Creative people reach into the box many times. Noncreative people reach in less often. If people randomly draw from the box, then any two people draw each idea with the same probability. If that were true, then the most creative group would generally consist of the most creative people.30

Instead of assuming that all ideas belong to the same box, we might alternatively assume two types of ideas: typical ideas and unusual ideas. Typical ideas include using bricks to build walls or fireplaces and destructive uses like breaking a window or smashing pottery. Few people think of using the brick as a couch for a Barbie.

To model these two types of ideas, we can keep the typical ideas inside the box and take the unusual ideas outside the box. Suppose there exist one hundred inside-the-box ideas and an enormous, near-infinite number of unusual, outside-the-box ideas. The latter assumption implies that no two outside-the-box ideas will match.

In this more elaborate model, a person’s creativity still equals the number of ideas she generates. Her creativity score does not depend on whether an idea comes from inside or outside the box. However, if forming a large group, we care more about the number of outside-the-box ideas a person generates than her total number of ideas. To see why, suppose we are forming a group of ten people. Creative groups of nine people will generate most of the inside-the-box ideas. Adding a person with fifty inside-the-box ideas and only five outside-the-box ideas adds few new ideas. A person who thinks of twenty inside-the-box ideas and also twenty outside-the-box ideas adds more new ideas. Though she is less creative, her ideas do not duplicate those of others.
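A short simulation captures the model. It assumes one hundred inside-the-box ideas, an effectively unlimited supply of unique outside-the-box ideas, and arbitrary idea counts for the people involved; the specific numbers do not matter, only the comparison.

```python
# The two-type idea model: a shared pool of 100 inside-the-box ideas and
# unique outside-the-box ideas. All counts are arbitrary illustrations.
import random

random.seed(0)
INSIDE_POOL = 100

def person(n_inside, n_outside, tag):
    inside = set(random.sample(range(INSIDE_POOL), n_inside))
    outside = {f"{tag}_outside_{i}" for i in range(n_outside)}  # never overlaps
    return inside | outside

# A group of nine fairly creative people, mostly inside-the-box thinkers.
core_group = [person(40, 2, f"core{i}") for i in range(9)]

prolific_insider = person(50, 5, "insider")    # 55 ideas, mostly inside the box
balanced_thinker = person(20, 20, "balanced")  # 40 ideas, half outside the box

def total_ideas(candidate):
    return len(set().union(*core_group) | candidate)

print("group + prolific insider:", total_ideas(prolific_insider))
print("group + balanced thinker:", total_ideas(balanced_thinker))
# The nine-person group already covers nearly all inside-the-box ideas, so the
# candidate with more outside-the-box ideas typically adds more, despite
# generating fewer ideas overall.
```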

The Alphabet of the Famous

On actual creative tasks, an idea that is obvious to one person may be outside the box to another, so I drop the assumption of ideas as either inside or outside the box. Instead, I think of ideas as belonging to knowledge domains. A person’s training, experiences, and identity influence her knowledge domains and therefore the types of ideas she generates. A diver thinks of using a brick as ballast. A chef thinks of cooking chicken on bricks.

The link between identity diversity and diversity bonuses on creative tasks operates through these knowledge domains. Interests vary by identity group. Men and women watch different television shows, visit different websites, and read different books. Race, religion, ethnicity, and age also correlate with these activities. Identity-diverse groups can therefore produce more ideas because they draw from different experiences, interests, and knowledge bases.

I have found that links between identity and knowledge can be revealed by the Alphabet of the Famous Test. This test measures ability by a person’s capacity to think of famous people. To conduct the test, each subject writes the alphabet vertically on a sheet of paper. Someone chooses a sentence that contains at least twenty-six letters, such as “Keen at the start, but careless at the end,” by Cornelius Tacitus. Each subject writes that sentence vertically, pairing each letter with a letter from the alphabet. Each letter pair represents a pair of initials. The subject tries to think of a famous person for each pair of initials, as shown in figure 3.2. The criterion for being famous must be agreed upon beforehand. It might be having a Wikipedia page with more than five hundred words.


Figure 3.2  Possible Answers to the Alphabet of the Famous Test


Figure 3.3  Baseball Answers to Alphabet of the Famous Test

On this test, any knowledge proves useful. People who perform well tend to be experts in domains with lots of famous people. The test therefore advantages people who follow popular music, the movies, or professional sports like football, baseball, or soccer. Each of these domains contains a large number of famous people. People with knowledge of classical music, tennis, or video games perform less well, as these domains have fewer stars. Professional tennis has fewer famous players than baseball, football, or soccer, and the movies feature hundreds of stars, while video games include few real people.

As a result, the most creative people in the Alphabet of the Famous Test, those who follow football, baseball, movies, or music, often overlap in their knowledge domains. If so, the best group on this test will not consist of the best individuals. To make that claim more explicit, consider that a baseball fanatic might come up with the names shown in figure 3.3.

If the best performers are all baseball fans, they might have similar lists. A group of the best will not score much higher than the single best person. In contrast, a group of less creative individuals consisting of someone who follows women’s tennis, a person who followed 1970s television, a political junkie, and a follower of reality television might generate a famous person for each of the first seven pairs, as shown in figure 3.4. The group’s diversity, not their individual creativity, would enable them to outperform the group of the best individuals, all of whom happen to be baseball fans.


Figure 3.4  A Diverse Group’s Answers to the Alphabet of the Famous Test

On scientific creative challenges, identity may play less of a role than background. Carol Fierke’s chemistry lab at the University of Michigan investigates compounds that inhibit farnesyltransferase (FTase) as potential antitumor agents. Finding inhibitors involves searching microbial and plant sources for the proper chemical structures. Only students trained in chemistry and biology (or possibly physics) possess the relevant knowledge. Knowledge of baseball, reality television, or music will not be of much use.

On this task, the relevant diversity corresponds to a person’s information about various plants or her knowledge of chemical compounds. These types of diversity result from differences in training, or different laboratory experiences. To achieve those diversity bonuses, scientists populate their laboratories with postdoctoral students trained at different universities or with unique specializations. An identity diversity effect could arise if, based on her cultural background, someone knew of a particular fungus. Though such serendipity can occur, it is not a long lever on which to stand.

The Alphabet of the Famous Test and the FTase inhibitor task capture the two extremes. In the former, identity diversity correlates with variation in interests. In the latter, we would not expect identity diversity to have much of a direct effect. The scientific tasks require diversity among specialists. In between these two extremes lie an enormous number of creative tasks confronted on a regular basis by information specialists, lawyers, consultants, civil engineers, teachers, architects, administrators, project managers, and hospitality directors. The role of identity will again vary. In coming up with a provocative advertising slogan or a novel legal defense, identity diversity may matter significantly. For other creative tasks, such as coming up with an oven exhaust hood design, identity may matter less.

No hard and fast rules apply. Identity diversity probably matters more when people’s preferences or interests determine an idea’s value. Groups proposing restaurant designs, themes for a party, changes to an employee-compensation plan, possible recipients of honorary degrees, or designs for a public park should expect identity-driven diversity bonuses. The identity characteristics that are relevant when proposing degree recipients, likely race and gender, may differ from those relevant to designing a public park, perhaps age and physical abilities, as may the weight of identity and whether it acts alone or in combination with experiences and education.

The Medici Effect

Up to now, we have considered creative ideas in isolation. Often, ideas can be recombined. If so, this creates an additional bonus. Recombinations arise through interactions. People do not walk into a room, dump their ideas on the table, and leave. Ideas are shared, challenged, refined, and recombined.

These interactions occur on multiple scales. We share and combine ideas within small work teams, within scientific disciplines, and across cities and nations. If we look across time and place, we see many instances—Florence in the fifteenth century, Detroit at the dawn of the twentieth century, or Silicon Valley today—in which the geographic concentration of ideas enabled recombinations and refinements with wondrous consequences.31 Italians incorporated pasta from China into their cuisine. Henry Ford brought the assembly line from meatpacking and gun assembly to the car industry.

The potential for recombination increases the value of ideas. If we count by ideas alone as we have done so far, creativity is subadditive. One person has ten ideas. The other person has eight ideas. Together they have at most eighteen ideas. If the two people overlap on four ideas, together they have fourteen ideas. Mathematicians describe these calculations as subadditive because fourteen is less than eighteen, the sum.

Counting by the number of combinations reveals the potential for superadditivity from sharing. A person who thinks up ten ideas can create 45 pairs of ideas. A second person who thinks up eight ideas can combine them to create 28 unique pairs, for a total of 73 pairs. Recall that the number of ideas thought up by the pair is additive at best. That is, at most they could think up eighteen ideas. More likely, they would overlap in their ideas and only create, say, fourteen total.

However, those fourteen ideas produce 91 pairs. Individually, the two people created only 73 pairs (45 + 28 = 73). Recombination, in this case, is superadditive, that is, more than the sum. If the pair had no overlap of ideas, they could create 153 pairs, an even greater amount of superadditivity.32
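The pair counts quoted here come straight from the binomial coefficient n choose 2, as a few lines confirm.

```python
# The pair-counting arithmetic from the paragraph above.
from math import comb

print(comb(10, 2), comb(8, 2), comb(10, 2) + comb(8, 2))  # 45 28 73 pairs, counted separately
print(comb(14, 2))  # 91 pairs from fourteen pooled ideas, already more than 73
print(comb(18, 2))  # 153 pairs if the two people shared no ideas at all
```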

These combinatoric calculations portend an explosion of possibilities. Were a high percentage of those combinations useful, creativity would be child’s play. We would be overwhelmed with useful new ideas. Sadly, they are not.33 We gain little by combining the remote control and the donut or the telephone and the blender. Idea combinations that work often come from related fields or domains.34 This low probability of any one combination succeeding partly cancels out the benefits of the superadditive combinations.

That said, within a product category, it will often be the case that almost all combinations function just fine. The Italian fashion brand Prada allows customers to design their own shoes by choosing among nineteen styles, ninety-two leathers, five sole types, and a gold or silver monogram. Each of the more than seventeen thousand designs will function. A person could wear them. While some designs will be more fashion forward than others, nothing horrible can happen. None will be the equivalent of a Vegemite-flavored tofu ice cream sundae with blue cheese topping. And, hopefully, somewhere in that giant space of possibilities will be a jaw-dropping design or two. By crowdsourcing design, Prada taps into an enormous, diverse, talented population. If consumers copy attractive designs, Prada can mine its sales data to improve its own creativity.

No Test Exists

Notice that whether or not we allow for recombination, the most creative team need not consist of the most creative people. A similar finding held for prediction: the most accurate team need not consist of the most accurate individuals. In fact, we can state an even stronger result: no test can be applied to individuals so that we are guaranteed to select the most creative group.35

Reread the last sentence. It says no test. If a similar result holds for Google hiring employees, then Google cannot develop any test to give to their three million plus job applicants such that the highest performers constitute their best hires. If a similar result holds for college admissions, UCLA cannot apply a formula to its 119,000 applicants and expect to get the best class.

To show this result holds for the most-creative-group problem requires two logical steps. First, any test applied to an individual can only measure that individual’s ideas. Second, a clone of the person who scores highest on whatever test we apply necessarily adds less to the group than a second person with a single different idea. Therefore, in the most-creative-group problem, no test can exist. We cannot evaluate people individually and determine an optimal group.

No Test Exists (on Creative Tasks)

If selecting a group for a creative task, a person’s contribution depends on her ability to produce ideas that differ from those of others in the group. No test applied to individuals will be guaranteed to produce the most creative group.

The impossibility of a test undermines the meritocratic idea that choosing according to some objective criterion results in optimal choices. A test measures a person’s creativity in isolation, yet her contribution depends on who else is in the group. A test cannot measure a person’s diversity unless it knows the group’s composition.

We must be careful not to misinterpret the no test exists result. It implies that blindly hiring or admitting by any fixed criterion cannot guarantee the best group. It does not imply that we should ignore ability or that less able people add more to a creative group. If we want the most creative group, we must weigh cognitive diversity as well as individual creativity. We will also want people who can work collaboratively.

PROBLEM SOLVING

In problem solving, we either seek a solution that satisfies conditions (a light bulb that produces five hundred lumens and uses fewer than five watts of electricity) or one that improves on the state of the art (a more aerodynamic wing design). To build the iPhone required both types of problem solving. Apple had to improve on existing technologies for screens, batteries, and memory and fit them in a small box.

The long arc of increasing economic prosperity rests on our ability to identify and frame problems, derive solutions to those problems, and communicate those solutions. Humans transitioned from small bands of hunters and gatherers to modern societies by solving problems. Innovation and problem solving fueled the Roman Empire, the Renaissance, and the Industrial Revolution, not to mention the Information Age.36 Our capacity to pose and solve problems and to develop new technologies will also drive future growth. Evidence suggests that this will require teams of intelligent, diverse people embedded within political systems that allow innovations to take hold and spread.37 Some economists tie epochs of economic growth to specific technologies. Breakthroughs in electricity, urban sanitation systems, the internal combustion engine, chemicals and pharmaceuticals, and communication technology drove economic growth from 1870 to 1970.38

Within each broad category of inventions reside hundreds if not thousands of problems solved. The October 6, 1934, Chicago Daily Tribune ran the headline “Gold and Silver Extracted from Atlantic Water.” The article told of Willard Dow’s announcement that an electrochemical process developed to extract bromine from seawater also produced gold and silver as by-products. Dow had solved an important problem: how to extract minerals from the sea. He could even extract gold. The gold received the headlines. The bromine brought him great wealth. Its return exceeded that of the gold by a ratio of more than twenty to one.

Dow used a technical tool to solve the problem of mineral extraction. In describing the double helix structure of DNA, Francis Crick and James Watson also solved an open problem, as did the German engineers Hans Tropsch and Franz Fischer, who developed a chemical process for creating synthetic oil. These breakthrough solutions are the stuff of history books. Attempts at breakthrough solutions either succeed or fail. On these problems, we can measure the ability of a person or a group as their probability of finding a solution.

Far more often, problem-solving activities improve on a best practice. They result in a battery that charges faster, an engine that requires less fuel, or a tomato that remains firm after being picked and tastes a little less like cardboard. For these types of problems, ability corresponds to the magnitude of the improvement to the status quo.

To show how cognitive diversity can produce bonuses in both breakthroughs and small improvements requires different types of models. To analyze breakthroughs, I return to the toolbox model introduced earlier.39 To analyze iterative improvements, I apply two models: one based on representations and heuristics and another that characterizes problem solvers as statistical distributions of solutions.40

All three models simplify the process of problem solving. Solving problems and finding improvements to existing solutions involves every component of our cognitive repertoires. We tap into information and knowledge for potential answers. We apply models and frameworks to identify routes to solutions. We embed abstract problems within formal representations within which we apply heuristics and analytic tools. Students who obtain engineering degrees acquire repertoires for the purpose of solving problems. MIT’s Donald Sadoway suggests we might even label engineering PhDs as PSDs—problem-solving degrees.41

The Toolbox Model

I start with a model of breakthrough solutions. I assume that a team confronts a problem in need of a solution. This could be an open problem of great import: eradicating the Zika virus or developing fusion energy. It could be a riddle: How do we move a fox, a rabbit, and a head of cabbage safely across a river in a small boat?

I represent methods for solving the problem as a set of tools. A tool could be a heuristic applied to a representation. It could be the application of an analogy. Each tool has a probability of solving the problem. I refer to this as its potential. When a person applies a tool, she either obtains a solution—a cure for Parkinson’s disease—or not.

I also assume that each person has a facility with a subset of those tools. Facility equals the probability of applying the tool correctly. A person who lacks familiarity with a tool has a facility of zero. A person with experience using a tool could have a facility near one. The probability that a person solves a problem with a particular tool equals the product of the tool’s potential and the person’s facility with that tool.

Suppose that the problem is to chop a tree. An ax’s potential would equal 100 percent. A person’s facility with the ax would depend on her stamina and strength. Facilities would lie between zero and one. In this example, facility equals ability. No diversity bonus exists. Hiring should be done by ability. As already discussed, we might want a team of Abe Lincolns.

Instead of felling trees, suppose that our problem requires solving an integral. If we consider the calculus to be a single tool, then facility with calculus would again equal ability and no diversity bonuses would exist. Once again, we should hire by ability.

The single-tool assumption makes sense for the ax. It does not fit the calculus. Calculus consists of a body of knowledge, along with a collection of heuristics. To solve integration problems, a student cannot swing her dog-eared copy of Tom Apostol’s Calculus with the hope of knocking the problem down. She applies the tools (heuristics) explained within the book. These include expanding, completing the square, performing a substitution, adding zero, multiplying by one, substituting for a trigonometric identity, performing a change of variables, and applying integration by parts.42 On a given problem, each of these tools would have some probability of success.

A student would have facilities defined over this set of tools. One person may be adept with trigonometric identities and not proficient at integration by parts. A person’s ability no longer corresponds to a single facility. Instead, it equals the probability that she solves the problem. That probability depends on her facilities across the many tools. High-ability people have facility with high-potential tools.

To see how diversity bonuses can arise, assume that calculus involves solving only integration problems and that these problems can be solved using one of four tools: expansion, adding zero, multiplying by one, and substitution. Each person can be represented as a vector of four facilities, one corresponding to each tool. One person might have 50 percent facility on each. A second person might have 60 percent facility with expansion, adding zero, and multiplying by one but might never have learned substitution. A third person might have never learned expansion or how to add zero and might have 60 percent facility with multiplying by one and substitution. Figure 3.5 represents these facilities graphically.

For the purposes of this example, I assume that each tool has a 50 percent chance of solving the problem. A few calculations show that the first person has a 68 percent probability of solving the problem. That equals her ability. The second person has an ability of 66 percent and the third an ability of 51 percent.43
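A sketch of the arithmetic, assuming the tools succeed or fail independently: the first person fails only if all four of her tools fail, so her ability is

$$1 - (1 - 0.5 \times 0.5)^4 = 1 - 0.75^4 \approx 0.68.$$

The same formula gives $1 - 0.7^3 \approx 0.66$ for the second person’s three tools and $1 - 0.7^2 = 0.51$ for the third person’s two.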

The ability of a two-person team equals the probability that at least one solves the problem. To stack the deck against diversity bonuses, assume that if one person has higher facility with a tool than another, then any time the second person successfully applies the tool, the first person does as well. Given that assumption, the facility of a team with a tool equals the highest facility of any team member.


Figure 3.5  Three Individuals and Their Facilities with Integration Tools

Figure 3.6 shows that the team of the two lowest-ability people has weakly higher facility on every tool than any other team. Therefore, that team has the highest ability even though it does not contain the two highest-ability problem solvers. A diversity bonus exists.
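A minimal Python sketch, using the facilities from figure 3.5 and the same independence assumption as above, reproduces both the individual abilities and the team comparison. The code and variable names are illustrative, not taken from the original model.

```python
from itertools import combinations

POTENTIAL = 0.5  # each tool solves the problem with 50 percent probability

# Facilities with the four tools: expansion, adding zero, multiplying by one, substitution
people = {
    "person_1": [0.5, 0.5, 0.5, 0.5],
    "person_2": [0.6, 0.6, 0.6, 0.0],
    "person_3": [0.0, 0.0, 0.6, 0.6],
}

def ability(facilities):
    """Probability of solving: one minus the chance that every tool fails."""
    prob_all_fail = 1.0
    for f in facilities:
        prob_all_fail *= 1 - POTENTIAL * f
    return 1 - prob_all_fail

for name, facilities in people.items():
    print(name, round(ability(facilities), 2))        # roughly 0.68, 0.66, 0.51

# A team's facility with each tool equals the highest facility among its members.
for pair in combinations(people, 2):
    team_facilities = [max(fs) for fs in zip(*(people[p] for p in pair))]
    print(pair, round(ability(team_facilities), 2))
```

Running the sketch shows the pair made up of the second and third people, the two lowest-ability individuals, posting the highest team ability.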

The existence of a single example of diversity bonuses does not imply that they always occur in problem solving. We can, however, infer from this example a more general insight into the conditions necessary for them to exist. In the example, the highest-ability person can apply every tool. The two lower-ability people specialize on different sets of tools. That diverse specialization produces the bonus. The intuition applies broadly. A diversity bonus can arise when a lower-ability person has the highest facility with some tool. No diversity bonus can exist if the higher-ability people have greater facility with all tools.


Figure 3.6  Three Teams and Their Facilities with Integration Tools

Extending this intuition, if many tools can be applied to a problem, there exist more types of specialists. And, for larger groups, these specialists may be part of the optimal group. It follows that the ratio of the number of potential tools to the group size plays a role in whether diversity bonuses matter. If the group size equals or exceeds the number of tools, then the optimal group consists of the best person on each tool, and hiring by diversity will be optimal. If the best person on each tool happens to be one of the highest-ability people, then hiring by ability will also be the best thing to do.

Once again, there exists a straightforward connection between the growing knowledge base of tools and diversity bonuses. As the number of tools increases beyond what any one person can master, diversity bonuses become more prevalent. Within the many subdisciplines in science, engineering, mathematics, and medicine, thousands of tools have been developed. In acquiring a master’s degree in bioinformatics, a student learns hundreds if not thousands of tools, tricks, heuristics, methods, and techniques for evaluating and organizing data. Similarly, in chemical research, the number of possible molecular compounds that might be tested grows with each passing week.

For domains with large numbers of tools, a high-ability person can have high facility with only a subset of the tools. To explain how this results in diversity bonuses, I return to and expand the earlier analogies of the trip to the zoo and the train ride. Imagine the tools arranged in clusters. Think of each cluster as an exhibit at the zoo. Within each cluster, the tools may be acquired in a sequence, resembling the train ride of information, or they may be learned idiosyncratically like at the zoo. To extend the analogy, once we arrive at the snake exhibit, we may find we have to follow a specific path. Or, we might find that the snake exhibit resembles a zoo within a zoo and that we can visit the garden snakes, the tiny adders, and monstrous pythons in any order we desire.

We can think of the organization of tools for econometricians, surgeons, or any other specialist similarly. Econometricians use statistical tools to reveal relationships in data, to make inferences, and to state and test hypotheses. They apply Bayesian methods, non-Bayesian methods, and matching methods. Each of these methods can be thought of as a cluster containing sets of tools. Some of these tools must be acquired in a specific order. Others can be learned in any order once a baseline set of skills has been acquired.

For surgeons, clusters include intestinal surgery, hip replacements, and heart surgery, and tools include physical implements like laser knives and surgical techniques. Clusters of tools also exist for lawyers, information scientists, string theorists, mathematicians, and anesthesiologists.

Owing to the enormous number of problem-solving tools within each of these domains, even the highest-ability people possess facility in a small subset of the relevant possible clusters. An analogue of the Milton problem holds here as well. Any one person can only know a small proportion of the relevant tools.

In the toolbox model, high-ability people possess facility with high-potential tools. If the best tools reside in a single cluster, the highest-ability people will have similar facilities. Other people who have less ability may know different clusters of tools. These other people may not be necessary for easy problems. They become valuable when the standard tools fail to solve the problem.

On some problems, none of the usual tools succeed. Success may require applying tools from a related cluster. That requires people adept with those tools.44 If no tool from a related cluster succeeds, this could lead to the search for a new tool. We might think of that as a creative task, and we have already seen how and why diversity matters there.

As an example of the value of related clusters, consider a problem faced by the International Monetary Fund (IMF). The IMF primarily employs economists. These economists possess facility with a range of analytic tools. They know development economics (D), econometrics (E), game theory (G), and political economy (P). The economists may lack knowledge of cultural anthropology (CA) and social psychology (SP).

Figure 3.7 shows hypothetical potential hires for the IMF represented by their abilities and by their tools. If the IMF hired by ability only, its staff would consist only of economists. The tools of economists perform better, on average, for the problems the IMF confronts than do the tools of cultural anthropologists and social psychologists. By the IMF’s standards, people trained in these disciplines possess less ability.


Figure 3.7  Representing People by Abilities and Tools

Nevertheless, cultural anthropologists and social psychologists can add value when the economists get stuck, that is, on difficult problems. So while the IMF wants high-ability people, it also wants diversity; it does not hire only economists. Neither does the Bank of England, whose research group includes psychologists.

Recall that in the model, a person’s ability equals the probability that she possesses a tool that solves the problem. This means that high-ability people master multiple useful tools. The collective ability of a group equals the probability that someone in the group solves the problem. High-ability groups include people with high facility with different tools.

Note the paradox of aggregation: even though high-ability people and high-ability groups have identical characteristics—high facility with many useful tools—high-ability groups need not consist of the highest-ability people. The best group need not consist of the best parts. Once again, a no test exists result will apply.45

The logic goes as follows: Any one person has facility with only a subset of the possible tools, and the tools that someone masters will be clustered. The highest-ability people will be concentrated in a few clusters—those clusters with the highest-potential tools. The clustering of high-potential tools implies a clustering of the highest-ability problem solvers.

More detail makes the logic transparent. Figure 3.8 shows a group of three people with diverse tools. Assume each has high facility with the tools represented. Given their tool diversity, this represents a high-ability group. Each person knows a different cluster. They differ in their facilities.46

For the sake of argument, suppose that the person on the right has the highest ability, that his four tools have a higher probability of solving the problem than the tools of the other two. If other problem solvers of high ability possess only those same tools, then groups of high-ability problem solvers will not be the best. Selecting by diversity, or even randomly among competent problem solvers, could result in a better group.47


Figure 3.8  A High-Ability Group with Diverse Tools

Diversity versus Ability

If the following conditions hold—(i) individuals possess sets of tools; (ii) no tool solves the problem with certainty; (iii) people master clusters from a larger set of tools; and (iv) the group will be chosen from a larger population—then on some tasks the best group will not consist of the best problem solvers.

The role of problem difficulty merits emphasis. High-ability people can solve moderately difficult problems. On those tasks diversity bonuses do not exist. Only when the high-ability people cannot solve the problem do diversity bonuses materialize.

Iterative Improvements

The second type of problem solving I consider consists of improvements in existing best practices such as when teams try to improve production processes, increase worker safety, lower costs, increase brand loyalty, or raise graduation rates. In these contexts, solutions can be assigned numerical values and better solutions score higher.

To model these iterative improvements, we can represent a person as being a collection of ideas. If the problem involves increasing production in a canning facility, a person may apply lean management techniques and consistently find solutions that increase production by between 5 percent and 7 percent. A second person might rethink the entire process from scratch. Half of the time, this approach fails. The other half of the time, production increases by 8 percent.

In this model, a person’s ability equals the expected value of her solution. The first person would have an ability of 6 percent and the second person an ability of 4 percent. We are abstracting away from repertoires here, but if two people had the same repertoire, they would have the same distribution and, perhaps, come up with the same solution.

Once again, we can imagine a large set of problem solvers from whom we want to choose a group. We can order these problem solvers by their abilities. First assume that the ability of the group equals the expected value of the best solution found by some member of the group. Notice that high-ability people and high-ability groups differ. High-ability people, on average, produce good solutions. High-ability groups, in contrast, must include at least one person who generates a great solution.

It follows that when putting together a group, a person’s ability matters less than the probability that she comes up with a great solution.48 The expected ability of a group that consists of ten people who produce solutions with values between five and seven cannot exceed seven.49 A group consisting of ten people who each have a 50 percent chance of producing a solution of value eight will produce a solution of value eight more than 99.9 percent of the time.
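The 99.9 percent figure follows from a direct calculation: assuming the ten attempts are independent, the group fails to produce a solution of value eight only if all ten people fail, so the probability of success is

$$1 - 0.5^{10} = 1 - \tfrac{1}{1024} \approx 0.999.$$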

Therefore, we should seek people with the potential to generate great solutions, not people who do well on average. Ability is not the correct criterion for selecting group members. The optimal criterion evaluates people by their likelihood of finding high-value solutions.50

Implicit in this construction is that one person’s probability of finding a high-value solution is not correlated with another person’s. Thus, the best groups consist of people who produce high-variance solutions using diverse cognitive repertoires. If people applied the same knowledge, heuristics, and mental frameworks, their solutions would be correlated.51

Combining Heuristics

The toolbox model and the distribution model lack any interaction among group members. Each person applies her tool or proposes her solution, and the group’s performance equals that of the best group member. That construction ignores the possibility of combining ideas. If two people generate different improvements, both can be applied.

This accumulation of feature improvements is common. Remaining competitive, be it in smartphones, lawn mowers, or shoes, requires constant innovation. The iPhone 6 added Apple Pay, image stabilization, higher-quality video recording, more sensors, and Wi-Fi calling. The 2016 John Deere Signature Series mower added a new mower deck with lift-off spindle covers, a backlit electronic instrument panel, and a tachometer.

We could assume that improvements are additive: that two ideas that each improve output by 4 percent improve performance by 8 percent. We might instead assume that each improvement reduces the impact of the others. Or the improvements could be superadditive: one improvement could make the other more valuable. Improved image stabilization and higher-quality videos may be of more value together on an iPhone than separately. In this last case, the bonuses would be even larger.
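In symbols, writing $v(\cdot)$ for the value of a set of improvements (notation introduced here only for illustration), the three cases for two improvements $A$ and $B$ are $v(\{A,B\}) = v(A) + v(B)$ (additive), $v(\{A,B\}) < v(A) + v(B)$ (each improvement reduces the impact of the other), and $v(\{A,B\}) > v(A) + v(B)$ (superadditive).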

The larger point is that recombination turns combinations of bonuses into more bonuses. The size of those bonuses will depend on the context. We should not expect superadditivity, nor should we rule it out.

The results from these three models show that diversity bonuses can exist and have significant magnitude in problem solving. Each model includes a set of parameters and assumptions. In the toolbox model, these are the number of tools, the probabilities that they solve the problem, the size of groups, and the number of tools a person can master. In the distribution-based model, the assumptions consider how the values are combined—whether the group’s solution is the best individual answer or a nonlinear combination of those answers.

The importance of diversity in each model depends on the assumptions. It is conditional. In the toolbox model, for the best group not to consist of the best individuals, the problem must be difficult (no one solves it for sure) and the highest-ability individuals cannot know every tool that someone else knows (a diversity assumption). In the distribution model, the value of diversity increases as the way individual solutions combine becomes more nonlinear.

SCIENTIFIC RESEARCH: REPRESENTATIONS + MODELS

Scientific researchers apply multiple parts of their repertoires. They develop hypotheses based on background knowledge. They bring novel information to bear. They borrow and develop heuristics to find solutions and improvements. They apply various perspectives and categories to organize data.

As an exercise, you can select almost any academic paper or patent that has had impact and you can find evidence of diversity bonuses. I do not recommend this for those who struggle with academic jargon or prose. Reading these papers is like walking into a French movie an hour after it has started.

In preparing this book, I performed this exercise about half a dozen times. Each time I could find support for a diversity bonus. By that I mean that I could see how a new perspective, heuristic, or model resulted in new knowledge. One such paper, written by Bell Labs scientists, contributed to the field of physical chemistry. The paper studies well-mixed chemical systems. Cream stirred into coffee is well mixed. Chocolate syrup swirled on top of the coffee is not.52

I knew nothing about those systems, so I did some background reading. In one background paper, the author referenced the “well known Brusselator limit cycle model.” The French movie now seemed a walk in the park by comparison.53

I learned that at the time there existed two approaches to studying well-mixed systems. One approach models the rate of chemical change using mathematical equations. An equation might describe the change in the amount of a chemical as increasing by five units per second or decreasing at a rate equal to fourteen divided by the number of seconds. These mathematical representations reduce a complex process to a deterministic set of flows.54 The deterministic approach models chemical systems as smooth. The levels of the various chemicals are represented by curved lines on graph paper.

The second approach, made possible by computers, simulates chemical processes as stochastic—that is, random—processes. This approach allows the modeler to include local correlations and fluctuations. Instead of a single, smooth prediction, it produces a distribution of jagged predictions. Each graph resembles a sequence of stock prices over time.

The two types of models give different insights. In this case, I had identified a bonus before getting through the paper’s abstract. As I dug into the paper, I learned that computational constraints limit the precision of the simulations. That constraint led to a creative solution.55

The relevant sentence from the abstract reads as follows: “The motion of isolated adatoms and small clusters on a crystal surface is investigated by a novel and efficient simulation technique. The trajectory of each atom is calculated by molecular dynamics, but the exchange of kinetic energy with the crystal lattice is included through interactions with a ‘ghost’ atom.”

Set aside academic jargon and focus on the “ghost” atom. That constitutes the main contribution. No ghost atom exists. It serves as a placeholder to represent a population of atoms on the surface. Assuming the ghost atom makes the model calculable. The authors thought of the entire crystal lattice as a single atom. That reduced an enormous number of equations to one.

How they came up with this ghost atom is a mystery. I contacted one of the authors and he could not recall, though he did say that it was a new idea at the time. That should not be surprising. When problems become intractable, we pull from our repertoires of information, knowledge, models, tools, and representations in search of a new idea. When we happen on a solution, we may have no idea of its origin or cause. Some ideas may be more salient to some identity groups than others. Other ideas may be more available to certain professions. We also reason from analogy.

I chose to write about this paper because the ghost atom idea reminded me of a novel representation known to anthropologists as well as a heuristic used by economists. The representation comes from the navigational framework of the Micronesians. They imagine themselves as fixed and think of islands as objects that float past. If they lack islands in the appropriate places, they construct phantom islands defined in relation to their positions with respect to the moving stars overhead. These phantom islands, like the ghost atoms, simplify calculations.56

The economist Leo Hurwicz taught me a similar heuristic. He called it assume what you need. Kenneth Boulding’s more memorable assume a can opener version goes as follows: There is a story that has been going around about a physicist, a chemist, and an economist who were stranded on a desert island with no implements and a can of food. The physicist and the chemist each devised an ingenious mechanism for getting the can open; the economist said, “Assume we have a can opener!”57

Make up a ghost atom, create a phantom island, assume a can opener—these build on the same idea: make up what you need so that the math works. You can then go back and worry about the particulars. The point of this digression is that the process of science involves testing new ideas. Teams of people with diverse repertoires will have more ideas to test and perform better. Later we see that this will be borne out in the data.

INNOVATION: {CREATING, PROBLEM SOLVING} + PREDICTION

Innovation consists of both creative activities and problem solving. Some innovations, like the Slinky and the Internet, are out-of-the-blue ideas. Other innovations, like the near-endless improvement in chip design captured by Moore’s law, result from directed, purposeful problem solving.

In evaluating innovation, we care less about the number of ideas than about the value of the best one. Thus, experts characterize innovation as consisting of two parts: generating ideas (either by creating or problem solving) and then choosing the best from among them. More ideas still help, though. All else equal, a larger set of ideas to choose from increases the odds of having one really good idea.

If we think of innovation as combining a creative task or problem-solving task with a predictive task, and if we recall how diversity bonuses exist for each task, we can therefore conclude that diversity bonuses also exist in innovation. We might even infer a potential triple diversity bonus—one for each part of the task.

A deeper reading of the literature reveals a more complicated picture, as well as richer insights into the contributions of diversity. The first subtlety arises when we realize that the parts of our repertoires that produce the diversity bonuses differ for creating, problem solving, and prediction. The bonuses in the creative and problem-solving tasks result from diverse perspectives and heuristics and from diverse knowledge. The bonuses in the predictive task result from diverse categories, information, knowledge, and predictive models.

We should therefore not expect those people who are good at predicting to be the most creative or the best problem solvers, and they are not.58 Nor should we expect the optimal diverse team for creating ideas or solving problems to be the same as the optimal team for selecting among them. Put simply, as innovation consists of distinct tasks, the people who have ability and who add diversity on each of those tasks may differ. The criteria for putting together a diverse predictive team will differ from those used to identify diverse, creative people or problem solvers.

A second subtlety concerns what constitutes creative search. Originally, the first part of innovation was conceived of as blind search, not unlike the mutations and recombinations that occur in evolutionary systems.59 More recent models assume intentional search in line with the cognitive repertoire model described here.60 The literature distinguishes between deep, foundational search within a discipline or paradigm, more like problem solving, and the more creative recombination of ideas.61

The dichotomy between deep, narrow search and speculative, broader recombination raises a practical question and introduces an attributional conundrum. The practical question is which type of search produces more value, that is, which is the better method for finding breakthrough innovations.

Evidence shows that established firms more often generate high-value innovations through broad search.62 While deep knowledge and narrow search may limit the potential for breakthrough ideas by restricting the set of possibilities considered, they are more likely to produce something entirely new.63 Those new ideas become the building blocks for later recombinations. Thus, it may be unfair to give credit to the recombiners and not to those who originate the ideas being recombined.

Setting aside the credit assignment issue, evidence from patent data reveals that recombination predominates. The United States Patent and Trademark Office distinguishes among utility patents (the light bulb), design patents (the Coke bottle), and plant patents (hybrid corn). It classifies patents within 474 technology classes and over 160,000 technology codes. A study of all patents issued from 1790 to 2010 shows that in the nineteenth century more than half of patents were classified by a single technology code. That percentage has steadily decreased to around 12 percent.64 Over the entire data set, more than three-fourths of patents combine multiple codes.65 While the patent data do not “prove” that diversity bonuses predominate, the data reveal the value of combining diverse ideas and an unmistakable trend toward recombination as a driver of innovation.

The conundrum concerns the relative roles of ability and diversity. A first pass might assign credit for all recombinations to diversity and credit for the deep, novel innovations to ability. That attribution errs in both directions. On the one hand, many recombinations come from the mind of a single talented person. To call this an (internal) diversity bonus would stretch the diversity-bonus logic too far. On the other hand, many of the deep, narrow insights are produced by diverse teams.

KNOWLEDGE INTEGRATION

I next take up knowledge integration. When integrating knowledge, a group’s objective is to determine the veracity of a claim or the wisdom of an action. Unlike in creative tasks, where diversity implies more possibilities, in knowledge integration diversity reduces the set of possibilities. A team of cardiologists considering surgical techniques to improve blood flow needs to know possible complications. Knowledge integration rules out some techniques.

Grahame Knox’s group exercise Lost at Sea teaches the value of diverse knowledge.66 I provide a partial version here so as not to spoil the exercise for others. Lost at Sea describes a scenario in which a group of people abandon a sinking ship and board a lifeboat. They must rank a set of fifteen items in terms of their value for survival. These items include a shaving mirror, rope, shark repellent, and a transistor radio. More than a trillion rankings are possible.

In the first stage of Lost at Sea, each person produces a ranking. For each item, she might think through a collection of possible scenarios and then ask which items would be most useful.67 During a shark attack, the shark repellent or the flare gun would be more valuable than a sextant.

In the second stage of Lost at Sea, people share their knowledge, update their rankings, and come up with a collective ranking. In this stage, the group members combine their diverse knowledge. Items initially ranked as unimportant often move up in the rankings. The large clear plastic sheet earns a high rank when people realize that it could catch rainfall and provide shelter from the wind.

Lost at Sea has a best ranking determined by experts. So, it is possible to score the rankings of each individual after the first stage and to compare their scores to the collective group ranking. This exercise has been run thousands, if not tens of thousands, of times. To the best of my knowledge, groups almost always outperform the average person, and typically outperform any individual.

In Lost at Sea, the integration of diverse knowledge produces deeper knowledge. This also happens in the real world. If one police investigator knows that the killer wore small gloves and another investigator knows that a particular suspect has enormous hands, then by combining their knowledge, they can rule out that suspect. Alone, neither could.

We can formalize that intuition and show the power of diverse knowledge integration using an experiment my students and I developed called the Diversity LSAT.68 The experiment relies on a logic problem like those on the Law School Admission Test (LSAT).

Logic Problem: Boeing, Ford, Alphabet, Molex, and Caterpillar just released annual revenues. Boeing’s revenues exceeded Molex’s. Alphabet did not have the highest revenue among the five firms, though its revenue did exceed Boeing’s. Caterpillar has never had higher revenues than Molex.

Which of the following must be true?

Claim 1: Alphabet had the second-highest revenue.

Claim 2: Ford did not have the highest revenue.

Claim 3: Caterpillar had the lowest revenue.

Solving these problems involves identifying the set of relevant facts and then determining their implications. This particular problem includes four facts:

Fact 1: Boeing > Molex

Fact 2: Alphabet not highest

Fact 3: Alphabet > Boeing

Fact 4: Molex > Caterpillar

In the experiment, we assign each participant one fact. We then ask if anyone can answer the question. None can. The person who knows Fact 1, that Boeing earned higher revenues than Molex, cannot determine if any of the three claims must be true. We then form two groups of size two. One group knows Facts 1 and 2. The other knows Facts 3 and 4. The group that knows Facts 1 and 2 knows that Boeing earned more than Molex and that Alphabet did not have the highest revenues. They remain unable to determine which, if any, of the three claims must be true. Nor can the group that knows Facts 3 and 4.

We then let the first group pick a third member from the second group. If they pick the person who knows Fact 3, the group will know that Alphabet had higher revenues than Boeing, which in turn had higher revenues than Molex. They will also know that Alphabet did not have the highest revenues. They can then deduce that Alphabet ranks above Boeing and Boeing above Molex and that either Ford or Caterpillar (or both) ranks above Alphabet, but they cannot discern the true ranking.

Last, we let all four people work together. The fourth person knows that Molex is ranked above Caterpillar. The group can see that if Molex is ranked above Caterpillar, then Ford, and not Caterpillar, must be ranked above Alphabet. The order now falls into place. It must be the following:

Ford > Alphabet > Boeing > Molex > Caterpillar

Therefore, both Claim 1 and Claim 3 are true.
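A brute-force enumeration makes the mechanism concrete: each added fact intersects with the current set of possible orderings and shrinks it. This sketch is illustrative; it is not the script used in the classroom experiment.

```python
from itertools import permutations

firms = ["Boeing", "Ford", "Alphabet", "Molex", "Caterpillar"]

# Each fact is a test on an ordering given as (highest, ..., lowest revenue).
facts = [
    lambda o: o.index("Boeing") < o.index("Molex"),       # Fact 1: Boeing > Molex
    lambda o: o[0] != "Alphabet",                          # Fact 2: Alphabet not highest
    lambda o: o.index("Alphabet") < o.index("Boeing"),     # Fact 3: Alphabet > Boeing
    lambda o: o.index("Molex") < o.index("Caterpillar"),   # Fact 4: Molex > Caterpillar
]

possible = list(permutations(firms))              # 120 orderings to start
for i, fact in enumerate(facts, start=1):
    possible = [o for o in possible if fact(o)]
    print(f"after fact {i}: {len(possible)} orderings remain")

print(possible)  # only Ford > Alphabet > Boeing > Molex > Caterpillar survives
```

Each added fact shrinks the surviving set until only one ordering remains.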

In problems like this, discerning the truth requires intersecting what each person knows to be possible. Each person’s diverse knowledge reduces the possible orderings. The logic contradicts the intuition that diversity creates bonuses by increasing the number of alternatives. In this example, we assign people different facts. In a real-world setting, people would know different facts. Diverse knowledge therefore can contribute to truth verification.

To see how identity diversity might play a role in truth verification, consider a panel of judges who are determining whether a particular action violates the Constitution. Each judge takes in facts, applies knowledge, and makes a binary decision. If every judge on the panel came from the same identity group, had similar life experiences, and attended the same law school, where they were taught by the same professors, then no matter how much ability each had, they might categorize the case similarly. If the judges had diverse identities, had distinct life experiences, and learned the law at the knees of diverse faculty, we would expect a richer collective interpretation of the law.

STRATEGIC PLAY

Next, I consider strategic contexts that I model as games. In a game, a player’s payoff depends on his or her own actions and on the actions of other players. Payoffs can also depend on random events. Chess is a game. Soccer is a game, as are poker, politics, and competition between firms. The field of game theory analyzes strategic behavior in games. Game theory, for the most part, assumes optimal behavior, so game theorists often restrict attention to games they can solve, that is, simple games. For the class of games in which individuals can deduce or learn optimal actions, cognitive diversity plays little role other than in games that have multiple possible outcomes.

In more complex games such as chess, Go, business competition, political elections, sports, and warfare, optimal strategies have yet to be discovered. Furthermore, each game requires a sequence of actions taken under time constraints. During a time-out, a basketball coach has at most two or three minutes to decide on an action.

These action choices appear to resemble searches in problem solving. In each case, a person makes a choice and gets a payoff. The key difference is that in a game, the payoff also depends on the actions of others. That distinction proves crucial because players not only must learn how payoffs depend on actions, they must also anticipate the actions of their opponents. To do so, a player needs a model of the other players. As with any predictive task, diversity in the models of the other players will improve accuracy.

The relative importance of predicting and problem solving will depend on the game. In some games, the set of possible actions at a given stage may be small, so accurate predictions may matter more than increasing the set of the adjacent possible. For example, a person playing checkers or chess may be deciding among five or six moves. A poker player may decide between folding or calling.

Further complicating matters, in sequential games like Go or chess, each move alters the configuration. The value of a move taken now can depend on how the game unfolds. Taking an action therefore requires a predictive model of future paths of play by the opponent. Different predictive models may advocate different actions. Having a diversity of models confers a strategic advantage.

That advantage can be seen by considering actions in the game of backgammon. Backgammon has a handful of basic strategies. One involves playing offensively, trying to get your pieces around the board and off as fast as possible. Another strategy does the opposite. It holds as many pieces back as possible in the hope of blocking the opponent from getting his pieces off the board. A third tries to form blockades that trap the opponent, and a fourth holds just a few pieces back on the opponent’s home board with the goal of exploiting a later opening.

Early in a game, a roll of the dice may present four possible moves, each associated with one of the possible strategies. Think of four roads diverging in a wood: a player who knew the strategy of the other player and had a computer to simulate billions of trials of possible rolls of the dice for the continuation of the game could calculate the expected winning percentages from each action. Lacking a computer, the best she can do is to make inferences based on short sequences of possible rolls.

In making those inferences, she needs a model of the other player. She must predict whether the other player will take risks or play it safe. Strong players not only have the ability to calculate probability distributions of future rolls, but they also have accurate models of what the other player will do given a roll.

If we imagine a team of players with diverse models, they will be even better at backgammon than an individual. Consider each possible action to be a choice from a set. Assume only one of those actions is best; that is, it maximizes the probability of winning. The ability of a player corresponds to her probability of choosing the best action at each step. A great player may make the correct choice 90 percent of the time. A lesser player may only be correct 80 percent of the time.

Now, suppose two teams of five players compete against each other. One team consists of five people using a single strategy that makes the correct move 90 percent of the time. This team of five will be no better than any one of its members. The other team consists of five players who each make the correct move 80 percent of the time but who use different models. For convenience, assume that each has an independent probability of identifying the correct move. If the second team’s members vote on which move to make, the majority will be correct more than 90 percent of the time (roughly 94 percent).69 The diversity of models makes for a better team.
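The figure follows from a standard binomial calculation, sketched here under the independence assumption stated above: with five independent voters, each correct with probability 0.8, the majority is correct whenever at least three of the five pick the best move, so

$$\binom{5}{3}(0.8)^3(0.2)^2 + \binom{5}{4}(0.8)^4(0.2) + (0.8)^5 \approx 0.94.$$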

The previous example might be dismissed as a straw man. I compared one team of people using identical models to a team with models so diverse that one person’s correctness was independent of another person’s. That construction might appear to favor the diverse team. However, in contexts in which there exist many alternatives, the comparison may well understate the advantage of the diverse group.

Suppose that when the second five-member team votes on a move, they are selecting among tens or hundreds of possibilities. That is true in backgammon, where a roll of doubles can create more than a thousand potential moves. Given the positions of the other player’s pieces, perhaps one hundred of them might be possible and not unreasonable. Thus, when the five people vote on a move, they are choosing among one hundred alternatives. In that case, the winning move need not get three votes. It might get only two votes.

If two people’s models select the correct action, so long as the other votes go to different incorrect moves, the best move will win a plurality. In our example, if we assume one hundred possibilities and an 80 percent chance of finding the best move, this increases the probability that the best move gets selected to above 98 percent. Therefore, the team of five people, each of whom selects the correct action independently 80 percent of the time, now outperforms an individual who selects correctly 97 percent of the time. Evidence from simulations and real-world experiments agrees with the logic. Ensembles of diverse Go algorithms outperform ensembles of better similar algorithms.70

A return to the concept of the adjacent possible clarifies the intuition and helps make the larger point about the value of diverse ways of seeing and thinking. Suppose that the most able people make choices from a similar set of adjacent possibles. Suppose that a more diverse team consists of people with different sets of adjacent possibles. A plurality of the members of a more homogeneous, more able team may well match on the same incorrect choice. That possibility is less likely when people are choosing from different sets of possibilities.

This same logic applies to decisions by committees. In most corporations, board committees help select a company’s CEO. This process consists of identifying a set of candidates and then selecting from among them. While companies hire headhunters who seek out potential candidates, they also rely on their board members to encourage people to apply. A board member’s social network can influence the set of candidates considered. Greater board diversity therefore can increase the size of the set of potential candidates who arise through personal connections. These connections occur across industry—people from telecom know other people in telecom—and across identity groups. The latter are facilitated by formal organizations like the Black Board of Directors Association and several groups promoting women on boards.

When selecting from among those candidates, board diversity can lead to better choices. If board members with similar backgrounds, training, and experiences rely on similar criteria for gauging past performance and employ similar heuristics when conducting interviews, they will be more likely to prefer the same person for CEO. This board might vote 11–0–0 or 10–1–0 among three final candidates. If not diverse, these lopsided votes might be the norm. If board members come from different industries, have different identity backgrounds, possess diverse knowledge, and rely on different models, they will be less likely to agree on their rankings of candidates. Their initial vote among three candidates might be 5–3–3.

A unanimous vote could mean that one candidate clearly dominates the others, or it could signal a lack of diversity. A mixed vote, on the other hand, reveals the existence of diversity. Paradoxically, that could mean a better decision, as it guarantees that not everyone applied the same model. Pushing this logic further, if you never find yourself on the losing side of a vote, then the group cannot be making better decisions than you would on your own. For the committee to be improving choices, you have to be on the losing side sometimes.

The Bonus of Being on the Losing Side

A committee that makes decisions can only be more accurate than a member of that committee if that member is sometimes on the losing side of votes. If not, the member could make every decision on her own and be equally accurate.

THE BUSINESS CASE AS MULTIPLE TASKS

Most business and organizational decisions occur within complex environments and include multiple tasks. A team may predict outcomes, solve problems related to implementation or engineering, generate ideas, and work through logic. Nobel laureate Herb Simon partitions decision making into three stages: information gathering, design, and choice.71 Creativity guru Edward de Bono describes six necessary factors or dimensions involved in careful thinking that he calls thinking hats. These include “facts and figures,” “emotions and feelings,” and “speculative and positive.”72

Juliet Bourke of Deloitte identifies six core dimensions that overlap somewhat with de Bono’s thinking hats.73 Bourke finds that good evaluations consider outcomes, options, process, people, evidence, and risk. Bourke does not believe that each person needs to be adept at each of these. She finds that most people excel at two or three and often ignore the others. Hence, the need for diverse teams.

Bourke describes empirical evidence that businesses ignore any of these dimensions at their peril. She finds that many organizations’ leaderships place too much emphasis on outcomes and options. The 1998 McDonald’s investment in the Chipotle burrito chain provides an example. McDonald’s executives evaluated the potential returns they hoped to achieve (outcomes) and considered alternative purchases (options). Chipotle proved a good buy. McDonald’s earned $1.6 billion on their $360 million investment.

Despite this rate of return, McDonald’s sold their stake. Media accounts describe tension on the people dimension. The founders of Chipotle shared a commitment to organic, locally sourced food. McDonald’s espoused more of an efficiency mind-set.

The loss of cognitive diversity proved costly to Chipotle. Chipotle’s commitment to organic suppliers required elaborate supply chains. McDonald’s had decades of expertise organizing and operating supply chains. Chipotle did not, and that lack of skill in process led to failures. In 2015, Chipotle restaurants were responsible for E. coli poisonings in eight states, a salmonella outbreak in Minnesota, and a norovirus outbreak in Boston, leading to the Internet meme “You can’t spell Chipotle without E. coli.”

Bourke’s framework demands a certain type of cognitive diversity. A decision-making team must take into account all six dimensions (see figure 3.9). Here again, we see the link between complexity and diversity bonuses. On a simple problem, a team might not be needed. A single person could work through all six dimensions. The correct decision as to whether to buy a new delivery van would require asking the following questions: Will the car serve our needs (outcomes)? What other cars might we buy (options)? How will we finance the car (process)? Can all of our employees drive the car (people)? What is the market value for the car if we have to resell it (evidence)? And finally, what’s the worst-case scenario (risk)? By answering these questions, the person has covered all six dimensions.

Strategic choices in more complex environments require teams because no one person could possibly cover all six dimensions. Suppose that the federal government wants to create incentives for more charter schools. They must think about how the outcomes might range between efficiency and corruption, and what types of corruption might emerge. They must think about the possible policy changes. Do they offer grants or subsidies? Do they push to change laws? They have to consider the process by which schools and teachers are accredited. Does this happen at the local, state, or national level? Other process questions concern the allocation of students to schools and the transportation of those students.


Figure 3.9  Expanded Bourke Model

Policy makers would also need deep understandings of the populations being served. How will people be informed of their options? Do they have the capacity and means to make good choices? In making these decisions, policy makers should pull together the best evidence. That evidence may be diversely held. And last, they must consider the risks, not only the political risks but also the potential implications for communities if the experiment fails.

To answer each of those questions would require diverse cognitive repertoires. Some, like the evidence questions, would require information diversity and knowledge diversity. Others, namely the questions pertaining to the responses within communities and the associated risks, would benefit from the types of category and mental model diversity attributable to identity. How will, for example, various racial and ethnic communities respond to the new program? Some of those same questions would benefit from disciplinary cognitive diversity. Economists, sociologists, historians, political scientists, and civil engineers would all bring useful knowledge and mental models.

DOMAIN-SPECIFIC BONUSES

Diversity bonuses can also arise in a broad array of task domains including analytic tradecraft, presidential appointments, venture capital investments, drug discovery, product design, and classroom discussions. Here, I present brief overviews of how bonuses occur in those settings.

Analytic Tradecraft

The practice of analytic tradecraft by the intelligence community has been designed to deal with complexity and limited, vague information. Failures in the intelligence community can often be traced back to not challenging assumptions or information. In the 1950s the intelligence community assumed limited Chinese support of North Korea. In 1973 they assumed that the Arabs would not attack Israel. In 1989, they did not think the Soviets would allow German unification, and in 2003, they thought Saddam Hussein was developing nuclear weapons.74

To overcome errors in the evaluation of evidence (representations and categorizations), the estimation of probabilities (prediction), and perceptions of causality (mental models), tradecraft enforces diversity through the use of devil’s advocates who take the opposite position. Devil’s advocates add value, but they are not a perfect solution. Obliging someone to take the opposing position may not be as effective as including someone who actually holds that position.

On the CIA’s website, a director and senior adviser for cyber security (who is, needless to say, unnamed) writes, “Truth isn’t resident in a single perspective or the product of one mind. To discover it means to come at it from several directions, to question what is seen to be certain that it is what it actually is. The more we question, the more we look, the more we consider, the closer we will get to the wisdom we are being asked to offer.”75

Selecting Candidates

Hiring and nomination procedures also require multiple distinct tasks and multiple possible diversity bonuses. A president selecting a nominee to the Supreme Court might first consider the direction she would like the court to move (outcomes) and then identify a set of potential candidates (options). She will then task a team with gathering evidence about the candidates. This entails digging into past writings and opinions. The president will then interview those who rise to the top. She will think through the nomination process. How will potential nominees deal with the media and perform during confirmation hearings? She and her team will take into account possible responses of the Senate Judiciary Committee and the Senate at large. To be prepared for what might happen during the nomination process, a president may even sound out members of the opposing party. Outcomes, options, evidence, process, people, and risk all enter. Failed nominations, such as Ronald Reagan’s nomination of Robert Bork, might be chalked up to not considering evidence and process. Bork’s record was attacked from the left, and he had a negative affect, which hurt him in the media.

Venture Capital

In evaluating a proposed startup, venture capitalists apply their knowledge of technology and markets to make predictions. They apply mental models of adoption curves. And they apply categories to management teams.76 In the event that an investment turns sour, venture capitalists take a more active role. They negotiate contracts that allow them to intervene in the case of poor performance. At this point, venture capitalists become problem solvers.

Cognitive diversity helps at each stage. Billionaire venture capitalist Steve Jurvetson says, “I’ve actually come to respect the most irritatingly challenging people I’ve worked with as really valuable in improving group decision-making and what to do and what to invest in.”77

Drug Discovery

Drug discovery provides an interesting example of how the accumulation of scientific knowledge changed the types of diversity that produce bonuses. Early drug discoveries often derived from folk medicine.78 The isolation of quinine, an antimalarial chemical compound found in the bark of cinchona trees in the Andes, came about because seventeenth-century European missionaries noticed that the indigenous people would chew the bark as a treatment for fevers. Powdered bark became a popular antimalarial medicine. The French scientists named it quinine, a variant of quina, the indigenous word for bark.

Quinine is not a singular case. According to one study, 119 drugs have been developed from folk medicines.79 That approach to drug discovery has had diminishing returns, as many of the most promising folk medicines have already been analyzed. Some produced useful drugs and leads. Others turned out to function only through superstition.

When scientific knowledge was less advanced, drug discoveries were as likely as not to occur through serendipity. British scientist Alexander Fleming discovered penicillin by chance when a stray mold killed the bacteria growing around it on a culture plate. The image of a scientist experimenting in a lab, guided by horse sense, intuition, and a bit of luck, captures this period of drug discovery.

Modern drug discovery, though it still relies on one part indigenous plants and one part lab-coated scientists, has become more systematic and sophisticated. Scientists still investigate native plants in search of useful compounds, a process known as bioprospecting, but that process is based less on indigenous knowledge. Those naturally occurring compounds, along with synthetic compounds, function as libraries of chemicals. Two decades ago, scientists would explore those chemical libraries compound by compound. Scientific and technological advances now allow chemists to test entire libraries against specific biological targets cloned from human proteins. Think of these protein targets as the pathological structures that cause disease.

These new methods turn the original process of drug discovery on its head. Classical pharmacology first extracts a compound and then identifies the proteins it attacks. Modern (or reverse) pharmacology starts from the protein and then searches for the chemical compound that targets it.
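To make the reversal concrete, here is a minimal sketch in Python. The compounds, protein targets, and affinity scores are all invented, and real screening relies on laboratory assays and far richer models rather than a lookup table; the sketch only contrasts the two directions of search, from compound to protein and from protein to the whole library.

```python
# A toy contrast between the two search directions in pharmacology.
# Compounds, targets, and affinity scores are invented for illustration.

# Hypothetical binding-affinity scores between compounds and protein targets.
affinity = {
    ("compound_17", "protein_A"): 0.91,
    ("compound_17", "protein_B"): 0.12,
    ("compound_42", "protein_A"): 0.08,
    ("compound_42", "protein_B"): 0.77,
    ("compound_99", "protein_A"): 0.05,
    ("compound_99", "protein_B"): 0.10,
}
compounds = ["compound_17", "compound_42", "compound_99"]
targets = ["protein_A", "protein_B"]

def classical(compound, threshold=0.5):
    """Classical direction: start from a compound, ask which proteins it acts on."""
    return [t for t in targets if affinity[(compound, t)] >= threshold]

def reverse(target, threshold=0.5):
    """Reverse direction: start from a protein target, screen the whole library."""
    return [c for c in compounds if affinity[(c, target)] >= threshold]

print(classical("compound_17"))   # ['protein_A']
print(reverse("protein_B"))       # ['compound_42']
```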

In each era of drug discovery, cognitive diversity produced bonuses, though the type of diversity that produced those bonuses changed. Compounds derived from folk medicines tapped into local, indigenous knowledge. This diversity ties to that part of identity connected to geography. Different peoples live in different ecological niches and develop unique knowledge bases.

The relevant knowledge diversity for the scientists in lab coats somewhat blindly searching the library of compounds probably had few ties to identity. The relevant diversity could be measured with respect to the library of compounds. Each scientist developed expertise with different parts of that library. The breadth of their search increased their collective odds of finding useful compounds. Their collective ability derived from their individual talents being pointed in many directions. Hence, Fleming, a brilliant scientist by any measure, described his discovery of penicillin as serendipitous.

Modern drug discovery benefits from diversity in disciplinary knowledge and tools. The lone chemist in the lab has been supplanted by large, interdisciplinary teams consisting of experts in proteomics, protein folding, computational chemistry, and structural biology. These scientists possess diverse training, acquire diverse knowledge bases, and master tools ranging from X-ray crystallography to computational modeling.

Similar trajectories exist in other domains. Initially, diverse local knowledge and understandings, which can depend on identity diversity, may be of central importance. As knowledge accumulates and as understandings become formalized as models, diversity within the core discipline, be it economics, chemistry, or medicine, becomes more important.

Design

Effective design entails understanding the effects of attribute choices. The design of a microwave oven determines its functionality and its aesthetic appeal. The layout of an assembly facility for a manufacturing process affects cost, quality, and safety. The ad placements in a promotional campaign for a US Senate candidate influence voter turnout and support.

Quality design, manufacturing, and marketing involve multiple choices that must have desirable direct and indirect effects. A positive direct effect can be undermined by negative indirect effects. A sturdier composite circular tray that doubles as a serving dish can overburden the lightweight rotating motor in a microwave oven.

Making a physical product requires materials, equipment, people, a work environment, processes, and management. Each step in the process must function well. Much as in the earlier analysis of evaluating an investment, diverse expertise improves performance. A given choice may call for creative thinking, or it may call for problem solving.

When the Mazda Corporation designed the Miata roadster, it articulated a set of desirable features. Mazda wanted a certain feel. It wanted drivers to be able to rest their elbows on the side of the car. That may seem like an easy feature to ensure, but it can be framed as a design problem with many solutions. Engineers could raise the seat height, lower the sides of the car, or do a little of each. Each combination of choices influenced aesthetics and safety (lowering the sidewalls made the car less safe) and ultimately sales.
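A minimal sketch of that solution space, assuming made-up numbers: suppose the elbow rest sits 4 centimeters too low, every centimeter can come from raising the seat or lowering the sidewall, and lowering the sidewall carries an assumed, more-than-linear safety penalty. Nothing here reflects Mazda’s actual figures or process; the sketch only shows that many combinations satisfy the same constraint while differing in their side effects.

```python
# Toy enumeration of the elbow-rest problem. All numbers are invented.
GAP_CM = 4  # assumed: the elbow rest sits 4 cm too low

options = []
for seat_raise in range(GAP_CM + 1):
    side_drop = GAP_CM - seat_raise    # the rest of the gap comes from lowering the sidewall
    safety_penalty = side_drop ** 2    # assumed: lower sides hurt safety more than linearly
    options.append((seat_raise, side_drop, safety_penalty))

for seat_raise, side_drop, penalty in options:
    print(f"raise seat {seat_raise} cm, lower side {side_drop} cm, safety penalty {penalty}")
```

A designer concerned with feel might push toward lowering the sides; a safety engineer would push toward raising the seat. The value of having both in the room is the diversity bonus.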

Selecting a Panel

Consider a producer of a public-interest television show who is selecting six teens to discuss creative solutions to reduce recreational drug use among America’s youth. She wants a knowledgeable, informed panel. To select among thousands of online applicants, she could administer a test of general knowledge about types of drugs and their effects. The producer might find that the top scorers lack diversity. They might all be Latinas living in Foster City, California, who attend the same elite private high school. They may have scored highest in part because in the previous summer they all worked as interns in the same neurobiology laboratory at the University of California, San Francisco. Though individually the best, collectively these six would not constitute the best group for the producer.

Their discussion, informed as it may be, would be narrow compared to that of a group that included students from other regions of the country who belonged to other identity groups. The types of drug use, the forms that social pressure takes, and the opportunities available vary along identity dimensions. When the producer adds a white male who scored lower on the test, she does not sacrifice excellence in favor of diversity. She adds a person with a diverse repertoire who will contribute to a more productive discussion.

To blindly choose those who score best would be to fall victim to the meritocratic fallacy: the belief that the best team consists of the best individuals. Selecting on individual merit makes sense for a four-hundred-meter relay team, but not for a discussion of drug use because the discussion can produce diversity bonuses.
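To see the fallacy in miniature, here is a hedged sketch in Python. The applicants, scores, and topic labels are all invented; the point is only that a rule that selects the six highest scorers can cover fewer distinct topics than a rule that trades a few points of individual score for new knowledge.

```python
# A toy illustration of the meritocratic fallacy: the six highest scorers
# can cover fewer topics, collectively, than a mixed panel.
# Applicants, scores, and topic labels are invented for illustration.

applicants = [
    # (name, test score, topics the applicant knows firsthand)
    ("A", 98, {"stimulants", "prescription misuse"}),
    ("B", 97, {"stimulants", "prescription misuse"}),
    ("C", 96, {"stimulants", "prescription misuse"}),
    ("D", 95, {"stimulants", "prescription misuse"}),
    ("E", 94, {"stimulants", "prescription misuse"}),
    ("F", 93, {"stimulants", "prescription misuse"}),
    ("G", 85, {"opioids", "rural access"}),
    ("H", 84, {"alcohol", "peer pressure"}),
    ("I", 82, {"cannabis", "urban nightlife"}),
]

def coverage(panel):
    """Count the distinct topics the panel can speak to."""
    topics = set()
    for _, _, knows in panel:
        topics |= knows
    return len(topics)

# Rule 1: pick the six highest individual scorers.
top_scorers = sorted(applicants, key=lambda a: a[1], reverse=True)[:6]

# Rule 2: greedily pick whoever adds the most new topics, breaking ties by score.
def pick_diverse(pool, k=6):
    panel, covered = [], set()
    remaining = list(pool)
    for _ in range(k):
        best = max(remaining, key=lambda a: (len(a[2] - covered), a[1]))
        panel.append(best)
        covered |= best[2]
        remaining.remove(best)
    return panel

diverse_panel = pick_diverse(applicants)

print("Top scorers cover", coverage(top_scorers), "topics")       # prints 2
print("Diverse panel covers", coverage(diverse_panel), "topics")  # prints 8
```

Under these made-up numbers, the six top scorers can speak to two topics among them, while the mixed panel covers eight. The greedy rule is only one illustration; the general point is that an applicant’s value depends on who else is already on the panel.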

Admitting Students

Universities understand this distinction between tasks that produce bonuses and those that do not. College cross-country coaches offer scholarships to the runners with the fastest times, whereas admissions officers attempt to admit a student body that is diverse across a variety of dimensions, including identity.

College and law school admissions have been the focus of a series of legal cases relating to identity-based discrimination. Here is another context in which no single test can identify the best group. A law school wants a cohort of students who can best learn to interpret, apply, and adjudicate the law. Acquiring those skills requires an awareness and appreciation of the diverse lives that people lead, as well as of the various activities that compose our social, political, and economic worlds.

No single metric that averages grade points and test scores can identify that group. This is why law school admissions officers look beyond those metrics and consider life experiences, expertise, and identity. They do so to ensure cognitive diversity and a vibrant cohort. A school will admit people with a mix of college majors. To add cognitive diversity, a school may give a leg up to a medical doctor or an environmental engineer.

Given the salience of identity in so many aspects of the law, schools may also consider identity to the extent that it is legally permitted. Society needs lawyers and judges capable of understanding the disparate impacts of rulings in diverse communities. Similar arguments can be made for admission decisions based on socioeconomic diversity. Having an elite 1 percent supply all of our judges would produce a less fair and effective legal system.

The White Whale, Politics, and the Middle East

The value of diversity for businesses like Boeing or LinkedIn can be measured in dollars and market share. Diversity’s contribution to policy can be measured in cost and efficiency. In a classroom, cognitive diversity contributes to discussions and understandings.

Literature provides a good starting point. Great works of fiction can be interpreted through multiple lenses. Various literary scholars have interpreted the great white whale in Herman Melville’s Moby Dick as representing nature, a dragon, male potency, race, evil, the mystery of the universe, and even God.80 A discussion that considered only one of these interpretations would lack the richness of a more comprehensive discussion that includes them all.

When engaging with any work of literature, people will draw inferences, construct analogies, and make connections based on their life experience and background knowledge. Their identities will filter and influence how they interpret images and events. At least two of the whale interpretations connect to identity dimensions: seeing the whale (a sperm whale) as representing male potency and as representing the white race.

Identity and cognitive diversity also broaden and deepen discussions of politics. Diversity, in fact, may constitute the fundamental problem of politics. Jack Knight and James Johnson, two leading contemporary political theorists, write, “Any imaginable human population is heterogeneous across multiple, overlapping dimensions, including material interests, moral and ethical commitments, and cultural attachments. The most important implication of this diversity is that disagreement and conflict are unavoidable.”81

Imagine two classrooms discussing the Black Lives Matter movement and the shootings of unarmed African Americans. The first classroom contains predominantly upper-class suburban white and Asian American students. Those students classified as African American are recent immigrants. The second classroom contains students with a more diverse mix of racial identities. It includes poor students. It includes students who live in cities and students from rural areas. It includes students with family members in jail.

Assume that in each classroom, the instructor creates a safe space in which all students openly share their opinions and feelings. We should expect that the students in the second classroom will discuss more dimensions of the movement, will introduce more perspectives, and will produce a more complex discussion.82

Or consider a class that discusses the Middle East. Most universities’ populations include Muslim students, Jewish students, and evangelical Christian students. Each identity group brings a different perspective on policies and actions in the region. Provided these students feel safe sharing their opinions, the diversity of knowledge, information, and mental models that they bring to classroom discussions results in deeper and broader understandings.

SUMMARY

In this chapter, I have shown how the logical case for diversity bonuses can be constructed by connecting repertoires to outcomes on specific tasks. Diverse categories and mental models improve predictions. Diverse representations and heuristics improve problem solving. Diverse perspectives and categories lead to more adjacent possibles and make groups more creative. Diverse information and knowledge improve a group’s ability to verify the truth.

In all of these domains, the right type of diversity can improve outcomes. On complex tasks, the best team will not consist of the best individuals. Teams need diversity. Diversity, though, is no panacea. Only rarely will the best team be maximally diverse. Most often, the best team will balance individual ability and collective diversity.

The relative importance of diversity depends on the corpus of relevant perspectives, knowledge, heuristics, models, and information. If there is much that can be applied to the task, diversity becomes more important. It follows that as we reach the frontiers of any discipline, we should seek diversity. The search for diversity can spur us to look across disciplines. Chemists look to physicists to better understand chemical structures. Ecologists turn to mathematicians to better understand niche dynamics. Economists turn to psychologists and neuroscientists to construct more accurate models of people.83

On many of the challenges we face today, we need diversity to span disciplines. As already noted, America’s obesity epidemic falls into multiple disciplinary buckets. People may have biological predispositions that are exacerbated by abundant opportunities to choose fattening, unhealthy food. The current transportation infrastructure and zoning laws force more of us into cars and fewer of us onto sidewalks. No single cure exists. Making progress on the obesity epidemic will require thoughtful interventions based on input from doctors, marketers, public health professionals, sociologists, economists, and engineers. Finally, on whatever challenge we face, be it improving educational outcomes or selling running shoes, if our identity differences correlate with relevant knowledge bases, understandings, and models, then the logic demonstrates the value of identity diversity.

These models reveal logical truths. They do not guarantee that in the world of people, diversity bonuses will always arise. By revealing logical truths, models enable us to interpret and structure empirical data. More importantly, they delineate the routes we must follow and the behaviors we must adopt to achieve bonuses. To reorder da Vinci’s claim, the models provide a compass and rudder.
