7
CAN HISTORY BE TRUSTED?

The financial media are constantly comparing current market behavior with historical periods. But what exactly can an investor learn from financial market history? That turns out to be a difficult question to answer, and the answer depends on three key concepts: data mining, stationarity, and model specification. Although the concepts may not be familiar, there is good news: all three, but particularly the first two, have applications to random events in other walks of life.

Remember that back in Chapter 1 we concluded that there cannot be any clearly demonstrable historical patterns in stock returns that can be exploited to beat the market. If there were, sophisticated investors armed with the latest computers and software would find them, exploit them, and thereby eliminate them. So if history is to provide some clues to the future performance of asset prices, those predictions cannot be simple, obvious ones. And that is where data mining, stationarity, and model specification enter the picture. Before historical data can be properly interpreted, the potential impact of all three must be considered. The best way to see why is to start with one of the most famous and controversial alleged patterns in historical stock market returns: the small firm effect.

In 1981, as part of his PhD dissertation, Rolf Banz discovered an interesting phenomenon. What he called “small stocks,” that is, the stocks of companies with small market capitalizations, had produced outsized returns relative to the predictions of the capital asset pricing model (CAPM) over the period from 1926 to 1981. The results were so dramatic and controversial that Banz's finding launched dozens of follow-up studies. Nonetheless, to this day, a debate still swirls around the small firm effect. The focus of that debate is on how to interpret the results reported by Banz and those who followed. Four competing interpretations have been offered, of which three involve the concepts mentioned at the outset.

The first interpretation is that the small firm effect is what it appears to be – a way to earn superior risk-adjusted returns by investing in the stocks of small capitalization companies. Proponents of this interpretation propose that behavioral or institutional factors dissuade investors, including sophisticated institutions, from buying small stocks. The Banz study simply uncovered a property of markets that was always there. According to this interpretation, the small firm effect, like the law of gravitation, is an enduring feature of asset pricing.

The second interpretation is that the small firm effect is an artifact of inadequate risk measurement. Recall from the chapter on risk and return that a large body of academic research indicates that the CAPM is too limited – there are risk factors other than the market. It is possible that smaller firms are riskier than average when risk is measured properly. If that is so, there is no small firm effect, there is just the proper risk–return trade-off. This is the position taken by Eugene Fama and his colleague Kenneth French. In a widely cited academic paper, professors Fama and French develop an asset pricing model that includes size as a risk factor. Perhaps not surprisingly, that model predicts that there is no size anomaly that can be exploited to beat the market. The size premium is just a risk premium. According to this interpretation, Prof. Banz's original study, which used the CAPM, suffered from model misspecification. Had he used a proper model, he would never have claimed there was a small firm effect in the first place.

A third interpretation is that although the small firm effect characterized stock pricing during the period Prof. Banz studied, it is no longer true today. This is an example of non-stationarity. During the earlier period the average return on small stocks was higher than the market average, but something changed in the marketplace so that the effect disappeared.

Proponents of this view argue that once the small firm effect was discovered, forces arose to eliminate it. First, if small stocks were consistently underpriced relative to their risk, as Banz's data seemed to imply, clever entrepreneurs would start investment firms designed to exploit the mispricing. In the case of the small firm effect, that is exactly what happened. Rex Sinquefield and David Booth, two students of Eugene Fama, started Dimensional Fund Advisors (DFA), in part to exploit the small firm effect. In the years since its founding, DFA has become an immensely successful firm that now has hundreds of billions of dollars under management.1 Although it was the largest such company, DFA was not the only firm to attempt to exploit the small firm effect. According to the third interpretation, the increased demand for small company stocks by firms like DFA drove up their prices and thereby at least partially offset the small firm effect.

Second, if small firms had a higher cost of equity capital (remember the cost of capital for a company is the expected return for investors), then value could be created out of thin air by combining small firms. Put two small firms together, simply by forming a conglomerate without changing any of their operations, and you get a bigger firm by definition. You might think the bigger firm would be worth just the sum of its parts because the combination did not change either of the underlying businesses, but that is not correct if the small firm effect holds. Because the combined firm is bigger, the small firm effect says that its cost of equity capital (the discount rate) is lower. This reduction in the discount rate produces an added boost in value, causing the combined firm to be worth more than the sum of the parts. Continuing in this fashion, value can be continually created by rolling up small firms into a big conglomerate. That process, however, would increase the demand for small firms and drive up their prices, ameliorating the small firm effect.

Adherents of this third interpretation claim that because of the offsetting activity it caused, the small firm effect has become an historic relic. You would think that this would be an easy claim to test. Prof. Banz's work was published in 1981 and was limited to U.S. data. Therefore, by using more recent data and data from other countries, it would be possible for researchers to determine whether the small firm effect has endured. Unfortunately, the newer and international data are ambiguous. Most recent studies show that the small firm effect, if it exists at all, is smaller than that documented by Prof. Banz. Whether or not the effect has disappeared entirely remains a subject of dispute.

The fourth interpretation is that there was never a small firm effect in the first place. Even a series of random numbers will have apparent oddities. If enough researchers all look at the same historical data set they will find oddities that are due to random fluctuations and that have no economic meaning. They are simply an artifact of data mining. The small firm effect, according to this interpretation, is just one example.

Notice that the fourth interpretation, like the third one, implies that the small firm effect should not be found in more current data. According to the third interpretation, based on non-stationarity, there should be no current evidence of the small firm effect because the world has changed. According to the data mining interpretation, there should be no current evidence of the small firm effect because there never really was one in the first place.

Before considering each of the three issues of data mining, stationarity, and model specification in turn, there is one more thing regarding the small firm effect that is worth noting. In Chapter 6, we pointed out that bid-ask spreads are much larger for small, particularly very small, market capitalization stocks. This makes it difficult to calculate the actual profitability of trading small stocks. The profits that appear on paper may not be achieved in practice. DFA has developed sophisticated trading procedures to reduce the cost of trading small stocks. However, those procedures require DFA's size and market connections. They would not be available to the average investor. So even if there is a vestige of the small firm effect, its exploitation may be limited to firms like DFA.

Putting aside concerns about the costs of trading small stocks, we now consider each of the three issues that make interpreting historical financial data so difficult.

DATA MINING

Randomness is a slippery concept. Often people think of a random data set as one that has no patterns. For instance, they would conclude that a list of random digits would not have runs in which one digit is repeated. That is not the case. Ironically, the failure of people to appreciate this property of random numbers has been used to look for corruption in government. In order to hide corrupt activities, government officials sometimes have to forge economic data. To avoid making it appear that the books have been cooked, the corrupt officials often need to use random numbers. Those most familiar with statistics are wise enough to use a random number generator, but others simply write down what they think is a random sequence of digits. When they do so, they frequently fail to include enough patterns, such as runs of repeated digits, in the forged data. Their failure to understand randomness leads to their downfall.

A great example of a pattern in random numbers is the expansion of Pi. As a young man, the famous physicist and bon vivant Richard Feynman would reel off the first 768 digits of the expansion, the last six of which are 9-9-9-9-9-9, and then say “and so on” before breaking into laughter. His joke became so well known that the 763rd digit of Pi has become known as the Feynman point. Of course, Pi has been expanded to trillions of digits without any pattern being detected. Like any random series, the expansion of Pi has instances of apparent patterns. Without those patterns, it would not be random!
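
To see how common such runs are, consider a small experiment. The Python sketch below is our illustration, not part of the original text; the sequence length of 768 simply echoes the Feynman story. It generates random digit strings and measures their longest runs of repeated digits. In a typical run, roughly half of the random strings contain at least one digit repeated four or more times in a row – exactly the kind of “pattern” a naive forger would be afraid to write down.

```python
import random

def longest_run(digits):
    """Length of the longest run of identical consecutive digits."""
    best, current = 1, 1
    for prev, nxt in zip(digits, digits[1:]):
        current = current + 1 if nxt == prev else 1
        best = max(best, current)
    return best

random.seed(42)
trials = 10_000
# Each trial mimics "reciting" 768 random decimal digits.
runs = [longest_run([random.randint(0, 9) for _ in range(768)])
        for _ in range(trials)]

print("average longest run:", sum(runs) / trials)
print("share with a run of 4 or more:", sum(r >= 4 for r in runs) / trials)
```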

There is one more Feynman story that shows how history can trick people into finding meaning in random events. As an undergraduate studying in his room, Feynman had an intense premonition that his grandmother had died. At that very moment another student shouted, “Feynman, you have a phone call.” (The residence hall at MIT had only one phone back then.) Feynman headed toward the phone dreading what he was about to hear, only to learn the call was from another student saying that he had left his book in the classroom. Feynman thought to himself, people must have premonitions like this all the time. Most of the time a premonition passes and nothing happens, so it is forgotten. But in a few rare cases, solely by chance, the premonitions are fulfilled. For instance, what if the call had been to say his grandmother had died? To many people, such an experience could be transformative – something they would never forget and often repeat. To Feynman, fulfilled premonitions were a perfect example of data mining. With enough people having enough premonitions, there were certain to be some remarkable coincidences. The same is true of financial markets. With enough researchers performing enough studies, they are almost certain to find a variety of apparent market anomalies.
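
The scale of the problem is easy to demonstrate with a simulation. The sketch below, in which every number is hypothetical, generates monthly returns for a thousand strategies whose true excess return is zero and counts how many clear the conventional statistical hurdle of a t-statistic above 2. By construction none of them works, yet by chance alone several dozen will typically look “significant.”

```python
import random
import statistics

random.seed(0)

n_strategies = 1_000   # hypothetical number of independent backtests
n_months = 480         # 40 years of monthly excess returns
sigma = 0.04           # monthly volatility; the true mean return is zero

significant = 0
for _ in range(n_strategies):
    returns = [random.gauss(0.0, sigma) for _ in range(n_months)]
    mean = statistics.fmean(returns)
    std_err = statistics.stdev(returns) / n_months ** 0.5
    if abs(mean / std_err) > 2.0:   # the conventional t-statistic hurdle
        significant += 1

print(f"{significant} of {n_strategies} worthless strategies appear 'significant'")
```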

The best way to check for data mining is to repeat an experiment. This is commonly done, for example, in the testing of new drugs. If the effect in question recurs in several independent trials, it is unlikely to be an artifact of data mining. Unfortunately, in the case of financial market history, we only get to watch the movie once. There is only one historical record of security price behavior. The fact that we now have evidence on thousands of stocks for almost a century in the United States is a two-edged sword. The large sample increases the power of the statistical tests that can be performed, but it also increases the number of anomalies that will be uncovered by extensive data mining. What is needed is a new data set.

The best solution is to wait another century for nature to provide a new data set. That option, however, is not appealing to current investors. A more feasible alternative is to use heretofore unexamined international data. Although sample periods are shorter, there are fewer subject companies, and the data are usually not as clean, international data can provide an independent test of hypotheses. The problem is that with the explosion of global financial databases, unexamined data are becoming scarce.

Another alternative, proposed by Prof. Campbell Harvey, is to use a higher cut-off for determining what counts as a true investment anomaly rather than a random fluctuation. The standard statistical tests for an anomaly assume that the study under consideration is the only relevant one. But if there are hundreds of researchers and investors doing thousands of studies, that criterion is clearly violated. The problem is that even a higher standard does not solve the problem entirely. Some data-mined results will still get over the hurdle, and those that do will attract inordinate attention.
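
One simple, if conservative, way to formalize the idea is the classic Bonferroni correction, which divides the tolerated false-positive rate by the number of tests performed. The snippet below is our illustration; the number of tests is a hypothetical round figure, and Prof. Harvey and his coauthors' actual recommendation is a hurdle of roughly t > 3. It shows how quickly the required t-statistic rises once multiple testing is taken seriously.

```python
from statistics import NormalDist

n_tests = 1_000        # hypothetical number of backtests run by the profession
family_alpha = 0.05    # tolerated chance of even one false discovery

# Bonferroni correction: each individual test must clear alpha / n_tests.
per_test_alpha = family_alpha / n_tests
t_hurdle = NormalDist().inv_cdf(1 - per_test_alpha / 2)

print(f"per-test p-value hurdle: {per_test_alpha:.0e}")
print(f"required t-statistic: {t_hurdle:.2f}")   # roughly 4 rather than 2
```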

NON-STATIONARITY

Let's get the mathematical jargon out of the way upfront. Formally, a stationary stochastic process is a stochastic process whose joint probability distribution does not change when shifted in time. Consequently, parameters such as the mean and variance, if they exist, also do not change over time. Non-stationarity should not be confused with unpredictability. All random processes are unpredictable. If a process is non-stationary, however, even the parameters of the random distribution cannot be estimated with confidence.
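
For readers comfortable with notation, the definition can be stated compactly (the formalism below is our gloss, not essential to the argument): a process $\{X_t\}$ is strictly stationary if, for all times $t_1, \ldots, t_k$ and every shift $h$,

$$
(X_{t_1}, X_{t_2}, \ldots, X_{t_k}) \;\overset{d}{=}\; (X_{t_1+h}, X_{t_2+h}, \ldots, X_{t_k+h}),
$$

that is, shifting the whole window in time leaves the joint distribution unchanged. The weaker notion used in most empirical work requires only that the mean $\mathrm{E}[X_t] = \mu$ and the covariances $\mathrm{Cov}(X_t, X_{t+h})$ do not depend on $t$.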

Admittedly the formal definition does not mean much to those who have not studied statistics and it could be dispensed with if non-stationarity were not such an important issue. Fortunately, there is a more intuitive way to describe stationarity.

Consider a jug containing red, white, and blue balls. The jug is shaken up, a ball is drawn at random, and its color noted. Then the ball is put back in the jug and the process is repeated. Though it is unknown at any step what color ball will be drawn, the probability of drawing a red, white, or blue ball remains unchanged. There is no distinction between the first draw and the tenth draw or twentieth draw. Each replication of the experiment is effectively identical. That is what is meant by a stationary process.

It turns out that the sequence of digits in Pi discussed above is, as far as we know, a stationary process. It cannot be distinguished from a process of putting balls numbered 0 through 9 in a jug and repeatedly drawing a ball with replacement. The sequence of drawn digits is random, but the process by which the digits are selected is unchanging.

Now, suppose that in the middle of the experiment a second jug is suddenly introduced with a different combination of balls. If the next draw is from this second jug, the probabilities that applied to the first jug are no longer accurate. The nature of the random process has changed. It is no longer stationary.

If there are only two jugs, there is what can be called a limited degree of non-stationarity. By simply redefining the procedure for drawing balls, a new stationary process emerges that involves two steps. At the first step, one of the two jugs is randomly selected. At the second step, a ball is drawn from the chosen jug. As long as this procedure is followed the new process, though more complicated, is stationary. In some situations it may be possible to discern this type of more complicated stationary process.
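
A short simulation makes the two-step process concrete. The jug contents below are invented for illustration. Because the procedure itself never changes, the long-run color frequencies settle down to fixed values – which is precisely what stationarity requires.

```python
import random

random.seed(7)

# Two hypothetical jugs with different mixes of ten balls each.
jugs = {
    "A": ["red"] * 5 + ["white"] * 3 + ["blue"] * 2,
    "B": ["red"] * 1 + ["white"] * 4 + ["blue"] * 5,
}

def draw():
    """The two-step process: first pick a jug at random, then a ball."""
    jug = random.choice(list(jugs))
    return random.choice(jugs[jug])

draws = [draw() for _ in range(100_000)]
for color in ("red", "white", "blue"):
    print(color, round(draws.count(color) / len(draws), 3))
```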

The balls and jugs analogy is useful for conceptualizing differing degrees of non-stationarity. The important questions include: How many jugs are there? How is the jug from which the ball is to be drawn selected? What is the distribution of balls within each of the jugs? In the limit, think of the case where there are an immense number of jugs, the contents of which are unknown, and where the probabilities of selecting a given jug are also unknown and may be changing over time. This limiting case we refer to as fundamental non-stationarity. Although this may seem like an extreme case, we argue that it is a problem that investors face on a daily basis. When it comes to investing, fundamental non-stationarity is not a rarity, but the normal state of affairs. If that is so, what we can learn from market history is limited.

Non-Stationarity and the Small Firm Effect

Perhaps the most important parameter of the random process that describes stock returns is its mean value. If the mean is changing, the process is non-stationary. That implies that past averages cannot be used to reliably predict future expected returns. The third interpretation of the small firm effect makes the argument that the mean is changing. Early on, when Prof. Banz did his study, the mean had a higher value. After discovery of the effect brought arbitrage into play in the form of companies like DFA and roll-ups, the mean declined. As a result, historical results such as those reported by Prof. Banz could not be relied upon as a predictor of future average returns for small stocks.

Non-stationarity is far more pervasive than the small firm effect. It is likely to affect every aspect of financial markets. Consider, for instance, the case of Germany. Germany began the twentieth century as a monarchy and one of the richest countries in the world. It suffered a disastrous defeat in the First World War. In the years following the war, the Weimar Republic collapsed under the weight of onerous war reparations, soaring inflation, and depression, which led to the rise of Hitler and the Nazis. That led to another disastrous war and the division of the country. Following reunification in 1990, Germany went on to become a stable democracy with the most powerful economy in Europe. The thought that the random processes driving the returns on financial assets would be unaffected by such momentous social changes is clearly nonsensical.

Although Germany may be an extreme example, countries around the world are constantly experiencing social, political, and economic changes that are almost certain to affect the behavior of financial markets. That is why we believe that securities prices are fundamentally non-stationary. If we are correct, any attempt to project the past behavior of market prices into the future must proceed with great caution and skepticism.

What is true at the level of countries is also true at a more micro level. With regard to individual stocks, language is an impediment to appreciating the full extent of potential non-stationarity. For instance, throughout its corporate life Apple has always been called Apple, but the company has reinvented itself numerous times.2 In the last three decades, Apple transformed itself from a start-up maker of personal computers into a global consumer product and services powerhouse, despite having several brushes with insolvency along the way. While it is conceivable that the process for stock returns remained stationary while the company was continually transformed, it would be foolhardy for an investor to assume that the dramatic evolution of the company had no impact on the expected return and the expected risk of Apple's stock.

MODEL SPECIFICATION

We stated at the outset of the book that “beating the market” is defined as earning risk-adjusted returns greater than those on the market portfolio. As a result, deciding whether investment performance is superior depends on the model used for measuring risk. The second interpretation of Prof. Banz's results is that they are an artifact of using an inappropriate model, namely the CAPM, to measure risk. When more appropriate models are employed, such as the Fama–French three-factor model, the risk-adjusted returns on small stocks are seen to be in line with the market because their risk is greater than the CAPM implies.

There is nothing special about the small firm effect with regard to the issue of model specification. Investors and academic researchers are constantly searching for “anomalies” in asset prices that can be used to beat the market. Suggestions for such anomalies include the low-variance effect, the value effect, betting against beta, the January effect, and so on. In the case of each of these anomalies, there is a dispute about model specification. Remember that academic researchers claim to have found 316 different risk factors. Depending on which of those risk factors are included in a model, an effect, say the low-variance effect, can be interpreted either as a market-beating anomaly or as a risk premium. To date, this issue remains largely unresolved for many alleged anomalies.

INTERACTION

In the case of the small firm effect, we treated the three issues – data mining, non-stationarity, and model misspecification – as if they were separate. In fact, these issues typically interact. It is likely that data mining was one reason that Banz found such a strong small firm effect in his original study. It is also likely that non-stationarity played a role as arbitrage came into play. Finally, the small firm effect may have been overstated because the CAPM did not provide the proper benchmark. It is the interaction of all three issues that makes historical financial data so devilishly difficult to interpret.

SMART BETA AND FACTOR PREMIUMS

We noted above that if size were related to risk, the size effect may be a risk premium. Even if it is, there may be investors who are willing to bear the risk of investing in small company stocks in order to harvest the risk premium. In this respect, size is not alone. Each of the hundreds of potential risk factors has its own potential risk premium. The strategy of trying to harvest those premiums has come to be called “smart beta” because there is a beta associated with each of the risk factors. Smart beta investing has become popular among institutional investors who are attempting to improve performance by harvesting the risk premiums. To decide which risk premiums to exploit, investors calculate their historical averages. This raises the question of whether estimates of historical premiums are reasonable forecasts of future premiums. Remember that researchers went on a decades-long search for risk factors. That suggests that data mining could be a serious problem. It is also unclear whether the factors that were discovered are enduring rather than a transitory characteristic of a past period. In other words, non-stationarity may be an issue. For instance, research by Arnott, Beck, Kalesnik, and West suggests that the factor premiums are not true risk premiums, but are instead artifacts of rising prices for the stocks associated with those factors. As we write this, the entire issue is unsettled. That is further evidence of how difficult historical financial data are to interpret.
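
Part of the difficulty is purely statistical: even a long history pins down an average premium imprecisely. The back-of-the-envelope calculation below uses hypothetical round numbers – a 3% annual premium and 15% annual volatility, not estimates from any particular study – and shows that even fifty years of data cannot statistically distinguish such a premium from zero.

```python
# Precision of a historical factor-premium estimate.
# Both figures below are hypothetical round numbers, not estimates
# taken from any particular study.
years = 50
premium = 0.03       # assumed true annual factor premium
volatility = 0.15    # assumed annual volatility of the factor return

std_err = volatility / years ** 0.5     # standard error of the sample mean
print(f"standard error of the {years}-year average: {std_err:.1%}")
print(f"t-statistic of the premium: {premium / std_err:.2f}")   # about 1.4
```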

ASSESSING INVESTMENT MANAGEMENT PERFORMANCE

Perhaps the most common use of historical data is assessing the performance of active money managers. One thing that few investors appreciate is that even when superior performance exists, it may be difficult to identify because of the random fluctuations in asset prices. As an example, consider a simple thought experiment. An investor is presented with two coins. One is fair and has a 50% chance of landing heads. The other is biased and has a 60% chance of landing heads. The investor is given one of the coins. How many times must the coin be flipped for the investor to be 95% confident which one it is? When people are asked this question, typical answers fall in the range of 10–50 flips. The actual answer is 143 flips.
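
The order of magnitude can be checked by simulation. The sketch below is our illustration and takes one natural reading of the question – a Bayesian investor who starts at even odds, updates after every flip, and stops once 95% confident of either identity. The exact count depends on the stopping rule assumed, but under any reasonable rule the answer is well over a hundred flips, nowhere near the typical guess.

```python
import math
import random

def flips_until_confident(p_true=0.6, threshold=0.95, rng=random):
    """Flip the biased coin repeatedly, updating the posterior odds that it
    is the biased one, until either identity reaches the threshold."""
    log_odds = 0.0   # log of P(biased) / P(fair), starting from even odds
    flips = 0
    while True:
        flips += 1
        heads = rng.random() < p_true
        # Likelihood ratio of one flip: biased coin vs. fair coin.
        log_odds += math.log(0.6 / 0.5) if heads else math.log(0.4 / 0.5)
        posterior = 1.0 / (1.0 + math.exp(-log_odds))
        if posterior > threshold or posterior < 1 - threshold:
            return flips

random.seed(1)
trials = [flips_until_confident() for _ in range(2_000)]
print("average flips needed:", round(sum(trials) / len(trials)))
```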

Virtually all final asset holders, from mutual funds to pension funds to sovereign wealth funds, hire professional managers to make investment decisions for them. As a result, they have to decide how to evaluate managers and when to hire or fire them. Academic research reveals that in making those decisions, institutional investors focus on recent performance relative to a selected benchmark. For example, the mandated benchmark for most professional managers running US large-cap equities is the S&P 500 index; managers are considered to have outperformed if they produced excess returns against the S&P 500 over the evaluation period. That evaluation period is typically three years. Those managers who did particularly poorly relative to the benchmark are fired and replaced by managers who did particularly well. Notice that in light of the coin flip example this is a very short horizon. It is difficult to draw conclusions with a high level of confidence based on 36 months of performance.

Putting the small sample size issue aside, there are two fundamental reasons why one manager may perform better than another in a given three-year period. The first is what can be called skill. For instance, Tiger Woods is a skillful golfer (even after all his injuries). If a typical country club member competes against Tiger for 18 holes, the result will be the same virtually every time. In the same way, it is possible that one manager may be more skillful than another and outperform a majority of the time. The second is a variant of luck. Different managers employ different strategies. One may focus on technology companies, while another buys primarily value stocks. If in a particular period, say 2017, technology stocks do especially well, then the first manager will appear to be a superior performer. But that is only because his or her strategy worked well due to happenstance, not because that manager possessed any special skill.

One way to test whether the manager-selection technique used by institutional investors is effective is to compare the performance of managers who were hired with that of managers who were fired over the subsequent three years. If the successful managers are more skillful, that skill should be evident in the subsequent period. If, however, their success is due to chance, they are no more likely than any other manager to outperform in the subsequent period. In a comprehensive study, Cornell, Hsu, and Nanigian found something interesting. The managers who were fired subsequently performed better than those who were hired. The difference was not large, but averaged across hundreds of managers over more than 20 years, the result was statistically significant. How could that be? Even in an efficient market, past winners should not underperform.

The answer is that there is evidence that stock prices are mean reverting. Mean reversion means that stocks that have run up sharply in the past tend to perform a little more poorly in the future than those that have dropped, and vice versa. The effect is not large, but it can be discerned when large data sets are analyzed. This mean reversion explains the results found by Cornell, Hsu, and Nanigian. The managers who were fired held stocks that had performed poorly during the evaluation period and, due to mean reversion, tended to do slightly better in the subsequent period. The reverse was true for managers who held stocks that had performed well. The irony is that to the extent that history tells you anything about past superior performers, it tells you to avoid them. It is dramatic evidence of the risk associated with misinterpreting the implications of historical financial data.
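
A small simulation shows how this mechanism alone, with no differences in skill whatsoever, can reproduce the pattern. The return process and every parameter below are invented for illustration; they are not calibrated to the Cornell, Hsu, and Nanigian data. Returns are given mild negative autocorrelation, managers are ranked on a three-year window, and the best and worst deciles are compared over the following three years.

```python
import random
import statistics

random.seed(3)

n_managers = 2_000
theta = 0.3      # strength of mean reversion (hypothetical)
sigma = 0.05     # volatility of annual excess returns (hypothetical)

def manager_returns(years=6):
    """Annual excess returns with mild negative autocorrelation (an MA(1)
    process), so a strong stretch tends to be followed by a weaker one."""
    shocks = [random.gauss(0, sigma) for _ in range(years + 1)]
    return [shocks[t + 1] - theta * shocks[t] for t in range(years)]

past, future = [], []
for _ in range(n_managers):
    r = manager_returns()
    past.append(statistics.fmean(r[:3]))     # evaluation window
    future.append(statistics.fmean(r[3:]))   # window after hiring/firing

ranked = sorted(range(n_managers), key=past.__getitem__)
fired = ranked[: n_managers // 10]           # worst past decile
hired = ranked[-(n_managers // 10):]         # best past decile

print(f"next 3 years, fired managers: {statistics.fmean(future[i] for i in fired):.2%}")
print(f"next 3 years, hired managers: {statistics.fmean(future[i] for i in hired):.2%}")
```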

PRESIDENTIAL POLITICS AND THE STOCK MARKET

We close this chapter with one final example – the relation between presidential politics and the stock market. The example is typical of the type of results that are frequently reported in the financial press. It turns out that average excess stock market returns are much higher under Democratic than Republican administrations. A detailed study by Pastor and Veronesi reports that from 1925 to 2015 the average excess return under Democratic presidents was 10.7% per year compared to −0.2% per year under Republican presidents. The difference, almost 11.0% per year, is highly significant both economically and statistically. This raises an obvious question. Is there something about Democratic administrations that is highly beneficial for the stock market or is the result due to data mining?

This is a perfect hypothesis to test using international data. There is no reason to believe that the United States is unique with respect to the relation between stock prices and politics. If markets do better when the left-leaning party is in power compared to the right-leaning party, it should be observable in other countries. Following up on this idea, Arnott, Cornell, and Kalesnik studied whether there was a relation between the party in power and stock market returns in Australia, Canada, Germany, France, and the United Kingdom. The countries were chosen because they have developed stock markets and because each has experienced changes in political control over the last several decades between left-leaning and right-leaning parties. The authors found that outside of the United States there was no systematic relation between the party in power and stock market returns. In fact, the results showed that international stock markets did slightly better when the right-leaning party was in power but the results were not statistically significant.

Given the international findings, Arnott, Cornell, and Kalesnik went back to take a closer look at the U.S. data. They found that two key events were responsible for much of the differential returns under Democratic and Republican presidents. Specifically, a Republican was president during the two great financial and economic crashes, which began in 1929 and 2008, and, unsurprisingly, a Democrat was president during the subsequent recoveries. Had the order of the incumbencies been reversed, the effect would have been reversed. This supports the interpretation that the U.S. results are due to a combination of serendipity and data mining.

The example underscores the importance of not taking apparent patterns in historical financial data, even patterns that are economically and statistically significant, at face value. The problems of data mining, non-stationarity, and bad models are so important that results that appear too good to be true may, in fact, be too good to be true.

CONCEPTUAL FOUNDATION 7

When Hegel said that “We learn from history that we do not learn from history,” he may have had investing in mind. Conceptual foundation 7 is that the implications of historical financial data for future financial performance are complex and nuanced. Data mining, non-stationarity, and model misspecification all make it difficult to conclude that a strategy that beat the market in the past will continue to do so in the future.
