What butter production means for your portfolio
(MoneyWatch) There are many illusions in the world of investing. The process known as data mining -- or torturing the data until it confesses -- creates many of them. Unfortunately, identifying patterns that worked in the past doesn't necessarily provide you with any useful information about stock price movements in the future. As Andrew Lo, a finance professor at MIT, points out: "Given enough time, enough attempts and enough imagination, almost any pattern can be teased out of any data set."
The stock and bond markets are filled with wrongheaded data mining. The tale of David Leinweber, which is related in the excellent new book "Quantitative Value," illustrates this point about "stupid data miner tricks." Leinweber sifted through a United Nations CD covering the economic data of 140 countries. He found that butter production in Bangladesh explained 75 percent of the variation of the S&P 500 Index. Not satisfied, he found that if he added a broader category of global dairy products, the share of variation explained would rise to 95 percent. Then he added a third variable, the population of sheep, and found that he had now explained 99 percent of the variation in the S&P 500 for the period 1983-'99. That's a virtually perfect fit.
His example is a perfect illustration that the mere existence of a correlation doesn't necessarily give it predictive value. Some logical reason for the correlation to exist is required for it to have credibility. For example, there's a strong and logical correlation between the level of economic activity and the level of interest rates. As economic activity increases, the demand for money, and, therefore, its price (interest rates), also increases.
The amusing part is that reporters picked up on the story, and Leinweber started getting calls from investors about the status of butter production in Bangladesh. Leinweber, of course, meant it as a joke. He wanted to show that data mining, or analyzing huge amounts of data without a pre-existing rational hypothesis, turns up relationships that are pure coincidences. In other words, Leinweber didn't know which relationship would show up. However, he knew that some random data series would likely show a strong one. No one would have come up with the theory that butter production in Bangladesh predicts the stock market. Butter production was just the "fish" Leinweber caught in a fishing expedition. And butter production in Bangladesh was useless as a predictor before 1983 or after 1999.
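To see why a fishing expedition is almost certain to land a "fish," here is a minimal simulation sketch. It does not use Leinweber's data. It assumes, purely for illustration, that both the "index" and every candidate series are independent random walks observed over 17 annual periods (roughly 1983 through 1999), and it reports the best R-squared found among many candidates that by construction have nothing to do with the target. The function names and parameters are hypothetical.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def random_walk(rng, n_periods):
    """A series whose level drifts by an independent random step each period."""
    level = 0.0
    path = []
    for _ in range(n_periods):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def best_spurious_fit(n_candidates=1000, n_periods=17, seed=0):
    """Highest R-squared between one random 'index' and many unrelated series.

    Assumption: random walks stand in for real economic data series.
    """
    rng = random.Random(seed)
    target = random_walk(rng, n_periods)  # stand-in for the stock index
    best_r2 = 0.0
    for _ in range(n_candidates):
        candidate = random_walk(rng, n_periods)  # unrelated by construction
        r2 = pearson_r(target, candidate) ** 2
        best_r2 = max(best_r2, r2)
    return best_r2


if __name__ == "__main__":
    for count in (1, 10, 100, 1000):
        print(count, "candidates -> best R^2:", round(best_spurious_fit(count), 3))
```

Because the search keeps only the best match, the reported fit can only improve as more unrelated series are tried. That is the mechanism behind the butter result: the more data you sift, the better your best coincidence looks.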
Unfortunately, our minds are trained to find patterns, making it difficult for us to dismiss apparent patterns as nothing more than the random outcomes they often are. Tomorrow we'll look at another example of what I believe is a stupid data mining trick. It's one that has a vast following. With that in mind, try to think of a theory that would give you a strong reason to believe that there are parts of the calendar when stocks should perform so poorly that they would be expected to have lower returns than one-month Treasury bills.