Crowds in the Clouds
We’ve already seen signs that internet data can be used for various sorts of prediction. Google Trends data allows nowcasting of employment trends and the spread of disease (see Nowcasting with Google), Twitter may predict stock market movements (Twits, Butter and the Super Bowl Effect) and social media predicts investor sentiment (Noise, Sentiment and StockTwits). Reports suggest that there are already fund managers out there exploiting these ideas.
The question remains whether these sources of information are really reliable or whether we’re seeing data mining biases. The more data you have, the more probable it is that you can find a correlation between any two variables, proving little other than that having a lot of computing power makes work for idle processors. Not all data is equal, though, and some results suggest that the wisdom of crowds is alive and well in the internet clouds.
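The data mining worry can be made concrete with a small sketch. The Python below is an illustration only: the function names, the sample size of 50 observations and the candidate counts are all assumptions, and every series is pure random noise by construction. It searches a growing pile of unrelated series for the one that best "predicts" a random target.

```python
import random


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def best_spurious_correlation(n_candidates, n_obs=50, seed=0):
    """Strongest |correlation| between a random target and n_candidates
    unrelated random series. Nothing here has any real relationship."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        best = max(best, abs(pearson(target, candidate)))
    return best


if __name__ == "__main__":
    # Search ever larger piles of noise for the "best" predictor.
    for n in (1, 10, 100, 1000):
        print(n, round(best_spurious_correlation(n), 3))
```

Because the same seed is reused, each larger run contains the smaller run's candidates, so the best correlation found can only stay the same or rise as more series are searched. That is the bias in miniature: look at enough variables and something will appear to fit.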
The Ox and Anchor
Finding a theory to explain the concepts behind prediction using internet data isn’t hard. The problem isn't a lack of theories but a veritable plethora of the darned things. For example, the idea of the wisdom of crowds (see Contrarianism) – that a group of relatively uninformed people can converge on the “right” answer where a group of experts can’t – has been around for over a century and originated in an insight of Francis Galton's (Regression to the Mean: Of Nazis and Investment Analysts). Famously Galton noted that a crowd of people were able to correctly judge the weight of an ox at a market because, even though no individual was able to exactly guess the answer, their predictions were correct on average.
This idea has become an investment meme, transgressing the original theory, and is invariably rolled out when people start discussing markets and the accuracy of predictions. Critical to wisdom of crowds models is that the individuals making predictions are independent of each other. If the latest guess about Galton’s ox had been posted on a board next to it then that information alone would have been enough to bias the next person. This is the key finding of research on anchoring: people unconsciously attach themselves to any old data they can find when making a prediction. You can bias people’s predictions upwards or downwards by randomly introducing spurious numbers in the preamble to the real question (Anchoring, the Mother of Behavioral Biases).
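A rough simulation shows why independence matters. Everything in this Python sketch is assumed for illustration: the ox weight, the spread of guesses, the spurious anchor of 1,500 and the strength of its pull are hypothetical numbers, not Galton's data. Each guesser has a noisy private estimate, and an anchored guesser drifts part of the way towards whatever number was put in front of them.

```python
import random

TRUE_WEIGHT = 1200.0  # hypothetical ox weight in pounds, for illustration only


def crowd_average(n_guessers, anchor=None, anchor_pull=0.3,
                  noise_sd=100.0, seed=1):
    """Average guess of a crowd. With anchor=None every guess is an
    independent noisy estimate; otherwise each guess is pulled towards
    the anchor by the (assumed) fraction anchor_pull."""
    rng = random.Random(seed)
    guesses = []
    for _ in range(n_guessers):
        private = rng.gauss(TRUE_WEIGHT, noise_sd)  # individual's own estimate
        if anchor is None:
            guess = private
        else:
            guess = (1 - anchor_pull) * private + anchor_pull * anchor
        guesses.append(guess)
    return sum(guesses) / len(guesses)


if __name__ == "__main__":
    print("independent crowd:", round(crowd_average(1000), 1))
    print("anchored crowd:   ", round(crowd_average(1000, anchor=1500.0), 1))
```

In the independent crowd the individual errors largely cancel, so the average lands close to the true weight. In the anchored crowd every guess shares the same pull, which averaging cannot remove, so the crowd is confidently wrong in the direction of the anchor.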
An alternative to the wisdom of crowds for understanding internet information is the family of econophysics models, which offer ways of understanding complex systems based on analogies with physics (Econophysics, Conciousness and Cosmic Karma). However, even these assume that there is some form of underlying collective “swarm intelligence” which is driving herding behavior. This has clear appeal in certain internet situations such as social media, where people are clearly not making independent decisions. Internet search trends, on the other hand, may well be composed of independent individuals acting on local information. As a further variation there are agent-based models, in which the 'agents' are traders exhibiting either rational, information-based or irrational, noise-driven investing behavior.
All this proves, if it proves anything at all, is that we need to be careful which models we follow when we start analysing trends. Nonetheless, there is a growing volume of research on market prediction from internet data – which is suggestive itself of some level of herding amongst researchers. There’s got to be a research grant in using trends in academic research to predict how the securities industry will next exploit private investors.
Ising on the Cake
A lot of the research emerging in this area is genuinely interesting for investors. Tobias Preis, Daniel Reith and H. Eugene Stanley, in Complex dynamics of our economic life on different scales: insights from search engine query data, show that there’s a link between Google searches and the trading volume of related S&P 500 stocks. Importantly this link is predictive: search volume one week predicts trading volume the next. Sadly it doesn’t tell you the direction of the stock movement, but maybe that’ll come next.
The researchers are using an Ising model, an econophysics model which arises from models of ferromagnetism, comprising a system of subunits which interact to create the overall behavior. In this research the overall system is the market and the subunits are the individual sellers and buyers of stocks. As in models of ferromagnetism, the behavior of each unit is affected by the behavior of the units around it. Just as individual dipoles in a magnet will align with those around them, so individual investors will herd with those they’re closely linked with, and this can cause sudden changes in market behavior, so-called “phase transitions”.
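For readers who want to see the herding mechanism in action, here is a minimal textbook sketch of a two-dimensional Ising lattice in Python, updated with the standard Metropolis rule. It is not the researchers' own specification: the grid size, the number of sweeps, the function name and the reading of "temperature" as the amount of individual noise are all assumptions made for illustration.

```python
import math
import random


def simulate_herding(temperature, size=20, sweeps=200, seed=0):
    """Metropolis simulation of a 2D Ising lattice with periodic boundaries.
    Each site is a trader: +1 = buying, -1 = selling. Returns the average
    absolute net position per trader over the second half of the run."""
    rng = random.Random(seed)
    spins = [[1] * size for _ in range(size)]  # start with everyone buying
    n_sites = size * size
    samples = []
    for sweep in range(sweeps):
        for _ in range(n_sites):
            i, j = rng.randrange(size), rng.randrange(size)
            neighbours = (spins[(i + 1) % size][j] + spins[(i - 1) % size][j]
                          + spins[i][(j + 1) % size] + spins[i][(j - 1) % size])
            # Energy cost of this trader switching sides (coupling J = 1).
            delta_e = 2 * spins[i][j] * neighbours
            if delta_e <= 0 or rng.random() < math.exp(-delta_e / temperature):
                spins[i][j] = -spins[i][j]
        if sweep >= sweeps // 2:
            # Record how one-sided the market is after this sweep.
            samples.append(abs(sum(map(sum, spins))) / n_sites)
    return sum(samples) / len(samples)


if __name__ == "__main__":
    for t in (1.0, 2.0, 3.0, 5.0):
        print(t, round(simulate_herding(t), 3))
```

At low temperature neighbours dominate and the crowd stays almost unanimously on one side; at high temperature individual noise wins and the net position falls towards zero. Somewhere between the two the consensus collapses quite abruptly, which is the phase transition of the analogy.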
Alternatively, in In Search of Earnings Predictability, Zhi Da, Joseph Engelberg and Pengjie Gao construct a wisdom of crowds model using internet search data to predict corporate earnings announcements. The approach is to look at the volume of Google searches for the target companies’ leading products and then to see if this predicts earnings statements. The results are impressive:
“We find that increases (decreases) in the search volume index (SVI) of a firm’s most popular product predict positive (negative) revenue surprises and standardized unexpected earnings (SUE). Changes in search volume also predict earnings surprise relative to the median analyst forecast, especially among firms with high information uncertainty. Finally, we find strong evidence that innovations in SVI predict announcement-window abnormal returns, even after controlling for the earnings and revenue surprise at the announcement.”
Meanwhile Thomas Dimpfl and Stephan Jank in Can Internet Search Queries Help to Predict Stock Market Volatility? use an agent based model – a model that incorporates agents who are either rational information based traders or irrational noise based traders – to determine whether information from Google queries is useful for predicting market behavior:
“In the model by Lux and Marchesi (1999) noise traders are seen as a source of additional volatility in the stock market. A fundamental shock in volatility triggers noise trading, which in turn causes volatility. Taking internet searches as a measure of retail investors’ attention, we observe exactly this pattern of high volatility followed by high retail investor attention, which is then followed by high volatility.”
Essentially private investor attention is attracted by high volatility, causing higher levels of searches, followed by yet more volatility. This is in line with yet more research by Da, Engelberg and Gao who, in The Sum of All FEARS: Investor Sentiment and Asset Prices, use internet search data to show that queries related to personal financial fears can be used to predict reversal effects amongst stocks highly favoured by noise investors: high levels of queries indicating fearfulness about personal finance result in immediate low returns in such stocks, followed by a reversal later on. The same queries also predict flows of funds into and out of mutual equity funds, but not bond funds. You've just got to reckon someone's scalping private investors using these trends.
The overall impression from these results is that internet search data is revealing something interesting about investor trends, but exactly what it’s revealing is dependent on what you look at, and how you look at it. If you’re interested in investor sentiment and what noise traders are going to do next then looking at what private investors are searching for is of interest, but if you’re interested in future corporate stock performance you want to look at what the companies’ crowd of customers is doing. As usual the internet has something for everyone.