I've read both the Google Trends paper and the Wikipedia paper, and implemented both of them. Two major things jumped out at me:
1. They tried out 40-50 words in Google Trends and backtested all of them. The word 'debt' is the one that performed best over the 7-year period of the study. Does anyone think it's going to be the most profitable over the next 7 years? Similarly, I could backtest 50 signals from a random number generator. One of them is going to be the best over the past 7 years, but that tells me nothing about its prospects for the next 7.
2. Eyeballing the graph, about half their return comes from the last quarter of 2008. This tells me (i) the signal just got lucky to be short in a period where the market was tanking, and (ii) they don't have any risk controls. You should never be making 50% of your PnL in a period that represents only about 1/30 of your sample (one quarter out of roughly 28). I'm willing to bet that the bootstrap value at risk of this strategy is pretty poor.
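For anyone who wants to check point 2 on their own backtest, here is a minimal sketch of the two numbers I have in mind: the share of total PnL earned in one window, and a simple bootstrap value at risk. The function names, the 13-week horizon, and the 5% level are my own assumptions, not anything from the papers, and the plain i.i.d. resampling ignores any autocorrelation in the PnL series.

```python
import random

def pnl_share(pnl, start, end):
    """Fraction of total PnL earned in pnl[start:end].

    Assumes the total PnL is non-zero.
    """
    return sum(pnl[start:end]) / sum(pnl)

def bootstrap_var(pnl, horizon=13, level=0.05, n_boot=2000, seed=0):
    """Bootstrap value at risk over `horizon` periods.

    Resamples per-period PnL with replacement (i.i.d. assumption),
    sums each resample, and returns the loss at the `level` quantile
    as a positive number (a negative result means a gain).
    """
    rng = random.Random(seed)
    totals = sorted(
        sum(rng.choices(pnl, k=horizon)) for _ in range(n_boot)
    )
    return -totals[int(level * n_boot)]
```

With weekly PnL for the 7-year sample, `pnl_share(pnl, start, end)` over the weeks of Q4 2008 would give the concentration figure, and `bootstrap_var(pnl)` a rough one-quarter VaR to compare against the strategy's average quarterly return.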
Point #1 is key. If you test enough terms retrospectively you'll find something amazing. Prospective hypothesis testing is where you actually prove something works (or not).
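The random number generator argument is easy to try for yourself. The sketch below is only an illustration under made-up assumptions: the market returns are pure Gaussian noise, each "signal" is a coin flip between long and short each week, and the names and parameters (`simulate_best_of_n`, 50 signals, 364 weeks, 2% weekly volatility) are hypothetical. It picks the best signal in-sample and reports what that same signal earns on a fresh period.

```python
import random

def simulate_best_of_n(n_signals=50, n_weeks=364, seed=0):
    """Backtest random long/short signals on noise returns.

    Returns (best in-sample PnL, that signal's out-of-sample PnL,
    list of (in-sample, out-of-sample) PnL pairs for every signal).
    """
    rng = random.Random(seed)
    # Hypothetical market: weekly returns with no predictable structure.
    market_in = [rng.gauss(0, 0.02) for _ in range(n_weeks)]
    market_out = [rng.gauss(0, 0.02) for _ in range(n_weeks)]

    results = []
    for _ in range(n_signals):
        # Each signal is a coin flip: +1 (long) or -1 (short) each week.
        pos_in = [rng.choice((-1, 1)) for _ in range(n_weeks)]
        pos_out = [rng.choice((-1, 1)) for _ in range(n_weeks)]
        pnl_in = sum(p * r for p, r in zip(pos_in, market_in))
        pnl_out = sum(p * r for p, r in zip(pos_out, market_out))
        results.append((pnl_in, pnl_out))

    # Select the winner using in-sample PnL only, as a backtest would.
    best_in, best_out = max(results, key=lambda pair: pair[0])
    return best_in, best_out, results
```

The best in-sample PnL will look impressive simply because it is the maximum of 50 draws. Running this over many seeds and averaging the out-of-sample PnL of the winners should give something close to zero, which is the whole point about retrospective selection.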
To the authors' credit, there seems to be a semantic pattern to the words that show profitable predictive correlation with the DJIA. Point #2 seems more significant to me... but yeah, I would agree.