Predictive Analysis and Causality

Snippets of wisdom from fun movies:

‘Life is like a box of chocolates. You never know what you’re gonna get.’ (Forrest Gump)

‘Had a single one of a long series of events taken place differently this morning, she would not have been run over by a car.’ (The Curious Case of Benjamin Button)

‘You can put the seed in the ground and water it, but it will become what it is.’ (Kung Fu Panda)

Why am I using movie quotes rather than heavyweight scientific arguments to start off this post? I am trying to point out that everyday people often understand causality and predictability better than scientists, who live in the illusion that mathematics IS reality and not just an abstraction of it.

I recently reread Nassim Taleb’s ‘Fooled by Randomness’ and it reminded me that CEOs and CIOs who believe in Predictive Analysis are not much cleverer than stockbrokers. Much of the current hype is driven by IBM’s multi-billion-dollar ‘Smart Planet’ advertising campaign. Are you aware that IBM sells no ‘Smart’ products at all? IBM sells a vision and no more. In Europe nobody talks about smart power grids, because the power networks are so much more modern than in the rest of the world and they are run by SMART PEOPLE, not requiring smart software – whatever that is supposed to be. Am I saying that the mathematics of predictive analysis is wrong? Absolutely not. If the world conformed to the models it uses, they would be perfect. But … it doesn’t!

The chain REALITY -> MODEL -> COLLECTION -> FILTERING -> PROCESSING -> PREDICTION -> CAUSAL ACTION is purely an illusion. Even if you find wonderfully correlating patterns in the data – which most probably means you have spent a lot of time tuning the above chain until it looks good – that has nothing to do with achieving causal knowledge. Yes, one might gain some statistical knowledge about common human behavior, but there is no need for high-volume data processing to get it; simple observations of a few people will do the same. The data will be wrong and the action you take will have different results than planned (… the seed will become what it is!)

Don’t forget: MORE DATA produces MORE NOISE! Higher sampling rates do not produce higher accuracy, just more opportunities to misinterpret a trend. PA experts claim that filtering solves that problem. But filtering out the extremes in the data removes the one important aspect of the information: the ‘Tipping Point’, which tells you when the data will push something over the edge – the grain of sand that triggers the avalanche. Averaged data are mostly irrelevant and have no influence on individual results! The problem is compounded by our propensity to misinterpret numbers, so read ‘Calculated Risks’ by Gerd Gigerenzer to understand why.
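
To make the filtering point concrete, here is a minimal sketch (pure Python, invented data) of how a moving average smooths away exactly the extreme reading that marks a tipping point:

```python
# A minimal sketch with invented data: a moving average "filters out"
# precisely the extreme value that signals the tipping point.
import random

random.seed(42)

# A noisy series with one genuine extreme event at index 50.
data = [random.gauss(10.0, 1.0) for _ in range(100)]
data[50] = 25.0  # the grain of sand that triggers the avalanche

def moving_average(xs, window=10):
    """Trailing moving average - a stand-in for 'filtering'."""
    out = []
    for i in range(len(xs)):
        chunk = xs[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

smoothed = moving_average(data)
print(f"raw maximum:      {max(data):.1f}")      # 25.0  - the extreme is visible
print(f"smoothed maximum: {max(smoothed):.1f}")  # ~11.5 - the tipping point is gone
```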

It is not the same as in digital music, where a sine wave and its harmonics behave according to classical physics and higher sampling rates really do capture more of the harmonics. Even here, the MP3 format of the Fraunhofer Institute managed to drop high-frequency samples by figuring out how little of that information is actually relevant to human perception. The Predictive Analysis fallacy comes from assuming that the world is based on classical physics such as the sine wave. The world is, however, a complex adaptive system of many layers of emerging functions and interrelationships that cannot be decomposed and therefore cannot be modelled. It is utterly random.
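
For contrast, here is a minimal sketch of that classical case, where sampling theory gives exact answers (the frequencies below are illustrative):

```python
# For sine waves, sampling theory is exact: sampled at 10 Hz (below its
# 22 Hz Nyquist rate), an 11 Hz sine yields the very same samples as a
# 1 Hz sine - aliasing. Complex adaptive systems offer no such theorem.
import math

f_high, f_low, f_sample = 11.0, 1.0, 10.0  # Hz; illustrative values

for n in range(6):
    t = n / f_sample
    s_high = math.sin(2 * math.pi * f_high * t)
    s_low = math.sin(2 * math.pi * f_low * t)
    # Both columns print identically: below the Nyquist rate the two
    # frequencies cannot be told apart from the samples alone.
    print(f"t={t:.1f}s  11 Hz: {s_high:+.3f}   1 Hz: {s_low:+.3f}")
```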

Nassim Taleb uses an example similar to the following to show why great stocks on the stock market are purely random. The anecdotal successes of Predictive Analysis are purely RANDOM too. Given a 50% chance per year that PA improves what a business does, 12.5% of businesses will show a definite positive result after the third year by pure chance (0.5 × 0.5 × 0.5 = 12.5%). That does not even take into account the so-called hindsight bias: some results seem utterly obvious once they have happened. PA salespeople will tell you that it is those 12.5% that used PA ‘properly’ and were able to ACT upon the results, while the others failed. They ask: ‘Do you want to belong to the TOP TEN percent of businesses?’ However, comparing two companies where one uses PA and the other does not is utterly invalid, because both business results are random. There is no causal connection between the two – unless one employed a large-scale, double-blind test where all businesses believe they are using PA, but half don’t.
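
The arithmetic is easy to check with a small simulation (a minimal sketch with invented numbers, assuming a fair 50/50 coin per business per year):

```python
# If "PA improved the business this year" is a 50/50 coin flip, pure
# chance still leaves 0.5^3 = 12.5% of businesses with three straight
# winning years - no causality required.
import random

random.seed(7)
n_businesses, n_years = 100_000, 3

winners = sum(
    all(random.random() < 0.5 for _ in range(n_years))
    for _ in range(n_businesses)
)

print(f"three straight good years: {winners / n_businesses:.1%}")  # ~12.5%
```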

So in my book, all those people who proclaim the ‘Smart Planet’ by means of Predictive Analysis aren’t really that smart. They are actually blinded by noise. Software will never be smart – OUR GUTS ARE!

2 Comments on “Predictive Analysis and Causality”

  1. As always, I enjoy reading your views. I’m left wondering, however, what criteria or normative standard the intuitionist uses to discern a meaningful difference between information and noise…

    • William, good question! Why should there be a normative standard? That there ought to be one is itself an assumption. Each decision pattern is utterly unique. The meaning of a difference in patterns is a purely subjective exercise and can’t be normalized or standardized. In principle, what can’t be assessed by simple means (‘fast and frugal’, in Gerd Gigerenzer’s terms) can’t be improved by more data. Nature is purely random, and the statistical patterns that can be found have no influence on the future. I consider planning for the future an exercise in creating potential that, once the situation arises, will enable an energy transfer that is likely to be desirable. Risk assessment is useless if you don’t want to take risks. Even if you don’t take risks intentionally, you carry the risk of inaction causing unknown exposure. I don’t fly with cheap airlines, for example, but I can’t sensibly calculate how much that reduces my travel risk.

      Calculating the risk of some situation, action or inaction will not tell you what to do. There is no causal model that will sensibly map to a real-world situation. You have no idea whether the action you are about to take truly increases or lowers the risk of something happening. You simply do not have that information! If risk can be sensibly calculated at all, it is only for very simple situations with data from real-world experience. Gigerenzer uses a great example from emergency room procedures, where answering just three questions – rather than analyzing hundreds of biological parameters – improved the assessment of cardiac arrest risk by 70%. They removed the noise. Our emotional pattern-matching mechanism has been tuned for thousands of years to exercise that judgment. All that more data does is produce more uncertainty.
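
      A minimal sketch of such a ‘fast and frugal’ tree, in the spirit of the coronary care example above (the exact questions are paraphrased assumptions here, not quoted from the study):

```python
# A fast-and-frugal tree: three yes/no questions asked in order, with
# most available data deliberately ignored. The questions are assumed
# for illustration, not taken verbatim from Gigerenzer's example.

def send_to_coronary_care(st_elevated: bool,
                          chest_pain_is_chief_complaint: bool,
                          any_other_risk_factor: bool) -> bool:
    """Return True if the patient should go to the coronary care unit."""
    if st_elevated:                        # question 1: decisive on its own
        return True
    if not chest_pain_is_chief_complaint:  # question 2: rules the case out
        return False
    return any_other_risk_factor           # question 3: settles the rest

# Example: chest pain, no ST elevation, one additional risk factor.
print(send_to_coronary_care(False, True, True))  # True
```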

      Despite all the heuristic biases (framing, hindsight bias, ownership value and so on) of our decision-making mechanisms, these all served some purpose and still do. Yes, it often helps to understand them, but more out of interest than for actually making better decisions. A clever salesman in Istanbul recently made me buy a carpet by setting my price expectation high. I realized it only later. I am not unhappy, because I really like the carpet. Had I judged it rationally, I would not have bought it.

      I hope that these points help to explain.
