## Difference between correlation and causation

This is the first article in my “causality” series. Please see the index for a list of the other posts in this series.

In my day job I deal with probability and statistics. I use words like correlation, standard error and so on, but I never use the words causality or causation. In my current professional world it is impossible to separate causation from correlation.

Ironically, in my old professional world (physics) it was the other way around: I used the word causality quite often, but the word correlation was not part of my vocabulary.

If two variables tend to move together, there is a correlation between them. Correlation comes in degrees: for example, we say that there is 80% correlation or 20% correlation. When the degree of correlation is close to zero we say that the two variables are uncorrelated. Sometimes the two variables are anti-correlated: if the correlation is large with a minus sign, say -80%, they tend to move in opposite directions.

When I say “variable” I don’t necessarily mean “continuous variable.” We can have “discrete variables” too. For example, you can type random numbers in an (Excel) spreadsheet column, then type another series of random numbers in a separate column, then use the Excel function CORREL to compute the correlation between these two series (columns). Tutorials on the Excel CORREL function are easy to find online. Mathematically, correlation is a number between -1 and +1. In common language, -1 is also known as -100% correlation and +1 is known as +100% correlation.
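For readers who prefer code to spreadsheets, here is a minimal Python sketch of the same experiment. It computes the Pearson correlation coefficient, which is the quantity Excel's CORREL returns. The function name `pearson_correlation`, the column names, the seed and the sample size are all my own choices for illustration, not anything standard.

```python
import math
import random


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length series (between -1 and +1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length, with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the mean (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Two "spreadsheet columns" of independent random integers (hypothetical data).
rng = random.Random(0)  # fixed seed so the run is repeatable
column_a = [rng.randint(1, 100) for _ in range(1000)]
column_b = [rng.randint(1, 100) for _ in range(1000)]

r_random = pearson_correlation(column_a, column_b)               # close to 0
r_same = pearson_correlation(column_a, column_a)                 # +1, i.e. +100%
r_negated = pearson_correlation(column_a, [-a for a in column_a])  # -1, i.e. -100%

print(f"independent columns: {r_random:+.1%}")
print(f"column with itself:  {r_same:+.1%}")
print(f"column with its negative: {r_negated:+.1%}")
```

Two independent random columns should come out nearly uncorrelated, while a column compared with itself or with its own negative gives the two extremes of the scale.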

If the degree of correlation is really high, say above 90% (positive or negative), we suspect a causal relationship between the two variables, but we can never be sure. Even a high degree of correlation is not proof of a causal relationship.
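One way to see why a high correlation proves nothing is to simulate a hidden common cause. The sketch below is a toy model of my own, not real data: a hypothetical variable `hidden` drives both `x` and `y`, so the two are strongly correlated even though neither influences the other. When I then set `x` by hand, independently of `hidden`, the correlation with `y` disappears.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 2000

# A hidden common cause (hypothetical) that we never observe directly.
hidden = [rng.gauss(0, 1) for _ in range(n)]

# x and y each copy the hidden variable plus a little independent noise.
# Neither x nor y appears in the formula for the other.
x = [h + rng.gauss(0, 0.2) for h in hidden]
y = [h + rng.gauss(0, 0.2) for h in hidden]
r_observed = pearson(x, y)

# "Intervention": set x by hand, ignoring the hidden variable.
# y is generated exactly as before, so if x caused y this would matter.
x_forced = [rng.gauss(0, 1) for _ in range(n)]
r_intervened = pearson(x_forced, y)

print(f"observed correlation of x and y: {r_observed:+.1%}")
print(f"after setting x by hand:         {r_intervened:+.1%}")
```

In this toy model the observed correlation is well above 90%, yet forcing `x` to new values leaves `y` untouched. Looking at the two columns alone, we could not have told this situation apart from one where `x` really does cause `y`.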

In nature there are causal relationships. Science has demonstrated that there are natural laws. This is another way of saying that natural phenomena are caused by other natural phenomena. There is cause and effect in nature.

Not every scientist or philosopher agrees with the statement that there is cause and effect in nature. There are some sophisticated philosophical discussions on causality and on whether natural laws exist at all, but for a physicist the existence of a causal order in nature is very clear. The mathematical equations of these “laws” make precise predictions. Some “laws” of nature are deterministic, some are statistical and others are probabilistic. Deterministic laws of nature make precise predictions about individual outcomes. Statistical and probabilistic laws also make precise predictions, but about the distributions of outcomes.

In the other posts of this series I will present contrasting views, such as the view that “linear causality” is an illusion. Let’s ignore these subtleties for the time being. What I am trying to do in this article is to show the difference between the natural sciences and the social/economic sciences. The difference is captured by the correlation/causation duality.

When the variables are constructs of the human mind or of human behavior, it is no longer possible to talk about natural laws or causation. In the world of variables determined by the human mind (social and economic sciences) we can only talk about statistical patterns, trends, correlations and similarities.

There is a branch of physics known as statistical mechanics, where we formulate the statistical laws of nature. There is another branch of physics known as quantum mechanics, where we formulate the probabilistic laws of nature. These are statistical or probabilistic laws, but they are still laws of nature.

The types of models that we use in social and economic sciences are very different from the models of statistical mechanics or quantum mechanics. Social and economic models use the mathematics of statistics and probability too, but they can never be classified as laws because they deal with human behavior. In the world of social and economic sciences the statistical patterns, trends, correlations and similarities change all the time. Natural laws never change. Social and economic models, on the other hand, change almost every 6 months.