Master Statistics 101: Correlation


What is Correlation?

Correlation indicates that as one variable changes in value, the other variable tends to change in a specific direction. For example, a person’s height and weight are correlated: someone whose height is on the taller side of the distribution tends to weigh more as well.

What are correlation coefficients?

A correlation coefficient is a value that quantifies the strength and direction of the correlation, or association, between two variables. Which type of correlation coefficient you choose depends on the type of data and the relationship you are looking at. One common choice is the Pearson correlation coefficient, which measures the linear relationship between two continuous variables.
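As a minimal sketch of what Pearson’s r actually computes, the Python function below implements r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √(Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²). The function name `pearson_r` and the height and weight figures are made up for illustration; they are not real measurements.

```python
import math


def pearson_r(x, y):
    """Pearson's r for two equal-length sequences of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length, with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of products of each pair's deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the sums of squared deviations
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable does not vary")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical heights (cm) and weights (kg), invented for illustration only
heights = [150, 160, 165, 170, 180, 185]
weights = [50, 58, 63, 66, 77, 80]
print(f"r = {pearson_r(heights, weights):.3f}")  # prints r = 0.998
```

On Python 3.10 or later, `statistics.correlation` in the standard library computes the same coefficient, so in practice you would rarely write this by hand.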

Pearson’s correlation coefficient

Before we proceed further, we need to learn how to interpret a correlation coefficient. Let’s look at this using Pearson’s correlation coefficient (r), which ranges from -1 to 1, under the following subheadings.

  • Strength: The greater the absolute value of the correlation coefficient, the stronger the linear relationship between the variables. The extremes of the range, -1 and 1, indicate a perfect negative and a perfect positive linear relationship, respectively. A coefficient of 0 indicates no linear correlation between the variables.
  • Direction: The sign of the correlation coefficient represents the direction of the relationship. A positive sign indicates that as one variable’s value increases, the other variable’s value tends to increase too; a negative sign indicates that the other variable’s value tends to decrease.
    • One example of a negative correlation is an engine’s horsepower and its mileage per litre of gas or petrol. Engines with higher horsepower tend to consume more fuel, resulting in lower mileage per litre.

Scatterplots are one of the best graphs for visualising and looking for correlations between variables. Here is an image of scatterplots with Pearson’s correlation coefficients of -1, -0.5, 0, 0.5 and 1:

[Figure: scatterplots with Pearson’s r = -1, -0.5, 0, 0.5 and 1]

As you can see from the graph above, the stronger the correlation, the closer the data points lie to a straight line and the less scattered they are around it.

Pearson’s correlation coefficient measures only the linear relationship between the variables

If there is a curvilinear relationship between the variables, Pearson’s correlation coefficient might not detect it: it can understate the strength of the relationship, or even be close to 0 when the variables are strongly related.

Correlation doesn’t imply causation.

You might have heard this well-known phrase before: “Correlation doesn’t imply causation”. It is an important line to remember. A correlation between two variables doesn’t mean that changes in one variable cause changes in the other.

  • So, if two variables are correlated but there is no causal relationship between them, how can the correlation be explained?
    • A third variable might be related to both of the other variables, producing a correlation between them. This third variable is called a confounder or confounding variable. Because it is associated with both variables, it makes it hard to tell whether their association is causal or spurious. In statistics and clinical research, the standard way to establish a causal relationship is to perform a randomized controlled trial, because random assignment balances confounders, measured and unmeasured, between the groups on average.
