Introduction to correlation






  • The following topics will be covered in this lecture:
    • Scatter plots
    • Correlation
    • Linear correlation coefficient
    • Computing correlation


Scatter plot of chocolate consumption versus number of Nobel laureates per country.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • Intuitively, we think of the idea of correlation or anti-correlation as the behavior of two variables varying together or oppositely.
  • To the left, we see the number of Nobel Laureates per million persons in a country plotted in a scatter plot versus the number of kg of chocolate consumed per capita.
  • At a glance, we can see that the two variables tend to vary together, but not in an exact, deterministic way;
    • i.e., a \( 1 \) unit increase in chocolate consumption doesn’t correspond exactly to a \( 2.5 \) unit increase in the number of Nobel Laureates.
  • Also, there is no reason to believe that the increase in one causes an increase in the other;
    • i.e., eating more chocolate doesn’t produce more Nobel Prize-winning scientists.
  • Indeed, a more rational explanation is that these values tend to vary together because Nobel Prizes usually go to countries with highly developed academic, cultural and industrial infrastructure.
  • Likewise, inhabitants of these countries can better afford luxury goods like chocolate.
  • Correlation should never be taken as causation; rather, it indicates that two measurements tend to have a systematic association, which may be better explained by other latent variables.
  • We may see a similar association when plotting either of the two above variables versus a third variable that acts as an economic indicator.
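
The role of a latent variable can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the data are randomly generated rather than taken from the figure, and the names `wealth`, `chocolate`, `laureates` and the helper `pearson_r` are hypothetical. Both observed variables are built from the same latent variable plus independent noise, so neither one influences the other.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample linear correlation coefficient of paired measurements."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


random.seed(0)  # fixed seed so the run is repeatable
n = 1000

# Hypothetical latent variable, e.g. a standardized economic indicator.
wealth = [random.gauss(0, 1) for _ in range(n)]

# Each observed variable depends on wealth plus its own independent noise;
# neither variable depends on the other.
chocolate = [w + random.gauss(0, 0.5) for w in wealth]
laureates = [w + random.gauss(0, 0.5) for w in wealth]

r_confounded = pearson_r(chocolate, laureates)

# Subtracting the latent variable leaves only the independent noise terms.
choc_resid = [c - w for c, w in zip(chocolate, wealth)]
laur_resid = [l - w for l, w in zip(laureates, wealth)]
r_resid = pearson_r(choc_resid, laur_resid)

print(f"r(chocolate, laureates)     = {r_confounded:.2f}")
print(f"r after removing the latent = {r_resid:.2f}")
```

In this simulated setting the first coefficient should be strongly positive, while the second should be close to zero, which is consistent with the association being explained entirely by the shared latent variable.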

Motivation continued

Table of chocolate consumption versus number of Nobel laureates per country.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • Once we understand the limits of its explanatory power, we can use correlation as a powerful research tool for the purpose for which it is intended.
  • Correlation is a statistic that we will compute from pairs of measurements in a single sample.
  • In the last example, we had a sample consisting of individual countries.
  • Each observation (country) corresponded to two distinct measurements:
    1. The number of kg of chocolate consumed per capita.
    2. The number of Nobel Laureates per million inhabitants.
  • This will always be a feature of computing correlation – we need observations which have two measurements.
    • We will compute the correlation coefficient between these variables.
  • Usually, we will use a scatter plot as a first check for a systematic pattern and we will then compute the statistic.
    • Being a statistic, the correlation coefficient is subject to sampling error;
    • we will also need to test its significance to quantify the uncertainty of the computed value relative to the population parameter.
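
The requirement that each observation carries two measurements can be made concrete with a short sketch. The pairs below are made-up values chosen for easy arithmetic, not the data from the table, and the variable names are hypothetical. The computation uses the standard sample formula for the linear correlation coefficient, the sum of products of deviations divided by the square root of the product of the sums of squared deviations.

```python
import math

# Each observation is one (x, y) pair measured on the same unit,
# e.g. one country. These values are invented for illustration only.
observations = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)]

n = len(observations)
mean_x = sum(x for x, _ in observations) / n
mean_y = sum(y for _, y in observations) / n

# Sum of products of deviations, and sums of squared deviations.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in observations)
s_xx = sum((x - mean_x) ** 2 for x, _ in observations)
s_yy = sum((y - mean_y) ** 2 for _, y in observations)

# Sample linear correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)
print(f"n = {n}, r = {r:.4f}")
```

Here the deviation sums are 6, 10 and 6, so \( r = 6 / \sqrt{60} \approx 0.77 \). With only five observations, a value like this carries substantial sampling error, which is why the coefficient alone is not enough without a significance test.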

Motivation continued