
Correlation

Definition

Correlation is a statistical measure that describes the strength and direction of a relationship between two or more variables. A correlation between variables indicates that as one variable changes, the other tends to change in a predictable direction, though it does not imply that one causes the other.

Why It Matters

Correlation is one of the most widely used mathematical tools for finding patterns in complex data sets. It allows researchers to make predictions and formulate hypotheses. However, misreading what a correlation does and does not show (for example, treating it as evidence of causation, or trusting a pattern found by searching many variables) leads to the data-mining fallacy and the misallocation of resources to coincidental patterns.

Core Concepts

  • The Correlation Coefficient (r): A numerical index ranging from -1 to 1 that measures the strength and direction of a linear relationship:
    • Positive Correlation (r > 0): Variables move in the same direction (e.g., height and weight).
    • Negative Correlation (r < 0): Variables move in opposite directions (e.g., elevation and temperature).
    • No Correlation (r = 0): No linear relationship exists (e.g., hair length and GPA).
    • How to read: “The correlation coefficient r.”
    • Meaning: A statistic quantifying the degree of linear association between two variables.
  • Covariance: The underlying measure of how much two random variables change together. Dividing the covariance by the product of the two variables' standard deviations normalizes it into the correlation coefficient.
  • Linear vs. Non-linear Relationships: Correlation coefficients (like Pearson’s r) specifically measure linear relationships; variables can have a strong non-linear relationship (like a parabola) while having a correlation coefficient near zero.
  • Spurious Correlation: A correlation that genuinely appears in the data but does not reflect a direct relationship between the variables. It is often driven by a confounding third variable or by chance when many variables are searched for patterns.
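
The relationship between covariance and the correlation coefficient, and the blind spot for non-linear patterns, can be illustrated with a short Python sketch. The function names (covariance, pearson_r) and the toy data are assumptions made for illustration, not part of the original note. The sketch uses the sample covariance (dividing by n - 1), which is one common convention.

```python
import math

def covariance(xs, ys):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def pearson_r(xs, ys):
    """Pearson's r: covariance normalized by the product of the standard deviations."""
    std_x = math.sqrt(covariance(xs, xs))  # covariance of x with itself is its variance
    std_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (std_x * std_y)

# Illustrative toy data, symmetric around zero.
xs = [-3, -2, -1, 0, 1, 2, 3]
linear_up = [2 * x + 1 for x in xs]   # perfectly linear, increasing
linear_down = [-x for x in xs]        # perfectly linear, decreasing
parabola = [x * x for x in xs]        # fully determined by x, but not linear

r_up = pearson_r(xs, linear_up)       # close to +1
r_down = pearson_r(xs, linear_down)   # close to -1
r_parabola = pearson_r(xs, parabola)  # close to 0 despite the exact dependence

print(r_up, r_down, r_parabola)
```

In this sketch the parabola case gives a coefficient near zero even though y is completely determined by x, because the positive and negative halves of the curve cancel in the covariance sum. A near-zero r therefore rules out a linear trend only, not a relationship in general.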

Connected Concepts