
Chi-Square Goodness of Fit Test

Definition

The Chi-Square Goodness of Fit Test is a statistical method used to determine how well a set of observed data fits a theoretical probability distribution. It compares the observed frequencies in a range of data “cells” against the frequencies expected if the distribution were true.

Why It Matters

It provides a rigorous way to check whether an assumed distribution actually matches the observed data, which guards against reading patterns into what is only random noise.

Core Concepts

  • The Statistic ($\chi^2$): Calculated as $\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$, where $O_i$ is the observed count and $E_i$ is the expected count in cell $i$.
    • How to read: “The chi squared value equals the sum over the cells of the difference O i minus E i, squared, all divided by E i.”
    • Meaning / when to use: Measures the total squared deviation between observed and expected counts, with each cell's term scaled by its expected count. Large values suggest the hypothesized distribution fits poorly.
  • Null Hypothesis ($H_0$): The data follow the specified distribution.
  • Degrees of Freedom ($df$): Calculated as $df = k - p - 1$, where $k$ is the number of cells and $p$ is the number of parameters estimated from the data.
    • How to read: “The degrees of freedom equals k minus p minus one.”
    • Meaning: Each estimated parameter and the total-count constraint reduce independent information; $df$ sets the correct $\chi^2$ reference distribution for the p-value.
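  • Equiprobable Approach: Dividing the distribution into cells of equal probability (rather than equal width), so every cell has the same expected count $E_i = n/k$ for $n$ observations. Choosing $k \le n/5$ then guarantees $E_i \ge 5$ in all cells, which increases test reliability.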
  • Sample Size Requirement: As a rule of thumb, at least 20-30 data points are needed for the $\chi^2$ approximation to give valid results.
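
The concepts above can be sketched in a few lines of Python. This is a minimal illustration that uses only the standard library, not a definitive implementation. The function names (`chi_square_statistic`, `degrees_of_freedom`, `equiprobable_counts`) are hypothetical, the die counts are made up, and the normal sample is simulated. The sketch stops at the statistic and its degrees of freedom; the p-value would still come from a $\chi^2$ table or a statistics library.

```python
import random
from statistics import NormalDist, mean, stdev


def chi_square_statistic(observed, expected):
    """Return the sum over cells of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(k, p):
    """Return k - p - 1 for k cells and p parameters estimated from the data."""
    return k - p - 1


def equiprobable_counts(data, dist, k):
    """Count observations in k cells that each have probability 1/k under dist."""
    # Interior boundaries are the 1/k, 2/k, ..., (k-1)/k quantiles of dist.
    edges = [dist.inv_cdf(j / k) for j in range(1, k)]
    counts = [0] * k
    for x in data:
        cell = sum(x > edge for edge in edges)  # number of boundaries below x
        counts[cell] += 1
    return counts


# Example 1: fully specified distribution (fair die), made-up counts from 60 rolls.
die_observed = [8, 12, 9, 11, 6, 14]
die_expected = [60 / 6] * 6
die_stat = chi_square_statistic(die_observed, die_expected)
die_df = degrees_of_freedom(k=6, p=0)  # no parameters estimated from the data
print(f"die: chi2 = {die_stat:.2f}, df = {die_df}")

# Example 2: normal fit with mean and standard deviation estimated from the sample.
random.seed(0)
n, k = 100, 10  # k <= n / 5, so every expected count is at least 5
sample = [random.gauss(50, 5) for _ in range(n)]  # simulated data
fitted = NormalDist(mean(sample), stdev(sample))
normal_observed = equiprobable_counts(sample, fitted, k)
normal_expected = [n / k] * k
normal_stat = chi_square_statistic(normal_observed, normal_expected)
normal_df = degrees_of_freedom(k=k, p=2)  # two estimated parameters
print(f"normal: chi2 = {normal_stat:.2f}, df = {normal_df}")
```

In the die example the statistic is 4.2 with 5 degrees of freedom, which is below the 5% critical value of about 11.07, so the fair-die hypothesis would not be rejected. In the normal example, estimating the mean and standard deviation costs two degrees of freedom, leaving $10 - 2 - 1 = 7$.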

Connected Concepts