Chi-Square Goodness of Fit Test

Definition

The Chi-Square Goodness of Fit Test is a statistical method used to determine how well a set of observed data fits a theoretical probability distribution. It compares the observed frequencies in a range of data “cells” against the frequencies expected if the distribution were true.

Why It Matters

It provides a rigorous way to check whether our assumed model of a distribution is consistent with the “noise” of real data, which guards against seeing patterns where none exist.

Core Concepts

The Statistic ( $\chi^2$ ): Calculated as $\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$, where $O_i$ is the observed count and $E_i$ is the expected count in cell $i$.
- How to read: “The chi squared value equals the sum over the cells of the difference O i minus E i, squared, all divided by E i.”
- Meaning / when to use: Measures the total squared deviation between observed and expected counts, with each cell's squared deviation scaled by its expected count. Large values suggest the hypothesized distribution fits poorly.
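
As a rough illustration of the statistic, here is a minimal Python sketch. The function name `chi_square_statistic` and the die-roll counts are made up for this example and are not taken from any dataset.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: a six-sided die rolled 60 times.
# Under a fair-die hypothesis each face is expected 10 times.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6

statistic = chi_square_statistic(observed, expected)
print(statistic)  # (4 + 4 + 1 + 1 + 0 + 0) / 10 = 1.0
```

A value this small relative to the number of cells gives little reason to doubt the fair-die hypothesis.
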
Null Hypothesis ( $H_0$ ): The data follows the specified distribution.
Degrees of Freedom ( $df$ ): Calculated as $df = k - p - 1$, where $k$ is the number of cells and $p$ is the number of parameters estimated from the data.
- How to read: “The degrees of freedom equals k minus p minus one.”
- Meaning: Each estimated parameter and the total-count constraint reduce independent information; $df$ sets the correct $\chi^2$ reference distribution for the p-value.
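
To show how $df$ feeds into the p-value, the sketch below computes $k - p - 1$ and then the upper-tail probability of the $\chi^2$ distribution. It assumes only the Python standard library; the helper names `degrees_of_freedom` and `chi2_sf` are hypothetical, and the tail probability uses a textbook series for the regularized incomplete gamma function rather than any particular library's routine.

```python
import math


def degrees_of_freedom(k, p):
    """df = k - p - 1 for k cells and p parameters estimated from the data."""
    df = k - p - 1
    if df < 1:
        raise ValueError("need more cells than estimated parameters plus one")
    return df


def chi2_sf(x, df, max_terms=10_000):
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom.

    Uses the series for the regularized lower incomplete gamma function P(a, z)
    with a = df / 2 and z = x / 2. Intended for moderate x only.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= z / (a + n)
        total += term
        if term < 1e-16 * total:
            break
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


# Six cells, no parameters estimated from the data (e.g. a fair-die hypothesis).
df = degrees_of_freedom(k=6, p=0)
p_value = chi2_sf(1.0, df)
print(df, p_value)
```

A small p-value (commonly below 0.05) would count as evidence against $H_0$; a large one means the data give no reason to reject the hypothesized distribution.
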
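Equiprobable Approach: Dividing the distribution into $k$ cells of equal probability (rather than equal width), so that every cell has the same expected count $E_i = n/k$ for $n$ data points. Choosing $k \le n/5$ then ensures $E_i \ge 5$ for all cells, which increases test reliability.
Requirement: As a rule of thumb, at least 20-30 data points are needed for valid results.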
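
The equiprobable approach can be sketched as follows. This is a minimal example that assumes a fully specified standard normal hypothesis (so $p = 0$) and uses `statistics.NormalDist.inv_cdf` from the Python standard library for the quantiles; the helper names and the simulated sample are hypothetical.

```python
import bisect
import random
from statistics import NormalDist


def equiprobable_edges(dist, k):
    """Interior boundaries that split `dist` into k cells of probability 1/k each."""
    return [dist.inv_cdf(i / k) for i in range(1, k)]


def count_cells(data, edges):
    """Observed count per cell; cell j holds values between edges[j-1] and edges[j]."""
    counts = [0] * (len(edges) + 1)
    for x in data:
        counts[bisect.bisect_right(edges, x)] += 1
    return counts


# Simulated sample, for illustration only.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(40)]

n = len(data)
k = n // 5                            # largest k that keeps E_i = n / k >= 5
hypothesized = NormalDist(0.0, 1.0)   # assumed H0: standard normal

edges = equiprobable_edges(hypothesized, k)
observed = count_cells(data, edges)
expected = [n / k] * k                # every cell has the same expected count

print(k, expected[0], observed)
```

If the mean and standard deviation were instead estimated from the sample, $p$ would be 2 and the degrees of freedom would drop accordingly.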
