Definition
The Chi-Square Goodness of Fit Test is a statistical method used to determine how well a set of observed data fits a theoretical probability distribution. It compares the observed frequencies in a set of data “cells” (bins) against the frequencies that would be expected if the hypothesized distribution were correct.
Why It Matters
It provides a rigorous way to determine if our mental models of a distribution actually match the “noise” of reality, preventing us from seeing patterns where none exist.
Core Concepts
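- The Statistic ($\chi^2$): Calculated as $\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$, where $O_i$ is the observed count and $E_i$ is the expected count in cell $i$.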
- How to read: “The chi squared value equals the sum over the cells of the difference O i minus E i, squared, all divided by E i.”
- Meaning / when to use: Measures total squared relative deviation between observed and expected counts. Large values suggest the hypothesized distribution fits poorly.
- Null Hypothesis ($H_0$): The data follow the specified distribution.
- Degrees of Freedom ($df$): Calculated as $df = k - p - 1$, where $k$ is the number of cells and $p$ is the number of parameters estimated from the data.
- How to read: “The degrees of freedom equals k minus p minus one.”
- Meaning: Each estimated parameter and the total-count constraint reduce independent information; sets the correct reference distribution for the p-value.
- Equiprobable Approach: Dividing the distribution into cells of equal probability (rather than equal width) to ensure the expected count $E_i \geq 5$ (the usual rule of thumb) for all cells, increasing test reliability.
- Requirement: Typically requires at least 20-30 data points for valid results.
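
The concepts above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a definitive implementation: the function names (`chi_square_statistic`, `degrees_of_freedom`, `equiprobable_counts`), the choice of a normal distribution as the hypothesis, the choice of six cells, and the sample values are all assumptions made up for the example.

```python
from bisect import bisect_right
from statistics import NormalDist, fmean, stdev


def chi_square_statistic(observed, expected):
    """Sum over cells of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(k, p):
    """k cells and p parameters estimated from the data: k - p - 1."""
    return k - p - 1


def equiprobable_counts(data, dist, k):
    """Count observations in k cells that each have probability 1/k under dist."""
    # Interior cell boundaries are the i/k quantiles of the hypothesized distribution.
    edges = [dist.inv_cdf(i / k) for i in range(1, k)]
    counts = [0] * k
    for x in data:
        counts[bisect_right(edges, x)] += 1
    return counts


# Hypothetical sample of 30 measurements (invented for illustration only).
data = [
    4.8, 5.1, 5.6, 4.2, 5.0, 5.3, 4.9, 6.1, 4.5, 5.2,
    5.7, 4.4, 5.0, 4.7, 5.4, 5.9, 4.6, 5.1, 4.3, 5.5,
    5.2, 4.9, 5.8, 4.1, 5.0, 5.3, 4.8, 5.6, 4.7, 5.1,
]

# Hypothesis: the data are normal. Mean and standard deviation are
# estimated from the sample, so p = 2.
hypothesized = NormalDist(fmean(data), stdev(data))

k = 6                                   # 30 points / 6 cells gives E_i = 5 per cell
observed = equiprobable_counts(data, hypothesized, k)
expected = [len(data) / k] * k          # equal probability means equal expected counts

stat = chi_square_statistic(observed, expected)
df = degrees_of_freedom(k, p=2)

print("observed counts:", observed)
print("chi-square statistic:", round(stat, 3), "with df =", df)
```

The p-value is the upper-tail probability of a chi-square distribution with `df` degrees of freedom evaluated at `stat`. The standard library has no chi-square distribution, so that step is left out here; if SciPy is installed, `scipy.stats.chi2.sf(stat, df)` would supply it.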