p-Value Hacking

Definition

p-Value Hacking (p-hacking, also called data dredging) is the misuse of data analysis to find patterns that can be presented as statistically significant when they are actually the result of chance. It works by exploiting researcher degrees of freedom, such as biased data selection or flexible analysis choices, until a result crosses the common threshold of $p < 0.05$.

How to read: “The p-value is less than 0.05.”
Meaning: The conventional significance cutoff—results below this are often treated as “real,” creating incentive to manipulate analysis until $p$ crosses it.

Why It Matters

P-hacking is “scientific fraud by statistics,” even when it is done without intent to deceive. It allows researchers to claim “significance” where none exists, polluting the literature with false positives. It is a technical engine of the Replication Crisis. Its direct cost is wasted lives and billions spent on dead-end research; its indirect cost is the slow erosion of public trust in the scientific method itself.

Core Concepts

Researcher Degrees of Freedom: The many choices a researcher makes (when to stop data collection, which variables or observations to exclude, which statistical tests to use). If several of these choices are made after seeing the data, they can inflate the chance of a false positive from the nominal 5% to over 60%.
Cluster around 0.05: A telltale sign of p-hacking in an academic field is a suspicious cluster of published p-values just below the 0.05 significance threshold.
The predictive value trap: A $p = 0.05$ result does not mean there is a 95% chance the hypothesis is true. In a field with low prior probability, a $p = 0.05$ study has a substantial chance of being a false alarm.
- How to read: “The p-value is equal to 0.05.”
- Meaning: $p$ measures compatibility with the null, not posterior belief; low base rates make marginal $p$-values unreliable.
Optional Stopping: Looking at results during data collection and stopping only when a significant result is achieved.
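
Two of the concepts above can be checked numerically. The following is a minimal sketch, not a definitive analysis: the function names (`ppv`, `z_test_p`, `false_positive_rate`), the assumed power of 0.8, the known-variance z-test, and the peeking schedule (every 10 observations up to 100) are all illustrative assumptions. `ppv` treats "significant" as $p < \alpha$ rather than exactly $p = 0.05$, which is a simplification.

```python
import math
import random


def ppv(prior, power=0.8, alpha=0.05):
    """Share of significant results that reflect a true effect.

    prior: assumed fraction of tested hypotheses that are actually true.
    power and alpha are illustrative defaults, not values from the note.
    """
    true_pos = power * prior
    false_pos = alpha * (1 - prior)
    return true_pos / (true_pos + false_pos)


def z_test_p(sample):
    """Two-sided p-value for H0: mean = 0, assuming known sigma = 1."""
    z = sum(sample) / math.sqrt(len(sample))
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(peek, n_sims=2000, start=10, step=10, max_n=100, seed=0):
    """Simulate studies where the null is TRUE and count 'significant' ones.

    peek=False: test once at max_n (fixed sample size).
    peek=True:  test at every checkpoint and stop at the first p < 0.05
                (optional stopping).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        data = [rng.gauss(0, 1) for _ in range(max_n)]  # no real effect
        checkpoints = range(start, max_n + 1, step) if peek else [max_n]
        if any(z_test_p(data[:n]) < 0.05 for n in checkpoints):
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    print("PPV at 10% prior:", round(ppv(0.10), 2))
    print("False positives, fixed n:", false_positive_rate(peek=False))
    print("False positives, peeking:", false_positive_rate(peek=True))
```

Under these assumptions, `ppv(0.10)` is about 0.64, so roughly one in three significant findings would be a false alarm even with no p-hacking at all. The simulated false-positive rate with a single fixed-size test should sit near the nominal 5%, while testing at every checkpoint pushes it well above that.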

Connected Concepts
