Sampling

Definition

Sampling is the practice of selecting a subset of individuals or data points from within a larger population to estimate the characteristics of the whole.
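
As a minimal sketch of this definition, assuming a synthetic population (the values are made up for illustration) and only Python's standard library, the snippet below draws a simple random sample and uses its mean as an estimate of the population mean. The names estimate_mean, population, sample_size, and seed are hypothetical.

```python
import random
import statistics

def estimate_mean(population, sample_size, seed=0):
    """Estimate the population mean from a simple random sample.

    All names here are illustrative, not part of any particular library.
    """
    rng = random.Random(seed)
    # Random.sample draws without replacement; every item is equally likely.
    sample = rng.sample(population, sample_size)
    return statistics.fmean(sample)

# Hypothetical population: 100,000 synthetic measurements.
rng = random.Random(42)
population = [rng.gauss(170, 10) for _ in range(100_000)]

true_mean = statistics.fmean(population)
estimate = estimate_mean(population, sample_size=1_000)
print(f"population mean: {true_mean:.2f}, sample estimate: {estimate:.2f}")
```

Here 1% of the population is enough to land close to the true mean, because every individual had the same chance of being picked.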

Why It Matters

Sampling is the ‘high-leverage’ tool of data science: it lets us understand vast populations at a fraction of the time and money it would take to measure everyone, provided we have the discipline to avoid the selection biases that render a model useless.

Core Concepts

  • Representative Accuracy: A sample is only as good as its ability to mirror the total population. Selection bias (e.g., only sampling people who are easy to reach) destroys the model’s validity, and collecting more data from the same skewed source does not repair it.
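  • Sample Size vs. Cost: Larger samples reduce the margin of error but increase the “energy cost” (time and money). For a simple random sample the margin of error falls roughly as 1/√n, so quadrupling the sample only halves it. The goal is to find the point of diminishing returns.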
  • The Law of Large Numbers: As the size of a random sample grows, its mean converges to the mean of the whole population.
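
The three concepts above can be illustrated with a small simulation. This is a sketch under stated assumptions, not a definitive method: the population is synthetic, the "easy to reach" rule (only the upper half of values can be sampled) is a hypothetical stand-in for selection bias, and margin_of_error is an illustrative helper using the normal-approximation 95% interval for the mean of a simple random sample.

```python
import math
import random
import statistics

rng = random.Random(7)

# Hypothetical population of 50,000 values (synthetic, for illustration only).
population = [rng.gauss(50, 15) for _ in range(50_000)]
true_mean = statistics.fmean(population)

def margin_of_error(sample, z=1.96):
    """Approximate 95% margin of error for the mean of a simple random sample."""
    return z * statistics.stdev(sample) / math.sqrt(len(sample))

# 1. Selection bias: assume only the upper half of the population is
#    "easy to reach", and sample from that half alone.
easy_to_reach = sorted(population)[len(population) // 2:]
biased_mean = statistics.fmean(rng.sample(easy_to_reach, 1_000))
random_mean = statistics.fmean(rng.sample(population, 1_000))
print(f"true {true_mean:.2f} | random {random_mean:.2f} | biased {biased_mean:.2f}")

# 2 and 3. Growing the sample: the estimate settles near the true mean
#    (law of large numbers) while the margin of error shrinks more and
#    more slowly relative to the extra data collected.
results = []
for n in (100, 400, 1_600, 6_400):
    sample = rng.sample(population, n)
    results.append((n, statistics.fmean(sample), margin_of_error(sample)))
    print(f"n={n:>5}  mean={results[-1][1]:.2f}  +/-{results[-1][2]:.2f}")
```

The biased estimate stays far from the true mean no matter how many people are drawn from the easy-to-reach half, while the random samples tighten as n grows. Comparing the margin of error across rows is one rough way to judge where additional sampling stops being worth its cost.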

Connected Concepts