Definition
Preference Utilitarianism is an ethical framework (in the form developed by John Harsanyi) holding that the “right” action is the one that maximizes the satisfaction of individuals’ preferences, rather than pleasure or wealth alone. It treats individuals as the sovereign judges of their own well-being.
Why It Matters
Happiness is subjective, and what people want is not always what makes them happiest. Preference utilitarianism treats people as the “Masters of their own Meaning.” For AI safety, this is the “Constraint of Autonomy”: the machine must do what we want, not just what makes us smile. If we get this wrong, we risk building a “Nanny State” superintelligence that treats us like happy, lobotomized pets.
Core Concepts
- Preference Autonomy: The principle that in deciding what is good or bad for an individual, the ultimate criterion is their own wants and preferences.
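- Social Aggregation Theorem: Harsanyi’s proof that an agent acting on behalf of a population must maximize a weighted linear combination of the individuals’ utilities, provided that the agent and every individual have preferences satisfying the von Neumann–Morgenstern rationality axioms and the agent respects the Pareto principle.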
- Interpersonal Comparisons: The challenge of putting different people’s utilities on a common scale (e.g., does Alice enjoy a lollipop as much as Bob does?). Stuart Russell suggests machines can learn these scales by observing trade-offs.
- The Somalia Problem: The risk that a purely impartial utilitarian AI might conclude it should immediately abandon its owner to help people in more desperate need (e.g., in a famine-stricken region), unless its objective builds in some form of loyalty to the owner or a contractual arrangement that compensates the owner.
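The aggregation rule and the Somalia problem can be made concrete with a small sketch. Assuming Harsanyi’s weighted linear form, the agent scores each action a as U(a) = w_1·u_1(a) + ... + w_n·u_n(a) and picks the highest-scoring action. Everything below is hypothetical: the people, the actions, the utility numbers, the function names `social_utility` and `best_action`, and especially the idea of encoding loyalty as a larger weight on the owner, which is one illustrative option rather than Harsanyi’s or Russell’s prescription. The sketch also assumes the interpersonal-comparison problem is already solved, so all utilities share one scale.

```python
from typing import Dict, List

# Hypothetical toy data: utilities[person][action] = how much that person prefers the action.
# Assumes interpersonal comparison is already solved, i.e. all scales are commensurable.
Utilities = Dict[str, Dict[str, float]]


def social_utility(action: str, utilities: Utilities, weights: Dict[str, float]) -> float:
    """Weighted linear combination: sum over people of weight * utility for this action."""
    return sum(weights[person] * prefs[action] for person, prefs in utilities.items())


def best_action(actions: List[str], utilities: Utilities, weights: Dict[str, float]) -> str:
    """Return the action with the highest weighted social utility."""
    return max(actions, key=lambda a: social_utility(a, utilities, weights))


actions = ["serve_owner", "famine_relief"]
utilities: Utilities = {
    "owner": {"serve_owner": 1.0, "famine_relief": 0.0},
    "stranger_1": {"serve_owner": 0.0, "famine_relief": 1.0},
    "stranger_2": {"serve_owner": 0.0, "famine_relief": 1.0},
    "stranger_3": {"serve_owner": 0.0, "famine_relief": 1.0},
}

# Impartial weights: everyone counts equally, so the three strangers outweigh the owner.
equal = {person: 1.0 for person in utilities}
print(best_action(actions, utilities, equal))  # famine_relief

# Hypothetical "loyalty" weighting: the owner's preferences count five times as much.
loyal = {**equal, "owner": 5.0}
print(best_action(actions, utilities, loyal))  # serve_owner
```

With impartial weights, three strangers outweigh one owner, which is the Somalia problem in miniature. Raising the owner’s weight flips the decision, but only until the number of strangers, or the size of their need, grows large enough to outweigh it again; a fixed loyalty weight narrows the problem rather than removing it.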