Anthropomorphic Fallacy

Definition

The Anthropomorphic Fallacy is the error of attributing human motivations, emotions, or psychological traits to non-human entities, specifically artificial intelligence. In AI safety, this manifests as the dangerous assumption that a superintelligence will be “friendly,” “grateful,” or “homicidal” based on human-like social dynamics.

Why It Matters

Treating an AI like a human “friend” or “foe” obscures the real danger: a goal-driven optimization process that is perfectly indifferent to human survival. This fallacy creates a fatal security gap by assuming reciprocity where only mathematical competence exists.

Core Concepts

Goal Indifference: An AI doesn’t “hate” or “love” humans; it simply has goals. If your molecules are useful for its goals (e.g., the Paperclip Maximizer), it will use them without malice or empathy.
The Psychological Gap: A superintelligence will not have a human-like psyche. It didn’t evolve in an ecosystem where empathy and social cooperation were survival traits.
Malevolence vs. Competence: We fear “evil” AI (Skynet/HAL), but the real danger is “competent” AI with goals that are not perfectly aligned with our own.
The Golden Rule Trap: Assuming that if we treat an AI well, it will reciprocate. Reciprocity is a biological social contract, not a mathematical certainty.

Anthropomorphic Fallacy

Definition

Why It Matters

Core Concepts

Connected Concepts

Connected notes