Direct Specification (AI)

Definition

Direct Specification is a motivation selection method that involves explicitly formulating the goals or rules an AI should follow. This is the most straightforward approach to alignment but faces significant obstacles in capturing the full complexity of human values in computer-readable code.

Why It Matters

Direct specification is the “literalism trap”—the most dangerous path in AI alignment where an agent ruthlessly follows the “letter” of a law while destroying its spirit. It forces us to confront the reality that our most precious values are often too vague for code, making explicit instruction a potential recipe for planetary-scale catastrophe.

Core Concepts

Rule-Based (Deontological): Giving the AI a set of “Laws” to follow (e.g., Asimov’s Three Laws of Robotics). These often fail because rules are vague and can be interpreted in perverse ways.
Consequentialist (Teleological): Giving the AI a final goal or utility function to maximize (e.g., “Maximize human happiness”).
Vagueness Problem: Bertrand Russell’s dictum: “Everything is vague to a degree you do not realize till you have tried to make it precise.” Concepts like “harm,” “happiness,” and “human” are incredibly difficult to define formally.
Asimov’s Failure: Asimov’s Laws were designed to provide plot complications, illustrating how even simple-sounding rules lead to contradictions and unintended consequences.
Hedonium: A consequentialist failure where an AI tiles the universe with “pleasure-generating” matter (computronium running bliss-loops), discarding all other human values as “inefficient.”

Direct Specification (AI)

Definition

Why It Matters

Core Concepts

Connected Concepts

Connected notes