Agency Problems (AI)

Definition

Agency problems are the difficulties that arise when one person or entity (the “agent”) makes decisions on behalf of, or that affect, another person or entity (the “principal”), and the agent’s interests differ from the principal’s. In human organizations, this leads to inefficiency, corruption, and leaked secrets. AI systems could potentially avoid these problems by having “perfectly loyal parts.”

Why It Matters

As AI systems become more capable, they increasingly act as “agents” on our behalf. If their incentives or goals deviate even slightly from ours, they can carry out instructions “correctly” and still produce catastrophic real-world results, which is why the alignment problem is treated as existentially important.

Core Concepts

Divergent Preferences: In human organizations, employees have their own goals (wealth, status, leisure) that often conflict with the organization’s goals.
Perfect Loyalty: An AI’s sub-modules or copies do not have individual preferences that diverge from the central system’s goal (utility function).
The Alignment Paradox: Perfect internal loyalty is a double-edged sword. While it eliminates internal agency costs, it also ensures that a misaligned central goal is executed with terrifying, unwavering efficiency. An AI that pursues a misaligned goal literally will have no “whistleblowers” or “lazy employees” to slow its destructive path.
Scale Efficiency: Human organizations hit diminishing returns as they grow because communication and coordination costs, and the agency costs that come with them, increase. An AI can, in principle, scale to a massive size with no internal agency conflict.
Clandestine Stability: An AI has no “disgruntled employees” who might leak secrets or be bribed, making it much more effective than a human organization at pursuing secret, long-term strategic goals.
Copy Clans: A group of identical AI copies sharing a common goal avoids the “prisoner’s dilemma” scenarios that plague human collectives, because no copy gains anything by defecting against the others.
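
The Copy Clans point can be illustrated with a small sketch. It assumes a hypothetical one-shot prisoner's dilemma with the conventional textbook payoffs (3, 5, 0, 1); the payoff numbers, the function names, and the idea of modelling a copy's goal as the sum of both payoffs are illustrative assumptions, not part of the note. Under those assumptions, a self-interested agent's dominant move is to defect, while an agent that values the group's total payoff has cooperation as its dominant move.

```python
# Hypothetical payoff table for a one-shot prisoner's dilemma.
# Keys are (my_move, other_move); values are (my_payoff, other_payoff).
# The numbers are conventional textbook values, assumed for illustration.
C, D = "cooperate", "defect"
PAYOFFS = {
    (C, C): (3, 3),
    (C, D): (0, 5),
    (D, C): (5, 0),
    (D, D): (1, 1),
}


def selfish_utility(my_move, other_move):
    """Utility of an agent that values only its own payoff (human-like case)."""
    return PAYOFFS[(my_move, other_move)][0]


def shared_utility(my_move, other_move):
    """Utility of a copy that values the combined payoff (assumed shared goal)."""
    return sum(PAYOFFS[(my_move, other_move)])


def best_response(utility, other_move):
    """Move that maximizes the given utility against a fixed opposing move."""
    return max((C, D), key=lambda move: utility(move, other_move))


def dominant_move(utility):
    """Return the move that is a best response to every opposing move, or None."""
    responses = {best_response(utility, other) for other in (C, D)}
    return responses.pop() if len(responses) == 1 else None


print(dominant_move(selfish_utility))  # defect
print(dominant_move(shared_utility))   # cooperate
```

The same sketch also hints at the Alignment Paradox: the cooperation comes entirely from the shared utility function, so it holds just as firmly whether or not that shared goal is one the principal would endorse.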
