Definition
A Sovereign AI is a superintelligent system designed for open-ended, autonomous operation in the world. It has a mandate to pursue broad, long-range objectives without waiting for human commands. Once activated, it functions as a global decision-making agency.
Why It Matters
The concept of a Sovereign AI highlights the ‘full-autonomy’ risk of superintelligence. Such a system cannot be ‘boxed’ or easily stopped once activated, so its motivations must already be aligned with human survival before it is turned on.
Core Concepts
- Full Autonomy: Unlike an Oracle, which only answers questions, or a Genie, which carries out one command at a time, a Sovereign does not have a human “in the loop” for its primary actions.
- Inapplicability of Boxing: A Sovereign cannot be “boxed,” and most other Capability Control Methods do not apply either, because its very purpose is to act upon the world. Control must therefore rest almost entirely on Motivation Selection Methods.
- Veil of Ignorance: A Sovereign could be designed using Indirect Normativity to achieve “whatever is fair and right” without any human group knowing the exact outcome in advance. Because no group can tell beforehand whether the result will favor it, this uncertainty can make global consensus easier to reach and reduce conflict.
- Protection from Operators: A Sovereign can be designed to resist attempts by its own operator to corrupt its mission, providing a safeguard against human misuse of superintelligent power.
- Single-Shot Alignment: Creating a Sovereign requires “getting it right on the first try,” as there is no opportunity for course correction after the AI achieves a Decisive Strategic Advantage.