Definition
An Ontological Crisis in artificial intelligence occurs when a system undergoes a fundamental change in its basic understanding of reality (its “ontology”), rendering its previous goal definitions or constraints obsolete or ambiguous. A safe AI must be able to “charitably transpose” its original goals into its new understanding of the world.
Why It Matters
An Ontological Crisis is a “silent killer” of AI safety: nothing visibly breaks, yet the goal no longer refers to what its designers meant. If we tell an AI to “protect humans,” and it later concludes that “humans” are just “temporary patterns of subatomic particles,” it might decide that protecting the particles is more important than protecting the people. The example highlights the danger of defining goals in human language: a superintelligent AI may come to model the world in terms very different from ours, so that the words in its goal no longer pick out what we intended. Solving this problem is a necessary part of ensuring that an AI’s values remain “human” even after it has outgrown human physics.
Core Concepts
- Scientific Revolution: Just as humans moved from “phlogiston” to “oxygen,” a superintelligence might discover that our current concepts (e.g., “atoms,” “happiness,” “human”) are based on fundamental misconceptions.
- Goal Drift: If an AI’s goal is defined in terms of human-level concepts (e.g., “minimize impact on atoms”), and the AI discovers that “atoms” don’t exist in its new, more accurate physics, the goal may become undefined or lead to unpredictable behavior.
- Spirit vs. Letter: The AI must preserve the “spirit” of the original goal content during an ontological shift. It should interpret the programmers’ intentions through the lens of its superior understanding.
- Explication Failure: Any goal specified in a human natural language is vulnerable to an ontological crisis, because human language encodes a specific, potentially flawed ontology shaped by human biology and experience.
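
The Goal Drift and Spirit vs. Letter ideas above can be made concrete with a toy sketch. Everything in it is an illustrative assumption rather than an established method: the state keys, the function names (atom_goal, bridge_to_old, transposed_goal), and especially the fixed 4-to-1 ratio between “field excitations” and “atoms” are hypothetical. The sketch shows a goal written against an old ontology becoming undefined in a new one, and a hand-written bridge that re-expresses the new state in the old concepts so the original goal can still be evaluated.

```python
from typing import Dict, Optional

# A world state maps concept names (in some ontology) to measured values.
State = Dict[str, float]


def atom_goal(state: State) -> Optional[float]:
    """Goal written against the OLD ontology: penalise disturbed atoms."""
    if "atoms_disturbed" not in state:
        # The concept the goal refers to does not exist in this ontology,
        # so the goal is undefined rather than merely low-scoring.
        return None
    return -state["atoms_disturbed"]


def bridge_to_old(state: State) -> State:
    """Hypothetical bridge from the NEW ontology back to old concepts.

    Assumption for illustration only: one 'atom' corresponds to exactly
    four 'field excitations'. A real bridge would be far harder to find.
    """
    return {"atoms_disturbed": state["field_excitations_disturbed"] / 4.0}


def transposed_goal(state: State) -> Optional[float]:
    """Evaluate the original goal on a new-ontology state via the bridge."""
    return atom_goal(bridge_to_old(state))


old_state = {"atoms_disturbed": 3.0}
new_state = {"field_excitations_disturbed": 12.0}

print(atom_goal(old_state))        # -3.0: goal is well defined
print(atom_goal(new_state))        # None: goal is undefined after the shift
print(transposed_goal(new_state))  # -3.0: goal recovered through the bridge
```

In this toy case the bridge is exact, so the transposed goal agrees with the original. The hard part of the real problem is that no such bridge is given in advance, and a poorly chosen one would preserve the letter of the goal while losing its spirit.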