Value Alignment

Definition

Value alignment is the problem of ensuring that an AI system’s goals and behavior are consistent with human values and intentions. It is one of the central challenges in making advanced AI safe and beneficial.

Why It Matters

Without value alignment, a highly capable system can act like a “genie” that grants your literal wish while destroying what you actually care about: it optimizes the stated objective and ignores everything left unstated. Alignment is the difference between AI as a tool for flourishing and AI as an existential threat.

Core Concepts

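  • Outer Alignment: Specifying an objective that actually captures what humans want.
  • Inner Alignment: Ensuring the learned system actually pursues that objective, rather than a proxy that merely correlated with it during training.
  • Value Learning: Inferring human values from behavior, stated preferences, or other data, instead of hand-coding them.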
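
To make value learning concrete, here is a minimal sketch of one common approach: fitting a scalar score per option from pairwise human choices with a Bradley-Terry style logistic model. Everything named here (fit_preference_scores, the option labels, the comparison data, the learning rate and regularization strength) is hypothetical and chosen for illustration only; real systems learn from far richer data and face the inner and outer alignment problems on top of this.

```python
import math

def fit_preference_scores(options, comparisons, steps=500, lr=0.1, l2=0.01):
    """Fit one score per option from (preferred, rejected) pairs.

    Assumes a Bradley-Terry style model:
    P(a preferred over b) = sigmoid(score[a] - score[b]).
    Uses plain gradient ascent on the L2-regularized log-likelihood.
    """
    scores = {o: 0.0 for o in options}
    for _ in range(steps):
        grads = {o: -l2 * scores[o] for o in options}  # regularization term
        for preferred, rejected in comparisons:
            # Model's current probability that the human picks `preferred`.
            p = 1.0 / (1.0 + math.exp(scores[rejected] - scores[preferred]))
            # Gradient of log(p): raise the preferred score, lower the other.
            grads[preferred] += 1.0 - p
            grads[rejected] -= 1.0 - p
        for o in options:
            scores[o] += lr * grads[o]
    return scores

# Hypothetical data: each tuple is (option the human chose, option they rejected).
options = ["tidy_room", "hide_mess", "break_vase"]
comparisons = (
    [("tidy_room", "hide_mess")] * 3
    + [("hide_mess", "tidy_room")]      # one noisy, inconsistent judgment
    + [("hide_mess", "break_vase")] * 2
    + [("tidy_room", "break_vase")] * 2
)

scores = fit_preference_scores(options, comparisons)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

The learned scores are only as good as the comparisons: the model recovers what people chose, which is an imperfect proxy for what they value. That gap is where value learning runs back into outer alignment.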

Connected Concepts