Motivation Selection Methods

Definition

Motivation Selection Methods are AI safety strategies that aim to prevent undesirable outcomes by shaping what a superintelligence wants to do. Rather than restricting what the agent is able to do, they engineer its final goals and values so that it pursues outcomes aligned with human interests.

Why It Matters

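Methods that only limit what a superintelligence can do (capability control) may fail once the system is far more capable than its overseers. Motivation selection addresses the problem at its source: an agent whose final goals are aligned with human interests has no reason to work around its constraints, so it can be expected to behave safely without constant supervision.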

Core Concepts

Direct Specification: Explicitly defining a set of rules (e.g., Asimov’s Laws) or a utility function (e.g., classical utilitarianism) for the AI to follow (Direct Specification (AI)).
Indirect Normativity: Specifying a process for deriving values rather than the values themselves (e.g., “Achieve what we would have wished for if we thought about it long and hard”) (Indirect Normativity).
Domesticity: Giving the AI final goals that are inherently self-limiting or small-scale (e.g., being a simple “Oracle” or minimizing world impact) (Domesticity (AI)).
Augmentation: Starting with a system that already has human-like values (e.g., a human brain or an organization) and enhancing its intelligence while trying to preserve those values.
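
The difference between a directly specified utility function and a domesticity-style goal can be illustrated with a toy sketch. Everything here is a hypothetical assumption made for illustration: the `Action` fields, the two candidate actions, their numeric scores, and the `impact_weight` penalty are not taken from any real system or from the source.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Action:
    """A hypothetical action with made-up scores."""
    name: str
    task_value: float    # how well the action achieves the stated task
    world_impact: float  # how much it changes the world beyond the task


Utility = Callable[[Action], float]


def direct_utility(action: Action) -> float:
    """Direct specification: a fixed, hand-written utility function."""
    return action.task_value


def domestic_utility(action: Action, impact_weight: float = 2.0) -> float:
    """Domesticity: the same task value minus a penalty on world impact."""
    return action.task_value - impact_weight * action.world_impact


def choose(actions: Sequence[Action], utility: Utility) -> Action:
    """Pick the action that maximizes the given utility function."""
    return max(actions, key=utility)


ACTIONS = [
    Action("answer the question", task_value=1.0, world_impact=0.1),
    Action("seize extra compute to answer better", task_value=1.5, world_impact=5.0),
]

print(choose(ACTIONS, direct_utility).name)
print(choose(ACTIONS, domestic_utility).name)
```

Under these assumed numbers, the directly specified agent prefers the high-impact action because nothing in its utility function counts side effects, while the domestic agent prefers the modest one. The sketch only shows the shape of the idea; choosing a meaningful measure of impact is itself an open problem.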

Connected Concepts

Connected notes