Associative Value Accretion

Definition

Associative Value Accretion is a model of value acquisition based on the way humans and other animals develop their preferences. It involves starting with a simple set of innate preferences (e.g., pain aversion) and a set of dispositions to form more complex “associative” preferences in response to life experiences and environment.

Why It Matters

It explains how human morality develops from simple biological roots, which is critical for creating human-compatible AI. Without understanding this, we risk building machines that lack the moral common sense gained through lived experience.

Core Concepts

Innate Dispositions: Genetic instructions that determine what kinds of things an agent is likely to value (e.g., birds imprinting on moving objects).
Environmental Input: The specific content of the values is acquired from the world (e.g., loving Sally instead of Harry).
High Informational Complexity: This mechanism allows for final goals that are far more complex than what could be directly coded in a genome or initial program.
Bootstrapping Concepts: The agent first develops a world model (concepts of “person,” “justice”) and then “accretes” values around those concepts.
Sealing the Goal System: Once the appropriate values have been accreted, the mechanism must be “sealed” to prevent further, unintended accretions that might corrupt the goal system.

Associative Value Accretion

Definition

Why It Matters

Core Concepts

Connected Concepts

Connected notes