CUDA Architecture

Definition

CUDA (Compute Unified Domain Architecture) is a parallel computing platform and programming model developed by Nvidia. It allows software developers to use a CUDA-enabled graphics processing unit (GPU) for general-purpose processing (GPGPU), effectively turning a video game card into a scientific supercomputer by “flicking a switch” in the code.

Why It Matters

CUDA transformed the GPU from a graphics engine into the world’s most powerful calculator. It is the technological bedrock of the modern AI revolution, enabling the massive parallel processing required for deep learning and scientific simulations.

Core Concepts

The CUDA Stack: A multi-layered architecture consisting of:
- Circuits: Arrays of “CUDA cores” that simultaneously execute instructions.
- Machine Code: The lowest level, breaking complex math into simple arithmetic (+, -, *, /) whispering directly to the metal.
- Compiler: Translates high-level languages (C++, Python) into machine-level instructions.
- High-Level Software: Scientific and AI frameworks (e.g., PyTorch, TensorFlow) that face the end-user.
The CUDA Tax: The internal cost of shipping dual-purpose chips, where gamers subsidize the development of expensive high-performance computing (HPC) features.
Parallel Threads: Programmers must break large tasks into hundreds of smaller “threads” and precisely route them to available cores, a process that requires holistic rather than linear reasoning.
Vendor Lock: By encouraging scientists to build entire academic disciplines on the CUDA platform, Nvidia created a powerful form of strategic inertia—once a field is “locked in,” there is no easy way to migrate to a competitor’s hardware.
Arithmetic Intensity: Maximizing the number of transistors that respond to each internal clock pulse, compensating for the collapse of Dennard Scaling Collapse.

Definition

Why It Matters

Core Concepts

Connected Concepts

Connected notes