Definition
A distributed system is a collection of independent computers that appears to its users as a single coherent system. Its components run on different networked nodes and communicate by message passing to achieve a common goal.
Why It Matters
No single computer can store or process the data that modern services handle. Distributed systems are the largely invisible infrastructure behind global financial markets, communication networks, and cloud services; when they handle partial outages or network latency poorly, local faults can cascade into widespread disruption. Understanding these systems is the difference between building a service that fails under its first million users and building one that stays available and scales as load grows.
Core Concepts
- Concurrency: Components execute at the same time and must exchange messages to coordinate any state they share.
- Partial Failure: One node may fail while others continue to operate. The system must be designed for fault tolerance and graceful degradation.
- No Global Clock: Nodes do not share a single, perfectly synchronized notion of the "correct time"; the order of events must instead be established with logical clocks (e.g., Lamport timestamps) or agreed on through consensus algorithms.
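- CAP Theorem: A distributed data store cannot provide all three of Consistency, Availability, and Partition Tolerance at once. Because network partitions cannot be ruled out in practice, the real trade-off arises during a partition, when the system must give up either consistency or availability.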
- Consensus Algorithms: Mechanisms (like Paxos or Raft) that let distributed processes agree on a single data value, even when some of those processes fail.
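To make the "No Global Clock" idea more concrete, here is a minimal sketch of Lamport timestamps in Python. The class name `LamportClock` and its methods are illustrative choices, not part of any library, and the two "processes" are plain in-memory objects rather than networked nodes.

```python
from dataclasses import dataclass


@dataclass
class LamportClock:
    """Logical clock for one process (hypothetical helper, not a library API)."""

    time: int = 0

    def tick(self) -> int:
        # A local event advances the counter by one.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is itself an event; the returned timestamp travels with the message.
        return self.tick()

    def receive(self, message_time: int) -> int:
        # Move past both the local counter and the sender's timestamp,
        # so the receive event is ordered after the send event.
        self.time = max(self.time, message_time) + 1
        return self.time


# Two simulated processes, each with its own clock.
a = LamportClock()
b = LamportClock()

a.tick()                     # local event on a
stamp = a.send()             # a sends a message carrying its timestamp
b.tick()                     # unrelated local event on b
received = b.receive(stamp)  # b receives the message from a

print(stamp, received)
```

The property this sketch illustrates is that a receive event always gets a larger timestamp than the matching send event, without either process consulting a wall clock. Lamport timestamps capture causal order only in one direction: a smaller timestamp does not by itself prove that one event happened before another.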