CAP Theorem

Introduction to CAP Theorem

The CAP Theorem is a fundamental principle in distributed systems that describes the trade-offs involved in designing a distributed database. Eric Brewer proposed it in 2000. The theorem states that a distributed system can guarantee at most two of the following three properties at the same time:

  1. Consistency (C)
  2. Availability (A)
  3. Partition Tolerance (P)

All three properties are desirable in a distributed data storage system, but according to the CAP theorem they cannot all be guaranteed at once. When designing a distributed system, you must therefore decide which two to prioritize based on your system requirements.

Key Concepts of the CAP Theorem
  1. Consistency (C):

      Consistency means that every read operation returns the most recent write, i.e., all nodes in a distributed system present the same data at the same time, no matter which node a client connects to. In simple terms, when data is written to the system, it must immediately be propagated to all nodes, ensuring that all users see the same information.

      Example: In a bank transaction system, if one account is debited, all nodes must reflect that change immediately to avoid showing inconsistent balances.

      Impact: To maintain consistency, a system may have to delay responses to ensure all replicas are updated.


  2. Availability (A):

      Availability means that the system is always able to serve requests. Every request (read or write) gets a response, even if the system doesn't guarantee the most up-to-date data or even if some of the nodes are down.

      Example: In a social media system, you might still be able to post updates or view older posts, even if the latest data from other users hasn’t yet been propagated to all nodes.

      Impact: Prioritizing availability may lead to stale or outdated data being served during failures or delays in updates.


  3. Partition Tolerance (P):

      A partition is a communication break between nodes. Partition Tolerance means that the system continues to operate even when there is a network partition, i.e., when communication between nodes in a distributed system is disrupted. The system can still function even though some nodes are isolated from each other.

      Example: In a cloud service, if data centers in different regions lose connectivity, a partition-tolerant system will continue serving users with data from the available nodes, even if they can't communicate with others.

      Impact: Partition tolerance is crucial in distributed systems, especially in wide-area networks, but it often forces the system to compromise on consistency or availability.
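
The three definitions above can be made concrete with a toy model. The following is a minimal sketch, not a real replication protocol: the names `Replica`, `can_talk`, `write`, and `demo` are hypothetical, and a partition is simulated simply as a set of cut links between node names.

```python
class Replica:
    """One node holding a single value (hypothetical toy model)."""

    def __init__(self, name):
        self.name = name
        self.value = None


def can_talk(a, b, cut_links):
    """Two nodes can communicate unless the link between them is cut."""
    return frozenset((a.name, b.name)) not in cut_links


def write(origin, replicas, value, cut_links):
    """Apply a write on `origin` and copy it to every peer it can reach."""
    origin.value = value
    for peer in replicas:
        if peer is not origin and can_talk(origin, peer, cut_links):
            peer.value = value


def demo():
    n1, n2 = Replica("n1"), Replica("n2")
    replicas = [n1, n2]

    # Healthy network: the write reaches both nodes (consistent).
    write(n1, replicas, "balance=100", cut_links=set())
    healthy = (n1.value, n2.value)

    # Partition between n1 and n2: n2 can still answer reads (available),
    # but it answers with the old value (not consistent).
    partition = {frozenset(("n1", "n2"))}
    write(n1, replicas, "balance=40", cut_links=partition)
    partitioned = (n1.value, n2.value)

    return healthy, partitioned
```

In this sketch `demo()` returns matching values for the healthy case and diverging values for the partitioned case, which is exactly the situation where a design has to give up either consistency or availability.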


Trade-offs in CAP Theorem

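According to the CAP Theorem, a distributed system can guarantee only two of the three properties at once, which gives three possible combinations:

1. Consistency + Availability (CA): every request gets an up-to-date response, but only while the network is free of partitions. This fits single-node or tightly coupled deployments, such as a traditional relational database.

2. Consistency + Partition Tolerance (CP): the system stays consistent during a partition by rejecting or delaying some requests, so availability suffers.

3. Availability + Partition Tolerance (AP): the system keeps answering every request during a partition, but some responses may contain stale data.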

CAP Theorem in Real-World Distributed Systems

In a real-world distributed system, network partitions cannot be avoided. In practice, the choice is therefore between Consistency and Availability while a partition lasts.

Consider a cluster of three nodes, n1, n2, and n3, that replicate the same data.

If n3 becomes unreachable and cannot communicate with n1 and n2, then:

If the system chooses Consistency over Availability (CP):

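In the simplest design, where every replica must acknowledge a write, all write operations on n1 and n2 are blocked during the partition.
This prevents inconsistent data, but it also means the system becomes unavailable for writes until n3 is reachable again. (Quorum-based systems relax this: the majority side, here n1 and n2, keeps accepting writes while the isolated node refuses requests.)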
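
The CP behavior described above can be sketched as follows. This is a simplified illustration under stated assumptions: `CPStore` and `Unavailable` are hypothetical names, the store requires every replica to acknowledge a write, and reachability is tracked in a plain set rather than detected over a network.

```python
class Unavailable(Exception):
    """Raised when the store refuses a request to protect consistency."""


class CPStore:
    """Toy CP store: a write succeeds only if every replica can apply it."""

    def __init__(self, node_names):
        self.data = {name: {} for name in node_names}
        self.unreachable = set()  # names of nodes cut off by a partition

    def write(self, key, value):
        # Refuse the write rather than let the replicas diverge.
        if self.unreachable:
            raise Unavailable(f"cannot reach {sorted(self.unreachable)}")
        for replica in self.data.values():
            replica[key] = value

    def read(self, node, key):
        # Reads stay safe because no write was accepted during the partition.
        return self.data[node].get(key)


store = CPStore(["n1", "n2", "n3"])
store.write("x", 1)

store.unreachable.add("n3")       # n3 is partitioned away
try:
    store.write("x", 2)
    write_rejected = False
except Unavailable:
    write_rejected = True

store.unreachable.clear()         # the partition heals
store.write("x", 2)               # writes are accepted again
```

The trade-off is visible in `write_rejected`: during the partition the store stays consistent only by turning writes away.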

If the system chooses Availability over Consistency (AP):

The system continues to accept both reads and writes, even though some nodes may return outdated data.
n1 and n2 keep processing writes, and once the network partition heals, the data will be reconciled and synced with n3.
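
The AP behavior described above can be sketched as follows. The text does not say how the data is reconciled, so this sketch assumes last-write-wins, one of several possible strategies. `APStore` is a hypothetical name, and the single shared counter `clock` is a simplification that a real distributed system would not have.

```python
class APStore:
    """Toy AP store: every node accepts writes; conflicts are resolved
    with last-write-wins when the partition heals (assumed strategy)."""

    def __init__(self, node_names):
        # Each replica maps key -> (version, value).
        self.data = {name: {} for name in node_names}
        self.unreachable = set()  # names of isolated nodes
        self.clock = 0            # simplification: one shared logical clock

    def write(self, node, key, value):
        self.clock += 1
        entry = (self.clock, value)
        for name, replica in self.data.items():
            both_connected = (name not in self.unreachable
                              and node not in self.unreachable)
            # The receiving node always applies the write; peers apply it
            # only if neither side is isolated.
            if name == node or both_connected:
                replica[key] = entry

    def read(self, node, key):
        entry = self.data[node].get(key)
        return entry[1] if entry else None

    def heal(self):
        """End the partition and give every replica the newest entries."""
        self.unreachable.clear()
        merged = {}
        for replica in self.data.values():
            for key, entry in replica.items():
                if key not in merged or entry[0] > merged[key][0]:
                    merged[key] = entry
        for replica in self.data.values():
            replica.update(merged)


store = APStore(["n1", "n2", "n3"])
store.write("n1", "x", "old")

store.unreachable.add("n3")            # n3 is partitioned away
store.write("n1", "x", "new")          # still accepted by n1 and n2
stale = store.read("n3", "x")          # n3 answers with the old value

store.heal()
synced = store.read("n3", "x")         # n3 has caught up
```

Last-write-wins is easy to reason about but silently discards the older of two conflicting writes, which is why some systems use other reconciliation schemes instead.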

Strong Consistency:

Every read returns the most recent write, as if there is a single up-to-date copy of the data. After a write completes, all nodes in the distributed system immediately see the same value.

Eventual Consistency:

Nodes may temporarily return stale data, but if no new updates occur, all replicas will converge to the same value over time. The system does not guarantee immediate consistency, only eventual agreement.
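
The difference between the two models can be illustrated with synchronous versus asynchronous replication. This is a rough sketch, assuming a single primary (replica 0) and a hypothetical `ReplicatedRegister` class. Real systems achieve strong consistency through consensus or quorum protocols, which are not shown here.

```python
from collections import deque


class ReplicatedRegister:
    """Toy single-value store replicated across several nodes."""

    def __init__(self, replica_count, synchronous):
        self.replicas = [None] * replica_count
        self.synchronous = synchronous
        self.pending = deque()  # queued (replica_index, value) updates

    def write(self, value):
        self.replicas[0] = value  # replica 0 plays the primary
        for i in range(1, len(self.replicas)):
            if self.synchronous:
                # Strong: the write finishes only after every copy is updated.
                self.replicas[i] = value
            else:
                # Eventual: the update is queued and applied later.
                self.pending.append((i, value))

    def read(self, replica_index):
        return self.replicas[replica_index]

    def replicate_pending(self):
        """Apply all queued updates, as background replication would."""
        while self.pending:
            i, value = self.pending.popleft()
            self.replicas[i] = value


strong = ReplicatedRegister(3, synchronous=True)
strong.write("v1")
strong_reads = [strong.read(i) for i in range(3)]

eventual = ReplicatedRegister(3, synchronous=False)
eventual.write("v1")
early_read = eventual.read(1)          # may still be the old value
eventual.replicate_pending()
late_reads = [eventual.read(i) for i in range(3)]
```

With `synchronous=True`, any replica read after the write sees the new value. With `synchronous=False`, a read taken before `replicate_pending()` runs can be stale, but once no further updates arrive all replicas converge.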


CAP Theorem in System Design Interviews

In system design interviews, the CAP theorem is an important concept because it shows that you understand the trade-offs in designing distributed systems. When asked about CAP in a system design interview, follow this approach:

1. Clarify Requirements
2. Identify Trade-offs
3. Use Real-World Examples
4. Discuss Failures and Recovery

Cloud Databases Classified by the CAP Theorem
  1. Amazon DynamoDB (AP)
  2. Google Cloud Spanner (CP)
  3. Amazon Aurora (CA)
  4. Cassandra (AP)
  5. Firebase Realtime Database (AP)
  6. Amazon RDS (CA)