Introduction to CAP Theorem

The CAP Theorem is a fundamental principle in distributed systems that helps in understanding the trade-offs that are necessary when designing a distributed database. It was proposed by Eric Brewer in the year 2000. The theorem states that in a distributed system, you can only achieve two out of the following three properties simultaneously:

  1. Consistency (C)
  2. Availability (A)
  3. Partition Tolerance (P)

These three properties are vital in the context of distributed data storage systems, but according to the CAP theorem, achieving all three is impossible. Hence, when designing distributed systems, you must prioritize two of these properties based on your system requirements.

lb

Key Concepts of the CAP Theorem

  1. Consistency (C):

      Consistency means that every read operation returns the most recent write, i.e., all nodes in a distributed system see the same data at the same time. In simple terms, when data is written to the system, it must immediately be propagated to all nodes, ensuring that all users see the same information.

      Example: In a bank transaction system, if one account is debited, all nodes must reflect that change immediately to avoid showing inconsistent balances.

      Impact: To maintain consistency, a system may have to delay responses to ensure all replicas are updated.

      lb

  2. Availability (A):

      Availability means that the system is always able to serve requests. Every request (read or write) gets a response, even if the system doesn't guarantee the most up-to-date data.

      Example: In a social media system, you might still be able to post updates or view older posts, even if the latest data from other users hasn’t yet been propagated to all nodes.

      Impact: Prioritizing availability may lead to stale or outdated data being served during failures or delays in updates.

      lb

  3. Partition Tolerance (P):

      Partition Tolerance means that the system continues to operate even when there is a network partition, i.e., when communication between nodes in a distributed system is disrupted. The system can still function despite certain nodes being isolated from each other.

      Example: In a cloud service, if data centers in different regions lose connectivity, a partition-tolerant system will continue serving users with data from the available nodes, even if they can't communicate with others.

      Impact: Partition tolerance is crucial in distributed systems, especially in wide-area networks, but it often forces the system to compromise on consistency or availability.

      lb

Trade-offs in CAP Theorem

Based on the CAP Theorem, distributed systems can only guarantee two of the three properties. Let’s break down the different combinations:

1. Consistency + Availability (CA)

2. Consistency + Partition Tolerance (CP)

3. Availability + Partition Tolerance (AP)

CAP Theorem in System Design Interviews

In system design interviews, CAP theorem is an important concept because it helps you understand trade-offs in designing distributed systems. When discussing a system, you should be able to explain:

When asked about CAP in a system design interview, follow this approach:

1. Clarify Requirements
2. Identify Trade-offs
3. Use Real-World Examples
4. Discuss Failures and Recovery

Cloud Database following CAP Theorem

  1. Amazon DynamoDB (AP)
  2. Google Cloud Spanner (CP)
  3. Amazon Aurora (CA)
  4. Cassandra (AP)
  5. Firebase Realtime Database (AP)
  6. Amazon RDS (CA)

Images sourced from Algomaster's blog: https://blog.algomaster.io/