Distributed Cache

What is Cache?

A cache is a temporary, high-speed storage layer that keeps copies of frequently accessed data so that future requests for that data are served faster than by fetching it from the primary data store.

How Does It Work?

On each request, the system first checks the cache. A cache hit returns the data immediately; on a cache miss, the data is fetched from the underlying database, returned to the client, and written into the cache for subsequent requests.

Why Is a Cache Useful?
  • A cache reduces latency and improves system performance by keeping frequently accessed data in fast memory, which also minimizes the load on the database.

  • What is a Distributed Cache?

    A distributed cache is a system where multiple servers coordinate to store frequently accessed data, ensuring scalability and high availability.

    Benefits of Distributed Cache:
    • Scales horizontally: more cache servers can be added as data volume and traffic grow.
    • Stays highly available: the cache keeps serving requests even if individual nodes fail.
    • Reduces latency and database load by keeping hot data in memory close to the application.

    Caching at Different Layers of a System:
    • Web: HTTP cache headers, web accelerators, key-value stores, CDNs, etc. Used to accelerate retrieval of static web content and to manage sessions.
    • Application: local caches and key-value data stores. Used to accelerate application-level computations and data retrieval.
    • Database: database caches, buffers, and key-value data stores. Used to reduce data-retrieval latency and I/O load on the database.
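    As a concrete example of application-layer caching, the sketch below memoizes an expensive function with Python's built-in functools.lru_cache; the function and its cost are hypothetical stand-ins for any hot code path.

```python
import functools
import time

@functools.lru_cache(maxsize=1024)  # keep up to 1024 results in a local LRU cache
def expensive_lookup(user_id: int) -> str:
    # Hypothetical stand-in for a slow computation or database query.
    time.sleep(0.5)
    return f"profile-for-{user_id}"

expensive_lookup(42)  # first call: cache miss, pays the full 0.5 s cost
expensive_lookup(42)  # second call: cache hit, returned from memory instantly
```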

    Types of Cache Writing Policies
    1. Write-Through: data is written to the cache and the database synchronously. The two stores stay consistent, at the cost of higher write latency.
    2. Write-Around: data is written only to the database, bypassing the cache; the cache is populated later, on a read miss. This keeps rarely read data from polluting the cache.
    3. Write-Back: data is written to the cache first and flushed to the database asynchronously. Writes are fast, but data can be lost if the cache fails before the flush; see the sketch below.
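    The minimal sketch below contrasts the three policies, using plain dictionaries as hypothetical stand-ins for the cache and the database; a real system would use a cache client and an actual datastore.

```python
# Hypothetical in-memory stand-ins for the cache and the database.
cache: dict[str, str] = {}
database: dict[str, str] = {}
dirty_keys: set[str] = set()  # keys awaiting flush under write-back

def write_through(key: str, value: str) -> None:
    # Write to the cache and the database synchronously: consistent, slower writes.
    cache[key] = value
    database[key] = value

def write_around(key: str, value: str) -> None:
    # Write only to the database; the cache fills in later on a read miss.
    database[key] = value
    cache.pop(key, None)  # drop any stale cached copy

def write_back(key: str, value: str) -> None:
    # Write to the cache only and mark the key dirty; flush happens asynchronously.
    cache[key] = value
    dirty_keys.add(key)

def flush_dirty() -> None:
    # Periodic flush step for write-back (data is lost if the cache dies first).
    for key in list(dirty_keys):
        database[key] = cache[key]
        dirty_keys.discard(key)
```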

    Eviction Policies in Cache

    Eviction policies are strategies used by caching systems to decide which data to remove when the cache reaches its storage limit. Common choices include least recently used (LRU), least frequently used (LFU), first in first out (FIFO), and random replacement.


    Cache invalidation

    Some cache data may become outdated over time, making it invalid and requiring removal. To identify such stale entries, metadata like a time-to-live (TTL) value is stored with each cache item, ensuring outdated data is automatically deleted.

    We can use two different approaches to deal with outdated items using TTL (a sketch of both follows):
    • Passive (lazy) expiration: an item's TTL is checked only when the item is accessed; if it has expired, it is deleted and the access is treated as a miss.
    • Active expiration: a background task periodically samples cache entries and removes those whose TTL has expired.
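    Here is a minimal sketch of both approaches over a simple in-process store; the entry layout and the sampling strategy are illustrative assumptions, not a prescribed design.

```python
import random
import time

store: dict[str, tuple[str, float]] = {}  # key -> (value, absolute expiry time)

def put(key: str, value: str, ttl_seconds: float) -> None:
    store[key] = (value, time.monotonic() + ttl_seconds)

def get(key: str) -> str | None:
    # Passive (lazy) expiration: the TTL is checked only when the key is read.
    entry = store.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if time.monotonic() >= expires_at:
        del store[key]  # stale: evict on access and report a miss
        return None
    return value

def active_expiry_sweep(sample_size: int = 20) -> None:
    # Active expiration: periodically sample keys and purge the expired ones.
    now = time.monotonic()
    for key in random.sample(list(store), min(sample_size, len(store))):
        if now >= store[key][1]:
            del store[key]
```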
    Storage Mechanism in Distributed Cache

    Cache data is partitioned across the cache servers, typically with consistent hashing, so that each key maps to a predictable server and adding or removing a server remaps only a small fraction of the keys.
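    As an illustration, below is a minimal consistent-hashing ring, assuming MD5 as the hash function and a fixed number of virtual nodes per server; the server names are hypothetical.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps keys to cache servers; adding or removing a server remaps few keys."""

    def __init__(self, servers: list[str], vnodes: int = 100) -> None:
        self._ring: list[tuple[int, str]] = []
        for server in servers:
            for i in range(vnodes):  # virtual nodes smooth the key distribution
                self._ring.append((self._hash(f"{server}#{i}"), server))
        self._ring.sort()

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def server_for(self, key: str) -> str:
        # Walk clockwise on the ring to the first point at or after the key's hash.
        index = bisect.bisect_left(self._ring, (self._hash(key), ""))
        return self._ring[index % len(self._ring)][1]

ring = ConsistentHashRing(["cache-1", "cache-2", "cache-3"])
print(ring.server_for("user:42"))  # deterministically picks one of the servers
```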

    High-level Design of a Distributed Cache
    Functional Requirements: insert, retrieve, and delete data entries.
    Non-Functional Requirements: low latency, high availability, scalability, and consistency.
    API Design: a simple key-value interface, for example put(key, value), get(key), and delete(key).
    Cache Client: a library in the application servers that knows all cache servers and routes each request to the right one, typically via consistent hashing.
    Cache Server: holds the cached entries in RAM and manages their storage and eviction.
    Some challenges:
    1. How does the cache client learn about the addition or failure of a cache server?
    2. Solution: Use a configuration service that constantly monitors the health of the cache servers. Cache clients are notified automatically when a server is added or fails, so no manual monitoring is needed and clients can always fetch an up-to-date list of available cache servers from the configuration service.
    3. How can we address the single point of failure (SPOF) issue caused by having a single cache server for each data set, and how can we improve performance when certain data (hotkeys) are frequently accessed?
    4. Solution: A simple solution is to add replica nodes, starting with one primary and two backup nodes in each cache shard. To prevent inconsistencies, we perform synchronous writes over replicas when they are in close proximity. By dividing cache data into shards, we can avoid issues of unavailability and ensure efficient use of hardware resources.
    5. How does a cache server work internally?
    6. Solution: Each cache server uses three mechanisms to store and evict entries:
      • Hash map: the cache server uses a hash map to store and locate entries inside its RAM; the map holds pointers to each cached value.
      • Doubly linked list: to evict data, entries are also kept in a doubly linked list ordered by recency of access, so the next candidate for eviction can be found and unlinked in constant time.
      • Eviction policy: the policy depends on the application's requirements; here we assume least recently used (LRU), sketched right after this list.
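    A minimal sketch of this hash map plus doubly linked list combination, implementing an LRU cache; the class shape and capacity handling are illustrative, not the only way to build it.

```python
class Node:
    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key=None, value=None):
        self.key, self.value = key, value
        self.prev = self.next = None

class LRUCache:
    """Hash map for O(1) lookup plus doubly linked list for O(1) eviction order."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.map: dict = {}                    # key -> Node
        self.head, self.tail = Node(), Node()  # sentinels: head side = most recent
        self.head.next, self.tail.prev = self.tail, self.head

    def _unlink(self, node: Node) -> None:
        node.prev.next, node.next.prev = node.next, node.prev

    def _push_front(self, node: Node) -> None:
        node.next, node.prev = self.head.next, self.head
        self.head.next.prev = node
        self.head.next = node

    def get(self, key):
        node = self.map.get(key)
        if node is None:
            return None
        self._unlink(node)       # move to the front: now the most recently used
        self._push_front(node)
        return node.value

    def put(self, key, value) -> None:
        if key in self.map:
            self._unlink(self.map[key])
        node = Node(key, value)
        self.map[key] = node
        self._push_front(node)
        if len(self.map) > self.capacity:
            lru = self.tail.prev          # least recently used sits at the tail
            self._unlink(lru)
            del self.map[lru.key]
```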
    Detailed Design
    Memcached vs Redis
    • Data structures: Memcached is a string-based key-value store; Redis supports multiple data structures (strings, lists, sets, sorted sets, hashes, etc.).
    • Persistence: Memcached is in-memory only, with no persistence; Redis supports persistence through RDB snapshots and AOF logs.
    • Eviction policies: Memcached uses LRU; Redis offers configurable policies such as LRU, LFU, and volatile-TTL.
    • Memory management: Memcached manages memory with a slab allocator; Redis supports configurable memory limits (maxmemory) with eviction options.
    • Replication: Memcached has no native replication; Redis supports primary-replica (master-slave) replication.
    • Clustering: Memcached has no native clustering (sharding is done client-side); Redis Cluster provides horizontal scalability.
    • Data expiration: both support time-to-live (TTL) expiry; Redis offers more advanced expiration options.
    • Use cases: Memcached suits simple caching and session storage; Redis also covers advanced caching, pub/sub, real-time analytics, queues, and more.
    • Performance: Memcached is fast for simple key-value operations; Redis is also fast, with some extra overhead for its richer data structures.
    • Community and ecosystem: both are widely adopted with large communities; Redis has an especially active ecosystem of advanced tools and libraries.
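    To make the difference concrete, the sketch below uses the redis-py client against a local Redis instance (the host and port are assumed) to set a key with a TTL and push to a list, something Memcached's string-only model does not offer.

```python
import redis  # pip install redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Plain key-value write with a 60-second TTL (Memcached can do this too).
r.set("session:42", "alice", ex=60)
print(r.get("session:42"))  # 'alice'
print(r.ttl("session:42"))  # seconds remaining, e.g. 60

# A Redis-only data structure: a list used as a simple work queue.
r.rpush("jobs", "encode-video", "send-email")
print(r.lpop("jobs"))  # 'encode-video'
```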

    Popular Distributed Cache Solutions in Cloud Environments
    1. Amazon ElastiCache (AWS)
    2. Azure Cache for Redis (Azure)
    3. Google Cloud Memorystore (GCP)
    4. Redis
    5. Memcached