What is a Content Delivery Network (CDN)?

Key Features of a CDN

Pull Strategy (On-Demand Content Retrieval)

Push Strategy (Pre-Loading Content)

Example

When a user in Europe accesses a website hosted in the US, a CDN will deliver content from a nearby European server, significantly reducing load times compared to fetching the content directly from the US server.
CDNs are commonly used by large websites, streaming services, and applications to ensure quick, reliable access to their content for users around the world.


Problems a CDN Can Solve


Building Blocks of a CDN


CDN Workflow

Load Balancer (LB)

How CDN Works in Cloud Services

1. Upload Content to Cloud Storage

    You start by uploading your website’s static content (images, videos, CSS files, etc.) to cloud storage, such as Amazon S3, Google Cloud Storage, or Azure Blob Storage. This storage acts as the origin server for your CDN.

2. Configure CDN with Cloud Service

    Next, you configure your CDN service (e.g., Amazon CloudFront, Azure CDN, or Google Cloud CDN) to pull content from the cloud storage. You specify the origin server (cloud storage URL) and the CDN service handles the rest.

3. User Requests Content

    A user from New York requests an image from your website. The request is first routed to the CDN rather than directly to the origin server.

4. CDN Determines Nearest Edge Server

    The CDN routes the request to an edge server close to the user, typically through DNS-based or anycast routing (e.g., an edge server located in New York).

5. Check Edge Server Cache

    The edge server checks its cache to see if the requested image is already stored locally. If the image is cached, the edge server immediately serves it to the user.

6. Pull from Origin (if not cached)

    If the image is not in the cache, the edge server retrieves the image from the origin server (cloud storage) and then caches it for future requests.

7. Deliver Content to User

    The edge server delivers the content to the user. On a cache hit, latency is low because the edge server is close to the user; on a cache miss, the first request also pays for the round trip to the origin server.

8. Subsequent Requests

    For subsequent requests from other users in the same region, the CDN serves the cached content from the edge server, ensuring fast delivery and reducing the load on the origin server.
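
Steps 5 through 8 can be sketched in a few lines of Python. This is a minimal illustration, not how any particular CDN is implemented: the EdgeServer class, the ORIGIN dictionary standing in for cloud storage, and the fetch_from_origin helper are all hypothetical names introduced for the example.

```python
from typing import Callable, Dict, List, Tuple


class EdgeServer:
    """Hypothetical edge server with a simple in-memory cache."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch_from_origin = fetch_from_origin
        self._cache: Dict[str, bytes] = {}

    def get(self, path: str) -> Tuple[bytes, str]:
        """Return (content, status), where status is "HIT" or "MISS"."""
        if path in self._cache:                  # step 5: check the edge cache
            return self._cache[path], "HIT"
        content = self._fetch_from_origin(path)  # step 6: pull from the origin
        self._cache[path] = content              # keep a copy for later requests
        return content, "MISS"


# Stand-in for cloud storage acting as the origin (assumed data).
ORIGIN: Dict[str, bytes] = {"/images/logo.png": b"fake-image-bytes"}
origin_requests: List[str] = []


def fetch_from_origin(path: str) -> bytes:
    origin_requests.append(path)  # record how often the origin is contacted
    return ORIGIN[path]


edge = EdgeServer(fetch_from_origin)
first = edge.get("/images/logo.png")   # first user: cache miss, origin is contacted
second = edge.get("/images/logo.png")  # later user: served from the edge cache
print(first[1], second[1], len(origin_requests))  # MISS HIT 1
```

The second request never reaches the origin, which is the load reduction described in step 8.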

Example

Imagine you're hosting a video streaming website, and you've uploaded your video files to Amazon S3. You've also configured Amazon CloudFront as your CDN.

A user in Tokyo tries to watch a video. The request is routed to the nearest edge server in Tokyo. If this edge server already has the video in its cache, it quickly serves the video to the user. If not, it retrieves the video from Amazon S3, caches it, and streams it to the user. Future users in Tokyo will now get the video faster because it’s cached locally.
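
The routing step in the example above can be approximated by picking the edge location with the smallest great-circle distance to the user. This is only a sketch under that simplifying assumption: real CDNs generally rely on DNS-based or anycast routing and may weigh network conditions rather than pure geography. The EDGES table and its coordinates are made up for illustration.

```python
import math
from typing import Dict, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees

# Hypothetical edge locations with approximate coordinates.
EDGES: Dict[str, Coord] = {
    "tokyo": (35.68, 139.69),
    "new-york": (40.71, -74.01),
    "frankfurt": (50.11, 8.68),
}


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_edge(user: Coord) -> str:
    """Return the name of the edge location closest to the user."""
    return min(EDGES, key=lambda name: haversine_km(user, EDGES[name]))


print(nearest_edge((34.69, 135.50)))  # a user in Osaka -> tokyo
```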

Cache Busting

What is Cache Busting?

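Cache busting is a technique used to ensure that clients and CDNs retrieve the most recent version of a file rather than serving an outdated cached copy. It usually works by changing the file's URL whenever its content changes (for example, with a version number or content hash), so caches treat the new version as a different resource and fetch it from the server. The old cache entry is simply left to expire rather than being explicitly invalidated.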


Why is Cache Busting Important?

Without cache busting, users might continue to receive outdated files from the cache even after updates have been made. This can lead to issues such as users seeing old content, broken functionality, or inconsistencies in the user experience. Cache busting helps ensure that users always get the latest version of resources like CSS files, JavaScript, and images.


Common Cache Busting Techniques

How Cache Busting Works in a CDN Environment

In a CDN environment, cache busting techniques work by altering the URL of resources whenever a change is made. When a new version of a file is deployed:

  1. Requests arrive for the updated URL (with a new query string, version number, or hash), which the CDN treats as a resource it has not cached yet.
  2. The CDN fetches the updated file from the origin server and caches it at the edge locations.
  3. Subsequent requests for the file are served from the edge locations with the updated content.

This approach ensures that users receive the latest content while minimizing the risk of serving stale resources.
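
One way to alter the URL on every change is to embed a hash of the file's content in its name. The sketch below assumes that approach; the busted_name function and the example paths are hypothetical, and build tools that do this in practice use their own naming schemes.

```python
import hashlib
from pathlib import PurePosixPath


def busted_name(path: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


old = busted_name("/static/app.css", b"body { color: black; }")
new = busted_name("/static/app.css", b"body { color: navy; }")

# Different content yields a different URL, so the CDN sees a new cache key.
print(old)
print(new)
print(old != new)  # True
```

Because the name depends only on the content, redeploying an unchanged file keeps its URL and its cached copies stay valid.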


Considerations for Implementing Cache Busting

When implementing cache busting, consider the following:


Interview Questions and Answers

Q1: What is a CDN, and how does it work?

- A Content Delivery Network (CDN) is a distributed network of servers that delivers web content to users based on their geographic location.
- CDNs store cached versions of your website content in multiple locations around the world, also known as edge locations.
- When a user requests content, the CDN serves it from the nearest edge location, reducing latency and improving load times.
- This setup is especially useful for handling high traffic loads and ensuring availability and performance globally.

Q2: Why would you use a CDN in a web application?

- CDNs are used to enhance the performance, reliability, and scalability of web applications.
- They reduce latency by serving content from edge servers closer to the user, balance load by distributing traffic across multiple servers, and improve availability by providing redundancy.
- Additionally, CDNs can help protect against DDoS attacks by distributing and absorbing the traffic load.

Q3: Explain the process of cache invalidation in a CDN.

Cache invalidation in a CDN involves removing or updating content in the CDN’s cache when it becomes stale or outdated.
This can be done through:

  • Time-to-Live (TTL): Setting a TTL for cached content so that it automatically expires after a certain period.
  • Manual Purging: Explicitly sending a request to purge or invalidate specific content across all edge locations.
  • Cache Key Versioning: Changing the URL or query string parameters, forcing the CDN to treat it as a new resource and fetch a fresh copy from the origin server.
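
The three mechanisms can be seen side by side in a toy cache. This is a minimal sketch, not a real CDN API: the TtlCache class, its method names, and the example keys are assumptions made for illustration, and the clock is injected so expiry can be shown without waiting.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Toy edge cache supporting TTL expiry and manual purging."""

    def __init__(
        self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}  # key -> (expires_at, content)

    def put(self, key: str, content: bytes) -> None:
        self._entries[key] = (self._clock() + self._ttl, content)

    def get(self, key: str) -> Optional[bytes]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, content = entry
        if self._clock() >= expires_at:  # TTL elapsed: drop the stale entry
            del self._entries[key]
            return None
        return content

    def purge(self, key: str) -> None:
        """Manual purge: remove the entry regardless of its TTL."""
        self._entries.pop(key, None)


now = [0.0]  # fake clock so the example is deterministic
cache = TtlCache(ttl_seconds=60, clock=lambda: now[0])

cache.put("/app.js?v=1", b"old build")
fresh = cache.get("/app.js?v=1")      # still within the TTL
now[0] = 61.0
expired = cache.get("/app.js?v=1")    # None: the TTL has elapsed

cache.put("/logo.png", b"logo")
cache.purge("/logo.png")
purged = cache.get("/logo.png")       # None: removed by the manual purge

versioned = cache.get("/app.js?v=2")  # None: a new key is a miss, so the origin is consulted
print(fresh, expired, purged, versioned)
```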

Q4: How do you handle dynamic content with a CDN?

  • Partial Caching: Caching parts of the page that are static and dynamically generating the rest.
  • Edge Computing/Edge Functions: Using CDN capabilities like serverless functions at the edge to generate dynamic content closer to the user.
  • API Caching: Caching API responses that are less dynamic or have a defined TTL.

Q5: What is the difference between a CDN and a reverse proxy?

- A CDN is designed to cache and serve content, primarily static content, from multiple geographically distributed locations to reduce latency and improve availability.
- A reverse proxy, on the other hand, is a server that sits between the client and the origin server, forwarding client requests to the origin and returning the server's response to the client.
- While CDNs often act as reverse proxies, a reverse proxy itself does not necessarily provide the geographic distribution and caching that a CDN offers.

Q6: How would you ensure the security of data in transit between your origin servers and the CDN?

  • TLS/SSL Encryption: Enforcing HTTPS between the origin servers and the CDN to secure data in transit.
  • IP Whitelisting: Restricting access to the origin server only to known CDN IP ranges.
  • Token-Based Authentication: Using signed URLs or tokens to ensure that only authorized requests are allowed to fetch content from the CDN.
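
Signed URLs are commonly built from an expiry time and a keyed hash of the request path. The sketch below shows that general idea using an HMAC; it is not the signing scheme of any specific CDN provider, each of which defines its own format. SECRET, sign_url, verify_url, and the query parameter names are hypothetical.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"demo-secret"  # assumption: a key shared by the signer and the verifier


def sign_url(path: str, expires_at: int, secret: bytes = SECRET) -> str:
    """Append an expiry timestamp and an HMAC-SHA256 signature to the path."""
    message = f"{path}:{expires_at}".encode()
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"{path}?{urlencode({'expires': expires_at, 'sig': signature})}"


def verify_url(url: str, now: int, secret: bytes = SECRET) -> bool:
    """Accept the URL only if it is unexpired and the signature matches."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    try:
        expires_at = int(query["expires"][0])
        signature = query["sig"][0]
    except (KeyError, ValueError):
        return False
    if now > expires_at:
        return False
    message = f"{parts.path}:{expires_at}".encode()
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(signature.encode(), expected.encode())


url = sign_url("/videos/a.mp4", expires_at=1000)
print(verify_url(url, now=999))                             # True: valid and unexpired
print(verify_url(url, now=1001))                            # False: expired
print(verify_url(url.replace("a.mp4", "b.mp4"), now=999))   # False: path was tampered with
```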

Q7: How can CDNs help in mitigating DDoS attacks?

CDNs help mitigate DDoS attacks by distributing the traffic across a large number of edge servers. This distribution makes it difficult for attackers to overwhelm a single point of origin. Additionally, many CDNs offer built-in DDoS protection features such as traffic filtering, rate limiting, and web application firewalls (WAFs), which inspect incoming traffic for malicious requests and block them before they reach the origin server.
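
Rate limiting, one of the protections mentioned above, is often explained with a token bucket. The following is a simplified sketch of that algorithm for a single client; the TokenBucket class and its parameters are illustrative assumptions, and production systems track many clients and combine this with other filtering.

```python
class TokenBucket:
    """Allow short bursts up to `capacity`, then a steady `refill_per_second` rate."""

    def __init__(self, capacity: float, refill_per_second: float, now: float = 0.0) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity  # start with a full bucket
        self.updated = now

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` may pass."""
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated = max(self.updated, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0  # each allowed request spends one token
            return True
        return False


bucket = TokenBucket(capacity=3, refill_per_second=1.0)
burst = [bucket.allow(now=0.0) for _ in range(5)]
later = bucket.allow(now=2.0)
print(burst)  # [True, True, True, False, False]
print(later)  # True: two seconds of refill added tokens back
```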