Load Balancer

What is a Load Balancer?

Load balancing is the process of distributing incoming network or application traffic across multiple servers to ensure no single server becomes overwhelmed.


Why is a Load Balancer Important?

Without a load balancer, a single server must absorb all incoming traffic, making it both a performance bottleneck and a single point of failure. A load balancer improves availability (traffic is routed away from failed servers), scalability (servers can be added or removed behind it transparently), and performance (requests go to servers that have the capacity to handle them).


Load Balancer Placement

Typically, load balancers are positioned between clients and servers, managing the traffic flow in both directions. However, they can also be placed at multiple points within a server infrastructure to optimize traffic distribution between tiers, for example in front of the web servers, the application servers, and the database servers.


Services Offered by Load Balancers

Load balancers provide a range of services to distribute traffic efficiently and maintain high availability and performance in distributed systems. Commonly offered services include:

- Traffic distribution across healthy servers using a configurable algorithm
- Health checks, with automatic removal and re-addition of servers
- SSL/TLS termination
- Session persistence (sticky sessions)
- Content-based (Layer 7) routing
- Connection draining for graceful server removal
- Protection features such as rate limiting and basic DDoS mitigation


What if load balancers fail? Are they not a single point of failure (SPOF)?

Load balancers are often set up in pairs to ensure reliability. If one fails and there's no backup, the entire service can go offline. To avoid this, businesses typically use groups (or clusters) of load balancers that constantly check each other's health through "heartbeat" signals. If the main load balancer fails, a backup automatically takes over. However, if the entire cluster goes down, traffic can be manually redirected as a last resort.
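
As a rough illustration, here is a minimal Python sketch of the standby side of an active-passive pair. The `check_heartbeat` and `take_over` callbacks are hypothetical stand-ins for a real health probe and a real promotion step (such as claiming a virtual IP):

```python
import time

HEARTBEAT_INTERVAL = 1.0  # seconds between probes (assumed value)
MISSED_LIMIT = 3          # consecutive misses before taking over

def monitor_primary(check_heartbeat, take_over):
    """Run on the standby: promote it once the primary stops answering."""
    missed = 0
    while True:
        if check_heartbeat():
            missed = 0
        else:
            missed += 1
            if missed >= MISSED_LIMIT:
                take_over()  # e.g., claim the virtual IP and start serving
                return
        time.sleep(HEARTBEAT_INTERVAL)
```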


Global Load Balancing

Global load balancing involves distributing traffic across multiple geographic locations or data centers. It ensures that user requests are routed to the closest or most efficient data center based on factors such as latency, server health, and load conditions.

Examples:

- Amazon Route 53 (latency- and geolocation-based DNS routing)
- Azure Traffic Manager
- Cloudflare Load Balancing
- Google Cloud Load Balancing (global external load balancers)
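
One way to picture the "closest or most efficient data center" logic is to measure connection latency to each region and choose the quickest. The sketch below is illustrative only: the region endpoints are made-up hostnames, and production global load balancers rely on DNS policies or anycast rather than client-side probing:

```python
import socket
import time

# Hypothetical per-region endpoints (in practice, each region's own
# local load balancer).
REGIONS = {
    "us-east": ("lb.us-east.example.com", 443),
    "eu-west": ("lb.eu-west.example.com", 443),
    "ap-south": ("lb.ap-south.example.com", 443),
}

def fastest_region(timeout=1.0):
    """Measure TCP connect time to every region; return the quickest."""
    best_region, best_latency = None, float("inf")
    for region, addr in REGIONS.items():
        start = time.perf_counter()
        try:
            socket.create_connection(addr, timeout=timeout).close()
        except OSError:
            continue  # unreachable region: skip it, like a failed health check
        latency = time.perf_counter() - start
        if latency < best_latency:
            best_region, best_latency = region, latency
    return best_region
```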

Local Load Balancing

Local load balancing distributes traffic within a single geographic location or data center. It ensures even distribution of traffic among servers within a specific region or data center to balance the load and optimize resource utilization.

Examples:

- NGINX and HAProxy running as software load balancers inside a data center
- AWS Elastic Load Balancing within a single region
- Hardware appliances such as F5 BIG-IP

Load Balancing Algorithms

| Algorithm | Description | Use Case |
| --- | --- | --- |
| Round Robin | Distributes requests sequentially across all available servers, cycling through the list repeatedly. | Simple scenarios where each server has similar capacity and there is no need for session persistence. |
| Weighted Round Robin | Similar to Round Robin, but servers are assigned weights based on their capacity or performance, and requests are distributed proportionally to these weights. | Scenarios where servers have different capabilities, ensuring that more powerful servers handle a larger share of the load. |
| Least Connections | Routes traffic to the server with the fewest active connections, balancing the load based on current server usage. | Environments where servers have varying capacities or workloads, providing more dynamic load distribution. |
| Least Response Time | Directs requests to the server with the lowest response time, improving performance by prioritizing faster servers. | Applications where response time is critical, such as real-time services or high-performance computing. |
| IP Hash | Routes requests based on a hash of the client's IP address, so a given client consistently reaches the same server. | Applications requiring session persistence, where users need to interact with the same server to maintain state. |
| Session Persistence (Sticky Sessions) | Tracks sessions and directs requests from the same user to the same server, typically via cookies or session identifiers. | Applications where keeping a user's session on the same server is critical, such as shopping carts or login systems. |
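
Four of the algorithms in the table fit in a few lines each. The sketch below uses made-up server addresses and weights and ignores concurrency; it is meant to show the selection logic, not to be production code:

```python
import hashlib
import itertools

SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical backends

# Round Robin: cycle through the server list in order.
_rr = itertools.cycle(SERVERS)
def round_robin():
    return next(_rr)

# Weighted Round Robin: repeat each server according to its weight.
WEIGHTS = {"10.0.0.1": 3, "10.0.0.2": 1, "10.0.0.3": 1}  # assumed capacities
_wrr = itertools.cycle([s for s in SERVERS for _ in range(WEIGHTS[s])])
def weighted_round_robin():
    return next(_wrr)

# Least Connections: pick the server with the fewest active connections.
active_connections = {s: 0 for s in SERVERS}  # updated as requests start/finish
def least_connections():
    return min(active_connections, key=active_connections.get)

# IP Hash: a given client IP always maps to the same server.
def ip_hash(client_ip):
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]
```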

Stateful Load Balancers

Stateful load balancers maintain session information about clients and their interactions. They keep track of client state and route requests based on this information to ensure continuity of the user session.

Characteristics:

- Maintain a session table mapping clients to backend servers
- Route repeat requests from the same client to the same server
- Require more memory and more careful failover, since session state must be preserved or replicated
- Typically implemented with cookies or source-IP tracking
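
A common way to implement this stickiness is a cookie recording the pinned server. A minimal sketch, assuming a hypothetical `lb_server` cookie name:

```python
import random

SERVERS = ["app-1", "app-2", "app-3"]  # hypothetical backend names

def route(request_cookies):
    """Cookie-based stickiness: reuse the server recorded in the
    assumed 'lb_server' cookie, otherwise pick one and pin it."""
    server = request_cookies.get("lb_server")
    if server not in SERVERS:  # new session, or the pinned server is gone
        server = random.choice(SERVERS)
    return server, {"lb_server": server}  # cookie to set on the response
```

On the first request the cookie is absent, so a server is chosen and pinned; if that server later disappears, the client is transparently re-pinned to another one.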

Stateless Load Balancers

Stateless load balancers do not retain any information about client sessions. They distribute requests to servers without considering any previous interactions or states.

Characteristics:

- Keep no per-client session information
- Route each request independently, typically by hashing (e.g., on the client IP) or round robin
- Scale out and fail over easily, since any instance can handle any request
- Leave session state to the application (e.g., a shared cache or database)
Types of load balancers:

- Hardware load balancers: dedicated physical appliances (e.g., F5 BIG-IP, Citrix ADC)
- Software load balancers: run on commodity servers or in containers (e.g., NGINX, HAProxy)
- Cloud load balancers: fully managed services (e.g., AWS Elastic Load Balancing, Azure Load Balancer, Google Cloud Load Balancing)

Implementation of load balancers:

Load balancers can also be implemented at different layers of the network stack, which is the basis for the classification below.

Layerwise Load Balancers

- Application Load Balancers: operate at Layer 7 (HTTP/HTTPS) and can route on request content such as URL paths, headers, or cookies.
- Network Load Balancers: operate at Layer 4 (TCP/UDP) and route on IP addresses and ports, offering low latency and high throughput.
- DNS-Based Load Balancers: distribute traffic at the DNS resolution step by returning different IP addresses to different clients.

Interview Questions and Answers

How does a load balancer handle SSL termination, and what are the security concerns?

SSL termination refers to the load balancer decrypting the incoming SSL/TLS traffic before passing the unencrypted traffic to the backend servers. This can improve performance since decryption happens only once. However, the main security concern is that data is unencrypted between the load balancer and the backend servers, which may create vulnerabilities in internal networks. To mitigate this, you can use SSL pass-through (decrypting at the backend) or re-encrypt traffic between the load balancer and servers.
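
Conceptually, termination is "decrypt at the edge, forward plaintext inward." The following sketch shows that with a tiny TLS-terminating TCP proxy; the listening port, backend address, and certificate file names are assumptions:

```python
import socket
import ssl
import threading

LISTEN_ADDR = ("0.0.0.0", 8443)     # assumed public TLS port
BACKEND_ADDR = ("127.0.0.1", 8080)  # assumed plaintext backend

def pipe(src, dst):
    """Copy bytes from src to dst until EOF, then close dst."""
    try:
        while chunk := src.recv(4096):
            dst.sendall(chunk)
    finally:
        dst.close()

def handle(client_tls):
    # Traffic is already decrypted here; note it leaves as plaintext,
    # which is exactly the security concern described above.
    backend = socket.create_connection(BACKEND_ADDR)
    threading.Thread(target=pipe, args=(client_tls, backend), daemon=True).start()
    pipe(backend, client_tls)

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain("cert.pem", "key.pem")  # assumed certificate files

with socket.create_server(LISTEN_ADDR) as server:
    with ctx.wrap_socket(server, server_side=True) as tls_server:
        while True:
            conn, _ = tls_server.accept()  # TLS handshake happens here
            threading.Thread(target=handle, args=(conn,), daemon=True).start()
```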

How do load balancers handle sticky sessions, and what are the potential drawbacks?

Sticky sessions (session affinity) ensure that all requests from a user are sent to the same backend server. This can be useful for stateful applications, but it creates issues in scalability and failover, since if a server fails, all sessions tied to it are lost. Moreover, it can lead to uneven load distribution, where some servers are overloaded while others remain underutilized.

Can you explain how DNS-based load balancing (like Route 53) works and its limitations?

DNS-based load balancing works by distributing traffic using DNS resolution, directing users to different IP addresses based on health checks and geography. However, DNS caching by ISPs and clients can lead to stale records, meaning users might still be routed to unhealthy servers. Moreover, DNS-based load balancing is slower to adapt to changes in traffic since DNS TTL (Time-to-Live) delays adjustments.
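
The mechanism is visible from any client: one DNS lookup can return several records, and connections can rotate across them. A small sketch using Python's standard resolver; `example.com` is just a placeholder hostname:

```python
import itertools
import socket

def resolve_all(hostname, port=443):
    """Return every address DNS hands back for the hostname."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})

# Rotate across the resolved addresses, DNS round-robin style. The list
# is only as fresh as the DNS TTL, so a dead server can linger here
# until the cached record expires (the limitation noted above).
pool = itertools.cycle(resolve_all("example.com"))

def next_backend():
    return next(pool)
```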

How do load balancers ensure high availability? What happens if the load balancer itself fails?

Load balancers typically ensure high availability by operating in a redundant setup with active-passive or active-active configurations. In the case of load balancer failure, failover mechanisms are used to transfer traffic to a backup load balancer. For example, DNS failover, virtual IP addresses (VIPs), or health checks can trigger failover to a healthy load balancer.

Can you describe the concept of "connection draining" in a load balancer?

Connection draining (or deregistration delay) is a technique that ensures a graceful shutdown of backend servers. When a server is taken out of service (either manually or due to a health check), the load balancer allows existing connections to complete while preventing new connections from being sent to the server. This minimizes disruption and ensures that in-progress requests finish gracefully.
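
A sketch of the idea: flag the backend as draining so the load balancer sends it no new connections, then wait for in-flight connections to finish, up to an assumed deregistration-delay timeout:

```python
import time

class Backend:
    """Minimal backend record, as a load balancer might keep it."""
    def __init__(self, name):
        self.name = name
        self.draining = False
        self.active_connections = 0  # incremented/decremented per request

    def accepts_new_connections(self):
        # Routing code skips draining servers when picking a target.
        return not self.draining

def drain(backend, timeout=300.0, poll=1.0):
    """Stop new traffic, then wait (up to an assumed 300 s deregistration
    delay) for in-progress requests to complete."""
    backend.draining = True
    deadline = time.monotonic() + timeout
    while backend.active_connections > 0 and time.monotonic() < deadline:
        time.sleep(poll)
    # The server can now be shut down or deregistered safely.
```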

How would you implement zero-downtime deployments using a load balancer?

Zero-downtime deployments are usually achieved with rolling or blue-green strategies driven by the load balancer. In a rolling deployment, servers are updated one at a time: each is drained of connections, upgraded, health-checked, and returned to rotation before the next is touched, so capacity never drops to zero. In a blue-green deployment, the new version runs in a complete parallel environment, and the load balancer switches traffic over once health checks pass; rolling back is just switching traffic back.
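
A rolling deployment can be orchestrated on top of connection draining. This sketch reuses the `Backend`/`drain` sketch from the connection-draining answer above; `deploy` and `healthy` are hypothetical hooks standing in for real deployment tooling and health checks:

```python
import time

def rolling_deploy(backends, deploy, healthy, check_interval=2.0):
    """Update servers one at a time so capacity never drops to zero."""
    for backend in backends:
        drain(backend)               # stop traffic, let in-flight requests finish
        deploy(backend)              # install the new version (hypothetical hook)
        while not healthy(backend):  # keep it out of rotation until checks pass
            time.sleep(check_interval)
        backend.draining = False     # back into rotation, then do the next one
```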

How would you architect a load balancer solution to handle sudden spikes in traffic?

  1. Auto-scaling: Automatically scale backend servers based on load.
  2. Global load balancing: Use geo-distributed load balancers to route traffic across multiple regions.
  3. Caching layers: Use edge caching (CDNs) or in-memory caching systems (like Redis) to offload traffic from backend servers.
  4. Circuit breakers: Implement failover mechanisms in case backend systems become overloaded (a minimal sketch follows this list).
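
A circuit breaker tracks failures per backend and temporarily stops routing to a backend that keeps failing, giving it time to recover. A minimal sketch with assumed thresholds:

```python
import time

class CircuitBreaker:
    """Stop sending traffic to an overloaded backend after repeated
    failures, then retry after a cooldown (assumed thresholds)."""
    def __init__(self, failure_limit=5, reset_after=30.0):
        self.failure_limit = failure_limit
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed (healthy)

    def allow_request(self):
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.reset_after:
            self.opened_at = None  # half-open: let one attempt through
            self.failures = 0
            return True
        return False

    def record_success(self):
        self.failures = 0

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_limit:
            self.opened_at = time.monotonic()
```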