What is load balancing?

Load balancing is the process of distributing incoming network or application traffic across multiple servers to ensure no single server becomes overwhelmed.
This helps achieve several key objectives, such as higher availability, better scalability and fault tolerance, and more consistent response times.



Load Balancer Placement

Typically, load balancers are positioned between clients and servers, managing the traffic flow from clients to servers and back. However, load balancers can also be strategically placed at various points within a server infrastructure to optimize traffic distribution among different server tiers. In a typical three-tier setup, they can sit between clients and web servers, between web servers and application servers, and between application servers and database servers.


Services offered by load balancers

Load balancers offer a variety of services to ensure efficient distribution of traffic and maintain high availability and performance in distributed systems. Here is an overview of the services typically offered by load balancers:

Global Load Balancing

Global load balancing involves distributing traffic across multiple geographic locations or data centers. It ensures that user requests are routed to the closest or most efficient data center based on factors such as latency, server health, and load conditions.

Examples: Amazon Route 53 latency-based routing, Cloudflare Load Balancing, and Azure Traffic Manager.

Local Load Balancing

Local load balancing distributes traffic within a single geographic location or data center. It ensures even distribution of traffic among servers within a specific region or data center to balance the load and optimize resource utilization.

Examples: NGINX, HAProxy, and regional cloud load balancers such as an AWS Application Load Balancer.

Global vs Local Load Balancing

  • Scope: Global load balancing covers multiple geographic locations or data centers; local load balancing is confined to a single location or data center.
  • Use case: Global load balancing suits applications with a global user base, reducing latency and improving performance; local load balancing optimizes resource utilization and manages traffic within a specific region.
  • Routing decision: Global load balancing considers geographic location and latency; local load balancing focuses on server health and load within a single data center.

Load Balancing Algorithms

  • Round Robin: Distributes requests sequentially across all available servers, cycling through the list repeatedly. Use case: simple scenarios where each server has similar capacity and there is no need for session persistence.
  • Least Connections: Routes traffic to the server with the fewest active connections, balancing the load based on current server usage. Use case: environments where servers have varying capacities or workloads, providing more dynamic load distribution.
  • Least Response Time: Directs requests to the server with the lowest response time, improving performance by prioritizing faster servers. Use case: applications where response time is critical, such as real-time services or high-performance computing.
  • Weighted Round Robin: Similar to Round Robin, but servers are assigned weights based on their capacity or performance, and requests are distributed proportionally to those weights. Use case: scenarios where servers have different capabilities, ensuring that more powerful servers handle a larger share of the load.
  • Weighted Least Connections: Combines weights with the Least Connections algorithm; traffic is routed to the server with the fewest connections, adjusted for its weight. Use case: servers with different capacities and varying loads, allowing distribution based on both server performance and current connections.
  • IP Hash: Routes requests based on a hash of the client's IP address, so a given client consistently reaches the same server. Use case: applications requiring session persistence, where users need to interact with the same server to maintain state.
  • Session Persistence (Sticky Sessions): Tracks sessions and directs requests from the same user to the same server, typically using cookies or session identifiers. Use case: applications where keeping a user's session on one server is critical, such as shopping carts or login systems.
  • Least Bandwidth: Directs traffic to the server with the least bandwidth usage, balancing the load based on network throughput. Use case: environments where managing network bandwidth is important, such as media streaming or high-traffic websites.
  • Least Response Time with Weighted Distribution: Routes requests based on the lowest response time, adjusted by server weights, combining performance metrics with server capacity. Use case: complex applications needing both performance optimization and resource balancing.
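To make a few of these algorithms concrete, here is a minimal Python sketch of Round Robin, Least Connections, and IP Hash selection. The server list, the connection-count bookkeeping, and the function names are purely illustrative; a real balancer would update the counts as connections open and close.

```python
import hashlib
from itertools import cycle

# Hypothetical backend pool used only for illustration.
SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

# Round Robin: cycle through the pool regardless of load.
_rr = cycle(SERVERS)

def round_robin() -> str:
    return next(_rr)

# Least Connections: pick the server with the fewest active connections.
active_connections = {s: 0 for s in SERVERS}

def least_connections() -> str:
    return min(active_connections, key=active_connections.get)

# IP Hash: hashing the client IP pins each client to one server,
# which gives a simple form of session persistence.
def ip_hash(client_ip: str) -> str:
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

if __name__ == "__main__":
    print(round_robin(), round_robin(), round_robin())
    print(ip_hash("203.0.113.7"))  # the same IP always maps to the same server
```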

Stateful Load Balancers

Stateful load balancers maintain session information about clients and their interactions. They keep track of client state and route requests based on this information to ensure continuity of the user session.

Characteristics:
  • Maintain session information about clients and route follow-up requests to the same server to preserve continuity.
  • Require memory for the session table, and that state must be replicated or it is lost if the load balancer fails.
  • Useful for stateful applications, but can lead to uneven load distribution (see the sticky-sessions discussion below).

Stateless Load Balancers

Stateless load balancers do not retain any information about client sessions. They distribute requests to servers without considering any previous interactions or states.

Characteristics:
  • Keep no memory of previous requests; each routing decision is made from the request alone (e.g., via hashing or round robin).
  • Easier to scale horizontally and to fail over, since any load balancer instance can handle any request.
  • Best suited to stateless services where any server can handle any request.
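A minimal sketch of the contrast, assuming each request carries a session identifier (the names session_table, route_stateful, and route_stateless are made up for illustration): the stateful balancer remembers its past decisions, while the stateless one derives the target from the request alone.

```python
import hashlib

SERVERS = ["app-1", "app-2", "app-3"]  # hypothetical backend names

# Stateful: the balancer remembers which server each session was sent to.
session_table: dict[str, str] = {}

def route_stateful(session_id: str) -> str:
    if session_id not in session_table:
        # First request of a session: pick the server with the fewest sessions.
        counts = {s: list(session_table.values()).count(s) for s in SERVERS}
        session_table[session_id] = min(counts, key=counts.get)
    return session_table[session_id]

# Stateless: no memory between requests; the routing decision is a pure
# function of the request itself (here, a hash of the session id).
def route_stateless(session_id: str) -> str:
    digest = int(hashlib.sha256(session_id.encode()).hexdigest(), 16)
    return SERVERS[digest % len(SERVERS)]
```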

Layerwise Load Balancers

  • Application Load Balancers: operate at Layer 7 and route based on application-level data such as HTTP headers, cookies, or URLs.
  • Network Load Balancers: operate at Layer 4 and route based on IP addresses and TCP/UDP ports.
  • DNS-Based Load Balancers: distribute traffic at DNS resolution time by returning different IP addresses to different clients.

Interview Questions and Answers

How does a load balancer handle SSL termination, and what are the security concerns?

SSL termination refers to the load balancer decrypting the incoming SSL/TLS traffic before passing the unencrypted traffic to the backend servers. This can improve performance since decryption happens only once. However, the main security concern is that data is unencrypted between the load balancer and the backend servers, which may create vulnerabilities in internal networks. To mitigate this, you can use SSL pass-through (decrypting at the backend) or re-encrypt traffic between the load balancer and servers.
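The flow can be sketched in a few lines of Python, assuming self-signed certificate files and hypothetical addresses; this handles a single request and omits the connection pooling, bidirectional streaming, and optional re-encryption a real terminating proxy would need.

```python
import socket
import ssl

# Hypothetical addresses and certificate paths for illustration.
LISTEN_ADDR = ("0.0.0.0", 443)
BACKEND_ADDR = ("10.0.0.1", 8080)   # plain HTTP after termination
CERT_FILE, KEY_FILE = "lb.crt", "lb.key"

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain(CERT_FILE, KEY_FILE)

with socket.create_server(LISTEN_ADDR) as listener:
    with ctx.wrap_socket(listener, server_side=True) as tls_listener:
        conn, _ = tls_listener.accept()          # TLS is decrypted here
        request = conn.recv(65536)               # plaintext from here on
        with socket.create_connection(BACKEND_ADDR) as backend:
            backend.sendall(request)             # unencrypted hop to the backend
            conn.sendall(backend.recv(65536))    # relay the response to the client
        conn.close()
```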

How do load balancers handle sticky sessions, and what are the potential drawbacks?

Sticky sessions (session affinity) ensure that all requests from a user are sent to the same backend server. This can be useful for stateful applications, but it creates issues in scalability and failover, since if a server fails, all sessions tied to it are lost. Moreover, it can lead to uneven load distribution, where some servers are overloaded while others remain underutilized.

Can you explain how DNS-based load balancing (like Route 53) works and its limitations?

DNS-based load balancing works by distributing traffic using DNS resolution, directing users to different IP addresses based on health checks and geography. However, DNS caching by ISPs and clients can lead to stale records, meaning users might still be routed to unhealthy servers. Moreover, DNS-based load balancing is slower to adapt to changes in traffic since DNS TTL (Time-to-Live) delays adjustments.
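Here is a rough Python sketch of the idea, with made-up records, weights, and a TTL: the authoritative side answers with a healthy record chosen by weight, while the client-side cache illustrates why stale answers can keep flowing to an unhealthy server until the TTL expires.

```python
import random
import time

# Hypothetical DNS records for one hostname, with weights and health flags.
RECORDS = [
    {"ip": "198.51.100.10", "weight": 3, "healthy": True},
    {"ip": "198.51.100.20", "weight": 1, "healthy": True},
    {"ip": "198.51.100.30", "weight": 2, "healthy": False},
]
TTL_SECONDS = 60

def resolve() -> str:
    """Weighted random choice among healthy records, as a DNS balancer might answer."""
    healthy = [r for r in RECORDS if r["healthy"]]
    weights = [r["weight"] for r in healthy]
    return random.choices(healthy, weights=weights, k=1)[0]["ip"]

# The client caches the answer until the TTL expires, which is why DNS-based
# balancing reacts slowly: a server marked unhealthy now may still receive
# traffic from clients holding a cached record.
_cached, _expires = None, 0.0

def client_lookup() -> str:
    global _cached, _expires
    if _cached is None or time.time() > _expires:
        _cached, _expires = resolve(), time.time() + TTL_SECONDS
    return _cached
```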

How do load balancers ensure high availability? What happens if the load balancer itself fails?

Load balancers typically ensure high availability by operating in a redundant setup with active-passive or active-active configurations. In the case of load balancer failure, failover mechanisms are used to transfer traffic to a backup load balancer. For example, DNS failover, virtual IP addresses (VIPs), or health checks can trigger failover to a healthy load balancer.
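A simplified health-check loop, with hypothetical endpoints, sketches the failover decision; in practice the switch would be carried out by moving a virtual IP (e.g., with VRRP/keepalived) or updating DNS rather than by printing a message.

```python
import time
import urllib.request

# Hypothetical health endpoints for a primary/backup pair sharing one VIP.
PRIMARY = "http://lb-primary.internal/health"
BACKUP = "http://lb-backup.internal/health"
active = "primary"

def is_healthy(url: str) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except OSError:
        return False

while True:                       # simplified monitoring loop
    primary_ok, backup_ok = is_healthy(PRIMARY), is_healthy(BACKUP)
    if active == "primary" and not primary_ok and backup_ok:
        # A real setup would move the VIP or flip DNS here.
        active = "backup"
        print("failover: backup load balancer is now active")
    elif active == "backup" and primary_ok:
        active = "primary"
        print("failback: primary load balancer restored")
    elif not primary_ok and not backup_ok:
        print("alert: both load balancers are unhealthy")
    time.sleep(5)
```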

What is the difference between Layer 4 and Layer 7 load balancing, and when would you choose one over the other?

  • Layer 4 (Transport Layer): Routes traffic based on IP addresses and TCP/UDP ports. It's faster because it doesn't inspect the payload, but it lacks the ability to make complex routing decisions based on content.
  • Layer 7 (Application Layer): Routes traffic based on application-level data, such as HTTP headers, cookies, or URLs. It's more flexible for content-based routing but slightly slower because of the overhead of inspecting data.
  • Choosing one: Layer 4 is ideal for simple routing and high throughput, whereas Layer 7 is preferred for more advanced routing (e.g., for microservices or API gateways).
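The difference is easy to see in a toy Layer 7 routing table (the paths and pool names below are invented): the decision depends on the HTTP path, which a Layer 4 balancer never parses.

```python
import zlib

# Toy Layer 7 routing table: the choice depends on the request path.
# A Layer 4 balancer only sees addresses, ports, and protocol, so it
# could not tell /api traffic apart from /static traffic.
ROUTES = [
    ("/api/", ["api-1:8080", "api-2:8080"]),
    ("/static/", ["cdn-edge-1:80"]),
    ("/", ["web-1:8080", "web-2:8080"]),   # catch-all
]

def pick_backend(path: str) -> str:
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            # A stable hash of the path spreads requests across the pool.
            return pool[zlib.crc32(path.encode()) % len(pool)]
    raise LookupError("no route matched")

print(pick_backend("/api/orders"))     # served by the api pool
print(pick_backend("/static/app.js"))  # served by the static/CDN pool
```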

Can you describe the concept of "connection draining" in a load balancer?

Connection draining (or deregistration delay) is a technique that ensures a graceful shutdown of backend servers. When a server is taken out of service (either manually or due to a health check), the load balancer allows existing connections to complete while preventing new connections from being sent to the server. This minimizes disruption and ensures that in-progress requests finish gracefully.
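A stripped-down sketch of draining, with invented bookkeeping dictionaries and a hypothetical timeout: the server stops receiving new connections immediately, and the balancer waits for in-flight requests to finish (or for the deadline) before declaring it safe to stop.

```python
import time

# Hypothetical per-server bookkeeping kept by the load balancer.
in_flight = {"app-1": 3, "app-2": 0}    # active requests per backend
accepting = {"app-1": True, "app-2": True}
DRAIN_TIMEOUT = 30                       # seconds to wait for in-flight work

def drain(server: str) -> None:
    """Take a server out of rotation but let existing requests finish."""
    accepting[server] = False            # stop sending new connections here
    deadline = time.time() + DRAIN_TIMEOUT
    while in_flight[server] > 0 and time.time() < deadline:
        time.sleep(0.1)
        in_flight[server] -= 1           # stand-in for requests completing
    print(f"{server} drained; safe to deploy or shut down")

drain("app-1")
```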

How would you implement zero-downtime deployments using a load balancer?

  • Blue-Green Deployment: Run two identical environments (blue and green), switch traffic from one to the other using the load balancer after deploying to the new environment.
  • Canary Releases: Gradually route a small percentage of traffic to the new version, and if successful, increase the percentage until the old version is phased out (a small sketch of this follows the list).
  • Rolling Updates: Update instances in small batches, allowing the load balancer to drain connections from old instances and route traffic to updated ones without downtime.
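As a rough sketch of the canary idea (the pool names and the 5% starting share are arbitrary), the balancer simply biases a random choice toward the stable pool and shifts the ratio over time.

```python
import random

# Hypothetical pools for the current and the new release.
STABLE = ["v1-a", "v1-b", "v1-c"]
CANARY = ["v2-a"]
canary_share = 0.05   # start by sending 5% of traffic to the new version

def pick_backend() -> str:
    pool = CANARY if random.random() < canary_share else STABLE
    return random.choice(pool)

# As confidence grows, canary_share is raised step by step (e.g. 5% -> 25%
# -> 50% -> 100%); if error rates rise, it is set back to 0 to roll back.
```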

How would you architect a load balancer solution to handle sudden spikes in traffic?

  1. Auto-scaling: Automatically scale backend servers based on load (see the sketch after this list).
  2. Global load balancing: Use geo-distributed load balancers to route traffic across multiple regions.
  3. Caching layers: Use edge caching (CDNs) or in-memory caching systems (like Redis) to offload traffic from backend servers.
  4. Circuit breakers: Implement failover mechanisms in case backend systems become overloaded.
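A toy version of the auto-scaling decision from item 1 might look like the following; the thresholds and pool limits are invented, and a real policy would also smooth the metric over time to avoid flapping.

```python
# Hypothetical thresholds for a reactive scaling policy.
MAX_CONN_PER_SERVER = 100    # scale out above this average
MIN_CONN_PER_SERVER = 20     # scale in below this average
MIN_SERVERS, MAX_SERVERS = 2, 20

def desired_capacity(current_servers: int, total_connections: int) -> int:
    """Decide how many backends the pool should have for the current load."""
    avg = total_connections / current_servers
    if avg > MAX_CONN_PER_SERVER:
        current_servers = min(current_servers + 1, MAX_SERVERS)
    elif avg < MIN_CONN_PER_SERVER:
        current_servers = max(current_servers - 1, MIN_SERVERS)
    return current_servers

print(desired_capacity(current_servers=4, total_connections=900))  # -> 5
```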