The Reverse Proxy type of load balancer is probably the most common one - it is Good Enough for most applications.
The differences between the L4 and L7 proxy variants come down to which layer of the OSI model they operate at. L4 load balancers deal with individual connection flows (think TCP/UDP packets, plain bytes), while L7 load balancers deal with individual requests (think HTTP requests, GETs or POSTs).
In the proxy design, incoming connections are terminated on the load balancer, and the request is passed on to one of the backend servers, either by opening a new connection or reusing an already established one.
The backend server’s response then returns to the load balancer, which passes the reply back to the client.
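To make the flow concrete, here’s a minimal sketch in Go of a connection-level proxy: it terminates the client connection, picks a backend from a pool, and relays bytes in both directions. The backend addresses are hypothetical, and connection reuse is left out for brevity.

```go
// Minimal sketch of the proxy flow: terminate the client connection,
// open a new connection to a backend, and relay bytes both ways.
// The backend addresses are hypothetical.
package main

import (
	"io"
	"log"
	"net"
)

var backends = []string{"10.0.0.1:8080", "10.0.0.2:8080"} // hypothetical pool
var next int

func main() {
	ln, err := net.Listen("tcp", ":8000")
	if err != nil {
		log.Fatal(err)
	}
	for {
		client, err := ln.Accept()
		if err != nil {
			log.Print(err)
			continue
		}
		// Naive round robin over the backend pool.
		backend := backends[next%len(backends)]
		next++
		go proxy(client, backend)
	}
}

func proxy(client net.Conn, addr string) {
	defer client.Close()
	server, err := net.Dial("tcp", addr)
	if err != nil {
		log.Print(err)
		return
	}
	defer server.Close()
	// Responses flow back through the load balancer to the client.
	go io.Copy(server, client)
	io.Copy(client, server)
}
```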
In both cases, the backend is generally a pool of servers, and the load balancer can select a backend in various ways. Simple selection methods are round robin or least recently used; more complex methods could involve polling each server every few seconds to check its load and choosing the least loaded. (If you are doing something like this, “pick two” is a good selection algorithm - sketched below.)
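Here’s a minimal sketch of “pick two” (the power of two random choices), assuming each backend exposes some load figure - the activeConns count below is made up for illustration: choose two backends at random and route to the less loaded of the pair.

```go
// Sketch of "power of two choices" selection. The activeConns load
// metric is hypothetical - in practice it might come from polling.
package main

import (
	"fmt"
	"math/rand"
)

type backend struct {
	addr        string
	activeConns int // hypothetical load metric
}

// pickTwo chooses two backends at random (possibly the same one twice)
// and returns the less loaded of the pair.
func pickTwo(pool []backend) backend {
	a := pool[rand.Intn(len(pool))]
	b := pool[rand.Intn(len(pool))]
	if b.activeConns < a.activeConns {
		return b
	}
	return a
}

func main() {
	pool := []backend{
		{"10.0.0.1:8080", 3},
		{"10.0.0.2:8080", 1},
		{"10.0.0.3:8080", 7},
	}
	fmt.Println("selected:", pickTwo(pool).addr)
}
```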
If you’re concerned about the proxy being a single point of failure, you can deploy a second one in a primary/secondary (active/passive) setup and flip when necessary. You could also use ECMP to split traffic across multiple machines - I suspect Stack Overflow does this, based on their architecture diagram.
Clouds
Load balancing services are increasingly table stakes for public cloud companies:
- AWS has ELB (configurable for either L4 or L7); the next generation split it into separate products: ALB (L7) & NLB (L4)
- GCP has HTTP(S) Load Balancing (L7) and Network TCP/UDP Load Balancing (L4), plus a few proxies
- Azure has Application Gateway (L7) and Azure Load Balancer (L4)
One important note on the cloud L4 load balancers: they appear to use Maglev-type designs (or similar). They’re L4 load balancers, but not proxy designs - I’ll be discussing them in the Maglev section.
Layer 7 load balancing is probably more common than Layer 4, so that’s what I’ll cover first.
Layer 7
Layer 7 load balancers provide a bunch of useful features for developers/sysadmins - common features include TLS termination, or routing to a particular backend server based on something in the request (eg a request path, a cookie, or a header).
This works because L7LBs are aware of the protocol in use. They can unpack the request instead of treating it as a stream of bytes, and the flexibility comes from being able to modify the request before passing it on. For example, an L7LB can add an X-Forwarded-For header to the request to work around the connection looking like it’s coming from the load balancer.
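As a sketch of both behaviours - routing on the request path and adding X-Forwarded-For - here’s what this looks like with Go’s standard library reverse proxy. The backend addresses are hypothetical; this is an illustration, not how any particular product implements it.

```go
// Sketch of an L7 reverse proxy using net/http/httputil. It routes on
// the request path (an L7 decision), and httputil.ReverseProxy appends
// the client's address to the X-Forwarded-For header when forwarding.
// Backend addresses are hypothetical.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

func main() {
	apiBackend, _ := url.Parse("http://10.0.0.1:8080") // hypothetical
	webBackend, _ := url.Parse("http://10.0.0.2:8080") // hypothetical

	proxy := &httputil.ReverseProxy{
		Director: func(req *http.Request) {
			// Choose a backend based on the request path - something a
			// pure L4 load balancer can't see.
			target := webBackend
			if strings.HasPrefix(req.URL.Path, "/api/") {
				target = apiBackend
			}
			req.URL.Scheme = target.Scheme
			req.URL.Host = target.Host
			// ReverseProxy adds the client IP to X-Forwarded-For
			// before the request reaches the backend.
		},
	}
	log.Fatal(http.ListenAndServe(":8000", proxy))
}
```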
Pretty much anything that sits in front of a web server can probably be (ab)used as a load balancer by using a pool of servers as the backend, instead of a single server. You might not have much control over the selection algorithm, but it should be possible to force it.
For example, NGINX and Apache both support load balancing in Proxy mode.
Layer 4 (“TCP Load Balancing”)
In contrast to an L7LB, an L4LB only understands where the packet came from, where it’s going, and what protocol it’s using: the five tuple of a connection (source IP and port, destination IP and port, and protocol). The load balancer doesn’t care about what is in the packet, just that it is flowing across the network, from a certain source, to a certain destination.
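A sketch of what “only the five tuple” buys you: hash the tuple and use the hash to pick a backend, so packets from the same flow consistently land on the same server. The fiveTuple type and the addresses below are made up for illustration; real L4 load balancers do this sort of thing in the kernel or in hardware.

```go
// Sketch of L4-style backend selection: hash the connection's five
// tuple and map it onto a backend. Nothing in the payload is inspected.
// The fiveTuple type and addresses are hypothetical.
package main

import (
	"fmt"
	"hash/fnv"
)

type fiveTuple struct {
	srcIP, dstIP     string
	srcPort, dstPort uint16
	protocol         string // "tcp" or "udp"
}

// pickBackend hashes the five tuple so the same flow always maps to
// the same backend.
func pickBackend(t fiveTuple, backends []string) string {
	h := fnv.New32a()
	fmt.Fprintf(h, "%s:%d->%s:%d/%s", t.srcIP, t.srcPort, t.dstIP, t.dstPort, t.protocol)
	return backends[h.Sum32()%uint32(len(backends))]
}

func main() {
	backends := []string{"10.0.0.1:8080", "10.0.0.2:8080"}
	conn := fiveTuple{"203.0.113.7", "198.51.100.1", 51234, 443, "tcp"}
	fmt.Println("backend:", pickBackend(conn, backends))
}
```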
Pure L4LBs are less common today than they were - processing power used to be the main bottleneck. This meant that L7 features like routing on a request path tended to be restricted to specialized hardware (names like F5 BIG-IP, Citrix NetScalers), while L4 load balancers didn’t require too much processing power.
As processing power got cheaper, the bottleneck flipped to the network interface. L4 load balancers got replaced with L7 load balancers, which used the extra processing power to provide more features.
Moving the bottleneck
However, these load balancers just shifted the bottleneck from compute to the network interface. Scaling up is possible, but faster interfaces become prohibitively expensive quickly, since you need to upgrade the rest of your networking equipment to take advantage of the faster NIC.
Splitting the traffic across multiple load balancers is a hallmark of the Maglev design - a subject for another post.