Load Balancing Building Blocks - Overview | (in)sufficiently advanced

This is a long-form version of my talk at SRECon EMEA 2019 - Load Balancing Building Blocks. It has turned into a bit of a brain dump, where I write down what I’ve learnt.

My alternative title was “What I wish I knew about (Network) Load Balancing”.

Why do we need Load Balancing?

Load balancing essentially comes down to spreading requests across multiple servers. There are a few common reasons to do so:

1. we have too many requests for a single server to handle
2. we are defending against the failure of an individual unit of capacity
3. we want to be able to gracefully add and remove units of capacity

The first is easy to understand - if your server can only handle 100 requests per second (rps) before melting down/degrading, but you’re getting 150 rps of traffic, you probably need to either upgrade the server (scale up) or add more servers (scale out).

Scaling up works for a while, but not forever - you’ll hit a limit eventually, be it technical (you want more cores than are available) or financial (the next dollar you spend gets you less performance than the last dollar).
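To put numbers on the 100 rps / 150 rps example, here is a minimal back-of-the-envelope sketch of the scale-out arithmetic. The function name `servers_needed` and the `spare` parameter are my own illustration, not something from the talk; `spare` is an assumed way of reserving extra servers so that losing one doesn't push the rest over their limit.

```python
import math


def servers_needed(peak_rps: float, per_server_rps: float, spare: int = 0) -> int:
    """Return how many servers are needed to serve peak_rps.

    per_server_rps is the rate one server handles before degrading.
    spare is a hypothetical count of extra servers kept for failures
    or maintenance.
    """
    if per_server_rps <= 0:
        raise ValueError("per_server_rps must be positive")
    # Round up: a fraction of a server still means one more whole server.
    return math.ceil(peak_rps / per_server_rps) + spare


# 150 rps of traffic against servers that each handle 100 rps.
print(servers_needed(150, 100))           # 2
# The same traffic, keeping one extra server in reserve.
print(servers_needed(150, 100, spare=1))  # 3
```

Once the answer is more than one server, something has to decide which server gets each request, and that something is the load balancer.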

Reasons 2 and 3 are a bit more handwavy. I say “individual unit” deliberately: depending on where we’re doing the load balancing, we’re not always operating at the level of a single server. If you’re (un)lucky enough to worry about multiple racks in a datacenter, you can start treating a rack as your individual unit of capacity.

Datacenter Scale, XKCD (https://xkcd.com/1737/)

What they both come down to is the need to explicitly control where traffic is sent, based on the health of the server we’re talking to.

Why is this important? Servers don’t stay online indefinitely. There are random hardware failures, and you (usually) need to restart for maintenance operations like kernel upgrades. My personal site going down for an upgrade is one thing. If you’re depending on your site to make money, you want to keep it up.

With a load balancer, a service that is being stopped can mark itself as unhealthy so that it stops getting new traffic, instead of continuing to receive traffic that ends up in a black hole. The load balancer can also act as the router in a blue/green deployment.
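As a rough illustration of that idea, here is a minimal sketch of a round-robin balancer that skips backends marked unhealthy. The class and method names (`RoundRobinBalancer`, `mark_unhealthy`, `pick`) are hypothetical and not taken from any real load balancer. Real ones usually learn health from active checks or from the service failing a health endpoint, which this sketch leaves out.

```python
class RoundRobinBalancer:
    """Toy round-robin balancer that only hands out healthy backends."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._next = 0  # index of the next backend to consider

    def mark_unhealthy(self, backend):
        # Called when a backend is shutting down or failing checks.
        self.healthy.discard(backend)

    def mark_healthy(self, backend):
        # Only known backends can be put back into rotation.
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Look at each backend at most once, starting from the cursor.
        for _ in range(len(self.backends)):
            backend = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends")


lb = RoundRobinBalancer(["a", "b", "c"])
lb.mark_unhealthy("b")               # "b" is about to restart
print([lb.pick() for _ in range(4)])  # ['a', 'c', 'a', 'c']
lb.mark_healthy("b")                 # "b" is back, so it rejoins the rotation
```

The point is only that new requests stop going to "b" as soon as it is marked unhealthy. Letting requests already in flight finish (connection draining) is a separate concern.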

Tentative Topics:

Overview (this)
Proxy type load balancers (L4 & L7)
Getting traffic to the datacenters
How do PoPs help?
Global load balancing
Maglev load balancers
Putting it together
