Sharing the Load? Or Sharing the Poison?
February 10, 2009
By Jeff Freund
Our network engineer and I have been going back and forth for a while now about load balancing strategies, both at the web/application layer and also at the database layer.
It is clear that there are two competing interests at work:
1) To distribute load and provide redundancy and high availability
2) To limit the propagation of problems and confine issues
The former is clearly driven by one of the key principles of SaaS: by running the application for many customers in a single instance, economies of scale in the infrastructure can be applied to challenges such as creating high availability.
However, blindly allowing automated pooling and failover of all resources is dangerous too. Problems can spread from one server or pool of servers to other servers, and potentially to the entire platform (“Sharing the Poison”).
Such things do happen. We have seen it ourselves repeatedly: there is always that one customer who does things a little bit differently and hits the one query that crushes the database, or a traffic spike of a magnitude that nobody ever expected, or a crippling bug that a user exposes out of the blue.
Is there a load balancing scheme that maximizes both high availability and protection? We have no silver bullet yet, but it seems that by making some tradeoffs, configuring a combination of segmentation and pooling, and having "safe" failsafe mechanisms, we can potentially strike a nice balance between these forces.
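To make the idea of combining segmentation and pooling a bit more concrete, here is a minimal sketch, not our actual configuration. It assumes customers are pinned to a segment by hashing a customer ID, that load is pooled round-robin only among the healthy servers inside that segment, and that the "safe" failsafe is to refuse traffic rather than spill into another segment. The class name `SegmentedBalancer`, its methods, and the server names are all hypothetical.

```python
import hashlib


class SegmentedBalancer:
    """Hypothetical sketch: pooling inside a segment, isolation between segments."""

    def __init__(self, segments):
        # segments: a list of server-name lists, one list per segment (assumed layout)
        self.segments = [list(servers) for servers in segments]
        self.healthy = {srv for servers in self.segments for srv in servers}
        self.counters = [0] * len(self.segments)

    def segment_for(self, customer_id):
        # A stable hash pins each customer to exactly one segment.
        digest = hashlib.sha256(customer_id.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.segments)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        self.healthy.add(server)

    def route(self, customer_id):
        idx = self.segment_for(customer_id)
        candidates = [s for s in self.segments[idx] if s in self.healthy]
        if not candidates:
            # Failsafe: fail closed instead of pushing a possibly poisonous
            # workload onto servers that belong to other segments.
            return None
        # Round-robin among the healthy servers of this segment only.
        self.counters[idx] += 1
        return candidates[self.counters[idx] % len(candidates)]
```

The tradeoff is visible in `route`: a customer whose whole segment is down gets no service at all, which costs availability, but a query that crushes one segment cannot follow the failover path into the rest of the platform. A less strict variant might allow a small, rate-limited overflow into a neighboring segment.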