There’s a good interview at ACM with Phil Smoot, an engineer on the Hotmail project and a product manager for MSN. The interview attempts to address issues of operations and systems scaling on an Internet-scale service, and as such it is interesting to me. It’s also full of some silly platitudes, such as calling Hotmail the Everest of “megaservices” even though it is several orders of magnitude smaller than competing services and applications like Google Search or Yahoo! Search.
What I found interesting about the article was how few specifics Smoot was willing to give up about how you scale an “Internet megaservice”, and how low the ratio of machines to sysadmins is. They have 10K machines and O(100) sysadmins. That’s a ratio (you can do the math with me here!) of 100 machines per sysadmin. I claim that to be total crap. I can do 100 machines per sysadmin with almost no automation or fancy management tooling. 50-100 is just what you can do with good server software, reasonable hardware, and nothing particularly fancy. Hell, you can do 50 machines/sysadmin without even using something like cfengine or puppet.
Not to start a religious crusade here, but one wonders if they are using a lot of Windows to do this and if that is why their machines-per-sysadmin ratio is so low. Windows has been shown to be significantly more management-intensive than Unix and Unix derivatives (in part because of the lack of a command-line interface and the lack of a text-based exposure of the configuration of the device). This doesn’t make Windows worse, necessarily, but it tends to make it less suitable for applications that require massive horizontal scaling. Google and Akamai don’t use Windows, and there are many good, non-price-related reasons for that. OK, so I lied. This does make Windows worse.
But the comments about scaling, and the flexibility necessary to scale a computationally dense and storage-dense service, resonated with me. These are problems we struggle with at Renesys. Some of them have known, good solutions. Most of them have only trade-offs: faster, cheaper, easier to manage, but not all at once.
It’s an interesting interview and worth reading, even though the content is a bit thin.