To the cloud!
Ok, that is clearly swiping from a pretty smart ad campaign by Microsoft (like it or hate it, you do remember it) but it is also the first thing that sprang to mind when the need to load test our DynECT Email platform emerged. We knew the system could handle a LOT of mail but we didn’t have definitive numbers in an end-to-end scenario which would allow us to do a few key things.
Here’s how we found those numbers out, using EC2 and Chef.
The Goals For Testing Our System
- Sleep well knowing that we are using only a small percentage of our mail capacity, in the same way we know our DNS data centers are using only a small percentage of theirs, as backed by our quantified DNS load numbers.
- Scale as needed at known inflection points. For example, if we know we are going to add 2,000 more messages per minute, what resources would we need to add to stay at, say, five percent of our capacity utilized?
- Tweak operating system and mail system parameters to maximize the system efficiency.
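To make the scaling question concrete, here is a minimal sketch of the arithmetic in Python. The per-server capacity and the current message rate are made-up numbers for illustration, not measurements from our platform, and `servers_needed` is a hypothetical helper name.

```python
def servers_needed(total_rate_per_min, per_server_capacity_per_min, target_percent):
    """Smallest server count that keeps utilization at or below target_percent.

    All arguments are positive integers. Integer math is used on purpose so
    the ceiling division is exact.
    """
    if total_rate_per_min < 0 or per_server_capacity_per_min <= 0:
        raise ValueError("rates and capacities must be positive")
    if not 0 < target_percent <= 100:
        raise ValueError("target_percent must be between 1 and 100")
    # We need: servers * capacity * target_percent / 100 >= total_rate
    numerator = total_rate_per_min * 100
    denominator = per_server_capacity_per_min * target_percent
    return max(1, -(-numerator // denominator))  # ceiling division


# Hypothetical numbers: 3,000 msgs/min today, each server measured at 20,000 msgs/min.
current = servers_needed(3000, 20000, 5)
after_growth = servers_needed(3000 + 2000, 20000, 5)
extra_servers = after_growth - current
```

With these assumed figures, staying at five percent utilization after adding 2,000 messages per minute means going from three servers to five. The point of the load test is to replace the assumed per-server capacity with a measured one.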
The task is easy enough to describe: create a number of test servers that send mail through our DynECT Email Delivery system at a very high rate to a known email address, which simply drops the messages so it can receive them as fast as possible. The servers would be started one at a time, each sending mail as soon as it came up, while we watched the DynECT Email server to see how it responded from a resources point of view.
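As a rough illustration of what each test server does, here is a minimal Python sketch that pushes numbered messages through a relay over a single SMTP connection. It is not the script we ran (ours were bash), and the host name, port and addresses are placeholders.

```python
import smtplib
from email.message import EmailMessage

# Hypothetical values for illustration; none of these come from the original scripts.
RELAY_HOST = "relay.example.com"
RELAY_PORT = 25
SENDER = "loadtest@example.com"
SINK_ADDRESS = "sink@example.com"  # an address whose mail is simply dropped


def build_test_message(sequence_number, sender=SENDER, recipient=SINK_ADDRESS):
    """Build one small, uniquely numbered test message."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = f"load test message {sequence_number}"
    message.set_content("This is a load test message. It can be discarded.")
    return message


def send_burst(count, host=RELAY_HOST, port=RELAY_PORT):
    """Send `count` messages over one SMTP connection; return how many were sent."""
    sent = 0
    with smtplib.SMTP(host, port) as relay:
        for sequence_number in range(count):
            relay.send_message(build_test_message(sequence_number))
            sent += 1
    return sent
```

Calling `send_burst(10000)` on each test server would then drive the relay. Reusing one connection per burst keeps the sender cheap, so the relay rather than the test server is more likely to be the bottleneck.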
Solving A Problem
If we had done this on our in-house hardware, we would have needed a number of machines free for the test duration, all built and configured by hand and started and stopped individually. This would have resulted in a lot of wasted hardware and man-hours, all for a one-time test. The alternative is to use EC2 machines, which removes the major overhead of hardware, but that alone would still leave some pretty major time investments. For that, a little automation needed to be employed.
So how did we do it?
Using Opscode’s Chef as our EC2 automation tool, some fun bash scripting and a few custom-built AMIs, we were able to sit at one machine and, in the course of ten minutes, launch N machines and have them send mail at a breakneck pace using one of our servers as the relay (after it was easily pulled out of our load balancer so nothing in production would be affected). We incrementally added new machines sending mail until we had gathered the metrics we needed, then automatically shut down and terminated all of the created EC2 nodes.
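The shape of that ramp-up can be sketched in a few lines of Python. This is only an outline of the control flow, not our Chef and bash tooling: `launch_node`, `terminate_node` and `relay_is_saturated` are hypothetical callables you would supply yourself (for example, thin wrappers around whatever EC2 tooling you use).

```python
import time


def ramp_load(launch_node, terminate_node, relay_is_saturated,
              max_nodes, settle_seconds=60, sleep=time.sleep):
    """Add sender nodes one at a time until the relay saturates or max_nodes is hit.

    Returns the number of nodes that were running when the ramp stopped.
    Every launched node is terminated afterwards, even if something fails.
    """
    launched = []
    try:
        for _ in range(max_nodes):
            launched.append(launch_node())   # start one more sender
            sleep(settle_seconds)            # let the metrics settle
            if relay_is_saturated():         # e.g. queue depth or CPU over a limit
                break
        return len(launched)
    finally:
        for node in launched:
            terminate_node(node)             # never leave paid-for nodes running
```

Putting the cleanup in a `finally` block is the part worth copying: a test that dies halfway through should not leave EC2 instances running.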
Sending all of this information into Graphite makes it easy to see where bottlenecks lie, enabling us to tweak settings and then run identical tests in another ten-minute span to see if the change benefited the whole system under true duress. I love when things are simple!
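Getting numbers into Graphite takes very little code, since Carbon's plaintext listener accepts lines of the form `path value timestamp` (port 2003 by default). Here is a minimal Python sketch of that; the host name and metric path are placeholders, not the ones we used.

```python
import socket
import time

CARBON_HOST = "graphite.example.com"  # hypothetical host
CARBON_PORT = 2003                    # Carbon's default plaintext port


def format_metric(path, value, timestamp=None):
    """Format one metric as a Carbon plaintext line: 'path value timestamp\\n'."""
    if timestamp is None:
        timestamp = time.time()
    return f"{path} {value} {int(timestamp)}\n"


def send_metrics(lines, host=CARBON_HOST, port=CARBON_PORT):
    """Send already formatted metric lines to Carbon over one TCP connection."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as connection:
        connection.sendall(payload)
```

For example, `send_metrics([format_metric("loadtest.relay.messages_per_min", 48000)])` would record one data point under that (made-up) metric path.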
How could I do this?
If you want to do something along these lines yourself, feel free to use these scripts as a starting point. These were built up in a little over a day and, as I get some free time, I’ll be editing them to add the automatic creation of DynECT Email users for testing using the DynECT Email API, throttling of the traffic rate, and other little bells and whistles.
The real meat of the scripts, however, has already helped make a strong-as-granite DynECT Email foundation even more solid.