This post previously appeared on Network World.
Sound management depends on independently measured performance goals, and cloud performance is no exception.
When businesses hire outside contractors for a job, they always try to ensure that there are clear measures of whether the contractor is doing the job. Whether it be expanding office space, ensuring the office is cleaned regularly, keeping the bookkeeping up to date or reviewing HR procedures, any sound management decision depends on independently measurable performance goals. Otherwise, you’re just hiring someone on the condition, “It’s OK, we trust you.”
For reasons that are better contemplated in a psychology blog, when it comes to computing, the independent measurement too often goes out the window. Measurement from inside applications is important, for instance, but can’t be taken as gospel, since the developer responsible for a performance bug may also have implemented the measurement, baking the same faulty assumption into both the code and the check. The Russian proverb “Доверяй, но проверяй” (“Trust, but verify”) applies equally to networked applications, because there are so many opportunities for trouble.
The set of issues is made much worse by the cloud. As I said in the first installment, the more you outsource network functions on which you depend, the more actively you need to measure those vendors. And yet, an alarming number of people appear to take the measurements provided by their cloud provider as the only measurement needed. If you’re the CIO, you need accurate, independent measures of your vendors’ performance.
None of this is to suggest cloud providers are being dishonest in what they’re measuring. The problem is much deeper. If you want to know whether a stick you are holding is really a yardstick, you need a known yardstick with which to check it. But if you’re in a yardstick factory, you can check your candidate stick against every available yardstick and still not know whether it’s a yard long: the factory itself might be the problem, in which case every yardstick in the place is the wrong length. If the yardstick machine has a fault and cuts every yardstick 1/8″ short, you won’t spot it with the naked eye, but every measurement you make will be wrong until you confirm the length against an independent standard.
To see how this can be a problem in the cloud, consider a simple example. Cloud providers often have “regions” in which they can move resources around without worrying too much. Suppose your service, which you want to move to the cloud, has an imbalance in its user population, and most of the users come from one ISP network. Unless your cloud provider specifically measures that link, you will not know how your cloud provider’s internal management decisions affect your own users.
And if you don’t use that ISP yourself, then when performance problems crop up (perhaps because of routing trouble or something similar), you will be mystified by your customers’ trouble reports. There is nothing more frustrating for a customer than hearing a support representative say, “It works for me.”
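One way to avoid being mystified is to probe the network path your customers actually use. Here is a minimal sketch in Python’s standard library that times DNS resolution plus the TCP handshake; the host and port in any real deployment are yours to fill in, and a real monitor would run this from vantage points inside your customers’ networks, not from your own data center:

```python
import socket
import time


def measure_connect(host: str, port: int = 443, timeout: float = 5.0) -> float:
    """Time DNS resolution plus the TCP handshake to host:port, in seconds."""
    start = time.perf_counter()
    sock = socket.create_connection((host, port), timeout=timeout)
    elapsed = time.perf_counter() - start
    sock.close()
    return elapsed
```

Comparing this number as seen from inside the heavily used ISP against the same probe run from elsewhere can reveal an asymmetry that the provider’s own dashboard, measuring from inside its region, will never show.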
Measure what your customers see
Part of how you avoid this problem is to make sure you actually measure what your customers see:
- Find out where your customers come from, and ensure you monitor from those locations. Measurements taken from where your customers aren’t don’t tell you anything about what your customers see.
- Measure not just your application, but also the dependencies its functions rely on. Even trivial problems, such as a CSS file that fails to load because of where it is hosted, can cause bizarre misbehaviors. Your cloud provider’s uptime monitor probably won’t catch this.
- Measure how things really work for your users. Synthetic measurements are valuable and important, but on their own they don’t tell you what users actually see, so they may miss changes the cloud introduces into your environment. This is related to what I wrote last time, but I’ll say more about synthetic measurements in the next installment.
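The second point, checking dependencies, can be sketched in a few lines of standard-library Python. This is an illustration, not a complete monitor: it extracts the stylesheet, script and image URLs a page depends on, and treats anything other than an HTTP 200 as a failure, which is an assumption made here for simplicity:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class AssetExtractor(HTMLParser):
    """Collect the URLs of stylesheets, scripts, and images a page depends on."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.assets: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.assets.append(urljoin(self.base_url, attrs["href"]))
        elif tag in ("script", "img") and attrs.get("src"):
            self.assets.append(urljoin(self.base_url, attrs["src"]))


def extract_assets(html: str, base_url: str) -> list[str]:
    """Return the absolute URLs of the page's external dependencies."""
    parser = AssetExtractor(base_url)
    parser.feed(html)
    return parser.assets


def check_asset(url: str, timeout: float = 10.0) -> bool:
    """Return True if the dependency loads; a monitor would alert on False."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False
```

Running `extract_assets` on the page as fetched from each customer vantage point, then `check_asset` on every URL it returns, catches exactly the case described above: the main page is up, but a dependency hosted elsewhere is not, and the provider’s uptime monitor checks only the main page.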
Many of these sorts of measurements are made easier by “real user monitoring” techniques. With RUM, the user agent (such as the browser) takes the measurements itself: it opens connections, makes DNS queries, or uses various extensions to report how long operations take for the browser. The techniques are useful, but like every other measurement, they’re useful only if you measure the right things.
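On the collection side, a RUM pipeline boils down to aggregating the timing samples real browsers report. A minimal sketch, assuming your pages already send beacons carrying millisecond timestamps with field names from the W3C Navigation Timing API (the choice of median and 95th percentile is illustrative):

```python
import statistics


def summarize_beacons(beacons: list[dict]) -> dict:
    """Aggregate Navigation Timing samples reported by real browsers.

    Each beacon is expected to carry millisecond timestamps from the
    browser's timing data: fetchStart, responseStart, and loadEventEnd.
    """
    ttfb = [b["responseStart"] - b["fetchStart"] for b in beacons]  # time to first byte
    load = [b["loadEventEnd"] - b["fetchStart"] for b in beacons]   # full page load
    return {
        "ttfb_median_ms": statistics.median(ttfb),
        "ttfb_p95_ms": statistics.quantiles(ttfb, n=20)[-1],  # 95th percentile
        "load_median_ms": statistics.median(load),
        "load_p95_ms": statistics.quantiles(load, n=20)[-1],
    }
```

Grouping these summaries by where each beacon came from, for example by the client’s ISP or network, is what makes problems confined to one provider’s path stand out instead of averaging away.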
Of course, you need to use the provider’s measures too—you need to know how quickly you’re consuming resources relative to budget, so there are no surprises at the end of the month. But any measurement plan really needs to start with a sound and complete model of what the application is delivering to the users. Measure that, so you know that you’re building better networks.