If you’re scaling Internet-facing systems, you know a thing or two about system monitoring, data collection and visualization. For years at Dyn, we’ve been using the trusted tools of the trade, including Nagios, MRTG, Cacti and Munin.
All of these tools have been working well in helping us keep a watchful eye on our network and systems, but with our expansions to 17 global datacenters and additional features, services and capacity, the traditional tools have begun to show their limits.
Luckily, we found a great tool that picks up where they leave off. We’ve fallen in love with it, and I want to tell you about it.
Written by the fine folks over at Orbitz, Graphite is a time-series database (TSD) that focuses solely on logging data and displaying it back to users. This also means that Graphite has no built-in poller (as Cacti and MRTG do), which leaves the user free to build whatever poller is most appropriate for the data being collected.
Graphite focuses on storing data in highly efficient databases and displaying data back to users as quickly as possible. In fact, unlike other TSD systems we’ve used, Graphite moves data from collection to display fast enough that we can use it for nearly real-time heads-up displays.
We really love being able to write our own pollers for use with Graphite. It means that nearly anyone at Dyn can throw together some pretty simple shell scripts and start feeding data over to Graphite. The simple raw TCP-based API is flexible enough that it is easy to integrate with our Perl- and Python-based backends, database systems and DNS servers. Writing a poller for SNMP data or web-scraping a page is easy enough to do, and feeding the data to Graphite is even easier.
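To give a feel for how little a poller needs to do, here is a minimal Python sketch of the sending side. It assumes Graphite's carbon daemon is listening for its plaintext protocol (one "metric.path value timestamp" line per data point) on the default TCP port 2003. The host name, the metric path and the helper names format_metric and send_metrics are made up for illustration; this is not our production poller.

```python
import socket
import time

# Hypothetical carbon endpoint; replace with your own Graphite server.
CARBON_HOST = "graphite.example.com"
CARBON_PORT = 2003  # carbon's default plaintext listener port


def format_metric(path, value, timestamp=None):
    """Build one plaintext-protocol line: '<path> <value> <timestamp>\\n'."""
    if timestamp is None:
        timestamp = time.time()
    return f"{path} {value} {int(timestamp)}\n"


def send_metrics(lines, host=CARBON_HOST, port=CARBON_PORT, timeout=5.0):
    """Open a TCP connection, send every line in one write, then close."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)


# Example call (needs a reachable carbon listener, so it is left commented out).
# The metric path is a hypothetical name:
# send_metrics([format_metric("dns.resolver1.queries_per_sec", 1532)])
```

A poller then comes down to collecting a number (from SNMP, a log file or a scraped page), formatting it with format_metric, and calling send_metrics on a schedule such as a cron job.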
Once data is in Graphite, the front-end UI makes it easy to construct graphs. A number of built-in functions are available to aggregate and summarize data. The beauty of Graphite’s front-end is the speed at which graphs are drawn, which makes it a breeze to scan through data looking for an interesting data point or trend. We currently have over 1250 metrics being pumped into Graphite and the server load is close to nil. We’re expecting it to scale vertically very well.
Let’s take a look at the obligatory pretty graph of Interop Las Vegas, where we powered the conference and were a sponsor:
Figure: Data collected from the InteropNet recursive DNS servers operated by Dyn (InteropNet 2011, Las Vegas)
For the more advanced, the graph API is a simple REST-based set of calls that can be used to pull graphs into the other applications we use. Our internal wiki runs on Confluence, and by setting up a simple deck-and-cards slide show there, you can put together an instant heads-up dashboard for a team.
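As a rough illustration of those calls, the Python sketch below builds a URL for Graphite's /render endpoint, which takes one or more target parameters plus options such as from, width, height and format. The base URL, the metric names and the helper name render_url are assumptions for the example; sumSeries is one of Graphite's built-in aggregation functions.

```python
from urllib.parse import urlencode

# Hypothetical Graphite web front-end; replace with your own server.
GRAPHITE_URL = "http://graphite.example.com"


def render_url(targets, from_time="-1h", width=600, height=300,
               image_format="png", base_url=GRAPHITE_URL):
    """Return a /render URL that draws the given targets on one graph."""
    # Each series (or function applied to series) is its own 'target' parameter.
    query = [("target", t) for t in targets]
    query += [
        ("from", from_time),      # relative start time, e.g. the last hour
        ("width", width),
        ("height", height),
        ("format", image_format),
    ]
    return f"{base_url}/render?{urlencode(query)}"


# Sum a hypothetical per-server query counter across all resolvers.
url = render_url(["sumSeries(dns.*.queries)"])
print(url)
```

The resulting URL can be dropped into an image tag on a wiki page or fetched by another application, and the server draws the graph on each request.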
We highly recommend that you give Graphite a try! You can install Graphite on a Linux-based server in about 20 minutes, write your first polling script in another 10 minutes, and spend the next 30 in awe of how quickly and beautifully graphs are produced!