A key part of a cohesive DevOps approach is ensuring that enough data is collected throughout the process to give everyone involved a detailed understanding of the current state of the system. This is especially important as we give away more and more control over elements of the infrastructure that makes up the system.
Having this depth of knowledge allows for dynamic systems to be put in place to handle resolution of issues quickly and effectively. Again, this focuses on the ability to quickly and reliably implement change in response to problems.
The DevOps culture, combined with the move to the cloud, where much more of the infrastructure is out of your control, means that monitoring has to extend well beyond traditional infrastructure monitoring.
In the DevOps and cloud world, monitoring has to be focused on understanding the holistic view of the performance of your application. This means taking monitoring to the next level of detail; typically this includes three additional views:
- RUM and EUM
- APM
- IPM
RUM and EUM
Real User Monitoring (RUM) and End User Monitoring (EUM) are focused on what is ultimately the most important metric of all: What is the experience that actual users are seeing?
RUM focuses on the experience of actual users. Typically this is done by including a beacon within the system that sends data back to a central server, outlining details of how the system is working. The most common implementation is to inject some JavaScript into the content of a web page to monitor performance metrics.
RUM is valuable because it captures what is actually happening to users; it does not depend on a set of predefined measures that are being proactively checked. If users raise issues, analysis can first determine whether the issues are affecting a wider group of users, and then drill down to the cause.
RUM also allows you to determine if there is a pattern to the users affected; e.g., are they using a similar browser or type of connectivity, or are they from one geographic location?
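As a minimal sketch of this kind of pattern analysis, the example below groups beacon records by one attribute and flags groups whose median load time is well above the overall median. The record fields (`browser`, `region`, `load_ms`), the sample data, and the factor of 2 are assumptions made for illustration, not part of any particular RUM product.

```python
from collections import defaultdict
from statistics import median

# Hypothetical beacon records; field names and values are illustrative only.
beacons = [
    {"browser": "Chrome", "region": "EU", "load_ms": 800},
    {"browser": "Chrome", "region": "US", "load_ms": 900},
    {"browser": "Firefox", "region": "EU", "load_ms": 850},
    {"browser": "Firefox", "region": "US", "load_ms": 950},
    {"browser": "Safari", "region": "EU", "load_ms": 3200},
    {"browser": "Safari", "region": "US", "load_ms": 3400},
    {"browser": "Chrome", "region": "EU", "load_ms": 780},
]


def median_load_by(records, field):
    """Group records by one attribute and return the median load time per group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[field]].append(record["load_ms"])
    return {key: median(values) for key, values in groups.items()}


def slow_segments(records, field, factor=2.0):
    """Return groups whose median load time exceeds factor times the overall median."""
    overall = median(r["load_ms"] for r in records)
    per_group = median_load_by(records, field)
    return {key: value for key, value in per_group.items() if value > factor * overall}


print(slow_segments(beacons, "browser"))  # only Safari stands out in this sample
print(slow_segments(beacons, "region"))   # no region stands out in this sample
```

With this sample data the slowdown lines up with one browser rather than with a geographic location, which is the sort of pattern RUM data can surface.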
EUM is similar but is based on a set of synthetically executed, repeatable tests carried out from a “clean room” location. This allows you to assess the results of tests without worrying about the results being affected by unknown conditions.
A good monitoring solution includes elements of EUM and RUM, as they both add value in different situations.
RUM is valuable in that it reflects what actual users are experiencing and will flag issues beyond the range anticipated when defining test plans. RUM executes continuous testing against your complete application, albeit in an unscientific manner.
EUM adds value in that it is a more scientific approach to testing; when a failure occurs, you can be confident that no other factors have changed. EUM also allows you to proactively identify issues before users experience them (hopefully resolving the issue before it affects users).
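A synthetic check of this kind can be sketched as a scripted probe that is timed and compared against a budget. Everything named here (`run_synthetic_check`, the probe functions, the 500 ms budget) is hypothetical; a real EUM tool would drive a browser or HTTP client from a controlled location rather than call local functions.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    name: str
    ok: bool
    elapsed_ms: float


def run_synthetic_check(name: str, probe: Callable[[], bool], budget_ms: float) -> CheckResult:
    """Run one scripted probe, time it, and fail it on error or a blown time budget."""
    start = time.perf_counter()
    try:
        passed = bool(probe())
    except Exception:
        passed = False  # any exception counts as a failed check
    elapsed_ms = (time.perf_counter() - start) * 1000
    return CheckResult(name, passed and elapsed_ms <= budget_ms, elapsed_ms)


# Hypothetical probes standing in for scripted user journeys.
def fake_login() -> bool:
    return True


def fake_checkout() -> bool:
    raise TimeoutError("simulated outage")


results = [
    run_synthetic_check("login", fake_login, budget_ms=500),
    run_synthetic_check("checkout", fake_checkout, budget_ms=500),
]
failing = [r.name for r in results if not r.ok]
print(failing)
```

Because the same probes run on a schedule under the same conditions, a check that starts failing points to a change in the system rather than in the test.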
APM
Application Performance Monitoring (APM) is a monitoring technology that sits within your application and gathers core metrics about what is going on under the hood.
These metrics usually go down to a granular level of detail about how your application is behaving (such as method execution times and SQL query execution times), as well as the overhead of communication with external systems.
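To illustrate the idea of method-level timing, here is a minimal sketch that uses a decorator to record how long each call takes. It is not how any specific APM agent works; commercial agents typically instrument code automatically. The `timed` decorator, the `timings` store, and `load_order` are hypothetical names.

```python
import functools
import time
from collections import defaultdict

# In-memory store of call durations in seconds, keyed by function name.
timings = defaultdict(list)


def timed(func):
    """Record the wall-clock duration of every call to the wrapped function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs whether the call returned or raised, so failures are timed too.
            timings[func.__qualname__].append(time.perf_counter() - start)
    return wrapper


@timed
def load_order(order_id):
    time.sleep(0.01)  # stand-in for a SQL query or external call
    return {"id": order_id}


load_order(1)
load_order(2)

for name, samples in timings.items():
    print(f"{name}: {len(samples)} calls, max {max(samples) * 1000:.1f} ms")
```

Recording the duration in a `finally` block means slow calls that end in an exception still show up in the metrics.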
APM allows comprehensive assessment of what your application is actually doing at a code level rather than the impact that it has on external items such as servers or user experience. This is invaluable when assessing the root cause of any issues or when completing performance optimization.
Many modern APM solutions will integrate with RUM and EUM systems to get a complete end-to-end breakdown of user interaction with your system.
IPM
Internet Performance Management (IPM) is the gap that is often left in a monitoring solution. It covers monitoring and analysis of the performance of the internet between users and your application.
Note that this may be a dedicated tool or may be API-based data feeds from different sources that are pulled into dashboards. Overall, the aim is to give operational awareness of how available your applications, and the services they interact with, are across the internet.
Applications are increasingly reliant on the performance of the public internet, so it is ever more important that we have an understanding of any issues that may arise. This applies not only to the applications you run yourself, but also to the other internet-based applications that your applications rely on.
Typically these will be routing issues, temporary or permanent, between parts of your user base and your systems. The internet is a volatile environment, and these types of issues can arise at any point.
IPM allows you to become aware of and react to these issues, for example, by using geolocation-based DNS to route the affected users to an alternative location.
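The routing decision behind such a reaction can be sketched as follows. In practice the switch would normally be made through a DNS provider's geolocation routing configuration; this example only shows the decision logic. The region and site names, the packet-loss figures, and the 5% threshold are all assumptions.

```python
# Hypothetical measurements: packet loss seen from each user region to each site.
path_loss = {
    ("eu", "eu-west"): 0.35,  # simulated routing problem between EU users and the EU site
    ("eu", "us-east"): 0.01,
    ("us", "us-east"): 0.00,
    ("us", "eu-west"): 0.02,
}

# Hypothetical site preference order for each user region.
preferences = {
    "eu": ["eu-west", "us-east"],
    "us": ["us-east", "eu-west"],
}


def choose_site(region, max_loss=0.05):
    """Return the first preferred site whose measured path loss is acceptable."""
    for site in preferences.get(region, []):
        # Paths with no measurement are treated as unusable.
        if path_loss.get((region, site), 1.0) <= max_loss:
            return site
    return None


print(choose_site("eu"))  # falls back to the second preference in this sample
print(choose_site("us"))
```

Here users in the affected region are steered to their second-choice site while the path to their first choice is degraded, and users elsewhere are left alone.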