MAKING MONITORING PERVASIVE

Historically, monitoring has been something that could be manually deployed to a set of physical or virtual infrastructures, and once in place, you were good to go. Alerts and events would occur if something went wrong, and the monitoring tool would simply notify you with details about the incident. IT operations teams would nurse their infrastructure back to health, and all was good and right in the world again.

Fast-forward to today, however, and you’ll find a much different situation – one where enterprises are increasingly embracing a culture of DevOps and the world of monitoring is reckoning with the need to evolve and adapt. Production environments are no longer static or perpetual where monitoring is a one-and-done activity. Rather, concepts like observability—where teams are taking a holistic approach to measuring and understanding digital experiences—are being baked into the release management pipeline to ensure every application and every bit of infrastructure being deployed has monitoring attached to it. But how do you get there? What tools and technologies are ushering in this new age of monitoring and observability? The not-so-surprising answer: An intelligent combination of automation and data.

A Call to Automation

As the velocity of software release cycles increases, the manual approach to deploying monitoring agents or instrumenting everything in the release for proper observability becomes impractical and counterproductive. Can you imagine a developer checking in code to a source code management repository, creating a ticket in an ITSM system, and requiring someone within IT operations to manually deploy monitoring to that release? That flies in the face of speed, agility, and what a culture of DevOps is trying to accomplish (releasing better software, faster).

Leveraging automation tools, such as Ansible, Puppet, or Chef, is a great first step toward making monitoring pervasive with every release. This may involve the automated deployment and installation of a monitoring agent, the customization of a configuration file to define what gets monitored, or API calls to the control plane of a monitoring system to register new systems for observation. Fortunately, many continuous integration (CI) tools either have these automation capabilities out-of-the-box or stitch well with third-party automation tools.

Another growing trend we’re seeing in the quest toward making monitoring pervasive is ‘observability as code.’ In the same way that infrastructure as code seeks to codify infrastructure components as configuration files using JSON or YAML for consistent and repeatable use, we can leverage a similar capability with monitoring and observability. Increasingly, we’re seeing observability vendors develop terraform providers to do just that. Head over to the Terraform Registry and search for your favorite observability vendor to see if they have something listed.

A Data-Driven Approach

Once a repeatable structure has been established to deploy observability in an automated fashion, we can take things a step further in the use and maturity of observability data. If the goal of DevOps is to release better software more quickly, the question of how we do that begs to be asked. Observability provides one answer: by leveraging data about the health, performance, and availability of the application while running in a pre-production, lower-tier environment. You might be asking yourself, “does that mean monitoring and observability should be deployed outside of just production?” The short answer is yes. There are a host of benefits to having the telemetry data to make better-informed decisions about whether or not to promote code from a pre-production environment to a production environment. Many in the marketplace have begun to call this process ‘release verification.’

Among the largest data sources to drive this automated decision-making process are the metrics, logs, and traces from our observability stack. If there is too much latency being identified by our APM tool, too many errors in the logs, or anomalous metrics coming from our applications or infrastructure, then our tooling should be integrated or stitched into our delivery pipelines to throw up a red flag and say, “Stop! Don’t release or promote this code.” In addition to taking automated action, it would also facilitate a quick feedback loop to the development team, providing insight into why it failed so developers can spend more time on value-add activities and less time troubleshooting.

Conclusion

Monitoring maturity has come a long way in a short period of time – especially as the ways infrastructure and applications are architected and released to market have evolved. Learning to become proactive in the use of monitoring and observability data as part of a CI/CD pipeline can be new to a lot of traditional operations teams (particularly for those who have historically focused on incident response activities). However, employing an automated and data-driven approach to the use of observability will result in better software, differentiated products and services, happier customers, and outpaced competition.