
Continuous monitoring traces its roots to traditional business auditing practices and processes, but it goes further than the periodic, snapshot-style audits most companies are used to. Continuous monitoring (CM) is the ongoing review of transactions and controls to unearth weak or ill-designed rules and processes and replace them, thus minimizing risk for the company.

For CM to be useful, it requires a company-wide effort so that everybody involved knows where the company was, where it is now, and what the future holds. It also needs to take into account significant global trends, as well as the organization’s culture and the way it manages risk.

To understand more about continuous monitoring and its impact on DevOps, we asked IT professionals and thought leaders what needs to be monitored and how you can balance data collection without being overwhelmed in the process.

What Is Continuous Monitoring?

Continuous monitoring enables management to review business processes for adherence to, and deviations from, their intended performance and effectiveness levels. Thanks to CM, DevOps professionals can observe and detect compliance issues and security threats. CM also helps teams study relevant metrics and solve issues in real time as they arise.

When asked about continuous monitoring, Reuben Yonatan, CEO of New York-based GetVoIP, said that “continuous monitoring is observing each stage of software development to facilitate quality. By observing each stage, you will catch errors and bugs that might have gone unnoticed. Such errors lower the quality of the product, and when discovered, you will have to do a lot of work to fix them.”

Thus, CM is helpful when it comes to implementing and strengthening company-wide security measures. It also helps provide feedback on the overall health of your IT infrastructure. 

Related Article: How the Shift To Remote Work Is Changing DevOps

Which Areas Should Be Monitored?

Most organizations begin their continuous monitoring journey by overseeing simple, standard and easy-to-understand metrics such as CPU usage, memory usage, disk space and other server-related figures. Companies can also monitor error codes, drops in customer activity, and even security policies as code to detect non-compliant workloads.

All in all, the objective is to identify, detect, and remediate risks related to environments and infrastructure components to ensure that the systems have high availability and resiliency. 
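As an illustration, here is a minimal sketch of collecting a couple of these server-level metrics with Python’s standard library. The thresholds are arbitrary placeholders, and `os.getloadavg()` is POSIX-only:

```python
import os
import shutil

def collect_server_metrics(path="/"):
    """Gather a few basic host metrics using only the standard library."""
    total, used, free = shutil.disk_usage(path)
    load_1m, load_5m, load_15m = os.getloadavg()  # POSIX-only
    return {
        "disk_used_pct": round(used / total * 100, 1),
        "load_1m": load_1m,
    }

metrics = collect_server_metrics()

# Flag any metric that crosses an arbitrary example threshold
alerts = [name for name, value in metrics.items()
          if (name == "disk_used_pct" and value > 90)
          or (name == "load_1m" and value > 4.0)]
```

In practice a CM setup would ship these readings to a time-series store on a schedule rather than evaluating thresholds inline, but the shape of the data is the same.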

These are some areas that CM oversees:

  • Application Monitoring: performance and availability of application services.
  • Configuration Monitoring: application- and environment-specific configurations for a given app/environment version.
  • Database Monitoring: database connection time and locks, cache size, performance, queries (most frequent, top CPU, long-running), replication details, CPU/memory and disk.
  • Middleware Monitoring: queue services, communication and data management across app and operating systems, message services.
  • Infrastructure Monitoring: virtualization and physical hardware utilization, performance and availability, network latency.
  • Third-party Monitoring: services availability, the response time of partner services.
  • Batch Monitoring: scheduled jobs start time, duration, completion rate.
  • Data Monitoring: data quality, uniformity, accuracy and completeness.
  • Security Monitoring: vulnerability management, end-points, data access and authorization, and event monitoring.
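Several of these areas, notably application and third-party monitoring, reduce in practice to availability probes that also record response time. A minimal sketch using Python’s standard library (the URL passed in would be a real service endpoint):

```python
import time
import urllib.request
import urllib.error

def probe(url, timeout=5.0):
    """Return (is_up, response_seconds) for a single availability check."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            ok = 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        ok = False
    return ok, time.monotonic() - start
```

Run on a schedule, the resulting (is_up, latency) pairs are exactly what feeds availability and response-time dashboards for your own services and your partners’.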

Troubleshooting & Infrastructure: Striking A Balance

The truth is that the most challenging part of continuous monitoring is avoiding being overwhelmed by useless metrics and alerts. When first starting CM, many teams focus on the default, usually low-level metrics, such as CPU usage. However, these metrics aren’t good at predicting when a problem is about to arise.

To find issues, DevOps professionals need to examine the problems their organization faces. When asked about data and log messages, John Annand, Research Director for Infrastructure and Operations at Ontario, Canada-based Info-Tech Research Group, said that logging and monitoring are two different activities. “With monitoring, you know what you’re looking for in an attempt to be proactive and deal with a condition before it results. Logging is useful for trying to figure out what went wrong after the fact. It’s a fine balancing act, usually informed by some past incident where the RCA indicated that having some information would have helped resolve the incident. This is why some security logs are kept for upwards of 2 years — as investigators may want to go back that long.”
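That distinction can be sketched in a few lines: monitoring evaluates a known condition proactively, while logging records context that only matters during a later investigation. The service name and threshold below are illustrative:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("orders-service")  # hypothetical service name

def check_queue_depth(depth, threshold=1000):
    """Monitoring: a known condition, checked proactively."""
    if depth > threshold:
        return "alert"  # act before the backlog becomes an outage
    return "ok"

def handle_request(order_id):
    """Logging: record context for a possible after-the-fact RCA."""
    log.info("processing order_id=%s", order_id)
```

Neither replaces the other: the check catches the condition you anticipated, while the retained log lines are what lets an investigator reconstruct the ones you didn’t.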

On the question of how much data to collect, Matt Dickens, Chief Product Officer at Cambridge, UK-based Gearset, suggests that “deciding on the right amount of data to collect is undeniably a challenge. Too many queries can slow databases, but the human element is equally important — too much data is simply overwhelming. The most useful monitoring tools provide options to configure alerting so that the right alerts get to the right people via the right channels — ideally along with actionable recommendations.”
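The “right alerts to the right people via the right channels” idea is usually implemented as a routing table keyed on severity. A minimal sketch, with hypothetical channel names:

```python
# Hypothetical routing table: severity level -> notification channel
ROUTES = {
    "critical": "pager",   # wakes someone up
    "warning": "chat",     # visible, but not urgent
    "info": "log",         # kept for later analysis only
}

def route_alert(severity, message):
    """Pick a channel so that only actionable alerts interrupt people."""
    channel = ROUTES.get(severity, "log")  # unknown severity: never drop, just log
    return channel, message

channel, msg = route_alert("critical", "disk_used_pct > 90 on web-01")
```

The design choice worth noting is the default: an unrecognized severity degrades to logging rather than being discarded, so misconfigured alerts still leave a trail.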

Yet there is no perfect guideline for striking the ideal balance between collecting enough data and overwhelming the infrastructure. The trick with continuous monitoring is to learn with each iteration: if after the first sprint you realize you overwhelmed the infrastructure, or did not collect enough data, adjust accordingly.