Automatic Baselining

What is Automatic Baselining?

Automatic baselining in IT monitoring refers to the process of dynamically establishing performance baselines for various metrics and parameters within an IT infrastructure. It involves using machine learning algorithms and statistical analysis of metrics to continuously analyze and adapt to the changing behavior of monitored systems. AIOps monitoring and observability systems implement automatic baselining amongst other automation technologies.

How does Automatic Baselining work?

When implementing automatic baselining, monitoring tools collect data on performance metrics such as CPU usage, memory utilization, network traffic, response times, and other relevant indicators. By analyzing this data over time, a monitoring system can establish a baseline or normal range of values for each metric.

The baseline is created by identifying patterns and trends within the collected data. The system takes into account factors such as daily, weekly, or seasonal variations, as well as periodic events like backups or maintenance activities. By considering these patterns, the system can differentiate between normal operational behavior and abnormal conditions.
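
The idea of a baseline that respects daily and weekly cycles can be sketched as follows. This is a minimal illustration, not how any particular product works: it assumes a bucket per weekday and hour, a range of mean plus or minus `k` standard deviations, and synthetic CPU data. The function name `build_baseline` and the parameter `k` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, stdev

def build_baseline(samples, k=3.0):
    """Return {(weekday, hour): (low, high)} from (timestamp, value) samples.

    Each bucket's range is mean +/- k standard deviations, so a Monday 10:00
    reading is only compared with earlier Monday 10:00 readings.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[(ts.weekday(), ts.hour)].append(value)

    baseline = {}
    for key, values in buckets.items():
        if len(values) < 2:
            continue  # not enough history to estimate the spread
        mu, sigma = mean(values), stdev(values)
        baseline[key] = (mu - k * sigma, mu + k * sigma)
    return baseline

# Synthetic history (an assumption for illustration): four weeks of hourly
# CPU readings, busier between 09:00 and 17:00 on weekdays.
start = datetime(2024, 1, 1)  # a Monday
history = []
for h in range(4 * 7 * 24):
    ts = start + timedelta(hours=h)
    busy = ts.weekday() < 5 and 9 <= ts.hour < 17
    level = 60.0 if busy else 10.0
    jitter = (h // 168) - 1.5  # small deterministic week-to-week variation
    history.append((ts, level + jitter))

baseline = build_baseline(history)
monday_10 = baseline[(0, 10)]  # expected range for Monday 10:00-10:59
sunday_03 = baseline[(6, 3)]   # expected range for Sunday 03:00-03:59
```

With this data the Monday morning range sits around 60% CPU and the Sunday night range around 10%, so the same reading can be normal in one slot and abnormal in the other.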

Once the baseline is established, the monitoring system can compare real-time data against the baseline to identify deviations or anomalies. If a metric rises above or falls below the expected range, that is, crosses a dynamic threshold, the system can trigger alerts or notifications to inform IT administrators or operators of potential issues or performance degradation.
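
A comparison of live readings against a learned range might look like the sketch below. The `BASELINE` table, its values, and the function `check_reading` are assumptions made up for illustration; a real monitoring tool would learn the ranges from history and route the result to its alerting pipeline.

```python
from datetime import datetime

# Hypothetical baseline: expected (low, high) CPU % per (weekday, hour) bucket.
BASELINE = {
    (0, 10): (50.0, 70.0),  # Monday 10:00-10:59
    (6, 3): (5.0, 15.0),    # Sunday 03:00-03:59
}

def check_reading(ts, value, baseline=BASELINE):
    """Classify one reading as 'normal', 'high', 'low', or 'unknown'."""
    bounds = baseline.get((ts.weekday(), ts.hour))
    if bounds is None:
        return "unknown"  # no baseline learned yet for this time slot
    low, high = bounds
    if value > high:
        return "high"
    if value < low:
        return "low"
    return "normal"

readings = [
    (datetime(2024, 1, 8, 10, 15), 62.0),  # Monday morning
    (datetime(2024, 1, 8, 10, 45), 91.0),  # Monday morning
    (datetime(2024, 1, 14, 3, 5), 62.0),   # Sunday night
]
results = [check_reading(ts, value) for ts, value in readings]
# results == ["normal", "high", "high"]
```

Note that 62% CPU is classified as normal on Monday morning but as high on Sunday night, which is the difference between a dynamic threshold and a single fixed one.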

Automatic baselining is an essential component for setting dynamic thresholds for alerting systems. Dynamic thresholds can be configured for alerting in modern, cloud-native systems such as Microsoft Azure Monitor and AWS CloudWatch, as well as in other observability solutions like eG Enterprise.

What are the advantages of Automatic Baselining in IT monitoring?

Automatic baselining enhances monitoring accuracy, enables proactive issue and anomaly detection, and contributes to better system performance and reliability.

  • Adaptability: Automatic baselining can adjust to changes in system behavior, accounting for fluctuations in metrics based on usage patterns and environmental factors. For example, application servers hosting different types of applications can have vastly different resource usage patterns, and each can be given its own baseline.
  • Time Efficiency: By automatically establishing baselines, the monitoring system saves time that would otherwise be spent manually configuring and adjusting thresholds for each metric.
  • Scalability: In modern dynamic IT systems leveraging technologies such as Kubernetes, containers, auto-scaling, and microservices, resources are deployed automatically and at scale. Auto-baselining ensures that resources are monitored as they are deployed, without the need for manual intervention.
  • Proactive Issue Detection: Detecting deviations from established performance baselines allows IT teams to identify potential problems or performance bottlenecks before they significantly impact the system or end-user experience.
  • Reduced False Positives: Automatic baselining can help reduce false positive alerts by considering normal variations in system behavior, thus improving the accuracy of issue detection.
  • Performance Optimization: Baseline analysis provides insights into the expected behavior of monitored systems, enabling IT teams to optimize resources, fine-tune configurations, and improve overall system performance.

What are the disadvantages of Automatic Baselining?

The dynamic thresholds calculated by auto-baselining technologies can have some problems, including:

  • Dynamic thresholding can become confused when normal cyclic patterns are not followed, e.g., when a public holiday on a weekday means only a small number of staff log on.
  • Auto-baselined monitoring tools deployed in a broken or poorly performing IT environment can learn that state as normal and may even start to send alerts when the environment improves.
  • Dynamic systems are also inclined to view things that get broken for a while as the new normal. If a storage array slowly gets overloaded and unresponsive, some auto-baselining monitoring systems will register the overloaded state as the new normal.
  • Systems deployed in test or pre-production may have little realistic load. For example, where VMs are not accessed by real users running applications, auto-baselining may benchmark their normal usage as 3% CPU.
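
The "new normal" problem can be reproduced with a small simulation. The sketch below assumes a deliberately simple detector, a rolling mean plus `k` standard deviations over the last `window` samples, and synthetic latency values; the function `rolling_alerts` is hypothetical and real products use more elaborate models.

```python
from collections import deque
from statistics import mean, stdev

def rolling_alerts(values, window=20, k=3.0):
    """Return indices where a value exceeds mean + k * stdev of the previous `window` values."""
    recent = deque(maxlen=window)
    alerts = []
    for i, value in enumerate(values):
        if len(recent) == window:
            mu, sigma = mean(recent), stdev(recent)
            if value > mu + k * sigma:
                alerts.append(i)
        recent.append(value)
    return alerts

# Storage latency (ms) that creeps up by 1 ms per sample, from 10 ms to 209 ms.
slow_creep = [10.0 + i for i in range(200)]
# Same start and end values, but the degradation happens in a single step.
sudden_jump = [10.0] * 100 + [209.0] * 100

creep_alerts = rolling_alerts(slow_creep)  # empty: the baseline drifts with the data
jump_alerts = rolling_alerts(sudden_jump)  # first alert at index 100
```

Both series end at roughly twenty times the starting latency, yet only the sudden jump is flagged. The gradual degradation is absorbed into the rolling baseline sample by sample.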

Many of the inherent limitations of automated baselines can be overcome when the dynamic thresholds and ranges they determine are combined with static thresholds defined using human and a priori knowledge.
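
One way to combine the two kinds of threshold is sketched below. The function `should_alert` and the limits `hard_max` and `ignore_below` are illustrative assumptions, chosen here as CPU percentages; the right static values depend on the system and on operator knowledge.

```python
def should_alert(value, dyn_low, dyn_high, hard_max=90.0, ignore_below=20.0):
    """Combine a learned range (dyn_low, dyn_high) with two hand-set limits (CPU %)."""
    if value >= hard_max:
        # Static ceiling: alert even if the baseline has learned a bad state as normal.
        return True
    if value > dyn_high:
        # Above the learned range, but only worth an alert above the static floor.
        return value >= ignore_below
    return value < dyn_low

# Idle test VM with a learned range of 2-4% CPU: a rise to 9% is not worth an alert.
test_vm = should_alert(9.0, 2.0, 4.0)
# Overloaded system whose learned range is 85-99%: 95% still alerts via the ceiling.
overloaded = should_alert(95.0, 85.0, 99.0)
# Ordinary server with a learned range of 40-60%: 75% is a genuine deviation.
deviation = should_alert(75.0, 40.0, 60.0)
# The same server at 50% is within its learned range.
within_range = should_alert(50.0, 40.0, 60.0)
```

The static ceiling addresses the case where a degraded state has been learned as normal, and the static floor addresses lightly loaded test systems whose learned range is too narrow to be meaningful.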