Extending and Integrating the Monitoring System with Automation and Scripting

Admins like scripts a lot, and there are many different types of scripts: custom scripts, community scripts, and so on. Would you rather have your monitoring vendor provide built-in functionality or a whole bunch of community scripts? What’s the best role for scripts in IT monitoring?

eG Enterprise Features in Focus Series

One of the hidden gems within eG Enterprise is the ability to perform remote actions and automated tasks using built-in functionality. In conversations with customers and community peers, I often get asked why we at eG Innovations don’t offer functionality for adding custom scripts and a community database of shared scripts.

These questions got me thinking about monitoring and the use of scripts to take actions:

  • Why would you ask for the ability to run custom monitoring/data collection scripts instead of asking a vendor to support specific functionality?
  • When and where is it useful to execute tasks/actions and should they be custom scripts or out-of-the-box?
  • What are the pros & cons of using custom (community) scripts versus vendor-built and maintained functionality?

The focus for this article is to highlight:

  • Who typically uses/interacts with the monitoring console and why they would need the data and ability to perform automated tasks and manual actions?
  • What level of availability and performance data is displayed and what tasks/actions might be required?
  • When would scripts, tasks and actions need to be executed or triggered?
  • Where is it better/worse to have out-of-the-box vs. custom scripting?
  • How is it possible to have the best of both worlds?

Answering the Question: “Why is my Application or Desktop Slow?”

At eG Innovations, our main goal is to answer what is probably the hardest question in the complex world of IT applications and infrastructures: “Why is my application or desktop slow?” Answering this question is not an easy task for many reasons:

  • IT infrastructures today support a variety of applications and technologies. Citrix/VDI, web applications, SAP and other packaged applications, industry-specific applications (e.g., Cerner, Epic in healthcare, Oracle financials, eCommerce applications), and SaaS applications like Microsoft 365 are just some of the many applications that have to be managed.
  • Some are deployed on-premises, others may be entirely cloud-based, and some may be used in a hybrid cloud model.
  • Many of these applications involve multiple tiers of hardware and software working together. Each and every tier has to be monitored in order to identify possible bottlenecks.
  • Finally, most IT teams do not have experts in each of these technologies. Even if they do have experts, these experts do not have the time to spend routinely tracking every application, every layer and every tier.

It is no wonder, therefore, that a big part of IT budgets today is spent on troubleshooting and on delivering the best possible experience to end users.

Domain Expertise is a Must for Effective Monitoring

Answering the question “Why is my application or desktop slow?” requires insight into, and expertise in, every layer and every tier of the application topology chain. To be effective, a monitoring tool needs to embed the expertise to monitor each tier and layer of the infrastructure. Monitoring just standard CPU, memory and disk resources or the up/down status of each tier is not sufficient.

At eG Innovations, the monitors for 200+ infrastructure and application components we support embed domain expertise relating to each of these different technologies. A specialized model / profile for each technology/tier is built into our product suite.

The domain expertise we build into the monitoring solution focuses on answering several questions:

  • What are all the important characteristics of a component that could impact availability and performance?
  • How can we collect the relevant performance metrics from each component type (which API, log, command) and yet not add significant overhead from the monitoring?
  • How frequently should we collect these metrics?
  • What default thresholds should trigger alarms/alerts for these metrics, and how can machine learning be used to auto-tune these thresholds?

eG Enterprise comes with built-in domain expertise

Specialized model for a Microsoft application

Metrics by themselves only highlight a potential problem. Often, additional diagnosis is necessary. For example, if a web application is seeing “internal server errors,” which URLs were affected? If a Citrix user is taking up excessive resources, which applications are causing this? Diagnosis may also be needed even when a metric is behaving normally, and such diagnosis may provide insights into potential issues. The diagnosis to collect is also part of the model/profile for each technology tier.

Models/profiles for each technology tier are developed and evolved based on various inputs:

  • Vendor-recommended best practices on what metrics are important for their technology (Cisco, Microsoft, VMware, etc. routinely publish these and update them).
  • Our internal consultants who work with customers and understand customer needs.
  • Feedback from thousands of eG Enterprise users and partners worldwide.
  • Consultations with external industry experts (CTPs, MVPs, etc.).

The metrics and diagnosis are stored in our database and can be used for live analysis as well as for historical and predictive reporting. This is particularly important because in many cases, you may be called upon to troubleshoot an issue that occurred when you were not looking at the console – e.g., a server suddenly rebooted, or a Citrix user got disconnected in the middle of a session. You will then have to look into what happened in the past. Running a script after the event has occurred may not give you the insights you need to make sure the problem does not occur again in the future.

Domain-specific requirements also have to be considered when designing dashboards and reports. The dashboards and reports that a web application administrator needs are likely to be very different from what a Citrix administrator needs. Domain expertise is factored in at every stage of our product development process.

Dashboard tailored for a Citrix administrator’s needs

As you can see, a lot of thought goes into designing the models/profiles for each component we support. Whenever a vendor releases a new version of a component, we consider:

  • Has there been a significant architecture change that warrants a complete revamp of the monitoring model?
  • Are there new capabilities introduced in the new version that we need to add monitoring for?
  • Are there new APIs we need to consider to enhance our models?

Where Does Scripting Fit In?

What is Scripting?

A scripting or script language is a programming language for a special run-time environment that automates the execution of tasks; the tasks could alternatively be executed one-by-one by a human operator. Scripting languages are often interpreted, rather than compiled.

Limitations of Custom Scripting

Scripts are often used to extend the functionality of a monitoring system. IT operations teams and administrators are typically better equipped to write scripts than full-fledged programs.

The extensive built-in domain expertise included in eG Enterprise means that we have out-of-the-box functionality to support most of the monitoring tasks related to specific infrastructure and application components. As a result, 99%+ of customers’ demands are fulfilled with the out-of-the-box functionality offered by eG Enterprise.

The difference between built-in functionality and custom scripts is that built-in functionality is fully supported by the vendor, and when there is an issue, the vendor supplies the fix. This is not possible for custom scripts, where you are reliant on the author or the community. This is why eG actively engages with customers to learn about new requirements they have and to support these requirements in upcoming versions as a built-in capability.

What About Community Scripts?

Many vendors have community script databases – users or industry experts contribute scripts to these databases. The advantage with this is you are not dependent on the vendor for development and not limited to their release cycles. At the same time, bear in mind some of the limitations of community scripts:

  • These scripts are maintained by the people who developed them. If the developer changes employment or has a new project that they need to dedicate time to, development and support stop.
  • Sometimes you may have multiple scripts for the same task. Which one is the right one?
  • Each developer creates scripts for their specific use case. Your use case and environment could be different, and the script may not be directly usable by you. And even if you do use it, remember you are using it at your own risk!
  • Another key concern is security. How do you ensure that these uncontrolled scripts are not doing any harm to your production systems?

Scripting vs. Built-In Functionality: The Tradeoffs

As I already mentioned, a great majority of the scripts available on vendor and community websites are already covered by the built-in functionality of eG Enterprise. The advantages of built-in functionality are the additional support provided by the vendor if something doesn’t work as expected, validation of the functionality across different environments and use cases, and ongoing enhancements to the functionality that become necessary whenever the monitored infrastructure and applications are upgraded.

I should also add that scripting is often great for demonstrating capability and integration but does not often provide the full range of capability provided by a tight integration. Let us take a few examples:

  • Detailed diagnosis: If scripts are not tightly integrated into the monitoring system, they can be configured to run and the results of the script can be emailed or displayed on the screen, but are not processed by the monitoring system. In such a case, if you have to diagnose a problem that happened when you were not around, you’d have to search your email to see what the script output was when it ran. Not the best way to be monitoring your infrastructure!
  • Alert generation: In order to send alerts to Microsoft SCOM, you could have a script that sends SNMP traps from the monitoring system. However, the integration built into eG Enterprise goes well beyond this. Applications and infrastructure elements discovered by eG Enterprise are auto-displayed in the SCOM console. Topology views in eG Enterprise are displayed in SCOM automatically, and the real-time status of every layer and every tier monitored by eG Enterprise is reflected in the SCOM console. And when required, a one-click in-context drilldown is possible from Microsoft SCOM to eG Enterprise.
  • Trouble ticketing integration: Another example is integration with ITSM tools like ServiceNow (SNOW), PagerDuty, AutoTask and others. It is easy to create a script that sends an alert from the monitoring tool to the ServiceNow console. But how do you make sure that the same alert is not reopened when the priority of the alert changes? Or how do you make sure the problem is closed in the ticketing tool when the alert is closed in the monitoring tool? To do this, you need the monitoring tool to be intelligent – to map an alarm ID to a ticket number and to use the ticket number for updates/closures. A script that executes once when an alarm is generated does not provide this capability.
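
To make the trouble ticketing point concrete, here is a minimal Python sketch of the alarm-to-ticket mapping described above. It is an illustration only, not how eG Enterprise or ServiceNow implement it: `AlarmTicketBridge`, `FakeTicketingClient`, and the ticket number format are hypothetical names invented for this example, and the in-memory client stands in for a real ITSM API.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    number: str
    priority: str
    state: str = "open"


@dataclass
class FakeTicketingClient:
    """Hypothetical in-memory stand-in for an ITSM tool's API."""
    tickets: dict = field(default_factory=dict)

    def create(self, summary: str, priority: str) -> str:
        number = f"TKT{len(self.tickets) + 1:04d}"
        self.tickets[number] = Ticket(number, priority)
        return number

    def update_priority(self, number: str, priority: str) -> None:
        self.tickets[number].priority = priority

    def close(self, number: str) -> None:
        self.tickets[number].state = "closed"


class AlarmTicketBridge:
    """Remembers which ticket belongs to which alarm (assumed design)."""

    def __init__(self, client):
        self.client = client
        self.ticket_for_alarm = {}  # alarm ID -> ticket number

    def on_alarm(self, alarm_id: str, summary: str, priority: str) -> str:
        number = self.ticket_for_alarm.get(alarm_id)
        if number is None:
            # First time this alarm is seen: open exactly one ticket.
            number = self.client.create(summary, priority)
            self.ticket_for_alarm[alarm_id] = number
        else:
            # Known alarm with a changed priority: update, do not reopen.
            self.client.update_priority(number, priority)
        return number

    def on_clear(self, alarm_id: str):
        # Alarm closed in the monitoring tool: close the matching ticket.
        number = self.ticket_for_alarm.pop(alarm_id, None)
        if number is not None:
            self.client.close(number)
        return number


client = FakeTicketingClient()
bridge = AlarmTicketBridge(client)
bridge.on_alarm("A-17", "High CPU on web01", "minor")
bridge.on_alarm("A-17", "High CPU on web01", "critical")  # same ticket, new priority
bridge.on_clear("A-17")                                   # ticket is closed
```

A script that fires once per alarm has nowhere to keep the `ticket_for_alarm` mapping between runs, which is why the update and closure steps need state held by the monitoring tool (or by a persistent store the script can reach).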

How eG Enterprise Supports Scripting

Scripts have an important role to play when you have requirements that are unique to your organization or your use case, or when your needs are ahead of the vendor’s built-in capabilities. eG Enterprise supports scripting in many areas:

1. Build your own automated monitoring capability

You may have a custom application and may need to monitor its functioning. With eG Enterprise’s extensibility module, you can add new models and metrics to the system in several ways that do not require full-fledged programming. You can write scripts that invoke and parse OS/application commands, you can use built-in log parsing capability to detect error conditions, and you can write SQL queries that extract key metrics from your application database. SNMP, REST APIs, and WMI/perfmon are other means by which you can extend the built-in monitoring capabilities of eG Enterprise.
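
As an illustration of a script that invokes and parses an OS command, the following Python sketch turns the output of the POSIX `df -P` command into per-mount usage percentages. It is a generic example under stated assumptions: the function names and the returned dictionary are hypothetical, and the format in which eG Enterprise expects a custom script to report its metrics is not covered here.

```python
import subprocess


def parse_df(output: str) -> dict:
    """Map each mount point to its used-space percentage."""
    usage = {}
    for line in output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue
        # df -P columns: filesystem, blocks, used, available, capacity, mount.
        # Limitation: a filesystem name containing spaces would shift columns.
        try:
            percent = float(fields[4].rstrip("%"))
        except ValueError:
            continue  # capacity not reported as a number on this row
        usage[" ".join(fields[5:])] = percent
    return usage


def collect_disk_usage() -> dict:
    """Run df -P on the local (POSIX) system and parse its output."""
    result = subprocess.run(
        ["df", "-P"], capture_output=True, text=True, check=True
    )
    return parse_df(result.stdout)


SAMPLE = """Filesystem     1024-blocks    Used Available Capacity Mounted on
/dev/sda1         41152736 9876543  29160000      26% /
tmpfs              8123456       0   8123456       0% /dev/shm
"""

print(parse_df(SAMPLE))
```

Keeping the parsing separate from the command invocation makes the script testable against captured output, which matters when the same script has to run across different OS versions.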

Extensible Application Management

2. Quick actions / command execution on remote systems

From time to time, you may need to remotely connect to a system being monitored and run some commands/actions (e.g., reconfigure a system). eG Enterprise offers “Remote Actions” for this purpose. Remote Actions are contextual and focus on specific actions which an administrator often uses when monitoring a specific component. There are many remote actions built into eG Enterprise, but you may find certain other commands or actions to be required as well. You can add new commands or add your own scripts to our built-in repository of remote commands.

Remote Actions you can perform on a Citrix Virtual Apps server or a VMware Horizon RDS server

3. Auto-correction scripts

Auto-correction scripts are used to automatically mitigate (potential) issues. By using them carefully, you can increase service uptime and lower mean time to repair. Towards this end, eG Enterprise embeds an optional auto-correction capability that enables eG agents to automatically correct problems in the environment as soon as they occur. With this capability, as and when an abnormal situation is detected, an eG agent can initiate corrective actions automatically to resolve the problem.

The corrective actions to be performed may vary from one environment to another. The actions to be used may also be different based on the preferences of the administrator involved. eG Enterprise allows administrators to add custom corrective scripts for each metric. These scripts are triggered whenever the corresponding anomalous condition is detected.
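
The idea of attaching a corrective script to a metric can be sketched in Python as follows. This is a simplified model, not the eG agent's actual mechanism: `AutoCorrector` is a hypothetical class, a fixed threshold stands in for whatever anomaly detection the monitoring system applies, and the corrective action is a plain function rather than a real script.

```python
class AutoCorrector:
    """Runs a registered corrective action when a metric breaches its threshold."""

    def __init__(self):
        self._rules = {}   # metric name -> (threshold, action)
        self.history = []  # (metric, value) pairs that triggered a correction

    def register(self, metric, threshold, action):
        """Attach one corrective action to one metric."""
        self._rules[metric] = (threshold, action)

    def evaluate(self, metric, value):
        """Return True if a corrective action was triggered for this reading."""
        rule = self._rules.get(metric)
        if rule is None:
            return False  # no corrective script configured for this metric
        threshold, action = rule
        if value <= threshold:
            return False  # reading is within the normal range
        action(value)
        self.history.append((metric, value))
        return True


actions_taken = []


def clear_print_queue(value):
    # Placeholder for a real corrective script (e.g. restarting a service).
    actions_taken.append(f"cleared queue at depth {value}")


corrector = AutoCorrector()
corrector.register("print_queue_depth", 100, clear_print_queue)
corrector.evaluate("print_queue_depth", 40)   # normal, nothing happens
corrector.evaluate("print_queue_depth", 250)  # abnormal, action runs
print(actions_taken)
```

In practice a corrective action should also be idempotent and rate-limited, so that a metric hovering around its threshold does not trigger the same fix repeatedly. The sketch omits this for brevity.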

Conclusion 

To summarize, eG Enterprise offers great flexibility to enhance an IT administrator’s experience. It offers an extensive library of component-specific remote tasks available out of the box, automation of corrective actions, and integration with third-party products. All of this is an integral part of the solution and is therefore fully supported by eG Innovations, which results in an enterprise solution.

While custom scripting is available out of the box, eG Innovations makes sure that scripts are never required to work around feature gaps and are needed only for niche edge use cases. We actively engage with customers to look into specific scripting needs and, where feasible, integrate the required functionality as an out-of-the-box feature, which is then fully supported.