What is ‘Enterprise Class’?
‘Enterprise class’ is a buzzword that refers to applications that are designed to be robust, flexible, and scalable for deployment by a large organization. There are no firm standards for what makes an application or platform enterprise class, but enterprise-class applications are generally:
- Open and compatible with existing tools
- Customizable for the needs of specific departments
- Powerful enough to scale up along with the needs of the business using it
- Able to fit into existing and future architectures
- Secure from inside and outside threats and data leaks
Why Does it Matter?
When any product is developed, assumptions are made. These assumptions dictate how widely the tool can be deployed and what constraints apply during use. For instance, the product may assume that it will be the only tool used by an organization, so no effort goes into integrating it with the other systems already in use.
If your organization is deploying a product, you will have to consider several questions:
- Does the tool give you several options for deployment?
- How does its licensing work? Does it use an all-or-nothing approach – i.e., if you choose to use it, will you have to monitor everything in your infrastructure by using the same tool?
- What changes to your security policy, network architecture, and/or usage model are required to deploy the tool?
- Does the same tool serve the needs of different stakeholders in your organization?
The above are just a few of the questions you will need to answer when choosing a tool for your deployment.
In this blog, we will consider the top 10 characteristics that an enterprise-class IT application and infrastructure monitoring tool should have:
1. Role-Based Access Control
2. Integration with Existing Systems and Tools
3. Easy Deployment in line with Corporate Security and Firewalling Policies
4. Simple Licensing
5. Being Extensible
6. Multi-Tenancy Support
7. Support for a Wide Range of Platforms
8. Operability with Limited Visibility
9. Vertical and Horizontal Scalability
10. High Availability Configuration Support
#1 Role-Based Access Control
In any mid or large organization, there are different stakeholders.
- The main users of the monitoring solution are, of course, the IT operations folks, who are responsible for handling and fixing any problems that occur.
- Then, there is the helpdesk staff, who are only focused on triaging a problem quickly and determining which IT team/expert needs to handle a problem.
- IT domain experts, such as database admins and application admins, are interested in the performance of their respective domains only.
- IT architects take a longer-term view of performance – they are interested in capacity planning and in ways to optimize the infrastructure to get more out of current investments.
- Finally, executives are interested in higher-level performance views – they want to know about any IT issues that can affect the business of the organization.
An enterprise-class monitoring solution must make it possible to easily configure Role-Based Access Control (RBAC) to the different personas defined above. Dashboards and reports available to each persona must match their areas of interest.
Such roles may already be defined in Active Directory or an identity management system, so the monitoring solution must integrate with these systems.
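To illustrate, the persona-to-dashboard mapping above can be sketched as a simple access-control table. The role and dashboard names below are hypothetical; in a real deployment the roles would come from Active Directory or an identity provider:

```python
# Minimal RBAC sketch: map each persona to the dashboards it may view.
# Role and dashboard names are illustrative assumptions.
ROLE_DASHBOARDS = {
    "it_operations": {"alerts", "topology", "diagnostics"},
    "helpdesk":      {"alerts", "triage"},
    "dba":           {"database_performance"},
    "architect":     {"capacity_planning", "optimization"},
    "executive":     {"business_service_health"},
}

def dashboards_for(roles):
    """Return the union of dashboards permitted for a user's roles."""
    allowed = set()
    for role in roles:
        # Unknown roles contribute nothing rather than raising an error.
        allowed |= ROLE_DASHBOARDS.get(role, set())
    return allowed
```

A user holding several roles simply sees the union of the corresponding views, which keeps the mapping easy to audit.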
#2 Integration with Existing Systems and Tools
Most organizations are not starting from scratch. They have monitoring tools in place and processes that they are following to detect issues, troubleshoot problems, and resolve them. Ticketing tools, automation solutions, dashboarding tools, and report formats may already be in place.
Any new monitoring solution being introduced to these organizations must integrate seamlessly with these existing tools and processes. For example, an organization may use ServiceNow for incident management. Not only should the monitoring tool create incidents automatically in ServiceNow, but it should also update an incident whenever it detects that the problem’s severity has changed, and close the incident automatically once it detects that the problem has been resolved.
In a similar vein, the monitoring solution should be deployable using the same software mechanism that the organization uses to auto-deploy software to desktops and servers. The monitoring system must also expose a published REST API so that integration with external dashboard generation and reporting tools is supported. This will allow the organization to generate reports in a common format across different tools.
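As a concrete sketch, the ServiceNow integration described above largely boils down to translating monitoring events into payloads for ServiceNow's REST Table API (e.g., POST to /api/now/table/incident). The field names and state codes below are simplified assumptions for illustration, not a complete ServiceNow schema:

```python
# Sketch: translate monitoring alarms into incident payloads for a
# ServiceNow-style REST API. Field names/state codes are assumptions.

def incident_payload(alarm):
    """Build the create-incident payload for a monitoring alarm."""
    severity_to_urgency = {"critical": "1", "major": "2", "minor": "3"}
    return {
        "short_description": alarm["summary"],
        "urgency": severity_to_urgency.get(alarm["severity"], "3"),
        # Illustrative: tie the incident to the affected component.
        "cmdb_ci": alarm["component"],
    }

def close_payload():
    """Payload used to auto-close the incident once resolved."""
    return {"state": "7", "close_notes": "Auto-resolved by monitoring"}
```

The same payload-building logic supports updates: when the tool detects a severity change, it re-sends the mapped urgency for the existing incident rather than opening a new one.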
#3 Easy Deployment in line with Corporate Security and Firewalling Policies
Security is always top of mind for most CIOs. Monitoring tools for enterprises need to be simple to deploy and must align with existing security and firewalling policies. A few requirements in this regard include:
- Monitoring agents deployed on production servers can be a point of vulnerability if they listen on any TCP port. Hence, a push architecture, in which agents push metrics to the management server, is preferred.
- If the monitoring system is deployed in a SaaS model, agents within the corporate network have to communicate with management servers that are external to the network. The monitoring tool’s deployment architecture should not require the monitored systems to be moved out of the DMZ for them to be monitored.
- Ideally, communication between agents and the management server must use open, secure protocols so that corporate firewalls and policies need not be modified to enable the monitoring tool to function.
- Many organizations require external communications to be routed through a secure proxy server. Proxy support is a must for the monitoring system. For enhanced security, authenticated access to the proxy server is preferred.
- Password complexity rules and policies may already be specified in an organization’s Active Directory. The monitoring tool should seamlessly integrate with such systems.
- In organizations with diverse infrastructures, different regions could use the same private IP address range. The monitoring system must be able to handle such situations – i.e., systems should not be identified and monitored using their IP address.
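Several of the requirements above – outbound-only communication over a standard secure protocol, routed through an authenticated proxy – can be sketched with a push-style agent like the one below. The server and proxy addresses are hypothetical:

```python
# Sketch of a push-style agent: the agent opens no listening port;
# it only makes outbound HTTPS POSTs, optionally via a corporate proxy.
# monitor.example.com and proxy.example.com are hypothetical hosts.
import json
import urllib.request

def build_push_request(metrics, server="https://monitor.example.com/ingest"):
    """Build an outbound HTTPS POST carrying one batch of metrics."""
    body = json.dumps(metrics).encode("utf-8")
    return urllib.request.Request(
        server,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def build_opener(proxy="http://agent:secret@proxy.example.com:3128"):
    """Route all outbound traffic through an authenticated proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)
```

Because the agent initiates every connection over standard HTTPS, the corporate firewall needs no inbound rules for it, and proxy authentication adds a second gate on the egress path.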
#4 Simple Licensing
To make it easier for adoption, a monitoring tool must have a simple licensing policy:
- Licensing by OS instance or physical server is the easiest option. Licensing based on server cores, CPUs, or sockets is a commonly used model, but it complicates deployment: the organization has to buy additional monitoring licenses every time it changes the configuration of its servers!
- The monitoring tool must not adopt an all-or-nothing licensing model. Many tools assume that they are used exclusively within the organization. This is often not the case in large enterprises. For instance, many organizations have large virtual infrastructures and may be using multiple toolsets. Only some of the VMs in the virtual infrastructure may support a given service, e.g., Citrix access. The monitoring system should not require the organization to procure licenses for all the VMs when the IT administrator is interested in monitoring only the Citrix VMs. It should be possible for administrators to pick and choose what they want to monitor and license only those components.
- Licensing should be based on the target infrastructure to be monitored, not by the number of users accessing the enterprise monitoring solution. This allows every stakeholder to access the monitoring solution without any direct impact on cost.
#5 Being Extensible
Enterprises often have unique requirements. They may have home-grown applications. There may be backup jobs that need monitoring and custom schedulers may be used. Out-of-the-box capabilities of most monitoring tools may not suffice for such organizations.
The monitoring tool must provide open APIs for organizations to add their own monitoring capabilities. These custom metrics must be integrated seamlessly into the monitoring framework and handled in the same way as out-of-the-box metrics (e.g., with support for multi-level alerting, auto-baselining, and so on). As most IT administrators are not programmers, it should be possible to extend the tool without having to write software programs for the integration. IT admins should be able to add new metrics by configuring search patterns in log files, by writing shell, Perl, or batch scripts, by pointing to SNMP OIDs of interest, etc.
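One common way to support script-based extension is for the tool to run an admin-supplied shell or batch script and parse simple name=value lines from its output. The sketch below assumes that output convention purely for illustration:

```python
# Sketch of script-based metric extension: run an admin-supplied
# command and parse "name=value" lines from its stdout into metrics.
# The name=value convention is an assumption for illustration.
import subprocess

def collect_custom_metrics(command):
    """Run a custom script and return its name=value output as metrics."""
    out = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=True
    ).stdout
    metrics = {}
    for line in out.splitlines():
        if "=" in line:
            name, _, value = line.partition("=")
            metrics[name.strip()] = float(value)
    return metrics
```

An administrator who can write a three-line shell script that echoes, say, the number of failed backup jobs can then have that number alerted on and baselined like any built-in metric.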
#6 Multi-Tenancy Support
Large enterprises have complex organizational structures. There are clear demarcations of roles and responsibilities across regions and across departments. When supporting multiple entities (e.g., departments/regions in an organization or customers/tenants in the case of a managed service provider), each entity may need control over how monitoring is done for its infrastructure. Settings such as the frequency of monitoring, thresholds for alerting, maintenance periods, etc., may be distinct for each entity. Furthermore, administrators of these entities may need access to add or remove applications and systems for monitoring.
Requiring one instance of the monitoring tool for each entity leads to wasteful usage of resources. The ideal monitoring solution should support multi-tenancy, so one instance of the monitoring tool is deployed, and different personalized views can be configured for each entity.
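One way such multi-tenancy can be modeled is with global defaults plus per-tenant overrides, so a single shared instance serves every entity. Tenant names and setting names below are illustrative:

```python
# Sketch of per-tenant settings in one shared monitoring instance:
# each tenant overrides only what differs from the global defaults.
# Tenant and setting names are illustrative assumptions.
GLOBAL_DEFAULTS = {
    "poll_interval_s": 60,
    "cpu_alert_pct": 90,
    "maintenance_window": None,
}

TENANT_OVERRIDES = {
    "finance_dept": {"cpu_alert_pct": 80},
    "emea_region": {"poll_interval_s": 30,
                    "maintenance_window": "Sun 02:00-04:00"},
}

def effective_settings(tenant):
    """Merge global defaults with the tenant's own overrides."""
    return {**GLOBAL_DEFAULTS, **TENANT_OVERRIDES.get(tenant, {})}
```

Because each tenant stores only its deltas, one deployment can serve dozens of departments or customers without duplicating the entire configuration per entity.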
#7 Support for a Wide Range of Platforms
While smaller organizations standardize on one or a few platforms (OS, virtual platforms, storage technologies, etc.), large organizations have a wide range of platforms deployed. Each department or region might make its own purchasing decisions and hence, different technologies exist in large organizations. Therefore, a monitoring platform for large organizations needs to have broad platform coverage, supporting a number of operating systems, virtualization platforms, cloud technologies, storage technologies, and so on.
#8 Operability with Limited Visibility
One of the biggest failings of some monitoring tools is that they require their agents to be deployed in every tier of the infrastructure. This may be possible in a small enterprise where one administrator or one administration team may be managing all the different tiers of an infrastructure. In large organizations, this is rarely the case. Each department or region may have its own IT administrator. Each tier of the infrastructure may have its own administration team – e.g., the database team manages databases, the network team manages the network tiers, and so on. Each team may use its own set of tools for administration and monitoring. It is impossible to get all the different IT teams in a large enterprise to use a common toolset.
Monitoring tools for large enterprises should be able to operate with limited visibility. The tool must be designed to provide its users with sufficient insights so that when performance issues arise, they are able to point out which domain a problem might lie in. For example, an application team may use a monitoring tool. One of their main concerns is whether network issues are impacting application performance. The network team may not provide the application team with access to router statistics for them to see what is happening in the network. In such a case, monitoring TCP retransmissions from the servers can provide the application team with insights into how the network is performing. When the network connectivity is poor, excessive TCP retransmissions will occur, thus reducing throughput and causing slow response. This is an example of how monitoring tools need to be designed keeping in mind the access restrictions that are common in large enterprises.
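The TCP-retransmission approach described above can be implemented entirely from the server side. On Linux, for example, the Tcp counters in /proc/net/snmp include OutSegs and RetransSegs, and their ratio indicates how often segments are being retransmitted:

```python
# Sketch: compute the TCP retransmission ratio from Linux's
# /proc/net/snmp counters (RetransSegs relative to OutSegs),
# giving a server-side view of network health without router access.

def tcp_retrans_ratio(proc_net_snmp_text):
    """Parse the Tcp: header/value lines and return RetransSegs / OutSegs."""
    tcp_lines = [line for line in proc_net_snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    # The first Tcp: line names the counters; the second holds the values.
    header = tcp_lines[0].split()[1:]
    values = tcp_lines[1].split()[1:]
    counters = dict(zip(header, (int(v) for v in values)))
    return counters["RetransSegs"] / counters["OutSegs"]
```

In practice the agent would read /proc/net/snmp periodically and alert when the ratio rises above a baseline; a sustained ratio of even a few percent usually correlates with noticeably slower application response.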
#9 Vertical and Horizontal Scalability
Large enterprises have hundreds of servers that may be supporting thousands of users. This scale of operation poses interesting challenges for monitoring tools. Every aspect of the monitoring tool must be tuned to handle large workloads. Historical data storage can run into terabytes, and efficient caching is needed to provide fast response. Agents that poll target devices must be scalable, so that a handful of pollers can be deployed instead of tens. The management system must be optimized to support monitoring, analysis, and reporting of the millions of metrics collected by agents in such environments. Visualization poses challenges at this scale too: dashboards must be designed with scalability in mind as well.
#10 High Availability Configuration Support
Large enterprises offer strict SLA guarantees to their users. Hence, high availability is another key requirement. High availability (HA) has to be considered for each component of the monitoring system – for the database, for the management server, and even for agents that may be polling hundreds of devices. Depending on the business need, there are different options to consider:
- Is HA required within a region or across regions?
- Can existing HA infrastructure (e.g., server clusters, database clusters) be leveraged?
- Can the HA solution be a cold standby configuration, or does it have to be a live standby that is automatically activated whenever a failure happens?
An ideal monitoring solution will provide a range of HA options depending on the answers to the questions above.
There are many monitoring tools to choose from in the market. While much of the focus is on a tool’s functionality, enterprise customers must also pay attention to its ability to address their unique needs.
In this blog post, we have spelt out the top 10 requirements that enterprises have from an application and infrastructure performance monitoring tool.