Redundant Cluster Management by eG Innovations
The Need for Fail-proof Availability
In many domains, IT infrastructures have been transformed from support structures into mainstream channels that drive day-to-day business. While accurate and instantaneous notification of problems is important, it is even more critical that the monitoring systems performing this diagnosis and notification are available 24x7, so that no problem goes undetected or unreported. The manager redundancy option for the eG management console ensures “fail-proof” availability of the eG management suite.
The eG Redundant Manager Cluster
The redundancy option for the eG suite ensures that there is no single point of failure when monitoring is deployed for a mission-critical infrastructure. In this approach, a redundant cluster is created by grouping multiple eG management consoles. One of the managers functions as the “primary” manager, to which operators can connect to view performance statistics and to perform administrative actions. Any administrative changes made on the primary manager are propagated in real time to the “secondary” managers.
Figure: eG agents in a cluster reporting to a secondary management console when the primary manager is not available
The eG agents can report to either the primary or the secondary managers, with the assignment of agents to managers being controlled through the web console. The agents are capable of dynamically discovering the redundant cluster configuration, thereby allowing for new managers to be brought in without needing to perform elaborate reconfiguration of the agents. When one of the managers in a cluster is down, the agents automatically start reporting to one of the other managers in the cluster. The agents are continuously informed about the state of the managers in the cluster, so when a manager comes back up, the agents reset their configurations so they start reporting to the managers to which they were assigned.
At any point, an agent reports to only one of the managers in the cluster. The managers in the cluster are responsible for replicating performance data between them. This architecture ensures that there is no additional overhead on the agents for supporting the redundant configuration of managers.
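The agent-side behavior described above can be illustrated with a minimal sketch. This is not eG's implementation: the function name `choose_manager`, the manager names, and the rule of falling back to the first reachable manager in cluster order are all assumptions made for illustration. The source only states that an agent reports to its assigned manager when it is up, and to one of the other managers otherwise.

```python
from typing import Iterable, Optional, Sequence


def choose_manager(
    assigned: str,
    cluster: Sequence[str],
    reachable: Iterable[str],
) -> Optional[str]:
    """Pick the single manager an agent should report to right now.

    Hypothetical rule: prefer the assigned manager; if it is down, fall
    back to the first reachable manager in cluster order; if none is
    reachable, return None.
    """
    up = set(reachable)
    if assigned in up:
        # Normal case, and also the "manager came back up" case:
        # the agent returns to the manager it was assigned to.
        return assigned
    for manager in cluster:
        if manager in up:
            return manager
    return None


# Hypothetical three-manager cluster.
cluster = ["primary", "secondary-1", "secondary-2"]
print(choose_manager("primary", cluster, {"primary", "secondary-1"}))
print(choose_manager("primary", cluster, {"secondary-1", "secondary-2"}))
```

Because the function is re-evaluated against the current set of reachable managers, an agent reverts to its assigned manager as soon as that manager is reported up again, and it never reports to more than one manager at a time.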
All performance statistics received by one of the managers in the cluster are replicated to the other managers in the cluster. This ensures that the secondary managers act as active standbys for the primary manager. If the primary manager fails, users can connect to any of the secondary managers to view real-time and historical performance statistics. Real-time alerting via email or SMS is also dynamically taken over by one of the other managers in the cluster, so operators are not affected by downtime of the primary manager.
When one of the managers in a cluster goes down, the other managers in the cluster store the metrics they receive from the agents locally and upload the data once the manager that was down comes up. Temporary storage of metrics by each of the managers ensures that there is no data loss in the cluster even though one or more of the managers in the cluster may have gone down for a while.
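The store-and-upload behavior described above follows a general store-and-forward pattern, sketched below. This is an illustrative model only, not eG's replication code: the `Manager` class, its attributes, and the in-memory queues are hypothetical, and a real manager would persist the backlog rather than hold it in memory.

```python
from collections import defaultdict, deque


class Manager:
    """Toy model of a manager that replicates metrics to its peers."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.store = []                    # metrics this manager holds
        self.pending = defaultdict(deque)  # peer name -> metrics not yet replicated

    def receive(self, metric, peers):
        """Accept a metric from an agent and replicate it to each peer."""
        self.store.append(metric)
        for peer in peers:
            # Always enqueue first so that ordering is preserved even
            # when a backlog already exists for this peer.
            self.pending[peer.name].append(metric)
            self.flush_to(peer)

    def flush_to(self, peer):
        """Upload the backlog to a peer; does nothing while the peer is down."""
        queue = self.pending[peer.name]
        while peer.up and queue:
            peer.store.append(queue.popleft())


primary = Manager("primary")
secondary = Manager("secondary")

secondary.up = False                       # the secondary goes down
primary.receive("cpu=10", [secondary])
primary.receive("cpu=20", [secondary])
print(len(primary.pending["secondary"]))   # backlog held locally

secondary.up = True                        # the secondary comes back
primary.flush_to(secondary)
print(secondary.store)
```

Queuing every metric before attempting delivery keeps the replay in arrival order, so the recovered manager sees the same sequence of measurements as the manager that stayed up.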
Data replication in the redundant cluster is handled at the application layer, so no special-purpose hardware or software is required to support the cluster. Each of the managers in a cluster functions autonomously, receiving metrics from the agents or from other managers in the cluster, and performing data analysis and correlation in real time. The managers in the cluster can be deployed on heterogeneous hardware and software configurations; for example, one manager can be deployed on Unix with an Oracle database backend, whereas another can be deployed on Windows with a SQL Server database.
The redundancy option for the eG management console is a must-have for mission-critical environments that require 100% uptime of the management solution to enable rapid problem detection and immediate resolution.
eG Redundant Manager Highlights
• Fail-proof redundancy provided by one or more standby managers
• No special hardware for the primary or standby management consoles
• No special database requirements for the management console
• No impact on the agents that handle the monitoring
• Scalability through load balancing across the cluster of managers
Benefits of the eG Redundant Manager Cluster
• High availability of the management system: A secondary manager serves as an active standby for the primary manager, thereby ensuring high availability of the eG management console; the switchover between managers happens in real time, so that no issue with the IT infrastructure goes unnoticed
• Data consistency across managers: Critical performance data is replicated across the managers, thereby ensuring that a consistent view of the target infrastructure’s performance is maintained across the cluster
• No impact on the agents: Redundancy is handled almost entirely by the management consoles; hence, the agents do not carry any extra burden
• Scalable load balancing across the cluster: Agents can be assigned to specific managers in the cluster, thereby allowing the load to be shared across the managers in the cluster