The Need for Fail-proof Availability
In many domains, IT infrastructures have evolved from support structures into mainstream channels that drive day-to-day business. While accurate and instantaneous notification of problems is important, it is even more critical that the monitoring systems that perform such diagnosis and notification are available 24x7, so that no problem goes undetected or unreported. The manager redundancy option for the eG management console ensures “fail-proof” availability of the eG management suite.
Figure: eG agents in a cluster reporting to a secondary management console when the primary manager is not available
The eG agents can report to either the primary or the secondary managers, with the assignment of agents to managers being controlled through the web console. The agents are capable of dynamically discovering the redundant cluster configuration, thereby allowing for new managers to be brought in without needing to perform elaborate reconfiguration of the agents. When one of the managers in a cluster is down, the agents automatically start reporting to one of the other managers in the cluster. The agents are continuously informed about the state of the managers in the cluster, so when a manager comes back up, the agents reset their configurations so they start reporting to the managers to which they were assigned.
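The failover and fail-back behavior described above can be illustrated with a minimal Python sketch. This is not eG's actual implementation: the `Manager` class, the `is_up` flag, and the `pick_manager` function are hypothetical names introduced only to show the selection logic, and the choice of the first reachable peer as the fallback is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Manager:
    """Hypothetical stand-in for a manager in the redundant cluster."""
    name: str
    is_up: bool = True


def pick_manager(assigned: Manager, cluster: List[Manager]) -> Optional[Manager]:
    """Return the manager an agent should report to right now."""
    # Prefer the manager the agent was assigned to through the web console.
    if assigned.is_up:
        return assigned
    # Otherwise fall back to the first other manager that is up (assumed policy).
    for manager in cluster:
        if manager is not assigned and manager.is_up:
            return manager
    # No manager in the cluster is reachable.
    return None


primary = Manager("primary")
secondary = Manager("secondary")
cluster = [primary, secondary]

primary.is_up = False
during_outage = pick_manager(primary, cluster)   # agent fails over

primary.is_up = True
after_recovery = pick_manager(primary, cluster)  # agent returns to its assigned manager
```

Because the function is re-evaluated each time the agent learns of a state change, the agent reverts to its assigned manager as soon as that manager is reported up again, without any reconfiguration.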
At any point, an agent reports to only one of the managers in the cluster. The managers in the cluster are responsible for replicating performance data between them. This architecture ensures that there is no additional overhead on the agents for supporting the redundant configuration of managers.
All the performance statistics received by one of the managers in the cluster are replicated to the other managers in the cluster. This ensures that the secondary managers act as active standbys for the primary manager. In the event of any failure of the primary manager, users can connect to any of the secondary managers to view real-time and historical performance statistics. Real-time alerting via email or SMS is also dynamically handled by one of the managers in the cluster, so that downtime of the primary manager is not noticeable to operators.
When one of the managers in a cluster goes down, the other managers in the cluster store the metrics they receive from the agents locally and upload the data once the manager that was down comes up. Temporary storage of metrics by each of the managers ensures that there is no data loss in the cluster even though one or more of the managers in the cluster may have gone down for a while.
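The store-and-upload behavior described above can be sketched as a simple per-peer queue. The following Python example is illustrative only and assumes a two-manager cluster; `ClusterManager`, `receive_from_agent`, and `flush_to` are hypothetical names, and in-memory lists stand in for each manager's database.

```python
from collections import deque


class ClusterManager:
    """Hypothetical manager that replicates metrics to its peers."""

    def __init__(self, name):
        self.name = name
        self.is_up = True
        self.store = []      # metrics held by this manager (stand-in for its database)
        self.pending = {}    # peer name -> queue of metrics awaiting upload
        self.peers = []

    def receive_from_agent(self, metric):
        """Record a metric locally and replicate it to every peer."""
        self.store.append(metric)
        for peer in self.peers:
            if peer.is_up:
                peer.store.append(metric)
            else:
                # Peer is down: keep the metric locally until the peer returns.
                self.pending.setdefault(peer.name, deque()).append(metric)

    def flush_to(self, peer):
        """Upload, in arrival order, everything queued while the peer was down."""
        queue = self.pending.get(peer.name, deque())
        while queue:
            peer.store.append(queue.popleft())


a = ClusterManager("A")
b = ClusterManager("B")
a.peers = [b]
b.peers = [a]

b.is_up = False
a.receive_from_agent("cpu=10")
a.receive_from_agent("cpu=12")

b.is_up = True
a.flush_to(b)
```

After the flush, manager B holds the same metrics as manager A in the same order, which is the property that prevents data loss during an outage.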
Data replication in the redundant cluster is handled at the application layer, so no special-purpose hardware or software is required to support the cluster. Each of the managers in a cluster functions autonomously, receiving metrics from the agents or from other managers in the cluster, and performing data analysis and correlation in real time. The managers in the cluster can be deployed on heterogeneous hardware and software configurations; for example, one manager can be deployed on Unix with an Oracle database backend, while another is deployed on Windows with a Microsoft SQL Server database.
The redundancy option for the eG management console is a must-have for mission-critical environments that require 100% uptime of the management solution to enable rapid problem detection and immediate resolution.