

eG Server Hardware Monitor




The need for monitoring applications and software is unquestionable, but hardware monitoring is equally important. A malfunctioning hardware component can cause server downtime and thereby degrade the performance of a critical business service. Detecting and fixing a hardware problem in time can increase service uptime and enhance customer satisfaction. Conversely, a hardware failure that is not identified and addressed in time could cause irreparable damage to the hardware, bring down critical IT services, cause severe data loss, and drive up maintenance costs.

Challenges in Monitoring Hardware

Fig: Hardware monitors supported by the eG agent
One of the biggest challenges in managing hardware is heterogeneity. IT infrastructures typically comprise equipment from multiple manufacturers, and each manufacturer provides its own solution for monitoring and managing its hardware. For example, Sun Microsystems provides the Sun Management Center for managing Sun hardware, Compaq/HP provides Compaq/HP Insight Manager for managing its servers, and Dell provides Dell OpenManage for its servers. In a multi-vendor environment, IT administrators require a single integrated console from which they can monitor all the heterogeneous hardware components that they are responsible for. Furthermore, administrators need to correlate the performance of the hardware with the user view of the IT services that run on it, so that a problem can be attributed either to the hardware or to the software.

Hardware Monitoring by eG Enterprise

eG Enterprise offers integrated monitoring of multi-vendor hardware from a central console. Monitoring of IBM, HP, Sun Microsystems, and Dell hardware is supported, irrespective of the operating system used (Windows, Linux, HP-UX, Solaris, or AIX). Servers running Windows, Linux, or HP-UX are monitored using SNMP; eG agents integrate with IBM Director agents, HP Insight agents, and Dell OpenManage agents for this purpose. For Solaris and AIX, native OS hooks and commands are used to collect hardware status information.

Fig: The eG monitoring console displaying the status of a memory partition
While agent-based monitoring is required for Sun Solaris hardware, Compaq/HP, Dell, and IBM servers are managed using SNMP. Hardware monitoring for these servers can therefore also be done in an agentless manner (i.e., without installing eG agents on the servers being managed). The hardware metrics collected by eG agents include the status of processors, memory banks, fans, temperature, voltage, and power supply terminals.
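To illustrate how per-component status readings of this kind might be rolled up into a single server state, here is a minimal sketch. It is not the eG agent's implementation: the three-level `Status` scale, the `server_status` helper, and the component names are all hypothetical, chosen only to show a "worst status wins" roll-up.

```python
from enum import IntEnum


class Status(IntEnum):
    # Hypothetical severity scale; higher values are worse.
    OK = 0
    DEGRADED = 1
    FAILED = 2


def server_status(components):
    """Return (worst status, sorted names of components at that status).

    `components` maps a component name to its Status. A server with no
    readings, or with only healthy components, reports OK and no culprits.
    """
    if not components:
        return Status.OK, []
    worst = max(components.values())
    if worst == Status.OK:
        return Status.OK, []
    culprits = sorted(name for name, s in components.items() if s == worst)
    return worst, culprits


# Hypothetical readings for one server.
readings = {
    "processor": Status.OK,
    "memory_bank_2": Status.DEGRADED,
    "fan_1": Status.FAILED,
    "power_supply": Status.OK,
}

status, culprits = server_status(readings)
print(status.name, culprits)  # FAILED ['fan_1']
```

Reporting the offending components alongside the overall state keeps the roll-up useful for the "which unit is affected?" question, not just "is the server healthy?".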

All of these metrics are integrated into specialized application models for over 120 applications, servers, and devices that eG Enterprise includes. A patented root-cause diagnosis engine continuously analyzes the metrics to identify if there are any bottlenecks in the system, and where they are: the network, the operating system, the application, or the hardware.
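The idea of locating a bottleneck in one of several layers can be pictured with a very simple heuristic: treat the lowest layer that reports a problem as the probable root cause, since problems there tend to surface in the layers above. The sketch below is only an illustration of that heuristic and is not the patented eG Enterprise engine; the layer ordering, the `probable_root_cause` helper, and the alert format are assumptions.

```python
# Hypothetical layer order, most fundamental first. The four layer names
# come from the text; their ordering here is an assumption.
LAYERS = ["hardware", "network", "operating system", "application"]


def probable_root_cause(alerts):
    """Return the lowest layer with at least one active alert, else None.

    `alerts` maps a layer name to a list of alert descriptions; layers that
    are missing or have an empty list are treated as healthy.
    """
    for layer in LAYERS:
        if alerts.get(layer):
            return layer
    return None


# Hypothetical alerts: a failed fan and a slow application on the same server.
alerts = {
    "hardware": ["fan_1 failed"],
    "operating system": [],
    "application": ["slow response time"],
}

print(probable_root_cause(alerts))  # hardware
```

In this example the slow application is treated as a symptom, and the hardware alert as the place to start investigating.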

Using these metrics, administrators can proactively detect hardware failures and rectify them, and therefore considerably reduce related service downtimes. eG Monitor for Hardware, with its ability to accurately report the status of multi-vendor hardware running heterogeneous operating systems, is an ideal solution for hardware monitoring.


What the eG Server Hardware Monitor Reveals

  • Is the server hardware working properly?
  • Which hardware unit on the server is currently experiencing issues? Is it the chassis, a power supply unit, any voltage / amperage probe, any of the cooling devices, a temperature probe, a memory device, or a chassis intrusion detection device?
  • Are all processors available for use?
  • Is any memory partition currently unavailable? Which one is it?
  • Is the correctable memory error log feature enabled on the server?
  • Have any memory errors occurred recently?
  • Are there any unavailable disk drives? Which ones are they?
  • Have any physical or logical drives failed?
  • Are the cooling units/fans working properly?
  • Is any current sensor currently inactive?
  • What is the current voltage of the power supply? Is it normal or abnormal?
  • What is the current status of the power supply units on the server?
  • Have too many system faults occurred?
  • Is the temperature of any hardware unit (processor, memory unit, etc.) unusually high?
  • Is the automatic server recovery feature functioning normally?
  • What about the drive array sub-system? Is its condition normal?
  • Is the array controller operating normally? What about the array controller's board? Were any abnormalities detected in its operations?
  • Are all Light Emitting Diodes (LEDs) active?

* Actual capabilities may vary depending on the operating system and server hardware in use.

Benefits of the eG Server Hardware Monitor

  • An integrated multi-vendor monitoring solution: Use eG Enterprise to monitor the status and performance of multi-vendor, multi-platform hardware components at any time, from anywhere, from a central web console. Administrators no longer need a separate console for HP hardware, another for Dell, another for Sun, and so on.
  • Leverages investment in existing hardware and monitoring agents: eG Enterprise integrates with HP Insight, Dell OpenManage, and IBM Director, so you can leverage your existing deployment of hardware agents.
  • Flexible monitoring options: eG Enterprise allows you the flexibility to choose between the agent-based and agentless approaches to monitoring hardware.
  • Proactive planning and enhanced server uptime: eG Enterprise enables you to collect, consolidate, and present a wealth of performance results pertaining to the monitored hardware. This information is critical for historical analysis, trending, and proactive planning, so that server downtimes can be minimized.
  • Proactive alerting: Be notified instantly of hardware and software issues, in many cases well before the actual failure occurs. Administrators can thus initiate corrective action early, ensuring minimal or no impact on service performance.
  • Automatic correlation and accurate root-cause diagnosis: With eG Enterprise, you can look across the hardware and software layers of a server, automatically correlate performance across these layers, and accurately identify the root cause of problems.