Introduction

The need to monitor applications and software is unquestionable, but monitoring the hardware is equally important. A malfunctioning hardware component can cause server downtime, adversely impacting the performance of a critical business service. Detecting and fixing a hardware problem in time can increase service uptime and enhance customer satisfaction. Conversely, if a hardware failure is not identified and addressed in time, it could cause irreparable damage to the hardware device itself, bring down critical IT services, cause severe data loss, and sharply increase maintenance costs.

One of the biggest challenges in managing hardware is heterogeneity. IT infrastructures typically comprise equipment from multiple manufacturers, and each manufacturer provides its own solution for monitoring and managing its hardware. For example, Sun Microsystems provides the Sun Management Center for managing Sun hardware, IBM offers the IBM Director, Compaq/HP provides Compaq/HP Insight Manager for managing its servers, and Dell provides Dell OpenManage for its servers. In a multi-vendor environment, IT administrators require a single integrated console from which they can monitor all the heterogeneous hardware components that they are responsible for. Furthermore, administrators need to correlate the performance of the hardware with the user view of the IT services that use the hardware, so that a problem can be attributed either to the hardware or to the software.

eG Enterprise offers integrated monitoring of multi-vendor hardware from a central console. eG agents for Sun Solaris and AIX use native operating system commands and hooks to monitor the status of the hardware on these servers. For other operating systems (Windows, Linux, and HP-UX), the eG agents can obtain hardware status information from IBM Director agents, Compaq/HP Insight Agents, and Dell OpenManage agents. The eG agent interfaces with the IBM, Compaq/HP, and Dell solutions using SNMP: periodically, the eG agent polls specific MIB variables from the IBM, Compaq/HP, and Dell agents to track the status of the server hardware. While agent-based monitoring is required for Sun Solaris and AIX hardware, IBM, Compaq/HP, and Dell servers are managed using SNMP, so hardware monitoring for these servers can also be done in an agentless manner (i.e., without installing eG agents on the servers being managed).

Prior to eG Enterprise v6, the eG agents could not collect hardware status information whenever the target server was down or unavailable. From v6, the eG agent is configured to communicate with the remote server management processor/management card of the corresponding server and retrieve the necessary hardware status information. If the server to be monitored is an IBM server, the eG agent communicates with the Integrated Management Module (IMM) and collects the required metrics. Likewise, the eG agent communicates with HP/Dell servers through the Integrated Lights Out (ILO) management processor, and with Solaris servers through the Integrated Lights Out Manager (ILOM).
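The polling approach described above can be illustrated with a minimal Python sketch. It is not the eG agent's actual implementation. The OIDs are hypothetical placeholders, the numeric status encoding (1 = other, 2 = ok, 3 = degraded, 4 = failed) is an assumption modeled on the convention commonly used in vendor health MIBs, and the `snmp_get` and `mgmt_get` callables stand in for whatever SNMP client and management-processor client is actually in use.

```python
from typing import Callable, Dict, Optional, Tuple

# Assumed status encoding; check the vendor MIB for the real values.
CONDITION_LABELS: Dict[int, str] = {1: "other", 2: "ok", 3: "degraded", 4: "failed"}

# Hypothetical placeholder OIDs, one per hardware component to poll.
HARDWARE_OIDS: Dict[str, str] = {
    "thermal": "1.3.6.1.4.1.99999.1.1",
    "fans": "1.3.6.1.4.1.99999.1.2",
    "power_supply": "1.3.6.1.4.1.99999.1.3",
}

# A fetcher takes (host, key) and returns an integer status, or None on failure.
Fetcher = Callable[[str, str], Optional[int]]


def poll_hardware(
    host: str,
    snmp_get: Fetcher,
    mgmt_get: Optional[Fetcher] = None,
) -> Dict[str, Tuple[str, str]]:
    """Poll each component once and return {component: (status, source)}.

    SNMP is tried first. If it yields nothing (for example, because the
    server is down) and a management-processor fetcher is supplied, that
    fetcher is tried with the component name as its key.
    """
    results: Dict[str, Tuple[str, str]] = {}
    for component, oid in HARDWARE_OIDS.items():
        value = snmp_get(host, oid)
        source = "snmp"
        if value is None and mgmt_get is not None:
            value = mgmt_get(host, component)
            source = "management-processor"
        if value is None:
            results[component] = ("unknown", "none")
        else:
            results[component] = (CONDITION_LABELS.get(value, "unknown"), source)
    return results


if __name__ == "__main__":
    # Stub fetchers standing in for real SNMP / management-processor clients.
    def stub_snmp(host: str, oid: str) -> Optional[int]:
        return {"1.3.6.1.4.1.99999.1.1": 2, "1.3.6.1.4.1.99999.1.2": 3}.get(oid)

    def stub_mgmt(host: str, component: str) -> Optional[int]:
        return {"power_supply": 4}.get(component)

    for name, (status, source) in poll_hardware("server01", stub_snmp, stub_mgmt).items():
        print(f"{name}: {status} (via {source})")
```

Passing the fetchers in as parameters keeps the sketch independent of any particular SNMP library. A real deployment would replace the stubs with calls to an actual client and run `poll_hardware` on a schedule.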

Some of the key questions that administrators can answer using the hardware monitoring capabilities of eG Enterprise are:

  • Is the server hardware working well? 
  • What is the status of the cooling units/fans of a server?
  • What is the current temperature of a server? Is it within norms?
  • Are all power supplies of a server available? If not, which ones have failed?
  • What is the current voltage of the power supplies on the server?
  • How many memory devices are available on a server and are they all working well?
  • How many memory errors have been detected? Is there a faulty memory module on the system?
  • Is a server's drive array subsystem working properly?
  • Are the different physical and logical drives on a server working well? If not, what is their current condition?

Note:

Hardware monitoring requires only a basic agent license.