Monitoring the Cisco UCS Manager

eG Enterprise provides a 100% web-based Cisco UCS Manager monitoring model that periodically monitors the Cisco UCS Manager, discovers the chassis, I/O modules, blades, and fabric interconnects managed by the UCS Manager, and determines the current status of each of these components.

Figure 1: Layer model of the Cisco UCS Manager

Each layer of the layer model is mapped to a series of tests that instantly capture current and potential abnormalities in the state and functioning of the core components managed by the Cisco UCS Manager, and alert administrators to them. Using the metrics collected by these tests, administrators can find quick and accurate answers to the following questions:

  • Are all I/O modules (i.e., fabric extenders) operating normally? Is any I/O module currently in a degraded, powered-off, or inoperable state? If so, which one?
  • Is any I/O module currently experiencing critical performance issues?
  • What are the power, voltage, and thermal states of the I/O modules?
  • Is any I/O module missing?
  • Are the temperatures of all I/O modules normal? Is any I/O module experiencing abnormal temperatures?
  • Is any fan inoperable? If so, in which chassis is it located?
  • Is any fan operating at abnormal speeds?
  • Is any fan experiencing performance failures?
  • Have non-recoverable problems occurred in the power, thermal, or voltage states of any fan?
  • How is the overall health of the chassis? Is any chassis in an inoperable state currently?
  • Is any chassis license-insufficient?
  • Are the power/thermal/voltage states of all chassis normal?
  • Is any chassis receiving/transmitting more power than it can handle?
  • Which fan module is currently in an inoperable state?
  • Which fan module is behaving abnormally?
  • Are all backplane ports healthy?
  • Have any operational/performance issues been detected in any of the PSUs in the chassis?
  • Which PSU is receiving an input voltage over 210 volts and emitting an output voltage over 12 volts?
  • Are the fabric interconnects operating normally?
  • Do the fabric interconnects have enough CPU and memory resources at their disposal? Is any fabric interconnect experiencing CPU or memory contention?
  • Are the PSUs of the fabric interconnects operating normally?
  • Is the power/voltage input and output of the PSUs within acceptable limits?
  • Have any uplink Ethernet ports failed?
  • Which uplink Ethernet port is seeing very high traffic?
  • Are the fans of all fabric interconnects operating normally?
  • Is any uplink Fibre Channel port in an abnormal state?
  • Are there any disabled uplink Fibre Channel ports?
  • Is any Fibre Channel port seeing very high traffic?
  • Is any Fibre Channel port experiencing too many transmission errors?
  • Are the blade servers in a chassis healthy?
  • Is any blade server unavailable?
  • Are the power states and slot states of the blade servers normal?
  • Are the blade servers utilizing memory optimally? Is any blade server over-utilizing memory?
  • Is the motherboard of any blade server consuming power/current excessively?
  • Is the temperature of the motherboard normal? If not, which side of the motherboard is experiencing abnormal temperatures: the front or the rear?
  • Is the temperature of any memory array of any blade server very high?
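To illustrate how collected metrics might be used to answer a query such as the PSU voltage question in the list, the following Python sketch filters a set of PSU readings against a 210 V input limit and a 12 V output limit. This is a minimal sketch under stated assumptions: the PsuReading class, its field names, the flag_psus function, and the sample PSU names are all hypothetical and are not part of eG Enterprise or any Cisco UCS Manager API.

```python
from dataclasses import dataclass


@dataclass
class PsuReading:
    """One hypothetical PSU sample; field names are illustrative, not eG's schema."""
    name: str
    input_voltage: float   # volts
    output_voltage: float  # volts


def flag_psus(readings, input_limit=210.0, output_limit=12.0):
    """Return names of PSUs whose input AND output voltages both exceed the limits."""
    return [
        r.name
        for r in readings
        if r.input_voltage > input_limit and r.output_voltage > output_limit
    ]


if __name__ == "__main__":
    # Hypothetical sample readings, for illustration only.
    samples = [
        PsuReading("chassis-1/psu-1", 208.0, 12.0),
        PsuReading("chassis-1/psu-2", 215.5, 12.3),
        PsuReading("chassis-2/psu-1", 230.0, 11.9),
    ]
    print(flag_psus(samples))
```

The limits are exposed as keyword parameters so that the thresholds stated in the question can be adjusted without changing the filtering logic.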