Monitoring the Mellanox Switch

eG Enterprise provides a dedicated Mellanox Switch monitoring model that periodically checks the data traffic to and from each network interface of the switch, the temperature and voltage of each module of the switch, resource utilization, and more, so that abnormalities can be detected before any irreparable damage occurs.

Figure 1: The layer model of the Mellanox Switch

Each layer in Figure 1 is mapped to a variety of tests that connect to the SNMP MIB of the target Mellanox Switch and collect critical performance statistics. The metrics reported by these tests enable administrators to answer the following questions:

  • What is the current state and speed of each fan?

  • What is the current state of each power supply unit?

  • What is the current state of each voltage sensor in each power supply?

  • What is the status of the temperature sensor?

  • What is the current temperature of the CPU?

  • How well is the CPU performing? Is it overloaded?

  • What are the physical and logical states of the InfiniBand ports?

  • Is the current temperature within the permissible range?

  • What is the number of packets being sent and received?

  • Is the number of packets steady over a range of measurements?

  • Is the switch able to prioritize the received traffic as per QoS priority rules?

  • Is the switch able to prioritize the transmitted traffic as per QoS priority rules?

  • Are there too many pause packets being sent by the switch?

  • Does the switch have enough free storage space for operation?
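
Two of the questions above (how many packets are being sent and received, and whether that number is steady over a range of measurements) can be illustrated with a minimal sketch. It is not how eG Enterprise implements its tests. It assumes the packet counts arrive as cumulative 32-bit SNMP counters polled at a fixed interval, that the counter wraps at most once between polls, and that "steady" means a low coefficient of variation. The function names, the sample values, and the 0.2 threshold are all hypothetical.

```python
from statistics import mean, pstdev

# Assumption: the polled value is an SNMP Counter32, which wraps at 2^32.
COUNTER32_MODULUS = 2 ** 32


def counter_deltas(samples, modulus=COUNTER32_MODULUS):
    """Return the per-interval increase of a cumulative counter.

    The modulo handles a single wrap-around between consecutive polls.
    """
    return [(curr - prev) % modulus for prev, curr in zip(samples, samples[1:])]


def is_steady(deltas, max_cv=0.2):
    """Report whether the per-interval counts vary little around their mean.

    max_cv is a hypothetical threshold on the coefficient of variation
    (population standard deviation divided by the mean).
    """
    if not deltas:
        return True
    avg = mean(deltas)
    if avg == 0:
        return True  # no traffic in any interval counts as steady
    return pstdev(deltas) / avg <= max_cv


def within_range(value, low, high):
    """Report whether a reading (for example a temperature) lies in [low, high]."""
    return low <= value <= high


if __name__ == "__main__":
    # Hypothetical cumulative packet counts from four consecutive polls.
    polled = [1_000, 1_950, 3_010, 4_000]
    deltas = counter_deltas(polled)
    print("packets per interval:", deltas)
    print("steady:", is_steady(deltas))
```

Working with per-interval deltas rather than raw counter values keeps the check independent of how long the switch has been running, and the same threshold-style comparison in `within_range` applies to questions such as whether the current temperature is within the permissible range.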