Monitoring EMC VNX Unified Storage

eG Enterprise offers a specialized VNX Unified Storage monitoring model that tracks the key performance indicators of EMC VNX (such as the disks, file systems, volumes, DAEs, and LUNs) and proactively alerts administrators to potential performance bottlenecks, so that they can resolve issues well before end users complain.

Figure 1: The layer model of EMC VNX Unified Storage

Each layer of Figure 1 above is mapped to a variety of tests, each of which reports a wealth of performance information related to the VNX unified storage. Using these metrics, administrators can find quick and accurate answers to the following performance queries:

  • Is the VNX storage system available over the network?
  • How responsive is VNX to requests over the network?
  • Are all the hardware components of the VNX storage system up and running? If not, which hardware component is unavailable - is it the fan, the power supply unit, or the LCC?
  • Is the VNX storage system using network bandwidth optimally? If not, which NIC on VNX is consuming bandwidth excessively?
  • Is any disk too busy? If so, which one is it?
  • Which disk is too slow in processing I/O requests? What type of I/O requests does it process very slowly - read or write requests?
  • Has any disk failed?
  • Is any disk consuming too much bandwidth? If so, which one is it?
  • Which disk is running out of disk space?
  • Are the read and write storage processor (SP) caches used optimally? Which storage processor's cache may require right-sizing, and which cache is it - read or write?
  • Which SP port is down currently?
  • Is the SFP (small form-factor pluggable module) of any SP port faulted?
  • Is any SP over-utilized?
  • Is any SP idle?
  • How are the data movers using their caches? Which cache is used least effectively - the directory name lookup cache, the open file cache, or the kernel buffer cache?
  • Which data mover has too many blocked threads?
  • Which data mover is experiencing CPU and/or RAM contention?
  • Is the statmon service on any data mover not running currently?
  • Which data mover is processing CIFS read/write requests very slowly?
  • Which data mover is processing NFS read/write requests very slowly?
  • Which file system on which data mover is using too much storage space?
  • Are too many I/O requests in queue for any LUN? If so, which LUN is it?
  • Which LUN is experiencing too many errors? What type of errors are these - hard or soft?
  • Is any LUN making poor use of its read/write cache?
  • Which disk volume is running out of space?
  • Which disk volume has too many pending I/O requests?
  • Which meta volume is experiencing a processing bottleneck?