Monitoring the EMC CLARiiON

eG Enterprise offers a specialized EMC CLARiiON SAN monitoring model that monitors the core functions and components of the CLARiiON storage device, and proactively alerts administrators to issues in its overall performance and critical operations, so that problems can be fixed before any data loss occurs.

EMC CLARiiON Layer Model

Figure 1: The layer model of the EMC CLARiiON device

Each layer of this model is mapped to tests that monitor a critical component of the device, such as the disks, the LUNs, and the storage processors.

Once the pre-requisites discussed in Pre-requisites for Monitoring EMC CLARiiON are fulfilled, the eG agent will extract useful statistics from the storage system and report them to the eG manager.

Using these metrics, the following critical performance queries can be answered:

Are there any faulty components on the storage system? If so, which components are these?
Does the storage system support any invalid CRUs (Customer Replaceable Units)?
Are any RAID groups invalid?
Do all RAID groups have sufficient disk space? Is any RAID group experiencing a space crunch?
Is any RAID group being defragmented or expanded?
Is the defragmentation / expansion priority 'High' for any RAID group?
Is I/O load balanced across all LUNs?
Is any LUN being rebuilt?
Is any LUN being bound? If so, what is the status of the binding process?
Is there sufficient space in the disks?
Are the disks processing requests quickly?
Is any disk experiencing too many read/write retries?
Is load uniformly distributed across disks?
Is any disk in the disabled state?
Is any disk running out of space currently?
Are any disks experiencing too many hard read/write or soft read/write errors?
Are there any error-prone LUNs?
Are the read/write caches of Storage Processors A and B enabled?
Are the read/write caches of Storage Processors A and B correctly sized? Has any cache been allotted too few memory pages? If so, which cache is it (read/write), and which storage processor is it associated with?
Is any cache being under-utilized?
Is any storage port link down?
Is any storage processor in a faulty state now?
Is any storage processor overloaded?
Is any HBA port not plugged into the fibre channel?
Which HBA ports are not trusted?
Which HBA ports are not defined?
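
As an illustration of how a question such as "Is I/O load balanced across all LUNs?" might be answered from per-LUN metrics, the following is a minimal Python sketch. It is not part of eG Enterprise: the function names, the sample figures, and the 1.5x threshold are all hypothetical, and the input is assumed to be a simple mapping of LUN name to I/O operations per second that has already been collected.

```python
from statistics import mean, pstdev


def load_imbalance(iops_by_lun):
    """Return the coefficient of variation of per-LUN load.

    0.0 means perfectly even load; larger values mean more skew.
    (Hypothetical helper, not an eG Enterprise API.)
    """
    values = list(iops_by_lun.values())
    if not values:
        return 0.0
    avg = mean(values)
    if avg == 0:
        return 0.0
    # Population standard deviation divided by the mean.
    return pstdev(values) / avg


def overloaded_luns(iops_by_lun, factor=1.5):
    """List LUNs whose load exceeds `factor` times the average load.

    The default factor of 1.5 is an arbitrary, illustrative threshold.
    """
    if not iops_by_lun:
        return []
    avg = mean(iops_by_lun.values())
    return sorted(lun for lun, iops in iops_by_lun.items() if iops > factor * avg)


# Made-up sample data: one LUN carries most of the load.
sample = {"LUN 0": 100.0, "LUN 1": 100.0, "LUN 2": 400.0}

if __name__ == "__main__":
    print(f"Imbalance: {load_imbalance(sample):.2f}")
    print("Overloaded:", overloaded_luns(sample))
```

A ratio-based measure is used here so that the same threshold can apply to lightly and heavily loaded arrays alike; a real deployment would tune the threshold to its own workload.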