Monitoring AIX LPARs on IBM pSeries Servers

The "inside" and "outside" view statistics that the agent reports to the eG manager are presented in the eG monitoring console using the IBM pSeries layer model depicted in Figure 1 below.

Figure 1: The layer model of the IBM pSeries server

Each layer in Figure 1 above reports a wide variety of metrics that help administrators answer the following performance questions:

  • Is the pSeries server available over the network? If so, how quickly is it responding to requests?
  • How are the LPARs on the server utilizing the physical CPU resources? Is any LPAR utilizing the physical processors excessively? If so, which one is it?
  • How many dedicated processors are used by the LPARs?
  • How many shared processors are used by the LPARs?
  • How many dedicated and shared partitions have been configured on the server? What are the names and IDs of the partitions?
  • Does the system firmware (i.e., hypervisor) have adequate memory to support LPAR operations?
  • Are the LPARs correctly sized in terms of memory? Are there any over-sized or under-sized LPARs?
  • Is load balanced across all the physical adapters supported by the Virtual I/O server? Is any adapter experiencing excessive activity? If so, which one is it? Which physical disk in the volume group is responsible for this activity?
  • Is any volume group currently inactive?
  • Are there any stale physical volumes and physical partitions?
  • Do the storage pools on the Virtual I/O server have sufficient space? Which pool is currently running out of space?
  • Is any LPAR currently powered off?
  • How many LPARs are currently not running? Which ones are they?
  • Were any LPARs migrated from or to the server recently? If so, which ones are they?
  • Which LPAR is utilizing the virtual CPU and memory resources excessively? Where does this LPAR spend most of its entitled CPU resources - doing user-level processing, kernel-level processing, being idle, waiting, or making hypervisor calls?
  • Which LPAR is utilizing the allocated CPU, memory, and disk resources excessively? Which process executing on this LPAR is causing the resource drain?
  • Has any LPAR been down for too long?
  • Are too many TCP connections currently being established with any LPAR?
  • Is any LPAR dropping too many TCP connections?
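
To make the CPU-related questions above more concrete, the following is a minimal sketch of how a per-LPAR CPU breakdown could be used to find the busiest LPAR and the state in which it spends most of its entitled CPU. It is not part of the eG product: the `LparCpuSample` class, its field names, the sample values, and the choice to count user, kernel, and hypervisor time as "busy" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LparCpuSample:
    """Hypothetical per-LPAR CPU breakdown, as percentages of entitled capacity."""
    name: str
    user_pct: float
    kernel_pct: float
    idle_pct: float
    wait_pct: float
    hypervisor_pct: float  # illustrative field; real tools may report this differently

    def busy_pct(self) -> float:
        # Assumption: everything except idle and wait counts as consumed CPU.
        return self.user_pct + self.kernel_pct + self.hypervisor_pct

    def dominant_state(self) -> str:
        # Return the name of the state with the largest share.
        states = {
            "user": self.user_pct,
            "kernel": self.kernel_pct,
            "idle": self.idle_pct,
            "wait": self.wait_pct,
            "hypervisor": self.hypervisor_pct,
        }
        return max(states, key=states.get)


def busiest_lpar(samples: List[LparCpuSample]) -> LparCpuSample:
    """Pick the LPAR with the highest consumed-CPU percentage."""
    return max(samples, key=lambda s: s.busy_pct())


# Made-up sample values, purely for demonstration.
samples = [
    LparCpuSample("lpar_a", 62.0, 21.0, 10.0, 5.0, 2.0),
    LparCpuSample("lpar_b", 15.0, 5.0, 70.0, 8.0, 2.0),
]

top = busiest_lpar(samples)
print(f"{top.name}: busy={top.busy_pct():.1f}%, mostly in '{top.dominant_state()}'")
```

In practice, the monitoring console surfaces this information directly; the sketch only shows the kind of comparison that underlies questions such as "which LPAR is utilizing CPU excessively, and where is it spending that CPU?".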