Memory – ESX Test

This test reports statistics pertaining to the machine memory of the VMware vSphere/ESXi server.

Target of the test : An ESX server host

Agent deploying the test : An internal/remote agent

Outputs of the test : One set of results for every ESX server monitored

Configurable parameters for the test
  1. Test period - How often should the test be executed
  2. Host - The host for which the test is to be configured.
  3. port - The port at which the specified host listens. By default, this is NULL.
  4. esx user and esx password - In order to enable the test to extract the desired metrics from a target ESX server, you need to configure the test with an ESX USER and ESX PASSWORD. The user credentials to be passed here depend upon the mechanism used by the eG agent for collecting performance statistics from the ESX server and its VMs. These monitoring methodologies and their corresponding configuration requirements have been discussed hereunder:

    • Monitoring using the web services interface of the ESX server: Starting with ESX server 3.0, a VMware ESX server offers a web service interface using which the eG agent collects metrics from the ESX server. The VMware VI SDK is used by the agent to implement the web services interface. To use this interface for monitoring, this test should be configured with an ESX USER who has “Read-only” privileges to the target ESX server. By default, the root user is authorized to execute the test. However, it is preferable that you create a new user on the target ESX host and assign the “Read-only” role to him/her. The steps for achieving this have been elaborately discussed in Increasing the Memory Settings of the eG Agent that Monitors ESX Servers section.

      ESX servers terminate user sessions based on timeout periods. The default timeout period is 30 mins. When you stop an agent, sessions currently in use by the agent will remain open for this timeout period until ESX times out the session. If the agent is restarted within the timeout period, it will open a new set of sessions. If you want the eG agent to close already existing sessions before it opens new sessions, then you would have to configure all the tests with the credentials of an ESX user with permissions to View and stop sessions (prior to vSphere/ESX server 4.1, this was called the View and Terminate Sessions privilege). To know how to grant this permission to an ESX user, refer to Creating a Special Role on an ESX Server and Assigning the Role to a New User to the Server section.

      Sometimes, the VMware VI SDK may cache the hardware status metrics it collects and provide the test with the cached results. This may cause the eG agent to receive obsolete hardware status information from the SDK. This is also why you may at times notice a mismatch between the hardware status reported by the eG agent and by the vSphere client. To ensure that the eG agent always reports the current hardware status, you should configure the eG agent to obtain the hardware metrics from the VMware VI SDK only after the SDK resets the cache to clear its contents, and then refreshes the cache so that the latest hardware status information is fetched into it. To enable the eG agent to make the reset and refresh SDK calls, the esx user and esx password parameters must be configured with the credentials of a vSphere user with the Change Settings privilege. To do this, create a special role on vSphere, assign the Change Settings privilege to that role, and then map the role to a new user on vSphere. The procedure for this is detailed in the Configuring the eG Agent to Collect Current Hardware Status Metrics section.

    • Monitoring using the vCenter in the target environment: By default, the eG agent connects to each ESX server and collects metrics from it. While this approach scales well, it requires additional configuration for each server being monitored. For example, separate user accounts may need to be created on each server for read-only access to VM details. While monitoring large virtualized installations however, the agents can be optionally configured to monitor ESX servers using the statistics already available with different vCenter installations in the environment.

    In this case therefore, the ESX USER and ESX PASSWORD that you specify should be that of an Administrator or Virtual Machine Administrator in vCenter. However, if, owing to security constraints, you prefer not to use the credentials of such users, then, you can create a special role on vCenter with ‘Read-only’ privileges.

    Refer to Assigning the ‘Read-Only’ Role to a Local/Domain User to vCenter to know how to create a user on vCenter.

    If the ESX server for which this test is being configured had been discovered via vCenter, then the eG manager automatically populates the esx user and esx password text boxes with the vCenter user credentials using which the ESX discovery was performed.

    Like ESX servers, vCenter servers too terminate user sessions based on timeout periods. The default timeout period is 30 minutes. When you stop an agent, sessions currently in use by the agent will remain open for this timeout period until vCenter times out the session. If the agent is restarted within the timeout period, it will open a new set of sessions. If you want the eG agent to close already existing sessions before it opens new sessions, then you would have to configure all the tests with the credentials of a vCenter user with permissions to View and stop sessions (prior to vCenter 4.1, this was called the View and Terminate Sessions permission). To know how to grant this permission to a vCenter user, refer to the Creating a Special Role on vCenter and Assigning the Role to a Local/Domain User section. When the eG agent is started/restarted, it first attempts to connect to the vCenter server and terminate all existing sessions for the user whose credentials have been provided for the tests.

    This is done to ensure that unnecessary sessions do not remain established in the vCenter server for the session timeout period.  Ideally, you should create a separate user account with the required credentials and use this for the test configurations. If you provide the credentials for an existing user for the test configuration, when the eG agent starts/restarts, it will close all existing sessions for this user (including sessions you may have opened using the Virtual Infrastructure client). Hence, in this case, you may notice that your VI client sessions are terminated when the eG agent starts/restarts.

    Sometimes, the VMware VI SDK may cache the hardware status metrics it collects and provide the test with the cached results. This may cause the eG agent to receive obsolete hardware status information from the SDK. This is also why you may at times notice a mismatch between the hardware status reported by the eG agent and by the vSphere client. To ensure that the eG agent always reports the current hardware status, you should configure the eG agent to obtain the hardware metrics from the VMware VI SDK only after the SDK resets the cache to clear its contents, and then refreshes the cache so that the latest hardware status information is fetched into it. To enable the eG agent to make the reset and refresh SDK calls, the esx user and esx password parameters must be configured with the credentials of a vCenter user with the Change Settings privilege. To do this, create a special role on vCenter, assign the Change Settings privilege to that role, and then map the role to a new user on vCenter. The procedure for this is detailed in the Configuring the eG Agent to Collect Current Hardware Status Metrics section.

  5. confirm password - Confirm the password by retyping it here.
  6. ssl - By default, the ESX server is SSL-enabled. Accordingly, the SSL flag is set to Yes by default. This indicates that the eG agent will communicate with the ESX server via HTTPS by default.

    Like the ESX server, vCenter is also SSL-enabled by default. If you have chosen to use vCenter for monitoring, then you have to set the SSL flag to Yes.

  7. webport - By default, in most virtualized environments, the vSphere/ESX server and vCenter listen on port 80 (if not SSL-enabled) or on port 443 (if SSL-enabled). This implies that while monitoring an SSL-enabled vSphere/ESX server directly, the eG agent, by default, connects to port 443 of the vSphere/ESX server to pull out metrics, and while monitoring a non-SSL-enabled server, the eG agent connects to port 80. Similarly, while monitoring a vSphere/ESX server via an SSL-enabled vCenter, the eG agent connects to port 443 of vCenter to pull out the metrics, and while monitoring via a non-SSL-enabled vCenter, the eG agent connects to port 80 of vCenter. 

    Accordingly, the webport parameter is set to 80 or 443 depending upon the status of the ssl flag.  In some environments however, the default ports 80 or 443 might not apply. In such a case, against the webport parameter, you can specify the exact port at which the vSphere/ESX server or vCenter in your environment listens so that the eG agent communicates with that port.

  8. VIRTUAL CENTER - If the eG manager had discovered the target ESX server by connecting to vCenter, then the IP address of the vCenter server used for discovering this ESX server would be automatically displayed against the VIRTUAL CENTER parameter; similarly, the esx user and esx password text boxes will be automatically populated with the vCenter user credentials, using which ESX discovery was performed.

    If this ESX server has not been discovered using vCenter, but you still want to monitor the ESX server via vCenter, then select the IP address of the vCenter host that you wish to use for monitoring the ESX server from the VIRTUAL CENTER list. By default, this list is populated with the IP addresses of all vCenter hosts that were added to the eG Enterprise system at the time of discovery. Upon selection, the esx user and esx password that were pre-configured for that vCenter server will be automatically displayed against the respective text boxes.

    If the IP address of the vCenter server of interest to you is not available in the list, then you can add the details of the vCenter server on-the-fly, by selecting the Other option from the VIRTUAL CENTER list. This will invoke the add vcenter server details page. Refer to the Adding the Details of a vCenter Server for Guest Discovery section to know how to add a vCenter server using this page. Once the vCenter server is added, its IP address, esx user, and esx password will be displayed against the corresponding text boxes.

    On the other hand, if you want the eG agent to behave in the default manner (i.e., communicate with each ESX server for monitoring it), then set the VIRTUAL CENTER parameter to ‘none’. In this case, the ESX USER and ESX PASSWORD parameters can be configured with the credentials of a user who has at least ‘Read-only’ privileges to the target ESX server.
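
The default port selection described for the webport parameter above can be sketched as follows. This is a minimal illustration only; the function name resolve_webport and its arguments are hypothetical and are not part of the eG agent.

```python
from typing import Optional


def resolve_webport(ssl: bool, webport: Optional[int] = None) -> int:
    """Pick the port to connect to (hypothetical helper, for illustration).

    An explicitly configured webport takes precedence; otherwise the
    default follows the ssl flag: 443 when SSL is enabled, 80 when not.
    """
    if webport is not None:
        return webport
    return 443 if ssl else 80
```

For instance, resolve_webport(ssl=True) yields 443, while resolve_webport(ssl=True, webport=8443) yields the custom port 8443.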

Measurements made by the test
Each measurement below lists, in order: Measurement, Description, Measurement Unit, Interpretation

Total physical memory:

Indicates the total amount of machine memory.

MB

 

Used physical memory:

Indicates the amount of physical memory that is in use.

MB

Ideally, the value of this measure should be low.

Free physical memory:

Indicates the amount of machine memory that is free.

MB

A high value is typically desired for this measure.

Percent physical memory free:

Indicates the percentage of machine memory that is available for use.

Percent

A very low value for this measure indicates a shortage of memory resources. If more machine memory is not made available soon, then this could significantly degrade the performance of the guest operating systems.
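
As an illustration, the percentage can be derived from the Free physical memory and Total physical memory measures. The sketch below assumes that the measure is simply free memory expressed as a share of total memory; the function name is hypothetical.

```python
def percent_physical_memory_free(free_mb: float, total_mb: float) -> float:
    """Return free machine memory as a percentage of total machine memory.

    Assumption: the measure is free / total * 100. Both inputs are in MB.
    """
    if total_mb <= 0:
        raise ValueError("total_mb must be greater than zero")
    return free_mb * 100.0 / total_mb
```

With 2048 MB free on a host with 8192 MB of machine memory, this works out to 25 percent.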

Memory state:

Describes the contention for memory.

Number

This measure takes one of the following values:

  • 0 - high (lot of memory available)
  • 1 - soft
  • 2 - hard
  • 3 - low (memory is overcommitted)

The higher the number, the more constrained the memory state.
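
If you process this measure in a script, the numeric values listed above can be translated into their state names with a simple lookup. The names MEMORY_STATES and describe_memory_state below are hypothetical and used only for illustration.

```python
# Numeric value -> state name, as listed for the Memory state measure.
MEMORY_STATES = {
    0: "high",  # lot of memory available
    1: "soft",
    2: "hard",
    3: "low",   # memory is overcommitted
}


def describe_memory_state(value: int) -> str:
    """Translate the numeric Memory state value into its state name."""
    try:
        return MEMORY_STATES[value]
    except KeyError:
        raise ValueError(f"unknown memory state: {value}") from None
```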

Memory granted:

Indicates the amount of memory that the VMkernel has allocated to all virtual machines running on the server.

MB

 

Memory unreserved:

Indicates the current amount of unreserved swap space.

MB

 

Shared memory:

Indicates the current amount of shared guest operating system memory.

MB

VMware ESX can share common memory pages across VMs. This includes pages from VMs running the same virtual machine OS and applications.

Memory shared common:

Indicates the amount of memory required for a single copy of shared pages in running VMs.

MB

 

Balloon memory:

Indicates the total amount of physical memory currently reclaimed by the ESX server using the vmmemctl modules.

MB

The vmmemctl driver that is installed on a virtual machine emulates an increase or decrease in memory pressure on the guest operating system; this way, it forces the guest OS to place memory pages into its local swap file. This driver differs from the VMware swap file method as it forces the guest operating system to determine what memory it wishes to page. Once the memory is paged locally on the guest operating system, the free physical pages of memory may be reallocated to other guests. When the ESX host sees that memory demand has been reduced, it will instruct vmmemctl to “deflate” the balloon and reduce pressure on the guest OS to page memory.

The maximum amount of memory that can be reclaimed from a guest may be configured by modifying the “sched.mem.maxmemctl” advanced option.

If the memory reclaimed from a guest (i.e., the value of this measure) is very low, it indicates excessive memory usage by the guest. Under such circumstances, you might want to consider allocating more memory to the guest.

Percent balloon memory:

Indicates the percentage of balloon memory.

Percent

Current swap used:

Indicates the total amount of swap space used.

MB

This counter reflects the total amount of VMkernel swap usage on the host. In almost any scenario, this counter should be at or close to zero as VMkernel memory swapping is used as a last resort. Significant or consistent memory swapping indicates that ESX host memory is severely overcommitted and that performance degradation is imminent or actively occurring.

Zero memory:

Indicates the amount of memory that is zeroed out.

MB

The “Memory Zero” amount will fluctuate as memory is over-allocated. ESX will zero out a VM’s memory so that it can be used by other VMs.

Memory reserved capacity:

Indicates the amount of memory currently utilized to satisfy minimum memory values set for all VMs.

MB

 

Active memory:

Indicates the amount of memory that is actively used.

MB

 

Kernel memory:

Indicates the amount of machine memory being used by the ESX Server VMkernel.

MB

 

Swap in rate:

Indicates the rate at which memory is swapped from disk into active memory.

Mbps

A high rate of swap-ins and swap-outs could be indicative of memory contention on the host.

Swap out rate:

Indicates the rate at which memory is swapped from active memory to disk.

Mbps

Memory overhead:

Indicates the total of all overhead metrics for powered-on virtual machines, plus the overhead of running vSphere services on the host.

MB

vSphere/ESXi virtual machines can incur two kinds of memory overhead:

  • The additional time to access memory within a virtual machine.
  • The extra space needed by the ESX/ESXi host for its own code and data structures, beyond the memory allocated to each virtual machine.

vSphere/ESXi memory virtualization adds little time overhead to memory accesses. Because the processor’s paging hardware uses page tables (shadow page tables for software-based approach or nested page tables for hardware-assisted approach) directly, most memory accesses in the virtual machine can execute without address translation overhead.

The memory space overhead has two components:

  • A fixed, system-wide overhead for the VMkernel
  • Additional overhead for each virtual machine

Overhead memory includes space reserved for the virtual machine frame buffer and various virtualization data structures, such as shadow page tables. Overhead memory depends on the number of virtual CPUs and the configured memory for the guest operating system. vSphere/ESXi also provides optimizations such as memory sharing to reduce the amount of physical memory used on the underlying server. These optimizations can save more memory than is taken up by the overhead.

Is memory overcommitted?

Indicates whether memory is over-committed or not.

 

Host memory is over-committed when the total memory space allocated (memory granted) to powered-on VMs, plus host memory overhead, is greater than the amount of total physical memory available to the host. However, note that it is unwise to run a virtual machine with a working set that is larger than the host memory. If this is the case, the hypervisor has to reclaim the virtual machine’s active memory through ballooning or hypervisor swapping, which will lead to potentially serious virtual machine performance degradation.

If the host memory is overcommitted, then this measure will report the value Yes. If not, then this measure will report No.

The numeric values that correspond to the measure values discussed above are listed in the table below:

Numeric Value    Measure Value
1                Yes
0                No

Note:

By default, this measure reports the values Yes or No only to indicate whether the host memory is overcommitted or not. The graph of this measure however, represents the host memory state using the numeric equivalents - 0 or 1.
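
The over-commitment condition described for this measure can be expressed as a simple comparison. The sketch below is illustrative only; the function names are hypothetical, and all inputs are assumed to be in MB.

```python
from typing import Tuple


def is_memory_overcommitted(granted_mb: float, overhead_mb: float,
                            total_physical_mb: float) -> bool:
    """True when memory granted to powered-on VMs plus host memory
    overhead exceeds the total physical memory available to the host."""
    return granted_mb + overhead_mb > total_physical_mb


def overcommit_report(granted_mb: float, overhead_mb: float,
                      total_physical_mb: float) -> Tuple[str, int]:
    """Return the measure value (Yes/No) and its numeric equivalent (1/0)."""
    flag = is_memory_overcommitted(granted_mb, overhead_mb, total_physical_mb)
    return ("Yes", 1) if flag else ("No", 0)
```

For example, 60000 MB granted plus 6000 MB of overhead on a host with 65536 MB of physical memory would be reported as Yes (1).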

Memory overcommitted:

Indicates the percentage by which memory is over-committed.

Percent

Host memory is over-committed when the total memory space allocated (memory granted) to powered-on VMs, plus host memory overhead, is greater than the amount of total physical memory available to the host.

A very high value for this measure indicates a shortage of memory resources in the host.

Usage:

Indicates the memory usage as a percentage of the total configured or available memory.

Percent

A consistent increase in this value could be indicative of a slow, but steady erosion of the host physical memory. If the trend continues, it could significantly degrade the performance of the host and the guest operating systems.

Service console memory:

Indicates the amount of memory that is currently reserved for the service console.

MB

 

Machine memory saving:

Indicates the amount of memory saved due to sharing of memory.

MB

The value of this measure is the difference between the value of the Memory shared common measure and the Shared memory measure.

The amount of memory saved by memory sharing depends on workload characteristics. A workload of many nearly identical virtual machines might free up more than thirty percent of memory, while a more diverse workload might result in savings of less than five percent of memory.
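
The difference described above can be computed as shown below. The sketch assumes that the saving is the Shared memory value minus the Memory shared common value (the shared amount being the larger of the two); the function name is hypothetical.

```python
def machine_memory_saving(shared_mb: float, shared_common_mb: float) -> float:
    """Memory saved by page sharing, in MB.

    Assumption: saving = Shared memory - Memory shared common, i.e. the
    guest memory that is shared minus the machine memory needed to hold
    a single copy of the shared pages.
    """
    return shared_mb - shared_common_mb
```

If 1200 MB of guest memory is shared and a single copy of the shared pages takes 300 MB, the saving is 900 MB.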

Host cache used for swapping

Indicates the space used for caching swapped pages in the host cache.

MB

Datastores that are created on solid state drives (SSD) can be used to allocate space for host cache. The host reserves a certain amount of space for swapping to host cache.

The host cache is made up of files on a low-latency disk that ESXi uses as a write back cache for virtual machine swap files. The cache is shared by all virtual machines running on the host.

When there is severe memory pressure and the hypervisor needs to swap memory pages to disk it will swap to the host cache on the SSD drive instead.

If the value of this measure rises consistently, it indicates that memory pages are constantly being swapped to the host cache. This in turn is indicative of a serious memory crunch on the hypervisor. You may want to add more memory to your hypervisor to avoid this.

Also, if the value of this measure becomes equal to the space allocated to the host cache, it means that the swapped memory pages have completely filled the host cache. Under such circumstances, these memory pages will need to be copied to the regular .vswp file. This is not recommended, as it will degrade the performance of your VMs, since these pages will more than likely need to be swapped in at some point. To avoid this, you may want to resize the host cache.

Memory swap out rate to host cache from active memory

Indicates the rate at which memory is being swapped from active memory to host cache.

Mbps

When there is severe memory pressure and the hypervisor needs to swap memory pages to disk it will swap out to the host cache on the SSD drive instead.

If the memory pages are swapped out to the host cache at a high rate - i.e., if the value of this measure is consistently high - check the amount of free physical memory on the host. A free memory value of 6% or less indicates that the host requires more memory resources.
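
The free memory check suggested above can be sketched as follows. The function name is hypothetical, and free memory is assumed to be evaluated as a percentage of total machine memory, with the 6% limit taken from the guidance above.

```python
def host_needs_more_memory(free_mb: float, total_mb: float,
                           threshold_percent: float = 6.0) -> bool:
    """True when free physical memory is at or below the threshold
    (6% by default) of total machine memory. Inputs are in MB."""
    if total_mb <= 0:
        raise ValueError("total_mb must be greater than zero")
    return free_mb * 100.0 / total_mb <= threshold_percent
```

A host with 500 MB free out of 10000 MB (5%) would be flagged, whereas one with 700 MB free (7%) would not.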

Memory swap in rate from host cache into active memory

Indicates the rate at which memory is being swapped from host cache into active memory.

Mbps

Ideally, the value of this measure should be high.

Latency

Indicates the percentage of time the virtual machines on the host were blocked waiting to access swapped, compressed memory, or ballooned memory.

Percent

The higher the value of this measure, the more adverse will be the impact on VM performance.

 

 

Active write

Indicates the amount of memory actively being written to by the virtual machines on the host.

MB

 

Low free threshold

Indicates the threshold of free host physical memory below which vSphere will begin reclaiming memory from virtual machines through ballooning and swapping.

MB

If the value of the Free physical memory measure has fallen below the value of this measure, it is a clear indicator of memory contention on the host.
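
A minimal sketch of this comparison is given below. The function name is hypothetical; both values are assumed to be in MB, as reported by the Free physical memory and Low free threshold measures.

```python
def is_below_low_free_threshold(free_physical_mb: float,
                                low_free_threshold_mb: float) -> bool:
    """True when free physical memory has fallen below the low free
    threshold, which points to memory contention on the host."""
    return free_physical_mb < low_free_threshold_mb
```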

Compression rate

Indicates the rate of memory compression for the virtual machines on the host.

Mbps

If there is a danger of host level swapping, then ESXi will use memory compression to reduce the number of pages that it needs to swap out.

If the value of this measure is high, it could be indicative of memory contention on the host.

Compressed

Indicates the amount of memory compressed.

MB

The higher the value of this measure, the greater the effective memory capacity of the host.

Decompression rate

Indicates the rate of memory decompression for the virtual machines on the host.

Mbps

 

Active memory over commitment

Indicates whether the amount of memory that is actively used by VMs is higher than the machine memory available.

 

If the machine memory is overcommitted, then this measure will report the value Yes. If not, then this measure will report No.

The numeric values that correspond to the measure values discussed above are listed in the table below:

Numeric Value    Measure Value
1                Yes
0                No

Note:

By default, this measure reports the values Yes or No only to indicate whether the machine memory is overcommitted or not. The graph of this measure however, represents the same using the numeric equivalents - 0 or 1.

Memory swapped in from disk

Indicates the sum of swapin values for all powered-on virtual machines on the host.

MB

 

Memory swapped out from disk

Indicates the sum of swapout values for all powered-on virtual machines on the host.

MB

A high value for this measure is indicative of severe memory contention on the host.

Heap

Indicates the VMkernel virtual address space dedicated to VMkernel main heap and related data.

MB

The main consumers of VMFS heap are the pointer blocks, which are used to address file blocks in very large files/VMDKs on a VMFS filesystem. Therefore, the larger your VMDKs, the more VMFS heap you can consume.

As a rule of thumb, a single ESXi host should have enough default heap space to address around 10TB of open files/VMDKs on a VMFS-5 volume.

Heap free

Indicates the free address space in the VMkernel main heap.

MB

The value of this measure varies based on number of physical devices and configuration options.

A high value is desired for this measure. If there is no free heap left, then you will be unable to perform any VM operations (such as power-on and VMotion) and may also be denied access to your VMs. To avoid this, it would be good practice to allocate adequate heap.

VMFS PB cache capacity miss ratio

Indicates the trailing average of the ratio of capacity misses to compulsory misses for the VMFS PB Cache.

Percent

Typically, if a block in memory is accessed for the first time, the block is brought into the cache. This is called a compulsory miss.

If a cache miss occurs because the block requested was discarded from cache owing to lack of memory space, then such a miss is called a capacity miss.

Ideally, the value of this measure should be 0. A high value is indicative of frequent capacity cache misses. To avoid this, make sure that your cache is sized adequately.

VMFS PB cache overhead

Indicates the amount of VMFS heap used by the VMFS PB Cache.

MB

 

VMFS PB cache size

Indicates the space used for holding VMFS Pointer Blocks in memory.

MB

If the value of the VMFS PB cache capacity miss ratio measure is very high, you may want to compare the value of this measure with that of the Maximum VMFS PB cache size measure to determine whether or not the cache is getting filled up rapidly. If so, then you may want to increase the maximum cache size, so that the cache can grow freely without having to evict any pointer block entries.
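
One way to make this comparison is to express the current cache size as a percentage of the configured maximum. The sketch below is only an illustration; the function name is hypothetical, and both inputs are assumed to be the MB values of the VMFS PB cache size and Maximum VMFS PB cache size measures.

```python
def pb_cache_fill_percent(cache_size_mb: float, max_cache_size_mb: float) -> float:
    """Return the VMFS PB cache size as a percentage of its maximum size."""
    if max_cache_size_mb <= 0:
        raise ValueError("max_cache_size_mb must be greater than zero")
    return cache_size_mb * 100.0 / max_cache_size_mb
```

A cache holding 96 MB against a 128 MB maximum is 75 percent full; a value that stays close to 100 percent alongside a high capacity miss ratio suggests that the maximum size is too small.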

Maximum VMFS PB cache size

Indicates the maximum size the VMFS pointer block cache can grow to.

MB

You can configure the minimum and maximum sizes of the pointer block cache on each ESXi host. When the size of the pointer block cache approaches the configured maximum size, an eviction mechanism removes some pointer block entries from the cache.

Base the maximum size of the pointer block cache on the working size of all open virtual disk files that reside on VMFS datastores. All VMFS datastores on the host use a single pointer block cache.