VM Snapshots Test

A disk snapshot is a copy of the virtual machine (VM) disk file recorded at a specific time, much like a Windows restore point. The snapshot preserves the original VM disk file by disabling writes to it; all new writes are made to the snapshot version of the VM. If you create more than one snapshot of a VM, you will have multiple restore points to revert to. When you create a snapshot, whatever was writable until then becomes read-only from that point on. Using in-file delta technology, new files are created that contain all changes (the delta) to the original disk files.

The size of a snapshot file can never exceed the size of the original disk file. When a request is made to change a block on the original disk, the block is changed in the delta file instead. If a block that was already changed in the delta file is changed again, the delta file does not grow, because the existing block in the delta file is simply updated.
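
The delta behaviour described above can be illustrated with a toy model. This is only a sketch: the DeltaDisk class and its method names are hypothetical, blocks are represented as list items, and it does not reflect how the hypervisor actually stores delta files on disk.

```python
class DeltaDisk:
    """Toy model of a snapshot delta file layered over a read-only base disk."""

    def __init__(self, base_blocks):
        self.base = tuple(base_blocks)  # original disk: read-only after the snapshot
        self.delta = {}                 # block index -> changed contents

    def write(self, index, data):
        if not 0 <= index < len(self.base):
            raise IndexError("block index outside the original disk")
        # A repeat write to the same block overwrites the existing delta entry.
        self.delta[index] = data

    def read(self, index):
        # Changed blocks come from the delta, all others from the base disk.
        return self.delta.get(index, self.base[index])

    def delta_size(self):
        # Number of blocks held in the delta; never more than the base has.
        return len(self.delta)


disk = DeltaDisk(["a", "b", "c", "d"])
disk.write(1, "x")
disk.write(1, "y")  # same block again: the delta does not grow
disk.write(3, "z")
print(disk.delta_size(), disk.read(1), disk.base[1])
```

Because the delta is keyed by block index, it can hold at most one entry per block of the original disk, which is why, in this model, it cannot outgrow the original.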

Though snapshot files are small initially, they grow as writes are made to the VM’s disk files. If the number and size of the snapshot files grow significantly over time, they can consume considerable datastore space and thereby choke VM operations. To conserve disk space, administrators need to continuously track snapshot growth per VM, identify ‘heavy-weight’ snapshots that may no longer be of use, and purge them. The VM Snapshots test helps administrators do this. While the measures reported by the test capture the snapshot file count per VM and the total size of the snapshot files of a VM, the detailed diagnosis reveals the size of each snapshot, enabling administrators to quickly spot the snapshot files that are too large.

Target of the test : An ESX server host

Agent deploying the test : An internal/remote agent

Outputs of the test : One set of results for each VM configured on the ESX host being monitored.

Configurable parameters for the test
  1. Test period - How often the test should be executed.
  2. Host - The host for which the test is to be configured.
  3. port - The port at which the specified host listens. By default, this is NULL.
  4. esx user and esx password - In order to enable the test to extract the desired metrics from a target ESX server, you need to configure the test with an ESX USER and ESX PASSWORD. The user credentials to be passed here depend upon the mechanism used by the eG agent for collecting performance statistics from the ESX server and its VMs. These monitoring methodologies and their corresponding configuration requirements have been discussed hereunder:

    • Monitoring using the web services interface of the ESX server: Starting with ESX server 3.0, a VMware ESX server offers a web services interface through which the eG agent collects metrics from the ESX server. The VMware VI SDK is used by the agent to implement the web services interface. To use this interface for monitoring, this test should be configured with an ESX USER who has “Read-only” privileges to the target ESX server. By default, the root user is authorized to execute the test. However, it is preferable that you create a new user on the target ESX host and assign the “Read-only” role to that user. The steps for achieving this are discussed in detail in the Creating a New User with Read-Only Privileges to the ESX Server section.

      ESX servers terminate user sessions based on timeout periods. The default timeout period is 30 mins. When you stop an agent, sessions currently in use by the agent will remain open for this timeout period until ESX times out the session. If the agent is restarted within the timeout period, it will open a new set of sessions. If you want the eG agent to close already existing sessions before it opens new sessions, then you would have to configure all the tests with the credentials of an ESX user with permissions to View and stop sessions (prior to vSphere/ESX server 4.1, this was called the View and Terminate Sessions privilege). To know how to grant this permission to an ESX user, refer to section.

    • Monitoring using the vCenter in the target environment: By default, the eG agent connects to each ESX server and collects metrics from it. While this approach scales well, it requires additional configuration for each server being monitored. For example, separate user accounts may need to be created on each server for read-only access to VM details. While monitoring large virtualized installations however, the agents can be optionally configured to monitor ESX servers using the statistics already available with different vCenter installations in the environment.

    In this case therefore, the ESX USER and ESX PASSWORD that you specify should be that of an Administrator or Virtual Machine Administrator in vCenter. However, if, owing to security constraints, you prefer not to use the credentials of such users, then, you can create a special role on vCenter with ‘Read-only’ privileges.

    Refer to Assigning the ‘Read-Only’ Role to a Local/Domain User to vCenter section to know how to create a user on vCenter.

    If the ESX server for which this test is being configured had been discovered via vCenter, then the eG manager automatically populates the esx user and esx password text boxes with the vCenter user credentials using which the ESX discovery was performed.

    Like ESX servers, vCenter servers too terminate user sessions based on timeout periods. The default timeout period is 30 minutes. When you stop an agent, sessions currently in use by the agent will remain open for this timeout period until vCenter times out the session. If the agent is restarted within the timeout period, it will open a new set of sessions. If you want the eG agent to close already existing sessions before it opens new sessions, then you would have to configure all the tests with the credentials of a vCenter user with permissions to View and stop sessions (prior to vCenter 4.1, this was called the View and Terminate Sessions permission). To know how to grant this permission to a vCenter user, refer to the Creating a Special Role on vCenter and Assigning the Role to a Local/Domain User section.

    When the eG agent is started/restarted, it first attempts to connect to the vCenter server and terminate all existing sessions for the user whose credentials have been provided for the tests. This is done to ensure that unnecessary sessions do not remain established in the vCenter server for the session timeout period.  Ideally, you should create a separate user account with the required credentials and use this for the test configurations. If you provide the credentials for an existing user for the test configuration, when the eG agent starts/restarts, it will close all existing sessions for this user (including sessions you may have opened using the Virtual Infrastructure client). Hence, in this case, you may notice that your VI client sessions are terminated when the eG agent starts/restarts.

  5. confirm password - Confirm the password by retyping it here.
  6. ssl - By default, the ESX server is SSL-enabled. Accordingly, the SSL flag is set to Yes by default. This indicates that the eG agent will communicate with the ESX server via HTTPS by default.

    Like the ESX server, the vCenter is also SSL-enabled by default. If you have chosen to use the vCenter for monitoring, then you have to set the SSL flag to Yes.

  7. webport - By default, in most virtualized environments, the vSphere/ESX server and vCenter listen on port 80 (if not SSL-enabled) or on port 443 (if SSL-enabled). This implies that while monitoring an SSL-enabled vSphere/ESX server directly, the eG agent, by default, connects to port 443 of the vSphere/ESX server to pull out metrics, and while monitoring a non-SSL-enabled server, the eG agent connects to port 80. Similarly, while monitoring a vSphere/ESX server via an SSL-enabled vCenter, the eG agent connects to port 443 of vCenter to pull out the metrics, and while monitoring via a non-SSL-enabled vCenter, the eG agent connects to port 80 of vCenter. 

    Accordingly, the webport parameter is set to 80 or 443 depending upon the status of the ssl flag.  In some environments however, the default ports 80 or 443 might not apply. In such a case, against the webport parameter, you can specify the exact port at which the vSphere/ESX server or vCenter in your environment listens so that the eG agent communicates with that port.

  8. VIRTUAL CENTER - If the eG manager had discovered the target ESX server by connecting to vCenter, then the IP address of the vCenter server used for discovering this ESX server would be automatically displayed against the VIRTUAL CENTER parameter; similarly, the esx user and esx password text boxes will be automatically populated with the vCenter user credentials, using which ESX discovery was performed.

    If this ESX server has not been discovered using vCenter, but you still want to monitor the ESX server via vCenter, then select the IP address of the vCenter host that you wish to use for monitoring the ESX server from the VIRTUAL CENTER list. By default, this list is populated with the IP addresses of all vCenter hosts that were added to the eG Enterprise system at the time of discovery. Upon selection, the esx user and esx password that were pre-configured for that vCenter server will be automatically displayed against the respective text boxes.

    If the IP address of the vCenter server of interest to you is not available in the list, then you can add the details of the vCenter server on-the-fly, by selecting the Other option from the VIRTUAL CENTER list. This will invoke the add vcenter server details page. Refer to the Adding the Details of a vCenter Server for Guest Discovery section.

    On the other hand, if you want the eG agent to behave in the default manner (i.e., communicate with each ESX server for monitoring it), then set the VIRTUAL CENTER parameter to ‘none’. In this case, the ESX USER and ESX PASSWORD parameters can be configured with the credentials of a user who has at least ‘Read-only’ privileges to the target ESX server.

  9. inside view using - By default, this test communicates with every VM remotely and extracts “inside view” metrics. Therefore, by default, the inside view using flag is set to Remote connection to VM (Windows).

    Typically, to establish this remote connection with Windows VMs in particular, eG Enterprise requires that the eG agent be configured with domain administrator privileges. In high-security environments, where the IT staff might have reservations about exposing the credentials of their domain administrators, this approach to extracting “inside view” metrics might not be preferred. In such environments therefore, eG Enterprise provides administrators the option to deploy a piece of software called the eG VM Agent (Windows) on every Windows VM; this VM agent allows the eG agent to collect “inside view” metrics from the Windows VMs without domain administrator rights. Refer to Configuring the eG Agent to Collect Current Hardware Status Metrics section for more details on the eG VM Agent. To ensure that the “inside view” of Windows VMs is obtained using the eG VM Agent, set the inside view using flag to eG VM Agent (Windows).

  10. AGELIMIT - By default, the AGELIMIT is set to 15 days. This implies that the test will report all snapshots that are more than 15 days old as Old snapshots. If required, you can change the AGELIMIT.
  11. SIZELIMIT - By default, the SIZELIMIT is set to 10000 KB. This implies that the test will report all snapshots that are larger than 10000 KB as Large snapshots. If required, you can change the SIZELIMIT.
  12. DD FREQUENCY - Refers to the frequency with which detailed diagnosis measures are to be generated for this test. For instance, if you set this to 4:1, detailed measures will be generated every fourth time this test runs, and also every time the test detects a problem.

  13. DETAILED DIAGNOSIS - To make diagnosis more efficient and accurate, the eG suite embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option.

    The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

    • The eG manager license should allow the detailed diagnosis capability
    • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
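
Two of the parameter rules described above, the WEBPORT default that follows the SSL flag and the DD FREQUENCY setting, can be sketched in a few lines. The function names below are hypothetical and this is not the eG agent's actual logic. In particular, treating a frequency of 0 as "never generate", and applying the abnormal value as "every Nth run" while a problem persists, are assumptions made for illustration.

```python
def default_webport(ssl_enabled, override=None):
    """Port to connect to: an explicitly configured port, else 443 (SSL) or 80."""
    if override is not None:
        return override
    return 443 if ssl_enabled else 80


def detailed_diagnosis_due(run_number, problem_detected, dd_frequency="4:1"):
    """Decide whether detailed measures are generated on this (1-based) run.

    dd_frequency is 'normal:abnormal'. Assumption: a value of 0 disables
    detailed diagnosis for that state.
    """
    normal, abnormal = (int(part) for part in dd_frequency.split(":"))
    every = abnormal if problem_detected else normal
    return every > 0 and run_number % every == 0


print(default_webport(True), default_webport(False), default_webport(True, override=8443))
print([n for n in range(1, 9) if detailed_diagnosis_due(n, problem_detected=False)])
print(detailed_diagnosis_due(5, problem_detected=True))
```

With the 4:1 setting, the second line lists runs 4 and 8 as the normal-state runs that produce detailed measures, while any run on which a problem is detected produces them as well.
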

Measurements made by the test

For each measurement, the description, measurement unit, and interpretation are given in that order.

Number of snapshots:

  Description: Indicates the number of snapshot files of this VM that are currently available.

  Measurement unit: Number

  Interpretation: Multiple snapshots of a VM provide administrators with multiple restore points. On the flip side, though, a high number of snapshots can also be considered a waste of valuable disk space, especially if many of the snapshots hold less critical, but heavy-weight, changes/writes to the disk.

  To accurately identify those snapshots that are consuming disk space excessively, and to learn when they were created, who their parents are, and their current consistent file system state, use the detailed diagnosis of this measure.

Total size of snapshots:

  Description: Indicates the total size of all snapshots of a VM.

  Measurement unit: MB

  Interpretation: Snapshots typically grow in 16 MB increments to help reduce SCSI reservation conflicts. Though small in size initially (16 MB), snapshots can grow with time, but can never grow beyond the original disk file size.

  If a marked increase is noticed in the value of this measure over time, it could indicate that one or more snapshots are rapidly growing in size. To know which snapshots are contributing to this phenomenon, use the detailed diagnosis of the Number of snapshots measure.

  The rate of growth of a snapshot is determined by how much disk write activity occurs on your server. Servers that run disk write intensive applications, such as SQL and Exchange, will have their snapshot files grow rapidly. On the other hand, servers with mostly static content and fewer disk writes, such as Web and application servers, will see their snapshot files grow at a much slower rate.

Large size snapshots count:

  Description: Indicates the number of snapshots that are larger than the configured SIZELIMIT.

  Measurement unit: Number

  Interpretation: Snapshots typically grow in 16 MB increments to help reduce SCSI reservation conflicts. Though small in size initially (16 MB), snapshots can grow with time, but can never grow beyond the original disk file size.

  If a marked increase is noticed in the value of this measure over time, it could indicate that a number of snapshots are rapidly growing in size. To know which snapshots are growing beyond the size limit set, use the detailed diagnosis of this measure.

Aged snapshots count:

  Description: Indicates the number of snapshots that are older than the configured AGELIMIT.

  Measurement unit: Number

  Interpretation: Use the detailed diagnosis of this measure to identify the old snapshots, so that you can figure out whether they deserve to be retained or not. While many snapshots provide essential restore points for VMs, many others hold less critical information. The ‘less useful’ snapshots can be eliminated to save disk space.

  To accurately identify those snapshots that are consuming disk space excessively, and to learn when they were created, who their parents are, and their current consistent file system state, use the detailed diagnosis of this measure.
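
How the four measures relate to the AGELIMIT and SIZELIMIT parameters can be sketched as follows. This is an illustrative calculation only: the Snapshot record, the snapshot_measures function, the sample snapshot names and sizes, and the KB-to-MB conversion by 1024 are all assumptions, not the test's actual implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Snapshot:
    name: str
    size_kb: int
    created: datetime


def snapshot_measures(snapshots, now, age_limit_days=15, size_limit_kb=10000):
    """Compute the four measures for one VM from its snapshot list."""
    age_limit = timedelta(days=age_limit_days)
    large = [s for s in snapshots if s.size_kb > size_limit_kb]
    aged = [s for s in snapshots if now - s.created > age_limit]
    return {
        "Number of snapshots": len(snapshots),
        "Total size of snapshots (MB)": sum(s.size_kb for s in snapshots) / 1024,
        "Large size snapshots count": len(large),
        "Aged snapshots count": len(aged),
    }


now = datetime(2024, 1, 31)
snaps = [
    Snapshot("before-patch", 16384, datetime(2024, 1, 1)),  # large and old
    Snapshot("nightly", 4096, datetime(2024, 1, 30)),       # small and recent
]
measures = snapshot_measures(snaps, now)
for label, value in measures.items():
    print(f"{label}: {value}")
```

In this sample, only the first snapshot exceeds both the default 10000 KB size limit and the default 15-day age limit, so each of the two counts is 1.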

Figure 1 : The detailed diagnosis of the VM Snapshots test