PVS Target Devices Test

Provisioning Services lets administrators virtualize a hard disk or workload and then stream it back out to multiple devices. The workloads, which can be server or desktop, are ripped from a physical or virtual disk into Microsoft’s virtual hard disk (VHD) format and treated as a golden master image called a vDisk. This master image is then streamed over the network from a Windows server running the stream service to multiple target devices that were PXE booted.

When a vDisk is in private mode, it can be edited. When a vDisk is in standard mode, it is read-only and no changes can be made to it. Instead, all disk write operations are redirected to what is referred to as a write-cache file. The device drivers on the target redirect writes to the write-cache file and, when necessary, read newly written files from the write-cache file instead of from the server.

With the vDisk in standard mode, the write-cache drive location therefore holds all the writes for the operating system. If the write-cache file fills up unexpectedly, the operating system will behave as if the drive ran out of space without any warning; in other words, it will blue screen. To avoid this, it is imperative to continuously track the usage of the write-cache, so that you are forewarned of a probable space crunch and can resize the write-cache file to accommodate subsequent writes. The PVS Target Devices test enables this analysis.

This test helps administrators keep tabs on the usage of the write-cache of every target device that is connected to the Provisioning server, and sends out proactive alerts to administrators if it finds that a write-cache file is rapidly filling up. This way, the test aids in averting operating system crashes that may occur owing to lack of space in the write-cache. Moreover, in the process of monitoring the I/O activity on the Citrix PVS, the test also promptly captures I/O transaction failures and reports the number of times each target device had to retry an I/O transaction on the PVS. This will shed light on communication issues that may exist between the target device and the PVS.

Target of the test : Citrix Provisioning server

Agent deploying the test : An internal agent

Outputs of the test : By default, the test reports one set of results for each target device mapped to every PVS server in the farm being monitored.

Configurable parameters for the test
  1. TEST PERIOD – How often should the test be executed
  2. Host – The host for which the test is to be configured
  3. Port – Refers to the port used by the Citrix Provisioning server. By default, this is 54321.
  4. mcli path – This test executes commands using the Management Command Line Interface (MCLI) of the Provisioning server to collect the required metrics. To enable the test to execute the commands, the eG agent, by default, auto-discovers the full path to MCLI.exe on the target Provisioning server. This is why the mcli path is set to none by default. If, for some reason, the eG agent is unable to auto-discover the path, you will have to specify it manually here, using the following pointers:

    • Typically, in a 32-bit Windows system, the MCLI.exe will be available in the following location by default: <System_Root>\Program Files\Citrix\Provisioning Services Console
    • In a 64-bit Windows system on the other hand, the MCLI.exe will be available in the following location by default: <System_Root>\Program Files (x86)\Citrix\Provisioning Services Console
  5. domain name, domain user and domain password – To report farm-related metrics, this test should run using the credentials of a user who fulfills the following requirements:

    • Should belong to the Security group with 'Farm Administrator' access.
    • Should be assigned the Allow log on locally security privilege on the Citrix Provisioning Server host.

    The steps for assigning such privileges to a user are detailed in the Pre-requisites for monitoring the Citrix Provisioning Server topic.

    Once you have assigned the aforesaid privileges to the user, configure this test with the domain name, domain user, and domain password of that user.

  6. local host only - By default, this flag is set to Yes. This implies that, by default, the test reports metrics for the target server that is being monitored. Setting the flag to No ensures that the test auto-discovers all the servers that are part of the PVS farm, and reports metrics for each server in the PVS farm.
  7. show active targets only – By default, this flag is set to Yes, indicating that the test will monitor only those target devices that are currently up and running. To enable the test to monitor all devices, regardless of their running state, set this flag to No.
  8. detailed diagnosis - To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

    The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

    • The eG manager license should allow the detailed diagnosis capability;
    • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
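
For the mcli path parameter described above, the two default locations can be checked in order when auto-discovery fails. The following is a minimal sketch, not part of eG Enterprise or Citrix tooling: the function name `find_mcli` is hypothetical, and `system_root` stands for whatever drive root the <System_Root> placeholder resolves to on your server.

```python
from pathlib import Path
from typing import Optional

# Default install folders named in this topic, relative to <System_Root>.
DEFAULT_CONSOLE_DIRS = (
    ("Program Files", "Citrix", "Provisioning Services Console"),        # 32-bit Windows
    ("Program Files (x86)", "Citrix", "Provisioning Services Console"),  # 64-bit Windows
)


def find_mcli(system_root: str) -> Optional[Path]:
    """Return the first default location that contains MCLI.exe, or None."""
    for parts in DEFAULT_CONSOLE_DIRS:
        candidate = Path(system_root).joinpath(*parts, "MCLI.exe")
        if candidate.is_file():
            return candidate
    return None
```

If the function returns None, MCLI.exe is probably installed in a non-default folder, and the path has to be located by hand before it is entered against the mcli path parameter.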
Measurements made by the test
Measurement Description Measurement Unit Interpretation

RAM cache usage:

Indicates the percentage of space in the write-cache that is currently utilized.

Percent

A high value or a consistent increase in this value is a cause for concern, as it indicates that write-cache space is being eroded. You may have to allocate more space to the write-cache to avoid running out of space completely. The optimum size of the write-cache drive depends on several factors:

  • Frequency of server reboots. The write-cache file is reset upon each server boot so the size only needs to be large enough to handle the volume between reboots.
  • Amount of free space available on the c: drive. The space that will be used for new files written to the c: drive is considered the free space available. This is a key value when determining the write-cache drive size.
  • Amount of data being saved to the c: drive. Data that is written to the c: drive during operation will get stored automatically in the write-cache drive. New files will be stored in the write-cache file and decrease the amount of available space. Replacements for existing files will also be written to the write-cache file but will only marginally affect the amount of free space. For instance, a service pack install on a standard-mode disk will result in the write-cache file holding all the updated files, with very little change in available space.
  • Size and location of the pagefile. When a local NTFS-formatted drive is found, Provisioning Services moves the Windows pagefile off of the c: drive to the first available NTFS drive, which is also the location of the write-cache file. Therefore, in the default configuration, the write-cache drive will end up holding both the write-cache file and the pagefile. To learn more about correctly sizing your pagefile, see Nick Rintalan’s blog, “The Pagefile Done Right!”.
  • Location of the write-cache file. The location of the write-cache file is also a factor in determining its size. The write-cache file can be held in any of the following destinations:
    • Cache on device hard drive: Write cache can exist as a file in NTFS format, located on the target-device’s hard drive. This write cache option frees up the Provisioning Server since it does not have to process write requests and does not have the finite limitation of RAM.
    • Cache in device RAM: Write cache can exist as a temporary file in the target device’s RAM. This provides the fastest method of disk access since memory access is always faster than disk access. This measure will report metrics only if the cache resides in the device RAM.
    • Cache in device RAM with overflow on hard disk (only available for Windows 7 and Server 2008 R2 (NT 6.1) and later): In this case, when RAM is zero, the target device write cache is only written to the local disk. When RAM is not zero, the target device write cache is written to RAM first. When RAM is full, the least recently used block of data is written to the local differencing disk to accommodate newer data on RAM. The amount of RAM specified is the non-paged kernel memory that the target device will consume.
    • Cache on server: Write cache can exist as a temporary file on a Provisioning Server. In this configuration, all writes are handled by the Provisioning Server, which can increase disk IO and network traffic.
    • Cache on server persistent: This cache option allows for the saving of changes between reboots. Using this option, after rebooting, a target device is able to retrieve changes made from previous sessions that differ from the read only vDisk image. If a vDisk is set to Cache on server persistent, each target device that accesses the vDisk automatically has a device-specific, writable disk file created. Any changes made to the vDisk image are written to that file, which is not automatically deleted upon shutdown. This saves target device specific changes that are made to the vDisk image.
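
The interpretation above names two warning signs for RAM cache usage: a high value and a consistent increase. A minimal sketch of such a check follows. The 80 percent threshold and the four-sample window are illustrative assumptions, not values taken from the product, and `ram_cache_at_risk` is a hypothetical helper name.

```python
from typing import Sequence


def ram_cache_at_risk(samples: Sequence[float],
                      high_pct: float = 80.0,
                      rising_window: int = 4) -> bool:
    """Flag a device whose RAM cache usage is high or steadily climbing.

    samples: RAM cache usage readings in percent, oldest first.
    high_pct and rising_window are assumed values; tune them per environment.
    """
    if not samples:
        return False
    # Warning sign 1: the latest reading is already high.
    if samples[-1] >= high_pct:
        return True
    # Warning sign 2: every reading in the recent window exceeds the previous one.
    recent = samples[-rising_window:]
    if len(recent) < rising_window:
        return False
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

For example, readings of 10, 12, 15 and 19 percent are flagged because they rise on every sample, even though none of them is high on its own.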

Target device retries:

Indicates the number of times this target device had to retry an I/O transaction on the Citrix PVS.

Number

Retries in PVS are a mechanism to track packet drops in the streaming traffic between a Provisioning Server and a target device. When working with PVS, I/O transactions happen between the local driver on a target device machine, the network, and the PVS server itself. When a client fails to get a response to an I/O request, it may send the request again; this is called a retry.

While a certain number of retries (0-100) can be deemed acceptable, anything above that count is a cause for concern. Because the traffic between the PVS server and the target device is based on UDP, which is not a reliable protocol (although Citrix has optimized its use), it is very important that you do not put configurations in place that would throttle that traffic. So, if the value of this measure is over 100 on some or most of your targets, it is a clear indication of a problem condition that needs to be addressed immediately.
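
The 0-100 guideline above can be applied across a set of target devices as sketched below. The function name and the device names in the usage note are hypothetical; only the limit of 100 comes from the text.

```python
from typing import Dict, List

ACCEPTABLE_RETRIES = 100  # upper bound of the acceptable range cited in the text


def devices_with_excess_retries(retries_by_device: Dict[str, int],
                                limit: int = ACCEPTABLE_RETRIES) -> List[str]:
    """Return the names of target devices whose retry count exceeds the limit."""
    return sorted(name for name, count in retries_by_device.items() if count > limit)
```

Given retry counts of 12, 100 and 250 for three devices, only the device reporting 250 is returned, since 100 is still within the acceptable range. If most devices are returned, the cause is more likely to lie in the network or the PVS server than in any single target.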

Memory cache size:

Indicates the current size of the write cache of this target device.

MB

When using Citrix Provisioning Services with the vDisk in standard mode, you have a write-cache drive location that holds all the writes for the operating system. If the write-cache file is not properly sized, it may fill up unexpectedly; in this case, the operating system will behave as if the drive ran out of space without any warning. In other words, it will blue screen.

The optimum size of the write-cache drive depends on several factors:

  • Frequency of server reboots. The write-cache file is reset upon each server boot so the size only needs to be large enough to handle the volume between reboots.
  • Amount of free space available on the c: drive. The space that will be used for new files written to the c: drive is considered the free space available. This is a key value when determining the write-cache drive size.
  • Amount of data being saved to the c: drive. Data that is written to the c: drive during operation will get stored automatically in the write-cache drive. New files will be stored in the write-cache file and decrease the amount of available space. Replacements for existing files will also be written to the write-cache file but will only marginally affect the amount of free space. For instance, a service pack install on a standard-mode disk will result in the write-cache file holding all the updated files, with very little change in available space.
  • Size and location of the pagefile. When a local NTFS-formatted drive is found, Provisioning Services moves the Windows pagefile off of the c: drive to the first available NTFS drive, which is also the location of the write-cache file. Therefore, in the default configuration, the write-cache drive will end up holding both the write-cache file and the pagefile.
  • Location of the write-cache file. The location of the write-cache file is also a factor in determining its size. The write-cache file can be held in any of the following destinations:
    • Cache on device hard drive: Write cache can exist as a file in NTFS format, located on the target-device’s hard drive. This write cache option frees up the Provisioning Server since it does not have to process write requests and does not have the finite limitation of RAM.
    • Cache in device RAM: Write cache can exist as a temporary file in the target device’s RAM. This provides the fastest method of disk access since memory access is always faster than disk access. This measure will report metrics only if the cache resides in the device RAM.
    • Cache in device RAM with overflow on hard disk (only available for Windows 7 and Server 2008 R2 (NT 6.1) and later): In this case, when RAM is zero, the target device write cache is only written to the local disk. When RAM is not zero, the target device write cache is written to RAM first. When RAM is full, the least recently used block of data is written to the local differencing disk to accommodate newer data on RAM. The amount of RAM specified is the non-paged kernel memory that the target device will consume.
    • Cache on server: Write cache can exist as a temporary file on a Provisioning Server. In this configuration, all writes are handled by the Provisioning Server, which can increase disk IO and network traffic.
    • Cache on server persistent: This cache option allows for the saving of changes between reboots. Using this option, after rebooting, a target device is able to retrieve changes made from previous sessions that differ from the read only vDisk image. If a vDisk is set to Cache on server persistent, each target device that accesses the vDisk automatically has a device-specific, writable disk file created. Any changes made to the vDisk image are written to that file, which is not automatically deleted upon shutdown. This saves target device specific changes that are made to the vDisk image.
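
The overflow behaviour described for Cache in device RAM with overflow on hard disk can be pictured with a toy model. This is only an illustration of the least-recently-used spill described above, not the PVS driver's actual implementation: the class name, the block-count capacity, and the choice to treat reads as "use" are all assumptions made for the sketch.

```python
from collections import OrderedDict


class RamCacheWithOverflow:
    """Toy model: writes land in RAM; when RAM is full, the LRU block spills to disk."""

    def __init__(self, ram_blocks: int):
        self.ram_blocks = ram_blocks
        self.ram = OrderedDict()  # block id -> data, least recently used first
        self.disk = {}            # stands in for the local differencing disk

    def write(self, block_id, data):
        if self.ram_blocks == 0:
            # RAM size of zero: the cache is written only to the local disk.
            self.disk[block_id] = data
            return
        # A newer copy in RAM supersedes any copy previously spilled to disk.
        self.disk.pop(block_id, None)
        self.ram[block_id] = data
        self.ram.move_to_end(block_id)  # mark as most recently used
        while len(self.ram) > self.ram_blocks:
            old_id, old_data = self.ram.popitem(last=False)  # evict the LRU block
            self.disk[old_id] = old_data

    def read(self, block_id):
        if block_id in self.ram:
            self.ram.move_to_end(block_id)  # assumption: reads also count as use
            return self.ram[block_id]
        return self.disk.get(block_id)
```

The model makes one point of the text concrete: a full RAM cache is not a failure in this mode, because older blocks simply move to disk. The disk that receives them is what can run out of space.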

Below are a few guidelines for right-sizing the client-side write-cache drive.

  • Write-cache drive = write-cache file + pagefile (if pagefile is stored on the write-cache drive)
  • Write-cache file size should be equal to the amount of free space left on the vDisk image. This will work in most situations, except those where servers receive large file updates immediately after booting. As a rule, your vDisk should not be getting updated while running in standard-mode.
  • Always account for the pagefile location and size. If it is configured to reside on the c: or d: drive, include it in all size calculations.
  • Set the pagefile to a predetermined size to make it easier to account for it. When Windows manages the pagefile, its size starts at 1x RAM but can vary. Manually setting it to a known value will provide a static number to use for calculations.
  • During the pilot, use server-side write caching to get an idea of the maximum size you might see a file reach between server reboots. Obviously, the server should have a full load and should be subject to the normal production reboot cycle for this to be of value.

In most situations, the recommended write-cache drive size will be the free space available on the vDisk image plus the pagefile size. For instance, if you have a 30GB Windows Server 2008 R2 vDisk with 16GB used (14GB free) and are running with an 8GB pagefile, it would be good practice to use a write-cache drive of 22GB, calculated as 14GB free space + 8GB for the pagefile. If space doesn’t permit, you have a few options, not all of which may be available to you.

  • If the storage location for the write-cache drive supports thin-provisioning, configure thin-provisioned drives for the write-cache drive to save space.
  • Use dynamic VHDs (instead of fixed VHDs), though this approach is generally only recommended for XenDesktop workloads. If you choose this approach, you will probably need to periodically reset the size of the dynamic VHD, which can be done with a PowerShell script.
  • Reboot the servers more frequently, which in turn will reduce the maximum size of the write-cache file.
  • Move the pagefile to a different drive or run without a pagefile.
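
The sizing guideline above (free space on the vDisk image plus the pagefile, when the pagefile lives on the write-cache drive) reduces to simple arithmetic. A minimal sketch, with a hypothetical function name, reproducing the 30GB vDisk example from the text:

```python
def write_cache_drive_gb(vdisk_size_gb: float, vdisk_used_gb: float,
                         pagefile_gb: float = 0.0) -> float:
    """Recommended write-cache drive size: vDisk free space plus pagefile.

    Pass pagefile_gb=0 when the pagefile is not stored on the write-cache drive.
    """
    if vdisk_used_gb > vdisk_size_gb:
        raise ValueError("used space cannot exceed the vDisk size")
    free_gb = vdisk_size_gb - vdisk_used_gb
    return free_gb + pagefile_gb


# 30GB vDisk with 16GB used and an 8GB pagefile, as in the example above.
print(write_cache_drive_gb(30, 16, 8))  # 22
```

Moving the pagefile to a different drive, one of the options listed above, corresponds to calling the function with `pagefile_gb=0`, which brings the same example down to 14GB.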

Note: This test will not report the RAM cache usage and Memory cache size measures if the write-cache file resides in any of the following destinations:

  • Cache on device hard drive
  • Cache on server
  • Cache on server persistent
  • Cache in device RAM with overflow on hard disk

To know how write-cache on the server, server persistent, or device hard drive is used, use the PVS Write Cache test mapped to the Citrix XenApp server. For more details about this test, refer to the Monitoring Citrix XenApp Servers document.

The option to store the cache in device RAM with overflow on hard disk is available only from Citrix PVS 7. As mentioned already, with this setting, when the memory cache is full, PVS uses the disk to store the additional cache data (disk access is slow, so you want as much in RAM as possible). The implication is that monitoring the percentage of cache in memory is no longer as critical as before. Even if the cache is 100% full in memory, it is not an error condition. The error condition now is when the memory cache is full and the disk on which the additional cache is stored also becomes full. Hence, it is critical to monitor the disk space on the drive where the cache is stored. This is why the PVS Target Devices test will not report cache-related metrics if the cache is set to be stored in device RAM with overflow on hard disk. In such a situation, if the target devices are diskless, use the Disk Space test of the Citrix PVS server to understand how the drive in which the cache is stored is being utilized and how much free space it has. On the other hand, if the target devices are configured with hard disks, then use the Disk Space – VM test of the target device to understand cache usage.
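
The guidance in the note above, on which test to consult for each write-cache destination, can be summarized as a lookup. This is only a restatement of the text in code form; the function name and the `target_has_disk` flag are hypothetical.

```python
PVS_TARGET_DEVICES = "PVS Target Devices"
PVS_WRITE_CACHE = "PVS Write Cache"
DISK_SPACE = "Disk Space"
DISK_SPACE_VM = "Disk Space – VM"

OVERFLOW = "Cache in device RAM with overflow on hard disk"

# Destinations whose usage is reported by a single, fixed test.
FIXED_TESTS = {
    "Cache in device RAM": PVS_TARGET_DEVICES,
    "Cache on device hard drive": PVS_WRITE_CACHE,
    "Cache on server": PVS_WRITE_CACHE,
    "Cache on server persistent": PVS_WRITE_CACHE,
}


def monitoring_test_for(cache_location: str, target_has_disk: bool = True) -> str:
    """Return the name of the test that reports cache usage for a destination."""
    if cache_location == OVERFLOW:
        # Diskless targets: check the Citrix PVS server's drive instead.
        return DISK_SPACE_VM if target_has_disk else DISK_SPACE
    try:
        return FIXED_TESTS[cache_location]
    except KeyError:
        raise ValueError(f"unknown write-cache destination: {cache_location!r}")
```

For the overflow destination, the second argument decides between the Disk Space – VM test of the target device and the Disk Space test of the Citrix PVS server.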