Azure Virtual Machine Scale Set Test

Azure virtual machine scale sets let you create and manage a group of load-balanced VMs. A scale set can automatically increase or decrease the number of VM instances that run your application. This automated, elastic behavior reduces the management overhead of monitoring and optimizing the performance of your application.

You create rules that define the acceptable performance for a positive customer experience. When those defined thresholds are met, autoscale rules take action to adjust the capacity of your scale set. You can also schedule events to automatically increase or decrease the capacity of your scale set at fixed times. The number of VM instances can automatically increase or decrease in response to demand or a defined schedule.

You can create autoscale rules using built-in host metrics available from your VM instances. Host metrics give you visibility into the performance of the VM instances in a scale set without the need to install or configure additional agents and data collections. Autoscale rules that use these metrics can scale out or in the number of VM instances in response to CPU usage, memory demand, disk access, and network throughput.

To keep such rules effective, administrators should closely monitor the host metrics on which the autoscale rules are based, to understand whether usage patterns have changed. If they have, the rules should be reconfigured to reflect the new patterns. Otherwise, virtual machine scale sets may not scale out or in when the resource demands of applications increase or decrease. Besides undermining the elasticity that scale sets are meant to provide, this can significantly degrade application performance. To avoid this, administrators should periodically run the Azure Virtual Machine Scale Set test.

For each virtual machine scale set, this test reports the CPU, memory, network, and disk I/O resources used by that scale set. In the process, the test points to scale sets that are probably running resource-intensive applications. You can compare these usage metrics with the usage thresholds set as scaling rules for each scale set, to verify if the rules align with usage. You can also check if threshold violations trigger automatic scaling operations. This way, you can assess the effectiveness of the rules, identify scale sets with ineffective rules, and initiate efforts to reset them.

Target of the Test: A Microsoft Azure Subscription

Agent deploying the test: A remote agent

Output of the test: One set of results for each Azure virtual machine scale set configured for every resource group in the target Azure Subscription

Configurable parameters for the test
Parameter    Description

Test Period

Specify how often the test should be executed.

Host

The host for which the test is to be configured.

Subscription ID

Specify the GUID which uniquely identifies the Microsoft Azure Subscription to be monitored. To know the ID that maps to the target subscription, do the following:

  1. Log in to the Microsoft Azure Portal.

  2. When the portal opens, click on the Subscriptions option (as indicated by Figure 1).

    Figure 1 : Clicking on the Subscriptions option

  3. Figure 2 then appears, listing all the subscriptions that have been configured for the target Azure AD tenant. Locate the subscription that is being monitored in the list, and check the value displayed for that subscription in the Subscription ID column.

    Figure 2 : Determining the Subscription ID

  4. Copy the Subscription ID in Figure 2 to the text box corresponding to the SUBSCRIPTION ID parameter in the test configuration page.
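Because the Subscription ID is a GUID, a quick format check can catch copy-and-paste mistakes before the value is entered in the test configuration page. The following is a minimal sketch, not part of eG Enterprise; the function name is hypothetical, and the GUID in the usage note is a made-up placeholder rather than a real subscription.

```python
import uuid


def is_valid_subscription_id(value: str) -> bool:
    """Return True if value is a GUID in the canonical 8-4-4-4-12 text form."""
    candidate = value.strip()
    try:
        parsed = uuid.UUID(candidate)
    except ValueError:
        return False
    # uuid.UUID also accepts braces and un-hyphenated hex, so compare
    # against the canonical form to reject those variants.
    return str(parsed) == candidate.lower()
```

For example, `is_valid_subscription_id("123e4567-e89b-12d3-a456-426614174000")` returns `True`, while a value with missing hyphens or stray characters returns `False`.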

Tenant ID

Specify the Directory ID of the Azure AD tenant to which the target subscription belongs. To know how to determine the Directory ID, refer to Configuring the eG Agent to Monitor the Microsoft Azure App Service.

Client ID and Client Password

The eG agent communicates with the target Microsoft Azure Subscription using Java API calls. To collect the required metrics, the eG agent requires an Access token in the form of an Application ID and the client secret value. To know how to determine the Application ID and the client secret value, refer to Configuring the eG Agent to Monitor the Microsoft Azure App Service. Specify the Application ID of the created Application in the Client ID text box and the client secret value in the Client Password text box.
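To illustrate how the Tenant ID, Client ID, and Client Password fit together, the sketch below builds (but does not send) an OAuth 2.0 client-credentials token request for the Microsoft identity platform. This is only an illustration in Python; the eG agent itself uses Java API calls, and nothing here describes its internal implementation. The function name and the placeholder values in the usage note are hypothetical.

```python
from urllib.parse import urlencode


def build_token_request(tenant_id: str, client_id: str, client_secret: str):
    """Return (url, body) for a client-credentials token request.

    The request is only constructed here; sending it is left to the caller.
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,          # Application ID (Client ID parameter)
        "client_secret": client_secret,  # client secret (Client Password parameter)
        "scope": "https://management.azure.com/.default",
    }
    # The token endpoint expects a form-encoded POST body.
    return url, urlencode(form).encode("ascii")
```

Calling `build_token_request("<tenant-id>", "<application-id>", "<secret>")` yields the URL and form-encoded body that a client would POST to obtain a token for Azure Resource Manager.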

Proxy Host

In some environments, all communication with the Azure cloud may be routed through a proxy server. In such environments, make sure that the eG agent connects to the cloud via the proxy server and collects metrics. To enable metrics collection via a proxy, specify the IP address of the proxy server and the port at which the server listens against the Proxy Host and Proxy Port parameters. By default, these parameters are set to none, indicating that the eG agent is not configured to communicate via a proxy.

Proxy Username, Proxy Password and Confirm Password

If the proxy server requires authentication, specify a valid proxy user name and password in the Proxy Username and Proxy Password parameters, respectively. Then, confirm the password by retyping it in the Confirm Password text box.

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
Measures made by the test:
Measurement    Description    Measurement Unit    Interpretation

Provisioning state

Indicates the current provisioning status of this scale set.

 

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure Value    Numeric Value
Succeeded        1
Updating         2
Error            3
Unknown          0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current provisioning status of a scale set. In the graph of this measure however, the same is represented using the numeric equivalents only.

Use the detailed diagnosis of this measure to know the location of the scale set, the disk size type, and capacity of the scale set.
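If you post-process the values of this measure in a script, the mapping in the table above can be expressed as a simple lookup. This is a minimal sketch; the names are hypothetical, and treating any unlisted state as Unknown is an assumption of the sketch, not documented behavior of the test.

```python
# Numeric equivalents taken from the Provisioning state table above.
PROVISIONING_STATE_CODES = {
    "Succeeded": 1,
    "Updating": 2,
    "Error": 3,
    "Unknown": 0,
}


def provisioning_state_code(state: str) -> int:
    """Map a Measure Value to its numeric equivalent.

    Assumption: states not listed in the table are reported as Unknown (0).
    """
    return PROVISIONING_STATE_CODES.get(state, PROVISIONING_STATE_CODES["Unknown"])
```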

Is auto scaling enabled?

Indicates whether or not auto-scaling is enabled for this scale set.

 

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure Value    Numeric Value
Yes              1
No               0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate whether or not auto-scaling is enabled for a virtual machine scale set. In the graph of this measure however, the same is represented using the numeric equivalents only.

Use the detailed diagnosis of this measure to know the profile settings of the virtual machine scale set. A profile is where the minimum, maximum, and default number of VM instances are defined. When your autoscale rules are applied, these instance limits make sure that you do not scale out beyond the maximum number of instances, or scale in below the minimum number of instances.

Total instances

Indicates the number of VM instances in this scale set.

Number

Use the detailed diagnosis of this measure to know which VMs are in a scale set.

Total profiles

Indicates the total number of profiles defined for this scale set.

Number

A profile is where the minimum, maximum, and default number of VM instances are defined. When your autoscale rules are applied, these instance limits make sure that you do not scale out beyond the maximum number of instances, or scale in below the minimum number of instances.

To know which profiles are defined for any monitored scale set, and which scaling rules are mapped to each profile, use the detailed diagnosis of this measure.
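The effect of a profile's instance limits on a scaling action can be pictured as a simple clamp. The sketch below is illustrative only; the `Profile` class and `apply_scale_action` function are hypothetical names and do not correspond to any Azure or eG Enterprise API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Instance limits of an autoscale profile (hypothetical structure)."""
    minimum: int
    maximum: int
    default: int


def apply_scale_action(current: int, change: int, profile: Profile) -> int:
    """Return the instance count after a scale action, kept within the limits.

    A positive change scales out; a negative change scales in.
    """
    return max(profile.minimum, min(profile.maximum, current + change))
```

With `Profile(minimum=2, maximum=10, default=2)`, a request to add 3 instances to a scale set that already has 9 results in 10 instances, and a request to remove 5 instances from a scale set of 3 results in 2.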

CPU utilization

Indicates the average percentage of allocated compute units that are currently in use by the VM instances in this virtual machine scale set.

Percent

If the value of this measure is consistently close to or equal to 100%, it implies that the VM instances in the scale set are over-utilizing the allocated compute units. In such a situation, the scale set should automatically scale out the number of VM instances, so that more computing resources are at the disposal of the application. To make sure that this action is triggered at the right time, you need to continuously track the variations in the value of this measure, understand the demand for CPU resources, and accordingly set the threshold for the Percentage CPU host-level metric in the scaling rule.
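One way to judge whether a candidate threshold suits the observed CPU usage is to replay recorded values of this measure against it. The following is a minimal sketch under stated assumptions: the rule is assumed to compare the average of the most recent samples with the threshold, and the function names, sample values, and window size are all hypothetical.

```python
from statistics import mean
from typing import Sequence


def would_scale_out(samples: Sequence[float], threshold: float, window: int) -> bool:
    """True if the mean of the last `window` samples exceeds the threshold."""
    if window <= 0 or len(samples) < window:
        return False  # not enough data to evaluate the rule
    return mean(samples[-window:]) > threshold


def breach_ratio(samples: Sequence[float], threshold: float) -> float:
    """Fraction of samples that exceed the threshold (0.0 for no samples)."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if s > threshold) / len(samples)
```

A threshold that is breached by most samples suggests the rule would fire almost continuously, while one that is never breached even during known busy periods suggests the rule would never trigger a scale-out.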

Incoming network traffic

Indicates the total number of bytes received on all network interfaces for this scale set.

MB

To make sure that automatic scaling occurs when network traffic is indeed high, you need to closely study the changes to these measures over time, understand how traffic normally is, and accordingly configure scaling rules using the Network In and Network Out host-level metrics.

Outgoing network traffic

Indicates the total number of bytes sent out on all network interfaces for this scale set.

MB

Data reads from disk

Indicates the total bytes read from the disks of all VMs in this scale set during the last measurement period.

MB

To make sure that automatic scaling reacts only to an unusual rise in read/write activity on disks, you need to closely study the changes to these measures over time, understand the normal disk activity levels, and accordingly configure scaling rules using the Disk Read Bytes and Disk Write Bytes host-level metrics.

Data writes to disk

Indicates the total bytes written to the disks of all VMs in this scale set during the last measurement period.

MB

Disk read operations

Indicates the average number of disk read operations performed per second by all VM instances in this scale set.

Operations/Sec

To make sure that automatic scaling occurs only when the IOPS on VM instances are unusually high, you need to closely study the changes to these measures over time, understand what the normal disk operational level is, and accordingly configure scaling rules using the Disk Read Operations/Sec and Disk Write Operations/Sec host-level metrics.

Disk write operations

Indicates the average number of disk write operations performed per second by all VM instances in this scale set.

Operations/Sec

CPU credits remaining

Indicates the number of CPU credits that are yet to be used by the VMs in this scale set.

Number

VMs accumulate CPU credits when their CPU consumption is less than their base performance level. Credits are spent whenever VMs utilize CPU more than their base performance level. If the value of this measure is very low, it implies that VMs in the scale set have been consistently using up more processing power than their baseline. This in turn is indicative of a high demand for CPU resources.

To make sure that automatic scaling smartly detects and responds to such abnormal load conditions, you need to closely study the changes to these measures over time, understand what is normal CPU consumption, and accordingly configure scaling rules using the CPU Credits Remaining and CPU Credits Consumed host-level metrics.
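The way credits accumulate and drain can be pictured with a simplified per-minute model. This sketch is an approximation for illustration only: it assumes one credit corresponds to one vCPU running at 100% for one minute, treats usage and baseline as percentages of a single vCPU, and caps the balance at a hypothetical maximum. It is not the exact accounting used by Azure.

```python
from typing import Iterable


def simulate_credits(start: float, baseline_pct: float,
                     usage_pct_per_minute: Iterable[float],
                     max_credits: float) -> float:
    """Return the credit balance after replaying per-minute CPU usage.

    Usage below the baseline banks credits; usage above it spends them.
    The balance is kept between 0 and max_credits (assumed limits).
    """
    balance = start
    for usage in usage_pct_per_minute:
        balance += (baseline_pct - usage) / 100.0
        balance = max(0.0, min(max_credits, balance))
    return balance
```

Under this model, a VM with a 20% baseline that runs at 100% for ten minutes spends about 8 credits, which is why a consistently low value of the CPU credits remaining measure points to sustained demand above the baseline.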

CPU credits consumed

Indicates the count of CPU credits consumed by the VMs in this scale set.

Number

Data disk read data

Indicates the amount of data read from data disks of the VMs in this scale set during the last measurement period.

MB

To expand your available storage, Azure virtual machine scale sets support VM instances with attached data disks. Typically, data disks are added if you need to install applications and store data. Data disks should be used in any situation where durable and responsive data storage is desired.

Where data disks are used, you may want to configure automatic scaling to occur when read/write activity levels on data disks are really abnormal. In this case, you may want to closely study the changes to the value of these measures, understand the norms, and accordingly define scaling rules using the Data Disk Read Bytes/sec and Data Disk Write Bytes/sec host-level metrics.

 

Data disk write data

Indicates the amount of data written to data disks of the VMs in this scale set during the last measurement period.

MB

Data disk read operations

Indicates the average number of disk read operations performed per second on all data disks in this scale set.

Operations/Sec

To make sure that automatic scaling occurs only when the IOPS on data disks are unusually high, you need to closely study the changes to these measures over time, understand the norms of usage, and accordingly configure scaling rules using the Data Disk Read Operations/Sec and Data Disk Write Operations/Sec host-level metrics.

 

Data disk write operations

Indicates the average number of disk write operations performed per second on all data disks in this scale set.

Operations/Sec

Data disk queue depth

Indicates the number of enqueued I/O requests for data disks.

Number

A high value for this measure could be a sign of an I/O processing overload. Make sure that your autoscaling rules capture such scenarios and automatically scale out VM instances to handle the additional load.

OS disk read data

Indicates the amount of data read from OS disks of the VMs in this scale set during the last measurement period.

MB

When a scale set is created or scaled, an OS disk is automatically attached to each VM. Operating system disks can be sized up to 2 TB, and host the VM instance's operating system. The OS disk is labeled /dev/sda by default.

To enable autoscaling rules to detect and appropriately respond to abnormal read/write activity on OS disks, you may want to:

  • Closely study the changes to the value of these measures;

  • Understand the norms of disk usage, and;

  • Accordingly define scaling rules using the OS Disk Read Bytes/sec and OS Disk Write Bytes/sec host-level metrics.

OS disk write data

Indicates the amount of data written to OS disks of the VMs in this scale set during the last measurement period.

MB

OS disk read operations

Indicates the average number of disk read operations performed per second on all OS disks in this scale set.

Operations/Sec

To make sure that automatic scaling occurs when the IOPS on OS disks are unusually high, you need to closely study the changes to these measures over time, understand the norms of usage, and accordingly configure scaling rules using the OS Disk Read Operations/Sec and OS Disk Write Operations/Sec host-level metrics.

OS disk write operations

Indicates the average number of disk write operations performed per second on all OS disks in this scale set.

Operations/Sec

OS disk queue depth

Indicates the number of enqueued I/O requests for OS disks.

Number

A high value for this measure could be a sign of an I/O processing overload. Make sure that your autoscaling rules capture such scenarios and automatically scale out VM instances to handle the additional load.

The detailed diagnosis of the Provisioning state measure reveals the location of the scale set, the disk size type, and capacity of the scale set.

Figure 3 : The detailed diagnosis of the Provisioning state measure reported by the Azure Virtual Machine Scale Set test

Use the detailed diagnosis of the Total instances measure to know which VMs are in a scale set, and the current status of each VM - i.e., whether they are running or not.

Figure 4 : The detailed diagnosis of the Total instances measure