Azure Cosmos DB Test

Azure Cosmos DB is a fully managed NoSQL database service that offers fast responsiveness, automatic and instant scalability, high availability, and enterprise-grade security. The service also supports multi-region data distribution anywhere in the world, and provides open-source APIs and SDKs for popular languages. It takes database administration off your hands with automatic management, updates, and patching. It also handles capacity management with cost-effective serverless and automatic scaling options that respond to application needs, so that capacity matches demand.

To begin using Azure Cosmos DB, first create an Azure Cosmos DB account in an Azure resource group in the required subscription, and then create databases, containers, and items under it.

Cosmos DB's guaranteed high availability, high throughput, low latency, and tunable consistency are some of the reasons why many mission-critical web, mobile, gaming, and IoT applications use it today. If this database service fails to deliver the guaranteed service levels, the performance of the dependent business-critical applications will deteriorate, and the user experience with those applications will suffer as well. For instance, if a Cosmos DB account suffers from high service downtime, frequent errors/failures, significant read/write latencies, inadequate throughput, and/or insufficient storage capacity, then the applications and users that rely on that account to store and retrieve data will be adversely impacted. To avoid this, administrators should track the status of and requests to each Azure Cosmos DB account that is configured for the Azure subscription, quickly capture problems in the availability, overall health, and operations of that account, and resolve them before applications and users are affected. This is where the Azure Cosmos DB test helps!

For each Azure Cosmos DB account that is configured for the target Azure subscription, this test reports the status of that account, and alerts administrators if the account's status is abnormal. The test also tracks read/write requests to each account, measures the responsiveness of the database service to these requests, and proactively alerts administrators to potential processing bottlenecks. It periodically checks the availability of the database service, and instantly alerts administrators if the service becomes unavailable. Furthermore, the test monitors the database operations performed on every account, reveals the cost of each operation, and draws administrator attention to the operations that are costliest in terms of resource usage. Administrators are notified if requests are throttled because the databases/containers in the account are not provisioned with enough throughput to process costly operations. The test also monitors the storage space usage of the account, and forewarns administrators of potential space crunches. This way, the test helps administrators measure and evaluate the various service level criteria for the Azure Cosmos DB service, and determine whether the performance levels promised by this database service are actually achieved.

Target of the Test: A Microsoft Azure Subscription

Agent deploying the test: A remote agent

Output of the test: One set of results for each Azure Cosmos DB account configured for every resource group in the target Azure Subscription

Configurable parameters for the test
Parameters Description

Test Period

How often the test should be executed.

Host

The host for which the test is to be configured.

Subscription ID

Specify the GUID that uniquely identifies the Microsoft Azure Subscription to be monitored. To find the ID that maps to the target subscription, do the following:

  1. Log in to the Microsoft Azure Portal.

  2. When the portal opens, click on the Subscriptions option (as indicated by Figure 1).

    Figure 1 : Clicking on the Subscriptions option

  3. The list that appears next (see Figure 2) displays all the subscriptions that have been configured for the target Azure AD tenant. Locate the subscription being monitored in the list, and check the value displayed for that subscription in the Subscription ID column.

    Figure 2 : Determining the Subscription ID

  4. Copy the Subscription ID in Figure 2 to the text box corresponding to the SUBSCRIPTION ID parameter in the test configuration page.
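Because the Subscription ID is a GUID, a mistyped or truncated value can be caught before the test configuration is saved. The Python sketch below is a hypothetical pre-check, not part of eG Enterprise: the helper name is_valid_guid is an assumption, and it relies only on the standard uuid module. It accepts the canonical hyphenated 8-4-4-4-12 form shown in the Azure Portal, in either letter case.

```python
import uuid


def is_valid_guid(value: str) -> bool:
    """Return True if value is a GUID in the canonical 8-4-4-4-12 hyphenated form."""
    candidate = value.strip()
    try:
        parsed = uuid.UUID(candidate)
    except ValueError:
        return False  # not parseable as a GUID at all
    # uuid.UUID also accepts braces, a "urn:uuid:" prefix, or no hyphens;
    # insist on the hyphenated form that the Azure Portal displays.
    return str(parsed) == candidate.lower()
```

For example, is_valid_guid("my-subscription") returns False, which flags the value before the test ever tries to contact Azure.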

Tenant ID

Specify the Directory ID of the Azure AD tenant to which the target subscription belongs. To learn how to determine the Directory ID, refer to Configuring the eG Agent to Monitor the Microsoft Azure App Service.

Client ID and Client Password

The eG agent communicates with the target Microsoft Azure Subscription using Java API calls. To collect the required metrics, the eG agent requires an access token, which it obtains using an Application ID and a client secret value. To learn how to determine the Application ID and the key, refer to Configuring the eG Agent to Monitor the Microsoft Azure App Service. Specify the Application ID of the created application in the Client ID text box and the client secret value in the Client Password text box.
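For illustration, the request that exchanges these credentials for an access token can be sketched as follows. This is a minimal Python sketch, not the eG agent's Java implementation; it assumes the Microsoft identity platform v2.0 token endpoint and the Azure Resource Manager scope https://management.azure.com/.default, and the function name build_token_request is hypothetical. It only builds the request, so no network call is made.

```python
from urllib.parse import urlencode

# Assumptions: v2.0 token endpoint and Azure Resource Manager scope.
TOKEN_URL_TEMPLATE = "https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
ARM_SCOPE = "https://management.azure.com/.default"


def build_token_request(tenant_id: str, client_id: str,
                        client_secret: str) -> tuple[str, bytes]:
    """Return the URL and form-encoded body of an OAuth 2.0 client credentials request."""
    url = TOKEN_URL_TEMPLATE.format(tenant_id=tenant_id)  # Tenant ID (Directory ID)
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,          # Client ID (Application ID)
        "client_secret": client_secret,  # Client Password (client secret value)
        "scope": ARM_SCOPE,
    }).encode("ascii")
    return url, body
```

Posting the body with the Content-Type application/x-www-form-urlencoded (for example, with urllib.request.Request(url, data=body) and urllib.request.urlopen) should return a JSON document whose access_token field is then sent as a Bearer token on Azure management API calls.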

Proxy Host

In some environments, all communication with the Azure cloud may be routed through a proxy server. In such environments, make sure that the eG agent connects to the cloud via the proxy server to collect metrics. To enable metrics collection via a proxy, specify the IP address of the proxy server and the port at which the server listens against the Proxy Host and Proxy Port parameters. By default, these parameters are set to none, indicating that the eG agent is not configured to communicate via a proxy.

Proxy Username, Proxy Password and Confirm Password

If the proxy server requires authentication, specify a valid proxy user name and password in the Proxy Username and Proxy Password parameters, respectively. Then, confirm the password by retyping it in the Confirm Password text box.
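The way these proxy parameters combine can be sketched in Python as below. The sketch is hypothetical: build_proxy_url is not an eG Enterprise function, and it assumes an HTTP proxy that tunnels HTTPS traffic to Azure. It treats the default value none as "no proxy", and percent-encodes the credentials so that characters such as @ or : in a password do not corrupt the URL.

```python
from typing import Optional
from urllib.parse import quote


def build_proxy_url(host: str, port: str, username: str = "none",
                    password: str = "none") -> Optional[str]:
    """Return a proxy URL, or None when Proxy Host keeps its default of 'none'."""
    host = (host or "").strip()
    if host.lower() in ("", "none"):
        return None  # default: connect to Azure directly
    auth = ""
    if username and username.strip().lower() != "none":
        # Percent-encode credentials so '@' or ':' cannot break the URL.
        auth = f"{quote(username, safe='')}:{quote(password, safe='')}@"
    return f"http://{auth}{host}:{int(port)}"
```

When the function does not return None, the URL could be passed to urllib.request.ProxyHandler({"https": url}).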

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
Measures made by the test:
Measurement Description Measurement Unit Interpretation

Status

Indicates the current status of this Azure Cosmos DB account.

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure Value Numeric Value
Succeeded 1
Updating 2
Error 3

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current status of the Azure Cosmos DB account. In the graph of this measure, however, the status is represented using the numeric equivalents only.

Use the detailed diagnosis of this measure to learn more about the Azure Cosmos DB account. The detailed diagnostics reveal the region in which the account is located, the API used by the account for creating databases, whether automatic failover is enabled for the account and, if so, the failover location, the default consistency level set for the account, and more.
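The mapping between the Measure Values and their numeric equivalents in the table above can be expressed as a simple lookup. In this Python sketch, the helper names are hypothetical, and the rule that any state other than Succeeded counts as abnormal is an assumption made for illustration; it is not necessarily how eG Enterprise sets its thresholds.

```python
# Numeric equivalents of the Status measure, taken from the table above.
STATUS_CODES = {"Succeeded": 1, "Updating": 2, "Error": 3}


def status_to_numeric(state: str) -> int:
    """Return the number used to plot an account state in the measure's graph."""
    try:
        return STATUS_CODES[state]
    except KeyError:
        raise ValueError(f"unrecognized account state: {state!r}") from None


def is_abnormal(state: str) -> bool:
    """Hypothetical alert rule: any state other than Succeeded is abnormal."""
    return status_to_numeric(state) != STATUS_CODES["Succeeded"]
```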

Total requests

Indicates the total number of HTTP requests processed by this account.

Number

Successful requests

Indicates the number of HTTP requests that were successfully processed by this account.

Number

A high value is desired for this measure.

Warning requests

Indicates the number of HTTP requests to which this account responded with warnings.

Number

A low value is desired for this measure.

Bad requests

Indicates the number of HTTP requests to which this account responded with the error code HTTP 400 Bad Request.

Number

Responses with the code HTTP 400 Bad Request are sent under the following circumstances:

  • The JSON, SQL, or JavaScript in the request body is invalid;

  • The required properties of a resource are not present or set in the body of the POST or PUT on the resource;

  • The consistency level for a GET operation is overridden with a consistency level stronger than the one set for the account;

  • A request that requires the x-ms-documentdb-partitionkey header does not include it.

Ideally therefore, the value of this measure should be 0.

Unauthorized requests

Indicates the number of HTTP requests to which this account responded with the error code HTTP 401 Unauthorized.

Number

Responses with the code HTTP 401 Unauthorized are sent when the Authorization header is invalid for the requested resource.

Ideally therefore, the value of this measure should be 0.

Throttled requests

Indicates the number of requests that this account throttled.

Number

Azure Cosmos DB allows you to set provisioned throughput on your databases and containers. This provisioned throughput is set using RUs - i.e., Request Units. A Request Unit is a performance currency abstracting the system resources such as CPU, IOPS, and memory that are required to perform the database operations supported by Azure Cosmos DB. In short, the cost of database operations is measured by RUs.

A request is typically throttled if the cost of processing that request exceeds the provisioned throughput. Throttled requests receive the response code HTTP 429 Too Many Requests. The most common solution to this problem is to scale up the RUs for the given container.
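The throttling condition described above can be sketched as a per-second comparison of consumed RUs against provisioned RU/s. This Python sketch is a simplification under stated assumptions: it treats the provisioned throughput as a single pool, whereas Azure Cosmos DB actually distributes it across physical partitions, and the 20 percent headroom used to suggest a new value is a hypothetical default. The suggestion is rounded up to a multiple of 100 RU/s, the increment used for manually provisioned throughput.

```python
import math


def throttled_seconds(ru_per_second: list[float], provisioned_rus: float) -> list[int]:
    """Return the indexes of one-second samples whose RU usage exceeds provisioned RU/s.

    Simplification: provisioned throughput is treated as one pool, although
    Azure Cosmos DB spreads it across physical partitions.
    """
    return [i for i, used in enumerate(ru_per_second) if used > provisioned_rus]


def suggested_rus(ru_per_second: list[float], headroom: float = 0.2) -> int:
    """Suggest RU/s covering the observed peak plus headroom, in steps of 100 RU/s."""
    peak = max(ru_per_second)
    return math.ceil(peak * (1 + headroom) / 100) * 100


samples = [350.0, 420.0, 390.0, 510.0]  # RUs consumed in four consecutive seconds
print(throttled_seconds(samples, 400))  # [1, 3]: seconds at risk of HTTP 429
print(suggested_rus(samples))           # 700
```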

Internal server error

Indicates the number of internal server errors that this account encountered.

Number

Responses with the code HTTP 500 Internal Server Error refer to internal server errors, which typically occur if the input bytes are not in the base64 format. Ideally, the value of this measure should be 0.

Service unavailable

Indicates the number of HTTP requests to which this account responded with the code HTTP 503 Service Unavailable.

Number

A response with the code HTTP 503 Service Unavailable is sent if the request could not be completed because the service was unavailable. This situation could happen due to network connectivity or service availability issues. It is safe to retry the operation. If the issue persists, contact support.

Ideally therefore, the value of this measure should be 0.
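The response codes discussed above can be tallied into the corresponding measures. The Python sketch below assumes a direct mapping from status codes to measure names, with HTTP 429 being the code Azure Cosmos DB returns for throttled requests. Warning requests have no single status code and are not modeled, and anything that is neither a 2xx code nor a mapped code falls into a hypothetical Other bucket. The RETRYABLE set reflects the guidance that 503 is safe to retry; 429 is transient as well, since the request can be retried after the interval that Azure Cosmos DB indicates in the x-ms-retry-after-ms response header.

```python
from collections import Counter

# Assumed mapping of response codes to this test's measures.
MEASURE_BY_STATUS = {
    400: "Bad requests",
    401: "Unauthorized requests",
    429: "Throttled requests",
    500: "Internal server error",
    503: "Service unavailable",
}
# Transient conditions that a client can retry.
RETRYABLE = {429, 503}


def classify(status: int) -> str:
    """Return the measure a single HTTP status code would count towards."""
    if 200 <= status < 300:
        return "Successful requests"
    return MEASURE_BY_STATUS.get(status, "Other")  # e.g. 404, 409, 412


def tally(statuses: list[int]) -> Counter:
    """Count responses per measure, plus the overall total."""
    counts = Counter(classify(s) for s in statuses)
    counts["Total requests"] = len(statuses)
    return counts
```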

Average number of requests per second

Indicates the average number of requests this account processed per second.

Number

A consistent drop in the value of this measure could indicate a processing bottleneck.
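Assuming the measure is the total number of requests divided by the length of the measurement period, the measure and a "consistent drop" check can be sketched in Python as follows. The baseline, the 50 percent ratio, and the three-sample run length are hypothetical tuning values, not eG Enterprise defaults.

```python
def average_rps(total_requests: int, period_seconds: float) -> float:
    """Average number of requests per second over one measurement period."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return total_requests / period_seconds


def consistent_drop(samples: list[float], baseline: float,
                    ratio: float = 0.5, runs: int = 3) -> bool:
    """True if each of the last `runs` samples is below `ratio` times `baseline`."""
    if len(samples) < runs:
        return False  # not enough history to call the drop consistent
    return all(s < baseline * ratio for s in samples[-runs:])
```

A single low sample does not trigger the check; only a sustained run below the threshold does, which matches the distinction between a momentary dip and a consistent drop.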

Observed read latency

Indicates the read latency noticed in this account.

Seconds

If the value of either of these measures is consistently high, it is a clear indicator of processing latencies. To know where the latency is more pronounced, in read operations or in write operations, compare the values of these two measures.

Observed write latency

Indicates the write latency noticed in this account.

Seconds

Total storage capacity per account

Indicates the total storage capacity of this account across all its databases and containers.

MB

Available storage capacity per account

Indicates the free/unused storage space in this account.

MB

A high value is desired for this measure. A very low value implies that the databases/containers in the account have almost run out of free space, that is, storage space has been excessively utilized. If this usage pattern continues, the account will soon face serious storage space contention. To avert this, find out what type of objects, data objects or index objects, are hogging the storage space, and check whether any of those objects can be removed to free up space. For that, first compare the value of the Total data size per account measure with that of the Total index size per account measure.

Total data size per account

Indicates the total storage space in this account used up by data.

MB

In the event of abnormal storage space usage, compare the value of these measures to know what type of objects are hogging storage space - data objects or index objects.

Total index size per account

Indicates the total storage space in this account used up by indexes.

MB
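The comparison suggested for these storage measures can be sketched as follows. This Python helper is hypothetical; it takes the four storage measures (in MB) as inputs, computes the percentage of capacity in use, and reports whether data or indexes account for more of the space.

```python
def storage_report(total_mb: float, available_mb: float,
                   data_mb: float, index_mb: float) -> dict:
    """Summarize account storage from the four storage measures (values in MB)."""
    used_pct = 100.0 * (total_mb - available_mb) / total_mb if total_mb else 0.0
    return {
        "used_percent": round(used_pct, 1),
        # Which object type is consuming more of the used space.
        "larger_consumer": "data" if data_mb >= index_mb else "index",
        "index_to_data_ratio": round(index_mb / data_mb, 2) if data_mb else None,
    }
```

An index that is large relative to the data may be worth reviewing, since Azure Cosmos DB indexes every property by default unless the container's indexing policy excludes some paths.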

Total documents per account

Indicates the number of documents in this account's storage.

Number

Total request units

Indicates the throughput used by this account in terms of request units.

Number

As explained for the Throttled requests measure, the provisioned throughput of databases and containers is set in Request Units (RUs), the performance currency in which the cost of every database operation is measured.

If the value of this measure suddenly spikes, check the value of the Throttled requests measure. A non-zero value for the Throttled requests measure implies that RU consumption has exceeded the provisioned throughput, resulting in request throttling. To avoid this, you may want to increase the provisioned throughput to suit the actual usage, or identify the RU-intensive operations and see if they can be controlled. For the latter, compare the values of the following measures to identify the costly operations: Request units on query operations, Request units on update operations, Request units on delete operations, Request units on insert operations, Request units on count operations, and Request units on other operations.

Maximum reserved units per second

Indicates the maximum number of reserved Request units (RUs) consumed by this account every second.

Number

Azure Cosmos DB reserved capacity pricing helps you enjoy cost savings of up to 65 percent and enhanced availability SLAs, while reducing the burden of capacity planning. After you buy Azure Cosmos DB reserved capacity, the reservation discount is automatically applied to Azure Cosmos DB resources that match the attributes and quantity of the reservation. A reservation covers the throughput provisioned for Azure Cosmos DB resources.

Using the values of these measures, you can ascertain how much of the reserved capacity is actually utilized by an account. Based on your observations, you can decide whether to increase the reserved capacity to avail additional cost benefits, while keeping the reservation aligned with actual usage.

Maximum reserved unit consumed per minute

Indicates the maximum number of reserved Request units (RUs) consumed by this account every minute.

Number
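To gauge how much of a reservation is actually used, the peak RU/s reported by these measures can be compared with the reserved throughput. In this Python sketch, the reserved RU/s is an input you would supply yourself (this test does not report it), and the helper names are hypothetical. Note that converting a per-minute figure to RU/s yields an average that hides short peaks within the minute.

```python
def reservation_utilization(peak_rus_per_second: float,
                            reserved_rus_per_second: float) -> float:
    """Peak RU/s consumption as a percentage of the reserved RU/s."""
    if reserved_rus_per_second <= 0:
        raise ValueError("reserved_rus_per_second must be positive")
    return 100.0 * peak_rus_per_second / reserved_rus_per_second


def per_minute_to_per_second(rus_per_minute: float) -> float:
    """Average RU/s within a minute; short peaks inside the minute are hidden."""
    return rus_per_minute / 60.0
```

A utilization that stays well below 100 percent suggests the reservation exceeds actual needs, while values at or above 100 percent suggest that a larger reservation could cover more of the consumed throughput.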

Request units on query operations

Indicates the number of Request units (RU) consumed by query operations performed on databases/containers in this account.

Number

If the Total request units measure reports an unusually high value, then compare the value of these measures to identify the costly / RU-intensive operations.

Request units on update operations

Indicates the number of Request units (RU) consumed by update operations performed on databases/containers in this account.

Number

Request units on delete operations

Indicates the number of Request units (RU) consumed by delete operations performed on databases/containers in this account.

Number

Request units on insert operations

Indicates the number of Request units (RU) consumed by insert operations performed on databases/containers in this account.

Number

Request units on count operations

Indicates the number of Request units (RU) consumed by count operations performed on databases/containers in this account.

Number

Request units on other operations

Indicates the number of Request units (RU) consumed by all operations, other than query, update, delete, insert, and count operations, that are performed on databases/containers in this account.

Number
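Comparing the per-operation RU measures amounts to ranking them and looking at each one's share of the total. The Python sketch below is hypothetical; the operation labels are shortened forms of the measure names. Dividing an operation's RUs by its request count (from the Query requests, Insert requests, and similar measures) gives an approximate average cost per request, which helps distinguish many cheap requests from a few expensive ones.

```python
def costliest_operations(ru_by_operation: dict[str, float],
                         top: int = 3) -> list[tuple[str, float, float]]:
    """Rank operations by RU consumption, with each one's percentage of the total."""
    total = sum(ru_by_operation.values())
    ranked = sorted(ru_by_operation.items(), key=lambda kv: kv[1], reverse=True)
    return [(op, rus, round(100.0 * rus / total, 1) if total else 0.0)
            for op, rus in ranked[:top]]


def ru_per_request(rus: float, requests: int) -> float:
    """Approximate average RU cost of one request of a given operation type."""
    return rus / requests if requests else 0.0
```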

Query requests

Indicates the number of query requests processed by this account.

Number

Update requests

Indicates the number of update requests processed by this account.

Number

Delete requests

Indicates the number of delete requests processed by this account.

Number

Insert requests

Indicates the number of insert requests processed by this account.

Number

Count requests

Indicates the number of count requests processed by this account.

Number

Other requests

Indicates the number of requests, other than query / update / delete / insert / count requests, that are processed by this account.

Number

Failed query requests

Indicates the number of query requests that this account failed to process.

Number

Ideally, the value of this measure should be 0. A non-zero value implies that one/more requests have failed.

Failed update requests

Indicates the number of update requests that this account failed to process.

Number

Ideally, the value of this measure should be 0. A non-zero value implies that one/more requests have failed.

Failed delete requests

Indicates the number of delete requests that this account failed to process.

Number

Ideally, the value of this measure should be 0. A non-zero value implies that one/more requests have failed.

Failed insert requests

Indicates the number of insert requests that this account failed to process.

Number

Ideally, the value of this measure should be 0. A non-zero value implies that one/more requests have failed.

Failed count requests

Indicates the number of count requests that this account failed to process.

Number

Ideally, the value of this measure should be 0. A non-zero value implies that one/more requests have failed.

Failed other requests

Indicates the number of requests, other than query / update / delete / insert / count requests, that this account failed to process.

Number

Ideally, the value of this measure should be 0. A non-zero value implies that one/more requests have failed.
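Absolute failure counts are easier to judge alongside the corresponding request counts. The following Python sketch expresses each failed-request measure as a percentage of the matching request measure. It assumes that the request counts include failed requests, and the function name is hypothetical.

```python
def failure_rates(requests: dict[str, int], failed: dict[str, int]) -> dict[str, float]:
    """Failed requests as a percentage of processed requests, per operation type."""
    rates = {}
    for op, count in requests.items():
        bad = failed.get(op, 0)  # operations with no failures may be omitted
        rates[op] = round(100.0 * bad / count, 2) if count else 0.0
    return rates
```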

Service availability

Indicates whether/not this account is currently available.

Percent

If this measure reports the value 100, it means that the database service provided by this account is available. The value 0 on the other hand, denotes that the database service delivered by this account is unavailable.

Consistency level

Indicates the percentage of requests that meet the consistency guarantee of the consistency level chosen for this account.

Percent

Distributed databases that rely on replication for high availability, low latency, or both must make a fundamental tradeoff between read consistency, availability, latency, and throughput; in other words, they have to compromise on one for the sake of the other. To balance read consistency against the other parameters, Azure Cosmos DB offers five well-defined levels of consistency, namely: Strong, Bounded Staleness, Session, Consistent Prefix, and Eventual.

Azure Cosmos DB guarantees that 100 percent of read requests meet the consistency guarantee for the consistency level chosen.

For instance, in the Strong level, reads are guaranteed to return the most recent committed version of an item. In bounded staleness consistency, reads are guaranteed to honor the consistent-prefix guarantee. In session consistency, within a single client session, reads are guaranteed to honor the consistent-prefix, monotonic reads, monotonic writes, read-your-writes, and write-follows-reads guarantees. The consistent prefix level guarantees that reads never see out-of-order writes. In eventual consistency, there is no ordering guarantee for reads; in the absence of any further writes, the replicas eventually converge. Eventual consistency is the weakest form of consistency because a client may read values that are older than the ones it had read before.

Use the value of this measure to determine what percentage of read requests meet the consistency guarantee of the chosen consistency level. Ideally, the value of this measure should be 100. Lower values indicate that the consistency guarantees are not met, which is a cause for concern and should be investigated.
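The five consistency levels listed above form an ordering from strongest to weakest. That ordering also explains one of the causes of HTTP 400 Bad Request responses: a request may relax the account's default consistency level, but not strengthen it. The Python sketch below encodes the ordering; the enum values and the function name are illustrative assumptions, not an Azure SDK API.

```python
from enum import IntEnum


class Consistency(IntEnum):
    """The five Azure Cosmos DB consistency levels; a higher value is stronger."""
    STRONG = 5
    BOUNDED_STALENESS = 4
    SESSION = 3
    CONSISTENT_PREFIX = 2
    EVENTUAL = 1


def is_allowed_override(account_default: Consistency, requested: Consistency) -> bool:
    """A request may use the default or a weaker level; a stronger one is rejected."""
    return requested <= account_default
```

For an account whose default is Session, a read that requests Eventual consistency is allowed, while one that requests Strong consistency would be rejected with HTTP 400.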

The detailed diagnosis of the Status measure, depicted by Figure 3, reveals the region in which each Azure Cosmos DB account is located, the API used by the account for creating databases, whether automatic failover is enabled and, if so, the failover location, the default consistency level set for the account, and more.

Figure 3 : The detailed diagnosis of the Status measure reported by the Azure Cosmos DB test