AWS CloudSearch Test

Amazon CloudSearch is a fully managed service in the cloud that makes it easy to set up, manage, and scale a search solution for your website or application. With Amazon CloudSearch you can search large collections of data such as web pages, document files, forum posts, or product information.

To start searching your data with Amazon CloudSearch, you simply take the following steps:

  • Create and configure a search domain
  • Configure indexing options for your data
  • Upload your data for indexing
  • Send search requests to your domain
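To illustrate the "Upload your data for indexing" step, here is a minimal sketch that turns plain records into a JSON document batch, in which each entry is an add operation carrying a document ID and its fields. The function name build_document_batch and the sample product records are hypothetical and used only for illustration. Creating the domain and sending the batch to it are not shown.

```python
import json


def build_document_batch(records, id_field="id"):
    """Turn plain dicts into a JSON document batch of 'add' operations."""
    batch = []
    for record in records:
        # Everything except the ID becomes a document field.
        fields = {k: v for k, v in record.items() if k != id_field}
        batch.append({"type": "add", "id": str(record[id_field]), "fields": fields})
    return json.dumps(batch)


# Hypothetical product records; the field names are illustrative only.
products = [
    {"id": "p1", "title": "Trail shoe", "price": 89},
    {"id": "p2", "title": "Rain jacket", "price": 120},
]
batch_json = build_document_batch(products)
```

The field names in each document would have to match the index fields configured for the domain in the previous step.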

You create an Amazon CloudSearch search domain for each collection of data that you want to make searchable. A search domain encapsulates your data and the hardware and software resources required to operate a search engine. Each search domain has one or more search instances. A search instance is a server instance that has a finite amount of RAM and CPU resources for indexing data and processing requests. The number of search instances in a domain depends on the documents in your collection and the volume and complexity of your search requests.

As the amount of data added to a domain and the volume of traffic to it increase, CloudSearch automatically scales your search domain to maximize search performance. Scaling is performed by automatically adding more search instances to the domain, and by partitioning the index across these instances. If you need more capacity than the additional search instances can offer, you can explicitly increase the number of search instances or instance replicas. To decide whether additional capacity is required, you first need to determine how much of the current capacity is in use. For this, use the AWS CloudSearch Test.

This test automatically discovers the search domains that have been configured in a region. For each domain, the test tracks the addition of searchable documents to that domain, and reports how much index capacity these documents consume and how many index partitions have already been created to support this load. From this, administrators can quickly infer whether the domain is about to exhaust its current capacity. If so, administrators can then determine whether the current number of partitions can support the anticipated load on the domain. In the process, administrators can estimate how many more partitions would be required to maximize the throughput and minimize the latency of search queries.

Optionally, you can configure this test to report metrics across all domains configured for the AWS account that the test uses.

Target of the test: Amazon EC2 Cloud

Agent deploying the test: A remote agent

Outputs of the test: One set of results for each domain name / client ID

First-level descriptor: AWS Region

Second-level descriptor: ClientID / DomainName depending upon the option chosen from the CloudSearch Filter Name parameter of this test

Configurable parameters for the test
Parameter Description

Test Period

How often the test should be executed.

Host

The host for which the test is to be configured.

AWS Access Key, AWS Secret Key, Confirm AWS Access Key, Confirm AWS Secret Key

To monitor an Amazon EC2 instance, the eG agent has to be configured with the access key and secret key of a user with a valid AWS account. For this purpose, we recommend that you create a special user on the AWS cloud, obtain the access and secret keys of this user, and configure this test with these keys. The procedure for this has been detailed in the Obtaining an Access key and Secret key topic. Make sure you reconfirm the access and secret keys you provide here by retyping them in the corresponding Confirm text boxes.

Proxy Host and Proxy Port

In some environments, all communication with the AWS EC2 cloud and its regions could be routed through a proxy server. In such environments, you should make sure that the eG agent connects to the cloud via the proxy server and collects metrics. To enable metrics collection via a proxy, specify the IP address of the proxy server and the port at which the server listens against the Proxy Host and Proxy Port parameters. By default, these parameters are set to none, indicating that the eG agent is not configured to communicate via a proxy.

Proxy User Name, Proxy Password, and Confirm Password

If the proxy server requires authentication, specify a valid proxy user name and password in the Proxy User Name and Proxy Password parameters, respectively. Then, confirm the password by retyping it in the Confirm Password text box. By default, these parameters are set to none, indicating that the proxy server does not require authentication.

Proxy Domain and Proxy Workstation

If a Windows NTLM proxy is to be configured for use, then additionally, you will have to configure the Windows domain name and the Windows workstation name required for the same against the Proxy Domain and Proxy Workstation parameters. If the environment does not support a Windows NTLM proxy, set these parameters to none.
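The way the proxy parameters combine can be sketched as follows. This is only an illustration of the "none means not configured" convention described above, not the eG agent's actual logic. The function name build_proxy_url and the sample address are hypothetical, and the NTLM domain and workstation settings are not covered because they cannot be expressed in a plain proxy URL.

```python
from urllib.parse import quote


def build_proxy_url(host, port, user="none", password="none"):
    """Return an http proxy URL, or None when no proxy is configured."""
    if str(host).lower() == "none" or str(port).lower() == "none":
        return None  # Proxy Host / Proxy Port left at their defaults
    credentials = ""
    if str(user).lower() != "none":
        # Percent-encode credentials so characters like '@' stay unambiguous.
        credentials = quote(user, safe="")
        if str(password).lower() != "none":
            credentials += ":" + quote(password, safe="")
        credentials += "@"
    return f"http://{credentials}{host}:{port}"
```

For example, build_proxy_url("none", "none") yields no proxy at all, while supplying a host, port, user name, and password yields a URL with the credentials embedded.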

Exclude Region

Here, you can provide a comma-separated list of region names or patterns of region names that you do not want to monitor. For instance, to exclude regions with names that contain 'east' and 'west' from monitoring, your specification should be: *east*,*west*
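To see which regions a given specification would exclude, the matching can be approximated with shell-style wildcards. This sketch assumes that the patterns behave like case-insensitive glob patterns; the exact matching rules used by the test are not stated here, and the function name regions_to_monitor is hypothetical.

```python
from fnmatch import fnmatchcase


def regions_to_monitor(all_regions, exclude_spec):
    """Drop regions whose names match any comma-separated wildcard pattern."""
    patterns = [p.strip().lower() for p in exclude_spec.split(",") if p.strip()]
    return [
        region
        for region in all_regions
        if not any(fnmatchcase(region.lower(), pattern) for pattern in patterns)
    ]


regions = ["us-east-1", "us-west-2", "eu-central-1", "ap-southeast-1"]
remaining = regions_to_monitor(regions, "*east*,*west*")
```

Note that under this assumption a pattern such as *east* also matches names like ap-southeast-1, so broad patterns may exclude more regions than intended.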

CloudSearch Filter Name

By default, this parameter is set to DomainName. This means that by default, this test will report metrics for each search domain that is configured.

If required, you can override this default setting by setting the CloudSearch Filter Name to ClientID. In this case, the test will report metrics for the AWS account that is configured for this test. The measures reported for the ClientID will be aggregated across all search domains configured for that ClientID.
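The difference between the two settings can be illustrated with a small sketch that groups per-domain readings either by domain name or by client ID. It assumes that count-type measures are simply summed across domains, which is an assumption rather than a documented rule. The percentage measure is left out because how it is combined across domains is not stated. All names and sample values are hypothetical.

```python
from collections import defaultdict

# Count-type measures only; names are illustrative.
COUNT_MEASURES = ("successful_requests", "searchable_documents", "index_partitions")


def group_measures(samples, filter_name="DomainName"):
    """Group per-domain samples by the chosen descriptor and sum the counts."""
    if filter_name not in ("DomainName", "ClientID"):
        raise ValueError("filter_name must be 'DomainName' or 'ClientID'")
    key = "domain_name" if filter_name == "DomainName" else "client_id"
    totals = defaultdict(lambda: dict.fromkeys(COUNT_MEASURES, 0))
    for sample in samples:
        for measure in COUNT_MEASURES:
            totals[sample[key]][measure] += sample[measure]
    return dict(totals)


# Hypothetical readings for two domains owned by the same account.
samples = [
    {"client_id": "111122223333", "domain_name": "products",
     "successful_requests": 120, "searchable_documents": 5000, "index_partitions": 1},
    {"client_id": "111122223333", "domain_name": "forum",
     "successful_requests": 30, "searchable_documents": 800, "index_partitions": 1},
]
per_domain = group_measures(samples)
per_account = group_measures(samples, "ClientID")
```

With DomainName, each domain keeps its own set of values; with ClientID, the two domains collapse into a single set of values for the account.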

Measurements made by the test

Measurement

Description

Measurement Unit

Interpretation

Successful search requests

By default, this measure represents the number of search queries/requests that were successfully processed by this search domain.

If the CloudSearch Filter Name is set to ClientID, then this measure will report the number of search requests that were successfully processed by all search domains configured for this AWS account.

Number

A high value is desired for this measure.

A steady drop in the value of this measure is a cause for concern, as it implies poor search performance. You may want to investigate the reasons for the same.
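One simple way to spot such a steady drop is to check whether each of the most recent readings is lower than the one before it. The sketch below assumes that three consecutive decreases count as "steady"; this window, like the function name is_steadily_dropping, is an illustrative choice and not a threshold defined by the test.

```python
def is_steadily_dropping(values, window=3):
    """True when each of the last `window` readings is lower than the one before."""
    if window < 1 or len(values) < window + 1:
        return False  # not enough history to judge
    recent = values[-(window + 1):]
    return all(later < earlier for earlier, later in zip(recent, recent[1:]))
```

A single dip followed by a recovery does not trigger the check, which keeps short-lived fluctuations from being flagged.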

Searchable documents in domain's search index

By default, this measure represents the number of searchable documents in this domain's search index.

If the CloudSearch Filter Name is set to ClientID, then this measure will report the number of searchable documents across all search domains configured for this AWS account.

Number

The maximum number of documents a search domain can hold depends upon the following:

  • Document size
  • Indexing options: You configure the search domain with an index field for each document field. You can specify multiple indexing options for each field, such as the type of the field and whether the field is searchable, facet enabled, return enabled, sort enabled, and highlight enabled. These indexing options directly impact how many documents fit onto a search instance.
  • Search instance type: By default, CloudSearch makes the following instance types available:

    • search.m1.small (Small Search Instance)
    • search.m3.medium (Medium Search Instance)
    • search.m3.large (Large Search Instance)
    • search.m3.xlarge (Extra Large Search Instance)
    • search.m3.2xlarge (Double Extra Large Search Instance).

Search instance's index usage

By default, this measure represents the percentage of this domain's index capacity that has been used.

If the CloudSearch Filter Name is set to ClientID, then this measure represents the percentage of index capacity used across all search domains configured for this AWS account.

Percent

A value close to 100% indicates that the search domain is about to exhaust the index capacity of its current search instance type.

Typically, when the amount of data you add to your domain exceeds the capacity of the initial search instance type, Amazon CloudSearch scales your search domain to a larger search instance type. After a domain exceeds the capacity of the largest search instance type, Amazon CloudSearch partitions the search index across multiple search instances.

To know whether the domain has exceeded the capacity of its largest instance type, check the value of the Index partitions measure for that domain. If this measure reports a non-zero value, you can conclude that the largest instance type's capacity has been exceeded.

Index partitions

By default, this measure represents the number of partitions across which the search index of this search domain is distributed.

If the CloudSearch Filter Name is set to ClientID, then this measure represents the number of partitions across which all the search domains configured for this AWS account have distributed their search index.

Number

If this measure reports a non-zero value, it indicates that the search domain has exceeded the capacity of its largest instance type.

If you expect the load on the search domain to increase further, you may have to explicitly increase the number of instances that your index is partitioned across.

The maximum number of search instances that can be deployed for a domain is 50 and the maximum number of partitions is 10. To increase these limits, you will have to submit an explicit request to Amazon.
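The limits above can be combined with the index usage reading to estimate how much room a domain has left. The following is a minimal sketch under stated assumptions: the 90% usage threshold is an arbitrary illustrative value, the function name capacity_headroom is hypothetical, and the current instance count is not a measure reported by this test, so it would have to be obtained separately.

```python
MAX_SEARCH_INSTANCES = 50  # default limit quoted above
MAX_PARTITIONS = 10        # default limit quoted above


def capacity_headroom(index_usage_pct, partitions, instances, usage_threshold=90.0):
    """Summarize how close a domain is to its index capacity and default limits."""
    return {
        # usage_threshold is an assumed cut-off for "close to 100%".
        "near_index_capacity": index_usage_pct >= usage_threshold,
        "partitions_left": max(MAX_PARTITIONS - partitions, 0),
        "instances_left": max(MAX_SEARCH_INSTANCES - instances, 0),
    }
```

A domain that is near its index capacity and has few partitions left before the default limit is a candidate for a limit increase request.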