AWS CloudSearch Test

Amazon CloudSearch is a fully managed service in the cloud that makes it easy to set up, manage, and scale a search solution for your website or application. With Amazon CloudSearch you can search large collections of data such as web pages, document files, forum posts, or product information.

To start searching your data with Amazon CloudSearch, you simply take the following steps:

  • Create and configure a search domain
  • Configure indexing options for your data
  • Upload your data for indexing
  • Send search requests to your domain
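The last two steps can be sketched with the Python standard library alone. The snippet below builds a document batch in the JSON add/delete format that CloudSearch accepts for uploads, and composes a search request URL. The endpoint host name, the document ID, and the field names (title, year) are hypothetical placeholders, and nothing is sent over the network.

```python
import json
from urllib.parse import urlencode

API_VERSION = "2013-01-01"  # CloudSearch API version used in endpoint paths


def build_batch(documents):
    """Return a JSON document batch that adds each (doc_id, fields) pair."""
    operations = [
        {"type": "add", "id": doc_id, "fields": fields}
        for doc_id, fields in documents
    ]
    return json.dumps(operations)


def build_search_url(search_endpoint, query, size=10):
    """Compose a search request URL for a domain's search endpoint."""
    params = urlencode({"q": query, "size": size})
    return f"https://{search_endpoint}/{API_VERSION}/search?{params}"


# Hypothetical sample data and endpoint, for illustration only.
batch = build_batch([("doc1", {"title": "Star Wars", "year": 1977})])
url = build_search_url(
    "search-movies-example.us-east-1.cloudsearch.amazonaws.com", "star wars"
)
```

In practice, the batch would be posted to the domain's document endpoint and the URL requested from its search endpoint; both endpoints are listed for a domain in the CloudSearch console.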

You create an Amazon CloudSearch search domain for each collection of data that you want to make searchable. A search domain encapsulates your data and the hardware and software resources required to operate a search engine. Each search domain has one or more search instances. A search instance is a server instance that has a finite amount of RAM and CPU resources for indexing data and processing requests. The number of search instances in a domain depends on the documents in your collection and the volume and complexity of your search requests.

As the amount of data added and the volume of traffic to a domain increase, CloudSearch automatically scales your search domain to maximize search performance. Scaling is performed by automatically adding more search instances to the domain, and by partitioning the index across these instances. If you need more capacity than the additional search instances can offer, you can explicitly increase the number of search instances or instance replicas. To decide whether additional capacity is required, you first need to determine how much of the current capacity is in use. For this, use the AWS CloudSearch Test!

This test automatically discovers the search domains that have been configured in a region. For each domain, the test tracks the addition of searchable documents to that domain, and reports how much index capacity these documents consume and how many index partitions have already been created to support this load. From this, administrators can quickly infer whether the domain is about to exhaust its current capacity. If so, administrators can determine whether the current number of partitions can support the anticipated load on the domain. In the process, they can also estimate how many more partitions would be required to maximize throughput and minimize the latency of search queries.

Optionally, you can configure this test to report metrics across all domains configured for the AWS account that the test uses.

Target of the test: Amazon Cloud

Agent deploying the test: A remote agent

Outputs of the test: One set of results for each domain name / client ID

First-level descriptor: AWS Region

Second-level descriptor: ClientID / DomainName depending upon the option chosen from the CloudSearch Filter Name parameter of this test

Configurable parameters for the test
Parameter Description

Test Period

How often should the test be executed.

Host

The host for which the test is to be configured.

Access Type

eG Enterprise monitors the AWS cloud using the AWS API. By default, the eG agent accesses the AWS API using a valid AWS account ID, which is assigned a special role that is specifically created for monitoring purposes. Accordingly, the Access Type parameter is set to Role by default. To enable the eG agent to use this default access approach, you will have to configure the eG tests with a valid AWS Account ID to Monitor and the special AWS Role Name you created for monitoring purposes.

Some AWS cloud environments, however, may not support the role-based approach. Instead, they may allow cloud API requests only if such requests are signed by a valid Access Key and Secret Key. When monitoring such a cloud environment, you should change the Access Type to Secret. Then, you should configure the eG tests with a valid AWS Access Key and AWS Secret Key.

Note that the Secret option may not be ideal when monitoring high-security cloud environments. This is because such environments may issue a security mandate that requires administrators to change the Access Key and Secret Key often. Because keys change frequently under the key-based approach, Amazon recommends the Role-based approach for accessing the AWS API.
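The dependency between Access Type and the credentials that must accompany it can be sketched as a small validation helper. This is only an illustration of the rule described above, not how eG Enterprise itself validates a test configuration; the function names and the dictionary-based configuration are assumptions made for this example.

```python
def required_parameters(access_type):
    """Return the parameter names that must be set for the given Access Type."""
    if access_type == "Role":
        return ["AWS Account ID to Monitor", "AWS Role Name"]
    if access_type == "Secret":
        return ["AWS Access Key", "AWS Secret Key"]
    raise ValueError(f"Unknown Access Type: {access_type!r}")


def missing_parameters(config):
    """List required parameters that are absent or blank in a config dict.

    The Access Type falls back to Role, which is the documented default.
    """
    needed = required_parameters(config.get("Access Type", "Role"))
    return [name for name in needed if not str(config.get(name, "")).strip()]
```

For example, a Role configuration that supplies only an account ID would be reported as missing the AWS Role Name.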

AWS Account ID to Monitor

This parameter appears only when the Access Type parameter is set to Role. Specify the AWS Account ID that the eG agent should use for connecting and making requests to the AWS API. To determine your AWS Account ID, follow the steps below:

  • Log in to the AWS management console with your credentials.

  • Click on your IAM user/role on the top right corner of the AWS Console. You will see a drop-down menu containing the Account ID (see Figure 1).

    Figure 1 : Identifying the AWS Account ID

AWS Role Name

This parameter appears when the Access Type parameter is set to Role. Specify the name of the role that you have specifically created on the AWS cloud for monitoring purposes. The eG agent uses this role and the configured Account ID to connect to the AWS Cloud and pull the required metrics. To know how to create such a role, refer to Creating a New Role.

AWS Access Key, AWS Secret Key, Confirm AWS Access Key, Confirm AWS Secret Key

These parameters appear only when the Access Type parameter is set to Secret. To monitor an Amazon cloud instance using the Secret approach, the eG agent has to be configured with the access key and secret key of a user with a valid AWS account. For this purpose, we recommend that you create a special user on the AWS cloud, obtain the access and secret keys of this user, and configure this test with these keys. The procedure for this has been detailed in the Obtaining an Access key and Secret key topic. Make sure you confirm the access and secret keys you provide here by retyping them in the corresponding Confirm text boxes.

Proxy Host and Proxy Port

In some environments, all communication with the AWS cloud and its regions could be routed through a proxy server. In such environments, you should make sure that the eG agent connects to the cloud via the proxy server and collects metrics. To enable metrics collection via a proxy, specify the IP address of the proxy server and the port at which the server listens against the Proxy Host and Proxy Port parameters. By default, these parameters are set to none, indicating that the eG agent is not configured to communicate via a proxy.

Proxy User Name, Proxy Password, and Confirm Password

If the proxy server requires authentication, specify a valid proxy user name and password in the Proxy User Name and Proxy Password parameters, respectively. Then, confirm the password by retyping it in the Confirm Password text box. By default, these parameters are set to none, indicating that the proxy server does not require authentication.

Proxy Domain and Proxy Workstation

If a Windows NTLM proxy is to be configured for use, then additionally, you will have to configure the Windows domain name and the Windows workstation name required for the same against the Proxy Domain and Proxy Workstation parameters. If the environment does not support a Windows NTLM proxy, set these parameters to none.

Exclude Region

Here, you can provide a comma-separated list of region names or patterns of region names that you do not want to monitor. For instance, to exclude regions with names that contain 'east' and 'west' from monitoring, your specification should be: *east*,*west*
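To preview which regions a given specification would exclude, the patterns can be approximated with shell-style wildcard matching. The sketch below assumes that * matches any run of characters and that matching is case-sensitive; the test's actual matching rules may differ, and the region list is only sample data.

```python
from fnmatch import fnmatchcase


def parse_exclude_list(spec):
    """Split a comma-separated Exclude Region value into clean patterns."""
    return [p.strip() for p in spec.split(",") if p.strip()]


def is_excluded(region, patterns):
    """True if the region name matches any of the exclusion patterns."""
    return any(fnmatchcase(region, p) for p in patterns)


patterns = parse_exclude_list("*east*,*west*")
regions = ["us-east-1", "us-west-2", "eu-central-1", "ap-southeast-1"]
monitored = [r for r in regions if not is_excluded(r, patterns)]
```

Under these assumptions, ap-southeast-1 is excluded along with us-east-1 because its name also contains 'east', so broad patterns are worth checking against the full region list before use.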

CloudSearch Filter Name

By default, this parameter is set to DomainName. This means that by default, this test will report metrics for each search domain that is configured.

If required, you can override this default setting by setting the CloudSearch Filter Name to ClientID. In this case, the test will report metrics for the AWS account that is configured for this test. The measures reported for the ClientID will be aggregated across all search domains configured for that ClientID.
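The difference between the two filter options can be sketched as a grouping step over per-domain samples. The sample layout and key names below are hypothetical, and the sketch assumes that count-type measures are summed for a ClientID. The percentage measure is left out because the way it is combined across domains is not specified here.

```python
from collections import defaultdict

# Count-type measures that can be added up across domains.
COUNT_MEASURES = ("successful_requests", "searchable_documents", "partitions")


def group_results(samples, filter_name="DomainName"):
    """Group per-domain samples by the descriptor selected in filter_name."""
    if filter_name == "DomainName":
        # One result set per search domain.
        return {s["domain"]: {m: s[m] for m in COUNT_MEASURES} for s in samples}
    if filter_name == "ClientID":
        # One result set per AWS account, summed over its domains.
        totals = defaultdict(lambda: dict.fromkeys(COUNT_MEASURES, 0))
        for s in samples:
            for m in COUNT_MEASURES:
                totals[s["client_id"]][m] += s[m]
        return dict(totals)
    raise ValueError(f"Unknown filter name: {filter_name!r}")


# Hypothetical samples for two domains owned by the same account.
samples = [
    {"client_id": "123456789012", "domain": "movies",
     "successful_requests": 40, "searchable_documents": 5000, "partitions": 1},
    {"client_id": "123456789012", "domain": "books",
     "successful_requests": 10, "searchable_documents": 2000, "partitions": 2},
]
per_domain = group_results(samples)
per_account = group_results(samples, "ClientID")
```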

Measurements made by the test

Measurement

Description

Measurement Unit

Interpretation

Successful search requests

By default, this measure represents the number of search queries/requests that were successfully processed by this search domain.

If the CloudSearch Filter Name is set to ClientID, then this measure will report the number of search requests that were successfully processed by all search domains configured for this AWS account.

Number

A high value is desired for this measure.

A steady drop in the value of this measure is a cause for concern, as it implies poor search performance. You may want to investigate the reasons for the drop.
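One way to make "steady drop" concrete is to check whether the measure has fallen in each of the last few measurement periods. The sketch below uses that definition; the window of three periods is an arbitrary assumption, not a threshold defined by the test.

```python
def is_steady_drop(values, window=3):
    """True if each of the last `window` readings is lower than the one before it."""
    if len(values) < window + 1:
        return False  # not enough history to judge
    recent = values[-(window + 1):]
    return all(later < earlier for earlier, later in zip(recent, recent[1:]))
```

A series such as 120, 118, 90, 75, 40 would be flagged, while one that recovers partway through the window (100, 90, 95, 80) would not.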

Searchable documents in domain's search index

By default, this measure represents the number of searchable documents in this domain's search index.

If the CloudSearch Filter Name is set to ClientID, then this measure will report the number of searchable documents across all search domains configured for this AWS account.

Number

The maximum number of documents a search domain can hold depends upon the following:

  • Document size
  • Indexing options: To index and search your documents, you configure the search domain with an index field for each document field. You can specify multiple indexing options for each field, such as the type of the field and whether the field is searchable, facet enabled, return enabled, sort enabled, and highlight enabled. These indexing options directly impact how many documents fit onto a search instance.
  • Search instance type: By default, CloudSearch makes the following instance types available:

    • search.m1.small (Small Search Instance)
    • search.m3.medium (Medium Search Instance)
    • search.m3.large (Large Search Instance)
    • search.m3.xlarge (Extra Large Search Instance)
    • search.m3.2xlarge (Double Extra Large Search Instance).

Search instance's index usage

By default, this measure represents the percentage of this domain's index capacity that has been used.

If the CloudSearch Filter Name is set to ClientID, then this measure represents the percentage of index capacity used across all search domains configured for this AWS account.

Percent

A value close to 100% indicates that the search domain is about to exhaust the index capacity of its current search instance type.

Typically, when the amount of data you add to your domain exceeds the capacity of the initial search instance type, Amazon CloudSearch scales your search domain to a larger search instance type. After a domain exceeds the capacity of the largest search instance type, Amazon CloudSearch partitions the search index across multiple search instances.

To know whether the domain has exceeded the capacity of its largest instance type, check the value of the Index partitions measure for that domain. Because a domain's index always occupies at least one partition, a value greater than 1 indicates that the index has been split across multiple search instances, from which you can conclude that the largest instance type's capacity has been exceeded.

Index partitions

By default, this measure represents the number of partitions across which the search index of this search domain is distributed.

If the CloudSearch Filter Name is set to ClientID, then this measure represents the number of partitions across which all the search domains configured for this AWS account have distributed their search index.

Number

If this measure reports a value greater than 1, it indicates that the search index has been split across multiple search instances, which means the search domain has exceeded the capacity of its largest instance type.

If you anticipate the load on the search domain to increase further, you may have to explicitly increase the number of instances that your index is partitioned across.

The maximum number of search instances that can be deployed for a domain is 50 and the maximum number of partitions is 10. To increase these limits, you will have to submit an explicit request to Amazon.
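The index usage and partition readings can be combined into a quick headroom check against the default partition limit stated above. This is a minimal sketch: the 80% warning threshold and the function name are assumptions chosen for illustration, and the limit constant should be adjusted if Amazon has raised it for your account.

```python
MAX_PARTITIONS = 10  # default per-domain partition limit stated above


def assess_capacity(index_usage_pct, partitions, warn_at=80.0):
    """Summarize a domain's index headroom from two reported measures.

    warn_at is a hypothetical warning threshold, not one defined by the test.
    """
    return {
        "near_capacity": index_usage_pct >= warn_at,
        "partitions_remaining": max(MAX_PARTITIONS - partitions, 0),
        "at_partition_limit": partitions >= MAX_PARTITIONS,
    }
```

A domain reporting high index usage with no partitions remaining is the case that would call for a limit increase request to Amazon.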