Elasticsearch Cluster Health Test

An Elasticsearch cluster is a group of one or more Elasticsearch nodes that are connected together. The Elasticsearch cluster efficiently distributes the tasks, searches and indexes across all the nodes. The nodes in the Elasticsearch cluster can be assigned different jobs or responsibilities:

  • Data nodes - store data and execute data-related operations such as search and aggregation
  • Master nodes - are in charge of cluster management and configuration actions such as adding and removing nodes
  • Client nodes - forward cluster requests to the master node and data-related requests to data nodes
  • Ingest nodes - pre-process documents before indexing

By default, each node is automatically assigned a unique identifier, or name, that is used for management purposes and becomes even more important in a multi-node, or clustered, environment.

To store and efficiently manage a large amount of data, Elasticsearch lets you create indexes in the cluster. An index is a collection of documents with similar characteristics, and is identified by a name. The index name is used to refer to the index when performing indexing, search, update, and delete operations against its documents.

An index can potentially store more data than the hardware of a single node can hold. For example, a single index of a billion documents taking up 1TB of disk space may not fit on the disk of a single node, or may be too slow to serve search requests from a single node alone. To solve this problem, Elasticsearch lets you subdivide an index into multiple pieces called shards. When you create an index, you simply define the number of shards you want. Each shard is a fully functional and independent index that can be allocated to any node in the cluster. Furthermore, Elasticsearch allows you to create one or more copies of each shard, called replica shards or replicas, to provide high availability if a primary shard or node goes offline, fails, or becomes unavailable for any reason. Using shards, administrators can horizontally split and scale content volume, and distribute and parallelize operations across the nodes.

If any shard stays in the unassigned or relocating state for a long duration, search queries to that shard may be queued or left unserviced. If the issue persists, incoming search queries will not be processed as quickly as they should be. This in turn leads to a processing bottleneck that adversely impacts the performance of the cluster. To avoid this, administrators should continuously monitor the health of the cluster at the shard level. This can be easily achieved using the Elasticsearch Cluster Health test!
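As a rough illustration of the shard and replica settings described above, the following Python sketch builds the JSON body of an index-creation request and counts the shard copies the cluster would have to allocate. The index name `products` and the shard counts are assumptions chosen for the example; `number_of_shards` and `number_of_replicas` are the standard Elasticsearch index settings.

```python
import json


def index_settings(primary_shards, replicas_per_primary):
    """Build the body of a create-index request (PUT /<index-name>)."""
    if primary_shards < 1 or replicas_per_primary < 0:
        raise ValueError("need at least 1 primary shard and 0 or more replicas")
    return {
        "settings": {
            "index": {
                "number_of_shards": primary_shards,
                "number_of_replicas": replicas_per_primary,
            }
        }
    }


def total_shard_copies(primary_shards, replicas_per_primary):
    """Each primary shard has `replicas_per_primary` additional copies."""
    return primary_shards * (1 + replicas_per_primary)


if __name__ == "__main__":
    # Hypothetical index: 5 primary shards with 1 replica of each.
    print("PUT /products")
    print(json.dumps(index_settings(5, 1), indent=2))
    print("shard copies to allocate:", total_shard_copies(5, 1))
```

With one replica per primary, a single-node cluster cannot place the replicas on a different node from their primaries, so those copies remain unassigned until another data node joins.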

This test monitors the health of the cluster at regular intervals. In addition, the test reports the count of active shards and the number of shards in the unassigned state. These measures help administrators track the health of the cluster continuously.
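The measures reported by this test correspond closely to fields returned by the Elasticsearch cluster health API (`GET /_cluster/health`). How the eG agent actually collects them is not described here, so the sketch below is only an illustration under that assumption: it parses a sample response, with values invented for the example, and maps the API fields to the measure names used in this document.

```python
import json

# Sample body in the shape returned by GET /_cluster/health.
# All values are made up for illustration.
SAMPLE_RESPONSE = """
{
  "cluster_name": "demo-cluster",
  "status": "yellow",
  "number_of_nodes": 3,
  "number_of_data_nodes": 2,
  "active_primary_shards": 10,
  "active_shards": 15,
  "relocating_shards": 0,
  "initializing_shards": 1,
  "unassigned_shards": 4
}
"""

# Measure name used in this document -> field name in the API response.
MEASURE_FIELDS = {
    "Cluster health": "status",
    "Active primary shards": "active_primary_shards",
    "Active shards": "active_shards",
    "Unassigned shards": "unassigned_shards",
    "Initializing shards": "initializing_shards",
    "Relocating shards": "relocating_shards",
    "Data nodes": "number_of_data_nodes",
    "Total nodes": "number_of_nodes",
}


def extract_measures(response_text):
    """Parse a cluster health response and pick out the monitored fields."""
    health = json.loads(response_text)
    return {measure: health[field] for measure, field in MEASURE_FIELDS.items()}


if __name__ == "__main__":
    for measure, value in extract_measures(SAMPLE_RESPONSE).items():
        print(f"{measure}: {value}")
```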

Target of the test : An Elasticsearch Cluster

Agent deploying the test : An internal/remote agent

Outputs of the test : One set of results for the target Elasticsearch cluster being monitored.

Configurable parameters for the test
Parameter Description

Test Period

How often should the test be executed.

Host

The IP address of the host for which this test is to be configured.

Port

The port number at which the target host being monitored listens. By default, this is set to 9200.

Cloud Instance URL

By default, this parameter is set to none. If the target Elasticsearch cluster is hosted in a cloud environment, specify the URL of the cluster on the cloud against this parameter. For example: 64bd966328067fd89e0c9b4c3bb8b042.us-east-1.aws.found.io. When the cloud URL is specified, the eG agent will use it to monitor the target cluster rather than the host specified in the Host text box.

Elastic Search User and Elastic Search Password

By default, the Elastic Search User and Elastic Search Password parameters are set to none indicating that the eG agent doesn't require authentication to collect metrics from the Elasticsearch cluster. If authentication is required to access the target Elasticsearch cluster, then specify the valid credentials against these parameters.

Confirm Password

Confirm the Elastic Search Password by retyping it in the Confirm Password text box.

SSL

By default, the SSL flag is set to No. If the Elasticsearch cluster is SSL-enabled or hosted on the cloud, then set this flag to Yes. The eG agent will then communicate with the target cluster via HTTPS.
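To make the interaction between the Host, Port, Cloud Instance URL, SSL, and credential parameters concrete, here is a minimal Python sketch of how a client might turn them into a request target. It is not the eG agent's implementation. The function names are hypothetical, and two points are assumptions not stated in this document: that the port is omitted when a cloud URL is used, and that the credentials are sent using HTTP Basic authentication.

```python
import base64


def build_base_url(host, port=9200, cloud_url="none", ssl=False):
    """Pick the endpoint: the cloud URL, if set, takes precedence over Host."""
    scheme = "https" if ssl else "http"
    if cloud_url.strip().lower() != "none":
        # Assumption: the cloud URL is used as-is, without the Port value.
        return f"{scheme}://{cloud_url.strip()}"
    return f"{scheme}://{host}:{port}"


def basic_auth_header(user="none", password="none"):
    """Return request headers; empty when no authentication is configured."""
    if user.strip().lower() == "none":
        return {}
    # Assumption: HTTP Basic authentication (base64 of "user:password").
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


if __name__ == "__main__":
    # Defaults: plain HTTP to the host on port 9200, no credentials.
    print(build_base_url("192.168.1.10"), basic_auth_header())
    # Cloud-hosted cluster with SSL set to Yes and credentials supplied.
    print(
        build_base_url("192.168.1.10", cloud_url="example.found.io", ssl=True),
        basic_auth_header("elastic", "secret"),
    )
```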

Measurements made by the test
Measurement Description Measurement Unit Interpretation

Cluster health

Indicates the health of the cluster in terms of the current state of the shards in the cluster.

 

This measure reveals the health of the cluster at the shard level. The values that this measure can report and their corresponding numeric values are as follows:

Measure Value Numeric Value Description
Red 0 Indicates that at least one primary shard is not allocated to any node in the cluster.
Yellow 1 Indicates that all primary shards are allocated but one or more replicas are not assigned to any node.
Green 2 Indicates that all shards in the cluster are allocated to the nodes.

Note:

This test typically reports the Measure Values listed in the table above to indicate the current health of the cluster. However, the graph of this measure is represented using the numeric equivalents only.

Active primary shards

Indicates the number of primary shards that are currently active on the cluster.

Number

 

Active shards

Indicates the total number of shards that are currently active on the cluster.

Number

 

Unassigned shards

Indicates the number of shards that are in the UNASSIGNED state.

Number

Ideally, the value of this measure should be zero. A non-zero value indicates that one or more shards are yet to be allocated to the nodes, which may cause imbalance in the cluster and make the cluster unreliable if nodes crash. To avoid this, administrators may have to allocate the unassigned shards to nodes in the cluster, or delete the shards if the data in them is no longer needed.

Initializing shards

Indicates the number of shards that are currently in the INITIALIZING state.

Number

 

Relocating shards

Indicates the number of shards that are being moved from one node to another node in the cluster.

Number

Typically, administrators move shards from one node to another to maintain the cluster's health when a new node is added to the cluster, or when many shards are idle or in the unassigned state.

Data nodes

Indicates the number of data nodes in the cluster.

Number

 

Total nodes

Indicates the total number of nodes in the cluster.

Number
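The interpretation rules for these measures can be sketched in a few lines of Python. The numeric equivalents come from the Cluster health table in this document; the function names and the wording of the notes are hypothetical, chosen only for this example.

```python
# Numeric equivalents of the Cluster health measure, as listed in this document.
HEALTH_VALUES = {"red": 0, "yellow": 1, "green": 2}


def numeric_health(status):
    """Map a Red/Yellow/Green measure value to its numeric equivalent."""
    return HEALTH_VALUES[status.strip().lower()]


def shard_findings(status, unassigned_shards, relocating_shards=0):
    """Return human-readable notes for values that deserve attention."""
    findings = []
    if numeric_health(status) < HEALTH_VALUES["green"]:
        findings.append(f"cluster health is {status} ({numeric_health(status)})")
    if unassigned_shards > 0:
        findings.append(f"{unassigned_shards} shard(s) are unassigned")
    if relocating_shards > 0:
        findings.append(f"{relocating_shards} shard(s) are relocating")
    return findings


if __name__ == "__main__":
    # Hypothetical reading: Yellow health with 4 unassigned shards.
    for note in shard_findings("Yellow", unassigned_shards=4):
        print(note)
```

A Green cluster with no unassigned or relocating shards yields an empty list, which matches the ideal state described for the Unassigned shards measure.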