AWS MSK Cluster Info Test

In an AWS MSK cluster, one of the brokers serves as the controller, which is responsible for managing the states of partitions and replicas and for performing administrative tasks such as reassigning partitions.

This test monitors the partitions managed by the active controller in the cluster and reports the number of partitions that are offline. Offline partitions cause read/write operations to lag.

Target of the test : AWS Managed Service Kafka

Agent deploying the test : A remote agent

Outputs of the test : One set of results for each cluster executing in the target AWS Managed Service Kafka server.

Configurable parameters for the test
Parameter Description

Test Period

Specify how often the test should be executed.

Host

The IP address of the AWS Managed Service Kafka Broker that is being monitored.

Port

Specify the port number at which the specified HOST listens. By default, this is NULL.

AWS Default Region

This test uses the AWS CLI to interact with AWS Managed Service Kafka and pull relevant metrics. To enable the test to connect to AWS, you need to configure the test with the name of the region to which all requests for metrics should be routed by default. Specify the name of this AWS Default Region here.

AWS Access Key ID, AWS Secret Access Key and Confirm Password

To monitor AWS Managed Service Kafka, the eG agent has to be configured with the access key and secret key of a user with a valid AWS account. For this purpose, we recommend that you create a special user on the AWS cloud, obtain the access and secret keys of this user, and configure this test with these keys. The procedure for this is detailed in the Obtaining an Access key and Secret key topic. Make sure you confirm the secret key you provide here by retyping it in the corresponding Confirm Password text box.

Timeout Seconds

Specify the maximum duration (in seconds) for which the test will wait for a response from the server. The default is 10 seconds.
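As a rough illustration of how the region, key, and timeout parameters fit together, the sketch below builds an AWS CLI invocation from them. It is an assumption about how such a call could look, not the eG agent's actual implementation: the helper names, the placeholder cluster ARN, and the choice of the `aws kafka describe-cluster` command are illustrative. The region and keys are passed through the standard AWS CLI environment variables, and the timeout is mapped to the CLI's connect and read timeout options.

```python
import os
import subprocess


def build_msk_cli_call(region, access_key_id, secret_access_key,
                       cluster_arn, timeout_seconds=10):
    """Return (command, environment) for an `aws kafka describe-cluster` call.

    Hypothetical helper: the agent's real invocation is not documented here.
    """
    env = dict(os.environ)
    env["AWS_DEFAULT_REGION"] = region            # AWS Default Region parameter
    env["AWS_ACCESS_KEY_ID"] = access_key_id      # AWS Access Key ID parameter
    env["AWS_SECRET_ACCESS_KEY"] = secret_access_key
    command = [
        "aws", "kafka", "describe-cluster",
        "--cluster-arn", cluster_arn,
        "--output", "json",
        "--cli-connect-timeout", str(timeout_seconds),
        "--cli-read-timeout", str(timeout_seconds),
    ]
    return command, env


def run_msk_cli_call(command, env, timeout_seconds=10):
    """Run the command and return its raw JSON text (requires the AWS CLI)."""
    result = subprocess.run(command, env=env, capture_output=True, text=True,
                            timeout=timeout_seconds + 5, check=True)
    return result.stdout
```

Only `build_msk_cli_call` is side-effect free; `run_msk_cli_call` needs the AWS CLI installed and valid credentials, so it is best tried against a test account first.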

Measurements made by the test
Measurement Description Measurement Unit Interpretation

State

Indicates the current state of the cluster.

The values reported by this measure and their numeric equivalents are listed in the table below:

Measure Value        Numeric Value
Active               100
Updating             90
Creating             80
Maintenance          70
Healing              60
Rebooting broker     50
Deleting             40
Failed               0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current state of the cluster.
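The mapping in the table above can be expressed as a simple lookup. The sketch below is illustrative only: it assumes the state arrives as an upper-case, underscore-separated string (for example `REBOOTING_BROKER`), and the function name is hypothetical.

```python
# Numeric equivalents copied from the State table above.
STATE_VALUES = {
    "Active": 100,
    "Updating": 90,
    "Creating": 80,
    "Maintenance": 70,
    "Healing": 60,
    "Rebooting broker": 50,
    "Deleting": 40,
    "Failed": 0,
}


def state_to_numeric(api_state):
    """Convert an API-style state string to its numeric equivalent.

    Assumes input such as 'ACTIVE' or 'REBOOTING_BROKER'; raises KeyError
    for a state that is not listed in the table.
    """
    label = api_state.replace("_", " ").capitalize()
    return STATE_VALUES[label]


print(state_to_numeric("ACTIVE"))            # 100
print(state_to_numeric("REBOOTING_BROKER"))  # 50
```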

Number of broker nodes

Indicates the number of broker nodes in the cluster.

Number

 

Enhanced monitoring type

Indicates the level of enhanced monitoring configured for the target cluster.

The values reported by this measure and their numeric equivalents are listed in the table below:

Measure Value              Numeric Value
Default                    100
Per broker                 75
Per topic per broker       50
Per topic per partition    25
No monitoring              0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the enhanced monitoring level of the target cluster.

Number of active controller

Indicates the number of active controllers in the cluster.

Number

Only one controller per cluster should be active at any given time.

Number of global partition

Indicates the number of partitions across all topics in the cluster.

Number

The global partition count is updated when the Controller Event Thread receives a Topic Change, Topic Deletion, or Partition Reassignment request, and is purged on controller failover.

Since Global Partition Count does not include replicas, the sum of the Partition Count values can be higher than Global Partition Count if the replication factor for a topic is greater than 1.
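A small worked example may make the difference concrete. The topic names and sizes below are invented for illustration, and the sketch assumes that the Partition Count values are reported per broker and include replicas.

```python
def partition_totals(topics):
    """Return (global partition count, total partition replicas).

    topics maps a topic name to (partitions, replication_factor).
    """
    global_count = sum(p for p, _ in topics.values())
    replica_count = sum(p * rf for p, rf in topics.values())
    return global_count, replica_count


# Hypothetical cluster: one replicated topic, one unreplicated topic.
topics = {"orders": (6, 3), "audit": (4, 1)}
global_count, replica_count = partition_totals(topics)
print(global_count, replica_count)  # 10 22
```

Here the cluster has 10 partitions in total, but the brokers together host 22 partition replicas, because each of the 6 partitions of the replicated topic is stored three times.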

Number of offline partition

Indicates the number of partitions that are offline in the cluster.

Number

Ideally, this value should be 0. An alert is raised when this value is greater than 0.
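The conditions described for the active controller and offline partition measures can be combined into a single check. This is a minimal sketch with a hypothetical function name and message wording; it does not reproduce the product's own alerting logic or thresholds.

```python
def cluster_alerts(active_controllers, offline_partitions):
    """Return a list of alert messages for one set of cluster readings."""
    alerts = []
    # Exactly one controller should be active per cluster.
    if active_controllers != 1:
        alerts.append(
            f"expected 1 active controller, found {active_controllers}")
    # Any offline partition is a problem condition.
    if offline_partitions > 0:
        alerts.append(f"{offline_partitions} partition(s) offline")
    return alerts


print(cluster_alerts(1, 0))  # []
print(cluster_alerts(1, 2))  # ['2 partition(s) offline']
```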

Number of global topic

Indicates the number of topics across all brokers in the cluster.

Number

 

Percentage of disk space used for data logs

Indicates the percentage of disk space used for data logs.

Percent