Amazon MSK Cluster Layer

Using the tests associated with the Amazon MSK Cluster layer (see Figure 1), you can monitor the following:

  • Size of buffered/cached memory for the broker.

  • Size of free memory available for use by the broker.

  • Size of total heap memory available after garbage collection.

  • Size of memory used by the broker.

  • Remaining balance of input-output burst credits for EBS volumes in the cluster.

  • Rate at which messages arrive at the broker.

  • Average rate at which the broker receives data sent by producers.

  • Average rate at which data is fetched and read from the broker.

  • Percentage of disk space used for application logs.

  • Number of replicas for which the broker is the leader.

  • Total time taken to read a request and send the response to that request.

  • Size of swap memory available to the broker and size of swap memory used by the broker.

  • Percentage of time the request handler threads are idle.

  • State of the cluster and the target broker.

  • Number of broker nodes in the cluster and number of active controllers on the target broker.

  • Number of partitions across all brokers in the cluster.

  • Percentage of disk space used for data logs.

  • Number of connections established per second and number of connections closed per second, for each listener.

  • Number of incoming and outgoing TCP segments.

  • Mean time that consumer requests wait in the request/response queue.

  • Mean total time that consumers spend fetching data from the broker.

  • CPU credit balance on the brokers and the percentage of time that the CPU spent in an idle state.

  • Percentage of CPU time spent in kernel space and in user space.

  • Mean time that follower requests spend being processed at the leader, and mean time that follower requests wait in the request queue.

  • Mean total time that followers spend fetching data from the broker.

  • Number of packets dropped because network allocations were exceeded.

  • Size of network traffic between clients (producers and consumers) and brokers.

  • Average percentage of time the network processors are idle.

  • Mean time that request/response messages spend in the queue.

  • Number of messages in the throttle queue and the average fetch throttle time.

  • Average time spent in broker network and I/O threads to process requests that are exempt from throttling.

  • Number of fetch/produce message conversions per second for the broker.

  • Number of bytes per second sent to and received from other brokers.

  • Number of read and write operation requests waiting to be completed.

  • Rate at which read and write operations were performed on the volumes used by the EC2 instances in the monitored region.

  • Rate at which data was read from the volume and rate at which data was written to the volume.

Figure 1: The list of tests associated with the Amazon MSK Cluster layer