Flume Sink Test

Apache Flume has three main components: Source, Channel, and Sink. A sink consumes the data (events) from a channel and delivers it to its destination, which may be another agent or a centralized store such as HBase or HDFS. The HDFS sink is one example of a sink.

Because the sink is responsible for delivering data to a centralized store for long-term storage and processing, a sink that malfunctions or becomes unavailable can cause significant data loss. It is therefore important to monitor each sink closely, so that any issue or error that could lead to data loss is highlighted. The metrics and insights from monitoring help administrators identify and act on potential problems before they escalate into failures.

This test monitors every Flume sink and collects key metrics such as the number of batches completed, channel read failures, and connections closed. These metrics help administrators understand the current performance of the system and alert them when intervention is required to fix a problem.

Target of the test : Apache Flume

Agent deploying the test : An internal agent

Outputs of the test : One set of results for each sink in the Apache Flume agent being monitored.

Configurable parameters for the test
Parameter Description

Test period

How often the test should be executed.

Host

The IP address of the target server that is being monitored.

Port

The port number through which Apache Flume communicates. The default port is 8080.

FLUME JMX Remote Port

Specify the port at which JMX listens for requests from remote hosts. Ensure that you specify the same port that you configured in the flume-env.ps1 file, in the JVM_OPTS variable.

JMX Username, Password and Confirm Password

These parameters appear only if the Mode is set to JMX. If JMX requires authentication only (but no security), ensure that the JMX Username and Password parameters are configured with the credentials of a user with read-write access to JMX. To learn how to create this user, refer to Configuring the eG Agent to Support JMX Authentication. Confirm the password by retyping it in the Confirm Password text box.
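To illustrate how the Host and FLUME JMX Remote Port parameters relate to the JVM settings on the Flume side, here is a minimal sketch. It assumes the standard JVM remote JMX system properties and the standard RMI-based JMX service URL form; the helper names, the host 192.168.10.5, and the port 5445 are hypothetical placeholders, and how the eG agent actually connects is not described here.

```python
def jmx_jvm_options(jmx_port, authenticate=False):
    """Return JVM flags that expose a remote JMX listener on jmx_port.

    These are standard JVM system properties; adding them to the
    Flume environment file is an assumption based on the text above.
    """
    return [
        "-Dcom.sun.management.jmxremote",
        f"-Dcom.sun.management.jmxremote.port={jmx_port}",
        f"-Dcom.sun.management.jmxremote.authenticate={str(authenticate).lower()}",
        "-Dcom.sun.management.jmxremote.ssl=false",
    ]


def jmx_service_url(host, jmx_port):
    """Build the standard RMI-based JMX service URL for a host and port."""
    return f"service:jmx:rmi:///jndi/rmi://{host}:{jmx_port}/jmxrmi"


if __name__ == "__main__":
    # Placeholder values only; use the port configured for your Flume agent.
    print(" ".join(jmx_jvm_options(5445, authenticate=True)))
    print(jmx_service_url("192.168.10.5", 5445))
```

The port passed to both helpers must be the same value, which mirrors the requirement that the test's JMX port match the one configured for the Flume JVM.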

Measurements made by the test
Measurement Description Measurement Unit Interpretation

Batch complete count

Indicates the number of batches per second that this sink received with a size equal to the maximum batch size.

Batches/Sec

Having every batch filled to capacity is the best use of Flume resources, but this can be achieved only in high-volume systems.

Batch empty count

Indicates the number of batches per second that this sink received with no events in the batch.

Batches/Sec

This is not ideal, because batches are being processed even though they carry no data.

Batch under flow count

Indicates the number of batches per second that this sink received with fewer events than the maximum batch size.

Batches/Sec

This is optimal for systems that transfer a low volume of data.

Channel read fail

Indicates the number of events per second that this sink failed to read from the channel.

Events/Sec

Administrators should check whether both the channel and the sink are in a healthy state, and should review the logs to understand why the reads are failing.

Connection closed count

Indicates the number of existing connections from this sink to the storage system or next hop that were closed per second.

Connections/Sec

If the number of connections closed is proportionate to a reduction in the number of events, this is fine. Otherwise, some issue may be causing connections to close, and the remaining connections may not be able to service the event flow.

Connection created count

Indicates the number of new connections per second that this sink created to the storage system or next hop.

Connections/Sec

If the number of connections created is proportionate to the number of events, this is fine. Otherwise, administrators may need to investigate.

Connection failed count

Indicates the number of connection requests per second from this sink to the storage system that failed.

Connections/Sec

 

Event drain attempt count

Indicates the number of events that this sink tried to write to the storage system in a second.

Events/Sec

 

Event drain success count

Indicates the number of events that this sink successfully wrote to the storage system in a second.

Events/Sec

 

Event write fail

Indicates the number of events per second that this sink tried, but failed, to write to the storage system.

Events/Sec
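The per-second measurements above can be related to one another with a little arithmetic. The following is a minimal sketch, assuming the underlying values are cumulative counters sampled at two points in time. The snapshot shape and counter names (for example, EventDrainAttemptCount) follow Flume's JSON metrics reporting, where values arrive as strings; the component names, sample values, and helper functions are hypothetical, and how this test itself gathers its values is not stated here.

```python
def sink_rates(prev, curr, interval_secs):
    """Convert cumulative sink counters from two snapshots into per-second rates."""
    rates = {}
    for name, counters in curr.items():
        # Only sink components that appear in both snapshots are considered.
        if counters.get("Type") != "SINK" or name not in prev:
            continue
        per_sec = {}
        for key, value in counters.items():
            if key.endswith("Count") and key in prev[name]:
                delta = int(value) - int(prev[name][key])
                # A negative delta suggests the counters were reset; report 0.
                per_sec[key] = max(delta, 0) / interval_secs
        rates[name] = per_sec
    return rates


def drain_success_ratio(sink_rate):
    """Return the fraction of attempted events written successfully, or None if idle."""
    attempts = sink_rate.get("EventDrainAttemptCount", 0.0)
    if attempts == 0:
        return None
    return sink_rate.get("EventDrainSuccessCount", 0.0) / attempts


# Hypothetical snapshots taken 60 seconds apart.
SNAPSHOT_T0 = {
    "SINK.hdfs-sink": {"Type": "SINK", "BatchCompleteCount": "100", "BatchEmptyCount": "5",
                       "EventDrainAttemptCount": "1000", "EventDrainSuccessCount": "1000"},
    "CHANNEL.mem-channel": {"Type": "CHANNEL", "EventPutSuccessCount": "1000"},
}
SNAPSHOT_T1 = {
    "SINK.hdfs-sink": {"Type": "SINK", "BatchCompleteCount": "160", "BatchEmptyCount": "5",
                       "EventDrainAttemptCount": "1600", "EventDrainSuccessCount": "1540"},
    "CHANNEL.mem-channel": {"Type": "CHANNEL", "EventPutSuccessCount": "1600"},
}

if __name__ == "__main__":
    for sink, rate in sink_rates(SNAPSHOT_T0, SNAPSHOT_T1, 60).items():
        print(sink, rate, drain_success_ratio(rate))
```

A drain success ratio that falls noticeably below 1 while the attempt rate stays steady would point to write failures at the destination rather than to a drop in incoming events.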