Hadoop Data Node Memory Operations Test

HDFS supports writing to off-heap memory managed by the DataNodes. The DataNodes flush in-memory data to disk asynchronously, thus removing expensive disk I/O and checksum computations from the performance-sensitive I/O path; hence such writes are called Lazy Persist writes.
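
Applications opt in to Lazy Persist writes explicitly when creating a file. Given below is a minimal sketch of such a write using the Hadoop Java client; it assumes the org.apache.hadoop client libraries are on the classpath and that the cluster's DataNodes have memory (RAM disk) storage configured, and the file path shown is purely illustrative:

    import java.nio.charset.StandardCharsets;
    import java.util.EnumSet;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.CreateFlag;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;

    public class LazyPersistWriteExample {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path file = new Path("/tmp/lazy-persist-example.dat"); // illustrative path

            // CREATE + LAZY_PERSIST asks the DataNode to hold the replica in
            // off-heap memory and persist it to disk asynchronously.
            try (FSDataOutputStream out = fs.create(
                    file,
                    FsPermission.getFileDefault(),
                    EnumSet.of(CreateFlag.CREATE, CreateFlag.LAZY_PERSIST),
                    4096,                             // buffer size
                    (short) 1,                        // replication factor
                    fs.getDefaultBlockSize(file),
                    null)) {                          // no progress callback
                out.write("sample payload".getBytes(StandardCharsets.UTF_8));
            }
        }
    }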

If one or more DataNodes are using their off-heap memory poorly and are instead writing data directly to disk, the I/O overheads of the cluster will increase significantly. Likewise, if any DataNode takes too long to flush in-memory data to disk, data could be lost when that node restarts. This is why administrators need to continuously track the blocks that every DataNode writes to memory, evicts from memory, and flushes to disk, measure the time taken by every DataNode to write in-memory data to disk, and accurately isolate the following:

  • DataNodes that are making sparse use of their off-heap memory, and/or
  • DataNodes that are writing to disk lethargically.

The Hadoop Data Node Memory Operations test helps with this!

For each DataNode in a Hadoop cluster, this test reveals how well that node uses its off-heap memory, and in the process accurately pinpoints DataNodes that are not utilizing that memory effectively. The test also evaluates how quickly (or slowly) every DataNode flushes in-memory data to disk, thus shedding light on DataNodes that are sluggish in persisting data. This enables administrators to proactively detect weaknesses in the process that might adversely impact data reliability and integrity during a node restart.

Target of the test : A Hadoop cluster

Agent deploying the test : A remote agent

Outputs of the test : One set of results for each DataNode in the target Hadoop cluster

Configurable parameters for the test
Parameter Description

Test Period

How often should the test be executed.

Host

The IP address of the NameNode that processes client connections to the cluster. NameNode is the master node in the Apache Hadoop HDFS Architecture that maintains and manages the blocks present on the DataNodes (slave nodes). NameNode is a highly available server that manages the File System Namespace and controls access to files by clients.

Port

The port at which the NameNode accepts client connections. NameNode is the master node in the Apache Hadoop HDFS Architecture that maintains and manages the blocks present on the DataNodes (slave nodes). NameNode is a highly available server that manages the File System Namespace and controls access to files by clients. By default, the NameNode's client connection port is 8020.

Name Node Web Port

The eG agent collects metrics using Hadoop's WebHDFS REST API. While some of these API calls pull metrics from the NameNode, some others get metrics from the resource manager. NameNode is the master node in the Apache Hadoop HDFS Architecture that maintains and manages the blocks present on the DataNodes (slave nodes). NameNode is a highly available server that manages the File System Namespace and controls access to files by clients. To run API commands on the NameNode and pull metrics, the eG agent needs access to the NameNode's web port.

To determine the correct web port of the NameNode, do the following:

  • Open the hdfs-default.xml file in the hadoop/conf/app directory.
  • Look for the dfs.namenode.http-address parameter in the file.
  • This parameter is configured with the IP address and base port on which the DFS NameNode web user interface listens. The format of this configuration is: <IP_Address>:<Port_Number>. Given below is a sample configuration:

    192.168.10.100:50070

Configure the <Port_Number> in the specification as the Name Node Web Port. In the case of the above sample configuration, this will be 50070.
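
Once the web port is known, a quick way to confirm that it is reachable is to request the NameNode's /jmx metrics servlet, which standard Apache Hadoop builds serve on this port. The sketch below (Java 11 or later) is only a connectivity check with placeholder host and port values; it is not meant to represent the exact API calls the eG agent makes:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class NameNodeWebPortCheck {
        public static void main(String[] args) throws Exception {
            // Substitute the values configured for the Host and
            // Name Node Web Port parameters of this test.
            String nameNodeHost = "192.168.10.100";
            int nameNodeWebPort = 50070;

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://" + nameNodeHost + ":" + nameNodeWebPort + "/jmx"))
                    .GET()
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());

            // A 200 status indicates the web port is correct and serving metrics.
            System.out.println("HTTP status: " + response.statusCode());
        }
    }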

Name Node User Name

The eG agent collects metrics using Hadoop's WebHDFS REST API. While some of these API calls pull metrics from the NameNode, some others get metrics from the resource manager. NameNode is the master node in the Apache Hadoop HDFS Architecture that maintains and manages the blocks present on the DataNodes (slave nodes). NameNode is a highly available server that manages the File System Namespace and controls access to files by clients.

In some Hadoop configurations, a simple authentication user name may be required for running API commands and collecting metrics from the NameNode. When monitoring such Hadoop installations, specify the name of the simple authentication user here. If no such user is available/required, then do not disturb the default value of this parameter, which is none.

Resource Manager IP and Resource Manager Web Port

The eG agent collects metrics using Hadoop's WebHDFS REST API. While some of these API calls pull metrics from the NameNode, some others get metrics from the resource manager. The YARN Resource Manager Service (RM) is the central controlling authority for resource management and makes resource allocation decisions.

To pull metrics from the resource manager, the eG agent first needs to connect to the resource manager. For this, you need to configure this test with the IP address/host name of the resource manager and its web port. Use the Resource Manager IP and Resource Manager Web Port parameters to configure these details.

To determine the IP/host name and web port of the resource manager, do the following:

  • Open the yarn-site.xml file in the /opt/mapr/hadoop/hadoop-2.x.x/etc/hadoop directory.
  • Look for the yarn.resourcemanager.webapp.address parameter in the file.
  • This parameter is configured with the IP address/host name and web port of the resource manager. The format of this configuration is: <IP_Address_or_Host_Name>:<Port_Number>. Given below is a sample configuration:

    192.168.10.100:8080

Configure the <IP_Address_or_Host_Name> in the specification as the Resource Manager IP, and the <Port_Number> as the Resource Manager Web Port. In the case of the above sample configuration, these will be 192.168.10.100 and 8080, respectively.
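
Alternatively, the effective value can be read programmatically through Hadoop's YARN client configuration classes, which resolve yarn-site.xml from the classpath. The sketch below is a minimal example, assuming the Hadoop YARN client libraries (and the cluster's configuration files) are on the classpath; it simply prints the resolved address:

    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class ResourceManagerAddressLookup {
        public static void main(String[] args) {
            // Constructing YarnConfiguration loads yarn-default.xml and
            // yarn-site.xml from the classpath.
            YarnConfiguration conf = new YarnConfiguration();

            // Resolves "yarn.resourcemanager.webapp.address", falling back to
            // the stock Hadoop default if the property is not set.
            String webAppAddress = conf.get(
                    YarnConfiguration.RM_WEBAPP_ADDRESS,
                    YarnConfiguration.DEFAULT_RM_WEBAPP_ADDRESS);

            System.out.println("Resource Manager web address: " + webAppAddress);
        }
    }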

Resource Manager Username

The eG agent collects metrics using Hadoop's WebHDFS REST API. While some of these API calls pull metrics from the NameNode, some others get metrics from the resource manager. The YARN Resource Manager Service (RM) is the central controlling authority for resource management and makes resource allocation decisions.

In some Hadoop configurations, a simple authentication user name may be required for running API commands and collecting metrics from the resource manager. When monitoring such Hadoop installations, specify the name of the simple authentication user here. If no such user is available/required, then do not disturb the default value of this parameter, which is none.

Measurements made by the test
Measurement Description Measurement Unit Interpretation

Blocks write rate

Indicates the rate at which this DataNode wrote blocks to off-heap memory.

Blocks/Sec

A high value is indicative of effective usage of the off-heap memory, and is hence desired.

Unsatisfied block rate

Indicates the rate at which block writes to off-heap memory by this DataNode could not be satisfied and fell back to disk.

Blocks/Sec

 

Data write rate

Indicates the rate at which data was written to off-heap memory by this DataNode.

MB/Sec

A high value is indicative of effective usage of the off-heap memory, and is hence desired.

Block read rate

Indicates the rate at which blocks were read from off-heap memory by this DataNode.

Blocks/Sec

If read requests are served from the off-heap memory and not the disk, it results in huge savings in terms of processing overheads. This means that ideally, the value of this measure should be high.

Block evicted rate

Indicates the rate at which blocks were evicted from the off-heap memory by this DataNode.

Blocks/Sec

Blocks need to be evicted from off-heap memory at short intervals so that there is room for new blocks. The more blocks that are evicted every second, the more off-heap memory is released for new entries. This means that a high value is desired for this measure.

Block evicted without read

Indicates the rate at which this DataNode evicted in-memory blocks that were never read from memory.

Blocks/Sec

 

Average block in-memory time

Indicates the average time blocks spent in-memory before this DataNode evicted them.

Milliseconds

Frequently accessed blocks should spend more time in memory, whereas blocks that are seldom used should be evicted.

If blocks spend too much time in memory on average, you may want to tweak the eviction policies to make sure that there is always room in memory for new blocks.

Typically, the following block priorities govern eviction:

  • Single access priority: The first time a block is loaded from HDFS, that block is given single access priority, which means that it will be part of the first group to be considered during evictions. Scanned blocks are more likely to be evicted than blocks that are used more frequently.
  • Multi access priority: If a block in the single access priority group is accessed again, it is assigned multi access priority, which moves it to the second group considered during evictions and makes it less likely to be evicted.
  • In-memory access priority: If the block belongs to a column family that is configured with the in-memory configuration option, its priority is changed to in-memory access priority, regardless of its access pattern. This group is the last group considered during evictions, but its blocks are still not guaranteed to stay in memory. Catalog tables are configured with in-memory access priority.

Block deleted before being persisted to disk

Indicates the rate at which the blocks in the off-heap memory of this DataNode were deleted before being persisted to disk during the measure period.

Blocks/Sec

A very high value for this measure is a cause for concern. This is because blocks that are deleted before being written to disk can cause data loss at the time of a node restart.

Data written to disk by lazy writer

Indicates the rate at which this DataNode wrote data to disk using lazy writer.

MB/Sec

A high value for these measures will reduce the likelihood of data loss during a node restart.

Block write rate to disk by lazy writer

Indicates the rate at which this DataNode wrote blocks to disk using lazy writer.

Blocks/Sec

Average block write time to disk by lazy writer

Indicates the average time this DataNode took to write data to disk using lazy writer.

Milliseconds

Ideally, the value of this measure should be low. A high value or a consistent increase in this value is a cause for concern as it implies that the DataNode is flushing writes to disk very slowly. This can result in data loss at the time of a node restart.
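
Note that the Blocks/Sec and MB/Sec measures listed above are rates. A common way such rates are derived (stated here as a general assumption about the mechanics, not as a description of the eG agent's internals) is to sample a cumulative counter at each test period and divide the change by the elapsed time, as in the sketch below; the counter values used are hypothetical:

    public class RateFromCounter {
        // Converts the change in a cumulative counter (for example, blocks
        // written to off-heap memory) into a per-second rate.
        static double ratePerSecond(long previousCount, long currentCount, long elapsedMillis) {
            if (elapsedMillis <= 0) {
                return 0.0;
            }
            return (currentCount - previousCount) / (elapsedMillis / 1000.0);
        }

        public static void main(String[] args) {
            // Example: 1,200 more blocks written between two samples taken 5 minutes apart.
            System.out.println(ratePerSecond(10_000, 11_200, 5 * 60 * 1000)); // prints 4.0
        }
    }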