How Does eG Enterprise Monitor Hadoop?

Hadoop can be monitored in an agent-based or an agentless manner. Agentless is the recommended approach.

For agentless monitoring of Hadoop, deploy the eG agent on a remote Windows host in the environment. For agent-based monitoring, deploy the eG agent on the NameNode of the cluster being monitored. The NameNode is the master node in the Apache Hadoop HDFS architecture; it maintains and manages the blocks present on the DataNodes (slave nodes). It is a highly available server that manages the file system namespace and controls client access to files.

Regardless of where it is deployed (on a remote Windows host or on the NameNode), the eG agent collects metrics using Hadoop's WebHDFS REST API. Some of these API calls pull metrics from the NameNode, while others pull metrics from the YARN ResourceManager (RM). The RM is the central controlling authority for resource management and makes resource allocation decisions.

To run API commands on the NameNode and pull metrics, the eG agent requires the following:

  • Access to the NameNode: To enable the eG agent to connect to the NameNode, you first need to manage the Hadoop component using the IP address/host name of the NameNode. Secondly, you need to configure the Name Node Web Port parameter of eG tests with the web port of the NameNode. To determine the correct web port of the NameNode, do the following:

    • Open the hdfs-default.xml file in the hadoop/conf/app directory.
    • Look for the dfs.namenode.http-address parameter in the file.
    • This parameter is set to the IP address and base port on which the DFS NameNode web user interface listens. The format of this configuration is: <IP_Address>:<Port_Number>. Given below is a sample configuration: 192.168.10.100:50070
    • Configure the <Port_Number> in the specification as the Name Node Web Port. In the case of the above sample configuration, this will be 50070.
  • Simple authentication user name: In some Hadoop configurations, a simple authentication user name may be required for running API commands and collecting metrics from the NameNode. When monitoring such Hadoop installations, configure the Name Node User Name parameter of the eG tests with the simple authentication user name. If no such user is available or required, leave this parameter at its default value, none.
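
The lookup steps above can also be scripted. The following is a minimal Python sketch, assuming the file follows the standard Hadoop <configuration>/<property> XML layout. The embedded SAMPLE_XML fragment and the helper names read_property and split_host_port are hypothetical, introduced only for illustration; in practice you would read the text of the actual configuration file instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment in the usual Hadoop configuration layout,
# using the sample value from the steps above.
SAMPLE_XML = """<configuration>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>192.168.10.100:50070</value>
  </property>
</configuration>"""


def read_property(xml_text, name):
    """Return the value of the named property, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value = prop.findtext("value")
            return value.strip() if value else None
    return None


def split_host_port(address):
    """Split '<IP_Address>:<Port_Number>' into (host, port)."""
    host, _, port = address.rpartition(":")
    return host, int(port)


address = read_property(SAMPLE_XML, "dfs.namenode.http-address")
host, port = split_host_port(address)
print(host, port)  # the port is the value to use as the Name Node Web Port
```

The same helpers would work for any other property stored in the <IP_Address>:<Port_Number> format.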

To connect to the RM and run API commands on it for metrics collection, the eG agent requires the following:

  • Access to the RM: To enable the eG agent to access the RM, configure the Resource Manager IP and Resource Manager Web Port parameters of the eG tests with the IP address and web port of the RM, respectively. To determine the IP/host name and web port of the resource manager, do the following:

    • Open the yarn-site.xml file in the /opt/mapr/hadoop/hadoop-2.x.x/etc/hadoop directory.
    • Look for the yarn.resourcemanager.webapp.address parameter in the file.
    • This parameter is set to the IP address/host name and web port of the resource manager. The format of this configuration is: <IP_Address_or_Host_Name>:<Port_Number>. Given below is a sample configuration: 192.168.10.100:8080
    • Configure the <IP_Address_or_Host_Name> in the specification as the Resource Manager IP, and the <Port_Number> as the Resource Manager Web Port. In the case of the above sample configuration, these will be 192.168.10.100 and 8080, respectively.

  • Simple authentication user name: In some Hadoop configurations, a simple authentication user name may be required for running API commands and collecting metrics from the RM. When monitoring such Hadoop installations, configure the Resource Manager Username parameter of the eG tests with the simple authentication user name. If no such user is available or required, leave this parameter at its default value, none.
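
To illustrate how the host, web port, and simple authentication user name fit together, here is a minimal Python sketch that assembles request URLs for the NameNode and the RM. It is not the eG agent's implementation, and the exact calls the agent issues are not documented here. The endpoints shown (/webhdfs/v1/ with op=GETCONTENTSUMMARY, and /ws/v1/cluster/metrics) are public Hadoop REST endpoints chosen for illustration. The helper name build_url and the user name hdfs are hypothetical, and passing the user as a user.name query parameter assumes that Hadoop simple (pseudo) authentication is in effect.

```python
from urllib.parse import urlencode


def build_url(host, port, path, user_name="none", **params):
    """Assemble an HTTP URL for a Hadoop REST endpoint.

    A user_name of "none" (the default described above) means that no
    simple authentication user is sent with the request.
    """
    query = dict(params)
    if user_name and user_name.lower() != "none":
        # Assumption: simple authentication reads the user from user.name
        query["user.name"] = user_name
    url = f"http://{host}:{port}{path}"
    return f"{url}?{urlencode(query)}" if query else url


# NameNode request, using the sample address and a hypothetical user "hdfs"
namenode_url = build_url(
    "192.168.10.100", 50070, "/webhdfs/v1/",
    user_name="hdfs", op="GETCONTENTSUMMARY",
)

# RM request, with the user name left at its default of none
rm_url = build_url("192.168.10.100", 8080, "/ws/v1/cluster/metrics")

print(namenode_url)
print(rm_url)
```

Fetching either URL from the host where the eG agent runs (for example, with a browser or curl) is one way to confirm that the configured address, port, and user name are reachable before the tests are configured.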