Hadoop Kerberos Logins Test
Authentication is the first line of security for any system. It is about validating the identity of a user or a process; in its simplest form, it means verifying a username and password.
Hadoop uses Kerberos for authentication and identity propagation. Kerberos is a network authentication protocol that eliminates the need to transmit passwords across the network, which removes the risk of an attacker sniffing them. It uses “tickets” to allow nodes and users to identify themselves.
If Kerberos authentication is not configured properly, authentication will fail every time a DataNode attempts to communicate with the NameNode in the Hadoop cluster. Likewise, clients will be unable to log in to the NameNode to submit application requests. To ensure that users and nodes can access Hadoop storage at all times, administrators should not tolerate such authentication failures, and should promptly check the Kerberos configuration if such failures occur frequently.
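A quick first check of the configuration is to confirm that Hadoop is actually set to use Kerberos. In Hadoop, this is controlled by the `hadoop.security.authentication` property in `core-site.xml`, whose default value is `simple`. The sketch below is a minimal illustration, not part of the eG agent; it assumes you already have the file's contents as a string and checks only this one property, so a `True` result does not prove that the rest of the Kerberos setup (principals, keytabs, KDC reachability) is correct.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def read_hadoop_conf(xml_text: str) -> Dict[str, str]:
    """Parse a Hadoop *-site.xml document into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props: Dict[str, str] = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def kerberos_enabled(core_site_xml: str) -> bool:
    """True if hadoop.security.authentication is set to 'kerberos'.

    Hadoop's default for this property is 'simple', so a missing entry
    is treated as Kerberos being disabled.
    """
    conf = read_hadoop_conf(core_site_xml)
    return conf.get("hadoop.security.authentication", "simple").lower() == "kerberos"


# Illustrative core-site.xml content for a Kerberos-enabled cluster.
SAMPLE_CORE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    print(kerberos_enabled(SAMPLE_CORE_SITE))  # True
```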
At the same time, repeated authentication failures do not always indicate a Kerberos configuration issue. Sometimes, users with malicious intent pose as a trusted identity and attempt to gain access to the data stored in Hadoop, and Kerberos may be foiling such attempts by failing authentication. Administrators need to watch for such attempts as well.
Also, a delay in authentication, however short, can adversely impact user satisfaction with Hadoop storage. To keep Hadoop users happy, administrators should promptly detect such delays, determine their cause, and eliminate them before end-users complain.
The Hadoop Kerberos Logins test helps administrators on all these counts. The test closely tracks login attempts to the NameNode in a Hadoop cluster and alerts administrators to consistent authentication failures. In the process, it sheds light on improper Kerberos configuration or suspicious login activity on the storage. Additionally, the test measures the average time taken by successful and failed logins, pointing administrators to authentication delays that may be spoiling the user experience with Hadoop storage.
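This page does not describe how the eG agent obtains these figures. As one plausible illustration, Hadoop exposes similar counters through the NameNode's `/jmx` servlet in the `UgiMetrics` bean (`Hadoop:service=NameNode,name=UgiMetrics`): according to Hadoop's metrics documentation, `LoginSuccessNumOps` and `LoginFailureNumOps` are cumulative counts, and `LoginSuccessAvgTime` and `LoginFailureAvgTime` are average times in milliseconds. The sketch below assumes that bean layout, plain HTTP access to the NameNode web port (no SPNEGO), and an optional `user.name` query parameter for simple authentication. It turns two samples into per-second rates and average times in seconds; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from dataclasses import dataclass
from typing import Dict, Optional

# JMX bean holding UGI (login) metrics on the NameNode; assumed, verify on your cluster.
UGI_BEAN = "Hadoop:service=NameNode,name=UgiMetrics"


@dataclass
class UgiSample:
    timestamp: float       # when the sample was taken, in seconds
    success_ops: int       # cumulative LoginSuccessNumOps
    success_avg_ms: float  # LoginSuccessAvgTime, in milliseconds
    failure_ops: int       # cumulative LoginFailureNumOps
    failure_avg_ms: float  # LoginFailureAvgTime, in milliseconds


def fetch_ugi_json(host: str, web_port: int, user: Optional[str] = None) -> str:
    """GET the UgiMetrics bean from the NameNode's /jmx servlet over plain HTTP."""
    query = {"qry": UGI_BEAN}
    if user and user != "none":
        query["user.name"] = user  # simple (pseudo) authentication user
    url = f"http://{host}:{web_port}/jmx?{urllib.parse.urlencode(query)}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


def parse_ugi_metrics(jmx_json: str, timestamp: float) -> UgiSample:
    """Extract the four login fields from a /jmx JSON response."""
    beans = json.loads(jmx_json)["beans"]
    bean = next((b for b in beans if b.get("name") == UGI_BEAN), None)
    if bean is None:
        raise KeyError(f"{UGI_BEAN} not found in /jmx response")
    return UgiSample(
        timestamp=timestamp,
        success_ops=int(bean["LoginSuccessNumOps"]),
        success_avg_ms=float(bean["LoginSuccessAvgTime"]),
        failure_ops=int(bean["LoginFailureNumOps"]),
        failure_avg_ms=float(bean["LoginFailureAvgTime"]),
    )


def _delta(prev: int, curr: int) -> int:
    # Counters restart from zero when the NameNode restarts; treat a drop as a reset.
    return curr - prev if curr >= prev else curr


def login_rates(prev: UgiSample, curr: UgiSample) -> Dict[str, float]:
    """Turn two consecutive samples into per-second rates and average times in seconds."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    return {
        "successful_logins_per_sec": _delta(prev.success_ops, curr.success_ops) / elapsed,
        "avg_success_time_sec": curr.success_avg_ms / 1000.0,
        "failed_logins_per_sec": _delta(prev.failure_ops, curr.failure_ops) / elapsed,
        "avg_failure_time_sec": curr.failure_avg_ms / 1000.0,
    }
```

In a polling loop, one would call `parse_ugi_metrics(fetch_ugi_json(host, port), time.time())` once per test period and pass consecutive samples to `login_rates`. Treating a counter drop as a restart is a simplifying assumption.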
Target of the test : A Hadoop cluster
Agent deploying the test : A remote agent
Outputs of the test : One set of results for the Hadoop storage being monitored
Parameter | Description |
---|---|
Test Period | How often the test should be executed. |
Host | The IP address of the NameNode that processes client connections to the cluster. NameNode is the master node in the Apache Hadoop HDFS architecture that maintains and manages the blocks present on the DataNodes (slave nodes). NameNode is a highly available server that manages the file system namespace and controls access to files by clients. |
Port | The port at which the NameNode accepts client connections. By default, the NameNode's client connection port is 8020. |
Name Node Web Port | The eG agent collects metrics using Hadoop's WebHDFS REST API. Some of these API calls pull metrics from the NameNode, while others get metrics from the resource manager. To run API commands on the NameNode and pull metrics, the eG agent needs access to the NameNode's web port. To determine the correct web port of the NameNode, do the following: Configure the <Port_Number> in the specification as the Name Node Web Port. In the case of the above sample configuration, this will be 50070. |
Name Node User Name | In some Hadoop configurations, a simple authentication user name may be required for running API commands and collecting metrics from the NameNode. When monitoring such Hadoop installations, specify the name of the simple authentication user here. If no such user is available or required, leave this parameter at its default value, none. |
Resource Manager IP and Resource Manager Web Port | Some of the eG agent's API calls pull metrics from the resource manager. The YARN Resource Manager Service (RM) is the central controlling authority for resource management and makes resource allocation decisions. To pull metrics from the resource manager, the eG agent first needs to connect to it. For this, configure the IP address/host name of the resource manager and its web port using the Resource Manager IP and Resource Manager Web Port parameters. To determine the IP/host name and web port of the resource manager, do the following: Configure the <IP_Address_or_Host_Name> in the specification as the Resource Manager IP, and the <Port_Number> as the Resource Manager Web Port. In the case of the above sample configuration, the web port will be 8080. |
Resource Manager Username | In some Hadoop configurations, a simple authentication user name may be required for running API commands and collecting metrics from the resource manager. When monitoring such Hadoop installations, specify the name of the simple authentication user here. If no such user is available or required, leave this parameter at its default value, none. |
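In many Hadoop installations, the NameNode web address is set by the `dfs.namenode.http-address` property in `hdfs-site.xml`, and the resource manager web address by `yarn.resourcemanager.webapp.address` in `yarn-site.xml`, both in `<host>:<port>` form. Assuming that layout, the sketch below extracts the host and port to use for the Name Node Web Port and the Resource Manager IP and Web Port parameters. The helper names and sample addresses are hypothetical, and the code does not expand `${...}` variable references that some configurations use.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def get_property(site_xml: str, name: str) -> Optional[str]:
    """Return the <value> of property `name` in a Hadoop *-site.xml, or None."""
    root = ET.fromstring(site_xml)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def split_address(address: str) -> Tuple[str, int]:
    """Split '<host>:<port>' into (host, port), splitting on the last colon."""
    host, _, port = address.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected <host>:<port>, got {address!r}")
    return host, int(port)


def web_endpoint(site_xml: str, prop_name: str, fallback_host: str) -> Tuple[str, int]:
    """Resolve a web host and port from a *-site.xml property (no ${...} expansion)."""
    value = get_property(site_xml, prop_name)
    if value is None:
        raise KeyError(f"{prop_name} is not set in this file")
    host, port = split_address(value)
    # 0.0.0.0 means "listen on all interfaces"; it is not an address to connect to.
    if host == "0.0.0.0":
        host = fallback_host
    return host, port


# Illustrative sample files; the addresses are made up.
SAMPLE_HDFS_SITE = """<configuration>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>192.168.10.100:50070</value>
  </property>
</configuration>"""

SAMPLE_YARN_SITE = """<configuration>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>0.0.0.0:8080</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(web_endpoint(SAMPLE_HDFS_SITE, "dfs.namenode.http-address", "namenode-host"))
    print(web_endpoint(SAMPLE_YARN_SITE, "yarn.resourcemanager.webapp.address", "rm-host"))
```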
Measurement | Description | Measurement Unit | Interpretation |
---|---|---|---|
Successful logins | Indicates the rate of successful logins to the NameNode in the target Hadoop cluster. | Successes/Sec | A high value is desired for this measure. |
Average time for successful logins | Indicates the average time taken to successfully authenticate logins to the NameNode. | Seconds | A high value indicates that the cluster is taking too long to authenticate logins, which can degrade the user experience. You may want to check the Kerberos configuration for irregularities. |
Failed logins | Indicates the rate of failed logins to the NameNode in the target Hadoop cluster. | Failures/Sec | A high value for this measure is a cause for concern, as it indicates frequent authentication failures. Clusters that use Kerberos for authentication have several possible sources of issues, including: These are just some examples, but they can prevent users and services from authenticating and can interfere with the cluster's ability to run and process workloads. The first step whenever an issue emerges is to isolate its actual source by answering basic questions such as these: If all users and multiple services are affected, and if the cluster has not worked at all since Kerberos authentication was integrated, step through all settings in the Kerberos configuration files. However, a configuration issue is not always the reason for a spurt in authentication failures. If this measure registers an unusually high value during certain time windows, it could indicate an attempt to hack the cluster. Take the steps required to protect your cluster against such attacks. |
Average time for failed logins | Indicates the average time taken for authentication to fail. | Seconds | A high value indicates that the cluster is waiting too long before it fails an authentication attempt. You may want to check the Kerberos configuration to find where the bottleneck is. |
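To tell a burst of failures in certain time windows apart from the normal pattern, you could compare each window's Failed logins value against a robust baseline. The sketch below is a hypothetical helper, not an eG feature: it flags windows whose rate exceeds the median plus k times the median absolute deviation (MAD), with k = 5 chosen arbitrarily for illustration.

```python
import statistics
from typing import List, Sequence


def flag_failure_spikes(failure_rates: Sequence[float], k: float = 5.0) -> List[int]:
    """Return the indices of windows whose failed-login rate is unusually high.

    Baseline = median; spread = median absolute deviation (MAD). Both are
    robust, so a few extreme windows do not inflate the baseline and hide
    themselves. k = 5.0 is an arbitrary illustrative choice.
    """
    rates = list(failure_rates)
    if len(rates) < 3:
        return []  # too little history to call anything unusual
    med = statistics.median(rates)
    mad = statistics.median([abs(r - med) for r in rates])
    threshold = med + k * mad
    # With a perfectly flat history (mad == 0), any rise above the median is flagged.
    return [i for i, r in enumerate(rates) if r > threshold]


if __name__ == "__main__":
    history = [0.0, 0.1, 0.0, 0.05, 2.5, 0.1, 0.0]  # Failures/Sec per window (made up)
    print(flag_failure_spikes(history))  # [4]
```

A rate that is consistently high, as in a misconfigured cluster, produces no spikes under this rule; that case is better caught by a fixed threshold on the measure itself.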