Impala Daemon IO Manager Test
The IO manager allocates IO buffers and maintains them in internal queues on the target Impala Daemon server. This test reports the IO buffer memory used for processing queries on the Impala Daemon server and reveals IO buffer memory contentions, if any.
Target of the test: Apache Impala
Agent deploying the test: An internal/remote agent
Outputs of the test: One set of results for the target Impala Daemon server.
Parameter | Description |
---|---|
Test period | How often should the test be executed. |
Host | The IP address of the target server that is being monitored. |
Port | The port number through which Apache Impala communicates. The default port is 25000. |
Impalad URL | Each daemon server has a different endpoint URL. You need to configure the eG agent with the endpoint URL of each daemon, so that the agent can access the URL and pull metrics on the health of the target daemon servers. Specify this URL in the Impalad URL text box. To know how to configure the eG agent with the API Endpoint URLs, refer to the Configuring the eG agent with the API Endpoint URLs topic in Pre-requisites for Monitoring Apache Impala. |
Impalad Authorization type | To access the daemon server, you need to configure Digest authorization. Digest authorization is a method used to verify the identity of a user requesting access to the API Endpoint URL of the daemon server; it uses a combination of a username, a password, and a unique message digest (a type of encrypted code). Specify the authorization type in the Impalad Authorization type text box. If multiple Impalad URLs are mentioned in the Impalad URL text box, then specify a comma-separated list of authorization types, one for each URL. For example, if there are two Impalad URLs, one without authorization and the other protected with username/password authorization, then specify the Impalad Authorization type as no_auth,digest_auth. By default, the Impalad Authorization type is set to no_auth. A sample request that uses digest authorization is sketched after this table. |
Impalad User, Impalad Password and Confirm Password | Once you enable web server access to the eG agent, make sure that 'password authentication' is also enabled. This is needed so that the eG agent is able to access the built-in web server of each daemon in a safe, secure manner. Configure the credentials of such a user in the Impalad User and Impalad Password text boxes, and confirm the password by retyping it in the Confirm Password text box. To know how to configure the eG agent with 'secure' access to the built-in web server, refer to the Configuring the eG agent with 'secure' access to the built-in web server topic in Pre-requisites for Monitoring Apache Impala. By default, the Impalad User parameter is set to none. |
Statestored URL | Each daemon server has a different endpoint URL. You need to configure the eG agent with the endpoint URL of the statestore daemon, so that the agent can access the URL and pull metrics on the health of that daemon. Specify this URL in the Statestored URL text box. To know how to configure the eG agent with the API Endpoint URLs, refer to the Configuring the eG agent with the API Endpoint URLs topic in Pre-requisites for Monitoring Apache Impala. |
Statestored Authorization type | To access the statestore daemon, you need to configure Digest authorization. Digest authorization is a method used to verify the identity of a user requesting access to the API Endpoint URL of the daemon; it uses a combination of a username, a password, and a unique message digest (a type of encrypted code). Specify the authorization type in the Statestored Authorization type text box. If multiple Statestored URLs are mentioned in the Statestored URL text box, then specify a comma-separated list of authorization types, one for each URL. By default, the Statestored Authorization type is set to no_auth. |
Statestored User, Statestored Password and Confirm Password | Once you enable web server access to the eG agent, make sure that 'password authentication' is also enabled. This is needed so that the eG agent is able to access the built-in web server of each daemon in a safe, secure manner. Configure the credentials of such a user in the Statestored User and Statestored Password text boxes, and confirm the password by retyping it in the Confirm Password text box. To know how to configure the eG agent with 'secure' access to the built-in web server, refer to the Configuring the eG agent with 'secure' access to the built-in web server topic in Pre-requisites for Monitoring Apache Impala. By default, the Statestored User parameter is set to none. |
Catalogd URL | Each daemon server has a different endpoint URL. You need to configure the eG agent with the endpoint URL of the catalog daemon, so that the agent can access the URL and pull metrics on the health of that daemon. Specify this URL in the Catalogd URL text box. To know how to configure the eG agent with the API Endpoint URLs, refer to the Configuring the eG agent with the API Endpoint URLs topic in Pre-requisites for Monitoring Apache Impala. |
Catalogd Authorization type | To access the catalog daemon, you need to configure Digest authorization. Digest authorization is a method used to verify the identity of a user requesting access to the API Endpoint URL of the daemon; it uses a combination of a username, a password, and a unique message digest (a type of encrypted code). Specify the authorization type in the Catalogd Authorization type text box. If multiple Catalogd URLs are mentioned in the Catalogd URL text box, then specify a comma-separated list of authorization types, one for each URL. By default, the Catalogd Authorization type is set to no_auth. |
Catalogd User, Catalogd Password and Confirm Password | Once you enable web server access to the eG agent, make sure that 'password authentication' is also enabled. This is needed so that the eG agent is able to access the built-in web server of each daemon in a safe, secure manner. Configure the credentials of such a user in the Catalogd User and Catalogd Password text boxes, and confirm the password by retyping it in the Confirm Password text box. To know how to configure the eG agent with 'secure' access to the built-in web server, refer to the Configuring the eG agent with 'secure' access to the built-in web server topic in Pre-requisites for Monitoring Apache Impala. By default, the Catalogd User parameter is set to none. |
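
The parameters above amount to a plain HTTP request against each daemon's built-in web server. The Python sketch below is illustrative only (not eG agent code): it assumes a hypothetical host name and credentials, and assumes that the daemon's debug web server exposes a /metrics page that returns JSON when queried with the ?json switch, which may vary by Impala version and deployment.

```python
# A minimal sketch, assuming the Impala daemon's built-in web server is reachable
# at the configured Impalad URL (default port 25000), that its /metrics page
# honours the "?json" switch (version-dependent), and that digest authentication
# is enabled as described in the parameter table above.
import requests
from requests.auth import HTTPDigestAuth

IMPALAD_URL = "http://impalad-host:25000"   # hypothetical host name
USER, PASSWORD = "monitor", "secret"        # hypothetical credentials

def fetch_metrics(base_url, user=None, password=None):
    """Pull the daemon's metrics page as JSON, using digest auth if credentials are given."""
    auth = HTTPDigestAuth(user, password) if user else None
    resp = requests.get(f"{base_url}/metrics?json", auth=auth, timeout=10)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    metrics = fetch_metrics(IMPALAD_URL, USER, PASSWORD)
    # Inspect the structure before picking out individual IO manager counters.
    print(type(metrics))
```
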
Measurement | Description | Measurement Unit | Interpretation |
---|---|---|---|
Data reads | Indicates the size of data read by the IO manager. | KB | |
Data writes | Indicates the size of data written to the disk by the IO manager. | KB | |
Cache data reads | Indicates the size of cache data read by the IO manager. | KB | |
Local data read | Indicates the size of local data read by the IO manager. | KB | |
Buffers | Indicates the number of allocated IO buffers; IO buffers are shared by all queries. | Number | |
Open files | Indicates the number of files opened by the IO manager. | Number | |
Unused buffers | Indicates the number of unused IO buffers shared by all queries. | Number | |
Short circuit data reads | Indicates the size of short-circuit data read by the IO manager. | KB | In the Hadoop Distributed File System (HDFS), reads normally go through the DataNode. When the client asks the DataNode to read a file, the DataNode reads that file off the disk and sends the data to the client over a TCP socket. So-called "short-circuit" reads bypass the DataNode, allowing the client to read the file directly. |
Total disk buffers used | Indicates the size of data used by IO buffers. | KB | |
Cache files handle hits | Indicates the number of times the file cache was able to service requests for HDFS file information. | Number | A high value is desired for this measure. |
Cache files handle misses | Indicates the number of times the file cache could not service requests for HDFS file information. | Number | Ideally, this value should be low. A high value indicates an ineffective cache; in other words, the cache may not have adequate entries to service requests. This could be owing to a small cache heap size. You may want to consider resizing the cache heap, so that the file cache is able to accommodate more entries and thus service more requests. A simple way to track cache effectiveness is to compute the hit ratio from this measure and the Cache files handle hits measure, as sketched after this table. |
Total cached file handles | Indicates the number of currently cached HDFS file handles in the IO manager. | Number | |
Currently open files for writing | Indicates the number of HDFS files currently opened for writing. | Number | |
Current hashtable size | Indicates the current size of all allocated hash tables. | KB | A hash table is a data structure that stores data in an associative manner: data is stored in an array format, where each data value has its own unique index value. Access to data becomes very fast when the index of the desired data is known. |
Hedge reads | Indicates the total number of hedge reads. | Number | If a read from a block is slow, the Hadoop Distributed File System (HDFS) client starts up another parallel, 'hedged' read against a different block replica. The result of whichever read returns first is used, and the outstanding read is cancelled. This feature helps in situations where a read occasionally takes a long time rather than when there is a systemic problem. Hedged reads can be enabled for HBase when the HFiles are stored in HDFS. |
Faster hedge reads | Indicates the total number of hedge reads that returned faster than the original read. | Number | |
Total scan ranges | Indicates the total number of scan range reads. | Number | A high value is desired for this measure, since the maximum length of a scan range interacts with the number of HDFS blocks in the table to determine how many CPU cores across the cluster are involved in processing a query (each core processes one scan range). A lower maximum scan range length can sometimes increase parallelism if you have unused CPU capacity, but a too-small value can limit query performance, because each scan range involves extra overhead. |
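
To make the interpretation of the Cache files handle hits and Cache files handle misses measures concrete, the sketch below computes a file handle cache hit ratio from the daemon's metrics output, reusing the fetch_metrics() function from the earlier example. The metric key names used here are assumptions and should be verified against your daemon's /metrics page, since names can vary across Impala versions.

```python
# A minimal sketch, assuming fetch_metrics() from the earlier example and that the
# daemon exposes hit/miss counters under the keys below (version-dependent; check
# the /metrics page of your Impala daemon for the exact names).
HIT_KEY = "impala-server.io-mgr.cached-file-handles-hit-count"    # assumed key name
MISS_KEY = "impala-server.io-mgr.cached-file-handles-miss-count"  # assumed key name

def flatten(metrics):
    """Walk the (possibly nested) metrics JSON and return {name: value} for leaf metrics."""
    found = {}
    def walk(node):
        if isinstance(node, dict):
            name, value = node.get("name"), node.get("value")
            if name is not None and value is not None:
                found[name] = value
            for child in node.values():
                walk(child)
        elif isinstance(node, list):
            for child in node:
                walk(child)
    walk(metrics)
    return found

def file_handle_hit_ratio(metrics):
    """Return hits / (hits + misses), or None if either counter is missing or both are zero."""
    flat = flatten(metrics)
    hits, misses = flat.get(HIT_KEY), flat.get(MISS_KEY)
    if hits is None or misses is None or (hits + misses) == 0:
        return None
    return hits / (hits + misses)

# Example usage (with the fetch sketch shown after the parameter table):
#   ratio = file_handle_hit_ratio(fetch_metrics(IMPALAD_URL, USER, PASSWORD))
#   print(f"file handle cache hit ratio: {ratio:.1%}" if ratio is not None else "counters not found")
```

A ratio close to 1 corresponds to the desirable pattern described above: a high hit count with few misses. A persistently low ratio points to the same remedy suggested in the interpretation column, namely resizing the cache so that it can hold more file handle entries.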