Hive Cache Test

Direct disk accesses are expensive operations that can increase processing overheads and eventually degrade the overall performance of the Apache Hive data warehouse. The primary focus of administrators is therefore to improve disk cache usage, so that direct disk accesses are kept to a minimum. By closely monitoring the requests to the Apache Hive data warehouse and reporting the fraction of requests that were serviced by the disk cache, this test reveals whether or not the disk cache has been utilized effectively and helps assess the impact of this usage on the processing overheads of the data warehouse. From the metrics reported by this test, administrators can also figure out whether the disk cache needs any further fine-tuning.

Target of the test : Apache Hive

Agent deploying the test : An internal/remote agent

Outputs of the test : One set of results for the target Apache Hive

Configurable parameters for the test
Parameter Description

Test period

How often the test should be executed.

Host

The IP address of the target server that is being monitored.

Port

The port number through which Apache Hive communicates. The default port is 10002.

SSL

By default, the SSL flag is set to False, indicating that the target Apache Hive is not SSL-enabled. To enable the test to connect to an SSL-enabled Apache Hive, set the SSL flag to True.

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time the test runs, and also every time the test detects a problem. You can modify this frequency if required. To disable the detailed diagnosis capability for this test, specify none against DD Frequency.
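
As an illustration of how the Host, Port, and SSL parameters relate to one another, the sketch below combines them into a base URL. It assumes that the target is reached over HTTP or HTTPS, which this page does not state; the function name build_base_url is hypothetical and is not part of the test or of Apache Hive.

```python
def build_base_url(host: str, port: int = 10002, ssl: bool = False) -> str:
    """Combine the Host, Port, and SSL parameters into a base URL.

    Assumption: the target is reached over HTTP(S). The function name and
    signature are illustrative only.
    """
    if not host:
        raise ValueError("host must not be empty")
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    # SSL = True selects https; the default (False) selects plain http.
    scheme = "https" if ssl else "http"
    return f"{scheme}://{host}:{port}"


# Default settings: SSL flag False, port 10002.
print(build_base_url("192.168.10.1"))
# SSL-enabled target on the same port.
print(build_base_url("192.168.10.1", ssl=True))
```

The only difference between the two calls is the scheme, which mirrors the effect of switching the SSL flag from False to True.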

Measurements made by the test
Cache hits

Description : Indicates the number of requests serviced by the disk cache during the last measurement period.

Measurement Unit : Number

Interpretation : A high value is desired for this measure.

Cache miss

Description : Indicates the number of requests that were not serviced by the disk cache during the last measurement period.

Measurement Unit : Number

Interpretation : A low value is desired for this measure.

Cache hit ratio

Description : Indicates the percentage of requests that were serviced by the disk cache.

Measurement Unit : Percent

Interpretation : A high ratio of hits is ideal. A very low ratio indicates that a majority of requests have been served by direct disk accesses only. This has an adverse impact on the overall health of the data warehouse.
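
The relationship between the three measures can be sketched as follows. This is a minimal illustration that assumes the Cache hit ratio is derived as hits divided by total requests (hits plus misses) for the measurement period; this page does not state the exact formula the test uses, and the function name cache_hit_ratio is hypothetical.

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Return the percentage of requests serviced by the cache.

    Assumption: ratio = hits / (hits + misses) * 100. The formula the
    test actually applies is not documented on this page.
    """
    if hits < 0 or misses < 0:
        raise ValueError("hits and misses must be non-negative")
    total = hits + misses
    if total == 0:
        # No requests in the period; return 0.0 instead of dividing by zero.
        return 0.0
    return 100.0 * hits / total


# 900 requests serviced by the cache and 100 by direct disk access.
print(cache_hit_ratio(900, 100))  # 90.0
```

Under this assumption, a period with 900 cache hits and 100 cache misses yields a ratio of 90 percent, whereas 100 hits and 900 misses yields 10 percent, the kind of low ratio that indicates most requests were served by direct disk accesses.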