Monitoring Apache Hive

eG Enterprise offers a special-purpose monitoring model for Apache Hive that tracks the status and overall performance of the target Apache Hive data warehouse.

Figure 1 depicts the layer model of Apache Hive.

Figure 1: Layer model for Apache Hive

Each layer in Figure 1 is mapped to a set of tests that report critical statistics on the performance of the target Apache Hive. Using the metrics reported by these tests, administrators can find accurate answers to the following performance queries:

  • Are the incoming connections sufficient to handle subsequent connection requests?

  • Are there too many open operations in the Apache Hive data warehouse?

  • Is the connection pool over-utilized?

  • Is there any shortage of connections in the pool?

  • Does the connection pool take too long to grant a connection to the client?

  • Are prepared statements executed as soon as they are prepared?

  • Is the prepared statement cache inadequately sized?

  • Is the disk cache being utilized effectively by the Apache Hive data warehouse?

  • Are there too many abandoned user sessions in the target Apache Hive data warehouse?

  • Are Hive queries performing too many physical reads per execution?

  • Is the number of Hive queries that have failed or are awaiting compilation high?

  • Were any poorly performing API calls detected when calls were made to the databases/tables/functions/objects in the Apache Hive data warehouse?

  • Is the count of databases/partitions/tables in the Metastore of the target Apache Hive data warehouse adequate?

  • Are the sizes of the asynchronous thread pool and the asynchronous operation queue at the required levels?

  • Are the tasks executed by the MapReduce/Spark/Tez engines imposing the maximum load on the Apache Hive data warehouse?

  • Are the SQL/API call operations of the target Apache Hive completed or still pending?

The Operating System, Application Processes, and TCP layers are discussed in detail in the Monitoring Unix and Windows Servers document; the tests mapped to the Network layer in the Monitoring Cisco Router document; the tests mapped to the JVM layer in the Monitoring Java Applications document; and the HTTP test in the Hive Service layer in the Apache Web Server document. The sections that follow therefore discuss only the remaining layers in detail.