Google Cloud BigQuery Test

BigQuery is a fully managed enterprise data warehouse that helps you manage and analyze your data with built-in features like machine learning, geospatial analysis, and business intelligence. BigQuery's serverless architecture lets you use SQL queries to answer your organization's biggest questions with zero infrastructure management. BigQuery's scalable, distributed analysis engine lets you query terabytes in seconds and petabytes in minutes.

To group and manage resources within BigQuery service, Google Cloud offers fundamental organizational units called projects. Each project is a unique container for various components and resources related to your data and analytics workloads. Each project is associated with a specific Google Cloud Platform (GCP) billing account or organization. Resources (slots) and operations within each project are billed according to the usage by the linked billing account. BigQuery projects are where SQL queries are executed to analyze and process data stored in tables. In addition, various jobs in BigQuery, such as query execution, data ingestion, and exports are preformed within each project.

To ensure a satisfactory user experience with a BigQuery project, administrators should first measure how quickly the project processes queries and how well the slots are used within the project. This will reveal query processing bottlenecks and overutilization/underutilization of the slots, if any. Since poor slot usage and unavailability of slots are common causes for slow execution of queries, administrators should then proceed to monitor slot usage within the project. In the process, if the administrators find that the SQL queries are executed very slowly or are not serviced as expected, then they should investigate the reason for the same and fix it, so that queries execute faster. In addition, the administrators should also ascertain how much amount of data is scanned, uploaded or ingested while processing the SQL queries. To monitor all the aforesaid performance aspects and capture anomalies (if any) on-the-fly, the administrators can use the Google Cloud BigQuery test.

This test monitors the BigQuery project and reports the total number of datasets and tables in the datasets in the project. If query processing is slow, then the administrators can use the slot usage metrics reported by the test to determine if poor slot usage is the reason for query performance to suffer. In the process, this test also reports average time taken by the queries that executed successfully which helps the administrators to find out whether the queries are executed within the expected time duration or not.

Note:

This test will report metrics only if the BigQuery API is enabled in the target Google Cloud project. If you want to know how to turn on a service API in the Google Cloud project, refer Enabling Service APIs.

Target of the test : Google Cloud

Agent deploying the test : A remote agent

Outputs of the test : One set of results for the Google Cloud project being monitored

Configurable parameters for the test
Parameters Description

Test Period

How often should the test be executed.

Host

The host for which the test is to be configured.

Private Keyfile Name

To connect to the Google Cloud Project in which the services are running, the eG agent requires a private key of a service account with Compute Viewer, Monitoring Viewer, and Cloud Asset Viewer roles in the target project. If a service account pre-exists in the project, then you can download the private key as a JSON file. Save this JSON file in the <eG_Install_Dir>/agent/lib folder and provide the name of that file against this parameter. However, if no such service account pre-exists, you will have to create one for monitoring the project. To know how to create a service account and download its private key, refer to How does eG Enterprise Monitor Google Cloud?.

Detailed Diagnosis

To make diagnosis more efficient and accurate, the eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
Measurements made by the test
Measurement Description Measurement Unit Interpretation

Total datasets

Indicates the total number of datasets available in the BigQuery project.

Number

Datasets are top-level containers that are used to organize and control access to your tables and views. Since a table or view must belong to a dataset, you should create at least one dataset before loading data into BigQuery.

Use the detailed diagnosis of this measure to know the names and type of datasets available in the project, GEO locations and Entity tags of the datasets, number of days within which the tables created for the datasets will expire, time stamp at which the datasets were created and modified.

Total tables in datasets

Indicates the total number of tables created in the datasets of the BigQuery project.

 

A BigQuery table contains individual records organized in rows. Each record is composed of columns (also called fields).

Every table is defined by a schema that describes the column names, data types, and other information. You can specify the schema of a table when it is created, or you can create a table without a schema and declare the schema in the query job or load job that first populates it with data.

Total flight jobs

Indicates the total number of jobs that are currently running in the project.

Number

Jobs are actions that BigQuery runs on your behalf to load data, export data, query data, or copy data.

Total flight queries

Indicates the total number of queries that are currently executed in the project.

Number

 

Query execution count

Indicates the total number of queries that are executed in the project.

Number

 

Query execution times

Indicates the average time taken by the queries that executed successfully.

Seconds

 

Scanned bytes

Indicates the number of bytes scanned during query execution.

Bytes

Scanned bytes are the number of bytes that are read or processed during query execution. BigQuery stores data in a highly compressed columnar format called Capacitor, and when you run a query, it needs to scan and read the relevant columns to process the query and return results. This measure is a good indicator of the volume of data read during this process.

Ideally, the value of this measure should be low. A high value for this measure can cause serious impact on the cost and performance of BigQuery queries. You are billed not only for the storage of your data but also for the amount of data you scan when you run queries. Therefore, minimizing the number of bytes scanned during the query execution is important for cost optimization.

Scanned bytes billed

Indicates the number of scanned bytes that are billed for query execution.

Number

A low value is preferred for this measure.

Statement scanned bytes

Indicates the number of bytes scanned during statement execution.

Bytes

Ideally, the value of this measure should be low.

Statement scanned bytes billed

Indicates the number of scanned bytes that are billed for statement execution.

Number

 

Total slots allocated

Indicates the total number of slots allocated for the project.

Number

A BigQuery slot is a virtual CPU used by BigQuery to execute SQL queries. During query execution, BigQuery automatically calculates how many slots a query requires, depending on the query size and complexity.

The slots play a crucial role in managing and optimizing query performance and concurrency.

Slots used by project

Indicates the number of slots that are being used by the project to process the queries.

Number

A low value is desired for this measure.

Slots used by project and job type

Indicates the number of slots used by the project to process different types of query jobs such as interactive, batch, or large queries.

Number

A low value is desired for this measure. If the value of this measure increases suddenly/gradually, it indicates that the slots in the project slot pool are depleting rapidly due to concurrent queries and upcoming queries may have to wait for slots to become available.

Slots used by project in reservation

Indicates the number of slots reserved for use in the project.

Number

Organizations reserve a fixed number of slots for future use to maintain consistent query performance and optimize costs for their query workloads. However, it is important to choose the right number and type of reserved slots based on specific usage patterns, as overcommitting or underutilizing reserved slots can impact your costs and efficiency.

Slots used across projects in reservation

Indicates the number of reserved slots used by the project.

Number

 

Total stored bytes

Indicates the number of bytes stored in the datasets and tables of the project.

Bytes

This measure is a good indicator of the volume of data that is stored and maintained within BigQuery's storage infrastructure.

Uploaded bytes

Indicates the number of bytes ingested or loaded into the project.

Bytes

A high value for this measure can impact data transfer costs and overall storage footprint of the project.

Uploaded bytes billed

Indicates the number of uploaded bytes that are billed.

Number

 

Uploaded rows

Indicates the number of rows of data that have been ingested or loaded into the project

Number

 

BI Engine query fallback count

Indicates the number of queries that were not accelerated by BI Engine and had to fall back to standard query processing during the last measurement period.

Number

 

Slots assigned

Indicates the number of slots that are assigned to the project.

Number

 

Slots capacity committed

Indicates the number of slots committed for a specific period.

Number

BigQuery offers annual or three-year commitment plans.

The minimum commitment size is 100 slots, and commitments are available in 100-slot increments, up to your slot quota. There is no limit on the number of commitments that you can create. You are charged from the moment your commitment purchase is successful.

Slots max assigned

Indicates the maximum number of slots that can be assigned to the project.

Number

 

Uploaded bytes using insertall API

Indicates the number of bytes uploaded to the project via insertall API.

Bytes

 

Uploaded rows using insertall API

Indicates the number of rows uploaded to the project via insertall API.

Number