AWS API Gateway Test

Amazon API Gateway is an AWS service for creating, publishing, maintaining, monitoring, and securing REST, HTTP, and WebSocket APIs at any scale. API developers can create APIs that access AWS or other web services, as well as data stored in the AWS Cloud. As an API Gateway API developer, you can create APIs for use in your own client applications. Or you can make your APIs available to third-party app developers.

An API is invoked using a URL. This URL includes an API stage, which is a logical reference to a lifecycle state of your API (for example, dev, prod, beta, or v2). API stages are identified by their API ID and stage name. Each stage is a named reference to a deployment of the API and is made available for client applications to call.
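
For instance, a REST API with a hypothetical API ID of a1b2c3d4e5, deployed to a prod stage in the us-east-1 region, would typically be invoked through a URL of the form https://a1b2c3d4e5.execute-api.us-east-1.amazonaws.com/prod/{resource-path}; the API ID, region, and stage name used here are placeholders.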

If an API created using the API Gateway service throws many errors or is sluggish in responding to client requests, the corresponding application users will be left unhappy. To improve the user experience of the APIs they create, API/app developers should be able to promptly detect errors and latencies in their APIs and resolve them before application users notice and complain. This is now possible using the AWS API Gateway test.

By default, this test monitors each API that is created using the API Gateway service, and reports the errors and latencies observed in each API. This way, API developers can figure out if any of the APIs they have created are experiencing issues. To know which specific deployment of the API is troublesome, you can optionally configure this test to report metrics by API_ID:Stage, instead of just the API ID.
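
To illustrate what this distinction means at the metric level, the following boto3 (Python) sketch queries the underlying Amazon CloudWatch metrics in the AWS/ApiGateway namespace, once for an API as a whole and once narrowed to a single stage. The API name, stage, and region are hypothetical placeholders; this only illustrates how such metrics are scoped, not the test's own query logic.

    import boto3
    from datetime import datetime, timedelta, timezone

    # Hypothetical values; replace with your own API, stage and region.
    # REST APIs are dimensioned by ApiName in CloudWatch; HTTP APIs use ApiId instead.
    REGION = "us-east-1"
    API_NAME = "orders-api"
    STAGE = "prod"

    cloudwatch = boto3.client("cloudwatch", region_name=REGION)
    end = datetime.now(timezone.utc)
    start = end - timedelta(minutes=15)

    common = dict(Namespace="AWS/ApiGateway", MetricName="Count",
                  StartTime=start, EndTime=end, Period=300, Statistics=["Sum"])

    # Request count for the API as a whole (all stages together).
    api_wide = cloudwatch.get_metric_statistics(
        Dimensions=[{"Name": "ApiName", "Value": API_NAME}], **common)

    # The same metric narrowed to one deployment, i.e. reported per API:stage.
    per_stage = cloudwatch.get_metric_statistics(
        Dimensions=[{"Name": "ApiName", "Value": API_NAME},
                    {"Name": "Stage", "Value": STAGE}], **common)

    print(sum(p["Sum"] for p in api_wide["Datapoints"]),
          sum(p["Sum"] for p in per_stage["Datapoints"]))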

Additionally, the test provides diagnostics on API cache usage, so that API developers can figure out if the poor responsiveness is owing to improper cache configuration.

Target of the test: Amazon Cloud

Agent deploying the test: A remote agent

Output of the test: One set of results for each API created in every AWS region in the configured AWS user account.

Configurable parameters for the test
Parameter Description

Test Period

How often should the test be executed.

Host

The host for which the test is to be configured.

Access Type

eG Enterprise monitors the AWS cloud using the AWS API. By default, the eG agent accesses the AWS API using a valid AWS account ID, which is assigned a special role that is specifically created for monitoring purposes. Accordingly, the Access Type parameter is set to Role by default. Furthermore, to enable the eG agent to use this default access approach, you will have to configure the eG tests with a valid AWS Account ID to Monitor and the special AWS Role Name you created for monitoring purposes.

Some AWS cloud environments, however, may not support the role-based approach. Instead, they may allow cloud API requests only if such requests are signed by a valid Access Key and Secret Key. Therefore, when monitoring such a cloud environment, you should change the Access Type to Secret. Then, you should configure the eG tests with a valid AWS Access Key and AWS Secret Key.

Note that the Secret option may not be ideal when monitoring high-security cloud environments. Such environments may issue a security mandate requiring administrators to change the Access Key and Secret Key frequently. Because of this churn in the key-based approach, Amazon recommends the role-based approach for accessing the AWS API.
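
To make the difference between the two access types concrete, here is a minimal boto3 (Python) sketch of both authentication paths, assuming a hypothetical account ID, role name, and region; it is only an illustration, not how the eG agent itself is implemented. The role-based path additionally assumes that the machine running the code already has base credentials permitted to assume the monitoring role.

    import boto3

    # ---- Role-based access (Access Type = Role) ----
    # Hypothetical account ID and role name; use the AWS Account ID to Monitor
    # and AWS Role Name values configured for the test.
    ACCOUNT_ID = "123456789012"
    ROLE_NAME = "eg-monitoring-role"

    creds = boto3.client("sts").assume_role(
        RoleArn=f"arn:aws:iam::{ACCOUNT_ID}:role/{ROLE_NAME}",
        RoleSessionName="apigateway-monitoring",
    )["Credentials"]

    cloudwatch = boto3.client(
        "cloudwatch",
        region_name="us-east-1",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )

    # ---- Key-based access (Access Type = Secret) ----
    cloudwatch = boto3.client(
        "cloudwatch",
        region_name="us-east-1",
        aws_access_key_id="<AWS Access Key>",
        aws_secret_access_key="<AWS Secret Key>",
    )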

AWS Account ID to Monitor

This parameter appears only when the Access Type parameter is set to Role. Specify the AWS Account ID that the eG agent should use for connecting and making requests to the AWS API. To determine your AWS Account ID, follow the steps below; a programmatic alternative is sketched after Figure 1:

  • Log in to the AWS management console with your credentials.

  • Click your IAM user/role at the top right corner of the AWS Console. A drop-down menu containing the Account ID appears (see Figure 1).

    Figure 1: Identifying the AWS Account ID
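
Alternatively, if you have AWS credentials configured on a workstation, the account ID can be retrieved programmatically; the sketch below uses boto3's STS get_caller_identity call (the AWS CLI equivalent is aws sts get-caller-identity):

    import boto3

    # Prints the 12-digit AWS Account ID of the credentials currently in use.
    print(boto3.client("sts").get_caller_identity()["Account"])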

AWS Role Name

This parameter appears when the Access Type parameter is set to Role. Specify the name of the role that you have specifically created on the AWS cloud for monitoring purposes. The eG agent uses this role and the configured Account ID to connect to the AWS Cloud and pull the required metrics. To know how to create such a role, refer to Creating a New Role.

AWS Access Key, AWS Secret Key, Confirm AWS Access Key, Confirm AWS Secret Key

These parameters appear only when the Access Type parameter is set to Secret. To monitor an Amazon instance, the eG agent has to be configured with the access key and secret key of a user with a valid AWS account. For this purpose, we recommend that you create a special user on the AWS cloud, obtain the access and secret keys of this user, and configure this test with these keys. The procedure for this has been detailed in the Obtaining an Access key and Secret key topic. Make sure you reconfirm the access and secret keys you provide here by retyping them in the corresponding Confirm text boxes.

Proxy Host and Proxy Port

In some environments, all communication with the AWS cloud and its regions could be routed through a proxy server. In such environments, you should make sure that the eG agent connects to the cloud via the proxy server and collects metrics. To enable metrics collection via a proxy, specify the IP address of the proxy server and the port at which the server listens against the Proxy Host and Proxy Port parameters. By default, these parameters are set to none, indicating that the eG agent is not configured to communicate via a proxy.

Proxy User Name, Proxy Password, and Confirm Password

If the proxy server requires authentication, specify a valid user name and password against the Proxy User Name and Proxy Password parameters, respectively. Then, confirm the password by retyping it in the Confirm Password text box. By default, these parameters are set to none, indicating that the proxy server does not require authentication.
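
For reference, the sketch below shows how a proxy endpoint and proxy credentials of this kind are typically wired into an AWS API client using boto3 (Python); the proxy address, port, and credentials shown are placeholders, and this is an illustration rather than the eG agent's own implementation.

    import boto3
    from botocore.config import Config

    # Hypothetical values; substitute what you configured against Proxy Host,
    # Proxy Port, Proxy User Name and Proxy Password. Credentials embedded in
    # the URL are used for basic proxy authentication.
    proxy_url = "http://proxyuser:proxypassword@192.168.10.5:3128"

    cfg = Config(proxies={"http": proxy_url, "https": proxy_url})
    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1", config=cfg)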

Proxy Domain and Proxy Workstation

If a Windows NTLM proxy is to be used, you will additionally have to configure the Windows domain name and Windows workstation name against the Proxy Domain and Proxy Workstation parameters. If the environment does not support a Windows NTLM proxy, set these parameters to none.

Exclude Region

Here, you can provide a comma-separated list of region names or patterns of region names that you do not want to monitor. For instance, to exclude regions with names that contain 'east' and 'west' from monitoring, your specification should be: *east*,*west*
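
The exact matching logic used by the eG agent is not documented here, but such wildcard patterns are conventionally interpreted as shown in the following Python sketch, which uses fnmatch to filter the region list returned by EC2; the patterns and region used are examples only.

    import fnmatch

    import boto3

    # Patterns as they would be entered against Exclude Region, e.g. "*east*,*west*".
    exclude_patterns = "*east*,*west*".split(",")

    ec2 = boto3.client("ec2", region_name="us-east-1")
    all_regions = [r["RegionName"] for r in ec2.describe_regions()["Regions"]]

    # Keep only the regions that match none of the exclusion patterns.
    monitored = [region for region in all_regions
                 if not any(fnmatch.fnmatch(region, pattern)
                            for pattern in exclude_patterns)]
    print(monitored)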

Exclude Instance

This parameter applies only to the AWS - Instance Connectivity, AWS - Instance Resources, and AWS - Instance Uptime tests. In the Exclude Instance text box, provide a comma-separated list of instance names or instance name patterns that you do not wish to monitor. For example: i-b0c3e*,*7dbe56d. By default, this parameter is set to none.

Measures reported by the test:

Measurement

Description

Measurement Unit

Interpretation

4xx errors

Indicates the number of client-side errors in this API/stage.

For the Summary descriptor, this measure reports the total number of client-side errors encountered across APIs/stages.

Number

Errors in the range of 400 to 499 usually point to a problem with the API client. Some of the common client-side errors encountered by the API gateway include the following:

  • 400 (Bad Request): This error is often due to invalid JSON, missing fields, wrong data types, or invalid characters in the API request.

  • 403 (Access Denied): This error, also known as “Forbidden,” usually arises from permission issues related to the IAM role of the API Gateway or AWS Cognito.

  • 404 (Not Found): This error typically indicates an incorrect URL or an attempt to access non-existent data in the services integrated with API Gateway.

  • 409 (Conflict): This error indicates a conflict with a resource’s current state, often related to a caller’s reference.

  • 429 (Limit Exceeded): This error signals that you’ve exceeded an API quota, often due to the usage limit of an API key.

Ideally, the value of this measure should be 0. A high value indicates that many client-side errors occurred. In this case, compare the value of this measure across descriptors to identify the exact API/stage that is experiencing the maximum number of client-side errors.

Check out the CloudWatch Dashboard or CloudWatch logs to view the errors and figure out how to resolve them.
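
If you want to cross-check these numbers outside the eG console, the underlying CloudWatch metric for client-side errors is 4XXError in the AWS/ApiGateway namespace. The boto3 (Python) sketch below sums it for a hypothetical REST API and stage over the last hour; substituting 5XXError gives the corresponding server-side error count.

    import boto3
    from datetime import datetime, timedelta, timezone

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=1)

    # Client-side errors for a hypothetical REST API/stage over the last hour.
    # Using MetricName="5XXError" in the same call returns server-side errors.
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/ApiGateway",
        MetricName="4XXError",
        Dimensions=[{"Name": "ApiName", "Value": "orders-api"},
                    {"Name": "Stage", "Value": "prod"}],
        StartTime=start, EndTime=end, Period=300, Statistics=["Sum"],
    )
    for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], int(point["Sum"]))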

5xx errors

Indicates the number of server-side errors in this API/stage.

For the Summary descriptor, this measure reports the total number of server-side errors encountered across APIs/stages.

Number

Errors in the range of 500 to 599 mean something is wrong on the server side.

Some of the common server-side errors encountered by the API gateway include the following:

  • 500 (Internal Server Error): This is a catch-all error and could imply buggy code, inconsistent error mapping, or other technical issues.

  • 502 (Bad Gateway): This error suggests a problem with the service integrated with your API Gateway, like connection issues or invalid response structures.

  • 503 (Service Unavailable): A common error that signifies the service took too long to respond.

  • 504 (Endpoint Request Timed-out): This error could indicate DNS or network problems, often arising when an integrated service isn’t running or the IP/hostname is incorrect.

Ideally, the value of this measure should be 0. A high value indicates that many server-side errors occurred. In this case, compare the value of this measure across descriptors to identify the exact API/stage that is experiencing the maximum number of server-side errors.

Check out the CloudWatch Dashboard or CloudWatch logs to view the errors and figure out how to resolve them.

Requests served from API cache

Indicates the number of requests to this API/stage that were serviced from the API cache.

For the Summary descriptor, this measure reports the total number of requests, across APIs/stages, that were served by the API cache.

Number

You can enable API caching in Amazon API Gateway to cache your endpoint's responses. With caching, you can reduce the number of calls made to your endpoint and also improve the latency of requests to your API.

If the value of this measure is much higher than that of the Requests served from backend measure, it means that caching has effectively minimized calls to endpoints. On the other hand, if the value of this measure is significantly lower than that of the Requests served from backend measure, it is indicative of poor caching.

One of the common causes of ineffective cache usage is improper configuration of the cache capacity. When you enable caching, you must choose a cache capacity.

If the cache is under-sized - i.e., if the cache capacity chosen is not commensurate with the workload of the API - then the API cache will not be able to service many API requests. A majority of the requests will hence be routed directly to the backend for processing. As a result, latencies will increase. To avoid this, you may want to consider right-sizing the cache if you observe that the value of this measure is alarmingly lower than the value of the Requests served from backend measure.

Requests served from backend

Indicates the number of requests to this API/stage that were serviced from the backend and not the cache.

For the Summary descriptor, this measure reports the total number of requests, across APIs/stages, that were served by the backend.

Number

You can enable API caching in Amazon API Gateway to cache your endpoint's responses. With caching, you can reduce the number of calls made to your endpoint and also improve the latency of requests to your API.

If the value of this measure is much lower than that of the Requests served from API cache measure, it means that caching has effectively minimized calls to the backend. On the other hand, if the value of this measure is significantly higher than that of the Requests served from API cache measure, it is indicative of poor caching.

One of the common causes of ineffective cache usage is improper configuration of the cache capacity. When you enable caching, you must choose a cache capacity.

If the cache is under-sized - i.e., if the cache capacity chosen is not commensurate with the workload of the API - then the API cache will not be able to service many API requests. A majority of the requests will hence be routed directly to the backend for processing. As a result, latencies will increase. To avoid this, you may want to consider right-sizing the cache if you observe that the value of this measure is alarmingly higher than the value of the Requests served from API cache measure.
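
In CloudWatch terms, requests served from the API cache and from the backend correspond to the CacheHitCount and CacheMissCount metrics. The boto3 (Python) sketch below pulls both for a hypothetical REST API/stage and derives a cache hit ratio, which is a convenient single number for judging whether the cache is doing useful work; the API name, stage, and region are placeholders.

    import boto3
    from datetime import datetime, timedelta, timezone

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=1)

    # Hypothetical API/stage; CacheHitCount ~ requests served from the API cache,
    # CacheMissCount ~ requests served from the backend.
    dims = [{"Name": "ApiName", "Value": "orders-api"},
            {"Name": "Stage", "Value": "prod"}]

    resp = cloudwatch.get_metric_data(
        MetricDataQueries=[
            {"Id": "hits",
             "MetricStat": {"Metric": {"Namespace": "AWS/ApiGateway",
                                       "MetricName": "CacheHitCount",
                                       "Dimensions": dims},
                            "Period": 3600, "Stat": "Sum"}},
            {"Id": "misses",
             "MetricStat": {"Metric": {"Namespace": "AWS/ApiGateway",
                                       "MetricName": "CacheMissCount",
                                       "Dimensions": dims},
                            "Period": 3600, "Stat": "Sum"}},
        ],
        StartTime=start, EndTime=end,
    )
    totals = {r["Id"]: sum(r["Values"]) for r in resp["MetricDataResults"]}
    served = totals.get("hits", 0) + totals.get("misses", 0)
    if served:
        print(f"Cache hit ratio: {totals['hits'] / served:.1%}")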

Total API requests

Indicates the total number of requests received by this API/stage.

For the Summary descriptor, this measure reports the total number of requests received by the API Gateway service, across APIs/stages.

Number

This is a good indicator of the workload of an API/stage.

Integration latency

Indicates the time between when this API/stage relays a request to the backend and when it receives a response from the backend.

Seconds

Ideally, the value of this measure should be low. A consistent increase in this value could be indicative of processing bottlenecks in the backend.

Compare the value of this measure across APIs/stages to know which API/stage is receiving responses very slowly from the backend.

Latency

Indicates the time between when this API/stage receives a request from a client and when it returns a response to the client.

Seconds

Ideally, the value of this measure should be low. A consistent increase in this value could be indicative of latencies in an API/stage.

Compare the value of this measure across APIs/stages to know which API/stage is the most lethargic when processing requests. To diagnose what is causing this slowness, first compute the difference between the values of the Latency and Integration latency measures for the problematic API/stage. This difference represents the other latencies that are impacting the responsiveness of the API/stage. Now, compare this difference with the value of the Integration latency measure to determine whether the slowness is owing to a latent backend or to other processing latencies.

If a latent backend is impacting the responsiveness of an API/stage, you may want to enable API caching and adequately size the cache. This will minimize backend processing and optimize cache processing, which will eventually improve latencies.
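
As a worked illustration of this diagnosis, using hypothetical per-interval averages:

    # Hypothetical averages for a slow API/stage, both expressed in seconds.
    latency = 2.4              # Latency: client request to client response
    integration_latency = 2.1  # Integration latency: backend round trip

    overhead = latency - integration_latency   # gateway-side and other latencies
    print(f"Gateway-side overhead: {overhead:.2f}s, backend time: {integration_latency:.2f}s")
    # Since 2.1s of the 2.4s is spent waiting on the backend, a latent backend
    # (rather than gateway-side processing) is the likely cause; enabling and
    # right-sizing the API cache would be the natural next step here.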