AWS Lambda Test

AWS Lambda is a compute service that lets you run code without provisioning or managing servers. In other words, AWS Lambda runs your code on a high-availability compute infrastructure and performs all of the administration of the compute resources, including server and operating system maintenance, capacity provisioning and automatic scaling, code monitoring and logging. All you need to do is author your code in a language that AWS Lambda supports (currently Node.js, Java, C#, Go and Python), and upload your application code to AWS Lambda in the form of one or more Lambda functions. Using AWS Lambda, you can even maintain multiple versions of your in-production function code, and can also create aliases for each of your function versions for easy reference.

Typically, AWS Lambda is used to run code in response to events, such as changes to data in an Amazon S3 bucket or an Amazon DynamoDB table; to run your code in response to HTTP requests using Amazon API Gateway; or invoke your code using API calls made using AWS SDKs.

In such scenarios, if the Lambda function code fails or takes too long to execute, it can stall or even completely stop data/request processing by critical AWS services (eg., Amazon S3, Amazon DynamoDB, Amazon API Gateway, etc.) that rely on that code for their operations. To pre-empt the failure/delay of critical AWS services, administrators need to monitor each Lambda function that these services use and promptly capture problems in the function code. This is exactly what the AWS Lambda test does!

This test automatically discovers the Lambda functions, monitors the invocations of each function, and in the process, reports latencies and errors/failures in function execution. This enables administrators to quickly and accurately identify slow and/or buggy functions, so that they can take those functions and their codes up for closer review and fine-tuning.

Optionally, you can configure this test to report metrics for each version of a function or for every alias of a function version. This enables administrators to quickly compare the performance of different versions or aliases of a function, and then decide which version/alias to use in the production environment.

Target of the test: Amazon EC2 Cloud

Agent deploying the test : A remote agent

Outputs of the test : One set of results for each Lambda function / version / alias in every region

First-level descriptor: AWS Region

Second-level descriptor: Function / Version / Alias, depending upon the option chosen from the Lambda Filter Name parameter of this test

Configurable parameters for the test
Parameter Description

Test Period

How often should the test be executed.

Host

The host for which the test is to be configured.

AWS Access Key, AWS Secret Key, Confirm AWS Access Key, Confirm AWS Secret Key

To monitor an Amazon EC2 instance, the eG agent has to be configured with the access key and secret key of a user with a valid AWS account. For this purpose, we recommend that you create a special user on the AWS cloud, obtain the access and secret keys of this user, and configure this test with these keys. The procedure for this has been detailed in the Obtaining an Access key and Secret key topic. Make sure you reconfirm the access and secret keys you provide here by retyping it in the corresponding Confirm text boxes.

Proxy Host and Proxy Port

In some environments, all communication with the AWS EC2 cloud and its regions could be routed through a proxy server. In such environments, you should make sure that the eG agent connects to the cloud via the proxy server and collects metrics. To enable metrics collection via a proxy, specify the IP address of the proxy server and the port at which the server listens against the Proxy Host and Proxy Port parameters. By default, these parameters are set to none , indicating that the eG agent is not configured to communicate via a proxy, by default.

Proxy User Name, Proxy Password, and Confirm Password

If the proxy server requires authentication, then, specify a valid proxy user name and password in the Proxy User Name and Proxy Password parameters, respectively. Then, confirm the password by retyping it in the Confirm Password text box. By default, these parameters are set to none, indicating that the proxy sever does not require authentication by default.

Proxy Domain and Proxy Workstation

If a Windows NTLM proxy is to be configured for use, then additionally, you will have to configure the Windows domain name and the Windows workstation name required for the same against the Proxy Domain and Proxy Workstation parameters. If the environment does not support a Windows NTLM proxy, set these parameters to none.

Exclude Region

Here, you can provide a comma-separated list of region names or patterns of region names that you do not want to monitor. For instance, to exclude regions with names that contain 'east' and 'west' from monitoring, your specification should be: *east*,*west*

Lambda Filter Name

By default, this parameter is set to FunctionName. This means that by default, this test will report metrics for each Lambda function that is in use.

If required, you can override this default setting by setting the Lambda Filter Name parameter to one of the following:

  • Version - When you use versioning in AWS Lambda, you can publish one or more versions of your Lambda function. As a result, you can work with different variations of your Lambda function in your development workflow, such as development, beta, and production. Each Lambda function version has a unique Amazon Resource Name (ARN).

    If you want this test to report metrics for every version of every Lambda function, select the Version option.

  • Alias - AWS Lambda also supports creating aliases for each of your Lambda function versions. Conceptually, an AWS Lambda alias is a pointer to a specific Lambda function version. It's also a resource similar to a Lambda function, and each alias has a unique ARN. Each alias maintains an ARN for the function version to which it points.

    Select the Alias option if you want this test to report metrics for each alias that points to a version of a function.

Measurements made by the test
Measurement Description Measurement Unit Interpretation

Invocations

By default, this measure represents the number of times this function was invoked in response to an event or invocation API call.

If the Lambda Filter Name is set to Version, then this measure represents the number of times this version of a function was invoked in response to an event or invocation API call..

If the Lambda Filter Name is set to Alias, then this measure represents the number of times this alias was invoked in response to an event or invocation API call.

Number

Compare the value of this measure acros functions to know which function is used the maximum. The failure of such a function will naturally have a more adverse impact on performance and productivity than other functions.

Invocations failed due to errors

By default, this measure represents the number of invocations of this function failed due to errors (response code 4xx)

If the Lambda Filter Name is set to Version, then this measure represents the number of number of invocations of this version of a function that failed due to errors.

If the Lambda Filter Name is set to Alias, then this measure represents the number of invocations of this alias that failed due to errors.

Number

Ideally, the value of this measure should be 0. A non-zero value is indicative of one/more errors in the function code.

To know what errors occurred during invocation, check the logs. Typically, each time the code is executed in response to an event, it writes a log entry into the log group associated with a Lambda function, which is /aws/lambda/<function name>.

Following are some examples of errors that might show up in the logs:

  • If you see a stack trace in your log, there is probably an error in your code. Review your code and debug the error that the stack trace refers to.
  • If you see a permissions denied error in the log, the IAM role you have provided as an execution role may not have the necessary permissions. Check the IAM role and verify that it has all of the necessary permissions to access any AWS resources that your code references.
  • If you see a timeout exceeded error in the log, your timeout setting exceeds the run time of your function code. This may be because the timeout is too low, or the code is taking too long to execute.
  • If you see a memory exceeded error in the log, your memory setting is too low. Set it to a higher value. Typically, when creating a function, you need to mention the amount of memory that should be given to that function. Lambda uses this memory size to infer the amount of CPU and memory allocated to your function. Your function use-case determines your CPU and memory requirements. For example, a database operation might need less memory compared to an image processing function. The default value is 128 MB. The value must be a multiple of 64 MB.

Dead letter errors

By default, this measure represents the number of times Lambda could not write the failure of this function to the configured dead letter queues.

If the Lambda Filter Name is set to Version, then this measure represents the number of times Lambda could not write the failure of this version of a function to the configured dead letter queues.

If the Lambda Filter Name is set to Alias, then this measure represents the number of times Lambda could not write the failure of this alias to the configured dead letter queues.

Number

By default, a failed Lambda function invoked asynchronously is retried twice, and then the event is discarded. Using Dead Letter Queues (DLQ), you can indicate to Lambda that unprocessed events should be sent to an Amazon SQS queue or Amazon SNS topic instead, where you can take further action.

If the value of this measure keeps increasing, it implies that the event payload is consistently failing to reach the dead letter queue. Probable cause for this are as follows:

  • Permissions errors
  • Throttles from downstream services
  • Misconfigured resources
  • Timeouts tidings

Time taken to execute function

By default, this measure indicates the average elapsed wall clock time from when this function's code starts executing because of an invocation to when it stops executing.

If the Lambda Filter Name is set to Version, then this measure represents the average elapsed wall clock time from when this version of a function's code starts executing because of an invocation to when it stops executing.

If the Lambda Filter Name is set to Alias, then this measure represents the average elapsed wall clock time from when this alias starts executing because of an invocation to when it stops executing.

Secs

Ideally, the value of this measure should be low. A high value indicates that a function/version/alias is taking too long to execute.

To determine why there is increased latency in the execution of a Lambda function, do the following:

  • Test your code with different memory settings: If your code is taking too long to execute, it could be that it does not have enough compute resources to execute its logic. Try increasing the memory allocated to your function and testing the code again, using the Lambda console's test invoke functionality. You can see the memory used, code execution time, and memory allocated in the function log entries. Changing the memory setting can change how you are charged for duration.
  • Investigate the source of the execution bottleneck using logs: You can test your code locally, as you would with any other Node.js function, or you can test it within Lambda using the test invoke capability on the Lambda console, or using the asyncInvoke command by using AWS CLI. Each time the code is executed in response to an event, it writes a log entry into the log group associated with a Lambda function, which is named aws/lambda/<function name>. Add logging statements around various parts of your code, such as callouts to other services, to see how much time it takes to execute different parts of your code.

Function invocation that throttled

By default, this measure indicates the number of invocation attempts for this Lambda function that were throttled due to invocation rates exceeding the customer’s concurrent limits (error code 429).

If the Lambda Filter Name is set to Version, then this measure represents the number of invocation attempts that were throttled for this version of the Lambda function due to invocation rates exceeding the customer’s concurrent limits (error code 429).

If the Lambda Filter Name is set to Alias, then this measure represents the number of invocation attempts that were throttled for the version of the function that maps to this alias, due to invocation rates exceeding the customer’s concurrent limits (error code 429).

Number

The unit of scale for AWS Lambda is a concurrent execution (see Understanding Scaling Behavior for more details). However, scaling indefinitely is not desirable in all scenarios. For example, you may want to control your concurrency for cost reasons, or to regulate how long it takes you to process a batch of events, or to simply match it with a downstream resource. To assist with this, Lambda provides a concurrent execution limit control at both the account level and the function level.

On reaching the concurrency limit associated with a function, any further invocation requests to that function are throttled, i.e. the invocation doesn't execute your function. Each throttled invocation increases the value of this measure for the corresponding function.

AWS Lambda handles throttled invocation requests differently, depending on their source:

  • Event sources that aren't stream-based: Some of these event sources invoke a Lambda function synchronously, and others invoke it asynchronously. Handling is different for each:

    • Synchronous invocation: If the function is invoked synchronously and is throttled, Lambda returns a 429 error and the invoking service is responsible for retries. The ThrottledReason error code explains whether you ran into a function level throttle (if specified) or an account level throttle (see note below). Each service may have its own retry policy. For example, CloudWatch Logs retries the failed batch up to five times with delays between retries. For a list of event sources and their invocation type, see Supported Event Sources.
    • Asynchronous invocation: If your Lambda function is invoked asynchronously and is throttled, AWS Lambda automatically retries the throttled event for up to six hours, with delays between retries. Remember, asynchronous events are queued before they are used to invoke the Lambda function.
  • Stream-based event sources: For stream-based event sources (Kinesis and DynamoDB streams), AWS Lambda polls your stream and invokes your Lambda function. When your Lambda function is throttled, Lambda attempts to process the throttled batch of records until the time the data expires. This time period can be up to seven days for Kinesis. The throttled request is treated as blocking per shard, and Lambda doesn't read any new records from the shard until the throttled batch of records either expires or succeeds. If there is more than one shard in the stream, Lambda continues invoking on the non-throttled shards until one gets through.