AWS ElastiCache Test

Amazon ElastiCache is a web service that improves the performance of web applications by allowing you to retrieve information from a fast, managed, in-memory system, instead of relying entirely on slower disk-based databases. Using Amazon ElastiCache, you can not only improve load and response times to user actions and queries, but also reduce the cost associated with scaling web applications.

Amazon ElastiCache supports the Memcached and Redis cache engines.

  • Redis - a fast, open source, in-memory data store and cache. Amazon ElastiCache for Redis is a Redis-compatible in-memory service that delivers the ease-of-use and power of Redis along with the availability, reliability and performance suitable for the most demanding applications.
  • Memcached - a widely adopted memory object caching system. ElastiCache is protocol compliant with Memcached, so popular tools that you use today with existing Memcached environments will work seamlessly with the service.

Using Amazon ElastiCache, you can create and manage cache clusters. The key components of a cluster are nodes and shards (Redis).

A node is the smallest building block of an ElastiCache deployment. Each node is a fixed-size chunk of secure, network-attached RAM. A Redis shard (called a node group in the API and CLI) is a grouping of 1–6 related nodes. A Redis (cluster mode disabled) cluster always has one shard, whereas a Redis (cluster mode enabled) cluster can have 1–15 shards. A Redis cluster is a logical grouping of one or more ElastiCache shards (Redis), and data is partitioned across the shards in a Redis (cluster mode enabled) cluster. A Memcached cluster is a logical grouping of one or more ElastiCache nodes, and data is partitioned across the nodes in a Memcached cluster.

Once a cluster is provisioned, Amazon ElastiCache automatically detects and replaces failed nodes, providing a resilient system that mitigates the risk of overloaded databases, which slow website and application load times.

If a cluster is unavailable or is not sized with adequate resources for cache operations, then the cache cluster will not be able to service requests from applications. This can cause the cache hit ratio to fall drastically, thus increasing datastore accesses and related processing overheads. Consequently, request processing will slow down and application performance will suffer. To avoid this, administrators need to track the status, usage, and resource consumption of a cache cluster and its nodes, proactively detect abnormalities, and promptly fix them. This is where the AWS ElastiCache test helps!
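The cache hit ratio referred to above is simply the fraction of lookups served from the cache (for example, from the Get hits and Get misses counters this test reports); as a minimal illustration:

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Fraction of cache lookups served from memory; 0.0 when no traffic."""
    total = hits + misses
    return hits / total if total else 0.0

# A healthy cache serves the vast majority of reads from memory.
print(cache_hit_ratio(hits=9_500, misses=500))  # 0.95
```

A falling ratio means more requests fall through to the backing datastore, which is exactly the overhead this test is designed to surface early.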

By default, this test auto-discovers the cache clusters that have been created and launched. For each cluster, the test then reports the following:

  • The status of the cluster;
  • The load on the cluster, in terms of the number and type of requests received by it;
  • How much CPU and memory was consumed by the cluster when servicing the requests;
  • How well the cache served the different types of requests (Check and set, Decrement, Delete, Increment, Config get, Config set, Get);

In the process, the test points to unavailable clusters, irregularities in request processing by a cluster, and inadequacies in cache size. The number and type of requests that the cluster was unable to serve are highlighted. Moreover, using the performance results reported by the test, administrators can also receive useful pointers on how to resize the cluster to optimize cache performance.

Optionally, you can configure the test to report metrics for each cluster node, instead of for each cluster. The node-level analytics will help administrators quickly identify the unavailable nodes and resource-starved nodes in the cluster.

Target of the test: Amazon Cloud

Agent deploying the test : A remote agent

Outputs of the test : One set of results for each cluster / cluster node

First-level descriptor: AWS Region

Second-level descriptor: Cluster / cluster node, depending upon the option chosen against the ElastiCache Filter Name parameter.

Configurable parameters for the test
Parameter Description

Test Period

How often should the test be executed?

Host

The host for which the test is to be configured.

Access Type

eG Enterprise monitors the AWS cloud using AWS API. By default, the eG agent accesses the AWS API using a valid AWS account ID, which is assigned a special role that is specifically created for monitoring purposes. Accordingly, the Access Type parameter is set to Role by default. Furthermore, to enable the eG agent to use this default access approach, you will have to configure the eG tests with a valid AWS Account ID to Monitor and the special AWS Role Name you created for monitoring purposes.

Some AWS cloud environments however, may not support the role-based approach. Instead, they may allow cloud API requests only if such requests are signed by a valid Access Key and Secret Key. When monitoring such a cloud environment therefore, you should change the Access Type to Secret. Then, you should configure the eG tests with a valid AWS Access Key and AWS Secret Key.

Note that the Secret option may not be ideal when monitoring high-security cloud environments. Such environments often issue a security mandate requiring administrators to change the Access Key and Secret Key frequently. Because of this churn in the key-based approach, Amazon recommends the Role-based approach for accessing the AWS API.

AWS Account ID to Monitor

This parameter appears only when the Access Type parameter is set to Role. Specify the AWS Account ID that the eG agent should use for connecting and making requests to the AWS API. To determine your AWS Account ID, follow the steps below:

  • Log in to the AWS management console with your credentials.

  • Click on your IAM user/role on the top right corner of the AWS Console. You will see a drop-down menu containing the Account ID (see Figure 1).

    Figure 1 : Identifying the AWS Account ID

AWS Role Name

This parameter appears when the Access Type parameter is set to Role. Specify the name of the role that you have specifically created on the AWS cloud for monitoring purposes. The eG agent uses this role and the configured Account ID to connect to the AWS Cloud and pull the required metrics. To know how to create such a role, refer to Creating a New Role.
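Behind the scenes, an account ID and role name of this kind together identify a standard IAM role ARN. The helper below is a hypothetical illustration of that layout (the function name and the sample values are ours, not part of eG's configuration):

```python
def monitoring_role_arn(account_id: str, role_name: str) -> str:
    """Build the IAM role ARN from an account ID and role name.

    IAM role ARNs follow the fixed layout
    "arn:aws:iam::<account-id>:role/<role-name>".
    """
    return f"arn:aws:iam::{account_id}:role/{role_name}"

print(monitoring_role_arn("123456789012", "eg-monitoring-role"))
# arn:aws:iam::123456789012:role/eg-monitoring-role
```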

AWS Access Key, AWS Secret Key, Confirm AWS Access Key, Confirm AWS Secret Key

These parameters appear only when the Access Type parameter is set to Secret. To monitor an Amazon cloud instance using the Secret approach, the eG agent has to be configured with the access key and secret key of a user with a valid AWS account. For this purpose, we recommend that you create a special user on the AWS cloud, obtain the access and secret keys of this user, and configure this test with these keys. The procedure for this has been detailed in the Obtaining an Access key and Secret key topic. Make sure you reconfirm the access and secret keys you provide here by retyping them in the corresponding Confirm text boxes.

Proxy Host and Proxy Port

In some environments, all communication with the AWS cloud and its regions could be routed through a proxy server. In such environments, you should make sure that the eG agent connects to the cloud via the proxy server and collects metrics. To enable metrics collection via a proxy, specify the IP address of the proxy server and the port at which the server listens against the Proxy Host and Proxy Port parameters. By default, these parameters are set to none, indicating that the eG agent is not configured to communicate via a proxy.

Proxy User Name, Proxy Password, and Confirm Password

If the proxy server requires authentication, then specify a valid proxy user name and password in the Proxy User Name and Proxy Password parameters, respectively. Then, confirm the password by retyping it in the Confirm Password text box. By default, these parameters are set to none, indicating that the proxy server does not require authentication.

Proxy Domain and Proxy Workstation

If a Windows NTLM proxy is to be configured for use, then additionally, you will have to configure the Windows domain name and the Windows workstation name required for the same against the Proxy Domain and Proxy Workstation parameters. If the environment does not support a Windows NTLM proxy, set these parameters to none.

Exclude Region

Here, you can provide a comma-separated list of region names or patterns of region names that you do not want to monitor. For instance, to exclude regions with names that contain 'east' and 'west' from monitoring, your specification should be: *east*,*west*
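Patterns such as *east*,*west* are ordinary shell-style wildcards; a sketch of how such a filter could behave (the function name and region list are illustrative, not eG's implementation):

```python
from fnmatch import fnmatch


def filter_regions(regions, exclude_patterns):
    """Drop regions whose names match any exclude pattern (shell-style wildcards)."""
    return [r for r in regions
            if not any(fnmatch(r, p) for p in exclude_patterns)]


regions = ["us-east-1", "us-west-2", "eu-central-1", "ap-south-1"]
print(filter_regions(regions, ["*east*", "*west*"]))
# ['eu-central-1', 'ap-south-1']
```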

ElastiCache Filter Name

By default, this parameter is set to CacheClusterId. In this case, the test will report metrics for every cache cluster that has been created and launched.

If required, you can override this default setting by setting the ElastiCache Filter Name to CacheNodeId. In this case, the test will report metrics for every cluster node.

Measurements made by the test
Measurement Description Measurement Unit Interpretation

CPU utilization

By default, this measure reports the percentage CPU utilization of this cluster.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the percentage CPU utilization of this node.

Percent

Typically, a high value for this measure is a sign of excessive CPU usage by a cluster/node. It could also hint at a potential CPU contention at the cluster / node-level.

In case of a cluster, the cache engine used determines how high the CPU usage can be and what its implications are:

  • Memcached: Since Memcached is multi-threaded, this metric can be as high as 90%. If you exceed this threshold, scale your cache cluster up by using a larger cache node type, or scale out by adding more cache nodes.
  • Redis: Since Redis is single-threaded, the threshold is calculated as (90 / number of processor cores). For example, suppose you are using a cache.m1.xlarge node, which has four cores. In this case, the threshold for CPU utilization would be (90 / 4), or 22.5%. You will need to determine your own threshold, based on the number of cores in the cache node that you are using. If you exceed this threshold and your main workload is from read requests, scale your cache cluster out by adding read replicas. If the main workload is from write requests, then depending on your cluster configuration, we recommend that you:

    • Redis (cluster mode disabled) clusters: scale up by using a larger cache instance type.
    • Redis (cluster mode enabled) clusters: add more shards to distribute the write workload across more primary nodes.
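The Redis threshold arithmetic above is easy to encode; a small sketch (the function name is ours):

```python
def redis_cpu_alarm_threshold(vcpus: int) -> float:
    """Redis is single-threaded, so only one core does engine work;
    scale the 90% multi-threaded guideline down by the core count."""
    return 90.0 / vcpus


# A four-core node, as in the cache.m1.xlarge example above:
print(redis_cpu_alarm_threshold(4))  # 22.5
```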

Freeable memory

By default, this measure reports the amount of free memory available to this cluster.

If the ElastiCache Filter Name parameter is set to CacheNodeId then this measure reports the amount of free memory on this node.

MB

A high value is desired for this measure. A steady and significant drop in the value for this measure indicates a memory contention on the cluster/node.

Incoming network traffic

By default, this measure reports the rate at which this cluster has read from the network.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the rate at which this node reads from the network.

KB/Sec

Outgoing network traffic

By default, this measure reports the rate at which this cluster has written to the network.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the rate at which this node has written to the network.

KB/Sec

Swap usage

By default, this measure reports the amount of swap space used by this cluster.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the amount of swap space used by this node.

KB

For a memcached cluster, the value of this measure should not exceed 50 MB. If it does, we recommend that you increase the ConnectionOverhead parameter value.

Status

By default, this measure reports whether or not this cluster is available.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports whether or not this node is available.

Number

The values that this measure can report and the states they represent are listed in the table below:

Measure Value State
0 Available
1 Creating
2 Modifying
3 Rebooting
4 Cache cluster nodes
5 Incompatible - network
6 Snapshotting
7 Restore-failed
8 Deleting
9 Deleted

No of nodes

Indicates the number of nodes in this cluster.

Number

This measure is reported only for memcached clusters.

This measure is not reported for a node - i.e., if the ElastiCache Filter Name parameter is set to 'CacheNodeId'.

Current connections

Indicates the current number of connections to this cluster.

Number

This measure is reported only for a cluster - i.e., if the ElastiCache Filter Name parameter is set to 'CacheClusterId'.

This is a cache engine metric, published for both Memcached and Redis cache clusters. We recommend that you determine your own alarm threshold for this metric based on your application needs.

Whether you are running Memcached or Redis, an increasing number of CurrConnections might indicate a problem with your application; you will need to investigate the application behavior to address this issue.

Current items

By default, this measure reports the total number of items currently stored in this cluster.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the number of items in this node.

Number

Reclaimed

By default, this measure reports the number of expired items this cluster evicted to allow space for new writes.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the number of expired items that were evicted from this node to allow space for new writes.

Number

In case there are no free chunks, or no free pages in the appropriate slab class, Amazon ElastiCache will look at the tail of the LRU to reclaim an item. Essentially, it searches the last few items at the "end" of the LRU, identifies those that have already expired, and reclaims them - i.e., frees them for reuse.

A high value for this measure therefore implies that the cache is running out of memory. You may want to check the value of the Freeable memory measure to corroborate this finding.

If the value of this measure is very low, while Freeable memory is also low, it means that there are very few expired items in the cache to be reclaimed. This potentially means that very shortly, there may not be any expired items at the end of the LRU to be reused. In such a situation, ElastiCache will evict an item that has not expired. This can result in the loss of frequently-accessed items from the cache. If the situation persists, it will seriously undermine cache performance. To avoid this, you should increase the memory capacity of a memcached cluster by adding more nodes to it or by using a larger node type; for a redis cluster, use a larger node type.

Alternatively, you can configure a memcached cluster to send out an error message instead of evicting items (expired or non-expired), whenever it has no more memory to store items. For this, turn on the error_on_memory_exhausted flag of memcached.

Evictions

Indicates the number of non-expired items this cluster evicted to allow space for new writes.

Number

This measure is reported only for a cache cluster - i.e., this measure is reported only if the ElastiCache Filter Name parameter is set to 'CacheClusterId'.

Typically, items are evicted from Amazon ElastiCache if they have expired, or if the slab class is completely out of free chunks and there are no free pages to assign to that slab class. In case there are no free chunks, or no free pages in the appropriate slab class, Amazon ElastiCache will look at the tail of the LRU to reclaim an item. Essentially, it searches the last few items at the "end", identifies those that have already expired, and frees them for reuse. If it cannot find an expired item at the end, it will "evict" one that has not yet expired. Note that one slab class may end up constantly evicting recently used items while another slab class holds a bunch of old items that just sit around. For example, when a 104-byte chunk is needed, a 104-byte chunk will be evicted, even though there might be an even older 280-byte chunk elsewhere. This reflects the internal workings: each slab class has its own LRU and statistical counters and behaves like a separate cache in itself - in short, it is not a global LRU, but a per-slab-class LRU.

We recommend that you determine your own alarm threshold for this metric based on your application needs.

  • Memcached: If you exceed your chosen threshold, scale your cluster up by using a larger node type, or scale out by adding more nodes.
  • Redis: If you exceed your chosen threshold, scale your cluster up by using a larger node type.

New connections

By default, this measure reports the number of new connections this cluster has received.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the number of new connections this node has received.

Number

This measure is derived from the memcached total_connections statistic by recording the change in total_connections over a period. The value will always be at least 1, due to a connection reserved for ElastiCache itself.
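A sketch of that derivation, assuming two successive samples of the cumulative total_connections counter (the max(..., 1) floor reflects the reserved ElastiCache connection noted above and is our interpretation):

```python
def new_connections(prev_total: int, curr_total: int) -> int:
    """Change in memcached's cumulative total_connections counter
    between two samples; floored at 1 because ElastiCache itself
    keeps one connection reserved."""
    return max(curr_total - prev_total, 1)


print(new_connections(prev_total=1_200, curr_total=1_237))  # 37
```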

Data used for cache items

By default, this measure reports the amount of memory in this cluster that has been used to store items.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the amount of memory used to store items in this node.

KB

If the value of this measure keeps increasing over time, it implies that the items are consuming memory excessively.

Keep checking if the cluster/node has sufficient memory to hold new items. If not, cache performance will be adversely impacted, thus degrading application performance as well.

To increase the memory capacity of a cluster, add more nodes to a cluster or add nodes of a larger node type.

To increase the memory capacity of a memcached node, you may have to adjust the max_cache_memory and/or memcached_connections_overhead parameters of that node. max_cache_memory sets the total amount of memory available on a node. The memcached_connections_overhead is the memory used for connections and other overheads. The memory available for storing items is the difference between the max_cache_memory and memcached_connections_overhead. By increasing the max_cache_memory and/or by reducing the memcached_connections_overhead, you can make more memory available for storing items.

To increase the memory capacity of a redis node, you may have to adjust the maxmemory and/or reserved-memory parameters of that node. maxmemory sets the total amount of memory available on a node. The reserved-memory is the memory that is set aside for non-data purposes. The memory available for storing items is the difference between the maxmemory and reserved-memory. By increasing the maxmemory and/or by reducing the reserved-memory, you can make more memory available for storing items.
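The memory-available arithmetic in the two paragraphs above is the same simple subtraction for both engines; a sketch with illustrative values (in MB, not engine defaults):

```python
def memcached_item_memory(max_cache_memory: int, connections_overhead: int) -> int:
    """Memory left for items once connection overhead is set aside."""
    return max_cache_memory - connections_overhead


def redis_item_memory(maxmemory: int, reserved_memory: int) -> int:
    """Memory left for data once reserved-memory is set aside."""
    return maxmemory - reserved_memory


print(memcached_item_memory(1_024, 100))  # 924
print(redis_item_memory(1_024, 256))      # 768
```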

Data read into memcached

By default, this measure reports the amount of data that this cache cluster read from the network.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the amount of data that this node read from the network.

KB

This measure is reported only for the memcached engine.

Data written out from memcached

By default, this measure reports the amount of data that this cache cluster wrote to the network.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the amount of data that this node wrote to the network.

KB

This measure is reported only for the memcached engine.

Check and set bad value

By default, this measure reports the number of CAS (check and set) requests that this cluster received, where the Cas value did not match with the Cas value stored.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the number of CAS (check and set) requests that this node received, where the Cas value did not match with the Cas value stored.

Number

This measure is reported only for the memcached engine.

Check and set hits

By default, this measure reports the number of CAS (check and set) requests that this cluster received, where the requested key was found and the Cas value matched.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the number of CAS (check and set) requests that this node received, where the requested key was found and the Cas value matched.

Number

This measure is reported only for the memcached engine.

Check and set misses

By default, this measure reports the number of CAS (check and set) requests that this cluster received, where the key requested was not found.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the number of CAS (check and set) requests that this node received, where the key requested was not found.

Number

This measure is reported only for the memcached engine.

In the event of poor cache performance, you can compare the value of this measure with that of the Touch misses, Increment misses, Get misses, Delete misses, and Decrement misses measures to know what type of requests the cache has been unable to serve most of the time.

Flush commands

By default, this measure reports the number of flush commands this cluster has received.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the number of flush commands this node received.

Number

This measure is reported only for the memcached engine.

Get commands

By default, this measure reports the number of Get commands this cluster has received.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the number of Get commands this node received.

Number

This measure is reported only for the memcached engine.

Decrement hits

By default, this measure reports the number of decrement requests this cluster has received, where the requested key was found.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the number of decrement requests this node has received, where the requested key was found.

Number

This measure is reported only for the memcached engine.

A high value is desired for this measure.

Decrement misses

By default, this measure reports the number of decrement requests this cluster has received, where the requested key was not found.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the number of decrement requests this node has received, where the requested key was not found.

Number

This measure is reported only for the memcached engine.

In the event of poor cache performance, you can compare the value of this measure with that of the Check and set misses, Touch misses, Increment misses, Get misses, and Delete misses measures to know what type of requests the cache has been unable to serve most of the time.

Delete hits

By default, this measure reports the number of delete requests this cluster has received, where the requested key was found.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the number of delete requests this node has received, where the requested key was found.

Number

This measure is reported only for the memcached engine.

A high value is desired for this measure.

Delete misses

By default, this measure reports the number of delete requests this cluster has received, where the requested key was not found.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the number of delete requests this node has received, where the requested key was not found.

Number

This measure is reported only for the memcached engine.

In the event of poor cache performance, you can compare the value of this measure with that of the Check and set misses, Touch misses, Increment misses, Get misses, and Decrement misses measures to know what type of requests the cache has been unable to serve most of the time.

Get hits

By default, this measure reports the number of Get requests this cluster has received, where the requested key was found.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the number of Get requests this node has received, where the requested key was found.

Number

This measure is reported only for the memcached engine.

A high value is desired for this measure.

Get misses

By default, this measure reports the number of Get requests this cluster has received, where the requested key was not found.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the number of Get requests this node has received, where the requested key was not found.

Number

This measure is reported only for the memcached engine.

In the event of poor cache performance, you can compare the value of this measure with that of the Check and set misses, Touch misses, Increment misses, Decrement misses, and Delete misses measures to know what type of requests the cache has been unable to serve most of the time.

Increment hits

By default, this measure reports the number of increment requests this cluster has received, where the requested key was found.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the number of increment requests this node has received, where the requested key was found.

Number

This measure is reported only for the memcached engine.

A high value is desired for this measure.

Increment misses

By default, this measure reports the number of increment requests this cluster has received, where the requested key was not found.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the number of increment requests this node has received, where the requested key was not found.

Number

This measure is reported only for the memcached engine.

In the event of poor cache performance, you can compare the value of this measure with that of the Check and set misses, Touch misses, Decrement misses, Get misses, and Delete misses measures to know what type of requests the cache has been unable to serve most of the time.

Data used for hash

By default, this measure reports the amount of data in this cluster that is currently used by hash tables.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the amount of data in this node that is currently used by hash tables.

MB

This measure is reported only for the memcached engine.

Command configGet

By default, this measure reports the cumulative number of config 'get' requests to this cluster.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the cumulative number of config 'get' requests to this node.

Number

This measure is reported only for the memcached engine.

Command configSet

By default, this measure reports the cumulative number of config 'set' requests to this cluster.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the cumulative number of config 'set' requests to this node.

Number

This measure is reported only for the memcached engine.

Command touch

By default, this measure reports the cumulative number of 'touch' requests to this cluster.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the cumulative number of touch requests to this node.

Number

This measure is reported only for the memcached engine.

Current configurations

By default, this measure reports the number of configurations currently stored in this cluster.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the number of configurations currently stored in this node.

Number

This measure is reported only for the memcached engine.

Evicted unfetched

By default, this measure reports the number of valid items evicted from the least recently used cache (LRU) of this cluster, which were never touched after being set.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the number of valid items evicted from the least recently used cache (LRU) of this node, which were never touched after being set.

Number

This measure is reported only for the memcached engine.

If you store an item and it expires, it still sits in the LRU cache at its position. If that item is not fetched by any request, then it falls to the end of the cache and is then picked up for reuse. However, if you fetch an expired item, memcached will find that the item is expired and free its memory for reuse immediately. This means that unfetched items in the LRU take longer to be evicted than the ones fetched.

Expired unfetched

By default, this measure reports the number of expired items reclaimed from the LRU of this cluster, which were never touched after being set.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the number of expired items reclaimed from the LRU of this node, which were never touched after being set.

Number

Slabs moved

By default, this measure reports the total number of slab pages moved in all nodes of this cluster.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the number of slab pages moved in this node.

Number

This measure is reported only for the memcached engine.

An Amazon ElastiCache node usually breaks the allocated memory into smaller parts called pages. Each page is usually 1 megabyte in size. Each page is then assigned to a slab class when necessary. Each slab class is in turn divided into chunks of a specific size. The chunks in each slab have the same size. There can be multiple pages assigned to each slab class, but once a page is assigned to a slab class, the assignment is permanent.

Touch hits

By default, this measure reports the number of keys in this cluster that were touched and given a new expiration time.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the number of keys in this node that were touched and given a new expiration time.

Number

This measure is reported only for the memcached engine.

Touch misses

By default, this measure reports the number of keys in this cluster that have been touched, but were not found.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the number of keys in this node that have been touched, but were not found.

Number

This measure is reported only for the memcached engine.

In the event of poor cache performance, you can compare the value of this measure with that of the Check and set misses, Decrement misses, Get misses, Increment misses, and Delete misses measures to know what type of requests the cache has been unable to serve most of the time.

New items

By default, this measure reports the number of new items stored in this cluster during the last measurement period.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the number of new items stored in this node during the last measurement period.

Number

This measure is reported only for the memcached engine.

Unused memory

By default, this measure represents the amount of memory in this cluster that can be used to store items.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the amount of memory in this node that can be used to store items.

MB

This measure is reported only for the memcached engine.

A consistent drop in the value of this measure is a cause for concern, as it indicates that there is not enough free memory in the cache to store new items. This can cause the cache hit ratio to drop steeply, which in turn can affect application performance. To avoid this, you may have to increase the memory capacity of the cluster by adding more nodes to it or by switching to a larger node type.

At the node level, memory leaks can cause serious memory contention. To avoid memory leaks in a memcached cluster, you need a good understanding of how the memory internals of a node work.

A memcached node usually breaks the allocated memory into smaller parts called pages. Each page is usually 1 megabyte in size. Each page is then assigned to a slab class when necessary. Each slab class is in turn divided into chunks of a specific size. The chunks in each slab have the same size. For instance, you can have a page that is assigned to say, slab class 1, which contains 13,107 chunks of 80 bytes each.

When you store items in Amazon ElastiCache, they are pushed into the slab class of the nearest fit. For instance, in the example above, if an item of size 70 bytes is stored in the cache, it will go into slab class 1, causing an overhead loss of 10 bytes per item. If you are running Amazon ElastiCache clusters spanning hundreds of gigabytes or terabytes, you will end up losing a lot of allocated memory to such overheads. This can cause serious contention for memory resources.

To avoid this, it is imperative that the chunk size and the growth factor of the chunks are set appropriately. These two factors are governed by the chunk_size and chunk_size_growth_factor parameters of a Memcached cluster. chunk_size is the minimum amount of space, in bytes, to allocate for the smallest item's key, value, and flags; by default, this is 48 bytes. chunk_size_growth_factor controls the size of each successive memcached chunk: each chunk will be chunk_size_growth_factor times larger than the previous chunk. By default, this is set to 1.25.

For best performance, you should keep the chunk sizes close to your item sizes. If item sizes are large and predictable, it is recommended to use larger chunks and growth factors. If item sizes vary widely, it is better to use a smaller initial chunk size and growth factor. This will keep the wastage minimal and increase free memory.
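The slab arithmetic above can be sketched in a few lines of Python. This is an illustration of the mechanism, not AWS or memcached code; the 80-byte starting chunk matches the slab class 1 example above, and a 1 MB page divided into 80-byte chunks gives the 13,107 chunks mentioned earlier.

```python
# Sketch of memcached-style slab allocation: chunk sizes grow by
# chunk_size_growth_factor, and each item lands in the nearest-fit slab class.

PAGE_SIZE = 1024 * 1024  # each page is 1 megabyte


def slab_chunk_sizes(chunk_size=48, growth_factor=1.25, page_size=PAGE_SIZE):
    """Return the chunk size of each slab class, smallest first."""
    sizes = []
    size = float(chunk_size)
    while size <= page_size:
        sizes.append(int(size))
        size *= growth_factor
    return sizes


def nearest_fit(item_size, sizes):
    """Slab class an item lands in: the smallest chunk that can hold it."""
    for s in sizes:
        if s >= item_size:
            return s
    raise ValueError("item larger than a page")


sizes = slab_chunk_sizes(chunk_size=80, growth_factor=1.25)
chunk = nearest_fit(70, sizes)   # a 70-byte item goes into the 80-byte class
overhead = chunk - 70            # 10 bytes lost per item to nearest-fit rounding
chunks_per_page = PAGE_SIZE // 80  # 13,107 chunks in a 1 MB page
```

Multiplied across millions of items, that per-item overhead is the wastage the chunk_size and chunk_size_growth_factor parameters let you tune away.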

Data used for cache

By default, this measure indicates the amount of memory allocated to this node for cache usage.

MB

This measure is reported only for the redis engine.

A Redis node will grow until it consumes the maximum memory configured for that node - i.e., the value set against its maxmemory parameter. If this occurs, node performance will likely suffer due to excessive memory paging. By reserving memory, you can set aside some of the available memory for non-Redis purposes and thereby reduce the amount of paging. Use the reserved-memory parameter for this purpose.
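A quick sketch of the arithmetic behind this recommendation, with illustrative numbers (a 6 GiB node reserving 1.5 GiB is an assumption, not a documented value): the memory actually available for cached data is maxmemory minus the reserved-memory carve-out.

```python
# Sketch: how much of a node's maxmemory remains usable for cache data
# once reserved-memory is set aside for non-Redis purposes (e.g. forks).

def usable_cache_bytes(maxmemory_bytes, reserved_bytes):
    """Memory left for Redis data after the reserved-memory carve-out."""
    if reserved_bytes >= maxmemory_bytes:
        raise ValueError("reserved-memory must be below maxmemory")
    return maxmemory_bytes - reserved_bytes


GIB = 1024 ** 3
# Hypothetical node: 6 GiB maxmemory, 1.5 GiB reserved
usable = usable_cache_bytes(maxmemory_bytes=6 * GIB,
                            reserved_bytes=int(1.5 * GIB))
```

Comparing the Data used for cache measure against this usable figure, rather than against raw maxmemory, gives a truer picture of how close the node is to paging.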

Cache hits

By default, this measure indicates the number of successful key lookups in this cluster.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the number of successful key lookups in this node.

Number

This measure is reported only for the redis engine.

A high value is desired for this measure.

Cache misses

By default, this measure indicates the number of unsuccessful key lookups in this cluster.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the number of unsuccessful key lookups in this node.

Number

This measure is reported only for the redis engine.

Ideally, the value of this measure should be very low. If this value is higher than the value of the Cache hits measure, it implies poor cache performance.
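That hits-versus-misses comparison is usually expressed as a hit ratio. A minimal sketch (the 0.8 alert threshold is an assumption to be tuned per workload, not a documented value):

```python
# Sketch: deriving a cache hit ratio from the Cache hits and Cache misses
# counters reported for a cluster or node.

def hit_ratio(cache_hits, cache_misses):
    """Fraction of key lookups served from the cache; 0.0 when idle."""
    total = cache_hits + cache_misses
    return cache_hits / total if total else 0.0


ratio = hit_ratio(cache_hits=9_000, cache_misses=1_000)  # 0.9
needs_attention = ratio < 0.8  # hypothetical alerting threshold
```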

Hyperloglog based commands

By default, this measure indicates the total number of HyperLogLog commands received by this cluster.

If the ElastiCache Filter Name parameter is set to CacheNodeId, then this measure reports the total number of HyperLogLog commands received by this node.

Number

This measure is reported only for the redis engine.

This measure is the sum of all pf type commands (pfadd, pfcount, pfmerge) received by a cluster/node.
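Redis itself exposes the raw per-command counters behind such sums in the commandstats section of its INFO output, as lines of the form cmdstat_<command>:calls=<n>,... . The sketch below shows how the pf commands could be totalled from that text; the sample reply is illustrative, not captured from a real node.

```python
# Sketch: summing the calls counters of selected commands from a Redis
# "INFO commandstats" reply (format: "cmdstat_<command>:calls=<n>,...").

def sum_command_calls(info_text, commands):
    """Total 'calls' across the named commands in an INFO commandstats reply."""
    total = 0
    for line in info_text.splitlines():
        if not line.startswith("cmdstat_"):
            continue
        name, _, fields = line.partition(":")
        if name[len("cmdstat_"):] in commands:
            calls_field = fields.split(",")[0]        # e.g. "calls=7"
            total += int(calls_field.split("=")[1])
    return total


# Illustrative sample of an INFO commandstats reply
sample = """cmdstat_get:calls=120,usec=300,usec_per_call=2.50
cmdstat_pfadd:calls=7,usec=90,usec_per_call=12.86
cmdstat_pfcount:calls=3,usec=40,usec_per_call=13.33
cmdstat_pfmerge:calls=1,usec=25,usec_per_call=25.00"""

pf_total = sum_command_calls(sample, {"pfadd", "pfcount", "pfmerge"})  # 11
```

The same helper, given a different command set (get, mget, hget for the get-type family, for example), yields the other command-family measures described below.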

Replication lag

Indicates how far behind, in seconds, this replica is in applying changes from the primary cache cluster.

Secs

This measure is reported only for a redis node running as a read replica.

"Inconsistency", or lag, between a read replica and its primary cache node is common with Redis asynchronous replication. If an existing read replica has fallen too far behind to meet your requirements, you can reboot it. Keep in mind that replica lag may naturally grow and shrink over time, depending on your primary cache node's steady-state usage pattern.

Replication data

Indicates the amount of data that the primary is sending to all its replicas.

KB

This measure is reported only for a primary in a redis cluster.

This metric is representative of the write load on the replication group. For replicas and standalone primaries, this metric is always 0.

Is background save in progress?

Indicates whether/not a background save is in progress on this node.

Number

This measure is reported only for a redis node.

This measure reports 1 whenever a background save (forked or forkless) is in progress, and 0 otherwise. A background save process is typically used during snapshots and syncs. These operations can cause degraded performance. With the help of this measure, you can diagnose whether or not degraded performance was caused by a background save process.
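When correlating degraded performance with this measure, the same flag can be read directly off a node: Redis reports rdb_bgsave_in_progress (0 or 1) in the persistence section of its INFO output. A minimal sketch, with an illustrative sample reply standing in for a live node:

```python
# Sketch: reading the background-save flag from a Redis "INFO persistence"
# reply (field "rdb_bgsave_in_progress" is 1 while a background save runs).

def bgsave_in_progress(info_text):
    """True if the INFO persistence text shows a background save running."""
    for line in info_text.splitlines():
        key, _, value = line.partition(":")
        if key == "rdb_bgsave_in_progress":
            return value.strip() == "1"
    return False


# Illustrative sample of an INFO persistence reply
sample = "loading:0\nrdb_bgsave_in_progress:1\nrdb_last_save_time:1700000000"
busy = bgsave_in_progress(sample)  # True
```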

Get type commands

Indicates the total number of get type of commands received by this node.

Number

This measure is reported only for a redis node.

This is derived by summing all the get-type commands (get, mget, hget, etc.).

Hash based commands

Indicates the total number of hash-based commands received by this node.

Number

This measure is reported only for a redis node.

This is derived by summing all the commands that act upon one or more hashes.

Key based commands

Indicates the total number of key-based commands received by this node.

Number

This measure is reported only for a redis node.

This is derived by summing all the commands that act upon one or more keys.

List based commands

Indicates the total number of list-based commands received by this node.

Number

This measure is reported only for a redis node.

This is derived by summing all the commands that act upon one or more lists.

Set based commands

Indicates the total number of set-based commands received by this node.

Number

This measure is reported only for a redis node.

This is derived by summing all the commands that act upon one or more sets.

Set type commands

Indicates the total number of set type of commands received by this node.

Number

This measure is reported only for a redis node.

This is derived by summing all the set-type commands (set, hset, etc.).

Sortedset based commands

Indicates the total number of sorted set-based commands received by this node.

Number

This measure is reported only for a redis node.

This is derived by summing all the commands that act upon one or more sorted sets.

String based commands

Indicates the total number of string-based commands received by this node.

Number

This measure is reported only for a redis node.

This is derived by summing all the commands that act upon one or more strings.

Is master?

Indicates whether/not this node is the master node in a redis cluster.

This measure is reported only for a redis node.

The values that this measure can report and their corresponding numeric values are listed below:

Measure Value    Numeric Value
Yes              1
No               0

Note:

By default, this measure reports one of the Measure Values listed above to indicate whether/not a redis node is the master. In the graph of this measure however, the same is indicated using the numeric equivalents.