AWS Cloud Services Adoption
Among the 200+ fully featured services that Amazon Web Services (AWS) offers, Elastic Compute Cloud (EC2) is the most popular. In the recent eG Innovations and DevOps Institute survey of 900+ IT professionals, cloud instances were the most commonly used cloud service, with 63% usage among respondents. This statistic is not a surprise because most organizations start their cloud journey with a lift-and-shift migration model that involves taking their on-premises applications and deploying them on cloud instances.
EC2 is a good starting point for IT teams to replicate their on-premises server instances and, over time, upgrade the application’s execution environment to a cloud-native model.
Why is My AWS EC2 Instance Slow?
When you install applications on EC2 instances, you might encounter situations where your AWS EC2 instance is slow. Troubleshooting such problems can be challenging. In this blog, we will discuss some of the main reasons why your EC2 instance can be slow and how you can troubleshoot this easily.
One of the first things to do when you notice slowness with your AWS EC2 instance is to check AWS CloudWatch, the monitoring and management service that provides data and insights about AWS resources you are using. AWS CloudWatch provides a number of basic metrics about each EC2 instance.
| Type of Metric | Names of Metric |
| Instance Metrics | CPUUtilization, DiskReadOps, DiskWriteOps, DiskReadBytes, DiskWriteBytes, MetadataNoToken, NetworkIn, NetworkOut, NetworkPacketsIn, NetworkPacketsOut |
| CPU Credit Metrics | CPUCreditUsage, CPUCreditBalance, CPUSurplusCreditBalance, CPUSurplusCreditsCharged |
| Dedicated Host Metrics | DedicatedHostCPUUtilization |
| Amazon EBS Metrics | EBSReadOps, EBSWriteOps, EBSReadBytes, EBSWriteBytes, EBSIOBalance%, EBSByteBalance% |
| Status Check Metrics | StatusCheckFailed, StatusCheckFailed_Instance, StatusCheckFailed_System |
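As an illustration of pulling one of these metrics programmatically, here is a minimal Python sketch. It assumes you pass in a client that behaves like `boto3.client("cloudwatch")` and relies on its `get_metric_statistics` call; the helper names `build_metric_query` and `fetch_averages` are hypothetical, and the one-hour window and 300-second period are arbitrary example values.

```python
from datetime import datetime, timedelta, timezone


def build_metric_query(instance_id, metric_name, minutes=60, period=300):
    """Build keyword arguments for CloudWatch's GetMetricStatistics call."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": period,  # seconds; 300 matches five-minute basic monitoring
        "Statistics": ["Average"],
    }


def fetch_averages(cloudwatch, instance_id, metric_name, **kwargs):
    """Return (timestamp, average) pairs sorted by time.

    `cloudwatch` is assumed to behave like boto3.client("cloudwatch").
    """
    query = build_metric_query(instance_id, metric_name, **kwargs)
    response = cloudwatch.get_metric_statistics(**query)
    points = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    return [(p["Timestamp"], p["Average"]) for p in points]
```

With boto3 installed and credentials configured, a call such as `fetch_averages(boto3.client("cloudwatch"), "<your-instance-id>", "CPUUtilization")` would return the last hour of average CPU readings.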
Limitations of AWS CloudWatch
Focusing on the instance metrics in the above table, notice that there are many limitations with the metrics AWS CloudWatch provides:
- Lack of memory visibility: While CPU, network and disk resource usage is tracked, AWS CloudWatch does not provide system-level memory metrics for EC2 instances.
- Lacks CPU usage of underlying hardware: EC2 provides virtualized processing capacity, and CloudWatch reports CPU usage for that virtualized instance only. This capacity is expressed in “compute units” in AWS lingo. CloudWatch does not report CPU usage of the underlying hardware that the instance is hosted on.
- Lacks additional detail: While these basic metrics can tell you if CPU or disk is heavily used, you do not have additional details you need for troubleshooting: is it due to many application instances running, or is it one application that is causing the heavy usage?
- Detailed monitoring can inflate your AWS bill: CloudWatch provides two categories of monitoring: basic monitoring and detailed monitoring. Most AWS services publish metrics at five-minute intervals. You can enable detailed monitoring to increase the frequency to one minute, at additional cost.
Note that with detailed monitoring:
- You will be charged per metric and how frequently you call the API. The more metrics you send to CloudWatch and the higher the frequency of API invocation, the higher your bill.
- You will be charged for all of the metrics that were previously included as part of basic monitoring.
- Additional hoops to jump through to aggregate: You can view CloudWatch metrics for EC2 instances within:
- a single AWS account at a time, and…
- a single region at a time.
To view CloudWatch data from multiple AWS accounts and regions in one dashboard, you would have to manually create cross-account and cross-region dashboards. This is especially important if you are an MSP hosting multiple tenants.
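If you script this aggregation instead of building dashboards by hand, the core is a loop over regions. The sketch below is one possible shape, not a prescribed method: `collect_across_regions`, `client_factory`, and `fetch` are hypothetical names, and the factory is assumed to return something like `boto3.client("cloudwatch", region_name=region)`. Cross-account access would additionally need separate credentials per account, which is not shown here.

```python
def collect_across_regions(client_factory, regions, fetch):
    """Call `fetch(client)` once per region and merge the results.

    client_factory: callable taking a region name and returning a client,
        for example lambda r: boto3.client("cloudwatch", region_name=r)
    fetch: callable taking that client and returning a list of dict records.

    Returns (merged_records, errors), where errors maps a region name to
    the error message raised while querying it.
    """
    merged = []
    errors = {}
    for region in regions:
        try:
            for record in fetch(client_factory(region)):
                # Tag each record so the combined view shows its origin.
                merged.append({"region": region, **record})
        except Exception as exc:  # keep going if one region fails
            errors[region] = str(exc)
    return merged, errors
```

Collecting errors per region rather than aborting is a design choice: one unreachable region should not hide data from the others.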
Deploying Monitoring Agents on EC2 Instances
To get these additional details, you need to consider deploying a CloudWatch agent inside the VM or deploying other monitoring agents.
Agents deployed on EC2 instances provide additional details you need for troubleshooting.
- You can determine from the metrics provided by these agents if you are bottlenecked on CPU resources and why – i.e., which application(s) is responsible for this?
- If there is a memory bottleneck, which application is taking resources?
- Disk bottlenecks can also be identified this way and you can even determine which file is being read/written to the most.
- Slowness can arise from other factors also. A faulty network driver can cause excessive packet re-transmissions, slowing down application access. Applications that do not release operating system resources like file objects, sockets, etc. can cause havoc. They can make the operating system slower and slower over time, to a point where the system becomes unresponsive.
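To give a feel for what an in-guest agent can see that CloudWatch's basic metrics do not, here is a small Python sketch that computes memory usage on a Linux instance from the contents of `/proc/meminfo`. It is an illustration only: the function names are hypothetical, it assumes a kernel that reports `MemAvailable`, and a real agent collects far more than this.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of integer values.

    Most entries are reported in kB; a few (such as HugePages_Total)
    are plain counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def memory_used_percent(meminfo):
    """Percent of RAM in use, based on MemTotal and MemAvailable."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return 100.0 * (total - available) / total
```

On the instance itself, you would read the file with `open("/proc/meminfo").read()` and pass the text to `parse_meminfo`.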
A full list of additional KPIs that can be monitored using agents on VMs or EC2 instances is indicated in our earlier blog.
If the slowness is caused within your EC2 instance, agents on the OS instances can help. But what if the issue is not being caused within the OS instance? These are the types of issues that do not occur commonly in on-premises environments, but which can happen in the cloud. These types of issues are the most difficult to detect and troubleshoot.
Review AWS Credit Metrics for EC2 and EBS
When deploying AWS EC2 instances, there are several different instance types you can choose from – see https://aws.amazon.com/ec2/instance-types/. While many Amazon EC2 instance types provide fixed CPU resources, burstable performance instances are becoming very popular because they are more cost effective. Burstable instances provide a baseline level of CPU utilization with the ability to burst above that baseline. The baseline might be 20%, 40%, and so on, depending on the instance type. A burstable instance earns credits when it stays below the CPU baseline and spends credits when it bursts above it.
If you are using a burstable instance, it is important to track its CPU credit balance. If the credit balance drops to 0 and your applications need additional CPU, the EC2 instance can become very slow until it earns credits back and the balance rises above 0. An agent deployed on an EC2 instance cannot monitor CPU credits for the instance. You need integration with AWS CloudWatch to track this metric for all your EC2 instances.
Slowness in an EC2 instance may also occur because of depleted I/O burst credits on EBS volumes of type gp2. (In most AWS Regions, gp2 is the default volume type for root volumes. For more information, see Amazon EBS volume types.) A burst balance of 0% implies that all the burst credits have been used up and the volume can’t burst above its baseline performance level.
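The credit mechanism can be approximated with simple arithmetic. The sketch below assumes AWS's definition that one CPU credit equals one vCPU running at 100% for one minute; it ignores launch credits and the maximum balance cap, and the function names and example figures are hypothetical.

```python
def credits_per_hour(vcpus, baseline_pct):
    """Credits earned per hour at the given per-vCPU baseline.

    One credit = one vCPU at 100% utilization for one minute.
    """
    return vcpus * baseline_pct * 60 / 100.0


def minutes_until_depleted(balance, vcpus, baseline_pct, utilization_pct):
    """Minutes of sustained load before the credit balance reaches zero.

    Returns None when utilization is at or below the baseline, because
    the balance does not shrink in that case.
    """
    net_spend_per_minute = vcpus * (utilization_pct - baseline_pct) / 100.0
    if net_spend_per_minute <= 0:
        return None
    return balance / net_spend_per_minute
```

For a hypothetical instance with 2 vCPUs and a 10% baseline, `credits_per_hour(2, 10)` gives 12 credits per hour, and a balance of 144 credits would last roughly 80 minutes at 100% utilization on both vCPUs.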
It is essential to monitor CPU credits and I/O burst balance of EC2 instances and their EBS volumes, in order to determine if AWS instance type selection or configuration is limiting the performance of the instance.
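A simple way to act on these two balances is a threshold check over their latest values. The following Python sketch assumes you have already retrieved the most recent `CPUCreditBalance` (per instance) and `BurstBalance` (per gp2 volume, in percent) readings from CloudWatch; the function name and the threshold values are arbitrary examples, not recommendations.

```python
DEFAULT_THRESHOLDS = {
    "CPUCreditBalance": 20.0,  # credits; arbitrary example value
    "BurstBalance": 20.0,      # percent; arbitrary example value
}


def find_low_balances(samples, thresholds=DEFAULT_THRESHOLDS):
    """Return alert strings for samples whose value is below its threshold.

    samples: iterable of (resource_id, metric_name, latest_value) tuples.
    Metrics without a configured threshold are ignored.
    """
    alerts = []
    for resource_id, metric_name, value in samples:
        limit = thresholds.get(metric_name)
        if limit is not None and value < limit:
            alerts.append(
                f"{resource_id}: {metric_name}={value:g} is below {limit:g}"
            )
    return alerts
```

In practice you would tune the thresholds per instance type and volume size, since a small balance on a large instance drains much faster than on a small one.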
Recommendations for Improving AWS EC2 Performance
- If you must use burstable EC2 instances, consider setting the unlimited mode for these instances. See https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-performance-instances-unlimited-mode.html for details. A burstable instance with unlimited CPU configuration can handle high CPU utilization for any period of time. The instance price remains at the baseline price as long as CPU usage is below the baseline level and it increases automatically to cover the times when CPU spikes. This will avoid system freezes and slow responses that you might see when an AWS EC2 instance has limited CPU mode set.
- For gp2 EBS volumes, you may need to increase the size of the volume to get more IOPS (the baseline IOPS is 3 IOPS per GB of volume size). Alternatively, you can choose volumes with provisioned IOPS without increasing the volume size.
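To see how volume size affects gp2 performance, the arithmetic can be sketched in Python. Only the 3 IOPS per GB figure comes from the text above; the minimum of 100 IOPS, the cap of 16,000 IOPS, the 3,000 IOPS burst rate, and the 5.4 million credit bucket are assumptions based on AWS's published gp2 figures, so verify them against the current EBS documentation. The function names are hypothetical.

```python
GP2_IOPS_PER_GIB = 3           # from the text above
GP2_MIN_IOPS = 100             # assumed floor for small volumes
GP2_MAX_IOPS = 16_000          # assumed cap for large volumes
GP2_BURST_IOPS = 3_000         # assumed burst rate
GP2_CREDIT_BUCKET = 5_400_000  # assumed I/O credits in a full bucket


def gp2_baseline_iops(size_gib):
    """Baseline IOPS for a gp2 volume of the given size."""
    return max(GP2_MIN_IOPS, min(GP2_IOPS_PER_GIB * size_gib, GP2_MAX_IOPS))


def gp2_burst_seconds(size_gib):
    """Seconds a full credit bucket lasts when bursting at GP2_BURST_IOPS.

    Returns None when the baseline already meets or exceeds the burst
    rate, since such a volume does not depend on burst credits.
    """
    baseline = gp2_baseline_iops(size_gib)
    if baseline >= GP2_BURST_IOPS:
        return None
    # Credits drain at the burst rate and refill at the baseline rate.
    return GP2_CREDIT_BUCKET / (GP2_BURST_IOPS - baseline)
```

Under these assumptions, a 100 GiB volume has a baseline of 300 IOPS and can sustain a full burst for 2,000 seconds, while a 1,000 GiB volume has a 3,000 IOPS baseline and no longer relies on burst credits.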
Complete Monitoring of AWS EC2 Instances with eG Enterprise
The eG Enterprise solution offers complete visibility into all aspects of EC2 performance:
- Through tight integration with AWS CloudWatch APIs, eG Enterprise tracks all the key metrics relating to EC2 and EBS performance. Proactive alerts are generated when the CPU credit balance or I/O burst balance becomes very low. For more details on eG Enterprise’s integration with AWS CloudWatch, see https://www.eginnovations.com/aws-monitoring/
- Agents deployed on EC2 instances track all the key server and OS parameters. For complete details on system monitoring with eG Enterprise, see https://www.eginnovations.com/server-monitoring
Test drive eG Enterprise today. Sign up at https://www.eginnovations.com/it-monitoring/free-trial