Monitoring ECS Metrics: A Guide for Developers and Operations Teams

Amazon Elastic Container Service (ECS) is a managed service for running containerized applications in the cloud. AWS Fargate takes this cloud-native architecture a step further by letting you run containers without managing servers or clusters. As a serverless compute engine for ECS, Fargate provisions compute capacity for each task on demand.

With Fargate, monitoring ECS metrics should be a priority to ensure the health and performance of these applications. This article covers the key metrics, how to collect them using CloudWatch and third-party tools, how to analyze them, and best practices for acting on them.

What are AWS ECS Metrics?

Monitoring the performance of your containerized applications and services can be time-consuming. ECS metrics help you understand how your containerized applications are performing on Amazon Elastic Container Service. Like sensor readings, they are collected from your tasks and containers and show you what’s happening inside them.

The metrics show if a container is slow, crashing too much, or using too many resources. This way, you know how to allocate resources best.

Amazon Web Services provides various metrics through Amazon CloudWatch to help monitor applications, including those running on Amazon ECS. These metrics can either be default (provided by AWS) or custom metrics that you define as a user.

The default ECS metrics are system-built. They’re automatically generated and include useful metrics for monitoring tasks and services. They help answer questions like: Are tasks receiving traffic? Are services scaling as expected? Are containers being overly taxed? These are the key metrics for ECS. CPU and memory utilization are published to the AWS/ECS namespace by default; task counts, network traffic, and storage I/O become available once CloudWatch Container Insights is enabled:

CPU Utilization: Measures the percentage of CPU resources used by your containers.
Memory Utilization: Tracks how much memory is being consumed by your containers.
Task Count: Indicates the number of tasks currently running in your ECS cluster.
Network Traffic: Monitors inbound and outbound data to assess network performance.
Disk I/O: Measures read and write operations on storage to evaluate performance.

Even with the default metrics available, users may still want to monitor specific performance indicators or obtain data unavailable from default metrics, so they use custom metrics.

While Fargate does not require these metrics to function, they’re still important. Without them, you might miss bottlenecks, and it will be difficult to allocate resources.

Default metrics track tasks, services, placement, and health, providing visibility into the managed infrastructure. Custom metrics fill in application-level performance data, which matters on Fargate because you can’t access the underlying hosts. Together, these metrics enable auto-scaling, replacing failed containers, and troubleshooting Fargate applications without direct host access.
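As a minimal sketch of what a custom metric looks like, the function below builds the request parameters for CloudWatch’s PutMetricData operation. The namespace `MyApp/ECS`, the metric name `QueueDepth`, and the cluster and service names are hypothetical placeholders, not values from this article.

```python
from datetime import datetime, timezone


def build_custom_metric(namespace, name, value, unit, cluster, service):
    """Return keyword arguments shaped for CloudWatch PutMetricData."""
    return {
        "Namespace": namespace,
        "MetricData": [
            {
                "MetricName": name,
                # Dimensions let you filter the metric per cluster and service.
                "Dimensions": [
                    {"Name": "ClusterName", "Value": cluster},
                    {"Name": "ServiceName", "Value": service},
                ],
                "Timestamp": datetime.now(timezone.utc),
                "Value": float(value),
                "Unit": unit,
            }
        ],
    }


# Hypothetical example values, for illustration only.
params = build_custom_metric(
    "MyApp/ECS", "QueueDepth", 42, "Count", "demo-cluster", "demo-service"
)
print(params["MetricData"][0]["MetricName"], params["MetricData"][0]["Value"])
```

With boto3 installed and AWS credentials configured, the dictionary could be sent with `boto3.client("cloudwatch").put_metric_data(**params)`.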

Key ECS Service Metrics to Monitor

So, your organization is using Amazon ECS to manage your containerized applications. That’s important, but it might not be enough. Your operations team needs AWS monitoring to gain visibility into infrastructure usage, traffic levels, errors, and application performance. They should focus on the following:

CPU Utilization

CPU utilization in ECS measures the percentage of CPU resources being used by tasks within a cluster. It is calculated as the ratio of CPU units consumed by running tasks to the total units allocated for those tasks, expressed as a percentage.

0-50% indicates that the application is underutilizing resources, which suggests overprovisioning and unnecessary cost. If utilization is consistently below 50%, consider decreasing the CPU units in the task definition to avoid paying for unused resources.
50-80% shows balanced resource usage. However, it’s essential to monitor trends over time to catch spikes early.
80-100% is high utilization, indicating that the application is nearing its resource limits. Sustained high utilization can result in performance degradation, increased latency, or task failures.
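The ratio and the three bands above can be sketched in a few lines of Python. The band names and the sample figures (384 CPU units used out of 512 allocated) are illustrative assumptions.

```python
def cpu_utilization_percent(used_units: float, allocated_units: float) -> float:
    """CPU units consumed by running tasks as a percentage of units allocated."""
    if allocated_units <= 0:
        raise ValueError("allocated_units must be positive")
    return 100.0 * used_units / allocated_units


def classify_cpu(percent: float) -> str:
    """Map a utilization percentage onto the bands described in the text."""
    if percent < 50:
        return "underutilized"
    if percent < 80:
        return "balanced"
    return "high"


# Hypothetical task: 384 CPU units used out of 512 allocated.
pct = cpu_utilization_percent(384, 512)
print(pct, classify_cpu(pct))  # 75.0 balanced
```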

Memory Utilization

Memory utilization in ECS is the percentage of memory used by tasks compared to the total memory allocated. Monitoring this metric prevents memory-related issues, such as application crashes or slowdowns. It also helps adjust resource allocations to ensure optimal performance without overprovisioning.

0-60% indicates that the allocated memory is not fully utilized. While this may seem acceptable, it can lead to wasted resources; you should consider adjusting memory reservations.
60-80% is safe but should be monitored so that if memory usage trends upward, you’ll know there’s a need for more resources or optimization.
80-100% is high and can lead to application crashes, as ECS will terminate containers that exceed their memory limits. If memory usage exceeds 80% regularly, it’s time to increase memory allocation or optimize the application to reduce consumption.
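One way to make “exceeds 80% regularly” concrete is to measure how often recent samples cross the threshold. The sketch below assumes that “regularly” means at least half of the samples; that cutoff, the function names, and the sample values are assumptions to tune for your workload.

```python
def fraction_above(samples, threshold=80.0):
    """Share of samples strictly above the threshold (0.0 to 1.0)."""
    if not samples:
        raise ValueError("samples must not be empty")
    return sum(1 for s in samples if s > threshold) / len(samples)


def memory_recommendation(samples, threshold=80.0, sustained=0.5):
    """Suggest an action based on the memory bands described in the text."""
    if fraction_above(samples, threshold) >= sustained:
        return "increase memory or optimize the application"
    if max(samples) < 60:
        return "consider lowering the memory reservation"
    return "keep monitoring"


# Hypothetical memory utilization percentages from six consecutive periods.
memory_samples = [62, 71, 85, 88, 91, 79]
print(fraction_above(memory_samples))         # 0.5
print(memory_recommendation(memory_samples))  # increase memory or optimize the application
```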

Task Count and Status

Task count is the number of tasks in an ECS cluster or service, broken down by state: running, desired, or pending. These counts show how busy the cluster is and how well it’s managing workloads.

Tracking the number of running tasks lets you confirm that the desired number is maintained, which helps when scaling.

To check the status of applications on ECS Fargate, use Amazon CloudWatch Container Insights. It provides dashboards that display near-real-time metrics, and CloudWatch alarms can alert your team when task counts deviate from expected thresholds.

Alternatively, you can use the ECS Management Console to view task statuses and resource utilization. Then, keep reviewing logs and metrics to identify trends and optimize resource allocation, ensuring that applications run smoothly and efficiently.
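The running-versus-desired comparison can be sketched against a dictionary shaped like one service entry from the ECS DescribeServices response, which includes `desiredCount`, `runningCount`, and `pendingCount`. The status labels and sample values are illustrative assumptions.

```python
def task_count_status(service: dict) -> str:
    """Compare running and pending tasks against the desired count."""
    desired = service["desiredCount"]
    running = service["runningCount"]
    pending = service.get("pendingCount", 0)
    if running == desired:
        return "steady"
    # Enough tasks exist or are starting (or extra tasks are draining).
    if running + pending >= desired:
        return "converging"
    # Fewer tasks than desired and nothing pending to close the gap.
    return "degraded"


# Hypothetical service snapshots.
scaling_out = {"serviceName": "demo-service", "desiredCount": 4, "runningCount": 3, "pendingCount": 1}
short = {"serviceName": "demo-service", "desiredCount": 4, "runningCount": 2, "pendingCount": 0}
print(task_count_status(scaling_out))  # converging
print(task_count_status(short))        # degraded
```

With boto3, the input could come from `boto3.client("ecs").describe_services(cluster=..., services=[...])["services"]`.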

Network Performance

In Fargate environments, metrics on your network performance allow you to understand the data transfer rates:

Network bandwidth measures the maximum data transfer rate in megabits per second (Mbps). It indicates how much data can be sent or received over the network within a specified time frame.
Network latency measures the time it takes for a data packet to travel from the sender to the receiver in milliseconds (ms). Low latency is key for real-time applications, while high latency causes delays and degraded user experience.
Packet loss metrics show the percentage of packets that are lost during transmission. High packet loss impacts application performance, leading to interruptions and low service quality.
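The unit conversions behind these figures are simple arithmetic. The sketch below converts a byte rate into Mbps and computes a packet loss percentage; the input numbers are hypothetical, and how you obtain packet counts depends on your monitoring tool.

```python
def bytes_per_second_to_mbps(bytes_per_second: float) -> float:
    """Convert a byte rate to megabits per second (1 Mbps = 1,000,000 bits/s)."""
    return bytes_per_second * 8 / 1_000_000


def packet_loss_percent(sent: int, received: int) -> float:
    """Percentage of sent packets that never arrived."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * (sent - received) / sent


# Hypothetical readings: 2.5 MB/s of traffic, 50 of 10,000 packets lost.
print(bytes_per_second_to_mbps(2_500_000))  # 20.0
print(packet_loss_percent(10_000, 9_950))   # 0.5
```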

While CloudWatch tracks metrics and sends notifications when thresholds are exceeded, it falls short in providing deep visibility and comprehensive insights. A third-party solution like eG Enterprise offers richer visualizations, advanced analytics, and a more holistic view of network performance, enabling faster and more effective troubleshooting.

Container Health Metrics

Container health metrics show how well your containers are running, which helps you fix problems and improve performance. They compare the number of healthy running containers with the number that should be running, so you can maintain reliability.

What happens if a container is unhealthy? When an essential container in a service task fails its health check, ECS automatically stops that task and replaces it with a new one, so only healthy containers serve traffic.

To add a health check, you specify the HealthCheck object when defining a container in your task definition. This allows you to configure commands, intervals between each health check, retries of the failed checks, the start period, and timeout.
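As a rough sketch, a health check in a container definition has the shape below. The helper validates each value against the ranges documented for ECS container health checks; the helper name and the `curl` command are illustrative assumptions, and the command only works if `curl` exists in your image.

```python
# Allowed ranges for ECS container health check settings (seconds or count).
LIMITS = {
    "interval": (5, 300),
    "timeout": (2, 60),
    "retries": (1, 10),
    "startPeriod": (0, 300),
}


def build_health_check(command, interval=30, timeout=5, retries=3, start_period=0):
    """Return a healthCheck dictionary for an ECS container definition."""
    check = {
        # CMD-SHELL runs the command string with the container's default shell.
        "command": ["CMD-SHELL", command],
        "interval": interval,
        "timeout": timeout,
        "retries": retries,
        "startPeriod": start_period,
    }
    for key, (low, high) in LIMITS.items():
        if not low <= check[key] <= high:
            raise ValueError(f"{key} must be between {low} and {high}")
    return check


# Hypothetical check: probe a local HTTP endpoint every 30 seconds.
health_check = build_health_check(
    "curl -f http://localhost/ || exit 1", start_period=60
)
print(health_check)
```

The resulting dictionary would sit under the `healthCheck` key of a container definition when registering a task definition.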

Health check results can yield three possible statuses for containers: healthy, unhealthy, or unknown. The healthy status means the health check is passing; the unhealthy status means the health check has failed, and ECS may replace the task. The unknown status indicates that either no health check is defined or ECS is still evaluating the container’s status.

Collecting AWS ECS Statistics

You can employ various methods to collect ECS metrics, including CloudWatch and other third-party solutions.

Using AWS CloudWatch

Amazon ECS integrates with CloudWatch, a managed service, to monitor resources and summarize ECS metrics. To use CloudWatch:

Enable CloudWatch Container Insights for your ECS cluster in the management console to collect task-level metrics. CPU and memory utilization metrics are published automatically.
In CloudWatch, navigate to the Metrics section and select ECS from the list of namespaces.
Click on a metric name to view it as a graph. You can filter by dimensions such as ClusterName and ServiceName.
Set alarms on metrics to trigger notifications when a certain threshold is breached. For example, you can set an alarm to alert you when CPU utilization crosses 80%.
Create an ECS dashboard by adding your desired widgets for different metrics. Configurable widgets include line graphs and single-value statistics.
Add dashboards to the default or custom Dashboards menu for one-click access. Dashboards will give you a unified view of multiple metrics.
Configure alarm actions to publish to an Amazon SNS topic, which can fan out to email, Lambda, or chat tools such as Slack, when the alarm state changes. For example, notify the ops team when an alarm enters the ALARM state.
Save graphs and dashboards with descriptive names for easy reference later. Dashboards are shareable across AWS accounts.
Finally, metrics can be exported to other monitoring systems through the CloudWatch API or CloudWatch metric streams.
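The same metrics shown in the console can be read programmatically. The sketch below builds the parameters for CloudWatch’s GetMetricStatistics operation for a service’s CPUUtilization; the cluster and service names are hypothetical, and the three-hour window and five-minute period are arbitrary choices.

```python
from datetime import datetime, timedelta, timezone


def cpu_query(cluster, service, hours=3, period=300, now=None):
    """Return keyword arguments shaped for CloudWatch GetMetricStatistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ECS",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period,  # seconds per datapoint
        "Statistics": ["Average", "Maximum"],
    }


# Hypothetical cluster and service names.
query = cpu_query("demo-cluster", "demo-service")
print(query["MetricName"], query["EndTime"] - query["StartTime"])
```

With boto3 and credentials available, `boto3.client("cloudwatch").get_metric_statistics(**query)["Datapoints"]` would return the matching datapoints.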

eG Innovations ECS Tracking Solutions

Using third-party monitoring solutions offers several advantages, from better log management to faster resolutions.

Comprehensive Metrics Collection

eG Enterprise provides more detailed metrics than standard AWS CloudWatch metrics. It captures various performance indicators such as CPU utilization, memory usage, network throughput, and application-specific metrics. This comprehensive data allows teams to gain deeper insights into application behavior and resource consumption.

Advanced Analytics and Visualization

Third-party solutions often have sophisticated analytics capabilities. They can detect anomalies and perform trend analysis. eG Enterprise can visualize metrics through customizable dashboards, enabling teams to spot performance issues quickly and understand the underlying causes of bottlenecks.

Agentless Monitoring

Some solutions, like eG Enterprise, support agentless monitoring by scraping metadata endpoints. This avoids having to deploy agents on Fargate, where there is no host access.
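To give a sense of what a metadata endpoint exposes, the sketch below parses a response shaped like the ECS task metadata endpoint’s task stats (a JSON object keyed by container ID, each holding Docker-style stats). It is a generic illustration, not a description of how eG Enterprise works, and the sample payload is made up.

```python
import json


def memory_usage_mib(task_stats: dict) -> dict:
    """Return memory usage in MiB per container ID from a task stats payload."""
    usage = {}
    for container_id, stats in task_stats.items():
        # Stats can be null for containers that have not started yet.
        if not stats:
            continue
        memory = stats.get("memory_stats") or {}
        if "usage" in memory:
            usage[container_id] = memory["usage"] / (1024 * 1024)
    return usage


# Made-up sample payload with one running and one not-yet-started container.
sample = json.loads(
    '{"abc123": {"memory_stats": {"usage": 104857600}}, "def456": null}'
)
print(memory_usage_mib(sample))  # {'abc123': 100.0}
```

Inside a running task, the payload could be fetched from the URL in the `ECS_CONTAINER_METADATA_URI_V4` environment variable with `/task/stats` appended.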

Real-Time Monitoring and Alerts

With real-time monitoring capabilities, eG Enterprise can provide immediate alerts for performance degradation or failures. This approach helps teams respond to issues before they impact users, ensuring more reliable applications.

Best Practices for Utilizing ECS Metrics

If you aren’t relying on third-party solutions, you can still get a lot out of ECS metrics with AWS-native tools, and eG Innovations’ monitoring solutions can complement them when managing and optimizing applications on ECS Fargate. Consider the following best practices.

Setting Up Alerts

If you haven’t already, set alarms to receive notifications when thresholds are breached (e.g., CPU utilization exceeds 80% or memory usage exceeds 75%). To do this, access the CloudWatch console and create a new alarm:

Select the relevant ECS metric, such as CPU utilization, and define the threshold conditions (e.g., greater than 80% for 5 minutes).
Configure actions to send notifications via Amazon SNS, ensuring you have an email subscription for alerts.
Provide a name and description for easy identification, then review and create the alarm.
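The console steps above map onto the parameters of CloudWatch’s PutMetricAlarm operation. The sketch below builds them for an alarm on average CPU above 80% for one five-minute period; the alarm naming scheme, cluster and service names, and SNS topic ARN are hypothetical placeholders.

```python
def cpu_alarm(cluster, service, sns_topic_arn, threshold=80.0):
    """Return keyword arguments shaped for CloudWatch PutMetricAlarm."""
    return {
        "AlarmName": f"{cluster}-{service}-cpu-high",
        "AlarmDescription": f"Average CPU above {threshold}% for 5 minutes",
        "Namespace": "AWS/ECS",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        "Statistic": "Average",
        "Period": 300,            # evaluate 5-minute averages
        "EvaluationPeriods": 1,   # one breaching period triggers the alarm
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        # SNS topic that delivers the notification (e.g., to email subscribers).
        "AlarmActions": [sns_topic_arn],
    }


# Hypothetical names and ARN, for illustration only.
alarm = cpu_alarm(
    "demo-cluster",
    "demo-service",
    "arn:aws:sns:us-east-1:123456789012:ops-alerts",
)
print(alarm["AlarmName"], alarm["Threshold"])
```

With boto3 and credentials available, `boto3.client("cloudwatch").put_metric_alarm(**alarm)` would create or update the alarm.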

Regular Review and Optimization

Perform regular reviews on CloudWatch metrics and logs to identify trends and anomalies. Monitoring trends allows for informed decisions regarding scaling resources up or down based on actual usage patterns. The insights you obtain will guide you in making adjustments and optimizing resource utilization, thereby reducing costs.
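A simple way to quantify a trend during such a review is a least-squares slope over evenly spaced samples. The sketch below is one possible approach, not a prescribed method, and the daily CPU averages are made-up values.

```python
def trend_slope(values):
    """Least-squares slope of values over their index (units per sample)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


# Made-up daily average CPU utilization over five days.
daily_cpu = [40, 45, 50, 55, 60]
slope = trend_slope(daily_cpu)
print(slope)  # 5.0 percentage points per day
```

A persistently positive slope suggests planning to scale up before utilization reaches the high band; a negative one may point to room for scaling down.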

Mastering ECS Container Insights Metrics for Better Performance

Know the state of your containerized applications by monitoring ECS metrics. With Fargate’s ephemeral infrastructure, metrics help you foresee issues. Collecting CPU, memory, network, and custom metrics provides insights into how well serverless workloads function. You get the most from these metrics by setting alarms on them and reviewing them regularly.

Third-party monitoring solutions provide comprehensive visibility and automated diagnostics. At eG Innovations, we perform automatic discovery and monitoring of ECS clusters, tasks, and containers without requiring agents, all in a single pane of glass. If you’re finding it hard to identify bottlenecks in complex applications and infrastructure, ditch the guesswork and get end-to-end visibility across databases.

Book a free trial today, and let us help you transform your monitoring strategy so you can concentrate on growing your business.
