What is AWS Cloud Monitoring?
A recent eG Innovations & DevOps Institute APM survey of more than 900 IT professionals indicated that AWS is the dominant cloud service provider. Organizations are deploying a wide variety of workloads on AWS cloud environments to ensure agility, scalability, and high availability of their application services.
After applications are deployed in the cloud, their performance has to be monitored. When users complain that the applications are slow, your IT team needs to be able to troubleshoot and diagnose these problems. This is where AWS monitoring comes in. Broadly, AWS monitoring tracks the availability and performance of the key applications you host on the AWS cloud. It also tracks the usage, availability, and performance of the AWS services those applications depend on. AWS monitoring should also provide real-time monitoring, diagnosis, and historical analytics to assist IT operations and architecture teams.
AWS has a built-in monitoring tool, Amazon CloudWatch, that provides some insight into the performance and usage of AWS services.
Why is AWS Cloud Monitoring Necessary?
Proactive monitoring and rapid diagnosis are key
Proactive monitoring and rapid diagnosis are a must for most organizations deploying applications in the cloud. Every minute of application downtime or slowness costs your organization money. And if your applications don’t work well in the cloud, you may need to move them back to on-premises infrastructure. After all, end users care about application performance, not about where the application is hosted or what infrastructure it uses.
The cost of downtime ranges from $140,000 per hour to $540,000 per hour, with $300,000 per hour being the average.
Right-sizing can save you money
AWS monitoring must also collect enough information to diagnose problems quickly. After all, if a problem takes hours to diagnose and fix, that also hurts revenue and user satisfaction. Because AWS monitoring collects a wealth of metrics, it can also provide the insights needed to right-size AWS infrastructure. Unlike on-premises infrastructure, where hardware is already paid for, right-sizing on the AWS cloud can yield direct cost savings because you pay for the capacity you provision.
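As a rough illustration of how collected metrics can feed right-sizing, the Python sketch below classifies an instance by the 95th percentile of its CPU utilization samples (for example, values of the CloudWatch CPUUtilization metric, fetched separately). The `rightsizing_advice` function and the 40%/80% thresholds are hypothetical choices for illustration, not AWS recommendations, and a real assessment would also weigh memory, network, and disk usage.

```python
from dataclasses import dataclass
from statistics import quantiles
from typing import Sequence


@dataclass
class SizingAdvice:
    p95_cpu: float  # 95th-percentile CPU utilization, in percent
    action: str     # "downsize", "upsize", or "keep"


def rightsizing_advice(cpu_samples: Sequence[float],
                       low: float = 40.0, high: float = 80.0) -> SizingAdvice:
    """Suggest a sizing direction from CPU utilization samples (percent).

    Hypothetical helper: the low/high thresholds are illustrative assumptions.
    """
    if len(cpu_samples) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    # method="inclusive" keeps the estimate within the observed range.
    p95 = quantiles(cpu_samples, n=20, method="inclusive")[-1]
    if p95 < low:
        action = "downsize"  # consistently underused: a smaller type may do
    elif p95 > high:
        action = "upsize"    # frequently close to saturation
    else:
        action = "keep"
    return SizingAdvice(p95_cpu=p95, action=action)
```

For example, an instance whose CPU samples never rise much above 15% would be flagged as a downsizing candidate. Using a high percentile rather than the average avoids downsizing an instance that is idle most of the time but busy at peak.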
Dynamic, autoscaling environments across hybrid/multi-cloud
The typical AWS environment is highly elastic and dynamic. Consider a task-processing system in which tasks are queued in a messaging system and processed by EC2 instances. You might want the number of EC2 instances to scale dynamically with the number of tasks waiting in the queue. New EC2 instances and containers (such as Docker containers orchestrated by Kubernetes) can come and go across on-premises and multi-cloud environments.
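Queue-driven scaling of this kind is often expressed as a target backlog per instance: the acceptable queue latency divided by the average time an instance needs per task. The sketch below computes a desired instance count that way. It assumes each instance processes one task at a time; the function name, its parameters, and the min/max bounds are hypothetical, and in practice an Auto Scaling policy, rather than hand-written code, would usually act on such a number.

```python
import math


def desired_instances(queued_tasks: int,
                      avg_task_secs: float,
                      acceptable_latency_secs: float,
                      min_instances: int = 1,
                      max_instances: int = 20) -> int:
    """Target instance count for a queue-driven worker fleet (hypothetical helper).

    Assumes each instance processes one task at a time, so one instance can
    absorb acceptable_latency_secs / avg_task_secs queued tasks.
    """
    if avg_task_secs <= 0 or acceptable_latency_secs <= 0:
        raise ValueError("durations must be positive")
    backlog_per_instance = acceptable_latency_secs / avg_task_secs
    needed = math.ceil(queued_tasks / backlog_per_instance)
    # Clamp to the fleet's configured bounds.
    return max(min_instances, min(max_instances, needed))
```

With 60 queued tasks, 2 seconds per task, and a 10-second latency target, each instance can absorb 5 tasks, so the sketch asks for 12 instances.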
In such elastic environments, you need visibility into how many instances were spun up or down and whether application health improved as a result of the automated scaling. Monitoring systems should therefore automatically discover, instrument, and manage new compute instances as they come and go.
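Automatic discovery largely comes down to reconciling what is running now against what is already being monitored. The sketch below assumes some inventory source (for example, an EC2 DescribeInstances call or an Auto Scaling group listing, neither shown here) supplies the current instance IDs on each poll; the `Inventory` class and `FleetChange` type are hypothetical. The returned change also answers how many instances were spun up or down between polls.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass
class FleetChange:
    added: Set[str] = field(default_factory=set)    # newly seen instance IDs
    removed: Set[str] = field(default_factory=set)  # IDs no longer running


class Inventory:
    """Tracks which instances are monitored between discovery polls (hypothetical)."""

    def __init__(self) -> None:
        self.monitored: Set[str] = set()

    def reconcile(self, current_ids: Iterable[str]) -> FleetChange:
        current = set(current_ids)
        change = FleetChange(added=current - self.monitored,
                             removed=self.monitored - current)
        # A real agent would instrument change.added and retire change.removed here.
        self.monitored = current
        return change
```

Comparing the sizes of `change.added` and `change.removed` with health metrics from the same interval shows whether a scaling action actually helped.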
Is Amazon CloudWatch Sufficient for AWS Cloud Monitoring?
There are many who believe that just deploying application workloads in the cloud is sufficient and that the cloud service provider will take care of monitoring their applications and keeping them performing well. There are others who believe that they can just use the native AWS monitoring tool, Amazon CloudWatch, to monitor their applications. Read our whitepaper “Top 6 myths of cloud performance monitoring” to understand why native AWS monitoring capabilities may not be sufficient for monitoring your applications and the supporting AWS service infrastructure.
71% of respondents to the eG Innovations & DevOps Institute APM survey indicated that they were not happy with their native cloud monitoring tool, citing gaps in functionality, ease of deployment, and cost. The basic metrics provided by Amazon CloudWatch are straightforward to configure. However, configuring more advanced monitoring capabilities is challenging, and there are no pre-built templates. While detailed monitoring of database servers is possible, configuring application-level monitoring is more involved. Furthermore, Amazon CloudWatch is priced per metric, and advanced analytics such as auto-baselining cost extra. Hence, the cost of using Amazon CloudWatch is also hard to predict.
What Metrics are Important for AWS Monitoring?
There are hundreds of AWS services available today. First, any AWS monitoring tool should integrate with Amazon CloudWatch and provide information about the usage and performance of these AWS services. The most popular AWS services, which should be supported, include AWS EC2, AWS RDS, AWS Lambda, AWS S3, AWS ECS, AWS EKS, AWS SQS, AWS DynamoDB, and AWS WorkSpaces.
Second, AWS services function very differently from on-premises infrastructure. For instance, if you are using burstable instances on AWS EC2, it is critical that you keep track of the CPU credits earned and spent and the resulting credit balance. If the credit balance runs very low, application performance will suffer, because an instance in standard credit mode is held to its baseline CPU performance once its credits are exhausted. Read our blog describing how the choice of EC2 instances can affect application performance. Legacy tools designed for on-premises infrastructure do not track or report such issues.
Third, full-stack monitoring is essential. Often, the problem is not with your application but with the supporting infrastructure. You need to have metrics in place to prove exactly where the problem lies. Read our AWS anomaly detection case study, which highlights one such instance.
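To make the credit-balance point concrete, the sketch below takes recent samples of the CloudWatch `CPUCreditBalance` metric for a burstable instance (fetching them is not shown) and raises an alert if the balance is already low or, at the current burn rate, will run out soon. The function names and the 20-credit and 60-minute thresholds are hypothetical, and projecting linearly from the first to the last sample is a deliberately simple assumption.

```python
from typing import Optional, Sequence, Tuple

# (minutes since start, CPUCreditBalance) pairs, oldest first
Sample = Tuple[float, float]


def minutes_to_exhaustion(samples: Sequence[Sample]) -> Optional[float]:
    """Project when the credit balance hits zero; None if it is not falling."""
    if len(samples) < 2:
        return None
    (t0, b0), (t1, b1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("timestamps must increase")
    rate = (b1 - b0) / (t1 - t0)  # credits per minute; negative means draining
    if rate >= 0:
        return None
    return b1 / -rate


def credit_alert(samples: Sequence[Sample],
                 min_balance: float = 20.0,
                 horizon_minutes: float = 60.0) -> bool:
    """True if the balance is low now or projected to run out within the horizon."""
    if not samples:
        raise ValueError("no samples")
    balance = samples[-1][1]
    eta = minutes_to_exhaustion(samples)
    return balance < min_balance or (eta is not None and eta < horizon_minutes)
```

A balance that fell from 100 to 40 credits over 30 minutes is draining at 2 credits per minute, so the sketch projects exhaustion in 20 minutes and raises an alert even though 40 credits may not look alarming on its own.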
How to Monitor AWS Workloads?
If you are deploying applications on AWS, you must monitor the digital user experience. Synthetic monitoring and real user monitoring are common techniques used for this purpose. Ideally, you want to be able to implement both techniques to get a 360-degree view of user experience.
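In its simplest form, synthetic monitoring is a scripted request, sent on a schedule, that checks both availability and response time. The Python sketch below shows one such probe using only the standard library. The function names and the 2-second latency budget are hypothetical; scheduling, multi-location execution, and real user monitoring (which instruments actual user sessions) are out of scope. The `fetch` parameter lets the HTTP call be swapped out, for example in tests.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProbeResult:
    ok: bool
    status: int       # HTTP status code, or 0 if no response was received
    latency_ms: float


def http_get_status(url: str, timeout: float) -> int:
    """Issue a GET request and return the HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses are raised as errors
        return err.code


def synthetic_probe(url: str,
                    max_latency_ms: float = 2000.0,
                    timeout: float = 5.0,
                    fetch: Callable[[str, float], int] = http_get_status) -> ProbeResult:
    """Time one request and judge it against a status and latency budget."""
    start = time.perf_counter()
    try:
        status = fetch(url, timeout)
    except OSError:  # DNS failures, refused connections, timeouts
        status = 0
    latency_ms = (time.perf_counter() - start) * 1000.0
    ok = 200 <= status < 400 and latency_ms <= max_latency_ms
    return ProbeResult(ok=ok, status=status, latency_ms=latency_ms)
```

A scheduler might call `synthetic_probe("https://app.example.com/login")` every minute (the URL is a placeholder) and alert only after several consecutive failures, which keeps a single network blip from paging anyone.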
While user experience monitoring tells you when there is a problem, you need additional insight to diagnose it. Application transaction tracing is key here. Typically, a tag-and-follow technique is used: each transaction is tagged with an identifier that follows it as it is processed by each tier. From these traces, a transaction-flow graph is built that shows at a glance which tier is causing the slowness, whether it is application code, a database query, a call to a third-party service, or an API call to an AWS service.
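The tag-and-follow idea can be sketched in a few lines: the entry tier generates a trace ID, every tier records a span carrying that ID and its parent span, and the spans are then grouped to find where the time went. The `Span` fields and tier names below are hypothetical, and the self-time calculation assumes child calls run sequentially inside their parent; production tracers (OpenTelemetry, for instance) handle concurrency, sampling, and context propagation far more carefully.

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Span:
    trace_id: str             # the "tag" shared by every span of one transaction
    span_id: str
    parent_id: Optional[str]  # None for the entry-tier span
    tier: str                 # e.g. "app", "db", "third-party", "aws-api"
    duration_ms: float


def new_trace_id() -> str:
    """Generated once at the entry tier and passed to each downstream call."""
    return uuid.uuid4().hex


def self_time_by_tier(spans: Iterable[Span], trace_id: str) -> Dict[str, float]:
    """Time spent in each tier itself, excluding time spent in its child calls."""
    trace = [s for s in spans if s.trace_id == trace_id]
    child_time: Dict[str, float] = defaultdict(float)
    for s in trace:
        if s.parent_id is not None:
            child_time[s.parent_id] += s.duration_ms
    totals: Dict[str, float] = defaultdict(float)
    for s in trace:
        totals[s.tier] += s.duration_ms - child_time[s.span_id]
    return dict(totals)


def slowest_tier(spans: Iterable[Span], trace_id: str) -> str:
    totals = self_time_by_tier(spans, trace_id)
    return max(totals, key=totals.get)
```

For a 500 ms request whose database call took 320 ms and whose AWS API call took 100 ms, the application tier's own share is only 80 ms, so `slowest_tier` points at the database rather than the application code.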
Sometimes, usage limits on AWS services impact application performance; for example, once an account reaches its AWS Lambda concurrency limit, further invocations are throttled. Hence, it is important to have a single pane of glass from which you can also see the usage and performance of different AWS services. In-depth, domain-specific monitoring is also required for specific services, including AWS RDS databases, AWS WorkSpaces, and Kubernetes.
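Watching service limits from one place largely comes down to comparing current usage with each applicable quota. The sketch below flags quotas above a utilization threshold. The `QuotaUsage` type, the 80% threshold, and the sample figures are hypothetical; in practice the usage and limit values could be gathered from sources such as the AWS Service Quotas service and CloudWatch usage metrics.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class QuotaUsage:
    service: str
    quota_name: str
    used: float
    limit: float

    def __post_init__(self) -> None:
        if self.limit <= 0:
            raise ValueError("limit must be positive")

    @property
    def utilization(self) -> float:
        return self.used / self.limit


def near_limit(usages: Iterable[QuotaUsage], threshold: float = 0.8) -> List[QuotaUsage]:
    """Quotas at or above the threshold, most constrained first."""
    hot = [u for u in usages if u.utilization >= threshold]
    return sorted(hot, key=lambda u: u.utilization, reverse=True)
```

With illustrative figures of 950 of 1,000 Lambda concurrent executions, 34 of 40 RDS instances, and 40 of 100 EC2 vCPUs in use, the sketch flags Lambda first, then RDS, and leaves EC2 alone.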