{"id":26470,"date":"2022-10-07T05:55:25","date_gmt":"2022-10-07T09:55:25","guid":{"rendered":"https:\/\/www.eginnovations.com\/blog\/?p=26470"},"modified":"2022-11-09T04:21:12","modified_gmt":"2022-11-09T09:21:12","slug":"monitoring-multicloud-infrastructures","status":"publish","type":"post","link":"https:\/\/www.eginnovations.com\/blog\/monitoring-multicloud-infrastructures\/","title":{"rendered":"How to Monitor and Troubleshoot Multi-cloud Applications"},"content":{"rendered":"<div class=\"inner_content\">\n<h2><span class=\"ez-toc-section\" id=\"How_to_Troubleshoot_Multi-cloud_Applications_using_Modern_Observability\"><\/span>How to Troubleshoot Multi-cloud Applications using Modern Observability<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<\/div>\n<p>Multi-cloud and hybrid cloud applications are deployed on multiple cloud vendor platforms, including on-premises private cloud. While these platforms offer tremendous benefits by providing a reliable and scalable platform for fuelling digital transformation, they also add significant <em>monitoring complexity.\u00a0<\/em><\/p>\n<p style=\"margin-bottom: 15px;\">Site reliability engineers (SRE) need multicloud monitoring visibility to:<\/p>\n<ul>\n<li>Pinpoint the <strong>root-cause<\/strong> of slowness or errors to the specific microservice deployed on the cloud vendor&#8217;s platform.<\/li>\n<li>Not waste time in other microservices (deployed on other cloud vendors or on-premises) that may experience <strong>cascading side-effects<\/strong>.<\/li>\n<\/ul>\n<p style=\"margin-bottom: 15px;\">Why is this important? SREs face several challenges in multicloud and hybrid cloud environments:<\/p>\n<ol>\n<li><strong>Complex, connected and cascading failures:<\/strong> Multicloud applications are complex. Applications are made up of connected microservices distributed across multiple clouds. 
A single problem can trigger a number of cascading side-effects, potentially triggering a large number of alarms.<\/li>\n<li><strong>Tools that don&#8217;t talk to each other:<\/strong> While each cloud vendor provides its own monitoring solution, none of these tools talk to each other. As a result, every cloud-native monitoring solution may show a &#8220;red&#8221; status, causing confusion about the true root-cause.<\/li>\n<li><strong>Where to start?:<\/strong> SREs have difficulty knowing where to begin problem diagnosis in multicloud environments.<\/li>\n<\/ol>\n<div class=\"link_list_style\" style=\"padding: 20px 20px; margin-bottom: 30px;\">\n<p style=\"margin-bottom: 15px;\">Our recent <a href=\"https:\/\/www.eginnovations.com\/ebooks\/application-performance-monitoring-survey\">State of APM survey results<\/a> found that 40% of SREs, DevOps and IT teams were using four or more APM tools to try to isolate the root-cause of performance problems.<\/p>\n<p style=\"margin-bottom: 5px;\">In this article, you&#8217;ll learn how a modern approach to observability that uses AI-assistance to connect the dots across multiple clouds can give you actionable insights to remediate and resolve the problem quickly.<\/p>\n<\/div>\n<h2><span class=\"ez-toc-section\" id=\"Troubleshooting_a_Multicloud_Issue_in_Three_Clicks\"><\/span>Troubleshooting a Multicloud Issue in Three Clicks<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>I\u2019m going to take a microservices-based e-commerce application that is deployed on a combination of on-premises, AWS and Azure. I&#8217;ll walk you through a troubleshooting workflow that requires <strong><em>only three clicks to pinpoint<\/em> <\/strong>the root-cause of the problem.<\/p>\n<p>Three clicks is a useful heuristic for SREs looking to reduce MTTR (Mean Time to Resolution, Repair or Respond). 
A good way to keep MTTR low is by consolidating observability signals (code-level traces, service dependencies, change events, metrics, logs) and automatically pinpointing the root-cause so that SREs can quickly diagnose and engage the right teams to remediate the issue.<\/p>\n<p><a href=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/problemsolutions-view.jpg\" data-rel=\"lightbox-image-0\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-27171 size-full\" src=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/10\/problemsolutions.webp\" alt=\"Multi-cloud performance problems and solution\" width=\"850\" height=\"450\" \/><\/a><\/p>\n<div class=\"img_caption\">SREs need the ability to quickly pinpoint the root-cause of problems in a multicloud application that spans multiple cloud providers. To that end, observability solutions need to provide color-coded problem correlation.<\/div>\n<h2><span class=\"ez-toc-section\" id=\"Microservices_Based_Technology_Stack\"><\/span>Microservices Based Technology Stack<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>In terms of the technology stack, it would be very familiar to anyone working on applications and microservices in a multicloud environment. 
The application uses a &#8220;<a href=\"https:\/\/www.openpersuasion.org\/why-polyglot-microservices\/\" target=\"_blank\" rel=\"noopener\">polyglot<\/a>&#8221; approach with multiple languages and frameworks, deployed on containers across multiple clouds (AWS and Azure).<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-26491 size-full\" src=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/tech-stack-eg.jpg?noresize\" alt=\"Multi-cloud technology stack\" width=\"750\" height=\"465\" \/><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Problem_Statement_-_Troubleshooting_an_E-Commerce_Application\"><\/span>Problem Statement &#8211; Troubleshooting an E-Commerce Application<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The e-commerce application is decomposed into microservices that are deployed on different hosting services. The front-end is hosted on-premises, the product inventory and customer database are hosted on AWS and the checkout and payments services are hosted on Azure.<\/p>\n<p>It also relies on an external payment gateway to process the payments of customers who check out on its store.<\/p>\n<p>One of the external payment gateway services is slow and throwing errors. 
Users get an error message as shown below, and user complaints are flooding the call center.<\/p>\n<p><a href=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/payment-gateway-error-view.jpg\" data-rel=\"lightbox-image-1\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-27173 size-full\" src=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/10\/payment-gateway-error.webp\" alt=\"Multi-cloud application problem\" width=\"750\" height=\"350\" \/><\/a><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Drawbacks_of_using_Cloud-native_Monitoring_Solutions\"><\/span>Drawbacks of using Cloud-native Monitoring Solutions<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The SRE teams had previously been using the cloud-native monitoring tools supplied by each cloud vendor. In this case, that meant <a href=\"https:\/\/www.eginnovations.com\/supported-technologies\/aws-monitoring\">AWS CloudWatch<\/a> for the AWS components and <a href=\"https:\/\/www.eginnovations.com\/supported-technologies\/azure-monitoring\">Azure Monitor<\/a> for the Azure components.<\/p>\n<p>Because the microservices depend on each other, failures cascaded across the application. 
Additionally, none of these tools speak to each other, so each cloud-native monitoring solution showed a &#8220;red&#8221; state\u2014signaling that there was a problem with the microservice hosted on that cloud.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-27198 size-full\" src=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/10\/Problem_Dashboard-scaled.webp\" alt=\"Multicloud monitoring is challenging\" width=\"2560\" height=\"1195\" srcset=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/10\/Problem_Dashboard-scaled.webp 2560w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/10\/Problem_Dashboard-300x140.webp 300w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/10\/Problem_Dashboard-1024x478.webp 1024w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/10\/Problem_Dashboard-768x359.webp 768w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/10\/Problem_Dashboard-1536x717.webp 1536w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/10\/Problem_Dashboard-2048x956.webp 2048w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/10\/Problem_Dashboard-800x374.webp 800w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/10\/Problem_Dashboard-310x145.webp 310w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/10\/Problem_Dashboard-140x65.webp 140w\" sizes=\"auto, (max-width: 2560px) 100vw, 2560px\" \/><\/p>\n<div class=\"img_caption\"><strong>Cloud-native monitoring tools don&#8217;t talk to each other<\/strong>. Failures in multicloud applications cascade across cloud boundaries. As a result, each of the cloud monitoring tools might show a &#8220;RED&#8221; state causing confusion and chaos.<\/div>\n<p>Since the payments microservice on Azure experienced failures, the depending microservice on AWS also failed. 
However, since AWS CloudWatch has no knowledge of Azure, the SREs were unable to pinpoint the root-cause of the problem.<\/p>\n<p>They ended up in a war room with the administrators of every service looking at their own dashboards, and chaos ensued. Hence the need for a single-pane-of-glass multicloud monitoring solution.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Solution_-_Troubleshooting_Multicloud_Apps_in_Three_Clicks\"><\/span>Solution \u2013 Troubleshooting Multicloud Apps in Three Clicks<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>A full stack monitoring tool can automatically detect components and determine the inter-dependencies between them. One of the complaints against Azure Monitor and AWS CloudWatch is that they <a href=\"https:\/\/www.eginnovations.com\/blog\/azure-monitor-avd\/\">require a lot of manual setup<\/a> and tedious data entry work as you manually set thresholds.<\/p>\n<p><a href=\"https:\/\/www.eginnovations.com\/product\/cloud-monitoring\">eG Enterprise<\/a> does all of that for you automatically, with built-in vendor-recommended thresholds. 
The result is a topology map that lays out your multicloud environment in an easy-to-read graphical interface.<\/p>\n<p>This screenshot closely resembles the architectural diagram that we saw above, with the added microservices shown within the cloud provider\u2019s groups along with the inter-dependencies of those microservices.<\/p>\n<p>The SRE teams typically have this dynamic and real-time topology dashboard projected on a large TV screen continuously monitoring the health of the multicloud application.<\/p>\n<p><a href=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/webstore-ecomm-dashboard-view.jpg\" data-rel=\"lightbox-image-2\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-27175 size-full\" src=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/webstore-ecomm-dashboard.jpg\" alt=\"Multicloud monitoring dashboard\" width=\"750\" height=\"484\" srcset=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/webstore-ecomm-dashboard.jpg 750w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/webstore-ecomm-dashboard-300x194.jpg 300w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/webstore-ecomm-dashboard-310x200.jpg 310w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/webstore-ecomm-dashboard-140x90.jpg 140w\" sizes=\"auto, (max-width: 750px) 100vw, 750px\" \/><\/a><\/p>\n<div class=\"img_caption\">SREs need the ability to auto-discover and auto-correlate microservices dependencies in a multicloud application across cloud boundaries.<\/div>\n<p>In the screenshot below, we can see that there is a critical alert on the Payment service within Azure as indicated by the red color. eG Enterprise has identified this as the likely root-cause of the problem which is having a knock-on effect on dependent services &#8211; i.e. 
customer microservice on AWS and the frontend microservice on the on-premises environment.<\/p>\n<p><a href=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/webstore-ecomm-topology-view.jpg\" data-rel=\"lightbox-image-3\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-27177 size-full\" src=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/webstore-ecomm-topology.jpg\" alt=\"Multicloud monitoring for a complex web application\" width=\"750\" height=\"484\" srcset=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/webstore-ecomm-topology.jpg 750w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/webstore-ecomm-topology-300x194.jpg 300w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/webstore-ecomm-topology-310x200.jpg 310w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/webstore-ecomm-topology-140x90.jpg 140w\" sizes=\"auto, (max-width: 750px) 100vw, 750px\" \/><\/a><\/p>\n<div class=\"img_caption\">Even non-experts can easily identify that the payment microservice (Azure) needs immediate attention while the side-effects are on the customer microservice on AWS &amp; the frontend microservice on the on-premises environment.<\/div>\n<p>The Payment microservice relies on external payment gateways in order to function properly. We need to determine whether the fault is within our own microservice or if there\u2019s a problem with the external payment gateway.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Click_1_List_of_Problems_for_the_Selected_Microservice\"><\/span>Click #1: List of Problems for the Selected Microservice<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Clicking on the red alert icon brings up an automatic contextual description of the problem that eG Enterprise has discovered. 
In this case: \u201cMany recent slow transactions for payments\u201d which is exactly the experience our customers are having on the front end with payment failed error messages.<\/p>\n<p><a href=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/webstore-alert-comment-view.jpg\" data-rel=\"lightbox-image-4\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-27179 size-full\" src=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/webstore-alert-comment.jpg\" alt=\"Multicloud monitoring alerts\" width=\"750\" height=\"484\" srcset=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/webstore-alert-comment.jpg 750w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/webstore-alert-comment-300x194.jpg 300w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/webstore-alert-comment-310x200.jpg 310w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/webstore-alert-comment-140x90.jpg 140w\" sizes=\"auto, (max-width: 750px) 100vw, 750px\" \/><\/a><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Click_2_Drill-down_into_Set_of_Transactions_Exhibiting_the_Problem\"><\/span>Click #2: Drill-down into Set of Transactions Exhibiting the Problem<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Clicking on the magnifying glass icon brings you to a deeper level of analysis for <a href=\"https:\/\/www.eginnovations.com\/supported-technologies\/java-transaction-monitoring\">Business Transaction Monitoring<\/a>, or what we call Slice and Dice analytics. 
Every action a user takes on the e-commerce website is captured by eG Enterprise and we can drill down into an individual transaction to see what is going on.<\/p>\n<p><a href=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/multi-cloud-product-06-zoom.jpg\" data-rel=\"lightbox-image-5\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-26514 size-full\" src=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/multi-cloud-product-06.jpg\" alt=\"Analysis of multicloud application performance\" width=\"556\" height=\"348\" srcset=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/multi-cloud-product-06.jpg 556w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/multi-cloud-product-06-300x188.jpg 300w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/multi-cloud-product-06-310x194.jpg 310w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/multi-cloud-product-06-140x88.jpg 140w\" sizes=\"auto, (max-width: 556px) 100vw, 556px\" \/><\/a><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Click_3_Pinpoint_the_root-cause_down_to_the_Code-level\"><\/span>Click #3: Pinpoint the root-cause down to the Code-level<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Diving into an individual transaction, we get a diagram visualizing the data journey through the application. You can see that eG Enterprise has automatically detected the technology &#8211; in this case, Java running on <a href=\"https:\/\/www.eginnovations.com\/supported-technologies\/tomcat-monitoring\">SpringBoot based on Tomcat<\/a>.<\/p>\n<p>Everything looks fine between the user and the front-end, and the front-end and the payment service. However, there is a 20 second delay between the payment gateway on Azure and the payment gateway API. 
I think we have found our root-cause of the problem!<\/p>\n<p><a href=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/10\/multi-cloud-product-07-zoom.png\" data-rel=\"lightbox-image-6\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-26517 size-full\" src=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/10\/multi-cloud-product-07.webp\" alt=\"Tracing application transactions in a multicloud infrastructure\" width=\"556\" height=\"388\" \/><\/a><\/p>\n<p>We can even do a further execution analysis at the code level to further confirm that it is the API call to the external payment gateway that\u2019s causing the problem. This screenshot shows a trace of all the HTTP calls and the response time for each execution.<\/p>\n<p><a href=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/multi-cloud-dashboard-08-zoom.jpg\" data-rel=\"lightbox-image-7\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-26519 size-full\" src=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/multi-cloud-product-08.jpg\" alt=\"Deep dive into a multicloud application stack\" width=\"556\" height=\"377\" srcset=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/multi-cloud-product-08.jpg 556w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/multi-cloud-product-08-300x203.jpg 300w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/multi-cloud-product-08-310x210.jpg 310w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/multi-cloud-product-08-140x95.jpg 140w\" sizes=\"auto, (max-width: 556px) 100vw, 556px\" \/><\/a><\/p>\n<h2><span class=\"ez-toc-section\" id=\"What_is_a_Topology_or_Service_Map_and_What_are_its_Benefits\"><\/span>What is a Topology or Service Map and What are its Benefits?<span 
class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>In the troubleshooting flow that I have described above, the starting point was a topology diagram &#8211; also known as a &#8220;service map&#8221;. SREs need a real-time visual representation of the application&#8217;s dynamic architecture.<\/p>\n<p style=\"margin-bottom: 15px;\">Key observability benefits that a topology or service map offers:<\/p>\n<ul>\n<li><strong>Understand connections in distributed microservices architecture:<\/strong> SREs can visualize the relationships between microservices. The upstream services can give context on what impact latency or errors have on the end user. This helps SREs prioritize and triage issues, focusing on what\u2019s critical.<\/li>\n<li><strong>Automatically created and updated:<\/strong> A topology automatically discovers and updates new microservices dependencies. Developers might sometimes inject new dependencies (such as a third-party API) and because of a gap in documentation, a new service dependency might come as a (sometimes nasty) surprise for the SRE. With a topology dashboard, SREs can spot unintended dependencies right away.<\/li>\n<li><strong>Rich observability context:<\/strong> Topology provides rich context of traffic, latency and errors on top of arrows that connect one microservice to another.<\/li>\n<li><strong>Zoom into details:<\/strong> Clicking on the arrow or the microservice can take you to the exact line-of-code that causes slowness or error. You can pinpoint exactly where in the execution flow there&#8217;s an anomaly in latency or errors.<\/li>\n<\/ul>\n<p>Many cloud monitoring tools offer just charts and graphs where you have to do the debugging heavy lifting. 
The value of a microservice architecture-aware observability tool is incredible and can drastically reduce the mean-time-to-resolve.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Avoid_Unnecessary_War_Rooms_with_Multicloud_Monitoring\"><\/span>Avoid Unnecessary War Rooms with Multicloud Monitoring<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>There\u2019s no need for war rooms in this scenario. eG Enterprise has made multicloud monitoring very simple. The only action to take is to submit a support ticket to the payment gateway provider \u2013 and maybe pull up the contract with them that mentions uptime SLAs!<\/p>\n<div class=\"link_list_style\" style=\"padding: 20px 20px; margin-bottom: 30px;\">\n<h3 style=\"margin-top: 10px;\"><span class=\"ez-toc-section\" id=\"How_to_Choose_the_Right_Multicloud_Monitoring_and_Root-cause_Diagnosis_Tool\"><\/span>How to Choose the Right Multicloud Monitoring and Root-cause Diagnosis Tool<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p style=\"margin-bottom: 15px;\">While cloud-native monitoring tools claim to do correlation and root-cause diagnosis, you should be aware of a few key distinguishing factors:<\/p>\n<ol>\n<li>Merely showing state color on a bunch of related applications and network devices is not correlation. True correlation involves analyzing the dependencies and sub-components to identify where the cause of a problem lies.<\/li>\n<li>Suppression of events from multiple sources based on thresholds is also not correlation.<\/li>\n<li>Tools that force you to manually and visually correlate based on data from different data points will lead to a time consuming and laborious diagnostic process.<\/li>\n<li>Tools that require rule-based correlation require you to build elaborate if-then-else conditions. In a dynamic and ever-changing environment such as cloud and containers, this is not an option. Moreover, you might require expert knowledge and many months of consulting hours. 
If your infrastructure changes even slightly, you will need to re-architect the correlation rules.<\/li>\n<\/ol>\n<p style=\"margin-bottom: 15px;\">Look for tools that:<\/p>\n<ul style=\"margin-bottom: 5px;\">\n<li>Automate the correlation and diagnostic process.<\/li>\n<li>Correlate across layers and tiers of the microservices application (e.g., if the network is down and the application is not reachable, then the network error must be given higher priority than the application error).<\/li>\n<li>Are non-expert friendly. IT organizations are under-staffed and SRE teams are asked to do more with less. At a minimum, even if the helpdesk operator cannot solve the problem, they should be able to identify the right expert to be involved in the troubleshooting process.<\/li>\n<li>Don&#8217;t require a separate license for auto-correlation from the base product.<\/li>\n<\/ul>\n<\/div>\n<p>With the troubleshooting flow outlined above, anyone can immediately identify the tier and layer in which a performance problem lies. It helps to eliminate finger-pointing: when IT teams work in silos, they often look at their own dashboard (for example, AWS CloudWatch), declare that everything is fine on their side, and tell everyone else to look for the problem elsewhere.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"SRE_Dashboards_for_Multicloud_Health_Monitoring\"><\/span>SRE Dashboards for Multicloud Health Monitoring<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p style=\"margin-bottom: 15px;\">We looked at a troubleshooting workflow above for a specific problem scenario. Several themes stand out in what we have seen so far:<\/p>\n<ul class=\"x_list\">\n<li>Full stack monitoring tools can give you a clear understanding of business impact. 
This helps you to triage and prioritize problems that matter the most (and defer those that don\u2019t).<\/li>\n<li>You need a clear separation of root-cause vs symptoms or side-effects.<\/li>\n<li>Even non-experts (such as helpdesk or L1 support) can effectively triage performance problems if the dashboards provide clear color-coded insight into root-cause.<\/li>\n<li>Transaction-level visibility at each dependent microservice tier enables effective fault isolation. SREs can collaborate with development teams to solve problems.<\/li>\n<\/ul>\n<p>Even when there are no active incidents, SREs need a bird&#8217;s-eye view visibility into all cloud services across <a href=\"https:\/\/www.eginnovations.com\/product\/end-user-experience-monitoring\">end-user experience<\/a>, <a href=\"https:\/\/www.eginnovations.com\/product\/application-performance-monitoring\">applications<\/a> and <a href=\"https:\/\/www.eginnovations.com\/product\/it-infrastructure-monitoring\">infrastructure<\/a>.<\/p>\n<p>eG Enterprise offers the ability to see the status of all your cloud services side by side on a single screen.<\/p>\n<p><a href=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/multicloud-health-dashboard-view.jpg\" data-rel=\"lightbox-image-8\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-27181 size-full\" src=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/multicloud-health-dashboard.jpg\" alt=\"Multicloud monitoring made easy with custom dashboard\" width=\"750\" height=\"395\" srcset=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/multicloud-health-dashboard.jpg 750w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/multicloud-health-dashboard-300x158.jpg 300w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/multicloud-health-dashboard-310x163.jpg 310w, 
https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/09\/multicloud-health-dashboard-140x74.jpg 140w\" sizes=\"auto, (max-width: 750px) 100vw, 750px\" \/><\/a><\/p>\n<p>These dashboards are pre-made in eG Enterprise and are just one of hundreds of pre-built dashboards. You also have the option to build <a href=\"https:\/\/www.eginnovations.com\/blog\/monitoring-dashboards-it-performance\/\/\">customized monitoring dashboards<\/a> using an intuitive drag-and-drop interface to show the metrics that are relevant to your multicloud environment.<\/p>\n<p>When you project these dashboards showing metrics from all cloud services onto the TV screen then you\u2019re bringing about a cultural change within the organization. You\u2019re bringing in system administrators, devops, developers, site reliability engineers and getting them all on the same page and looking at the same metrics.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-27182\" src=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/10\/unified-multicloud-dashboard.webp\" alt=\"\" width=\"850\" height=\"843\" \/><\/p>\n<div class=\"img_caption\">eG Enterprise unifies metrics, traces, logs, events and more from all major public or private cloud platforms and on-premises data centers. A unified dashboard can enable your SRE teams to monitor your entire multicloud and hybrid cloud estate from a single console.<\/div>\n<h2><span class=\"ez-toc-section\" id=\"Conclusion_-_Three_Clicks_is_All_it_takes_to_Resolve_Performance_Problems_If_you_have_the_Right_Observability_Tool\"><\/span>Conclusion &#8211; Three Clicks is All it takes to Resolve Performance Problems (If you have the Right Observability Tool)<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The process I showed you above took three clicks to go from the multicloud topology to identifying the root-cause of the problem. 
The time saved in detecting and resolving the problem means the business won\u2019t be losing out on thousands of dollars due to downtime.<\/p>\n<p>With a traditional siloed approach to the scenario outlined above, all the infrastructure admins would declare that there is nothing wrong with their services because everything appears to be running as normal. A war room would be convened, and everyone would be pointing fingers over where the problem lies.<\/p>\n<p style=\"margin-bottom: 15px;\">DevOps and SRE teams are in a race to keep pace with the rapidly expanding complexity of modern cloud ecosystems. SREs need the right observability tool to:<\/p>\n<ul class=\"list\">\n<li>Detect outages, software malfunctions, and degradations in service levels.<\/li>\n<li>Provide live dashboards and historical reports to analyze the health of the system by measuring demand, resource consumption and quality.<\/li>\n<li>Visualize microservice dependencies and understand how dependent microservices might impact each other.<\/li>\n<li>Unearth the &#8220;unknown unknowns&#8221;, which is what observability is all about. To that end, SREs need slice-and-dice capabilities to observe patterns that have never occurred in the past.<\/li>\n<li>Plan capacity proactively by identifying long-term trends for capacity planning and business objectives.<\/li>\n<\/ul>\n<h2><span class=\"ez-toc-section\" id=\"Start_your_Multicloud_Monitoring_Journey\"><\/span>Start your Multicloud Monitoring Journey<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>If you are looking to monitor complex multicloud environments, take a look at <a href=\"https:\/\/www.eginnovations.com\/product\/application-performance-monitoring\">eG Enterprise<\/a>. 
It <a href=\"https:\/\/www.eginnovations.com\/product\/technologies\">supports over 200 technologies<\/a> out of the box and can monitor microservices running in <a href=\"https:\/\/www.eginnovations.com\/supported-technologies\/docker-container-monitoring\">Docker containers<\/a> and <a href=\"https:\/\/www.eginnovations.com\/supported-technologies\/kubernetes-monitoring\">Kubernetes pods<\/a>.<\/p>\n<p>Get full visibility into your multicloud environment with a <a href=\"https:\/\/www.eginnovations.com\/product\/application-performance-monitoring\/free-trial\">30-day free trial<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>How to Troubleshoot Multi-cloud Applications using Modern Observability Multi-cloud and hybrid cloud applications are deployed on multiple cloud vendor platforms, including on-premises private cloud. While these platforms offer tremendous benefits by providing a reliable and scalable platform for fuelling digital transformation, they also add significant monitoring complexity.\u00a0 Site reliability engineers (SRE) need multicloud monitoring visibility 
[&hellip;]<\/p>\n","protected":false},"author":8,"featured_media":27228,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"_lmt_disableupdate":"yes","_lmt_disable":"","footnotes":""},"categories":[369],"tags":[1661,395,405,115,503,781,774,775,783,776,778,780,773,777,779,782],"class_list":["post-26470","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cloud-monitoring","tag-alibaba-cloud","tag-aws","tag-azure","tag-cloud-monitoring","tag-google-cloud","tag-hybrid-cloud-monitoring","tag-microservices","tag-multi-cloud","tag-multi-cloud-application","tag-multi-cloud-applications","tag-multi-cloud-monitoring","tag-multi-cloud-observability","tag-multicloud","tag-multicloud-monitoring","tag-multicloud-observability","tag-troubleshooting-multicloud"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Monitoring and Troubleshooting Multi-cloud Infrastructures<\/title>\n<meta name=\"description\" content=\"Learn how to achieve multi-cloud monitoring from a single pane of glass. Find out how to get to the root-cause in 3 clicks.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.eginnovations.com\/blog\/monitoring-multicloud-infrastructures\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Monitoring and Troubleshooting Multicloud Infrastructures\" \/>\n<meta property=\"og:description\" content=\"Learn how to achieve multicloud monitoring from a single pane of glass. 
Find out how to get to the root-cause in 3 clicks.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.eginnovations.com\/blog\/monitoring-multicloud-infrastructures\/\" \/>\n<meta property=\"og:site_name\" content=\"eG Innovations\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/eGInnovations\" \/>\n<meta property=\"article:published_time\" content=\"2022-10-07T09:55:25+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2022-11-09T09:21:12+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/10\/Multicloud-Social-Banner-Image-1.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Arun Aravamudhan\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:title\" content=\"Monitoring and Troubleshooting Multicloud Infrastructures\" \/>\n<meta name=\"twitter:description\" content=\"Learn how to achieve multicloud monitoring from a single pane of glass. Find out how to get to the root-cause in 3 clicks.\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/10\/Multicloud-Social-Banner-Image-1.jpg\" \/>\n<meta name=\"twitter:creator\" content=\"@https:\/\/x.com\/perfclarity\" \/>\n<meta name=\"twitter:site\" content=\"@eginnovations\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Arun Aravamudhan\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"14 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Monitoring and Troubleshooting Multi-cloud Infrastructures","description":"Learn how to achieve multi-cloud monitoring from a single pane of glass. 
Find out how to get to the root-cause in 3 clicks.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.eginnovations.com\/blog\/monitoring-multicloud-infrastructures\/","og_locale":"en_US","og_type":"article","og_title":"Monitoring and Troubleshooting Multicloud Infrastructures","og_description":"Learn how to achieve multicloud monitoring from a single pane of glass. Find out how to get to the root-cause in 3 clicks.","og_url":"https:\/\/www.eginnovations.com\/blog\/monitoring-multicloud-infrastructures\/","og_site_name":"eG Innovations","article_publisher":"https:\/\/www.facebook.com\/eGInnovations","article_published_time":"2022-10-07T09:55:25+00:00","article_modified_time":"2022-11-09T09:21:12+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/10\/Multicloud-Social-Banner-Image-1.jpg","type":"image\/jpeg"}],"author":"Arun Aravamudhan","twitter_card":"summary_large_image","twitter_title":"Monitoring and Troubleshooting Multicloud Infrastructures","twitter_description":"Learn how to achieve multicloud monitoring from a single pane of glass. Find out how to get to the root-cause in 3 clicks.","twitter_image":"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/10\/Multicloud-Social-Banner-Image-1.jpg","twitter_creator":"@https:\/\/x.com\/perfclarity","twitter_site":"@eginnovations","twitter_misc":{"Written by":"Arun Aravamudhan","Est. 
reading time":"14 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.eginnovations.com\/blog\/monitoring-multicloud-infrastructures\/#article","isPartOf":{"@id":"https:\/\/www.eginnovations.com\/blog\/monitoring-multicloud-infrastructures\/"},"author":{"name":"Arun Aravamudhan","@id":"https:\/\/www.eginnovations.com\/blog\/#\/schema\/person\/d788cb81df96a940429c3f5a3b294a6a"},"headline":"How to Monitor and Troubleshoot Multi-cloud Applications","datePublished":"2022-10-07T09:55:25+00:00","dateModified":"2022-11-09T09:21:12+00:00","mainEntityOfPage":{"@id":"https:\/\/www.eginnovations.com\/blog\/monitoring-multicloud-infrastructures\/"},"wordCount":2468,"publisher":{"@id":"https:\/\/www.eginnovations.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.eginnovations.com\/blog\/monitoring-multicloud-infrastructures\/#primaryimage"},"thumbnailUrl":"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/10\/Multicloud-Thumbnail-1.webp","keywords":["Alibaba cloud","AWS","Azure","Cloud Monitoring","Google Cloud","Hybrid cloud monitoring","Microservices","multi-cloud","multi-cloud application","multi-cloud applications","Multi-cloud monitoring","multi-cloud observability","Multicloud","Multicloud monitoring","multicloud observability","Troubleshooting multicloud"],"articleSection":["Cloud Monitoring"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.eginnovations.com\/blog\/monitoring-multicloud-infrastructures\/","url":"https:\/\/www.eginnovations.com\/blog\/monitoring-multicloud-infrastructures\/","name":"Monitoring and Troubleshooting Multi-cloud 
Infrastructures","isPartOf":{"@id":"https:\/\/www.eginnovations.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.eginnovations.com\/blog\/monitoring-multicloud-infrastructures\/#primaryimage"},"image":{"@id":"https:\/\/www.eginnovations.com\/blog\/monitoring-multicloud-infrastructures\/#primaryimage"},"thumbnailUrl":"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/10\/Multicloud-Thumbnail-1.webp","datePublished":"2022-10-07T09:55:25+00:00","dateModified":"2022-11-09T09:21:12+00:00","description":"Learn how to achieve multi-cloud monitoring from a single pane of glass. Find out how to get to the root-cause in 3 clicks.","breadcrumb":{"@id":"https:\/\/www.eginnovations.com\/blog\/monitoring-multicloud-infrastructures\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.eginnovations.com\/blog\/monitoring-multicloud-infrastructures\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.eginnovations.com\/blog\/monitoring-multicloud-infrastructures\/#primaryimage","url":"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/10\/Multicloud-Thumbnail-1.webp","contentUrl":"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2022\/10\/Multicloud-Thumbnail-1.webp","width":362,"height":235},{"@type":"BreadcrumbList","@id":"https:\/\/www.eginnovations.com\/blog\/monitoring-multicloud-infrastructures\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.eginnovations.com\/blog\/"},{"@type":"ListItem","position":2,"name":"How to Monitor and Troubleshoot Multi-cloud Applications"}]},{"@type":"WebSite","@id":"https:\/\/www.eginnovations.com\/blog\/#website","url":"https:\/\/www.eginnovations.com\/blog\/","name":"eG Innovations","description":"IT Performance Monitoring 
Insights","publisher":{"@id":"https:\/\/www.eginnovations.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.eginnovations.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.eginnovations.com\/blog\/#organization","name":"eG Innovations","alternateName":"eg innovations","url":"https:\/\/www.eginnovations.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.eginnovations.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2014\/07\/eg-logo-dark-gray1_new.jpg","contentUrl":"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2014\/07\/eg-logo-dark-gray1_new.jpg","width":362,"height":235,"caption":"eG Innovations"},"image":{"@id":"https:\/\/www.eginnovations.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/eGInnovations","https:\/\/x.com\/eginnovations"]},{"@type":"Person","@id":"https:\/\/www.eginnovations.com\/blog\/#\/schema\/person\/d788cb81df96a940429c3f5a3b294a6a","name":"Arun Aravamudhan","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.eginnovations.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/7ff42334d908fb4060880a4487331e4a?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/7ff42334d908fb4060880a4487331e4a?s=96&d=mm&r=g","caption":"Arun Aravamudhan"},"sameAs":["https:\/\/www.linkedin.com\/in\/arun-aravamudhan\/","https:\/\/x.com\/https:\/\/x.com\/perfclarity"],"url":"https:\/\/www.eginnovations.com\/blog\/author\/arun-aravamudhan\/"}]}},"modified_by":"eG 
Innovations","_links":{"self":[{"href":"https:\/\/www.eginnovations.com\/blog\/wp-json\/wp\/v2\/posts\/26470","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.eginnovations.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.eginnovations.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.eginnovations.com\/blog\/wp-json\/wp\/v2\/users\/8"}],"replies":[{"embeddable":true,"href":"https:\/\/www.eginnovations.com\/blog\/wp-json\/wp\/v2\/comments?post=26470"}],"version-history":[{"count":0,"href":"https:\/\/www.eginnovations.com\/blog\/wp-json\/wp\/v2\/posts\/26470\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.eginnovations.com\/blog\/wp-json\/wp\/v2\/media\/27228"}],"wp:attachment":[{"href":"https:\/\/www.eginnovations.com\/blog\/wp-json\/wp\/v2\/media?parent=26470"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.eginnovations.com\/blog\/wp-json\/wp\/v2\/categories?post=26470"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.eginnovations.com\/blog\/wp-json\/wp\/v2\/tags?post=26470"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}