{"id":38982,"date":"2026-03-03T06:40:41","date_gmt":"2026-03-03T11:40:41","guid":{"rendered":"https:\/\/www.eginnovations.com\/blog\/?p=38982"},"modified":"2026-03-12T06:45:13","modified_gmt":"2026-03-12T10:45:13","slug":"cloud-application-slowness-when-every-team-says-its-not-my-problem","status":"publish","type":"post","link":"https:\/\/www.eginnovations.com\/blog\/cloud-application-slowness-when-every-team-says-its-not-my-problem\/","title":{"rendered":"Cloud Application Slowness: When Every Team Says &#8216;It&#8217;s Not My Problem&#8217;"},"content":{"rendered":"<div class=\"inner_content\">\n<div style=\"padding: 20px; border: 1px solid #ffd392; background: #fcf8ef; text-align: justify; margin-bottom: 20px;\">\n<h2 style=\"margin-top: 10px !important;\"><span class=\"ez-toc-section\" id=\"When_All_Dashboards_Report_Green_During_a_Production_Outage\"><\/span>When All Dashboards Report Green During a Production Outage<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"400\" height=\"291\" class=\"alignright size-full wp-image-39097\" style=\"width: 400px;\" src=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/503-error-find.png\" alt=\"\" srcset=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/503-error-find.png 750w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/503-error-find-300x218.png 300w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/503-error-find-310x226.png 310w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/503-error-find-140x102.png 140w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><\/p>\n<p>A retail ERP system underwent a vertical scaling operation to support growth from 3,000 to 10,000 stores on AWS. Immediately following the cutover, users experienced widespread HTTP 503 (&#8220;Service Unavailable&#8221;) errors and checkout failures. 
Yet, standard performance dashboards indicated a healthy environment.<\/p>\n<p style=\"margin-bottom: 15px!important;\">During the incident response, each team reviewed their respective telemetry, which indicated normal operation:<\/p>\n<ul>\n<li><strong>Database Team:<\/strong> &#8220;Query latency is flat at sub-millisecond levels. The database is executing requests instantly.&#8221;<\/li>\n<li><strong>Application Team:<\/strong> &#8220;JVM threads are in a WAIT state on <code>sun.nio.ch.SocketDispatcher.read<\/code>. The code is blocked, waiting for database responses.&#8221;<\/li>\n<li><strong>Infrastructure Team:<\/strong> &#8220;CPU is at 9%, storage IOPS is at 8%, and bandwidth is within SLA. We have substantial headroom.&#8221;<\/li>\n<\/ul>\n<p>While component-level metrics appeared healthy, system-wide transactions were failing.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Case_Study_Non-Linear_Failure_at_3X_Scale\"><\/span>Case Study: Non-Linear Failure at 3X Scale<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>To understand why this happens, we have to look outside standard telemetry. This article breaks down a real production incident where the root cause was an invisible bottleneck: the EC2 instance had hit a hard packets-per-second (PPS) ceiling, not a bandwidth limit.<\/p>\n<p>The system looked perfectly healthy at 9% CPU and under 10% storage IOPS. It wasn&#8217;t; it was silently discarding traffic. 
TCP retransmissions had climbed past 20% at peak (with spikes to 50%), database insert latency jumped from 1ms to 150ms, and connection time to the SQL service ballooned to 3 seconds.<\/p>\n<p>The standard monitoring stack saw none of it.<\/p>\n<p>This postmortem documents how cross-layer correlation\u2014specifically overlaying synthetic connection probes, network stack metrics, and application thread states on a single timeline\u2014exposed what siloed monitoring missed, and exactly what SRE teams must instrument to catch it early.<\/p>\n<p>(Note: This article summarizes a 15-page forensic postmortem. <a class=\"link\" href=\"https:\/\/www.eginnovations.com\/white-paper\/beyond-cloud-monitoring-eight-lessons-for-delivering-high-performing-cloud-applications\">Download the full technical case study (PDF)<\/a> for the complete timeline, configuration diffs, and TCP tuning parameters.)<\/p>\n<\/div>\n<h2><span class=\"ez-toc-section\" id=\"Multiple_Blind_Spots\"><\/span>Multiple Blind Spots<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>An outage where every component metric is green while users are seeing 503s exposes an operational blind spot. Standard monitoring tools are built to answer &#8216;Is it up?&#8217; and &#8216;Is it busy?&#8217;\u2014they aren&#8217;t built to answer &#8216;Is the packet flow healthy?&#8217;<\/p>\n<p style=\"margin-bottom: 15px!important;\">This postmortem breaks down four specific blind spots that hid the root cause from the operations team:<\/p>\n<ol>\n<li>\n<p style=\"margin-bottom: 5px;\"><b>Utilization vs. Saturation: <\/b>The infrastructure team saw 9% CPU utilization, yet the system was silently dropping &gt;20% of packets. The CPU wasn&#8217;t busy, but the kernel queue was full. Standard tools missed this because they don&#8217;t correlate transport-layer metrics with resource utilization.<\/p>\n<\/li>\n<li>\n<p style=\"margin-bottom: 5px;\"><b>PPS Limits vs. 
Bandwidth Limits:\u00a0<\/b>An instance can hit a packet processing limit while overall bandwidth remains well within SLA. Cloud provider health checks reported \u201cHealthy\u201d because the bandwidth pipe wasn\u2019t full, even though the underlying network interface couldn&#8217;t serialize the TCP handshakes fast enough.<\/p>\n<\/li>\n<li>\n<p style=\"margin-bottom: 5px;\"><b>Breaking the \u201cGreen Dashboard\u201d Deadlock:\u00a0<\/b>When every siloed team has a clean dashboard, you need a unified timeline. Proving this was a transport issue (and not a slow database) required overlaying application thread states with network counters.<\/p>\n<\/li>\n<li>\n<p style=\"margin-bottom: 5px;\"><b>The Managed-Cloud Responsibility Myth:\u00a0<\/b>The cloud provider guarantees infrastructure availability, but the configuration of the data plane (connection lifecycles, packet-flow behavior, and OS-level networking) remains entirely the domain of the operations team.<\/p>\n<\/li>\n<\/ol>\n<h2><span class=\"ez-toc-section\" id=\"The_Scale-Up_Context\"><\/span>The Scale-Up Context<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>This outage occurred after a strategic acquisition required the ERP system to scale up. To support the load, Engineering executed a standard vertical scale-up: EC2 instances were upgraded to 32 vCPU general-purpose families (m5.8xlarge), and RDS was migrated to SQL Server Standard Edition.<\/p>\n<p style=\"margin-bottom: 15px;\">Immediately post-cutover, inventory updates began failing with timeouts. Yet, as the war room participants insisted, the standard telemetry backed up their claims of a healthy environment:<\/p>\n<ul>\n<li><strong>Database CPU:<\/strong> 9% average (Peak 17%)<\/li>\n<li><strong>IOPS:<\/strong> 8% average<\/li>\n<li><strong>Query Execution:<\/strong> &lt;400ms.<\/li>\n<li><strong>JVM Threads:<\/strong> Saturated at 1,500 (Max Pool). 
Dominant thread state: WAIT.<\/li>\n<li><strong>Infrastructure:<\/strong> Memory allocations normal, Bandwidth within SLA.<\/li>\n<\/ul>\n<h2><span class=\"ez-toc-section\" id=\"In_the_Public_Cloud_the_Physical_Layer_is_an_Opaque_Abstraction\"><\/span>In the Public Cloud, the Physical Layer is an Opaque Abstraction<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>In a traditional data center, ownership is clear. If a switch port is saturated, the Network team logs into the device and fixes it. In the cloud, the network is an opaque abstraction where the provider owns the physical wire, while the operations team owns only the logical configuration and data plane.<\/p>\n<p>When latency spikes without explicit errors, no one sees a red light on \u201cthe network.\u201d Each team falls back to the boundaries of its own dashboards. The application server, OS\/kernel, and database all looked healthy in isolation\u2014even as packets were being dropped in the middle.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-39031\" src=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/Investigation-layer.jpg\" alt=\"\" width=\"750\" height=\"345\" srcset=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/Investigation-layer.jpg 750w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/Investigation-layer-300x138.jpg 300w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/Investigation-layer-310x143.jpg 310w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/Investigation-layer-140x64.jpg 140w\" sizes=\"auto, (max-width: 750px) 100vw, 750px\" \/><\/p>\n<p>In this incident, every team reported healthy metrics (ALB, App\/EC2, RDS) while packets were dropped in the invisible layer between them. 
To understand why the root cause remained invisible for so long, it is worth examining exactly how these blind spots manifested for each team.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"The_Database_Administrators_Perspective_My_Engine_is_Fast\"><\/span>The Database Administrator\u2019s Perspective: \u201cMy Engine is Fast\u201d<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The DBA focused on the golden metric of their domain: <strong>Query Execution Time<\/strong>. This measures the milliseconds between the database receiving a query and finishing it.<\/p>\n<p>As the performance data showed, this metric remained flat at a steady baseline (just 31 ms) throughout the outage. The DBA\u2019s conclusion was logical:<em> &#8220;The database is processing requests instantly. The problem is upstream.&#8221;<\/em><\/p>\n<p style=\"margin-bottom: 15px; font-size: 20px;\"><strong>Why the Discrepancy?<\/strong><\/p>\n<p>Standard database performance tools only measure the &#8220;tip&#8221; of the transaction. 
As illustrated in the iceberg analogy below, the bulk of the latency (~3,000ms) was hidden beneath the surface in the transport layer\u2014consumed by SYN\/ACK retries, packet drops, and kernel queue waits\u2014entirely invisible to standard SQL monitoring.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-39053\" src=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/query-31s-execution.jpg\" alt=\"\" width=\"750\" height=\"500\" srcset=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/query-31s-execution.jpg 750w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/query-31s-execution-300x200.jpg 300w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/query-31s-execution-310x207.jpg 310w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/query-31s-execution-140x93.jpg 140w\" sizes=\"auto, (max-width: 750px) 100vw, 750px\" \/><\/p>\n<ul>\n<li><strong>The Flaw:<\/strong> Their dashboard was scientifically accurate but practically blind. It measured processing time (just 31 ms) but missed the 3-second delay requests spent in TCP connection establishment.<\/li>\n<\/ul>\n<h2 style=\"margin-bottom: 15px; font-size: 20px;\"><span class=\"ez-toc-section\" id=\"The_Developers_Perspective_My_Code_is_Waiting\"><\/span><strong>The Developer&#8217;s Perspective: &#8220;My Code is Waiting&#8221;<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The Application Developers analyzed JVM thread dumps. They found hundreds of threads in a WAIT state (specifically blocked on <code>sun.nio.ch.SocketDispatcher.read<\/code>).<\/p>\n<ul>\n<li>\n<p style=\"margin-bottom: 15px;\"><strong>The Developer&#8217;s Conclusion:<\/strong> &#8220;The app is blocked waiting on the database. 
The code isn&#8217;t churning CPU or looping; it&#8217;s waiting for a socket response.&#8221;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-39024\" src=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/application-thread-report.jpg\" alt=\"\" width=\"750\" height=\"447\" srcset=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/application-thread-report.jpg 750w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/application-thread-report-300x179.jpg 300w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/application-thread-report-310x185.jpg 310w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/application-thread-report-140x83.jpg 140w\" sizes=\"auto, (max-width: 750px) 100vw, 750px\" \/><\/p>\n<p>The application thread reports it is waiting, which developers often mistake for a slow database. In reality, that time is being consumed by the OS Kernel retrying dropped packets. The actual database query is a tiny fraction of the total delay.<\/p>\n<\/li>\n<li><strong>The Flaw:<\/strong> To a Java developer, a WAIT state is an exoneration. It proves the code isn&#8217;t the bottleneck. However, without visibility into the TCP stack, they couldn&#8217;t distinguish between a slow database (processing delay) and a slow network (travel delay). They assumed the former because that is the standard interpretation of WAIT.<\/li>\n<\/ul>\n<h2 style=\"margin-bottom: 15px; font-size: 20px;\"><span class=\"ez-toc-section\" id=\"The_SysAdmins_Perspective_The_Hardware_is_Idle\"><\/span><strong>The SysAdmin&#8217;s Perspective: &#8220;The Hardware is Idle&#8221;<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p style=\"margin-bottom: 15px;\">The System Administrator monitored the EC2 fleet. 
The signals were overwhelmingly positive: the m5 instances had massive vCPU headroom, storage IOPS averaged just 8%, and there were zero OS-level alarms.<\/p>\n<ul>\n<li><strong>The SysAdmin&#8217;s Conclusion:<\/strong> &#8220;Infrastructure health is green. We have plenty of capacity.&#8221;<\/li>\n<li><strong>The Flaw:<\/strong> They tracked Utilization (busy time) but missed Saturation (queue depth). The NIC was silently dropping packets due to the instance hitting its Packets-Per-Second (PPS) ceiling, not bandwidth.<\/li>\n<\/ul>\n<div style=\"padding: 20px; border: 1px solid #ffd392; background: #fcf8ef; text-align: justify; margin-bottom: 20px;\">\n<p style=\"margin-bottom: 15px; font-size: 20px;\"><strong>The Fallacy of the Idle CPU<\/strong><\/p>\n<p style=\"margin-bottom: 15px;\">We are trained to equate CPU % with Work. If the CPU is 90%, the server is busy; if it&#8217;s 10%, it&#8217;s available.<\/p>\n<p style=\"margin-bottom: 15px;\">But in distributed systems, &#8220;Idle&#8221; is ambiguous. It can mean:<\/p>\n<ol style=\"margin-bottom: 15px;\">\n<li>True Idleness: The system has zero pending tasks.<\/li>\n<li>Starvation: The system has pending tasks but is blocked on I\/O.<\/li>\n<\/ol>\n<p style=\"margin-bottom: 5px;\">In this incident, the CPU was <strong>starved<\/strong>. The packet processing queue was saturated, preventing requests from crossing the user\/kernel boundary to reach the application. This demonstrates why CPU utilization is a flawed proxy for availability: <strong>A low-utilization CPU is often a symptom of high-saturation I\/O<\/strong>.<\/p>\n<\/div>\n<h2><span class=\"ez-toc-section\" id=\"Architecture_Bottlenecks_are_Silent\"><\/span>Architecture Bottlenecks are Silent<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>In this incident, the bottleneck lived in the transport layer, not the application logic. 
The application server was attempting to serialize thousands of concurrent TCP handshakes on a single network interface, overwhelming the instance\u2019s packets-per-second (PPS) limit. It was a packet-rate bottleneck, not a bandwidth bottleneck.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-39025\" src=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/Architecture-Bottlenecks.jpg\" alt=\"\" width=\"750\" height=\"500\" srcset=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/Architecture-Bottlenecks.jpg 750w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/Architecture-Bottlenecks-300x200.jpg 300w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/Architecture-Bottlenecks-310x207.jpg 310w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/Architecture-Bottlenecks-140x93.jpg 140w\" sizes=\"auto, (max-width: 750px) 100vw, 750px\" \/><\/p>\n<p>The graphic above illustrates this: a wide road (10Gbps bandwidth available) with a narrow gate (PPS limit). The server could handle the total volume, but not the rate of small packets.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"The_Non-Linear_Failure_Pattern\"><\/span>The Non-Linear Failure Pattern<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p style=\"margin-bottom: 15px;\">This created a classic non-linear failure mode.<\/p>\n<ul>\n<li><strong>Linear Phase (0\u20133k Stores):<\/strong> Performance was flat and stable.<\/li>\n<li><strong>The Saturation Point:<\/strong> As soon as the load crossed the concurrency threshold, we hit the &#8220;knee&#8221; of the curve. Latency didn&#8217;t just drift; it went vertical.<\/li>\n<\/ul>\n<p>Standard metrics (CPU\/IOPS, basic health) stayed deceptively normal. 
The failure only became obvious once the team correlated synthetic connection time with TCP retransmissions and JVM thread states across the same time window.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"The_Cloud_Responsibility_Gap\"><\/span>The Cloud Responsibility Gap<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>There is a pervasive myth that running on managed infrastructure outsources performance risk. This incident demonstrated the risk of that assumption.<\/p>\n<p>When the team escalated the issue with time-correlated graphs, synthetic test results, and tcping data, the cloud provider&#8217;s official response was:<em> &#8220;Everything is fine from our end.&#8221;<\/em><\/p>\n<p>Cloud providers ensure the health of their underlying infrastructure. However, application performance and connection-layer behavior remain the customer&#8217;s responsibility. Under the shared responsibility model, ensuring that the underlying TCP stack and network parameters are tuned to handle the required transactional load <em>falls entirely on the operations team<\/em>.<\/p>\n<p>The compute and storage resources were functioning normally. The bottleneck was network packet processing within the EC2 instance itself. It was simply mismatched to the packet rate being pushed through it. This mismatch stayed invisible without transport-layer visibility.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"The_Observability_Blind_Spot\"><\/span>The Observability Blind Spot<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p style=\"margin-bottom: 15px!important;\">Standard CloudWatch is strong on instance health and resource metrics, but it\u2019s weak on the transport-level symptoms that explain connection quality and packet flow. 
In this incident, the decisive signals lived at a layer you typically don\u2019t get from basic instance dashboards:<\/p>\n<ul>\n<li><strong>TCP retransmission rates:<\/strong> A strong indicator of packet loss and congestion.<\/li>\n<li><strong>TCP handshake latency:<\/strong> Time to establish a new connection (SYN \u2192 ACK).<\/li>\n<li><strong>Network Adapter Buffer Exhaustion:<\/strong> Drops occurring when instances hit packet-per-second (PPS) limits or exhaust transmit\/receive buffers.<\/li>\n<\/ul>\n<p>Even when upgrading to enhanced networking like AWS ENA Express, critical visibility gaps remain in standard cloud dashboards. TCP handshake latency is simply not exposed as a native instance metric. Low-level counters for packet drops or OS-level socket exhaustion are often cumulative or buried in driver-level tools, making them reactive rather than easily alertable.<\/p>\n<p>These transport-level metrics\u2014not CPU or bandwidth\u2014are what reveal network processing bottlenecks.<em> (Recommended Alert: TCP Retransmits rising above a near-zero baseline, or anomalous spikes in database connection time).<\/em><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Initial_Tuning_Attempts_And_Why_They_Failed\"><\/span>Initial Tuning Attempts (And Why They Failed)<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p style=\"margin-bottom: 15px!important;\">Before the team proved the issue was transport-layer latency, they worked through the standard optimizations\u2014driver tuning, connection pooling changes, and database-side adjustments\u2014because early symptoms looked like a classic app\/DB bottleneck. 
They toggled driver behaviors (including TcpNoDelay and packet sizing), tried different JDBC drivers (jTDS vs Microsoft), increased the initial connection pool to reduce handshake frequency, and even reduced SQL Server\u2019s memory allocation to free resources for the OS\/TCP stack.<\/p>\n<p style=\"margin-bottom: 15px!important;\"><em>None of these moved the needle on the key symptom:<\/em> connection establishment time remained erratic and high. That \u201cfailure to improve\u201d became a critical data point\u2014it narrowed the root cause away from application\/database configuration and toward the network transport path and packet processing behavior between the tiers.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"The_Breakthrough_Unified_and_Correlated_Monitoring\"><\/span>The Breakthrough: Unified and Correlated Monitoring<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p style=\"margin-bottom: 15px;\">To bypass the siloed views, the team used eG Enterprise for unified observability. Instead of relying on passive infrastructure metrics, it executed an active validation and correlation strategy:<\/p>\n<ol>\n<li>\n<p style=\"margin-bottom: 10px;\"><strong>Synthetic Validation:<\/strong> Periodically initiated real database connections from the EC2 tier to measure round-trip time. 
This revealed a critical discrepancy:<\/p>\n<ul style=\"margin-bottom: 15px;\">\n<li><strong>Connection Time:<\/strong> Spiked to over 3 seconds during peak periods.<\/li>\n<li><strong>Query Execution Time:<\/strong> Remained flat at a baseline of 0.4 seconds.<\/li>\n<li><strong>Conclusion:<\/strong> This mathematically isolated the latency to the pre-execution phase: the delay was occurring in the network handshake.<\/li>\n<\/ul>\n<\/li>\n<li>\n<p style=\"margin-bottom: 10px;\"><strong>Cross-Layer Correlation:<\/strong> By overlaying metrics from the application, network, and database on a single timeline, the pattern became undeniable:<\/p>\n<ul style=\"margin-bottom: 15px;\">\n<li><strong>TCP Retransmits:<\/strong> Spiked from near zero to over 20% at peak load, climbing as high as 50% of total packets sent in some intervals.<\/li>\n<li><strong>Database Connection Time:<\/strong> Jumped to 3 seconds while Query Execution stayed flat.<\/li>\n<li><strong>JVM Threads:<\/strong> Hit 1,500 (saturated) while SQL CPU remained at 9% (idle).<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<p>The root cause was not visible in any single metric; it emerged only through correlation. The database was performant. The network was dropping packets. The system was throttled by TCP handshake serialization, not compute capacity.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"The_Fix_Architectural_Tuning_Not_Code_Refactoring\"><\/span>The Fix: Architectural Tuning, Not Code Refactoring<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p style=\"margin-bottom: 15px;\">Once the transport layer was identified as the bottleneck, the solution was architectural, requiring zero changes to the application logic:<\/p>\n<ol>\n<li><strong>Enabled HTTP Keep-Alives:<\/strong> Reduced TCP handshake volume by allowing persistent connections between the Application Load Balancer (ALB) and the Tomcat tier.<\/li>\n<li><strong>Upgraded Instance Class:<\/strong> Migrated from m5.8xlarge to m6in.8xlarge. 
This retained identical CPU and memory capacity, but unlocked AWS ENA Express (SRD technology) for accelerated packet processing and reduced jitter.<\/li>\n<li><strong>Tuned the OS\/Network Stack:<\/strong> Disabled RSC and ECN; expanded the ephemeral port range; increased free TCBs (Transmission Control Blocks); and heavily enlarged the receive\/transmit buffers on the EC2 adapter. This allowed the system to absorb high-concurrency bursts without dropping frames.<\/li>\n<\/ol>\n<p>The full configuration parameters, registry keys, and Tomcat connector settings for each of these changes are documented in the <a href=\"https:\/\/www.eginnovations.com\/white-paper\/beyond-cloud-monitoring-eight-lessons-for-delivering-high-performing-cloud-applications\">complete case study PDF<\/a>.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Measure_the_Flow_Not_Just_the_Capacity\"><\/span>Measure the Flow, Not Just the Capacity<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Scaling goes beyond vertical provisioning. It requires understanding how architectural limits manifest under increased load. A system that works at 3,000 users can fail non-linearly at 10,000. This happens not because of compute exhaustion, but because of transport saturation.<\/p>\n<p>To detect these failures, move from measuring <strong>resource consumption<\/strong> (CPU, Memory) to measuring <strong>flow quality<\/strong> (Connection Time, Retransmission Rate, Buffer Exhaustion). 
In cloud environments, the transport layer is often the first place scale breaks, yet it is the last place teams instrument.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Download_the_Full_Case_Study\"><\/span>Download the Full Case Study<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p style=\"margin-bottom: 15px;\">We have documented the complete forensic analysis of this incident in our technical white paper, including:<\/p>\n<ul>\n<li>The exact correlation that isolated the root cause.<\/li>\n<li>Step-by-step configuration changes for TCP tuning, Keep-Alives, and ENA Express.<\/li>\n<li>Eight architectural principles for scaling cloud applications without hitting non-linear failure curves.<\/li>\n<\/ul>\n<p><a href=\"https:\/\/www.eginnovations.com\/white-paper\/beyond-cloud-monitoring-eight-lessons-for-delivering-high-performing-cloud-applications\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-39056\" src=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/beyond-cloud-monitoring-whitepaper.jpg\" alt=\"\" width=\"850\" height=\"180\" srcset=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/beyond-cloud-monitoring-whitepaper.jpg 850w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/beyond-cloud-monitoring-whitepaper-300x64.jpg 300w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/beyond-cloud-monitoring-whitepaper-768x163.jpg 768w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/beyond-cloud-monitoring-whitepaper-800x169.jpg 800w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/beyond-cloud-monitoring-whitepaper-310x66.jpg 310w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/beyond-cloud-monitoring-whitepaper-140x30.jpg 140w\" sizes=\"auto, (max-width: 850px) 100vw, 850px\" \/><\/a><\/p>\n<h2><span class=\"ez-toc-section\" 
id=\"Stop_Debugging_with_Green_Dashboards\"><\/span>Stop Debugging with Green Dashboards<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>A &#8220;Healthy&#8221; status from your cloud provider means the infrastructure is running as designed. It does not mean your transactions are completing on time.<\/p>\n<p>This incident proved that a system can be simultaneously healthy by every dashboard metric and broken from the user&#8217;s perspective. The gap lives in the transport layer\u2014a layer no single team owns, and the last layer anyone instruments.<\/p>\n<p>Conventional monitoring answers &#8216;Is it up?&#8217;, whereas unified monitoring answers &#8216;Why is it slow?&#8217;. eG Enterprise correlates infrastructure signals (TCP retransmits), application context (thread states), and database behavior (connection time vs. query time) on a single timeline \u2014 so the next time every dashboard is green and users are seeing 503s, you have a path to root cause in minutes, not days.<\/p>\n<p>Break the &#8216;Not My Problem&#8217; loop.<\/p>\n<p><a href=\"https:\/\/www.eginnovations.com\/it-monitoring\/free-trial\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-39058\" src=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/start-trial-banner.jpg\" alt=\"\" width=\"850\" height=\"180\" srcset=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/start-trial-banner.jpg 850w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/start-trial-banner-300x64.jpg 300w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/start-trial-banner-768x163.jpg 768w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/start-trial-banner-800x169.jpg 800w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/start-trial-banner-310x66.jpg 310w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/start-trial-banner-140x30.jpg 
140w\" sizes=\"auto, (max-width: 850px) 100vw, 850px\" \/><\/a><\/p>\n<div class=\"containers mb-4\" style=\"clear:both\">\n \t<div class=\"fixed-free-trial-div mb-3\" id=\"fixedsectioninfo_blog_btn\">\n \t\n \t<style>.containers_hide_row,.all_blogs_bottom{\n \tdisplay:none;\n   \n}\t<\/style>\n                <div class=\"box-style container row pt-4 pb-4  animatedParent animateOnce\" data-sequence=\"100\" style=\"border-bottom: 1px solid #ddd;border-top: 1px solid #ddd;background: #4b4b4b;padding: 15px 15px 0 15px;border-radius: 12px;\">\n                \n                <div class=\"text-center animated fadeIn go\"> \n                <p class=\"text-center mb-4\" style=\"    color: #fff;\">\n\neG Enterprise is an Observability solution for Modern IT. Monitor digital workspaces, <br\/>web applications, SaaS services, cloud and containers from a single pane of glass.\n<\/p>\n                <\/div>\n                    <div class=\"text-center pb-1 animated fadeIn go\" data-id=\"8\">\n                        <a class=\"border-btnhead-eg\"  href=\"https:\/\/www.eginnovations.com\/it-monitoring\/free-trial\"> <span style=\"font-family: GraphikMedium!important;color: #fff;\">Free Trial<\/span><\/a>\n                        <a href=\"https:\/\/www.eginnovations.com\/product\/cloud-monitoring\" class=\" border-btnhead-eg\" style=\"width:230px;   \"> <svg width=\"24\" height=\"24\" style=\"margin-top:-3px\" version=\"1.1\" id=\"Layer_1\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" x=\"0px\" y=\"0px\"\n\t viewBox=\"0 0 26.5 26.5\" style=\"enable-background:new 0 0 26.5 26.5;\" xml:space=\"preserve\">\n<style type=\"text\/css\">\n\t.st2{fill:#fff !important;stroke:#fff !important;stroke-miterlimit:10;}\n\t\n\t\t.border-btnhead:hover .st2 {\n  fill: #ffffff !important;\n  stroke: #ffffff;\n}\n<\/style>\n<g>\n\t<g>\n\t\t<path class=\"st2\" 
d=\"M13.3,25.8c-6.9,0-12.5-5.6-12.5-12.5S6.4,0.8,13.3,0.8s12.5,5.6,12.5,12.5S20.2,25.8,13.3,25.8z M13.3,1.8\n\t\t\tC6.9,1.8,1.8,6.9,1.8,13.3S7,24.8,13.3,24.8s11.5-5.2,11.5-11.5S19.6,1.8,13.3,1.8z M11.2,18.1c-0.2,0-0.4-0.1-0.6-0.2\n\t\t\tc-0.3-0.2-0.6-0.6-0.6-1V9.7c0-0.4,0.2-0.8,0.6-1c0.3-0.2,0.8-0.2,1.2,0l6.2,3.6c0.3,0.2,0.6,0.6,0.6,1s-0.2,0.8-0.6,1l-6.2,3.6\n\t\t\tC11.6,18,11.4,18.1,11.2,18.1z\"\/>\n\t<\/g>\n<\/g>\n<\/svg> <span style=\"font-family: GraphikMedium!important;color: #fff;\">&nbsp;See the platform<\/span><\/a>\n                    <\/div>\n                <\/div>\n                \n                 <\/div>\n            <\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>When All Dashboards Report Green During a Production Outage A retail ERP system underwent a vertical scaling operation to support growth from 3,000 to 10,000 stores on AWS. Immediately following the cutover, users experienced widespread HTTP 503 (&#8220;Service Unavailable&#8221;) errors and checkout failures. Yet, standard performance dashboards indicated a healthy environment. During the incident response, [&hellip;]<\/p>\n","protected":false},"author":8,"featured_media":39020,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"_lmt_disableupdate":"yes","_lmt_disable":"","footnotes":""},"categories":[369],"tags":[],"class_list":["post-38982","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cloud-monitoring"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Cloud Application Slowness: When Every Team Says &#039;It&#039;s Not My Problem&#039; | eG Innovations<\/title>\n<meta name=\"description\" content=\"A major AWS scale-up triggered widespread user errors, yet every siloed dashboard reported perfectly healthy. 
Discover the invisible architectural limit that standard cloud monitoring missed during this massive outage.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.eginnovations.com\/blog\/cloud-application-slowness-when-every-team-says-its-not-my-problem\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Cloud Application Slowness: When Every Team Says &#039;It&#039;s Not My Problem&#039; | eG Innovations\" \/>\n<meta property=\"og:description\" content=\"A major AWS scale-up triggered widespread user errors, yet every siloed dashboard reported perfectly healthy. Discover the invisible architectural limit that standard cloud monitoring missed during this massive outage.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.eginnovations.com\/blog\/cloud-application-slowness-when-every-team-says-its-not-my-problem\/\" \/>\n<meta property=\"og:site_name\" content=\"eG Innovations\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/eGInnovations\" \/>\n<meta property=\"article:published_time\" content=\"2026-03-03T11:40:41+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-03-12T10:45:13+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/Social-Banner-Cloud-Application-Slowness-1.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Arun Aravamudhan\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@perfclarity\" \/>\n<meta name=\"twitter:site\" content=\"@eginnovations\" \/>\n<meta 
name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Arun Aravamudhan\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"13 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Cloud Application Slowness: When Every Team Says 'It's Not My Problem' | eG Innovations","description":"A major AWS scale-up triggered widespread user errors, yet every siloed dashboard reported perfectly healthy. Discover the invisible architectural limit that standard cloud monitoring missed during this massive outage.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.eginnovations.com\/blog\/cloud-application-slowness-when-every-team-says-its-not-my-problem\/","og_locale":"en_US","og_type":"article","og_title":"Cloud Application Slowness: When Every Team Says 'It's Not My Problem' | eG Innovations","og_description":"A major AWS scale-up triggered widespread user errors, yet every siloed dashboard reported perfectly healthy. Discover the invisible architectural limit that standard cloud monitoring missed during this massive outage.","og_url":"https:\/\/www.eginnovations.com\/blog\/cloud-application-slowness-when-every-team-says-its-not-my-problem\/","og_site_name":"eG Innovations","article_publisher":"https:\/\/www.facebook.com\/eGInnovations","article_published_time":"2026-03-03T11:40:41+00:00","article_modified_time":"2026-03-12T10:45:13+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/Social-Banner-Cloud-Application-Slowness-1.png","type":"image\/png"}],"author":"Arun Aravamudhan","twitter_card":"summary_large_image","twitter_creator":"@https:\/\/x.com\/perfclarity","twitter_site":"@eginnovations","twitter_misc":{"Written by":"Arun Aravamudhan","Est. 
reading time":"13 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.eginnovations.com\/blog\/cloud-application-slowness-when-every-team-says-its-not-my-problem\/#article","isPartOf":{"@id":"https:\/\/www.eginnovations.com\/blog\/cloud-application-slowness-when-every-team-says-its-not-my-problem\/"},"author":{"name":"Arun Aravamudhan","@id":"https:\/\/www.eginnovations.com\/blog\/#\/schema\/person\/d788cb81df96a940429c3f5a3b294a6a"},"headline":"Cloud Application Slowness: When Every Team Says &#8216;It&#8217;s Not My Problem&#8217;","datePublished":"2026-03-03T11:40:41+00:00","dateModified":"2026-03-12T10:45:13+00:00","mainEntityOfPage":{"@id":"https:\/\/www.eginnovations.com\/blog\/cloud-application-slowness-when-every-team-says-its-not-my-problem\/"},"wordCount":2469,"publisher":{"@id":"https:\/\/www.eginnovations.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.eginnovations.com\/blog\/cloud-application-slowness-when-every-team-says-its-not-my-problem\/#primaryimage"},"thumbnailUrl":"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/Thumbnail-Banner-Cloud-Application-Slowness.png","articleSection":["Cloud Monitoring"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.eginnovations.com\/blog\/cloud-application-slowness-when-every-team-says-its-not-my-problem\/","url":"https:\/\/www.eginnovations.com\/blog\/cloud-application-slowness-when-every-team-says-its-not-my-problem\/","name":"Cloud Application Slowness: When Every Team Says 'It's Not My Problem' | eG 
Innovations","isPartOf":{"@id":"https:\/\/www.eginnovations.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.eginnovations.com\/blog\/cloud-application-slowness-when-every-team-says-its-not-my-problem\/#primaryimage"},"image":{"@id":"https:\/\/www.eginnovations.com\/blog\/cloud-application-slowness-when-every-team-says-its-not-my-problem\/#primaryimage"},"thumbnailUrl":"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/Thumbnail-Banner-Cloud-Application-Slowness.png","datePublished":"2026-03-03T11:40:41+00:00","dateModified":"2026-03-12T10:45:13+00:00","description":"A major AWS scale-up triggered widespread user errors, yet every siloed dashboard reported perfectly healthy. Discover the invisible architectural limit that standard cloud monitoring missed during this massive outage.","breadcrumb":{"@id":"https:\/\/www.eginnovations.com\/blog\/cloud-application-slowness-when-every-team-says-its-not-my-problem\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.eginnovations.com\/blog\/cloud-application-slowness-when-every-team-says-its-not-my-problem\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.eginnovations.com\/blog\/cloud-application-slowness-when-every-team-says-its-not-my-problem\/#primaryimage","url":"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/Thumbnail-Banner-Cloud-Application-Slowness.png","contentUrl":"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2026\/02\/Thumbnail-Banner-Cloud-Application-Slowness.png","width":362,"height":235},{"@type":"BreadcrumbList","@id":"https:\/\/www.eginnovations.com\/blog\/cloud-application-slowness-when-every-team-says-its-not-my-problem\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.eginnovations.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Cloud Application Slowness: When Every Team Says &#8216;It&#8217;s Not My 
Problem&#8217;"}]},{"@type":"WebSite","@id":"https:\/\/www.eginnovations.com\/blog\/#website","url":"https:\/\/www.eginnovations.com\/blog\/","name":"eG Innovations","description":"IT Performance Monitoring Insights","publisher":{"@id":"https:\/\/www.eginnovations.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.eginnovations.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.eginnovations.com\/blog\/#organization","name":"eG Innovations","alternateName":"eg innovations","url":"https:\/\/www.eginnovations.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.eginnovations.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2014\/07\/eg-logo-dark-gray1_new.jpg","contentUrl":"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2014\/07\/eg-logo-dark-gray1_new.jpg","width":362,"height":235,"caption":"eG Innovations"},"image":{"@id":"https:\/\/www.eginnovations.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/eGInnovations","https:\/\/x.com\/eginnovations"]},{"@type":"Person","@id":"https:\/\/www.eginnovations.com\/blog\/#\/schema\/person\/d788cb81df96a940429c3f5a3b294a6a","name":"Arun Aravamudhan","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.eginnovations.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/7ff42334d908fb4060880a4487331e4a?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/7ff42334d908fb4060880a4487331e4a?s=96&d=mm&r=g","caption":"Arun 
Aravamudhan"},"sameAs":["https:\/\/www.linkedin.com\/in\/arun-aravamudhan\/","https:\/\/x.com\/https:\/\/x.com\/perfclarity"],"url":"https:\/\/www.eginnovations.com\/blog\/author\/arun-aravamudhan\/"}]}},"modified_by":"eG Innovations","_links":{"self":[{"href":"https:\/\/www.eginnovations.com\/blog\/wp-json\/wp\/v2\/posts\/38982","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.eginnovations.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.eginnovations.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.eginnovations.com\/blog\/wp-json\/wp\/v2\/users\/8"}],"replies":[{"embeddable":true,"href":"https:\/\/www.eginnovations.com\/blog\/wp-json\/wp\/v2\/comments?post=38982"}],"version-history":[{"count":2,"href":"https:\/\/www.eginnovations.com\/blog\/wp-json\/wp\/v2\/posts\/38982\/revisions"}],"predecessor-version":[{"id":39231,"href":"https:\/\/www.eginnovations.com\/blog\/wp-json\/wp\/v2\/posts\/38982\/revisions\/39231"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.eginnovations.com\/blog\/wp-json\/wp\/v2\/media\/39020"}],"wp:attachment":[{"href":"https:\/\/www.eginnovations.com\/blog\/wp-json\/wp\/v2\/media?parent=38982"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.eginnovations.com\/blog\/wp-json\/wp\/v2\/categories?post=38982"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.eginnovations.com\/blog\/wp-json\/wp\/v2\/tags?post=38982"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}