{"id":13387,"date":"2020-12-11T08:32:42","date_gmt":"2020-12-11T13:32:42","guid":{"rendered":"https:\/\/www.eginnovations.com\/blog\/?p=13387"},"modified":"2022-08-23T00:15:34","modified_gmt":"2022-08-23T04:15:34","slug":"what-is-chaos-engineering","status":"publish","type":"post","link":"https:\/\/www.eginnovations.com\/blog\/what-is-chaos-engineering\/","title":{"rendered":"What is Chaos Engineering and <br\/>Why is it Important?"},"content":{"rendered":"<div class=\"inner_content\"><p>Robust, resilient IT systems are crucial to data-driven operations. Whether these systems drive internal processes or deliver customer-facing services, the need for reliability and availability remains the same. So why would you deliberately try to break your services?<\/p>\n<h2>What is \u2018Chaos Monkey\u2019?<\/h2>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignright size-full wp-image-13470\" src=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2020\/12\/chaos-engineering-logo.jpg\" alt=\"\" width=\"180\" height=\"180\" border=\"0\" srcset=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2020\/12\/chaos-engineering-logo.jpg 180w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2020\/12\/chaos-engineering-logo-150x150.jpg 150w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2020\/12\/chaos-engineering-logo-140x140.jpg 140w\" sizes=\"auto, (max-width: 180px) 100vw, 180px\" \/>Chaos engineering does just that: it deliberately injects failures into your production environment, for example by terminating server instances. Video streaming service Netflix was one of the first organizations to popularize the concept, with its <a href=\"https:\/\/netflix.github.io\/chaosmonkey\/\">Chaos Monkey tool<\/a>.<\/p>\n<p>As Netflix migrated to the cloud around 2010, its engineers identified a potential problem with hosted infrastructure: hosts could be terminated and replaced at any moment, potentially degrading quality of service. 
To ensure a smooth streaming experience, their systems needed to handle these terminations seamlessly.<\/p>\n<p>To assist with testing, Netflix developers created \u2018Chaos Monkey\u2019. This tool runs alongside Netflix\u2019s production services and randomly terminates virtual machine instances, forcing engineers to design services that tolerate the loss of any single instance.<\/p>\n<p>The Netflix Chaos Monkey is perhaps the <a href=\"https:\/\/www.techrepublic.com\/article\/aws-outage-how-netflix-weathered-the-storm-by-preparing-for-the-worst\/\">best-known example<\/a> of chaos engineering. As cloud services mature, the methodology is likely to keep gaining popularity.<\/p>\n<h3>Why would you deliberately break your IT systems?<\/h3>\n<p>At the heart of the chaos engineering model is the concept of deliberately breaking things <em>in your production environment<\/em>. But why would you do that? Why not restrict testing to the dev environment?<\/p>\n<div style=\"font-size: 1em; background: #FFC107; line-height: 35px; padding: 10px 20px; margin-bottom: 15px; border: 1px solid #ddd; font-weight: bold; border-radius: 5px;\">\n<p>\u201cNo amount of testing can prove software right; a single test can prove software wrong.\u201d<\/p>\n<div style=\"text-align: right; padding-top: 0px; padding-bottom: 0px; font-size: 18px;\">\u2013 Amir Ghahrai<\/div>\n<\/div>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignright size-full wp-image-13469\" src=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2020\/12\/chaos-engineering-labs.jpg\" alt=\"\" width=\"300\" height=\"200\" border=\"0\" srcset=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2020\/12\/chaos-engineering-labs.jpg 300w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2020\/12\/chaos-engineering-labs-140x93.jpg 140w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/>Using chaos engineering principles, you introduce an important element of randomness into testing and accelerate the process of identifying single points of failure. 
System failures are rarely predictable, and the chaos monkey can surface issues that no one had previously considered. If you only ever test for what you think may break, other important issues may be overlooked.<\/p>\n<p>In this way, random outages help to keep testing honest. The testing scripts cannot be skewed, shortened, or cheated, and every fault identified is real: you can directly observe the problem and its effects.<\/p>\n<h3>Benefits of Chaos Engineering<\/h3>\n<p>Testing on the production system carries significant risk. You will probably need a relatively robust, mature platform before you unleash the chaos monkey. However, the approach also offers clear benefits.<\/p>\n<p>First, you probably do not have to replicate your entire production environment for testing, which helps to <a href=\"https:\/\/www.zdnet.com\/article\/cloud-computing-heres-how-much-a-huge-outage-could-cost-you\/\">reduce costs<\/a>. It is also almost impossible to properly simulate the effects of failures at scale in a development environment.<\/p>\n<div style=\"font-size: 1em; background: #FFC107; line-height: 35px; padding: 10px 20px; margin-bottom: 15px; border: 1px solid #ddd; font-weight: bold; border-radius: 5px;\">\n<p>\u201cThe impact of an extended outage would depend on the scale of the cloud provider: an incident that takes a top-three cloud provider offline in the US for three to six days would result in losses of between $6.9bn and $14.7bn, and between $1.5bn and $2.8bn in industry insured losses. A cyber-incident that takes a 10th to 15th placed cloud provider offline in the US for three to six days would result in losses of between $1.1bn to $2.1bn and between $220m and $450 million in industry insured losses.\u201d<\/p>\n<div style=\"text-align: right; padding-top: 0px; padding-bottom: 0px; font-size: 18px;\">\u2013 ZDNet<\/div>\n<\/div>\n<p>Second, there is an added incentive to address issues quickly. 
Any breakages caused by the chaos monkey need to be fixed as fast as possible to maintain an adequate level of service for customers. It\u2019s also worth remembering that building fixes directly against the production environment can significantly reduce time to deployment.<\/p>\n<h3>Breaking things the correct way<\/h3>\n<p>Developing meaningful fixes after a chaos monkey breakage is often a two-step process: a quick \u2018patch\u2019 to restore operations, followed by a more in-depth code update.<\/p>\n<p style=\"margin-bottom: 15px;\">Chaos tests are best performed in four cases:<\/p>\n<ol>\n<li>When deploying new code<\/li>\n<li>When adding dependencies<\/li>\n<li>As usage patterns change<\/li>\n<li>When mitigating problems<\/li>\n<\/ol>\n<p>Although random, chaos tests should not be completely uncontrolled. In many cases, the monkey should first be unleashed only on a small subsection of the system to test a specific hypothesis, for example that a service stays available when one of its instances is terminated. Only once that test passes should you widen its scope to other parts of the system.<\/p>\n<p>Beyond randomly crashing services, chaos engineering also requires an effective monitoring system, which helps you assess the impact and severity of each outage, including its effect on the user experience. Application tracing is critical for pinpointing the source of a failure and the modules that need work.<\/p>\n<p>Wherever the chaos monkey exposes blind spots in your system design, <a href=\"https:\/\/www.eginnovations.com\/product\/application-performance-monitoring\">application monitoring<\/a> can help you understand them better. This allows you to formulate robust fixes and updates. 
Monitoring will also allow you to assess the efficacy of each fix, verify that similar outages can be prevented in future, and confirm that the system continues to meet your performance requirements.<\/p>\n<p style=\"margin-bottom: 15px;\">From a customer\/user-facing perspective, monitoring also allows you to assess the impact on the <a href=\"https:\/\/www.eginnovations.com\/product\/end-user-experience-monitoring\">digital user experience<\/a>:<\/p>\n<ul>\n<li>How has the outage affected performance?<\/li>\n<li>Has service quality degraded below acceptable levels?<\/li>\n<li>Did the outage breach any SLAs?<\/li>\n<li>Are you dealing with a single point of failure, or are there multiple factors at fault?<\/li>\n<\/ul>\n<p>With ongoing <a href=\"https:\/\/www.eginnovations.com\/blog\/what-is-application-performance-monitoring\/\">application and performance monitoring<\/a>, you can continue to assess user experience. Importantly, the chaos engineering and development teams can also provide empirical evidence of any improvements or regressions. 
By taking the guesswork out of patches and fixes, you can direct resources to the tasks that will deliver the greatest benefit to your users.<\/p>\n<h3>Maximizing your chaos engineering potential<\/h3>\n<p>eG Enterprise offers extensive application and platform monitoring capabilities, allowing you to assess both current system health and the effects of every failure triggered by chaos testing.<\/p>\n<p><a href=\"https:\/\/www.eginnovations.com\/product\/application-performance-monitoring\/free-trial\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-13388\" src=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2020\/12\/chaos-engineering-bottom-banner.jpg\" alt=\"\" width=\"850\" height=\"150\" border=\"0\" srcset=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2020\/12\/chaos-engineering-bottom-banner.jpg 850w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2020\/12\/chaos-engineering-bottom-banner-300x53.jpg 300w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2020\/12\/chaos-engineering-bottom-banner-800x141.jpg 800w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2020\/12\/chaos-engineering-bottom-banner-310x55.jpg 310w, https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2020\/12\/chaos-engineering-bottom-banner-140x25.jpg 140w\" sizes=\"auto, (max-width: 850px) 100vw, 850px\" \/><\/a><\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Robust, resilient IT systems are crucial to data-driven operations. Whether these systems drive internal processes or deliver customer-facing services, the need for reliability and availability remains the same. So, why would you deliberately try to break your services? What is \u2018Chaos Monkey\u2019? Chaos engineering does just that &#8211; deliberately terminating instances in your production environment. 
Online [&hellip;]<\/p>\n","protected":false},"author":28,"featured_media":21278,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"_lmt_disableupdate":"","_lmt_disable":"","footnotes":""},"categories":[371,375],"tags":[],"class_list":["post-13387","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-application-performance-monitoring-apm","category-general"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>What is Chaos Engineering &amp; Testing | eG Innovations<\/title>\n<meta name=\"description\" content=\"Why would you deliberately try to break your own IT systems? Learn what Chaos Engineering is and how it can strengthen your entire technical environment.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.eginnovations.com\/blog\/what-is-chaos-engineering\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"What is Chaos Engineering &amp; Testing | eG Innovations\" \/>\n<meta property=\"og:description\" content=\"Why would you deliberately try to break your own IT systems? 
Learn what Chaos Engineering is and how it can strengthen your entire technical environment.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.eginnovations.com\/blog\/what-is-chaos-engineering\/\" \/>\n<meta property=\"og:site_name\" content=\"eG Innovations\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/eGInnovations\" \/>\n<meta property=\"article:published_time\" content=\"2020-12-11T13:32:42+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2022-08-23T04:15:34+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2020\/12\/Chaos-Engineering-fb.jpg\" \/>\n<meta name=\"author\" content=\"Abhilash Warrier\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:title\" content=\"What is Chaos Engineering &amp; Testing | eG Innovations\" \/>\n<meta name=\"twitter:description\" content=\"Why would you deliberately try to break your own IT systems? Learn what Chaos Engineering is and how it can strengthen your entire technical environment.\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2020\/12\/Chaos-Engineering-fb.jpg\" \/>\n<meta name=\"twitter:creator\" content=\"@https:\/\/twitter.com\/PoetAbhilash\" \/>\n<meta name=\"twitter:site\" content=\"@eginnovations\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Abhilash Warrier\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"What is Chaos Engineering & Testing | eG Innovations","description":"Why would you deliberately try to break your own IT systems? 
Learn what Chaos Engineering is and how it can strengthen your entire technical environment.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.eginnovations.com\/blog\/what-is-chaos-engineering\/","og_locale":"en_US","og_type":"article","og_title":"What is Chaos Engineering & Testing | eG Innovations","og_description":"Why would you deliberately try to break your own IT systems? Learn what Chaos Engineering is and how it can strengthen your entire technical environment.","og_url":"https:\/\/www.eginnovations.com\/blog\/what-is-chaos-engineering\/","og_site_name":"eG Innovations","article_publisher":"https:\/\/www.facebook.com\/eGInnovations","article_published_time":"2020-12-11T13:32:42+00:00","article_modified_time":"2022-08-23T04:15:34+00:00","og_image":[{"url":"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2020\/12\/Chaos-Engineering-fb.jpg","type":"","width":"","height":""}],"author":"Abhilash Warrier","twitter_card":"summary_large_image","twitter_title":"What is Chaos Engineering & Testing | eG Innovations","twitter_description":"Why would you deliberately try to break your own IT systems? Learn what Chaos Engineering is and how it can strengthen your entire technical environment.","twitter_image":"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2020\/12\/Chaos-Engineering-fb.jpg","twitter_creator":"@https:\/\/twitter.com\/PoetAbhilash","twitter_site":"@eginnovations","twitter_misc":{"Written by":"Abhilash Warrier","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.eginnovations.com\/blog\/what-is-chaos-engineering\/#article","isPartOf":{"@id":"https:\/\/www.eginnovations.com\/blog\/what-is-chaos-engineering\/"},"author":{"name":"Abhilash Warrier","@id":"https:\/\/www.eginnovations.com\/blog\/#\/schema\/person\/3814c1ba0ca3fb4bb33acbb2989679af"},"headline":"What is Chaos Engineering and Why is it Important?","datePublished":"2020-12-11T13:32:42+00:00","dateModified":"2022-08-23T04:15:34+00:00","mainEntityOfPage":{"@id":"https:\/\/www.eginnovations.com\/blog\/what-is-chaos-engineering\/"},"wordCount":891,"commentCount":0,"publisher":{"@id":"https:\/\/www.eginnovations.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.eginnovations.com\/blog\/what-is-chaos-engineering\/#primaryimage"},"thumbnailUrl":"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2020\/12\/Chaos-Engineering-Thumbnail.jpg","articleSection":["Application Performance Monitoring (APM)","General"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.eginnovations.com\/blog\/what-is-chaos-engineering\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.eginnovations.com\/blog\/what-is-chaos-engineering\/","url":"https:\/\/www.eginnovations.com\/blog\/what-is-chaos-engineering\/","name":"What is Chaos Engineering & Testing | eG Innovations","isPartOf":{"@id":"https:\/\/www.eginnovations.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.eginnovations.com\/blog\/what-is-chaos-engineering\/#primaryimage"},"image":{"@id":"https:\/\/www.eginnovations.com\/blog\/what-is-chaos-engineering\/#primaryimage"},"thumbnailUrl":"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2020\/12\/Chaos-Engineering-Thumbnail.jpg","datePublished":"2020-12-11T13:32:42+00:00","dateModified":"2022-08-23T04:15:34+00:00","description":"Why would you deliberately try to break your 
own IT systems? Learn what Chaos Engineering is and how it can strengthen your entire technical environment.","breadcrumb":{"@id":"https:\/\/www.eginnovations.com\/blog\/what-is-chaos-engineering\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.eginnovations.com\/blog\/what-is-chaos-engineering\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.eginnovations.com\/blog\/what-is-chaos-engineering\/#primaryimage","url":"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2020\/12\/Chaos-Engineering-Thumbnail.jpg","contentUrl":"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2020\/12\/Chaos-Engineering-Thumbnail.jpg","width":362,"height":235},{"@type":"BreadcrumbList","@id":"https:\/\/www.eginnovations.com\/blog\/what-is-chaos-engineering\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.eginnovations.com\/blog\/"},{"@type":"ListItem","position":2,"name":"What is Chaos Engineering and Why is it Important?"}]},{"@type":"WebSite","@id":"https:\/\/www.eginnovations.com\/blog\/#website","url":"https:\/\/www.eginnovations.com\/blog\/","name":"eG Innovations","description":"IT Performance Monitoring Insights","publisher":{"@id":"https:\/\/www.eginnovations.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.eginnovations.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.eginnovations.com\/blog\/#organization","name":"eG Innovations","alternateName":"eg 
innovations","url":"https:\/\/www.eginnovations.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.eginnovations.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2014\/07\/eg-logo-dark-gray1_new.jpg","contentUrl":"https:\/\/www.eginnovations.com\/blog\/wp-content\/uploads\/2014\/07\/eg-logo-dark-gray1_new.jpg","width":362,"height":235,"caption":"eG Innovations"},"image":{"@id":"https:\/\/www.eginnovations.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/eGInnovations","https:\/\/x.com\/eginnovations"]},{"@type":"Person","@id":"https:\/\/www.eginnovations.com\/blog\/#\/schema\/person\/3814c1ba0ca3fb4bb33acbb2989679af","name":"Abhilash Warrier","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.eginnovations.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/dcaf620d284dd73c0cde0f986f69ad99?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/dcaf620d284dd73c0cde0f986f69ad99?s=96&d=mm&r=g","caption":"Abhilash Warrier"},"sameAs":["https:\/\/x.com\/https:\/\/twitter.com\/PoetAbhilash"],"url":"https:\/\/www.eginnovations.com\/blog\/author\/abhilash-warriereginnovations-com\/"}]}},"modified_by":"HawkSEM 
Dev","_links":{"self":[{"href":"https:\/\/www.eginnovations.com\/blog\/wp-json\/wp\/v2\/posts\/13387","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.eginnovations.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.eginnovations.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.eginnovations.com\/blog\/wp-json\/wp\/v2\/users\/28"}],"replies":[{"embeddable":true,"href":"https:\/\/www.eginnovations.com\/blog\/wp-json\/wp\/v2\/comments?post=13387"}],"version-history":[{"count":0,"href":"https:\/\/www.eginnovations.com\/blog\/wp-json\/wp\/v2\/posts\/13387\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.eginnovations.com\/blog\/wp-json\/wp\/v2\/media\/21278"}],"wp:attachment":[{"href":"https:\/\/www.eginnovations.com\/blog\/wp-json\/wp\/v2\/media?parent=13387"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.eginnovations.com\/blog\/wp-json\/wp\/v2\/categories?post=13387"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.eginnovations.com\/blog\/wp-json\/wp\/v2\/tags?post=13387"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}