I attended many sessions at the Gartner IT Operations and Management Summit 2010 last week. One of these sessions, by Gartner analyst David Williams, was thoroughly enlightening and enjoyable. He raised several interesting points for folks in the IT infrastructure management space:
- The top three business priorities on the CIO agenda for 2010 (based on a Gartner survey) are improving business processes, reducing enterprise cost, and increasing the use of information/analytics. The top three technology priorities on the CIO agenda for 2010 are virtualization, cloud computing, and Web 2.0. You can see the survey results here: http://www.gartner.com/it/page.jsp?id=1283413.
- “Good enough” monitoring may be just fine! Very often we define SLAs that are too strict and then go searching for ways to enforce them. Referring to the SLA that Google offers for Google Apps, David pointed out that, under that SLA, downtime of less than 10 minutes is not counted as downtime at all! You can see the entire SLA, including this clause, here: http://www.google.com/apps/intl/en/terms/sla.html. Google Apps users have not really complained about this. This is an example of good enough monitoring. Over-engineering to ensure that things never fail is not necessary. Likewise, you may not want to install monitoring tools that are expensive and difficult to configure just because they provide the highest level of detail on all your servers as insurance. Maybe you don’t need these tools running all the time, and a less expensive, easier-to-use tool that continuously provides most of what you need to know is sufficient.
- Monitoring services, rather than silos, is becoming increasingly important. Gartner’s IT Infrastructure Operations Maturity Model reflects this. Organizations at level 4 in the model are looking at aligning operations with IT services, whereas organizations at level 5 are looking to partner to enable businesses to succeed. For business service management to succeed, management must be collaborative, so that clear ownership of events exists and monitoring is a shared responsibility. Related to this is the need to “dumb down” management so that troubleshooting does not require an expert every time. At eG Innovations, we have been talking about Collaborative Management for several years. See our related whitepaper on this topic here.
- Fault management as we know it is no longer important or relevant. Mean time to repair is ceasing to be an important metric. MTTR assumes something is broken. The new paradigm is “Outage Avoidance – Fix IT before it Breaks!”.
Taking this one step further, it is no longer important to have people in the NOC watching red/green/blue lights. It is important to have analysts who can analyze IT performance at every tier of the infrastructure and determine where action needs to be taken proactively to avoid an outage. Of course, if the monitoring tools include built-in analytics that proactively alert administrators, even better!
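To make the “good enough” SLA point above a bit more concrete, here is a minimal Python sketch of how ignoring short outages changes a measured availability figure. It assumes a simplified rule in which any outage shorter than 10 minutes is not counted; the function names and the sample outage durations are made up for illustration, and this is not meant to reproduce how Google actually calculates its SLA.

```python
from datetime import timedelta

# Assumption: outages shorter than this are not counted as downtime,
# loosely mirroring the 10-minute clause discussed above.
MIN_COUNTED_OUTAGE = timedelta(minutes=10)


def counted_downtime(outages, threshold=MIN_COUNTED_OUTAGE):
    """Sum only the outages that last at least `threshold`."""
    return sum((o for o in outages if o >= threshold), timedelta())


def availability_pct(outages, period, threshold=MIN_COUNTED_OUTAGE):
    """Availability over `period`, as a percentage, ignoring short outages."""
    down = counted_downtime(outages, threshold)
    return 100.0 * (1 - down / period)


if __name__ == "__main__":
    # Hypothetical month with three outages of 3, 8, and 25 minutes.
    outages = [timedelta(minutes=m) for m in (3, 8, 25)]
    month = timedelta(days=30)

    strict = availability_pct(outages, month, threshold=timedelta(0))
    relaxed = availability_pct(outages, month)
    print(f"Counting every outage:     {strict:.3f}%")
    print(f"Ignoring outages < 10 min: {relaxed:.3f}%")
```

In this made-up month, only the 25-minute outage counts toward downtime under the relaxed rule, so the reported availability is higher than it would be if every blip were counted.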
Finally, the takeaway for enterprises: “Don’t allow siloed teams to make tool decisions!” Of course, this is easier said than done!