Solving the last mile challenge of operations noise reduction with AIOps

IT noise is any information that a first responder has to deal with while solving a problem affecting their business, but that does not contribute to understanding or resolving that issue. By Guy Fighel, General Manager AIOps & VP of Product Engineering at New Relic.


IT noise is irrelevant information that makes it harder to spot and solve problems. It matters because, in an IT operational environment, every second that IT is not doing what it is supposed to do means potential revenue loss. Today, modern technologies such as AIOps are helping teams reduce operations noise significantly. Over the next decade, experts and innovators will keep pushing to eliminate the last mile of noise reduction. But how?

 

Applying intelligence throughout the DevOps cycle

Rather than narrowing their approach to one specific aspect of the incident response process, teams should strengthen the relationships between each stage of the process to create a more powerful solution. Focusing only on faster detection, faster understanding, faster response, or faster follow-up is not enough; teams need a comprehensive tool that thinks like their best SREs, from a systems perspective.

 

Tapping intelligent assistance

Understanding the root cause and determining the steps to resolution usually account for the majority of the time between an issue occurring and its remediation. To shorten that time, teams need useful context about existing issues, including their classification based on the “Four Golden Signals” (latency, traffic, errors, and saturation) and correlated issues from across the environment.

 

Leveraging smarter tools for creating perfect software

In order to help customers create stellar software, experiences, and businesses, it’s critical to embrace solutions that are easy to connect and configure, work with the tools teams already use, create value throughout the entire observability process, and learn from data patterns and user feedback to get smarter over time. AI is one more step in this journey.

 

Enter AIOps

DevOps, SRE, and on-call teams rely on a multitude of tools to detect and respond to incidents. This ever-growing list of tools can pose problems: incident, event, and telemetry data is fragmented, siloed, or redundant, making it harder to find the information needed to diagnose and resolve incidents.

 

AIOps platforms promise to solve these problems with a centralized, intelligent feed of incident information that displays everything you need to troubleshoot and respond to problems, all through a single pane of glass. Unlocking this value, though, can require a significant time commitment and workflow shift, potentially costing teams hundreds of hours in integration, configuration, training, and onboarding tasks.

 

Delivering operations noise reduction and augmenting teams

On-call teams are familiar with noisy alerts triggered by low-priority, irrelevant, or flapping issues. These can lead to pager fatigue, cause distractions, and increase the probability that a critical signal will go unnoticed.
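As an illustration of what a "flapping" alert means in practice, here is a minimal Python sketch that flags an alert whose state changes too often within a short time window. It is a simple rule-based stand-in, not how any particular AIOps product works: the `AlertEvent` type, the `is_flapping` function, and the default thresholds (a 10-minute window and four state changes) are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class AlertEvent:
    """One state change report for a single alert (hypothetical shape)."""
    timestamp: float  # seconds since some reference point
    state: str        # "firing" or "resolved"


def is_flapping(events, window_seconds=600.0, max_transitions=4):
    """Return True if the alert changed state more than `max_transitions`
    times within any span of `window_seconds`. Thresholds are illustrative."""
    ordered = sorted(events, key=lambda e: e.timestamp)

    # Keep only the moments where the state actually changed.
    transitions = [
        cur.timestamp
        for prev, cur in zip(ordered, ordered[1:])
        if cur.state != prev.state
    ]

    # Slide a window over the transition times and count how many fit.
    start = 0
    for end, t in enumerate(transitions):
        while t - transitions[start] > window_seconds:
            start += 1
        if end - start + 1 > max_transitions:
            return True
    return False
```

An alert flagged this way could be muted or downgraded rather than paging the on-call engineer each time it toggles.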

 

Developing an AIOps system in which AI augments humans means IT teams can achieve a noise-free production environment. Through the AIOps system, the user gets a much richer problem description, with details of all the sub-incidents in a single notification, making it easier to identify the root cause of the issue and solve all sub-incidents at once. Once operations teams understand what is wrong in one incident, it is much easier for them to solve the problem.
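To make the idea of folding sub-incidents into a single notification concrete, the following Python sketch groups alerts that hit the same entity within a short time of each other. Real AIOps platforms typically rely on learned correlation rather than a fixed rule, so treat this as a simplified illustration. The `Alert` and `Incident` types, the `correlate` function, the `entity` and `signal` fields, and the 5-minute window are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    timestamp: float  # seconds since some reference point
    entity: str       # affected service or host (hypothetical field)
    signal: str       # golden signal: latency, traffic, errors, saturation
    message: str


@dataclass
class Incident:
    alerts: list = field(default_factory=list)

    def summary(self):
        """One-line description covering every sub-incident in the group."""
        signals = ", ".join(sorted({a.signal for a in self.alerts}))
        return f"{self.alerts[0].entity}: {len(self.alerts)} related alerts ({signals})"


def correlate(alerts, window_seconds=300.0):
    """Group alerts on the same entity that arrive within `window_seconds`
    of the previous alert in that group. The rule is illustrative only."""
    incidents = []
    open_by_entity = {}
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        incident = open_by_entity.get(alert.entity)
        too_old = (
            incident is not None
            and alert.timestamp - incident.alerts[-1].timestamp > window_seconds
        )
        if incident is None or too_old:
            # Start a new incident for this entity.
            incident = Incident()
            incidents.append(incident)
            open_by_entity[alert.entity] = incident
        incident.alerts.append(alert)
    return incidents
```

With this grouping, the responder receives one notification per incident, built from `summary()`, instead of one page per alert.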

In summary, IT operations teams that want to eliminate IT noise need to apply the techniques described above to augment their staff and, ultimately, improve the bottom line.
