What really matters in IT event management? Short answer: meaningful alerts. A slightly longer answer: a single pane of glass combining real-time event analytic visualizations with contextual alerts sent in real time. Why? Event-monitoring systems show their true value only when a component is trending toward failure or has already failed (for example, a solid-state drive (SSD) in a rack-mounted storage array).
With proper configuration and pertinent events streaming in, event-monitoring systems can continually perform single-level and multilevel analytics to determine if anomalies are occurring, if thresholds are about to be breached, or if they have already been breached. When such a condition exists and it continues to exist for a user-specified period, meaningful alerts can be sent via appropriate channels to assist with prompt resolution of the issue.
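As a minimal sketch of that "condition persists for a user-specified period" logic, the class below (all names are illustrative, not any particular product's API) suppresses transient spikes and signals only when a metric stays above its threshold for a dwell period:

```python
from datetime import datetime, timedelta

# Hypothetical sketch: alert only when a metric stays above its threshold
# for a user-specified "dwell" period, ignoring momentary spikes.
class SustainedThresholdMonitor:
    def __init__(self, threshold, dwell_seconds):
        self.threshold = threshold
        self.dwell = timedelta(seconds=dwell_seconds)
        self.breach_start = None  # when the current breach began, if any

    def observe(self, timestamp, value):
        """Return True once the breach has persisted for the dwell period."""
        if value <= self.threshold:
            self.breach_start = None  # condition cleared; reset the clock
            return False
        if self.breach_start is None:
            self.breach_start = timestamp  # breach just began
        return timestamp - self.breach_start >= self.dwell
```

The same pattern generalizes to anomaly scores or multilevel conditions: the check changes, but the "hold for N seconds before alerting" gate stays the same.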
This is all that really matters.
I can hear you asking, “But what about all the beautiful visualizations I display on my wall monitors showing the state of my systems?”
They’re great – if someone happens to be looking at them when issues are beginning to occur. After the fact, these displays are of minimal value given the likelihood of cascade failures. They do look great on the walls, though!
When an alert is received, the ability to quickly synchronize the alert time with the visualizations is key. That's when the visualizations show their true worth. A skilled practitioner looks at them to figure out what is causing the issue. Quick access to the alerting component and its associated components, along with the ability to drill down, is the hallmark of useful analytic visualizations.
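Synchronizing on the alert time amounts to jumping each component's metric history to a window around that moment. A minimal sketch (the function name and window defaults are assumptions for illustration):

```python
from datetime import datetime, timedelta

# Hypothetical sketch: given an alert timestamp, slice a component's metric
# history to the window around the alert so a dashboard can jump straight
# to the relevant data instead of the live view.
def window_around_alert(series, alert_time, before_s=300, after_s=60):
    """series: time-ordered list of (timestamp, value) pairs."""
    start = alert_time - timedelta(seconds=before_s)
    end = alert_time + timedelta(seconds=after_s)
    return [(t, v) for t, v in series if start <= t <= end]
```

Applying the same window to the alerting component and its neighbors is what makes side-by-side drill-down possible.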
Key capabilities of an event-monitoring system
Meaningful alerts and analytic visualizations that can be synchronized with the alert time are the two primary criteria that must be evaluated when selecting an event-monitoring system. If an event-monitoring system has top-tier capabilities in these two areas, it will be used, and its return on investment (ROI) will be easy to measure.
The reason is simple.
The underlying cause(s) of the alert will eventually be determined. Once the causes are identified, additional analytic processing will be set up to monitor those components for the condition that caused the original alert. Over time, a library of analytic-processing workflows will be built that continually monitors components for the problem signatures associated with previous issues.
All systems will improve over time, as the continual monitoring becomes better at noticing issues before they become problems. For example, a storage array manufacturer might set up different analytic-processing workflows for different SSD manufacturers or batches of SSDs based on previous issues.
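That workflow library can be pictured as a registry keyed by problem signature. A minimal sketch, assuming a (manufacturer, batch) key with a manufacturer-wide fallback (all names are hypothetical):

```python
# Hypothetical sketch: map a component's "problem signature" (here, SSD
# manufacturer plus batch) to the analytic workflow that watches for the
# failure pattern observed in earlier incidents.
workflow_registry = {}

def register_workflow(manufacturer, batch, workflow):
    workflow_registry[(manufacturer, batch)] = workflow

def workflow_for(manufacturer, batch):
    # Prefer a batch-specific workflow; fall back to a manufacturer-wide
    # default registered under batch=None.
    return (workflow_registry.get((manufacturer, batch))
            or workflow_registry.get((manufacturer, None)))
```

Each resolved alert adds or refines an entry, which is what makes the system's monitoring compound in value over time.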
Other event-monitoring system criteria to evaluate, in decreasing order of importance, are:
- Ease of creating analytic processing on input data streams
- Richness of analytic library functions (statistical, predictive, and so on)
- Scaling capability with increases in input volumes
- Alert hierarchies (like a rack of SSDs versus individual SSDs)
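The alert-hierarchy criterion in the list above can be sketched as a roll-up: when enough SSDs in one rack alert at once, emit a single rack-level alert instead of a flood of per-device ones. The threshold and names below are assumptions for illustration:

```python
from collections import defaultdict

# Hypothetical sketch: roll individual SSD alerts up to their rack. If a
# rack has at least rack_alert_min alerting SSDs, one rack-level alert
# replaces the per-device alerts for that rack.
def rollup(alerts, rack_of, rack_alert_min=3):
    """alerts: iterable of SSD ids; rack_of: maps SSD id -> rack id."""
    by_rack = defaultdict(list)
    for ssd in alerts:
        by_rack[rack_of[ssd]].append(ssd)
    rack_alerts = [rack for rack, ssds in by_rack.items()
                   if len(ssds) >= rack_alert_min]
    device_alerts = [ssd for rack, ssds in by_rack.items()
                     if rack not in rack_alerts for ssd in ssds]
    return rack_alerts, device_alerts
```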
Ease of setting up input streams, parsing them, creating users, and so on is certainly necessary. However, these conveniences don't constitute the core value needed for effective IT event management.
SAP’s IT operations analytics focuses on what is important. It leverages the analytic power, processing speed, and scaling capabilities of the SAP HANA platform. And that’s why it deserves to be on your evaluation short list. To find out more about IT operations analytics: