VMware Inc.

08/11/2022 | News release | Distributed by Public on 08/11/2022 11:43

Root Cause Analysis for Anomalies

In prior blogs, we covered how anomaly detection can help you identify unwanted usage and manage budgets optimally (Avoid Costly Surprises with AI/ML Based Anomaly Detection), as well as how it gives control back to the user to take proactive measures against these anomalies (Take Control of Cloud Costs with CloudHealth Anomaly Detection).

Today we want to dive a bit deeper into anomaly detection and discuss a new topic: Root Cause Analysis (RCA).

Anomaly detection refresher

As a refresher, anomaly detection identifies unusual cloud spend and alerts the user. Early detection lets users take appropriate action in time - basically, time is of the essence.

Most of the time, anomaly detection can be divided into two parts:

  • Identify anomalies
  • Identify the reason for an anomaly

Being able to identify anomalies is the first step to solving unusual cloud spend. However, preventing the anomaly from occurring again requires knowing why the anomaly happened at all. This is where Root Cause Analysis comes in to help pinpoint the actual cause of the anomaly and take necessary actions to mitigate it.
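To make the first part concrete, here is a minimal sketch of flagging an unusual day of spend. It is only an illustration: it uses a simple trailing-window z-score as a stand-in, not the AI/ML models CloudHealth uses, and the function name `flag_anomalies`, the window size, the threshold and the sample costs are all assumptions made for this example.

```python
from statistics import mean, stdev

def flag_anomalies(daily_cost, window=7, threshold=3.0):
    """Return indices of days whose cost deviates strongly from the trailing window.

    Illustrative stand-in only: a plain z-score test, not CloudHealth's method.
    """
    flagged = []
    for i in range(window, len(daily_cost)):
        history = daily_cost[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: the z-score is undefined, so skip the day.
            continue
        if abs(daily_cost[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

# Hypothetical daily spend in dollars; day 7 contains a spike.
daily_cost = [100, 102, 98, 101, 99, 103, 100, 180, 101]
anomalous_days = flag_anomalies(daily_cost)
print(anomalous_days)
```

A check like this only answers the first question (is something unusual?). It says nothing about which resource or usage type is responsible, which is the gap Root Cause Analysis fills.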

Root Cause Analysis explained

Detecting and reporting anomalies is not sufficient on its own - it only scratches the surface of the issue. Using RCA, you can drill down to individual resources in cost and usage datasets and understand the root cause of an anomaly.

How CloudHealth can help you reach the root cause

CloudHealth enables users to identify anomalies as well as identify the root cause of the anomalies.

From the RCA chart, users can drill down to individual resources to find the ones that likely caused the anomaly.

Use case example: Amazon EC2 Instance

Image one: On the anomaly details page, users can click the "view root cause" button, which navigates to the RCA page. There, the user can apply filters, view cost in addition to usage, and categorize by usage type and other available dimensions.

Image two: To identify the resources that contributed most to this anomaly, sort the unblended cost column of the grid in descending order. Sorting brings the resources with the highest unblended cost to the top.
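The same sorting step can be reproduced outside the UI on an exported cost dataset. The sketch below assumes a hypothetical list of rows with `resource_id`, `usage_type` and `unblended_cost` fields; these field names, the resource IDs and the amounts are invented for illustration and do not reflect a CloudHealth export format.

```python
def top_contributors(rows, n=3):
    """Rank resources by unblended cost (descending) and report each one's share."""
    total = sum(r["unblended_cost"] for r in rows)
    ranked = sorted(rows, key=lambda r: r["unblended_cost"], reverse=True)
    result = []
    for r in ranked[:n]:
        share = r["unblended_cost"] / total if total else 0.0
        result.append((r["resource_id"], r["unblended_cost"], share))
    return result

# Hypothetical EC2 rows for the anomalous day.
rows = [
    {"resource_id": "i-hypothetical-1", "usage_type": "BoxUsage:m5.large", "unblended_cost": 12.0},
    {"resource_id": "i-hypothetical-2", "usage_type": "BoxUsage:p3.2xlarge", "unblended_cost": 150.0},
    {"resource_id": "i-hypothetical-3", "usage_type": "BoxUsage:t3.micro", "unblended_cost": 2.0},
    {"resource_id": "i-hypothetical-4", "usage_type": "BoxUsage:m5.large", "unblended_cost": 36.0},
]

for resource_id, cost, share in top_contributors(rows, n=2):
    print(f"{resource_id}: ${cost:.2f} ({share:.0%} of total)")
```

Reporting the share of total alongside the raw cost makes it easier to judge whether one resource dominates the anomaly or whether the increase is spread across many.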

Image three: From this chart, users can drill down to individual resources to identify those that potentially caused the anomaly.

Use case example: Amazon S3 Service

Image one: On the RCA page, we want to see whether this anomaly was caused by a change in pricing or a change in usage.

Image two: To check usage, we change the y-axis to usage-amount. The plotted graph indicates that a sudden increase in usage is the cause of this anomaly.
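The pricing-versus-usage question can also be answered numerically. As a rough sketch, if cost is treated as rate times usage, the change in cost between two periods splits exactly into a usage effect and a rate effect. The function name `decompose_cost_change` and the sample numbers below are assumptions for illustration, not real S3 prices or CloudHealth output.

```python
def decompose_cost_change(usage_before, cost_before, usage_after, cost_after):
    """Split a cost change into a usage effect and a rate (pricing) effect.

    Assumes cost = rate * usage and non-zero usage in both periods.
    usage_effect + rate_effect equals cost_after - cost_before.
    """
    rate_before = cost_before / usage_before
    rate_after = cost_after / usage_after
    # Extra usage, priced at the old rate.
    usage_effect = (usage_after - usage_before) * rate_before
    # Change in rate, applied to the new usage.
    rate_effect = (rate_after - rate_before) * usage_after
    return usage_effect, rate_effect

# Hypothetical storage figures: usage quadruples while the effective rate stays flat.
usage_effect, rate_effect = decompose_cost_change(
    usage_before=1000.0, cost_before=23.0,
    usage_after=4000.0, cost_after=92.0,
)
print(f"usage effect: {usage_effect:.2f}, rate effect: {rate_effect:.2f}")
```

In this example nearly all of the increase lands in the usage effect, which matches the conclusion drawn from the chart: the anomaly comes from a jump in usage rather than a pricing change.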

Next Steps

Ready to learn more about CloudHealth? Get started today by signing up for a 14-day free trial, or visit our Resources Center for more CloudHealth content.