
AI Services: Introduction to the Anomaly Arena

Author: Scott Reader, Created: 2024-07-30

In response to customers' growing need for anomaly detection and data normalization, the AI Services team has created a fully developed web app built on top of SensibleAI Studio known as the Anomaly Arena. This article provides a detailed overview of the Anomaly Arena, highlighting its features, user personas, and visualization dashboards for identifying and cleansing erroneous data points.

What is the Anomaly Arena?

The Anomaly Arena is a web application offering advanced tools for anomaly detection, visualization, and normalization. It is equipped with six distinct anomaly detection routines:

  1. Level Shift:
  • A statistical method that calculates and compares median values over a forward and a backward rolling window for a given dataset. If the median of the forward rolling window is sufficiently higher or lower than the median of the backward rolling window, the point is marked as a level shift anomaly.
  2. Volatility Shift:
  • A statistical method that calculates and compares standard deviation values over a forward and a backward rolling window in a given dataset. If the standard deviation of the forward rolling window is sufficiently higher or lower than the standard deviation of the backward rolling window, the point is marked as a volatility anomaly.
  3. Z-Score:
  • A statistical method that calculates the Z-Score for each data point and evaluates high-point and low-point anomalies based on the number of standard deviations from the dataset’s mean. Anomalies are identified when the absolute Z-Score exceeds a predefined threshold.
  4. Isolation Forest:
  • A tree-based ensemble method that isolates data points by recursively partitioning the data with random splits. Anomalies are identified by path length: points that are isolated after only a few splits, and so sit close to the root of the trees, are scored as anomalies.
  5. Matrix Profile:
  • A distance-based method that compares each subsequence of a time series with the other subsequences and records the distance to its nearest neighbor; these distances form the matrix profile. Subsequences whose nearest-neighbor distance is unusually large deviate from the dataset's recurring patterns and are identified as anomalies.
  6. SARIMA:
  • A model-based method that fits a SARIMA (seasonal autoregressive integrated moving average) model to historical data, forecasts values based on trend and seasonality, and flags an anomaly where the actual value differs from the forecasted value by more than a predefined threshold.
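
The rolling-window and Z-Score routines described in the list can be illustrated with a minimal Python sketch. This is not the Anomaly Arena's implementation: the function names, the window alignment (backward window ends just before the point, forward window starts at it), and the use of a fixed absolute threshold are all assumptions made for illustration.

```python
from statistics import mean, median, pstdev


def rolling_shift_anomalies(values, window, threshold, stat=median):
    """Flag index i when stat(forward window) and stat(backward window)
    differ by more than `threshold` (hypothetical helper).

    Backward window: values[i - window:i]; forward window: values[i:i + window].
    Use stat=median for a level shift, stat=pstdev for a volatility shift.
    """
    flagged = []
    for i in range(window, len(values) - window + 1):
        backward = values[i - window:i]
        forward = values[i:i + window]
        if abs(stat(forward) - stat(backward)) > threshold:
            flagged.append(i)
    return flagged


def zscore_anomalies(values, threshold=3.0):
    """Flag indices whose absolute Z-Score exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:  # constant series: no point deviates from the mean
        return []
    return [i for i, v in enumerate(values) if abs((v - mu) / sigma) > threshold]


# Illustrative data only: a step from 10 to 20 halfway through the series.
step_series = [10, 10, 10, 10, 10, 20, 20, 20, 20, 20]
level_shifts = rolling_shift_anomalies(step_series, window=3, threshold=5)

# Illustrative data only: a flat series with one large spike at the end.
spike_series = [10] * 20 + [100]
spikes = zscore_anomalies(spike_series, threshold=3.0)
```

Medians make the level shift check robust to a single outlier inside a window, whereas the Z-Score check reacts to individual extreme points; that difference is one reason the two routines can flag different points on the same series.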

The Anomaly Arena also features interactive dashboards for visualizing and evaluating anomalies across datasets. It then creates a cleaning manifest that lets users determine which anomalies should be cleansed for further processing.
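
One way to picture a cleaning manifest is as a list of proposed corrections, each of which a user approves or rejects. The sketch below is purely hypothetical: the article does not describe the manifest's actual format, so the `ManifestEntry` fields and the `apply_manifest` helper are assumptions used only to show the idea of selective cleansing.

```python
from dataclasses import dataclass


@dataclass
class ManifestEntry:
    """One proposed correction (hypothetical schema, not the real manifest)."""
    target: str        # series the anomaly belongs to
    period: str        # period label, e.g. "2024-03"
    detector: str      # routine that flagged the point
    actual: float      # value observed in the source data
    proposed: float    # suggested cleansed value
    approved: bool = False  # set by the user after review


def apply_manifest(series, manifest):
    """Return a copy of `series` with only the approved corrections applied.

    `series` maps (target, period) -> value; the input is not modified.
    """
    cleansed = dict(series)
    for entry in manifest:
        key = (entry.target, entry.period)
        if entry.approved and key in cleansed:
            cleansed[key] = entry.proposed
    return cleansed


# Illustrative data only.
series = {("store_a", "2024-03"): 950.0, ("store_a", "2024-04"): 120.0}
manifest = [
    ManifestEntry("store_a", "2024-03", "z_score", 950.0, 115.0, approved=True),
    ManifestEntry("store_a", "2024-04", "level_shift", 120.0, 110.0, approved=False),
]
cleansed = apply_manifest(series, manifest)
```

Keeping the observed value next to the proposed one in each entry makes the review step auditable: a rejected entry leaves the source value untouched.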

Who uses the Anomaly Arena?

The Anomaly Arena caters to two primary user personas: the Builder and the End User.

The Builder

Builders are responsible for setting up and fine-tuning anomaly detector configurations. Their tasks include:

  • Setting data bounds and rolling window sizes.
  • Tuning hyperparameters through JSON scripts (button-click dashboards are planned for a future update).
  • Fitting algorithms to data.
  • Running predictive models to identify anomalies in historical data.
  • Updating and running additional cleansing processes as needed.
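
As an illustration of tuning hyperparameters through JSON, the sketch below parses and validates a small detector configuration. The key names (`detector`, `window`, `threshold`, `lower_bound`, `upper_bound`) are hypothetical; the article does not document the actual schema the Anomaly Arena expects.

```python
import json

# Hypothetical configuration; these key names are assumptions for illustration.
CONFIG_JSON = """
{
  "detector": "level_shift",
  "window": 7,
  "threshold": 3.0,
  "lower_bound": 0,
  "upper_bound": 10000
}
"""

REQUIRED_KEYS = {"detector", "window", "threshold"}


def load_detector_config(text):
    """Parse a JSON detector config and check the basic constraints."""
    config = json.loads(text)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not isinstance(config["window"], int) or config["window"] < 1:
        raise ValueError("window must be a positive integer")
    if config["threshold"] <= 0:
        raise ValueError("threshold must be positive")
    return config


config = load_detector_config(CONFIG_JSON)
```

Validating the configuration before fitting surfaces a mistyped window size or threshold early, rather than partway through a run.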

The End User

End Users, typically business users or central support team members, consume the output of anomaly runs. Their responsibilities include:

  • Utilizing anomaly visualization dashboards.
  • Analyzing the timing and magnitude of anomalies.
  • Advising on the application of proposed cleansing methods based on business knowledge.
  • Identifying features or events that may have caused anomalies.

How does the Anomaly Arena work?

The Anomaly Arena operates through two main pages: Runs and Visualize.

Runs Page

The Runs page is where Builders create and manage anomaly detection routines. The process involves:

  • Creating a new routine and naming it.
  • Connecting to the desired database.
  • Selecting the source data table and specifying key columns.
  • Configuring the appropriate anomaly detectors.
  • Fitting the model to historical data.
  • Running predictions over a specified time horizon.
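
The last two steps, fitting on historical data and then running predictions over a time horizon, follow a common fit/predict pattern. The sketch below shows that pattern with a simple Z-Score detector; the `ZScoreDetector` class is a hypothetical stand-in and does not reflect the Anomaly Arena's internal API.

```python
from statistics import mean, pstdev


class ZScoreDetector:
    """Minimal fit/predict detector (hypothetical, for illustration only)."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold
        self.mu = None
        self.sigma = None

    def fit(self, history):
        """Learn the mean and standard deviation from historical values."""
        self.mu = mean(history)
        self.sigma = pstdev(history)
        return self

    def predict(self, points):
        """Return one boolean per point: True where the point is anomalous."""
        if self.mu is None:
            raise RuntimeError("call fit() before predict()")
        if self.sigma == 0:
            # Constant history: any different value counts as anomalous.
            return [p != self.mu for p in points]
        return [abs((p - self.mu) / self.sigma) > self.threshold for p in points]


# Illustrative data only.
history = [10, 12, 11, 9, 10, 11, 12, 8]
detector = ZScoreDetector(threshold=3.0).fit(history)
flags = detector.predict([10, 30, 11])
```

Separating the two steps means the statistics learned from history stay fixed while new time horizons are scored, so repeated prediction runs remain comparable.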

Visualize Page

Once anomaly detection runs are completed, End Users access the Visualize page to review results. Key dashboards include:

  • Anomaly Aggregate: Displays anomaly counts over time, with options to filter by detection routine and timeframe.
  • Anomaly Detail - Anomaly Name Filter TS: Allows filtering by anomaly detection type and name, showing details by associated dimensions and targets.
  • Anomaly Detail - Target Dim Drill Down: Identifies anomalies by target, aiding in pinpointing problematic areas for further investigation.
  • Anomaly Time Series Detail - Anomaly Overview TS: Provides a comprehensive overview of anomalies by target and dimension, with explanations for each detected anomaly.
  • Anomaly Time Series Detail - Target Filter TS: Allows users to see an overlay view of a target's historical values along with any identified anomalies across the snapshot time horizon.
  • Anomaly Time Series Detail - Monthly Aggregate: Consolidates daily data into monthly levels for easier analysis.
  • Anomaly Cleansed Dashboard: Compares historical actuals with cleansed values, quantifying the difference and allowing users to decide on further cleansing actions.
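
Two of the calculations behind these dashboards, rolling daily data up to monthly levels and quantifying the difference between actuals and cleansed values, can be sketched as follows. The function names are hypothetical, and summing is assumed as the monthly aggregation; the dashboards may aggregate differently.

```python
from collections import defaultdict
from datetime import date


def monthly_totals(daily):
    """Sum daily values into 'YYYY-MM' buckets (summing is an assumption)."""
    totals = defaultdict(float)
    for day, value in daily.items():
        totals[f"{day.year:04d}-{day.month:02d}"] += value
    return dict(totals)


def cleansing_impact(actual, cleansed):
    """Return cleansed minus actual for every date whose value changed."""
    return {
        day: cleansed[day] - actual[day]
        for day in actual
        if cleansed.get(day, actual[day]) != actual[day]
    }


# Illustrative data only: one suspicious spike on 2 January.
actual = {date(2024, 1, 1): 100, date(2024, 1, 2): 900, date(2024, 2, 1): 110}
cleansed = {**actual, date(2024, 1, 2): 105}

actual_by_month = monthly_totals(actual)
cleansed_by_month = monthly_totals(cleansed)
impact = cleansing_impact(actual, cleansed)
```

Comparing the monthly totals before and after cleansing shows how much a single daily correction moves the aggregate, which is the kind of difference a user would weigh when deciding on further cleansing.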

Conclusion

The Anomaly Arena is a valuable addition to the AI Services suite. By combining six detection routines, interactive dashboards, and a cleaning manifest, it helps business users identify and normalize anomalous data efficiently. As the Anomaly Arena continues to evolve, it is intended to become a core tool for data-driven decision-making and strategic planning.
