
AnomalyArena

Versions

v1.0.0

Basic Information

Class Name: AnomalyArena

Title: Anomaly Arena Routine

Version: 1.0.0

Author: Evan Rasmussen

Organization: OneStream

Creation Date: 2024-04-23

Default Routine Memory Capacity: 2.0 GB

Tags

Anomaly, Point Anomaly, Level Shift Anomaly, Data Preprocessing, Time Series

Description

Short Description

The Anomaly Arena Routine gives users the ability to run multiple anomaly detector routines on the same dataset at once.

Long Description

This routine is designed to simplify the process of finding a variety of anomalies in a dataset. Anomalies are typically unknown ahead of time, so it is difficult to know which anomaly detection algorithm will work best. The Anomaly Arena Routine addresses this by letting users specify any or all of the anomaly detection algorithms they would like to run on a dataset.

Use Cases

1. Which Anomaly Detector Should I Use?

I am a data scientist working with a new dataset tracking daily sales of my company's products over the last six years. I have conducted some initial exploratory data analysis to gain a better understanding of the data, but I would like to paint a more complete picture of the data by identifying any anomalies. I am unsure which anomaly detection algorithm would be best suited for this, as I don't know what types of anomalies may exist in the dataset. I would like to try several different algorithms to see what kind of results I get. Rather than creating new instances, fitting the models, and predicting anomalies for each algorithm, I will simply use the Anomaly Arena Routine to create a single instance that will run all of the algorithms I choose on my dataset. This will save me time and allow me to get more comprehensive insights into the anomalies contained in my data. Once I have identified the anomalies present in my dataset, I will use the Time Series Cleaning Routine to clean the anomalous data points and prepare the dataset for further analysis.

Routine Methods

1. Init (Constructor)
  • Method: __init__
    • Type: Constructor

    • Memory Capacity: 2.0 GB

    • Allow In-Memory Execution: No

    • Read Only: No

    • Method Limits: There are no limits to the constructor method. This method simply saves the input parameters to be utilized in subsequent runs of the fit and predict methods.

    • Outputs Dynamic Artifacts: No

    • Short Description:

      • Constructor method for the Anomaly Arena Routine.
    • Detailed Description:

      • This method is responsible for creating all the anomaly detector instances specified in the input parameters and for storing relevant information so it can be reused across detectors.
    • Inputs:

      • Required Input
        • Source Data Definition: The source data definition to use.
          • Name: source_data_definition
          • Tooltip:
            • Validation Constraints:
              • This input may be subject to other validation constraints at runtime.
          • Type: Must be an instance of Time Series Source Data
          • Nested Model: Time Series Source Data
            • Required Input
              • Connection: The connection to the source data.
                • Name: data_connection
                • Tooltip:
                  • Validation Constraints:
                    • This input may be subject to other validation constraints at runtime.
                • Type: Must be an instance of Tabular Connection
                • Nested Model: Tabular Connection
                  • Required Input
                    • Connection: The connection type to use to access the source data.
                      • Name: tabular_connection
                      • Tooltip:
                        • Validation Constraints:
                          • This input may be subject to other validation constraints at runtime.
                      • Type: Must be one of the following
                        • SQL Server Connection
                          • Required Input
                            • Database Resource: The name of the database resource to connect to.
                              • Name: database_resource
                              • Tooltip:
                                • Validation Constraints:
                                  • This input may be subject to other validation constraints at runtime.
                              • Type: str
                            • Database Name: The name of the database to connect to.
                              • Name: database_name
                              • Tooltip:
                                • Detail:
                                  • Note: If you don’t see the database name that you are looking for in this list, it is recommended that you first move the data to be used within a database that is available within this list.
                                • Validation Constraints:
                                  • This input may be subject to other validation constraints at runtime.
                              • Type: str
                            • Table Name: The name of the table to use.
                              • Name: table_name
                              • Tooltip:
                                • Validation Constraints:
                                  • This input may be subject to other validation constraints at runtime.
                              • Type: str
                        • MetaFileSystem Connection
                          • Required Input
                            • Connection Key: The MetaFileSystem connection key.
                              • Name: connection_key
                              • Tooltip:
                                • Validation Constraints:
                                  • This input may be subject to other validation constraints at runtime.
                              • Type: MetaFileSystemConnectionKey
                            • File Path: The full file path to the file to ingest.
                              • Name: file_path
                              • Tooltip:
                                • Validation Constraints:
                                  • This input may be subject to other validation constraints at runtime.
                              • Type: str
                        • Partitioned MetaFileSystem Connection
                          • Required Input
                            • Connection Key: The MetaFileSystem connection key.
                              • Name: connection_key
                              • Tooltip:
                                • Validation Constraints:
                                  • This input may be subject to other validation constraints at runtime.
                              • Type: MetaFileSystemConnectionKey
                            • File Type: The type of files to read from the directory.
                              • Name: file_type
                              • Tooltip:
                                • Validation Constraints:
                                  • This input may be subject to other validation constraints at runtime.
                              • Type: FileExtensions_
                            • Directory Path: The full directory path containing partitioned tabular files.
                              • Name: directory_path
                              • Tooltip:
                                • Validation Constraints:
                                  • This input may be subject to other validation constraints at runtime.
                              • Type: str
              • Dimension Columns: The columns to use as dimensions.
                • Name: dimension_columns
                • Tooltip:
                  • Validation Constraints:
                    • The input must have a minimum length of 1.
                    • This input may be subject to other validation constraints at runtime.
                • Type: list[str]
              • Date Column: The column to use as the date.
                • Name: date_column
                • Tooltip:
                  • Detail:
                    • The date column must be in a DateTime readable format.
                  • Validation Constraints:
                    • This input may be subject to other validation constraints at runtime.
                • Type: str
              • Value Column: The column to use as the value.
                • Name: value_column
                • Tooltip:
                  • Detail:
                    • The value column must be a numeric (int, float, double, decimal, etc.) column.
                  • Validation Constraints:
                    • This input may be subject to other validation constraints at runtime.
                • Type: str
        • Feature Data Definition: The feature data definition to use.
          • Name: feature_data_definitions
          • Tooltip:
            • Validation Constraints:
              • This input may be subject to other validation constraints at runtime.
          • Type: list[TimeSeriesTableDefinition]
        • Group Name: Group name for your anomaly detector.
          • Name: group_name
          • Tooltip:
            • Validation Constraints:
              • This input may be subject to other validation constraints at runtime.
          • Type: str
        • Anomaly Detector Configurations: The configurations for each anomaly detector to be used in the Anomaly Arena Routine.
          • Name: anomaly_detector_configurations
          • Tooltip:
            • Validation Constraints:
              • The input must have a minimum length of 1.
              • This input may be subject to other validation constraints at runtime.
          • Type: list[AnomalyArenaConfigurationAdaptorParameters]
    • Artifacts: No artifacts are returned by this method
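
The nested constructor inputs above can be pictured as one nested mapping. The sketch below is only an illustration: the key names come from the input list, but every value is a placeholder, the plain-dict representation is an assumption (this page does not say how inputs are actually passed), and `check_documented_constraints` is a hypothetical helper that checks only the two minimum-length constraints stated above.

```python
from typing import Any


def check_documented_constraints(params: dict[str, Any]) -> list[str]:
    """Hypothetical helper: report violations of the two length constraints listed above."""
    problems = []
    source = params["source_data_definition"]
    if len(source["dimension_columns"]) < 1:
        problems.append("dimension_columns must have a minimum length of 1")
    if len(params["anomaly_detector_configurations"]) < 1:
        problems.append("anomaly_detector_configurations must have a minimum length of 1")
    return problems


# Placeholder values throughout; only the key names come from the input list above.
constructor_params = {
    "source_data_definition": {
        "data_connection": {
            "tabular_connection": {  # SQL Server Connection variant
                "database_resource": "example_resource",
                "database_name": "example_db",
                "table_name": "daily_sales",
            }
        },
        "dimension_columns": ["product_id"],
        "date_column": "sale_date",    # must be in a DateTime readable format
        "value_column": "units_sold",  # must be numeric
    },
    # Whether an empty list is accepted here is not stated on this page.
    "feature_data_definitions": [],
    "group_name": "sales_anomalies",
    # The fields of AnomalyArenaConfigurationAdaptorParameters are not documented
    # here, so each entry is an opaque placeholder.
    "anomaly_detector_configurations": [{"placeholder": "detector configuration"}],
}

violations = check_documented_constraints(constructor_params)
print(violations)
```

Other constraints may apply at runtime, as each input notes, so passing this check does not guarantee that the routine accepts the inputs.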

2. Fit (Method)
  • Method: fit
    • Type: Method

    • Memory Capacity: 2.0 GB

    • Allow In-Memory Execution: No

    • Read Only: No

    • Method Limits: On a daily dataset with 15K targets, 2.25M rows, and 6 columns, this method completed in about one and a half hours with 100 GB of memory allocated. Each target has daily data from 2000-01-01 to 2000-05-29. For this run, the time range parameter was set with a start date of 2000-01-01 and an end date of 2000-03-26. This is approximately the largest dataset this method can handle. The runtime and the scale this method can handle depend on how many detectors are set up in the constructor. The dataset size and runtime given here represent the worst-case scenario, in which every detector is configured.

    • Outputs Dynamic Artifacts: No

    • Short Description:

      • Fit the anomaly detectors to the data.
    • Detailed Description:

      • This method loops over all the anomaly detector instances calling their fit methods. The same input parameters are used for all individual detector fits. Strategies for fitting the detectors will vary depending on the type of anomaly detector used.
    • Inputs:

      • Optional Input
        • Date Range: The date range to fit anomalies on.
          • Name: time_range
          • Tooltip:
            • Validation Constraints:
              • This input may be subject to other validation constraints at runtime.
          • Type: Must be an instance of Start and End Date
          • Nested Model: Start and End Date
            • Required Input
              • Start Date: The inclusive start of the date range (MM/DD/YYYY).
                • Name: start_date
                • Tooltip:
                  • Validation Constraints:
                    • This input may be subject to other validation constraints at runtime.
                • Type: datetime
              • End Date: The inclusive end of the date range (MM/DD/YYYY).
                • Name: end_date
                • Tooltip:
                  • Detail:
                    • Note, the Seasonal ARIMA Anomaly Detector Routine treats the end date as exclusive.
                  • Validation Constraints:
                    • This input may be subject to other validation constraints at runtime.
                • Type: datetime
    • Artifacts: No artifacts are returned by this method
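
As a quick sanity check on the fit benchmark, the row count and the size of the fit window can be derived from the stated dates. This sketch assumes exactly one row per target per day, which the page does not state explicitly; `inclusive_days` is a hypothetical helper.

```python
from datetime import date

TARGETS = 15_000  # "15K targets" from the method limits


def inclusive_days(start: date, end: date) -> int:
    """Number of calendar days in a range whose start and end are both inclusive."""
    return (end - start).days + 1


# Full extent of each target's data, and the window used for the fit run.
full_span = inclusive_days(date(2000, 1, 1), date(2000, 5, 29))
fit_span = inclusive_days(date(2000, 1, 1), date(2000, 3, 26))

total_rows = TARGETS * full_span  # rows in the whole dataset
fit_rows = TARGETS * fit_span     # rows that fall inside the fit window

# A detector that treats the end date as exclusive sees one day fewer.
fit_span_exclusive_end = fit_span - 1

print(full_span, total_rows)   # 150 2250000
print(fit_span, fit_rows)      # 86 1290000
print(fit_span_exclusive_end)  # 85
```

Under that assumption the total matches the 2.25M rows quoted in the method limits. The last value shows the effect of the exclusive end date noted for the Seasonal ARIMA Anomaly Detector Routine.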

3. Predict (Method)
  • Method: predict
    • Type: Method

    • Memory Capacity: 2.0 GB

    • Allow In-Memory Execution: No

    • Read Only: No

    • Method Limits: On a daily dataset with 15K targets, 2.25M rows, and 6 columns, this method completed in about one hour and twenty minutes with 100 GB of memory allocated. Each target has daily data from 2000-01-01 to 2000-05-29. For this run, the time range parameter was set with a start date of 2000-03-27 and an end date of 2000-05-29. This is approximately the largest dataset this method can handle. The runtime and the scale this method can handle depend on how many detectors are set up in the constructor. The dataset size and runtime given here represent the worst-case scenario, in which every detector is configured.

    • Outputs Dynamic Artifacts: No

    • Short Description:

      • Predict anomalies using the fitted anomaly detectors.
    • Detailed Description:

      • This method loops over all the anomaly detector instances and predicts anomalies using the same input parameters for every detector. Strategies for predicting anomalies vary depending on the type of anomaly detector used. The artifact dataframes from all detectors are concatenated into a single artifact containing the anomaly snapshots, anomaly dates, and anomaly instances dataframes. A summary PDF report is also generated. It lists which detectors were included in the run and how many anomalies each one detected, the most common anomalous datapoints and types of anomalies across all detectors, and other relevant information.
    • Inputs:

      • Required Input
        • State Info Definition: The snapshot name and description.
          • Name: state_info
          • Tooltip:
            • Validation Constraints:
              • This input may be subject to other validation constraints at runtime.
          • Type: Must be an instance of State Info
          • Nested Model: State Info
            • Required Input
              • Name: The name of the anomaly detector instance.
                • Name: snapshot_name
                • Tooltip:
                  • Validation Constraints:
                    • This input may be subject to other validation constraints at runtime.
                • Type: str
              • Snapshot Description: The description of your anomaly detector instance.
                • Name: snapshot_description
                • Tooltip:
                  • Validation Constraints:
                    • This input may be subject to other validation constraints at runtime.
                • Type: str
      • Optional Input
        • Date Range: The date range to predict anomalies on.
          • Name: time_range
          • Tooltip:
            • Detail:
              • If None, the entire dataset will be used.
            • Validation Constraints:
              • This input may be subject to other validation constraints at runtime.
          • Type: Must be an instance of Start and End Date
          • Nested Model: Start and End Date
            • Required Input
              • Start Date: The inclusive start of the date range (MM/DD/YYYY).
                • Name: start_date
                • Tooltip:
                  • Validation Constraints:
                    • This input may be subject to other validation constraints at runtime.
                • Type: datetime
              • End Date: The inclusive end of the date range (MM/DD/YYYY).
                • Name: end_date
                • Tooltip:
                  • Detail:
                    • Note, the Seasonal ARIMA Anomaly Detector Routine treats the end date as exclusive.
                  • Validation Constraints:
                    • This input may be subject to other validation constraints at runtime.
                • Type: datetime
    • Artifacts:

      • Anomaly Detection Artifacts: The combined artifact dataframes from each anomaly detection routine

        • Qualified Key Annotation: anomaly_artifacts
        • Aggregate Artifact: True
        • In-Memory Json Accessible: False
        • File Annotations:
          • artifacts_/@anomaly_artifacts/data_
            • Folder containing inner artifacts
        • Nested Artifacts:
      • AnomalySnapshot: Parquet file containing data about your anomaly detection run.

        • Qualified Key Annotation: anomaly_artifacts.anomaly_snapshot
        • Aggregate Artifact: False
        • In-Memory Json Accessible: False
        • File Annotations:
          • artifacts_/@anomaly_artifacts/data_/anomaly_snapshot/data_/data_<int>.parquet
            • A partitioned set of parquet files where each file will have no more than 1000000 rows.
      • Specific Anomaly Dates: Parquet file containing data about the specific dates an anomaly was detected.

        • Qualified Key Annotation: anomaly_artifacts.anomaly_dates
        • Aggregate Artifact: False
        • In-Memory Json Accessible: False
        • File Annotations:
          • artifacts_/@anomaly_artifacts/data_/anomaly_dates/data_/data_<int>.parquet
            • A partitioned set of parquet files where each file will have no more than 1000000 rows.
      • Specific Anomaly Instances: Parquet file containing data about the specific anomaly instances that were detected.

        • Qualified Key Annotation: anomaly_artifacts.anomaly_instance
        • Aggregate Artifact: False
        • In-Memory Json Accessible: False
        • File Annotations:
          • artifacts_/@anomaly_artifacts/data_/anomaly_instance/data_/data_<int>.parquet
            • A partitioned set of parquet files where each file will have no more than 1000000 rows.
      • Anomaly Summarization Report: A PDF summarization report of the anomaly detection routines and the HTML contents used to generate it

        • Qualified Key Annotation: anomaly_summarization
        • Aggregate Artifact: False
        • In-Memory Json Accessible: False
        • File Annotations:
          • artifacts_/@anomaly_summarization/data_/document.pdf
            • A pdf variant of the html file. Please note the interactivity that may be found in the html is lost within the pdf variant.
          • artifacts_/@anomaly_summarization/data_/html_content.html
            • The html content.
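
The loop-and-concatenate pattern described for fit and predict can be sketched with toy stand-ins. Nothing below is the routine's actual implementation: `ZScoreDetector`, `JumpDetector`, and `ToyArena` are hypothetical classes, plain lists of dicts stand in for the artifact dataframes, and the two counters only mimic the kind of tallies the summary report is described as containing.

```python
from collections import Counter
from statistics import mean, pstdev


class ZScoreDetector:
    """Toy detector (hypothetical): flags values far from the fitted mean."""
    name = "zscore"

    def __init__(self, threshold=3.0):
        self.threshold = threshold

    def fit(self, values):
        self.center = mean(values)
        self.spread = pstdev(values) or 1.0  # avoid dividing by zero

    def predict(self, values):
        return [
            {"detector": self.name, "index": i, "value": v}
            for i, v in enumerate(values)
            if abs(v - self.center) / self.spread > self.threshold
        ]


class JumpDetector:
    """Toy detector (hypothetical): flags large steps between neighbouring values."""
    name = "jump"

    def fit(self, values):
        steps = [abs(b - a) for a, b in zip(values, values[1:])]
        self.limit = 3 * max(steps)

    def predict(self, values):
        return [
            {"detector": self.name, "index": i, "value": values[i]}
            for i in range(1, len(values))
            if abs(values[i] - values[i - 1]) > self.limit
        ]


class ToyArena:
    """Runs every configured detector with the same inputs."""

    def __init__(self, detectors):
        self.detectors = list(detectors)

    def fit(self, values):
        for detector in self.detectors:
            detector.fit(values)

    def predict(self, values):
        combined = []
        for detector in self.detectors:
            combined.extend(detector.predict(values))  # concatenate per-detector rows
        return combined


arena = ToyArena([ZScoreDetector(), JumpDetector()])
arena.fit([10, 11, 10, 12, 11, 10, 11, 12])
combined = arena.predict([11, 10, 30, 11, 12])

per_detector = Counter(row["detector"] for row in combined)
most_flagged = Counter(row["index"] for row in combined).most_common(1)
print(per_detector)
print(most_flagged)
```

Keeping a `detector` column on every row is one way to let a single combined table still be broken down by detector afterwards.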

Interface Definitions

No interface definitions found for this routine

Developer Docs

Routine Typename: AnomalyArena

Method Name | Artifact Keys
__init__ | N/A
fit | N/A
predict | anomaly_artifacts, anomaly_summarization
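
A consumer of the predict artifacts needs to collect the partitioned parquet files for one nested artifact in order. The sketch below follows the file annotations listed for predict; where the `artifacts_` folder lives (`root`) is an assumption, and `partition_dir` and `list_partitions` are hypothetical helpers. The demo only creates empty placeholder files in a temporary directory to show the ordering.

```python
import re
import tempfile
from pathlib import Path

PARTITION_RE = re.compile(r"data_(\d+)\.parquet$")


def partition_dir(root: Path, nested_key: str) -> Path:
    """Folder holding the partitions of one nested artifact, e.g. 'anomaly_dates'."""
    return root / "artifacts_" / "@anomaly_artifacts" / "data_" / nested_key / "data_"


def list_partitions(root: Path, nested_key: str) -> list[Path]:
    """Return data_<int>.parquet files sorted by their integer index."""
    indexed = []
    for path in partition_dir(root, nested_key).glob("data_*.parquet"):
        match = PARTITION_RE.search(path.name)
        if match:
            indexed.append((int(match.group(1)), path))
    indexed.sort(key=lambda pair: pair[0])  # numeric order, so data_2 precedes data_10
    return [path for _, path in indexed]


# Demo with empty placeholder files; a real run would contain actual parquet data.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    folder = partition_dir(root, "anomaly_dates")
    folder.mkdir(parents=True)
    for i in (10, 0, 2):
        (folder / f"data_{i}.parquet").touch()
    names = [p.name for p in list_partitions(root, "anomaly_dates")]

print(names)
```

Each returned path could then be loaded with a parquet reader such as `pandas.read_parquet`. The numeric sort matters because a plain string sort would place `data_10.parquet` before `data_2.parquet`.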
