Skip to main content

ZScoreAnomalyDetector

Versions

v1.0.0

Basic Information

Class Name: ZScoreAnomalyDetector

Title: Z Score Anomaly Detector

Version: 1.0.0

Author: Simon Vedder

Organization: OneStream

Creation Date: 2024-02-27

Default Routine Memory Capacity: 2.0 GB

Tags

Anomaly, Time Series, Supervised, ML

Description

Short Description

Constructor for the Z-Score Anomaly Detection Routine

Long Description

This routine leverages a rolling window to detect local point anomalies. The mean and standard deviation of the target value is computed over the size of the rolling window, the current value is then compared to the local mean of that target. If the current value is a constant threshold number of standard deviations above or below the local mean, the point will be flagged as an anomaly. Increasing the standard deviations parameter will decrease the sensitivity of the model, meaning less points will be flagged as anomalous.

Use Cases

1. Golden Ticket

Chocolate company A implemented a unique promotion where 1 in 10,000,000 chocolate bars sold contained a golden ticket, granting the finder a visit to their magical facilities. This rare event, designed as a one-time promotion, caused a significant spike in sales, marked as a high point anomaly by our anomaly detection system. Given the uniqueness of this event, it is advisable to remove this outlier from the dataset of company A to maintain the integrity of ongoing sales analysis, as replicating this sales promotion is not planned. Conversely, company B, inspired by company A, decides to run a similar golden ticket promotion, but they will run it at an annual frequency. In this scenario, the promotion becomes a predictable event, influencing regular sales patterns. Thus, for company B, it would be prudent to incorporate these annual promotion dates into the forecasting models to enhance the accuracy of sales predictions and better understand the promotion's impact on regular sales trends. This approach in anomaly handling not only assists in cleaning historical sales data, but also enriches the forecasting process by accounting for predictable high-impact events, thereby allowing companies to strategically plan based on past promotions and their outcomes.

2. Natural Disaster

During natural disasters, sales data can exhibit significant fluctuations, with essential products experiencing a surge in sales, while luxury items may see a drastic drop. Our anomaly detection system is equipped to identify these changes as either high point or low point anomalies. Depending on the nature and predictability of the disaster, different data management strategies may be employed. For unpredictable, one-off disasters, where forecasting future occurrences is not feasible, it is beneficial to cleanse the dataset of these anomalies. This involves removing or adjusting the data points related to the disaster period to maintain the integrity of general sales trends and ensure the accuracy of routine forecasts. Conversely, for natural disasters that can be anticipated, such as seasonal hurricanes or floods, incorporating these events into our forecasting models is crucial. By tagging these sales fluctuations as event-related anomalies in our system, we can enhance our model's accuracy during similar future events. This practice not only improves the model's robustness by accounting for the cyclical nature of such disasters, but also enables more precise adjustments in inventory and pricing strategies in anticipation of changing consumer needs during these periods. Both approaches, whether cleansing the data or enhancing the model with event-specific anomalies, aim to optimize forecasting accuracy under varying circumstances, providing tailored insights that help in strategic planning and operational adjustments.

Routine Methods

1. Init (Constructor)
  • Method: __init__
    • Type: Constructor

    • Memory Capacity: 2.0 GB

    • Allow In-Memory Execution: No

    • Read Only: No

    • Method Limits: There are no limits to the constructor method. This method simply saves the input parameters to be utilized in subsequent runs of the fit and predict methods.

    • Outputs Dynamic Artifacts: No

    • Short Description:

      • Constructor for the Z score Anomaly Detection Routine.
    • Detailed Description:

      • The Z score constructor is used to set the group name, state info, and hyperparameters for this anomaly detection instance.
    • Inputs:

      • Required Input
        • Hyperparameters: The hyper parameters of the anomaly detector.
          • Name: hyper_parameters
          • Tooltip:
            • Validation Constraints:
              • This input may be subject to other validation constraints at runtime.
          • Type: Must be an instance of Z Score Anomaly Detection Hyper Parameters
          • Nested Model: Z Score Anomaly Detection Hyper Parameters
            • Required Input
              • Rolling Window Size: The size of the rolling window to use in days, weeks, or months depending on data frequency.
                • Name: rolling_window_size
                • Tooltip:
                  • Validation Constraints:
                    • The input must be greater than 0.
                    • This input may be subject to other validation constraints at runtime.
                • Type: int
              • Standard Deviations: The number of standard deviations to use as a threshold for detecting anomalies.
                • Name: standard_deviations
                • Tooltip:
                  • Validation Constraints:
                    • The input must be greater than 0.
                    • This input may be subject to other validation constraints at runtime.
                • Type: float
              • Side: Decides whether to detect anomalies on the positive, negative, or both sides of the data.
                • Name: side
                • Tooltip:
                  • Validation Constraints:
                    • This input may be subject to other validation constraints at runtime.
                • Type: SideType_
              • Seasonality: Seasonality in the dataset. Can be 'auto' or an integer.
                • Name: seasonal_period
                • Tooltip:
                  • Detail:
                    • If set to auto, the model will determine the optimal seasonality for each target.
                  • Validation Constraints:
                    • This input may be subject to other validation constraints at runtime.
                • Type: int | Literal | NoneType
        • Group Name: Group name for your anomaly detector, used as an identifier in the output artifact.
          • Name: group_name
          • Tooltip:
            • Validation Constraints:
              • This input may be subject to other validation constraints at runtime.
          • Type: str
    • Artifacts: No artifacts are returned by this method

2. Fit (Method)
  • Method: fit
    • Type: Method

    • Memory Capacity: 2.0 GB

    • Allow In-Memory Execution: No

    • Read Only: No

    • Method Limits: This method loads and saves the input data for use in subsequent predict runs. The method has been tested on various time series datasets at 100 GB memory capacity. With 25K targets and 7.4M rows, this method completed in 1 minute. With 27.5K targets and 20.1M rows, this method completed in 4 minutes. With 40K targets and 29.2M rows, this method completed in 5 minutes. With 50K targets and 7.5M rows, this method completed in 3 minutes.

    • Outputs Dynamic Artifacts: No

    • Short Description:

      • Fit the Z-score anomaly detection model.
    • Detailed Description:

      • The fit method will take the data input and save it. If prior to the predict data, this fit data will help calculate the rolling mean used for the predict method.
    • Inputs:

      • Required Input
        • Source Data Definition: The source data definition to use.
          • Name: source_data_definition
          • Tooltip:
            • Validation Constraints:
              • This input may be subject to other validation constraints at runtime.
          • Type: Must be an instance of Time Series Source Data
          • Nested Model: Time Series Source Data
            • Required Input
              • Connection: The connection to the source data.
                • Name: data_connection
                • Tooltip:
                  • Validation Constraints:
                    • This input may be subject to other validation constraints at runtime.
                • Type: Must be an instance of Tabular Connection
                • Nested Model: Tabular Connection
                  • Required Input
                    • Connection: The connection type to use to access the source data.
                      • Name: tabular_connection
                      • Tooltip:
                        • Validation Constraints:
                          • This input may be subject to other validation constraints at runtime.
                      • Type: Must be one of the following
                        • SQL Server Connection
                          • Required Input
                            • Database Resource: The name of the database resource to connect to.
                              • Name: database_resource
                              • Tooltip:
                                • Validation Constraints:
                                  • This input may be subject to other validation constraints at runtime.
                              • Type: str
                            • Database Name: The name of the database to connect to.
                              • Name: database_name
                              • Tooltip:
                                • Detail:
                                  • Note: If you don’t see the database name that you are looking for in this list, it is recommended that you first move the data to be used within a database that is available within this list.
                                • Validation Constraints:
                                  • This input may be subject to other validation constraints at runtime.
                              • Type: str
                            • Table Name: The name of the table to use.
                              • Name: table_name
                              • Tooltip:
                                • Validation Constraints:
                                  • This input may be subject to other validation constraints at runtime.
                              • Type: str
                        • MetaFileSystem Connection
                          • Required Input
                            • Connection Key: The MetaFileSystem connection key.
                              • Name: connection_key
                              • Tooltip:
                                • Validation Constraints:
                                  • This input may be subject to other validation constraints at runtime.
                              • Type: MetaFileSystemConnectionKey
                            • File Path: The full file path to the file to ingest.
                              • Name: file_path
                              • Tooltip:
                                • Validation Constraints:
                                  • This input may be subject to other validation constraints at runtime.
                              • Type: str
                        • Partitioned MetaFileSystem Connection
                          • Required Input
                            • Connection Key: The MetaFileSystem connection key.
                              • Name: connection_key
                              • Tooltip:
                                • Validation Constraints:
                                  • This input may be subject to other validation constraints at runtime.
                              • Type: MetaFileSystemConnectionKey
                            • File Type: The type of files to read from the directory.
                              • Name: file_type
                              • Tooltip:
                                • Validation Constraints:
                                  • This input may be subject to other validation constraints at runtime.
                              • Type: FileExtensions_
                            • Directory Path: The full directory path containing partitioned tabular files.
                              • Name: directory_path
                              • Tooltip:
                                • Validation Constraints:
                                  • This input may be subject to other validation constraints at runtime.
                              • Type: str
              • Dimension Columns: The columns to use as dimensions.
                • Name: dimension_columns
                • Tooltip:
                  • Validation Constraints:
                    • The input must have a minimum length of 1.
                    • This input may be subject to other validation constraints at runtime.
                • Type: list[str]
              • Date Column: The column to use as the date.
                • Name: date_column
                • Tooltip:
                  • Detail:
                    • The date column must in a DateTime readable format.
                  • Validation Constraints:
                    • This input may be subject to other validation constraints at runtime.
                • Type: str
              • Value Column: The column to use as the value.
                • Name: value_column
                • Tooltip:
                  • Detail:
                    • The value column must be a numeric (int, float, double, decimal, etc.) column.
                  • Validation Constraints:
                    • This input may be subject to other validation constraints at runtime.
                • Type: str
        • Feature Data Definition: The feature data definition to use.
          • Name: feature_data_definitions
          • Tooltip:
            • Validation Constraints:
              • This input may be subject to other validation constraints at runtime.
          • Type: list[TimeSeriesTableDefinition]
      • Optional Input
        • Date Range: The date range to fit anomalies on.
          • Name: time_range
          • Tooltip:
            • Detail:
              • If None, entire dataset will be used.
            • Validation Constraints:
              • This input may be subject to other validation constraints at runtime.
          • Type: Must be an instance of Start and End Date
          • Nested Model: Start and End Date
            • Required Input
              • Start Date: The inclusive start of the date range (MM/DD/YYYY).
                • Name: start_date
                • Tooltip:
                  • Validation Constraints:
                    • This input may be subject to other validation constraints at runtime.
                • Type: datetime
              • End Date: The inclusive end of the date range (MM/DD/YYYY).
                • Name: end_date
                • Tooltip:
                  • Detail:
                    • Note, the Seasonal ARIMA Anomaly Detector Routine treats the end date as exclusive.
                  • Validation Constraints:
                    • This input may be subject to other validation constraints at runtime.
                • Type: datetime
    • Artifacts: No artifacts are returned by this method

3. Predict (Method)
  • Method: predict
    • Type: Method

    • Memory Capacity: 2.0 GB

    • Allow In-Memory Execution: No

    • Read Only: No

    • Method Limits: This method uses a rolling window to compute local means and standard deviations for each target, then flags values that exceed the specified number of standard deviations from the local mean. The method has been tested on various time series datasets at 100 GB memory capacity. With 25K targets and 7.4M rows, this method completed in 3 minutes. With 50K targets and 7.5M rows, this method completed in 4 minutes.

    • Outputs Dynamic Artifacts: No

    • Short Description:

      • Detect anomalies using the fit model.
    • Detailed Description:

      • This routine leverages a rolling window to detect local anomalies. The mean and standard deviation of the target value is computed over the size of the rolling window, the current value is then compared to the local mean of that target. If the current value is X standard deviations above or below the local mean, the point will be flagged as an anomaly. Take a look at the constructor input parameters to gain a better understanding of how to fine tune the anomaly detector to extract the desired information.
    • Inputs:

      • Required Input
        • Source Data Definition: The source data definition to use.
          • Name: source_data_definition
          • Tooltip:
            • Validation Constraints:
              • This input may be subject to other validation constraints at runtime.
          • Type: Must be an instance of Time Series Source Data
          • Nested Model: Time Series Source Data
            • Required Input
              • Connection: The connection to the source data.
                • Name: data_connection
                • Tooltip:
                  • Validation Constraints:
                    • This input may be subject to other validation constraints at runtime.
                • Type: Must be an instance of Tabular Connection
                • Nested Model: Tabular Connection
                  • Required Input
                    • Connection: The connection type to use to access the source data.
                      • Name: tabular_connection
                      • Tooltip:
                        • Validation Constraints:
                          • This input may be subject to other validation constraints at runtime.
                      • Type: Must be one of the following
                        • SQL Server Connection
                          • Required Input
                            • Database Resource: The name of the database resource to connect to.
                              • Name: database_resource
                              • Tooltip:
                                • Validation Constraints:
                                  • This input may be subject to other validation constraints at runtime.
                              • Type: str
                            • Database Name: The name of the database to connect to.
                              • Name: database_name
                              • Tooltip:
                                • Detail:
                                  • Note: If you don’t see the database name that you are looking for in this list, it is recommended that you first move the data to be used within a database that is available within this list.
                                • Validation Constraints:
                                  • This input may be subject to other validation constraints at runtime.
                              • Type: str
                            • Table Name: The name of the table to use.
                              • Name: table_name
                              • Tooltip:
                                • Validation Constraints:
                                  • This input may be subject to other validation constraints at runtime.
                              • Type: str
                        • MetaFileSystem Connection
                          • Required Input
                            • Connection Key: The MetaFileSystem connection key.
                              • Name: connection_key
                              • Tooltip:
                                • Validation Constraints:
                                  • This input may be subject to other validation constraints at runtime.
                              • Type: MetaFileSystemConnectionKey
                            • File Path: The full file path to the file to ingest.
                              • Name: file_path
                              • Tooltip:
                                • Validation Constraints:
                                  • This input may be subject to other validation constraints at runtime.
                              • Type: str
                        • Partitioned MetaFileSystem Connection
                          • Required Input
                            • Connection Key: The MetaFileSystem connection key.
                              • Name: connection_key
                              • Tooltip:
                                • Validation Constraints:
                                  • This input may be subject to other validation constraints at runtime.
                              • Type: MetaFileSystemConnectionKey
                            • File Type: The type of files to read from the directory.
                              • Name: file_type
                              • Tooltip:
                                • Validation Constraints:
                                  • This input may be subject to other validation constraints at runtime.
                              • Type: FileExtensions_
                            • Directory Path: The full directory path containing partitioned tabular files.
                              • Name: directory_path
                              • Tooltip:
                                • Validation Constraints:
                                  • This input may be subject to other validation constraints at runtime.
                              • Type: str
              • Dimension Columns: The columns to use as dimensions.
                • Name: dimension_columns
                • Tooltip:
                  • Validation Constraints:
                    • The input must have a minimum length of 1.
                    • This input may be subject to other validation constraints at runtime.
                • Type: list[str]
              • Date Column: The column to use as the date.
                • Name: date_column
                • Tooltip:
                  • Detail:
                    • The date column must in a DateTime readable format.
                  • Validation Constraints:
                    • This input may be subject to other validation constraints at runtime.
                • Type: str
              • Value Column: The column to use as the value.
                • Name: value_column
                • Tooltip:
                  • Detail:
                    • The value column must be a numeric (int, float, double, decimal, etc.) column.
                  • Validation Constraints:
                    • This input may be subject to other validation constraints at runtime.
                • Type: str
        • Feature Data Definition: The feature data definition to use.
          • Name: feature_data_definitions
          • Tooltip:
            • Validation Constraints:
              • This input may be subject to other validation constraints at runtime.
          • Type: list[TimeSeriesTableDefinition]
        • State Info Definition: The snapshot name and description. These will be used as identifiers in the snapshot artifact.
          • Name: state_info
          • Tooltip:
            • Validation Constraints:
              • This input may be subject to other validation constraints at runtime.
          • Type: Must be an instance of State Info
          • Nested Model: State Info
            • Required Input
              • Name: The name of the anomaly detector instance.
                • Name: snapshot_name
                • Tooltip:
                  • Validation Constraints:
                    • This input may be subject to other validation constraints at runtime.
                • Type: str
              • Snapshot Description: The description of your anomaly detector instance.
                • Name: snapshot_description
                • Tooltip:
                  • Validation Constraints:
                    • This input may be subject to other validation constraints at runtime.
                • Type: str
      • Optional Input
        • Date Range: The date range to predict anomalies on.
          • Name: time_range
          • Tooltip:
            • Detail:
              • If None, entire dataset will be used.
            • Validation Constraints:
              • This input may be subject to other validation constraints at runtime.
          • Type: Must be an instance of Start and End Date
          • Nested Model: Start and End Date
            • Required Input
              • Start Date: The inclusive start of the date range (MM/DD/YYYY).
                • Name: start_date
                • Tooltip:
                  • Validation Constraints:
                    • This input may be subject to other validation constraints at runtime.
                • Type: datetime
              • End Date: The inclusive end of the date range (MM/DD/YYYY).
                • Name: end_date
                • Tooltip:
                  • Detail:
                    • Note, the Seasonal ARIMA Anomaly Detector Routine treats the end date as exclusive.
                  • Validation Constraints:
                    • This input may be subject to other validation constraints at runtime.
                • Type: datetime
    • Artifacts:

      • AnomalySnapshot: Parquet file containing data about your anomaly detection run.

        • Qualified Key Annotation: anomaly_snapshot
        • Aggregate Artifact: False
        • In-Memory Json Accessible: False
        • File Annotations:
          • artifacts_/@anomaly_snapshot/data_/data_<int>.parquet
            • A partitioned set of parquet files where each file will have no more than 1000000 rows.
      • Specific Anomaly Dates: Parquet file containing data about the specific dates an anomaly was detected.

        • Qualified Key Annotation: anomaly_dates
        • Aggregate Artifact: False
        • In-Memory Json Accessible: False
        • File Annotations:
          • artifacts_/@anomaly_dates/data_/data_<int>.parquet
            • A partitioned set of parquet files where each file will have no more than 1000000 rows.
      • Specific Anomaly Instances: Parquet file containing data about the specific anomaly instances that were detected.

        • Qualified Key Annotation: anomaly_instance
        • Aggregate Artifact: False
        • In-Memory Json Accessible: False
        • File Annotations:
          • artifacts_/@anomaly_instance/data_/data_<int>.parquet
            • A partitioned set of parquet files where each file will have no more than 1000000 rows.

Interface Definitions

1. Anomaly Detection Interface

An interface class requiring fit and predict methods to be implemented.

This BaseRoutineInterface class enforces a common interface for all anomaly detection routines. The interface requires each anomaly detection routine to implement a fit method and a predict method with the same input parameters. Each concrete class will have constructor methods where hyperparameters specific to the anomaly detection algorithm may be set, however, this interface does not enforce any specific constructor method.

Interface Methods:

1. Fit

Method Name: fit

Short Description: Abstract Fit Method

Detailed Description: This specifies the necessary input and output parameters for the fit method on all anomaly detection routines. The input parameters contain a source data definition and time range to fit an anomaly detector to.

Inputs:

PropertyTypeRequiredDescription
source_data_definition#/$defs/TimeSeriesTableDefinitionYesThe source data definition to use.
feature_data_definitionsarrayYesThe feature data definition to use.
time_range`#/$defs/StartDateEndDateDefinitionnull`No

Input Schema (JSON):

{
"$defs": {
"FileExtensions_": {
"description": "File Extensions.",
"enum": [
".csv",
".tsv",
".psv",
".parquet",
".xlsx"
],
"title": "FileExtensions_",
"type": "string"
},
"FileTabularConnection": {
"properties": {
"connection_key": {
"$ref": "#/$defs/MetaFileSystemConnectionKey",
"description": "The MetaFileSystem connection key.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "connection_key",
"title": "Connection Key",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
},
"file_path": {
"description": "The full file path to the file to ingest.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": "xperiflow.source.app.routines.pbm.store.conn.filetable:FileTabularConnection.get_file_path_bound_options",
"options_callback_kwargs": null,
"state_name": "file_path",
"title": "File Path",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
}
},
"required": [
"connection_key",
"file_path"
],
"title": "FileTabularConnection",
"type": "object"
},
"MetaFileSystemConnectionKey": {
"enum": [
"sql-server-routine",
"sql-server-shared"
],
"title": "MetaFileSystemConnectionKey",
"type": "string"
},
"PartitionedFileTabularConnection": {
"properties": {
"connection_key": {
"$ref": "#/$defs/MetaFileSystemConnectionKey",
"description": "The MetaFileSystem connection key.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "connection_key",
"title": "Connection Key",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
},
"file_type": {
"$ref": "#/$defs/FileExtensions_",
"description": "The type of files to read from the directory.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "file_info",
"title": "File Type",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
},
"directory_path": {
"description": "The full directory path containing partitioned tabular files.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": "xperiflow.source.app.routines.pbm.store.conn.partitionedfiletable:PartitionedFileTabularConnection.get_directory_path_bound_options",
"options_callback_kwargs": null,
"state_name": "file_info",
"title": "Directory Path",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
}
},
"required": [
"connection_key",
"file_type",
"directory_path"
],
"title": "PartitionedFileTabularConnection",
"type": "object"
},
"SqlTabularConnection": {
"properties": {
"database_resource": {
"description": "The name of the database resource to connect to.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": "xperiflow.source.app.routines.pbm.store.conn.sqltable:SqlTabularConnection.get_database_resources",
"options_callback_kwargs": null,
"state_name": "database_resource",
"title": "Database Resource",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
},
"database_name": {
"description": "The name of the database to connect to.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": "xperiflow.source.app.routines.pbm.store.conn.sqltable:SqlTabularConnection.get_database_schemas",
"options_callback_kwargs": null,
"state_name": "database_name",
"title": "Database Name",
"tooltip": "Detail:\nNote: If you don\u2019t see the database name that you are looking for in this list, it is recommended that you first move the data to be used within a database that is available within this list.\n\nValidation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
},
"table_name": {
"description": "The name of the table to use.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": "xperiflow.source.app.routines.pbm.store.conn.sqltable:SqlTabularConnection.get_tables",
"options_callback_kwargs": null,
"state_name": "table_name",
"title": "Table Name",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
}
},
"required": [
"database_resource",
"database_name",
"table_name"
],
"title": "SqlTabularConnection",
"type": "object"
},
"StartDateEndDateDefinition": {
"properties": {
"start_date": {
"description": "The inclusive start of the date range (MM/DD/YYYY).",
"field_type": "input",
"format": "date-time",
"input_component": {
"component_type": "dateselector",
"max_date": null,
"min_date": null
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "date_selection",
"title": "Start Date",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
},
"end_date": {
"description": "The inclusive end of the date range (MM/DD/YYYY).",
"field_type": "input",
"format": "date-time",
"input_component": {
"component_type": "dateselector",
"max_date": null,
"min_date": null
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "date_selection",
"title": "End Date",
"tooltip": "Detail:\nNote, the Seasonal ARIMA Anomaly Detector Routine treats the end date as exclusive.\n\nValidation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
}
},
"required": [
"start_date",
"end_date"
],
"title": "StartDateEndDateDefinition",
"type": "object"
},
"TabularConnection": {
"description": "A shared parameter base model dedication to tabular connections.",
"properties": {
"tabular_connection": {
"anyOf": [
{
"$ref": "#/$defs/SqlTabularConnection"
},
{
"$ref": "#/$defs/FileTabularConnection"
},
{
"$ref": "#/$defs/PartitionedFileTabularConnection"
}
],
"description": "The connection type to use to access the source data.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "connection",
"title": "Connection",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
}
},
"required": [
"tabular_connection"
],
"title": "TabularConnection",
"type": "object"
},
"TimeSeriesTableDefinition": {
"description": "A parameter base model dedicated to loading tabular time series data.",
"properties": {
"data_connection": {
"$ref": "#/$defs/TabularConnection",
"description": "The connection to the source data.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "source_connection",
"title": "Connection",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
},
"dimension_columns": {
"description": "The columns to use as dimensions.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"items": {
"type": "string"
},
"long_description": null,
"minItems": 1,
"options_callback": "xperiflow.source.app.routines.pbm.store.tsf.tstable:TimeSeriesTableDefinition.get_table_columns",
"options_callback_kwargs": null,
"state_name": "column_selection",
"title": "Dimension Columns",
"tooltip": "Validation Constraints:\nThe input must have a minimum length of 1.\n\nThis input may be subject to other validation constraints at runtime.",
"type": "array"
},
"date_column": {
"description": "The column to use as the date.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": "xperiflow.source.app.routines.pbm.store.tsf.tstable:TimeSeriesTableDefinition.get_table_columns",
"options_callback_kwargs": null,
"state_name": "column_selection",
"title": "Date Column",
"tooltip": "Detail:\nThe date column must in a DateTime readable format.\n\nValidation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
},
"value_column": {
"description": "The column to use as the value.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": "xperiflow.source.app.routines.pbm.store.tsf.tstable:TimeSeriesTableDefinition.get_table_columns",
"options_callback_kwargs": null,
"state_name": "column_selection",
"title": "Value Column",
"tooltip": "Detail:\nThe value column must be a numeric (int, float, double, decimal, etc.) column.\n\nValidation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
}
},
"required": [
"data_connection",
"dimension_columns",
"date_column",
"value_column"
],
"title": "TimeSeriesTableDefinition",
"type": "object"
}
},
"properties": {
"source_data_definition": {
"$ref": "#/$defs/TimeSeriesTableDefinition",
"description": "The source data definition to use.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "SourceDefinition",
"title": "Source Data Definition",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
},
"feature_data_definitions": {
"description": "The feature data definition to use.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"items": {
"$ref": "#/$defs/TimeSeriesTableDefinition"
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "FeatureDefinition",
"title": "Feature Data Definition",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "array"
},
"time_range": {
"anyOf": [
{
"$ref": "#/$defs/StartDateEndDateDefinition"
},
{
"type": "null"
}
],
"default": null,
"description": "The date range to fit anomalies on.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "DateRange",
"title": "Date Range",
"tooltip": "Detail:\nIf None, entire dataset will be used.\n\nValidation Constraints:\nThis input may be subject to other validation constraints at runtime."
}
},
"required": [
"source_data_definition",
"feature_data_definitions"
],
"title": "AnomalyDetectionFitParameters",
"type": "object"
}

Artifacts: No artifacts are returned by this method

2. Predict

Method Name: predict

Short Description: Abstract Predict Method

Detailed Description: This specifies the necessary input and output parameters for the predict method on all anomaly detection routines. The input parameters contain a source data definition and a time range to detect anomalies.

Inputs:

PropertyTypeRequiredDescription
source_data_definition#/$defs/TimeSeriesTableDefinitionYesThe source data definition to use.
feature_data_definitionsarrayYesThe feature data definition to use.
time_range`#/$defs/StartDateEndDateDefinitionnull`No
state_info#/$defs/StateInfoDefinitionYesThe snapshot name and description. These will be used as identifiers in the snapshot artifact.

Input Schema (JSON):

{
"$defs": {
"FileExtensions_": {
"description": "File Extensions.",
"enum": [
".csv",
".tsv",
".psv",
".parquet",
".xlsx"
],
"title": "FileExtensions_",
"type": "string"
},
"FileTabularConnection": {
"properties": {
"connection_key": {
"$ref": "#/$defs/MetaFileSystemConnectionKey",
"description": "The MetaFileSystem connection key.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "connection_key",
"title": "Connection Key",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
},
"file_path": {
"description": "The full file path to the file to ingest.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": "xperiflow.source.app.routines.pbm.store.conn.filetable:FileTabularConnection.get_file_path_bound_options",
"options_callback_kwargs": null,
"state_name": "file_path",
"title": "File Path",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
}
},
"required": [
"connection_key",
"file_path"
],
"title": "FileTabularConnection",
"type": "object"
},
"MetaFileSystemConnectionKey": {
"enum": [
"sql-server-routine",
"sql-server-shared"
],
"title": "MetaFileSystemConnectionKey",
"type": "string"
},
"PartitionedFileTabularConnection": {
"properties": {
"connection_key": {
"$ref": "#/$defs/MetaFileSystemConnectionKey",
"description": "The MetaFileSystem connection key.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "connection_key",
"title": "Connection Key",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
},
"file_type": {
"$ref": "#/$defs/FileExtensions_",
"description": "The type of files to read from the directory.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "file_info",
"title": "File Type",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
},
"directory_path": {
"description": "The full directory path containing partitioned tabular files.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": "xperiflow.source.app.routines.pbm.store.conn.partitionedfiletable:PartitionedFileTabularConnection.get_directory_path_bound_options",
"options_callback_kwargs": null,
"state_name": "file_info",
"title": "Directory Path",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
}
},
"required": [
"connection_key",
"file_type",
"directory_path"
],
"title": "PartitionedFileTabularConnection",
"type": "object"
},
"SqlTabularConnection": {
"properties": {
"database_resource": {
"description": "The name of the database resource to connect to.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": "xperiflow.source.app.routines.pbm.store.conn.sqltable:SqlTabularConnection.get_database_resources",
"options_callback_kwargs": null,
"state_name": "database_resource",
"title": "Database Resource",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
},
"database_name": {
"description": "The name of the database to connect to.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": "xperiflow.source.app.routines.pbm.store.conn.sqltable:SqlTabularConnection.get_database_schemas",
"options_callback_kwargs": null,
"state_name": "database_name",
"title": "Database Name",
"tooltip": "Detail:\nNote: If you don\u2019t see the database name that you are looking for in this list, it is recommended that you first move the data to be used within a database that is available within this list.\n\nValidation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
},
"table_name": {
"description": "The name of the table to use.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": "xperiflow.source.app.routines.pbm.store.conn.sqltable:SqlTabularConnection.get_tables",
"options_callback_kwargs": null,
"state_name": "table_name",
"title": "Table Name",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
}
},
"required": [
"database_resource",
"database_name",
"table_name"
],
"title": "SqlTabularConnection",
"type": "object"
},
"StartDateEndDateDefinition": {
"properties": {
"start_date": {
"description": "The inclusive start of the date range (MM/DD/YYYY).",
"field_type": "input",
"format": "date-time",
"input_component": {
"component_type": "dateselector",
"max_date": null,
"min_date": null
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "date_selection",
"title": "Start Date",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
},
"end_date": {
"description": "The inclusive end of the date range (MM/DD/YYYY).",
"field_type": "input",
"format": "date-time",
"input_component": {
"component_type": "dateselector",
"max_date": null,
"min_date": null
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "date_selection",
"title": "End Date",
"tooltip": "Detail:\nNote, the Seasonal ARIMA Anomaly Detector Routine treats the end date as exclusive.\n\nValidation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
}
},
"required": [
"start_date",
"end_date"
],
"title": "StartDateEndDateDefinition",
"type": "object"
},
"StateInfoDefinition": {
"properties": {
"snapshot_name": {
"description": "The name of the anomaly detector instance.",
"field_type": "input",
"input_component": {
"component_type": "textbox",
"height": null,
"multiline": false
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "snapshot_info",
"title": "Name",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
},
"snapshot_description": {
"description": "The description of your anomaly detector instance.",
"field_type": "input",
"input_component": {
"component_type": "textbox",
"height": null,
"multiline": false
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "snapshot_info",
"title": "Snapshot Description",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
}
},
"required": [
"snapshot_name",
"snapshot_description"
],
"title": "StateInfoDefinition",
"type": "object"
},
"TabularConnection": {
"description": "A shared parameter base model dedication to tabular connections.",
"properties": {
"tabular_connection": {
"anyOf": [
{
"$ref": "#/$defs/SqlTabularConnection"
},
{
"$ref": "#/$defs/FileTabularConnection"
},
{
"$ref": "#/$defs/PartitionedFileTabularConnection"
}
],
"description": "The connection type to use to access the source data.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "connection",
"title": "Connection",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
}
},
"required": [
"tabular_connection"
],
"title": "TabularConnection",
"type": "object"
},
"TimeSeriesTableDefinition": {
"description": "A parameter base model dedicated to loading tabular time series data.",
"properties": {
"data_connection": {
"$ref": "#/$defs/TabularConnection",
"description": "The connection to the source data.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "source_connection",
"title": "Connection",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
},
"dimension_columns": {
"description": "The columns to use as dimensions.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"items": {
"type": "string"
},
"long_description": null,
"minItems": 1,
"options_callback": "xperiflow.source.app.routines.pbm.store.tsf.tstable:TimeSeriesTableDefinition.get_table_columns",
"options_callback_kwargs": null,
"state_name": "column_selection",
"title": "Dimension Columns",
"tooltip": "Validation Constraints:\nThe input must have a minimum length of 1.\n\nThis input may be subject to other validation constraints at runtime.",
"type": "array"
},
"date_column": {
"description": "The column to use as the date.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": "xperiflow.source.app.routines.pbm.store.tsf.tstable:TimeSeriesTableDefinition.get_table_columns",
"options_callback_kwargs": null,
"state_name": "column_selection",
"title": "Date Column",
"tooltip": "Detail:\nThe date column must in a DateTime readable format.\n\nValidation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
},
"value_column": {
"description": "The column to use as the value.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": "xperiflow.source.app.routines.pbm.store.tsf.tstable:TimeSeriesTableDefinition.get_table_columns",
"options_callback_kwargs": null,
"state_name": "column_selection",
"title": "Value Column",
"tooltip": "Detail:\nThe value column must be a numeric (int, float, double, decimal, etc.) column.\n\nValidation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
}
},
"required": [
"data_connection",
"dimension_columns",
"date_column",
"value_column"
],
"title": "TimeSeriesTableDefinition",
"type": "object"
}
},
"description": "\"Note that only most recent fit will be utilized in predictions.\"",
"properties": {
"source_data_definition": {
"$ref": "#/$defs/TimeSeriesTableDefinition",
"description": "The source data definition to use.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "SourceDefinition",
"title": "Source Data Definition",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
},
"feature_data_definitions": {
"description": "The feature data definition to use.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"items": {
"$ref": "#/$defs/TimeSeriesTableDefinition"
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "FeatureDefinition",
"title": "Feature Data Definition",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "array"
},
"time_range": {
"anyOf": [
{
"$ref": "#/$defs/StartDateEndDateDefinition"
},
{
"type": "null"
}
],
"default": null,
"description": "The date range to predict anomalies on.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "DateRange",
"title": "Date Range",
"tooltip": "Detail:\nIf None, entire dataset will be used.\n\nValidation Constraints:\nThis input may be subject to other validation constraints at runtime."
},
"state_info": {
"$ref": "#/$defs/StateInfoDefinition",
"description": "The snapshot name and description. These will be used as identifiers in the snapshot artifact.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "StateInfoDefinition",
"title": "State Info Definition",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
}
},
"required": [
"source_data_definition",
"feature_data_definitions",
"state_info"
],
"title": "AnomalyDetectionPredictParameters",
"type": "object"
}

Artifacts:

PropertyTypeRequiredDescription
anomaly_snapshotunknownYesParquet file containing data about your anomaly detection run.
anomaly_datesunknownYesParquet file containing data about the specific dates an anomaly was detected.
anomaly_instanceunknownYesParquet file containing data about the specific anomaly instances that were detected.

Artifact Schema (JSON):

{
"additionalProperties": true,
"properties": {
"anomaly_snapshot": {
"description": "Parquet file containing data about your anomaly detection run.",
"io_factory_kwargs": {},
"preview_factory_kwargs": null,
"preview_factory_type": null,
"statistic_factory_kwargs": null,
"statistic_factory_type": null,
"title": "AnomalySnapshot"
},
"anomaly_dates": {
"description": "Parquet file containing data about the specific dates an anomaly was detected.",
"io_factory_kwargs": {},
"preview_factory_kwargs": null,
"preview_factory_type": null,
"statistic_factory_kwargs": null,
"statistic_factory_type": null,
"title": "Specific Anomaly Dates"
},
"anomaly_instance": {
"description": "Parquet file containing data about the specific anomaly instances that were detected.",
"io_factory_kwargs": {},
"preview_factory_kwargs": null,
"preview_factory_type": null,
"statistic_factory_kwargs": null,
"statistic_factory_type": null,
"title": "Specific Anomaly Instances"
}
},
"required": [
"anomaly_snapshot",
"anomaly_dates",
"anomaly_instance"
],
"title": "AnomalyDetectionArtifacts",
"type": "object"
}

Developer Docs

Routine Typename: ZScoreAnomalyDetector

Method NameArtifact Keys
__init__N/A
fitN/A
predictanomaly_snapshot, anomaly_dates, anomaly_instance

Was this page helpful?