GenericDataMonitor

Versions

v0.1.0

Basic Information

Class Name: GenericDataMonitor

Title: Generic Data Monitor

Version: 0.1.0

Author: Chris Bahr

Organization: OneStream

Creation Date: 2025-12-11

Default Routine Memory Capacity: 2.0 GB

Description

Short Description

The generic, pass-through Data Monitor

Long Description

The Generic Data Monitor is a data monitor that materializes data from a configured tabular connection and validates it against a predefined schema. Developers specify the dimension and fact column configurations through a list-based interface and provide a tabular connection. The data is materialized from the connection and automatically cast to match the configured schema types. The scan schema must remain consistent for each instance.

Use Cases

1. Integrate Anomaly Detection Capabilities

Developers can integrate Anomaly Detection capabilities into their solution using the Generic Data Monitor. By pre-calculating statistics on a dataset, users can create instances of the Generic Data Monitor along with accompanying rules that identify particular outliers or anomalies. Through the Data Rule Manager, users of a solution built on the Generic Data Monitor can add further Data Rules that define what constitutes an anomaly, based on the statistics or calculated fields the Data Monitor exposes.
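To make the workflow above concrete, the sketch below pre-calculates per-dimension statistics (mean and population standard deviation) and applies a simple z-score rule that flags values far from their group mean. It is a hypothetical stand-in for a Data Rule: the function names, the row-dict layout, and the thresholds are assumptions for illustration, not the Data Rule Manager's API.

```python
from collections import defaultdict
from statistics import fmean, pstdev

def compute_stats(rows, dim_col, fact_col):
    """Pre-calculate the mean and population standard deviation of fact_col per dimension value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dim_col]].append(row[fact_col])
    return {key: (fmean(vals), pstdev(vals)) for key, vals in groups.items()}

def zscore_rule(rows, stats, dim_col, fact_col, threshold=3.0):
    """Hypothetical data rule: return rows whose absolute z-score exceeds threshold."""
    flagged = []
    for row in rows:
        mean, std = stats[row[dim_col]]
        if std == 0:
            continue  # constant group: z-score is undefined, so nothing is flagged
        z = (row[fact_col] - mean) / std
        if abs(z) > threshold:
            flagged.append({**row, "z_score": z})
    return flagged

# Illustrative data: one unusually large East value.
rows = [{"region": "East", "sales": v} for v in [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]]
rows += [{"region": "West", "sales": v} for v in [5, 5, 6, 5]]
stats = compute_stats(rows, "region", "sales")
anomalies = zscore_rule(rows, stats, "region", "sales", threshold=2.5)
# anomalies holds the single East row with sales == 50
```

Note that the outlier inflates the very statistics used to judge it: its z-score here is about 2.99, so the default threshold of 3.0 would flag nothing. This masking effect is one reason robust statistics (such as the median) are sometimes preferred for anomaly rules.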

Routine Methods

1. Init (Constructor)

Method: __init__
- Type: Constructor
- Allow In-Memory Execution: Yes
- Read Only: No
- Method Limits: There are no method limits for the constructor as it simply sets the schema and assigns member variables.
- Outputs Dynamic Artifacts: No
- Short Description:
  - Constructor for the Generic Data Monitor routine.
- Detailed Description:
  - Initialize the Generic Data Monitor with a fixed scan schema configured via constructor parameters.
- Inputs:
  - Required Input
    - Data Source: The source data that will be passed through to the scan table as output of the scan method.
      - Name: data_connection
      - Tooltip:
        
        Detail:
        
        Click the drop-down to specify your dataset source.
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
      - Type: Must be an instance of Tabular Connection
      - Nested Model: Tabular Connection
        
        Required Input
        
        Connection: The connection type to use to access the source data.
        
        Name: tabular_connection
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: Must be one of the following
        
        SQL Server Connection
        
        Required Input
        
        Database Resource: The name of the database resource to connect to.
        
        Name: database_resource
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: str
        
        Database Name: The name of the database to connect to.
        
        Name: database_name
        
        Tooltip:
        
        Detail:
        
        Note: If the database you are looking for is not in this list, first move the data into a database that appears in this list.
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: str
        
        Table Name: The name of the table to use.
        
        Name: table_name
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: str
        
        MetaFileSystem Connection
        
        Required Input
        
        Connection Key: The MetaFileSystem connection key.
        
        Name: connection_key
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: MetaFileSystemConnectionKey
        
        File Path: The full file path to the file to ingest.
        
        Name: file_path
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: str
        
        Partitioned MetaFileSystem Connection
        
        Required Input
        
        Connection Key: The MetaFileSystem connection key.
        
        Name: connection_key
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: MetaFileSystemConnectionKey
        
        File Type: The type of files to read from the directory.
        
        Name: file_type
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: FileExtensions_
        
        Directory Path: The full directory path containing partitioned tabular files.
        
        Name: directory_path
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: str
    - Schema Name: Logical name for the scan schema (e.g., 'SalesMetrics', 'UserActivity').
      - Name: schema_name
      - Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
      - Type: str
    - Dimension Columns: Dimension columns typically identify the target dimensions or rows of the dataset.
      - Name: dimension_columns
      - Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
      - Type: list[ScanTableColumnParameters]
    - Fact Columns: Fact columns typically contain calculated values.
      - Name: fact_columns
      - Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
      - Type: list[ScanTableColumnParameters]
- Artifacts: No artifacts are returned by this method
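A minimal sketch of the constructor's role, assuming a simplified, hypothetical `ColumnSpec` in place of `ScanTableColumnParameters` (whose fields are not documented here) and a plain dict in place of the Tabular Connection model. It illustrates the key idea: the scan schema is derived once from the dimension and fact column lists and then cannot be modified. The dimensions-then-facts column order and the uniqueness check are assumptions.

```python
from dataclasses import dataclass
from types import MappingProxyType

@dataclass(frozen=True)
class ColumnSpec:
    """Hypothetical stand-in for ScanTableColumnParameters."""
    name: str
    dtype: str  # assumed type label, e.g. "str", "int", "float"

class GenericDataMonitorSketch:
    """Hypothetical sketch of the constructor; not the routine's actual code."""

    def __init__(self, data_connection, schema_name, dimension_columns, fact_columns):
        self.data_connection = data_connection  # plain dict standing in for Tabular Connection
        self.schema_name = schema_name
        columns = [*dimension_columns, *fact_columns]  # dimensions first, then facts (assumed order)
        names = [c.name for c in columns]
        # Assumption: column names must be unique across dimension and fact columns.
        if len(names) != len(set(names)):
            raise ValueError(f"duplicate column names in {names}")
        # Read-only view: the scan schema cannot be modified after construction.
        self._scan_schema = MappingProxyType({c.name: c.dtype for c in columns})

    @property
    def scan_schema(self):
        return self._scan_schema

monitor = GenericDataMonitorSketch(
    data_connection={"table_name": "sales"},
    schema_name="SalesMetrics",
    dimension_columns=[ColumnSpec("region", "str")],
    fact_columns=[ColumnSpec("sales", "float")],
)
```

Exposing the schema through a read-only mapping is one simple way to honor the rule that the scan schema stays fixed for the lifetime of an instance.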

2. Scan (Method)

Method: scan
- Type: Method
- Memory Capacity: 2.0 GB
- Allow In-Memory Execution: Yes
- Read Only: Yes
- Method Limits: N/A
- Outputs Dynamic Artifacts: No
- Short Description:
  - Perform a scan that materializes data from the configured tabular connection.
- Detailed Description:
  - The data is automatically cast to match the configured schema types. If columns cannot be cast to the expected types, an error is raised. Missing required columns will cause an error, but extra columns are logged and ignored.
- Inputs:
  - No input parameters
- Artifacts:
  - Scan Table Result: The table result that is outputted from a scan.
    - Qualified Key Annotation: scan_table
    - Aggregate Artifact: False
    - In-Memory Json Accessible: False
    - File Annotations:
      - artifacts_/@scan_table/data_/data_<int>.parquet
        
        A partitioned set of parquet files where each file has no more than 1,000,000 rows.
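The casting rules described for `scan` (missing required columns raise an error, extra columns are logged and ignored, uncastable values raise an error) and the 1,000,000-row file limit can be sketched in plain Python as follows. This is an illustrative approximation over lists of row dicts; the routine's scan table is a DataFrame, and the type labels, function names, and zero-based file numbering here are assumptions.

```python
import logging
from itertools import islice

logger = logging.getLogger(__name__)

# Assumed mapping from schema type labels to Python cast functions.
CASTS = {"str": str, "int": int, "float": float}

def cast_to_schema(rows, schema):
    """Validate row dicts against an ordered {column: type_label} schema and cast them.

    Tabular input is assumed: every row has the same keys as the first row.
    """
    if not rows:
        return []
    present = set(rows[0])
    missing = [col for col in schema if col not in present]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    extra = sorted(present - set(schema))
    if extra:
        logger.warning("ignoring extra columns: %s", extra)
    cast_rows = []
    for i, row in enumerate(rows):
        out = {}
        for col, type_label in schema.items():  # output follows schema column order
            try:
                out[col] = CASTS[type_label](row[col])
            except (TypeError, ValueError) as exc:
                raise ValueError(f"row {i}: cannot cast {col!r} to {type_label}") from exc
        cast_rows.append(out)
    return cast_rows

def partition(rows, max_rows=1_000_000):
    """Yield (file_name, chunk) pairs, each chunk holding at most max_rows rows."""
    it = iter(rows)
    index = 0
    while chunk := list(islice(it, max_rows)):
        yield f"data_{index}.parquet", chunk  # zero-based numbering is an assumption
        index += 1
```

With a DataFrame library the same steps would typically be a column selection followed by a strict cast rather than a per-row loop; the loop form is used here only to keep the sketch dependency-free.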

3. Schema (Method)

Method: schema
- Type: Method
- Memory Capacity: 2.0 GB
- Allow In-Memory Execution: Yes
- Read Only: Yes
- Method Limits: There are no limits for this method as it simply creates and returns the scan table output schema from the configured columns.
- Outputs Dynamic Artifacts: No
- Short Description:
  - Defines the schema of the scan table.
- Detailed Description:
  - Used by the Data Monitor system to determine the output schema of the scan.
- Inputs:
  - No input parameters
- Artifacts:
  - Scan Schema: The table schema that is expected to be used for all scans done from an instantiated DataMonitor.
    - Qualified Key Annotation: table_schema
    - Aggregate Artifact: False
    - In-Memory Json Accessible: False
    - File Annotations:
      - artifacts_/@table_schema/data_/table_schema.json
        
        A JSON file of a TableSchema object.
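The `schema` method writes its result to `artifacts_/@table_schema/data_/table_schema.json`. The sketch below shows one way an ordered column-to-type mapping could be written to, and read back from, a file at that relative path. The JSON layout (a `name` plus a `columns` list) and the helper names are hypothetical; the actual `TableSchema` serialization format is not documented here.

```python
import json
from pathlib import Path

def write_table_schema(schema_name, columns, root):
    """Serialize an ordered {column: type_label} mapping to table_schema.json under root."""
    path = Path(root) / "artifacts_" / "@table_schema" / "data_" / "table_schema.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {
        "name": schema_name,
        # A list (not a dict) makes the column order explicit in the file.
        "columns": [{"name": n, "dtype": t} for n, t in columns.items()],
    }
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path

def read_table_schema(path):
    """Load the file back into (schema_name, ordered column mapping)."""
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    return payload["name"], {c["name"]: c["dtype"] for c in payload["columns"]}
```

Because the schema cannot change across scans, this file only needs to be produced once per instance and can be compared against every scan table.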

4. Update Data Connection (Method)

Method: update_data_connection
- Type: Method
- Allow In-Memory Execution: Yes
- Read Only: No
- Method Limits: N/A
- Outputs Dynamic Artifacts: No
- Short Description:
  - Updates the data connection.
- Detailed Description:
  - Allows the caller to change the location from which the routine reads its data.
- Inputs:
  - Required Input
    - Connection: The connection type to use to access the source data.
      - Name: tabular_connection
      - Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
      - Type: Must be one of the following
        
        SQL Server Connection
        
        Required Input
        
        Database Resource: The name of the database resource to connect to.
        
        Name: database_resource
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: str
        
        Database Name: The name of the database to connect to.
        
        Name: database_name
        
        Tooltip:
        
        Detail:
        
        Note: If the database you are looking for is not in this list, first move the data into a database that appears in this list.
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: str
        
        Table Name: The name of the table to use.
        
        Name: table_name
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: str
        
        MetaFileSystem Connection
        
        Required Input
        
        Connection Key: The MetaFileSystem connection key.
        
        Name: connection_key
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: MetaFileSystemConnectionKey
        
        File Path: The full file path to the file to ingest.
        
        Name: file_path
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: str
        
        Partitioned MetaFileSystem Connection
        
        Required Input
        
        Connection Key: The MetaFileSystem connection key.
        
        Name: connection_key
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: MetaFileSystemConnectionKey
        
        File Type: The type of files to read from the directory.
        
        Name: file_type
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: FileExtensions_
        
        Directory Path: The full directory path containing partitioned tabular files.
        
        Name: directory_path
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: str
- Artifacts: No artifacts are returned by this method
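The three connection options above can be modeled as a tagged union. In the hypothetical sketch below, only the field names come from the input lists above; the class names, the `describe` helper, and the `MonitorState` holder are assumptions. It also illustrates the intent of `update_data_connection`: the source location changes while the instance's scan schema stays the same.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class SqlServerConnection:
    database_resource: str
    database_name: str
    table_name: str

@dataclass(frozen=True)
class MetaFileSystemConnection:
    connection_key: str  # a MetaFileSystemConnectionKey in the routine; simplified to str
    file_path: str

@dataclass(frozen=True)
class PartitionedMetaFileSystemConnection:
    connection_key: str
    file_type: str  # a FileExtensions_ value in the routine; e.g. "parquet" (assumed)
    directory_path: str

TabularConnection = Union[SqlServerConnection, MetaFileSystemConnection, PartitionedMetaFileSystemConnection]
CONNECTION_TYPES = (SqlServerConnection, MetaFileSystemConnection, PartitionedMetaFileSystemConnection)

def describe(conn: TabularConnection) -> str:
    """Return a readable source location for any supported connection type."""
    if isinstance(conn, SqlServerConnection):
        return f"{conn.database_resource}/{conn.database_name}.{conn.table_name}"
    if isinstance(conn, MetaFileSystemConnection):
        return conn.file_path
    if isinstance(conn, PartitionedMetaFileSystemConnection):
        return f"{conn.directory_path}/*.{conn.file_type}"
    raise TypeError(f"unsupported connection type: {type(conn).__name__}")

class MonitorState:
    """Hypothetical holder: the scan schema is fixed, the data connection is replaceable."""

    def __init__(self, connection: TabularConnection, scan_schema: dict):
        self.connection = connection
        self._scan_schema = dict(scan_schema)

    @property
    def scan_schema(self) -> dict:
        return dict(self._scan_schema)  # return a copy so callers cannot mutate it

    def update_data_connection(self, tabular_connection: TabularConnection) -> None:
        if not isinstance(tabular_connection, CONNECTION_TYPES):
            raise TypeError(f"unsupported connection type: {type(tabular_connection).__name__}")
        # Only the source location changes; the scan schema is left untouched.
        self.connection = tabular_connection
```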

Interface Definitions

1. Data Monitor Interface

An interface class that defines the contract for a concrete DataMonitor implementation.

The IDataMonitor interface enforces a common set of methods that every implementation must provide.
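In Python, an interface like this is commonly expressed as an abstract base class. The sketch below is an assumption about the general shape of `IDataMonitor`, not its actual source: the two artifact containers are plain dataclasses named after the `ScanSchemaArtifacts` and `ScanResultArtifacts` schemas in this section, with simplified field types, and `StaticMonitor` is a hypothetical toy implementation.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any

@dataclass
class ScanSchemaArtifacts:
    table_schema: dict  # simplified stand-in for a TableSchema object

@dataclass
class ScanResultArtifacts:
    scan_table: Any  # a DataFrame in the routine; any table-like object here

class IDataMonitor(ABC):
    @abstractmethod
    def schema(self) -> ScanSchemaArtifacts:
        """Return the schema that every scan from this monitor must follow."""

    @abstractmethod
    def scan(self) -> ScanResultArtifacts:
        """Return the artifacts produced by one scan."""

class StaticMonitor(IDataMonitor):
    """Hypothetical toy implementation that returns fixed rows."""

    def __init__(self, rows, table_schema):
        self._rows = rows
        self._table_schema = table_schema

    def schema(self) -> ScanSchemaArtifacts:
        return ScanSchemaArtifacts(table_schema=self._table_schema)

    def scan(self) -> ScanResultArtifacts:
        return ScanResultArtifacts(scan_table=list(self._rows))
```

Because both methods are abstract, Python refuses to instantiate `IDataMonitor` directly or any subclass that omits one of them, which is how the interface enforces its common set of methods.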

Interface Methods:

1. Schema

Method Name: schema

Short Description: Abstract Schema Method

Detailed Description: This method must return the schema of every scan produced by this data monitor. In other words, this schema CANNOT change across scans.

Inputs: No properties

Input Schema (JSON):

{
  "properties": {},
  "title": "XperiflowNullParameters",
  "type": "object"
}

Artifacts:

| Property | Type | Required | Description |
|---|---|---|---|
| `table_schema` | `TableSchema` | Yes | The table schema that is expected to be used for all scans done from an instantiated DataMonitor. |

Artifact Schema (JSON):

{
  "additionalProperties": true,
  "description": "The schema that are expected to be outputted from a scan.\n\nAttributes:\n    table_schema: The table schema that is expected to be used for all scans done from an instantiated DataMonitor.",
  "properties": {
    "table_schema": {
      "description": "The table schema that is expected to be used for all scans done from an instantiated DataMonitor.",
      "io_factory_kwargs": {},
      "preview_factory_kwargs": null,
      "preview_factory_type": null,
      "statistic_factory_kwargs": null,
      "statistic_factory_type": null,
      "title": "Scan Schema",
      "type": "TableSchema"
    }
  },
  "required": [
    "table_schema"
  ],
  "title": "ScanSchemaArtifacts",
  "type": "object"
}

2. Scan

Method Name: scan

Short Description: Abstract Scan Method

Detailed Description: This method must return the artifacts produced by a scan. The table artifact must always have the same Polars schema as defined by the TableSchema object returned by the schema method. Note that you may subclass the ScanResultArtifacts class to include additional arbitrary artifacts produced by a scan.

Inputs: No properties

Input Schema (JSON):

{
  "properties": {},
  "title": "XperiflowNullParameters",
  "type": "object"
}

Artifacts:

| Property | Type | Required | Description |
|---|---|---|---|
| `scan_table` | `DataFrame` | Yes | The table result that is outputted from a scan. |

Artifact Schema (JSON):

{
  "additionalProperties": true,
  "description": "The artifacts that are outputted from a scan.\n\nAttributes:\n    scan_table: The table result that is outputted from a scan. This should be of the SAME Schema that is defined from the table_schema",
  "properties": {
    "scan_table": {
      "description": "The table result that is outputted from a scan.",
      "io_factory_kwargs": {},
      "preview_factory_kwargs": null,
      "preview_factory_type": null,
      "statistic_factory_kwargs": null,
      "statistic_factory_type": null,
      "title": "Scan Table Result",
      "type": "DataFrame"
    }
  },
  "required": [
    "scan_table"
  ],
  "title": "ScanResultArtifacts",
  "type": "object"
}
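The artifact schema above requires only `scan_table` and sets `"additionalProperties": true`, so an implementation can subclass `ScanResultArtifacts` to return extra artifacts. A hypothetical sketch using dataclasses: the `row_count` and `ignored_columns` fields are illustrative assumptions, the scan table is simplified to a list of row dicts, and `matches_schema` checks the rule that every scan table must follow the schema returned by `schema`.

```python
from dataclasses import dataclass, field

@dataclass
class ScanResultArtifacts:
    scan_table: list  # list of row dicts here; a DataFrame in the routine

@dataclass
class ExtendedScanResultArtifacts(ScanResultArtifacts):
    # Hypothetical extra artifacts, permitted by "additionalProperties": true.
    row_count: int = 0
    ignored_columns: list = field(default_factory=list)

def matches_schema(artifacts: ScanResultArtifacts, column_names: list) -> bool:
    """True if every row has exactly the schema's columns, in schema order."""
    return all(list(row) == column_names for row in artifacts.scan_table)
```

Since the subclass still carries `scan_table`, any consumer that only understands the base artifacts keeps working, while richer consumers can read the extra fields.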

Developer Docs

Routine Typename: GenericDataMonitor

| Method Name | Artifact Keys |
|---|---|
| `__init__` | N/A |
| `scan` | `scan_table` |
| `schema` | `table_schema` |
| `update_data_connection` | N/A |
