GenericDataMonitor

Basic Information

Class Name: GenericDataMonitor

Title: Generic Data Monitor

Version: 0.1.0

Author: Chris Bahr

Organization: OneStream

Creation Date: 2025-12-11

Default Routine Memory Capacity: 2.0 GB

Tags

Data Monitor, Data Monitoring, Data Analysis

Description

Short Description

The generic, pass-through Data Monitor

Long Description

The Generic Data Monitor materializes data from a configured tabular connection and validates it against a predefined schema. Developers specify dimension and fact column configurations through a list-based interface and provide a tabular connection. The data is materialized from the connection and automatically cast to match the configured schema types. The scan schema must remain consistent for each instance.

Use Cases

1. Integrate Anomaly Detection Capabilities

Developers can integrate Anomaly Detection capabilities into their solution using the Generic Data Monitor. By pre-calculating statistics on a dataset, users can create instances of the Generic Data Monitor and accompanying rules that identify particular outliers or anomalies. Through the Data Rule Manager, users of a solution built on the Generic Data Monitor can add further Data Rules defining what constitutes an anomaly, based on the statistics or calculated fields the Data Monitor makes available.
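
As a rough illustration of this use case, the sketch below pre-calculates a z-score per row and applies a simple threshold rule to it. It is plain Python and does not use the Data Rule Manager API; the function names, the `z_score` field, and the sample data are all hypothetical.

```python
from statistics import mean, stdev


def add_z_scores(rows, value_key):
    """Return copies of the rows with a pre-calculated 'z_score' field."""
    values = [row[value_key] for row in rows]
    mu, sigma = mean(values), stdev(values)
    scored = []
    for row in rows:
        # Guard against a constant column, where the z-score is undefined.
        z = (row[value_key] - mu) / sigma if sigma else 0.0
        scored.append({**row, "z_score": z})
    return scored


def z_score_rule(row, threshold=2.0):
    """Hypothetical data rule: flag rows whose |z_score| exceeds the threshold."""
    return abs(row["z_score"]) > threshold


# Illustrative data only: five typical regions and one clear outlier.
sample = [("N", 100.0), ("S", 102.0), ("E", 98.0), ("W", 101.0), ("C", 99.0), ("X", 250.0)]
rows = [{"region": region, "sales": sales} for region, sales in sample]

scored = add_z_scores(rows, "sales")
anomalies = [row["region"] for row in scored if z_score_rule(row)]
```

In a real solution the pre-calculated statistic would be one of the configured fact columns, and the threshold logic would live in a Data Rule rather than in a Python function.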

Routine Methods

1. Init (Constructor)
  • Method: __init__
    • Type: Constructor

    • Allow In-Memory Execution: Yes

    • Read Only: No

    • Method Limits: There are no method limits for the constructor as it simply sets the schema and assigns member variables.

    • Outputs Dynamic Artifacts: No

    • Short Description:

      • Constructor for the Generic Data Monitor routine.
    • Detailed Description:

      • Initialize the Generic Data Monitor with a fixed scan schema configured via constructor parameters.
    • Inputs:

      • Required Input
        • Data Source: The source data that will be passed through to the scan table as output of the scan method.
          • Name: data_connection
          • Tooltip:
            • Detail:
              • Click on the drop down to specify your dataset source
            • Validation Constraints:
              • This input may be subject to other validation constraints at runtime.
          • Type: Must be an instance of Tabular Connection
          • Nested Model: Tabular Connection
            • Required Input
              • Connection: The connection type to use to access the source data.
                • Name: tabular_connection
                • Tooltip:
                  • Validation Constraints:
                    • This input may be subject to other validation constraints at runtime.
                • Type: Must be one of the following
                  • SQL Server Connection
                    • Required Input
                      • Database Resource: The name of the database resource to connect to.
                        • Name: database_resource
                        • Tooltip:
                          • Validation Constraints:
                            • This input may be subject to other validation constraints at runtime.
                        • Type: str
                      • Database Name: The name of the database to connect to.
                        • Name: database_name
                        • Tooltip:
                          • Detail:
                            • Note: If you don’t see the database name that you are looking for in this list, it is recommended that you first move the data to be used within a database that is available within this list.
                          • Validation Constraints:
                            • This input may be subject to other validation constraints at runtime.
                        • Type: str
                      • Table Name: The name of the table to use.
                        • Name: table_name
                        • Tooltip:
                          • Validation Constraints:
                            • This input may be subject to other validation constraints at runtime.
                        • Type: str
                  • MetaFileSystem Connection
                    • Required Input
                      • Connection Key: The MetaFileSystem connection key.
                        • Name: connection_key
                        • Tooltip:
                          • Validation Constraints:
                            • This input may be subject to other validation constraints at runtime.
                        • Type: MetaFileSystemConnectionKey
                      • File Path: The full file path to the file to ingest.
                        • Name: file_path
                        • Tooltip:
                          • Validation Constraints:
                            • This input may be subject to other validation constraints at runtime.
                        • Type: str
                  • Partitioned MetaFileSystem Connection
                    • Required Input
                      • Connection Key: The MetaFileSystem connection key.
                        • Name: connection_key
                        • Tooltip:
                          • Validation Constraints:
                            • This input may be subject to other validation constraints at runtime.
                        • Type: MetaFileSystemConnectionKey
                      • File Type: The type of files to read from the directory.
                        • Name: file_type
                        • Tooltip:
                          • Validation Constraints:
                            • This input may be subject to other validation constraints at runtime.
                        • Type: FileExtensions_
                      • Directory Path: The full directory path containing partitioned tabular files.
                        • Name: directory_path
                        • Tooltip:
                          • Validation Constraints:
                            • This input may be subject to other validation constraints at runtime.
                        • Type: str
        • Schema Name: Logical name for the scan schema (e.g., 'SalesMetrics', 'UserActivity').
          • Name: schema_name
          • Tooltip:
            • Validation Constraints:
              • This input may be subject to other validation constraints at runtime.
          • Type: str
        • Dimension Columns: Dimension columns are typically used for identifying target dimensions or rows of the dataset.
          • Name: dimension_columns
          • Tooltip:
            • Validation Constraints:
              • This input may be subject to other validation constraints at runtime.
          • Type: list[ScanTableColumnParameters]
        • Fact Columns: Fact columns typically include data that is calculated.
          • Name: fact_columns
          • Tooltip:
            • Validation Constraints:
              • This input may be subject to other validation constraints at runtime.
          • Type: list[ScanTableColumnParameters]
    • Artifacts: No artifacts are returned by this method
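
To make the nesting of the constructor inputs easier to see, here is a minimal sketch of the parameters as a plain Python dictionary. The top-level names come from the input list above; the placeholder values and the per-column fields (`name`, `dtype`) are assumptions, since the fields of ScanTableColumnParameters are not documented on this page.

```python
# Required constructor inputs, as named in the input list above.
REQUIRED_INPUTS = ("data_connection", "schema_name", "dimension_columns", "fact_columns")

init_params = {
    "data_connection": {
        "tabular_connection": {
            # SQL Server Connection variant; all values are placeholders.
            "database_resource": "example_resource",
            "database_name": "example_db",
            "table_name": "sales_stats",
        }
    },
    "schema_name": "SalesMetrics",
    # The "name" and "dtype" fields are hypothetical stand-ins for
    # ScanTableColumnParameters, whose real fields are not shown here.
    "dimension_columns": [{"name": "region", "dtype": "str"}],
    "fact_columns": [{"name": "sales", "dtype": "float"}],
}


def missing_inputs(params):
    """Return the required input names that are absent from params."""
    return [key for key in REQUIRED_INPUTS if key not in params]
```

A quick check such as `missing_inputs(init_params)` returning an empty list confirms that all four required inputs are present before the parameters are submitted.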

2. Scan (Method)
  • Method: scan
    • Type: Method

    • Memory Capacity: 2.0 GB

    • Allow In-Memory Execution: Yes

    • Read Only: Yes

    • Method Limits: N/A

    • Outputs Dynamic Artifacts: No

    • Short Description:

      • Perform a scan that materializes data from the configured tabular connection.
    • Detailed Description:

      • The data is automatically cast to match the configured schema types. If columns cannot be cast to the expected types, an error is raised. Missing required columns will cause an error, but extra columns are logged and ignored.
    • Inputs:

      • No input parameters
    • Artifacts:

      • Scan Table Result: The table result that is outputted from a scan.
        • Qualified Key Annotation: scan_table
        • Aggregate Artifact: False
        • In-Memory Json Accessible: False
        • File Annotations:
          • artifacts_/@scan_table/data_/data_<int>.parquet
            • A partitioned set of parquet files where each file will have no more than 1000000 rows.
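
The casting and validation behaviour described for scan can be sketched in plain Python as follows. This is not the routine's implementation: it models rows as dictionaries and schema types as Python callables, and the function name and error types are assumptions chosen for illustration.

```python
import logging

logger = logging.getLogger("generic_data_monitor_sketch")


def conform_rows(rows, schema):
    """Cast each row to `schema`, a mapping of column name to a Python type.

    Missing required columns raise an error, extra columns are logged and
    dropped, and values that cannot be cast raise an error.
    """
    conformed = []
    for index, row in enumerate(rows):
        missing = [col for col in schema if col not in row]
        if missing:
            raise KeyError(f"row {index} is missing required columns: {missing}")
        extra = sorted(set(row) - set(schema))
        if extra:
            logger.info("row %d: ignoring extra columns %s", index, extra)
        try:
            conformed.append({col: cast(row[col]) for col, cast in schema.items()})
        except (TypeError, ValueError) as exc:
            raise ValueError(f"row {index} cannot be cast to the schema: {exc}") from exc
    return conformed


schema = {"region": str, "sales": float}
result = conform_rows([{"region": "N", "sales": "12.5", "note": "unused"}], schema)
```

Here `result` keeps only the two configured columns, with `sales` converted from a string to a float; the `note` column is logged and dropped.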
3. Schema (Method)
  • Method: schema
    • Type: Method

    • Memory Capacity: 2.0 GB

    • Allow In-Memory Execution: Yes

    • Read Only: Yes

    • Method Limits: There are no limits for this method as it simply creates and returns the scan table output schema based on the configured data_connection.

    • Outputs Dynamic Artifacts: No

    • Short Description:

      • Defines the schema of the Scan table
    • Detailed Description:

      • Used by the Data Monitor system to determine the output schema of the scan
    • Inputs:

      • No input parameters
    • Artifacts:

      • Scan Schema: The table schema that is expected to be used for all scans done from an instantiated DataMonitor.
        • Qualified Key Annotation: table_schema
        • Aggregate Artifact: False
        • In-Memory Json Accessible: False
        • File Annotations:
          • artifacts_/@table_schema/data_/table_schema.json
            • A JSON file of a TableSchema object.
4. Update Data Connection (Method)
  • Method: update_data_connection
    • Type: Method

    • Allow In-Memory Execution: Yes

    • Read Only: No

    • Method Limits: N/A

    • Outputs Dynamic Artifacts: No

    • Short Description:

      • Updates the data connection.
    • Detailed Description:

      • Allows the caller to update the location from which the routine should read in the data
    • Inputs:

      • Required Input
        • Connection: The connection type to use to access the source data.
          • Name: tabular_connection
          • Tooltip:
            • Validation Constraints:
              • This input may be subject to other validation constraints at runtime.
          • Type: Must be one of the following
            • SQL Server Connection
              • Required Input
                • Database Resource: The name of the database resource to connect to.
                  • Name: database_resource
                  • Tooltip:
                    • Validation Constraints:
                      • This input may be subject to other validation constraints at runtime.
                  • Type: str
                • Database Name: The name of the database to connect to.
                  • Name: database_name
                  • Tooltip:
                    • Detail:
                      • Note: If you don’t see the database name that you are looking for in this list, it is recommended that you first move the data to be used within a database that is available within this list.
                    • Validation Constraints:
                      • This input may be subject to other validation constraints at runtime.
                  • Type: str
                • Table Name: The name of the table to use.
                  • Name: table_name
                  • Tooltip:
                    • Validation Constraints:
                      • This input may be subject to other validation constraints at runtime.
                  • Type: str
            • MetaFileSystem Connection
              • Required Input
                • Connection Key: The MetaFileSystem connection key.
                  • Name: connection_key
                  • Tooltip:
                    • Validation Constraints:
                      • This input may be subject to other validation constraints at runtime.
                  • Type: MetaFileSystemConnectionKey
                • File Path: The full file path to the file to ingest.
                  • Name: file_path
                  • Tooltip:
                    • Validation Constraints:
                      • This input may be subject to other validation constraints at runtime.
                  • Type: str
            • Partitioned MetaFileSystem Connection
              • Required Input
                • Connection Key: The MetaFileSystem connection key.
                  • Name: connection_key
                  • Tooltip:
                    • Validation Constraints:
                      • This input may be subject to other validation constraints at runtime.
                  • Type: MetaFileSystemConnectionKey
                • File Type: The type of files to read from the directory.
                  • Name: file_type
                  • Tooltip:
                    • Validation Constraints:
                      • This input may be subject to other validation constraints at runtime.
                  • Type: FileExtensions_
                • Directory Path: The full directory path containing partitioned tabular files.
                  • Name: directory_path
                  • Tooltip:
                    • Validation Constraints:
                      • This input may be subject to other validation constraints at runtime.
                  • Type: str
    • Artifacts: No artifacts are returned by this method
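
The three connection variants accepted by update_data_connection can be modelled as a small tagged union. The sketch below uses dataclasses with the field names listed above; the class names, the use of `str` for MetaFileSystemConnectionKey and FileExtensions_, and the `to_update_payload` helper are assumptions, not part of the routine's API.

```python
from dataclasses import asdict, dataclass
from typing import Union


@dataclass(frozen=True)
class SqlServerConnection:
    database_resource: str
    database_name: str
    table_name: str


@dataclass(frozen=True)
class MetaFileSystemConnection:
    connection_key: str  # documented type: MetaFileSystemConnectionKey
    file_path: str


@dataclass(frozen=True)
class PartitionedMetaFileSystemConnection:
    connection_key: str  # documented type: MetaFileSystemConnectionKey
    file_type: str  # documented type: FileExtensions_
    directory_path: str


TabularConnection = Union[
    SqlServerConnection,
    MetaFileSystemConnection,
    PartitionedMetaFileSystemConnection,
]


def to_update_payload(connection: TabularConnection) -> dict:
    """Wrap a connection under the documented 'tabular_connection' input name."""
    return {"tabular_connection": asdict(connection)}


# Placeholder values only.
payload = to_update_payload(MetaFileSystemConnection("example_key", "data/sales.parquet"))
```

Because only the connection changes, the new source still has to provide the columns configured in the constructor, since the scan schema is fixed per instance.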

Interface Definitions

1. Data Monitor Interface

An interface class to properly define a DataMonitor concrete implementation.

This IDataMonitor interface enforces a common set of methods that are expected to be implemented.

Interface Methods:

1. Schema

Method Name: schema

Short Description: Abstract Schema Method

Detailed Description: This method is expected to return the schema that every scan from this data monitor outputs. In other words, this schema CANNOT change across scans.

Inputs: No properties

Input Schema (JSON):

{
  "properties": {},
  "title": "XperiflowNullParameters",
  "type": "object"
}

Artifacts:

Property | Type | Required | Description
table_schema | TableSchema | Yes | The table schema that is expected to be used for all scans done from an instantiated DataMonitor.

Artifact Schema (JSON):

{
  "additionalProperties": true,
  "description": "The schema that are expected to be outputted from a scan.\n\nAttributes:\n table_schema: The table schema that is expected to be used for all scans done from an instantiated DataMonitor.",
  "properties": {
    "table_schema": {
      "description": "The table schema that is expected to be used for all scans done from an instantiated DataMonitor.",
      "io_factory_kwargs": {},
      "preview_factory_kwargs": null,
      "preview_factory_type": null,
      "statistic_factory_kwargs": null,
      "statistic_factory_type": null,
      "title": "Scan Schema",
      "type": "TableSchema"
    }
  },
  "required": [
    "table_schema"
  ],
  "title": "ScanSchemaArtifacts",
  "type": "object"
}

2. Scan

Method Name: scan

Short Description: Abstract Scan Method

Detailed Description: This method is expected to return the scan result artifacts produced by a scan. The table artifact must always have the same Polars schema as the one defined by the TableSchema object returned by the schema method. Note that you may subclass the ScanResultArtifacts class to include additional arbitrary artifacts in a scan's output.

Inputs: No properties

Input Schema (JSON):

{
  "properties": {},
  "title": "XperiflowNullParameters",
  "type": "object"
}

Artifacts:

Property | Type | Required | Description
scan_table | DataFrame | Yes | The table result that is outputted from a scan.

Artifact Schema (JSON):

{
  "additionalProperties": true,
  "description": "The artifacts that are outputted from a scan.\n\nAttributes:\n scan_table: The table result that is outputted from a scan. This should be of the SAME Schema that is defined from the table_schema",
  "properties": {
    "scan_table": {
      "description": "The table result that is outputted from a scan.",
      "io_factory_kwargs": {},
      "preview_factory_kwargs": null,
      "preview_factory_type": null,
      "statistic_factory_kwargs": null,
      "statistic_factory_type": null,
      "title": "Scan Table Result",
      "type": "DataFrame"
    }
  },
  "required": [
    "scan_table"
  ],
  "title": "ScanResultArtifacts",
  "type": "object"
}
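
The contract described by this interface (a fixed schema, and scans whose table always matches it) can be sketched with an abstract base class. The classes below are illustrative stand-ins, not the real IDataMonitor: the class names, the dictionary-based return values, and the in-memory row representation are all assumptions.

```python
from abc import ABC, abstractmethod


class IDataMonitorSketch(ABC):
    """Illustrative stand-in for the IDataMonitor interface (not the real class)."""

    @abstractmethod
    def schema(self) -> dict:
        """Return {'table_schema': ...}; must be identical on every call."""

    @abstractmethod
    def scan(self) -> dict:
        """Return {'scan_table': ...} whose columns match schema()."""


class StaticMonitor(IDataMonitorSketch):
    """Toy implementation backed by an in-memory list of row dictionaries."""

    def __init__(self, columns: dict, rows: list):
        # Column name -> type name, fixed at construction time.
        self._columns = dict(columns)
        self._rows = [dict(row) for row in rows]

    def schema(self) -> dict:
        return {"table_schema": dict(self._columns)}

    def scan(self) -> dict:
        expected = set(self._columns)
        for row in self._rows:
            if set(row) != expected:
                raise ValueError(f"row columns {sorted(row)} do not match the schema")
        return {"scan_table": [dict(row) for row in self._rows]}


monitor = StaticMonitor({"region": "str", "sales": "float"}, [{"region": "N", "sales": 1.0}])
```

Keeping the schema in state that is set once in the constructor is one simple way to guarantee that it cannot drift between scans.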

Developer Docs

Routine Typename: GenericDataMonitor

Method Name | Artifact Keys
__init__ | N/A
scan | scan_table
schema | table_schema
update_data_connection | N/A
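
The artifact keys in the table above also appear in the documented file annotations, in the form `artifacts_/@<key>/data_/<file>`. As a small sketch, the helper below recovers the artifact key from such a path and looks up which method produces it. The helper names and the regular expression are assumptions based only on the two path patterns shown on this page.

```python
import re

# Method name -> artifact key, copied from the table above (None means N/A).
ARTIFACT_KEYS = {
    "__init__": None,
    "scan": "scan_table",
    "schema": "table_schema",
    "update_data_connection": None,
}

# Matches paths shaped like artifacts_/@<key>/data_/<file>.
_ARTIFACT_PATH = re.compile(r"^artifacts_/@(?P<key>[^/]+)/data_/(?P<file>[^/]+)$")


def artifact_key_from_path(path):
    """Return the artifact key embedded in a file annotation path, or None."""
    match = _ARTIFACT_PATH.match(path)
    return match.group("key") if match else None


def methods_producing(key):
    """Return the method names whose artifact key equals `key`."""
    return [method for method, artifact in ARTIFACT_KEYS.items() if artifact == key]
```

For example, a partition file such as `artifacts_/@scan_table/data_/data_0.parquet` resolves to the `scan_table` key, which the table attributes to the scan method.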
