GenericDataMonitor
Versions
v0.1.0
Basic Information
Class Name: GenericDataMonitor
Title: Generic Data Monitor
Version: 0.1.0
Author: Chris Bahr
Organization: OneStream
Creation Date: 2025-12-11
Default Routine Memory Capacity: 2.0 GB
Tags
Data Monitor, Data Monitoring, Data Analysis
Description
Short Description
The generic, pass-through Data Monitor
Long Description
The Generic Data Monitor is a pass-through data monitor that materializes data from a configured tabular connection and validates it against a predefined schema. Developers specify dimension and fact column configurations through an intuitive list-based interface and provide a tabular connection; the data is materialized from the connection and automatically cast to match the configured schema types. The scan schema must remain consistent for a given instance.
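For illustration only, the sketch below mirrors the cast-to-configured-schema behavior described above using Polars (a reasonable assumption given the parquet-based scan artifacts documented later). It is a conceptual sketch, not the routine's internal implementation, and the column names and types are invented.

import polars as pl

# Hypothetical configured schema: dimension and fact columns mapped to expected dtypes.
configured_dtypes = {
    "region": pl.Utf8,           # dimension column (invented name)
    "sales_amount": pl.Float64,  # fact column (invented name)
}

raw = pl.DataFrame({"region": ["East", "West"], "sales_amount": ["10.5", "20.0"]})

# Cast each configured column to its expected type. A failed cast raises an error,
# which mirrors the validation behavior described for the scan method below.
scan_table = raw.select([pl.col(n).cast(t) for n, t in configured_dtypes.items()])
print(scan_table.schema)  # confirms the configured types were applied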
Use Cases
1. Integrate Anomaly Detection Capabilities
Developers can easily integrate Anomaly Detection capabilities into their solution using the Generic Data Monitor. By pre-calculating statistics on a dataset, users can create instances of the Generic Data Monitor together with accompanying rules that identify particular outliers or anomalies. Through the Data Rule Manager, users of such a solution can add further Data Rules defining what constitutes an anomaly, based on the statistics or calculated fields made available by the Data Monitor.
Routine Methods
1. Init (Constructor)
- Method: __init__
- Type: Constructor
- Allow In-Memory Execution: Yes
- Read Only: No
- Method Limits: There are no method limits for the constructor, as it simply sets the schema and assigns member variables.
- Outputs Dynamic Artifacts: No
- Short Description: Constructor for the Generic Data Monitor routine.
- Detailed Description: Initialize the Generic Data Monitor with a fixed scan schema configured via constructor parameters.
- Inputs:
  - Note: Each input may be subject to additional validation constraints at runtime.
  - Data Source (Required)
    - Name: data_connection
    - Description: The source data that will be passed through to the scan table as output of the scan method.
    - Tooltip: Click on the drop-down to specify your dataset source.
    - Type: Must be an instance of Tabular Connection
    - Nested Model: Tabular Connection
      - Connection (Required)
        - Name: tabular_connection
        - Description: The connection type to use to access the source data.
        - Type: Must be one of the following
          - SQL Server Connection
            - Database Resource (Required)
              - Name: database_resource
              - Description: The name of the database resource to connect to.
              - Type: str
            - Database Name (Required)
              - Name: database_name
              - Description: The name of the database to connect to.
              - Note: If you don’t see the database name that you are looking for in this list, it is recommended that you first move the data to be used into a database that is available within this list.
              - Type: str
            - Table Name (Required)
              - Name: table_name
              - Description: The name of the table to use.
              - Type: str
          - MetaFileSystem Connection
            - Connection Key (Required)
              - Name: connection_key
              - Description: The MetaFileSystem connection key.
              - Type: MetaFileSystemConnectionKey
            - File Path (Required)
              - Name: file_path
              - Description: The full file path to the file to ingest.
              - Type: str
          - Partitioned MetaFileSystem Connection
            - Connection Key (Required)
              - Name: connection_key
              - Description: The MetaFileSystem connection key.
              - Type: MetaFileSystemConnectionKey
            - File Type (Required)
              - Name: file_type
              - Description: The type of files to read from the directory.
              - Type: FileExtensions_
            - Directory Path (Required)
              - Name: directory_path
              - Description: The full directory path containing partitioned tabular files.
              - Type: str
  - Schema Name (Required)
    - Name: schema_name
    - Description: Logical name for the scan schema (e.g., 'SalesMetrics', 'UserActivity').
    - Type: str
  - Dimension Columns (Required)
    - Name: dimension_columns
    - Description: Dimension columns are typically used for identifying target dimensions or rows of the dataset.
    - Type: list[ScanTableColumnParameters]
  - Fact Columns (Required)
    - Name: fact_columns
    - Description: Fact columns typically include data that is calculated.
    - Type: list[ScanTableColumnParameters]
- Artifacts: No artifacts are returned by this method.
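For illustration, a hypothetical constructor call is sketched below. The parameter names and nesting follow the inputs listed above, but the concrete class names for the connection and column parameter models (and their import paths) are not given in this document and are assumptions.

# Hypothetical constructor call; class names and signatures are assumptions.
# Only the parameter names (data_connection, schema_name, dimension_columns,
# fact_columns) and their nesting follow the inputs documented above.
monitor = GenericDataMonitor(
    data_connection=TabularConnection(
        tabular_connection=SqlServerConnection(   # one of the three allowed connection types
            database_resource="analytics_resource",
            database_name="SalesDb",
            table_name="daily_sales",
        )
    ),
    schema_name="SalesMetrics",
    dimension_columns=[
        ScanTableColumnParameters(name="region"),      # field names here are assumed
        ScanTableColumnParameters(name="product_id"),
    ],
    fact_columns=[
        ScanTableColumnParameters(name="sales_amount"),
    ],
)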
2. Scan (Method)
- Method: scan
- Type: Method
- Memory Capacity: 2.0 GB
- Allow In-Memory Execution: Yes
- Read Only: Yes
- Method Limits: N/A
- Outputs Dynamic Artifacts: No
- Short Description: Perform a scan that materializes data from the configured tabular connection.
- Detailed Description: The data is automatically cast to match the configured schema types. If columns cannot be cast to the expected types, an error is raised. Missing required columns will cause an error, but extra columns are logged and ignored.
- Inputs: No input parameters
- Artifacts:
  - Scan Table Result: The table result that is outputted from a scan.
    - Qualified Key Annotation: scan_table
    - Aggregate Artifact: False
    - In-Memory Json Accessible: False
    - File Annotations: artifacts_/@scan_table/data_/data_<int>.parquet - A partitioned set of parquet files where each file will have no more than 1,000,000 rows.
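For illustration, the partitioned parquet artifact described above could be read back with Polars as sketched below; the relative path is a stand-in for wherever the artifact files are materialized in your environment.

import polars as pl

# Read every partition of the scan_table artifact (data_<int>.parquet files,
# each holding at most 1,000,000 rows) into a single DataFrame.
scan_table = pl.read_parquet("artifacts_/@scan_table/data_/data_*.parquet")
print(scan_table.schema)  # should match the TableSchema returned by the schema method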
3. Schema (Method)
- Method: schema
- Type: Method
- Memory Capacity: 2.0 GB
- Allow In-Memory Execution: Yes
- Read Only: Yes
- Method Limits: There are no limits for this method, as it simply creates the scan table output schema based on the configured data_connection.
- Outputs Dynamic Artifacts: No
- Short Description: Defines the schema of the Scan table.
- Detailed Description: Used by the Data Monitor system to determine the output schema of the scan.
- Inputs: No input parameters
- Artifacts:
  - Scan Schema: The table schema that is expected to be used for all scans done from an instantiated DataMonitor.
    - Qualified Key Annotation: table_schema
    - Aggregate Artifact: False
    - In-Memory Json Accessible: False
    - File Annotations: artifacts_/@table_schema/data_/table_schema.json - A JSON file of a TableSchema object.
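For illustration, the table_schema.json artifact could be inspected as sketched below; the relative path is a stand-in, and the keys inside the serialized TableSchema are not specified in this document.

import json

# Load the serialized TableSchema artifact produced by the schema method.
with open("artifacts_/@table_schema/data_/table_schema.json") as f:
    table_schema = json.load(f)

print(table_schema)  # column definitions that every scan from this instance must match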
4. Update Data Connection (Method)
- Method: update_data_connection
- Type: Method
- Allow In-Memory Execution: Yes
- Read Only: No
- Method Limits: N/A
- Outputs Dynamic Artifacts: No
- Short Description: Updates the data connection.
- Detailed Description: Allows the caller to update the location from which the routine should read in the data.
- Inputs:
  - Note: Each input may be subject to additional validation constraints at runtime.
  - Connection (Required)
    - Name: tabular_connection
    - Description: The connection type to use to access the source data.
    - Type: Must be one of the following
      - SQL Server Connection
        - Database Resource (Required)
          - Name: database_resource
          - Description: The name of the database resource to connect to.
          - Type: str
        - Database Name (Required)
          - Name: database_name
          - Description: The name of the database to connect to.
          - Note: If you don’t see the database name that you are looking for in this list, it is recommended that you first move the data to be used into a database that is available within this list.
          - Type: str
        - Table Name (Required)
          - Name: table_name
          - Description: The name of the table to use.
          - Type: str
      - MetaFileSystem Connection
        - Connection Key (Required)
          - Name: connection_key
          - Description: The MetaFileSystem connection key.
          - Type: MetaFileSystemConnectionKey
        - File Path (Required)
          - Name: file_path
          - Description: The full file path to the file to ingest.
          - Type: str
      - Partitioned MetaFileSystem Connection
        - Connection Key (Required)
          - Name: connection_key
          - Description: The MetaFileSystem connection key.
          - Type: MetaFileSystemConnectionKey
        - File Type (Required)
          - Name: file_type
          - Description: The type of files to read from the directory.
          - Type: FileExtensions_
        - Directory Path (Required)
          - Name: directory_path
          - Description: The full directory path containing partitioned tabular files.
          - Type: str
- Artifacts: No artifacts are returned by this method.
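For illustration, a hypothetical call is sketched below that re-points an existing monitor at a partitioned directory of tabular files; the connection class name, the FileExtensions_ member, and the connection key value are assumptions, while the field names follow the inputs listed above.

# Hypothetical: switch the monitor to a Partitioned MetaFileSystem Connection.
monitor.update_data_connection(
    tabular_connection=PartitionedMetaFileSystemConnection(  # class name assumed
        connection_key=my_connection_key,    # a MetaFileSystemConnectionKey obtained elsewhere
        file_type=FileExtensions_.PARQUET,   # enum member assumed for illustration
        directory_path="/data/sales/2025/",
    )
)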
Interface Definitions
1. Data Monitor Interface
An interface class that defines the contract for a concrete DataMonitor implementation.
The IDataMonitor interface enforces a common set of methods that every implementation is expected to provide.
Interface Methods:
1. Schema
Method Name: schema
Short Description: Abstract Schema Method
Detailed Description: This method is expected to return the schema that every scan produced by this data monitor will conform to. In other words, this schema CANNOT change across scans.
Inputs: No properties
Input Schema (JSON):
{
"properties": {},
"title": "XperiflowNullParameters",
"type": "object"
}
Artifacts:
| Property | Type | Required | Description |
|---|---|---|---|
| table_schema | TableSchema | Yes | The table schema that is expected to be used for all scans done from an instantiated DataMonitor. |
Artifact Schema (JSON):
{
"additionalProperties": true,
"description": "The schema that are expected to be outputted from a scan.\n\nAttributes:\n table_schema: The table schema that is expected to be used for all scans done from an instantiated DataMonitor.",
"properties": {
"table_schema": {
"description": "The table schema that is expected to be used for all scans done from an instantiated DataMonitor.",
"io_factory_kwargs": {},
"preview_factory_kwargs": null,
"preview_factory_type": null,
"statistic_factory_kwargs": null,
"statistic_factory_type": null,
"title": "Scan Schema",
"type": "TableSchema"
}
},
"required": [
"table_schema"
],
"title": "ScanSchemaArtifacts",
"type": "object"
}
2. Scan
Method Name: scan
Short Description: Abstract Scan Method
Detailed Description:
This method is expected to return the scan result artifacts produced by a scan. The table artifact must always conform to the same Polars schema as defined by the TableSchema object returned by the schema method. Note that you are allowed to subclass the ScanResultArtifacts class to include additional arbitrary artifacts from a scan.
Inputs: No properties
Input Schema (JSON):
{
"properties": {},
"title": "XperiflowNullParameters",
"type": "object"
}
Artifacts:
| Property | Type | Required | Description |
|---|---|---|---|
| scan_table | DataFrame | Yes | The table result that is outputted from a scan. |
Artifact Schema (JSON):
{
"additionalProperties": true,
"description": "The artifacts that are outputted from a scan.\n\nAttributes:\n scan_table: The table result that is outputted from a scan. This should be of the SAME Schema that is defined from the table_schema",
"properties": {
"scan_table": {
"description": "The table result that is outputted from a scan.",
"io_factory_kwargs": {},
"preview_factory_kwargs": null,
"preview_factory_type": null,
"statistic_factory_kwargs": null,
"statistic_factory_type": null,
"title": "Scan Table Result",
"type": "DataFrame"
}
},
"required": [
"scan_table"
],
"title": "ScanResultArtifacts",
"type": "object"
}
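As a sketch of how the IDataMonitor contract described above might be satisfied, the skeleton below uses Python's abc module; the base class shown here is a stand-in for the library's actual IDataMonitor, and the artifact containers are represented as plain dictionaries keyed by the documented artifact names.

from abc import ABC, abstractmethod

class IDataMonitor(ABC):
    # Stand-in for the library's interface; not its actual source.

    @abstractmethod
    def schema(self):
        # Return the fixed scan schema (table_schema). Must not change across scans.
        ...

    @abstractmethod
    def scan(self):
        # Return the scan result artifacts (scan_table), matching the schema from schema().
        ...

class MyDataMonitor(IDataMonitor):
    def schema(self):
        return {"table_schema": ...}  # a TableSchema describing every scan's columns

    def scan(self):
        return {"scan_table": ...}    # a DataFrame with exactly the schema defined above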
Developer Docs
Routine Typename: GenericDataMonitor
| Method Name | Artifact Keys |
|---|---|
| __init__ | N/A |
| scan | scan_table |
| schema | table_schema |
| update_data_connection | N/A |