KMeansClusteringAnalysis
Versions
v0.1.0
Basic Information
Class Name: KMeansClusteringAnalysis
Title: Advanced K-Means
Version: 0.1.0
Author: Clustering Analytics Team
Organization: OneStream
Creation Date: 2025-08-18
Default Routine Memory Capacity: 2 GB
Tags
Clustering, Classification, Data Analysis, Data Visualization, Pattern Recognition
Description
Short Description
Cluster data points into distinct groups based on feature similarity using the K-Means++ clustering algorithm.
Long Description
This routine performs K-Means++ clustering on your dataset. Unlike standard K-Means, which randomly selects initial cluster centroids, K-Means++ uses a smarter initialization strategy that probabilistically selects initial centroids that are far apart from each other. This intelligent initialization leads to faster convergence, more consistent clustering results, and reduced sensitivity to the initial centroid placement. It helps identify groupings within your data based on feature similarity, enabling deeper insights and informed decision-making. You can customize the number of clusters, the clustering dimensions, feature dimensions and weighting, and the clustering algorithm used. This offers a flexible and powerful way to analyze your data.
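The K-Means++ seeding strategy described above can be sketched in a few lines of Python (a simplified illustration of the initialization step only, not this routine's actual implementation):

```python
import numpy as np

def kmeans_pp_init(X: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """K-Means++ seeding: the first centroid is chosen uniformly at random;
    each subsequent centroid is chosen with probability proportional to the
    squared distance to its nearest already-chosen centroid."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        chosen = np.asarray(centroids)
        # squared distance from every point to its nearest chosen centroid
        d2 = ((X[:, None, :] - chosen[None, :, :]) ** 2).sum(axis=-1).min(axis=1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.asarray(centroids)
```

Because far-apart points receive higher selection probability, the seeds tend to land in distinct regions of the data, which is what drives the faster convergence and more consistent results noted above.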
Use Cases
1. Performance-Based Clustering
A customer may choose to use this routine to cluster their data based on performance metrics such as sales figures, customer satisfaction scores, or operational efficiency indicators. By grouping similar performance profiles, the customer can identify high-performing segments, target underperforming areas for improvement, and optimize operational strategies. The resulting clusters can, for example, be used to find similarities amongst the high-performing or low-performing clusters to draw insights into what factors contribute to success or failure.
2. Attribute-Based Clustering
A customer may choose to use this routine to cluster their data based on specific attributes such as demographics, product features, or customer behaviors. By grouping similar attributes, the customer can identify patterns and trends within their data, enabling targeted marketing strategies, product development, and customer segmentation efforts. The resulting clusters can, for example, be used to find similarities amongst the different attribute clusters to draw insights into what attributes are most common in each cluster and how they may be causally related to performance outcomes.
Routine Methods
1. Init (Constructor)
- Method: __init__
- Type: Constructor
- Memory Capacity: 2.0 GB
- Allow In-Memory Execution: Yes
- Read Only: No
- Method Limits: N/A
- Outputs Dynamic Artifacts: No
- Short Description: Initializes the ClusteringAnalysis routine with the provided API and parameters.
- Detailed Description: This constructor sets up an instance of the clustering analysis routine with its configuration: the clustering algorithm to use, the number of clusters, and whether the algorithm is deterministic. This configuration is fixed for the lifetime of the routine and cannot be changed afterward; to use a different configuration, create a new instance of the routine.
Inputs:
All inputs below may be subject to other validation constraints at runtime.
- Deterministic Model Configuration (Required)
  - Name: deterministic_model
  - Description: Whether or not to use a deterministic clustering algorithm for this analysis.
  - Tooltip: Please define if the clustering algorithm is to be deterministic for this analysis.
  - Type: bool
- KMeans Hyperparameters (Required)
  - Name: n_clusters
  - Description: The number of clusters to use for this analysis.
  - Tooltip: Please select the number of clusters to use for this analysis.
  - Type: str
- Model Configuration (Required)
  - Name: clustering_algorithm_name
  - Description: The name of the clustering algorithm to use for this analysis.
  - Tooltip: The clustering algorithm to use for this analysis.
  - Type: Literal
Artifacts: No artifacts are returned by this method.
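To make the constructor's contract concrete, the three inputs can be pictured as an immutable configuration object (an illustrative sketch only: the dataclass and the literal algorithm value "kmeans++" are hypothetical, while the field names and types come from the input list above):

```python
from dataclasses import dataclass
from typing import Literal

@dataclass(frozen=True)  # frozen: the configuration cannot change for the routine's lifetime
class KMeansInitParams:
    deterministic_model: bool                        # use a deterministic algorithm?
    n_clusters: str                                  # number of clusters, supplied as a string
    clustering_algorithm_name: Literal["kmeans++"]   # hypothetical literal value

# Example configuration for a five-cluster deterministic run.
params = KMeansInitParams(
    deterministic_model=True,
    n_clusters="5",
    clustering_algorithm_name="kmeans++",
)
```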
2. Fit (Method)
- Method: fit
- Type: Method
- Memory Capacity: 2.0 GB
- Allow In-Memory Execution: No
- Read Only: No
- Method Limits: During scale testing this method performed with datasets up to 900,000 rows and 10 feature columns without issues. Larger datasets may cause a timeout error depending on system resources and execution environment.
- Outputs Dynamic Artifacts: No
- Short Description: Fits the clustering analysis model to the provided parameters.
- Detailed Description: This method takes the parameters provided by the user and fits the clustering analysis model to them, including the clustering dimensions, feature dimensions, and related settings. The user can specify the number of clusters, the clustering algorithm, and the feature weighting method to use for the analysis.
Inputs:
All inputs below may be subject to other validation constraints at runtime.
- Clustering Data Input (Required)
  - Name: clustering_data_input
  - Description: The data input configuration for the clustering analysis.
  - Type: Must be an instance of Clustering Data Configuration
  - Nested Model: Clustering Data Configuration
    - Source Data Definition (Required)
      - Name: source_data_definition
      - Type: Must be an instance of Tabular Connection
      - Nested Model: Tabular Connection
        - Connection (Required)
          - Name: tabular_connection
          - Description: The connection type to use to access the source data.
          - Type: Must be one of the following connection types:
            - SQL Server Connection
              - Database Resource (Required): database_resource (str). The name of the database resource to connect to.
              - Database Name (Required): database_name (str). The name of the database to connect to. Note: If you don’t see the database name that you are looking for in this list, it is recommended that you first move the data to be used within a database that is available within this list.
              - Table Name (Required): table_name (str). The name of the table to use.
            - MetaFileSystem Connection
              - Connection Key (Required): connection_key (MetaFileSystemConnectionKey). The MetaFileSystem connection key.
              - File Path (Required): file_path (str). The full file path to the file to ingest.
            - Partitioned MetaFileSystem Connection
              - Connection Key (Required): connection_key (MetaFileSystemConnectionKey). The MetaFileSystem connection key.
              - File Type (Required): file_type (FileExtensions_). The type of files to read from the directory.
              - Directory Path (Required): directory_path (str). The full directory path containing partitioned tabular files.
    - Clustering Dimensions (Required)
      - Name: clustering_dimensions
      - Description: The unique combination of column values that define the “entity” that you are trying to compare to others.
      - Type: list[str]
    - Feature Columns (Required)
      - Name: feature_columns
      - Description: Columns that you want to use to calculate the cluster segments.
      - Type: list[str]
Artifacts:
- Clustering Intersection Results
  - Description: Parquet file containing data about the clustering intersections and which cluster they belong to.
  - Qualified Key Annotation: cluster_intersection
  - Aggregate Artifact: False
  - In-Memory Json Accessible: False
  - File Annotations: artifacts_/@cluster_intersection/data_/data_<int>.parquet (a partitioned set of parquet files where each file will have no more than 1,000,000 rows)
- Clustering Descriptions
  - Description: Parquet file containing data about the clusters created by the clustering fit method.
  - Qualified Key Annotation: cluster_descriptions
  - Aggregate Artifact: False
  - In-Memory Json Accessible: False
  - File Annotations: artifacts_/@cluster_descriptions/data_/data_<int>.parquet (a partitioned set of parquet files where each file will have no more than 1,000,000 rows)
- Data Utilized
  - Description: Parquet file containing the data utilized in the clustering fit method.
  - Qualified Key Annotation: data_utilized
  - Aggregate Artifact: False
  - In-Memory Json Accessible: False
  - File Annotations: artifacts_/@data_utilized/data_/data_<int>.parquet (a partitioned set of parquet files where each file will have no more than 1,000,000 rows)
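The intersection results and cluster descriptions are complementary: the former assigns each clustering-dimension intersection to a cluster, while the latter summarizes each cluster. A minimal sketch of how such a per-cluster summary can be derived from cluster assignments (illustrative only; the routine writes these artifacts as parquet files):

```python
import numpy as np

def describe_clusters(features: np.ndarray, labels: np.ndarray) -> dict:
    """Summarize each cluster: member count and per-feature mean."""
    summary = {}
    for c in np.unique(labels):
        members = features[labels == c]
        summary[int(c)] = {"size": len(members), "mean": members.mean(axis=0)}
    return summary
```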
3. Predict (Method)
- Method: predict
- Type: Method
- Memory Capacity: 2.0 GB
- Allow In-Memory Execution: No
- Read Only: No
- Method Limits: During scale testing this method performed with datasets up to 3,000,000 rows and 10 feature columns without issues. Larger datasets may cause a timeout error depending on system resources and execution environment.
- Outputs Dynamic Artifacts: No
- Short Description: Assigns clusters to new data based on the fitted clustering analysis model.
- Detailed Description: This method takes the parameters provided by the user and assigns clusters to new data based on the fitted clustering analysis model. The user must provide a data source that contains the same clustering dimensions and feature dimensions as the data used to fit the model.
Inputs:
All inputs below may be subject to other validation constraints at runtime.
- Prediction Datasource (Required)
  - Name: datasource
  - Description: Select the datasource containing observations to assign to clusters.
  - Type: Must be an instance of Tabular Connection
  - Nested Model: Tabular Connection
    - Connection (Required)
      - Name: tabular_connection
      - Description: The connection type to use to access the source data.
      - Type: Must be one of the following connection types:
        - SQL Server Connection
          - Database Resource (Required): database_resource (str). The name of the database resource to connect to.
          - Database Name (Required): database_name (str). The name of the database to connect to. Note: If you don’t see the database name that you are looking for in this list, it is recommended that you first move the data to be used within a database that is available within this list.
          - Table Name (Required): table_name (str). The name of the table to use.
        - MetaFileSystem Connection
          - Connection Key (Required): connection_key (MetaFileSystemConnectionKey). The MetaFileSystem connection key.
          - File Path (Required): file_path (str). The full file path to the file to ingest.
        - Partitioned MetaFileSystem Connection
          - Connection Key (Required): connection_key (MetaFileSystemConnectionKey). The MetaFileSystem connection key.
          - File Type (Required): file_type (FileExtensions_). The type of files to read from the directory.
          - Directory Path (Required): directory_path (str). The full directory path containing partitioned tabular files.
Artifacts:
- Clustering Intersection Results
  - Description: Parquet file containing data about the clustering intersections and which cluster they belong to.
  - Qualified Key Annotation: cluster_intersection
  - Aggregate Artifact: False
  - In-Memory Json Accessible: False
  - File Annotations: artifacts_/@cluster_intersection/data_/data_<int>.parquet (a partitioned set of parquet files where each file will have no more than 1,000,000 rows)
- Data Utilized
  - Description: Parquet file containing the data utilized in the clustering predict method.
  - Qualified Key Annotation: data_utilized
  - Aggregate Artifact: False
  - In-Memory Json Accessible: False
  - File Annotations: artifacts_/@data_utilized/data_/data_<int>.parquet (a partitioned set of parquet files where each file will have no more than 1,000,000 rows)
Interface Definitions
1. Clustering Analysis Interface
An interface class requiring fit and predict methods to be implemented.
This BaseRoutineInterface class enforces a common interface for all clustering routines. The interface requires each clustering routine to implement a fit method and a predict method with the same input parameters. Each concrete class has its own constructor where hyperparameters specific to its clustering algorithm may be set; however, this interface does not enforce any specific constructor method.
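The contract described above corresponds to a standard abstract-base-class pattern; a minimal sketch (parameter types simplified to Any; method and parameter names taken from the method definitions in this document):

```python
from abc import ABC, abstractmethod
from typing import Any

class ClusteringAnalysisInterface(ABC):
    """Common interface: every clustering routine must implement fit and predict.
    Constructors are intentionally unconstrained so each concrete class can
    expose its own algorithm-specific hyperparameters."""

    @abstractmethod
    def fit(self, clustering_data_input: Any) -> Any: ...

    @abstractmethod
    def predict(self, datasource: Any) -> Any: ...
```

A subclass that omits either method cannot be instantiated, which is how the interface is enforced at runtime.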
Interface Methods:
1. Fit
Method Name: fit
Short Description: Abstract Fit Method
Detailed Description: This specifies the necessary input and output parameters for the fit method on all clustering routines. The input parameters contain a clustering data input configuration (source data definition, clustering dimensions, and feature columns) to fit a clustering model to.
Inputs:
| Property | Type | Required | Description |
|---|---|---|---|
| clustering_data_input | #/$defs/ClusteringDataInput | Yes | The data input configuration for the clustering analysis. |
Input Schema (JSON):
{
"$defs": {
"ClusteringDataInput": {
"properties": {
"source_data_definition": {
"$ref": "#/$defs/TabularConnection",
"description": "Source Data Definition",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "SourceDataDefinition",
"title": "Source Data Definition",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
},
"clustering_dimensions": {
"description": "The unique combination of column values that define the \u201centity\u201d that you are trying to compare to others.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"items": {
"type": "string"
},
"long_description": null,
"options_callback": "xperiflow.source.app.routines.store.clustering_analysis.clustering.pbm.clustering_pbms:ClusteringDataInput.get_dimension_options",
"options_callback_kwargs": null,
"state_name": "ClusteringDimensions",
"title": "Clustering Dimensions",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "array"
},
"feature_columns": {
"description": "Columns that you want to use to calculate the cluster segments",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"items": {
"type": "string"
},
"long_description": null,
"options_callback": "xperiflow.source.app.routines.store.clustering_analysis.clustering.pbm.clustering_pbms:ClusteringDataInput.get_feature_options",
"options_callback_kwargs": null,
"state_name": "FeatureColumns",
"title": "Feature Columns",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "array"
}
},
"required": [
"source_data_definition",
"clustering_dimensions",
"feature_columns"
],
"title": "ClusteringDataInput",
"type": "object"
},
"FileExtensions_": {
"description": "File Extensions.",
"enum": [
".csv",
".tsv",
".psv",
".parquet",
".xlsx"
],
"title": "FileExtensions_",
"type": "string"
},
"FileTabularConnection": {
"properties": {
"connection_key": {
"$ref": "#/$defs/MetaFileSystemConnectionKey",
"description": "The MetaFileSystem connection key.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "connection_key",
"title": "Connection Key",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
},
"file_path": {
"description": "The full file path to the file to ingest.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": "xperiflow.source.app.routines.pbm.store.conn.filetable:FileTabularConnection.get_file_path_bound_options",
"options_callback_kwargs": null,
"state_name": "file_path",
"title": "File Path",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
}
},
"required": [
"connection_key",
"file_path"
],
"title": "FileTabularConnection",
"type": "object"
},
"MetaFileSystemConnectionKey": {
"enum": [
"sql-server-routine",
"sql-server-shared"
],
"title": "MetaFileSystemConnectionKey",
"type": "string"
},
"PartitionedFileTabularConnection": {
"properties": {
"connection_key": {
"$ref": "#/$defs/MetaFileSystemConnectionKey",
"description": "The MetaFileSystem connection key.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "connection_key",
"title": "Connection Key",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
},
"file_type": {
"$ref": "#/$defs/FileExtensions_",
"description": "The type of files to read from the directory.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "file_info",
"title": "File Type",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
},
"directory_path": {
"description": "The full directory path containing partitioned tabular files.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": "xperiflow.source.app.routines.pbm.store.conn.partitionedfiletable:PartitionedFileTabularConnection.get_directory_path_bound_options",
"options_callback_kwargs": null,
"state_name": "file_info",
"title": "Directory Path",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
}
},
"required": [
"connection_key",
"file_type",
"directory_path"
],
"title": "PartitionedFileTabularConnection",
"type": "object"
},
"SqlTabularConnection": {
"properties": {
"database_resource": {
"description": "The name of the database resource to connect to.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": "xperiflow.source.app.routines.pbm.store.conn.sqltable:SqlTabularConnection.get_database_resources",
"options_callback_kwargs": null,
"state_name": "database_resource",
"title": "Database Resource",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
},
"database_name": {
"description": "The name of the database to connect to.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": "xperiflow.source.app.routines.pbm.store.conn.sqltable:SqlTabularConnection.get_database_schemas",
"options_callback_kwargs": null,
"state_name": "database_name",
"title": "Database Name",
"tooltip": "Detail:\nNote: If you don\u2019t see the database name that you are looking for in this list, it is recommended that you first move the data to be used within a database that is available within this list.\n\nValidation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
},
"table_name": {
"description": "The name of the table to use.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": "xperiflow.source.app.routines.pbm.store.conn.sqltable:SqlTabularConnection.get_tables",
"options_callback_kwargs": null,
"state_name": "table_name",
"title": "Table Name",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
}
},
"required": [
"database_resource",
"database_name",
"table_name"
],
"title": "SqlTabularConnection",
"type": "object"
},
"TabularConnection": {
"description": "A shared parameter base model dedicated to tabular connections.",
"properties": {
"tabular_connection": {
"anyOf": [
{
"$ref": "#/$defs/SqlTabularConnection"
},
{
"$ref": "#/$defs/FileTabularConnection"
},
{
"$ref": "#/$defs/PartitionedFileTabularConnection"
}
],
"description": "The connection type to use to access the source data.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "connection",
"title": "Connection",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
}
},
"required": [
"tabular_connection"
],
"title": "TabularConnection",
"type": "object"
}
},
"properties": {
"clustering_data_input": {
"$ref": "#/$defs/ClusteringDataInput",
"description": "The data input configuration for the clustering analysis.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "ClusteringDataInput",
"title": "Clustering Data Input",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
}
},
"required": [
"clustering_data_input"
],
"title": "ClusteringFitParams",
"type": "object"
}
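Concretely, a payload satisfying the required fields of ClusteringFitParams above might look like the following (the database resource, database, table, and column names are hypothetical placeholders):

```python
# Illustrative fit parameters using the SQL Server connection variant.
fit_params = {
    "clustering_data_input": {
        "source_data_definition": {
            "tabular_connection": {
                "database_resource": "analytics_db",       # hypothetical
                "database_name": "sales",                  # hypothetical
                "table_name": "regional_performance",      # hypothetical
            }
        },
        "clustering_dimensions": ["Region", "ProductLine"],     # hypothetical columns
        "feature_columns": ["Revenue", "Margin", "CsatScore"],  # hypothetical columns
    }
}
```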
Artifacts:
| Property | Type | Required | Description |
|---|---|---|---|
| cluster_intersection | unknown | Yes | Parquet file containing data about the clustering intersections and which cluster they belong to. |
| cluster_descriptions | unknown | Yes | Parquet file containing data about the clusters created by the clustering fit method. |
| data_utilized | DataFrame | Yes | Parquet file containing the data utilized in the clustering fit method. |
Artifact Schema (JSON):
{
"additionalProperties": true,
"properties": {
"cluster_intersection": {
"description": "Parquet file containing data about the clustering intersections and which cluster they belong to.",
"io_factory_kwargs": {},
"preview_factory_kwargs": null,
"preview_factory_type": null,
"statistic_factory_kwargs": null,
"statistic_factory_type": null,
"title": "Clustering Intersection Results"
},
"cluster_descriptions": {
"description": "Parquet file containing data about the clusters created by the clustering fit method.",
"io_factory_kwargs": {},
"preview_factory_kwargs": null,
"preview_factory_type": null,
"statistic_factory_kwargs": null,
"statistic_factory_type": null,
"title": "Clustering Descriptions"
},
"data_utilized": {
"description": "Parquet file containing the data utilized in the clustering fit method.",
"io_factory_kwargs": {},
"preview_factory_kwargs": null,
"preview_factory_type": null,
"statistic_factory_kwargs": null,
"statistic_factory_type": null,
"title": "Data Utilized",
"type": "DataFrame"
}
},
"required": [
"cluster_intersection",
"cluster_descriptions",
"data_utilized"
],
"title": "ClusteringFitArtifacts",
"type": "object"
}
2. Predict
Method Name: predict
Short Description: Abstract Predict Method
Detailed Description: This specifies the necessary input and output parameters for the predict method on all clustering routines. The input parameters contain a datasource with observations to assign to the fitted clusters.
Inputs:
| Property | Type | Required | Description |
|---|---|---|---|
| datasource | #/$defs/TabularConnection | Yes | Select the datasource containing observations to assign to clusters. |
Input Schema (JSON):
{
"$defs": {
"FileExtensions_": {
"description": "File Extensions.",
"enum": [
".csv",
".tsv",
".psv",
".parquet",
".xlsx"
],
"title": "FileExtensions_",
"type": "string"
},
"FileTabularConnection": {
"properties": {
"connection_key": {
"$ref": "#/$defs/MetaFileSystemConnectionKey",
"description": "The MetaFileSystem connection key.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "connection_key",
"title": "Connection Key",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
},
"file_path": {
"description": "The full file path to the file to ingest.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": "xperiflow.source.app.routines.pbm.store.conn.filetable:FileTabularConnection.get_file_path_bound_options",
"options_callback_kwargs": null,
"state_name": "file_path",
"title": "File Path",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
}
},
"required": [
"connection_key",
"file_path"
],
"title": "FileTabularConnection",
"type": "object"
},
"MetaFileSystemConnectionKey": {
"enum": [
"sql-server-routine",
"sql-server-shared"
],
"title": "MetaFileSystemConnectionKey",
"type": "string"
},
"PartitionedFileTabularConnection": {
"properties": {
"connection_key": {
"$ref": "#/$defs/MetaFileSystemConnectionKey",
"description": "The MetaFileSystem connection key.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "connection_key",
"title": "Connection Key",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
},
"file_type": {
"$ref": "#/$defs/FileExtensions_",
"description": "The type of files to read from the directory.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "file_info",
"title": "File Type",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
},
"directory_path": {
"description": "The full directory path containing partitioned tabular files.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": "xperiflow.source.app.routines.pbm.store.conn.partitionedfiletable:PartitionedFileTabularConnection.get_directory_path_bound_options",
"options_callback_kwargs": null,
"state_name": "file_info",
"title": "Directory Path",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
}
},
"required": [
"connection_key",
"file_type",
"directory_path"
],
"title": "PartitionedFileTabularConnection",
"type": "object"
},
"SqlTabularConnection": {
"properties": {
"database_resource": {
"description": "The name of the database resource to connect to.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": "xperiflow.source.app.routines.pbm.store.conn.sqltable:SqlTabularConnection.get_database_resources",
"options_callback_kwargs": null,
"state_name": "database_resource",
"title": "Database Resource",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
},
"database_name": {
"description": "The name of the database to connect to.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": "xperiflow.source.app.routines.pbm.store.conn.sqltable:SqlTabularConnection.get_database_schemas",
"options_callback_kwargs": null,
"state_name": "database_name",
"title": "Database Name",
"tooltip": "Detail:\nNote: If you don\u2019t see the database name that you are looking for in this list, it is recommended that you first move the data to be used within a database that is available within this list.\n\nValidation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
},
"table_name": {
"description": "The name of the table to use.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": "xperiflow.source.app.routines.pbm.store.conn.sqltable:SqlTabularConnection.get_tables",
"options_callback_kwargs": null,
"state_name": "table_name",
"title": "Table Name",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
}
},
"required": [
"database_resource",
"database_name",
"table_name"
],
"title": "SqlTabularConnection",
"type": "object"
},
"TabularConnection": {
"description": "A shared parameter base model dedicated to tabular connections.",
"properties": {
"tabular_connection": {
"anyOf": [
{
"$ref": "#/$defs/SqlTabularConnection"
},
{
"$ref": "#/$defs/FileTabularConnection"
},
{
"$ref": "#/$defs/PartitionedFileTabularConnection"
}
],
"description": "The connection type to use to access the source data.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "connection",
"title": "Connection",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
}
},
"required": [
"tabular_connection"
],
"title": "TabularConnection",
"type": "object"
}
},
"properties": {
"datasource": {
"$ref": "#/$defs/TabularConnection",
"description": "Select the datasource containing observations to assign to clusters.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "PredictDataSelection",
"title": "Prediction Datasource",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
}
},
"required": [
"datasource"
],
"title": "ClusteringAnalysisPredictParameters",
"type": "object"
}
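A matching illustrative predict payload, here using the MetaFileSystem (file) connection variant; the file path is a hypothetical placeholder, while the connection key is one of the enum values defined in the schema above:

```python
predict_params = {
    "datasource": {
        "tabular_connection": {
            "connection_key": "sql-server-shared",          # enum value from the schema
            "file_path": "/data/new_observations.parquet",  # hypothetical path
        }
    }
}
```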
Artifacts:
| Property | Type | Required | Description |
|---|---|---|---|
| cluster_intersection | unknown | Yes | Parquet file containing data about the clustering intersections and which cluster they belong to. |
| data_utilized | DataFrame | Yes | Parquet file containing the data utilized in the clustering predict method. |
Artifact Schema (JSON):
{
"additionalProperties": true,
"properties": {
"cluster_intersection": {
"description": "Parquet file containing data about the clustering intersections and which cluster they belong to.",
"io_factory_kwargs": {},
"preview_factory_kwargs": null,
"preview_factory_type": null,
"statistic_factory_kwargs": null,
"statistic_factory_type": null,
"title": "Clustering Intersection Results"
},
"data_utilized": {
"description": "Parquet file containing the data utilized in the clustering predict method.",
"io_factory_kwargs": {},
"preview_factory_kwargs": null,
"preview_factory_type": null,
"statistic_factory_kwargs": null,
"statistic_factory_type": null,
"title": "Data Utilized",
"type": "DataFrame"
}
},
"required": [
"cluster_intersection",
"data_utilized"
],
"title": "ClusteringPredictArtifacts",
"type": "object"
}
Developer Docs
Routine Typename: KMeansClusteringAnalysis
| Method Name | Artifact Keys |
|---|---|
| __init__ | N/A |
| fit | cluster_intersection, cluster_descriptions, data_utilized |
| predict | cluster_intersection, data_utilized |
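The fit/predict artifact mapping above reflects the classic two-phase clustering workflow. A self-contained sketch of that lifecycle using plain k-means (illustrative only, not the routine's implementation; the routine adds K-Means++ seeding, feature weighting, and parquet artifact handling on top of this core loop):

```python
import numpy as np

def fit(X: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Fit phase: learn k centroids from the training data (random init for brevity)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign every row to its nearest centroid, then recompute centroid means
        labels = np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if (labels == c).any():
                centroids[c] = X[labels == c].mean(axis=0)
    return centroids

def predict(X_new: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Predict phase: assign each new observation to its nearest fitted centroid."""
    return np.argmin(((X_new[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
```

As in the routine, predict only makes sense against data with the same feature columns used during fit.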