KMeansClusteringAnalysis

Versions

v0.1.0

Basic Information

Class Name: KMeansClusteringAnalysis

Title: Advanced K-Means

Version: 0.1.0

Author: Clustering Analytics Team

Organization: OneStream

Creation Date: 2025-08-18

Default Routine Memory Capacity: 2 GB

Tags

Clustering, Classification, Data Analysis, Data Visualization, Pattern Recognition

Description

Short Description

Cluster data points into distinct groups based on feature similarity using the K-Means++ clustering algorithm.

Long Description

This routine performs K-Means++ clustering on your dataset. Unlike standard K-Means, which randomly selects initial cluster centroids, K-Means++ uses a smarter initialization strategy that probabilistically selects initial centroids that are far apart from each other. This intelligent initialization leads to faster convergence, more consistent clustering results, and reduced sensitivity to initial centroid placement. It helps identify groupings within your data based on feature similarity, enabling deeper insights and informed decision-making. You can customize the number of clusters, the clustering dimensions, the feature dimensions and their weighting, and the clustering algorithm used, making this a flexible and powerful way to analyze your data.
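The initialization strategy described above can be sketched in a few lines. This is an illustrative sketch of the general K-Means++ seeding idea, not the routine's actual implementation: the first centroid is sampled uniformly, and each subsequent centroid is sampled with probability proportional to its squared distance from the nearest centroid already chosen.

```python
import random

def kmeans_pp_init(points, k, rng=None):
    """Pick k initial centroids from `points` (tuples of floats) via K-Means++."""
    rng = rng or random.Random(0)
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Squared distance from each point to its nearest chosen centroid.
        d2 = [min(sum((p - c) ** 2 for p, c in zip(pt, ct)) for ct in centroids)
              for pt in points]
        # Sample the next centroid with probability proportional to d2,
        # so far-away points are strongly favored.
        r = rng.uniform(0, sum(d2))
        acc = 0.0
        for pt, w in zip(points, d2):
            acc += w
            if acc >= r:
                centroids.append(pt)
                break
    return centroids
```

Because already-chosen points have zero distance weight, the sampling naturally spreads the seeds apart, which is the source of the faster convergence and more consistent results noted above.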

Use Cases

1. Performance-Based Clustering

A customer may choose to use this routine to cluster their data based on performance metrics such as sales figures, customer satisfaction scores, or operational efficiency indicators. By grouping similar performance profiles, the customer can identify high-performing segments, target underperforming areas for improvement, and optimize operational strategies. The resulting clusters can, for example, be used to find similarities amongst the high-performing or low-performing clusters to draw insights into what factors contribute to success or failure.

2. Attribute-Based Clustering

A customer may choose to use this routine to cluster their data based on specific attributes such as demographics, product features, or customer behaviors. By grouping similar attributes, the customer can identify patterns and trends within their data, enabling targeted marketing strategies, product development, and customer segmentation efforts. The resulting clusters can, for example, be used to find similarities amongst the different attribute clusters to draw insights into what attributes are most common in each cluster and how they may be causally related to performance outcomes.

Routine Methods

1. Init (Constructor)
  • Method: __init__
    • Type: Constructor

    • Memory Capacity: 2.0 GB

    • Allow In-Memory Execution: Yes

    • Read Only: No

    • Method Limits: N/A

    • Outputs Dynamic Artifacts: No

    • Short Description:

      • Initializes the ClusteringAnalysis routine with the provided API and parameters.
    • Detailed Description:

      • This constructor sets up an instance of the clustering analysis routine. The configuration supplied here applies for the lifetime of the routine: clustering dimensions, feature dimensions, and related settings are drawn from a single data source that cannot be changed once the routine is created. To use a different data source, create a new instance of the routine.
    • Inputs:

      • Required Input
        • Deterministic Model Configuration: Whether or not to use deterministic clustering algorithm for this analysis.
          • Name: deterministic_model
          • Tooltip:
            • Detail:
              • Please define if the clustering algorithm is to be deterministic for this analysis.
            • Validation Constraints:
              • This input may be subject to other validation constraints at runtime.
          • Type: bool
        • KMeans Hyperparameters: The number of clusters to use for this analysis.
          • Name: n_clusters
          • Tooltip:
            • Detail:
              • Please select the number of clusters to use for this analysis.
            • Validation Constraints:
              • This input may be subject to other validation constraints at runtime.
          • Type: str
        • Model Configuration: The name of the clustering algorithm to use for this analysis.
          • Name: clustering_algorithm_name
          • Tooltip:
            • Detail:
              • The clustering algorithm to use for this analysis.
            • Validation Constraints:
              • This input may be subject to other validation constraints at runtime.
          • Type: Literal
    • Artifacts: No artifacts are returned by this method
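The constructor inputs above can be mirrored as a plain payload. This is a hypothetical sketch: the field names and types come from the documentation, but the allowed `Literal` members for `clustering_algorithm_name` are not listed there, so the values below are assumptions.

```python
from typing import Literal, get_args

# Assumed algorithm names; the real Literal members are not enumerated in this doc.
AlgorithmName = Literal["kmeans", "kmeans++"]

init_params = {
    "deterministic_model": True,              # bool: use a deterministic algorithm
    "n_clusters": "5",                        # str: note the documented type is str, not int
    "clustering_algorithm_name": "kmeans++",  # Literal: one of the allowed algorithm names
}
```

Note that `n_clusters` is documented as a `str`, so values should be passed as strings rather than integers.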

2. Fit (Method)
  • Method: fit
    • Type: Method

    • Memory Capacity: 2.0 GB

    • Allow In-Memory Execution: No

    • Read Only: No

    • Method Limits: During scale testing, this method handled datasets of up to 900,000 rows and 10 feature columns without issue. Larger datasets may cause a timeout error depending on system resources and the execution environment.

    • Outputs Dynamic Artifacts: No

    • Short Description:

      • Fits the clustering analysis model to the provided parameters.
    • Detailed Description:

      • This method will take the parameters provided by the user and fit the clustering analysis model to them. This will include clustering dimensions, feature dimensions, etc. The user can specify the number of clusters, the clustering algorithm, and the feature weighting method to use for the analysis.
    • Inputs:

      • Required Input
        • Clustering Data Input: The data input configuration for the clustering analysis.
          • Name: clustering_data_input
          • Tooltip:
            • Validation Constraints:
              • This input may be subject to other validation constraints at runtime.
          • Type: Must be an instance of Clustering Data Configuration
          • Nested Model: Clustering Data Configuration
            • Required Input
              • Source Data Definition: Source Data Definition.
                • Name: source_data_definition
                • Tooltip:
                  • Validation Constraints:
                    • This input may be subject to other validation constraints at runtime.
                • Type: Must be an instance of Tabular Connection
                • Nested Model: Tabular Connection
                  • Required Input
                    • Connection: The connection type to use to access the source data.
                      • Name: tabular_connection
                      • Tooltip:
                        • Validation Constraints:
                          • This input may be subject to other validation constraints at runtime.
                      • Type: Must be one of the following
                        • SQL Server Connection
                          • Required Input
                            • Database Resource: The name of the database resource to connect to.
                              • Name: database_resource
                              • Tooltip:
                                • Validation Constraints:
                                  • This input may be subject to other validation constraints at runtime.
                              • Type: str
                            • Database Name: The name of the database to connect to.
                              • Name: database_name
                              • Tooltip:
                                • Detail:
                                  • Note: If you don’t see the database name that you are looking for in this list, it is recommended that you first move the data to be used within a database that is available within this list.
                                • Validation Constraints:
                                  • This input may be subject to other validation constraints at runtime.
                              • Type: str
                            • Table Name: The name of the table to use.
                              • Name: table_name
                              • Tooltip:
                                • Validation Constraints:
                                  • This input may be subject to other validation constraints at runtime.
                              • Type: str
                        • MetaFileSystem Connection
                          • Required Input
                            • Connection Key: The MetaFileSystem connection key.
                              • Name: connection_key
                              • Tooltip:
                                • Validation Constraints:
                                  • This input may be subject to other validation constraints at runtime.
                              • Type: MetaFileSystemConnectionKey
                            • File Path: The full file path to the file to ingest.
                              • Name: file_path
                              • Tooltip:
                                • Validation Constraints:
                                  • This input may be subject to other validation constraints at runtime.
                              • Type: str
                        • Partitioned MetaFileSystem Connection
                          • Required Input
                            • Connection Key: The MetaFileSystem connection key.
                              • Name: connection_key
                              • Tooltip:
                                • Validation Constraints:
                                  • This input may be subject to other validation constraints at runtime.
                              • Type: MetaFileSystemConnectionKey
                            • File Type: The type of files to read from the directory.
                              • Name: file_type
                              • Tooltip:
                                • Validation Constraints:
                                  • This input may be subject to other validation constraints at runtime.
                              • Type: FileExtensions_
                            • Directory Path: The full directory path containing partitioned tabular files.
                              • Name: directory_path
                              • Tooltip:
                                • Validation Constraints:
                                  • This input may be subject to other validation constraints at runtime.
                              • Type: str
              • Clustering Dimensions: The unique combination of column values that define the “entity” that you are trying to compare to others.
                • Name: clustering_dimensions
                • Tooltip:
                  • Validation Constraints:
                    • This input may be subject to other validation constraints at runtime.
                • Type: list[str]
              • Feature Columns: Columns that you want to use to calculate the cluster segments.
                • Name: feature_columns
                • Tooltip:
                  • Validation Constraints:
                    • This input may be subject to other validation constraints at runtime.
                • Type: list[str]
    • Artifacts:

      • Clustering Intersection Results: Parquet file containing data about the clustering intersections and which cluster they belong to.

        • Qualified Key Annotation: cluster_intersection
        • Aggregate Artifact: False
        • In-Memory Json Accessible: False
        • File Annotations:
          • artifacts_/@cluster_intersection/data_/data_<int>.parquet
            • A partitioned set of parquet files where each file will have no more than 1000000 rows.
      • Clustering Descriptions: Parquet file containing data about the clusters created by the clustering fit method.

        • Qualified Key Annotation: cluster_descriptions
        • Aggregate Artifact: False
        • In-Memory Json Accessible: False
        • File Annotations:
          • artifacts_/@cluster_descriptions/data_/data_<int>.parquet
            • A partitioned set of parquet files where each file will have no more than 1000000 rows.
      • Data Utilized: Parquet file containing the data utilized in the clustering fit method.

        • Qualified Key Annotation: data_utilized
        • Aggregate Artifact: False
        • In-Memory Json Accessible: False
        • File Annotations:
          • artifacts_/@data_utilized/data_/data_<int>.parquet
            • A partitioned set of parquet files where each file will have no more than 1000000 rows.
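The nested input model above can be expressed as a single payload. This is a hypothetical example using the SQL Server connection variant; the structure follows the schema, but the resource, database, and column names are invented for illustration.

```python
# Hypothetical fit payload mirroring the nested Clustering Data Configuration model.
clustering_data_input = {
    "source_data_definition": {
        "tabular_connection": {                      # SQL Server Connection variant
            "database_resource": "my-sql-resource",  # assumed value
            "database_name": "AnalyticsDB",          # assumed value
            "table_name": "SalesByRegion",           # assumed value
        }
    },
    # Columns whose unique value combinations define each "entity" to compare.
    "clustering_dimensions": ["Region", "Product"],
    # Columns used to calculate the cluster segments.
    "feature_columns": ["Revenue", "Margin", "Units"],
}

# The schema marks all three top-level fields as required.
required = {"source_data_definition", "clustering_dimensions", "feature_columns"}
```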
3. Predict (Method)
  • Method: predict
    • Type: Method

    • Memory Capacity: 2.0 GB

    • Allow In-Memory Execution: No

    • Read Only: No

    • Method Limits: During scale testing, this method handled datasets of up to 3,000,000 rows and 10 feature columns without issue. Larger datasets may cause a timeout error depending on system resources and the execution environment.

    • Outputs Dynamic Artifacts: No

    • Short Description:

      • Assigns clusters to new data based on the fitted clustering analysis model.
    • Detailed Description:

      • This method will take the parameters provided by the user and assign clusters to new data based on the fitted clustering analysis model. The user must provide a data source that contains the same clustering dimensions and feature dimensions as the data used to fit the model.
    • Inputs:

      • Required Input
        • Prediction Datasource: Select the datasource containing observations to assign to clusters.
          • Name: datasource
          • Tooltip:
            • Validation Constraints:
              • This input may be subject to other validation constraints at runtime.
          • Type: Must be an instance of Tabular Connection
          • Nested Model: Tabular Connection
            • Required Input
              • Connection: The connection type to use to access the source data.
                • Name: tabular_connection
                • Tooltip:
                  • Validation Constraints:
                    • This input may be subject to other validation constraints at runtime.
                • Type: Must be one of the following
                  • SQL Server Connection
                    • Required Input
                      • Database Resource: The name of the database resource to connect to.
                        • Name: database_resource
                        • Tooltip:
                          • Validation Constraints:
                            • This input may be subject to other validation constraints at runtime.
                        • Type: str
                      • Database Name: The name of the database to connect to.
                        • Name: database_name
                        • Tooltip:
                          • Detail:
                            • Note: If you don’t see the database name that you are looking for in this list, it is recommended that you first move the data to be used within a database that is available within this list.
                          • Validation Constraints:
                            • This input may be subject to other validation constraints at runtime.
                        • Type: str
                      • Table Name: The name of the table to use.
                        • Name: table_name
                        • Tooltip:
                          • Validation Constraints:
                            • This input may be subject to other validation constraints at runtime.
                        • Type: str
                  • MetaFileSystem Connection
                    • Required Input
                      • Connection Key: The MetaFileSystem connection key.
                        • Name: connection_key
                        • Tooltip:
                          • Validation Constraints:
                            • This input may be subject to other validation constraints at runtime.
                        • Type: MetaFileSystemConnectionKey
                      • File Path: The full file path to the file to ingest.
                        • Name: file_path
                        • Tooltip:
                          • Validation Constraints:
                            • This input may be subject to other validation constraints at runtime.
                        • Type: str
                  • Partitioned MetaFileSystem Connection
                    • Required Input
                      • Connection Key: The MetaFileSystem connection key.
                        • Name: connection_key
                        • Tooltip:
                          • Validation Constraints:
                            • This input may be subject to other validation constraints at runtime.
                        • Type: MetaFileSystemConnectionKey
                      • File Type: The type of files to read from the directory.
                        • Name: file_type
                        • Tooltip:
                          • Validation Constraints:
                            • This input may be subject to other validation constraints at runtime.
                        • Type: FileExtensions_
                      • Directory Path: The full directory path containing partitioned tabular files.
                        • Name: directory_path
                        • Tooltip:
                          • Validation Constraints:
                            • This input may be subject to other validation constraints at runtime.
                        • Type: str
    • Artifacts:

      • Clustering Intersection Results: Parquet file containing data about the clustering intersections and which cluster they belong to.

        • Qualified Key Annotation: cluster_intersection
        • Aggregate Artifact: False
        • In-Memory Json Accessible: False
        • File Annotations:
          • artifacts_/@cluster_intersection/data_/data_<int>.parquet
            • A partitioned set of parquet files where each file will have no more than 1000000 rows.
      • Data Utilized: Parquet file containing the data utilized in the clustering predict method.

        • Qualified Key Annotation: data_utilized
        • Aggregate Artifact: False
        • In-Memory Json Accessible: False
        • File Annotations:
          • artifacts_/@data_utilized/data_/data_<int>.parquet
            • A partitioned set of parquet files where each file will have no more than 1000000 rows.
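Conceptually, predict assigns each new observation to the nearest centroid produced by fit. The sketch below illustrates that assignment step only; it is not the routine's code, and the data source handling and artifact writing described above are omitted.

```python
def assign_clusters(rows, centroids):
    """Assign each row (tuple of floats) to the index of its nearest centroid."""
    labels = []
    for row in rows:
        # Squared Euclidean distance to each fitted centroid.
        d2 = [sum((x - c) ** 2 for x, c in zip(row, ct)) for ct in centroids]
        labels.append(d2.index(min(d2)))
    return labels
```

This is why the prediction datasource must contain the same clustering dimensions and feature columns as the fitting data: the distance computation assumes an identical feature layout.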

Interface Definitions

1. Clustering Analysis Interface

An interface class requiring fit and predict methods to be implemented.

This BaseRoutineInterface class enforces a common interface for all clustering routines. The interface requires each clustering routine to implement a fit method and a predict method with the same input parameters. Each concrete class will have a constructor method where hyperparameters specific to the clustering algorithm may be set; however, this interface does not enforce any specific constructor method.
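A minimal sketch of this contract, assuming Python's `abc` module (the actual BaseRoutineInterface implementation is not shown in this document, and the concrete class names below are illustrative):

```python
from abc import ABC, abstractmethod

class ClusteringAnalysisInterface(ABC):
    """Every clustering routine must implement fit and predict with these inputs;
    constructors are left to the concrete classes."""

    @abstractmethod
    def fit(self, clustering_data_input): ...

    @abstractmethod
    def predict(self, datasource): ...

class KMeansRoutine(ClusteringAnalysisInterface):
    # Algorithm-specific hyperparameters live in the constructor,
    # which the interface deliberately does not constrain.
    def __init__(self, n_clusters="5"):
        self.n_clusters = n_clusters

    def fit(self, clustering_data_input):
        return "fitted"

    def predict(self, datasource):
        return "assigned"
```

Because `fit` and `predict` are abstract, instantiating a routine that fails to implement either raises a `TypeError`, which is how the common interface is enforced.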

Interface Methods:

1. Fit

Method Name: fit

Short Description: Abstract Fit Method

Detailed Description: This specifies the necessary input and output parameters for the fit method on all clustering routines. The input parameters contain a data input configuration with a source data definition, clustering dimensions, and feature columns to fit the clustering model to.

Inputs:

Property              | Type                        | Required | Description
clustering_data_input | #/$defs/ClusteringDataInput | Yes      | The data input configuration for the clustering analysis.

Input Schema (JSON):

{
"$defs": {
"ClusteringDataInput": {
"properties": {
"source_data_definition": {
"$ref": "#/$defs/TabularConnection",
"description": "Source Data Definition",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "SourceDataDefinition",
"title": "Source Data Definition",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
},
"clustering_dimensions": {
"description": "The unique combination of column values that define the \u201centity\u201d that you are trying to compare to others.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"items": {
"type": "string"
},
"long_description": null,
"options_callback": "xperiflow.source.app.routines.store.clustering_analysis.clustering.pbm.clustering_pbms:ClusteringDataInput.get_dimension_options",
"options_callback_kwargs": null,
"state_name": "ClusteringDimensions",
"title": "Clustering Dimensions",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "array"
},
"feature_columns": {
"description": "Columns that you want to use to calculate the cluster segments",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"items": {
"type": "string"
},
"long_description": null,
"options_callback": "xperiflow.source.app.routines.store.clustering_analysis.clustering.pbm.clustering_pbms:ClusteringDataInput.get_feature_options",
"options_callback_kwargs": null,
"state_name": "FeatureColumns",
"title": "Feature Columns",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "array"
}
},
"required": [
"source_data_definition",
"clustering_dimensions",
"feature_columns"
],
"title": "ClusteringDataInput",
"type": "object"
},
"FileExtensions_": {
"description": "File Extensions.",
"enum": [
".csv",
".tsv",
".psv",
".parquet",
".xlsx"
],
"title": "FileExtensions_",
"type": "string"
},
"FileTabularConnection": {
"properties": {
"connection_key": {
"$ref": "#/$defs/MetaFileSystemConnectionKey",
"description": "The MetaFileSystem connection key.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "connection_key",
"title": "Connection Key",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
},
"file_path": {
"description": "The full file path to the file to ingest.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": "xperiflow.source.app.routines.pbm.store.conn.filetable:FileTabularConnection.get_file_path_bound_options",
"options_callback_kwargs": null,
"state_name": "file_path",
"title": "File Path",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
}
},
"required": [
"connection_key",
"file_path"
],
"title": "FileTabularConnection",
"type": "object"
},
"MetaFileSystemConnectionKey": {
"enum": [
"sql-server-routine",
"sql-server-shared"
],
"title": "MetaFileSystemConnectionKey",
"type": "string"
},
"PartitionedFileTabularConnection": {
"properties": {
"connection_key": {
"$ref": "#/$defs/MetaFileSystemConnectionKey",
"description": "The MetaFileSystem connection key.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "connection_key",
"title": "Connection Key",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
},
"file_type": {
"$ref": "#/$defs/FileExtensions_",
"description": "The type of files to read from the directory.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "file_info",
"title": "File Type",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
},
"directory_path": {
"description": "The full directory path containing partitioned tabular files.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": "xperiflow.source.app.routines.pbm.store.conn.partitionedfiletable:PartitionedFileTabularConnection.get_directory_path_bound_options",
"options_callback_kwargs": null,
"state_name": "file_info",
"title": "Directory Path",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
}
},
"required": [
"connection_key",
"file_type",
"directory_path"
],
"title": "PartitionedFileTabularConnection",
"type": "object"
},
"SqlTabularConnection": {
"properties": {
"database_resource": {
"description": "The name of the database resource to connect to.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": "xperiflow.source.app.routines.pbm.store.conn.sqltable:SqlTabularConnection.get_database_resources",
"options_callback_kwargs": null,
"state_name": "database_resource",
"title": "Database Resource",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
},
"database_name": {
"description": "The name of the database to connect to.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": "xperiflow.source.app.routines.pbm.store.conn.sqltable:SqlTabularConnection.get_database_schemas",
"options_callback_kwargs": null,
"state_name": "database_name",
"title": "Database Name",
"tooltip": "Detail:\nNote: If you don\u2019t see the database name that you are looking for in this list, it is recommended that you first move the data to be used within a database that is available within this list.\n\nValidation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
},
"table_name": {
"description": "The name of the table to use.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": "xperiflow.source.app.routines.pbm.store.conn.sqltable:SqlTabularConnection.get_tables",
"options_callback_kwargs": null,
"state_name": "table_name",
"title": "Table Name",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
"type": "string"
}
},
"required": [
"database_resource",
"database_name",
"table_name"
],
"title": "SqlTabularConnection",
"type": "object"
},
"TabularConnection": {
"description": "A shared parameter base model dedication to tabular connections.",
"properties": {
"tabular_connection": {
"anyOf": [
{
"$ref": "#/$defs/SqlTabularConnection"
},
{
"$ref": "#/$defs/FileTabularConnection"
},
{
"$ref": "#/$defs/PartitionedFileTabularConnection"
}
],
"description": "The connection type to use to access the source data.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "connection",
"title": "Connection",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
}
},
"required": [
"tabular_connection"
],
"title": "TabularConnection",
"type": "object"
}
},
"properties": {
"clustering_data_input": {
"$ref": "#/$defs/ClusteringDataInput",
"description": "The data input configuration for the clustering analysis.",
"field_type": "input",
"input_component": {
"component_type": "combobox",
"show_search": true
},
"long_description": null,
"options_callback": null,
"options_callback_kwargs": null,
"state_name": "ClusteringDataInput",
"title": "Clustering Data Input",
"tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
}
},
"required": [
"clustering_data_input"
],
"title": "ClusteringFitParams",
"type": "object"
}

Artifacts:

Property             | Type      | Required | Description
cluster_intersection | unknown   | Yes      | Parquet file containing data about the clustering intersections and which cluster they belong to.
cluster_descriptions | unknown   | Yes      | Parquet file containing data about the clusters created by the clustering fit method.
data_utilized        | DataFrame | Yes      | Parquet file containing the data utilized in the clustering fit method.

Artifact Schema (JSON):

{
"additionalProperties": true,
"properties": {
"cluster_intersection": {
"description": "Parquet file containing data about the clustering intersections and which cluster they belong to.",
"io_factory_kwargs": {},
"preview_factory_kwargs": null,
"preview_factory_type": null,
"statistic_factory_kwargs": null,
"statistic_factory_type": null,
"title": "Clustering Intersection Results"
},
"cluster_descriptions": {
"description": "Parquet file containing data about the clusters created by the clustering fit method.",
"io_factory_kwargs": {},
"preview_factory_kwargs": null,
"preview_factory_type": null,
"statistic_factory_kwargs": null,
"statistic_factory_type": null,
"title": "Clustering Descriptions"
},
"data_utilized": {
"description": "Parquet file containing the data utilized in the clustering fit method.",
"io_factory_kwargs": {},
"preview_factory_kwargs": null,
"preview_factory_type": null,
"statistic_factory_kwargs": null,
"statistic_factory_type": null,
"title": "Data Utilized",
"type": "DataFrame"
}
},
"required": [
"cluster_intersection",
"cluster_descriptions",
"data_utilized"
],
"title": "ClusteringFitArtifacts",
"type": "object"
}
2. Predict

Method Name: predict

Short Description: Abstract Predict Method

Detailed Description: This specifies the necessary input and output parameters for the predict method on all clustering routines. The input parameters contain a datasource of observations to assign to the fitted clusters.

Inputs:

Property   | Type                      | Required | Description
datasource | #/$defs/TabularConnection | Yes      | Select the datasource containing observations to assign to clusters.

Input Schema (JSON):

{
  "$defs": {
    "FileExtensions_": {
      "description": "File Extensions.",
      "enum": [
        ".csv",
        ".tsv",
        ".psv",
        ".parquet",
        ".xlsx"
      ],
      "title": "FileExtensions_",
      "type": "string"
    },
    "FileTabularConnection": {
      "properties": {
        "connection_key": {
          "$ref": "#/$defs/MetaFileSystemConnectionKey",
          "description": "The MetaFileSystem connection key.",
          "field_type": "input",
          "input_component": {
            "component_type": "combobox",
            "show_search": true
          },
          "long_description": null,
          "options_callback": null,
          "options_callback_kwargs": null,
          "state_name": "connection_key",
          "title": "Connection Key",
          "tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
        },
        "file_path": {
          "description": "The full file path to the file to ingest.",
          "field_type": "input",
          "input_component": {
            "component_type": "combobox",
            "show_search": true
          },
          "long_description": null,
          "options_callback": "xperiflow.source.app.routines.pbm.store.conn.filetable:FileTabularConnection.get_file_path_bound_options",
          "options_callback_kwargs": null,
          "state_name": "file_path",
          "title": "File Path",
          "tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
          "type": "string"
        }
      },
      "required": [
        "connection_key",
        "file_path"
      ],
      "title": "FileTabularConnection",
      "type": "object"
    },
    "MetaFileSystemConnectionKey": {
      "enum": [
        "sql-server-routine",
        "sql-server-shared"
      ],
      "title": "MetaFileSystemConnectionKey",
      "type": "string"
    },
    "PartitionedFileTabularConnection": {
      "properties": {
        "connection_key": {
          "$ref": "#/$defs/MetaFileSystemConnectionKey",
          "description": "The MetaFileSystem connection key.",
          "field_type": "input",
          "input_component": {
            "component_type": "combobox",
            "show_search": true
          },
          "long_description": null,
          "options_callback": null,
          "options_callback_kwargs": null,
          "state_name": "connection_key",
          "title": "Connection Key",
          "tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
        },
        "file_type": {
          "$ref": "#/$defs/FileExtensions_",
          "description": "The type of files to read from the directory.",
          "field_type": "input",
          "input_component": {
            "component_type": "combobox",
            "show_search": true
          },
          "long_description": null,
          "options_callback": null,
          "options_callback_kwargs": null,
          "state_name": "file_info",
          "title": "File Type",
          "tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
        },
        "directory_path": {
          "description": "The full directory path containing partitioned tabular files.",
          "field_type": "input",
          "input_component": {
            "component_type": "combobox",
            "show_search": true
          },
          "long_description": null,
          "options_callback": "xperiflow.source.app.routines.pbm.store.conn.partitionedfiletable:PartitionedFileTabularConnection.get_directory_path_bound_options",
          "options_callback_kwargs": null,
          "state_name": "file_info",
          "title": "Directory Path",
          "tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
          "type": "string"
        }
      },
      "required": [
        "connection_key",
        "file_type",
        "directory_path"
      ],
      "title": "PartitionedFileTabularConnection",
      "type": "object"
    },
    "SqlTabularConnection": {
      "properties": {
        "database_resource": {
          "description": "The name of the database resource to connect to.",
          "field_type": "input",
          "input_component": {
            "component_type": "combobox",
            "show_search": true
          },
          "long_description": null,
          "options_callback": "xperiflow.source.app.routines.pbm.store.conn.sqltable:SqlTabularConnection.get_database_resources",
          "options_callback_kwargs": null,
          "state_name": "database_resource",
          "title": "Database Resource",
          "tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
          "type": "string"
        },
        "database_name": {
          "description": "The name of the database to connect to.",
          "field_type": "input",
          "input_component": {
            "component_type": "combobox",
            "show_search": true
          },
          "long_description": null,
          "options_callback": "xperiflow.source.app.routines.pbm.store.conn.sqltable:SqlTabularConnection.get_database_schemas",
          "options_callback_kwargs": null,
          "state_name": "database_name",
          "title": "Database Name",
          "tooltip": "Detail:\nNote: If you don\u2019t see the database name that you are looking for in this list, it is recommended that you first move the data to be used within a database that is available within this list.\n\nValidation Constraints:\nThis input may be subject to other validation constraints at runtime.",
          "type": "string"
        },
        "table_name": {
          "description": "The name of the table to use.",
          "field_type": "input",
          "input_component": {
            "component_type": "combobox",
            "show_search": true
          },
          "long_description": null,
          "options_callback": "xperiflow.source.app.routines.pbm.store.conn.sqltable:SqlTabularConnection.get_tables",
          "options_callback_kwargs": null,
          "state_name": "table_name",
          "title": "Table Name",
          "tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime.",
          "type": "string"
        }
      },
      "required": [
        "database_resource",
        "database_name",
        "table_name"
      ],
      "title": "SqlTabularConnection",
      "type": "object"
    },
    "TabularConnection": {
      "description": "A shared parameter base model dedication to tabular connections.",
      "properties": {
        "tabular_connection": {
          "anyOf": [
            {
              "$ref": "#/$defs/SqlTabularConnection"
            },
            {
              "$ref": "#/$defs/FileTabularConnection"
            },
            {
              "$ref": "#/$defs/PartitionedFileTabularConnection"
            }
          ],
          "description": "The connection type to use to access the source data.",
          "field_type": "input",
          "input_component": {
            "component_type": "combobox",
            "show_search": true
          },
          "long_description": null,
          "options_callback": null,
          "options_callback_kwargs": null,
          "state_name": "connection",
          "title": "Connection",
          "tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
        }
      },
      "required": [
        "tabular_connection"
      ],
      "title": "TabularConnection",
      "type": "object"
    }
  },
  "properties": {
    "datasource": {
      "$ref": "#/$defs/TabularConnection",
      "description": "Select the datasource containing observations to assign to clusters.",
      "field_type": "input",
      "input_component": {
        "component_type": "combobox",
        "show_search": true
      },
      "long_description": null,
      "options_callback": null,
      "options_callback_kwargs": null,
      "state_name": "PredictDataSelection",
      "title": "Prediction Datasource",
      "tooltip": "Validation Constraints:\nThis input may be subject to other validation constraints at runtime."
    }
  },
  "required": [
    "datasource"
  ],
  "title": "ClusteringAnalysisPredictParameters",
  "type": "object"
}
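
The schema's `tabular_connection` field accepts any one of the three connection variants. As a sketch only, a hypothetical payload using the `SqlTabularConnection` variant might look like the following; the resource, database, and table names are placeholder values (in practice each comes from the corresponding `options_callback` combobox), not values taken from this document:

```python
import json

# Hypothetical predict input payload using the SqlTabularConnection
# variant of the "tabular_connection" anyOf. All three values below
# are placeholders chosen for illustration.
payload = {
    "datasource": {
        "tabular_connection": {
            "database_resource": "my-sql-resource",  # from get_database_resources
            "database_name": "AnalyticsDB",          # from get_database_schemas
            "table_name": "dbo.Observations",        # from get_tables
        }
    }
}

# Basic structural check against the schema's "required" lists:
# ClusteringAnalysisPredictParameters requires "datasource", and
# SqlTabularConnection requires all three connection fields.
assert "datasource" in payload
conn = payload["datasource"]["tabular_connection"]
assert {"database_resource", "database_name", "table_name"} <= conn.keys()

print(json.dumps(payload, indent=2))
```

A `FileTabularConnection` payload would instead supply `connection_key` (one of the `MetaFileSystemConnectionKey` enum values) and `file_path`; a `PartitionedFileTabularConnection` payload would supply `connection_key`, `file_type`, and `directory_path`.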

Artifacts:

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| cluster_intersection | unknown | Yes | Parquet file containing data about the clustering intersections and which cluster they belong to. |
| data_utilized | DataFrame | Yes | Parquet file containing the data utilized in the clustering predict method. |

Artifact Schema (JSON):

{
  "additionalProperties": true,
  "properties": {
    "cluster_intersection": {
      "description": "Parquet file containing data about the clustering intersections and which cluster they belong to.",
      "io_factory_kwargs": {},
      "preview_factory_kwargs": null,
      "preview_factory_type": null,
      "statistic_factory_kwargs": null,
      "statistic_factory_type": null,
      "title": "Clustering Intersection Results"
    },
    "data_utilized": {
      "description": "Parquet file containing the data utilized in the clustering predict method.",
      "io_factory_kwargs": {},
      "preview_factory_kwargs": null,
      "preview_factory_type": null,
      "statistic_factory_kwargs": null,
      "statistic_factory_type": null,
      "title": "Data Utilized",
      "type": "DataFrame"
    }
  },
  "required": [
    "cluster_intersection",
    "data_utilized"
  ],
  "title": "ClusteringPredictArtifacts",
  "type": "object"
}
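
Because `additionalProperties` is true, a predict run may emit extra artifacts, but both required keys must be present. A minimal sketch of how a caller might check a returned artifact mapping against the schema's `required` list (the `artifacts` dict and the parquet paths in it are assumptions for illustration; the routine's actual return type is not documented here):

```python
# Required artifact keys, taken from the ClusteringPredictArtifacts schema.
REQUIRED_PREDICT_ARTIFACTS = {"cluster_intersection", "data_utilized"}

# Hypothetical mapping of artifact key -> output parquet path.
artifacts = {
    "cluster_intersection": "out/cluster_intersection.parquet",  # placeholder path
    "data_utilized": "out/data_utilized.parquet",                # placeholder path
}

# Fail loudly if any required artifact is missing.
missing = REQUIRED_PREDICT_ARTIFACTS - artifacts.keys()
if missing:
    raise KeyError(f"predict artifacts missing required keys: {sorted(missing)}")

# additionalProperties is true, so keys beyond the required set are allowed.
print("artifact mapping OK:", sorted(artifacts))
```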

Developer Docs

Routine Typename: KMeansClusteringAnalysis

| Method Name | Artifact Keys |
| --- | --- |
| `__init__` | N/A |
| `fit` | cluster_intersection, cluster_descriptions, data_utilized |
| `predict` | cluster_intersection, data_utilized |
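
Per the table above, `predict` emits a strict subset of the artifacts that `fit` does: only `fit` produces `cluster_descriptions`. A quick sanity check of that relationship, using just the key sets listed in the table (the routine's actual Python API is not shown in this document):

```python
# Artifact keys per method, as listed in the Developer Docs table.
FIT_ARTIFACTS = {"cluster_intersection", "cluster_descriptions", "data_utilized"}
PREDICT_ARTIFACTS = {"cluster_intersection", "data_utilized"}

# predict regenerates the intersection and data-utilized artifacts,
# but not the cluster descriptions learned during fit.
assert PREDICT_ARTIFACTS < FIT_ARTIFACTS
print(sorted(FIT_ARTIFACTS - PREDICT_ARTIFACTS))  # -> ['cluster_descriptions']
```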
