MLRegressor

Versions

v2.0.0

Basic Information

Class Name: MLRegressor

Title: ML Regression

Version: 2.0.0

Author: Jeff Robinson

Organization: OneStream

Creation Date: 2025-02-18

Default Routine Memory Capacity: 2.0 GB

Description

Short Description

A routine that performs a machine learning regression task.

Long Description

This routine takes in a dataset, performs a machine learning regression task on it, and outputs a model that can be used to predict future values.

Use Cases

1. House Price Prediction (Regression)

House price prediction is a regression problem where the goal is to forecast a continuous outcome—the market value of a property. Using a historical dataset containing various features such as location, square footage, number of bedrooms, and other relevant attributes, a regression model can be trained to predict property prices. The model learns from trends and relationships within the data to provide accurate pricing estimates, enabling stakeholders to make informed decisions regarding investments, pricing strategies, and market analysis.

2. Revenue Forecasting (Regression)

Revenue forecasting is a regression task designed to predict continuous financial outcomes based on historical and current data. In a CPM environment, this involves analyzing past revenue trends, operational metrics, market conditions, and expenditure patterns to build a model that estimates future revenue. The resulting forecasts support budgeting, resource allocation, and strategic decision-making, enabling businesses to proactively manage financial performance, identify growth opportunities, and mitigate risks.

Routine Methods

1. Init (Constructor)

Method: __init__
- Type: Constructor
- Memory Capacity: 2.0 GB
- Allow In-Memory Execution: No
- Read Only: No
- Method Limits: There are no limits for the constructor method.
- Outputs Dynamic Artifacts: No
- Short Description:
  - The constructor for the ML Regression Routine.
- Detailed Description:
  - This constructor initializes the routine with the API instance, data, and configuration parameters needed to train and evaluate a regression model.
- Inputs:
  - Required Input
    - Model Training Configuration Parameters: A mix of required and optional constructor parameters for ML Regression.
      - Name: regression_constructor_params
      - Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
      - Type: Must be an instance of ML Regressor
      - Nested Model: ML Regressor
        
        Required Input
        
        Optimize Model: Metric to optimize during tuning (e.g. MAE, MSE, RMSE, or R-squared).
        
        Name: optimize_model
        
        Tooltip:
        
        Detail:
        
        MAE (Mean Absolute Error): Measures the average absolute difference between predicted and actual values. Lower MAE indicates better performance.
        
        MSE (Mean Squared Error): Computes the average squared differences between predicted and actual values. It penalizes larger errors more heavily; lower MSE is preferable.
        
        RMSE (Root Mean Squared Error): The square root of MSE, providing error in the same units as the target variable. Lower RMSE signifies better model accuracy.
        
        R2 (R-squared): Represents the proportion of variance in the target variable explained by the model. Values closer to 1 indicate a better fit.
        
        MAPE (Mean Absolute Percentage Error): Expresses the prediction accuracy as a percentage, offering an intuitive error measure. Lower percentages denote higher accuracy.
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: str
        
        Training Size: Default is 0.8 (80%). Value must be greater than 0 and less than 1.
        
        Name: train_size
        
        Tooltip:
        
        Validation Constraints:
        
        The input must be greater than 0.
        
        The input must be less than 1.
        
        This input may be subject to other validation constraints at runtime.
        
        Type: float
        
        Advanced Cross-Validation: Optional cross-validation settings. Pass None to accept the default settings.
        
        Name: fold_splitting_options
        
        Tooltip:
        
        Detail:
        
        K-Fold: Provides a general approach for cross-validation in regression tasks. It partitions the dataset into k folds, then iteratively trains on k-1 folds while testing on the remaining fold to provide robust estimates of model performance.
        
        Time Series Split: Ensures temporal order is maintained for forecasting but does not shuffle data, which can limit variance reduction.
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: Must be an instance of Advanced Fold Options
        
        Nested Model: Advanced Fold Options
        
        Required Input
        
        Folds: Number of subsets or splits used in cross-validation. Default is 10, but any number of folds between 1 and 20 can be specified.
        
        Name: fold_num
        
        Tooltip:
        
        Validation Constraints:
        
        The input must be greater than 0.
        
        The input must be less than 20.
        
        This input may be subject to other validation constraints at runtime.
        
        Type: int
        
        Fold Shuffle: Controls the shuffle parameter of cross-validation. Only applicable when fold_strategy is K-Fold.
        
        Name: fold_shuffle
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: bool
        
        Fold Strategy: Choice of cross-validation strategy. Default is K-Fold.
        
        Name: fold_strategy
        
        Tooltip:
        
        Detail:
        
        K-Fold: Provides a general approach for cross-validation in regression tasks. It partitions the dataset into k folds, then iteratively trains on k-1 folds while testing on the remaining fold to provide robust estimates of model performance.
        
        Time Series Split: Ensures temporal order is maintained for forecasting but does not shuffle data, which can limit variance reduction.
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: str
        
        Optional Input
        
        Session ID: Optional model session ID used to ensure reproducibility of experiments. Defaults to 42.
        
        Name: session_id
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: Optional[int]
- Artifacts: No artifacts are returned by this method
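
The metrics listed under optimize_model can be illustrated with a minimal sketch in plain Python. This is not the routine's implementation: the function name `regression_metrics` and the sample values are hypothetical, and the formulas are the standard textbook definitions of each metric.

```python
import math


def regression_metrics(actual, predicted):
    """Compute standard regression metrics for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    mae = sum(abs(e) for e in errors) / n          # average absolute error
    mse = sum(e * e for e in errors) / n           # average squared error
    rmse = math.sqrt(mse)                          # same units as the target

    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)            # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot                     # undefined if all actuals are equal

    # MAPE is undefined when any actual value is 0.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2, "MAPE": mape}


# Hypothetical actual and predicted values.
metrics = regression_metrics([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
```

For MAE, MSE, RMSE, and MAPE a lower value is better, while for R2 a value closer to 1 is better, so any code that ranks models by the chosen metric has to account for the direction.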

2. Create Web App (Method)

Method: create_web_app
- Type: Method
- Memory Capacity: 2.0 GB
- Allow In-Memory Execution: No
- Read Only: No
- Method Limits: There are no limits for this method. It is expected to complete very quickly with a small memory allocation.
- Outputs Dynamic Artifacts: No
- Short Description:
  - Creates the web app and then passes the newly created web app object into an artifact.
- Detailed Description:
  - The run id from this routine method along with the routine instance id are used to create the URL for the web app.
- Inputs:
  - Optional Input
    - Select Formatting for SHAP Artifacts: Optionally choose how many decimal places the SHAP output should be rounded to.
      - Name: format_output
      - Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
      - Type: Optional[str]
- Artifacts:
  - ML Regression Web App: Dashboard to analyze results from the MLRegression predict and train routine runs.
    - Qualified Key Annotation: web_app
    - Aggregate Artifact: False
    - In-Memory Json Accessible: False
    - File Annotations:
      - artifacts_/@web_app/data_/data.appref
        
        JSON file of data relating to the web app.

3. Predict (Method)

Method: predict
- Type: Method
- Memory Capacity: 2.0 GB
- Allow In-Memory Execution: No
- Read Only: No
- Method Limits: This method is influenced primarily by dataset size. For a dataset with 20M rows and 8 feature columns, this method has been known to take between 20-30 minutes to complete with 10GB of memory allocated. With 10M rows and 8 feature columns, it will take closer to 5-10 minutes for this method to complete with 10GB of memory allocated.
- Outputs Dynamic Artifacts: No
- Short Description:
  - Generates predictions from a trained regression model using the provided parameters.
- Detailed Description:
  - This method predicts the regression model data using the provided parameters. This method also generates SHAP values for model interpretability.
- Inputs:
  - Required Input
    - Source Connection: The connection information for the source data.
      - Name: data_connection
      - Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
      - Type: Must be an instance of Tabular Connection
      - Nested Model: Tabular Connection
        
        Required Input
        
        Connection: The connection type to use to access the source data.
        
        Name: tabular_connection
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: Must be one of the following
        
        SQL Server Connection
        
        Required Input
        
        Database Resource: The name of the database resource to connect to.
        
        Name: database_resource
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: str
        
        Database Name: The name of the database to connect to.
        
        Name: database_name
        
        Tooltip:
        
        Detail:
        
        Note: If you don’t see the database name that you are looking for in this list, it is recommended that you first move the data to be used within a database that is available within this list.
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: str
        
        Table Name: The name of the table to use.
        
        Name: table_name
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: str
        
        MetaFileSystem Connection
        
        Required Input
        
        Connection Key: The MetaFileSystem connection key.
        
        Name: connection_key
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: MetaFileSystemConnectionKey
        
        File Path: The full file path to the file to ingest.
        
        Name: file_path
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: str
        
        Partitioned MetaFileSystem Connection
        
        Required Input
        
        Connection Key: The MetaFileSystem connection key.
        
        Name: connection_key
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: MetaFileSystemConnectionKey
        
        File Type: The type of files to read from the directory.
        
        Name: file_type
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: FileExtensions_
        
        Directory Path: The full directory path containing partitioned tabular files.
        
        Name: directory_path
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: str
    - Index Selection: Index field to be included in prediction output.
      - Name: index_selection
      - Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
      - Type: list[str]
  - Optional Input
    - Trained Model: Trained model for making predictions.
      - Name: model_name
      - Tooltip:
        
        Detail:
        
        Trained models available for this routine instance.
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
      - Type: Optional[str]
- Artifacts:
  - Regression Prediction Report: A Regression prediction report containing top 10 predictions (by index), Prediction Distribution Plot and Sorted Prediction Plot.
    - Qualified Key Annotation: prediction_report
    - Aggregate Artifact: False
    - In-Memory Json Accessible: False
    - File Annotations:
      - artifacts_/@prediction_report/data_/document.pdf
        
        A pdf variant of the html file. Please note the interactivity that may be found in the html is lost within the pdf variant.
      - artifacts_/@prediction_report/data_/html_content.html
        
        The html content.
  - Prediction Output: The full prediction dataframe from the predict routine.
    - Qualified Key Annotation: prediction_output
    - Aggregate Artifact: False
    - In-Memory Json Accessible: False
    - File Annotations:
      - artifacts_/@prediction_output/data_/data_<int>.parquet
        
        A partitioned set of parquet files where each file will have no more than 1000000 rows.
  - SHAP Values: The SHAP values from the prediction.
    - Qualified Key Annotation: shap_values
    - Aggregate Artifact: False
    - In-Memory Json Accessible: False
    - File Annotations:
      - artifacts_/@shap_values/data_/data_<int>.parquet
        
        A partitioned set of parquet files where each file will have no more than 1000000 rows.
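
The partitioned parquet outputs above follow the `data_<int>.parquet` naming pattern with at most 1000000 rows per file. The sketch below, using only the standard library, estimates how many files a given row count needs and orders file names by their integer index. The helper names are hypothetical, and the routine may number or size its partitions differently (for example, the starting index is not stated in this page).

```python
import math
import re

MAX_ROWS_PER_FILE = 1_000_000  # per-file row cap stated in the file annotation

_PARTITION_RE = re.compile(r"data_(\d+)\.parquet$")


def partition_count(total_rows, max_rows=MAX_ROWS_PER_FILE):
    """Minimum number of files needed if no file exceeds max_rows."""
    return math.ceil(total_rows / max_rows)


def sort_partitions(file_names):
    """Order data_<int>.parquet names by integer index, ignoring other files."""
    indexed = []
    for name in file_names:
        match = _PARTITION_RE.search(name)
        if match:
            indexed.append((int(match.group(1)), name))
    return [name for _, name in sorted(indexed)]


# Hypothetical directory listing; a plain string sort would put data_10 before data_2.
names = ["data_10.parquet", "data_2.parquet", "data_0.parquet", "schema.json"]
ordered = sort_partitions(names)
files_for_20m_rows = partition_count(20_000_000)
```

Sorting by the parsed integer matters when the partitions are concatenated back into a single table, since lexicographic order does not match numeric order once there are ten or more files.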

4. Predict Shap Interpretation (Method)

Method: predict_shap_interpretation
- Type: Method
- Memory Capacity: 2.0 GB
- Allow In-Memory Execution: Yes
- Read Only: No
- Method Limits: This method is influenced by the size of the input dataset and the number of selected rows to perform SHAP analysis on. For a dataset with 20M rows, performing SHAP analysis on 2M of those rows takes around 15 minutes to complete with a 20GB memory allocation.
- Outputs Dynamic Artifacts: No
- Short Description:
  - Predict model outputs based on the provided parameters, and return SHAP-based explanations.
- Detailed Description:
  - This method uses SHAP (SHapley Additive exPlanations) to interpret the predictions made by the regression model. It provides insights into the model's behavior and the contribution of each feature to the prediction. The method generates SHAP values, a waterfall plot for a specific prediction, and a summary plot to visualize the model's predictions and feature contributions. The SHAP values represent the impact of each feature on the prediction, and the waterfall plot provides a visual representation of how each feature contributes to the final prediction.
- Inputs:
  - Required Input
    - Source Connection: The connection information for the source data.
      - Name: data_connection
      - Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
      - Type: Must be an instance of Tabular Connection
      - Nested Model: Tabular Connection
        
        Required Input
        
        Connection: The connection type to use to access the source data.
        
        Name: tabular_connection
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: Must be one of the following
        
        SQL Server Connection
        
        Required Input
        
        Database Resource: The name of the database resource to connect to.
        
        Name: database_resource
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: str
        
        Database Name: The name of the database to connect to.
        
        Name: database_name
        
        Tooltip:
        
        Detail:
        
        Note: If you don’t see the database name that you are looking for in this list, it is recommended that you first move the data to be used within a database that is available within this list.
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: str
        
        Table Name: The name of the table to use.
        
        Name: table_name
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: str
        
        MetaFileSystem Connection
        
        Required Input
        
        Connection Key: The MetaFileSystem connection key.
        
        Name: connection_key
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: MetaFileSystemConnectionKey
        
        File Path: The full file path to the file to ingest.
        
        Name: file_path
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: str
        
        Partitioned MetaFileSystem Connection
        
        Required Input
        
        Connection Key: The MetaFileSystem connection key.
        
        Name: connection_key
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: MetaFileSystemConnectionKey
        
        File Type: The type of files to read from the directory.
        
        Name: file_type
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: FileExtensions_
        
        Directory Path: The full directory path containing partitioned tabular files.
        
        Name: directory_path
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: str
    - Row Selection: Row to be used for SHAP insights.
      - Name: row_selection
      - Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
      - Type: list[int]
  - Optional Input
    - Trained Model: Trained model for making predictions.
      - Name: model_name
      - Tooltip:
        
        Detail:
        
        Trained models available for this routine instance.
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
      - Type: Optional[str]
- Artifacts:
  - SHAP Values: The SHAP values from the prediction.
    - Qualified Key Annotation: shap_values
    - Aggregate Artifact: False
    - In-Memory Json Accessible: False
    - File Annotations:
      - artifacts_/@shap_values/data_/data_<int>.parquet
        
        A partitioned set of parquet files where each file will have no more than 1000000 rows.
  - SHAP Waterfall plot: The SHAP waterfall plot for the prediction.
    - Qualified Key Annotation: waterfall_plot
    - Aggregate Artifact: False
    - In-Memory Json Accessible: False
    - File Annotations:
      - artifacts_/@waterfall_plot/data_/document.pdf
        
        A pdf variant of the html file. Please note the interactivity that may be found in the html is lost within the pdf variant.
      - artifacts_/@waterfall_plot/data_/html_content.html
        
        The html content.
  - SHAP Summary plot: The SHAP Summary plot for all data in dataframe.
    - Qualified Key Annotation: summary_plot
    - Aggregate Artifact: False
    - In-Memory Json Accessible: False
    - File Annotations:
      - artifacts_/@summary_plot/data_/document.pdf
        
        A pdf variant of the html file. Please note the interactivity that may be found in the html is lost within the pdf variant.
      - artifacts_/@summary_plot/data_/html_content.html
        
        The html content.
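
To make the idea of per-feature contributions concrete, here is a minimal sketch that computes exact Shapley values for a toy model by averaging each feature's marginal contribution over every feature ordering. It is not the routine's implementation and does not use the SHAP library: `toy_model`, the sample row, and the all-zero baseline are hypothetical, and practical SHAP explainers rely on a background dataset and faster algorithms rather than full enumeration.

```python
from itertools import permutations


def exact_shapley(model, row, baseline):
    """Exact Shapley values via all feature orderings.

    Features not yet added to the coalition keep their baseline value.
    """
    n = len(row)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = model(current)
        for feature in order:
            current[feature] = row[feature]
            output = model(current)
            totals[feature] += output - previous_output  # marginal contribution
            previous_output = output
    return [t / len(orderings) for t in totals]


def toy_model(x):
    # Hypothetical model with an interaction between features 0 and 1.
    return 3.0 * x[0] + 2.0 * x[1] - x[2] + x[0] * x[1]


row = [2.0, 1.0, 4.0]
baseline = [0.0, 0.0, 0.0]

shap_values = exact_shapley(toy_model, row, baseline)
base_value = toy_model(baseline)
prediction = toy_model(row)

# A waterfall plot typically lists the largest absolute contribution first.
waterfall_order = sorted(range(len(row)), key=lambda i: -abs(shap_values[i]))
```

The key property a waterfall plot visualizes is additivity: the base value plus the sum of the per-feature contributions equals the prediction for that row.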

5. Train (Method)

Method: train
- Type: Method
- Memory Capacity: 2.0 GB
- Allow In-Memory Execution: No
- Read Only: No
- Method Limits: The runtime of this method is influenced by the input dataset size and number of feature column selections. For a 20M row dataset with 1 feature column selected, this method is expected to complete in about 25 minutes with 20GB of memory allocated. With the same configurations but 6 feature columns instead of 1, this method takes around 28 minutes to complete.
- Outputs Dynamic Artifacts: Yes
- Short Description:
  - Trains regression model(s) using the provided parameters.
- Detailed Description:
  - This method trains a regression model using the provided parameters. The method takes in a dataset and performs basic data exploration to understand the structure and characteristics of the data. The method generates data exploration artifacts, including summary statistics, visualizations, and insights into the dataset. The artifacts are used to guide the model training process and identify potential challenges or issues in the data.
- Inputs:
  - Required Input
    - Source Connection: The connection information for the source data.
      - Name: data_connection
      - Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
      - Type: Must be an instance of Tabular Connection
      - Nested Model: Tabular Connection
        
        Required Input
        
        Connection: The connection type to use to access the source data.
        
        Name: tabular_connection
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: Must be one of the following
        
        SQL Server Connection
        
        Required Input
        
        Database Resource: The name of the database resource to connect to.
        
        Name: database_resource
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: str
        
        Database Name: The name of the database to connect to.
        
        Name: database_name
        
        Tooltip:
        
        Detail:
        
        Note: If you don’t see the database name that you are looking for in this list, it is recommended that you first move the data to be used within a database that is available within this list.
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: str
        
        Table Name: The name of the table to use.
        
        Name: table_name
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: str
        
        MetaFileSystem Connection
        
        Required Input
        
        Connection Key: The MetaFileSystem connection key.
        
        Name: connection_key
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: MetaFileSystemConnectionKey
        
        File Path: The full file path to the file to ingest.
        
        Name: file_path
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: str
        
        Partitioned MetaFileSystem Connection
        
        Required Input
        
        Connection Key: The MetaFileSystem connection key.
        
        Name: connection_key
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: MetaFileSystemConnectionKey
        
        File Type: The type of files to read from the directory.
        
        Name: file_type
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: FileExtensions_
        
        Directory Path: The full directory path containing partitioned tabular files.
        
        Name: directory_path
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: str
    - Train Model Setup: Select target, features, and model(s) to train the ML Regression model(s).
      - Name: initial_model_feature_selection
      - Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
      - Type: Must be an instance of Configure 1 target model option
      - Nested Model: Configure 1 target model option
        
        Required Input
        
        Target: Select target (dependent variable) to train the ML Regression model(s).
        
        Name: target_selection
        
        Tooltip:
        
        Validation Constraints:
        
        The input must have a minimum length of 1.
        
        This input may be subject to other validation constraints at runtime.
        
        Type: str
        
        Features: Select features (independent variables) to train the ML Regression model(s).
        
        Name: feature_selection
        
        Tooltip:
        
        Validation Constraints:
        
        The input must have a minimum length of 1.
        
        This input may be subject to other validation constraints at runtime.
        
        Type: list[str]
        
        Model(s): Select model(s) to train, with the top model finalized if more than 1 selected.
        
        Name: model_selection
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: list[str]
    - Additional Train Model Setup: Select additional target, features, and model(s) to train the ML Regression model(s).
      - Name: additional_model_feature_selection
      - Tooltip:
        
        Validation Constraints:
        
        The input must have a maximum length of 5.
        
        This input may be subject to other validation constraints at runtime.
      - Type: list[RegressionTargetFeatureModelSelection]
    - Generate Data Exploration Artifact: Optionally generate data exploration artifact for ML Regression.
      - Name: show_data_exploration
      - Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
      - Type: bool
  - Optional Input
    - Model Training Configuration Parameters: A mix of required and optional constructor parameters for ML Regression.
      - Name: updated_init_params
      - Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
      - Type: Must be an instance of ML Regressor
      - Nested Model: ML Regressor
        
        Required Input
        
        Optimize Model: Metric to optimize during tuning (e.g. MAE, MSE, RMSE, or R-squared).
        
        Name: optimize_model
        
        Tooltip:
        
        Detail:
        
        MAE (Mean Absolute Error): Measures the average absolute difference between predicted and actual values. Lower MAE indicates better performance.
        
        MSE (Mean Squared Error): Computes the average squared differences between predicted and actual values. It penalizes larger errors more heavily; lower MSE is preferable.
        
        RMSE (Root Mean Squared Error): The square root of MSE, providing error in the same units as the target variable. Lower RMSE signifies better model accuracy.
        
        R2 (R-squared): Represents the proportion of variance in the target variable explained by the model. Values closer to 1 indicate a better fit.
        
        MAPE (Mean Absolute Percentage Error): Expresses the prediction accuracy as a percentage, offering an intuitive error measure. Lower percentages denote higher accuracy.
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: str
        
        Training Size: Default is 0.8 (80%). Value must be greater than 0 and less than 1.
        
        Name: train_size
        
        Tooltip:
        
        Validation Constraints:
        
        The input must be greater than 0.
        
        The input must be less than 1.
        
        This input may be subject to other validation constraints at runtime.
        
        Type: float
        
        Advanced Cross-Validation: Optional cross-validation settings. Pass None to accept the default settings.
        
        Name: fold_splitting_options
        
        Tooltip:
        
        Detail:
        
        K-Fold: Provides a general approach for cross-validation in regression tasks. It partitions the dataset into k folds, then iteratively trains on k-1 folds while testing on the remaining fold to provide robust estimates of model performance.
        
        Time Series Split: Ensures temporal order is maintained for forecasting but does not shuffle data, which can limit variance reduction.
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: Must be an instance of Advanced Fold Options
        
        Nested Model: Advanced Fold Options
        
        Required Input
        
        Folds: Number of subsets or splits used in cross-validation. Default is 10, but any number of folds between 1 and 20 can be specified.
        
        Name: fold_num
        
        Tooltip:
        
        Validation Constraints:
        
        The input must be greater than 0.
        
        The input must be less than 20.
        
        This input may be subject to other validation constraints at runtime.
        
        Type: int
        
        Fold Shuffle: Controls the shuffle parameter of cross-validation. Only applicable when fold_strategy is K-Fold.
        
        Name: fold_shuffle
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: bool
        
        Fold Strategy: Choice of cross-validation strategy. Default is K-Fold.
        
        Name: fold_strategy
        
        Tooltip:
        
        Detail:
        
        K-Fold: Provides a general approach for cross-validation in regression tasks. It partitions the dataset into k folds, then iteratively trains on k-1 folds while testing on the remaining fold to provide robust estimates of model performance.
        
        Time Series Split: Ensures temporal order is maintained for forecasting but does not shuffle data, which can limit variance reduction.
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: str
        
        Optional Input
        
        Session ID: Optional model session ID used to ensure reproducibility of experiments. Defaults to 42.
        
        Name: session_id
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: Optional[int]
- Artifacts:
  - Regression Train Report: A comprehensive Regression training report of the dataset along with relevant training data, metrics and charts.
    - Qualified Key Annotation: train_report
    - Aggregate Artifact: False
    - In-Memory Json Accessible: False
    - File Annotations:
      - artifacts_/@train_report/data_/document.pdf
        
        A pdf variant of the html file. Please note the interactivity that may be found in the html is lost within the pdf variant.
      - artifacts_/@train_report/data_/html_content.html
        
        The html content.
  - Training Dataset: The full dataset.
    - Qualified Key Annotation: train_data
    - Aggregate Artifact: False
    - In-Memory Json Accessible: False
    - File Annotations:
      - artifacts_/@train_data/data_/data_<int>.parquet
        
        A partitioned set of parquet files where each file will have no more than 1000000 rows.
  - Dynamic Artifacts Metadata: Contains metadata for the dynamic artifacts that are generated at runtime for this method.
    - Qualified Key Annotation: dynamic_artifacts_metadata
    - Aggregate Artifact: False
    - In-Memory Json Accessible: True
    - File Annotations:
      - artifacts_/@dynamic_artifacts_metadata/data_/data.json
        
        Stored json data.
      - artifacts_/@dynamic_artifacts_metadata/data_/schema.json
        
        The json schema of the json object stored in the 'data.json' file
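
The train_size, fold_num, fold_shuffle, and fold_strategy parameters can be pictured with a small index-splitting sketch in plain Python. Everything here is an assumption for illustration: the function names are hypothetical, the time series split follows the expanding-window convention used by common libraries, and using session_id as the shuffle seed is a guess about how reproducibility might be achieved, not a statement about how the routine works internally.

```python
import random


def holdout_split(n_rows, train_size=0.8):
    """Split row indices into a training part and a holdout part, in order."""
    if not 0 < train_size < 1:
        raise ValueError("train_size must be greater than 0 and less than 1")
    cut = int(n_rows * train_size)
    return list(range(cut)), list(range(cut, n_rows))


def k_fold(indices, fold_num, shuffle=False, seed=42):
    """Yield (train, validation) index lists; every row is validated once."""
    indices = list(indices)
    if shuffle:
        random.Random(seed).shuffle(indices)  # assumed role of session_id
    base, extra = divmod(len(indices), fold_num)
    start = 0
    for fold in range(fold_num):
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


def time_series_split(indices, fold_num):
    """Expanding window: train on the past, validate on the next block."""
    indices = list(indices)
    n = len(indices)
    test_size = n // (fold_num + 1)
    for fold in range(fold_num):
        train_end = n - (fold_num - fold) * test_size
        yield indices[:train_end], indices[train_end:train_end + test_size]


train_idx, holdout_idx = holdout_split(10, train_size=0.8)
kf = list(k_fold(train_idx, fold_num=4))
ts = list(time_series_split(train_idx, fold_num=3))
```

In the K-Fold case each training row lands in exactly one validation fold, while in the time series case every validation block comes strictly after the rows used to train on it, which is why shuffling does not apply to that strategy.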

Interface Definitions

No interface definitions found for this routine

Developer Docs

Routine Typename: MLRegressor

Method Name	Artifact Keys
`__init__`	N/A
`create_web_app`	web_app
`predict`	prediction_report, prediction_output, shap_values
`predict_shap_interpretation`	shap_values, waterfall_plot, summary_plot
`train`	train_report, train_data, dynamic_artifacts_metadata
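
As a convenience, the table above and the file annotations listed for each method can be captured in a small lookup. This is a sketch only: the dictionary mirrors the artifact keys shown on this page, the helper names are hypothetical, and the paths are relative to wherever the platform stores a run's artifacts, which is not specified here.

```python
# Artifact keys per method, copied from the table above.
ARTIFACT_KEYS = {
    "__init__": [],
    "create_web_app": ["web_app"],
    "predict": ["prediction_report", "prediction_output", "shap_values"],
    "predict_shap_interpretation": ["shap_values", "waterfall_plot", "summary_plot"],
    "train": ["train_report", "train_data", "dynamic_artifacts_metadata"],
}


def artifact_data_path(key, file_name):
    """Build a relative path following the artifacts_/@<key>/data_/<file> pattern."""
    return f"artifacts_/@{key}/data_/{file_name}"


def methods_producing(key):
    """List the methods whose outputs include the given artifact key."""
    return [method for method, keys in ARTIFACT_KEYS.items() if key in keys]


report_path = artifact_data_path("train_report", "document.pdf")
shap_producers = methods_producing("shap_values")
```

Note that shap_values is produced by two different methods, so a consumer should track which method run an artifact came from rather than relying on the key alone.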
