MLRegressor
Versions
v2.0.0
Basic Information
Class Name: MLRegressor
Title: ML Regression
Version: 2.0.0
Author: Jeff Robinson
Organization: OneStream
Creation Date: 2025-02-18
Default Routine Memory Capacity: 2.0 GB
Tags
ML, Regression, Linear Models, Tree Models
Description
Short Description
A routine that performs a machine learning regression task.
Long Description
This routine performs a machine learning regression task: it takes in a dataset, trains a regression model on it, and outputs a model that can be used to predict future values.
Use Cases
1. House Price Prediction (Regression)
House price prediction is a regression problem where the goal is to forecast a continuous outcome—the market value of a property. Using a historical dataset containing various features such as location, square footage, number of bedrooms, and other relevant attributes, a regression model can be trained to predict property prices. The model learns from trends and relationships within the data to provide accurate pricing estimates, enabling stakeholders to make informed decisions regarding investments, pricing strategies, and market analysis.
2. Revenue Forecasting (Regression)
Revenue forecasting is a regression task designed to predict continuous financial outcomes based on historical and current data. In a CPM environment, this involves analyzing past revenue trends, operational metrics, market conditions, and expenditure patterns to build a model that estimates future revenue. The resulting forecasts support budgeting, resource allocation, and strategic decision-making, enabling businesses to proactively manage financial performance, identify growth opportunities, and mitigate risks.
Routine Methods
1. Init (Constructor)
- Method: __init__
- Type: Constructor
- Memory Capacity: 2.0 GB
- Allow In-Memory Execution: No
- Read Only: No
- Method Limits: There are no limits for the constructor method.
- Outputs Dynamic Artifacts: No
- Short Description: The constructor for the ML Regression Routine.
- Detailed Description: This constructor initializes the routine with the API instance, data, and configuration parameters needed to train and evaluate a regression model.
- Inputs:
  - Required Input
    - Model Training Configuration Parameters: A mix of required and optional constructor parameters for ML Regression.
      - Name: regression_constructor_params
      - Tooltip:
        - Validation Constraints:
          - This input may be subject to other validation constraints at runtime.
      - Type: Must be an instance of ML Regressor
      - Nested Model: ML Regressor
        - Required Input
          - Optimize Model: Metric to optimize during tuning (e.g. MAE, MSE, RMSE, or R-squared).
            - Name: optimize_model
            - Tooltip:
              - Detail:
                - MAE (Mean Absolute Error): Measures the average absolute difference between predicted and actual values. Lower MAE indicates better performance.
                - MSE (Mean Squared Error): Computes the average squared differences between predicted and actual values. It penalizes larger errors more heavily; lower MSE is preferable.
                - RMSE (Root Mean Squared Error): The square root of MSE, providing error in the same units as the target variable. Lower RMSE signifies better model accuracy.
                - R2 (R-squared): Represents the proportion of variance in the target variable explained by the model. Values closer to 1 indicate a better fit.
                - MAPE (Mean Absolute Percentage Error): Expresses the prediction accuracy as a percentage, offering an intuitive error measure. Lower percentages denote higher accuracy.
              - Validation Constraints:
                - This input may be subject to other validation constraints at runtime.
            - Type: str
          - Training Size: Default is 0.8 (80%). Value must be greater than 0 and less than 1.
            - Name: train_size
            - Tooltip:
              - Validation Constraints:
                - The input must be greater than 0.
                - The input must be less than 1.
                - This input may be subject to other validation constraints at runtime.
            - Type: float
          - Advanced Cross-Validation: Optional cross-validation settings. Pass None to accept the default settings.
            - Name: fold_splitting_options
            - Tooltip:
              - Detail:
                - K-Fold: Provides a general approach for cross-validation in regression tasks. It partitions the dataset into k folds, then iteratively trains on k-1 folds while testing on the remaining fold to provide robust estimates of model performance.
                - Time Series Split: Ensures temporal order is maintained for forecasting but does not shuffle data, which can limit variance reduction.
              - Validation Constraints:
                - This input may be subject to other validation constraints at runtime.
            - Type: Must be an instance of Advanced Fold Options
            - Nested Model: Advanced Fold Options
              - Required Input
                - Folds: Number of subsets or splits used in cross-validation. Default is 10; any number of folds greater than 0 and less than 20 can be specified.
                  - Name: fold_num
                  - Tooltip:
                    - Validation Constraints:
                      - The input must be greater than 0.
                      - The input must be less than 20.
                      - This input may be subject to other validation constraints at runtime.
                  - Type: int
                - Fold Shuffle: Controls the shuffle parameter of cross-validation. Only applicable when fold_strategy is K-Fold.
                  - Name: fold_shuffle
                  - Tooltip:
                    - Validation Constraints:
                      - This input may be subject to other validation constraints at runtime.
                  - Type: bool
                - Fold Strategy: Choice of cross-validation strategy. Default is K-Fold.
                  - Name: fold_strategy
                  - Tooltip:
                    - Detail:
                      - K-Fold: Provides a general approach for cross-validation in regression tasks. It partitions the dataset into k folds, then iteratively trains on k-1 folds while testing on the remaining fold to provide robust estimates of model performance.
                      - Time Series Split: Ensures temporal order is maintained for forecasting but does not shuffle data, which can limit variance reduction.
                    - Validation Constraints:
                      - This input may be subject to other validation constraints at runtime.
                  - Type: str
        - Optional Input
          - Session ID: Optional model session ID used to ensure reproducibility of experiments. Defaults to 42.
            - Name: session_id
            - Tooltip:
              - Validation Constraints:
                - This input may be subject to other validation constraints at runtime.
            - Type: Optional[int]
- Artifacts: No artifacts are returned by this method.
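The five metrics accepted by optimize_model can be illustrated with a small, dependency-free sketch. This is not the routine's implementation; the function name regression_metrics and the choice to skip zero-valued actuals when computing MAPE are assumptions made for illustration only.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, R2, and MAPE for two equal-length sequences.

    Hypothetical helper for illustration; not part of the MLRegressor API.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)

    # R2 compares the residual sum of squares with the variance around the mean.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(e * e for e in errors)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    # Assumption: MAPE is undefined for actual values of zero, so skip those rows.
    pct = [abs(e / t) for e, t in zip(errors, y_true) if t != 0]
    mape = 100.0 * sum(pct) / len(pct) if pct else float("nan")

    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2, "MAPE": mape}


metrics = regression_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 330.0])
```

Lower is better for MAE, MSE, RMSE, and MAPE, while R2 is better the closer it is to 1, so a tuning loop has to flip the comparison direction depending on the selected metric.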
2. Create Web App (Method)
- Method: create_web_app
- Type: Method
- Memory Capacity: 2.0 GB
- Allow In-Memory Execution: No
- Read Only: No
- Method Limits: There are no limits for this method. It is expected to complete very quickly with a small memory allocation.
- Outputs Dynamic Artifacts: No
- Short Description: Creates the web app and then passes the newly created web app object into an artifact.
- Detailed Description: The run ID from this routine method and the routine instance ID are used to create the URL for the web app.
- Inputs:
  - Optional Input
    - Select Formatting for SHAP Artifacts: Optionally choose how many decimal places the SHAP output should be rounded to.
      - Name: format_output
      - Tooltip:
        - Validation Constraints:
          - This input may be subject to other validation constraints at runtime.
      - Type: Optional[str]
- Artifacts:
  - ML Regression Web App: Dashboard to analyze results from the MLRegression predict and train routine runs.
    - Qualified Key Annotation: web_app
    - Aggregate Artifact: False
    - In-Memory Json Accessible: False
    - File Annotations:
      - artifacts_/@web_app/data_/data.appref - JSON file of data relating to the web app.
3. Predict (Method)
- Method: predict
- Type: Method
- Memory Capacity: 2.0 GB
- Allow In-Memory Execution: No
- Read Only: No
- Method Limits: This method is influenced primarily by dataset size. For a dataset with 20M rows and 8 feature columns, this method has been observed to take 20-30 minutes to complete with 10GB of memory allocated. With 10M rows and 8 feature columns, it takes closer to 5-10 minutes with 10GB of memory allocated.
- Outputs Dynamic Artifacts: No
- Short Description: Generates predictions from the regression model using the provided parameters.
- Detailed Description: This method generates predictions from the regression model using the provided parameters. It also generates SHAP values for model interpretability.
- Inputs:
  - Required Input
    - Source Connection: The connection information for the source data.
      - Name: data_connection
      - Tooltip:
        - Validation Constraints:
          - This input may be subject to other validation constraints at runtime.
      - Type: Must be an instance of Tabular Connection
      - Nested Model: Tabular Connection
        - Required Input
          - Connection: The connection type to use to access the source data.
            - Name: tabular_connection
            - Tooltip:
              - Validation Constraints:
                - This input may be subject to other validation constraints at runtime.
            - Type: Must be one of the following
              - SQL Server Connection
                - Required Input
                  - Database Resource: The name of the database resource to connect to.
                    - Name: database_resource
                    - Tooltip:
                      - Validation Constraints:
                        - This input may be subject to other validation constraints at runtime.
                    - Type: str
                  - Database Name: The name of the database to connect to.
                    - Name: database_name
                    - Tooltip:
                      - Detail:
                        - Note: If the database name you are looking for is not in this list, first move the data into a database that is available in this list.
                      - Validation Constraints:
                        - This input may be subject to other validation constraints at runtime.
                    - Type: str
                  - Table Name: The name of the table to use.
                    - Name: table_name
                    - Tooltip:
                      - Validation Constraints:
                        - This input may be subject to other validation constraints at runtime.
                    - Type: str
              - MetaFileSystem Connection
                - Required Input
                  - Connection Key: The MetaFileSystem connection key.
                    - Name: connection_key
                    - Tooltip:
                      - Validation Constraints:
                        - This input may be subject to other validation constraints at runtime.
                    - Type: MetaFileSystemConnectionKey
                  - File Path: The full file path to the file to ingest.
                    - Name: file_path
                    - Tooltip:
                      - Validation Constraints:
                        - This input may be subject to other validation constraints at runtime.
                    - Type: str
              - Partitioned MetaFileSystem Connection
                - Required Input
                  - Connection Key: The MetaFileSystem connection key.
                    - Name: connection_key
                    - Tooltip:
                      - Validation Constraints:
                        - This input may be subject to other validation constraints at runtime.
                    - Type: MetaFileSystemConnectionKey
                  - File Type: The type of files to read from the directory.
                    - Name: file_type
                    - Tooltip:
                      - Validation Constraints:
                        - This input may be subject to other validation constraints at runtime.
                    - Type: FileExtensions_
                  - Directory Path: The full directory path containing partitioned tabular files.
                    - Name: directory_path
                    - Tooltip:
                      - Validation Constraints:
                        - This input may be subject to other validation constraints at runtime.
                    - Type: str
    - Index Selection: Index field(s) to be included in the prediction output.
      - Name: index_selection
      - Tooltip:
        - Validation Constraints:
          - This input may be subject to other validation constraints at runtime.
      - Type: list[str]
  - Optional Input
    - Trained Model: Trained model for making predictions.
      - Name: model_name
      - Tooltip:
        - Detail:
          - Trained models available for this routine instance.
        - Validation Constraints:
          - This input may be subject to other validation constraints at runtime.
      - Type: Optional[str]
- Artifacts:
  - Regression Prediction Report: A regression prediction report containing the top 10 predictions (by index), a Prediction Distribution Plot, and a Sorted Prediction Plot.
    - Qualified Key Annotation: prediction_report
    - Aggregate Artifact: False
    - In-Memory Json Accessible: False
    - File Annotations:
      - artifacts_/@prediction_report/data_/document.pdf - A PDF variant of the HTML file. Note that any interactivity in the HTML is lost in the PDF variant.
      - artifacts_/@prediction_report/data_/html_content.html - The HTML content.
  - Prediction Output: The full prediction dataframe from the predict routine.
    - Qualified Key Annotation: prediction_output
    - Aggregate Artifact: False
    - In-Memory Json Accessible: False
    - File Annotations:
      - artifacts_/@prediction_output/data_/data_<int>.parquet - A partitioned set of Parquet files, each with no more than 1,000,000 rows.
  - SHAP Values: The SHAP values from the prediction.
    - Qualified Key Annotation: shap_values
    - Aggregate Artifact: False
    - In-Memory Json Accessible: False
    - File Annotations:
      - artifacts_/@shap_values/data_/data_<int>.parquet - A partitioned set of Parquet files, each with no more than 1,000,000 rows.
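The partitioned Parquet outputs above cap each file at 1,000,000 rows. A minimal sketch of how a row count maps onto such partitions is shown below. The helper names are hypothetical, and the assumption that the <int> in data_<int>.parquet is a zero-based partition index is not confirmed by this page.

```python
def partition_bounds(n_rows, max_rows_per_file=1_000_000):
    """Return (start, stop) row ranges, each covering at most max_rows_per_file rows."""
    if n_rows < 0 or max_rows_per_file <= 0:
        raise ValueError("n_rows must be >= 0 and max_rows_per_file must be > 0")
    return [
        (start, min(start + max_rows_per_file, n_rows))
        for start in range(0, n_rows, max_rows_per_file)
    ]


def partition_file_names(n_rows, max_rows_per_file=1_000_000):
    """Return one file name per partition.

    Assumption: the <int> placeholder is a zero-based partition index.
    """
    bounds = partition_bounds(n_rows, max_rows_per_file)
    return [f"data_{i}.parquet" for i in range(len(bounds))]


bounds = partition_bounds(2_500_000)
names = partition_file_names(2_500_000)
```

Under these assumptions, a 2.5M-row prediction output would be spread across three files, the last holding the remaining 500,000 rows.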
4. Predict Shap Interpretation (Method)
- Method: predict_shap_interpretation
- Type: Method
- Memory Capacity: 2.0 GB
- Allow In-Memory Execution: Yes
- Read Only: No
- Method Limits: This method is influenced by the size of the input dataset and the number of rows selected for SHAP analysis. For a dataset with 20M rows, performing SHAP analysis on 2M of those rows takes around 15 minutes to complete with a 20GB memory allocation.
- Outputs Dynamic Artifacts: No
- Short Description: Predict model outputs based on the provided parameters, and return SHAP-based explanations.
- Detailed Description: This method uses SHAP (SHapley Additive exPlanations) to interpret the predictions made by the regression model, providing insight into the model's behavior and the contribution of each feature to a prediction. It generates SHAP values, which represent the impact of each feature on the prediction; a waterfall plot, which shows how each feature contributes to the final prediction for a specific row; and a summary plot, which visualizes the model's predictions and feature contributions across the data.
- Inputs:
  - Required Input
    - Source Connection: The connection information for the source data.
      - Name: data_connection
      - Tooltip:
        - Validation Constraints:
          - This input may be subject to other validation constraints at runtime.
      - Type: Must be an instance of Tabular Connection
      - Nested Model: Tabular Connection
        - Required Input
          - Connection: The connection type to use to access the source data.
            - Name: tabular_connection
            - Tooltip:
              - Validation Constraints:
                - This input may be subject to other validation constraints at runtime.
            - Type: Must be one of the following
              - SQL Server Connection
                - Required Input
                  - Database Resource: The name of the database resource to connect to.
                    - Name: database_resource
                    - Tooltip:
                      - Validation Constraints:
                        - This input may be subject to other validation constraints at runtime.
                    - Type: str
                  - Database Name: The name of the database to connect to.
                    - Name: database_name
                    - Tooltip:
                      - Detail:
                        - Note: If the database name you are looking for is not in this list, first move the data into a database that is available in this list.
                      - Validation Constraints:
                        - This input may be subject to other validation constraints at runtime.
                    - Type: str
                  - Table Name: The name of the table to use.
                    - Name: table_name
                    - Tooltip:
                      - Validation Constraints:
                        - This input may be subject to other validation constraints at runtime.
                    - Type: str
              - MetaFileSystem Connection
                - Required Input
                  - Connection Key: The MetaFileSystem connection key.
                    - Name: connection_key
                    - Tooltip:
                      - Validation Constraints:
                        - This input may be subject to other validation constraints at runtime.
                    - Type: MetaFileSystemConnectionKey
                  - File Path: The full file path to the file to ingest.
                    - Name: file_path
                    - Tooltip:
                      - Validation Constraints:
                        - This input may be subject to other validation constraints at runtime.
                    - Type: str
              - Partitioned MetaFileSystem Connection
                - Required Input
                  - Connection Key: The MetaFileSystem connection key.
                    - Name: connection_key
                    - Tooltip:
                      - Validation Constraints:
                        - This input may be subject to other validation constraints at runtime.
                    - Type: MetaFileSystemConnectionKey
                  - File Type: The type of files to read from the directory.
                    - Name: file_type
                    - Tooltip:
                      - Validation Constraints:
                        - This input may be subject to other validation constraints at runtime.
                    - Type: FileExtensions_
                  - Directory Path: The full directory path containing partitioned tabular files.
                    - Name: directory_path
                    - Tooltip:
                      - Validation Constraints:
                        - This input may be subject to other validation constraints at runtime.
                    - Type: str
    - Row Selection: Row(s) to be used for SHAP insights.
      - Name: row_selection
      - Tooltip:
        - Validation Constraints:
          - This input may be subject to other validation constraints at runtime.
      - Type: list[int]
  - Optional Input
    - Trained Model: Trained model for making predictions.
      - Name: model_name
      - Tooltip:
        - Detail:
          - Trained models available for this routine instance.
        - Validation Constraints:
          - This input may be subject to other validation constraints at runtime.
      - Type: Optional[str]
- Artifacts:
  - SHAP Values: The SHAP values from the prediction.
    - Qualified Key Annotation: shap_values
    - Aggregate Artifact: False
    - In-Memory Json Accessible: False
    - File Annotations:
      - artifacts_/@shap_values/data_/data_<int>.parquet - A partitioned set of Parquet files, each with no more than 1,000,000 rows.
  - SHAP Waterfall plot: The SHAP waterfall plot for the prediction.
    - Qualified Key Annotation: waterfall_plot
    - Aggregate Artifact: False
    - In-Memory Json Accessible: False
    - File Annotations:
      - artifacts_/@waterfall_plot/data_/document.pdf - A PDF variant of the HTML file. Note that any interactivity in the HTML is lost in the PDF variant.
      - artifacts_/@waterfall_plot/data_/html_content.html - The HTML content.
  - SHAP Summary plot: The SHAP summary plot for all data in the dataframe.
    - Qualified Key Annotation: summary_plot
    - Aggregate Artifact: False
    - In-Memory Json Accessible: False
    - File Annotations:
      - artifacts_/@summary_plot/data_/document.pdf - A PDF variant of the HTML file. Note that any interactivity in the HTML is lost in the PDF variant.
      - artifacts_/@summary_plot/data_/html_content.html - The HTML content.
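The relationship between SHAP values and a waterfall plot can be sketched for the simplest case, a linear model with independent features, where the SHAP value of feature i has the closed form w_i * (x_i - mean(x_i)). This page does not document which explainer the routine uses, so the code below is an illustrative assumption with hypothetical function names, not the routine's method.

```python
def linear_shap(weights, bias, background, row):
    """Closed-form SHAP values for a linear model, assuming independent features.

    background is a list of rows used to estimate each feature's mean.
    Returns (base_value, shap_values), where base_value is the mean prediction.
    """
    n = len(background)
    means = [sum(r[j] for r in background) / n for j in range(len(weights))]
    base_value = bias + sum(w * m for w, m in zip(weights, means))
    shap_values = [w * (x - m) for w, x, m in zip(weights, row, means)]
    return base_value, shap_values


def waterfall_steps(base_value, shap_values):
    """Running totals a waterfall plot draws, from the base value to the prediction."""
    steps = [base_value]
    for phi in shap_values:
        steps.append(steps[-1] + phi)
    return steps


# Hypothetical two-feature model and data, chosen only for illustration.
weights, bias = [2.0, -1.0], 5.0
background = [[1.0, 0.0], [3.0, 4.0]]
row = [4.0, 1.0]

base_value, shap_values = linear_shap(weights, bias, background, row)
steps = waterfall_steps(base_value, shap_values)
prediction = bias + sum(w * x for w, x in zip(weights, row))
```

The last waterfall step equals the model's prediction for the row. This additivity property is what lets the plot read as a step-by-step decomposition of a single prediction.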
5. Train (Method)
- Method: train
- Type: Method
- Memory Capacity: 2.0 GB
- Allow In-Memory Execution: No
- Read Only: No
- Method Limits: The runtime of this method is influenced by the input dataset size and the number of feature columns selected. For a 20M row dataset with 1 feature column selected, this method is expected to complete in about 25 minutes with 20GB of memory allocated. With the same configuration but 6 feature columns instead of 1, this method takes around 28 minutes to complete.
- Outputs Dynamic Artifacts: Yes
- Short Description: Train and explore regression model data using the provided parameters.
- Detailed Description: This method trains a regression model using the provided parameters. It takes in a dataset and performs basic data exploration to understand the structure and characteristics of the data. It generates data exploration artifacts, including summary statistics, visualizations, and insights into the dataset. These artifacts are used to guide the model training process and identify potential challenges or issues in the data.
- Inputs:
  - Required Input
    - Source Connection: The connection information for the source data.
      - Name: data_connection
      - Tooltip:
        - Validation Constraints:
          - This input may be subject to other validation constraints at runtime.
      - Type: Must be an instance of Tabular Connection
      - Nested Model: Tabular Connection
        - Required Input
          - Connection: The connection type to use to access the source data.
            - Name: tabular_connection
            - Tooltip:
              - Validation Constraints:
                - This input may be subject to other validation constraints at runtime.
            - Type: Must be one of the following
              - SQL Server Connection
                - Required Input
                  - Database Resource: The name of the database resource to connect to.
                    - Name: database_resource
                    - Tooltip:
                      - Validation Constraints:
                        - This input may be subject to other validation constraints at runtime.
                    - Type: str
                  - Database Name: The name of the database to connect to.
                    - Name: database_name
                    - Tooltip:
                      - Detail:
                        - Note: If the database name you are looking for is not in this list, first move the data into a database that is available in this list.
                      - Validation Constraints:
                        - This input may be subject to other validation constraints at runtime.
                    - Type: str
                  - Table Name: The name of the table to use.
                    - Name: table_name
                    - Tooltip:
                      - Validation Constraints:
                        - This input may be subject to other validation constraints at runtime.
                    - Type: str
              - MetaFileSystem Connection
                - Required Input
                  - Connection Key: The MetaFileSystem connection key.
                    - Name: connection_key
                    - Tooltip:
                      - Validation Constraints:
                        - This input may be subject to other validation constraints at runtime.
                    - Type: MetaFileSystemConnectionKey
                  - File Path: The full file path to the file to ingest.
                    - Name: file_path
                    - Tooltip:
                      - Validation Constraints:
                        - This input may be subject to other validation constraints at runtime.
                    - Type: str
              - Partitioned MetaFileSystem Connection
                - Required Input
                  - Connection Key: The MetaFileSystem connection key.
                    - Name: connection_key
                    - Tooltip:
                      - Validation Constraints:
                        - This input may be subject to other validation constraints at runtime.
                    - Type: MetaFileSystemConnectionKey
                  - File Type: The type of files to read from the directory.
                    - Name: file_type
                    - Tooltip:
                      - Validation Constraints:
                        - This input may be subject to other validation constraints at runtime.
                    - Type: FileExtensions_
                  - Directory Path: The full directory path containing partitioned tabular files.
                    - Name: directory_path
                    - Tooltip:
                      - Validation Constraints:
                        - This input may be subject to other validation constraints at runtime.
                    - Type: str
    - Train Model Setup: Select target, features, and model(s) to train the ML regression model(s).
      - Name: initial_model_feature_selection
      - Tooltip:
        - Validation Constraints:
          - This input may be subject to other validation constraints at runtime.
      - Type: Must be an instance of Configure 1 target model option
      - Nested Model: Configure 1 target model option
        - Required Input
          - Target: Select target (dependent variable) to train the ML regression model(s).
            - Name: target_selection
            - Tooltip:
              - Validation Constraints:
                - The input must have a minimum length of 1.
                - This input may be subject to other validation constraints at runtime.
            - Type: str
          - Features: Select features (independent variables) to train the ML regression model(s).
            - Name: feature_selection
            - Tooltip:
              - Validation Constraints:
                - The input must have a minimum length of 1.
                - This input may be subject to other validation constraints at runtime.
            - Type: list[str]
          - Model(s): Select model(s) to train; if more than one is selected, the top model is finalized.
            - Name: model_selection
            - Tooltip:
              - Validation Constraints:
                - This input may be subject to other validation constraints at runtime.
            - Type: list[str]
    - Additional Train Model Setup: Select additional target, features, and model(s) to train the ML regression model(s).
      - Name: additional_model_feature_selection
      - Tooltip:
        - Validation Constraints:
          - The input must have a maximum length of 5.
          - This input may be subject to other validation constraints at runtime.
      - Type: list[RegressionTargetFeatureModelSelection]
    - Generate Data Exploration Artifact: Optionally generate data exploration artifact for ML Regression.
      - Name: show_data_exploration
      - Tooltip:
        - Validation Constraints:
          - This input may be subject to other validation constraints at runtime.
      - Type: bool
  - Optional Input
    - Model Training Configuration Parameters: A mix of required and optional constructor parameters for ML Regression.
      - Name: updated_init_params
      - Tooltip:
        - Validation Constraints:
          - This input may be subject to other validation constraints at runtime.
      - Type: Must be an instance of ML Regressor
      - Nested Model: ML Regressor
        - Required Input
          - Optimize Model: Metric to optimize during tuning (e.g. MAE, MSE, RMSE, or R-squared).
            - Name: optimize_model
            - Tooltip:
              - Detail:
                - MAE (Mean Absolute Error): Measures the average absolute difference between predicted and actual values. Lower MAE indicates better performance.
                - MSE (Mean Squared Error): Computes the average squared differences between predicted and actual values. It penalizes larger errors more heavily; lower MSE is preferable.
                - RMSE (Root Mean Squared Error): The square root of MSE, providing error in the same units as the target variable. Lower RMSE signifies better model accuracy.
                - R2 (R-squared): Represents the proportion of variance in the target variable explained by the model. Values closer to 1 indicate a better fit.
                - MAPE (Mean Absolute Percentage Error): Expresses the prediction accuracy as a percentage, offering an intuitive error measure. Lower percentages denote higher accuracy.
              - Validation Constraints:
                - This input may be subject to other validation constraints at runtime.
            - Type: str
          - Training Size: Default is 0.8 (80%). Value must be greater than 0 and less than 1.
            - Name: train_size
            - Tooltip:
              - Validation Constraints:
                - The input must be greater than 0.
                - The input must be less than 1.
                - This input may be subject to other validation constraints at runtime.
            - Type: float
          - Advanced Cross-Validation: Optional cross-validation settings. Pass None to accept the default settings.
            - Name: fold_splitting_options
            - Tooltip:
              - Detail:
                - K-Fold: Provides a general approach for cross-validation in regression tasks. It partitions the dataset into k folds, then iteratively trains on k-1 folds while testing on the remaining fold to provide robust estimates of model performance.
                - Time Series Split: Ensures temporal order is maintained for forecasting but does not shuffle data, which can limit variance reduction.
              - Validation Constraints:
                - This input may be subject to other validation constraints at runtime.
            - Type: Must be an instance of Advanced Fold Options
            - Nested Model: Advanced Fold Options
              - Required Input
                - Folds: Number of subsets or splits used in cross-validation. Default is 10; any number of folds greater than 0 and less than 20 can be specified.
                  - Name: fold_num
                  - Tooltip:
                    - Validation Constraints:
                      - The input must be greater than 0.
                      - The input must be less than 20.
                      - This input may be subject to other validation constraints at runtime.
                  - Type: int
                - Fold Shuffle: Controls the shuffle parameter of cross-validation. Only applicable when fold_strategy is K-Fold.
                  - Name: fold_shuffle
                  - Tooltip:
                    - Validation Constraints:
                      - This input may be subject to other validation constraints at runtime.
                  - Type: bool
                - Fold Strategy: Choice of cross-validation strategy. Default is K-Fold.
                  - Name: fold_strategy
                  - Tooltip:
                    - Detail:
                      - K-Fold: Provides a general approach for cross-validation in regression tasks. It partitions the dataset into k folds, then iteratively trains on k-1 folds while testing on the remaining fold to provide robust estimates of model performance.
                      - Time Series Split: Ensures temporal order is maintained for forecasting but does not shuffle data, which can limit variance reduction.
                    - Validation Constraints:
                      - This input may be subject to other validation constraints at runtime.
                  - Type: str
        - Optional Input
          - Session ID: Optional model session ID used to ensure reproducibility of experiments. Defaults to 42.
            - Name: session_id
            - Tooltip:
              - Validation Constraints:
                - This input may be subject to other validation constraints at runtime.
            - Type: Optional[int]
- Artifacts:
  - Regression Train Report: A comprehensive regression training report of the dataset along with relevant training data, metrics, and charts.
    - Qualified Key Annotation: train_report
    - Aggregate Artifact: False
    - In-Memory Json Accessible: False
    - File Annotations:
      - artifacts_/@train_report/data_/document.pdf - A PDF variant of the HTML file. Note that any interactivity in the HTML is lost in the PDF variant.
      - artifacts_/@train_report/data_/html_content.html - The HTML content.
  - Training Dataset: The full dataset.
    - Qualified Key Annotation: train_data
    - Aggregate Artifact: False
    - In-Memory Json Accessible: False
    - File Annotations:
      - artifacts_/@train_data/data_/data_<int>.parquet - A partitioned set of Parquet files, each with no more than 1,000,000 rows.
  - Dynamic Artifacts Metadata: Contains metadata for the dynamic artifacts that are generated at runtime for this method.
    - Qualified Key Annotation: dynamic_artifacts_metadata
    - Aggregate Artifact: False
    - In-Memory Json Accessible: True
    - File Annotations:
      - artifacts_/@dynamic_artifacts_metadata/data_/data.json - Stored JSON data.
      - artifacts_/@dynamic_artifacts_metadata/data_/schema.json - The JSON schema of the JSON object stored in the 'data.json' file.
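The two fold strategies described for fold_strategy differ in how they carve row indices into train and test sets. The sketch below shows one plausible reading of each, using only the standard library. The function names, the expanding-window layout for the time series split, and the seed default of 42 (borrowed from the documented session_id default) are assumptions rather than the routine's actual implementation. Unlike the documented fold_num constraint, this sketch requires at least 2 folds for K-Fold, since a single fold would leave no training data.

```python
import random


def k_fold_indices(n_samples, n_folds, shuffle=False, seed=42):
    """Split indices into n_folds test folds; train on the other n_folds - 1."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)

    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, n_folds)
    splits, start = [], 0
    for fold in range(n_folds):
        stop = start + base + (1 if fold < extra else 0)
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        splits.append((train, test))
        start = stop
    return splits


def time_series_indices(n_samples, n_folds):
    """Expanding-window splits: every test block follows its training rows in time."""
    test_size = n_samples // (n_folds + 1)
    if n_folds < 1 or test_size < 1:
        raise ValueError("too many folds for the number of samples")
    splits = []
    for fold in range(n_folds):
        train_end = n_samples - (n_folds - fold) * test_size
        train = list(range(train_end))
        test = list(range(train_end, train_end + test_size))
        splits.append((train, test))
    return splits


kf = k_fold_indices(10, 3)
ts = time_series_indices(10, 4)
```

In the K-Fold case every row is tested exactly once, and shuffling only changes which rows land in which fold. In the time series case no shuffling is applied, so a model is never evaluated on rows that precede its training data.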
Interface Definitions
No interface definitions found for this routine
Developer Docs
Routine Typename: MLRegressor
| Method Name | Artifact Keys |
|---|---|
| __init__ | N/A |
| create_web_app | web_app |
| predict | prediction_report, prediction_output, shap_values |
| predict_shap_interpretation | shap_values, waterfall_plot, summary_plot |
| train | train_report, train_data, dynamic_artifacts_metadata |