Experiment Faster with Prediction Simulator

Author: Connor Catallo, Created: 2024-10-31

Introduction

The Prediction Simulator is a key routine within the SensibleAI Studio (STU). For power users, running jobs can consume a lot of time during Rapid Project Experimentation (RPE). The Prediction Simulator routine lets power users run their jobs sequentially and automatically, with no delay between them. It can kick off and queue the following jobs for any project that is through the data load phase: pipeline, deploy, prediction, rebuild, and project copy. These capabilities improve a power user's efficiency when dealing with multiple projects and jobs, especially when running predictions.

Info: As of the SensibleAI Forecast v4.0.0 release, the Prediction Simulator is natively integrated into SensibleAI Forecast as part of the Model Build → Pipeline → Run page.

Key Terms

Routine: A piece of data science functionality catered toward solving a particular use case. All available routines may be found in the Explore Page of the SensibleAI Studio.

Routine Instance: A user-created routine object that is required in order to execute a routine method. Routine instances may be created from the Explore Page and viewed/invoked from the Runs Page.

Routine Method: A discrete, executable function that is specific to a given routine. Methods must be called/invoked on instances that have been created.

Routine Run: The term used to describe a routine method that has been executed by a user. Routine instance creations as well as routine method executions are considered routine runs.

Stateful Routine: A routine that maintains internal state. These routines keep track of internal data that may be updated when their routine methods are invoked. Stateful routines will typically require input parameters when creating a routine instance.

Stateless Routine: A routine that does not maintain any sort of internal state. Stateless routine methods can be described as static. Stateless routines will not require any input parameters when creating a routine instance.

Solution Overview

There are two main ways in which the Prediction Simulator may be executed.

SensibleAI Studio UI: Through UI invocation. This option is explored in more detail throughout this article.
SensibleAI Forecast: As of the v4.0.0 release, the Prediction Simulator routine is natively integrated and is usable as part of the Pipeline → Run page.

Usage and Functionality

To use the Prediction Simulator during RPE, a SensibleAI Forecast project must be created and be through the data load steps. The Prediction Simulator allows users to run work units (jobs) associated with a project. The order of the jobs must be specified within the component workflow when defining the parameters of the routine method. Failing to follow the proper order of operations will result in the following:

This functionality can be extremely valuable when running multiple predictions. Once the best project is chosen during RPE, it is time to deploy and run predictions in order to compare results to customer benchmark data. The Prediction Simulator allows this to occur within one tool, without uploading multiple ranges of data into the Data Manipulator (DMA). When the routine instance is created, the user specifies the dataset to be used. Typically, when running a prediction, datasets need to be broken up to satisfy forecast horizons in order to adequately compare model predictions to benchmarks. With the Prediction Simulator, however, the full set of target actuals and features should be passed as input to the routine instance. The Prediction Simulator does not change the underlying datasets during a routine run: under the hood, the routine creates additional tables based on the datasets specified, but it never directly modifies the project or routine datasets. When running a routine method against this instance, execution dates are required to tell the simulator when to start the prediction:

This execution date tells the system to make the prediction using only actual data prior to the given date. The screenshot below shows how this looks within Utilization of SensibleAI Forecast. Though the dataset supplied to the routine instance runs through 2018, the actuals in the graph stop at 2/1/2018, which was the execution date entered for the Scenario 4 prediction:

The efficiency of a power user within RPE can be drastically improved by deploying the project and running multiple predictions within a single routine method. In addition, further predictions can be invoked against the same routine instance without creating a new routine instance or updating the initial target or feature datasets. This is key functionality that differentiates the Prediction Simulator tool from the SIM solution.
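The execution-date behavior described above can be sketched in a few lines of Python. This is an illustration, not the simulator's implementation: the helper names `actuals_visible_at` and `prediction_options` are hypothetical, the sample actuals are made up for the example, and the sketch assumes that "prior to" means strictly earlier than the execution date. The payload keys mirror the prediction options listed in the JSON section of this article.

```python
from datetime import datetime

def actuals_visible_at(actuals, execution_date):
    """Return only the rows dated strictly before the execution date (assumed cutoff rule)."""
    return [(d, v) for d, v in actuals if d < execution_date]

def prediction_options(scenario_name, execution_date, include_softtrain=True):
    """Build one prediction work-unit payload using the keys shown in this article."""
    return {
        "workUnitOptions": {
            "work_unit_type": "prediction",
            "scenario_name": scenario_name,
            "execution_date": execution_date.strftime("%Y-%m-%dT%H:%M:%S"),
            "include_softtrain": include_softtrain,
        }
    }

# Made-up monthly actuals covering all of 2018: one full dataset, never split.
actuals = [(datetime(2018, m, 1), 100 + m) for m in range(1, 13)]

# Several predictions against the same dataset, each with its own execution date.
execution_dates = [datetime(2018, 2, 1), datetime(2018, 5, 1), datetime(2018, 8, 1)]
payloads = [
    prediction_options(f"Scenario {i}", d)
    for i, d in enumerate(execution_dates, start=1)
]

for payload, d in zip(payloads, execution_dates):
    visible = actuals_visible_at(actuals, d)
    name = payload["workUnitOptions"]["scenario_name"]
    print(name, "uses", len(visible), "rows, last actual:", visible[-1][0].date())
```

The full dataset is supplied once; only the execution date changes from one prediction to the next.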

Prediction Simulator Routine Instance vs Method

When creating an instance of the routine, the user must specify the project, the source dataset, and an optional feature set that the routine will be used for. A new routine instance must therefore be defined for each project and dataset that a power user intends to run through the simulator. Note that on STU SV100 the dataset cannot be updated after it is defined within the routine instance, so a new instance must be created if additional data points are added to the set. Because this routine is stateful, multiple methods can be invoked against the same routine instance without creating a new instance each time. For example, a method can be invoked to run data pipeline and deploy jobs on a project one day, and the same routine instance can be used the next day to invoke a method that runs multiple predict jobs on the same project. In summary, a new Prediction Simulator routine instance is required for each new project and dataset; the inability to update data within an existing instance is a limitation of the tool in its current state. Conversely, multiple methods can be invoked against an instance as long as the project and datasets have not changed.
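The relationship between an instance and its methods can be modeled with a small Python sketch. The classes `InstanceConfig` and `PredictionSimulatorInstance` and their field names are hypothetical stand-ins, not the SensibleAI Studio API. The sketch only illustrates that the project and datasets are fixed when the instance is created, while any number of method invocations can be recorded against it.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class InstanceConfig:
    """Inputs fixed at instance creation (hypothetical field names)."""
    project: str
    source_dataset: str
    feature_dataset: Optional[str] = None  # the feature set is optional

@dataclass
class PredictionSimulatorInstance:
    """Illustrative stand-in for a stateful routine instance."""
    config: InstanceConfig
    runs: list = field(default_factory=list)  # internal state: history of invoked methods

    def invoke(self, work_units):
        """Record one method invocation made up of an ordered list of work units."""
        self.runs.append(list(work_units))
        return len(self.runs)

instance = PredictionSimulatorInstance(InstanceConfig("DemoProject", "sales_actuals"))

# Day 1: pipeline and deploy in one method; day 2: several predictions in another.
instance.invoke(["pipeline", "deploy"])
instance.invoke(["prediction", "prediction", "prediction"])
print(len(instance.runs), "methods invoked against the same instance")

# Changing the dataset means creating a new instance; the frozen config rejects edits.
try:
    instance.config.source_dataset = "sales_actuals_v2"
except AttributeError as exc:  # FrozenInstanceError is a subclass of AttributeError
    print("cannot update dataset:", type(exc).__name__)
```

Freezing the configuration is a design choice made for this sketch to mirror the limitation described above; it says nothing about how the product enforces it.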

JSON vs Interactive Input

When creating a routine instance or invoking a routine method, a user can choose between the Interactive and JSON input methods. When first invoking a method, the Interactive method is helpful for understanding the inputs the method accepts. The prompts are clear and guide end users through the inputs in sequential order:

On the other hand, for someone who is familiar with the structure of the tool, using JSON to manage inputs can make running methods more efficient. The following snippets provide an easy-to-read, sequential format for entering parameters into a routine method. They can be copied and updated according to the parameters desired in a routine method:

JSON Code	Explanation
`{ "workUnitOptions": { "work_unit_type": "pipeline", "model_build_enddate": "2015-01-31T00:00:00" } }`	This code allows a user to automatically run a data pipeline job within the model build phase of RPE. The end date parameter should match the last day of actual data that was loaded into the model build.
`{ "workUnitOptions": { "work_unit_type": "deploy" } }`	This code will run the deploy job for the project that is specified within the routine instance.
`{ "workUnitOptions": { "work_unit_type": "prediction", "scenario_name": "Scenario", "execution_date": "2015-02-01T00:00:00", "include_softtrain": true } }`	This code will kick off a prediction job with a specified scenario name and execution date. In this scenario, the prediction will start on 2/1/2015, using actual data prior to this date to make the prediction. Setting `include_softtrain` to true will retrain the models using the historical data as well as the most recent month of actuals. When it is false, predictions will only be made using data up until the model build end date.
`{ "workUnitOptions": { "work_unit_type": "project_copy", "project_copy_name": "CCCopiedProject" } }`	This code will kick off a copy job. Project copy name is the only parameter that would need to be updated here.
`{ "workUnitOptions": { "work_unit_type": "rebuild", "model_build_end_date": "2015-03-01T00:00:00" } }`	This code will run a rebuild job for a productionized project if needed. The model build end date specifies the cut-off date for data that is used to train the model.
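
For users who prefer to generate these snippets rather than copy them, the following Python sketch builds each payload as a dictionary and serializes it with the standard `json` module. The helper `work_unit` and the `ALLOWED_TYPES` set are hypothetical conveniences, not part of the product. The keys and values are copied from the table above, including the spelling `model_build_enddate` for the pipeline job exactly as it appears there.

```python
import json

# Work unit types named in this article.
ALLOWED_TYPES = {"pipeline", "deploy", "prediction", "rebuild", "project_copy"}

def work_unit(work_unit_type, **options):
    """Wrap a work unit type and its options in the workUnitOptions envelope."""
    if work_unit_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown work unit type: {work_unit_type}")
    return {"workUnitOptions": {"work_unit_type": work_unit_type, **options}}

# An ordered queue of payloads: pipeline first, then deploy, then a prediction.
queue = [
    work_unit("pipeline", model_build_enddate="2015-01-31T00:00:00"),
    work_unit("deploy"),
    work_unit(
        "prediction",
        scenario_name="Scenario",
        execution_date="2015-02-01T00:00:00",
        include_softtrain=True,
    ),
]

# Print each payload as JSON text that could be pasted into the JSON input.
for payload in queue:
    print(json.dumps(payload, indent=4))
```

Building the payloads in code keeps the date format and key names consistent across many predictions, and a mistyped work unit type fails early.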

Conclusion

In the SensibleAI Studio, the Prediction Simulator makes it easy to run multiple jobs against a project sequentially, without losing time in between, once the project is through the data load steps. Because each prediction is given its own execution date, the user does not have to upload multiple datasets with different date ranges, which makes RPE more efficient for an implementer. Because this routine is stateful, multiple methods can be invoked against the same routine instance when running jobs for a specific project. However, if any dataset for that project needs to change, a new instance of the routine must be created.
