Skip to main content

Merging and Overlaying Forecasts

Author: Matthew Rutty, Created: 2024-04-01

During the course of Rapid Project Experimentation (RPE), Proof of Values (POVs), or full-scale engagements, what we return from a forecast table isn’t necessarily representative of what we want our final project to be. For example, consider a situation in which a customer wants to have two different forecasts, overlayed into one, such as a 5-week and 52-week forecast. In this situation, you would want to end up with just one, 52-week forecast, where the first 5 weeks are taken up by the short term forecast (with the added benefit of less lagged features), and the next 47 are from the annual forecast.

In a similar vein, we could have multiple values in the ForecastName field from our cascaded models. If we did Scenario Modeling over the course of a multi-forecast period, we might end up with various forecasts of different lengths. Consider in the photo below, where our “Baseline” scenario extends all the way from the first to last forecast date, but our Plant Shutdown and Shifted Shutdown scenarios were only executed over a few months. We want to slice those 9 months of baseline into the history of our plant shutdown and shifted shutdowns to integrate the full length of our forecasts for OneStream actuals dashboarding.

image-20240402-023909.png

info

As part of the SensibleAI Forecast v3.4.0, SensibleAI Forecast natively supports Forecast Overlays by allowing the user to specify multiple forecast ranges. The remaining sections of the articles are only applicable for users on a release version earlier than v3.4.0.

This article will explain the methodology of solving these two problems, as well as provide sample SQL Server code for execution in RPE and deployed model forecasting, or for inspiration integrating similar concepts into C## data pipelines in full engagements. This guide will start with the source data tables being in the structure of an exported consumption group, staged in Xperiflow Data Tools (XDT).

Overlaying Forecasts

Overlaying forecasts is a simple concept to grasp. Utilizing the example in the introduction section, we will consider a situation with 2 forecasts (5 week and 52 week) we want to overlay.

The code below can be broken into a few parts:

  1. Establishing the Tables: The 5 week table operates with the a. alias, while the 52 week table in our example operates with the b. alias.
  2. Inner Join Clause: This is the critical clause of the query. It establishes a unique set of intersections from the cascaded table, with the combination of your Features, SeriesName, ForecastName, ForecastStartDate, and the maximum date for that ForecastStartDate.
note

Because we use the maximum Date for each ForecastStartDate, we can more simply overlap the 5 week and 52 week forecasts without the need for a complex query.

  1. Final Select Statement: We establish our new table in the select statement at the end, sorting by our desired features.
WITH Combined AS (
SELECT
Date,
[FeatureName01], [FeatureName02], [FeatureName03],
SeriesName, ForecastName, Value,
ForecastStartDate, Model, TargetName, ModelRank,
PredictionCallID, PredictionScheduledTime,
XperimentKernelID, XperimentBuildID,
XperimentSetID, BuildInfoID,
ConsumptionID, ProjectID, ConsumptionRunID,
ConsumptionRunTime, ModelCategory,
CascadedSeriesDetail, SeriesGenerator
FROM SML_[DatasetName]_5WK_P01V01_DMF01_W_CSC01

UNION ALL

SELECT
b.Date,
b.[FeatureName01], b.[FeatureName02], b.[FeatureName03],
b.SeriesName, b.ForecastName, b.Value,
b.ForecastStartDate, b.Model, b.TargetName,
b.ModelRank, b.PredictionCallID,
b.PredictionScheduledTime, b.XperimentKernelID,
b.XperimentBuildID, b.XperimentSetID,
b.BuildInfoID, b.ConsumptionID,
b.ProjectID, b.ConsumptionRunID, b.ConsumptionRunTime,
b.ModelCategory, b.CascadedSeriesDetail, b.SeriesGenerator
FROM SML_[DatasetName]_52WK_P01V01_DMF01_W_CSC01 b
INNER JOIN (
SELECT
[FeatureName01], [FeatureName02], [FeatureName03],
SeriesName, ForecastName, ForecastStartDate,
MAX(Date) as MaxDate
FROM SML_[DatasetName]_5WK_P01V01_DMF01_W_CSC01
GROUP BY
[FeatureName01], [FeatureName02], [FeatureName03],
SeriesName, ForecastName, ForecastStartDate
) a
ON b.[FeatureName01] = a.[FeatureName01]
AND b.[FeatureName02]= a.[FeatureName02]
AND b.[FeatureName03] = a.[FeatureName03]
AND b.SeriesName = a.SeriesName
AND b.ForecastName = a.ForecastName
AND b.ForecastStartDate = a.ForecastStartDate
WHERE b.Date > a.MaxDate
)

SELECT *
INTO SML_[DatasetName]_5WK_52WK_P01V01_DMF01_W_CSC01
FROM Combined
ORDER BY
[FeatureName01], [FeatureName02], [FeatureName03],
SeriesName, ForecastName, ForecastStartDate, Date;

The result of this query gives an overlayed table, with 52 results (52 weeks) for each unique combination of Target Intersection, SeriesName, ForecastName, and ForecastStartDate. The first 5 are from our 5 week table, the last 47 from the 52 week table.

Merging Forecasts

Now consider a similar situation to the one outlined in the image from the introduction. We have 3 forecasts, but due to unpredictability of our forecast, or desire to limit our computation time, we’ve only developed one full-length scenario for the full time, where a couple of the others start at a point in time. In this situation, we would want to merge them together. To resolve this, we will use a Common Table Expression (CTE) in SQL.

The code below can be broken into a few parts:

  1. Create Original Table: We begin with creating a table that is identical to the Baseline scenario before our start date, but with the name of one of the ForecastName which is lacking history.
  2. Insert Next ForecastName Into: Because of the nature of CTEs, we must use INSERT INTO to continue to add data to this table. We do the same thing as the previous query, but with another ForecastName.
  3. Insert the Rest Into Our Table: Taking stock of what we have at this point is a table is the Baseline Data from before our start date for both of the other ForecastName. This means we can just add the original table and have a full, complete set of all 3 forecasts with the full history. We can then sort and organize the table as we see fit.
caution

Important Note: If you’re keeping track, we actually have 3 Actuals Series (one for each ForecastName). The where clause at the end intends to remove two of those actuals series from the other forecasts. This confirms we won’t have any issues putting this duplicate data into OneStream, even though it could be filtered out in the data connector to just be one ForecastName (like Baseline).

SELECT
Date,
[FeatureName01], [FeatureName02], [FeatureName03],
SeriesName, '[ForecastName01]' AS ForecastName,
Value, ForecastStartDate, Model,
TargetName, ModelRank, PredictionCallID,
PredictionScheduledTime, XperimentKernelID,
XperimentBuildID, XperimentSetID,
BuildInfoID, ConsumptionID, ProjectID,
ConsumptionRunID, ConsumptionRunTime,
ModelCategory, CascadedSeriesDetail,
SeriesGenerator
INTO [MergedTable]
FROM
SML_[ProjectName]_P01V01_DMF01_W_CSC01
WHERE
ForecastName = '[Baseline]'
AND ForecastStartDate < '[MergeStartDate]';

INSERT INTO [MergedTable] (
Date,
[FeatureName01], [FeatureName02], [FeatureName03],
SeriesName, ForecastName, Value,
ForecastStartDate, Model, TargetName,
ModelRank, PredictionCallID,
PredictionScheduledTime, XperimentKernelID,
XperimentBuildID, XperimentSetID,
BuildInfoID, ConsumptionID,
ProjectID, ConsumptionRunID,
ConsumptionRunTime, ModelCategory,
CascadedSeriesDetail,
SeriesGenerator
)
SELECT
Date,
[FeatureName01], [FeatureName02], [FeatureName03],
SeriesName, '[ForecastName02]' AS ForecastName,
Value, ForecastStartDate, Model,
TargetName, ModelRank, PredictionCallID,
PredictionScheduledTime, XperimentKernelID,
XperimentBuildID, XperimentSetID,
BuildInfoID, ConsumptionID, ProjectID,
ConsumptionRunID, ConsumptionRunTime,
ModelCategory, CascadedSeriesDetail,
SeriesGenerator
FROM
SML_[ProjectName]_P01V01_DMF01_W_CSC01
WHERE
ForecastName = '[Baseline]'
AND ForecastStartDate < '[MergeStartDate]';

INSERT INTO [MergedTable] (
Date,
[FeatureName01], [FeatureName02],[FeatureName03],
SeriesName, ForecastName, Value,
ForecastStartDate, Model,
TargetName, ModelRank, PredictionCallID,
PredictionScheduledTime, XperimentKernelID,
XperimentBuildID, XperimentSetID,
BuildInfoID, ConsumptionID, ProjectID,
ConsumptionRunID, ConsumptionRunTime,
ModelCategory, CascadedSeriesDetail,
SeriesGenerator
)
SELECT
Date,
[FeatureName01], [FeatureName02], [FeatureName03],
SeriesName, ForecastName, Value,
ForecastStartDate, Model, TargetName,
ModelRank, PredictionCallID,
PredictionScheduledTime, XperimentKernelID,
XperimentBuildID, XperimentSetID,
BuildInfoID, ConsumptionID, ProjectID,
ConsumptionRunID, ConsumptionRunTime,
ModelCategory, CascadedSeriesDetail,
SeriesGenerator
FROM
SML_[ProjectName]_P01V01_DMF01_W_CSC01
WHERE NOT (
SeriesName = 'Actuals'
AND ForecastName NOT IN ('[ForecastName01]', '[ForecastName02]));

Wrap Up

Forecasting data in SensibleAI Forecast is extremely powerful. Scenario Modeling and Model Overlays augment the strengths of this forecasting and allows us customizable solutions for customers. However, we must use SQL Queries to refine the output and make it into a digestible output for custom dashboards and build-out of OneStream solutions.

Was this page helpful?