Introduction to FVA Analysis - Win Margin TS (Filter) View
It is critical to leverage the Forecast Value Add solution both during project experimentation, to guide continuous improvement, and after experimentation, to identify how the optimal project compares to the stakeholder’s benchmark forecasts. Specifically, the Win Margin view and its TS (Filter) tab can call out how and where a project wins or loses on individual targets. This article explains the limitations of the Win Margin views, identifies what a user can and cannot conclude, and presents a recommended path for leveraging this view to improve project configurations and prioritize targets for deep dives into what is going wrong.
What you CAN and CAN’T Conclude
The basis of the Win Margin view on the Forecast Value Add dashboard is project comparison. The Win Margin metric itself can only be interpreted as an improvement or deterioration between two project runs. Therefore, statements like “Forecast configuration 1 is X% more accurate than Forecast configuration 2” or “Forecast 1 is more accurate for this subset of products, while Forecast 2 is more accurate for a different subset” are supported, while statements like “Forecast configuration 1 is X% accurate” or “We’re really good at forecasting for N targets” are not supported.
If you consider overall engagement goals, SensibleAI Forecast should deliver a highly accurate forecast that beats the current forecasting methodology. The Win Margin view (and by extension, the TS Filter tab) can tell an implementor by how much a SensibleAI Forecast forecast is more accurate than the current forecasting methodology, but it can make no statement about how accurate the forecast is in relation to actuals. This is important to keep in mind when comparing two SensibleAI Forecast forecasts; one might be better than the other, but no conclusion can be made on whether either is “good” to begin with. All analysis in the Win Margin view must be conducted with a mindset of project comparison and how different elements affect project performance.
FVA Snapshot Format
Based on what a power user can conclude from the Win Margin TS Filter view, it becomes very important to select which projects to include in the FVA table. A power user will gain the most understanding by choosing projects that are very similar in construction, differing by a minimal number of factors. If a power user compares two dissimilar projects with varying features, events, grouping techniques, granularities, or forecast horizons, it becomes impossible to disentangle what caused a performance improvement. While a power user can determine which project is better, they will not be able to understand why that project is better or theorize future improvements. Given the near-infinite possibilities of project configurations, the power user must eliminate aimless experimentation. The greater insight the power user has into the dataset and into which specific elements cause which specific accuracy improvements or drops, the more targeted the next project iterations can be.
Next, a power user should consider which dimensions to include on the FVA snapshot. At a bare minimum, the snapshot dimensions should consist of the Forecast Start Date (if looking at Deployed Model Forecasts) and every target dimension that was included in the model build, including Location if applicable. Finally, since my recommendation for utilizing this view includes exporting the underlying data for deeper Excel analysis, I suggest including auxiliary categorical data that exists alongside the dataset but was not necessarily included as a model build target dimension. This includes not only higher-level dimensions like part family, market segment, global subregion, or region but also target metadata like number of data points or earliest start date. The reason for these extra factors is to allow similarity analysis beyond what is apparent from the targets alone. For example, a power user might note that forecasts for several end customers are consistently poor when compared to a benchmark forecast. However, when the power user includes subregion as a parameter, they might notice that all of these end customers sit in the same subregion and that the forecasting failures are limited to that subregion. Their work may then shift towards exploring how to improve accuracy for the entire subregion, rather than independently for specific end customers.
Lastly, for FVA Snapshot construction, a power user has agency over the dimension order of analysis. For the Win Margin Time Series view there is no drillability, so the order doesn’t strictly matter, but power users will typically want to reuse snapshots across as many views as possible, which means general rules for dimension order still apply. As a rule of thumb, the most important dimension should be the first snapshot dimension, as it is the easiest to drill into when isolating performance comparisons on a single category. I will then typically structure the rest of the dimensions as follows:
Forecast Start Date(*) > OneStream Entity (typically plant or channel) > Region > Ship-To Customer > Part > Forecast Start Date(*)
If I don’t have a specific goal for my snapshot, I find it best to structure it similarly to how the stakeholder’s OneStream Cube is set up, or along aggregation levels that they care about. In general, stakeholders will not analyze a specific SKU, but rather unit volumes per entity or region, and then drill into the customer or part level. The Forecast Start Date will be either the first or the last dimension in my snapshots, depending on whether I care about how the forecasts change over time or about using the Win Margin differences per forecast version to identify hidden information that the stakeholder knows in their benchmark but SensibleAI Forecast does not. Alternatively, if a power user has a specific goal to understand a category or dimension’s effect on accuracy, then the dimension order should change to facilitate that analysis.
Here’s a suggested first dimension, depending on the primary goal of the analysis. Many times, the same FVA snapshot can be duplicated to repeat analysis but with different primary views.
| First Snapshot Dimension | Purpose |
|---|---|
| Forecast Start Date | Get a single project-wide accuracy improvement value for each forecast (as if in production). Best used for benchmark comparison to report on how many forecasts SensibleAI Forecast won, or to understand how forecast accuracy degrades or improves over time. |
| Geographical Region or Channel | Understand how project settings affect a specific geography. Useful if location is a primary driving factor in demand volume. |
| Ship-To Customer | Understand how project settings affect specific customers. Best used when the client identifies key customers of high importance. |
| Plant | Understand how project settings affect accuracy per manufacturing plant. Useful if analyzing constrained demand (affected by product factors) and when plants (typically the OneStream Entity) have different data collection and reporting practices that affect data length or quality. |
| Item / SKU / Part Type | Not typically recommended, as there are generally too many item types to visually analyze or isolate specific items. In addition, parts may be tied to a specific geographical region or plant, so it may be easier to break them out that way. |
| Target Metadata | Produces a project-wide accuracy improvement value for specific subsets of targets that might feed into project segmentation. For example, one can identify how targets with 80+ data points compare to those with <30 data points across projects, potentially signaling segmentation techniques. |
Finally, when selecting the Win Margin view on the Visualize page, a power user must specify the Top Series, Bottom Series, and Metric. In technical terms, which series is top and which is bottom doesn’t matter, but using the “complicated” or “of-interest” project as the top series and the “simple” or “competitor” project as the bottom series helps with framing conclusive statements. Examples of such statements are “Including unemployment as a feature helps [subset of targets] and improves accuracy by X%” or “SensibleAI Forecast is X% better at predicting [subset of targets] than the customer benchmark.” The statements will always take the form:
[Top Series] is [Win Margin %] more accurate on [Metric] than [Bottom Series] on [filter or drill view applied].
Generally, the selected metric should be based on the agreed-upon metric for performance validation. Many times, we propose Score % (a combination of Bias % and Mean Absolute Error %) as the performance metric; therefore, our analysis should occur on the Absolute Error metric. However, there may be cases (especially if a Data Science team is involved) where the team would prefer Mean Squared Error as the reported result. Always defer to the stakeholder, but Absolute Error is typically the preferred judgment tool.
Auxiliary Reports
After selecting the top series, bottom series, and evaluation metric, the user must select the TS (Filter) tab to see the reports specific to this article. The screen is composed of six boxes. Moving from top left to bottom right, there is a filter selection box, the win margin and significance plots, a Win Margin % value, a series error comparison with a rectangular pie chart, and finally a time series plot of the series compared to the actuals, with a time slider.
The core functionality of this view depends on the filter parameters selected in the top left box. By default, all snapshot dimensions will be selected, and the first list of choices will be the first snapshot dimension.
For example, if Forecast Start Date is the first snapshot dimension, the user can choose between different forecast start dates; then, by expanding each forecast start date, the next snapshot dimension becomes visible. In this tree hierarchy, every target intersection for each forecast date can be selected. When different filter settings are applied, all other boxes will update their graphs or values to match the filter.
I will typically only adjust the filter settings if I care to isolate a specific forecast or high-level target group. If I do, I will also build my snapshot dimension hierarchy so that those dimensions are the top 2, so they can be easily selected from the filter settings.
Next, the win margin and significance plot is a combined plot: a bar chart showing the win margin (in the units of the chosen error metric) for each snapshot dimension intersection (typically target dimensions per forecast) sits above a line plot of the total units of actuals for that intersection. The intersections are ordered by descending significance, which is why the significance line plot is a decreasing function. Hovering over each bar reports the win margin value and the snapshot intersection concatenated into a single string with “~” between dimensions. Depending on the display size and number of targets, it can be difficult to identify a single bar, so the user can scroll up or down with the mouse wheel to zoom in or out on a particular section.
The value for each bar can be interpreted as the following:
For the [Snapshot Intersection], the [Top Series] configuration is [X] [Metric] units more accurate than the [Bottom Series] configuration.
To the left of the significance plot, the Win Margin % value displays a normalized win margin value based on the specified error metric for the dimensions selected in the filter box. This value summarizes the entire target portfolio so it can be cleanly reported to the stakeholder. By filtering on each forecast start date, a power user can report on the accuracy improvement across the backtest period.
Below, two charts display the aggregate error units for the top and bottom series, and the proportion of intersections in which the top or bottom series wins or ties. The error chart reinforces the win margin value by displaying the values used in calculating the win margin. It is usually more important to pay attention to the rectangular pie chart than to the error chart, as it directly reports the number of intersections for which each series wins or ties. When considering a single forecast, this is identical to the number of targets each series wins on. It is possible for the top series to win overall but on fewer targets than the bottom series. Especially when comparing to a stakeholder benchmark, it is critical that the SensibleAI Forecast forecast wins overall and on a plurality of targets. These numbers can also be reported to the stakeholder, transparently calling out the number of targets where SensibleAI Forecast beats or matches the stakeholder forecast.
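As a mental model only (an assumption about the normalization, not a documented formula), the relationship between the aggregate error chart and the Win Margin % value above can be sketched in a spreadsheet. If cell B1 held the bottom series’ aggregate Absolute Error and cell B2 held the top series’ aggregate Absolute Error, a relative win margin of this flavor would look like:
=(B1 - B2) / B1
A positive result would mean the top series reduced aggregate error relative to the bottom series; a negative result would mean the opposite. Treat this purely as a way of reading the view, not as the dashboard’s exact calculation.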
Finally, the time series plot displays the aggregated actuals and forecast series for the targets and forecasts selected. Coupled with a slider to select the start and stop time, the user can isolate periods for visual inspection. While this graph will never be the primary method of analysis in this view, it can be useful once certain targets or groups have been identified as poor performers, allowing the user to spot any anomalies or trends that the predictions fail to model, rather than having to jump into a different FVA view for that analysis.
Win Margin Analysis
How can an implementor use these tools to gain insight into project improvement? Analysis routes will differ depending on the series context: whether a SensibleAI Forecast configuration is being compared to another SensibleAI Forecast configuration or to a client benchmark forecast. As a simplified rule, project iteration can follow the guiding question: “Which high-significance targets am I bad at forecasting, and how can I improve them?” Unfortunately, when comparing two projects, you can’t necessarily identify the targets you are bad at forecasting, because win margin doesn’t directly convert to overall accuracy. Rather, using the Win Margin view for project comparison should revolve around identifying patterns in the targets that different project configurations forecast better. Future iterations can then use those patterns to establish new features mapped to those targets, or segmentation techniques that get the best of both projects in a single forecast. I recommend skipping to the Excel Analysis sections to learn how to perform this project-comparison win margin analysis.
On the other hand, project-to-benchmark comparisons lend themselves to substantial analysis inside this view. Leveraging the win margin and significance plot, an implementor should note down a list of targets with high significance and large loss margins for future study of why the benchmark performs significantly better than SensibleAI Forecast. Perhaps this is due to a failure in the SensibleAI Forecast model configuration, where incorporating additional events or features, or changing model settings, can improve performance. Other times, it is because the benchmark forecasts have extra information that the SensibleAI Forecast forecasts do not. Some examples that have been seen before include:
- Certain benchmark forecasts have been manipulated so that the first month matches actuals.
- Benchmark forecasts include adjustments from sales managers who know upcoming contracts.
- Forecasts include adjustments from plant managers who know about rush planning, scheduling overtime, and extended hours to fill a quota before a quarter-end or holiday period.
Identifying these targets and periods, and holding discussions with the client on the root cause, may surface new feature sets or reset expectations for forecast performance, since ML models can struggle to capture anomalous planning behavior that occurs irregularly.
In the final executive readout of project performance, an implementor should document the overall win margin and Top Series Win count, as well as the win margin and target win count for each forecast individually. These can be used to show accuracy lift consistency and improvement across the entire target portfolio.
Extracting Data for Excel Analysis
When working with thousands of targets across multiple forecasts, it can be difficult to isolate and identify the specific targets with large loss margins. I find success in exporting the win margin graph to Excel. To do this, right-click on the Win Margin graph, and select export to Excel with the default export settings.
Opening the newly saved Excel file will show the data in a columnar format, with a Target row (the FVA dimensions combined into a single string), a Win Margin row, and a Significance row, and one column per intersection. This isn’t a preferred format for working with data in Excel, since the end goal is to put the data into a table so that we can filter for specific targets or sort by significance and win margin. Therefore, it is necessary to transpose the data. Select the entire dataset, copy it, and paste special as transposed (right-click, Paste Special, Transpose option). After this is complete, the original column-wise data can be deleted, and the row-wise data can be converted into a table.
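Alternatively (assuming a dynamic-array version of Excel such as Microsoft 365), the TRANSPOSE function can do the same reshaping in one step; adjust the range to cover all exported columns, which here is only an illustrative guess:
=TRANSPOSE(A1:ZZ3)
Because this spills a formula result, copy the spilled range and paste it as values before deleting the original export and converting the result into a table.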
Finally, I like to extract the dimensions from the concatenated dimension string into their own respective columns. Typically, the dimension string follows the format [Forecast Start Date]~[Target Dimension 1]~[Target Dimension 2]~ and so on. Therefore, the following Excel formulas can be used to extract the values between each respective tilde (assuming the string is in cell A2).
Extract Forecast Start Date (first dimension):
=DATEVALUE(LEFT(A2, FIND("~", A2) - 1))
Extract Target Dimension 1 (second dimension):
=MID(A2, FIND("~", A2) + 1, FIND("~", A2, FIND("~", A2) + 1) - FIND("~", A2) - 1)
Extract Target Dimension 2 (third dimension):
=MID(A2, FIND("~", A2, FIND("~", A2) + 1) + 1, FIND("~", A2, FIND("~", A2, FIND("~", A2) + 1) + 1) - FIND("~", A2, FIND("~", A2) + 1) - 1)
This pattern extends to each additional dimension. In brief, each formula searches for the string of characters between the (n-1)th and nth tildes.
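On Microsoft 365 versions of Excel (an assumption about the reader’s environment), the same extraction can be done with a single spilling formula instead of the nested FIND/MID logic:
=TEXTSPLIT(A2, "~")
This splits the dimension string in A2 into one column per dimension; a trailing tilde will produce an empty final column that can simply be deleted. TEXTBEFORE and TEXTAFTER are similar single-dimension alternatives.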
The Win Margin table is primed for analysis when the following columns exist:
| Column Name | Meaning |
|---|---|
| Dimension ID | Direct extract from FVA. Concatenation of the snapshot dimensions. |
| Forecast Start Date | Generally, the string to the left of the first tilde. |
| Target Dimension 1 | Generally, the string between the first and second tildes. |
| ... | Repeat the above extraction for as many non-blank dimensions as exist. |
| Win Margin | Direct extract from FVA. Generally in units of Absolute Error. |
| Significance | Direct extract from FVA. In units of the respective target. |
Excel Analysis - Project Comparison
Here is a walkthrough of how an implementor can analyze a comparison where the top and bottom series are two iterations of project experimentation. Typically, the two projects compared should be very similar, differing by one significant project parameter. A type of conclusion that can be reached from this comparison is “Should this particular parameter, feature, or event be mapped to a specific subset of targets, rather than the entire project?” Since the two projects differ by only one feature, event, or parameter, the positive/negative win margin values can be translated directly into a boolean yes/no of whether each target should be run with that factor.
List of possible factors to analyze:
- Which targets to run at weekly or monthly granularity? Compare a weekly project (forecast resampled to monthly) to a monthly project.
- Which targets to run grouped or ungrouped? Compare a grouped project to an ungrouped project.
- Which targets are hurt by incorporating a feature? Compare a project with that feature to a project without it (identical in all other settings).
- Which targets are hurt by incorporating an event? Compare a project with that event to a project without it (identical in all other settings).
Let’s take grouping as a concrete example. When dealing with datasets whose targets have vastly different history lengths (imagine an online segment with only 2 years of history combined with brick-and-mortar stores with 10+ years of history), some form of grouping can greatly help overall project accuracy. That said, grouping can diminish the accuracy of other targets. It may be prudent to segment the projects so that targets that benefit from grouping run separately from targets that worsen with grouping. How can a power user identify how to segment these targets? With thousands of targets, a power user should not go through them one by one. Fortunately, general sorting rules can be established via Excel analysis of the Win Margin report.
With the win margin table of a grouped vs. ungrouped project comparison in Excel, the Win Margin column can be sorted from largest to smallest. In this order (assuming the grouped project is the top series), every forecast start date and target intersection with a positive win margin indicates a target that wants to be grouped, and every negative one a target that should remain ungrouped. A more optimal project would be a forecast overlay between the grouped and ungrouped projects, with each forecast chosen based on the sign of the win margin. That said, this is a terrible solution to deliver to a stakeholder: not only would SQL or C# code have to explicitly categorize thousands of targets, but it cannot robustly handle target additions. The goal of this analysis should be to find an underlying pattern between high-performing and low-performing targets and construct simple categorization rules.
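As an intermediate bookkeeping step (a minimal sketch, assuming the grouped project is the top series and the win margin data has been converted into an Excel table with the columns listed above), a helper column can flag the preferred configuration for each row:
=IF([@[Win Margin]]>0, "Grouped", IF([@[Win Margin]]<0, "Ungrouped", "Tie"))
The point of this column is not to deliver a per-target mapping, but to make it easy to filter and pivot while hunting for the simple categorization rules described next.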
With XLOOKUPs and other formulas, it is trivial to add columns to the Win Margin table that pull in auxiliary data which may help categorization, such as earliest data start date, number of data points, or target hierarchy attributes not present in model building, like market segment.
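For example (a sketch using a hypothetical TargetMetadata sheet with target names in column A and market segments in column B; these names are illustrative, not product-defined), a market segment column could be added with:
=XLOOKUP([@[Target Dimension 1]], TargetMetadata!A:A, TargetMetadata!B:B, "Unknown")
The same pattern works for earliest data start date, data point counts, or any other attribute keyed on a target dimension.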
Then, by sorting by win margin, a power user can look for repeated similarities among low-performing targets. Perhaps a particular sales region or market category dominates the lowest echelon of win margin. Projects can then be segmented on that specific trait, which is much easier to implement and more likely to accommodate target additions successfully.
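To test such a hunch quantitatively (a sketch assuming the table is named WinMarginTable and a Region column has been added with the XLOOKUP pattern above; "EMEA" is a placeholder value), the average win margin per category can be computed with:
=AVERAGEIFS(WinMarginTable[Win Margin], WinMarginTable[Region], "EMEA")
Pairing this with a COUNTIFS of negative win margins per region quickly shows whether one region or market category dominates the losing targets.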
In general, I believe comparing win margins between projects is only useful if the goal is project segmentation. Otherwise, there are other FVA views better suited to analyzing project performance or forecast-to-actuals accuracy. Identifying the best parts of each project is only useful if the power user intends to use both projects and overlay them in some fashion.
An interesting idea I have is to synthetically create an “actuals” forecast, allowing the win margin to describe absolute accuracy in relation to actuals. All win margins will be negative, since the “Actuals Forecast” perfectly matches the actuals, but the view of significance vs. absolute accuracy (now identical to the win margin) can be extremely beneficial.
Now, a power user can repeat the above analysis to find trends between high-performing and low-performing targets in absolute terms, rather than relative to another project. One can use these trends to establish new features or events to incorporate, change data processing/collection, or create new grouping techniques.
Excel Analysis - Benchmark Comparison
I believe benchmark comparison is the best way to use the win margin view (my above note on creating a synthetic actuals forecast might be better, but I need to explore some more). Primarily, the guiding question is: “What information does the benchmark have that my models don’t?” This is especially relevant if you’re comparing against a consensus forecast that will include manual adjustments.
A clear tell-tale sign that the benchmark knows extra information is a sharp change in win margin between forecast start dates for a specific target. Imagine forecasting shipped units, where a large spike in December sales is coming due to a new order placed in September. As of the January, April, and July forecasts, the customer doesn’t know this is coming. However, come the October forecast, they now know this large order was submitted, and while their forecast generator won’t expect anything out of the ordinary, they will manually adjust to match the new purchase order. Even if SensibleAI Forecast’s models are better than the customer’s normal forecast generator, they will show a large miss on the December spike. In the win margin table, we might see three forecasts where SensibleAI Forecast beats the benchmark, and then the final forecast shows a large loss. While power users cannot immediately attribute the reason for that loss, it can be brought up to the customer for clarification. Perhaps this opens the door to incorporating those human adjustments into the models via additional features.
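One way to surface such targets at scale (a sketch assuming the exported table is named WinMarginTable and includes a hypothetical helper column, Target Key, that concatenates the non-date dimensions so all forecast start dates for the same target share one key) is a swing column measuring how much each target’s win margin varies across forecasts:
=MAXIFS(WinMarginTable[Win Margin], WinMarginTable[Target Key], [@[Target Key]]) - MINIFS(WinMarginTable[Win Margin], WinMarginTable[Target Key], [@[Target Key]])
Sorting by this column descending floats cases like the December example to the top, where SensibleAI Forecast wins the early forecasts and then loses badly once the benchmark absorbs new information.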
Alternatively, perhaps the benchmark consistently outperforms in a specific target segment. The team in charge of that segment can be probed further to see what they do differently and how they create such successful forecasts. In summary, any instance where the benchmark outperforms SensibleAI Forecast’s models indicates an opportunity for improvement, as the customer is likely doing something additional or different.
Finally, when dealing with thousands of targets, it isn’t feasible or scalable to identify and rectify every failing target. Emphasis should be placed on the highest significance targets (by price if available, by volume otherwise) with the largest loss margins to narrow the search.
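One simple heuristic for that narrowing (a sketch; the weighting is my own assumption, not a prescribed metric) is a priority score that is zero for winning rows and grows with both the significance and the size of the loss:
=IF([@[Win Margin]]<0, ABS([@[Win Margin]]) * [@Significance], 0)
Sorting the table by this column descending yields a short list of the targets most worth a root-cause discussion with the stakeholder.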