Author: Ansley Wunderlich, Created: 2025-07-31

This article takes a deeper look at 11 routines in SensibleAI Studio, examining the advantages of each and where it provides the most value to users.

Aggregate Data

Overview

The Aggregate Data routine allows users to aggregate data based on specified columns and aggregation methods. It supports a variety of aggregation types, including:

  • Sum: Total values for specified groups
  • Mean: Average values within groups
  • Min: The smallest value in the group
  • Max: The largest value in the group

This routine is particularly useful for time series data and can handle multiple columns, making it a versatile option for data analysts.
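
As an illustration, the same grouping-plus-aggregation pattern can be sketched in pandas. The data and column names below are hypothetical, and the routine itself is configured through Studio inputs rather than code:

```python
import pandas as pd

# Hypothetical transaction data; column names are illustrative only.
sales = pd.DataFrame({
    "Region": ["East", "East", "West", "West"],
    "Store": ["S1", "S2", "S3", "S4"],
    "Sales": [1200.0, 950.0, 1800.0, 640.0],
    "Transactions": [40, 31, 52, 18],
})

# Group by Region and apply a different aggregation per column,
# mirroring the routine's Sum / Mean / Min / Max options.
summary = sales.groupby("Region").agg(
    TotalSales=("Sales", "sum"),
    AvgSales=("Sales", "mean"),
    MinTransactions=("Transactions", "min"),
    MaxTransactions=("Transactions", "max"),
)
print(summary)
```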

Key Features

| Feature | Description |
| --- | --- |
| Flexible Grouping | Group by user-specified fields |
| Multiple Aggregations | Supports various aggregation types for different columns |
| Easy Input Management | Options to continue, add, or modify aggregations |
| Efficient Memory Use | Default capacity of 2.0 GB |

Use Cases of Aggregate Data

Aggregated Dimension Insights

A national retail chain with hundreds of stores generates extensive transaction data. To enhance performance insights, the chain employs the Aggregate Data routine to:

  1. Group Data by Regions: Aggregate sales and transaction data to understand performance across various locations.
  2. Generate Summary Statistics: Calculate totals, averages, and counts to identify high- and low-performing stores.

By analyzing this aggregated data, the retail chain can develop targeted strategies for improvement and leverage advanced analytics, including machine learning, to predict trends and optimize inventory. This approach not only streamlines data management but also fosters a culture of continuous improvement.

Time Series Trend Analysis

To analyze consumer behavior over time, the retail chain uses the Aggregate Data routine to:

  • Analyze Sales Trends: Aggregate sales data across specific time intervals – daily, monthly, quarterly, or annually – to observe trends.
  • Adjust Marketing Strategies: Quickly adapt to market demands based on real-time insights from aggregated data.

Menu Item Sales Analysis for Restaurants

As a consultant for a restaurant chain, you can help stakeholders understand their sales data better by aggregating menu item sales. By utilizing the Aggregate Data routine, you can:

  • Calculate Average Sales: Determine the average sales for each category of menu item at individual locations.
  • Inform Future Forecasting: Use aggregated data to identify trends and inform future inventory and marketing strategies.

Routine Method Overview

Description

The Aggregate Data routine groups data on user-specified fields and applies the selected aggregation types. Here’s how it works:

| Input Requirement | Description |
| --- | --- |
| Source Connection | Connection information for the data source (must be a TabularConnection) |
| Columns to Group | Specify which columns to group by |
| Aggregation Step Input | Options to continue, add another column, or modify previous inputs |

Output

The output of the Aggregate Data routine provides aggregated data based on user specifications, facilitating insightful analysis.

Summary of Benefits

| Benefit | Description |
| --- | --- |
| Streamlined Data Management | Reduces effort in data aggregation tasks |
| Actionable Insights | Provides clarity for strategic decision-making |
| Enhanced Predictive Analytics | Supports advanced forecasting and trend analysis |
| Stakeholder-Centric Approach | Improves understanding of consumer behavior |

Forecast Allocation

Overview

The Forecast Allocation routine expands on forecast outputs by allowing users to approximate sales at a granular level. By using historical datasets alongside forecasts, businesses can allocate predicted sales to individual products or stores.

Use Cases

  1. Products Within Stores: When forecasting overall sales across multiple stores, this routine helps estimate sales for individual products, even those not included in the original forecast. It requires setting dimension columns to match historical data and forecasting targets.
  2. Stores Within Regions by Month: For forecasts predicting sales across regions, the routine can provide detailed forecasts for individual stores within those regions, accounting for monthly sales variations.
  3. Large Scale Forecasting: When stakeholders need forecasts for a large number of targets, the Forecast Allocation routine can help scale down the forecast to manageable levels, allocating values based on historical averages.
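
One way to picture the allocation in the first use case is proportional scaling against historical shares. Here is a minimal pandas sketch of that idea, with invented data; the routine's exact allocation method may differ:

```python
import pandas as pd

# Illustrative inputs: a region-level forecast and store-level history.
forecast = pd.DataFrame({
    "Region": ["East", "West"],
    "ForecastSales": [2500.0, 3000.0],
})
history = pd.DataFrame({
    "Region": ["East", "East", "West", "West"],
    "Store": ["S1", "S2", "S3", "S4"],
    "Sales": [1200.0, 800.0, 1500.0, 500.0],
})

# Each store's historical share of its region's sales ...
history["Share"] = history["Sales"] / history.groupby("Region")["Sales"].transform("sum")

# ... scales the region-level forecast down to store level.
allocated = history.merge(forecast, on="Region")
allocated["AllocatedForecast"] = allocated["Share"] * allocated["ForecastSales"]
print(allocated[["Region", "Store", "AllocatedForecast"]])
```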

Routine Method Overview

Input Requirements

Users must provide historical data, define allocation and dimension columns, and specify date and value columns.

Output

The routine generates an allocation dataset reflecting the applied forecast.


Frequency Resampler

Overview

The Frequency Resampler is designed for time series data, allowing users to change the periodicity of their datasets. This can involve both upward aggregations (e.g., daily to weekly) and downward allocations (e.g., monthly to daily). The routine supports various summarization methods, such as sum and average, enabling efficient exploration of data trends.
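
For intuition, both directions can be sketched with pandas resampling. The data below is illustrative, and the even split used for the downward direction is just one possible allocation:

```python
import pandas as pd

# Illustrative daily series.
daily = pd.DataFrame(
    {"Sales": range(1, 15)},
    index=pd.date_range("2025-01-01", periods=14, freq="D"),
)

# Upward aggregation: daily -> weekly, summarized by sum (mean also works).
weekly = daily.resample("W").sum()

# Downward allocation: weekly -> daily; forward-filling each weekly total
# and splitting it evenly across seven days is one simple approach.
back_to_daily = weekly.resample("D").ffill().div(7)
```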

Key Features

  • Flexibility in Periodicity: Users can quickly resample data to different frequencies, facilitating various modeling scenarios.
  • Aggregation Methods: Users can choose from multiple aggregation techniques to best fit their data analysis needs.

Use Cases

  • Data Exploration: For businesses like Company A, the Resampler allows exploration of historical sales data at different granularities (daily, weekly, monthly) to optimize forecasting accuracy.
  • Anomaly Detection: Companies, such as Company B, can aggregate high-frequency IoT data into hourly or daily summaries to enhance anomaly detection capabilities.
  • Preprocessing for SensibleAI Forecast: Consultants can resample data before loading it into SensibleAI Forecast to ensure the accuracy of predictions at the desired granularity.

Routine Method Overview

The Resample routine requires various inputs:

  • Connection Type: TabularConnection, SQLTabularConnection, etc.
  • Frequency Specifications: Source and destination frequencies (e.g., daily to monthly).
  • Key Columns: Columns used as keys for the resampling process.

Output

The routine generates a resampled dataset that can be used for further analysis.


Kalman Filter V2

Overview

The Kalman Filter V2 excels at cleansing time series data by predicting and correcting estimates based on noisy measurements. It updates predictions iteratively, filtering out noise and revealing underlying trends.
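
To make the predict-and-correct cycle concrete, here is a minimal one-dimensional Kalman filter sketch, using a random-walk state model and hypothetical noise settings; it is not the routine's actual implementation:

```python
import numpy as np

def kalman_1d(measurements, q=1e-3, r=0.5):
    """Minimal predict/correct loop: q = process noise, r = measurement noise."""
    x, p = measurements[0], 1.0          # initial state estimate and variance
    filtered = []
    for z in measurements:
        p = p + q                        # predict: uncertainty grows
        k = p / (p + r)                  # Kalman gain: trust in the new measurement
        x = x + k * (z - x)              # correct: blend prediction with measurement
        p = (1 - k) * p                  # updated uncertainty
        filtered.append(x)
    return np.array(filtered)

rng = np.random.default_rng(0)
noisy = np.sin(np.linspace(0, 6, 60)) + rng.normal(0, 0.3, 60)
smooth = kalman_1d(noisy)                # noise damped, underlying trend revealed
```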

Key Features

  • Noise Reduction: The filter balances predicted states against new measurements, enhancing the accuracy of time series data.
  • Dynamic Updates: It adapts continuously as new data arrives, making it ideal for dynamic environments like finance.

Use Cases

  • Handling Missing Data: The Kalman Filter is instrumental for businesses experiencing data gaps due to system outages or maintenance, ensuring continuity in data analysis.
  • Dealing with Anomalies: During events like the COVID-19 pandemic, the filter can identify and remove outliers from datasets, improving forecasting models.
  • Cleansing Time Series Data: It effectively corrects point-based anomalies, ensuring data integrity and reliability for predictive modeling.

Routine Methodology

The Kalman Filter V2 requires:

  • Configuration Method: Automatic or manual optimization of hyperparameters.
  • Connection Type: Similar to the Resample routine, it uses various connection types.
  • Dimension Columns: Specifies the columns used for filtering and cleansing.

Output

The routine provides cleansed data, including original and filtered values.


Model Forecast Stage

Overview

The Model Forecast Stage is designed to transform traditional forecasting tables from Sensible ML into a format suitable for ingestion into Forecast Value Add (FVA) dashboards. This routine simplifies the selection of top-performing models for each business target prediction, allowing users to focus on the most reliable forecasts.

Key Use Cases

  • Cascading Stage Best ML Models:

    • Scenario: A user updates their predictions and wants to filter for the best-ranked model per target.
    • Process: The user specifies a hierarchy for model selection: Best ML, Best Intelligent, Best, and Best Baseline. The routine trims forecast ranges to match actuals and avoids overlapping forecasts.
    • Outcome: A refined table comparing SensibleAI Forecast predictions against stakeholder benchmarks.
  • Backtest Model Forecast:

    • Scenario: A consultant experiments with various project configurations and needs to evaluate their performance.
    • Process: The routine filters Backtest Model Forecast (BMF) tables, selecting top models based on specified criteria.
    • Outcome: Multiple FVA tables that feed into a Forecast Snapshot dashboard for direct comparison.
  • Implementation Comparisons:

    • Scenario: Consultants must provide clear comparisons between forecasts generated by SensibleAI Forecast and stakeholder forecasts.
    • Outcome: A streamlined process for selecting models that enhances clarity and insight during engagements.
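
The cascading selection described above can be pictured as a preference-ordered pick per target. Below is a rough pandas sketch with invented rows, loosely following the example output schema shown later in this section:

```python
import pandas as pd

# Invented forecast rows; column names follow the example output schema below.
df = pd.DataFrame({
    "TargetName": ["T1", "T1", "T2", "T2"],
    "Model":      ["Best ML", "Best Baseline", "Best Intelligent", "Best"],
    "Value":      [105.0, 98.0, 210.0, 190.0],
})

# Cascading hierarchy: earlier entries are preferred when present.
hierarchy = ["Best ML", "Best Intelligent", "Best", "Best Baseline"]
df["Preference"] = df["Model"].map({m: i for i, m in enumerate(hierarchy)})

# Keep the single most-preferred model per target.
staged = (df.sort_values("Preference")
            .groupby("TargetName", as_index=False)
            .first()
            .drop(columns="Preference"))
print(staged)
```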

Input and Output Specifications

| Input Component | Description |
| --- | --- |
| Source Connection | Connection details for accessing the source data (TabularConnection) |
| Configure Convert Types | Select hierarchical transformations for the model forecast table |
| Overlapped Forecasts Handling | Options for managing overlapping forecasts: Use Latest, Use Oldest, No Merge |
| Forecast Bounds Handling | Options for trimming forecast values relative to actual values |
| Actuals Handling | Options for managing actuals from the DMF table: Remove, Copy per Version |

Output

A staged data table with hierarchical selections of top-ranking models, ready for FVA analysis.

Example Output Schema

| Column Name | Data Type | Is Nullable |
| --- | --- | --- |
| Model | String | False |
| TargetName | String | False |
| Value | Float64 | False |
| Date | DateTime | False |
| ModelRank | Int64 | False |
| PredictionCallID | Object | False |
| ... | ... | ... |


Numeric Data Fill

Overview

The Numeric Data Fill routine addresses the challenge of null values in datasets, ensuring that analysis and machine learning models are based on complete data. This routine offers various strategies to fill missing values, thus enhancing data integrity.

Key Features

  • Filling Strategies:

    • Options include filling with zero, mean, median, mode, min, max, custom values, forward fill, and backward fill.
    • Forward and backward fills leverage the last known values for matching dimensions, adding contextual relevance.
  • Use Cases:

    • Scenario: A dataset has records but contains null values that could skew analysis.
    • Implementation: Users can choose an appropriate fill strategy based on the nature of the data.
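
For reference, the listed strategies map onto standard pandas fill operations; the series and grouping column below are hypothetical:

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, np.nan, np.nan, 14.0, np.nan])

s.fillna(0)           # zero fill
s.fillna(s.mean())    # mean fill (median/mode/min/max/custom are analogous)
s.ffill()             # forward fill: carry the last known value forward
s.bfill()             # backward fill: pull the next known value backward

# Per-dimension forward fill, so values only propagate within matching dimensions.
df = pd.DataFrame({"Store": ["A", "A", "B", "B"],
                   "Sales": [5.0, np.nan, 7.0, np.nan]})
df["Sales"] = df.groupby("Store")["Sales"].ffill()
```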

Input and Output Specifications

| Input Component | Description |
| --- | --- |
| Source Data Definition | Connection details and specifications for the source data (TimeSeriesTable) |
| Dimension Columns | Columns used as dimensions for filling |
| Date Column | Column representing the date |
| Value Column | Column containing the values to fill |
| Data Fill Definition | Specifies fill strategies for columns |

Output

A data table where missing values have been filled according to the specified strategies.


Prediction Simulator

Overview

The Prediction Simulator allows users to manage and execute multiple data jobs on any project that has passed the data load step in SensibleAI Forecast. It replaces the traditional SIM solution, providing a streamlined process for running jobs such as pipeline, deploy, prediction, model rebuild, and project copy. Importantly, users can upload all necessary source data, and the simulator handles updates based on user-defined dates.

Key Features

  • Automated Job Management: Users can schedule jobs to run in a specific order, reducing the risk of projects sitting idle.
  • User-Friendly Scheduling: Allows running multiple jobs overnight without needing to monitor them actively.

Use Cases

  • Busy Consultants: Ideal for consultants juggling multiple projects across various environments, enabling efficient job management.
  • Overnight Processing: Users can execute a series of jobs overnight, ensuring all tasks complete by morning.
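
Conceptually, the simulator behaves like an ordered job queue. The sketch below is purely illustrative Python; it does not use the actual Studio job API:

```python
import time

# Hypothetical job list; the real routine accepts these task types as inputs.
jobs = ["pipeline", "deploy", "prediction", "model rebuild", "project copy"]

def run_job(name: str) -> None:
    """Placeholder for submitting a job and waiting for it to complete."""
    print(f"running {name} ...")
    time.sleep(1)  # stand-in for polling job status

for job in jobs:           # strictly sequential: each job starts only after
    run_job(job)           # the previous one finishes, so nothing sits idle
print("all jobs complete")
```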

Routine Methods

| Method | Description | Memory Capacity |
| --- | --- | --- |
| Simulator | Simulates and runs a specified list of tasks in order | 2.0 GB |
| Constructor | Initializes the prediction simulator routine | 0 GB |


Principal Component Analysis

Overview

PCA is a statistical technique used for dimensionality reduction, data compression, and feature extraction. It identifies the principal components that capture the most variance in the data, simplifying complex datasets while retaining essential information.

Key Features

  • Dimensionality Reduction: Reduces complexity by transforming datasets into principal components.
  • Enhanced Visualization: Makes it easier to analyze and visualize high-dimensional data.

Use Cases

  • Anomaly Detection: Identifies unusual patterns in transaction data, aiding in fraud detection.
  • Forecasting: Simplifies forecasting models by identifying significant components from various features.
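
A comparable dimensionality reduction can be sketched with scikit-learn's PCA; the random data and the 95% variance threshold below are illustrative choices:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Illustrative data: 200 observations across 10 features, two correlated.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X[:, 1] = X[:, 0] + rng.normal(scale=0.1, size=200)  # inject correlation

# Scale first (PCA is variance-based), then keep enough components
# to explain 95% of the variance.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)
print(X_reduced.shape, pca.explained_variance_ratio_)
```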

Routine Methods

| Method | Description | Memory Capacity |
| --- | --- | --- |
| Run PCA | Preprocesses and runs PCA on the input dataset | 2.0 GB |


Replace Special Characters

Overview

This routine focuses on cleansing datasets by identifying and replacing special characters based on a defined schema. It allows users to target specific columns, ensuring data consistency and validity.

Key Features

  • Customizable Cleansing: Users can define multiple find-and-replace operations for various columns.
  • Improved Data Quality: Ensures data is clean and ready for analysis, reducing errors in subsequent processing.

Use Cases

  • Data Standardization: Helps standardize entity identifiers in datasets for accurate forecasting and analysis.
  • Error Prevention: Cleanses unrecognized characters that could cause errors during data ingestion.
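
The find-and-replace schema resembles applying regex substitutions per targeted column. A minimal pandas sketch with hypothetical rules:

```python
import pandas as pd

# Hypothetical identifiers containing characters that break ingestion.
df = pd.DataFrame({"EntityID": ["Store#1", "Store–2", "Store 3"]})

# One find-and-replace rule per pattern, applied to a targeted column.
rules = [(r"[#–]", "-"), (r"\s+", "_")]
for pattern, replacement in rules:
    df["EntityID"] = df["EntityID"].str.replace(pattern, replacement, regex=True)
print(df)  # Store-1, Store-2, Store_3
```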

Routine Methods

| Method | Description | Memory Capacity |
| --- | --- | --- |
| Cleanse Data | Finds and replaces special characters based on user input | 2.0 GB |


Target Flagging Analysis

Overview

In data analytics, understanding metrics is crucial for making informed decisions. The Target Flagging Analysis (Stateless) routine is designed to calculate various metrics based on both source data and model forecast data. This routine allows users to identify key performance indicators, assess forecast accuracy, and flag potential issues within target dimensions. Below, we explore its functionality, use cases, and routine methods.

The Target Flagging Analysis routine can generate a variety of metrics, enabling users to evaluate data quality and forecast accuracy. Here’s a breakdown of its key components:

Key Metrics Generated

| Metric | Source Data | Model Forecast Data |
| --- | --- | --- |
| Actuals Summation | ✔️ | |
| Target Start/End Date | ✔️ | ✔️ |
| Collection Lag Days/Periods | ✔️ | |
| Start Up Lag Days/Periods | ✔️ | |
| IsForecastable | ✔️ | |
| Local Density | ✔️ | |
| Global Density | ✔️ | |
| Mean Absolute Error (MAE) | | ✔️ |
| Root Mean Squared Error (RMSE) | | ✔️ |
| Bias Error | | ✔️ |
| Growth Rate | ✔️ | |
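
The forecast-accuracy metrics in the table follow standard definitions; here is a quick numpy illustration with made-up actuals and forecasts (the routine's exact formulas may differ, for example in how bias is signed):

```python
import numpy as np

actuals = np.array([100.0, 120.0, 90.0, 110.0])
forecast = np.array([95.0, 130.0, 100.0, 105.0])

errors = forecast - actuals
mae = np.mean(np.abs(errors))           # Mean Absolute Error
rmse = np.sqrt(np.mean(errors ** 2))    # Root Mean Squared Error
bias = np.mean(errors)                  # Bias Error: positive = over-forecasting
```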

Visual Outputs

The routine also generates plots comparing actual values against MAE% and Score%. These visualizations aid in identifying areas needing attention.

Use Cases

The routine supports multiple use cases, each tailored to specific user needs. Here’s a summary of each:

  1. Generate All Metrics Without Flags

This routine creates metrics tables for source and model forecast data, helping users understand the overall data landscape without flagging any issues.

Output:

  • Metrics tables for both source data and forecast data
  • MAE vs Actuals and Score vs Actuals plots
  2. Generate Source Metrics Without Flags

This is ideal for data analysts interested in insights from source time series data alone.

Output:

  • Source data metrics table
  • Excludes plots and flagging
  3. Generate Forecast Metrics Without Flags

This routine focuses on the forecast data, providing insights into its accuracy and quality.

Output:

  • Forecast metrics table
  • MAE vs Actuals and Score vs Actuals plots
  4. All Metrics Analysis

For implementation consultants, this routine assists in evaluating forecasts across all targets, helping to pinpoint inaccuracies caused by limited data points.

Output:

  • Combined metrics table for source and forecast data

Detailed Routine Methods

The routine consists of three primary methods, each serving a unique purpose. Below is a table summarizing their functionalities:

| Routine Method | Description | Target Flagging | HTML Output |
| --- | --- | --- | --- |
| Source Metrics Analysis | Generates source metrics without flagging | No | Interactive report |
| Forecast Metrics Analysis | Generates forecast metrics without flagging | No | Interactive report |
| All Metrics Analysis | Combines source and forecast metrics without flagging | No | Comprehensive report |

Input Requirements

Each method has specific input requirements, such as data connection types and dimension specifications. Common inputs include:

  • Source Data Definition: Must be a TimeSeriesTableDefinition
  • Connection: Can be SQLTabularConnection, FileTabularConnection, etc.
  • Dimension Columns, Date Column, Value Column: Specify the relevant columns

Output Formats

The outputs can be generated in various formats, including HTML reports and Parquet files, ensuring flexibility for users in analyzing their data.


Time Series Data Analysis

In the realm of data analytics, time series analysis plays a crucial role, particularly for businesses that track data across various targets. This article explores the Time Series Data Analysis routine, designed to enhance understanding and streamline insights from time series datasets.

Use Cases

  1. Implementation Insights for Retail

As an implementation consultant, you may work with a retail organization that tracks daily metrics across multiple targets. Identifying the underlying trends can be challenging, especially if this analysis is new ground for both you and the stakeholder. The Time Series Data Analysis routine lets you quickly generate insights and share findings, deepening everyone’s understanding before you move on to the next phases of implementation.

  2. Quality Assurance in SensibleAI Forecast Projects

Quality checks are critical in validating dataset integrity. A comprehensive time series analysis report can significantly reduce implementation timelines. By generating target-level statistics and visuals, the routine not only expedites the process of deriving actionable insights but also promotes transparent communication with the stakeholder regarding data quality.

  3. Exploratory Data Analysis

This routine serves a diverse audience—data scientists, analysts, and business professionals alike—helping them extract meaningful insights from time series data. The comprehensive report it generates includes visualizations and interpretations, facilitating a deeper understanding of the dataset.

Routine Methods

Overview of Routine Methods

The Time Series Data Analysis routine offers two primary methods: Generic Analysis and Advanced Analysis. Each generates an HTML report rich in statistics and visualizations but caters to different needs.

| Routine Method | Description | Key Features |
| --- | --- | --- |
| Generic Analysis | Basic interactive report using YData Profiling | High-level dataset summaries; alerts on stationarity, seasonality, distributions |
| Advanced Analysis | Comprehensive custom report | Filterable metrics summary, detailed visualizations, target-level plots |

Generic Analysis

The Generic Analysis method employs the open-source YData Profiling library to generate a report that includes:

  • Alerts about data characteristics like stationarity and seasonality
  • A correlation matrix (optional) to help visualize relationships between dimensions

Required Inputs:

  • Source Data Definition
  • Connection to source data
  • Dimension, date, and value columns
  • Title for the report

Output:

Generic Time Series Report in HTML format, providing an overview of the dataset.
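
Since this method is built on the open-source YData Profiling library, a comparable standalone report can be produced directly; the file names and column name below are placeholders:

```python
import pandas as pd
from ydata_profiling import ProfileReport

df = pd.read_parquet("sales.parquet")  # illustrative source file

# tsmode=True enables time-series-aware checks (stationarity, seasonality);
# sortby orders records by the date column before profiling.
report = ProfileReport(df, tsmode=True, sortby="Date",
                       title="Generic Time Series Report")
report.to_file("generic_report.html")
```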

Advanced Analysis

The Advanced Analysis method goes further by creating a more tailored report that includes:

  • A filterable summary table with key metrics
  • Time series decomposition plots
  • Auto-correlation and partial auto-correlation plots

Required Inputs:

Similar to the Generic Analysis, but also includes options for target-level plots

Output:

An Advanced Time Series Report in HTML format, rich with detailed analysis.
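
The decomposition and (partial) auto-correlation plots listed above are standard time series diagnostics; here is a sketch using statsmodels on an invented monthly series:

```python
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Invented monthly series with a trend and a December spike.
idx = pd.date_range("2020-01-01", periods=48, freq="MS")
y = pd.Series(range(48), index=idx, dtype=float) + 10.0 * (idx.month == 12)

seasonal_decompose(y, model="additive", period=12).plot()
plot_acf(y, lags=12)    # auto-correlation
plot_pacf(y, lags=12)   # partial auto-correlation
plt.show()
```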

Detailed Comparison of Routine Methods

To better illustrate the differences between the two routine methods, here’s a summary table highlighting their features:

| Feature | Generic Analysis | Advanced Analysis |
| --- | --- | --- |
| Utilizes YData Profiling | Yes | No |
| Customization | Limited | Extensive |
| Summary Statistics | Yes | Yes |
| Time Series Decomposition Plots | No | Yes |
| Auto-Correlation Plots | Yes | Yes |
| Warning Flags for Metrics | Yes | Yes |
| Filterable Summary Table | No | Yes |
| Correlation Matrix | Optional | Not available |


Comparative Analysis of SensibleAI Studio Routines

The following table summarizes key features, input types, and memory capacities for each routine, providing a quick reference for users:

| Routine | Purpose | Input Types | Key Outputs | Memory Capacity |
| --- | --- | --- | --- | --- |
| Aggregate Data | Data consolidation and summarization | Various data sources | Unified data views | 2.0 GB |
| Forecast Allocation | Distributes forecasted values | Tabular data | Allocated forecast data | 2.0 GB |
| Frequency Resampler | Resamples time series data | TimeSeriesTable | Resampled data | 2.0 GB |
| Kalman Filter V2 | Refines forecasts with filtering | Time series data | Smoothed forecasts | 2.0 GB |
| Model Forecast Stage | Prepares data for FVA analysis | Tabular connection | Staged forecast data | 3.0 GB |
| Numeric Data Fill | Fills missing values | TimeSeriesTableDefinition | Complete datasets | 2.0 GB |
| Prediction Simulator | Automates job execution | Project data | Scheduled job reports | 2.0 GB |
| Principal Component Analysis | Reduces dimensionality | Data matrices | Principal components and visualizations | 2.0 GB |
| Replace Special Characters | Cleanses data of special characters | Tabular data | Cleansed datasets | 2.0 GB |
| Target Flagging Analysis | Evaluates performance metrics | TimeSeriesTable | Metrics reports with visual outputs | 2.0 GB |
| Time Series Data Analysis | Analyzes time series data | TimeSeriesTable | Detailed analysis reports | 2.0 GB |
