Resample
Versions
v1.0.0
Basic Information
Class Name: Resample
Title: Frequency Resampler
Version: 1.0.0
Author: Patrick Kroppe
Organization: OneStream
Creation Date: 2024-02-27
Default Routine Memory Capacity: 2 GB
Tags
Data Transformation, Data Resampling, Time Series, Interpretability, Data Preprocessing
Description
Short Description
Frequency resampling for time series data.
Long Description
Resampling is a method for changing the periodicity of a time series dataset. This can involve both upward aggregations (Daily to Weekly, Weekly to Monthly, Daily to Monthly, etc.) and downward allocations (Monthly to Daily, Weekly to Daily, etc.). When running these aggregations, a user can also select the summarization/aggregation method (sum, average, etc.). Quick and easy changes in periodicity enable a user to efficiently explore data trends and run modeling at different levels, which can lead to different outcomes for forecast accuracy, insights, and modeling speed.
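The routine's internals are not shown here, but the core idea can be sketched with pandas; the 90-day series below and the "MS" (month-start) frequency are illustrative assumptions, not part of the routine's API:

```python
import pandas as pd

# Hypothetical daily series: 90 days of unit sales for one target.
daily = pd.DataFrame(
    {"Value": range(90)},
    index=pd.date_range("2024-01-01", periods=90, freq="D"),
)

# Upward aggregation: daily -> monthly, choosing the summarization method.
monthly_sum = daily.resample("MS")["Value"].sum()    # one row per month
monthly_mean = daily.resample("MS")["Value"].mean()  # same periods, different method

print(monthly_sum)
print(monthly_mean)
```

The choice of aggregation method matters: a sum preserves totals (useful for sales volumes), while a mean preserves typical levels (useful for rates or sensor readings).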
Use Cases
1. Resample for Data Exploration
Customer A is tackling a new sales forecasting use case to support its financial planning processes. The historical sales data is collected and stored at a daily granularity, but Customer A wants its final forecasts at a monthly view. Because SensibleAI Forecast can forecast at the daily level (with post-forecast aggregations), the weekly level (with post-forecast aggregations), or the monthly level, Customer A's AI Services Administrator wants to explore the trends and data forecastability at each granularity to identify the one most likely to generate the highest accuracy. To do this exploration, the AI Services Admin leverages the Resample routine to quickly run the required aggregations. This enables a comprehensive comparison of forecast accuracy across different time frames, supporting an informed decision on the optimal forecasting approach. By analyzing the variance and patterns within the aggregated data, the AI Services Administrator can discern the granularity that best captures the underlying sales trends and predict future sales more accurately. This tailored approach not only enhances the precision of the sales forecasts but also allows Customer A to adjust its financial planning and resource allocation more effectively. Incorporating external variables, such as seasonal trends, holiday impacts, and economic indicators, into the forecasting model at the chosen granularity further refines the accuracy of the predictions. The flexibility of SensibleAI Forecast, coupled with the Resample routine, lets Customer A adapt its forecasting methodology as new data becomes available or as its business environment evolves, keeping its financial planning agile while optimizing operations and maximizing profitability.
Moreover, the insights gained from this granular data analysis contribute to a deeper understanding of market dynamics and customer behavior, enabling strategic decisions that drive sustainable growth. Through this meticulous and adaptive approach to sales forecasting, Customer A sets a standard for leveraging advanced analytics in financial planning, paving the way for continuous improvement and competitive advantage.
2. Resample for Anomaly Detection
Company B operates a large network of IoT devices that generate vast amounts of sensor data every minute, crucial for monitoring environmental conditions and equipment health. However, the sheer volume of data at such a fine granularity poses challenges for timely anomaly detection and analysis. To enhance the efficiency and responsiveness of its monitoring systems, Company B's Data Science team decides to apply anomaly detection algorithms at varying time granularities. Using the Resample routine, the team aggregates the minute-level data into hourly and daily summaries. This not only significantly reduces the computational load but also helps identify broader trends and anomalies that may not be apparent at the minute level, improving the overall monitoring and maintenance strategies. The approach allows for more efficient data management and a clearer understanding of when and where anomalies occur, leading to faster and more accurate decision-making. It also enables predictive maintenance, potentially saving costs and preventing equipment failure. The strategic use of data aggregation and anomaly detection at different granularities thus significantly enhances Company B's operational efficiency and reliability.
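Company B's workflow can be sketched in pandas. The routine itself only performs the resampling step; the injected spike, the global z-score detector, and all values below are hypothetical:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical minute-level sensor readings over two days.
idx = pd.date_range("2024-03-01", periods=2 * 24 * 60, freq="min")
readings = pd.Series(rng.normal(20.0, 0.5, len(idx)), index=idx)
readings.iloc[1500:1560] += 8.0  # inject a one-hour spike

# Downsample minute-level data to hourly means before detection.
hourly = readings.resample("h").mean()

# Simple global z-score flag; a production pipeline might use
# rolling statistics or a dedicated anomaly detection model.
z = (hourly - hourly.mean()) / hourly.std()
anomalies = hourly[z.abs() > 3]
print(anomalies)
```

At the minute level the spike is diluted across thousands of points; after aggregating to hourly means, the affected hour stands out clearly, which is the efficiency gain described above.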
Routine Methods
1. Resample (Method)
- Method: resample
- Type: Method
- Memory Capacity: 2.0 GB
- Allow In-Memory Execution: No
- Read Only: Yes
- Method Limits: This method performs well with large data volumes. It has been tested on a daily dataset with 30K targets, 20M rows, and 6 columns, resampling to both monthly and quarterly frequencies; those runs used a 40 GB memory capacity and completed in roughly 5 minutes. Performance with less granular source data (monthly or quarterly) is generally better because there is typically less data to process. Given sufficient memory, the method supports data volumes substantially larger than 20M rows and 6 columns; the main bottleneck is loading the original data into memory from the source location.
- Outputs Dynamic Artifacts: No
- Short Description: Resample the provided input data.
- Detailed Description: Run a resample routine against the provided input data to change the frequency.
-
Inputs:
Note: All inputs may be subject to additional validation constraints at runtime.
- Source Connection (required): The connection information for the source data.
  - Name: data_connection
  - Type: Must be an instance of Tabular Connection
  - Nested Model: Tabular Connection
    - Connection (required): The connection type to use to access the source data.
      - Name: tabular_connection
      - Type: Must be one of the following
        - SQL Server Connection
          - Database Resource (required): The name of the database resource to connect to.
            - Name: database_resource
            - Type: str
          - Database Name: The name of the database to connect to.
            - Name: database_name
            - Note: If you don't see the database name that you are looking for in this list, it is recommended that you first move the data to be used into a database that is available within this list.
            - Type: str
          - Table Name: The name of the table to use.
            - Name: table_name
            - Type: str
        - MetaFileSystem Connection
          - Connection Key (required): The MetaFileSystem connection key.
            - Name: connection_key
            - Type: MetaFileSystemConnectionKey
          - File Path: The full file path to the file to ingest.
            - Name: file_path
            - Type: str
        - Partitioned MetaFileSystem Connection
          - Connection Key (required): The MetaFileSystem connection key.
            - Name: connection_key
            - Type: MetaFileSystemConnectionKey
          - File Type: The type of files to read from the directory.
            - Name: file_type
            - Type: FileExtensions_
          - Directory Path: The full directory path containing partitioned tabular files.
            - Name: directory_path
            - Type: str
- Source Frequency (required): The frequency of the source data.
  - Name: source_frequency
  - Type: ResampleFrequencies_
- Destination Frequency: The expected output frequency of the resampled data.
  - Name: destination_frequency
  - Type: ResampleFrequencies_
- Key Columns: The columns to group by when resampling the data.
  - Name: key_columns
  - Type: list[str]
- Date Column: The column containing the dates.
  - Name: date_column
  - Detail: The frequency of the dates for a given key column grouping should match the source frequency.
  - Type: str
- Aggregation and Value Column Setup: The column to perform aggregation on and the type of aggregation to use. The same value column can be aggregated using multiple functions, creating additional columns (e.g. Value_sum, Value_mean).
  - Name: aggregation_value_pair
  - Type: list[AggtypeValuePair]
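These inputs map naturally onto a grouped resample. The sketch below shows roughly equivalent pandas logic for key_columns=["Store"], date_column="Date", and a Value column aggregated by both sum and mean, producing the suffixed columns (Value_sum, Value_mean) described above. The column names and month-start frequency are hypothetical, and the routine's actual implementation may differ:

```python
import pandas as pd

# Hypothetical daily input: one key column, a date column, and a value column.
df = pd.DataFrame(
    {
        "Store": ["A"] * 60 + ["B"] * 60,
        "Date": list(pd.date_range("2024-01-01", periods=60, freq="D")) * 2,
        "Value": [1.0] * 60 + [2.0] * 60,
    }
)

# key_columns group the data, date_column drives the resample,
# and each aggregation/value pair becomes a suffixed output column.
resampled = (
    df.set_index("Date")
    .groupby("Store")["Value"]
    .resample("MS")              # daily -> monthly (month start)
    .agg(["sum", "mean"])
    .add_prefix("Value_")        # yields Value_sum, Value_mean
    .reset_index()
)
print(resampled)
```

Each (Store, month) pair gets one output row, so 60 daily rows per store collapse to two monthly rows per store.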
Artifacts:
- Resampled Data: The data with newly resampled and aggregated value columns.
  - Qualified Key Annotation: resampled_data
  - Aggregate Artifact: False
  - In-Memory Json Accessible: False
  - File Annotations: artifacts_/@resampled_data/data_/data_<int>.parquet (a partitioned set of parquet files where each file will have no more than 1,000,000 rows)
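Assuming a simple sequential split (the routine's actual chunking strategy is not documented here), the file layout implied by the annotation can be sketched as:

```python
import math

MAX_ROWS_PER_FILE = 1_000_000  # per the artifact's file annotation


def partition_plan(total_rows: int) -> list[tuple[str, int]]:
    """Sketch of how output rows could map onto the partitioned parquet files.

    File names follow the annotated data_<int>.parquet pattern; the
    sequential chunking itself is an assumption for illustration.
    """
    n_files = max(1, math.ceil(total_rows / MAX_ROWS_PER_FILE))
    plan = []
    for i in range(n_files):
        start = i * MAX_ROWS_PER_FILE
        rows = min(MAX_ROWS_PER_FILE, total_rows - start)
        plan.append((f"data_{i}.parquet", rows))
    return plan


print(partition_plan(2_500_000))
```

A 2.5M-row result would therefore span three files: two full 1,000,000-row files and one 500,000-row remainder.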
Interface Definitions
No interface definitions found for this routine
Developer Docs
Routine Typename: Resample
| Method Name | Artifact Keys |
|---|---|
| resample | resampled_data |