Resample

Versions

v1.0.0

Basic Information

Class Name: Resample

Title: Frequency Resampler

Version: 1.0.0

Author: Patrick Kroppe

Organization: OneStream

Creation Date: 2024-02-27

Default Routine Memory Capacity: 2 GB

Tags

Data Transformation, Data Resampling, Time Series, Interpretability, Data Preprocessing

Description

Short Description

Frequency resampling for time series data.

Long Description

Resampling is a method for changing the periodicity of a time series dataset. This can involve both upward aggregations (Daily to Weekly, Weekly to Monthly, Daily to Monthly, etc.) and downward allocations (Monthly to Daily, Weekly to Daily, etc.). When running these aggregations, a user can also select the summarization/aggregation method (sum, average, etc.). Quick and easy changes in periodicity enable a user to efficiently explore data trends and run modeling at different levels, which can lead to different outcomes for forecast accuracy, insights, and modeling speed.
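The two directions described above can be sketched with pandas. This is a minimal, hypothetical illustration of the general technique, not the routine's implementation; the column name and the even-spreading allocation rule are assumptions for the example.

```python
import pandas as pd

# Hypothetical daily series; the column name is illustrative only.
idx = pd.date_range("2024-01-01", periods=90, freq="D")
daily = pd.DataFrame({"Value": range(90)}, index=idx)

# Upward aggregation: daily -> monthly, with a user-chosen summary method.
monthly_sum = daily["Value"].resample("MS").sum()
monthly_mean = daily["Value"].resample("MS").mean()

# Downward allocation: monthly -> daily, here spreading each month's total
# evenly across its days (forward-fill stops at the last monthly label in
# this sketch).
per_day = monthly_sum / monthly_sum.index.days_in_month
daily_alloc = per_day.resample("D").ffill()
```

Swapping `.sum()` for `.mean()`, `.max()`, etc. is what the routine's choice of aggregation method corresponds to conceptually.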

Use Cases

1. Resample for Data Exploration

Customer A is tackling a new sales forecasting use case to support its financial planning processes. The historical sales data is collected and stored at a daily granularity, but Customer A wants to receive its final forecasts at a monthly view. Because SensibleAI Forecast can forecast at the daily level (with post-forecast aggregations), the weekly level (with post-forecast aggregations), or the monthly level, Customer A's AI Services Administrator wants to explore the trends and forecastability of the data at different frequencies to identify the one most likely to yield the highest accuracy. To do this exploration, the AI Services Administrator uses the Resample routine to quickly run the required aggregations. This process enables a comprehensive comparison of forecast accuracy across different time frames, supporting an informed decision on the optimal forecasting approach. By analyzing the variance and patterns within the aggregated data, the AI Services Administrator can discern the granularity that best captures the underlying sales trends and predict future sales more accurately. This tailored approach not only improves the precision of the sales forecasts but also allows Customer A to adjust its financial planning and resource allocation more effectively. Incorporating external variables, such as seasonal trends, holiday impacts, and economic indicators, into the forecasting model at the chosen granularity further refines the accuracy of the predictions. The flexibility of SensibleAI Forecast, coupled with the Resample routine, lets Customer A adapt its forecasting methodology as new data becomes available or as its business environment evolves. This dynamic capability keeps Customer A agile in its financial planning processes, optimizing operations and maximizing profitability.
Moreover, the insights gained from this granular data analysis contribute to a deeper understanding of market dynamics and customer behavior, enabling strategic decisions that drive sustainable growth. Through this deliberate and adaptive approach to sales forecasting, Customer A sets a standard for leveraging advanced analytics in financial planning, paving the way for continuous improvement and competitive advantage.

2. Resample for Anomaly Detection

Company B operates a large network of IoT devices that generate a vast amount of sensor data every minute, crucial for monitoring environmental conditions and equipment health. However, the sheer volume of data at such a fine granularity poses challenges for timely anomaly detection and analysis. To improve the efficiency and responsiveness of its monitoring systems, Company B's Data Science team decides to apply anomaly detection algorithms at varying time granularities. Using the Resample routine, the team aggregates the minute-level data into hourly and daily summaries. This not only significantly reduces the computational load but also surfaces broader trends and anomalies that may not be apparent at the minute level, improving the overall monitoring and maintenance strategies. The approach allows for more efficient data management and a clearer understanding of when and where anomalies occur, leading to faster and more accurate decision-making. It also enables predictive maintenance, potentially saving costs and preventing equipment failure. The strategic use of data aggregation and anomaly detection at different granularities thus significantly enhances Company B's operational efficiency and reliability.
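The minute-to-hourly roll-up in this scenario can be sketched with pandas. Everything here is hypothetical (device names, readings, and the simple z-score check are assumptions for illustration); it shows the general pattern of aggregating a fine-grained feed before screening for anomalies, not Company B's actual pipeline.

```python
import numpy as np
import pandas as pd

# Hypothetical minute-level sensor feed for one device.
idx = pd.date_range("2024-01-01", periods=60 * 24, freq="min")
rng = np.random.default_rng(0)
sensors = pd.DataFrame({
    "timestamp": idx,
    "device_id": "sensor-01",
    "reading": rng.normal(20.0, 0.5, size=len(idx)),
})

# Roll the 1,440 minute-level rows per device up to 24 hourly summaries.
hourly = (sensors.set_index("timestamp")
                 .groupby("device_id")["reading"]
                 .resample("h")
                 .agg(["mean", "max"]))

# A simple z-score style check on the reduced series flags hours whose
# mean deviates sharply from the overall level.
z = (hourly["mean"] - hourly["mean"].mean()) / hourly["mean"].std()
anomalous_hours = hourly[z.abs() > 3.0]
```

Running detection on 24 hourly rows instead of 1,440 minute rows per device per day is the computational-load reduction the use case describes.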

Routine Methods

1. Resample (Method)
  • Method: resample
    • Type: Method

    • Memory Capacity: 2.0 GB

    • Allow In-Memory Execution: No

    • Read Only: Yes

    • Method Limits: This method is quite performant with large data volumes. It has been tested on a daily dataset with 30K targets, 20M rows, and 6 columns, resampling to both monthly and quarterly frequencies; those runs had a memory capacity of 40 GB and completed in just 5 minutes. Performance with less granular data (monthly or quarterly) is generally better because there is usually less data. The method supports data volumes substantially larger than 20M rows and 6 columns given a sufficient memory allocation. The main bottleneck is loading the original data into memory from the source location.

    • Outputs Dynamic Artifacts: No

    • Short Description:

      • Resample the provided input data.
    • Detailed Description:

      • Run a resample routine against the provided input data to change the frequency.
    • Inputs:

      • Required Input
        • Source Connection: The connection information for the source data.
          • Name: data_connection
          • Tooltip:
            • Validation Constraints:
              • This input may be subject to other validation constraints at runtime.
          • Type: Must be an instance of Tabular Connection
          • Nested Model: Tabular Connection
            • Required Input
              • Connection: The connection type to use to access the source data.
                • Name: tabular_connection
                • Tooltip:
                  • Validation Constraints:
                    • This input may be subject to other validation constraints at runtime.
                • Type: Must be one of the following
                  • SQL Server Connection
                    • Required Input
                      • Database Resource: The name of the database resource to connect to.
                        • Name: database_resource
                        • Tooltip:
                          • Validation Constraints:
                            • This input may be subject to other validation constraints at runtime.
                        • Type: str
                      • Database Name: The name of the database to connect to.
                        • Name: database_name
                        • Tooltip:
                          • Detail:
                            • Note: If you don’t see the database name that you are looking for in this list, it is recommended that you first move the data to be used within a database that is available within this list.
                          • Validation Constraints:
                            • This input may be subject to other validation constraints at runtime.
                        • Type: str
                      • Table Name: The name of the table to use.
                        • Name: table_name
                        • Tooltip:
                          • Validation Constraints:
                            • This input may be subject to other validation constraints at runtime.
                        • Type: str
                  • MetaFileSystem Connection
                    • Required Input
                      • Connection Key: The MetaFileSystem connection key.
                        • Name: connection_key
                        • Tooltip:
                          • Validation Constraints:
                            • This input may be subject to other validation constraints at runtime.
                        • Type: MetaFileSystemConnectionKey
                      • File Path: The full file path to the file to ingest.
                        • Name: file_path
                        • Tooltip:
                          • Validation Constraints:
                            • This input may be subject to other validation constraints at runtime.
                        • Type: str
                  • Partitioned MetaFileSystem Connection
                    • Required Input
                      • Connection Key: The MetaFileSystem connection key.
                        • Name: connection_key
                        • Tooltip:
                          • Validation Constraints:
                            • This input may be subject to other validation constraints at runtime.
                        • Type: MetaFileSystemConnectionKey
                      • File Type: The type of files to read from the directory.
                        • Name: file_type
                        • Tooltip:
                          • Validation Constraints:
                            • This input may be subject to other validation constraints at runtime.
                        • Type: FileExtensions_
                      • Directory Path: The full directory path containing partitioned tabular files.
                        • Name: directory_path
                        • Tooltip:
                          • Validation Constraints:
                            • This input may be subject to other validation constraints at runtime.
                        • Type: str
        • Source Frequency: The frequency of the source data.
          • Name: source_frequency
          • Tooltip:
            • Validation Constraints:
              • This input may be subject to other validation constraints at runtime.
          • Type: ResampleFrequencies_
        • Destination Frequency: The expected output frequency of the resampled data.
          • Name: destination_frequency
          • Tooltip:
            • Validation Constraints:
              • This input may be subject to other validation constraints at runtime.
          • Type: ResampleFrequencies_
        • Key Columns: The columns to group by when resampling the data.
          • Name: key_columns
          • Tooltip:
            • Validation Constraints:
              • This input may be subject to other validation constraints at runtime.
          • Type: list[str]
        • Date Column: The column containing the dates.
          • Name: date_column
          • Tooltip:
            • Detail:
              • The frequency of the dates for a given key column grouping should match the source frequency.
            • Validation Constraints:
              • This input may be subject to other validation constraints at runtime.
          • Type: str
        • Aggregation and Value Column Setup: The column to perform aggregation on and the type of aggregation to use. The same value column can be aggregated using multiple functions, creating additional columns (e.g., Value_sum, Value_mean).
          • Name: aggregation_value_pair
          • Tooltip:
            • Validation Constraints:
              • This input may be subject to other validation constraints at runtime.
          • Type: list[AggtypeValuePair]
    • Artifacts:

      • Resampled Data: The data with newly resampled and aggregated value columns.
        • Qualified Key Annotation: resampled_data
        • Aggregate Artifact: False
        • In-Memory Json Accessible: False
        • File Annotations:
          • artifacts_/@resampled_data/data_/data_<int>.parquet
            • A partitioned set of parquet files where each file will have no more than 1000000 rows.
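Conceptually, the method's inputs map onto a grouped resample: group by the key columns, resample on the date column, and apply each configured aggregation to the value column, suffixing the output columns. The sketch below assumes pandas-like semantics and invented data (Region, Date, Value); it illustrates the input contract, not the routine's internal implementation.

```python
import pandas as pd

# Hypothetical source data at daily frequency.
df = pd.DataFrame({
    "Region": ["East"] * 4 + ["West"] * 4,
    "Date": pd.to_datetime(["2024-01-01", "2024-01-02",
                            "2024-02-01", "2024-02-02"] * 2),
    "Value": [10, 20, 30, 40, 1, 2, 3, 4],
})

key_columns = ["Region"]        # key_columns input: group-by keys
date_column = "Date"            # date_column input: source dates
aggregations = ["sum", "mean"]  # one value column, two functions

# Group by the keys, resample daily -> monthly, apply both aggregations.
resampled = (df.set_index(date_column)
               .groupby(key_columns)["Value"]
               .resample("MS")
               .agg(aggregations))

# Suffix the new columns as the input description indicates
# (Value_sum, Value_mean).
resampled.columns = [f"Value_{fn}" for fn in aggregations]
```

Each (key, period) pair in the result corresponds to one row of the Resampled Data artifact, which the routine writes out as partitioned parquet files.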

Interface Definitions

No interface definitions found for this routine

Developer Docs

Routine Typename: Resample

Method Name: resample

Artifact Keys: resampled_data
