TimeSeriesDataAnalysisRoutine
Versions
v1.0.0
Basic Information
Class Name: TimeSeriesDataAnalysisRoutine
Title: Time Series Data Analysis
Version: 1.0.0
Author: Ben DeGrieck, Evan Rasmussen
Organization: OneStream
Creation Date: 2024-08-01
Default Routine Memory Capacity: 2 GB
Tags
Time Series, Statistics, Information Retrieval, Data Analysis, Exploratory Analysis, Data Visualization, Report Generation
Description
Short Description
Perform an analysis of a time series dataset.
Long Description
This routine is used to get important insights into a time series dataset. There are two different methods that can be run, both of which generate HTML reports that contain a vast array of statistics and visualizations. The generic analysis method makes use of the open-source YData Profiling library to generate the report, while the advanced analysis method generates a custom-built report that more directly caters to SensibleAI Forecast users.
Use Cases
1. SensibleAI Forecast Data Analysis
Implementation consultants run an extensive set of quality checks to validate a dataset’s integrity. Quickly generating a comprehensive time series data analysis report, with target-level statistics and visuals, can considerably shorten the implementation timeline. The report serves a dual purpose: it speeds up the path to actionable insights, and it fosters transparent communication with the client, reducing the need for prolonged exchanges about data quality concerns. The metrics collected for every target in the provided dataset include the percentages of missing and zero values, the target’s date range relative to the global date range, the significance percentage (how much of the total value an individual target makes up), and the target’s density. The report also includes several plots, among them auto-correlation plots, partial auto-correlation plots, and decomposed time series plots, which can help identify patterns in the data. To help protect data integrity, the routine flags any duplicate intersections and special characters in the dataset. Finally, warning flags are raised for any target whose statistics fall outside the ranges or thresholds that are considered acceptable internally.
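The target-level metrics described above can be approximated with a few lines of plain Python. This is a minimal sketch for intuition only, not the routine's actual implementation: the function name `target_summary`, the `(target, date, value)` row layout, the use of `None` for missing values, and the exact formulas (for example, significance as a target's share of the summed value) are all assumptions.

```python
from collections import defaultdict


def target_summary(rows):
    """Summarize rows of (target, date, value); value is None when missing.

    Returns (summary, duplicates), where summary maps each target to its
    missing %, zero %, and significance %, and duplicates lists every
    (target, date) intersection that appears more than once.
    """
    values = defaultdict(list)
    seen, duplicates = set(), set()
    for target, date, value in rows:
        key = (target, date)
        if key in seen:
            duplicates.add(key)
        seen.add(key)
        values[target].append(value)

    # Assumed definition: significance is the target's share of the total value.
    grand_total = sum(v for vs in values.values() for v in vs if v is not None)

    summary = {}
    for target, vs in values.items():
        present = [v for v in vs if v is not None]
        summary[target] = {
            "missing_pct": 100.0 * (len(vs) - len(present)) / len(vs),
            "zero_pct": 100.0 * sum(1 for v in present if v == 0) / len(vs),
            "significance_pct": (
                100.0 * sum(present) / grand_total if grand_total else 0.0
            ),
        }
    return summary, sorted(duplicates)
```

Calling `target_summary` on a small extract of the source table gives a quick preview of which targets are sparse, mostly zero, or duplicated before a full report run.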
2. Time Series Exploratory Data Analysis
This routine caters to a wide audience, from data scientists to analysts to business professionals seeking meaningful insights from time series data. It generates a comprehensive report with an array of visualizations, each accompanied by a description and guidance on how to interpret it. The report may include a summary table highlighting the percentages of missing and zero values, the date ranges, the density, and the significance of every target in the dataset. It also includes several time series decomposition plots, box-and-whisker plots, and auto-correlation plots, which help uncover hidden patterns and trends in the data. The routine flags any targets whose statistics fall outside acceptable ranges. The report lets users quickly identify key statistics and share them with other stakeholders as an easily digestible document, so that users quickly gain a thorough understanding of the dataset they provide.
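For readers who want to see what an auto-correlation plot is built from, the sketch below computes the standard sample autocorrelation of a series at each lag. It is an illustration only: the function name `autocorrelation` is hypothetical, and the source does not state how the report computes its plots.

```python
def autocorrelation(series, max_lag):
    """Return the sample autocorrelation of `series` for lags 0..max_lag.

    Uses the common estimator that divides each lagged covariance sum by
    the lag-0 sum of squared deviations, so the value at lag 0 is 1.0.
    """
    n = len(series)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the series length")
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / denom
        for lag in range(max_lag + 1)
    ]
```

A series that repeats every four periods shows a strong positive value at lag 4 and a negative value at lag 2, which is the kind of seasonal pattern these plots are meant to surface.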
Routine Methods
1. Advanced Analysis (Method)
- Method: advanced_analysis
- Type: Method
- Memory Capacity: 2.0 GB
- Allow In-Memory Execution: No
- Read Only: Yes
- Method Limits: When the Include Target Level Plots input parameter is set to true, this method does not work with datasets containing more than 1,000 targets. In such cases, the routine run will likely time out after 5 hours. The method is also likely to time out when the Include Target Level Plots input parameter is set to false and the dataset has 5K targets (400K rows) or more.
- Outputs Dynamic Artifacts: No
- Short Description: Generate a report to help the user better understand the dataset.
- Detailed Description: This method generates a custom, interactive web page containing a comprehensive time series analysis. The report includes a filterable summary table of important metrics calculated for each target, several time series decomposition plots for each target, auto-correlation and partial auto-correlation plots, and warnings for targets whose calculated metrics fall outside acceptable thresholds.
- Inputs:
  - Required Input
    - Source Data Definition: The source data definition.
      - Name: source_data_definition
      - Tooltip:
        - Validation Constraints:
          - This input may be subject to other validation constraints at runtime.
      - Type: Must be an instance of Time Series Source Data
      - Nested Model: Time Series Source Data
        - Required Input
          - Connection: The connection to the source data.
            - Name: data_connection
            - Tooltip:
              - Validation Constraints:
                - This input may be subject to other validation constraints at runtime.
            - Type: Must be an instance of Tabular Connection
            - Nested Model: Tabular Connection
              - Required Input
                - Connection: The connection type to use to access the source data.
                  - Name: tabular_connection
                  - Tooltip:
                    - Validation Constraints:
                      - This input may be subject to other validation constraints at runtime.
                  - Type: Must be one of the following
                    - SQL Server Connection
                      - Required Input
                        - Database Resource: The name of the database resource to connect to.
                          - Name: database_resource
                          - Tooltip:
                            - Validation Constraints:
                              - This input may be subject to other validation constraints at runtime.
                          - Type: str
                        - Database Name: The name of the database to connect to.
                          - Name: database_name
                          - Tooltip:
                            - Detail:
                              - Note: If the database you are looking for does not appear in this list, first move the data into a database that is available in this list.
                            - Validation Constraints:
                              - This input may be subject to other validation constraints at runtime.
                          - Type: str
                        - Table Name: The name of the table to use.
                          - Name: table_name
                          - Tooltip:
                            - Validation Constraints:
                              - This input may be subject to other validation constraints at runtime.
                          - Type: str
                    - MetaFileSystem Connection
                      - Required Input
                        - Connection Key: The MetaFileSystem connection key.
                          - Name: connection_key
                          - Tooltip:
                            - Validation Constraints:
                              - This input may be subject to other validation constraints at runtime.
                          - Type: MetaFileSystemConnectionKey
                        - File Path: The full file path to the file to ingest.
                          - Name: file_path
                          - Tooltip:
                            - Validation Constraints:
                              - This input may be subject to other validation constraints at runtime.
                          - Type: str
                    - Partitioned MetaFileSystem Connection
                      - Required Input
                        - Connection Key: The MetaFileSystem connection key.
                          - Name: connection_key
                          - Tooltip:
                            - Validation Constraints:
                              - This input may be subject to other validation constraints at runtime.
                          - Type: MetaFileSystemConnectionKey
                        - File Type: The type of files to read from the directory.
                          - Name: file_type
                          - Tooltip:
                            - Validation Constraints:
                              - This input may be subject to other validation constraints at runtime.
                          - Type: FileExtensions_
                        - Directory Path: The full directory path containing partitioned tabular files.
                          - Name: directory_path
                          - Tooltip:
                            - Validation Constraints:
                              - This input may be subject to other validation constraints at runtime.
                          - Type: str
          - Dimension Columns: The columns to use as dimensions.
            - Name: dimension_columns
            - Tooltip:
              - Validation Constraints:
                - The input must have a minimum length of 1.
                - This input may be subject to other validation constraints at runtime.
            - Type: list[str]
          - Date Column: The column to use as the date.
            - Name: date_column
            - Tooltip:
              - Detail:
                - The date column must be in a DateTime-readable format.
              - Validation Constraints:
                - This input may be subject to other validation constraints at runtime.
            - Type: str
          - Value Column: The column to use as the value.
            - Name: value_column
            - Tooltip:
              - Detail:
                - The value column must be a numeric (int, float, double, decimal, etc.) column.
              - Validation Constraints:
                - This input may be subject to other validation constraints at runtime.
            - Type: str
    - Include Target Level Plots: Whether to include target level plots in the final report.
      - Name: include_target_level_plots
      - Long Description: These plots include Seasonal-Trend Decomposition and box-and-whisker plots showing aggregated values at various frequencies for each target. When this is set to False, the report still includes these plots, but aggregated for the dataset as a whole. Note that enabling this option may significantly increase the size of the final report, depending on the number of targets in the dataset.
      - Tooltip:
        - Detail:
          - Setting this to True may significantly increase the size of the final report.
        - Validation Constraints:
          - This input may be subject to other validation constraints at runtime.
      - Type: bool
- Artifacts:
  - Advanced Time Series Report: A comprehensive report of the time series provided. This report includes summary metrics and visuals for all targets contained in the dataset as well as possible warning messages.
    - Qualified Key Annotation: advanced_report
    - Aggregate Artifact: False
    - In-Memory Json Accessible: False
    - File Annotations:
      - artifacts_/@advanced_report/data_/html_content.html: The HTML content.
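The nesting of the advanced_analysis inputs and the method limits can be easier to follow as code. The sketch below assembles the documented input names into a nested dictionary for the SQL Server Connection option and checks the documented target limits before a run. It is an assumption-laden illustration: the helper names `build_advanced_analysis_inputs` and `likely_to_time_out` are hypothetical, the dictionary layout is only one plausible way to represent the nested models, and how the platform actually serializes or submits these inputs is not described in this page.

```python
from typing import Any

# Thresholds quoted from the Method Limits text for advanced_analysis.
MAX_TARGETS_WITH_TARGET_PLOTS = 1_000
TIMEOUT_TARGETS_WITHOUT_TARGET_PLOTS = 5_000


def build_advanced_analysis_inputs(
    database_resource: str,
    database_name: str,
    table_name: str,
    dimension_columns: list[str],
    date_column: str,
    value_column: str,
    include_target_level_plots: bool,
) -> dict[str, Any]:
    """Assemble the documented input names into a nested dict.

    The layout is hypothetical; only the field names come from the docs.
    """
    if len(dimension_columns) < 1:
        raise ValueError("dimension_columns must have a minimum length of 1")
    return {
        "source_data_definition": {
            "data_connection": {
                # SQL Server Connection option of tabular_connection.
                "tabular_connection": {
                    "database_resource": database_resource,
                    "database_name": database_name,
                    "table_name": table_name,
                },
            },
            "dimension_columns": list(dimension_columns),
            "date_column": date_column,
            "value_column": value_column,
        },
        "include_target_level_plots": include_target_level_plots,
    }


def likely_to_time_out(num_targets: int, include_target_level_plots: bool) -> bool:
    """Apply the documented target limits for advanced_analysis.

    The docs also mention 400K rows alongside the 5K-target figure; this
    sketch keys on the target count only.
    """
    if include_target_level_plots:
        return num_targets > MAX_TARGETS_WITH_TARGET_PLOTS
    return num_targets >= TIMEOUT_TARGETS_WITHOUT_TARGET_PLOTS
```

Running `likely_to_time_out` on the target count of a dataset before submitting a run is a cheap way to decide whether to disable Include Target Level Plots or to split the dataset first.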
2. Generic Analysis (Method)
- Method: generic_analysis
- Type: Method
- Memory Capacity: 2.0 GB
- Allow In-Memory Execution: No
- Read Only: Yes
- Method Limits: This method works best with datasets containing 600K rows or fewer. The number of targets is not necessarily a factor with this method. For example, datasets with 5K, 8K, and 15K targets, each with 600K rows of data, all completed successfully in about 5 minutes. With the same numbers of targets but 700K rows, however, each run failed with a memory error while creating the HTML report.
- Outputs Dynamic Artifacts: No
- Short Description: Generate a report to help the user better understand the dataset.
- Detailed Description: This method provides an interactive web page containing a wide range of statistics and visualizations, including high-level summaries of the dataset as a whole. The report also raises alerts about the data's stationarity, seasonality, distributions, and more. This method uses the YData Profiling Python library (https://docs.profiling.ydata.ai/latest/) to generate the report.
- Inputs:
  - Required Input
    - Source Data Definition: The source data definition.
      - Name: source_data_definition
      - Tooltip:
        - Validation Constraints:
          - This input may be subject to other validation constraints at runtime.
      - Type: Must be an instance of Time Series Source Data
      - Nested Model: Time Series Source Data
        - Required Input
          - Connection: The connection to the source data.
            - Name: data_connection
            - Tooltip:
              - Validation Constraints:
                - This input may be subject to other validation constraints at runtime.
            - Type: Must be an instance of Tabular Connection
            - Nested Model: Tabular Connection
              - Required Input
                - Connection: The connection type to use to access the source data.
                  - Name: tabular_connection
                  - Tooltip:
                    - Validation Constraints:
                      - This input may be subject to other validation constraints at runtime.
                  - Type: Must be one of the following
                    - SQL Server Connection
                      - Required Input
                        - Database Resource: The name of the database resource to connect to.
                          - Name: database_resource
                          - Tooltip:
                            - Validation Constraints:
                              - This input may be subject to other validation constraints at runtime.
                          - Type: str
                        - Database Name: The name of the database to connect to.
                          - Name: database_name
                          - Tooltip:
                            - Detail:
                              - Note: If the database you are looking for does not appear in this list, first move the data into a database that is available in this list.
                            - Validation Constraints:
                              - This input may be subject to other validation constraints at runtime.
                          - Type: str
                        - Table Name: The name of the table to use.
                          - Name: table_name
                          - Tooltip:
                            - Validation Constraints:
                              - This input may be subject to other validation constraints at runtime.
                          - Type: str
                    - MetaFileSystem Connection
                      - Required Input
                        - Connection Key: The MetaFileSystem connection key.
                          - Name: connection_key
                          - Tooltip:
                            - Validation Constraints:
                              - This input may be subject to other validation constraints at runtime.
                          - Type: MetaFileSystemConnectionKey
                        - File Path: The full file path to the file to ingest.
                          - Name: file_path
                          - Tooltip:
                            - Validation Constraints:
                              - This input may be subject to other validation constraints at runtime.
                          - Type: str
                    - Partitioned MetaFileSystem Connection
                      - Required Input
                        - Connection Key: The MetaFileSystem connection key.
                          - Name: connection_key
                          - Tooltip:
                            - Validation Constraints:
                              - This input may be subject to other validation constraints at runtime.
                          - Type: MetaFileSystemConnectionKey
                        - File Type: The type of files to read from the directory.
                          - Name: file_type
                          - Tooltip:
                            - Validation Constraints:
                              - This input may be subject to other validation constraints at runtime.
                          - Type: FileExtensions_
                        - Directory Path: The full directory path containing partitioned tabular files.
                          - Name: directory_path
                          - Tooltip:
                            - Validation Constraints:
                              - This input may be subject to other validation constraints at runtime.
                          - Type: str
          - Dimension Columns: The columns to use as dimensions.
            - Name: dimension_columns
            - Tooltip:
              - Validation Constraints:
                - The input must have a minimum length of 1.
                - This input may be subject to other validation constraints at runtime.
            - Type: list[str]
          - Date Column: The column to use as the date.
            - Name: date_column
            - Tooltip:
              - Detail:
                - The date column must be in a DateTime-readable format.
              - Validation Constraints:
                - This input may be subject to other validation constraints at runtime.
            - Type: str
          - Value Column: The column to use as the value.
            - Name: value_column
            - Tooltip:
              - Detail:
                - The value column must be a numeric (int, float, double, decimal, etc.) column.
              - Validation Constraints:
                - This input may be subject to other validation constraints at runtime.
            - Type: str
    - Title Name: The title to be included in the final report.
      - Name: title
      - Tooltip:
        - Detail:
          - This title will be displayed along the top of the HTML artifact file.
        - Validation Constraints:
          - The input must have a maximum length of 100.
          - This input may be subject to other validation constraints at runtime.
      - Type: str
    - Correlation Table Enablement: Whether to include a correlation matrix of the dimensions in the final report.
      - Name: include_correlation_table
      - Tooltip:
        - Detail:
          - Setting this to True may have performance implications as this is an expensive computation.
        - Validation Constraints:
          - This input may be subject to other validation constraints at runtime.
      - Type: bool
- Artifacts:
  - Generic Time Series Report: A comprehensive report of the time series provided. The report is generated by the third-party Python library ydata_profiling.
    - Qualified Key Annotation: general_report
    - Aggregate Artifact: False
    - In-Memory Json Accessible: False
    - File Annotations:
      - artifacts_/@general_report/data_/html_content.html: The HTML content.
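The documented constraints for generic_analysis (a title of at most 100 characters, best results at 600K rows or fewer, and the cost of the correlation table) can be checked before a run. The following is a minimal sketch under stated assumptions: the function name `check_generic_analysis_request` and the split into errors and warnings are hypothetical, and the routine may apply further validation at runtime, as the input descriptions note.

```python
# Values quoted from the generic_analysis documentation above.
MAX_TITLE_LENGTH = 100
RECOMMENDED_MAX_ROWS = 600_000


def check_generic_analysis_request(
    title: str, num_rows: int, include_correlation_table: bool
) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for a planned generic_analysis run.

    Errors mirror documented validation constraints; warnings mirror the
    documented method limits and performance notes.
    """
    errors: list[str] = []
    warnings: list[str] = []
    if len(title) > MAX_TITLE_LENGTH:
        errors.append(f"title exceeds the maximum length of {MAX_TITLE_LENGTH}")
    if num_rows > RECOMMENDED_MAX_ROWS:
        warnings.append(
            "more than 600K rows: report creation may fail with a memory error"
        )
    if include_correlation_table:
        warnings.append("the correlation table is an expensive computation")
    return errors, warnings
```

Treating the row count as a warning rather than an error reflects the wording of the method limits, which describe observed behavior at 600K and 700K rows rather than a hard cutoff.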
Interface Definitions
No interface definitions found for this routine
Developer Docs
Routine Typename: TimeSeriesDataAnalysisRoutine
| Method Name | Artifact Keys |
|---|---|
| advanced_analysis | advanced_report |
| generic_analysis | general_report |
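The table above, combined with the file annotations listed for each method, is enough to derive where each report's HTML file sits within a run's artifacts. The helper below is a small sketch of that lookup; the function name `report_path` is hypothetical, and the returned path is relative to a storage root that this page does not specify.

```python
# Method name -> artifact key, copied from the Developer Docs table.
ARTIFACT_KEYS = {
    "advanced_analysis": "advanced_report",
    "generic_analysis": "general_report",
}


def report_path(method_name: str) -> str:
    """Return the documented relative path of a method's HTML report."""
    try:
        key = ARTIFACT_KEYS[method_name]
    except KeyError:
        raise ValueError(f"unknown method: {method_name!r}") from None
    # Pattern taken from the File Annotations entries of both methods.
    return f"artifacts_/@{key}/data_/html_content.html"
```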