RPETargetSubsampler
Versions
v1.0.0
Basic Information
Class Name: RPETargetSubsampler
Title: Target Sub-Sampler
Version: 1.0.0
Author: Christian Reyes-Avina
Organization: OneStream
Creation Date: 2025-03-19
Default Routine Memory Capacity: 2.0 GB
Tags
Sensible AI Forecast, Data Cleansing, Data Preprocessing, Clustering
Description
Short Description
A routine to sub-sample targets from a dataset using various sampling methods.
Long Description
This routine sub-samples targets from a dataset using one of three sampling methods: dynamic time warping, semi-random sampling, or significance breakdown sampling. In dynamic time warping, the routine uses hierarchical clustering to group similar time series and selects the medoid of each cluster. In semi-random sampling, the routine randomly selects targets from the dataset based on user-defined dimensions and values. In significance breakdown sampling, the routine selects targets based on the significance of their aggregated values. The routine supports flexible dimension grouping and dynamic control over how many targets to retain, either via a fixed count or a percentile-based threshold. By leveraging this sub-sampling routine, users can ensure that their models train on the most informative and varied subsets of time series data.
Use Cases
1. Sub-sample targets from a large dataset.
In large-scale time series datasets, analyzing or modeling every individual time series can be computationally expensive and is often unnecessary. This routine enables users to intelligently downsample their dataset by identifying the most representative time series targets using Dynamic Time Warping (DTW), Significance Breakdown Sampling, or Semi-Random Sampling. For example, a dataset with daily sales from 2,000 stores across 12 countries can be reduced to 200 stores that capture the core behavioral patterns across geographies, store types, and other dimensions. This is particularly useful for model prototyping, time series forecasting, or simulation scenarios where training time, cost, or complexity needs to be minimized without compromising data representativeness.
Routine Methods
1. Dynamic Time Warping (Method)
- Method: dynamic_time_warping
- Type: Method
- Memory Capacity: 2.0 GB
- Allow In-Memory Execution: No
- Read Only: No
- Method Limits: Dynamic time warping is computationally expensive, yielding significantly longer runtimes than the other methods in this routine. On a dataset with 8K targets, 1.5M rows, and 6 columns, this method completed in about 45 minutes with 35 GB of memory allocated. However, on a dataset with 15K targets, 7.5M rows, and 6 columns, the run timed out after 5 hours with no updates. The method scales quadratically in the number of targets (roughly n(n-1)/2 pairwise DTW computations: about 32M pairs for 8K targets versus about 112M pairs for 15K targets, roughly 3.5x the work), so cost grows very quickly as targets are added.
- Outputs Dynamic Artifacts: No
- Short Description: Selects the most representative targets using Dynamic Time Warping (DTW) and hierarchical clustering.
- Detailed Description: This method performs sub-selection by analyzing the similarity between time series across different targets. It begins by pivoting the dataset into a time series matrix based on the specified dimensions, filling any missing values using the provided fill method. The resulting time series are then standardized to ensure consistent scale. Next, the method computes a DTW distance matrix representing pairwise similarity between all time series. Hierarchical clustering is applied to this matrix to group similar time series together. The number of clusters can be determined in three ways:
  - If neither the number of clusters nor the percentile is specified, the method uses the elbow method on intra-cluster distances to determine the optimal number of clusters.
  - If the number of clusters is specified, the method generates up to that number of clusters.
  - If a percentile is specified, the number of clusters is derived as a percentage of the total targets (minimum 2).
  Within each cluster, the medoid is selected as the most representative target, defined as the time series with the lowest total DTW distance to the others in its cluster. The method returns:
  - A filtered dataset containing only the selected medoid targets.
  - The full dataset with two new columns: target_is_in_subselection, indicating whether a target is part of the selected subset, and cluster, the cluster assignment for each target.
  - A list of the selected target names.
  The routine also handles cases where the specified dimension(s) generate a pivot table with only one target column.
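For orientation, the pivot → standardize → DTW → cluster → medoid flow can be sketched in a few lines of Python. This is a minimal illustration of the general technique, not the routine's implementation: `pivot` is assumed to be the already-pivoted time series matrix (one column per target), the `dtw_distance` and `select_medoids` helpers are hypothetical names, and the elbow-method and percentile logic for choosing the cluster count is omitted.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i, j] = step + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


def select_medoids(pivot: pd.DataFrame, num_clusters: int) -> list[str]:
    """Pick one representative (medoid) target per cluster of similar series."""
    # Standardize each target's series so clustering reflects shape, not scale.
    z = (pivot - pivot.mean()) / pivot.std(ddof=0)
    series = [z[col].to_numpy() for col in z.columns]
    n = len(series)

    # Pairwise DTW distances: n * (n - 1) / 2 computations -- the quadratic
    # cost called out in the method limits above.
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = dtw_distance(series[i], series[j])

    # Hierarchical clustering on the condensed distance matrix.
    labels = fcluster(linkage(squareform(dist), method="average"),
                      t=num_clusters, criterion="maxclust")

    # Medoid = cluster member with the lowest total DTW distance to the rest.
    medoids = []
    for k in np.unique(labels):
        idx = np.where(labels == k)[0]
        medoids.append(pivot.columns[idx[dist[np.ix_(idx, idx)].sum(axis=1).argmin()]])
    return medoids
```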
- Inputs:
  - Source Data Definition (Required): The source data definition.
    - Name: source_data_definition
    - Type: Must be an instance of Time Series Source Data
    - Nested Model: Time Series Source Data
    - Validation Constraints:
      - This input may be subject to other validation constraints at runtime.
    - Connection (Required): The connection to the source data.
      - Name: data_connection
      - Type: Must be an instance of Tabular Connection
      - Nested Model: Tabular Connection
      - Validation Constraints:
        - This input may be subject to other validation constraints at runtime.
      - Connection (Required): The connection type to use to access the source data.
        - Name: tabular_connection
        - Type: Must be one of the following: SQL Server Connection, MetaFileSystem Connection, Partitioned MetaFileSystem Connection
        - Validation Constraints:
          - This input may be subject to other validation constraints at runtime.
        - SQL Server Connection:
          - Database Resource (Required): The name of the database resource to connect to.
            - Name: database_resource
            - Type: str
            - Validation Constraints:
              - This input may be subject to other validation constraints at runtime.
          - Database Name (Required): The name of the database to connect to.
            - Name: database_name
            - Detail: Note: If you don't see the database name that you are looking for in this list, it is recommended that you first move the data to be used into a database that is available within this list.
            - Type: str
            - Validation Constraints:
              - This input may be subject to other validation constraints at runtime.
          - Table Name (Required): The name of the table to use.
            - Name: table_name
            - Type: str
            - Validation Constraints:
              - This input may be subject to other validation constraints at runtime.
        - MetaFileSystem Connection:
          - Connection Key (Required): The MetaFileSystem connection key.
            - Name: connection_key
            - Type: MetaFileSystemConnectionKey
            - Validation Constraints:
              - This input may be subject to other validation constraints at runtime.
          - File Path (Required): The full file path to the file to ingest.
            - Name: file_path
            - Type: str
            - Validation Constraints:
              - This input may be subject to other validation constraints at runtime.
        - Partitioned MetaFileSystem Connection:
          - Connection Key (Required): The MetaFileSystem connection key.
            - Name: connection_key
            - Type: MetaFileSystemConnectionKey
            - Validation Constraints:
              - This input may be subject to other validation constraints at runtime.
          - File Type (Required): The type of files to read from the directory.
            - Name: file_type
            - Type: FileExtensions_
            - Validation Constraints:
              - This input may be subject to other validation constraints at runtime.
          - Directory Path (Required): The full directory path containing partitioned tabular files.
            - Name: directory_path
            - Type: str
            - Validation Constraints:
              - This input may be subject to other validation constraints at runtime.
      - Dimension Columns (Required): The columns to use as dimensions.
        - Name: dimension_columns
        - Type: list[str]
        - Validation Constraints:
          - The input must have a minimum length of 1.
          - This input may be subject to other validation constraints at runtime.
      - Date Column (Required): The column to use as the date.
        - Name: date_column
        - Detail: The date column must be in a DateTime-readable format.
        - Type: str
        - Validation Constraints:
          - This input may be subject to other validation constraints at runtime.
      - Value Column (Required): The column to use as the value.
        - Name: value_column
        - Detail: The value column must be a numeric (int, float, double, decimal, etc.) column.
        - Type: str
        - Validation Constraints:
          - This input may be subject to other validation constraints at runtime.
    - Dimension Sub-Selections (Required): The dimension column(s) to group the time series.
      - Name: dimension_subselections
      - Type: list[str]
      - Validation Constraints:
        - This input may be subject to other validation constraints at runtime.
    - Fill Method (Required): The method to use to fill in missing data when pivoting time series.
      - Name: fill_method
      - Type: FillMethod_
      - Validation Constraints:
        - This input may be subject to other validation constraints at runtime.
    - Custom Fill Value (Required): The custom value to use when filling missing values in the time series pivot table.
      - Name: custom_fill_value
      - Type: int | float | NoneType
      - Validation Constraints:
        - This input may be subject to other validation constraints at runtime.
  - Number of Clusters (Optional): The number of clusters to generate.
    - Name: num_clusters
    - Type: Optional[int]
    - Validation Constraints:
      - The input must be greater than 0.
      - This input may be subject to other validation constraints at runtime.
  - Percentile (Optional): The percentile of clusters to retain for cluster assignment.
    - Name: percentile
    - Type: Optional[float]
    - Validation Constraints:
      - The input must be greater than 0.
      - The input must be less than or equal to 100.
      - This input may be subject to other validation constraints at runtime.
- Artifacts:
  - List of Target Names: The subselection of targets.
    - Qualified Key Annotation: list_of_target_names
    - Aggregate Artifact: False
    - In-Memory Json Accessible: True
    - File Annotations: artifacts_/@list_of_target_names/data_/list.json (a JSON list object stored in a JSON file)
  - Full Target Dataset: The original dataset with an additional column stating whether the target is included in the subselection sample.
    - Qualified Key Annotation: full_target_dataset
    - Aggregate Artifact: False
    - In-Memory Json Accessible: False
    - File Annotations: artifacts_/@full_target_dataset/data_/data_<int>.parquet (a partitioned set of parquet files, each with no more than 1,000,000 rows)
  - Filtered Target Dataset: The original dataset filtered down to include only the selected targets.
    - Qualified Key Annotation: filtered_target_dataset
    - Aggregate Artifact: False
    - In-Memory Json Accessible: False
    - File Annotations: artifacts_/@filtered_target_dataset/data_/data_<int>.parquet (a partitioned set of parquet files, each with no more than 1,000,000 rows)
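All three methods in this routine emit the same three artifacts, so a single loading sketch covers them. The layout below follows the file annotations above, but the artifact root is an assumption; where artifacts are actually materialized depends on the hosting platform.

```python
import json
from pathlib import Path

import pandas as pd

# Illustrative root only -- where artifacts are materialized depends on the platform.
root = Path("artifacts_")

# list.json holds the selected target names as a plain JSON list.
targets = json.loads((root / "@list_of_target_names" / "data_" / "list.json").read_text())

# The two datasets are partitioned parquet files (<= 1,000,000 rows each);
# pandas reads the whole directory in one call.
full_df = pd.read_parquet(root / "@full_target_dataset" / "data_")
filtered_df = pd.read_parquet(root / "@filtered_target_dataset" / "data_")
```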
2. Semi Random Sample (Method)
- Method: semi_random_sample
- Type: Method
- Memory Capacity: 2.0 GB
- Allow In-Memory Execution: No
- Read Only: No
- Method Limits: This method easily supports substantial volumes of data. It has been tested on a 40K-target dataset containing 29M rows and 6 columns, completing in just 5 minutes. There is minimal difference in compute and runtime between using the target count input parameter and the dimension sub-selections input parameter.
- Outputs Dynamic Artifacts: No
- Short Description: Performs semi-random sampling of targets from a dataset based on user-defined dimensions and values.
- Detailed Description: This method enables sub-sampling of targets either through dimension-based filtering or broad random selection. If the user specifies dimension/value pairs along with a percentage, the method filters the dataset on each pair and randomly samples the specified percentage of targets from the matching group(s). If no dimension filters are provided, the method randomly selects a fixed number of unique targets from the entire dataset. This approach is useful for testing or prototyping with a smaller, representative subset of targets, without relying on clustering or significance metrics.
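A minimal pandas sketch of the two modes described above, assuming a long-format DataFrame with one row per (target, date) observation; the function name, argument shapes, and `(dimension, value, percentage)` triples are illustrative, not the routine's API.

```python
import pandas as pd


def semi_random_sample(
    df: pd.DataFrame,
    target_col: str,
    dims: list[tuple[str, str, float]] | None = None,
    num_targets: int | None = None,
    seed: int = 0,
) -> list[str]:
    if dims:
        # Dimension-based mode: for each (dimension, value, percentage) triple,
        # filter to matching rows and sample that percentage of the group's
        # unique targets.
        selected: set[str] = set()
        for dim, value, pct in dims:
            group = df.loc[df[dim] == value, target_col].drop_duplicates()
            n = max(1, int(len(group) * pct / 100))
            selected.update(group.sample(n=n, random_state=seed))
        return sorted(selected)

    # Broad mode: randomly pick a fixed number of unique targets overall.
    assert num_targets is not None, "provide num_targets when no filters are given"
    targets = df[target_col].drop_duplicates()
    return sorted(targets.sample(n=min(num_targets, len(targets)), random_state=seed))
```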
- Inputs:
  - Source Data Definition (Required): The source data definition.
    - Name: source_data_definition
    - Type: Must be an instance of Time Series Source Data
    - Nested Model: Time Series Source Data
    - Validation Constraints:
      - This input may be subject to other validation constraints at runtime.
    - Connection (Required): The connection to the source data.
      - Name: data_connection
      - Type: Must be an instance of Tabular Connection
      - Nested Model: Tabular Connection
      - Validation Constraints:
        - This input may be subject to other validation constraints at runtime.
      - Connection (Required): The connection type to use to access the source data.
        - Name: tabular_connection
        - Type: Must be one of the following: SQL Server Connection, MetaFileSystem Connection, Partitioned MetaFileSystem Connection
        - Validation Constraints:
          - This input may be subject to other validation constraints at runtime.
        - SQL Server Connection:
          - Database Resource (Required): The name of the database resource to connect to.
            - Name: database_resource
            - Type: str
            - Validation Constraints:
              - This input may be subject to other validation constraints at runtime.
          - Database Name (Required): The name of the database to connect to.
            - Name: database_name
            - Detail: Note: If you don't see the database name that you are looking for in this list, it is recommended that you first move the data to be used into a database that is available within this list.
            - Type: str
            - Validation Constraints:
              - This input may be subject to other validation constraints at runtime.
          - Table Name (Required): The name of the table to use.
            - Name: table_name
            - Type: str
            - Validation Constraints:
              - This input may be subject to other validation constraints at runtime.
        - MetaFileSystem Connection:
          - Connection Key (Required): The MetaFileSystem connection key.
            - Name: connection_key
            - Type: MetaFileSystemConnectionKey
            - Validation Constraints:
              - This input may be subject to other validation constraints at runtime.
          - File Path (Required): The full file path to the file to ingest.
            - Name: file_path
            - Type: str
            - Validation Constraints:
              - This input may be subject to other validation constraints at runtime.
        - Partitioned MetaFileSystem Connection:
          - Connection Key (Required): The MetaFileSystem connection key.
            - Name: connection_key
            - Type: MetaFileSystemConnectionKey
            - Validation Constraints:
              - This input may be subject to other validation constraints at runtime.
          - File Type (Required): The type of files to read from the directory.
            - Name: file_type
            - Type: FileExtensions_
            - Validation Constraints:
              - This input may be subject to other validation constraints at runtime.
          - Directory Path (Required): The full directory path containing partitioned tabular files.
            - Name: directory_path
            - Type: str
            - Validation Constraints:
              - This input may be subject to other validation constraints at runtime.
      - Dimension Columns (Required): The columns to use as dimensions.
        - Name: dimension_columns
        - Type: list[str]
        - Validation Constraints:
          - The input must have a minimum length of 1.
          - This input may be subject to other validation constraints at runtime.
      - Date Column (Required): The column to use as the date.
        - Name: date_column
        - Detail: The date column must be in a DateTime-readable format.
        - Type: str
        - Validation Constraints:
          - This input may be subject to other validation constraints at runtime.
      - Value Column (Required): The column to use as the value.
        - Name: value_column
        - Detail: The value column must be a numeric (int, float, double, decimal, etc.) column.
        - Type: str
        - Validation Constraints:
          - This input may be subject to other validation constraints at runtime.
    - Dimension Sub-Selections (Required): The dimensions to sub-select and randomly sample.
      - Name: dimensions_subselections
      - Type: list[DimensionSubselection]
      - Validation Constraints:
        - This input may be subject to other validation constraints at runtime.
  - Number of Targets (Optional): The number of targets to include in the random sample.
    - Name: num_targets
    - Type: Optional[int]
    - Validation Constraints:
      - The input must be greater than 0.
      - This input may be subject to other validation constraints at runtime.
- Artifacts:
  - List of Target Names: The subselection of targets.
    - Qualified Key Annotation: list_of_target_names
    - Aggregate Artifact: False
    - In-Memory Json Accessible: True
    - File Annotations: artifacts_/@list_of_target_names/data_/list.json (a JSON list object stored in a JSON file)
  - Full Target Dataset: The original dataset with an additional column stating whether the target is included in the subselection sample.
    - Qualified Key Annotation: full_target_dataset
    - Aggregate Artifact: False
    - In-Memory Json Accessible: False
    - File Annotations: artifacts_/@full_target_dataset/data_/data_<int>.parquet (a partitioned set of parquet files, each with no more than 1,000,000 rows)
  - Filtered Target Dataset: The original dataset filtered down to include only the selected targets.
    - Qualified Key Annotation: filtered_target_dataset
    - Aggregate Artifact: False
    - In-Memory Json Accessible: False
    - File Annotations: artifacts_/@filtered_target_dataset/data_/data_<int>.parquet (a partitioned set of parquet files, each with no more than 1,000,000 rows)
3. Significance Breakdown Sample (Method)
- Method: significance_breakdown_sample
- Type: Method
- Memory Capacity: 2.0 GB
- Allow In-Memory Execution: No
- Read Only: No
- Method Limits: This method easily supports substantial volumes of data. It has been tested on a 40K-target dataset containing 29M rows and 6 columns, completing in just 5 minutes. There is minimal difference in compute and runtime between using the global significance sampling input parameter and the dimension-based significance sub-selections input parameter. These limits are very similar to those of the Semi Random Sample method.
- Outputs Dynamic Artifacts: No
- Short Description: Selects targets based on the significance of their aggregated values.
- Detailed Description: This method supports two types of significance-based sub-sampling:
  1. Dimension-Based Significance Subselection: Users provide specific dimension names, corresponding values, and a significance percentage. The method filters the dataset by each (dimension, value) pair, aggregates the value column, and selects the top targets whose cumulative values fall within the specified significance threshold.
  2. Global Significance Sampling: Users provide only a significance percentage. The method aggregates the value column across all targets and selects the top targets whose cumulative values fall within the threshold, regardless of dimension.
  Both approaches aim to retain the most impactful targets based on their contribution to the overall value metric.
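The global variant reduces to a cumulative-share cutoff over per-target aggregates, as the hedged pandas sketch below shows; the dimension-based variant applies the same cutoff after first filtering on each (dimension, value) pair. Function and column names are illustrative, not the routine's API.

```python
import pandas as pd


def global_significance_sample(
    df: pd.DataFrame, target_col: str, value_col: str, significance_pct: float
) -> list[str]:
    # Aggregate the value column per target and rank targets by contribution.
    totals = df.groupby(target_col)[value_col].sum().sort_values(ascending=False)

    # Cumulative share of the overall total, in percent.
    cum_share = totals.cumsum() / totals.sum() * 100

    # Keep each target whose *preceding* cumulative share is still below the
    # threshold, so the retained set covers at least `significance_pct`.
    keep = cum_share.shift(fill_value=0.0) < significance_pct
    return totals.index[keep].tolist()
```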
- Inputs:
  - Source Data Definition (Required): The source data definition.
    - Name: source_data_definition
    - Type: Must be an instance of Time Series Source Data
    - Nested Model: Time Series Source Data
    - Validation Constraints:
      - This input may be subject to other validation constraints at runtime.
    - Connection (Required): The connection to the source data.
      - Name: data_connection
      - Type: Must be an instance of Tabular Connection
      - Nested Model: Tabular Connection
      - Validation Constraints:
        - This input may be subject to other validation constraints at runtime.
      - Connection (Required): The connection type to use to access the source data.
        - Name: tabular_connection
        - Type: Must be one of the following: SQL Server Connection, MetaFileSystem Connection, Partitioned MetaFileSystem Connection
        - Validation Constraints:
          - This input may be subject to other validation constraints at runtime.
        - SQL Server Connection:
          - Database Resource (Required): The name of the database resource to connect to.
            - Name: database_resource
            - Type: str
            - Validation Constraints:
              - This input may be subject to other validation constraints at runtime.
          - Database Name (Required): The name of the database to connect to.
            - Name: database_name
            - Detail: Note: If you don't see the database name that you are looking for in this list, it is recommended that you first move the data to be used into a database that is available within this list.
            - Type: str
            - Validation Constraints:
              - This input may be subject to other validation constraints at runtime.
          - Table Name (Required): The name of the table to use.
            - Name: table_name
            - Type: str
            - Validation Constraints:
              - This input may be subject to other validation constraints at runtime.
        - MetaFileSystem Connection:
          - Connection Key (Required): The MetaFileSystem connection key.
            - Name: connection_key
            - Type: MetaFileSystemConnectionKey
            - Validation Constraints:
              - This input may be subject to other validation constraints at runtime.
          - File Path (Required): The full file path to the file to ingest.
            - Name: file_path
            - Type: str
            - Validation Constraints:
              - This input may be subject to other validation constraints at runtime.
        - Partitioned MetaFileSystem Connection:
          - Connection Key (Required): The MetaFileSystem connection key.
            - Name: connection_key
            - Type: MetaFileSystemConnectionKey
            - Validation Constraints:
              - This input may be subject to other validation constraints at runtime.
          - File Type (Required): The type of files to read from the directory.
            - Name: file_type
            - Type: FileExtensions_
            - Validation Constraints:
              - This input may be subject to other validation constraints at runtime.
          - Directory Path (Required): The full directory path containing partitioned tabular files.
            - Name: directory_path
            - Type: str
            - Validation Constraints:
              - This input may be subject to other validation constraints at runtime.
      - Dimension Columns (Required): The columns to use as dimensions.
        - Name: dimension_columns
        - Type: list[str]
        - Validation Constraints:
          - The input must have a minimum length of 1.
          - This input may be subject to other validation constraints at runtime.
      - Date Column (Required): The column to use as the date.
        - Name: date_column
        - Detail: The date column must be in a DateTime-readable format.
        - Type: str
        - Validation Constraints:
          - This input may be subject to other validation constraints at runtime.
      - Value Column (Required): The column to use as the value.
        - Name: value_column
        - Detail: The value column must be a numeric (int, float, double, decimal, etc.) column.
        - Type: str
        - Validation Constraints:
          - This input may be subject to other validation constraints at runtime.
    - Significance Breakdown (Required): The significance breakdown to use.
      - Name: significance_breakdown
      - Type: Must be one of the following: Broad Significance Sample, Significance Subselection Sample
      - Validation Constraints:
        - This input may be subject to other validation constraints at runtime.
      - Broad Significance Sample:
        - Significance Percentage (Required): The significance percentage of the targets to include in the sub-selection.
          - Name: significance_percentage
          - Type: int | float
          - Validation Constraints:
            - The input must be greater than 0.
            - The input must be less than or equal to 100.
            - This input may be subject to other validation constraints at runtime.
      - Significance Subselection Sample:
        - Significance Dimension Sub-Selections (Required): The dimensions to sub-select on.
          - Name: significance_dimension_subselections
          - Type: list[SignificanceDimensionSubselection]
          - Validation Constraints:
            - This input may be subject to other validation constraints at runtime.
- Artifacts:
  - List of Target Names: The subselection of targets.
    - Qualified Key Annotation: list_of_target_names
    - Aggregate Artifact: False
    - In-Memory Json Accessible: True
    - File Annotations: artifacts_/@list_of_target_names/data_/list.json (a JSON list object stored in a JSON file)
  - Full Target Dataset: The original dataset with an additional column stating whether the target is included in the subselection sample.
    - Qualified Key Annotation: full_target_dataset
    - Aggregate Artifact: False
    - In-Memory Json Accessible: False
    - File Annotations: artifacts_/@full_target_dataset/data_/data_<int>.parquet (a partitioned set of parquet files, each with no more than 1,000,000 rows)
  - Filtered Target Dataset: The original dataset filtered down to include only the selected targets.
    - Qualified Key Annotation: filtered_target_dataset
    - Aggregate Artifact: False
    - In-Memory Json Accessible: False
    - File Annotations: artifacts_/@filtered_target_dataset/data_/data_<int>.parquet (a partitioned set of parquet files, each with no more than 1,000,000 rows)
Interface Definitions
No interface definitions found for this routine.
Developer Docs
Routine Typename: RPETargetSubsampler
| Method Name | Artifact Keys |
|---|---|
| dynamic_time_warping | list_of_target_names, full_target_dataset, filtered_target_dataset |
| semi_random_sample | list_of_target_names, full_target_dataset, filtered_target_dataset |
| significance_breakdown_sample | list_of_target_names, full_target_dataset, filtered_target_dataset |