Xperiflow Tasks
Tasks in Xperiflow are similar to OneStream Data Management Steps: they represent the granular work that must be completed to achieve a larger goal. However, they differ in that Xperiflow Tasks are often created dynamically while the Job is running, meaning there is no pre-defined set of steps prior to runtime.
Core Concepts
Tasks vs Jobs
Understanding the relationship between Tasks and Jobs is fundamental:
- Jobs: High-level containers that represent complete operations. You interact with Jobs directly.
- Tasks: The actual execution units that perform work. Tasks are created automatically by Jobs.
Think of it like a construction project:
- The Job is the complete project (e.g., "build a house")
- The Tasks are the individual steps (foundation, framing, plumbing, electrical) that must be completed
When you submit a Job, the system automatically creates a Task tree that breaks down the work into executable steps.
Task Trees
Tasks are organized in hierarchical trees:
- Root Task: The top-level Task created from a Job. It represents the entire operation.
- Parent Tasks: Tasks that create and coordinate child Tasks.
- Child Tasks: Subtasks that perform specific work under a parent.
- Leaf Tasks: Tasks at the bottom of the tree that perform actual work without creating subtasks.
The tree structure allows complex operations to be broken down into manageable pieces, with each level handling a different aspect of coordination or execution.
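A minimal sketch of how such a tree can be modeled; the class and method names below are invented for illustration and are not the actual Xperiflow API:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaskNode:
    """Illustrative model of a node in a Task tree (not the real Xperiflow class)."""
    name: str
    parent: Optional["TaskNode"] = None
    children: List["TaskNode"] = field(default_factory=list)

    @property
    def is_root(self) -> bool:
        # The root Task is the top-level Task created from the Job.
        return self.parent is None

    @property
    def is_leaf(self) -> bool:
        # Leaf Tasks perform actual work and create no subtasks.
        return not self.children

    @property
    def level(self) -> int:
        # Depth in the tree: 0 for the root, 1 for first-level children, and so on.
        return 0 if self.parent is None else self.parent.level + 1

    def add_child(self, name: str) -> "TaskNode":
        child = TaskNode(name=name, parent=self)
        self.children.append(child)
        return child


root = TaskNode("PipelineOrchestrator")                              # root Task, level 0
load = root.add_child("FeaturePipelineOrchestrator").add_child("DataLoad")
print(load.level, load.is_leaf, root.is_root)                        # 2 True True
```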
Execution Types
Tasks have different execution types that determine how they run:
Sequential Tasks:
- Execute their child Tasks one after another
- Each child must complete before the next begins
- Used when steps depend on previous steps completing
- Example: Load data → Process data → Save results
Parallel Tasks:
- Execute their child Tasks simultaneously
- All children can run at the same time
- Used when steps are independent and can run concurrently
- Example: Train multiple models in parallel
Atomic Tasks:
- Perform actual work without creating subtasks
- Execute on worker processes
- Represent the fundamental units of work
- Example: Train a single model, load a dataset, calculate metrics
Synthetic Tasks:
- Special tasks used for coordination and state management
- Don't perform actual work but help organize the tree
- Used internally by the system
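As a rough illustration (the identifiers are hypothetical, not the platform's own), the four execution types map naturally to an enumeration:

```python
from enum import Enum


class ExecutionType(Enum):
    SEQUENTIAL = "sequential"  # children run one after another, in order
    PARALLEL = "parallel"      # children run at the same time
    ATOMIC = "atomic"          # performs real work on a worker, no subtasks
    SYNTHETIC = "synthetic"    # coordination/state management only, no real work
```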
Dynamic Task Trees
Task trees are dynamic—the number of child Tasks for Sequential and Parallel Tasks is determined at runtime, not when the Job is created. The structure depends on:
Data-Driven Factors:
- Data Size: A task processing data might create one child per data partition or batch
- Number of Targets: A modeling task might create one child per target variable
- Number of Models: A model training task might create children based on how many models need training
- Feature Count: A feature engineering task might create children based on the number of feature groups
Configuration-Driven Factors:
- User Settings: Parameters you provide can determine how work is divided
- System Configuration: Resource limits and parallelization settings affect task creation
- Project State: Current project state (existing models, data versions, etc.) influences task structure
Runtime Discovery:
- Dynamic Analysis: Tasks often analyze data or system state during execution to determine how to proceed
- Conditional Logic: Tasks may create different numbers of children based on conditions discovered at runtime
- Adaptive Behavior: The system may adjust task structure based on available resources or workload
Example Scenarios:
Scenario 1: Model Training
- A modeling task might discover it needs to train 5 models based on the number of target variables in your project
- It creates 5 parallel child tasks, one per model
- If you run the same job on a different project with 10 targets, it creates 10 child tasks instead
Scenario 2: Data Processing
- A data processing task might analyze the input data and determine it needs to process 3 large partitions
- It creates 3 sequential child tasks to process each partition in order
- The number of partitions depends on data size and system configuration, not known until execution
Scenario 3: Feature Engineering
- A feature engineering task might create children based on the number of feature groups that need processing
- The number of groups depends on your feature configuration and data characteristics
- This is discovered when the task examines the project state
Implications:
- Progress Tracking: Initial progress estimates may be inaccurate because the total number of tasks isn't known upfront
- Resource Planning: Memory and resource needs can vary based on the actual task structure created
- Execution Time: Total execution time depends on the runtime-determined structure
- Monitoring: Task trees may look different for similar jobs depending on runtime conditions
This dynamic behavior allows the system to adapt to different scenarios, optimize resource usage, and handle varying data sizes and configurations efficiently.
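As an illustration of Scenario 1 above, a hypothetical modeling task only knows how many children to create once it inspects the project state. The function and argument names below are invented for the sketch:

```python
def plan_training_children(target_variables):
    """Decide at runtime how many training subtasks to create.

    `target_variables` stands in for whatever the task discovers when it
    inspects the project state; the name is invented for this sketch.
    """
    # One parallel child per target: a project with 5 targets yields 5
    # subtasks, one with 10 targets yields 10. The count is unknown until
    # this function runs, which is why upfront progress estimates can drift.
    return [f"HardTrain[{target}]" for target in target_variables]


print(plan_training_children(["revenue", "units_sold"]))
# ['HardTrain[revenue]', 'HardTrain[units_sold]']
```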
Task Properties
Tasks have several important properties:
Hierarchical Properties:
- Level: The depth of the Task in the tree (0 for root, 1 for first-level children, etc.)
- Order: The position of the Task among siblings at the same level
- Task Path: A unique path identifying the Task's location in the tree (e.g., "PipelineOrchestrator/FeaturePipelineOrchestrator/DataLoad")
Execution Properties:
- Execution Type: How the Task executes (Sequential, Parallel, Atomic, Synthetic)
- Memory Allocation: How much memory (in GB) the Task requires
- Priority: The Task's priority level (1-10, where 1 is highest)
- Weighting Factor: How much this Task contributes to parent progress (0.0 to 1.0)
Status Properties:
- Activity Status: Current state of the Task
- Percent Complete: Progress from 0% to 100%
- Detail Name: Description of current activity
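As a rough sketch, these properties can be grouped into a single record; the field names below mirror the list above but are illustrative, not the actual Xperiflow schema:

```python
from dataclasses import dataclass


@dataclass
class TaskProperties:
    # Hierarchical properties
    level: int               # 0 for the root, increasing with depth
    order: int               # position among siblings at the same level
    task_path: str           # e.g. "PipelineOrchestrator/FeaturePipelineOrchestrator/DataLoad"
    # Execution properties
    execution_type: str      # "Sequential", "Parallel", "Atomic", or "Synthetic"
    memory_gb: float         # declared memory requirement in GB
    priority: int            # 1 (highest) through 10 (lowest)
    weighting_factor: float  # contribution to parent progress, 0.0 to 1.0
    # Status properties
    activity_status: str     # current lifecycle state
    percent_complete: float  # 0.0 to 100.0
    detail_name: str         # description of the current activity
```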
Task Lifecycle
Tasks progress through a well-defined lifecycle similar to Jobs.
Lifecycle States
A Task can be in one of several states:
Initial States:
- INITIALIZED: The Task has been created but not yet queued
Waiting States:
- QUEUED: The Task is waiting to be picked up for execution
- WORKERQUEUED: The Task has been sent to a worker queue and is on deck to be picked up by a worker
Active States:
- RUNNING: The Task is currently executing
- RUNNINGSUBTASKS: The Task is coordinating its child Tasks (only applicable for Sequential and Parallel Execution Types)
Paused States:
- PAUSED: Execution has been temporarily paused
- PENDING_PAUSED: A pause request is being processed (the Task is still running)
Completion States:
- COMPLETED: The Task finished successfully
- USERCANCELLED: The Task was cancelled by a user
- SYSCANCELLED: The Task was cancelled by the system
- SYSERROR: The Task failed due to an error
- SYSERROR_RERUNNABLE: The Task failed but can be safely rerun
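As an illustration only (the platform's internal identifiers may differ), the states and their groupings could be modeled as an enumeration:

```python
from enum import Enum


class TaskStatus(Enum):
    # Initial
    INITIALIZED = "initialized"
    # Waiting
    QUEUED = "queued"
    WORKERQUEUED = "workerqueued"
    # Active
    RUNNING = "running"
    RUNNINGSUBTASKS = "runningsubtasks"
    # Paused
    PAUSED = "paused"
    PENDING_PAUSED = "pending_paused"
    # Completion
    COMPLETED = "completed"
    USERCANCELLED = "usercancelled"
    SYSCANCELLED = "syscancelled"
    SYSERROR = "syserror"
    SYSERROR_RERUNNABLE = "syserror_rerunnable"


# Completion states are terminal: the Task will not transition out of them.
TERMINAL_STATES = {
    TaskStatus.COMPLETED, TaskStatus.USERCANCELLED, TaskStatus.SYSCANCELLED,
    TaskStatus.SYSERROR, TaskStatus.SYSERROR_RERUNNABLE,
}
```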
Lifecycle Flow
In general, for any given Task, the flow looks like this:
┌─────────────┐
│ INITIALIZED │  Task is created
└──────┬──────┘
       │
       ▼
┌─────────────┐
│   QUEUED    │  Waiting for execution
└──────┬──────┘
       │
       ▼
┌─────────────┐
│   RUNNING   │  Executing work
└──────┬──────┘
       │
       ├─→ (if Sequential or Parallel Execution Type)
       │   ┌─────────────────┐
       │   │ RUNNINGSUBTASKS │  Children are running
       │   └────────┬────────┘
       │            │
       │◄───────────┘
       │
       ├─→ (on failure)
       │   ┌─────────────┐
       │   │  SYSERROR   │  Failed (may be rerunnable)
       │   └─────────────┘
       ▼
┌─────────────┐
│  COMPLETED  │  Finished successfully
└─────────────┘
Time Tracking
Tasks track several important timestamps:
- Queued Time: When the Task entered the queue
- Start Time: When execution actually began
- End Time: When execution completed
- Last Activity Time: The most recent update to the Task
Task Execution
How Tasks Execute
The execution process depends on the Task's execution type:
Sequential Tasks:
- Task starts executing
- Creates first child Task
- Waits for first child to complete
- Creates second child Task
- Waits for second child to complete
- Continues until all children complete
- Task completes
Parallel Tasks:
- Task starts executing
- Creates all child Tasks simultaneously
- All children execute concurrently
- Waits for all children to complete
- Task completes
Atomic Tasks:
- Task is sent to a worker process
- Worker executes the actual work
- Task completes when work finishes
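The difference between sequential and parallel coordination can be sketched in plain Python; the placeholder work functions below stand in for real child Tasks and are not part of the Xperiflow API:

```python
from concurrent.futures import ThreadPoolExecutor


def run_sequential(child_fns):
    """Run children one at a time; each must finish before the next starts."""
    results = []
    for fn in child_fns:
        results.append(fn())
    return results


def run_parallel(child_fns):
    """Submit all children at once and wait for every one to complete."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(fn) for fn in child_fns]
        return [f.result() for f in futures]


# Placeholder atomic work units standing in for real child Tasks.
children = [lambda i=i: f"model-{i} trained" for i in range(3)]
print(run_sequential(children))
print(run_parallel(children))
```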
Task Coordination
Parent Tasks coordinate their children:
- Sequential Coordination: Ensures children execute in order, waiting for each to complete
- Parallel Coordination: Manages multiple children running simultaneously, tracking their progress
- Progress Aggregation: Combines children's progress to update parent's percent complete
- Error Handling: If a child fails, the error bubbles up recursively through its parent Tasks
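For example, a parent's percent complete can be thought of as the weighted sum of its children's progress. A minimal sketch of that aggregation (not Xperiflow's actual implementation):

```python
def aggregate_progress(children):
    """children: list of (weighting_factor, percent_complete) pairs.

    Each weighting factor is the child's share of the parent's progress;
    for a well-formed tree the factors sum to roughly 1.0.
    """
    return sum(weight * pct for weight, pct in children)


# Three equally weighted children: one done, one halfway, one not started.
print(aggregate_progress([(1 / 3, 100.0), (1 / 3, 50.0), (1 / 3, 0.0)]))  # ≈ 50.0
```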
Memory and Resource Management
Tasks specify their resource requirements:
- Memory Allocation: Each Task declares how much memory (in GB) it needs
- System Validation: The system ensures sufficient resources are available before executing
- Resource Limits: Tasks respect user and system resource limits
- Dynamic Allocation: Some Tasks calculate memory needs based on input data size
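As a toy illustration of dynamic allocation, a Task's estimate might scale with input size plus a fixed idle overhead; every constant and formula below is invented for the example and is not Xperiflow's real sizing rule:

```python
def estimate_memory_gb(row_count, bytes_per_row, idle_overhead_gb=0.5, safety_factor=2.0):
    """Estimate memory for a data-bound Task from its input size (illustrative only)."""
    data_gb = row_count * bytes_per_row / 1e9
    # Fixed idle overhead plus a safety multiple of the raw data footprint.
    return idle_overhead_gb + data_gb * safety_factor


print(round(estimate_memory_gb(5_000_000, 400), 2))  # 4.5 GB for a ~2 GB frame
```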
Task Types
Tasks come in many types, each designed for specific operations:
Orchestration Tasks
Tasks that coordinate other Tasks. These can be of either Sequential or Parallel Execution Type:
- PipelineOrchestrator: Coordinates the complete pipeline execution
- FeaturePipelineOrchestrator: Coordinates feature engineering steps
- ModelingPipelineOrchestrator: Coordinates model training steps
- RoutineExecutionTask: Executes routine workflows
Data Tasks
Tasks that handle data operations:
- DataLoad: Loads data from sources
- DataUpdate: Updates existing data
- DataValidation: Validates data quality
- FrameBuilder: Builds data frames for modeling
Feature Engineering Tasks
Tasks for feature creation and transformation:
- FeatureGeneration: Generates new features
- FeatureTransformation: Transforms features
- FeatureSelection: Selects important features
Modeling Tasks
Tasks for machine learning operations:
- HardTrain: Trains models with full datasets
- SoftTrain: Retrains models with new data
- HyperTune: Performs hyperparameter tuning
- ModelSelection: Selects best models
- Backtest: Tests models on historical data
- ModelDeploy: Deploys models for production
Analysis Tasks
Tasks for analysis and insights:
- FeatureImpact: Analyzes feature importance
- PredictionExplanation: Explains model predictions
- Insights: Generates analytical insights
System Tasks
Tasks for system operations:
- Checkpoint: Creates system checkpoints
- BackgroundCheckpoint: Creates background checkpoints
**Note:** This is not an exhaustive list. The available Task types depend on your Xperiflow installation and the specific operations you're performing as defined by the Xperiflow Job.
Task Properties in Detail
Hierarchical Structure
Level:
- Indicates how deep the Task is in the tree
- Root Tasks have level 0
- Each level down increments the level
- Helps understand Task relationships
Order:
- Position among siblings at the same level
- Sequential Tasks use order to determine execution sequence
- Parallel Tasks may have order -1 to indicate concurrent execution
Task Path:
- Unique identifier showing the Task's location
- Format: "ParentTask/ChildTask/GrandchildTask"
- Example: "PipelineOrchestrator/FeaturePipelineOrchestrator/DataLoad"
- Helps trace execution flow and debug issues
Resource Properties
Memory Allocation:
- Specified in gigabytes (GB)
- System ensures availability before execution
- Can be calculated dynamically based on data size
- Includes idle memory overhead
Priority:
- Range from 1 (highest) to 10 (lowest)
- Inherited from parent Task or Job by default
- Higher priority Tasks execute before lower priority ones
Weighting Factor:
- Determines how much this Task contributes to parent's progress
- Range from 0.0 to 1.0
- Used to calculate accurate progress percentages
- Example: If a parent has 3 children with equal weight, each contributes 0.33
Execution Control
Execution Type:
- Determines how the Task runs
- Set when Task is created
- Cannot be changed for most Task types
- Atomic Tasks can sometimes be converted to other types
Retries:
- Number of times the Task has been retried after failure
- Maximum retries are configured per Task type
- System automatically retries on transient failures
Task Monitoring
Progress Tracking
Tasks provide detailed progress information:
- Percent Complete: Current progress (0% to 100%)
- Detail Name: Description of current activity (e.g., "Load Data", "Train Model")
- Status: Current lifecycle state
- Children Progress: For parent Tasks, aggregated progress from children
Task Details
Tasks record detailed information about their execution:
- Task Details: Timestamped records of what the Task is doing
- Detail Names: Standard names like "Task Started", "Load Data", "Task Completed"
- Detail Info: Additional context about the current operation
Task Tree Visualization
The Task tree can be visualized to understand:
- Overall structure and hierarchy
- Which Tasks are running, completed, or failed
- Progress at each level
- Relationships between Tasks
Task Error Handling
Retry Mechanism
Tasks can automatically retry on failure:
- Transient Failures: Automatically retried
- Retry Limits: Maximum number of retries configured per Task type
- Retry Backoff: System may wait before retrying
- Retry Tracking: Number of retries is tracked and reported
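A minimal sketch of retry-with-backoff for transient failures; the retry limit and backoff schedule here are illustrative, since Xperiflow configures its own per Task type:

```python
import time


def run_with_retries(work_fn, max_retries=3, base_delay_s=1.0):
    """Retry `work_fn` on failure, waiting longer before each new attempt."""
    for attempt in range(max_retries + 1):
        try:
            return work_fn()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # Exponential backoff before the next attempt (1s, 2s, 4s, ...).
            time.sleep(base_delay_s * (2 ** attempt))
```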
Error Propagation
When Tasks fail:
- Child Failure: Parent Task is notified
- Parent Decision: Parent can handle error or propagate upward
- Job Impact: Failure may cause entire Job to fail
- Error Logs: Detailed error information is recorded
Cleanup
When Tasks fail, cleanup operations may include:
- Database Cleanup: Remove partial results
- Resource Release: Free allocated memory and resources
- State Reversion: Revert to previous known good state if possible
Task Pausing and Resumption
Pausing Tasks
Tasks can be paused during execution:
- Atomic Tasks Only: Only atomic Tasks can be paused
- Pause Request: System processes pause requests
- State Preservation: Task state is saved when paused
- Resume Capability: Paused Tasks can be resumed later
Resuming Tasks
Paused Tasks can be resumed:
- State Restoration: Previous state is restored
- Continuation: Execution continues from pause point
- Metadata Preservation: Communication metadata is preserved
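Conceptually, pausing means persisting enough state for the Task to continue later, and resuming means restoring it. A rough sketch under that assumption (the checkpoint format and field names are invented, not the platform's actual mechanism):

```python
import json


def save_checkpoint(path, processed_rows, metadata):
    """Persist progress so a paused Task can resume where it left off."""
    with open(path, "w") as f:
        json.dump({"processed_rows": processed_rows, "metadata": metadata}, f)


def load_checkpoint(path):
    """Restore the saved state; execution continues from `processed_rows`."""
    with open(path) as f:
        return json.load(f)
```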
Best Practices
Progress Interpretation
- Parent Task progress is aggregated from children
- Weighting factors affect progress calculations
- Sequential Tasks show linear progress
- Parallel Tasks may show non-linear progress as children complete