
Xperiflow Tasks

Author: Drew Shea, Created: 2025-11-16

Tasks in Xperiflow are similar to OneStream Data Management Steps: they represent the granular work that must be completed to achieve a larger goal. However, they differ in that Xperiflow Tasks are often created dynamically while the Job is running, so there is no predefined set of steps before runtime.


Core Concepts

Tasks vs Jobs

Understanding the relationship between Tasks and Jobs is fundamental:

  • Jobs: High-level containers that represent complete operations. You interact with Jobs directly.
  • Tasks: The actual execution units that perform work. Tasks are created automatically by Jobs.

Think of it like a construction project:

  • The Job is the complete project (e.g., "build a house")
  • The Tasks are the individual steps (foundation, framing, plumbing, electrical) that must be completed

When you submit a Job, the system automatically creates a Task tree that breaks down the work into executable steps.

Task Trees

Tasks are organized in hierarchical trees:

  • Root Task: The top-level Task created from a Job. It represents the entire operation.
  • Parent Tasks: Tasks that create and coordinate child Tasks.
  • Child Tasks: Subtasks that perform specific work under a parent.
  • Leaf Tasks: Tasks at the bottom of the tree that perform actual work without creating subtasks.

The tree structure allows complex operations to be broken down into manageable pieces, with each level handling a different aspect of coordination or execution.
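
The hierarchy described above can be pictured with a small, self-contained sketch. `TaskNode` is a hypothetical stand-in written for illustration, not an Xperiflow class, and the nesting of Task types under each orchestrator is assumed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskNode:
    """Hypothetical stand-in for a Task; not an Xperiflow class."""
    name: str
    children: List["TaskNode"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        # A leaf Task performs work itself and has no subtasks.
        return not self.children

    def leaves(self) -> List["TaskNode"]:
        # Depth-first walk that collects every leaf under this node.
        if self.is_leaf():
            return [self]
        found: List["TaskNode"] = []
        for child in self.children:
            found.extend(child.leaves())
        return found


# Root Task -> parent Tasks -> leaf Tasks (nesting is assumed for illustration)
root = TaskNode("PipelineOrchestrator", [
    TaskNode("FeaturePipelineOrchestrator", [
        TaskNode("DataLoad"),
        TaskNode("FeatureGeneration"),
    ]),
    TaskNode("ModelingPipelineOrchestrator", [
        TaskNode("HardTrain"),
    ]),
])

leaf_names = [leaf.name for leaf in root.leaves()]
print(leaf_names)
```

Only the leaves do actual work here; every other node exists to group and coordinate them.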

Execution Types

Tasks have different execution types that determine how they run:


Sequential Tasks:

  • Execute their child Tasks one after another
  • Each child must complete before the next begins
  • Used when steps depend on previous steps completing
  • Example: Load data → Process data → Save results

Parallel Tasks:

  • Execute their child Tasks simultaneously
  • All children can run at the same time
  • Used when steps are independent and can run concurrently
  • Example: Train multiple models in parallel

Atomic Tasks:

  • Perform actual work without creating subtasks
  • Execute on worker processes
  • Represent the fundamental units of work
  • Example: Train a single model, load a dataset, calculate metrics

Synthetic Tasks:

  • Special tasks used for coordination and state management
  • Don't perform actual work but help organize the tree
  • Used internally by the system

Dynamic Task Trees

Task trees are dynamic—the number of child Tasks for Sequential and Parallel Tasks is determined at runtime, not when the Job is created. The structure depends on:

Data-Driven Factors:

  • Data Size: A task processing data might create one child per data partition or batch
  • Number of Targets: A modeling task might create one child per target variable
  • Number of Models: A model training task might create children based on how many models need training
  • Feature Count: A feature engineering task might create children based on the number of feature groups

Configuration-Driven Factors:

  • User Settings: Parameters you provide can determine how work is divided
  • System Configuration: Resource limits and parallelization settings affect task creation
  • Project State: Current project state (existing models, data versions, etc.) influences task structure

Runtime Discovery:

  • Dynamic Analysis: Tasks often analyze data or system state during execution to determine how to proceed
  • Conditional Logic: Tasks may create different numbers of children based on conditions discovered at runtime
  • Adaptive Behavior: The system may adjust task structure based on available resources or workload

Example Scenarios:

Scenario 1: Model Training

  • A modeling task might discover it needs to train 5 models based on the number of target variables in your project
  • It creates 5 parallel child tasks, one per model
  • If you run the same job on a different project with 10 targets, it creates 10 child tasks instead

Scenario 2: Data Processing

  • A data processing task might analyze the input data and determine it needs to process 3 large partitions
  • It creates 3 sequential child tasks to process each partition in order
  • The number of partitions depends on data size and system configuration, not known until execution

Scenario 3: Feature Engineering

  • A feature engineering task might create children based on the number of feature groups that need processing
  • The number of groups depends on your feature configuration and data characteristics
  • This is discovered when the task examines the project state

Implications:

  • Progress Tracking: Initial progress estimates may be inaccurate because the total number of tasks isn't known upfront
  • Resource Planning: Memory and resource needs can vary based on the actual task structure created
  • Execution Time: Total execution time depends on the runtime-determined structure
  • Monitoring: Task trees may look different for similar jobs depending on runtime conditions

This dynamic behavior allows the system to adapt to different scenarios, optimize resource usage, and handle varying data sizes and configurations efficiently.
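
To make the idea concrete, here is a minimal sketch of Scenario 1, where the number of children is only known once the targets are inspected. The function `plan_children`, the dictionary shape, and the target names are all hypothetical and do not reflect how Xperiflow represents Tasks internally.

```python
from typing import Dict, List


def plan_children(targets: List[str]) -> Dict[str, object]:
    """Describe a parent Task with one child per target (hypothetical shape)."""
    return {
        "execution_type": "Parallel",
        # The child count is decided here, at runtime, from the data.
        "children": [f"TrainModel[{target}]" for target in targets],
    }


# Same "job", two projects with different numbers of targets.
small_project = plan_children(["sales", "returns", "traffic", "margin", "units"])
large_project = plan_children([f"target_{i}" for i in range(10)])

print(len(small_project["children"]))
print(len(large_project["children"]))
```

The same planning code yields 5 children for one project and 10 for the other, which is why the total Task count cannot be shown before the parent runs.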

Task Properties

Tasks have several important properties:

Hierarchical Properties:

  • Level: The depth of the Task in the tree (0 for root, 1 for first-level children, etc.)
  • Order: The position of the Task among siblings at the same level
  • Task Path: A unique path identifying the Task's location in the tree (e.g., "PipelineOrchestrator/FeaturePipelineOrchestrator/DataLoad")

Execution Properties:

  • Execution Type: How the Task executes (Sequential, Parallel, Atomic, Synthetic)
  • Memory Allocation: How much memory (in GB) the Task requires
  • Priority: The Task's priority level (1-10, where 1 is highest)
  • Weighting Factor: How much this Task contributes to parent progress (0.0 to 1.0)

Status Properties:

  • Activity Status: Current state of the Task
  • Percent Complete: Progress from 0% to 100%
  • Detail Name: Description of current activity

Task Lifecycle

Tasks progress through a well-defined lifecycle similar to Jobs.

Lifecycle States

A Task can be in one of several states:

Initial States:

  • INITIALIZED: The Task has been created but not yet queued

Waiting States:

  • QUEUED: The Task is waiting to be picked up for execution
  • WORKERQUEUED: The Task has been sent to a worker queue. The Task is on deck to be worked on.

Active States:

  • RUNNING: The Task is currently executing
  • RUNNINGSUBTASKS: The Task is coordinating its child Tasks (only applicable for Sequential and Parallel Execution Types)

Paused States:

  • PAUSED: Execution has been temporarily paused
  • PENDING_PAUSED: A pause request is being processed (the Task is still running)

Completion States:

  • COMPLETED: The Task finished successfully
  • USERCANCELLED: The Task was cancelled by a user
  • SYSCANCELLED: The Task was cancelled by the system
  • SYSERROR: The Task failed due to an error
  • SYSERROR_RERUNNABLE: The Task failed but can be safely rerun
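
When working with these states in client code, it can help to group them the same way this page does. The sketch below assumes the states arrive as the strings listed above; the `ActivityStatus` enum and the helper `is_finished` are illustrative names, not part of any Xperiflow API.

```python
from enum import Enum


class ActivityStatus(Enum):
    """State names copied from the lists above; the enum itself is hypothetical."""
    INITIALIZED = "INITIALIZED"
    QUEUED = "QUEUED"
    WORKERQUEUED = "WORKERQUEUED"
    RUNNING = "RUNNING"
    RUNNINGSUBTASKS = "RUNNINGSUBTASKS"
    PAUSED = "PAUSED"
    PENDING_PAUSED = "PENDING_PAUSED"
    COMPLETED = "COMPLETED"
    USERCANCELLED = "USERCANCELLED"
    SYSCANCELLED = "SYSCANCELLED"
    SYSERROR = "SYSERROR"
    SYSERROR_RERUNNABLE = "SYSERROR_RERUNNABLE"


# Grouping taken from the "Completion States" list.
COMPLETION_STATES = frozenset({
    ActivityStatus.COMPLETED,
    ActivityStatus.USERCANCELLED,
    ActivityStatus.SYSCANCELLED,
    ActivityStatus.SYSERROR,
    ActivityStatus.SYSERROR_RERUNNABLE,
})


def is_finished(status: ActivityStatus) -> bool:
    # True once the Task has reached any completion state, successful or not.
    return status in COMPLETION_STATES


print(is_finished(ActivityStatus("SYSERROR_RERUNNABLE")))
print(is_finished(ActivityStatus("PENDING_PAUSED")))
```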

Lifecycle Flow

In general, the flow for any given Task looks like this:

┌─────────────────┐
│   INITIALIZED   │  Task is created
└────────┬────────┘
         ▼
┌─────────────────┐
│     QUEUED      │  Waiting for execution
└────────┬────────┘
         ▼
┌─────────────────┐
│     RUNNING     │  Executing work
└────────┬────────┘
         │
         ├─→ (if Parallel or Sequential Execution Type)
         │   ┌─────────────────┐
         │   │ RUNNINGSUBTASKS │  Children are RUNNING
         │   └────────┬────────┘
         │            │
         ├────────────┘
         ▼
┌─────────────────┐
│    COMPLETED    │  Finished successfully
└─────────────────┘

         ▼ (on failure)
┌─────────────────┐
│    SYSERROR     │  Failed (may be rerunnable)
└─────────────────┘
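
The flow can also be written as a transition table, which is handy for sanity-checking a recorded status history. This is a sketch that covers only the states drawn in the diagram; the table `ALLOWED` and the function `follows_flow` are hypothetical, and the assumption that a failure can occur from either active state is ours, not something the diagram states.

```python
from typing import Dict, List, Set

# Transitions read off the diagram. Pause, cancel, and WORKERQUEUED
# are deliberately left out because the diagram does not draw them.
ALLOWED: Dict[str, Set[str]] = {
    "INITIALIZED": {"QUEUED"},
    "QUEUED": {"RUNNING"},
    "RUNNING": {"RUNNINGSUBTASKS", "COMPLETED", "SYSERROR"},
    "RUNNINGSUBTASKS": {"COMPLETED", "SYSERROR"},  # failure edge is assumed
    "COMPLETED": set(),
    "SYSERROR": set(),
}


def follows_flow(history: List[str]) -> bool:
    # Every consecutive pair of states must be an allowed transition.
    return all(
        nxt in ALLOWED.get(cur, set())
        for cur, nxt in zip(history, history[1:])
    )


atomic_run = ["INITIALIZED", "QUEUED", "RUNNING", "COMPLETED"]
parent_run = ["INITIALIZED", "QUEUED", "RUNNING", "RUNNINGSUBTASKS", "COMPLETED"]
skipped_queue = ["INITIALIZED", "RUNNING"]

print(follows_flow(atomic_run), follows_flow(parent_run), follows_flow(skipped_queue))
```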

Time Tracking

Tasks track several important timestamps:

  • Queued Time: When the Task entered the queue
  • Start Time: When execution actually began
  • End Time: When execution completed
  • Last Activity Time: The most recent update to the Task
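
These timestamps make it possible to separate time spent waiting from time spent working. A minimal sketch, assuming the three timestamps are available as `datetime` values (the function names and sample values are illustrative only):

```python
from datetime import datetime, timedelta


def queue_wait(queued_time: datetime, start_time: datetime) -> timedelta:
    # How long the Task sat in the queue before execution began.
    return start_time - queued_time


def run_duration(start_time: datetime, end_time: datetime) -> timedelta:
    # How long the Task actually spent executing.
    return end_time - start_time


# Made-up sample values for illustration.
queued = datetime(2025, 11, 16, 9, 0, 0)
started = datetime(2025, 11, 16, 9, 0, 45)
ended = datetime(2025, 11, 16, 9, 12, 45)

wait = queue_wait(queued, started)
duration = run_duration(started, ended)
print(wait, duration)
```

A long queue wait with a short run duration usually points at resource contention rather than slow work.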

Task Execution

How Tasks Execute

The execution process depends on the Task's execution type:

Sequential Tasks:

  1. Task starts executing
  2. Creates first child Task
  3. Waits for first child to complete
  4. Creates second child Task
  5. Waits for second child to complete
  6. Continues until all children complete
  7. Task completes

Parallel Tasks:

  1. Task starts executing
  2. Creates all child Tasks simultaneously
  3. All children execute concurrently
  4. Waits for all children to complete
  5. Task completes

Atomic Tasks:

  1. Task is sent to a worker process
  2. Worker executes the actual work
  3. Task completes when work finishes
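
The difference between the Sequential and Parallel patterns can be sketched with the Python standard library. This is an analogy only: children are modeled as plain callables built up front, threads stand in for worker processes, and `run_sequential` and `run_parallel` are hypothetical names. In Xperiflow, a Sequential Task creates each child only after the previous one completes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_sequential(children: List[Callable[[], str]]) -> List[str]:
    # Each child must finish before the next one starts.
    results = []
    for child in children:
        results.append(child())
    return results


def run_parallel(children: List[Callable[[], str]]) -> List[str]:
    # All children are submitted at once; the parent waits for every result.
    with ThreadPoolExecutor(max_workers=len(children) or 1) as pool:
        futures = [pool.submit(child) for child in children]
        return [future.result() for future in futures]


def step(name: str) -> Callable[[], str]:
    # Stand-in for an Atomic Task: performs "work" and reports its name.
    def run() -> str:
        return name
    return run


sequential_results = run_sequential(
    [step("Load data"), step("Process data"), step("Save results")]
)
parallel_results = run_parallel(
    [step("Train model A"), step("Train model B"), step("Train model C")]
)
print(sequential_results)
print(parallel_results)
```

Results from `run_parallel` are collected in submission order, even though the children may finish in any order.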

Task Coordination

Parent Tasks coordinate their children:

  • Sequential Coordination: Ensures children execute in order, waiting for each to complete
  • Parallel Coordination: Manages multiple children running simultaneously, tracking their progress
  • Progress Aggregation: Combines children's progress to update parent's percent complete
  • Error Handling: If a child fails, the error and its execution progress bubble up recursively to the parent.

Memory and Resource Management

Tasks specify their resource requirements:

  • Memory Allocation: Each Task declares how much memory (in GB) it needs
  • System Validation: The system ensures sufficient resources are available before executing
  • Resource Limits: Tasks respect user and system resource limits
  • Dynamic Allocation: Some Tasks calculate memory needs based on input data size
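
As a rough illustration of how a data-dependent memory estimate and a pre-execution check might fit together, consider the sketch below. The function names, the overhead factor of 3.0, and the 0.5 GB idle overhead are invented for this example and are not Xperiflow defaults.

```python
def estimate_memory_gb(input_bytes: int,
                       overhead_factor: float = 3.0,
                       idle_gb: float = 0.5) -> float:
    # Assumed rule of thumb: idle overhead plus a multiple of the input size.
    return idle_gb + overhead_factor * input_bytes / 1024 ** 3


def can_start(required_gb: float, available_gb: float, user_limit_gb: float) -> bool:
    # The Task may start only if it fits in free memory and under the user limit.
    return required_gb <= available_gb and required_gb <= user_limit_gb


required = estimate_memory_gb(2 * 1024 ** 3)  # a 2 GB input
print(required)
print(can_start(required, available_gb=16.0, user_limit_gb=8.0))
print(can_start(required, available_gb=4.0, user_limit_gb=8.0))
```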

Task Types

Tasks come in many types, each designed for specific operations:

Orchestration Tasks

Tasks that coordinate other Tasks. These can be of either Sequential or Parallel Execution Type:

  • PipelineOrchestrator: Coordinates the complete pipeline execution
  • FeaturePipelineOrchestrator: Coordinates feature engineering steps
  • ModelingPipelineOrchestrator: Coordinates model training steps
  • RoutineExecutionTask: Executes routine workflows

Data Tasks

Tasks that handle data operations:

  • DataLoad: Loads data from sources
  • DataUpdate: Updates existing data
  • DataValidation: Validates data quality
  • FrameBuilder: Builds data frames for modeling

Feature Engineering Tasks

Tasks for feature creation and transformation:

  • FeatureGeneration: Generates new features
  • FeatureTransformation: Transforms features
  • FeatureSelection: Selects important features

Modeling Tasks

Tasks for machine learning operations:

  • HardTrain: Trains models with full datasets
  • SoftTrain: Retrains models with new data
  • HyperTune: Performs hyperparameter tuning
  • ModelSelection: Selects best models
  • Backtest: Tests models on historical data
  • ModelDeploy: Deploys models for production

Analysis Tasks

Tasks for analysis and insights:

  • FeatureImpact: Analyzes feature importance
  • PredictionExplanation: Explains model predictions
  • Insights: Generates analytical insights

System Tasks

Tasks for system operations:

  • Checkpoint: Creates system checkpoints
  • BackgroundCheckpoint: Creates background checkpoints

**Note:** This is not an exhaustive list. The available Task types depend on your Xperiflow installation and the specific operations you're performing as defined by the Xperiflow Job.


Task Properties in Detail

Hierarchical Structure

Level:

  • Indicates how deep the Task is in the tree
  • Root Tasks have level 0
  • Each level down increments the level
  • Helps understand Task relationships

Order:

  • Position among siblings at the same level
  • Sequential Tasks use order to determine execution sequence
  • Parallel Tasks may have order -1 to indicate concurrent execution

Task Path:

  • Unique identifier showing the Task's location
  • Format: "ParentTask/ChildTask/GrandchildTask"
  • Example: PipelineOrchestrator/FeaturePipelineOrchestrator/DataLoad
  • Helps trace execution flow and debug issues
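
Level and Task Path both follow directly from a Task's position in the tree. The sketch below derives them from parent links using the slash-separated format; the `Task` class here is a hypothetical illustration, not the Xperiflow object model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    """Hypothetical Task with only a name and a link to its parent."""
    name: str
    parent: Optional["Task"] = None

    @property
    def level(self) -> int:
        # Root Tasks are level 0; each step down adds one.
        return 0 if self.parent is None else self.parent.level + 1

    @property
    def task_path(self) -> str:
        # Join ancestor names from the root down to this Task.
        if self.parent is None:
            return self.name
        return f"{self.parent.task_path}/{self.name}"


root = Task("PipelineOrchestrator")
features = Task("FeaturePipelineOrchestrator", parent=root)
data_load = Task("DataLoad", parent=features)

print(data_load.level)
print(data_load.task_path)
```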

Resource Properties

Memory Allocation:

  • Specified in gigabytes (GB)
  • System ensures availability before execution
  • Can be calculated dynamically based on data size
  • Includes idle memory overhead

Priority:

  • Range from 1 (highest) to 10 (lowest)
  • Inherited from parent Task or Job by default
  • Higher priority Tasks execute before lower priority ones
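
Because 1 is the highest priority, a min-heap orders Tasks naturally. The following sketch uses Python's `heapq`; the `enqueue` and `dequeue` helpers are hypothetical, and the first-in, first-out tie-breaking among equal priorities is an assumption rather than documented scheduler behavior.

```python
import heapq
import itertools
from typing import List, Tuple

# Tie-breaker so Tasks with equal priority leave in arrival order (assumed).
_sequence = itertools.count()


def enqueue(queue: List[Tuple[int, int, str]], priority: int, task_name: str) -> None:
    if not 1 <= priority <= 10:
        raise ValueError("priority must be between 1 and 10")
    heapq.heappush(queue, (priority, next(_sequence), task_name))


def dequeue(queue: List[Tuple[int, int, str]]) -> str:
    # Smallest priority number (highest priority) comes out first.
    return heapq.heappop(queue)[2]


queue: List[Tuple[int, int, str]] = []
enqueue(queue, 5, "DataLoad")
enqueue(queue, 1, "Checkpoint")
enqueue(queue, 5, "FeatureGeneration")
enqueue(queue, 10, "Insights")

execution_order = [dequeue(queue) for _ in range(len(queue))]
print(execution_order)
```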

Weighting Factor:

  • Determines how much this Task contributes to parent's progress
  • Range from 0.0 to 1.0
  • Used to calculate accurate progress percentages
  • Example: If a parent has 3 children with equal weight, each contributes approximately 0.33
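
One plausible way to combine weighting factors with child progress is a weighted average. The exact formula Xperiflow uses is not given here, so treat `parent_percent_complete` as an illustrative assumption.

```python
from typing import List, Tuple


def parent_percent_complete(children: List[Tuple[float, float]]) -> float:
    """children holds (weighting_factor, percent_complete) pairs."""
    total_weight = sum(weight for weight, _ in children)
    if total_weight == 0:
        return 0.0
    # Weighted average, normalized in case the weights do not sum to 1.0.
    return sum(weight * percent for weight, percent in children) / total_weight


# A heavy child that is done, plus two lighter children at 50% and 0%.
progress = parent_percent_complete([(0.5, 100.0), (0.25, 50.0), (0.25, 0.0)])
print(progress)
```

With these inputs, the finished child contributes 50 points and the half-finished child 12.5, so the parent reports 62.5%.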

Execution Control

Execution Type:

  • Determines how the Task runs
  • Set when Task is created
  • Cannot be changed for most Task types
  • Atomic Tasks can sometimes be converted to other types

Retries:

  • Number of times the Task has been retried after failure
  • Maximum retries are configured per Task type
  • System automatically retries on transient failures

Task Monitoring

Progress Tracking

Tasks provide detailed progress information:

  • Percent Complete: Current progress (0% to 100%)
  • Detail Name: Description of current activity (e.g., "Load Data", "Train Model")
  • Status: Current lifecycle state
  • Children Progress: For parent Tasks, aggregated progress from children

Task Details

Tasks record detailed information about their execution:

  • Task Details: Timestamped records of what the Task is doing
  • Detail Names: Standard names like "Task Started", "Load Data", "Task Completed"
  • Detail Info: Additional context about the current operation

Task Tree Visualization

The Task tree can be visualized to understand:

  • Overall structure and hierarchy
  • Which Tasks are running, completed, or failed
  • Progress at each level
  • Relationships between Tasks

Task Error Handling

Retry Mechanism

Tasks can automatically retry on failure:

  • Transient Failures: Automatically retried
  • Retry Limits: Maximum number of retries configured per Task type
  • Retry Backoff: System may wait before retrying
  • Retry Tracking: Number of retries is tracked and reported
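
The retry behavior above resembles a standard retry loop with exponential backoff. This is a generic sketch, not Xperiflow's implementation: `TransientError`, `run_with_retries`, and the doubling delay are all assumptions made for illustration.

```python
import time
from typing import Callable, Tuple


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def run_with_retries(work: Callable[[], str],
                     max_retries: int = 3,
                     base_delay: float = 0.0) -> Tuple[str, int]:
    retries = 0
    while True:
        try:
            return work(), retries
        except TransientError:
            if retries >= max_retries:
                raise  # retry limit reached, surface the failure
            time.sleep(base_delay * (2 ** retries))  # wait longer each time
            retries += 1


attempts = {"count": 0}


def flaky_load() -> str:
    # Fails on the first two attempts, then succeeds.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("temporary failure")
    return "ok"


result, retry_count = run_with_retries(flaky_load, max_retries=3, base_delay=0.0)
print(result, retry_count)
```

The returned retry count mirrors the Retry Tracking bullet: the caller can see how many attempts were needed.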

Error Propagation

When Tasks fail:

  • Child Failure: Parent Task is notified
  • Parent Decision: Parent can handle error or propagate upward
  • Job Impact: Failure may cause entire Job to fail
  • Error Logs: Detailed error information is recorded
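
The upward propagation can be illustrated with a short recursive walk. This sketch models only the "propagate upward" path, not the case where a parent handles the error itself; `Node` and `resolve_status` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical Task record holding a status and child Tasks."""
    name: str
    status: str = "COMPLETED"
    children: List["Node"] = field(default_factory=list)


def resolve_status(node: Node) -> str:
    # A parent takes on SYSERROR if any descendant ended in SYSERROR.
    for child in node.children:
        if resolve_status(child) == "SYSERROR":
            node.status = "SYSERROR"
    return node.status


data_load = Node("DataLoad", status="SYSERROR")
feature_gen = Node("FeatureGeneration")
features = Node("FeaturePipelineOrchestrator", children=[data_load, feature_gen])
modeling = Node("ModelingPipelineOrchestrator")
root = Node("PipelineOrchestrator", children=[features, modeling])

print(resolve_status(root))
print(features.status, modeling.status)
```

A single failing leaf marks every ancestor up to the root, while unaffected siblings keep their own status.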

Cleanup

When Tasks fail, the system performs cleanup operations:

  • Database Cleanup: Remove partial results
  • Resource Release: Free allocated memory and resources
  • State Reversion: Revert to previous known good state if possible

Task Pausing and Resumption

Pausing Tasks

Tasks can be paused during execution:

  • Atomic Tasks Only: Only atomic Tasks can be paused
  • Pause Request: System processes pause requests
  • State Preservation: Task state is saved when paused
  • Resume Capability: Paused Tasks can be resumed later

Resuming Tasks

Paused Tasks can be resumed:

  • State Restoration: Previous state is restored
  • Continuation: Execution continues from pause point
  • Metadata Preservation: Communication metadata is preserved

Best Practices

Progress Interpretation

  • Parent Task progress is aggregated from children
  • Weighting factors affect progress calculations
  • Sequential Tasks show linear progress
  • Parallel Tasks may show non-linear progress as children complete
