Xperiflow Tasks
Tasks in Xperiflow are similar to OneStream Data Management Steps: they represent the granular work that must be completed to achieve a larger goal. However, they differ in that Xperiflow Tasks are often created dynamically while the Job is running, meaning there is no pre-defined set of steps prior to runtime.
Core Concepts
Tasks vs Jobs
Understanding the relationship between Tasks and Jobs is fundamental:
- Jobs: High-level containers that represent complete operations. You interact with Jobs directly.
- Tasks: The actual execution units that perform work. Tasks are created automatically by Jobs.
Think of it like a construction project:
- The Job is the complete project (e.g., "build a house")
- The Tasks are the individual steps (foundation, framing, plumbing, electrical) that must be completed
When you submit a Job, the system automatically creates a Task tree that breaks down the work into executable steps.
Task Trees
Tasks are organized in hierarchical trees:
- Root Task: The top-level Task created from a Job. It represents the entire operation.
- Parent Tasks: Tasks that create and coordinate child Tasks.
- Child Tasks: Subtasks that perform specific work under a parent.
- Leaf Tasks: Tasks at the bottom of the tree that perform actual work without creating subtasks.
The tree structure allows complex operations to be broken down into manageable pieces, with each level handling a different aspect of coordination or execution.
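A minimal sketch of how such a tree can be modeled; the class and method names below are invented for illustration and are not the actual Xperiflow API:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaskNode:
    """Illustrative model of a node in a Task tree (not the real Xperiflow class)."""
    name: str
    parent: Optional["TaskNode"] = None
    children: List["TaskNode"] = field(default_factory=list)

    @property
    def is_root(self) -> bool:
        # The root Task is the top-level Task created from the Job.
        return self.parent is None

    @property
    def is_leaf(self) -> bool:
        # Leaf Tasks perform actual work and create no subtasks.
        return not self.children

    @property
    def level(self) -> int:
        # Depth in the tree: 0 for the root, 1 for first-level children, and so on.
        return 0 if self.parent is None else self.parent.level + 1

    def add_child(self, name: str) -> "TaskNode":
        child = TaskNode(name=name, parent=self)
        self.children.append(child)
        return child


root = TaskNode("PipelineOrchestrator")                              # root Task, level 0
load = root.add_child("FeaturePipelineOrchestrator").add_child("DataLoad")
print(load.level, load.is_leaf, root.is_root)                        # 2 True True
```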
Execution Types
Tasks have different execution types that determine how they run:
Sequential Tasks:
- Execute their child Tasks one after another
- Each child must complete before the next begins
- Used when steps depend on previous steps completing
- Example: Load data → Process data → Save results
Parallel Tasks:
- Execute their child Tasks simultaneously
- All children can run at the same time
- Used when steps are independent and can run concurrently
- Example: Train multiple models in parallel
Atomic Tasks:
- Perform actual work without creating subtasks
- Execute on worker processes
- Represent the fundamental units of work
- Example: Train a single model, load a dataset, calculate metrics
Synthetic Tasks:
- Special tasks used for coordination and state management
- Don't perform actual work but help organize the tree
- Used internally by the system
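As a rough illustration (the identifiers are hypothetical, not the platform's own), the four execution types map naturally to an enumeration:

```python
from enum import Enum


class ExecutionType(Enum):
    SEQUENTIAL = "sequential"  # children run one after another, in order
    PARALLEL = "parallel"      # children run at the same time
    ATOMIC = "atomic"          # performs real work on a worker, no subtasks
    SYNTHETIC = "synthetic"    # coordination/state management only, no real work
```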
Dynamic Task Trees
Task trees are dynamic—the number of child Tasks for Sequential and Parallel Tasks is determined at runtime, not when the Job is created. The structure depends on:
Data-Driven Factors:
- Data Size: A task processing data might create one child per data partition or batch
- Number of Targets: A modeling task might create one child per target variable
- Number of Models: A model training task might create children based on how many models need training
- Feature Count: A feature engineering task might create children based on the number of feature groups
Configuration-Driven Factors:
- User Settings: Parameters you provide can determine how work is divided
- System Configuration: Resource limits and parallelization settings affect task creation
- Project State: Current project state (existing models, data versions, etc.) influences task structure
Runtime Discovery:
- Dynamic Analysis: Tasks often analyze data or system state during execution to determine how to proceed
- Conditional Logic: Tasks may create different numbers of children based on conditions discovered at runtime
- Adaptive Behavior: The system may adjust task structure based on available resources or workload
Example Scenarios:
Scenario 1: Model Training
- A modeling task might discover it needs to train 5 models based on the number of target variables in your project
- It creates 5 parallel child tasks, one per model
- If you run the same job on a different project with 10 targets, it creates 10 child tasks instead
Scenario 2: Data Processing
- A data processing task might analyze the input data and determine it needs to process 3 large partitions
- It creates 3 sequential child tasks to process each partition in order
- The number of partitions depends on data size and system configuration, not known until execution
Scenario 3: Feature Engineering
- A feature engineering task might create children based on the number of feature groups that need processing
- The number of groups depends on your feature configuration and data characteristics
- This is discovered when the task examines the project state
Implications:
- Progress Tracking: Initial progress estimates may be inaccurate because the total number of tasks isn't known upfront
- Resource Planning: Memory and resource needs can vary based on the actual task structure created
- Execution Time: Total execution time depends on the runtime-determined structure
- Monitoring: Task trees may look different for similar jobs depending on runtime conditions
This dynamic behavior allows the system to adapt to different scenarios, optimize resource usage, and handle varying data sizes and configurations efficiently.
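As an illustration of Scenario 1 above, a hypothetical modeling task only knows how many children to create once it inspects the project state. The function and argument names below are invented for the sketch:

```python
def plan_training_children(target_variables):
    """Decide at runtime how many training subtasks to create.

    `target_variables` stands in for whatever the task discovers when it
    inspects the project state; the name is invented for this sketch.
    """
    # One parallel child per target: a project with 5 targets yields 5
    # subtasks, one with 10 targets yields 10. The count is unknown until
    # this function runs, which is why upfront progress estimates can drift.
    return [f"HardTrain[{target}]" for target in target_variables]


print(plan_training_children(["revenue", "units_sold"]))
# ['HardTrain[revenue]', 'HardTrain[units_sold]']
```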
Task Properties
Tasks have several important properties:
Hierarchical Properties:
- Level: The depth of the Task in the tree (0 for root, 1 for first-level children, etc.)
- Order: The position of the Task among siblings at the same level
- Task Path: A unique path identifying the Task's location in the tree (e.g., "PipelineOrchestrator/FeaturePipelineOrchestrator/DataLoad")
Execution Properties:
- Execution Type: How the Task executes (Sequential, Parallel, Atomic, Synthetic)
- Memory Allocation: How much memory (in GB) the Task requires
- Priority: The Task's priority level (1-10, where 1 is highest)
- Weighting Factor: How much this Task contributes to parent progress (0.0 to 1.0)
Status Properties:
- Activity Status: Current state of the Task
- Percent Complete: Progress from 0% to 100%
- Detail Name: Description of current activity
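As a rough sketch, these properties can be grouped into a single record; the field names below mirror the list above but are illustrative, not the actual Xperiflow schema:

```python
from dataclasses import dataclass


@dataclass
class TaskProperties:
    # Hierarchical properties
    level: int               # 0 for the root, increasing with depth
    order: int               # position among siblings at the same level
    task_path: str           # e.g. "PipelineOrchestrator/FeaturePipelineOrchestrator/DataLoad"
    # Execution properties
    execution_type: str      # "Sequential", "Parallel", "Atomic", or "Synthetic"
    memory_gb: float         # declared memory requirement in GB
    priority: int            # 1 (highest) through 10 (lowest)
    weighting_factor: float  # contribution to parent progress, 0.0 to 1.0
    # Status properties
    activity_status: str     # current lifecycle state
    percent_complete: float  # 0.0 to 100.0
    detail_name: str         # description of the current activity
```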
Task Lifecycle
Tasks progress through a well-defined lifecycle similar to Jobs.
Lifecycle States
A Task can be in one of several states:
Initial States:
- INITIALIZED: The Task has been created but not yet queued
Waiting States:
- QUEUED: The Task is waiting to be picked up for execution
- WORKERQUEUED: The Task has been sent to a worker queue and is on deck to be picked up by a worker
Active States:
- RUNNING: The Task is currently executing
- RUNNINGSUBTASKS: The Task is coordinating its child Tasks (only applicable for Sequential and Parallel Execution Types)
Paused States:
- PAUSED: Execution has been temporarily paused
- PENDING_PAUSED: A pause request is being processed (the Task is still running)
Completion States:
- COMPLETED: The Task finished successfully
- USERCANCELLED: The Task was cancelled by a user
- SYSCANCELLED: The Task was cancelled by the system
- SYSERROR: The Task failed due to an error
- SYSERROR_RERUNNABLE: The Task failed but can be safely rerun
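As an illustration only (the platform's internal identifiers may differ), the states and their groupings could be modeled as an enumeration:

```python
from enum import Enum


class TaskStatus(Enum):
    # Initial
    INITIALIZED = "initialized"
    # Waiting
    QUEUED = "queued"
    WORKERQUEUED = "workerqueued"
    # Active
    RUNNING = "running"
    RUNNINGSUBTASKS = "runningsubtasks"
    # Paused
    PAUSED = "paused"
    PENDING_PAUSED = "pending_paused"
    # Completion
    COMPLETED = "completed"
    USERCANCELLED = "usercancelled"
    SYSCANCELLED = "syscancelled"
    SYSERROR = "syserror"
    SYSERROR_RERUNNABLE = "syserror_rerunnable"


# Completion states are terminal: the Task will not transition out of them.
TERMINAL_STATES = {
    TaskStatus.COMPLETED, TaskStatus.USERCANCELLED, TaskStatus.SYSCANCELLED,
    TaskStatus.SYSERROR, TaskStatus.SYSERROR_RERUNNABLE,
}
```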
Lifecycle Flow
In general, for any given Task, the flow looks like this:
┌─────────────┐
│ INITIALIZED │  Task is created
└──────┬──────┘
       │
       ▼
┌─────────────┐
│   QUEUED    │  Waiting for execution
└──────┬──────┘
       │
       ▼
┌─────────────┐
│   RUNNING   │  Executing work
└──────┬──────┘
       │
       ├─→ (if Sequential or Parallel Execution Type)
       │   ┌─────────────────┐
       │   │ RUNNINGSUBTASKS │  Children are running
       │   └────────┬────────┘
       │            │
       │◄───────────┘
       │
       ├─→ (on failure)
       │   ┌─────────────┐
       │   │  SYSERROR   │  Failed (may be rerunnable)
       │   └─────────────┘
       ▼
┌─────────────┐
│  COMPLETED  │  Finished successfully
└─────────────┘
Time Tracking
Tasks track several important timestamps:
- Queued Time: When the Task entered the queue
- Start Time: When execution actually began
- End Time: When execution completed
- Last Activity Time: The most recent update to the Task
Task Execution
How Tasks Execute
The execution process depends on the Task's execution type:
Sequential Tasks:
- Task starts executing
- Creates first child Task
- Waits for first child to complete
- Creates second child Task
- Waits for second child to complete
- Continues until all children complete
- Task completes
Parallel Tasks:
- Task starts executing
- Creates all child Tasks simultaneously
- All children execute concurrently
- Waits for all children to complete
- Task completes
Atomic Tasks:
- Task is sent to a worker process
- Worker executes the actual work
- Task completes when work finishes
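The difference between sequential and parallel coordination can be sketched in plain Python; the placeholder work functions below stand in for real child Tasks and are not part of the Xperiflow API:

```python
from concurrent.futures import ThreadPoolExecutor


def run_sequential(child_fns):
    """Run children one at a time; each must finish before the next starts."""
    results = []
    for fn in child_fns:
        results.append(fn())
    return results


def run_parallel(child_fns):
    """Submit all children at once and wait for every one to complete."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(fn) for fn in child_fns]
        return [f.result() for f in futures]


# Placeholder atomic work units standing in for real child Tasks.
children = [lambda i=i: f"model-{i} trained" for i in range(3)]
print(run_sequential(children))
print(run_parallel(children))
```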
Task Coordination
Parent Tasks coordinate their children:
- Sequential Coordination: Ensures children execute in order, waiting for each to complete
- Parallel Coordination: Manages multiple children running simultaneously, tracking their progress
- Progress Aggregation: Combines children's progress to update parent's percent complete
- Error Handling: If a child fails, the error bubbles up recursively through its parent Tasks
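For example, a parent's percent complete can be thought of as the weighted sum of its children's progress. A minimal sketch of that aggregation (not Xperiflow's actual implementation):

```python
def aggregate_progress(children):
    """children: list of (weighting_factor, percent_complete) pairs.

    Each weighting factor is the child's share of the parent's progress;
    for a well-formed tree the factors sum to roughly 1.0.
    """
    return sum(weight * pct for weight, pct in children)


# Three equally weighted children: one done, one halfway, one not started.
print(aggregate_progress([(1 / 3, 100.0), (1 / 3, 50.0), (1 / 3, 0.0)]))  # ≈ 50.0
```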
Memory and Resource Management
Tasks specify their resource requirements:
- Memory Allocation: Each Task declares how much memory (in GB) it needs
- System Validation: The system ensures sufficient resources are available before executing
- Resource Limits: Tasks respect user and system resource limits
- Dynamic Allocation: Some Tasks calculate memory needs based on input data size
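As a toy illustration of dynamic allocation, a Task's estimate might scale with input size plus a fixed idle overhead; every constant and formula below is invented for the example and is not Xperiflow's real sizing rule:

```python
def estimate_memory_gb(row_count, bytes_per_row, idle_overhead_gb=0.5, safety_factor=2.0):
    """Estimate memory for a data-bound Task from its input size (illustrative only)."""
    data_gb = row_count * bytes_per_row / 1e9
    # Fixed idle overhead plus a safety multiple of the raw data footprint.
    return idle_overhead_gb + data_gb * safety_factor


print(round(estimate_memory_gb(5_000_000, 400), 2))  # 4.5 GB for a ~2 GB frame
```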
Task Types
Tasks come in many types, each designed for specific operations:
Orchestration Tasks
Tasks that coordinate other Tasks. These can be of either Sequential or Parallel Execution Type:
- PipelineOrchestrator: Coordinates the complete pipeline execution
- FeaturePipelineOrchestrator: Coordinates feature engineering steps
- ModelingPipelineOrchestrator: Coordinates model training steps
- RoutineExecutionTask: Executes routine workflows
Data Tasks
Tasks that handle data operations:
- DataLoad: Loads data from sources
- DataUpdate: Updates existing data
- DataValidation: Validates data quality
- FrameBuilder: Builds data frames for modeling
Feature Engineering Tasks
Tasks for feature creation and transformation:
- FeatureGeneration: Generates new features
- FeatureTransformation: Transforms features
- FeatureSelection: Selects important features
Modeling Tasks
Tasks for machine learning operations:
- HardTrain: Trains models with full datasets
- SoftTrain: Retrains models with new data
- HyperTune: Performs hyperparameter tuning
- ModelSelection: Selects best models
- Backtest: Tests models on historical data
- ModelDeploy: Deploys models for production
Analysis Tasks
Tasks for analysis and insights:
- FeatureImpact: Analyzes feature importance
- PredictionExplanation: Explains model predictions
- Insights: Generates analytical insights
System Tasks
Tasks for system operations:
- Checkpoint: Creates system checkpoints
- BackgroundCheckpoint: Creates background checkpoints
**Note:** This is not an exhaustive list. The available Task types depend on your Xperiflow installation and the specific operations you're performing as defined by the Xperiflow Job.
Task Properties in Detail
Hierarchical Structure
Level:
- Indicates how deep the Task is in the tree
- Root Tasks have level 0
- Each level down increments the level
- Helps understand Task relationships
Order:
- Position among siblings at the same level
- Sequential Tasks use order to determine execution sequence
- Parallel Tasks may have order -1 to indicate concurrent execution
Task Path:
- Unique identifier showing the Task's location
- Format: "ParentTask/ChildTask/GrandchildTask"
- Example: "PipelineOrchestrator/FeaturePipelineOrchestrator/DataLoad"
- Helps trace execution flow and debug issues
Resource Properties
Memory Allocation:
- Specified in gigabytes (GB)
- System ensures availability before execution
- Can be calculated dynamically based on data size
- Includes idle memory overhead
Priority:
- Range from 1 (highest) to 10 (lowest)
- Inherited from parent Task or Job by default
- Higher priority Tasks execute before lower priority ones
Weighting Factor:
- Determines how much this Task contributes to parent's progress
- Range from 0.0 to 1.0
- Used to calculate accurate progress percentages
- Example: If a parent has 3 children with equal weight, each contributes 0.33
Execution Control
Execution Type:
- Determines how the Task runs
- Set when Task is created
- Cannot be changed for most Task types
- Atomic Tasks can sometimes be converted to other types
Retries:
- Number of times the Task has been retried after failure
- Maximum retries are configured per Task type
- System automatically retries on transient failures
Task Monitoring
Progress Tracking
Tasks provide detailed progress information:
- Percent Complete: Current progress (0% to 100%)
- Detail Name: Description of current activity (e.g., "Load Data", "Train Model")
- Status: Current lifecycle state
- Children Progress: For parent Tasks, aggregated progress from children
Task Details
Tasks record detailed information about their execution:
- Task Details: Timestamped records of what the Task is doing
- Detail Names: Standard names like "Task Started", "Load Data", "Task Completed"
- Detail Info: Additional context about the current operation
Task Tree Visualization
The Task tree can be visualized to understand:
- Overall structure and hierarchy
- Which Tasks are running, completed, or failed
- Progress at each level
- Relationships between Tasks
Task Error Handling
Retry Mechanism
Tasks can automatically retry on failure:
- Transient Failures: Automatically retried
- Retry Limits: Maximum number of retries configured per Task type
- Retry Backoff: System may wait before retrying
- Retry Tracking: Number of retries is tracked and reported
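A minimal sketch of retry-with-backoff for transient failures; the retry limit and backoff schedule here are illustrative, since Xperiflow configures its own per Task type:

```python
import time


def run_with_retries(work_fn, max_retries=3, base_delay_s=1.0):
    """Retry `work_fn` on failure, waiting longer before each new attempt."""
    for attempt in range(max_retries + 1):
        try:
            return work_fn()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # Exponential backoff before the next attempt (1s, 2s, 4s, ...).
            time.sleep(base_delay_s * (2 ** attempt))
```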
Error Propagation
When Tasks fail:
- Child Failure: Parent Task is notified
- Parent Decision: Parent can handle error or propagate upward
- Job Impact: Failure may cause entire Job to fail
- Error Logs: Detailed error information is recorded
Cleanup
When Tasks fail, cleanup operations may include:
- Database Cleanup: Remove partial results
- Resource Release: Free allocated memory and resources
- State Reversion: Revert to previous known good state if possible
Task Pausing and Resumption
Pausing Tasks
Tasks can be paused during execution:
- Atomic Tasks Only: Only atomic Tasks can be paused
- Pause Request: System processes pause requests
- State Preservation: Task state is saved when paused
- Resume Capability: Paused Tasks can be resumed later
Resuming Tasks
Paused Tasks can be resumed:
- State Restoration: Previous state is restored
- Continuation: Execution continues from pause point
- Metadata Preservation: Communication metadata is preserved
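Conceptually, pausing means persisting enough state for the Task to continue later, and resuming means restoring it. A rough sketch under that assumption (the checkpoint format and field names are invented, not the platform's actual mechanism):

```python
import json


def save_checkpoint(path, processed_rows, metadata):
    """Persist progress so a paused Task can resume where it left off."""
    with open(path, "w") as f:
        json.dump({"processed_rows": processed_rows, "metadata": metadata}, f)


def load_checkpoint(path):
    """Restore the saved state; execution continues from `processed_rows`."""
    with open(path) as f:
        return json.load(f)
```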
Best Practices
Progress Interpretation
- Parent Task progress is aggregated from children
- Weighting factors affect progress calculations
- Sequential Tasks show linear progress
- Parallel Tasks may show non-linear progress as children complete