MetaFileSystem
Summary: The MetaFileSystem is Xperiflow's unified storage layer for file data. It provides one consistent way to store and retrieve files across multiple logical stores (framework, project, shared, routine, ephemeral) by routing each request to the right store based on the path's protocol prefix. It supports the usual file system operations and is designed to handle large volumes of data.
Overview
When working with data science workflows, you need reliable file storage that can handle everything from small configuration files to large datasets. The MetaFileSystem solves this by providing a single, consistent interface regardless of where your files are physically stored.
The MetaFileSystem provides:
Feature | Description |
|---|---|
Unified Access | One interface to work with files across all storage locations |
Protocol-Based Routing | Intuitive path prefixes that route to the correct storage |
fsspec Compatibility | Implements fsspec's AbstractFileSystem, so the MetaFileSystem can be used anywhere an fsspec filesystem is expected. |
Think of the MetaFileSystem like a smart filing cabinet that knows exactly where everything is stored, even if the actual documents are spread across different rooms in the building.
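Because it implements fsspec's AbstractFileSystem, a MetaFileSystem instance can be handed to any library that accepts an fsspec filesystem. A minimal sketch, assuming a hypothetical import path (the real module name and constructor arguments may differ):

```python
# Hypothetical import path -- the actual module may differ.
from xperiflow.storage import MetaFileSystem

# Construct the filesystem; it exposes the standard fsspec
# AbstractFileSystem interface (open, ls, cat, info, ...).
fs = MetaFileSystem()

# List the top level of the shared store via its protocol prefix.
print(fs.ls("shared://"))
```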
Why the MetaFileSystem Exists
The Problem
Modern data science and ML workflows generate significant amounts of data: trained models, intermediate datasets, configuration files, artifacts, logs, and user-uploaded files. These files often have different lifecycles, access patterns, and security requirements:
- Some files are project-specific and should only be accessible within that project's context
- Some files need to be shared across routines or people within an organization
- Some files are ephemeral—temporary working data that doesn't need long-term persistence
Without an abstraction layer, you would need to:
- Manage multiple storage connections manually
- Handle different authentication mechanisms for each storage backend
- Manually organize files across disparate systems
- Write different code for each storage type
The Solution
The MetaFileSystem eliminates this complexity by providing:
Benefit | What It Means |
|---|---|
Logical Separation | Different storage contexts (protocols) for different use cases |
Consistent API | The same file operations work across all storage areas |
Security Isolation | Each storage area can have independent access controls |
Path-Based Routing | Simply use a protocol prefix to target the right storage area |
Backend Abstraction | The underlying storage technology can change without affecting your workflows |
Core Concepts
Physical Storage Architecture
The MetaFileSystem abstracts over cloud-based blob storage (such as Azure Blob Storage). This has several important implications:
Aspect | What This Means |
|---|---|
Network Access | File operations involve network calls to cloud storage—they're not free local reads like reading from your laptop's hard drive |
Latency Considerations | Reading and writing files has network latency. For performance-critical operations, minimize the number of file operations |
No Traditional Folders | Blob storage doesn't have true folders. The MetaFileSystem simulates folder structures using path prefixes in file names |
Scalability | Cloud storage scales automatically—you don't need to worry about running out of disk space |
Storage Protocols and File Stores
The MetaFileSystem organizes storage into distinct file stores, each identified by a protocol prefix. When you construct a file path, you specify which store to use by prefixing the path with the protocol:
{protocol}://{path/to/file}
Available File Stores
Protocol | Store Type | Purpose | Typical Contents |
|---|---|---|---|
routine:// | Routine Store | Routine execution data and artifacts | Run outputs, model artifacts, execution results |
shared:// | Shared Store | Cross-routine resources accessible to multiple workflows | Reference data, shared reports, common configurations |
framework:// | Framework Store | System-wide resources provided by Xperiflow | Templates, default configurations, system libraries |
ephemeral:// | Ephemeral Store | Temporary storage for intermediate processing | Cache data, working files, intermediate calculations |
project-[id]:// | Project Store | Project-specific isolated storage | Project configs, analysis results, project datasets |
Example Paths
Path | What It Accesses |
|---|---|
routine://data/cluster_results.parquet | A data file in routine storage |
shared://reports/quarterly_summary.xlsx | A shared report accessible to multiple routines |
framework://templates/default_config.json | A system template provided by Xperiflow |
project-42://analysis/customer_segments.csv | A file specific to Project 42 |
Key Insight: You don't need to know the physical storage location. The protocol tells the system where to look, and the MetaFileSystem handles the rest.
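To make the routing concrete, here is a hedged sketch reusing the `fs` instance from the Overview (paths are illustrative): the code is identical no matter which store the protocol prefix targets.

```python
# Same call pattern, different stores -- only the prefix changes.
for path in (
    "routine://data/cluster_results.parquet",
    "shared://reports/quarterly_summary.xlsx",
    "framework://templates/default_config.json",
    "project-42://analysis/customer_segments.csv",
):
    with fs.open(path, "rb") as f:   # routed by protocol prefix
        head = f.read(16)            # read a few bytes from each store
    print(path, len(head))
```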
Available Storage Types
The MetaFileSystem is designed to handle the diverse file types generated by data science and analytics workflows. Here's what you can store:
File Category | Examples | Typical Extensions |
|---|---|---|
Data Files | Datasets, tables, query results | .parquet, .csv, .json |
Model Artifacts | Trained ML models, model weights | .pkl, .joblib, .onnx |
Configuration | Settings, parameters, mappings | .json, .yaml, .toml |
Reports & Outputs | Generated reports, visualizations | .xlsx, .html, .png |
Logs & Diagnostics | Execution logs, error traces | .log, .txt |
Reference Data | Lookup tables, master data | .parquet, .csv |
Intermediate Results | Temporary processing outputs | .pkl, .parquet |
Important: The MetaFileSystem is optimized for structured and semi-structured data files typical in analytics workflows. It's not currently intended for storing application binaries, media files, or other non-workflow content.
Common Operations
The MetaFileSystem supports standard operations through a consistent interface:
File Operations
Operation | Description | Example Use |
|---|---|---|
Read | Retrieve file contents | Load a dataset for analysis |
Write | Create or update a file | Save model results |
List | View directory contents | Browse available files |
Delete | Remove a file | Clean up temporary data |
Copy | Duplicate a file | Create backup before processing |
Info | Get file metadata | Check file size and modification date |
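In fsspec terms, these operations map onto the standard AbstractFileSystem methods. A sketch, assuming the `fs` instance from the Overview and illustrative paths:

```python
import json

# Write: create or update a file.
fs.pipe("routine://data/metrics.json", json.dumps({"k": 5}).encode())

# Read: retrieve file contents as bytes.
raw = fs.cat("routine://data/metrics.json")

# List: view directory contents.
print(fs.ls("routine://data/"))

# Info: get file metadata (size, timestamps).
print(fs.info("routine://data/metrics.json"))

# Copy: duplicate a file before processing.
fs.copy("routine://data/metrics.json", "routine://backup/metrics.json")

# Delete: clean up temporary data.
fs.rm("routine://data/metrics.json")
```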
Directory Operations
Directories are not explicitly supported in the current MetaFileSystem implementation. Directories are implicit: a directory exists only if it contains at least one file, and deleting a directory simply removes all files under its prefix.
Operation | Description |
|---|---|
Remove Directory | Remove all files under a directory prefix |
Walk | Recursively traverse directory trees |
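Walking an implicit directory tree works as with any fsspec filesystem: `walk` yields `(root, dirs, files)` tuples, much like `os.walk`. A sketch with an illustrative prefix:

```python
# Traverse everything under the routine store's instances prefix.
for root, dirs, files in fs.walk("routine://instances/"):
    for name in files:
        print(f"{root}/{name}")  # directories exist only via these files
```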
Signed URLs
For scenarios where direct file access is needed (e.g., downloads in web applications), the MetaFileSystem can generate signed URLs:
Request:  Generate a temporary access URL
Path:     project-42://reports/customer_segments.xlsx
Response: https://storage.azure.com/container/path/file.xlsx?sig=xxxxx&exp=1234567890
The response is a direct-access URL; the sig and exp query parameters carry the time-limited signature.
Use Cases:
- Allowing users to download files directly
- Embedding files in reports or dashboards
- Sharing files temporarily with external systems
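fsspec defines a `sign(path, expiration=...)` method for exactly this purpose. Whether MetaFileSystem exposes it under that name, and the units of `expiration`, are assumptions in this sketch:

```python
# Assumed: MetaFileSystem implements fsspec's sign(); expiration in
# seconds is an assumption here.
url = fs.sign("project-42://reports/customer_segments.xlsx", expiration=3600)
print(url)  # direct-access URL with a time-limited signature
```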
File Attributes
Files can carry custom attributes—key-value pairs that describe the file beyond standard metadata:
File: routine://instances/abc123/runs/run-001/cluster_model.pkl
───────────────────────────────────────────────────────────────
Standard Metadata:
• Name: cluster_model.pkl
• Size: 15.2 MB
• Created: 2024-12-15 09:30:00
• Modified: 2024-12-15 09:30:00
• Version: 1
Custom Attributes:
• model_type: "kmeans"
• num_clusters: 5
• silhouette_score: 0.72
• training_date: "2024-12-15"
Attributes enable:
- Workflow context: Track processing state
- Integration data: Store external system references
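fsspec has no standard API for custom attributes, so the `metadata=` keyword below is a hypothetical MetaFileSystem extension, shown only to illustrate how attributes might be attached at write time:

```python
import pickle

model_bytes = pickle.dumps({"centroids": [[0.1, 0.2], [0.9, 0.8]]})  # stand-in model

# Hypothetical keyword -- not part of the fsspec standard.
fs.pipe(
    "routine://instances/abc123/runs/run-001/cluster_model.pkl",
    model_bytes,
    metadata={"model_type": "kmeans", "num_clusters": 5, "silhouette_score": 0.72},
)
```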
Access Controls
Access controls are currently governed at the protocol level: there are no per-file or per-directory permissions that can be set per user or group. Finer-grained permissions are a planned enhancement for the MetaFileSystem.
Where the MetaFileSystem is Used
Routine Execution
When routines run, the MetaFileSystem manages all file storage automatically.
Routine Storage Structure
Within routine storage, files are organized hierarchically:
routine://
└── instances/
└── {routine_instance_id}/
├── shared/ ← Instance-level shared files
│ └── {filename}.{ext} (persists across runs)
└── runs/
└── {routine_run_id}/ ← Run-specific data
└── artifacts/
└── {artifact}.{ext}
Storage Scopes Within Routines
Scope | Access | Lifetime | Use Case |
|---|---|---|---|
Routine Instance | Read-only | Persistent | Access artifacts from previous runs |
Shared Instance | Read/Write | Persistent | State shared across runs |
Shared Run | Read/Write | Per-run | Files shared within a single run |
Fileshare | Read/Write | Global | Cross-routine accessible storage |
Common Scenarios
Scenario 1: Accessing Routine Artifacts
After running KMeans clustering, you want to access the results:
- Artifacts stored at: routine://instances/[instance_id]/runs/[run_id]/artifacts/
- Access cluster assignments: System retrieves from optimized storage
- Fast metadata lookup: Know file size and creation time instantly
- Load data: Actual content retrieved only when needed
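A sketch of this flow (placeholder IDs, `fs` as before): metadata comes back from a cheap `info` call, and content is transferred only when the file is actually opened.

```python
path = "routine://instances/inst-001/runs/run-001/artifacts/cluster_assignments.parquet"

meta = fs.info(path)   # fast metadata lookup: size, timestamps
print(meta["size"])

with fs.open(path, "rb") as f:   # content transferred only here
    assignments = f.read()
```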
Scenario 2: Sharing Data Across Routines
Your ML Classification routine needs reference data from a previous analysis:
- Store reference data: shared://reference/customer_categories.parquet
- Multiple routines access: Both KMeans and Classification can read
- Single source of truth: Updates visible to all consumers
- No duplication: Data stored once, accessed many times
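Because the shared store is just another protocol, any consumer can read the reference file with ordinary tooling. A sketch using pandas, which accepts a file-like object:

```python
import pandas as pd

# Both the KMeans and the Classification routine can run this same code.
with fs.open("shared://reference/customer_categories.parquet", "rb") as f:
    categories = pd.read_parquet(f)
```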
Scenario 3: Project-Specific Storage
Different projects need isolated storage:
- Project A files: project-1://analysis/results.parquet
- Project B files: project-2://analysis/results.parquet
- Complete isolation: Projects can't accidentally access each other's data
- Same interface: Use identical code patterns across projects
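Project isolation means the code pattern is identical and only the protocol prefix changes, as in this sketch with a hypothetical helper:

```python
def load_results(project_id: int) -> bytes:
    """Hypothetical helper: same pattern, different project store."""
    with fs.open(f"project-{project_id}://analysis/results.parquet", "rb") as f:
        return f.read()

results_a = load_results(1)  # Project A's store
results_b = load_results(2)  # Project B's store; A's files are unreachable here
```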
Best Practices
1. Choose the Right Storage Area
If you need to... | Use this protocol |
|---|---|
Store routine-specific outputs | routine:// |
Share data across routines | shared:// |
Store temporary/cache files | ephemeral:// |
Access system templates | framework:// |
Store project-specific data | project-[id]:// |
2. Organize Files Logically
Good Structure:
routine://instances/[id]/runs/[run_id]/
├── artifacts/
│ ├── cluster_assignments.parquet
│ └── clustering_metrics.json
└── logs/
└── execution.log
Avoid:
routine://
├── file1.csv
├── file2.csv
├── model.pkl
├── output.xlsx
└── temp.json
3. Consider the Cost of Repeated Reads and Writes
Because the MetaFileSystem is backed by distributed network storage, reads and writes are not as fast or as "free" as local filesystem operations: every call makes a network round-trip to retrieve or send data. Keep this in mind to avoid unnecessary reads and writes.
- Batch operations when possible to reduce network round-trips
- Be mindful of file sizes—large files take longer to transfer
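For example, read a file once and reuse the in-memory copy rather than re-fetching it inside a loop (a sketch, path illustrative):

```python
# One network round-trip up front...
raw = fs.cat("shared://reference/customer_categories.parquet")

# ...then reuse the in-memory bytes, instead of calling fs.cat()
# again on every iteration.
for k in range(3, 8):
    print(k, len(raw))  # stand-in for real per-iteration work on `raw`
```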
4. Use Attributes for Context
Instead of encoding information in file names:
❌ model_kmeans_k5_score72_2024-12-15.pkl
Use attributes:
✅ cluster_model.pkl
Attributes: {model_type: "kmeans", k: 5, score: 0.72, date: "2024-12-15"}
Troubleshooting
File Not Found
Symptom: Attempting to access a file returns "File not found"
Possible Causes:
- Incorrect protocol prefix
- File path typo
- File was deleted or moved
- Accessing a project file without proper project context
Resolution: Verify the complete path including protocol prefix
File Could Not Be Deleted
Symptom: Attempting to delete a file returns "Deleting a system-generated file is not permitted"
Possible Causes:
- Accessing unintended data
- Attempting to free up space by deleting a protected, system-generated file
- Misinterpreting when something is system generated versus externally generated
Resolution: There is no workaround; this is intended behavior. If you are certain the file is externally generated, report it as a bug.
Summary
The MetaFileSystem is the intelligent storage foundation of Xperiflow:
Feature | What It Provides |
|---|---|
Protocol-based routing | Simple, intuitive file access |
Multiple storage areas | Organized, purpose-driven storage |
Signed URLs | Secure external access |
Custom attributes | Rich file metadata |
Backend abstraction | Optimized storage without complexity |
By abstracting storage complexity, the MetaFileSystem lets you focus on your data science workflows while ensuring your files are safely stored, easily accessible, and properly organized.