
MetaFileSystem

Author: Chris Bahr, Created: 2026-03-31

Summary: The MetaFileSystem is XperiFlow’s unified storage layer for file data. It provides one consistent way to store and retrieve files across multiple logical stores (framework, project, shared, routine, ephemeral) by routing each request to the right store based on the path’s protocol prefix. It supports the usual file system operations and is designed to handle large volumes of data.


Overview

When working with data science workflows, you need reliable file storage that can handle everything from small configuration files to large datasets. The MetaFileSystem solves this by providing a single, consistent interface regardless of where your files are physically stored.

The MetaFileSystem provides:

| Feature | Description |
| --- | --- |
| Unified Access | One interface to work with files across all storage locations |
| Protocol-Based Routing | Intuitive path prefixes that route to the correct storage |
| Implements the fsspec AbstractFileSystem | The MetaFileSystem can be used anywhere an fsspec filesystem is expected |

Think of the MetaFileSystem like a smart filing cabinet that knows exactly where everything is stored, even if the actual documents are spread across different rooms in the building.


Why the MetaFileSystem Exists

The Problem

Modern data science and ML workflows generate significant amounts of data: trained models, intermediate datasets, configuration files, artifacts, logs, and user-uploaded files. These files often have different lifecycles, access patterns, and security requirements:

  • Some files are project-specific and should only be accessible within that project's context
  • Some files need to be shared across routines or people within an organization
  • Some files are ephemeral—temporary working data that doesn't need long-term persistence

Without an abstraction layer, you would need to:

  • Manage multiple storage connections manually
  • Handle different authentication mechanisms for each storage backend
  • Manually organize files across disparate systems
  • Write different code for each storage type

The Solution

The MetaFileSystem eliminates this complexity by providing:

| Benefit | What It Means |
| --- | --- |
| Logical Separation | Different storage contexts (protocols) for different use cases |
| Consistent API | The same file operations work across all storage areas |
| Security Isolation | Each storage area can have independent access controls |
| Path-Based Routing | Simply use a protocol prefix to target the right storage area |
| Backend Abstraction | The underlying storage technology can change without affecting your workflows |


Core Concepts

Physical Storage Architecture

The MetaFileSystem abstracts over cloud-based blob storage (such as Azure Blob Storage). This has several important implications:

| Aspect | What This Means |
| --- | --- |
| Network Access | File operations involve network calls to cloud storage; they are not free local reads like reading from your laptop's hard drive |
| Latency Considerations | Reading and writing files has network latency. For performance-critical operations, minimize the number of file operations |
| No Traditional Folders | Blob storage doesn't have true folders. The MetaFileSystem simulates folder structures using path prefixes in file names |
| Scalability | Cloud storage scales automatically, so you don't need to worry about running out of disk space |
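To make the "No Traditional Folders" row concrete, here is a minimal sketch of how a folder listing can be inferred from flat blob names. The function name `list_dir` and the sample file `state.json` are hypothetical; this illustrates the general prefix technique, not the MetaFileSystem's actual implementation.

```python
def list_dir(blob_names: list[str], prefix: str) -> list[str]:
    """Return the immediate children of `prefix`, inferred from flat blob names."""
    # Normalize so "a/b" and "a/b/" behave the same; "" means the store root.
    prefix = prefix.rstrip("/") + "/" if prefix else ""
    children = set()
    for name in blob_names:
        if not name.startswith(prefix):
            continue
        first, sep, _rest = name[len(prefix):].partition("/")
        # A trailing slash marks an inferred "folder" rather than a file.
        children.add(first + "/" if sep else first)
    return sorted(children)


# Flat names as a blob store would hold them (sample data for illustration).
blobs = [
    "instances/abc123/runs/run-001/artifacts/cluster_model.pkl",
    "instances/abc123/runs/run-001/logs/execution.log",
    "instances/abc123/shared/state.json",
]
top_level = list_dir(blobs, "instances/abc123")
```

Here `top_level` contains two inferred folders, `runs/` and `shared/`, even though no folder object was ever created.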

Storage Protocols and File Stores

The MetaFileSystem organizes storage into distinct file stores, each identified by a protocol prefix. When you construct a file path, you specify which store to use by prefixing the path with the protocol:

{protocol}://{path/to/file}

Available File Stores

| Protocol | Store Type | Purpose | Typical Contents |
| --- | --- | --- | --- |
| routine:// | Routine Store | Routine execution data and artifacts | Run outputs, model artifacts, execution results |
| shared:// | Shared Store | Cross-routine resources accessible to multiple workflows | Reference data, shared reports, common configurations |
| framework:// | Framework Store | System-wide resources provided by Xperiflow | Templates, default configurations, system libraries |
| ephemeral:// | Ephemeral Store | Temporary storage for intermediate processing | Cache data, working files, intermediate calculations |
| project-[id]:// | Project Store | Project-specific isolated storage | Project configs, analysis results, project datasets |
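As a rough illustration of the `{protocol}://{path/to/file}` format, the sketch below splits a path into its protocol and store-relative parts and rejects unknown protocols. The helper name `split_store_path` is hypothetical, and the assumption that a project ID is alphanumeric is ours; the MetaFileSystem's real validation rules may differ.

```python
import re

# The four fixed protocol names listed in this section.
FIXED_PROTOCOLS = {"routine", "shared", "framework", "ephemeral"}
# Assumption: project IDs are alphanumeric, as in "project-42".
PROJECT_PROTOCOL = re.compile(r"^project-[A-Za-z0-9]+$")


def split_store_path(path: str) -> tuple[str, str]:
    """Split '{protocol}://{path/to/file}' into (protocol, relative_path)."""
    protocol, sep, relative = path.partition("://")
    if not sep or not protocol:
        raise ValueError(f"missing protocol prefix: {path!r}")
    if protocol not in FIXED_PROTOCOLS and not PROJECT_PROTOCOL.match(protocol):
        raise ValueError(f"unknown protocol: {protocol!r}")
    return protocol, relative


protocol, relative = split_store_path("project-42://analysis/customer_segments.csv")
```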

Example Paths

| Path | What It Accesses |
| --- | --- |
| routine://data/cluster_results.parquet | A data file in routine storage |
| shared://reports/quarterly_summary.xlsx | A shared report accessible to multiple routines |
| framework://templates/default_config.json | A system template provided by Xperiflow |
| project-42://analysis/customer_segments.csv | A file specific to Project 42 |

Key Insight: You don't need to know the physical storage location. The protocol tells the system where to look, and the MetaFileSystem handles the rest.
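The routing idea can be sketched in a few lines. In this toy model, plain dictionaries stand in for the real storage backends and the class name `StoreRouter` is hypothetical; it only shows how a protocol prefix can select an independent store, not how the MetaFileSystem is built.

```python
class StoreRouter:
    """Dispatch each path to a per-protocol store (dicts stand in for backends)."""

    def __init__(self) -> None:
        self._stores: dict[str, dict[str, bytes]] = {}

    def _resolve(self, path: str) -> tuple[dict[str, bytes], str]:
        protocol, sep, key = path.partition("://")
        if not sep:
            raise ValueError(f"missing protocol prefix: {path!r}")
        # One independent store per protocol, created lazily for this demo.
        return self._stores.setdefault(protocol, {}), key

    def write(self, path: str, data: bytes) -> None:
        store, key = self._resolve(path)
        store[key] = data

    def read(self, path: str) -> bytes:
        store, key = self._resolve(path)
        try:
            return store[key]
        except KeyError:
            raise FileNotFoundError(path) from None


router = StoreRouter()
router.write("shared://reports/quarterly_summary.xlsx", b"shared bytes")
router.write("project-42://reports/quarterly_summary.xlsx", b"project bytes")
```

The same relative path resolves to different content depending on the protocol, because each protocol has its own store.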


Available Storage Types

The MetaFileSystem is designed to handle the diverse file types generated by data science and analytics workflows. Here's what you can store:

| File Category | Examples | Typical Extensions |
| --- | --- | --- |
| Data Files | Datasets, tables, query results | .parquet, .csv, .xlsx, .json |
| Model Artifacts | Trained ML models, model weights | .pkl, .joblib, .onnx |
| Configuration | Settings, parameters, mappings | .json, .yaml, .xml |
| Reports & Outputs | Generated reports, visualizations | .xlsx, .pdf, .html |
| Logs & Diagnostics | Execution logs, error traces | .log, .txt |
| Reference Data | Lookup tables, master data | .parquet, .csv |
| Intermediate Results | Temporary processing outputs | .pkl, .parquet |

Important: The MetaFileSystem is optimized for structured and semi-structured data files typical in analytics workflows. It's not currently intended for storing application binaries, media files, or other non-workflow content.

Common Operations

The MetaFileSystem supports standard operations through a consistent interface:

File Operations

| Operation | Description | Example Use |
| --- | --- | --- |
| Read | Retrieve file contents | Load a dataset for analysis |
| Write | Create or update a file | Save model results |
| List | View directory contents | Browse available files |
| Delete | Remove a file | Clean up temporary data |
| Copy | Duplicate a file | Create backup before processing |
| Info | Get file metadata | Check file size and modification date |
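The sketch below walks through these six operations against a tiny in-memory stand-in. The class `InMemoryStore` is hypothetical. Its method names (`pipe_file`, `cat_file`, `ls`, `info`, `cp_file`, `rm_file`) mirror methods defined on fsspec's AbstractFileSystem, on the assumption that the MetaFileSystem exposes the same names because it implements that interface. The behavior here is simplified, for example `ls` returns every file under a prefix rather than only immediate children.

```python
from datetime import datetime, timezone


class InMemoryStore:
    """Hypothetical stand-in; method names mirror fsspec's AbstractFileSystem."""

    def __init__(self) -> None:
        self._files: dict[str, tuple[bytes, datetime]] = {}

    def _entry(self, path: str) -> tuple[bytes, datetime]:
        try:
            return self._files[path]
        except KeyError:
            raise FileNotFoundError(path) from None

    def pipe_file(self, path: str, data: bytes) -> None:  # Write
        self._files[path] = (data, datetime.now(timezone.utc))

    def cat_file(self, path: str) -> bytes:  # Read
        return self._entry(path)[0]

    def ls(self, prefix: str) -> list[str]:  # List (all files under prefix)
        prefix = prefix.rstrip("/") + "/"
        return sorted(p for p in self._files if p.startswith(prefix))

    def info(self, path: str) -> dict:  # Info
        data, modified = self._entry(path)
        return {"name": path, "size": len(data), "modified": modified}

    def cp_file(self, src: str, dst: str) -> None:  # Copy
        self.pipe_file(dst, self.cat_file(src))

    def rm_file(self, path: str) -> None:  # Delete
        self._entry(path)  # raises FileNotFoundError if the file is missing
        del self._files[path]


store = InMemoryStore()
original = "routine://data/cluster_results.parquet"
backup = "routine://data/cluster_results.backup.parquet"
store.pipe_file(original, b"rows")   # save results
store.cp_file(original, backup)      # back up before processing
size = store.info(original)["size"]  # check metadata without reading content
store.rm_file(original)              # clean up
```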

Directory Operations

Directories are not explicitly supported in the current MetaFileSystem implementation. They are implicit: a directory exists only while it contains at least one file, and deleting a directory removes all items within it.

| Operation | Description |
| --- | --- |
| Remove Directory | Delete a directory by removing all items within it |
| Walk | Recursively traverse directory trees |
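A minimal sketch of these implicit-directory semantics, using a set of flat file names as a stand-in for the store. The helper names `dir_exists` and `remove_dir` and the sample file names are hypothetical.

```python
def dir_exists(blob_names: set[str], directory: str) -> bool:
    """A directory exists only while some file name starts with its prefix."""
    prefix = directory.rstrip("/") + "/"
    return any(name.startswith(prefix) for name in blob_names)


def remove_dir(blob_names: set[str], directory: str) -> int:
    """Delete every file under `directory` in place; return how many were removed."""
    prefix = directory.rstrip("/") + "/"
    doomed = {name for name in blob_names if name.startswith(prefix)}
    blob_names.difference_update(doomed)
    return len(doomed)


files = {
    "runs/run-001/artifacts/a.parquet",
    "runs/run-001/logs/execution.log",
    "runs/run-002/artifacts/b.parquet",
}
removed = remove_dir(files, "runs/run-001")
```

After the call, `runs/run-001` no longer exists because nothing is left under its prefix, while `runs/run-002` is untouched.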


Signed URLs

For scenarios where direct file access is needed (e.g., downloads in web applications), the MetaFileSystem can generate signed URLs:

Request: Generate temporary access URL
Path: project-42://reports/customer_segments.xlsx

Response: https://storage.azure.com/container/path/file.xlsx?sig=xxxxx&exp=1234567890
          ↑                                                    ↑
          Direct access URL                                    Time-limited signature

Use Cases:

  • Allowing users to download files directly
  • Embedding files in reports or dashboards
  • Sharing files temporarily with external systems
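To show why a signed URL can be time-limited and tamper-evident, here is a generic HMAC-based sketch. It is not the scheme used by Azure Blob Storage or by the MetaFileSystem; the function names, the secret, the `exp`/`sig` parameter layout, and the `storage.example.com` host are all assumptions for illustration. fsspec's AbstractFileSystem does define a `sign(path, expiration=...)` method, but whether the MetaFileSystem exposes signed URLs through it is not stated here.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlsplit


def _signature(base_url: str, expires_at: int, secret: bytes) -> str:
    message = f"{base_url}\n{expires_at}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_url(base_url: str, secret: bytes, expires_at: int) -> str:
    """Append an expiry timestamp and an HMAC-SHA256 signature to a URL."""
    query = urlencode({"exp": expires_at, "sig": _signature(base_url, expires_at, secret)})
    return f"{base_url}?{query}"


def verify_url(url: str, secret: bytes, now: int) -> bool:
    """Accept the URL only if the signature matches and it has not expired."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    try:
        expires_at = int(query["exp"][0])
        sig = query["sig"][0]
    except (KeyError, ValueError):
        return False
    base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"
    expected = _signature(base_url, expires_at, secret)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(sig, expected) and now <= expires_at


SECRET = b"demo-secret"  # illustrative only; never hard-code real secrets
url = sign_url(
    "https://storage.example.com/container/reports/customer_segments.xlsx",
    SECRET,
    expires_at=1_700_000_000,
)
```

Because the path and expiry are both covered by the signature, changing either one invalidates the URL.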

File Attributes

Files can carry custom attributes—key-value pairs that describe the file beyond standard metadata:

File: routine://instances/abc123/runs/run-001/cluster_model.pkl
───────────────────────────────────────────────────────────────

Standard Metadata:
  • Name: cluster_model.pkl
  • Size: 15.2 MB
  • Created: 2024-12-15 09:30:00
  • Modified: 2024-12-15 09:30:00
  • Version: 1

Custom Attributes:
  • model_type: "kmeans"
  • num_clusters: 5
  • silhouette_score: 0.72
  • training_date: "2024-12-15"

Attributes enable:

  • Workflow context: Track processing state
  • Integration data: Store external system references
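One way to picture the split between standard metadata and custom attributes is a small record type. The `FileRecord` class and the `processing_state` and `external_ticket` keys below are hypothetical and only model the idea; they are not the MetaFileSystem's attribute API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FileRecord:
    """Hypothetical record pairing standard metadata with custom attributes."""

    name: str
    size_bytes: int
    version: int = 1
    attributes: dict[str, Any] = field(default_factory=dict)


record = FileRecord(
    name="cluster_model.pkl",
    size_bytes=15_200_000,  # roughly 15.2 MB
    attributes={"model_type": "kmeans", "num_clusters": 5, "silhouette_score": 0.72},
)

# Workflow context: track processing state alongside the file.
record.attributes["processing_state"] = "validated"
# Integration data: keep a reference to an external system (made-up value).
record.attributes["external_ticket"] = "TICKET-0001"
```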

Access Controls

Access controls are currently governed at the protocol level. Per-file or per-directory permissions for individual users or groups are not yet supported; this is a future enhancement for the MetaFileSystem.


Where the MetaFileSystem is Used

Routine Execution

When routines run, the MetaFileSystem manages all file storage automatically.

Routine Storage Structure

Within routine storage, files are organized hierarchically:

routine://
└── instances/
    └── {routine_instance_id}/
        ├── shared/                    ← Instance-level shared files
        │   └── {filename}.{ext}         (persists across runs)
        └── runs/
            └── {routine_run_id}/      ← Run-specific data
                └── artifacts/
                    └── {artifact}.{ext}
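The layout above can be captured in a pair of small path builders. The function names are hypothetical, and the rule that a path segment may not contain a slash is an assumption made to keep the sketch safe.

```python
def _segment(value: str) -> str:
    """Reject empty segments and embedded slashes so the layout stays intact."""
    if not value or "/" in value:
        raise ValueError(f"invalid path segment: {value!r}")
    return value


def instance_shared_path(instance_id: str, filename: str) -> str:
    """Instance-level shared file (persists across runs)."""
    return f"routine://instances/{_segment(instance_id)}/shared/{_segment(filename)}"


def run_artifact_path(instance_id: str, run_id: str, artifact: str) -> str:
    """Artifact produced by one specific run."""
    return (
        f"routine://instances/{_segment(instance_id)}"
        f"/runs/{_segment(run_id)}/artifacts/{_segment(artifact)}"
    )


artifact = run_artifact_path("abc123", "run-001", "cluster_model.pkl")
```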

Storage Scopes Within Routines

| Scope | Access | Lifetime | Use Case |
| --- | --- | --- | --- |
| Routine Instance | Read-only | Persistent | Access artifacts from previous runs |
| Shared Instance | Read/Write | Persistent | State shared across runs |
| Shared Run | Read/Write | Per-run | Files shared within a single run |
| Fileshare | Read/Write | Global | Cross-routine accessible storage |


Common Scenarios

Scenario 1: Accessing Routine Artifacts

After running KMeans clustering, you want to access the results:

  • Artifacts stored at: routine://instances/[instance_id]/runs/[run_id]/artifacts/
  • Access cluster assignments: System retrieves from optimized storage
  • Fast metadata lookup: Know file size and creation time instantly
  • Load data: Actual content retrieved only when needed

Scenario 2: Sharing Data Across Routines

Your ML Classification routine needs reference data from a previous analysis:

  • Store reference data: shared://reference/customer_categories.parquet
  • Multiple routines access: Both KMeans and Classification can read
  • Single source of truth: Updates visible to all consumers
  • No duplication: Data stored once, accessed many times

Scenario 3: Project-Specific Storage

Different projects need isolated storage:

  • Project A files: project-1://analysis/results.parquet
  • Project B files: project-2://analysis/results.parquet
  • Complete isolation: Projects can't accidentally access each other's data
  • Same interface: Use identical code patterns across projects

Best Practices

1. Choose the Right Storage Area

| If you need to... | Use this protocol |
| --- | --- |
| Store routine-specific outputs | routine:// |
| Share data across routines | shared:// |
| Store temporary/cache files | ephemeral:// |
| Access system templates | framework:// |
| Store project-specific data | project-[id]:// |
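This mapping can be encoded so that calling code states its intent rather than hard-coding a prefix. The `Purpose` enum and `build_path` helper are hypothetical names used only for illustration.

```python
from enum import Enum
from typing import Optional


class Purpose(Enum):
    """Intent-to-protocol mapping taken from the table above."""

    ROUTINE_OUTPUT = "routine"
    CROSS_ROUTINE = "shared"
    TEMPORARY = "ephemeral"
    SYSTEM_TEMPLATE = "framework"
    PROJECT_DATA = "project"


def build_path(purpose: Purpose, relative: str, project_id: Optional[int] = None) -> str:
    """Build a full store path for the given purpose."""
    if purpose is Purpose.PROJECT_DATA:
        if project_id is None:
            raise ValueError("project_id is required for project storage")
        protocol = f"project-{project_id}"
    else:
        protocol = purpose.value
    return f"{protocol}://{relative.lstrip('/')}"


cache_path = build_path(Purpose.TEMPORARY, "cache/features.parquet")
project_path = build_path(Purpose.PROJECT_DATA, "analysis/results.parquet", project_id=1)
```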

2. Organize Files Logically

Good Structure:
routine://instances/[id]/runs/[run_id]/
├── artifacts/
│   ├── cluster_assignments.parquet
│   └── clustering_metrics.json
└── logs/
    └── execution.log

Avoid:
routine://
├── file1.csv
├── file2.csv
├── model.pkl
├── output.xlsx
└── temp.json

3. Consider the Cost of Repeatedly Reading and Writing the Same Files

Because the MetaFileSystem is backed by distributed network storage, a read is not as fast or “free” as a read from a local filesystem: every call travels over the network to send or retrieve data. Keep this in mind and avoid unnecessary reads and writes.

  • Batch operations when possible to reduce network round-trips
  • Be mindful of file sizes—large files take longer to transfer
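One common way to avoid repeated round-trips is a read-through cache kept for the duration of a job. The sketch below is generic: `ReadThroughCache` is a hypothetical helper, and the `fetch` callable stands in for whatever function actually reads from remote storage.

```python
from typing import Callable


class ReadThroughCache:
    """Fetch each path from remote storage at most once per cache lifetime."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        self._cache: dict[str, bytes] = {}
        self.remote_reads = 0  # counts actual network round-trips

    def read(self, path: str) -> bytes:
        if path not in self._cache:
            self.remote_reads += 1
            self._cache[path] = self._fetch(path)
        return self._cache[path]

    def invalidate(self, path: str) -> None:
        """Drop a cached entry, for example after writing a new version."""
        self._cache.pop(path, None)


# A dict stands in for the remote store in this demo.
fake_remote = {"shared://reference/customer_categories.parquet": b"categories"}
cache = ReadThroughCache(fake_remote.__getitem__)

for _ in range(3):
    data = cache.read("shared://reference/customer_categories.parquet")
```

Three reads cost one remote call. The trade-off is staleness: if another routine updates the file, the cached copy is not refreshed until `invalidate` is called.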

4. Use Attributes for Context

Instead of encoding information in file names:

❌ model_kmeans_k5_score72_2024-12-15.pkl

Use attributes:

✅ cluster_model.pkl
   Attributes: {model_type: "kmeans", k: 5, score: 0.72, date: "2024-12-15"}

Troubleshooting

File Not Found

Symptom: Attempting to access a file returns "File not found"

Possible Causes:

  • Incorrect protocol prefix
  • File path typo
  • File was deleted or moved
  • Accessing a project file without proper project context

Resolution: Verify the complete path including protocol prefix
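A quick pre-check can catch the first two causes before any request is made. This sketch uses Python's standard `difflib` to suggest a likely protocol when the prefix is misspelled; `check_path` is a hypothetical helper, and the rule that any non-empty suffix after `project-` is acceptable is an assumption.

```python
import difflib

KNOWN_PROTOCOLS = ["routine", "shared", "framework", "ephemeral"]


def check_path(path: str) -> str:
    """Return 'ok' or a short hint describing what looks wrong with the path."""
    protocol, sep, _relative = path.partition("://")
    if not sep:
        return "missing protocol prefix (expected '{protocol}://{path/to/file}')"
    if protocol in KNOWN_PROTOCOLS:
        return "ok"
    # Assumption: any non-empty ID after "project-" is acceptable.
    if protocol.startswith("project-") and len(protocol) > len("project-"):
        return "ok"
    guess = difflib.get_close_matches(protocol, KNOWN_PROTOCOLS, n=1)
    if guess:
        return f"unknown protocol {protocol!r}; did you mean {guess[0]!r}?"
    return f"unknown protocol {protocol!r}"


hint = check_path("shard://reports/quarterly_summary.xlsx")
```

This only validates the shape of the path. It cannot tell whether the file was deleted or moved, or whether the project context is correct.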

This file could not be deleted. Deleting a system-generated file is not permitted.

Symptom: Attempting to delete a file that was generated by the application rather than by a user returns this message.

Possible Causes:

  • The path points to a different file than the one you intended to delete
  • Trying to free up space by deleting a file that is protected from deletion
  • Mistaking a system-generated file for an externally generated one

Resolution: There is no workaround; this is the intended behavior of the application. If you are certain that the file was externally generated, report the issue as a bug.


Summary

The MetaFileSystem is the intelligent storage foundation of Xperiflow:

| Feature | What It Provides |
| --- | --- |
| Protocol-based routing | Simple, intuitive file access |
| Multiple storage areas | Organized, purpose-driven storage |
| Signed URLs | Secure external access |
| Custom attributes | Rich file metadata |
| Backend abstraction | Optimized storage without complexity |

By abstracting storage complexity, the MetaFileSystem lets you focus on your data science workflows while ensuring your files are safely stored, easily accessible, and properly organized.
