Interface IExtractor<T>

Defines the contract for extracting data from various sources in ETL (Extract, Transform, Load) operations.

Namespace: Workspace.XBR.Xperiflow.Etl

Assembly: Xperiflow.dll

Declaration
public interface IExtractor<T>

Examples

Database extractor implementation:

using System.Data;
using System.Data.SqlClient; // or Microsoft.Data.SqlClient, depending on the project
using Workspace.XBR.Xperiflow.Etl;

public class SqlDataTableExtractor : IExtractor<DataTable>
{
    private readonly string _connectionString;
    private readonly string _query;

    public SqlDataTableExtractor(string connectionString, string query)
    {
        _connectionString = connectionString;
        _query = query;
    }

    // Runs the query and returns the full result set as a DataTable.
    public DataTable Extract()
    {
        using var connection = new SqlConnection(_connectionString);
        using var adapter = new SqlDataAdapter(_query, connection);
        var dataTable = new DataTable();
        adapter.Fill(dataTable); // Fill opens and closes the connection itself
        return dataTable;
    }
}

// Usage
var extractor = new SqlDataTableExtractor(connectionString, "SELECT * FROM Sales");
var data = extractor.Extract();

JSON file extractor implementation:

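using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;
using Workspace.XBR.Xperiflow.Etl;

public class JsonFileExtractor<T> : IExtractor<T>
{
    private readonly string _filePath;

    public JsonFileExtractor(string filePath)
    {
        _filePath = filePath;
    }

    // Reads the whole file and deserializes it into T.
    public T Extract()
    {
        var json = File.ReadAllText(_filePath);
        return JsonConvert.DeserializeObject<T>(json);
    }
}

// Usage
// Customer is assumed here to be an application-defined class matching the JSON records.
var extractor = new JsonFileExtractor<List<Customer>>("customers.json");
var customers = extractor.Extract();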

Remarks

The IExtractor<T> interface represents the "Extract" phase of ETL operations. It provides a standardized way to retrieve data from sources such as databases, files, APIs, or other data repositories, and is designed to be flexible enough to support a wide range of extraction scenarios.

Key Design Principles:

  • Type Safety: The generic type parameter ensures compile-time type safety for extracted data

  • Source Agnostic: Abstracts the data source, allowing implementations for databases, files, APIs, etc.

  • Composability: Can be easily combined with other ETL components for complex data pipelines

  • Testability: Interface-based design enables easy mocking and unit testing
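
As an illustration of the Testability and Composability principles, the following is a minimal sketch rather than part of the Xperiflow API. The IExtractor<T> declaration is a local stand-in with the same shape as the documented interface so the snippet compiles on its own, and InMemoryExtractor<T> and SalesReport are hypothetical names introduced only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Local stand-in mirroring the documented interface shape, so the sketch
// compiles without referencing Xperiflow.dll.
public interface IExtractor<T>
{
    T Extract();
}

// Hypothetical test double: returns canned data instead of touching a real source.
public sealed class InMemoryExtractor<T> : IExtractor<T>
{
    private readonly T _data;

    public InMemoryExtractor(T data) => _data = data;

    public T Extract() => _data;
}

// Hypothetical downstream step that depends only on the interface,
// so any extractor (database, file, stub) can be plugged in.
public static class SalesReport
{
    public static decimal TotalSales(IExtractor<IReadOnlyList<decimal>> extractor)
        => extractor.Extract().Sum();
}

public static class Demo
{
    public static void Main()
    {
        // In a unit test, the stub replaces a real database or file extractor.
        var stub = new InMemoryExtractor<IReadOnlyList<decimal>>(new[] { 10m, 20.5m, 4.5m });
        Console.WriteLine(SalesReport.TotalSales(stub));
    }
}
```

Because SalesReport.TotalSales accepts the interface rather than a concrete class, the same code path can be exercised in tests without a live data source.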

Common Implementation Scenarios:

  • Database query extractors that return DataTable, DataSet, or custom objects

  • File readers for CSV, JSON, XML, or other structured formats

  • Web API clients that retrieve data from REST endpoints

  • Stream processors for real-time data extraction

  • Configuration or metadata extractors

Error Handling:

Implementations should handle source-specific errors appropriately, such as connection failures, authentication issues, or data format problems. Consider wrapping exceptions in domain-specific exception types for better error handling in ETL pipelines.
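
One way to apply this guidance is sketched below, under stated assumptions. ExtractionException and TextFileExtractor are hypothetical types invented for the example, and IExtractor<T> is again a local stand-in for the documented interface. The exception derives from System.InvalidOperationException so that callers relying on the exceptions listed for Extract() still catch it, and System.UnauthorizedAccessException is deliberately left unwrapped.

```csharp
using System;
using System.IO;

// Local stand-in mirroring the documented interface shape.
public interface IExtractor<T>
{
    T Extract();
}

// Hypothetical domain-specific exception. Deriving from InvalidOperationException
// keeps it compatible with callers that catch the documented exception type.
public class ExtractionException : InvalidOperationException
{
    public ExtractionException(string message, Exception inner)
        : base(message, inner)
    {
    }
}

// Hypothetical extractor that reads a whole text file.
public sealed class TextFileExtractor : IExtractor<string>
{
    private readonly string _filePath;

    public TextFileExtractor(string filePath) => _filePath = filePath;

    public string Extract()
    {
        try
        {
            return File.ReadAllText(_filePath);
        }
        catch (IOException ex)
        {
            // Missing files, missing directories, and sharing violations are all
            // IOException subtypes; the original exception is kept as InnerException.
            throw new ExtractionException($"Failed to extract from '{_filePath}'.", ex);
        }
        // UnauthorizedAccessException is not caught, so it propagates unchanged.
    }
}

public static class Demo
{
    public static void Main()
    {
        var missing = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".txt");
        try
        {
            new TextFileExtractor(missing).Extract();
        }
        catch (ExtractionException ex)
        {
            Console.WriteLine($"{ex.Message} Cause: {ex.InnerException?.GetType().Name}");
        }
    }
}
```

A pipeline can then catch the single wrapper type for source failures while still inspecting InnerException when the underlying cause matters.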

Methods

Extract()

Extracts data from the configured source and returns it as the specified type.

Declaration
T Extract()
Remarks

This method performs the actual data extraction operation from the configured source. The implementation should handle all aspects of connecting to the source, retrieving the data, and converting it to the specified type T.

Implementation Guidelines:

Performance Considerations:

For large datasets, consider implementing streaming or paging mechanisms to avoid memory issues. The choice of return type T should balance between ease of use and memory efficiency.
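
A streaming extractor can be sketched by choosing a lazily evaluated sequence as T. The example below is an assumption-laden illustration, not an Xperiflow implementation: LineStreamExtractor is a hypothetical name, and IExtractor<T> is a local stand-in for the documented interface. It relies on File.ReadLines, which yields lines on demand as the caller enumerates instead of loading the whole file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Local stand-in mirroring the documented interface shape.
public interface IExtractor<T>
{
    T Extract();
}

// Hypothetical streaming extractor: T is IEnumerable<string>, so callers
// process one line at a time rather than holding the full file in memory.
public sealed class LineStreamExtractor : IExtractor<IEnumerable<string>>
{
    private readonly string _filePath;

    public LineStreamExtractor(string filePath) => _filePath = filePath;

    // File.ReadLines reads lines lazily during enumeration,
    // unlike File.ReadAllLines, which returns a fully populated array.
    public IEnumerable<string> Extract() => File.ReadLines(_filePath);
}

public static class Demo
{
    public static void Main()
    {
        var path = Path.GetTempFileName();
        try
        {
            File.WriteAllLines(path, new[] { "id,amount", "1,10.0", "2,20.5" });

            var extractor = new LineStreamExtractor(path);
            foreach (var line in extractor.Extract())
            {
                Console.WriteLine(line);
            }
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

The trade-off is that a lazy sequence ties the caller to the lifetime of the underlying source and is typically enumerated once, whereas a materialized type such as DataTable is simpler to reuse but costs more memory.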

Returns

<T>

An object of type T containing the extracted data from the source

Exceptions

System.InvalidOperationException

Thrown when the extraction operation fails due to source connectivity issues, data format problems, or other operational errors.

System.UnauthorizedAccessException

Thrown when the extractor lacks the necessary permissions to access the data source.
