Class ColumnInference
Provides methods to infer the data type of columns in a tabular data source.
Namespace: Workspace.XBR.Xperiflow.Etl.Tabular.Utilities
Assembly: Xperiflow.dll
public static class ColumnInference
Methods
GetColumnNamesWithTypes(SessionInfo, IDataReader, int, bool, Dictionary<int, ColumnInferenceInfo>?, ColumnInferenceHandler)
Retrieves the column names and their inferred data types from a System.Data.IDataReader.
To get a dictionary of column names mapped to their inferred Type, invoke this method as follows:
Example:
` var columnNamesWithTypes = ColumnInference.GetColumnNamesWithTypes(sessionInfo, reader); `
public static Dictionary<string, Type> GetColumnNamesWithTypes(SessionInfo sessionInfo, IDataReader reader, int inferRows = 1000, bool headersIncluded = false, Dictionary<int, ColumnInferenceInfo>? columnInferenceInfoDict = null, ColumnInferenceHandler defaultHandler = ColumnInferenceHandler.InferDataType)
Returns
System.Collections.Generic.Dictionary<System.String,System.Type>
Parameters
Type | Name | Description |
---|---|---|
OneStream.Shared.Common.SessionInfo | sessionInfo | |
System.Data.IDataReader | reader | The System.Data.IDataReader to infer column types from |
System.Int32 | inferRows | The number of rows to use to infer columns |
System.Boolean | headersIncluded | A boolean value indicating whether the first row is a header row |
System.Collections.Generic.Dictionary<System.Int32,Workspace.XBR.Xperiflow.Etl.Tabular.Utilities.ColumnInferenceInfo> | columnInferenceInfoDict | A dictionary of column indexes mapped to [Workspace.XBR.Xperiflow.Etl.Tabular.Utilities.ColumnInferenceInfo](../Xperiflow.Etl.Tabular.Utilities/ColumnInferenceInfo.md) objects |
Workspace.XBR.Xperiflow.Etl.Tabular.Utilities.ColumnInferenceHandler | defaultHandler | The default [Workspace.XBR.Xperiflow.Etl.Tabular.Utilities.ColumnInferenceHandler](../Xperiflow.Etl.Tabular.Utilities/ColumnInferenceHandler.md) to use |
InferColumnTypes(IDataReader, int, bool, Dictionary<int, ColumnInferenceInfo>?, ColumnInferenceHandler)
Infers the column types of the data exposed by a provided IDataReader. NOTE: This advances the position of the IDataReader by the number of rows read for inference (inferRows).
public static List<Type> InferColumnTypes(IDataReader reader, int inferRows = 1000, bool headersIncluded = false, Dictionary<int, ColumnInferenceInfo>? columnInferenceInfoDict = null, ColumnInferenceHandler defaultHandler = ColumnInferenceHandler.InferDataType)
Returns
System.Collections.Generic.List<System.Type>
List of Type objects representing the inferred data type of each column
Parameters
Type | Name | Description |
---|---|---|
System.Data.IDataReader | reader | The IDataReader to infer column types from |
System.Int32 | inferRows | The number of rows to use to infer columns |
System.Boolean | headersIncluded | A boolean value indicating whether the first row is a header row |
System.Collections.Generic.Dictionary<System.Int32,Workspace.XBR.Xperiflow.Etl.Tabular.Utilities.ColumnInferenceInfo> | columnInferenceInfoDict | A dictionary of column indexes mapped to ColumnInferenceInfo objects |
Workspace.XBR.Xperiflow.Etl.Tabular.Utilities.ColumnInferenceHandler | defaultHandler | The default ColumnInferenceHandler to use |
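
The note about the reader position matters because an IDataReader is forward-only: rows consumed during inference are not returned again by later Read calls. The sketch below illustrates this with an in-memory DataTable reader. It does not call ColumnInference; `ConsumeRows` is a hypothetical stand-in for the sampling step, and the table contents are made up for illustration.

```csharp
using System;
using System.Data;

// Build a small in-memory table to stand in for a real data source.
var table = new DataTable();
table.Columns.Add("Id", typeof(string));
table.Columns.Add("Name", typeof(string));
for (int i = 0; i < 5; i++)
{
    table.Rows.Add(i.ToString(), "row" + i);
}

// Hypothetical stand-in for the sampling step: read at most `inferRows` rows.
static int ConsumeRows(IDataReader source, int inferRows)
{
    int consumed = 0;
    while (consumed < inferRows && source.Read())
    {
        consumed++;
    }
    return consumed;
}

using IDataReader reader = table.CreateDataReader();
int sampled = ConsumeRows(reader, 3);

// Only the rows that were not sampled are still available to the caller.
int remaining = 0;
while (reader.Read())
{
    remaining++;
}

Console.WriteLine($"sampled={sampled}, remaining={remaining}"); // sampled=3, remaining=2
```

If every row is needed after inference, one option is to create a fresh reader over the same source, or to use the DataTable overload instead.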
InferColumnTypes(List<List<string>>, int, bool, Dictionary<int, ColumnInferenceInfo>?, ColumnInferenceHandler)
Infers the column data types based on a provided 2D List of strings.
public static List<Type> InferColumnTypes(List<List<string>> rows, int inferRows = 1000, bool headersIncluded = false, Dictionary<int, ColumnInferenceInfo>? columnInferenceInfoDict = null, ColumnInferenceHandler defaultHandler = ColumnInferenceHandler.InferDataType)
Returns
System.Collections.Generic.List<System.Type>
A List of Type objects representing the inferred data type of each column
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.List<System.Collections.Generic.List<System.String>> | rows | The 2D List of strings to infer column types from |
System.Int32 | inferRows | The number of rows to use to infer columns |
System.Boolean | headersIncluded | A boolean value indicating whether the first row is a header row |
System.Collections.Generic.Dictionary<System.Int32,Workspace.XBR.Xperiflow.Etl.Tabular.Utilities.ColumnInferenceInfo> | columnInferenceInfoDict | A dictionary of column indexes mapped to ColumnInferenceInfo objects |
Workspace.XBR.Xperiflow.Etl.Tabular.Utilities.ColumnInferenceHandler | defaultHandler | The default [Workspace.XBR.Xperiflow.Etl.Tabular.Utilities.ColumnInferenceHandler](../Xperiflow.Etl.Tabular.Utilities/ColumnInferenceHandler.md) to use |
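
To make the roles of inferRows and headersIncluded concrete, here is a minimal sketch of how a sample might be selected from a 2D list before per-column inference. This is an assumption about the general approach, not the library's implementation: `SampleColumns` is a hypothetical helper, the data is invented, and rows are assumed to be rectangular.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var table = new List<List<string>>
{
    new() { "Id", "Amount" },   // header row
    new() { "1", "10.5" },
    new() { "2", "20.0" },
    new() { "3", "30.25" },
};

// Hypothetical helper: skip the header row if present, keep at most
// `inferRows` data rows, then pivot the sample into one list per column.
static List<List<string>> SampleColumns(List<List<string>> rows, int inferRows, bool headersIncluded)
{
    var sample = rows.Skip(headersIncluded ? 1 : 0).Take(inferRows).ToList();
    int columnCount = rows.Count == 0 ? 0 : rows[0].Count;

    var columns = new List<List<string>>();
    for (int c = 0; c < columnCount; c++)
    {
        columns.Add(sample.Select(r => r[c]).ToList());
    }
    return columns;
}

var columns = SampleColumns(table, inferRows: 2, headersIncluded: true);
Console.WriteLine(string.Join(",", columns[0])); // 1,2
Console.WriteLine(string.Join(",", columns[1])); // 10.5,20.0
```

Each resulting column list is the kind of input that a per-column routine such as GuessColumn accepts.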
InferColumnTypes(DataTable, int, bool, Dictionary<int, ColumnInferenceInfo>?, ColumnInferenceHandler)
Infers the column data types based on a provided DataTable.
public static List<Type> InferColumnTypes(DataTable data, int inferRows = 1000, bool headersIncluded = false, Dictionary<int, ColumnInferenceInfo>? columnInferenceInfoDict = null, ColumnInferenceHandler defaultHandler = ColumnInferenceHandler.InferDataType)
Returns
System.Collections.Generic.List<System.Type>
A List of Type objects representing the inferred data type of each column
Parameters
Type | Name | Description |
---|---|---|
System.Data.DataTable | data | The DataTable to infer column types from |
System.Int32 | inferRows | The number of rows to use to infer columns |
System.Boolean | headersIncluded | A boolean value indicating whether the first row is a header row |
System.Collections.Generic.Dictionary<System.Int32,Workspace.XBR.Xperiflow.Etl.Tabular.Utilities.ColumnInferenceInfo> | columnInferenceInfoDict | A dictionary of column indexes mapped to ColumnInferenceInfo objects |
Workspace.XBR.Xperiflow.Etl.Tabular.Utilities.ColumnInferenceHandler | defaultHandler | The default [Workspace.XBR.Xperiflow.Etl.Tabular.Utilities.ColumnInferenceHandler](../Xperiflow.Etl.Tabular.Utilities/ColumnInferenceHandler.md) to use |
GuessColumn(IEnumerable<string?>, float)
Guesses the data type of a column based on a provided IEnumerable of strings.
With the default confidenceThreshold of 1.0, all values in the provided column data must match a single data type; otherwise string is returned.
The method loops through a list of [Workspace.XBR.Xperiflow.Etl.Tabular.Utilities.IDataTypeChecker](../Xperiflow.Etl.Tabular.Utilities/IDataTypeChecker.md) objects for each value in the column data; whenever a checker's TryParse succeeds, that checker's MatchCount property is incremented.
Finally, the highest ConfidenceLevel is retrieved. If it meets the confidenceThreshold, the corresponding DataTypeName is returned; otherwise string is returned.
public static Type GuessColumn(IEnumerable<string?> columnData, float confidenceThreshold = 1)
Returns
System.Type
The inferred Type of the column
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IEnumerable<System.String> | columnData | The IEnumerable of strings representing the column data to infer the data type from |
System.Single | confidenceThreshold | The confidence threshold to use when inferring the data type |
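
The match-counting approach described for GuessColumn can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's code: the checker list, the `GuessType` name, the treatment of null values as non-matching, the tie-break in favor of the earliest checker, and the `>=` comparison against the threshold (chosen so that the default of 1.0 is reachable) are all hypothetical.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical checkers: a target type paired with a TryParse-style predicate.
var checkers = new (Type Target, Func<string, bool> Matches)[]
{
    (typeof(int), s => int.TryParse(s, NumberStyles.Integer, CultureInfo.InvariantCulture, out _)),
    (typeof(double), s => double.TryParse(s, NumberStyles.Float, CultureInfo.InvariantCulture, out _)),
    (typeof(bool), s => bool.TryParse(s, out _)),
};

Type GuessType(IEnumerable<string?> columnData, float confidenceThreshold = 1f)
{
    var matchCounts = new int[checkers.Length];
    int total = 0;

    foreach (string? value in columnData)
    {
        total++;
        if (value is null) continue; // assumption: null matches no checker
        for (int i = 0; i < checkers.Length; i++)
        {
            if (checkers[i].Matches(value)) matchCounts[i]++;
        }
    }

    if (total == 0) return typeof(string);

    // Pick the checker with the most matches; ties go to the earliest one.
    int best = 0;
    for (int i = 1; i < checkers.Length; i++)
    {
        if (matchCounts[i] > matchCounts[best]) best = i;
    }

    float confidence = (float)matchCounts[best] / total;
    return confidence >= confidenceThreshold ? checkers[best].Target : typeof(string);
}

Console.WriteLine(GuessType(new[] { "1", "2", "3" }));         // System.Int32
Console.WriteLine(GuessType(new[] { "1", "2.5", "x" }));       // System.String
Console.WriteLine(GuessType(new[] { "1", "2.5", "x" }, 0.6f)); // System.Double
```

Lowering the threshold trades strictness for tolerance of dirty values: in the last call, two of three values parse as double, so a threshold of 0.6 accepts the column as double even though one value does not parse.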
Inherited Members
System.Object.Equals(System.Object)
System.Object.Equals(System.Object,System.Object)
System.Object.GetHashCode
System.Object.GetType
System.Object.MemberwiseClone
System.Object.ReferenceEquals(System.Object,System.Object)
System.Object.ToString