
Class ColumnInference

Provides methods to infer the data type of columns in a tabular data source.

Namespace: Workspace.XBR.Xperiflow.Etl.Tabular.Utilities

Assembly: Xperiflow.dll

Declaration
public static class ColumnInference

Methods

GetColumnNamesWithTypes(SessionInfo, IDataReader, int, bool, Dictionary<int, ColumnInferenceInfo>?, ColumnInferenceHandler)

Retrieves the column names and their inferred data types from a System.Data.IDataReader.

To get a dictionary of column names mapped to their inferred Type, invoke this method as follows:

Example:

` var columnNamesWithTypes = ColumnInference.GetColumnNamesWithTypes(sessionInfo, reader); `

Declaration
public static Dictionary<string, Type> GetColumnNamesWithTypes(SessionInfo sessionInfo, IDataReader reader, int inferRows = 1000, bool headersIncluded = false, Dictionary<int, ColumnInferenceInfo>? columnInferenceInfoDict = null, ColumnInferenceHandler defaultHandler = ColumnInferenceHandler.InferDataType)
Returns

System.Collections.Generic.Dictionary<System.String,System.Type>

Parameters
| Type | Name | Description |
| --- | --- | --- |
| OneStream.Shared.Common.SessionInfo | sessionInfo | |
| System.Data.IDataReader | reader | The System.Data.IDataReader to infer column types from |
| System.Int32 | inferRows | The number of rows to use to infer columns |
| System.Boolean | headersIncluded | A boolean value indicating whether the first row is a header row |
| System.Collections.Generic.Dictionary<System.Int32,Workspace.XBR.Xperiflow.Etl.Tabular.Utilities.ColumnInferenceInfo> | columnInferenceInfoDict | A dictionary of column indexes mapped to [Workspace.XBR.Xperiflow.Etl.Tabular.Utilities.ColumnInferenceInfo](../Xperiflow.Etl.Tabular.Utilities/ColumnInferenceInfo.md) objects |
| Workspace.XBR.Xperiflow.Etl.Tabular.Utilities.ColumnInferenceHandler | defaultHandler | The default [Workspace.XBR.Xperiflow.Etl.Tabular.Utilities.ColumnInferenceHandler](../Xperiflow.Etl.Tabular.Utilities/ColumnInferenceHandler.md) to use |

InferColumnTypes(IDataReader, int, bool, Dictionary<int, ColumnInferenceInfo>?, ColumnInferenceHandler)

Infers the column types of the data exposed by a provided IDataReader. NOTE: This will advance the position of the IDataReader by n rows, where n is the number of rows read for inference (inferRows).

Declaration
public static List<Type> InferColumnTypes(IDataReader reader, int inferRows = 1000, bool headersIncluded = false, Dictionary<int, ColumnInferenceInfo>? columnInferenceInfoDict = null, ColumnInferenceHandler defaultHandler = ColumnInferenceHandler.InferDataType)
Returns

System.Collections.Generic.List<System.Type>

A List of Type objects representing the inferred data type of each column

Parameters
| Type | Name | Description |
| --- | --- | --- |
| System.Data.IDataReader | reader | The IDataReader to infer column types from |
| System.Int32 | inferRows | The number of rows to use to infer columns |
| System.Boolean | headersIncluded | A boolean value indicating whether the data passed in includes a header row |
| System.Collections.Generic.Dictionary<System.Int32,Workspace.XBR.Xperiflow.Etl.Tabular.Utilities.ColumnInferenceInfo> | columnInferenceInfoDict | A dictionary of column indexes mapped to ColumnInferenceInfo objects |
| Workspace.XBR.Xperiflow.Etl.Tabular.Utilities.ColumnInferenceHandler | defaultHandler | The default ColumnInferenceHandler to use |
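
The note about the reader position matters because an IDataReader is forward-only: rows consumed while sampling are not returned again by later Read() calls. The sketch below illustrates that effect using only System.Data. It does not call ColumnInference; the class ReaderAdvanceSketch and its methods Sample, CountRemaining, and BuildTable are hypothetical names introduced for illustration.

```csharp
using System;
using System.Data;

// Illustration only: shows how sampling rows from a forward-only reader
// leaves fewer rows for whoever reads next. Not part of the Xperiflow API.
public static class ReaderAdvanceSketch
{
    // Reads up to sampleRows rows, the way a sampling step would,
    // and returns how many rows were actually consumed.
    public static int Sample(IDataReader reader, int sampleRows)
    {
        int read = 0;
        while (read < sampleRows && reader.Read())
        {
            read++;
        }
        return read;
    }

    // Counts the rows still available after the reader's current position.
    public static int CountRemaining(IDataReader reader)
    {
        int remaining = 0;
        while (reader.Read())
        {
            remaining++;
        }
        return remaining;
    }

    // Builds a small in-memory table with one string column and rowCount rows.
    public static DataTable BuildTable(int rowCount)
    {
        var table = new DataTable();
        table.Columns.Add("id", typeof(string));
        for (int i = 0; i < rowCount; i++)
        {
            table.Rows.Add(i.ToString());
        }
        return table;
    }
}
```

With a five-row table, `Sample(reader, 3)` consumes three rows and a following `CountRemaining(reader)` sees only two. If every row is needed after inference, one option (assuming the data fits in memory) is to load it into a DataTable and use the DataTable overload, or to create a fresh reader after inference.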

InferColumnTypes(List<List<string>>, int, bool, Dictionary<int, ColumnInferenceInfo>?, ColumnInferenceHandler)

Infers the column data types based on a provided 2D List of strings.

Declaration
public static List<Type> InferColumnTypes(List<List<string>> rows, int inferRows = 1000, bool headersIncluded = false, Dictionary<int, ColumnInferenceInfo>? columnInferenceInfoDict = null, ColumnInferenceHandler defaultHandler = ColumnInferenceHandler.InferDataType)
Returns

System.Collections.Generic.List<System.Type>

A List of Type objects representing the inferred data type of each column

Parameters
| Type | Name | Description |
| --- | --- | --- |
| System.Collections.Generic.List<System.Collections.Generic.List<System.String>> | rows | The 2D List of strings to infer column types from |
| System.Int32 | inferRows | The number of rows to use to infer columns |
| System.Boolean | headersIncluded | A boolean value indicating whether the first row is a header row |
| System.Collections.Generic.Dictionary<System.Int32,Workspace.XBR.Xperiflow.Etl.Tabular.Utilities.ColumnInferenceInfo> | columnInferenceInfoDict | A dictionary of column indexes mapped to ColumnInferenceInfo objects |
| Workspace.XBR.Xperiflow.Etl.Tabular.Utilities.ColumnInferenceHandler | defaultHandler | The default [Workspace.XBR.Xperiflow.Etl.Tabular.Utilities.ColumnInferenceHandler](../Xperiflow.Etl.Tabular.Utilities/ColumnInferenceHandler.md) to use |

InferColumnTypes(DataTable, int, bool, Dictionary<int, ColumnInferenceInfo>?, ColumnInferenceHandler)

Infers the column data types based on a provided DataTable.

Declaration
public static List<Type> InferColumnTypes(DataTable data, int inferRows = 1000, bool headersIncluded = false, Dictionary<int, ColumnInferenceInfo>? columnInferenceInfoDict = null, ColumnInferenceHandler defaultHandler = ColumnInferenceHandler.InferDataType)
Returns

System.Collections.Generic.List<System.Type>

A List of Type objects representing the inferred data type of each column

Parameters
| Type | Name | Description |
| --- | --- | --- |
| System.Data.DataTable | data | The DataTable to infer column types from |
| System.Int32 | inferRows | The number of rows to use to infer columns |
| System.Boolean | headersIncluded | A boolean value indicating whether the first row is a header row |
| System.Collections.Generic.Dictionary<System.Int32,Workspace.XBR.Xperiflow.Etl.Tabular.Utilities.ColumnInferenceInfo> | columnInferenceInfoDict | A dictionary of column indexes mapped to ColumnInferenceInfo objects |
| Workspace.XBR.Xperiflow.Etl.Tabular.Utilities.ColumnInferenceHandler | defaultHandler | The default [Workspace.XBR.Xperiflow.Etl.Tabular.Utilities.ColumnInferenceHandler](../Xperiflow.Etl.Tabular.Utilities/ColumnInferenceHandler.md) to use |

GuessColumn(IEnumerable<string?>, float)

Guesses the data type of a column based on a provided IEnumerable of strings. With the default confidenceThreshold of 1.0, all values in the provided column data must match a single data type; otherwise string is returned.

The method runs each value in the column data through a list of IDataTypeChecker objects. Each time a checker's TryParse succeeds, the MatchCount property of that [Workspace.XBR.Xperiflow.Etl.Tabular.Utilities.IDataTypeChecker](../Xperiflow.Etl.Tabular.Utilities/IDataTypeChecker.md) object is incremented. Finally, the highest ConfidenceLevel is retrieved: if it meets the confidenceThreshold, the data type identified by that checker's DataTypeName is returned; otherwise string is returned.

Declaration
public static Type GuessColumn(IEnumerable<string?> columnData, float confidenceThreshold = 1)
Returns

System.Type

The inferred Type of the column

Parameters
| Type | Name | Description |
| --- | --- | --- |
| System.Collections.Generic.IEnumerable<System.String> | columnData | The IEnumerable of strings representing the column data to infer the data type from |
| System.Single | confidenceThreshold | The confidence threshold to use when inferring the data type |
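
The voting scheme described for GuessColumn can be sketched in a few lines. This is a minimal, self-contained illustration, not the library's implementation. The types SketchChecker and GuessColumnSketch are hypothetical stand-ins, and the set of checkers (int, double, bool), their order, the treatment of null values, and the use of "greater than or equal to" for the threshold comparison are all assumptions. The comparison was chosen so that a threshold of 1.0 accepts a column in which every value matches.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical stand-in for IDataTypeChecker. Only the ideas of a match
// count and a confidence level come from the description above.
public sealed class SketchChecker
{
    private readonly Func<string, bool> _tryParse;

    public SketchChecker(Type dataType, Func<string, bool> tryParse)
    {
        DataType = dataType;
        _tryParse = tryParse;
    }

    public Type DataType { get; }
    public int MatchCount { get; private set; }

    // Increment MatchCount when this checker can parse the value.
    public void Check(string value)
    {
        if (_tryParse(value)) MatchCount++;
    }

    // Fraction of all inspected values that this checker parsed.
    public float ConfidenceLevel(int total) =>
        total == 0 ? 0f : (float)MatchCount / total;
}

public static class GuessColumnSketch
{
    public static Type Guess(IEnumerable<string?> columnData, float confidenceThreshold = 1f)
    {
        // Assumed checker list; earlier entries win ties (stable sort below).
        var checkers = new List<SketchChecker>
        {
            new(typeof(int), s => int.TryParse(s, NumberStyles.Integer, CultureInfo.InvariantCulture, out _)),
            new(typeof(double), s => double.TryParse(s, NumberStyles.Float, CultureInfo.InvariantCulture, out _)),
            new(typeof(bool), s => bool.TryParse(s, out _)),
        };

        int total = 0;
        foreach (string? value in columnData)
        {
            total++;
            if (value is null) continue; // assumption: null matches no checker
            foreach (SketchChecker checker in checkers) checker.Check(value);
        }

        // Pick the checker with the highest confidence level.
        SketchChecker best = checkers
            .OrderByDescending(c => c.ConfidenceLevel(total))
            .First();

        // Fall back to string when the best checker misses the threshold.
        return best.ConfidenceLevel(total) >= confidenceThreshold
            ? best.DataType
            : typeof(string);
    }
}
```

Under these assumptions, a column of "1", "2", "3" yields int, a column of "1", "2.5" yields double, and a column of "1", "x" yields string at the default threshold but int when the threshold is lowered to 0.5.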

Inherited Members

  • System.Object.Equals(System.Object)
  • System.Object.Equals(System.Object,System.Object)
  • System.Object.GetHashCode
  • System.Object.GetType
  • System.Object.MemberwiseClone
  • System.Object.ReferenceEquals(System.Object,System.Object)
  • System.Object.ToString
