NaturalLanguageTablesQuery

Versions

1.0.0

v1.0.0

Basic Information

Class Name: NaturalLanguageTablesQuery

Title: Natural Language Tables Query

Version: 1.0.0

Author: Drew Shea

Organization: OneStream

Creation Date: 2024-12-24

Default Routine Memory Capacity: 2.0 GB

Description

Short Description

Ask data related questions to tabular datasets and receive answers as natural language, charts, and tables.

Long Description

This routine allows users to pose natural language questions to various tabular data sources and receive intelligible, context-rich responses that may include numerical insights, full dataframes, or plotted visual representations. By leveraging large language models, the system interprets complex queries in plain English (or another language), identifies relevant data sources, and returns interpretable outputs. The routine is designed to serve a range of business scenarios, mostly oriented around adhoc data exploration. This approach transcends standard SQL queries by making analytics more intuitive. Overall, it is intended to make data-driven decisions more readily accessible to consultants, analysts, and business stakeholders without requiring them to deeply understand the technical underpinnings of data manipulation (like SQL). Such democratized analytics empowers teams to drill down into information faster, collaborate more effectively, and reduce the overall friction in business intelligence tasks. From visualizations to numeric summaries, this routine opens up new possibilities for interactive data exploration.

Use Cases

1. Adhoc Data Analysis and Exploration

When business users or consultants need to perform spontaneous or on-the-fly investigations into their organization’s data, they often lack the time or resources to navigate complex database systems or coding environments. This use case addresses the need for immediate insights by allowing users to phrase their questions in natural language and retrieve results from relevant tables or files. For example, someone preparing for a client meeting might urgently want to understand sales trends, discover hidden correlations, or validate operational metrics. By leveraging this routine, users can ask direct questions like 'What were the top-selling products last quarter?' and receive comprehensible responses—potentially visualizations or tables—without wrestling with SQL queries or data preparation. This ad hoc exploration capability not only speeds up the analysis process but also fosters a more intuitive relationship with data. The solution thus helps organizations reduce bottlenecks in business intelligence processes, freeing up both technical and non-technical team members to focus on strategic decision-making rather than technical complexities.

2. Adhoc Data Manipulations

Sometimes, business processes demand quick changes to datasets that do not necessarily require building out full ETL (Extract, Transform, Load) pipelines or orchestrating a series of transformations through specialized tools. This use case caters to quick manipulations, such as filtering datasets based on dynamic criteria, computing group totals, merging separate data sources for a combined view, or even creating new derived columns on the fly. Here, the routine empowers consultants and analysts to modify the underlying data using straightforward language, such as 'Filter out all records before 2022 and show me the sum of monthly revenues', resulting in timely, actionable outputs. This streamlined approach encourages users to experiment with their data without the overhead of extensive coding or back-and-forth requests to data engineering teams. By making data manipulations readily accessible, the solution plays a pivotal role in improving agility, supporting rapid prototyping, all while minimizing disruptions to ongoing operational data systems.

Routine Methods

1. Analyze (Method)

Method: analyze
- Type: Method
- Memory Capacity: 2.0 GB
- Allow In-Memory Execution: No
- Read Only: No
- Method Limits: This method has been tested on various time series datasets at 40 GB memory capacity. With a dataset containing 10K targets and 1.1M rows, this method completed in 1 minute. With a dataset containing 25K targets and 7.4M rows, this method completed in 5 minutes. With a dataset containing 27.5K targets and 20.1M rows, this method completed in 10 minutes. With a dataset containing 5K targets, this method completed in 7 minutes.
- Outputs Dynamic Artifacts: Yes
- Short Description:
  - Ask a natural language question.
- Detailed Description:
  - Ask a natural language question to data sources and receive an answer as a number, dataframe, plot, or text. The question may relate to the data in the source tables or the source tables themselves.
- Inputs:
  - Required Input
    - Data Definition: The connection type to use to access the source data.
      - Name: data_connection
      - Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
      - Type: Must be an instance of Natural Language Tabular Connection
      - Nested Model: Natural Language Tabular Connection
        
        Required Input
        
        Connection: The connection type to use to access the source data.
        
        Name: tabular_connection
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: Must be one of the following
        
        SQL Server Connection
        
        Required Input
        
        Database Resource: The name of the database resource to connect to.
        
        Name: database_resource
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: str
        
        Database Name: The name of the database to connect to.
        
        Name: database_name
        
        Tooltip:
        
        Detail:
        
        Note: If you don’t see the database name that you are looking for in this list, it is recommended that you first move the data to be used within a database that is available within this list.
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: str
        
        Table Names: The names of the tables to use.
        
        Name: table_names
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: list[str]
        
        MetaFileSystem Connection
        
        Required Input
        
        Connection Key: The MetaFileSystem connection key.
        
        Name: connection_key
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: MetaFileSystemConnectionKey
        
        Root Directory: The root directory to search for files within.
        
        Name: root_dir
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: str
        
        File Paths: The full file path to the file to ingest.
        
        Name: file_paths
        
        Tooltip:
        
        Validation Constraints:
        
        This input may be subject to other validation constraints at runtime.
        
        Type: list[str]
    - Natural Language Query: The natural language query to execute on the data.
      - Name: query
      - Tooltip:
        
        Validation Constraints:
        
        The input must have a minimum length of 1.
        
        The input must have a maximum length of 500.
        
        This input may be subject to other validation constraints at runtime.
      - Type: str
- Artifacts:
  - Response Metadata: Describes basic information about the query and response.
    - Qualified Key Annotation: metadata
    - Aggregate Artifact: False
    - In-Memory Json Accessible: True
    - File Annotations:
      - artifacts_/@metadata/data_/data.json
        
        Stored json data.
      - artifacts_/@metadata/data_/schema.json
        
        The json schema of the json object stored in the 'data.json' file
  - Dynamic Artifacts Metadata: Contains metadata for the dynamic artifacts that are generated at runtime for this method.
    - Qualified Key Annotation: dynamic_artifacts_metadata
    - Aggregate Artifact: False
    - In-Memory Json Accessible: True
    - File Annotations:
      - artifacts_/@dynamic_artifacts_metadata/data_/data.json
        
        Stored json data.
      - artifacts_/@dynamic_artifacts_metadata/data_/schema.json
        
        The json schema of the json object stored in the 'data.json' file

Interface Definitions

No interface definitions found for this routine

Developer Docs

Routine Typename: NaturalLanguageTablesQuery

Method Name	Artifact Keys
`analyze`	metadata, dynamic_artifacts_metadata

Versions​

v1.0.0​

Basic Information​

Tags​

Description​

Short Description​

Long Description​

Use Cases​

1. Adhoc Data Analysis and Exploration​

2. Adhoc Data Manipulations​

Routine Methods​

1. Analyze (Method)​

Interface Definitions​

Developer Docs​