QueryGPT – Natural Language to SQL Using Generative AI

Jeffrey Johnson, Callie Busch, Abhi Khune, Pradeep Chakka

Uber

14 min read • intermediate

Generative AI • GPT • GPT-4 • SQL

Overview

The article discusses QueryGPT, a tool developed by Uber that converts natural language prompts into SQL queries using generative AI. It highlights the productivity gains achieved by automating SQL query generation, the architecture of QueryGPT, and the challenges encountered during its development.

What You'll Learn

1. How to generate SQL queries from natural language prompts using QueryGPT
2. Why automating SQL query generation can enhance productivity in data operations
3. How to implement an intent agent to classify user prompts for better query generation

Prerequisites & Requirements

Understanding of SQL and data querying concepts
Familiarity with generative AI and LLMs (optional)

Key Questions Answered

How does QueryGPT improve SQL query generation at Uber?

QueryGPT automates the process of generating SQL queries from natural language prompts, cutting the average time to author a query from about 10 minutes to about 3 minutes. This automation enhances productivity for users who need to access and manipulate large datasets.

What challenges were faced during the development of QueryGPT?

The development faced challenges such as handling large schemas that could exceed token limits, ensuring accuracy in generated queries, and addressing hallucinations where the model produced incorrect or non-existent table references. These issues required iterative improvements and the introduction of specialized agents.
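Hallucinated table and column references of the kind described above can often be caught before a query runs by compiling it against the known schema. The following is a minimal sketch, not QueryGPT's implementation: it uses Python's built-in sqlite3 module to create empty tables mirroring a hypothetical schema in memory, then asks SQLite to EXPLAIN the generated query, which fails with a "no such table" or "no such column" error for invented names. It assumes the generated SQL is SQLite-compatible; SQL written for another engine's dialect would need a parser for that dialect.

```python
import sqlite3
from typing import Optional

# Hypothetical schema (table -> columns); a real system would load this from its schema metadata.
SCHEMA = {
    "trips": ["trip_id", "city_id", "driver_id", "fare_amount"],
    "drivers": ["driver_id", "rating"],
}


def validate_sql(sql: str, schema: dict[str, list[str]]) -> Optional[str]:
    """Return None if `sql` compiles against `schema`, otherwise SQLite's error message."""
    conn = sqlite3.connect(":memory:")
    try:
        # Empty, untyped tables are enough for SQLite to resolve names.
        for table, columns in schema.items():
            conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
        # EXPLAIN compiles the statement (resolving every table and column)
        # and returns its bytecode instead of query results.
        conn.execute(f"EXPLAIN {sql}")
        return None
    except sqlite3.Error as exc:
        return str(exc)
    finally:
        conn.close()


print(validate_sql("SELECT city_id, SUM(fare_amount) FROM trips GROUP BY city_id", SCHEMA))
print(validate_sql("SELECT surge_multiplier FROM trips", SCHEMA))
```

Because the tables are empty, this checks only names and syntax; a query that compiles can still answer the wrong question.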

What is the architecture of QueryGPT?

QueryGPT's architecture evolved through multiple iterations, starting from a simple retrieval-augmented generation (RAG) pipeline and growing into a more complex system that includes intent agents, table agents, and column pruning agents. This evolution aimed to improve the accuracy and efficiency of SQL query generation.
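To give a feel for what a column pruning step does, the sketch below trims a wide table down to the columns most related to the question, which keeps the schema portion of the prompt smaller and within token limits. In QueryGPT this decision is made by an agent; the keyword-overlap score here is a hypothetical stand-in, and the table and its columns are invented for illustration.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word tokens; underscores in column names act as separators."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def prune_columns(question: str, columns: dict[str, str], keep: int = 3) -> list[str]:
    """Keep up to `keep` columns whose name or description shares words with the question."""
    q = tokenize(question)
    scored = [(len(q & tokenize(f"{name} {desc}")), name) for name, desc in columns.items()]
    # Highest overlap first; ties broken by name so the result is deterministic.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for overlap, name in scored[:keep] if overlap > 0]


# Hypothetical wide table: column name -> description.
trips_columns = {
    "trip_id": "unique identifier of a trip",
    "city_id": "city where the trip started",
    "fare_amount": "total fare charged for the trip in local currency",
    "driver_rating": "rating the rider gave the driver",
    "request_timestamp": "time the rider requested the trip",
    "vehicle_make": "manufacturer of the vehicle",
}

print(prune_columns("average fare per city by request time", trips_columns))
# → ['request_timestamp', 'city_id', 'fare_amount']
```

Only the surviving columns would then be described in the prompt. Keyword overlap misses synonyms (for example, "last week" does not match `request_timestamp`), which is one reason to delegate this choice to a model.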

How does QueryGPT handle user prompts?

QueryGPT processes user prompts through an intent agent that classifies the question into relevant business domains, allowing for more accurate schema and SQL sample retrieval. This classification improves the relevance of generated queries.
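The intent step can be pictured as mapping a prompt to the business domains it touches, then restricting retrieval to those domains' tables. The sketch below is illustrative only: the keyword matching, the three domains, and all table names are hypothetical stand-ins so the example runs without an LLM.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    name: str
    keywords: frozenset[str]
    tables: tuple[str, ...]


# Hypothetical business domains; QueryGPT uses an intent agent, not keyword lists.
DOMAINS = (
    Domain("mobility", frozenset({"trip", "trips", "driver", "drivers", "rider", "riders"}), ("trips", "drivers")),
    Domain("delivery", frozenset({"order", "orders", "restaurant", "restaurants", "courier"}), ("orders", "restaurants")),
    Domain("ads", frozenset({"ad", "ads", "campaign", "campaigns", "impression", "impressions"}), ("ad_campaigns",)),
)


def classify_intent(prompt: str) -> list[Domain]:
    """Return every domain whose keywords appear in the prompt."""
    words = set(re.findall(r"[a-z0-9_]+", prompt.lower()))
    return [domain for domain in DOMAINS if domain.keywords & words]


def candidate_tables(prompt: str) -> list[str]:
    """Restrict schema retrieval to the tables owned by the matched domains."""
    return [table for domain in classify_intent(prompt) for table in domain.tables]


print(candidate_tables("How many trips did drivers complete in Seattle?"))
# → ['trips', 'drivers']
```

If no domain matches, a real system would need a fallback, such as searching all tables or asking the user to clarify.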

Key Statistics & Figures

Monthly interactive queries handled

1.2 million

This statistic highlights the scale at which Uber's data platform operates, emphasizing the need for efficient query generation tools.

Share of queries from the Operations organization

36%

Roughly a third of all interactive queries come from the Operations organization, making it one of the largest user groups that stands to benefit from automated query generation.

Average time saved per query with QueryGPT

7 minutes

By reducing query authoring time from 10 minutes to 3 minutes, QueryGPT offers substantial productivity gains.
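The 7-minute figure is the difference between the two averages quoted above; the short sketch below also expresses it as a relative reduction. The function name is illustrative, and the inputs are simply the 10- and 3-minute averages.

```python
def authoring_savings(before_min: float, after_min: float) -> tuple[float, float]:
    """Return (minutes saved per query, fractional reduction in authoring time)."""
    saved = before_min - after_min
    return saved, saved / before_min


saved, reduction = authoring_savings(10, 3)
print(f"{saved:g} minutes saved per query ({reduction:.0%} less time)")
# → 7 minutes saved per query (70% less time)
```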

Technologies & Tools

Backend

Generative AI

Used to convert natural language prompts into SQL queries.

Backend

Large Language Models (LLMs)

Core technology behind QueryGPT for understanding and processing user inputs.

Backend

Vector Databases

Facilitates similarity searches to retrieve relevant SQL samples and schemas.
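To make "similarity search" concrete, the sketch below ranks stored SQL samples by cosine similarity to the embedding of a user's question. It assumes embeddings already exist: the three-dimensional vectors and sample queries are made up, whereas a real deployment would compute embeddings with a model and typically use a vector database's approximate nearest-neighbor index instead of this brute-force scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], samples: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Exact nearest-neighbor search: score every sample and return the k best SQL strings."""
    ranked = sorted(samples, key=lambda s: cosine_similarity(query_vec, s[1]), reverse=True)
    return [sql for sql, _ in ranked[:k]]


# Hypothetical (sql, embedding) pairs; real embeddings have hundreds of dimensions.
SAMPLES = [
    ("SELECT city_id, COUNT(*) FROM trips GROUP BY city_id", [0.9, 0.1, 0.0]),
    ("SELECT restaurant_id, SUM(total) FROM orders GROUP BY restaurant_id", [0.1, 0.9, 0.1]),
    ("SELECT campaign_id, SUM(impressions) FROM ad_campaigns GROUP BY campaign_id", [0.0, 0.2, 0.9]),
]

print(top_k([0.8, 0.2, 0.1], SAMPLES, k=1))
# → ['SELECT city_id, COUNT(*) FROM trips GROUP BY city_id']
```

The retrieved samples and their schemas would then be placed in the prompt as few-shot context for the LLM.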

Key Actionable Insights

1. Leverage QueryGPT to automate SQL query generation for faster data access.
By using QueryGPT, teams can significantly reduce the time spent on crafting SQL queries, allowing them to focus on analysis and decision-making rather than query writing.

2. Implement intent classification to improve query accuracy.
Using an intent agent can help narrow down the search for relevant schemas and SQL samples, leading to more precise query generation and reducing the likelihood of errors.

3. Monitor and evaluate the performance of QueryGPT regularly.
Establishing a standardized evaluation procedure helps track improvements and identify areas for further enhancement, ensuring that the tool remains effective over time.
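A standardized evaluation procedure needs repeatable metrics. One simple, hypothetical example is to pair each test question with a hand-written golden query and score the generated query by how much its set of referenced tables overlaps with the golden one (Jaccard similarity). The regex-based table extraction is deliberately crude and will mis-handle CTEs, quoted identifiers, and comma-separated FROM lists; the sample queries are invented.

```python
import re

# Captures the identifier after FROM or JOIN, allowing schema-qualified names like db.trips.
TABLE_PATTERN = re.compile(r"\b(?:from|join)\s+([a-zA-Z_][\w.]*)", re.IGNORECASE)


def referenced_tables(sql: str) -> set[str]:
    """Crudely collect lowercased table names that follow FROM or JOIN."""
    return {name.lower() for name in TABLE_PATTERN.findall(sql)}


def table_overlap(generated: str, golden: str) -> float:
    """Jaccard similarity of the two queries' table sets (1.0 means identical sets)."""
    gen, gold = referenced_tables(generated), referenced_tables(golden)
    if not gen and not gold:
        return 1.0
    return len(gen & gold) / len(gen | gold)


def evaluate(cases: list[tuple[str, str]]) -> float:
    """Average table overlap across (generated, golden) query pairs."""
    return sum(table_overlap(gen, gold) for gen, gold in cases) / len(cases)


cases = [
    ("SELECT * FROM trips t JOIN drivers d ON t.driver_id = d.id",
     "SELECT * FROM trips JOIN drivers ON trips.driver_id = drivers.id"),
    ("SELECT * FROM orders",
     "SELECT * FROM orders JOIN restaurants ON orders.restaurant_id = restaurants.id"),
]
print(evaluate(cases))
# → 0.75
```

Tracking a score like this across releases shows whether a prompt or agent change helped; it would normally sit alongside other checks, such as whether the query runs and returns rows.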

Common Pitfalls

1. Relying on the raw user prompt alone can lead to inaccurate queries.

User prompts may lack context or detail, resulting in generated queries that do not meet the user's needs. Implementing a prompt enhancer can help improve the quality of input to the LLM.

2. Hallucinations in generated queries can lead to errors.

LLMs may produce queries referencing non-existent tables or columns. Continuous refinement of prompts and the introduction of validation agents are necessary to mitigate this issue.
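The prompt enhancer suggested under the first pitfall sits between the user and the LLM and makes an underspecified prompt more explicit. As a rule-based sketch only (a production enhancer could just as well be an LLM rewrite step), the code below expands internal shorthand from a hypothetical glossary and, when the prompt names no time range, appends a default window so that assumption is stated rather than left for the model to guess. The glossary, the time-word list, and the 7-day default are all assumptions.

```python
import re

# Hypothetical glossary of internal shorthand.
GLOSSARY = {"gb": "gross bookings", "sf": "San Francisco"}
ABBREVIATION = re.compile(r"\b(" + "|".join(map(re.escape, GLOSSARY)) + r")\b", re.IGNORECASE)
# Words suggesting the user already specified a time range.
TIME_HINT = re.compile(r"\b(today|yesterday|day|days|week|month|quarter|year|since|between|last)\b", re.IGNORECASE)


def enhance_prompt(prompt: str, default_window: str = "the last 7 days") -> str:
    """Expand shorthand and, if no time range is mentioned, make a default one explicit."""
    enhanced = ABBREVIATION.sub(lambda m: GLOSSARY[m.group(0).lower()], prompt.strip())
    if not TIME_HINT.search(enhanced):
        enhanced = f"{enhanced.rstrip('?.! ')} over {default_window}"
    return enhanced


print(enhance_prompt("Total gb in SF?"))
# → Total gross bookings in San Francisco over the last 7 days
```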

Related Concepts

Generative AI Applications In Data Querying

Natural Language Processing For SQL Generation

Productivity Tools For Data Analysis