QueryGPT: Transforming SQL Query Generation with Generative AI

QueryGPT: Revolutionizing SQL Query Generation with Natural Language Processing

Uber has built QueryGPT, a generative AI tool that turns natural-language questions into SQL queries. It is aimed at the engineers, operations managers, and data scientists who query Uber’s data repositories, and its goal is to cut the time they spend writing queries by hand.

The Genesis of QueryGPT

At Uber, approximately 1.2 million interactive queries are processed each month, and the Operations team alone accounts for 36% of them, so faster query authoring could save a substantial amount of time. QueryGPT was built to address this, with the goal of reducing query authoring time from an average of 10 minutes to 3 minutes.

The concept of QueryGPT was born during Uber’s Generative AI Hackdays in May 2023. Since its inception, the tool has undergone numerous iterations, evolving from a proof of concept to a production-ready service that is reshaping how Uber employees interact with data.

Architecture and Evolution

Initial Design

The first version of QueryGPT utilized a straightforward Retrieval-Augmented Generation (RAG) system to fetch relevant samples for query generation. It vectorized user prompts and performed similarity searches on SQL samples and schemas to identify pertinent tables and query examples.
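The retrieval step described above can be sketched in a few lines of Python. This is a minimal illustration, not Uber’s implementation: the bag-of-words `embed` function stands in for a real embedding model, and the sample names and SQL strings in `SQL_SAMPLES` are made up for the example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(prompt: str, documents: dict[str, str], k: int = 2) -> list[str]:
    """Return the names of the k documents most similar to the prompt."""
    query = embed(prompt)
    ranked = sorted(documents, key=lambda name: cosine(query, embed(documents[name])), reverse=True)
    return ranked[:k]


# Hypothetical SQL samples; names, tables, and comments are invented for illustration.
SQL_SAMPLES = {
    "trips_by_city": "SELECT city, COUNT(*) FROM trips GROUP BY city  -- number of trips per city",
    "driver_ratings": "SELECT driver_id, AVG(rating) FROM ratings GROUP BY driver_id  -- average driver rating",
    "ad_spend": "SELECT campaign, SUM(spend) FROM ads GROUP BY campaign  -- total ad spend per campaign",
}

best = top_k("How many trips were completed in each city?", SQL_SAMPLES, k=1)
print(best)
```

In a real system the retrieved samples and schemas would then be placed in the LLM prompt as context for query generation.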

Current Architecture

As QueryGPT matured, its architecture became more sophisticated:

Workspaces: Curated collections of SQL samples and tables for specific business domains.
Intent Agent: Maps user questions to appropriate business domains.
Table Agent: Selects and verifies the correct tables for query generation.
Column Prune Agent: Optimizes schema input by removing irrelevant columns.

This refined structure has significantly improved the accuracy and relevance of generated queries while managing token limits and reducing latency.

Key Features and Improvements

Workspaces

Workspaces categorize SQL samples and tables into specific business domains such as Mobility, Core Services, and Ads. Because the search is restricted to the relevant domain, the system can provide more focused and accurate query suggestions.

Intent Classification

The Intent Agent plays a crucial role in understanding user queries. By mapping natural language prompts to specific business domains, it narrows down the search scope and improves the relevance of the generated SQL.
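One way to picture this mapping is as a constrained classification prompt whose answer is checked against the known workspaces. The sketch below is an assumption about how such a step could look, not the actual Intent Agent: the prompt wording, the `classify_intent` function, the fallback domain, and the `fake_llm` stand-in are all hypothetical.

```python
from typing import Callable

WORKSPACES = ["Mobility", "Core Services", "Ads"]  # domains named in the article


def build_intent_prompt(question: str) -> str:
    """Build a classification prompt that lists the allowed domains."""
    options = ", ".join(WORKSPACES)
    return (
        f"Classify the question into exactly one of these business domains: {options}.\n"
        f"Question: {question}\n"
        "Answer with the domain name only."
    )


def classify_intent(question: str, llm: Callable[[str], str], default: str = "Core Services") -> str:
    """Ask the model for a domain and accept the answer only if it is a known workspace."""
    answer = llm(build_intent_prompt(question)).strip().lower()
    for workspace in WORKSPACES:
        if workspace.lower() == answer:
            return workspace
    return default  # unrecognized answer: fall back instead of trusting free-form output


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; looks for a keyword on the question line."""
    question_line = prompt.splitlines()[1].lower()
    return "Mobility" if "trip" in question_line else "Something else"


print(classify_intent("How many trips were completed yesterday?", fake_llm))
```

Validating the answer against a fixed list keeps a malformed model response from selecting a workspace that does not exist.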

Table Selection and Verification

The Table Agent allows users to verify and modify the tables selected for query generation, ensuring that the most appropriate data sources are used.

Schema Optimization

To address token limit issues with large schemas, the Column Prune Agent intelligently reduces the schema size by removing irrelevant columns, improving both efficiency and cost-effectiveness.
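The effect of pruning on prompt size can be illustrated with a small sketch. How the relevant columns are chosen is the agent’s job and is not shown here; the `relevant` set, the table and column names, and the `render` helper are assumptions made for the example.

```python
def prune_columns(schema: dict[str, list[str]], relevant: set[str]) -> dict[str, list[str]]:
    """Keep only the columns judged relevant; drop tables left with no columns."""
    pruned = {}
    for table, columns in schema.items():
        kept = [column for column in columns if column in relevant]
        if kept:
            pruned[table] = kept
    return pruned


def render(schema: dict[str, list[str]]) -> str:
    """Render a schema as the text that would be placed in the LLM prompt."""
    return "\n".join(f"{table}({', '.join(columns)})" for table, columns in schema.items())


# Hypothetical schema; real tables may have hundreds of columns.
SCHEMA = {
    "trips": ["trip_id", "city", "status", "fare", "surge_multiplier", "device_os"],
    "drivers": ["driver_id", "signup_date", "vehicle_type"],
}

# In practice this set would come from a model or heuristic that reads the user question.
relevant = {"trip_id", "city", "status"}

pruned = prune_columns(SCHEMA, relevant)
print(render(pruned))
print(len(render(SCHEMA)), "->", len(render(pruned)), "characters")
```

A shorter schema means fewer input tokens per request, which is where the efficiency and cost gains come from.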

Evaluation and Quality Assurance

Uber has implemented a robust evaluation system to ensure the continuous improvement of QueryGPT:

A curated set of golden question-to-SQL mappings for standardized testing.
Flexible evaluation procedures to capture performance signals throughout the query generation process.
Metrics tracking for intent accuracy, table selection, query execution success, and output quality.

This comprehensive approach allows the team to identify patterns, address shortcomings, and measure incremental improvements over time.
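A golden-set evaluation of this kind can be sketched as follows. The metric definitions are assumptions chosen for illustration (table selection is scored here as Jaccard overlap between expected and predicted table sets), and the cases and predictions are invented; the article does not specify how Uber computes its metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldenCase:
    question: str
    intent: str
    tables: frozenset[str]


@dataclass(frozen=True)
class Prediction:
    intent: str
    tables: frozenset[str]
    executed: bool  # whether the generated SQL ran without error


def table_overlap(expected: frozenset[str], predicted: frozenset[str]) -> float:
    """Jaccard overlap between expected and predicted tables, from 0 to 1."""
    if not expected and not predicted:
        return 1.0
    return len(expected & predicted) / len(expected | predicted)


def evaluate(cases: list[GoldenCase], predictions: list[Prediction]) -> dict[str, float]:
    """Aggregate per-case signals into summary metrics."""
    if not cases or len(cases) != len(predictions):
        raise ValueError("cases and predictions must be non-empty and the same length")
    n = len(cases)
    pairs = list(zip(cases, predictions))
    return {
        "intent_accuracy": sum(c.intent == p.intent for c, p in pairs) / n,
        "mean_table_overlap": sum(table_overlap(c.tables, p.tables) for c, p in pairs) / n,
        "execution_success_rate": sum(p.executed for _, p in pairs) / n,
    }


# Hypothetical golden cases and system outputs.
cases = [
    GoldenCase("Trips per city last week", "Mobility", frozenset({"trips"})),
    GoldenCase("Spend per campaign", "Ads", frozenset({"ads", "campaigns"})),
]
predictions = [
    Prediction("Mobility", frozenset({"trips"}), executed=True),
    Prediction("Core Services", frozenset({"ads"}), executed=False),
]

metrics = evaluate(cases, predictions)
print(metrics)
```

Tracking each signal separately makes it possible to see whether a regression comes from intent classification, table selection, or the generated SQL itself.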

Challenges and Learnings

Throughout the development of QueryGPT, the team at Uber encountered several challenges:

LLM Capabilities and Limitations

While Large Language Models (LLMs) proved excellent at classification tasks, they also showed a tendency to hallucinate, generating queries that reference non-existent tables or columns.
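A simple guard against this failure mode is to check the tables a generated query references against the tables that actually exist. The sketch below is a rough illustration under stated assumptions: it uses a regular expression rather than a real SQL parser, so it only catches plain `FROM` and `JOIN` references, and the table names are hypothetical.

```python
import re

# Matches the identifier that follows FROM or JOIN; not a full SQL parser.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def unknown_tables(sql: str, known_tables: set[str]) -> set[str]:
    """Return referenced table names that are not in the known set (case-insensitive)."""
    referenced = {name.lower() for name in TABLE_REF.findall(sql)}
    return referenced - {name.lower() for name in known_tables}


# Hypothetical catalog and generated query.
KNOWN_TABLES = {"trips", "drivers"}
generated_sql = "SELECT t.city FROM trips t JOIN trip_fares f ON t.id = f.trip_id"

missing = unknown_tables(generated_sql, KNOWN_TABLES)
print(missing)
```

A non-empty result could be used to reject the query or to ask the model to regenerate it with the valid table list.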

User Input Variability

The quality and detail of user prompts varied significantly, necessitating the development of prompt enhancement techniques to ensure consistent query generation.

High Expectations for Accuracy

Users expect highly accurate and functional SQL output, setting a high bar for the system’s performance.

Impact and Future Directions

QueryGPT has made a significant impact at Uber:

Approximately 300 daily active users across Operations and Support teams.
78% of users report reduced query writing time.

As QueryGPT continues to evolve, the team at Uber is focusing on:

Reducing hallucinations through improved prompts and validation agents.
Expanding the tool’s capabilities to handle more complex queries.
Integrating user feedback to refine and enhance performance.

Conclusion

QueryGPT sits at the intersection of natural language processing and database querying. By applying generative AI to query authoring, Uber has created a tool that boosts productivity and makes data accessible to more people across the organization. As the system continues to improve, it is likely to become a core part of Uber’s data-driven culture.

The success of QueryGPT demonstrates the potential of AI-assisted tools to simplify complex technical tasks. As more companies grapple with the challenges of big data, solutions like QueryGPT may well become a standard way to interact with and analyze data. QueryGPT’s path from a hackathon idea to a widely used production tool shows how iterative development can turn a proof of concept into a dependable service.

Prachi Kothiyal
