Navigating the LLM Landscape: Uber’s Innovation with GenAI Gateway

Tse-Chi Wang, Roopansh Bansal

Overview

The article discusses Uber's GenAI Gateway, a unified platform designed to streamline the integration of Large Language Models (LLMs) across various teams within the company. It highlights the challenges faced in LLM integration, the architectural decisions made to enhance interoperability, and the benefits of using the GenAI Gateway for improved operational efficiency and security.

What You'll Learn

1. How to integrate multiple LLMs using a unified API
2. Why implementing a PII redactor is crucial for data security
3. When to use Uber-hosted LLMs versus third-party vendors

Prerequisites & Requirements

  • Understanding of Large Language Models and their applications
  • Familiarity with API integration and security practices (optional)

Key Questions Answered

What is the purpose of the GenAI Gateway at Uber?
The GenAI Gateway serves as a centralized platform that simplifies the integration of various LLMs, allowing teams to access models from different vendors like OpenAI and Vertex AI through a consistent interface. This streamlines operations and enhances security by standardizing data handling practices.
How does the PII redactor function in the GenAI Gateway?
The PII redactor scans input data to identify and replace personally identifiable information with anonymized placeholders. This process helps protect sensitive data before it is sent to third-party vendors, while an un-redaction process restores the original data in the responses.
What challenges does Uber face with LLM integration?
Before a shared platform existed, teams at Uber integrated LLMs with inconsistent strategies, which is the problem the GenAI Gateway was built to solve. The gateway brings trade-offs of its own: the PII redactor adds latency, and anonymization can make it harder to maintain response quality.
What impact has the GenAI Gateway had on customer support at Uber?
The GenAI Gateway has significantly improved customer support efficiency, with 97% of generated summaries deemed useful for resolving customer issues. Agents have reported time savings, allowing them to respond to users six seconds faster than before.
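
The redact and un-redact round trip described in the PII redactor answer above can be sketched as follows. This is a minimal illustration and not Uber's implementation: the regular expressions, the `<LABEL_n>` placeholder format, and the function names `redact` and `unredact` are all assumptions made for the example, and a production redactor would likely rely on a trained entity-recognition model instead of regexes.

```python
import re

# Hypothetical patterns for illustration only; real PII detection is harder.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text):
    """Replace PII with placeholders; return (redacted_text, mapping)."""
    mapping = {}   # placeholder -> original value, used later to un-redact
    reverse = {}   # original value -> placeholder, so repeats reuse one placeholder
    counters = {}  # per-label running counter

    def make_sub(label):
        def _sub(match):
            value = match.group(0)
            if value not in reverse:
                counters[label] = counters.get(label, 0) + 1
                placeholder = f"<{label}_{counters[label]}>"
                reverse[value] = placeholder
                mapping[placeholder] = value
            return reverse[value]
        return _sub

    for label, pattern in PATTERNS.items():
        text = pattern.sub(make_sub(label), text)
    return text, mapping


def unredact(text, mapping):
    """Restore original values in a response that echoes the placeholders."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text
```

In this sketch only the redacted text would be sent to a third-party vendor, and the mapping never leaves the caller; `unredact` is applied to the vendor's response before it is returned.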

Key Statistics & Figures

Monthly queries served by GenAI Gateway
16 million
This statistic highlights the scale at which the GenAI Gateway operates, serving a significant volume of requests across various customer teams.
Peak queries per second (QPS)
25
This peak performance metric indicates the system's capability to handle high traffic efficiently, ensuring responsiveness during busy periods.
Time savings for agents
6 seconds faster
This improvement in response time demonstrates the operational efficiency gained through the implementation of LLMs in customer support.
Percentage of useful summaries generated
97%
This high percentage reflects the effectiveness of LLMs in providing valuable insights for resolving customer issues.
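
To put the two traffic figures above side by side, the monthly volume can be converted to an average rate. The calculation below is a back-of-the-envelope sketch that assumes a 30-day month; the resulting average and ratio are derived here and are not figures reported in the article.

```python
MONTHLY_QUERIES = 16_000_000
PEAK_QPS = 25
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: 30-day month

average_qps = MONTHLY_QUERIES / SECONDS_PER_MONTH
peak_to_average = PEAK_QPS / average_qps

print(f"average QPS: {average_qps:.2f}")          # about 6.17
print(f"peak / average: {peak_to_average:.2f}")   # about 4.05
```

Under that assumption, the reported peak is roughly four times the average load.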

Technologies & Tools

Backend
GenAI Gateway
A unified platform for integrating various LLMs and managing their usage across Uber.
AI/ML
OpenAI
One of the LLM vendors integrated into the GenAI Gateway.
AI/ML
Vertex AI
Another LLM vendor integrated into the GenAI Gateway.

Key Actionable Insights

1. Implement a unified API for LLM access to enhance operational efficiency.
   By standardizing the API interface across various teams, Uber can reduce redundancy and improve integration speed, leading to faster deployment of LLM capabilities.
2. Utilize the PII redactor to ensure compliance with data privacy regulations.
   Incorporating a PII redactor is essential for protecting sensitive information, especially when dealing with third-party vendors, thus minimizing the risk of data breaches.
3. Encourage the use of Uber-hosted LLMs for sensitive applications.
   Using in-house models can eliminate the need for PII redaction, improving response times and maintaining data integrity, which is crucial for sensitive customer interactions.
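
The three insights above can be combined in one small sketch: a single `complete` entry point that routes by model name, redacts only when the target is a third-party vendor, and skips redaction for in-house models. Everything here is hypothetical and chosen for illustration, including the `Gateway` and `Provider` names, the email-only stand-in redactor, and the echo backend that takes the place of real vendor SDK calls.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.\w+")  # stand-in PII pattern (emails only)


def redact(prompt: str) -> Tuple[str, Dict[str, str]]:
    """Replace emails with placeholders; return the text and the mapping."""
    mapping: Dict[str, str] = {}

    def _sub(match):
        placeholder = f"<EMAIL_{len(mapping) + 1}>"
        mapping[placeholder] = match.group(0)
        return placeholder

    return EMAIL.sub(_sub, prompt), mapping


def unredact(text: str, mapping: Dict[str, str]) -> str:
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


@dataclass
class Provider:
    name: str
    complete: Callable[[str, str], str]  # (model, prompt) -> response text
    external: bool                       # True for third-party vendors


class Gateway:
    """One interface for every model, regardless of who hosts it."""

    def __init__(self) -> None:
        self._routes: Dict[str, Provider] = {}

    def register(self, model: str, provider: Provider) -> None:
        self._routes[model] = provider

    def complete(self, model: str, prompt: str) -> str:
        provider = self._routes.get(model)
        if provider is None:
            raise KeyError(f"unknown model: {model}")
        if not provider.external:
            # In-house model: data stays internal, so no redaction step.
            return provider.complete(model, prompt)
        safe_prompt, mapping = redact(prompt)
        return unredact(provider.complete(model, safe_prompt), mapping)


seen: List[str] = []  # prompts exactly as the stub backend received them


def echo_backend(model: str, prompt: str) -> str:
    """Stub backend standing in for a real vendor or in-house model call."""
    seen.append(prompt)
    return f"[{model}] {prompt}"


gateway = Gateway()
gateway.register("vendor-model", Provider("vendor", echo_backend, external=True))
gateway.register("in-house-model", Provider("in-house", echo_backend, external=False))
```

Callers use the same `gateway.complete(model, prompt)` call for both models; only the routing table decides whether the redaction cost is paid.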

Common Pitfalls

1. Failing to account for latency introduced by the PII redactor.
   This can lead to slower response times, especially with large input requests. To mitigate this, it's important to optimize the redaction process and consider using CPU-optimized models.
2. Inconsistent handling of anonymized placeholders can cause issues in LLM caching.
   When the same entity is redacted differently in various contexts, it can lead to inaccuracies in cached responses. Maintaining a consistent approach to anonymization is crucial for effective caching.
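
One way to keep placeholders consistent, so that the same entity always produces the same redacted prompt and therefore the same cache key, is to derive the placeholder from a keyed hash of the entity value. This is a technique swapped in here for illustration; the article does not say how Uber handles it. The key, the normalization rule (trim and lowercase), the 8-character digest prefix, and both function names are assumptions.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-only-key"  # hypothetical; a real key would come from a secret store


def stable_placeholder(label: str, value: str, key: bytes = SECRET_KEY) -> str:
    """Map an entity to the same placeholder on every request."""
    normalized = value.strip().lower().encode("utf-8")
    digest = hmac.new(key, normalized, hashlib.sha256).hexdigest()
    # Truncating keeps placeholders short but raises the collision risk.
    return f"<{label}_{digest[:8]}>"


def cache_key(model: str, redacted_prompt: str) -> str:
    """Cache on the model plus the redacted prompt, never the raw prompt."""
    return hashlib.sha256(f"{model}\n{redacted_prompt}".encode("utf-8")).hexdigest()
```

With counter-based placeholders, the same email could become `<EMAIL_1>` in one request and `<EMAIL_2>` in another, splitting or mismatching cache entries; a deterministic mapping avoids that at the cost of making repeated entities linkable across requests.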

Related Concepts

Large Language Models (LLMs)
Data Privacy And Security Practices
API Design And Integration Strategies