Building Scalable, Real-Time Chat to Improve Customer Experience

Avijit Singh, Vivek Shah, Ankit Tyagi
14 min read · Intermediate

Overview

The article discusses Uber's efforts to build a scalable, real-time chat system to enhance customer experience. It outlines the challenges faced, the architectural changes made, and the improvements in reliability and efficiency achieved through the new system.

What You'll Learn

1. How to scale a chat system to handle increased contact volume
2. Why using GraphQL subscriptions improves real-time communication
3. How to implement a push pipeline for efficient message delivery

Key Questions Answered

What challenges did Uber face in scaling their chat system?
Uber faced several challenges including reliability issues with message delivery, lack of insights into chat contact health, and limitations of their legacy architecture. These challenges hindered their ability to effectively scale the chat system to meet growing customer demands.
How did Uber improve the reliability of their chat system?
Uber improved reliability by moving to a new architecture that reduced the error rate of contact delivery from 46% in the old stack to approximately 0.45% in the new system. The gains came from better observability and a simpler architecture that was easier to scale.
What technologies were used in the new chat architecture?
The new chat architecture uses GraphQL subscriptions for real-time data communication, WebSocket for bidirectional communication, and Apache Kafka® for message handling. These technologies were chosen for their reliability and efficiency in handling high volumes of chat interactions.

Key Statistics & Figures

Share of overall Uber contact volume handled by chat: 36%
This figure reflects how far the new chat system has scaled.
Error rate for delivering contacts: 0.45%
Down from 46% in the old stack, this reduction demonstrates the improved reliability of the new architecture.
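Expressed as a relative change, using only the two error rates quoted above:

```latex
\text{relative reduction} = \frac{46\% - 0.45\%}{46\%} = \frac{45.55}{46} \approx 0.990,
\qquad
\frac{46\%}{0.45\%} \approx 102
```

That is roughly a 99% relative reduction, or about a hundredfold drop in failed contact deliveries.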

Technologies & Tools


Backend
GraphQL
Used for real-time data communication between the server and client.
Backend
WebSocket
Facilitates bidirectional communication between agent machines and the proxy layer.
Backend
Apache Kafka®
Serves as a message service for reliable and fast message handling.
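The technologies above can be pictured as one push pipeline. The following is a minimal single-process sketch, not Uber's implementation: an in-memory `PushBroker` stands in for the Kafka-backed message service, each `asyncio.Queue` stands in for one agent's WebSocket session, and `subscribe_agent` returns an async iterator, the form Python GraphQL libraries such as graphql-core consume for subscription fields. All class, function, and field names are hypothetical.

```python
from __future__ import annotations

import asyncio
from collections import defaultdict
from collections.abc import AsyncIterator
from dataclasses import dataclass


@dataclass
class ChatMessage:
    # Hypothetical message shape; the field names are illustrative.
    agent_id: str
    contact_id: str
    text: str


class PushBroker:
    """In-memory stand-in for a Kafka-backed message service.

    Each open agent session owns an asyncio.Queue. Publishing pushes the
    message directly onto the target agent's queues, so sessions never
    poll for new work.
    """

    def __init__(self) -> None:
        self._sessions: dict[str, set[asyncio.Queue]] = defaultdict(set)

    def connect(self, agent_id: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._sessions[agent_id].add(queue)
        return queue

    def disconnect(self, agent_id: str, queue: asyncio.Queue) -> None:
        self._sessions[agent_id].discard(queue)

    def session_count(self, agent_id: str) -> int:
        return len(self._sessions.get(agent_id, ()))

    def publish(self, message: ChatMessage) -> int:
        # Fan out to every open session of the addressed agent and report
        # how many received it (0 means the agent is not connected).
        targets = self._sessions.get(message.agent_id, set())
        for queue in targets:
            queue.put_nowait(message)
        return len(targets)


def subscribe_agent(broker: PushBroker, agent_id: str) -> AsyncIterator[ChatMessage]:
    """Return an async iterator of messages for one agent.

    Each yielded item would be sent to the client as one subscription
    event over its WebSocket connection.
    """
    queue = broker.connect(agent_id)  # register before any message is published

    async def events() -> AsyncIterator[ChatMessage]:
        try:
            while True:
                yield await queue.get()
        finally:
            broker.disconnect(agent_id, queue)  # runs when the client unsubscribes

    return events()


async def demo() -> tuple[int, int, str, int]:
    broker = PushBroker()
    stream = subscribe_agent(broker, "agent-1")
    to_agent_1 = broker.publish(ChatMessage("agent-1", "contact-42", "Where is my order?"))
    to_agent_2 = broker.publish(ChatMessage("agent-2", "contact-43", "Not connected"))
    first = await stream.__anext__()
    await stream.aclose()
    return to_agent_1, to_agent_2, first.text, broker.session_count("agent-1")


print(asyncio.run(demo()))  # (1, 0, 'Where is my order?', 0)
```

The design choice that makes this a push pipeline is that each message lands on the session's queue the moment it is published, instead of agent machines asking for new contacts on a timer; the `finally` clause releases the session when the subscriber closes the stream.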

Key Actionable Insights

1. Implement a push pipeline to enhance real-time communication in chat applications.
A push pipeline delivers messages efficiently and reduces latency, which is crucial for maintaining customer satisfaction in high-volume environments.

2. Use GraphQL subscriptions to streamline data flow between server and client.
GraphQL subscriptions provide a robust mechanism for real-time updates, ensuring that agents receive timely information to assist customers effectively.

3. Focus on observability to track the health of chat interactions.
Monitoring tools can help identify issues in real time, allowing quicker resolutions and better customer experiences.
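Insight 3 can be made concrete with a small health tracker for the contact-delivery error rate, the metric quoted earlier. This is a minimal sketch assuming each delivery attempt reports either "delivered" or "failed"; the `DeliveryHealth` class, the window size, and the 1% alert threshold are hypothetical choices, not values from the article.

```python
from __future__ import annotations

from collections import deque


class DeliveryHealth:
    """Sliding-window error rate for contact deliveries.

    deque(maxlen=window) drops the oldest outcome once full, so the rate
    reflects recent traffic and a regression surfaces quickly instead of
    being diluted by hours of healthy history. The default window and
    threshold are hypothetical.
    """

    OUTCOMES = ("delivered", "failed")

    def __init__(self, window: int = 1000, alert_threshold: float = 0.01) -> None:
        self._recent: deque[str] = deque(maxlen=window)
        self.alert_threshold = alert_threshold

    def record(self, outcome: str) -> None:
        if outcome not in self.OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome!r}")
        self._recent.append(outcome)

    @property
    def error_rate(self) -> float:
        # Fraction of failed deliveries among the outcomes in the window.
        if not self._recent:
            return 0.0
        return self._recent.count("failed") / len(self._recent)

    def should_alert(self) -> bool:
        return self.error_rate > self.alert_threshold


health = DeliveryHealth(window=200, alert_threshold=0.01)
for _ in range(199):
    health.record("delivered")
health.record("failed")
print(health.error_rate, health.should_alert())  # 0.005 False

for _ in range(4):
    health.record("failed")  # each append evicts the oldest "delivered"
print(health.error_rate, health.should_alert())  # 0.025 True
```

A bounded window trades long-term history for responsiveness; a lifetime counter could sit alongside it if both views are needed.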

Common Pitfalls

1. Failing to account for cookie management can disrupt authentication.
When agents clear their browser cookies, authentication can fail, delaying the handling of customer queries. Robust cookie management can prevent these issues.

2. Overlooking observability can leave issues undetected.
Without proper monitoring, teams may struggle to tell whether slow chat responses stem from technical issues or staffing problems, which leads to inefficient resource allocation.
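One way to act on the second pitfall is to record when each contact was created, assigned to an agent, and acknowledged by the agent's machine, then attribute slow contacts to a cause. The sketch assumes those three timestamps exist; the `ContactTimeline` fields, the `delay_causes` function, and the 60-second and 5-second thresholds are hypothetical, not taken from the article. A long wait before assignment points to staffing; a long gap between assignment and acknowledged delivery points to a technical problem.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContactTimeline:
    # Hypothetical lifecycle timestamps, in seconds on a shared clock.
    created_at: float
    routed_at: Optional[float] = None     # an available agent was assigned
    delivered_at: Optional[float] = None  # the agent's machine acknowledged receipt


def delay_causes(t: ContactTimeline, now: float,
                 max_queue_wait: float = 60.0,
                 max_delivery: float = 5.0) -> list[str]:
    """Attribute a slow contact to staffing, the delivery path, or both.

    The threshold defaults are hypothetical.
    """
    causes: list[str] = []
    # Time spent waiting for a free agent; still counting if never routed.
    routed = t.routed_at if t.routed_at is not None else now
    if routed - t.created_at > max_queue_wait:
        causes.append("staffing")
    # Time from assignment to acknowledged delivery; only meaningful once routed.
    if t.routed_at is not None:
        delivered = t.delivered_at if t.delivered_at is not None else now
        if delivered - t.routed_at > max_delivery:
            causes.append("technical")
    return causes


contacts = [
    ContactTimeline(created_at=0, routed_at=10, delivered_at=11),   # healthy
    ContactTimeline(created_at=0),                                  # no agent free yet
    ContactTimeline(created_at=0, routed_at=10),                    # never reached the agent
    ContactTimeline(created_at=0, routed_at=90, delivered_at=200),  # both problems
]
summary = Counter(c for t in contacts for c in delay_causes(t, now=300))
print(summary)  # Counter({'staffing': 2, 'technical': 2})
```

Summing the labels over a shift, as the last lines do, shows whether slow responses cluster under staffing or under technical causes, which is the distinction the pitfall says teams struggle to make.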