Uber’s Next Gen Push Platform on gRPC

Anirudh Raja, Shahbaz Kaladiya, Shivani Bhatia, Xinlin Peng

Overview

The article discusses Uber's transition from a Server-Sent Events (SSE) architecture to a gRPC-based push platform, detailing the motivations, implementation challenges, and outcomes of this migration. It highlights improvements in latency, message delivery success rates, and overall system reliability.

What You'll Learn

  1. How to implement bidirectional streaming using gRPC
  2. Why QUIC/HTTP3 improves mobile networking latency
  3. How to manage connection lifecycles effectively in gRPC

Prerequisites & Requirements

  • Understanding of gRPC and streaming protocols
  • Familiarity with Netty and Apache Cassandra (optional)

Key Questions Answered

What were the main motivations for Uber to switch to gRPC?
Uber transitioned to gRPC to enhance real-time messaging capabilities, improve latency, and enable bidirectional streaming, which allows for instant acknowledgments and better network utilization. This change was driven by the limitations of the previous SSE architecture, particularly in handling critical messages efficiently.
How did the migration to gRPC impact message delivery success rates?
The migration to gRPC increased push success rates by at least 1-2% across all apps. This improvement is attributed to the enhanced reliability and efficiency of the gRPC framework compared to the previous SSE-based system.
What challenges did Uber face during the transition to gRPC?
Uber encountered several challenges, including managing connection lifecycles, ensuring message delivery reliability, and handling concurrency issues. The need to maintain a consistent implementation across multiple clients added complexity to the migration process.
What are the key results achieved after implementing gRPC?
Post-implementation, Uber observed a 45% improvement in gRPC connect latency (p95) and an increase in message throughput. The consistent implementation across clients has reduced the likelihood of outages and improved overall system performance.

Key Statistics & Figures

  • gRPC connect latency (p95): 45% improvement. This improvement was achieved as a result of the migration to gRPC, allowing features to start earlier.
  • Push success rates: 1-2% increase. This increase was observed across all apps following the transition to the gRPC framework.

Technologies & Tools

  • gRPC (backend): Used for implementing the new RAMEN messaging protocol.
  • QUIC/HTTP3 (networking): Leveraged to improve mobile networking latency.
  • Netty (backend): Used for the internal server implementation of gRPC.
  • Apache Cassandra (database): Utilized for storing messages in the backend.

Key Actionable Insights

  1. Implementing gRPC can significantly enhance real-time communication in applications.
     By leveraging gRPC's bidirectional streaming capabilities, developers can achieve lower latency and improved message acknowledgment, which is crucial for applications that require immediate feedback.
  2. Utilizing QUIC/HTTP3 can lead to substantial performance improvements.
     QUIC/HTTP3's ability to reduce head-of-line blocking can enhance mobile networking experiences, particularly in environments with variable connectivity.
  3. Maintaining a consistent implementation across different clients is vital for reducing outages.
     A unified approach to client implementation can streamline development and maintenance, ensuring that all clients benefit from the latest features and improvements.
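
The acknowledgment flow in the first insight can be sketched in plain Java. This is only an illustration of the idea, not Uber's RAMEN protocol: the class `AckTracker`, its methods, and the `seq:payload` wire format are all hypothetical, and a `Consumer<String>` stands in for the server-to-client half of a real gRPC stream. The sender remembers each pushed message until an ack for its sequence number arrives on the other half of the same stream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.function.Consumer;

/** Hypothetical server-side bookkeeping for one bidirectional stream. */
class AckTracker {
    // Messages sent but not yet acknowledged, ordered by sequence number.
    private final ConcurrentSkipListMap<Long, String> pending = new ConcurrentSkipListMap<>();
    // Stand-in for the server-to-client half of the stream (assumption).
    private final Consumer<String> outbound;
    private long nextSeq = 1;

    AckTracker(Consumer<String> outbound) {
        this.outbound = outbound;
    }

    /** Push a message down the stream and remember it until the client acks it. */
    synchronized long push(String payload) {
        long seq = nextSeq++;
        pending.put(seq, payload);
        outbound.accept(seq + ":" + payload);
        return seq;
    }

    /** Called when an ack arrives on the client-to-server half of the same stream. */
    boolean onAck(long seq) {
        return pending.remove(seq) != null; // false for unknown or duplicate acks
    }

    /** Messages that would need redelivery if the stream dropped right now. */
    List<String> unacked() {
        return new ArrayList<>(pending.values());
    }
}
```

Because the ack travels on the already-open stream, the sender learns about delivery without a separate request, which is the property the insight describes.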

Common Pitfalls

  1. Failing to manage connection lifecycles can lead to high CPU and memory usage.
     If connection callbacks are not handled properly, tasks may run indefinitely, consuming resources unnecessarily. Implementing checks to ensure that streams are writable before attempting to write can mitigate this issue.
  2. Not synchronizing access to shared resources can cause concurrency issues.
     Using a single instance of StreamObserver without proper synchronization can lead to thread safety problems. Following gRPC specifications for synchronization is essential to avoid these pitfalls.
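
Both pitfalls can be illustrated with one small guard around the outbound stream. The sketch below is an assumption-laden stand-in, not the article's code: `WritableStream` is a local interface that mimics only the `isReady()` and `onNext()` calls found on grpc-java's `CallStreamObserver`, and `GuardedSender` is a hypothetical name. In a real service, `onReady()` and `close()` would presumably be wired to the observer's `setOnReadyHandler` and `setOnCancelHandler` callbacks.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Local stand-in for the part of a gRPC stream observer this sketch needs. */
interface WritableStream<T> {
    boolean isReady();    // true when the transport can accept another write
    void onNext(T value); // sends one message down the stream
}

/** Hypothetical wrapper: one lock, a writability check, and a closed flag. */
class GuardedSender<T> {
    private final WritableStream<T> stream;
    private final Queue<T> backlog = new ArrayDeque<>();
    private final Object lock = new Object();
    private boolean closed = false;

    GuardedSender(WritableStream<T> stream) {
        this.stream = stream;
    }

    /** Safe to call from any thread. Returns false once the stream is closed. */
    boolean send(T message) {
        synchronized (lock) {
            if (closed) {
                return false; // drop work instead of retrying forever
            }
            backlog.add(message);
            drainLocked();
            return true;
        }
    }

    /** Call from the stream's readiness callback to flush queued messages. */
    void onReady() {
        synchronized (lock) {
            drainLocked();
        }
    }

    /** Call from the cancel or close callback so queued work is released. */
    void close() {
        synchronized (lock) {
            closed = true;
            backlog.clear();
        }
    }

    int backlogSize() {
        synchronized (lock) {
            return backlog.size();
        }
    }

    // Writes only while the stream is open and reports itself writable.
    private void drainLocked() {
        while (!closed && !backlog.isEmpty() && stream.isReady()) {
            stream.onNext(backlog.poll());
        }
    }
}
```

Every call to `onNext` happens under the same lock, so concurrent producers never touch the underlying stream at the same time, and a closed connection stops accepting work rather than leaving tasks running. The backlog here is unbounded for brevity; a production version would likely cap it.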

Related Concepts

  • gRPC and its advantages over traditional HTTP protocols
  • The role of QUIC/HTTP3 in modern networking
  • Best practices for managing mobile client connections