Building Uber’s Fulfillment Platform for Planet-Scale using Google Cloud Spanner

Ankit Srivastava, Fabin Jose, Jean He, Nandakumar Gopalakrishnan, bowie@uber.com, Ramachandran Iyer, Uday Kiran Medisetty

AWS, CockroachDB, Google Cloud, gRPC, MySQL, SQL

Overview

This article discusses the development of Uber's Fulfillment Platform using Google Cloud Spanner, focusing on its architecture, scalability, and operational efficiency. It highlights the challenges faced during the transition from a NoSQL to a NewSQL paradigm and the strategies implemented to optimize performance and reliability.

What You'll Learn

1. How to leverage Google Cloud Spanner for scalable database architecture
2. Why transitioning from NoSQL to NewSQL can enhance data consistency
3. How to implement effective caching strategies to improve performance

Prerequisites & Requirements

Understanding of database architectures and distributed systems
Familiarity with Google Cloud Platform services (optional)

Key Questions Answered

What are the main challenges in transitioning from NoSQL to NewSQL?

The main challenges include designing application workloads that align with NewSQL paradigms, building resilient networking architecture, and optimizing a new cloud database to handle Uber's scale. These challenges require careful planning and execution to ensure a smooth transition without compromising performance.

How does Uber ensure high availability with Cloud Spanner?

Uber achieves high availability by utilizing a multi-region configuration for Cloud Spanner, designed for 99.999% availability. This setup allows for low-latency, high-throughput reads while ensuring that write operations are managed effectively across regions.
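To make the 99.999% figure concrete, the arithmetic below converts an availability target into a yearly downtime budget. It is a minimal sketch: the function name `downtime_budget_minutes` is hypothetical, a 365-day year is assumed, and nothing here comes from Uber's or Google's tooling.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: non-leap year


def downtime_budget_minutes(availability_percent: float) -> float:
    """Return the minutes of downtime per year that an availability target allows."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


budget = downtime_budget_minutes(99.999)
print(f"{budget:.2f} minutes per year")  # prints "5.26 minutes per year"
```

In other words, a five-nines target leaves a little over five minutes of unavailability per year.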

What optimizations were made to improve gRPC performance?

Improvements included optimizing gRPC's channel pool to automatically forward requests to backup healthy channels during TCP resets, significantly reducing error rates. This proactive approach helps maintain high reliability in network communications essential for Cloud Spanner transactions.
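The failover idea can be sketched without any gRPC dependency. The example below is an illustration under stated assumptions, not Uber's implementation and not the gRPC API: `ChannelPool` is a hypothetical class, each "channel" is modeled as a plain callable, and a TCP reset is modeled as Python's built-in `ConnectionResetError`.

```python
from typing import Callable, List, Optional


class ChannelPool:
    """Round-robin pool that forwards a request to the next healthy channel on a reset."""

    def __init__(self, channels: List[Callable[[str], str]]) -> None:
        if not channels:
            raise ValueError("pool needs at least one channel")
        self._channels = channels
        self._healthy = [True] * len(channels)
        self._next = 0

    def send(self, request: str) -> str:
        last_error: Optional[Exception] = None
        # Try each channel at most once per request.
        for _ in range(len(self._channels)):
            index = self._next
            self._next = (self._next + 1) % len(self._channels)
            if not self._healthy[index]:
                continue
            try:
                return self._channels[index](request)
            except ConnectionResetError as exc:
                # Mark the channel unhealthy and fall through to a backup.
                self._healthy[index] = False
                last_error = exc
        raise ConnectionError("no healthy channel available") from last_error


def broken_channel(request: str) -> str:
    # Hypothetical channel that always sees a TCP reset.
    raise ConnectionResetError("TCP reset")


def working_channel(request: str) -> str:
    # Hypothetical channel that always succeeds.
    return f"ok:{request}"


pool = ChannelPool([broken_channel, working_channel])
first = pool.send("commit")
second = pool.send("read")
```

The caller never sees the reset: the first request is retried on the backup channel, and later requests skip the channel that failed. A production pool would also re-probe and restore unhealthy channels, which this sketch leaves out.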

Key Statistics & Figures

Availability guarantee

99.999%

This is achieved through a multi-region configuration in Cloud Spanner.

Cost contribution of node expenses

80%

Node costs significantly impact the overall expenses of maintaining Cloud Spanner.

Error reduction

4x

Optimizations at the gRPC protocol layer reduced error rates by about 4x.

Technologies & Tools

Database

Google Cloud Spanner

Used as the primary database solution for Uber's Fulfillment Platform.

Communication Protocol

gRPC

Optimized for high reliability and performance in network communications.

Key Actionable Insights

1. Implement a multi-region configuration in Cloud Spanner to enhance availability and performance.
This approach allows for low-latency reads and high availability, crucial for applications with a global user base like Uber's.

2. Utilize caching strategies to reduce database load and improve response times.
By implementing an on-prem cache, Uber can serve stale reads quickly, minimizing the impact on Cloud Spanner and optimizing resource usage.

3. Monitor and analyze transaction performance to identify and mitigate conflicts.
Using tools like the Transaction Analyzer helps Uber manage transaction states effectively, reducing the occurrence of errors and improving overall system reliability.
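The caching insight above can be illustrated with a read-through cache that tolerates bounded staleness. This is a minimal sketch, not Uber's cache: `StaleReadCache`, `max_staleness_seconds`, and the `load` callback (standing in for a database read) are all hypothetical names, and the clock is injectable only so the behavior is easy to check.

```python
import time
from typing import Callable, Dict, Tuple


class StaleReadCache:
    """Serve cached values up to a maximum age; otherwise reload from the backing store."""

    def __init__(
        self,
        load: Callable[[str], str],
        max_staleness_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load
        self._max_staleness = max_staleness_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}
        self.loads = 0  # number of reads that reached the backing store

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] <= self._max_staleness:
            return entry[0]  # fresh enough: no database round trip
        value = self._load(key)
        self.loads += 1
        self._entries[key] = (value, now)
        return value


# Usage with a fake clock and a fake database read (both hypothetical).
fake_now = [0.0]
cache = StaleReadCache(
    load=lambda key: f"row-for-{key}",
    max_staleness_seconds=5.0,
    clock=lambda: fake_now[0],
)
cache.get("trip-1")   # miss: goes to the backing store
fake_now[0] = 3.0
cache.get("trip-1")   # hit: served from cache within the staleness bound
fake_now[0] = 10.0
cache.get("trip-1")   # expired: reloaded from the backing store
```

Of the three reads, only two reach the backing store. The staleness bound is the trade-off to tune: a larger value offloads more reads from the database at the cost of older data.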

Common Pitfalls

1. Failing to optimize database queries can lead to performance bottlenecks.
Without careful query modeling and optimization, applications may experience increased latency and reduced throughput, especially under heavy load.

2. Neglecting the impact of network latency on transaction performance.
Intermittent network issues can cause significant delays in transaction processing, leading to user dissatisfaction and operational inefficiencies.

Related Concepts

Distributed Systems

Database Scalability

Cloud Architecture Best Practices