Code Migration in Production: Rewriting the Sharding Layer of Uber’s Schemaless Datastore

Jesper Lindstrom Nielsen, Anders Johnsen
8 min read · intermediate

Overview

The article discusses Uber's experience in rewriting the sharding layer of its Schemaless datastore from Python to Go, highlighting the challenges and successes of performing this migration in a production environment. The project, named Frontless, aimed to improve performance and resource utilization while ensuring zero downtime during the transition.

What You'll Learn

1. How to migrate a production system from Python to Go without downtime
2. Why using goroutines in Go improves resource utilization
3. How to validate new implementations against legacy systems in production
4. When to implement automated integration tests for new endpoints

Prerequisites & Requirements

  • Understanding of Python and Go programming languages
  • Experience with production system migrations (optional)

Key Questions Answered

What were the performance improvements after migrating to Go?
After migrating to Go, the median request latency decreased by 85 percent, and the p99 request latency decreased by 70 percent. Additionally, CPU utilization dropped by more than 85 percent, allowing for reduced worker nodes across Schemaless instances.

How did Uber ensure zero downtime during the migration?
Uber ensured zero downtime by implementing the Frontless worker node as a proxy to the existing uWSGI Schemaless worker nodes. This allowed requests to be validated and compared in real time, ensuring that both systems produced the same results without disrupting service.

What validation methods were used for the new endpoints?
Validation involved executing requests on both the Frontless and Schemaless workers and comparing their responses. This method allowed for immediate identification of discrepancies, ensuring that the new implementation matched the legacy system's output.

What challenges did Uber face during the migration process?
Uber faced the challenge of migrating a critical system while new features and bug fixes were still being developed in the existing production environment. This required an iterative development approach in which features were continuously validated and migrated from Python to Go.
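
The validation approach described in the answers above (run each request on both workers and compare the responses) can be sketched in Go roughly as follows. This is a minimal illustration and not Uber's code: the `Handler` and `Validator` types, the in-memory handlers, and the choice to always answer from the legacy side are assumptions made for this example.

```go
package main

import (
	"bytes"
	"fmt"
)

// Handler is a hypothetical stand-in for one endpoint implementation.
type Handler func(request string) ([]byte, error)

// Validator runs every request on both implementations and counts mismatches.
// It is not safe for concurrent use; a real service would need synchronization.
type Validator struct {
	Legacy     Handler // existing implementation, treated as the source of truth
	Candidate  Handler // new implementation under validation
	Mismatches int
}

// Serve returns the legacy response in every case, so a bug in the candidate
// cannot reach callers (an assumption of this sketch). Differences in the
// payload or in whether an error occurred are recorded as mismatches.
func (v *Validator) Serve(request string) ([]byte, error) {
	want, wantErr := v.Legacy(request)
	got, gotErr := v.Candidate(request)
	if (wantErr == nil) != (gotErr == nil) || !bytes.Equal(want, got) {
		v.Mismatches++
		fmt.Printf("mismatch for %q: legacy=%q candidate=%q\n", request, want, got)
	}
	return want, wantErr
}

func main() {
	legacy := func(r string) ([]byte, error) { return []byte("cell:" + r), nil }
	// The candidate deliberately disagrees on one key to show detection.
	candidate := func(r string) ([]byte, error) {
		if r == "k2" {
			return []byte("cell:???"), nil
		}
		return []byte("cell:" + r), nil
	}

	v := &Validator{Legacy: legacy, Candidate: candidate}
	for _, key := range []string{"k1", "k2", "k3"} {
		if _, err := v.Serve(key); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println("mismatches:", v.Mismatches)
}
```

Answering from the legacy side while comparing in the background keeps the experiment invisible to clients. The mismatch count then becomes the signal for whether an endpoint is ready to be served by the new code.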

Key Statistics & Figures

Median request latency reduction: 85 percent
This was achieved after all read endpoints for the Mezzanine datastore were handled by Frontless.

p99 request latency reduction: 70 percent
This improvement was part of the overall performance enhancements following the migration.

CPU utilization reduction: more than 85 percent
This efficiency gain allowed for a reduction in the number of worker nodes used across all Schemaless instances.

Key Actionable Insights

1. Implementing an iterative development process can significantly reduce risks during system migrations.
   By validating each endpoint before going live, Uber was able to ensure that the new implementation met performance standards and functioned correctly without disrupting existing services.
2. Utilizing lightweight concurrency features in Go can lead to substantial performance improvements.
   The use of goroutines allowed Uber to handle more traffic with fewer resources, demonstrating the importance of choosing the right programming language for scalability.
3. Automated integration tests are essential for validating new implementations against legacy systems.
   These tests can expedite the development cycle and ensure that new features do not introduce bugs, maintaining system reliability.
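
To make the goroutine insight concrete, here is a small sketch of the pattern: each blocking call runs in its own goroutine, so waiting on one does not hold up the others. The `fetchCell` function, its 10 ms delay, and the key names are hypothetical stand-ins for a storage call. Nothing here is taken from the Frontless code base.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fetchCell is a hypothetical stand-in for a blocking storage call.
func fetchCell(key string) string {
	time.Sleep(10 * time.Millisecond) // simulated I/O wait
	return "value-of-" + key
}

// fetchAll starts one goroutine per key and returns results in input order.
func fetchAll(keys []string) []string {
	results := make([]string, len(keys))
	var wg sync.WaitGroup
	for i, key := range keys {
		wg.Add(1)
		go func(i int, key string) {
			defer wg.Done()
			// Each goroutine writes only to its own slot, so no lock is needed.
			results[i] = fetchCell(key)
		}(i, key)
	}
	wg.Wait() // block until every goroutine has finished
	return results
}

func main() {
	keys := make([]string, 100)
	for i := range keys {
		keys[i] = fmt.Sprintf("k%d", i)
	}

	start := time.Now()
	out := fetchAll(keys)
	// The waits overlap, so the total is far below 100 sequential 10 ms calls.
	fmt.Println(len(out), "results in", time.Since(start).Round(time.Millisecond))
}
```

Goroutines are cheap enough to start one per request. While one is blocked on I/O, the Go runtime schedules others on the same OS threads, so a single process can keep many requests in flight.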

Common Pitfalls

1. Failing to validate new implementations against legacy systems can lead to undetected bugs.
   Without proper validation, discrepancies between the old and new systems may go unnoticed, resulting in potential service disruptions and degraded user experiences.
2. Underestimating the complexity of migrating critical systems in production.
   Migration processes can introduce unexpected challenges, especially when new features are continuously being developed, making it crucial to adopt an iterative approach.
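
One way to picture the iterative approach that guards against both pitfalls is a per-endpoint rollout table, in which an endpoint moves forward only after its validation phase is clean. The sketch below is an assumption-laden illustration: the three stages, the `Rollout` type, and the endpoint names are invented for this example and do not describe how Uber tracked progress.

```go
package main

import "fmt"

// Stage describes how far one endpoint has moved to the new implementation.
type Stage int

const (
	Proxy    Stage = iota // forward to the legacy worker only
	Validate              // run both, compare, still answer from legacy
	Native                // answer from the new implementation
)

// Rollout tracks the stage of each endpoint (all names are hypothetical).
// Endpoints that are not listed default to Proxy, the zero value.
type Rollout map[string]Stage

// Promote advances an endpoint by one stage. Leaving the Validate stage is
// refused while the observed mismatch count is above zero.
func (r Rollout) Promote(endpoint string, mismatches int) error {
	stage := r[endpoint]
	switch {
	case stage == Native:
		return fmt.Errorf("%s is already native", endpoint)
	case stage == Validate && mismatches > 0:
		return fmt.Errorf("%s has %d unresolved mismatches", endpoint, mismatches)
	}
	r[endpoint] = stage + 1
	return nil
}

func main() {
	r := Rollout{"get_cell": Validate, "put_cell": Proxy}

	// Refused: validation still reports differences.
	fmt.Println(r.Promote("get_cell", 3))

	// Accepted once the mismatches have been fixed.
	if err := r.Promote("get_cell", 0); err == nil {
		fmt.Println("get_cell native:", r["get_cell"] == Native)
	}
}
```

Moving one endpoint at a time keeps each change small. The legacy system can keep receiving features and bug fixes while the remaining endpoints wait their turn.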

Related Concepts

Microservices Architecture
Continuous Integration and Deployment
Concurrency in Programming Languages