Akshay Jetli, Deepak Bobbarjung, Sergey Gitlin, Andy Maule • 15 min read • intermediate
Overview
This article describes how Uber's Experiment Evaluation Engine achieved a 100x reduction in evaluation latency by moving from a remote (RPC-based) evaluation architecture to a local one. It covers the challenges faced during the transition and the resulting impact on Uber's experimentation platform.
What You'll Learn
1. How to implement local experiment evaluation to improve latency
2. Why transitioning from RPC-based to client-side evaluation enhances reliability
3. When to apply shadow testing for verifying system changes at scale
Prerequisites & Requirements
- Understanding of A/B testing and experimentation concepts
- Experience with microservices architecture (optional)
Key Questions Answered
How did Uber reduce experiment evaluation latency by 100x?
Uber achieved a 100x reduction in experiment evaluation latency by moving from a remote evaluation architecture, which relied on RPC calls, to a local evaluation architecture that performed computations on the client side. This shift allowed evaluations to occur in microseconds instead of milliseconds, significantly enhancing performance.
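To make the difference concrete, here is a minimal Go sketch of local evaluation. It is not Uber's implementation: the `Experiment` and `LocalEvaluator` types, the hash-based bucketing, and the experiment name are all assumptions chosen for illustration. The point it shows is that once experiment configs are cached in process memory, an evaluation is a map lookup plus a hash, with no network round trip.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// Experiment is a hypothetical, simplified experiment definition.
type Experiment struct {
	Name         string
	TreatmentPct uint32 // share of units assigned to treatment, 0-100
}

// LocalEvaluator keeps experiment configs in process memory so that
// evaluating an experiment requires no RPC.
type LocalEvaluator struct {
	mu          sync.RWMutex
	experiments map[string]Experiment
}

func NewLocalEvaluator() *LocalEvaluator {
	return &LocalEvaluator{experiments: make(map[string]Experiment)}
}

// Update swaps in a fresh set of configs. In a real system a background
// poller (assumed, not shown) would call this when configs change.
func (e *LocalEvaluator) Update(exps []Experiment) {
	m := make(map[string]Experiment, len(exps))
	for _, x := range exps {
		m[x.Name] = x
	}
	e.mu.Lock()
	e.experiments = m
	e.mu.Unlock()
}

// Evaluate deterministically maps (experiment, unit) to a group by
// hashing into 100 buckets. The bool is false for unknown experiments.
func (e *LocalEvaluator) Evaluate(name, unitID string) (string, bool) {
	e.mu.RLock()
	exp, ok := e.experiments[name]
	e.mu.RUnlock()
	if !ok {
		return "", false
	}
	h := fnv.New32a()
	h.Write([]byte(name))
	h.Write([]byte{0}) // separator so "ab"+"c" differs from "a"+"bc"
	h.Write([]byte(unitID))
	if h.Sum32()%100 < exp.TreatmentPct {
		return "treatment", true
	}
	return "control", true
}

func main() {
	ev := NewLocalEvaluator()
	ev.Update([]Experiment{{Name: "new_search_ranking", TreatmentPct: 50}})
	group, ok := ev.Evaluate("new_search_ranking", "user-123")
	fmt.Println(group, ok)
}
```

Because the assignment depends only on the experiment name and the unit ID, every process holding the same config returns the same group for the same user, which is what allows the evaluation to move out of a central service.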
What challenges did Uber face during the transition to local evaluation?
Uber faced several challenges, including ensuring the reliability of the new Parameter Service, managing logging volume due to increased evaluation speed, and the need for shadow testing to verify the accuracy of the new system against the legacy one. These challenges required careful planning and execution to mitigate risks.
What improvements were observed in user-facing functionality after the changes?
After the change, Uber observed a 20% reduction in end-to-end backend latency for UberEats search suggestion indexing, with p99 latency dropping from 250 ms to 200 ms. This improvement led to a better user experience in the mobile app.
Key Statistics & Figures
- p99 latency of experiment evaluations: from 10 ms to 100 µs. This improvement was achieved through the transition to a local evaluation architecture.
- Reduction in UberEats search suggestion indexing latency: 20%. This reduction improved the mobile app experience significantly.
Technologies & Tools
- Backend: Golang. Used for implementing the backend microservices that benefit from the new local evaluation architecture.
- Messaging: Apache Kafka. Utilized for processing and distributing experiment logs.
Key Actionable Insights
1. Implement local evaluation for experiments to significantly reduce latency. By processing evaluations on the client side, organizations can achieve faster response times, which is crucial for real-time applications like Uber's services.
2. Utilize shadow testing to ensure new implementations match legacy systems. This method allows teams to validate the accuracy of new systems against established ones, ensuring reliability before full deployment.
3. Monitor logging volume closely when implementing faster evaluation systems. Increased evaluation speeds can lead to a surge in log production, potentially overwhelming processing systems. Implementing telemetry can help manage this risk.
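One way to keep an eye on log volume, sketched here in Go under stated assumptions, is to wrap the log emitter in a small component that counts what it emits and suppresses repeat exposures for the same experiment and unit. The `ExposureLogger` type and its deduplication policy are hypothetical; the article does not say which mitigation Uber used beyond telemetry.

```go
package main

import (
	"fmt"
	"sync"
)

// ExposureLogger is a hypothetical wrapper that emits at most one log
// per (experiment, unit) pair and keeps counters for telemetry.
type ExposureLogger struct {
	mu         sync.Mutex
	seen       map[string]struct{}
	emit       func(experiment, unitID, group string)
	emitted    int
	suppressed int
}

// NewExposureLogger takes the real sink (for example, a Kafka producer
// wrapper) as a plain function so the sketch stays self-contained.
func NewExposureLogger(emit func(experiment, unitID, group string)) *ExposureLogger {
	return &ExposureLogger{seen: make(map[string]struct{}), emit: emit}
}

// Log forwards the first exposure for a pair and drops later repeats.
func (l *ExposureLogger) Log(experiment, unitID, group string) {
	key := experiment + "\x00" + unitID
	l.mu.Lock()
	_, dup := l.seen[key]
	if dup {
		l.suppressed++
	} else {
		l.seen[key] = struct{}{}
		l.emitted++
	}
	l.mu.Unlock()
	if !dup {
		l.emit(experiment, unitID, group)
	}
}

// Stats returns the counters a telemetry system could scrape.
func (l *ExposureLogger) Stats() (emitted, suppressed int) {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.emitted, l.suppressed
}

func main() {
	logger := NewExposureLogger(func(experiment, unitID, group string) {
		// A real sink would publish to the logging pipeline here.
	})
	// 1000 evaluations spread over 10 distinct users.
	for i := 0; i < 1000; i++ {
		logger.Log("new_search_ranking", fmt.Sprintf("user-%d", i%10), "treatment")
	}
	emitted, suppressed := logger.Stats()
	fmt.Println(emitted, suppressed) // prints "10 990"
}
```

The `seen` set in this sketch grows without bound; a production version would need expiry or a size cap, and the counters would be exported to whatever metrics system is in use.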
Common Pitfalls
1. Overlooking the need for shadow testing can lead to undetected discrepancies between old and new systems. Without shadow testing, teams may miss critical bugs that could affect performance and reliability, especially when transitioning to new architectures.
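As an illustration of the idea, the following Go sketch runs a legacy and a candidate evaluator side by side, always serves the legacy answer, and counts disagreements. The `Evaluator` function type, the `ShadowEvaluator` wrapper, and the sample evaluators are assumptions made for this example rather than anything taken from Uber's system.

```go
package main

import (
	"fmt"
	"sync"
)

// Evaluator is a hypothetical signature shared by both implementations.
type Evaluator func(experiment, unitID string) string

// ShadowEvaluator serves results from the legacy evaluator while
// comparing them against the candidate in the background of each call.
type ShadowEvaluator struct {
	legacy     Evaluator
	candidate  Evaluator
	mu         sync.Mutex
	total      int
	mismatches int
}

func NewShadowEvaluator(legacy, candidate Evaluator) *ShadowEvaluator {
	return &ShadowEvaluator{legacy: legacy, candidate: candidate}
}

// Evaluate returns the legacy result, so callers are never affected by
// a bug in the candidate; disagreements are only counted.
func (s *ShadowEvaluator) Evaluate(experiment, unitID string) string {
	want := s.legacy(experiment, unitID)
	got := s.candidate(experiment, unitID)
	s.mu.Lock()
	s.total++
	if got != want {
		s.mismatches++
	}
	s.mu.Unlock()
	return want
}

// Stats reports how many evaluations ran and how many disagreed.
func (s *ShadowEvaluator) Stats() (total, mismatches int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.total, s.mismatches
}

func main() {
	legacy := func(experiment, unitID string) string { return "control" }
	// The candidate deliberately disagrees for one user to show detection.
	candidate := func(experiment, unitID string) string {
		if unitID == "user-2" {
			return "treatment"
		}
		return "control"
	}
	shadow := NewShadowEvaluator(legacy, candidate)
	for _, id := range []string{"user-1", "user-2", "user-3"} {
		shadow.Evaluate("new_search_ranking", id)
	}
	total, mismatches := shadow.Stats()
	fmt.Println(total, mismatches) // prints "3 1"
}
```

A mismatch count that stays at zero over representative traffic is the evidence a team would want before switching callers over to the new path; in practice the candidate call would likely run asynchronously so it cannot add latency to the legacy path.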
Related Concepts
A/B Testing
Microservices Architecture
Shadow Testing
Experimentation Platforms