Measuring Latency Overhead with Own Time

by: Jimmy O’Neill

Jimmy O’Neill
9 min readadvanced
--
View Original

Overview

The article discusses the introduction of a new metric called 'own time' to measure latency overhead in Airbnb's Viaduct framework, a GraphQL-based data-oriented service mesh. It highlights the challenges of accurately assessing performance due to dependencies on downstream services and explains how 'own time' helps isolate Viaduct's performance characteristics for better optimization.

What You'll Learn

1

How to compute the 'own time' metric for performance analysis

2

Why isolating Viaduct's performance from downstream services is crucial

3

How to set performance goals based on the normalized own time metric

4

When to apply cache optimizations based on own time metrics

Prerequisites & Requirements

  • Understanding of GraphQL and service-oriented architecture
  • Familiarity with performance monitoring tools(optional)

Key Questions Answered

What is the 'own time' metric and how is it calculated?
'Own time' measures the portion of a request's wall-clock time when there are no downstream requests in flight. It is calculated by analyzing the root request time span against the time spans of downstream fetches, allowing for a clearer understanding of Viaduct's overhead.
How does the 'own time' metric help in optimizing Viaduct's performance?
The 'own time' metric isolates Viaduct's overhead from downstream service performance, enabling targeted optimizations. By focusing on this metric, teams can identify and reduce latency caused by Viaduct itself, rather than external factors.
What impact does the own time ratio have on performance optimization strategies?
A low own time ratio suggests that most latency improvements should focus on downstream services, while a high ratio indicates that optimizing Viaduct's internal operations could yield significant latency reductions.
What role does caching play in the performance of the Viaduct framework?
Caching is critical for performance in Viaduct, as it reduces execution time. The article discusses how cache misses can lead to increased latency and how monitoring own time can help identify issues related to caching.

Key Statistics & Figures

Own time ratio
20%
Indicates the percentage of request runtime in Viaduct spent with no downstream requests in flight, which dropped to 17% after a performance improvement.
Reduction in own time for operations
25%
A change to the multithreading model for field execution resulted in a 25% decrease in own time for all operations.

Technologies & Tools

Some links below are affiliate links. We may earn a commission if you make a purchase.

Key Actionable Insights

1
Implement the 'own time' metric in your performance monitoring strategy to gain clearer insights into your application's latency.
By isolating the performance of your service from external dependencies, you can make more informed decisions about where to focus optimization efforts.
2
Regularly analyze the own time ratio across different operations to identify potential performance bottlenecks.
This proactive approach allows teams to address issues before they impact user experience, ensuring smoother application performance.
3
Use the normalized own time metric to set quarterly performance improvement goals.
This helps maintain focus on reducing internal overhead while adapting to changing query patterns, ensuring consistent performance improvements.

Common Pitfalls

1
Ignoring the impact of downstream service performance on overall latency measurements can lead to misguided optimization efforts.
Without isolating Viaduct's performance, teams may incorrectly focus on optimizing internal processes when the real issues lie with external services.

Related Concepts

Performance Monitoring
Caching Strategies
Graphql Optimization Techniques