Upscaling LinkedIn's Profile Datastore While Reducing Costs

LinkedIn Engineering Team

Overview

The article describes how LinkedIn scaled up its member profile datastore while reducing operating costs. It covers the migration of profile data from Oracle to Espresso, the addition of Couchbase as a cache, and the resulting performance improvements and cost savings.

What You'll Learn

1. How to implement a hybrid caching strategy using Couchbase
2. Why resilience against cache failures is critical in distributed systems
3. How to reduce tail latency in high-throughput environments

Prerequisites & Requirements

  • Understanding of caching concepts and distributed systems
  • Familiarity with Couchbase and Espresso (optional)

Key Questions Answered

What were the main challenges faced when adopting Couchbase?
The main challenges were making the system resilient to Couchbase failures, keeping cached data available at all times, and preventing the cache from diverging from the source of truth. LinkedIn addressed them with a set of design principles and a hybrid caching strategy.
How much did tail latency reduce after implementing Couchbase?
After implementing Couchbase, the 99th percentile latency for multi-get requests dropped by 60.73%, and the 99.9th percentile latency dropped by 63.66%. This significant reduction demonstrates the effectiveness of the new caching strategy in improving performance.
What cost savings were achieved with the new caching strategy?
The new Espresso hybrid cache tier let LinkedIn cut the number of Espresso storage nodes by 90%, for an estimated annual saving of about 10% in the cost of serving member profile requests. That saving is net of the additional cost of the new infrastructure.
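The first challenge, resilience against Couchbase failures, is commonly handled with a circuit breaker: after repeated cache errors, reads skip the cache and go straight to the source of truth until a cool-down expires. The sketch below illustrates that general technique; it is not LinkedIn's implementation. `CacheHealthTracker`, `read_profile`, the failure threshold, and the cool-down are hypothetical, and plain callables stand in for the Couchbase and Espresso clients.

```python
import time
from typing import Callable, Optional


class CacheHealthTracker:
    """Hypothetical circuit breaker (not LinkedIn's code): after
    `failure_threshold` consecutive cache errors, reads bypass the cache
    for `cooldown_s` seconds."""

    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.consecutive_failures = 0
        self.bypass_until = 0.0  # the cache is skipped until this time

    def is_healthy(self) -> bool:
        return self.clock() >= self.bypass_until

    def record_success(self) -> None:
        self.consecutive_failures = 0

    def record_failure(self) -> None:
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.bypass_until = self.clock() + self.cooldown_s  # trip the breaker
            self.consecutive_failures = 0


def read_profile(key: str,
                 cache_get: Callable[[str], Optional[bytes]],
                 store_get: Callable[[str], Optional[bytes]],
                 health: CacheHealthTracker) -> Optional[bytes]:
    """Try the cache only while it is healthy; on a miss, an error, or an
    open breaker, read from the source of truth instead."""
    if health.is_healthy():
        try:
            value = cache_get(key)
        except Exception:
            health.record_failure()
        else:
            health.record_success()
            if value is not None:
                return value
    return store_get(key)
```

With this shape, a cache outage turns into extra load on the datastore rather than client timeouts, so the datastore still needs enough headroom to absorb that load while the breaker is open.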

Key Statistics & Figures

  • Cache hit rate: 99%. Achieved with the Couchbase cache, significantly improving read performance.
  • Reduction in tail latency for multi-get requests: 60.73%. Observed at the 99th percentile after Couchbase was introduced.
  • Annual cost savings: about 10%. Estimated savings from the new caching strategy.
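These figures rest on simple arithmetic. Assuming every read consults the cache first, a hit rate h leaves only a fraction 1 - h of reads for the datastore, so a 99% hit rate sends about 1% of profile reads to Espresso. The hypothetical helpers below compute that fraction, a nearest-rank percentile such as p99, and a relative reduction like the reported 60.73%. The latency values in the usage lines are invented for illustration and are not LinkedIn's measurements.

```python
import math
from typing import Sequence


def backend_read_fraction(hit_rate: float) -> float:
    """Fraction of reads that miss the cache and fall through to the datastore."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return 1.0 - hit_rate


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p% of all samples are less than or equal to it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def relative_reduction_pct(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`, e.g. for p99 latency."""
    return (before - after) / before * 100.0


# Usage with hypothetical numbers:
print(round(backend_read_fraction(0.99) * 1_000_000))  # 10000 datastore reads per million
print(round(relative_reduction_pct(100.0, 39.27), 2))  # 60.73
```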

Technologies & Tools


  • Couchbase (Database): Used as a distributed key-value cache to scale reads and reduce latency.
  • Espresso (Database): LinkedIn's primary datastore, to which the profile data was migrated from Oracle.
  • Avro (Data Format): Used to serialize profile data in a binary format.
  • Samza (Stream Processing): Used to implement the cache updater and bootstrapper jobs.
  • Brooklin (Change Data Capture): Used to capture changes from Espresso and keep Couchbase synchronized with them.
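Brooklin and Samza together keep Couchbase in sync with Espresso. The sketch below shows, under stated assumptions, what a cache updater has to do with each change event: apply upserts and deletes, and ignore events older than what the cache already holds, so that replayed or out-of-order events cannot roll a record back. It assumes each event carries a per-record version, which the source does not state. `ChangeEvent` and `CacheUpdater` are hypothetical, a dict stands in for Couchbase, and no Brooklin or Samza APIs are used.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change record; value=None means the row was deleted."""
    key: str
    version: int
    value: Optional[bytes]


class CacheUpdater:
    """Applies change events to a cache (a dict standing in for Couchbase)."""

    def __init__(self) -> None:
        # key -> (version, value); a None value is a tombstone for a delete
        self.cache: Dict[str, Tuple[int, Optional[bytes]]] = {}

    def apply(self, event: ChangeEvent) -> bool:
        """Return True if the event was applied, False if it was stale."""
        current = self.cache.get(event.key)
        if current is not None and current[0] >= event.version:
            return False  # older or duplicate event: keep the newer state
        self.cache[event.key] = (event.version, event.value)
        return True

    def get(self, key: str) -> Optional[bytes]:
        entry = self.cache.get(key)
        return None if entry is None else entry[1]
```

Keeping a versioned tombstone for deletes stops a late-arriving older upsert from resurrecting a deleted record; a finite TTL on the cache would eventually clear the tombstones.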

Key Actionable Insights

1. Implement a health monitoring system for your cache to prevent failures from cascading through your application.
   By tracking the health of cache components, you can proactively manage issues and maintain system stability, which is crucial for high-availability applications.
2. Consider using a hybrid caching strategy that combines local and distributed caches to optimize read performance.
   This approach allows for quick access to frequently used data while ensuring that less common data is still available without overloading the primary datastore.
3. Regularly bootstrap your cache to prevent data divergence and ensure consistency with the source of truth.
   Setting a finite Time-To-Live (TTL) for cache records and periodically refreshing the cache helps maintain data accuracy and reduces the risk of stale data being served.
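The hybrid read path and the finite-TTL advice can be sketched together: a small in-process cache with a short TTL sits in front of a shared cache with a longer TTL, which is backed by the source of truth, and a bootstrap step rewrites records from a snapshot to reset their TTLs. This is a minimal illustration, not LinkedIn's design. `TTLCache`, `HybridCache`, `bootstrap`, and the TTL values are hypothetical, and plain dicts stand in for Couchbase and Espresso.

```python
import time
from collections import OrderedDict
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical cache: entries expire after ttl_s; optional LRU capacity."""

    def __init__(self, ttl_s: float, capacity: Optional[int] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_s
        self.capacity = capacity
        self.clock = clock
        self._data: "OrderedDict[str, Tuple[float, bytes]]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._data[key]  # expired entries count as misses
            return None
        self._data.move_to_end(key)  # mark as recently used
        return value

    def set(self, key: str, value: bytes) -> None:
        self._data[key] = (self.clock() + self.ttl_s, value)
        self._data.move_to_end(key)
        if self.capacity is not None and len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used


class HybridCache:
    """Read path: local cache, then shared cache, then source of truth."""

    def __init__(self, local: TTLCache, shared: TTLCache,
                 store_get: Callable[[str], Optional[bytes]]) -> None:
        self.local = local
        self.shared = shared
        self.store_get = store_get

    def get(self, key: str) -> Optional[bytes]:
        value = self.local.get(key)
        if value is not None:
            return value
        value = self.shared.get(key)
        if value is None:
            value = self.store_get(key)  # missed both caches
            if value is None:
                return None
            self.shared.set(key, value)
        self.local.set(key, value)
        return value


def bootstrap(shared: TTLCache, snapshot: Dict[str, bytes]) -> None:
    """Rewrite every record from a source-of-truth snapshot, resetting TTLs.
    Keys missing from the snapshot are left to expire."""
    for key, value in snapshot.items():
        shared.set(key, value)
```

If a record is deleted at the source and the delete never reaches the caches, this sketch keeps serving it only until both TTLs expire; without a finite TTL it would be served indefinitely.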

Common Pitfalls

1. Failing to implement a robust health monitoring system for the cache can lead to widespread failures.
   Without monitoring, issues in the cache can go unnoticed, causing client timeouts and degraded performance across the application.
2. Not setting a finite TTL for cache records can result in permanent data divergence.
   Allowing records to persist indefinitely can lead to stale data being served, especially if deletions in the source database are not reflected in the cache.

Related Concepts

Distributed Caching Strategies
Eventual Consistency In Distributed Systems
Data Serialization Formats Like Avro