Jellyfish: Cost-Effective Data Tiering for Uber’s Largest Storage System

Mohammed Khatib
12 min readadvanced
--
View Original

Overview

The article discusses Jellyfish, a cost-effective data tiering solution developed by Uber for their largest storage system, Schemaless. It addresses the challenges of managing large volumes of data while maintaining performance and cost efficiency through innovative data access patterns and compression techniques.

What You'll Learn

1

How to implement data tiering to optimize storage costs

2

Why backward compatibility is crucial in system upgrades

3

When to apply compression techniques for data storage

Prerequisites & Requirements

  • Understanding of data storage systems and access patterns
  • Experience with high-performance database systems(optional)

Key Questions Answered

What is the purpose of Jellyfish in Uber's storage system?
Jellyfish is designed to optimize data storage costs by implementing data tiering based on access patterns. It allows frequently accessed data to incur higher costs while reducing costs for infrequently accessed data, thus improving overall efficiency.
How does Jellyfish maintain backward compatibility?
Backward compatibility is maintained by ensuring that existing APIs remain unchanged, allowing customers to continue using their current code without modifications while benefiting from the new efficiencies introduced by Jellyfish.
What challenges were faced during the rollout of Jellyfish?
Challenges included managing high loads from services accessing old data, ensuring data consistency during migration, and coordinating with other data-intensive tasks. These issues required careful planning and collaboration with customers to mitigate risks.

Key Statistics & Figures

Storage space reduction
33%
This reduction was achieved after the phased deletion of data from the old backend following the rollout of Jellyfish.
Latency SLA
hundreds of milliseconds
Jellyfish was confirmed to meet the latency SLA during stress testing, ensuring performance under load.

Technologies & Tools

Backend
Schemaless
Used as the primary storage system for Uber's data, facilitating the implementation of Jellyfish.
Compression
Zstd
Utilized for compressing data to achieve significant storage savings.

Key Actionable Insights

1
Implement data tiering in your storage solutions to optimize costs based on data access frequency.
By categorizing data into tiers, you can significantly reduce storage costs while maintaining performance for frequently accessed data.
2
Prioritize backward compatibility when upgrading existing systems to minimize disruption.
Ensuring that existing APIs remain functional allows for smoother transitions and reduces the risk of breaking changes for users.
3
Utilize compression techniques like ZSTD to enhance storage efficiency.
Applying effective compression can lead to substantial savings in storage space, as demonstrated by Jellyfish achieving up to 40% savings.

Common Pitfalls

1
Overlooking the importance of testing during system upgrades can lead to significant downtime and data inconsistency.
Without thorough testing, issues may arise that affect user experience and data integrity, making it crucial to validate all changes before full deployment.

Related Concepts

Data Tiering Strategies
Compression Algorithms
Backward Compatibility In Software Design