The exabyte club: LinkedIn’s journey of scaling the Hadoop Distributed File System

Konstantin V. Shvachko
28 min readintermediate
--
View Original

Overview

The article discusses LinkedIn's significant advancements in scaling the Hadoop Distributed File System (HDFS), achieving the milestone of storing 1 exabyte of data and optimizing performance through various engineering innovations. It highlights the challenges faced and solutions implemented to enhance scalability, availability, and security within their big data infrastructure.

What You'll Learn

1

How to optimize HDFS performance through Java heap tuning

2

Why implementing High Availability in HDFS is crucial for system upgrades

3

How to manage small files in HDFS using a satellite cluster

4

How to implement consistent reads from a Standby Node in HDFS

Prerequisites & Requirements

  • Understanding of Hadoop and HDFS architecture
  • Experience with big data systems and performance tuning(optional)

Key Questions Answered

What milestones has LinkedIn achieved in scaling HDFS?
LinkedIn has achieved the milestone of storing 1 exabyte of data across all Hadoop clusters, with their largest cluster containing 500 PB of data and 1 billion objects. This reflects their rapid growth in big data analytics infrastructure, which has nearly doubled every year.
How does High Availability improve HDFS performance?
High Availability in HDFS allows multiple NameNodes to operate, eliminating the single point of failure. This enables rolling upgrades without disrupting services, significantly improving cluster availability and reducing downtime during maintenance.
What challenges does HDFS face with small files?
HDFS struggles with small files as they inflate metadata disproportionately compared to their data size, leading to scalability limits. LinkedIn addressed this by creating a satellite cluster to manage small files, improving the block-to-file ratio.
How does the Observer node enhance HDFS read performance?
The Observer node allows read requests to be served from a Standby NameNode, distributing the read workload and improving overall throughput. This design leverages the fact that reads constitute 95% of namespace operations, thus enhancing performance.

Key Statistics & Figures

Total data stored
1 exabyte
This is the total data stored across all Hadoop clusters at LinkedIn.
Largest cluster size
500 PB
The largest 10,000-node cluster at LinkedIn stores this amount of data.
Namespace objects
1 billion
The largest cluster maintains this number of objects on a single NameNode.
Average latency for RPCs
under 10 milliseconds
This is the average latency for Remote Procedure Calls on the NameNode.
Performance improvement from non-fair locking
10x
This improvement was observed in NameNode RPC latency after implementing non-fair locking.
Throughput increase with Observer node
3x
The total throughput of namespace operations increased significantly with the introduction of the Observer node.

Technologies & Tools

Some links below are affiliate links. We may earn a commission if you make a purchase.

Backend
Hadoop
Used as the basis for LinkedIn's big data analytics infrastructure.
Programming Language
Java
Used for tuning the NameNode's performance.
Data Transfer Tool
Apache Gobblin
Used for performing fast distributed copies in the Wormhole data transfer system.

Key Actionable Insights

1
Implement High Availability in your HDFS setup to ensure continuous service during upgrades.
This approach minimizes downtime and allows for seamless transitions between NameNodes, which is critical for maintaining service reliability in production environments.
2
Utilize a satellite cluster to handle small files effectively in HDFS.
By offloading small file management to a separate cluster, you can significantly improve metadata performance and scalability, which is essential for large-scale data operations.
3
Optimize Java heap settings for the NameNode to enhance performance.
Adjusting the heap size and tuning garbage collection can prevent long pauses and improve responsiveness, especially as the namespace grows.

Common Pitfalls

1
Overloading the NameNode with too many small files can lead to performance degradation.
This happens because small files disproportionately increase metadata, causing scalability issues. To avoid this, consider using a satellite cluster to manage small files separately.

Related Concepts

Hdfs Architecture And Design Patterns
Big Data Scalability Challenges
Performance Tuning In Distributed Systems