The exabyte club: LinkedIn’s journey of scaling the Hadoop Distributed File System

Konstantin V. Shvachko

•

Konstantin V. Shvachko

•28 min read•intermediate•

--

•View Original

ApacheAzureAzure Blob StorageHTTPSJavaJenkinsV

Overview

The article discusses LinkedIn's significant advancements in scaling the Hadoop Distributed File System (HDFS), achieving the milestone of storing 1 exabyte of data and optimizing performance through various engineering innovations. It highlights the challenges faced and solutions implemented to enhance scalability, availability, and security within their big data infrastructure.

What You'll Learn

1

How to optimize HDFS performance through Java heap tuning

2

Why implementing High Availability in HDFS is crucial for system upgrades

3

How to manage small files in HDFS using a satellite cluster

4

How to implement consistent reads from a Standby Node in HDFS

Prerequisites & Requirements

Understanding of Hadoop and HDFS architecture
Experience with big data systems and performance tuning(optional)

Key Questions Answered

What milestones has LinkedIn achieved in scaling HDFS?

LinkedIn has achieved the milestone of storing 1 exabyte of data across all Hadoop clusters, with their largest cluster containing 500 PB of data and 1 billion objects. This reflects their rapid growth in big data analytics infrastructure, which has nearly doubled every year.

How does High Availability improve HDFS performance?

High Availability in HDFS allows multiple NameNodes to operate, eliminating the single point of failure. This enables rolling upgrades without disrupting services, significantly improving cluster availability and reducing downtime during maintenance.

What challenges does HDFS face with small files?

HDFS struggles with small files as they inflate metadata disproportionately compared to their data size, leading to scalability limits. LinkedIn addressed this by creating a satellite cluster to manage small files, improving the block-to-file ratio.

How does the Observer node enhance HDFS read performance?

The Observer node allows read requests to be served from a Standby NameNode, distributing the read workload and improving overall throughput. This design leverages the fact that reads constitute 95% of namespace operations, thus enhancing performance.

Key Statistics & Figures

Total data stored

1 exabyte

This is the total data stored across all Hadoop clusters at LinkedIn.

Largest cluster size

500 PB

The largest 10,000-node cluster at LinkedIn stores this amount of data.

Namespace objects

1 billion

The largest cluster maintains this number of objects on a single NameNode.

Average latency for RPCs

under 10 milliseconds

This is the average latency for Remote Procedure Calls on the NameNode.

Performance improvement from non-fair locking

10x

This improvement was observed in NameNode RPC latency after implementing non-fair locking.

Throughput increase with Observer node

3x

The total throughput of namespace operations increased significantly with the introduction of the Observer node.

Technologies & Tools

Some links below are affiliate links. We may earn a commission if you make a purchase.

Backend

Hadoop

Used as the basis for LinkedIn's big data analytics infrastructure.

Programming Language

Java

Used for tuning the NameNode's performance.

Data Transfer Tool

Apache Gobblin

Used for performing fast distributed copies in the Wormhole data transfer system.

Key Actionable Insights

1
Implement High Availability in your HDFS setup to ensure continuous service during upgrades.
This approach minimizes downtime and allows for seamless transitions between NameNodes, which is critical for maintaining service reliability in production environments.

2
Utilize a satellite cluster to handle small files effectively in HDFS.
By offloading small file management to a separate cluster, you can significantly improve metadata performance and scalability, which is essential for large-scale data operations.

3
Optimize Java heap settings for the NameNode to enhance performance.
Adjusting the heap size and tuning garbage collection can prevent long pauses and improve responsiveness, especially as the namespace grows.

Common Pitfalls

1

Overloading the NameNode with too many small files can lead to performance degradation.

This happens because small files disproportionately increase metadata, causing scalability issues. To avoid this, consider using a satellite cluster to manage small files separately.

Related Concepts

Hdfs Architecture And Design Patterns

Big Data Scalability Challenges

Performance Tuning In Distributed Systems