The Underlying Technology of Messages

Kannan Muthukkaruppan

Overview

The article discusses the technology behind Facebook's Messages infrastructure, highlighting the transition from MySQL and Cassandra to Apache HBase for better scalability and performance. It details the challenges faced and the solutions implemented to support a robust messaging system for over 350 million users.

What You'll Learn

1
How to evaluate different database technologies for scalability
2
Why HBase was chosen over MySQL and Cassandra for messaging infrastructure
3
How to design a messaging application server that interfaces with multiple services

Key Questions Answered

What were the key factors in choosing HBase over other databases?
HBase was chosen for its scalability, its performance, and a consistency model that is simpler than Cassandra's. MySQL's performance suffered as datasets grew large, while HBase offered features such as automatic load balancing and failover that made it a better fit for the messaging infrastructure.
How many messages does the current Messages infrastructure handle?
The Messages infrastructure handles over 15 billion person-to-person messages per month from over 350 million users. Additionally, the chat service supports over 300 million users sending over 120 billion messages monthly.
What challenges did Facebook face with their previous messaging infrastructure?
MySQL's performance degraded as datasets grew large, which made the long tail of data hard to handle. Cassandra's eventual consistency model was also difficult to reconcile with the requirements of the new Messages infrastructure.

Key Statistics & Figures

Monthly person-to-person messages
Over 15 billion
Handled by the current Messages infrastructure
Monthly chat messages
Over 120 billion
Sent by over 300 million chat users
Total users of Messages infrastructure
Over 350 million
User base at the time of the article
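
To put these monthly totals in perspective, they can be converted into average per-second and per-user rates. The sketch below is only back-of-the-envelope arithmetic: it assumes a 30-day month and a perfectly even load, neither of which comes from the article, and the function names are illustrative. Because the article gives each figure as "over" a value, the results are lower bounds on the average, and real peak rates would be higher.

```python
# Assumption: a 30-day month with evenly spread traffic (not stated in the article).
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds


def average_per_second(messages_per_month: float) -> float:
    """Average message rate implied by a monthly total."""
    return messages_per_month / SECONDS_PER_MONTH


def per_user_per_month(messages_per_month: float, users: float) -> float:
    """Average number of messages each user sends in a month."""
    return messages_per_month / users


# Figures quoted in the statistics above.
p2p_rate = average_per_second(15e9)      # about 5,787 messages per second
chat_rate = average_per_second(120e9)    # about 46,296 messages per second
p2p_per_user = per_user_per_month(15e9, 350e6)     # about 43 per user
chat_per_user = per_user_per_month(120e9, 300e6)   # 400 per user

print(f"person-to-person: {p2p_rate:,.0f}/s, {p2p_per_user:.0f} per user per month")
print(f"chat: {chat_rate:,.0f}/s, {chat_per_user:.0f} per user per month")
```

Chat traffic works out to roughly eight times the person-to-person volume, which helps explain why write throughput weighs so heavily in the choice of database.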

Technologies & Tools

Database
Apache HBase
Chosen for its scalability and performance for the messaging infrastructure
Database
Cassandra
Previously considered but found challenging due to its eventual consistency model
Database
MySQL
Previously used, but its performance suffered as datasets grew large
Service
Apache ZooKeeper
Used as the foundation for the user discovery service
Service
Haystack
Used for storing attachments
Filesystem
HDFS
Underlying filesystem used by HBase, providing replication and checksums

Key Actionable Insights

1
Evaluate the scalability of your database choice based on user growth projections.
As seen with Facebook's transition to HBase, understanding how your database handles increasing loads is crucial for maintaining performance.
2
Consider the consistency model of your database when designing real-time applications.
Facebook found Cassandra's eventual consistency model hard to reconcile with its messaging needs and chose HBase for its simpler consistency model.
3
Develop a dedicated application server for complex messaging systems instead of relying on generic web infrastructure.
A dedicated server can own the messaging logic and integrate directly with the services it depends on, such as attachment storage and user discovery, as demonstrated by Facebook's new Messages architecture.
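
The third insight can be illustrated with a minimal sketch of a dedicated application server that sits in front of separate backing services. Everything here is hypothetical: the class and method names are invented for illustration, the two in-memory classes merely stand in for the roles the article assigns to HBase (messages) and Haystack (attachments), and the user discovery service is left out. It is not Facebook's implementation.

```python
from typing import Dict, List, Optional


class InMemoryMessageStore:
    """Hypothetical stand-in for the message store role (HBase in the article)."""

    def __init__(self) -> None:
        self._rows: Dict[str, List[dict]] = {}

    def append(self, user_id: str, record: dict) -> None:
        self._rows.setdefault(user_id, []).append(record)

    def read(self, user_id: str) -> List[dict]:
        return list(self._rows.get(user_id, []))


class InMemoryAttachmentStore:
    """Hypothetical stand-in for the attachment store role (Haystack in the article)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        blob_id = f"blob-{len(self._blobs)}"
        self._blobs[blob_id] = data
        return blob_id

    def get(self, blob_id: str) -> bytes:
        return self._blobs[blob_id]


class MessageAppServer:
    """Single entry point that decides which backing service handles what."""

    def __init__(self, messages: InMemoryMessageStore,
                 attachments: InMemoryAttachmentStore) -> None:
        self.messages = messages
        self.attachments = attachments

    def send(self, sender: str, recipient: str, body: str,
             attachment: Optional[bytes] = None) -> dict:
        # Large binary data goes to the attachment store; only its id is
        # kept alongside the message record.
        blob_id = self.attachments.put(attachment) if attachment is not None else None
        record = {"from": sender, "to": recipient, "body": body,
                  "attachment_id": blob_id}
        self.messages.append(recipient, record)
        if sender != recipient:
            self.messages.append(sender, record)
        return record

    def inbox(self, user_id: str) -> List[dict]:
        return [r for r in self.messages.read(user_id) if r["to"] == user_id]

    def fetch_attachment(self, record: dict) -> Optional[bytes]:
        blob_id = record["attachment_id"]
        return self.attachments.get(blob_id) if blob_id is not None else None


server = MessageAppServer(InMemoryMessageStore(), InMemoryAttachmentStore())
sent = server.send("alice", "bob", "photo from the trip", attachment=b"\x89PNG")
print(server.inbox("bob"))
```

Because callers only talk to `MessageAppServer`, either backing store could be replaced without touching client code, which is the main appeal of a dedicated server over generic web infrastructure.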

Common Pitfalls

1
Relying on a single database technology without evaluating alternatives can lead to performance issues.
Facebook's experience with MySQL highlighted the importance of assessing how well a database can scale with growing data and user demands.
2
Ignoring the consistency model of a database can complicate application design.
The challenges faced with Cassandra's eventual consistency model underscored the need to align database capabilities with application requirements.
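
The second pitfall is easier to see with a toy model. The sketch below is a deliberately simplified illustration, not how Cassandra or HBase is implemented: `EventuallyConsistentStore` and `SingleOwnerStore` are hypothetical classes, and replication is reduced to an explicit `sync()` call. It shows why application code on an eventually consistent store has to cope with stale reads, while a store in which each key is served from a single place does not.

```python
from typing import Dict, List, Optional, Tuple


class EventuallyConsistentStore:
    """Toy model: a write lands on one replica and reaches the others later."""

    def __init__(self, replica_count: int = 2) -> None:
        self.replicas: List[Dict[str, str]] = [{} for _ in range(replica_count)]
        self.pending: List[Tuple[int, str, str]] = []  # (replica index, key, value)

    def write(self, key: str, value: str, replica: int = 0) -> None:
        self.replicas[replica][key] = value
        # Queue the update for every other replica instead of applying it now.
        for index in range(len(self.replicas)):
            if index != replica:
                self.pending.append((index, key, value))

    def read(self, key: str, replica: int = 0) -> Optional[str]:
        return self.replicas[replica].get(key)

    def sync(self) -> None:
        """Apply all queued updates, simulating background replication."""
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()


class SingleOwnerStore:
    """Toy model: every read and write for a key goes through one owner."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}

    def write(self, key: str, value: str) -> None:
        self.rows[key] = value

    def read(self, key: str) -> Optional[str]:
        return self.rows.get(key)


eventual = EventuallyConsistentStore()
eventual.write("thread:1", "hello", replica=0)
before_sync = eventual.read("thread:1", replica=1)  # the other replica has not seen the write
eventual.sync()
after_sync = eventual.read("thread:1", replica=1)   # the write is visible after replication

owner = SingleOwnerStore()
owner.write("thread:1", "hello")
immediate = owner.read("thread:1")                  # visible straight away
```

In a messaging product, the stale-read window is the period in which a user could send a message and then fail to see it in the thread, so the application would need extra logic to hide or repair that gap.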