Facebook’s Scribe technology now open source


Robert Johnson
5 min read, beginner

Overview

Facebook's Scribe technology, designed to handle massive data collection from servers, is now open source. The article discusses its design decisions, including scalability, reliability, and simplicity in the data model, which have allowed it to manage tens of billions of messages daily.

What You'll Learn

1. How to implement a flexible network topology for scalable data collection systems
2. Why a simple data model can improve logging system performance
3. When to prioritize reliability versus performance in data logging solutions

Key Questions Answered

How does Facebook's Scribe technology handle data collection?
Scribe collects data from servers through a flexible network topology arranged as a directed graph. This makes it easy to add capacity and lets servers batch messages before sending them across data centers. The system handles tens of billions of messages daily.
What design decisions were made to ensure Scribe's reliability?
Scribe was designed to be reliable enough for most use cases without the overhead of heavyweight protocols. It spools data to disk to ride out intermittent connectivity, but it does not sync every message to disk, so a small amount of data can be lost during failures.
What is the data model used in Scribe?
The data model in Scribe is intentionally simple, consisting of two strings: a category and the actual message. This design avoids complications associated with logging levels and rules, allowing for easy addition of new categories without modifying the source code.
Which programming languages can log messages to Scribe?
Scribe supports logging from multiple programming languages, including PHP, Python, C++, and Java. This flexibility is enhanced by using Thrift, which simplifies development and integration.
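
The reliability trade-off described above can be sketched in a few lines of Python. This is a minimal illustration, not Scribe's implementation: `LogEntry`, `SpoolingSender`, the `send` callback, and the JSON-lines spool file are all hypothetical names and choices made for this example. It shows the two-string entry, spooling to disk when the downstream server is unreachable, and the deliberate absence of a per-message sync.

```python
import json
import os
from dataclasses import asdict, dataclass
from typing import Callable, List


@dataclass(frozen=True)
class LogEntry:
    """The whole data model in this sketch: a category and an opaque message."""
    category: str
    message: str


class SpoolingSender:
    """Hypothetical sender that spools to a local file while downstream is unreachable."""

    def __init__(self, send: Callable[[List[LogEntry]], bool], spool_path: str):
        self._send = send              # assumed to return True when a batch is accepted
        self._spool_path = spool_path

    def log(self, entries: List[LogEntry]) -> None:
        if not self._send(entries):
            self._spool(entries)

    def _spool(self, entries: List[LogEntry]) -> None:
        # Append without calling os.fsync(): cheaper per message, but a crash
        # before the OS flushes its cache can lose the most recent entries.
        with open(self._spool_path, "a", encoding="utf-8") as f:
            for entry in entries:
                f.write(json.dumps(asdict(entry)) + "\n")

    def retry(self) -> int:
        """Try to resend spooled entries; return how many were delivered."""
        if not os.path.exists(self._spool_path):
            return 0
        with open(self._spool_path, encoding="utf-8") as f:
            entries = [LogEntry(**json.loads(line)) for line in f if line.strip()]
        if entries and self._send(entries):
            os.remove(self._spool_path)
            return len(entries)
        return 0
```

A caller would invoke `retry()` periodically. Adding an `os.fsync()` after each write would narrow the loss window at the cost of throughput, which is the kind of overhead the design described above chooses to avoid.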

Key Statistics & Figures

Messages handled daily
tens of billions
Scribe manages this volume across more than 100 use cases, demonstrating its capability to scale with Facebook's growth.

Technologies & Tools

Backend
Thrift
Thrift was used to accelerate development and enhance the flexibility of Scribe, allowing it to log messages from various programming languages.

Key Actionable Insights

1. Implementing a flexible network topology can significantly enhance the scalability of your data collection systems.
By allowing servers to communicate in a directed graph without needing to understand the entire network layout, you can easily adapt to growth and changes in your infrastructure.
2. Prioritizing a simple data model can streamline your logging processes and reduce development overhead.
A straightforward approach, like using just a category and message string, can facilitate easier maintenance and adaptability as new use cases arise.
3. Understanding the trade-offs between reliability and performance is crucial when designing logging systems.
Scribe's design balances these factors, providing enough reliability for most applications without the need for complex protocols, which can slow down performance.
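
To make the first insight concrete, here is a small Python sketch of nodes arranged in a directed graph, where each node knows only its next hop and forwards messages in batches. The `Node` class, the single-downstream restriction, and the `batch_size` parameter are assumptions made for illustration and are not taken from Scribe's code.

```python
from typing import List, Optional, Tuple

Entry = Tuple[str, str]  # (category, message)


class Node:
    """Hypothetical relay that knows only its next hop, not the whole graph."""

    def __init__(self, name: str, downstream: Optional["Node"] = None,
                 batch_size: int = 3):
        self.name = name
        self.downstream = downstream
        self.batch_size = batch_size
        self.pending: List[Entry] = []   # entries waiting to be forwarded
        self.stored: List[Entry] = []    # used only by a terminal node

    def receive(self, entries: List[Entry]) -> None:
        if self.downstream is None:
            self.stored.extend(entries)  # terminal node: keep the data
            return
        self.pending.extend(entries)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Forward everything buffered so far as one batch.
        if self.downstream is not None and self.pending:
            batch, self.pending = self.pending, []
            self.downstream.receive(batch)


# Two web servers feed one aggregator, which feeds a central store.
central = Node("central")
aggregator = Node("aggregator", downstream=central, batch_size=4)
web1 = Node("web1", downstream=aggregator, batch_size=2)
web2 = Node("web2", downstream=aggregator, batch_size=2)

web1.receive([("clicks", "a"), ("clicks", "b")])  # reaches batch_size, forwarded
web2.receive([("errors", "c")])                   # buffered until flushed
web2.flush()
aggregator.flush()
```

Adding a third web server, or a second aggregator tier, only requires pointing the new node at its downstream; no existing node needs to change.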

Common Pitfalls

1. Overcomplicating the data model can lead to increased maintenance and reduced performance.
Many logging systems try to incorporate complex features like logging levels and schemas, which can hinder scalability and adaptability. Keeping the data model simple, as Scribe does, allows for easier expansion and use.
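
As a rough illustration of why the simple model is easy to extend, the following Python sketch accepts any category string without prior registration, schema, or logging level. `CategoryStore` and its in-memory dictionary are hypothetical stand-ins for whatever storage a real deployment would use.

```python
from collections import defaultdict
from typing import DefaultDict, List


class CategoryStore:
    """Hypothetical store: a category exists as soon as something logs to it."""

    def __init__(self) -> None:
        self._by_category: DefaultDict[str, List[str]] = defaultdict(list)

    def log(self, category: str, message: str) -> None:
        # No schema check and no level filtering: the message is opaque.
        self._by_category[category].append(message)

    def categories(self) -> List[str]:
        return sorted(self._by_category)

    def messages(self, category: str) -> List[str]:
        return list(self._by_category.get(category, []))


store = CategoryStore()
store.log("ad_clicks", "user=1 ad=42")
store.log("search_queries", "q=scribe")   # a new category, no code change needed
store.log("ad_clicks", "user=2 ad=7")
```

The cost of this design is that any structure inside the message is a convention between the writer and the reader, not something the logging system enforces.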