Flannel: An Application-Level Edge Cache to Make Slack Scale

Professor Robin Dunbar, when studying Neolithic farming villages and primate troops in the 1990s, theorized that the maximum number of stable relationships we can keep is around 148, known popularly as Dunbar’s number. This upper bound is due to the mental dossier kept on each individual’s relationships, but more importantly, the number of cross relationships between…

Bing Wei
8 min read · advanced

Overview

The article discusses Flannel, an application-level edge cache developed by Slack to enhance scalability and performance for large teams. It addresses the challenges of data loading and client overhead as user bases grow, introducing lazy loading and proactive data caching to improve user experience.

What You'll Learn

1. How to implement lazy loading in client applications
2. Why proactive data caching improves application performance
3. When to apply consistent hashing for cache efficiency

Prerequisites & Requirements

  • Understanding of caching concepts and client-server architecture
  • Familiarity with WebSocket connections (optional)

Key Questions Answered

How does Flannel improve Slack's performance for large teams?
Flannel enhances Slack's performance by implementing lazy loading and proactive data caching. This allows the client to load only essential data at startup, reducing connection times and memory usage, which is crucial for teams with tens of thousands of users.
What are the main challenges Slack faced with larger teams?
As teams grew, Slack encountered issues such as increased connection times, larger client memory footprints, and expensive reconnections. These challenges stemmed from the need to load extensive data upfront, which became unmanageable for larger user bases.
What is the role of consistent hashing in Flannel?
Flannel uses consistent hashing to direct users on the same team and in the same networking region to the same Flannel instance. Keeping those users on one instance maximizes cache efficiency and limits the impact of reconnection storms on Slack's backend servers.
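
The routing idea in the last answer can be illustrated with a toy hash ring. This is a minimal sketch, not Flannel's actual implementation: the host names, the number of virtual nodes, and the routing key (here a region name plus a team ID) are all assumptions made for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring. Host names and vnode count are illustrative."""

    def __init__(self, hosts, vnodes=64):
        # Each host is placed on the ring at several pseudo-random points
        # ("virtual nodes") so that load spreads more evenly.
        self._ring = []
        for host in hosts:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{host}#{i}"), host))
        self._ring.sort()
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key):
        # Use the first 8 bytes of SHA-256 as a stable 64-bit ring position.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def host_for(self, team_id, region):
        # Assumed routing key: region plus team ID. Walk clockwise to the
        # first ring point after the key's position, wrapping at the end.
        position = self._hash(f"{region}:{team_id}")
        idx = bisect.bisect_right(self._points, position)
        return self._ring[idx % len(self._ring)][1]
```

Every client presenting the same key lands on the same host, so that host's cache is reused across those clients. When a host is removed, only the keys that mapped to it move to a different host, and the rest keep their warm caches.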

Key Statistics & Figures

Simultaneous connections supported by Flannel
4 million
Flannel serves this number of connections at peak times, showcasing its scalability.
Client queries per second handled by Flannel
600K
This is the rate of on-demand data requests that Flannel answers for clients.
Data size reduction for client bootstrap on a 32K user team
44 times
This significant reduction demonstrates Flannel's impact on optimizing data loading.

Technologies & Tools

Backend
Flannel
An application-level caching service developed to improve data loading and performance.
Communication
WebSocket
Used to maintain real-time connections between clients and Slack servers.

Key Actionable Insights

1. Implement lazy loading techniques in your applications to enhance performance.
   Lazy loading allows applications to load data only when necessary, reducing initial load times and memory usage, especially in large-scale environments.
2. Utilize proactive caching strategies to anticipate user data needs.
   By predicting which data users will request next, applications can push relevant information to clients, improving responsiveness and user experience.
3. Consider consistent hashing for distributed caching solutions.
   This technique helps maintain cache efficiency and minimizes server load during high traffic periods, particularly beneficial for applications with large user bases.
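
The proactive caching insight above can be sketched as a per-client session on an edge cache. The sketch assumes the cache tracks which user records a client has already received and, before forwarding a message, pushes any mentioned user the client has not seen yet. The class name, the frame layout, and the `mentions` field are hypothetical and not taken from Slack's protocol.

```python
class EdgeSession:
    """Per-client session on a hypothetical edge cache."""

    def __init__(self, user_cache):
        self._user_cache = user_cache    # dict: user_id -> user record held at the edge
        self._known_to_client = set()    # user IDs already sent to this client

    def frames_for(self, message):
        """Return frames to send: missing user records first, then the message."""
        frames = []
        for user_id in message.get("mentions", []):
            # Push a user record only if the client lacks it and the edge has it.
            if user_id not in self._known_to_client and user_id in self._user_cache:
                frames.append({"type": "user", "user": self._user_cache[user_id]})
                self._known_to_client.add(user_id)
        frames.append({"type": "message", "message": message})
        return frames
```

Because the user record arrives ahead of the message that needs it, the client can render the message without a follow-up query. Later messages mentioning the same user carry no extra frame.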

Common Pitfalls

1. Loading all data upfront can lead to performance bottlenecks.
   This happens because large datasets increase connection times and memory usage, especially in applications with many users. Implementing lazy loading can mitigate this issue.
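
As a rough illustration of the mitigation, the sketch below shows a client-side store that starts empty and fetches each user record the first time it is needed, rather than loading every record at startup. The `fetch_user` callable stands in for a query to a cache or backend and is hypothetical, as are the class and method names.

```python
class LazyUserStore:
    """Client-side store that fetches user records on first access."""

    def __init__(self, fetch_user):
        # fetch_user is a hypothetical callable: user_id -> user record (dict).
        self._fetch_user = fetch_user
        self._cache = {}

    def get(self, user_id):
        # Fetch on the first request only; later requests hit the local cache.
        if user_id not in self._cache:
            self._cache[user_id] = self._fetch_user(user_id)
        return self._cache[user_id]

    def loaded_count(self):
        # Number of records actually held in memory so far.
        return len(self._cache)
```

With this shape, memory use grows with the number of users the client actually displays, not with the size of the team.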

Related Concepts

Caching Strategies
Scalability in Software Architecture
Client-Server Communication Models