Flannel: An Application-Level Edge Cache to Make Slack Scale

Bing Wei

Professor Robin Dunbar, when studying Neolithic farming villages and primate troops in the 90s, theorized that the maximum number of stable relationships we can keep is around 148, known popularly as Dunbar’s number. This upper bound is due to the mental dossier kept on individual’s relationships, but more importantly, the number of cross relationships between…

Slack


AWS, Chef, Python, React, TypeScript, WebSocket

Overview

The article discusses Flannel, an application-level edge cache developed by Slack to enhance scalability and performance for large teams. It addresses the challenges of data loading and client overhead as user bases grow, introducing lazy loading and proactive data caching to improve user experience.

What You'll Learn

1. How to implement lazy loading in client applications
2. Why proactive data caching improves application performance
3. When to apply consistent hashing for cache efficiency

Prerequisites & Requirements

Understanding of caching concepts and client-server architecture
Familiarity with WebSocket connections (optional)

Key Questions Answered

How does Flannel improve Slack's performance for large teams?

Flannel enhances Slack's performance by implementing lazy loading and proactive data caching. This allows the client to load only essential data at startup, reducing connection times and memory usage, which is crucial for teams with tens of thousands of users.
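The lazy-loading idea can be sketched in a few lines of Python. This is a minimal illustration, not Flannel's or the Slack client's actual code: `LazyUserStore`, `fetch_user`, and `fake_backend` are hypothetical names, and the backend is a stand-in for a query sent to the edge cache.

```python
from typing import Callable, Dict


class LazyUserStore:
    """Client-side store that fetches a user record only on first use."""

    def __init__(self, fetch_user: Callable[[str], dict]) -> None:
        # fetch_user is a hypothetical stand-in for a query to the edge cache.
        self._fetch_user = fetch_user
        self._users: Dict[str, dict] = {}
        self.fetch_count = 0  # counts remote fetches, for illustration only

    def get(self, user_id: str) -> dict:
        # Go to the backend only when the record is not already held locally.
        if user_id not in self._users:
            self._users[user_id] = self._fetch_user(user_id)
            self.fetch_count += 1
        return self._users[user_id]


def fake_backend(user_id: str) -> dict:
    # Illustrative data source; a real client would query over its connection.
    return {"id": user_id, "name": f"user-{user_id}"}


store = LazyUserStore(fake_backend)
store.get("U1")
store.get("U1")  # served from the local store, no second fetch
store.get("U2")
```

Nothing is loaded at startup here; the store fills only as records are requested, which is the property that keeps startup payload and memory use small.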

What are the main challenges Slack faced with larger teams?

As teams grew, Slack encountered issues such as increased connection times, larger client memory footprints, and expensive reconnections. These challenges stemmed from the need to load extensive data upfront, which became unmanageable for larger user bases.

What is the role of consistent hashing in Flannel?

Flannel uses consistent hashing to direct users on the same team and in the same networking region to the same Flannel instance. Because those users share one cache, cache efficiency stays high, and reconnection storms are absorbed by Flannel instead of hitting Slack's backend servers.
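A generic consistent-hash ring, sketched below, shows how a routing key can be mapped to a cache host in a stable way. It is an illustration under assumptions, not Flannel's implementation: the host names, the key format, the `HashRing` class, and the choice of SHA-256 with 100 virtual nodes per host are all hypothetical.

```python
import bisect
import hashlib
from typing import List, Tuple


def _hash(key: str) -> int:
    # First 8 bytes of SHA-256 as an integer position on the ring.
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (generic sketch)."""

    def __init__(self, hosts: List[str], replicas: int = 100) -> None:
        # Each host is placed at `replicas` points to even out the key spread.
        self._ring: List[Tuple[int, str]] = sorted(
            (_hash(f"{host}#{i}"), host) for host in hosts for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def host_for(self, key: str) -> str:
        # Walk clockwise to the first point past the key, wrapping at the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
key = "team-T123:us-east"  # hypothetical routing key
host = ring.host_for(key)
```

Every lookup of the same key returns the same host, and removing a host remaps only the keys that host owned, so the remaining caches stay warm.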

Key Statistics & Figures

Simultaneous connections supported by Flannel

4 million

Flannel serves this number of connections at peak times, showcasing its scalability.

Client queries per second handled by Flannel

600K

This high query rate indicates Flannel's efficiency in managing data requests.

Data size reduction for client bootstrap on a 32K user team

44 times

This significant reduction demonstrates Flannel's impact on optimizing data loading.

Technologies & Tools

Backend

Flannel

An application-level caching service developed to improve data loading and performance.

Communication

WebSocket

Used to maintain real-time connections between clients and Slack servers.

Key Actionable Insights

1. Implement lazy loading techniques in your applications to enhance performance.
Lazy loading allows applications to load data only when necessary, reducing initial load times and memory usage, especially in large-scale environments.

2. Utilize proactive caching strategies to anticipate user data needs.
By predicting which data users will request next, applications can push relevant information to clients, improving responsiveness and user experience.

3. Consider consistent hashing for distributed caching solutions.
This technique helps maintain cache efficiency and minimizes server load during high traffic periods, particularly beneficial for applications with large user bases.
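The proactive caching insight can be illustrated with a small sketch in which a cache pushes the user records a message refers to before forwarding the message itself, so the client does not need a follow-up query. The message shape (`author`, `mentions`), the frame format, and the `ProactiveCache` class are assumptions made for this example, not a description of Flannel's protocol.

```python
from typing import Dict, List, Set


class ProactiveCache:
    """Pushes user records a client is likely to need ahead of the message."""

    def __init__(self, users: Dict[str, dict]) -> None:
        self._users = users
        # Tracks which user records each client has already received.
        self._sent: Dict[str, Set[str]] = {}

    def deliver(self, client_id: str, message: dict) -> List[dict]:
        known = self._sent.setdefault(client_id, set())
        frames: List[dict] = []
        # Predict needed records: the author plus anyone mentioned.
        for user_id in [message["author"], *message.get("mentions", [])]:
            if user_id in self._users and user_id not in known:
                frames.append({"type": "user", "data": self._users[user_id]})
                known.add(user_id)
        # The message goes last, after the records it depends on.
        frames.append({"type": "message", "data": message})
        return frames


cache = ProactiveCache({"U1": {"id": "U1"}, "U2": {"id": "U2"}})
first = cache.deliver("client-1", {"author": "U1", "mentions": ["U2"], "text": "hi"})
second = cache.deliver("client-1", {"author": "U1", "text": "again"})
```

The first delivery carries two user records ahead of the message; the second carries only the message, because the cache remembers what that client already holds.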

Common Pitfalls

1. Loading all data upfront can lead to performance bottlenecks.
This happens because large datasets increase connection times and memory usage, especially in applications with many users. Implementing lazy loading can mitigate this issue.

Related Concepts

Caching Strategies

Scalability In Software Architecture

Client-server Communication Models
