Building Background Effects for Clips

Last September, Slack released Clips, allowing users to capture video, audio, and screen recordings in messages to help distributed teams connect and share their work. We’ve continued iterating on Clips since its release, adding thumbnail selection, background blur, and most recently, background image replacement. This blog post provides a deep dive into our implementation of…

Albert Xing
8 min read · advanced

Overview

The article discusses the implementation of background effects, specifically background blur and background image replacement, for Slack Clips, utilizing web technologies like WebGL and WebAssembly to ensure performance. It details the processing pipeline, including segmentation models and the challenges faced in achieving high-quality video effects.

What You'll Learn

1. How to implement background effects using WebGL and WebAssembly
2. Why using a worker thread improves video processing performance
3. When to apply bilateral filtering for video segmentation
4. How to utilize the MediaPipe selfie segmentation model for alpha masking

Prerequisites & Requirements

  • Understanding of real-time video processing concepts
  • Familiarity with WebGL and WebAssembly (optional)

Key Questions Answered

How does Slack implement background effects for video clips?
Slack uses a real-time video processing pipeline that leverages WebGL and WebAssembly to apply background blur and image replacement. The process involves reading video frames from webcam and screen captures, applying a segmentation model to create an alpha mask, and then compositing the effects while minimizing latency.
What technologies are used for video processing in Slack Clips?
The article mentions the use of WebGL for rendering graphics and WebAssembly for executing performance-critical code. Additionally, the Media Streams API and Insertable Streams API are utilized for handling video streams, while the MediaPipe selfie segmentation model is employed for generating alpha masks.
What challenges are faced in background image replacement?
Background image replacement requires precise edge detection to avoid ghosting effects, as users expect clear boundaries between the foreground and background. The article discusses the need for careful compositing to ensure that the edges of the segmented person are well-defined and visually appealing.
Why is bilateral filtering applied in the processing pipeline?
Bilateral filtering reduces noise in the segmentation mask while preserving its edges, which helps eliminate artifacts such as choppy edges or haloing around the subject. Applying it to the mask before compositing is crucial for achieving a more natural appearance in the final video output.
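
The compositing step described in the answers above can be illustrated on the CPU. The sketch below is not Slack's implementation, which the article says runs on WebGL; it only shows the per-pixel blend that an alpha mask implies. The function name `compositeWithMask` and the buffer layout (RGBA frames, one mask value per pixel in the range 0 to 1) are assumptions made for this example.

```typescript
/**
 * Blend a camera frame over a replacement background using an alpha mask.
 * Hypothetical helper for illustration: frames are RGBA byte buffers and
 * the mask holds one value per pixel (1 = person, 0 = background).
 */
function compositeWithMask(
  foreground: Uint8ClampedArray,
  background: Uint8ClampedArray,
  mask: Float32Array,
): Uint8ClampedArray {
  const pixelCount = mask.length;
  if (foreground.length !== pixelCount * 4 || background.length !== pixelCount * 4) {
    throw new Error("frame and mask sizes do not match");
  }
  const out = new Uint8ClampedArray(pixelCount * 4);
  for (let p = 0; p < pixelCount; p++) {
    // Clamp the mask value so a noisy model output cannot over- or under-shoot.
    const alpha = Math.min(1, Math.max(0, mask[p]));
    const base = p * 4;
    for (let c = 0; c < 3; c++) {
      // Standard "over" blend: alpha * foreground + (1 - alpha) * background.
      out[base + c] = alpha * foreground[base + c] + (1 - alpha) * background[base + c];
    }
    out[base + 3] = 255; // The composited frame is fully opaque.
  }
  return out;
}
```

Mask values strictly between 0 and 1 produce a mix of both images, which is why soft or inaccurate mask edges show up as ghosting when the background is replaced rather than blurred.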

Technologies & Tools

Frontend
WebGL
Used for rendering graphics in the video processing pipeline.
Backend
WebAssembly
Used for executing performance-critical code in the video processing.
Frontend
Media Streams API
Used for reading video frames from webcam and screen capture.
Frontend
Insertable Streams API
Exposes media streams as readable streams of video frames.
Machine Learning
MediaPipe
Utilized for the selfie segmentation model to create alpha masks.

Key Actionable Insights

1. Implementing background effects in video applications can significantly enhance user experience by providing a more professional look.
   This is especially relevant for remote work tools like Slack Clips, where users want to maintain a polished appearance in recorded video messages.
2. Utilizing worker threads for video processing can reduce latency and improve performance, especially in real-time applications.
   This approach minimizes frame drops caused by background activities, ensuring smoother video.
3. Applying bilateral filtering can enhance the quality of segmentation masks by preserving edges while reducing noise.
   This technique is vital for achieving high-quality visual effects in video processing, particularly in applications that require accurate subject isolation.
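
To make the edge-preserving behavior of bilateral filtering concrete, here is a minimal CPU sketch applied directly to a segmentation mask. It is an illustration under assumptions, not the article's code: the function name `bilateralFilterMask`, the default radius, and both sigma values are hypothetical, and a production pipeline would more likely run this on the GPU. A common variant, the joint bilateral filter, takes its range weights from the video frame instead of the mask itself; this sketch uses the plain form.

```typescript
/**
 * Plain bilateral filter over a single-channel mask with values in 0..1.
 * Each output value is a weighted average of its neighbors, where the weight
 * falls off with spatial distance and with difference in mask value.
 * All parameter defaults are illustrative assumptions.
 */
function bilateralFilterMask(
  mask: Float32Array,
  width: number,
  height: number,
  radius = 2,
  sigmaSpace = 2,
  sigmaRange = 0.2,
): Float32Array {
  if (mask.length !== width * height) {
    throw new Error("mask size does not match dimensions");
  }
  const out = new Float32Array(mask.length);
  const spaceDenom = 2 * sigmaSpace * sigmaSpace;
  const rangeDenom = 2 * sigmaRange * sigmaRange;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const center = mask[y * width + x];
      let weightedSum = 0;
      let weightTotal = 0;
      for (let dy = -radius; dy <= radius; dy++) {
        const ny = y + dy;
        if (ny < 0 || ny >= height) continue; // Skip samples outside the mask.
        for (let dx = -radius; dx <= radius; dx++) {
          const nx = x + dx;
          if (nx < 0 || nx >= width) continue;
          const neighbor = mask[ny * width + nx];
          const diff = neighbor - center;
          // Spatial term smooths nearby noise; range term keeps values on the
          // other side of a sharp edge from bleeding across it.
          const weight = Math.exp(-(dx * dx + dy * dy) / spaceDenom - (diff * diff) / rangeDenom);
          weightedSum += weight * neighbor;
          weightTotal += weight;
        }
      }
      // weightTotal is at least 1 because the center sample has weight exp(0).
      out[y * width + x] = weightedSum / weightTotal;
    }
  }
  return out;
}
```

With a small `sigmaRange`, a slightly noisy value inside the person region is pulled toward its neighbors, while a hard 0-to-1 boundary is left almost untouched.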

Common Pitfalls

1. Relying solely on the raw output of the segmentation model can lead to poor quality edges and artifacts in the final video.
   It's important to apply techniques like bilateral filtering to refine the segmentation mask before compositing, ensuring a more polished output.
2. Not optimizing the rendering pipeline can result in frame drops and latency issues in the video.
   Utilizing worker threads to handle video processing tasks can mitigate these issues, allowing for smoother performance.