Strobelight: A profiling service built on open source technology

We’re sharing details about Strobelight, Meta’s profiling orchestrator. Strobelight combines several technologies, many open source, into a single service that helps engineers at Meta improve effic…

Jordan Rome
15 min read · Advanced

Overview

The article discusses Strobelight, Meta's profiling orchestrator that integrates multiple open-source technologies to enhance efficiency and resource utilization across its server fleet. It highlights the significant capacity savings achieved through detailed performance data collection and analysis, empowering engineers to optimize their code and identify bottlenecks.

What You'll Learn

1. How to utilize Strobelight for profiling applications in production environments
2. Why eBPF is crucial for low-overhead data collection in system profiling
3. When to apply continuous profiling to optimize resource usage

Prerequisites & Requirements

  • Familiarity with performance profiling concepts
  • Basic understanding of eBPF and its applications (optional)

Key Questions Answered

What is Strobelight and how does it improve efficiency at Meta?
Strobelight is a profiling orchestrator that integrates various open-source technologies to collect detailed performance metrics from production hosts at Meta. It helps engineers identify performance bottlenecks and optimize code, leading to significant resource savings: an estimated 15,000 servers' worth of capacity each year.
How does Strobelight enable engineers to identify performance issues before they reach production?
By combining performance data with existing tools, Strobelight allows engineers to analyze code changes and estimate their impact on compute costs. This proactive approach helps catch inefficiencies early, preventing costly issues when services handle millions of requests per minute.
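To make "estimate their impact on compute costs" concrete, here is a minimal first-order sketch. It assumes CPU is the limiting resource and that capacity scales linearly with CPU time; the names `ChangeEstimate` and `estimated_extra_servers` and all inputs are hypothetical, not Strobelight's actual cost model. The idea: if a function accounts for a share of a service's CPU samples and a change makes it slower by some fraction, the extra capacity needed is roughly share × slowdown × servers.

```python
from dataclasses import dataclass


@dataclass
class ChangeEstimate:
    # Hypothetical inputs for illustration; not Strobelight's actual cost model.
    function_cpu_share: float  # fraction of the service's CPU samples spent in the function (0..1)
    relative_slowdown: float   # e.g. 0.10 means the change makes the function 10% slower
    service_servers: int       # servers currently provisioned for the service


def estimated_extra_servers(est: ChangeEstimate) -> float:
    """First-order estimate: extra CPU fraction = share * slowdown, scaled to the fleet.

    Assumes CPU is the bottleneck and capacity scales linearly with CPU time.
    A negative slowdown (a speedup) yields a negative value, i.e. savings.
    """
    if not 0.0 <= est.function_cpu_share <= 1.0:
        raise ValueError("function_cpu_share must be in [0, 1]")
    return est.function_cpu_share * est.relative_slowdown * est.service_servers


if __name__ == "__main__":
    change = ChangeEstimate(function_cpu_share=0.05, relative_slowdown=0.10, service_servers=20_000)
    print(f"{estimated_extra_servers(change):.1f} extra servers")
```

With a function holding 5% of CPU samples, a 10% regression on a 20,000-server service costs about 100 servers, which is why catching such changes before they ship matters at Meta's scale.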
What types of profilers does Strobelight offer?
Strobelight includes 42 different profilers, such as memory profilers powered by jemalloc, function call count profilers, event-based profilers for various languages, and AI/GPU profilers. These tools help engineers collect data on resource usage and optimize their services.
How does Strobelight handle data normalization for profiling across different hosts?
Strobelight adjusts the weight of profile samples based on the run probability and sampling rate, ensuring that data can be accurately aggregated and compared across different hosts and services. This normalization prevents bias in performance analysis.
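The normalization idea can be sketched as follows, under stated assumptions: each run records the probability that its host was selected and the sampling period (how many events one sample represents), and a sample's weight is taken to be `sample_period / run_probability`. This exact formula, the `ProfileRun` record, and its field names are illustrative assumptions, not Strobelight's documented implementation.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ProfileRun:
    # Hypothetical record shape; field names are illustrative, not Strobelight's schema.
    host: str
    run_probability: float  # probability this host was selected for the run, in (0, 1]
    sample_period: int      # events represented by one sample (e.g. one sample per N cycles)
    stacks: list[str] = field(default_factory=list)  # one folded stack per sample


def sample_weight(run: ProfileRun) -> float:
    """Weight of one sample: it stands for `sample_period` events on this host,
    and this host stands in for 1 / run_probability hosts overall."""
    if not 0.0 < run.run_probability <= 1.0:
        raise ValueError("run_probability must be in (0, 1]")
    return run.sample_period / run.run_probability


def aggregate(runs: list[ProfileRun]) -> Counter:
    """Sum normalized weights per folded stack across every run."""
    totals: Counter = Counter()
    for run in runs:
        weight = sample_weight(run)
        for stack in run.stacks:
            totals[stack] += weight
    return totals


if __name__ == "__main__":
    runs = [
        ProfileRun("host-a", run_probability=0.5, sample_period=1000, stacks=["main;f"] * 3),
        ProfileRun("host-b", run_probability=0.25, sample_period=100, stacks=["main;f", "main;g"]),
    ]
    for stack, weight in aggregate(runs).most_common():
        print(stack, weight)
```

Raw sample counts depend on how often each host was picked and how densely it was sampled; converting every sample to an estimated event count puts runs from different hosts and services on a common scale before they are summed.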

Key Statistics & Figures

Annual capacity savings: 15,000 servers
The estimated capacity saved each year through Strobelight's profiling capabilities.
Reduction in CPU cycles: up to 20%
Observed in some of Meta's largest services, thanks to optimizations made possible by data from the last branch record (LBR) profiler.

Technologies & Tools

eBPF (Backend)
Used for low-overhead data collection in system profiling.
jemalloc (Memory Management)
Powers the memory profilers within Strobelight.
Scuba (Data Visualization)
Used for querying and visualizing profiling data collected by Strobelight.
bpftrace (Profiling Tool)
Lets engineers write custom eBPF-based profilers for specific needs using bpftrace's high-level tracing language.

Key Actionable Insights

1. Leverage Strobelight's command line tool or web UI to collect profiling data on demand.
   This lets engineers quickly identify performance bottlenecks and optimize their code, improving overall system efficiency.
2. Use eBPF profilers to gather performance data with minimal impact on system performance.
   eBPF lets verified programs run safely inside the Linux kernel, enabling efficient data collection and analysis, which is crucial for maintaining high performance in production environments.
3. Implement continuous profiling to automatically collect performance data for all Meta services.
   This ensures engineers have access to vital performance metrics without manual intervention, so issues are identified sooner.
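One way continuous profiling can stay cheap is to profile only a random fraction of hosts in each scheduling round. The sketch below assumes that model: each host is picked independently with a configured run probability, and that probability is returned alongside the host so it can be stored with the samples for later reweighting. `ContinuousProfilingConfig`, `select_hosts`, and the profiler name `"cpu-cycles"` are hypothetical, not Strobelight's real scheduler or configuration schema.

```python
import random
from dataclasses import dataclass


@dataclass
class ContinuousProfilingConfig:
    # Hypothetical config shape for illustration; not Strobelight's real schema.
    profiler: str
    run_probability: float  # fraction of hosts profiled in each scheduling round, in (0, 1]


def select_hosts(hosts: list[str], cfg: ContinuousProfilingConfig,
                 rng: random.Random) -> list[tuple[str, float]]:
    """Pick each host independently with probability cfg.run_probability.

    Returns (host, run_probability) pairs so the probability can be stored with
    the resulting samples and used later to reweight them.
    """
    if not 0.0 < cfg.run_probability <= 1.0:
        raise ValueError("run_probability must be in (0, 1]")
    return [(h, cfg.run_probability) for h in hosts if rng.random() < cfg.run_probability]


if __name__ == "__main__":
    rng = random.Random(42)
    fleet = [f"host-{i}" for i in range(10_000)]
    cfg = ContinuousProfilingConfig(profiler="cpu-cycles", run_probability=0.01)
    picked = select_hosts(fleet, cfg, rng)
    print(f"{cfg.profiler}: profiling {len(picked)} of {len(fleet)} hosts this round")
```

Independent per-host selection keeps the expected overhead proportional to the run probability, and because the probability travels with the samples, aggregated results remain comparable across services configured with different rates.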

Common Pitfalls

1. Overlooking the impact of profiling on system performance can degrade service quality.
   Engineers must be cautious when enabling multiple profilers simultaneously, as this can cause resource contention and affect the performance of the services being monitored.
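A simple guard against this pitfall is a per-host admission check: each profiler declares the exclusive resources it needs, and a new profiler starts only if none of them are already held. The sketch below is hypothetical; `HostProfilerGate`, the resource name `"pmu"` (standing in for hardware performance counters), and the profiler names are illustrative assumptions, not Strobelight's actual concurrency mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class HostProfilerGate:
    """Hypothetical per-host admission check: a profiler may start only if none of
    the exclusive resources it needs (e.g. hardware counters) are already held."""
    held: dict[str, str] = field(default_factory=dict)  # resource -> profiler holding it

    def try_start(self, profiler: str, resources: set[str]) -> bool:
        if resources & set(self.held):
            return False  # conflict: the caller could queue and retry later
        for resource in resources:
            self.held[resource] = profiler
        return True

    def finish(self, profiler: str) -> None:
        # Release every resource this profiler was holding.
        self.held = {r: p for r, p in self.held.items() if p != profiler}


if __name__ == "__main__":
    gate = HostProfilerGate()
    print(gate.try_start("lbr", {"pmu"}))     # first profiler acquires the counters
    print(gate.try_start("cycles", {"pmu"}))  # rejected while "lbr" holds them
    gate.finish("lbr")
    print(gate.try_start("cycles", {"pmu"}))  # admitted after release
```

This only prevents conflicts over explicitly declared resources; general CPU or memory overhead from running many profilers at once would still need separate limits, such as capping concurrent profilers per host.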

Related Concepts

Performance Profiling
Resource Optimization Techniques
eBPF Applications in System Monitoring