Overview
The article discusses the development of CT-Scan, a performance monitoring and prediction platform at Facebook aimed at reducing regressions in mobile applications. It outlines the challenges faced in maintaining app performance and details the design principles and methodologies implemented to ensure efficient performance testing throughout the development life cycle.
What You'll Learn
1. How to implement performance monitoring in mobile applications
2. Why statistical analysis is crucial for detecting performance regressions
3. When to use lab vs. real-world environments for performance testing
Prerequisites & Requirements
- Understanding of mobile app performance metrics
- Familiarity with A/B testing methodologies (optional)
Key Questions Answered
How does Facebook's CT-Scan help in performance monitoring?
CT-Scan is a performance monitoring and prediction platform that helps Facebook engineers understand the performance implications of code changes. It captures important metrics, performs statistical analyses, and uses machine learning techniques to predict potential performance issues, ultimately reducing regressions in app performance.
What are the main challenges in mobile app performance at Facebook?
Thousands of code changes land every week, and the complexity of the app makes performance regressions likely. Facebook has to keep the app fast and economical with data and battery while iterating rapidly on development.
What is the process of continuous experiments in staging?
In staging, CT-Scan runs continuous experiments as changes are checked into the main branch, analyzing various interactions under different configurations to identify performance changes. Testing every single revision would give the finest granularity, but running the experiments every N diffs is more efficient.
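The every-N-diffs schedule described above can be sketched as follows. This is a minimal illustration, not CT-Scan's actual scheduler: the function names `pick_revisions` and `suspect_range`, the revision labels, and the idea of listing the skipped revisions once a regression shows up between two tested points are all assumptions made for the example.

```python
from typing import List, Sequence


def pick_revisions(revisions: Sequence[str], n: int) -> List[str]:
    """Return every n-th revision (oldest first), always including the newest one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    picked = list(revisions[::n])
    # Make sure the most recent revision is tested even if it falls between steps.
    if revisions and picked[-1] != revisions[-1]:
        picked.append(revisions[-1])
    return picked


def suspect_range(revisions: Sequence[str], last_good: str, first_bad: str) -> List[str]:
    """Return the revisions after last_good up to and including first_bad."""
    start = revisions.index(last_good)
    end = revisions.index(first_bad)
    if start >= end:
        raise ValueError("last_good must come before first_bad")
    return list(revisions[start + 1 : end + 1])


if __name__ == "__main__":
    revs = [f"r{i}" for i in range(1, 11)]  # hypothetical labels r1..r10
    print(pick_revisions(revs, 4))          # revisions that get an experiment
    print(suspect_range(revs, "r5", "r9"))  # candidates if r9 regressed vs. r5
```

The trade-off is visible in the two functions: a larger `n` means fewer experiment runs, but a wider range of suspect revisions to examine when a regression appears.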
How does Facebook ensure minimal impact during real-world performance sampling?
Facebook dynamically determines the sample size for each performance counter so that the collected data is statistically significant. Performance data is sampled randomly at low frequencies, such as one out of every 1,000 interactions, which keeps the impact on user experience and data usage minimal.
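A rough sketch of low-frequency random sampling is shown below. The article does not describe how Facebook computes its sample sizes, so `required_sample_size` uses the textbook formula for estimating a mean to within a given margin of error, n = (z * stddev / margin)^2, purely as a stand-in. Both function names and all numeric inputs are assumptions chosen for illustration.

```python
import math
import random


def should_sample(rng: random.Random, rate: float = 1 / 1000) -> bool:
    """Decide whether to record this interaction; true with probability `rate`."""
    return rng.random() < rate


def required_sample_size(stddev: float, margin: float, z: float = 1.96) -> int:
    """Textbook sample size for estimating a mean to within +/- margin.

    z = 1.96 corresponds to roughly 95% confidence. This is an assumed
    stand-in, not the method used by CT-Scan.
    """
    if stddev < 0 or margin <= 0:
        raise ValueError("stddev must be >= 0 and margin must be > 0")
    return math.ceil((z * stddev / margin) ** 2)


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only to make the demo repeatable
    interactions = 1_000_000
    sampled = sum(should_sample(rng) for _ in range(interactions))
    print(f"sampled {sampled} of {interactions} interactions")
    # Hypothetical counter: 200 ms standard deviation, +/- 10 ms margin.
    print(required_sample_size(stddev=200.0, margin=10.0))
```

Because each interaction is sampled independently, the number of recorded interactions stays close to the configured rate on average, and the rate can be adjusted once enough samples have been collected for a given counter.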
Key Statistics & Figures
Number of continuous experiments run: hundreds of thousands
During the last six months of 2014, CT-Scan ran hundreds of thousands of continuous experiments to monitor performance.
Number of diagnostic and profiling runs: thousands
CT-Scan performed thousands of diagnostic and profiling runs to prevent regressions.
Time to feedback for performance experiments: 30 to 60 minutes
The system provides feedback to engineers within 30 to 60 minutes after running performance experiments.
Technologies & Tools
Version Control: Git
Used for specifying revisions in the performance monitoring system.
Version Control: Mercurial
Supported by the performance monitoring system for revision specification.
Data Collection: Scribe
Leveraged for collecting performance data in real-time.
Data Analysis: Scuba
Used for analyzing performance data collected from user devices.
Key Actionable Insights
1. Implement a performance monitoring system like CT-Scan to proactively identify regressions in your mobile applications. By using a structured approach to monitor performance metrics and predict potential issues, you can maintain app performance and enhance user experience.
2. Utilize both lab and real-world environments for performance testing to capture a comprehensive view of app behavior. Lab environments allow for controlled testing, while real-world data provides insights into actual user interactions, helping to identify issues that may not appear in a lab setting.
3. Incorporate statistical analysis techniques to interpret performance data effectively. Using distributions and visualizations can provide clearer insights into performance changes rather than relying solely on averages, which may be misleading.
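To illustrate why averages alone can mislead, here is a small sketch that compares two sets of timings by mean and by percentile. The data is made up for the example, and `percentile` and `summarize` are hypothetical helpers using the simple nearest-rank definition; none of this is taken from CT-Scan itself.

```python
import math
from statistics import mean
from typing import Dict, Sequence


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile for an integer p between 1 and 100."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(samples: Sequence[float]) -> Dict[str, float]:
    """Report the mean alongside the median and a tail percentile."""
    return {
        "mean": mean(samples),
        "p50": percentile(samples, 50),
        "p99": percentile(samples, 99),
    }


if __name__ == "__main__":
    # Made-up timings in milliseconds for one interaction.
    baseline = [100.0] * 100
    # Most runs got faster, but 5% of runs became much slower.
    candidate = [90.0] * 95 + [290.0] * 5
    print("baseline ", summarize(baseline))
    print("candidate", summarize(candidate))
```

In this made-up data both sets have the same mean, yet the candidate's 99th percentile is far worse. A comparison based only on averages would report no change, while the distribution shows a regression affecting a minority of runs.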
Common Pitfalls
1. Relying solely on lab environments for performance testing can lead to missing critical issues that only appear in real-world conditions. Lab tests may not accurately mimic user interactions, so it's essential to balance lab testing with real-world data collection to ensure comprehensive performance monitoring.