Performance @Scale 2019 recap

Marty Greenia

Developing apps and services that scale to millions or billions of people can present uniquely complex performance challenges. Optimizing infrastructure, scaling web services, and developing fast m…

Overview

The article recaps the Performance @Scale 2019 event, where industry leaders from Facebook, Google, and NVIDIA discussed performance challenges and solutions for large-scale systems. Key topics included optimizing infrastructure, scaling web services, and enhancing mobile app performance.

What You'll Learn

1. How to analyze performance inefficiencies in AI workloads
2. Why scaling machine learning models on TPUs can reduce training time
3. How to optimize perceived performance using behavioral analytics
4. When to apply new web technologies for better performance

Prerequisites & Requirements

Understanding of performance optimization techniques
Experience with AI/ML workloads and performance analysis (optional)

Key Questions Answered

What are the main performance challenges in scaling Facebook apps?

The main challenges include optimizing app size, startup times, and crash rates. Facebook's product and central performance teams collaborate to improve over 150 metrics across various apps and platforms, ensuring scalability and performance.

How do Tensor Processing Units (TPUs) improve machine learning training times?

A TPU v3 Pod offers over 100 PFLOPs of compute. Spreading training across that much hardware dramatically reduces training time for machine learning models, including large and complex ones.
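
As a rough illustration of why raw compute matters, the sketch below estimates ideal training time as total operations divided by sustained throughput. Only the 100 PFLOPs figure comes from the recap; the function name, the 1e21-operation workload, the 1 PFLOPs comparison system, and the 50% utilization are assumptions chosen for illustration.

```python
def ideal_training_hours(total_flops: float, peak_pflops: float, utilization: float) -> float:
    """Back-of-the-envelope training time in hours.

    total_flops: floating-point operations needed for the whole run (assumed)
    peak_pflops: peak hardware throughput in PFLOPs (1 PFLOPs = 1e15 ops/sec)
    utilization: fraction of peak actually sustained, in (0, 1] (assumed)
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    sustained_flops_per_sec = peak_pflops * 1e15 * utilization
    return total_flops / sustained_flops_per_sec / 3600


# Hypothetical workload: 1e21 operations, 50% sustained utilization.
hours_pod = ideal_training_hours(1e21, peak_pflops=100, utilization=0.5)
hours_single = ideal_training_hours(1e21, peak_pflops=1, utilization=0.5)

print(f"100 PFLOPs system: {hours_pod:.1f} h")
print(f"  1 PFLOPs system: {hours_single:.1f} h")
```

At equal utilization, the estimate scales inversely with peak compute. Real runs rarely scale this cleanly, because communication and input pipelines also limit throughput.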

What techniques are used to scale deep learning workloads on GPUs?

Scaling deep learning on GPUs involves optimizing training so it runs efficiently across many GPUs, which is described as crucial for achieving state-of-the-art results. On the serving side, tools such as TensorRT Inference Server are used to deploy models and balance load during inference.

How does Bing optimize perceived performance using behavioral analytics?

Bing leverages behavioral analytics to identify usability bottlenecks and optimize perceived performance. This involves conducting performance experiments and learning from both successful and failed initiatives to enhance user experience.

Key Statistics & Figures

Compute power of TPU v3 Pod

over 100 PFLOPs

This level of compute power leads to significant reductions in training time for machine learning models.

Number of metrics improved across Facebook apps

more than 150

This extensive metric tracking is part of Facebook's effort to improve app performance and user experience.

Technologies & Tools

Hardware

Tensor Processing Units

Used for accelerating machine learning workloads.

Hardware

GPUs

Utilized for running deep learning training at scale.

Web Technology

isInputPending() API

A new web API that lets long-running scripts check whether user input is waiting, so they can yield to the main thread and keep the page responsive.

Key Actionable Insights

1. Implement a systematic approach to analyze AI workloads for performance inefficiencies.
By utilizing a top-down methodology, engineers can uncover inefficiencies and optimize their code, which is essential for maintaining performance in production environments serving billions of users.

2. Adopt Tensor Processing Units (TPUs) to significantly reduce machine learning model training times.
Using TPUs can lead to dramatic improvements in training efficiency, making them a valuable asset for teams working on large-scale machine learning projects.

3. Utilize behavioral analytics to enhance the perceived performance of applications.
This approach allows developers to identify and address usability issues, ultimately leading to a better user experience and higher satisfaction.

4. Contribute to open source browser projects to improve web app performance.
By participating in open source initiatives, developers can help bridge the performance gap between web and native applications, enabling the creation of more sophisticated web apps.
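
The top-down methodology in the first insight can be sketched as a drill-down over a timing tree: start at the whole workload, then repeatedly descend into the child that consumes the most time. This is a minimal sketch under assumptions, not the method presented at the event. The Span type, the hottest_path helper, and all timings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Span:
    """One timed region of a workload (hypothetical structure)."""
    name: str
    seconds: float
    children: List["Span"] = field(default_factory=list)


def hottest_path(root: Span) -> List[Tuple[str, float]]:
    """Walk from the root into the most expensive child at each level.

    Returns (name, share of total root time) for each level visited.
    """
    if root.seconds <= 0:
        raise ValueError("root span must have positive duration")
    path = []
    node = root
    while True:
        path.append((node.name, node.seconds / root.seconds))
        if not node.children:
            return path
        node = max(node.children, key=lambda child: child.seconds)


# Made-up timings for a single training step.
profile = Span("training step", 1.00, [
    Span("data loading", 0.15),
    Span("forward pass", 0.30),
    Span("backward pass", 0.55, [
        Span("matmul kernels", 0.20),
        Span("host-device sync", 0.30),
        Span("optimizer update", 0.05),
    ]),
])

path = hottest_path(profile)
for name, share in path:
    print(f"{name}: {share:.0%} of step time")
```

Following only the largest contributor keeps the analysis focused. A fuller tool would also report siblings that are nearly as expensive.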

Common Pitfalls

1. Failing to optimize function ordering in iOS binaries can lead to increased page faults.

When functions needed during startup are scattered across the binary, launching the app touches more pages and incurs more page faults, which can significantly degrade startup performance. Developers should use order files to direct the linker to place those functions together.
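
An order file for Apple's linker is a text file with one symbol name per line, where lines starting with # are comments. The sketch below builds one from a list of symbols in the order they were first used during startup. How that list is collected is outside the scope here, so the symbol names and the build_order_file helper are hypothetical.

```python
from typing import List


def build_order_file(startup_symbols: List[str]) -> str:
    """Return order-file text: one symbol per line, first use first.

    Duplicates are dropped so each symbol appears once, at the position
    of its earliest use in the (assumed) startup trace.
    """
    seen = set()
    lines = ["# Symbols in first-use order during startup (hypothetical trace)"]
    for symbol in startup_symbols:
        symbol = symbol.strip()
        if symbol and symbol not in seen:
            seen.add(symbol)
            lines.append(symbol)
    return "\n".join(lines) + "\n"


# Hypothetical trace; a real one would come from instrumentation or profiling.
trace = ["_main", "_setup_logging", "_main", " _load_config ", "_setup_logging"]
order_text = build_order_file(trace)
print(order_text, end="")
```

The resulting text would be saved to a file and passed to the linker with its -order_file option. Symbols not listed keep their default placement.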

Related Concepts

Performance Optimization Techniques

Machine Learning Model Scalability

Behavioral Analytics In Application Performance