2018 @Scale Conference recap

Meta

The @Scale Conference is an invitation-only technical event for engineers who work on large-scale platforms and technologies. This year’s event took place on September 13 at the San Jose Conv…

Overview

The 2018 @Scale Conference brought together over 2,500 engineers to explore the challenges and innovations in building scalable applications and services. Keynotes from industry leaders highlighted advancements in AI infrastructure, machine learning platforms, and distributed databases, showcasing how these technologies are evolving to meet the demands of large-scale systems.

What You'll Learn

1. How to build and deploy machine learning solutions at scale using Uber's Michelangelo platform
2. Why resource management is critical for SQL analytics in large-scale data warehouses
3. How to utilize Presto for fast SQL analytics over big data
4. When to apply geo-replication and geo-partitioning in distributed databases
5. How to implement automated fault-finding in mobile applications using Sapienz

Key Questions Answered

What are the key features of NVIDIA's AI infrastructure for self-driving cars?

NVIDIA's AI infrastructure for self-driving cars focuses on deep learning, simulation, and validation at scale, enabling the processing of terabytes of data per hour from multiple cars. It addresses challenges in training and testing across diverse scenarios, ensuring safety through extensive simulations and real-world testing.

How does Presto enhance SQL analytics at Facebook?

Presto is an open-source distributed SQL query engine that allows data engineers and scientists at Facebook to perform fast SQL analytics over large datasets. It supports interactive analytics and custom workloads, significantly improving the efficiency of data processing and reducing latency.
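One reason a distributed SQL engine can answer a query like `SELECT country, count(*) ... GROUP BY country` quickly is that the work is split: each worker aggregates its own slice of the data, and a final stage merges the small partial results. The Python sketch below illustrates that general pattern only. It is not Presto code; the table name, the splits, and the function names are hypothetical.

```python
from collections import Counter


def partial_aggregate(split):
    """Worker-side step: count rows per key within one data split."""
    counts = Counter()
    for key in split:
        counts[key] += 1
    return counts


def final_aggregate(partials):
    """Final step: merge the per-split counts into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return dict(total)


# Three hypothetical splits of a "page_views" table, one country code per row.
splits = [["us", "de", "us"], ["de", "in"], ["us", "in", "in"]]
result = final_aggregate(partial_aggregate(s) for s in splits)
print(result)
```

Each partial result is far smaller than the split it summarizes, so only a little data has to move between stages.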

What challenges does CockroachDB address for global applications?

CockroachDB tackles high latency and data sovereignty issues by utilizing geo-replication and geo-partitioning, allowing data to remain close to users while ensuring strong consistency. This design helps businesses serve global customers more effectively without compromising performance.
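The effect of geo-partitioning on latency can be shown with a toy model. The Python sketch below is not CockroachDB's API: it assumes a hypothetical three-region deployment, and the latency figures, the `LATENCY_MS` table, and the `GeoPartitionedTable` class are made up for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical round-trip latencies in ms, keyed by (client region, data region).
# These numbers are illustrative assumptions, not measurements.
LATENCY_MS = {
    ("us", "us"): 5, ("us", "eu"): 90, ("us", "ap"): 150,
    ("eu", "us"): 90, ("eu", "eu"): 5, ("eu", "ap"): 200,
    ("ap", "us"): 150, ("ap", "eu"): 200, ("ap", "ap"): 5,
}


@dataclass
class GeoPartitionedTable:
    """Toy table whose rows are pinned to the region named in each row."""
    partitions: dict = field(
        default_factory=lambda: {"us": {}, "eu": {}, "ap": {}}
    )

    def insert(self, user_id, region, payload):
        # The partition key (region) decides where the row is stored.
        self.partitions[region][user_id] = payload

    def read(self, user_id, home_region, client_region):
        # Return the row plus the simulated latency of reaching its partition.
        row = self.partitions[home_region][user_id]
        return row, LATENCY_MS[(client_region, home_region)]


table = GeoPartitionedTable()
table.insert(1, "eu", {"name": "Ada"})
row, local_ms = table.read(1, home_region="eu", client_region="eu")
_, remote_ms = table.read(1, home_region="eu", client_region="us")
print(row, local_ms, remote_ms)
```

A European user's row is stored only in the `eu` partition, so reads from Europe stay local, and the row never leaves that region, which is the data-sovereignty half of the argument.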

What is the significance of the Glow compiler in AI infrastructure?

Glow is an open-source, community-driven compiler that lets hardware partners optimize their silicon products for AI and machine learning workloads. As those workloads grow and create hardware bottlenecks, a shared compiler framework helps partners design and tune new hardware faster.

Key Statistics & Figures

Number of attendees at the @Scale Conference

More than 2,500

This number reflects the growing interest and participation in discussions around scalable technologies and applications.

Data produced by each car in NVIDIA's AI infrastructure

terabytes per hour

This highlights the massive scale of data processing required for training self-driving AI systems.

Size of data warehouses managed by Apache Impala

up to 40 PB

This demonstrates the capability of Impala to handle large-scale data analytics efficiently.

Technologies & Tools

SQL Query Engine

Presto

Used for fast SQL analytics over big data at Facebook.

Database

CockroachDB

Distributed SQL database that addresses global scale challenges.

Machine Learning Platform

Michelangelo

Uber's platform for building and deploying machine learning solutions.

Compiler

Glow

Community-driven compiler for optimizing AI and ML workloads.

Key Actionable Insights

1. Implementing a scalable machine learning platform like Michelangelo can streamline the development and deployment of ML models across teams. This approach allows data scientists and engineers to focus on building effective models without getting bogged down by infrastructure concerns, thus accelerating innovation and improving productivity.

2. Utilizing Presto for SQL analytics can significantly reduce query times and improve data accessibility for large datasets. By leveraging Presto's capabilities, organizations can enhance their data analysis processes, enabling faster decision-making and more responsive business operations.

3. Adopting automated fault-finding tools like Sapienz can enhance the reliability of mobile applications. This proactive approach to testing can help identify and resolve issues before they impact users, leading to better app performance and user satisfaction.
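The automated fault-finding idea in the third insight can be illustrated with a small example. The Python sketch below is not Sapienz and does not reproduce its algorithm: it swaps in plain random search followed by greedy shrinking, run against a made-up toy app. The event names and the crash condition are hypothetical.

```python
import random

EVENTS = ["tap_login", "rotate", "back", "scroll"]  # hypothetical UI events


def crashes(sequence):
    """Toy app under test: crashes if 'rotate' directly follows 'tap_login'."""
    return any(a == "tap_login" and b == "rotate"
               for a, b in zip(sequence, sequence[1:]))


def random_search(rng, trials=500, max_len=6):
    """Generate random event sequences until one crashes the toy app."""
    for _ in range(trials):
        length = rng.randint(2, max_len)
        sequence = [rng.choice(EVENTS) for _ in range(length)]
        if crashes(sequence):
            return sequence
    return None  # no crash found within the trial budget


def shrink(sequence):
    """Greedily drop events that are not needed to reproduce the crash."""
    sequence = list(sequence)
    i = 0
    while i < len(sequence):
        candidate = sequence[:i] + sequence[i + 1:]
        if crashes(candidate):
            sequence = candidate  # event i was unnecessary, keep it removed
        else:
            i += 1  # event i is needed, move on
    return sequence


found = random_search(random.Random(0))
minimal = shrink(found) if found else None
print("crashing sequence:", found)
print("shrunk reproduction:", minimal)
```

Shrinking matters in practice because a short reproduction is much easier for an engineer to debug than the long random sequence that first triggered the crash.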

Common Pitfalls

1. Failing to account for the complexities of scaling machine learning systems can lead to performance bottlenecks. Many organizations underestimate the infrastructure requirements for handling large datasets and concurrent processing, which can result in slow model training and deployment.

2. Neglecting data privacy and compliance when deploying AI services can lead to significant legal issues. As AI applications often handle sensitive data, it's crucial to implement robust privacy measures and adhere to regulations to avoid potential penalties.

Related Concepts

Scalable Machine Learning Platforms

Distributed Databases

AI Infrastructure Challenges

Performance Optimization Techniques