@Scale 2014: Recap of Data Track


Sambavi Muthukrishnan

Overview

The Data track at @Scale 2014 focused on the challenges of building data processing systems for applications that serve hundreds of millions of users. Speakers from Facebook, Netflix, YouTube, and Box presented the systems and frameworks they use for mobile analytics, caching, and data architecture.

What You'll Learn

1. How to implement a real-time analytics pipeline for mobile applications
2. Why AWS S3 works well as a central data hub for cloud data infrastructure
3. How to build scalable caching systems using mcrouter
4. When to use a graph data model for storage solutions
5. How to analyze A/B tests in real time using automated analysis tools

Key Questions Answered

How does Facebook handle mobile analytics differently than web analytics?
Facebook pairs limited releases with a real-time analytics pipeline to work around the difficulties of collecting data from mobile devices. This lets teams iterate quickly and catch problems before a build reaches all users, helping keep the mobile experience stable and engaging.
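The recap does not describe the internals of Facebook's pipeline. As a minimal sketch of the idea, assuming a stream of client events tagged with an app version, the Python below buckets events into tumbling windows and computes a crash rate per version, so a limited-release build can be compared against the stable build shortly after it ships. The `Event` type, the event kinds, and the window size are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: int    # seconds since epoch
    app_version: str  # hypothetical: e.g. a limited-release build vs. the stable build
    kind: str         # hypothetical event types: "session_start" or "crash"


def crash_rates_by_window(events, window_seconds=60):
    """Group events into tumbling windows and return the crash rate per (window, version)."""
    sessions = defaultdict(int)
    crashes = defaultdict(int)
    for e in events:
        # Align each event to the start of its fixed-size window.
        window = e.timestamp - e.timestamp % window_seconds
        key = (window, e.app_version)
        if e.kind == "session_start":
            sessions[key] += 1
        elif e.kind == "crash":
            crashes[key] += 1
    # Only windows with at least one session get a rate, so there is no division by zero.
    return {key: crashes[key] / count for key, count in sessions.items()}
```

In a real deployment this aggregation would run inside a stream processor rather than over an in-memory list; the sketch only shows the windowing logic that makes a per-release comparison possible within minutes.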
What is the role of S3 in Netflix's data platform?
S3 is the central data hub of Netflix's data platform: virtually all of Netflix's data is stored there, in the AWS cloud. Because the various processing technologies read from and write to the same store, they can work together on shared data, which supports Netflix's extensive data processing needs.
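One common way to make a shared object store usable by many engines is an agreed key layout. The sketch below, assuming a Hive-style `column=value` partitioning scheme and a hypothetical bucket and table name, builds and parses such keys; the recap does not say which layout Netflix uses.

```python
from datetime import datetime, timezone


def partition_uri(bucket: str, table: str, ts: datetime, part: int) -> str:
    """Build a Hive-style partitioned S3 URI (the layout is an illustrative assumption)."""
    ts = ts.astimezone(timezone.utc)  # normalize so every writer uses the same partition
    return (
        f"s3://{bucket}/warehouse/{table}/"
        f"dateint={ts:%Y%m%d}/hour={ts:%H}/part-{part:05d}"
    )


def partition_values(uri: str) -> dict:
    """Recover partition columns from the key=value segments of a URI."""
    return dict(seg.split("=", 1) for seg in uri.split("/") if "=" in seg)
```

Any engine that understands the convention can locate a day's or hour's data by prefix alone, without a separate copy of the data for each tool.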
What challenges does Box face with third-party API calls?
Box handles more than 6 billion third-party API calls per month, which makes it hard to keep its systems available and consistent. Box developed solutions such as Tron and Credence to address these issues and keep its infrastructure robust.
What are the key features of Vitess for YouTube's backend?
Vitess is an open-source storage system, built on MySQL, that YouTube developed to manage its data as the service scales. Key components include a lock server that stores global information and vttablet, which sits in front of MySQL and rewrites problematic queries to protect database performance.
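As a toy illustration of the kind of rewriting vttablet performs, the function below appends a row cap to `SELECT` statements that lack a `LIMIT`, so one careless query cannot pull an unbounded result set from MySQL. It is a string-based sketch under stated assumptions: Vitess parses SQL properly, and the `MAX_ROWS` value and `add_row_limit` name are hypothetical.

```python
import re

MAX_ROWS = 10000  # hypothetical row cap, not a Vitess default

_LIMIT = re.compile(r"\blimit\s+\d+", re.IGNORECASE)


def add_row_limit(query: str, max_rows: int = MAX_ROWS) -> str:
    """Append a LIMIT to SELECTs that lack one; leave every other query untouched."""
    stripped = query.strip().rstrip(";")
    if not stripped.lower().startswith("select"):
        return query  # this sketch only caps reads
    if _LIMIT.search(stripped):
        return query  # the caller already bounded the result
    return f"{stripped} LIMIT {max_rows}"
```

A regex is enough to show the idea but would misfire on edge cases such as a column named `limit`; a production proxy rewrites a parsed query tree instead.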

Key Statistics & Figures

Operations per second processed by Facebook's caching infrastructure: 4 billion
This statistic highlights the scale at which Facebook's caching system operates, underscoring the importance of efficient caching.

Monthly third-party API calls handled by Box: 6 billion
This figure illustrates the demand and complexity of managing API interactions in a cloud-based enterprise environment.

Technologies & Tools


AWS (Cloud Infrastructure): Used by Netflix for its data platform architecture.
S3 (Storage): Central data hub for Netflix's data infrastructure.
mcrouter (Caching): A memcached protocol router for scaling cache deployments at Facebook.
MySQL (Database): Used in Facebook's Iris system for messaging.
Vitess (Database): Scalable storage solution developed by YouTube, based on MySQL.
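A memcached router such as mcrouter has to decide which cache server owns each key. mcrouter supports several routing strategies, including consistent hashing; the Python below is a generic consistent-hash ring with virtual nodes, not mcrouter's own hash function or configuration. The server addresses and the `vnodes` count are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring mapping cache keys to servers (generic sketch, not mcrouter's algorithm)."""

    def __init__(self, servers, vnodes=100):
        # Each server appears at many points on the ring to even out the key distribution.
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        # 64-bit integer taken from an MD5 digest; used for placement, not security.
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def server_for(self, key: str) -> str:
        # First virtual node clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._ring[idx][1]
```

The design choice matters at Facebook's scale: removing a server only remaps the keys that server owned, so the rest of the pool keeps its cache hits when membership changes.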

Key Actionable Insights

1. Implementing a real-time analytics pipeline can make mobile applications more stable and speed up iteration.
By adopting a real-time analytics approach, teams can quickly identify and address issues, leading to a more stable and engaging user experience.
2. Using S3 for data storage can streamline data management in cloud environments.
S3's central role in Netflix's data architecture shows how cloud storage can support efficient data processing and integration across applications.
3. Scalable caching systems are crucial for handling high traffic volumes.
Facebook's mcrouter shows how a well-designed caching layer can handle billions of operations per second while keeping the system reliable and fast.
4. A/B testing tools like Deltoid can provide near-immediate insight into how users respond to changes.
Real-time analysis of A/B tests lets companies make informed decisions quickly, improving the user experience and product features.
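The recap does not explain how Deltoid analyzes experiments. A standard building block for comparing a conversion metric between a control and a test group is the two-proportion z-test, sketched below in Python; the function name and inputs are illustrative assumptions, not Deltoid's interface.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of a control group (a) and a test group (b).

    Returns (lift, z, two-sided p-value) using the pooled-variance z-test.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variance: every user converted or none did")
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, z, p_value
```

Recomputing this p-value continuously as data streams in, and stopping at the first significant result, inflates false positives; a real-time tool needs a correction for repeated looks at the data.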

Common Pitfalls

1. Failing to adapt analytics tools for mobile can lead to missed insights.
Many companies overlook the unique challenges of mobile data collection, which can result in a lack of real-time feedback and slower iteration cycles.

Related Concepts

Mobile Analytics Frameworks
Data Processing Architectures
Caching Strategies
A/B Testing Methodologies