Horizon: The first open source reinforcement learning platform for large-scale products and services

An end-to-end platform built on PyTorch 1.0 that is designed to jump-start RL’s transition from research papers to production

Jason Gauci

Overview

Horizon is the first open source end-to-end platform that employs applied reinforcement learning (RL) to optimize systems in large-scale production environments. Developed by Facebook, it bridges the gap between RL research and practical applications, demonstrating significant improvements in various internal applications.

What You'll Learn

1. How to implement reinforcement learning models for large-scale applications
2. Why Horizon is significant for bridging RL research and production use cases
3. How to preprocess data for reinforcement learning using Apache Spark
4. When to apply counterfactual policy evaluation in RL systems

Prerequisites & Requirements

  • Understanding of reinforcement learning concepts
  • Familiarity with PyTorch and Apache Spark (optional)

Key Questions Answered

What is Horizon and how does it optimize large-scale systems?
Horizon is an open source reinforcement learning platform developed by Facebook that optimizes systems in large-scale production environments. It uses RL to make decisions and adapt based on feedback, improving applications like video streaming and notifications.
How does Horizon handle data preprocessing for RL?
Horizon preprocesses state and action features in parallel using Apache Spark, which allows it to handle large datasets effectively. This preprocessing is crucial for training RL models that are sensitive to noisy and unnormalized data.
What impact has Horizon had on Facebook's applications?
Horizon has improved various applications at Facebook, including optimizing streaming video quality and personalizing notifications. It uses real-time feedback to enhance user experience, demonstrating the practical benefits of RL in production.
What are the components of Horizon's pipeline?
Horizon’s pipeline consists of three main components: timeline generation across thousands of CPUs, training on many GPUs, and serving across thousands of machines. This architecture enables it to scale effectively with Facebook's data sets.
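
The preprocessing answer above says RL models are sensitive to noisy and unnormalized features. The following is a minimal single-machine sketch of that idea, not Horizon's Spark pipeline: it computes a mean and standard deviation per feature from logged rows and applies them before training. The names `fit_normalization` and `normalize`, and the feature names in the usage note, are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Tuple

Row = Dict[str, float]
Stats = Dict[str, Tuple[float, float]]


def fit_normalization(rows: List[Row]) -> Stats:
    """Compute (mean, std) for every feature that appears in the logged rows."""
    stats: Stats = {}
    feature_names = {name for row in rows for name in row}
    for name in sorted(feature_names):
        values = [row[name] for row in rows if name in row]
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        # A constant feature has zero variance; fall back to 1.0 to avoid dividing by zero.
        std = math.sqrt(variance) or 1.0
        stats[name] = (mean, std)
    return stats


def normalize(row: Row, stats: Stats) -> Row:
    """Standardize the features of one row; features without statistics are dropped."""
    return {
        name: (value - stats[name][0]) / stats[name][1]
        for name, value in row.items()
        if name in stats
    }
```

For example, fitting on two rows with `bandwidth` values 1.0 and 3.0 gives a mean of 2.0 and a standard deviation of 1.0, so the first row's `bandwidth` normalizes to -1.0. In a distributed setting the same statistics would be computed in parallel over the full dataset, which is the role the source attributes to Apache Spark.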

Key Statistics & Figures

Performance improvement in notifications
Improvement in relevance without increasing the total number of notifications sent out
This was achieved after replacing the previous supervised learning-based system with Horizon's RL-enabled version.
Video quality optimization
Real-time optimization of bit rate parameters for 360-degree video
Horizon adjusts video quality based on available bandwidth and buffered video, enhancing user experience.

Key Actionable Insights

1. Utilize Horizon to implement RL in your production systems to enhance decision-making processes.
By leveraging Horizon, engineers can create systems that adapt in real-time to user feedback, significantly improving user engagement and satisfaction.
2. Incorporate counterfactual policy evaluation to assess the performance of RL models before deployment.
This technique allows for safer deployment of models by providing insights into potential performance, reducing the risk of negative impacts on users.
3. Preprocess your data using Apache Spark to ensure that your RL models are trained on clean and normalized datasets.
Effective data preprocessing is essential for the success of RL applications, as models are sensitive to noisy and unnormalized data.
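
To make the counterfactual policy evaluation insight more concrete, here is a minimal sketch of one common estimator, inverse propensity scoring (IPS), applied to single-step logged decisions. The source does not say which estimators Horizon uses, so treat this as an illustration of the general technique rather than Horizon's method. `LoggedStep`, `ips_estimate`, and the requirement that the logs record the logging policy's action probability are all assumptions of this sketch.

```python
from typing import Callable, List, NamedTuple


class LoggedStep(NamedTuple):
    state: tuple
    action: str
    reward: float
    logging_prob: float  # probability the logging policy assigned to `action`


def ips_estimate(
    logs: List[LoggedStep],
    target_prob: Callable[[tuple, str], float],
) -> float:
    """Estimate the average reward a new policy would earn, using only logged data.

    Each logged reward is reweighted by how much more (or less) likely the
    new policy is to take the logged action than the logging policy was.
    """
    if not logs:
        raise ValueError("need at least one logged step")
    total = 0.0
    for step in logs:
        if step.logging_prob <= 0.0:
            raise ValueError("logging probability must be positive")
        weight = target_prob(step.state, step.action) / step.logging_prob
        total += weight * step.reward
    return total / len(logs)
```

With a logging policy that picks "send" or "skip" with probability 0.5 each, and a candidate policy that always picks "send", every logged "send" reward is weighted by 2 and every "skip" reward by 0. IPS estimates can have high variance when the two policies disagree often, which is one reason to compare several estimators before deployment.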

Common Pitfalls

1. Neglecting the importance of data preprocessing can lead to poor model performance.
Many engineers underestimate how sensitive RL models are to data quality. Ensuring that data is clean and normalized is crucial for effective training and deployment.
2. Relying solely on online training for RL models can be risky in production environments.
Given the scale and impact of systems like those at Facebook, starting with a designed policy and using offline training methods is essential to mitigate risks associated with randomness in decision-making.
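
The second pitfall recommends learning offline from data gathered by an existing, hand-designed policy. A minimal tabular sketch of that idea follows: Q-learning run repeatedly over a fixed batch of logged transitions, with no exploration in the live system. This is an assumption-laden illustration, not Horizon's training code; `offline_q_learning`, `greedy_action`, the state and action labels, and the hyperparameter defaults are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# (state, action, reward, next_state); next_state is None for a terminal step.
Transition = Tuple[str, str, float, Optional[str]]


def offline_q_learning(
    transitions: List[Transition],
    actions: Sequence[str],
    gamma: float = 0.9,
    lr: float = 0.5,
    epochs: int = 200,
) -> Dict[Tuple[str, str], float]:
    """Fit Q-values by sweeping a fixed batch of logged transitions."""
    q: Dict[Tuple[str, str], float] = defaultdict(float)
    for _ in range(epochs):
        for state, action, reward, next_state in transitions:
            if next_state is None:
                target = reward
            else:
                # Bootstrap from the best known action in the next state.
                target = reward + gamma * max(q[(next_state, a)] for a in actions)
            q[(state, action)] += lr * (target - q[(state, action)])
    return dict(q)


def greedy_action(
    q: Dict[Tuple[str, str], float], state: str, actions: Sequence[str]
) -> str:
    """Pick the action with the highest learned Q-value; unseen pairs count as 0."""
    return max(actions, key=lambda a: q.get((state, a), 0.0))
```

Because the batch is fixed, the learned policy can only be as good as the coverage of the logged data allows: actions the designed policy never tried keep their default value of 0, so a counterfactual evaluation step before deployment remains important.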

Related Concepts

Reinforcement Learning
Machine Learning Applications
Data Preprocessing Techniques
Counterfactual Policy Evaluation