Open Sourcing Photon ML

LinkedIn Engineering Team
7 min read, intermediate

Overview

The article discusses the open sourcing of Photon ML, a machine learning library developed by LinkedIn that integrates with Apache Spark. It highlights the library's capabilities in supporting large-scale regression and its potential impact on machine learning practices across various fields.

What You'll Learn

1. How to utilize Photon ML for large-scale regression tasks
2. Why integrating Photon ML with Apache Spark enhances model training speed
3. When to apply generalized additive mixed effect models in machine learning

Prerequisites & Requirements

  • Understanding of machine learning concepts and regression techniques
  • Familiarity with the Apache Spark and Hadoop ecosystems (optional)

Key Questions Answered

What is Photon ML and how is it used at LinkedIn?
Photon ML is a machine learning library developed by LinkedIn that integrates with Apache Spark to run large-scale regression tasks. It provides tools for model training and model diagnostics and supports several regression types, which makes machine learning workflows more efficient.
What are the performance improvements achieved by using Photon ML?
Switching from Hadoop MapReduce to Spark on YARN made model training at LinkedIn 10 to 30 times faster. The speedup allows faster iteration during model development.
How does Photon ML support generalized additive mixed effect models?
Photon ML includes an experimental implementation of generalized additive mixed effect models (GAME), which allows for the modeling of fixed and random effects. This approach helps to better capture the complexity of user interactions in recommendation systems.
What impact does Photon ML have on job recommendations at LinkedIn?
In initial A/B tests, GLMix models trained using Photon ML increased job applications from job recommendations by 15 to 30 percent and clickthrough rates on email article recommendations by 10 to 20 percent.
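
The fixed-effect and random-effect idea behind these models can be illustrated with a minimal sketch. This is not Photon ML's API or LinkedIn's implementation: the function names, feature names, and coefficient values below are all hypothetical, and for simplicity the same feature vector feeds every component. A global (fixed-effect) coefficient set is shared by everyone, and per-user and per-item (random-effect) coefficient sets add corrections for entities the model has seen before.

```python
import math


def sigmoid(z):
    """Logistic link: map a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def dot(weights, features):
    """Sparse dot product over feature-name -> value dictionaries."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def score(features, user_id, item_id, fixed, per_user, per_item):
    """Combine a fixed effect with per-user and per-item random effects."""
    logit = dot(fixed, features)                        # shared by all users and items
    logit += dot(per_user.get(user_id, {}), features)   # user-specific correction
    logit += dot(per_item.get(item_id, {}), features)   # item-specific correction
    return sigmoid(logit)


# Hypothetical coefficients, chosen only for illustration.
fixed = {"bias": -1.0, "title_match": 2.0}
per_user = {"u1": {"title_match": 0.5}}
per_item = {"j7": {"bias": 0.3}}

features = {"bias": 1.0, "title_match": 1.0}

# A known user and item get personalized corrections on top of the global model.
known = score(features, "u1", "j7", fixed, per_user, per_item)

# An unseen user and item fall back to the fixed effect alone.
cold = score(features, "u_new", "j_new", fixed, per_user, per_item)

print(f"known pair: {known:.3f}, unseen pair: {cold:.3f}")
```

The fallback for unseen IDs is the main design point of the sketch: entities with no history are scored by the global model only, while entities with enough data get their own adjustments.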

Key Statistics & Figures

  • Speed increase in model training: 10-30x. Achieved by switching workflows from Hadoop MapReduce to Spark on YARN.
  • Improvement in job recommendations: 15-30%. Measured in job applications, using GLMix models trained with Photon ML.
  • Improvement in email article recommendations: 10-20%. Measured in clickthrough rates on recommendations from GLMix models trained with Photon ML.

Technologies & Tools

  • Apache Spark (backend): Photon ML is built to integrate with Apache Spark for processing large datasets efficiently.
  • Hadoop (backend): Photon ML runs on a cluster that supports both Spark and Hadoop MapReduce applications.

Key Actionable Insights

1. Leverage Photon ML for building scalable machine learning models that can handle large datasets efficiently.
   Using Photon ML allows engineers to create high-quality models that can be deployed quickly, making it suitable for applications requiring real-time data processing and analysis.
2. Incorporate generalized additive mixed effect models in your machine learning workflows to improve recommendation systems.
   These models can provide a more nuanced understanding of user behavior by capturing random effects, which can lead to better personalization in applications.
3. Utilize the model diagnostics features of Photon ML to refine and optimize your machine learning models.
   The ability to generate charts and tables for model diagnostics can help identify issues in model fit and guide improvements, enhancing overall model performance.
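
As an illustration of the kind of table a fit diagnostic can produce, the sketch below builds a simple calibration table: predictions are grouped into probability bins, and the mean predicted probability in each bin is compared with the observed positive rate. This is a generic technique, not Photon ML's diagnostics output; the function name `calibration_table` and the labels and predictions are made up for the example.

```python
def calibration_table(y_true, y_prob, n_bins=5):
    """Bin predictions by probability and compare predicted vs. observed rates.

    Returns one row per non-empty, equal-width probability bin.
    """
    bins = [[] for _ in range(n_bins)]
    for label, prob in zip(y_true, y_prob):
        # A probability of exactly 1.0 is placed in the last bin.
        index = min(int(prob * n_bins), n_bins - 1)
        bins[index].append((label, prob))

    rows = []
    for index, members in enumerate(bins):
        if not members:
            continue  # skip bins that received no predictions
        count = len(members)
        rows.append({
            "bin": (index / n_bins, (index + 1) / n_bins),
            "count": count,
            "mean_predicted": sum(prob for _, prob in members) / count,
            "observed_rate": sum(label for label, _ in members) / count,
        })
    return rows


# Made-up labels and predicted probabilities, for illustration only.
y_true = [0, 0, 1, 0, 1, 1, 1, 0]
y_prob = [0.05, 0.15, 0.35, 0.45, 0.55, 0.75, 0.95, 0.85]

for row in calibration_table(y_true, y_prob):
    gap = row["mean_predicted"] - row["observed_rate"]
    print(
        f'bin={row["bin"]}  n={row["count"]}  '
        f'predicted={row["mean_predicted"]:.2f}  '
        f'observed={row["observed_rate"]:.2f}  gap={gap:+.2f}'
    )
```

A large gap between the predicted and observed columns in a bin suggests the model is over- or under-confident in that probability range, which is the sort of fit issue diagnostics are meant to surface.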

Common Pitfalls

1. Overlooking the importance of model diagnostics can lead to poorly performing models.
   Without proper diagnostics, engineers may miss critical insights about model fit and performance, resulting in suboptimal recommendations.

Related Concepts

Machine Learning
Data Science
Open Source Software
Recommendation Systems