Scaling Machine Learning Productivity at LinkedIn

Joel Young
11 min read · Advanced

Overview

The article describes LinkedIn's initiative to scale machine learning productivity through the Pro-ML program, which aims to make machine learning engineers more effective and to democratize access to AI tools across the organization. It outlines the problems caused by disparate AI systems and the structured approach LinkedIn took to streamline its machine learning processes.

What You'll Learn

1. How to leverage existing machine learning components to improve productivity
2. Why a domain-specific language (DSL) can improve model training and evaluation
3. How to implement a feature marketplace for better data management
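The article does not describe the marketplace's interfaces. As a minimal sketch, assuming an in-memory catalog built from hypothetical `FeatureEntry` and `FeatureMarketplace` types (not LinkedIn's actual system), the two core ideas of registering a feature once and letting other teams discover it by keyword could look like this:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FeatureEntry:
    """Catalog record for one feature (hypothetical schema)."""
    name: str
    owner: str            # team responsible for computing and maintaining the feature
    description: str
    tags: List[str] = field(default_factory=list)


class FeatureMarketplace:
    """Hypothetical in-memory catalog; a real one would sit on a shared, persistent store."""

    def __init__(self) -> None:
        self._entries: Dict[str, FeatureEntry] = {}

    def register(self, entry: FeatureEntry) -> None:
        # Reject duplicate names so teams reuse an existing feature rather than rebuild it.
        existing = self._entries.get(entry.name)
        if existing is not None:
            raise ValueError(f"feature {entry.name!r} is already owned by {existing.owner}")
        self._entries[entry.name] = entry

    def search(self, keyword: str) -> List[FeatureEntry]:
        # Case-insensitive match on name or description, or an exact tag match.
        kw = keyword.lower()
        return [
            e for e in self._entries.values()
            if kw in e.name.lower()
            or kw in e.description.lower()
            or kw in (t.lower() for t in e.tags)
        ]
```

Refusing duplicate names is the simplest way to push teams toward reuse; a fuller catalog would also record lineage and health statistics so features can be monitored as well as discovered.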

Prerequisites & Requirements

  • Understanding of machine learning concepts and algorithms
  • Familiarity with tools such as Hadoop, Spark, and Azkaban (optional)

Key Questions Answered

What is the Pro-ML initiative at LinkedIn?
Pro-ML, or Productive Machine Learning, is LinkedIn's program aimed at doubling the effectiveness of machine learning engineers while making AI tools accessible to engineers across the organization. It focuses on streamlining machine learning processes and improving collaboration among teams.
How does LinkedIn ensure the health of machine learning models in production?
LinkedIn employs a health assurance layer that includes automated services to validate that online and offline features are statistically similar. It also monitors model behavior to confirm that it matches expected outcomes, so engineers can diagnose issues when anomalies are detected.
What are the key stages in the Pro-ML life cycle?
The Pro-ML life cycle includes stages such as exploring and authoring, training, deploying, running, health assurance, and managing a feature marketplace. Each stage is designed to enhance the efficiency and effectiveness of machine learning processes at LinkedIn.
What organizational structure supports the Pro-ML initiative?
The Pro-ML initiative is organized around five main pillars that align AI teams with product teams while maintaining a reporting relationship to the parent AI organization. This structure promotes collaboration and best practice sharing among engineers working on similar challenges.
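The health assurance answer above also mentions monitoring whether model behavior aligns with expected outcomes, without saying what "expected" means. As a minimal sketch, assuming a calibration-style check that compares the mean predicted probability with the observed positive rate (for example, click-through) over one time window, and using hypothetical function names and a hypothetical 20% tolerance, such a monitor could look like this:

```python
from statistics import fmean
from typing import Sequence


def calibration_ratio(predicted: Sequence[float], observed: Sequence[int]) -> float:
    """Mean predicted probability divided by the observed positive rate for one window."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need equal-length, non-empty windows")
    positive_rate = fmean(observed)
    if positive_rate == 0:
        raise ValueError("no positive outcomes in this window")
    return fmean(predicted) / positive_rate


def behaves_as_expected(predicted: Sequence[float], observed: Sequence[int],
                        tolerance: float = 0.2) -> bool:
    # HYPOTHETICAL rule: flag the window when predictions drift more than
    # `tolerance` (relative) away from what actually happened.
    return abs(calibration_ratio(predicted, observed) - 1.0) <= tolerance
```

A ratio near 1.0 means the model's average prediction tracks reality; a window that fails the check is the kind of anomaly an engineer would then drill into.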

Technologies & Tools


Hadoop (backend): used for offline training of machine learning models.
Spark (backend): used to run the training jobs themselves.
Azkaban (backend): used to manage and schedule training jobs.

Key Actionable Insights

1. Implement a domain-specific language (DSL) for your machine learning projects to streamline feature transformation and model evaluation.
Using a DSL can help standardize the process of capturing input features and their transformations, making it easier for engineers to collaborate and iterate on models.
2. Create a centralized feature marketplace to manage and monitor the vast array of features used in machine learning models.
A feature marketplace lets teams discover and reuse existing features efficiently, reducing redundancy and improving model performance.
3. Adopt an agile-inspired strategy in your machine learning initiatives so that each step delivers tangible value.
This approach helps prioritize improvements that benefit product lines and encourages continuous feedback and adaptation in the development process.
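The first insight recommends a DSL for capturing input features and their transformations, but the article does not show Pro-ML's actual syntax. The sketch below invents a one-line-per-feature syntax, `name = transform(source)`; the grammar and the names `TRANSFORMS`, `FeatureSpec`, `parse`, and `apply_specs` are all hypothetical. What it illustrates is that once features are declared as data, the same declarations can be interpreted identically for offline training and online scoring.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Mapping

# Hypothetical library of named transformations the DSL may reference.
TRANSFORMS: Dict[str, Callable[[float], float]] = {
    "identity": lambda x: x,
    "log1p": math.log1p,
    "clip01": lambda x: min(max(x, 0.0), 1.0),
}


@dataclass(frozen=True)
class FeatureSpec:
    name: str       # output feature name
    source: str     # raw input field the value is read from
    transform: str  # key into TRANSFORMS


def parse(spec_text: str) -> List[FeatureSpec]:
    """Parse one declaration per line in the hypothetical form `name = transform(source)`."""
    specs = []
    for line in spec_text.strip().splitlines():
        if not line.strip():
            continue  # allow blank lines between declarations
        name, expr = (part.strip() for part in line.split("=", 1))
        transform, source = expr.rstrip(")").split("(", 1)
        transform = transform.strip()
        if transform not in TRANSFORMS:
            raise ValueError(f"unknown transform: {transform}")
        specs.append(FeatureSpec(name, source.strip(), transform))
    return specs


def apply_specs(specs: List[FeatureSpec], record: Mapping[str, float]) -> Dict[str, float]:
    """Evaluate every declared feature against one raw input record."""
    return {s.name: TRANSFORMS[s.transform](float(record[s.source])) for s in specs}
```

For example, `apply_specs(parse("log_views = log1p(profile_views)\nctr = clip01(click_rate)"), {"profile_views": 0, "click_rate": 1.7})` yields `{"log_views": 0.0, "ctr": 1.0}`. Rejecting unknown transform names at parse time catches mistakes before any training job is scheduled.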

Common Pitfalls

1. Failing to validate that online and offline features are statistically similar can lead to discrepancies in model performance.
This often occurs when teams do not have a robust health assurance process in place, making it difficult to diagnose issues when models perform unexpectedly.
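The article does not say which statistical test LinkedIn's health assurance services apply. As a minimal sketch, assuming the two-sample Kolmogorov-Smirnov statistic as the similarity measure and a hypothetical cutoff of 0.1 (the function names are also illustrative), a per-feature check for this pitfall could look like this:

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(offline: Sequence[float], online: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical CDFs."""
    a, b = sorted(offline), sorted(online)
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # Empirical CDFs only change at observed values, so checking those points suffices.
    for x in a + b:
        cdf_a = bisect_right(a, x) / n
        cdf_b = bisect_right(b, x) / m
        d = max(d, abs(cdf_a - cdf_b))
    return d


def features_look_similar(offline: Sequence[float], online: Sequence[float],
                          threshold: float = 0.1) -> bool:
    # HYPOTHETICAL cutoff; a production check would tune it per feature
    # or use a p-value from a proper significance test instead.
    return ks_statistic(offline, online) <= threshold
```

The statistic ranges from 0.0 (identical empirical distributions) to 1.0 (no overlap at all), so a feature computed one way in the offline pipeline and another way at serving time tends to show up as a large value.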

Related Concepts

Machine Learning
Artificial Intelligence
Feature Engineering
Model Deployment Strategies