Challenges and practical lessons from building a deep-learning-based ads CTR prediction model

LinkedIn Engineering Team
11 min read, intermediate level

Overview

This article covers the challenges and practical lessons from building a deep-learning-based click-through rate (CTR) prediction model for LinkedIn ads. It describes the move from a GLMix model to a deep learning architecture, which raised CTR by 8.5%, and explains the model's three-tower architecture and the challenges each tower is meant to address.

What You'll Learn

1. How to implement a three-tower architecture for CTR prediction models
2. Why complete feature interaction is crucial for deep learning models
3. How to perform frequent model retraining to maintain freshness of ad data
4. When to use shallow towers to alleviate over-prediction issues in deep models

Prerequisites & Requirements

  • Understanding of machine learning concepts and deep learning architectures
  • Familiarity with LinkedIn ML frameworks such as GDMix and Lambda Learner (optional)

Key Questions Answered

What are the main challenges in building a deep-learning-based CTR prediction model?
The main challenges include achieving complete feature interaction across member, ad, and context features, ensuring fast memorization of historical performance through frequent retraining, and addressing calibration issues to prevent over-prediction. Each of these challenges is addressed through the model's three-tower architecture.
How does the shallow tower help in calibrating the CTR model?
The shallow tower is a linear layer that sits alongside the deep tower. Pairing this simpler model with the deep tower balances the predictions and cut over-prediction from 40% to about 10%, which makes the predicted CTRs more reliable.
What is the significance of frequent retraining in the wide tower?
Frequent retraining of the wide tower ensures that the model remains up-to-date with the latest data, which is crucial for maintaining the accuracy of ad performance predictions. This allows the model to adapt quickly to changes in ad trends and advertiser behavior.
Why is complete feature interaction important for CTR prediction?
Complete feature interaction allows the model to better understand the relationships between different features, such as member profiles, ad content, and context. This leads to more accurate predictions and ultimately enhances the relevance of ads shown to users.
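
One way to picture how the three towers fit together is the minimal sketch below. It assumes the tower outputs are added as logits before a sigmoid, which is a common wide-and-deep pattern and not something this summary confirms for LinkedIn's model. All function names, weights, and feature values are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dot(weights, values):
    return sum(w * v for w, v in zip(weights, values))


def deep_tower_logit(member, ad, context, hidden_weights, output_weights):
    # Concatenating first lets every hidden unit mix member, ad, and
    # context features, which is the point of complete feature interaction.
    x = member + ad + context
    hidden = [max(0.0, dot(row, x)) for row in hidden_weights]  # ReLU layer
    return dot(output_weights, hidden)


def wide_tower_logit(ad_id, ad_coefficients):
    # Per-ad coefficient that memorizes historical performance.
    # Ads without a learned coefficient contribute 0.
    return ad_coefficients.get(ad_id, 0.0)


def shallow_tower_logit(member, ad, context, weights, bias):
    # Plain linear layer over the same features, with no hidden units.
    return dot(weights, member + ad + context) + bias


def predict_ctr(member, ad, context, ad_id, params):
    # ASSUMPTION: the three tower outputs are summed as logits.
    logit = (
        deep_tower_logit(member, ad, context,
                         params["hidden_weights"], params["output_weights"])
        + wide_tower_logit(ad_id, params["ad_coefficients"])
        + shallow_tower_logit(member, ad, context,
                              params["shallow_weights"], params["shallow_bias"])
    )
    return sigmoid(logit)


# Hypothetical toy parameters: 2 member features, 1 ad feature, 1 context feature.
params = {
    "hidden_weights": [[0.2, 0.0, 0.4, -0.1], [-0.5, 0.3, 0.0, 0.1]],
    "output_weights": [1.0, 2.0],
    "ad_coefficients": {"ad_42": -0.8},
    "shallow_weights": [0.1, 0.0, 0.0, -0.2],
    "shallow_bias": -3.0,
}
ctr = predict_ctr([1.0, 0.0], [0.5], [1.0], "ad_42", params)
print(f"predicted CTR: {ctr:.4f}")
```

Keeping the wide tower as a simple lookup of per-ad coefficients is what would make it cheap to refresh on its own, separately from the deep tower.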

Key Statistics & Figures

CTR improvement
8.5%
This improvement was achieved after transitioning from the GLMix model to the deep-learning-based system.

Technologies & Tools

Machine Learning Framework
GDMix
Used as part of the infrastructure to support deep learning model serving.
Machine Learning Framework
Lambda Learner
Facilitates nearline learning on data streams for the CTR prediction model.

Key Actionable Insights

1. Implement a three-tower architecture to enhance CTR prediction accuracy.
   This architecture combines deep learning's feature interaction capabilities with the memorization strengths of linear models, which improves ad relevance.
2. Regularly retrain the wide tower to keep the model's predictions fresh.
   Frequent updates on the latest data let the model adapt to changing ad performance trends, which is crucial for maintaining advertiser satisfaction.
3. Use a shallow tower to mitigate over-prediction issues in deep learning models.
   Adding a linear model alongside a deep model helps balance predictions and reduces the risk of overcharging advertisers because of inflated CTR predictions.
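
The retraining idea in the second insight can be sketched as a small online update. This is a generic stochastic gradient step for logistic regression, not Lambda Learner's actual API. It assumes the other towers stay frozen between full retrains and enter only as a fixed logit offset. The names `update_wide_coefficient` and `replay_events`, the learning rate, and the event tuples are all hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def update_wide_coefficient(coef, fixed_logit, clicked, learning_rate=0.1, l2=0.0):
    """One SGD step on log loss for a single per-ad coefficient.

    fixed_logit is the combined output of the frozen towers (an assumption),
    so only the per-ad coefficient moves.
    """
    p = sigmoid(fixed_logit + coef)
    gradient = (p - clicked) + l2 * coef
    return coef - learning_rate * gradient


def replay_events(ad_coefficients, events, learning_rate=0.1):
    """Apply a stream of (ad_id, fixed_logit, clicked) events in order."""
    updated = dict(ad_coefficients)  # leave the caller's dict untouched
    for ad_id, fixed_logit, clicked in events:
        coef = updated.get(ad_id, 0.0)  # new ads start from 0
        updated[ad_id] = update_wide_coefficient(
            coef, fixed_logit, clicked, learning_rate
        )
    return updated


# Hypothetical stream: ad_1 gets two clicks, ad_2 gets an impression with no click.
events = [("ad_1", -3.0, 1), ("ad_1", -3.0, 1), ("ad_2", -3.0, 0)]
coefficients = replay_events({}, events)
print(coefficients)
```

In this sketch, clicks push an ad's coefficient up and unclicked impressions push it down, so recent performance is reflected without touching the deep tower.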

Common Pitfalls

1. Over-prediction in deep learning models can lead to significant discrepancies in ad pricing.
   This occurs when the model's predictions are too optimistic, so advertisers are charged more than they should be. To avoid this, apply calibration techniques and consider pairing the deep model with a simpler one.
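
To make the pitfall concrete, here is a minimal sketch of one way to quantify over-prediction. The summary does not say how the 40% and 10% figures were measured, so the definition used here (expected clicks divided by observed clicks, minus one) is an assumption. The function name and the data are hypothetical, constructed only to mirror those two figures.

```python
def over_prediction(predicted_ctrs, clicks):
    """Relative gap between expected and observed clicks.

    0.0 means perfectly calibrated on average; 0.4 means the model
    expects 40% more clicks than actually happened.
    """
    observed = sum(clicks)
    if observed == 0:
        raise ValueError("need at least one click to measure calibration")
    return sum(predicted_ctrs) / observed - 1.0


# Illustrative data: 1,000 impressions with 20 clicks (2% observed CTR).
clicks = [1] * 20 + [0] * 980
deep_only = [0.028] * 1000      # hypothetical scores averaging 2.8%
with_shallow = [0.022] * 1000   # hypothetical scores averaging 2.2%

print(f"deep only:    {over_prediction(deep_only, clicks):.0%}")
print(f"with shallow: {over_prediction(with_shallow, clicks):.0%}")
```

A check like this, run on held-out traffic, is one simple way to notice that predicted CTRs have drifted above what advertisers are actually getting.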

Related Concepts

Deep Learning
Click-through Rate (CTR) Prediction
Feature Engineering
Model Calibration