Building world-class product search at Shopify: Where C++ excellence meets ML innovation

Learn how we solved a major search engineering dilemma—running machine learning models at native C++ speed.

Mikhail Shakhray
6 min read, intermediate

Overview

The article describes how Shopify built a high-performance product search engine that runs Machine Learning (ML) ranking models at native C++ speed. It outlines the challenges of modern commerce search and introduces RankFlow, a Domain-Specific Language (DSL) that lets data scientists deploy ML models quickly without compromising system performance.

What You'll Learn

1. How to deploy ML models trained on billions of queries in minutes using RankFlow
2. Why C++ is essential for achieving low-latency performance in high-volume search applications
3. How to balance rapid ML iteration with high-performance infrastructure in commerce search

Prerequisites & Requirements

  • Understanding of Machine Learning concepts and search algorithms
  • Experience with C++ programming and performance optimization

Key Questions Answered

What are the key components of Shopify's search ranking system?
Shopify's search ranking system combines relevance, purchase popularity, brand trust, and merchant intent to optimize search results. This keeps results aligned with what shoppers are actually looking for, and it favors products that lead to purchases over products that merely attract clicks.
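
As a rough illustration of how several signals can be folded into one ranking score, here is a minimal Python sketch. It assumes each signal is already normalized to [0, 1] and combines them with a hand-picked weighted sum; the field names and weights are hypothetical. The article's listed tools (LightGBM, CatBoost) are gradient-boosting libraries, so a production ranker would learn this trade-off rather than fix it by hand.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    product_id: str
    relevance: float            # query/product match, assumed normalized to [0, 1]
    purchase_popularity: float  # assumed normalized to [0, 1]
    brand_trust: float          # assumed normalized to [0, 1]
    merchant_intent: float      # assumed normalized to [0, 1]


# Hypothetical hand-picked weights; a trained model would learn this trade-off.
WEIGHTS = {
    "relevance": 0.5,
    "purchase_popularity": 0.3,
    "brand_trust": 0.1,
    "merchant_intent": 0.1,
}


def score(c: Candidate) -> float:
    """Weighted sum of the four signals."""
    return sum(weight * getattr(c, name) for name, weight in WEIGHTS.items())


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Sort candidates by descending score."""
    return sorted(candidates, key=score, reverse=True)


results = rank([
    Candidate("a", relevance=0.9, purchase_popularity=0.1, brand_trust=0.5, merchant_intent=0.5),
    Candidate("b", relevance=0.8, purchase_popularity=0.9, brand_trust=0.5, merchant_intent=0.5),
])
print([c.product_id for c in results])  # ['b', 'a']
```

Product "b" ranks first despite slightly lower relevance because its purchase popularity is much higher, which is the kind of trade-off the answer above describes.
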
How does RankFlow improve the deployment of ML models?
RankFlow is a Domain-Specific Language (DSL) that lets data scientists write Python-like code and deploy ranking changes within minutes, without needing C++ expertise. Removing that barrier makes experimentation cheap and enables rapid iteration on search ranking.
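
RankFlow's actual syntax is not shown in the article, so the following is only a hypothetical Python sketch of the general pattern a Python-like ranking DSL can follow: parse the expression with Python's own `ast` module, reject anything outside a small whitelist, and compile the rest into a callable once, before serving. In a system like the one described, the compilation target would be native C++ rather than Python closures; the function names, the whitelist, and the feature names are all assumptions.

```python
import ast
import math
import operator
from typing import Callable, Mapping

Features = Mapping[str, float]
Expr = Callable[[Features], float]

# Whitelisted operators and functions for the hypothetical mini-DSL.
_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_FUNCS = {"log1p": math.log1p, "min": min, "max": max}


def compile_expr(source: str) -> Expr:
    """Parse a Python-syntax expression and compile it into a features -> score callable."""
    tree = ast.parse(source, mode="eval")
    return _compile(tree.body)


def _compile(node: ast.AST) -> Expr:
    # Numeric literals (bool is excluded even though it subclasses int).
    if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        value = float(node.value)
        return lambda f: value
    # Bare names are looked up as features at scoring time.
    if isinstance(node, ast.Name):
        name = node.id
        return lambda f: f[name]
    # Whitelisted binary arithmetic.
    if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
        op = _BINOPS[type(node.op)]
        left, right = _compile(node.left), _compile(node.right)
        return lambda f: op(left(f), right(f))
    # Unary minus, e.g. "-price".
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        operand = _compile(node.operand)
        return lambda f: -operand(f)
    # Calls to whitelisted functions with positional arguments only.
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in _FUNCS and not node.keywords):
        fn = _FUNCS[node.func.id]
        args = [_compile(arg) for arg in node.args]
        return lambda f: fn(*(arg(f) for arg in args))
    # Anything else (attribute access, imports, comparisons, ...) is rejected up front.
    raise ValueError(f"unsupported construct: {ast.dump(node)}")


rank_fn = compile_expr("0.7 * relevance + 0.3 * log1p(purchases)")
print(rank_fn({"relevance": 1.0, "purchases": 0.0}))  # 0.7

try:
    compile_expr("__import__('os').system('ls')")
    rejected = False
except ValueError:
    rejected = True
```

Compiling once and rejecting unsupported constructs up front means a bad expression fails at deploy time rather than in the request path, which is one way a DSL can give data scientists a familiar syntax without handing them the whole language.
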
Why did Shopify choose to build its own search engine instead of using off-the-shelf solutions?
Shopify opted to build its own search engine to maintain control over relevance, latency, and costs at scale. Off-the-shelf solutions like Elasticsearch would require significant re-architecture to meet Shopify's unique demands for real-time inventory updates and complex product variants.
What is the machine learning workflow used by Shopify's ML team?
The workflow has five steps: train models on historical query data, validate them offline with precision and recall metrics, run online A/B tests, deploy the model to production, and serve it through RankFlow for inference at C++ speed. Each ranking change is therefore evaluated both offline and on live traffic before full rollout.
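
The offline validation step can be illustrated with precision and recall at a cutoff k, averaged over logged queries. This sketch assumes that a product counts as relevant for a query if it was purchased after that query; the cutoff, that labeling rule, the sample data, and the function names are illustrative assumptions, not Shopify's evaluation setup.

```python
from typing import Callable, Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k slots filled by relevant products (divides by k by convention)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for pid in ranked[:k] if pid in relevant)
    return hits / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant products that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for pid in ranked[:k] if pid in relevant)
    return hits / len(relevant)


Metric = Callable[[Sequence[str], Set[str], int], float]


def mean_over_queries(metric: Metric, logs: Sequence[Tuple[Sequence[str], Set[str]]], k: int) -> float:
    """Average a metric over (ranked results, relevant products) pairs, one per logged query."""
    return sum(metric(ranked, relevant, k) for ranked, relevant in logs) / len(logs)


# Hypothetical logged queries: the model's ranking and the products actually purchased.
logs = [
    (["p1", "p2", "p3", "p4"], {"p2", "p4", "p9"}),
    (["p9", "p5"], {"p9"}),
]
print(precision_at_k(*logs[0], k=2))                  # 0.5
print(mean_over_queries(precision_at_k, logs, k=1))   # 0.5
```
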

Key Statistics & Figures

Queries served during Black Friday Cyber Monday: billions
Shopify's search system is designed to handle massive query volumes during peak shopping events.

Speedup in ranking feature computation: 48%
This improvement was achieved while developing the TurboDSL engine and raised overall system performance.

Technologies & Tools


C++ (Backend)
Used for performance optimization and handling high query volumes in the search engine.

LightGBM (Machine Learning)
Utilized for training ranking models on historical query data.

CatBoost (Machine Learning)
Another framework used for training ranking models.

Key Actionable Insights

1. Implementing a Domain-Specific Language like RankFlow can significantly enhance the efficiency of deploying ML models in production.
By allowing data scientists to work with a familiar syntax while leveraging the performance of C++, teams can iterate quickly without sacrificing speed or reliability.

2. Prioritizing purchase popularity in search ranking can lead to higher conversion rates for e-commerce platforms.
This approach ensures that products with proven sales history are highlighted, which can enhance shopper trust and improve overall sales performance.

3. Building a custom search engine tailored to specific business needs can provide better control over performance and relevance.
This is particularly important in high-volume environments where off-the-shelf solutions may not meet the unique demands of real-time inventory and multi-tenant architecture.
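
One hypothetical way to turn purchase popularity into a ranking signal that favors products that convert over products that only attract clicks is a smoothed conversion rate: purchases per click, shrunk toward a prior rate so that items with very little traffic are not over-rated. The prior values and product counts below are assumptions chosen for illustration.

```python
def smoothed_conversion_rate(purchases: int, clicks: int,
                             prior_rate: float = 0.02,
                             prior_weight: float = 50.0) -> float:
    """Purchases per click, shrunk toward prior_rate as if prior_weight extra clicks
    had converted at that rate. Low-traffic products stay close to the prior."""
    if purchases < 0 or clicks < 0:
        raise ValueError("counts must be non-negative")
    return (purchases + prior_rate * prior_weight) / (clicks + prior_weight)


# Hypothetical (purchases, clicks) counts per product.
stats = {
    "clicked_a_lot": (5, 1000),  # many clicks, few purchases
    "converts_well": (20, 200),  # fewer clicks, many purchases
    "brand_new": (1, 2),         # too little data to trust a raw 50% rate
}
ranking = sorted(stats, key=lambda pid: smoothed_conversion_rate(*stats[pid]), reverse=True)
print(ranking)  # ['converts_well', 'brand_new', 'clicked_a_lot']
```

With these numbers, the product with 20 purchases from 200 clicks outranks the one with 5 purchases from 1,000 clicks, even though the latter receives five times as many clicks, and the brand-new product's raw 50% rate is pulled down to about 3.8%.
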

Common Pitfalls

1. Relying solely on interpreted languages such as Python to serve ML ranking can introduce unacceptable latency in high-volume search applications.
While such languages enable rapid development, they often add performance overhead that degrades the user experience in real-time applications.

2. Failing to make the performance impact of code changes visible can let regressions go unnoticed.
Without automated performance analysis, teams may overlook how their changes affect latency, leading to degraded search performance and dissatisfied users.
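
Automated performance analysis can start as simply as a gate that compares benchmark timings for a change against a baseline and flags anything slower than a tolerance. The sketch below is a hypothetical, minimal version of such a check; the 5% threshold, the benchmark names, and the timing values are made up for illustration, and a real pipeline would also need to control for measurement noise.

```python
import statistics
import time
from typing import Callable


def median_runtime(fn: Callable[[], object], repeats: int = 20) -> float:
    """Run fn several times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def find_regressions(baseline: dict[str, float],
                     candidate: dict[str, float],
                     max_slowdown: float = 0.05) -> list[str]:
    """Return names of benchmarks whose candidate time exceeds baseline by more than max_slowdown."""
    return sorted(
        name for name, base in baseline.items()
        if name in candidate and candidate[name] > base * (1.0 + max_slowdown)
    )


# Illustrative timings in milliseconds (made-up values).
baseline = {"feature_computation": 1.20, "scoring": 0.80}
candidate = {"feature_computation": 1.22, "scoring": 0.95}
print(find_regressions(baseline, candidate))  # ['scoring']

# The same helper can time a toy workload to produce such numbers.
toy_time = median_runtime(lambda: sum(range(10_000)))
```

Using the median of repeated runs rather than a single measurement makes the gate less sensitive to one-off noise, though it does not remove it.
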

Related Concepts

Machine Learning in Search Applications
Performance Optimization Techniques
Domain-Specific Languages for Data Science