Enhancing the Apparel Shopping Experience with AI, Emoji-Aware OCR, and Snapchat’s Screenshop

Ever spotted someone in a photo wearing a cool shirt or some unique apparel and wondered where they got it? How much did it cost? Maybe you’ve even thought…

Amr Elmeleegy
7 min read · intermediate

Overview

The article discusses how Snap's ML engineering team enhanced the apparel shopping experience using AI, specifically through the Screenshop service integrated into Snapchat. It highlights the transition to NVIDIA Triton Inference Server for improved model serving and the use of NVIDIA TensorRT for optimizing inference performance.

What You'll Learn

1
How to use NVIDIA Triton Inference Server to serve multiple AI models
2
How TensorRT optimizes AI model inference on NVIDIA GPUs
3
How to implement Model Ensembles in Triton for streamlined AI pipelines

Prerequisites & Requirements

  • Understanding of AI model serving and inference concepts
  • Familiarity with NVIDIA Triton Inference Server and TensorRT (optional)

Key Questions Answered

How does Screenshop use AI to enhance the shopping experience?
Screenshop employs AI to identify clothing items in images and recommend similar items available for purchase. It uses an object detection model to recognize clothing and a fashion embeddings model for similarity search, providing users with relevant shopping options.
What challenges did Snap face with a multiframework AI pipeline?
Snap's ML team faced difficulties in managing multiple AI models across different frameworks, leading to the need for a unified inference serving platform. They adopted NVIDIA Triton Inference Server to streamline model serving and reduce operational complexity.
What performance improvements were achieved using NVIDIA TensorRT?
By implementing NVIDIA TensorRT, Snap's team reduced model precision from FP32 to FP16, achieving a 3x increase in throughput and a 66% reduction in operational costs, significantly enhancing the efficiency of the Screenshop service.
How did Snap scale their services to handle a growing user base?
Snap scaled their OCR models using NVIDIA Triton Inference Server across over 1,000 NVIDIA T4 and L4 GPUs, ensuring efficient service delivery to their user base exceeding 800 million. This scalability was crucial for handling increased demand for AI-enabled services.
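
To make the FP32-to-FP16 change from the TensorRT answer above more concrete, here is a minimal, stdlib-only Python sketch of what halving precision means for storage size and rounding error. It does not use TensorRT, the weight values are made up for illustration, and the real effect on accuracy has to be measured on the actual model.

```python
import struct

def to_fp16(value: float) -> float:
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

def to_fp32(value: float) -> float:
    """Round a Python float to IEEE 754 single precision and back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

# Hypothetical weights, chosen only for illustration.
weights = [0.1, -1.2345678, 3.14159265, 0.000123]

fp32_bytes = len(weights) * struct.calcsize("<f")  # 4 bytes per value
fp16_bytes = len(weights) * struct.calcsize("<e")  # 2 bytes per value

# Relative error introduced by storing each FP32 weight in FP16 instead.
max_rel_error = max(
    abs(to_fp16(w) - to_fp32(w)) / abs(to_fp32(w)) for w in weights
)

print(f"FP32 size: {fp32_bytes} bytes, FP16 size: {fp16_bytes} bytes")
print(f"Largest relative rounding error in FP16: {max_rel_error:.2e}")
```

Half the bytes per value means less memory traffic per inference, which is one reason lower precision can raise throughput on GPUs with FP16 support.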

Key Statistics & Figures

Throughput increase
3x
Achieved by using NVIDIA TensorRT to optimize model inference.
Cost reduction
66%
Realized by using TensorRT to run the model in FP16 instead of FP32.
GPU scaling
Over 1,000 GPUs
NVIDIA T4 and L4 GPUs running Triton to serve OCR models for Snapchat's growing user base.
User base
Over 800 million
The size of the Snapchat user base served by these AI-enabled services.
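
The 3x throughput and 66% cost figures are consistent with each other under one simple assumption that this summary does not state: that cost scales linearly with the number of GPUs needed to serve a fixed request load. A short sketch of that arithmetic, with a hypothetical helper name:

```python
def cost_reduction_from_speedup(speedup: float) -> float:
    """Fraction of cost saved when each GPU handles `speedup` times more requests.

    Assumes cost is proportional to GPU count at a fixed request load.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup

saving = cost_reduction_from_speedup(3.0)
print(f"{saving:.1%}")  # prints 66.7%
```

Under that assumption, tripling throughput means one third of the GPUs can serve the same load, which is a saving of about two thirds.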

Technologies & Tools

Backend
NVIDIA Triton Inference Server
Used for serving multiple AI models in a unified platform.
Backend
NVIDIA TensorRT
Optimizes AI model inference on NVIDIA GPUs.
Backend
TensorFlow
Initially used for developing deep learning models for Screenshop.
Backend
PyTorch
Used for an alternative fashion embeddings model to enhance accuracy.

Key Actionable Insights

1
Leverage NVIDIA Triton Inference Server to streamline your AI model serving process.
Using Triton's framework-agnostic design can significantly reduce the complexity of managing multiple AI models, allowing for more efficient updates and lower operational costs.
2
Implement TensorRT to optimize your AI models for better performance and cost efficiency.
Reducing model precision, for example from FP32 to FP16, can substantially raise throughput and lower cost when model quality is preserved, which matters for high-demand applications.
3
Utilize Model Ensembles in Triton to create efficient AI pipelines without extensive coding.
An ensemble chains pre-processing, inference, and post-processing steps inside the server, which reduces latency and improves overall system performance.
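
As a rough illustration of the third insight, the sketch below renders a Triton ensemble `config.pbtxt` from Python. The `platform: "ensemble"` and `ensemble_scheduling` layout follow Triton's model configuration format; the model names (`preprocess`, `detector`, `postprocess`), tensor names, data types, and shapes are hypothetical placeholders, not Snap's actual pipeline.

```python
# Each step: (model name, {model input: ensemble tensor}, {model output: ensemble tensor}).
# All model and tensor names here are hypothetical.
STEPS = [
    ("preprocess", {"RAW_IMAGE": "IMAGE_BYTES"}, {"PIXELS": "pixels"}),
    ("detector", {"INPUT": "pixels"}, {"BOXES": "raw_boxes"}),
    ("postprocess", {"RAW_BOXES": "raw_boxes"}, {"DETECTIONS": "DETECTIONS"}),
]

def render_maps(kind, mapping):
    """Render input_map / output_map entries for one step."""
    return [f'      {kind} {{ key: "{k}" value: "{v}" }}' for k, v in mapping.items()]

def render_step(model, inputs, outputs):
    """Render one ensemble step that calls a composing model."""
    lines = ["    {", f'      model_name: "{model}"', "      model_version: -1"]
    lines += render_maps("input_map", inputs)
    lines += render_maps("output_map", outputs)
    lines.append("    }")
    return "\n".join(lines)

def render_ensemble(name, steps):
    """Render the full ensemble config; input and output specs are assumed values."""
    body = ",\n".join(render_step(*step) for step in steps)
    return "\n".join([
        f'name: "{name}"',
        'platform: "ensemble"',
        "max_batch_size: 8",
        'input [ { name: "IMAGE_BYTES" data_type: TYPE_UINT8 dims: [ -1 ] } ]',
        'output [ { name: "DETECTIONS" data_type: TYPE_FP32 dims: [ -1, 6 ] } ]',
        "ensemble_scheduling {",
        "  step [",
        body,
        "  ]",
        "}",
    ])

config_text = render_ensemble("screenshop_pipeline", STEPS)
print(config_text)
```

Each step's `output_map` value is matched by name to a later step's `input_map` value, so the intermediate tensors (`pixels`, `raw_boxes`) stay on the server and the client sends a single request.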

Common Pitfalls

1
Serving models from different frameworks on separate platforms creates operational complexity.
Maintaining separate serving platforms increases cost and engineering effort, which a unified solution such as NVIDIA Triton can avoid.
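
One way to picture the unified alternative is a single Triton model repository that holds models from several frameworks side by side. The sketch below builds such a layout in a temporary directory using empty placeholder files. The model names are hypothetical, and the platform strings and file names reflect Triton's documented defaults as I understand them, so they should be checked against the Triton version being deployed.

```python
import tempfile
from pathlib import Path

# (hypothetical model name, Triton platform string, default model file name)
MODELS = [
    ("detector_trt", "tensorrt_plan", "model.plan"),
    # In a real repository, model.savedmodel is a directory, not a file.
    ("embeddings_tf", "tensorflow_savedmodel", "model.savedmodel"),
    ("embeddings_torch", "pytorch_libtorch", "model.pt"),
]

def build_repository(root):
    """Create <root>/<model>/config.pbtxt and <root>/<model>/1/<file> placeholders."""
    for name, platform, filename in MODELS:
        version_dir = root / name / "1"  # "1" is the model version directory
        version_dir.mkdir(parents=True)
        (version_dir / filename).touch()  # empty placeholder, not a real model
        (root / name / "config.pbtxt").write_text(
            f'name: "{name}"\nplatform: "{platform}"\n'
        )
    # Return relative file paths so the layout is easy to inspect.
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )

with tempfile.TemporaryDirectory() as tmp:
    layout = build_repository(Path(tmp))

for path in layout:
    print(path)
```

With one repository, a single server process can load all three models, which is the operational simplification the pitfall above points to.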

Related Concepts

AI Model Serving
Machine Learning Optimization
Deep Learning Frameworks
Scalability in AI Applications