LiteRT: Maximum performance, simplified

Mogan Shieh, Terry (Woncheol) Heo, Jingjiang Li

LiteRT now boosts AI model performance and efficiency on mobile devices by making effective use of GPUs and NPUs. It requires significantly less code, simplifies hardware accelerator selection, and adds further features for optimal on-device performance.

Google



Overview

LiteRT is a new API designed to simplify on-device AI development and improve model performance on mobile devices through GPU and NPU acceleration. The article covers the improvements made to LiteRT, including smarter data organization, workgroup optimization, and advanced inference features that significantly increase performance while reducing power consumption.

What You'll Learn

1

How to accelerate AI models using LiteRT on mobile devices

2

Why using NPUs can improve AI model performance by up to 25x

3

How to implement asynchronous execution for better resource utilization

Key Questions Answered

What improvements have been made to LiteRT for GPU acceleration?

The latest version of LiteRT introduces MLDrift, which enhances GPU acceleration through smarter data organization, workgroup optimization, and improved data handling. The result is faster AI model performance than both previous versions and CPU execution.

How does LiteRT simplify NPU support for developers?

LiteRT provides a uniform way to develop and deploy models on NPUs, abstracting complexities associated with vendor-specific SDKs and allowing automatic downloading of necessary components when installing the LiteRT package.

What is the benefit of asynchronous execution in LiteRT?

Asynchronous execution allows different parts of an AI model to run concurrently across CPU, GPU, and NPUs, improving efficiency and responsiveness by leveraging idle compute cycles and minimizing latency in real-time applications.
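The overlap described above can be illustrated with a minimal Python sketch. It does not use the LiteRT API: `preprocess` and `run_inference` are hypothetical stand-ins for a CPU-side step and an accelerator-side step, and the two thread pools only simulate two independent compute units. The idea it shows is that the next input is prepared while the previous inference is still in flight.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def preprocess(frame_id: int) -> List[float]:
    # Hypothetical CPU-side step (e.g. resizing or normalizing a frame).
    time.sleep(0.01)
    return [float(frame_id)] * 4


def run_inference(tensor: List[float]) -> float:
    # Hypothetical accelerator-side step (stand-in for GPU/NPU work).
    time.sleep(0.01)
    return sum(tensor)


def run_pipelined(frame_ids: List[int]) -> List[float]:
    """Overlap preprocessing of frame i with inference on frame i-1."""
    results: List[float] = []
    with ThreadPoolExecutor(max_workers=1) as cpu_pool, \
            ThreadPoolExecutor(max_workers=1) as accel_pool:
        pending = None  # inference future for the previous frame
        for frame_id in frame_ids:
            # Start preparing the next input while `pending` is still running.
            next_input = cpu_pool.submit(preprocess, frame_id)
            if pending is not None:
                results.append(pending.result())
            pending = accel_pool.submit(run_inference, next_input.result())
        if pending is not None:
            results.append(pending.result())
    return results


if __name__ == "__main__":
    print(run_pipelined([0, 1, 2]))
```

Results come back in input order because each inference is collected before the next one is submitted. A real deployment would rely on the runtime's own asynchronous facilities rather than Python threads.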

Key Statistics & Figures

Performance improvement with NPU acceleration

up to 25x faster

Compared to CPU performance in internal testing

Power efficiency with NPU acceleration

up to 5x more efficient

Compared to CPU power consumption in internal testing

Technologies & Tools

API

LiteRT

Used for accelerating AI model inference on mobile devices

Hardware

NPU

Used for efficient AI model execution on mobile devices

Hardware

GPU

Used for accelerating AI model computations

Key Actionable Insights

1
Leverage the new LiteRT API to simplify the deployment of AI models on mobile devices.
By using LiteRT, developers can avoid the complexities of vendor-specific SDKs, making it easier to implement high-performance AI applications.

2
Utilize MLDrift for optimizing GPU performance in AI models.
Implementing the smarter data organization and workgroup optimization features can lead to significant performance improvements, particularly for larger models.

3
Adopt asynchronous execution techniques to enhance application responsiveness.
This approach allows for better resource utilization and can significantly reduce latency in applications requiring real-time AI interactions.
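As a rough illustration of the simplified accelerator selection mentioned in the first insight, the sketch below picks the first available backend from a preference order. It is an assumption about how an application might wrap such a choice, not LiteRT code: `pick_accelerator`, the `"npu"`/`"gpu"`/`"cpu"` labels, and the availability dictionary are all hypothetical.

```python
from typing import Dict, Sequence

# Hypothetical preference order: most specialized hardware first.
DEFAULT_PREFERENCE = ("npu", "gpu", "cpu")


def pick_accelerator(
    available: Dict[str, bool],
    preference: Sequence[str] = DEFAULT_PREFERENCE,
) -> str:
    """Return the first preferred accelerator reported as available."""
    for name in preference:
        if available.get(name, False):
            return name
    raise RuntimeError("no supported accelerator available")


if __name__ == "__main__":
    # Example device report (made up for illustration).
    device = {"npu": False, "gpu": True, "cpu": True}
    print(pick_accelerator(device))
```

Keeping the preference order in one place means the rest of the application does not need to branch on vendor or hardware type.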

Common Pitfalls

1

Failing to properly manage data transfer between CPU and GPU can lead to performance bottlenecks.

Developers should utilize the new TensorBuffer API to minimize unnecessary CPU overhead and optimize data handling for better performance.
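The general idea behind avoiding extra copies can be sketched in plain Python. This is not the TensorBuffer API: `ReusableBuffer` is a hypothetical class that allocates storage once and hands out `memoryview` slices, so each new input overwrites the same memory instead of creating a fresh copy.

```python
class ReusableBuffer:
    """Hypothetical fixed-size buffer that is allocated once and reused."""

    def __init__(self, size: int) -> None:
        self._storage = bytearray(size)  # single allocation, reused per frame

    def write(self, payload: bytes) -> memoryview:
        """Copy payload into the shared storage and return a zero-copy view."""
        n = len(payload)
        if n > len(self._storage):
            raise ValueError("payload larger than buffer")
        view = memoryview(self._storage)
        view[:n] = payload  # writes in place, no new allocation for storage
        return view[:n]     # a view onto the same memory, not a copy


if __name__ == "__main__":
    buf = ReusableBuffer(8)
    first = buf.write(b"abcd")
    print(bytes(first))
    buf.write(b"wxyz")
    # `first` still points at the shared storage, so it now shows the new data.
    print(bytes(first))
```

Because views share the underlying storage, a consumer must finish with one frame before the next is written. That is the usual trade-off of buffer reuse.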

Related Concepts

AI Model Acceleration

Mobile AI Applications

GPU and NPU Integration