DragonCrawl: Generative AI for High-Quality Mobile Testing

Juan Marcano, Mengdie Zhang, Ali Zamani, Anam Hira

Uber

Embedding, Generative AI, GPT, Large Language Models, Machine Learning, RoBERTa, T5, Transformer

Overview

The article describes DragonCrawl, a generative AI system Uber built to test mobile applications by interacting with them the way a human would. It covers the challenges of traditional mobile testing and how DragonCrawl uses large language models (LLMs) to address them, improving both testing efficiency and quality.

What You'll Learn

1. How to implement generative AI for mobile testing using large language models

2. Why traditional mobile testing methods are inefficient and costly

3. How to evaluate the performance of language models in testing scenarios

Prerequisites & Requirements

Understanding of mobile application testing concepts
Familiarity with large language models and their applications

Key Questions Answered

How does DragonCrawl improve mobile testing efficiency?

DragonCrawl uses large language models to simulate human-like interactions with mobile applications, allowing it to adapt to UI changes without requiring extensive manual updates. This significantly reduces the time developers spend on maintenance and enhances the scalability of testing across different languages and cities.

What challenges did Uber face in developing DragonCrawl?

Uber encountered several challenges, including setting up GPS locations for testing, managing adversarial cases where the model made unexpected choices, and addressing hallucinations in the model's outputs. These challenges required innovative solutions to ensure reliable testing outcomes.

What are the key benefits of using DragonCrawl?

DragonCrawl offers high stability in executing tests, requires no manual maintenance, and demonstrates high reusability across different cities. It has successfully completed tests in 85 out of 89 evaluated cities, showcasing its adaptability and efficiency in mobile testing.

Key Statistics & Figures

Stability of DragonCrawl

99%+ stability

This stability was observed during the execution of core-trip flows in November and December 2023.

Cities successfully tested

85 out of 89

DragonCrawl successfully requested and completed trips in 85 of the 89 evaluated cities, a success rate of about 95.5%.

Time spent on maintenance by traditional testing methods

hundreds of hours

Traditional testing methods required extensive manual updates, which DragonCrawl has eliminated.

Technologies & Tools

Backend

MPNet

The underlying language model for DragonCrawl, which the article credits with providing high-quality embeddings while keeping complexity low.
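
To make the role of embeddings concrete, here is a minimal sketch of how an embedding-based tester might pick the next UI action: embed the test goal and each on-screen action, then rank the actions by cosine similarity. This is not Uber's implementation. The `embed` function below is a toy bag-of-words stand-in so the example runs without any model download; in a real system it would be replaced by a learned model such as MPNet. The names `embed`, `cosine`, `rank_actions`, and the sample screen text are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a learned embedding: lowercase bag-of-words counts.

    Assumption: a real system would call a model such as MPNet here.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_actions(goal: str, actions: list[str]) -> list[tuple[str, float]]:
    """Return (action, score) pairs sorted from most to least similar to the goal."""
    goal_vec = embed(goal)
    scored = [(action, cosine(goal_vec, embed(action))) for action in actions]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical screen contents and test goal.
screen_actions = ["Request a ride", "Open account settings", "View past trips"]
ranked = rank_actions("request a ride to the airport", screen_actions)
best_action = ranked[0][0]
```

Because the choice depends on the text of the action rather than a hard-coded element identifier, a renamed or relocated button can still be matched, which is the property the article attributes to DragonCrawl's adaptability to UI changes.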

Key Actionable Insights

1. Implementing DragonCrawl can drastically reduce test maintenance costs and improve efficiency.
   By using AI to automate mobile testing, teams can focus on building new features instead of manually updating tests.

2. Using smaller language models can lead to more stable and reliable testing outcomes.
   The article highlights that smaller models like MPNet provide high-quality embeddings while minimizing complexity, making them suitable for real-time testing applications.

3. Adversarial training can help mitigate the risks associated with model hallucinations.
   By preparing for adversarial cases, teams can make their testing frameworks more robust, so that unexpected model behavior does not compromise test integrity.

Common Pitfalls

1. Relying solely on traditional testing methods can lead to high maintenance costs and inefficiencies.
   Many teams face challenges in keeping test scripts updated with frequent UI changes, which can consume significant developer time and resources.

2. Neglecting adversarial cases can result in unexpected model behaviors during testing.
   Without addressing adversarial cases, models may make choices that are not aligned with user expectations, leading to unreliable testing outcomes.
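
One simple way to guard against the second pitfall is to validate every model-proposed action against what is actually on screen before executing it. The sketch below is a runtime check, not adversarial training, and it is not taken from the article. The function `choose_valid_action`, the `propose` callback, and the retry limit are hypothetical names and values chosen for illustration.

```python
from typing import Callable, Iterable, Optional, Sequence


def choose_valid_action(
    propose: Callable[[Sequence[str]], str],
    on_screen: Sequence[str],
    max_attempts: int = 3,
) -> Optional[str]:
    """Ask the model for an action and accept it only if it exists on screen.

    Returns the first valid proposal, or None if every attempt was invalid,
    so the caller can fail the step instead of acting on a hallucination.
    """
    allowed = set(on_screen)
    for _ in range(max_attempts):
        candidate = propose(on_screen)
        if candidate in allowed:
            return candidate
    return None


def make_scripted_model(outputs: Iterable[str]) -> Callable[[Sequence[str]], str]:
    """Hypothetical fake model that replays a fixed list of proposals."""
    remaining = iter(outputs)

    def propose(_on_screen: Sequence[str]) -> str:
        return next(remaining)

    return propose


screen = ["Confirm pickup", "Cancel"]

# The first proposal is not on screen (a hallucination); the second is valid.
recovering_model = make_scripted_model(["Tap promo banner", "Confirm pickup"])
recovered = choose_valid_action(recovering_model, screen)

# Every proposal is invalid, so the guard gives up after max_attempts.
stuck_model = make_scripted_model(["Tap promo banner", "Open chat"])
gave_up = choose_valid_action(stuck_model, screen, max_attempts=2)
```

Returning `None` rather than raising keeps the decision about how to report the failure with the test harness, which can then log the rejected proposals for later review.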

Related Concepts

Generative AI in Software Testing

Large Language Models and Their Applications

Mobile Application Testing Best Practices