GraphQL Data Mocking at Scale with LLMs and @generateMock

Michael Rebello

How Airbnb combines GraphQL infra, product context, and LLMs to generate and maintain convincing, type-safe mock data using a new directive.

Airbnb

13 min read • intermediate

Gemini, GraphQL, JSON, Kotlin, Swift, TypeScript

Overview

Airbnb built a system that combines GraphQL infrastructure, product context, and LLMs to automatically generate and maintain realistic, type-safe mock data using a custom @generateMock directive. The solution eliminates manual mock creation, enables client engineers to develop without waiting for backend implementations, and keeps mocks synchronized with evolving GraphQL queries through hash-based version tracking.

What You'll Learn

1

How to use LLMs with GraphQL schema context to generate realistic, type-safe mock data automatically

2

How to design a directive-based approach (@generateMock, @respondWithMock) for seamless mock generation in existing developer workflows

3

How to validate LLM-generated mock data against a GraphQL schema using a self-healing retry mechanism

4

How to keep mock data synchronized with evolving GraphQL queries using hash-based version tracking

5

How to enable client-server parallel development by combining production and mock data at the field level

Prerequisites & Requirements

Understanding of GraphQL queries, fragments, directives, and schema definitions
Familiarity with mock data patterns for testing and prototyping
Basic understanding of LLMs and prompt engineering concepts
Experience with GraphQL code generation tools and client-server development workflows (optional)

Key Questions Answered

How does Airbnb use LLMs to generate realistic GraphQL mock data?

Airbnb's Niobe CLI collects rich context including the GraphQL query definitions, a pruned subset of the schema with inline documentation, design mockup images, developer-provided hints, platform information, and a curated list of valid image URLs. This context is consolidated into a prompt sent to Gemini 2.5 Pro, which generates JSON mock data that is then validated against the GraphQL schema using the graphqlSync function.

What is the @generateMock directive and how does it work?

@generateMock is a client-side GraphQL directive that engineers add to any operation, fragment, or field to trigger automatic mock data generation. It accepts optional arguments: id (for naming), hints (additional context like 'Include travel entries for Barcelona'), and designURL (a link to design mockups). During code generation, the directive triggers LLM-based mock generation that produces both JSON data and companion source files with helper functions.
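
To make the shape of the directive concrete, here is a minimal TypeScript sketch that renders a @generateMock annotation from its three optional arguments. The argument names (id, hints, designURL) come from the description above; modeling hints as a list of strings, the helper renderGenerateMock, and the TripCard fragment are assumptions made for illustration, not Airbnb's actual definitions.

```typescript
// Hypothetical shape of the optional @generateMock arguments.
// Modeling `hints` as a list of strings is an assumption of this sketch.
interface GenerateMockArgs {
  id?: string;
  hints?: string[];
  designURL?: string;
}

// Render the directive text for a given set of arguments.
// JSON.stringify handles quoting; for ordinary text its escapes
// match what a GraphQL string literal accepts.
function renderGenerateMock(args: GenerateMockArgs): string {
  const parts: string[] = [];
  if (args.id !== undefined) {
    parts.push("id: " + JSON.stringify(args.id));
  }
  if (args.hints !== undefined && args.hints.length > 0) {
    const hints = args.hints.map((h) => JSON.stringify(h)).join(", ");
    parts.push("hints: [" + hints + "]");
  }
  if (args.designURL !== undefined) {
    parts.push("designURL: " + JSON.stringify(args.designURL));
  }
  return parts.length === 0
    ? "@generateMock"
    : "@generateMock(" + parts.join(", ") + ")";
}

// Hypothetical fragment: the type and field names are illustrative only.
const tripFragment = [
  "fragment TripCard on Trip " +
    renderGenerateMock({
      id: "barcelona",
      hints: ["Include travel entries for Barcelona"],
    }) +
    " {",
  "  title",
  "  city",
  "}",
].join("\n");
```

In practice an engineer would write the directive directly in the .graphql document; the helper exists here only to show which arguments are optional and how they combine.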

How does @respondWithMock enable parallel client and server development?

@respondWithMock works alongside @generateMock to allow client engineers to develop features before the backend implementation is complete. When placed on a query, the GraphQL client returns locally mocked data instead of server data. When placed on individual fields, the client fetches real data from the server for all other fields and patches in mock data only for annotated fields, creating a hybrid of production and mock data.
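
The field-level behavior can be pictured as a merge: start from the server response and overwrite only the annotated field paths with values from the mock. The sketch below assumes plain object paths (no list handling) and a hypothetical patchWithMock helper; how Airbnb's GraphQL client actually performs the patching is not described in the source.

```typescript
type JsonObject = { [key: string]: unknown };

function isObject(value: unknown): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Return a copy of `server` in which every path listed in `mockedPaths`
// is replaced by the value found at the same path in `mock`.
// All other fields keep their real server values.
function patchWithMock(
  server: JsonObject,
  mock: JsonObject,
  mockedPaths: string[][]
): JsonObject {
  // Deep copy so the original server response is left untouched.
  const result = JSON.parse(JSON.stringify(server)) as JsonObject;

  mockedPaths.forEach((path) => {
    let target: JsonObject = result;
    let source: unknown = mock;
    for (let i = 0; i < path.length; i++) {
      const key = path[i];
      // The mock has no value at this path: keep the server data as is.
      if (!isObject(source) || !(key in source)) return;
      if (i === path.length - 1) {
        target[key] = source[key];
        return;
      }
      // The server may not know this parent yet; create it if needed.
      if (!isObject(target[key])) target[key] = {};
      target = target[key] as JsonObject;
      source = source[key];
    }
  });

  return result;
}
```

For example, if the server already returns listing.title and listing.price but not a new listing.hostBadge field, passing [["listing", "hostBadge"]] as mockedPaths yields the real title and price alongside the mocked badge.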

How does Airbnb keep GraphQL mocks in sync with evolving queries?

Niobe embeds two hash values in each generated JSON mock file: a hash of the GraphQL query document and a hash of the @generateMock input arguments. On each code generation run, Niobe compares current hashes against stored ones. If they differ, it sends the existing mock data plus a diff of query changes to the LLM with instructions to update only changed fields, preserving manual tweaks. Automated CI checks enforce that mock hashes stay current.
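
A minimal sketch of the staleness check follows. It assumes a mock file that stores the two hashes next to the data, and it uses a small non-cryptographic string hash purely to stay dependency-free; the hash algorithm Niobe uses, the file layout, and the names MockFile, computeVersion, and isStale are all assumptions.

```typescript
// djb2-style string hash, kept to 32 bits. A stand-in for whatever
// hash a real tool would use (most likely a cryptographic one).
function hashString(input: string): string {
  let h = 5381;
  for (let i = 0; i < input.length; i++) {
    h = (h * 33 + input.charCodeAt(i)) | 0;
  }
  return (h >>> 0).toString(16);
}

interface MockVersion {
  queryHash: string; // hash of the GraphQL query document
  argsHash: string; // hash of the @generateMock input arguments
}

// Hypothetical layout of a generated mock file.
interface MockFile {
  version: MockVersion;
  data: unknown;
}

function computeVersion(
  queryDocument: string,
  directiveArgs: { [key: string]: unknown }
): MockVersion {
  // Sort keys so that argument order does not affect the hash.
  const keys = Object.keys(directiveArgs).sort();
  const canonicalArgs = JSON.stringify(keys.map((k) => [k, directiveArgs[k]]));
  return {
    queryHash: hashString(queryDocument),
    argsHash: hashString(canonicalArgs),
  };
}

// True when either stored hash no longer matches the current inputs,
// meaning the mock should be sent back to the LLM for an update.
function isStale(
  file: MockFile,
  queryDocument: string,
  directiveArgs: { [key: string]: unknown }
): boolean {
  const current = computeVersion(queryDocument, directiveArgs);
  return (
    current.queryHash !== file.version.queryHash ||
    current.argsHash !== file.version.argsHash
  );
}
```

A CI check of the kind described could run isStale over every mock file and fail the build if any returns true.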

How does Airbnb validate LLM-generated mock data for type safety?

After receiving mock JSON from the LLM, Niobe validates it by passing the GraphQL schema, the client document, and the JSON data to the graphqlSync function from the graphql npm package. If validation fails (for example, because of invalid enum values or missing required fields), the errors are fed back to the LLM together with the initial mock data. This retry mechanism lets the system self-heal and fix invalid data before writing the final output.
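
The retry loop can be sketched independently of any particular model or validator. In the version below, generate and validate are injected functions: generate stands in for the LLM call and validate for the schema check (which, per the description above, would wrap graphqlSync). The loop is synchronous only to keep the sketch short; a real LLM call would be asynchronous. The function name, the prompt wording, and the attempt limit are assumptions.

```typescript
type Generate = (prompt: string) => string; // returns raw JSON text
type Validate = (data: unknown) => string[]; // returns error messages, empty if valid

interface RetryResult {
  data: unknown;
  attempts: number;
}

function generateValidMock(
  basePrompt: string,
  generate: Generate,
  validate: Validate,
  maxAttempts: number
): RetryResult {
  let prompt = basePrompt;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = generate(prompt);

    let data: unknown = undefined;
    let errors: string[] = [];
    try {
      data = JSON.parse(raw);
    } catch (e) {
      errors = ["Response was not valid JSON"];
    }
    if (errors.length === 0) errors = validate(data);

    if (errors.length === 0) return { data: data, attempts: attempt };

    // Feed the previous output and its errors back to the model.
    prompt =
      basePrompt +
      "\n\nPrevious mock:\n" +
      raw +
      "\n\nValidation errors:\n" +
      errors.map((e) => "- " + e).join("\n") +
      "\n\nReturn corrected JSON only.";
  }
  throw new Error("Mock failed validation after " + maxAttempts + " attempts");
}
```

Capping the number of attempts keeps a persistently invalid response from looping forever; failing loudly at that point is one reasonable choice, not necessarily what Niobe does.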

What context does Airbnb provide to the LLM for mock data generation?

Niobe provides six types of context: the query/fragment definitions being mocked, a pruned subset of the GraphQL schema with inline documentation, an image snapshot of the design mockup from the designURL parameter, additional developer hints, the target platform (iOS, Android, or Web), and a curated list of Airbnb-hosted image URLs with descriptions to prevent hallucinated URLs.
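
One way to picture how these six inputs come together is a small prompt builder. The sketch below is an assumption about structure only: the field names, section headings, and the choice to pass the design snapshot as a separate base64 attachment are hypothetical, and Niobe's real prompt is not shown in the source.

```typescript
type Platform = "iOS" | "Android" | "Web";

interface CatalogImage {
  url: string;
  description: string;
}

// The six kinds of context listed above, as one hypothetical structure.
interface MockPromptContext {
  definitions: string; // query/fragment definitions being mocked
  prunedSchema: string; // schema subset with inline documentation
  designImageBase64?: string; // snapshot fetched from designURL, if any
  hints: string[]; // developer-provided hints
  platform: Platform;
  imageCatalog: CatalogImage[]; // curated, valid image URLs
}

interface PromptPayload {
  text: string;
  images: string[];
}

function buildMockPrompt(ctx: MockPromptContext): PromptPayload {
  const sections: string[] = [
    "Generate JSON mock data for the GraphQL definitions below.",
    "## Definitions to mock\n" + ctx.definitions.trim(),
    "## Relevant schema (pruned)\n" + ctx.prunedSchema.trim(),
    "## Target platform\n" + ctx.platform,
  ];
  if (ctx.hints.length > 0) {
    sections.push(
      "## Developer hints\n" + ctx.hints.map((h) => "- " + h).join("\n")
    );
  }
  if (ctx.imageCatalog.length > 0) {
    sections.push(
      "## Allowed image URLs (use only these)\n" +
        ctx.imageCatalog
          .map((img) => "- " + img.url + " : " + img.description)
          .join("\n")
    );
  }
  const hasDesign = ctx.designImageBase64 !== undefined;
  if (hasDesign) {
    sections.push("## Design\nA snapshot of the design mockup is attached.");
  }
  return {
    text: sections.join("\n\n"),
    images: hasDesign ? [ctx.designImageBase64 as string] : [],
  };
}
```

Keeping each kind of context in its own labeled section makes it easy to omit optional inputs, such as hints or a design image, without changing the rest of the prompt.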

Why is random mock data generation insufficient for GraphQL testing?

Random value generators and field-level stubbing lack the domain knowledge and product context needed to produce realistic, meaningful test data. They generate technically valid but contextually meaningless values that are unsuitable for high-quality demos, product iteration, and reliable testing. For example, they cannot generate matching names, addresses, and listings that look like real Airbnb data or align with design mockups.

Why did Airbnb choose Gemini 2.5 Pro for GraphQL mock generation?

Airbnb chose Gemini 2.5 Pro for two reasons. First, its 1-million-token context window is needed to fit the pruned GraphQL schema, query definitions, design images, and other context. Second, in internal testing it ran significantly faster than comparable models while producing mock data of similar quality.

Key Statistics & Figures

Mocks generated and merged using @generateMock

Over 700

Across iOS, Android, and Web platforms in just a few months

Technologies & Tools

API

GraphQL

Core query language and schema system around which the entire mocking infrastructure is built

AI/ML

Gemini 2.5 Pro

LLM used for generating realistic mock data, chosen for its 1M token context window and speed

Language

TypeScript

One of the target languages for generated mock data and helper functions

Language

Swift

iOS client language with generated mock helper functions demonstrated in code examples

Language

Kotlin

Android client language supported by the code generation tool

Library

graphql npm package

Used for validating LLM-generated mock data via the graphqlSync function

Data Format

JSON

Format used for storing generated mock data files

Key Actionable Insights

1
Integrate LLM-based generation directly into existing developer tooling rather than building separate tools. Airbnb embedded mock generation into their existing Niobe CLI code generator, so engineers trigger mock generation with the same command they already use for GraphQL code generation. This eliminates context-switching and drives adoption.
Engineers are more likely to adopt tools that fit into their existing workflows. By making mock generation a side effect of normal code generation, Airbnb ensured seamless adoption without requiring engineers to learn new tools or processes.

2
Always validate LLM output against a formal schema and implement a self-healing retry loop. Airbnb validates generated mock data using graphqlSync against the GraphQL schema, and if validation fails, feeds errors back to the LLM to correct the output. This provides strong guarantees that final output is fully valid.
LLMs can hallucinate invalid enum values or miss required fields. By placing the LLM within existing validation infrastructure rather than using it as a standalone tool, you can enforce guardrails that ensure correctness.

3
Prune the schema context sent to the LLM rather than including the entire schema. Airbnb traverses the schema and strips out types, fields, and whitespace not needed to resolve the specific query being mocked. This prevents context window overload while still providing the type information the LLM needs.
Large GraphQL schemas can exceed LLM context windows. Schema traversal and pruning ensure that only relevant type definitions are included, improving both generation quality and performance.
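
The pruning step can be illustrated with a deliberately simplified schema model: a map from type name to its fields, where each field records only the name of its return type. Starting from the root type and the query's selection set, the sketch keeps just the types and fields the query touches. The SimpleSchema and SelectionSet shapes and the pruneSchema helper are assumptions; a real implementation would walk the parsed GraphQL schema and document.

```typescript
// Type name -> field name -> named return type (list and non-null
// wrappers are ignored in this simplified model).
type SimpleSchema = { [typeName: string]: { [fieldName: string]: string } };

// Field name -> nested selection, or null for a leaf field.
interface SelectionSet {
  [fieldName: string]: SelectionSet | null;
}

function pruneSchema(
  schema: SimpleSchema,
  rootType: string,
  selection: SelectionSet
): SimpleSchema {
  const pruned: SimpleSchema = {};

  function visit(typeName: string, sel: SelectionSet): void {
    const fields = schema[typeName];
    if (fields === undefined) return; // scalar or unknown type: nothing to keep
    if (pruned[typeName] === undefined) pruned[typeName] = {};
    const kept = pruned[typeName];

    Object.keys(sel).forEach((fieldName) => {
      const fieldType = fields[fieldName];
      if (fieldType === undefined) return; // not in schema; a real tool would report this
      kept[fieldName] = fieldType;
      const child = sel[fieldName];
      if (child !== null) visit(fieldType, child);
    });
  }

  visit(rootType, selection);
  return pruned;
}
```

Types the query never reaches, and unselected fields on the types it does reach, are absent from the result, which is what keeps the prompt small.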

4
Use hash-based versioning to detect when generated artifacts drift from their source definitions. Embedding hashes of both the query document and directive arguments in generated files enables smart regeneration that only updates mocks when the underlying query actually changes.
This approach prevents unnecessary regeneration, preserves manual engineer tweaks to mock data, and enables CI checks that guarantee mocks stay synchronized with evolving queries.

5
Provide the LLM with curated, valid resource URLs (like images) to prevent hallucination of non-existent resources. Airbnb feeds the LLM a list of real Airbnb-hosted image URLs with short descriptions so generated mock data contains loadable images at runtime.
LLMs commonly hallucinate URLs that don't exist. By constraining the LLM to choose from a known set of valid resources, you ensure mock data actually works when loaded in the application for prototyping or demos.
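
The source describes constraining the model through the prompt. As a complementary guard, one could also check the returned mock against the same curated list. This post-check is an assumption of this sketch rather than something the article says Airbnb does, and findUnknownUrls is a hypothetical helper.

```typescript
// Walk any JSON-like value and collect http(s) URL strings that are
// not present in the curated allow-list.
function findUnknownUrls(mock: unknown, allowedUrls: string[]): string[] {
  const rejected: string[] = [];

  function walk(value: unknown): void {
    if (typeof value === "string") {
      const looksLikeUrl = /^https?:\/\//.test(value);
      if (looksLikeUrl && allowedUrls.indexOf(value) === -1) {
        rejected.push(value);
      }
    } else if (Array.isArray(value)) {
      value.forEach((item) => walk(item));
    } else if (typeof value === "object" && value !== null) {
      const obj = value as { [key: string]: unknown };
      Object.keys(obj).forEach((key) => walk(obj[key]));
    }
  }

  walk(mock);
  return rejected;
}
```

An empty result means every URL in the mock came from the curated list; a non-empty one could be reported back to the model in the same way as a schema validation error.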

6
Support hybrid production/mock data at the field level rather than only full query mocking. Airbnb's @respondWithMock directive can be applied to individual fields, allowing the client to fetch real data from the server for existing fields while patching in mock data only for new, unimplemented fields.
This granular approach is more practical for iterative development where engineers add new fields to existing queries. It allows client teams to develop against partially implemented backends without losing access to real data for already-complete features.

Common Pitfalls

1

Sending the entire GraphQL schema to the LLM instead of pruning it to only the types relevant to the query being mocked. Large schemas can easily exceed LLM context windows and degrade generation quality by flooding the model with irrelevant type information.

Airbnb's Niobe traverses the schema graph starting from the query's referenced types and strips out all unrelated types, fields, and extra whitespace before constructing the LLM prompt.

2

Allowing the LLM to generate image URLs or resource URLs freely, which leads to hallucinated URLs that don't resolve at runtime. This makes mock data unusable for demos and prototyping since images and resources fail to load.

Airbnb solves this by providing the LLM with a curated list of valid, hosted image URLs with textual descriptions, constraining the model to only choose from real, loadable resources.

3

Trusting LLM output without validation against the GraphQL schema. LLMs can produce invalid enum values, miss required fields, or generate structurally incorrect data that causes runtime errors when loaded into the application.

Implementing a validation step using graphqlSync and a retry loop that feeds errors back to the LLM provides a self-healing mechanism that ensures final output is always schema-compliant.

4

Regenerating all mock data from scratch on every code generation run, which overwrites manual engineer tweaks and breaks tests that depend on specific mock values. This also wastes LLM API calls and time on unchanged mocks.

Airbnb uses hash-based versioning and provides the LLM with a diff of what changed in the query, instructing it to only modify fields affected by the diff while preserving all other existing data.

5

Hand-writing and maintaining mock data, either as raw JSON files or by copying and pasting server responses. These mocks are not coupled to the underlying queries and schema, so they drift out of sync as queries evolve, silently degrading test quality.

Automated CI checks that verify mock version hashes are up to date provide a forcing function that guarantees mocks stay synchronized with their corresponding GraphQL queries.

Related Concepts

GraphQL Directives

GraphQL Code Generation

LLM Prompt Engineering

Mock Data Generation

Type-safe Testing

Schema Validation

Client-server Parallel Development

Snapshot Testing

Design-driven Development

GraphQL Schema Evolution

Self-healing AI Pipelines

Developer Experience Tooling