How Airbnb combines GraphQL infra, product context, and LLMs to generate and maintain convincing, type-safe mock data using a new directive.
Overview
Airbnb built a system that combines GraphQL infrastructure, product context, and LLMs to automatically generate and maintain realistic, type-safe mock data using a custom @generateMock directive. The solution eliminates manual mock creation, enables client engineers to develop without waiting for backend implementations, and keeps mocks synchronized with evolving GraphQL queries through hash-based version tracking.
What You'll Learn
How to use LLMs with GraphQL schema context to generate realistic, type-safe mock data automatically
How to design a directive-based approach (@generateMock, @respondWithMock) for seamless mock generation in existing developer workflows
How to validate LLM-generated mock data against a GraphQL schema using a self-healing retry mechanism
How to keep mock data synchronized with evolving GraphQL queries using hash-based version tracking
How to enable client-server parallel development by combining production and mock data at the field level
Prerequisites & Requirements
- Understanding of GraphQL queries, fragments, directives, and schema definitions
- Familiarity with mock data patterns for testing and prototyping
- Basic understanding of LLMs and prompt engineering concepts
- Experience with GraphQL code generation tools and client-server development workflows (optional)
Key Questions Answered
How does Airbnb use LLMs to generate realistic GraphQL mock data?
What is the @generateMock directive and how does it work?
How does @respondWithMock enable parallel client and server development?
How does Airbnb keep GraphQL mocks in sync with evolving queries?
How does Airbnb validate LLM-generated mock data for type safety?
What context does Airbnb provide to the LLM for mock data generation?
Why is random mock data generation insufficient for GraphQL testing?
Why did Airbnb choose Gemini 2.5 Pro for GraphQL mock generation?
Key Actionable Insights
1. Integrate LLM-based generation directly into existing developer tooling rather than building separate tools. Airbnb embedded mock generation into their existing Niobe CLI code generator, so engineers trigger mock generation with the same command they already use for GraphQL code generation. This eliminates context-switching and drives adoption. Engineers are more likely to adopt tools that fit into their existing workflows. By making mock generation a side effect of normal code generation, Airbnb ensured seamless adoption without requiring engineers to learn new tools or processes.
2. Always validate LLM output against a formal schema and implement a self-healing retry loop. Airbnb validates generated mock data using graphqlSync against the GraphQL schema, and if validation fails, feeds errors back to the LLM to correct the output. This provides strong guarantees that final output is fully valid. LLMs can hallucinate invalid enum values or miss required fields. By placing the LLM within existing validation infrastructure rather than using it as a standalone tool, you can enforce guardrails that ensure correctness.
3. Prune the schema context sent to the LLM rather than including the entire schema. Airbnb traverses the schema and strips out types, fields, and whitespace not needed to resolve the specific query being mocked. This prevents context window overload while still providing the type information the LLM needs. Large GraphQL schemas can exceed LLM context windows. Schema traversal and pruning ensure only relevant type definitions are included, improving both generation quality and performance.
4. Use hash-based versioning to detect when generated artifacts drift from their source definitions. Embedding hashes of both the query document and directive arguments in generated files enables smart regeneration that only updates mocks when the underlying query actually changes. This approach prevents unnecessary regeneration, preserves manual engineer tweaks to mock data, and enables CI checks that guarantee mocks stay synchronized with evolving queries.
5. Provide the LLM with curated, valid resource URLs (like images) to prevent hallucination of non-existent resources. Airbnb feeds the LLM a list of real Airbnb-hosted image URLs with short descriptions so generated mock data contains loadable images at runtime. LLMs commonly hallucinate URLs that don't exist. By constraining the LLM to choose from a known set of valid resources, you ensure mock data actually works when loaded in the application for prototyping or demos.
6. Support hybrid production/mock data at the field level rather than only full query mocking. Airbnb's @respondWithMock directive can be applied to individual fields, allowing the client to fetch real data from the server for existing fields while patching in mock data only for new, unimplemented fields. This granular approach is more practical for iterative development where engineers add new fields to existing queries. It allows client teams to develop against partially implemented backends without losing access to real data for already-complete features.
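The validate-and-retry loop from the second insight can be sketched roughly as follows. This is not Airbnb's implementation: the source says validation runs through graphqlSync against the real GraphQL schema, whereas here a tiny hand-written `SCHEMA` dict stands in for the schema, and `fake_llm` is a hypothetical stub in place of a real model call. Only the shape of the loop (generate, validate, feed errors back, retry) is the point.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a schema: field -> (Python type, allowed enum values or None).
SCHEMA = {
    "id": (str, None),
    "title": (str, None),
    "roomType": (str, {"ENTIRE_HOME", "PRIVATE_ROOM"}),
}

def validate(mock: Dict) -> List[str]:
    """Return human-readable errors; an empty list means the mock is valid."""
    errors = []
    for field, (expected_type, allowed) in SCHEMA.items():
        if field not in mock:
            errors.append(f"missing required field '{field}'")
        elif not isinstance(mock[field], expected_type):
            errors.append(f"field '{field}' should be {expected_type.__name__}")
        elif allowed is not None and mock[field] not in allowed:
            errors.append(f"field '{field}' has invalid enum value '{mock[field]}'")
    for field in mock:
        if field not in SCHEMA:
            errors.append(f"unknown field '{field}'")
    return errors

def generate_valid_mock(generate: Callable[[List[str]], Dict], max_attempts: int = 3) -> Dict:
    """Call the generator until its output validates, passing back the last errors."""
    errors: List[str] = []
    for _ in range(max_attempts):
        mock = generate(errors)   # previous errors act as corrective feedback
        errors = validate(mock)
        if not errors:
            return mock
    raise ValueError(f"mock still invalid after {max_attempts} attempts: {errors}")

def fake_llm(previous_errors: List[str]) -> Dict:
    """Hypothetical stub: hallucinates an enum value first, then corrects it."""
    if not previous_errors:
        return {"id": "1", "title": "Cozy loft", "roomType": "CASTLE"}
    return {"id": "1", "title": "Cozy loft", "roomType": "ENTIRE_HOME"}
```

Capping the attempts keeps a persistently wrong generator from looping forever; the retry limit of 3 is an arbitrary choice for this sketch.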
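To make the schema pruning in the third insight concrete, here is a minimal sketch under simplifying assumptions. The schema is modeled as a plain dict of type name to fields, and the query as a nested dict of selected fields; both representations, along with the `prune_schema` name, are hypothetical and chosen only to show the traversal. A real implementation would walk a parsed GraphQL schema and query document instead.

```python
# Hypothetical schema model: type name -> {field name: name of the field's type}.
SCHEMA = {
    "Query": {"listing": "Listing", "user": "User"},
    "Listing": {"id": "ID", "title": "String", "host": "User", "reviews": "Review"},
    "User": {"id": "ID", "name": "String", "email": "String"},
    "Review": {"id": "ID", "rating": "Int"},
}

# Hypothetical query model: field -> sub-selection ({} for a leaf field).
QUERY = {"listing": {"title": {}, "host": {"name": {}}}}

def prune_schema(schema, selection, type_name="Query", pruned=None):
    """Keep only the types and fields that the selection actually touches."""
    pruned = {} if pruned is None else pruned
    kept_fields = pruned.setdefault(type_name, {})
    for field, sub_selection in selection.items():
        field_type = schema[type_name][field]
        kept_fields[field] = field_type
        if field_type in schema:  # object type: follow it into the schema
            prune_schema(schema, sub_selection, field_type, pruned)
    return pruned

PRUNED = prune_schema(SCHEMA, QUERY)
```

With these inputs, `PRUNED` keeps `Query.listing`, `Listing.title`, `Listing.host`, and `User.name`, and drops the `Review` type entirely, so the context sent to the model covers only what the query can return.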
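The hash-based drift check in the fourth insight might look something like the sketch below. The source only says that hashes of the query document and the directive arguments are embedded in generated files; combining them into one SHA-256 digest, collapsing whitespace before hashing, and the function names `mock_version_hash` and `needs_regeneration` are all assumptions made for illustration.

```python
import hashlib
import json

def mock_version_hash(query_document: str, directive_args: dict) -> str:
    """Hash the query text plus directive arguments into one version string."""
    # Assumption: collapse whitespace so pure reformatting does not trigger
    # regeneration. (This is a simplification; it would also alter whitespace
    # inside string literals.)
    normalized_query = " ".join(query_document.split())
    payload = json.dumps(
        {"query": normalized_query, "args": directive_args}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def needs_regeneration(stored_hash: str, query_document: str, directive_args: dict) -> bool:
    """True when the hash embedded in the generated mock no longer matches its source."""
    return stored_hash != mock_version_hash(query_document, directive_args)
```

A CI check could recompute the hash for every query and fail the build when `needs_regeneration` returns `True`, while leaving untouched any mock whose source is unchanged, which is what lets hand-edited mocks survive.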
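For the curated-URL idea in the fifth insight, one possible shape is to render the allowed images into the prompt and then check the generated mock against the same list. Everything named here is hypothetical: the `example.com` URLs are placeholders rather than real Airbnb-hosted images, and the post-generation check is an extra safeguard this sketch adds, not something the source describes.

```python
# Placeholder URLs and descriptions; a real list would point at hosted images.
CURATED_IMAGES = {
    "https://example.com/images/beach-house.jpg": "beach house at sunset",
    "https://example.com/images/city-loft.jpg": "bright loft with city view",
}

def image_prompt_section(images: dict) -> str:
    """Render the curated list as prompt text the model is told to choose from."""
    lines = ["Use only these image URLs:"]
    lines += [f"- {url} ({description})" for url, description in images.items()]
    return "\n".join(lines)

def find_unknown_urls(value, allowed) -> list:
    """Recursively collect URL-like strings in the mock that are not in the curated list."""
    if isinstance(value, str):
        return [value] if value.startswith("http") and value not in allowed else []
    if isinstance(value, dict):
        children = list(value.values())
    elif isinstance(value, list):
        children = value
    else:
        return []
    unknown = []
    for child in children:
        unknown.extend(find_unknown_urls(child, allowed))
    return unknown
```

Any URL returned by `find_unknown_urls` could be treated like a validation error and sent back to the model for correction.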
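The field-level mixing of production and mock data from the sixth insight can be pictured as a recursive merge of two response trees. The sketch assumes the mock response contains only the fields marked with @respondWithMock and that both responses are nested dicts; the `merge_mock` helper and the `sustainabilityScore` field are hypothetical, and list handling is left out to keep the example short.

```python
def merge_mock(real, mock):
    """Overlay mocked fields onto a real response, keeping every other real field."""
    if isinstance(real, dict) and isinstance(mock, dict):
        merged = dict(real)  # copy so the real response is not mutated
        for key, mock_value in mock.items():
            if key in real:
                merged[key] = merge_mock(real[key], mock_value)
            else:
                merged[key] = mock_value  # field the server does not return yet
        return merged
    return mock  # at a leaf, the mocked value wins

# Real data for fields the backend already implements.
real_response = {"listing": {"id": "42", "title": "Real title from the server"}}
# Mock data only for the new, unimplemented field.
mock_response = {"listing": {"sustainabilityScore": 87}}

combined = merge_mock(real_response, mock_response)
```

Here `combined["listing"]` carries the real `id` and `title` alongside the mocked `sustainabilityScore`, so the client can render the new field without losing production data for the rest of the query.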