Sandcastle: data/AI apps for everyone

Airbnb made it easy to bring data/AI ideas to life through a platform for prototyping web applications.

Daniel Miller
9 min read · intermediate

Overview

The article discusses Sandcastle, an internal prototyping platform developed by Airbnb that empowers data scientists, engineers, and product managers to create interactive data/AI applications. It highlights the challenges faced in sharing web applications internally and how Sandcastle addresses these issues by leveraging existing cloud infrastructure.

What You'll Learn

1. How to create interactive data/AI applications using Sandcastle
2. Why leveraging existing cloud infrastructure can streamline prototyping
3. How to package and share data science prototypes effectively

Prerequisites & Requirements

  • Familiarity with data science concepts and Python programming
  • Basic understanding of web application frameworks like Streamlit or FastAPI (optional)

Key Questions Answered

What is Sandcastle and how does it benefit data scientists at Airbnb?
Sandcastle is an internal prototyping platform that allows data scientists, engineers, and product managers to quickly create and share interactive web applications. It simplifies the process of turning data/AI ideas into live prototypes, enabling rapid iteration without the need for extensive engineering resources.
What challenges do data scientists face when sharing web applications internally?
Data scientists often encounter hurdles such as limited engineering bandwidth, the need for complex infrastructure setup, and difficulties in sharing prototypes with non-technical stakeholders. These challenges can hinder the effective communication of data-driven ideas within organizations.
How does Airbnb's Onebrain framework facilitate reproducible data science projects?
Onebrain allows data scientists to package their code in a structured manner using a project file that includes metadata, entry points, and environment specifications. This enables easy sharing and reproducibility of data science projects across the organization.
What role does kube-gen play in the Sandcastle platform?
kube-gen is a code-generation layer built on top of Kubernetes that simplifies the configuration of cloud infrastructure for data scientists. It automates many aspects of service configuration, allowing developers to focus on application logic rather than infrastructure details.
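
The Onebrain answer above describes a project file that carries metadata, entry points, and environment specifications. The summary does not show Onebrain's actual file format, so the following is only a minimal Python sketch of that idea. The `ProjectSpec` and `EntryPoint` classes, their field names, and the example values are all hypothetical and should not be read as Onebrain's schema.

```python
from dataclasses import dataclass, field


@dataclass
class EntryPoint:
    """A named command that launches one part of the project."""
    name: str
    command: str


@dataclass
class ProjectSpec:
    """Hypothetical stand-in for a project file; NOT Onebrain's real schema."""
    # Metadata
    name: str
    description: str
    owner: str
    # Environment specification
    python_version: str
    dependencies: list[str] = field(default_factory=list)
    # Entry points
    entry_points: list[EntryPoint] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the spec is usable."""
        problems = []
        if not self.name:
            problems.append("project name is required")
        if not self.entry_points:
            problems.append("at least one entry point is required")
        names = [ep.name for ep in self.entry_points]
        if len(names) != len(set(names)):
            problems.append("entry point names must be unique")
        return problems

    def command_for(self, entry_point: str) -> str:
        """Look up the command to run for a named entry point."""
        for ep in self.entry_points:
            if ep.name == entry_point:
                return ep.command
        raise KeyError(f"unknown entry point: {entry_point}")


# Illustrative values only.
spec = ProjectSpec(
    name="demo-prototype",
    description="Example interactive prototype",
    owner="data-science",
    python_version="3.11",
    dependencies=["streamlit", "pandas"],
    entry_points=[EntryPoint(name="app", command="streamlit run app.py")],
)
```

Keeping the metadata, environment, and entry points in one declarative object is what makes a project reproducible in this sketch: anyone holding the spec knows which dependencies to install and which command to run.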

Key Statistics & Figures

Number of live prototypes developed
175
In the last year, Airbnb's data science and product management community developed over 175 live prototypes using Sandcastle.
Unique internal visitors
3.5k
These prototypes were visited by over 3.5k unique internal visitors across more than 69k distinct active days.

Technologies & Tools


Packaging Framework
Onebrain
Used for packaging data science and prototyping code in a reproducible manner.
Infrastructure
kube-gen
Handles authentication, tracing, and cross-service communication for applications deployed on Kubernetes.
Frontend Framework
Streamlit
Facilitates the rapid development of interactive web applications for data science prototypes.
Backend Framework
FastAPI
Used for building APIs and backend services in data science applications.

Key Actionable Insights

1. Leverage Sandcastle to rapidly prototype data/AI applications, allowing for quick iterations and feedback from stakeholders.
   Using Sandcastle can significantly reduce the time from idea to live application, enabling data scientists to validate their concepts with real users and gather insights faster.
2. Utilize Onebrain for packaging your data science code to ensure reproducibility and ease of sharing within your organization.
   By structuring your projects with Onebrain, you can streamline collaboration and make it easier for other team members to access and build upon your work.
3. Integrate kube-gen to simplify the deployment of your applications on Kubernetes, minimizing the complexity of cloud infrastructure management.
   This allows data scientists to focus on developing their applications without getting bogged down by the intricacies of cloud configurations.
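
To make the code-generation idea behind the kube-gen insight concrete, here is a minimal Python sketch that expands a few application-level settings into a full Kubernetes Deployment manifest. It is not kube-gen and does not reflect its interface. The `generate_deployment` function, the image name, and the port are assumptions chosen for illustration. Only the manifest fields themselves follow the standard `apps/v1` Deployment layout.

```python
import json


def generate_deployment(name: str, image: str, port: int, replicas: int = 1) -> dict:
    """Expand a few app-level settings into a Kubernetes Deployment manifest.

    Hypothetical helper for illustration; not part of kube-gen.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": dict(labels)},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


# Illustrative values: a placeholder registry and Streamlit's default port.
manifest = generate_deployment(
    "demo-prototype",
    "registry.example.com/demo-prototype:latest",
    8501,
)
print(json.dumps(manifest, indent=2))
```

The author of the prototype supplies three values, and the generator fills in the boilerplate that has to stay internally consistent, such as the selector matching the template labels. That is the kind of detail a generation layer can take off a data scientist's plate.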

Common Pitfalls

1. Failing to properly configure the cloud infrastructure can lead to deployment issues and security vulnerabilities.
   Data scientists may overlook essential configurations when deploying applications, which can expose sensitive data or lead to application failures. It's crucial to use frameworks like kube-gen to automate and simplify these configurations.

Related Concepts

Data Science Prototyping
Kubernetes Deployment Strategies
Interactive Application Development Frameworks