How We’re Solving Data Discovery Challenges at Shopify

Ranko Cupovic
10 min read · intermediate

Overview

This article describes how Shopify approached its data discovery challenges by building a tool called Artifact. It explains why effective data management and governance matter as data volumes grow rapidly, and outlines the specific challenges Shopify's data teams faced.

What You'll Learn

1. How to improve data discovery processes within an organization
2. Why effective data governance is critical for data management
3. When to implement a custom data discovery tool versus using existing solutions
4. How to leverage metadata for better data asset management

Key Questions Answered

What are the main challenges of data discovery at Shopify?
The main challenges include curation, governance, and accessibility. Curation involves finding existing data assets, governance relates to understanding the impact of changes on data assets, and accessibility focuses on surfacing relevant data points for stakeholders. These challenges hinder the efficiency of data teams.
How does Artifact improve data discovery at Shopify?
Artifact enhances data discovery by centralizing metadata, allowing users to search and browse data assets easily. It provides context through ownership and usage information, helping teams to leverage data more effectively and reducing reliance on the Data team.
What are the key features of Artifact's user experience?
Artifact's user experience includes a landing page for browsing data assets, a search function powered by Elasticsearch, and detailed data asset pages that provide metadata, lineage information, and usage statistics. This structure helps users quickly find and understand data assets.
What trade-offs did Shopify consider in building Artifact?
Shopify considered whether to buy or build a data discovery tool. They found that existing solutions required heavy customization and did not meet their specific needs, leading them to build Artifact for better flexibility and control over technical debt.
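
To make the idea of centralized, searchable metadata concrete, here is a minimal sketch of a catalog that stores ownership and usage alongside each asset and answers keyword searches. It is an illustration only: the class names, field names, and the in-memory index are assumptions made for this example, not Artifact's schema or implementation (the source says Artifact's search is powered by Elasticsearch).

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DataAsset:
    # All field names are hypothetical, chosen for illustration.
    name: str
    asset_type: str        # e.g. "table", "dashboard", "report"
    owner: str
    description: str = ""
    weekly_views: int = 0  # stand-in for usage information


class Catalog:
    """Toy in-memory catalog; a real system would delegate search to a search engine."""

    def __init__(self):
        self._assets = {}
        self._index = defaultdict(set)  # token -> names of assets containing it

    @staticmethod
    def _tokens(text):
        # Lowercase and split on anything that is not a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, asset):
        self._assets[asset.name] = asset
        searchable = f"{asset.name} {asset.description} {asset.owner}"
        for token in self._tokens(searchable):
            self._index[token].add(asset.name)

    def search(self, query):
        tokens = self._tokens(query)
        if not tokens:
            return []
        # Keep only assets that match every query token.
        names = set.intersection(*(self._index.get(t, set()) for t in tokens))
        # Rank by usage (most viewed first), breaking ties by name.
        return sorted(
            (self._assets[n] for n in names),
            key=lambda a: (-a.weekly_views, a.name),
        )


catalog = Catalog()
catalog.add(DataAsset("orders_daily", "table", "finance-data", "Daily order totals", 120))
catalog.add(DataAsset("orders_dashboard", "dashboard", "growth", "Order trends by region", 300))
for asset in catalog.search("orders"):
    print(asset.name, asset.owner, asset.weekly_views)
```

Ranking by usage is one simple way to surface the assets other people already rely on; a production search engine would combine that signal with text relevance.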

Key Statistics & Figures

Percentage of Data team using Artifact weekly
30%
This statistic reflects the adoption rate of Artifact since its launch, indicating its effectiveness in addressing data discovery challenges.
Percentage of Data team feeling hindered by pre-Artifact discovery process
80%
This is the baseline before Artifact: most of the Data team felt the old discovery process slowed their work, which is the problem the tool was built to address.
Percentage of teams understanding the impact of their changes after Artifact
46%
This statistic shows the improvement in governance and awareness among data teams regarding the downstream effects of their data changes.

Technologies & Tools

Backend: Elasticsearch
Used for indexing and storing data asset information to facilitate efficient searching and browsing.
Backend: GraphQL API
Exposes data to the Artifact UI, allowing for dynamic querying of metadata.
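
As a rough sketch of how these two pieces might be used together, the example below builds an Elasticsearch query body for a catalog search and a GraphQL request payload for an asset page. The query-DSL constructs (`bool`, `multi_match`, `terms`) are standard Elasticsearch, but the field names, the GraphQL schema (`dataAsset`, `upstream`, `downstream`), and the function names are all hypothetical; the source does not describe Artifact's actual index mapping or API schema.

```python
import json


def build_search_body(text, asset_types=None, size=10):
    """Build an Elasticsearch query-DSL body for a catalog search.

    Field names ("name", "description", "owner", "asset_type") are hypothetical.
    """
    must = [{
        "multi_match": {
            "query": text,
            # "name^3" boosts matches on the asset name over the other fields.
            "fields": ["name^3", "description", "owner"],
        }
    }]
    filters = []
    if asset_types:
        # Restrict results to the requested asset types without affecting scoring.
        filters.append({"terms": {"asset_type": list(asset_types)}})
    return {"size": size, "query": {"bool": {"must": must, "filter": filters}}}


# Hypothetical GraphQL query a UI might send for an asset detail page.
ASSET_PAGE_QUERY = """
query AssetPage($name: String!) {
  dataAsset(name: $name) {
    name
    owner
    description
    upstream { name }
    downstream { name }
  }
}
"""


def build_graphql_request(name):
    """Return the JSON payload for a GraphQL-over-HTTP POST."""
    return json.dumps({"query": ASSET_PAGE_QUERY, "variables": {"name": name}})


print(json.dumps(build_search_body("orders", asset_types=["table"]), indent=2))
```

Keeping the query builders as plain functions that return data makes them easy to test without a running cluster or API server.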

Key Actionable Insights

1. Implement a centralized data discovery tool like Artifact to streamline data access and improve productivity.
By providing a single source of truth for data assets, teams can reduce the time spent searching for information and enhance collaboration across departments.
2. Focus on metadata management to enhance data governance and accessibility.
Well-documented metadata allows users to understand the context and lineage of data assets, which is crucial for making informed decisions and minimizing the risk of errors.
3. Evaluate the specific needs of your organization before choosing a data management solution.
Understanding the unique data landscape and requirements can help in deciding whether to build a custom solution or adapt an existing one, ultimately leading to better alignment with business goals.
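
One way lineage metadata supports governance is by answering "what breaks if I change this asset?". The sketch below walks a lineage graph breadth-first to list everything downstream of a changed asset. The graph shape, asset names, and function name are assumptions for illustration; the source does not describe how Artifact stores or traverses lineage.

```python
from collections import deque


def downstream_assets(lineage, changed):
    """Return every asset reachable from `changed`, in breadth-first order.

    `lineage` maps an asset name to the assets that read from it directly.
    """
    seen = {changed}  # guards against cycles and repeated visits
    order = []
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical lineage: raw table -> rollup -> dashboard and report -> summary.
lineage = {
    "raw_orders": ["orders_daily"],
    "orders_daily": ["orders_dashboard", "revenue_report"],
    "revenue_report": ["exec_summary"],
}
impacted = downstream_assets(lineage, "raw_orders")
print(impacted)
```

Breadth-first order lists the closest dependents first, which is usually the order in which an owner would want to notify affected teams.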

Common Pitfalls

1. Relying on one-size-fits-all data management tools can lead to inefficiencies and unmet needs.
Many organizations find that existing tools do not capture the unique aspects of their data processes, leading to frustration and a lack of effective data governance.

Related Concepts

Data Governance
Data Management Tools
Metadata Management
Data Discovery Processes