Building Shopify’s schematization platform for managing Personally Identifiable Information (PII) within our data warehouse.
Overview
The article discusses Shopify's approach to managing personally identifiable information (PII) at scale through a schematization platform that improves the reliability, performance, and efficiency of data processing. It details how the Privacy team worked with the Data Science & Engineering teams to implement effective PII deletion strategies using obfuscation and tokenization techniques.
What You'll Learn
How to design and implement a schematization system for event data
Why obfuscation and tokenization are critical for handling PII
How to effectively delete PII across multiple data controllers
When to apply pseudonymization techniques in data processing
Prerequisites & Requirements
- Understanding of data privacy regulations and PII management
- Familiarity with Kafka and data warehousing concepts (optional)
Key Questions Answered
How does Shopify handle the deletion of PII at scale?
What are the benefits of using a schematization platform?
What types of pseudonymization techniques are used in data processing?
What challenges did Shopify face in adopting the new PII management tools?
Key Actionable Insights
1. Implement a schematization system to standardize event data collection. This ensures that all event data adheres to a defined structure, improving data quality and compliance with privacy regulations.
2. Utilize obfuscation techniques to protect sensitive data while maintaining its analytical value. Obfuscation allows for the analysis of data without exposing personal identifiers, which is essential for privacy compliance.
3. Adopt a tokenization strategy to facilitate the deletion of PII across multiple data controllers. This approach simplifies the deletion process, allowing organizations to comply with data protection regulations efficiently.
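The three insights above can be illustrated with a minimal Python sketch. It is not Shopify's implementation: `TokenVault`, `obfuscate`, `process_event`, and the `pii_fields` set are hypothetical names introduced here, and the in-memory dictionaries stand in for whatever secured store a real platform would use. The sketch assumes a schema that marks which event fields hold PII, replaces those values with random tokens, and shows that deleting the vault entry leaves the tokens in downstream data unresolvable.

```python
import hashlib
import hmac
import secrets


class TokenVault:
    """Hypothetical in-memory vault mapping tokens to PII values (assumption)."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse an existing token so the same value stays joinable in analytics.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(16)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str):
        # Returns None once the mapping has been deleted.
        return self._token_to_value.get(token)

    def forget(self, value: str) -> bool:
        # Deleting the mapping is the "deletion" step: tokens already written
        # to downstream tables can no longer be resolved to the original value.
        token = self._value_to_token.pop(value, None)
        if token is None:
            return False
        del self._token_to_value[token]
        return True


def obfuscate(value: str, key: bytes) -> str:
    # Keyed hash (HMAC-SHA256): deterministic for a given key, so values can
    # still be grouped and counted without exposing the raw identifier.
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def process_event(event: dict, pii_fields: set, vault: TokenVault) -> dict:
    # pii_fields plays the role of schema annotations marking PII columns.
    return {
        name: vault.tokenize(value) if name in pii_fields else value
        for name, value in event.items()
    }
```

In this sketch a deletion request only touches the vault: calling `vault.forget("a@example.com")` removes the single mapping, while every copy of the token in the warehouse can stay in place. Whether a keyed hash or a vault-backed token is the better fit for a given field depends on whether the original value ever needs to be recovered.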