How Spotify Uses Apache
23 engineering articles about Apache from Spotify's engineering team
Articles
This article discusses fleet-wide refactoring at Spotify, detailing the tools and strategies developed to manage code changes across thousands of Git repositories.
Matt Brown
25 min read
Includes Code
Has Summary
--
This article discusses Spotify's transition to a declarative infrastructure model using Kubernetes, enabling efficient management of cloud resources across numerous services.
Ansible, Apache, Apache Kafka, Cassandra, Docker, Elasticsearch, Google Cloud, JSON, Kubernetes, Memcached, PostgreSQL, Puppet, Terraform, TypeScript, YAML
David Flemström
11 min read
Includes Code
Has Summary
--
The article discusses Spotify's integration of the Podz ML pipeline using Google Dataflow to generate podcast previews efficiently.
Diego Casabuena (ML Engineer)
12 min read
Includes Code
Has Summary
--
The article discusses the development of infrastructure at Spotify to enhance user forecasting capabilities in response to the company's global expansion.
Molly Zhu
7 min read
Has Summary
--
The article discusses the development of ML Home, Spotify's internal user interface for their Machine Learning Platform, highlighting the challenges faced in building a platform for ML practitioner...
Maisha Lopa
11 min read
Has Summary
--
This article discusses Spotify's migration of its Event Delivery Infrastructure (EDI) to Google Cloud Platform (GCP), detailing the challenges faced, solutions implemented, and the resulting improv...
Flavio Santos (Data Infrastructure Engineer) and Robert Stephenson (Senior Product Manager)
14 min read
Has Summary
--
This article discusses how Spotify optimized its largest Dataflow job for Wrapped 2020 by implementing Sort Merge Bucket (SMB) joins, significantly reducing costs and improving performance.
Neville Li
11 min read
Has Summary
--
The article discusses the development and open-sourcing of Klio, a framework designed for building efficient data pipelines for audio processing at scale.
David Riordan and Lynn Root
11 min read
Has Summary
--
The article discusses Spotify's 'Listening Together' campaign, which visualizes real-time musical connections among users worldwide.
Gandalf Hernandez
4 min read
Has Summary
--
The article discusses Spotify's journey in improving its Machine Learning infrastructure using TensorFlow Extended (TFX) and Kubeflow.
Apache, Caching, Docker, Google Cloud, HTML, Kubernetes, Machine Learning, MySQL, Scala, SQL, TensorFlow, Terraform, Transformer, XGBoost
Josh Baer
13 min read
Has Summary
--
Spotify's Event Delivery system is a crucial component for understanding user behavior and delivering personalized content.
Bartosz Janota
17 min read
Has Summary
--
Scio 0.7 is a Scala API for Apache Beam and Google Cloud Dataflow, designed to simplify large-scale data processing for Spotify engineers.
Claire McGinty
12 min read
Includes Code
Has Summary
--
The article discusses Spotify's approach to user privacy through a centralized encryption system called Padlock, which manages user data encryption keys.
Bram Leenders
12 min read
Has Summary
--
This article delves into Scio, a Scala API for Apache Beam and Google Cloud Dataflow, highlighting its unique features, basic concepts, and practical use cases at Spotify.
Neville Li
7 min read
Includes Code
Has Summary
--
This article discusses Spotify's transition to Google Cloud and the development of Scio, a Scala API for Apache Beam, which facilitates big data processing.
Neville Li
9 min read
Has Summary
--
This article discusses Spotify's transition to a cloud-based event delivery system, focusing on the architecture and implementation using Google Cloud services.
Igor Maravić
11 min read
Includes Code
Has Summary
--
This article discusses Spotify's transition to a new event delivery system built on Google Cloud managed services, focusing on the architecture and design choices made to improve reliability and ef...
Igor Maravić
13 min read
Has Summary
--
The article introduces RAMLfications, a Python package developed by Spotify for parsing and validating RAML files into Python objects.
--
The article discusses how Spotify scales its real-time data processing pipelines using Apache Storm, focusing on architecture, maintainability, and performance optimization.
--
The article discusses how Spotify addressed performance issues in their ad analysis pipeline by implementing a sharded join strategy in MapReduce.
Noel Cody
5 min read
Includes Code
Has Summary
--
The article discusses the Date-Tiered Compaction Strategy (DTCS) developed for Apache Cassandra, particularly for optimizing time series data storage and retrieval.
--
The article discusses how Spotify processes vast amounts of user-generated data using Apache Crunch on Hadoop.
davidawhiting
7 min read
Includes Code
Has Summary
--
The article discusses Spotify's evolving backend infrastructure, emphasizing the importance of autonomous squads, a transparent code model, and self-service infrastructure to support rapid growth a...
Spotify Engineering
9 min read
Has Summary
--