Scala Programming Tutorials & Engineering Articles
119 Scala tutorials, guides, and engineering insights from LinkedIn, Uber, Netflix, and more
Scala Articles & Tutorials
Uber's migration from Spark 2.4 to Spark 3.3 involved upgrading over 2 million Spark applications, utilizing innovative automation tools like Iron Dome.
Amruth Sampath, Arnav Balyan, Nimesh Khandelwal, Sumit Singh, Parth Halani, Suprit Acharya
8 min read
The article discusses Airbnb's migration of its JVM monorepo from Gradle to Bazel, detailing the motivations, process, and outcomes of this significant transition.
The article discusses the optimization of LinkedIn Sales Navigator’s search pipeline using Apache Spark, highlighting the transition from MapReduce to Spark and the resulting performance improvemen...
The article discusses Uber's upgrade of its search platform from Lucene version 7.5.0 to 9.4.
Anand Kotriwal, Aparajita Pandey, Charu Jain, Yupeng Fu
12 min read
The article discusses how Uber utilizes Apache Pinot for low-latency offline table analytics, highlighting its capabilities in handling various use cases, including real-time and offline data inges...
Ankit Sultana, Caner Balci
15 min read
The article discusses the Sparkle framework developed by Uber to standardize modular ETL processes, enhancing developer productivity and data quality.
Dinesh Jagannathan, Sharath Bhat, Suman Voleti, Praveen Raj
8 min read
This article details Slack's migration from AWS EMR 5 with Spark 2 to EMR 6 with Spark 3, highlighting the challenges faced and the performance improvements achieved.
The article discusses how Notion built and scaled its data lake to manage a tenfold increase in data over three years, driven by user and content growth.
This article continues the exploration of Spotify's data platform, detailing its building blocks, scalability, and the community-driven approach to managing a complex data ecosystem.
Anastasia Khlebnikova (Senior Engineer) and Carol Cunha (Product Manager)
6 min read
This article details Uber's migration of over a trillion entries of ledger data from DynamoDB to LedgerStore, focusing on the challenges, strategies, and outcomes of the process.
Chronon, Airbnb's ML Feature Platform, is now open source, providing tools for observability and management that simplify the complexity of data engineering for machine learning practitioners.
The article discusses Pinterest's journey in implementing AI-assisted development, focusing on the balance between innovation and safety.
Pinterest Engineering
7 min read
The article discusses Airbnb's migration of its iOS build system from Buck to Bazel, detailing the approach taken to ensure a smooth transition with minimal disruption to developer workflows.
The article discusses Psyberg, a tool developed by Netflix to automate the end-to-end catchup of data pipelines, particularly focusing on how it manages late-arriving data and enhances workflow eff...
Netflix Technology Blog
7 min read
Spotify has introduced Voyager, a new nearest-neighbor search library that significantly improves upon its predecessor, Annoy, by offering increased speed and accuracy.
Peter Sobot
4 min read
Includes Code
The article explores Javier's transition from a music career to data science, highlighting the intersection of math and music in his journey.
LinkedIn Engineering Team
5 min read
This article discusses Pinterest's implementation of a finer-grained access control (FGAC) framework to manage data access securely and efficiently within their data engineering platform.
LinkedIn has integrated Google Protocol Buffers (Protobuf) with Rest.li to enhance microservices performance, achieving significant reductions in latency and improvements in resource utilization.
Karthik Ramgopal
7 min read
The article discusses how Uber implemented an incremental ETL process using Apache Hudi to manage its transactional data lake.
The article discusses Vasundhara's journey as an AI engineer at LinkedIn, highlighting her transition from Seoul to Dublin during the pandemic and her work on machine learning algorithms for Linked...
LinkedIn Engineering Team
6 min read
The article discusses Netflix's efforts to scale its media machine learning infrastructure, focusing on the challenges faced by media ML practitioners and the solutions developed to optimize and st...
Netflix Technology Blog
12 min read
Includes Code
This article presents three additional tips for optimizing Apache Flink applications, focusing on enhancing performance through proper parallelism, avoiding sink bottlenecks, and utilizing HybridSo...
Kevin Lam
8 min read
Includes Code
The article discusses the implementation of sample data pipelines using Dataflow at Netflix, focusing on bootstrapping, standardization, and automation of batch data pipelines.
Netflix Technology Blog
17 min read
Includes Code
The article discusses Airbnb's Safe Deploy system, focusing on its architecture and engineering choices for implementing near real-time experiments.
The article discusses Uber's journey in rebuilding its A/B testing platform, Morpheus, to address scalability and reliability challenges.
The article shares the career journey of Deepti, a biomedical engineer turned data scientist at LinkedIn, highlighting her transitions between industries and roles.
The article discusses the challenges and solutions Stripe engineers face in maintaining a continuous integration (CI) system that balances speed and security.
Sushain Cherivirala
11 min read
Includes Code
The article discusses SQL Notebooks, a tool developed at Meta that combines the functionalities of SQL IDEs and Jupyter Notebooks to enhance data analytics.
This article provides seven actionable tips for optimizing Apache Flink applications, focusing on performance and resiliency.
Yaroslav Tkachenko
16 min read
Includes Code
The article discusses Project RADAR, an intelligent fraud detection system developed by Uber that integrates machine learning and human expertise to identify and mitigate fraudulent activities in r...
Sergey Zelvenskiy, Garvit Harisinghani, Tiffany Yu, Edwin Ng, Robin Wei
14 min read
The article discusses DARWIN, LinkedIn's unified Data Science and Artificial Intelligence Workbench, designed to streamline the workflows of data scientists and AI engineers by centralizing various...
Varun S.
20 min read
This article discusses Spotify's migration of its Event Delivery Infrastructure (EDI) to Google Cloud Platform (GCP), detailing the challenges faced, solutions implemented, and the resulting improv...
Flavio Santos (Data Infrastructure Engineer) and Robert Stephenson (Senior Product Manager)
14 min read
The article discusses the management of data pipeline assets at Netflix using a tool called Dataflow.
The article discusses how Airbnb developed the Wall framework to enhance data quality and prevent data bugs across its data engineering workflows.
This article discusses Uber's journey in containerizing their Apache Hadoop infrastructure, detailing the challenges faced and the solutions implemented over two years.
This article discusses the evolution of LinkedIn's Daily Executive Dashboard (DED) from a simple dashboard to a robust enterprise-grade data pipeline.
The article discusses how Pinterest improved data processing efficiency by implementing partial deserialization of Thrift encoded data.
The article discusses Himeji, a scalable centralized system for authorization developed at Airbnb, which addresses challenges faced during the transition from a monolithic Ruby on Rails architectur...
Alan Yao
10 min read
Includes Code
This article features an interview with Dhevi Rajendran, a Data Engineer at Netflix, discussing her journey into data engineering, her role in the Growth Data Science and Engineering team, and her ...
Netflix Technology Blog
7 min read
The article discusses how to accelerate deep learning applications using Apache Spark and NVIDIA GPUs on AWS. It highlights the integration of GPU scheduling in Apache Spark 3.
Qing Lan
6 min read
Includes Code
The article discusses silent data corruption, a prevalent issue in large-scale infrastructure systems that can lead to undetected data errors and significant application-level problems.
Harish Dattatraya Dixit
5 min read
Includes Code
This article discusses how Spotify optimized its largest Dataflow job for Wrapped 2020 by implementing Sort Merge Bucket (SMB) joins, significantly reducing costs and improving performance.
Neville Li
11 min read
The article discusses the Smart Argument Suite, a Python library designed to streamline the process of passing command-line arguments for AI workflows.
LinkedIn Engineering Team
8 min read
The article discusses Airbnb's commitment to improving data quality through a comprehensive initiative that addresses ownership, architecture, and governance of data.
The article introduces the 2020 Safety Engineering interns at Uber, highlighting their experiences during a unique summer internship affected by COVID-19.
Safety Engineering Interns
10 min read
The article discusses how Spark 3.0 and XGBoost can be accelerated using GPUs to enhance machine learning workflows, focusing on end-to-end training and hyperparameter tuning.
Carol McDonald
15 min read
Includes Code
The article discusses Spotify's 'Listening Together' campaign, which visualizes real-time musical connections among users worldwide.
Gandalf Hernandez
4 min read
The article discusses the enhancements in Apache Spark 3.0, particularly focusing on GPU acceleration and performance optimizations.
Apache, Apache Spark, AWS, Azure, Google Cloud, Keras, Kubernetes, Machine Learning, Python, PyTorch, Rapids, Scala, SQL, TensorFlow, XGBoost
Carol McDonald
10 min read
The article discusses the LinkedIn Fairness Toolkit (LiFT), an open-source library designed to address bias in AI applications at scale.
Sriram Vasudevan
11 min read
The article discusses how Pinterest empowered its data scientists and machine learning engineers by building a PySpark infrastructure that addresses challenges faced with existing tools like Hive a...
Pinterest Engineering
7 min read