# Apache Programming Tutorials & Engineering Articles

961 Apache tutorials, guides, and engineering insights from LinkedIn, Uber, NVIDIA, and more

Apache Articles & Tutorials

Notion
Intermediate
The article discusses Notion's journey in scaling its vector search infrastructure, achieving a 10x increase in scale while reducing costs by 90% over two years.
Preeti Gondi, Mickey Liu, Nathan Louie, Calder Lund, Jacob Sager
10 min read
Has Summary
--
Cloudflare
Intermediate
The article discusses the ecdysis library developed by Cloudflare, which enables graceful restarts for Rust services without dropping live connections.
Manuel Olguín Muñoz
10 min read
Includes Code
Has Summary
--
Pinterest
Intermediate
The article discusses Pinterest's transition to a next-generation database ingestion framework designed to address the limitations of legacy systems.
Pinterest Engineering
10 min read
Includes Code
Has Summary
--
Uber
Intermediate
This article introduces uForwarder, Uber's open-source, push-based consumer proxy for Apache Kafka, which Uber uses for asynchronous queuing.
Zhifeng Chen, Yang Yang, Haifeng Chen
12 min read
Has Summary
--
Uber
Intermediate
This article details how Uber optimized Apache Hadoop's DistCp (Distributed Copy) tool to scale its data replication infrastructure from 250 TB to petabytes of daily data movement.
Abhay Yadav, Radhika Patwari, Sanjay Sundaresan
15 min read
Has Summary
--
LinkedIn
Intermediate
The article discusses the challenges and solutions related to HDFS block placement in the context of maintaining exabyte-scale clusters at LinkedIn.
Ponmani Palanisamy
12 min read
Has Summary
--
Uber
Advanced
This article details how Uber built and scaled Apache Hudi to power one of the world's largest data lakes, managing 19,500 datasets with trillions of records across a multi-hundred-petabyte reposit...
Prashant Wason, Balajee Nagasubramaniam, Surya Prasanna Kumar Yalla, Meenal Binwade, Xinli Shang, Jack Song
19 min read
Has Summary
--
Pinterest
Advanced
PinLanding is a multimodal AI pipeline developed by Pinterest to generate shopping collections from billions of products.
Pinterest Engineering
8 min read
Has Summary
--
Uber
Advanced
The article discusses Uber's transition from traditional keyword-based search using Apache Lucene to semantic vector search with Amazon OpenSearch.
Hao Sun, Jiasen Xu, Smit Patel, Anand Kotriwal, Xu Zhang
11 min read
Has Summary
--
NVIDIA
Advanced
The article discusses Project Aether, a tool developed by NVIDIA to facilitate the migration of CPU-based Apache Spark workloads to GPU-accelerated environments on Amazon EMR.
Navin Kumar
6 min read
Includes Code
Has Summary
--
Uber
Advanced
This article discusses how Uber utilizes a pull-based ingestion model in OpenSearch™ to effectively index streaming data.
Yupeng Fu, Varun Bharadwaj, Shuyi Zhang, Xu Xiong, Michael Froh
14 min read
Has Summary
--
NVIDIA
Advanced
NVIDIA's Sirius, an open-source GPU-native SQL engine, has set a new performance record on ClickBench, enhancing DuckDB with GPU-accelerated analytics.
Xiangyao Yu
6 min read
Has Summary
--
NVIDIA
Intermediate
The article discusses the development of scientific AI agents using reinforcement learning (RL) techniques, specifically through the NVIDIA NeMo framework.
Christian Munley
12 min read
Includes Code
Has Summary
--
Google
Beginner
The article introduces A2UI, an open-source project designed for agent-driven interfaces that allows agents to generate contextually relevant user interfaces.
Google A2UI Team
13 min read
Includes Code
Has Summary
--
Uber
Advanced
This article discusses Uber's transition from batch to streaming data ingestion using Apache Flink, which significantly enhances data freshness and operational efficiency.
Xinli Shang, Peter Huang, Jing Li, Jing Zhao, Jack Song
6 min read
Has Summary
--
Uber
Intermediate
This article discusses Uber's implementation of Apache Pinot to manage and analyze its extensive inventory and catalog data efficiently.
Suraj Modi, Ankit Sultana, Tarun Mavani
11 min read
Has Summary
--
Shopify
Intermediate
Shopify's 2025 Black Friday Cyber Monday (BFCM) live globe was reimagined as an interactive pinball machine running at 120 fps in a browser. Built with Three.js...
Shopify Engineering
18 min read
Has Summary
--
ClickHouse
Intermediate
This article benchmarks five major cloud data warehouses—Snowflake, Databricks, ClickHouse Cloud, BigQuery, and Redshift—across various scales of data to compare their cost-performance.
Tom Schreiber & Lionel Palacin
16 min read
Includes Code
Has Summary
--
ClickHouse
Advanced
This article discusses the transition from OpenTelemetry (OTel) to Rotel, an open-source Rust project that enhances tracing capabilities at petabyte scale.
28 min read
Includes Code
Has Summary
--
Uber
Advanced
The article discusses the evolution and scaling of Uber's Delivery Search Platform, emphasizing the transition from traditional lexical search to a semantic search model that enhances user experien...
Divya Nagar, Zheng Liu, Jiasen Xu, Bo Ling, Haoyang Chen
11 min read
Has Summary
--
LinkedIn
Advanced
The article discusses the evolution of the Venice ingestion pipeline at LinkedIn, highlighting its architectural advancements and optimizations that enable the platform to handle over 230 million r...
Gaojie Liu
14 min read
Has Summary
--
ClickHouse
Beginner
The article discusses the development of StockHouse, a real-time market analytics application that leverages ClickHouse, Massive, and Perspective to handle high-frequency financial data.
Lionel Palacin
7 min read
Includes Code
Has Summary
--
Uber
Advanced
The article discusses Uber's implementation of I/O observability for its massive petabyte-scale data lake, focusing on the challenges and solutions in monitoring data access patterns across its hyb...
Arnav Balyan, Kartik Bommepally, Amruth Sampath, Jing Zhao, Akshayaprakash Sharma
10 min read
Has Summary
--
Uber
Advanced
This article discusses the implementation of zone failure resilience in Apache Pinot at Uber, detailing strategies to ensure uninterrupted service during zone failures.
Si Lao, Christina Li, Xuanyi Li, Yang Yang, Ujwala Tulshigiri
10 min read
Has Summary
--
Uber
Intermediate
The article discusses Uber's approach to enhancing the safety of machine learning (ML) model deployments through a series of mechanisms integrated into their ML life cycle.
Sophie Wang, Jia Li, Joseph Wang
10 min read
Has Summary
--
OpenAI
Advanced
The article introduces gpt-oss-safeguard, OpenAI's new open-weight reasoning models designed for safety classification tasks.
OpenAI
9 min read
Has Summary
--
OpenAI
Advanced
The technical report provides an evaluation of the gpt-oss-safeguard-120b and gpt-oss-safeguard-20b models, focusing on their performance and safety metrics.
OpenAI
2 min read
Has Summary
--
Netflix
Advanced
Netflix engineered a real-time recommendation delivery system for live events that can update over 100 million devices in under a minute.
Netflix Technology Blog
9 min read
Has Summary
--
Netflix
Advanced
Netflix built a Real-Time Distributed Graph (RDG) to connect member interaction data across their expanding business verticals including streaming, live events, and mobile games.
Netflix Technology Blog
8 min read
Has Summary
--
ClickHouse
Beginner
The article discusses the potential of lakehouses using open table formats like Apache Iceberg and Delta Lake for observability, highlighting their advantages in scalability, cost-effectiveness, an...
Melvyn Peignon & Dale McDiarmid
24 min read
Includes Code
Has Summary
--
Uber
Advanced
This article discusses the rebuilding of Uber's Apache Pinot™ query architecture, focusing on the transition from Neutrino to a new query system that utilizes Pinot's Multi-Stage Engine Lite Mode.
Ankit Sultana, Christina Li, Shaurya Chaturvedi, Tarun Mavani, Shreyaa Sharma
11 min read
Has Summary
--
LinkedIn
Intermediate
The article discusses the modernization of the LDAP and Kerberos infrastructure that secures Hadoop at LinkedIn, detailing the transition from a legacy setup to a highly available, automated system...
Aswin M Prabhu
15 min read
Has Summary
--
ClickHouse
Intermediate
The article discusses the new capabilities of ClickHouse Cloud to query Iceberg and Delta Lake tables through the DataLakeCatalog engine.
Tom Schreiber
15 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses the collaboration between IBM and NVIDIA to enhance large-scale data analytics through GPU-native Velox and NVIDIA cuDF, highlighting significant performance improvements over...
Gregory Kimball
7 min read
Has Summary
--
Google
Advanced
The article discusses building high-performance data pipelines using Grain, a data loading library for JAX, and ArrayRecord, an efficient file format.
Jiyang Kang, Shivaji Dutta, Ihor Indyk, Felix Chern
10 min read
Includes Code
Has Summary
--
Uber
Advanced
This article details how Uber standardized its mobile analytics system to improve data consistency and quality across its applications.
Ben Hjerrild, Rajat Sharma, Shawn Dong, Wugang Zhao
12 min read
Has Summary
--
ClickHouse
Intermediate
ClickHouse Release 25.9 introduces significant enhancements, including 25 new features, 22 performance optimizations, and 83 bug fixes.
The ClickHouse Team
11 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
This article discusses the integration of the Newton physics engine with NVIDIA Isaac Lab for training quadruped locomotion policies and simulating cloth manipulation.
Mohammad Mohajerani
13 min read
Includes Code
Has Summary
--
Cloudflare
Advanced
The article reflects on Cloudflare's 15-year journey and the initiatives launched during Birthday Week 2025, emphasizing their commitment to building a better Internet.
--
Meta
Intermediate
The article discusses the evolution of Meta's infrastructure over 21 years, highlighting the significant changes brought about by AI.
Yee Jiun Song
20 min read
Has Summary
--
Cloudflare
Intermediate
The article announces the Cloudflare Data Platform, which includes three key products: Cloudflare Pipelines for data ingestion, R2 Data Catalog for managing metadata, and R2 SQL for querying data.
Micah Wylde
11 min read
Includes Code
Has Summary
--
Uber
Advanced
Uber's migration from Spark 2.4 to Spark 3.3 involved upgrading over 2 million Spark applications, using automation tools such as Iron Dome.
Amruth Sampath, Arnav Balyan, Nimesh Khandelwal, Sumit Singh, Parth Halani, Suprit Acharya
8 min read
Has Summary
--
Netflix
Advanced
The article discusses how Netflix scales its Muse application to provide data-driven creative insights at a massive scale, focusing on the architectural evolution and optimizations made to handle t...
Netflix Technology Blog
10 min read
Includes Code
Has Summary
--
Uber
Intermediate
The article discusses the implementation of a Policy Simulator at Uber to enhance the safety and determinism of Identity and Access Management (IAM) policy changes.
Avinash Srivenkatesh, Zi Wen, Zakir Akram
15 min read
Has Summary
--
ClickHouse
Intermediate
The article discusses the rising costs associated with observability in software engineering and proposes a shift towards open, cost-efficient architectures.
--
Stripe
Intermediate
The article discusses the development of a real-time streaming analytics system for Stripe Billing, enabling customers to access subscription metrics with minimal latency.
Reed Trevelyan
8 min read
Has Summary
--
Uber
Advanced
This article discusses the architecture and implementation of Uber's HiveSync, a critical service for data replication across its massive data lake.
Radhika Patwari, Trivedhi Talakola, Rajan Jaiswal, Chayanika Bhandary, Mukesh Verma, Sanjay Sundaresan
14 min read
Has Summary
--
ClickHouse
Intermediate
ClickHouse version 25.8 introduces 45 new features, 47 performance optimizations, and 119 bug fixes, enhancing its capabilities as a high-performance analytical database.
ClickHouse Team
15 min read
Includes Code
Has Summary
--
ClickHouse
Advanced
This article discusses how to instrument a Next.js application using OpenTelemetry and ClickStack, focusing on the integration of observability and analytics through ClickHouse.
--
Google
Intermediate
This article discusses the integration of Google's EmbeddingGemma model with Google Cloud's Dataflow to create a scalable embedding pipeline for AI applications.
Danny McCormick, Ian Ballantyne, Olivier Lacombe
5 min read
Includes Code
Has Summary
--