# Caching Programming Tutorials & Engineering Articles
157 Caching tutorials, guides, and engineering insights from Netflix, Pinterest, Shopify, and more
Caching Articles & Tutorials
The article discusses how PeerDB facilitates large-scale PostgreSQL migrations, specifically achieving a 1TB migration in just 2 hours.
15 min read
Includes Code
Has Summary
--
The article discusses the collaboration between NVIDIA and Black Forest Labs to optimize the FLUX.2 text-to-image model for NVIDIA Blackwell Data Center GPUs.
OpenAI details how they scaled PostgreSQL to support 800 million ChatGPT users, achieving millions of queries per second through a single-primary architecture with nearly 50 read replicas across mu...
Bohan Zhang
13 min read
Has Summary
--
Slack's build pipeline team reduced build times for Quip and Slack Canvas from 60 minutes to as little as 10 minutes by applying classic software engineering principles—separation of concerns, cach...
David Reed
19 min read
Includes Code
Has Summary
--
Netflix redesigned Maestro's internal workflow engine, replacing the legacy Conductor 2.x-based stateless worker model with a custom stateful actor model built on Java 21 virtual threads.
This article discusses how Netflix built a resilient data platform using a Write-Ahead Log (WAL) to address data consistency, reliability, and operational efficiency challenges at scale.
The article discusses Pinterest's journey in enhancing developer experience through the creation of PinConsole, an Internal Developer Platform built on Backstage.
Pinterest Engineering
15 min read
Has Summary
--
The article discusses the evolution of Netflix's Tudum architecture, transitioning from a CQRS model utilizing Kafka to a more efficient system based on RAW Hollow.
The article discusses the transition of ClickHouse Cloud to a fully stateless compute architecture, enabled by the introduction of a Shared Catalog.
This article discusses the development of a distributed cache for ClickHouse Cloud, aimed at providing low-latency access to hot data across compute nodes.
Tom Schreiber
23 min read
Includes Code
Has Summary
--
The article discusses the introduction of implicit caching support in Gemini 2.5 models, enabling developers to benefit from significant cost savings without needing to create an explicit cache.
The article discusses the importance of structuring application prompts to enhance the security of key-value (KV) caching in large language model (LLM) applications.
Joseph Lucas
11 min read
Includes Code
Has Summary
--
The article discusses Mobile Bridge, a framework developed by Shopify to enhance WebViews in their mobile app, making them feel more native.
Mauricio de Meirelles
8 min read
Has Summary
--
The article discusses the requirements and best practices for deploying AI in production within the insurance underwriting sector.
Palantir
21 min read
Has Summary
--
The article introduces new KV cache reuse optimizations in NVIDIA TensorRT-LLM, focusing on improving memory management and throughput for large language models (LLMs).
John Thomson
7 min read
Includes Code
Has Summary
--
The article discusses the implementation of data-efficient knowledge distillation using NVIDIA NeMo-Aligner during supervised fine-tuning (SFT).
Anna Shors
5 min read
Has Summary
--
The article discusses how NVIDIA TensorRT-LLM enhances the inference throughput of Meta's Llama 3.3 70B model by up to 3x through optimizations like speculative decoding and KV caching.
Anjali Shah
8 min read
Includes Code
Has Summary
--
Netflix's TimeSeries Data Abstraction Layer is designed to efficiently store and query vast amounts of temporal event data with low latency.
Netflix Technology Blog
22 min read
Includes Code
Has Summary
--
The article discusses the Structured DataStore (SDS), a unified multi-model data management platform developed by Pinterest.
The article discusses Preon, a microservice developed by Uber for intelligent and efficient query analysis using the Presto SQL engine.
Gurmeet Singh
13 min read
Has Summary
--
The article discusses Pinterest's implementation of feature caching in their recommender systems using Cachelib, an in-process caching engine developed by Meta Open Source.
The article introduces Contextual Retrieval, a method that enhances Retrieval-Augmented Generation (RAG) by improving the retrieval step through Contextual Embeddings and Contextual BM25.
The article discusses how Meta has optimized the deployment of its AI-generated image animation feature to serve billions of users efficiently.
Gaurav Sharma
11 min read
Has Summary
--
This article discusses Uber's migration of its Apache Hadoop-based data lake to Google Cloud Storage (GCS) and the security measures implemented during this transition.
Matt Mathew, Alexander Gulko, Lei Sun, KK Sriramadhesikan, Alan Cao, Omkar Kakade
20 min read
Includes Code
Has Summary
--
The article discusses Pinterest's adoption of TiDB as a replacement for HBase, detailing the motivations, selection methodology, and the journey of integrating TiDB into their infrastructure.
The article announces the General Availability of AI Gateway, a unified interface for managing and scaling generative AI workloads.
Kathy Liao
6 min read
Includes Code
Has Summary
--
The article discusses Pinterest's transition from HBase, its first NoSQL datastore, to a new serving architecture with a unified storage service.
The article recaps the Google I/O 2024 event, highlighting advancements in AI technologies aimed at making AI accessible for developers.
Caching, Dart, Firebase, Gemini, Generative AI, Google Cloud, JAX, Keras, Kotlin, Ollama, PostgreSQL, PyTorch, TensorFlow, WebAssembly
Jeanine Banks
8 min read
Has Summary
--
Notion has significantly improved the launch speed of its Android app, making it more than twice as fast as it was at the beginning of 2023.
The article delves into Uber's comprehensive accounting data testing strategies, emphasizing the importance of precision and integrity in financial processes.
Onkar Singh, Harsha Aditya Ravuri, Viswanath Ramakkagari, Aditya Gopisetti, Hari Srinivasan
16 min read
Has Summary
--
The article discusses how Uber serves over 40 million reads per second from its online storage using an integrated caching solution called CacheFront.
Cloudflare celebrated its 13th birthday with a series of announcements aimed at enhancing its services for customers and the broader internet community.
Dina Kozlov
9 min read
Has Summary
--
This article discusses how to build a distributed inference cache using NVIDIA Triton and Redis, highlighting the benefits and drawbacks of local versus distributed caching.
Steve Lorello
12 min read
Includes Code
Has Summary
--
The article discusses the concept of 'prompt design' and draws parallels between prompting in AI and web design.
The article discusses Pacer, Pinterest's new asynchronous computing platform designed to address the limitations of its predecessor, Pinlater.
This article discusses Uber's implementation of a local caching solution for HDFS DataNodes to optimize performance while adopting high-density HDDs.
The article announces Cohort #2 of the Workers Launchpad, highlighting the success of the first cohort and introducing 25 new startups selected for the program.
Mia Wang
8 min read
Has Summary
--
The article discusses how Cloudflare is transitioning its architecture to utilize Cloudflare Workers, aiming to enhance the performance, robustness, and developer experience of its products.
Richard Boulton
23 min read
Includes Code
Has Summary
--
The article discusses the modernization of the build system for Cloudflare Pages, introducing a new beta version that supports updated tools and languages, including Node.js, Python, and Ruby.
Greg Brimble
8 min read
Includes Code
Has Summary
--
This article discusses the optimization of LZ4 decompression in ClickHouse, highlighting the challenges and solutions to improve performance.
Alexey Milovidov
37 min read
Includes Code
Has Summary
--
The article discusses how LinkedIn reduced the upload of Apache Spark application dependencies by 99% through the implementation of a user-level caching mechanism.
LinkedIn Engineering Team
10 min read
Has Summary
--
The article introduces the ClickHouse Query Cache, a new feature designed to enhance performance by caching the results of expensive SELECT queries.
ClickHouse Release 23.1 introduces significant enhancements including 17 new features, 17 performance optimizations, and 78 bug fixes.
This article discusses the transition from Ruby's Marshal serialization to MessagePack for caching in Rails applications.
Chris Salzberg
19 min read
Includes Code
Has Summary
--
The article discusses the critical role of caching in Rails applications and the inherent risks associated with using Ruby's Marshal for serialization.
Chris Salzberg
12 min read
Includes Code
Has Summary
--
The article discusses Uber's implementation of Alluxio local caching to enhance the performance of Presto, a data analytics engine.
This article discusses the integration of Cloudflare Workers with micro-frontends, presenting a fragments architecture that enhances web application performance and scalability.
Peter Bacon Darwin
14 min read
Includes Code
Has Summary
--
The article discusses uBuild, Uber's platform for building container images efficiently and securely.
The article summarizes key talks from RailsConf 2022, highlighting insights from Shopify engineers on various topics related to Ruby on Rails, performance improvements, and open-source contribution...
Kevin Ritchie
5 min read
Has Summary
--