How Netflix Uses AWS
184 engineering articles about AWS from Netflix's engineering team
Articles
Netflix introduces Spin, a new feature in Metaflow 2.
Netflix Technology Blog
10 min read
Includes Code
Has Summary
--
This article discusses the development of a reliable cloud live streaming pipeline for Netflix, focusing on the challenges faced and solutions implemented to ensure high-quality live streaming expe...
The article discusses Netflix's journey in implementing live streaming technology over the past three years, detailing the architectural decisions, challenges, and innovations that have led to succ...
The article discusses how Netflix enhances content delivery efficiency by classifying cache misses within its Open Connect content delivery network (CDN).
This article discusses how Netflix accurately attributes eBPF flow logs to workload identities, addressing challenges related to misattribution in cloud environments.
Netflix Technology Blog
12 min read
Includes Code
Has Summary
--
The article discusses Netflix's Media Production Suite (MPS), a cloud-based solution designed to streamline media management in film and television production.
The article discusses Netflix's approach to enhancing cloud efficiency through the use of Amazon Web Services (AWS) and a structured data framework.
This article is the first in a multi-part series that explores the Analytics Engineering work at Netflix, highlighting how the company empowers its teams to produce and deliver actionable analytic ...
The article discusses Netflix's implementation of service-level prioritized load shedding to enhance system reliability and user experience during high traffic conditions.
This article investigates a cross-regional network performance issue at Netflix, detailing the troubleshooting process that led to identifying a Linux kernel upgrade as the root cause.
Netflix Technology Blog
13 min read
Includes Code
Has Summary
--
The article discusses how Netflix supports a diverse range of machine learning (ML) systems through its Machine Learning Platform (MLP) and the Metaflow framework.
Apache, Apache Arrow, Apache Spark, AWS, Docker, DynamoDB, JSON, Kubernetes, Machine Learning, Pandas, Polars, REST API, Streamlit
Netflix Technology Blog
15 min read
Includes Code
Has Summary
--
This article discusses Netflix's implementation of a zero-configuration service mesh with on-demand cluster discovery, detailing the motivations behind adopting service mesh technology and the coll...
The article discusses the challenges of connection churn in Zuul, a gateway service used by Netflix, and outlines the strategies implemented to mitigate this issue.
In 2022, Netflix successfully migrated its mobile applications from a monolithic Falcor API to a Federated GraphQL API with zero downtime.
The article discusses the successful launch of the 'Basic with ads' tier on Netflix, detailing the innovative methods used to simulate projected traffic and test ad algorithms prior to launch.
Netflix Technology Blog
6 min read
Has Summary
--
This article discusses the debugging of a deadlock issue in a custom FUSE filesystem used at Netflix, detailing the symptoms, analysis, and resolution of the problem.
The article discusses how VFX studios can transition to cloud-based rendering to enhance their production capabilities.
This article discusses the challenges faced by Netflix when migrating a Java microservice to a larger AWS instance, which unexpectedly resulted in suboptimal performance.
Netflix Technology Blog
11 min read
Has Summary
--
The article discusses the implementation of a consistent caching mechanism in the Titus Gateway, which is part of Netflix's cloud container runtime.
Netflix Technology Blog
15 min read
Has Summary
--
The article discusses Netflix Maestro, a next-generation workflow orchestrator designed to manage data and machine learning workflows at scale.
Netflix Technology Blog
15 min read
Includes Code
Has Summary
--
The article discusses the evolution of the Axion ML fact store at Netflix, focusing on its design, components, and the lessons learned during its development.
Netflix Technology Blog
14 min read
Has Summary
--
The article discusses the Rapid Event Notification System (RENO) developed by Netflix to ensure real-time communication between backend systems and devices, enhancing the user experience for over 2...
Netflix Technology Blog
10 min read
Has Summary
--
The article discusses Pensive, Netflix's auto-diagnosis and remediation system, which addresses failures in its complex data platform.
Netflix Technology Blog
7 min read
Has Summary
--
The article discusses Netflix's approach to cloud security at scale, particularly through its Detection, Enrichment, and Response platform called Snare.
The article discusses Netflix's advancements in cloud packaging technology to efficiently handle terabyte-sized media files.
The article discusses the Elasticsearch indexing strategy implemented in Netflix's Asset Management Platform (AMP), focusing on how to efficiently manage and query large volumes of digital media as...
The article discusses how Netflix employs eBPF flow logs through a network observability sidecar called Flow Exporter to gain network insights at scale.
Netflix Technology Blog
5 min read
Has Summary
--
The article discusses ConsoleMe, an open-source tool developed by Netflix for managing AWS permissions and access across multiple accounts.
The article discusses the development of Netflix Workstations, which are remote workstations designed to provide artists with the necessary tools and resources to create visual effects and animatio...
The article discusses Netflix's Growth Engineering team's approach to automated imagery generation for enhancing the user experience on the Netflix homepage.
This article discusses the optimization of data warehouse storage at Netflix, focusing on the AutoOptimize system designed to enhance performance and reduce costs.
This article discusses Netflix's implementation of GraphQL Federation, detailing the core infrastructure, developer experience, schema governance, observability, security, and resilience strategies...
The article discusses Bulldozer, a self-serve data platform developed by Netflix for efficiently moving batch data from data warehouse tables to online key-value stores.
The article discusses the development of Netflix's distributed tracing infrastructure, specifically focusing on the design and implementation of Edgar, a troubleshooting tool for streaming sessions.
Netflix Technology Blog
11 min read
Has Summary
--
The article discusses Telltale, a monitoring system developed by Netflix to simplify application monitoring and improve the health assessment of services.
The article discusses the integration of AWS Step Functions with Metaflow, a data science framework open-sourced by Netflix.
Netflix Technology Blog
13 min read
Has Summary
--
The article discusses Netflix's approach to optimizing its data infrastructure costs through transparency and a custom dashboard.
Netflix Technology Blog
8 min read
Includes Code
Has Summary
--
This article discusses how Netflix enriches VPC Flow Logs at hyper scale to enhance network insight within its cloud infrastructure.
Netflix has announced the open-source release of Dispatch, a crisis management orchestration framework designed to streamline incident management by integrating with existing tools like Slack and J...
DBLog is a generic Change-Data-Capture (CDC) framework developed to capture committed changes from databases in real-time and propagate them to downstream consumers.
Netflix Technology Blog
17 min read
Has Summary
--
The article discusses the open-sourcing of Metaflow, a human-centric framework for data science developed by Netflix.
Netflix Technology Blog
9 min read
Has Summary
--
The article discusses Netflix's participation in AWS re:Invent 2019, highlighting their speaking events and sessions focused on technology advancements and operational strategies.
Netflix Technology Blog
7 min read
Has Summary
--
The article announces the open sourcing of Mantis, a platform developed by Netflix for building cost-effective, real-time, operations-focused applications.
The article discusses the infrastructure for Contextual Bandits and Reinforcement Learning, highlighting insights from a meetup hosted at Netflix.
Netflix Technology Blog
11 min read
Includes Code
Has Summary
--
The article discusses how Netflix utilizes a microservices architecture to manage dataset propagation through an in-house pub/sub system called Gutenberg.
Delta is a data synchronization and enrichment platform developed by Netflix to address the challenges of keeping multiple datastores in sync while allowing for data enrichment.
The article discusses Netflix's evolution of its regional evacuation strategy to enhance service availability and resilience.
Netflix Technology Blog
7 min read
Has Summary
--
The article discusses the challenges faced by Netflix in customizing Windows images and how they improved their methodology using DevOps patterns.
The article discusses the evolution of Netflix Conductor, a workflow orchestration engine that has gained significant adoption within Netflix for managing core workflows.
Netflix Technology Blog
11 min read
Has Summary
--
The article discusses Netflix's Hack Day in May 2019, highlighting innovative projects developed by employees to explore new ideas and technologies.