# YAML Programming Tutorials & Engineering Articles
192 YAML tutorials, guides, and engineering insights from NVIDIA, Uber, Shopify, and more
YAML Articles & Tutorials
Uber’s Rate Limiting System details the evolution of Uber's approach to managing service overload through a unified rate-limiting architecture.
Chien-Chih Liao, Rahul Gutal, Smit Sheth, Ying Jiang
14 min read
Includes Code
Has Summary
--
The article discusses the introduction of time-based fairshare in NVIDIA Run:ai v2.
Ekin Karabulut
11 min read
Has Summary
--
The article discusses how to utilize NVIDIA Earth-2 to downscale coarse climate projections into high-resolution, bias-corrected fields, enabling better assessment of local climate extremes.
Georg Ertl
11 min read
Includes Code
Has Summary
--
Shopify uses SkyPilot, an open-source framework, to manage GPU-intensive ML training workloads across multiple cloud providers (Nebius and GCP).
Javier Moreno
7 min read
Includes Code
Has Summary
--
Airbnb's Pay as a Local initiative launched 20+ locally relevant payment methods (LPMs) across multiple global markets in 14 months.
The article discusses the importance of evaluations (evals) for AI agents, emphasizing how they help teams identify and resolve issues before they reach production.
The article discusses how to build and orchestrate end-to-end synthetic data generation (SDG) workflows using NVIDIA Isaac Sim and NVIDIA OSMO.
Asawaree Bhide
11 min read
Includes Code
Has Summary
--
The article discusses advancements in real-time decoding and AI inference enhancements in NVIDIA CUDA-Q QEC, focusing on how these improvements facilitate quantum error correction in quantum comput...
Tom Lubowe
6 min read
Includes Code
Has Summary
--
The article discusses how to simulate an accurate radio environment for 5G and 6G systems using the NVIDIA Aerial Omniverse Digital Twin (AODT).
The article discusses the optimization of semiconductor defect classification using generative AI and vision foundation models (VFMs).
Tim Lin
11 min read
Includes Code
Has Summary
--
The article discusses the Skip Softmax technique, a method for accelerating long-context inference in large language models (LLMs) using NVIDIA TensorRT-LLM.
Shopify open-sources Tangle, an ML experimentation platform built to solve six common failure modes in machine learning development.
Shopify Engineering
12 min read
Has Summary
--
The article discusses the use of AI Model Distillation to create efficient financial data workflows, focusing on the optimization of large language models (LLMs) for applications in quantitative fi...
Dhruv Desai
10 min read
Includes Code
Has Summary
--
The article discusses the challenges of identifying the root cause of configuration management failures using Salt at Cloudflare, particularly when dealing with a high volume of changes across nume...
Opeyemi Onikute
17 min read
Includes Code
Has Summary
--
This article discusses how Pinterest successfully reduced Android testing build times by over 36% through the implementation of a runtime-aware sharding mechanism.
The article discusses how NVIDIA's CorrDiff model leverages generative AI for downscaling weather predictions, significantly improving efficiency and reducing computational costs.
Alicia Sui
11 min read
Includes Code
Has Summary
--
The article discusses NVIDIA Grove, a Kubernetes API designed to streamline complex AI inference workloads by managing multicomponent systems.
Sanjay Chatterjee
9 min read
Includes Code
Has Summary
--
The article discusses Spotify's evolution in developer productivity through the use of background coding agents within their Fleet Management system.
Max Charas (Senior Staff Engineer) and Marc Bruggmann (Principal Engineer)
7 min read
Has Summary
--
This article discusses how Uber has integrated explainability into its machine learning platform, Michelangelo, using Integrated Gradients (IG) to provide interpretable attributions for deep learni...
Hugh Chen, Eric Wang, Gaoyuan Huang, Howard Yu, Jia Li, Sally Lee
14 min read
Has Summary
--
The article discusses the development of Agent Skills, which are organized folders of instructions, scripts, and resources that enhance the capabilities of general-purpose agents like Claude.
The article discusses the integration of the NVIDIA KAI Scheduler with Ray, enabling advanced scheduling features like gang scheduling, workload prioritization, and autoscaling in Ray clusters.
Ekin Karabulut
9 min read
Includes Code
Has Summary
--
Netflix redesigned Maestro's internal workflow engine, replacing the legacy Conductor 2.x-based stateless worker model with a custom stateful actor model built on Java 21 virtual threads.
Train a Quadruped Locomotion Policy and Simulate Cloth Manipulation with NVIDIA Isaac Lab and Newton
This article discusses the integration of the Newton physics engine with NVIDIA Isaac Lab for training quadruped locomotion policies and simulating cloth manipulation.
The article discusses the integration of NVIDIA Run:ai v2.23 with NVIDIA Dynamo to address the challenges of large language model (LLM) inference across distributed environments.
Ekin Karabulut
9 min read
Includes Code
Has Summary
--
This article discusses building a real-time visual inspection pipeline using NVIDIA TAO 6 and NVIDIA DeepStream 8, addressing challenges in defect detection and quality control.
The article discusses the open-source release of Starlark Worker, a tool that integrates Cadence workflow orchestration with the Starlark scripting language.
The article discusses the integration of AI-powered simulations in computer-aided engineering (CAE) to accelerate design processes.
The article discusses the enhancements in reinforcement learning training throughput using NVIDIA NeMo-RL with Megatron-Core support.
Anna Shors
7 min read
Includes Code
Has Summary
--
The article discusses how Continuous Integration and Continuous Delivery/Deployment (CI/CD) practices can be applied to network automation, particularly with Cumulus Linux and the NVIDIA Air digita...
The article discusses the development of Jetflow, a framework designed by Cloudflare's Business Intelligence team to manage complex data ingestion tasks efficiently.
Harry Hough
11 min read
Has Summary
--
The article discusses how NVIDIA Air facilitates network automation using tools like Ansible and Git, emphasizing the importance of coding, versioning, and automating network configurations.
The article discusses how Shopify's Admin was optimized to be 30% faster and prepared for AI integration by transforming its architecture.
Craig Brunner
7 min read
Includes Code
Has Summary
--
The article introduces Roast, a structured AI workflow orchestration framework developed by Shopify to enhance developer productivity by integrating AI agents with traditional coding practices.
Obie Fernandez
10 min read
Includes Code
Has Summary
--
The article discusses the NVIDIA AI Blueprint for building efficient AI agents through model distillation, focusing on the challenges of scaling intelligent applications and managing inference cost...
Daniel Glogowski
10 min read
Includes Code
Has Summary
--
The article discusses Uber's implementation of a configuration-driven archival and retrieval framework designed to manage vast amounts of regulatory data efficiently.
NVIDIA Dynamo's v0.
Amr Elmeleegy
7 min read
Has Summary
--
Google Cloud has announced the general availability of the Apigee APIM Operator, which enhances API management capabilities within Google Kubernetes Engine (GKE).
Sanjay Pujare
2 min read
Includes Code
Has Summary
--
The article discusses the NVIDIA NeMo Agent toolkit, an open-source library designed for building and optimizing AI agent workflows.
Wenqi Glantz
11 min read
Includes Code
Has Summary
--
The article discusses how Slack's DevXP team optimized their end-to-end (E2E) testing pipeline, significantly reducing build times and eliminating unnecessary frontend builds.
The article introduces GPT-4.1, a new series of models in the API that significantly enhance coding, instruction following, and long context comprehension.
The article highlights the innovative approaches taken by startups Lamatic AI and Skyward AI in building AI agent platforms using Cloudflare's infrastructure.
Christopher Rotas
12 min read
Includes Code
Has Summary
--
The article discusses the introduction of Managed I/O in Google Cloud Dataflow, which simplifies the management of Apache Beam I/O connectors.
Chamikara Jayalath
8 min read
Includes Code
Has Summary
--
The article provides insights into the Certified Backstage Associate (CBA) exam, a new certification offered by The Linux Foundation that validates skills in building and managing Backstage, an ope...
André Wanlin
10 min read
Has Summary
--
SafetyCulture documents their migration from Helm-based deployment pipelines to GitOps with ArgoCD for hundreds of microservices across multiple Kubernetes clusters.
The article discusses IssueOps, a methodology that automates CI/CD workflows using GitHub Issues and Actions.
Nick Alteen
21 min read
Includes Code
Has Summary
--
The article discusses the advancements in AI-driven biological research with the introduction of Evo 2, a foundation model that integrates genomic, RNA, and protein data across multiple life domain...
Kyle Tretina
9 min read
Includes Code
Has Summary
--
The article discusses the development of Turbo, a YAML-based configuration system designed to enhance machine learning model deployment at Ramp.
Ryan Stevens, Ryne Carbone
9 min read
Has Summary
--
The article discusses the continued pretraining of the Colosseum 355B large language model (LLM) by Domyn, leveraging NVIDIA DGX Cloud infrastructure.
Martin Cimmino
16 min read
Includes Code
Has Summary
--
The article discusses Cloudflare's approach to upgrading their developer documentation by treating it as an open source product.
Kim Jeske
8 min read
Includes Code
Has Summary
--