# Google Cloud Storage Programming Tutorials & Engineering Articles
54 Google Cloud Storage tutorials, guides, and engineering insights from Shopify, NVIDIA, Uber, and more
Google Cloud Storage Articles & Tutorials
This article details how ClickPy, a free Python download statistics platform powered by ClickHouse, scaled to over 2 trillion rows by replacing its legacy cron-based ingestion pipeline with ClickPi...
8 min read
Includes Code
Has Summary
--
This article details how Uber built and scaled Apache Hudi to power one of the world's largest data lakes, managing 19,500 datasets with trillions of records across a multi-hundred-petabyte reposit...
Prashant Wason, Balajee Nagasubramaniam, Surya Prasanna Kumar Yalla, Meenal Binwade, Xinli Shang, Jack Song
19 min read
Has Summary
--
The article discusses Uber's implementation of I/O observability for its massive petabyte-scale data lake, focusing on the challenges and solutions in monitoring data access patterns across its hyb...
Arnav Balyan, Kartik Bommepally, Amruth Sampath, Jing Zhao, Akshayaprakash Sharma
10 min read
Has Summary
--
The article discusses the limitations of traditional request-response models in AI agent development and proposes a real-time bidirectional streaming architecture as a solution.
Hangfei Lin
7 min read
Includes Code
Has Summary
--
The article discusses the potential of lakehouses using open table formats like Apache Iceberg and Delta Lake for observability, highlighting their advantages in scalability, cost-effectiveness, an...
Melvyn Peignon & Dale McDiarmid
24 min read
Includes Code
Has Summary
--
The article discusses building high-performance data pipelines using Grain, a data loading library for JAX, and ArrayRecord, an efficient file format.
Jiyang Kang, Shivaji Dutta, Ihor Indyk, Felix Chern
10 min read
Includes Code
Has Summary
--
The article discusses the evolution of Uber's Search Platform, highlighting its transition from Elasticsearch to an in-house solution called Sia, and ultimately to the adoption of OpenSearch.
Yupeng Fu, Shubham Gupta, Shanshan Song, Mingmin Chen
15 min read
Has Summary
--
The article discusses the latest enhancements in RAPIDS, including zero-code-change acceleration for Python machine learning, significant IO performance improvements, and out-of-core XGBoost capabi...
Apache, Azure, Azure Blob Storage, Dask, Gemini, Google Cloud, Google Cloud Storage, LightGBM, NetworkX, Polars, Python, scikit-learn, XGBoost
Nick Becker
9 min read
Includes Code
Has Summary
--
The article discusses the introduction of Managed I/O in Google Cloud Dataflow, which simplifies the management of Apache Beam I/O connectors.
Chamikara Jayalath
8 min read
Includes Code
Has Summary
--
The article discusses how to integrate Stripe Data Pipeline (SDP) with Google Cloud Storage (GCS) and BigQuery to enhance analytics capabilities for businesses.
Sushant Jain
8 min read
Includes Code
Has Summary
--
The article discusses optimizing high-performance remote I/O operations using NVIDIA KvikIO for data analysis workloads on cloud object storage services.
Tom Augspurger
8 min read
Includes Code
Has Summary
--
This article discusses the implementation of a Medallion architecture for Bluesky data using ClickHouse, focusing on the challenges of handling high-volume JSON event streams.
PME Team
30 min read
Includes Code
Has Summary
--
Cloudflare has announced significant upgrades to its AI platform, including Workers AI, AI Gateway, and Vectorize, aimed at enhancing performance, flexibility, and cost-effectiveness for developers.
Michelle Chen
14 min read
Has Summary
--
The article discusses Uber's migration of its batch data platform to the cloud, focusing on the implementation of DataMesh principles.
Arun Mahadeva Iyer, Abhi Khune, Sahana Bhat
11 min read
Has Summary
--
The article discusses the implementation of Gemma 2, a lightweight large language model (LLM) by Google, for processing streaming data with Dataflow.
Reza Rokni, Ravin Kumar
16 min read
Includes Code
Has Summary
--
This article discusses Uber's migration of its Apache Hadoop-based data lake to Google Cloud Storage (GCS) and the security measures implemented during this transition.
Matt Mathew, Alexander Gulko, Lei Sun, KK Sriramadhesikan, Alan Cao, Omkar Kakade
20 min read
Includes Code
Has Summary
--
Uber is modernizing its batch data infrastructure by migrating to Google Cloud Platform (GCP) to enhance data analytics and machine learning capabilities.
Abhi Khune, Arun Mahadeva Iyer, Sahana Bhat, Matt Mathew
7 min read
Has Summary
--
Developer Week 2024 concluded with significant product announcements aimed at enhancing the developer experience on Cloudflare's platform.
Cloudflare Workers, Google Cloud, Google Cloud Storage, JavaScript, Next.js, Prisma, Rate Limiting, SQL, SWR, WebAssembly, WebRTC
Phillip Jones
8 min read
Has Summary
--
Cloudflare R2 has introduced three new features: Event Notifications, Super Slurper for Google Cloud Storage migrations, and an Infrequent Access storage tier.
Matt DeBoard
5 min read
Includes Code
Has Summary
--
DataCentral is Uber's proprietary platform designed for Big Data observability, chargeback, and governance.
Arnav Balyan, Atul Mantri, Krishna Karri, Amruth Sampath
10 min read
Has Summary
--
The article discusses how Uber utilizes Apache Pinot for real-time analytics of mobile app crashes, enhancing their ability to detect and resolve issues quickly.
Kriti Dangi, Anil Purohit, Parijat Bansal, Rohit Yadav
17 min read
Has Summary
--
The article discusses the evolution of Data Lifecycle Management (DLM) at Uber, detailing the journey from initial implementations to the development of a unified system.
Sumanth Srinivasa Krishnaswamy, Matt Mathew, Sonali Goyal
13 min read
Has Summary
--
This article discusses fleet-wide refactoring at Spotify, detailing the tools and strategies developed to manage code changes across thousands of Git repositories.
Matt Brown
25 min read
Includes Code
Has Summary
--
The article discusses how retailers can enhance their data analytics capabilities using GPU-accelerated Apache Spark workloads on Google Cloud Dataproc.
Saurav Agarwal
12 min read
Includes Code
Has Summary
--
The article discusses Tophat, a tool developed by Shopify to enhance the mobile developer experience by streamlining the testing process for mobile applications.
Lukas Romsicki
14 min read
Includes Code
Has Summary
--
This article explores the integration of ClickHouse and BigQuery for real-time analytics, highlighting their complementary strengths.
Dale McDiarmid
28 min read
Includes Code
Has Summary
--
This article provides a comprehensive guide on deploying machine learning models on Google Cloud Platform (GCP).
AutoML, AWS, Azure, Flask, Google Cloud, Google Cloud Functions, Google Cloud Storage, HTML, Iris, Machine Learning, Pandas, Python, scikit-learn, Serverless, Vertex AI
Kurtis Pykes
10 min read
Includes Code
Has Summary
--
The article discusses the complexities of tax compliance for U.S. merchants and details the development of Shopify's Tax Insights feature.
Siraj Ali
12 min read
Has Summary
--
This article discusses the integration of NVIDIA TensorRT with Apache Beam SDK to streamline and enhance machine learning predictions at scale.
Apache, Deep Learning, Docker, Google Cloud, Google Cloud Storage, Google Compute Engine, Machine Learning, Python, PyTorch, TensorFlow, torchvision
Alexander Zhurkevich
11 min read
Includes Code
Has Summary
--
The article discusses the implementation of Server Sent Events (SSE) to enhance real-time data streaming for Shopify's BFCM Live Map.
Bao Nguyen
10 min read
Has Summary
--
The article discusses how a team at Shopify discovered a query in BigQuery that could potentially cost them nearly $1 million per month and outlines the steps they took to reduce this cost signific...
Calvin Zhou
6 min read
Includes Code
Has Summary
--
The article discusses the experiences and lessons learned from running Apache Airflow at scale within Shopify.
Megan Parker
14 min read
Includes Code
Has Summary
--
The article announces the Cloudflare Images Sourcing Kit, which enables users to define multiple sources for bulk image imports into Cloudflare Images.
Paulo Costa
5 min read
Includes Code
Has Summary
--
The article discusses how to optimize access to Parquet data with the fsspec library, particularly through the new fsspec.parquet module.
Rick Zamora
11 min read
Includes Code
Has Summary
--
This article provides seven actionable tips for optimizing Apache Flink applications, focusing on performance and resiliency.
Yaroslav Tkachenko
16 min read
Includes Code
Has Summary
--
The article introduces XCRemoteCache, an open-source remote caching tool developed by Spotify that reduces clean build times for iOS applications by 70%.
Bartosz Polaczyk
9 min read
Includes Code
Has Summary
--
The article discusses the development and operationalization of recommender systems using NVIDIA Merlin and MLOps practices, emphasizing the importance of continuous improvement for maintaining com...
Shashank Verma
11 min read
Has Summary
--
Pinterest has open sourced Querybook, a collaborative big data hub designed to improve data access and analysis for teams, especially in a remote working environment.
Pinterest Engineering
7 min read
Has Summary
--
The article discusses how to enhance data processing for analytics and AI using Alluxio and NVIDIA GPUs.
Apache, Apache Spark, Azure, Caching, Google Cloud, Google Cloud Storage, Google Compute Engine, Kubernetes, PyTorch, SQL, TensorFlow
Dong Meng
9 min read
Includes Code
Has Summary
--
The article discusses Shopify's evolution from a traditional data warehouse to a more dynamic data platform using Change Data Capture (CDC) and event streaming technologies.
John Martin
25 min read
Includes Code
Has Summary
--
The article discusses the development of a Question and Answering (QA) service utilizing Natural Language Processing (NLP) with NVIDIA NGC and Google Cloud.
BERT, Docker, Google Cloud, Google Cloud Storage, gRPC, Natural Language Processing, Python, PyTorch, Shell, TensorFlow, Transformers, YAML
James Sohn
10 min read
Includes Code
Has Summary
--
This article provides a comprehensive guide on deploying a Natural Language Processing service, specifically a BERT Question-Answering model, on a Kubernetes cluster using Helm charts from NVIDIA N...
BERT, Docker, Google Cloud, Google Cloud Storage, gRPC, Helm, Istio, Kubernetes, Natural Language Processing, Shell, TensorFlow, YAML
James Sohn
11 min read
Includes Code
Has Summary
--
The article discusses strategies for identifying and fixing slow code in Ruby applications, emphasizing the importance of profiling and benchmarking.
Jay Lim
12 min read
Includes Code
Has Summary
--
Apache Pinot 0.3.0 is an open-source, distributed OLAP data store developed at LinkedIn, designed for near-real-time analytics.
Mayank S.
9 min read
Has Summary
--
Spotify has streamlined its content delivery network (CDN) services using Fastly's edge cloud platform to enhance the streaming experience for over 230 million users.
Spotify Engineering
7 min read
Has Summary
--
Flan Scan is Cloudflare's open-source, lightweight network vulnerability scanner, designed to simplify deployment and enhance security.
Nadin El-Yabroudi
7 min read
Has Summary
--
This article discusses how Shopify manages MySQL backups and restores at petabyte scale using Google Cloud Platform's Persistent Disk snapshots.
Akshay Suryawanshi
7 min read
Has Summary
--
This article discusses the implementation of interactive Leaflet maps for the game Factorio, enabling players to share their factory designs.
Guest Author
5 min read
Includes Code
Has Summary
--
The article discusses the integration of DroneDeploy with Cloudflare Workers, highlighting how this collaboration enhances the functionality and performance of DroneDeploy's cloud platform for dron...
Jonathan Bruce
8 min read
Includes Code
Has Summary
--
Cloudflare has launched Cloudflare Workers, enabling users to run JavaScript at the edge of Cloudflare's network using the Service Workers API.
Kenton Varda
6 min read
Includes Code
Has Summary
--