
Google Cloud Storage Programming Tutorials & Engineering Articles

54 Google Cloud Storage tutorials, guides, and engineering insights from Shopify, NVIDIA, Uber, and more

Google Cloud Storage Articles & Tutorials

ClickHouse
Advanced
This article details how ClickPy, a free Python download statistics platform powered by ClickHouse, scaled to over 2 trillion rows by replacing its legacy cron-based ingestion pipeline with ClickPi...
8 min read
Includes Code
Has Summary
--
Uber
Advanced
This article details how Uber built and scaled Apache Hudi to power one of the world's largest data lakes, managing 19,500 datasets with trillions of records across a multi-hundred-petabyte reposit...
Prashant Wason, Balajee Nagasubramaniam, Surya Prasanna Kumar Yalla, Meenal Binwade, Xinli Shang, Jack Song
19 min read
Has Summary
--
Uber
Advanced
The article discusses Uber's implementation of I/O observability for its massive petabyte-scale data lake, focusing on the challenges and solutions in monitoring data access patterns across its hyb...
Arnav Balyan, Kartik Bommepally, Amruth Sampath, Jing Zhao, Akshayaprakash Sharma
10 min read
Has Summary
--
Google
Advanced
The article discusses the limitations of traditional request-response models in AI agent development and proposes a real-time bidirectional streaming architecture as a solution.
Hangfei Lin
7 min read
Includes Code
Has Summary
--
ClickHouse
Beginner
The article discusses the potential of lakehouses using open table formats like Apache Iceberg and Delta Lake for observability, highlighting their advantages in scalability, cost-effectiveness, an...
Melvyn Peignon & Dale McDiarmid
24 min read
Includes Code
Has Summary
--
Google
Advanced
The article discusses building high-performance data pipelines using Grain, a data loading library for JAX, and ArrayRecord, an efficient file format.
Jiyang Kang, Shivaji Dutta, Ihor Indyk, Felix Chern
10 min read
Includes Code
Has Summary
--
Uber
Advanced
The article discusses the evolution of Uber's Search Platform, highlighting its transition from Elasticsearch to an in-house solution called Sia, and ultimately to the adoption of OpenSearch.
Yupeng Fu, Shubham Gupta, Shanshan Song, Mingmin Chen
15 min read
Has Summary
--
NVIDIA
Intermediate
The article discusses the latest enhancements in RAPIDS, including zero-code-change acceleration for Python machine learning, significant IO performance improvements, and out-of-core XGBoost capabi...
--
Google
Intermediate
The article discusses the introduction of Managed I/O in Google Cloud Dataflow, which simplifies the management of Apache Beam I/O connectors.
Chamikara Jayalath
8 min read
Includes Code
Has Summary
--
Stripe
Advanced
The article discusses how to integrate Stripe Data Pipeline (SDP) with Google Cloud Storage (GCS) and BigQuery to enhance analytics capabilities for businesses.
Sushant Jain
8 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses optimizing high-performance remote I/O operations using NVIDIA KvikIO for data analysis workloads on cloud object storage services.
Tom Augspurger
8 min read
Includes Code
Has Summary
--
ClickHouse
Intermediate
This article discusses the implementation of a Medallion architecture for Bluesky data using ClickHouse, focusing on the challenges of handling high-volume JSON event streams.
PME Team
30 min read
Includes Code
Has Summary
--
Cloudflare
Intermediate
Cloudflare has announced significant upgrades to its AI platform, including Workers AI, AI Gateway, and Vectorize, aimed at enhancing performance, flexibility, and cost-effectiveness for developers.
Michelle Chen
14 min read
Has Summary
--
Uber
Intermediate
The article discusses Uber's migration of its batch data platform to the cloud, focusing on the implementation of DataMesh principles.
Arun Mahadeva Iyer, Abhi Khune, Sahana Bhat
11 min read
Has Summary
--
Google
Advanced
The article discusses the implementation of Gemma 2, a lightweight large language model (LLM) by Google, for processing streaming data with Dataflow.
Reza Rokni, Ravin Kumar
16 min read
Includes Code
Has Summary
--
Uber
Advanced
This article discusses Uber's migration of its Apache Hadoop-based data lake to Google Cloud Storage (GCS) and the security measures implemented during this transition.
Matt Mathew, Alexander Gulko, Lei Sun, KK Sriramadhesikan, Alan Cao, Omkar Kakade
20 min read
Includes Code
Has Summary
--
Uber
Advanced
Uber is modernizing its batch data infrastructure by migrating to Google Cloud Platform (GCP) to enhance data analytics and machine learning capabilities.
Abhi Khune, Arun Mahadeva Iyer, Sahana Bhat, Matt Mathew
7 min read
Has Summary
--
Cloudflare
Intermediate
Developer Week 2024 concluded with significant product announcements aimed at enhancing the developer experience on Cloudflare's platform.
--
Cloudflare
Intermediate
Cloudflare R2 has introduced three new features: Event Notifications, Super Slurper for Google Cloud Storage migrations, and an Infrequent Access storage tier.
Matt DeBoard
5 min read
Includes Code
Has Summary
--
Uber
Intermediate
DataCentral is Uber's proprietary platform designed for Big Data observability, chargeback, and governance.
Arnav Balyan, Atul Mantri, Krishna Karri, Amruth Sampath
10 min read
Has Summary
--
Uber
Intermediate
The article discusses how Uber utilizes Apache Pinot for real-time analytics of mobile app crashes, enhancing their ability to detect and resolve issues quickly.
Kriti Dangi, Anil Purohit, Parijat Bansal, Rohit Yadav
17 min read
Has Summary
--
Uber
Intermediate
The article discusses the evolution of Data Lifecycle Management (DLM) at Uber, detailing the journey from initial implementations to the development of a unified system.
Sumanth Srinivasa Krishnaswamy, Matt Mathew, Sonali Goyal
13 min read
Has Summary
--
Spotify
Advanced
This article discusses fleet-wide refactoring at Spotify, detailing the tools and strategies developed to manage code changes across thousands of Git repositories.
Matt Brown
25 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses how retailers can enhance their data analytics capabilities using GPU-accelerated Apache Spark workloads on Google Cloud Dataproc.
Saurav Agarwal
12 min read
Includes Code
Has Summary
--
Shopify
Advanced
The article discusses Tophat, a tool developed by Shopify to enhance the mobile developer experience by streamlining the testing process for mobile applications.
Lukas Romsicki
14 min read
Includes Code
Has Summary
--
ClickHouse
Intermediate
This article explores the integration of ClickHouse and BigQuery for real-time analytics, highlighting their complementary strengths.
Dale McDiarmid
28 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
This article provides a comprehensive guide on deploying machine learning models on Google Cloud Platform (GCP).
--
Shopify
Intermediate
The article discusses the complexities of tax compliance for U.S. merchants and details the development of Shopify's Tax Insights feature.
Siraj Ali
12 min read
Has Summary
--
NVIDIA
Intermediate
This article discusses the integration of NVIDIA TensorRT with Apache Beam SDK to streamline and enhance machine learning predictions at scale.
--
Shopify
Intermediate
The article discusses the implementation of Server Sent Events (SSE) to enhance real-time data streaming for Shopify's BFCM Live Map.
--
Shopify
Intermediate
The article discusses how a team at Shopify discovered a query in BigQuery that could potentially cost them nearly $1 million per month and outlines the steps they took to reduce this cost signific...
Calvin Zhou
6 min read
Includes Code
Has Summary
--
Shopify
Intermediate
The article discusses the experiences and lessons learned from running Apache Airflow at scale within Shopify.
--
Cloudflare
Beginner
The article announces the Cloudflare Images Sourcing Kit, which enables users to define multiple sources for bulk image imports into Cloudflare Images.
Paulo Costa
5 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses the optimization of accessing Parquet data using the fsspec library, particularly through the new fsspec.parquet module.
Rick Zamora
11 min read
Includes Code
Has Summary
--
Shopify
Advanced
This article provides seven actionable tips for optimizing Apache Flink applications, focusing on performance and resiliency.
Yaroslav Tkachenko
16 min read
Includes Code
Has Summary
--
Spotify
Advanced
The article introduces XCRemoteCache, an open-source remote caching tool developed by Spotify that cuts clean build times for iOS applications by 70%.
Bartosz Polaczyk
9 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses the development and operationalization of recommender systems using NVIDIA Merlin and MLOps practices, emphasizing the importance of continuous improvement for maintaining com...
--
Pinterest
Beginner
Pinterest has open sourced Querybook, a collaborative big data hub designed to improve data access and analysis for teams, especially in a remote working environment.
Pinterest Engineering
7 min read
Has Summary
--
NVIDIA
Advanced
The article discusses how to enhance data processing for analytics and AI using Alluxio and NVIDIA GPUs.
--
Shopify
Intermediate
The article discusses Shopify's evolution from a traditional data warehouse to a more dynamic data platform using Change Data Capture (CDC) and event streaming technologies.
--
NVIDIA
Advanced
The article discusses the development of a Question and Answering (QA) service utilizing Natural Language Processing (NLP) with NVIDIA NGC and Google Cloud.
--
NVIDIA
Advanced
This article provides a comprehensive guide on deploying a Natural Language Processing service, specifically a BERT Question-Answering model, on a Kubernetes cluster using Helm charts from NVIDIA N...
--
Shopify
Advanced
The article discusses strategies for identifying and fixing slow code in Ruby applications, emphasizing the importance of profiling and benchmarking.
Jay Lim
12 min read
Includes Code
Has Summary
--
LinkedIn
Advanced
Apache Pinot 0.3.0 is an open-source, distributed OLAP data store developed at LinkedIn, designed for near-real-time analytics.
--
Spotify
Intermediate
Spotify has streamlined its content delivery network (CDN) services using Fastly's edge cloud platform to enhance the streaming experience for over 230 million users.
Spotify Engineering
7 min read
Has Summary
--
Cloudflare
Intermediate
Flan Scan is Cloudflare's open-source lightweight network vulnerability scanner designed to simplify deployment and enhance security.
--
Shopify
Intermediate
This article discusses how Shopify manages MySQL backups and restores at a petabyte scale using Google Cloud Platform's Persistent Disk snapshots.
Akshay Suryawanshi
7 min read
Has Summary
--
Cloudflare
Advanced
This article discusses the implementation of interactive Leaflet maps for the game Factorio, enabling players to share their factory designs.
Guest Author
5 min read
Includes Code
Has Summary
--
Cloudflare
Intermediate
The article discusses the integration of DroneDeploy with Cloudflare Workers, highlighting how this collaboration enhances the functionality and performance of DroneDeploy's cloud platform for dron...
Jonathan Bruce
8 min read
Includes Code
Has Summary
--
Cloudflare
Intermediate
Cloudflare has launched Cloudflare Workers, enabling users to run JavaScript at the edge of Cloudflare's network using the Service Workers API.