How LinkedIn Uses Avro
43 engineering articles about Avro from LinkedIn's engineering team
Articles
The article discusses the evolution of the Venice ingestion pipeline at LinkedIn, highlighting its architectural advancements and optimizations that enable the platform to handle over 230 million r...
The article discusses the optimization of LinkedIn Sales Navigator's search pipeline using Apache Spark, highlighting the transition from MapReduce to Spark and the resulting performance improvemen...
This article discusses the challenges and solutions related to Java heap memory and garbage collection, specifically in the context of LinkedIn's FollowFeed service.
LakeChime is a data trigger service designed to enhance the efficiency of data processing in modern data lakes.
Walaa Eldin Moustafa
16 min read
Includes Code
Has Summary
--
The article discusses the open-sourcing of AvroTensorDataset, a TensorFlow dataset designed for efficiently processing Avro data.
Jonathan Hung
16 min read
Includes Code
Has Summary
--
The article discusses LinkedIn's strategy to upscale its profile datastore while reducing operational costs.
The article discusses the concept of Super Tables at LinkedIn, which are designed to address the challenges of data discoverability, reliability, and change management in a rapidly growing data eco...
The article discusses the evolution of the Venice platform to support large fanout use cases at scale, particularly focusing on optimizing performance and scalability for handling high-throughput r...
The article discusses the importance of data governance in large organizations like LinkedIn, emphasizing the need for effective schema annotations and automation in managing vast datasets.
The article discusses the implementation of a load-balanced Brooklin Mirror Maker at LinkedIn, which efficiently replicates large-scale Kafka clusters.
Vaibhav Maheshwari
14 min read
Has Summary
--
The article discusses Opal, a system developed at LinkedIn to manage mutable datasets within a data lake.
The article discusses LinkedIn's transition from a proprietary analytics tech stack to an open-source big data technology stack, detailing the challenges faced and the improvements made during the ...
This article discusses the evolution of LinkedIn's Daily Executive Dashboard (DED) from a simple dashboard to a robust enterprise-grade data pipeline.
The article introduces FastIngest, a new evolution of Apache Gobblin designed to enable low-latency data ingestion from Kafka to HDFS using the ORC file format and Apache Iceberg for metadata manag...
The article discusses Coral, an open-sourced SQL translation, analysis, and rewrite engine developed at LinkedIn for modern data lakehouses.
The article discusses the introduction of Pegasus Data Language (PDL), a new schema definition language designed to replace the existing Pegasus Data Schema (PDSC) for data modeling in Rest.li.
Dagli is an open-source machine learning library designed for Java and other JVM languages, aimed at simplifying the creation of model pipelines while minimizing technical debt.
Jeff Pasternack
14 min read
Has Summary
--
The article discusses the LinkedIn Fairness Toolkit (LiFT), an open-source library designed to address bias in AI applications at scale.
Sriram Vasudevan
11 min read
Has Summary
--
The article discusses Spark-TFRecord, a new data source for Apache Spark that aims to provide full support for the TFRecord data format used in TensorFlow.
Jun Shi
5 min read
Has Summary
--
Apache Pinot 0.3.0 is an open-source, distributed OLAP data store developed at LinkedIn, designed for near-real-time analytics.
Mayank S.
9 min read
Has Summary
--
The article discusses the implementation of typed AI features in LinkedIn's feed, emphasizing the importance of standardization for rapid experimentation and continuous improvement.
Ian Ackerman
10 min read
Has Summary
--
The article discusses advanced schema management techniques for Apache Spark applications at LinkedIn, focusing on the integration of Avro schemas with the Hive Metastore to enhance type safety and...
This article provides an in-depth look at LinkedIn's data pipeline monitoring system, focusing on the challenges faced with traditional monitoring methods and how they have evolved to improve visib...
The article discusses how LinkedIn customizes Apache Kafka to handle an impressive scale of 7 trillion messages per day.
Jon Lee
10 min read
Has Summary
--
DataHub is a generalized metadata search and discovery tool developed by LinkedIn to enhance the productivity of data teams.
Mars Lan
17 min read
Has Summary
--
Avro2TF is an open-source feature transformation engine designed to facilitate the conversion of data into a format compatible with TensorFlow.
Xuhong Zhang
5 min read
Has Summary
--
The article discusses how Apache Calcite can bridge the gap between offline and nearline computations in big data processing.
Khai Tran
12 min read
Has Summary
--
The article discusses Brooklin, a data ingestion service developed by LinkedIn to facilitate streaming data from various sources to multiple destinations.
The article discusses JARVIS, a search system developed by LinkedIn to enhance the navigation of its source code.
The article discusses the migration of LinkedIn's internal service, Babylonia, from Oracle to Espresso, a distributed NoSQL database.
The article discusses the engineering infrastructure at LinkedIn that supports test tracking across various platforms, including iOS, Android, and web.
Ning Zhang
9 min read
Has Summary
--
The article discusses the open sourcing of Kafka Monitor, a framework designed to monitor and test Kafka deployments.
Dong Lin
10 min read
Has Summary
--
The article discusses the Kafka ecosystem at LinkedIn, detailing its critical role as a messaging system and the various solutions developed to enhance its functionality.
Joel Koshy
8 min read
Has Summary
--
The article discusses FollowFeed, LinkedIn's new feed infrastructure designed to enhance performance and relevance for its users.
The article 'Running Kafka At Scale' discusses how LinkedIn utilizes Apache Kafka as a crucial messaging system for handling vast amounts of data.
Todd Palino
10 min read
Includes Code
Has Summary
--
Espresso is LinkedIn's distributed, fault-tolerant NoSQL database that supports various applications, including Member Profile and InMail.
Apache Helix is a framework designed for developing distributed systems, addressing challenges such as scalability, fault tolerance, and partition management.
Kishore Gopalakrishna
10 min read
Has Summary
--
The article discusses how LinkedIn utilizes Apache Samza to gain real-time insights into its performance by processing data from numerous services and machines.
LinkedIn Engineering Team
11 min read
Includes Code
Has Summary
--
The article announces the release of Voldemort 1.6.0, a distributed key-value storage system developed at LinkedIn.
The article discusses the significance of the log as a fundamental abstraction in real-time data systems, emphasizing its role in distributed systems, data integration, and stream processing.
Jay Kreps
63 min read
Has Summary
--
The article discusses DataFu's Hourglass framework, which simplifies incremental data processing in Hadoop.
The article announces the release of Voldemort 1.3.0, detailing significant performance improvements, new features, and enhanced operability.
The article discusses Autometrics, a self-service metrics collection system developed at LinkedIn to streamline the process of metrics collection and visualization.