
How LinkedIn Uses Avro

43 engineering articles about Avro from LinkedIn's engineering team

Articles

LinkedIn
Advanced
The article discusses the evolution of the Venice ingestion pipeline at LinkedIn, highlighting its architectural advancements and optimizations that enable the platform to handle over 230 million r...
Gaojie Liu
14 min read
Has Summary
--
LinkedIn
Advanced
The article discusses the optimization of LinkedIn Sales Navigator's search pipeline using Apache Spark, highlighting the transition from MapReduce to Spark and the resulting performance improvemen...
Chunxu Tang
14 min read
Includes Code
Has Summary
--
LinkedIn
Intermediate
This article discusses the challenges and solutions related to Java heap memory and garbage collection, specifically in the context of LinkedIn's FollowFeed service.
Nisheedh Raveendran
10 min read
Has Summary
--
LinkedIn
Intermediate
LakeChime is a data trigger service designed to enhance the efficiency of data processing in modern data lakes.
Walaa Eldin Moustafa
16 min read
Includes Code
Has Summary
--
LinkedIn
Advanced
The article discusses the open-sourcing of AvroTensorDataset, a TensorFlow dataset designed for efficiently processing Avro data.
Jonathan Hung
16 min read
Includes Code
Has Summary
--
LinkedIn
Advanced
The article discusses LinkedIn's strategy to upscale its profile datastore while reducing operational costs.
LinkedIn Engineering Team
18 min read
Has Summary
--
LinkedIn
Intermediate
The article discusses the concept of Super Tables at LinkedIn, which are designed to address the challenges of data discoverability, reliability, and change management in a rapidly growing data eco...
LinkedIn Engineering Team
15 min read
Has Summary
--
LinkedIn
Advanced
The article discusses the evolution of the Venice platform to support large fanout use cases at scale, particularly focusing on optimizing performance and scalability for handling high-throughput r...
Gaojie Liu
18 min read
Has Summary
--
LinkedIn
Beginner
The article discusses the importance of data governance in large organizations like LinkedIn, emphasizing the need for effective schema annotations and automation in managing vast datasets.
Joshua Shinavier
8 min read
Has Summary
--
LinkedIn
Intermediate
The article discusses the implementation of a load-balanced Brooklin Mirror Maker at LinkedIn, which efficiently replicates large-scale Kafka clusters.
Vaibhav Maheshwari
14 min read
Has Summary
--
LinkedIn
Advanced
The article discusses Opal, a system developed at LinkedIn to manage mutable datasets within a data lake.
Bhupendra Jain
16 min read
Has Summary
--
LinkedIn
Intermediate
The article discusses LinkedIn's transition from a proprietary analytics tech stack to an open-source big data technology stack, detailing the challenges faced and the improvements made during the ...
LinkedIn Engineering Team
10 min read
Has Summary
--
LinkedIn
Intermediate
This article discusses the evolution of LinkedIn's Daily Executive Dashboard (DED) from a simple dashboard to a robust enterprise-grade data pipeline.
Jennifer Zheng
16 min read
Has Summary
--
LinkedIn
Intermediate
The article introduces FastIngest, a new evolution of Apache Gobblin designed to enable low-latency data ingestion from Kafka to HDFS using the ORC file format and Apache Iceberg for metadata manag...
Zihan Li
15 min read
Has Summary
--
LinkedIn
Advanced
The article discusses Coral, an open-sourced SQL translation, analysis, and rewrite engine developed at LinkedIn for modern data lakehouses.
Walaa Eldin Moustafa
20 min read
Has Summary
--
LinkedIn
Advanced
The article discusses the introduction of Pegasus Data Language (PDL), a new schema definition language designed to replace the existing Pegasus Data Schema (PDSC) for data modeling in Rest.li.
Yingjie (Nicki) B.
6 min read
Has Summary
--
LinkedIn
Advanced
Dagli is an open-source machine learning library designed for Java and other JVM languages, aimed at simplifying the creation of model pipelines while minimizing technical debt.
--
LinkedIn
Advanced
The article discusses the LinkedIn Fairness Toolkit (LiFT), an open-source library designed to address bias in AI applications at scale.
Sriram Vasudevan
11 min read
Has Summary
--
LinkedIn
Beginner
The article discusses Spark-TFRecord, a new data source for Apache Spark that aims to provide full support for the TFRecord data format used in TensorFlow.
--
LinkedIn
Advanced
Apache Pinot 0.3.0 is an open-source, distributed OLAP data store developed at LinkedIn, designed for near-real-time analytics.
--
LinkedIn
Intermediate
The article discusses the implementation of typed AI features in LinkedIn's feed, emphasizing the importance of standardization for rapid experimentation and continuous improvement.
Ian Ackerman
10 min read
Has Summary
--
LinkedIn
Intermediate
The article discusses advanced schema management techniques for Apache Spark applications at LinkedIn, focusing on the integration of Avro schemas with the Hive Metastore to enhance type safety and...
Walaa Eldin Moustafa
14 min read
Has Summary
--
LinkedIn
Intermediate
This article provides an in-depth look at LinkedIn's data pipeline monitoring system, focusing on the challenges faced with traditional monitoring methods and how they have evolved to improve visib...
Krishnan Raman
16 min read
Has Summary
--
LinkedIn
Intermediate
The article discusses how LinkedIn customizes Apache Kafka to handle 7 trillion messages per day.
Jon Lee
10 min read
Has Summary
--
LinkedIn
Advanced
DataHub is a generalized metadata search and discovery tool developed by LinkedIn to enhance the productivity of data teams.
--
LinkedIn
Intermediate
Avro2TF is an open-source feature transformation engine designed to facilitate the conversion of data into a format compatible with TensorFlow.
Xuhong Zhang
5 min read
Has Summary
--
LinkedIn
Advanced
The article discusses how Apache Calcite can bridge the gap between offline and nearline computations in big data processing.
Khai Tran
12 min read
Has Summary
--
LinkedIn
Advanced
The article discusses Brooklin, a data ingestion service developed by LinkedIn to facilitate streaming data from various sources to multiple destinations.
Samarth Shetty
11 min read
Has Summary
--
LinkedIn
Intermediate
The article discusses JARVIS, a search system developed by LinkedIn to enhance the navigation of its source code.
Rajeev Kumar
16 min read
Has Summary
--
LinkedIn
Intermediate
The article discusses the migration of LinkedIn's internal service, Babylonia, from Oracle to Espresso, a distributed NoSQL database.
David Max
11 min read
Has Summary
--
LinkedIn
Beginner
The article discusses the engineering infrastructure at LinkedIn that supports test tracking across various platforms, including iOS, Android, and web.
Ning Zhang
9 min read
Has Summary
--
LinkedIn
Advanced
The article discusses the open sourcing of Kafka Monitor, a framework designed to monitor and test Kafka deployments.
Dong Lin
10 min read
Has Summary
--
LinkedIn
Advanced
The article discusses the Kafka ecosystem at LinkedIn, detailing its critical role as a messaging system and the various solutions developed to enhance its functionality.
Joel Koshy
8 min read
Has Summary
--
LinkedIn
Advanced
The article discusses FollowFeed, LinkedIn's new feed infrastructure designed to enhance performance and relevance for its users.
Ankit Gupta
25 min read
Has Summary
--
LinkedIn
Intermediate
The article 'Running Kafka At Scale' discusses how LinkedIn utilizes Apache Kafka as a crucial messaging system for handling vast amounts of data.
Todd Palino
10 min read
Includes Code
Has Summary
--
LinkedIn
Advanced
Espresso is LinkedIn's distributed, fault-tolerant NoSQL database that supports various applications, including Member Profile and InMail.
LinkedIn Engineering Team
17 min read
Has Summary
--
LinkedIn
Advanced
Apache Helix is a framework designed for developing distributed systems, addressing challenges such as scalability, fault tolerance, and partition management.
Kishore Gopalakrishna
10 min read
Has Summary
--
LinkedIn
Intermediate
The article discusses how LinkedIn utilizes Apache Samza to gain real-time insights into its performance by processing data from numerous services and machines.
LinkedIn Engineering Team
11 min read
Includes Code
Has Summary
--
LinkedIn
Advanced
The article announces the release of Voldemort 1.6.0, a distributed key-value storage system developed at LinkedIn.
LinkedIn Engineering Team
10 min read
Includes Code
Has Summary
--
LinkedIn
Advanced
The article discusses the significance of the log as a fundamental abstraction in real-time data systems, emphasizing its role in distributed systems, data integration, and stream processing.
--
LinkedIn
Intermediate
The article discusses DataFu's Hourglass framework, which simplifies incremental data processing in Hadoop.
Matthew Hayes
15 min read
Includes Code
Has Summary
--
LinkedIn
Advanced
The article announces the release of Voldemort 1.3.0, detailing significant performance improvements, new features, and enhanced operability.
Vinoth Chandar
9 min read
Includes Code
Has Summary
--
LinkedIn
Beginner
The article discusses Autometrics, a self-service metrics collection system developed at LinkedIn to streamline the process of metrics collection and visualization.
Grier Johnson
6 min read
Has Summary
--
