Spark Summit 2017: Research, Open Source, and Community

Carl Steinbach

Overview

This article covers Spark Summit 2017 and highlights the contributions of LinkedIn engineers and data scientists to the Apache Spark community. It summarizes key presentations and a meetup event, showing how Spark is used for data processing and analysis at LinkedIn.

What You'll Learn

1. How to leverage Apache Spark for large-scale graph analysis
2. Why Dr. Elephant is essential for optimizing Apache Spark jobs
3. When to use Spark-ML for sales intelligence applications

Key Questions Answered

What are the key topics covered in the Spark Summit presentations?
The Spark Summit presentations cover various topics including large-scale graph analysis, sales intelligence using Spark, multi-label graph computations, and the use of Dr. Elephant for optimizing Spark jobs. Each presentation highlights specific techniques and applications relevant to Apache Spark.
How does Dr. Elephant improve Apache Spark job performance?
Dr. Elephant enhances Apache Spark job performance by providing recommendations on workload tuning and configuration. It was originally developed by LinkedIn and is now utilized in multiple environments to increase developer productivity and cluster efficiency.

Technologies & Tools

Apache Spark (Backend)
Used for data processing and analysis at LinkedIn.

Dr. Elephant (Tool)
Helps monitor and tune Apache Spark jobs.

Key Actionable Insights

1. Use Apache Spark for graph analysis to enhance machine learning applications.
   Graph analysis techniques such as random walks can improve personalized recommendations and insights.
2. Use Dr. Elephant to monitor and tune your Spark jobs for better performance.
   Its tuning and configuration recommendations help developers avoid common pitfalls in Spark job configuration, which improves efficiency and reduces operational costs.
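
To make the random-walk idea from the first insight concrete, here is a minimal single-machine sketch in plain Python. It is not Spark code and not the method presented at the summit. It assumes a random walk with restart as the technique, and the function name `random_walk_scores`, its parameters, and the toy graph are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def random_walk_scores(graph, start, num_steps=10_000, restart_prob=0.15, seed=0):
    """Estimate how relevant each node is to `start` using a random walk with restart.

    `graph` maps each node to a list of its neighbors (hypothetical toy format).
    """
    rng = random.Random(seed)
    visits = Counter()
    current = start
    for _ in range(num_steps):
        neighbors = graph.get(current, [])
        # Return to the start node with probability restart_prob, or at a dead end.
        if not neighbors or rng.random() < restart_prob:
            current = start
        else:
            current = rng.choice(neighbors)
        visits[current] += 1
    total = sum(visits.values())
    # Normalize visit counts into scores that sum to 1.
    return {node: count / total for node, count in visits.items()}


# Hypothetical toy graph: erin and frank are disconnected from alice.
graph = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice"],
    "dave": ["bob"],
    "erin": ["frank"],
    "frank": ["erin"],
}

scores = random_walk_scores(graph, "alice")
# Rank candidate recommendations for alice, excluding alice herself.
ranked = sorted((n for n in scores if n != "alice"), key=scores.get, reverse=True)
print(ranked)
```

Nodes the walk visits often are treated as more relevant to the starting node, and nodes it never reaches (here `erin` and `frank`) receive no score. At the scale the article describes, the same idea would have to be expressed as a distributed computation rather than a single loop.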

Common Pitfalls

1. Neglecting to monitor Spark job performance can lead to inefficiencies.
   Without tools like Dr. Elephant, developers may miss critical insights into job performance, resulting in suboptimal configurations and wasted resources.

Related Concepts

Apache Spark
Graph Analysis Techniques
Sales Intelligence Applications
Performance Tuning In Big Data Environments