Introducing Velox: An open source unified execution engine

Meta is introducing Velox, an open source unified execution engine aimed at accelerating data management systems and streamlining their development. Velox is under active development. Experimental …

Pedro Pedreira

Overview

Meta has introduced Velox, an open source unified execution engine designed to enhance data management systems and streamline their development. Velox aims to consolidate fragmented data computation engines, improving efficiency and consistency across various workloads.

What You'll Learn

1. How to integrate Velox into existing data management systems like Presto and Spark
2. Why unifying execution engines can improve data processing efficiency
3. How to leverage runtime optimizations provided by Velox for better performance

Key Questions Answered

What is Velox and what are its main features?
Velox is an open source unified execution engine that accelerates data management systems by consolidating common components of data computation engines. It supports various workloads, offers extensibility, and promotes efficient data processing through optimizations like filter reordering and adaptive column prefetching.
How does Velox improve performance in data management systems?
Velox improves performance by replacing fragmented, engine-specific execution code with a single shared execution engine. Experimental results show speedups close to an order of magnitude for CPU-bound queries (TPC-H, with Velox integrated into Presto) and an average of 6-7x on real-world production traffic.
What integrations does Velox support?
Velox is integrated with several data systems at Meta, including Presto, Spark, and TorchArrow. These integrations allow Velox to function as a common execution engine, facilitating consistent behavior across different computation engines and improving user experience.
What are the main components of Velox?
Velox includes several key components such as a generic type system, a vectorized expression evaluation engine, APIs for custom functions, implementations of common SQL operators, and resource management primitives. These components work together to optimize data processing tasks.
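
The filter reordering mentioned above can be illustrated with a small, self-contained sketch. This is not Velox code or its API: `TrackedFilter` and `filter_batch` are hypothetical names, and the policy shown (sort filters by observed pass rate after each batch) is an assumed simplification. A real engine would likely also weigh how expensive each filter is to evaluate.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TrackedFilter:
    """A predicate plus running counters (hypothetical helper, not a Velox class)."""
    name: str
    predicate: Callable[[dict], bool]
    rows_in: int = 0
    rows_passed: int = 0

    def pass_rate(self) -> float:
        # Fraction of rows that survived this filter so far; 1.0 until data exists.
        return self.rows_passed / self.rows_in if self.rows_in else 1.0


def filter_batch(rows: List[dict], filters: List[TrackedFilter]) -> List[dict]:
    """Apply filters in their current order, then reorder them for the next batch."""
    selected = rows
    for f in filters:
        f.rows_in += len(selected)
        selected = [r for r in selected if f.predicate(r)]
        f.rows_passed += len(selected)
    # Filters that drop the most rows move to the front, so later filters see fewer rows.
    filters.sort(key=TrackedFilter.pass_rate)
    return selected


# Usage: the less selective filter starts first and is demoted after one batch.
filters = [
    TrackedFilter("region_is_eu", lambda r: r["region"] == "eu"),
    TrackedFilter("amount_gt_990", lambda r: r["amount"] > 990),
]
batch = [{"region": "eu" if i % 2 == 0 else "us", "amount": i} for i in range(1000)]
result = filter_batch(batch, filters)
order_after_first_batch = [f.name for f in filters]
```

After the first batch, `amount_gt_990` sorts ahead of `region_is_eu` because it passed far fewer of the rows it saw. Note that each pass rate here is measured only on rows that reached that filter, which is one reason a production policy would need to be more careful than this sketch.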

Key Statistics & Figures

Speedup in CPU-bound queries
close to an order of magnitude
Measured during experiments using the TPC-H benchmark with Velox integrated into Presto.
Average speedup for shuffle-bound queries
3-6x
Observed for shuffle-bound queries in the same TPC-H experiments, a more modest gain than for CPU-bound queries.
Average speedup in data querying
6-7x
Observed on real-world production traffic generated by a variety of interactive analytical tools at Meta.

Technologies & Tools

Execution Engine
Velox
Used as a unified execution engine across different data management systems.
Data Management System
Presto
Integrated with Velox through the Prestissimo project to enhance query performance.
Data Processing Framework
Spark
Uses Velox through the Gluten project, which lets C++ execution engines such as Velox run inside Spark.
Data Preprocessing Library
TorchArrow
Translates dataframe representations into Velox plans for execution in ML workflows.
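
To make the idea of translating a dataframe into an execution plan concrete, here is a minimal sketch. It does not use the TorchArrow or Velox APIs: `LazyFrame`, `PlanNode`, and `explain` are hypothetical names, and the operator labels are illustrative only. The point is that dataframe calls record plan nodes instead of computing results, so a separate engine can execute the finished plan.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PlanNode:
    """One operator in a linear plan (hypothetical, for illustration only)."""
    op: str
    args: Tuple[str, ...]
    source: Optional["PlanNode"] = None


class LazyFrame:
    """A dataframe facade that records operations rather than running them."""

    def __init__(self, plan: PlanNode) -> None:
        self.plan = plan

    @staticmethod
    def scan(table: str) -> "LazyFrame":
        return LazyFrame(PlanNode("TableScan", (table,)))

    def filter(self, expr: str) -> "LazyFrame":
        return LazyFrame(PlanNode("Filter", (expr,), self.plan))

    def select(self, *columns: str) -> "LazyFrame":
        return LazyFrame(PlanNode("Project", columns, self.plan))


def explain(node: Optional[PlanNode]) -> List[str]:
    """Render the plan from the final operator down to the scan."""
    lines = []
    depth = 0
    while node is not None:
        lines.append("  " * depth + f"{node.op}({', '.join(node.args)})")
        node = node.source
        depth += 1
    return lines


# Usage: nothing is computed here; the calls only build a plan tree.
frame = LazyFrame.scan("events").filter("clicks > 0").select("user_id", "clicks")
plan_lines = explain(frame.plan)
```

In this sketch the plan is the hand-off point: the dataframe layer stays small, and the engine that receives the plan is free to optimize and execute it.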

Key Actionable Insights

1. Integrating Velox into your data management systems can significantly enhance performance and reduce development complexity.
By consolidating execution engines, Velox allows for consistent semantics and optimizations across various workloads, making it easier for developers to manage and scale their data systems.
2. Use Velox's runtime optimizations to improve query performance in your applications.
Optimizations such as dynamic filter pushdown and adaptive column prefetching can lead to substantial speed improvements, especially in data-intensive applications.
3. Participate in the Velox open source community to contribute to its development and benefit from collective innovation.
With over 150 contributors already involved, joining the community can provide valuable insights and accelerate your own projects while helping to advance the technology.
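
The dynamic filter pushdown named in the second insight can be sketched as follows. This is a conceptual illustration under stated assumptions, not Velox's implementation: `build_hash_table`, `scan`, and `hash_join` are hypothetical helpers, and the "filter" is simply the set of join keys seen on the build side, handed to the probe-side scan so that non-matching rows are dropped before they reach the join.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Set, Tuple


def build_hash_table(build_rows: Iterable[dict], key: str) -> Dict[object, List[dict]]:
    """Group build-side rows by join key."""
    table: Dict[object, List[dict]] = {}
    for row in build_rows:
        table.setdefault(row[key], []).append(row)
    return table


def scan(rows: Iterable[dict], key: str,
         dynamic_filter: Optional[Set[object]] = None) -> Iterator[dict]:
    """Table scan; with a dynamic filter, skip rows whose key cannot match."""
    for row in rows:
        if dynamic_filter is not None and row[key] not in dynamic_filter:
            continue
        yield row


def hash_join(build_rows: Iterable[dict], probe_rows: Iterable[dict],
              key: str) -> Tuple[List[dict], int]:
    """Join two inputs and report how many probe rows reached the join."""
    table = build_hash_table(build_rows, key)
    dynamic_filter = set(table)  # keys known only at runtime, after the build
    rows_reaching_join = 0
    joined = []
    for probe in scan(probe_rows, key, dynamic_filter):
        rows_reaching_join += 1
        for build in table.get(probe[key], ()):
            joined.append({**build, **probe})
    return joined, rows_reaching_join


# Usage: 1000 order rows, but only those for the two known customers leave the scan.
customers = [{"cust_id": 1, "name": "a"}, {"cust_id": 2, "name": "b"}]
orders = [{"cust_id": i % 100, "order_id": i} for i in range(1000)]
joined, rows_reaching_join = hash_join(customers, orders, "cust_id")
```

The filter is "dynamic" because it cannot be known when the query is planned; it exists only once the build side has been read. Pushing it into the scan means the remaining 980 rows are discarded as early as possible.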

Common Pitfalls

1. Failing to recognize the benefits of unifying execution engines can lead to continued fragmentation and inefficiency.
Many organizations may hesitate to adopt new technologies like Velox due to existing dependencies on legacy systems. However, embracing a unified approach can ultimately streamline development and improve performance.

Related Concepts

Data Management Systems
Execution Engines
Data Processing Optimizations
Open Source Contributions