oap-project / gazelle_plugin

Licence: Apache-2.0 license
Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.

Programming Languages

Scala, C++, PLpgSQL, Java, Jupyter Notebook, CMake

Projects that are alternatives of or similar to gazelle plugin

Datafusion
DataFusion has now been donated to the Apache Arrow project
Stars: ✭ 611 (+151.44%)
Mutual labels:  arrow
Blazingsql
BlazingSQL is a lightweight, GPU accelerated, SQL engine for Python. Built on RAPIDS cuDF.
Stars: ✭ 1,652 (+579.84%)
Mutual labels:  arrow
Actual Number Picker
Android: A horizontal number picker
Stars: ✭ 206 (-15.23%)
Mutual labels:  arrow
Android Expandicon
Nice and simple customizable implementation of Google style up/down expand arrow.
Stars: ✭ 871 (+258.44%)
Mutual labels:  arrow
Arrow.jl
Pure Julia implementation of the apache arrow data format (https://arrow.apache.org/)
Stars: ✭ 92 (-62.14%)
Mutual labels:  arrow
Sexytooltip
The tooltip that has all the right moves
Stars: ✭ 125 (-48.56%)
Mutual labels:  arrow
Cudf
cuDF - GPU DataFrame Library
Stars: ✭ 4,370 (+1698.35%)
Mutual labels:  arrow
arrow-optics
Λrrow Optics is part of Λrrow, a functional companion to Kotlin's Standard Library
Stars: ✭ 20 (-91.77%)
Mutual labels:  arrow
Smartmaterialspinner
The powerful android spinner library for your application
Stars: ✭ 108 (-55.56%)
Mutual labels:  arrow
Ballista
Distributed compute platform implemented in Rust, and powered by Apache Arrow.
Stars: ✭ 2,274 (+835.8%)
Mutual labels:  arrow
Pre Short Closures
Stars: ✭ 36 (-85.19%)
Mutual labels:  arrow
Open Arrow
Open Arrow is an open-source font that contains 112 arrow symbols from U+2190 to U+21ff
Stars: ✭ 89 (-63.37%)
Mutual labels:  arrow
Kartothek
A consistent table management library in python
Stars: ✭ 144 (-40.74%)
Mutual labels:  arrow
React Archer
🏹 Draw arrows between React elements 🖋
Stars: ✭ 666 (+174.07%)
Mutual labels:  arrow
Awkward 0.x
Manipulate arrays of complex data structures as easily as Numpy.
Stars: ✭ 216 (-11.11%)
Mutual labels:  arrow
Arrow
Λrrow - Functional companion to Kotlin's Standard Library
Stars: ✭ 4,771 (+1863.37%)
Mutual labels:  arrow
Leader Line
Draw a leader line in your web page.
Stars: ✭ 1,872 (+670.37%)
Mutual labels:  arrow
AndroidFunctionalValidation
Simple form validation using Arrow
Stars: ✭ 45 (-81.48%)
Mutual labels:  arrow
Vscode Data Preview
Data Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files
Stars: ✭ 245 (+0.82%)
Mutual labels:  arrow
Fletcher
Fletcher: A framework to integrate FPGA accelerators with Apache Arrow
Stars: ✭ 144 (-40.74%)
Mutual labels:  arrow
* Gazelle support officially ended in February 2023. Please see the end-of-life announcement below.

It all started with the Spark Summit session "Apache Arrow-Based Unified Data Sharing and Transferring Format Among CPU and Accelerators". On 4/25/2019, we created the Gazelle project to explore the opportunity to reach higher performance in Spark with a vectorized execution engine. We are proud of the work done in Gazelle, not only to reach better performance than vanilla Spark, but also to unleash the power of the hardware and take it to another level. While bringing Gazelle to market, we heard many requests from customers to refactor the Gazelle source code, to use Gazelle's JNI as a unified API, and to add existing, mature SQL engines or libraries such as ClickHouse or Velox as backends. In 2023, we decided to stop supporting the Gazelle project and to move to the next stage of vectorized execution for Spark. We encourage existing Gazelle users and developers to shift their focus to our second-generation native SQL engine, Gluten, which offers more possibilities through integration with multiple native SQL backends and brings more companies together to build a new ecosystem for Spark vectorized execution engines. Thank you for joining Gazelle's journey; we look forward to continuing it with you in Gluten, with a better experience.

* LEGAL NOTICE: Your use of this software and any required dependent software (the "Software Package") is subject to the terms and conditions of the software license agreements for the Software Package, which may also include notices, disclaimers, or license terms for third party or open source software included in or with the Software Package, and your use indicates your acceptance of all such terms. Please refer to the "TPP.txt" or other similarly-named text file included with the Software Package for additional details.
* Optimized Analytics Package for Spark* Platform is under Apache 2.0 (https://www.apache.org/licenses/LICENSE-2.0).

Gazelle Plugin

A native engine for Spark SQL with vectorized SIMD optimizations. Please refer to the user guide for details on how to enable Gazelle.

Online Documentation

You can find all the Gazelle Plugin documents on the project web page.

Introduction

Overview

Spark SQL works very well with structured row-based data. It uses WholeStageCodeGen to improve performance through Java JIT-compiled code. However, the Java JIT usually does not make good use of the latest SIMD instructions, especially in complicated queries. Apache Arrow provides a CPU-cache-friendly columnar in-memory layout, and its SIMD-optimized kernels and LLVM-based expression engine, Gandiva, are also very efficient.

Gazelle Plugin reimplements the Spark SQL execution layer with SIMD-friendly columnar data processing based on Apache Arrow, and leverages Arrow's CPU-cache-friendly columnar in-memory layout, SIMD-optimized kernels, and LLVM-based expression engine to bring better performance to Spark SQL.
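
To make the row-based versus columnar distinction concrete, here is a minimal sketch in plain Python. It is not Gazelle or Arrow code: the table, the field names (id, price, qty), and the helper functions are all hypothetical, and the standard-library array module merely stands in for a contiguous typed column buffer.

```python
from array import array

# Hypothetical three-row table; the field names are illustrative only.
rows = [
    {"id": 1, "price": 10.0, "qty": 2},
    {"id": 2, "price": 4.5, "qty": 4},
    {"id": 3, "price": 7.25, "qty": 1},
]


def to_columns(rows):
    """Pivot row records into one contiguous typed buffer per field."""
    return {
        "id": array("q", (r["id"] for r in rows)),        # 64-bit integers
        "price": array("d", (r["price"] for r in rows)),  # 64-bit floats
        "qty": array("q", (r["qty"] for r in rows)),
    }


def total_price_rowwise(rows):
    """Row layout: every record is visited even though one field is needed."""
    total = 0.0
    for r in rows:
        total += r["price"]
    return total


def total_price_columnar(columns):
    """Columnar layout: the aggregation reads a single contiguous buffer."""
    return sum(columns["price"])


columns = to_columns(rows)
print(total_price_rowwise(rows), total_price_columnar(columns))
```

Both functions return the same total. The difference is in memory access: the columnar version scans one densely packed buffer of doubles, which is the kind of access pattern that native SIMD kernels can process several values at a time.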

Performance data

For advanced performance testing, the charts below show results from two benchmarks with Gazelle v1.1: (1) Decision Support Benchmark1 and (2) Decision Support Benchmark2. The test environment for Decision Support Benchmark1 consists of 1 master + 3 workers, each node with an Intel(r) Xeon(r) Gold 6252 CPU, 384GB memory, and 3 NVMe SSDs, using a 1.5TB dataset in Parquet format.

  • Decision Support Benchmark1 is a query set modified from the TPC-H benchmark. We changed Decimal to Double since Decimal was not supported in OAP v1.0-Gazelle Plugin. Overall, the results show a 1.49X speedup for OAP v1.0-Gazelle Plugin compared to vanilla Spark 3.0.0. We also show detailed per-query results; most queries in Decision Support Benchmark1 benefit from Gazelle Plugin. The speedup depends on the individual query.

[Figures: Decision Support Benchmark1 results, overall and per query]

The test environment for Decision Support Benchmark2 consists of 1 master + 3 workers, each node with an Intel(r) Xeon(r) Platinum 8360Y CPU, 1440GB memory, and 4 NVMe SSDs, using a 3TB dataset in Parquet format.

  • Decision Support Benchmark2 is a query set modified from the TPC-DS benchmark. We changed Decimal to Double since Decimal was not supported in OAP v1.0-Gazelle Plugin. We picked 10 queries that are fully supported in OAP v1.0-Gazelle Plugin, and the results show a 1.26X speedup compared to vanilla Spark 3.0.0.

[Figure: Decision Support Benchmark2 results]

Please note that this performance data is not an official TPC-H or TPC-DS result. Actual performance may vary by workload. Please try your workloads with Gazelle Plugin first and check the DAG or log file to see whether all operators are supported in OAP-Gazelle Plugin. Please see the detailed page on performance tuning for TPC-H and TPC-DS workloads.

Coding Style

Contact

[email protected] [email protected]