
1872 open-source projects that are alternatives to, or similar to, Datafusion

Spark With Python
Fundamentals of Spark with Python (using PySpark), with code examples
Stars: ✭ 150 (-75.45%)
Mutual labels:  dataframe, sql, spark
Dataframe Js
A JavaScript library providing a new data structure for data scientists and developers
Stars: ✭ 376 (-38.46%)
Mutual labels:  dataframe, sql, data
Ballista
Distributed compute platform implemented in Rust, and powered by Apache Arrow.
Stars: ✭ 2,274 (+272.18%)
Mutual labels:  dataframe, spark, arrow
Cubes
Lightweight Python OLAP framework for multi-dimensional data analysis
Stars: ✭ 1,393 (+127.99%)
Mutual labels:  sql, data
Athenax
SQL-based streaming analytics platform at scale
Stars: ✭ 1,178 (+92.8%)
Mutual labels:  sql, data
Deveeldb
DeveelDB is a complete SQL database system, primarily developed for the .NET/Mono frameworks
Stars: ✭ 80 (-86.91%)
Mutual labels:  sql, data
Pyjanitor
Clean APIs for data cleaning. Python implementation of R package Janitor
Stars: ✭ 647 (+5.89%)
Mutual labels:  dataframe, data
Bigbash
A converter that generates a bash one-liner from an SQL Select query (no DB necessary)
Stars: ✭ 230 (-62.36%)
Mutual labels:  sql, data
Net.jgp.labs.spark
Apache Spark examples exclusively in Java
Stars: ✭ 55 (-91%)
Mutual labels:  dataframe, spark
Geni
A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (-75.12%)
Mutual labels:  dataframe, spark
Datasheets
Read data from, write data to, and modify the formatting of Google Sheets
Stars: ✭ 593 (-2.95%)
Mutual labels:  dataframe, data
Spark
Apache Spark - A unified analytics engine for large-scale data processing
Stars: ✭ 31,618 (+5074.8%)
Mutual labels:  sql, spark
Interference
Open-source distributed database with a base JPA implementation and event-processing support
Stars: ✭ 57 (-90.67%)
Mutual labels:  sql, cluster
Spark Website
Apache Spark Website
Stars: ✭ 75 (-87.73%)
Mutual labels:  sql, spark
Kamu Cli
Next generation tool for decentralized exchange and transformation of semi-structured data
Stars: ✭ 69 (-88.71%)
Mutual labels:  sql, spark
Linkis
Linkis makes it easy to connect to various back-end computation/storage engines (Spark, Python, TiDB...) and exposes various interfaces (REST, JDBC, Java...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,323 (+280.2%)
Mutual labels:  sql, spark
Xsql
Unified SQL Analytics Engine Based on SparkSQL
Stars: ✭ 176 (-71.19%)
Mutual labels:  sql, spark
Mobius
C# and F# language binding and extensions to Apache Spark
Stars: ✭ 929 (+52.05%)
Mutual labels:  dataframe, spark
Parquet Index
Spark SQL index for Parquet tables
Stars: ✭ 109 (-82.16%)
Mutual labels:  sql, spark
fastdata-cluster
Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (-96.73%)
Mutual labels:  spark, cluster
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-81.83%)
Mutual labels:  spark, dataframe
Datagear
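A data visualization and analysis platform developed in Java with a browser/server architecture, supporting multiple data sources such as SQL, CSV, Excel, HTTP APIs, and JSON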
Stars: ✭ 266 (-56.46%)
Mutual labels:  sql, data
Elasticluster
Create clusters of VMs on the cloud and configure them with Ansible.
Stars: ✭ 298 (-51.23%)
Mutual labels:  spark, cluster
Cook
Fair job scheduler on Kubernetes and Mesos for batch workloads and Spark
Stars: ✭ 314 (-48.61%)
Mutual labels:  spark, cluster
Micronaut Data
Ahead of Time Data Repositories
Stars: ✭ 352 (-42.39%)
Mutual labels:  sql, data
Keypathkit
KeyPathKit is a library that provides the standard functions to manipulate data along with a call-syntax that relies on typed keypaths to make the call sites as short and clean as possible.
Stars: ✭ 376 (-38.46%)
Mutual labels:  sql, data
Databook
A Facebook for data
Stars: ✭ 26 (-95.74%)
Mutual labels:  sql, data
Parquet Generator
Parquet file generator
Stars: ✭ 16 (-97.38%)
Mutual labels:  sql, spark
Pypika
PyPika is a Python SQL query builder that exposes the full richness of the SQL language using a syntax that reflects the resulting query. PyPika excels at all sorts of SQL queries but is especially useful for data analysis.
Stars: ✭ 1,111 (+81.83%)
Mutual labels:  sql, data
Scriptis
Scriptis is for interactive data analysis with script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDF, function, and resource management, and intelligent diagnosis.
Stars: ✭ 696 (+13.91%)
Mutual labels:  sql, spark
Spark Daria
Essential Spark extensions and helper methods ✨😲
Stars: ✭ 553 (-9.49%)
Mutual labels:  dataframe, spark
Locopy
locopy: Loading/Unloading to Redshift and Snowflake using Python.
Stars: ✭ 73 (-88.05%)
Mutual labels:  sql, data
Docker Trino Cluster
Multi-node Presto cluster on Docker containers
Stars: ✭ 81 (-86.74%)
Mutual labels:  sql, cluster
Datacompy
Pandas and Spark DataFrame comparison for humans
Stars: ✭ 147 (-75.94%)
Mutual labels:  spark, data
Quicksql
A Flexible, Fast, Federated (3F) SQL Analysis Middleware for Multiple Data Sources
Stars: ✭ 1,821 (+198.04%)
Mutual labels:  sql, spark
Join Monster Graphql Tools Adapter
Use Join Monster to fetch your data with Apollo Server.
Stars: ✭ 130 (-78.72%)
Mutual labels:  sql, data
Splitgraph
Splitgraph command-line client and Python library
Stars: ✭ 209 (-65.79%)
Mutual labels:  sql, data
Blazingsql
BlazingSQL is a lightweight, GPU accelerated, SQL engine for Python. Built on RAPIDS cuDF.
Stars: ✭ 1,652 (+170.38%)
Mutual labels:  sql, arrow
Spark Redis
A connector for Spark that allows reading from and writing to a Redis cluster
Stars: ✭ 773 (+26.51%)
Mutual labels:  dataframe, spark
Modin
Modin: Speed up your Pandas workflows by changing a single line of code
Stars: ✭ 6,639 (+986.58%)
Mutual labels:  dataframe, sql
Awesome Cybersecurity Datasets
A curated list of amazingly awesome Cybersecurity datasets
Stars: ✭ 380 (-37.81%)
Mutual labels:  dataframe, data
Pdpipe
Easy pipelines for pandas DataFrames.
Stars: ✭ 590 (-3.44%)
Mutual labels:  dataframe, data
Data science blogs
A repository to keep track of all the code that I end up writing for my blog posts.
Stars: ✭ 139 (-77.25%)
Mutual labels:  spark, data
polars
Fast multi-threaded DataFrame library in Rust | Python | Node.js
Stars: ✭ 6,368 (+942.23%)
Mutual labels:  arrow, dataframe
arrow-datafusion
Apache Arrow DataFusion SQL Query Engine
Stars: ✭ 2,360 (+286.25%)
Mutual labels:  arrow, dataframe
Roapi
Create full-fledged APIs for static datasets without writing a single line of code.
Stars: ✭ 253 (-58.59%)
Mutual labels:  sql, arrow
bow
Go data analysis / manipulation library built on top of Apache Arrow
Stars: ✭ 20 (-96.73%)
Mutual labels:  arrow, dataframe
Crate
CrateDB is a distributed SQL database that makes it simple to store and analyze massive amounts of data in real-time.
Stars: ✭ 3,254 (+432.57%)
Mutual labels:  sql, cluster
Android Nosql
Lightweight, simple structured NoSQL database for Android
Stars: ✭ 284 (-53.52%)
Mutual labels:  sql, data
Sparklens
Qubole Sparklens tool for performance tuning Apache Spark
Stars: ✭ 345 (-43.54%)
Mutual labels:  spark, cluster
Koalas
Koalas: pandas API on Apache Spark
Stars: ✭ 3,044 (+398.2%)
Mutual labels:  dataframe, spark
Tensorflowonspark
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
Stars: ✭ 3,748 (+513.42%)
Mutual labels:  spark, cluster
Arquero
Query processing and transformation of array-backed data tables.
Stars: ✭ 384 (-37.15%)
Mutual labels:  dataframe, data
Kyuubi
Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark
Stars: ✭ 363 (-40.59%)
Mutual labels:  sql, spark
Sparkmagic
Jupyter magics and kernels for working with remote Spark clusters
Stars: ✭ 954 (+56.14%)
Mutual labels:  spark, cluster
Pyspark Cheatsheet
🐍 Quick reference guide to common patterns & functions in PySpark.
Stars: ✭ 108 (-82.32%)
Mutual labels:  spark, data
Featran
A Scala feature transformation library for data science and machine learning
Stars: ✭ 420 (-31.26%)
Mutual labels:  spark, data
Metorikku
A simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (-40.92%)
Mutual labels:  sql, spark
Agile data code 2
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (-32.41%)
Mutual labels:  spark, data
Udacity Data Engineering Projects
A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.
Stars: ✭ 458 (-25.04%)
Mutual labels:  data, cluster