531 open source projects that are alternatives to or similar to arrow-datafusion

Ballista
Distributed compute platform implemented in Rust, and powered by Apache Arrow.
Stars: ✭ 2,274 (-3.64%)
Mutual labels:  arrow, dataframe, datafusion
polars
Fast multi-threaded DataFrame library in Rust | Python | Node.js
Stars: ✭ 6,368 (+169.83%)
Mutual labels:  arrow, dataframe
Eland
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Stars: ✭ 235 (-90.04%)
Mutual labels:  big-data, dataframe
Koalas
Koalas: pandas API on Apache Spark
Stars: ✭ 3,044 (+28.98%)
Mutual labels:  big-data, dataframe
Geni
A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (-93.56%)
Mutual labels:  big-data, dataframe
Datafusion
DataFusion has now been donated to the Apache Arrow project
Stars: ✭ 611 (-74.11%)
Mutual labels:  arrow, dataframe
bow
Go data analysis / manipulation library built on top of Apache Arrow
Stars: ✭ 20 (-99.15%)
Mutual labels:  arrow, dataframe
metriql
The metrics layer for your data. Join us at https://metriql.com/slack
Stars: ✭ 227 (-90.38%)
Mutual labels:  big-data, olap
hadoop-data-ingestion-tool
OLAP and ETL of Big Data
Stars: ✭ 17 (-99.28%)
Mutual labels:  big-data, olap
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-93.64%)
Mutual labels:  big-data, dataframe
Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+94.11%)
Mutual labels:  big-data, query-engine
Awkward 0.x
Manipulate arrays of complex data structures as easily as Numpy.
Stars: ✭ 216 (-90.85%)
Mutual labels:  big-data, arrow
Cboard
An easy to use, self-service open BI reporting and BI dashboard platform.
Stars: ✭ 2,795 (+18.43%)
Mutual labels:  big-data, olap
vinum
Vinum is a SQL processor for Python, designed for data analysis workflows and in-memory analytics.
Stars: ✭ 57 (-97.58%)
Mutual labels:  arrow, olap
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-95.3%)
Mutual labels:  big-data, dataframe
Clickhouse
ClickHouse® is a free analytics DBMS for big data
Stars: ✭ 21,089 (+793.6%)
Mutual labels:  big-data, olap
Crate
CrateDB is a distributed SQL database that makes it simple to store and analyze massive amounts of data in real-time.
Stars: ✭ 3,254 (+37.88%)
Mutual labels:  big-data, olap
TT Tech Space
TT Tech Research Notes
Stars: ✭ 21 (-99.11%)
Mutual labels:  big-data, olap
pyspark-algorithms
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Stars: ✭ 72 (-96.95%)
Mutual labels:  big-data, dataframe
RemoteShuffleService
Celeborn provides an elastic and high-performance service for shuffle and spilled data.
Stars: ✭ 262 (-88.9%)
Mutual labels:  big-data
beekeeper
Service for automatically managing and cleaning up unreferenced data
Stars: ✭ 43 (-98.18%)
Mutual labels:  big-data
terraform-aws-kinesis-firehose
This code creates a Kinesis Firehose in AWS to send CloudWatch log data to S3.
Stars: ✭ 25 (-98.94%)
Mutual labels:  big-data
spark-root
Apache Spark Data Source for ROOT File Format
Stars: ✭ 28 (-98.81%)
Mutual labels:  big-data
incubator-liminal
Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language to build ML workflows on top of Apache Airflow.
Stars: ✭ 117 (-95.04%)
Mutual labels:  big-data
siembol
An open-source, real-time Security Information & Event Management tool based on big data technologies, providing a scalable, advanced security analytics framework.
Stars: ✭ 153 (-93.52%)
Mutual labels:  big-data
dataframe
Structured data processing in Kotlin
Stars: ✭ 319 (-86.48%)
Mutual labels:  dataframe
matcha
🍵 SPARQL-like DSL for querying in memory Linked Data Models
Stars: ✭ 18 (-99.24%)
Mutual labels:  query-engine
heidi
heidi : tidy data in Haskell
Stars: ✭ 24 (-98.98%)
Mutual labels:  dataframe
IoT-system-PLC-data-to-InfluxDB
This project aims to provide free software that fetches data from PLCs (Siemens S7-300/400/1200/1500) and stores it. The stack is completely open source. InfluxDB is used for data storage, so the application follows the Big Data paradigm.
Stars: ✭ 26 (-98.9%)
Mutual labels:  big-data
cloudberry
Big Data Visualization
Stars: ✭ 89 (-96.23%)
Mutual labels:  big-data
Pointy
A jQuery plugin that dynamically points one element at another ~
Stars: ✭ 25 (-98.94%)
Mutual labels:  arrow
sparkucx
A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer
Stars: ✭ 32 (-98.64%)
Mutual labels:  big-data
dxram
A distributed in-memory key-value storage for billions of small objects.
Stars: ✭ 25 (-98.94%)
Mutual labels:  big-data
LoL-Match-Prediction
Win probability predictions for League of Legends matches using neural networks
Stars: ✭ 34 (-98.56%)
Mutual labels:  big-data
rastercube
rastercube is a python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)
Stars: ✭ 15 (-99.36%)
Mutual labels:  big-data
nebula
A distributed, fast open-source graph database featuring horizontal scalability and high availability
Stars: ✭ 8,196 (+247.29%)
Mutual labels:  big-data
img2dataset
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
Stars: ✭ 1,173 (-50.3%)
Mutual labels:  big-data
hamilton
A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.
Stars: ✭ 612 (-74.07%)
Mutual labels:  dataframe
airavata-php-gateway
Mirror of Apache Airavata PHP Gateway
Stars: ✭ 15 (-99.36%)
Mutual labels:  big-data
Real Time Social Media Mining
DevOps pipeline for Real Time Social/Web Mining
Stars: ✭ 22 (-99.07%)
Mutual labels:  big-data
GDLibrary
Matlab library for gradient descent algorithms: Version 1.0.1
Stars: ✭ 50 (-97.88%)
Mutual labels:  big-data
datafusion-python
A Python library to run analytics workloads with the performance of Rust, the flexibility of Python and O(1) cost in moving data between the two. Uses Apache Arrow in-memory format and respective query engine DataFusion.
Stars: ✭ 56 (-97.63%)
Mutual labels:  datafusion
airavata-django-portal
Mirror of Apache Airavata Django Portal
Stars: ✭ 20 (-99.15%)
Mutual labels:  big-data
lcbo-api
A crawler and API server for Liquor Control Board of Ontario retail data
Stars: ✭ 152 (-93.56%)
Mutual labels:  big-data
hood
The plugin to manage benchmarks on your CI
Stars: ✭ 17 (-99.28%)
Mutual labels:  arrow
tv
📺(tv) Tidy Viewer is a cross-platform CLI csv pretty printer that uses column styling to maximize viewer enjoyment.
Stars: ✭ 1,763 (-25.3%)
Mutual labels:  dataframe
tooltip
[DEPRECATED] The tooltip that has all the right moves
Stars: ✭ 133 (-94.36%)
Mutual labels:  arrow
spark-vcf
Spark VCF data source implementation for Dataframes
Stars: ✭ 15 (-99.36%)
Mutual labels:  dataframe
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (-98.35%)
Mutual labels:  big-data
gan deeplearning4j
Automatic feature engineering using Generative Adversarial Networks with Deeplearning4j and Apache Spark.
Stars: ✭ 19 (-99.19%)
Mutual labels:  big-data
azure-big-data-starter
A boilerplate project for Azure Big Data PaaS services
Stars: ✭ 13 (-99.45%)
Mutual labels:  big-data
automile-php
Automile offers a simple, smart, cutting-edge telematics solution for businesses to track and manage their business vehicles.
Stars: ✭ 28 (-98.81%)
Mutual labels:  big-data
FlameStream
Distributed stream processing model and its implementation
Stars: ✭ 14 (-99.41%)
Mutual labels:  big-data
avit-da2k
💲 oh-my-zsh theme based on avit theme
Stars: ✭ 15 (-99.36%)
Mutual labels:  arrow
CS Book
🔥 Latest computer science e-books. Provides downloads of the latest technical e-books. "All I want is to outcompete every one of you, or be outcompeted by you!"
Stars: ✭ 40 (-98.31%)
Mutual labels:  big-data
lubeck
High level linear algebra library for Dlang
Stars: ✭ 57 (-97.58%)
Mutual labels:  big-data
ngm
swissgeol.ch gives you insight into geoscientific data - above and below the surface.
Stars: ✭ 23 (-99.03%)
Mutual labels:  big-data
HTAPBench
Benchmark suite to evaluate HTAP database engines
Stars: ✭ 15 (-99.36%)
Mutual labels:  olap
nifi
Deploy a secured, clustered, auto-scaling NiFi service in AWS.
Stars: ✭ 37 (-98.43%)
Mutual labels:  big-data
scipp
Multi-dimensional data arrays with labeled dimensions
Stars: ✭ 55 (-97.67%)
Mutual labels:  dataframe