Ballista
Distributed compute platform implemented in Rust, and powered by Apache Arrow.
Stars: ✭ 2,274 (-3.64%)
polars
Fast multi-threaded DataFrame library in Rust | Python | Node.js
Stars: ✭ 6,368 (+169.83%)
Eland
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Stars: ✭ 235 (-90.04%)
Koalas
Koalas: pandas API on Apache Spark
Stars: ✭ 3,044 (+28.98%)
Geni
A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (-93.56%)
Datafusion
DataFusion has now been donated to the Apache Arrow project
Stars: ✭ 611 (-74.11%)
bow
Go data analysis / manipulation library built on top of Apache Arrow
Stars: ✭ 20 (-99.15%)
metriql
The metrics layer for your data. Join us at https://metriql.com/slack
Stars: ✭ 227 (-90.38%)
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-93.64%)
Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+94.11%)
Awkward 0.x
Manipulate arrays of complex data structures as easily as NumPy.
Stars: ✭ 216 (-90.85%)
Cboard
An easy to use, self-service open BI reporting and BI dashboard platform.
Stars: ✭ 2,795 (+18.43%)
vinum
Vinum is a SQL processor for Python, designed for data analysis workflows and in-memory analytics.
Stars: ✭ 57 (-97.58%)
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-95.3%)
Clickhouse
ClickHouse® is a free analytics DBMS for big data
Stars: ✭ 21,089 (+793.6%)
Crate
CrateDB is a distributed SQL database that makes it simple to store and analyze massive amounts of data in real-time.
Stars: ✭ 3,254 (+37.88%)
pyspark-algorithms
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Stars: ✭ 72 (-96.95%)
RemoteShuffleService
Celeborn provides an elastic and high-performance service for shuffle and spilled data.
Stars: ✭ 262 (-88.9%)
beekeeper
Service for automatically managing and cleaning up unreferenced data
Stars: ✭ 43 (-98.18%)
spark-root
Apache Spark Data Source for ROOT File Format
Stars: ✭ 28 (-98.81%)
incubator-liminal
Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language to build ML workflows on top of Apache Airflow.
Stars: ✭ 117 (-95.04%)
siembol
An open-source, real-time Security Information & Event Management tool based on big data technologies, providing a scalable, advanced security analytics framework.
Stars: ✭ 153 (-93.52%)
dataframe
Structured data processing in Kotlin
Stars: ✭ 319 (-86.48%)
matcha
🍵 SPARQL-like DSL for querying in-memory Linked Data Models
Stars: ✭ 18 (-99.24%)
heidi
heidi: tidy data in Haskell
Stars: ✭ 24 (-98.98%)
IoT-system-PLC-data-to-InfluxDB
This project aims to provide free software to fetch data from PLCs (Siemens S7-300/400/1200/1500) and store it. The stack used is completely open source. InfluxDB is used as the data store, so the application follows the Big Data paradigm.
Stars: ✭ 26 (-98.9%)
cloudberry
Big Data Visualization
Stars: ✭ 89 (-96.23%)
Pointy
A jQuery plugin that dynamically points one element at another
Stars: ✭ 25 (-98.94%)
sparkucx
A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing the UCX communication layer
Stars: ✭ 32 (-98.64%)
dxram
A distributed in-memory key-value storage for billions of small objects.
Stars: ✭ 25 (-98.94%)
LoL-Match-Prediction
Win probability predictions for League of Legends matches using neural networks
Stars: ✭ 34 (-98.56%)
rastercube
rastercube is a Python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)
Stars: ✭ 15 (-99.36%)
nebula
A distributed, fast open-source graph database featuring horizontal scalability and high availability
Stars: ✭ 8,196 (+247.29%)
img2dataset
Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.
Stars: ✭ 1,173 (-50.3%)
hamilton
A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.
Stars: ✭ 612 (-74.07%)
GDLibrary
Matlab library for gradient descent algorithms: Version 1.0.1
Stars: ✭ 50 (-97.88%)
datafusion-python
A Python library to run analytics workloads with the performance of Rust, the flexibility of Python and O(1) cost in moving data between the two. Uses the Apache Arrow in-memory format and its query engine, DataFusion.
Stars: ✭ 56 (-97.63%)
lcbo-api
A crawler and API server for Liquor Control Board of Ontario retail data
Stars: ✭ 152 (-93.56%)
hood
The plugin to manage benchmarks on your CI
Stars: ✭ 17 (-99.28%)
tv
📺 (tv) Tidy Viewer is a cross-platform CLI csv pretty printer that uses column styling to maximize viewer enjoyment.
Stars: ✭ 1,763 (-25.3%)
tooltip
[DEPRECATED] The tooltip that has all the right moves
Stars: ✭ 133 (-94.36%)
spark-vcf
Spark VCF data source implementation for Dataframes
Stars: ✭ 15 (-99.36%)
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (-98.35%)
gan deeplearning4j
Automatic feature engineering using Generative Adversarial Networks using Deeplearning4j and Apache Spark.
Stars: ✭ 19 (-99.19%)
automile-php
Automile offers a simple, smart, cutting-edge telematics solution for businesses to track and manage their business vehicles.
Stars: ✭ 28 (-98.81%)
FlameStream
Distributed stream processing model and its implementation
Stars: ✭ 14 (-99.41%)
avit-da2k
💲 oh-my-zsh theme based on avit theme
Stars: ✭ 15 (-99.36%)
CS Book
🔥 Latest computer science e-books. Offers downloads of the latest technical e-books. "All I want is to outcompete every one of you, or be outcompeted by you!"
Stars: ✭ 40 (-98.31%)
lubeck
High level linear algebra library for Dlang
Stars: ✭ 57 (-97.58%)
ngm
swissgeol.ch gives you insight into geoscientific data, above and below the surface.
Stars: ✭ 23 (-99.03%)
HTAPBench
Benchmark suite to evaluate HTAP database engines
Stars: ✭ 15 (-99.36%)
nifi
Deploy a secured, clustered, auto-scaling NiFi service in AWS.
Stars: ✭ 37 (-98.43%)
scipp
Multi-dimensional data arrays with labeled dimensions
Stars: ✭ 55 (-97.67%)