parquet2: Fastest and safest Rust implementation of Parquet. `unsafe`-free. Integration-tested against pyarrow
Stars: ✭ 157 (-43.12%)
lens: Mirror of Apache Lens
Stars: ✭ 57 (-79.35%)
wrangler: Wrangler Transform, a DMD system for transforming Big Data
Stars: ✭ 63 (-77.17%)
logging: Generic file logger for .NET Core (FileLoggerProvider) with minimal dependencies
Stars: ✭ 109 (-60.51%)
predictionio: PredictionIO, a machine learning server for developers and ML engineers
Stars: ✭ 12,510 (+4432.61%)
HybridBackend: Efficient training of deep recommenders in the cloud
Stars: ✭ 30 (-89.13%)
clusterdock: A framework for creating Docker-based container clusters
Stars: ✭ 26 (-90.58%)
Datahub: The Metadata Platform for the Modern Data Stack
Stars: ✭ 4,232 (+1433.33%)
check-engine: Data validation library for PySpark 3.0.0
Stars: ✭ 29 (-89.49%)
Spark: Apache Spark is a fast, in-memory data processing engine with an elegant and expressive development API that lets data workers efficiently execute streaming, machine learning, or SQL workloads requiring fast iterative access to datasets. This project contains sample Spark programs written in Scala.
Stars: ✭ 55 (-80.07%)
CleanArchitecture: ASP.NET Core 6 Web API Clean Architecture solution template
Stars: ✭ 312 (+13.04%)
classifai: 🔥 One of the most comprehensive open-source data annotation platforms
Stars: ✭ 99 (-64.13%)
spark-utils: Basic framework utilities to quickly start writing production-ready Apache Spark applications
Stars: ✭ 25 (-90.94%)
Equinox: .NET Event Sourcing library with CosmosDB, EventStoreDB, SqlStreamStore and integration test backends. Focused at the stream level; see https://github.com/jet/propulsion for cross-stream projections/subscriptions/reactions
Stars: ✭ 260 (-5.8%)
alluxio-py: Alluxio Python client - Access Any Data Source with Python
Stars: ✭ 18 (-93.48%)
SparkTwitterAnalysis: An Apache Spark standalone application using the Spark API in Scala. The application uses the Simple Build Tool (SBT) to build the project.
Stars: ✭ 29 (-89.49%)
connected-component: MapReduce implementation of Connected Components on Apache Spark
Stars: ✭ 68 (-75.36%)
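The entry above names a classic graph algorithm. As a rough illustration of what the MapReduce formulation computes (a plain-Python sketch, not the repo's actual Spark code), each vertex repeatedly adopts the smallest label among itself and its neighbours until nothing changes; the map step emits labels to neighbours and the reduce step keeps the minimum:

```python
def connected_components(edges):
    """Label-propagation sketch: iterate to the same fixpoint the
    MapReduce/Spark version reaches, where every vertex ends up
    labelled with the smallest vertex id in its component."""
    adj, label = {}, {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
        label[u], label[v] = u, v
    changed = True
    while changed:
        changed = False
        for u, neighbours in adj.items():
            # "map": collect candidate labels; "reduce": keep the minimum
            best = min([label[u]] + [label[v] for v in neighbours])
            if best < label[u]:
                label[u] = best
                changed = True
    return label
```

On Spark the same fixpoint is reached with a few shuffle rounds instead of a sequential sweep, which is what makes it viable for graphs that do not fit on one machine.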
HAL-9000: Automatically set up a productive development environment with Ansible on macOS
Stars: ✭ 72 (-73.91%)
awesome-AI-kubernetes: ❄️ 🐳 Awesome tools and libraries for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing, built to run natively on Kubernetes and Docker, with Python, R, Scala, Java, C#, Go, Julia, C++ etc.
Stars: ✭ 95 (-65.58%)
MLBD: Materials for the "Machine Learning on Big Data" course
Stars: ✭ 20 (-92.75%)
ByteSlice: "Byteslice: Pushing the envelop of main memory data processing with a new storage layout" (SIGMOD'15)
Stars: ✭ 24 (-91.3%)
Fusillade: An opinionated HTTP library for Mobile Development
Stars: ✭ 269 (-2.54%)
Roapi: Create full-fledged APIs for static datasets without writing a single line of code
Stars: ✭ 253 (-8.33%)
bandar-log: Monitoring tool to measure the flow throughput of data sources and processing components in data ingestion and ETL pipelines
Stars: ✭ 20 (-92.75%)
centurion: Kotlin big data toolkit
Stars: ✭ 320 (+15.94%)
Big-Data-Demo: Data visualization project built with Vue, three.js and ECharts, including 3D model import and interaction, 3D model annotation, and more
Stars: ✭ 146 (-47.1%)
Parquet.jl: Julia implementation of a reader for the Parquet columnar file format
Stars: ✭ 93 (-66.3%)
talaria: TalariaDB is a distributed, highly available, low-latency time-series database for Presto
Stars: ✭ 148 (-46.38%)
meepo: Data migration across heterogeneous storage systems
Stars: ✭ 29 (-89.49%)
meetups-archivos: Slides, code and videos from meetups, Data Science Days, video calls and workshops. Data Science Research is a non-profit organization that seeks to spread and decentralize knowledge of Data Science and Artificial Intelligence in Peru, giving opportunities to new talent through meetups, workshops and seed groups …
Stars: ✭ 60 (-78.26%)
xcast: A High-Performance Data Science Toolkit for the Earth Sciences
Stars: ✭ 28 (-89.86%)
experiments: Code examples for my blog posts
Stars: ✭ 21 (-92.39%)
hyperdrive: Extensible streaming ingestion pipeline on top of Apache Spark
Stars: ✭ 31 (-88.77%)
Spark Jupyter Aws: A guide on how to set up Jupyter with PySpark painlessly on AWS EC2 clusters, with S3 I/O support
Stars: ✭ 259 (-6.16%)
bigtable: TypeScript Bigtable client with 🔋🔋 included
Stars: ✭ 13 (-95.29%)
hotmap: WebGL Heatmap Viewer for Big Data and Bioinformatics
Stars: ✭ 13 (-95.29%)
jupyterlab-sparkmonitor: JupyterLab extension for monitoring launched Apache Spark jobs from within a notebook
Stars: ✭ 78 (-71.74%)
arrow-datafusion: Apache Arrow DataFusion SQL Query Engine
Stars: ✭ 2,360 (+755.07%)
pypar: Efficient and scalable parallelism using the Message Passing Interface (MPI) for big data and computationally intensive problems
Stars: ✭ 66 (-76.09%)
LoL-Match-Prediction: Win-probability predictions for League of Legends matches using neural networks
Stars: ✭ 34 (-87.68%)
SGDLibrary: MATLAB/Octave library for stochastic optimization algorithms (version 1.0.20)
Stars: ✭ 165 (-40.22%)
dbd: A database prototyping tool that lets data analysts and engineers quickly load and transform data in SQL databases
Stars: ✭ 30 (-89.13%)
egis: Egis, a handy Ruby interface for AWS Athena
Stars: ✭ 38 (-86.23%)
wasp: WASP is a framework for building complex real-time big data applications. It relies on a Kappa/Lambda-style architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze them through complex pipelines, this is the framework for you.
Stars: ✭ 19 (-93.12%)
pytorch kmeans: Implementation of the k-means algorithm in PyTorch that works for large datasets
Stars: ✭ 38 (-86.23%)
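The algorithm behind the entry above is Lloyd's k-means. As a minimal plain-Python sketch of what the repo implements on PyTorch tensors (this is illustrative only and not the repo's actual API; deterministic initialisation from the first k points keeps it reproducible, where real implementations typically use random or k-means++ initialisation):

```python
def kmeans(points, k, iters=20):
    """Lloyd's k-means: alternate an assignment step and an update
    step until the centers settle (here: a fixed iteration budget)."""
    centers = [tuple(p) for p in points[:k]]  # deterministic init for the sketch
    for _ in range(iters):
        # assignment step: each point joins its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # update step: move each center to the mean of its cluster
        for j, members in enumerate(clusters):
            if members:
                centers[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centers
```

Running the same two steps on GPU tensors, with distances computed as one batched matrix operation, is what makes the PyTorch version practical for large datasets.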
cloudberry: Big Data Visualization
Stars: ✭ 89 (-67.75%)
incubator-liminal: Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language for building ML workflows on top of Apache Airflow.
Stars: ✭ 117 (-57.61%)