Attic PredictionioPredictionIO, a machine learning server for developers and ML engineers.
Stars: ✭ 12,522 (+24944%)
predictionioPredictionIO, a machine learning server for developers and ML engineers.
Stars: ✭ 12,510 (+24920%)
meetups-archivosSlides, code, and videos from the meetups, data science days, video calls, and workshops. Data Science Research is a non-profit organization that seeks to spread and decentralize knowledge of Data Science and Artificial Intelligence in Peru, giving opportunities to new talent through MeetUps, Workshops, and Semilleros …
Stars: ✭ 60 (+20%)
wranglerWrangler Transform: A DMD system for transforming Big Data
Stars: ✭ 63 (+26%)
bftkvA distributed key-value store that tolerates Byzantine faults.
Stars: ✭ 27 (-46%)
LoL-Match-PredictionWin probability predictions for League of Legends matches using neural networks
Stars: ✭ 34 (-32%)
SGDLibraryMATLAB/Octave library for stochastic optimization algorithms: Version 1.0.20
Stars: ✭ 165 (+230%)
incubator-liminalApache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language to build ML workflows on top of Apache Airflow.
Stars: ✭ 117 (+134%)
check-engineData validation library for PySpark 3.0.0
Stars: ✭ 29 (-42%)
beekeeperService for automatically managing and cleaning up unreferenced data
Stars: ✭ 43 (-14%)
siembolAn open-source, real-time Security Information & Event Management tool based on big data technologies, providing a scalable, advanced security analytics framework.
Stars: ✭ 153 (+206%)
big-sorterJava library that sorts very large files of records by splitting into smaller sorted files and merging
Stars: ✭ 49 (-2%)
classifai🔥 One of the most comprehensive open-source data annotation platforms.
Stars: ✭ 99 (+98%)
CS Book🔥 Latest computer science e-books. Provides downloads of the latest technical e-books. "I simply want to out-compete all of you, or be out-competed by all of you!"
Stars: ✭ 40 (-20%)
talariaTalariaDB is a distributed, highly available, and low latency time-series database for Presto
Stars: ✭ 148 (+196%)
xcastA High-Performance Data Science Toolkit for the Earth Sciences
Stars: ✭ 28 (-44%)
hyper-enginePython library for Bayesian hyper-parameters optimization
Stars: ✭ 80 (+60%)
arrow-datafusionApache Arrow DataFusion SQL Query Engine
Stars: ✭ 2,360 (+4620%)
pytorch kmeansImplementation of the k-means algorithm in PyTorch that works for large datasets
Stars: ✭ 38 (-24%)
storm-mlan online learning algorithm library for Storm
Stars: ✭ 18 (-64%)
spark-recordsBulletproof Apache Spark jobs with fast root cause analysis of failures.
Stars: ✭ 67 (+34%)
cloudberryBig Data Visualization
Stars: ✭ 89 (+78%)
clusterdockclusterdock is a framework for creating Docker-based container clusters
Stars: ✭ 26 (-48%)
nebulaA distributed block-based data storage and compute engine
Stars: ✭ 127 (+154%)
big-data-liteSamples for the Oracle Big Data Lite VM
Stars: ✭ 41 (-18%)
sparkucxA high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, using the UCX communication layer
Stars: ✭ 32 (-36%)
opendcCollaborative Datacenter Simulation and Exploration for Everybody
Stars: ✭ 40 (-20%)
rastercuberastercube is a Python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)
Stars: ✭ 15 (-70%)
pyparEfficient and scalable parallelism using the message passing interface (MPI) to handle big data and highly computational problems.
Stars: ✭ 66 (+32%)
subsemblesubsemble R package for ensemble learning on subsets of data
Stars: ✭ 40 (-20%)
falconMirror of Apache Falcon
Stars: ✭ 95 (+90%)
scarfToolkit for highly memory-efficient analysis of single-cell RNA-Seq, scATAC-Seq and CITE-Seq data. Analyze atlas-scale datasets with millions of cells on a laptop.
Stars: ✭ 54 (+8%)
SynapseMLSimple and Distributed Machine Learning
Stars: ✭ 3,355 (+6610%)
RemoteShuffleServiceCeleborn provides an elastic and high-performance service for shuffle and spilled data.
Stars: ✭ 262 (+424%)
IoT-system-PLC-data-to-InfluxDBThis project aims to provide free software that fetches data from PLCs (Siemens S7-300/400/1200/1500) and stores it. The stack is completely open source. I used InfluxDB as the data store, so the application follows the Big Data paradigm.
Stars: ✭ 26 (-48%)
pyspark-cheatsheetPySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Stars: ✭ 115 (+130%)
docker-predictionioDocker container for PredictionIO-based machine learning services
Stars: ✭ 75 (+50%)
spark-rootApache Spark Data Source for ROOT File Format
Stars: ✭ 28 (-44%)
MLBDMaterials for "Machine Learning on Big Data" course
Stars: ✭ 20 (-60%)
dxramA distributed in-memory key-value storage for billions of small objects.
Stars: ✭ 25 (-50%)
nebulaA distributed, fast open-source graph database featuring horizontal scalability and high availability
Stars: ✭ 8,196 (+16292%)
ByteSlice"Byteslice: Pushing the envelop of main memory data processing with a new storage layout" (SIGMOD'15)
Stars: ✭ 24 (-52%)