Slides, code, and videos from the meetups, Data Science Days, video calls, and workshops. Data Science Research is a non-profit organization that seeks to spread and decentralize knowledge of Data Science and Artificial Intelligence in Peru, giving opportunities to new talent through MeetUps, Workshops, and Semilleros …

Stars: ✭ 60 (+20%)

Mutual labels: big-data

wrangler

Wrangler Transform: A DMD system for transforming Big Data

Stars: ✭ 63 (+26%)

Mutual labels: big-data

bftkv

A distributed key-value store that tolerates Byzantine faults.

Stars: ✭ 27 (-46%)

Mutual labels: big-data

LoL-Match-Prediction

Win probability predictions for League of Legends matches using neural networks

Stars: ✭ 34 (-32%)

Mutual labels: big-data

SGDLibrary

MATLAB/Octave library for stochastic optimization algorithms: Version 1.0.20

Stars: ✭ 165 (+230%)

Mutual labels: big-data

insightedge

InsightEdge Core

Stars: ✭ 22 (-56%)

Mutual labels: big-data

incubator-liminal

Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language to build ML workflows on top of Apache Airflow.

Stars: ✭ 117 (+134%)

Mutual labels: big-data

check-engine

Data validation library for PySpark 3.0.0

Stars: ✭ 29 (-42%)

Mutual labels: big-data

beekeeper

Service for automatically managing and cleaning up unreferenced data

Stars: ✭ 43 (-14%)

Mutual labels: big-data

siembol

An open-source, real-time Security Information & Event Management tool based on big data technologies, providing a scalable, advanced security analytics framework.

Stars: ✭ 153 (+206%)

Mutual labels: big-data

big-sorter

Java library that sorts very large files of records by splitting into smaller sorted files and merging

Stars: ✭ 49 (-2%)

Mutual labels: big-data

v6.dooring.public

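A large-screen data visualization solution that provides a visual editing engine, helping individuals and businesses easily build their own custom large-screen visualization applications.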

Stars: ✭ 323 (+546%)

Mutual labels: big-data

classifai

🔥 One of the most comprehensive open-source data annotation platforms.

Stars: ✭ 99 (+98%)

Mutual labels: big-data

airavata-php-gateway

Mirror of Apache Airavata PHP Gateway

Stars: ✭ 15 (-70%)

Mutual labels: big-data

CS Book

🔥 Latest computer science e-books. Provides downloads of the latest technical e-books. "All I want is to out-compete every one of you, or be out-competed by you!"

Stars: ✭ 40 (-20%)

Mutual labels: big-data

talaria

TalariaDB is a distributed, highly available, and low latency time-series database for Presto

Stars: ✭ 148 (+196%)

Mutual labels: big-data

couchdb-mango

Mirror of Apache CouchDB Mango

Stars: ✭ 34 (-32%)

Mutual labels: big-data

xcast

A High-Performance Data Science Toolkit for the Earth Sciences

Stars: ✭ 28 (-44%)

Mutual labels: big-data

hyper-engine

Python library for Bayesian hyper-parameters optimization

Stars: ✭ 80 (+60%)

Mutual labels: big-data

arrow-datafusion

Apache Arrow DataFusion SQL Query Engine

Stars: ✭ 2,360 (+4620%)

Mutual labels: big-data

couchdb-couch-plugins

Mirror of Apache CouchDB

Stars: ✭ 14 (-72%)

Mutual labels: big-data

pytorch kmeans

Implementation of the k-means algorithm in PyTorch that works for large datasets

Stars: ✭ 38 (-24%)

Mutual labels: big-data

storm-ml

An online learning algorithm library for Storm

Stars: ✭ 18 (-64%)

Mutual labels: big-data

spark-records

Bulletproof Apache Spark jobs with fast root cause analysis of failures.

Stars: ✭ 67 (+34%)

Mutual labels: big-data

cloudberry

Big Data Visualization

Stars: ✭ 89 (+78%)

Mutual labels: big-data

clusterdock

clusterdock is a framework for creating Docker-based container clusters

Stars: ✭ 26 (-48%)

Mutual labels: big-data

nebula

A distributed block-based data storage and compute engine

Stars: ✭ 127 (+154%)

Mutual labels: big-data

big-data-lite

Samples for the Oracle Big Data Lite VM

Stars: ✭ 41 (-18%)

Mutual labels: big-data

sparkucx

A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing the UCX communication layer

Stars: ✭ 32 (-36%)

Mutual labels: big-data

opendc

Collaborative Datacenter Simulation and Exploration for Everybody

Stars: ✭ 40 (-20%)

Mutual labels: big-data

rastercube

rastercube is a Python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)

Stars: ✭ 15 (-70%)

Mutual labels: big-data

pypar

Efficient and scalable parallelism using the message passing interface (MPI) to handle big data and highly computational problems.

Stars: ✭ 66 (+32%)

Mutual labels: big-data

azure-big-data-starter

A boilerplate project for Azure Big Data PaaS services

Stars: ✭ 13 (-74%)

Mutual labels: big-data

subsemble

subsemble R package for ensemble learning on subsets of data

Stars: ✭ 40 (-20%)

Mutual labels: big-data

beam-site

Apache Beam Site

Stars: ✭ 28 (-44%)

Mutual labels: big-data

falcon

Mirror of Apache Falcon

Stars: ✭ 95 (+90%)

Mutual labels: big-data

scarf

Toolkit for highly memory-efficient analysis of single-cell RNA-Seq, scATAC-Seq and CITE-Seq data. Analyze atlas-scale datasets with millions of cells on a laptop.

Stars: ✭ 54 (+8%)

Mutual labels: big-data

SynapseML

Simple and Distributed Machine Learning

Stars: ✭ 3,355 (+6610%)

Mutual labels: big-data

RemoteShuffleService

Celeborn provides an elastic and high-performance service for shuffle and spilled data.

Stars: ✭ 262 (+424%)

Mutual labels: big-data

IoT-system-PLC-data-to-InfluxDB

This project aims to provide free software that fetches data from PLCs (Siemens S7-300/400/1200/1500) and stores it. The stack is completely open source. InfluxDB is used for data storage, so the application follows the Big Data paradigm.

Stars: ✭ 26 (-48%)

Mutual labels: big-data

pyspark-cheatsheet

PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster

Stars: ✭ 115 (+130%)

Mutual labels: big-data

docker-predictionio

Docker container for PredictionIO-based machine learning services

Stars: ✭ 75 (+50%)

Mutual labels: predictionio

OnlineStatsBase.jl

Base types for OnlineStats.