Slides, code, and videos from the meetups, data science days, video calls, and workshops. Data Science Research is a non-profit organization that seeks to disseminate and decentralize knowledge of Data Science and Artificial Intelligence in Peru, giving opportunities to new talent through MeetUps, Workshops, and Semilleros …

Stars: ✭ 60 (+15.38%)

Mutual labels: big-data

big-sorter

Java library that sorts very large files of records by splitting into smaller sorted files and merging

Stars: ✭ 49 (-5.77%)

Mutual labels: big-data

incubator-liminal

Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language to build ML workflows on top of Apache Airflow.

Stars: ✭ 117 (+125%)

Mutual labels: big-data

couchdb-couch-plugins

Mirror of Apache CouchDB

Stars: ✭ 14 (-73.08%)

Mutual labels: big-data

Pylians3

Libraries to analyze numerical simulations (python3)

Stars: ✭ 35 (-32.69%)

Mutual labels: density-estimation

classifai

🔥 One of the most comprehensive open-source data annotation platforms.

Stars: ✭ 99 (+90.38%)

Mutual labels: big-data

big-data-lite

Samples for the Oracle Big Data Lite VM

Stars: ✭ 41 (-21.15%)

Mutual labels: big-data

OnlineStatsBase.jl

Base types for OnlineStats.

Stars: ✭ 26 (-50%)

Mutual labels: big-data

egis

Egis - a handy Ruby interface for AWS Athena

Stars: ✭ 38 (-26.92%)

Mutual labels: big-data

Movies-Analytics-in-Spark-and-Scala

Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.

Stars: ✭ 47 (-9.62%)

Mutual labels: big-data

predictionio-sdk-php

PredictionIO PHP SDK

Stars: ✭ 269 (+417.31%)

Mutual labels: big-data

bigquery-kafka-connect

☁️ nodejs kafka connect connector for Google BigQuery

Stars: ✭ 17 (-67.31%)

Mutual labels: big-data

awesome-AI-kubernetes

❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc

Stars: ✭ 95 (+82.69%)

Mutual labels: big-data

insightedge

InsightEdge Core

Stars: ✭ 22 (-57.69%)

Mutual labels: big-data

couchdb-mango

Mirror of Apache CouchDB Mango

Stars: ✭ 34 (-34.62%)

Mutual labels: big-data

nebula

A distributed block-based data storage and compute engine

Stars: ✭ 127 (+144.23%)

Mutual labels: big-data

big data

A collection of tutorials on Hadoop, MapReduce, Spark, Docker

Stars: ✭ 34 (-34.62%)

Mutual labels: big-data

sparkucx

A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer

Stars: ✭ 32 (-38.46%)

Mutual labels: big-data

clusterdock

clusterdock is a framework for creating Docker-based container clusters

Stars: ✭ 26 (-50%)

Mutual labels: big-data

siembol

An open-source, real-time Security Information & Event Management tool based on big data technologies, providing a scalable, advanced security analytics framework.

Stars: ✭ 153 (+194.23%)

Mutual labels: big-data

opendc

Collaborative Datacenter Simulation and Exploration for Everybody

Stars: ✭ 40 (-23.08%)

Mutual labels: big-data

bftkv

A distributed key-value store that tolerates Byzantine faults.

Stars: ✭ 27 (-48.08%)

Mutual labels: big-data

subsemble

subsemble R package for ensemble learning on subsets of data

Stars: ✭ 40 (-23.08%)

Mutual labels: big-data

pypar

Efficient and scalable parallelism using the message passing interface (MPI) to handle big data and highly computational problems.

Stars: ✭ 66 (+26.92%)

Mutual labels: big-data

SynapseML

Simple and Distributed Machine Learning

Stars: ✭ 3,355 (+6351.92%)

Mutual labels: big-data

SMC.jl

Sequential Monte Carlo algorithm for approximation of posterior distributions.

Stars: ✭ 53 (+1.92%)

Mutual labels: online-algorithms

MLBD

Materials for "Machine Learning on Big Data" course

Stars: ✭ 20 (-61.54%)

Mutual labels: big-data

alluxio-py

Alluxio Python client - Access Any Data Source with Python

Stars: ✭ 18 (-65.38%)

Mutual labels: big-data

Big-Data-Demo

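A data visualization showcase project based on Vue, three.js, and echarts, including features such as 3D model import and interaction, and 3D model annotation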

Stars: ✭ 146 (+180.77%)

Mutual labels: big-data

falcon

Mirror of Apache Falcon

Stars: ✭ 95 (+82.69%)

Mutual labels: big-data

talaria

TalariaDB is a distributed, highly available, and low latency time-series database for Presto

Stars: ✭ 148 (+184.62%)

Mutual labels: big-data

pytorch kmeans

Implementation of the k-means algorithm in PyTorch that works for large datasets

Stars: ✭ 38 (-26.92%)

Mutual labels: big-data

xcast

A High-Performance Data Science Toolkit for the Earth Sciences

Stars: ✭ 28 (-46.15%)

Mutual labels: big-data

SparkProgrammingInScala

Apache Spark Course Material

Stars: ✭ 57 (+9.62%)

Mutual labels: big-data

arrow-datafusion

Apache Arrow DataFusion SQL Query Engine

Stars: ✭ 2,360 (+4438.46%)

Mutual labels: big-data

hadoop-data-ingestion-tool

OLAP and ETL of Big Data

Stars: ✭ 17 (-67.31%)

Mutual labels: big-data

SGDLibrary

MATLAB/Octave library for stochastic optimization algorithms: Version 1.0.20

Stars: ✭ 165 (+217.31%)

Mutual labels: big-data

FTRLProximal

R package for online training of regression models using FTRL Proximal

Stars: ✭ 12 (-76.92%)

Mutual labels: online-algorithms

cloudberry

Big Data Visualization

Stars: ✭ 89 (+71.15%)

Mutual labels: big-data

pyspark-cheatsheet

PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster

Stars: ✭ 115 (+121.15%)

Mutual labels: big-data

kscore

Nonparametric Score Estimators, ICML 2020

Stars: ✭ 32 (-38.46%)

Mutual labels: density-estimation

wrangler

Wrangler Transform: A DMD system for transforming Big Data

Stars: ✭ 63 (+21.15%)

Mutual labels: big-data

beekeeper

Service for automatically managing and cleaning up unreferenced data

Stars: ✭ 43 (-17.31%)

Mutual labels: big-data

predictionio-template-similar-product

PredictionIO Similar Product Engine Template (Scala-based parallelized engine)

Stars: ✭ 50 (-3.85%)

Mutual labels: big-data

predictionio

PredictionIO, a machine learning server for developers and ML engineers.

Stars: ✭ 12,510 (+23957.69%)

Mutual labels: big-data

rastercube

rastercube is a python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)

Stars: ✭ 15 (-71.15%)

Mutual labels: big-data

gradient-boosted-normalizing-flows

We got a stew going!

Stars: ✭ 20 (-61.54%)

Mutual labels: density-estimation

check-engine

Data validation library for PySpark 3.0.0

Stars: ✭ 29 (-44.23%)

Mutual labels: big-data

predictionio-template-attribute-based-classifier

PredictionIO Classification Engine Template (Scala-based parallelized engine)

Stars: ✭ 38 (-26.92%)

Mutual labels: big-data

vxquery

Mirror of Apache VXQuery

Stars: ✭ 19 (-63.46%)

Mutual labels: big-data

hotmap

WebGL Heatmap Viewer for Big Data and Bioinformatics

Stars: ✭ 13 (-75%)

Mutual labels: big-data

hyper-engine

Python library for Bayesian hyper-parameters optimization

Stars: ✭ 80 (+53.85%)

Mutual labels: big-data

delfi

Density estimation likelihood-free inference. No longer actively developed; see https://github.com/mackelab/sbi instead.

Stars: ✭ 66 (+26.92%)

Mutual labels: density-estimation

1-60 of 392 similar projects
