❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++, etc.

Stars: ✭ 95 (-99.47%)

Mutual labels: big-data

cassandra.realtime

Different ways to process data into Cassandra in realtime with technologies such as Kafka, Spark, Akka, Flink

Stars: ✭ 25 (-99.86%)

Mutual labels: flink

aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Stars: ✭ 111 (-99.38%)

Mutual labels: big-data

MLBD

Materials for "Machine Learning on Big Data" course

Stars: ✭ 20 (-99.89%)

Mutual labels: big-data

predictionio-template-similar-product

PredictionIO Similar Product Engine Template (Scala-based parallelized engine)

Stars: ✭ 50 (-99.72%)

Mutual labels: big-data

ByteSlice

"Byteslice: Pushing the envelop of main memory data processing with a new storage layout" (SIGMOD'15)

Stars: ✭ 24 (-99.87%)

Mutual labels: big-data

Knowage Server

Knowage is the professional open source suite for modern business analytics over traditional sources and big data systems.

Stars: ✭ 276 (-98.45%)

Mutual labels: big-data

Movies-Analytics-in-Spark-and-Scala

Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.

Stars: ✭ 47 (-99.74%)

Mutual labels: big-data

hotmap

WebGL Heatmap Viewer for Big Data and Bioinformatics

Stars: ✭ 13 (-99.93%)

Mutual labels: big-data

meetups-archivos

Slides, code, and videos from the meetups, data science days, video calls, and workshops. Data Science Research is a non-profit organization that seeks to spread and decentralize knowledge of Data Science and Artificial Intelligence in Peru, giving opportunities to new talent through MeetUps, Workshops, and Semilleros …

Stars: ✭ 60 (-99.66%)

Mutual labels: big-data

predictionio-template-java-ecom-recommender

PredictionIO E-Commerce Recommendation Engine Template (Java-based parallelized engine)

Stars: ✭ 36 (-99.8%)

Mutual labels: big-data

bigquery-kafka-connect

☁️ Node.js Kafka Connect connector for Google BigQuery

Stars: ✭ 17 (-99.9%)

Mutual labels: big-data

egis

Egis - a handy Ruby interface for AWS Athena

Stars: ✭ 38 (-99.79%)

Mutual labels: big-data

arrow-datafusion

Apache Arrow DataFusion SQL Query Engine

Stars: ✭ 2,360 (-86.73%)

Mutual labels: big-data

Succinct

Enabling queries on compressed data.

Stars: ✭ 257 (-98.55%)

Mutual labels: big-data

flink-training-troubleshooting

No description or website provided.

Stars: ✭ 41 (-99.77%)

Mutual labels: flink

pytorch kmeans

Implementation of the k-means algorithm in PyTorch that works for large datasets

Stars: ✭ 38 (-99.79%)

Mutual labels: big-data

insightedge

InsightEdge Core

Stars: ✭ 22 (-99.88%)

Mutual labels: big-data

leaflet heatmap

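A simple visualization of Huzhou call data. Assuming the data volume is too large for the browser to draw a heatmap directly, the heatmap rendering step is moved to offline computation and analysis. After the data is computed in parallel with Apache Spark, Apache Spark is also used to draw the heatmap, and leafletjs then loads the OpenStreetMap layer and the heatmap layer for good interactivity. In the current Apache Spark implementation, the parallel computation is slower than single-machine computation, possibly because Apache Spark is not well suited to this kind of computation or because I did not design the algorithm well. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .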

Stars: ✭ 13 (-99.93%)

Mutual labels: big-data

incubator-liminal

Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language to build ML workflows on top of Apache Airflow.

Stars: ✭ 117 (-99.34%)

Mutual labels: big-data

big-sorter

Java library that sorts very large files of records by splitting into smaller sorted files and merging

Stars: ✭ 49 (-99.72%)

Mutual labels: big-data

nebula

A distributed block-based data storage and compute engine

Stars: ✭ 127 (-99.29%)

Mutual labels: big-data

Trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Stars: ✭ 4,581 (-74.24%)

Mutual labels: big-data

sparkucx

A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing the UCX communication layer

Stars: ✭ 32 (-99.82%)

Mutual labels: big-data

big data

A collection of tutorials on Hadoop, MapReduce, Spark, Docker

Stars: ✭ 34 (-99.81%)

Mutual labels: big-data

coolplayflink

Flink: Stateful Computations over Data Streams

Stars: ✭ 14 (-99.92%)

Mutual labels: flink

lens

Mirror of Apache Lens

Stars: ✭ 57 (-99.68%)

Mutual labels: big-data

rastercube

rastercube is a python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)

Stars: ✭ 15 (-99.92%)

Mutual labels: big-data

bftkv

A distributed key-value store that tolerates Byzantine faults.

Stars: ✭ 27 (-99.85%)

Mutual labels: big-data

airavata-php-gateway

Mirror of Apache Airavata PHP Gateway

Stars: ✭ 15 (-99.92%)

Mutual labels: big-data

bandar-log

Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.

Stars: ✭ 20 (-99.89%)

Mutual labels: big-data

azure-big-data-starter

A boilerplate project for Azure Big Data PaaS services

Stars: ✭ 13 (-99.93%)

Mutual labels: big-data

2018-flink-forward-china

Records of the first Flink Forward China (2018): video recordings | document records | More than streaming

Stars: ✭ 25 (-99.86%)

Mutual labels: flink

beam-site

Apache Beam Site

Stars: ✭ 28 (-99.84%)

Mutual labels: big-data

flink-crawler

Continuous scalable web crawler built on top of Flink and crawler-commons

Stars: ✭ 48 (-99.73%)

Mutual labels: flink

scarf

Toolkit for highly memory-efficient analysis of single-cell RNA-Seq, scATAC-Seq and CITE-Seq data. Analyze atlas-scale datasets with millions of cells on a laptop.

Stars: ✭ 54 (-99.7%)

Mutual labels: big-data

v6.dooring.public

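A large-screen visualization dashboard solution that provides a visual editing engine, helping individuals or enterprises easily build their own customized large-screen visualization applications.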

Stars: ✭ 323 (-98.18%)

Mutual labels: big-data

IoT-system-PLC-data-to-InfluxDB

The aim of this project is to provide free software that fetches data from PLCs (Siemens S7-300/400/1200/1500) and stores it. The stack used is completely open source. InfluxDB serves as the data storage, so the application follows the Big Data paradigm.

Stars: ✭ 26 (-99.85%)

Mutual labels: big-data

Attic Predictionio Sdk Php

PredictionIO PHP SDK

Stars: ✭ 272 (-98.47%)

Mutual labels: big-data

spark-root

Apache Spark Data Source for ROOT File Format

Stars: ✭ 28 (-99.84%)

Mutual labels: big-data

fb scraper

FBLYZE is a Facebook scraping and analysis system.