A complete example of a big data application using: Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, Node.js, Angular, GraphQL

Stars: ✭ 177 (+555.56%)

Mutual labels: big-data

incubator-tez

Mirror of Apache Tez (Incubating)

Stars: ✭ 60 (+122.22%)

Mutual labels: big-data

Keyvi

Keyvi - a key value index that powers Cliqz search engine. It is an in-memory FST-based data structure highly optimized for size and lookup performance.

Stars: ✭ 171 (+533.33%)

Mutual labels: big-data

awesome-coder-resources

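A pit stop on your programming journey! [Continuously updated; stars welcome, come back often.] [Contents: programming, learning, and reading resources, open-source projects, interview questions, websites, books, blogs, tutorials, and more]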

Stars: ✭ 54 (+100%)

Mutual labels: big-data

Geopyspark

GeoTrellis for PySpark

Stars: ✭ 167 (+518.52%)

Mutual labels: big-data

Social-Network-Analysis-in-Python

Social Network Facebook Analysis (Python, Networkx)

Stars: ✭ 26 (-3.7%)

Mutual labels: big-data

Fluo

Apache Fluo

Stars: ✭ 159 (+488.89%)

Mutual labels: big-data

mit-6.824-distributed-systems

Template repository to work on the labs from MIT 6.824 Distributed Systems course.

Stars: ✭ 48 (+77.78%)

Mutual labels: mapreduce

Usql

U-SQL Examples and Issue Tracking

Stars: ✭ 221 (+718.52%)

Mutual labels: big-data

Just Dashboard

📊 📋 Dashboards using YAML or JSON files

Stars: ✭ 1,511 (+5496.3%)

Mutual labels: big-data

Geni

A Clojure dataframe library that runs on Spark

Stars: ✭ 152 (+462.96%)

Mutual labels: big-data

predictionio-sdk-ruby

PredictionIO Ruby SDK

Stars: ✭ 192 (+611.11%)

Mutual labels: big-data

Datasciencevm

Tools and Docs on the Azure Data Science Virtual Machine (http://aka.ms/dsvm)

Stars: ✭ 153 (+466.67%)

Mutual labels: big-data

couchdb-pkg

Apache CouchDB Packaging support files

Stars: ✭ 24 (-11.11%)

Mutual labels: big-data

Spark With Python

Fundamentals of Spark with Python (using PySpark), code examples

Stars: ✭ 150 (+455.56%)

Mutual labels: big-data

acousticbrainz-server

The server components for the AcousticBrainz project

Stars: ✭ 128 (+374.07%)

Mutual labels: big-data

100daysofmlcode

My journey to learn and grow in the domain of Machine Learning and Artificial Intelligence by performing the #100DaysofMLCode Challenge.

Stars: ✭ 146 (+440.74%)

Mutual labels: big-data

awesome-tools

curated list of awesome tools and libraries for specific domains

Stars: ✭ 31 (+14.81%)

Mutual labels: big-data

Metamodel

Mirror of Apache Metamodel

Stars: ✭ 143 (+429.63%)

Mutual labels: big-data

predictionio-template-recommender

PredictionIO Recommendation Engine Template (Scala-based parallelized engine)

Stars: ✭ 80 (+196.3%)

Mutual labels: big-data

Big Data Study

🐳 big data study

Stars: ✭ 141 (+422.22%)

Mutual labels: big-data

Quantitative-Big-Imaging-2018

(Latest semester at https://github.com/kmader/Quantitative-Big-Imaging-2019) The material for the Quantitative Big Imaging course at ETHZ for the Spring Semester 2018

Stars: ✭ 50 (+85.19%)

Mutual labels: big-data

Eel Sdk

Big Data Toolkit for the JVM

Stars: ✭ 140 (+418.52%)

Mutual labels: big-data

Koalas

Koalas: pandas API on Apache Spark

Stars: ✭ 3,044 (+11174.07%)

Mutual labels: big-data

Sparkling Graph

SparklingGraph provides an easy-to-use set of features that let you process large-scale graphs using Spark and GraphX.

Stars: ✭ 139 (+414.81%)

Mutual labels: big-data

pg-logical-replication

PostgreSQL Logical Replication client for node.js

Stars: ✭ 56 (+107.41%)

Mutual labels: cdc

Spark On Lambda

Apache Spark on AWS Lambda

Stars: ✭ 137 (+407.41%)

Mutual labels: big-data

Cboard

An easy to use, self-service open BI reporting and BI dashboard platform.

Stars: ✭ 2,795 (+10251.85%)

Mutual labels: big-data

Attic Apex Malhar

Mirror of Apache Apex malhar

Stars: ✭ 131 (+385.19%)

Mutual labels: big-data

mmtf-spark

Methods for the parallel and distributed analysis and mining of the Protein Data Bank using MMTF and Apache Spark.

Stars: ✭ 20 (-25.93%)

Mutual labels: big-data

Calcite Avatica

Mirror of Apache Calcite - Avatica

Stars: ✭ 130 (+381.48%)

Mutual labels: big-data

Hyperspace

An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.

Stars: ✭ 246 (+811.11%)

Mutual labels: big-data

Gaffer

A large-scale entity and relation database supporting aggregation of properties

Stars: ✭ 1,642 (+5981.48%)

Mutual labels: big-data

javaer-mind

Mind maps for Java programmers' advanced study

Stars: ✭ 66 (+144.44%)

Mutual labels: big-data

Tajo

Mirror of Apache Tajo

Stars: ✭ 128 (+374.07%)

Mutual labels: big-data

Trafodion

Apache Trafodion

Stars: ✭ 242 (+796.3%)

Mutual labels: big-data

Feast

Feature Store for Machine Learning

Stars: ✭ 2,576 (+9440.74%)

Mutual labels: big-data

Clustering4Ever

C4E, a JVM friendly library written in Scala for both local and distributed (Spark) Clustering.

Stars: ✭ 126 (+366.67%)

Mutual labels: big-data

Richdem

High-performance Terrain and Hydrology Analysis

Stars: ✭ 127 (+370.37%)

Mutual labels: big-data

Selinon

An advanced distributed task flow management on top of Celery

Stars: ✭ 237 (+777.78%)

Mutual labels: big-data

Hazelcast Nodejs Client

Hazelcast IMDG Node.js Client

Stars: ✭ 124 (+359.26%)

Mutual labels: big-data

lidbox

End-to-end spoken language identification out of the box.

Stars: ✭ 39 (+44.44%)

Mutual labels: big-data

Scala Spark Tutorial

Project for James' Apache Spark with Scala course

Stars: ✭ 121 (+348.15%)

Mutual labels: big-data

Books

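A collection of books covering C & C++, git, Java, Keras, Linux, NLP, Python, Scala, TensorFlow, big data, recommender systems, databases, data mining, machine learning, deep learning, algorithms, and more.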

Stars: ✭ 222 (+722.22%)

Mutual labels: big-data

Hdfs Shell

HDFS Shell is an HDFS manipulation tool for working with the functions integrated in Hadoop DFS

Stars: ✭ 117 (+333.33%)

Mutual labels: big-data

predictionio-sdk-python

PredictionIO Python SDK

Stars: ✭ 199 (+637.04%)

Mutual labels: big-data

Cmak

CMAK is a tool for managing Apache Kafka clusters

Stars: ✭ 10,544 (+38951.85%)

Mutual labels: big-data

Nakedtensor

Bare-bones examples of machine learning in TensorFlow

Stars: ✭ 2,443 (+8948.15%)

Mutual labels: big-data

bigdata-doc

Big data study notes, learning roadmaps, and a collection of technical case studies.

Stars: ✭ 37 (+37.04%)

Mutual labels: mapreduce

Pythondata

repo for code published on pythondata.com

Stars: ✭ 113 (+318.52%)

Mutual labels: big-data

accumulo-docker

Apache Accumulo Docker

Stars: ✭ 17 (-37.04%)

Mutual labels: big-data

Awkward 0.x

Manipulate arrays of complex data structures as easily as Numpy.

Stars: ✭ 216 (+700%)

Mutual labels: big-data

Sparkrdma

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

Stars: ✭ 215 (+696.3%)

Mutual labels: big-data

Gimel

Big Data Processing Framework - Unified Data API or SQL on Any Storage

Stars: ✭ 216 (+700%)

Mutual labels: big-data

data-viz-utils

Functions for easily making publication-quality figures with matplotlib.

Stars: ✭ 16 (-40.74%)

Mutual labels: big-data
