A complete example of a big data application using: Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, Node.js, Angular, GraphQL

Stars: ✭ 177 (-7.81%)

Mutual labels: big-data

Setl

A simple Spark-powered ETL framework that just works 🍺

Stars: ✭ 79 (-58.85%)

Mutual labels: big-data

Richdem

High-performance Terrain and Hydrology Analysis

Stars: ✭ 127 (-33.85%)

Mutual labels: big-data

Spark Website

Apache Spark Website

Stars: ✭ 75 (-60.94%)

Mutual labels: big-data

Metamodel

Mirror of Apache Metamodel

Stars: ✭ 143 (-25.52%)

Mutual labels: big-data

Labs

Research on distributed systems

Stars: ✭ 73 (-61.98%)

Mutual labels: big-data

Hazelcast Nodejs Client

Hazelcast IMDG Node.js Client

Stars: ✭ 124 (-35.42%)

Mutual labels: big-data

My Journey In The Data Science World

📢 Ready to learn or review your knowledge!

Stars: ✭ 1,175 (+511.98%)

Mutual labels: big-data

Fluo

Apache Fluo

Stars: ✭ 159 (-17.19%)

Mutual labels: big-data

Appdocs

Application Performance Optimization Summary

Stars: ✭ 1,169 (+508.85%)

Mutual labels: big-data

Scala Spark Tutorial

Project for James' Apache Spark with Scala course

Stars: ✭ 121 (-36.98%)

Mutual labels: big-data

Carbondata

Mirror of Apache CarbonData

Stars: ✭ 1,158 (+503.13%)

Mutual labels: big-data

Big Data Study

🐳 big data study

Stars: ✭ 141 (-26.56%)

Mutual labels: big-data

Flink Shaded

Apache Flink shaded artifacts repository

Stars: ✭ 67 (-65.1%)

Mutual labels: big-data

Hdfs Shell

HDFS Shell is an HDFS manipulation tool for working with the functions integrated in Hadoop DFS

Stars: ✭ 117 (-39.06%)

Mutual labels: big-data

Cloud Volume

Read and write Neuroglancer datasets programmatically.

Stars: ✭ 63 (-67.19%)

Mutual labels: big-data

Presto Go Client

A Presto client for the Go programming language.

Stars: ✭ 183 (-4.69%)

Mutual labels: big-data

Warp

Convert and analyze large data sets at light speed, on Mac and iOS.

Stars: ✭ 62 (-67.71%)

Mutual labels: big-data

Cmak

CMAK is a tool for managing Apache Kafka clusters

Stars: ✭ 10,544 (+5391.67%)

Mutual labels: big-data

Verticapy

VerticaPy is a Python library that exposes scikit-like functionality for conducting data science projects on data stored in Vertica, taking advantage of Vertica’s speed and built-in analytics and machine learning capabilities.

Stars: ✭ 59 (-69.27%)

Mutual labels: big-data

Eel Sdk

Big Data Toolkit for the JVM

Stars: ✭ 140 (-27.08%)

Mutual labels: big-data

Ymcache

YMCache is a lightweight object caching solution for iOS and Mac OS X that is designed for highly parallel access scenarios.

Stars: ✭ 58 (-69.79%)

Mutual labels: big-data

Asakusafw

Asakusa Framework

Stars: ✭ 114 (-40.62%)

Mutual labels: big-data

Kibble 1

Apache Kibble - a tool to collect, aggregate and visualize data about any software project

Stars: ✭ 54 (-71.87%)

Mutual labels: big-data

Geni

A Clojure dataframe library that runs on Spark

Stars: ✭ 152 (-20.83%)

Mutual labels: big-data

Macro ml

Course Website on Macroeconomic Analysis with Machine Learning and Big Data

Stars: ✭ 53 (-72.4%)

Mutual labels: big-data

Pythondata

Repo for code published on pythondata.com

Stars: ✭ 113 (-41.15%)

Mutual labels: big-data

Datumbox Framework

Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.

Stars: ✭ 1,063 (+453.65%)

Mutual labels: big-data

Sparkling Graph

SparklingGraph provides an easy-to-use set of features for processing large-scale graphs using Spark and GraphX.

Stars: ✭ 139 (-27.6%)

Mutual labels: big-data

Traildb

TrailDB is an efficient tool for storing and querying series of events

Stars: ✭ 1,029 (+435.94%)

Mutual labels: big-data

Genie

Distributed Big Data Orchestration Service

Stars: ✭ 1,544 (+704.17%)

Mutual labels: big-data

Couchdb Couch

Mirror of Apache CouchDB

Stars: ✭ 43 (-77.6%)

Mutual labels: big-data

Keyvi

Keyvi - a key-value index that powers the Cliqz search engine. It is an in-memory FST-based data structure highly optimized for size and lookup performance.

Stars: ✭ 171 (-10.94%)

Mutual labels: big-data

Egads

A Java package to automatically detect anomalies in large-scale time-series data

Stars: ✭ 997 (+419.27%)

Mutual labels: big-data

Spark R Notebooks

R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 109 (-43.23%)

Mutual labels: big-data

Esper Tv

Esper instance for TV news analysis