Apache Flink is an open source project of The Apache Software Foundation (ASF). The Apache Flink project originated from the Stratosphere research project.

Stars: ✭ 17,781 (+8170.23%)

Mutual labels: big-data

Mobydq

🐳 Tool to automate data quality checks on data pipelines

Stars: ✭ 123 (-42.79%)

Mutual labels: big-data

Spark Druid Olap

Sparkline BI Accelerator provides fast ad-hoc query capability over Logical Cubes. This has been folded into our SNAP Platform (http://bit.ly/2oBJSpP), an integrated BI platform on Apache Spark.

Stars: ✭ 282 (+31.16%)

Mutual labels: spark

Flink Shaded

Apache Flink shaded artifacts repository

Stars: ✭ 67 (-68.84%)

Mutual labels: big-data

Dvid

Distributed, Versioned, Image-oriented Dataservice

Stars: ✭ 174 (-19.07%)

Mutual labels: big-data

Vue Info Card

Simple and beautiful card component with an elegant spark line, for VueJS.

Stars: ✭ 159 (-26.05%)

Mutual labels: spark

Relation extraction

Relation extraction using deep learning (CNN)

Stars: ✭ 96 (-55.35%)

Mutual labels: spark

Pyspark Example Project

Example project implementing best practices for PySpark ETL jobs and applications.

Stars: ✭ 633 (+194.42%)

Mutual labels: spark

Cloudflow

Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.

Stars: ✭ 278 (+29.3%)

Mutual labels: spark

Thingsboard

Open-source IoT Platform - Device management, data collection, processing and visualization.

Stars: ✭ 10,526 (+4795.81%)

Mutual labels: spark

Spark Infotheoretic Feature Selection

This package contains a generic implementation of greedy Information Theoretic Feature Selection (FS) methods. The implementation is based on the common theoretic framework presented by Gavin Brown. Implementations of mRMR, InfoGain, JMI and other commonly used FS filters are provided.

Stars: ✭ 123 (-42.79%)

Mutual labels: spark

Spark Bigquery

Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.

Stars: ✭ 65 (-69.77%)

Mutual labels: spark

Ldetool

Code generator for fast log file parsers

Stars: ✭ 273 (+26.98%)

Mutual labels: bigdata

Spark Tsne

Distributed t-SNE via Apache Spark

Stars: ✭ 151 (-29.77%)

Mutual labels: spark

Datavec

ETL Library for Machine Learning - data pipelines, data munging and wrangling

Stars: ✭ 272 (+26.51%)

Mutual labels: spark

W2v

Word2Vec models with Twitter data using Spark. Blog:

Stars: ✭ 64 (-70.23%)

Mutual labels: spark

Hadoop Mini Clusters

hadoop-mini-clusters provides an easy way to test Hadoop projects directly in your IDE

Stars: ✭ 265 (+23.26%)

Mutual labels: hadoop

Dynamometer

A tool for scale and performance testing of HDFS with a specific focus on the NameNode.

Stars: ✭ 122 (-43.26%)

Mutual labels: hadoop

Pysparkgeoanalysis

🌐 Interactive Workshop on GeoAnalysis using PySpark

Stars: ✭ 63 (-70.7%)

Mutual labels: spark

Facebook Hive Udfs

Facebook's Hive UDFs

Stars: ✭ 213 (-0.93%)

Mutual labels: hadoop

Data Science Career

Career Resources for Data Science, Machine Learning, Big Data and Business Analytics Career Repository

Stars: ✭ 630 (+193.02%)

Mutual labels: big-data

Freestyle

A cohesive & pragmatic framework of FP centric Scala libraries

Stars: ✭ 627 (+191.63%)

Mutual labels: spark

Tony

TonY is a framework to natively run deep learning frameworks on Apache Hadoop.

Stars: ✭ 626 (+191.16%)

Mutual labels: hadoop

Accelerator

The Accelerator is a tool for fast and reproducible processing of large amounts of data.

Stars: ✭ 137 (-36.28%)

Mutual labels: big-data

Treeviz

Tree diagrams with JavaScript 🌲 📈

Stars: ✭ 95 (-55.81%)

Mutual labels: big-data

Sdc

Intel® Scalable Dataframe Compiler for Pandas*

Stars: ✭ 623 (+189.77%)

Mutual labels: big-data

DetEdit

A graphical user interface for annotating and editing events detected in long-term acoustic monitoring data

Stars: ✭ 20 (-90.7%)

Mutual labels: bigdata

Warp

Convert and analyze large data sets at light speed, on Mac and iOS.

Stars: ✭ 62 (-71.16%)

Mutual labels: big-data

sparkProjectTemplate.g8

Template for Spark Projects

Stars: ✭ 77 (-64.19%)

Mutual labels: spark

Benchm Ml

A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).

Stars: ✭ 1,835 (+753.49%)

Mutual labels: spark

kafka-spark-streaming-zeppelin-docker

One click deploy docker-compose with Kafka, Spark Streaming, Zeppelin UI and Monitoring (Grafana + Kafka Manager)

Stars: ✭ 82 (-61.86%)

Mutual labels: spark

Silex

something to help you spark

Stars: ✭ 61 (-71.63%)

Mutual labels: spark

Zparkio

Boilerplate framework to use Spark and ZIO together.

Stars: ✭ 121 (-43.72%)

Mutual labels: spark

Data Science Cookbook

🎓 Jupyter notebooks from UFC data science course

Stars: ✭ 60 (-72.09%)

Mutual labels: spark

Javapdf

🍣 100 Java e-books and technical books in PDF (take pride in downloading and reading, not in merely liking and bookmarking)

Stars: ✭ 609 (+183.26%)

Mutual labels: hadoop

daf-kylo

Kylo integration with PDND (previously DAF).

Stars: ✭ 20 (-90.7%)

Mutual labels: spark

Spark Nlp

State of the Art Natural Language Processing

Stars: ✭ 2,518 (+1071.16%)

Mutual labels: spark

Spotify-Song-Recommendation-ML

UC Berkeley team's submission for RecSys Challenge 2018

Stars: ✭ 70 (-67.44%)

Mutual labels: spark

Verticapy

VerticaPy is a Python library that exposes scikit-like functionality to conduct data science projects on data stored in Vertica, taking advantage of Vertica’s speed and built-in analytics and machine learning capabilities.

Stars: ✭ 59 (-72.56%)

Mutual labels: big-data

Covid19 Market Waiting Times

A project to help people stand in line at the market as little as possible

Stars: ✭ 95 (-55.81%)

Mutual labels: bigdata

Kafka Streams

equivalent to kafka-streams 🐙 for nodejs ✨🐢🚀✨

Stars: ✭ 613 (+185.12%)

Mutual labels: big-data

Example Spark Kafka

Apache Spark and Apache Kafka integration example

Stars: ✭ 120 (-44.19%)

Mutual labels: spark

Attic Predictionio Sdk Python

PredictionIO Python SDK

Stars: ✭ 196 (-8.84%)

Mutual labels: big-data

Roaringbitmap

A better compressed bitset in Java

Stars: ✭ 2,460 (+1044.19%)

Mutual labels: spark

Java Notes

☕️ Java basics 👫 Object-oriented thinking ✏️ Algorithms 📝 Operating systems ☁️ Networking 💾 Databases 🙊 Spring 💡 System architecture 🐘 Big data

Stars: ✭ 160 (-25.58%)

Mutual labels: bigdata

Apache Spark Node

Node.js bindings for Apache Spark DataFrame APIs

Stars: ✭ 136 (-36.74%)

Mutual labels: spark

Dev Setup

macOS development environment setup: Easy-to-understand instructions with automated setup scripts for developer tools like Vim, Sublime Text, Bash, iTerm, Python data analysis, Spark, Hadoop MapReduce, AWS, Heroku, JavaScript web development, Android development, common data stores, and dev-based OS X defaults.

Stars: ✭ 5,590 (+2500%)

Mutual labels: spark

Wifi

Big data query and analysis system based on information captured over Wi-Fi

Stars: ✭ 93 (-56.74%)

Mutual labels: hadoop

Datafusion

DataFusion has now been donated to the Apache Arrow project

Stars: ✭ 611 (+184.19%)

Mutual labels: spark

Hbaseclient

HBase client data management software