A one-stop repo for looking up code snippets on core Java concepts, SQL, data structures, and big data. It also includes interview questions asked in real life.

Stars: ✭ 25 (-79.17%)

Mutual labels: spark-streaming

Waimak

Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.

Stars: ✭ 60 (-50%)

Mutual labels: spark

open-stream-processing-benchmark

This repository contains the code base for the Open Stream Processing Benchmark.

Stars: ✭ 37 (-69.17%)

Mutual labels: spark-streaming

Useractionanalyzeplatform

Big data platform for e-commerce user behavior analysis

Stars: ✭ 645 (+437.5%)

Mutual labels: spark

Tweet-Analysis-With-Kafka-and-Spark

A real-time analytics dashboard for analyzing trending hashtags and @mentions at any location using Kafka and Spark Streaming.

Stars: ✭ 18 (-85%)

Mutual labels: spark-streaming

Spark Lucenerdd

Spark RDD with Lucene's query and entity linkage capabilities

Stars: ✭ 114 (-5%)

Mutual labels: spark

Spark ALS

Recommendation algorithm implementations based on spark-ml, spark-mllib, and spark-streaming

Stars: ✭ 89 (-25.83%)

Mutual labels: spark-streaming

Freestyle

A cohesive & pragmatic framework of FP centric Scala libraries

Stars: ✭ 627 (+422.5%)

Mutual labels: spark

fdp-modelserver

An umbrella project for multiple implementations of model serving

Stars: ✭ 47 (-60.83%)

Mutual labels: spark-streaming

Zemberek Nlp Server

REST Docker server built on the Zemberek Turkish NLP Java library

Stars: ✭ 60 (-50%)

Mutual labels: spark

bigdatatutorial

Stars: ✭ 34 (-71.67%)

Mutual labels: spark-streaming

H2o 3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+4613.33%)

Mutual labels: spark

Spark Nlp Models

Models and Pipelines for the Spark NLP library

Stars: ✭ 88 (-26.67%)

Mutual labels: spark

Zeppelin

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Stars: ✭ 5,513 (+4494.17%)

Mutual labels: spark

basin

Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser.

Stars: ✭ 25 (-79.17%)

Mutual labels: spark

Hyperspace

An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.

Stars: ✭ 246 (+105%)

Mutual labels: spark

Rumble

⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more

Stars: ✭ 58 (-51.67%)

Mutual labels: spark

Dpark

Python clone of Spark, a MapReduce-like framework in Python

Stars: ✭ 2,668 (+2123.33%)

Mutual labels: spark

Alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud

Stars: ✭ 5,379 (+4382.5%)

Mutual labels: spark

Video Stream Analytics

Stars: ✭ 240 (+100%)

Mutual labels: spark

Seldon Server

Machine Learning Platform and Recommendation Engine built on Kubernetes

Stars: ✭ 1,435 (+1095.83%)

Mutual labels: spark

Azure Event Hubs

☁️ Cloud-scale telemetry ingestion from any stream of data with Azure Event Hubs

Stars: ✭ 233 (+94.17%)

Mutual labels: spark

Streaming Readings

Papers and reading materials related to streaming systems

Stars: ✭ 554 (+361.67%)

Mutual labels: spark-streaming

Installations mac ubuntu windows

Installations for Data Science. Anaconda, RStudio, Spark, TensorFlow, AWS (Amazon Web Services).

Stars: ✭ 231 (+92.5%)

Mutual labels: spark

Model Serving Tutorial

Code and presentation for Strata Model Serving tutorial

Stars: ✭ 57 (-52.5%)

Mutual labels: spark

Spark.fish

▁▂▄▆▇█▇▆▄▂▁

Stars: ✭ 229 (+90.83%)

Mutual labels: spark

Justenoughscalaforspark

A tutorial on the most important features and idioms of Scala that you need to use Spark's Scala APIs.

Stars: ✭ 538 (+348.33%)

Mutual labels: spark

Ruby Spark

Ruby wrapper for Apache Spark

Stars: ✭ 221 (+84.17%)

Mutual labels: spark

Laravel Spark Google2fa

Google Authenticator support for Laravel Spark

Stars: ✭ 86 (-28.33%)

Mutual labels: spark

Mlfeature

Feature engineering toolkit for Spark MLlib.

Stars: ✭ 12 (-90%)

Mutual labels: spark

daf-kylo

Kylo integration with PDND (previously DAF).

Stars: ✭ 20 (-83.33%)

Mutual labels: spark

Net.jgp.labs.spark

Apache Spark examples exclusively in Java

Stars: ✭ 55 (-54.17%)

Mutual labels: spark

dllib

dllib is a distributed deep learning library running on Apache Spark

Stars: ✭ 32 (-73.33%)

Mutual labels: spark

Labs

Research on distributed systems

Stars: ✭ 73 (-39.17%)

Mutual labels: spark

Mare

MaRe leverages the power of Docker and Spark to run and scale your serial tools in MapReduce fashion.

Stars: ✭ 11 (-90.83%)

Mutual labels: spark

Spotify-Song-Recommendation-ML

UC Berkeley team's submission for RecSys Challenge 2018

Stars: ✭ 70 (-41.67%)

Mutual labels: spark

Magellan

Geo Spatial Data Analytics on Spark

Stars: ✭ 507 (+322.5%)

Mutual labels: spark

Javaorbigdata Interview

A collection of interview knowledge points for Java developers and big data developers

Stars: ✭ 203 (+69.17%)

Mutual labels: spark

Elephas

Distributed Deep learning with Keras & Spark

Stars: ✭ 1,521 (+1167.5%)

Mutual labels: spark

Spark Practice

Apache Spark (PySpark) Practice on Real Data

Stars: ✭ 200 (+66.67%)

Mutual labels: spark

Pdf

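Programming e-books, e-books, and programming books, covering C, C#, Docker, Elasticsearch, Git, Hadoop, HeadFirst, Java, Javascript, jvm, Kafka, Linux, Maven, MongoDB, MyBatis, MySQL, Netty, Nginx, Python, RabbitMQ, Redis, Scala, Solr, Spark, Spring, SpringBoot, SpringCloud, TCPIP, Tomcat, Zookeeper, artificial intelligence, big data, concurrent programming, databases, data mining, new interview questions, architecture design, algorithms, computer science, design patterns, software testing, refactoring and optimization, and more categories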

Stars: ✭ 12,009 (+9907.5%)

Mutual labels: spark

Scanns

A scalable nearest neighbor search library in Apache Spark

Stars: ✭ 190 (+58.33%)

Mutual labels: spark

Docker Hadoop

A Docker container with a full Hadoop cluster setup with Spark and Zeppelin

Stars: ✭ 54 (-55%)

Mutual labels: spark

Azuredatabricksbestpractices

Version 1 of Technical Best Practices of Azure Databricks based on real world Customer and Technical SME inputs

Stars: ✭ 186 (+55%)

Mutual labels: spark

Bdp Dataplatform

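Data platform for big data ecosystem solutions: a big data solution built on big data, data platforms, microservices, machine learning, e-commerce, automated operations, DevOps, a container deployment platform, and data platform collection, storage, computation, development, and application building.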

Stars: ✭ 456 (+280%)

Mutual labels: spark

spark learning

Spark learning materials from the latest 2019 edition of the 尚硅谷 big data Spark course

Stars: ✭ 42 (-65%)

Mutual labels: spark

Sparkjni

A heterogeneous Apache Spark framework.

Stars: ✭ 11 (-90.83%)

Mutual labels: spark

spark-data-sources

Developing Spark External Data Sources using the V2 API

Stars: ✭ 36 (-70%)

Mutual labels: spark

prosto

Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby

Stars: ✭ 54 (-55%)

Mutual labels: spark

Bigdataie

A collection of big data blog posts, written test questions, tutorials, projects, and interview experiences

Stars: ✭ 445 (+270.83%)

Mutual labels: spark

Elassandra

Elassandra = Elasticsearch + Apache Cassandra

Stars: ✭ 1,610 (+1241.67%)

Mutual labels: spark

Cube.js

📊 Cube — Open-Source Analytics API for Building Data Apps