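Regularly updated documentation for big data components commonly used in the Hadoop ecosystem. In order of focus: Flink, Solr, SparkSQL, ES, Scala, Kafka, HBase/Phoenix, Redis, Kerberos. The project includes a Hadoop mind map in Evernote, a simple Scala demo, common utility classes, and train code with sensitive data removed. Continuously updated!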

Stars: ✭ 567 (+626.92%)

Mutual labels: hadoop

Big Data Engineering Coursera Yandex

Big Data for Data Engineers Coursera Specialization from Yandex

Stars: ✭ 71 (-8.97%)

Mutual labels: spark

Spark Daria

Essential Spark extensions and helper methods ✨😲

Stars: ✭ 553 (+608.97%)

Mutual labels: spark

Vagrant Projects

Vagrant projects for various use-cases with Spark, Zeppelin, IPython / Jupyter, SparkR

Stars: ✭ 34 (-56.41%)

Mutual labels: spark

Lopq

Training of Locally Optimized Product Quantization (LOPQ) models for approximate nearest neighbor search of high dimensional data in Python and Spark.

Stars: ✭ 530 (+579.49%)

Mutual labels: spark

Zemberek Nlp Server

A REST Docker server built on the Zemberek Turkish NLP Java library

Stars: ✭ 60 (-23.08%)

Mutual labels: spark

Cdap

An open source framework for building data analytic applications.

Stars: ✭ 509 (+552.56%)

Mutual labels: spark

Akkeeper

An easy way to deploy your Akka services to a distributed environment.

Stars: ✭ 30 (-61.54%)

Mutual labels: hadoop

Bigdata

💎🔥 Big data study notes

Stars: ✭ 488 (+525.64%)

Mutual labels: hadoop

Docker Hadoop

Apache Hadoop docker image

Stars: ✭ 1,190 (+1425.64%)

Mutual labels: hadoop

School Of Sre

At LinkedIn, we are using this curriculum for onboarding our entry-level talents into the SRE role.

Stars: ✭ 5,141 (+6491.03%)

Mutual labels: hadoop

Sparkmagic

Jupyter magics and kernels for working with remote Spark clusters

Stars: ✭ 954 (+1123.08%)

Mutual labels: spark

Base

https://www.researchgate.net/profile/Rajah_Iyer

Stars: ✭ 48 (-38.46%)

Mutual labels: hadoop

Yandex Big Data Engineering

Stars: ✭ 17 (-78.21%)

Mutual labels: spark

Pyspark Examples

Code examples on Apache Spark using python

Stars: ✭ 58 (-25.64%)

Mutual labels: spark

Parquet Generator

Parquet file generator

Stars: ✭ 16 (-79.49%)

Mutual labels: spark

Atsd

Axibase Time Series Database Documentation

Stars: ✭ 68 (-12.82%)

Mutual labels: hadoop

Labs

Research on distributed system

Stars: ✭ 73 (-6.41%)

Mutual labels: spark

Jumbune

Jumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http://jumbune.com. More details of open source offering are at,

Stars: ✭ 64 (-17.95%)

Mutual labels: hadoop

Awesome Recommendation Engine

The purpose of this tiny project is to put together the know-how I learned from the Big Data Expert course at formacionhadoop.com. The idea is to show how to work with Apache Spark Streaming, Kafka, MongoDB, and Spark machine learning algorithms.

Stars: ✭ 47 (-39.74%)

Mutual labels: spark

Big Data Scala Spark

Coursera's big data course with Scala and Spark

Stars: ✭ 16 (-79.49%)

Mutual labels: spark

High Performance Spark Examples

Examples for High Performance Spark

Stars: ✭ 436 (+458.97%)

Mutual labels: spark

Heracles

High performance HBase / Spark SQL engine

Stars: ✭ 27 (-65.38%)

Mutual labels: spark

Dji Firmware Tools

Tools for handling firmwares of DJI products, with focus on quadcopters.

Stars: ✭ 424 (+443.59%)

Mutual labels: spark

Featran

A Scala feature transformation library for data science and machine learning

Stars: ✭ 420 (+438.46%)

Mutual labels: spark

Spark Website

Apache Spark Website

Stars: ✭ 75 (-3.85%)

Mutual labels: spark

Agile data code 2

Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition

Stars: ✭ 413 (+429.49%)

Mutual labels: spark

Tedsds

Apache Spark - Turbofan Engine Degradation Simulation Data Set example in Apache Spark

Stars: ✭ 14 (-82.05%)

Mutual labels: spark

Spark As Service Using Embedded Server

This application comes as Spark2.1-as-Service-Provider using an embedded, Reactive-Streams-based, fully asynchronous HTTP server

Stars: ✭ 46 (-41.03%)

Mutual labels: spark

Sparkling Water

Sparkling Water provides H2O functionality inside Spark cluster

Stars: ✭ 887 (+1037.18%)

Mutual labels: spark

Awesome Pulsar

A curated list of Pulsar tools, integrations and resources.

Stars: ✭ 57 (-26.92%)

Mutual labels: spark

Cdc Kafka Hadoop

MySQL to NoSQL real time dataflow

Stars: ✭ 13 (-83.33%)

Mutual labels: hadoop

Kontextfrei

Writing application logic for Spark jobs that can be unit-tested without a SparkContext

Stars: ✭ 67 (-14.1%)

Mutual labels: spark

W2v

Word2Vec models with Twitter data using Spark. Blog:

Stars: ✭ 64 (-17.95%)

Mutual labels: spark

Spark Tda

SparkTDA is a package for Apache Spark providing Topological Data Analysis Functionalities.

Stars: ✭ 45 (-42.31%)

Mutual labels: spark

Hadoop For Geoevent

ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.

Stars: ✭ 5 (-93.59%)

Mutual labels: hadoop

Sparkling Titanic

Training models with Apache Spark, PySpark for Titanic Kaggle competition

Stars: ✭ 12 (-84.62%)

Mutual labels: spark

Ignite

Apache Ignite

Stars: ✭ 4,027 (+5062.82%)

Mutual labels: hadoop

Pulsar Spark

When Apache Pulsar meets Apache Spark

Stars: ✭ 55 (-29.49%)

Mutual labels: spark

Mare

MaRe leverages the power of Docker and Spark to run and scale your serial tools in MapReduce fashion.

Stars: ✭ 11 (-85.9%)

Mutual labels: spark

Lpa Detector

Optimize and improve the Label propagation algorithm

Stars: ✭ 75 (-3.85%)

Mutual labels: spark

Goodreads etl pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

Stars: ✭ 793 (+916.67%)

Mutual labels: spark

Moosefs

MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)

Stars: ✭ 1,025 (+1214.1%)

Mutual labels: hadoop

Spark Redis

A connector for Spark that allows reading and writing to/from Redis cluster

Stars: ✭ 773 (+891.03%)

Mutual labels: spark

Utils4s

A collection of test cases and related materials gathered while working with Scala and Spark

Stars: ✭ 1,070 (+1271.79%)

Mutual labels: spark

Sparklyr

R interface for Apache Spark

Stars: ✭ 775 (+893.59%)

Mutual labels: spark

Ds Cheatsheets

List of Data Science Cheatsheets to rule the world

Stars: ✭ 9,452 (+12017.95%)

Mutual labels: spark

Kamu Cli

Next generation tool for decentralized exchange and transformation of semi-structured data

Stars: ✭ 69 (-11.54%)

Mutual labels: spark

Pyspark Twitter Stream Mining

Real-time Machine Learning with Apache Spark on Twitter Public Stream

Stars: ✭ 64 (-17.95%)

Mutual labels: spark

Delta Architecture

Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline

Stars: ✭ 43 (-44.87%)

Mutual labels: spark

Angel

A Flexible and Powerful Parameter Server for large-scale machine learning

Stars: ✭ 6,458 (+8179.49%)

Mutual labels: spark

Coding Now

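Study notes, plus e-books, video resources, and blogs, websites, and tools collected along the way and considered worthwhile. Topics cover the major big data components, Python machine learning and data analysis, Linux, operating systems, algorithms, networking, and more.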

Stars: ✭ 750 (+861.54%)

Mutual labels: spark

Spark Examples

Spark examples

Stars: ✭ 41 (-47.44%)

Mutual labels: spark

Spark Movie Lens

An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset

Stars: ✭ 745 (+855.13%)

Mutual labels: spark
