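Programming e-books and books, covering C, C#, Docker, Elasticsearch, Git, Hadoop, HeadFirst, Java, Javascript, jvm, Kafka, Linux, Maven, MongoDB, MyBatis, MySQL, Netty, Nginx, Python, RabbitMQ, Redis, Scala, Solr, Spark, Spring, SpringBoot, SpringCloud, TCPIP, Tomcat, Zookeeper, artificial intelligence, big data, concurrent programming, databases, data mining, new interview questions, architecture design, algorithms, computer science, design patterns, software testing, refactoring and optimization, and more categories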

Stars: ✭ 12,009 (+9.26%)

Mutual labels: spark, hadoop

Spark Website

Apache Spark Website

Stars: ✭ 75 (-99.32%)

Mutual labels: spark, big-data

Operators

Collection of Kubernetes Operators built with KUDO.

Stars: ✭ 175 (-98.41%)

Mutual labels: zookeeper, kafka

kafka-connect-fs

Kafka Connect FileSystem Connector

Stars: ✭ 107 (-99.03%)

Mutual labels: hadoop, hdfs

Cdap

An open source framework for building data analytic applications.

Stars: ✭ 509 (-95.37%)

Mutual labels: spark, mapreduce

Bigslice

A serverless cluster computing system for the Go programming language

Stars: ✭ 469 (-95.73%)

Mutual labels: bigdata, mapreduce

Magellan

Geo Spatial Data Analytics on Spark

Stars: ✭ 507 (-95.39%)

Mutual labels: spark, big-data

Cuesheet

A framework for writing Spark 2.x applications in a pretty way

Stars: ✭ 86 (-99.22%)

Mutual labels: spark, yarn

phoenix-queryserver

Apache Phoenix Query Server

Stars: ✭ 33 (-99.7%)

Mutual labels: phoenix, big-data

Clustering4Ever

C4E, a JVM friendly library written in Scala for both local and distributed (Spark) Clustering.

Stars: ✭ 126 (-98.85%)

Mutual labels: big-data, bigdata

smart-data-lake

Smart Automation Tool for building modern Data Lakes and Data Pipelines

Stars: ✭ 79 (-99.28%)

Mutual labels: hive, hadoop

lectures-hse-spark

Scalable machine learning and big data analysis with Apache Spark

Stars: ✭ 20 (-99.82%)

Mutual labels: bigdata, mapreduce

Books Recommendation

Books (and videos) for advancing programmers, continuously updated (Programmer Books)

Stars: ✭ 558 (-94.92%)

Mutual labels: zookeeper, kafka

Cleanframes

type-class based data cleansing library for Apache Spark SQL

Stars: ✭ 75 (-99.32%)

Mutual labels: spark, bigdata

Streamx

kafka-connect-s3: Ingest data from Kafka to object stores (S3)

Stars: ✭ 96 (-99.13%)

Mutual labels: kafka, big-data

Javakeeper

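✍️ A summary of essential architecture knowledge for Java engineers: covers architectures commonly used at internet companies, such as distributed systems, microservices, and RPC, as well as essential skills such as data storage, caching, and search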

Stars: ✭ 502 (-95.43%)

Mutual labels: zookeeper, kafka

Alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud

Stars: ✭ 5,379 (-51.06%)

Mutual labels: spark, hadoop

Hadoop Yarn Api Python Client

Python client for Hadoop® YARN API

Stars: ✭ 91 (-99.17%)

Mutual labels: hadoop, yarn

HDFS-Netdisc

A distributed cloud storage system based on Hadoop 🌴

Stars: ✭ 56 (-99.49%)

Mutual labels: hadoop, hdfs

nifi

Deploy a secured, clustered, auto-scaling NiFi service in AWS.

Stars: ✭ 37 (-99.66%)

Mutual labels: big-data, zookeeper

HadoopDedup

🍉 Large-scale deduplication of massive data based on Hadoop and HBase

Stars: ✭ 27 (-99.75%)

Mutual labels: big-data, mapreduce

Data-pipeline-project

Data pipeline project

Stars: ✭ 18 (-99.84%)

Mutual labels: hadoop, mapreduce

hbase-meta-repair

Repair hbase metadata table from hdfs.

Stars: ✭ 36 (-99.67%)

Mutual labels: hbase, hdfs

Useractionanalyzeplatform

A big data platform for analyzing e-commerce user behavior

Stars: ✭ 645 (-94.13%)

Mutual labels: spark, hadoop

disk

A distributed network disk system built on hadoop+hbase+springboot

Stars: ✭ 53 (-99.52%)

Mutual labels: hadoop, hbase

Kafka Streams

equivalent to kafka-streams 🐙 for nodejs ✨🐢🚀✨

Stars: ✭ 613 (-94.42%)

Mutual labels: kafka, big-data

Freestyle

A cohesive & pragmatic framework of FP centric Scala libraries

Stars: ✭ 627 (-94.3%)

Mutual labels: kafka, spark

Scriptis

Scriptis is for interactive data analysis, with script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDF, function, and resource management, and intelligent diagnosis.

Stars: ✭ 696 (-93.67%)

Mutual labels: spark, hive

Interview Questions Collection

Interview questions organized by knowledge area, including C++, Java, Hadoop, machine learning, and more

Stars: ✭ 21 (-99.81%)

Mutual labels: spark, hadoop

mango

Core utility library & data connectors designed for simpler usage in Scala

Stars: ✭ 41 (-99.63%)

Mutual labels: hbase, zookeeper

Real Time Social Media Mining

DevOps pipeline for Real Time Social/Web Mining

Stars: ✭ 22 (-99.8%)

Mutual labels: big-data, hdfs

liquibase-impala

Liquibase extension to add Impala Database support

Stars: ✭ 23 (-99.79%)

Mutual labels: hive, hadoop

darwin

Avro Schema Evolution made easy

Stars: ✭ 26 (-99.76%)

Mutual labels: hadoop, hbase

hadoop-etl-udfs

The Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL

Stars: ✭ 17 (-99.85%)

Mutual labels: hive, hadoop

Movies-Analytics-in-Spark-and-Scala

Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.

Stars: ✭ 47 (-99.57%)

Mutual labels: big-data, hadoop

Storm

Mirror of Apache Storm

Stars: ✭ 6,297 (-42.71%)

Mutual labels: big-data, storm

Stream Reactor

Streaming reference architecture for ETL with Kafka and Kafka-Connect. You can find more at http://lenses.io on how we provide a unified solution to manage your connectors, the most advanced SQL engine for Kafka and Kafka Streams, cluster monitoring and alerting, and more.

Stars: ✭ 753 (-93.15%)

Mutual labels: kafka, hbase

Sitewhere

SiteWhere is an industrial-strength open-source application enablement platform for the Internet of Things (IoT). It provides a multi-tenant microservice-based infrastructure that includes device/asset management, data ingestion, big-data storage, and integration through a modern, scalable architecture. SiteWhere provides REST APIs for all system functionality. SiteWhere provides SDKs for many common device platforms including Android, iOS, Arduino, and any Java-capable platform such as Raspberry Pi, rapidly accelerating the speed of innovation.

Stars: ✭ 788 (-92.83%)

Mutual labels: zookeeper, kafka

Coding Now

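Study notes, along with e-books and video resources I have gone through and blogs, websites, and tools I have collected and found useful. Covers the major big data components, Python machine learning and data analysis, Linux, operating systems, algorithms, networking, and more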

Stars: ✭ 750 (-93.18%)

Mutual labels: spark, bigdata

Bigdata File Viewer

A cross-platform (Windows, Mac, Linux) desktop application to view common bigdata binary formats such as Parquet, ORC, AVRO, etc. Supports local file systems, HDFS, AWS S3, Azure Blob Storage, etc.

Stars: ✭ 86 (-99.22%)

Mutual labels: bigdata, hdfs

ros hadoop

Hadoop splittable InputFormat for ROS. Process rosbag with Hadoop, Spark, and other HDFS-compatible systems.

Stars: ✭ 92 (-99.16%)

Mutual labels: hadoop, hdfs

cobra-policytool

Manage Apache Atlas and Ranger configuration for your Hadoop environment.

Stars: ✭ 16 (-99.85%)

Mutual labels: hive, hadoop

clusterdock

clusterdock is a framework for creating Docker-based container clusters

Stars: ✭ 26 (-99.76%)

Mutual labels: big-data, hadoop

BookRecommenderSystem

A book recommendation system based on big data

Stars: ✭ 30 (-99.73%)

Mutual labels: flume, azkaban

Bandar Log

Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.

Stars: ✭ 19 (-99.83%)

Mutual labels: kafka, big-data

Kylo

Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.

Stars: ✭ 916 (-91.67%)

Mutual labels: spark, hadoop

Stormtweetssentimentd3viz

Computes and visualizes the sentiment analysis of tweets of US States in real-time using Storm.

Stars: ✭ 25 (-99.77%)

Mutual labels: hadoop, storm

Hazelcast Jet

Distributed Stream and Batch Processing

Stars: ✭ 855 (-92.22%)

Mutual labels: kafka, big-data

Awesome Recommendation Engine

The purpose of this tiny project is to put together the know-how I learned from the Big Data Expert course at formacionhadoop.com. The idea is to show how to work with Apache Spark Streaming, Kafka, Mongo, and Spark machine learning algorithms.

Stars: ✭ 47 (-99.57%)

Mutual labels: kafka, spark

Spark R Notebooks

R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 109 (-99.01%)

Mutual labels: big-data, bigdata

Genie

Distributed Big Data Orchestration Service

Stars: ✭ 1,544 (-85.95%)

Mutual labels: big-data, bigdata

Gpmall

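[Gupao Academy hands-on project] An e-commerce platform built on SpringBoot+Dubbo: microservice architecture, online mall, e-commerce, microservices, high concurrency, kafka, Elasticsearch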