2100 open source projects that are alternatives to or similar to Bigdata Notes

An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset

Stars: ✭ 745 (-93.22%)

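flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink introduction, concepts, principles, hands-on practice, performance tuning, and source code analysis. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, Table API & SQL, and more, plus case studies of large-scale Flink production projects (PV/UV, log storage, real-time deduplication of tens of billions of records, monitoring and alerting). Support for my column "Big Data Real-Time Computing Engine: Flink in Practice and Performance Optimization" is welcome.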

Stars: ✭ 11,378 (+3.52%)

Mutual labels: kafka, spark, hbase

Spring Boot 2.x Examples

Spring Boot 2.x code examples

Stars: ✭ 104 (-99.05%)

Mutual labels: kafka, storm, hbase

Aliyun Emapreduce Datasources

Extended datasource support for Spark/Hadoop on Aliyun E-MapReduce.

Stars: ✭ 132 (-98.8%)

Mutual labels: kafka, spark, hadoop

Yandex Big Data Engineering

Stars: ✭ 17 (-99.85%)

Mutual labels: spark, mapreduce, hdfs

Ibis

A pandas-like deferred expression system, with first-class SQL support

Stars: ✭ 1,630 (-85.17%)

Mutual labels: hadoop, hdfs, spark

Bigdata practice

Big data analysis and visualization practice

Stars: ✭ 166 (-98.49%)

Mutual labels: kafka, bigdata, hive

Azure Event Hubs Spark

Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs

Stars: ✭ 140 (-98.73%)

Mutual labels: kafka, spark, bigdata

Mobius

C# and F# language binding and extensions to Apache Spark

Stars: ✭ 929 (-91.55%)

Mutual labels: spark, bigdata, mapreduce

Avro Hadoop Starter

Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.

Stars: ✭ 110 (-99%)

Mutual labels: hadoop, hive, mapreduce

Recommendsys

Recommendation project (real-time and offline recommendation)

Stars: ✭ 198 (-98.2%)

Mutual labels: kafka, hadoop, storm

Data Accelerator

Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsights or Databricks while enabling the full power of the Spark engine.

Stars: ✭ 247 (-97.75%)

Mutual labels: kafka, spark, big-data

Sparkstreaming

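💥 🚀 Wraps sparkstreaming to dynamically adjust batch time (computation runs whenever data is available); 🚀 supports adding and removing topics at runtime; 🚀 wraps sparkstreaming 1.6 - kafka 010 to support SSL.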

Stars: ✭ 179 (-98.37%)

Mutual labels: kafka, spark, hbase

Every Single Day I Tldr

A daily digest of the articles or videos I've found interesting, that I want to share with you.

Stars: ✭ 249 (-97.73%)

Mutual labels: kafka, spark, bigdata

Logisland

Scalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink being in the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready to use processors, data sources and sinks are available.

Stars: ✭ 97 (-99.12%)

Mutual labels: kafka, spark, big-data

Kafka Storm Starter

Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.

Stars: ✭ 728 (-93.38%)

Mutual labels: kafka, spark, storm

dpkb

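A collection of big data content, including distributed storage engines, distributed compute engines, data warehouse construction, and more. Keywords: Hadoop, HBase, ES, Kudu, Hive, Presto, Spark, Flink, Kylin, ClickHouse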

Stars: ✭ 123 (-98.88%)

Mutual labels: hive, hadoop, hbase

docker-hadoop

Docker image for main Apache Hadoop components (Yarn/Hdfs)

Stars: ✭ 59 (-99.46%)

Mutual labels: yarn, hadoop, hdfs

Data Algorithms Book

MapReduce, Spark, Java, and Scala for Data Algorithms Book

Stars: ✭ 949 (-91.37%)

Mutual labels: spark, hadoop, mapreduce

the-apache-ignite-book

All code samples, scripts and more in-depth examples for The Apache Ignite Book. Includes Apache Ignite 2.6 or above

Stars: ✭ 65 (-99.41%)

Mutual labels: hive, hadoop, bigdata

hive to es

A small tool for syncing Hive data warehouse data to Elasticsearch

Stars: ✭ 21 (-99.81%)

Mutual labels: hive, hadoop, hdfs

Weblogsanalysissystem

A big data platform for analyzing web access logs

Stars: ✭ 37 (-99.66%)

Mutual labels: spark, hadoop, hbase

hadoopoffice

HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)

Stars: ✭ 56 (-99.49%)

Mutual labels: hive, hadoop, bigdata

Technology Talk

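A compilation of knowledge on commonly used technical frameworks in the Java ecosystem, open source middleware, system architecture, databases, architecture case studies from large companies, commonly used third-party libraries, project management, production troubleshooting, personal growth, and reflections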

Stars: ✭ 12,136 (+10.42%)

Mutual labels: kafka, spark, hbase

Docker Spark Cluster

A Spark cluster setup running on Docker containers

Stars: ✭ 57 (-99.48%)

Mutual labels: spark, big-data, hadoop

BigInsights-on-Apache-Hadoop

Example projects for 'BigInsights for Apache Hadoop' on IBM Bluemix

Stars: ✭ 21 (-99.81%)

Mutual labels: hive, hadoop, hbase

ETL-Starter-Kit

📁 Extract, Transform, Load (ETL) 👷 refers to a process in database usage and especially in data warehousing. This repository contains a starter kit featuring ETL related work.

Stars: ✭ 21 (-99.81%)

Mutual labels: hive, bigdata, azkaban

GooglePlay-Web-Crawler

Mapreduce project by Hadoop, Nutch, AWS EMR, Pig, Tez, Hive

Stars: ✭ 18 (-99.84%)

Mutual labels: hive, hadoop, mapreduce

spark-acid

ACID Data Source for Apache Spark based on Hive ACID

Stars: ✭ 91 (-99.17%)

Mutual labels: big-data, spark, hive

DataX-src

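DataX is a widely used offline data synchronization tool/platform for heterogeneous data, providing efficient data synchronization between heterogeneous data sources including MySQL, Oracle, SqlServer, Postgre, HDFS, Hive, ADS, HBase, OTS, ODPS, and others.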

Stars: ✭ 21 (-99.81%)

Mutual labels: hive, hbase, hdfs

phoenix

Apache Phoenix / Hbase Spring Boot Microservices

Stars: ✭ 23 (-99.79%)

Mutual labels: phoenix, hadoop, hbase

hadoop-data-ingestion-tool

OLAP and ETL of Big Data

Stars: ✭ 17 (-99.85%)

Mutual labels: phoenix, big-data, hadoop

aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Stars: ✭ 111 (-98.99%)

Mutual labels: big-data, spark, hadoop

Spline

Data Lineage Tracking And Visualization Solution

Stars: ✭ 306 (-97.22%)

Mutual labels: spark, hadoop, bigdata

hadoop-docker-lite

Docker build project to setup a lightweight hadoop cluster containing hadoop, pig, zookeeper, hbase, phoenix, storm, kafka, kafka manager

Stars: ✭ 24 (-99.78%)

Mutual labels: hadoop, storm, hbase

Wirbelsturm

Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.

Stars: ✭ 332 (-96.98%)

Mutual labels: kafka, spark, storm

Kyuubi

Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark

Stars: ✭ 363 (-96.7%)

Mutual labels: spark, hive, yarn

Bigdl

Building Large-Scale AI Applications for Distributed Big Data

Stars: ✭ 3,813 (-65.31%)

Mutual labels: spark, big-data, hadoop

Hive

Apache Hive

Stars: ✭ 4,031 (-63.32%)

Mutual labels: big-data, hadoop, hive

Spiderman

A general-purpose distributed crawler framework based on scrapy-redis

Stars: ✭ 392 (-96.43%)

Mutual labels: kafka, hive, hbase

Big data architect skills

Skills a big data architect should master

Stars: ✭ 400 (-96.36%)

Mutual labels: spark, hadoop, bigdata

Dataspherestudio

DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.

Stars: ✭ 1,195 (-89.13%)

Mutual labels: spark, hadoop, hive

Agile data code 2

Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition

Stars: ✭ 413 (-96.24%)

Mutual labels: kafka, spark

Enterprise gateway

A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.

Stars: ✭ 412 (-96.25%)

Mutual labels: spark, yarn

Listenbrainz Server

Server for the ListenBrainz project

Stars: ✭ 420 (-96.18%)

Mutual labels: spark, big-data

Hops Examples

Examples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops

Stars: ✭ 84 (-99.24%)

Mutual labels: spark, hive

Marmaray

Generic Data Ingestion & Dispersal Library for Hadoop

Stars: ✭ 414 (-96.23%)

Mutual labels: spark, hadoop

Moonbox

Moonbox is a DVtaaS (Data Virtualization as a Service) Platform

Stars: ✭ 424 (-96.14%)

Mutual labels: spark, hive

Cortx

CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.

Stars: ✭ 426 (-96.12%)

Mutual labels: big-data, bigdata

Cookbook

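🎉🎉🎉 Senior Java architect tech stack == Any skill can be thoroughly mastered through "deliberate practice". Just like cooking: here is a Java development handbook, and all you need to do is practice more. 🏃🏃🏃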

Stars: ✭ 428 (-96.11%)

Mutual labels: zookeeper, kafka

Circosjs

d3 library to build circular graphs

Stars: ✭ 436 (-96.03%)

Mutual labels: big-data, bigdata

Kafka Study

Some example code showing how to use Kafka.

Stars: ✭ 84 (-99.24%)

Mutual labels: kafka, storm

Yanagishima

Web UI for Trino, Presto, Hive, Elasticsearch, SparkSQL

Stars: ✭ 424 (-96.14%)

Mutual labels: spark, hive

Bigdataie

A collection of big data blogs, written test questions, tutorials, projects, and interview experiences

Stars: ✭ 445 (-95.95%)

Mutual labels: spark, bigdata

Java Sourcecode Blogs

Java source code analysis [Source Code Notes]: focused on source code analysis of Java backend frameworks, with new analysis articles published every week.

Stars: ✭ 448 (-95.92%)

Mutual labels: zookeeper, kafka

Bigslice

A serverless cluster computing system for the Go programming language

Stars: ✭ 469 (-95.73%)

Mutual labels: bigdata, mapreduce

Pdf

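Programming e-books, e-books, programming books, including C, C#, Docker, Elasticsearch, Git, Hadoop, HeadFirst, Java, Javascript, jvm, Kafka, Linux, Maven, MongoDB, MyBatis, MySQL, Netty, Nginx, Python, RabbitMQ, Redis, Scala, Solr, Spark, Spring, SpringBoot, SpringCloud, TCPIP, Tomcat, Zookeeper, artificial intelligence, big data, concurrent programming, databases, data mining, new interview questions, architecture design, algorithms, computer science, design patterns, software testing, refactoring and optimization, and more categories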

Stars: ✭ 12,009 (+9.26%)

Mutual labels: spark, hadoop

Magellan

Geo Spatial Data Analytics on Spark

Stars: ✭ 507 (-95.39%)

Mutual labels: spark, big-data

Cdap

An open source framework for building data analytic applications.

Stars: ✭ 509 (-95.37%)

Mutual labels: spark, mapreduce

Books Recommendation

Books (and videos) for advancing programmers, continuously updated (Programmer Books)

Stars: ✭ 558 (-94.92%)

Mutual labels: zookeeper, kafka

61-120 of 2100 similar projects
