571 open source projects that are alternatives to or similar to bigdata-doc

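Lazy Squirrel (懒松鼠) Flink-Boot is a scaffold that brings Flink fully into the Spring ecosystem, so developers can write distributed stream-processing programs in the style of Java web development. Lazy Squirrel aims to lower the onboarding cost: developers can quickly write business code without understanding distributed computing theory or the details of the Flink framework. To further improve agility when building large projects, the scaffold integrates the Spring framework for bean management by default, along with frameworks commonly used in microservice and web development, such as the MyBatis ORM framework, the Hibernate Validator validation framework, and the Spring Retry framework. See the scaffold features below for details.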

Stars: ✭ 209 (+464.86%)

Mutual labels: bigdata, flink

HDFS-Netdisc

A distributed cloud storage system based on Hadoop 🌴

Stars: ✭ 56 (+51.35%)

Mutual labels: hadoop, hdfs

Flinkx

Based on Apache Flink. Supports data synchronization/integration and streaming SQL computation.

Stars: ✭ 2,651 (+7064.86%)

Mutual labels: bigdata, flink

learning-spark

Tidy up Spark and Hadoop tutorials.

Stars: ✭ 28 (-24.32%)

Mutual labels: hadoop, bigdata

Ecommercerecommendsystem

A real-time big data product recommendation system. Frontend: Vue + TypeScript + ElementUI; backend: Spring + Spark

Stars: ✭ 139 (+275.68%)

Mutual labels: bigdata, flink

ros hadoop

Hadoop splittable InputFormat for ROS. Process rosbag with Hadoop Spark and other HDFS compatible systems.

Stars: ✭ 92 (+148.65%)

Mutual labels: hadoop, hdfs

flokkr

Documentation placeholder and utilities for all the other containers.

Stars: ✭ 30 (-18.92%)

Mutual labels: hadoop, bigdata

Hdfs Shell

HDFS Shell is an HDFS manipulation tool to work with functions integrated in Hadoop DFS

Stars: ✭ 117 (+216.22%)

Mutual labels: hadoop, hdfs

Dynamometer

A tool for scale and performance testing of HDFS with a specific focus on the NameNode.

Stars: ✭ 122 (+229.73%)

Mutual labels: hadoop, hdfs

Big Whale

Scheduling of offline tasks and monitoring of real-time tasks for Spark, Flink, and similar engines

Stars: ✭ 163 (+340.54%)

Mutual labels: hadoop, flink

Hive

Apache Hive

Stars: ✭ 4,031 (+10794.59%)

Mutual labels: hive, hadoop

Awesome Learning

Practice source code repository: https://github.com/jast90/bigdata . Search for Jast on WeChat and follow the official account for the latest technical posts 😯.

Stars: ✭ 197 (+432.43%)

Mutual labels: hadoop, bigdata

Shifu

An end-to-end machine learning and data mining framework on Hadoop

Stars: ✭ 207 (+459.46%)

Mutual labels: hadoop, bigdata

Hive Jdbc Uber Jar

Hive JDBC "uber" or "standalone" jar based on the latest Apache Hive version

Stars: ✭ 188 (+408.11%)

Mutual labels: hive, hadoop

Hops Examples

Examples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops

Stars: ✭ 84 (+127.03%)

Mutual labels: hive, flink

kafka-connect-fs

Kafka Connect FileSystem Connector

Stars: ✭ 107 (+189.19%)

Mutual labels: hadoop, hdfs

Trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Stars: ✭ 4,581 (+12281.08%)

Mutual labels: hive, hadoop

Hive Funnel Udf

Hive UDFs for funnel analysis

Stars: ✭ 72 (+94.59%)

Mutual labels: hive, hadoop

Big data architect skills

Skills a big data architect should master

Stars: ✭ 400 (+981.08%)

Mutual labels: hadoop, bigdata

Data Science Ipython Notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+59489.19%)

Mutual labels: hadoop, mapreduce

Behemoth

Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.

Stars: ✭ 286 (+672.97%)

Mutual labels: hadoop, mapreduce

Jsr203 Hadoop

A Java NIO file system provider for HDFS

Stars: ✭ 35 (-5.41%)

Mutual labels: hadoop, hdfs

Data Algorithms Book

MapReduce, Spark, Java, and Scala for Data Algorithms Book

Stars: ✭ 949 (+2464.86%)

Mutual labels: hadoop, mapreduce

logparser

Easy parsing of Apache HTTPD and NGINX access logs with Java, Hadoop, Hive, Pig, Flink, Beam, Storm, Drill, ...

Stars: ✭ 139 (+275.68%)

Mutual labels: hive, flink

Drill

Apache Drill is a distributed MPP query layer for self describing data

Stars: ✭ 1,619 (+4275.68%)

Mutual labels: hive, hadoop

BigData-News

A real-time big data system project for a news website, based on Spark 2.2

Stars: ✭ 36 (-2.7%)

Mutual labels: hive, hadoop

Flink Notes

Flink study notes

Stars: ✭ 106 (+186.49%)

Mutual labels: bigdata, flink

Waterdrop

Production Ready Data Integration Product, documentation：

Stars: ✭ 1,856 (+4916.22%)

Mutual labels: hadoop, flink

Sparkrdma

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

Stars: ✭ 215 (+481.08%)

Mutual labels: hadoop, bigdata

hive-bigquery-storage-handler

Hive Storage Handler for interoperability between BigQuery and Apache Hive

Stars: ✭ 16 (-56.76%)

Mutual labels: hive, hadoop

Addax

Addax is an open source universal ETL tool that supports most of those RDBMS and NoSQLs on the planet, helping you transfer data from any one place to another.

Stars: ✭ 615 (+1562.16%)

Mutual labels: hive, hadoop

Spark With Python

Fundamentals of Spark with Python (using PySpark), code examples

Stars: ✭ 150 (+305.41%)

Mutual labels: hadoop, hdfs

Ibis

A pandas-like deferred expression system, with first-class SQL support

Stars: ✭ 1,630 (+4305.41%)

Mutual labels: hadoop, hdfs

Facebook Hive Udfs

Facebook's Hive UDFs

Stars: ✭ 213 (+475.68%)

Mutual labels: hive, hadoop

Javaorbigdata Interview

A compilation of interview topics for Java developers and big data developers

Stars: ✭ 203 (+448.65%)

Mutual labels: hadoop, bigdata

Wedatasphere

WeDataSphere is a financial level one-stop open-source suitcase for big data platforms. Currently the source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!

Stars: ✭ 372 (+905.41%)

Mutual labels: hive, hadoop

Asakusafw

Asakusa Framework

Stars: ✭ 114 (+208.11%)

Mutual labels: hadoop, mapreduce

Datafaker

Datafaker is a large-scale test data and flow test data generation tool. Datafaker fakes data and inserts it into various data sources.

Stars: ✭ 327 (+783.78%)

Mutual labels: hive, bigdata

smart-data-lake

Smart Automation Tool for building modern Data Lakes and Data Pipelines

Stars: ✭ 79 (+113.51%)

Mutual labels: hive, hadoop

Hadoop Attack Library

A collection of pentest tools and resources targeting Hadoop environments

Stars: ✭ 228 (+516.22%)

Mutual labels: hadoop, bigdata

hive-jdbc-driver

An alternative to the "hive standalone" jar for connecting Java applications to Apache Hive via JDBC

Stars: ✭ 31 (-16.22%)

Mutual labels: hive, hadoop

Bigdata practice

Big data analysis and visualization practice

Stars: ✭ 166 (+348.65%)

Mutual labels: hive, bigdata

Presto

The official home of the Presto distributed SQL query engine for big data

Stars: ✭ 12,957 (+34918.92%)

Mutual labels: hive, hadoop

swordfish

Open-source distributed workflow scheduling tool that also supports streaming tasks.

Stars: ✭ 35 (-5.41%)

Mutual labels: hive, hadoop

Movie recommend

A Spark-based movie recommendation system, including a crawler project, a website, an admin back-end system, and the Spark recommendation system

Stars: ✭ 2,092 (+5554.05%)

Mutual labels: hive, hadoop

liquibase-impala

Liquibase extension to add Impala Database support

Stars: ✭ 23 (-37.84%)

Mutual labels: hive, hadoop

hadoop-etl-udfs

The Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL

Stars: ✭ 17 (-54.05%)

Mutual labels: hive, hadoop

DaFlow

Apache Spark based Data Flow (ETL) framework that supports multiple read and write destinations of different types and also supports multiple categories of transformation rules.

Stars: ✭ 24 (-35.14%)

Mutual labels: hive, hadoop

TIL

Today I Learned

Stars: ✭ 43 (+16.22%)

Mutual labels: hive, hadoop

cobra-policytool

Manage Apache Atlas and Ranger configuration for your Hadoop environment.

Stars: ✭ 16 (-56.76%)

Mutual labels: hive, hadoop

litemall-dw

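A big data project built on the open-source Litemall e-commerce project, including front-end event tracking (openresty + lua) and back-end event tracking, a data warehouse (five layers), real-time computation, and user profiling. The big data platform uses CDH 6.3.2 (scripted with vagrant + ansible) and also includes Azkaban workflows.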

Stars: ✭ 36 (-2.7%)

Mutual labels: hive, flink

xxhadoop

Data Analysis Using Hadoop/Spark/Storm/ElasticSearch/MachineLearning etc. This is My Daily Notes/Code/Demo. Don't fork, Just star !

Stars: ✭ 37 (+0%)

Mutual labels: hive, hadoop

EngineeringTeam

A place for organizing the materials of the YBIGTA (와이빅타) engineering team.

Stars: ✭ 41 (+10.81%)

Mutual labels: hive, hadoop

cloud

Cloud computing: environment setup and configuration files for hadoop, hive, hue, oozie, sqoop, hbase, and zookeeper

Stars: ✭ 48 (+29.73%)

Mutual labels: hive, hadoop

TitanDataOperationSystem

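The best big data project. "Titan Data Operation System" is a full-stack, closed-loop system: it has a web system for data visualization, reads logs through flume-kafka-flume, designs the data warehouse in Hive, uses Spark code to transform data between warehouse tables and to migrate ADS-layer tables to MySQL, and schedules timed tasks with Azkaban. Technologies used: Java/Scala, Hadoop, Spark, Hive, Kafka, Flume, Azkaban, SpringBoot, Bootstrap, Echart, etc.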

Stars: ✭ 62 (+67.57%)

Mutual labels: hive, hadoop

Bigdata File Viewer

A cross-platform (Windows, Mac, Linux) desktop application to view common big data binary formats like Parquet, ORC, AVRO, etc. Supports local file system, HDFS, AWS S3, Azure Blob Storage, etc.

Stars: ✭ 86 (+132.43%)

Mutual labels: bigdata, hdfs

BigInsights-on-Apache-Hadoop

Example projects for 'BigInsights for Apache Hadoop' on IBM Bluemix

Stars: ✭ 21 (-43.24%)

Mutual labels: hive, hadoop

ETL-Starter-Kit

📁 Extract, Transform, Load (ETL) 👷 refers to a process in database usage and especially in data warehousing. This repository contains a starter kit featuring ETL related work.

Stars: ✭ 21 (-43.24%)

Mutual labels: hive, bigdata

Eel Sdk

Big Data Toolkit for the JVM

Stars: ✭ 140 (+278.38%)

Mutual labels: hive, hadoop

Quicksql

A Flexible, Fast, Federated(3F) SQL Analysis Middleware for Multiple Data Sources

Stars: ✭ 1,821 (+4821.62%)

Mutual labels: hive, flink

61-120 of 571 similar projects