Top 231 Hadoop open source projects

Devops Bash Tools
550+ DevOps Bash Scripts - AWS, GCP, Kubernetes, Kafka, Docker, APIs, Hadoop, SQL, PostgreSQL, MySQL, Hive, Impala, Travis CI, Jenkins, Concourse, GitHub, GitLab, BitBucket, Azure DevOps, TeamCity, Spotify, MP3, LDAP, Code/Build Linting, pkg mgmt for Linux, Mac, Python, Perl, Ruby, NodeJS, Golang, Advanced dotfiles: .bashrc, .vimrc, .gitconfig, .screenrc, .tmux.conf, .psqlrc ...
Hadoop Attack Library
A collection of pentest tools and resources targeting Hadoop environments
Luigi
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
Hadoop Connectors
Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.
Sparkrdma
RDMA-accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Facebook Hive Udfs
Facebook's Hive UDFs
Shifu
An end-to-end machine learning and data mining framework on Hadoop
Javaorbigdata Interview
A compilation of interview topics for Java developers and big data developers
Recommendsys
Recommendation project (real-time and offline recommendation)
Awesome Learning
Practice source code repository: https://github.com/jast90/bigdata . Search for "Jast" on WeChat and follow the official account for the latest technical posts.
Nutch
Apache Nutch is an extensible and scalable web crawler
Hive Jdbc Uber Jar
Hive JDBC "uber" or "standalone" jar based on the latest Apache Hive version
Bigdata Playground
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Deeplearning4j
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a small, modular C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes samediff: a PyTorch/TensorFlow-like library for running deep learni…
Big Whale
Scheduling of offline (batch) jobs such as Spark and Flink, and monitoring of real-time jobs
Presto
The official home of the Presto distributed SQL query engine for big data
Hadoop Common
Mirror of Apache Hadoop common
Movie recommend
A Spark-based movie recommendation system, including a crawler project, a web site, an admin back end, and the Spark recommendation system
Hadoop Hdfs
Mirror of Apache Hadoop HDFS
Parquet Rs
Apache Parquet implementation in Rust
Eel Sdk
Big Data Toolkit for the JVM
Hbaseclient
HBase client data management software
Aliyun Emapreduce Datasources
Extended datasource support for Spark/Hadoop on Aliyun E-MapReduce.
Calcite Avatica
Mirror of Apache Calcite - Avatica
Gaffer
A large-scale entity and relation database supporting aggregation of properties
Airflow Pipeline
An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR
Spydra
Ephemeral Hadoop clusters using Google Compute Platform
Hadoopcryptoledger
Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Parquet4s
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Dynamometer
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
Hdfs Shell
HDFS Shell is an HDFS manipulation tool to work with functions integrated in Hadoop DFS
Ibis
A pandas-like deferred expression system, with first-class SQL support
Datax
DataX is an open source universal ETL tool that supports Cassandra, ClickHouse, DBF, Hive, InfluxDB, Kudu, MySQL, Oracle, Presto(Trino), PostgreSQL, SQL Server
Parquet Go
Go package to read and write Parquet files. Parquet is a file format to store nested data structures in a flat columnar data format. It can be used in the Hadoop ecosystem and with tools such as Presto and AWS Athena.
Xlearning Xdml
Extremely distributed machine learning
Avro Hadoop Starter
Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.
Introtohadoopandmr udacity course
🐘 Source code for assignments of Udacity course "Introduction to Hadoop and MapReduce"
Waterdrop
Production Ready Data Integration Product, documentation:
Haproxy Configs
80+ HAProxy Configs for Hadoop, Big Data, NoSQL, Docker, Elasticsearch, SolrCloud, HBase, MySQL, PostgreSQL, Apache Drill, Hive, Presto, Impala, Hue, ZooKeeper, SSH, RabbitMQ, Redis, Riak, Cloudera, OpenTSDB, InfluxDB, Prometheus, Kibana, Graphite, Rancher etc.
Antsdb
AntsDB is a low-latency, high-concurrency, MySQL-compliant SQL layer for HBase
Repository
Personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Wifi
A big data query and analysis system based on information captured over Wi-Fi
Hadoop Yarn Api Python Client
Python client for Hadoop® YARN API
Hadoop Mapreduce
Mirror of Apache Hadoop MapReduce
Hadoop cookbook
Cookbook to install Hadoop 2.0+ using Chef
Docker Hadoop Cluster
Multi-node cluster on Docker for personal development.
Camus
Mirror of LinkedIn's Camus
Docker Spark
🚢 Docker image for Apache Spark
Chukwa
Mirror of Apache Chukwa