Top 225 Hadoop open source projects

Devops Bash Tools
550+ DevOps Bash Scripts - AWS, GCP, Kubernetes, Kafka, Docker, APIs, Hadoop, SQL, PostgreSQL, MySQL, Hive, Impala, Travis CI, Jenkins, Concourse, GitHub, GitLab, BitBucket, Azure DevOps, TeamCity, Spotify, MP3, LDAP, Code/Build Linting, pkg mgmt for Linux, Mac, Python, Perl, Ruby, NodeJS, Golang, Advanced dotfiles: .bashrc, .vimrc, .gitconfig, .screenrc, .tmux.conf, .psqlrc ...
Hadoop Attack Library
A collection of pentest tools and resources targeting Hadoop environments
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, etc. It also comes with Hadoop support built in.
Hadoop Connectors
Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Facebook Hive Udfs
Facebook's Hive UDFs
An end-to-end machine learning and data mining framework on Hadoop
Javaorbigdata Interview
Awesome Learning
Hands-on source code repository. Search for Jast on WeChat and follow the official account for the latest technical posts 😯.
Apache Nutch is an extensible and scalable web crawler
Hive Jdbc Uber Jar
Hive JDBC "uber" or "standalone" jar based on the latest Apache Hive version
Bigdata Playground
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learni…
Big Whale
The official home of the Presto distributed SQL query engine for big data
Hadoop Common
Mirror of Apache Hadoop common
✭ 155
Movie recommend
Hadoop Hdfs
Mirror of Apache Hadoop HDFS
✭ 152
Parquet Rs
Apache Parquet implementation in Rust
Eel Sdk
Big Data Toolkit for the JVM
Aliyun Emapreduce Datasources
Extended datasource support for Spark/Hadoop on Aliyun E-MapReduce.
Calcite Avatica
Mirror of Apache Calcite - Avatica
A large-scale entity and relation database supporting aggregation of properties
Airflow Pipeline
An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR
Ephemeral Hadoop clusters using Google Compute Platform
Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
Hdfs Shell
HDFS Shell is an HDFS manipulation tool for working with the functions integrated in Hadoop DFS
A pandas-like deferred expression system, with first-class SQL support
DataX is an open source universal ETL tool that supports Cassandra, ClickHouse, DBF, Hive, InfluxDB, Kudu, MySQL, Oracle, Presto (Trino), PostgreSQL, SQL Server
Parquet Go
Go package to read and write Parquet files. Parquet is a file format that stores nested data structures in a flat columnar layout. It can be used in the Hadoop ecosystem and with tools such as Presto and AWS Athena.
Xlearning Xdml
Extremely distributed machine learning
Avro Hadoop Starter
Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.
Introtohadoopandmr udacity course
🐘 Source code for assignments of Udacity course "Introduction to Hadoop and MapReduce"
Production Ready Data Integration Product, documentation:
Haproxy Configs
80+ HAProxy Configs for Hadoop, Big Data, NoSQL, Docker, Elasticsearch, SolrCloud, HBase, MySQL, PostgreSQL, Apache Drill, Hive, Presto, Impala, Hue, ZooKeeper, SSH, RabbitMQ, Redis, Riak, Cloudera, OpenTSDB, InfluxDB, Prometheus, Kibana, Graphite, Rancher etc.
AntsDB is a low latency, high concurrency, MySQL compliant SQL layer for HBase
Hadoop Yarn Api Python Client
Python client for Hadoop® YARN API
Hadoop Mapreduce
Mirror of Apache Hadoop MapReduce
✭ 88
Hadoop cookbook
Cookbook to install Hadoop 2.0+ using Chef
Docker Hadoop Cluster
Multi-node cluster on Docker for personal development.
Mirror of LinkedIn's Camus
Docker Spark
🚢 Docker image for Apache Spark
Mirror of Apache Chukwa