Top 67 HDFS open source projects

Storagetapper
StorageTapper is a scalable realtime MySQL change data streaming, logical backup and logical replication service
Hdfs
API and command line interface for HDFS
Smart open
Utils for streaming large files (S3, HDFS, gzip, bz2...)
Dcos Commons
DC/OS SDK is a collection of tools, libraries, and documentation for easy integration of technologies such as Kafka, Cassandra, HDFS, Spark, and TensorFlow with DC/OS.
Seaweedfs
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding.
Wradlib
weather radar data processing - python package
Hsuntzu
HDFS compress tar zip snappy gzip uncompress untar codec hadoop spark
Elasticctr
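ElasticCTR, the PaddlePaddle elastic-computing recommendation system, is an open source, enterprise-grade recommendation system solution based on Kubernetes. It combines high-accuracy CTR models refined in Baidu's business scenarios, the large-scale distributed training capabilities of the open source PaddlePaddle framework, and an industrial-grade elastic scheduling service for sparse parameters, helping users deploy a recommendation system on Kubernetes with one click. It offers high performance, industrial-grade deployment, and an end-to-end experience, and as an open source suite it supports in-depth custom development.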
Dynamometer
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
Hdfs Shell
HDFS Shell is an HDFS manipulation tool for working with the functions integrated in Hadoop DFS
Ibis
A pandas-like deferred expression system, with first-class SQL support
Repository
A personal study knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Wifi
A big data query and analysis system based on information captured over Wi-Fi
Bigdata File Viewer
A cross-platform (Windows, macOS, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.
Camus
Mirror of LinkedIn's Camus
Tiledb Py
Python interface to the TileDB storage manager
Cloud Note
A distributed cloud notes application (modeled on a certain cloud notes product), with data stored in Redis and HBase
Big Data Engineering Coursera Yandex
Big Data for Data Engineers Coursera Specialization from Yandex
Rumble
⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Hdfs
A native Go client for HDFS
Learning Spark
Learn Spark from scratch; big data learning
Jsr203 Hadoop
A Java NIO file system provider for HDFS
Pucket
Bucketing and partitioning system for Parquet
Bigdata Interview
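🎯 🌟 [Big data interview questions] A collection of big data interview questions gathered from the web, along with my own summarized answers. Currently covers interview questions and knowledge summaries for the Hadoop/Hive/Spark/Flink/HBase/Kafka/ZooKeeper frameworks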
Cluster Pack
A library on top of either pex or conda-pack to make your Python code easily available on a cluster
Snakebite
A pure Python HDFS client
Hadoop For Geoevent
ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.
Sparta
Real Time Analytics and Data Pipelines based on Spark Streaming
God Of Bigdata
Focused on big data learning and interviews; start your road to big data mastery. Flink/Spark/Hadoop/HBase/Hive...
Devops Python Tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Juicefs
JuiceFS is a distributed POSIX file system built on top of Redis and S3.
bigdata-fun
A complete (distributed) BigData stack, running in containers
leaflet heatmap
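A simple visualization of call data from Huzhou. Assuming the data volume is too large to draw the heatmap directly in the browser, the heatmap rendering step is moved to offline computation and analysis. After the data is computed in parallel with Apache Spark, the heatmap is also rendered with Apache Spark, and leafletjs then loads the OpenStreetMap layer and the heatmap layer to give good interactivity. The rendering is currently implemented with Apache Spark; perhaps because Apache Spark is not well suited to this kind of computation, or because I did not design the algorithm well, the parallel computation is slower than single-machine computation. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .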
fluent-plugin-webhdfs
Hadoop WebHDFS output plugin for Fluentd
fastdata-cluster
Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
py-hdfs-mount
Mount HDFS with FUSE, works with Kerberos!
ros hadoop
Hadoop splittable InputFormat for ROS. Process rosbag with Hadoop, Spark, and other HDFS-compatible systems.
fsbrowser
Fast desktop client for Hadoop Distributed File System
aaocp
A big data project for analyzing user behavior logs
wasp
WASP is a framework for building complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.
skein
A tool and library for easily deploying applications on Apache YARN
hbase-meta-repair
Repair the HBase metadata table from HDFS.
starlake
Starlake is a Spark Based On Premise and Cloud ELT/ETL Framework for Batch & Stream Processing
ucz-dfs
A distributed file system written in Rust.
local-hashicorp-stack
Local Hashicorp Stack for DevOps Development without Hypervisor or Cloud