EDS-APHP / py-hdfs-mount

Licence: other
Mount HDFS with FUSE. Works with Kerberos!

Programming Languages

python

Requirements

Python 3

Install

sudo apt-get install fuse libfuse2
pip3 install -r requirements.txt

If you plan to use Kerberos, also install libkrb5-dev:

sudo apt-get install libkrb5-dev

Configuration

cp example.config.yaml config.yaml
$EDITOR config.yaml

Running

If you are using Kerberos, obtain a ticket with kinit first:

kinit -kt $USER $USER@REALM

In all cases, you then have to create a new, empty directory that will be the mount point:

mkdir /mnt/dest_mount

Finally, you can run py-hdfs-mount:

python3 hdfs_mount.py [--loglevel LEVEL] config.yaml
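
The command line above takes an optional log level and a required configuration path. As a rough illustration only, that interface could be parsed as sketched below. This is not taken from hdfs_mount.py: the function names parse_args and configure_logging, the "INFO" default, and the mapping of LEVEL onto Python's logging levels are all assumptions.

```python
import argparse
import logging


def parse_args(argv=None):
    """Parse `[--loglevel LEVEL] config.yaml` (hypothetical re-creation of the CLI)."""
    parser = argparse.ArgumentParser(prog="hdfs_mount.py")
    # Assumption: LEVEL is a standard logging level name; the default is a guess.
    parser.add_argument("--loglevel", default="INFO", metavar="LEVEL",
                        help="logging level name, e.g. DEBUG or WARNING")
    parser.add_argument("config", help="path to the YAML configuration file")
    return parser.parse_args(argv)


def configure_logging(level_name):
    """Turn a level name into a numeric level and apply it to the root logger."""
    level = getattr(logging, level_name.upper(), None)
    if not isinstance(level, int):
        raise ValueError(f"unknown log level: {level_name}")
    logging.basicConfig(level=level)
    return level


# Example with an explicit argument list instead of sys.argv.
args = parse_args(["--loglevel", "DEBUG", "config.yaml"])
configure_logging(args.loglevel)
```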

Have fun!

Note: if anything goes wrong and you have to kill py-hdfs-mount, you will probably have to run these commands on the mount point to release it:

fusermount -u /mnt/dest_mount
umount -l /mnt/dest_mount
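
If this cleanup needs to be scripted, the two commands can be wrapped in a small helper. The sketch below is one possible reading, not part of the project: it assumes the lazy `umount -l` is meant as a fallback when `fusermount -u` fails, and the function name release_mount and its `run` parameter are hypothetical.

```python
import subprocess


def release_mount(mount_point, run=subprocess.run):
    """Try `fusermount -u`, then fall back to a lazy `umount -l`.

    Returns the name of the command that succeeded, or None if both failed.
    `run` is injectable so the helper can be exercised without a real mount.
    """
    commands = [
        ["fusermount", "-u", mount_point],
        ["umount", "-l", mount_point],
    ]
    for command in commands:
        try:
            result = run(command, capture_output=True, text=True)
        except FileNotFoundError:
            # The tool is not installed; try the next one.
            continue
        if result.returncode == 0:
            return command[0]
    return None
```

Calling `release_mount("/mnt/dest_mount")` would then attempt both commands in order. `umount -l` usually requires root privileges.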

Tested with

  • Vim (open file, edit randomly, save and close)
  • cp/mv

Functionalities

  • Cached writes (HDFS files are immutable, so a write is a delete followed by an insert)
  • Random writes (slow, because HDFS files are immutable, but working!)
  • Very fast ls (cached directory metadata)
  • Directory stored as a zip file in HDFS (to solve the small files problem)
  • Directory stored as an Avro file in HDFS (to solve the small files problem)
  • CRC32 checksum
  • Load options from configuration file
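
To illustrate the zip and CRC32 items above, here is a minimal standard-library sketch that packs a directory of small files into a single zip archive in memory and then checks each member's CRC32. It does not reflect how py-hdfs-mount implements either feature. The function names are hypothetical, and the upload of the resulting bytes to HDFS is left out entirely.

```python
import io
import zipfile
import zlib
from pathlib import Path


def crc32_of(data: bytes) -> int:
    """CRC32 of a byte string as an unsigned 32-bit integer."""
    return zlib.crc32(data) & 0xFFFFFFFF


def pack_directory(directory) -> bytes:
    """Pack every file under `directory` into one in-memory zip archive."""
    root = Path(directory)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # Store paths relative to the directory, with forward slashes.
                archive.write(path, arcname=path.relative_to(root).as_posix())
    return buffer.getvalue()


def verify_archive(blob: bytes) -> dict:
    """Read every member back and return {member name: CRC32}.

    zipfile itself raises BadZipFile if a member does not match its stored CRC.
    """
    checksums = {}
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        for info in archive.infolist():
            data = archive.read(info)
            checksums[info.filename] = crc32_of(data)
    return checksums
```

Storing one archive instead of many tiny files keeps the number of objects tracked by the NameNode low, which is the usual motivation behind the "small files problem" mentioned above.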

Implemented FUSE methods

Basic

  • access
  • chmod
  • chown
  • getattr
  • readdir
  • readlink
  • mknod
  • rmdir
  • mkdir
  • statfs
  • unlink
  • symlink
  • rename
  • link
  • utimens

File methods

  • open
  • create
  • read
  • write (caching is done in memory)
  • truncate
  • flush (writes the chunks cached in memory, in the right order, to a temporary file in the local FS, then calls fsync)
  • fsync (sends the temporary file to HDFS)
  • release
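
The write, flush and fsync entries above describe a buffering scheme that can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the project's code. The class name WriteCache is hypothetical, and "the right order" is assumed to mean arrival order, so that later writes overwrite earlier ones. The final upload of the temporary file to HDFS is omitted. Note that `os.fsync` here only forces the local temporary file to disk; it is not the FUSE fsync method listed above.

```python
import os
import tempfile


class WriteCache:
    """Buffer random writes in memory, then materialize them in a local temp file."""

    def __init__(self, original: bytes = b""):
        self.original = original  # current content of the (immutable) remote file
        self.chunks = []          # (offset, data) pairs, in arrival order

    def write(self, data: bytes, offset: int) -> int:
        """Record a chunk; nothing touches the disk yet."""
        self.chunks.append((offset, bytes(data)))
        return len(data)

    def flush(self) -> str:
        """Apply the cached chunks in order to a temp file and return its path."""
        fd, path = tempfile.mkstemp(prefix="hdfs_mount_sketch_")
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(self.original)
            for offset, data in self.chunks:
                tmp.seek(offset)
                tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes reach the local disk
        self.chunks.clear()
        return path


# Usage: three random writes over existing content.
cache = WriteCache(b"hello world")
cache.write(b"J", 0)
cache.write(b"WORLD", 6)
cache.write(b"!", 11)
temp_path = cache.flush()  # the file at temp_path would then replace the HDFS file
os.remove(temp_path)       # cleanup for this example only
```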