avast / Hdfs Shell

Licence: apache-2.0
HDFS Shell is an HDFS manipulation tool for working with the functions integrated in Hadoop DFS.

HDFS Shell UI (CLI tool)

Image of HDFS-Shell

Purpose

There are three possible use cases:

  • Running an interactive shell UI in which the user types commands
  • Launching the shell with a specific HDFS command
  • Running in daemon mode, communicating over UNIX domain sockets

Why such UI shell?

Advantages of the shell UI over calling hdfs dfs directly:

  • hdfs dfs starts a new JVM for each command call, whereas HDFS Shell starts one only once, which gives a substantial speedup when you work with HDFS frequently
  • commands can be used in a short form, e.g. both hdfs dfs -ls / and ls / will work
  • HDFS path completion using the TAB key
  • you can easily add any other HDFS manipulation function
  • command history is persisted in a history log (~/.hdfs-shell/hdfs-shell.log)
  • support for relative directories, plus the cd and pwd commands
  • advanced commands like su, groups, whoami
  • customizable shell prompt

Disadvantages of the shell UI compared with calling hdfs dfs directly:

  • commands cannot be piped, e.g. calling ls /analytics | less is not possible at this time; to pipe output you have to use HDFS Shell in daemon mode

Using HDFS Shell UI

Launching HDFS Shell UI

Requirements:

  • JDK 1.8
  • Works on both Windows and Linux with Hadoop 2.6.0

Download

Configuring launch script(s) for your environment

HDFS Shell is a standard Java application. To launch it, you need two things on your classpath:

  1. All ./lib/*.jar files (the ./lib dependencies are included in the binary bundle, or can be found in the Gradle build/distributions/*.zip)
  2. The path to the directory with your Hadoop cluster config files (hdfs-site.xml, core-site.xml etc.); without these files HDFS Shell will work in local filesystem mode
     • on Linux it is usually /etc/hadoop/conf
     • on Windows it is usually %HADOOP_HOME%\etc\hadoop\

Note that paths inside the java -cp switch are separated by : on Linux and ; on Windows.
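As a rough sketch of the two classpath entries above, the helper below joins a lib directory and a Hadoop config directory with the platform's separator. The function name build_classpath is made up for this example and is not part of HDFS Shell. It relies on the JVM's own lib/* classpath wildcard rather than listing each jar.

```shell
# Hypothetical helper (not shipped with HDFS Shell): builds a -cp value
# from the jar directory and the Hadoop config directory.
build_classpath() {
  lib_dir="$1"
  conf_dir="$2"
  # Windows-style shells (Cygwin, Git Bash, MSYS) need ';', everything else ':'.
  case "$(uname -s)" in
    CYGWIN*|MINGW*|MSYS*) sep=';' ;;
    *) sep=':' ;;
  esac
  # "dir/*" is expanded by the JVM itself to all jars in that directory,
  # so the result must stay quoted when passed to java.
  printf '%s/*%s%s\n' "$lib_dir" "$sep" "$conf_dir"
}

build_classpath ./lib /etc/hadoop/conf
```

The result would be passed as java -cp "$(build_classpath ./lib /etc/hadoop/conf)" followed by the application's main class; take the class name from the bundled launch script rather than from this sketch.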

Pre-defined launch scripts are located in the zip file. You can modify them locally as needed.

  • for the CLI UI, run hdfs-shell.sh without parameters; otherwise:
  • HDFS Shell can be launched directly with the command to execute; after completion, hdfs-shell exits
  • launch HDFS Shell with hdfs-shell.sh script <file_path> to execute commands from a file
  • launch HDFS Shell with hdfs-shell.sh xscript <file_path> to execute commands from a file, ignoring command errors (errors are skipped)
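The script and xscript modes above could be tried out roughly as follows. The file name and the commands in it are only illustrative, and the example assumes hdfs-shell.sh is on your PATH; if it is not, only the command file is written.

```shell
# Write a small command file; each line is one HDFS Shell command.
script_file="${TMPDIR:-/tmp}/hdfs-shell-demo.txt"
cat > "$script_file" <<'EOF'
pwd
ls /
mkdir /tmp/hdfs-shell-demo
EOF

# Run it only if the launch script is available (an assumption about your setup).
if command -v hdfs-shell.sh >/dev/null 2>&1; then
  hdfs-shell.sh script "$script_file"    # command errors are not ignored
  hdfs-shell.sh xscript "$script_file"   # command errors are skipped
else
  echo "hdfs-shell.sh not found on PATH; wrote $script_file only" >&2
fi
```

In this sketch the second run is expected to hit an error on mkdir because the directory already exists, which is the kind of failure xscript is meant to skip.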

Possible commands inside shell

  • type help to get a list of all supported commands
  • clear or cls to clear the screen
  • exit, quit, or just q to exit the shell
  • to call a system command, type ! <command>, e.g. ! echo hello will call the system command echo
  • type an (hdfs) command without any parameters to get its parameter description, e.g. just ls
  • script <file_path> to execute commands from a file
  • xscript <file_path> to execute commands from a file, ignoring command errors (errors are skipped)

Additional commands

For our own purposes we also integrated the following commands:

  • set showResultCodeON and set showResultCodeOFF - when enabled, the result code of each command is printed after it completes
  • cd, pwd
  • su <username> - experimental - changes the current active user; it probably won't work on secured HDFS (Kerberos)
  • whoami - prints the effective username
  • groups <username1> [<username2> ...] - e.g. groups hdfs prints the groups for the given users; same functionality as hdfs groups my_user my_user2
  • edit 'my file' - see the Edit Command section below

Edit Command

Since version 1.0.4, a simple edit command is available. It fetches the selected file from HDFS into a local temporary directory and launches an editor. Once the editor saves the file and exits with result code 0, the file is uploaded back to HDFS (the target file is overwritten). By default the editor path is taken from the $EDITOR environment variable. If $EDITOR is not set, vim (Linux, Mac) or notepad.exe (Windows) is used.

How to change command (shell) prompt

HDFS Shell supports bash-like prompt customization. Support is implemented for the switches listed in this table (including colors, excluding \! and \#). You can also use this online prompt generator to create a prompt value of your choice. To set up your prompt, add export HDFS_SHELL_PROMPT="value" to your .bashrc (or set the environment variable on Windows), then restart HDFS Shell to apply the change. The default value is currently \e[36m\u@\h \e[0;39m\e[33m\w\e[0;39m\e[36m\\$ \e[37;0;39m.
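A minimal example of setting the prompt, using only switches that also appear in the default value (\e color codes and \w for the working directory). The particular layout is just an illustration.

```shell
# Yellow working directory followed by "> ".
# Single quotes keep the backslashes literal so HDFS Shell sees them unchanged.
export HDFS_SHELL_PROMPT='\e[33m\w\e[0;39m > '
```

Put the line in your .bashrc and restart HDFS Shell to see the effect.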

Running Daemon mode

Image of HDFS-Shell

  • run hdfs-shell-daemon.sh
  • then communicate with the daemon over a UNIX domain socket, e.g. echo ls / | nc -U /var/tmp/hdfs-shell.sock
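The nc call above can be wrapped in a small function so that daemon commands behave like ordinary shell commands and can be piped. This is only a sketch: the function name hdfs_cmd and the HDFS_SHELL_SOCK override variable are invented for this example, and it assumes an nc build that supports -U.

```shell
# Hypothetical wrapper (not part of HDFS Shell): sends one command line
# to the running daemon and prints whatever the daemon writes back.
hdfs_cmd() {
  # HDFS_SHELL_SOCK is an assumption of this sketch; the default path is
  # the socket shown in the example above.
  sock="${HDFS_SHELL_SOCK:-/var/tmp/hdfs-shell.sock}"
  if [ ! -S "$sock" ]; then
    echo "hdfs-shell daemon socket not found: $sock" >&2
    return 1
  fi
  printf '%s\n' "$*" | nc -U "$sock"
}
```

With the daemon running, hdfs_cmd ls /analytics | less gives the kind of piping that the interactive UI does not support.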

Project programming info

The project is built with Gradle 3.x. By default it uses Hadoop 2.6.0, but it has also been successfully tested with version 2.7.x. It is based on Spring Shell (which includes the JLine component). Using the Spring Shell mechanism you can easily add your own commands to HDFS Shell (see com.avast.server.hdfsshell.commands.ContextCommands or com.avast.server.hdfsshell.commands.HadoopDfsCommands for more details).

All suggestions and merge requests are welcome.

Other tech info:

For development, add the following to the JVM args in your IDE launch configuration: -Djline.WindowsTerminal.directConsole=false -Djline.terminal=jline.UnsupportedTerminal

Known limitations & problems

  • Commands containing a file or directory name with a space are not parsed correctly, e.g. it's not possible to create the directory My dir using the command mkdir "My dir". This should probably be resolved by an upgrade to Spring Shell 2.
  • It's not possible to remove a directory using a relative path (rm -R dir) when the current working directory is root (/). Use the absolute path instead (rm -R /dir). This is caused by a bug in Hadoop; see HADOOP-15233 for more details. Removing a directory from any other working directory is not affected.

Contact

Author & Maintainer: Ladislav Vitasek - vitasek/@/avast.com

Help Us

  • If you like using HDFS Shell, please spread the word, e.g. write a blog post about it.
  • Do you like using it? Tell us!

Companies using HDFS Shell (that we know about)

  • Avast
  • Komercni banka
  • Ataccama Software