
colinmarc / hdfs

License: MIT
A native Go client for HDFS

Programming Languages

Go
31211 projects - #10 most used programming language

Projects that are alternatives to or similar to hdfs

Pxltrm
🖌️ pxltrm - [WIP] A pixel art editor inside the terminal
Stars: ✭ 459 (-53.73%)
Mutual labels:  commandline
Hadoop For Geoevent
ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.
Stars: ✭ 5 (-99.5%)
Mutual labels:  hdfs
Socli
Stack Overflow command line client. Search and browse Stack Overflow without leaving the terminal 💻
Stars: ✭ 911 (-8.17%)
Mutual labels:  commandline
Tio
tio - A simple TTY terminal I/O application
Stars: ✭ 489 (-50.71%)
Mutual labels:  commandline
Clipp
easy to use, powerful & expressive command line argument parsing for modern C++ / single header / usage & doc generation
Stars: ✭ 687 (-30.75%)
Mutual labels:  commandline
Yandex Big Data Engineering
Stars: ✭ 17 (-98.29%)
Mutual labels:  hdfs
Cobra
A Commander for modern Go CLI interactions
Stars: ✭ 24,437 (+2363.41%)
Mutual labels:  commandline
Jsr203 Hadoop
A Java NIO file system provider for HDFS
Stars: ✭ 35 (-96.47%)
Mutual labels:  hdfs
Launchy
A helper for launching cross-platform applications in a fire and forget manner.
Stars: ✭ 704 (-29.03%)
Mutual labels:  commandline
Bigdata Interview
🎯 🌟 [Big data interview questions] A personal collection of big-data interview questions gathered from around the web, together with my own answer summaries. Currently covers questions on the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.
Stars: ✭ 857 (-13.61%)
Mutual labels:  hdfs
Sparta
Real Time Analytics and Data Pipelines based on Spark Streaming
Stars: ✭ 513 (-48.29%)
Mutual labels:  hdfs
Fontpreview
Highly customizable and minimal font previewer written in bash
Stars: ✭ 661 (-33.37%)
Mutual labels:  commandline
Cluster Pack
A library on top of either pex or conda-pack to make your Python code easily available on a cluster
Stars: ✭ 23 (-97.68%)
Mutual labels:  hdfs
Bigdata
💎🔥 Big data study notes
Stars: ✭ 488 (-50.81%)
Mutual labels:  hdfs
Pucket
Bucketing and partitioning system for Parquet
Stars: ✭ 29 (-97.08%)
Mutual labels:  hdfs
Ls
ls on steroids
Stars: ✭ 458 (-53.83%)
Mutual labels:  commandline
Snakebite
A pure python HDFS client
Stars: ✭ 828 (-16.53%)
Mutual labels:  hdfs
Learning Spark
Learning Spark from scratch; big data study notes
Stars: ✭ 37 (-96.27%)
Mutual labels:  hdfs
Gitviper
Enhanced git experience using the command line
Stars: ✭ 35 (-96.47%)
Mutual labels:  commandline
Ps Webapi
(Migrated from CodePlex) Lets PowerShell scripts or command-line processes serve as Web APIs. PSWebApi is a simple library for building ASP.NET Web APIs (RESTful services) from PowerShell scripts or batch/executable files out of the box.
Stars: ✭ 24 (-97.58%)
Mutual labels:  commandline

HDFS for Go

This is a native Go client for HDFS. It connects directly to the namenode using the protocol buffers API.

It tries to be idiomatic by aping the stdlib os package, where possible, and implements the interfaces from it, including os.FileInfo and os.PathError.

Here's what it looks like in action:

client, _ := hdfs.New("namenode:8020")

file, _ := client.Open("/mobydick.txt")

buf := make([]byte, 59)
file.ReadAt(buf, 48847)

fmt.Println(string(buf))
// => Abominable are the tumblers into which he pours his poison.
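
Since the client mirrors the os package, metadata calls hand back standard library types. Below is a minimal self-contained sketch along those lines; the namenode address and file path are placeholders, and the import path assumes the github.com/colinmarc/hdfs package:

package main

import (
	"fmt"
	"log"

	"github.com/colinmarc/hdfs"
)

func main() {
	// "namenode:8020" is a placeholder address, as in the example above.
	client, err := hdfs.New("namenode:8020")
	if err != nil {
		log.Fatal(err)
	}

	// Stat hands back a plain os.FileInfo; failed lookups surface as
	// *os.PathError, as described above.
	info, err := client.Stat("/mobydick.txt")
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println(info.Name(), info.Size(), info.ModTime())
}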

For complete documentation, check out the Godoc.

The hdfs Binary

Along with the library, this repo contains a commandline client for HDFS. Like the library, its primary aim is to be idiomatic, by enabling your favorite unix verbs:

$ hdfs --help
Usage: hdfs COMMAND
The flags available are a subset of the POSIX ones, but should behave similarly.

Valid commands:
  ls [-lah] [FILE]...
  rm [-rf] FILE...
  mv [-fT] SOURCE... DEST
  mkdir [-p] FILE...
  touch [-amc] FILE...
  chmod [-R] OCTAL-MODE FILE...
  chown [-R] OWNER[:GROUP] FILE...
  cat SOURCE...
  head [-n LINES | -c BYTES] SOURCE...
  tail [-n LINES | -c BYTES] SOURCE...
  du [-sh] FILE...
  checksum FILE...
  get SOURCE [DEST]
  getmerge SOURCE DEST
  put SOURCE DEST
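
Each of these verbs also has a rough counterpart in the library, since its methods follow the os package. As a sketch, here is roughly what ls / looks like through the library (method names per the Godoc, so check them against your version; the namenode address is a placeholder):

package main

import (
	"fmt"
	"log"

	"github.com/colinmarc/hdfs"
)

func main() {
	// "namenode:8020" is a placeholder address.
	client, err := hdfs.New("namenode:8020")
	if err != nil {
		log.Fatal(err)
	}

	// ReadDir returns []os.FileInfo, much like ioutil.ReadDir.
	entries, err := client.ReadDir("/")
	if err != nil {
		log.Fatal(err)
	}

	for _, fi := range entries {
		fmt.Printf("%s\t%d\t%s\n", fi.Mode(), fi.Size(), fi.Name())
	}
}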

Since the hdfs binary doesn't have to wait for the JVM to start up, it's also a lot faster than hadoop fs:

$ time hadoop fs -ls / > /dev/null

real  0m2.218s
user  0m2.500s
sys 0m0.376s

$ time hdfs ls / > /dev/null

real  0m0.015s
user  0m0.004s
sys 0m0.004s

Best of all, it comes with bash tab completion for paths!

Installing the commandline client

Grab a tarball from the releases page and unzip it wherever you like.

To configure the client, make sure one or both of these environment variables point to your Hadoop configuration (core-site.xml and hdfs-site.xml). On systems with Hadoop installed, they should already be set.

$ export HADOOP_HOME="/etc/hadoop"
$ export HADOOP_CONF_DIR="/etc/hadoop/conf"
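
For a sense of what those files provide, here is a minimal sketch (not part of this repo) that resolves the namenode address by reading fs.defaultFS out of core-site.xml under HADOOP_CONF_DIR, using only the standard library; the property and file names are Hadoop conventions:

package main

import (
	"encoding/xml"
	"fmt"
	"os"
	"path/filepath"
)

// hadoopConf matches the <configuration><property>... layout of core-site.xml.
type hadoopConf struct {
	Properties []struct {
		Name  string `xml:"name"`
		Value string `xml:"value"`
	} `xml:"property"`
}

func main() {
	confDir := os.Getenv("HADOOP_CONF_DIR")
	if confDir == "" {
		confDir = filepath.Join(os.Getenv("HADOOP_HOME"), "conf")
	}

	data, err := os.ReadFile(filepath.Join(confDir, "core-site.xml"))
	if err != nil {
		fmt.Fprintln(os.Stderr, "could not read core-site.xml:", err)
		os.Exit(1)
	}

	var conf hadoopConf
	if err := xml.Unmarshal(data, &conf); err != nil {
		fmt.Fprintln(os.Stderr, "could not parse core-site.xml:", err)
		os.Exit(1)
	}

	for _, p := range conf.Properties {
		if p.Name == "fs.defaultFS" {
			fmt.Println("namenode:", p.Value) // e.g. hdfs://namenode:8020
		}
	}
}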

To install tab completion globally on Linux, copy or link the bash_completion file that comes with the tarball into the right place:

$ ln -sT bash_completion /etc/bash_completion.d/gohdfs

By default on non-kerberized clusters, the HDFS user is set to the currently-logged-in user. You can override this with another environment variable:

$ export HADOOP_USER_NAME=username

Using the commandline client with Kerberos authentication

Like hadoop fs, the commandline client expects a ccache file in the default location: /tmp/krb5cc_<uid>. That means it should 'just work' to use kinit:

$ kinit user@EXAMPLE.COM
$ hdfs ls /

If that doesn't work, try setting the KRB5CCNAME environment variable to wherever you have the ccache saved.
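
As a sketch of that lookup order (KRB5CCNAME if set, otherwise /tmp/krb5cc_<uid>), here is roughly what resolving the ccache path looks like; this is an illustration, not the client's actual code:

package main

import (
	"fmt"
	"os"
	"os/user"
	"strings"
)

// ccachePath applies the rule described above: honor KRB5CCNAME if it is
// set, otherwise fall back to /tmp/krb5cc_<uid>.
func ccachePath() (string, error) {
	if env := os.Getenv("KRB5CCNAME"); env != "" {
		// Kerberos ccache names are often written as "FILE:/path/to/ccache".
		return strings.TrimPrefix(env, "FILE:"), nil
	}
	u, err := user.Current()
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("/tmp/krb5cc_%s", u.Uid), nil
}

func main() {
	path, err := ccachePath()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("ccache:", path)
}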

Compatibility

This library uses "Version 9" of the HDFS protocol, which means it should work with Hadoop distributions based on 2.2.x and above. The tests run against CDH 5.x and HDP 2.x.

Acknowledgements

This library is heavily indebted to snakebite.
