whitfin / s3-utils

License: MIT
Utilities and tools based around Amazon S3 to provide convenience APIs in a CLI

Programming Languages

rust

Projects that are alternatives to or similar to s3-utils

s3-concat
Concatenate Amazon S3 files remotely using flexible patterns
Stars: ✭ 32 (-28.89%)
Mutual labels:  aws-s3, text-processing
Emotion-recognition-from-tweets
A comprehensive approach to recognizing emotion (sentiment) from a given tweet, using supervised machine learning.
Stars: ✭ 17 (-62.22%)
Mutual labels:  text-processing
simple-file-uploader
A file uploader written using HTML5 and Node.js. It can upload either to a local directory on the server or to an AWS S3 bucket.
Stars: ✭ 85 (+88.89%)
Mutual labels:  aws-s3
Go-Clean-Architecture-REST-API
Golang Clean Architecture REST API example
Stars: ✭ 376 (+735.56%)
Mutual labels:  aws-s3
corpusexplorer2.0
Corpus linguistics has never been this easy...
Stars: ✭ 16 (-64.44%)
Mutual labels:  text-processing
Questions
Web app inspired by Quora, allowing users to ask questions and get answers
Stars: ✭ 15 (-66.67%)
Mutual labels:  aws-s3
s3-fuzzer
🔐 A concurrent, command-line AWS S3 Fuzzer. Written in Go.
Stars: ✭ 43 (-4.44%)
Mutual labels:  aws-s3
koel-aws
Official Lambda package and guide to use AWS S3 with Koel
Stars: ✭ 23 (-48.89%)
Mutual labels:  aws-s3
teanaps
An open-source Python library for natural language processing and text analysis.
Stars: ✭ 91 (+102.22%)
Mutual labels:  text-processing
s3-cli
Go version of s3cmd
Stars: ✭ 114 (+153.33%)
Mutual labels:  aws-s3
dif
'dif' is a Linux preprocessing front end to gvimdiff/meld/kompare
Stars: ✭ 18 (-60%)
Mutual labels:  text-processing
s3 uploader
Multithreaded recursive directory upload to S3 using FOG
Stars: ✭ 36 (-20%)
Mutual labels:  aws-s3
T-Watch
Real Time Twitter Sentiment Analysis Product
Stars: ✭ 20 (-55.56%)
Mutual labels:  aws-s3
SuperCombinators
[Deprecated] A Swift parser combinator framework
Stars: ✭ 19 (-57.78%)
Mutual labels:  text-processing
fuzzychinese
A small package to fuzzy match Chinese words
Stars: ✭ 50 (+11.11%)
Mutual labels:  text-processing
static-aws-deploy
A tool for deploying files to an AWS S3 bucket with configurable headers, and for invalidating AWS CloudFront objects.
Stars: ✭ 27 (-40%)
Mutual labels:  aws-s3
estratto
Parsing fixed-width file content made easy
Stars: ✭ 12 (-73.33%)
Mutual labels:  text-processing
vi-rs
Vietnamese Input Method library
Stars: ✭ 69 (+53.33%)
Mutual labels:  text-processing
python-mecab
A repository binding MeCab for Python 3.5+, using neither SWIG nor pybind. (No longer maintained.)
Stars: ✭ 27 (-40%)
Mutual labels:  text-processing
text2video
Text to Video Generation Problem
Stars: ✭ 28 (-37.78%)
Mutual labels:  text-processing

s3-utils

Utilities and tools based around Amazon S3 to provide convenience APIs in a CLI.

This tool contains a small set of command line utilities for working with Amazon S3, focused on features which are not readily available in the S3 API. It has evolved from various scripts and use cases during work life, now packaged into something a little more useful. It's likely that more tools will be added over time as they become useful and/or required.

All S3 interaction is handled by rusoto_s3.
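
For a sense of what that looks like under the hood, here is a minimal sketch of listing a bucket through rusoto_s3 (an illustration, not this tool's actual code; it assumes a recent async release of rusoto and a Tokio runtime, and the bucket and prefix names are made up):

use rusoto_core::Region;
use rusoto_s3::{ListObjectsV2Request, S3Client, S3};

#[tokio::main]
async fn main() {
    // Region::default() resolves AWS_DEFAULT_REGION from the
    // environment, matching the AWS_* variables described below.
    let client = S3Client::new(Region::default());

    // List everything under an (illustrative) prefix.
    let request = ListObjectsV2Request {
        bucket: "my.bucket.name".to_string(),
        prefix: Some("my/annoyingly/nested/path/".to_string()),
        ..Default::default()
    };

    match client.list_objects_v2(request).await {
        Ok(output) => {
            for object in output.contents.unwrap_or_default() {
                println!("{}", object.key.unwrap_or_default());
            }
        }
        Err(err) => eprintln!("listing failed: {}", err),
    }
}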

Installation

You can install s3-utils either from this repository, or from crates.io (once it's published):

# install from Cargo
$ cargo install s3-utils

# install the latest from GitHub
$ cargo install --git https://github.com/whitfin/s3-utils.git

Commands

Credentials can be configured by following the instructions in the AWS documentation. Almost every command you might use will take this shape:

$ AWS_ACCESS_KEY_ID=MY_ACCESS_KEY_ID \
    AWS_SECRET_ACCESS_KEY=MY_SECRET_ACCESS_KEY \
    AWS_DEFAULT_REGION=MY_AWS_REGION \
    s3-utils <subcommand> <arguments>

There are several switches available on almost all commands (such as -d to dry-run an operation), but please check the command documentation before assuming a switch exists. Each command exposes a -h switch to show a help menu, as standard. The examples below omit the AWS_ environment variables for brevity.

concat

This command is focused on concatenating files in S3. You can concatenate files in a basic manner just by providing a source pattern and a target file path:

$ s3-utils concat my.bucket.name 'archives/*.gz' 'archive.gz'

If you're working with long paths, you can add a prefix to the bucket name to avoid having to type it all out multiple times. In the following case, *.gz and archive.gz are relative to the my/annoyingly/nested/path/ prefix.

$ s3-utils concat my.bucket.name/my/annoyingly/nested/path/ '*.gz' 'archive.gz'

You can also use pattern matching (driven by the official regex crate), to use segments of the source paths in your target paths. Here is an example of mapping a date hierarchy (YYYY/MM/DD) to a flat structure (YYYY-MM-DD):

$ s3-utils concat my.bucket.name 'date-hierarchy/(\d{4})/(\d{2})/(\d{2})/*.gz' 'flat-hierarchy/$1-$2-$3.gz'

In this case, all files under date-hierarchy/2018/01/01/ would be concatenated into flat-hierarchy/2018-01-01.gz. Don't forget to add single quotes around your expressions to avoid any pesky shell expansions!
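
To make the capture mechanics concrete, here is a minimal sketch of the same mapping using the regex crate directly (an illustration, not this tool's actual code; the file segment is written as the plain regex .*\.gz rather than the CLI shorthand):

use regex::Regex;

fn main() {
    // The source pattern from the example above, capturing the
    // year, month and day segments of the path.
    let pattern = Regex::new(r"date-hierarchy/(\d{4})/(\d{2})/(\d{2})/.*\.gz").unwrap();

    // $1, $2 and $3 expand to the captured year, month and day,
    // exactly as in the target path of the CLI example.
    let target = pattern.replace(
        "date-hierarchy/2018/01/01/part-0001.gz",
        "flat-hierarchy/$1-$2-$3.gz",
    );

    assert_eq!(target, "flat-hierarchy/2018-01-01.gz");
}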

In order to concatenate files remotely (i.e. without pulling them to your machine), this tool uses the Multipart Upload API of S3. This means that all limitations of that API are inherited by this tool. Usually this isn't an issue, but one of the more noticeable limits is that files smaller than 5MB cannot be concatenated. To avoid wasted AWS calls, this is caught in the client layer and results in a client-side error; due to the complexity of working around it, joining files smaller than 5MB is currently unsupported.
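
For context, remote concatenation over the Multipart Upload API looks roughly like the sketch below, written against rusoto_s3 (a hypothetical helper for illustration, not this tool's actual implementation; error handling is minimal):

use rusoto_core::Region;
use rusoto_s3::{
    CompleteMultipartUploadRequest, CompletedMultipartUpload, CompletedPart,
    CreateMultipartUploadRequest, S3Client, UploadPartCopyRequest, S3,
};

// Hypothetical helper: concatenate `sources` into `target` within
// `bucket` by copying each source in as one part of a multipart
// upload. Every part except the last must be at least 5MB, which
// is where the limitation above comes from.
async fn concat(
    bucket: &str,
    sources: &[&str],
    target: &str,
) -> Result<(), Box<dyn std::error::Error>> {
    let client = S3Client::new(Region::default());

    // Open a multipart upload for the target file.
    let upload = client
        .create_multipart_upload(CreateMultipartUploadRequest {
            bucket: bucket.to_string(),
            key: target.to_string(),
            ..Default::default()
        })
        .await?;
    let upload_id = upload.upload_id.ok_or("no upload id returned")?;

    // Copy each source object in, server side, as the next part.
    let mut parts = Vec::new();
    for (index, source) in sources.iter().enumerate() {
        let part_number = index as i64 + 1;
        let copied = client
            .upload_part_copy(UploadPartCopyRequest {
                bucket: bucket.to_string(),
                key: target.to_string(),
                copy_source: format!("{}/{}", bucket, source),
                part_number,
                upload_id: upload_id.clone(),
                ..Default::default()
            })
            .await?;
        parts.push(CompletedPart {
            e_tag: copied.copy_part_result.and_then(|result| result.e_tag),
            part_number: Some(part_number),
        });
    }

    // Seal the upload; S3 stitches the parts into the target file.
    client
        .complete_multipart_upload(CompleteMultipartUploadRequest {
            bucket: bucket.to_string(),
            key: target.to_string(),
            upload_id,
            multipart_upload: Some(CompletedMultipartUpload { parts: Some(parts) }),
            ..Default::default()
        })
        .await?;

    Ok(())
}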

rename

The rename command offers dynamic file renaming using patterns, without having to download the files. The main utility of this command is being able to use patterns to rename large numbers of files in a single invocation.

You can rename files in a basic manner, such as simply changing their prefix:

$ s3-utils rename my.bucket.name 'my-directory/(.*)' 'my-new-directory/$1'

Although basic, this shows how you can use captured patterns in your renaming operations. This allows you to do much more complicated mappings, such as transforming an existing tree hierarchy into flat files:

$ s3-utils rename my.bucket.name '(.*)/(.*)/(.*)' '$1-$2-$3'

This is a very simple model, but it provides a flexible way to restructure a large number of files very quickly.

Due to limitations in the current AWS S3 API, this command is unable to work with files larger than 5GB. At some point we may add a workaround for this, but for now such files are likely to result in an error.
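
For background, S3 has no native rename operation: a rename of this kind is generally a server-side CopyObject followed by a DeleteObject, as in the sketch below (a hypothetical helper using rusoto_s3, not necessarily this tool's exact implementation). The CopyObject call is what imposes the 5GB cap:

use rusoto_core::Region;
use rusoto_s3::{CopyObjectRequest, DeleteObjectRequest, S3Client, S3};

// Hypothetical helper: rename a single key by copying it server
// side and then deleting the original. CopyObject is capped at
// 5GB per object, hence the limitation mentioned above.
async fn rename(
    bucket: &str,
    from: &str,
    to: &str,
) -> Result<(), Box<dyn std::error::Error>> {
    let client = S3Client::new(Region::default());

    // Server-side copy to the new key; no bytes are downloaded.
    client
        .copy_object(CopyObjectRequest {
            bucket: bucket.to_string(),
            key: to.to_string(),
            copy_source: format!("{}/{}", bucket, from),
            ..Default::default()
        })
        .await?;

    // Remove the original key to complete the "rename".
    client
        .delete_object(DeleteObjectRequest {
            bucket: bucket.to_string(),
            key: from.to_string(),
            ..Default::default()
        })
        .await?;

    Ok(())
}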

report

The report command generates metadata about an S3 bucket, or a subdirectory thereof. It can be used to inspect things like file sizes, modification dates, and so on. The command itself is extremely simple, as there is very little to customize:

$ s3-utils report my.bucket.name
$ s3-utils report my.bucket.name/my/directory/path

This generates shell output which follows a relatively simple format, meant to be easily extensible and (hopefully) convenient in shell pipelines. The general format is pretty stable, but certain formatting may change over time (spacing, number formatting, etc).

Below is an example based on a real S3 bucket (although with fake names):

[general]
total_time=7s
total_space=1.94TB
total_files=51,152

[file_size]
average_file_size=37.95MB
average_file_bytes=37949529
largest_file_size=1.82GB
largest_file_bytes=1818900684
largest_file_name=path/to/my_largest_file.txt.gz
smallest_file_size=54B
smallest_file_bytes=54
smallest_file_name=path/to/my_smallest_file.txt.gz
smallest_file_others=12

[extensions]
unique_extensions=1
most_frequent_extension=gz

[modification]
earliest_file_date=2016-06-11T17:36:57.000Z
earliest_file_name=path/to/my_earliest_file.txt.gz
earliest_file_others=3
latest_file_date=2017-01-01T00:03:19.000Z
latest_file_name=path/to/my_latest_file.txt.gz

This sample report is based on the initial builds of this subcommand, so depending on when you read this there may be more (or fewer) fields included in the generated report.
