whitfin / s3-concat

Licence: MIT license
Concatenate Amazon S3 files remotely using flexible patterns

Programming language: Rust

Projects that are alternatives of or similar to s3-concat

moodle-tool objectfs
Object file storage system for Moodle
Stars: ✭ 61 (+90.63%)
Mutual labels:  filesystem, aws-s3
s3-utils
Utilities and tools based around Amazon S3 to provide convenience APIs in a CLI
Stars: ✭ 45 (+40.63%)
Mutual labels:  aws-s3, text-processing
anyfs
Portable file system for Node
Stars: ✭ 17 (-46.87%)
Mutual labels:  filesystem, aws-s3
Goofys
a high-performance, POSIX-ish Amazon S3 file system written in Go
Stars: ✭ 3,932 (+12187.5%)
Mutual labels:  filesystem, aws-s3
Oneupflysystembundle
A Flysystem integration for your Symfony projects.
Stars: ✭ 541 (+1590.63%)
Mutual labels:  filesystem, aws-s3
Mc
MinIO Client is a replacement for ls, cp, mkdir, diff and rsync commands for filesystems and object storage.
Stars: ✭ 1,962 (+6031.25%)
Mutual labels:  filesystem, aws-s3
S3fs Fuse
FUSE-based file system backed by Amazon S3
Stars: ✭ 5,733 (+17815.63%)
Mutual labels:  filesystem, aws-s3
punic
Punic is a remote cache CLI built for Carthage and Apple .xcframework
Stars: ✭ 25 (-21.87%)
Mutual labels:  tooling, aws-s3
Transactional-NTFS-TxF-.NET
Transactional NTFS (TxF) Library .NET is a small library .Net (C#) allows to use transactions on NTFS FileSystem (Transactional NTFS (TxF))
Stars: ✭ 20 (-37.5%)
Mutual labels:  filesystem
Go-Clean-Architecture-REST-API
Golang Clean Architecture REST API example
Stars: ✭ 376 (+1075%)
Mutual labels:  aws-s3
lfs
Lightweight file system
Stars: ✭ 12 (-62.5%)
Mutual labels:  filesystem
analysis-flow
Data Analysis Workflows & Reproducibility Learning Resources
Stars: ✭ 108 (+237.5%)
Mutual labels:  tooling
juicefs-csi-driver
JuiceFS CSI Driver
Stars: ✭ 117 (+265.63%)
Mutual labels:  filesystem
jquery.filebrowser
File browser jQuery plugin
Stars: ✭ 29 (-9.37%)
Mutual labels:  filesystem
TOFileSystemObserver
A bullet-proof mechanism for detecting any changes made to the contents of a folder in iOS and macOS.
Stars: ✭ 35 (+9.38%)
Mutual labels:  filesystem
Windows-System-Wide-Filter
Windows WDM driver filters to filter IO to devices and file systems
Stars: ✭ 49 (+53.13%)
Mutual labels:  filesystem
T-Watch
Real Time Twitter Sentiment Analysis Product
Stars: ✭ 20 (-37.5%)
Mutual labels:  aws-s3
fliphub
the easiest app builder
Stars: ✭ 30 (-6.25%)
Mutual labels:  tooling
Aqeous
(Inactive, Checkout AvanaOS, Rewrite of this) This is a New Operating System (Kernel right now). Made completely from scratch, We aim to make a complete OS for Learning purpose
Stars: ✭ 23 (-28.12%)
Mutual labels:  filesystem
Enchilada
Enchilada is a filesystem abstraction layer written in C#
Stars: ✭ 29 (-9.37%)
Mutual labels:  filesystem

S3 Concat

This tool has been migrated into s3-utils; please use that crate for future updates.

A small utility to concatenate files in AWS S3. Designed to be simple and quick, this tool uses the Multipart Upload API provided by AWS to concatenate files. This avoids the need to download files to the local machine, although it does come with caveats (see Limitations). S3 interaction is handled by rusoto_s3, so check its documentation for authorization practices.

Installation

You can install s3-concat either from Crates.io (once it's published) or from this repository:

# install from Cargo
$ cargo install s3-concat

# install the latest from GitHub
$ cargo install --git https://github.com/whitfin/s3-concat.git

Usage

Credentials can be configured by following the instructions in the AWS documentation, although the examples here use environment variables for the sake of clarity.

You can concatenate files in a basic manner by providing a bucket, a source pattern and a target file path:

$ AWS_ACCESS_KEY_ID=MY_ACCESS_KEY_ID \
    AWS_SECRET_ACCESS_KEY=MY_SECRET_ACCESS_KEY \
    AWS_DEFAULT_REGION=us-west-2 \
    s3-concat my.bucket.name 'archives/*.gz' 'archive.gz'

If you're working with long paths, you can append a prefix to the bucket name to avoid typing it out multiple times. In the following case, *.gz and archive.gz are relative to the my/annoyingly/nested/path/ prefix.

$ AWS_ACCESS_KEY_ID=MY_ACCESS_KEY_ID \
    AWS_SECRET_ACCESS_KEY=MY_SECRET_ACCESS_KEY \
    AWS_DEFAULT_REGION=us-west-2 \
    s3-concat my.bucket.name/my/annoyingly/nested/path/ '*.gz' 'archive.gz'
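
As a rough illustration of how such a prefix could be resolved, here is a minimal sketch using only the standard library. The function names split_bucket and resolve are hypothetical and are not taken from the s3-concat source; the actual implementation may differ.

```rust
/// Splits a `<bucket>` argument such as `my.bucket.name/some/prefix/`
/// into the bucket name and the (possibly empty) key prefix.
/// Hypothetical helper, not part of s3-concat itself.
fn split_bucket(arg: &str) -> (&str, &str) {
    match arg.find('/') {
        Some(idx) => (&arg[..idx], &arg[idx + 1..]),
        None => (arg, ""),
    }
}

/// Resolves a source or target path relative to the prefix.
/// Hypothetical helper, not part of s3-concat itself.
fn resolve(prefix: &str, relative: &str) -> String {
    if prefix.is_empty() {
        relative.to_string()
    } else {
        format!("{}/{}", prefix.trim_end_matches('/'), relative)
    }
}

fn main() {
    let (bucket, prefix) = split_bucket("my.bucket.name/my/annoyingly/nested/path/");
    println!("bucket = {}", bucket);
    println!("source = {}", resolve(prefix, "*.gz"));
    println!("target = {}", resolve(prefix, "archive.gz"));
}
```

Under these assumptions, the source resolves to my/annoyingly/nested/path/*.gz and the target to my/annoyingly/nested/path/archive.gz within my.bucket.name.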

You can also use pattern matching (driven by the official regex crate) to reuse segments of the source paths in your target paths. Here is an example of mapping a date hierarchy (YYYY/MM/DD) to a flat structure (YYYY-MM-DD):

$ AWS_ACCESS_KEY_ID=MY_ACCESS_KEY_ID \
    AWS_SECRET_ACCESS_KEY=MY_SECRET_ACCESS_KEY \
    AWS_DEFAULT_REGION=us-west-2 \
    s3-concat my.bucket.name 'date-hierarchy/(\d{4})/(\d{2})/(\d{2})/*.gz' 'flat-hierarchy/$1-$2-$3.gz'

In this case, all files under 2018/01/01/ in the source hierarchy would be concatenated into flat-hierarchy/2018-01-01.gz. Don't forget to put single quotes around your expressions to avoid any pesky shell expansions!
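
To make the $1-$2-$3 substitution concrete, the sketch below expands numbered references in a target template from a list of captured groups. It is a simplified, standard-library-only stand-in: the regex crate has its own replacement syntax with more features, and expand_target is a hypothetical name rather than anything in s3-concat.

```rust
/// Expands `$1`, `$2`, ... in `template` using `groups`, where
/// `groups[0]` corresponds to `$1`. References that are out of range
/// (including `$0`) expand to an empty string in this sketch.
/// Hypothetical helper, not part of s3-concat or the regex crate.
fn expand_target(template: &str, groups: &[&str]) -> String {
    let mut out = String::new();
    let mut chars = template.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '$' {
            out.push(c);
            continue;
        }
        // Collect the digits that follow `$` to form the group number.
        let mut digits = String::new();
        while let Some(d) = chars.peek().copied().filter(|d| d.is_ascii_digit()) {
            digits.push(d);
            chars.next();
        }
        if digits.is_empty() {
            out.push('$'); // a `$` without digits is kept literally
            continue;
        }
        // Group numbers start at 1; anything out of range becomes "".
        let value = digits
            .parse::<usize>()
            .ok()
            .and_then(|n| n.checked_sub(1))
            .and_then(|i| groups.get(i).copied())
            .unwrap_or("");
        out.push_str(value);
    }
    out
}

fn main() {
    // The groups a YYYY/MM/DD pattern would capture for a key
    // stored under 2018/01/01/.
    let groups = ["2018", "01", "01"];
    println!("{}", expand_target("flat-hierarchy/$1-$2-$3.gz", &groups));
}
```

Every source key that yields the same captured groups expands to the same target, which is how many source files end up grouped into a single concatenated file.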

For any other functionality, check out the help menu (although the example below might be outdated):

$ s3-concat -h
s3-concat 1.0.0
Isaac Whitfield <[email protected]>
Concatenate Amazon S3 files remotely using flexible patterns

USAGE:
    s3-concat [FLAGS] <bucket> <source> <target>

FLAGS:
    -c, --cleanup    Removes source files after concatenation
    -d, --dry-run    Only print out the calculated writes
    -h, --help       Prints help information
    -q, --quiet      Only prints errors during execution
    -V, --version    Prints version information

ARGS:
    <bucket>    An S3 bucket prefix to work within
    <source>    A source pattern to use to locate files
    <target>    A target pattern to use to concatenate files into

Limitations

In order to concatenate files remotely (i.e. without pulling them to your machine), this tool uses the Multipart Upload API of S3, so it inherits all of that API's limitations. Usually this isn't an issue, but the most noticeable restriction is that files smaller than 5MB cannot be concatenated. To avoid wasted AWS calls, this is currently caught in the client layer and results in a client-side error. Working around the restriction is complex, so joining files smaller than 5MB is currently unsupported.
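
A client-side size check of this kind might look like the sketch below. It assumes the threshold is 5 MiB (5 * 1024 * 1024 bytes); the exact value and the way s3-concat reports the error are assumptions, and undersized is a hypothetical name.

```rust
/// Assumed minimum size of a source file, in bytes (5 MiB).
const MIN_PART_SIZE: u64 = 5 * 1024 * 1024;

/// Returns the keys of all sources that are too small to be used as
/// a part in a multipart upload, given (key, size in bytes) pairs.
/// Hypothetical helper, not part of s3-concat itself.
fn undersized<'a>(sources: &[(&'a str, u64)]) -> Vec<&'a str> {
    sources
        .iter()
        .filter(|(_, size)| *size < MIN_PART_SIZE)
        .map(|(key, _)| *key)
        .collect()
}

fn main() {
    let sources = [
        ("archives/a.gz", 12 * 1024 * 1024),
        ("archives/b.gz", 1024),
    ];
    let rejected = undersized(&sources);
    if rejected.is_empty() {
        println!("all sources are large enough to concatenate");
    } else {
        // Fail before making any AWS calls.
        eprintln!("sources smaller than 5MB: {:?}", rejected);
    }
}
```

Running the check over the listed object sizes before starting the upload means an undersized source is reported without any multipart request being made.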
