
chanzuckerberg / s3parcp

Licence: MIT license
Faster than s3cp

Programming Languages

go


s3parcp

s3parcp is a CLI wrapper around the Downloader from AWS's Go SDK. The Downloader splits an s3 object into chunks and downloads them in parallel, which lets s3parcp download faster than s3cp. The command-line interface is modeled on cp.

Installation

Linux

Debian (Ubuntu/Mint)

Download and install the .deb:

RELEASES=chanzuckerberg/s3parcp/releases
VERSION=$(curl https://api.github.com/repos/${RELEASES}/latest | jq -r .name | sed s/^v//)
DOWNLOAD=s3parcp_${VERSION}_linux_amd64.deb
curl -L https://github.com/${RELEASES}/download/v${VERSION}/${DOWNLOAD} -o s3parcp.deb
sudo dpkg -i s3parcp.deb
rm s3parcp.deb

Fedora (RHEL/CentOS)

Download and install the .rpm:

RELEASES=chanzuckerberg/s3parcp/releases
VERSION=$(curl https://api.github.com/repos/${RELEASES}/latest | jq -r .name | sed s/^v//)
DOWNLOAD=s3parcp_${VERSION}_linux_amd64.rpm
curl -L https://github.com/${RELEASES}/download/v${VERSION}/${DOWNLOAD} -o s3parcp.rpm
sudo rpm -i s3parcp.rpm
rm s3parcp.rpm

macOS

Install via homebrew:

brew tap chanzuckerberg/tap
brew install s3parcp

Binary

Download the appropriate binary for your platform:

RELEASES=chanzuckerberg/s3parcp/releases
PLATFORM=linux # one of: linux, darwin, windows
VERSION=$(curl https://api.github.com/repos/${RELEASES}/latest | jq -r .name | sed s/^v//)
DOWNLOAD=s3parcp_${VERSION}_${PLATFORM}_amd64.tar.gz
curl -L https://github.com/${RELEASES}/download/v${VERSION}/${DOWNLOAD} | tar zx
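If you script installs from Go rather than the shell, the URL logic of the snippets above can be sketched as follows. This is an illustration, not part of s3parcp: `releaseAssetURL` is a hypothetical helper, the asset naming pattern is copied from the shell snippets (amd64 only), and `v1.0.0` is a placeholder for the release name the scripts fetch from the GitHub API.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// releaseAssetURL rebuilds the download URL used by the shell snippets: strip
// the leading "v" from the release name (like `sed s/^v//`) and fill in the
// s3parcp_<version>_<platform>_amd64.tar.gz asset name.
func releaseAssetURL(releaseName, platform string) string {
	version := strings.TrimPrefix(releaseName, "v")
	asset := fmt.Sprintf("s3parcp_%s_%s_amd64.tar.gz", version, platform)
	return fmt.Sprintf("https://github.com/chanzuckerberg/s3parcp/releases/download/v%s/%s", version, asset)
}

func main() {
	// "v1.0.0" is a placeholder release name.
	// runtime.GOOS is "linux", "darwin", or "windows" on those platforms.
	fmt.Println(releaseAssetURL("v1.0.0", runtime.GOOS))
}
```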

Windows

Download the Windows binary:

RELEASES=chanzuckerberg/s3parcp/releases
VERSION=$(curl https://api.github.com/repos/${RELEASES}/latest | jq -r .name | sed s/^v//)
DOWNLOAD=s3parcp_${VERSION}_windows_amd64.tar.gz
curl -L https://github.com/${RELEASES}/download/v${VERSION}/${DOWNLOAD} | tar zx

Usage

Usage:
  s3parcp [OPTIONS] [Source] [Destination]

Application Options:
  -p, --part-size=                  Part size in bytes of parts to be downloaded
  -c, --concurrency=                Download concurrency
  -b, --buffer-size=                Size of download buffer in bytes
      --checksum                    Compare checksum if downloading or place checksum
                                    in metadata if uploading
  -r, --recursive                   Copy directories or folders recursively
      --version                     Print the current version
      --s3_url=                     A custom s3 API url (also available as an environment
                                    variable 'S3PARCP_S3_URL', the flag takes precedence)
      --max-retries=                Max per chunk retries (default: 3)
      --disable-ssl                 Disable SSL
      --disable-cached-credentials  Disable caching AWS credentials
  -v, --verbose                     verbose logging

Help Options:
  -h, --help                        Show this help message

Arguments:
  Source:                           Source to copy from
  Destination:                      Destination to copy to (Optional, defaults to source's base
                                    name)
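s3parcp's own argument handling is not shown in this README, so the following is only a sketch of how the two arguments might be interpreted. It assumes that anything starting with `s3://` is a bucket/key pair and everything else is a local path, and that the default destination is the last segment of the source (for s3 sources, the last segment of the object key), as the help text describes. `parseS3URL` and `defaultDestination` are hypothetical names.

```go
package main

import (
	"fmt"
	"path"
	"path/filepath"
	"strings"
)

// parseS3URL is a hypothetical helper that splits "s3://bucket/key" into its
// bucket and key. ok is false for anything without the s3:// scheme, which
// this sketch treats as a local path.
func parseS3URL(arg string) (bucket, key string, ok bool) {
	if !strings.HasPrefix(arg, "s3://") {
		return "", "", false
	}
	// Split at the first "/" only; keys may contain further slashes.
	bucket, key, _ = strings.Cut(strings.TrimPrefix(arg, "s3://"), "/")
	return bucket, key, true
}

// defaultDestination mirrors the help text: with no Destination argument,
// use the source's base name.
func defaultDestination(src string) string {
	if _, key, ok := parseS3URL(src); ok {
		return path.Base(key) // s3 keys always use forward slashes
	}
	return filepath.Base(src)
}

func main() {
	bucket, key, _ := parseS3URL("s3://my-bucket/my-object")
	fmt.Println(bucket, key, defaultDestination("s3://my-bucket/my-object"))
}
```

Splitting manually instead of with `net/url` avoids treating characters such as `?` or `#` in object keys as URL syntax.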

Examples

Uploading

s3parcp my/local/file s3://my-bucket/my-object

Downloading

s3parcp s3://my-bucket/my-object my/local/file

Tuning Chunk Parameters

Note: These example parameters are not necessarily good values for your system. s3parcp ships with sensible defaults, so keep them unless you have measured that other values perform better.

PART_SIZE=1048576 # 1 MiB
BUFFER_SIZE=10485760 # 10 MiB
CONCURRENCY=8
s3parcp \
  --part-size $PART_SIZE \
  --concurrency $CONCURRENCY \
  --buffer-size $BUFFER_SIZE \
  my/local/file s3://my-bucket/my-object
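When choosing values, some back-of-envelope arithmetic helps. The sketch below computes how many parts a file splits into and a rough figure for part data in flight, under the assumption (not stated in this README) that each of the `--concurrency` workers holds at most one part in memory; `--buffer-size` is not modeled. It also encodes Amazon S3's documented multipart upload limits (parts of at least 5 MiB except the last, at most 10,000 parts). The example above uses a 1 MiB part size for an upload, which is below that minimum; whether s3parcp applies `--part-size` to uploads, and how it handles smaller values, is not covered here. The file size and helper names are hypothetical.

```go
package main

import "fmt"

// Limits from Amazon S3's multipart upload documentation.
const (
	minUploadPartSize = 5 << 20 // 5 MiB minimum for every part except the last
	maxUploadParts    = 10000   // maximum number of parts in one upload
)

// numParts is ceil(size / partSize): how many chunks a transfer is split into.
func numParts(size, partSize int64) int64 {
	return (size + partSize - 1) / partSize
}

// minPartSizeForUpload is the smallest part size S3 accepts for a multipart
// upload of size bytes: large enough to stay within maxUploadParts and never
// below minUploadPartSize.
func minPartSizeForUpload(size int64) int64 {
	p := (size + maxUploadParts - 1) / maxUploadParts
	if p < minUploadPartSize {
		p = minUploadPartSize
	}
	return p
}

func main() {
	const (
		partSize    = 1048576 // PART_SIZE from the example above
		concurrency = 8       // CONCURRENCY from the example above
	)
	size := int64(1 << 30) // hypothetical 1 GiB file
	fmt.Println("parts:", numParts(size, partSize))
	// Assumption: each worker holds at most one part in memory at a time.
	fmt.Println("part bytes in flight:", concurrency*partSize)
	fmt.Println("smallest S3-acceptable upload part size:", minPartSizeForUpload(size))
}
```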

Using CRC32C Checksum

To verify a checksum when downloading, the object must have been uploaded to s3 by s3parcp with the --checksum flag.

Upload your file:

s3parcp --checksum my/local/file s3://my-bucket/my-object

The checksum is stored in the s3 object's metadata under the key x-amz-meta-crc32c-checksum.

Download your file:

s3parcp --checksum s3://my-bucket/my-object my/new/local/file

Features

checksum

This tool comes with a parallelized crc32c checksum validator. The AWS SDK does not support checksums for multipart downloads. If you include the --checksum flag when uploading, s3parcp computes a checksum of your file and stores it in the object's s3 metadata under the key x-amz-meta-crc32c-checksum. When downloading, the --checksum flag computes an independent crc32c checksum of the downloaded file and compares it with the checksum in the object's metadata.
