xtream1101 / s3-concat

Licence: MIT license
Concat multiple files in s3

Python S3 Concat

S3 Concat concatenates many small files in an S3 bucket into fewer, larger files.

Install

pip install s3-concat

Usage

Command Line

$ s3-concat -h

Import

from s3_concat import S3Concat

bucket = 'YOUR_BUCKET_NAME'
path_to_concat = 'PATH_TO_FILES_TO_CONCAT'
concatenated_file = 'FILE_TO_SAVE_TO.json'
# Setting this to a size always adds a part number to the end of the file name
min_file_size = '50MB'  # ex: FILE_TO_SAVE_TO-1.json, FILE_TO_SAVE_TO-2.json, ...
# Setting this to None concatenates all files into a single file
# min_file_size = None  # ex: FILE_TO_SAVE_TO.json

# Init the job
job = S3Concat(bucket, concatenated_file, min_file_size,
               content_type='application/json',
               # session=boto3.session.Session(),  # For a custom AWS session (requires `import boto3`)
               # s3_client_kwargs={},  # Arguments allowed by the s3 client: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html
               )
# Add files, can call multiple times to add files from other directories
job.add_files(path_to_concat)
# Add a single file at a time
job.add_file('some/file_key.json')
# Only use small_parts_threads if you need to. See Advanced Usage section below.
job.concat(small_parts_threads=4)
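
To illustrate what min_file_size means for the output names, here is a minimal sketch of one way files could be grouped into numbered outputs. It is not taken from the s3-concat source: the helper names group_by_min_size and output_name are hypothetical, sizes are plain byte counts rather than strings like '50MB', and the actual grouping logic in the library may differ.

```python
import os


def group_by_min_size(files, min_size):
    """Group (key, size_in_bytes) pairs so each group totals at least min_size.

    Hypothetical helper: with min_size=None everything lands in one group.
    The final group may be smaller than min_size.
    """
    files = list(files)
    if min_size is None:
        return [files] if files else []
    groups, current, current_size = [], [], 0
    for key, size in files:
        current.append((key, size))
        current_size += size
        if current_size >= min_size:
            # This group is big enough; start a new one.
            groups.append(current)
            current, current_size = [], 0
    if current:
        groups.append(current)
    return groups


def output_name(base, part_number=None):
    """Build FILE-1.json style names; no part number leaves the name unchanged."""
    if part_number is None:
        return base
    root, ext = os.path.splitext(base)
    return f"{root}-{part_number}{ext}"


files = [("logs/a.json", 30), ("logs/b.json", 30), ("logs/c.json", 10)]
groups = group_by_min_size(files, min_size=50)
names = [output_name("FILE_TO_SAVE_TO.json", i) for i, _ in enumerate(groups, start=1)]
```

In this toy run the first two files reach the 50-byte threshold together and the third is left in a smaller trailing group, giving two numbered output names.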

Advanced Usage

Depending on your use case, you may want to use small_parts_threads.

  • small_parts_threads is only used when the files you are concatenating are smaller than 5MB. Due to the limitations of the S3 multipart upload API (see Limitations below), any file smaller than 5MB must be downloaded locally, concatenated with the others, and then re-uploaded. Setting this thread count downloads those parts in parallel, which speeds up the concatenation.

The right value for this argument depends on your use case and the system you are running on.
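
The download-then-join step described above can be sketched with a thread pool. This is an illustration under assumptions, not the library's implementation: concat_small_parts and the fetch callable are hypothetical names, and a dictionary stands in for the bucket so the example runs without AWS credentials.

```python
from concurrent.futures import ThreadPoolExecutor


def concat_small_parts(keys, fetch, threads=4):
    """Download every key in parallel with fetch(key) -> bytes, then join them.

    Hypothetical helper: `fetch` is whatever callable returns an object's bytes.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map yields results in input order, even if downloads
        # finish out of order, so the concatenation order is stable.
        chunks = list(pool.map(fetch, keys))
    return b"".join(chunks)


# Stand-in for a bucket: key -> object body.
fake_bucket = {
    "small/a.json": b'{"id": 1}\n',
    "small/b.json": b'{"id": 2}\n',
}
combined = concat_small_parts(
    ["small/a.json", "small/b.json"], fake_bucket.__getitem__, threads=2
)
```

Against a real bucket, fetch could be a small wrapper around boto3's get_object, for example `lambda key: s3.get_object(Bucket=bucket, Key=key)["Body"].read()`. More threads help most when there are many small objects and the network, not the CPU, is the bottleneck.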

Limitations

This uses S3 multipart upload, which has its own limits; see https://docs.aws.amazon.com/AmazonS3/latest/dev/qfacts.html
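
Two of those limits matter most here: every part except the last must be at least 5 MiB, and a single multipart upload can have at most 10,000 parts. The sketch below shows how a list of files could be checked against them. The function names are hypothetical and this is not the check s3-concat itself performs.

```python
MIN_PART_SIZE = 5 * 1024 * 1024  # 5 MiB minimum for every part except the last
MAX_PARTS = 10_000               # maximum number of parts in one multipart upload


def split_by_part_limit(files):
    """Separate (key, size_in_bytes) pairs into large and small files.

    Large files can serve as multipart parts directly; small ones
    have to be combined first (hypothetical helper).
    """
    large = [(key, size) for key, size in files if size >= MIN_PART_SIZE]
    small = [(key, size) for key, size in files if size < MIN_PART_SIZE]
    return large, small


def fits_in_one_upload(part_count):
    """True if part_count parts fit in a single multipart upload."""
    return 0 < part_count <= MAX_PARTS


files = [
    ("data/big.json", 8 * 1024 * 1024),
    ("data/tiny.json", 200 * 1024),
    ("data/exact.json", MIN_PART_SIZE),
]
large, small = split_by_part_limit(files)
```

Here the 200 KiB file is the only one that falls under the minimum part size, so it is the only one that would need the download-and-combine path.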
