gallantlab / cottoncandy

License: BSD-2-Clause
sugar for s3

Programming Languages: Jupyter Notebook, Python


Welcome to cottoncandy!

https://gallantlab.github.io/cottoncandy

What is cottoncandy?

A Python scientific library for storing and accessing NumPy array data on S3. Arrays are uploaded directly from memory and downloaded directly into memory, so you don't have to download an array to disk and then load it from disk into your Python session.
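The memory-to-memory idea can be illustrated with plain NumPy, using an `io.BytesIO` buffer as a stand-in for the body of an S3 object. This is only a sketch of the concept under that assumption; `array_to_bytes` and `bytes_to_array` are hypothetical helpers, not cottoncandy's internal code.

```python
import io

import numpy as np


def array_to_bytes(arr):
    """Serialize an array into an in-memory .npy buffer (no temporary file)."""
    buffer = io.BytesIO()
    np.save(buffer, arr, allow_pickle=False)
    return buffer.getvalue()


def bytes_to_array(payload):
    """Rebuild the array from raw bytes, again without touching the disk."""
    return np.load(io.BytesIO(payload), allow_pickle=False)


arr = np.random.randn(100)
payload = array_to_bytes(arr)      # bytes like these are what an upload would send
restored = bytes_to_array(payload)
assert np.array_equal(arr, restored)
```

No file is ever written: the only copy of the data outside the original array is the `bytes` object.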

cottoncandy relies heavily on boto3, the AWS SDK for Python.

Try it out!

Jupyter Notebook examples using cottoncandy to

Installation

Directly from the repo:

Clone the repo from GitHub and install it from the command line as usual:

$ git clone https://github.com/gallantlab/cottoncandy.git
$ cd cottoncandy
$ sudo python setup.py install

With pip:

$ pip install cottoncandy

Configuration file

Upon first use, cottoncandy will create a configuration file. This configuration file allows you to enter your S3 and Google Drive credentials and set many other options. See the default configuration file.

The configuration file is created the first time you import cottoncandy and it is stored under:

  • Linux: ~/.config/cottoncandy/options.cfg
  • macOS: ~/Library/Application Support/cottoncandy/options.cfg
  • Windows (not supported): C:\Users\<username>\AppData\Local\<AppAuthor>\cottoncandy\options.cfg
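The locations listed above can be resolved in Python, for example to check whether the file already exists before editing or backing it up. A minimal sketch that simply mirrors the documented paths; `guess_config_path` is a hypothetical helper, not part of cottoncandy:

```python
import os
import sys


def guess_config_path(platform=sys.platform):
    """Return the per-user options.cfg path documented for Linux and macOS.

    Hypothetical helper: it mirrors the README's listed locations and is
    not part of cottoncandy's API.
    """
    if platform.startswith("linux"):
        base = os.path.join("~", ".config")
    elif platform == "darwin":
        base = os.path.join("~", "Library", "Application Support")
    else:
        raise NotImplementedError("Windows is not supported by cottoncandy")
    return os.path.expanduser(os.path.join(base, "cottoncandy", "options.cfg"))


print(guess_config_path(), os.path.exists(guess_config_path()))
```

Since the file is created on first import, `os.path.exists` returning `False` usually just means cottoncandy has not been imported yet on this machine.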

By default, cottoncandy sets object and bucket permissions to authenticated-read. If you wish to keep all your objects private, modify your configuration file and set default_acl = private. See AWS ACL overview for more information on S3 permissions.
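The `default_acl` change can be made by hand in a text editor, or scripted. The sketch below uses Python's `configparser` and deliberately scans every section for a `default_acl` key rather than assuming which section of options.cfg holds it; `make_objects_private` is a hypothetical helper.

```python
import configparser


def make_objects_private(config_path):
    """Set every `default_acl` entry in an options.cfg file to `private`.

    Sketch only: it does not assume which section holds the key, and it
    returns the names of the sections it changed.
    """
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(config_path):
        raise FileNotFoundError(config_path)
    changed = []
    for section in parser.sections():
        if parser.has_option(section, "default_acl"):
            parser.set(section, "default_acl", "private")
            changed.append(section)
    with open(config_path, "w") as handle:
        parser.write(handle)
    return changed
```

Note that `ConfigParser.write` drops comments from the file, so editing by hand is preferable if you want to keep the annotated defaults.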

Advanced (for admins): One can customize the system-wide cottoncandy install by cloning the repo and modifying defaults.cfg. For example, one can set a default encryption key for all users on the system (key = SoMeEncypTionKey). When a user first uses cottoncandy, this default value is copied to their personal configuration file. Note, however, that users can still override that value.
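The precedence rule (system default first, personal value wins) can be modelled by reading two INI sources in order with `configparser`, where a later source overrides keys from an earlier one. This is a sketch of the precedence only, not of cottoncandy's copy-on-first-use logic; `layered_options` is a hypothetical helper and the `[encryption]` section name is an assumption made for illustration.

```python
import configparser


def layered_options(defaults_text, user_text):
    """Read the admin defaults, then the user's file; later values win."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(defaults_text, source="defaults.cfg")
    parser.read_string(user_text, source="options.cfg")
    return parser


# "[encryption]" is a hypothetical section name used only for this example
defaults = "[encryption]\nkey = SoMeEncypTionKey\n"
user = "[encryption]\nkey = MyOwnKey\n"
print(layered_options(defaults, user).get("encryption", "key"))  # MyOwnKey
```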

Getting started

Set up the connection (the endpoint, access key and secret key can be specified in the configuration file instead):

>>> import cottoncandy as cc
>>> cci = cc.get_interface('my_bucket',
                           ACCESS_KEY='FAKEACCESSKEYTEXT',
                           SECRET_KEY='FAKESECRETKEYTEXT',
                           endpoint_url='https://s3.amazonaws.com')

Storing numpy arrays

>>> import numpy as np
>>> arr = np.random.randn(100)
>>> s3_response = cci.upload_raw_array('myarray', arr)
>>> arr_down = cci.download_raw_array('myarray')
>>> assert np.allclose(arr, arr_down)
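Storing an array as raw bytes only works if its dtype and shape travel with it. The sketch below shows one way to do that in plain NumPy, assuming the extra information is kept as a dict of strings (S3 user-defined metadata values are strings). `pack_raw_array` and `unpack_raw_array` are hypothetical helpers; this is not necessarily how `upload_raw_array` is implemented.

```python
import numpy as np


def pack_raw_array(arr):
    """Split an array into a raw byte payload plus the metadata to rebuild it."""
    arr = np.ascontiguousarray(arr)  # guarantee C-ordered bytes
    metadata = {"dtype": arr.dtype.str, "shape": ",".join(map(str, arr.shape))}
    return arr.tobytes(), metadata


def unpack_raw_array(payload, metadata):
    """Reinterpret the bytes with the stored dtype, then restore the shape."""
    shape = tuple(int(n) for n in metadata["shape"].split(","))
    return np.frombuffer(payload, dtype=np.dtype(metadata["dtype"])).reshape(shape)


arr = np.random.randn(100)
payload, meta = pack_raw_array(arr)
arr_down = unpack_raw_array(payload, meta)
assert np.allclose(arr, arr_down)
```

`np.frombuffer` returns a read-only view of the bytes; call `.copy()` on the result if you need to modify it.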

Storing dask arrays

>>> arr = np.random.randn(100,600,1000)
>>> s3_response = cci.upload_dask_array('test_dim', arr, axis=-1)
>>> dask_object = cci.download_dask_array('test_dim')
>>> dask_object
dask.array<array, shape=(100, 600, 1000), dtype=float64, chunksize=(100, 600, 100)>
>>> dask_slice = dask_object[..., :200]
>>> dask_slice
dask.array<getitem..., shape=(100, 600, 200), dtype=float64, chunksize=(100, 600, 100)>
>>> downloaded_data = np.asarray(dask_slice) # this downloads the array
>>> downloaded_data.shape
(100, 600, 200)
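In the example above, the last axis (length 1000) is split into chunks of 100, so `dask_slice = dask_object[..., :200]` only overlaps the first two chunks. Presumably this is what keeps the download small, assuming each chunk is fetched separately. The bookkeeping can be sketched in a few lines; `chunk_bounds` and `chunks_touched` are hypothetical helpers, not dask or cottoncandy functions.

```python
def chunk_bounds(length, chunksize):
    """(start, stop) of each chunk when one axis is cut into equal blocks."""
    return [(start, min(start + chunksize, length))
            for start in range(0, length, chunksize)]


def chunks_touched(length, chunksize, start, stop):
    """Indices of the chunks that overlap the half-open slice [start, stop)."""
    return [i for i, (lo, hi) in enumerate(chunk_bounds(length, chunksize))
            if lo < stop and start < hi]


print(len(chunk_bounds(1000, 100)))        # 10 chunks along the last axis
print(chunks_touched(1000, 100, 0, 200))   # [0, 1]
```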

Command-line search

>>> cci.glob('/path/to/*/file01*.grp/image_data')
['/path/to/my/file01a.grp/image_data',
 '/path/to/my/file01b.grp/image_data',
 '/path/to/your/file01a.grp/image_data',
 '/path/to/your/file01b.grp/image_data']
>>> cci.glob('/path/to/my/file02*.grp/*')
['/path/to/my/file02a.grp/image_data',
 '/path/to/my/file02a.grp/text_data',
 '/path/to/my/file02b.grp/image_data',
 '/path/to/my/file02b.grp/text_data']
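S3 has no real directories, so a pattern like the ones above has to be matched against flat key names. One possible approach, assuming shell-like semantics where `*` never crosses a `/`, is to compare keys one path segment at a time with `fnmatch`. `glob_keys` is a hypothetical helper; `cci.glob` may work differently.

```python
from fnmatch import fnmatchcase


def glob_keys(pattern, keys):
    """Match keys segment by segment so that '*' never spans a '/'."""
    parts = pattern.split("/")
    matches = []
    for key in keys:
        key_parts = key.split("/")
        if len(key_parts) == len(parts) and all(
                fnmatchcase(k, p) for k, p in zip(key_parts, parts)):
            matches.append(key)
    return sorted(matches)


keys = [
    "/path/to/my/file01a.grp/image_data",
    "/path/to/your/file01b.grp/image_data",
    "/path/to/my/deeper/file01c.grp/image_data",  # extra level: not matched
    "/path/to/my/file02a.grp/text_data",
]
print(glob_keys("/path/to/*/file01*.grp/image_data", keys))
# ['/path/to/my/file01a.grp/image_data', '/path/to/your/file01b.grp/image_data']
```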

File system-like object browsing

>>> import cottoncandy as cc
>>> browser = cc.get_browser('my_bucket_name',
                             ACCESS_KEY='FAKEACCESSKEYTEXT',
                             SECRET_KEY='FAKESECRETKEYTEXT',
                             endpoint_url='https://s3.amazonaws.com')
>>> browser.sweet_project.sub<TAB>
browser.sweet_project.sub01_awesome_analysis_DOT_grp
browser.sweet_project.sub02_awesome_analysis_DOT_grp
>>> browser.sweet_project.sub01_awesome_analysis_DOT_grp
<cottoncandy-group <bucket:my_bucket_name> (sub01_awesome_analysis.grp: 3 keys)>
>>> browser.sweet_project.sub01_awesome_analysis_DOT_grp.result_model01
<cottoncandy-dataset <bucket:my_bucket_name [1.00MB:shape=(10000)]>
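Python attribute names cannot contain `.`, which is presumably why the browser shows `sub01_awesome_analysis.grp` as `sub01_awesome_analysis_DOT_grp`. A hypothetical pair of helpers expressing that naming convention (not the browser's actual code):

```python
def key_to_attribute(name):
    """'sub01_awesome_analysis.grp' -> 'sub01_awesome_analysis_DOT_grp'."""
    attr = name.replace(".", "_DOT_")
    if not attr.isidentifier():
        raise ValueError(f"{name!r} cannot be used as an attribute name")
    return attr


def attribute_to_key(attr):
    """Inverse mapping: turn each '_DOT_' back into '.'."""
    return attr.replace("_DOT_", ".")
```

The mapping is only reversible if real key names never contain the literal text `_DOT_`.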

Connection settings (S3 only)

cottoncandy lets users modify S3 connection settings via botocore. For example, you can set the timeouts for establishing a connection and for reading a response, and the number of times a failed S3 request is retried.

from botocore.client import Config
import cottoncandy as cc

# 60 s to connect and 60 s to read; retry a failed request up to 10 times
config = Config(connect_timeout=60, read_timeout=60, retries=dict(max_attempts=10))
cci = cc.get_interface('my_bucket_name', config=config)

Google Drive backend

cottoncandy can also use Google Drive as a back-end. This requires a client_secrets.json file in your ~/.config/cottoncandy folder and the pydrive package.

See the Google Drive setup instructions for more details.

>>> import cottoncandy as cc
>>> cci = cc.get_interface(backend='gdrive')

Encryption

cottoncandy provides a transparent encryption interface for AWS S3 and Google Drive. This requires the pycrypto package.

WARNING: Encryption is an advanced feature. Make sure to back up your encryption keys (stored in ~/.config/cottoncandy/options.cfg). If you lose your encryption keys, you will not be able to recover your data!

>>> import cottoncandy as cc
>>> cci = cc.get_encrypted_interface('my_bucket_name',
                                      ACCESS_KEY='FAKEACCESSKEYTEXT',
                                      SECRET_KEY='FAKESECRETKEYTEXT',
                                      endpoint_url='https://s3.amazonaws.com')
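Because the encryption keys live in options.cfg, backing up that file preserves them. A minimal sketch using only the standard library; `backup_options` is a hypothetical helper, and the backup directory is your choice (ideally on a different machine or medium).

```python
import os
import shutil
import time


def backup_options(config_path, backup_dir):
    """Copy options.cfg (which holds the encryption keys) to backup_dir
    under a timestamped name and return the path of the copy."""
    os.makedirs(backup_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = os.path.join(backup_dir, f"options.cfg.{stamp}.bak")
    shutil.copy2(config_path, target)  # copies contents and file metadata
    os.chmod(target, 0o600)            # the copy contains secrets: owner-only
    return target
```

For example, `backup_options(os.path.expanduser("~/.config/cottoncandy/options.cfg"), "/path/to/safe/place")`, where the destination path is a placeholder.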

Contributing

  • If you find any issues with cottoncandy, please report them by submitting an issue on GitHub.
  • If you wish to contribute, please submit a pull request. Include information about how you ran the tests and, if possible, the full output log. Note that running tests on AWS can incur costs.

Cite as

Nunez-Elizalde AO, Gao JS, Zhang T, Gallant JL (2018). cottoncandy: scientific python package for easy cloud storage. Journal of Open Source Software, 3(28), 890, https://doi.org/10.21105/joss.00890
