danielfrg / S3contents

License: Apache-2.0
An S3-backed ContentsManager implementation for Jupyter

Programming Languages

python

Projects that are alternatives of or similar to S3contents

Stock Analysis Engine
Backtest 1000s of minute-by-minute trading algorithms for training AI with automated pricing data from: IEX, Tradier and FinViz. Datasets and trading performance automatically published to S3 for building AI training datasets for teaching DNNs how to trade. Runs on Kubernetes and docker-compose. >150 million trading history rows generated from +5000 algorithms. Heads up: Yahoo's Finance API was disabled on 2019-01-03 https://developer.yahoo.com/yql/
Stars: ✭ 605 (+245.71%)
Mutual labels:  s3, jupyter-notebook, minio
Helm S3
Helm plugin that lets you set up a chart repository in AWS S3.
Stars: ✭ 372 (+112.57%)
Mutual labels:  s3, minio
Burry.sh
Cloud Native Infrastructure BackUp & RecoveRY
Stars: ✭ 260 (+48.57%)
Mutual labels:  s3, minio
Cortx
CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.
Stars: ✭ 426 (+143.43%)
Mutual labels:  s3, jupyter-notebook
terraform-provider-minio
Terraform provider for managing minio S3 buckets and IAM Users
Stars: ✭ 123 (-29.71%)
Mutual labels:  s3, minio
s3x
s3x is a minio gateway providing an S3 API powered by TemporalX that uses IPFS as the data storage layer. It lets you turn any S3 application into an IPFS application with no change in application design.
Stars: ✭ 85 (-51.43%)
Mutual labels:  s3, minio
Commuter
🚎 Notebook sharing hub
Stars: ✭ 353 (+101.71%)
Mutual labels:  s3, jupyter-notebook
Udacity Data Engineering
Udacity Data Engineering Nano Degree (DEND)
Stars: ✭ 89 (-49.14%)
Mutual labels:  s3, jupyter-notebook
Dataengineeringproject
Example end to end data engineering project.
Stars: ✭ 82 (-53.14%)
Mutual labels:  s3, minio
Minio Go
MinIO Client SDK for Go
Stars: ✭ 1,231 (+603.43%)
Mutual labels:  s3, minio
fss3
FSS3 is an S3 filesystem abstraction layer for Golang
Stars: ✭ 52 (-70.29%)
Mutual labels:  s3, minio
Beyond Jupyter
🐍💻📊 All material from the PyCon.DE 2018 Talk "Beyond Jupyter Notebooks - Building your own data science platform with Python & Docker" (incl. Slides, Video, Udemy MOOC & other References)
Stars: ✭ 135 (-22.86%)
Mutual labels:  jupyter-notebook, minio
minback-postgres
A container which provides the ability to backup a PostgreSQL database to Minio on demand
Stars: ✭ 18 (-89.71%)
Mutual labels:  s3, minio
minio-rclone-webdav-server
A @rclone served WebDAV server with @minio as the s3 storage backend docker example
Stars: ✭ 17 (-90.29%)
Mutual labels:  s3, minio
esop
Cloud-enabled backup and restore tool for Apache Cassandra
Stars: ✭ 40 (-77.14%)
Mutual labels:  s3, minio
gosquito
gosquito ("go" + "mosquito") is a pluggable tool for data gathering, data processing and data transmitting to various destinations.
Stars: ✭ 25 (-85.71%)
Mutual labels:  s3, minio
mlflow-docker
Ready to run docker-compose configuration for ML Flow with Mysql and Minio S3
Stars: ✭ 146 (-16.57%)
Mutual labels:  s3, minio
minio-dart
Unofficial MinIO Dart Client SDK that provides simple APIs to access any Amazon S3 compatible object storage server.
Stars: ✭ 42 (-76%)
Mutual labels:  s3, minio
Rome
Carthage cache for S3, Minio, Ceph, Google Storage, Artifactory and many others
Stars: ✭ 724 (+313.71%)
Mutual labels:  s3, minio
Sci Pype
A Machine Learning API with native redis caching and export + import using S3. Analyze entire datasets using an API for building, training, testing, analyzing, extracting, importing, and archiving. This repository can run from a docker container or from the repository.
Stars: ✭ 90 (-48.57%)
Mutual labels:  s3, jupyter-notebook

S3Contents


An S3- and GCS-backed ContentsManager implementation for Jupyter.

It aims to be a transparent, drop-in replacement for Jupyter's standard filesystem-backed storage system. With this implementation of a Jupyter Contents Manager you can save all your notebooks, regular files, and directory structure directly to an S3/GCS bucket, whether it is hosted on AWS/GCP or on a self-hosted S3 API-compatible service such as MinIO.

Prerequisites

Write access (valid credentials) to an S3/GCS bucket, whether hosted on AWS/GCP or on a self-hosted S3-compatible service such as MinIO.

Installation

$ pip install s3contents

Jupyter config

Edit ~/.jupyter/jupyter_notebook_config.py for the backend you want, following the examples below, and replace the credentials as needed.

AWS S3

from s3contents import S3ContentsManager

c = get_config()

# Tell Jupyter to use S3ContentsManager for all storage.
c.NotebookApp.contents_manager_class = S3ContentsManager
c.S3ContentsManager.access_key_id = "{{ AWS Access Key ID / IAM Access Key ID }}"
c.S3ContentsManager.secret_access_key = "{{ AWS Secret Access Key / IAM Secret Access Key }}"
c.S3ContentsManager.session_token = "{{ AWS Session Token / IAM Session Token }}"
c.S3ContentsManager.bucket = "{{ S3 bucket name }}"

# Optional settings:
c.S3ContentsManager.prefix = "this/is/a/prefix/on/the/s3/bucket"
c.S3ContentsManager.sse = "AES256"
c.S3ContentsManager.signature_version = "s3v4"
c.S3ContentsManager.init_s3_hook = init_function  # See AWS key refresh

Example for play.minio.io:9000:

from s3contents import S3ContentsManager

c = get_config()

# Tell Jupyter to use S3ContentsManager for all storage.
c.NotebookApp.contents_manager_class = S3ContentsManager
c.S3ContentsManager.access_key_id = "Q3AM3UQ867SPQQA43P2F"
c.S3ContentsManager.secret_access_key = "zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG"
c.S3ContentsManager.endpoint_url = "http://play.minio.io:9000"
c.S3ContentsManager.bucket = "s3contents-demo"
c.S3ContentsManager.prefix = "notebooks/test"
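
With either configuration saved, starting Jupyter as usual should show the contents of the bucket as the notebook file tree (this assumes the bucket exists and the credentials are valid):

$ jupyter notebook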

AWS EC2 role auth setup

It is also possible to use IAM role-based access to the S3 bucket from an Amazon EC2 instance.

To do that, just leave access_key_id and secret_access_key set to their default values (None), and ensure that the EC2 instance has an IAM role that grants sufficient permissions for the bucket and the necessary operations. A minimal sketch of such a configuration follows.
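
For example, assuming the instance's IAM role grants access to the bucket (the bucket name is a placeholder):

from s3contents import S3ContentsManager

c = get_config()

# No explicit keys: boto3 falls back to the instance's IAM role credentials.
c.NotebookApp.contents_manager_class = S3ContentsManager
c.S3ContentsManager.bucket = "{{ S3 bucket name }}"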

AWS key refresh

The optional init_s3_hook configuration can be used to enable AWS key rotation (described here and here) as follows:

from s3contents import S3ContentsManager
from botocore.credentials import RefreshableCredentials
from botocore.session import get_session
import boto3
from configparser import ConfigParser

c = get_config()

def refresh_external_credentials():
    # Read the externally rotated credentials file and return it in the
    # metadata format that RefreshableCredentials expects.
    config = ConfigParser()
    config.read('/home/jovyan/.aws/credentials')
    return {
        "access_key": config['default']['aws_access_key_id'],
        "secret_key": config['default']['aws_secret_access_key'],
        "token": config['default']['aws_session_token'],
        "expiry_time": config['default']['aws_expiration']
    }

session_credentials = RefreshableCredentials.create_from_metadata(
    metadata=refresh_external_credentials(),
    refresh_using=refresh_external_credentials,
    method='custom-refreshing-key-file-reader'
)

def make_key_refresh_boto3(this_s3contents_instance):
    refresh_session = get_session()  # from botocore.session
    refresh_session._credentials = session_credentials
    my_s3_session = boto3.Session(botocore_session=refresh_session)
    this_s3contents_instance.boto3_session = my_s3_session

# Tell Jupyter to use S3ContentsManager for all storage.
c.NotebookApp.contents_manager_class = S3ContentsManager

c.S3ContentsManager.init_s3_hook = make_key_refresh_boto3
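
The hook above expects the rotated keys in an INI-style credentials file. A hypothetical /home/jovyan/.aws/credentials matching the keys read by refresh_external_credentials could look like this (values are placeholders; botocore parses expiry_time, typically as an ISO 8601 timestamp):

[default]
aws_access_key_id = {{ rotated access key }}
aws_secret_access_key = {{ rotated secret key }}
aws_session_token = {{ rotated session token }}
aws_expiration = {{ expiration, e.g. 2021-01-01T00:00:00Z }}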

GCP Cloud Storage

from s3contents import GCSContentsManager

c = get_config()

c.NotebookApp.contents_manager_class = GCSContentsManager
c.GCSContentsManager.project = "{{ your-project }}"
c.GCSContentsManager.token = "~/.config/gcloud/application_default_credentials.json"
c.GCSContentsManager.bucket = "{{ GCP bucket name }}"

Note that the path ~/.config/gcloud/application_default_credentials.json assumes you ran gcloud init on a POSIX system.
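
If that file does not exist yet, it is normally generated with the Google Cloud SDK (assuming gcloud is installed and you are logged in to the right project):

$ gcloud auth application-default login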

Access local files

To access local files as well as remote files in S3 you can use hybridcontents.

First install it:

pip install hybridcontents

Use a configuration similar to this:

from s3contents import S3ContentsManager
from hybridcontents import HybridContentsManager
from notebook.services.contents.largefilemanager import LargeFileManager

c = get_config()

c.NotebookApp.contents_manager_class = HybridContentsManager

c.HybridContentsManager.manager_classes = {
    # Associate the root directory with an S3ContentsManager.
    # This manager will receive all requests that don't fall under any of the
    # other managers.
    "": S3ContentsManager,
    # Associate /local_directory with a LargeFileManager.
    "local_directory": LargeFileManager,
}

c.HybridContentsManager.manager_kwargs = {
    # Args for root S3ContentsManager.
    "": {
        "access_key_id": "{{ AWS Access Key ID / IAM Access Key ID }}",
        "secret_access_key": "{{ AWS Secret Access Key / IAM Secret Access Key }}",
        "bucket": "{{ S3 bucket name }}",
    },
    # Args for the LargeFileManager mapped to /local_directory
    "local_directory": {
        "root_dir": "/Users/danielfrg/Downloads",
    },
}
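
With this setup, anything saved under local_directory/ is kept on the local filesystem via LargeFileManager, while every other path is routed to the S3 bucket.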

File Save Hooks

If you want to use pre/post file save hooks here are some examples.

A pre_save_hook is written exactly the same way as for the regular filesystem-backed ContentsManager, operating on the file in local storage before committing it to the object store.

def scrub_output_pre_save(model, **kwargs):
    """scrub output before saving notebooks"""
    # only run on notebooks
    if model["type"] != "notebook":
        return
    # only run on nbformat v4
    if model["content"]["nbformat"] != 4:
        return

    for cell in model["content"]["cells"]:
        if cell["cell_type"] != "code":
            continue
        cell["outputs"] = []
        cell["execution_count"] = None

c.S3ContentsManager.pre_save_hook = scrub_output_pre_save

A post_save_hook instead operates on the file in object storage; because of this, it is useful to use the contents_manager's file methods for data manipulation. In addition, one must use the following function signature (unique to s3contents):

import os

import nbformat

def make_html_post_save(model, s3_path, contents_manager, **kwargs):
    """
    Convert notebooks to HTML after saving, via nbconvert.
    """
    from nbconvert import HTMLExporter

    if model["type"] != "notebook":
        return

    content, _format = contents_manager.fs.read(s3_path, format="text")
    my_notebook = nbformat.reads(content, as_version=4)

    html_exporter = HTMLExporter()
    html_exporter.template_name = "classic"

    (body, resources) = html_exporter.from_notebook_node(my_notebook)

    base, ext = os.path.splitext(s3_path)
    contents_manager.fs.write(path=(base + ".html"), content=body, format=_format)

c.S3ContentsManager.post_save_hook = make_html_post_save

Notes

There are some existing implementations of this idea (s3nb and s3drive), but I wasn't able to make them work with newer versions of Jupyter Notebook. This project aims to be a better-tested alternative, and it is based on PGContents.
