
datashim-io / Datashim

License: Apache-2.0
A Kubernetes-based framework for hassle-free handling of datasets

Programming Languages

go
31211 projects - #10 most used programming language

Projects that are alternatives of or similar to Datashim

terraform-aws-efs-backup
Terraform module designed to easily back up EFS filesystems to S3 using DataPipeline
Stars: ✭ 40 (-51.81%)
Mutual labels:  s3, nfs
Leofs
The LeoFS Storage System
Stars: ✭ 1,439 (+1633.73%)
Mutual labels:  s3, nfs
Edgefs
EdgeFS - decentralized, scalable data fabric platform for Edge/IoT Computing and Kubernetes apps
Stars: ✭ 358 (+331.33%)
Mutual labels:  s3, nfs
awesome-storage
A curated list of storage open source tools. Backups, redundancy, sharing, distribution, encryption, etc.
Stars: ✭ 324 (+290.36%)
Mutual labels:  s3, nfs
Infinit
The Infinit policy-based software-defined storage platform.
Stars: ✭ 363 (+337.35%)
Mutual labels:  s3, nfs
Terraform AWS S3 Log Storage
This module creates an S3 bucket suitable for receiving logs from other AWS services such as S3, CloudFront, and CloudTrail
Stars: ✭ 65 (-21.69%)
Mutual labels:  s3
Bucketlist
Amazon S3 bucket spelunking!
Stars: ✭ 72 (-13.25%)
Mutual labels:  s3
Cyberduck
Cyberduck is a libre FTP, SFTP, WebDAV, Amazon S3, Backblaze B2, Microsoft Azure & OneDrive and OpenStack Swift file transfer client for Mac and Windows.
Stars: ✭ 1,080 (+1201.2%)
Mutual labels:  s3
S3reverse
Converts the formats of various S3 buckets into one format, for bug bounty and security testing.
Stars: ✭ 61 (-26.51%)
Mutual labels:  s3
Google Sheet S3
Google Apps Script that publishes a Google Sheet to Amazon S3 as a JSON file. Auto-updates on edit & maintains data types. Creates an array of objects keyed by column header.
Stars: ✭ 81 (-2.41%)
Mutual labels:  s3
Tiledb Py
Python interface to the TileDB storage manager
Stars: ✭ 78 (-6.02%)
Mutual labels:  s3
Objstore
A Multi-Master Distributed Caching Layer for Amazon S3.
Stars: ✭ 69 (-16.87%)
Mutual labels:  s3
React Deploy S3
Deploy create-react-app builds to AWS S3
Stars: ✭ 66 (-20.48%)
Mutual labels:  s3
Locopy
locopy: Loading/Unloading to Redshift and Snowflake using Python.
Stars: ✭ 73 (-12.05%)
Mutual labels:  s3
Cloud Volume
Read and write Neuroglancer datasets programmatically.
Stars: ✭ 63 (-24.1%)
Mutual labels:  s3
Akubra
Simple solution to keep independent S3 storages in sync
Stars: ✭ 79 (-4.82%)
Mutual labels:  s3
Undocumented S3 Apis
Undocumented Amazon S3 APIs and third-party extensions
Stars: ✭ 63 (-24.1%)
Mutual labels:  s3
Cloud Security Audit
A command line security audit tool for Amazon Web Services
Stars: ✭ 68 (-18.07%)
Mutual labels:  s3
Security Camera
🔦 Motion detecting security camera using a raspberry pi, webcam, and slack
Stars: ✭ 76 (-8.43%)
Mutual labels:  s3
Antenna
Painless iOS over-the-air enterprise distribution
Stars: ✭ 67 (-19.28%)
Mutual labels:  s3

Datashim


Our framework introduces the Dataset CRD, which is a pointer to existing S3 and NFS data sources. It includes the necessary logic to map these Datasets into Persistent Volume Claims and ConfigMaps that users can reference in their pods, letting them focus on workload development rather than on configuring, mounting, and tuning data access. Thanks to the Container Storage Interface, it is extensible to support additional data sources in the future.

DLF

A Kubernetes framework that provides easy access to S3 and NFS Datasets within pods. It orchestrates the provisioning of the Persistent Volume Claims and ConfigMaps needed for each Dataset. Find more details in our FAQ.

Quickstart

To quickly deploy DLF, execute one of the following commands, depending on your environment:

  • Kubernetes/Minikube
kubectl apply -f https://raw.githubusercontent.com/IBM/dataset-lifecycle-framework/master/release-tools/manifests/dlf.yaml
  • Kubernetes on IBM Cloud
kubectl apply -f https://raw.githubusercontent.com/IBM/dataset-lifecycle-framework/master/release-tools/manifests/dlf-ibm-k8s.yaml
  • OpenShift
kubectl apply -f https://raw.githubusercontent.com/IBM/dataset-lifecycle-framework/master/release-tools/manifests/dlf-oc.yaml
  • OpenShift on IBM Cloud
kubectl apply -f https://raw.githubusercontent.com/IBM/dataset-lifecycle-framework/master/release-tools/manifests/dlf-ibm-oc.yaml

Wait for all the pods to be ready :)

kubectl wait --for=condition=ready pods -l app.kubernetes.io/name=dlf -n dlf

As an optional step, label the namespace (or namespaces) in which you want the pod-labelling functionality (see below).

kubectl label namespace default monitor-pods-datasets=enabled
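To confirm that the label was applied, you can query it with a standard kubectl command:

kubectl get namespace default -L monitor-pods-datasets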

In case you don't have an existing S3 bucket, follow our wiki to deploy an Object Store and populate it with data.

We will now create a Dataset named example-dataset that points to your S3 bucket.

cat <<EOF | kubectl apply -f -
apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  name: example-dataset
spec:
  local:
    type: "COS"
    accessKeyID: "{AWS_ACCESS_KEY_ID}"
    secretAccessKey: "{AWS_SECRET_ACCESS_KEY}"
    endpoint: "{S3_SERVICE_URL}"
    bucket: "{BUCKET_NAME}"
    readonly: "true" #OPTIONAL, default is false  
    region: "" #OPTIONAL
EOF
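To check that the Dataset was admitted and that its companion objects were provisioned, you can list them with standard kubectl commands (a sketch, assuming the CRD's plural name is datasets; the PVC and ConfigMap take the Dataset's name):

kubectl get datasets
kubectl get pvc example-dataset
kubectl get configmap example-dataset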

If everything worked okay, you should see a PVC and a ConfigMap named example-dataset, which you can mount in your pods. As an easier way to use the Dataset in your pod, you can instead label the pod as follows:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    dataset.0.id: "example-dataset"
    dataset.0.useas: "mount"
spec:
  containers:
    - name: nginx
      image: nginx

By convention, the Dataset will be mounted at /mnt/datasets/example-dataset. If you wish instead to pass the connection details as environment variables, change the useas line to dataset.0.useas: "configmap".
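If you prefer to wire the volume yourself rather than rely on the labels, you can reference the generated PVC directly. Below is a minimal sketch using standard Kubernetes volume definitions; the volume name is illustrative:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
    - name: nginx
      image: nginx
      volumeMounts:
        - name: example-dataset          # illustrative volume name
          mountPath: "/mnt/datasets/example-dataset"
  volumes:
    - name: example-dataset
      persistentVolumeClaim:
        claimName: example-dataset       # the PVC created for the Dataset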

Feel free to explore our examples.
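For instance, a Dataset can also point to an NFS share. The following is a sketch along the lines of the S3 example above, assuming the NFS variant of the same CRD; the server and share values are placeholders:

cat <<EOF | kubectl apply -f -
apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  name: example-nfs-dataset
spec:
  local:
    type: "NFS"
    server: "nfs-server.example.com" #placeholder
    share: "/export/data" #placeholder
EOF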

FAQ

Have a look at our wiki for frequently asked questions.

Roadmap

Have a look at our wiki for the roadmap.

Contact

Reach out to us via email:

Acknowledgements

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825061.

H2020 EVOLVE logo