
quiltdata / Quilt

Licence: apache-2.0
Quilt is a self-organizing data hub for S3

Programming Languages

python

Projects that are alternatives to or similar to Quilt

Introduction Datascience Python Book
Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications
Stars: ✭ 275 (-72.69%)
Mutual labels:  jupyter-notebook, data
Agile data code 2
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (-58.99%)
Mutual labels:  jupyter-notebook, data
Python
This repository helps you understand Python from scratch.
Stars: ✭ 285 (-71.7%)
Mutual labels:  jupyter-notebook, data
Data
Data and code behind the articles and graphics at FiveThirtyEight
Stars: ✭ 15,241 (+1413.51%)
Mutual labels:  jupyter-notebook, data
Pyjanitor
Clean APIs for data cleaning. Python implementation of R package Janitor
Stars: ✭ 647 (-35.75%)
Mutual labels:  data-engineering, data
qsv
CSVs sliced, diced & analyzed.
Stars: ✭ 438 (-56.5%)
Mutual labels:  data-engineering, parquet
Cartola
Data extraction from the CartolaFC API, exploratory data analysis, and predictive models in R and Python, 2014-20. Data munging, analysis and modeling of CartolaFC, the most popular fantasy football game in Brazil and maybe in the world. Data cover years 2014-19.
Stars: ✭ 304 (-69.81%)
Mutual labels:  jupyter-notebook, data
Data science blogs
A repository to keep track of all the code that I end up writing for my blog posts.
Stars: ✭ 139 (-86.2%)
Mutual labels:  jupyter-notebook, data
Sklearn Classification
Data Science Notebook on a Classification Task, using sklearn and Tensorflow.
Stars: ✭ 518 (-48.56%)
Mutual labels:  jupyter-notebook, data
Data Engineering Book
Accumulated knowledge and experience in the field of Data Engineering
Stars: ✭ 471 (-53.23%)
Mutual labels:  data-engineering, data
California Coronavirus Data
The Los Angeles Times' independent tally of coronavirus cases in California.
Stars: ✭ 188 (-81.33%)
Mutual labels:  jupyter-notebook, data
Skdata
Python tools for data analysis
Stars: ✭ 16 (-98.41%)
Mutual labels:  jupyter-notebook, data
Data Science Resources
👨🏽‍🏫You can learn about what data science is and why it's important in today's modern world. Are you interested in data science?🔋
Stars: ✭ 171 (-83.02%)
Mutual labels:  jupyter-notebook, data
Data Science Hacks
Data Science Hacks consists of tips and tricks to help you become a better data scientist. The hacks are for everyone, from beginner to advanced, and cover Python, Jupyter Notebook, pandas, and more.
Stars: ✭ 273 (-72.89%)
Mutual labels:  jupyter-notebook, data
Data Engineering Nanodegree
Projects done in the Data Engineering Nanodegree by Udacity.com
Stars: ✭ 151 (-85%)
Mutual labels:  jupyter-notebook, data-engineering
Datascience course
Data Science course in Portuguese
Stars: ✭ 294 (-70.8%)
Mutual labels:  jupyter-notebook, data
Hass Data Detective
Explore and analyse your Home Assistant data
Stars: ✭ 109 (-89.18%)
Mutual labels:  jupyter-notebook, data
Datasets
🎁 3,000,000+ Unsplash images made available for research and machine learning
Stars: ✭ 1,805 (+79.25%)
Mutual labels:  jupyter-notebook, data
Udacity Data Engineering Projects
A few projects related to data engineering, including data modeling, cloud infrastructure setup, data warehousing, and data lake development.
Stars: ✭ 458 (-54.52%)
Mutual labels:  data-engineering, data
Awesome Ai Ml Dl
Awesome Artificial Intelligence, Machine Learning and Deep Learning as we learn it. Study notes and a curated list of awesome resources of such topics.
Stars: ✭ 831 (-17.48%)
Mutual labels:  jupyter-notebook, data


Quilt is a self-organizing data hub

Python quick start and tutorials

If you have Python and an S3 bucket, you're ready to create versioned datasets with Quilt. Visit the Quilt docs for installation instructions, a quick start, and more.
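To make "versioned datasets" concrete, here is a stdlib-only sketch of the underlying idea: a version is identified by hashing a manifest that maps each file's logical key to a hash of its contents, so any change to any file yields a new version ID. It does not call the Quilt client, and the helper names (`hash_file`, `build_manifest`, `top_hash`) are hypothetical; this is not Quilt's actual manifest format or hashing scheme.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path


def hash_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's logical key (POSIX path relative to root) to its content hash."""
    return {
        p.relative_to(root).as_posix(): hash_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def top_hash(manifest: dict[str, str]) -> str:
    """Hash a canonical (sorted-key, compact) serialization into one version ID."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the manifest is serialized with sorted keys, the same files always produce the same ID regardless of the order in which the directory is walked.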


Who is Quilt for?

Quilt is for data-driven teams and offers features for coders (data scientists, data engineers, developers) and business users alike.

What does Quilt do?

Quilt manages data like code so that teams in machine learning, biotech, and analytics can experiment faster, build smarter models, and recover from errors.

How does Quilt work?

Quilt consists of open-source components (a Python client, a web catalog, and Lambda functions) plus a suite of backend services and Docker containers orchestrated by AWS CloudFormation.

The backend services are available under a paid license on quiltdata.com.

Use cases

  • Share data at scale. Quilt wraps AWS S3 to add simple URLs, web preview for large files, and sharing via email address (no need to create an IAM role).
  • Understand data better through inline documentation (Jupyter notebooks, markdown) and visualizations (Vega, Vega Lite)
  • Discover related data by indexing objects in ElasticSearch
  • Model data by providing a home for large data and models that don't fit in git, and by providing immutable versions for objects and data sets (a.k.a. "Quilt Packages")
  • Decide by broadening data access within the organization and by supporting documentation of decision processes through auditable versioning and inline documentation
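The "Model" and "Decide" use cases rest on immutable, auditable versions. Below is a minimal in-memory sketch of that idea, not Quilt's registry or API: the `Registry` class, its `push`/`get`/`log` methods, and the package name `team/sales` are hypothetical. Each push appends a new revision instead of overwriting the previous one, so an earlier version can always be fetched again (rolling back after a bad write), and the log doubles as an audit trail.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Revision:
    """One immutable package version: its ID, entries, and commit message."""
    top_hash: str
    entries: tuple[tuple[str, str], ...]  # sorted (logical_key, content_hash) pairs
    message: str


@dataclass
class Registry:
    """In-memory stand-in for a package registry (hypothetical, not Quilt's API)."""
    history: dict[str, list[Revision]] = field(default_factory=dict)

    def push(self, name: str, entries: dict[str, str], message: str) -> str:
        """Record a new revision and return its ID; earlier revisions are untouched."""
        frozen = tuple(sorted(entries.items()))
        digest = hashlib.sha256(json.dumps(frozen).encode("utf-8")).hexdigest()
        self.history.setdefault(name, []).append(Revision(digest, frozen, message))
        return digest

    def get(self, name: str, top_hash: str | None = None) -> dict[str, str]:
        """Return the latest revision, or the one whose ID is top_hash (to roll back)."""
        revisions = self.history[name]
        if top_hash is None:
            return dict(revisions[-1].entries)
        for rev in revisions:
            if rev.top_hash == top_hash:
                return dict(rev.entries)
        raise KeyError(f"{name}@{top_hash} not found")

    def log(self, name: str) -> list[tuple[str, str]]:
        """Audit trail: (top_hash, message) for every push, oldest first."""
        return [(rev.top_hash, rev.message) for rev in self.history[name]]
```

A real registry would persist revisions in object storage rather than a Python dict, but the append-only shape is what makes recovery possible.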

Roadmap

I - Performance and core services

  • [x] Address performance issues with push (e.g. re-hash)
  • [x] Provide PrestoDB-powered services for filtering package repos with SQL
  • [ ] Investigate and implement more efficient manifest formats (e.g. Parquet), that scale to 10M keys; consider abbreviated "fast manifests" for lazy browsing
  • [ ] Refactor s3://bucket/.quilt for improved listing and delete performance
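To illustrate the "lazy browsing" idea in the roadmap above, here is a sketch that lists one directory level of a package by streaming the manifest line by line instead of loading it all into memory. It assumes a line-delimited JSON manifest whose first line is a header and whose other lines are objects with a `logical_key` field; treat that layout, and the `list_children` helper, as assumptions for illustration. `prefix` should be empty or end in "/".

```python
from __future__ import annotations

import json
from typing import Iterator, TextIO


def list_children(manifest: TextIO, prefix: str = "") -> Iterator[str]:
    """Yield the immediate children of `prefix`, streaming the manifest.

    Directories are yielded once with a trailing "/"; files are yielded by name.
    Memory grows with the number of distinct children, not with total entries.
    """
    seen: set[str] = set()
    for line in manifest:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        key = entry.get("logical_key")
        if key is None or not key.startswith(prefix):
            continue  # header line, or an entry outside the prefix being browsed
        head, sep, _ = key[len(prefix):].partition("/")
        child = head + sep  # "dir/" for nested keys, "file.ext" for leaves
        if child not in seen:
            seen.add(child)
            yield child
```

This still reads every line, so it only bounds memory; a format such as Parquet, as the roadmap suggests, could additionally skip irrelevant rows.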

II - CI/CD for data

  • [ ] Ability to fork/merge packages
  • [ ] Data quality monitoring

III - Storage agnostic (support Azure, GCP buckets)

  • [ ] Evaluate min.io and ceph.io as shims
  • [ ] Evaluate feasibility of on-prem local storage as a repo
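For the storage-agnostic items above, one common pattern is to put every backend behind a small interface so the rest of the system never calls a vendor SDK directly. This sketch assumes a hypothetical three-method `ObjectStore` protocol and implements it with a local directory, in the spirit of the on-prem evaluation item; an S3, Azure, or GCS backend would implement the same methods. None of these names come from Quilt.

```python
from __future__ import annotations

import hashlib
from pathlib import Path
from typing import Protocol


class ObjectStore(Protocol):
    """Minimal interface a storage shim would need to expose (hypothetical)."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...
    def list_keys(self, prefix: str = "") -> list[str]: ...


class LocalDirStore:
    """On-prem style backend: each object is a plain file under a root directory."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)

    def _path(self, key: str) -> Path:
        """Resolve a key to a file path, rejecting keys that escape the root."""
        path = (self.root / key).resolve()
        if self.root.resolve() not in path.parents:
            raise ValueError(f"key escapes store root: {key!r}")
        return path

    def put(self, key: str, data: bytes) -> None:
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return self._path(key).read_bytes()

    def list_keys(self, prefix: str = "") -> list[str]:
        keys = (p.relative_to(self.root).as_posix()
                for p in self.root.rglob("*") if p.is_file())
        return sorted(k for k in keys if k.startswith(prefix))


def put_content_addressed(store: ObjectStore, data: bytes) -> str:
    """Store bytes under their SHA-256; works with any ObjectStore implementation."""
    key = "objects/" + hashlib.sha256(data).hexdigest()
    store.put(key, data)
    return key
```

Callers depend only on the protocol, so swapping the local backend for a cloud one should not require changes elsewhere.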

IV - Cloud agnostic

  • [ ] Evaluate K8s and Terraform to replace CloudFormation
  • [ ] Shim lambdas (consider serverless.com)
  • [ ] Shim ElasticSearch (consider SOLR)
  • [ ] Shim IAM via RBAC