
pachyderm / Pachyderm

Licence: other
Reproducible Data Science at Scale!

Programming Languages

go
shell
Mustache
Makefile
Jupyter Notebook
python

Projects that are alternatives of or similar to Pachyderm

Data Science Live Book
An open source book to learn data science, data analysis and machine learning, suitable for all ages!
Stars: ✭ 193 (-96.36%)
Mutual labels:  data-science, analytics, data-analysis, big-data
Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (-13.65%)
Mutual labels:  data-science, analytics, big-data, distributed-systems
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-98.51%)
Mutual labels:  data-science, data-analysis, big-data
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (-74.78%)
Mutual labels:  data-science, data-analysis, big-data
Courses
Quiz & Assignment of Coursera
Stars: ✭ 454 (-91.44%)
Mutual labels:  data-science, data-analysis, big-data
Model Describer
model-describer : Making machine learning interpretable to humans
Stars: ✭ 22 (-99.59%)
Mutual labels:  data-science, analytics, data-analysis
Dataflowjavasdk
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Stars: ✭ 854 (-83.9%)
Mutual labels:  data-science, data-analysis, big-data
Superset
Apache Superset is a Data Visualization and Data Exploration Platform
Stars: ✭ 42,634 (+703.66%)
Mutual labels:  data-science, analytics, data-analysis
Tennis Crystal Ball
Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (-97.98%)
Mutual labels:  data-science, data-analysis, big-data
Pythondata
repo for code published on pythondata.com
Stars: ✭ 113 (-97.87%)
Mutual labels:  data-science, data-analysis, big-data
Datasciencevm
Tools and Docs on the Azure Data Science Virtual Machine (http://aka.ms/dsvm)
Stars: ✭ 153 (-97.12%)
Mutual labels:  data-science, data-analysis, big-data
Sciblog support
Support content for my blog
Stars: ✭ 694 (-86.92%)
Mutual labels:  data-science, analytics, big-data
Data Science Career
Career Resources for Data Science, Machine Learning, Big Data and Business Analytics Career Repository
Stars: ✭ 630 (-88.12%)
Mutual labels:  data-science, analytics, big-data
My Journey In The Data Science World
📢 Ready to learn or review your knowledge!
Stars: ✭ 1,175 (-77.85%)
Mutual labels:  data-science, data-analysis, big-data
Countly Sdk Cordova
Countly Product Analytics SDK for Cordova, Icenium and Phonegap
Stars: ✭ 69 (-98.7%)
Mutual labels:  analytics, data-analysis, big-data
Spark R Notebooks
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (-97.95%)
Mutual labels:  data-science, data-analysis, big-data
nebula
A distributed block-based data storage and compute engine
Stars: ✭ 127 (-97.61%)
Mutual labels:  distributed-systems, big-data, data-analysis
awesome-AI-kubernetes
❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc
Stars: ✭ 95 (-98.21%)
Mutual labels:  big-data, analytics, pachyderm
Agile data code 2
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (-92.21%)
Mutual labels:  data-science, analytics
The Elements Of Statistical Learning Python Notebooks
A series of Python Jupyter notebooks that help you better understand "The Elements of Statistical Learning" book
Stars: ✭ 405 (-92.37%)
Mutual labels:  data-science, data-analysis


Pachyderm: The Data Foundation for Machine Learning

Pachyderm provides the data layer that allows machine learning teams to productionize and scale their machine learning lifecycle. With Pachyderm’s industry-leading data versioning, pipelines, and lineage, teams gain data-driven automation, petabyte scalability, and end-to-end reproducibility. Teams using Pachyderm get their ML projects to market faster, lower their data processing and storage costs, and can more easily meet regulatory compliance requirements.

Features

  • Automated Data Versioning: Pachyderm’s Data Versioning gives teams an automated and performant way to keep track of all data changes.
  • Data-Driven Pipelines: Pachyderm’s Containerized Pipelines speed data processing while lowering compute costs.
  • Immutable Data Lineage: Pachyderm’s data lineage provides an immutable record for all activities and assets in the ML lifecycle.
  • Console: The Pachyderm Console provides an intuitive visualization of your DAG (directed acyclic graph), and aids in reproducibility.
  • Notebooks: Pachyderm Notebooks provide an easy way to interact with Pachyderm data versioning and pipelines via Jupyter notebooks.

Getting Started

To start deploying your end-to-end version-controlled data pipelines, try Pachyderm for free on Hub with little to no setup, or run Pachyderm locally. You can also deploy on AWS, GCE, or Azure in about 5 minutes.

You can also refer to our complete documentation to see tutorials, check out example projects, and learn about advanced features of Pachyderm.

If you'd like to see some examples and learn about core use cases for Pachyderm:

Documentation

Official Documentation

Community

Keep up to date and get Pachyderm support via:

  • Twitter: Follow us on Twitter.
  • Slack: Join our community Slack channel to get help from the Pachyderm team and other users.

Contributing

To get started, sign the Contributor License Agreement.

You should also check out our contributing guide.

Send us PRs; we would love to see what you do! You can also check our GitHub issues for anything labeled "help-wanted" as a good place to start. We're sometimes bad about keeping that label up to date, so if you don't see any, just let us know.

Join Us

WE'RE HIRING! Love Docker, Go, and distributed systems? Learn more about our open positions.

Usage Metrics

Pachyderm automatically reports anonymized usage metrics. These metrics help us understand how people use Pachyderm and make it better. You can disable them by setting the environment variable METRICS to false in the pachd container.
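As an illustration of the opt-out described above, here is a minimal Go sketch of how a service could honor such a flag. It is not pachd's actual code: the helper name `metricsEnabled` is hypothetical, and it assumes that an unset or unparseable METRICS value keeps the default (reporting on), while any value `strconv.ParseBool` reads as false (for example `false`, `FALSE`, or `0`) turns reporting off.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// metricsEnabled is a hypothetical helper that decides whether anonymized
// usage metrics should be reported. It follows the README's description
// (metrics are on by default, off when METRICS is false) but is NOT the
// real pachd implementation. getenv is injected so the logic is testable.
func metricsEnabled(getenv func(string) string) bool {
	v := getenv("METRICS")
	if v == "" {
		// Unset: assume the default of reporting metrics.
		return true
	}
	enabled, err := strconv.ParseBool(v)
	if err != nil {
		// Unrecognized value: assume the default rather than failing.
		return true
	}
	return enabled
}

func main() {
	fmt.Println("metrics enabled:", metricsEnabled(os.Getenv))
}
```

Injecting `getenv` instead of calling `os.Getenv` directly keeps the decision logic separate from the process environment, so it can be checked without modifying real environment variables.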

License Information

Pachyderm has moved some components of Pachyderm Platform to a source-available limited license.

We remain committed to the culture of open source, developing our product transparently and collaboratively with our community, and giving our community and customers source code access and the ability to study and change the software to suit their needs.

Under the Pachyderm Community License, you can access the source code and modify or redistribute it; there is only one thing you cannot do, and that is use it to make a competing offering.

Check out our License FAQ Page for more information.
