
IBM / Ffdl

Licence: apache-2.0
Fabric for Deep Learning (FfDL, pronounced "fiddle") is a deep learning platform offering TensorFlow, Caffe, PyTorch, etc. as a service on Kubernetes.



Read this in other languages: 中文.


Fabric for Deep Learning (FfDL)

This repository contains the core services of the FfDL (Fabric for Deep Learning) platform. FfDL is an operating system "fabric" for Deep Learning. It is a collaboration platform for:

  • Framework-independent training of Deep Learning models on distributed hardware
  • Open Deep Learning APIs
  • Hosting Deep Learning in the user's private or public cloud

[Figure: ffdl-architecture]

To learn more about the architecture, please read the design document. If you are looking for demos, slides, collateral, blog posts, webinars, and other materials related to FfDL, you can find them here.

Prerequisites

Usage Scenarios

  • If you are getting started and want to set up your own FfDL deployment, please follow the steps below.
  • If you already have an FfDL deployment up and running, you can jump to the FfDL User Guide to train your deep learning models.
  • If you want to use Jupyter notebooks to launch training on your FfDL cluster, please follow these instructions.
  • If you have FfDL configured to use GPUs and want to train using GPUs, follow the steps here.
  • To invoke the Adversarial Robustness Toolbox to find vulnerabilities in your models, follow the instructions here.
  • To deploy your trained models, follow the integration guide with Seldon.
  • If you have used FfDL to train your models and want to use a GPU-enabled service hosted in a public cloud for further training and serving, please follow the instructions here to train and serve your models using the Watson Studio Deep Learning service.

Steps

  1. Quick Start
  2. Test
  3. Monitoring
  4. Development
  5. Clean Up
  6. Troubleshooting
  7. References

1. Quick Start

There are multiple ways to install FfDL into an existing Kubernetes cluster. Below are the steps for a quick install. For more detailed step-by-step instructions, please visit the detailed installation guide.

If you are using the bash shell, you can modify the necessary environment variables in env.txt and export all of them using the following commands:

source env.txt
export $(cut -d= -f1 env.txt)
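
The two commands above work as a pair: `source` assigns the values in the current shell, and `export $(cut ...)` then marks each variable name for export to child processes such as `make`. The sketch below replays the same pattern on a throwaway file so it can be tried without touching env.txt. The file contents here are example values only, and the sketch assumes every non-blank line has the form KEY=value (a comment line would be passed to `export` as-is and rejected).

```shell
# Write a throwaway file in KEY=value format (example values, not the real env.txt).
demo_env="$(mktemp)"
cat > "$demo_env" <<'EOF'
VM_TYPE=none
NAMESPACE=default
EOF

# Step 1: sourcing the file assigns the values as plain shell variables.
. "$demo_env"

# Step 2: cut keeps the text before "=", and export marks those names
# so that child processes inherit them.
export $(cut -d= -f1 "$demo_env")

# A child shell now sees the values.
sh -c 'echo "VM_TYPE=$VM_TYPE NAMESPACE=$NAMESPACE"'

rm -f "$demo_env"
```

Without the second step, the variables would exist only in the current shell and the make targets would not see them.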

1.1 Installation using Kubeadm-DIND

If you have Kubeadm-DIND installed on your machine, use these commands to deploy the FfDL platform:

export VM_TYPE=dind
export PUBLIC_IP=localhost
export SHARED_VOLUME_STORAGE_CLASS="";
export NAMESPACE=default # If your namespace does not exist yet, please create the namespace `kubectl create namespace $NAMESPACE` before running the make commands below

make deploy-plugin
make quickstart-deploy

1.2 Installation using Kubernetes Cluster

To install FfDL to any proper Kubernetes cluster, make sure kubectl points to the right namespace, then deploy the platform services:

Note: For PUBLIC_IP, use one of your cluster's public IPs that can reach the cluster's NodePorts. For IBM Cloud, you can get your public IP with bx cs workers <cluster_name>.

export VM_TYPE=none
export PUBLIC_IP=<Cluster Public IP>
export NAMESPACE=default # If your namespace does not exist yet, please create the namespace `kubectl create namespace $NAMESPACE` before running the make commands below

# Change the storage class to what's available on your Cloud Kubernetes Cluster.
export SHARED_VOLUME_STORAGE_CLASS="ibmc-file-gold";

make deploy-plugin
make quickstart-deploy
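
Since the make targets read their settings from the environment, it can help to confirm the variables are set before deploying. The following is a minimal sketch of such a check; `missing_ffdl_vars` is a hypothetical helper, not part of FfDL. SHARED_VOLUME_STORAGE_CLASS is left out of the check on purpose because the Kubeadm-DIND path sets it to an empty string.

```shell
# Hypothetical helper: print the names of required variables that are
# unset or empty, separated by spaces. Prints an empty line if all are set.
missing_ffdl_vars() {
    missing=""
    for name in VM_TYPE PUBLIC_IP NAMESPACE; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    # Drop the leading space before printing.
    echo "${missing# }"
}

# Report the result for the current shell.
if [ -n "$(missing_ffdl_vars)" ]; then
    echo "Set these before running make: $(missing_ffdl_vars)"
else
    echo "All required variables are set."
fi
```

If the check reports nothing missing, continue with make deploy-plugin and make quickstart-deploy as shown above.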

2. Test

To submit a simple example training job that is included in this repo (see the etc/examples folder), run:

make test-push-data-s3
make test-job-submit

3. Monitoring

The platform ships with a simple Grafana monitoring dashboard. The URL is printed out when running the deploy make target.

4. Development

Please refer to the developer guide for more details.

5. Clean Up

If you want to remove FfDL from your cluster, run the following command:

helm delete $(helm list | grep ffdl | awk '{print $1}' | head -n 1)

If you want to remove the storage driver and PVC from your cluster, run:

kubectl delete pvc static-volume-1
helm delete $(helm list | grep ibmcloud-object-storage-plugin | awk '{print $1}' | head -n 1)
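
Both helm delete commands above pick the release name by filtering the output of helm list: grep keeps lines containing the pattern, awk prints the first column (the release name), and head keeps only the first match. The sketch below runs that pipeline on a made-up listing so the selection can be inspected before anything is deleted. The release and chart names in the sample are invented for illustration, and `first_release_matching` is a hypothetical helper.

```shell
# Hypothetical helper: read a helm-list-style table on stdin and print
# the first column of the first line that contains the given pattern.
first_release_matching() {
    grep "$1" | awk '{print $1}' | head -n 1
}

# Made-up listing; real helm list output has more columns.
sample_listing='NAME          REVISION  STATUS    CHART
lazy-otter    1         DEPLOYED  ffdl-0.1.0
calm-finch    1         DEPLOYED  ibmcloud-object-storage-plugin-0.1'

release="$(printf '%s\n' "$sample_listing" | first_release_matching ffdl)"

# Guard against an empty result so helm delete is never called without a name.
if [ -n "$release" ]; then
    echo "would run: helm delete $release"
else
    echo "no matching release; nothing to delete"
fi
```

On a real cluster, piping helm list into the same helper shows which release the one-liner would remove. Note that grep matches the pattern anywhere on the line, so an unrelated release whose chart or namespace contains the pattern would also be selected.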

For Kubeadm-DIND, you also need to stop your forwarded ports. Note that the command below kills every kubectl process that holds an open network connection, including port forwards that are unrelated to FfDL.

kill $(lsof -i | grep kubectl | awk '{printf $2 " " }')
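
Because the command above matches kubectl anywhere on an lsof line, it can be worth listing the candidate PIDs first. The sketch below is a stricter dry-run variant, not the project's own method: it matches only the COMMAND column and removes duplicate PIDs. The lsof lines are fabricated sample data, and `kubectl_pids` is a hypothetical helper.

```shell
# Hypothetical helper: read lsof -i style output on stdin and print the
# unique PIDs of rows whose COMMAND column is exactly "kubectl".
kubectl_pids() {
    awk '$1 == "kubectl" { print $2 }' | sort -u
}

# Fabricated sample. The ssh row mentions kubectl only in its NAME column,
# so a plain grep would select it but the column match does not.
sample='COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
kubectl  4242 dev     8u  IPv4 0x01      0t0  TCP localhost:8080 (LISTEN)
kubectl  4242 dev     9u  IPv6 0x02      0t0  TCP localhost:8080 (LISTEN)
kubectl  4250 dev     8u  IPv4 0x03      0t0  TCP localhost:9090 (LISTEN)
ssh      5000 dev     3u  IPv4 0x04      0t0  TCP host:50022->kubectl.example:22 (ESTABLISHED)'

# Print the PIDs that would be killed, one per line.
printf '%s\n' "$sample" | kubectl_pids
```

On a real machine, lsof -i piped into the same helper gives the list to review before passing it to kill.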

6. Troubleshooting

  • FfDL has only been tested on macOS and Linux.
  • If glide install fails with an error complaining about non-existing paths (e.g., "Without src, cannot continue"), make sure to follow the standard Go directory layout (see the Prerequisites section).

  • To remove FfDL from your cluster, run make undeploy.

  • When using the FfDL CLI to train a model, make sure your directory path does not end with a trailing slash (/).

  • If your job is stuck in the pending stage, you can try to redeploy the plugin with helm install storage-plugin --set dind=true,cloud=false for Kubeadm-DIND, or helm install storage-plugin for a general Kubernetes cluster. Also, double-check your training job manifest file to make sure you have the correct object storage credentials.
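
For the trailing-slash issue noted above, the path can be normalized before it is handed to the CLI. This is a minimal sketch using plain shell parameter expansion; `strip_trailing_slashes` is a hypothetical helper and my-model is a placeholder directory name.

```shell
# Hypothetical helper: remove every trailing "/" from a path,
# but leave a bare "/" untouched.
strip_trailing_slashes() {
    path="$1"
    while [ "${path%/}" != "$path" ] && [ "$path" != "/" ]; do
        path="${path%/}"
    done
    printf '%s\n' "$path"
}

# Placeholder directory name; shell tab completion often adds the slash.
model_dir="$(strip_trailing_slashes "my-model/")"
echo "$model_dir"
```

The cleaned value in model_dir can then be used wherever the CLI expects the model directory.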

7. References

Based on IBM Research work in Deep Learning.
