
yahoo / Caffeonspark

Licence: apache-2.0
Distributed deep learning on Hadoop and Spark clusters.

Projects that are alternatives to or similar to Caffeonspark

Tensorflow Tutorial
Stars: ✭ 85 (-93.32%)
Mutual labels:  jupyter-notebook
Sagemaker Ml Workflow With Apache Airflow
This repository shows an example of how to build, manage and orchestrate Machine Learning workflows using Amazon Sagemaker and Apache Airflow.
Stars: ✭ 86 (-93.24%)
Mutual labels:  jupyter-notebook
Pyrenko
Stars: ✭ 86 (-93.24%)
Mutual labels:  jupyter-notebook
Sphinx Book Theme
A lightweight book theme built off of the pydata sphinx theme
Stars: ✭ 86 (-93.24%)
Mutual labels:  jupyter-notebook
Zh Nlp Demo
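Some applications of natural language processing (NLP) to Chinese text, such as text classification, sentiment analysis, and named entity recognition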
Stars: ✭ 86 (-93.24%)
Mutual labels:  jupyter-notebook
Ml Cv
Machine learning in practice
Stars: ✭ 85 (-93.32%)
Mutual labels:  jupyter-notebook
Ganspace
Discovering Interpretable GAN Controls [NeurIPS 2020]
Stars: ✭ 1,224 (-3.77%)
Mutual labels:  jupyter-notebook
Atom notebook
Atom notebook
Stars: ✭ 85 (-93.32%)
Mutual labels:  jupyter-notebook
Network science meets deep learning
Designing deep neural network architectures using topologies from the world of complex networks/network science
Stars: ✭ 86 (-93.24%)
Mutual labels:  jupyter-notebook
Python For Data Scientists
Deliverable: This Jupyter notebook will help aspiring data scientists learn and practice the Python code needed for many data science projects.
Stars: ✭ 86 (-93.24%)
Mutual labels:  jupyter-notebook
Introdatasci
Course materials for: Introduction to Data Science and Programming
Stars: ✭ 86 (-93.24%)
Mutual labels:  jupyter-notebook
Quantecon Notebooks Julia
Stars: ✭ 85 (-93.32%)
Mutual labels:  jupyter-notebook
Kaggle Competitions
There are plenty of courses and tutorials that can help you learn machine learning from scratch, but here on GitHub I want to solve some Kaggle competitions as a comprehensive workflow with Python packages. After reading, you can use this workflow as a template to solve other real problems.
Stars: ✭ 86 (-93.24%)
Mutual labels:  jupyter-notebook
Training Material
A collection of code examples as well as presentations for training purposes
Stars: ✭ 85 (-93.32%)
Mutual labels:  jupyter-notebook
Quantum programming tutorial
Gamified tutorial for the QISKit quantum SDK
Stars: ✭ 86 (-93.24%)
Mutual labels:  jupyter-notebook
100 Plus Python Programming Exercises Extended
100+ Python programming exercise problems discussed, explained and solved in different ways
Stars: ✭ 1,250 (-1.73%)
Mutual labels:  jupyter-notebook
Wotan
Automagically remove trends from time-series data
Stars: ✭ 86 (-93.24%)
Mutual labels:  jupyter-notebook
Deep Learning Boot Camp
A community run, 5-day PyTorch Deep Learning Bootcamp
Stars: ✭ 1,270 (-0.16%)
Mutual labels:  jupyter-notebook
Viz torch optim
Videos of deep learning optimizers moving on 3D problem-landscapes
Stars: ✭ 86 (-93.24%)
Mutual labels:  jupyter-notebook
Book Mlearn Gyomu
Book sample (AI Machine-learning Deep-learning)
Stars: ✭ 84 (-93.4%)
Mutual labels:  jupyter-notebook

Note: we're lovingly marking this project as Archived since we're no longer supporting it. You are welcome to read the code and fork your own version of it and continue to use this code under the terms of the project license.

CaffeOnSpark

What's CaffeOnSpark?

CaffeOnSpark brings deep learning to Hadoop and Spark clusters. By combining salient features from the deep learning framework Caffe and the big-data frameworks Apache Spark and Apache Hadoop, CaffeOnSpark enables distributed deep learning on a cluster of GPU and CPU servers.

As a distributed extension of Caffe, CaffeOnSpark supports neural network model training, testing, and feature extraction. Caffe users can perform distributed learning with their existing LMDB data files and only slightly adjusted network configurations.

CaffeOnSpark is a Spark package for deep learning. It complements Spark's non-deep-learning libraries, MLlib and Spark SQL. CaffeOnSpark's Scala API gives Spark applications an easy way to invoke deep learning over distributed datasets.

CaffeOnSpark was developed by Yahoo for large-scale distributed deep learning on our Hadoop clusters in Yahoo's private cloud. It's been in use by Yahoo for image search, content classification and several other use cases.

Why CaffeOnSpark?

CaffeOnSpark provides some important benefits (see our blog) over alternative deep learning solutions.

  • It enables model training, testing, and feature extraction directly on Hadoop datasets stored in HDFS on Hadoop clusters.
  • It turns your Hadoop or Spark cluster(s) into a powerful platform for deep learning, without the need to set up a separate dedicated cluster for deep learning.
  • Direct server-to-server communication (Ethernet or InfiniBand) speeds up learning and eliminates a scalability bottleneck.
  • Caffe users' existing datasets (e.g., LMDB) and configurations can be used for distributed learning without any conversion.
  • A high-level API lets Spark applications easily conduct deep learning.
  • Incremental learning is supported, so previously trained models or snapshots can be reused.
  • Additional data formats and network interfaces can easily be added.
  • It can easily be deployed on a public cloud (e.g., AWS EC2) or a private cloud.
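Direct server-to-server communication lets workers exchange model updates with one another. The toy sketch below is a framework-independent illustration of that general idea, not CaffeOnSpark's implementation: it simulates synchronous data-parallel SGD in which each worker computes a gradient on its own data shard, and an allreduce-style average gives every worker the same combined gradient. The function names and the one-parameter linear model are assumptions chosen for brevity.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair


def local_gradient(weight: float, shard: List[Sample]) -> float:
    # Gradient of mean squared error for the 1-D model y = weight * x,
    # computed only on this worker's shard.
    return sum(2.0 * (weight * x - y) * x for x, y in shard) / len(shard)


def allreduce_mean(values: List[float]) -> List[float]:
    # Simulated allreduce: every worker ends up holding the mean of all inputs.
    mean = sum(values) / len(values)
    return [mean] * len(values)


def train(shards: List[List[Sample]], lr: float = 0.01, steps: int = 200) -> List[float]:
    # Each worker keeps its own replica of the model, starting from the same value.
    weights = [0.0] * len(shards)
    for _ in range(steps):
        grads = [local_gradient(w, s) for w, s in zip(weights, shards)]
        averaged = allreduce_mean(grads)
        # Identical averaged gradients keep all replicas in sync.
        weights = [w - lr * g for w, g in zip(weights, averaged)]
    return weights


shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
print(train(shards))
```

With equal-sized shards, averaging the per-shard gradients equals the full-batch gradient over all the data, so both simulated workers converge to the same weight (close to 3.0 for data drawn from y = 3x).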

Using CaffeOnSpark

Please see the CaffeOnSpark wiki for detailed documentation, including build instructions, the API reference, and getting-started guides for standalone and AWS EC2 clusters.

  • Batch sizes specified in prototxt files are per device.
  • Memory layers must not be shared among GPUs, so "share_in_parallel: false" is required in their layer configuration.
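Both notes can be checked before submitting a job. The helpers below are hypothetical (the names are ours, not part of CaffeOnSpark) and assume synchronous training with one solver per device. effective_batch_size multiplies the per-device prototxt batch size out to the cluster-wide number of samples per iteration. memory_layers_missing_unshare does a rough text scan of a prototxt for layers of type "MemoryData" (the Caffe layer type assumed here) that lack share_in_parallel: false. It relies on simple brace matching and ignores comments and braces inside quoted strings, so treat it as a lint rather than a parser.

```python
import re
from typing import List


def effective_batch_size(per_device: int, devices_per_executor: int, executors: int) -> int:
    # Assumes synchronous data-parallel training with one solver per device,
    # each consuming the per-device batch size given in the prototxt.
    return per_device * devices_per_executor * executors


def top_level_layers(prototxt: str) -> List[str]:
    # Return the bodies of top-level `layer { ... }` blocks via brace matching.
    # Rough heuristic: comments and braces inside quoted strings are not handled.
    bodies: List[str] = []
    depth = 0
    start = None
    for m in re.finditer(r"[{}]", prototxt):
        if m.group() == "{":
            # A block opened at depth 0 right after the word `layer` is a layer.
            if depth == 0 and re.search(r"\blayer\s*$", prototxt[: m.start()]):
                start = m.end()
            depth += 1
        else:
            depth -= 1
            if depth == 0 and start is not None:
                bodies.append(prototxt[start : m.start()])
                start = None
    return bodies


def memory_layers_missing_unshare(prototxt: str) -> List[str]:
    # Names of MemoryData layers that do not set share_in_parallel: false.
    offenders: List[str] = []
    for body in top_level_layers(prototxt):
        is_memory = re.search(r'type\s*:\s*"MemoryData"', body)
        unshared = re.search(r"share_in_parallel\s*:\s*false", body)
        if is_memory and not unshared:
            name = re.search(r'name\s*:\s*"([^"]*)"', body)
            offenders.append(name.group(1) if name else "<unnamed>")
    return offenders
```

For example, a prototxt batch size of 64 on 4 GPUs per executor across 8 executors gives effective_batch_size(64, 4, 8) == 2048 samples per iteration, which may call for retuning the learning rate compared with a single-GPU run.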

Building for Spark 2.x

CaffeOnSpark supports both Spark 1.x and 2.x. For Spark 2.0, our default settings are:

  • spark-2.0.0
  • hadoop-2.7.1
  • scala-2.11.7

You may want to adjust them in caffe-grid/pom.xml.
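When adjusting these versions, keep in mind that Spark artifacts on Maven Central are published per Scala binary version, with the suffix in the artifact ID (for example, spark-core_2.11). The hypothetical helper below (not part of CaffeOnSpark's build) derives that Maven coordinate from the default versions, as a quick way to double-check that the Scala and Spark versions you choose agree.

```python
def scala_binary_version(scala_version: str) -> str:
    # Scala 2.x artifacts are suffixed with major.minor (e.g. 2.11.7 -> 2.11);
    # Scala 3 artifacts use the single suffix "3".
    parts = scala_version.split(".")
    if parts[0] == "3":
        return "3"
    return ".".join(parts[:2])


def spark_core_coordinate(spark_version: str, scala_version: str) -> str:
    # groupId:artifactId:version of spark-core built against this Scala version.
    suffix = scala_binary_version(scala_version)
    return f"org.apache.spark:spark-core_{suffix}:{spark_version}"


print(spark_core_coordinate("2.0.0", "2.11.7"))  # org.apache.spark:spark-core_2.11:2.0.0
```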

Mailing List

Please join the CaffeOnSpark user group for discussions and questions.

License

The use and distribution terms for this software are covered by the Apache 2.0 license. See the LICENSE file for terms.
