Intel-bigdata / Tensorflowonyarn

Licence: apache-2.0
Support TensorFlow on YARN

TensorFlowOnYARN

TensorFlow on YARN (TOY) is a toolkit that gives Hadoop users an easy way to run TensorFlow applications in a distributed fashion and to handle related tasks such as model management and inference serving.

  • This project focuses on running TensorFlow on YARN, as part of the Deep Learning on Hadoop (HDL) effort.
  • YARN-6043

Goals

  • Support all TensorFlow components on YARN: TensorFlow distributed cluster, TensorFlow Serving, TensorBoard, etc.
  • Support multi-tenancy, taking into account different types of users such as DevOps engineers, data scientists and data engineers
  • Support running TensorFlow applications as short-lived or long-running jobs, in both between-graph mode and in-graph mode
  • Support model management for deployment, plus a service layer that makes it easy to handle inference requests from upper layers such as Spark or a web backend
  • Require minor or no changes to run a user's existing TensorFlow application (which can be written in any officially supported language, including Python, C++, Java and Go)

Note that the current project is a prototype with limitations and is still under development.

Architecture

Figure 1. TOY Architecture

Features

  • [x] Launch a TensorFlow cluster with a specified number of workers and PS servers
  • [x] Replace the Python layer with a Java bridge layer to start the server
  • [x] Generate ClusterSpec dynamically
  • [x] RPC support for client to get ClusterSpec from AM
  • [x] Signal handling for graceful shutdown
  • [x] Package TensorFlow runtime as a resource that can be distributed easily
  • [x] Run in-graph TensorFlow application in client mode
  • [x] TensorBoard support
  • [ ] Better handling of network port conflicts
  • [ ] Fault tolerance
  • [ ] Cluster mode based on Docker
  • [ ] Real-time logging support
  • [ ] Code refinement and more tests
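
To make the "Generate ClusterSpec dynamically" item more concrete, here is a minimal Python sketch of the shape of that output. It is only an illustration and not the project's code (per the feature list, the server is started through a Java bridge layer). The function name `build_cluster_spec` and the `allocated` list are hypothetical; the host and port values are copied from the sample ClusterSpec in the Quick Start.

```python
import json
from collections import defaultdict


def build_cluster_spec(tasks):
    """Group (job_name, host, port) tuples into a ClusterSpec-style dict.

    Hypothetical helper: the real toolkit builds its ClusterSpec inside
    the YARN application, not with this function.
    """
    spec = defaultdict(list)
    for job_name, host, port in tasks:
        # Task index within a job is implied by list order.
        spec[job_name].append(f"{host}:{port}")
    return dict(spec)


# Assumed allocation result: one entry per launched task.
allocated = [
    ("ps", "node1", 22257),
    ("ps", "node2", 22222),
    ("worker", "node3", 22253),
    ("worker", "node2", 22255),
]

cluster_spec = build_cluster_spec(allocated)
# Compact separators mirror the format of the sample console output.
print("ClusterSpec:", json.dumps(cluster_spec, separators=(",", ":")))
```

Because ports are only known once each task has been placed on a node, the spec has to be assembled after allocation rather than written by hand, which is presumably why the toolkit generates it at launch time.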

Quick Start

  1. Prepare the build environment following the instructions from https://www.tensorflow.org/install/install_sources

  2. Clone the TensorFlowOnYARN repository.

    git clone --recursive https://github.com/Intel-bigdata/TensorFlowOnYARN
    
  3. Build the assembly.

    cd TensorFlowOnYARN/tensorflow-parent
    mvn package -Pnative -Pdist
    

    The build produces tensorflow-yarn-${VERSION}.tar.gz and tensorflow-yarn-${VERSION}.zip in the tensorflow-parent/tensorflow-yarn-dist/target directory. Copy the assembly to a client node of the YARN cluster and extract it.

  4. Run the between-graph MNIST example.

    cd tensorflow-yarn-${VERSION}
    bin/ydl-tf launch --num_worker 2 --num_ps 2
    

    This launches a YARN application that creates a tf.train.Server instance for each task. The ClusterSpec is printed to the console so that you can submit a training script to the cluster, for example:

    ClusterSpec: {"ps":["node1:22257","node2:22222"],"worker":["node3:22253","node2:22255"]}
    
    python examples/between-graph/mnist_feed.py \
      --ps_hosts="ps0.hostname:ps0.port,ps1.hostname:ps1.port" \
      --worker_hosts="worker0.hostname:worker0.port,worker1.hostname:worker1.port" \
      --task_index=0
    
    python examples/between-graph/mnist_feed.py \
      --ps_hosts="ps0.hostname:ps0.port,ps1.hostname:ps1.port" \
      --worker_hosts="worker0.hostname:worker0.port,worker1.hostname:worker1.port" \
      --task_index=1
    
  5. Get the ClusterSpec of an existing TensorFlow cluster launched by a previous YARN application.

    bin/ydl-tf cluster --app_id <Application ID>
    
  6. You may also use YARN commands through ydl-tf.

    For example, to list the running applications:

    bin/ydl-tf application --list
    

    Or, to kill an existing YARN application (that is, a TensorFlow cluster):

    bin/ydl-tf kill --application <Application ID>
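
In step 4, the host:port pairs from the printed ClusterSpec have to be copied into the `--ps_hosts` and `--worker_hosts` flags by hand. The following Python sketch shows one way that translation could be scripted. It assumes the console line has exactly the `ClusterSpec: {...}` form shown above; the helper names `parse_cluster_spec` and `worker_commands` are hypothetical and not part of the toolkit.

```python
import json
import shlex


def parse_cluster_spec(console_line):
    """Extract the JSON ClusterSpec from a 'ClusterSpec: {...}' line."""
    prefix = "ClusterSpec:"
    if not console_line.startswith(prefix):
        raise ValueError("not a ClusterSpec line")
    return json.loads(console_line[len(prefix):])


def worker_commands(spec, script="examples/between-graph/mnist_feed.py"):
    """Build one training command per worker task.

    Only the flags used in the Quick Start example are emitted.
    """
    ps_hosts = ",".join(spec["ps"])
    worker_hosts = ",".join(spec["worker"])
    commands = []
    for task_index in range(len(spec["worker"])):
        args = [
            "python",
            script,
            f"--ps_hosts={ps_hosts}",
            f"--worker_hosts={worker_hosts}",
            f"--task_index={task_index}",
        ]
        # Quote each argument so the line is safe to paste into a shell.
        commands.append(" ".join(shlex.quote(arg) for arg in args))
    return commands


line = ('ClusterSpec: {"ps":["node1:22257","node2:22222"],'
        '"worker":["node3:22253","node2:22255"]}')
for command in worker_commands(parse_cluster_spec(line)):
    print(command)
```

The sketch prints one command per entry in the `worker` list, each with its own `--task_index`, matching the two invocations shown in step 4.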
    