dask / knit

Licence: other
Deprecated, please use https://github.com/jcrist/skein or https://github.com/dask/dask-yarn instead

Programming Languages

python
scala
shell

Projects that are alternatives of or similar to knit

Akkeeper
An easy way to deploy your Akka services to a distributed environment.
Stars: ✭ 30 (-43.4%)
Mutual labels:  yarn, hadoop
Bigdata Notes
A beginner's guide to big data ⭐
Stars: ✭ 10,991 (+20637.74%)
Mutual labels:  yarn, hadoop
Jumbune
Jumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http://jumbune.com. More details of open source offering are at,
Stars: ✭ 64 (+20.75%)
Mutual labels:  yarn, hadoop
Bigdata Interview
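🎯 🌟 [Big data interview questions] Big data interview questions collected from around the web, together with the author's own summarized answers. Currently covers interview knowledge summaries for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.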
Stars: ✭ 857 (+1516.98%)
Mutual labels:  yarn, hadoop
beanszoo
Distributed Java micro-services using ZooKeeper
Stars: ✭ 12 (-77.36%)
Mutual labels:  yarn, hadoop
Tf Yarn
Train TensorFlow models on YARN in just a few lines of code!
Stars: ✭ 76 (+43.4%)
Mutual labels:  yarn, hadoop
Hadoop Yarn Api Python Client
Python client for Hadoop® YARN API
Stars: ✭ 91 (+71.7%)
Mutual labels:  yarn, hadoop
Tensorflowonyarn
Support TensorFlow on YARN
Stars: ✭ 114 (+115.09%)
Mutual labels:  yarn, hadoop
docker-hadoop
Docker image for main Apache Hadoop components (Yarn/Hdfs)
Stars: ✭ 59 (+11.32%)
Mutual labels:  yarn, hadoop
Xlearning
AI on Hadoop
Stars: ✭ 1,709 (+3124.53%)
Mutual labels:  yarn, hadoop
wasp
WASP is a framework to build complex real time big data applications. It relies on a kind of Kappa/Lambda architecture mainly leveraging Kafka and Spark. If you need to ingest huge amount of heterogeneous data and analyze them through complex pipelines, this is the framework for you.
Stars: ✭ 19 (-64.15%)
Mutual labels:  yarn, hadoop
yarn-prometheus-exporter
Export Hadoop YARN (resource-manager) metrics in prometheus format
Stars: ✭ 44 (-16.98%)
Mutual labels:  yarn, hadoop
fastdata-cluster
Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (-62.26%)
Mutual labels:  yarn, hadoop
monopack
A JavaScript bundler for node.js monorepo-codebased applications.
Stars: ✭ 52 (-1.89%)
Mutual labels:  yarn
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (+109.43%)
Mutual labels:  hadoop
framequery
SQL on dataframes - pandas and dask
Stars: ✭ 63 (+18.87%)
Mutual labels:  dask
typester
✒️ A WYSIWYG that gives you predictable and clean HTML
Stars: ✭ 29 (-45.28%)
Mutual labels:  yarn
ibis
IBIS is a workflow creation-engine that abstracts the Hadoop internals of ingesting RDBMS data.
Stars: ✭ 48 (-9.43%)
Mutual labels:  hadoop
monoreact
📦 React workspaces implementation
Stars: ✭ 13 (-75.47%)
Mutual labels:  yarn
Addax
Addax is an open source universal ETL tool that supports most of those RDBMS and NoSQLs on the planet, helping you transfer data from any one place to another.
Stars: ✭ 615 (+1060.38%)
Mutual labels:  hadoop

knit

Note: This library has been superseded by Skein (https://jcrist.github.io/skein/) and is no longer maintained.

For deploying Dask on YARN, please see dask-yarn (http://dask-yarn.readthedocs.io/), which has been rewritten to use Skein instead of Knit.

For user issues, please refer to either of those repositories.

The knit library provides a Python interface to Scala for interacting with the YARN resource manager.

View the documentation for knit.

Overview

knit allows you to use Python in conjunction with YARN, the most common resource manager for Hadoop systems. It provides the following high-level entry points:

  • CondaCreator, a way to create zipped conda environments, so that they can be uploaded to HDFS and extracted for use in YARN containers
  • YARNAPI, an interface to the YARN resource manager to get application/container statuses, logs, and to kill running jobs
  • Knit, a YARN application runner, which generates an instance of a Scala-based YARN client and launches an application on YARN, which in turn runs commands in YARN containers
  • DaskYARNCluster, which launches a Dask distributed cluster on YARN, with one worker process per container
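
To make the idea of a "zipped environment" from the first entry point concrete, here is a minimal sketch that uses only the Python standard library. It is not how CondaCreator is implemented; the helper name zip_directory and the directory layout in the demo are assumptions made for illustration.

```python
import os
import shutil
import tempfile
import zipfile


def zip_directory(src_dir, dest_base):
    """Zip `src_dir` into `dest_base`.zip and return the archive path.

    Hypothetical helper, not part of knit.
    """
    src_dir = os.path.abspath(src_dir)
    root = os.path.dirname(src_dir)
    name = os.path.basename(src_dir)
    # root_dir/base_dir keep the top-level folder name inside the archive,
    # so extracting it recreates a single `<name>/` directory.
    return shutil.make_archive(dest_base, "zip", root_dir=root, base_dir=name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for an environment directory (placeholder content only).
        env_dir = os.path.join(tmp, "src", "myenv")
        os.makedirs(os.path.join(env_dir, "bin"))
        with open(os.path.join(env_dir, "bin", "python"), "w") as f:
            f.write("placeholder\n")
        archive = zip_directory(env_dir, os.path.join(tmp, "myenv"))
        with zipfile.ZipFile(archive) as zf:
            print(zf.namelist())
```

An archive of this shape is the kind of single file that can be uploaded to HDFS and unpacked inside a container.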

The intent is to use knit from a cluster edge-node, i.e., with YARN configuration and the CLI available locally.
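
Whether a machine looks like a usable edge node can be checked roughly before trying knit. The sketch below is an assumption about what "configuration and CLI available locally" means in practice: it looks for the conventional YARN_CONF_DIR or HADOOP_CONF_DIR environment variables and for a yarn executable on the PATH. The function name edge_node_report is hypothetical and not part of knit.

```python
import os
import shutil


def edge_node_report(env=None, which=shutil.which):
    """Return a dict describing whether YARN config and CLI look available.

    `env` and `which` are injectable so the check can be exercised
    without a real Hadoop installation. Hypothetical helper, not part of knit.
    """
    env = os.environ if env is None else env
    # Hadoop tools conventionally read their config directory from one of these.
    conf_dir = env.get("YARN_CONF_DIR") or env.get("HADOOP_CONF_DIR")
    yarn_cli = which("yarn")
    return {
        "conf_dir": conf_dir,
        "yarn_cli": yarn_cli,
        "ready": bool(conf_dir) and yarn_cli is not None,
    }


if __name__ == "__main__":
    print(edge_node_report())
```

A report with "ready" set to False only means these two hints are missing; a cluster may still be reachable through configuration found elsewhere.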

Quickstart

Install from conda-forge

> conda install -c conda-forge knit

or with pip

> pip install knit

If installing from source, you must first build the Java library (requires Java and Maven)

> python setup.py install mvn

To run an arbitrary command on the YARN cluster

import knit

k = knit.Knit()
k.start('env')  # submit the command `env`, then wait some time for it to run
k.logs()        # retrieve the logs of the application
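
Instead of waiting a fixed amount of time, the log call can be polled. The helper below is a generic sketch and not part of knit: the name wait_for and its parameters are hypothetical, and it assumes that the polled function returns something falsy (for example an empty dict) until results are available.

```python
import time


def wait_for(fetch, timeout=60.0, interval=2.0,
             sleep=time.sleep, clock=time.monotonic):
    """Call `fetch()` until it returns a truthy value or `timeout` seconds pass.

    Returns the first truthy result, or None on timeout.
    `sleep` and `clock` are injectable to make the loop easy to test.
    """
    deadline = clock() + timeout
    while True:
        result = fetch()
        if result:
            return result
        if clock() >= deadline:
            return None
        sleep(interval)
```

With the snippet above, a call such as wait_for(k.logs, timeout=120) would be one way to use it, under the stated assumption about the return value.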

To start a Dask cluster on YARN

import dask_yarn
cluster = dask_yarn.DaskYARNCluster()
cluster.start(nworkers=4, memory=1024, cpus=2)