
statham-stone / Multirunner

This is a Python package for multi-process parallel execution.


Projects that are alternatives to or similar to Multirunner

Mobilenet Ssd Realsense
[High Performance / MAX 30 FPS] RaspberryPi3(RaspberryPi/Raspbian Stretch) or Ubuntu + Multi Neural Compute Stick(NCS/NCS2) + RealSense D435(or USB Camera or PiCamera) + MobileNet-SSD(MobileNetSSD) + Background Multi-transparent(Simple multi-class segmentation) + FaceDetection + MultiGraph + MultiProcessing + MultiClustering
Stars: ✭ 322 (+33.06%)
Mutual labels:  deeplearning, multiprocessing
Tensorflow Internals
An open-source ebook about the TensorFlow kernel and its implementation mechanism.
Stars: ✭ 2,683 (+1008.68%)
Mutual labels:  deeplearning
Text Classification
Text Classification through CNN, RNN & HAN using Keras
Stars: ✭ 216 (-10.74%)
Mutual labels:  deeplearning
Bert Attributeextraction
Using BERT for attribute extraction in knowledge graphs, via fine-tuning and feature extraction (BERT-based methods for extracting person-entry attributes from Baidu Baike into a knowledge graph).
Stars: ✭ 224 (-7.44%)
Mutual labels:  deeplearning
Cartoonize
A demo webapp to convert images and videos into cartoons!
Stars: ✭ 215 (-11.16%)
Mutual labels:  deeplearning
Deep Learning In Production
Develop production-ready deep learning code, deploy it, and scale it
Stars: ✭ 216 (-10.74%)
Mutual labels:  deeplearning
Trixi
Manage your machine learning experiments with trixi - modular, reproducible, high fashion. An experiment infrastructure optimized for PyTorch, but flexible enough to work for your framework and your tastes.
Stars: ✭ 211 (-12.81%)
Mutual labels:  deeplearning
Udemy derinogrenmeyegiris
Exercises from the Udemy Introduction to Deep Learning course, and more
Stars: ✭ 239 (-1.24%)
Mutual labels:  deeplearning
Bmw Yolov4 Inference Api Gpu
This is a repository for a no-code object detection inference API using the Yolov3 and Yolov4 Darknet framework.
Stars: ✭ 237 (-2.07%)
Mutual labels:  deeplearning
Deepfashion
Apparel detection using deep learning
Stars: ✭ 223 (-7.85%)
Mutual labels:  deeplearning
My Awesome Ai Bookmarks
Curated list of my reads, implementations, and core concepts of Artificial Intelligence, Deep Learning, and Machine Learning by the best folks in the world.
Stars: ✭ 223 (-7.85%)
Mutual labels:  deeplearning
Develop Source
Open source for developers (a curated collection of development resources: Java, Android, algorithms, iOS, macOS, and more).
Stars: ✭ 219 (-9.5%)
Mutual labels:  deeplearning
Retinaface
A remake of https://github.com/biubug6/Pytorch_Retinaface
Stars: ✭ 226 (-6.61%)
Mutual labels:  deeplearning
Paddlehelix
Bio-computing platform featuring large-scale representation learning and multi-task deep learning (the "Propeller" bio-computing toolkit).
Stars: ✭ 213 (-11.98%)
Mutual labels:  deeplearning
Learningdl
Companion code for "Learn Deep Learning from Scratch in Three Months" (TensorFlow edition)
Stars: ✭ 238 (-1.65%)
Mutual labels:  deeplearning
Joblib
Computing with Python functions.
Stars: ✭ 2,620 (+982.64%)
Mutual labels:  multiprocessing
Pytorch cifar10
Pretrained TorchVision models on CIFAR10 dataset (with weights)
Stars: ✭ 219 (-9.5%)
Mutual labels:  deeplearning
Deeplearning cv notes
📓 Deep learning and CV notes.
Stars: ✭ 223 (-7.85%)
Mutual labels:  deeplearning
Gordon cnn
A small convolutional neural network deep learning framework implemented in C++.
Stars: ✭ 241 (-0.41%)
Mutual labels:  deeplearning
Hierarchical Attention Networks Pytorch
Hierarchical Attention Networks for document classification
Stars: ✭ 239 (-1.24%)
Mutual labels:  deeplearning

MultiRunner Documentation

This is a process-level Python parallel framework, useful for tasks such as deep learning hyperparameter tuning. Install it with pip install MultiRunner.

Note that this package is extremely simple to use: your original code needs no changes at all, and adopting the package takes only four added lines of code, including the import statement.

If you face either of the following situations, you may need this package:

  • You are a machine-learning hyperparameter tuner. You have multiple GPUs in one machine, or several machines that share a disk (cluster nodes, AWS, HPC, etc.). You need to run one function many times with different parameters, and the function returns a result for each parameter set (you probably want to know which parameters give the best result; yes, I am talking about deep-learning alchemy). You want these machines/GPUs to run the experiments for you in parallel, but you don't want to type the commands one by one. Since the machines/GPUs differ in compute power and tasks take different amounts of time, you don't know how to assign different workloads to different nodes, and you don't want to sit at the computer waiting for runs to finish or entering commands manually.

  • You have an ordinary computer. You need to run a function many times with different parameters and collect the results, and you want to parallelize as much as possible to speed up your experiments, but you don't want to learn the multiprocessing library.

Let's start with an example. Your original code might look like this:

# old_run.py

def train_a_model(batch_size, hidden_layer_number, learning_rate):
    # your training code here
    return accuracy

for batch_size in [16, 64, 256]:
    for hidden_layer_number in [1, 2]:
        for learning_rate in [0.001, 0.01, 0.1]:
            print(batch_size, hidden_layer_number, learning_rate)
            my_result = train_a_model(batch_size, hidden_layer_number, learning_rate)
            print(my_result)

Now it can look like this:

# new_run.py

def train_a_model(batch_size, hidden_layer_number, learning_rate):
    # your training code here
    return accuracy

from MultiRunner import MultiRunner

a = MultiRunner()
a.generate_ini([[16, 64, 256], [1, 2], [0.001, 0.01, 0.1]])  # note: wrap all the parameter lists in a single outer list or tuple
a.run(train_a_model)

Note: the modified code can run simultaneously on multiple machines that share a disk (or on different GPUs of the same machine). Multiple processes run different experiments in parallel; after a process finishes an experiment, it saves that experiment's result to a file and immediately starts the next experiment that has not been run yet. The code already handles multi-process conflicts thoroughly (all fully encapsulated), as sketched below.
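
For intuition, here is a minimal sketch of the rename-based task claiming described above, using the file names from this README; the function is hypothetical and is not MultiRunner's actual internal code.

import os

def try_claim_task(ini_dir, index):
    # A rename is atomic on POSIX, so if two processes race for the
    # same task, only one rename succeeds and the other gets an error.
    src = os.path.join(ini_dir, "%d_to_run" % index)
    dst = os.path.join(ini_dir, "%d_running" % index)
    try:
        os.rename(src, dst)
        return True   # this process now owns the task
    except OSError:
        return False  # another process claimed it first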

All you need to do is run python new_run.py in the same (shared) directory on each node.

After the code starts, the first process to run it creates an ./ini directory and stores the parameters there as files, which are used for inter-process synchronization. With the code above, the directory will contain 3*2*3 = 18 files, stored in pickle format and named 0_to_run, 1_to_run, ..., 17_to_run.
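
For intuition, the 18 files correspond to the Cartesian product of the three parameter lists; the index-to-combination ordering shown below is an assumption for illustration, not a documented guarantee of MultiRunner's numbering.

import itertools

param_lists = [[16, 64, 256], [1, 2], [0.001, 0.01, 0.1]]
for index, params in enumerate(itertools.product(*param_lists)):
    print(index, params)  # 18 combinations: 0 (16, 1, 0.001), ..., 17 (256, 2, 0.1)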

While an experiment is running, the file for its parameters has its suffix changed to XX_running. Take 5_to_run as an example: while its experiment runs, 5_to_run is renamed to 5_running; once the experiment finishes successfully, the result is stored as a pickle file at ./results/5 and 5_running is renamed once more, to 5_finished.
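
When all experiments are done, you can collect the results. This is a minimal sketch, assuming each result is an ordinary pickle file named after its parameter index under ./results (per the description above) and that each stored result is a comparable score such as accuracy.

import os
import pickle

results = {}
for name in os.listdir("./results"):
    with open(os.path.join("./results", name), "rb") as f:
        results[int(name)] = pickle.load(f)

best_index = max(results, key=results.get)  # assumes each result is a comparable score
print(best_index, results[best_index])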

The code also handles errors during experiments. If the experiment for some parameter set fails, the corresponding XX_running file automatically rolls back to XX_to_run, and the process then picks another parameter set to run. When the total number of errors in one process reaches a limit, that process exits; the default limit is 5, and it can be set via the max_error_times argument when the object is created.
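
For example, to let each process tolerate more failures before exiting (max_error_times is the argument named above; the value 10 is only an illustration):

from MultiRunner import MultiRunner

a = MultiRunner(max_error_times=10)  # a process exits only after 10 failed experiments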

Advanced notes

  • The function find_best_gpu returns the id of a currently idle GPU (see the sketch after this list).
  • To maximize throughput, if a process finishes an experiment and then, while picking the next parameter set, finds that every parameter set has either finished or is currently running, it randomly picks one of the currently running parameter sets and runs it too. This design avoids the following situation: machines/GPUs A and B are fast, machine/GPU C is slow; C picks a parameter set and grinds through it slowly, while A and B sit idle with no way to share C's load, so C drags down the whole experiment. With this design, A and B each run C's experiment after finishing their own and obtain its result before C completes, which speeds up the experiment overall. The trade-off is that some processes may still be running after all results have already been obtained.
  • The description above assumed only one process per machine/GPU. In fact, suppose we have two machines sharing a disk, each with 3 GPUs, and 4 processes running on each GPU: these 2*3*4 = 24 processes can run simultaneously without any change to the code.
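
A minimal sketch of plugging find_best_gpu into your training function; the import path, the zero-argument signature, and the integer return value are assumptions based on the bullet above.

import os
from MultiRunner import find_best_gpu  # assumed import path

def train_a_model(batch_size, hidden_layer_number, learning_rate):
    # Pin this process to a currently idle GPU before building the model.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(find_best_gpu())
    # your training code here
    return accuracy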

If you have any questions, feel free to open an issue or contact me by email.

Stars, forks, and feature requests are all welcome.
