lightforever / Mlcomp

Licence: apache-2.0
Distributed DAG (Directed acyclic graph) framework for machine learning with UI

Projects that are alternatives of or similar to Mlcomp

Construct
JavaScript Digital Organisms simulator
Stars: ✭ 17 (-90.71%)
Mutual labels:  artificial-intelligence, research, distributed-computing
Catalyst
Accelerated deep learning R&D
Stars: ✭ 2,804 (+1432.24%)
Mutual labels:  research, infrastructure, distributed-computing
Pygame Learning Environment
PyGame Learning Environment (PLE) -- Reinforcement Learning Environment in Python.
Stars: ✭ 828 (+352.46%)
Mutual labels:  artificial-intelligence, research
Autodl
Automated Deep Learning without ANY human intervention. 1'st Solution for AutoDL [email protected]
Stars: ✭ 854 (+366.67%)
Mutual labels:  artificial-intelligence, automl
Ai Residency List
List of AI Residency & Research programs, Ph.D Fellowships, Research Internships
Stars: ✭ 69 (-62.3%)
Mutual labels:  artificial-intelligence, research
Pba
Efficient Learning of Augmentation Policy Schedules
Stars: ✭ 461 (+151.91%)
Mutual labels:  artificial-intelligence, automl
Atm
Auto Tune Models - A multi-tenant, multi-data system for automated machine learning (model selection and tuning).
Stars: ✭ 504 (+175.41%)
Mutual labels:  automl, distributed-computing
Otto
Otto makes machine learning an intuitive, natural language experience. 🏆 Facebook AI Hackathon winner ⭐️ #1 Trending on MadeWithML.com ⭐️ #4 Trending JavaScript Project on GitHub ⭐️ #15 Trending (All Languages) on GitHub
Stars: ✭ 894 (+388.52%)
Mutual labels:  artificial-intelligence, automl
Research And Coding
Research resource list: a curated list of research resources
Stars: ✭ 100 (-45.36%)
Mutual labels:  artificial-intelligence, research
Auto ml
[UNMAINTAINED] Automated machine learning for analytics & production
Stars: ✭ 1,559 (+751.91%)
Mutual labels:  artificial-intelligence, automl
Top 10 Computer Vision Papers 2020
A list of the top 10 computer vision papers in 2020 with video demos, articles, code and paper reference.
Stars: ✭ 132 (-27.87%)
Mutual labels:  artificial-intelligence, research
Mindsdb
Predictive AI layer for existing databases.
Stars: ✭ 4,199 (+2194.54%)
Mutual labels:  artificial-intelligence, automl
Lagom
lagom: A PyTorch infrastructure for rapid prototyping of reinforcement learning algorithms.
Stars: ✭ 364 (+98.91%)
Mutual labels:  artificial-intelligence, research
Carla
Open-source simulator for autonomous driving research.
Stars: ✭ 7,012 (+3731.69%)
Mutual labels:  artificial-intelligence, research
Yarp
YARP - Yet Another Robot Platform
Stars: ✭ 358 (+95.63%)
Mutual labels:  artificial-intelligence, research
Csinva.github.io
Slides, paper notes, class notes, blog posts, and research on ML 📉, statistics 📊, and AI 🤖.
Stars: ✭ 342 (+86.89%)
Mutual labels:  artificial-intelligence, research
Airsim
Open source simulator for autonomous vehicles built on Unreal Engine / Unity, from Microsoft AI & Research
Stars: ✭ 12,528 (+6745.9%)
Mutual labels:  artificial-intelligence, research
Dreamerv2
Mastering Atari with Discrete World Models
Stars: ✭ 287 (+56.83%)
Mutual labels:  artificial-intelligence, research
Always Learning
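404 Not Found's knowledge base: computer theory fundamentals, computer technology fundamentals, low-level research, security technology, security research, artificial intelligence, enterprise security construction, security development, career planning, general competencies, and outstanding technical people in China and abroad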
Stars: ✭ 329 (+79.78%)
Mutual labels:  artificial-intelligence, research
Awesome System For Machine Learning
A curated list of research in machine learning system. I also summarize some papers if I think they are really interesting.
Stars: ✭ 1,185 (+547.54%)
Mutual labels:  automl, infrastructure

Distributed directed acyclic graph framework for machine learning with UI

MLComp aims to provide tools for training, inference, and building complex pipelines (especially for computer vision) in a fast, manageable way.
MLComp is compatible with Python 3.6+ and Unix operating systems.

Part of Catalyst Ecosystem. Project manifest.


Features

  • Amazing UI
  • Catalyst support
  • Distributed training
  • Supervisor that controls computational resources
  • Synchronization of both code and data
  • Resource monitoring
  • Full pause and resume support from the UI
  • Automatic management of requirements
  • Code dumping (with syntax highlighting in the UI)
  • Kaggle integration
  • Hierarchical logging
  • Grid search
  • Experiments comparison
  • Customizable layout system

Screenshots

[Screenshots: Dags, Computers, Reports, Code, Graph]

More screenshots

Installation

  1. Install MLComp package

    sudo apt-get install -y \
    libavformat-dev libavcodec-dev libavdevice-dev \
    libavutil-dev libswscale-dev libavresample-dev libavfilter-dev
    
    pip install mlcomp
    mlcomp init
    mlcomp migrate
    
  2. Set up your environment; see the Environment variables section

  3. Run the database, Redis, mlcomp-server, and mlcomp-workers:

    Variant 1: minimal (if you have 1 computer)

    Start all required services (mlcomp-server, mlcomp-workers, redis-server). This variant uses SQLite:

    mlcomp-server start --daemon=True
    

    Variant 2: full

    a. Change your Environment variables to use PostgreSQL

    b. Install rsync on each work computer

    sudo apt-get install rsync
    

    Ensure that every computer is reachable over SSH at the IP/PORT you specified in the Environment variables file.

    rsync will be invoked with the following commands:

    to upload

    rsync -vhru -e "ssh -p {target.port} -o StrictHostKeyChecking=no" \
    {folder}/ {target.user}@{target.ip}:{folder}/ --perms  --chmod=777
    

    to download

    rsync -vhru -e "ssh -p {source.port} -o StrictHostKeyChecking=no" \
    {source.user}@{source.ip}:{folder}/ {folder}/ --perms  --chmod=777
    

    c. Install apex for distributed learning

    d. To run PostgreSQL, redis-server, and mlcomp-server, execute on your server computer:

    cd ~/mlcomp/configs/
    docker-compose -f server-compose.yml up -d
    

    e. Run on each worker-computer:

    mlcomp-worker start
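
The rsync templates in step b can be filled in programmatically. The following is a minimal sketch, not MLComp's own code: the `Computer` record, the function names, and the sample user, IP, and folder are hypothetical and only mirror the placeholders `{target.user}`, `{target.ip}`, `{target.port}`, and `{folder}` shown above.

```python
import shlex
from typing import List, NamedTuple


class Computer(NamedTuple):
    """Hypothetical record for a work computer (illustrative names only)."""
    user: str
    ip: str
    port: int


def _ssh_option(port: int) -> str:
    # Same remote-shell string as in the templates above.
    return f"ssh -p {port} -o StrictHostKeyChecking=no"


def upload_command(folder: str, target: Computer) -> List[str]:
    """Argument list mirroring the 'to upload' template."""
    return [
        "rsync", "-vhru", "-e", _ssh_option(target.port),
        f"{folder}/", f"{target.user}@{target.ip}:{folder}/",
        "--perms", "--chmod=777",
    ]


def download_command(folder: str, source: Computer) -> List[str]:
    """Argument list mirroring the 'to download' template."""
    return [
        "rsync", "-vhru", "-e", _ssh_option(source.port),
        f"{source.user}@{source.ip}:{folder}/", f"{folder}/",
        "--perms", "--chmod=777",
    ]


# Sample values are made up for illustration.
worker = Computer(user="ml", ip="192.168.0.12", port=22)
command = upload_command("/home/ml/mlcomp/data", worker)
print(" ".join(shlex.quote(part) for part in command))
```

Building the command as an argument list keeps the SSH option as a single argument, so it can be passed to `subprocess.run` without shell quoting issues.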
    

UI

The web UI is available at http://{WEB_HOST}:{WEB_PORT}

By default, it is http://localhost:4201
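
As a small illustration of how the address is formed, the sketch below builds the URL from a dictionary of settings. The helper name `web_url` is hypothetical. Falling back to `localhost` and `4201` when a value is missing, and replacing the bind address `0.0.0.0` with `localhost` for browsing, are assumptions made for this example.

```python
def web_url(env: dict) -> str:
    """Build the UI address from WEB_HOST and WEB_PORT (hypothetical helper)."""
    # Assumed fallbacks, chosen to match the default address stated above.
    host = env.get("WEB_HOST", "localhost")
    port = env.get("WEB_PORT", "4201")
    # 0.0.0.0 is a bind address, not a browsable host; assume local access.
    if host == "0.0.0.0":
        host = "localhost"
    return f"http://{host}:{port}"


print(web_url({}))
print(web_url({"WEB_HOST": "0.0.0.0", "WEB_PORT": "8080"}))
```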

The front end is built with AngularJS.

If you want to modify it, see the front end's README page.

Usage

Run

mlcomp dag PATH_TO_CONFIG.yml

This command copies the files in the directory containing the config to the database.

Then, the server schedules the DAG considering free resources.
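
To make the idea of resource-aware DAG scheduling concrete, here is a toy sketch. It is not MLComp's scheduler: the `Task` record, the single GPU counter, and the greedy selection rule are all assumptions made for illustration. A task becomes runnable once every task it depends on has finished and enough GPUs are free.

```python
from typing import List, NamedTuple, Set, Tuple


class Task(NamedTuple):
    name: str
    depends: Tuple[str, ...]  # names of tasks that must finish first
    gpu: int                  # GPUs required (hypothetical resource model)


def pick_runnable(tasks: List[Task], done: Set[str],
                  running: Set[str], free_gpu: int) -> List[str]:
    """Greedily choose tasks whose dependencies are done and that fit free GPUs."""
    started = []
    for task in tasks:
        if task.name in done or task.name in running:
            continue
        if not all(dep in done for dep in task.depends):
            continue
        if task.gpu <= free_gpu:
            started.append(task.name)
            free_gpu -= task.gpu
    return started


dag = [
    Task("prepare", (), 0),
    Task("train_a", ("prepare",), 1),
    Task("train_b", ("prepare",), 1),
    Task("report", ("train_a", "train_b"), 0),
]

print(pick_runnable(dag, done=set(), running=set(), free_gpu=1))        # ['prepare']
print(pick_runnable(dag, done={"prepare"}, running=set(), free_gpu=1))  # ['train_a']
```

With one free GPU, only one of the two training tasks starts; the other waits until the GPU is released, and `report` waits for both.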

For more information, see the Docs

Docs and examples

API documentation and an overview of the library can be found in the Docs

You can find advanced tutorials and MLComp best practices in the examples folder of the repository.

The FileSync tutorial describes the data synchronization mechanism

Environment variables

Your computer's environment is configured in a single file located at ~/mlcomp/configs/.env

  • ROOT_FOLDER. Folder where MLComp stores its files: configs, db, tasks, etc.
  • TOKEN. Site security token. Change it to a string of your choice
  • DB_TYPE. Either SQLITE or POSTGRESQL
  • POSTGRES_DB. PostgreSQL database name
  • POSTGRES_USER. PostgreSQL user
  • POSTGRES_PASSWORD. PostgreSQL password
  • POSTGRES_HOST. PostgreSQL host
  • PGDATA. Location of the PostgreSQL database files
  • REDIS_HOST. Redis host
  • REDIS_PORT. Redis port
  • REDIS_PASSWORD. Redis password
  • WEB_HOST. MLComp site host. 0.0.0.0 means the site listens on all network interfaces
  • WEB_PORT. MLComp site port
  • CONSOLE_LOG_LEVEL. Log level for console output
  • DB_LOG_LEVEL. Log level for output written to the database
  • IP. IP address of a work computer. Other work computers must be able to reach it at this IP/PORT
  • PORT. Port of a work computer. Other work computers must be able to reach it at this IP/PORT over SSH
  • MASTER_PORT_RANGE. Port range used for distributed training on a work computer. 29500-29510 means that if this work computer is the master in a distributed training run, it uses the first free port in this range. Ranges of different work computers must not overlap
  • NCCL_SOCKET_IFNAME. NCCL network interface
  • FILE_SYNC_INTERVAL. File sync interval in seconds. 0 turns file sync off
  • WORKER_USAGE_INTERVAL. Interval in seconds between writes of worker usage to the database
  • INSTALL_DEPENDENCIES. True/False. Whether to install dependent libraries
  • SYNC_WITH_THIS_COMPUTER. True/False. If False, other computers do not sync with this one
  • CAN_PROCESS_TASKS. True/False. If False, this computer does not process tasks
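
The MASTER_PORT_RANGE rule (first free port, no overlap between work computers) can be checked with a few lines of code. This is a sketch under assumptions: the function names are hypothetical, both ends of the range are treated as inclusive, and "free" is modeled with a plain set of ports already in use rather than by probing sockets.

```python
from typing import Optional, Set, Tuple


def parse_port_range(value: str) -> Tuple[int, int]:
    """Parse a 'START-END' string such as '29500-29510' (inclusive ends assumed)."""
    start_text, end_text = value.split("-")
    start, end = int(start_text), int(end_text)
    if not (0 < start <= end <= 65535):
        raise ValueError(f"invalid port range: {value!r}")
    return start, end


def ranges_overlap(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """True if two inclusive ranges share at least one port."""
    return a[0] <= b[1] and b[0] <= a[1]


def first_free_port(port_range: Tuple[int, int], used: Set[int]) -> Optional[int]:
    """Return the lowest port in the range that is not in `used`, or None."""
    for port in range(port_range[0], port_range[1] + 1):
        if port not in used:
            return port
    return None


worker_1 = parse_port_range("29500-29510")
worker_2 = parse_port_range("29511-29520")
print(ranges_overlap(worker_1, worker_2))         # False
print(first_free_port(worker_1, {29500, 29501}))  # 29502
```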

You can list your network interfaces with the ifconfig command. For NCCL_SOCKET_IFNAME, see the NVIDIA documentation
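
Before starting the services, it can help to sanity-check the settings. The sketch below assumes the .env file uses plain `KEY=VALUE` lines, which this page does not show explicitly. The sample values and the helper names `parse_env` and `check_env` are hypothetical, and only the constraints stated in the list above are checked.

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks and '#' comments (assumed format)."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def check_env(env: dict) -> list:
    """Return a list of problems found; an empty list means no issue detected."""
    problems = []
    # A missing DB_TYPE is also reported, since it must be one of two values.
    if env.get("DB_TYPE") not in ("SQLITE", "POSTGRESQL"):
        problems.append("DB_TYPE must be SQLITE or POSTGRESQL")
    for key in ("INSTALL_DEPENDENCIES", "SYNC_WITH_THIS_COMPUTER", "CAN_PROCESS_TASKS"):
        if key in env and env[key] not in ("True", "False"):
            problems.append(f"{key} must be True or False")
    for key in ("FILE_SYNC_INTERVAL", "WORKER_USAGE_INTERVAL"):
        if key in env and not env[key].isdigit():
            problems.append(f"{key} must be a non-negative number of seconds")
    return problems


# Hypothetical values, for illustration only.
SAMPLE = """
# minimal single-computer setup
DB_TYPE=SQLITE
WEB_HOST=0.0.0.0
WEB_PORT=4201
FILE_SYNC_INTERVAL=0
CAN_PROCESS_TASKS=True
"""

print(check_env(parse_env(SAMPLE)))  # []
```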
