erdewit / distex

License: BSD-2-Clause
Distributed process pool for Python

Introduction

Distex offers a distributed process pool to utilize multiple CPUs or machines. It uses asyncio to efficiently manage the worker processes.

Features:

  • Scales from one to thousands of processors;
  • Can handle on the order of 50,000 small tasks per second;
  • Easy to use with SSH (secure shell) hosts;
  • Full async support;
  • Maps over unbounded iterables;
  • Compatible with concurrent.futures.ProcessPoolExecutor (PEP 3148).
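
To illustrate what mapping over an unbounded iterable involves, here is a minimal sketch that uses only the standard library. It is not distex's implementation: the helper bounded_map and its window parameter are hypothetical names, and a ThreadPoolExecutor stands in for the pool. The idea is to keep only a bounded number of tasks in flight, so the input is consumed lazily and can be infinite:

```python
import itertools
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def bounded_map(executor, func, iterable, window=8):
    """Yield func(x) in input order with at most `window` tasks in flight.

    Hypothetical helper for illustration; not part of distex.
    """
    pending = deque()
    for item in iterable:
        pending.append(executor.submit(func, item))
        if len(pending) >= window:
            # Wait for the oldest task before reading more input.
            yield pending.popleft().result()
    # Input exhausted: drain whatever is still in flight.
    while pending:
        yield pending.popleft().result()


def square(x):
    return x * x


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as executor:
        # itertools.count() never ends; islice stops after five results.
        results = bounded_map(executor, square, itertools.count())
        print(list(itertools.islice(results, 5)))  # [0, 1, 4, 9, 16]
```

Because the generator only reads a new input item after yielding a result, memory use stays proportional to the window size rather than to the length of the input.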

Installation

pip3 install -U distex

When using remote hosts, distex must be installed on them as well. Make sure that the distex_proc script can be found on the PATH of each host.

For SSH hosts: authentication must use SSH keys, since passwords are not supported. The remote installation can be tested with:

ssh <host> distex_proc
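
For the local machine, the same kind of check can be done from Python. This is a small sketch using the standard library's shutil.which; the function name find_distex_proc is made up for this example and is not part of distex:

```python
import shutil


def find_distex_proc():
    """Return the full path of distex_proc if it is on PATH, else None."""
    return shutil.which("distex_proc")


if __name__ == "__main__":
    path = find_distex_proc()
    if path is None:
        print("distex_proc not found on PATH")
    else:
        print(f"distex_proc found at {path}")
```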

Dependencies:

  • Python version 3.6 or higher;
  • On Unix the uvloop package is recommended: pip3 install uvloop
  • SSH client and server (optional).

Examples

A process pool can have local and remote workers. Here is a pool that uses 4 local workers:

from distex import Pool

def f(x):
    return x*x

pool = Pool(4)
for y in pool.map(f, range(100)):
    print(y)

To create a pool that also uses 8 workers on host maxi, using ssh:

pool = Pool(4, 'ssh://maxi/8')

To use a pool in combination with eventkit:

from distex import Pool
import eventkit as ev
import bz2

pool = Pool()
# await pool  # un-comment in Jupyter
data = [b'A' * 1000000] * 1000

pipe = ev.Sequence(data).poolmap(pool, bz2.compress).map(len).mean().last()

print(pipe.run())  # in Jupyter: print(await pipe)
pool.shutdown()

Asynchronous constructs are supported throughout: coroutines can be used as tasks, async iterators as input, and the pool itself as an async context manager:

import asyncio
from distex import Pool

def init():
    # pool initializer: set the start time for every worker
    import time
    import builtins
    builtins.t0 = time.time()

async def timer(i=0):
    # async code running in the pool
    import time
    import asyncio
    await asyncio.sleep(1)
    return time.time() - t0

async def ait():
    # async iterator running on the user side
    for i in range(20):
        await asyncio.sleep(0.1)
        yield i

async def main():
    async with Pool(4, initializer=init, qsize=1) as pool:
        async for t in pool.map_async(timer, ait()):
            print(t)
        print(await pool.run_on_all_async(timer))


loop = asyncio.get_event_loop()
loop.run_until_complete(main())

High level architecture

Distex does not use remote 'task servers'. Instead it works the other way around: a local server is started first, then the local and remote workers are started, and each worker connects back to the server on its own. Once all workers have connected, the pool is ready for duty.
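
The connect-back order described above can be sketched with plain asyncio. This is only an illustration under simplifying assumptions, not distex's code: the workers are coroutines in the same process rather than separate processes, a TCP socket on localhost is used for portability, and the one-line "hello" message is invented for the example:

```python
import asyncio


async def main(num_workers=3):
    connected = []
    all_connected = asyncio.Event()

    async def on_connect(reader, writer):
        # Server side: each worker announces itself with one line.
        name = (await reader.readline()).decode().strip()
        connected.append(name)
        if len(connected) == num_workers:
            all_connected.set()
        writer.close()

    async def worker(port, name):
        # Worker side: dial back to the server that is already running.
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(f"{name}\n".encode())
        await writer.drain()
        writer.close()
        await writer.wait_closed()

    # 1. Start the local server first (port 0 picks any free port).
    server = await asyncio.start_server(on_connect, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    # 2. Start the workers; each one connects back on its own.
    workers = [
        asyncio.ensure_future(worker(port, f"worker-{i}"))
        for i in range(num_workers)
    ]
    # 3. The pool counts as ready once every worker has connected.
    await asyncio.wait_for(all_connected.wait(), timeout=5)
    await asyncio.gather(*workers)
    server.close()
    return sorted(connected)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

One consequence of this direction is that only the local side needs a listening socket; the workers act as clients.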

Each worker consists of a single-threaded process that is running an asyncio event loop. This loop is used both for communication and for running asynchronous tasks. Synchronous tasks are run in a blocking fashion.
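
The difference between the two kinds of task can be shown with a small sketch. It assumes nothing about distex's internals; run_task is a hypothetical dispatcher that awaits coroutine functions and calls ordinary functions directly, which is what "blocking" means inside a single-threaded event loop:

```python
import asyncio
import inspect


async def run_task(func, *args):
    """Run one task inside the event loop (illustrative only)."""
    if inspect.iscoroutinefunction(func):
        # Async task: awaited, so the loop stays free while it waits.
        return await func(*args)
    # Sync task: called directly, which blocks the loop until it returns.
    return func(*args)


def square(x):
    return x * x


async def slow_square(x):
    await asyncio.sleep(0.01)
    return x * x


async def main():
    return [await run_task(square, 3), await run_task(slow_square, 4)]


if __name__ == "__main__":
    print(asyncio.run(main()))  # [9, 16]
```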

When using ssh, a remote (or 'reverse') tunnel is created from a remote Unix socket to the local Unix socket that the local server is listening on. Multiple workers on a remote machine will use the same Unix socket and share the same ssh tunnel.
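
As a rough sketch of what such a tunnel looks like on the command line, the function below builds an ssh argument list using OpenSSH's -R remote_socket:local_socket forwarding syntax. The socket paths and the function name are hypothetical, and the exact options and distex_proc arguments that distex itself passes are not shown in this README, so the remote command is left as a parameter:

```python
def reverse_tunnel_argv(host, remote_socket, local_socket, remote_command):
    """Build an ssh command that forwards a remote Unix socket to a local one.

    Illustrative only; not the command line that distex constructs.
    """
    return [
        "ssh",
        # -R forwards connections made to remote_socket on the remote host
        # back to local_socket on the machine running ssh.
        "-R", f"{remote_socket}:{local_socket}",
        host,
        remote_command,
    ]


if __name__ == "__main__":
    argv = reverse_tunnel_argv(
        "maxi",
        "/tmp/example_remote.sock",  # hypothetical path on the remote host
        "/tmp/example_local.sock",   # hypothetical path of the local server
        "distex_proc",
    )
    print(" ".join(argv))
```

Since all workers on one host connect to the same remote socket, a single ssh process of this kind is enough per host.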

The plain ssh executable is used instead of much nicer solutions such as AsyncSSH. This is to keep the CPU usage of encrypting/decrypting outside of the event loop and offload it to the ssh process(es).

Documentation

Distex documentation

Author: Ewald de Wit