LaurentRDC / npstreams

Licence: BSD-3-Clause License
Streaming operations on NumPy arrays

npstreams

npstreams is an open-source Python package for streaming NumPy array operations. The goal is to provide tested routines that operate on streams (or generators) of arrays instead of dense arrays.

Streaming reduction operations (sums, averages, etc.) can be implemented in constant memory, which in turn allows for easy parallelization.

This approach has been a huge boon when working with lots of images; the images are read one-by-one from disk and combined/processed in a streaming fashion.
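
To make the constant-memory claim concrete, here is a minimal sketch of a streaming average written with NumPy alone. The name running_average is hypothetical and this is not how npstreams implements iaverage; it only illustrates that a running sum and a counter are enough, whatever the length of the stream.

```python
import numpy as np

def running_average(stream):
    """Yield the running mean of a stream of equal-shaped arrays.

    Only the running sum and a counter are kept between iterations.
    """
    total = None
    count = 0
    for arr in stream:
        if total is None:
            total = np.array(arr, dtype=float)  # copy, so the input is not modified
        else:
            total += arr
        count += 1
        yield total / count

# Synthetic stand-in for images read from disk one at a time
frames = (np.full((4, 4), fill_value=i, dtype=float) for i in range(5))
for avg in running_average(frames):
    pass  # avg is the mean of the frames seen so far

print(avg[0, 0])  # 2.0, the mean of 0, 1, 2, 3, 4
```

Because the frames come from a generator, no more than one frame and one accumulator are alive at any time.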

This package is developed in conjunction with other software projects in the Siwick research group.

Motivating Example

Consider the following snippet to combine 50 images from an iterable source:

import numpy as np

images = np.empty( shape = (2048, 2048, 50) )
for index, im in enumerate(source):
    images[:,:,index] = im

avg = np.average(images, axis = 2)

If the source iterable provided 1000 images, the above routine would not work on most machines: each 2048 x 2048 image of 64-bit floats takes 32 MiB, so the full array would need about 31 GiB of memory. Moreover, what if we want to transform the images one by one before averaging them? What about looking at the average while it is being computed? Let's look at an example:

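import numpy as np
from npstreams import iaverage
# scipy.misc.imread was removed in SciPy 1.2; scikit-image is used here instead
from skimage.io import imread

# list_of_filenames: any iterable of image file paths
stream = map(imread, list_of_filenames)
averaged = iaverage(stream)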

At this point, the generators map and iaverage are 'wired' together but will not compute anything until a result is requested. We can watch the average evolve:

import matplotlib.pyplot as plt

# Each iteration yields the average of the images read so far
for avg in averaged:
    plt.imshow(avg)
    plt.show()

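We can also use last to get the final average directly. Since a generator can only be consumed once, this should be done on a fresh stream rather than after the loop above: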

from npstreams import last

total = last(averaged) # average of the entire stream
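
For readers unfamiliar with the idea, the sketch below shows one way to fetch the final item of a stream in plain Python. The helper final_item is hypothetical, not the npstreams implementation of last, and raising ValueError on an empty stream is an assumption made for this illustration.

```python
from collections import deque

def final_item(stream):
    """Return the last item of an iterable, consuming it entirely."""
    tail = deque(stream, maxlen=1)  # keeps only the most recent item
    if not tail:
        raise ValueError("empty stream")
    return tail[0]

running_totals = (total for total in [1, 3, 6])
print(final_item(running_totals))  # 6
```

Calling final_item(running_totals) a second time would raise ValueError, because the generator has already been exhausted.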

Streaming Functions

npstreams comes with some streaming functions built-in. Some examples:

  • Numerics : isum, iprod, isub, etc.
  • Statistics : iaverage (weighted mean), ivar (single-pass variance), etc.
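
As an illustration of what a single-pass variance involves, the sketch below uses Welford's update in plain NumPy. The name running_variance is hypothetical, and the source does not say which algorithm ivar uses, so this should not be read as its implementation.

```python
import numpy as np

def running_variance(stream, ddof=0):
    """Yield the element-wise running variance of a stream of arrays."""
    count = 0
    mean = None
    m2 = None  # running sum of squared deviations from the mean
    for arr in stream:
        arr = np.asarray(arr, dtype=float)
        if mean is None:
            mean = np.zeros_like(arr)
            m2 = np.zeros_like(arr)
        count += 1
        delta = arr - mean
        mean = mean + delta / count
        m2 = m2 + delta * (arr - mean)  # uses the updated mean
        if count > ddof:
            yield m2 / (count - ddof)

rng = np.random.default_rng(0)
data = [rng.random((3, 3)) for _ in range(10)]
*_, final_var = running_variance(iter(data))
print(np.allclose(final_var, np.var(np.stack(data), axis=0)))  # True
```

Each array is visited exactly once, and only the count, the running mean and the running sum of squared deviations are kept between iterations.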

More importantly, npstreams gives you all the tools required to build your own streaming function. All routines are documented in the API Reference on readthedocs.io.
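
The general pattern behind a custom streaming function can be sketched without npstreams at all: fold a binary NumPy function over the stream and yield every intermediate result. The name running_maximum is hypothetical; the helpers that npstreams provides for this purpose are described in the API Reference and are not reproduced here.

```python
from itertools import accumulate

import numpy as np

def running_maximum(stream):
    """Yield the element-wise maximum of all arrays seen so far."""
    return accumulate(stream, np.maximum)

arrays = [np.array([1, 5]), np.array([3, 2]), np.array([0, 9])]
for current in running_maximum(iter(arrays)):
    print(current)
# prints [1 5], then [3 5], then [3 9]
```

Any binary function that combines the accumulated result with the next array (np.add, np.minimum, np.multiply, ...) fits the same pattern.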

Benchmarking

npstreams provides a function for benchmarking common use cases.

To run the benchmark with default parameters, from the interpreter:

from npstreams import benchmark
benchmark()

From a command-line terminal:

python -c 'import npstreams; npstreams.benchmark()'

The results will be printed to the screen.

Future Work

Some of the features I want to implement in this package in the near future:

  • Optimize the CUDA-enabled routines
  • More functions : more streaming functions borrowed from NumPy and SciPy.

API Reference

The API Reference on readthedocs.io provides API-level documentation, as well as tutorials.

Installation

The only requirement is NumPy. To have access to CUDA-enabled routines, PyCUDA must also be installed. npstreams is available on PyPI; it can be installed with pip:

python -m pip install npstreams

npstreams can also be installed with the conda package manager, from the conda-forge channel:

conda config --add channels conda-forge
conda install npstreams

To install the latest development version from GitHub (over HTTPS, since GitHub no longer serves the unauthenticated git:// protocol):

python -m pip install git+https://github.com/LaurentRDC/npstreams.git

Tests can be run using the pytest package.

Citations

If you find this software useful, please consider citing the following publication:

L. P. René de Cotret, M. R. Otto, M. J. Stern, and B. J. Siwick, An open-source software ecosystem for the interactive exploration of ultrafast electron scattering data, Advanced Structural and Chemical Imaging 4:11 (2018) DOI: 10.1186/s40679-018-0060-y.

Support / Report Issues

All support requests and issue reports should be filed on GitHub as an issue.

License

npstreams is made available under the BSD 3-Clause License, same as NumPy. For more details, see LICENSE.txt.
