Numbagg: Fast N-dimensional aggregation functions with Numba

.. image:: https://travis-ci.org/shoyer/numbagg.svg?branch=master
    :target: https://travis-ci.org/shoyer/numbagg
.. image:: https://img.shields.io/pypi/v/numbagg.svg
    :target: https://pypi.org/project/numbagg/

Fast, flexible N-dimensional array functions written with Numba_ and NumPy's `generalized ufuncs`_.

.. _Bottleneck: https://github.com/kwgoodman/bottleneck
.. _Numba: https://github.com/numba/numba
.. _generalized ufuncs: http://docs.scipy.org/doc/numpy/reference/c-api.generalized-ufuncs.html

Currently accelerated functions:

  • Array functions: allnan, anynan, count, nanargmax, nanargmin, nanmax, nanmean, nanstd, nanvar, nanmin, nansum
  • Moving window functions: move_exp_nanmean, move_mean

Note: Only functions listed here (exposed in Numbagg's top level namespace) are supported as part of Numbagg's public API.
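Some of the names in the list above have no direct NumPy counterpart. As a reading aid, here is a rough NumPy-only sketch of what allnan, anynan and count are assumed to compute, following the Bottleneck-style meaning of those names. The ``*_ref`` helpers are hypothetical and are not part of Numbagg, and this is not how Numbagg implements them:

```python
import numpy as np

def allnan_ref(a, axis=None):
    """True where every element along `axis` is NaN (assumed meaning of allnan)."""
    return np.isnan(a).all(axis=axis)

def anynan_ref(a, axis=None):
    """True where at least one element along `axis` is NaN (assumed meaning of anynan)."""
    return np.isnan(a).any(axis=axis)

def count_ref(a, axis=None):
    """Number of non-NaN elements along `axis` (assumed meaning of count)."""
    return (~np.isnan(a)).sum(axis=axis)

x = np.array([[1.0, np.nan, np.nan],
              [2.0, 3.0, np.nan]])

print(allnan_ref(x, axis=0))  # only the last column is entirely NaN
print(anynan_ref(x, axis=1))  # both rows contain at least one NaN
print(count_ref(x, axis=0))   # valid values per column
print(count_ref(x))           # valid values in the whole array
```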

Easy to extend

Numbagg makes it easy to write, in pure Python/NumPy, flexible aggregation functions accelerated by Numba. All the hard work is done by Numba's JIT compiler and NumPy's gufunc machinery (as wrapped by Numba).

For example, here is how we wrote nansum::

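    import numpy as np
    from numbagg.decorators import ndreduce

    @ndreduce
    def nansum(a):
        # Running total; NaN entries are skipped.
        asum = 0.0
        # a.flat iterates over every element of a, whatever its dimensionality.
        for ai in a.flat:
            if not np.isnan(ai):
                asum += ai
        return asum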

You are welcome to experiment with Numbagg's decorator functions, but these are not public APIs (yet): we reserve the right to change them at any time.

We'd rather get your pull requests to add new functions into Numbagg directly!

Advantages over Bottleneck

  • Way less code. Easier to add new functions. No ad-hoc templating system. No Cython!
  • Fast functions still work for >3 dimensions.
  • The axis argument accepts tuples of integers.

Most of the functions in Numbagg (including our test suite) are adapted from Bottleneck's battle-hardened implementations. Still, Numbagg is experimental, and probably not yet ready for production.
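To illustrate the last two points in the list above, the following sketch reduces a 4-dimensional array over two axes at once. It assumes ``numbagg.nanmean`` accepts a tuple for ``axis`` as described, and it falls back to ``np.nanmean`` so the snippet still runs where Numbagg is not installed:

```python
import numpy as np

try:
    import numbagg
    nanmean = numbagg.nanmean
except ImportError:
    # Fallback so the snippet still runs without Numbagg installed.
    nanmean = np.nanmean

# A 4-dimensional array with a couple of missing values.
x = np.random.RandomState(0).randn(2, 3, 4, 5)
x[0, 0, 0, 0] = np.nan
x[1, 2, 3, 4] = np.nan

# Reduce over the first and third axes at once by passing a tuple.
result = nanmean(x, axis=(0, 2))
print(result.shape)  # (3, 5)

# The result should agree with NumPy's own nanmean.
print(np.allclose(result, np.nanmean(x, axis=(0, 2))))
```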

Benchmarks

Initial benchmarks are quite encouraging. Numbagg/Numba performs comparably to (slightly better than) Bottleneck's hand-written C::

    import numbagg
    import numpy as np
    import bottleneck

    x = np.random.RandomState(42).randn(1000, 1000)
    x[x < -1] = np.nan

    # timings with numba=0.41.0 and bottleneck=1.2.1

    In [2]: %timeit numbagg.nanmean(x)
    1.8 ms ± 92.3 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

    In [3]: %timeit numbagg.nanmean(x, axis=0)
    3.63 ms ± 136 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    In [4]: %timeit numbagg.nanmean(x, axis=1)
    1.81 ms ± 41 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

    In [5]: %timeit bottleneck.nanmean(x)
    2.22 ms ± 119 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    In [6]: %timeit bottleneck.nanmean(x, axis=0)
    4.45 ms ± 107 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    In [7]: %timeit bottleneck.nanmean(x, axis=1)
    2.19 ms ± 13.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
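The timings above were taken with IPython's ``%timeit``. Outside IPython, a comparable measurement can be sketched with the standard library's ``timeit`` module. ``best_time_ms`` is a hypothetical helper, and ``np.nanmean`` is used here only as a stand-in baseline. ``numbagg.nanmean`` or ``bottleneck.nanmean`` could be passed in the same way, assuming they are installed:

```python
import timeit

import numpy as np

def best_time_ms(func, *args, repeat=3, number=5, **kwargs):
    """Best per-call time in milliseconds over `repeat` runs of `number` calls."""
    # Warm-up call, so one-time costs (e.g. JIT compilation) are not timed.
    func(*args, **kwargs)
    timer = timeit.Timer(lambda: func(*args, **kwargs))
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e3

x = np.random.RandomState(42).randn(1000, 1000)
x[x < -1] = np.nan

for axis in (None, 0, 1):
    ms = best_time_ms(np.nanmean, x, axis=axis)
    print(f"np.nanmean(x, axis={axis}): {ms:.2f} ms")
```

The warm-up call matters for JIT-compiled functions, whose first call includes compilation time.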

Our approach

Numbagg includes somewhat awkward workarounds for features missing from NumPy/Numba:

  • It implements its own cache for functions wrapped by Numba's guvectorize, because that decorator is rather slow.
  • It does its own `handling of array transposes <https://github.com/shoyer/numbagg/blob/master/numbagg/decorators.py#L69>`_ to handle the axis argument, which we hope will `eventually be directly supported <https://github.com/numpy/numpy/issues/5197>`_ by all NumPy gufuncs.
  • It uses some `terrible hacks <https://github.com/shoyer/numbagg/blob/master/numbagg/transform.py>`_ to hide the out-of-bound memory access necessary to write `gufuncs that handle scalar values <https://github.com/numba/numba/blob/master/numba/tests/test_guvectorize_scalar.py>`_ with Numba.

I hope that the need for most of these will eventually go away. In the meantime, expect Numbagg to be tightly coupled to Numba and NumPy release cycles.
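The axis handling mentioned in the list above can be pictured roughly as follows. This is a NumPy-only sketch, under the assumption that the reduction kernel only ever needs to see the reduced axes as trailing dimensions. ``reduce_over_axes`` and ``nansum_1d`` are hypothetical names, and Numbagg itself relies on Numba-compiled gufuncs for the inner loop, not on ``np.apply_along_axis``:

```python
import numpy as np

def reduce_over_axes(func, a, axis=None):
    """Apply a 1-D reduction `func` over one or more axes of `a`.

    Hypothetical helper: the reduced axes are moved to the end, flattened
    into a single trailing axis, and `func` is applied along that axis.
    """
    a = np.asarray(a)
    if axis is None:
        axes = tuple(range(a.ndim))
    elif isinstance(axis, (int, np.integer)):
        axes = (int(axis),)
    else:
        axes = tuple(axis)
    # Normalize negative axes and reject duplicates.
    axes = tuple(ax % a.ndim for ax in axes)
    if len(set(axes)) != len(axes):
        raise ValueError("duplicate value in axis")
    n_kept = a.ndim - len(axes)
    # Transpose so that the reduced axes come last.
    moved = np.moveaxis(a, axes, list(range(n_kept, a.ndim)))
    # Collapse the reduced axes into one trailing axis.
    flat = moved.reshape(moved.shape[:n_kept] + (-1,))
    return np.apply_along_axis(func, -1, flat)

def nansum_1d(values):
    """Plain-Python NaN-skipping sum over a 1-D slice."""
    total = 0.0
    for v in values:
        if not np.isnan(v):
            total += v
    return total

x = np.arange(24, dtype=float).reshape(2, 3, 4)
x[0, 1, 2] = np.nan

result = reduce_over_axes(nansum_1d, x, axis=(0, 2))
print(np.allclose(result, np.nansum(x, axis=(0, 2))))
```

Because the kernel only sees one flattened trailing axis, the same function serves integer axes, tuples of axes, and ``axis=None`` without any special cases in the kernel.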

License

3-clause BSD. Includes portions of Bottleneck, which is distributed under a Simplified BSD license.
