All Projects → wmayner → Pyemd

wmayner / Pyemd

Licence: mit
Fast EMD for Python: a wrapper for Pele and Werman's C++ implementation of the Earth Mover's Distance metric

Programming Languages

python
139335 projects - #7 most used programming language
cython
566 projects

Projects that are alternatives of or similar to Pyemd

Teaching
Teaching Materials for Dr. Waleed A. Yousef
Stars: ✭ 435 (+20.5%)
Mutual labels:  mathematics, image-processing
Casadi
CasADi is a symbolic framework for numeric optimization implementing automatic differentiation in forward and reverse modes on sparse matrix-valued computational graphs. It supports self-contained C-code generation and interfaces state-of-the-art codes such as SUNDIALS, IPOPT etc. It can be used from C++, Python or Matlab/Octave.
Stars: ✭ 714 (+97.78%)
Mutual labels:  mathematics, scientific-computing
Librmath.js
Javascript Pure Implementation of Statistical R "core" numerical libRmath.so
Stars: ✭ 425 (+17.73%)
Mutual labels:  mathematics, scientific-computing
Gonum
开源Go语言数值算法库(An open numerical library purely based on Go programming language)
Stars: ✭ 128 (-64.54%)
Mutual labels:  mathematics, scientific-computing
Python-Matematica
Explorando aspectos fundamentais da matemática com Python e Jupyter
Stars: ✭ 41 (-88.64%)
Mutual labels:  mathematics, scientific-computing
Poisson blend
Seamless copy-and-paste of images with Poisson Blending.
Stars: ✭ 277 (-23.27%)
Mutual labels:  mathematics, image-processing
Awesome Scientific Computing
😎 Curated list of awesome software for numerical analysis and scientific computing
Stars: ✭ 476 (+31.86%)
Mutual labels:  mathematics, scientific-computing
Computator.net
Computator.NET is a special kind of numerical software that is fast and easy to use but not worse than others feature-wise. It's features include: - Real and complex functions charts - Real and complex calculator - Real functions numerical calculations including different methods - Over 107 Elementary functions - Over 141 Special functions - Over 21 Matrix functions and operations - Scripting language with power to easy computations including matrices - You can declare your own custom functions with scripting language
Stars: ✭ 174 (-51.8%)
Mutual labels:  mathematics, scientific-computing
Stdlib
✨ Standard library for JavaScript and Node.js. ✨
Stars: ✭ 2,749 (+661.5%)
Mutual labels:  mathematics, scientific-computing
Primify
Embed any image into a prime number.
Stars: ✭ 266 (-26.32%)
Mutual labels:  mathematics, image-processing
Exprtk
C++ Mathematical Expression Parsing And Evaluation Library
Stars: ✭ 301 (-16.62%)
Mutual labels:  mathematics, scientific-computing
Geolib
Zero dependency library to provide some basic geo functions
Stars: ✭ 3,675 (+918.01%)
Mutual labels:  distance
Qilin App
Fully hackable text editor developed for exact sciences with built-in KaTeX and AsciiMath support. Extensible via plugins and themes. Exportable as HTML, PDF and GFM.
Stars: ✭ 336 (-6.93%)
Mutual labels:  mathematics
Towel
Throw in the towel.
Stars: ✭ 333 (-7.76%)
Mutual labels:  mathematics
Cheap Ruler
Fast approximations for common geodesic measurements 🌐
Stars: ✭ 334 (-7.48%)
Mutual labels:  distance
Sianet
An easy to use C# deep learning library with CUDA/OpenCL support
Stars: ✭ 353 (-2.22%)
Mutual labels:  image-processing
Pythonfromspace
Python Examples for Remote Sensing
Stars: ✭ 344 (-4.71%)
Mutual labels:  image-processing
Opence
Contrast Enhancement Techniques for low-light images
Stars: ✭ 333 (-7.76%)
Mutual labels:  image-processing
Arrayfire
ArrayFire: a general purpose GPU library.
Stars: ✭ 3,693 (+922.99%)
Mutual labels:  scientific-computing
Imgtoascii
A JavaScript implementation of a image to Ascii code
Stars: ✭ 331 (-8.31%)
Mutual labels:  image-processing

.. image:: https://img.shields.io/travis/wmayner/pyemd/develop.svg?style=flat-square&maxAge=3600 :target: https://travis-ci.org/wmayner/pyemd .. image:: https://img.shields.io/pypi/pyversions/pyemd.svg?style=flat-square&maxAge=86400 :target: https://wiki.python.org/moin/Python2orPython3 :alt: Python versions badge

PyEMD: Fast EMD for Python

PyEMD is a Python wrapper for Ofir Pele and Michael Werman's implementation <http://ofirpele.droppages.com/>_ of the Earth Mover's Distance <http://en.wikipedia.org/wiki/Earth_mover%27s_distance>_ that allows it to be used with NumPy. If you use this code, please cite the papers listed at the end of this document.

Installation

.. code:: bash

pip install pyemd

Usage

.. code:: python

>>> from pyemd import emd
>>> import numpy as np
>>> first_histogram = np.array([0.0, 1.0])
>>> second_histogram = np.array([5.0, 3.0])
>>> distance_matrix = np.array([[0.0, 0.5],
...                             [0.5, 0.0]])
>>> emd(first_histogram, second_histogram, distance_matrix)
3.5

You can also get the associated minimum-cost flow:

.. code:: python

>>> from pyemd import emd_with_flow
>>> emd_with_flow(first_histogram, second_histogram, distance_matrix)
(3.5, [[0.0, 0.0], [0.0, 1.0]])

You can also calculate the EMD directly from two arrays of observations:

.. code:: python

>>> from pyemd import emd_samples
>>> first_array = [1, 2, 3, 4]
>>> second_array = [2, 3, 4, 5]
>>> emd_samples(first_array, second_array, bins=2)
0.5

Documentation

emd()


.. code:: python

    emd(first_histogram,
        second_histogram,
        distance_matrix,
        extra_mass_penalty=-1.0)

*Arguments:*

- ``first_histogram`` *(np.ndarray)*: A 1D array of type ``np.float64`` of
  length *N*.
- ``second_histogram`` *(np.ndarray)*: A 1D array of ``np.float64`` of length
  *N*.
- ``distance_matrix`` *(np.ndarray)*: A 2D array of ``np.float64,`` of size at
  least *N* × *N*. This defines the underlying metric, or ground distance, by
  giving the pairwise distances between the histogram bins.
  **NOTE: It must represent a metric; there is no warning if it doesn't.**

*Keyword Arguments:*

- ``extra_mass_penalty`` *(float)*: The penalty for extra mass. If you want the
  resulting distance to be a metric, it should be at least half the diameter of
  the space (maximum possible distance between any two points). If you want
  partial matching you can set it to zero (but then the resulting distance is
  not guaranteed to be a metric). The default value is ``-1.0``, which means the
  maximum value in the distance matrix is used.

*Returns:* *(float)* The EMD value.

----

emd_with_flow()

.. code:: python

emd_with_flow(first_histogram,
              second_histogram,
              distance_matrix,
              extra_mass_penalty=-1.0)

Arguments are the same as for emd().

Returns: (tuple(float, list(list(float)))) The EMD value and the associated minimum-cost flow.


emd_samples()


.. code:: python

    emd_samples(first_array,
                second_array,
                extra_mass_penalty=-1.0,
                distance='euclidean',
                normalized=True,
                bins='auto',
                range=None)

*Arguments:*

- ``first_array`` *(Iterable)*: An array of samples used to generate a
  histogram.
- ``second_array`` *(Iterable)*: An array of samples used to generate a
  histogram.

*Keyword Arguments:*

- ``extra_mass_penalty`` *(float)*: Same as for ``emd()``.
- ``distance`` *(string or function)*: A string or function implementing
  a metric on a 1D ``np.ndarray``. Defaults to the Euclidean distance. Currently
  limited to 'euclidean' or your own function, which must take a 1D array and
  return a square 2D array of pairwise distances.
- ``normalized`` (*boolean*): If true (default), treat histograms as fractions
  of the dataset. If false, treat histograms as counts. In the latter case the
  EMD will vary greatly by array length.
- ``bins`` *(int or string)*: The number of bins to include in the generated
  histogram. If a string, must be one of the bin selection algorithms accepted
  by ``np.histogram()``. Defaults to ``'auto'``, which gives the maximum of the
  'sturges' and 'fd' estimators.
- ``range`` *(tuple(int, int))*: The lower and upper range of the bins, passed
  to ``numpy.histogram()``. Defaults to the range of the union of
  ``first_array`` and ``second_array``. Note: if the given range is not a
  superset of the default range, no warning will be given.

*Returns:* *(float)* The EMD value between the histograms of ``first_array`` and
``second_array``.

----

Limitations and Caveats
-----------------------

- ``emd()`` and ``emd_with_flow()``:

  - The ``distance_matrix`` is assumed to represent a metric; there is no check
    to ensure that this is true. See the documentation in
    ``pyemd/lib/emd_hat.hpp`` for more information.
  - The histograms and distance matrix must be numpy arrays of type
    ``np.float64``. The original C++ template function can accept any numerical
    C++ type, but this wrapper only instantiates the template with ``double``
    (Cython converts ``np.float64`` to ``double``). If there's demand, I can add
    support for other types.

- ``emd_with_flow()``:

  - The flow matrix does not contain the flows to/from the extra mass bin.

- ``emd_samples()``:

  - With ``numpy < 1.15.0``, using the default ``bins='auto'`` results in an
    extra call to ``np.histogram()`` to determine the bin lengths, since `the
    NumPy bin-selectors are not exposed in the public API
    <https://github.com/numpy/numpy/issues/10183>`_. For performance, you may
    want to set the bins yourself. If ``numpy >= 1.15`` is available,
    ``np.histogram_bin_edges()`` is called instead, which is more efficient.


Contributing
------------

To help develop PyEMD, fork the project on GitHub and install the requirements
with ``pip install -r requirements.txt``.

The ``Makefile`` defines some tasks to help with development:

- ``test``: Run the test suite
- ``build`` Generate and compile the Cython extension
- ``clean``: Remove the compiled Cython extension
- ``default``: Run ``build``

Tests for different Python environments can be run with ``tox``.


Credit
------

- All credit for the actual algorithm and implementation goes to `Ofir Pele
  <http://ofirpele.droppages.com/>`_ and `Michael Werman
  <http://www.cs.huji.ac.il/~werman/>`_. See the `relevant paper
  <https://doi.org/10.1109/ICCV.2009.5459199>`_.
- Thanks to the Cython developers for making this kind of wrapper relatively
  easy to write.

Please cite these papers if you use this code:

Ofir Pele and Michael Werman. Fast and robust earth mover's distances. Proc. 2009 IEEE 12th Int. Conf. on Computer Vision, Kyoto, Japan, 2009, pp. 460-467.

.. code-block:: latex

@INPROCEEDINGS{pele2009,
  title={Fast and robust earth mover's distances},
  author={Pele, Ofir and Werman, Michael},
  booktitle={2009 IEEE 12th International Conference on Computer Vision},
  pages={460--467},
  year={2009},
  month={September},
  organization={IEEE}
}

Ofir Pele and Michael Werman. A linear time histogram metric for improved SIFT matching. Computer Vision - ECCV 2008, Marseille, France, 2008, pp. 495-508.

.. code-block:: latex

@INPROCEEDINGS{pele2008,
  title={A linear time histogram metric for improved sift matching},
  author={Pele, Ofir and Werman, Michael},
  booktitle={Computer Vision--ECCV 2008},
  pages={495--508},
  year={2008},
  month={October},
  publisher={Springer}
}
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].