
jeffalstott / Powerlaw

Projects that are alternatives to, or similar to, Powerlaw

Python Machine Learning Second Edition
Python Machine Learning - Second Edition, published by Packt
Stars: ✭ 376 (-0.79%)
Mutual labels:  jupyter-notebook
Java Guide
A guide of modern Java (Java 17)
Stars: ✭ 378 (-0.26%)
Mutual labels:  jupyter-notebook
Bert Multitask Learning
BERT for Multitask Learning
Stars: ✭ 380 (+0.26%)
Mutual labels:  jupyter-notebook
Nlp Python Deep Learning
NLP in Python with Deep Learning
Stars: ✭ 374 (-1.32%)
Mutual labels:  jupyter-notebook
Pg Is All You Need
Policy Gradient is all you need! A step-by-step tutorial for well-known PG methods.
Stars: ✭ 372 (-1.85%)
Mutual labels:  jupyter-notebook
Bap
Bayesian Analysis with Python (Second Edition)
Stars: ✭ 379 (+0%)
Mutual labels:  jupyter-notebook
Data Analysis
Data Science Using Python
Stars: ✭ 4,080 (+976.52%)
Mutual labels:  jupyter-notebook
Daily Deeplearning
🔥 Tutorials on machine learning, deep learning, Python, algorithm interviews, natural language processing, and "Jianzhi Offer" coding-interview problems
Stars: ✭ 381 (+0.53%)
Mutual labels:  jupyter-notebook
Causalml
The open source repository for the Causal Modeling in Machine Learning Workshop at Altdeep.ai @ www.altdeep.ai/courses/causalML
Stars: ✭ 376 (-0.79%)
Mutual labels:  jupyter-notebook
Augmented Neural Odes
Pytorch implementation of Augmented Neural ODEs 🌻
Stars: ✭ 381 (+0.53%)
Mutual labels:  jupyter-notebook
Scikit Learn Book
Source code for the book "Learning scikit-learn: Machine Learning in Python"
Stars: ✭ 376 (-0.79%)
Mutual labels:  jupyter-notebook
Deep Learning Nano Foundation
Udacity's Deep Learning Nano Foundation program.
Stars: ✭ 377 (-0.53%)
Mutual labels:  jupyter-notebook
Caiman
Computational toolbox for large scale Calcium Imaging Analysis, including movie handling, motion correction, source extraction, spike deconvolution and result visualization.
Stars: ✭ 378 (-0.26%)
Mutual labels:  jupyter-notebook
Iclr2019 Openreviewdata
Script that crawls metadata from the ICLR OpenReview webpage. Tutorials on installing and using Selenium and ChromeDriver on Ubuntu.
Stars: ✭ 376 (-0.79%)
Mutual labels:  jupyter-notebook
Deep Learning In Python
Hands-on, practical knowledge of how to use neural networks and deep learning with Keras 2.0
Stars: ✭ 379 (+0%)
Mutual labels:  jupyter-notebook
Over9000
Over9000 optimizer
Stars: ✭ 375 (-1.06%)
Mutual labels:  jupyter-notebook
Financeops
Research in investment finance with Python Notebooks
Stars: ✭ 378 (-0.26%)
Mutual labels:  jupyter-notebook
Python4datascience.ch
Learn to tackle data science problems with Python, starting from zero. Covers Python fundamentals for complete beginners, an introduction to scientific computing tools, math and computer science basics, and an introduction to statistical learning.
Stars: ✭ 381 (+0.53%)
Mutual labels:  jupyter-notebook
Deep Learning From Scratch
"Deep Learning from Scratch: Theory and Implementation with Python", including source code and a high-resolution PDF (with bookmarks); plus the imooc course "Deep Learning with Neural Networks (CNN/RNN/GAN): Algorithm Principles and Practice"
Stars: ✭ 381 (+0.53%)
Mutual labels:  jupyter-notebook
Deep Reinforcement Learning
Repo for the Deep Reinforcement Learning Nanodegree program
Stars: ✭ 4,012 (+958.58%)
Mutual labels:  jupyter-notebook

powerlaw: A Python Package for Analysis of Heavy-Tailed Distributions
======================================================================

powerlaw is a toolbox using the statistical methods developed in `Clauset et al. 2007 <http://arxiv.org/abs/0706.1062>`_ and `Klaus et al. 2011 <http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0019779>`_ to determine if a probability distribution fits a power law. Academics, please cite as:

Jeff Alstott, Ed Bullmore, Dietmar Plenz. (2014). powerlaw: a Python package for analysis of heavy-tailed distributions. `PLoS ONE 9(1): e85777 <http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0085777>`_

Also available at `arXiv:1305.0215 [physics.data-an] <http://arxiv.org/abs/1305.0215>`_

Basic Usage
-----------

For the simplest, typical use cases, this tells you everything you need to know::

    import numpy as np
    import powerlaw

    data = np.array([1.7, 3.2, ...])  # data can be a list or a numpy array
    results = powerlaw.Fit(data)
    print(results.power_law.alpha)  # the fitted power-law exponent
    print(results.power_law.xmin)   # the estimated lower bound of the power-law regime
    R, p = results.distribution_compare('power_law', 'lognormal')
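
Beyond this, the ``Fit`` object can take a fixed ``xmin`` rather than estimating it, and provides plotting helpers for visual comparison. A minimal sketch (the ``xmin`` value here is arbitrary, and plotting assumes matplotlib is installed)::

    # Fix xmin manually instead of letting powerlaw estimate it
    fit = powerlaw.Fit(data, xmin=1.0)

    # Plot the empirical CCDF and overlay the fitted power law
    ax = fit.plot_ccdf()
    fit.power_law.plot_ccdf(ax=ax, color='r', linestyle='--')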

For more explanation, understanding, and figures, see the paper, which illustrates all of powerlaw's features. For details of the math, see Clauset et al. 2007, which developed these methods.

Quick Links
-----------

`Paper illustrating all of powerlaw's features, with figures <http://arxiv.org/abs/1305.0215>`__

`Code examples from the manuscript, as an IPython Notebook <http://nbviewer.ipython.org/github/jeffalstott/powerlaw/blob/master/manuscript/Manuscript_Code.ipynb>`__ Note: Some results involving lognormals will now be different from the manuscript, as the lognormal fitting has been improved to allow for greater numerical precision.

`Documentation <http://pythonhosted.org/powerlaw/>`__

This code was developed and tested for Python 2.x with the `Enthought Python Distribution <http://www.enthought.com/products/epd.php>`_, and later amended to be compatible with 3.x. The full version of Enthought is `available for free for academic use <http://www.enthought.com/products/edudownload.php>`_.

Installation
------------

powerlaw is hosted on `PyPI <https://pypi.python.org/pypi/powerlaw>`__, so installation is straightforward. The easiest way to install it is to type this at the command line (Linux, Mac, or Windows)::

    easy_install powerlaw

or, better yet::

    pip install powerlaw

easy_install or pip just need to be on your PATH, which for Linux or Mac is probably the case.

pip should install all dependencies automagically. The dependencies are numpy, scipy, and matplotlib, all of which are included in Enthought, Anaconda, and most other scientific Python stacks. To fit truncated power laws or gamma distributions, mpmath is also required; it is less common, but installable with::

    pip install mpmath

The requirement of mpmath will be dropped if/when the scipy functions gamma, gammainc and gammaincc are updated to have sufficient numerical accuracy for negative numbers.
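
With mpmath installed, the truncated power law can be fit and compared like any other candidate distribution. A short sketch, reusing the ``results`` object from Basic Usage::

    # Requires mpmath for the truncated power law's normalization
    R, p = results.distribution_compare('power_law', 'truncated_power_law')
    print(results.truncated_power_law.alpha)   # exponent
    print(results.truncated_power_law.Lambda)  # exponential cutoff rate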

You can also build from source from the code here on Github, though it may be a development version slightly ahead of the PyPI version.

Update Notifications and Mailing List
-------------------------------------

Get notified of updates by joining the `Google Group here <https://groups.google.com/forum/?fromgroups#!forum/powerlaw-updates>`__.

Questions, discussions, and requests for help go on the `Google Group here <https://groups.google.com/forum/?fromgroups#!forum/powerlaw-general>`__. This group also receives update info.

Further Development
-------------------

The original author of powerlaw, Jeff Alstott, is now only writing minor tweaks, but powerlaw remains open for further development by the community. If there's a feature you'd like to see in powerlaw, you can `submit an issue <https://github.com/jeffalstott/powerlaw/issues>`_, but pull requests are even better. Offers for expansion or inclusion in other projects are welcomed and encouraged.

Acknowledgements
----------------

Many thanks to Andreas Klaus, Mika Rubinov and Shan Yu for helpful discussions. Thanks also to `Andreas Klaus <http://neuroscience.nih.gov/Fellows/Fellow.asp?People_ID=2709>`_, `Aaron Clauset, Cosma Shalizi <http://tuvalu.santafe.edu/~aaronc/powerlaws/>`_, and `Adam Ginsburg <http://code.google.com/p/agpy/wiki/PowerLaw>`_ for making their code available. Their implementations were a critical starting point for making powerlaw.

Power Laws vs. Lognormals and powerlaw's 'lognormal_positive' option
---------------------------------------------------------------------

When fitting a power law to a data set, one should compare the goodness of fit to that of a `lognormal distribution <https://en.wikipedia.org/wiki/Lognormal_distribution>`_. This is done because lognormal distributions are another heavy-tailed distribution, but they can be generated by a very simple process: multiplying random positive variables together. The lognormal is thus much like the normal distribution, which can be created by adding random variables together; in fact, the log of a lognormal distribution is a normal distribution (hence the name), and the exponential of a normal distribution is the lognormal (which maybe would be better called an expnormal). In contrast, creating a power law generally requires fancy or exotic generative mechanisms (this is probably why you're looking for a power law to begin with; they're sexy). So, even though the power law has only one parameter (alpha: the slope) and the lognormal has two (mu: the mean of the random variables in the underlying normal and sigma: the standard deviation of the underlying normal distribution), we typically consider the lognormal to be a simpler explanation for observed data, as long as the distribution fits the data just as well. For most data sets, a power law is actually a worse fit than a lognormal distribution, or perhaps equally good, but rarely better. This fact was one of the central empirical results of the paper `Clauset et al. 2007 <http://arxiv.org/abs/0706.1062>`_, which developed the statistical methods that powerlaw implements.
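
To see this comparison in practice, here is a minimal sketch with synthetic data (the parameters and sample size are arbitrary): data built by multiplying positive random variables is lognormal, and the comparison should favor the lognormal::

    import numpy as np
    import powerlaw

    rng = np.random.default_rng(0)
    synthetic = rng.lognormal(mean=0.5, sigma=1.0, size=10000)

    fit = powerlaw.Fit(synthetic)
    # R > 0 favors the first candidate, R < 0 the second;
    # p is the significance of that sign
    R, p = fit.distribution_compare('power_law', 'lognormal')
    print(R, p)  # expect R < 0: the lognormal fits better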

However, for many data sets, the superior lognormal fit is only possible if one allows the fitted parameter mu to go negative. Whether or not this is sensible depends on your theory of what's generating the data. If the data is thought to be generated by multiplying random positive variables, mu is just the log of the distribution's median; a negative mu just indicates those variables' products are typically below 1. However, if the data is thought to be generated by exponentiating a normal distribution, then mu is interpreted as the median of the underlying normal data. In that case, the normal data is likely generated by summing random variables (positive and negative), and mu is those sums' median (and mean). A negative mu, then, indicates that the random variables are typically negative. For some physical systems, this is perfectly possible. For the data you're studying, though, it may be a weird assumption. For starters, all of the data points you're fitting to are positive by definition, since power laws must have positive values (indeed, powerlaw throws out 0s or negative values). Why would those data be generated by a process that sums and exponentiates negative variables?
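
You can check whether your fit relies on a negative mu by inspecting the fitted lognormal parameters directly (a sketch; ``results`` is the ``powerlaw.Fit`` object from Basic Usage)::

    print(results.lognormal.mu, results.lognormal.sigma)
    # mu < 0 means: under the multiplicative story, the typical product is
    # below 1; under the exponentiated-normal story, the underlying summed
    # variables are typically negative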

If you think that your physical system could be modeled by summing and exponentiating random variables, but you think that those random variables should be positive, one possible hack is powerlaw's lognormal_positive option. This is just a regular lognormal distribution, except mu must be positive. Note that this does not force the underlying normal distribution to be the sum of only positive variables; it only forces the sums' average to be positive, but it's a start. You can compare a power law to this distribution in the normal way shown above::

    R, p = results.distribution_compare('power_law', 'lognormal_positive')

You may find that a lognormal where mu must be positive gives a much worse fit to your data, and that leaves the power law looking like the best explanation of the data. Before concluding that the data is in fact power law distributed, consider carefully whether a more likely explanation is that the data was generated by multiplying positive random variables, or even by summing and exponentiating random variables; either one would allow for a lognormal with an intelligible negative value of mu.
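
Concretely, it can be informative to run both comparisons side by side (a sketch, again reusing the ``results`` object from above)::

    R_ln, p_ln = results.distribution_compare('power_law', 'lognormal')
    R_lnp, p_lnp = results.distribution_compare('power_law', 'lognormal_positive')
    # The power law may beat lognormal_positive (R_lnp > 0) while still
    # losing to the unconstrained lognormal (R_ln < 0); interpret with care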
