raphaelvallat / Pingouin

Licence: gpl-3.0
Statistical package in Python based on Pandas

.. -*- mode: rst -*-

|

.. image:: https://badge.fury.io/py/pingouin.svg
    :target: https://badge.fury.io/py/pingouin

.. image:: https://img.shields.io/conda/vn/conda-forge/pingouin.svg
    :target: https://anaconda.org/conda-forge/pingouin

.. image:: https://img.shields.io/github/license/raphaelvallat/pingouin.svg
    :target: https://github.com/raphaelvallat/pingouin/blob/master/LICENSE

.. image:: https://travis-ci.org/raphaelvallat/pingouin.svg?branch=master
    :target: https://travis-ci.org/raphaelvallat/pingouin

.. image:: https://codecov.io/gh/raphaelvallat/pingouin/branch/master/graph/badge.svg
    :target: https://codecov.io/gh/raphaelvallat/pingouin

.. image:: https://pepy.tech/badge/pingouin/month
    :target: https://pepy.tech/badge/pingouin/month

.. image:: http://joss.theoj.org/papers/d2254e6d8e8478da192148e4cfbe4244/status.svg
    :target: http://joss.theoj.org/papers/d2254e6d8e8478da192148e4cfbe4244

.. image:: https://badges.gitter.im/owner/repo.png
    :target: https://gitter.im/pingouin-stats/Lobby


.. figure:: https://github.com/raphaelvallat/pingouin/blob/master/docs/pictures/logo_pingouin.png
   :align: center

Pingouin is an open-source statistical package written in Python 3 and based mostly on Pandas and NumPy. Some of its main features are listed below. For a full list of available functions, please refer to the `API documentation <https://pingouin-stats.org/api.html>`_.

  1. ANOVAs: N-way, repeated measures, mixed, ANCOVA

  2. Pairwise post-hoc tests (parametric and non-parametric) and pairwise correlations

  3. Robust, partial, distance and repeated measures correlations

  4. Linear/logistic regression and mediation analysis

  5. Bayes Factors

  6. Multivariate tests

  7. Reliability and consistency

  8. Effect sizes and power analysis

  9. Parametric/bootstrapped confidence intervals around an effect size or a correlation coefficient

  10. Circular statistics

  11. Chi-squared tests

  12. Plotting: Bland-Altman plot, Q-Q plot, paired plot, robust correlation...

Pingouin is designed for users who want simple yet exhaustive statistical functions.

For example, the :code:`ttest_ind` function of SciPy returns only the T-value and the p-value. By contrast, the :code:`ttest` function of Pingouin returns the T-value, the p-value, the degrees of freedom, the effect size (Cohen's d), the 95% confidence intervals of the difference in means, the statistical power and the Bayes Factor (BF10) of the test.

Documentation

  • `Link to documentation <https://pingouin-stats.org/index.html>`_

Chat

If you have questions, please ask them in the public `Gitter chat <https://gitter.im/pingouin-stats/Lobby>`_.

.. image:: https://badges.gitter.im/owner/repo.png
    :target: https://gitter.im/pingouin-stats/Lobby

Installation

Dependencies

The main dependencies of Pingouin are:

  • `NumPy <https://numpy.org/>`_ (>= 1.16.5)
  • `SciPy <https://www.scipy.org/>`_ (>= 1.3.0)
  • `Pandas <https://pandas.pydata.org/>`_ (>= 0.24)
  • `Pandas-flavor <https://github.com/Zsailer/pandas_flavor>`_ (>= 0.1.2)
  • `Matplotlib <https://matplotlib.org/>`_ (>= 3.0.2)
  • `Seaborn <https://seaborn.pydata.org/>`_ (>= 0.9.0)
  • `Outdated <https://github.com/alexmojaki/outdated>`_

In addition, some functions require:

  • `Statsmodels <https://www.statsmodels.org/>`_
  • `Scikit-learn <https://scikit-learn.org/>`_
  • `Mpmath <http://mpmath.org/>`_

Pingouin is a Python 3 package and is currently tested for Python 3.6 and 3.7. Pingouin does not work with Python 2.7.

User installation

Pingouin can be easily installed using pip

.. code-block:: shell

    pip install pingouin

or conda

.. code-block:: shell

    conda install -c conda-forge pingouin

New releases are frequent so always make sure that you have the latest version:

.. code-block:: shell

    pip install --upgrade pingouin
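
To see which version is currently installed before upgrading, a minimal sketch using only the standard library could look like the following. The helper name ``installed_version`` is hypothetical and not part of Pingouin, and the sketch assumes Python 3.8 or later, where ``importlib.metadata`` is available.

.. code-block:: python

    from importlib import metadata


    def installed_version(package):
        """Return the installed version string of a package, or None if absent."""
        try:
            return metadata.version(package)
        except metadata.PackageNotFoundError:
            return None


    # Prints a version string, or None when Pingouin is not installed
    print(installed_version("pingouin"))
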

Quick start

Click on the link below and navigate to the notebooks/ folder to run a collection of interactive Jupyter notebooks showing the main functionalities of Pingouin. There is no need to install Pingouin beforehand; the notebooks run in a Binder environment.

.. image:: https://mybinder.org/badge.svg
    :target: https://mybinder.org/v2/gh/raphaelvallat/pingouin/develop

10 minutes to Pingouin

1. T-test
#########

.. code-block:: python

    import numpy as np
    import pingouin as pg

    np.random.seed(123)
    mean, cov, n = [4, 5], [(1, .6), (.6, 1)], 30
    x, y = np.random.multivariate_normal(mean, cov, n).T

    # T-test
    pg.ttest(x, y)

.. table:: Output
   :widths: auto

   ======  =====  =========  =======  =============  =========  ======  =======
   T       dof    tail       p-val    CI95%          cohen-d    BF10    power
   ======  =====  =========  =======  =============  =========  ======  =======
   -3.401  58     two-sided  0.001    [-1.68 -0.43]  0.878      26.155  0.917
   ======  =====  =========  =======  =============  =========  ======  =======
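
The ``T``, ``dof`` and ``cohen-d`` columns can be reproduced by hand for two independent samples. The sketch below is a minimal illustration using only the standard library and assuming equal variances (Student's t-test with a pooled standard deviation). ``ttest_by_hand`` is a hypothetical helper, not a Pingouin function, and it does not compute the p-value, confidence interval, Bayes Factor or power.

.. code-block:: python

    import math
    import statistics


    def ttest_by_hand(x, y):
        """Student's t-test for two independent samples (equal variances assumed)."""
        nx, ny = len(x), len(y)
        dof = nx + ny - 2
        # Pooled variance built from the two unbiased sample variances
        pooled_var = ((nx - 1) * statistics.variance(x)
                      + (ny - 1) * statistics.variance(y)) / dof
        diff = statistics.mean(x) - statistics.mean(y)
        t = diff / math.sqrt(pooled_var * (1 / nx + 1 / ny))
        d = diff / math.sqrt(pooled_var)  # Cohen's d, signed
        return t, dof, d


    # Hypothetical toy samples, unrelated to the simulated data above
    t, dof, d = ttest_by_hand([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
    print(round(t, 3), dof, round(d, 3))

The same relation links the columns of the table above: with 30 observations per group, ``3.401 * sqrt(1/30 + 1/30)`` is approximately 0.878, the magnitude reported in ``cohen-d``.
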


2. Pearson's correlation
########################

.. code-block:: python

    pg.corr(x, y)

.. table:: Output
   :widths: auto

   ===  =====  ===========  =====  ========  =======  ======  ======
   n    r      CI95%        r2     adj_r2    p-val    BF10    power
   ===  =====  ===========  =====  ========  =======  ======  ======
   30   0.595  [0.3 0.79]   0.354  0.306     0.001    69.723  0.95
   ===  =====  ===========  =====  ========  =======  ======  ======
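
For readers who want to see where ``r`` and ``CI95%`` come from, the following sketch computes a Pearson coefficient and a confidence interval through the Fisher z-transformation, a common textbook approach. It uses only the standard library (``statistics.NormalDist`` needs Python 3.8 or later); ``pearson_r`` and ``fisher_ci`` are hypothetical helper names, and the sketch is not claimed to mirror Pingouin's internal implementation.

.. code-block:: python

    import math
    from statistics import NormalDist


    def pearson_r(x, y):
        """Pearson correlation coefficient of two equal-length sequences."""
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        return sxy / math.sqrt(sxx * syy)


    def fisher_ci(r, n, confidence=0.95):
        """Confidence interval for r via the Fisher z-transformation."""
        z = math.atanh(r)
        se = 1 / math.sqrt(n - 3)
        crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
        return math.tanh(z - crit * se), math.tanh(z + crit * se)


    # Hypothetical toy data
    r = pearson_r([1, 2, 3, 4], [2, 4, 6, 9])
    print(round(r, 4))

    # Interval for the r and n reported in the table above
    lo, hi = fisher_ci(0.595, 30)
    print(round(lo, 2), round(hi, 2))

With ``r = 0.595`` and ``n = 30`` the interval rounds to 0.3 and 0.79, which agrees with the ``CI95%`` column shown above.
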


3. Robust correlation
#####################

.. code-block:: python

    # Introduce an outlier
    x[5] = 18
    # Use the robust Shepherd's pi correlation
    pg.corr(x, y, method="shepherd")

.. table:: Output
   :widths: auto

   ===  ==========  =====  ===========  =====  ========  =======  =======
   n    outliers    r      CI95%        r2     adj_r2    p-val    power
   ===  ==========  =====  ===========  =====  ========  =======  =======
   30   1           0.561  [0.25 0.77]  0.315  0.264     0.002    0.917
   ===  ==========  =====  ===========  =====  ========  =======  =======


4. Test the normality of the data
#################################

The :code:`pingouin.normality` function works with lists, arrays, or pandas DataFrames in wide or long format.

.. code-block:: python

    print(pg.normality(x))                                    # Univariate normality
    print(pg.multivariate_normality(np.column_stack((x, y)))) # Multivariate normality

.. table:: Output
   :widths: auto

   =====  ======  ========
   W      pval    normal
   =====  ======  ========
   0.615  0.000   False
   =====  ======  ========

.. parsed-literal::

    (False, 0.00018)


5. One-way ANOVA using a pandas DataFrame
#########################################

.. code-block:: python

    # Read an example dataset
    df = pg.read_dataset('mixed_anova')

    # Run the ANOVA
    aov = pg.anova(data=df, dv='Scores', between='Group', detailed=True)
    print(aov)

.. table:: Output
   :widths: auto

   ========  =======  ====  =====  =======  =======  =======
   Source    SS       DF    MS     F        p-unc    np2
   ========  =======  ====  =====  =======  =======  =======
   Group     5.460    1     5.460  5.244    0.023    0.029
   Within    185.343  178   1.041  nan      nan      nan
   ========  =======  ====  =====  =======  =======  =======
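
The columns of this table are tied together by simple arithmetic: ``MS = SS / DF``, ``F`` is the ratio of the between-group to the within-group mean square, and ``np2`` (partial eta-squared) is the between-group share of the total sum of squares in a one-way design. The sketch below illustrates this with plain Python; ``one_way_anova`` is a hypothetical helper operating on a list of groups, not a Pingouin function, and it leaves out the p-value.

.. code-block:: python

    def one_way_anova(groups):
        """Return (ss_between, ss_within, df_between, df_within, F, np2)."""
        all_values = [v for g in groups for v in g]
        grand_mean = sum(all_values) / len(all_values)
        ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
        ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
        df_between = len(groups) - 1
        df_within = len(all_values) - len(groups)
        f_value = (ss_between / df_between) / (ss_within / df_within)
        np2 = ss_between / (ss_between + ss_within)
        return ss_between, ss_within, df_between, df_within, f_value, np2


    # Hypothetical toy data: three groups of three observations
    result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
    print(result)

    # Recompute F and np2 from the SS and DF columns of the table above
    f_table = (5.460 / 1) / (185.343 / 178)
    np2_table = 5.460 / (5.460 + 185.343)
    print(round(f_table, 3), round(np2_table, 3))

The last line gives 5.244 and 0.029, the ``F`` and ``np2`` values listed for ``Group``.
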


6. Repeated measures ANOVA
##########################

.. code-block:: python

    pg.rm_anova(data=df, dv='Scores', within='Time', subject='Subject', detailed=True)

.. table:: Output
   :widths: auto

   ========  =======  ====  =====  =======  =======  =======  =======
   Source    SS       DF    MS     F        p-unc    np2      eps
   ========  =======  ====  =====  =======  =======  =======  =======
   Time      7.628    2     3.814  3.913    0.023    0.062    0.999
   Error     115.027  118   0.975  nan      nan      nan      nan
   ========  =======  ====  =====  =======  =======  =======  =======


7. Post-hoc tests corrected for multiple-comparisons
####################################################

.. code-block:: python

    # FDR-corrected post hocs with Hedges'g effect size
    posthoc = pg.pairwise_ttests(data=df, dv='Scores', within='Time', subject='Subject',
                                 parametric=True, padjust='fdr_bh', effsize='hedges')

    # Pretty printing of table
    pg.print_table(posthoc, floatfmt='.3f')

.. table:: Output
   :widths: auto

   ==========  =======  =======  ========  ============  ======  ======  =========  =======  ========  ==========  ======  ========
   Contrast    A        B        Paired    Parametric    T       dof     Tail       p-unc    p-corr    p-adjust    BF10    hedges
   ==========  =======  =======  ========  ============  ======  ======  =========  =======  ========  ==========  ======  ========
   Time        August   January  True      True          -1.740  59.000  two-sided  0.087    0.131     fdr_bh      0.582   -0.328
   Time        August   June     True      True          -2.743  59.000  two-sided  0.008    0.024     fdr_bh      4.232   -0.485
   Time        January  June     True      True          -1.024  59.000  two-sided  0.310    0.310     fdr_bh      0.232   -0.170
   ==========  =======  =======  ========  ============  ======  ======  =========  =======  ========  ==========  ======  ========
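
The ``p-corr`` column results from applying the Benjamini-Hochberg false discovery rate procedure (``fdr_bh``) to the ``p-unc`` column. As a minimal sketch of that procedure in plain Python (``fdr_bh`` here is a hypothetical standalone function written for illustration, not Pingouin's own code):

.. code-block:: python

    def fdr_bh(pvals):
        """Benjamini-Hochberg adjusted p-values, returned in the input order."""
        m = len(pvals)
        order = sorted(range(m), key=lambda i: pvals[i])
        adjusted = [0.0] * m
        running_min = 1.0
        # Walk from the largest p-value down so that monotonicity can be enforced
        for rank in range(m, 0, -1):
            idx = order[rank - 1]
            running_min = min(running_min, pvals[idx] * m / rank)
            adjusted[idx] = running_min
        return adjusted


    # Uncorrected p-values (p-unc) of the three contrasts above
    adjusted = fdr_bh([0.087, 0.008, 0.310])
    print([round(p, 4) for p in adjusted])

The adjusted values come out at roughly 0.13, 0.024 and 0.31, in line with the ``p-corr`` column. Small differences in the third decimal are expected because the inputs used here are already rounded.
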


8. Two-way mixed ANOVA
######################

.. code-block:: python

    # Compute the two-way mixed ANOVA
    aov = pg.mixed_anova(data=df, dv='Scores', between='Group', within='Time',
                         subject='Subject', correction=False, effsize="np2")
    pg.print_table(aov)

.. table:: Output
   :widths: auto

   ===========  =====  =====  =====  =====  =====  =======  =====  =======
   Source       SS     DF1    DF2    MS     F      p-unc    np2    eps
   ===========  =====  =====  =====  =====  =====  =======  =====  =======
   Group        5.460  1      58     5.460  5.052  0.028    0.080  nan
   Time         7.628  2      116    3.814  4.027  0.020    0.065  0.999
   Interaction  5.167  2      116    2.584  2.728  0.070    0.045  nan
   ===========  =====  =====  =====  =====  =====  =======  =====  =======


9. Pairwise correlations between columns of a dataframe
#######################################################

.. code-block:: python

    import pandas as pd
    np.random.seed(123)
    z = np.random.normal(5, 1, 30)
    data = pd.DataFrame({'X': x, 'Y': y, 'Z': z})
    pg.pairwise_corr(data, columns=['X', 'Y', 'Z'], method='pearson')

.. table:: Output
   :widths: auto

   ===  ===  ========  =========  ===  =====  =============  =====  ========  =====  =======  ======  =======
   X    Y    method    tail       n    r      CI95%          r2     adj_r2    z      p-unc    BF10    power
   ===  ===  ========  =========  ===  =====  =============  =====  ========  =====  =======  ======  =======
   X    Y    pearson   two-sided  30   0.366  [0.01 0.64]    0.134  0.070     0.384  0.047    1.500   0.525
   X    Z    pearson   two-sided  30   0.251  [-0.12 0.56]   0.063  -0.006    0.257  0.181    0.534   0.272
   Y    Z    pearson   two-sided  30   0.020  [-0.34 0.38]   0.000  -0.074    0.020  0.916    0.228   0.051
   ===  ===  ========  =========  ===  =====  =============  =====  ========  =====  =======  ======  =======

10. Convert between effect sizes
################################

.. code-block:: python

    # Convert from Cohen's d to Hedges' g
    pg.convert_effsize(0.4, 'cohen', 'hedges', nx=10, ny=12)

.. parsed-literal::

    0.384

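
Hedges' g is usually described as Cohen's d multiplied by a small-sample bias correction. The sketch below applies one widely used approximation of that correction, ``1 - 3 / (4 * (nx + ny) - 9)``. The function ``cohen_to_hedges`` is a hypothetical helper, and the choice of this particular approximation is an assumption rather than a statement about Pingouin's internals.

.. code-block:: python

    def cohen_to_hedges(d, nx, ny):
        """Apply the approximate small-sample bias correction to Cohen's d."""
        return d * (1 - 3 / (4 * (nx + ny) - 9))


    g = cohen_to_hedges(0.4, nx=10, ny=12)
    print(round(g, 4))

This gives about 0.385, close to the 0.384 shown above. The small difference in the third decimal may come from rounding or from a slightly different correction factor.
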

11. Multiple linear regression
##############################

.. code-block:: python

    pg.linear_regression(data[['X', 'Z']], data['Y'])

.. table:: Linear regression summary
   :widths: auto

   =========  ======  =====  ======  ======  =====  ========  ==========  ===========
   names      coef    se     T       pval    r2     adj_r2    CI[2.5%]    CI[97.5%]
   =========  ======  =====  ======  ======  =====  ========  ==========  ===========
   Intercept  4.650   0.841  5.530   0.000   0.139  0.076     2.925       6.376
   X          0.143   0.068  2.089   0.046   0.139  0.076     0.003       0.283
   Z          -0.069  0.167  -0.416  0.681   0.139  0.076     -0.412      0.273
   =========  ======  =====  ======  ======  =====  ========  ==========  ===========

12. Mediation analysis
######################

.. code-block:: python

    pg.mediation_analysis(data=data, x='X', m='Z', y='Y', seed=42, n_boot=1000)

.. table:: Mediation summary
   :widths: auto

   ========  ======  =====  ======  ==========  ===========  =====
   path      coef    se     pval    CI[2.5%]    CI[97.5%]    sig
   ========  ======  =====  ======  ==========  ===========  =====
   Z ~ X     0.103   0.075  0.181   -0.051      0.256        No
   Y ~ Z     0.018   0.171  0.916   -0.332      0.369        No
   Total     0.136   0.065  0.047   0.002       0.269        Yes
   Direct    0.143   0.068  0.046   0.003       0.283        Yes
   Indirect  -0.007  0.025  0.898   -0.069      0.029        No
   ========  ======  =====  ======  ==========  ===========  =====

13. Contingency analysis
########################

.. code-block:: python

    data = pg.read_dataset('chi2_independence')
    expected, observed, stats = pg.chi2_independence(data, x='sex', y='target')
    stats

.. table:: Chi-squared tests summary
   :widths: auto

   ==================  ========  ======  =====  =====  ========  =======
   test                lambda    chi2    dof    p      cramer    power
   ==================  ========  ======  =====  =====  ========  =======
   pearson             1.000     22.717  1.000  0.000  0.274     0.997
   cressie-read        0.667     22.931  1.000  0.000  0.275     0.998
   log-likelihood      0.000     23.557  1.000  0.000  0.279     0.998
   freeman-tukey       -0.500    24.220  1.000  0.000  0.283     0.998
   mod-log-likelihood  -1.000    25.071  1.000  0.000  0.288     0.999
   neyman              -2.000    27.458  1.000  0.000  0.301     0.999
   ==================  ========  ======  =====  =====  ========  =======
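
To make the ``pearson`` row more concrete, the following sketch computes the Pearson chi-squared statistic, its degrees of freedom and Cramér's V from a table of observed counts. It uses only the standard library and applies no continuity correction, so for a 2 x 2 table its result can differ from a corrected test. ``chi2_pearson`` and the example counts are hypothetical and are not taken from the ``chi2_independence`` dataset.

.. code-block:: python

    import math


    def chi2_pearson(observed):
        """Pearson chi-squared statistic, dof and Cramer's V for a 2-D table."""
        row_totals = [sum(row) for row in observed]
        col_totals = [sum(col) for col in zip(*observed)]
        n = sum(row_totals)
        chi2 = 0.0
        for i, row in enumerate(observed):
            for j, obs in enumerate(row):
                # Expected count under independence of rows and columns
                expected = row_totals[i] * col_totals[j] / n
                chi2 += (obs - expected) ** 2 / expected
        n_rows, n_cols = len(row_totals), len(col_totals)
        dof = (n_rows - 1) * (n_cols - 1)
        cramer_v = math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))
        return chi2, dof, cramer_v


    # Hypothetical 2 x 2 table of counts
    chi2, dof, cramer_v = chi2_pearson([[10, 20], [30, 40]])
    print(round(chi2, 4), dof, round(cramer_v, 4))
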

Integration with Pandas

Several functions of Pingouin can be used directly as pandas DataFrame methods. Try for yourself with the code below:

.. code-block:: python

    import pingouin as pg

    # Example 1 | ANOVA
    df = pg.read_dataset('mixed_anova')
    df.anova(dv='Scores', between='Group', detailed=True)

    # Example 2 | Pairwise correlations
    data = pg.read_dataset('mediation')
    data.pairwise_corr(columns=['X', 'M', 'Y'], covar=['Mbin'])

    # Example 3 | Partial correlation matrix
    data.pcorr()

The functions that are currently supported as pandas methods are:

  • `pingouin.anova <https://pingouin-stats.org/generated/pingouin.anova.html#pingouin.anova>`_
  • `pingouin.ancova <https://pingouin-stats.org/generated/pingouin.ancova.html#pingouin.ancova>`_
  • `pingouin.rm_anova <https://pingouin-stats.org/generated/pingouin.rm_anova.html#pingouin.rm_anova>`_
  • `pingouin.mixed_anova <https://pingouin-stats.org/generated/pingouin.mixed_anova.html#pingouin.mixed_anova>`_
  • `pingouin.welch_anova <https://pingouin-stats.org/generated/pingouin.welch_anova.html#pingouin.welch_anova>`_
  • `pingouin.pairwise_ttests <https://pingouin-stats.org/generated/pingouin.pairwise_ttests.html#pingouin.pairwise_ttests>`_
  • `pingouin.pairwise_tukey <https://pingouin-stats.org/generated/pingouin.pairwise_tukey.html#pingouin.pairwise_tukey>`_
  • `pingouin.pairwise_corr <https://pingouin-stats.org/generated/pingouin.pairwise_corr.html#pingouin.pairwise_corr>`_
  • `pingouin.partial_corr <https://pingouin-stats.org/generated/pingouin.partial_corr.html#pingouin.partial_corr>`_
  • `pingouin.pcorr <https://pingouin-stats.org/generated/pingouin.pcorr.html#pingouin.pcorr>`_
  • `pingouin.rcorr <https://pingouin-stats.org/generated/pingouin.rcorr.html#pingouin.rcorr>`_
  • `pingouin.mediation_analysis <https://pingouin-stats.org/generated/pingouin.mediation_analysis.html#pingouin.mediation_analysis>`_
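
The general idea of exposing a plain function as a method can be illustrated without pandas. The sketch below uses a hypothetical ``ToyFrame`` class and a hypothetical ``register_method`` decorator. It only shows the mechanism in miniature and is not how Pingouin or Pandas-flavor is actually implemented.

.. code-block:: python

    class ToyFrame:
        """Hypothetical stand-in for a DataFrame: a dict of equal-length columns."""

        def __init__(self, columns):
            self.columns = columns


    def register_method(func):
        """Attach ``func`` to ToyFrame so it can be called as ``frame.func(...)``."""
        setattr(ToyFrame, func.__name__, func)
        return func


    @register_method
    def column_means(frame):
        """Mean of every column, returned as a dict."""
        return {name: sum(vals) / len(vals) for name, vals in frame.columns.items()}


    frame = ToyFrame({"X": [1, 2, 3], "Y": [4, 6, 8]})
    print(frame.column_means())   # method-style call
    print(column_means(frame))    # equivalent function-style call

Both calls return the same dictionary, which mirrors how ``df.anova(...)`` and ``pg.anova(data=df, ...)`` are two ways of invoking the same function in the examples above.
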

Development

Pingouin was created and is maintained by `Raphael Vallat <https://raphaelvallat.github.io>`_, mostly during his spare time. Contributions are more than welcome so feel free to contact me, open an issue or submit a pull request!

To see the code or report a bug, please visit the `GitHub repository <https://github.com/raphaelvallat/pingouin>`_.

Note that this program is provided with NO WARRANTY OF ANY KIND. If you can, always double-check the results with another statistical software package.

Contributors

  • Nicolas Legrand
  • `Richard Höchenberger <http://hoechenberger.net/>`_
  • `Arthur Paulino <https://github.com/arthurpaulino>`_
  • `Eelke Spaak <https://eelkespaak.nl/>`_
  • `Johannes Elfner <https://www.linkedin.com/in/johannes-elfner/>`_
  • `Stefan Appelhoff <https://stefanappelhoff.com>`_

How to cite Pingouin?

If you want to cite Pingouin, please use the publication in JOSS:

  • Vallat, R. (2018). Pingouin: statistics in Python. Journal of Open Source Software, 3(31), 1026, `https://doi.org/10.21105/joss.01026 <https://doi.org/10.21105/joss.01026>`_

Acknowledgement

Several functions of Pingouin were inspired by R or Matlab toolboxes, including:

  • `effsize package (R) <https://cran.r-project.org/web/packages/effsize/effsize.pdf>`_
  • `ezANOVA package (R) <https://cran.r-project.org/web/packages/ez/ez.pdf>`_
  • `pwr package (R) <https://cran.r-project.org/web/packages/pwr/pwr.pdf>`_
  • `circular statistics (Matlab) <https://www.mathworks.com/matlabcentral/fileexchange/10676-circular-statistics-toolbox-directional-statistics>`_
  • `robust correlations (Matlab) <https://sourceforge.net/projects/robustcorrtool/>`_
  • `repeated-measure correlation (R) <https://cran.r-project.org/web/packages/rmcorr/index.html>`_
  • `real-statistics.com <https://www.real-statistics.com/>`_