mlgill / Pdlsr

Licence: bsd-3-clause
Pandas-aware non-linear least squares regression using Lmfit

pdLSR: Pandas-aware least squares regression

Overview

pdLSR is a library for performing least squares regression. It attempts to seamlessly incorporate this task in a Pandas-focused workflow. Input data are expected in dataframes, and multiple regressions can be performed using functionality similar to Pandas groupby. Results are returned as grouped dataframes and include best-fit parameters, statistics, residuals, and more. The results can be easily visualized using seaborn.

pdLSR currently utilizes lmfit, a flexible and powerful library for least squares minimization, which in turn makes use of scipy.optimize.leastsq. I began using lmfit because it is one of the few libraries that support non-linear least squares regression, which is commonly used in the natural sciences. I also like the flexibility it offers for testing different modeling scenarios and the variety of assessment statistics it provides. However, I found myself writing many for loops to perform regressions on groups of data and aggregate the resulting output. Simplification of this task was my inspiration for writing pdLSR.
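To make the motivation concrete, here is a minimal sketch of the kind of hand-written loop described above: one non-linear fit per group, with the results collected into a dataframe. It is not pdLSR's API. To keep dependencies small it calls scipy.optimize.leastsq directly rather than lmfit, and the model function, column names, and synthetic data are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy.optimize import leastsq


def exp_decay(params, x):
    """Hypothetical model: y = amplitude * exp(-rate * x)."""
    amplitude, rate = params
    return amplitude * np.exp(-rate * x)


def residuals(params, x, y):
    """Difference between observed and modeled values, as leastsq expects."""
    return y - exp_decay(params, x)


# Synthetic, noise-free data for two groups (illustrative values only).
x = np.linspace(0.0, 5.0, 20)
data = pd.concat(
    [
        pd.DataFrame({"sample": "a", "x": x, "y": exp_decay((2.0, 0.5), x)}),
        pd.DataFrame({"sample": "b", "x": x, "y": exp_decay((3.0, 1.5), x)}),
    ],
    ignore_index=True,
)

# The manual pattern: loop over groups, fit each one, collect the output.
rows = []
for name, group in data.groupby("sample"):
    best, ier = leastsq(
        residuals,
        x0=[1.0, 1.0],  # same starting guess for every group
        args=(group["x"].values, group["y"].values),
    )
    rows.append({"sample": name, "amplitude": best[0], "rate": best[1]})

# Aggregate the per-group results into a single dataframe.
fit_table = pd.DataFrame(rows).set_index("sample")
print(fit_table)
```

Even in this small case, the loop and the bookkeeping around it outweigh the fit itself, and adding statistics or residuals per group would make it longer still.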

pdLSR is related to libraries such as statsmodels and scikit-learn that provide linear regression functions that operate on dataframes. However, these libraries don't support grouping operations on dataframes and don't aggregate output into dataframes. Support for statsmodels and scikit-learn is being considered for the future, and pull requests adding this functionality would be welcome.

Some additional 'niceties' associated with the input of parameters and equations have also been incorporated. pdLSR also utilizes multithreading for the calculation of confidence intervals, as this process is time-consuming when there are more than a few groups.
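The general idea of spreading per-group work across workers can be sketched as follows. This is not how pdLSR is implemented. As stated assumptions, the sketch uses a standard-library thread pool as a stand-in for multiprocess, and a simple straight-line fit with numpy.polyfit as a stand-in for the confidence interval calculation. The function name fit_line, the column names, and the data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd


def fit_line(item):
    """Fit y = slope * x + intercept for one (name, group) pair."""
    name, group = item
    slope, intercept = np.polyfit(group["x"].values, group["y"].values, deg=1)
    return {"group": name, "slope": slope, "intercept": intercept}


# Three synthetic groups that each lie exactly on a line (illustrative only).
x = np.arange(10.0)
data = pd.DataFrame(
    {
        "group": np.repeat(["a", "b", "c"], len(x)),
        "x": np.tile(x, 3),
        "y": np.concatenate([1.0 * x + 0.0, 2.0 * x + 1.0, -0.5 * x + 4.0]),
    }
)

# Iterating over a groupby yields (name, sub-dataframe) pairs; pool.map
# hands each pair to a worker and returns the results in input order.
with ThreadPoolExecutor(max_workers=3) as pool:
    rows = list(pool.map(fit_line, data.groupby("group")))

summary = pd.DataFrame(rows).set_index("group")
print(summary)
```

Because each group is processed independently, the same fan-out pattern applies whether the per-group step is a quick fit or a slower calculation.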

Setup

Dependencies

The following libraries are required for pdLSR:

  • numpy
  • pandas
  • lmfit
  • multiprocess

multiprocess is a fork of Python's multiprocessing library that provides more robust multithreading (it serializes objects with dill instead of the standard pickle module). I found that this library is required for multithreading to work with pdLSR. Both multiprocess and lmfit will install automatically from pip or conda (see below).

For plotting, matplotlib is required and seaborn is recommended.

pdLSR works with Python 2 and 3.

Installation and Demo

Binder

The preferred method for installing pdLSR and all of its dependencies is to use the conda or pip package managers.

  • For conda: conda install -c mlgill pdlsr (note the lowercase name; conda seems to require lowercase names for packages)
  • For pip: pip install pdLSR

However, it can also be installed manually by cloning the repo into a directory on your PYTHONPATH.

There is a demo notebook that can be executed locally or live from GitHub using mybinder.org. After clicking the badge at the top of this section, navigate to pdLSR --> demo --> pdLSR_demo.ipynb and everything should be set up to execute the demo in a browser. No installation required!

Documentation

The functions of pdLSR are documented within the code, but currently the best single source for using pdLSR is the demo notebook. Developing stand-alone documentation is a future goal.