pyglmnet
========

A Python implementation of elastic-net regularized generalized linear models


|License| |Travis| |Codecov| |Circle| |Gitter| |DOI| |JOSS|

`[Documentation (stable version)]`_ `[Documentation (development version)]`_

.. image:: https://user-images.githubusercontent.com/15852194/67919367-70482600-fb76-11e9-9b86-891969bd2bee.jpg

-  Pyglmnet provides a wide range of noise models (and paired canonical
   link functions): ``'gaussian'``, ``'binomial'``, ``'probit'``,
   ``'gamma'``, ``'poisson'``, and ``'softplus'``.

-  It supports a wide range of regularizers: ridge, lasso, elastic net,
   `group
   lasso <https://en.wikipedia.org/wiki/Proximal_gradient_methods_for_learning#Group_lasso>`__,
   and `Tikhonov
   regularization <https://en.wikipedia.org/wiki/Tikhonov_regularization>`__.

-  We have implemented a cyclical coordinate descent optimizer with
   Newton updates, active sets, update caching, and warm restarts. This
   optimization approach is identical to the one used in the R glmnet
   package.

-  A number of Python wrappers exist for the R glmnet package (e.g.
   `here <https://github.com/civisanalytics/python-glmnet>`__ and
   `here <https://github.com/dwf/glmnet-python>`__) but in contrast to
   these, Pyglmnet is a pure Python implementation. Therefore, it is
   easy to modify and introduce additional noise models and regularizers
   in the future.
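
To make the optimizer description above more concrete, here is a minimal
NumPy sketch of cyclical coordinate descent for the Gaussian (linear)
elastic-net problem. It is an illustration only and is not pyglmnet's
code: the function names ``soft_threshold`` and ``elastic_net_cd`` are
hypothetical, the objective is assumed to follow the glmnet-style
convention ``1/(2n) * ||y - b0 - X b||^2 + reg_lambda * (0.5 * (1 - alpha)
* ||b||_2^2 + alpha * ||b||_1)``, and Newton updates for non-Gaussian
models, active sets, and warm restarts are left out.

.. code:: python

    import numpy as np


    def soft_threshold(z, t):
        """Shrink z toward zero by t; values with |z| <= t become zero."""
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


    def elastic_net_cd(X, y, reg_lambda=0.1, alpha=0.5, n_iter=200, tol=1e-8):
        """Cyclical coordinate descent for Gaussian elastic net (sketch)."""
        n_samples, n_features = X.shape
        beta0 = y.mean()
        beta = np.zeros(n_features)
        # mean squared value of each column, reused in every update
        col_sq = (X ** 2).sum(axis=0) / n_samples
        # cached residual y - beta0 - X @ beta, updated incrementally
        resid = y - beta0
        for _ in range(n_iter):
            max_change = 0.0
            for j in range(n_features):
                old = beta[j]
                # correlation of feature j with its partial residual
                z = X[:, j] @ resid / n_samples + col_sq[j] * old
                new = (soft_threshold(z, reg_lambda * alpha)
                       / (col_sq[j] + reg_lambda * (1.0 - alpha)))
                if new != old:
                    resid -= X[:, j] * (new - old)
                    beta[j] = new
                    max_change = max(max_change, abs(new - old))
            # the intercept is not penalized
            shift = resid.mean()
            beta0 += shift
            resid -= shift
            max_change = max(max_change, abs(shift))
            if max_change < tol:
                break
        return beta0, beta


    # noise-free toy data with a sparse coefficient vector
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    true_beta = np.array([2.0, 0.0, -1.0, 0.0, 0.0])
    y = 0.5 + X @ true_beta

    b0_weak, b_weak = elastic_net_cd(X, y, reg_lambda=1e-6, alpha=1.0)
    b0_strong, b_strong = elastic_net_cd(X, y, reg_lambda=100.0, alpha=1.0)
    print(b_weak, b_strong)

With a very small ``reg_lambda`` the estimate approaches the least-squares
solution, while a large lasso penalty (``alpha=1.0``) drives every
coefficient to exactly zero. This sparsity is what the soft-thresholding
step provides.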

Installation
~~~~~~~~~~~~

Install the stable PyPI version with ``pip``:

.. code:: bash

    $ pip install pyglmnet

For the bleeding-edge development version, install directly from the
GitHub repository:

.. code:: bash

    $ pip install https://api.github.com/repos/glm-tools/pyglmnet/zipball/master
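
After installing, it can help to confirm which version is active in the
current environment, for example to tell a stable release from a
development build. The sketch below uses only the standard library
(``importlib.metadata``, Python 3.8+); the helper name
``installed_version`` is hypothetical and not part of pyglmnet.

.. code:: python

    from importlib.metadata import PackageNotFoundError, version


    def installed_version(package):
        """Return the installed version string of `package`, or None."""
        try:
            return version(package)
        except PackageNotFoundError:
            return None


    # prints the version string, or None if pyglmnet is not installed
    print(installed_version("pyglmnet"))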

Getting Started
~~~~~~~~~~~~~~~


Here is an example of how to use the ``GLM`` estimator.

.. code:: python

    import numpy as np
    import scipy.sparse as sps

    import matplotlib.pyplot as plt
    from pyglmnet import GLM, simulate_glm

    n_samples, n_features = 1000, 100
    distr = 'poisson'

    # sample a sparse model
    np.random.seed(42)
    beta0 = np.random.rand()
    beta = sps.random(1, n_features, density=0.2).toarray()[0]

    # simulate data
    Xtrain = np.random.normal(0.0, 1.0, [n_samples, n_features])
    ytrain = simulate_glm(distr, beta0, beta, Xtrain)
    Xtest = np.random.normal(0.0, 1.0, [n_samples, n_features])
    ytest = simulate_glm(distr, beta0, beta, Xtest)

    # create an instance of the GLM class
    glm = GLM(distr=distr, score_metric='pseudo_R2', reg_lambda=0.01)

    # fit the model on the training data
    glm.fit(Xtrain, ytrain)

    # predict using fitted model on the test data
    yhat = glm.predict(Xtest)

    # score the model on test data
    pseudo_R2 = glm.score(Xtest, ytest)
    print('Pseudo R^2 is %.3f' % pseudo_R2)

    # plot the true coefficients and the estimated ones
    plt.stem(beta, markerfmt='r.', label='True coefficients')
    plt.stem(glm.beta_, markerfmt='b.', label='Estimated coefficients')
    plt.ylabel(r'$\beta$')
    plt.legend(loc='upper right')

    # plot the true vs predicted label
    plt.figure()
    plt.plot(ytest, yhat, '.')
    plt.xlabel('True labels')
    plt.ylabel('Predicted labels')
    plt.plot([0, ytest.max()], [0, ytest.max()], 'r--')
    plt.show()

`More pyglmnet examples and use
cases <http://glm-tools.github.io/pyglmnet/auto_examples/index.html>`__.
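
The example above reports a pseudo R^2 score. As a rough guide to what
such a score measures, here is a minimal NumPy sketch of one common
deviance-based definition for Poisson data: it compares the fitted model's
log-likelihood with that of a null model (a constant mean) and a saturated
model (one parameter per observation). The function names
``poisson_loglik`` and ``pseudo_r2`` are hypothetical, and the details may
differ from what pyglmnet computes internally.

.. code:: python

    import numpy as np


    def poisson_loglik(y, mu, eps=1e-12):
        """Poisson log-likelihood without the log(y!) term.

        That term depends only on y, so it cancels in the differences
        used by pseudo_r2.
        """
        y = np.asarray(y, dtype=float)
        mu = np.asarray(mu, dtype=float)
        # eps keeps log() finite when a mean is exactly zero
        return np.sum(y * np.log(mu + eps) - mu)


    def pseudo_r2(y, yhat, ynull):
        """1 - (LL_sat - LL_model) / (LL_sat - LL_null)."""
        y = np.asarray(y, dtype=float)
        ll_model = poisson_loglik(y, yhat)
        ll_null = poisson_loglik(y, np.full_like(y, ynull))
        ll_sat = poisson_loglik(y, y)
        return 1.0 - (ll_sat - ll_model) / (ll_sat - ll_null)


    y = np.array([0.0, 1.0, 2.0, 5.0, 3.0])
    yhat = np.array([0.5, 1.2, 2.1, 4.0, 3.3])

    score = pseudo_r2(y, yhat, ynull=y.mean())
    score_perfect = pseudo_r2(y, y, ynull=y.mean())
    score_null = pseudo_r2(y, np.full_like(y, y.mean()), ynull=y.mean())
    print('Pseudo R^2 is %.3f' % score)

Under this definition a model that predicts every observation exactly
scores 1, and a model that does no better than the constant mean scores 0.
In practice the null mean would normally be estimated from the training
targets rather than from the data being scored.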

Tutorial
~~~~~~~~

Here is an `extensive
tutorial <http://glm-tools.github.io/pyglmnet/tutorial.html>`__ on GLMs,
optimization and pseudo-code.

Here are
`slides <https://pavanramkumar.github.io/pydata-chicago-2016>`__ from a
talk at `PyData Chicago
2016 <http://pydata.org/chicago2016/schedule/presentation/15/>`__,
corresponding `tutorial
notebooks <http://github.com/pavanramkumar/pydata-chicago-2016>`__ and a
`video <https://www.youtube.com/watch?v=zXec96KD1uA>`__.

How to contribute?
~~~~~~~~~~~~~~~~~~

We welcome pull requests. Please see our `developer documentation
page <https://glm-tools.github.io/pyglmnet/contributing.html>`__ for more
details.

Citation
~~~~~~~~

If you use the ``pyglmnet`` package in your publication, please cite our
`JOSS publication <https://doi.org/10.21105/joss.01959>`__ using the
following BibTeX entry:

.. code::

   @article{Jas2020,
   doi = {10.21105/joss.01959},
   url = {https://doi.org/10.21105/joss.01959},
   year = {2020},
   publisher = {The Open Journal},
   volume = {5},
   number = {47},
   pages = {1959},
   author = {Mainak Jas and Titipat Achakulvisut and Aid Idrizović
             and Daniel Acuna and Matthew Antalek and Vinicius Marques
             and Tommy Odland and Ravi Garg and Mayank Agrawal
             and Yu Umegaki and Peter Foley and Hugo Fernandes
             and Drew Harris and Beibin Li and Olivier Pieters
             and Scott Otterson and Giovanni De Toni and Chris Rodgers
             and Eva Dyer and Matti Hamalainen and Konrad Kording and Pavan Ramkumar},
   title = {{P}yglmnet: {P}ython implementation of elastic-net regularized generalized linear models},
   journal = {Journal of Open Source Software}
   }

Acknowledgments
~~~~~~~~~~~~~~~

-  `Konrad Kording <http://kordinglab.com>`__ for funding and support
-  `Sara
   Solla <http://www.physics.northwestern.edu/people/joint-faculty/sara-solla.html>`__
   for masterful GLM lectures

License
~~~~~~~

MIT License Copyright (c) 2016-2019 Pavan Ramkumar

.. |License| image:: https://img.shields.io/badge/license-MIT-blue.svg?style=flat
   :target: https://github.com/glm-tools/pyglmnet/blob/master/LICENSE
.. |Travis| image:: https://api.travis-ci.org/glm-tools/pyglmnet.svg?branch=master
   :target: https://travis-ci.org/glm-tools/pyglmnet
.. |Codecov| image:: https://codecov.io/github/glm-tools/pyglmnet/coverage.svg?precision=0
   :target: https://codecov.io/gh/glm-tools/pyglmnet
.. |Circle| image:: https://circleci.com/gh/glm-tools/pyglmnet.svg?style=svg
   :target: https://circleci.com/gh/glm-tools/pyglmnet
.. |Gitter| image:: https://badges.gitter.im/glm-tools/pyglmnet.svg
   :target: https://gitter.im/pavanramkumar/pyglmnet?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge
.. |DOI| image:: https://zenodo.org/badge/55302570.svg
   :target: https://zenodo.org/badge/latestdoi/55302570
.. |JOSS| image:: https://joss.theoj.org/papers/10.21105/joss.01959/status.svg
   :target: https://doi.org/10.21105/joss.01959
.. _[Documentation (stable version)]: http://glm-tools.github.io/pyglmnet
.. _[Documentation (development version)]: https://circleci.com/api/v1.1/project/github/glm-tools/pyglmnet/latest/artifacts/0/html/index.html?branch=master