
danielhanchen / Hyperlearn

Licence: bsd-3-clause

50% faster, 50% less RAM Machine Learning. Numba rewritten Sklearn. SVD, NNMF, PCA, LinearReg, RidgeReg, Randomized, Truncated SVD/PCA, CSR Matrices all 50+% faster



[Uni, work and hell breaking loose in my life took up a lot of time. Now that things have calmed down a bit, I will continue committing!] [By the way, I'm still looking for new contributors! Please help make HyperLearn no. 1!]

HyperLearn is what drives Umbra's AI engines. It is open source to everyone, everywhere, and we hope humanity can rise to the stars.

[Notice - I will be updating the package monthly or bi-weekly due to other commitments]

https://hyperlearn.readthedocs.io/en/latest/index.html

Faster, Leaner GPU Sklearn, Statsmodels written in PyTorch

50%+ faster, 50%+ less RAM usage: a rewritten Sklearn and Statsmodels combo with GPU support and novel algorithms.

HyperLearn is written completely in PyTorch, NoGil Numba, Numpy, Pandas, Scipy & LAPACK, and mirrors (mostly) Scikit Learn. HyperLearn also has statistical inference measures embedded, which can be accessed with Scikit Learn-style syntax (model.confidence_interval_). Ongoing documentation: https://hyperlearn.readthedocs.io/en/latest/index.html

I'm also writing a mini book! A sneak peek:

Comparison of Speed / Memory

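| Algorithm | n | p | Time (s), Sklearn | Time (s), Hyperlearn | RAM (MB), Sklearn | RAM (MB), Hyperlearn | Notes |
|---|---|---|---|---|---|---|---|
| QDA (Quad Dis A) | 1000000 | 100 | 54.2 | 22.25 | 2,700 | 1,200 | Now parallelized |
| LinearRegression | 1000000 | 100 | 5.81 | 0.381 | 700 | 10 | Guaranteed stable & fast |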

Time (s) is Fit + Predict. RAM (MB) = max( RAM(Fit), RAM(Predict) )
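As a rough illustration of how those two numbers could be collected, here is a minimal sketch using only the standard library and NumPy. It is not the benchmark harness behind the table: the `measure` helper and the least-squares stand-in model are assumptions made for this example, and `tracemalloc` only sees allocations routed through Python's allocator, so native LAPACK workspace may be undercounted.

```python
import time
import tracemalloc

import numpy as np


def measure(step):
    """Run step() once and return (result, seconds, peak_mb) for that call."""
    tracemalloc.start()
    start = time.perf_counter()
    result = step()
    seconds = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, seconds, peak_bytes / 1e6


rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 20))
y = X @ rng.standard_normal(20)

# Stand-in "model": ordinary least squares via NumPy, not HyperLearn itself.
coef, fit_s, fit_mb = measure(lambda: np.linalg.lstsq(X, y, rcond=None)[0])
pred, predict_s, predict_mb = measure(lambda: X @ coef)

total_seconds = fit_s + predict_s    # Time (s) = Fit + Predict
peak_mb = max(fit_mb, predict_mb)    # RAM (MB) = max(RAM(Fit), RAM(Predict))
```

Measuring fit and predict separately is what makes the max over the two peaks meaningful; a single trace around both calls would report only the overall peak.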

I've also added some preliminary results for N = 5000, P = 6000

Since timings are not good, I have submitted 2 bug reports to Scipy + PyTorch:

* EIGH very very slow --> suggesting an easy fix #9212 https://github.com/scipy/scipy/issues/9212
* SVD very very slow and GELS gives nans, -inf #11174 https://github.com/pytorch/pytorch/issues/11174

Help is really needed! Message me!

Key Methodologies and Aims

1. Embarrassingly Parallel For Loops

2. 50%+ Faster, 50%+ Leaner

3. Why is Statsmodels sometimes unbearably slow?

4. Deep Learning Drop In Modules with PyTorch

5. 20%+ Less Code, Cleaner Clearer Code

6. Accessing Old and Exciting New Algorithms

1. Embarrassingly Parallel For Loops

* Including Memory Sharing, Memory Management
* CUDA Parallelism through PyTorch & Numba
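To make "embarrassingly parallel" concrete, the sketch below splits a Gram matrix computation into independent row blocks and sums the partial results. It uses standard library threads as a stand-in for the Numba and PyTorch parallelism named above, and the function names are hypothetical, not part of HyperLearn.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def row_block_gram(block):
    """Partial Gram matrix block.T @ block for one block of rows."""
    return block.T @ block


def parallel_gram(X, n_blocks=4, max_workers=4):
    """Compute X.T @ X by summing independent per-block products."""
    # array_split returns views, so the row data is shared, not copied.
    blocks = np.array_split(X, n_blocks, axis=0)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(row_block_gram, blocks))
    return sum(partials)


rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 8))
gram = parallel_gram(X)
```

Each block's product depends on no other block, which is what makes the loop embarrassingly parallel; only the final sum needs all partial results.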

2. 50%+ Faster, 50%+ Leaner

* Matrix Multiplication Ordering: https://en.wikipedia.org/wiki/Matrix_chain_multiplication
* Element Wise Matrix Multiplication reducing complexity to O(n^2) from O(n^3): https://en.wikipedia.org/wiki/Hadamard_product_(matrices)
* Reducing Matrix Operations to Einstein Notation: https://en.wikipedia.org/wiki/Einstein_notation
* Evaluating one-time Matrix Operations in succession to reduce RAM overhead.
* If p>>n, maybe decomposing X.T is better than X.
* Applying QR Decomposition then SVD might be faster in some cases.
* Utilise the structure of the matrix to compute faster inverse (eg triangular matrices, Hermitian matrices).
* Computing SVD(X) then getting pinv(X) is sometimes faster than pure pinv(X)
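Two of the ideas above, multiplication ordering and replacing a diagonal matrix product with an element-wise product, can be sketched with plain NumPy. This is an illustrative example under assumed shapes, not code taken from HyperLearn.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 300, 40
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Ordering changes the cost of a chain, not its result.
# Left to right: X @ X.T is n x n, so O(n^2 p) work and O(n^2) memory.
slow = (X @ X.T) @ y
# Right to left: X.T @ y is a length-p vector, so O(n p) work and O(p) extra memory.
fast = X @ (X.T @ y)
# NumPy can also pick the cheapest order itself.
auto = np.linalg.multi_dot([X, X.T, y])

# Scaling rows by a diagonal matrix: a dense product builds an n x n matrix,
# while the element-wise (Hadamard-style) form touches each entry of X once.
d = rng.random(n)
dense_scaled = np.diag(d) @ X
hadamard_scaled = d[:, None] * X
```

With n much larger than p, the right-to-left order never materialises the n x n intermediate, which is where both the time and RAM savings come from.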

3. Why is Statsmodels sometimes unbearably slow?

* Confidence, Prediction Intervals, Hypothesis Tests & Goodness of Fit tests for linear models are optimized.
* Using Einstein Notation & Hadamard Products where possible.
* Computing only what is necessary to compute (Diagonal of matrix and not entire matrix).
* Fixing the flaws of Statsmodels on notation, speed, memory issues and storage of variables.
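A common case of "diagonal and not entire matrix" is the leverage of each observation, the diagonal of the hat matrix X (XᵀX)⁻¹ Xᵀ, which appears in prediction-interval formulas. The sketch below is an assumed illustration in NumPy, not HyperLearn's own routine; it computes that diagonal with Einstein notation without forming the n x n matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 5
X = rng.standard_normal((n, p))
XtX_inv = np.linalg.inv(X.T @ X)

# Naive: build the full n x n hat matrix, then keep only its diagonal.
full_diag = np.diag(X @ XtX_inv @ X.T)

# Diagonal only: h_i = sum over j, k of X[i, j] * XtX_inv[j, k] * X[i, k].
leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
```

The einsum form needs O(n) output memory instead of O(n^2), and the leverages sum to p when X has full column rank, which gives a quick sanity check.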

4. Deep Learning Drop In Modules with PyTorch

Using PyTorch to create Scikit-Learn like drop in replacements.

5. 20%+ Less Code, Cleaner Clearer Code

* Using Decorators & Functions where possible.
* Intuitive Middle Level Function names like (isTensor, isIterable).
* Handles Parallelism easily through hyperlearn.multiprocessing
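The following is a guess at what such helpers might look like, written to show how a decorator can remove repeated input checks. Only the name `isIterable` comes from the text above; its body, the `as_float_array` decorator and `column_means` are hypothetical and do not reflect HyperLearn's actual implementation.

```python
from functools import wraps

import numpy as np


def isIterable(x):
    """Hypothetical check: True for lists, tuples and arrays; False for strings and scalars."""
    if isinstance(x, (str, bytes)):
        return False
    try:
        iter(x)
    except TypeError:
        return False
    return True


def as_float_array(func):
    """Hypothetical decorator: validate the first argument and pass it on as a float ndarray."""
    @wraps(func)
    def wrapper(X, *args, **kwargs):
        if not isIterable(X):
            raise TypeError("X must be an iterable of numbers")
        return func(np.asarray(X, dtype=float), *args, **kwargs)
    return wrapper


@as_float_array
def column_means(X):
    """Mean of each column; X is already a float ndarray here."""
    return X.mean(axis=0)


means = column_means([[1, 2], [3, 4]])
```

Putting the conversion in one decorator means every decorated function can assume a float array and skip its own validation code.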

6. Accessing Old and Exciting New Algorithms

* Matrix Completion algorithms - Non Negative Least Squares, NNMF
* Batch Similarity Latent Dirichlet Allocation (BS-LDA)
* Correlation Regression
* Feasible Generalized Least Squares FGLS
* Outlier Tolerant Regression
* Multidimensional Spline Regression
* Generalized MICE (any model drop in replacement)
* Using Uber's Pyro for Bayesian Deep Learning

Goals & Development Schedule

Will Focus on & why:

1. Singular Value Decomposition & QR Decomposition

* SVD/QR is the backbone for many algorithms including:
    * Linear & Ridge Regression (Regression)
    * Statistical Inference for Regression methods (Inference)
    * Principal Component Analysis (Dimensionality Reduction)
    * Linear & Quadratic Discriminant Analysis (Classification & Dimensionality Reduction)
    * Pseudoinverse, Truncated SVD (Linear Algebra)
    * Latent Semantic Indexing LSI (NLP)
    * (new methods) Correlation Regression, FGLS, Outlier Tolerant Regression, Generalized MICE, Splines (Regression)
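To show why one decomposition can serve several of these methods, here is a minimal sketch of linear and ridge regression sharing a single thin SVD. It uses NumPy and assumes X has full column rank; the function name `ridge_via_svd` is made up for this example and is not HyperLearn's API.

```python
import numpy as np


def ridge_via_svd(X, y, alpha=0.0):
    """Solve min ||X b - y||^2 + alpha * ||b||^2 from the thin SVD of X.

    With alpha = 0 this is ordinary least squares, equal to pinv(X) @ y,
    assuming X has full column rank (no zero singular values).
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    shrink = s / (s**2 + alpha)          # 1 / s when alpha = 0
    return Vt.T @ (shrink * (U.T @ y))


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
y = rng.standard_normal(200)

ols_coef = ridge_via_svd(X, y)               # linear regression
ridge_coef = ridge_via_svd(X, y, alpha=1.5)  # ridge regression
```

Once U, s and Vt are available, changing alpha only changes the cheap `shrink` vector, so many regularisation strengths can reuse the same decomposition.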

~~On Licensing:~~ ~~HyperLearn is under a GNU v3 License. This means:~~

* Commercial use is restricted. Only software with 0 cost can be released, i.e. no closed source versions are allowed.
* Using HyperLearn must entail all of the code being available to everyone who uses your public software.
* HyperLearn is intended for academic, research and personal purposes. Any explicit commercialisation of the algorithms and anything inside HyperLearn is strictly prohibited.

HyperLearn promotes a free and just world. Hence, it is free to everyone, except for those who wish to commercialise on top of HyperLearn.

[As of 2020, HyperLearn's license has been changed to BSD 3]
