allentran / Pca Magic

Licence: apache-2.0

PCA that iteratively replaces missing data

Programming Languages

python

Labels

machine-learning data-science pca

pca-magic

An implementation of probabilistic principal component analysis (PPCA), a variant of vanilla PCA that can be used to:

compute factors where some of the data are missing
interpolate data by using information from additional series

Often, you want to use PCA but your data is riddled with missing values. In the example plot, white represents missing data in 14k+ time series from the Current Population Survey (CPS), a monthly survey of about 60k households conducted by the United States Census Bureau since 1940.

If enough of the data is present, you can fill in the missing values with sample means or some other interpolated value. But if too much is missing, this rudimentary interpolation will overwhelm the signal in the data with noise. (Think about the limiting case where all but one data point is missing.)
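One way to see the problem is to mean-impute two strongly related series and watch their co-movement wash out. The sketch below is illustrative only: the synthetic series, the 80 percent missing rate, and the helper name mean_impute are assumptions made for the example and are not part of the package.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Two synthetic series driven by one shared factor (illustrative data only).
common = rng.normal(size=n)
a = common + 0.3 * rng.normal(size=n)
b = common + 0.3 * rng.normal(size=n)


def mean_impute(x, missing):
    """Replace the entries flagged in `missing` with the mean of the observed ones."""
    x = x.copy()
    x[missing] = x[~missing].mean()
    return x


# Drop roughly 80 percent of each series independently, then mean-impute.
a_filled = mean_impute(a, rng.random(n) < 0.8)
b_filled = mean_impute(b, rng.random(n) < 0.8)

corr_true = np.corrcoef(a, b)[0, 1]
corr_filled = np.corrcoef(a_filled, b_filled)[0, 1]
print(f"correlation, complete data: {corr_true:.2f}")
print(f"correlation, mean-imputed:  {corr_filled:.2f}")
```

Wherever either series was filled with its mean, that position contributes nothing to the covariance, so the measured correlation shrinks even though the underlying relationship is unchanged.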

A better way: suppose you had the latent factors representing the matrix. You could fit a linear model of each series on those factors and use the fitted model to interpolate. Intuitively, this preserves the signal in the data because the interpolated values come from the latent factors.

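However, the problem is that you never have these factors to begin with: the old chicken-and-egg problem. No matter, a fixed-point algorithm via probabilistic PCA comes to the rescue. Start from a rough fill, estimate the factors, replace the missing values using those factors, and repeat until the estimates stop changing.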
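The fixed-point idea can be sketched in a few lines of NumPy. This is a simplified stand-in, not the package's implementation: it uses a truncated SVD instead of the probabilistic model, and the function name iterative_pca_impute and its parameters (d, n_iter, tol) are hypothetical.

```python
import numpy as np


def iterative_pca_impute(X, d, n_iter=200, tol=1e-8):
    """Fill NaNs by alternating a rank-d SVD fit with re-imputation.

    Simplified sketch: truncated SVD in place of probabilistic PCA.
    Assumes every column has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Rough starting point: fill each gap with its column mean.
    filled = np.where(missing, np.nanmean(X, axis=0), X)

    for _ in range(n_iter):
        # Estimate d latent factors from the currently filled matrix.
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        low_rank = (U[:, :d] * s[:d]) @ Vt[:d] + mu

        # Overwrite only the missing entries; observed values stay fixed.
        updated = np.where(missing, low_rank, X)
        change = np.max(np.abs(updated - filled))
        filled = updated
        if change < tol:
            break
    return filled


# Usage on synthetic rank-2 data with about 20 percent of entries removed.
rng = np.random.default_rng(0)
X_true = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 10))
mask = rng.random(X_true.shape) < 0.2
X_obs = X_true.copy()
X_obs[mask] = np.nan

X_iter = iterative_pca_impute(X_obs, d=2)
X_mean = np.where(mask, np.nanmean(X_obs, axis=0), X_obs)

rmse_iter = np.sqrt(np.mean((X_iter[mask] - X_true[mask]) ** 2))
rmse_mean = np.sqrt(np.mean((X_mean[mask] - X_true[mask]) ** 2))
print(f"RMSE on missing entries, iterative: {rmse_iter:.4f}")
print(f"RMSE on missing entries, mean fill: {rmse_mean:.4f}")
```

Each pass uses the current fill to estimate the factors and the factors to improve the fill, which is the chicken-and-egg loop described above resolved by iteration.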

With this strategy, over 50 percent of the variance in those 14k+ time series in the CPS can be explained by just 12 factors.
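For reference, the share of variance explained by the first d components of a complete matrix can be computed from its singular values. The sketch below assumes fully observed data and plain NumPy; the function name variance_explained is hypothetical and is not the package's var_exp attribute.

```python
import numpy as np


def variance_explained(X, d):
    """Fraction of total variance captured by the first d principal components."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Squared singular values are proportional to per-component variance.
    s = np.linalg.svd(centered, compute_uv=False)
    return float(np.sum(s[:d] ** 2) / np.sum(s ** 2))


# Synthetic example: 20 series driven by 3 factors plus a little noise.
rng = np.random.default_rng(0)
series = rng.normal(size=(300, 3)) @ rng.normal(size=(3, 20))
series += 0.1 * rng.normal(size=series.shape)

share_3 = variance_explained(series, d=3)
print(f"share of variance explained by 3 components: {share_3:.3f}")
```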

Installation

Install via pip:

pip install ppca

Load in the data, which should be arranged as n_samples by n_features. As usual, make sure your data is stationary (take first differences if necessary) and standardized.

from ppca import PPCA
ppca = PPCA()
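The preprocessing mentioned above (first differences, then standardization) can be sketched with NaN-aware NumPy functions so that missing values pass through untouched. The function name difference_and_standardize and the synthetic input are assumptions for illustration, not part of the package.

```python
import numpy as np


def difference_and_standardize(levels):
    """First-difference each column, then scale to zero mean and unit variance.

    NaNs are ignored when computing the statistics and remain NaN in the output.
    """
    levels = np.asarray(levels, dtype=float)
    diffs = np.diff(levels, axis=0)  # a NaN level makes both adjacent differences NaN
    mean = np.nanmean(diffs, axis=0)
    std = np.nanstd(diffs, axis=0)
    return (diffs - mean) / std


# Usage: four synthetic random walks with two missing observations.
rng = np.random.default_rng(0)
levels = np.cumsum(rng.normal(size=(100, 4)), axis=0)
levels[10, 1] = np.nan
levels[50, 3] = np.nan

prepared = difference_and_standardize(levels)
print(prepared.shape)
```

Note that differencing costs one row, and each missing level turns into two missing differences.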

Fit the model. The parameter d specifies the number of components; set verbose=True to print convergence output.

ppca.fit(data=data, d=100, verbose=True)

The model parameters and components will be attached to the ppca object.

variance_explained = ppca.var_exp
components = ppca.data
model_params = ppca.C

If you want the principal components, call transform.

component_mat = ppca.transform()

After fitting, you can save the model.

ppca.save('mypcamodel')

To load a saved model, first instantiate a PPCA object. Loading a saved model makes subsequent fitting/transforming much faster.

ppca.load('mypcamodel.npy')
