
pyensemble v0.41

---> ARCHIVED March 2021 <---

An implementation of [Caruana et al.'s Ensemble Selection algorithm](http://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icml04.icdm06long.pdf) [1][2] in Python, based on scikit-learn.
From the abstract:

We present a method for constructing ensembles from libraries of thousands of models. Model libraries are generated using different learning algorithms and parameter settings. Forward stepwise selection is used to add to the ensemble the models that maximize its performance. Ensemble selection allows ensembles to be optimized to performance metrics such as accuracy, cross entropy, mean precision or ROC Area. Experiments with seven test problems and ten metrics demonstrate the benefit of ensemble selection.
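The core of the method described above is greedy forward stepwise selection: repeatedly add (with replacement) whichever model most improves the ensemble's score on a hillclimb set. A minimal, self-contained sketch of that loop for binary classification with accuracy as the metric — simplified relative to pyensemble's full implementation (no bagging, pruning, or cross-validation), with illustrative names:

```python
import numpy as np

def ensemble_select(preds, y, max_models=10, epsilon=1e-4):
    """Greedy forward stepwise ensemble selection (with replacement).

    preds : (n_models, n_samples) array of probability predictions
    y     : (n_samples,) array of 0/1 labels for the hillclimb set
    Returns the list of selected model indices; averaging their
    predictions gives the ensemble's output.
    """
    # score each candidate model alone, and init with the single best
    accs = [np.mean((p > 0.5) == y) for p in preds]
    selected = [int(np.argmax(accs))]
    best_acc = accs[selected[0]]
    while len(selected) < max_models:
        # try appending every candidate (models may be added repeatedly)
        gains = []
        for i in range(len(preds)):
            avg = preds[selected + [i]].mean(axis=0)
            gains.append(np.mean((avg > 0.5) == y))
        i = int(np.argmax(gains))
        if gains[i] < best_acc + epsilon:   # stop when nothing helps enough
            break
        selected.append(i)
        best_acc = gains[i]
    return selected
```

Selecting with replacement lets a strong model accumulate weight in the averaged ensemble, which is one of the tricks the papers use to avoid overfitting the hillclimb set.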

It's a work in progress, so things can/might/will change.

David C. Lambert
dcl [at] panix [dot] com

Copyright © 2013
License: Simple BSD

Files

ensemble.py

Contains the EnsembleSelectionClassifier object.

The EnsembleSelectionClassifier object tries to implement all of the techniques from the combined papers, including internal cross-validation, bagged ensembling, initialization with the best models, pruning of the worst models prior to selection, and sampling of the model candidates with replacement.

It uses an sqlite database as the backing store, holding pickled unfitted models, fitted model 'siblings' for each internal cross-validation fold, scores and predictions for each model, and the list of model ids and weightings for the final ensemble.
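The mechanism is ordinary Python pickling into sqlite BLOB columns. A minimal sketch of the idea — the table and column names here are illustrative, not pyensemble's actual schema:

```python
import pickle
import sqlite3

from sklearn.tree import DecisionTreeClassifier

# Illustrative schema only -- pyensemble's actual tables and columns differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE models (model_id INTEGER PRIMARY KEY, pickled BLOB)")

# Fit a model and store its pickle as a BLOB.
clf = DecisionTreeClassifier().fit([[0], [1], [2], [3]], [0, 0, 1, 1])
conn.execute("INSERT INTO models (pickled) VALUES (?)", (pickle.dumps(clf),))

# Later: pull the BLOB back out and unpickle a working model.
(blob,) = conn.execute("SELECT pickled FROM models").fetchone()
restored = pickle.loads(blob)
```

Because everything lives in one db file, a trained ensemble can be reloaded and used for prediction without retraining.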

Hillclimbing can be performed using auc, accuracy, rmse, cross entropy or F1 score.

If the object is initialized with its model parameter set to None, it tries to load a fitted ensemble from the specified database.

(NOTE: Expects class labels to be sequential integers starting at zero [for now].)
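If your labels don't already meet that requirement, they can be remapped beforehand. A small sketch (not part of pyensemble) using numpy:

```python
import numpy as np

# Arbitrary class labels, remapped to sequential integers starting at zero.
y_raw = np.array([7, 3, 7, 9, 3])
classes, y = np.unique(y_raw, return_inverse=True)
# y is the encoded labels; classes[y] recovers the originals.
```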

model_library.py

Example model library building code.

ensemble_train.py

Training utility to run ensemble selection on svm data files.
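If you don't already have data in svm (svmlight/libsvm) format, scikit-learn can write it. A quick sketch with a made-up two-sample dataset:

```python
import os
import tempfile

import numpy as np
from sklearn.datasets import dump_svmlight_file

# Tiny example dataset written in svmlight/libsvm format --
# the sparse "label index:value ..." layout ensemble_train.py reads.
X = np.array([[1.0, 0.0], [0.0, 2.0]])
y = np.array([0, 1])
path = os.path.join(tempfile.mkdtemp(), "some_data.svm")
dump_svmlight_file(X, y, path, zero_based=True)
```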

The user can choose from the following candidate models:

  • sgd : Stochastic Gradient Descent
  • svc : Support Vector Machines
  • gbc : Gradient Boosting Classifiers
  • dtree : Decision Trees
  • forest : Random Forests
  • extra : Extra Trees
  • kmp : KMeans->LogisticRegression Pipelines
  • kernp : Nystroem Approx->Logistic Regression Pipelines

Some model choices are very slow. The default is to use decision trees, which are reasonably fast.

The simplest command line is:

unix> ./ensemble_train.py some_dbfile.db some_data.svm

(NOTE: Expects 'some_dbfile.db' not to exist, and will quit if it does [so you don't accidentally blow away your model].)

Full usage is:

usage: ensemble_train.py [-h]
                         [-M {svc,sgd,gbc,dtree,forest,extra,kmp,kernp}
                            [{svc,sgd,gbc,dtree,forest,extra,kmp,kernp} ...]]
                         [-S {f1,auc,rmse,accuracy,xentropy}] [-b N_BAGS]
                         [-f BAG_FRACTION] [-B N_BEST] [-m MAX_MODELS]
                         [-F N_FOLDS] [-p PRUNE_FRACTION] [-u] [-U]
                         [-e EPSILON] [-t TEST_SIZE] [-s SEED] [-v]
                         db_file data_file

EnsembleSelectionClassifier training harness

positional arguments:
  db_file               sqlite db file for backing store
  data_file             training data in svm format

optional arguments:
  -h, --help            show this help message and exit
  -M {svc,sgd,gbc,dtree,forest,extra,kmp,kernp}
    [{svc,sgd,gbc,dtree,forest,extra,kmp,kernp} ...]
                        model types to include as ensemble candidates
                        (default: ['dtree'])
  -S {f1,auc,rmse,accuracy,xentropy}
                        scoring metric used for hillclimbing (default:
                        accuracy)
  -b N_BAGS             bags to create (default: 20)
  -f BAG_FRACTION       fraction of models in each bag (after pruning)
                        (default: 0.25)
  -B N_BEST             number of best models in initial ensemble (default: 5)
  -m MAX_MODELS         maximum number of models per bagged ensemble (default:
                        25)
  -F N_FOLDS            internal cross-validation folds (default: 3)
  -p PRUNE_FRACTION     fraction of worst models pruned pre-selection
                        (default: 0.75)
  -u                    use epsilon to stop adding models (default: False)
  -U                    use bootstrap sample to generate training/hillclimbing
                        folds (default: False)
  -e EPSILON            score improvement threshold to include new model
                        (default: 0.0001)
  -t TEST_SIZE          fraction of data to use for testing (default: 0.75)
  -s SEED               random seed
  -v                    show progress messages

ensemble_predict.py

Gets predictions from a trained EnsembleSelectionClassifier given an svm-format data file.

Can output predicted classes or probabilities from the full ensemble or just the best model.

Expects to find a trained ensemble in the sqlite db specified.

usage: ensemble_predict.py [-h] [-s {best,ens}] [-p] db_file data_file

Get EnsembleSelectionClassifier predictions

positional arguments:
  db_file        sqlite db file containing model
  data_file      testing data in svm format

optional arguments:
  -h, --help     show this help message and exit
  -s {best,ens}  choose source of prediction ["best", "ens"]
  -p             predict probabilities

Requirements

Written using Python 2.7.3, numpy 1.6.1, scipy 0.10.1, scikit-learn 0.14.1 and sqlite 3.7.14

References

[1] Caruana et al., "Ensemble Selection from Libraries of Models", Proceedings of the 21st International Conference on Machine Learning (ICML '04).

[2] Caruana et al., "Getting the Most Out of Ensemble Selection", Proceedings of the 6th International Conference on Data Mining (ICDM '06).
