Calibrated Boosting-Forest
Calibrated Boosting-Forest (CBF) is an integrative technique that leverages both continuous and binary labels and outputs calibrated posterior probabilities. It was originally designed for ligand-based virtual screening and can be extended to other applications.
Calibrated Boosting-Forest is a package created by Haozhen Wu from the Small Molecule Screening Facility at the University of Wisconsin-Madison.
For more details, please see our paper: Calibrated Boosting-Forest, by Haozhen Wu.
Key features:
- Takes both continuous and binary labels as input (multi-labels)
- Superior ranking power over individual regression or classification models
- Outputs well-calibrated posterior probabilities
- Streamlined hyper-parameter tuning stage
- Supports multiple evaluation and stopping metrics
- Competitive benchmark results on well-known public datasets
- XGBoost backend
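To make "calibrated posterior probabilities" concrete, here is a minimal, generic sketch of Platt scaling: fitting a sigmoid that maps raw model scores to probabilities. It is not CBF's own calibration procedure (see the paper for that), and it does not use the lightchem API. The function names `fit_platt` and `calibrate`, the learning rate, and the toy data are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_platt(scores, labels, lr=0.1, n_iter=2000):
    """Fit p = sigmoid(a * score + b) by gradient descent on the log loss.

    scores: raw, uncalibrated model outputs (floats)
    labels: binary labels (0 or 1), same length as scores
    Returns the fitted (a, b).
    """
    a, b = 1.0, 0.0
    n = float(len(scores))
    for _ in range(n_iter):
        grad_a = 0.0
        grad_b = 0.0
        for s, y in zip(scores, labels):
            p = sigmoid(a * s + b)
            grad_a += (p - y) * s  # d(log loss)/da for one sample
            grad_b += (p - y)      # d(log loss)/db for one sample
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrate(score, a, b):
    """Map one raw score to a probability using the fitted sigmoid."""
    return sigmoid(a * score + b)


# Toy data, made up for illustration only (not from any benchmark).
toy_scores = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0]
toy_labels = [0, 0, 1, 0, 0, 1, 1, 1]
toy_a, toy_b = fit_platt(toy_scores, toy_labels)
toy_probs = [calibrate(s, toy_a, toy_b) for s in toy_scores]
```

After fitting, the average of `toy_probs` should be close to the fraction of positive labels, which is one simple sanity check that the probabilities are calibrated on the data used for fitting.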
Dependencies:
- scikit-learn version = 0.18.1
- XGBoost version = 0.6
- numpy version = 1.11.1
- scipy version = 0.18.1
- pandas version = 0.18.1
- rdkit version = 2015.09.1
- pytest (optional)
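As a quick way to compare an environment against the list above, the following sketch reports which of the pinned packages can be imported and which version each one exposes. The helper names `installed_versions` and `report` are hypothetical and are not part of the package; the import names (for example `sklearn` for scikit-learn) are the usual ones for these libraries.

```python
import importlib

# Versions from the dependency list, keyed by import name.
PINNED = {
    "sklearn": "0.18.1",
    "xgboost": "0.6",
    "numpy": "1.11.1",
    "scipy": "0.18.1",
    "pandas": "0.18.1",
    "rdkit": "2015.09.1",
}


def installed_versions(pinned=PINNED):
    """Return {import name: version string, "unknown", or None if not importable}."""
    found = {}
    for name in pinned:
        try:
            module = importlib.import_module(name)
        except (ImportError, OSError):
            # OSError can occur when a compiled shared library fails to load.
            found[name] = None
        else:
            found[name] = getattr(module, "__version__", "unknown")
    return found


def report(pinned=PINNED):
    """Build one human-readable line per package."""
    lines = []
    for name, version in sorted(installed_versions(pinned).items()):
        status = "missing" if version is None else version
        lines.append("%-10s pinned %-10s installed %s" % (name, pinned[name], status))
    return lines
```

Calling `print("\n".join(report()))` lists each package with its pinned and installed version. The sketch only reports; it does not enforce an exact match, since the conda build installed below (`xgboost=0.6a2`) is labeled slightly differently from the pinned `0.6`.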
Installation
We recommend using Anaconda to install the packages conveniently. So far, LightChem has been tested with Python 2.7 on OS X and on Linux (Ubuntu Server 16.04).
- Download the 64-bit Python 2.7 version of Anaconda for Linux/OS X and follow its instructions. Once Anaconda is installed, most of the dependencies will be ready.
- Install git if you do not have it. On Ubuntu: `sudo apt-get install git` (on yum-based distributions such as Fedora or CentOS: `sudo yum install git-all`)
- Install scikit-learn: `conda install scikit-learn=0.18`
- Install the conda distribution of xgboost: `conda install --yes -c conda-forge xgboost=0.6a2`
- Install rdkit: `conda install -c omnia rdkit`. Note: rdkit is only used to transform SMILES strings into fingerprints.
- Clone the Calibrated-Boosting-Forest GitHub repository: `git clone https://github.com/haozhenWu/Calibrated-Boosting-Forest.git`
- `cd` into the Calibrated-Boosting-Forest directory and execute `pip install -e .`
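The rdkit step above says that rdkit is only used to turn SMILES strings into fingerprints. As a rough illustration of what that transformation looks like, here is a sketch using rdkit's Morgan fingerprints. The fingerprint type, the radius of 2, the 1024-bit length, and the helper name `smiles_to_fingerprint` are assumptions for this example and may differ from what the package does internally.

```python
def smiles_to_fingerprint(smiles, radius=2, n_bits=1024):
    """Convert a SMILES string into a list of 0/1 fingerprint bits.

    Returns None if rdkit cannot parse the SMILES string.
    """
    # Imported inside the function so this module still loads without rdkit.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    return [int(bit) for bit in fp.ToBitString()]
```

For example, `smiles_to_fingerprint("CCO")` (ethanol) gives a 1024-element list of bits that can be used as one row of a feature matrix.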
Testing
To test that the dependencies have been installed correctly, run `pytest` in the lightchem directory. This requires the optional pytest Python package.
The current tests (1) confirm that the required dependencies exist and can be imported, and (2) confirm that the model performance results for one target, MUV-466, fall into the expected ranges.
FAQ
- When I import lightchem, the following error shows up: `version GLIBCXX_3.4.20 not found`
  Try: `conda install libgcc`
Reference
- [DeepChem](https://github.com/deepchem/deepchem): Deep-learning models for Drug Discovery and Quantum Chemistry