
bsharchilev / influence_boosting

Licence: other
Supporting code for the paper "Finding Influential Training Samples for Gradient Boosted Decision Trees"



Finding Influential Training Samples for Gradient Boosted Decision Trees

This repository implements the LeafRefit and LeafInfluence methods described in the paper Finding Influential Training Samples for Gradient Boosted Decision Trees.

The paper addresses the problem of finding influential training samples using the Influence Functions framework from classical statistics, recently revisited in the paper "Understanding Black-box Predictions via Influence Functions" (code). The classical approach, however, is only applicable to smooth parametric models. In our paper, we introduce LeafRefit and LeafInfluence, methods that extend the Influence Functions framework to non-parametric ensembles of Gradient Boosted Decision Trees.
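For smooth parametric models, the classical influence of a training point on a test-point loss reduces to a gradient/inverse-Hessian product. The following is a minimal NumPy sketch of that formulation for orientation; it is not part of this package, and the function name is ours:

```python
import numpy as np

def classical_influence(grad_train, grad_test, hessian):
    """Classical influence of a training point z on the loss at z_test:
    I(z, z_test) = -grad(L(z_test))^T H^{-1} grad(L(z)),
    valid only for smooth parametric models with an invertible Hessian."""
    return -grad_test @ np.linalg.solve(hessian, grad_train)
```

LeafRefit and LeafInfluence replace this parametric quantity with tree-ensemble analogues, since GBDT models have no smooth parameter vector to differentiate through.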

Requirements

We recommend using the Anaconda Python distribution for easy installation.

Python packages

The following Python 2.7 packages are required:

Note: the versions listed below are those with which the experiments reported in the paper were run.

  • numpy==1.14.0
  • scipy==0.19.1
  • pandas==0.20.3
  • scikit-learn==0.19.0
  • matplotlib==2.0.2
  • tensorflow==1.6.0rc0
  • tqdm==4.19.5
  • ipywidgets>=7.0.0 (for Jupyter Notebook rendering)

The create_influence_boosting_env.sh script creates the influence_boosting Conda environment with the required packages installed. Run it from the influence_boosting directory:

bash create_influence_boosting_env.sh

CatBoost

The code in this repository uses CatBoost as its GBDT implementation. We tested our package with CatBoost version 0.6 built from GitHub. Installation instructions are available in the documentation.

Note: if you are using the influence_boosting environment described above, make sure to install CatBoost specifically for this environment.

export_catboost

Since CatBoost is written in C++, we also include export_catboost, a binary that exports a saved CatBoost model to human-readable JSON so that the model can be used from our Python package.

This repository assumes that a program named export_catboost is available in the shell. To ensure that, you can do the following:

  • Select one of the two binaries, export_catboost_macosx or export_catboost_linux, depending on your OS.
  • Copy it to export_catboost in the root repository directory.
  • Add the path to the root repository directory to the PATH environment variable.

Note: since CatBoost's treatment of categorical features can be fairly complicated, export_catboost currently supports numerical features only.
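From Python, calling the binary and loading its output might look like the sketch below. The argument order passed to export_catboost is an assumption (check the binary's usage message), and both helper names are ours:

```python
import json
import subprocess

def export_command(model_path, json_path):
    # Hypothetical argument order; verify against the binary's usage message.
    return ["export_catboost", model_path, json_path]

def export_catboost_model(model_path, json_path):
    """Run export_catboost (assumed to be on PATH) on a saved CatBoost
    model, then load the resulting human-readable JSON."""
    subprocess.check_call(export_command(model_path, json_path))
    with open(json_path) as f:
        return json.load(f)
```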

Example

An example experiment showing the API and a use-case of Influence Functions can be found in the influence_for_error_fixing.ipynb notebook.

Note: in this notebook, CatBoost parameters are loaded from the catboost_params.json file. In particular, the task_type parameter is set to CPU by default. If a CUDA-capable GPU is available on your machine and you compiled CatBoost with GPU support, you can change this parameter to GPU for faster training. The majority of the experiments in the paper were conducted in GPU mode.
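As a sketch, overriding the task type when loading the parameter file could look like this (the helper name is ours; catboost_params.json and task_type come from the repository):

```python
import json

def load_params(path, task_type="CPU"):
    """Load CatBoost parameters from a JSON file, overriding task_type
    ("CPU" or "GPU") so the same file can serve both training modes."""
    with open(path) as f:
        params = json.load(f)
    params["task_type"] = task_type
    return params
```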
