
gdmarmerola / cfml_tools

License: Apache-2.0

Programming Languages

Python, Makefile

Projects that are alternatives to or similar to cfml_tools

causeinfer
Machine learning based causal inference/uplift in Python
Stars: ✭ 45 (+87.5%)
Mutual labels:  causality, causal-inference, treatment-effects, uplift-modeling
Dowhy
DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
Stars: ✭ 3,480 (+14400%)
Mutual labels:  causality, causal-inference, treatment-effects
cfvqa
[CVPR 2021] Counterfactual VQA: A Cause-Effect Look at Language Bias
Stars: ✭ 96 (+300%)
Mutual labels:  causality, causal-inference
Causal Reading Group
We will keep updating this paper list about machine learning + causal theory. We also discuss related papers weekly between NExT++ (NUS) and LDS (USTC).
Stars: ✭ 339 (+1312.5%)
Mutual labels:  causality, causal-inference
causaldag
Python package for the creation, manipulation, and learning of Causal DAGs
Stars: ✭ 82 (+241.67%)
Mutual labels:  causality, causal-inference
causal-learn
Causal Discovery for Python. Translation and extension of the Tetrad Java code.
Stars: ✭ 428 (+1683.33%)
Mutual labels:  causality, causal-inference
Awesome-Neural-Logic
Awesome Neural Logic and Causality: MLN, NLRL, NLM, etc. Covers frontier topics in causal inference, neural logic, and logical reasoning for strong AI.
Stars: ✭ 106 (+341.67%)
Mutual labels:  causality, causal-inference
causal-ml
Must-read papers and resources related to causal inference and machine (deep) learning
Stars: ✭ 387 (+1512.5%)
Mutual labels:  causal-inference, treatment-effects
RECCON
This repository contains the dataset and the PyTorch implementations of the models from the paper Recognizing Emotion Cause in Conversations.
Stars: ✭ 126 (+425%)
Mutual labels:  causality, causal-inference
Causalml
Uplift modeling and causal inference with machine learning algorithms
Stars: ✭ 2,499 (+10312.5%)
Mutual labels:  causal-inference, uplift-modeling
CIKM18-LCVA
Code for CIKM'18 paper, Linked Causal Variational Autoencoder for Inferring Paired Spillover Effects.
Stars: ✭ 13 (-45.83%)
Mutual labels:  causal-inference, treatment-effects
ACE
Code for our paper, Neural Network Attributions: A Causal Perspective (ICML 2019).
Stars: ✭ 47 (+95.83%)
Mutual labels:  causality
CREST
A Causal Relation Schema for Text
Stars: ✭ 19 (-20.83%)
Mutual labels:  causality
Coz
Coz: Causal Profiling
Stars: ✭ 2,719 (+11229.17%)
Mutual labels:  causal-inference
cobalt
Covariate Balance Tables and Plots - An R package for assessing covariate balance
Stars: ✭ 52 (+116.67%)
Mutual labels:  causal-inference
Causal-Deconvolution-of-Networks
Causal Deconvolution of Networks by Algorithmic Generative Models
Stars: ✭ 25 (+4.17%)
Mutual labels:  causality
ENCO
Official repository of the paper "Efficient Neural Causal Discovery without Acyclicity Constraints"
Stars: ✭ 52 (+116.67%)
Mutual labels:  causality
Pgmpy
Python Library for learning (Structure and Parameter) and inference (Probabilistic and Causal) in Bayesian Networks.
Stars: ✭ 1,942 (+7991.67%)
Mutual labels:  causal-inference
invariant-risk-minimization
Implementation of Invariant Risk Minimization https://arxiv.org/abs/1907.02893
Stars: ✭ 76 (+216.67%)
Mutual labels:  causality
Generalization-Causality
Reading notes on various research topics, including domain generalization, domain adaptation, causality, robustness, prompting, optimization, and generative models
Stars: ✭ 482 (+1908.33%)
Mutual labels:  causality

cfml_tools: Counterfactual Machine Learning Tools

For a long time, ML practitioners and statisticians repeated the same mantra: correlation is not causation. This warning prevented (at least some) people from drawing wrong conclusions from models. However, it also created the misconception that models cannot be causal. With a few tweaks drawn from the causal inference literature (DeepIV, Generalized Random Forests, Causal Trees) and the reinforcement learning literature (bandits, Thompson Sampling), we can actually make machine learning methods aware of causality!

cfml_tools is my collection of causal inference algorithms built on top of accessible, simple, out-of-the-box ML methods, aimed at being explainable and useful in the business context.

Installation

Open up your terminal and run:

git clone https://github.com/gdmarmerola/cfml_tools.git
cd cfml_tools
python setup.py install
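
Alternatively, since the project ships a standard setup.py, installing from the cloned directory with pip should work as well:

pip install .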

Basic Usage

Use the example dataset to test the package:

from cfml_tools.utils import make_confounded_data

# courtesy of Nubank's fklearn library
df_rnd, df_obs, df_cf = make_confounded_data(500000)

# organizing data into X, W and y
X = df_obs[['sex','age','severity']]
W = df_obs['medication'].astype(int)
y = df_obs['recovery']

The package uses a scikit-learn-like API and is fairly easy to use. Our end goal is to obtain counterfactual predictions, which come from the .predict() method: it returns a dataframe with the expected outcome under each treatment, whenever it is possible to calculate it.

# importing cfml-tools
from cfml_tools.tree import DecisionTreeCounterfactual

# instance of DecisionTreeCounterfactual
dtcf = DecisionTreeCounterfactual(save_explanatory=True)

# fitting model to our data
dtcf.fit(X, W, y)

# predicting counterfactuals
counterfactuals = dtcf.predict(X)
counterfactuals.iloc[5:10]

DecisionTreeCounterfactual, in particular, builds a decision tree to solve a regression or classification problem from explanatory variables X to target y, and then compares outcomes for every treatment W at each leaf node to build counterfactual predictions. It yields great results on fklearn's causal inference problem out-of-the-box.
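
To make the per-leaf comparison concrete, here is a minimal sketch of the idea using plain scikit-learn and pandas. It is an illustration under assumptions, not the library's actual implementation, and min_samples_leaf=100 is an arbitrary choice:

import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# fit a regular regression tree from X to y, ignoring treatment
tree = DecisionTreeRegressor(min_samples_leaf=100).fit(X, y)

# assign each sample to a leaf node
leaves = tree.apply(X)

# within each leaf, average the observed outcome per treatment:
# rows are leaves, columns are treatments, values are outcome estimates
leaf_outcomes = (
    pd.DataFrame({'leaf': leaves, 'W': W.values, 'y': y.values})
    .groupby(['leaf', 'W'])['y'].mean()
    .unstack('W')
)

# a sample's counterfactual predictions are its leaf's row in this table
counterfactual_estimates = leaf_outcomes.loc[leaves].reset_index(drop=True)

Leaves that contain no samples of a given treatment yield NaNs in this table, which is why counterfactuals can only be returned "when it is possible to calculate it".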

It is also very fast, even on the 500k-record example dataset.

Additional features

Run cross validation to get X to y prediction results

Check how well the underlying model predicts the outcome, regardless of treatment (these scores should be as high as possible):

cv_scores = dtcf.get_cross_val_scores(X, y)
print(cv_scores)
[0.55007156 0.54505553 0.54595812 0.55107778 0.5513648 ]
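
The same sanity check can be reproduced with plain scikit-learn. This is a hedged sketch: the estimator, fold count, and metric used internally by get_cross_val_scores may differ:

from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# how well does X predict y, ignoring treatment? (R^2 by default for regressors)
print(cross_val_score(DecisionTreeRegressor(min_samples_leaf=100), X, y, cv=5))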

Explain counterfactual predictions using leaves

The .explain() method provides explanations using the samples in the leaf node that produced the counterfactual prediction.

# sample we wish to explain
test_sample = X.iloc[[5000]]

# running explanation
comparables_table = dtcf.explain(test_sample)
comparables_table.groupby('W').head(2)
index  sex  age  severity  W   y
5000     1   36      0.23  0  52
5719     1   35      0.25  0  38
23189    1   37      0.22  1  13
35839    1   35      0.25  1  11

This way, we can compare samples and check whether we can rely on the estimated effect. In this particular case, it seems that we can rely on the prediction, as we have very similar individuals in the treated and untreated groups:

import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 4, figsize=(16, 5), dpi=150)
comparables_table.boxplot('age','W', ax=ax[0])
comparables_table.boxplot('sex','W', ax=ax[1])
comparables_table.boxplot('severity','W', ax=ax[2])
comparables_table.boxplot('y','W', ax=ax[3])

[Experimental] Further criticize the model using leaf diagnostics

We can inspect the model further by using the .run_leaf_diagnostics() method.

# running leaf diagnostics
leaf_diagnostics_df = dtcf.run_leaf_diagnostics()
leaf_diagnostics_df.head()

The method provides diagnostics on the leaves that are valid for counterfactual inference, showing some interesting quantities:

  • average outcomes across treatments (avg_outcome)
  • explanatory variable distribution across treatments (percentile_* variables)
  • a confounding score for each variable, meaning how much we can predict the treatment from explanatory variables inside leaf nodes using a linear model (confounding_score)

In particular, confounding_score tells us whether treatments are not randomly assigned given the explanatory variables; such non-random assignment is a big source of bias in causal inference models. The bigger this score gets, the more we tend to miss the real effect.
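
To illustrate what such a score could look like, here is a hedged sketch for a single leaf, using a logistic regression from X to W scored by AUC. The library's exact formulation may differ, and X_leaf and W_leaf are hypothetical slices holding one leaf's samples:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def confounding_score(X_leaf, W_leaf):
    # AUC of a linear model predicting treatment from X inside one leaf:
    # ~0.5 means assignment looks random, higher values signal confounding
    aucs = cross_val_score(LogisticRegression(max_iter=1000),
                           X_leaf, W_leaf, cv=3, scoring='roc_auc')
    return np.mean(aucs)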

[Experimental] Better visualize and understand your problem with forest embeddings

Besides DecisionTreeCounterfactual, we provide ForestEmbeddingsCounterfactual, which is still in an experimental phase. A cool thing to do with this model is plot the forest embeddings of your problem. The method uses leaf co-occurrence as a similarity metric and UMAP for dimensionality reduction.

# assuming ForestEmbeddingsCounterfactual lives in cfml_tools.forest
from cfml_tools.forest import ForestEmbeddingsCounterfactual

# fitting the model, then getting the embedding from data
fecf = ForestEmbeddingsCounterfactual()
fecf.fit(X, W, y)
reduced_embed = fecf.get_umap_embedding(X)
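
To see what "leaf co-occurrence as a similarity metric" means, here is a minimal sketch of the idea using a plain random forest and the umap-learn package. This is an illustration of the technique, not the library's actual code:

import umap
from sklearn.ensemble import RandomForestRegressor

# represent each sample by the vector of leaves it falls into, one per tree
forest = RandomForestRegressor(n_estimators=100, min_samples_leaf=100).fit(X, y)
leaf_matrix = forest.apply(X)  # shape: (n_samples, n_trees)

# samples that share many leaves are similar: Hamming distance on leaf vectors
embedding = umap.UMAP(metric='hamming').fit_transform(leaf_matrix)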

This allows for some cool visuals for diagnosing your problem, such as the distribution of features across the embedding, or how treatments and outcomes are distributed, to check "where" inference is valid.

Additional resources

You can check several additional resources in the repository.

I hope you'll use cfml_tools for your causal inference problems soon! All feedback is appreciated :)
