
amaiya / causalnlp

License: Apache-2.0
CausalNLP is a practical toolkit for causal inference with text as treatment, outcome, or "controlled-for" variable.

Programming Languages: Jupyter Notebook, Python, Makefile

Projects that are alternatives of or similar to causalnlp

cibookex-r
Causal Inference: What If. R and Stata code for Exercises
Stars: ✭ 54 (-44.9%)
Mutual labels:  causal-inference
CIKM18-LCVA
Code for CIKM'18 paper, Linked Causal Variational Autoencoder for Inferring Paired Spillover Effects.
Stars: ✭ 13 (-86.73%)
Mutual labels:  causal-inference
Causalml
Uplift modeling and causal inference with machine learning algorithms
Stars: ✭ 2,499 (+2450%)
Mutual labels:  causal-inference
Causal Reading Group
We will keep updating the paper list about machine learning + causal theory. We also internally discuss related papers between NExT++ (NUS) and LDS (USTC) by week.
Stars: ✭ 339 (+245.92%)
Mutual labels:  causal-inference
causal-learn
Causal Discovery for Python. Translation and extension of the Tetrad Java code.
Stars: ✭ 428 (+336.73%)
Mutual labels:  causal-inference
cobalt
Covariate Balance Tables and Plots - An R package for assessing covariate balance
Stars: ✭ 52 (-46.94%)
Mutual labels:  causal-inference
drtmle
Nonparametric estimators of the average treatment effect with doubly-robust confidence intervals and hypothesis tests
Stars: ✭ 14 (-85.71%)
Mutual labels:  causal-inference
tlverse-handbook
🎯 📕 Targeted Learning in R: A Causal Data Science Handbook
Stars: ✭ 50 (-48.98%)
Mutual labels:  causal-inference
pcalg-py
Implementation of the PC algorithm in Python
Stars: ✭ 52 (-46.94%)
Mutual labels:  causal-inference
Pgmpy
Python Library for learning (Structure and Parameter) and inference (Probabilistic and Causal) in Bayesian Networks.
Stars: ✭ 1,942 (+1881.63%)
Mutual labels:  causal-inference
doubleml-for-py
DoubleML - Double Machine Learning in Python
Stars: ✭ 129 (+31.63%)
Mutual labels:  causal-inference
doubleml-for-r
DoubleML - Double Machine Learning in R
Stars: ✭ 58 (-40.82%)
Mutual labels:  causal-inference
causaldag
Python package for the creation, manipulation, and learning of Causal DAGs
Stars: ✭ 82 (-16.33%)
Mutual labels:  causal-inference
Awesome-Neural-Logic
Awesome Neural Logic and Causality: MLN, NLRL, NLM, etc. Causal inference, neural logic, and frontier research on logical reasoning for strong AI.
Stars: ✭ 106 (+8.16%)
Mutual labels:  causal-inference
Coz
Coz: Causal Profiling
Stars: ✭ 2,719 (+2674.49%)
Mutual labels:  causal-inference
evalsp20.classes.andrewheiss.com
🎓 GSU MPA/MPP course on program evaluation and causal inference
Stars: ✭ 22 (-77.55%)
Mutual labels:  causal-inference
Python-for-Epidemiologists
Tutorial in Python targeted at Epidemiologists. Will discuss the basics of analysis in Python 3
Stars: ✭ 107 (+9.18%)
Mutual labels:  causal-inference
cfvqa
[CVPR 2021] Counterfactual VQA: A Cause-Effect Look at Language Bias
Stars: ✭ 96 (-2.04%)
Mutual labels:  causal-inference
cfml tools
My collection of causal inference algorithms built on top of accessible, simple, out-of-the-box ML methods, aimed at being explainable and useful in the business context
Stars: ✭ 24 (-75.51%)
Mutual labels:  causal-inference
Dowhy
DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
Stars: ✭ 3,480 (+3451.02%)
Mutual labels:  causal-inference

Welcome to CausalNLP

What is CausalNLP?

CausalNLP is a practical toolkit for causal inference with text as treatment, outcome, or "controlled-for" variable.

Features

Install

  1. pip install -U pip
  2. pip install causalnlp

NOTE: On Python 3.6.x, if you get "RuntimeError: Python version >= 3.7 required", try installing NumPy before installing CausalNLP (e.g., pip install numpy==1.18.5).

Usage

To try out the examples yourself, open the example notebook in Google Colab.

Example: What is the causal impact of a positive review on a product click?

import pandas as pd
df = pd.read_csv('sample_data/music_seed50.tsv', sep='\t', on_bad_lines='skip')  # skip malformed rows; on_bad_lines needs pandas >= 1.3 (older versions: error_bad_lines=False)

The file music_seed50.tsv is a semi-simulated dataset from here. Columns of relevance include:

  • Y_sim: outcome, where 1 means product was clicked and 0 means not.
  • text: raw text of review
  • rating: rating associated with review (1 through 5)
  • T_true: 0 if the rating is less than 3, 1 if the rating is 5; T_true affects the outcome Y_sim.
  • T_ac: an approximation of true review sentiment (T_true) created with Autocoder from raw review text
  • C_true: confounding categorical variable (1 = audio CD, 0 = other)
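The column descriptions imply a simple rule for turning a star rating into the binary treatment T_true. The sketch below assumes that reviews rated 3 or 4 receive no treatment value and are dropped, since the description only defines T_true for ratings below 3 and for ratings of 5. The helper name rating_to_treatment is hypothetical and not part of CausalNLP.

```python
from typing import Optional


def rating_to_treatment(rating: int) -> Optional[int]:
    """Map a 1-5 star rating to a binary treatment (hypothetical helper).

    Assumption: ratings of 3 or 4 have no treatment value and are excluded.
    """
    if rating < 3:
        return 0  # negative review -> control group
    if rating == 5:
        return 1  # positive review -> treated group
    return None   # 3 or 4: excluded under our assumption


ratings = [1, 2, 3, 4, 5, 5]
# Keep only (rating, treatment) pairs that have a defined treatment.
pairs = [(r, rating_to_treatment(r)) for r in ratings if rating_to_treatment(r) is not None]
print(pairs)  # [(1, 0), (2, 0), (5, 1), (5, 1)]
```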

We'll pretend the true sentiment (i.e., review rating and T_true) is hidden and only use T_ac as the treatment variable.

Using the text_col parameter, we include the raw review text as another "controlled-for" variable.

from causalnlp import CausalInferenceModel
from lightgbm import LGBMClassifier
cm = CausalInferenceModel(df, 
                         metalearner_type='t-learner', learner=LGBMClassifier(num_leaves=500),
                         treatment_col='T_ac', outcome_col='Y_sim', text_col='text',
                         include_cols=['C_true'])
cm.fit()
outcome column (categorical): Y_sim
treatment column: T_ac
numerical/categorical covariates: ['C_true']
text covariate: text
preprocess time:  1.1179866790771484  sec
start fitting causal inference model
time to fit causal inference model:  10.361494302749634  sec

Estimating Treatment Effects

CausalNLP supports estimation of heterogeneous treatment effects (i.e., how causal impacts vary across observations, which could be documents, emails, posts, individuals, or organizations).

We will first calculate the overall average treatment effect (or ATE), which shows that a positive review increases the probability of a click by 13 percentage points in this dataset.

Average Treatment Effect (or ATE):

print( cm.estimate_ate() )
{'ate': 0.1309311542209525}
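With metalearner_type='t-learner', one outcome model is fit on treated rows and another on control rows; a row's treatment effect is the difference between the two models' predictions, and the ATE is the average of those differences. The sketch below illustrates that idea on toy data, using a deliberately simple base learner (the mean outcome within each C_true stratum) in place of LGBMClassifier. It is an illustration of the T-learner technique, not CausalNLP's implementation.

```python
from collections import defaultdict
from statistics import mean


def fit_stratum_means(rows):
    """Toy base learner: mean outcome per covariate value (stand-in for LGBMClassifier)."""
    groups = defaultdict(list)
    for x, y in rows:
        groups[x].append(y)
    return {x: mean(ys) for x, ys in groups.items()}


def t_learner(data):
    """Fit separate outcome models on control and treated rows.

    data: list of (covariate, treatment, outcome) tuples.
    Returns (mu0, mu1): predicted outcome without / with treatment, keyed by covariate.
    """
    control = [(x, y) for x, t, y in data if t == 0]
    treated = [(x, y) for x, t, y in data if t == 1]
    return fit_stratum_means(control), fit_stratum_means(treated)


# Toy rows: (C_true, T, Y). Each C_true stratum has both treated and control rows.
data = [
    (1, 1, 1), (1, 1, 1), (1, 0, 1), (1, 0, 0),
    (0, 1, 1), (0, 1, 1), (0, 0, 0), (0, 0, 0),
]
mu0, mu1 = t_learner(data)

# Individual effect per row: predicted outcome if treated minus if untreated.
ites = [mu1[x] - mu0[x] for x, _, _ in data]
ate = mean(ites)                   # average over all rows
new_row_effect = mu1[1] - mu0[1]   # effect for a new row with C_true = 1
print(ate, new_row_effect)  # 0.75 0.5
```

The effect differs by stratum (0.5 for C_true = 1, 1 for C_true = 0), which is the kind of heterogeneity the CATE and ITE estimates expose. Unlike a real learner such as LightGBM, stratum means cannot predict for covariate values never seen in training.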

Conditional Average Treatment Effect (or CATE): reviews that mention the word "toddler":

print( cm.estimate_ate(df['text'].str.contains('toddler')) )
{'ate': 0.15559234254638685}
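A CATE like the one above is the same averaging step restricted to the rows selected by a boolean mask (here, reviews mentioning "toddler"). The sketch below assumes per-row treatment-effect estimates are already available; the texts and effect values are made up for illustration.

```python
from statistics import mean

texts = [
    "my toddler loves this cd",
    "great album, great songs",
    "bought it for my toddler",
    "not my kind of music",
]
# Hypothetical per-row treatment-effect estimates (e.g., from a T-learner).
ites = [0.25, 0.0, 0.75, 0.5]

# Boolean mask, analogous to df['text'].str.contains('toddler').
mask = ["toddler" in t for t in texts]

cate = mean(e for e, keep in zip(ites, mask) if keep)  # average over masked rows only
ate = mean(ites)                                        # average over all rows
print(cate, ate)  # 0.5 0.375
```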

Individualized Treatment Effects (or ITE):

test_df = pd.DataFrame({'T_ac' : [1], 'C_true' : [1], 
                        'text' : ['I never bought this album, but I love his music and will soon!']})
effect = cm.predict(test_df)
print(effect)
[[0.80538201]]

Model Interpretability:

print( cm.interpret(plot=False)[1][:10] )
v_music    0.079042
v_cd       0.066838
v_album    0.055168
v_like     0.040784
v_love     0.040635
C_true     0.039949
v_just     0.035671
v_song     0.035362
v_great    0.029918
v_heard    0.028373
dtype: float64

Features with the v_ prefix are word features. C_true is the categorical variable indicating whether or not the product is a CD.
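The v_ features are word-level covariates derived from the text column. How CausalNLP vectorizes text internally is not described here, so the sketch below assumes a simple binary bag-of-words with a v_ prefix, only to show how such columns could be built from raw reviews. The helper word_features and its min_count parameter are hypothetical.

```python
import re


def word_features(texts, min_count=1, prefix="v_"):
    """Binary bag-of-words (hypothetical helper): one column per word, 1 if it occurs."""
    tokenized = [set(re.findall(r"[a-z']+", t.lower())) for t in texts]
    # Document frequency per word, used to drop rare words.
    doc_freq = {}
    for words in tokenized:
        for w in words:
            doc_freq[w] = doc_freq.get(w, 0) + 1
    vocab = sorted(w for w, c in doc_freq.items() if c >= min_count)
    rows = [{prefix + w: int(w in words) for w in vocab} for words in tokenized]
    return vocab, rows


vocab, rows = word_features(["I love this album", "Great album, great music"], min_count=2)
print(vocab)  # ['album']
print(rows)   # [{'v_album': 1}, {'v_album': 1}]
```

Prefixing word columns keeps them from colliding with ordinary covariates such as C_true when both are passed to the same learner.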

Text is Optional in CausalNLP

Despite the "NLP" in CausalNLP, the library can be used for causal inference on data without text (e.g., only numerical and categorical variables). See the examples for more info.

Documentation

API documentation and additional usage examples are available at: https://amaiya.github.io/causalnlp/

How to Cite

Please cite the following paper when using CausalNLP in your work:

@article{maiya2021causalnlp,
    title={CausalNLP: A Practical Toolkit for Causal Inference with Text},
    author={Arun S. Maiya},
    year={2021},
    eprint={2106.08043},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    journal={arXiv preprint arXiv:2106.08043},
}