causaltext / Causal Text Papers
Curated research at the intersection of causal inference and natural language processing.
Papers about Causal Inference and Language
A collection of papers and codebases about influence, causality, and language.
Pull requests welcome!
Datasets and Simulations
Type | Description | Code |
---|---|---|
Semi-simulated | Given text (Amazon reviews), extracts treatments (0 or 5 stars) and confounds (product type), then samples outcomes (sales) conditioned on the extracted treatments and confounds. | git |
Fully synthetic | Samples outcomes, treatments, and confounds from binomial distributions, then words from a uniform distribution conditioned on those sampled variables. | git |
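
A minimal sketch of what a fully synthetic generator of this kind could look like. It is not the code from the linked repository: the function names, the probabilities, the 0.4 treatment effect, and the vocabulary layout are all assumptions made for illustration. Confounds, treatments, and outcomes are Bernoulli (single-trial binomial) draws, and words are drawn uniformly from a vocabulary block selected by the sampled treatment and confound. Because the true effect is known by construction, a naive difference in means can be compared against an estimate that adjusts for the confound.

```python
import random

TRUE_EFFECT = 0.4  # assumed effect of treatment on P(outcome = 1)


def simulate(n, seed=0, doc_len=8):
    """Draw (confound, treatment, outcome, words) tuples from a toy model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = int(rng.random() < 0.5)                   # binary confound
        t = int(rng.random() < (0.8 if c else 0.2))   # treatment depends on confound
        y = int(rng.random() < 0.1 + TRUE_EFFECT * t + 0.3 * c)  # binary outcome
        # Words are uniform over a 10-word vocabulary block picked by (t, c).
        start = 10 * (2 * t + c)
        words = [f"w{rng.randrange(start, start + 10)}" for _ in range(doc_len)]
        rows.append((c, t, y, words))
    return rows


def mean(values):
    return sum(values) / len(values)


def naive_effect(rows):
    """Difference in mean outcome between treated and untreated rows."""
    treated = [y for _, t, y, _ in rows if t == 1]
    control = [y for _, t, y, _ in rows if t == 0]
    return mean(treated) - mean(control)


def adjusted_effect(rows):
    """Average the within-stratum differences, weighted by stratum size."""
    total = 0.0
    for level in (0, 1):
        stratum = [row for row in rows if row[0] == level]
        total += naive_effect(stratum) * len(stratum) / len(rows)
    return total


if __name__ == "__main__":
    data = simulate(20000)
    print("naive:", naive_effect(data))
    print("adjusted:", adjusted_effect(data))
```

In this toy model the naive estimate is biased upward because the confound raises both the treatment probability and the outcome probability, while the stratified estimate should land close to `TRUE_EFFECT`. With very small `n` a stratum may contain no treated or no control rows, in which case `mean` raises `ZeroDivisionError`.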
Learning resources and blog posts
Title | Description | Code |
---|---|---|
Text and Causal Inference: A Review of Using Text to Remove Confounding from Causal Estimates<br>Katherine A. Keith, David Jensen, and Brendan O’Connor | Survey of studies that use text to remove confounding. Also highlights numerous open problems in the space of text and causal inference. | |
Text Feature Selection for Causal Inference<br>Reid Pryzant and Dan Jurafsky | Blog post about text as treatment (operationalized through lexicons). | git |
Causal Inference with Text Variables
Text as treatment
Title | Description | Code |
---|---|---|
Causal Effects of Linguistic Properties<br>Reid Pryzant, Dallas Card, Dan Jurafsky, Victor Veitch, Dhanya Sridhar | Develops an adjustment procedure for text-based causal inference with classifier-based treatments. Proves bounds on the bias. | git |
Challenges of Using Text Classifiers for Causal Inference<br>Zach Wood-Doughty, Ilya Shpitser, Mark Dredze | Looks at different errors that can stem from estimating treatment labels with classifiers and proposes adjustments to account for those errors. | git |
Deconfounded Lexicon Induction for Interpretable Social Science<br>Reid Pryzant, Kelly Shen, Dan Jurafsky, Stefan Wager | Looks at the effect of text as manifested in lexicons or individual words; proposes algorithms for estimating effects and evaluating lexicons. | git |
How to Make Causal Inferences Using Texts<br>Naoki Egami, Christian J. Fong, Justin Grimmer, Margaret E. Roberts, and Brandon M. Stewart | (Also text as outcome.) Covers the assumptions needed for text as treatment and concludes that you should use a train/test split. | |
Discovery of treatments from text corpora<br>Christian Fong, Justin Grimmer | Proposes a new experimental design and statistical model to simultaneously discover treatments in a corpus and estimate causal effects for the discovered treatments. | |
The effect of wording on message propagation: Topic and author-controlled natural experiments on twitter<br>Chenhao Tan, Lillian Lee, and Bo Pang | Controls for confounding by looking at tweets containing the same URL and written by the same user but employing different wording. | |
Text as mediator
Title | Description | Code |
---|---|---|
Adapting Text Embeddings for Causal Inference<br>Victor Veitch, Dhanya Sridhar, and David Blei | (Also text as confounder.) Adapts BERT embeddings for causal inference by predicting propensity scores and potential outcomes alongside a masked language modeling objective. | tensorflow pytorch |
Text as outcome
Title | Description | Code |
---|---|---|
Estimating Causal Effects of Tone in Online Debates<br>Dhanya Sridhar and Lise Getoor | (Also text as confounder.) Looks at the effect of reply tone on the sentiment of subsequent responses in online debates. | git |
How Judicial Identity Changes the Text of Legal Rulings<br>Michael Gill and Andrew Hall | Looks at how the random assignment of a female judge or a non-white judge affects the language of legal rulings. | |
Measuring semantic similarity of clinical trial outcomes using deep pre-trained language representations<br>Anna Koroleva, Sanjay Kamath, Patrick Paroubek | | |
Text as confounder
Title | Description | Code |
---|---|---|
Text and Causal Inference: A Review of Using Text to Remove Confounding from Causal Estimates<br>Katherine A. Keith, David Jensen, and Brendan O’Connor | Survey of studies that use text to remove confounding. Also highlights numerous open problems in the space of text and causal inference. | |
Adjusting for confounding with text matching<br>Margaret E Roberts, Brandon M Stewart, and Richard A Nielsen | Estimates a low-dimensional summary of the text and conditions on this summary via matching to remove confounding. Proposes a text matching method, topical inverse regression matching, that matches on both the topical content and the propensity score. | |
Matching with text data: An experimental evaluation of methods for matching documents and of measuring match quality<br>Reagan Mozer, Luke Miratrix, Aaron Russell Kaufman, L Jason Anastasopoulos | Characterizes and empirically evaluates a framework for matching text documents that decomposes existing methods into two choices: the text representation and the distance metric. | |
Learning representations for counterfactual inference<br>Fredrik Johansson, Uri Shalit, David Sontag | One of their semi-synthetic experiments has news content as a confounder. | |
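
Several entries in this table rest on the same two ingredients: a text representation and a distance metric used to pair treated documents with similar control documents. The sketch below illustrates that decomposition only; it is not taken from any of the papers listed. The function names are hypothetical, and bag-of-words counts with cosine distance are assumed as one simple choice for each ingredient. Matching is nearest-neighbor with replacement.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Representation: lowercase whitespace tokens mapped to counts."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Distance metric: 1 minus the cosine similarity of two count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def match_documents(treated, control,
                    represent=bag_of_words, distance=cosine_distance):
    """Pair each treated document with its nearest control document.

    Returns (treated_index, control_index) pairs. Controls may be reused.
    """
    control_vecs = [represent(doc) for doc in control]
    matches = []
    for i, doc in enumerate(treated):
        vec = represent(doc)
        j = min(range(len(control_vecs)),
                key=lambda k: distance(vec, control_vecs[k]))
        matches.append((i, j))
    return matches


if __name__ == "__main__":
    treated_docs = ["the battery life is great", "shipping was slow and late"]
    control_docs = ["delivery was slow", "great battery", "nice color"]
    print(match_documents(treated_docs, control_docs))
```

Passing a different `represent` or `distance` function swaps one ingredient without touching the other, which is the point of treating them as separate choices.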
Causality to Improve NLP
Causal interpretations and explanations
Title | Description | Code |
---|---|---|
CausaLM: Causal Model Explanation Through Counterfactual Language Models<br>Amir Feder, Nadav Oved, Uri Shalit and Roi Reichart | Suggests a method for generating causal explanations through counterfactual language representations. | git |
Causal Mediation Analysis for Interpreting Neural NLP: The Case of Gender Bias<br>Jesse Vig, Sebastian Gehrmann, Yonatan Belinkov, Sharon Qian, Daniel Nevo, Yaron Singer and Stuart Shieber | Uses causal mediation analysis to interpret NLP models. | git |
Sensitivity and Robustness
Title | Description | Code |
---|---|---|
Applications in the Social Sciences
Linguistics
Title | Description | Code |
---|---|---|
Decoupling entrainment from consistency using deep neural networks<br>Andreas Weise, Rivka Levitan | Isolated the individual style of a speaker when modeling entrainment in speech. | |
Estimating causal effects of exercise from mood logging data<br>Dhanya Sridhar, Aaron Springer, Victoria Hollis, Steve Whittaker, Lise Getoor | Confounder: text of mood triggers. Confounding adjustment method: propensity score matching. | |
Marketing
Title | Description | Code |
---|---|---|
Predicting Sales from the Language of Product Descriptions<br>Reid Pryzant, Young-Joo Chung, and Dan Jurafsky | Found features of product descriptions most predictive of sales while controlling for brand & price. | git |
Interpretable Neural Architectures for Attributing an Ad’s Performance to its Writing Style<br>Reid Pryzant, Kazoo Sone, and Sugato Basu | Found features of ad copy most predictive of high CTR while controlling for advertiser and targeting. | git |
Persuasion & Argumentation
Title | Description | Code |
---|---|---|
Influence via Ethos: On the Persuasive Power of Reputation in Deliberation Online<br>Emaad Manzoor, George H. Chen, Dokyun Lee, Michael D. Smith | Controls for unstructured argument text using neural models of language in the double machine-learning framework. | |
Mental Health
Title | Description | Code |
---|---|---|
The language of social support in social media and its effect on suicidal ideation risk<br>Munmun De Choudhury and Emre Kiciman | Confounder: previous text written in a Reddit forum. Confounding adjustment method: stratified propensity score matching. | |
Discovering shifts to suicidal ideation from mental health content in social media<br>Munmun De Choudhury, Emre Kiciman, Mark Dredze, Glen Coppersmith, Mrinal Kumar | Confounder: user’s previous posts and comments received. Confounding adjustment method: stratified propensity score matching. | |
Psychology
Title | Description | Code |
---|---|---|
Increasing vegetable intake by emphasizing tasty and enjoyable attributes: A randomized controlled multisite intervention for taste-focused labeling<br>Bradley Turnwald, Jaclyn Bertoldo, Margaret Perry, Peggy Policastro, Maureen Timmons, Christopher Bosso, Priscilla Connors, Robert Valgenti, Lindsey Pine, Ghislaine Challamel, Christopher Gardner, Alia Crum | Ran an RCT on cafeteria food labels, observing the effect on how much of those foods students took. | |
A social media study on the effects of psychiatric medication use<br>Koustuv Saha, Benjamin Sugar, John Torous, Bruno Abrahao, Emre Kıcıman, Munmun De Choudhury | Confounder: users' previous posts on Twitter. Confounding adjustment method: stratified propensity score matching. | |
Economics
Title | Description | Code |
---|---|---|
A deep causal inference approach to measuring the effects of forming group loans in online non-profit microfinance platform<br>Thai T Pham and Yuanyuan Shen | Confounder: microloan descriptions on Kiva. Confounding adjustment method: A-IPTW, TMLE on embeddings. | |
Bias and Fairness
Title | Description | Code |
---|---|---|
Unsupervised Discovery of Implicit Gender Bias | Propensity score matching and adversarial learning to get a model to focus on bias instead of other artifacts. | |
Tweetment Effects on the Tweeted: Experimentally Reducing Racist Harassment<br>Kevin Munger | Ran an RCT sending de-escalation messages to racist Twitter users, varying the "from" user and observing effects on downstream behavior. | |
Social Media
Title | Description | Code |
---|---|---|
Estimating the effect of exercising on users online behavior<br>Seyed Amin Mirlohi Falavarjani, Hawre Hosseini, Zeinab Noorian, Ebrahim Bagheri | Confounder: pre-treatment topical interest shift. Confounding adjustment method: matching on topic models. | |
Distilling the outcomes of personal experiences: A propensity-scored analysis of social media<br>Alexandra Olteanu, Onur Varol, Emre Kiciman | Confounder: past word use on Twitter. Confounding adjustment method: stratified propensity score matching. | |
Using longitudinal social media analysis to understand the effects of early college alcohol use<br>Emre Kiciman, Scott Counts, Melissa Gasser | Confounder: previous posts on Twitter. Confounding adjustment method: stratified propensity score matching. | |