Cheap and reliable Node.js hosting starts at $3/month, and $1/month static HTML hosting

Created with love in Canada, visit hostnodejs.com today

Feel like to post an Ad? Learn Details

All Projects → thunlp → Taadpapers

thunlp / Taadpapers

Must-read Papers on Textual Adversarial Attack and Defense

Labels

nlp adversarial-learning

Projects that are alternatives of or similar to Taadpapers

linguistic-style-transfer-pytorch

Implementation of "Disentangled Representation Learning for Non-Parallel Text Style Transfer(ACL 2019)" in Pytorch

Stars: ✭ 55 (-93.12%)

Mutual labels: adversarial-learning

🗣️ NALP is a library that covers Natural Adversarial Language Processing.

Stars: ✭ 17 (-97.87%)

Mutual labels: adversarial-learning

Adversarial Learning for Semi-supervised Semantic Segmentation, BMVC 2018

Stars: ✭ 382 (-52.25%)

Mutual labels: adversarial-learning

cool-papers-in-pytorch

Reimplementing cool papers in PyTorch...

Stars: ✭ 21 (-97.37%)

Mutual labels: adversarial-learning

Attending to Discriminative Certainty for Domain Adaptation

Stars: ✭ 17 (-97.87%)

Mutual labels: adversarial-learning

[IJCAI 2021] Cross-Domain Few-Shot Classification via Adversarial Task Augmentation

Stars: ✭ 21 (-97.37%)

Mutual labels: adversarial-learning

Adversarial-Learning-for-Generative-Conversational-Agents

This repository contains a new adversarial training method for Generative Conversational Agents

Stars: ✭ 71 (-91.12%)

Mutual labels: adversarial-learning

Next RecSys Library

Stars: ✭ 731 (-8.62%)

Mutual labels: adversarial-learning

Improving Document Binarization via Adversarial Noise-Texture Augmentation

Stars: ✭ 34 (-95.75%)

Mutual labels: adversarial-learning

[CVPR 2019 Oral] Multi-Channel Attention Selection GAN with Cascaded Semantic Guidance for Cross-View Image Translation

Stars: ✭ 366 (-54.25%)

Mutual labels: adversarial-learning

Semantic Pyramid for Image Generation

PyTorch reimplementation of the paper: "Semantic Pyramid for Image Generation" [CVPR 2020].

Stars: ✭ 45 (-94.37%)

Mutual labels: adversarial-learning

Guiding Entity Alignment via Adversarial Knowledge Embedding

Stars: ✭ 15 (-98.12%)

Mutual labels: adversarial-learning

Awesome Domain Adaptation Python Toolbox

Stars: ✭ 46 (-94.25%)

Mutual labels: adversarial-learning

Implementation of 'Self-Adversarial Variational Autoencoder with Gaussian Anomaly Prior Distribution for Anomaly Detection'

Stars: ✭ 17 (-97.87%)

Mutual labels: adversarial-learning

Domain-Adversarial Neural Network in Tensorflow

Stars: ✭ 556 (-30.5%)

Mutual labels: adversarial-learning

Open set domain adaptation

Tensorflow Implementation of open set domain adaptation by backpropagation

Stars: ✭ 27 (-96.62%)

Mutual labels: adversarial-learning

Gym environments modified with adversarial agents

Stars: ✭ 26 (-96.75%)

Mutual labels: adversarial-learning

Neural Structured Learning

Training neural models with structured signals.

Stars: ✭ 790 (-1.25%)

Mutual labels: adversarial-learning

Learning to Adapt Structured Output Space for Semantic Segmentation, CVPR 2018 (spotlight)

Stars: ✭ 654 (-18.25%)

Mutual labels: adversarial-learning

Adversarial Examples Pytorch

Implementation of Papers on Adversarial Examples

Stars: ✭ 293 (-63.37%)

Mutual labels: adversarial-learning

View All Similar Projects ➔

Must-read Papers on Textual Adversarial Attack and Defense (TAAD)

Mainly Contributed and Maintained by Fanchao Qi, Chenghao Yang and Yuan Zang.

Thanks for all great contributors on GitHub!

Contents

0. Toolkits
1. Survey Papers
2. Attack Papers (classified according to perturbation level)
3. Defense Papers
4. Certified Robustness
5. Benchmark and Evaluation
6. Other Papers
Acknowledgements

0. Toolkits

OpenAttack. Guoyang Zeng, Fanchao Qi, Qianrui Zhou, Tingji Zhang, Bairu Hou, Yuan Zang, Zhiyuan Liu, Maosong Sun. [website] [doc] [pdf]
TextAttack. John X. Morris, Eli Liﬂand, Jin Yong Yoo, Yanjun Qi. [website] [doc] [pdf]

1. Survey Papers

Towards a Robust Deep Neural Network in Texts: A Survey. Wenqi Wang, Lina Wang, Benxiao Tang, Run Wang, Aoshuang Ye. arXiv 2020. [pdf]
Adversarial Attacks on Deep Learning Models in Natural Language Processing: A Survey. Wei Emma Zhang, Quan Z. Sheng, Ahoud Alhazmi, Chenliang Li. ACM TIST 2020. [pdf]
Adversarial Attacks and Defenses in Images, Graphs and Text: A Review. Han Xu, Yao Ma, Haochen Liu, Debayan Deb, Hui Liu, Jiliang Tang, Anil K. Jain. arXiv 2019. [pdf]
Analysis Methods in Neural Language Processing: A Survey. Yonatan Belinkov, James Glass. TACL 2019. [pdf]

2. Attack Papers

Each paper is attached to one or more following labels indicating how much information the attack model knows about the victim model: gradient (=white, all information), score (output decision and scores), decision (only output decision) and blind (nothing)

2.1 Sentence-level Attack

CAT-Gen: Improving Robustness in NLP Models via Controlled Adversarial Text Generation. Tianlu Wang, Xuezhi Wang, Yao Qin, Ben Packer, Kang Lee, Jilin Chen, Alex Beutel, Ed Chi. EMNLP 2020. score [pdf]
Adversarial Attack and Defense of Structured Prediction Models. Wenjuan Han, Liwen Zhang, Yong Jiang, Kewei Tu. EMNLP 2020. blind [pdf] [code]
MALCOM: Generating Malicious Comments to Attack Neural Fake News Detection Models. Thai Le, Suhang Wang, Dongwon Lee. ICDM 2020. gradient [pdf] [code]
Improving the Robustness of Question Answering Systems to Question Paraphrasing. Wee Chung Gan, Hwee Tou Ng. ACL 2019. blind [pdf] [data]
Trick Me If You Can: Human-in-the-Loop Generation of Adversarial Examples for Question Answering. Eric Wallace, Pedro Rodriguez, Shi Feng, Ikuya Yamada, Jordan Boyd-Graber. TACL 2019. score [pdf]
PAWS: Paraphrase Adversaries from Word Scrambling. Yuan Zhang, Jason Baldridge, Luheng He. NAACL-HLT 2019. blind [pdf] [dataset]
Evaluating and Enhancing the Robustness of Dialogue Systems: A Case Study on a Negotiation Agent. Minhao Cheng, Wei Wei, Cho-Jui Hsieh. NAACL-HLT 2019. gradient score [pdf] [code]
Semantically Equivalent Adversarial Rules for Debugging NLP Models. Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin. ACL 2018. decision [pdf] [code]
Adversarially Regularising Neural NLI Models to Integrate Logical Background Knowledge. Pasquale Minervini, Sebastian Riedel. CoNLL 2018. score [pdf] [code&data]
Robust Machine Comprehension Models via Adversarial Training. Yicheng Wang, Mohit Bansal. NAACL-HLT 2018. decision [pdf] [dataset]
Adversarial Example Generation with Syntactically Controlled Paraphrase Networks. Mohit Iyyer, John Wieting, Kevin Gimpel, Luke Zettlemoyer. NAACL-HLT 2018. blind [pdf] [code&data]
Generating Natural Adversarial Examples. Zhengli Zhao, Dheeru Dua, Sameer Singh. ICLR 2018. decision [pdf] [code]
Adversarial Examples for Evaluating Reading Comprehension Systems. Robin Jia, and Percy Liang. EMNLP 2017. score decision blind [pdf] [code]
Adversarial Sets for Regularising Neural Link Predictors. Pasquale Minervini, Thomas Demeester, Tim Rocktäschel, Sebastian Riedel. UAI 2017. score [pdf] [code]

2.2 Word-level Attack

Generating Natural Language Attacks in a Hard Label Black Box Setting。 Rishabh Maheshwary, Saket Maheshwary and Vikram Pudi. AAAI 2021. decision [pdf] [code]
BERT-ATTACK: Adversarial Attack Against BERT Using BERT. Linyang Li, Ruotian Ma, Qipeng Guo, Xiangyang Xue, Xipeng Qiu. EMNLP 2020. score [pdf] [code]
BAE: BERT-based Adversarial Examples for Text Classification. Siddhant Garg, Goutham Ramakrishnan. EMNLP 2020. score [pdf]
Robustness to Modification with Shared Words in Paraphrase Identification. Zhouxing Shi, and Minlie Huang. Findings of ACL: EMNLP 2020. score [pdf]
Word-level Textual Adversarial Attacking as Combinatorial Optimization. Yuan Zang, Fanchao Qi, Chenghao Yang, Zhiyuan Liu, Meng Zhang, Qun Liu, Maosong Sun. ACL 2020. score [pdf] [code]
It's Morphin' Time! Combating Linguistic Discrimination with Inflectional Perturbations. Samson Tan, Shafiq Joty, Min-Yen Kan, Richard Socher. ACL 2020. score [pdf] [code]
On the Robustness of Language Encoders against Grammatical Errors. Fan Yin, Quanyu Long, Tao Meng, Kai-Wei Chang. ACL 2020. score [pdf] [code]
Evaluating and Enhancing the Robustness of Neural Network-based Dependency Parsing Models with Adversarial Examples. Xiaoqing Zheng, Jiehang Zeng, Yi Zhou, Cho-Jui Hsieh, Minhao Cheng, Xuanjing Huang. ACL 2020. gradient score [pdf] [code]
A Reinforced Generation of Adversarial Examples for Neural Machine Translation. Wei Zou, Shujian Huang, Jun Xie, Xinyu Dai, Jiajun Chen. ACL 2020. decision [pdf]
Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment. Di Jin, Zhijing Jin, Joey Tianyi Zhou, Peter Szolovits. AAAI 2020. score [pdf] [code]
Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples. Minhao Cheng, Jinfeng Yi, Pin-Yu Chen, Huan Zhang, Cho-Jui Hsieh. AAAI 2020. score [pdf] [code]
Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data. Puyudi Yang, Jianbo Chen, Cho-Jui Hsieh, Jane-LingWang, Michael I. Jordan. JMLR 2020. score [pdf] [code]
On the Robustness of Self-Attentive Models. Yu-Lun Hsieh, Minhao Cheng, Da-Cheng Juan, Wei Wei, Wen-Lian Hsu, and Cho-Jui Hsieh. ACL 2019. score [pdf]
Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency. Shuhuai Ren, Yihe Deng, Kun He, Wanxiang Che. ACL 2019. score [pdf] [code]
Generating Fluent Adversarial Examples for Natural Languages. Huangzhao Zhang, Hao Zhou, Ning Miao, Lei Li. ACL 2019. gradient score [pdf] [code]
Robust Neural Machine Translation with Doubly Adversarial Inputs. Yong Cheng, Lu Jiang, Wolfgang Macherey. ACL 2019. gradient [pdf]
Universal Adversarial Attacks on Text Classifiers. Melika Behjati, Seyed-Mohsen Moosavi-Dezfooli, Mahdieh Soleymani Baghshah, Pascal Frossard. ICASSP 2019. gradient [pdf]
Generating Natural Language Adversarial Examples. Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani Srivastava, Kai-Wei Chang. EMNLP 2018. score [pdf] [code]
Breaking NLI Systems with Sentences that Require Simple Lexical Inferences. Max Glockner, Vered Shwartz, Yoav Goldberg. ACL 2018. blind [pdf] [dataset]
Deep Text Classification Can be Fooled. Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, Wenchang Shi. IJCAI 2018. gradient score [pdf]
Interpretable Adversarial Perturbation in Input Embedding Space for Text. Sato, Motoki, Jun Suzuki, Hiroyuki Shindo, and Yuji Matsumoto. IJCAI 2018. gradient [pdf] [code]
Towards Crafting Text Adversarial Samples. Suranjana Samanta, Sameep Mehta. ECIR 2018. gradient [pdf]
Crafting Adversarial Input Sequences For Recurrent Neural Networks. Nicolas Papernot, Patrick McDaniel, Ananthram Swami, Richard Harang. MILCOM 2016. gradient [pdf]

2.3 Char-level Attack

Text Processing Like Humans Do: Visually Attacking and Shielding NLP Systems. Steffen Eger, Gözde Gül ¸Sahin, Andreas Rücklé, Ji-Ung Lee, Claudia Schulz, Mohsen Mesgar, Krishnkant Swarnkar, Edwin Simpson, Iryna Gurevych. NAACL-HLT 2019. blind [pdf] [code&data]
White-to-Black: Efficient Distillation of Black-Box Adversarial Attacks. SYotam Gil, Yoav Chai, Or Gorodissky, Jonathan Berant. NAACL-HLT 2019. blind [pdf] [code]
Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers. Ji Gao, Jack Lanchantin, Mary Lou Soffa, Yanjun Qi. IEEE SPW 2018. score[pdf] [code]
On Adversarial Examples for Character-Level Neural Machine Translation. Javid Ebrahimi, Daniel Lowd, Dejing Dou. COLING 2018. gradient [pdf] [code]
Synthetic and Natural Noise Both Break Neural Machine Translation. Yonatan Belinkov, Yonatan Bisk. ICLR 2018. blind [pdf] [code&data]

2.4 Multi-level Attack

T3: Tree-Autoencoder Constrained Adversarial Text Generation for Targeted Attack. Boxin Wang, Hengzhi Pei, Boyuan Pan, Qian Chen, Shuohang Wang, Bo Li. EMNLP 2020. gradient [pdf] [code]
Universal Adversarial Triggers for Attacking and Analyzing NLP. Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, Sameer Singh. EMNLP-IJCNLP 2019. gradient [pdf] [code] [website]
TEXTBUGGER: Generating Adversarial Text Against Real-world Applications. Jinfeng Li, Shouling Ji, Tianyu Du, Bo Li, Ting Wang. NDSS 2019. gradient score [pdf]
Generating Black-Box Adversarial Examples for Text Classifiers Using a Deep Reinforced Model. Prashanth Vijayaraghavan, Deb Roy. ECMLPKDD 2019. score [pdf]
HotFlip: White-Box Adversarial Examples for Text Classification. Javid Ebrahimi, Anyi Rao, Daniel Lowd, Dejing Dou. ACL 2018. gradient [pdf] [code]
Adversarial Over-Sensitivity and Over-Stability Strategies for Dialogue Models. Tong Niu, Mohit Bansal. CoNLL 2018. blind [pdf] [code&data]
Comparing Attention-based Convolutional and Recurrent Neural Networks: Success and Limitations in Machine Reading Comprehension. Matthias Blohm, Glorianna Jagfeld, Ekta Sood, Xiang Yu, Ngoc Thang Vu. CoNLL 2018. gradient [pdf] [code]

3. Defense Papers

InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective. Boxin Wang, Shuohang Wang, Yu Cheng, Zhe Gan, Ruoxi Jia, Bo Li, Jingjing Liu. ICLR 2021. [pdf] [code]
Mind Your Inflections! Improving NLP for Non-Standard Englishes with Base-Inflection Encoding. Samson Tan, Shafiq Joty, Lav R. Varshney, Min-Yen Kan. EMNLP 2020. [pdf] [code]
Robust Encodings: A Framework for Combating Adversarial Typos. Erik Jones, Robin Jia, Aditi Raghunathan, Percy Liang. ACL 2020. [pdf] [code]
Joint Character-level Word Embedding and Adversarial Stability Training to Defend Adversarial Text. Hui Liu, Yongzheng Zhang, Yipeng Wang, Zheng Lin, Yige Chen. AAAI 2020. [pdf]
A Robust Adversarial Training Approach to Machine Reading Comprehension. Kai Liu, Xin Liu, An Yang, Jing Liu, Jinsong Su, Sujian Li, Qiaoqiao She. AAAI 2020. [pdf]
Learning to Discriminate Perturbations for Blocking Adversarial Attacks in Text Classification. Yichao Zhou, Jyun-Yu Jiang, Kai-Wei Chang, Wei Wang. EMNLP-IJCNLP 2019. [pdf] [code]
Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack. Emily Dinan, Samuel Humeau, Bharath Chintagunta, Jason Weston. EMNLP-IJCNLP 2019. [pdf] [data]
Combating Adversarial Misspellings with Robust Word Recognition. Danish Pruthi, Bhuwan Dhingra, Zachary C. Lipton. ACL 2019. [pdf] [code]
Robust-to-Noise Models in Natural Language Processing Tasks. Valentin Malykh. ACL 2019. [pdf] [code]

4. Certified Robustness

SAFER: A Structure-free Approach for Certified Robustness to Adversarial Word Substitutions. Mao Ye, Chengyue Gong, Qiang Liu. ACL 2020. [pdf] [code]
Robustness Verification for Transformers. Zhouxing Shi, Huan Zhang, Kai-Wei Chang, Minlie Huang, Cho-Jui Hsieh. ICLR 2020. [pdf] [code]
Achieving Verified Robustness to Symbol Substitutions via Interval Bound Propagation. Po-Sen Huang, Robert Stanforth, Johannes Welbl, Chris Dyer, Dani Yogatama, Sven Gowal, Krishnamurthy Dvijotham, Pushmeet Kohli. EMNLP-IJCNLP 2019. [pdf]
Certified Robustness to Adversarial Word Substitutions. Robin Jia, Aditi Raghunathan, Kerem Göksel, Percy Liang. EMNLP-IJCNLP 2019. [pdf] [code]
POPQORN: Quantifying Robustness of Recurrent Neural Networks. Ching-Yun Ko, Zhaoyang Lyu, Lily Weng, Luca Daniel, Ngai Wong, Dahua Lin. ICML 2019. [pdf] [code]

5. Benchmark and Evaluation

From Hero to Zéroe: A Benchmark of Low-Level Adversarial Attacks. Steffen Eger, Yannik Benz. AACL-IJCNLP 2020. [pdf] [code & data]
Adversarial NLI: A New Benchmark for Natural Language Understanding. Yixin Nie, Adina Williams, Emily Dinan, Mohit Bansal, Jason Weston, Douwe Kiela. ACL 2020. [pdf] [demo] [dataset & leaderboard]
Evaluating NLP Models via Contrast Sets. Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, Ally Zhang, Ben Zhou. Findings of ACL: EMNLP 2020. [pdf] [website]
On Evaluation of Adversarial Perturbations for Sequence-to-Sequence Models. Paul Michel, Xian Li, Graham Neubig, Juan Miguel Pino. NAACL-HLT 2019. [pdf] [code]

6. Other Papers

LexicalAT: Lexical-Based Adversarial Reinforcement Training for Robust Sentiment Classification. Jingjing Xu, Liang Zhao, Hanqi Yan, Qi Zeng, Yun Liang, Xu Sun. EMNLP-IJCNLP 2019. [pdf] [code]
Unified Visual-Semantic Embeddings: Bridging Vision and Language with Structured Meaning Representations. Hao Wu, Jiayuan Mao, Yufeng Zhang, Yuning Jiang, Lei Li, Weiwei Sun, Wei-Ying Ma. CVPR 2019. [pdf]
AdvEntuRe: Adversarial Training for Textual Entailment with Knowledge-Guided Examples. Dongyeop Kang, Tushar Khot, Ashish Sabharwal, Eduard Hovy. ACL 2018. [pdf] [code]
Learning Visually-Grounded Semantics from Contrastive Adversarial Samples. Haoyue Shi, Jiayuan Mao, Tete Xiao, Yuning Jiang, Jian Sun. COLING 2018. [pdf] [code]

Acknowledgements

Great thanks to other contributors Di Jin, Boxin Wang, Jingkang Wang, Chenglei Si, Thai Le, Rishabh Maheshwary, Jiayuan Mao! (names are not listed in particular order)

Please contact us if we miss your names in this list, we will add you back ASAP!

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].

Stars: ✭ 800

Visit Git Page 🔗Visit User Page 🔗Visit Issues Page (0) 🔗