TEXT Open Intent Recognition (TEXTOIR)
TEXTOIR is the first high-quality Text Open Intent Recognition platform. This repo contains a convenient toolkit with extensible interfaces, integrating a series of algorithms for two tasks: open intent detection and open intent discovery. We also release the pipeline framework and the visualized platform in the repo TEXTOIR-DEMO.
Introduction
TEXTOIR aims to provide a convenient toolkit for researchers to reproduce related open text classification and clustering methods. It covers two tasks: open intent detection and open intent discovery. Open intent detection aims to identify n-class known intents and to detect a one-class open intent. Open intent discovery aims to leverage limited prior knowledge of known intents to find fine-grained clusters of both known and open intents. Related papers and code are collected in our previously released reading list.
We strongly recommend using the TEXTOIR toolkit, which has standard and unified interfaces (especially for data settings), to obtain fair and persuasive results on benchmark intent datasets!
Benchmark Datasets
Integrated Models
Open Intent Detection
- Deep Open Intent Classification with Adaptive Decision Boundary (ADB, AAAI 2021)
- Deep Unknown Intent Detection with Margin Loss (DeepUnk, ACL 2019)
- DOC: Deep Open Classification of Text Documents (DOC, EMNLP 2017)
- A Baseline For Detecting Misclassified and Out-of-distribution Examples in Neural Networks (MSP, ICLR 2017)
- Towards Open Set Deep Networks (OpenMax, CVPR 2016)
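As a rough illustration of the task, the MSP-style baseline listed above can be sketched as thresholding the maximum softmax probability: a sample is assigned to one of the n known intents when the classifier is confident, and to the single open intent otherwise. This is a minimal sketch and not the toolkit's implementation; the function names, the intent labels, the `<open>` label and the 0.5 threshold are all assumptions made for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_open_intent(logits, known_intents, threshold=0.5, open_label="<open>"):
    """Return a known intent, or open_label when confidence is below threshold.

    The threshold value and the open_label string are illustrative assumptions.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return open_label
    return known_intents[best]


# Hypothetical known intents and classifier logits.
known = ["book_flight", "check_balance", "play_music"]
confident = predict_open_intent([4.0, 0.5, 0.2], known)  # one logit dominates
uncertain = predict_open_intent([0.4, 0.5, 0.3], known)  # near-uniform logits
print(confident, uncertain)
```

The other detection methods in the list replace this fixed confidence threshold with different decision rules; the sketch only shows the shared "n known classes plus one open class" output space.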
Open Intent Discovery
- Semi-supervised Clustering Methods
- Discovering New Intents with Deep Aligned Clustering (DeepAligned, AAAI 2021)
- Discovering New Intents via Constrained Deep Adaptive Clustering with Cluster Refinement (CDAC+, AAAI 2020)
- Learning to Discover Novel Visual Categories via Deep Transfer Clustering (DTC*, ICCV 2019)
- Multi-class Classification Without Multi-class Labels (MCL*, ICLR 2019)
- Learning to cluster in order to transfer across domains and tasks (KCL*, ICLR 2018)
- Unsupervised Clustering Methods
- Towards K-means-friendly Spaces: Simultaneous Deep Learning and Clustering (DCN, ICML 2017)
- Unsupervised Deep Embedding for Clustering Analysis (DEC, ICML 2016)
- Stacked auto-encoder K-Means (SAE-KM)
- Agglomerative clustering (AG)
- K-Means (KM)
(* denotes a CV model whose original backbone is replaced with BERT)
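To make the discovery task concrete, the simplest unsupervised baseline in the list, K-Means (KM), can be sketched in a few lines. This is a minimal sketch and not the toolkit's code: the 2-D points stand in for sentence embeddings, the number of clusters is assumed to be known, and initialising the centroids with the first k points is a simplification chosen to keep the example deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain K-Means; returns (assignments, centroids).

    Centroids are initialised with the first k points (an assumption made
    for determinism; practical implementations use random or k-means++ init).
    """
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids


# Toy "utterance embeddings": two well-separated groups (hypothetical data).
points = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
assignments, centroids = kmeans(points, k=2)
print(assignments)
```

Each resulting cluster is interpreted as one intent. The semi-supervised methods in the list additionally use a small amount of labeled data from known intents to guide the clustering.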
Quick Start
- Use Anaconda to create a Python (version >= 3.6) environment
conda create --name textoir python=3.6
conda activate textoir
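- Install PyTorch (CUDA version 11.2; note that the command below pins cudatoolkit=11.0, so adjust it if your setup requires a different version)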
conda install pytorch torchvision torchaudio cudatoolkit=11.0 -c pytorch -c conda-forge
- Clone the TEXTOIR repository and choose the task (taking open intent detection as an example)
git clone git@github.com:HanleiZhang/TEXTOIR.git
cd TEXTOIR
cd open_intent_detection
- Install the required dependencies
pip install -r requirements.txt
- Run an example (taking ADB as an example)
sh examples/run_ADB.sh
Extensibility and Reliability
Extensibility
This toolkit is extensible and makes it convenient to add new methods, datasets, configurations, backbones, dataloaders, and losses. More detailed information can be found in the directories open_intent_detection and open_intent_discovery, respectively.
Reliability
The code in this repo has been verified and is reliable. The experimental results are close to those reported in our AAAI 2021 papers Discovering New Intents with Deep Aligned Clustering and Deep Open Intent Classification with Adaptive Decision Boundary. Note that the results of some methods may fluctuate within a small range depending on the selected random seeds, hyper-parameters, optimizers, etc. The final results are averaged over 10 random seeds to reduce the influence of which known classes are selected.
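The averaging protocol described above can be sketched as follows. This is a minimal sketch under stated assumptions: `run_experiment`, the intent names, the `known_ratio` parameter and the placeholder score are hypothetical stand-ins, not part of the toolkit. The point is only that each seed selects a different subset of known classes and that the reported number is the mean over all seeds.

```python
import random
import statistics


def run_experiment(seed, all_intents, known_ratio=0.5):
    """Hypothetical stand-in for one training and evaluation run.

    The seed decides which intents are treated as known; the returned score
    is a placeholder, not a real result.
    """
    rng = random.Random(seed)
    n_known = max(1, round(len(all_intents) * known_ratio))
    known = rng.sample(all_intents, n_known)
    score = 0.85 + rng.uniform(-0.02, 0.02)  # placeholder metric
    return known, score


# Hypothetical intent inventory.
all_intents = ["book_flight", "check_balance", "play_music", "set_alarm"]

seeds = list(range(10))
scores = [run_experiment(seed, all_intents)[1] for seed in seeds]

mean_score = statistics.mean(scores)
std_score = statistics.stdev(scores)
print(f"mean={mean_score:.4f} std={std_score:.4f} over {len(seeds)} seeds")
```

Reporting the standard deviation next to the mean makes the small fluctuations mentioned above visible.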
Acknowledgements
If you are interested in this work and use the code in this repo, please star this repository and cite our ACL 2021 demo paper:
@inproceedings{zhang-etal-2021-textoir,
title = "{TEXTOIR}: An Integrated and Visualized Platform for Text Open Intent Recognition",
author = "Zhang, Hanlei and
Li, Xiaoteng and
Xu, Hua and
Zhang, Panpan and
Zhao, Kang and
Gao, Kai",
booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations",
year = "2021",
pages = "167--174",
}
We also thank Ting-En Lin, Qianrui Zhou, Shaojie Zhao, Xin Wang and Huisheng Mao for their contributions to this repo.
Bugs or questions?
If you have any questions, feel free to open an issue or a pull request. Please describe your problem in as much detail as possible. If you want to integrate your method into our repo, please contact us ([email protected]).