TEXT Open Intent Recognition (TEXTOIR)

TEXTOIR is the first high-quality Text Open Intent Recognition platform. This repo contains a convenient toolkit with extensible interfaces that integrates a series of algorithms for two tasks (open intent detection and open intent discovery). We also release the pipeline framework and the visualized platform in the repo TEXTOIR-DEMO.

Introduction

TEXTOIR aims to provide a convenient toolkit for researchers to reproduce related text open classification and clustering methods. It covers two tasks: open intent detection and open intent discovery. Open intent detection aims to classify utterances into n known intent classes while detecting a single open intent class. Open intent discovery aims to leverage limited prior knowledge of known intents to group utterances into fine-grained clusters of both known and open intents. Related papers and code are collected in our previously released reading list.

Open Intent Recognition:

We strongly recommend using the TEXTOIR toolkit, which has standard and unified interfaces (especially for data settings), to obtain fair and persuasive results on benchmark intent datasets!

Benchmark Datasets

Integrated Models

Open Intent Detection

Deep Open Intent Classification with Adaptive Decision Boundary (ADB, AAAI 2021)
Deep Unknown Intent Detection with Margin Loss (DeepUnk, ACL 2019)
DOC: Deep Open Classification of Text Documents (DOC, EMNLP 2017)
A Baseline For Detecting Misclassified and Out-of-distribution Examples in Neural Networks (MSP, ICLR 2017)
Towards Open Set Deep Networks (OpenMax, CVPR 2016)
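To make the detection task concrete, here is a minimal sketch of the idea behind the MSP baseline listed above: predict the most probable known intent, but fall back to the open intent when the maximum softmax probability is low. This is not the toolkit's implementation. The names `msp_predict` and `OPEN_LABEL`, the intent labels, and the 0.5 threshold are all assumptions made for illustration.

```python
import math

OPEN_LABEL = "<OPEN>"  # hypothetical name for the single open-intent class


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_predict(logits, known_labels, threshold=0.5):
    """Return a known label, or OPEN_LABEL if the top probability is below threshold.

    The threshold value is an assumption; in practice it would be tuned.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return OPEN_LABEL
    return known_labels[best]


# Illustrative labels and logits only; not taken from any benchmark.
known = ["book_flight", "check_balance", "play_music"]
print(msp_predict([4.0, 0.5, 0.1], known))  # confident: a known intent
print(msp_predict([1.0, 0.9, 1.1], known))  # near-uniform: the open intent
```

The other detection methods in the list replace this fixed confidence threshold with different decision rules, so the sketch should be read only as the simplest instance of "n known classes plus one open class".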

Open Intent Discovery

Semi-supervised Clustering Methods
- Discovering New Intents with Deep Aligned Clustering (DeepAligned, AAAI 2021)
- Discovering New Intents via Constrained Deep Adaptive Clustering with Cluster Refinement (CDAC+, AAAI 2020)
- Learning to Discover Novel Visual Categories via Deep Transfer Clustering (DTC*, ICCV 2019)
- Multi-class Classification Without Multi-class Labels (MCL*, ICLR 2019)
- Learning to cluster in order to transfer across domains and tasks (KCL*, ICLR 2018)
Unsupervised Clustering Methods
- Towards K-means-friendly Spaces: Simultaneous Deep Learning and Clustering (DCN, ICML 2017)
- Unsupervised Deep Embedding for Clustering Analysis (DEC, ICML 2016)
- Stacked auto-encoder K-Means (SAE-KM)
- Agglomerative clustering (AG)
- K-Means (KM)

(* denotes a computer vision model whose backbone is replaced with BERT)
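As an illustration of the simplest unsupervised baseline in the list (K-Means), the sketch below clusters a handful of toy 2-D vectors that stand in for utterance embeddings. It is a plain-Python rendering of Lloyd's algorithm under stated assumptions, not the toolkit's code: the function names, the toy points, and the random initialization scheme are all hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed=0, max_iters=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    rng = random.Random(seed)
    # Initialize centroids from k distinct data points (an assumed scheme).
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids


# Toy vectors standing in for utterance embeddings (illustrative only).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
assignments, centroids = kmeans(points, k=2)
print(assignments)
```

Cluster ids are arbitrary, so evaluating discovered clusters against gold intents requires aligning cluster ids with labels first; that step is omitted here.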

Quick Start

Use Anaconda to create a Python (version >= 3.6) environment

conda create --name textoir python=3.6
conda activate textoir

Install PyTorch (CUDA version 11.2; the command below pins cudatoolkit=11.0, so adjust it to match your setup)

conda install pytorch torchvision torchaudio cudatoolkit=11.0 -c pytorch -c conda-forge
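After installing, it can help to confirm that the environment meets the stated Python requirement and that PyTorch can see a GPU. The following is a small optional check, not part of the toolkit. The helper name `check_environment` is hypothetical, while `torch.__version__` and `torch.cuda.is_available()` are standard PyTorch attributes.

```python
import sys


def check_environment(min_python=(3, 6)):
    """Report whether Python is recent enough and whether PyTorch sees CUDA."""
    report = {"python_ok": sys.version_info >= min_python}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        report["torch_version"] = None
        report["cuda_available"] = False
    else:
        report["torch_version"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    return report


print(check_environment())
```

If `cuda_available` is false on a GPU machine, the installed `cudatoolkit` build may not match the local driver.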

Clone the TEXTOIR repository and choose a task (taking open intent detection as an example).

git clone git@github.com:HanleiZhang/TEXTOIR.git
cd TEXTOIR
cd open_intent_detection

Install the required dependencies

pip install -r requirements.txt

Run an example (taking ADB as an example)

sh examples/run_ADB.sh

Extensibility and Reliability

Extensibility

This toolkit is extensible and makes it convenient to add new methods, datasets, configurations, backbones, dataloaders, and losses. More detailed information can be found in the open_intent_detection and open_intent_discovery directories, respectively.

Reliability

The code in this repo has been verified and is reliable. The experimental results are close to those reported in our AAAI 2021 papers Discovering New Intents with Deep Aligned Clustering and Deep Open Intent Classification with Adaptive Decision Boundary. Note that the results of some methods may fluctuate within a small range depending on the selected random seeds, hyper-parameters, optimizers, etc. The final results are averaged over 10 random seeds to reduce the influence of which known classes are selected.
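The averaging protocol can be sketched as follows: each seed selects a different subset of known intent classes, and the per-seed scores are then summarized by their mean and standard deviation. This is an illustration under assumptions, not the toolkit's code. The function names, the intent labels, the 0.75 known-class ratio, and the scores are hypothetical placeholders rather than reported results.

```python
import random
import statistics


def sample_known_intents(all_intents, known_ratio, seed):
    """Pick a seed-dependent subset of intents to treat as known classes."""
    rng = random.Random(seed)
    n_known = max(1, round(len(all_intents) * known_ratio))
    return sorted(rng.sample(all_intents, n_known))


def summarize(scores):
    """Return (mean, sample standard deviation) of per-seed scores."""
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical intent labels; the ratio of known classes is an assumption.
intents = ["book_flight", "check_balance", "play_music", "set_alarm"]
for seed in range(10):
    print(seed, sample_known_intents(intents, known_ratio=0.75, seed=seed))

# Placeholder numbers only, standing in for one metric measured once per seed.
placeholder_scores = [80.0, 82.0, 81.0]
mean, std = summarize(placeholder_scores)
print(f"mean={mean:.2f} std={std:.2f}")
```

Because the known classes change with the seed, a single run can look better or worse simply because of which classes were drawn, which is why the mean over seeds is the number to compare.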

Acknowledgements

If you are interested in this work and use the code in this repo, please star this repository and cite our ACL 2021 demo paper:

@inproceedings{zhang-etal-2021-textoir,
    title = "{TEXTOIR}: An Integrated and Visualized Platform for Text Open Intent Recognition",
    author = "Zhang, Hanlei  and
      Li, Xiaoteng  and
      Xu, Hua  and
      Zhang, Panpan  and
      Zhao, Kang  and
      Gao, Kai",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations",
    year = "2021",
    pages = "167--174",
}

We also thank Ting-En Lin, Qianrui Zhou, Shaojie Zhao, Xin Wang and Huisheng Mao for their contributions to this repo.

Bugs or questions?

If you have any questions, feel free to open an issue or a pull request. Please describe your problem in as much detail as possible. If you want to integrate your method into our repo, please contact us ([email protected]).
