
thuiar / TEXTOIR

License: GPL-3.0
TEXTOIR is a flexible toolkit for open intent detection and discovery. (ACL 2021)


Projects that are alternatives of or similar to TEXTOIR

AnnA Anki neuronal Appendix
Using machine learning on your anki collection to enhance the scheduling via semantic clustering and semantic similarity
Stars: ✭ 39 (+25.81%)
Mutual labels:  clustering, bert
keras-bert-ner
Keras solution of Chinese NER task using BiLSTM-CRF/BiGRU-CRF/IDCNN-CRF model with Pretrained Language Model: supporting BERT/RoBERTa/ALBERT
Stars: ✭ 7 (-77.42%)
Mutual labels:  bert
erc
Emotion recognition in conversation
Stars: ✭ 34 (+9.68%)
Mutual labels:  bert
VideoBERT
Using VideoBERT to tackle video prediction
Stars: ✭ 56 (+80.65%)
Mutual labels:  bert
Filipino-Text-Benchmarks
Open-source benchmark datasets and pretrained transformer models in the Filipino language.
Stars: ✭ 22 (-29.03%)
Mutual labels:  bert
KoBERT-nsmc
Naver movie review sentiment classification with KoBERT
Stars: ✭ 57 (+83.87%)
Mutual labels:  bert
revolver
REVOLVER - Repeated Evolution in Cancer
Stars: ✭ 52 (+67.74%)
Mutual labels:  clustering
ML2017FALL
Machine Learning (EE 5184) in NTU
Stars: ✭ 66 (+112.9%)
Mutual labels:  clustering
scrapyr
a simple & tiny scrapy clustering solution, considered a drop-in replacement for scrapyd
Stars: ✭ 50 (+61.29%)
Mutual labels:  clustering
knowledge-graph-nlp-in-action
From model training to deployment: hands-on Knowledge Graph and Natural Language Processing (NLP). Involves TensorFlow, BERT+Bi-LSTM+CRF, Neo4j, etc., and covers tasks such as Named Entity Recognition, Text Classification, Information Extraction, and Relation Extraction.
Stars: ✭ 58 (+87.1%)
Mutual labels:  bert
M3C
Monte Carlo Reference-based Consensus Clustering
Stars: ✭ 24 (-22.58%)
Mutual labels:  clustering
dialogue-datasets
collect the open dialog corpus and some useful data processing utils.
Stars: ✭ 24 (-22.58%)
Mutual labels:  dialogue-systems
policy-data-analyzer
Building a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the world of policy analysis through an extensible framework that includes scraping, preprocessing, active learning and text analysis pipelines.
Stars: ✭ 22 (-29.03%)
Mutual labels:  bert
HugsVision
HugsVision is an easy-to-use huggingface wrapper for state-of-the-art computer vision
Stars: ✭ 154 (+396.77%)
Mutual labels:  bert
MVGL
TCyb 2018: Graph learning for multiview clustering
Stars: ✭ 26 (-16.13%)
Mutual labels:  clustering
syntaxdot
Neural syntax annotator, supporting sequence labeling, lemmatization, and dependency parsing.
Stars: ✭ 32 (+3.23%)
Mutual labels:  bert
ADL2019
Applied Deep Learning (2019 Spring) @ NTU
Stars: ✭ 20 (-35.48%)
Mutual labels:  bert
Clustering-in-Python
Clustering methods in Machine Learning includes both theory and python code of each algorithm. Algorithms include K Mean, K Mode, Hierarchical, DB Scan and Gaussian Mixture Model GMM. Interview questions on clustering are also added in the end.
Stars: ✭ 27 (-12.9%)
Mutual labels:  clustering
bert-as-a-service TFX
End-to-end pipeline with TFX to train and deploy a BERT model for sentiment analysis.
Stars: ✭ 32 (+3.23%)
Mutual labels:  bert
text-generation-transformer
text generation based on transformer
Stars: ✭ 36 (+16.13%)
Mutual labels:  bert

TEXT Open Intent Recognition (TEXTOIR)

TEXTOIR is the first high-quality Text Open Intent Recognition platform. This repo contains a convenient toolkit with extensible interfaces that integrates a series of algorithms for two tasks: open intent detection and open intent discovery. We also release the pipeline framework and the visualized platform in the TEXTOIR-DEMO repo.

Introduction

TEXTOIR aims to provide a convenient toolkit for researchers to reproduce related open text classification and clustering methods. It covers two tasks: open intent detection and open intent discovery. Open intent detection aims to classify utterances into n known intent classes while detecting a single open (unknown) intent class. Open intent discovery aims to leverage limited prior knowledge of known intents to find fine-grained clusters of both known and open intents. Related papers and code are collected in our previously released reading list.
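To make the detection task concrete, here is a minimal sketch of an "n known classes plus one open class" decision rule. It uses plain confidence thresholding on softmax probabilities, a generic baseline chosen only for illustration; it is not one of the toolkit's integrated algorithms. The names `OPEN_LABEL`, `detect_intent`, the intent names, and the threshold value are all hypothetical.

```python
import math
from typing import List, Sequence

OPEN_LABEL = "<open>"  # hypothetical name for the single open-intent class


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a sequence of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def detect_intent(
    logits: Sequence[float],
    known_intents: Sequence[str],
    threshold: float = 0.5,  # assumed value; would be tuned in practice
) -> str:
    """Return the most likely known intent, or OPEN_LABEL if confidence is low."""
    if len(logits) != len(known_intents):
        raise ValueError("expected one logit per known intent")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return known_intents[best] if probs[best] >= threshold else OPEN_LABEL


if __name__ == "__main__":
    intents = ["book_flight", "check_balance", "play_music"]  # illustrative only
    print(detect_intent([4.0, 0.5, 0.1], intents))  # confident known intent
    print(detect_intent([1.0, 1.0, 1.0], intents))  # uniform scores fall below threshold
```

With three known intents, a sample whose scores are spread evenly has a maximum probability of 1/3, so it is assigned to the open class under the assumed threshold of 0.5.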

Figure: an example of open intent recognition.

We strongly recommend using our TEXTOIR toolkit, which has standard and unified interfaces (especially for data settings), to obtain fair and persuasive results on benchmark intent datasets!

Benchmark Datasets

Integrated Models

Open Intent Detection

Open Intent Discovery

(* denotes a CV model whose backbone is replaced with BERT)

Quick Start

  1. Use Anaconda to create a Python (version >= 3.6) environment
conda create --name textoir python=3.6
conda activate textoir
  2. Install PyTorch (CUDA version 11.2; note that the command below pins cudatoolkit=11.0)
conda install pytorch torchvision torchaudio cudatoolkit=11.0 -c pytorch -c conda-forge
  3. Clone the TEXTOIR repository and choose the task (open intent detection is used as an example here)
git clone git@github.com:HanleiZhang/TEXTOIR.git
cd TEXTOIR
cd open_intent_detection
  4. Install the required dependencies
pip install -r requirements.txt
  5. Run an example (ADB is used as an example here)
sh examples/run_ADB.sh
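
Before running the example script, it can help to confirm that the environment matches the steps above. The following is a small, hypothetical helper (it is not part of the TEXTOIR repo) that checks the Python version against the stated minimum of 3.6 and whether the `torch` package can be found, without importing it.

```python
import importlib.util
import sys
from typing import Dict, Tuple


def check_environment(min_python: Tuple[int, int] = (3, 6)) -> Dict[str, bool]:
    """Report whether the interpreter and PyTorch look ready for the quick start."""
    return {
        # Compare (major, minor) of the running interpreter with the minimum.
        "python_ok": sys.version_info[:2] >= min_python,
        # find_spec returns None when a top-level package is not installed.
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

Run it inside the activated `textoir` environment. If `torch_installed` is reported as `no`, the PyTorch install step probably did not complete.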

Extensibility and Reliability

Extensibility

This toolkit is extensible: new methods, datasets, configurations, backbones, dataloaders, and losses can be added conveniently. More detailed information can be found in the open_intent_detection and open_intent_discovery directories.
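
One common way to make a toolkit extensible in this sense is a name-to-class registry, sketched below. This is an assumption about the general pattern only, not TEXTOIR's actual interface. `METHOD_REGISTRY`, `register_method`, `build_method`, and `MyMethod` are hypothetical names; see the two task directories for the real extension points.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a method name to the class implementing it.
METHOD_REGISTRY: Dict[str, type] = {}


def register_method(name: str) -> Callable[[type], type]:
    """Class decorator that records a method class under a unique name."""
    def decorator(cls: type) -> type:
        if name in METHOD_REGISTRY:
            raise KeyError(f"method {name!r} is already registered")
        METHOD_REGISTRY[name] = cls
        return cls
    return decorator


@register_method("my_method")
class MyMethod:
    """Placeholder for a new method; a real one would hold a model and training logic."""

    def __init__(self, num_known_classes: int) -> None:
        self.num_known_classes = num_known_classes

    def describe(self) -> str:
        return f"MyMethod with {self.num_known_classes} known classes"


def build_method(name: str, **kwargs):
    """Instantiate a registered method by name, forwarding keyword arguments."""
    try:
        cls = METHOD_REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown method: {name!r}") from None
    return cls(**kwargs)


if __name__ == "__main__":
    print(build_method("my_method", num_known_classes=3).describe())
```

With this design, adding a method means defining one decorated class; the code that selects a method by name from a configuration does not need to change.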

Reliability

The code in this repo has been verified and is reliable. The experimental results are close to those reported in our AAAI 2021 papers, Discovering New Intents with DeepAligned Clustering and Deep Open Intent Classification with Adaptive Decision Boundary. Note that the results of some methods may fluctuate within a small range depending on the random seeds, hyper-parameters, optimizers, etc. The final results are averaged over 10 random seeds to reduce the influence of which known classes are selected.
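
The seed-averaging protocol can be sketched as follows, assuming that each seed determines which classes are treated as known. The function names, the `known_ratio` parameter, and the `run_experiment` callback are hypothetical stand-ins; the repo's own scripts define the actual procedure.

```python
import random
import statistics
from typing import Callable, Dict, Iterable, List, Sequence


def select_known_classes(labels: Sequence[str], known_ratio: float, seed: int) -> List[str]:
    """Sample a seed-dependent subset of labels to treat as known intents."""
    rng = random.Random(seed)  # local generator keeps the choice reproducible
    n_known = max(1, round(len(labels) * known_ratio))
    return sorted(rng.sample(list(labels), n_known))


def average_over_seeds(
    run_experiment: Callable[[List[str], int], float],  # returns one score per run
    labels: Sequence[str],
    known_ratio: float = 0.5,         # assumed value for illustration
    seeds: Iterable[int] = range(10),  # 10 seeds, as in the text above
) -> Dict[str, float]:
    """Run one experiment per seed and summarize the scores."""
    scores = []
    for seed in seeds:
        known = select_known_classes(labels, known_ratio, seed)
        scores.append(run_experiment(known, seed))
    return {
        "mean": statistics.mean(scores),
        "std": statistics.pstdev(scores),
        "runs": len(scores),
    }


if __name__ == "__main__":
    labels = [f"intent_{i}" for i in range(10)]  # placeholder label set

    def dummy_experiment(known: List[str], seed: int) -> float:
        # Stand-in for training and evaluation; it only reports the subset size.
        return float(len(known))

    print(average_over_seeds(dummy_experiment, labels))
```

Reporting the standard deviation alongside the mean shows how much the score depends on the particular known-class split.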

Acknowledgements

If you are interested in this work and use the code in this repo, please star this repository and cite our ACL 2021 demo paper:

@inproceedings{zhang-etal-2021-textoir,
    title = "{TEXTOIR}: An Integrated and Visualized Platform for Text Open Intent Recognition",
    author = "Zhang, Hanlei  and
      Li, Xiaoteng  and
      Xu, Hua  and
      Zhang, Panpan  and
      Zhao, Kang  and
      Gao, Kai",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations",
    year = "2021",
    pages = "167--174",
}

We also thank Ting-En Lin, Qianrui Zhou, Shaojie Zhao, Xin Wang and Huisheng Mao for their contributions to this repo.

Bugs or questions?

If you have any questions, feel free to open an issue or a pull request. Please describe your problem in as much detail as possible. If you want to integrate your method into our repo, please contact us ([email protected]).
