
osainz59 / Ask2Transformers

License: Apache-2.0
A Framework for Textual Entailment based Zero Shot text classification

Programming Languages

python

Projects that are alternatives to or similar to Ask2Transformers

Product-Categorization-NLP
Multi-Class Text Classification for products based on their description with Machine Learning algorithms and Neural Networks (MLP, CNN, Distilbert).
Stars: ✭ 30 (-70.59%)
Mutual labels:  text-classification, transformers, topic-modeling
text2class
Multi-class text categorization using state-of-the-art pre-trained contextualized language models, e.g. BERT
Stars: ✭ 15 (-85.29%)
Mutual labels:  text-classification, transformers
Text and Audio classification with Bert
Text Classification in Turkish Texts with Bert
Stars: ✭ 34 (-66.67%)
Mutual labels:  text-classification, transformers
Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518 (+2368.63%)
Mutual labels:  text-classification, transformers
MetaLifelongLanguage
Repository containing code for the paper "Meta-Learning with Sparse Experience Replay for Lifelong Language Learning".
Stars: ✭ 21 (-79.41%)
Mutual labels:  text-classification, relation-extraction
TorchBlocks
A PyTorch-based toolkit for natural language processing
Stars: ✭ 85 (-16.67%)
Mutual labels:  text-classification, transformers
kwx
BERT, LDA, and TFIDF based keyword extraction in Python
Stars: ✭ 33 (-67.65%)
Mutual labels:  text-classification, topic-modeling
Pytorch-NLU
Pytorch-NLU: a Chinese text classification and sequence labeling toolkit that supports multi-class and multi-label classification of Chinese long and short texts, and sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, and word segmentation.
Stars: ✭ 151 (+48.04%)
Mutual labels:  text-classification, transformers
Lightnlp
A deep learning framework for natural language processing based on PyTorch and torchtext.
Stars: ✭ 739 (+624.51%)
Mutual labels:  text-classification, relation-extraction
Sarcasm Detection
Detecting Sarcasm on Twitter using both traditional machine learning and deep learning techniques.
Stars: ✭ 73 (-28.43%)
Mutual labels:  text-classification, topic-modeling
Macadam
Macadam is a natural language processing toolkit based on TensorFlow (Keras) and bert4keras, focused on text classification, sequence labeling, and relation extraction. It supports embeddings such as RANDOM, WORD2VEC, FASTTEXT, BERT, ALBERT, ROBERTA, NEZHA, XLNET, ELECTRA, and GPT-2; text classification algorithms such as FineTune, FastText, TextCNN, CharCNN, BiRNN, RCNN, DCNN, CRNN, DeepMoji, SelfAttention, HAN, and Capsule; and sequence labeling algorithms such as CRF, Bi-LSTM-CRF, CNN-LSTM, DGCNN, Bi-LSTM-LAN, Lattice-LSTM-Batch, and MRC.
Stars: ✭ 149 (+46.08%)
Mutual labels:  text-classification, relation-extraction
Marktool
A web-based general-purpose text annotation tool that supports large-scale entity annotation, relation annotation, event annotation, text classification, automatic annotation based on dictionary and regex matching, and standard-name annotation for normalization. It also supports iterative annotation of texts and nested entity annotation. Annotation guidelines are customizable, and within the same task type they can be "created once, reused many times". Hierarchical entity sets expand the range of entity types, and a newly designed, efficient annotation workflow improves user experience and annotation efficiency. In addition, a review step checks and reconciles the consistency of multiple annotators' results, improving the accuracy and reliability of the annotated corpus.
Stars: ✭ 190 (+86.27%)
Mutual labels:  text-classification, relation-extraction
small-text
Active Learning for Text Classification in Python
Stars: ✭ 241 (+136.27%)
Mutual labels:  text-classification, transformers
NSP-BERT
The code for our paper "NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original Pre-training Task —— Next Sentence Prediction"
Stars: ✭ 166 (+62.75%)
Mutual labels:  text-classification, zero-shot
10kGNAD
Ten Thousand German News Articles Dataset for Topic Classification
Stars: ✭ 63 (-38.24%)
Mutual labels:  text-classification, topic-classification
policy-data-analyzer
Building a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the world of policy analysis through an extensible framework that includes scraping, preprocessing, active learning and text analysis pipelines.
Stars: ✭ 22 (-78.43%)
Mutual labels:  text-classification, transformers
Relation-Classification
Relation Classification - SEMEVAL 2010 task 8 dataset
Stars: ✭ 46 (-54.9%)
Mutual labels:  text-classification, relation-extraction
COVID-19-Tweet-Classification-using-Roberta-and-Bert-Simple-Transformers
Rank 1 / 216
Stars: ✭ 24 (-76.47%)
Mutual labels:  text-classification, transformers
Text mining resources
Resources for learning about Text Mining and Natural Language Processing
Stars: ✭ 358 (+250.98%)
Mutual labels:  text-classification, topic-modeling
Simpletransformers
Transformers for Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conversational AI
Stars: ✭ 2,881 (+2724.51%)
Mutual labels:  text-classification, transformers

Ask2Transformers

A Framework for Textual Entailment based Zero Shot text classification


This repository contains the code for out-of-the-box, ready-to-use zero-shot classifiers for different tasks, such as Topic Labelling or Relation Extraction. It is built on top of the 🤗 HuggingFace Transformers library, so you are free to choose among hundreds of models. You can either use a dataset-specific classifier or define one yourself with just label descriptions or templates! The repository contains the code for the publications listed in the Citation section below.
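
To illustrate the underlying idea, here is a minimal sketch using the plain 🤗 Transformers zero-shot classification pipeline (not the a2t API itself): an NLI model scores verbalized label descriptions against the input text. The example sentence and labels are made up for illustration.

from transformers import pipeline

# Zero-shot classification via textual entailment: each candidate label is
# verbalized with the hypothesis template and scored against the input text.
classifier = pipeline("zero-shot-classification", model="roberta-large-mnli")

result = classifier(
    "The DNA sequence was amplified by PCR before the analysis.",
    candidate_labels=["Biology", "Computing", "Warfare and defense"],
    hypothesis_template="The domain of the sentence is about {}.",
)
print(result["labels"][0])  # the highest-scoring label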

To get started with the repository, consider reading the new documentation!

Demo 🕹️

We have released a demo on Zero-Shot Information Extraction using Textual Entailment (ZS4IE: A toolkit for Zero-Shot Information Extraction with simple Verbalizations), accepted in the Demo Track of NAACL 2022. The code is publicly available in its own GitHub repository: ZS4IE.

Installation

By using pip (check the latest release)

pip install a2t

By cloning the repository

git clone https://github.com/osainz59/Ask2Transformers.git
cd Ask2Transformers
pip install .

Or directly by

pip install git+https://github.com/osainz59/Ask2Transformers

Models

Available models

By default, the roberta-large-mnli checkpoint is used to perform inference. You can try different models for zero-shot classification, but they need to be fine-tuned on an NLI task and be compatible with the AutoModelForSequenceClassification class from Transformers. For example:

  • roberta-large-mnli
  • joeddav/xlm-roberta-large-xnli
  • facebook/bart-large-mnli
  • microsoft/deberta-v2-xlarge-mnli
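
As a minimal sketch of what these checkpoints do under the hood (the label names and order are model-specific; roberta-large-mnli maps its output indices to CONTRADICTION, NEUTRAL, and ENTAILMENT), a (premise, hypothesis) pair can be scored directly:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

premise = "Messi scored twice in the final."
hypothesis = "The domain of the sentence is about sports."

# Encode the (premise, hypothesis) pair and read off the entailment probability.
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)
entailment_prob = probs[0, model.config.label2id["ENTAILMENT"]].item()
print(f"P(entailment) = {entailment_prob:.3f}")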

Coming soon: support for t5-large-style generative models.

Pre-trained models 🆕

We now provide (task-specific) pre-trained entailment models to (1) reproduce the results of the papers and (2) reuse them for new schemas of the same tasks. The models are publicly available on the 🤗 HuggingFace Models Hub.

The model name describes the configuration used for training as follows:

HiTZ/A2T_[pretrained_model]_[NLI_datasets]_[finetune_datasets]

  • pretrained_model: The checkpoint used for initialization. For example: RoBERTa-large.
  • NLI_datasets: The NLI datasets used for pivot training.
    • S: Stanford Natural Language Inference (SNLI) dataset.
    • M: Multi Natural Language Inference (MNLI) dataset.
    • F: Fever-nli dataset.
    • A: Adversarial Natural Language Inference (ANLI) dataset.
  • finetune_datasets: The datasets used for fine-tuning the entailment model. Note that when more than one dataset is used, training is performed sequentially. For example: ACE-arg.

Some models, such as HiTZ/A2T_RoBERTa_SMFA_ACE-arg, were trained with some information, such as the event trigger span, marked between square brackets ('[[' and ']]'). Make sure you follow the same preprocessing to obtain the best results.
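
For example, HiTZ/A2T_RoBERTa_SMFA_ACE-arg denotes a RoBERTa-large checkpoint pivot-trained on SNLI, MNLI, Fever-nli, and ANLI, then fine-tuned on ACE-arg. A minimal loading sketch (the premise below is a made-up illustration of the '[[' / ']]' marking, not a template from the papers):

from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "HiTZ/A2T_RoBERTa_SMFA_ACE-arg"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# The event trigger span is marked with '[[' and ']]', as during training.
premise = "The bank was [[ robbed ]] by two masked men yesterday."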

Training your own models

There is no special script for fine-tuning your own entailment-based models. In our experiments, we used the publicly available run_glue.py script from HuggingFace Transformers. To train your own model, you will first need to convert your dataset into NLI-formatted data; we recommend having a look at the tacred2mnli.py script, which serves as an example.
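
The conversion amounts to verbalizing each labeled instance into (premise, hypothesis) pairs, marking the pair built from the gold label as entailment and the rest as not entailment. A hypothetical sketch of the idea (the templates and label names below are placeholders; tacred2mnli.py shows the real conversion):

# Hypothetical verbalization templates: task label -> hypothesis template.
TEMPLATES = {
    "per:city_of_birth": "{subj} was born in {obj}.",
    "org:founded_by": "{subj} was founded by {obj}.",
}

def to_nli_rows(sentence, subj, obj, gold_label):
    """Convert one labeled example into (premise, hypothesis, nli_label) rows."""
    rows = []
    for label, template in TEMPLATES.items():
        hypothesis = template.format(subj=subj, obj=obj)
        nli_label = "entailment" if label == gold_label else "not_entailment"
        rows.append((sentence, hypothesis, nli_label))
    return rows

print(to_nli_rows("John Smith was born in Dublin.",
                  "John Smith", "Dublin", "per:city_of_birth"))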

Tutorials (Notebooks)

Coming soon!

Results and evaluation

To obtain the results reported in the papers, run the evaluation.py script with the corresponding configuration files. A configuration file containing the task and evaluation information should look like this:

{
    "name": "BabelDomains",
    "task_name": "topic-classification",
    "features_class": "a2t.tasks.text_classification.TopicClassificationFeatures",
    "hypothesis_template": "The domain of the sentence is about {label}.",
    "nli_models": [
        "roberta-large-mnli"
    ],
    "labels": [
        "Animals",
        "Art, architecture, and archaeology",
        "Biology",
        "Business, economics, and finance",
        "Chemistry and mineralogy",
        "Computing",
        "Culture and society",
        ...
        "Royalty and nobility",
        "Sport and recreation",
        "Textile and clothing",
        "Transport and travel",
        "Warfare and defense"
    ],
    "preprocess_labels": true,
    "dataset": "babeldomains",
    "test_path": "data/babeldomains.domain.gloss.tsv",
    "use_cuda": true,
    "half": true
}
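
Assuming the script takes the configuration file as an argument (the exact flag and path are assumptions here; check the repository for the actual CLI), the invocation would look something like:

python3 evaluation.py --config path/to/config.json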

Consider reading the papers to access the results.

About legacy code

The old code of this repository has been moved to the a2t.legacy module and is intended only for experimental reproducibility. Please consider moving to the new code. If you need help, read the new documentation or post an issue on GitHub.

Citation

If you use this work, please consider citing at least one of the following papers. You can find the BibTeX entries on their corresponding ACL Anthology pages.

Oscar Sainz, Haoling Qiu, Oier Lopez de Lacalle, Eneko Agirre, and Bonan Min. 2022. ZS4IE: A toolkit for Zero-Shot Information Extraction with simple Verbalizations. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: System Demonstrations, pages 27–38, Hybrid: Seattle, Washington + Online. Association for Computational Linguistics.

Oscar Sainz, Itziar Gonzalez-Dios, Oier Lopez de Lacalle, Bonan Min, and Eneko Agirre. 2022. Textual Entailment for Event Argument Extraction: Zero- and Few-Shot with Multi-Source Learning. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 2439–2455, Seattle, United States. Association for Computational Linguistics.

Oscar Sainz, Oier Lopez de Lacalle, Gorka Labaka, Ander Barrena, and Eneko Agirre. 2021. Label Verbalization and Entailment for Effective Zero and Few-Shot Relation Extraction. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 1199–1212, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.

Oscar Sainz and German Rigau. 2021. Ask2Transformers: Zero-Shot Domain labelling with Pretrained Language Models. In Proceedings of the 11th Global Wordnet Conference, pages 44–52, University of South Africa (UNISA). Global Wordnet Association.
