
0x7o / text2keywords

License: MIT
Trained T5-base and T5-large models for creating keywords from text

Programming Languages

python

Projects that are alternatives of or similar to text2keywords

ttt
A package for fine-tuning Transformers with TPUs, written in Tensorflow2.0+
Stars: ✭ 35 (-33.96%)
Mutual labels:  transformers, t5
Nn
🧑‍🏫 50! Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, ...), gans(cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
Stars: ✭ 5,720 (+10692.45%)
Mutual labels:  transformers, transformer
golgotha
Contextualised Embeddings and Language Modelling using BERT and Friends using R
Stars: ✭ 39 (-26.42%)
Mutual labels:  transformers, transformer
t5-japanese
Codes to pre-train Japanese T5 models
Stars: ✭ 39 (-26.42%)
Mutual labels:  transformer, t5
question generator
An NLP system for generating reading comprehension questions
Stars: ✭ 188 (+254.72%)
Mutual labels:  transformers, t5
COVID-19-Tweet-Classification-using-Roberta-and-Bert-Simple-Transformers
Rank 1 / 216
Stars: ✭ 24 (-54.72%)
Mutual labels:  transformers, transformer
trapper
State-of-the-art NLP through transformer models in a modular design and consistent APIs.
Stars: ✭ 28 (-47.17%)
Mutual labels:  transformers, transformer
fastT5
⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x.
Stars: ✭ 421 (+694.34%)
Mutual labels:  transformer, t5
Transformer-MM-Explainability
[ICCV 2021- Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Including examples for DETR, VQA.
Stars: ✭ 484 (+813.21%)
Mutual labels:  transformers, transformer
Text-Summarization
Abstractive and Extractive Text summarization using Transformers.
Stars: ✭ 38 (-28.3%)
Mutual labels:  transformers, t5
chef-transformer
Chef Transformer 🍲.
Stars: ✭ 29 (-45.28%)
Mutual labels:  transformers, t5
MinTL
MinTL: Minimalist Transfer Learning for Task-Oriented Dialogue Systems
Stars: ✭ 61 (+15.09%)
Mutual labels:  transformer
GoEmotions-pytorch
Pytorch Implementation of GoEmotions 😍😢😱
Stars: ✭ 95 (+79.25%)
Mutual labels:  transformers
BERT-NER
Using pre-trained BERT models for Chinese and English NER with 🤗Transformers
Stars: ✭ 114 (+115.09%)
Mutual labels:  transformers
MISE
Multimodal Image Synthesis and Editing: A Survey
Stars: ✭ 214 (+303.77%)
Mutual labels:  transformers
CSV2RDF
Streaming, transforming, SPARQL-based CSV to RDF converter. Apache license.
Stars: ✭ 48 (-9.43%)
Mutual labels:  transformer
ParsBigBird
Persian Bert For Long-Range Sequences
Stars: ✭ 58 (+9.43%)
Mutual labels:  transformers
remixer-pytorch
Implementation of the Remixer Block from the Remixer paper, in Pytorch
Stars: ✭ 37 (-30.19%)
Mutual labels:  transformers
OverlapPredator
[CVPR 2021, Oral] PREDATOR: Registration of 3D Point Clouds with Low Overlap.
Stars: ✭ 293 (+452.83%)
Mutual labels:  transformer
optimum
🏎️ Accelerate training and inference of 🤗 Transformers with easy to use hardware optimization tools
Stars: ✭ 567 (+969.81%)
Mutual labels:  transformers

text to keywords

Trained T5-base and T5-large models for creating keywords from text. Supported languages: Russian (ru)

Pretraining Large version | Pretraining Base version

habr article

Usage

Example usage (the code returns a list of keywords; duplicates are possible):

Try Model Training In Colab!

pip install transformers sentencepiece
from itertools import groupby
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer
model_name = "0x7194633/keyt5-large" # or 0x7194633/keyt5-base
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

def generate(text, **kwargs):
    inputs = tokenizer(text, return_tensors='pt')
    with torch.no_grad():
        hypotheses = model.generate(**inputs, num_beams=5, **kwargs)
    s = tokenizer.decode(hypotheses[0], skip_special_tokens=True)
    # Split the ";"-separated output into a lowercase list of keywords
    s = s.replace('; ', ';').replace(' ;', ';').lower().split(';')[:-1]
    # Collapse consecutive duplicates
    s = [el for el, _ in groupby(s)]
    return s

article = """Reuters сообщил об отмене 3,6 тыс. авиарейсов из-за «омикрона» и погоды
Наибольшее число отмен авиарейсов 2 января пришлось на американские авиакомпании 
SkyWest и Southwest, у каждой — более 400 отмененных рейсов. При этом среди 
отмененных 2 января авиарейсов — более 2,1 тыс. рейсов в США. Также свыше 6400 
рейсов были задержаны."""

print(generate(article, top_p=1.0, max_length=64))  
# ['авиаперевозки', 'отмена авиарейсов', 'отмена рейсов', 'отмена авиарейсов', 'отмена рейсов', 'отмена авиарейсов']
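
Since generate() only collapses consecutive repeats (via itertools.groupby), repeated keywords can still appear, as in the output above. As an optional post-processing step (a small sketch, not part of the original example), all duplicates can be removed while preserving order:

keywords = generate(article, top_p=1.0, max_length=64)
unique_keywords = list(dict.fromkeys(keywords))  # keeps only the first occurrence of each keyword
print(unique_keywords)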

Training

To train the keyT5-base and keyT5-large models, you will need a table in CSV format, like this:

X | Y
Some text that is fed to the input | The text that should come out
Some text that is fed to the input | The text that should come out

The keyT5 models were trained on ~7,000 compressed habr.com articles (data.csv, collected with collect.py). The models support the Russian language exclusively!
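
As an illustration (not taken from the repository), such a table could be assembled with pandas; the column names X and Y follow the table above, the example rows are placeholders, and joining keywords with ";" mirrors the separator that generate() splits on:

import pandas as pd

# Hypothetical rows: X is the input text, Y is the ";"-separated keyword string
rows = [
    {"X": "Some text that is fed to the input", "Y": "keyword one;keyword two;"},
    {"X": "Some other input text", "Y": "another keyword;one more keyword;"},
]
pd.DataFrame(rows, columns=["X", "Y"]).to_csv("data.csv", index=False)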

Go to the training notebook and learn more about it:

Try Model Training In Colab!
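
The Colab notebook is the reference for training. As a rough sketch of what fine-tuning on such a CSV can look like with plain PyTorch and Hugging Face Transformers (the hyperparameters, sequence lengths, and label masking below are assumptions, not the notebook's exact code):

# Minimal fine-tuning sketch, assuming data.csv with columns X (input text) and Y (keywords)
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = "0x7194633/keyt5-base"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

class KeywordDataset(Dataset):
    def __init__(self, path):
        self.df = pd.read_csv(path)

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        x = tokenizer(row["X"], truncation=True, max_length=512,
                      padding="max_length", return_tensors="pt")
        y = tokenizer(row["Y"], truncation=True, max_length=64,
                      padding="max_length", return_tensors="pt")
        labels = y.input_ids.squeeze(0)
        labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss
        return {"input_ids": x.input_ids.squeeze(0),
                "attention_mask": x.attention_mask.squeeze(0),
                "labels": labels}

loader = DataLoader(KeywordDataset("data.csv"), batch_size=4, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

model.train()
for epoch in range(3):
    for batch in loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()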
