marumalo / pytorch-translm

Licence: other
An implementation of a transformer-based language model for sentence rewriting tasks such as summarization, simplification, and grammatical error correction.

Sentence Rewriting with Language Model

An implementation of a transformer-based language model for sentence rewriting tasks such as summarization, text simplification, paraphrase generation, style transfer, and grammatical error correction. The following figure shows an overview of the architecture. The model receives an input in which the original sentence and the rewritten sentence (for example, a simplified one) are joined by the special token <SEP>, which serves as a delimiter. The model then generates the target sentence. This architecture is very simple, but it has shown strong results on text summarization and text simplification tasks.
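The input layout described above can be sketched in a few lines of Python. This is a minimal illustration only, not the repository's actual preprocessing code: the helper names `build_training_example` and `build_prompt` are hypothetical, and the sketch assumes sentences are already whitespace-tokenized.

```python
from typing import List

SEP = "<SEP>"  # delimiter token named in this README


def build_training_example(source: str, target: str, sep: str = SEP) -> List[str]:
    """Join a whitespace-tokenized source and target into one token sequence.

    Hypothetical helper: the language model is trained on the whole sequence.
    """
    return source.split() + [sep] + target.split()


def build_prompt(source: str, sep: str = SEP) -> List[str]:
    """At generation time only the source and the delimiter are given;
    the model is expected to continue with the target tokens."""
    return source.split() + [sep]


example = build_training_example("the cat sat on the mat .", "a cat sat .")
prompt = build_prompt("the cat sat on the mat .")
print(" ".join(example))
print(" ".join(prompt))
```

Because source and target share one sequence, a single decoder-only language model can be used without a separate encoder.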


Installation

This code depends on the following:

  • python==3.6.5
  • pytorch==1.1.0
  • torchtext==0.3.1
git clone https://github.com/t080/pytorch-translm.git
cd ./pytorch-translm
pip install -r requirements.txt

Usage

Pre-training

The dataset for pre-training must be a plain text file. Each input sentence must be tokenized into words separated by whitespace. To use a GPU, set the --gpu option.

python train.py pretrain \
    --train ./path/to/train.txt \
    --savedir ./checkpoints/pre-trained \
    --gpu
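As a rough sketch of preparing such a file, the snippet below turns raw sentences into whitespace-separated token lines. It rests on assumptions not stated in this README: that the file holds one sentence per line, and that a simple regex tokenizer is acceptable. The function names are hypothetical, and a real project would likely use a proper tokenizer.

```python
import re
from typing import Iterable, List


def tokenize(sentence: str) -> List[str]:
    """Split a sentence into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def to_pretrain_lines(sentences: Iterable[str]) -> List[str]:
    """Return one whitespace-tokenized line per non-empty sentence.

    Assumption: the pre-training file holds one sentence per line.
    """
    return [" ".join(tokenize(s)) for s in sentences if s.strip()]


def write_pretrain_file(path: str, sentences: Iterable[str]) -> None:
    """Write the tokenized lines to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as f:
        for line in to_pretrain_lines(sentences):
            f.write(line + "\n")


lines = to_pretrain_lines(["Hello, world!", "", "It works."])
print("\n".join(lines))
```

The resulting file can then be passed to the --train option shown above.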

Fine-tuning

The dataset for fine-tuning must be in TSV format. The source and target sentences must be tokenized into words separated by whitespace. To use a GPU, set the --gpu option.

python train.py finetune \
    --model ./checkpoints/pre-trained/checkpoint_best.pt \
    --train ./path/to/train.tsv \
    --valid ./path/to/valid.tsv \
    --savedir ./checkpoints/fine-tuned \
    --gpu
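The following sketch shows one way to build such TSV rows from sentence pairs. The column order (source first, then target) is an assumption, since this README does not spell it out, and `to_tsv_lines` is a hypothetical helper rather than part of the repository.

```python
from typing import Iterable, List, Tuple


def to_tsv_lines(pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Format (source, target) pairs as tab-separated lines.

    Assumption: column 1 is the source sentence, column 2 the target.
    """
    lines = []
    for source, target in pairs:
        # Collapse all whitespace to single spaces so that no stray tab
        # or newline inside a sentence can break the two-column layout.
        src = " ".join(source.split())
        tgt = " ".join(target.split())
        if src and tgt:  # skip pairs with an empty side
            lines.append(f"{src}\t{tgt}")
    return lines


pairs = [
    ("the cat sat on the mat .", "a cat sat ."),
    ("he go to school yesterday .", "he went to school yesterday ."),
]
tsv_lines = to_tsv_lines(pairs)
print("\n".join(tsv_lines))
```

Writing these lines to a file, one per row, gives a file in the shape expected by --train and --valid under the stated assumption.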

Translation

In the translation step, you must set the --model and --input options. You can set the maximum length of the model's output with the --maxlen option (default: 100 tokens).

python generate.py \
    --model ./checkpoints/fine-tuned/checkpoint_best.pt \
    --input ./path/to/test.txt \
    --gpu
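To illustrate what a maximum output length typically means for an autoregressive model, here is a minimal greedy decoding loop with a length cap. It is a sketch under stated assumptions, not the logic of generate.py: the `<EOS>` end token, the `generate` function, and the toy `toy_next_token` model (which simply copies the source) are all hypothetical.

```python
from typing import Callable, List


def generate(
    prompt: List[str],
    next_token: Callable[[List[str]], str],
    maxlen: int = 100,
    eos: str = "<EOS>",  # assumed end-of-sequence token
) -> List[str]:
    """Greedily extend the prompt until EOS or until maxlen tokens are produced."""
    output: List[str] = []
    while len(output) < maxlen:
        token = next_token(prompt + output)
        if token == eos:
            break
        output.append(token)
    return output


def toy_next_token(context: List[str]) -> str:
    """Hypothetical stand-in for a trained model: copies the source tokens
    that precede <SEP> one by one, then emits <EOS>."""
    sep_index = context.index("<SEP>")
    source = context[:sep_index]
    produced = len(context) - sep_index - 1
    return source[produced] if produced < len(source) else "<EOS>"


prompt = "a b c <SEP>".split()
full = generate(prompt, toy_next_token)
capped = generate(prompt, toy_next_token, maxlen=2)
print(full)
print(capped)
```

With the cap set to 2, generation stops after two tokens even though the toy model has not yet emitted its end token.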
