marumalo / pytorch-translm

Licence: other

An implementation of a transformer-based language model for sentence rewriting tasks such as summarization, simplification, and grammatical error correction.

Sentence Rewriting with Language Model

An implementation of a transformer-based language model for sentence rewriting tasks such as summarization, text simplification, paraphrase generation, style transfer, and grammatical error correction. The following figure shows an overview of the architecture. The model receives as input the original sentence and the rewritten (for example, simplified) sentence, joined by the special token <SEP>, which acts as a delimiter. The model then generates the target sentence. The architecture is very simple, but it has shown strong results on text summarization and text simplification tasks.
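
As a rough illustration of the input format described above, the sketch below joins a source sentence and a target sentence with <SEP>. The helper names and the <EOS> end-of-sequence token are assumptions made for this example and are not taken from the repository.

```python
SEP = "<SEP>"  # delimiter token named in this README
EOS = "<EOS>"  # assumed end-of-sequence token (hypothetical, for illustration)


def build_training_sequence(source, target):
    """Join whitespace-tokenized source and target into one token list."""
    return source.split() + [SEP] + target.split() + [EOS]


def build_prompt(source):
    """At generation time, only the source and the delimiter are given."""
    return source.split() + [SEP]


def extract_target(tokens):
    """Return the tokens after SEP, cut at EOS if it is present."""
    after = tokens[tokens.index(SEP) + 1:]
    return after[:after.index(EOS)] if EOS in after else after


if __name__ == "__main__":
    seq = build_training_sequence("the cat sat on the mat", "a cat sat")
    print(seq)
    print(extract_target(seq))
```

Under this reading, training is ordinary left-to-right language modeling over the joined sequence, and generation continues the prompt that ends in <SEP>.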

Installation

This code depends on the following.

python==3.6.5
pytorch==1.1.0
torchtext==0.3.1

git clone https://github.com/t080/pytorch-translm.git
cd ./pytorch-translm
pip install -r requirements.txt

Usage

Pre-training

The dataset for pre-training must be a plain text file. Each input sentence must be tokenized into words separated by whitespace. To use a GPU, set the --gpu option.

python train.py pretrain \
    --train ./path/to/train.txt \
    --savedir ./checkpoints/pre-trained \
    --gpu
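
A minimal sketch of preparing such a file is shown below. It assumes one sentence per line and uses a simple regular-expression tokenizer; both are assumptions for illustration, and the function names are hypothetical. A real corpus would probably call for a proper tokenizer.

```python
import re


def tokenize(sentence):
    """Separate words and punctuation marks with single spaces."""
    return " ".join(re.findall(r"\w+|[^\w\s]", sentence))


def write_pretrain_file(sentences, path):
    """Write one whitespace-tokenized sentence per line (assumed layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for sentence in sentences:
            f.write(tokenize(sentence) + "\n")


if __name__ == "__main__":
    write_pretrain_file(["Hello, world!", "This is a test."], "train.txt")
```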

Fine-tuning

The dataset for fine-tuning must be in TSV format. The source and target sentences must be tokenized into words separated by whitespace. To use a GPU, set the --gpu option.

python train.py finetune \
    --model ./checkpoints/pre-trained/checkpoint_best.pt \
    --train ./path/to/train.tsv \
    --valid ./path/valid.tsv \
    --savedir ./checkpoints/fine-tuned \
    --gpu
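
The TSV layout could be produced and read back along the following lines. The column order (source first, target second) and the absence of a header row are assumptions, since the README does not specify them. The helper names are hypothetical.

```python
def format_tsv_line(source, target):
    """Build one TSV row: source sentence, a tab, then target sentence."""
    for field in (source, target):
        if "\t" in field or "\n" in field:
            raise ValueError("fields must not contain tabs or newlines")
    return f"{source}\t{target}"


def parse_tsv_line(line):
    """Split a TSV row back into source and target token lists."""
    source, target = line.rstrip("\n").split("\t")
    return source.split(), target.split()


if __name__ == "__main__":
    row = format_tsv_line("he go to school yesterday .",
                          "he went to school yesterday .")
    print(parse_tsv_line(row))
```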

Translation

In the translation step, you must set the --model and --input options. You can set the maximum length of the model's output with the --maxlen option (default: 100 tokens).

python generate.py \
    --model ./checkpoints/fine-tuned/checkpoint_best.pt \
    --input ./path/to/test.txt \
    --gpu
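
To illustrate what a length cap such as --maxlen typically means during generation, here is a sketch of greedy decoding that stops after a fixed number of tokens. It is not the code in generate.py. The next_token callable, the toy model, and the <EOS> token are all hypothetical stand-ins.

```python
def greedy_decode(next_token, prompt, maxlen=100, eos="<EOS>"):
    """Generate at most `maxlen` tokens after `prompt`, stopping early at `eos`."""
    output = []
    context = list(prompt)
    for _ in range(maxlen):
        token = next_token(context)
        if token == eos:
            break
        output.append(token)
        context.append(token)
    return output


def toy_copy_model(context):
    """Stand-in for a trained model: copies the source, then emits <EOS>."""
    sep = context.index("<SEP>")
    pos = len(context) - sep - 1  # number of tokens generated so far
    return context[pos] if pos < sep else "<EOS>"


if __name__ == "__main__":
    prompt = ["a", "b", "c", "<SEP>"]
    print(greedy_decode(toy_copy_model, prompt))            # full copy
    print(greedy_decode(toy_copy_model, prompt, maxlen=2))  # truncated
```

With a small maxlen the output is cut off before the model emits its end token, which is the trade-off to keep in mind when lowering the default.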
