
Sanqiang / text_simplification

Licence: other
Text simplification model based on an encoder-decoder architecture (includes Transformer and Seq2Seq models).

Programming Languages

python
Jupyter Notebook
PostScript
HTML
Makefile
C++

Projects that are alternatives of or similar to text simplification

learningspoons
nlp lecture-notes and source code
Stars: ✭ 29 (-56.06%)
Mutual labels:  transformer, seq2seq-model
Neural-Machine-Translation
Several basic neural machine translation models implemented by PyTorch & TensorFlow
Stars: ✭ 29 (-56.06%)
Mutual labels:  transformer, seq2seq-model
fastT5
⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x.
Stars: ✭ 421 (+537.88%)
Mutual labels:  transformer
Transformer Temporal Tagger
Code and data from the paper "BERT Got a Date: Introducing Transformers to Temporal Tagging"
Stars: ✭ 55 (-16.67%)
Mutual labels:  transformer
TransBTS
This repo provides the official code for : 1) TransBTS: Multimodal Brain Tumor Segmentation Using Transformer (https://arxiv.org/abs/2103.04430) , accepted by MICCAI2021. 2) TransBTSV2: Towards Better and More Efficient Volumetric Segmentation of Medical Images(https://arxiv.org/abs/2201.12785).
Stars: ✭ 254 (+284.85%)
Mutual labels:  transformer
php-serializer
Serialize PHP variables, including objects, in any format. Support to unserialize it too.
Stars: ✭ 47 (-28.79%)
Mutual labels:  transformer
sparql-transformer
A more handy way to use SPARQL data in your web app
Stars: ✭ 38 (-42.42%)
Mutual labels:  transformer
R-MeN
Transformer-based Memory Networks for Knowledge Graph Embeddings (ACL 2020) (Pytorch and Tensorflow)
Stars: ✭ 74 (+12.12%)
Mutual labels:  transformer
Neural-Scam-Artist
Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.
Stars: ✭ 18 (-72.73%)
Mutual labels:  transformer
kaggle-champs
Code for the CHAMPS Predicting Molecular Properties Kaggle competition
Stars: ✭ 49 (-25.76%)
Mutual labels:  transformer
Variational-Transformer
Variational Transformers for Diverse Response Generation
Stars: ✭ 79 (+19.7%)
Mutual labels:  transformer
cape
Continuous Augmented Positional Embeddings (CAPE) implementation for PyTorch
Stars: ✭ 29 (-56.06%)
Mutual labels:  transformer
project-code-py
Leetcode using AI
Stars: ✭ 100 (+51.52%)
Mutual labels:  transformer
TokenLabeling
Pytorch implementation of "All Tokens Matter: Token Labeling for Training Better Vision Transformers"
Stars: ✭ 385 (+483.33%)
Mutual labels:  transformer
ru-dalle
Generate images from texts. In Russian
Stars: ✭ 1,606 (+2333.33%)
Mutual labels:  transformer
transformer-ls
Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021).
Stars: ✭ 201 (+204.55%)
Mutual labels:  transformer
ViTs-vs-CNNs
[NeurIPS 2021]: Are Transformers More Robust Than CNNs? (Pytorch implementation & checkpoints)
Stars: ✭ 145 (+119.7%)
Mutual labels:  transformer
libai
LiBai (李白): A Toolbox for Large-Scale Distributed Parallel Training
Stars: ✭ 284 (+330.3%)
Mutual labels:  transformer
sister
SImple SenTence EmbeddeR
Stars: ✭ 66 (+0%)
Mutual labels:  transformer
dingo-serializer-switch
A middleware to switch fractal serializers in dingo
Stars: ✭ 49 (-25.76%)
Mutual labels:  transformer

Integrating Transformer and Paraphrase Rules for Sentence Simplification

Paper Link: http://www.aclweb.org/anthology/D18-1355

Note some improvements over the original EMNLP paper:

  • We modified the code to support subword units, and the model performs well with them.
  • We found that replacing named entities (e.g., replacing John with person0) may not be a good idea, since it loses information. Instead, subword units help reduce the huge vocabulary introduced by named entities (see the sketch after this list).
  • We found that the context (memory) addressing is probably redundant; without it, the model achieves the same or even better performance.
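
To make the subword point concrete, below is a small, self-contained Python sketch (the toy vocabulary and the greedy longest-match segmentation are illustrative assumptions, not this repository's vocabulary code) showing how a rare named entity decomposes into subword pieces, so the vocabulary needs no dedicated entry per name:

# Toy illustration only: WordPiece-style greedy longest-match
# segmentation over a tiny hand-made vocabulary.
TOY_VOCAB = {"san", "##qi", "##ang", "pitts", "##burgh", "visited", "[UNK]"}

def subword_tokenize(word, vocab=TOY_VOCAB):
    """Split a word into the longest matching subword pieces."""
    pieces, start = [], 0
    word = word.lower()
    while start < len(word):
        end, match = len(word), None
        while start < end:
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return ["[UNK]"]  # cannot segment with this vocabulary
        pieces.append(match)
        start = end
    return pieces

print(subword_tokenize("Sanqiang"))    # ['san', '##qi', '##ang']
print(subword_tokenize("Pittsburgh"))  # ['pitts', '##burgh']

With a real BERT subtoken vocabulary (the bert_token setting described under the argument instructions), the same decomposition covers most named entities without enlarging the vocabulary.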

Data Download:

https://drive.google.com/open?id=132Jlza-16Ws1DJ7h4O89TyxJiFSFAPw7

Pretrained Model Download:

https://drive.google.com/open?id=16gO8cLXttGR64_xvLHgMwgJeB1DzT93N

Commands to run the model:

Train:

python model/train.py -ngpus 1 -bsize 64 -fw transformer -out bertal_wkori_direct -op adagrad -lr 0.01 --mode transbert_ori -nh 8 -nhl 6 -nel 6 -ndl 6 -lc True -eval_freq 0 --fetch_mode tf_example_dataset --subword_vocab_size 0 --dmode wk --tie_embedding all --bert_mode bert_token:bertbase:init --environment aws --memory direct

Evaluate:

python model/eval.py -ngpus 1 -bsize 256 -fw transformer -out bertal_wkori_direct -op adagrad -lr 0.01 --mode transbert_ori -nh 8 -nhl 6 -nel 6 -ndl 6 -lc True -eval_freq 0 --subword_vocab_size 0 --dmode wk --tie_embedding all --bert_mode bert_token:bertbase:init --environment aws

Argument instructions

  • bsize: batch size
  • out: the output folder, which will contain the log, the best model, and the result report
  • tie_embedding: all ties the encoder embedding, decoder embedding, and output-projection weights; we found this speeds up training (see the sketch after this list)
  • bert_mode: how BERT is used. bert_token means we use the subtoken vocabulary from BERT; bertbase means we use the BERT base model (due to memory constraints, we have not tried the BERT large version yet)
  • environment: the path configuration of the experiment. Please change it in model/model_config.py to fit your system
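
As a rough illustration of tie_embedding, here is a minimal PyTorch-style sketch (a hypothetical module for illustration, not this repository's TensorFlow code) of what all means: the encoder embedding, the decoder embedding, and the output projection share a single weight matrix, which reduces the parameter count and, as noted above, speeds up training:

import torch.nn as nn

class TiedEmbeddings(nn.Module):
    # Hypothetical sketch: one embedding table reused everywhere,
    # mirroring what --tie_embedding all is described to do.
    def __init__(self, vocab_size, d_model=512):
        super().__init__()
        self.shared = nn.Embedding(vocab_size, d_model)   # single shared table
        self.encoder_embed = self.shared                  # encoder input lookup
        self.decoder_embed = self.shared                  # decoder input lookup
        self.proj = nn.Linear(d_model, vocab_size, bias=False)
        self.proj.weight = self.shared.weight             # tie output projection

    def output_logits(self, decoder_states):
        # decoder_states: (batch, tgt_len, d_model) from any decoder stack
        return self.proj(decoder_states)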

More configuration options can be found in util/arguments.py.

Citation

Zhao, Sanqiang, et al. "Integrating Transformer and Paraphrase Rules for Sentence Simplification." Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018.

@article{zhao2018integrating,
  title={Integrating Transformer and Paraphrase Rules for Sentence Simplification},
  author={Zhao, Sanqiang and Meng, Rui and He, Daqing and Andi, Saptono and Bambang, Parmanto},
  journal={arXiv preprint arXiv:1810.11193},
  year={2018}
}