kyzhouhzau / Nlpgnn

License: MIT
1. Use BERT, ALBERT and GPT2 as TensorFlow 2.0 layers. 2. Implement GCN, GAN, GIN and GraphSAGE based on message passing.

nlpgnn

Package description

Natural language processing is changing rapidly, and many excellent models, including BERT and GPT, have been proposed in recent years.
At the same time, graph neural networks are increasingly being applied to natural language processing, for example in TextGCN and Tensor-TextGCN.
This toolbox is dedicated to natural language processing and aims to implement models in the simplest way possible.
Keywords: NLP; GNN

Models:

  • BERT
  • ALBERT
  • GPT2
  • TextCNN
  • Bilstm+Attention
  • GCN, GAN (graph attention network, often abbreviated GAT)
  • GIN, GraphSAGE
  • TextGCN, TextSAGE

Examples (See tests for more details):

  • BERT-NER (Chinese and English Version)
  • BERT-CRF-NER (Chinese and English Version)
  • BERT-CLS (Chinese and English Version)
  • ALBERT-NER (Chinese and English Version)
  • ALBERT-CLS (Chinese and English Version)
  • GPT2-generation (English Version)
  • Bilstm+Attention (Chinese and English Version)
  • TextCNN (Chinese and English Version)
  • GCN, GAN, GIN, GraphSAGE (Based on message passing)
  • TextGCN and TextSAGE for text classification

All of the above experiments were run on a GTX 1080 GPU with 8000 MiB of memory.

Status

2020/5/--: rename the project from fennlp to NLPGNN.

2020/5/17: tried to convert sentences to graphs based on the BERT attention matrix, but failed. This section provides a way to visualize the BERT attention matrix. For more detail, see the "BERT-GCN" directory.

2020/5/11: add TextGCN and TextSAGE for text classification.

2020/5/5: add GIN, GraphSAGE for graph classification.

2020/4/25: add GAN, GIN model, based on message passing methods.

2020/4/23: add GCN model, based on message passing methods.

2020/4/16: currently focusing on GNN models for NLP, and trying to integrate some GNN models into fennlp.

2020/4/2: add GPT2 model, which can use the parameters released by OpenAI (base, medium, large). For more detail, see "TG/EN/interactive.py".

2020/3/26: add Bilstm+Attention example for classification

2020/3/23: add RAdam optimizer.

2020/3/19: add test example "albert_ner_train.py" "albert_ner_test.py"

2020/3/16: add a model for training subword embeddings based on BPE. The trained embeddings are used in the TextCNN model to improve its performance. See "tran_bpe_embeding.py" for more details.

2020/3/8: add test example "run_tucker.py" for training TuckER on WN18.

2020/3/3: add test example "tran_text_cnn.py" for training the TextCNN model.

2020/3/2: add test example "train_bert_classification.py" for text classification based on BERT.

Requirement

  • tensorflow-gpu>=2.0
  • typeguard
  • gensim
  • tqdm
  • sentencepiece

Usage

  1. clone source
git clone https://github.com/kyzhouhzau/NLPGNN.git
  2. install package
python setup.py install
  3. run model
python bert_ner_train.py

For NER:

Input

  • put the train, valid and test files in the "Input" directory.

  • data format: see the reference data in "tests\NER\Input\train"

    e.g. "拮 抗 RANKL 对 破 骨 细 胞 的 作 用 。 O O O O B-Anatomy I-Anatomy I-Anatomy E-Anatomy O O O O"

    Each line in train contains two parts. The first part, "拮 抗 RANKL 对 破 骨 细 胞 的 作 用 。", is a sentence with its tokens separated by spaces. The second part, "O O O O B-Anatomy I-Anatomy I-Anatomy E-Anatomy O O O O", gives one tag per token in the sentence. The two parts are separated by a tab ('\t').
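
The line format described above can be read with a few lines of Python. This is a minimal sketch, not the loader used by the repository: the function names `parse_ner_line` and `extract_entities` are hypothetical, and the entity extraction assumes that every entity is closed by an `E-` tag, as in the example line.

```python
from typing import List, Tuple


def parse_ner_line(line: str) -> Tuple[List[str], List[str]]:
    """Split one line into (tokens, tags); the two parts are tab-separated."""
    sentence, tag_string = line.rstrip("\r\n").split("\t")
    tokens = sentence.split(" ")
    tags = tag_string.split(" ")
    if len(tokens) != len(tags):
        raise ValueError(f"{len(tokens)} tokens but {len(tags)} tags")
    return tokens, tags


def extract_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity_text, entity_type) pairs from B-/I-/E- tagged spans."""
    entities, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts here.
            current, current_type = [token], tag[2:]
        elif tag.startswith(("I-", "E-")) and current_type == tag[2:]:
            current.append(token)
            if tag.startswith("E-"):
                # The entity is complete; store it and reset the buffer.
                entities.append(("".join(current), current_type))
                current, current_type = [], None
        else:
            # "O" or an inconsistent tag: drop any unfinished span.
            current, current_type = [], None
    return entities


line = "拮 抗 RANKL 对 破 骨 细 胞 的 作 用 。\tO O O O B-Anatomy I-Anatomy I-Anatomy E-Anatomy O O O O"
tokens, tags = parse_ner_line(line)
print(extract_entities(tokens, tags))
```

Checking that the number of tokens equals the number of tags while loading catches most formatting mistakes before training starts.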

1、bert (base, large)

from nlpgnn.models import bert
bert = bert.BERT()
python bert_ner_train.py
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
bert (BERT)                  multiple                  101677056 
_________________________________________________________________
dense (Dense)                multiple                  35374     
=================================================================
Total params: 101,712,430
Trainable params: 101,712,430
Non-trainable params: 0
_________________________________________________________________

2、bert + crf

from nlpgnn.models import bert
from nlpgnn.metrics.crf import CrfLogLikelihood
bert = bert.BERT()
crf = CrfLogLikelihood()
python bert_ner_crf_train.py
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
bert (BERT)                  multiple                  101677056 
_________________________________________________________________
dense (Dense)                multiple                  35374     
_________________________________________________________________
crf (CrfLogLikelihood)       multiple                  2116      
=================================================================
Total params: 101,714,546
Trainable params: 101,714,546
Non-trainable params: 0
_________________________________________________________________

3、albert (base, large, xlarge, xxlarge)

from nlpgnn.models import albert
bert = albert.ALBERT()
python albert_ner_train.py 
large
Model: "albert_ner"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
albert (ALBERT)              multiple                  11092992  
_________________________________________________________________
dense (Dense)                multiple                  6921      
=================================================================
Total params: 11,099,913
Trainable params: 11,099,913
Non-trainable params: 0
_________________________________________________________________

Using the default parameters, we get the following results on the validation data of "中文糖尿病标注数据集" (a Chinese diabetes annotation dataset) and "CoNLL-2003".

model          macro-F1  macro-P  macro-R  lr                    epoch  maxlen  batch_size  data
bert+base      0.7005    0.7244   0.7031   2e-5                  3      128     6           中文糖尿病标注数据集
bert+base+crf  0.7009    0.7237   0.7041   2e-5(bert),2e-3(crf)  3      128     6           中文糖尿病标注数据集
bert+base      0.9128    0.9208   0.9227   2e-5                  5      128     8           CoNLL-2003
albert+base    0.8512    0.8678   0.8589   1e-4                  8      128     16          CoNLL-2003
albert+large   0.8670    0.8778   0.8731   2e-5                  10     128     4           CoNLL-2003
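
For readers unfamiliar with the macro-averaged columns, the sketch below shows one common way to compute macro-P, macro-R and macro-F1: score each label separately, then average over labels. It is an illustration only. The function name `macro_prf` is hypothetical, token-level scoring is an assumption, and the repository's own metric code may differ (for example, it may score whole entities). In every row of the table, macro-F1 is lower than both macro-P and macro-R, which a harmonic mean of those two values could not be, so per-label averaging seems a plausible reading.

```python
from collections import Counter
from typing import Sequence, Tuple


def macro_prf(gold: Sequence[str], pred: Sequence[str]) -> Tuple[float, float, float]:
    """Return (macro-P, macro-R, macro-F1) over all labels seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    labels = sorted(set(gold) | set(pred))
    if not labels:
        raise ValueError("no labels to score")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # p was predicted but is not the gold label
            fn[g] += 1  # the gold label g was missed
    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        p = tp[label] / predicted if predicted else 0.0
        r = tp[label] / actual if actual else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


gold = ["O", "O", "B-Anatomy", "E-Anatomy"]
pred = ["O", "B-Anatomy", "B-Anatomy", "E-Anatomy"]
macro_p, macro_r, macro_f1 = macro_prf(gold, pred)
print(f"macro-P={macro_p:.4f} macro-R={macro_r:.4f} macro-F1={macro_f1:.4f}")
```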

For Sentence Classification

Input

  • put the train, valid and test files in the "Input" directory.

  • data format: see the reference data in "\tests\CLS\BERT( or ALBERT)\Input".

    e.g. "作 为 地 球 上 曾 经 最 强 的 拳 王 之 一 , 小 克 里 琴 科 谈 自 己 是 否 会 复 出 2"

    Each line in train (test, valid) contains two parts. The first part, "作 为 地 球 上 曾 经 最 强 的 拳 王 之 一 , 小 克 里 琴 科 谈 自 己 是 否 会 复 出", is the sentence, and the second part, "2", is the label.
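
A classification line can be split in the same spirit. The sketch below is an assumption-laden illustration: the separator between sentence and label is not stated here, so the code splits at the last run of whitespace, and it assumes labels are integers as in the example. `parse_cls_line` is a hypothetical helper, not part of nlpgnn.

```python
from typing import List, Tuple


def parse_cls_line(line: str) -> Tuple[List[str], int]:
    """Split a line into (tokens, label) at the last run of whitespace."""
    text, label = line.rstrip("\r\n").rsplit(None, 1)
    return text.split(), int(label)


cls_tokens, cls_label = parse_cls_line("作 为 地 球 上 曾 经 最 强 的 拳 王 之 一 , 小 克 里 琴 科 谈 自 己 是 否 会 复 出 2")
print(cls_tokens[:4], cls_label)
```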

1、bert (base, large)

from nlpgnn.models import bert
bert = bert.BERT()
python train_bert_classification.py
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
bert (BERT)                  multiple                  102267648 
_________________________________________________________________
dense (Dense)                multiple                  11535     
=================================================================
Total params: 102,279,183
Trainable params: 102,279,183
Non-trainable params: 0
_________________________________________________________________

2、TextCNN

from nlpgnn.models import TextCNN
model = TextCNN.TextCNN()
python train_text_cnn.py
Use "WordPiece embedding" to initialize the word embeddings. To train your own embeddings:
python train_bpe_embedding.py

For more detail, see WordPiece.

Using the default parameters, we get the following results on the validation data of "新闻标题短文本分类" (short-text classification of news headlines) and SST-2.

model             ACC     lr    epoch  maxlen  batch_size  data
bert+base         0.8899  1e-5  5      50      32          新闻标题短文本分类
bert+base         0.9266  2e-5  3      128     8           SST-2
albert+base       0.9186  1e-5  3      128     16          SST-2
albert+large      0.9461  1e-6  3      128     4           SST-2
Bilstm+attention  0.8269  0.01  3      128     64          SST-2
TextCNN           0.8233  0.01  3      128     64          SST-2

For Text Generation

1、GPT2

from nlpgnn.models import gpt2
bert = gpt2.GPT2()
python interactive.py
Model: "gen_gp_t2" base
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
gpt2 (GPT2)                  multiple                  124439808 
=================================================================
Total params: 124,439,808
Trainable params: 124,439,808
Non-trainable params: 0
_________________________________________________________________

Example:

Input >>> Stocks continue to fall this week

Output >>> as stocks fall for the second consecutive week as investors flee for safe havens.

"The market is off the charts," said John Schmieding, senior vice president, market strategy at RBC Capital Markets. 
"We don't know what the Fed's intent is on, what direction it's going in. We don't know where they plan to go. 
We don't know what direction they're going to move into."

TensorBoard

TensorBoard can help you visualize losses and evaluation metrics:

Usage:

 tensorboard --port 6006 --logdir="./tensorboard"

[Figures: loss, acc]

GNN

1、GCN, GAN, GIN, GraphSAGE (Based on message passing)

The data split and parameter settings are the same as those proposed in this paper.

  • Node Classification
model  Cora   Pubmed  Citeseer
GCN    81.80  79.50   71.20
GAN    83.00  79.00   72.30
GAAE   82.40  79.60   71.70
  • Graph Classification
model      MUTAG        PROTEINS     NCI1
GIN        87.62±8.76#  73.05±1.85#  73.13±5.57#
GraphSAGE  86.06±8.26   75.11±2.87   76.91±3.45

Note: The # sign indicates that the current result is lower than the result reported in the paper. In the paper, the authors use this method to evaluate models. This method is time-consuming, so it is not used here.

  • Text Classification
model        R8            R52
TextSAGE     96.68±0.42    92.80±0.32
TextGCN2019  97.108±0.243  92.512±0.249
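
To make "based on message passing" concrete, the sketch below writes one GCN-style propagation step as explicit messages along edges: each node sends its feature vector to its neighbours, scaled by 1/sqrt(deg(src)*deg(dst)), and each node sums what it receives. It is a pure-Python illustration under stated assumptions (undirected graph, self-loops added, no weight matrix or nonlinearity). The function name `gcn_propagate` is hypothetical and is not the nlpgnn layer API.

```python
import math
from collections import defaultdict
from typing import List, Sequence, Tuple


def gcn_propagate(features: Sequence[Sequence[float]],
                  edges: Sequence[Tuple[int, int]]) -> List[List[float]]:
    """One symmetric-normalized aggregation step, expressed as message passing."""
    n = len(features)
    dim = len(features[0])
    # Make the graph undirected and add a self-loop to every node.
    edge_set = set()
    for src, dst in edges:
        edge_set.add((src, dst))
        edge_set.add((dst, src))
    for i in range(n):
        edge_set.add((i, i))
    # Node degree, counting the self-loop.
    degree = defaultdict(int)
    for _, dst in edge_set:
        degree[dst] += 1
    out = [[0.0] * dim for _ in range(n)]
    for src, dst in sorted(edge_set):
        # Message: the source features scaled by the edge normalization.
        norm = 1.0 / math.sqrt(degree[src] * degree[dst])
        # Aggregate: sum the incoming messages at the destination node.
        for k in range(dim):
            out[dst][k] += norm * features[src][k]
    return out


# Two connected nodes plus one isolated node.
result = gcn_propagate([[1.0], [3.0], [5.0]], [(0, 1)])
print(result)
```

In a full layer the aggregated features would then be multiplied by a learned weight matrix and passed through an activation. GIN, GraphSAGE and attention-based layers keep the same message, aggregate and update structure but change how messages are weighted and combined.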

Parameter Settings

1、For English tasks, you need to set the parameter "cased" (in fennlp.datas.checkpoint.LoadCheckpoint) to be consistent with your preprocessed input data to ensure that the tokenizer can correctly distinguish case.

2、When you use bert or albert, the following parameters are required:

param.maxlen
param.label_size
param.batch_size

If you don't know the value of label_size, the script will report it the first time you run the training code.

3、The learning rate and batch_size determine whether the model converges; see Link for more detail.

4、If you are not familiar with the optimizer used in bert and albert, it does not matter. The main thing to remember is that the parameters "learning_rate" and "decay_steps" (in fennlp.optimizers.optim.AdamWarmup) matter most. You can set "learning_rate" to a relatively small value, and set "decay_steps" to samples*epoch/batch_size or a little higher.
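
The rule of thumb for "decay_steps" is simple arithmetic, sketched below. The helper name `estimate_decay_steps` and its `margin` parameter are hypothetical. The code only computes the number, and how it is passed to AdamWarmup is not shown, because that signature is not given here.

```python
import math


def estimate_decay_steps(num_samples: int, epochs: int, batch_size: int,
                         margin: float = 0.0) -> int:
    """Total optimizer steps, i.e. samples * epochs / batch_size, rounded up.

    The last, partially filled batch of each epoch still costs one step, so the
    per-epoch count is rounded up. `margin` adds optional headroom (0.1 = 10%).
    """
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return math.ceil(steps_per_epoch * epochs * (1.0 + margin))


# 1000 training samples, 3 epochs, batch size 8
print(estimate_decay_steps(1000, 3, 8))
# The same setting with 10% headroom
print(estimate_decay_steps(1000, 3, 8, margin=0.1))
```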

5、If the code runs slowly, try using @tf.function and setting an appropriate frequency for model saving and evaluation.

6、For any other problem, you can contact me at "[email protected]" or open an issue.

