
Ricardokevins / Kevinpro-NLP-demo

Licence: other
All the NLP you need, here. Personal implementations of some fun NLP demos, currently covering 13 NLP applications in PyTorch.

Programming Languages

Python: 139,335 projects (#7 most used programming language)
Jupyter Notebook: 11,667 projects

Projects that are alternatives of or similar to Kevinpro-NLP-demo

COVID-19-Tweet-Classification-using-Roberta-and-Bert-Simple-Transformers
Rank 1 / 216
Stars: ✭ 24 (-79.49%)
Mutual labels:  text-classification, transformer, bert, textclassification
Filipino-Text-Benchmarks
Open-source benchmark datasets and pretrained transformer models in the Filipino language.
Stars: ✭ 22 (-81.2%)
Mutual labels:  text-classification, transformer, bert
Text-Classification-LSTMs-PyTorch
This repository shows a baseline model for text classification: an LSTM-based model implemented in PyTorch. To provide a better understanding of the model, a Tweets dataset provided by Kaggle is used.
Stars: ✭ 45 (-61.54%)
Mutual labels:  text-classification, baseline
sister
SImple SenTence EmbeddeR
Stars: ✭ 66 (-43.59%)
Mutual labels:  transformer, bert
NLP-paper
🎨🎨 NLP (natural language processing) tutorials 🎨🎨 https://dataxujing.github.io/NLP-paper/
Stars: ✭ 23 (-80.34%)
Mutual labels:  transformer, bert
protonet-bert-text-classification
Fine-tune BERT for small-dataset text classification in a few-shot learning manner using ProtoNet.
Stars: ✭ 28 (-76.07%)
Mutual labels:  text-classification, bert
GLUE-bert4keras
GLUE benchmark code based on bert4keras.
Stars: ✭ 59 (-49.57%)
Mutual labels:  baseline, bert
ERNIE-text-classification-pytorch
This repo contains a PyTorch implementation of a pretrained ERNIE model for text classification.
Stars: ✭ 49 (-58.12%)
Mutual labels:  text-classification, bert
Onnxt5
Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.
Stars: ✭ 143 (+22.22%)
Mutual labels:  text-classification, transformer
backprop
Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models.
Stars: ✭ 229 (+95.73%)
Mutual labels:  text-classification, bert
BERT-chinese-text-classification-pytorch
This repo contains a PyTorch implementation of a pretrained BERT model for text classification.
Stars: ✭ 92 (-21.37%)
Mutual labels:  text-classification, bert
FasterTransformer
Transformer related optimization, including BERT, GPT
Stars: ✭ 1,571 (+1242.74%)
Mutual labels:  transformer, bert
sticker2
Further developed as SyntaxDot: https://github.com/tensordot/syntaxdot
Stars: ✭ 14 (-88.03%)
Mutual labels:  transformer, bert
Paddlenlp
NLP Core Library and Model Zoo based on PaddlePaddle 2.0
Stars: ✭ 212 (+81.2%)
Mutual labels:  text-classification, transformer
vietnamese-roberta
A Robustly Optimized BERT Pretraining Approach for Vietnamese
Stars: ✭ 22 (-81.2%)
Mutual labels:  transformer, bert
Kashgari
Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.
Stars: ✭ 2,235 (+1810.26%)
Mutual labels:  text-classification, bert
les-military-mrc-rank7
LES Cup: Rank 7 solution for the 2nd national "Military Intelligence Machine Reading" challenge.
Stars: ✭ 37 (-68.38%)
Mutual labels:  transformer, bert
text-classification-baseline
Pipeline for quickly building TF-IDF + LogReg text-classification baselines.
Stars: ✭ 55 (-52.99%)
Mutual labels:  text-classification, baseline
Keras Textclassification
Chinese long-text classification, short-sentence classification, multi-label classification, and sentence-pair similarity with Keras NLP; base classes for embedding layers and network graphs; FastText, TextCNN, CharCNN, TextRNN, RCNN, DCNN, DPCNN, VDCNN, CRNN, Bert, Xlnet, Albert, Attention, DeepMoji, HAN, CapsuleNet, Transformer-encode, Seq2seq, SWEM, LEAM, TextGCN.
Stars: ✭ 914 (+681.2%)
Mutual labels:  text-classification, transformer
Nlp Experiments In Pytorch
PyTorch repository for text categorization and NER experiments in Turkish and English.
Stars: ✭ 35 (-70.09%)
Mutual labels:  text-classification, transformer

Kevinpro-NLP-Demo

Simple implementations of fun NLP algorithms in PyTorch. Actively updated and maintained.

If you have any questions, please open an Issue.

If this project helps you, a Star is welcome~ (please don't just fork without starring (´・ω・`))

Note: Parts of the code in this repository may originate from other open-source materials; it exists for my own interest and experimentation. It may contain bugs and references to other people's code.

README in Chinese

The main content

You can go into each project folder for more details; see the readme.md inside each folder.

  1. Text classification based on many models (BiLSTM, Transformer) go here
  2. Summary generation (Pointer-Generator Network) go here
  3. Dialogue translation (Seq2Seq) to build your own dialogue bot~~ go here
  4. Using GNNs for text classification go here
  5. Transformer masked-language-model pretraining go here
  6. GPT for text generation and GPT for math problems go here (Source Repo)
  7. Adversarial training (FGM) go here (see the sketch after this list)
  8. Very simple and quick use/deployment of a Seq2Seq Transformer, including several examples (denoise pretraining, medical question answering) go here
  9. Practical use of PyTorch Lightning go here
  10. AMP and fp16 training for PyTorch go here
  11. A useful visualization toolkit for attention maps (or other weighted matrices) go here
  12. Diffusion model implementation and application on Fashion-MNIST go here
  13. A simple taste of stable learning (under construction) go here
  14. A simple taste of meta learning (under construction) go here
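
For item 7 in the list above, here is a minimal sketch of the standard FGM recipe (perturb the word-embedding weights along the gradient, run a second backward pass, then restore the weights). This is the widely used recipe, not necessarily this repository's exact code, and `emb_name` is an assumed name filter for the embedding parameter:

```python
import torch

class FGM:
    """Fast Gradient Method for adversarial training on word embeddings."""

    def __init__(self, model, epsilon=1.0, emb_name="embedding"):
        self.model = model
        self.epsilon = epsilon
        self.emb_name = emb_name  # substring identifying the embedding parameter
        self.backup = {}

    def attack(self):
        """Add an L2-normalized gradient perturbation to the embedding weights."""
        for name, param in self.model.named_parameters():
            if param.requires_grad and self.emb_name in name:
                self.backup[name] = param.data.clone()
                norm = torch.norm(param.grad)
                if norm != 0 and not torch.isnan(norm):
                    param.data.add_(self.epsilon * param.grad / norm)

    def restore(self):
        """Restore the original embedding weights."""
        for name, param in self.model.named_parameters():
            if param.requires_grad and self.emb_name in name:
                param.data = self.backup[name]
        self.backup = {}

# Typical loop: loss.backward() -> fgm.attack() -> adversarial loss.backward()
# (gradients accumulate) -> fgm.restore() -> optimizer.step()
```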

My other open source NLP projects

  1. BERT in relation extraction: Ricardokevins/Bert-In-Relation-Extraction: relation extraction between entities using BERT (github.com)
  2. Text matching: Ricardokevins/Text_Matching: sentence-similarity matching for the ZTE NLP 2020 competition (github.com)
  3. Transformer implementation and a useful NLP toolkit: Ricardokevins/EasyTransformer: Quick start with strong baselines of BERT and Transformer without pretraining (github.com)

What's New ~~

2022.8.31

  1. Updated the Diffusion Model. We adopted the code from a tutorial and made the changes necessary to run it locally. The trained model and inference results can be found in the Diffusion/Result2 folder. A minimal sketch of the forward noising process follows.
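
A hedged sketch of the DDPM forward (noising) process that such tutorials typically implement; the step count and noise schedule below are assumptions, not this repository's exact settings:

```python
import torch

T = 300                                  # number of diffusion steps (assumption)
betas = torch.linspace(1e-4, 0.02, T)    # linear noise schedule (assumption)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def q_sample(x0, t, noise=None):
    """Closed-form forward process:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    if noise is None:
        noise = torch.randn_like(x0)
    ab = alphas_cumprod[t].view(-1, 1, 1, 1)       # per-sample alpha_bar_t
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise

# Training: a denoising network predicts `noise` from (x_t, t) under an MSE
# loss; sampling reverses the chain step by step from pure Gaussian noise.
```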

2022.3.25

  1. Thanks to @rattlesnakey's issue (more discussion here), I added a feature to the Pretrain project: set the attention weights of MASK tokens to zero so that MASK tokens cannot attend to each other. You can enable this feature in Transformer.py by setting "self.pretrain=True". PS: the new feature has not yet been verified, and its effect on pretraining is untested; I'll fill in the tests later. A hedged sketch of the idea follows.
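
A hedged sketch of what "zero attention weight between MASK tokens" can look like; the actual logic behind self.pretrain=True in Transformer.py may differ:

```python
import torch

def build_pretrain_attention_mask(input_ids, mask_token_id, pad_mask=None):
    """Return a boolean (batch, seq, seq) mask where True = attention allowed.
    Pairs in which both the query and the key are [MASK] tokens are blocked,
    except a token's attention to itself (so no row is fully masked out)."""
    is_mask = input_ids.eq(mask_token_id)                  # (batch, seq)
    blocked = is_mask.unsqueeze(2) & is_mask.unsqueeze(1)  # MASK attending MASK
    eye = torch.eye(input_ids.size(1), dtype=torch.bool, device=input_ids.device)
    allowed = ~(blocked & ~eye)
    if pad_mask is not None:                               # optional padding mask
        allowed = allowed & pad_mask
    return allowed  # converted to additive -inf scores before the softmax
```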

2022.1.28

  1. Rebuilt the code structure of the Transformer to make the code easier to use and deploy
  2. Added an example: denoise pretraining with the Transformer (easy to use)

2022.1.16

  1. Updated: use a Seq2Seq Transformer to model a medical QA task (tuned on 550k pairs of Chinese medical QA data). More details in the README.md of Transformer/MedQAdemo/
  2. Updated a new Trainer and useful tools
  3. Removed the previous Transformer implementation (it had some unfixable bugs)

Update History

2021.1.23

  1. Initial commit: added a sentence-classification module with Transformer, BiLSTM, and BiLSTM+Attention models
  2. Uploaded a basic dataset, with binary sentence classification as the demo example
  3. Added and applied the adversarial-learning approach

2021.5.1

  1. Reorganized and updated many things... (details omitted)

2021.6.22

  1. Fixed some organizational issues in Text Classification
  2. Added usage instructions for Text Classification

2021.7.2

  1. Added a hands-on MLM pretraining exercise (a generic masking sketch follows this list)
  2. Fixed the oversized and unnecessary word embedding in the sentence-classification models (out of laziness, only the Transformer's was fixed)
  3. Added an option in sentence classification to load pretrained weights
  4. Fixed some bugs
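
For item 1 above, a generic sketch of BERT-style MLM corruption (select 15% of tokens; replace 80% of them with [MASK], 10% with a random token, and keep 10% unchanged). This is the textbook recipe, not necessarily this repository's exact scheme:

```python
import torch

def mlm_corrupt(input_ids, mask_token_id, vocab_size, mlm_prob=0.15):
    """Corrupt `input_ids` in place and return (input_ids, labels); positions
    that are not selected get label -100 so the loss ignores them. In practice,
    special tokens ([CLS], [SEP], padding) should be excluded from selection."""
    labels = input_ids.clone()
    selected = torch.bernoulli(torch.full(input_ids.shape, mlm_prob)).bool()
    labels[~selected] = -100

    # 80% of selected positions -> [MASK]
    to_mask = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & selected
    input_ids[to_mask] = mask_token_id

    # half of the remaining 20% -> random token; the rest stay unchanged
    to_random = (torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool()
                 & selected & ~to_mask)
    input_ids[to_random] = torch.randint(vocab_size, input_ids.shape)[to_random]
    return input_ids, labels
```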

2021.7.11

  1. Added an application of GNNs in NLP
  2. Implemented GNNs for text classification
  3. The results are poor; for now I suspect a data-processing problem

2021.7.29

  1. Added the classical CHI + TF-IDF machine-learning approach to text classification (a sketch follows this list)
  2. Implemented and benchmarked the algorithm
  3. Updated the README
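
A small sketch of such a CHI + TF-IDF baseline using scikit-learn; the feature counts and the LogReg classifier below are assumptions, not this repository's exact configuration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# TF-IDF features -> chi-square feature selection -> linear classifier
clf = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=50000)),
    ("chi2", SelectKBest(chi2, k=5000)),
    ("logreg", LogisticRegression(max_iter=1000)),
])
# clf.fit(train_texts, train_labels); preds = clf.predict(test_texts)
```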

2021.8.2

  1. Refactored the dialogue-bot model into the Seq2Seq folder
  2. Implemented beam-search decoding (see the sketch after this list)
  3. Fixed the beam-search bug in the PGN
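
For item 2, a minimal beam-search sketch; the `step_fn` interface (token-id prefix in, next-token log-probs out) is an assumption standing in for this repository's decoder API:

```python
def beam_search(step_fn, bos_id, eos_id, beam_size=4, max_len=50):
    """step_fn(prefix: list[int]) -> 1-D torch tensor of log-probs over the vocab."""
    beams = [([bos_id], 0.0)]               # (sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos_id:           # hypothesis already complete
                finished.append((seq, score))
                continue
            topv, topi = step_fn(seq).topk(beam_size)
            for lp, tok in zip(topv.tolist(), topi.tolist()):
                candidates.append((seq + [tok], score + lp))
        if not candidates:                  # every beam has finished
            break
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    else:
        finished.extend(beams)              # out of budget: keep partial beams
    # best hypothesis under length normalization
    return max(finished, key=lambda c: c[1] / len(c[0]))[0]
```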

2021.9.11

  1. Added GPT for text continuation and for solving math problems (borrowed from karpathy/minGPT: A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training (github.com); the code is very well written and very helpful for understanding GPT, so I borrowed it to see whether it can power something fun)
  2. Refactoring the Pointer-Generator Network: its performance has been consistently poor, so I decided to rebuild it and walk through it line by line, which feels much more reassuring. Work in progress.

2021.9.16

  1. Fixed misaligned MASK tokens in Pretrain (inconsistent positions)

2021.9.29

  1. Added a random digit-string recovery demo to the Transformer; it is super beginner-friendly for understanding Transformers, needs no external data, and trains on randomly constructed digit strings (see the data sketch after this list)
  2. Added an experimental TransformerVAE; it currently has bugs and is under construction
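
For item 1, a sketch of how such self-contained training data can be generated; the vocabulary size and sequence lengths below are assumptions:

```python
import random

def make_digit_batch(batch_size=32, vocab=10, min_len=5, max_len=15):
    """Build (source, target) pairs for the digit-string recovery task:
    the target is simply the source sequence, so the model learns to
    reproduce randomly generated digit strings without any external data."""
    sources, targets = [], []
    for _ in range(batch_size):
        length = random.randint(min_len, max_len)
        seq = [random.randrange(vocab) for _ in range(length)]
        sources.append(seq)
        targets.append(list(seq))
    return sources, targets
```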

2021.11.20

  1. Updated BM25 and TF-IDF algorithms for quick text matching (a plain BM25 sketch follows).
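
A plain BM25 scorer over pre-tokenized documents, following the formula in the BM25 reference below; this repository's implementation may differ in details:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """score(q, d) = sum_t IDF(t) * tf(t, d) * (k1 + 1)
                     / (tf(t, d) + k1 * (1 - b + b * |d| / avgdl))"""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))      # document frequencies
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

# bm25_scores(["fever"], [["fever", "cough"], ["headache", "cough"]])
```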

2021.12.10

  1. Updated the practical use of PyTorch Lightning, using text classification as the example: converted the plain PyTorch loop to LightningLite. More details in LightingMain.py. (A hedged sketch follows this list.)
  2. Removed the redundant code
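
For item 1, a hedged sketch of a LightningLite conversion (the pytorch_lightning.lite API of the 1.5-era releases, later renamed Fabric); the model and dataloader here are placeholders, and the actual code lives in LightingMain.py:

```python
import torch
from pytorch_lightning.lite import LightningLite

class LiteTrainer(LightningLite):
    def run(self, epochs, model, optimizer, dataloader):
        # Lite takes over device placement and precision handling
        model, optimizer = self.setup(model, optimizer)
        dataloader = self.setup_dataloaders(dataloader)
        model.train()
        for _ in range(epochs):
            for x, y in dataloader:
                optimizer.zero_grad()
                loss = torch.nn.functional.cross_entropy(model(x), y)
                self.backward(loss)        # replaces loss.backward()
                optimizer.step()

# LiteTrainer(accelerator="gpu", devices=1).run(3, model, optimizer, dataloader)
```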

2021.12.9

  1. Updated the practical use of AMP (Automatic Mixed Precision), implemented in VAEGenerator and tested on a local MX150; it significantly improves training time and memory usage. More details in the comments at the end of the code. (A sketch follows this list.)
  2. Following AMP's guidance, changed the 1e-9 constant in model.py to inf
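
For item 1, a sketch of the standard torch.cuda.amp training pattern (autocast plus GradScaler); `model`, `optimizer`, and `dataloader` are assumed placeholders:

```python
import torch

scaler = torch.cuda.amp.GradScaler()      # scales the loss to avoid fp16 underflow

for x, y in dataloader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():       # ops run in fp16 where it is safe
        loss = torch.nn.functional.cross_entropy(model(x), y)
    scaler.scale(loss).backward()
    scaler.step(optimizer)                # unscales grads, skips step on inf/nan
    scaler.update()

# Likely the reason for item 2 (an assumption): in fp16, constants like 1e-9
# underflow to 0 and mask values like -1e9 overflow, so float("inf") is safer.
```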

2021.12.17

  1. Updated a weighted-matrix visualization toolkit (e.g. for visualizing attention maps), implemented in Visualize; more useful tools to come (a minimal sketch follows this list)
  2. Updated Python comment and code standards; more formal coding practices will be followed in the future
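
For item 1, a minimal heatmap helper of the kind such a toolkit provides (a generic matplotlib sketch, not the Visualize module's actual API):

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_weight_matrix(weights, x_labels=None, y_labels=None, path="attention.png"):
    """Render a weighted matrix (e.g. an attention map) as a heatmap."""
    weights = np.asarray(weights)
    fig, ax = plt.subplots()
    im = ax.imshow(weights, cmap="viridis")
    fig.colorbar(im, ax=ax)
    if x_labels is not None:
        ax.set_xticks(range(len(x_labels)))
        ax.set_xticklabels(x_labels, rotation=90)
    if y_labels is not None:
        ax.set_yticks(range(len(y_labels)))
        ax.set_yticklabels(y_labels)
    fig.tight_layout()
    fig.savefig(path, dpi=200)
    plt.close(fig)
```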

References

BM25

https://blog.csdn.net/chaojianmo/article/details/105143657

Automatic Mixed Precision (AMP)

https://featurize.cn/notebooks/368cbc81-2b27-4036-98a1-d77589b1f0c4
