PaddlePaddle / Paddlenlp
Licence: apache-2.0
NLP Core Library and Model Zoo based on PaddlePaddle 2.0
Stars: ✭ 212
Programming Languages
python
Projects that are alternatives of or similar to Paddlenlp
Sockeye
Sequence-to-sequence framework with a focus on Neural Machine Translation based on Apache MXNet
Stars: ✭ 990 (+366.98%)
Mutual labels: seq2seq, transformer
Multiturndialogzoo
Multi-turn dialogue baselines written in PyTorch
Stars: ✭ 106 (-50%)
Mutual labels: seq2seq, transformer
Seq2seqchatbots
A wrapper around tensor2tensor to flexibly train, interact, and generate data for neural chatbots.
Stars: ✭ 466 (+119.81%)
Mutual labels: seq2seq, transformer
Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518 (+1087.74%)
Mutual labels: seq2seq, text-classification
Keras Textclassification
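Chinese long-text classification, short-sentence classification, multi-label classification, and sentence-pair similarity (Chinese Text Classification of Keras NLP, multi-label classify, or sentence classify, long or short); base classes for building character, word, and sentence embedding layers (embeddings) and network layers (graph); FastText, TextCNN, CharCNN, TextRNN, RCNN, DCNN, DPCNN, VDCNN, CRNN, Bert, Xlnet, Albert, Attention, DeepMoji, HAN, CapsuleNet, Transformer-encode, Seq2seq, SWEM, LEAM, TextGCN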
Stars: ✭ 914 (+331.13%)
Mutual labels: text-classification, transformer
Delta
DELTA is a deep learning based natural language and speech processing platform.
Stars: ✭ 1,479 (+597.64%)
Mutual labels: text-classification, seq2seq
Bert Multitask Learning
BERT for Multitask Learning
Stars: ✭ 380 (+79.25%)
Mutual labels: text-classification, transformer
Nlp pytorch project
Embedding, NMT, Text_Classification, Text_Generation, NER etc.
Stars: ✭ 153 (-27.83%)
Mutual labels: text-classification, seq2seq
Onnxt5
Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.
Stars: ✭ 143 (-32.55%)
Mutual labels: text-classification, transformer
Joeynmt
Minimalist NMT for educational purposes
Stars: ✭ 420 (+98.11%)
Mutual labels: seq2seq, transformer
Tensorflow Ml Nlp
Natural language processing starting with TensorFlow and machine learning (from logistic regression to a Transformer chatbot)
Stars: ✭ 176 (-16.98%)
Mutual labels: seq2seq, transformer
Neural sp
End-to-end ASR/LM implementation with PyTorch
Stars: ✭ 408 (+92.45%)
Mutual labels: seq2seq, transformer
Nlp Experiments In Pytorch
PyTorch repository for text categorization and NER experiments in Turkish and English.
Stars: ✭ 35 (-83.49%)
Mutual labels: text-classification, transformer
Nlp Tutorials
Simple implementations of NLP models. Tutorials are written in Chinese on my website https://mofanpy.com
Stars: ✭ 394 (+85.85%)
Mutual labels: seq2seq, transformer
Bert seq2seq
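PyTorch implementation of BERT for seq2seq tasks using the UniLM scheme. It now also handles automatic summarization, text classification, sentiment analysis, NER, part-of-speech tagging, and other tasks, and supports GPT2 for article continuation.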
Stars: ✭ 298 (+40.57%)
Mutual labels: text-classification, seq2seq
Text Classification Models Pytorch
Implementation of State-of-the-art Text Classification Models in Pytorch
Stars: ✭ 379 (+78.77%)
Mutual labels: seq2seq, transformer
Transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Stars: ✭ 55,742 (+26193.4%)
Mutual labels: transformer, seq2seq
Kashgari
Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.
Stars: ✭ 2,235 (+954.25%)
Mutual labels: text-classification, seq2seq
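Introduction
PaddleNLP 2.0 offers a model zoo covering multiple scenarios, a concise and easy-to-use end-to-end API, and high-performance distributed training with unified dynamic and static graphs. It aims to make text modeling more efficient for PaddlePaddle developers and to provide best practices for NLP based on PaddlePaddle 2.0.
Features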
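- A model zoo covering multiple scenarios
- A concise, easy-to-use end-to-end API
  - Deeply compatible with the PaddlePaddle 2.0 high-level API, with built-in reusable text-modeling modules (Embedding, CRF, Seq2Vec, Transformer). This greatly reduces the development effort in data processing, model construction, training and evaluation, and inference deployment, and speeds up iteration and delivery of NLP tasks.
- High-performance distributed training with unified dynamic and static graphs
  - Built on the unified dynamic and static graph design of the PaddlePaddle 2.0 core framework and its mixed-precision optimization strategy, and combined with the Fleet distributed training API, it makes full use of GPU cluster resources to train large-scale pretrained models efficiently in a distributed setting.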
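The Seq2Vec entry in the list of reusable modules above refers to encoders that collapse a sequence of token vectors into one fixed-size vector. The following is a minimal sketch of that idea in plain Python, assuming a bag-of-words style sum over non-padding tokens. `bow_encode` and the toy vectors are hypothetical names and data for illustration, not part of the PaddleNLP API.

```python
def bow_encode(token_vectors, mask):
    """Sum the vectors of non-padding tokens into one fixed-size vector.

    token_vectors: list of equal-length lists of floats, one per token.
    mask: list of 0/1 flags, 1 for a real token and 0 for padding.
    """
    dim = len(token_vectors[0])
    pooled = [0.0] * dim
    for vector, keep in zip(token_vectors, mask):
        if keep:
            for i, value in enumerate(vector):
                pooled[i] += value
    return pooled


# Three token positions with 2-dimensional toy vectors; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
sentence_vector = bow_encode(tokens, mask)
print(sentence_vector)
```

The resulting vector has the same size regardless of sentence length, which is what lets a plain classifier layer sit on top of it.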
Installation
Requirements
- python >= 3.6
- paddlepaddle >= 2.0.1
Install with pip
pip install --upgrade paddlenlp -i https://pypi.org/simple
For detailed instructions on installing PaddlePaddle and PaddleNLP, see Installation.
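The version requirements listed above can be checked before running any example. This is a small sketch rather than an official tool: `parse_version`, `meets_minimum`, and `installed_version` are hypothetical helpers, the sketch uses `importlib.metadata` (which needs Python 3.8 or later), and it assumes the CPU build is installed under the distribution name `paddlepaddle`.

```python
import re
import sys
from importlib import metadata


def parse_version(version):
    """Turn '2.0.1' or '2.1.0rc0' into a tuple of ints such as (2, 0, 1)."""
    numbers = []
    for part in version.split(".")[:3]:
        match = re.match(r"\d+", part)
        numbers.append(int(match.group()) if match else 0)
    # Pad so that '2.0' compares equal to '2.0.0'.
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


python_ok = sys.version_info >= (3, 6)
# Assumption: the CPU build; the GPU build is published as 'paddlepaddle-gpu'.
paddle_version = installed_version("paddlepaddle")
paddle_ok = paddle_version is not None and meets_minimum(paddle_version, "2.0.1")
print("python ok:", python_ok)
print("paddlepaddle version:", paddle_version, "ok:", paddle_ok)
```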
Quick Start
Load datasets quickly
from paddlenlp.datasets import load_dataset
train_ds, dev_ds, test_ds = load_dataset("chnsenticorp", splits=["train", "dev", "test"])
See the Dataset documentation for more datasets.
Load pretrained Chinese word embeddings in one line
from paddlenlp.embeddings import TokenEmbedding
wordemb = TokenEmbedding("w2v.baidu_encyclopedia.target.word-word.dim300")
print(wordemb.cosine_sim("国王", "王后"))  # "king" vs. "queen"
>>> 0.63395125
print(wordemb.cosine_sim("艺术", "火车"))  # "art" vs. "train"
>>> 0.14792643
More than 50 Chinese word embeddings are built in. See the Embedding documentation for further usage.
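The scores printed by `cosine_sim` above are cosine similarities between two word vectors. The sketch below shows the standard formula in plain Python so the numbers are easier to interpret. It is an illustration, not the library's implementation, and the three-dimensional vectors are made-up toy data rather than real embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy vectors (hypothetical): related words point in similar directions.
vec_a = [0.8, 0.1, 0.3]
vec_b = [0.7, 0.2, 0.4]
vec_c = [-0.1, 0.9, -0.2]
print(cosine_similarity(vec_a, vec_b))  # close to 1: similar direction
print(cosine_similarity(vec_a, vec_c))  # near 0 or negative: unrelated direction
```

A value near 1 means the vectors point the same way, 0 means they are orthogonal, and the length of the vectors does not affect the score.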
Load high-quality Chinese pretrained models in one line
from paddlenlp.transformers import ErnieModel, BertModel, RobertaModel, ElectraModel, GPT2ForPretraining
ernie = ErnieModel.from_pretrained('ernie-1.0')
bert = BertModel.from_pretrained('bert-wwm-chinese')
roberta = RobertaModel.from_pretrained('roberta-wwm-ext')
electra = ElectraModel.from_pretrained('chinese-electra-small')
gpt2 = GPT2ForPretraining.from_pretrained('gpt2-base-cn')
Extract text features easily
import paddle
from paddlenlp.transformers import ErnieTokenizer, ErnieModel
tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')
model = ErnieModel.from_pretrained('ernie-1.0')
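text = tokenizer('自然语言处理')  # "natural language processing"
# ErnieModel returns the per-token sequence output first, then the pooled output.
sequence_output, pooled_output = model(input_ids=paddle.to_tensor([text['input_ids']]))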
See the Transformer API documentation for the currently supported pretrained models.
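The feature-extraction call above wraps a single list of token ids in an outer list to form a batch of one. With several texts of different lengths, the id lists have to be padded to a common length before they can become one tensor. The sketch below shows that step in plain Python. `pad_batch` is a hypothetical helper, the token ids are invented, and the padding id of 0 is an assumption that should be checked against the tokenizer in use.

```python
def pad_batch(sequences, pad_id=0):
    """Pad token-id lists to the longest length and build an attention mask.

    Returns (padded_ids, mask), where mask holds 1 for real tokens
    and 0 for padding positions.
    """
    max_len = max(len(seq) for seq in sequences)
    padded_ids = []
    mask = []
    for seq in sequences:
        pad_count = max_len - len(seq)
        padded_ids.append(list(seq) + [pad_id] * pad_count)
        mask.append([1] * len(seq) + [0] * pad_count)
    return padded_ids, mask


# Invented token ids standing in for two tokenized sentences.
batch = [[1, 345, 678, 2], [1, 99, 2]]
padded_ids, mask = pad_batch(batch)
print(padded_ids)
print(mask)
```

The rectangular `padded_ids` list can then be passed to `paddle.to_tensor` in the same way as the single-sentence example.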
Model Zoo and Applications
For an overview of the PaddleNLP model zoo, see PaddleNLP Model Zoo. For application scenarios, see PaddleNLP Examples.
Advanced Usage
API Documentation
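- Transformer API
  - APIs for Transformer-based pretrained models, covering mainstream architectures such as ERNIE, BERT, RoBERTa, and Electra, along with downstream tasks.
- Data API
  - APIs for the text data processing pipeline.
- Dataset API
  - Dataset APIs, covering custom datasets, dataset contribution, and quick dataset loading.
- Embedding API
  - Word-embedding APIs, covering one-line loading of pretrained Chinese word embeddings, high-dimensional visualization with VisualDL, and more.
- Metrics API
  - Evaluation metrics for NLP scenarios, compatible with the PaddlePaddle 2.0 high-level API.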
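To make the Metrics API entry above more concrete, here is a sketch of a streaming metric written in plain Python. It borrows the `reset` / `update` / `accumulate` / `name` method names used by `paddle.metric.Metric`, but it does not subclass it, so it runs without Paddle installed. `BinaryF1` is a hypothetical class for illustration and is not one of the metrics shipped with PaddleNLP.

```python
class BinaryF1:
    """Accumulates precision, recall, and F1 for 0/1 labels across batches."""

    def __init__(self):
        self.reset()

    def reset(self):
        """Clear all counts, for example at the start of an evaluation pass."""
        self.tp = 0  # predicted 1, label 1
        self.fp = 0  # predicted 1, label 0
        self.fn = 0  # predicted 0, label 1

    def update(self, predictions, labels):
        """Add one batch of predictions and gold labels to the counts."""
        for pred, label in zip(predictions, labels):
            if pred == 1 and label == 1:
                self.tp += 1
            elif pred == 1 and label == 0:
                self.fp += 1
            elif pred == 0 and label == 1:
                self.fn += 1

    def accumulate(self):
        """Return (precision, recall, f1) over everything seen since reset."""
        predicted = self.tp + self.fp
        actual = self.tp + self.fn
        precision = self.tp / predicted if predicted else 0.0
        recall = self.tp / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        return precision, recall, f1

    def name(self):
        return "binary_f1"


metric = BinaryF1()
metric.update([1, 1, 0, 0], [1, 0, 1, 0])  # first batch
metric.update([1], [1])                    # second batch
print(metric.name(), metric.accumulate())
```

Keeping raw counts and computing the ratios only in `accumulate` gives the same result no matter how the evaluation data is split into batches.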
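Interactive Notebook Tutorials
- Sentence sentiment classification with the Seq2Vec module
- How to fine-tune downstream tasks with pretrained models
- Extracting information from express waybills with a BiGRU-CRF model
- Improving express waybill information extraction with the pretrained ERNIE model
- Automatic couplet generation with a Seq2Seq model
- Poetry writing with the pretrained ERNIE-GEN model
- Forecasting COVID-19 case counts with a TCN network
For more tutorials, see PaddleNLP on AI Studio.
Community Contribution and Technical Exchange
Special Interest Group
- You are welcome to join the PaddleNLP SIG community and contribute model implementations, public datasets, tutorials and case studies, and supporting tools.
- Join the PaddleNLP QQ technical discussion group to talk about NLP.
Slack
- You are welcome to join the PaddleNLP Slack channel to discuss technical topics with our developers.
License
PaddleNLP is released under the Apache-2.0 license.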