
PaddlePaddle / PaddleNLP

License: Apache-2.0
NLP Core Library and Model Zoo based on PaddlePaddle 2.0

Programming Languages

python

Projects that are alternatives of or similar to PaddleNLP

Sockeye
Sequence-to-sequence framework with a focus on Neural Machine Translation based on Apache MXNet
Stars: ✭ 990 (+366.98%)
Mutual labels:  seq2seq, transformer
Multiturndialogzoo
Multi-turn dialogue baselines written in PyTorch
Stars: ✭ 106 (-50%)
Mutual labels:  seq2seq, transformer
Machine Translation
Stars: ✭ 51 (-75.94%)
Mutual labels:  seq2seq, transformer
Seq2seqchatbots
A wrapper around tensor2tensor to flexibly train, interact, and generate data for neural chatbots.
Stars: ✭ 466 (+119.81%)
Mutual labels:  seq2seq, transformer
Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518 (+1087.74%)
Mutual labels:  seq2seq, text-classification
Keras Textclassification
Chinese text classification with Keras: long-text and short-sentence classification, multi-label classification, and sentence-pair similarity; base classes for building embedding layers (character/word/sentence vectors) and network graphs; includes FastText, TextCNN, CharCNN, TextRNN, RCNN, DCNN, DPCNN, VDCNN, CRNN, BERT, XLNet, ALBERT, Attention, DeepMoji, HAN, CapsuleNet, Transformer-encoder, Seq2seq, SWEM, LEAM, TextGCN
Stars: ✭ 914 (+331.13%)
Mutual labels:  text-classification, transformer
Delta
DELTA is a deep learning based natural language and speech processing platform.
Stars: ✭ 1,479 (+597.64%)
Mutual labels:  text-classification, seq2seq
Bert Multitask Learning
BERT for Multitask Learning
Stars: ✭ 380 (+79.25%)
Mutual labels:  text-classification, transformer
Nlp pytorch project
Embedding, NMT, Text_Classification, Text_Generation, NER etc.
Stars: ✭ 153 (-27.83%)
Mutual labels:  text-classification, seq2seq
Onnxt5
Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.
Stars: ✭ 143 (-32.55%)
Mutual labels:  text-classification, transformer
Joeynmt
Minimalist NMT for educational purposes
Stars: ✭ 420 (+98.11%)
Mutual labels:  seq2seq, transformer
Tensorflow Ml Nlp
Natural language processing with TensorFlow and machine learning (from logistic regression to a Transformer chatbot)
Stars: ✭ 176 (-16.98%)
Mutual labels:  seq2seq, transformer
Neural sp
End-to-end ASR/LM implementation with PyTorch
Stars: ✭ 408 (+92.45%)
Mutual labels:  seq2seq, transformer
Nlp Experiments In Pytorch
PyTorch repository for text categorization and NER experiments in Turkish and English.
Stars: ✭ 35 (-83.49%)
Mutual labels:  text-classification, transformer
Nlp Tutorials
Simple implementations of NLP models. Tutorials are written in Chinese on my website https://mofanpy.com
Stars: ✭ 394 (+85.85%)
Mutual labels:  seq2seq, transformer
Asr
Stars: ✭ 54 (-74.53%)
Mutual labels:  seq2seq, transformer
Bert seq2seq
A PyTorch implementation of BERT for seq2seq tasks using the UniLM scheme; also supports automatic summarization, text classification, sentiment analysis, NER, and POS tagging, with GPT2-based text continuation.
Stars: ✭ 298 (+40.57%)
Mutual labels:  text-classification, seq2seq
Text Classification Models Pytorch
Implementation of State-of-the-art Text Classification Models in Pytorch
Stars: ✭ 379 (+78.77%)
Mutual labels:  seq2seq, transformer
Transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Stars: ✭ 55,742 (+26193.4%)
Mutual labels:  transformer, seq2seq
Kashgari
Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.
Stars: ✭ 2,235 (+954.25%)
Mutual labels:  text-classification, seq2seq

Simplified Chinese | English



Introduction

PaddleNLP 2.0 provides a model zoo covering a wide range of scenarios, a concise and easy-to-use end-to-end API, and high-performance distributed training that unifies dynamic and static graphs. It aims to help PaddlePaddle developers model text more efficiently and offers NLP best practices based on PaddlePaddle 2.0.

Features

Installation

Requirements

  • python >= 3.6
  • paddlepaddle >= 2.0.1

pip Installation

pip install --upgrade paddlenlp -i https://pypi.org/simple

For detailed instructions on installing PaddlePaddle and PaddleNLP, see Installation.
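
To confirm that the installation succeeded, a minimal sketch (only checks the version strings against the requirements listed above):

import paddle
import paddlenlp

# Both packages expose a version string; they should satisfy the requirements above.
print(paddle.__version__)     # expected >= 2.0.1
print(paddlenlp.__version__)  # expected >= 2.0.0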

Quick Start

Quick Dataset Loading

from paddlenlp.datasets import load_dataset

train_ds, dev_ds, test_ds = load_dataset("chnsenticorp", splits=["train", "dev", "test"])

See the Dataset documentation for more datasets.
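
Each split can be indexed like a regular Python sequence; a minimal sketch for inspecting the loaded data (it assumes chnsenticorp samples are dicts with text and label fields):

# Inspect the training split; each example is expected to be a dict
# with "text" and "label" keys (an assumption about the chnsenticorp schema).
print(len(train_ds))
print(train_ds[0])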

One-line Loading of Pretrained Chinese Word Embeddings

from paddlenlp.embeddings import TokenEmbedding

wordemb = TokenEmbedding("w2v.baidu_encyclopedia.target.word-word.dim300")
wordemb.cosine_sim("国王", "王后")
>>> 0.63395125
wordemb.cosine_sim("艺术", "火车")
>>> 0.14792643

More than 50 Chinese word embeddings are built in; see the Embedding documentation for more usage details.
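
Beyond similarity scores, the raw vectors can also be retrieved; a small sketch, assuming the search method of TokenEmbedding accepts a list of tokens:

# Look up the raw 300-dimensional vectors for a list of words
# (assumes TokenEmbedding.search returns a numpy array of shape [num_words, dim]).
vectors = wordemb.search(["国王", "王后"])
print(vectors.shape)  # expected: (2, 300) for the dim300 embedding loaded above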

One-line Loading of High-quality Chinese Pretrained Models

from paddlenlp.transformers import ErnieModel, BertModel, RobertaModel, ElectraModel, GPT2ForPretraining

ernie = ErnieModel.from_pretrained('ernie-1.0')
bert = BertModel.from_pretrained('bert-wwm-chinese')
roberta = RobertaModel.from_pretrained('roberta-wwm-ext')
electra = ElectraModel.from_pretrained('chinese-electra-small')
gpt2 = GPT2ForPretraining.from_pretrained('gpt2-base-cn')

Convenient Extraction of Text Features

import paddle
from paddlenlp.transformers import ErnieTokenizer, ErnieModel

tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')
model = ErnieModel.from_pretrained('ernie-1.0')

text = tokenizer('自然语言处理')
sequence_output, pooled_output = model(input_ids=paddle.to_tensor([text['input_ids']]))

See the Transformer API documentation for the currently supported pretrained models.
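
The same pretrained weights can also be loaded into a task-specific head for fine-tuning; a minimal sketch, assuming ErnieForSequenceClassification and a two-class sentiment task (num_classes=2 is an assumption, not a fixed default):

import paddle
from paddlenlp.transformers import ErnieForSequenceClassification, ErnieTokenizer

# Load ERNIE with a sequence-classification head; 2 classes is assumed here
# for a binary sentiment task such as chnsenticorp.
tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')
model = ErnieForSequenceClassification.from_pretrained('ernie-1.0', num_classes=2)

inputs = tokenizer('这家餐厅的服务很好')
logits = model(input_ids=paddle.to_tensor([inputs['input_ids']]))
print(logits.shape)  # expected: [1, 2]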

Model Zoo and Applications

For an overall introduction to the PaddleNLP model zoo, see the PaddleNLP Model Zoo documentation. For model application scenarios, see PaddleNLP Examples.

Advanced Applications

API Documentation

  • Transformer API
    • APIs for Transformer-based pretrained models, covering mainstream architectures such as ERNIE, BERT, RoBERTa, and Electra, together with their downstream tasks.
  • Data API
    • APIs for the text data processing pipeline (a short usage sketch follows this list).
  • Dataset API
    • Dataset-related APIs, covering custom datasets, dataset contribution, and quick dataset loading.
  • Embedding API
    • Word embedding APIs, supporting one-line loading of pretrained Chinese word embeddings and high-dimensional visualization with VisualDL.
  • Metrics API
    • Evaluation metrics for NLP scenarios, compatible with the high-level API of the PaddlePaddle 2.0 framework.
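
As referenced in the Data API entry above, batching utilities from paddlenlp.data can be composed into a batchify function; a minimal sketch (pad_val=0 is an assumption about the tokenizer's padding token id):

from paddlenlp.data import Pad, Stack, Tuple

# Compose a batchify function: pad token-id lists to equal length, stack the labels.
batchify_fn = Tuple(
    Pad(axis=0, pad_val=0),   # input_ids
    Stack(dtype="int64"),     # labels
)

samples = [([1, 2, 3], 0), ([4, 5], 1)]
input_ids, labels = batchify_fn(samples)
print(input_ids)  # [[1 2 3] [4 5 0]]
print(labels)     # [0 1]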

Interactive Notebook Tutorials

See PaddleNLP on AI Studio for more tutorials.

Community Contributions and Technical Exchange

Special Interest Groups (SIG)

  • You are welcome to join the PaddleNLP SIG community and contribute high-quality model implementations, public datasets, tutorials, examples, and auxiliary tools.

QQ

  • Join the PaddleNLP QQ technical exchange group to discuss NLP together!

Slack

License

PaddleNLP is released under the Apache-2.0 open source license.
