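Chinese long-text classification, short-sentence classification, multi-label classification, and sentence-pair similarity (Chinese Text Classification of Keras NLP, multi-label classify, or sentence classify, long or short); base classes for building character, word, and sentence embedding layers (embeddings) and network layers (graph); FastText, TextCNN, CharCNN, TextRNN, RCNN, DCNN, DPCNN, VDCNN, CRNN, Bert, Xlnet, Albert, Attention, DeepMoji, HAN, capsule network (CapsuleNet), Transformer-encode, Seq2seq, SWEM, LEAM, TextGCN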

Stars: ✭ 914 (+331.13%)

Mutual labels: text-classification, transformer

Delta

DELTA is a deep learning based natural language and speech processing platform.

Stars: ✭ 1,479 (+597.64%)

Mutual labels: text-classification, seq2seq

Bert Multitask Learning

BERT for Multitask Learning

Stars: ✭ 380 (+79.25%)

Mutual labels: text-classification, transformer

Nlp pytorch project

Embedding, NMT, Text_Classification, Text_Generation, NER etc.

Stars: ✭ 153 (-27.83%)

Mutual labels: text-classification, seq2seq

Onnxt5

Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.

Stars: ✭ 143 (-32.55%)

Mutual labels: text-classification, transformer

Joeynmt

Minimalist NMT for educational purposes

Stars: ✭ 420 (+98.11%)

Mutual labels: seq2seq, transformer

Tensorflow Ml Nlp

Natural language processing starting with TensorFlow and machine learning (from logistic regression to a Transformer chatbot)

Stars: ✭ 176 (-16.98%)

Mutual labels: seq2seq, transformer

Neural sp

End-to-end ASR/LM implementation with PyTorch

Stars: ✭ 408 (+92.45%)

Mutual labels: seq2seq, transformer

Nlp Experiments In Pytorch

PyTorch repository for text categorization and NER experiments in Turkish and English.

Stars: ✭ 35 (-83.49%)

Mutual labels: text-classification, transformer

Nlp Tutorials

Simple implementations of NLP models. Tutorials are written in Chinese on my website https://mofanpy.com

Stars: ✭ 394 (+85.85%)

Mutual labels: seq2seq, transformer

Asr

Stars: ✭ 54 (-74.53%)

Mutual labels: seq2seq, transformer

Bert seq2seq

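A PyTorch implementation of BERT for seq2seq tasks using the UniLM scheme. It now also handles automatic summarization, text classification, sentiment analysis, NER, part-of-speech tagging, and similar tasks, and supports GPT2 for article continuation.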

Stars: ✭ 298 (+40.57%)

Mutual labels: text-classification, seq2seq

Text Classification Models Pytorch

Implementation of State-of-the-art Text Classification Models in Pytorch

Stars: ✭ 379 (+78.77%)

Mutual labels: seq2seq, transformer

Transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Stars: ✭ 55,742 (+26193.4%)

Mutual labels: transformer, seq2seq

Kashgari

Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.

Stars: ✭ 2,235 (+954.25%)

Mutual labels: text-classification, seq2seq

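Introduction

PaddleNLP 2.0 provides a model library covering many scenarios, a concise and easy-to-use end-to-end API, and high-performance distributed training that unifies dynamic and static graphs. It aims to make text modeling more efficient for PaddlePaddle developers and to offer best practices for NLP based on PaddlePaddle 2.0.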

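Features

A model library covering many scenarios
- PaddleNLP integrates mainstream model architectures such as RNN and Transformer. It spans basic NLP techniques (word embeddings, lexical analysis, named entity recognition, semantic representation) and core NLP techniques (text classification, text matching, text generation, text graph learning, information extraction). It also provides core components and pretrained models for system applications such as machine translation, general dialogue, and reading comprehension. For details, see the PaddleNLP application examples.
Concise, easy-to-use end-to-end API
- Deeply compatible with the Paddle 2.0 high-level API, with built-in reusable text-modeling modules (Embedding, CRF, Seq2Vec, Transformer). This greatly reduces the development effort in data processing, model building, training and evaluation, and inference deployment, and speeds up iteration and deployment of NLP tasks.
High-performance distributed training with unified dynamic and static graphs
- Building on the unified dynamic/static graph feature of the Paddle 2.0 core framework and its leading mixed-precision optimization strategy, combined with the Fleet distributed training API, PaddleNLP can make full use of GPU cluster resources to train large-scale pretrained models efficiently in a distributed setting.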

Installation

Requirements

python >= 3.6
paddlepaddle >= 2.0.1

Install with pip

pip install --upgrade paddlenlp -i https://pypi.org/simple

For detailed tutorials on installing PaddlePaddle and PaddleNLP, see Installation.
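
Before installing, it can help to confirm that an environment satisfies the minimum versions listed above. The following is a minimal sketch using only the standard library; `parse_version` and `meets_minimum` are hypothetical helper names introduced here for illustration, and the parsing is deliberately simple (it ignores suffixes such as `rc0`).

```python
import sys


def parse_version(text):
    """Turn a dotted version string such as '2.0.1' into a tuple of ints."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, required):
    """Return True when `installed` is at least `required`."""
    return parse_version(installed) >= parse_version(required)


# Interpreter check against the stated minimum (python >= 3.6).
python_ok = sys.version_info >= (3, 6)
print(python_ok)

# Example comparisons against the stated minimum (paddlepaddle >= 2.0.1).
print(meets_minimum("2.0.1", "2.0.1"))     # True
print(meets_minimum("2.0.0", "2.0.1"))     # False
print(meets_minimum("2.1.0rc0", "2.0.1"))  # True
```

In a real environment the installed version string would typically come from `paddle.__version__` rather than a literal.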

Quick Start

Quick dataset loading

from paddlenlp.datasets import load_dataset

train_ds, dev_ds, test_ds = load_dataset("chnsenticorp", splits=["train", "dev", "test"])

See the Dataset documentation for more datasets.

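Load pretrained Chinese word embeddings in one line

from paddlenlp.embeddings import TokenEmbedding

wordemb = TokenEmbedding("w2v.baidu_encyclopedia.target.word-word.dim300")
# "国王" (king) vs. "王后" (queen)
print(wordemb.cosine_sim("国王", "王后"))
# 0.63395125
# "艺术" (art) vs. "火车" (train)
print(wordemb.cosine_sim("艺术", "火车"))
# 0.14792643

More than 50 Chinese word embeddings are built in. See the Embedding documentation for more usage.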
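
To make the `cosine_sim` scores above easier to interpret, here is a minimal pure-Python sketch of cosine similarity. It assumes `cosine_sim` computes the standard cosine of the angle between two word vectors; the function name `cosine_similarity` and the 3-dimensional toy vectors are made up for illustration and are not real embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy vectors (assumed values, chosen so related words point the same way).
king = [0.9, 0.8, 0.1]
queen = [0.8, 0.9, 0.2]
train = [0.1, 0.0, 0.9]

# Related words score closer to 1, unrelated words closer to 0.
print(cosine_similarity(king, queen) > cosine_similarity(king, train))  # True
```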

Load high-quality pretrained Chinese models in one line

from paddlenlp.transformers import ErnieModel, BertModel, RobertaModel, ElectraModel, GPT2ForPretraining

ernie = ErnieModel.from_pretrained('ernie-1.0')
bert = BertModel.from_pretrained('bert-wwm-chinese')
roberta = RobertaModel.from_pretrained('roberta-wwm-ext')
electra = ElectraModel.from_pretrained('chinese-electra-small')
gpt2 = GPT2ForPretraining.from_pretrained('gpt2-base-cn')

Convenient text feature extraction

import paddle
from paddlenlp.transformers import ErnieTokenizer, ErnieModel

tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')
model = ErnieModel.from_pretrained('ernie-1.0')

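text = tokenizer('自然语言处理')  # "natural language processing"
# ErnieModel returns (sequence_output, pooled_output), in that order
sequence_output, pooled_output = model(input_ids=paddle.to_tensor([text['input_ids']]))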

See the Transformer API documentation for the currently supported pretrained models.
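
The snippet above wraps a single `input_ids` list in an outer list to form a batch of one. With several texts, the id lists usually differ in length and have to be padded to a common length before they can form one rectangular tensor. The following is a minimal pure-Python sketch of that step; `pad_batch`, the default `pad_id=0`, and the token ids are illustrative assumptions, not PaddleNLP API or real tokenizer output. The right padding id depends on the tokenizer.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad lists of token ids to a common length.

    Returns the padded ids plus a 0/1 mask marking the real tokens.
    """
    max_len = max(len(seq) for seq in sequences)
    padded, mask = [], []
    for seq in sequences:
        gap = max_len - len(seq)
        padded.append(list(seq) + [pad_id] * gap)
        mask.append([1] * len(seq) + [0] * gap)
    return padded, mask


# Made-up token ids standing in for tokenizer(...)['input_ids'].
batch = [[1, 647, 187, 2], [1, 405, 2]]
padded, mask = pad_batch(batch)
print(padded)  # [[1, 647, 187, 2], [1, 405, 2, 0]]
print(mask)    # [[1, 1, 1, 1], [1, 1, 1, 0]]
```

Because the padded nested list is rectangular, it could be passed to `paddle.to_tensor` in the same way as the single-text example.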

Model Zoo and Applications

For an overview of the PaddleNLP model library, see PaddleNLP Model Zoo. For application scenarios, see PaddleNLP Examples.

Advanced Applications

Model compression

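API Documentation

Transformer API
- APIs for pretrained models based on the Transformer architecture, covering mainstream architectures such as ERNIE, BERT, RoBERTa, and Electra, along with downstream tasks.
Data API
- API reference for the text data processing pipeline.
Dataset API
- Dataset APIs, covering custom datasets, dataset contribution, and quick dataset loading.
Embedding API
- Word embedding APIs, covering one-line loading of pretrained Chinese word embeddings and high-dimensional visualization with VisualDL.
Metrics API
- Evaluation metrics for NLP scenarios, compatible with the Paddle 2.0 high-level API.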

Interactive Notebook Tutorials

For more tutorials, see PaddleNLP on AI Studio.

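Community Contributions and Technical Exchange

Special Interest Group

You are welcome to join the PaddleNLP SIG community and contribute model implementations, public datasets, tutorials and case studies, and peripheral tools.

QQ

Join the PaddleNLP QQ technical discussion group to talk about NLP techniques.

Slack

You are welcome to join the PaddleNLP Slack channel to discuss technical topics with our developers.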

License

PaddleNLP is released under the Apache-2.0 open-source license.

Stars: ✭ 212
