
gmftbyGMFTBY / OpenDialog

License: MIT License
An Open-Source Package for Chinese Open-domain Conversational Chatbot (Chinese chitchat dialogue system; one-click deployment of a WeChat chitchat bot)

Programming Languages

python
shell

Projects that are alternatives to or similar to OpenDialog

Clue
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Stars: ✭ 2,425 (+2479.79%)
Mutual labels:  corpus, transformers, chinese, bert
Text-Summarization
Abstractive and Extractive Text summarization using Transformers.
Stars: ✭ 38 (-59.57%)
Mutual labels:  transformers, bert, gpt2
Roberta zh
RoBERTa for Chinese: Chinese pre-trained RoBERTa models
Stars: ✭ 1,953 (+1977.66%)
Mutual labels:  chinese, bert, gpt2
Nlp chinese corpus
Large Scale Chinese Corpus for NLP
Stars: ✭ 6,656 (+6980.85%)
Mutual labels:  corpus, chinese, bert
CBLUE
CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark
Stars: ✭ 379 (+303.19%)
Mutual labels:  corpus, chinese
COVID-19-Tweet-Classification-using-Roberta-and-Bert-Simple-Transformers
Rank 1 / 216
Stars: ✭ 24 (-74.47%)
Mutual labels:  transformers, bert
Text and Audio classification with Bert
Text Classification in Turkish Texts with Bert
Stars: ✭ 34 (-63.83%)
Mutual labels:  transformers, bert
ParsBigBird
Persian Bert For Long-Range Sequences
Stars: ✭ 58 (-38.3%)
Mutual labels:  transformers, bert
CLUEmotionAnalysis2020
CLUE Emotion Analysis Dataset (fine-grained sentiment analysis dataset)
Stars: ✭ 3 (-96.81%)
Mutual labels:  corpus, chinese
text2class
Multi-class text categorization using state-of-the-art pre-trained contextualized language models, e.g. BERT
Stars: ✭ 15 (-84.04%)
Mutual labels:  transformers, bert
robo-vln
Pytorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"
Stars: ✭ 34 (-63.83%)
Mutual labels:  transformers, bert
bert-squeeze
🛠️ Tools for Transformers compression using PyTorch Lightning ⚡
Stars: ✭ 56 (-40.43%)
Mutual labels:  transformers, bert
TV4Dialog
No description or website provided.
Stars: ✭ 33 (-64.89%)
Mutual labels:  corpus, chinese
Pytorch-NLU
Pytorch-NLU, a Chinese text classification and sequence labeling toolkit. It supports multi-class and multi-label classification of Chinese long and short texts, and sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, and word segmentation.
Stars: ✭ 151 (+60.64%)
Mutual labels:  transformers, bert
label-studio-transformers
Label data using HuggingFace's transformers and automatically get a prediction service
Stars: ✭ 117 (+24.47%)
Mutual labels:  transformers, bert
GoEmotions-pytorch
Pytorch Implementation of GoEmotions 😍😢😱
Stars: ✭ 95 (+1.06%)
Mutual labels:  transformers, bert
golgotha
Contextualised Embeddings and Language Modelling using BERT and friends, in R
Stars: ✭ 39 (-58.51%)
Mutual labels:  transformers, bert
LightLM
Evaluation of lightweight high-performance models. Shared Tasks in NLPCC 2020, Task 1: Light Pre-Training Chinese Language Model for NLP Tasks
Stars: ✭ 54 (-42.55%)
Mutual labels:  chinese, bert
classy
classy is a simple-to-use library for building high-performance Machine Learning models in NLP.
Stars: ✭ 61 (-35.11%)
Mutual labels:  transformers, bert
Transformers-Tutorials
This repository contains demos I made with the Transformers library by HuggingFace.
Stars: ✭ 2,828 (+2908.51%)
Mutual labels:  transformers, bert

OpenDialog

A test interface is now available: search for the WeChat official account OpenDialog to try it.

OpenDialog is built on top of the PyTorch-based transformers library. It provides a collection of transformer-based Chinese open-domain dialogue (chitchat) models, gathers existing data resources, and keeps adding datasets for Chinese dialogue systems, with the goal of building an open-source Chinese chitchat dialogue platform.
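
As a rough illustration of what a transformer-based generative chitchat model does, the sketch below generates a reply with the HuggingFace transformers library. The checkpoint name your-chinese-dialogue-gpt2 is a placeholder rather than a model shipped by OpenDialog; the actual models and decoding settings live under models/.

    # Sketch only: generate a chitchat reply with a GPT-2 style Chinese dialogue model.
    # "your-chinese-dialogue-gpt2" is a placeholder checkpoint, not part of OpenDialog.
    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("your-chinese-dialogue-gpt2")
    model = AutoModelForCausalLM.from_pretrained("your-chinese-dialogue-gpt2")
    model.eval()

    context = "你好,今天天气怎么样?"          # dialogue context (user utterance)
    inputs = tokenizer(context, return_tensors="pt")

    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=32,   # keep the reply short
            do_sample=True,      # sampling-based decoding for chitchat diversity
            top_k=50,
            top_p=0.95,
        )

    # Strip the context tokens and decode only the generated reply.
    reply = tokenizer.decode(output_ids[0, inputs["input_ids"].shape[1]:],
                             skip_special_tokens=True)
    print(reply)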

Latest updates:

  • 2020.8.20: Added an API for the LCCC-GPT-Large generative open-domain pre-trained model; run the command below to start the corresponding service

    ./run_flask.sh lccc <gpu_id>
  • 2020.10.26: Added a set of bi-encoder retrieval-based dialogue models (bert-bi-encoder, polyencoder, etc.)

  • ...

Tutorial

1. Project Structure and Files

Core OpenDialog files and directories:

  • data: datasets, configuration files, vocabularies, word embeddings, and dataset preprocessing scripts
  • models: dialogue models
  • metrics: evaluation metrics
  • multiview: multi-view re-ranking model that re-ranks the retrieved candidate responses
  • ckpt: trained model checkpoints
  • rest: tensorboard logs and the result files generated during the test stage
  • utils: utility functions
  • dataloader.py: dataset loading script
  • main.py: main entry point
  • header.py: packages that need to be imported
  • eval.py: evaluation script that calls the metrics in metrics to score the generated files under rest
  • run.sh: batch script for training and testing
  • run_flask.sh: loads a model and starts the service

2. Environment Setup

  1. Basic system environment: Linux/Ubuntu 16.04+, Python 3.6+, GPU (1080 Ti by default)

  2. Install the Python dependencies

pip install -r requirements.txt
  3. Install Elasticsearch

    The retrieval-based dialogue models first use Elasticsearch for coarse-grained candidate recall. To get Chinese word segmentation in this coarse retrieval stage, a Chinese analyzer (tokenizer plugin) also needs to be downloaded and installed.

  4. Install MongoDB

    After the service is started, MongoDB is used to store the conversation history and other necessary data (see the sketch after this list).
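
For orientation, the following is a hedged sketch of how the coarse Elasticsearch recall and the MongoDB session storage described above could be wired together with the official Python clients (elasticsearch, pymongo). The index name dialog_corpus, the utterance field, and the opendialog/history database and collection names are illustrative assumptions, not the exact schema OpenDialog uses.

    # Sketch of the coarse-recall + storage setup described above.
    # Index/collection names and fields are illustrative assumptions.
    from elasticsearch import Elasticsearch
    from pymongo import MongoClient

    es = Elasticsearch("http://localhost:9200")      # client kwargs differ slightly across versions (8.x style shown)
    mongo = MongoClient("mongodb://localhost:27017")

    def coarse_recall(query, size=10):
        # Full-text match over an assumed "dialog_corpus" index whose "utterance"
        # field is analyzed by a Chinese tokenizer plugin.
        resp = es.search(index="dialog_corpus",
                         query={"match": {"utterance": query}},
                         size=size)
        return [hit["_source"]["utterance"] for hit in resp["hits"]["hits"]]

    def save_turn(session_id, user_msg, bot_msg):
        # Append one conversation turn to an assumed "opendialog.history" collection.
        mongo["opendialog"]["history"].insert_one(
            {"session": session_id, "user": user_msg, "bot": bot_msg})

    candidates = coarse_recall("你好")
    save_turn("demo-session", "你好", candidates[0] if candidates else "")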

3. Data Preparation

  1. Baidu Netdisk link for the datasets: https://pan.baidu.com/s/1xJibJmOOCGIzmJVC6CZ39Q; extraction code: vmua
  2. Place each data file in the corresponding subdirectory under data; the word-embedding files chinese_w2v.txt and english_w2v.bin go directly under data (a loading sketch follows this list).
  3. See data/README.md for dataset details and preprocessing.
  4. Available datasets
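
Assuming chinese_w2v.txt is in the plain-text word2vec format and english_w2v.bin in the binary format, the embedding files above can be loaded with gensim as in the sketch below; this is only an illustration, not the project's own loading code.

    # Sketch: load the word-embedding files placed under data/.
    # Assumes standard word2vec formats (text for .txt, binary for .bin).
    from gensim.models import KeyedVectors

    zh_w2v = KeyedVectors.load_word2vec_format("data/chinese_w2v.txt", binary=False)
    en_w2v = KeyedVectors.load_word2vec_format("data/english_w2v.bin", binary=True)

    print(zh_w2v.vector_size)                   # embedding dimensionality
    print(zh_w2v.most_similar("你好", topn=3))  # nearest neighbours of a token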

5. Model Training

  • Training supports multi-GPU parallelism; simply list several GPU ids in <gpu_ids>, e.g. 0,1,2,3
  • The dataset name must match the corresponding directory name under data
Model | CMD | Type | Details | Refer | Pre-trained Model
bertretrieval | ./run.sh train <dataset> bertretrieval <gpu_ids> | retrieval | BERT-based fine-grained re-ranking model (fine-tuning) | Paper |
gpt2 | ./run.sh train <dataset> gpt2 <gpu_ids> | generative | GPT2 generative dialogue model | Code |
gpt2gan | ./run.sh train <dataset> gpt2gan <gpu_ids> | generative | GAN-based dialogue model: the generator is GPT2 and the discriminator is a BERT binary classifier | Paper |
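
To make the bertretrieval row above more concrete, the sketch below shows the general cross-encoder idea behind BERT-based fine re-ranking: each (context, candidate) pair is scored by a BERT classifier and the best candidate is kept. The checkpoint bert-base-chinese and the label convention are assumptions; this is not OpenDialog's training or inference code.

    # Sketch of cross-encoder re-ranking: score (context, candidate) pairs with BERT
    # and keep the best-scoring candidate. Checkpoint and label convention are assumptions.
    import torch
    from transformers import BertTokenizerFast, BertForSequenceClassification

    # "bert-base-chinese" stands in for whatever checkpoint is actually fine-tuned.
    tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
    model = BertForSequenceClassification.from_pretrained("bert-base-chinese", num_labels=2)
    model.eval()

    def rerank(context, candidates):
        # Encode each pair as [CLS] context [SEP] candidate [SEP].
        batch = tokenizer([context] * len(candidates), candidates,
                          padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            logits = model(**batch).logits
        # Assume label 1 means "coherent response"; keep the highest-scoring candidate.
        scores = logits.softmax(dim=-1)[:, 1]
        return candidates[int(scores.argmax())]

    print(rerank("你好,今天天气怎么样?", ["天气很好,适合出去玩", "我喜欢吃苹果"]))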

6. Experimental Results

7. Start the Flask Service

  1. Start the Flask service

    ./run_flask.sh <model_name> <gpu_id>
    
  2. Call the API (see the Python example after this list)

    • the WeChat official account
    • postman
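
Besides the WeChat official account and postman, the service can be queried from Python. The sketch below assumes the Flask app listens on localhost:8080 and exposes a /chat route that accepts a JSON body with a text field; the real host, port, route, and payload are defined by the project's Flask code and may differ.

    # Sketch: query the Flask service started by ./run_flask.sh <model_name> <gpu_id>.
    # The URL, route and JSON schema below are assumptions; check the Flask app for
    # the actual endpoint and payload format.
    import requests

    resp = requests.post(
        "http://localhost:8080/chat",             # assumed host/port/route
        json={"text": "你好,今天天气怎么样?"},    # assumed payload field
        timeout=10,
    )
    resp.raise_for_status()
    print(resp.json())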