
gmftbyGMFTBY / OpenDialog

License: MIT License
An Open-Source Package for Chinese Open-domain Conversational Chatbot (Chinese chitchat dialogue system; one-click deployment of a WeChat chitchat bot)

Programming Languages

python
shell

Projects that are alternatives to or similar to OpenDialog

Clue
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Stars: ✭ 2,425 (+2479.79%)
Mutual labels:  corpus, transformers, chinese, bert
Text-Summarization
Abstractive and Extractive Text summarization using Transformers.
Stars: ✭ 38 (-59.57%)
Mutual labels:  transformers, bert, gpt2
Roberta zh
RoBERTa for Chinese: Chinese pre-trained RoBERTa models
Stars: ✭ 1,953 (+1977.66%)
Mutual labels:  chinese, bert, gpt2
Nlp chinese corpus
Large Scale Chinese Corpus for NLP
Stars: ✭ 6,656 (+6980.85%)
Mutual labels:  corpus, chinese, bert
CBLUE
CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark
Stars: ✭ 379 (+303.19%)
Mutual labels:  corpus, chinese
COVID-19-Tweet-Classification-using-Roberta-and-Bert-Simple-Transformers
Rank 1 / 216
Stars: ✭ 24 (-74.47%)
Mutual labels:  transformers, bert
Text and Audio classification with Bert
Text Classification in Turkish Texts with Bert
Stars: ✭ 34 (-63.83%)
Mutual labels:  transformers, bert
ParsBigBird
Persian Bert For Long-Range Sequences
Stars: ✭ 58 (-38.3%)
Mutual labels:  transformers, bert
CLUEmotionAnalysis2020
CLUE Emotion Analysis Dataset (fine-grained sentiment analysis dataset)
Stars: ✭ 3 (-96.81%)
Mutual labels:  corpus, chinese
text2class
Multi-class text categorization using state-of-the-art pre-trained contextualized language models, e.g. BERT
Stars: ✭ 15 (-84.04%)
Mutual labels:  transformers, bert
robo-vln
Pytorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"
Stars: ✭ 34 (-63.83%)
Mutual labels:  transformers, bert
bert-squeeze
🛠️ Tools for Transformers compression using PyTorch Lightning ⚡
Stars: ✭ 56 (-40.43%)
Mutual labels:  transformers, bert
TV4Dialog
No description or website provided.
Stars: ✭ 33 (-64.89%)
Mutual labels:  corpus, chinese
Pytorch-NLU
Pytorch-NLU, a Chinese text classification and sequence labeling toolkit. It supports multi-class and multi-label classification of Chinese long and short texts, and sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, and word segmentation.
Stars: ✭ 151 (+60.64%)
Mutual labels:  transformers, bert
label-studio-transformers
Label data using HuggingFace's transformers and automatically get a prediction service
Stars: ✭ 117 (+24.47%)
Mutual labels:  transformers, bert
GoEmotions-pytorch
Pytorch Implementation of GoEmotions 😍😢😱
Stars: ✭ 95 (+1.06%)
Mutual labels:  transformers, bert
golgotha
Contextualised Embeddings and Language Modelling using BERT and friends, in R
Stars: ✭ 39 (-58.51%)
Mutual labels:  transformers, bert
LightLM
Evaluation of lightweight high-performance models. Shared Tasks in NLPCC 2020, Task 1: Light Pre-Training Chinese Language Model for NLP Tasks
Stars: ✭ 54 (-42.55%)
Mutual labels:  chinese, bert
classy
classy is a simple-to-use library for building high-performance Machine Learning models in NLP.
Stars: ✭ 61 (-35.11%)
Mutual labels:  transformers, bert
Transformers-Tutorials
This repository contains demos I made with the Transformers library by HuggingFace.
Stars: ✭ 2,828 (+2908.51%)
Mutual labels:  transformers, bert

OpenDialog

A test interface is now available: search for the WeChat official account OpenDialog to try it.

OpenDialog is built on top of the PyTorch-based transformers library. It provides a collection of transformer-based Chinese open-domain dialogue (chitchat) models, gathers existing data resources, and keeps adding datasets for Chinese dialogue systems, with the goal of building an open-source Chinese chitchat dialogue platform.
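
As a rough illustration of what a transformer-based generative chitchat model does, the sketch below generates a reply with the HuggingFace transformers library. The checkpoint name your-chinese-dialogue-gpt2 is a placeholder rather than a model shipped by OpenDialog; the actual models and decoding settings live under models/.

    # Sketch only: generate a chitchat reply with a GPT-2 style Chinese dialogue model.
    # "your-chinese-dialogue-gpt2" is a placeholder checkpoint, not part of OpenDialog.
    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("your-chinese-dialogue-gpt2")
    model = AutoModelForCausalLM.from_pretrained("your-chinese-dialogue-gpt2")
    model.eval()

    context = "你好,今天天气怎么样?"          # dialogue context (user utterance)
    inputs = tokenizer(context, return_tensors="pt")

    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=32,   # keep the reply short
            do_sample=True,      # sampling-based decoding for chitchat diversity
            top_k=50,
            top_p=0.95,
        )

    # Strip the context tokens and decode only the generated reply.
    reply = tokenizer.decode(output_ids[0, inputs["input_ids"].shape[1]:],
                             skip_special_tokens=True)
    print(reply)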

Latest updates:

  • 2020.8.20: Added an API for the LCCC-GPT-Large generative open-domain pre-trained model; run the command below to start the corresponding service

    ./run_flask.sh lccc <gpu_id>
  • 2020.10.26: Added a set of bi-encoder retrieval-based dialogue models (bert-bi-encoder, polyencoder, etc.)

  • ...

Tutorial

1. Project Structure and Files

Core OpenDialog files and directories:

  • data: datasets, configuration files, vocabularies, word embeddings, and dataset preprocessing scripts
  • models: dialogue models
  • metrics: evaluation metrics
  • multiview: multi-view re-ranking model that re-ranks the retrieved candidate responses
  • ckpt: trained model checkpoints
  • rest: tensorboard logs and the result files generated during the test stage
  • utils: utility functions
  • dataloader.py: dataset loading script
  • main.py: main entry point
  • header.py: packages that need to be imported
  • eval.py: evaluation script that calls the metrics in metrics to score the generated files under rest
  • run.sh: batch script for training and testing
  • run_flask.sh: loads a model and starts the service

2. Environment Setup

  1. Basic system environment: Linux/Ubuntu 16.04+, Python 3.6+, GPU (1080 Ti by default)

  2. Install the Python dependencies

pip install -r requirements.txt
  3. Install Elasticsearch

    The retrieval-based dialogue models first use Elasticsearch for coarse-grained candidate recall. To get Chinese word segmentation in this coarse retrieval stage, a Chinese analyzer (tokenizer plugin) also needs to be downloaded and installed.

  4. Install MongoDB

    After the service is started, MongoDB is used to store the conversation history and other necessary data (see the sketch after this list).
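
For orientation, the following is a hedged sketch of how the coarse Elasticsearch recall and the MongoDB session storage described above could be wired together with the official Python clients (elasticsearch, pymongo). The index name dialog_corpus, the utterance field, and the opendialog/history database and collection names are illustrative assumptions, not the exact schema OpenDialog uses.

    # Sketch of the coarse-recall + storage setup described above.
    # Index/collection names and fields are illustrative assumptions.
    from elasticsearch import Elasticsearch
    from pymongo import MongoClient

    es = Elasticsearch("http://localhost:9200")      # client kwargs differ slightly across versions (8.x style shown)
    mongo = MongoClient("mongodb://localhost:27017")

    def coarse_recall(query, size=10):
        # Full-text match over an assumed "dialog_corpus" index whose "utterance"
        # field is analyzed by a Chinese tokenizer plugin.
        resp = es.search(index="dialog_corpus",
                         query={"match": {"utterance": query}},
                         size=size)
        return [hit["_source"]["utterance"] for hit in resp["hits"]["hits"]]

    def save_turn(session_id, user_msg, bot_msg):
        # Append one conversation turn to an assumed "opendialog.history" collection.
        mongo["opendialog"]["history"].insert_one(
            {"session": session_id, "user": user_msg, "bot": bot_msg})

    candidates = coarse_recall("你好")
    save_turn("demo-session", "你好", candidates[0] if candidates else "")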

3. Data Preparation

  1. Baidu Netdisk link for the datasets: https://pan.baidu.com/s/1xJibJmOOCGIzmJVC6CZ39Q; extraction code: vmua
  2. Place each data file in the corresponding subdirectory under data; the word-embedding files chinese_w2v.txt and english_w2v.bin go directly under data (a loading sketch follows this list).
  3. See data/README.md for dataset details and preprocessing.
  4. Available datasets
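
Assuming chinese_w2v.txt is in the plain-text word2vec format and english_w2v.bin in the binary format, the embedding files above can be loaded with gensim as in the sketch below; this is only an illustration, not the project's own loading code.

    # Sketch: load the word-embedding files placed under data/.
    # Assumes standard word2vec formats (text for .txt, binary for .bin).
    from gensim.models import KeyedVectors

    zh_w2v = KeyedVectors.load_word2vec_format("data/chinese_w2v.txt", binary=False)
    en_w2v = KeyedVectors.load_word2vec_format("data/english_w2v.bin", binary=True)

    print(zh_w2v.vector_size)                   # embedding dimensionality
    print(zh_w2v.most_similar("你好", topn=3))  # nearest neighbours of a token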

5. Model Training

  • Training supports multi-GPU parallelism; simply list several GPU ids in <gpu_ids>, e.g. 0,1,2,3
  • The dataset name must match the corresponding directory name under data
Model | CMD | Type | Details | Refer | Pre-trained Model
bertretrieval | ./run.sh train <dataset> bertretrieval <gpu_ids> | retrieval | BERT-based fine-grained re-ranking model (fine-tuning) | Paper |
gpt2 | ./run.sh train <dataset> gpt2 <gpu_ids> | generative | GPT2 generative dialogue model | Code |
gpt2gan | ./run.sh train <dataset> gpt2gan <gpu_ids> | generative | GAN-based dialogue model: the generator is GPT2 and the discriminator is a BERT binary classifier | Paper |
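
To make the bertretrieval row above more concrete, the sketch below shows the general cross-encoder idea behind BERT-based fine re-ranking: each (context, candidate) pair is scored by a BERT classifier and the best candidate is kept. The checkpoint bert-base-chinese and the label convention are assumptions; this is not OpenDialog's training or inference code.

    # Sketch of cross-encoder re-ranking: score (context, candidate) pairs with BERT
    # and keep the best-scoring candidate. Checkpoint and label convention are assumptions.
    import torch
    from transformers import BertTokenizerFast, BertForSequenceClassification

    # "bert-base-chinese" stands in for whatever checkpoint is actually fine-tuned.
    tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
    model = BertForSequenceClassification.from_pretrained("bert-base-chinese", num_labels=2)
    model.eval()

    def rerank(context, candidates):
        # Encode each pair as [CLS] context [SEP] candidate [SEP].
        batch = tokenizer([context] * len(candidates), candidates,
                          padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            logits = model(**batch).logits
        # Assume label 1 means "coherent response"; keep the highest-scoring candidate.
        scores = logits.softmax(dim=-1)[:, 1]
        return candidates[int(scores.argmax())]

    print(rerank("你好,今天天气怎么样?", ["天气很好,适合出去玩", "我喜欢吃苹果"]))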

6. Experimental Results

7. Start the Flask Service

  1. Start the Flask service

    ./run_flask.sh <model_name> <gpu_id>
    
  2. Call the API (see the Python example after this list)

    • the WeChat official account
    • postman
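
Besides the WeChat official account and postman, the service can be queried from Python. The sketch below assumes the Flask app listens on localhost:8080 and exposes a /chat route that accepts a JSON body with a text field; the real host, port, route, and payload are defined by the project's Flask code and may differ.

    # Sketch: query the Flask service started by ./run_flask.sh <model_name> <gpu_id>.
    # The URL, route and JSON schema below are assumptions; check the Flask app for
    # the actual endpoint and payload format.
    import requests

    resp = requests.post(
        "http://localhost:8080/chat",             # assumed host/port/route
        json={"text": "你好,今天天气怎么样?"},    # assumed payload field
        timeout=10,
    )
    resp.raise_for_status()
    print(resp.json())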