
menghuanlater / Tianchi2020ChineseMedicineQuestionGeneration

Licence: other
2020 Alibaba Cloud Tianchi Big Data Competition - Traditional Chinese Medicine Literature Question Generation Challenge

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Tianchi2020ChineseMedicineQuestionGeneration

Transformer-QG-on-SQuAD
Implement Question Generator with SOTA pre-trained Language Models (RoBERTa, BERT, GPT, BART, T5, etc.)
Stars: ✭ 28 (+40%)
Mutual labels:  bert, question-generation, roberta
classy
classy is a simple-to-use library for building high-performance Machine Learning models in NLP.
Stars: ✭ 61 (+205%)
Mutual labels:  sequence-to-sequence, bert
Bertviz
Tool for visualizing attention in the Transformer model (BERT, GPT-2, Albert, XLNet, RoBERTa, CTRL, etc.)
Stars: ✭ 3,443 (+17115%)
Mutual labels:  bert, roberta
les-military-mrc-rank7
Rank 7 solution for the 2nd national "Military Intelligent Machine Reading" challenge (莱斯杯)
Stars: ✭ 37 (+85%)
Mutual labels:  bert, roberta
Chinese Bert Wwm
Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm model series)
Stars: ✭ 6,357 (+31685%)
Mutual labels:  bert, roberta
Roberta zh
RoBERTa for Chinese: Chinese pre-trained RoBERTa models
Stars: ✭ 1,953 (+9665%)
Mutual labels:  bert, roberta
vietnamese-roberta
A Robustly Optimized BERT Pretraining Approach for Vietnamese
Stars: ✭ 22 (+10%)
Mutual labels:  bert, roberta
KLUE
📖 Korean NLU Benchmark
Stars: ✭ 420 (+2000%)
Mutual labels:  bert, roberta
question generator
An NLP system for generating reading comprehension questions
Stars: ✭ 188 (+840%)
Mutual labels:  bert, question-generation
roberta-wwm-base-distill
A RoBERTa-wwm-base model distilled from RoBERTa-wwm-large
Stars: ✭ 61 (+205%)
Mutual labels:  bert, roberta
Albert zh
A Lite BERT for Self-Supervised Learning of Language Representations; large-scale Chinese pre-trained ALBERT models
Stars: ✭ 3,500 (+17400%)
Mutual labels:  bert, roberta
COVID-19-Tweet-Classification-using-Roberta-and-Bert-Simple-Transformers
Rank 1 / 216
Stars: ✭ 24 (+20%)
Mutual labels:  bert, roberta
CLUE pytorch
PyTorch baselines for the CLUE benchmark
Stars: ✭ 72 (+260%)
Mutual labels:  bert, roberta
Clue
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Stars: ✭ 2,425 (+12025%)
Mutual labels:  bert, roberta
erc
Emotion recognition in conversation
Stars: ✭ 34 (+70%)
Mutual labels:  bert, roberta
text-generation-transformer
text generation based on transformer
Stars: ✭ 36 (+80%)
Mutual labels:  sequence-to-sequence, bert
text2text
Text2Text: Cross-lingual natural language processing and generation toolkit
Stars: ✭ 188 (+840%)
Mutual labels:  bert, question-generation
beir
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
Stars: ✭ 738 (+3590%)
Mutual labels:  bert, question-generation
Text-Summarization
Abstractive and Extractive Text summarization using Transformers.
Stars: ✭ 38 (+90%)
Mutual labels:  bert, roberta
berserker
Berserker - BERt chineSE woRd toKenizER
Stars: ✭ 17 (-15%)
Mutual labels:  sequence-to-sequence, bert

Tianchi2020ChineseMedicineQuestionGeneration

2020 Alibaba Cloud Tianchi Big Data Competition - Traditional Chinese Medicine Literature Question Generation Challenge

Official competition page: https://tianchi.aliyun.com/competition/entrance/531826/introduction

Preliminary round score: 0.6133 (rank 11/868). Final round score: 0.6215 (rank 8/868; ranked 6th after the final-round code review).

Both results were obtained with a single model.

Complete project files, including the dataset, on Baidu Netdisk: https://pan.baidu.com/s/1crAYwtDLrGnkls9xdfQdQg (extraction code: qagl). Note: the Netdisk link is unstable and may be mistakenly blocked by Baidu; if you need the complete data files, contact [email protected].

Overall model design: a pre-trained language model (RoBERTa_wwm_ext_large) serves as the encoder, and a Transformer-XL decoder is trained from scratch. The model is first pre-trained on other machine reading comprehension datasets and then fine-tuned on the competition dataset.
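As a rough illustration of this encoder-decoder setup, here is a minimal PyTorch sketch, not the repository's actual code. It assumes the HuggingFace checkpoint name hfl/chinese-roberta-wwm-ext-large (loaded through the BERT classes) and uses a plain nn.TransformerDecoder trained from scratch as a simplified stand-in for the Transformer-XL decoder, which additionally has segment-level recurrence and relative position encodings.

```python
# A minimal sketch, assuming the checkpoint "hfl/chinese-roberta-wwm-ext-large"
# and a plain nn.TransformerDecoder as a stand-in for Transformer-XL.
import torch
import torch.nn as nn
from transformers import BertModel


class QuestionGenerator(nn.Module):
    def __init__(self, encoder_name="hfl/chinese-roberta-wwm-ext-large",
                 num_decoder_layers=6):
        super().__init__()
        # Pre-trained encoder (RoBERTa-wwm models load via the BERT classes).
        self.encoder = BertModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size   # 1024 for the large model
        vocab = self.encoder.config.vocab_size
        # Decoder trained from scratch.
        layer = nn.TransformerDecoderLayer(d_model=hidden, nhead=16,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_decoder_layers)
        self.embed = nn.Embedding(vocab, hidden)
        self.lm_head = nn.Linear(hidden, vocab)

    def forward(self, src_ids, src_mask, tgt_ids):
        # Encode the source passage (and answer) with the pre-trained encoder.
        memory = self.encoder(input_ids=src_ids,
                              attention_mask=src_mask).last_hidden_state
        # Causal mask: each target position attends only to earlier positions.
        t = tgt_ids.size(1)
        causal = torch.triu(torch.full((t, t), float("-inf"),
                                       device=tgt_ids.device), diagonal=1)
        out = self.decoder(self.embed(tgt_ids), memory, tgt_mask=causal,
                           memory_key_padding_mask=~src_mask.bool())
        return self.lm_head(out)   # logits over the vocabulary
```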

Overall pipeline:

  1. Data preprocessing: run python preprocess.py to generate multi-task.pkl
  2. Coarse-grained pre-training on the DuReader dataset: nohup python -u MultiTaskXLIR-DuReader train gpu-0 & (set the batch size and number of GPUs yourself)
  3. Fine-grained pre-training on the DRCD and CMRC2018 datasets: nohup python -u MultiTaskXLIR-DRMC train gpu-0 &
  4. Training on the competition dataset: nohup python -u MultiTaskXLIR-Final train gpu-0 final &
  5. Beam-search generation of the test-set results: python MultiTaskXLIR-Final test gpu-0 (a minimal beam-search sketch follows below)
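For step 5, the following is a minimal beam-search decoding sketch built on the hypothetical QuestionGenerator above. The start/end token ids, beam size, and maximum length are assumptions; the repository's MultiTaskXLIR-Final test script may apply additional tricks such as a length penalty or repeated n-gram blocking.

```python
# A minimal beam-search sketch, assuming the QuestionGenerator model above.
# bos_id/eos_id (e.g. the tokenizer's [CLS]/[SEP] ids) are assumptions.
import torch


@torch.no_grad()
def beam_search(model, src_ids, src_mask, bos_id, eos_id,
                beam_size=5, max_len=50):
    device = src_ids.device
    # Each beam is (token id list, cumulative log-probability).
    beams = [([bos_id], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == eos_id:
                finished.append((tokens, score))
                continue
            tgt = torch.tensor([tokens], device=device)
            logits = model(src_ids, src_mask, tgt)       # (1, len, vocab)
            log_probs = logits[0, -1].log_softmax(-1)
            top_lp, top_id = log_probs.topk(beam_size)
            for lp, tok in zip(top_lp.tolist(), top_id.tolist()):
                candidates.append((tokens + [tok], score + lp))
        if not candidates:                               # all beams finished
            break
        # Keep the beam_size best partial hypotheses.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    finished.extend(beams)
    # Return the best hypothesis, length-normalised.
    best = max(finished, key=lambda c: c[1] / len(c[0]))
    return best[0]
```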