
eva-n27 / BERT-for-Chinese-Question-Answering

License: Apache-2.0
No description or website provided.

Programming Languages

Python

Projects that are alternatives to or similar to BERT-for-Chinese-Question-Answering

text2text
Text2Text: Cross-lingual natural language processing and generation toolkit
Stars: ✭ 188 (+150.67%)
Mutual labels:  question-answering, bert
Nlp chinese corpus
Large Scale Chinese Corpus for NLP
Stars: ✭ 6,656 (+8774.67%)
Mutual labels:  question-answering, bert
cdQA-ui
⛔ [NOT MAINTAINED] A web interface for cdQA and other question answering systems.
Stars: ✭ 19 (-74.67%)
Mutual labels:  question-answering, bert
mcQA
🔮 Answering multiple choice questions with Language Models.
Stars: ✭ 23 (-69.33%)
Mutual labels:  question-answering, bert
cmrc2019
A Sentence Cloze Dataset for Chinese Machine Reading Comprehension (CMRC 2019)
Stars: ✭ 118 (+57.33%)
Mutual labels:  question-answering, bert
Medi-CoQA
Conversational Question Answering on Clinical Text
Stars: ✭ 22 (-70.67%)
Mutual labels:  question-answering, bert
iamQA
A Chinese Wikipedia reading-comprehension QA system, using an NER model trained on CCKS2016 data and a reading-comprehension model from CMRC2018, plus W2V word-vector search, deployed with TorchServe
Stars: ✭ 46 (-38.67%)
Mutual labels:  question-answering, bert
SQUAD2.Q-Augmented-Dataset
Augmented version of SQUAD 2.0 for Questions
Stars: ✭ 31 (-58.67%)
Mutual labels:  question-answering, bert
FinBERT-QA
Financial Domain Question Answering with pre-trained BERT Language Model
Stars: ✭ 70 (-6.67%)
Mutual labels:  question-answering, bert
DrFAQ
DrFAQ is a plug-and-play question answering NLP chatbot that can be generally applied to any organisation's text corpora.
Stars: ✭ 29 (-61.33%)
Mutual labels:  question-answering, bert
Haystack
🔍 Haystack is an open source NLP framework that leverages Transformer models. It enables developers to implement production-ready neural search, question answering, semantic document search and summarization for a wide range of applications.
Stars: ✭ 3,409 (+4445.33%)
Mutual labels:  question-answering, bert
backprop
Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models.
Stars: ✭ 229 (+205.33%)
Mutual labels:  question-answering, bert
TriB-QA
We are serious about bragging
Stars: ✭ 45 (-40%)
Mutual labels:  question-answering, bert
KitanaQA
KitanaQA: Adversarial training and data augmentation for neural question-answering models
Stars: ✭ 58 (-22.67%)
Mutual labels:  question-answering, bert
NLP-Review-Scorer
Score your NLP paper review
Stars: ✭ 25 (-66.67%)
Mutual labels:  bert
label-studio-transformers
Label data using HuggingFace's transformers and automatically get a prediction service
Stars: ✭ 117 (+56%)
Mutual labels:  bert
Instahelp
Instahelp is a Q&A portal website similar to Quora
Stars: ✭ 21 (-72%)
Mutual labels:  question-answering
roberta-wwm-base-distill
A distilled RoBERTa-wwm-base model, distilled using RoBERTa-wwm-large as the teacher
Stars: ✭ 61 (-18.67%)
Mutual labels:  bert
TeBaQA
A question answering system which utilises machine learning.
Stars: ✭ 17 (-77.33%)
Mutual labels:  question-answering
bert nli
A Natural Language Inference (NLI) model based on Transformers (BERT and ALBERT)
Stars: ✭ 97 (+29.33%)
Mutual labels:  bert

BERT-for-Chinese-Question-Answering

The code in this repository comes from PyTorch Pretrained Bert, modified only to adapt the QA task to Chinese.

The main change is to the read_squad_examples function. Since SQuAD is English, the original code processes documents the English way, splitting the context on whitespace (see that function in run_squad.py), which does not segment Chinese text.

In addition, the training loop now runs an evaluation every save_checkpoints_steps steps and saves the model parameters that perform best on the dev set.
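A minimal sketch of that pattern is shown below; the evaluate_fn helper, the use of F1 for model selection, and the output path are illustrative assumptions, not the repository's actual code.

import torch

def train(model, optimizer, train_loader, evaluate_fn,
          save_checkpoints_steps, num_train_epochs):
    # Illustrative training loop: evaluate on dev every save_checkpoints_steps
    # steps and keep only the best-performing weights.
    best_f1 = 0.0
    global_step = 0
    for _ in range(num_train_epochs):
        for batch in train_loader:
            loss = model(**batch)  # BERT QA models return the loss when given start/end positions
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            global_step += 1
            if global_step % save_checkpoints_steps == 0:
                f1 = evaluate_fn(model)  # caller-supplied dev evaluation (assumed to return F1)
                if f1 > best_f1:
                    best_f1 = f1
                    torch.save(model.state_dict(), "output/best_model.bin")  # hypothetical path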

For Chinese, read_squad_examples was therefore modified as follows:

1. First apply tokenizer.basic_tokenizer.tokenize to the document to obtain doc_tokens (line 161 of the code).

2. Apply tokenizer.basic_tokenizer.tokenize to orig_answer_text as well, then compute the answer's start_position and end_position (lines 172-191); see the sketch after this list.
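In sketch form, the two steps amount to something like the helper below (an illustration of the idea, assuming a pytorch-pretrained-bert BertTokenizer; the repository's exact lines may differ). Because BasicTokenizer splits CJK text into individual characters, the matched span is character-level rather than whitespace-word-level.

def locate_answer(tokenizer, paragraph_text, orig_answer_text):
    # Step 1: character-level doc tokens for Chinese text.
    doc_tokens = tokenizer.basic_tokenizer.tokenize(paragraph_text)
    # Step 2: tokenize the answer the same way, then find its span.
    answer_tokens = tokenizer.basic_tokenizer.tokenize(orig_answer_text)
    for start in range(len(doc_tokens) - len(answer_tokens) + 1):
        if doc_tokens[start:start + len(answer_tokens)] == answer_tokens:
            start_position = start
            end_position = start + len(answer_tokens) - 1
            return doc_tokens, start_position, end_position
    return doc_tokens, None, None  # answer not recoverable after tokenization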

Usage

  • First, convert your corpus to SQuAD format, and put the data and model files under the data directory (which you need to create yourself); a sketch of the expected format appears after this list.

  • Run:

python3 run_squad.py \
  --do_train \
  --do_predict \
  --save_checkpoints_steps 3000 \
  --train_batch_size 12 \
  --num_train_epochs 5
  • Evaluation: eval.py adds BERT tokenization before computing EM and F1 (a sketch of this computation also follows the list):
python3 eval.py data/squad_dev.json output/predictions.json
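To illustrate the first step above, here is a minimal SQuAD v1.1-style file written from Python; the field names follow the public SQuAD schema, while the content and the file name are placeholders.

import json

squad_data = {
    "version": "1.1",
    "data": [{
        "title": "example",
        "paragraphs": [{
            "context": "北京是中华人民共和国的首都。",
            "qas": [{
                "id": "example-0",
                "question": "中国的首都是哪里?",
                # answer_start is a character offset into context.
                "answers": [{"text": "北京", "answer_start": 0}],
            }],
        }],
    }],
}

with open("data/squad_train.json", "w", encoding="utf-8") as f:
    json.dump(squad_data, f, ensure_ascii=False, indent=2)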
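As for the evaluation step: the stock SQuAD metrics split strings on whitespace, which does nothing useful for Chinese, so tokenizing both strings first makes EM and F1 effectively character-level. A hedged sketch of that computation (not eval.py's exact code):

from collections import Counter

def em_f1(tokenizer, prediction, ground_truth):
    # Tokenize with BERT's BasicTokenizer so Chinese is compared per character.
    pred_tokens = tokenizer.basic_tokenizer.tokenize(prediction)
    gold_tokens = tokenizer.basic_tokenizer.tokenize(ground_truth)

    exact_match = float(pred_tokens == gold_tokens)

    # Token-level F1 via multiset overlap, as in the standard SQuAD script.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return exact_match, 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return exact_match, 2 * precision * recall / (precision + recall)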

Criticism and corrections are welcome. Thanks!
