
SunnyMarkLiu / les-military-mrc-rank7

Licence: other
LES Cup: Rank 7 solution for the 2nd National "Military Intelligent Machine Reading" Challenge

Programming Languages

Jupyter Notebook, Python, Shell

Projects that are alternatives of or similar to les-military-mrc-rank7

COVID-19-Tweet-Classification-using-Roberta-and-Bert-Simple-Transformers
Rank 1 / 216
Stars: ✭ 24 (-35.14%)
Mutual labels:  transformer, bert, roberta
Bertviz
Tool for visualizing attention in the Transformer model (BERT, GPT-2, Albert, XLNet, RoBERTa, CTRL, etc.)
Stars: ✭ 3,443 (+9205.41%)
Mutual labels:  transformer, bert, roberta
vietnamese-roberta
A Robustly Optimized BERT Pretraining Approach for Vietnamese
Stars: ✭ 22 (-40.54%)
Mutual labels:  transformer, bert, roberta
Filipino-Text-Benchmarks
Open-source benchmark datasets and pretrained transformer models in the Filipino language.
Stars: ✭ 22 (-40.54%)
Mutual labels:  transformer, bert
golgotha
Contextualised Embeddings and Language Modelling using BERT and Friends using R
Stars: ✭ 39 (+5.41%)
Mutual labels:  transformer, bert
are-16-heads-really-better-than-1
Code for the paper "Are Sixteen Heads Really Better than One?"
Stars: ✭ 128 (+245.95%)
Mutual labels:  transformer, bert
SIGIR2021 Conure
One Person, One Model, One World: Learning Continual User Representation without Forgetting
Stars: ✭ 23 (-37.84%)
Mutual labels:  transformer, bert
text-generation-transformer
text generation based on transformer
Stars: ✭ 36 (-2.7%)
Mutual labels:  transformer, bert
bert in a flask
A dockerized flask API, serving ALBERT and BERT predictions using TensorFlow 2.0.
Stars: ✭ 32 (-13.51%)
Mutual labels:  transformer, bert
Nlp Tutorial
Natural Language Processing Tutorial for Deep Learning Researchers
Stars: ✭ 9,895 (+26643.24%)
Mutual labels:  transformer, bert
Transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Stars: ✭ 55,742 (+150554.05%)
Mutual labels:  transformer, bert
transformer-models
Deep Learning Transformer models in MATLAB
Stars: ✭ 90 (+143.24%)
Mutual labels:  transformer, bert
Xpersona
XPersona: Evaluating Multilingual Personalized Chatbot
Stars: ✭ 54 (+45.95%)
Mutual labels:  transformer, bert
semantic-document-relations
Implementation, trained models and result data for the paper "Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles"
Stars: ✭ 21 (-43.24%)
Mutual labels:  transformer, bert
PDN
The official PyTorch implementation of "Pathfinder Discovery Networks for Neural Message Passing" (WebConf '21)
Stars: ✭ 44 (+18.92%)
Mutual labels:  transformer, bert
bert-as-a-service TFX
End-to-end pipeline with TFX to train and deploy a BERT model for sentiment analysis.
Stars: ✭ 32 (-13.51%)
Mutual labels:  transformer, bert
NLP-paper
🎨🎨 NLP (Natural Language Processing) tutorial 🎨🎨 https://dataxujing.github.io/NLP-paper/
Stars: ✭ 23 (-37.84%)
Mutual labels:  transformer, bert
Kevinpro-NLP-demo
All the NLP you need here. Personal implementations of some fun NLP demos, currently covering PyTorch implementations of 13 NLP applications.
Stars: ✭ 117 (+216.22%)
Mutual labels:  transformer, bert
Bert Pytorch
Google AI 2018 BERT pytorch implementation
Stars: ✭ 4,642 (+12445.95%)
Mutual labels:  transformer, bert
cmrc2019
A Sentence Cloze Dataset for Chinese Machine Reading Comprehension (CMRC 2019)
Stars: ✭ 118 (+218.92%)
Mutual labels:  reading-comprehension, bert

les-military-mrc

LES Cup: Rank 7 solution (baseline) for the 2nd National "Military Intelligent Machine Reading" Challenge.

Architecture

The competition data has the following characteristics:

  • each question comes with five long documents that contain a fair amount of noise;
  • some questions require deep reasoning over bridge entities;
  • some questions have multiple answers, which may come from a single document or from several documents.

To address these issues, the team adopted the overall architecture shown in the figure below:

Text Preprocess

To simplify downstream model training, the dataset is converted into the DuReader format. Because the raw text contains a large amount of noise, the cleaning steps include the following (a minimal sketch follows the list):

  • removing (Unicode) blank/control characters such as \u200b, \x10, \f, and \r;
  • removing URLs and HTML tags;
  • collapsing runs of repeated characters such as ------ and .....;
  • removing advertisement text;
  • removing empty paragraphs and duplicated paragraphs.
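A minimal cleaning sketch along these lines (the rule set, regexes, and function names here are illustrative assumptions, not the repository's actual preprocessing code):

```python
import re

def clean_text(text):
    """Apply the noise-removal rules listed above to one raw string."""
    # Remove zero-width / control characters such as \u200b, \x10, \f, \r.
    text = re.sub(r"[\u200b\x10\f\r]", "", text)
    # Remove URLs and HTML tags.
    text = re.sub(r"https?://\S+", "", text)
    text = re.sub(r"<[^>]+>", "", text)
    # Collapse runs of repeated punctuation such as "------" or ".....".
    text = re.sub(r"([-._=~,])\1{2,}", r"\1", text)
    return text.strip()

def clean_paragraphs(paragraphs):
    """Drop empty paragraphs and exact duplicates while keeping order."""
    seen, cleaned = set(), []
    for para in map(clean_text, paragraphs):
        if para and para not in seen:
            seen.add(para)
            cleaned.append(para)
    return cleaned
```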

Paragraph Selection

Because the documents are long, and to keep the selected context as short as possible while preserving answer coverage, each document is cropped around the answer with a maximum length max_doc_len of 1024. Concretely (this method does no elaborate paragraph selection and is simplified to answer-centered cropping):

  • documents shorter than 1024 characters are kept in full;
  • if the document is longer than 1024 and the answer lies toward the left, keep the first 1024 characters;
  • if the document is longer than 1024 and the answer lies toward the right, keep the last 1024 characters;
  • otherwise, crop a 1024-character window roughly centered on the answer (the center point is randomized).

Note that when a document is long and the answer lies roughly in the middle, to avoid introducing a positional bias for the answer during truncation, the scheme randomizes the distance between the answer start index and the left boundary of the cropped window, as illustrated in the figure below and sketched in the code that follows:
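A sketch of this answer-centered cropping under stated assumptions (crop_document is a hypothetical helper, answer spans are given as character offsets, and the real repository code may differ):

```python
import random

MAX_DOC_LEN = 1024

def crop_document(doc, ans_start, ans_end, max_len=MAX_DOC_LEN):
    """Crop a long document to max_len characters while keeping the answer span.

    The window is roughly centered on the answer, with a randomized left margin
    so the answer-start index is not always at the same relative position.
    """
    if len(doc) <= max_len:
        return doc, ans_start, ans_end
    # Answer near the left edge: keep the first max_len characters.
    if ans_end <= max_len:
        return doc[:max_len], ans_start, ans_end
    # Answer near the right edge: keep the last max_len characters.
    if ans_start >= len(doc) - max_len:
        offset = len(doc) - max_len
        return doc[offset:], ans_start - offset, ans_end - offset
    # Otherwise crop around the answer; the random left margin avoids a
    # positional bias toward the exact window center.
    ans_len = ans_end - ans_start
    left_margin = random.randint(0, max_len - ans_len)
    start = ans_start - left_margin
    return doc[start:start + max_len], ans_start - start, ans_end - start
```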

Features

  • Use the jieba tokenizer to extract POS and keyword features for both questions and documents, and for each document character extract a doc_char_in_question feature indicating whether that character also appears in the question;
  • use the foolnltk toolkit to extract named entities (7 entity types in total) from questions and documents, and one-hot encode them; a small sketch follows.
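A sketch of the character-level features based on jieba (function names are illustrative; the foolnltk NER one-hot features would be built with the same per-character pattern):

```python
import jieba
import jieba.analyse
import jieba.posseg as pseg

def char_pos_tags(text):
    """Give each character the POS tag of the jieba word it belongs to."""
    tags = []
    for word, flag in pseg.cut(text):
        tags.extend([flag] * len(word))
    return tags

def char_keyword_flags(text, top_k=10):
    """Mark characters of words that are among the top-k TF-IDF keywords."""
    keywords = set(jieba.analyse.extract_tags(text, topK=top_k))
    flags = []
    for word in jieba.cut(text):
        flags.extend([int(word in keywords)] * len(word))
    return flags

def doc_char_in_question(document, question):
    """1 if the document character also appears in the question, else 0."""
    question_chars = set(question)
    return [int(ch in question_chars) for ch in document]
```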

Experiment

Teammates

Lucky Boys

License

This project is licensed under the terms of the MIT license.
