
malteos / semantic-document-relations

License: MIT License
Implementation, trained models and result data for the paper "Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles"

Programming Languages

Python
139335 projects - #7 most used programming language
Jupyter Notebook
11667 projects

Projects that are alternatives of or similar to semantic-document-relations

NLP-paper
NLP (natural language processing) tutorial: https://dataxujing.github.io/NLP-paper/
Stars: ✭ 23 (+9.52%)
Mutual labels:  transformer, bert, xlnet
sticker2
Further developed as SyntaxDot: https://github.com/tensordot/syntaxdot
Stars: ✭ 14 (-33.33%)
Mutual labels:  transformer, bert
KitanaQA
KitanaQA: Adversarial training and data augmentation for neural question-answering models
Stars: ✭ 58 (+176.19%)
Mutual labels:  transformer, bert
Kevinpro-NLP-demo
All the NLP you need here: personal implementations of some fun NLP demos, currently including PyTorch implementations of 13 NLP applications.
Stars: ✭ 117 (+457.14%)
Mutual labels:  transformer, bert
BertSimilarity
Computing the similarity of two sentences with Google's BERT algorithm: sentence similarity, semantic similarity, and text similarity computation.
Stars: ✭ 348 (+1557.14%)
Mutual labels:  similarity, bert
tensorflow-ml-nlp-tf2
Practice materials for "Natural Language Processing with TensorFlow 2 and Machine Learning (from Logistic Regression to BERT and GPT-3)".
Stars: ✭ 245 (+1066.67%)
Mutual labels:  transformer, bert
GroupDocs.Classification-for-.NET
GroupDocs.Classification-for-.NET samples and showcase (text and documents classification and sentiment analysis)
Stars: ✭ 38 (+80.95%)
Mutual labels:  document, document-classification
sister
SImple SenTence EmbeddeR
Stars: ✭ 66 (+214.29%)
Mutual labels:  transformer, bert
COVID-19-Tweet-Classification-using-Roberta-and-Bert-Simple-Transformers
Rank 1 / 216
Stars: ✭ 24 (+14.29%)
Mutual labels:  transformer, bert
PDN
The official PyTorch implementation of "Pathfinder Discovery Networks for Neural Message Passing" (WebConf '21)
Stars: ✭ 44 (+109.52%)
Mutual labels:  transformer, bert
Xpersona
XPersona: Evaluating Multilingual Personalized Chatbot
Stars: ✭ 54 (+157.14%)
Mutual labels:  transformer, bert
FasterTransformer
Transformer-related optimization, including BERT and GPT
Stars: ✭ 1,571 (+7380.95%)
Mutual labels:  transformer, bert
TabFormer
Code & Data for "Tabular Transformers for Modeling Multivariate Time Series" (ICASSP, 2021)
Stars: ✭ 209 (+895.24%)
Mutual labels:  transformer, bert
banglabert
This repository contains the official release of the model "BanglaBERT" and associated downstream finetuning code and datasets introduced in the paper titled "BanglaBERT: Language Model Pretraining and Benchmarks for Low-Resource Language Understanding Evaluation in Bangla" accepted in Findings of the Annual Conference of the North American Chap…
Stars: ✭ 186 (+785.71%)
Mutual labels:  document-classification, bert
les-military-mrc-rank7
LES Cup: Rank 7 solution for the 2nd national "Military Intelligent Machine Reading" challenge.
Stars: ✭ 37 (+76.19%)
Mutual labels:  transformer, bert
golgotha
Contextualised Embeddings and Language Modelling using BERT and Friends using R
Stars: ✭ 39 (+85.71%)
Mutual labels:  transformer, bert
Representation-Learning-for-Information-Extraction
PyTorch implementation of the Google Research paper "Representation Learning for Information Extraction from Form-like Documents".
Stars: ✭ 82 (+290.48%)
Mutual labels:  transformer, document
TwinBert
PyTorch implementation of the TwinBERT paper
Stars: ✭ 36 (+71.43%)
Mutual labels:  similarity, bert
Text-Summarization
Abstractive and Extractive Text summarization using Transformers.
Stars: ✭ 38 (+80.95%)
Mutual labels:  bert, xlnet
transformer-models
Deep Learning Transformer models in MATLAB
Stars: ✭ 90 (+328.57%)
Mutual labels:  transformer, bert

Semantic Relations between Wikipedia Articles


Implementation, trained models and result data for the paper Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles (PDF on arXiv). The supplemental material is available for download from GitHub Releases or Zenodo.

(Figure: semantic relations between Wikipedia articles)

Getting started

Requirements:

  • Python >= 3.7 (Conda)
  • Jupyter Notebook (for evaluation)
  • GPU with CUDA support (for training the Transformer models; a quick check is sketched after the install steps below)

First, we advise creating a new virtual environment for Python 3.7 with Conda:

conda create -n docrel python=3.7
conda activate docrel

Install all Python dependencies:

pip install -r requirements.txt
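
To verify the install and that PyTorch can actually see a GPU, here is a minimal check (assuming PyTorch, which requirements.txt pulls in for the Transformer models):

import torch

# True if a CUDA-capable GPU is visible to PyTorch
print(torch.cuda.is_available())

# Name of the first visible device (index 0 after CUDA_VISIBLE_DEVICES is applied)
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))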

Download dataset (and pretrained models):

# Navigate to data directory
cd data

# Wikipedia corpus
# - download
wget https://github.com/malteos/semantic-document-relations/releases/download/1.0/enwiki-20191101-pages-articles.weighted.10k.jsonl.bz2

# - decompress 
bzip2 -d enwiki-20191101-pages-articles.weighted.10k.jsonl.bz2

# Train and test data
# - download
wget https://github.com/malteos/semantic-document-relations/releases/download/1.0/train_testdata__4folds.tar.gz

# - decompress
tar -xzf train_testdata__4folds.tar.gz

# Models
# - download
wget https://github.com/malteos/semantic-document-relations/releases/download/1.0/model_wiki.bert_base__joint__seq512.tar.gz

# - decompress
tar -xzf model_wiki.bert_base__joint__seq512.tar.gz
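
The .jsonl extension suggests one JSON object per line, i.e. one article per record. A quick sanity check of the decompressed corpus that prints the fields of the first record (no particular field names are assumed here):

import json

# Peek at the first article in the corpus
with open('enwiki-20191101-pages-articles.weighted.10k.jsonl') as f:
    first = json.loads(next(f))

print(first.keys())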

Experiments

Run a predefined experiment (settings can be found in experiments/predefined/wiki):

# Config: wiki.bert_base__joint__seq512
# GPU ID: 1 (set via CUDA_VISIBLE_DEVICES=1)
# Output dir: ./output
python cli.py run ./output 1 wiki.bert_base__joint__seq512
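
The joint seq512 configuration reflects the paper's joint encoding: both article texts go into BERT as a single sequence of up to 512 tokens, and the pair is classified into one of the relation classes. Below is a minimal sketch of that setup using the HuggingFace transformers library; it is an illustration under assumptions (the repository's cli.py wraps its own training code, the checkpoint name is illustrative, and num_labels is a placeholder for the actual number of relation classes):

import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
model = BertForSequenceClassification.from_pretrained('bert-base-cased', num_labels=4)

# Joint encoding: both documents in one input sequence, truncated to 512 tokens
inputs = tokenizer('Text of article A ...', 'Text of article B ...',
                   truncation=True, max_length=512, return_tensors='pt')

with torch.no_grad():
    logits = model(**inputs).logits

print(logits.argmax(dim=-1))  # predicted relation class index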

Demo

You can run a Jupyter notebook on Google Colab:

Open In Colab

How to cite

If you are using our code, please cite our paper:

@InProceedings{Ostendorff2020,
  title = {Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles},
  booktitle = {Proceedings of the {ACM}/{IEEE} {Joint} {Conference} on {Digital} {Libraries} ({JCDL})},
  author = {Ostendorff, Malte and Ruas, Terry and Schubotz, Moritz and Gipp, Bela},
  year = {2020},
  month = {Aug.},
}

License

MIT
