malteos / Pytorch Bert Document Classification

License: MIT
Enriching BERT with Knowledge Graph Embedding for Document Classification (PyTorch)

PyTorch BERT Document Classification

Implementation and pre-trained models for the paper "Enriching BERT with Knowledge Graph Embedding for Document Classification", a submission to the GermEval 2019 shared task on hierarchical text classification. If you encounter any problems, feel free to contact us or open a GitHub issue.

Model architecture

[Figure: BERT + Knowledge Graph Embeddings]
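As a rough, dependency-free illustration of the architecture in the figure, the sketch below concatenates a document's text vector with an author embedding and metadata features, then scores every label independently with a sigmoid so that several labels can be active at once. The vector sizes, weights, single linear layer and 0.5 threshold are assumptions for illustration only. In the repository the text vector comes from BERT and the classifier is a PyTorch model; this is not its code.

```python
import math
from typing import List, Sequence, Tuple


def fuse_features(text_vec: Sequence[float],
                  author_vec: Sequence[float],
                  meta_vec: Sequence[float]) -> List[float]:
    """Concatenate [text ; author embedding ; metadata] into one feature vector."""
    return list(text_vec) + list(author_vec) + list(meta_vec)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score_labels(features: Sequence[float],
                 weights: Sequence[Sequence[float]],
                 biases: Sequence[float],
                 threshold: float = 0.5) -> Tuple[List[float], List[int]]:
    """Score each label independently (one weight row per label) and threshold.

    Hypothetical single linear layer; the real model is a PyTorch network.
    """
    if len(weights) != len(biases):
        raise ValueError("need exactly one bias per weight row")
    probs = []
    for row, bias in zip(weights, biases):
        if len(row) != len(features):
            raise ValueError("weight row length must match feature length")
        z = sum(w * x for w, x in zip(row, features)) + bias
        probs.append(sigmoid(z))
    predicted = [i for i, p in enumerate(probs) if p >= threshold]
    return probs, predicted


if __name__ == "__main__":
    # Toy inputs: 2-d text vector, 1-d author embedding, 1-d metadata feature.
    features = fuse_features([1.0, 0.0], [0.5], [2.0])
    probs, labels = score_labels(features,
                                 weights=[[1.0, 0.0, 0.0, 0.0],
                                          [0.0, 0.0, 0.0, -1.0]],
                                 biases=[0.0, 0.0])
    print(labels)  # [0]
```

Dropping the author or metadata part of the concatenation gives the simpler text-only style of model.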

Installation

Requirements:

  • Python 3.6
  • CUDA GPU
  • Jupyter Notebook

Install dependencies:

pip install -r requirements.txt
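The hard requirements above can be checked before running any experiment. This is a minimal sketch, assuming PyTorch is the package that provides GPU access (the contents of requirements.txt are not listed here); `check_environment` and its report keys are hypothetical names.

```python
import sys
from typing import Dict, Tuple, Union


def check_environment(min_python: Tuple[int, int] = (3, 6)) -> Dict[str, Union[str, bool]]:
    """Report the Python version and whether PyTorch can see a CUDA GPU."""
    report = {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        "torch_installed": False,
        "cuda_available": False,
    }
    try:
        import torch  # assumption: installed via requirements.txt
    except ImportError:
        return report
    report["torch_installed"] = True
    report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```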

Prepare data

GermEval data

Author Embeddings

python wikidata_for_authors.py run ~/datasets/wikidata/index_enwiki-20190420.db \
    ~/datasets/wikidata/index_dewiki-20190420.db \
    ~/datasets/wikidata/torchbiggraph/wikidata_translation_v1.tsv.gz \
    ~/notebooks/bert-text-classification/authors.pickle \
    ~/notebooks/bert-text-classification/author2embedding.pickle

# OPTIONAL: Projector format
python wikidata_for_authors.py convert_for_projector \
    ~/notebooks/bert-text-classification/author2embedding.pickle \
    extras/author2embedding.projector.tsv \
    extras/author2embedding.projector_meta.tsv
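Judging by its arguments, the first command resolves authors via Wikipedia/Wikidata indexes and looks up their vectors in the PyTorch-BigGraph Wikidata embeddings (wikidata_translation_v1.tsv.gz); the second writes TSV files, presumably for the TensorFlow Embedding Projector. The sketch below shows the lookup and export steps in plain Python under stated assumptions: each line of the embedding file is an entity identifier followed by tab-separated floats (other lines are skipped), the author-to-entity mapping (the hypothetical `author2entity`) is already resolved, and `author2embedding` is a dict from author name to vector. It is not the code of wikidata_for_authors.py.

```python
import gzip
import os
import pickle
import tempfile
from typing import Dict, Iterable, List, Mapping

Vector = List[float]


def load_entity_embeddings(tsv_gz_path: str, wanted: Iterable[str]) -> Dict[str, Vector]:
    """Stream a gzipped TSV and keep vectors only for the wanted entity IDs."""
    wanted = set(wanted)
    found: Dict[str, Vector] = {}
    if not wanted:
        return found
    with gzip.open(tsv_gz_path, "rt", encoding="utf-8") as f:
        for line in f:
            entity, _, rest = line.rstrip("\n").partition("\t")
            if entity in wanted and rest:
                found[entity] = [float(x) for x in rest.split("\t")]
                if len(found) == len(wanted):
                    break  # every requested entity found; skip the rest of the file
    return found


def build_author_embeddings(author2entity: Mapping[str, str],
                            entity_vectors: Mapping[str, Vector]) -> Dict[str, Vector]:
    """Map author names to vectors, dropping authors without an embedding."""
    return {author: entity_vectors[ent]
            for author, ent in author2entity.items() if ent in entity_vectors}


def save_pickle(obj, path: str) -> None:
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def write_projector_files(author2embedding: Mapping[str, Vector],
                          vectors_path: str, meta_path: str) -> int:
    """Write one vector per line plus a one-column label file (no header row)."""
    names = sorted(author2embedding)
    with open(vectors_path, "w", encoding="utf-8") as vf, \
            open(meta_path, "w", encoding="utf-8") as mf:
        for name in names:
            vf.write("\t".join(str(x) for x in author2embedding[name]) + "\n")
            # Tabs or newlines inside a label would break the TSV layout.
            mf.write(name.replace("\t", " ").replace("\n", " ") + "\n")
    return len(names)


if __name__ == "__main__":
    # Tiny synthetic file: a header-like line, then "entity<TAB>floats..." lines.
    tmp = tempfile.mkdtemp()
    emb_path = os.path.join(tmp, "toy_embeddings.tsv.gz")
    with gzip.open(emb_path, "wt", encoding="utf-8") as f:
        f.write("3\t2\nQ1\t0.5\t1.0\nQ2\t-1.0\t2.0\nQ3\t0.0\t0.0\n")
    vectors = load_entity_embeddings(emb_path, {"Q1", "Q2"})
    author2embedding = build_author_embeddings({"Author A": "Q1", "Author B": "Q9"}, vectors)
    save_pickle(author2embedding, os.path.join(tmp, "author2embedding.pickle"))
    n = write_projector_files(author2embedding,
                              os.path.join(tmp, "author2embedding.projector.tsv"),
                              os.path.join(tmp, "author2embedding.projector_meta.tsv"))
    print(n, author2embedding)  # 1 {'Author A': [0.5, 1.0]}
```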

Reproduce paper results

Download the pre-trained models from this repository's GitHub releases.

Available experiment settings

Detailed settings for each experiment can be found in cli.py.

task-a__bert-german_full
task-a__bert-german_manual_no-embedding
task-a__bert-german_no-manual_embedding
task-a__bert-german_text-only
task-a__author-only
task-a__bert-multilingual_text-only

task-b__bert-german_full
task-b__bert-german_manual_no-embedding
task-b__bert-german_no-manual_embedding
task-b__bert-german_text-only
task-b__author-only
task-b__bert-multilingual_text-only
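The setting names follow a regular pattern, `<task>__<encoder>_<variant>`, with the two `author-only` settings as the exception without a text encoder. A small parser makes the pattern explicit. Reading `manual` as the hand-crafted metadata features and `embedding` as the author knowledge-graph embeddings is our interpretation of the names, so treat cli.py as authoritative for what each setting actually enables; `parse_setting` and `VARIANT_FEATURES` are hypothetical.

```python
from typing import NamedTuple, Optional

# Our reading of the variant names (assumption); cli.py holds the real settings.
VARIANT_FEATURES = {
    "full":                {"text": True, "manual": True,  "author_embedding": True},
    "manual_no-embedding": {"text": True, "manual": True,  "author_embedding": False},
    "no-manual_embedding": {"text": True, "manual": False, "author_embedding": True},
    "text-only":           {"text": True, "manual": False, "author_embedding": False},
}


class Setting(NamedTuple):
    task: str               # "task-a" or "task-b"
    encoder: Optional[str]  # e.g. "bert-german"; None for author-only
    variant: str


def parse_setting(name: str) -> Setting:
    """Split a setting name such as 'task-a__bert-german_full' into its parts."""
    task, sep, rest = name.partition("__")
    if not sep or task not in ("task-a", "task-b"):
        raise ValueError(f"unexpected setting name: {name!r}")
    if rest == "author-only":
        return Setting(task, None, rest)  # no text encoder in this setting
    encoder, sep, variant = rest.partition("_")
    if not sep or variant not in VARIANT_FEATURES:
        raise ValueError(f"unexpected setting name: {name!r}")
    return Setting(task, encoder, variant)


# The twelve names listed above.
ALL_SETTINGS = [
    f"{task}__{suffix}"
    for task in ("task-a", "task-b")
    for suffix in ("bert-german_full", "bert-german_manual_no-embedding",
                   "bert-german_no-manual_embedding", "bert-german_text-only",
                   "author-only", "bert-multilingual_text-only")
]
```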

Environment variables

  • TRAIN_DF_PATH: Path to the training data as a pickled Pandas DataFrame
  • GPU_ID: Run experiments on this GPU (used for CUDA_VISIBLE_DEVICES)
  • OUTPUT_DIR: Directory to store experiment output
  • EXTRAS_DIR: Directory where the author embeddings and gender data are located
  • BERT_MODELS_DIR: Directory where the pre-trained BERT models are located
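A minimal sketch of reading the variables above in a driver script of your own and pinning the GPU. `load_settings` and `select_gpu` are hypothetical helpers, the sketch simply requires all five variables (some settings may not need every one), and we have not checked how cli.py itself consumes them. CUDA_VISIBLE_DEVICES only takes effect if it is set before CUDA is initialised in the process.

```python
import os
from typing import Dict, MutableMapping, Optional

# The five variables listed above.
REQUIRED_VARS = ("TRAIN_DF_PATH", "GPU_ID", "OUTPUT_DIR", "EXTRAS_DIR", "BERT_MODELS_DIR")


def load_settings(env: Optional[MutableMapping[str, str]] = None) -> Dict[str, str]:
    """Collect the required variables, failing early if any is unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


def select_gpu(gpu_id, env: Optional[MutableMapping[str, str]] = None) -> None:
    """Expose a single GPU to CUDA; call before CUDA is initialised."""
    env = os.environ if env is None else env
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)


if __name__ == "__main__":
    settings = load_settings()
    select_gpu(settings["GPU_ID"])
    print(settings)
```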

Validation set

python cli.py run_on_val <name> $GPU_ID $EXTRAS_DIR $TRAIN_DF_PATH $VAL_DF_PATH $OUTPUT_DIR --epochs 5

Test set

python cli.py run_on_test <name> $GPU_ID $EXTRAS_DIR $FULL_DF_PATH $TEST_DF_PATH $OUTPUT_DIR --epochs 5
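When reproducing several settings, the command lines can be built programmatically. The sketch mirrors the positional argument order of the two commands above: for run_on_val the data arguments are $TRAIN_DF_PATH and $VAL_DF_PATH, for run_on_test they are $FULL_DF_PATH and $TEST_DF_PATH. The helper names are hypothetical, and `run_all` simply calls the CLI once per setting, stopping at the first failure.

```python
import shlex
import subprocess
from typing import List, Sequence

MODES = ("run_on_val", "run_on_test")


def build_command(mode: str, name: str, gpu_id: str, extras_dir: str,
                  train_df: str, eval_df: str, output_dir: str,
                  epochs: int = 5) -> List[str]:
    """Argument list in the positional order used by the commands above."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    return ["python", "cli.py", mode, name, str(gpu_id), extras_dir,
            train_df, eval_df, output_dir, "--epochs", str(epochs)]


def to_shell(argv: Sequence[str]) -> str:
    """Quote each argument so the line can be pasted into a POSIX shell."""
    return " ".join(shlex.quote(arg) for arg in argv)


def run_all(names: Sequence[str], **kwargs) -> None:
    """Run several settings one after another; raises on the first failure."""
    for name in names:
        subprocess.run(build_command(name=name, **kwargs), check=True)
```

For example, `run_all(["task-a__bert-german_full", "task-a__bert-german_text-only"], mode="run_on_val", gpu_id="0", extras_dir="extras", train_df="train.pkl", eval_df="val.pkl", output_dir="out")` would run both validation experiments in sequence (the paths here are placeholders).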

Evaluation

The scores from the result table can be reproduced with the evaluation.ipynb notebook.
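To our knowledge, GermEval 2019 Task 1 ranks systems by micro-averaged F1. The sketch below computes micro-averaged precision, recall and F1 over per-document label sets; it is an illustration, not the notebook's code, and the official scorer may differ in details.

```python
from typing import Collection, Sequence, Tuple


def micro_prf(gold: Sequence[Collection[str]],
              pred: Sequence[Collection[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over multi-label predictions."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have one label set per document")
    tp = fp = fn = 0
    for gold_labels, pred_labels in zip(gold, pred):
        g, p = set(gold_labels), set(pred_labels)
        tp += len(g & p)  # correctly predicted labels
        fp += len(p - g)  # predicted but not in gold
        fn += len(g - p)  # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Two toy documents: tp = 2, fp = 1, fn = 1, so all three scores are 2/3.
    print(micro_prf([{"A", "B"}, {"C"}], [{"A"}, {"C", "D"}]))
```

Micro-averaging pools the counts over all documents and labels before dividing, so frequent labels weigh more than rare ones.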

How to cite

If you are using our code, please cite our paper:

@inproceedings{Ostendorff2019,
    address = {Erlangen, Germany},
    author = {Ostendorff, Malte and Bourgonje, Peter and Berger, Maria and Moreno-Schneider, Julian and Rehm, Georg},
    booktitle = {Proceedings of the GermEval 2019 Workshop},
    title = {{Enriching BERT with Knowledge Graph Embedding for Document Classification}},
    year = {2019}
}

License

MIT
