malteos / Pytorch Bert Document Classification
Licence: mit
Enriching BERT with Knowledge Graph Embedding for Document Classification (PyTorch)
PyTorch BERT Document Classification
Implementation and pre-trained models of the paper Enriching BERT with Knowledge Graph Embedding for Document Classification (PDF). A submission to the GermEval 2019 shared task on hierarchical text classification. If you encounter any problems, feel free to contact us or submit a GitHub issue.
Content
- CLI script to run all experiments
- WikiData author embeddings (view on Tensorboard Projector)
- Data preparation
- Requirements
- Trained model weights as release files
Model architecture
Installation
Requirements:
- Python 3.6
- CUDA GPU
- Jupyter Notebook
Install dependencies:
pip install -r requirements.txt
Prepare data
GermEval data
- Download from shared-task website: here
- Run all steps in Jupyter Notebook: germeval-data.ipynb
Author Embeddings
- Download pre-trained Wikidata embedding (30GB): Facebook PyTorch-BigGraph
- Download WikiMapper index files (de+en)
python wikidata_for_authors.py run ~/datasets/wikidata/index_enwiki-20190420.db \
~/datasets/wikidata/index_dewiki-20190420.db \
~/datasets/wikidata/torchbiggraph/wikidata_translation_v1.tsv.gz \
~/notebooks/bert-text-classification/authors.pickle \
~/notebooks/bert-text-classification/author2embedding.pickle
# OPTIONAL: Projector format
python wikidata_for_authors.py convert_for_projector \
~/notebooks/bert-text-classification/author2embedding.pickle \
extras/author2embedding.projector.tsv \
extras/author2embedding.projector_meta.tsv
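Once the script has finished, the resulting pickle can be inspected from Python. The sketch below assumes that author2embedding.pickle holds a dict mapping author names to embedding vectors. That layout is inferred from the file name only, so check wikidata_for_authors.py before relying on it. The helper names, the zero-vector fallback and the toy data are hypothetical and merely stand in for the real output.

```python
import os
import pickle
import tempfile


def load_author_embeddings(path):
    """Load a pickled mapping of author name -> embedding vector (assumed layout)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def lookup(author2embedding, author, dim):
    """Return the author's vector, or a zero vector for unknown authors.

    The zero-vector fallback is an illustrative choice, not taken from the repo.
    """
    return author2embedding.get(author, [0.0] * dim)


# Toy stand-in for author2embedding.pickle (hypothetical data, 3 dimensions).
toy = {"Johann Wolfgang von Goethe": [0.1, -0.2, 0.3]}
toy_path = os.path.join(tempfile.mkdtemp(), "author2embedding.pickle")
with open(toy_path, "wb") as f:
    pickle.dump(toy, f)

embeddings = load_author_embeddings(toy_path)
known = lookup(embeddings, "Johann Wolfgang von Goethe", dim=3)
unknown = lookup(embeddings, "Unknown Author", dim=3)
```

Only unpickle files you created yourself or obtained from a trusted source, since loading a pickle can execute arbitrary code.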
Reproduce paper results
Download pre-trained models: GitHub releases
Available experiment settings
Detailed settings for each experiment can be found in cli.py.
task-a__bert-german_full
task-a__bert-german_manual_no-embedding
task-a__bert-german_no-manual_embedding
task-a__bert-german_text-only
task-a__author-only
task-a__bert-multilingual_text-only
task-b__bert-german_full
task-b__bert-german_manual_no-embedding
task-b__bert-german_no-manual_embedding
task-b__bert-german_text-only
task-b__author-only
task-b__bert-multilingual_text-only
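The setting names above share a `<task>__<setting>` shape, which makes it easy to group them when scripting several runs. The following is a minimal sketch. Reading the part before the double underscore as the subtask is an interpretation of the names, and `split_name` and `by_task` are hypothetical helpers that do not exist in cli.py.

```python
# Experiment names copied from the list above.
EXPERIMENTS = [
    "task-a__bert-german_full",
    "task-a__bert-german_manual_no-embedding",
    "task-a__bert-german_no-manual_embedding",
    "task-a__bert-german_text-only",
    "task-a__author-only",
    "task-a__bert-multilingual_text-only",
    "task-b__bert-german_full",
    "task-b__bert-german_manual_no-embedding",
    "task-b__bert-german_no-manual_embedding",
    "task-b__bert-german_text-only",
    "task-b__author-only",
    "task-b__bert-multilingual_text-only",
]


def split_name(name):
    """Split 'task-a__bert-german_full' into ('task-a', 'bert-german_full')."""
    task, _, setting = name.partition("__")
    return task, setting


def by_task(names):
    """Group setting names by their task prefix, preserving list order."""
    grouped = {}
    for name in names:
        task, setting = split_name(name)
        grouped.setdefault(task, []).append(setting)
    return grouped


grouped = by_task(EXPERIMENTS)
for task, settings in grouped.items():
    print(task, len(settings), "settings")
```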
Environment variables
- TRAIN_DF_PATH: Path to Pandas Dataframe (pickle)
- GPU_ID: Run experiments on this GPU (used for CUDA_VISIBLE_DEVICES)
- OUTPUT_DIR: Directory to store experiment output
- EXTRAS_DIR: Directory where author embeddings and gender data are located
- BERT_MODELS_DIR: Directory where pre-trained BERT models are located
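Before starting a long run it can help to check that all of these variables are set. The snippet below is a small sketch of such a check. `read_settings` is a hypothetical helper, the example values are made up, and whether cli.py reads these variables directly or expects them as command-line arguments should be verified against the script.

```python
import os

# Variable names taken from the list above.
REQUIRED = ["TRAIN_DF_PATH", "GPU_ID", "OUTPUT_DIR", "EXTRAS_DIR", "BERT_MODELS_DIR"]


def read_settings(env):
    """Return the required variables from `env`, failing early if any is missing or empty."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED}


# Hypothetical example values; in practice pass os.environ instead.
example_env = {
    "TRAIN_DF_PATH": "data/train.pickle",
    "GPU_ID": "0",
    "OUTPUT_DIR": "output",
    "EXTRAS_DIR": "extras",
    "BERT_MODELS_DIR": "bert-models",
}
settings = read_settings(example_env)

# Restrict CUDA to the chosen device; this must happen before CUDA is initialised.
os.environ["CUDA_VISIBLE_DEVICES"] = settings["GPU_ID"]
```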
Validation set
python cli.py run_on_val <name> $GPU_ID $EXTRAS_DIR $TRAIN_DF_PATH $VAL_DF_PATH $OUTPUT_DIR --epochs 5
Test set
python cli.py run_on_test <name> $GPU_ID $EXTRAS_DIR $FULL_DF_PATH $TEST_DF_PATH $OUTPUT_DIR --epochs 5
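To run several settings in a row, the two commands above can be assembled programmatically. This is a sketch under the assumption that the positional arguments appear exactly in the order shown above. `build_command` is a hypothetical wrapper and the paths are placeholders. Note that VAL_DF_PATH, FULL_DF_PATH and TEST_DF_PATH are used in the commands but are not described in the environment variable list, so their values are up to you.

```python
import shlex


def build_command(mode, name, gpu_id, extras_dir, train_df, eval_df, output_dir, epochs=5):
    """Build the argv list for `python cli.py run_on_val|run_on_test ...`."""
    if mode not in ("run_on_val", "run_on_test"):
        raise ValueError("mode must be 'run_on_val' or 'run_on_test'")
    return [
        "python", "cli.py", mode, name, str(gpu_id),
        extras_dir, train_df, eval_df, output_dir,
        "--epochs", str(epochs),
    ]


# Placeholder paths for illustration only.
cmd = build_command(
    "run_on_val", "task-a__bert-german_full", 0,
    "extras", "train.pickle", "val.pickle", "output",
)
printable = " ".join(shlex.quote(part) for part in cmd)
print(printable)

# To actually launch the run from the repository root:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems with paths that contain spaces.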
Evaluation
The scores from the result table can be reproduced with the evaluation.ipynb notebook.
How to cite
If you are using our code, please cite our paper:
@inproceedings{Ostendorff2019,
address = {Erlangen, Germany},
author = {Ostendorff, Malte and Bourgonje, Peter and Berger, Maria and Moreno-Schneider, Julian and Rehm, Georg},
booktitle = {Proceedings of the GermEval 2019 Workshop},
title = {{Enriching BERT with Knowledge Graph Embedding for Document Classification}},
year = {2019}
}
References
- GermEval 2019 Task 1 on Codalab
- Google BERT Tensorflow
- Huggingface PyTorch Transformer
- Deepset AI - BERT-german
- Facebook PyTorch BigGraph
License
MIT