
thunlp / BabelNet-Sememe-Prediction

Licence: MIT License
Code and data of the AAAI-20 paper "Towards Building a Multilingual Sememe Knowledge Base: Predicting Sememes for BabelNet Synsets"

Requirements

  • tensorflow-gpu >= 1.13.0
  • Python 3.x

Data

This repo contains two types of data.

Annotated BabelSememe Dataset

  • BabelSememe dataset: ./BabelSememe/synset_sememes.txt

Experimental Dataset

  • Dataset covering all POS tags (noun, verb, adjective, adverb)

    ./data-all/entitiy2id.txt: All entities and their corresponding IDs, one per line.

    ./data-all/relation2id.txt: All relations and their corresponding IDs, one per line.

    ./data-all/train2id.txt: Training set. Each line is a triple (e1, e2, rel), indicating that relation rel holds between entities e1 and e2. Entity and relation IDs are those defined in entitiy2id.txt and relation2id.txt.

    ./data-all/valid2id.txt: Validation set, in the same (e1, e2, rel) format as train2id.txt.

    ./data-all/test2id.txt: Test set, in the same (e1, e2, rel) format as train2id.txt.

  • Dataset of Nouns

    The noun dataset uses the same format as the all-POS dataset above.

    ./data-noun/entitiy2id.txt

    ./data-noun/relation2id.txt

    ./data-noun/train2id.txt

    ./data-noun/valid2id.txt

    ./data-noun/test2id.txt

  • Synset embeddings from NASARI

    ./SPBS-SR/synset_vec.txt
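The ID files described above can be read with a few lines of Python. This is a minimal sketch, not code from this repository. It assumes whitespace-separated fields, that each line of entitiy2id.txt and relation2id.txt is "name id" in that order, and that any line without the expected fields (for example an OpenKE-style count header, which may or may not be present) should be skipped. The helper names parse_id_map, parse_triples, load and tails_by_head are hypothetical.

```python
from collections import defaultdict


def parse_id_map(lines):
    """Parse "name id" lines (entitiy2id.txt / relation2id.txt) into a dict.

    Assumption: the name comes first and the integer ID last. Lines that do
    not end in an integer (e.g. a possible count header) are skipped.
    """
    mapping = {}
    for line in lines:
        fields = line.strip().rsplit(maxsplit=1)
        if len(fields) == 2 and fields[1].isdigit():
            mapping[fields[0]] = int(fields[1])
    return mapping


def parse_triples(lines):
    """Parse (e1, e2, rel) ID triples, one per line, keeping that field order."""
    triples = []
    for line in lines:
        fields = line.split()
        if len(fields) == 3:  # skip blank lines or a possible count header
            e1, e2, rel = (int(x) for x in fields)
            triples.append((e1, e2, rel))
    return triples


def load(path, parser):
    """Apply one of the parsers above to a file on disk."""
    with open(path, encoding="utf-8") as f:
        return parser(f)


def tails_by_head(triples, rel_id):
    """Group the triples of one relation: head entity ID -> set of tail IDs."""
    index = defaultdict(set)
    for e1, e2, rel in triples:
        if rel == rel_id:
            index[e1].add(e2)
    return dict(index)
```

For example, load("./data-all/train2id.txt", parse_triples) returns the training triples, and tails_by_head(triples, rel_id) then lists, for each head entity, the tail entities linked by relation rel_id (such as a synset-to-sememe relation, if relation2id.txt defines one). The same calls work unchanged on ./data-noun/.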

Models

SPBS-SR

Usage

Commands for training and testing the model:

cd ./SPBS-SR/
python EvalSememePre_SPWE.py 1
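The script name EvalSememePre_SPWE.py suggests that SPBS-SR follows SPWE (Sememe Prediction with Word Embeddings, IJCAI 2017), here applied to the NASARI synset vectors. In SPWE, a candidate sememe s is scored by visiting annotated items in descending cosine similarity to the target and adding cos × c^rank for every neighbour annotated with s, where c in (0, 1) shrinks the contribution of lower-ranked neighbours. The NumPy sketch below implements that scoring rule under these assumptions; it is not the repository's implementation, and the values of c and top_k are hypothetical.

```python
import numpy as np


def spwe_scores(target_vec, train_vecs, train_sememes, num_sememes, c=0.8, top_k=100):
    """Score every sememe for one target synset, SPWE-style (illustrative sketch).

    target_vec:    (d,) embedding of the synset to annotate (e.g. a NASARI vector)
    train_vecs:    (n, d) embeddings of synsets whose sememes are known
    train_sememes: list of n iterables of sememe IDs (the annotation matrix)
    c, top_k:      hypothetical hyperparameter values, not taken from the repo
    """
    # Cosine similarity between the target and every annotated synset.
    norms = np.linalg.norm(train_vecs, axis=1) * np.linalg.norm(target_vec)
    cos = train_vecs @ target_vec / np.maximum(norms, 1e-12)

    # Visit the top_k neighbours in descending similarity. Rank starts at 0 here;
    # starting at 1 would only rescale every score by the same factor c.
    order = np.argsort(-cos)[:top_k]
    scores = np.zeros(num_sememes)
    for rank, i in enumerate(order):
        weight = cos[i] * c ** rank
        for s in train_sememes[i]:
            scores[s] += weight
    return scores
```

Sorting the result with np.argsort(-scores) gives the predicted sememe IDs, most likely first.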

SPBS-RR

Usage

Commands for training and testing the model:

cd ./SPBS-RR/src/
bash train.sh

Note: Test results are recorded in the training log.
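The output file name sememePre_TransE.txt indicates that SPBS-RR is built on TransE, which embeds entities and relations so that h + r ≈ t for a true triple (h, r, t). Sememe prediction then amounts to ranking candidate sememe entities t by the distance ||h + r - t||. The NumPy sketch below shows only this scoring step and the standard margin ranking loss for one positive and one corrupted triple, using the (e1, e2, rel) order of the data files. The function names, the norm, the margin and the way sememe entities are identified are assumptions, not details taken from train.sh.

```python
import numpy as np


def transe_distance(h, r, t, norm=1):
    """TransE energy ||h + r - t||; a lower value means a more plausible triple."""
    return np.linalg.norm(h + r - t, ord=norm, axis=-1)


def rank_sememes(entity_emb, relation_emb, synset_id, rel_id, sememe_ids, norm=1):
    """Rank candidate sememe entities for one synset under a single relation.

    entity_emb: (num_entities, d); relation_emb: (num_relations, d).
    sememe_ids: entity IDs that denote sememes (hypothetical: how sememe
    entities are marked in entitiy2id.txt is not documented here).
    """
    sememe_ids = np.asarray(sememe_ids)
    h = entity_emb[synset_id]
    r = relation_emb[rel_id]
    # Broadcast h + r against every candidate tail at once.
    dist = transe_distance(h, r, entity_emb[sememe_ids], norm)
    return sememe_ids[np.argsort(dist)]  # most plausible sememe first


def margin_loss(pos, neg, entity_emb, relation_emb, margin=1.0, norm=1):
    """Margin ranking loss for a positive and a corrupted (e1, e2, rel) triple."""
    def d(triple):
        e1, e2, rel = triple
        return transe_distance(entity_emb[e1], relation_emb[rel], entity_emb[e2], norm)
    return max(0.0, margin + d(pos) - d(neg))
```

In a full model the embeddings would be learned by minimizing this loss over the training triples; here they are simply passed in.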

Ensemble

Usage

After training both models above, copy their output files, ./SPBS-RR/sememePre_TransE.txt and ./SPBS-SR/sememePre_SPWE.txt, to the Ensemble directory, then run the ensemble model with the following command:

cd ./Ensemble/
python Ensemble.py
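This README does not describe how Ensemble.py combines the two prediction files or what format they use. As one illustration, two ranked sememe lists for the same synset can be merged with a reciprocal-rank rule, where each sememe scores the sum of w_m / (k + rank_m) over the models that rank it. The sketch below uses hypothetical function and parameter names, works on in-memory lists rather than on sememePre_TransE.txt or sememePre_SPWE.txt, and is not necessarily the method used by Ensemble.py.

```python
def reciprocal_rank_fusion(rankings, weights=None, k=0):
    """Merge several ranked sememe lists for one synset into a single ranking.

    rankings: list of lists, each ordered from most to least likely sememe.
    weights:  optional per-model weights (hypothetical knob; defaults to equal).
    k:        constant added to every 1-based rank before taking the reciprocal.
    A sememe missing from one model's list gets nothing from that model.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    scores = {}
    for ranking, w in zip(rankings, weights):
        for rank, sememe in enumerate(ranking, start=1):
            scores[sememe] = scores.get(sememe, 0.0) + w / (k + rank)
    # Highest fused score first; ties broken by sememe name for determinism.
    return sorted(scores, key=lambda s: (-scores[s], s))
```

With k = 0 and equal weights, a sememe ranked first by one model and second by the other scores 1 + 1/2 = 1.5.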

Cite

If you use any of the code or data, please cite this paper:

@article{qi2019towards,
  title={Towards Building a Multilingual Sememe Knowledge Base: Predicting Sememes for BabelNet Synsets},
  author={Qi, Fanchao and Chang, Liang and Sun, Maosong and Ouyang, Sicong and Liu, Zhiyuan},
  journal={arXiv preprint arXiv:1912.01795},
  year={2019}
}