thunlp / CLSP

Licence: MIT license
Code and data for EMNLP 2018 paper "Cross-lingual Lexical Sememe Prediction"

Programming Languages

C, Python, Shell

Cross-lingual Lexical Sememe Prediction

This repository contains the open-source code and data for the EMNLP 2018 paper "Cross-lingual Lexical Sememe Prediction".

Introduction

Sememes are defined as the minimum semantic units of human languages. As important knowledge sources, sememe-based linguistic knowledge bases have been widely used in many NLP tasks. However, most languages still do not have sememe-based linguistic knowledge bases. Thus we present a task of cross-lingual lexical sememe prediction (CLSP), aiming to automatically predict sememes for words in other languages. We propose a novel framework to model correlations between sememes and multi-lingual words in low-dimensional semantic space for sememe prediction. Experimental results on real-world datasets show that our proposed model achieves consistent and significant improvements as compared to baseline methods in cross-lingual sememe prediction.
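
As a rough illustration of what predicting sememes in a shared low-dimensional space can look like, the sketch below ranks candidate sememes by cosine similarity to a target-language word vector. It is a toy example only, not the model from the paper: the function names, the sememe labels, and the 3-dimensional vectors are all made up for illustration, and it assumes word and sememe embeddings already live in one common space.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_sememes(word_vec, sememe_vecs, top_k=2):
    """Return the top_k sememe labels whose vectors are closest to word_vec."""
    ranked = sorted(
        sememe_vecs.items(),
        key=lambda item: cosine(word_vec, item[1]),
        reverse=True,
    )
    return [label for label, _ in ranked[:top_k]]


# Hypothetical sememe embeddings (toy values, not taken from HowNet).
sememe_vecs = {
    "human": [1.0, 0.0, 0.0],
    "animal": [0.0, 1.0, 0.0],
    "tool": [0.0, 0.0, 1.0],
}

# Hypothetical embedding of a target-language word in the same space.
target_word_vec = [0.9, 0.3, 0.1]

print(predict_sememes(target_word_vec, sememe_vecs))  # ['human', 'animal']
```

In practice the interesting part is how the shared space is learned from the monolingual corpora, the seed lexicon, and the source-language sememe annotations; the ranking step above only shows how such a space could be queried once it exists.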

Usage

bash run.sh

To change the training corpus, switch the -mono-train1 and -mono-train2 parameters in run.sh. Note that lang1 refers to the source language and lang2 refers to the target language.

Datasets

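| Process  | Type                        | Source                                          | Target      |
|----------|-----------------------------|-------------------------------------------------|-------------|
| Training | Corpus                      | Sogou-T                                         | Wikipedia   |
|          | Seed Lexicon                | Google Translate API                            |             |
|          | Sememe-based KB             | HowNet_zh                                       | -           |
| Testing  | Sememe Prediction           | -                                               | HowNet_en   |
|          | Bilingual Lexicon Induction | Chinese-English Translation Lexicon 3.0 Version |             |
|          | Word Similarity Computation | WordSim-240                                     | WordSim-353 |
|          |                             | WordSim-297                                     | SimLex-999  |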

Cite

If the code or datasets help you, please cite the following paper:

@InProceedings{qi2018cross,
  Title      = {Cross-lingual lexical sememe prediction},
  Author     = {Qi, Fanchao and Lin, Yankai and Sun, Maosong and Zhu, Hao and Xie, Ruobing and Liu, Zhiyuan},
  Booktitle  = {Proceedings of EMNLP},
  Year       = {2018},
}