thunlp / SE-WRL-SAT

License: MIT License
Revised Version of SAT Model in "Improved Word Representation Learning with Sememes"

Programming Languages

C
Python
Shell
Makefile

Projects that are alternatives of or similar to SE-WRL-SAT

Sequence-Models-coursera
Sequence Models by Andrew Ng on Coursera. Programming Assignments and Quiz Solutions.
Stars: ✭ 53 (+15.22%)
Mutual labels:  word-embedding
BabelNet-Sememe-Prediction
Code and data of the AAAI-20 paper "Towards Building a Multilingual Sememe Knowledge Base: Predicting Sememes for BabelNet Synsets"
Stars: ✭ 18 (-60.87%)
Mutual labels:  sememe
SDLM-pytorch
Code accompanying the EMNLP 2018 paper "Language Modeling with Sparse Product of Sememe Experts"
Stars: ✭ 27 (-41.3%)
Mutual labels:  sememe
NMFADMM
A sparsity aware implementation of "Alternating Direction Method of Multipliers for Non-Negative Matrix Factorization with the Beta-Divergence" (ICASSP 2014).
Stars: ✭ 39 (-15.22%)
Mutual labels:  word-embedding
walklets
A lightweight implementation of Walklets from "Don't Walk, Skip! Online Learning of Multi-scale Network Embeddings" (ASONAM 2017).
Stars: ✭ 94 (+104.35%)
Mutual labels:  word-embedding
geomm
Geometry-aware Multilingual Embeddings
Stars: ✭ 23 (-50%)
Mutual labels:  word-embedding
sememe prediction
Code for "Lexical Sememe Prediction via Word Embeddings and Matrix Factorization" (IJCAI 2017).
Stars: ✭ 59 (+28.26%)
Mutual labels:  sememe
CLSP
Code and data for EMNLP 2018 paper "Cross-lingual Lexical Sememe Prediction"
Stars: ✭ 19 (-58.7%)
Mutual labels:  sememe
Character-enhanced-Sememe-Prediction
Code accompanying "Incorporating Chinese Characters of Words for Lexical Sememe Prediction" (ACL 2018): https://arxiv.org/abs/1806.06349
Stars: ✭ 22 (-52.17%)
Mutual labels:  sememe
Bert As Service
Mapping a variable-length sentence to a fixed-length vector using the BERT model
Stars: ✭ 9,779 (+21158.7%)
Mutual labels:  word-embedding
Text-Analysis
Explains textual analysis tools in Python, including preprocessing, skip-gram (word2vec), and topic modelling.
Stars: ✭ 48 (+4.35%)
Mutual labels:  word-embedding

SAT

This is the revised version of the SAT (Sememe Attention over Target) model presented in the ACL 2017 paper Improved Word Representation Learning with Sememes. For more details about the model, please read the paper or visit the original project website.
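
For intuition, here is a minimal NumPy sketch of the attention mechanism the paper describes. It is illustrative only (the repository's actual implementation is in C), and all names in it are made up for the example.

```python
# Minimal NumPy sketch of the Sememe Attention over Target (SAT) idea:
# each sense of the target word is summarized by the average of its sememe
# embeddings, the context attends over the senses, and the target word
# embedding is the attention-weighted sum of its sense embeddings.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sat_target_embedding(context_emb, sense_embs, sense_sememe_ids, sememe_embs):
    """context_emb: (d,) average of the context words' embeddings.
    sense_embs: (n_senses, d) embeddings of the target word's senses.
    sense_sememe_ids: one list of sememe ids per sense.
    sememe_embs: (n_sememes, d) sememe embedding matrix."""
    # Represent each sense by the average of its sememes' embeddings.
    sense_reprs = np.stack(
        [sememe_embs[ids].mean(axis=0) for ids in sense_sememe_ids]
    )
    att = softmax(sense_reprs @ context_emb)  # attention over the senses
    return att @ sense_embs                   # weighted target word embedding
```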

Updates

  • Datasets:
    • Remove the wrong sememes Taiwan|台湾 and Japan|日本 (wrongly listed for 中国, "China") from SememeFile and revise the corresponding lines in Word_Sense_Sememe_File
    • Remove the single-sense words from Word_Sense_Sememe_File, since they are not used in the training process
  • Input:
    • Learn the vocabulary from the training file instead of reading an existing vocabulary file
  • Output:
    • Output the vocabulary file learned from the training file
    • Output the word, sense, and sememe embeddings in three separate files (see the loading sketch after this list)
  • Code:
    • Rewrite most parts of the original code
    • Remove redundant code and rename some variables to improve readability
    • Add evaluation programs for word similarity and word analogy
    • Add more comments
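
The three embedding files can then be read back with a few lines of Python. The sketch below assumes the output follows the plain word2vec text format (a "<vocab_size> <dim>" header, then one token-plus-floats line per entry); the file name is hypothetical, so check what run_SAT.sh actually writes.

```python
# Sketch: load one of the output embedding files, assuming word2vec text format.
import numpy as np

def load_embeddings(path):
    with open(path, encoding="utf-8") as f:
        n, dim = map(int, f.readline().split())
        vocab = []
        vecs = np.empty((n, dim), dtype=np.float32)
        for i in range(n):
            parts = f.readline().rstrip("\n").split(" ")
            vocab.append(parts[0])
            vecs[i] = np.asarray(parts[1:dim + 1], dtype=np.float32)
    return vocab, vecs

# words, word_vecs = load_embeddings("word_embeddings.txt")  # hypothetical name
```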

How to Run

```bash
bash run_SAT.sh
```

To use a different training file, simply replace data/train_sample.txt in run_SAT.sh with the path to your own training file.
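
If you prefer scripting the swap, a small Python sketch such as the following rewrites the path inside run_SAT.sh and launches training; data/my_corpus.txt is a placeholder for your own file, not something shipped with the repository.

```python
# Sketch: point run_SAT.sh at your own corpus, then launch training.
import pathlib
import subprocess

script = pathlib.Path("run_SAT.sh")
script.write_text(
    script.read_text().replace("data/train_sample.txt", "data/my_corpus.txt")
)
subprocess.run(["bash", "run_SAT.sh"], check=True)
```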

New Results

These results were obtained with the 21 GB Sogou-T corpus as the training file, which can be downloaded from here (password: f2ul). The hyper-parameters for all models are the same as those in run_SAT.sh. You can download the trained word embeddings from here.

Word Similarity

| Model     | Wordsim-240 | Wordsim-297 |
| --------- | ----------- | ----------- |
| CBOW      | 56.05       | 62.58       |
| Skip-gram | 56.72       | 61.99       |
| GloVe     | 55.83       | 58.44       |
| SAT       | 62.11       | 62.74       |
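
A word-similarity score like those above is typically the Spearman correlation (scaled by 100) between human ratings and the embeddings' cosine similarities. Below is a sketch of such an evaluation; the dataset file name and the "word1 word2 score" line format are assumptions, not the repository's exact evaluation program.

```python
# Sketch: Spearman correlation between human ratings and cosine similarities.
import numpy as np
from scipy.stats import spearmanr

def wordsim_eval(pairs_path, vocab, vecs):
    index = {w: i for i, w in enumerate(vocab)}
    human, model = [], []
    with open(pairs_path, encoding="utf-8") as f:
        for line in f:
            w1, w2, score = line.split()
            if w1 in index and w2 in index:  # skip out-of-vocabulary pairs
                u, v = vecs[index[w1]], vecs[index[w2]]
                model.append(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
                human.append(float(score))
    return 100 * spearmanr(human, model).correlation

# print(wordsim_eval("wordsim-240.txt", words, word_vecs))  # hypothetical file
```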

Word Analogy

| Model     | city-acc | city-rank | family-acc | family-rank | capital-acc | capital-rank | total-acc | total-rank |
| --------- | -------- | --------- | ---------- | ----------- | ----------- | ------------ | --------- | ---------- |
| Skip-gram | 84.14    | 1.50      | 86.67      | 1.21        | 61.30       | 8.31         | 70.70     | 5.66       |
| SAT       | 98.85    | 1.01      | 77.20      | 5.27        | 80.06       | 10.10        | 82.29     | 7.52       |
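
The acc/rank columns can be read as follows: for each analogy question "a : b :: c : ?", the prediction is vec(b) - vec(a) + vec(c); acc is the percentage of questions where the correct word ranks first, and rank is the correct word's mean rank (lower is better). Here is a sketch of that evaluation, with the question format assumed rather than taken from the repository.

```python
# Sketch of the word-analogy evaluation: rank all words by cosine similarity
# to vec(b) - vec(a) + vec(c), excluding the question words, then report
# top-1 accuracy and the mean rank of the correct answer.
import numpy as np

def analogy_eval(questions, vocab, vecs):
    index = {w: i for i, w in enumerate(vocab)}
    unit = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    hits, ranks = 0, []
    for a, b, c, answer in questions:  # e.g. ("北京", "中国", "东京", "日本")
        if any(w not in index for w in (a, b, c, answer)):
            continue
        query = unit[index[b]] - unit[index[a]] + unit[index[c]]
        sims = unit @ query
        for w in (a, b, c):
            sims[index[w]] = -np.inf   # never predict a question word
        rank = np.argsort(-sims).tolist().index(index[answer]) + 1
        hits += rank == 1
        ranks.append(rank)
    return 100 * hits / len(ranks), float(np.mean(ranks))
```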