
cnclabs / Ice

ICE: Item Concept Embedding

Projects that are alternatives of or similar to Ice

Simclr
PyTorch implementation of SimCLR: A Simple Framework for Contrastive Learning of Visual Representations
Stars: ✭ 750 (+803.61%)
Mutual labels:  representation-learning
Compress
Compressing Representations for Self-Supervised Learning
Stars: ✭ 43 (-48.19%)
Mutual labels:  representation-learning
Evalne
Source code for EvalNE, a Python library for evaluating Network Embedding methods.
Stars: ✭ 67 (-19.28%)
Mutual labels:  representation-learning
Transferlearning
Transfer learning / domain adaptation / domain generalization / multi-task learning, etc. Papers, code, datasets, applications, tutorials (transfer learning).
Stars: ✭ 8,481 (+10118.07%)
Mutual labels:  representation-learning
Gradfeat20
Gradients as Features for Deep Representation Learning
Stars: ✭ 30 (-63.86%)
Mutual labels:  representation-learning
Usss iccv19
Code for Universal Semi-Supervised Semantic Segmentation models paper accepted in ICCV 2019
Stars: ✭ 57 (-31.33%)
Mutual labels:  representation-learning
Deberta
The implementation of DeBERTa
Stars: ✭ 541 (+551.81%)
Mutual labels:  representation-learning
Asne
A sparsity aware and memory efficient implementation of "Attributed Social Network Embedding" (TKDE 2018).
Stars: ✭ 73 (-12.05%)
Mutual labels:  representation-learning
Word2vec Win32
A word2vec port for Windows.
Stars: ✭ 41 (-50.6%)
Mutual labels:  representation-learning
Dogtorch
Who Let The Dogs Out? Modeling Dog Behavior From Visual Data https://arxiv.org/pdf/1803.10827.pdf
Stars: ✭ 66 (-20.48%)
Mutual labels:  representation-learning
Knowledge Graphs
A collection of research on knowledge graphs
Stars: ✭ 845 (+918.07%)
Mutual labels:  representation-learning
Meta Emb
Multilingual Meta-Embeddings for Named Entity Recognition (RepL4NLP & EMNLP 2019)
Stars: ✭ 28 (-66.27%)
Mutual labels:  representation-learning
Codeslam
Implementation of CodeSLAM — Learning a Compact, Optimisable Representation for Dense Visual SLAM paper (https://arxiv.org/pdf/1804.00874.pdf)
Stars: ✭ 64 (-22.89%)
Mutual labels:  representation-learning
Bagofconcepts
Python implementation of bag-of-concepts
Stars: ✭ 18 (-78.31%)
Mutual labels:  representation-learning
Graph 2d cnn
Code and data for the paper 'Classifying Graphs as Images with Convolutional Neural Networks' (new title: 'Graph Classification with 2D Convolutional Neural Networks')
Stars: ✭ 67 (-19.28%)
Mutual labels:  representation-learning
Unsupervised Classification
SCAN: Learning to Classify Images without Labels (ECCV 2020), incl. SimCLR.
Stars: ✭ 605 (+628.92%)
Mutual labels:  representation-learning
Gat2vec
Representation learning on attributed graphs
Stars: ✭ 48 (-42.17%)
Mutual labels:  representation-learning
Mklpy
A package for Multiple Kernel Learning in Python
Stars: ✭ 81 (-2.41%)
Mutual labels:  representation-learning
Self Supervised Learning Overview
📜 Self-Supervised Learning from Images: Up-to-date reading list.
Stars: ✭ 73 (-12.05%)
Mutual labels:  representation-learning
Dna Gan
DNA-GAN: Learning Disentangled Representations from Multi-Attribute Images
Stars: ✭ 65 (-21.69%)
Mutual labels:  representation-learning

ICE: Item Concept Embedding via Textual Information

1. Introduction

The ICE toolkit embeds the concepts of items into a shared embedding space so that the resulting embeddings can be compared by overall conceptual similarity, regardless of item type (ICE: Item Concept Embedding via Textual Information, SIGIR 2017). For example, a song can be used to retrieve conceptually similar songs (homogeneous retrieval) as well as conceptually similar concepts (heterogeneous retrieval).

Specifically, ICE connects items and their representative concepts (words extracted from each item's textual information) in a heterogeneous network, and then learns embeddings for both items and concepts through the concept words they share. Because items are defined in terms of concepts, adding expanded concepts to the network lets the learned embeddings retrieve results that are conceptually more diverse yet still relevant.
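The README does not state which similarity measure is used at retrieval time. Cosine similarity is a common choice for comparing embeddings and is assumed in this minimal sketch of homogeneous versus heterogeneous retrieval. The vectors are toy values, not ICE output, and cosine and retrieve are hypothetical helpers.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(query, embeddings, candidates, k=3):
    """Return the k candidates most similar to the query, best first."""
    q = embeddings[query]
    scored = [(name, cosine(q, embeddings[name]))
              for name in candidates if name != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-d vectors made up for illustration; these are NOT ICE outputs.
emb = {
    "song_a": [1.0, 0.1],
    "song_b": [0.9, 0.2],
    "song_c": [0.0, 1.0],
    "rock": [0.8, 0.3],
    "jazz": [0.1, 0.9],
}
songs = ["song_a", "song_b", "song_c"]
concepts = ["rock", "jazz"]

print(retrieve("song_a", emb, songs, k=2))     # homogeneous: song -> songs
print(retrieve("song_a", emb, concepts, k=1))  # heterogeneous: song -> concepts
```

Because items and concepts live in the same space, the same function serves both retrieval modes; only the candidate list changes.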

1.1. System Requirements

  • gcc 6.4
  • python3
  • cython

1.2. Getting Started

Download:

$ git clone https://github.com/cnclabs/ICE
$ cd ./ICE/ICE

Compile command line interface:

$ make ice

Compile python3 API:

This is an alternative way to use the toolkit, through its Python API. For usage, see Section 2.2.2.

$ make python

[Note: The API is only tested with Python 3.]

2. Usage

2.1. ICE Network Construction

Users need to provide an entity-text network and a text-text network to construct an ICE network. For more details, please refer to our paper.

Entity-text network format: "item word weight"

Toy_Story toys 1
Toy_Story stuffed_animals 1
Star_Wars jedi 1
Star_Wars rebel 1

Text-text network format: "word word weight"

toys toys 1
toys stuffed_animals 1
stuffed_animals toys 1
stuffed_animals stuffed_animals 1
jedi jedi 1
rebel rebel 1

Run:

$ python3 construct_graph.py -et ../data/movie_et.edge -tt ../data/movie_tt.edge -ice movie_ice.edge

Parameters:
    -et <string>, --et_network <string>
        Input Entity-text Network
    -tt <string>, --tt_network <string>
        Input Text-text Network
    -ice <string>, --ice_network <string>
        Output ICE Network

For sample files, please see data/movie_et.edge and data/movie_tt.edge.
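The README leaves the construction details to the paper. One plausible reading of the sample files, offered only as an assumption, is that each item-word edge is expanded through the text-text network, with self-loops such as "toys toys 1" keeping the original words. The minimal sketch below follows that reading. read_edges and expand are hypothetical helpers, and the product-and-sum weighting is our assumption, not necessarily what construct_graph.py does.

```python
from collections import defaultdict


def read_edges(lines):
    """Parse whitespace-separated 'source target weight' lines into triples."""
    edges = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        src, dst, weight = parts
        edges.append((src, dst, float(weight)))
    return edges


def expand(et_edges, tt_edges):
    """Hypothetical expansion: item -> word -> related word.

    ASSUMPTION: an expanded edge's weight is the product of the two edge
    weights, and parallel paths to the same word are summed. A word with no
    text-text entry (not even a self-loop) is dropped.
    """
    related = defaultdict(list)
    for word, other, w in tt_edges:
        related[word].append((other, w))
    merged = defaultdict(float)
    for item, word, w in et_edges:
        for other, w2 in related.get(word, []):
            merged[(item, other)] += w * w2
    return [(item, word, w) for (item, word), w in sorted(merged.items())]


et = read_edges(["Toy_Story toys 1", "Toy_Story stuffed_animals 1",
                 "Star_Wars jedi 1", "Star_Wars rebel 1"])
tt = read_edges(["toys toys 1", "toys stuffed_animals 1",
                 "stuffed_animals toys 1", "stuffed_animals stuffed_animals 1",
                 "jedi jedi 1", "rebel rebel 1"])

for item, word, w in expand(et, tt):
    print(item, word, w)  # same "item word weight" layout as the input
```

With the toy files, Toy_Story reaches toys both directly and through stuffed_animals, so under this assumed weighting its edges accumulate a weight of 2.0.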

2.2. ICE Embedding Learning

2.2.1 Command line interface usage

Run:

$ ./ice -train movie_ice.edge -save movie.embd -dim 4 -sample 10 -neg 5 -thread 1 -alpha 0.025

Parameters:
    -train <string>
        Path to the network used for embedding learning
    -save <string>
        Path to save the embedding file
    -dim <int>
        Dimension of embedding; default is 64
    -neg <int>
        Number of negative examples; default is 5
    -sample <int>
        Number of training samples, in millions; default is 10
    -thread <int>
        Number of training threads; default is 1
    -alpha <float>
        Initial learning rate; default is 0.025
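The layout of the file written by -save is not described here. Assuming a word2vec-style text format (one node per line, the node name followed by -dim floating-point values, possibly preceded by a "count dim" header line), a hypothetical loader might look like this. Adjust the parsing if the actual file differs.

```python
import os
import tempfile


def load_embeddings(path):
    """Load an assumed text embedding file into {name: [float, ...]}."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank lines
            # ASSUMPTION: a leading "count dim" line of two integers is a header.
            if not embeddings and len(parts) == 2 and all(p.isdigit() for p in parts):
                continue
            embeddings[parts[0]] = [float(v) for v in parts[1:]]
    return embeddings


# Usage with a made-up 4-dimensional file (matching -dim 4 in the command above).
sample = "2 4\nToy_Story 0.1 0.2 0.3 0.4\ntoys 0.1 0.2 0.3 0.5\n"
with tempfile.NamedTemporaryFile("w", suffix=".embd", delete=False, encoding="utf-8") as tmp:
    tmp.write(sample)
emb = load_embeddings(tmp.name)
os.remove(tmp.name)
print(sorted(emb))  # ['Toy_Story', 'toys']
```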

2.2.2 python3 API usage

After compiling, the following code can be run with python3 example.py.

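# Import the Python wrapper built by "make python"
from pyICE import pyICE

ice = pyICE()

# Network as a nested dict: item -> {concept word: weight}
network = {
    'MAYDAY': {'Taiwanese': 1, 'rock': 1, 'band': 1},
    '[email protected]': {'Taiwanese': 1, 'rock': 1, 'band': 1},
    'Sodagreen': {'Taiwanese': 1, 'indie': 1, 'pop_rock': 1, 'band': 1},
    'SEKAI_NO_OWARI': {'Japanese': 1, 'indie': 1, 'pop_rock': 1, 'band': 1},
    'The_Beatles': {'England': 1, 'rock': 1, 'pop': 1}
}
ice.load_dict(network)

# Initialize 4-dimensional embeddings
ice.init(dimension=4)

# Train with 5 negative examples, initial learning rate 0.025, and 1 worker
ice.train(sample=11, neg=5, alpha=0.025, workers=1)

# Write the learned embeddings to disk
ice.save_weights(model_name='example.embd')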
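For larger inputs, the nested dict passed to load_dict could be built from an "item word weight" edge file like the ones in Section 2.1. This is a minimal sketch: edges_to_network is a hypothetical helper, summing duplicate pairs is our choice, and whether load_dict accepts float weights (the example above uses integers) is an assumption.

```python
from collections import defaultdict


def edges_to_network(lines):
    """Build {item: {word: weight}} from 'item word weight' lines."""
    network = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        item, word, weight = parts
        # Sum the weights if the same (item, word) pair appears more than once.
        network[item][word] = network[item].get(word, 0.0) + float(weight)
    return dict(network)


lines = ["MAYDAY Taiwanese 1", "MAYDAY rock 1", "The_Beatles England 1"]
network = edges_to_network(lines)
print(network)  # {'MAYDAY': {'Taiwanese': 1.0, 'rock': 1.0}, 'The_Beatles': {'England': 1.0}}
# With the compiled API available, this dict could then go to ice.load_dict(network).
```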

3. Experimental Results

Here, we report the average performance based on 10 embeddings trained under the same setting. For more details, please refer to our paper.

  • IMDB word-to-movie retrieval task:
    • Graph construction: 20 representative words per item and 5 expanded words per representative word.
    • Embedding learning: dim=256, sample=200, neg=2
Genre             Horror  Thriller  Western  Action  Short  Sci-Fi  Average
[email protected]  0.322   0.206     0.318    0.449   0.100  0.386   0.297
[email protected]  0.316   0.203     0.281    0.423   0.080  0.382   0.281
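The Average column appears to be the unweighted (macro) mean of the six genre scores. This short check, which works only from the numbers in the table (the metric names in the row labels did not survive extraction, so the rows are called "first row" and "second row"), reproduces both reported averages to three decimals.

```python
# Scores copied from the table, in column order:
# Horror, Thriller, Western, Action, Short, Sci-Fi
rows = {
    "first row": [0.322, 0.206, 0.318, 0.449, 0.100, 0.386],
    "second row": [0.316, 0.203, 0.281, 0.423, 0.080, 0.382],
}
reported = {"first row": 0.297, "second row": 0.281}

for label, scores in rows.items():
    mean = sum(scores) / len(scores)  # unweighted mean over the six genres
    print(f"{label}: mean={mean:.3f}, reported={reported[label]:.3f}")
```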

4. Citation

@inproceedings{Wang:2017:IIC:3077136.3080807,
    author = {Wang, Chuan-Ju and Wang, Ting-Hsiang and Yang, Hsiu-Wei and Chang, Bo-Sin and Tsai, Ming-Feng},
    title = {ICE: Item Concept Embedding via Textual Information},
    booktitle = {Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval},
    series = {SIGIR '17},
    year = {2017},
    isbn = {978-1-4503-5022-8},
    location = {Shinjuku, Tokyo, Japan},
    pages = {85--94},
    numpages = {10},
    url = {http://doi.acm.org/10.1145/3077136.3080807},
    doi = {10.1145/3077136.3080807},
    acmid = {3080807},
    publisher = {ACM},
    address = {New York, NY, USA},
    keywords = {concept embedding, conceptual retrieval, information network, textual information},
} 