All Projects → lujiaying → Movietaster Open

lujiaying / Movietaster Open

Licence: other
A practical movie recommend project based on Item2vec.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Movietaster Open

Debiaswe
Remove problematic gender bias from word embeddings.
Stars: ✭ 175 (-30.83%)
Mutual labels:  word2vec
Chameleon recsys
Source code of CHAMELEON - A Deep Learning Meta-Architecture for News Recommender Systems
Stars: ✭ 202 (-20.16%)
Mutual labels:  word2vec
Cw2vec
cw2vec: Learning Chinese Word Embeddings with Stroke n-gram Information
Stars: ✭ 224 (-11.46%)
Mutual labels:  word2vec
Tensorflow Tutorials
텐서플로우를 기초부터 응용까지 단계별로 연습할 수 있는 소스 코드를 제공합니다
Stars: ✭ 2,096 (+728.46%)
Mutual labels:  word2vec
Germanwordembeddings
Toolkit to obtain and preprocess german corpora, train models using word2vec (gensim) and evaluate them with generated testsets
Stars: ✭ 189 (-25.3%)
Mutual labels:  word2vec
Sensegram
Making sense embedding out of word embeddings using graph-based word sense induction
Stars: ✭ 209 (-17.39%)
Mutual labels:  word2vec
Wordvectors
Pre-trained word vectors of 30+ languages
Stars: ✭ 2,043 (+707.51%)
Mutual labels:  word2vec
Aravec
AraVec is a pre-trained distributed word representation (word embedding) open source project which aims to provide the Arabic NLP research community with free to use and powerful word embedding models.
Stars: ✭ 239 (-5.53%)
Mutual labels:  word2vec
Shallowlearn
An experiment about re-implementing supervised learning models based on shallow neural network approaches (e.g. fastText) with some additional exclusive features and nice API. Written in Python and fully compatible with Scikit-learn.
Stars: ✭ 196 (-22.53%)
Mutual labels:  word2vec
Practical 1
Oxford Deep NLP 2017 course - Practical 1: word2vec
Stars: ✭ 220 (-13.04%)
Mutual labels:  word2vec
Sentence Similarity
对四种句子/文本相似度计算方法进行实验与比较
Stars: ✭ 181 (-28.46%)
Mutual labels:  word2vec
Nlp learning
结合python一起学习自然语言处理 (nlp): 语言模型、HMM、PCFG、Word2vec、完形填空式阅读理解任务、朴素贝叶斯分类器、TFIDF、PCA、SVD
Stars: ✭ 188 (-25.69%)
Mutual labels:  word2vec
Gemsec
The TensorFlow reference implementation of 'GEMSEC: Graph Embedding with Self Clustering' (ASONAM 2019).
Stars: ✭ 210 (-17%)
Mutual labels:  word2vec
Splitter
A Pytorch implementation of "Splitter: Learning Node Representations that Capture Multiple Social Contexts" (WWW 2019).
Stars: ✭ 177 (-30.04%)
Mutual labels:  word2vec
Koan
A word2vec negative sampling implementation with correct CBOW update.
Stars: ✭ 232 (-8.3%)
Mutual labels:  word2vec
Deep Math Machine Learning.ai
A blog which talks about machine learning, deep learning algorithms and the Math. and Machine learning algorithms written from scratch.
Stars: ✭ 173 (-31.62%)
Mutual labels:  word2vec
Word2vec
Python interface to Google word2vec
Stars: ✭ 2,370 (+836.76%)
Mutual labels:  word2vec
Alink
Alink is the Machine Learning algorithm platform based on Flink, developed by the PAI team of Alibaba computing platform.
Stars: ✭ 2,936 (+1060.47%)
Mutual labels:  word2vec
Book deeplearning in pytorch source
Stars: ✭ 236 (-6.72%)
Mutual labels:  word2vec
Stocksensation
基于情感字典和机器学习的股市舆情情感分类可视化Web
Stars: ✭ 215 (-15.02%)
Mutual labels:  word2vec

MovieTaster-Open

A movie recommend project based on Item2vec.

Reference:

  • Barkan, Oren, and Noam Koenigstein. "Item2vec: neural item embedding for collaborative filtering." Machine Learning for Signal Processing (MLSP), 2016 IEEE 26th International Workshop on. IEEE, 2016.
  • JayveeHe, https://github.com/JayveeHe/MusicTaster. Github.

Demo>

More details for this project, plese refer to blog> or zhihu>

Project Struct

  • datas: to store corpus
  • models: to store item vectors for movies
  • utils: a collection of tool functions

Usage

  1. Compile fasttext under root of the project

  2. Process corpus, the data file would be generated under ./datas/.

$ cd datas && tar -xzvf corpus.tar.gz
$ cd ..
$ python utils/process.py
  1. Train model by running fasttext. Please refer to ./run.sh for other configuration.
$ ./fasttext skipgram -input ./datas/doulist_0804_09.movie_id -output ./models/fasttext_model_0804_09_skipgram -minCount 5 -epoch 50 -neg 100
  1. Here's a result from the default configuration.
$ python utils/eval.py
movie_name_id_dict is 130206 from [./datas/doulist_0804_09.json]
movie_name_id_dict is 130206 from [./datas/doulist_0804_09.json]
movie_id_name_dict is 130206 from [./datas/doulist_0804_09.json]
[2017-08-18 07:52:53,579][pid:22831] eval.topk_like: INFO: [1210]小时代 top 5 likes:
[2017-08-18 07:52:54,149][pid:22831] eval.topk_like: INFO: [2396]小时代2:青木时代 0.820323
[2017-08-18 07:52:54,149][pid:22831] eval.topk_like: INFO: [3387]分手合约 0.606063
[2017-08-18 07:52:54,150][pid:22831] eval.topk_like: INFO: [4087]不二神探 0.604207
[2017-08-18 07:52:54,150][pid:22831] eval.topk_like: INFO: [3839]天台爱情 0.601879
[2017-08-18 07:52:54,150][pid:22831] eval.topk_like: INFO: [821]致我们终将逝去的青春 0.600836
[2017-08-18 07:52:54,150][pid:22831] eval.topk_like: INFO: [144]倩女幽魂 top 5 likes:
[2017-08-18 07:52:54,690][pid:22831] eval.topk_like: INFO: [2719]倩女幽魂3:道道道 0.685621
[2017-08-18 07:52:54,690][pid:22831] eval.topk_like: INFO: [1812]倩女幽魂2:人间道 0.681594
[2017-08-18 07:52:54,691][pid:22831] eval.topk_like: INFO: [466]胭脂扣 0.678032
[2017-08-18 07:52:54,691][pid:22831] eval.topk_like: INFO: [261]青蛇 0.671541
[2017-08-18 07:52:54,692][pid:22831] eval.topk_like: INFO: [156]东邪西毒 0.664057
[2017-08-18 07:52:54,692][pid:22831] eval.topk_like: INFO: [2602]悟空传 top 5 likes:
[2017-08-18 07:52:55,253][pid:22831] eval.topk_like: INFO: [5648]闪光少女 0.671337
[2017-08-18 07:52:55,253][pid:22831] eval.topk_like: INFO: [2189]绣春刀II:修罗战场 0.646861
[2017-08-18 07:52:55,254][pid:22831] eval.topk_like: INFO: [8297]逆时营救 0.634753
[2017-08-18 07:52:55,254][pid:22831] eval.topk_like: INFO: [12571]京城81号Ⅱ 0.625549
[2017-08-18 07:52:55,254][pid:22831] eval.topk_like: INFO: [10545]父子雄兵 0.623032
[2017-08-18 07:52:55,254][pid:22831] eval.topk_like: INFO: [56]美国往事 top 5 likes:
[2017-08-18 07:52:55,789][pid:22831] eval.topk_like: INFO: [18]天堂电影院 0.756449
[2017-08-18 07:52:55,790][pid:22831] eval.topk_like: INFO: [14]辛德勒的名单 0.737502
[2017-08-18 07:52:55,790][pid:22831] eval.topk_like: INFO: [17]教父 0.735216
[2017-08-18 07:52:55,790][pid:22831] eval.topk_like: INFO: [69]闻香识女人 0.734119
[2017-08-18 07:52:55,790][pid:22831] eval.topk_like: INFO: [42]西西里的美丽传说 0.732334
[2017-08-18 07:52:55,791][pid:22831] eval.topk_like: INFO: [2644]战狼2 top 5 likes:
[2017-08-18 07:52:56,346][pid:22831] eval.topk_like: INFO: [1456]大护法 0.612700
[2017-08-18 07:52:56,347][pid:22831] eval.topk_like: INFO: [2107]战狼 0.580637
[2017-08-18 07:52:56,347][pid:22831] eval.topk_like: INFO: [2602]悟空传 0.580365
[2017-08-18 07:52:56,347][pid:22831] eval.topk_like: INFO: [9941]建军大业 0.575422
[2017-08-18 07:52:56,347][pid:22831] eval.topk_like: INFO: [19040]阿唐奇遇 0.573613

MovieTaster-Open

使用Item2Vec做电影推荐

参考:

  • Barkan, Oren, and Noam Koenigstein. "Item2vec: neural item embedding for collaborative filtering." Machine Learning for Signal Processing (MLSP), 2016 IEEE 26th International Workshop on. IEEE, 2016.
  • JayveeHe, https://github.com/JayveeHe/MusicTaster. Github.

Demo>

原理详情请参考>

目录结构

  • datas: 存放语料
  • models: 存放电影向量
  • utils: 工具类,包括生成推荐电影列表等

使用说明

  1. 在根目录下载并编译fasttext

  2. 处理语料。结束后将在./datas/下生成fasttext所需要的格式

$ cd datas && tar -xzvf corpus.tar.gz
$ cd ..
$ python utils/process.py
  1. 参考./run.sh中生成词向量的参数配置,任选一个。
$ ./fasttext skipgram -input ./datas/doulist_0804_09.movie_id -output ./models/fasttext_model_0804_09_skipgram -minCount 5 -epoch 50 -neg 100
  1. 查看模型效果
$ python utils/eval.py
movie_name_id_dict is 130206 from [./datas/doulist_0804_09.json]
movie_name_id_dict is 130206 from [./datas/doulist_0804_09.json]
movie_id_name_dict is 130206 from [./datas/doulist_0804_09.json]
[2017-08-18 07:52:53,579][pid:22831] eval.topk_like: INFO: [1210]小时代 top 5 likes:
[2017-08-18 07:52:54,149][pid:22831] eval.topk_like: INFO: [2396]小时代2:青木时代 0.820323
[2017-08-18 07:52:54,149][pid:22831] eval.topk_like: INFO: [3387]分手合约 0.606063
[2017-08-18 07:52:54,150][pid:22831] eval.topk_like: INFO: [4087]不二神探 0.604207
[2017-08-18 07:52:54,150][pid:22831] eval.topk_like: INFO: [3839]天台爱情 0.601879
[2017-08-18 07:52:54,150][pid:22831] eval.topk_like: INFO: [821]致我们终将逝去的青春 0.600836
[2017-08-18 07:52:54,150][pid:22831] eval.topk_like: INFO: [144]倩女幽魂 top 5 likes:
[2017-08-18 07:52:54,690][pid:22831] eval.topk_like: INFO: [2719]倩女幽魂3:道道道 0.685621
[2017-08-18 07:52:54,690][pid:22831] eval.topk_like: INFO: [1812]倩女幽魂2:人间道 0.681594
[2017-08-18 07:52:54,691][pid:22831] eval.topk_like: INFO: [466]胭脂扣 0.678032
[2017-08-18 07:52:54,691][pid:22831] eval.topk_like: INFO: [261]青蛇 0.671541
[2017-08-18 07:52:54,692][pid:22831] eval.topk_like: INFO: [156]东邪西毒 0.664057
[2017-08-18 07:52:54,692][pid:22831] eval.topk_like: INFO: [2602]悟空传 top 5 likes:
[2017-08-18 07:52:55,253][pid:22831] eval.topk_like: INFO: [5648]闪光少女 0.671337
[2017-08-18 07:52:55,253][pid:22831] eval.topk_like: INFO: [2189]绣春刀II:修罗战场 0.646861
[2017-08-18 07:52:55,254][pid:22831] eval.topk_like: INFO: [8297]逆时营救 0.634753
[2017-08-18 07:52:55,254][pid:22831] eval.topk_like: INFO: [12571]京城81号Ⅱ 0.625549
[2017-08-18 07:52:55,254][pid:22831] eval.topk_like: INFO: [10545]父子雄兵 0.623032
[2017-08-18 07:52:55,254][pid:22831] eval.topk_like: INFO: [56]美国往事 top 5 likes:
[2017-08-18 07:52:55,789][pid:22831] eval.topk_like: INFO: [18]天堂电影院 0.756449
[2017-08-18 07:52:55,790][pid:22831] eval.topk_like: INFO: [14]辛德勒的名单 0.737502
[2017-08-18 07:52:55,790][pid:22831] eval.topk_like: INFO: [17]教父 0.735216
[2017-08-18 07:52:55,790][pid:22831] eval.topk_like: INFO: [69]闻香识女人 0.734119
[2017-08-18 07:52:55,790][pid:22831] eval.topk_like: INFO: [42]西西里的美丽传说 0.732334
[2017-08-18 07:52:55,791][pid:22831] eval.topk_like: INFO: [2644]战狼2 top 5 likes:
[2017-08-18 07:52:56,346][pid:22831] eval.topk_like: INFO: [1456]大护法 0.612700
[2017-08-18 07:52:56,347][pid:22831] eval.topk_like: INFO: [2107]战狼 0.580637
[2017-08-18 07:52:56,347][pid:22831] eval.topk_like: INFO: [2602]悟空传 0.580365
[2017-08-18 07:52:56,347][pid:22831] eval.topk_like: INFO: [9941]建军大业 0.575422
[2017-08-18 07:52:56,347][pid:22831] eval.topk_like: INFO: [19040]阿唐奇遇 0.573613
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].