
Galaxies99 / oh-my-papers

License: MIT


Projects that are alternatives to or similar to oh-my-papers

Mydatascienceportfolio
Applying Data Science and Machine Learning to Solve Real World Business Problems
Stars: ✭ 227 (+1161.11%)
Mutual labels:  recommendation-system
ScasNet
Semantic Labeling in VHR Images via A Self-Cascaded CNN (ISPRS JPRS, IF=6.942)
Stars: ✭ 24 (+33.33%)
Mutual labels:  context-aware
Diverse-RecSys
Collection of diverse recommendation papers
Stars: ✭ 39 (+116.67%)
Mutual labels:  recommendation-system
Recsys core
[Movie Recommendation System] Based on a crawled movie rating dataset, a movie recommendation system built with FM and LR as its core.
Stars: ✭ 245 (+1261.11%)
Mutual labels:  recommendation-system
AIML-Projects
Projects I completed as a part of Great Learning's PGP - Artificial Intelligence and Machine Learning
Stars: ✭ 85 (+372.22%)
Mutual labels:  recommendation-system
Answerable
Recommendation system for Stack Overflow unanswered questions
Stars: ✭ 13 (-27.78%)
Mutual labels:  recommendation-system
Chameleon recsys
Source code of CHAMELEON - A Deep Learning Meta-Architecture for News Recommender Systems
Stars: ✭ 202 (+1022.22%)
Mutual labels:  recommendation-system
mudrod
Mining and Utilizing Dataset Relevancy from Oceanographic Datasets to Improve Data Discovery and Access, online demo: https://mudrod.jpl.nasa.gov/#/
Stars: ✭ 15 (-16.67%)
Mutual labels:  recommendation-system
News-Manager
🗞news scraping and recommendation system
Stars: ✭ 14 (-22.22%)
Mutual labels:  recommendation-system
raptor
A lightweight product recommendation system (Item Based Collaborative Filtering) developed in Haskell.
Stars: ✭ 34 (+88.89%)
Mutual labels:  recommendation-system
Recommendersystem Dataset
This repository contains some datasets that I have collected in Recommender Systems.
Stars: ✭ 249 (+1283.33%)
Mutual labels:  recommendation-system
compatibility-family-learning
Compatibility Family Learning for Item Recommendation and Generation
Stars: ✭ 21 (+16.67%)
Mutual labels:  recommendation-system
Ranking Papers
Papers on recommendation system / search ranking.
Stars: ✭ 29 (+61.11%)
Mutual labels:  recommendation-system
Recommendationsystem
Book recommender system using collaborative filtering based on Spark
Stars: ✭ 244 (+1255.56%)
Mutual labels:  recommendation-system
Machine-Learning
Examples of all Machine Learning Algorithm in Apache Spark
Stars: ✭ 15 (-16.67%)
Mutual labels:  recommendation-system
Tutorials
AI-related tutorials. Access any of them for free → https://towardsai.net/editorial
Stars: ✭ 204 (+1033.33%)
Mutual labels:  recommendation-system
MachineLearning
Machine learning for beginner(Data Science enthusiast)
Stars: ✭ 104 (+477.78%)
Mutual labels:  recommendation-system
image embeddings
Using efficientnet to provide embeddings for retrieval
Stars: ✭ 107 (+494.44%)
Mutual labels:  recommendation-system
yumme
Yum-me is a nutrient based food recommendation system
Stars: ✭ 34 (+88.89%)
Mutual labels:  recommendation-system
laracombee
📊 A Recombee integration for Laravel
Stars: ✭ 91 (+405.56%)
Mutual labels:  recommendation-system

Oh-My-Papers: a Hybrid Context-aware Paper Recommendation System

[Paper]

Introduction

Current scholarly search engines cannot recognize "jargon", that is, specialized terminology associated with a particular field or area of activity. For example, if you type "ResNet" into Google Scholar or another scholarly search engine, you cannot find the ResNet paper, "Deep Residual Learning for Image Recognition". To make search more precise, we build Oh-My-Papers, a hybrid context-aware citation recommendation system, as well as a scholarly search engine.

We first point out that jargon can be learned from the citation information of academic papers. Since specialists usually use jargon such as "ResNet" in academic writing, and the reference to the corresponding paper usually follows the jargon as illustrated below, citation information (especially the citation context) can help us improve search results.
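To make this concrete, a citation context can be harvested by taking a window of words around each citation marker. The sketch below is purely illustrative (it is not the repo's extraction code) and assumes `[n]`-style markers:

```python
import re

def citation_contexts(text, window=5):
    """Return (citation_id, context) pairs: `window` words on each
    side of every [n]-style citation marker found in `text`."""
    tokens = text.split()
    contexts = []
    for i, tok in enumerate(tokens):
        m = re.search(r"\[(\d+)\]", tok)
        if m:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            contexts.append((int(m.group(1)), " ".join(left + right)))
    return contexts

sample = "We adopt the deep residual network ResNet [1] as our backbone."
print(citation_contexts(sample, window=3))
```

Here the jargon "ResNet" lands inside the context of citation `[1]`, which is exactly the signal a context-aware recommender can learn from.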

In this work, we also create a large dataset recording citation information from recent computer vision papers. The dataset is ~10x larger than the biggest dataset from previous work.

Besides paper recommendation, our search engine can also be regarded as a citation recommendation system and can perform auto-citation. We also provide a related-paper recommendation service based on our VGAE model.

We implement a simple front-end website with three functions: context searching, auto-citation and related paper recommendation. See the demo repository oh-my-papers-website for details.

Requirements

Execute the following command to install the requirements.

pip install -r requirements.txt

You may need to manually install pytorch-geometric from its official repo in order to get the version that matches your PyTorch and CUDA versions.

Data Preparation

You can download the full dataset from Baidu Netdisk (Extract Code: sc88) or Google Drive, and put it into the data folder. We also prepare a tiny dataset, which is a subset of the full dataset. If you want to prepare the data yourself, please see docs/data_preparation.md for details.

Pretrained Models

You can download the full pretrained models from Baidu Netdisk (Extract Code: lsgp) or Google Drive. After downloading stats.zip, unzip it and put it under the main folder. Then you can directly use it for inference and evaluation. Please see docs/pretrained_models.md for details.

Models

Our repository includes three models:

  • Model 1: VGAE model for related paper recommendation (ours);
  • Model 2: BERT model for context-aware citation recommendation (baseline);
  • Model 3: Citation-BERT model for context-aware citation recommendation (ours).

Please see docs/models.md for details.
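Once model 1 has produced paper embeddings, related-paper recommendation reduces to nearest-neighbour search in embedding space. The stdlib sketch below uses hypothetical 3-d embeddings (the real ones come from the trained VGAE, and the repo's actual ranking code may differ):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def related_papers(query_id, embeddings, k=2):
    """Rank all other papers by cosine similarity to the query's embedding."""
    q = embeddings[query_id]
    scores = [(pid, cosine(q, e)) for pid, e in embeddings.items() if pid != query_id]
    scores.sort(key=lambda s: s[1], reverse=True)
    return [pid for pid, _ in scores[:k]]

# Hypothetical 3-d embeddings; real ones come from the trained VGAE.
embeddings = {
    "resnet": [1.0, 0.1, 0.0],
    "vgg":    [0.9, 0.2, 0.1],
    "bert":   [0.0, 1.0, 0.9],
}
print(related_papers("resnet", embeddings, k=1))
```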

Configurations

Before training, evaluating or running inference with the models, please set up your own configuration correctly. Please see docs/configurations.md for details.

Training (Optional)

If you have downloaded our pretrained models, you can skip the following process. Before starting your own training from scratch, please keep the stats folder clean.

Execute the following commands to train model 1, model 2, model 3 respectively.

python train_vgae.py --cfg [Configuration Path]
python train_bert.py --cfg [Configuration Path]
python train_citation_bert.py --cfg [Configuration Path]

where [Configuration Path] is the path to your configuration file.

Note. If you want to train model 3, please train model 1 first to generate the paper embeddings.

Evaluation

You may want to evaluate the performance of our models, especially model 2 and model 3 (model 1 has little to evaluate). For evaluation, please make sure that you have either downloaded the pretrained models and put them in the correct place, or trained the models yourself.

Execute the following commands to evaluate model 2, model 3 respectively.

python eval_bert.py --cfg [Configuration Path]
python eval_citation_bert.py --cfg [Configuration Path]

where [Configuration Path] is the path to your configuration file.

Please see docs/evaluation.md for more details about the evaluation process.
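Context-aware citation recommendation is commonly scored with top-K ranking metrics such as Recall@K. The repo's exact metrics are defined in docs/evaluation.md; the following is only a generic sketch of the metric:

```python
def recall_at_k(ranked, gold, k):
    """Fraction of queries whose true cited paper appears in the
    top-k entries of the ranked candidate list."""
    hits = sum(1 for preds, true in zip(ranked, gold) if true in preds[:k])
    return hits / len(gold)

# Two queries with hypothetical ranked candidates and ground-truth papers.
ranked = [["p1", "p2", "p3"], ["p4", "p5", "p6"]]
gold = ["p2", "p6"]
print(recall_at_k(ranked, gold, 2))  # p2 is in its top-2, p6 is not -> 0.5
```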

Inference

For inference, we provide a class for each model. Please make sure that you have either downloaded the pretrained models and put them in the correct place, or trained the models yourself.

Execute the following commands for inference of model 1, model 2 and model 3 respectively.

python inference_vgae.py --cfg [Configuration Path] --input [Input Path] --output [Output Path]
python inference_bert.py --cfg [Configuration Path] --input [Input Path] --output [Output Path]
python inferece_citation_bert.py --cfg [Configuration Path] --input [Input Path] --output [Output Path]

where [Configuration Path] is the path to your configuration file, and [Input Path] and [Output Path] are the paths to the input and output files respectively.

Note. Both the input file and the output file are in JSON format.

Please see docs/inference.md for more details about the inference and the file format.
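Purely as an illustration of the JSON round trip (the authoritative field names live in docs/inference.md, so the `context` key below is hypothetical):

```python
import json

# Hypothetical input record; the real field names are defined in docs/inference.md.
query = {"context": "We adopt ResNet as the backbone of our detector."}

payload = json.dumps(query, ensure_ascii=False)  # contents of [Input Path]
restored = json.loads(payload)                   # what an inference script would read
print(restored["context"])
```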

Citations

@misc{fang2021ohmypapers,
  author =       {Hongjie Fang and Zhanda Zhu and Haoran Zhao},
  title =        {Oh-my-papers: a Hybrid Context-aware Citation Recommendation System},
  howpublished = {\url{https://github.com/Galaxies99/oh-my-papers}},
  year =         {2021}
}

References

  1. Science Parse: official repo;
  2. Transformers: HuggingFace repo;
  3. Specter: official repo;
  4. Yang L, Zheng Y, Cai X, et al. A LSTM based model for personalized context-aware citation recommendation[J]. IEEE access, 2018, 6: 59618-59627.
  5. Jeong C, Jang S, Park E, et al. A context-aware citation recommendation model with BERT and graph convolutional networks[J]. Scientometrics, 2020, 124(3): 1907-1922.