sundeepblue / yelper_recommendation_system

Licence: other
Yelper recommendation system

Programming languages: JavaScript, CSS, XSLT, Python, HTML, Scala, Jupyter Notebook

Yelper: A Collaborative Filtering Based Recommendation System

Chuan Sun

[chuansun76 at gmail dot com]

[twitter.com/sundeepblue]

Blog: https://sundeepblue.wordpress.com/2016/09/25/yelper-a-collaborative-filtering-based-recommendation-system/ or here: https://nycdatascience.com/blog/student-works/capstone/yelper-collaborative-filtering-based-recommendation-system/

This README describes the major components of Yelper, a business recommendation system built mainly in Python on the Spark framework.

Below are some features of Yelper:

  • Dividing the original business data by city allows fine-tuned, customized recommendations
  • Matrix factorization based recommendation using Spark MLlib
  • User-business graph analysis using Spark GraphX in Scala
  • Real-time user request handling using Spark Streaming and Apache Kafka
  • User-business graph visualization using D3 and the graph-tool library
  • Functional web server that recommends highly rated businesses to users

Now let me explain in detail how to reproduce everything!

1. Preprocessing

(1) Convert all user ids and business ids to integers. This made subsequent graph building a lot easier.

(2) Split the entire business data into smaller subsets by city. This yields 9 major cities:

  • us_charlotte
  • us_lasvegas
  • us_madison
  • us_phoenix
  • us_pittsburgh
  • us_urbana_champaign
  • canada_montreal
  • germany_karlsruhe
  • uk_edinburgh

All necessary utility functions can be found here: ./rating_data_utils.py

Run this command:

$ spark-submit ./parse_ratingdata_for_major_cities.py
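
To make the two preprocessing steps concrete, here is a minimal sketch in plain Python (no Spark). It is not the code from ./rating_data_utils.py; the function names, the sample IDs and the (user, business, stars) tuple layout are assumptions made for illustration only.

```python
from collections import defaultdict


def build_id_map(string_ids):
    """Assign each distinct string ID a consecutive integer, in first-seen order."""
    id_map = {}
    for sid in string_ids:
        if sid not in id_map:
            id_map[sid] = len(id_map)
    return id_map


def split_ratings_by_city(ratings, business_city):
    """Group (user, business, stars) triples by the city of the business."""
    by_city = defaultdict(list)
    for user, business, stars in ratings:
        by_city[business_city[business]].append((user, business, stars))
    return dict(by_city)


# Hypothetical sample data: string IDs as they might appear in the raw data.
raw_ratings = [
    ("u_abc", "b_x1", 5.0),
    ("u_def", "b_x1", 3.0),
    ("u_abc", "b_y2", 4.0),
]
raw_business_city = {"b_x1": "us_madison", "b_y2": "us_phoenix"}

# Step (1): convert user IDs and business IDs to integers.
user_ids = build_id_map(u for u, _, _ in raw_ratings)
business_ids = build_id_map(b for _, b, _ in raw_ratings)
int_ratings = [(user_ids[u], business_ids[b], s) for u, b, s in raw_ratings]
int_business_city = {business_ids[b]: c for b, c in raw_business_city.items()}

# Step (2): split the ratings into one subset per city.
ratings_by_city = split_ratings_by_city(int_ratings, int_business_city)
```

Integer IDs matter because both the MLlib rating format and GraphX vertex IDs expect numeric identifiers, which is why the conversion happens before anything else.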

2. Network analysis for user-business graph

Extract connected components using Spark GraphX

Since GraphX has no Python support, I wrote the code in Scala. Note that the Scala code has to be built with "sbt". Make sure the GraphX library is properly configured in the file "./spark_graphx_analysis/config.sbt".

Source code: "./spark_graphx_analysis/src/main/scala/YelpUserBusinessGraphAnalysis.scala"

Below are the commands to run the graph analysis:

  • $ cd /Users/sundeepblue/Bootcamp/allweek/week9/capstone/spark_graphx_analysis
  • $ sbt package
  • $ spark-submit --master local --class "YelpUserBusinessGraphAnalysis" target/scala-2.11/simple-project_2.11-1.0.jar

The result is saved to "businessid_to_indegree.csv"
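
The actual analysis runs in Scala on GraphX. For readers who only want the idea, here is a small pure-Python sketch of the two quantities involved: the in-degree of each business (how many users rated it) and the connected components of the user-business graph. The edge list and function names are hypothetical, and union-find is used here in place of the GraphX algorithm.

```python
from collections import Counter


def business_indegree(edges):
    """Count how many user -> business edges point at each business."""
    return Counter(business for _, business in edges)


def connected_components(edges):
    """Return the connected components of the bipartite graph via union-find."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for user, business in edges:
        # Tag nodes so that a user ID and a business ID never collide.
        root_u, root_b = find(("u", user)), find(("b", business))
        if root_u != root_b:
            parent[root_u] = root_b

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


# Hypothetical edges: (user_id, business_id) pairs, one per rating.
edges = [(0, 100), (1, 100), (1, 101), (2, 102)]
indegree = business_indegree(edges)
components = connected_components(edges)
```

In this toy graph, users 0 and 1 share business 100, so they end up in the same component together with businesses 100 and 101, while user 2 and business 102 form a second component.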

3. Build MF-based recommendation models for 9 major cities

Run this command to train an MF-based model for each major city:

$ python mf_based_recommendation_trainer.py
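
The trainer script is not reproduced here. To illustrate the idea behind matrix factorization, below is a minimal pure-Python sketch that uses plain stochastic gradient descent instead of the solver in Spark MLlib. The toy ratings, the function names and the hyperparameters are all assumptions chosen for illustration, not values taken from the project.

```python
import random


def train_mf(ratings, n_users, n_items, rank=2, lr=0.01, reg=0.02,
             epochs=2000, seed=0):
    """Learn user and item factor vectors from (user, item, rating) triples."""
    rng = random.Random(seed)
    user_f = [[rng.random() for _ in range(rank)] for _ in range(n_users)]
    item_f = [[rng.random() for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pu, qi = user_f[u], item_f[i]
            err = r - sum(a * b for a, b in zip(pu, qi))
            for k in range(rank):
                pu_k = pu[k]  # keep the old value for the item update
                pu[k] += lr * (err * qi[k] - reg * pu_k)
                qi[k] += lr * (err * pu_k - reg * qi[k])
    return user_f, item_f


def predict(user_f, item_f, u, i):
    """Predicted rating = dot product of the user and item factors."""
    return sum(a * b for a, b in zip(user_f[u], item_f[i]))


def rmse(ratings, user_f, item_f):
    """Root mean squared error on the given ratings."""
    total = sum((r - predict(user_f, item_f, u, i)) ** 2 for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5


def recommend(user_f, item_f, ratings, user, n=3):
    """Rank the items the user has not rated yet by predicted rating."""
    seen = {i for u, i, _ in ratings if u == user}
    candidates = [i for i in range(len(item_f)) if i not in seen]
    candidates.sort(key=lambda i: predict(user_f, item_f, user, i), reverse=True)
    return candidates[:n]


# Hypothetical ratings for one city: (user_id, business_id, stars).
ratings = [
    (0, 0, 5), (0, 1, 3), (0, 3, 1),
    (1, 0, 4), (1, 3, 1),
    (2, 0, 1), (2, 1, 1), (2, 3, 5),
    (3, 0, 1), (3, 3, 4),
    (4, 1, 1), (4, 2, 5), (4, 3, 4),
]
user_f, item_f = train_mf(ratings, n_users=5, n_items=4)
top_for_user_0 = recommend(user_f, item_f, ratings, user=0, n=1)
```

Training one model per city keeps each rating matrix small, and a user is only ever recommended businesses from the city the model was trained on.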

4. Build real-time user request handler using Spark Streaming and Apache Kafka

The purpose here is to simulate continuous user request handling.

STEP 0: Start Zookeeper and Kafka server

Note that the default ZooKeeper port is 2181, not 9092 (9092 is the default port of the Kafka broker). Also, the ZooKeeper server and the Kafka server should be started in two separate terminals.

  • $ cd /Users/sundeepblue/Bootcamp/allweek/week9/capstone/kafka/kafka_2.11-0.10.0.1
  • $ bin/zookeeper-server-start.sh config/zookeeper.properties
  • $ bin/kafka-server-start.sh config/server.properties

STEP 1: Create Kafka topic

  • $ bin/kafka-topics.sh --create --zookeeper localhost:2181 --topic user-request-topic --partitions 1 --replication-factor 1

STEP 2: Launch Spark Streaming

Note that this command should also be run in a new terminal. Use port 2181 and the topic "user-request-topic".

  • $ cd /Users/sundeepblue/Bootcamp/allweek/week9/capstone
  • $ spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.0 ./handle_user_requests_streaming.py

STEP 3: Produce user requests (TBD)

Note: do not specify a port in KafkaProducer()!

  • $ cd /Users/sundeepblue/Bootcamp/allweek/week9/capstone
  • $ python ./user_requests_producer.py
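
Since this step is still marked TBD, here is a rough sketch of what a request producer could look like. It assumes the kafka-python package and a hypothetical JSON message format with "user_id" and "city" fields; neither is confirmed by the project's code. The user IDs and cities are the ones used in the sample URLs later in this README.

```python
import json

TOPIC = "user-request-topic"  # the topic created in STEP 1


def encode_request(user_id, city):
    """Serialize one request as UTF-8 JSON bytes (hypothetical message format)."""
    return json.dumps({"user_id": user_id, "city": city}).encode("utf-8")


def send_requests(requests, topic=TOPIC):
    """Send every (user_id, city) request to Kafka. Needs a running broker."""
    # Imported lazily so the rest of this sketch runs without kafka-python.
    from kafka import KafkaProducer

    # No arguments: kafka-python then connects to localhost:9092 by default.
    producer = KafkaProducer()
    for user_id, city in requests:
        producer.send(topic, encode_request(user_id, city))
    producer.flush()  # block until all buffered messages are sent


sample_requests = [(10081786, "us_charlotte"), (10033545, "us_madison")]
payloads = [encode_request(u, c) for u, c in sample_requests]
```

Calling send_requests(sample_requests) only works once the ZooKeeper and Kafka servers from STEP 0 are running; encode_request can be tried on its own without any server.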

5. Build user-business graph for D3 visualization

The purpose here is to generate a .js file containing all the nodes and edges of the user-business graph, so that the web server can load it and visualize the graph.

See file:

./build_nodes_and_edges_js_for_d3_visualization.py
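
As a rough illustration of the idea (not the contents of the script above), the sketch below turns a list of user-business edges into JavaScript source that defines two variables a D3 page could load. The variable names "nodes" and "edges", the node fields and the ID prefixes are assumptions made for this example.

```python
import json


def build_graph_js(edges):
    """Return JavaScript source defining `nodes` and `edges` for a D3 page."""
    users = sorted({u for u, _ in edges})
    businesses = sorted({b for _, b in edges})
    # Prefix the IDs so that users and businesses never share a node ID.
    nodes = [{"id": "u%d" % u, "group": "user"} for u in users]
    nodes += [{"id": "b%d" % b, "group": "business"} for b in businesses]
    links = [{"source": "u%d" % u, "target": "b%d" % b} for u, b in edges]
    return "var nodes = %s;\nvar edges = %s;\n" % (
        json.dumps(nodes), json.dumps(links))


# Hypothetical edges: two users who both rated business 10.
js_text = build_graph_js([(1, 10), (2, 10)])
```

Writing js_text to a file under the web server's static directory makes it loadable with an ordinary script tag, which avoids a separate data request from the page.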

6. Finally, how to run the local web server to get recommendations?

This web server was built using:

  • Spark
  • Flask
  • cherrypy
  • Python paste

How to launch the web server?

  • $ cd /Users/sundeepblue/Bootcamp/allweek/week9/capstone/webserver
  • $ unset PYSPARK_DRIVER_PYTHON
  • $ spark-submit server.py

Sample web server URLs

recommendation for user '10081786' in city 'us_charlotte':

recommendation for user '10033545' in city 'us_madison':

users-businesses social network in Madison, USA:

The JavaScript file "./webserver/static/data/generated_nodes_and_edges_from_json_us_madison.js" was generated programmatically by the Python script "./build_nodes_and_edges_js_for_d3_visualization.py"

The most important files for the server

  • ./webserver/app.py (contains the code that interacts with Spark, recommends businesses, etc.)
  • ./webserver/server.py (shows how to make Spark work with Flask)
  • ./templates/map.html (contains the Google Maps API calls)

7. Future work

  • More graph analysis
    • Graph pagerank analysis using GraphX
    • Community discovery (similar to Facebook social network)
  • Improve recommendation
    • Content-based recommendation
    • Clustering all businesses
    • Extract object from business photos using Convolutional Neural Network
  • Code
    • Redirect spark execution log to txt file
    • Add try/except to handle potential exceptions in the code
    • Add more comments
    • Add test cases for critical logic
  • Google Maps based web page
    • Fine tune the webpage to support more features