
caserec / Datasets For Recommender Systems

This is a repository of topic-centric, high-quality public data sources for Recommender Systems (RS).

Projects that are alternatives to or similar to Datasets For Recommender Systems

Openml R
R package to interface with OpenML
Stars: ✭ 81 (-85.64%)
Mutual labels:  jupyter-notebook, data-science, datasets, database
Codesearchnet
Datasets, tools, and benchmarks for representation learning of code.
Stars: ✭ 1,378 (+144.33%)
Mutual labels:  jupyter-notebook, data-science, datasets
Griffon Vm
Griffon Data Science Virtual Machine
Stars: ✭ 128 (-77.3%)
Mutual labels:  jupyter-notebook, data-science, database
Etl with python
ETL with Python - Taught at DWH course 2017 (TAU)
Stars: ✭ 68 (-87.94%)
Mutual labels:  jupyter-notebook, data-science, database
Web Database Analytics
Web scraping and related analytics using Python tools
Stars: ✭ 175 (-68.97%)
Mutual labels:  jupyter-notebook, data-science, database
Turbodbc
Turbodbc is a Python module to access relational databases via the Open Database Connectivity (ODBC) interface. The module complies with the Python Database API Specification 2.0.
Stars: ✭ 449 (-20.39%)
Mutual labels:  data-science, database
Tensor House
A collection of reference machine learning and optimization models for enterprise operations: marketing, pricing, supply chain
Stars: ✭ 449 (-20.39%)
Mutual labels:  jupyter-notebook, data-science
Courses
Quizzes & assignments from Coursera
Stars: ✭ 454 (-19.5%)
Mutual labels:  jupyter-notebook, data-science
Data Analysis And Machine Learning Projects
Repository of teaching materials, code, and data for my data analysis and machine learning projects.
Stars: ✭ 5,166 (+815.96%)
Mutual labels:  jupyter-notebook, data-science
Opensource Roadmap Datascience
The road to a self-taught education in Data Science!
Stars: ✭ 429 (-23.94%)
Mutual labels:  jupyter-notebook, data-science
Pba
Efficient Learning of Augmentation Policy Schedules
Stars: ✭ 461 (-18.26%)
Mutual labels:  jupyter-notebook, data-science
Course V3
The 3rd edition of course.fast.ai
Stars: ✭ 4,785 (+748.4%)
Mutual labels:  jupyter-notebook, data-science
Food Recipe Cnn
Food image to recipe with deep convolutional neural networks.
Stars: ✭ 448 (-20.57%)
Mutual labels:  jupyter-notebook, data-science
Python Ml Course
Introductory course on Machine Learning with Python
Stars: ✭ 442 (-21.63%)
Mutual labels:  jupyter-notebook, data-science
Python Causality Handbook
Causal Inference for the Brave and True. A light-hearted yet rigorous approach to learning about impact estimation and sensitivity analysis.
Stars: ✭ 449 (-20.39%)
Mutual labels:  jupyter-notebook, data-science
Code search
Code For Medium Article: "How To Create Natural Language Semantic Search for Arbitrary Objects With Deep Learning"
Stars: ✭ 436 (-22.7%)
Mutual labels:  jupyter-notebook, data-science
Edward
A probabilistic programming language in TensorFlow. Deep generative models, variational inference.
Stars: ✭ 4,674 (+728.72%)
Mutual labels:  jupyter-notebook, data-science
Awesome Twitter Data
A list of Twitter datasets and related resources.
Stars: ✭ 533 (-5.5%)
Mutual labels:  data-science, datasets
Data Science Your Way
Ways of doing Data Science Engineering and Machine Learning in R and Python
Stars: ✭ 530 (-6.03%)
Mutual labels:  jupyter-notebook, data-science
Intro To Python
An intro to Python & programming for wanna-be data scientists
Stars: ✭ 536 (-4.96%)
Mutual labels:  jupyter-notebook, data-science

Public Datasets For Recommender Systems

This is a repository of topic-centric, high-quality public data sources for Recommender Systems (RS). They were collected and tidied from Stack Overflow, articles, recommender sites and academic experiments. Most of the datasets presented here are free and have open source licenses; however, some are not, and you may need to ask permission to use them or cite the authors' work.

In addition, this repository contains some datasets that have been pre-processed for academic experiments.

Links and dataset descriptions

Book

  • Book Crossing:: The BookCrossing (BX) dataset was collected by Cai-Nicolas Ziegler in a 4-week crawl (August/September 2004) from the Book-Crossing community.

Dating

  • Dating Agency:: This dataset contains 17,359,346 anonymous ratings of 168,791 profiles made by 135,359 LibimSeTi users as dumped on April 4, 2006.

E-commerce

  • Amazon:: This dataset contains product reviews and metadata from Amazon, including 142.8 million reviews spanning May 1996 - July 2014
  • Retailrocket recommender system dataset:: The dataset consists of three files: a file with behaviour data (events.csv), a file with item properties (item_properties.csv) and a file describing the category tree (category_tree.csv). The data was collected from a real-world e-commerce website.
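One way to use the Retailrocket behaviour log is to aggregate it into implicit-feedback scores per visitor and item. The sketch below assumes that events.csv has a header row containing (at least) the columns visitorid, event and itemid, and that event takes the values view, addtocart and transaction; please check these names against your own copy of the file. The per-event weights are hypothetical and only illustrate the idea.

```python
import csv
import io
from collections import defaultdict

# Hypothetical weights per event type (an assumption, not part of the dataset).
EVENT_WEIGHTS = {"view": 1.0, "addtocart": 2.0, "transaction": 3.0}


def load_interactions(lines):
    """Aggregate events.csv rows into {(visitorid, itemid): score}.

    Assumes a header row with the columns visitorid, event and itemid.
    Unknown event types are skipped.
    """
    scores = defaultdict(float)
    for row in csv.DictReader(lines):
        weight = EVENT_WEIGHTS.get(row["event"])
        if weight is None:
            continue  # no weight defined for this event type
        scores[(row["visitorid"], row["itemid"])] += weight
    return dict(scores)


if __name__ == "__main__":
    # Made-up toy rows in the assumed events.csv layout.
    sample = io.StringIO(
        "timestamp,visitorid,event,itemid,transactionid\n"
        "1,1,view,10,\n"
        "2,1,addtocart,10,\n"
        "3,2,view,20,\n"
    )
    print(load_interactions(sample))
    # For the real file: with open("events.csv", newline="") as f: load_interactions(f)
```

Summing weights keeps repeated views of the same item as a stronger signal; a binary "interacted or not" matrix is an equally valid choice depending on the experiment.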

Music

  • Amazon Music:: This digital music dataset contains reviews and metadata from Amazon
  • Yahoo Music:: This dataset represents a snapshot of the Yahoo! Music community's preferences for various musical artists.
  • LastFM (Implicit):: This dataset contains social networking, tagging, and music artist listening information from a set of 2K users of the Last.fm online music system.
  • Million Song Dataset:: The Million Song Dataset is a freely-available collection of audio features and metadata for a million contemporary popular music tracks.

Movies

  • MovieLens:: GroupLens Research has collected and made available rating datasets from the MovieLens website.
  • Yahoo Movies:: This dataset represents a sample of the Yahoo! Movies community's preferences for various movies.
  • CiaoDVD:: CiaoDVD is a dataset crawled from the entire DVD category of the dvd.ciao.co.uk website in December 2013.
  • FilmTrust:: FilmTrust is a small dataset crawled from the entire FilmTrust website in June 2011.
  • Netflix:: This is the official data set used in the Netflix Prize competition.

Games

  • Steam Video Games:: This dataset is a list of user behaviors with the columns user-id, game-title, behavior-name and value. The behaviors included are 'purchase' and 'play'. The value indicates the degree to which the behavior was performed: for 'purchase' the value is always 1, and for 'play' it is the number of hours the user has played the game.
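Because 'purchase' and 'play' rows carry different kinds of values, they are easier to work with once separated. The minimal sketch below assumes each row has already been parsed into a (user_id, game_title, behavior, value) tuple, following the four columns described above; if your copy of the file has extra trailing columns, keep only the first four. The function name split_behaviors is hypothetical.

```python
from collections import defaultdict


def split_behaviors(rows):
    """Split (user_id, game_title, behavior, value) rows.

    Returns (purchased, hours):
      purchased: {user_id: set of purchased game titles}
      hours:     {user_id: {game_title: total hours played}}
    """
    purchased = defaultdict(set)
    hours = defaultdict(dict)
    for user_id, game, behavior, value in rows:
        if behavior == "purchase":
            # For purchases the value is always 1, so only membership matters.
            purchased[user_id].add(game)
        elif behavior == "play":
            # For play rows the value is the number of hours played; sum repeats.
            hours[user_id][game] = hours[user_id].get(game, 0.0) + float(value)
    return dict(purchased), dict(hours)


if __name__ == "__main__":
    # Made-up toy rows in the four-column layout described above.
    rows = [
        ("u1", "game_a", "purchase", "1.0"),
        ("u1", "game_a", "play", "12.5"),
        ("u1", "game_b", "purchase", "1.0"),
    ]
    purchased, hours = split_behaviors(rows)
    # Games bought but never played, e.g. as a weaker implicit signal.
    print(purchased["u1"] - set(hours.get("u1", {})))
```

Keeping hours as a separate mapping lets you choose later whether to use them as graded implicit feedback or just as a binary "played" flag.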

Jokes

  • Jester:: This Joke dataset contains 4.1 million continuous ratings (-10.00 to +10.00) of 100 jokes from 73,496 users
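Jester ratings are continuous on the range -10.00 to +10.00, unlike the discrete star scales of most other datasets listed here. One simple option when comparing or combining datasets is a linear rescaling to a common range; the sketch below maps a rating r from [-10, 10] to [a, b] via a + (r + 10)(b - a) / 20. The function name and default target range [0, 1] are illustrative choices, not part of the dataset.

```python
def rescale(rating, old_min=-10.0, old_max=10.0, new_min=0.0, new_max=1.0):
    """Linearly map a rating from [old_min, old_max] to [new_min, new_max].

    The defaults assume Jester's continuous -10.00 to +10.00 scale.
    """
    if not old_min <= rating <= old_max:
        raise ValueError(f"rating {rating} outside [{old_min}, {old_max}]")
    scale = (new_max - new_min) / (old_max - old_min)
    return new_min + (rating - old_min) * scale


if __name__ == "__main__":
    print(rescale(0.0))                              # 0.5
    print(rescale(2.5, new_min=1.0, new_max=5.0))    # 3.5
```

A linear map preserves the ordering and relative spacing of ratings, which is usually what rating-prediction metrics such as RMSE assume.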

Food

  • Chicago Entree:: This dataset contains a record of user interactions with the Entree Chicago restaurant recommendation system.

Anime

  • Anime Recommendations Database:: This dataset contains user preference data from 73,516 users on 12,294 anime. Each user can add anime to their completed list and give it a rating; this dataset is a compilation of those ratings.

Other datasets

You can find more datasets at:

  • GroupLens Datasets
  • LibRec Datasets
  • Yahoo Research
  • Datasets for Machine Learning
  • Stanford Large Network Dataset Collection

Usage and License

Before using these data sets, please review their README files or sites for the usage licenses, acknowledgments and other details.

Note: If you have difficulty downloading any of these datasets, please contact me; I have backups of all of them.

Recommender Tools

Contributors

Arthur Fortes da Costa {fortes [dot] arthur [at] gmail [dot] com} [Editor]