
david-cortes / recometrics

License: BSD-2-Clause
(Python, R, C++) Library-agnostic evaluation framework for implicit-feedback recommender systems

Programming Languages

C++
R
Python
Cython

Projects that are alternatives of or similar to recometrics

retailbox
🛍️RetailBox - eCommerce Recommender System using Machine Learning
Stars: ✭ 32 (+68.42%)
Mutual labels:  matrix-factorization, recommender-systems
torchmf
matrix factorization in PyTorch
Stars: ✭ 114 (+500%)
Mutual labels:  matrix-factorization, recommender-systems
Librec
LibRec: A Leading Java Library for Recommender Systems
Stars: ✭ 3,045 (+15926.32%)
Mutual labels:  matrix-factorization, recommender-systems
Rectorch
rectorch is a pytorch-based framework for state-of-the-art top-N recommendation
Stars: ✭ 121 (+536.84%)
Mutual labels:  matrix-factorization
Robustpca
Robust PCA implementation and examples (Matlab)
Stars: ✭ 138 (+626.32%)
Mutual labels:  matrix-factorization
Implicit
Fast Python Collaborative Filtering for Implicit Feedback Datasets
Stars: ✭ 2,569 (+13421.05%)
Mutual labels:  matrix-factorization
RcppML
Rcpp Machine Learning: Fast robust NMF, divisive clustering, and more
Stars: ✭ 52 (+173.68%)
Mutual labels:  matrix-factorization
Scikit Fusion
scikit-fusion: Data fusion via collective latent factor models
Stars: ✭ 103 (+442.11%)
Mutual labels:  matrix-factorization
equadratures
equadratures.org/
Stars: ✭ 92 (+384.21%)
Mutual labels:  matrix-factorization
Netmf
Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec
Stars: ✭ 158 (+731.58%)
Mutual labels:  matrix-factorization
Bionev
Graph Embedding Evaluation / Code and Datasets for "Graph Embedding on Biomedical Networks: Methods, Applications, and Evaluations" (Bioinformatics 2020)
Stars: ✭ 155 (+715.79%)
Mutual labels:  matrix-factorization
Recotour
A tour through recommendation algorithms in python [IN PROGRESS]
Stars: ✭ 140 (+636.84%)
Mutual labels:  matrix-factorization
Polara
Recommender system and evaluation framework for top-n recommendations tasks that respects polarity of feedbacks. Fast, flexible and easy to use. Written in python, boosted by scientific python stack.
Stars: ✭ 205 (+978.95%)
Mutual labels:  matrix-factorization
Awesome Community Detection
A curated list of community detection research papers with implementations.
Stars: ✭ 1,874 (+9763.16%)
Mutual labels:  matrix-factorization
matrix-completion
Lightweight Python library for in-memory matrix completion.
Stars: ✭ 94 (+394.74%)
Mutual labels:  matrix-factorization
Metarec
PyTorch Implementations For A Series Of Deep Learning-Based Recommendation Models (IN PROGRESS)
Stars: ✭ 120 (+531.58%)
Mutual labels:  matrix-factorization
Spotlight
Deep recommender models using PyTorch.
Stars: ✭ 2,623 (+13705.26%)
Mutual labels:  matrix-factorization
Cumf als
CUDA Matrix Factorization Library with Alternating Least Square (ALS)
Stars: ✭ 154 (+710.53%)
Mutual labels:  matrix-factorization
Nmflibrary
MATLAB library for non-negative matrix factorization (NMF): Version 1.8.1
Stars: ✭ 153 (+705.26%)
Mutual labels:  matrix-factorization
Cofactor
CoFactor: Regularizing Matrix Factorization with Item Co-occurrence
Stars: ✭ 160 (+742.11%)
Mutual labels:  matrix-factorization

RecoMetrics

Library-agnostic evaluation framework for implicit-feedback recommender systems that are based on low-rank matrix factorization models or latent embeddings. Calculates per-user metrics based on the ranking of items produced by the model, using efficient multi-threaded routines. Also provides functions for generating train-test splits of the data. Written in C++ with interfaces for Python and R.


For a longer introduction, see:

Evaluating implicit-feedback recommendations

When evaluating recommender systems built from implicit-feedback data (e.g. the number of times that a user played each song in a music service), one usually wants to evaluate the quality of the produced recommendations according to how well they rank the available pool of items for each user.

This is done by setting aside some fraction of the data for testing purposes, building a model on the remainder of the data, and producing top-K recommendation lists for each user that exclude the items he/she already consumed in the non-held-out data. The recommended lists are evaluated according to how they rank the held-out items in comparison to the items that the user did not consume, using classification or ranking metrics such as precision, NDCG, or AUC, among others. The held-out items are considered "positive" entries, while the items that the user did not consume are considered "negative" entries.
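
As a rough illustration of this procedure, the sketch below computes precision-at-K for a single user with plain NumPy. It is only a hedged, simplified example: the names (precision_at_k_one_user, scores, train_items, test_items) are hypothetical, and it is not this package's implementation, which operates on whole sparse matrices at once.

import numpy as np

### Illustrative sketch for one user (hypothetical inputs, not the package's code):
### 'scores' holds the model's predicted scores for every item, 'train_items' the
### items the user already consumed in the non-held-out data, and 'test_items'
### the held-out ("positive") items.
def precision_at_k_one_user(scores, train_items, test_items, k=10):
    scores = np.asarray(scores, dtype=float)
    ranking = np.argsort(-scores)                      # all items, ranked best-first
    ranking = ranking[~np.isin(ranking, train_items)]  # exclude already-consumed items
    top_k = ranking[:k]
    return np.isin(top_k, test_items).mean()           # fraction of top-K that are held-out items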

Compared to metrics used for explicit-feedback recommendations such as RMSE, metrics for implicit feedback are much slower to compute (they might well be slower than fitting the model itself), since they require generating a ranking of a large number of items for each user separately and iterating over the ranked lists. Many recommender-system libraries provide their own functionality for automatically setting aside some data and evaluating these metrics while fitting models, but such an approach has some issues:

  • They can only evaluate models created with the same library, thus not allowing comparisons between libraries.
  • The methodologies are oftentimes not comparable across libraries (e.g. they differ in how they discard users with little data or in what they output in edge cases).
  • Results are sometimes impossible to reproduce exactly outside the library (e.g. the library outputs only the metrics, but not the exact data split that was used).
  • Oftentimes, such evaluations are done in pure Python+NumPy, either using a single core or sharing model matrices and data across processes by serializing them, which results in very slow calculations. What's more, different metrics are sometimes calculated separately, requiring the ranking to be re-generated for each metric.

This library, in contrast:

  • Takes as input the model matrices and the train-test data (as CSR matrices), thus allowing it to work with any recommendation model in which the predicted scores come from an inner product between user and item factors, regardless of the library (a minimal sketch of this scoring scheme is shown right after this list). Example libraries with this type of model: implicit, libmf, lightfm, spotlight, cmfrec, rsparse, lenskit, among many others.
  • Allows specifying criteria for filtering the users to evaluate based on the required amount of data (e.g. minimum number of positive test entries, minimum size of the items pool to rank, whether cold-start recommendations are accepted, among others).
  • Outputs NaN when a metric is not calculable instead of silently filling with zeros or ones (e.g. if the user has no positive entries or no negative entries in the test data), and makes logical checks for invalid cases such as all predictions having the same values or having NAs.
  • Can calculate different metrics (e.g. AP@K, NDCG@K) in the same pass, without having to re-rank the items for each user, and allows generating the metrics for many values of K at the same time (e.g. NDCG@1, NDCG@2, ..., NDCG@10, instead of just NDCG@10).
  • Provides the calculation on a per-user basis, not just in aggregate, allowing further filters and post-hoc comparisons.
  • Can be used to generate the train-test split separately, with configurable minimum criteria for the test users, a configurable size for the test data, and independent sampling of users and items, thus allowing faster calculations with sub-sampled users.
  • Uses multi-threaded computations with a shared-memory model, SIMD CPU instructions (can use float32 and float64), and efficient search procedures, thus running much faster than pure-Python or pure-R software.
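
As a minimal illustration of the first point above (this sketch is not part of the package's API; the matrices A and B and their shapes are hypothetical), the scoring scheme that all such factorization models share is a plain inner product between factor vectors:

import numpy as np

### Hypothetical factor matrices, e.g. as estimated by implicit, lightfm, cmfrec, ...
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 5))   # user factors, shape (n_users, latent_dim)
B = rng.standard_normal((50, 5))    # item factors, shape (n_items, latent_dim)

### The predicted score of user 'u' for item 'i' is the inner product A[u] . B[i],
### so the scores of user 0 for all items are a single matrix-vector product,
### regardless of which library produced A and B.
scores_user0 = A[0] @ B.T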

Supported metrics

  • P@K ("precision-at-k"): this is the proportion of the top-K recommended items (excluding those that were in the training data) that are present in the test data of a given user. Can also produce a standardized or "truncated" version which will divide by the minimum between K and the number of test items.

  • R@K ("recall-at-k"): this is the proportion of the test items that are found among the top-K recommended, thus accounting for the fact that some users have more test data than others and thus it's easier to find test items for them.

  • AP@K ("average-precision-at-k"): this is conceptually a metric which looks at precision, recall, and rank, by calculating precisions at different recall levels. See the Wikipedia entry for more information. Also offers a "truncated" version like for P@K. The average of this metric across users is typically called "MAP@K" or "Mean Average Precision".

  • NDCG@K (normalized discounted cumulative gain): this is a ranking metric that takes into account not only the presence of recommended items in the test set, but also their confidence score (according to the data), discounting this score according to the ranking of the item in the top-K list. Entries not present in the test data are assumed to have a score of zero. See the Wikipedia entry for more details.

  • Hit@K: indicates whether at least one of the top-K recommended items was in the test data (the by-user average is typically called "Hit Rate").

  • RR@K (reciprocal rank): inverse rank (one divided by the rank) of the first item among the top-K recommended that is in the test data (the by-user average is typically called "Mean Reciprocal Rank" or MRR).

  • ROC AUC (area under the receiver-operating characteristic curve): see the Wikipedia entry for more details. This metric evaluates the full ranking rather than just the top-K.

  • PR AUC (area under the precision-recall curve): just like ROC AUC, it evaluates the full ranking, but it is a lot more sensitive to what happens at the top of the ranking, providing a perhaps more helpful picture than ROC AUC. It is calculated using the fast-but-not-so-precise rectangular method, whose formula corresponds to the AP@K metric with K=N.

This package does NOT deal with other more specialized metrics evaluating e.g. "serendipity", "discoverability", diversity of recommendations, etc.
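
To make a couple of the definitions above concrete, here is a hedged, non-optimized NumPy sketch of NDCG@K and AP@K for a single user. The function names and inputs are hypothetical, and the exact conventions in this package (e.g. the standard vs. "truncated" normalizations mentioned above, or when NaN is returned) may differ; the package itself computes these through multi-threaded C++ routines.

import numpy as np

### 'ranked_rel' holds the test-set confidence scores of the recommended items,
### best-first, with zeros for items not in the test data; 'test_rel' holds the
### scores of all held-out test items (used to build the ideal ranking).
def ndcg_at_k(ranked_rel, test_rel, k):
    gains = np.asarray(ranked_rel, dtype=float)[:k]
    dcg = np.sum(gains / np.log2(np.arange(2, gains.size + 2)))    # discount 1/log2(rank+1)
    ideal = np.sort(np.asarray(test_rel, dtype=float))[::-1][:k]
    idcg = np.sum(ideal / np.log2(np.arange(2, ideal.size + 2)))
    return np.nan if idcg == 0 else dcg / idcg

### 'ranked_is_test' is a boolean vector marking which recommended items
### (best-first) are present in the test data; this version normalizes by
### min(K, number of test items), which is one common convention.
def ap_at_k(ranked_is_test, n_test, k):
    hits = np.asarray(ranked_is_test, dtype=bool)[:k]
    if n_test == 0:
        return np.nan
    prec = np.cumsum(hits) / np.arange(1, hits.size + 1)   # precision at each cutoff
    return np.sum(prec * hits) / min(k, n_test)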

Installation

  • Python:

pip install recometrics

Or if that fails:

pip install --no-use-pep517 recometrics

Note for macOS users: on macOS, the Python version of this package might compile without multi-threading capabilities. In order to enable multi-threading support, first install OpenMP:

brew install libomp

And then reinstall this package: pip install --force-reinstall recometrics.


IMPORTANT: the setup script will try to add the compilation flag -march=native. This instructs the compiler to tune the package for the CPU on which it is being installed (by e.g. using AVX instructions if available), but the result might not be usable on other computers. If building a binary wheel of this package or putting it into a docker image which will be used on different machines, this can be overridden either by (a) defining an environment variable DONT_SET_MARCH=1, or (b) manually supplying architecture-related compilation flags through the CFLAGS environment variable. For maximum compatibility (but slowest speed), it's possible to do something like this:

export DONT_SET_MARCH=1
pip install recometrics

or, by specifying an architecture-related compilation flag explicitly:

export CFLAGS="-march=x86-64"
export CXXFLAGS="-march=x86-64"
pip install recometrics

Note that, if not using -march=native, it will rely on the BLAS library provided by SciPy for calculations.


  • R:
install.packages("recometrics")

For better performance, it's recommended to compile the package from source with the extra optimizations -O3 and -march=native. On Linux, this can be done by creating a file ~/.R/Makevars (before installing recometrics) containing the line CXX11FLAGS += -O3 -march=native, plus an empty line at the end.

  • C++:

The library is a self-contained, templated header file (src/recometrics.hpp). It can be copied into other projects and used by #include-ing it.

Documentation

  • Python: documentation is available at ReadTheDocs.

  • R: documentation is available internally in the installed package (e.g. ?calc.reco.metrics) and on CRAN.

  • C++: documentation is available in the header file src/recometrics.hpp.

Examples

Applied examples with public data and different libraries for fitting models:

  • Python notebook (LastFM-360K dataset, using libraries implicit and lightfm).

  • R vignette (MovieLens100K dataset, using library cmfrec).

Sample usage

  • Python:
import numpy as np
from scipy.sparse import random as sprandom
import recometrics

### User-item interactions (e.g. number of video views)
n_users = 100
n_items = 50
rng = np.random.default_rng(seed=123)
X = sprandom(n_users, n_items, density=0.2,
             data_rvs=lambda n: rng.integers(1, 100, n),
             format="csr")

### Creating a fit + train-test split
X_fit, X_train, X_test, test_users = \
    recometrics.split_reco_train_test(
        X, split_type="separated",
        users_test_fraction=0.1,
        items_test_fraction=0.3,
        min_items_pool=10, min_pos_test=2,
        seed=123
    )

### Model would be fit to 'X_fit' (non-test users)
### e.g. model = Model(...).fit(X_fit)
latent_dim = 5
Item_Factors = rng.standard_normal((n_items, latent_dim))

### Then it would produce user factors for 'X_train'
### (users to which the model was not fit)
User_Factors = rng.standard_normal((X_train.shape[0], latent_dim))

### And then the metrics would be calculated
df_metrics_by_user = \
    recometrics.calc_reco_metrics(
        X_train, X_test,
        User_Factors, Item_Factors,
        k=5,
        precision=True,
        average_precision=True,
        ndcg=True,
        nthreads=-1
    )
df_metrics_by_user.head(3)
   P@5      AP@5    NDCG@5
0  0.0  0.000000  0.000000
1  0.0  0.000000  0.000000
2  0.2  0.333333  0.610062
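
The per-user results can then be aggregated into the usual summary figures (e.g. the by-user mean of AP@5 is what is typically reported as MAP@5). Continuing the hypothetical example above, and assuming the returned object is a pandas DataFrame as the call to .head() suggests:

### Mean of each metric across users (MAP@5, mean NDCG@5, etc.);
### NaN entries for non-calculable metrics are skipped by pandas' default mean.
df_metrics_by_user.mean(axis=0)
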
  • R:
library(Matrix)
library(recometrics)

### User-item interactions (e.g. number of video views)
n_users <- 100
n_items <- 50
set.seed(123)
X <- rsparsematrix(n_users, n_items, density=0.2, repr="R",
                   rand.x=function(n) sample(100, n, replace=TRUE))

### Creating a fit + train-test split
temp <- create.reco.train.test(
    X, split_type="separated",
    users_test_fraction=0.1,
    items_test_fraction=0.3,
    min_items_pool=10, min_pos_test=2,
    seed=1
)
X_train <- temp$X_train
X_test <- temp$X_test
X_fit <- temp$X_rem
rm(temp)

### Model would be fit to 'X_fit' (non-test users)
### e.g. model <- reco_model(X_fit, ...)
latent_dim <- 5
Item_Factors <- matrix(rnorm(n_items*latent_dim), ncol=n_items)

### Then it would produce user factors for 'X_train'
### (users to which the model was not fit)
User_Factors <- matrix(rnorm(nrow(X_train)*latent_dim), ncol=nrow(X_train))

### And then the metrics would be calculated
df_metrics_by_user <- calc.reco.metrics(
    X_train, X_test,
    User_Factors, Item_Factors,
    k=5,
    precision=TRUE,
    average_precision=TRUE,
    ndcg=TRUE,
    nthreads=parallel::detectCores()
)
head(df_metrics_by_user, 3)
  p_at_5 ap_at_5 ndcg_at_5
1    0.0   0.000 0.0000000
2    0.2   0.125 0.3047166
3    0.2   0.125 0.1813742