PyCMF
Collective Matrix Factorization in Python.
Collective Matrix Factorization is a machine learning method that decomposes two matrices $X$ and $Y$ into three matrices $U$, $V$, and $Z$ such that

$$X \approx f(UV^T), \qquad Y \approx f(VZ^T)$$

where $f$ is either the identity or sigmoid function.
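The shape of the model can be illustrated with plain NumPy. The sketch below is not PyCMF code: it uses the names from the Usage example (input matrices X and Y, factors U, V, and Z), and the helpers `sigmoid`, `identity`, and `reconstruct` are hypothetical names introduced only for illustration. It shows how one shared factor V appears in both reconstructions.

```python
import numpy as np

def sigmoid(a):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-a))

def identity(a):
    """Linear case: leave the product unchanged."""
    return a

def reconstruct(U, V, Z, x_link=identity, y_link=identity):
    """Rebuild both matrices from the factors; V is shared between them."""
    return x_link(U @ V.T), y_link(V @ Z.T)

# Toy factors with k = 2 components (random values, not fitted to data).
rng = np.random.default_rng(0)
n_rows, n_shared, n_cols, k = 5, 4, 1, 2
U = rng.standard_normal((n_rows, k))    # one row per row of X
V = rng.standard_normal((n_shared, k))  # one row per column of X and per row of Y
Z = rng.standard_normal((n_cols, k))    # one row per column of Y

# Identity link for X, sigmoid link for Y (e.g. when Y holds binary labels).
X_hat, Y_hat = reconstruct(U, V, Z, y_link=sigmoid)
print(X_hat.shape, Y_hat.shape)  # (5, 4) (4, 1)
```

Because V is used in both products, whatever structure it captures has to explain the columns of X and the rows of Y at the same time.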
Why Use CMF?
CMF decomposes multiple, complex relationships into a small number of components and can provide valuable insights into your data. Relationships between
- words, documents, and sentiment
- people, movies, genres, and ratings
- items, categories, people, and sales
and many more can all be handled with this simple framework. See Use Cases for more details.
Usage
PyCMF implements a scikit-learn-like interface (full compatibility with scikit-learn is still in progress):
>>> import numpy as np
>>> import pycmf
>>> X = np.abs(np.random.randn(5, 4)); Y = np.abs(np.random.randn(4, 1))
>>> model = pycmf.CMF(n_components=4)
>>> U, V, Z = model.fit_transform(X, Y)
>>> np.linalg.norm(X - U @ V.T) / np.linalg.norm(X)
0.00010788067541423165
>>> np.linalg.norm(Y - V @ Z.T) / np.linalg.norm(Y)
1.2829730942643831e-05
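The two numbers printed above are relative reconstruction errors: the Frobenius norm of the residual divided by the Frobenius norm of the target matrix. A small helper makes this explicit. `relative_error` is a hypothetical name used only for this sketch and is not part of PyCMF.

```python
import numpy as np

def relative_error(target, approx):
    """Frobenius norm of the residual, relative to the norm of the target."""
    return np.linalg.norm(target - approx) / np.linalg.norm(target)

X = np.array([[3.0, 0.0],
              [0.0, 4.0]])

exact = relative_error(X, X)                      # perfect reconstruction
zero_guess = relative_error(X, np.zeros_like(X))  # predicting all zeros
print(exact, zero_guess)  # 0.0 1.0
```

A value near 0 means the factors reproduce the matrix almost exactly, while a value near 1 is no better than predicting all zeros.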
Getting Started
$ pip install git+https://github.com/smn-ailab/PyCMF
NumPy and Cython must be installed beforehand.
Features
- Support for both dense and sparse matrices
- Support for linear and sigmoid transformations
- Non-negativity constraints on the components (useful in use cases like topic modeling)
- Stochastic estimation of the gradient and Hessian for the Newton solver
- Visualizing topics and importances (see CMF.print_topic_terms)
See the docstrings for more details on how to configure CMF.
Use Cases
See samples for working examples. Possible use cases include:
Topic modeling and text classification
Suppose you want to use topic modeling to explore your data while also using supervision signals such as toxicity or sentiment labels. With CMF, you can extract topics that are relevant to classifying the texts.
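As a rough illustration of how such data could be laid out, the sketch below builds a term-document count matrix X and a document-label matrix Y from a toy corpus. The orientation (documents as the columns of X and the rows of Y) is an assumption inferred from the shapes in the Usage example, where X is 5 x 4 and Y is 4 x 1. The corpus and labels are made up, and no PyCMF call is shown.

```python
import numpy as np

# Toy corpus with one binary sentiment label per document (made-up data).
docs = ["good great film", "bad awful film", "great fun", "awful boring plot"]
labels = [1, 0, 1, 0]

# Map each distinct word to a row index.
vocab = sorted({word for doc in docs for word in doc.split()})
index = {word: i for i, word in enumerate(vocab)}

# X: terms x documents, holding raw word counts.
X = np.zeros((len(vocab), len(docs)))
for j, doc in enumerate(docs):
    for word in doc.split():
        X[index[word], j] += 1

# Y: documents x labels, a single sentiment column here.
Y = np.array(labels, dtype=float).reshape(-1, 1)

print(X.shape, Y.shape)  # (8, 4) (4, 1)
```

With this layout the shared factor would hold one row per document, so the document representation has to account for both the word counts and the label.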
Movie rating prediction
Many prediction tasks involve relations between multiple entities. Movie rating prediction is a good example: common entities include users, movies, genres and actors. CMF can be used to model these relations and predict unobserved edges.
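A possible data layout for this case is sketched below: a user-movie rating matrix X and a movie-genre indicator matrix Y, with movies as the shared entity. All names and ratings are toy data. The `observed` mask is an illustrative device for listing the unobserved edges and is not something this README says PyCMF consumes; how the library treats unobserved entries is not covered here.

```python
import numpy as np

users = ["ann", "bob", "cat"]
movies = ["Alien", "Heat", "Up"]
genres = ["sci-fi", "crime", "animation"]

# Observed (user, movie, rating) triples and genre memberships (toy data).
ratings = [("ann", "Alien", 5.0), ("ann", "Heat", 3.0),
           ("bob", "Heat", 4.0), ("cat", "Up", 5.0)]
movie_genres = {"Alien": ["sci-fi"], "Heat": ["crime"], "Up": ["animation"]}

u_idx = {name: i for i, name in enumerate(users)}
m_idx = {name: i for i, name in enumerate(movies)}
g_idx = {name: i for i, name in enumerate(genres)}

# X: users x movies, with a boolean mask marking which entries were observed.
X = np.zeros((len(users), len(movies)))
observed = np.zeros_like(X, dtype=bool)
for user, movie, score in ratings:
    X[u_idx[user], m_idx[movie]] = score
    observed[u_idx[user], m_idx[movie]] = True

# Y: movies x genres, 1.0 where the movie belongs to the genre.
Y = np.zeros((len(movies), len(genres)))
for movie, movie_genre_list in movie_genres.items():
    for genre in movie_genre_list:
        Y[m_idx[movie], g_idx[genre]] = 1.0

# The unobserved (user, movie) pairs are the edges one would want to predict.
candidates = [(users[i], movies[j]) for i, j in zip(*np.where(~observed))]
print(len(candidates))  # 5
```

Once factors U and V are available, the predicted rating for a candidate pair would be the corresponding entry of U @ V.T, passed through the sigmoid if that transformation is used.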
License
This project is licensed under the MIT License. See the LICENSE file for details.
TODO
- [ ] Improve performance
- [ ] Add support for weight matrices on relations
- [ ] Add support for predicting using obtained components
- [ ] Full compatibility with sklearn