smn-ailab / Pycmf

License: MIT
A python library for Collective Matrix Factorization (CMF)

PyCMF

Collective Matrix Factorization in Python.

Collective Matrix Factorization is a machine learning method that decomposes two matrices $X$ and $Y$ into three matrices $U$, $V$, and $Z$ such that

$$X \approx f(U V^\top)$$

$$Y \approx f(V Z^\top)$$

where $f$ is either the identity or the sigmoid function.
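To make the two approximations concrete, here is a minimal NumPy sketch of a CMF objective. It assumes a squared reconstruction error for both relations and a hypothetical weight `alpha` that balances them; PyCMF's actual loss and weighting may differ (for the sigmoid link, a log loss is a common alternative).

```python
import numpy as np


def sigmoid(a):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-a))


# The two link functions f mentioned in the text.
LINKS = {"identity": lambda a: a, "sigmoid": sigmoid}


def cmf_loss(X, Y, U, V, Z, link="identity", alpha=0.5):
    """Weighted squared error of X ~ f(U V^T) and Y ~ f(V Z^T).

    Assumption: squared error for both links; `alpha` is a hypothetical
    weight trading off the two relations, not a PyCMF parameter.
    """
    f = LINKS[link]
    err_x = np.sum((X - f(U @ V.T)) ** 2)  # first relation
    err_y = np.sum((Y - f(V @ Z.T)) ** 2)  # second relation, same V
    return alpha * err_x + (1.0 - alpha) * err_y
```

The factor $V$ appears in both terms; this shared factor is what couples the two factorizations. With the shapes from the usage example, $X$ is 5 x 4 and $Y$ is 4 x 1, so $U$ is 5 x k, $V$ is 4 x k, and $Z$ is 1 x k.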

Why Use CMF?

CMF decomposes multiple interrelated relationships into a small number of shared components, which can reveal the structure underlying your data. Relationships between

  • words, documents, and sentiment
  • people, movies, genres, and ratings
  • items, categories, people, and sales

and many more can all be handled with this simple framework. See Use Cases for more details.

Usage

PyCMF implements a scikit-learn-like interface (full compatibility with scikit-learn is still in progress):

>>> import numpy as np
>>> import pycmf
>>> X = np.abs(np.random.randn(5, 4)); Y = np.abs(np.random.randn(4, 1))
>>> model = pycmf.CMF(n_components=4)
>>> U, V, Z = model.fit_transform(X, Y)
>>> np.linalg.norm(X - U @ V.T) / np.linalg.norm(X)
0.00010788067541423165
>>> np.linalg.norm(Y - V @ Z.T) / np.linalg.norm(Y)
1.2829730942643831e-05

Getting Started

$ pip install git+https://github.com/smn-ailab/PyCMF

NumPy and Cython must be installed beforehand.

Features

  • Support for both dense and sparse matrices
  • Support for linear and sigmoid transformations
  • Non-negativity constraints on the components (useful in use cases like topic modeling)
  • Stochastic estimation of the gradient and Hessian for the Newton solver
  • Visualizing topics and importances (see CMF.print_topic_terms)
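The non-negativity constraint can be illustrated with the multiplicative update rules of Lee and Seung for non-negative matrix factorization, extended so that $V$ is shared by both relations. This is a swapped-in technique for illustration, not PyCMF's solver (which the list above describes as a Newton solver); it assumes the identity link, a squared loss, and a hypothetical relation weight `alpha`.

```python
import numpy as np


def collective_nmf(X, Y, n_components, alpha=1.0, n_iter=500, eps=1e-10, seed=0):
    """Non-negative CMF via multiplicative updates (illustrative sketch).

    Minimizes ||X - U V^T||^2 + alpha * ||Y - V Z^T||^2 with U, V, Z >= 0.
    X: (n, m), Y: (m, p)  ->  U: (n, k), V: (m, k), Z: (p, k).
    `alpha` is a hypothetical weight, not a PyCMF parameter.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_components))  # positive initialization
    V = rng.random((X.shape[1], n_components))
    Z = rng.random((Y.shape[1], n_components))
    for _ in range(n_iter):
        # Each rule multiplies by (negative part of gradient) / (positive part);
        # eps guards against division by zero.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        Z *= (Y.T @ V) / (Z @ (V.T @ V) + eps)
        # V collects gradient terms from both relations it participates in.
        V *= (X.T @ U + alpha * (Y @ Z)) / (
            V @ (U.T @ U) + alpha * (V @ (Z.T @ Z)) + eps
        )
    return U, V, Z
```

Because every update multiplies by a non-negative ratio, factors initialized to positive values stay non-negative, which is what lets the components be read as additive "topics".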

See the docstrings for more details on how to configure CMF.

Use Cases

See samples for working examples. Possible use cases include:

Topic modeling and text classification

Suppose you want to use topic modeling to explore your data, but also want to exploit supervision signals such as toxicity or sentiment. With CMF, you can extract topics that are relevant to classifying the texts.
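The supervision signal enters through the second relation. Below is a minimal sketch of how the two matrices might be laid out. It assumes, from the shapes in the usage example, that the columns of $X$ and the rows of $Y$ share the factor $V$: $X$ is then a term-by-document count matrix and $Y$ holds the labels of each document, so $V$ plays the role of a document-topic matrix. The helper `build_relations` is hypothetical and uses naive whitespace tokenization.

```python
import numpy as np


def build_relations(docs, labels):
    """Return (X, Y, vocab) for a toy corpus (hypothetical helper).

    X: term-by-document counts, so documents index the columns of X.
    Y: document-by-label matrix, so documents also index the rows of Y.
    Documents therefore correspond to the shared factor V.
    """
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer
    vocab = sorted({w for tokens in tokenized for w in tokens})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, tokens in enumerate(tokenized):
        for w in tokens:
            X[index[w], j] += 1.0  # count of term w in document j
    # One row per document; supports one or several label columns.
    Y = np.asarray(labels, dtype=float).reshape(len(docs), -1)
    return X, Y, vocab
```

For example, the documents `"great movie great cast"`, `"terrible plot"`, and `"great plot"` with sentiment labels `[1, 0, 1]` give a 5 x 3 `X` (five distinct words) and a 3 x 1 `Y`, which could then be passed to `model.fit_transform(X, Y)` as in the usage example. For a binary label like this, the sigmoid transformation is likely the better fit for `Y`.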

Movie rating prediction

Many prediction tasks involve relations between multiple entities. Movie rating prediction is a good example: common entities include users, movies, genres and actors. CMF can be used to model these relations and predict unobserved edges.
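Predicting unobserved edges amounts to fitting the factors on the observed entries only and reading the missing ones off $f(U V^\top)$. The TODO list notes that weight matrices and prediction are not yet supported in PyCMF, so this is a standalone NumPy sketch using plain gradient descent (a swapped-in technique) on a users-by-movies rating matrix `X` with an observation mask `M`, plus a movies-by-genres matrix `Y` that shares the movie factor `V`. All names (`fit_masked_cmf`, `predict_missing`, `alpha`, `lr`) are hypothetical.

```python
import numpy as np


def fit_masked_cmf(X, M, Y, n_components=2, alpha=1.0, lr=0.01, n_iter=2000, seed=0):
    """Fit X ~ U V^T on observed entries (M == 1) and Y ~ V Z^T (identity link).

    Minimizes 0.5 * ||M * (U V^T - X)||^2 + 0.5 * alpha * ||V Z^T - Y||^2
    by full-batch gradient descent. `alpha` and `lr` are hypothetical knobs.
    """
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((X.shape[0], n_components))  # users
    V = 0.1 * rng.standard_normal((X.shape[1], n_components))  # movies (shared)
    Z = 0.1 * rng.standard_normal((Y.shape[1], n_components))  # genres
    for _ in range(n_iter):
        Rx = M * (U @ V.T - X)  # residuals, zeroed where no rating is observed
        Ry = V @ Z.T - Y        # residuals of the movie-genre relation
        # Compute all gradients before updating any factor.
        gU = Rx @ V
        gV = Rx.T @ U + alpha * (Ry @ Z)
        gZ = alpha * (Ry.T @ V)
        U -= lr * gU
        V -= lr * gV
        Z -= lr * gZ
    return U, V, Z


def predict_missing(X, M, U, V):
    """Keep observed ratings and fill unobserved ones with U V^T."""
    return np.where(M == 1, X, U @ V.T)
```

The genre relation acts as side information: a movie with few ratings still receives a meaningful row of `V` through its genres, which is the benefit CMF offers over factorizing the rating matrix alone.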

License

This project is licensed under the MIT License; see the LICENSE file for details.

TODO

  • [ ] Improve performance
  • [ ] Add support for weight matrices on relations
  • [ ] Add support for predicting using obtained components
  • [ ] Full compatibility with scikit-learn