scikit-fusion: Data fusion via collective latent factor models

scikit-fusion

BSD license

scikit-fusion is a Python module for data fusion and learning over heterogeneous datasets. At its core are recent collective latent factor models and large-scale joint matrix factorization algorithms.

[News:] Fast CPU and GPU-accelerated implementations of some of our methods.

[News:] Scikit-fusion, collective latent factor models, matrix factorization for data fusion and learning over hetnets.

[News:] fastGNMF, fast implementation of graph-regularized non-negative matrix factorization using Facebook FAISS.

Dependencies

scikit-fusion is tested to work under Python 3.

The dependencies required to build the software are NumPy >= 1.7, SciPy >= 0.12, PyGraphviz >= 1.3 (needed only for drawing data fusion graphs) and Joblib >= 0.8.4.
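
Before installing, it can help to confirm that the minimum versions listed above are available. The following is a minimal sketch that uses only the standard library. The helper names `parse_version` and `check_dependencies` are hypothetical and not part of scikit-fusion, and the simple parser only reads leading numeric components (it treats `2.0.0rc1` as 2.0.0).

```python
from importlib import metadata

# Minimum versions as listed above; keys are the PyPI distribution names.
MINIMUM_VERSIONS = {
    "numpy": "1.7",
    "scipy": "0.12",
    "pygraphviz": "1.3",  # only needed for drawing data fusion graphs
    "joblib": "0.8.4",
}


def parse_version(text, width=3):
    """Turn '0.8.4' into (0, 8, 4); non-numeric suffixes are ignored."""
    parts = []
    for piece in text.split(".")[:width]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad so that '1.7' and '1.7.0' compare equal.
    parts += [0] * (width - len(parts))
    return tuple(parts)


def check_dependencies(minimums=MINIMUM_VERSIONS):
    """Return {name: (installed_version_or_None, meets_minimum)}."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        ok = parse_version(installed) >= parse_version(minimum)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_dependencies().items():
        status = "ok" if ok else "missing or too old"
        print(f"{name}: {installed} ({status})")
```

A missing PyGraphviz is reported like any other missing package, although it is only needed for drawing data fusion graphs.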

Install

This package uses distutils, which is the default way of installing Python modules. To install in your home directory, use:

python setup.py install --user

To install for all users on Unix/Linux:

python setup.py build
sudo python setup.py install

For development mode use:

python setup.py develop

Use

Let's generate three random data matrices that relate three different object types (with 50, 100 and 40 objects, respectively):

 >>> import numpy as np
 >>> R12 = np.random.rand(50, 100)
 >>> R13 = np.random.rand(50, 40)
 >>> R23 = np.random.rand(100, 40)

Next, we define our data fusion graph:

 >>> from skfusion import fusion
 >>> t1 = fusion.ObjectType('Type 1', 10)
 >>> t2 = fusion.ObjectType('Type 2', 20)
 >>> t3 = fusion.ObjectType('Type 3', 30)
 >>> relations = [fusion.Relation(R12, t1, t2),
 ...              fusion.Relation(R13, t1, t3),
 ...              fusion.Relation(R23, t2, t3)]
 >>> fusion_graph = fusion.FusionGraph()
 >>> fusion_graph.add_relations_from(relations)
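
Relations that share an object type have to agree on how many objects of that type exist: `R12` and `R13` both have 50 rows (Type 1), and the 100 columns of `R12` match the 100 rows of `R23` (Type 2). The following shape-only sketch illustrates that check. `infer_object_counts` is a hypothetical helper that works on plain shape tuples; it is not a scikit-fusion function.

```python
def infer_object_counts(relation_shapes):
    """Map each object type to its object count, checking consistency.

    `relation_shapes` is an iterable of (shape, row_type, col_type)
    tuples, where shape is the (n_rows, n_cols) of a relation matrix.
    """
    counts = {}
    for (n_rows, n_cols), row_type, col_type in relation_shapes:
        for obj_type, size in ((row_type, n_rows), (col_type, n_cols)):
            # Remember the first size seen, then compare later ones to it.
            known = counts.setdefault(obj_type, size)
            if known != size:
                raise ValueError(
                    f"{obj_type} has {known} objects in one relation "
                    f"but {size} in another"
                )
    return counts


# Shapes of R12, R13 and R23 from the example above.
relation_shapes = [
    ((50, 100), "Type 1", "Type 2"),
    ((50, 40), "Type 1", "Type 3"),
    ((100, 40), "Type 2", "Type 3"),
]
counts = infer_object_counts(relation_shapes)
print(counts)
```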

and then collectively infer the latent data model:

 >>> fuser = fusion.Dfmf()
 >>> fuser.fuse(fusion_graph)
 >>> print(fuser.factor(t1).shape)
 (50, 10)
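
The shape `(50, 10)` reflects the structure of the model: collective matrix tri-factorization approximates each relation as R_ij ≈ G_i S_ij G_j^T, where G_i has one row per object of type i and one column per latent dimension (the rank given to `ObjectType`). The NumPy-only sketch below illustrates just the shapes involved. The factors are random placeholders rather than fitted values, the variable names are chosen for illustration, and nothing here calls scikit-fusion.

```python
import numpy as np

rng = np.random.default_rng(0)

# Number of objects and latent rank per object type, as in the example above.
n_objects = {"Type 1": 50, "Type 2": 100, "Type 3": 40}
ranks = {"Type 1": 10, "Type 2": 20, "Type 3": 30}

# One factor matrix G_i per object type: (objects of type i) x (rank of type i).
G = {t: rng.random((n_objects[t], ranks[t])) for t in n_objects}

# One backbone matrix S_ij per relation: (rank of type i) x (rank of type j).
S12 = rng.random((ranks["Type 1"], ranks["Type 2"]))

# Reconstruction of the Type 1 x Type 2 relation: G_1 S_12 G_2^T.
R12_hat = G["Type 1"] @ S12 @ G["Type 2"].T

print(G["Type 1"].shape)  # (50, 10)
print(S12.shape)          # (10, 20)
print(R12_hat.shape)      # (50, 100)
```

Because every relation involving Type 1 reuses the same G_1, information from `R12` and `R13` is combined in a single latent representation of the Type 1 objects.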

Afterwards, new data might arrive, here for ten additional objects of Type 1:

 >>> new_R12 = np.random.rand(10, 100)
 >>> new_R13 = np.random.rand(10, 40)

for which we define the fusion graph:

 >>> new_relations = [fusion.Relation(new_R12, t1, t2),
 ...                  fusion.Relation(new_R13, t1, t3)]
 >>> new_graph = fusion.FusionGraph(new_relations)

and transform new objects to the latent space induced by the fuser:

 >>> transformer = fusion.DfmfTransform()
 >>> transformer.transform(t1, new_graph, fuser)
 >>> print(transformer.factor(t1).shape)
 (10, 10)
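
Conceptually, transforming new objects means finding latent rows for them while the previously learned factors stay fixed. The sketch below shows one simple way to do this for a single relation: an ordinary least-squares projection with NumPy. It is an illustrative stand-in under stated assumptions, not a description of what `DfmfTransform` does internally. All variable names and the random "fitted" factors are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder "fitted" quantities for the Type 1 x Type 2 relation.
S12 = rng.random((10, 20))    # backbone: rank(Type 1) x rank(Type 2)
G2 = rng.random((100, 20))    # factor of Type 2: 100 objects x rank 20

# New Type 1 objects generated exactly from the model, so the
# projection should recover their latent rows.
G1_new_true = rng.random((10, 10))
new_R12 = G1_new_true @ S12 @ G2.T   # 10 new objects x 100

# Solve min_G || new_R12 - G @ B ||_F with B = S12 @ G2.T held fixed.
B = S12 @ G2.T                       # 10 x 100
# lstsq solves B.T @ X = new_R12.T, so X.T is the wanted G.
X, *_ = np.linalg.lstsq(B.T, new_R12.T, rcond=None)
G1_new = X.T

print(G1_new.shape)  # (10, 10)
print(np.allclose(G1_new, G1_new_true, atol=1e-6))
```

As in the scikit-fusion example, ten new objects map to ten rows in the 10-dimensional latent space of Type 1.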

scikit-fusion also bundles datasets from several data fusion applications, which load directly as fusion graphs:

>>> from skfusion import datasets
>>> dicty = datasets.load_dicty()
>>> print(dicty)
FusionGraph(Object types: 3, Relations: 3)
>>> print(dicty.object_types)
{ObjectType(GO term), ObjectType(Experimental condition), ObjectType(Gene)}
>>> print(dicty.relations)
{Relation(ObjectType(Gene), ObjectType(GO term)),
 Relation(ObjectType(Gene), ObjectType(Gene)),
 Relation(ObjectType(Gene), ObjectType(Experimental condition))}

Tutorials

  • Large-scale data fusion by collective matrix factorization, Basel Computational Biology Conference, [BC]^2 [Slides] [Handouts]
  • Data fusion of everything, 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC [Slides] [Handouts]