eliorc / Node2vec

License: MIT
Implementation of the node2vec algorithm.


Node2Vec

Python3 implementation of the node2vec algorithm by Aditya Grover, Jure Leskovec and Vid Kocijan. Reference: node2vec: Scalable Feature Learning for Networks. A. Grover, J. Leskovec. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2016.

Installation

pip install node2vec

Usage

import networkx as nx
from node2vec import Node2Vec

# Create a graph
graph = nx.fast_gnp_random_graph(n=100, p=0.5)

# Precompute probabilities and generate walks - **ON WINDOWS ONLY WORKS WITH workers=1**
node2vec = Node2Vec(graph, dimensions=64, walk_length=30, num_walks=200, workers=4)  # Use temp_folder for big graphs

# Embed nodes
model = node2vec.fit(window=10, min_count=1, batch_words=4)  # Any keywords acceptable by gensim.Word2Vec can be passed, `dimensions` and `workers` are automatically passed (from the Node2Vec constructor)

# Look for most similar nodes
model.wv.most_similar('2')  # Output node names are always strings

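# Save embeddings for later use
EMBEDDING_FILENAME = './embeddings.emb'  # example path, adjust as needed
model.wv.save_word2vec_format(EMBEDDING_FILENAME)

# Save model for later use
EMBEDDING_MODEL_FILENAME = './embeddings.model'  # example path, adjust as needed
model.save(EMBEDDING_MODEL_FILENAME)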

# Embed edges using Hadamard method
from node2vec.edges import HadamardEmbedder

edges_embs = HadamardEmbedder(keyed_vectors=model.wv)

# Look for embeddings on the fly - here we pass normal tuples
edges_embs[('1', '2')]
''' OUTPUT
array([ 5.75068220e-03, -1.10937878e-02,  3.76693785e-01,  2.69105062e-02,
       ... ... ....
       ..................................................................],
      dtype=float32)
'''

# Get all edges in a separate KeyedVectors instance - use with caution, this could be huge for big networks
edges_kv = edges_embs.as_keyed_vectors()

# Look for most similar edges - this time tuples must be sorted and as str
edges_kv.most_similar(str(('1', '2')))

# Save embeddings for later use
EDGES_EMBEDDING_FILENAME = './edges.emb'  # example path, adjust as needed
edges_kv.save_word2vec_format(EDGES_EMBEDDING_FILENAME)
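
To show what the saved embedding file contains, here is a minimal sketch of a reader for the plain-text word2vec format. It assumes the file was written in the default text mode (not binary): a header line with the vector count and dimensionality, then one key and its values per line. The helper name load_word2vec_text and the sample data are hypothetical. In practice, gensim's KeyedVectors.load_word2vec_format is the usual way to load such a file.

```python
import os
import tempfile


def load_word2vec_text(path):
    """Read a text-mode word2vec file into a dict of key -> list of floats.

    Hypothetical helper for illustration; not part of node2vec or gensim.
    """
    vectors = {}
    with open(path, encoding='utf-8') as handle:
        # Header line: "<number of vectors> <dimensions>"
        count, dims = map(int, handle.readline().split())
        for line in handle:
            line = line.rstrip()
            if not line:
                continue
            # Split from the right so keys that contain spaces stay intact
            parts = line.rsplit(' ', dims)
            vectors[parts[0]] = [float(value) for value in parts[1:]]
    if len(vectors) != count:
        raise ValueError('header count does not match the number of vectors read')
    return vectors, dims


# Made-up sample file with two 3-dimensional node vectors
sample = "2 3\n1 0.1 0.2 0.3\n2 -0.4 0.5 0.6\n"
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, 'embeddings.emb')
    with open(sample_path, 'w', encoding='utf-8') as handle:
        handle.write(sample)
    vectors, dims = load_word2vec_text(sample_path)
```

Keys come back as strings, consistent with the note above that output node names are always strings.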

Parameters

node2vec.Node2Vec

  • Node2Vec constructor:

    1. graph: The first positional argument has to be a networkx graph. Node names must be all integers or all strings. On the output model they will always be strings.
    2. dimensions: Embedding dimensions (default: 128)
    3. walk_length: Number of nodes in each walk (default: 80)
    4. num_walks: Number of walks per node (default: 10)
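    5. p: Return hyperparameter, controls how likely a walk is to step straight back to the node it just came from (default: 1)
    6. q: In-out hyperparameter, controls whether a walk tends to stay close to the previous node or move further away from it (default: 1)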
    7. weight_key: On weighted graphs, this is the key for the weight attribute (default: 'weight')
    8. workers: Number of workers for parallel execution (default: 1)
    9. sampling_strategy: Node-specific sampling strategies. Supports setting node-specific 'q', 'p', 'num_walks' and 'walk_length'. Use these keys exactly. If not set, the global values passed at object initialization are used.
    10. quiet: Boolean controlling the verbosity. (default: False)
    11. temp_folder: String path pointing to folder to save a shared memory copy of the graph - Supply when working on graphs that are too big to fit in memory during algorithm execution.
    12. seed: Seed for the random number generator (default: None). Deterministic results can be obtained if seed is set and workers=1.
  • Node2Vec.fit method: Accepts any keyword argument accepted by gensim.Word2Vec
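
To make the roles of p, q, walk_length and num_walks concrete, here is a minimal sketch of the second-order biased walk described in the node2vec paper, for an unweighted graph. It is not this library's code: the adjacency-dict input and the function names biased_walk and generate_walks are assumptions made for illustration. When the walk has just moved from a previous node to the current one, each candidate next node gets weight 1/p if it is the previous node, 1 if it is also a neighbour of the previous node, and 1/q otherwise.

```python
import random


def biased_walk(adj, start, walk_length, p=1.0, q=1.0, rng=None):
    """Return one biased walk (a list of nodes) over an unweighted graph.

    adj maps each node to the set of its neighbours (hypothetical input format).
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < walk_length:
        current = walk[-1]
        neighbours = sorted(adj[current])
        if not neighbours:
            break  # dead end, stop early
        if len(walk) == 1:
            # No previous node yet, so the first step is uniform
            walk.append(rng.choice(neighbours))
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbours:
            if candidate == previous:
                weights.append(1.0 / p)  # step back to the previous node
            elif candidate in adj[previous]:
                weights.append(1.0)      # stay at distance 1 from the previous node
            else:
                weights.append(1.0 / q)  # move away from the previous node
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


def generate_walks(adj, num_walks, walk_length, p=1.0, q=1.0, seed=None):
    """Start num_walks walks from every node; nodes are emitted as strings."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        nodes = list(adj)
        rng.shuffle(nodes)
        for node in nodes:
            walk = biased_walk(adj, node, walk_length, p, q, rng)
            walks.append([str(n) for n in walk])
    return walks


# Small made-up graph: a triangle 0-1-2 with a tail 2-3
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
walks = generate_walks(adj, num_walks=2, walk_length=5, p=0.5, q=2.0, seed=42)
```

With p below 1 and q above 1, as in this call, walks tend to backtrack and stay local. The resulting lists of string tokens are the kind of "sentences" a Word2Vec-style model is then trained on.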

node2vec.EdgeEmbedder

EdgeEmbedder is an abstract class from which all concrete edge embedding classes inherit. The classes are AverageEmbedder, HadamardEmbedder, WeightedL1Embedder and WeightedL2Embedder, whose practical definitions can be found in Table 1 of the paper. Note that edge embeddings are defined for any pair of nodes, connected or not, and even for a node paired with itself.
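
As a rough illustration of the four operators from Table 1 of the paper, the sketch below applies each one element-wise to two node vectors held as plain Python lists. The function names and sample vectors are made up for this example. The library's embedder classes work on the vectors stored in keyed_vectors, not on this code.

```python
def average(u, v):
    """Average: (u_i + v_i) / 2 for every dimension i."""
    return [(a + b) / 2 for a, b in zip(u, v)]


def hadamard(u, v):
    """Hadamard: element-wise product u_i * v_i."""
    return [a * b for a, b in zip(u, v)]


def weighted_l1(u, v):
    """Weighted-L1: element-wise absolute difference |u_i - v_i|."""
    return [abs(a - b) for a, b in zip(u, v)]


def weighted_l2(u, v):
    """Weighted-L2: element-wise squared difference (u_i - v_i) ** 2."""
    return [(a - b) ** 2 for a, b in zip(u, v)]


# Two made-up 3-dimensional node embeddings
u = [1.0, -2.0, 0.5]
v = [3.0, 4.0, 0.5]

edge_average = average(u, v)      # [2.0, 1.0, 0.5]
edge_hadamard = hadamard(u, v)    # [3.0, -8.0, 0.25]
edge_l1 = weighted_l1(u, v)       # [2.0, 6.0, 0.0]
edge_l2 = weighted_l2(u, v)       # [4.0, 36.0, 0.0]
```

All four operators are symmetric in u and v and keep the dimensionality of the node embeddings. They need only the two node vectors, which is why an embedding exists for any pair of nodes, including a node paired with itself.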

  • Constructor:

    1. keyed_vectors: A gensim.models.KeyedVectors instance containing the node embeddings
    2. quiet: Boolean controlling the verbosity. (default: False)
  • EdgeEmbedder.__getitem__(item) method, better known as EdgeEmbedder[item]:

    1. item - A tuple consisting of 2 nodes from the keyed_vectors passed in the constructor. Will return the embedding of the edge.
  • EdgeEmbedder.as_keyed_vectors method: Returns a gensim.models.KeyedVectors instance with all possible node pairs in a sorted manner as string. For example, for nodes ['1', '2', '3'] we will have as keys "('1', '1')", "('1', '2')", "('1', '3')", "('2', '2')", "('2', '3')" and "('3', '3')".
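
The key layout described for as_keyed_vectors can be reproduced with the standard library alone. This is a sketch of the naming scheme only, not the library's implementation, and the helper names edge_keys and lookup_key are hypothetical.

```python
from itertools import combinations_with_replacement


def edge_keys(nodes):
    """List every unordered node pair, self-pairs included, as a string key."""
    return [str(pair) for pair in combinations_with_replacement(sorted(nodes), 2)]


def lookup_key(node_a, node_b):
    """Build the key for one pair: sort the two nodes, then stringify the tuple."""
    return str(tuple(sorted((node_a, node_b))))


keys = edge_keys(['3', '1', '2'])
key_for_pair = lookup_key('2', '1')
```

For n nodes this produces n * (n + 1) / 2 keys, so the number of edge vectors grows quadratically with the size of the network.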

Caveats

  • Node names in the input graph must be all strings or all ints
  • Parallel execution does not work on Windows (known joblib issue). To run without parallelism on Windows, pass workers=1 to the Node2Vec constructor

TODO

  • [x] Parallel implementation for walk generation
  • [ ] Parallel implementation for probability precomputation