All Projects → jinhongjung → Pyrwr

jinhongjung / Pyrwr

Licence: mit
Python Implementation for Random Walk with Restart (RWR)

Programming Languages

python
139335 projects - #7 most used programming language
python3
1442 projects

Projects that are alternatives of or similar to Pyrwr

Reddit Detective
Play detective on Reddit: Discover political disinformation campaigns, secret influencers and more
Stars: ✭ 129 (+168.75%)
Mutual labels:  graph, network
Programming Languages Influence
Code to retrieve data for the programming languages influence visualizations from Freebase
Stars: ✭ 171 (+256.25%)
Mutual labels:  graph, network
Workbase
Grakn Workbase (Knowledge IDE)
Stars: ✭ 106 (+120.83%)
Mutual labels:  graph, network
Serial Studio
Multi-purpose serial data visualization & processing program
Stars: ✭ 1,168 (+2333.33%)
Mutual labels:  graph, network
Stabping
Continuously monitor your connection/ISP's latency & speed and view them in interactive charts
Stars: ✭ 8 (-83.33%)
Mutual labels:  graph, network
G6
♾ A Graph Visualization Framework in JavaScript
Stars: ✭ 8,490 (+17587.5%)
Mutual labels:  graph, network
Urbanaccess
A tool for GTFS transit and OSM pedestrian network accessibility analysis
Stars: ✭ 137 (+185.42%)
Mutual labels:  graph, network
P2p Graph
Real-time P2P network visualization with D3
Stars: ✭ 245 (+410.42%)
Mutual labels:  graph, network
Vue D3 Network
Vue component to graph networks using d3-force
Stars: ✭ 415 (+764.58%)
Mutual labels:  graph, network
Ngraph.graph
Graph data structure in JavaScript
Stars: ✭ 295 (+514.58%)
Mutual labels:  graph, network
Graphrole
Automatic feature extraction and node role assignment for transfer learning on graphs (ReFeX & RolX)
Stars: ✭ 38 (-20.83%)
Mutual labels:  graph, network
Subdue
The Subdue graph miner discovers highly-compressing patterns in an input graph.
Stars: ✭ 20 (-58.33%)
Mutual labels:  graph, network
Osmnet
Tools for the extraction of OpenStreetMap street network data
Stars: ✭ 39 (-18.75%)
Mutual labels:  graph, network
Llama Archive
Loss & LAtency MAtrix
Stars: ✭ 44 (-8.33%)
Mutual labels:  network
Vue Network
Render a Vue component to indicate network status.
Stars: ✭ 42 (-12.5%)
Mutual labels:  network
Resonance
◾️Resonance | 5kb React animation library
Stars: ✭ 1,011 (+2006.25%)
Mutual labels:  graph
Rats Search
BitTorrent P2P multi-platform search engine for Desktop and Web servers with integrated torrent client.
Stars: ✭ 1,037 (+2060.42%)
Mutual labels:  network
Aplay
A Better(Maybe) iOS Audio Stream、Cache、Play Framework
Stars: ✭ 44 (-8.33%)
Mutual labels:  network
Dknetworking
基于 AFNetworking + YYCache 的二次封装,支持缓存策略的网络请求框架
Stars: ✭ 41 (-14.58%)
Mutual labels:  network
Qualia2.0
Qualia is a deep learning framework deeply integrated with automatic differentiation and dynamic graphing with CUDA acceleration. Qualia was built from scratch.
Stars: ✭ 41 (-14.58%)
Mutual labels:  graph

pyrwr

Python Implementation for Random Walk with Restart (PyRWR).

Random Walk with Restart (RWR) is one of famous link analysis algorithms, which measures node-to-node proximities in arbitrary types of graphs (networks). The representative applications include various real-world graph mining tasks such as personalized node ranking, recommendation in graphs (e.g., 'who you may know'), and anormaly detection.

pyrwr aims to implement algorithms for computing RWR scores based on Power Iteration using numpy and scipy in Python. More specifically, pyrwr focuses on computing a single source RWR score vector w.r.t. a given query (seed) node, which is used for a personalized node ranking w.r.t. the querying node. Besides RWR, pyrwr supports computing Personalized PageRank (PPR) with multiple seeds and PageRank which are well-known variants of RWR.

The supported features of pyrwr are:

  • Query types
    • Random Walk with Restart (RWR): personalized ranking; only a single seed is allowed
    • Personalized PageRank (PPR): personalized ranking; multiple seeds are allowed
    • PageRank: global ranking; all nodes are used as seeds
  • Graph types
    • Unweighted/weighted graphs
    • Undirected/directed graphs

If you are interested in studying random walk based ranking models such as PageRank and RWR, please consider this hands-on tutorial (https://github.com/jinhongjung/tutorial-on-link-analysis) that provides how to correctly implement the algorithms of those models in Python and to analyze real-world networks using the ranking models.

Installation

To install this package, type the following:

cd pyrwr
pip3 install -r requirements.txt
python3 setup.py install

These will execute the installation of python modules required by this package. The name of the installed program is pyrwr. If you want to validate whether the installation is successfully finished, type the following command:

pyrwr --help

Requirements

  • numpy
  • scipy
  • tqdm
  • fire
  • pandas

Usage

We provide the following simple command line usage:

pyrwr --query-type rwr --graph-type directed --input-path data/directed/sample.tsv --output-path output/scores.tsv --seeds 10982

This will compute an RWR score vector w.r.t. the seed node given by --seeds in the given graph specified by --input-path, and write the vector into the target file in --output-path. --query-type specifies the type of query, e.g., this example indicates an RWR query. The detailed format of the input and output files is described below.

Input and Output Format

Input Format

The default input of pyrwr represents the edge list of a graph with the following format (tab separated):

# format: source (int) \t target (int)
1	2
1	4
2	3
...

The above example represents an unweighted and directed graph where each line indicates an edge from source to target. In this case, the edge weight is set to 1 uniformly. The node id should be non-negative (>= 0). To vary weights edge by edge, use the following format:

# format: source (int) \t target (int) \t weight (float)
1	2	1.5	
1	4	3.5
2	3	6.0
...

Note that

  • RWR is defined on positively weighted networks; thus, only positive weights are allowed.
  • If there are redundant edges in an weighted network, their weights will be summed, e.g.,
# Suppose there are the following redundant edges in the original file.
1   2   3
1   2   5
---------
# Then, their weights are summed in the final adjacency matrix as follows.
1   2   8

Output Format

The default output of pyrwr contains the single source RWR score vector w.r.t. the given seed node as follows:

# format : node id \t an RWR score of the node
1   0.1232e-3
2   0.0349e-4
...

How to Use pyrwr in My Codes?

The following example shows how to import pyrwr and compute an RWR query.

from pyrwr.rwr import RWR

rwr = RWR()
rwr.read_graph(input_graph, graph_type)
r = rwr.compute(seed, c, epsilon, max_iters)

Note that seed should be int. The format of the file at input_graph should follow one of the above input formats. r is a column vector (ndarray) having the RWR score vector w.r.t. seed node. The shape of r will be (n, 1) where n is the number of nodes.

For a PPR query, see the following code:

from pyrwr.ppr import PPR

ppr = PPR()
ppr.read_graph(input_graph, graph_type)
r = ppr.compute(seeds, c, epsilon, max_iters)

In this case, seeds is the list of seeds. r is the PPR score vector w.r.t. seeds. Note that the PPR vector r is used for obtaining the personalized node ranking list related to all seeds in the seeds list.

For the PageRank query, use the following snippet:

from pyrwr.pagerank import PageRank

pagerank = PageRank()
pagerank.read_graph(input_graph, graph_type)
r = pagerank.compute(c, epsilon, max_iters)

Note that for pagerank, we do not need to specify seeds since PageRank is a global ranking; thus, it automatically sets required seeds (i.e., all nodes are used as seeds).

Arguments of pyrwr

We summarize the input arguments of pyrwr in the following table:

Arguments Query Type Explanation Default
query-type common Query type among [rwr, ppr, pagerank] None
graph-type common Graph type among [directed, undirected] None
input-path common Input path for a graph None
output-path common Output path for storing a query result None
seeds rwr A single seed node id None
seeds ppr File path for seeds (str) or list of seeds (list) []
c common Restart probablity (rwr) or jumping probability (otherwise) 0.15
epsilon common Error tolerance for power iteration 1e-9
max-iters common Maximum number of iterations for power iteration 100
handles-deadend common If true, handles the deadend issue True

The value of Query Type in the above table is one of the followings:

  • common: parameter of all of rwr, ppr, and pagerank
  • rwr: parameter of rwr
  • ppr: parameter of ppr
  • pagerank: parameter of pagerank

Note the followings:

  • If you want to compute pagerank query, then do not need to specify seeds.
  • For directed graphs, there might be deadend nodes whose outdegree is zero. In this case, a naive power iteration would incur leaking out scores. handles_deadend exists for such issue handling deadend nodes. With handles_deadend, you can guarantee that the sum of a score vector is 1. Otherwise, the sum would less than 1 in directed graphs. The strategy pyrwr exploits is that whenever a random surfer visits a deadend node, go back to a seed node (or one of seed nodes), and restart. See this for the detailed technique of the strategy.
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].