All Projects → lhelwerd → WebGraph

lhelwerd / WebGraph

Licence: GPL-3.0 License
WebGraph framework with extensions

Programming Languages

java
68154 projects - #9 most used programming language
shell
77523 projects
python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to WebGraph

yiufcrawl
Unofficial preservationist fork of DCSS
Stars: ✭ 13 (-40.91%)
Mutual labels:  forked-repo
tes3mp-android
tes3mp ported to Android devices (using CrabNet). Forked from xyzz/openmw-android.
Stars: ✭ 47 (+113.64%)
Mutual labels:  forked-repo
useful-forks.github.io
Improving GitHub's Forks list discoverability through automatic filtering. The project offers an online tool and a Chrome extension.
Stars: ✭ 917 (+4068.18%)
Mutual labels:  forked-repo
vpsolver
Arc-flow Vector Packing Solver (VPSolver)
Stars: ✭ 86 (+290.91%)
Mutual labels:  graph-compression
MIRROR-HUNTER
Who are we? We are the Hunters of all Torrent in this world.🗡️.Fork from SlamDevs
Stars: ✭ 86 (+290.91%)
Mutual labels:  forked-repo
How to set up and use the WebGraph framework

In order to compile the WebGraph framework and set it up so that it can be run,
the following steps should be followed:

- Download and extract Apache Maven binaries from
  http://maven.apache.org/download.cgi
- Download the dependencies tarball from http://webgraph.di.unimi.it and
  extract it.
- Retrieve the WebGraph framework source code with extensions from this
  repository. This version contains the copy list and copy flags compression
  formats, as well as additional flags for the other compression schemes.
- Compile the JAR file of the framework using Maven by running "mvn install"
  within the WebGraph root directory.
- Copy the target/webgraph-3.4.2.jar file it to the same location as the JAR
  files from the dependencies.

In order to use Maven on a computer where it is not yet in the PATH, one
can run mvn.sh provided in the repository. This sets up a link to the Maven
root directory. Note that it only keeps the path setting for the current
session by default. Instructions on adding the paths to a .bashrc file, for
example, are given by the script. Otherwise, one should either run the script
in a bash with ". ./mvn.sh" and run mvn afterwards, or with "./mvn.sh install"
(i.e., in place of mvn).

The framework has different modes of operations that allow compressing,
decompressing and testing graph files in different formats. For example, the
following command recompresses a dataset with other parameters:
java -cp "*" it.unimi.dsi.webgraph.BVGraph -o -m 1 ../sets/uk-2002 ../sets/uk-fast

This command creates an offsets file and alters the maximum reference chain
length of the given dataset.

The following parameters can be used to alter the implemented compression
algorithms:
- -m determines the maximum length of a reference chain that we can make
  during reference compression. -1 is essentially the same as allowing any
  length of reference chains, while 0 skips reference compression, which is
  used by copy list, copy blocks and copy flags compression.
- -w is the window size. 0 skips all reference compression, which is used by
  copy list, copy blocks and copy flags compression.
- -i is interval minimum length. 0 skips interval compression.
- -r is residual compression. This is a boolean parameter, where the value
  0 skips gaps compression, and 1 performs gaps compression, which is the
  default.
- -b determines which copy compression is used. 0 is copy list, 1 is copy
  blocks (default) and 2 is copy flags. This is only used if the maximum
  reference length (-m) and window size (-w) are not 0.

As we can see, we can set certain parameters to 0 to make sure that their
respective compression schemes are not used. We can therefore test the algorithms separately. To be specific, we have the following parameters for the
given compression formats:

- No compression algorithm: -m 0 -w 0 -i 0 -r 0
- Gaps compression: -m 0 -w 0 -i 0
- Interval compression: -m 0 -w 0 -r 0
- Copy blocks: -i 0 -r 0 -b 1
- Copy list: -i 0 -r 0 -b 0
- Copy flags: -i 0 -r 0 -b 2

For instance, this command creates a graph file with only gaps compression:
java -cp "*" it.unimi.dsi.webgraph.BVGraph -m 0 -w 0 -i 0 ../sets/uk-2002 ../sets/uk-gaps

The compressed files are binary files, with special representations for numeric
values that improve compression compared to plain text formats. The framework
can also decompress the graphs and store them in a readable format. There are
two major plain text formats that the framework supports. The first plain text
format is the naive adjacency list format:
java -cp "*" it.unimi.dsi.webgraph.ASCIIGraph -g BVGraph ../sets/uk-2002 ../sets/uk-raw

The second format is a naive listing of pairs of edges:
java -cp "*" it.unimi.dsi.webgraph.ArcListASCIIGraph -g BVGraph ../sets/uk-2002 ../sets/uk.edges

One can also import a graph from these flat files, by swapping around the
"ASCIIGraph" or "ArcListASCIIGraph" class with the "BVGraph" class. This
requires that those files are sorted correctly and do not contain non-edge
data such as comments. For example:
java -cp "*" it.unimi.dsi.webgraph.BVGraph -g ArcListASCIIGraph ../sets/huge.txt.e ../sets/huge
Here, one can again add specific compression flags to tune which compression
and which settings to use. Note that this will almost always giv increased
compression since it is no longer stored as ASCII text but as binary codes,
skewing the comparison.

The WebGraph framework also provides a speed test module, which has been
adapted to use CPU time instead of wall-clock time. The speed test has two
different modes in which it can operate. By default, the speed test performs
sequential testing, which times how long it takes to expand the whole
compressed graph. The following command can be used to run a sequential speed
test:
java -cp "*" it.unimi.dsi.webgraph.test.SpeedTest -g BVGraph ../sets/uk-gaps

We can also perform random access testing. This mode can be started by giving
the parameter -r. This parameter requires a sample size as value, which
determines how many randomly chosen nodes we use in the test in order to
measure and average the random access times. This command can be used to run
a random access speed test:
java -cp "*" it.unimi.dsi.webgraph.test.SpeedTest -g BVGraph -r 100000 ../sets/uk-gaps

Finally, if the WebGraph framework runs out of memory, one can increase the
Java heap size using the -Xmx command line flag and the Java thread stack size
using -Xss.
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].