
mponza / Wikipediarelatedness

Licence: apache-2.0
The Wikipedia Relatedness library

Programming Languages

scala


Projects that are alternatives of or similar to Wikipediarelatedness

copyvios
A copyright violation detector running on Wikimedia Cloud Services
Stars: ✭ 32 (+77.78%)
Mutual labels:  wikipedia
Wptools
Wikipedia tools (for Humans): easily extract data from Wikipedia, Wikidata, and other MediaWikis
Stars: ✭ 371 (+1961.11%)
Mutual labels:  wikipedia
Wikipedia2vec
A tool for learning vector representations of words and entities from Wikipedia
Stars: ✭ 655 (+3538.89%)
Mutual labels:  wikipedia
Wikipediakit
Wikipedia API Client Framework for Swift on macOS, iOS, watchOS, and tvOS
Stars: ✭ 270 (+1400%)
Mutual labels:  wikipedia
Adam qas
ADAM - A Question Answering System. Inspired from IBM Watson
Stars: ✭ 330 (+1733.33%)
Mutual labels:  wikipedia
Wikiteam
Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to tiniest wikis. As of 2020, WikiTeam has preserved more than 250,000 wikis.
Stars: ✭ 404 (+2144.44%)
Mutual labels:  wikipedia
WikimediaUI-Style-Guide
Wikimedia Design Style Guide with user interface focus, authored by Wikimedia Foundation Design team.
Stars: ✭ 93 (+416.67%)
Mutual labels:  wikipedia
Wikiquiz
Generates a quiz for a Wikipedia page using parts of speech and text chunking.
Stars: ✭ 778 (+4222.22%)
Mutual labels:  wikipedia
Jivesearch
A search engine that doesn't track you.
Stars: ✭ 364 (+1922.22%)
Mutual labels:  wikipedia
Search Deflector
A small program that forwards searches from Cortana to your preferred browser and search engine.
Stars: ✭ 620 (+3344.44%)
Mutual labels:  wikipedia
Wit
WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.
Stars: ✭ 271 (+1405.56%)
Mutual labels:  wikipedia
Fel
Fast Entity Linker Toolkit for training models to link entities to KnowledgeBase (Wikipedia) in documents and queries.
Stars: ✭ 319 (+1672.22%)
Mutual labels:  wikipedia
Mwparserfromhell
A Python parser for MediaWiki wikicode
Stars: ✭ 440 (+2344.44%)
Mutual labels:  wikipedia
OA-signalling
A project to coordinate implementing a system to signal whether references cited on Wikipedia are free to reuse
Stars: ✭ 19 (+5.56%)
Mutual labels:  wikipedia
Dns Over Wikipedia
Redirect `.idk` domains using the official link found on a topic's Wikipedia page
Stars: ✭ 669 (+3616.67%)
Mutual labels:  wikipedia
verssion
RSS feeds of stable release versions, as found in Wikipedia.
Stars: ✭ 15 (-16.67%)
Mutual labels:  wikipedia
Kiwix Android
Kiwix for Android
Stars: ✭ 390 (+2066.67%)
Mutual labels:  wikipedia
Reality
Comprehensive data proxy to knowledge about real world
Stars: ✭ 795 (+4316.67%)
Mutual labels:  wikipedia
Listen To Wikipedia
Live, generative music from Wikipedia edits
Stars: ✭ 685 (+3705.56%)
Mutual labels:  wikipedia
Wtf wikipedia
a pretty-committed wikipedia markup parser
Stars: ✭ 475 (+2538.89%)
Mutual labels:  wikipedia

This branch lets you apply the Two-Stage Framework to any kind of graph with only two commands.

Setting Up

In your working directory, just type:

git clone https://github.com/mponza/WikipediaRelatedness.git
cd WikipediaRelatedness
wget https://piccolo.link/sbt-0.13.17.zip; unzip sbt-0.13.17.zip; rm sbt-0.13.17.zip

These commands clone this repository and download sbt 0.13.17 into it.

Indexing

First, several resources must be indexed and pre-processed before running the Two-Stage Framework. This can be done automatically with:

src/main/bash/build.sh path/to/graph.tsv path/to/two-stage-data

where graph.tsv is the input graph in TSV format (a gzipped .tsv.gz file is also accepted) and two-stage-data is the directory that will host all the resources indexed for running the Two-Stage Framework.
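The README does not spell out the column layout of graph.tsv. The following is a hedged sketch that assumes each line holds one directed edge as two integer node ids separated by a tab. Under that assumption, the Scala helper reads a .tsv or .tsv.gz edge list into an adjacency map, which can be handy for sanity-checking a graph before running build.sh. TsvGraphLoader and its methods are hypothetical names and are not part of this repository.

```scala
import java.io.{BufferedReader, FileInputStream, InputStream, InputStreamReader}
import java.nio.charset.StandardCharsets
import java.util.zip.GZIPInputStream
import scala.collection.mutable

// Hypothetical helper: NOT part of the WikipediaRelatedness codebase.
object TsvGraphLoader {

  /** Opens a file, decompressing it on the fly when the path ends with ".gz". */
  def open(path: String): InputStream = {
    val raw = new FileInputStream(path)
    if (path.endsWith(".gz")) new GZIPInputStream(raw) else raw
  }

  /** Parses lines of the ASSUMED form "src<TAB>dst" into a directed adjacency map.
    * Blank lines are skipped; any extra columns are ignored. */
  def parse(lines: Iterator[String]): Map[Int, Vector[Int]] = {
    val adj = mutable.Map.empty[Int, mutable.ArrayBuffer[Int]]
    for (line <- lines if line.trim.nonEmpty) {
      val cols = line.split('\t')
      val src = cols(0).trim.toInt
      val dst = cols(1).trim.toInt
      adj.getOrElseUpdate(src, mutable.ArrayBuffer.empty[Int]) += dst
    }
    // Freeze the mutable buffers into an immutable map of vectors.
    adj.map { case (node, out) => node -> out.toVector }.toMap
  }

  /** Loads a .tsv or .tsv.gz edge list from disk (the whole file is read before closing). */
  def load(path: String): Map[Int, Vector[Int]] = {
    val reader = new BufferedReader(new InputStreamReader(open(path), StandardCharsets.UTF_8))
    try parse(Iterator.continually(reader.readLine()).takeWhile(_ != null))
    finally reader.close()
  }
}
```

For example, TsvGraphLoader.load("path/to/graph.tsv.gz") would return a map from each source node to its out-neighbors, so you can quickly check node counts and degrees.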

Running

To compute the Two-Stage Framework relatedness for a set of query nodes, type:

src/main/bash/query.sh k path/to/two-stage-data path/to/queries.tsv path/to/queries2rel.tsv

where k is the size of the subgraph (the paper fixes k = 30), two-stage-data is the same directory used in the Indexing step, queries.tsv is the TSV list of query nodes whose relatedness is to be computed, and queries2rel.tsv is the file where the computed relatedness values are saved.
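The README does not document the layout of queries2rel.tsv. This minimal Scala sketch assumes that each output line lists the query node ids followed by the relatedness score in the last tab-separated column. Under that assumption, it loads the results and ranks them. RelatednessResults and its methods are hypothetical names. Check the actual output file and adjust the parsing if its columns differ.

```scala
import scala.io.Source

// Hypothetical helper; the real layout of queries2rel.tsv may differ.
object RelatednessResults {

  /** One scored query: the node ids followed by the relatedness value. */
  final case class Scored(nodes: Vector[String], relatedness: Double)

  /** ASSUMES tab-separated lines with the relatedness score in the last column. */
  def parse(lines: Iterator[String]): Vector[Scored] =
    lines
      .filter(_.trim.nonEmpty)
      .map { line =>
        val cols = line.split('\t').toVector
        Scored(cols.init, cols.last.trim.toDouble)
      }
      .toVector

  /** Reads a results file in the assumed layout (fully consumed before closing). */
  def load(path: String): Vector[Scored] = {
    val src = Source.fromFile(path, "UTF-8")
    try parse(src.getLines()) finally src.close()
  }

  /** Returns the n highest-scoring queries, most related first. */
  def top(results: Vector[Scored], n: Int): Vector[Scored] =
    results.sortBy(-_.relatedness).take(n)
}
```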

Examples

For a toy example on a very small graph, see src/main/bash/example.sh.

To use the Two-Stage Framework in your own code, see the Main class for examples.

Datasets of Entity Relatedness Pairs

You can find the datasets WikiSim and WiRe in src/main/resources/datasets/WikiSim.csv and src/main/resources/datasets/WiRe.csv files, respectively.
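WikiSim and WiRe are collections of entity pairs annotated with gold relatedness judgments. Spearman's rank correlation is a common way to compare a method's scores with such gold judgments. The sketch below is a generic implementation, and Spearman is a hypothetical object that is not part of this repository. It assumes you have already aligned the gold scores from the CSV files with your predicted scores in the same order. CSV parsing is left out because the column layout is not described here.

```scala
// Hypothetical evaluation helper; not taken from the repository.
object Spearman {

  /** Ranks values 1..n, giving tied values the average of their positions. */
  def ranks(xs: IndexedSeq[Double]): Vector[Double] = {
    val sorted = xs.zipWithIndex.sortBy(_._1)
    val out = Array.fill(xs.length)(0.0)
    var i = 0
    while (i < sorted.length) {
      // Extend j over the run of values equal to sorted(i).
      var j = i
      while (j + 1 < sorted.length && sorted(j + 1)._1 == sorted(i)._1) j += 1
      val avg = (i + j) / 2.0 + 1.0 // average 1-based rank of the tie group
      for (k <- i to j) out(sorted(k)._2) = avg
      i = j + 1
    }
    out.toVector
  }

  /** Pearson correlation of two equally long samples. */
  def pearson(a: IndexedSeq[Double], b: IndexedSeq[Double]): Double = {
    require(a.length == b.length && a.length > 1, "need two samples of equal length >= 2")
    val ma = a.sum / a.length
    val mb = b.sum / b.length
    val cov = a.indices.map(i => (a(i) - ma) * (b(i) - mb)).sum
    val va = a.map(x => (x - ma) * (x - ma)).sum
    val vb = b.map(x => (x - mb) * (x - mb)).sum
    cov / math.sqrt(va * vb)
  }

  /** Spearman's rho: Pearson correlation computed on average ranks. */
  def rho(gold: IndexedSeq[Double], predicted: IndexedSeq[Double]): Double =
    pearson(ranks(gold), ranks(predicted))
}
```

Ties receive averaged ranks, which is the standard convention. The result is NaN when either input is constant, because the correlation is undefined in that case.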

Citation and Further Reading

If you find any resource (code or data) of this repository useful, please cite our paper:

Marco Ponza, Paolo Ferragina, Soumen Chakrabarti
A Two-Stage Framework for Computing Entity Relatedness in Wikipedia
In Proceedings of the 26th ACM International Conference on Information and Knowledge Management (CIKM 2017)

License

The code in this repository has been released under the Apache License 2.0.
