Licence: MIT license
WhatIs.this: simple entity resolution through Wikipedia


WhatIs.this

WhatIs.this is a quick probe for the meaning and metadata of concepts through Wikipedia.

Showcase

require 'whatis'

sparta = WhatIs.this('Sparta')
# => #<ThisIs Sparta [img] {37.081944,22.423611}>
sparta.coordinates
# => #<Geo::Coord 37.081944,22.423611>
sparta.image
# => "https://upload.wikimedia.org/wikipedia/commons/6/6c/Sparta_territory.jpg"

sparta.describe
# => Sparta
#            title: "Sparta"
#      description: "city-state in ancient Greece"
#      coordinates: #<Geo::Coord 37.081944,22.423611>
#          extract: "Sparta (Doric Greek: ; Attic Greek: ) was a prominent city-state in ancient Greece."
#            image: "https://upload.wikimedia.org/wikipedia/commons/6/6c/Sparta_territory.jpg"

# Fetch additional information: categories & translations:
sparta = WhatIs.this('Sparta', categories: true, languages: 'el')
# => #<ThisIs Sparta/Αρχαία Σπάρτη, 7 categories [img] {37.081944,22.423611}>
sparta.describe
# => Sparta
#            title: "Sparta"
#      description: "city-state in ancient Greece"
#      coordinates: #<Geo::Coord 37.081944,22.423611>
#       categories: ["Former countries in Europe", "Former populated places in Greece", "Locations in Greek mythology", "Populated places in Laconia", "Sparta", "States and territories disestablished in the 2nd century BC", "States and territories established in the 11th century BC"]
#        languages: {"el"=>#<ThisIs::Link el:Αρχαία Σπάρτη>}
#          extract: "Sparta (Doric Greek: ; Attic Greek: ) was a prominent city-state in ancient Greece."
#            image: "https://upload.wikimedia.org/wikipedia/commons/6/6c/Sparta_territory.jpg"

sparta.languages['el'].resolve
# => #<ThisIs Αρχαία Σπάρτη [img]>

# Multiple entities at once:
WhatIs.these('Paris', 'Berlin', 'Rome', 'Athens')
# => {
#   "Paris"=>#<ThisIs Paris [img] {48.856700,2.350800}>,
#   "Berlin"=>#<ThisIs Berlin [img] {52.516667,13.388889}>,
#   "Rome"=>#<ThisIs Rome [img] {41.900000,12.500000}>,
#   "Athens"=>#<ThisIs Athens [img] {37.983972,23.727806}>
# }

Applications

The gem is intended to be a simple tool for entity resolution/normalization. Possible uses:

  • You have a lot of user-entered answers to "What city are you from?". With WhatIs.these it is pretty easy to resolve them to a canonical city name (e.g. "Warsaw", "Warszawa", "Warsaw, Poland" => "Warsaw") and to map their locations;
  • Quick check on user-entered cultural objects, "what is it";
  • Canonical Wikipedia-powered translations of toponyms, movie titles and historical people;
  • ...and so on.
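
As a rough illustration of the first use case, the sketch below groups raw answers by the canonical title they resolve to. It does not call the gem or the network: Resolved and NotFound are hypothetical stand-ins for the objects WhatIs.these returns, and the lookup hash only imitates the shape of its result (input string => entity). With the real gem you would check the returned objects' classes instead, which is an assumption based on the inspect output shown in the Showcase.

```ruby
# Hypothetical stand-ins for the gem's result objects (illustration only).
Resolved = Struct.new(:title)
NotFound = Struct.new(:title)

# Maps each raw answer to its canonical title, or nil when it did not resolve.
# `lookup` is a Hash shaped like the result of WhatIs.these: input => entity.
def canonical_names(answers, lookup)
  answers.to_h do |answer|
    entity = lookup[answer]
    [answer, entity.is_a?(Resolved) ? entity.title : nil]
  end
end

warsaw = Resolved.new('Warsaw')
lookup = {
  'Warsaw'   => warsaw,
  'Warszawa' => warsaw,
  'Wrsaw'    => NotFound.new('Wrsaw')
}

names = canonical_names(%w[Warsaw Warszawa Wrsaw], lookup)

# Group the raw answers by the canonical name they resolved to,
# dropping the ones that did not resolve.
by_city = names.compact
               .group_by { |_answer, name| name }
               .transform_values { |pairs| pairs.map(&:first) }
```

Unresolved answers stay in names with a nil value, so they can be reviewed by hand or retried through search.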

Features/problems

  • Fetches Wikipedia data by entity name: canonical title, geographical coordinates, main page image, the first sentence of the article, and a short entity description from Wikidata;
  • Optionally fetches links to other Wikipedia language versions and the list of page categories;
  • Fetches any number of Wikipedia pages in a minimal number of API requests (50-page batches);
    • Note that despite this optimization, Wikipedia API responses are not very small, so resolving, say, 1000 entities will take some time;
  • Works with any language version of Wikipedia:
WhatIs[:de].this('München')
# => #<ThisIs München [img] {48.137222,11.575556}>
  • Handles pages that are not found and lets you search for them in place:
g = WhatIs.this('Guardians Of The Galaxy') # Wikipedia page titles are case-sensitive
# => #<ThisIs::NotFound Guardians Of The Galaxy>
g.search(3)
# => [#<ThisIs::Ambigous Guardians of the Galaxy (11 options)>, #<ThisIs Guardians of the Galaxy (film)>, #<ThisIs Guardians of the Galaxy Vol. 2>]
  • Handles disambiguation pages:
g = WhatIs.this('Guardians of the Galaxy')
# => #<ThisIs::Ambigous Guardians of the Galaxy (11 options)>
g.describe
# => Guardians of the Galaxy: ambigous (11 options)
#      #<ThisIs::Link Marvel Comics teams/Guardians of the Galaxy (1969 team)>: Guardians of the Galaxy (1969 team), the original 31st-century team from an alternative timeline of the Marvel Universe (Earth-691)
#      #<ThisIs::Link Marvel Comics teams/Guardians of the Galaxy (2008 team)>: Guardians of the Galaxy (2008 team), the modern version of the team formed in the aftermath of Annihilation: Conquest
#    <...skip...>
#      Usage: .variants[0].resolve, .resolve_all
g.variants[1].resolve(categories: true)
# => #<ThisIs Guardians of the Galaxy (2008 team), 13 categories>
  • Provides command-line tool:
$ whatis Paris Berlin Rome
Paris: Paris {48.856700,2.350800} - capital city of France
Berlin: Berlin {52.516667,13.388889} - capital city of Germany
Rome: Rome {41.900000,12.500000} - capital city of Italy

$ whatis --help
Usage: `whatis [options] title1, title2, title3

Options:
    -l, --language CODE              Which language Wikipedia to ask, 2-letter code. "en" by default
    -t, --languages [CODE]           Without argument, fetches all translations for entity.
                                     With argument (two-letter code) fetches only one translation.
                                     By default, no translations are fetched.
        --categories                 Whether to fetch entity categories
    -f, --format FORMAT              Output format: one line per entity ("short"), several lines per
                                     entity ("long"), or "json". Default is "short".
    -h, --help                       Show this message
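
The 50-page batching mentioned in the feature list can be pictured with a small sketch. This is not the gem's implementation: resolve_in_batches is a hypothetical helper, and the block passed to it stands in for a single API request. It only shows why 120 titles cost three requests rather than 120.

```ruby
# Batch size stated in the feature list above.
BATCH_SIZE = 50

# Splits the titles into batches and yields each batch to the caller's block,
# which stands in for one API request and must return a Hash of results.
# Returns all per-batch results merged into one Hash.
def resolve_in_batches(titles, batch_size: BATCH_SIZE)
  titles.uniq.each_slice(batch_size).reduce({}) do |results, batch|
    results.merge(yield(batch))
  end
end

requests = 0
titles = (1..120).map { |i| "Title #{i}" }

resolved = resolve_in_batches(titles) do |batch|
  requests += 1
  batch.to_h { |title| [title, title.upcase] } # fake "response" for the demo
end
```

Here requests ends up at 3 (batches of 50, 50 and 20). The size of each response still grows with the batch, which is why large lists remain slow.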

Note on disambiguation pages

Unfortunately, Wikipedia does not provide a consistent way to tell disambiguation pages from others; the only way to know is to look at the page's categories, which differ from language to language. Therefore, disambiguation currently works only for English, Ukrainian, Russian and Belarusian. Feel free to contribute disambiguation categories for your language version!
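
A minimal sketch of category-based detection, assuming a per-language table of marker categories. The table contents and the disambiguation? helper are illustrative assumptions, not the gem's actual list or API.

```ruby
# Illustrative table only; the gem's real per-language list may differ.
DISAMBIGUATION_CATEGORIES = {
  'en' => ['Disambiguation pages', 'All disambiguation pages']
}.freeze

# Returns true or false when the language is covered by the table,
# and nil when it is not, so callers can tell "not a disambiguation page"
# apart from "cannot tell for this language".
def disambiguation?(language, page_categories)
  markers = DISAMBIGUATION_CATEGORIES[language]
  return nil unless markers

  page_categories.any? { |category| markers.include?(category) }
end
```

Supporting another language version would then amount to adding one more key to the table.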

Usage

Run gem install whatis, or add gem "whatis" to your Gemfile.

Then use it as a library (see docs for WhatIs and its methods) or as a command-line tool (try $ whatis --help).

How it works

WhatIs.this is the little brother of the larger reality project. Under the hood, it uses the infoboxer semantic Wikipedia client.

Most of the information is taken from API response metadata, but for some features (resolving ambiguities) the Wikipedia page itself is parsed.

Unlike reality (which tries to be comprehensive), WhatIs.this tries to be as simple, yet as useful, as possible.

Author

Victor Shepelev

License

MIT
