
jackrusher / Mundaneum

Licence: other
A Clojure wrapper around Wikidata

Projects that are alternatives of or similar to Mundaneum

scholia
Wikidata-based scholarly profiles
Stars: ✭ 166 (+207.41%)
Mutual labels:  sparql, wikidata
Scholia
Wikidata scholarly profiles
Stars: ✭ 115 (+112.96%)
Mutual labels:  wikidata, sparql
SPARQL
Lib PHP for SPARQL 1.1
Stars: ✭ 23 (-57.41%)
Mutual labels:  sparql, wikidata
Wikibase Sdk
JS utils functions to query a Wikibase instance and simplify its results
Stars: ✭ 251 (+364.81%)
Mutual labels:  wikidata, sparql
WikidataQueryServiceR
An R package for the Wikidata Query Service API
Stars: ✭ 23 (-57.41%)
Mutual labels:  sparql, wikidata
Web Karma
Information Integration Tool
Stars: ✭ 489 (+805.56%)
Mutual labels:  sparql
Wikimama
Scripts to help matching OSM features to Wikidata items
Stars: ✭ 8 (-85.19%)
Mutual labels:  wikidata
Ontop
Ontop is a platform to query relational databases as Virtual RDF Knowledge Graphs using SPARQL
Stars: ✭ 419 (+675.93%)
Mutual labels:  sparql
Wptools
Wikipedia tools (for Humans): easily extract data from Wikipedia, Wikidata, and other MediaWikis
Stars: ✭ 371 (+587.04%)
Mutual labels:  wikidata
Trifid
Lightweight Linked Data Server and Proxy
Stars: ✭ 51 (-5.56%)
Mutual labels:  sparql
Qlever
Very fast SPARQL Engine, which can handle very large datasets like Wikidata, offers context-sensitive Autocompletion for SPARQL queries, and allows combination with Text Search. It's faster than anything else out there, in particular faster than Blazegraph or Virtuoso. The index builds are also much faster.
Stars: ✭ 46 (-14.81%)
Mutual labels:  sparql
Knowledge
combining wikidata and clojure core.logic
Stars: ✭ 16 (-70.37%)
Mutual labels:  wikidata
Easyrdf
EasyRdf is a PHP library designed to make it easy to consume and produce RDF.
Stars: ✭ 546 (+911.11%)
Mutual labels:  sparql
Sparql Engine
🚂 A framework for building SPARQL query engines in Javascript/Typescript
Stars: ✭ 39 (-27.78%)
Mutual labels:  sparql
Brightstardb
This is the core development repository for BrightstarDB.
Stars: ✭ 420 (+677.78%)
Mutual labels:  sparql
Word2vec
Training Chinese word vectors with Word2vec. Word2vec was created by a team of researchers led by Tomas Mikolov at Google.
Stars: ✭ 48 (-11.11%)
Mutual labels:  wikidata
Nlquery
Natural Language Engine on WikiData
Stars: ✭ 413 (+664.81%)
Mutual labels:  wikidata
Jena
Apache Jena
Stars: ✭ 700 (+1196.3%)
Mutual labels:  sparql
Bbw
Semantic annotator: Matching CSV to a Wikibase instance (e.g., Wikidata) via Meta-lookup
Stars: ✭ 42 (-22.22%)
Mutual labels:  wikidata
Virtuoso Opensource
Virtuoso is a high-performance and scalable Multi-Model RDBMS, Data Integration Middleware, Linked Data Deployment, and HTTP Application Server Platform
Stars: ✭ 688 (+1174.07%)
Mutual labels:  sparql

Mundaneum

This is a tiny, highly incomplete Clojure wrapper around the Wikidata project's massive semantic database. It's named after the Mundaneum, which was Paul Otlet's mad and wonderful c. 1910 vision for something like the World Wide Web.

(There's a mini-doc about him and it here.)

Motivation

Wikidata is amazing! And it provides API access to all the knowledge it has collected! This is great, but exploratory programmatic access to that data can be fairly painful.

The official Wikidata API Java library offers a document-oriented interface that makes it hard to ask interesting questions. A better way to do most things is with the Wikidata query service, which uses the standard Semantic Web query language, SPARQL.

The SPARQL query service is nice, but the Wikidata data model must cope with (a) items with multiple names in multiple languages, and (b) single names that map to multiple items. To handle this, they've used a layer of abstraction by which everything in the DB is referred to by an id that looks like P50 (property number 50, meaning "author") or Q6882 (entity number 6882, the author "James Joyce").

For example, to get a selection of works authored by James Joyce, one would issue a query like:

SELECT ?work
WHERE { ?work wdt:P50 wd:Q6882. } 
LIMIT 10

(Users of Datomic will recognize the ?work style of selector, which is not a coincidence as SPARQL and Datomic were both strongly influenced by Datalog.)

The above query is simple enough, except for the non-human-readable identifiers in the WHERE clause, both of which I had to find by manually searching the Wikidata web interface.
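To try a query like this from a REPL, one option is to send it to the query service over HTTP. The sketch below only builds the request URL. It assumes the public endpoint at https://query.wikidata.org/sparql and its `query` and `format` parameters. The names `wdqs-endpoint`, `joyce-works` and `query-url` are made up for illustration and are not part of this library.

```clojure
;; Assumption: the public Wikidata Query Service endpoint.
(def wdqs-endpoint "https://query.wikidata.org/sparql")

;; The James Joyce query from above, as a single string.
(def joyce-works
  "SELECT ?work WHERE { ?work wdt:P50 wd:Q6882. } LIMIT 10")

(defn query-url
  "Build a GET URL that asks the endpoint for JSON results.
   The SPARQL text is percent-encoded so it survives as a query parameter."
  [sparql]
  (str wdqs-endpoint
       "?format=json&query="
       (java.net.URLEncoder/encode sparql "UTF-8")))

(query-url joyce-works)
```

Fetching the resulting URL with any HTTP client should return the bindings as JSON. Note that Wikimedia asks clients to send a descriptive User-Agent header, so a bare request may be refused.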

The first order of business was to build a more human-friendly way to specify relationships and entities without leaving my coding environment. The approach I took was:

  • download and reformat the full list of ~2000 properties (fresh as of 2017-04-19), shape them into a map of keyword/string pairs where the keyword is the name of the property and the string is its id, and make a helper function
(property :author)
;;=> "P50"
  • create a helper function that tries to correctly guess the id of an entity based on a string that's similar to its "label" (common name, currently sadly restricted to English in this code)
(entity "James Joyce")
;;=> "Q6882"

;; the entity function tries to return the most notable entity 
;; that matches, but sometimes that isn't what you want.

(describe (entity "U2"))
;;=> "Irish alternative rock band"

;; not the one I meant, let's try with more info:
(describe (entity "U2" :part-of (entity "Berlin U-Bahn")))
;;=> "underground line in Berlin"
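The first helper above can be sketched roughly as follows. This is a minimal illustration, not the library's actual code: `property-rows` is a hand-picked sample standing in for the full downloaded list, and `label->keyword` and the error behaviour on unknown names are assumptions.

```clojure
(require '[clojure.string :as str])

;; Hypothetical sample of [label id] rows; the real list has ~2000 entries.
(def property-rows
  [["author" "P50"]
   ["instance of" "P31"]
   ["part of" "P361"]
   ["place of birth" "P19"]])

(defn label->keyword
  "Turn a human-readable label such as \"part of\" into :part-of."
  [label]
  (-> label
      str/lower-case
      str/trim
      (str/replace #"\s+" "-")
      keyword))

;; Map of keyword -> property id, e.g. {:author "P50", ...}
(def properties
  (into {}
        (map (fn [[label id]] [(label->keyword label) id]))
        property-rows))

(defn property
  "Look up a property id by keyword, failing loudly on unknown names
   (an assumption; the real helper may behave differently)."
  [k]
  (or (properties k)
      (throw (ex-info "Unknown property" {:property k}))))

(property :author)
;;=> "P50"
(property :part-of)
;;=> "P361"
```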

This already helps to keep my Emacs-driven process running smoothly. The next point of irritation was assembling query strings by hand, like an animal. So I banged together a quick and sloppy DSL similar to the one offered by Datomic. This looks like:

;; what are some works authored by James Joyce?
(query '[:select ?work ?workLabel
         :where [[?work (wdt :author) (entity "James Joyce")]]
         :limit 10])
;; #{{:work "Q864141", :workLabel "Eveline"}
;;   {:work "Q861185", :workLabel "A Little Cloud"}
;;   {:work "Q459592", :workLabel "Dubliners"}
;;   {:work "Q682681", :workLabel "Giacomo Joyce"}
;;   {:work "Q764318", :workLabel "Two Gallants"}
;;   {:work "Q429967", :workLabel "Chamber Music"}
;;   {:work "Q465360", :workLabel "A Portrait of the Artist as a Young Man"}
;;   {:work "Q6511", :workLabel "Ulysses"}
;;   {:work "Q866956", :workLabel "An Encounter"}
;;   {:work "Q6507", :workLabel "Finnegans Wake"}} 
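To give a feel for what such a DSL has to do, here is a much simplified sketch of turning data into a SPARQL string. It is not this library's implementation: it takes a map rather than the vector form shown above, expects ids to be resolved already, and ignores label lookup. All function names here are hypothetical.

```clojure
(require '[clojure.string :as str])

(defn term->sparql
  "Render one term of a triple pattern. Symbols such as ?work pass through,
   keywords become wdt:-prefixed property ids, strings are treated as item ids."
  [properties term]
  (cond
    (symbol? term)  (name term)
    (keyword? term) (str "wdt:" (properties term))
    (string? term)  (str "wd:" term)))

(defn triple->sparql
  "Render [subject predicate object] as a SPARQL triple pattern."
  [properties triple]
  (str (str/join " " (map #(term->sparql properties %) triple)) " ."))

(defn select->sparql
  "Render a {:select ... :where ... :limit ...} map as a SELECT query."
  [properties {:keys [select where limit]}]
  (str "SELECT " (str/join " " (map name select))
       " WHERE { " (str/join " " (map #(triple->sparql properties %) where)) " }"
       (when limit (str " LIMIT " limit))))

(select->sparql {:author "P50"}
                '{:select [?work]
                  :where  [[?work :author "Q6882"]]
                  :limit  10})
;;=> "SELECT ?work WHERE { ?work wdt:P50 wd:Q6882 . } LIMIT 10"
```

Keeping the query as plain data means it can be built up, inspected and transformed with ordinary Clojure functions before it is ever turned into a string.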

This is actually quite similar to the programmatic query interface I created for the first purpose-built triple store around 15 years ago.

This code is much easier to understand if you have some familiarity with SPARQL and how it can be used to query Wikidata. I strongly recommend this introduction to get started. I'm trying to make sure all the examples are easy to translate to the DSL used here.

Condition

This is young code, and the APIs are likely to change in the future. It is presented for entertainment purposes only. The mundaneum.examples namespace is all examples, should you care to have a play.

Enjoy!

License

Copyright © 2016-2019 Jack Rusher. Distributed under the BSD 0-clause license.
