KBNLresearch / dac

License: GPL-3.0
Entity linker for the newspaper collection of the National Library of the Netherlands. Links named entity mentions to DBpedia descriptions using either a binary SVM classifier or a neural net.

DAC Entity Linker

Entity linker for the Dutch historical newspaper collection of the Koninklijke Bibliotheek, National Library of the Netherlands. The linker links named entity mentions in newspaper articles to relevant DBpedia descriptions using either a binary SVM classifier or a neural net. For background information, please see the project description on the Koninklijke Bibliotheek website.

Usage

Basic command line execution with the default values for all options:

$ cd dac
$ ./dac.py

This will link all recognized entities in a sample article using a neural network:

{'linkedNEs': [{'label': u'Winston Churchill',
                'link': u'http://nl.dbpedia.org/resource/Winston_Churchill',
                'prob': '0.9997673631',
                'reason': 'Predicted link',
                'text': u'Churchill'},
               {'label': u'Willem Drees',
                'link': u'http://nl.dbpedia.org/resource/Willem_Drees',
                'prob': '0.9968996048',
                'reason': 'Predicted link',
                'text': u'Drees'},
                ...
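
The result is an ordinary dictionary, so it can be post-processed directly. The following is a minimal sketch, assuming the structure shown above and that `prob` is always a numeric string. The helper name `confident_links` and the threshold value are illustrative and not part of dac.

```python
def confident_links(result, threshold=0.99):
    """Return (text, link) pairs whose probability meets the threshold."""
    pairs = []
    for entity in result.get('linkedNEs', []):
        # Assumption: entries without a link or probability (for example,
        # unlinked entities in debug mode) should simply be skipped.
        if 'link' not in entity or 'prob' not in entity:
            continue
        # 'prob' is a string in the sample output, so convert it first.
        if float(entity['prob']) >= threshold:
            pairs.append((entity['text'], entity['link']))
    return pairs


# Sample data copied from the output shown above.
sample = {'linkedNEs': [
    {'label': 'Winston Churchill',
     'link': 'http://nl.dbpedia.org/resource/Winston_Churchill',
     'prob': '0.9997673631',
     'reason': 'Predicted link',
     'text': 'Churchill'},
    {'label': 'Willem Drees',
     'link': 'http://nl.dbpedia.org/resource/Willem_Drees',
     'prob': '0.9968996048',
     'reason': 'Predicted link',
     'text': 'Drees'},
]}

print(confident_links(sample, threshold=0.997))
```

With a threshold of 0.997 only the Churchill link is kept, since the Drees link has a probability of about 0.9969.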

Command line interface

Additional options when using the command line interface:

usage: dac.py [-h] [--url URL] [--ne NE] [-m MODEL] [-d] [-f] [-c] [-e]

optional arguments:
  -h, --help                  show this help message and exit
  --url URL                   resolver link of the article to be processed
  --ne NE                     specific named entity to be linked
  -m MODEL, --model MODEL     model used for link prediction (svm, nn or bnn)
  -d, --debug                 include unlinked entities in response
  -f, --features              return feature values
  -c, --candidates            return candidate list
  -e, --errh                  turn on error handling
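
When dac.py is called from another script, the options above can be assembled programmatically. This is a rough sketch that uses only the flags listed in the usage text; the function name `build_dac_command` is hypothetical, and the entity value is taken from the sample output.

```python
def build_dac_command(url=None, ne=None, model=None, debug=False,
                      features=False, candidates=False, errh=False):
    """Assemble an argument list for dac.py from the documented options."""
    cmd = ['./dac.py']
    if url is not None:
        cmd += ['--url', url]
    if ne is not None:
        cmd += ['--ne', ne]
    if model is not None:
        # The usage text lists svm, nn and bnn as the available models.
        if model not in ('svm', 'nn', 'bnn'):
            raise ValueError('unknown model: %s' % model)
        cmd += ['-m', model]
    # Boolean switches are passed only when enabled.
    for enabled, flag in ((debug, '-d'), (features, '-f'),
                          (candidates, '-c'), (errh, '-e')):
        if enabled:
            cmd.append(flag)
    return cmd


cmd = build_dac_command(ne='Churchill', model='svm', candidates=True)
print(' '.join(cmd))

# To run the command from inside the dac directory:
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with resolver links that contain `?` or `&`.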

Web interface

The DAC Entity Linker can be started as a web application by running:

$ ./web.py

This starts a Bottle web server listening on http://localhost:5002. The URL parameters are similar to the command line options:

required arguments:
  - url          resolver link of the article to be processed

optional arguments:
  - ne           specific named entity to be linked
  - model        model used for link prediction (svm, nn or bnn)
  - debug        include unlinked entities in response
  - features     include feature values for predicted links
  - candidates   include the list of candidates for each entity
  - callback     name of a JavaScript callback function
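
A request URL for the web interface can be built with the standard library. This sketch makes several assumptions that are not confirmed by the text above: that the service answers at the root path `/`, that boolean parameters are switched on with the value `1`, and that the resolver link shown is only a placeholder. The function name `build_query` is hypothetical.

```python
from urllib.parse import urlencode

# Host and port are from the text above; the root path is an assumption.
BASE_URL = 'http://localhost:5002/'


def build_query(url, ne=None, model=None, debug=False,
                features=False, candidates=False, callback=None):
    """Build a request URL from the documented URL parameters."""
    params = [('url', url)]  # the only required parameter
    if ne is not None:
        params.append(('ne', ne))
    if model is not None:
        params.append(('model', model))
    # Assumption: boolean options are enabled by sending the value '1'.
    for name, enabled in (('debug', debug), ('features', features),
                          ('candidates', candidates)):
        if enabled:
            params.append((name, '1'))
    if callback is not None:
        params.append(('callback', callback))
    # urlencode percent-escapes the resolver link so that its own
    # '?' and '=' characters do not interfere with the outer query.
    return BASE_URL + '?' + urlencode(params)


# Placeholder resolver link, not a real article identifier.
query = build_query('http://resolver.kb.nl/resolve?urn=EXAMPLE',
                    model='svm', debug=True)
print(query)
```

The resulting URL can be opened in a browser or fetched with any HTTP client while web.py is running.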

Training new models

Given a training set in the format created by the DAC Web Interface, new models can be trained in two simple steps. First, the web interface training set is extended with the feature values for each training example:

$ cd training
$ ./generate.py

The default input file used here is ../../../dac-web/users/tve/art.json and the output is written to a training.csv file. These locations can be adjusted, however, using the --input and --output options of the generate.py script. The features calculated are listed in features/features.json.

The resulting training.csv file can now be used to train new models. Note that existing models in the models directory will be replaced, so these need to be backed up manually if they are to be preserved. To train, for example, a new Support Vector Machine, run:

$ ./models.py -t -m svm

This will create a models/svm.pkl file, using the feature set of features/svm.json, that can now be applied to new named entity examples.
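
Because training replaces the existing models, the manual backup mentioned above can be scripted. This is a minimal sketch, assuming the models live in a `models` directory relative to the working directory. The function name `backup_models` and the `models-backup` destination are hypothetical and not part of dac.

```python
import shutil
import time
from pathlib import Path


def backup_models(models_dir='models', backup_root='models-backup'):
    """Copy the models directory to a timestamped folder; return its path."""
    source = Path(models_dir)
    if not source.is_dir():
        raise FileNotFoundError('no models directory at %s' % source)
    # One folder per backup, named after the current local time.
    target = Path(backup_root) / time.strftime('%Y%m%d-%H%M%S')
    # copytree creates the target (and missing parents) and fails if the
    # target already exists, so an earlier backup is never overwritten.
    shutil.copytree(source, target)
    return target


# Example, to be run before ./models.py -t -m svm:
# print(backup_models())
```

Two backups started within the same second would collide on the folder name; in that case `copytree` raises an error rather than merging the copies.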

Full command line options for training and cross-validation:

usage: models.py [-h] [-w] [-t] [-v] [-m MODEL]

optional arguments:
  -h, --help                  show this help message and exit
  -w, --weights               show the feature weights of the current model
  -t, --train                 train and save new model
  -v, --validate              cross-validate new model
  -m MODEL, --model MODEL     model type (svm, nn or bnn)

Evaluation

Once one or more models have been trained, the linker performance can be evaluated on a separate test set in the format created by the DAC Web Interface. To test the performance of, e.g., a first version of a neural net, run:

$ cd training
$ ./test.py -m nn -v 1

This will evaluate the current neural network model on the ../../../dac-web/users/test-clean/art.json file, but a different test set can be specified with the --input option.

A summary of the results will be printed out:

Number of instances: 500
Number of correct predictions: 467
Prediction accuracy: 0.934
---
Number of correct link predictions: 347
(Min) number of link instances: 362
(Max) number of link instances: 382
(Min) link recall: 0.908376963351
(Mean) link recall: 0.933470249631
(Max) link recall: 0.958563535912
---
Number of correct link predictions: 347
Number of link predictions: 358
Link precision: 0.969273743017
---
(Mean) link F1-measure: 0.951035143299
(Max) link F1-measure: 0.963888888889

The version number specified will be used to name a file containing the full results of the test run, e.g. training/results-nn-1.csv.
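
The figures in the summary above can be reproduced from the printed counts. The formulas in this sketch are inferred from those numbers and are not taken from test.py: precision is correct links over predicted links, the recall bounds use the maximum and minimum number of link instances, the mean recall is the average of the two bounds, and F1 is the usual harmonic mean. The function name `link_metrics` is hypothetical.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def link_metrics(correct, predicted, min_instances, max_instances):
    """Recompute the link scores from the counts in the summary."""
    precision = correct / predicted
    # More link instances in the test set means lower recall, so the
    # maximum instance count gives the minimum recall and vice versa.
    min_recall = correct / max_instances
    max_recall = correct / min_instances
    mean_recall = (min_recall + max_recall) / 2
    return {
        'precision': precision,
        'min_recall': min_recall,
        'mean_recall': mean_recall,
        'max_recall': max_recall,
        'mean_f1': f1(precision, mean_recall),
        'max_f1': f1(precision, max_recall),
    }


# Counts copied from the summary above.
metrics = link_metrics(correct=347, predicted=358,
                       min_instances=362, max_instances=382)
accuracy = 467 / 500

print('Prediction accuracy:', accuracy)
for name, value in metrics.items():
    print(name, round(value, 12))
```

Recomputing the scores this way is a quick sanity check when comparing result files from different model versions.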

Further command line options for the test script:

usage: test.py [-h] -m MODEL -v VERSION [-i INPUT]

required arguments:
  -m MODEL, --model MODEL     model name (svm, nn or bnn)
  -v VERSION                  version number
  
optional arguments:
  -h, --help                  show this help message and exit
  -i INPUT                    path to test set