NatLibFi / Bib Rdf Pipeline
Licence: other
Scripts and configuration for converting MARC bibliographic records into RDF
bib-rdf-pipeline
This repository contains various scripts and configuration for converting MARC bibliographic records into RDF, for use at the National Library of Finland.
The main component is a conversion pipeline driven by a Makefile that defines rules for realizing the conversion steps using command line tools.
The steps of the conversion are:
- Start with a file of MARC records in Aleph sequential format
- Split the file into smaller batches
- Preprocess using Unix tools such as grep and sed to remove some local peculiarities
- Convert to MARCXML and enrich the MARC records, using Catmandu
- Run the Library of Congress marc2bibframe2 XSLT conversion from MARC to BIBFRAME RDF
- Convert the BIBFRAME RDF/XML data into N-Triples format and fix up some bad URIs
- Calculate work keys (e.g. author+title combination) used later for merging data about the same creative work
- Convert the BIBFRAME data into Schema.org RDF in N-Triples format
- Reconcile entities in the Schema.org data against external sources (e.g. YSA/YSO, Corporate names authority, RDA vocabularies)
- Merge the Schema.org data about the same works
- Calculate agent keys used for merging data about the same agent (person or organization)
- Merge the agents based on agent keys
- Convert the raw Schema.org data to HDT format so the full data set can be queried with SPARQL from the command line
- Consolidate the data by e.g. rewriting URIs and moving subjects into the original work
- Convert the consolidated data to HDT
- ??? (TBD)
- Profit!
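To make the work key step above more concrete, here is a minimal sketch of how an author+title key could be built so that differently punctuated records for the same work end up with the same key. The function names (normalize, work_key) and the normalization rules are assumptions made for illustration; they are not taken from the pipeline's own scripts, which may use different rules and tools.

```shell
#!/bin/sh
# Hypothetical helper: normalize one string for use in a key.
# Assumed rules: lowercase ASCII letters, drop punctuation, squeeze
# whitespace. Non-ASCII letters are passed through unchanged.
normalize() {
    printf '%s\n' "$1" |
        tr '[:upper:]' '[:lower:]' |
        sed -e 's/[[:punct:]]//g' \
            -e 's/[[:space:]][[:space:]]*/ /g' \
            -e 's/^ //' -e 's/ $//'
}

# Hypothetical helper: combine author and title into a single key.
# Records that yield the same key are candidates for merging.
work_key() {
    printf '%s/%s\n' "$(normalize "$1")" "$(normalize "$2")"
}
```

For example, work_key "Tolkien, J. R. R." "The Hobbit!" and work_key "tolkien j r r" "The  Hobbit" produce the same key, so the two records would be treated as describing the same work. Real data would likely need Unicode-aware normalization as well.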
Dependencies
Command line tools are assumed to be available in $PATH, but the paths can be overridden on the make command line, e.g. make CATMANDU=/opt/catmandu
For running the main suite
- Apache Jena command line utilities sparql and rsparql
- Catmandu utility catmandu
- uconv utility from Ubuntu package icu-devtools
- xsltproc utility from Ubuntu package xsltproc
- hdt-cpp command line utilities rdf2hdt and hdtSearch
- hdt-java command line utility hdtsparql.sh
For running the unit tests
In addition to the above:
- bats in $PATH
- xmllint utility from Ubuntu package libxml2-utils in $PATH
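Since the pipeline expects these tools in $PATH, it may be useful to check for them before running make. The following is a small sketch, not part of the repository: the function name missing_tools and the variables MAIN_TOOLS and TEST_TOOLS are hypothetical, and the tool names are copied from the lists above.

```shell
#!/bin/sh
# Hypothetical helper: print each given command that is not found in
# $PATH, one per line. Returns non-zero if at least one is missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Tool names taken from the dependency lists above.
MAIN_TOOLS="sparql rsparql catmandu uconv xsltproc rdf2hdt hdtSearch hdtsparql.sh"
TEST_TOOLS="bats xmllint"

# Example call (word splitting of the lists is intentional):
#   missing_tools $MAIN_TOOLS $TEST_TOOLS
```

A tool reported as missing does not have to be installed into $PATH; its location can instead be passed on the make command line, as in make CATMANDU=/opt/catmandu.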