
SDM-TIB / SDM-RDFizer

Licence: Apache-2.0 license
An Efficient RML-Compliant Engine for Knowledge Graph Construction

Programming Languages

python
Dockerfile

Projects that are alternatives of or similar to SDM-RDFizer

morph-kgc
Powerful RDF Knowledge Graph Generation with [R2]RML Mappings
Stars: ✭ 77 (+13.24%)
Mutual labels:  knowledge-graph, data-integration, rml
Mapeathor
Translator of spreadsheet mappings into R2RML, RML or YARRRML
Stars: ✭ 27 (-60.29%)
Mutual labels:  knowledge-graph, data-integration, rml
SchemaMapper
A .NET class library that allows you to import data from different sources into a unified destination
Stars: ✭ 41 (-39.71%)
Mutual labels:  data-integration
KGPool
[ACL 2021] KGPool: Dynamic Knowledge Graph Context Selection for Relation Extraction
Stars: ✭ 33 (-51.47%)
Mutual labels:  knowledge-graph
NBFNet
Official implementation of Neural Bellman-Ford Networks (NeurIPS 2021)
Stars: ✭ 106 (+55.88%)
Mutual labels:  knowledge-graph
skipchunk
Extracts a latent knowledge graph from text and index/query it in elasticsearch or solr
Stars: ✭ 18 (-73.53%)
Mutual labels:  knowledge-graph
obo-relations
RO is an ontology of relations for use with biological ontologies
Stars: ✭ 63 (-7.35%)
Mutual labels:  knowledge-graph
bio2bel
A Python framework for integrating biological databases and structured data sources in Biological Expression Language (BEL)
Stars: ✭ 16 (-76.47%)
Mutual labels:  data-integration
ChineseStarsRelationship
Crawler for Chinese celebrity data. You can even obtain the relationships between all the people on the internet and take it from there. Based on these data, you can do more interesting things, such as social network analysis, relationship network visualization, algorithm research, and more.
Stars: ✭ 26 (-61.76%)
Mutual labels:  knowledge-graph
Social-Knowledge-Graph-Papers
A paper list of research about social knowledge graph
Stars: ✭ 27 (-60.29%)
Mutual labels:  knowledge-graph
WSDM2021 NSM
Improving Multi-hop Knowledge Base Question Answering by Learning Intermediate Supervision Signals. WSDM 2021.
Stars: ✭ 84 (+23.53%)
Mutual labels:  knowledge-graph
assignPOP
Population Assignment using Genetic, Non-genetic or Integrated Data in a Machine-learning Framework. Methods in Ecology and Evolution. 2018;9:439–446.
Stars: ✭ 16 (-76.47%)
Mutual labels:  data-integration
Shukongdashi
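A fault-diagnosis expert system for the numerical control (CNC) domain, written in Python and built with knowledge graphs, natural language processing, convolutional neural networks, and related techniques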
Stars: ✭ 109 (+60.29%)
Mutual labels:  knowledge-graph
typedb
TypeDB: a strongly-typed database
Stars: ✭ 3,152 (+4535.29%)
Mutual labels:  knowledge-graph
kglib
TypeDB-ML is the Machine Learning integrations library for TypeDB
Stars: ✭ 523 (+669.12%)
Mutual labels:  knowledge-graph
PaperMachete
A project that uses Binary Ninja and GRAKN.AI to perform static analysis on binary files with the goal of identifying bugs in software.
Stars: ✭ 49 (-27.94%)
Mutual labels:  knowledge-graph
carml
A pretty sweet RML engine, for RDF.
Stars: ✭ 74 (+8.82%)
Mutual labels:  rml
CoLAKE
COLING'2020: CoLAKE: Contextualized Language and Knowledge Embedding
Stars: ✭ 86 (+26.47%)
Mutual labels:  knowledge-graph
cognipy
In-memory Graph Database and Knowledge Graph with Natural Language Interface, compatible with Pandas
Stars: ✭ 31 (-54.41%)
Mutual labels:  knowledge-graph
TransC
Source code and datasets of EMNLP2018 paper: "Differentiating Concepts and Instances for Knowledge Graph Embedding".
Stars: ✭ 75 (+10.29%)
Mutual labels:  knowledge-graph

SDM-RDFizer


This project presents the SDM-RDFizer, an interpreter of mapping rules that transforms (un)structured data into RDF knowledge graphs. The current version of the SDM-RDFizer assumes mapping rules are defined in the RDF Mapping Language (RML) by Dimou et al. The SDM-RDFizer implements optimized data structures and relational algebra operators that enable efficient execution of RML triples maps, even on big data. SDM-RDFizer can process data from heterogeneous data sources (CSV, JSON, RDB, XML), processing each set of RML rules (TriplesMap) in a thread-safe procedure.
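To make the idea of a mapping rule concrete, here is a minimal, standard-library-only sketch of what applying one rule to a CSV source might look like. It is a toy illustration, not the SDM-RDFizer implementation and not real RML syntax: the input data, the subject template, the predicate IRI, and the function name `apply_triples_map` are all hypothetical.

```python
import csv
import io

# Hypothetical input: a tiny CSV source containing one duplicated row.
CSV_DATA = """id,name
1,Alice
2,Bob
1,Alice
"""

# Hypothetical rule, loosely modelled on an RML triples map:
# a subject template plus (predicate, column) pairs for literal objects.
SUBJECT_TEMPLATE = "http://example.org/person/{id}"
PREDICATE_OBJECT_MAPS = [
    ("http://xmlns.com/foaf/0.1/name", "name"),
]


def apply_triples_map(rows, subject_template, predicate_object_maps):
    """Yield each distinct N-Triples line produced by the rule, in first-seen order."""
    seen = set()
    for row in rows:
        subject = subject_template.format(**row)
        for predicate, column in predicate_object_maps:
            # Escape backslashes and double quotes for the N-Triples literal.
            value = row[column].replace("\\", "\\\\").replace('"', '\\"')
            triple = f'<{subject}> <{predicate}> "{value}" .'
            if triple not in seen:  # skip triples that were already generated
                seen.add(triple)
                yield triple


rows = csv.DictReader(io.StringIO(CSV_DATA))
triples = list(apply_triples_map(rows, SUBJECT_TEMPLATE, PREDICATE_OBJECT_MAPS))
for triple in triples:
    print(triple)
```

The duplicated input row yields no additional triple, which is the kind of duplicate handling that the duplicate-rate experiments described later are concerned with.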

SDM-RDFizer workflow

The new features presented by SDM-RDFizer version 4.0

In version 4.0 of SDM-RDFizer, we have addressed the problem of efficiency in KG creation in terms of memory storage. SDM-RDFizer version 4.0 includes a new module called "TriplesMap Planning" (TMP), which defines an optimized evaluation plan for the execution of triples maps. Additionally, version 4.0 extends the previously included module, "TriplesMap Execution" (TME), with a new operator for compressing the data stored in its data structures. These new features can be configured using two new parameters in the configuration file, named "large_file" and "ordered".
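This README does not describe how the compression operator works internally. One common way to reduce the memory needed for duplicate checks is dictionary encoding, where each distinct term is stored once and triples are kept as tuples of small integers. The sketch below illustrates that general technique only; it is an assumption for illustration, not the SDM-RDFizer implementation, and the class names `TermDictionary` and `CompressedTripleSet` are hypothetical.

```python
class TermDictionary:
    """Map each distinct RDF term (a string) to a small integer identifier."""

    def __init__(self):
        self._ids = {}    # term -> identifier
        self._terms = []  # identifier -> term

    def encode(self, term):
        ident = self._ids.get(term)
        if ident is None:
            ident = len(self._terms)
            self._ids[term] = ident
            self._terms.append(term)
        return ident

    def decode(self, ident):
        return self._terms[ident]


class CompressedTripleSet:
    """Duplicate check over triples stored as tuples of integer identifiers."""

    def __init__(self):
        self.terms = TermDictionary()
        self._seen = set()

    def add(self, subject, predicate, obj):
        """Return True if the triple is new, False if it was already stored."""
        key = (
            self.terms.encode(subject),
            self.terms.encode(predicate),
            self.terms.encode(obj),
        )
        if key in self._seen:
            return False
        self._seen.add(key)
        return True

    def __len__(self):
        return len(self._seen)


store = CompressedTripleSet()
first = store.add("http://example.org/gene/1", "http://example.org/name", "BRCA1")
second = store.add("http://example.org/gene/1", "http://example.org/name", "BRCA1")
print(first, second, len(store))
```

With this layout a long IRI that appears in many triples is stored once in the dictionary, and every repeated occurrence costs only an integer reference.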

We have performed an extensive empirical evaluation of SDM-RDFizer version 4.0 in terms of execution time and memory usage. The experiments compare the impact of data duplicate rates, data size, and the complexity and execution order of the triples maps on two versions of SDM-RDFizer (version 4.0 and version 3.6) and on other existing engines, including RMLMapper v4.7 and RocketRML. The experiments are performed on two different benchmarks:

  • From SDM-Genomic-datasets, datasets including 10k, 100k, and 1M records with 25% and 75% duplicate rates, over six mapping rules with different complexities (1/4 simple object map, 2/5 object reference maps, 2/5 object join maps).
  • From GTFS-Madrid, datasets with scale values of 1-csv, 5-csv, 10-csv, and 50-csv, over two different mapping rules (72 simple object maps and 11 object join maps).

The results of these experiments can be summarized as follows.

Overview of Results (Execution Time Comparison)

As observed in the figures above, both versions of SDM-RDFizer completed all the testbeds successfully, while the other two engines timed out in some cases. SDM-RDFizer version 3.6 and RocketRML version 1.7.0 are competitive in simple testbeds; however, SDM-RDFizer version 4.0 shows the best performance in all the testbeds.

Overview of Results (Memory Consumption Comparison)

As illustrated in the figures above, SDM-RDFizer version 4.0 has a smaller peak in memory usage than the previous version of SDM-RDFizer.

The results of executing the SDM-RDFizer have been described in the following research reports:

  • Enrique Iglesias, Samaneh Jozashoori, David Chaves-Fraga, Diego Collarana, and Maria-Esther Vidal. 2020. SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs. The 29th ACM International Conference on Information and Knowledge Management (CIKM ’20).

  • Samaneh Jozashoori, David Chaves-Fraga, Enrique Iglesias, Oscar Corcho, and Maria-Esther Vidal. 2020. FunMap: Efficient Execution of Functional Mappings for Knowledge Graph Creation. The 19th International Semantic Web Conference - Research Track (ISWC 2020).

  • Samaneh Jozashoori and Maria-Esther Vidal. MapSDI: A Scaled-up Semantic Data Integration Framework for Knowledge Graph Creation. The 27th International Conference on Cooperative Information Systems (CoopIS 2019).

  • David Chaves-Fraga, Kemele M. Endris, Enrique Iglesias, Oscar Corcho, and Maria-Esther Vidal. What are the Parameters that Affect the Construction of a Knowledge Graph? The 18th International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE 2019).

  • David Chaves-Fraga, Antón Adolfo, Jhon Toledo, and Oscar Corcho. ONETT: Systematic Knowledge Graph Generation for National Access Points. The 1st International Workshop on Semantics for Transport co-located with SEMANTiCS 2019.

  • David Chaves-Fraga, Freddy Priyatna, Andrea Cimmino, Jhon Toledo, Edna Ruckhaus, and Oscar Corcho. GTFS-Madrid-Bench: A benchmark for virtual knowledge graph access in the transport domain. Journal of Web Semantics, 2020.

Additional References:

  • Dimou et al. 2014. Dimou, A., Sande, M.V., Colpaert, P., Verborgh, R., Mannens, E., de Walle, R.V.: RML: A generic language for integrated RDF mappings of heterogeneous data. In: Proceedings of the Workshop on Linked Data on the Web co-located with the 23rd International World Wide Web Conference (WWW 2014).

Projects where the SDM-RDFizer has been used

The SDM-RDFizer is used in the creation of the knowledge graphs of EU H2020 projects and national projects where the Scientific Data Management group participates. These projects include:

The SDM-RDFizer is also used in EU H2020, EIT-Digital and Spanish national projects where the Ontology Engineering Group (Technical University of Madrid) participates. These projects, mainly focused on the transportation and smart cities domain, include:

  • H2020 - SPRINT (http://sprint-transport.eu/): performance and scalability to test a semantic architecture for the Interoperability Framework on Transport across Europe.
  • EIT-SNAP (https://www.snap-project.eu/): innovation project on the application of semantic technologies for national access points.
  • Open Cities (https://ciudades-abiertas.es/): national project on creating common and shared vocabularies for Spanish cities.
  • Drugs4Covid (https://drugs4covid.oeg.fi.upm.es/): NLP annotations and metadata from more than 60,000 scientific papers about COVID viruses are integrated in a KG with almost 44M facts (triples). SDM-RDFizer was used for creating this KG.

Other projects where the SDM-RDFizer is also used:

Installing and Running the SDM-RDFizer

From PyPI (https://pypi.org/project/rdfizer/):

python3 -m pip install rdfizer
python3 -m rdfizer -c /path/to/config/file

From GitHub/Docker: Visit the wiki of the repository to learn how to install and run the SDM-RDFizer. You can also take a look at our demo at: https://www.youtube.com/watch?v=DpH_57M1uOE

Configurations

You can customize your configuration from the set of features that SDM-RDFizer offers by changing the values of the parameters in the config file. The descriptions of each parameter and the possible values are provided here; "ordered" and "large_file" are the new features provided by SDM-RDFizer version 4.0.
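As a rough illustration of how such a config file might be assembled, the sketch below builds an INI-style file with Python's standard configparser module. Only the parameter names "ordered" and "large_file" come from this README. The section names, the remaining keys, the paths, and the spelling of the values are all assumptions made for illustration, so please check them against the parameter descriptions in the repository before relying on them.

```python
import configparser
import io

config = configparser.ConfigParser(interpolation=None)

# Assumed layout: the section names and every key other than "ordered" and
# "large_file" are illustrative guesses, not confirmed by this README.
config["default"] = {"main_directory": "/path/to/data"}
config["datasets"] = {
    "number_of_datasets": "1",
    "output_folder": "/path/to/output",
    "ordered": "yes",       # new in version 4.0 (value spelling is an assumption)
    "large_file": "false",  # new in version 4.0 (value spelling is an assumption)
}
config["dataset1"] = {
    "name": "example",
    "mapping": "/path/to/mapping.ttl",
}

# Serialize to a string; write this text to a file to use it.
buffer = io.StringIO()
config.write(buffer)
config_text = buffer.getvalue()
print(config_text)
```

Once saved to disk, a file like this would be passed to the engine with the -c option shown in the installation section.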

Version

4.5.4

RML-Test Cases

See the results of the SDM-RDFizer over the RML test cases at the RML Implementation Report. SDM-RDFizer version 4.0 was tested against the latest test cases published before its release.

Experimental Evaluations

See the results of the experimental evaluations of SDM-RDFizer version 3.* in the SDM-RDFizer-Experiments repository.

License

This work is licensed under Apache 2.0

Papers

1. Conference paper published as a resource at CIKM2020

2. Journal paper under review

Authors

The SDM-RDFizer has been developed by members of the Scientific Data Management Group at TIB, as an ongoing research effort. The development is coordinated and supervised by Maria-Esther Vidal ([email protected]). We strongly encourage you to report any issues you have with the SDM-RDFizer, either via our contact email or by creating a new issue on GitHub. The SDM-RDFizer has been implemented by Enrique Iglesias (current version, [email protected]) and Guillermo Betancourt (version 0.1, [email protected]) under the supervision of Samaneh Jozashoori ([email protected]), David Chaves-Fraga ([email protected]), and Kemele Endris ([email protected]).
