UB-Mannheim / Bbw

Licence: MIT
Semantic annotator: Matching CSV to a Wikibase instance (e.g., Wikidata) via Meta-lookup

Programming Languages

python

Projects that are alternatives to or similar to Bbw

Hyte
EMNLP 2018: HyTE: Hyperplane-based Temporally aware Knowledge Graph Embedding
Stars: ✭ 130 (+209.52%)
Mutual labels:  knowledge-graph, wikidata
Bert Attributeextraction
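Using BERT for attribute extraction in knowledge graphs, with fine-tuning and feature extraction. Applies BERT-based fine-tuning and feature-extraction methods to extract attributes from Baidu Baike person entries for a knowledge graph.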
Stars: ✭ 224 (+433.33%)
Mutual labels:  knowledge-graph, relation-extraction
Soweego
Link Wikidata items to large catalogs
Stars: ✭ 74 (+76.19%)
Mutual labels:  knowledge-graph, wikidata
Cnn Re Tf
Convolutional Neural Network for Multi-label Multi-instance Relation Extraction in Tensorflow
Stars: ✭ 190 (+352.38%)
Mutual labels:  wikidata, relation-extraction
knowledge-graph-nlp-in-action
From model training to deployment: hands-on Knowledge Graph and Natural Language Processing (NLP). Uses Tensorflow, Bert+Bi-LSTM+CRF, Neo4j and more; covers tasks such as Named Entity Recognition, Text Classification, Information Extraction and Relation Extraction.
Stars: ✭ 58 (+38.1%)
Mutual labels:  knowledge-graph, relation-extraction
Knowledge Graphs
A collection of research on knowledge graphs
Stars: ✭ 845 (+1911.9%)
Mutual labels:  knowledge-graph, relation-extraction
Agriculture Knowledgegraph Data
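Crawler and data-processing scripts for the Wikidata knowledge base; scripts for aligning triple relations to a corpus; scripts for obtaining knowledge graph data.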
Stars: ✭ 198 (+371.43%)
Mutual labels:  knowledge-graph, wikidata
Agriculture knowledgegraph
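Agricultural knowledge graph (AgriKG): information retrieval, named entity recognition, relation extraction, intelligent question answering and decision support for the agricultural domain.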
Stars: ✭ 2,957 (+6940.48%)
Mutual labels:  knowledge-graph, relation-extraction
KGPool
[ACL 2021] KGPool: Dynamic Knowledge Graph Context Selection for Relation Extraction
Stars: ✭ 33 (-21.43%)
Mutual labels:  knowledge-graph, relation-extraction
Shukongdashi
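An expert system for fault diagnosis in the numerical control (CNC) domain, built in Python using knowledge graphs, natural language processing, convolutional neural networks and related techniques.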
Stars: ✭ 109 (+159.52%)
Mutual labels:  knowledge-graph, relation-extraction
Knowledge Graph Learning
A curated list of awesome knowledge graph tutorials, projects and communities.
Stars: ✭ 516 (+1128.57%)
Mutual labels:  knowledge-graph, relation-extraction
Casrel
A Novel Cascade Binary Tagging Framework for Relational Triple Extraction. Accepted by ACL 2020.
Stars: ✭ 329 (+683.33%)
Mutual labels:  knowledge-graph, relation-extraction
Deepke
An open-source, deep-learning-based framework for Chinese relation extraction.
Stars: ✭ 525 (+1150%)
Mutual labels:  knowledge-graph, relation-extraction
Knowledgegraph
This repository for Web Crawling, Information Extraction, and Knowledge Graph build up.
Stars: ✭ 28 (-33.33%)
Mutual labels:  knowledge-graph
Graphvite
GraphVite: A General and High-performance Graph Embedding System
Stars: ✭ 865 (+1959.52%)
Mutual labels:  knowledge-graph
Person Search Annotation
Cross-Platform Annotation Tool for Person Search Datasets
Stars: ✭ 9 (-78.57%)
Mutual labels:  annotation
Vizel
Zettelkasten visualization and stats🤩🗒
Stars: ✭ 33 (-21.43%)
Mutual labels:  knowledge-graph
Rex
REx: Relation Extraction. Modernized re-write of the code in the master's thesis: "Relation Extraction using Distant Supervision, SVMs, and Probabalistic First-Order Logic"
Stars: ✭ 21 (-50%)
Mutual labels:  relation-extraction
Kbqa Bert
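A knowledge-graph-based question answering system that uses BERT for named entity recognition and sentence similarity, with online and outline modes.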
Stars: ✭ 846 (+1914.29%)
Mutual labels:  knowledge-graph
Wikimama
Scripts to help matching OSM features to Wikidata items
Stars: ✭ 8 (-80.95%)
Mutual labels:  wikidata

bbw (boosted by wiki)

  • Annotates tabular data with the entities, types and properties in Wikidata.
  • Easy to use: bbw.annotate().
  • Resolves even tricky spelling mistakes via meta-lookup through SearX.
  • Matches to the up-to-date values in Wikidata without the dump files.
  • Ranked in third place at SemTab2020.

How to use

Import library

from bbw import bbw

The easiest way to annotate the dataframe Y is:

[web_table, url_table, label_table, cpa, cea, cta] = bbw.annotate(Y)

It returns a list of six dataframes. The first three contain the annotations as HTML links, URLs and labels of the Wikidata entities, respectively. These dataframes have two more rows than Y; the two extra rows hold the annotations for types and properties. The last three dataframes contain the annotations in the format required by the SemTab2020 challenge.
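A minimal sketch of how the six returned tables could be written to disk, one CSV file per table. The helper save_tables and the file naming are hypothetical and not part of bbw; the sketch only assumes that each table is a pandas dataframe, so that it offers to_csv.

```python
from pathlib import Path

# Names follow the order in which bbw.annotate() returns its six dataframes.
TABLE_NAMES = ["web_table", "url_table", "label_table", "cpa", "cea", "cta"]


def save_tables(tables, out_dir):
    """Write each table to <out_dir>/<name>.csv and return the file paths.

    Hypothetical helper: `tables` is the list returned by bbw.annotate(Y).
    """
    if len(tables) != len(TABLE_NAMES):
        raise ValueError(f"expected {len(TABLE_NAMES)} tables, got {len(tables)}")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, table in zip(TABLE_NAMES, tables):
        path = out_dir / f"{name}.csv"
        # pandas.DataFrame.to_csv; dropping the index is a choice made here.
        table.to_csv(path, index=False)
        paths.append(path)
    return paths
```

With bbw installed, save_tables(bbw.annotate(Y), "annotations") would then leave six CSV files in the annotations directory.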

The fastest way to annotate the dataframe Y is:

[cpa_list, cea_list, nomatch] = bbw.contextual_matching(bbw.preprocessing(Y))
[cpa, cea, cta] = bbw.postprocessing(cpa_list, cea_list)

The dataframes cpa, cea and cta contain the annotations in SemTab2020 format. The list nomatch contains the labels that could not be matched. The lists cpa_list and cea_list hold the unprocessed and possibly non-unique annotations.
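The two steps of the fast path can be wrapped into one function that also hands back the unmatched labels. This is only a sketch: annotate_fast and its backend parameter are hypothetical names introduced here, while preprocessing, contextual_matching and postprocessing are the bbw functions shown above.

```python
def annotate_fast(Y, backend=None):
    """Run preprocessing, contextual matching and postprocessing in sequence.

    Hypothetical wrapper: `backend` defaults to the bbw module and exists only
    so that the function can be exercised without network access.
    """
    if backend is None:
        # Imported lazily so that this sketch loads even without bbw installed.
        from bbw import bbw as backend
    cpa_list, cea_list, nomatch = backend.contextual_matching(backend.preprocessing(Y))
    cpa, cea, cta = backend.postprocessing(cpa_list, cea_list)
    # nomatch lists the labels that were not matched.
    return {"cpa": cpa, "cea": cea, "cta": cta, "nomatch": nomatch}
```

Calling annotate_fast(Y)["nomatch"] then gives a quick view of the labels that may need manual attention.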

GUI

If you need to annotate only one table, use the simple GUI:

streamlit run bbw_gui.py

Open the browser at http://localhost:8501 and choose a CSV-file. The annotation process starts automatically. It outputs the six tables of the annotate function.

Try it out online (no SearX support) with this binder link.

CLI

If you need to annotate a few tables, use the CLI-tool:

python3 bbw_cli.py --amount 100 --offset 0
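The meaning of the two flags is not spelled out here. Assuming that --offset is the index of the first table and --amount the number of tables to annotate from there (an assumption, not confirmed by this README), a larger collection could be cut into consecutive windows as sketched below. Both helper functions are hypothetical.

```python
def batch_windows(total, amount):
    """Yield (offset, amount) pairs covering `total` tables in chunks of at most `amount`."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    for offset in range(0, total, amount):
        # The last window may be shorter than the others.
        yield offset, min(amount, total - offset)


def cli_commands(total, amount, script="bbw_cli.py"):
    """Build one CLI call per window (assumed flag semantics, see above)."""
    return [
        f"python3 {script} --amount {n} --offset {offset}"
        for offset, n in batch_windows(total, amount)
    ]


for command in cli_commands(total=250, amount=100):
    print(command)
```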

GNU parallel

If you need to annotate hundreds or thousands of tables, use the script with GNU parallel:

./bbw_parallel.py

Installation

You can use pip to install bbw:

pip install bbw

The latest version can be installed directly from GitHub:

pip install git+https://github.com/UB-Mannheim/bbw

You can test bbw in a virtual environment:

pip install virtualenv
virtualenv testing_bbw
source testing_bbw/bin/activate
pip install bbw
python
from bbw import bbw
[web_table, url_table, label_table, cpa, cea, cta] = bbw.annotate(bbw.pd.DataFrame([['0','1'],['Mannheim','Rhine']]))
print(web_table)
exit()
deactivate

Also install SearX, because bbw performs its meta-lookup through it.

export PORT=80
docker pull searx/searx
docker run --rm -d -v ${PWD}/searx:/etc/searx -p $PORT:8080 -e BASE_URL=http://localhost:$PORT/ searx/searx

SearX then runs at http://localhost:80, and bbw sends GET requests to it.
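To check by hand that the instance answers, a search URL can be built with the standard library. The sketch below assumes the usual SearX search endpoint (/search with the q and format parameters) and that the instance allows the json output format. The helper searx_search_url is hypothetical, and how bbw itself forms its requests is not shown in this README.

```python
from urllib.parse import urlencode


def searx_search_url(query, base_url="http://localhost:80", fmt="json"):
    """Return the GET URL for a SearX search (hypothetical helper)."""
    params = {"q": query, "format": fmt}
    return f"{base_url.rstrip('/')}/search?{urlencode(params)}"


# A misspelled label, as a meta-lookup might receive it.
print(searx_search_url("Manheim Rhine"))
```

Opening the printed URL in a browser or with curl should return search results if the container is up.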

Citing

If you find bbw useful in your work, a proper reference would be:

@inproceedings{2020_bbw,
  author    = {Renat Shigapov and Philipp Zumstein and Jan Kamlah and Lars Oberl{\"a}nder and J{\"o}rg Mechnich and Irene Schumm},
  title     = {bbw: {M}atching {CSV} to {W}ikidata via {M}eta-lookup},
  booktitle = {SemTab@ISWC 2020},
  url = {http://ceur-ws.org/Vol-2775/paper2.pdf},
  volume = {2775},
  pages = {17-26},
  publisher = {CEUR-WS.org},
  year = {2020}
}

[paper] [presentation] [SemTab@ISWC]

SemTab2020

The library was designed, implemented and tested during SemTab2020. It achieved its best scores in the fourth and final round, on the automatically generated dataset:

Task  F1-score  Precision  Rank
CPA   0.995     0.996      2
CTA   0.980     0.980      2
CEA   0.978     0.984      4
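The table lists no recall, but a rough value can be derived from the other two columns. The sketch below assumes that the F1-score is the usual harmonic mean of precision and recall, F1 = 2PR / (P + R), which gives R = F1 * P / (2P - F1). Because the published scores are rounded to three decimals, the derived values are approximate and are not official SemTab2020 figures.

```python
def implied_recall(f1, precision):
    """Solve F1 = 2PR / (P + R) for R, given F1 and precision P."""
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("F1 and precision are inconsistent")
    return f1 * precision / denominator


# (F1-score, precision) per task, copied from the table above.
scores = {"CPA": (0.995, 0.996), "CTA": (0.980, 0.980), "CEA": (0.978, 0.984)}

for task, (f1, precision) in scores.items():
    print(f"{task}: implied recall ~ {implied_recall(f1, precision):.3f}")
```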