RichardHGL / WSDM2021_NSM

Licence: other
Improving Multi-hop Knowledge Base Question Answering by Learning Intermediate Supervision Signals. WSDM 2021.


WSDM2021_NSM (Neural State Machine for KBQA)

This is our PyTorch implementation for the paper:

Gaole He, Yunshi Lan, Jing Jiang, Wayne Xin Zhao and Ji-Rong Wen (2021). Improving Multi-hop Knowledge Base Question Answering by Learning Intermediate Supervision Signals. paper, slides, poster, video, CN blog. In WSDM'2021.

Introduction

Multi-hop Knowledge Base Question Answering (KBQA) aims to find answer entities that are multiple hops away in the Knowledge Base (KB) from the entities mentioned in the question. A major challenge is the lack of supervision signals at intermediate reasoning steps: multi-hop KBQA algorithms can only receive feedback from the final answer, which makes learning unstable or ineffective. To address this challenge, we propose a novel teacher-student approach for the multi-hop KBQA task.
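The teacher-student idea can be sketched as follows: in addition to the final-answer loss, the student is pulled toward the teacher's entity distribution at every intermediate reasoning step. This is an illustrative sketch only; the function, its inputs, and the weighting are hypothetical, not the repository's actual loss code.

```python
import math

def student_loss(student_dists, teacher_dists, final_loss, lam=0.1):
    """Illustrative teacher-student objective: the final-answer loss plus
    a KL(teacher || student) term for each intermediate reasoning step,
    which supplies the otherwise-missing intermediate supervision signal.
    (Hypothetical sketch; not the repo's implementation.)"""
    loss = final_loss
    for s, t in zip(student_dists, teacher_dists):
        # KL divergence between the teacher's and student's
        # entity distributions at this hop
        kl = sum(ti * math.log(ti / si) for si, ti in zip(s, t) if ti > 0)
        loss += lam * kl
    return loss
```

When the student already matches the teacher at every step, the KL terms vanish and only the final-answer loss remains.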

Requirements:

  • Python 3.6
  • PyTorch >= 1.3

Dataset

We provide three processed datasets: WebQuestionsSP (webqsp), Complex WebQuestions 1.1 (CWQ), and MetaQA.

  • We follow GraftNet to preprocess the datasets and construct question-specific graphs.
  • Instructions for obtaining the datasets used in this repo are in the preprocessing folder.
  • You can also download the preprocessed datasets from google drive, unzip them into the dataset folder, and use --data_folder <data_path> to point to it.
Datasets      Train    Dev     Test    #entity   coverage
MetaQA-1hop    96,106   9,992   9,947    487.6     100%
MetaQA-2hop   118,980  14,872  14,872    469.8     100%
MetaQA-3hop   114,196  14,274  14,274    497.9    99.0%
webqsp          2,848     250   1,639  1,429.8    94.9%
CWQ            27,639   3,519   3,531  1,305.8    79.3%

Each dataset is organized with the following structure:

  • data-name/
    • *.dep: question id, question text, and dependency parse for each question (not used in our code);
    • *_simple.json: dataset file; every line describes a question and its question-specific graph. See simplify_dataset.py for how this file is generated: it mainly maps entities and relations to their global ids in entities.txt and relations.txt;
    • entities.txt: a list of entities;
    • relations.txt: a list of relations;
    • vocab_new.txt: vocabulary file;
    • word_emb_300d.npy: GloVe embeddings for the vocabulary.
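Given the layout above, these files can be read with a small loader. The field names inside *_simple.json vary by dataset, so inspect the files before relying on specific keys; this sketch only assumes one JSON object per line and one entity or relation per line.

```python
import json

def load_vocab(path):
    """Map each line of entities.txt / relations.txt to its global id
    (its line number), mirroring the mapping simplify_dataset.py applies."""
    with open(path) as f:
        return {line.strip(): i for i, line in enumerate(f)}

def load_examples(path):
    """Read a *_simple.json file: one JSON object per line, each holding
    a question and its question-specific graph."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]
```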

Results

We provide results for WebQuestionsSP (webqsp), Complex WebQuestions 1.1 (CWQ), and MetaQA.

  • We follow GraftNet's evaluation protocol. Baseline results are taken from the original papers or related papers.
Models     webqsp  MetaQA-1hop  MetaQA-2hop  MetaQA-3hop   CWQ
KV-Mem       46.7     96.2         82.7         48.9       21.1
GraftNet     66.4     97.0         94.8         77.7       32.8
PullNet      68.1     97.0         99.9         91.4       45.9
SRN            -      97.0         95.1         75.2        -
EmbedKGQA    66.6     97.5         98.8         94.8        -
NSM          68.7     97.1         99.9         98.9       47.6
NSM+p        73.9     97.3         99.9         98.9       48.3
NSM+h        74.3     97.2         99.9         98.9       48.8

The leaderboard result for NSM+h is 53.9, placing us 2nd as of 22 May 2021. (We would have ranked 1st had we submitted around the WSDM 2021 deadline, 17 August 2020.) See the leaderboard.

Training Instructions

Download the preprocessed datasets from google drive, unzip them into the dataset folder, and use --data_folder <data_path> to point to it. Trained models for the webqsp and CWQ datasets are available at google drive. Use the following commands to run the code; make sure the directory passed to --checkpoint_dir exists (the example scripts expect a checkpoint folder in this repository):

mkdir checkpoint
Example commands: run_webqsp.sh, run_CWQ.sh, run_metaqa.sh

You can directly load a trained ckpt and run fast evaluation by appending --is_eval --load_experiment <ckpt_file> to the example commands. Note that --load_experiment only accepts a path relative to --checkpoint_dir.

You can get detailed evaluation information about every question in the test set, saved as a file in --checkpoint_dir. For more details, see NSM/train/evaluate_nsm.py.

Important arguments:

--data_folder          Path to the dataset.
--checkpoint_dir       Path to save checkpoints and logs.
--num_step             Number of multi-hop reasoning steps (hyperparameter).
--entity_dim           Hidden size of the reasoning module.
--eval_every           Number of epochs between evaluations.
--experiment_name      Name of the log and ckpt files. If not set, it is generated from a timestamp.
--eps                  Accumulated-probability threshold for collecting answers; affects the Precision, Recall, and F1 metrics.
--use_self_loop        If set, add a self-loop edge to every graph node.
--use_inverse_relation If set, add inverse edges to the graph.
--encode_type          If set, use the type layer to initialize entity embeddings.
--load_experiment      Path to a trained ckpt; only a path relative to --checkpoint_dir is accepted.
--is_eval              If set, run fast evaluation on the test set with the trained ckpt from --load_experiment.
--reason_kb            If set, the model reasons step by step; otherwise it may attend to all graph nodes at every step.
--load_teacher         Path to a teacher ckpt; only a path relative to --checkpoint_dir is accepted.
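As an illustration of how --eps can govern answer collection (a sketch of the idea, not the repository's exact code, which lives in NSM/train/evaluate_nsm.py): rank entities by predicted probability and keep collecting them until the accumulated mass reaches eps, which is why eps affects the Precision, Recall, and F1 metrics.

```python
def collect_answers(entity_probs, eps=0.95):
    """Collect answer entities in descending probability order until the
    accumulated probability reaches eps. Illustrative sketch of the --eps
    option; entity_probs maps entity id -> predicted probability."""
    ranked = sorted(entity_probs.items(), key=lambda kv: kv[1], reverse=True)
    answers, cum = [], 0.0
    for entity, p in ranked:
        if cum >= eps:
            break
        answers.append(entity)
        cum += p
    return answers
```

A larger eps collects more candidate answers, trading precision for recall.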

Acknowledgement

Any scientific publication that uses our code or datasets should cite the following paper as a reference:

@inproceedings{He-WSDM-2021,
    title = "Improving Multi-hop Knowledge Base Question Answering by Learning Intermediate Supervision Signals",
    author = {Gaole He and
              Yunshi Lan and
              Jing Jiang and
              Wayne Xin Zhao and
              Ji{-}Rong Wen},
    booktitle = {{WSDM}},
    year = {2021},
}

Nobody guarantees the correctness of the data, its suitability for any particular purpose, or the validity of results based on the use of the data set. The data set may be used for any research purposes under the following conditions:

  • The user must acknowledge the use of the data set in publications resulting from the use of the data set.
  • The user may not redistribute the data without separate permission.
  • The user may not try to deanonymise the data.
  • The user may not use this information for any commercial or revenue-bearing purposes without first obtaining permission from us.