kelvin-jiang / FreebaseQA

License: CC-BY-4.0
The release of the FreebaseQA data set (NAACL 2019).

FreebaseQA (v1.0): A Trivia-type QA Data Set over the Freebase Knowledge Graph

This repository contains FreebaseQA, a new data set for open-domain QA over the Freebase knowledge graph. The question-answer pairs in this data set are collected from various sources, including the TriviaQA data set (Joshi et al., 2017) and other trivia websites (QuizBalls, QuizZone, KnowQuiz), and are matched against Freebase to generate relevant subject-predicate-object triples that were further verified by human annotators. As all questions in FreebaseQA are composed independently for human contestants in various trivia-like competitions, this data set shows richer linguistic variation and complexity than existing QA data sets, making it a good test-bed for emerging KB-QA systems.

If you find this data set useful, please cite the paper:

[1] K. Jiang, D. Wu and H. Jiang, "FreebaseQA: A New Factoid QA Data Set Matching Trivia-Style Question-Answer Pairs with Freebase," Proc. of North American Chapter of the Association for Computational Linguistics (NAACL), June 2019.

All data is distributed under the CC-BY-4.0 license.

Data Set Files

This data set contains 28,348 unique questions that are divided into three subsets: train (20,358), dev (3,994) and eval (3,996), formatted as JSON files: FreebaseQA-[train|dev|eval].json.

We have also included FreebaseQA-partial.json, which is not officially part of FreebaseQA but may be useful for training models for certain NLP tasks such as named entity recognition and entity linking.

Each file is formatted as follows:

  • Dataset: The name of this data set
  • Version: The version of the FreebaseQA data set
  • Questions: The set of unique questions in this data set
    • Question-ID: The unique ID of each question
    • RawQuestion: The original question collected from data sources
  • ProcessedQuestion: The question after light processing, such as removal of the trailing question mark and lowercasing
    • Parses: The semantic parse(s) for the question
      • Parse-Id: The ID of each semantic parse
      • PotentialTopicEntityMention: The potential topic entity mention in the question
      • TopicEntityName: The name or alias of the topic entity in the question from Freebase
      • TopicEntityMid: The Freebase MID of the topic entity in the question
      • InferentialChain: The path from the topic entity node to the answer node in Freebase, labeled as a predicate
      • Answers: The answer found from this parse
        • AnswersMid: The Freebase MID of the answer
        • AnswersName: The answer string from the original question-answer pair
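The schema above can be traversed with a few lines of Python. The record below is a hypothetical example built to match the documented fields (the IDs, MIDs, and exact value types are illustrative, not taken from the actual data files):

```python
import json

# A hypothetical record following the documented schema (not a real entry;
# Question-ID, MIDs, and value types are illustrative assumptions).
sample = json.loads("""
{
  "Dataset": "FreebaseQA-train",
  "Version": "1.0",
  "Questions": [
    {
      "Question-ID": "FreebaseQA-train-0",
      "RawQuestion": "Who directed the film Jaws?",
      "ProcessedQuestion": "who directed the film jaws",
      "Parses": [
        {
          "Parse-Id": "FreebaseQA-train-0.P0",
          "PotentialTopicEntityMention": "jaws",
          "TopicEntityName": "jaws",
          "TopicEntityMid": "m.0abcde",
          "InferentialChain": "film.film.directed_by",
          "Answers": [
            {"AnswersMid": "m.0fghij", "AnswersName": "steven spielberg"}
          ]
        }
      ]
    }
  ]
}
""")

# Walk every question, parse, and answer to collect (question, answer) pairs.
pairs = []
for question in sample["Questions"]:
    for parse in question["Parses"]:
        for answer in parse["Answers"]:
            pairs.append((question["ProcessedQuestion"], answer["AnswersName"]))

print(pairs)  # → [('who directed the film jaws', 'steven spielberg')]
```

The same loop applies unchanged to FreebaseQA-[train|dev|eval].json once loaded with json.load on the file object.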

Evaluation Metrics

Accuracy is used as the evaluation metric for this data set: a question is considered correctly answered only if the predicted answer exactly matches one of the given answers.
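A minimal sketch of that metric, assuming predictions and gold answers are plain strings (the function name and the case-insensitive comparison are illustrative choices, not part of an official evaluation script):

```python
def accuracy(predictions, gold_answers):
    """Fraction of questions whose predicted answer exactly matches one of
    the given gold answers (compared case-insensitively for illustration)."""
    correct = 0
    for pred, answers in zip(predictions, gold_answers):
        if pred.lower() in {a.lower() for a in answers}:
            correct += 1
    return correct / len(predictions)

# One of the two predictions matches a gold answer, so accuracy is 0.5.
score = accuracy(["Steven Spielberg", "George Lucas"],
                 [["steven spielberg"], ["ridley scott"]])
print(score)  # → 0.5
```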

Freebase Extract

We have extracted a subset of Freebase (2.2GB zipped) that includes all entities (16M) and triples (182M) relevant to the FreebaseQA questions. This subset can accompany the FreebaseQA data set when evaluating the accuracy of trained models in answering questions. It may be downloaded from the following link: https://www.dropbox.com/sh/a25p7j2ir8gqnvx/AABJvjoI9mbHYj3hyfuxSdGaa?dl=0