PetrochukM / Simple Qa Emnlp 2018

Code for my EMNLP 2018 paper "SimpleQuestions Nearly Solved: A New Upperbound and Baseline Approach"

Simple Question Answering — EMNLP 2018

This is the code for the EMNLP 2018 paper "SimpleQuestions Nearly Solved: A New Upperbound and Baseline Approach".

On SimpleQuestions, one of the most commonly used benchmarks for studying single-relation factoid questions, we:

  1. Show that ambiguity in the data bounds performance on this benchmark at 83.4%; there are often multiple answers that cannot be disambiguated from the question alone.
  2. Introduce a baseline that sets a new state-of-the-art performance level at 78.1% accuracy, using only standard methods.
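To put the two figures side by side, the short calculation below expresses the baseline accuracy as a share of the estimated upper bound. It uses only the two percentages quoted above; the variable names are illustrative and the ratio itself is not a number reported in this README.

```python
# Figures quoted in the list above (percent accuracy on SimpleQuestions).
upper_bound = 83.4   # ceiling imposed by ambiguous questions
baseline = 78.1      # accuracy of the baseline approach

# Share of the achievable accuracy that the baseline reaches.
share_of_bound = baseline / upper_bound
# Remaining headroom in absolute percentage points.
headroom = upper_bound - baseline

print(f"Baseline reaches {share_of_bound:.1%} of the upper bound")
print(f"Remaining headroom: {headroom:.1f} points")
```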

Structure

.
├── /notebooks/                          
│   ├── /Simple QA End-To-End/           # Experiments on components of the end-to-end QA pipeline
│   ├── /Simple QA Models                # Experiments on various neural models
│   ├── /Simple QA KG to PostgreSQL DB   # Scripts to populate PostgreSQL
│   └── /Simple QA Numbers               # Scripts for computing and verifying various numbers
├── /pretrained_models/                   
├── /lib/                                # Various utility functionality
├── /tests/                               
├── .flake8                               
└── requirements.txt                     # Required python packages

Prerequisites

This repository requires Python 3.5 or greater and PostgreSQL.

Installation

  • Clone the repository and cd into it
git clone https://github.com/PetrochukM/Simple-QA-EMNLP-2018.git
cd Simple-QA-EMNLP-2018
  • Install the required packages
python -m pip install -r requirements.txt
  • Create a PostgreSQL table named fb_two_subject_name and populate it from notebooks/Simple QA KG to PostgreSQL DB/fb_two_subject_name.csv.gz

  • Create a .pass file using the below template:

    DB_NAME=
    DB_PORT=
    DB_USER=
    DB_HOST=
    DB_PASS=
    

    Such that:

    • DB_NAME: the database name
    • DB_USER: user name used to authenticate
    • DB_PASS: password used to authenticate
    • DB_HOST: database host address
    • DB_PORT: connection port number (typically 5432)
  • Download the SimpleQuestions v2 dataset from Facebook Research. Use the notebook at Simple-QA-EMNLP-2018/notebooks/Simple QA KG to PostgreSQL DB/FB5M & FB2M KG to DB.ipynb to create and populate a PostgreSQL table.

  • You're done! Feel free to run Simple-QA-EMNLP-2018/notebooks/Simple QA End-To-End.
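As a sanity check on the .pass file described in the steps above, here is a minimal sketch of a parser for the KEY=VALUE template. It is an assumption about how such a file could be read, not the loader this repository uses; the function names, the skipping of `#` comment lines, and the sample values are all hypothetical.

```python
from pathlib import Path

# Keys listed in the .pass template above.
REQUIRED_KEYS = ("DB_NAME", "DB_PORT", "DB_USER", "DB_HOST", "DB_PASS")


def parse_pass_text(text):
    """Parse KEY=VALUE lines into a dict (hypothetical helper).

    Blank lines and lines starting with '#' are skipped; the comment
    handling is an assumption, not something the template specifies.
    """
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"Expected KEY=VALUE, got: {raw_line!r}")
        settings[key.strip()] = value.strip()

    # Fail early if any key from the template is absent or left empty.
    missing = [key for key in REQUIRED_KEYS if not settings.get(key)]
    if missing:
        raise ValueError("Missing values for: " + ", ".join(missing))

    settings["DB_PORT"] = int(settings["DB_PORT"])
    return settings


def load_pass_file(path=".pass"):
    """Read and parse a .pass file from disk (hypothetical helper)."""
    return parse_pass_text(Path(path).read_text())


# Placeholder values for illustration only.
example = parse_pass_text("""
DB_NAME=simple_qa
DB_PORT=5432
DB_USER=postgres
DB_HOST=localhost
DB_PASS=secret
""")
print(example["DB_HOST"], example["DB_PORT"])
```

Running `load_pass_file()` from the repository root before opening the notebooks should surface a missing or empty field early, rather than as a connection error later.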

Slides

The slides used for our EMNLP talk.

Citation

@article{Petrochuk2018SimpleQuestionsNS,
  title={SimpleQuestions Nearly Solved: A New Upperbound and Baseline Approach},
  author={Michael Petrochuk and Luke S. Zettlemoyer},
  journal={CoRR},
  year={2018},
  volume={abs/1804.08798}
}

Important Notes

  • The FB2M and FB5M subsets of Freebase KG can complete 7,188,636 and 7,688,234 graph queries respectively; therefore, the FB5M subset is 6.9% larger than the FB2M subset. Also, the FB5M dataset only contains 3.98M entities. This contradicts the statement that "FB5M, is much larger with about 5M entities" (Bordes et al., 2015).
  • FB5M and FB2M contain 4,322,266 and 3,654,470 duplicate grouped facts respectively.
  • FB2M is not a subset of FB5M: one atomic fact is in FB2M but not in FB5M, namely (01g4wmh, music/album/acquire_webpage, 02q5zps).
  • FB5M and FB2M do not contain the answer for 24 and 36 examples in the SimpleQuestions dataset respectively; therefore, those examples are unanswerable.
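The size comparison and the subset check in the notes above can be reproduced in a few lines. The query counts and the one FB2M-only fact are taken from the notes; the other triples are made-up placeholders standing in for the real knowledge graph, so this is a sketch of the method rather than the notebook code.

```python
# Graph query counts quoted in the notes above.
fb2m_queries = 7_188_636
fb5m_queries = 7_688_234

# Relative size difference of FB5M over FB2M.
relative_increase = (fb5m_queries - fb2m_queries) / fb2m_queries
print(f"FB5M answers {relative_increase:.2%} more graph queries than FB2M")

# Subset check on toy data. Each fact is a (subject, relation, object) triple.
# Only the first triple comes from the notes; the rest are placeholders.
fb2m_facts = {
    ("01g4wmh", "music/album/acquire_webpage", "02q5zps"),
    ("subject_a", "relation_a", "object_a"),
}
fb5m_facts = {
    ("subject_a", "relation_a", "object_a"),
    ("subject_b", "relation_b", "object_b"),
}

# Facts present in FB2M but absent from FB5M; a non-empty result means
# FB2M is not a subset of FB5M.
only_in_fb2m = fb2m_facts - fb5m_facts
print("FB2M is a subset of FB5M:", fb2m_facts <= fb5m_facts)
print("Facts only in FB2M:", sorted(only_in_fb2m))
```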
