
sheffieldnlp / naacl2018-fever

License: Apache-2.0
Fact Extraction and VERification baseline published in NAACL2018

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to naacl2018-fever

Invoicenet
Deep neural network to extract intelligent information from invoice documents.
Stars: ✭ 1,886 (+1630.28%)
Mutual labels:  information-retrieval, information-extraction
verif
Software for verifying weather forecasts
Stars: ✭ 70 (-35.78%)
Mutual labels:  evaluation, verification
hydrotools
Suite of tools for retrieving USGS NWIS observations and evaluating National Water Model (NWM) data.
Stars: ✭ 36 (-66.97%)
Mutual labels:  evaluation, verification
Nlp Projects
word2vec, sentence2vec, machine reading comprehension, dialog system, text classification, pretrained language model (i.e., XLNet, BERT, ELMo, GPT), sequence labeling, information retrieval, information extraction (i.e., entity, relation and event extraction), knowledge graph, text generation, network embedding
Stars: ✭ 360 (+230.28%)
Mutual labels:  information-retrieval, information-extraction
Awesome Hungarian Nlp
A curated list of NLP resources for Hungarian
Stars: ✭ 121 (+11.01%)
Mutual labels:  information-retrieval, information-extraction
Pytrec eval
pytrec_eval is an Information Retrieval evaluation tool for Python, based on the popular trec_eval.
Stars: ✭ 114 (+4.59%)
Mutual labels:  information-retrieval, evaluation
evildork
Evildork targeting your fiancee👁️
Stars: ✭ 46 (-57.8%)
Mutual labels:  information-retrieval, information-extraction
Dan Jurafsky Chris Manning Nlp
My solution to the Natural Language Processing course made by Dan Jurafsky, Chris Manning in Winter 2012.
Stars: ✭ 124 (+13.76%)
Mutual labels:  information-retrieval, information-extraction
Vec4ir
Word Embeddings for Information Retrieval
Stars: ✭ 188 (+72.48%)
Mutual labels:  information-retrieval, evaluation
serval-sosp19
This repo contains the artifact for our SOSP'19 paper on Serval
Stars: ✭ 26 (-76.15%)
Mutual labels:  verification
Colorization
The PyTorch implementation of Colorful Image Colorization. In ECCV, 2016.
Stars: ✭ 34 (-68.81%)
Mutual labels:  pytorch-implmention
tech-seo-crawler
Build a small, 3 domain internet using Github pages and Wikipedia and construct a crawler to crawl, render, and index.
Stars: ✭ 57 (-47.71%)
Mutual labels:  wikipedia
contextual
Contextual Bandits in R - simulation and evaluation of Multi-Armed Bandit Policies
Stars: ✭ 72 (-33.94%)
Mutual labels:  evaluation
Spell4Wiki
Spell4Wiki is a mobile application to record and upload audio for Wiktionary words to Wikimedia commons. Also act as a Wiki-Dictionary.
Stars: ✭ 17 (-84.4%)
Mutual labels:  wikipedia
firebase-spring-boot-rest-api-authentication
Firebase Spring Boot Rest API Authentication
Stars: ✭ 172 (+57.8%)
Mutual labels:  verification
revc
The fastest and safest EVC encoder and decoder
Stars: ✭ 75 (-31.19%)
Mutual labels:  baseline
hagelslag
Hagelslag is an object-based severe storm hazard forecasting system.
Stars: ✭ 58 (-46.79%)
Mutual labels:  verification
embedding evaluation
Evaluate your word embeddings
Stars: ✭ 32 (-70.64%)
Mutual labels:  evaluation
ProQA
Progressively Pretrained Dense Corpus Index for Open-Domain QA and Information Retrieval
Stars: ✭ 44 (-59.63%)
Mutual labels:  information-retrieval
rust-stemmers
A rust implementation of some popular snowball stemming algorithms
Stars: ✭ 85 (-22.02%)
Mutual labels:  information-retrieval

Fact Extraction and VERification

Important Information

This repository requires dependencies that are no longer available on pip or Anaconda. An updated version of this repository has been created for the FEVER2.0 shared task and is available via pip and Docker. For more information, please see this repository: https://github.com/j6mes/fever2-sample

About

This is the PyTorch implementation of the FEVER pipeline baseline described in the NAACL2018 paper: FEVER: A large-scale dataset for Fact Extraction and VERification.

Unlike other tasks and despite recent interest, research in textual claim verification has been hindered by the lack of large-scale manually annotated datasets. In this paper we introduce a new publicly available dataset for verification against textual sources, FEVER: Fact Extraction and VERification. It consists of 185,441 claims generated by altering sentences extracted from Wikipedia and subsequently verified without knowledge of the sentence they were derived from. The claims are classified as Supported, Refuted or NotEnoughInfo by annotators achieving 0.6841 in Fleiss κ. For the first two classes, the annotators also recorded the sentence(s) forming the necessary evidence for their judgment. To characterize the challenge of the dataset presented, we develop a pipeline approach using both baseline and state-of-the-art components and compare it to suitably designed oracles. The best accuracy we achieve on labeling a claim accompanied by the correct evidence is 31.87%, while if we ignore the evidence we achieve 50.91%. Thus we believe that FEVER is a challenging testbed that will help stimulate progress on claim verification against textual sources.

The baseline model consists of two components: Evidence Retrieval (DrQA) and Textual Entailment (Decomposable Attention).

Find Out More

  • Visit http://fever.ai to find out more about the shared task and download the data.


Pre-requisites

This was tested and evaluated using the Python 3.6 version of Anaconda 5.0.1, which can be downloaded from anaconda.com

macOS users may have to install Xcode before running git or installing packages (gcc may fail). See this post on StackExchange

Support for Windows operating systems is not provided.

To train the Decomposable Attention models, it is highly recommended to use a GPU. Training will take about 3 hours on a GTX 1080Ti whereas training on a CPU will take days. We offer a pre-trained model.tar.gz that can be downloaded. To use the pretrained model, simply replace any path to a model.tar.gz file with the path to the file you downloaded. (e.g. logs/da_nn_sent/model.tar.gz could become ~/Downloads/model.tar.gz)
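For example, if you save the downloaded archive to ~/Downloads/model.tar.gz, the oracle evaluation command from the Evaluation section below would use that path in place of the trained-model path. This is only an illustration of the substitution, not an additional required step:

PYTHONPATH=src python src/scripts/rte/da/eval_da.py data/fever/fever.db ~/Downloads/model.tar.gz data/fever/dev.ns.pages.p1.jsonl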

Change Log

  • v0.3 - Added the ability to read unlabelled data (i.e. the blind dataset for the competition). You must update to this version to take part in the competition.
  • v0.2 - Updated the Information Retrieval component to use a modified version of DrQA that allows multi-threaded document/sentence retrieval. This yields a >10x speed-up in the IR stage of the pipeline, as I/O waits no longer block computation of TF-IDF vectors.
  • v0.1 - original implementation (tagged as naacl2018)

Docker Install

Download and run the latest FEVER baselines image:

docker volume create fever-data
docker run -it -v fever-data:/fever/data sheffieldnlp/fever-baselines

To enable GPU acceleration, install NVIDIA Docker and run the container with --runtime=nvidia, as shown below.
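For example, the run command above with the GPU runtime enabled (assuming NVIDIA Docker is already set up) becomes:

docker run --runtime=nvidia -it -v fever-data:/fever/data sheffieldnlp/fever-baselines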

Manual Install

Installation using Docker is preferred. If you are unable to do this, you can manually create the Python environment by following the instructions here: Wiki/Manual-Install

Remember that if you installed manually, you should run source activate fever and cd into the repository directory before running any commands, for example:
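A minimal example, assuming the repository was cloned into a directory named fever-baselines and the conda environment created during the manual install is called fever:

source activate fever
cd fever-baselines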

Download Data

Wikipedia

To download a pre-processed Wikipedia dump (license):

bash scripts/download-processed-wiki.sh

Or download the raw data and process it yourself:

bash scripts/download-raw-wiki.sh
bash scripts/process-wiki.sh
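Either route should leave a SQLite database at data/fever/fever.db, which is the path the commands below expect. As an optional sanity check you can count the stored pages; note that the documents table name is an assumption based on the DrQA-style database the pipeline uses, not something documented here:

sqlite3 data/fever/fever.db "SELECT COUNT(*) FROM documents;"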

Dataset

Download the FEVER dataset from our website into the data directory:

bash scripts/download-data.sh

(Note that if you want to replicate the paper, run scripts/download-paper.sh instead of scripts/download-data.sh.)
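To get a feel for the data, you can pretty-print the first record of a downloaded split; the dev file path below matches the one used by the retrieval commands later in this README. Each line is a JSON object containing the claim, its label, and, for Supported/Refuted claims, identifiers of the Wikipedia sentences annotated as evidence:

head -n 1 data/fever-data/dev.jsonl | python -m json.tool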

Word Embeddings

Download the pretrained GloVe vectors:

bash scripts/download-glove.sh

Data Preparation

Sample training data for the NotEnoughInfo class. There are two sampling methods evaluated in the paper: using the nearest neighbour (similarity between TF-IDF vectors) and random sampling.

#Using nearest neighbor method
PYTHONPATH=src python src/scripts/retrieval/document/batch_ir_ns.py --model data/index/fever-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz --count 1 --split train
PYTHONPATH=src python src/scripts/retrieval/document/batch_ir_ns.py --model data/index/fever-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz --count 1 --split dev

Or random sampling

#Using random sampling method
PYTHONPATH=src python src/scripts/dataset/neg_sample_evidence.py data/fever/fever.db

Training

We offer a pretrained model that can be downloaded by running the following command:

bash scripts/download-model.sh

Skip to evaluation if you are using the pretrained model.

Train DA

Train the Decomposable Attention model

#if using a CPU, set
export CUDA_DEVICE=-1

#if using a GPU, set
export CUDA_DEVICE=0 #or cuda device id

Then either train the model with Nearest-Page Sampling for the NEI class

# Using nearest neighbor sampling method for NotEnoughInfo class (better)
PYTHONPATH=src python src/scripts/rte/da/train_da.py data/fever/fever.db config/fever_nn_ora_sent.json logs/da_nn_sent --cuda-device $CUDA_DEVICE
mkdir -p data/models
cp logs/da_nn_sent/model.tar.gz data/models/decomposable_attention.tar.gz

Or with Random Sampling for the NEI class

# Using random sampled data for NotEnoughInfo (worse)
PYTHONPATH=src python src/scripts/rte/da/train_da.py data/fever/fever.db config/fever_rs_ora_sent.json logs/da_rs_sent --cuda-device $CUDA_DEVICE
mkdir -p data/models
cp logs/da_rs_sent/model.tar.gz data/models/decomposable_attention.tar.gz

Train MLP

The MLP model can be trained following instructions from the Wiki: Wiki/Train-MLP

Evaluation

These instructions are for the decomposable attention model. The MLP model can be evaluated following instructions from the Wiki: Wiki/Evaluate-MLP

Oracle Evaluation (no evidence retrieval):

Run the oracle evaluation for the Decomposable Attention model on the dev set (requires sampling the NEI class for the dev dataset - see Data Preparation)

PYTHONPATH=src python src/scripts/rte/da/eval_da.py data/fever/fever.db data/models/decomposable_attention.tar.gz data/fever/dev.ns.pages.p1.jsonl

Evidence Retrieval Evaluation:

First retrieve the evidence for the dev/test sets:

#Dev
PYTHONPATH=src python src/scripts/retrieval/ir.py --db data/fever/fever.db --model data/index/fever-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz --in-file data/fever-data/dev.jsonl --out-file data/fever/dev.sentences.p5.s5.jsonl --max-page 5 --max-sent 5

#Test
PYTHONPATH=src python src/scripts/retrieval/ir.py --db data/fever/fever.db --model data/index/fever-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz --in-file data/fever-data/test.jsonl --out-file data/fever/test.sentences.p5.s5.jsonl --max-page 5 --max-sent 5
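As a quick check, the retrieval script is expected to emit one JSON record per input claim (an assumption about its output, not an official verification step), so the line counts of the input and output files should match:

wc -l data/fever-data/dev.jsonl data/fever/dev.sentences.p5.s5.jsonl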

Then run the model:

#Dev
PYTHONPATH=src python src/scripts/rte/da/eval_da.py data/fever/fever.db data/models/decomposable_attention.tar.gz data/fever/dev.sentences.p5.s5.jsonl  --log data/decomposable_attention.dev.log

#Test
PYTHONPATH=src python src/scripts/rte/da/eval_da.py data/fever/fever.db data/models/decomposable_attention.tar.gz data/fever/test.sentences.p5.s5.jsonl  --log logs/decomposable_attention.test.log

Scoring

Score locally (for dev set)

Score:

PYTHONPATH=src python src/scripts/score.py --predicted_labels data/decomposable_attention.dev.log --predicted_evidence data/fever/dev.sentences.p5.s5.jsonl --actual data/fever-data/dev.jsonl

Or score on Codalab (for dev/test)

Prepare Submission for Codalab (dev):

PYTHONPATH=src python src/scripts/prepare_submission.py --predicted_labels logs/decomposable_attention.dev.log --predicted_evidence data/fever/dev.sentences.p5.s5.jsonl --out_file predictions.jsonl
zip submission.zip predictions.jsonl

Prepare Submission for Codalab (test):

PYTHONPATH=src python src/scripts/prepare_submission.py --predicted_labels logs/decomposable_attention.test.log --predicted_evidence data/fever/test.sentences.p5.s5.jsonl --out_file predictions.jsonl
zip submission.zip predictions.jsonl