husseinmozannar / Soqal

License: MIT
Arabic Open Domain Question Answering System using Neural Reading Comprehension

SOQAL: Neural Arabic Question Answering

This repository includes the code and dataset described in our WANLP 2019 paper Neural Arabic Question Answering by Hussein Mozannar, Karl El Hajal, Elie Maamary and Hazem Hajj.

  • See below how to run a demo of our open domain question answering system in Arabic

  • Google Colab for training BERT on Arabic-SQuAD and ARCD: Colab

Arabic Open Domain Question Answering

This work builds a system for open domain factual Arabic question answering (QA) using Wikipedia as the knowledge source. This constrains the answer to any question to be a span of text in Wikipedia. In return, it lets us use neural reading comprehension models for our end goal.

Open domain QA for Arabic entails three challenges: annotated QA datasets in Arabic, large-scale efficient information retrieval, and machine reading comprehension. To deal with the lack of Arabic QA datasets, we present the Arabic Reading Comprehension Dataset (ARCD), composed of 1,395 questions posed by crowdworkers on Wikipedia articles, and a machine translation of the Stanford Question Answering Dataset (Arabic-SQuAD) containing 48,344 questions.

Our system for open domain question answering in Arabic (SOQAL) is based on three components: (1) a document retriever using a hierarchical TF-IDF approach, (2) a neural reading comprehension model using the pre-trained bi-directional transformer BERT, and (3) a linear answer ranking module.
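The retrieve-then-rank idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the SOQAL implementation: it uses a flat (non-hierarchical) TF-IDF with cosine similarity, the reader scores are made-up numbers standing in for BERT outputs, and the names `build_index`, `retrieve`, `rank` and the weight `beta` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def build_index(docs: List[List[str]]) -> Tuple[List[Vector], Dict[str, float]]:
    """Compute a smoothed IDF table and one TF-IDF vector per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: List[str], docs: List[List[str]], k: int = 2) -> List[Tuple[int, float]]:
    """Return the k best (doc_index, score) pairs for the question."""
    vectors, idf = build_index(docs)
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(question).items()}
    scored = [(i, cosine(q, v)) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def rank(candidates: List[Tuple[str, float, float]], beta: float = 0.5) -> Tuple[str, float, float]:
    """Pick the (answer, retriever_score, reader_score) candidate with the
    highest linear combination; beta is an assumed, untuned weight."""
    return max(candidates, key=lambda c: beta * c[1] + (1 - beta) * c[2])


docs = [
    "beirut is the capital of lebanon".split(),
    "cairo is the capital of egypt".split(),
    "the nile is a river".split(),
]
question = "capital of lebanon".split()

top = retrieve(question, docs, k=2)
# Reader scores below are placeholders for what a trained reader would output.
candidates = [("beirut", top[0][1], 0.8), ("cairo", top[1][1], 0.6)]
best = rank(candidates)
print(top, best[0])
```

English tokens are used only to keep the toy example easy to read; the same steps apply to tokenized Arabic text.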

Credit: This work draws inspiration from DrQA.

Platform

Tested with Python 3.6 on Windows 8, Windows 10, and Linux. Most commands are written assuming Windows.

Installing SOQAL

(For Windows) Create a new virtual environment (install virtualenv first if you do not already have it) and activate it:

virtualenv venv
venv\Scripts\activate

You are now inside the virtual environment you created; the packages below will be installed into it.

Run the following commands to clone the repository and install SOQAL:

git clone https://github.com/husseinmozannar/SOQAL.git
cd SOQAL
pip install -r requirements.txt

Demo

After installing the required packages, download the trained retriever and reader.

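Download the following files:

  • BERT model: cased multilingual

  • Trained reader: trained BERT (mod.zip) and its checkpoint file

  • Retriever: retriever

Extract the BERT model and place it in bert/. Extract mod.zip, place the resulting mod folder in bert/, and put the checkpoint file inside the mod folder. Place the retriever file (tfretriever.p) in retriever/. Note that the demo command refers to it as retriever/tfidfretriever.p, so the file name must match the -r argument.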

To ask SOQAL open-domain questions in Arabic interactively, run the following command (^ is the Windows line-continuation character; on Linux, use \ instead):

python demo_open.py ^
-c bert/multi_cased_L-12_H-768_A-12/bert_config.json ^
-v bert/multi_cased_L-12_H-768_A-12/vocab.txt ^
-o bert/mod/ ^
-r retriever/tfidfretriever.p

Then, in your browser, go to:

localhost:9999
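If the demo fails to start, a common cause is a file that was extracted to the wrong place. The following is a small, optional pre-flight check, written as a sketch: the script and the `missing_paths` helper are not part of SOQAL, and the paths are simply the ones passed to demo_open.py in the command above. The name of the checkpoint file inside bert/mod/ is not given here, so only the folder itself is checked.

```python
from pathlib import Path
from typing import List

# Paths taken from the demo_open.py arguments (-c, -v, -o, -r).
EXPECTED = [
    "bert/multi_cased_L-12_H-768_A-12/bert_config.json",
    "bert/multi_cased_L-12_H-768_A-12/vocab.txt",
    "bert/mod",
    "retriever/tfidfretriever.p",
]


def missing_paths(root: str = ".") -> List[str]:
    """Return the expected paths that do not exist under root (hypothetical helper)."""
    base = Path(root)
    return [p for p in EXPECTED if not (base / p).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing before running the demo:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected files and folders were found.")
```

Run it from the root of the cloned SOQAL folder, the same directory from which the demo command is launched.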

Citation

Please cite our paper if you use our datasets or code:

@inproceedings{mozannar-etal-2019-neural,
    title = "Neural {A}rabic Question Answering",
    author = "Mozannar, Hussein  and
      Maamary, Elie  and
      El Hajal, Karl  and
      Hajj, Hazem",
    booktitle = "Proceedings of the Fourth Arabic Natural Language Processing Workshop",
    month = aug,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/W19-4612",
    doi = "10.18653/v1/W19-4612",
    pages = "108--118",
    abstract = "This paper tackles the problem of open domain factual Arabic question answering (QA) using Wikipedia as our knowledge source. This constrains the answer of any question to be a span of text in Wikipedia. Open domain QA for Arabic entails three challenges: annotated QA datasets in Arabic, large scale efficient information retrieval and machine reading comprehension. To deal with the lack of Arabic QA datasets we present the Arabic Reading Comprehension Dataset (ARCD) composed of 1,395 questions posed by crowdworkers on Wikipedia articles, and a machine translation of the Stanford Question Answering Dataset (Arabic-SQuAD). Our system for open domain question answering in Arabic (SOQAL) is based on two components: (1) a document retriever using a hierarchical TF-IDF approach and (2) a neural reading comprehension model using the pre-trained bi-directional transformer BERT. Our experiments on ARCD indicate the effectiveness of our approach with our BERT-based reader achieving a 61.3 F1 score, and our open domain system SOQAL achieving a 27.6 F1 score.",
}