cdqa-suite / Cdqa

Licence: apache-2.0

⛔ [NOT MAINTAINED] An End-To-End Closed Domain Question Answering System.

Programming Languages

python

Labels

deep-learning pytorch nlp natural-language-processing artificial-intelligence question-answering information-retrieval

cdQA: Closed Domain Question Answering

An End-To-End Closed Domain Question Answering System. Built on top of the HuggingFace transformers library.

⛔ [NOT MAINTAINED] This repository is no longer maintained, but is being kept around for educational purposes. If you want a maintained alternative to cdQA check out: https://github.com/deepset-ai/haystack

cdQA in detail

If you are interested in understanding how the system works and its implementation, we wrote an article on Medium with a high-level explanation.

We also gave a presentation at NLP Breakfast #9, organised by Feedly. You can check it out here.

Table of Contents

Installation
Getting started
Notebook Examples
Deployment
- Manual
Contributing
References
LICENSE

Installation

With pip

pip install cdqa

From source

git clone https://github.com/cdqa-suite/cdQA.git
cd cdQA
pip install -e .

Hardware Requirements

Experiments have been done with:

CPU 👉 AWS EC2 t2.medium Deep Learning AMI (Ubuntu) Version 22.0
GPU 👉 AWS EC2 p3.2xlarge Deep Learning AMI (Ubuntu) Version 22.0 + a single Tesla V100 16GB.

Getting started

Preparing your data

Manual

To use cdQA you need to create a pandas dataframe with the following columns:

title	paragraphs
The Article Title	[Paragraph 1 of Article, ... , Paragraph N of Article]
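
As a minimal sketch of the expected shape, the rows below are built with the standard library only. The `make_row` helper and the blank-line paragraph split are assumptions made for illustration; they are not part of cdQA.

```python
def make_row(title, raw_text):
    """Build one corpus row: a title plus a list of paragraph strings.

    Hypothetical helper: paragraphs are assumed to be separated by blank lines.
    """
    paragraphs = [p.strip() for p in raw_text.split("\n\n") if p.strip()]
    return {"title": title, "paragraphs": paragraphs}


rows = [
    make_row("The Article Title", "Paragraph 1 of Article\n\nParagraph 2 of Article"),
    make_row("Another Article", "Only paragraph of this article"),
]

# Each row holds one article: its title and the list of its paragraphs.
for row in rows:
    print(row["title"], len(row["paragraphs"]))
```

Passing `rows` to `pandas.DataFrame(rows)` then gives a dataframe with the `title` and `paragraphs` columns described above.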

With converters

The cdQA converters make it easy to create this dataframe from your raw document collection. For instance, the pdf_converter creates a cdQA dataframe from a directory containing .pdf files:

from cdqa.utils.converters import pdf_converter

df = pdf_converter(directory_path='path_to_pdf_folder')

You will need to install Java OpenJDK to use this converter. We currently have converters for:

pdf
markdown

We plan to improve and add more converters in the future. Stay tuned!

Downloading pre-trained models and data

You can download the models and data manually from the GitHub releases or use our download functions:

from cdqa.utils.download import download_squad, download_model, download_bnpp_data

directory = 'path-to-directory'

# Downloading data
download_squad(dir=directory)
download_bnpp_data(dir=directory)

# Downloading pre-trained BERT fine-tuned on SQuAD 1.1
download_model('bert-squad_1.1', dir=directory)

# Downloading pre-trained DistilBERT fine-tuned on SQuAD 1.1
download_model('distilbert-squad_1.1', dir=directory)

Training models

Fit the pipeline on your corpus using the pre-trained reader:

import pandas as pd
from ast import literal_eval
from cdqa.pipeline import QAPipeline

df = pd.read_csv('your-custom-corpus-here.csv', converters={'paragraphs': literal_eval})

cdqa_pipeline = QAPipeline(reader='bert_qa.joblib') # use 'distilbert_qa.joblib' for DistilBERT instead of BERT
cdqa_pipeline.fit_retriever(df=df)
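
The `converters={'paragraphs': literal_eval}` argument matters because a CSV cell can only hold text, so a list of paragraphs is stored as its string representation. The sketch below reproduces that round trip with the standard library `csv` module rather than pandas; the sample row is made up for illustration.

```python
import csv
import io
from ast import literal_eval

row = {"title": "The Article Title", "paragraphs": ["Paragraph 1", "Paragraph 2"]}

# Writing: the list is stored as its string representation.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["title", "paragraphs"])
writer.writeheader()
writer.writerow({"title": row["title"], "paragraphs": repr(row["paragraphs"])})

# Reading: every cell comes back as a plain string...
buffer.seek(0)
loaded = next(csv.DictReader(buffer))
cell_type = type(loaded["paragraphs"])

# ...so literal_eval is needed to turn it back into a list of paragraphs.
paragraphs = literal_eval(loaded["paragraphs"])
print(cell_type.__name__, "->", type(paragraphs).__name__)
```

Without the converter, the `paragraphs` column would contain one long string per article instead of a list.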

If you want to fine-tune the reader on your custom SQuAD-like annotated dataset:

cdqa_pipeline = QAPipeline(reader='bert_qa.joblib') # use 'distilbert_qa.joblib' for DistilBERT instead of BERT
cdqa_pipeline.fit_reader('path-to-custom-squad-like-dataset.json')

Save the reader model after fine-tuning:

cdqa_pipeline.dump_reader('path-to-save-bert-reader.joblib')

Making predictions

To get the best prediction given an input query:

cdqa_pipeline.predict(query='your question')

To get the N best predictions:

cdqa_pipeline.predict(query='your question', n_predictions=N)

You can also change the weight of the retriever score relative to the reader score in the computation of the final ranking score. The default is 0.35, which was found to be the best weight on the development set of SQuAD 1.1-open:

cdqa_pipeline.predict(query='your question', retriever_score_weight=0.35)
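
The exact ranking formula is not given here. As an illustration of what such a weight does, the sketch below assumes a simple weighted sum of the two scores; `combine_scores`, `rank`, and the candidate values are hypothetical and may not match cdQA's internal computation.

```python
def combine_scores(retriever_score, reader_score, retriever_score_weight=0.35):
    """Hypothetical linear interpolation between retriever and reader scores."""
    return (retriever_score_weight * retriever_score
            + (1 - retriever_score_weight) * reader_score)


def rank(candidates, weight):
    """Sort candidates by their combined score, best first."""
    return sorted(
        candidates,
        key=lambda c: combine_scores(c[1], c[2], weight),
        reverse=True,
    )


# (answer, retriever score, reader score): made-up values for illustration.
candidates = [
    ("answer A", 0.9, 0.2),
    ("answer B", 0.3, 0.8),
]

print(rank(candidates, 0.35)[0][0])  # the reader score dominates
print(rank(candidates, 0.9)[0][0])   # the retriever score dominates
```

Under this assumption, a low weight favours the answer the reader is most confident about, while a high weight favours answers from the best-matching documents.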

Evaluating models

To evaluate models on your custom dataset, you first need to annotate it. The whole process takes 3 steps: converting the dataset, annotating it, and running the evaluation.

Convert your pandas DataFrame into a json file with SQuAD format:

from cdqa.utils.converters import df2squad

json_data = df2squad(df=df, squad_version='v1.1', output_dir='.', filename='dataset-name')
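
For orientation, the SQuAD v1.1 layout nests articles, paragraphs, and question-answer lists. The sketch below builds that structure by hand with empty `qas` lists, ready for annotation. `rows_to_squad` is a hypothetical helper, not the implementation of df2squad, whose output may carry additional fields.

```python
import json


def rows_to_squad(rows, version="v1.1"):
    """Hypothetical helper: wrap corpus rows in a SQuAD-style structure.

    Each paragraph becomes a "context" with an empty "qas" list to be
    filled in by an annotator.
    """
    return {
        "version": version,
        "data": [
            {
                "title": row["title"],
                "paragraphs": [{"context": p, "qas": []} for p in row["paragraphs"]],
            }
            for row in rows
        ],
    }


rows = [{"title": "The Article Title", "paragraphs": ["Paragraph 1", "Paragraph 2"]}]
squad = rows_to_squad(rows)
print(json.dumps(squad, indent=2))
```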

Use an annotator to add ground truth question-answer pairs:

Please refer to our cdQA-annotator, a web-based annotator for closed-domain question answering datasets with SQuAD format.

Evaluate the pipeline object:

from cdqa.utils.evaluation import evaluate_pipeline

evaluate_pipeline(cdqa_pipeline, 'path-to-annotated-dataset.json')

Evaluate the reader:

from cdqa.utils.evaluation import evaluate_reader

evaluate_reader(cdqa_pipeline, 'path-to-annotated-dataset.json')
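
The metrics these functions report are not listed here. Assuming they follow the usual SQuAD evaluation (exact match and token-level F1), the sketch below shows how those two scores are computed for a single prediction. The function names are chosen for illustration and are not cdQA's API.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, strip punctuation, drop English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """1 if the normalized strings are identical, else 0."""
    return int(normalize(prediction) == normalize(truth))


def f1_score(prediction, truth):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    truth_tokens = normalize(truth).split()
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower.", "eiffel tower"))
print(f1_score("in Paris, France", "Paris"))
```

Exact match rewards only fully correct answers, while F1 gives partial credit when the predicted span overlaps the ground truth.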

Notebook Examples

We prepared some notebook examples under the examples directory.

You can also play directly with these notebook examples using Binder or Google Colaboratory:

Notebook	Hardware
[1] First steps with cdQA	CPU or GPU
[2] Using the PDF converter	CPU or GPU
[3] Training the reader on SQuAD	GPU

Binder and Google Colaboratory provide temporary environments and may be slow to start, but we recommend them if you want to get started with cdQA easily.

Deployment

Manual

You can deploy a cdQA REST API by executing:

export dataset_path=path-to-dataset.csv
export reader_path=path-to-reader-model

FLASK_APP=api.py flask run -h 0.0.0.0

You can now make requests to test your API (here using HTTPie):

http localhost:5000/api query=='your question here'
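
The same request can be sent from Python with the standard library. This is a sketch under assumptions: the endpoint and `query` parameter come from the HTTPie call above, while the helper names and the JSON decoding of the response are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(question, base_url="http://localhost:5000/api"):
    """Encode the question as the `query` URL parameter."""
    return f"{base_url}?{urlencode({'query': question})}"


def ask(question):
    """Send the question to a running API instance.

    Assumes the server replies with a JSON body; adjust if it does not.
    """
    with urlopen(build_query_url(question)) as response:
        return json.loads(response.read().decode("utf-8"))


# Only the URL is built here; call ask(...) once the Flask server is running.
print(build_query_url("your question here"))
```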

If you wish to serve a user interface on top of your cdQA system, follow the instructions of cdQA-ui, a web interface developed for cdQA.

Contributing

Read our Contributing Guidelines.

References

Type	Title	Author	Year
📹 Video	Stanford CS224N: NLP with Deep Learning Lecture 10 – Question Answering	Christopher Manning	2019
📰 Paper	Reading Wikipedia to Answer Open-Domain Questions	Danqi Chen, Adam Fisch, Jason Weston, Antoine Bordes	2017
📰 Paper	Neural Reading Comprehension and Beyond	Danqi Chen	2018
📰 Paper	BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding	Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova	2018
📰 Paper	Contextual Word Representations: A Contextual Introduction	Noah A. Smith	2019
📰 Paper	End-to-End Open-Domain Question Answering with BERTserini	Wei Yang, Yuqing Xie, Aileen Lin, Xingyu Li, Luchen Tan, Kun Xiong, Ming Li, Jimmy Lin	2019
📰 Paper	Data Augmentation for BERT Fine-Tuning in Open-Domain Question Answering	Wei Yang, Yuqing Xie, Luchen Tan, Kun Xiong, Ming Li, Jimmy Lin	2019
📰 Paper	Passage Re-ranking with BERT	Rodrigo Nogueira, Kyunghyun Cho	2019
📰 Paper	MRQA: Machine Reading for Question Answering	Jonathan Berant, Percy Liang, Luke Zettlemoyer	2019
📰 Paper	Unsupervised Question Answering by Cloze Translation	Patrick Lewis, Ludovic Denoyer, Sebastian Riedel	2019
💻 Framework	Scikit-learn: Machine Learning in Python	Pedregosa et al.	2011
💻 Framework	PyTorch	Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan	2016
💻 Framework	Transformers: State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch.	Hugging Face	2018

LICENSE

Apache-2.0
