huybery / r2sql

Licence: other
🌶️ R²SQL: "Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing." (AAAI 2021)

R²SQL

PyTorch implementation of the paper "Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing" (AAAI 2021).

Requirements

The model was tested with Python 3.6 and the following requirements:

torch==1.0.0
transformers==2.10.0
sqlparse
pymysql
progressbar
nltk
numpy
six
spacy

All experiments on the SParC and CoSQL datasets were run on an NVIDIA V100 GPU with 32GB of memory.

  • Tip: a GPU with only 16GB of memory may hit out-of-memory errors.

Setup

The SParC and CoSQL experiments live in two separate folders. Download each dataset from [SParC | CoSQL] into the corresponding {sparc|cosql}/data folder. Other related data files can be downloaded from EditSQL. Then download the SQLite database files from [here] and place them in data/database.

Download the pretrained BERT model from [here] and save it as model/bert/data/annotated_wikisql_and_PyTorch_bert_param/pytorch_model_uncased_L-12_H-768_A-12.bin.

Download the GloVe embeddings file (glove.840B.300d.txt) and set GLOVE_PATH to your own path in all scripts.
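For readers who want to sanity-check the downloaded embeddings before training, here is a minimal sketch of reading a GloVe text file (one token followed by its vector per line). The function name load_glove is hypothetical and this is not the repo's own loader; it only assumes the standard GloVe text layout and the 300 dimensions implied by the file name.

```python
def load_glove(path, dim=300):
    """Read a GloVe-style text file into a {token: list of floats} dict.

    Illustrative helper, not part of this repo.
    """
    embeddings = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            # Split from the right so that a token containing spaces
            # (reported for some large GloVe files) stays in one piece.
            parts = line.rstrip("\n").rsplit(" ", dim)
            if len(parts) != dim + 1:
                continue  # skip lines that do not carry exactly `dim` values
            embeddings[parts[0]] = [float(value) for value in parts[1:]]
    return embeddings
```

Splitting from the right rather than the left is the main design choice here: it keeps the vector length fixed and lets whatever remains on the left be the token.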

Download the reranker models from [SParC reranker | CoSQL reranker] and save them as submit_models/reranker_roberta.pt. The roberta-base model can be downloaded from here and placed in ./[sparc|cosql]/local_param/.
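Since the setup involves several downloads, a quick check for missing files can save a failed run. The sketch below lists which of the paths named above are absent. Treating every path as relative to the dataset folder (sparc/ or cosql/) is an assumption of this sketch, and missing_paths is a hypothetical helper, not a script shipped with the repo.

```python
from pathlib import Path

# Paths copied from the setup steps; their location relative to the
# dataset folder is an assumption, adjust it to your checkout.
EXPECTED = [
    "data",
    "data/database",
    "model/bert/data/annotated_wikisql_and_PyTorch_bert_param/"
    "pytorch_model_uncased_L-12_H-768_A-12.bin",
    "submit_models/reranker_roberta.pt",
    "local_param",
]


def missing_paths(root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]
```

Calling missing_paths("sparc") and missing_paths("cosql") from the repository root should return empty lists once everything is in place, provided the layout assumption holds.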

Usage

Train the model from scratch:

./sparc_train.sh

Test the model on a specific checkpoint:

./sparc_test.sh

The dev prediction file will then appear in the results folder, with a name like save_%d_predictions.json.
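When several checkpoints have been tested, the results folder will hold one prediction file per run. A small sketch for listing them in numeric order follows; it assumes that %d in the file name is an integer checkpoint index. list_prediction_files is a hypothetical helper, not part of the repo.

```python
import re
from pathlib import Path

# Matches names such as save_12_predictions.json and captures the number.
PATTERN = re.compile(r"save_(\d+)_predictions\.json$")


def list_prediction_files(results_dir="results"):
    """Return (number, path) pairs sorted by the number in the file name."""
    found = []
    for path in Path(results_dir).glob("save_*_predictions.json"):
        match = PATTERN.match(path.name)
        if match:  # ignore names whose middle part is not an integer
            found.append((int(match.group(1)), path))
    return sorted(found, key=lambda item: item[0])
```

Sorting on the parsed integer avoids the usual pitfall of lexicographic order, where save_12 would come before save_3.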

Get the evaluation result from the prediction file:

./sparc_evaluate.sh

The final result will appear in the results folder, named *.eval.

The CoSQL experiments can be reproduced in the same way.


You can download our trained checkpoint and results here:

Reranker

If you want to train your own reranker model, you can download the training file from here:

You can then train it, test it, and run prediction:

train:

python -m reranker.main --train --batch_size 64 --epoches 50

test:

python -m reranker.main --test --batch_size 64

predict:

python -m reranker.predict

Improvements

We have improved on the original version (described in the paper) and obtained further performance gains 🥳!

Compared with the original version, we have made the following changes:

  • Added a self-ensemble strategy for prediction, which uses checkpoints from different epochs to produce the final result. To make this strategy easy to apply, we removed the task-related representation from the Reranker module.
  • Removed the decay function in DCRI. We found that DCRI is unstable with the decay function, so DCRI now degenerates into vanilla cross attention.
  • Replaced the BERT-based model with a RoBERTa-based model in the Reranker module.
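To illustrate the general idea of combining predictions from several epoch checkpoints, here is a minimal sketch using per-question majority voting. The README does not say how the checkpoints' outputs are actually combined, so the voting rule, the tie-breaking, and the function name self_ensemble are all assumptions made for illustration, not the repo's method.

```python
from collections import Counter


def self_ensemble(predictions_per_checkpoint):
    """Pick, for each question, the SQL string predicted most often.

    predictions_per_checkpoint: one list of SQL strings per checkpoint,
    all aligned by question. Ties go to the checkpoint listed first.
    Illustrative only; the repo may combine checkpoints differently.
    """
    ensembled = []
    for candidates in zip(*predictions_per_checkpoint):
        counts = Counter(candidates)
        best = max(counts.values())
        # Take the first candidate that reaches the top vote count.
        ensembled.append(next(sql for sql in candidates if counts[sql] == best))
    return ensembled
```

For example, with three checkpoints predicting "SELECT a", "SELECT a", and "SELECT b" for the same question, this sketch keeps "SELECT a".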

The final performance comparison on the dev sets is as follows:

                            SParC          CoSQL
                            QM     IM      QM     IM
EditSQL                     47.2   29.5    39.9   12.3
R²SQL v1 (original paper)   54.1   35.2    45.7   19.5
R²SQL v2 (this repo)        54.0   35.2    46.3   19.5
R²SQL v2 + ensemble         55.1   36.8    47.3   20.9
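Assuming QM and IM stand for question match and interaction match as commonly reported for SParC and CoSQL, the two numbers differ in granularity: QM counts each question turn on its own, while IM counts an interaction as correct only if every turn in it is correct. A minimal sketch of that relationship follows; the function name and the boolean input format are hypothetical and unrelated to the repo's evaluation script.

```python
def question_and_interaction_match(interactions):
    """Compute (QM, IM) as percentages.

    interactions: a non-empty list of non-empty lists of booleans, one
    flag per question turn telling whether the predicted SQL matched.
    """
    total_questions = sum(len(turns) for turns in interactions)
    correct_questions = sum(sum(turns) for turns in interactions)
    # An interaction counts only when all of its turns are correct.
    correct_interactions = sum(all(turns) for turns in interactions)
    qm = 100.0 * correct_questions / total_questions
    im = 100.0 * correct_interactions / len(interactions)
    return qm, im
```

This also shows why IM is always the lower of the two in the table: a single wrong turn still leaves the other turns counted for QM, but removes the whole interaction from IM.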

Citation

Please star this repo and cite the paper if you use it in your work.

Acknowledgments

This implementation is based on "Editing-Based SQL Query Generation for Cross-Domain Context-Dependent Questions" (EMNLP 2019).
