rhythmcao / text2sql-lgesql

License: Apache-2.0
This project contains the source code and pre-trained models for the ACL 2021 long paper "LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations".

Programming Languages

python
shell

Projects that are alternatives of or similar to text2sql-lgesql

TabularSemanticParsing
Translating natural language questions to a structured query language
Stars: ✭ 148 (+117.65%)
Mutual labels:  semantic-parsing, natural-language-interface, text-to-sql
ContextualSP
Open-source code for multiple papers from the Microsoft Research Asia DKI group
Stars: ✭ 224 (+229.41%)
Mutual labels:  semantic-parsing, text-to-sql
spring
SPRING is a seq2seq model for Text-to-AMR and AMR-to-Text (AAAI2021).
Stars: ✭ 103 (+51.47%)
Mutual labels:  semantic-parsing
cognipy
In-memory Graph Database and Knowledge Graph with Natural Language Interface, compatible with Pandas
Stars: ✭ 31 (-54.41%)
Mutual labels:  natural-language-interface
parse seq2seq
A TensorFlow implementation of a neural sequence-to-sequence parser for converting natural language queries to logical forms.
Stars: ✭ 26 (-61.76%)
Mutual labels:  semantic-parsing
NLIDB
Natural Language Interface to DataBases
Stars: ✭ 100 (+47.06%)
Mutual labels:  natural-language-interface
Gumbel-CRF
Implementation of NeurIPS 20 paper: Latent Template Induction with Gumbel-CRFs
Stars: ✭ 51 (-25%)
Mutual labels:  structured-prediction
Compositional-Generalization-in-Natural-Language-Processing
Compositional Generalization in Natural Language Processing. A roadmap.
Stars: ✭ 26 (-61.76%)
Mutual labels:  semantic-parsing
nl4dv
A python toolkit to create Visualizations (Vis) using natural language (NL) or add an NL interface to existing Vis.
Stars: ✭ 63 (-7.35%)
Mutual labels:  natural-language-interface
sede
Text-to-SQL in the Wild: A Naturally-Occurring Dataset Based on Stack Exchange Data
Stars: ✭ 83 (+22.06%)
Mutual labels:  semantic-parsing
zmsp
The Mingled Structured Predictor
Stars: ✭ 20 (-70.59%)
Mutual labels:  structured-prediction
gap-text2sql
GAP-text2SQL: Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training
Stars: ✭ 83 (+22.06%)
Mutual labels:  semantic-parsing
SIAN
Code and data for ECML-PKDD paper "Social Influence Attentive Neural Network for Friend-Enhanced Recommendation"
Stars: ✭ 25 (-63.24%)
Mutual labels:  heterogeneous-graph-neural-network
semeval22 structured sentiment
SemEval-2022 Shared Task 10: Structured Sentiment Analysis
Stars: ✭ 67 (-1.47%)
Mutual labels:  structured-prediction
r2sql
🌶️ R²SQL: "Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing." (AAAI 2021)
Stars: ✭ 60 (-11.76%)
Mutual labels:  semantic-parsing
Hanlp
Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyphrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, and other natural language processing tasks
Stars: ✭ 24,626 (+36114.71%)
Mutual labels:  semantic-parsing
lang2logic-PyTorch
PyTorch port of the paper "Language to Logical Form with Neural Attention"
Stars: ✭ 34 (-50%)
Mutual labels:  semantic-parsing
weak-supervised-Rule-Text2SQL
Using Database Rule for Weak Supervised Text-to-SQL Generation
Stars: ✭ 13 (-80.88%)
Mutual labels:  semantic-parsing
semantic-parsing-dual
Source code and data for the ACL 2019 long paper "Semantic Parsing with Dual Learning".
Stars: ✭ 17 (-75%)
Mutual labels:  semantic-parsing
ucca-parser
[SemEval'19] Code for "HLT@SUDA at SemEval 2019 Task 1: UCCA Graph Parsing as Constituent Tree Parsing"
Stars: ✭ 18 (-73.53%)
Mutual labels:  semantic-parsing

LGESQL

This project contains the source code for the paper LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations, accepted to the ACL 2021 main conference. If you find it useful, please cite our work:

    @inproceedings{cao-etal-2021-lgesql,
            title = "{LGESQL}: Line Graph Enhanced Text-to-{SQL} Model with Mixed Local and Non-Local Relations",
            author = "Cao, Ruisheng  and
            Chen, Lu  and
            Chen, Zhi  and
            Zhao, Yanbin  and
            Zhu, Su  and
            Yu, Kai",
            booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
            month = aug,
            year = "2021",
            address = "Online",
            publisher = "Association for Computational Linguistics",
            url = "https://aclanthology.org/2021.acl-long.198",
            doi = "10.18653/v1/2021.acl-long.198",
            pages = "2541--2555",
    }

Create environment and download dependencies

The following commands are provided in setup.sh.

  1. First, create the conda environment text2sql:
  • In our experiments, we use torch==1.6.0 and dgl==0.5.3 with CUDA version 10.1.

  • We use one GeForce RTX 2080 Ti for GLOVE and base-series pre-trained language model (PLM) experiments, and one Tesla V100-PCIE-32GB for large-series PLM experiments.

    conda create -n text2sql python=3.6
    source activate text2sql
    pip install torch==1.6.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
    pip install -r requirements.txt
    
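As a quick sanity check (our suggestion, not part of the original setup.sh), you can verify from Python that the pinned versions and the CUDA build are visible:

    # Sanity check: confirm the pinned versions and CUDA availability.
    import torch, dgl
    print(torch.__version__)          # expect 1.6.0+cu101
    print(dgl.__version__)            # expect 0.5.3
    print(torch.cuda.is_available())  # True if the CUDA 10.1 driver is usable
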
  2. Next, download dependencies:

     python -c "import stanza; stanza.download('en')"
     python -c "from embeddings import GloveEmbedding; emb = GloveEmbedding('common_crawl_48', d_emb=300)"
     python -c "import nltk; nltk.download('stopwords')"
    
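To confirm that the Stanza resources were fetched correctly, a one-off test such as the following (our illustration, with a made-up example question) should run without triggering any further downloads:

    import stanza
    # Reuses the 'en' resources downloaded above.
    nlp = stanza.Pipeline('en')
    doc = nlp('Find the names of all singers from France.')
    print([word.lemma for word in doc.sentences[0].words])
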
  3. Download pre-trained language models from the Hugging Face Model Hub, such as bert-large-uncased-whole-word-masking and electra-large-discriminator, into the pretrained_models directory. The vocabulary file for glove.42B.300d is also extracted below. (Please ensure that Git LFS is installed before cloning.)

     mkdir -p pretrained_models && cd pretrained_models
     git lfs install
     git clone https://huggingface.co/bert-large-uncased-whole-word-masking
     git clone https://huggingface.co/google/electra-large-discriminator
     mkdir -p glove.42b.300d && cd glove.42b.300d
     wget -c http://nlp.stanford.edu/data/glove.42B.300d.zip && unzip glove.42B.300d.zip
     awk -v FS=' ' '{print $1}' glove.42B.300d.txt > vocab_glove.txt
    
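Once cloned, the models can be loaded from their local paths instead of the Hub. A minimal sketch using the standard transformers API (assuming transformers is installed via requirements.txt):

    from transformers import AutoTokenizer, AutoModel
    # Load from the local clone rather than downloading from the Hub.
    path = 'pretrained_models/bert-large-uncased-whole-word-masking'
    tokenizer = AutoTokenizer.from_pretrained(path)
    model = AutoModel.from_pretrained(path)
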

Download and preprocess dataset

  1. Download and unzip spider.zip, and rename the extracted folder to the directory data.

  2. Merge data/train_spider.json and data/train_others.json into a single dataset data/train.json (a short merge sketch is given after this list).

  3. Preprocess the train and dev datasets, including input normalization, schema linking, graph construction and output action generation. (Our preprocessed dataset can be downloaded here)

     ./run/run_preprocessing.sh
    
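The merge in step 2 can be done with a few lines of Python. A minimal sketch, assuming both files are plain JSON lists as in the official Spider release:

    import json
    # Concatenate the two training splits into a single file.
    with open('data/train_spider.json') as f:
        examples = json.load(f)
    with open('data/train_others.json') as f:
        examples += json.load(f)
    with open('data/train.json', 'w') as f:
        json.dump(examples, f, indent=4)
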

Training

Training LGESQL models with GLOVE, BERT and ELECTRA respectively:

  • msde: mixed static and dynamic embeddings

  • mmc: multi-head multi-view concatenation

    ./run/run_lgesql_glove.sh [mmc|msde]
    ./run/run_lgesql_plm.sh [mmc|msde] bert-large-uncased-whole-word-masking
    ./run/run_lgesql_plm.sh [mmc|msde] electra-large-discriminator
    
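For example, the following command trains the ELECTRA model with mixed static and dynamic embeddings (the configuration behind the electra-msde-75.1 checkpoint referenced below):

    ./run/run_lgesql_plm.sh msde electra-large-discriminator
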

Evaluation and submission

  1. Create the directory saved_models, and save the trained model and its configuration (at least containing model.bin and params.json) into a new subdirectory under saved_models, e.g. saved_models/electra-msde-75.1/ (see the layout sketch after this list).

  2. For evaluation, see run/run_evaluation.sh and run/run_submission.sh (evaluation from scratch) for reference.

  3. Trained model instances and submission scripts are available on CodaLab (plm) and Google Drive, including the submitted BERT and ELECTRA models. The code and model for GLOVE are deprecated.
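For reference, the directory layout expected in step 1, using the example name above, looks like:

    saved_models/
    └── electra-msde-75.1/
        ├── model.bin
        ├── params.json
        └── ...              # any other configuration files saved during training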

Results

Dev and test exact match accuracy on the official leaderboard, also provided in the results directory:

model              dev acc   test acc
LGESQL + GLOVE     67.6      62.8
LGESQL + BERT      74.1      68.3
LGESQL + ELECTRA   75.1      72.0

Acknowledgements

We would like to thank Tao Yu, Yusen Zhang, and Bo Pang for running evaluations on our submitted models. We are also grateful to the authors of the flexible semantic parser TranX, which inspired our work.
