rhythmcao / text2sql-lgesql

License: Apache-2.0
This project contains the source code and pre-trained models for the ACL 2021 long paper "LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations".

Programming Languages

python
shell

Projects that are alternatives of or similar to text2sql-lgesql

TabularSemanticParsing
Translating natural language questions to a structured query language
Stars: ✭ 148 (+117.65%)
Mutual labels:  semantic-parsing, natural-language-interface, text-to-sql
ContextualSP
Open-source code for multiple papers from the Microsoft Research Asia DKI group
Stars: ✭ 224 (+229.41%)
Mutual labels:  semantic-parsing, text-to-sql
spring
SPRING is a seq2seq model for Text-to-AMR and AMR-to-Text (AAAI2021).
Stars: ✭ 103 (+51.47%)
Mutual labels:  semantic-parsing
cognipy
In-memory Graph Database and Knowledge Graph with Natural Language Interface, compatible with Pandas
Stars: ✭ 31 (-54.41%)
Mutual labels:  natural-language-interface
parse seq2seq
A TensorFlow implementation of a neural sequence-to-sequence parser for converting natural language queries to logical forms.
Stars: ✭ 26 (-61.76%)
Mutual labels:  semantic-parsing
NLIDB
Natural Language Interface to DataBases
Stars: ✭ 100 (+47.06%)
Mutual labels:  natural-language-interface
Gumbel-CRF
Implementation of NeurIPS 20 paper: Latent Template Induction with Gumbel-CRFs
Stars: ✭ 51 (-25%)
Mutual labels:  structured-prediction
Compositional-Generalization-in-Natural-Language-Processing
Compositional Generalization in Natural Language Processing. A roadmap.
Stars: ✭ 26 (-61.76%)
Mutual labels:  semantic-parsing
nl4dv
A python toolkit to create Visualizations (Vis) using natural language (NL) or add an NL interface to existing Vis.
Stars: ✭ 63 (-7.35%)
Mutual labels:  natural-language-interface
sede
Text-to-SQL in the Wild: A Naturally-Occurring Dataset Based on Stack Exchange Data
Stars: ✭ 83 (+22.06%)
Mutual labels:  semantic-parsing
zmsp
The Mingled Structured Predictor
Stars: ✭ 20 (-70.59%)
Mutual labels:  structured-prediction
gap-text2sql
GAP-text2SQL: Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training
Stars: ✭ 83 (+22.06%)
Mutual labels:  semantic-parsing
SIAN
Code and data for ECML-PKDD paper "Social Influence Attentive Neural Network for Friend-Enhanced Recommendation"
Stars: ✭ 25 (-63.24%)
Mutual labels:  heterogeneous-graph-neural-network
semeval22 structured sentiment
SemEval-2022 Shared Task 10: Structured Sentiment Analysis
Stars: ✭ 67 (-1.47%)
Mutual labels:  structured-prediction
r2sql
🌶️ R²SQL: "Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing." (AAAI 2021)
Stars: ✭ 60 (-11.76%)
Mutual labels:  semantic-parsing
Hanlp
Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyphrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, and other natural language processing tasks
Stars: ✭ 24,626 (+36114.71%)
Mutual labels:  semantic-parsing
lang2logic-PyTorch
PyTorch port of the paper "Language to Logical Form with Neural Attention"
Stars: ✭ 34 (-50%)
Mutual labels:  semantic-parsing
weak-supervised-Rule-Text2SQL
Using Database Rule for Weak Supervised Text-to-SQL Generation
Stars: ✭ 13 (-80.88%)
Mutual labels:  semantic-parsing
semantic-parsing-dual
Source code and data for the ACL 2019 long paper "Semantic Parsing with Dual Learning".
Stars: ✭ 17 (-75%)
Mutual labels:  semantic-parsing
ucca-parser
[SemEval'19] Code for "HLT@SUDA at SemEval 2019 Task 1: UCCA Graph Parsing as Constituent Tree Parsing"
Stars: ✭ 18 (-73.53%)
Mutual labels:  semantic-parsing

LGESQL

This project contains the source code for the paper LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations, accepted to the ACL 2021 main conference. If you find it useful, please cite our work:

    @inproceedings{cao-etal-2021-lgesql,
            title = "{LGESQL}: Line Graph Enhanced Text-to-{SQL} Model with Mixed Local and Non-Local Relations",
            author = "Cao, Ruisheng  and
            Chen, Lu  and
            Chen, Zhi  and
            Zhao, Yanbin  and
            Zhu, Su  and
            Yu, Kai",
            booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
            month = aug,
            year = "2021",
            address = "Online",
            publisher = "Association for Computational Linguistics",
            url = "https://aclanthology.org/2021.acl-long.198",
            doi = "10.18653/v1/2021.acl-long.198",
            pages = "2541--2555",
    }

Create environment and download dependencies

The following commands are provided in setup.sh.

  1. First, create the conda environment text2sql:
  • In our experiments, we use torch==1.6.0 and dgl==0.5.3 with CUDA version 10.1.

  • We use one GeForce RTX 2080 Ti for GLOVE and base-series pre-trained language model (PLM) experiments, and one Tesla V100-PCIE-32GB for large-series PLM experiments.

    conda create -n text2sql python=3.6
    source activate text2sql
    pip install torch==1.6.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
    pip install -r requirements.txt
    
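As a quick sanity check (our suggestion, not part of the original setup.sh), you can verify from Python that the pinned versions and the CUDA build are visible:

    # Sanity check: confirm the pinned versions and CUDA availability.
    import torch, dgl
    print(torch.__version__)          # expect 1.6.0+cu101
    print(dgl.__version__)            # expect 0.5.3
    print(torch.cuda.is_available())  # True if the CUDA 10.1 driver is usable
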
  2. Next, download dependencies:

     python -c "import stanza; stanza.download('en')"
     python -c "from embeddings import GloveEmbedding; emb = GloveEmbedding('common_crawl_48', d_emb=300)"
     python -c "import nltk; nltk.download('stopwords')"
    
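To confirm that the Stanza resources were fetched correctly, a one-off test such as the following (our illustration, with a made-up example question) should run without triggering any further downloads:

    import stanza
    # Reuses the 'en' resources downloaded above.
    nlp = stanza.Pipeline('en')
    doc = nlp('Find the names of all singers from France.')
    print([word.lemma for word in doc.sentences[0].words])
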
  3. Download pre-trained language models from the Hugging Face Model Hub, such as bert-large-uncased-whole-word-masking and electra-large-discriminator, into the pretrained_models directory. The vocabulary file for glove.42B.300d is also extracted below. (Please ensure that Git LFS is installed before cloning.)

     mkdir -p pretrained_models && cd pretrained_models
     git lfs install
     git clone https://huggingface.co/bert-large-uncased-whole-word-masking
     git clone https://huggingface.co/google/electra-large-discriminator
     mkdir -p glove.42b.300d && cd glove.42b.300d
     wget -c http://nlp.stanford.edu/data/glove.42B.300d.zip && unzip glove.42B.300d.zip
     awk -v FS=' ' '{print $1}' glove.42B.300d.txt > vocab_glove.txt
    
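Once cloned, the models can be loaded from their local paths instead of the Hub. A minimal sketch using the standard transformers API (assuming transformers is installed via requirements.txt):

    from transformers import AutoTokenizer, AutoModel
    # Load from the local clone rather than downloading from the Hub.
    path = 'pretrained_models/bert-large-uncased-whole-word-masking'
    tokenizer = AutoTokenizer.from_pretrained(path)
    model = AutoModel.from_pretrained(path)
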

Download and preprocess dataset

  1. Download and unzip spider.zip, and rename the extracted folder to the directory data.

  2. Merge data/train_spider.json and data/train_others.json into a single dataset data/train.json (a short merge sketch is given after this list).

  3. Preprocess the train and dev datasets, including input normalization, schema linking, graph construction and output action generation. (Our preprocessed dataset can be downloaded here)

     ./run/run_preprocessing.sh
    
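The merge in step 2 can be done with a few lines of Python. A minimal sketch, assuming both files are plain JSON lists as in the official Spider release:

    import json
    # Concatenate the two training splits into a single file.
    with open('data/train_spider.json') as f:
        examples = json.load(f)
    with open('data/train_others.json') as f:
        examples += json.load(f)
    with open('data/train.json', 'w') as f:
        json.dump(examples, f, indent=4)
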

Training

Training LGESQL models with GLOVE, BERT and ELECTRA respectively:

  • msde: mixed static and dynamic embeddings

  • mmc: multi-head multi-view concatenation

    ./run/run_lgesql_glove.sh [mmc|msde]
    ./run/run_lgesql_plm.sh [mmc|msde] bert-large-uncased-whole-word-masking
    ./run/run_lgesql_plm.sh [mmc|msde] electra-large-discriminator
    
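For example, the following command trains the ELECTRA model with mixed static and dynamic embeddings (the configuration behind the electra-msde-75.1 checkpoint referenced below):

    ./run/run_lgesql_plm.sh msde electra-large-discriminator
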

Evaluation and submission

  1. Create the directory saved_models, and save the trained model and its configuration (at least containing model.bin and params.json) into a new subdirectory under saved_models, e.g. saved_models/electra-msde-75.1/ (see the layout sketch after this list).

  2. For evaluation, see run/run_evaluation.sh and run/run_submission.sh (evaluation from scratch) for reference.

  3. Trained model instances and submission scripts are available on CodaLab (plm) and Google Drive, including the submitted BERT and ELECTRA models. The code and model for GLOVE are deprecated.
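For reference, the directory layout expected in step 1, using the example name above, looks like:

    saved_models/
    └── electra-msde-75.1/
        ├── model.bin
        ├── params.json
        └── ...              # any other configuration files saved during training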

Results

Dev and test exact match accuracy on the official leaderboard, also provided in the results directory:

model              dev acc   test acc
LGESQL + GLOVE     67.6      62.8
LGESQL + BERT      74.1      68.3
LGESQL + ELECTRA   75.1      72.0

Acknowledgements

We would like to thank Tao Yu, Yusen Zhang, and Bo Pang for running evaluations on our submitted models. We are also grateful to the authors of the flexible semantic parser TranX, which inspired our work.
