
salesforce / TabularSemanticParsing

License: BSD-3-Clause
Translating natural language questions to a structured query language

Programming Languages

Jupyter Notebook, Python, Shell

Projects that are alternatives of or similar to TabularSemanticParsing

text2sql-lgesql
This is the project containing the source code and pre-trained models for the ACL 2021 long paper "LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations".
Stars: ✭ 68 (-54.05%)
Mutual labels:  semantic-parsing, natural-language-interface, text-to-sql
ContextualSP
Multiple paper open-source codes of the Microsoft Research Asia DKI group
Stars: ✭ 224 (+51.35%)
Mutual labels:  semantic-parsing, text-to-sql
weak-supervised-Rule-Text2SQL
Using Database Rule for Weak Supervised Text-to-SQL Generation
Stars: ✭ 13 (-91.22%)
Mutual labels:  semantic-parsing
parse seq2seq
A tensorflow implementation of neural sequence-to-sequence parser for converting natural language queries to logical form.
Stars: ✭ 26 (-82.43%)
Mutual labels:  semantic-parsing
spring
SPRING is a seq2seq model for Text-to-AMR and AMR-to-Text (AAAI2021).
Stars: ✭ 103 (-30.41%)
Mutual labels:  semantic-parsing
Question-Answering
Question Answering over Knowledge Bases
Stars: ✭ 24 (-83.78%)
Mutual labels:  semantic-parsing
sparklis
Sparklis is a query builder in natural language that allows people to explore and query SPARQL endpoints with all the power of SPARQL and without any knowledge of SPARQL.
Stars: ✭ 28 (-81.08%)
Mutual labels:  natural-language-interface
nli-go
Natural Language Interface in GO, a semantic parser and execution engine.
Stars: ✭ 20 (-86.49%)
Mutual labels:  natural-language-interface
ucca-parser
[SemEval'19] Code for "HLT@SUDA at SemEval 2019 Task 1: UCCA Graph Parsing as Constituent Tree Parsing"
Stars: ✭ 18 (-87.84%)
Mutual labels:  semantic-parsing
Hanlp
Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyphrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing
Stars: ✭ 24,626 (+16539.19%)
Mutual labels:  semantic-parsing
sede
Text-to-SQL in the Wild: A Naturally-Occurring Dataset Based on Stack Exchange Data
Stars: ✭ 83 (-43.92%)
Mutual labels:  semantic-parsing
Compositional-Generalization-in-Natural-Language-Processing
Compositional Generalization in Natural Language Processing. A roadmap.
Stars: ✭ 26 (-82.43%)
Mutual labels:  semantic-parsing
WikiTableQuestions
A dataset of complex questions on semi-structured Wikipedia tables
Stars: ✭ 81 (-45.27%)
Mutual labels:  semantic-parsing
r2sql
🌶️ R²SQL: "Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing." (AAAI 2021)
Stars: ✭ 60 (-59.46%)
Mutual labels:  semantic-parsing
lang2logic-PyTorch
PyTorch port of the paper "Language to Logical Form with Neural Attention"
Stars: ✭ 34 (-77.03%)
Mutual labels:  semantic-parsing
semantic-parsing-dual
Source code and data for the ACL 2019 long paper "Semantic Parsing with Dual Learning".
Stars: ✭ 17 (-88.51%)
Mutual labels:  semantic-parsing
NLIDB
Natural Language Interface to DataBases
Stars: ✭ 100 (-32.43%)
Mutual labels:  natural-language-interface
cognipy
In-memory Graph Database and Knowledge Graph with Natural Language Interface, compatible with Pandas
Stars: ✭ 31 (-79.05%)
Mutual labels:  natural-language-interface
flowsense
FlowSense: A Natural Language Interface for Visual Data Exploration within a Dataflow System
Stars: ✭ 40 (-72.97%)
Mutual labels:  semantic-parsing
gap-text2sql
GAP-text2SQL: Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training
Stars: ✭ 83 (-43.92%)
Mutual labels:  semantic-parsing

Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing

This is the official code release of the following paper:

Xi Victoria Lin, Richard Socher and Caiming Xiong. Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing. Findings of EMNLP 2020.

Overview

Cross-domain tabular semantic parsing (X-TSP) is the task of predicting an executable structured query for a natural language question issued to some database. The model may or may not have seen the target database during training.
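Concretely, an X-TSP instance pairs a question and a database schema with an executable query. The example below is hypothetical (not drawn from Spider or WikiSQL), purely to illustrate the input/output shape:

```python
# Illustrative X-TSP example (hypothetical; not from either benchmark).
example = {
    "question": "How many singers are older than 30?",
    "schema": {"singer": ["singer_id", "name", "age", "country"]},
    "sql": "SELECT COUNT(*) FROM singer WHERE age > 30",
}

# The cross-domain twist: at test time the target database (tables,
# columns, values) may never have been seen during training.
print(example["sql"])
```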

This library implements

  • A strong sequence-to-sequence based cross-domain text-to-SQL semantic parser that achieved state-of-the-art performance on two widely used benchmark datasets: Spider and WikiSQL.
  • A set of SQL processing tools for parsing, tokenizing and validating SQL queries, adapted from the Moz SQL Parser.
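As a rough illustration of what such tooling does, here is a minimal tokenize-then-validate sketch. This is not the repository's Moz-derived parser or its API; a real checker parses the full SQL grammar, whereas this only checks for a SELECT ... FROM skeleton and balanced parentheses:

```python
import re

# Matches identifiers, numbers, quoted strings, and common punctuation.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+(?:\.\d+)?|'[^']*'|[(),.*=<>]")

def tokenize(sql: str) -> list:
    """Split a SQL string into surface tokens."""
    return TOKEN_RE.findall(sql)

def looks_valid(sql: str) -> bool:
    """Shallow sanity check, far weaker than a grammar-based validator."""
    tokens = [t.upper() for t in tokenize(sql)]
    return ("SELECT" in tokens and "FROM" in tokens
            and sql.count("(") == sql.count(")"))

print(tokenize("SELECT name FROM singer WHERE age > 30"))
print(looks_valid("SELECT name FROM singer"))
```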

The parser can be adapted to learn mappings from text to other structured query languages such as SOQL by modifying the formal language pre-processing and post-processing modules.

Model

BRIDGE architecture

Our model takes a natural language utterance and a database (schema + field picklists) as input, and generates SQL queries as token sequences. We apply schema-guided decoding and post-processing to make sure the final output is executable.

  • Preprocessing: We concatenate the serialized database schema with the utterance to form a tagged sequence. A fuzzy string matching algorithm is used to identify picklist items mentioned in the utterance. The mentioned picklist items are appended to the corresponding field name in the tagged sequence.
  • Translating: The hybrid sequence is passed through the BRIDGE model, which outputs raw program sequences with probability scores via beam search.
  • Postprocessing: The raw program sequences are passed through a SQL checker, which verifies their syntactic correctness and schema consistency. Sequences that fail the checker are discarded from the output.
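The preprocessing step can be sketched as follows. This is a simplified illustration only: the actual BRIDGE implementation uses its own fuzzy matcher and BERT-style special tokens, and the [T]/[C]/[V] tags here are stand-ins we invent for table, column, and matched picklist value:

```python
import difflib

def fuzzy_matches(utterance: str, picklist: list, cutoff: float = 0.8) -> list:
    """Return picklist values approximately mentioned in the utterance."""
    words = utterance.lower().split()
    return [v for v in picklist
            if difflib.get_close_matches(v.lower(), words, n=1, cutoff=cutoff)]

def serialize(utterance: str, schema: dict, picklists: dict) -> str:
    """Concatenate the utterance with the serialized schema; append
    mentioned picklist values after their field names."""
    parts = [utterance]
    for table, columns in schema.items():
        parts.append(f"[T] {table}")
        for col in columns:
            parts.append(f"[C] {col}")
            for v in fuzzy_matches(utterance, picklists.get((table, col), [])):
                parts.append(f"[V] {v}")
    return " ".join(parts)

seq = serialize(
    "How many singers are from France?",
    {"singer": ["name", "country"]},
    {("singer", "country"): ["France", "Japan"]},
)
print(seq)
```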

Quick Start

Install Dependencies

Our implementation has been tested using PyTorch 1.7 and CUDA 11.0 with a single GPU.

git clone https://github.com/salesforce/TabularSemanticParsing
cd TabularSemanticParsing

pip install torch torchvision
python3 -m pip install -r requirements.txt

Set up Environment

export PYTHONPATH=`pwd` && python -m nltk.downloader punkt

Process Data

Spider

Download the official data release and unzip the folder. Manually merge spider/train_spider.json with spider/train_others.json into a single file spider/train.json.
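The manual merge is a plain JSON-array concatenation. Assuming both files are JSON arrays of training examples, a helper like this does it (demonstrated on throwaway files so the snippet runs anywhere):

```python
import json, os, tempfile

def merge_json_arrays(paths, out_path):
    """Concatenate several JSON-array files into one, as for
    spider/train_spider.json + spider/train_others.json -> spider/train.json."""
    merged = []
    for p in paths:
        with open(p) as f:
            merged += json.load(f)
    with open(out_path, "w") as f:
        json.dump(merged, f)
    return len(merged)

# Demo with temporary files standing in for the Spider splits.
d = tempfile.mkdtemp()
a, b, out = (os.path.join(d, n) for n in ("a.json", "b.json", "train.json"))
json.dump([{"q": 1}], open(a, "w"))
json.dump([{"q": 2}, {"q": 3}], open(b, "w"))
print(merge_json_arrays([a, b], out))  # number of merged examples
```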

mv spider data/ 

# Data Repair (more details in section 4.3 of paper)
python3 data/spider/scripts/amend_missing_foreign_keys.py data/spider

./experiment-bridge.sh configs/bridge/spider-bridge-bert-large.sh --process_data 0

WikiSQL

Download the official data release.

wget https://github.com/salesforce/WikiSQL/raw/master/data.tar.bz2
tar xf data.tar.bz2 -C data && mv data/data data/wikisql1.1
./experiment-bridge.sh configs/bridge/wikisql-bridge-bert-large.sh --process_data 0

The processed data will be stored in a separate pickle file.

Train

Train the model using the following commands. The checkpoint of the best model will be stored in a directory specified by the hyperparameters in the configuration file.

Spider

./experiment-bridge.sh configs/bridge/spider-bridge-bert-large.sh --train 0

WikiSQL

./experiment-bridge.sh configs/bridge/wikisql-bridge-bert-large.sh --train 0

Inference

Decode SQL predictions from pre-trained models. The following commands run inference with the checkpoints stored in the directory specified by the hyperparameters in the configuration file.

Spider

./experiment-bridge.sh configs/bridge/spider-bridge-bert-large.sh --inference 0

WikiSQL

./experiment-bridge.sh configs/bridge/wikisql-bridge-bert-large.sh --inference 0

Note:

  1. Add the --test flag to the above commands to obtain the test set evaluation results on the corresponding dataset. This flag is invalid for Spider, as its test set is hidden.
  2. Add the --checkpoint_path [path_to_checkpoint_tar_file] flag to decode using a checkpoint that's not stored in the default location.
  3. Evaluation metrics will be printed out at the end of decoding. The WikiSQL evaluation takes some time because it computes execution accuracy.
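Execution accuracy compares the result sets of the predicted and gold queries rather than their surface forms, which is why it requires actually running every query. A minimal sketch of the idea (not the official WikiSQL evaluator):

```python
import sqlite3

def execution_match(db: sqlite3.Connection, pred_sql: str, gold_sql: str) -> bool:
    """A prediction counts as correct when it returns the same rows as the
    gold query, even if the SQL text differs."""
    try:
        pred = db.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False  # unexecutable predictions are wrong
    gold = db.execute(gold_sql).fetchall()
    return sorted(pred) == sorted(gold)

# Tiny in-memory database for demonstration.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE singer (name TEXT, age INT)")
db.executemany("INSERT INTO singer VALUES (?, ?)",
               [("Ann", 25), ("Bob", 41), ("Cleo", 33)])

# Textually different queries, identical results -> counted as correct.
print(execution_match(db,
                      "SELECT name FROM singer WHERE age > 30",
                      "SELECT name FROM singer WHERE NOT age <= 30"))
```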

Inference with Model Ensemble

To decode with model ensemble, first list the checkpoint directories of the individual models in the ensemble model configuration file, then run the following command(s).

Spider

./experiment-bridge.sh configs/bridge/spider-bridge-bert-large.sh --ensemble_inference 0

Commandline Demo

You can interact with a pre-trained checkpoint through the commandline using the following commands:

Spider

./experiment-bridge.sh configs/bridge/spider-bridge-bert-large.sh --demo 0 --demo_db [db_name] --checkpoint_path [path_to_checkpoint_tar_file]

Hyperparameter Changes

To change the hyperparameters and other experiment setup, start from the configuration files.

Pre-trained Checkpoints

Spider

Download pre-trained checkpoints here:

URL: https://drive.google.com/file/d/1dlrUdGMLvvvfR3kNVy76H12rR7gr4DXI/view?usp=sharing
E-SM (exact set match): 70.1    EXE (execution accuracy): 68.2
mv bridge-spider-bert-large-ems-70-1-exe-68-2.tar.gz model
gunzip model/bridge-spider-bert-large-ems-70-1-exe-68-2.tar.gz

Download cached SQL execution order to normal order mappings:

URL
https://drive.google.com/file/d/1vk14iR4V_f5x4e17MAaL_L8T9wgjcKCy/view?usp=sharing

Why this cache? Converting thousands of SQL queries from execution order to normal order carries a large overhead, so we cached the conversions for the Spider dev set in our experiments. Without the cache, inference on the dev set will be slow; the model still runs fast on individual queries without it.

mv dev.eo.pred.restored.pkl.gz data/spider
gunzip data/spider/dev.eo.pred.restored.pkl.gz
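The cache is a gzipped pickle. Assuming it maps execution-order SQL strings to their normal-order forms (inspect the unzipped .pkl to confirm the actual layout), the save/lookup pattern is roughly:

```python
import gzip, os, pickle, tempfile

def save_cache(cache: dict, path: str) -> None:
    """Persist a conversion cache as a gzipped pickle."""
    with gzip.open(path, "wb") as f:
        pickle.dump(cache, f)

def load_cache(path: str) -> dict:
    """Load a gzipped-pickle conversion cache."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)

# Demo with a throwaway, hypothetical execution-order/normal-order pair.
path = os.path.join(tempfile.mkdtemp(), "demo.pkl.gz")
save_cache({"FROM singer SELECT name": "SELECT name FROM singer"}, path)
cache = load_cache(path)
print(cache["FROM singer SELECT name"])
```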

Citation

If you find the resource in this repository helpful, please cite

@inproceedings{LinRX2020:BRIDGE, 
  author = {Xi Victoria Lin and Richard Socher and Caiming Xiong}, 
  title = {Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing}, 
  booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural
               Language Processing: Findings, {EMNLP} 2020, November 16-20, 2020},
  year = {2020} 
}

Related Links

The parser has been integrated into the Photon web demo: http://naturalsql.com/. Please visit our website to test it live and try it on your own databases!
