
Semantic Parsing with Dual Learning

This repository contains source code and data for the ACL 2019 Long Paper "Semantic Parsing with Dual Learning".

If you use our framework in your work, please cite it as follows:

    @inproceedings{cao-etal-2019-semantic,
        title = "Semantic Parsing with Dual Learning",
        author = "Cao, Ruisheng  and
          Zhu, Su  and
          Liu, Chen  and
          Li, Jieyu  and
          Yu, Kai",
        booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
        month = jul,
        year = "2019",
        address = "Florence, Italy",
        publisher = "Association for Computational Linguistics",
        url = "https://www.aclweb.org/anthology/P19-1007",
        doi = "10.18653/v1/P19-1007",
        pages = "51--64"
    }

Setup

  • First, create the environment

      conda create -n sp python=3.6
      source activate sp
      pip3 install -r requirements.txt
    
  • Second, pull all the dependencies from the remote repository, including the evaluator, lib, and glove6B word embeddings.

      ./pull_dependency.sh
    
  • Third, to construct the vocabulary for each dataset in advance under its corresponding directory in data, run

      python3 utils/statistics.py
    

Dataset


Experiments are conducted on two semantic parsing datasets, ATIS and OVERNIGHT. Each includes the traditional train, dev and test files, plus elaborated lexicon files for the entity mapping and reverse entity mapping techniques, and extra files of synthesized unlabeled logical forms. An additional ontology file is created for ATIS since there is no evaluator available.


ATIS

Files:

  • atis_train.tsv: training dataset, 4433 samples.
  • atis_dev.tsv: validation dataset, 491 samples.
  • atis_test.tsv: test dataset, 448 samples.
  • atis_extra.tsv: synthesized logical forms (Lambda Calculus), 3797 samples.
  • atis_lexicon.txt: each line specifies a one-to-one mapping between a natural language noun phrase and its corresponding entity representation in the knowledge base, such as the pair (first class, first:cl).
  • atis_ontology.txt: specifies all the entity types and unary and binary predicates used in the logical forms.

Attention: Since there is no evaluator for this domain, we provide a simple type consistency checker for the target logical form (utils/domain/atis_evaluator.py). atis_train.tsv, atis_dev.tsv and atis_test.tsv are the preprocessed versions provided by Dong and Lapata (2018), where natural language queries are lowercased and stemmed with NLTK, and entity mentions are replaced by numbered markers. For example:

flight from ci0 to ci1	( lambda $0 e ( and ( flight $0 ) ( from $0 ci0 ) ( to $0 ci1 ) ) )
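
For illustration, here is a minimal sketch of reading such a tab-separated file; the path and helper name below are ours, not the repository's API:

    # Hypothetical sketch: each line holds 'question<TAB>logical form'.
    def load_pairs(path):
        pairs = []
        with open(path, encoding='utf-8') as f:
            for line in f:
                question, logical_form = line.rstrip('\n').split('\t', 1)
                pairs.append((question, logical_form))
        return pairs

    train = load_pairs('data/atis_train.tsv')  # path is an assumption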

OVERNIGHT

It contains eight sub-domains in total, namely basketball, blocks, calendar, housing, publications, recipes, restaurants and socialnetwork.

  • [domain]_train.tsv: training and dev dataset. There is no separate validation set in OVERNIGHT; we follow the traditional 80%/20% (train/dev) split in experiments (see the sketch after this list).
  • [domain]_test.tsv: test dataset.
  • [domain]_extra.tsv: synthesized logical forms (Lambda DCS). We revise the template rules in SEMPRE to generate new instances.
  • [domain]_lexicon.txt: each line specifies a one-to-one mapping between a natural language noun phrase and its corresponding entity representation in the knowledge base, such as the pair (kobe bryant, en.player.kobe_bryant).
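
As a minimal sketch of the 80%/20% split mentioned above (the repository's actual split and seeding logic may differ):

    # Hypothetical sketch of the 80%/20% train/dev split used for OVERNIGHT.
    import random

    def split_train_dev(samples, dev_ratio=0.2, seed=999):
        samples = list(samples)
        random.Random(seed).shuffle(samples)
        n_dev = int(len(samples) * dev_ratio)
        return samples[n_dev:], samples[:n_dev]  # train, dev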

Attention: There is also an evaluator program provided by Jia and Liang (2016) in each domain to obtain denotations (utils/domain/domain_overnight.py). Each sample in [domain]_train.tsv and [domain]_test.tsv is of the form:

what player did not play point guard	( call SW.listValue ( call SW.getProperty ( ( lambda s ( call SW.filter ( var s ) ( string position ) ( string ! = ) en.position.point_guard ) ) ( call SW.domain ( string player ) ) ) ( string player ) ) )

Experiments


Semantic Parsing (Supervised|Pretrain)

Refer to the script run/run_semantic_parsing.sh, for example:

./run/run_semantic_parsing.sh dataset_name [attn|attnptr] labeled

dataset_name must be one of [atis, basketball, blocks, calendar, housing, publications, recipes, restaurants, socialnetwork], and labeled denotes the ratio of labeled examples in the training set to use.


Question Generation (Supervised|Pretrain)

The procedure is similar to that of semantic parsing, since we use a similar model architecture.

./run/run_question_generation.sh dataset_name [attn|attnptr] labeled

Language Model (Unsupervised|Pretrain)

The language model is used to calculate the validity reward during the closed cycles.

./run/run_language_model.sh dataset_name [question|logical_form]
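
As a minimal sketch, the validity reward can be thought of as a length-normalized probability under the pretrained language model; the lm interface below is hypothetical:

    import math

    # Hypothetical sketch: geometric-mean token probability under a language
    # model, used as a surrogate for how "valid" a generated sequence looks.
    def validity_reward(tokens, lm):
        logprobs = lm.token_logprobs(tokens)  # one log-probability per token (assumed API)
        return math.exp(sum(logprobs) / max(len(tokens), 1))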

Pseudo Method (Semi-supervised)

Use the pretrained semantic parsing and question generation models to generate pseudo samples, then shuffle these pseudo samples together with the labeled samples to train improved semantic parsing and question generation models.

./run/run_pseudo_method.sh dataset_name [attn|attnptr] labeled

Attention: in the script run/run_pseudo_method.sh, read_sp_model_path and read_qg_model_path are the paths to the pretrained models (semantic parsing and question generation). labeled and seed should be kept the same for both the pretraining phases and the pseudo method. By default, the model type (attn/attnptr) is the same for both the semantic parsing and question generation models.
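
A minimal sketch of the pseudo-sample construction, with hypothetical model interfaces:

    import random

    # Hypothetical sketch of the pseudo method: label unlabeled data with the
    # pretrained models, then mix the pseudo pairs with the gold labeled pairs.
    def build_training_set(labeled_pairs, unlabeled_questions, unlabeled_lfs,
                           sp_model, qg_model, seed=999):
        pseudo = [(q, sp_model.parse(q)) for q in unlabeled_questions]
        pseudo += [(qg_model.generate(lf), lf) for lf in unlabeled_lfs]
        data = list(labeled_pairs) + pseudo
        random.Random(seed).shuffle(data)
        return data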


Dual Learning (Semi-supervised)

Use the pretrained semantic parsing, question generation and language models to form two closed cycles with different starting points, combining the dual reinforcement learning algorithm with supervised training. Run the script:

./run/run_dual_learning.sh dataset_name [attn|attnptr] labeled

Attention: in the script run/run_dual_learning.sh, read_sp_model_path, read_qg_model_path, read_qlm_path and read_lflm_path are the paths to the pretrained models (semantic parsing, question generation, question language model and logical form language model). labeled and seed should be kept the same for both the pretraining phases and the dual learning framework. By default, the model type (attn/attnptr) is the same for both the semantic parsing and question generation models.
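
As a minimal sketch of one closed cycle starting from a question (a simplification of the paper's method; the model interfaces are hypothetical):

    # Hypothetical sketch of one dual-learning cycle starting from a question.
    # sp_model, qg_model and lf_lm are assumed pretrained; alpha trades off the
    # validity reward (from the logical form language model) against the
    # reconstruction reward (likelihood of recovering the original question).
    def cycle_reward(question, sp_model, qg_model, lf_lm, alpha=0.5):
        logical_form = sp_model.sample(question)           # primal step
        validity = lf_lm.normalized_logprob(logical_form)  # is the LF well-formed?
        reconstruction = qg_model.logprob(question, given=logical_form)
        return alpha * validity + (1 - alpha) * reconstruction, logical_form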
