
princeton-vl / attach-juxtapose-parser

License: BSD-2-Clause
Code for the paper "Strongly Incremental Constituency Parsing with Graph Neural Networks"


Strongly Incremental Constituency Parsing with Graph Neural Networks


Code for the paper:

Strongly Incremental Constituency Parsing with Graph Neural Networks
Kaiyu Yang and Jia Deng
Neural Information Processing Systems (NeurIPS) 2020

@inproceedings{yang2020attachjuxtapose,
  title={Strongly Incremental Constituency Parsing with Graph Neural Networks},
  author={Yang, Kaiyu and Deng, Jia},
  booktitle={Neural Information Processing Systems (NeurIPS)},
  year={2020}
}

Requirements

  1. Make sure your gcc version is at least 5 (gcc --version). I encountered segmentation faults with gcc 4.8.5. But if it works for you, it's probably fine.
  2. Download and install Miniconda Python 3 (Anaconda should also work).
  3. cd into the root of this repo.
  4. Edit parser.yaml according to your system. For example, remove - cudatoolkit=10.2 if you don't have a GPU. Change the version of cudatoolkit if necessary.
  5. Install Python dependencies using conda: conda env create -f parser.yaml && conda activate parser. If you have trouble with steps 4 and 5, you may manually install the packages listed in parser.yaml in whatever way works for you.
  6. Install PyTorch Geometric following its official installation instructions.
  7. Compile the Evalb program used for evaluation: cd EVALB && make && cd ..
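
The gcc check in step 1 can also be scripted. The following is a minimal sketch, not part of this repo: the helper names parse_gcc_version and gcc_is_recent_enough are hypothetical, and the parsing assumes the usual first line printed by gcc --version (for example "gcc (GCC) 4.8.5 ..."). On systems where gcc is an alias for another compiler such as Apple clang, the reported number is not a gcc version and the check is not meaningful.

```python
import re
import subprocess
from typing import Optional, Tuple


def parse_gcc_version(version_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the first line of `gcc --version` output."""
    lines = version_output.splitlines()
    if not lines:
        return None
    # Drop parenthesised vendor strings such as "(Ubuntu 9.4.0-1ubuntu1)".
    first_line = re.sub(r"\([^)]*\)", " ", lines[0])
    match = re.search(r"(\d+)\.(\d+)", first_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def gcc_is_recent_enough(version_output: str, minimum_major: int = 5) -> bool:
    """Return True if the parsed major version is at least `minimum_major`."""
    version = parse_gcc_version(version_output)
    return version is not None and version[0] >= minimum_major


if __name__ == "__main__":
    try:
        result = subprocess.run(
            ["gcc", "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        print("Could not run gcc; is it installed and on PATH?")
    else:
        ok = gcc_is_recent_enough(result.stdout)
        print("gcc version OK" if ok else "gcc looks older than 5")
```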

Data

We include the preprocessed PTB and CTB data in the data directory. No additional data needs to be downloaded. For PTB, we use exactly the same data files as self-attentive-parser. For CTB, the data files are obtained following distance-parser, which is also adopted by HPSG-Neural-Parser. It basically selects a subset of CTB 8.0 that corresponds to CTB 5.1.

Training

Use train.py to train models. By default, python train.py trains the parser on PTB using an XLNet encoder and a graph decoder, and saves training logs and model checkpoints to ./runs/default. We use hydra to manage command-line arguments; see conf/train.yaml for the complete list. Below are some examples:

  • To save results to ./runs/EXPID, where EXPID is an arbitrary experiment identifier:
python train.py exp_id=EXPID
  • To use BERT instead of XLNet:
python train.py model=ptb_bert_graph
  • To train on CTB using Chinese BERT as the encoder:
python train.py dataset=ctb model=ctb_bert_graph
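
When launching several runs from a script, the key=value overrides shown above can be assembled programmatically. This is a minimal sketch under stated assumptions: build_command and format_override are hypothetical helpers, not part of this repo, and only the override keys that appear in the examples above (exp_id, model, dataset) are used. Booleans are written in lowercase to match the style of the commands in this README.

```python
from typing import Dict, List, Union

Value = Union[str, int, float, bool]


def format_override(key: str, value: Value) -> str:
    """Render one key=value override; booleans become true/false."""
    if isinstance(value, bool):
        return f"{key}={'true' if value else 'false'}"
    return f"{key}={value}"


def build_command(
    script: str, overrides: Dict[str, Value], python: str = "python"
) -> List[str]:
    """Build an argv list such as ['python', 'train.py', 'dataset=ctb', ...]."""
    return [python, script] + [format_override(k, v) for k, v in overrides.items()]


if __name__ == "__main__":
    # Reproduces the CTB example above, plus a hypothetical experiment id.
    cmd = build_command(
        "train.py",
        {"exp_id": "my_ctb_run", "dataset": "ctb", "model": "ctb_bert_graph"},
    )
    print(" ".join(cmd))
    # To actually launch the run: subprocess.run(cmd, check=True)
```

Returning a list rather than a single string lets the result be passed directly to subprocess.run without shell quoting.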

Results and Pre-trained Models

We provide hyperparameters, training logs and pre-trained models for reproducing our main results (Table 1 and Table 2 in the paper). In the paper, we ran each experiment 5 times with beam search and reported the mean and its standard error (SEM). The numbers below, by contrast, are from a single run without beam search.
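
For readers who want to aggregate their own repeated runs in the same way, the mean and SEM can be computed as sketched below. This is an illustration only: mean_and_sem is a hypothetical helper, the use of the sample standard deviation (n - 1 in the denominator) is an assumption about how the SEM is defined, and the scores in the example are placeholders, not results from the paper.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_and_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, standard error of the mean) for at least two values.

    SEM = s / sqrt(n), where s is the sample standard deviation.
    """
    if len(values) < 2:
        raise ValueError("need at least two runs to estimate the SEM")
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


if __name__ == "__main__":
    # Placeholder scores standing in for the F1 of five repeated runs.
    placeholder_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
    mean, sem = mean_and_sem(placeholder_scores)
    print(f"mean = {mean:.2f}, SEM = {sem:.2f}")
```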

Constituency parsing on PTB

Model         EM     F1     LP     LR     Hyperparameters       Training log         Pre-trained model
Ours (BERT)   57.41  95.80  96.01  95.59  ptb_bert_graph.yaml   ptb_bert_graph.txt   ptb_bert_graph.pth
Ours (XLNet)  59.48  96.44  96.64  96.24  ptb_xlnet_graph.yaml  ptb_xlnet_graph.txt  ptb_xlnet_graph.pth

Constituency parsing on CTB

Model        EM     F1     LP     LR     Hyperparameters      Training log        Pre-trained model
Ours (BERT)  49.43  93.52  93.66  93.38  ctb_bert_graph.yaml  ctb_bert_graph.txt  ctb_bert_graph.pth
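
In the tables above, LP and LR are labeled precision and labeled recall, and F1 is their harmonic mean. The short sketch below (the function name f1_score is ours, not from this repo) recomputes F1 from the LP and LR columns; because the table entries are rounded to two decimals, agreement is only expected up to rounding.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the same units)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # (LP, LR) pairs copied from the PTB and CTB tables above.
    rows = {
        "PTB, BERT": (96.01, 95.59),
        "PTB, XLNet": (96.64, 96.24),
        "CTB, BERT": (93.66, 93.38),
    }
    for name, (lp, lr) in rows.items():
        print(f"{name}: F1 = {f1_score(lp, lr):.2f}")
```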

Evaluation

To evaluate a model checkpoint on PTB:

python test.py model_path=PATH_TO_MODEL dataset=ptb

PATH_TO_MODEL is the path to the *.pth file generated by the training script or downloaded from our pre-trained models.

To evaluate on CTB:

python test.py model_path=PATH_TO_MODEL dataset=ctb

To evaluate with beam search (set dataset to either ptb or ctb):

python test.py model_path=PATH_TO_MODEL dataset=ptb/ctb beam_size=10

Please refer to conf/test.yaml for the complete list of command-line arguments.

Automatic Mixed Precision (AMP) Support

The evaluation script has amp enabled by default. In our experiments, amp speeds up evaluation on an RTX 2080 Ti or a Quadro RTX 6000, but makes no difference on a GTX 1080 Ti. You may need to disable it when comparing speed against prior work that lacks amp support:

python test.py model_path=PATH_TO_MODEL amp=false

GPU memory

We use a batch size of 150 during evaluation to fit our 11 GB GPU memory. Feel free to change it according to your hardware.

python test.py model_path=PATH_TO_MODEL eval_batch_size=XXX

Parsing User-Provided Texts

You can use the attach-juxtapose parser to parse your own sentences. First, download the spaCy models used for tokenization and POS tagging:

python -m spacy download en_core_web_sm
python -m spacy download zh_core_web_sm

Then, store the sentences in a text file, one sentence per line. See input_examples.txt and input_examples_chinese.txt for examples.
Finally, run the parser from a model checkpoint PATH_TO_MODEL, saving the parse trees to an output file, e.g., output.txt or output_chinese.txt:

python parse.py model_path=PATH_TO_MODEL input=input_examples.txt output=output.txt
python parse.py language=chinese model_path=PATH_TO_MODEL input=input_examples_chinese.txt output=output_chinese.txt
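
If your sentences live in a Python program rather than a file, the one-sentence-per-line input described above can be produced as follows. This is a minimal sketch: write_sentences and the file name my_sentences.txt are hypothetical, and collapsing internal line breaks into spaces is our own choice to guarantee one sentence per line.

```python
from pathlib import Path
from typing import Iterable, Union


def write_sentences(path: Union[str, Path], sentences: Iterable[str]) -> int:
    """Write one sentence per line and return how many were written.

    Whitespace inside a sentence (including newlines) is collapsed to single
    spaces, and empty sentences are skipped.
    """
    lines = []
    for sentence in sentences:
        flat = " ".join(sentence.split())
        if flat:
            lines.append(flat)
    text = "\n".join(lines) + "\n" if lines else ""
    Path(path).write_text(text, encoding="utf-8")
    return len(lines)


if __name__ == "__main__":
    count = write_sentences(
        "my_sentences.txt",
        ["The parser reads one sentence per line.", "Blank entries are skipped.", ""],
    )
    print(f"wrote {count} sentences")
```

The resulting file can then be passed to parse.py through the input= argument shown above.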

Static Type Checking

The codebase uses Python 3 type hints extensively. We use mypy for static type checking. Run mypy to typecheck the entire codebase. mypy.ini is the configuration file for mypy.
