jkkummerfeld / berkeley-parser-analyser

Licence: ISC license
A tool for classifying mistakes in the output of parsers

This software classifies mistakes in the output of parsers. For a full description of the method, and discussion of results when applied to a range of well known parsers, see:

Parser Showdown at the Wall Street Corral: An Empirical Investigation of Error Types in Parser Output, Jonathan K. Kummerfeld, David Hall, James R. Curran, and Dan Klein, EMNLP 2012

An Empirical Examination of Challenges in Chinese Parsing, Jonathan K. Kummerfeld, Daniel Tse, James R. Curran, and Dan Klein, ACL (short) 2013

To use the system, download it from https://github.com/jkkummerfeld/berkeley-parser-analyser and run it as described under Running the System.

If you use my code in your own work, please cite the following papers (for English and Chinese respectively):

@InProceedings{Kummerfeld-etal:2012:EMNLP,
  author    = {Jonathan K. Kummerfeld  and  David Hall  and  James R. Curran  and  Dan Klein},
  title     = {Parser Showdown at the Wall Street Corral: An Empirical Investigation of Error Types in Parser Output},
  booktitle = {Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning},
  address   = {Jeju Island, South Korea},
  month     = {July},
  year      = {2012},
  pages     = {1048--1059},
  software  = {https://github.com/jkkummerfeld/berkeley-parser-analyser},
  url       = {http://www.aclweb.org/anthology/D12-1096},
}

@InProceedings{Kummerfeld-etal:2013:ACL,
  author    = {Jonathan K. Kummerfeld  and  Daniel Tse  and  James R. Curran  and  Dan Klein},
  title     = {An Empirical Examination of Challenges in Chinese Parsing},
  booktitle = {Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
  address   = {Sofia, Bulgaria},
  month     = {August},
  year      = {2013},
  pages     = {98--103},
  software  = {https://github.com/jkkummerfeld/berkeley-parser-analyser},
  url       = {http://www.aclweb.org/anthology/P13-2018},
}

Here is an example of system output (red brackets are extra, blue are missing and yellow are crossing):

Image of system terminal output

If you find a bug please submit an issue, and if you have a question please contact me. I am not actively working on this project anymore, but will try to respond to feedback when possible.

Running the System

There are four main programs:

  • classify_english.py, Classify errors in English output
  • classify_chinese.py, Classify errors in Chinese output
  • print_coloured_errors.py, Print errors using colour in a plain text format (red for extra brackets, blue for missing brackets, yellow for crossing brackets, and white for correct brackets)
  • reprint_trees.py, Reprint a set of trees in a different format (e.g. single line or multiline, plain text or LaTeX). Edits such as removing traces can also be applied

Running each with no arguments will provide help information. Here are some example commands using the provided sample data:

English errors:
./berkeley_parse_analyser/classify_english.py sample_data/wsj01.mrg sample_data/berkeley.mrg classified.english.berkeley

Coloured errors:
./berkeley_parse_analyser/print_coloured_errors.py sample_data/wsj01.mrg sample_data/berkeley.mrg coloured_errors.english.berkeley

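For the error analysis runs, the files produced are listed below. Each name starts with the output prefix given as the last argument (shown here as classified.berkeley):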

  • classified.berkeley.error_counts - The errors, their occurrence, and the number of brackets attributed to them (frequency first, then number of brackets attributed)
  • classified.berkeley.init_errors - A pretty-print presentation of the initial errors (red indicates extra spans, blue indicates missing spans, and yellow indicates missing spans that cross current spans)
  • classified.berkeley.out - The complete output of the classification, including each step in each path
  • classified.berkeley.log - A log of system notes
  • classified.berkeley.test_trees - The test trees
  • classified.berkeley.gold_trees - The gold trees
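
After a run it can be useful to confirm that every file in the list above was written. The following is a minimal sketch, not part of the tool: the function names are hypothetical, and it assumes each output is named as the prefix, a dot, and one of the six suffixes listed above.

```python
from pathlib import Path

# Suffixes copied from the list of output files above.
OUTPUT_SUFFIXES = [
    "error_counts",
    "init_errors",
    "out",
    "log",
    "test_trees",
    "gold_trees",
]


def expected_outputs(prefix):
    """Return the paths a classification run is expected to produce.

    Assumes files are named "<prefix>.<suffix>" (hypothetical helper).
    """
    return [Path(f"{prefix}.{suffix}") for suffix in OUTPUT_SUFFIXES]


def missing_outputs(prefix):
    """Return the expected paths that do not exist on disk."""
    return [path for path in expected_outputs(prefix) if not path.exists()]


if __name__ == "__main__":
    # Report any expected file that was not written by the run.
    for path in missing_outputs("classified.berkeley"):
        print(f"missing: {path}")
```

If a different prefix was passed on the command line, pass that same prefix to missing_outputs.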

For the coloured output it can help to view the files as follows (with -x3 to avoid the trees getting too wide):

less -x3 <filename>

Questions?

Q: How can I view the output files?

All of the output files are plain text. View their contents with tools like less, nano, or vim:

less <filename>

Q: How can I make a bar figure like in the papers?

In LaTeX, set these two lengths:

\setlength\fboxsep{0mm}
\setlength\fboxrule{0.05mm}

Then write this for each box (it makes a thick horizontal rule inside a frame box):

\framebox[8mm][l]{\rule{1.3mm}{2mm}}

Defining new commands can make it easier to create a whole lot of boxes:

\newcommand{\mybarheight}{2mm}
\newcommand{\myboxwidth}{8mm}
\newcommand{\mybar}[1]{\framebox[\myboxwidth][l]{\rule{#1mm}{\mybarheight}}}

And then write:

\mybar{1.30}
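
When there are many bars, the \mybar lines can be generated rather than typed by hand. The sketch below is one possible way to do it and is not how the figures in the papers were necessarily produced: the function names are hypothetical, and the linear scaling (the largest value fills the 8mm box) is an assumption.

```python
def bar_width_mm(value, max_value, box_width_mm=8.0):
    """Scale value linearly so that max_value fills the whole box."""
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    # Clamp to [0, max_value] so a bar never extends past its frame.
    clamped = max(0.0, min(value, max_value))
    return clamped / max_value * box_width_mm


def mybar(value, max_value, box_width_mm=8.0):
    """Return a LaTeX \\mybar{...} call with the width in millimetres."""
    width = bar_width_mm(value, max_value, box_width_mm)
    return f"\\mybar{{{width:.2f}}}"


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for value in [1.3, 4.0, 8.0]:
        print(mybar(value, max_value=8.0))
```

The default box_width_mm matches the \myboxwidth of 8mm defined above. Change both together if the box size changes.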

Q: What do I need to do to see the colours in the output files?

If you are not seeing colours when you look at the output files (e.g. by running less <filename>), it may be because the raw ANSI escape codes in the files are not being passed through to your terminal. For less, this can be changed by passing the -R flag (or -r).
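
If you would rather open the files in an editor that shows the escape codes as literal characters, the codes can be removed instead. This is a minimal sketch, assuming the colours are written as standard ANSI SGR sequences (an escape character, "[", digits and semicolons, then "m"). The function name is hypothetical and not part of the tool.

```python
import re

# Matches ANSI SGR (colour/style) sequences such as "\x1b[31m" or "\x1b[0m".
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def strip_ansi(text):
    """Return text with ANSI colour sequences removed."""
    return ANSI_SGR.sub("", text)


if __name__ == "__main__":
    # A hypothetical coloured bracket, for illustration only.
    coloured = "\x1b[31m(NP\x1b[0m the cat)"
    print(strip_ansi(coloured))
```

Note that stripping the codes also removes the distinction between extra, missing, and crossing brackets, so this is only useful when the bracket text is what matters.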

Q: Other questions?

Either open an issue or contact me! See www.jkk.name for my contact info.
