BLLIP / bllip-parser

Licence: other
BLLIP reranking parser (also known as the Charniak-Johnson parser, Charniak parser, or Brown reranking parser). See http://pypi.python.org/pypi/bllipparser/ for the Python module.

BLLIP Reranking Parser

Copyright Mark Johnson, Eugene Charniak, 24th November 2005 --- August 2006

We request acknowledgement in any publications that make use of this software and any code derived from this software. Please report the release date of the software that you are using, as this will enable others to compare their results to yours.

Overview

BLLIP Parser is a statistical natural language parser that includes a generative constituent parser (first stage) and a discriminative maximum entropy reranker (second stage). The latest version can be found on GitHub. This document describes how to build the reranking parser and run it from the command line. There are also Python and Java interfaces; the Python interface is described in README-python.rst.

Compiling the parser

  1. (optional) For optimal speed, you may want to define $GCCFLAGS specifically for your machine. However, this step can be safely skipped as the defaults are usually fine. With csh or tcsh, try something like:

    shell> setenv GCCFLAGS "-march=pentium4 -mfpmath=sse -msse2 -mmmx"
    

    or:

    shell> setenv GCCFLAGS "-march=opteron -m64"
    
  2. Build the parser with:

    shell> make
    
    • Sidenote on compiling on OS X

      OS X uses the clang compiler by default, which cannot currently compile the parser. Before building, try setting this environment variable to change the default C++ compiler:

      shell> setenv CXX g++
      

      Recent versions of OS X may have additional issues. See issues 60, 19, and 13 for more information.
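
The setenv commands above use csh/tcsh syntax. As a minimal sketch for Bourne-style shells such as bash or zsh, the same variables can be set with export before running make. The flag values are copied from the examples above and are not tuned for any particular machine:

```shell
# bash/zsh equivalents of the csh `setenv` examples above.
# Optional: machine-specific compiler flags (values taken from the example above).
export GCCFLAGS="-march=opteron -m64"
# Only needed if the default C++ compiler cannot build the parser (e.g. clang on OS X).
export CXX=g++
```

With these set, run make as in step 2.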

Obtaining parser models

The GitHub repository includes parsing and reranker models, though these are kept mostly for historical reasons. See this page on BLLIP Parser models for information about obtaining newer and more accurate parsing models.

Running the parser

After it has been built, the parser can be run with:

shell> parse.sh <sourcefile.txt>

For example:

shell> parse.sh sample-text/sample-data.txt

The input text must already be segmented into sentences, with each sentence wrapped in an <s> tag:

<s> Sentence 1 </s>
<s> Sentence 2 </s>
...

Note that there needs to be a space before and after the sentence.
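
If your text already has one sentence per line, a small shell function can add the tags. This is only a sketch and not part of the parser distribution: the name wrap_sentences and the file names in the usage note are hypothetical, and it assumes sentence segmentation has already been done elsewhere:

```shell
# Hypothetical helper: read one sentence per line on stdin and write
# "<s> sentence </s>" lines on stdout, with the required surrounding spaces.
wrap_sentences() {
    # 1. drop blank lines, 2-3. trim leading/trailing whitespace, 4. add the tags
    sed -e '/^[[:space:]]*$/d' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e 's/.*/<s> & <\/s>/'
}
```

For example, `wrap_sentences < sentences.txt > sentences.sgml` would produce a file that can then be passed to parse.sh. Trimming whitespace first keeps exactly one space between each tag and the sentence.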

The parser distribution currently includes a basic Penn Treebank Wall Street Journal parsing model, which parse.sh uses by default. The Python interface to the parser includes a mechanism for listing and downloading additional parsing models (some of which are more accurate, depending on what you're parsing).

The script parse-and-fuse.sh demonstrates how to run syntactic parse fusion. Fusion can also be run via the Python bindings.

The script parse-eval.sh takes a list of treebank files as arguments, extracts the terminal strings from them, runs the two-stage parser on those terminal strings, and then evaluates the parsing accuracy with Sparseval. For example, if the Penn Treebank 3 is installed at /usr/local/data/Penn3/, the following command evaluates the two-stage parser on section 24:

shell> parse-eval.sh /usr/local/data/Penn3/parsed/mrg/wsj/24/wsj*.mrg

The Makefile will attempt to automatically download and build Sparseval for you if you run make sparseval.

For more information on Sparseval see this paper:

@inproceedings{roark2006sparseval,
    title={SParseval: Evaluation metrics for parsing speech},
    author={Roark, Brian and Harper, Mary and Charniak, Eugene and
            Dorr, Bonnie and Johnson, Mark and Kahn, Jeremy G and
            Liu, Yang and Ostendorf, Mari and Hale, John and
            Krasnyanskaya, Anna and others},
    booktitle={Proceedings of LREC},
    year={2006}
}

We no longer distribute evalb with the parser since it sometimes skips sentences unnecessarily. Sparseval does not have these issues.

More questions?

There is more information about different components of the parser spread across README files in this distribution (see below). BLLIP Parser is maintained by David McClosky.

Parser details

For details on running the parser, see first-stage/README.rst. For help retraining the parser, see first-stage/TRAIN/README.rst (which also includes some information about the parser model file formats).

Reranker details

See second-stage/README for an overview. second-stage/README-retrain.rst details how to retrain the reranker. The second-stage/programs/*/README files include additional notes about different reranker components.

Other versions of the parser

We haven't tested all of these and can't support them, but they may be useful if you're working on other platforms or languages.
