
spoddutur / syntaxnet

Licence: other
Syntaxnet Parsey McParseface wrapper for POS tagging and dependency parsing

Programming Languages

python, C++, Jupyter Notebook, go, CMake, shell

Projects that are alternatives of or similar to syntaxnet

TweebankNLP
[LREC 2022] An off-the-shelf pre-trained Tweet NLP Toolkit (NER, tokenization, lemmatization, POS tagging, dependency parsing) + Tweebank-NER dataset
Stars: ✭ 84 (+9.09%)
Mutual labels:  dependency-parser, pos-tagging
udar
UDAR Does Accented Russian: A finite-state morphological analyzer of Russian that handles stressed wordforms.
Stars: ✭ 15 (-80.52%)
Mutual labels:  dependency-parser, pos-tagging
datalinguist
Stanford CoreNLP in idiomatic Clojure.
Stars: ✭ 93 (+20.78%)
Mutual labels:  dependency-parser, pos-tagging
Hanlp
Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyphrase extraction, automatic summarization, text classification and clustering, Pinyin and Simplified/Traditional Chinese conversion, natural language processing
Stars: ✭ 24,626 (+31881.82%)
Mutual labels:  dependency-parser, pos-tagging
Pyhanlp
Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, new word discovery, keyphrase extraction, automatic summarization, text classification and clustering, Pinyin and Simplified/Traditional Chinese conversion, natural language processing
Stars: ✭ 2,564 (+3229.87%)
Mutual labels:  dependency-parser
dpar
Neural network transition-based dependency parser (in Rust)
Stars: ✭ 41 (-46.75%)
Mutual labels:  dependency-parser
frog
Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.
Stars: ✭ 70 (-9.09%)
Mutual labels:  dependency-parser
syntaxnet-api
A small HTTP API for SyntaxNet
Stars: ✭ 20 (-74.03%)
Mutual labels:  syntaxnet
yap
Yet Another (natural language) Parser
Stars: ✭ 40 (-48.05%)
Mutual labels:  dependency-parser
sinling
A collection of NLP tools for Sinhalese (සිංහල).
Stars: ✭ 38 (-50.65%)
Mutual labels:  pos-tagging
dependency parsing tf
Tensorflow implementation of "A Fast and Accurate Dependency Parser using Neural Networks"
Stars: ✭ 77 (+0%)
Mutual labels:  dependency-parser
Paribhasha
paribhasha.herokuapp.com/
Stars: ✭ 21 (-72.73%)
Mutual labels:  pos-tagging
ipymarkup
NER, syntax markup visualizations
Stars: ✭ 108 (+40.26%)
Mutual labels:  dependency-parser
vk-api
VK SDK | VKontakte wrapper for standalone apps
Stars: ✭ 30 (-61.04%)
Mutual labels:  wrapper-api
SynThai
Thai Word Segmentation and Part-of-Speech Tagging with Deep Learning
Stars: ✭ 41 (-46.75%)
Mutual labels:  pos-tagging
SyntaxNet-Tensorflow
Minimal Tensorflow Docker image with SyntaxNet/DRAGNN based on Alpine linux
Stars: ✭ 35 (-54.55%)
Mutual labels:  syntaxnet
stanford-corenlp-docker
build/run the most current Stanford CoreNLP server in a docker container
Stars: ✭ 38 (-50.65%)
Mutual labels:  dependency-parser
py3cw
Unofficial wrapper for the 3Commas API written in Python
Stars: ✭ 88 (+14.29%)
Mutual labels:  wrapper-api
ruby
Official Ruby client library for IPinfo API (IP geolocation and other types of IP data)
Stars: ✭ 42 (-45.45%)
Mutual labels:  wrapper-api
java-binance-api
Java Binance API Client
Stars: ✭ 72 (-6.49%)
Mutual labels:  wrapper-api

Syntaxnet Parsey McParseface Python Wrapper for Dependency Parsing

Note: This syntaxnet build contains The Great Models Move change.

1. Introduction

When Google announced that "The World's Most Accurate Parser", i.e., SyntaxNet, was going open-source, it grabbed widespread attention from machine-learning developers and researchers interested in core NLU applications like automatic information extraction, translation, etc. The following gif shows how SyntaxNet internally builds the dependency tree:

2. Troubles with the world's best parser, SyntaxNet

Predominantly, one will find two approaches to using SyntaxNet:

  1. Using the demo.sh script provided by SyntaxNet
  2. Invoking the same from Python as a subprocess, as shown below. This approach is obviously inefficient, non-scalable and overkill, as it internally calls other Python scripts.
import os
import subprocess

# demo.sh expects to be run from the syntaxnet models directory
os.chdir(r"../models/syntaxnet")

# Pipe a sentence through demo.sh via the shell (inefficient: spawns a new
# process and reloads the model on every call)
subprocess.call(
    "echo 'Bob brought the pizza to Alice.' | syntaxnet/demo.sh",
    shell=True)
I wanted a proper, scalable Python application where one can simply do `import syntaxnet`
and use it as shown below:

import syntaxnet
from syntaxnet import gen_parser_ops...

I managed to get this done and hence am sharing my project here. Please find below how I got there!!

2.1 The Other Pain Point - SyntaxNet is a RESEARCH MODEL:


  • After The Great Models Move, TensorFlow categorized SyntaxNet as a RESEARCH MODEL.
  • As mentioned here, the TensorFlow team will no longer provide guaranteed support for SyntaxNet, and they encourage individual researchers to maintain research models.

2.2 Salt on the wound:


Apart from a difficult installation and a steep learning curve, the lack of official support and clear documentation left forums full of myriad SyntaxNet issues without proper solutions. Some of them were as basic as:

  • A lot of trouble understanding the documentation around both SyntaxNet and related tools
  • How to use the Parsey McParseface model in a Python application
  • Confusing I/O handling in SyntaxNet because of the uncommon .conll file format it uses for input and output
  • How to use/export the output (ASCII tree or CoNLL) in a format that is easy to parse

3. What does this project do?

This endeavour aims to make the life of SyntaxNet enthusiasts easier. It primarily saves all those hours spent getting Google's SyntaxNet Parsey McParseface up and running the way it should be. For this, I am providing two things as part of this project:

  1. One line (~5mins) SyntaxNet 0.2 installation
  2. Syntaxnet Parsey McParseface wrapper for POS tagging and dependency parsing

3.1 One line (~5mins) SyntaxNet 0.2 installation

I am sharing the OSX SyntaxNet package distribution, i.e., the syntaxnet-0.2-cp27-cp27m-macosx_10_6_intel.whl file, in this git repo. I built it successfully with the bazel build tool, with all tests passing, after pulling the latest code from the SyntaxNet git repository. It will set up SyntaxNet version 0.2 with a simple command in barely 5 minutes, as shown below:

git clone https://github.com/spoddutur/syntaxnet.git
cd <CLONED_SYNTAXNET_PROJ_DIR>
sudo pip install syntaxnet-0.2-cp27-cp27m-macosx_10_6_intel.whl
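
To sanity-check the installation, a quick import in the Python 2.7 interpreter should succeed. This is a minimal sketch that only assumes the `syntaxnet` package and the `gen_parser_ops` module mentioned earlier were installed by the wheel:

# Hypothetical post-install check: the wheel should have put syntaxnet on the path.
import syntaxnet
from syntaxnet import gen_parser_ops
print("SyntaxNet 0.2 imported successfully")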
Tech Stack:

3.2 Syntaxnet Parsey McParseface wrapper for POS tagging and Dependency parsing

Here comes the most interesting (a.k.a challenging) part i.e., How to use syntaxnet in a python application. It should no more be of any trouble after this point :)

my_parser_eval.py is the file that contains the python-wrapper which I implemented to wrap SyntaxNet. The list of API's exposed in this wrapper are listed below:

1. API to initialise the parser:
`tagger = my_parser_eval.SyntaxNetProcess("brain_tagger")`
("brain_tagger" initialises the POS tagger; change it to "brain_parser" for dependency parsing)

2. API to feed input to the parser:
`my_parser_eval._write_input("<YOUR_ENGLISH_SENTENCE_INPUT>")`

3. API to invoke the parser:
`tagger.eval()`

4. API to read the parser's output in CoNLL format:
`my_parser_eval._read_output()`

5. API to pretty-print the parser's output as a tree:
`my_parser_eval.pretty_print()`
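
Putting the above APIs together, a typical call sequence looks roughly like the sketch below. It is a minimal illustration built only from the API names listed above; the exact argument lists in my_parser_eval.py may differ slightly.

import my_parser_eval

# 1. Initialise a POS tagger ("brain_parser" would give a dependency parser instead)
tagger = my_parser_eval.SyntaxNetProcess("brain_tagger")

# 2. Feed the sentence to the parser (intermediate files are dumped under /data)
my_parser_eval._write_input("Bob brought the pizza to Alice.")

# 3. Run the model
tagger.eval()

# 4. Read back the CoNLL-formatted result
print(my_parser_eval._read_output())

# 5. Or pretty-print it as a tree
my_parser_eval.pretty_print()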

4. Demo

  • I wrote main.py (a sample Python program) to demo this wrapper. It performs SyntaxNet's dependency parsing; a rough sketch of its flow is shown below.
  • Input to main.py: an English sentence
  • Output from main.py: its dependency graph tree
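
A plausible shape for such a demo script, chaining the POS tagger and the dependency parser through the wrapper, is sketched below. This is an assumption about main.py's structure based on the APIs above, not a copy of the actual file.

import my_parser_eval

def parse(sentence):
    # Assumed flow: write the input once, run the tagger, then the dependency parser.
    my_parser_eval._write_input(sentence)

    tagger = my_parser_eval.SyntaxNetProcess("brain_tagger")
    tagger.eval()                  # POS tagging pass

    parser = my_parser_eval.SyntaxNetProcess("brain_parser")
    parser.eval()                  # dependency parsing pass

    my_parser_eval.pretty_print()  # print the dependency tree


if __name__ == "__main__":
    parse("Bob brought the pizza to Alice.")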

5. How to run the parser:

1. git clone https://github.com/spoddutur/syntaxnet.git
2. cd <syntaxnet-git-clone-directory>
3. python main.py 
4. That's it!! It prints the SyntaxNet dependency parser output for the given input English sentence.

5.1 Sample output for “Bob brought the pizza to Alice” input
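
(An image of the parse normally appears here. For reference, the standard SyntaxNet demo output for this sentence looks roughly as follows; the wrapper's pretty-printed tree should be equivalent, though formatting details may vary.)

Input: Bob brought the pizza to Alice .
Parse:
brought VBD ROOT
 +-- Bob NNP nsubj
 +-- pizza NN dobj
 |   +-- the DT det
 +-- to IN prep
 |   +-- Alice NNP pobj
 +-- . . punct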

6. Project Structure:

  • /models: Originally cloned from the SyntaxNet git repository https://github.com/tensorflow/models . This folder additionally contains the bazel build “bazel-bin” folder with the needed runfiles.
  • custom_context.pbtxt: Custom context file used to set the context for the parser (an illustrative excerpt follows this list).
  • my_parser_eval.py: Python wrapper for the “brain_tagger” POS tagger and “brain_parser” dependency parser. This file is heavily inspired by the original parser_eval.py that SyntaxNet provides, with quite some modifications and enhancements.
  • main.py: Demo sample usage
  • /data: folder where the parser’s intermediate inputs and outputs are dumped.
  • .whl: OSX package distribution of the final successful SyntaxNet build, which you can use to set up SyntaxNet version 0.2 in barely 5 minutes.
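
For orientation, SyntaxNet context files are task-spec protos in text format; an entry declaring a CoNLL-formatted input typically looks like the excerpt below. The input name and file path are invented for illustration and are not copied from custom_context.pbtxt.

input {
  name: 'custom-conll-input'            # hypothetical input name
  record_format: 'conll-sentence'       # SyntaxNet's CoNLL sentence reader
  Part {
    file_pattern: './data/input.conll'  # illustrative path under /data
  }
}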

7. References:
