
juntaoy / biaffine-ner

Licence: Apache-2.0 license
Named Entity Recognition as Dependency Parsing

Programming Languages

python
139335 projects - #7 most used programming language
shell
77523 projects

Projects that are alternatives of or similar to biaffine-ner

Dan Jurafsky Chris Manning Nlp
My solution to the Natural Language Processing course made by Dan Jurafsky, Chris Manning in Winter 2012.
Stars: ✭ 124 (-57.68%)
Mutual labels:  parsing, ner
Vncorenlp
A Vietnamese natural language processing toolkit (NAACL 2018)
Stars: ✭ 354 (+20.82%)
Mutual labels:  parsing, ner
re-typescript
An opinionated attempt at finally solving typescript interop for ReasonML / OCaml.
Stars: ✭ 68 (-76.79%)
Mutual labels:  parsing
pypact
A Python package for parsing FISPACT-II output
Stars: ✭ 19 (-93.52%)
Mutual labels:  parsing
pysub-parser
Library for extracting text and timestamps from multiple subtitle files (.ass, .ssa, .srt, .sub, .txt).
Stars: ✭ 40 (-86.35%)
Mutual labels:  parsing
wasmbin
A self-generating WebAssembly parser & serializer in Rust.
Stars: ✭ 40 (-86.35%)
Mutual labels:  parsing
Ramble
An R parser based on combinatory parsers.
Stars: ✭ 19 (-93.52%)
Mutual labels:  parsing
microformats-ruby
Ruby gem that parses HTML containing microformats/microformats2 and returns Ruby objects, a Ruby hash or a JSON hash
Stars: ✭ 89 (-69.62%)
Mutual labels:  parsing
php-fast-xml-parser
Fast SAX XML parser for PHP.
Stars: ✭ 25 (-91.47%)
Mutual labels:  parsing
genalog
Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and text alignment capabilities.
Stars: ✭ 234 (-20.14%)
Mutual labels:  ner
parser-lang
A parser combinator library with declarative superpowers
Stars: ✭ 25 (-91.47%)
Mutual labels:  parsing
ParsecSharp
The faster monadic parser combinator library for C#
Stars: ✭ 23 (-92.15%)
Mutual labels:  parsing
TweebankNLP
[LREC 2022] An off-the-shelf pre-trained Tweet NLP Toolkit (NER, tokenization, lemmatization, POS tagging, dependency parsing) + Tweebank-NER dataset
Stars: ✭ 84 (-71.33%)
Mutual labels:  ner
Time Convert
Time conversion tool
Stars: ✭ 32 (-89.08%)
Mutual labels:  ner
metal
A Java library for parsing binary data formats, using declarative descriptions.
Stars: ✭ 13 (-95.56%)
Mutual labels:  parsing
molminer
Python library and command-line tool for extracting compounds from scientific literature. Written in Python.
Stars: ✭ 38 (-87.03%)
Mutual labels:  ner
codeparser
Parse Wolfram Language source code as abstract syntax trees (ASTs) or concrete syntax trees (CSTs)
Stars: ✭ 84 (-71.33%)
Mutual labels:  parsing
scikitcrf NER
Python library for custom entity recognition using Sklearn CRF
Stars: ✭ 17 (-94.2%)
Mutual labels:  ner
twitter-to-rss
Simple python script to parse twitter feed to generate a rss feed.
Stars: ✭ 15 (-94.88%)
Mutual labels:  parsing
FullFIX
A library for parsing FIX (Financial Information eXchange) protocol messages.
Stars: ✭ 60 (-79.52%)
Mutual labels:  parsing

Named Entity Recognition as Dependency Parsing

Introduction

This repository contains code introduced in the following paper:

Named Entity Recognition as Dependency Parsing
Juntao Yu, Bernd Bohnet and Massimo Poesio
In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), 2020

Setup Environments

  • The code is written in Python 2 and TensorFlow 1.0. A Python 3 and TensorFlow 2.0 version is provided by Amir Zeldes (see Other Versions).
  • Before starting, install all the required packages listed in requirements.txt using pip install -r requirements.txt
  • Then download the BERT models. For English we used the original cased BERT-Large model, and for other languages we used the cased BERT-Base multilingual model.
  • After that, modify and run extract_bert_features/extract_bert_features.sh to compute the BERT embeddings for your training or test data.
  • You also need to download the context-independent word embeddings required by the system, such as fastText or GloVe embeddings.
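Before pointing the configuration at a downloaded embedding file, it can help to check that the file parses cleanly. The following is a minimal Python 3 sketch, not code from this repository: load_text_embeddings is a hypothetical helper, and it assumes the usual text layout used by GloVe and fastText .vec files (a token followed by space-separated floats, with an optional "<vocab_size> <dim>" header line in the fastText case).

```python
import io


def load_text_embeddings(lines):
    """Parse text-format embeddings: one token per line, followed by its float values."""
    embeddings = {}
    dim = None
    for line_no, line in enumerate(lines):
        parts = line.rstrip().split(" ")
        # fastText .vec files start with a "<vocab_size> <dim>" header line; skip it.
        if line_no == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue
        if len(parts) < 2:
            continue  # ignore blank or malformed lines
        word = parts[0]
        values = [float(x) for x in parts[1:]]
        if dim is None:
            dim = len(values)
        elif len(values) != dim:
            raise ValueError(
                "line %d has %d values, expected %d" % (line_no + 1, len(values), dim)
            )
        embeddings[word] = values
    return embeddings, dim


# Tiny in-memory stand-in for a real embedding file (the values are made up).
sample = io.StringIO("2 3\nthe 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")
embeddings, dim = load_text_embeddings(sample)
print(len(embeddings), dim)
```

For a real file, pass an open file handle instead of the StringIO object. The reported dimension should match whatever embedding size the chosen configuration expects.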

To use a pre-trained model

  • Pre-trained models can be downloaded from this link. We provide all nine pre-trained models reported in our paper.

  • Choose the model you want to use and copy it to the logs/ folder.

  • Modify the test_path accordingly in experiments.conf:

    • test_path is the path to a .jsonlines file. Each line of the .jsonlines file is a batch of sentences and must be in the following format:
    {"doc_key": "batch_01", 
    "ners": [[[0, 0, "PER"], [3, 3, "GPE"], [5, 5, "GPE"]], 
    [[3, 3, "PER"], [10, 14, "ORG"], [20, 20, "GPE"], [20, 25, "GPE"], [22, 22, "GPE"]], 
    []], 
    "sentences": [["Anwar", "arrived", "in", "Shanghai", "from", "Nanjing", "yesterday", "afternoon", "."], 
    ["This", "morning", ",", "Anwar", "attended", "the", "foundation", "laying", "ceremony", "of", "the", "Minhang", "China-Malaysia", "joint-venture", "enterprise", ",", "and", "after", "that", "toured", "Pudong", "'s", "Jingqiao", "export", "processing", "district", "."], 
    ["(", "End", ")"]]}
    
    • Each sentence in the batch corresponds to one list of NEs stored under the ners key. If a sentence does not contain any NEs, use an empty list [] instead.
  • Then use python evaluate.py config_name to start your evaluation
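The batch layout above can be sanity-checked before running the evaluation. The following Python 3 sketch is not part of the repository: validate_batch and span_text are hypothetical helpers. It assumes, based on the example above, that each entity is [start, end, label] with inclusive token indices counted within its own sentence.

```python
import json


def span_text(tokens, start, end):
    """Return the tokens covered by an inclusive [start, end] span."""
    return " ".join(tokens[start:end + 1])


def validate_batch(batch):
    """Raise ValueError if the batch does not match the layout described above."""
    sentences, ners = batch["sentences"], batch["ners"]
    if len(sentences) != len(ners):
        raise ValueError("'ners' needs exactly one list per sentence")
    for i, (tokens, spans) in enumerate(zip(sentences, ners)):
        for start, end, label in spans:
            # Assumption: indices are sentence-local and inclusive on both ends.
            if not 0 <= start <= end < len(tokens):
                raise ValueError(
                    "sentence %d: span [%d, %d] is out of range" % (i, start, end)
                )
            if not isinstance(label, str):
                raise ValueError("sentence %d: label must be a string" % i)


batch = {
    "doc_key": "batch_01",
    "ners": [[[0, 0, "PER"], [3, 3, "GPE"], [5, 5, "GPE"]], []],
    "sentences": [
        ["Anwar", "arrived", "in", "Shanghai", "from", "Nanjing",
         "yesterday", "afternoon", "."],
        ["(", "End", ")"],
    ],
}

validate_batch(batch)
jsonline = json.dumps(batch)  # one batch per line in the .jsonlines file
print(span_text(batch["sentences"][0], 3, 3))
```

Writing one json.dumps result per line produces a file in the expected shape. The second sentence has no entities, so its entry under ners is an empty list.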

To train your own model

  • You will additionally need to create the character vocabulary using python get_char_vocab.py train.jsonlines dev.jsonlines
  • Then you can start training by using python train.py config_name
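As a rough illustration of what a character vocabulary is, the Python 3 sketch below collects every distinct character that occurs in the tokens of a set of .jsonlines inputs. It is an assumption about the general idea, not a copy of get_char_vocab.py: collect_char_vocab is a hypothetical name, and the script's actual output format is not described here.

```python
import json


def collect_char_vocab(sources):
    """Collect the distinct characters used by the tokens in .jsonlines inputs.

    `sources` is an iterable of line iterables (open files or lists of strings).
    """
    vocab = set()
    for lines in sources:
        for line in lines:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            doc = json.loads(line)
            for sentence in doc["sentences"]:
                for token in sentence:
                    vocab.update(token)  # adds each character of the token
    return sorted(vocab)


# In-memory stand-ins for train.jsonlines and dev.jsonlines.
train_lines = ['{"doc_key": "a", "ners": [[]], "sentences": [["Anwar", "arrived"]]}']
dev_lines = ['{"doc_key": "b", "ners": [[]], "sentences": [["(", "End", ")"]]}']

char_vocab = collect_char_vocab([train_lines, dev_lines])
print(len(char_vocab))
```

Building the vocabulary from both the training and development files, as in the command above, means characters that appear only in the development data are still covered.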

Other Versions

  • Amir Zeldes kindly created a TensorFlow 2.0 and Python 3 ready version, which can be found here.