
qurator-spk / neat

Licence: Apache-2.0 License
Named entity annotation tool

Programming Languages

JavaScript
HTML

Projects that are alternatives of or similar to neat

dinglehopper
An OCR evaluation tool
Stars: ✭ 38 (+80.95%)
Mutual labels:  qurator
simple NER
simple rule based named entity recognition
Stars: ✭ 29 (+38.1%)
Mutual labels:  annotation-tool
Grid-Anchor-based-Image-Cropping-Pytorch
Compatible with Python3 & PyTorch 1.0+ on Ubuntu
Stars: ✭ 47 (+123.81%)
Mutual labels:  annotation-tool
memex-gate
General Architecture for Text Engineering
Stars: ✭ 47 (+123.81%)
Mutual labels:  named-entities
Form-Labeller
Use this tool to label forms, bounding boxes, and assigning types to annotations
Stars: ✭ 17 (-19.05%)
Mutual labels:  annotation-tool
classifai
🔥 One of the most comprehensive open-source data annotation platform.
Stars: ✭ 99 (+371.43%)
Mutual labels:  annotation-tool
PersianNER
Named-Entity Recognition in Persian Language
Stars: ✭ 48 (+128.57%)
Mutual labels:  named-entities
auto-labeling-pipeline
doccano auto labeling pipeline helps doccano to annotate a document automatically.
Stars: ✭ 29 (+38.1%)
Mutual labels:  annotation-tool
trunklucator
Python module for data scientists for quick creating annotation projects.
Stars: ✭ 80 (+280.95%)
Mutual labels:  annotation-tool
label-studio-frontend
Data labeling react app that is backend agnostic and can be embedded into your applications — distributed as an NPM package
Stars: ✭ 230 (+995.24%)
Mutual labels:  annotation-tool
image-sorter2
One-click image sorting/labelling script
Stars: ✭ 65 (+209.52%)
Mutual labels:  annotation-tool
BBoxEE
Bounding Box Editor and Exporter
Stars: ✭ 15 (-28.57%)
Mutual labels:  annotation-tool
dac
Entity linker for the newspaper collection of the National Library of the Netherlands. Links named entity mentions to DBpedia descriptions using either a binary SVM classifier or a neural net.
Stars: ✭ 14 (-33.33%)
Mutual labels:  named-entities
arabic-tagger
AQMAR Arabic Tagger: Sequence tagger with cost-augmented structured perceptron training
Stars: ✭ 38 (+80.95%)
Mutual labels:  named-entities
KWDLC
Kyoto University Web Document Leads Corpus
Stars: ✭ 64 (+204.76%)
Mutual labels:  named-entities
neurotic
Curate, visualize, annotate, and share your behavioral ephys data using Python
Stars: ✭ 24 (+14.29%)
Mutual labels:  annotation-tool
advene
Official Advene repository
Stars: ✭ 32 (+52.38%)
Mutual labels:  annotation-tool
zoe
Zero-Shot Open Entity Typing as Type-Compatible Grounding, EMNLP'18.
Stars: ✭ 37 (+76.19%)
Mutual labels:  named-entities
sparv-pipeline
Språkbanken's text analysis tool
Stars: ✭ 19 (-9.52%)
Mutual labels:  annotation-tool
open-cravat
A modular annotation tool for genomic variants
Stars: ✭ 74 (+252.38%)
Mutual labels:  annotation-tool

neat: named entity annotation tool



Table of contents

1. Introduction

2. User Guide

   2.1 Installation

   2.2 Data format

   2.3 Navigation

   2.4 Saving progress

3. Annotation Guidelines

1. Introduction

neat is a simple, browser-based tool for editing and annotating text with named entities to produce labeled data for training/testing/evaluation. It can be used to add or correct named entity labels and to correct the token text or tokenization (e.g. due to OCR/segmentation errors).

neat is developed at the Berlin State Library for data annotation in the SoNAR-IDH project and the QURATOR project.

2. User Guide

2.1 Installation

neat runs locally as a pure HTML+JavaScript webpage in your web browser. No additional software needs to be installed, but JavaScript has to be enabled in the browser.

Clone the repository (git clone https://github.com/qurator-spk/neat.git) or download and extract the ZIP archive. Make sure neat.html and neat.js are in the same directory, then open neat.html in a browser. Any reasonably recent browser should work, but only Chrome and Firefox have been tested.

2.2 Data format

The source data we use for annotation are OCR results in PAGE-XML format. We provide a Python tool for the transformation of OCR files in PAGE-XML into the TSV format used by neat.
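The Python conversion tool itself is not reproduced here. To illustrate what such a conversion involves, the following minimal sketch (hypothetical function names, not the project's converter) reads Word elements from PAGE-XML and emits one untagged row per word in the extended layout described further below, filling ID with - and url_id with 0 as in the full example, and taking the bounding box as the minimum and maximum of the word's Coords polygon. It assumes the PAGE 2013+ convention of a points attribute on Coords and a TextEquiv/Unicode child on each Word, matches elements by local name so the PAGE schema version does not matter, and treats each TextLine as one sentence. That last point is a simplification: real sentence boundaries do not follow line breaks.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Drop the '{namespace}' prefix so any PAGE schema version matches."""
    return tag.rsplit("}", 1)[-1]


def child(elem, name: str):
    """Return the first direct child with the given local name, or None."""
    return next((c for c in elem if local(c.tag) == name), None)


def bbox(points: str) -> tuple[int, int, int, int]:
    """Turn a PAGE 'x1,y1 x2,y2 ...' polygon into (left, right, top, bottom)."""
    xs, ys = zip(*(map(int, p.split(",")) for p in points.split()))
    return min(xs), max(xs), min(ys), max(ys)


def page_to_rows(xml_text: str) -> list[str]:
    """Emit one untagged TSV row per Word; a 0 row separates TextLines."""
    root = ET.fromstring(xml_text)
    rows: list[str] = []
    for line in (e for e in root.iter() if local(e.tag) == "TextLine"):
        words = [c for c in line if local(c.tag) == "Word"]
        if words and rows:
            rows.append("0")  # simplification: each TextLine is a "sentence"
        for pos, word in enumerate(words, start=1):
            # use the Word's own TextEquiv, not those of nested Glyphs
            equiv = child(word, "TextEquiv")
            uni = child(equiv, "Unicode") if equiv is not None else None
            text = (uni.text or "") if uni is not None else ""
            left, right, top, bottom = bbox(child(word, "Coords").get("points"))
            rows.append(f"{pos}\t{text}\tO\tO\t-\t0\t{left},{right},{top},{bottom}")
    return rows
```

For the first word of the full example, a Coords polygon of 174,358 352,358 352,390 174,390 yields the row 1, Donnerstag, O, O, -, 0, 174,352,358,390.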

The internal data format used by neat is based on the format used in the GermEval2014 Named Entity Recognition Shared Task. Text is encoded as one token per line, with name spans in the IOB2 format as tab-separated values:

  • the first column contains either
    • # a comment to indicate the source the sentence is taken from, or
    • >=1 the token position within the sentence, or
    • 0 to mark sentence boundaries
  • the second column contains the token text
  • outer entity spans are encoded in the third column NE-TAG
  • embedded entity spans are encoded in the fourth column NE-EMB
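A minimal Python sketch of reading this layout: rows are grouped into sentences at each 0 row, # comment lines are collected separately, and the first four columns of each token row are returned. The function name read_sentences is hypothetical; the sketch skips the No. header line and ignores any additional columns.

```python
def read_sentences(tsv_text: str):
    """Split neat-style TSV into (comment lines, sentences of token dicts)."""
    comments, sentences, current = [], [], []
    for line in tsv_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if fields[0].startswith("#"):
            comments.append(line)  # source reference for the following text
        elif fields[0] == "No.":
            continue  # header row
        elif fields[0] == "0":
            if current:  # a boundary row closes the current sentence
                sentences.append(current)
            current = []
        else:
            pos, token, ne_tag, ne_emb = fields[:4]  # extra columns ignored
            current.append({"pos": int(pos), "token": token,
                            "ne_tag": ne_tag, "ne_emb": ne_emb})
    if current:
        sentences.append(current)
    return comments, sentences
```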
Example (simple)
No.	TOKEN	NE-TAG	NE-EMB
# https://example.url
1	Donnerstag	O	O
2	,	O	O
3	1	O	O
4	.	O	O
5	Januar	O	O
6	.	O	O
0		O	O
1	Berliner	B-ORG	B-LOC
2	Tageblatt	I-ORG	O
3	.	O	O
0		O	O
1	Nr	O	O
2	.	O	O
3	1	O	O
4	.	O	O
0		O	O
1	Seite	O	O
2	3	O	O
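In IOB2, each entity starts with a B- tag and continues with I- tags of the same type, while O marks tokens outside any entity. The minimal Python sketch below (a hypothetical helper, not part of neat) decodes one tag column into spans and applies it to the "Berliner Tageblatt" sentence of the example: the outer NE-TAG span covers both tokens as ORG, while the embedded NE-EMB span marks only "Berliner" as LOC. An I- tag that does not continue a span of the same type is treated as starting a new one, which is a common convention rather than documented neat behavior.

```python
def iob2_spans(tags):
    """Return (type, start, end) spans from IOB2 tags; end is exclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel O closes a final span
        prefix, _, etype = tag.partition("-")
        continues = prefix == "I" and etype == label
        if start is not None and not continues:
            spans.append((label, start, i))
            start, label = None, None
        if prefix == "B" or (prefix == "I" and not continues):
            start, label = i, etype
    return spans


tokens = ["Berliner", "Tageblatt", "."]
for column in (["B-ORG", "I-ORG", "O"], ["B-LOC", "O", "O"]):  # NE-TAG, NE-EMB
    for etype, s, e in iob2_spans(column):
        print(etype, " ".join(tokens[s:e]))
# prints:
# ORG Berliner Tageblatt
# LOC Berliner
```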

For our purposes we extend this format by adding these (optional) values:

  • a fifth column, ID, holding an authority-file identifier for the outer NE-TAG entity (neat supports automatic linking for Wikidata identifiers)
  • a sixth column, url_id, used as a variable in the image URL for IIIF Image API support (if the PAGE-XML source contains word bounding boxes, neat can embed image snippets in its interface to assist annotation and correction)
  • columns 7-10 for storing the left, right, top and bottom pixel coordinates of the image snippets
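The optional columns can be read per row as in the Python sketch below. The text above places the coordinates in columns 7-10, while the full example that follows shows them as one comma-separated field; since this description does not settle which layout neat writes, the sketch accepts both. It treats - in the ID column as "no ID", as the example does, and builds a link for Q identifiers with the standard Wikidata entity URL pattern https://www.wikidata.org/wiki/<QID>. The function name and the returned keys are hypothetical.

```python
def parse_extended_row(line: str) -> dict:
    """Parse one token row; columns 5 onward are optional."""
    fields = line.rstrip("\r\n\t").split("\t")  # drop trailing empty cells
    row = {"pos": int(fields[0]), "token": fields[1],
           "ne_tag": fields[2], "ne_emb": fields[3]}
    if len(fields) > 4:
        qid = fields[4]
        row["id"] = None if qid == "-" else qid  # "-" means no ID
        if qid.startswith("Q"):
            # standard Wikidata entity page URL for a Q identifier
            row["wikidata_url"] = f"https://www.wikidata.org/wiki/{qid}"
    if len(fields) > 5:
        row["url_id"] = int(fields[5])
    if len(fields) > 6:
        # either one comma-separated "l,r,t,b" cell or four separate columns 7-10
        cells = fields[6].split(",") if "," in fields[6] else fields[6:10]
        row["left"], row["right"], row["top"], row["bottom"] = map(int, cells)
    return row
```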
Example (full)
No.	TOKEN	NE-TAG	NE-EMB	ID	url_id	left,right,top,bottom
# https://example.url/iiif/left,right,top,bottom/full/0/default.jpg
1	Donnerstag	O	O	-	0	174,352,358,390
2	,	O	O	-	0	174,352,358,390
3	1	O	O	-	0	367,392,361,381
4	.	O	O	-	0	370,397,352,379
5	Januar	O	O	-	0	406,518,358,386
6	.	O	O	-	0	406,518,358,386
0
1	Berliner	B-ORG	B-LOC	Q455014	0	816,984,358,388
2	Tageblatt	I-ORG	O	Q455014	0	1005,1208,360,387
3	.	O	O	-	0	1005,1208,360,387
0
1	Nr	O	O	-	0	1237,1288,360,382
2	.	O	O	-	0	1237,1288,360,382
3	1	O	O	-	0	1304,1326,361,381
4	.	O	O	-	0	1304,1326,361,381
0
1	Seite	O	O	-	0	1837,1926,361,392
2	3	O	O	-	0	1939,1967,364,385
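The comment line of the full example carries an image URL with a literal left,right,top,bottom placeholder in the region position. The IIIF Image API expects the region as x,y,w,h (top-left corner plus width and height, in pixels), so a snippet URL can be derived from the stored coordinates as sketched below. This Python sketch assumes that url_id is a 0-based index into the file's # comment lines and that the placeholder appears verbatim in the URL; both are assumptions about how neat resolves these values, and snippet_url is a hypothetical helper.

```python
PLACEHOLDER = "left,right,top,bottom"  # literal text in the example URL


def snippet_url(comment_lines, url_id, left, right, top, bottom):
    """Build an image-snippet URL for one token (assumed url_id semantics)."""
    # assumption: url_id is a 0-based index into the file's # comment lines
    template = comment_lines[url_id].lstrip("#").strip()
    # IIIF regions are x,y,w,h: top-left corner plus width and height
    region = f"{left},{top},{right - left},{bottom - top}"
    return template.replace(PLACEHOLDER, region)


print(snippet_url(
    ["# https://example.url/iiif/left,right,top,bottom/full/0/default.jpg"],
    0, 816, 984, 358, 388))
# prints https://example.url/iiif/816,358,168,30/full/0/default.jpg
```

The size segment full in the example URL is IIIF Image API 2.x syntax; version 3.0 of the API uses max instead.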

2.3 Navigation

neat can be used with either a keyboard or a mouse, but for ergonomic reasons we strongly recommend using the key combinations below.

Keyboard
Key Combination   Action
---------------   --------------------------------------------
Left              Move one cell left
Right             Move one cell right
Up                Move one row up
Down              Move one row down
PageDown          Move page down
PageUp            Move page up
Ctrl+Up           Move entire table one row up
Ctrl+Down         Move entire table one row down
---------------   --------------------------------------------
s t               Start new sentence in current row
m e               Merge current row with row above
s p               Create copy of current row
d l               Delete current row
---------------   --------------------------------------------
backspace         Set NE-TAG / NE-EMB to O
b p               Set NE-TAG / NE-EMB to B-PER
b l               Set NE-TAG / NE-EMB to B-LOC
b o               Set NE-TAG / NE-EMB to B-ORG
b w               Set NE-TAG / NE-EMB to B-WORK
b c               Set NE-TAG / NE-EMB to B-CONF
b e               Set NE-TAG / NE-EMB to B-EVT
b t               Set NE-TAG / NE-EMB to B-TODO
i p               Set NE-TAG / NE-EMB to I-PER
i l               Set NE-TAG / NE-EMB to I-LOC
i o               Set NE-TAG / NE-EMB to I-ORG
i w               Set NE-TAG / NE-EMB to I-WORK
i c               Set NE-TAG / NE-EMB to I-CONF
i e               Set NE-TAG / NE-EMB to I-EVT
i t               Set NE-TAG / NE-EMB to I-TODO
---------------   --------------------------------------------
enter             Edit TOKEN or ID
esc               Close TOKEN or ID edit field without applying changes
---------------   --------------------------------------------
l a               Add one display row
l r               Remove one display row (minimum is 5)
---------------   --------------------------------------------
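The tag shortcuts in the table follow a regular scheme: the first key selects the IOB2 prefix (b for B-, i for I-) and the second key selects the entity type. A minimal Python sketch of that lookup, derived from the table; it only illustrates the mapping and is not neat's JavaScript key handler. Which column (NE-TAG or NE-EMB) receives the tag depends on the currently selected cell.

```python
PREFIXES = {"b": "B", "i": "I"}
TYPES = {"p": "PER", "l": "LOC", "o": "ORG", "w": "WORK",
         "c": "CONF", "e": "EVT", "t": "TODO"}


def tag_for_keys(first: str, second: str):
    """Return the NE tag for a two-key shortcut such as ('b', 'p'), else None."""
    if first in PREFIXES and second in TYPES:
        return f"{PREFIXES[first]}-{TYPES[second]}"
    return None  # not a tag shortcut (e.g. 's t' starts a sentence)
```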
Mouse
  • use the mouse wheel to scroll up and down

  • left-click << or >> to move 15 rows up or down

  • left-click O in the NE-TAG or NE-EMB column to open a drop-down menu, then select any of the supported NE tags to tag a token or change an existing tag

  • left-click the NE-TAG or NE-EMB column and select O to remove a tag

  • left-click the TOKEN column to edit the token text

  • left-click the POSITION column and select split from the drop-down menu to create a copy of the current row below it

  • left-click the POSITION column and select merge from the drop-down menu to merge the current row with the row above

  • left-click the POSITION column and select start-sentence from the drop-down menu to mark the start of a new sentence
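The row operations available from the keyboard and the POSITION drop-down can be pictured on a plain list of rows, as in the Python sketch below. The descriptions above only say that split copies the current row below and that merge combines the current row with the row above; the sketch assumes that merge concatenates the token text while keeping the upper row's tags, and that positions restart at 1 after every sentence boundary. These are assumptions for illustration rather than documented neat behavior, and all helper names are hypothetical.

```python
BOUNDARY = None  # stands for a "0" row that separates sentences


def renumber(rows):
    """Recompute positions: 1, 2, ... restarting after each boundary row."""
    out, pos = [], 0
    for row in rows:
        if row is BOUNDARY:
            out.append(0)
            pos = 0
        else:
            pos += 1
            out.append(pos)
    return out


def split(rows, i):
    """Insert a copy of row i directly below it."""
    return rows[: i + 1] + [dict(rows[i])] + rows[i + 1 :]


def merge(rows, i):
    """Merge row i into the row above (assumed: tokens concatenated)."""
    above = dict(rows[i - 1])  # copy so the input list is not mutated
    above["token"] += rows[i]["token"]
    return rows[: i - 1] + [above] + rows[i + 1 :]


def start_sentence(rows, i):
    """Insert a boundary so that row i begins a new sentence."""
    return rows[:i] + [BOUNDARY] + rows[i:]


def delete(rows, i):
    """Remove row i."""
    return rows[:i] + rows[i + 1 :]
```

For instance, merging an OCR-split "Berli" / "ner" pair with merge(rows, 1) yields a single "Berliner" row that keeps the tags of the first row.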

2.4 Saving progress

Because neat runs entirely locally in the browser, it cannot save your changes to disk automatically. Use the Save Changes button to save them manually from time to time.

3. Annotation Guidelines

Annotation Guidelines
