paperai / Pdfanno

Linguistic Annotation and Visualization Tool for PDF Documents

Programming Languages

javascript

Projects that are alternatives of or similar to Pdfanno

Polar Bookshelf
Polar is a personal knowledge repository for PDF and web content supporting incremental reading and document annotation.
Stars: ✭ 4,411 (+2727.56%)
Mutual labels:  annotation, pdf
Open Semantic Etl
Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipelines & ingestor to Solr or Elastic search index & linked data graph database
Stars: ✭ 165 (+5.77%)
Mutual labels:  annotation, pdf
Jupiter
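Jupiter is an aio web framework based on aiohttp. It supports RESTful routing, annotation scanning, dependency injection, the jinja2 template engine, an ORM framework, and more.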
Stars: ✭ 140 (-10.26%)
Mutual labels:  annotation
H5 Transfer Pdf
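H5TransferPDF is an API tool that renders web page HTML to PDF and various image formats. It is fully compatible with HTML, CSS, and JS, offers good layout support, and can generate multiple PDF versions.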
Stars: ✭ 149 (-4.49%)
Mutual labels:  pdf
Decktape
PDF exporter for HTML presentations
Stars: ✭ 1,847 (+1083.97%)
Mutual labels:  pdf
Svglib
Read SVG files and convert them to other formats.
Stars: ✭ 139 (-10.9%)
Mutual labels:  pdf
Pdf Toolbox
A collection of tools for processing PDF files in Haskell
Stars: ✭ 145 (-7.05%)
Mutual labels:  pdf
Pdfinverter
darken (or lighten) a PDF
Stars: ✭ 139 (-10.9%)
Mutual labels:  pdf
Yii2 Export
A library to export server/db data in various formats (e.g. excel, html, pdf, csv etc.)
Stars: ✭ 153 (-1.92%)
Mutual labels:  pdf
Pyecharts Snapshot
renders the output of pyecharts as png, jpeg, gif, svg, eps, pdf and raw base64
Stars: ✭ 142 (-8.97%)
Mutual labels:  pdf
Qtpdfium
PDF rendering on Qt
Stars: ✭ 148 (-5.13%)
Mutual labels:  pdf
Pdf reports
📕 Python library and CSS theme to generate PDF reports from HTML/Pug
Stars: ✭ 142 (-8.97%)
Mutual labels:  pdf
Annotorious
A JavaScript library for image annotation
Stars: ✭ 138 (-11.54%)
Mutual labels:  annotation
Zathura Pywal
🎨📖 A script that dynamically generates a zathura colorscheme based on the current wal colors.
Stars: ✭ 147 (-5.77%)
Mutual labels:  pdf
Ambar
🔍 Ambar: Document Search Engine
Stars: ✭ 1,829 (+1072.44%)
Mutual labels:  pdf
Plagiarism Checker
A utility to check if a document's contents are plagiarised
Stars: ✭ 149 (-4.49%)
Mutual labels:  pdf
Educative.io Downloader
📖 This tool downloads courses from educative.io for offline use. It uses your login credentials to download the course.
Stars: ✭ 139 (-10.9%)
Mutual labels:  pdf
Cs Books Pdf
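Programming e-books in PDF: a curated collection of commonly used computer science e-books (high quality, with download links), covering Java, Python, Linux, Go, C, C++, data structures and algorithms, AI, computer fundamentals, interviews, design patterns, databases, front-end development, and other programming topics.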
Stars: ✭ 140 (-10.26%)
Mutual labels:  pdf
Doctron
Docker-powered HTML-to-PDF (html2pdf) and HTML-to-image (html2image, e.g. jpeg, png) conversion using a Chrome (golang) kernel; also adds watermarks to PDFs, converts PDFs to images, etc.
Stars: ✭ 141 (-9.62%)
Mutual labels:  pdf
It books
Sharing good books: give someone a rose, and the fragrance lingers on your hand.
Stars: ✭ 154 (-1.28%)
Mutual labels:  pdf

PDFAnno

PDFAnno is a browser-based linguistic annotation tool for PDF documents.
It offers functions for annotating PDFs with labels and relations.
For natural language processing and machine learning, it is suited to building gold-standard data with named entity spans, dependency relations, and coreference chains.

If you use PDFAnno, please cite the following paper:

Hiroyuki Shindo, Yohei Munesada and Yuji Matsumoto,
"PDFAnno: a Web-based Linguistic Annotation Tool for PDF Documents",
In Proceedings of LREC, 2018.

It is highly recommended to use the latest version of Chrome. (Firefox will also be supported in the future.)

Installation

To install PDFAnno locally, run:

git clone https://github.com/paperai/pdfanno.git
cd pdfanno
npm install
cp .env.example .env

Then, edit .env as you like.
The default values are:

SERVER_PORT=1000

Run Server

npm run server

Usage

  1. Visit the online demo with the latest version of Chrome.
  2. Load your PDF and annotation file (if any). Sample PDFs and annotations are downloadable from here.
  3. Annotate the PDF as you like.
  4. Save your annotations via the save button.
    To continue annotating later, respecify your directory via the Browse button to reload the PDF and anno file.

For security reasons, PDFAnno does NOT automatically save your annotations.
Don't forget to download your current annotations!

Annotation Tools

Tool: Description
Span: Span highlighting. A span cannot cross page boundaries.
One-way relation: Used for annotating a dependency relation between spans.
Rectangle: A rectangle cannot cross page boundaries.

Annotation File (.anno)

In PDFAnno, an annotation file (.anno) follows the TOML format.
Here is an example of an anno file:

pdfanno = "0.4.1"
pdfextract = "0.2.4"

[[spans]]
id = "1"
page = 1
label = "label1"
text = "AgBi 0.05 Sb 0.95 Te 2"
textrange = [1422,1438]

[[spans]]
id = "2"
page = 1
label = "label1"
text = "0.48 Wm [NO_UNICODE] 1 K [NO_UNICODE] 1 )"
textrange = [1386,1397]

[[relations]]
head = "1"
tail = "2"
label = "relation1"

Here, textrange gives the start and end token ids of the span in pdftxt.
pdftxt is a text file extracted from the original PDF file.
You can download pdftxt via the pdf.txt button at the top right of the screen.

Reference Anno File

To support multi-user annotation, PDFAnno lets you load reference anno files.
For example, if you create a.anno and another annotator creates b.anno for the same PDF, load a.anno as usual, and load b.anno as a reference file. PDFAnno then renders a.anno and b.anno in different colors. Rendering more than one reference file is also supported.
This is useful for checking inter-annotator agreement and resolving annotation conflicts.
Note that reference files are rendered as read-only.

Contact

Please contact hshindo or feel free to create an issue.

LICENSE

MIT
