hnesk / browse-ocrd

Licence: MIT License
An extensible viewer for OCR-D mets.xml files

OCR-D Browser

An extensible viewer for OCR-D mets.xml files

Screenshot

OCRD Browser with Page and Xml view

Features

  • Browse fileGrps and pages, arranging views next to each other for comparison
  • PageView: Show original or derived page images with PAGE-XML annotations overlay, similar to PageViewer
  • ImageView: Show original or derived images (AlternativeImage on any level of the structural hierarchy)
  • ImageView: Show multiple images at once for different pages (horizontally) or different segments (vertically), zooming freely
  • XmlView: Show raw PAGE-XML with syntax highlighting, open with PageViewer
  • TextView: Show concatenated PAGE-XML text annotation
  • DiffView: Show a simple diff comparison between text annotations from different fileGrps
  • HtmlView: Show rendered HTML comparison from dinglehopper evaluations

Installation (tested on Ubuntu 18.04/20.04)

Either way, you need a virtual environment (venv) with a current pip version (>= 20), preferably your existing ocrd-venv.

To create a new venv with a current pip:
sudo apt install python3-pip python3-venv 
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip setuptools wheel

From source

git clone https://github.com/hnesk/browse-ocrd.git 
cd browse-ocrd
sudo make deps-ubuntu
make install

Via pip

sudo apt install libcairo2-dev libgirepository1.0-dev
pip install browse-ocrd

Usage

browse-ocrd ./path/to/mets.xml # or open interactively

Configuration

Configuration file locations

At startup, the following directories are searched for a config file named ocrd-browser.conf:

# directories and their default values under Ubuntu 20.04
GLib.get_system_config_dirs()  # '/etc/xdg/xdg-ubuntu/ocrd-browser.conf', '/etc/xdg/ocrd-browser.conf'
GLib.get_user_config_dir()     # '/home/jk/.config/ocrd-browser.conf'  
os.getcwd()                    # './ocrd-browser.conf'
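
The lookup above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual implementation: the helper names candidate_config_files and load_config are hypothetical, and the directory lists are passed in as plain arguments so the sketch runs without PyGObject. In the real application they would come from GLib.get_system_config_dirs() and GLib.get_user_config_dir(). Note that configparser lets later files override earlier ones; whether browse-ocrd merges its config files the same way is an assumption here.

```python
import configparser
import os
from pathlib import Path

CONFIG_NAME = "ocrd-browser.conf"


def candidate_config_files(system_dirs, user_dir, cwd=None):
    """Return candidate config paths in the order listed above."""
    dirs = [*system_dirs, user_dir, cwd or os.getcwd()]
    return [Path(d) / CONFIG_NAME for d in dirs]


def load_config(paths):
    """Read every candidate that exists; missing files are skipped silently."""
    # interpolation=None keeps "%" and "{...}" placeholders untouched.
    parser = configparser.ConfigParser(interpolation=None)
    found = parser.read(paths, encoding="utf-8")
    return parser, found
```

Calling load_config(candidate_config_files(["/etc/xdg"], os.path.expanduser("~/.config"))) returns the merged parser together with the list of files that were actually read.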

Configuration file syntax

The ocrd-browser.conf file is an ini-file with the following keys:

[FileGroups]
# Preferred fileGrp names for thumbnail display in the Page Browser 
# Comma-separated list of regular expressions
preferredImages = OCR-D-IMG, OCR-D-IMG.*, ORIGINAL

# Each Tool has a section header [Tool XYZ]
# At the moment the only defined tool is "PageViewer"  
[Tool PageViewer]
# (ba)sh commandline to execute with placeholders  
commandline = /usr/bin/java -jar /home/jk/bin/JPageViewer/JPageViewer.jar --resolve-dir {workspace.directory} {file.path.absolute}
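
To illustrate how the preferredImages key in the sample above could be interpreted, here is a small sketch that splits the value on commas, compiles each entry as a regular expression, and picks the first fileGrp that matches. The function names preferred_patterns and pick_file_group are hypothetical, and both the full-match semantics and the "earliest pattern wins" ordering are assumptions rather than documented behaviour of browse-ocrd.

```python
import configparser
import re

SAMPLE = """\
[FileGroups]
preferredImages = OCR-D-IMG, OCR-D-IMG.*, ORIGINAL
"""


def preferred_patterns(config_text):
    """Parse the comma-separated regex list from the [FileGroups] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    raw = parser.get("FileGroups", "preferredImages", fallback="")
    return [re.compile(p.strip()) for p in raw.split(",") if p.strip()]


def pick_file_group(available, patterns):
    """Return the fileGrp matched by the earliest pattern, or None."""
    for pattern in patterns:
        for name in available:
            # Assumption: a pattern has to match the whole fileGrp name.
            if pattern.fullmatch(name):
                return name
    return None
```

With the sample configuration, pick_file_group(["OCR-D-SEG", "OCR-D-IMG-BIN", "ORIGINAL"], preferred_patterns(SAMPLE)) selects OCR-D-IMG-BIN, because the second pattern matches before ORIGINAL is considered.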

The commandline string is used as a Python format string with the following keyword arguments:

  • workspace: the current ocrd.Workspace. All of its properties are shell-escaped automatically (with shlex.quote).
  • file: the current ocrd_models.OcrdFile. All of its properties are shell-escaped automatically (with shlex.quote). It also has an additional property path with the sub-properties absolute and relative, so {file.path.absolute} is replaced by the shell-quoted absolute path of the file.
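
The substitution described above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the ShellQuoted proxy class is hypothetical, and SimpleNamespace objects with made-up paths stand in for the real ocrd.Workspace and ocrd_models.OcrdFile instances.

```python
import shlex
from types import SimpleNamespace


class ShellQuoted:
    """Hypothetical proxy: every attribute lookup comes back shell-quoted."""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        value = getattr(self._target, name)
        # Nested objects stay wrapped so {file.path.absolute} is quoted too.
        if isinstance(value, SimpleNamespace):
            return ShellQuoted(value)
        return shlex.quote(str(value))


TEMPLATE = (
    "java -jar JPageViewer.jar "
    "--resolve-dir {workspace.directory} {file.path.absolute}"
)

# Stand-in objects with spaces in the paths to show why quoting matters.
workspace = SimpleNamespace(directory="/data/my workspace")
file = SimpleNamespace(
    path=SimpleNamespace(
        absolute="/data/my workspace/OCR-D-SEG/page 1.xml",
        relative="OCR-D-SEG/page 1.xml",
    )
)

command = TEMPLATE.format(
    workspace=ShellQuoted(workspace),
    file=ShellQuoted(file),
)
```

Because str.format resolves {workspace.directory} through ordinary attribute access, the proxy can quote each value just before it is inserted, so paths containing spaces reach the shell as single arguments.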

Note: You can get PRImA's PageViewer on GitHub.
