HTR-United / htr-united

License: CC0-1.0
Ground Truth Resources for the HTR of patrimonial documents

HTR-United

FR Go to htr-united.github.io

CC BY 4.0

What is HTR-United

HTR-United is a GitHub organization without any other form of legal personality. It aims to gather HTR/OCR transcriptions of all periods and styles of writing, mostly but not exclusively in French. It was born from a simple need shared by many projects: having ground truth at hand to rapidly train models on smaller corpora.

What is shared?

Datasets shared or referenced with HTR-United must, at minimum, take the form of:

  • a set of ALTO XML and/or PAGE XML files containing either the segmentation only, or the segmentation and the corresponding transcription;
  • a set of corresponding images. They can be shared as a simple permalink to resources hosted elsewhere, or as the contact information needed to request access to the images. It must be possible to reconstruct the link between the XML files and the images without any intermediary step such as renaming the images;
  • documentation on the context in which the dataset was produced and the rules followed to segment and transcribe the documents. For GitHub repositories, this documentation is usually presented in the README.

A corpus can be subdivided into smaller sets if necessary.
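The requirement that XML files and images can be linked without renaming can be checked locally. The following is a minimal sketch, not an HTR-United tool: it assumes the images have been downloaded next to the XML files, reads the image name from the ALTO `fileName` element or from the PAGE `imageFilename` attribute, and reports files whose image is missing. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List, Optional


def local_name(tag: str) -> str:
    """Strip the XML namespace so the check works across ALTO/PAGE versions."""
    return tag.rsplit("}", 1)[-1]


def referenced_image(xml_path: Path) -> Optional[str]:
    """Return the image file name declared in an ALTO or PAGE XML file, if any."""
    root = ET.parse(xml_path).getroot()
    for element in root.iter():
        name = local_name(element.tag)
        # ALTO: <sourceImageInformation><fileName>page.jpg</fileName>
        if name == "fileName" and element.text and element.text.strip():
            return element.text.strip()
        # PAGE: <Page imageFilename="page.jpg" ...>
        if name == "Page" and element.get("imageFilename"):
            return element.get("imageFilename")
    return None


def find_broken_links(dataset_dir: Path) -> List[str]:
    """List XML files whose declared image is not found relative to them.

    Assumption: images sit at the path declared in the XML, relative to
    the XML file. Datasets that only share a permalink need a download first.
    """
    broken = []
    for xml_path in sorted(dataset_dir.rglob("*.xml")):
        image = referenced_image(xml_path)
        if image is None or not (xml_path.parent / image).is_file():
            broken.append(xml_path.name)
    return broken
```

Calling `find_broken_links(Path("data"))` on a local copy of a dataset returns an empty list when every XML file points to an image that exists under the declared name.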

If you need help to compose your repository, you can check our template!

Only data?

Eventually, this organization also aims to share, under free licenses, models suited to the HTR engines that projects request. This will allow projects with fewer resources to benefit from ready-to-use models. If you share your data, and depending on the pace of contributions from the other members, you will soon be able to use such models.

However, keep in mind that transcription and training form a virtuous circle (Transcription <-> Training). We hope it will eventually bring considerable improvements to the transcriptions created by young projects starting from scratch.

How does it work?

There are two cases:

  1. You already have data in a repository
  2. You don't have one and prefer to help the organization directly

You already have data in a repository

It's rather convenient: you stay in control, and you do not have to deal with joining the organization. However, if you want your dataset to gain visibility, we think it is important that you describe it here. Indeed, if you benefit from data or models provided by HTR-United, you may as well contribute!

To do so, you just need to open an issue or request an update on the deposit repository by adding a YAML file generated with our form, presented as follows:

    schema: https://htr-united.github.io/schema/2021-10-15/schema.json
    title: My Example Dataset
    url: http://link.to.repository
    authors:
      - name: John
        surname: Doe
        roles:
          - transcriber
      - name: Jeanne
        surname: Dupont
        roles:
          - project-manager
    description: A short description of the content of the dataset.
    project-name: My Awesome Project
    project-website: http://optional.link.to.project
    language:
      - fra
    script:
      - Latn
    script-type: only-manuscript
    time:
      notBefore: '1830'
      notAfter: '1875'
    hands:
      count: '1'
      precision: exact
    license:
      - name: CC-BY 4.0
        url: https://creativecommons.org/licenses/by/4.0/
    format: Page-XML
    volume:
      - metric: pages
        count: 42
      - metric: lines
        count: 420
      - metric: characters
        count: 4200
    transcription-guidelines: A presentation of the rules established for the transcription.
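Before submitting such a record, a few consistency checks can catch common mistakes. The sketch below is only an illustration and does not replace validation against the published schema: it assumes the YAML file has already been loaded into a Python dictionary (for instance with PyYAML's `yaml.safe_load`), and the list of expected keys is an assumption derived from the example above.

```python
from typing import Any, Dict, List

# Keys taken from the example record; treating all of them as mandatory
# is an assumption, the published schema remains the authority.
EXPECTED_KEYS = (
    "schema", "title", "url", "authors", "description", "language",
    "script", "time", "license", "format", "volume",
)


def check_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems found in a catalog record."""
    problems = [f"missing key: {key}" for key in EXPECTED_KEYS if key not in record]

    # The time span must not be inverted (years are quoted strings in the YAML).
    time = record.get("time", {})
    not_before, not_after = time.get("notBefore"), time.get("notAfter")
    if not_before is not None and not_after is not None:
        if int(not_before) > int(not_after):
            problems.append("time.notBefore is later than time.notAfter")

    # Every volume metric should come with a positive count.
    for entry in record.get("volume", []):
        if int(entry.get("count", 0)) <= 0:
            problems.append(f"volume count for {entry.get('metric')} must be positive")

    # Every author should have at least one role.
    for author in record.get("authors", []):
        if not author.get("roles"):
            problems.append(f"author {author.get('surname')} has no role")
    return problems


# The example record from above, abridged, as it would look once loaded.
example = {
    "schema": "https://htr-united.github.io/schema/2021-10-15/schema.json",
    "title": "My Example Dataset",
    "url": "http://link.to.repository",
    "authors": [{"name": "John", "surname": "Doe", "roles": ["transcriber"]}],
    "description": "A short description of the content of the dataset.",
    "language": ["fra"],
    "script": ["Latn"],
    "time": {"notBefore": "1830", "notAfter": "1875"},
    "license": [{"name": "CC-BY 4.0",
                 "url": "https://creativecommons.org/licenses/by/4.0/"}],
    "format": "Page-XML",
    "volume": [{"metric": "pages", "count": 42}],
}

print(check_record(example))
```

An empty list means that none of these checks found a problem. It does not guarantee that the record conforms to the schema.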

You don't have one

Well, we'll be happy to get help from you. Open an issue here and we will gladly help to create and share your repository on HTR-United. Skills with git are appreciated but, if you want to share data, we will help you. It's the purpose of this organization!

Overview

You can browse the content of the catalog from our website: here.

Here is an overview of the periods covered by the datasets documented in HTR-United's catalog!

[Graph: periods covered by the datasets documented in HTR-United's catalog]
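How the graph is produced is not described here. As an illustration only, one possible way to summarize period coverage is to count, for each century, the catalog records whose `notBefore`/`notAfter` range touches it. The function names below are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List


def centuries_covered(not_before: int, not_after: int) -> List[int]:
    """Centuries (1 = years 1-100) touched by the range [not_before, not_after]."""
    first = (not_before - 1) // 100 + 1
    last = (not_after - 1) // 100 + 1
    return list(range(first, last + 1))


def datasets_per_century(records: Iterable[Dict[str, Any]]) -> Counter:
    """Count how many records cover each century, based on their time field."""
    counts: Counter = Counter()
    for record in records:
        time = record["time"]
        # Years are stored as quoted strings in the YAML records.
        span = centuries_covered(int(time["notBefore"]), int(time["notAfter"]))
        counts.update(span)
    return counts
```

For the example record shown earlier (1830 to 1875), `centuries_covered(1830, 1875)` returns `[19]`, so that dataset counts once toward the 19th century.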

Quality Check

To help you improve and guarantee the quality of your dataset, we developed a series of tools which can easily be added to your repositories. Check out our Tools webpage to see descriptions and demos!

Publications

  • (FR) Alix Chagué, Thibault Clérice, Laurent Romary. HTR-United : Mutualisons la vérité de terrain !. DHNord2021 - Publier, partager, réutiliser les données de la recherche : les data papers et leurs enjeux, MESHS, Nov 2021, Lille, France. ⟨hal-03398740⟩

  • (FR) Alix Chagué. Conditions de la mutualisation : les principes FAIR et HTR-United. Humanistica 2022, May 2022, Montréal, Canada. ⟨hal-03685731⟩


Logo by Alix Chagué.
