dell-research-harvard / HJDataset

Licence: other
A Large Dataset of Historical Japanese Documents with Complex Layouts

HJDataset

A Large Dataset of Historical Japanese Documents with Complex Layouts

HJDataset is a large dataset of historical Japanese documents with complex layouts. It contains over 250,000 layout element annotations of seven types. In addition to bounding boxes and masks of the content regions, it includes the hierarchical structure and reading order of the layout elements for advanced analysis.

Download the dataset

All annotations are available through this link. Due to copyright restrictions, however, we cannot release the images in this dataset directly. Please fill out this form to request access, and we will send you the download links.

Organization of the files

After downloading, we suggest organizing the annotations and images as follows:

data/
├── train/
├── test/
├── val/
└── annotations/
    ├── instances_train.json 
    └── .... 
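
As a quick sanity check of the layout above, the following sketch reports which of the expected folders are missing and summarizes one annotation file. It assumes the annotation files follow the COCO JSON layout (top-level `images`, `annotations`, and `categories` keys), which the `instances_train.json` naming suggests but this README does not state. `check_layout` and `summarize_annotations` are hypothetical helper names, not part of the repository.

```python
import json
from collections import Counter
from pathlib import Path

# Folder names taken from the suggested layout above.
EXPECTED_DIRS = ("train", "test", "val", "annotations")


def check_layout(root):
    """Return the expected sub-folders that are missing under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


def summarize_annotations(json_path):
    """Count annotations per category name.

    Assumes a COCO-style file; this is an assumption, not something
    the README confirms.
    """
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    id_to_name = {c["id"]: c["name"] for c in data.get("categories", [])}
    counts = Counter(
        id_to_name.get(a["category_id"], "unknown")
        for a in data.get("annotations", [])
    )
    return dict(counts)
```

For example, `check_layout("data")` should return an empty list once the folders are in place, and `summarize_annotations("data/annotations/instances_train.json")` then gives a per-category annotation count.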

Environment configuration

You can use the provided conda environment file to set up your environment:

conda env create -f environment.yml

Installing Detectron2 can fail on some systems. If you run into problems, check its official installation instructions and the Common Installation Issues section.
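
Once the environment is created, a small check like the one below can confirm that the key packages are importable before opening the notebooks. This is only a sketch: `missing_packages` is a hypothetical helper, and the package list is an assumption (only Detectron2 is named in this README; `torch` is included because Detectron2 is built on PyTorch).

```python
import importlib.util

# Assumed package list; adjust it to match environment.yml.
REQUIRED = ("torch", "detectron2")


def missing_packages(names=REQUIRED):
    """Return the top-level package names that the import system cannot find."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing:", ", ".join(missing) if missing else "none")
```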

Starter code

We provide some starter code in notebooks/.

  • 1-Dataloader and visualization.ipynb illustrates how to use the dataloader class to load and visualize layout elements in HJDataset.
  • 2-Training Using Detectron2.ipynb shows how to train segmentation models on the dataset using Detectron2.
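
For orientation, here is a minimal sketch of how the annotations could be registered with Detectron2 before training. It is not taken from the notebooks and may differ from what they do. It assumes COCO-style annotation files and the folder layout suggested above; the dataset name `hjdataset_<split>` and the `instances_{split}.json` file-name pattern are illustrative assumptions (only `instances_train.json` appears in this README). `register_coco_instances` is Detectron2's own helper; the other function names are hypothetical.

```python
from pathlib import Path


def coco_registration_args(root, split, pattern="instances_{split}.json"):
    """Build (name, metadata, json_file, image_root) for one split.

    The dataset name and the file-name pattern are assumptions made
    for illustration only.
    """
    root = Path(root)
    json_file = root / "annotations" / pattern.format(split=split)
    image_root = root / split
    return (f"hjdataset_{split}", {}, str(json_file), str(image_root))


def register_split(root, split):
    """Register one split with Detectron2 (imported lazily, so the helper
    above can be used even when Detectron2 is not installed)."""
    from detectron2.data.datasets import register_coco_instances

    register_coco_instances(*coco_registration_args(root, split))
```

Calling `register_split("data", "train")` would make the training split available under the name `hjdataset_train`, which can then be referenced from a Detectron2 config.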

Cite our work

If you find the dataset helpful for your research, please cite our work:

@article{shen2020large,
  title={A Large Dataset of Historical Japanese Documents with Complex Layouts},
  author={Shen, Zejiang and Zhang, Kaixuan and Dell, Melissa},
  journal={arXiv preprint arXiv:2004.08686},
  year={2020}
}