sign-language-processing / datasets

Licence: other
TFDS data loaders for sign language datasets.

Sign Language Datasets

This repository includes TFDS data loaders for sign language datasets.

Installation

From Source

pip install git+https://github.com/sign-language-processing/datasets.git

PyPI

Not yet available; automatic publication on push still needs to be set up.

Usage

We demonstrate a loading script for every dataset in examples/load.ipynb.

Our config includes the option to choose the resolution and fps, for example:

import tensorflow_datasets as tfds
import sign_language_datasets.datasets
from sign_language_datasets.datasets.config import SignDatasetConfig

# Loading a dataset with default configuration
aslg_pc12 = tfds.load("aslg_pc12")

# Loading a dataset with custom configuration
config = SignDatasetConfig(name="videos_and_poses256x256:12", 
                           version="3.0.0",          # Specific version
                           include_video=True,       # Download and load dataset videos
                           process_video=True,       # Process videos to tensors, or only save path to video
                           fps=12,                   # Load videos at constant, 12 fps
                           resolution=(256, 256),    # Convert videos to a constant resolution, 256x256
                           include_pose="holistic")  # Download and load Holistic pose estimation
rwth_phoenix2014_t = tfds.load(name='rwth_phoenix2014_t', builder_kwargs=dict(config=config))
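
The `name` in the example above is a label for the configuration. TFDS keeps each named configuration in its own directory, so a name that spells out the options helps keep differently processed copies apart. Below is a minimal sketch of deriving such a name. `describe_config` is a hypothetical helper, not part of this package, and the naming pattern is only inferred from the example above:

```python
from typing import Optional, Tuple


def describe_config(include_video: bool,
                    include_pose: Optional[str],
                    resolution: Optional[Tuple[int, int]] = None,
                    fps: Optional[int] = None) -> str:
    """Build a descriptive config name from the chosen options (hypothetical helper)."""
    parts = []
    if include_video:
        parts.append("videos")
    if include_pose:
        parts.append("poses")
    # Fall back to a generic label when neither videos nor poses are requested.
    name = "_and_".join(parts) or "annotations"
    if resolution is not None:
        name += f"{resolution[0]}x{resolution[1]}"
    if fps is not None:
        name += f":{fps}"
    return name


print(describe_config(True, "holistic", resolution=(256, 256), fps=12))
```

The returned string can then be passed as `name=` to `SignDatasetConfig` alongside the same options.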

Datasets

| Dataset | Videos | Poses | Versions |
|---|---|---|---|
| aslg_pc12 | N/A | N/A | 0.0.1 |
| rwth_phoenix2014_t | Yes | Holistic | 3.0.0 |
| autsl | Yes | OpenPose, Holistic | 1.0.0 |
| dgs_corpus | Yes | OpenPose, Holistic | 3.0.0 |
| how2sign | Yes | OpenPose | 1.0.0 |
| sign2mint | Yes | | 1.0.0 |
| signtyp | Links | | 1.0.0 |
| swojs_glossario | Yes | | 1.0.0 |
| SignBank | N/A | | 1.0.0 |
| wlasl | Failed | OpenPose | None |
| msasl | | | None |
| Video-Based CSL | | | None |
| RVL-SLLL ASL | | | None |
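
To pick a dataset by pose format, the table above can be transcribed into a small lookup. This is a hand-copied snapshot of the rows that list both a pose format and a released version, not something the package exposes. `POSES` and `datasets_with_pose` are hypothetical names, and the lowercase `"openpose"` spelling is an assumption modelled on the `"holistic"` value used in the usage example:

```python
# Pose formats per dataset, copied by hand from the table above.
# Only rows with both a pose format and a released version are included.
POSES = {
    "rwth_phoenix2014_t": ["holistic"],
    "autsl": ["openpose", "holistic"],
    "dgs_corpus": ["openpose", "holistic"],
    "how2sign": ["openpose"],
}


def datasets_with_pose(pose_format: str) -> list:
    """Return the names of datasets that list the given pose format, sorted."""
    wanted = pose_format.lower()
    return sorted(name for name, formats in POSES.items() if wanted in formats)


print(datasets_with_pose("holistic"))
```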

Data Interface

We use the following interface wherever possible, to make it easy to swap datasets.

{
    "id": tfds.features.Text(),
    "signer": tfds.features.Text() | tf.int32,
    "video": tfds.features.Video(shape=(None, HEIGHT, WIDTH, 3)),
    "depth_video": tfds.features.Video(shape=(None, HEIGHT, WIDTH, 1)),
    "fps": tf.int32,
    "pose": {
        "data": tfds.features.Tensor(shape=(None, 1, POINTS, CHANNELS), dtype=tf.float32),
        "conf": tfds.features.Tensor(shape=(None, 1, POINTS), dtype=tf.float32)
    },
    "gloss": tfds.features.Text(),
    "text": tfds.features.Text()
}
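
Because the interface is followed only "wherever possible", code that swaps datasets should check which fields an example actually provides. The sketch below works on plain dictionaries standing in for decoded examples. `available_fields` and `pose_shapes_agree` are hypothetical helpers, not part of this package. The shape check encodes only what the interface states: `conf` has the shape of `data` without the trailing channel axis, and the second axis has size 1.

```python
from typing import List, Mapping, Sequence

# Keys named in the interface above; a given dataset may provide only some of them.
INTERFACE_KEYS = ("id", "signer", "video", "depth_video", "fps", "pose", "gloss", "text")


def available_fields(example: Mapping) -> List[str]:
    """List the interface keys present in an example, in interface order."""
    return [key for key in INTERFACE_KEYS if key in example]


def pose_shapes_agree(data_shape: Sequence[int], conf_shape: Sequence[int]) -> bool:
    """Check data is (frames, 1, POINTS, CHANNELS) and conf is (frames, 1, POINTS)."""
    if len(data_shape) != 4 or len(conf_shape) != 3:
        return False
    return data_shape[1] == 1 and tuple(data_shape[:3]) == tuple(conf_shape)


# A stand-in example with only some of the interface fields.
example = {"id": "sample-0", "fps": 25, "text": "hello"}
print(available_fields(example))
print(pose_shapes_agree((10, 1, 5, 3), (10, 1, 5)))
```

With real data, the same helpers could be applied to the dictionaries produced by iterating over a loaded split, for instance after converting it with `tfds.as_numpy`.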

Why not Huggingface Datasets?

Huggingface datasets do not work well with videos: they lack native support for a video type and for arbitrary tensors. Furthermore, they currently have memory leaks that prevent saving even the smallest video datasets.

Cite

@misc{moryossef2021datasets, 
    title={Sign Language Datasets},
    author={Moryossef, Amit},
    howpublished={\url{https://github.com/sign-language-processing/datasets}},
    year={2021}
}