sign-language-processing / datasets

Licence: other

TFDS data loaders for sign language datasets.

Sign Language Datasets

This repository includes TFDS data loaders for sign language datasets.

Installation

From Source

pip install git+https://github.com/sign-language-processing/datasets.git

PyPI

Not yet available. Automatic publication on push still needs to be set up.

Usage

A loading script for every dataset is demonstrated in examples/load.ipynb.

Our config includes the option to choose the resolution and fps, for example:

import tensorflow_datasets as tfds
import sign_language_datasets.datasets  # Imported for its side effect: makes the datasets available to tfds.load
from sign_language_datasets.datasets.config import SignDatasetConfig

# Loading a dataset with default configuration
aslg_pc12 = tfds.load("aslg_pc12")

# Loading a dataset with custom configuration
config = SignDatasetConfig(name="videos_and_poses256x256:12", 
                           version="3.0.0",          # Specific version
                           include_video=True,       # Download and load dataset videos
                           process_video=True,       # Process videos to tensors, or only save path to video
                           fps=12,                   # Load videos at constant, 12 fps
                           resolution=(256, 256),    # Convert videos to a constant resolution, 256x256
                           include_pose="holistic")  # Download and load Holistic pose estimation
rwth_phoenix2014_t = tfds.load(name='rwth_phoenix2014_t', builder_kwargs=dict(config=config))
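
In the custom configuration above, the config name repeats the resolution and fps that are also passed as arguments. One way to keep the two in sync is to derive the name from the settings. The following is a minimal sketch, not part of this repository: `make_config_kwargs` is a hypothetical helper, and the naming pattern is an assumption inferred only from the single example above.

```python
from typing import Any, Dict, Optional, Tuple


def make_config_kwargs(resolution: Tuple[int, int],
                       fps: int,
                       include_pose: Optional[str] = "holistic",
                       version: str = "3.0.0") -> Dict[str, Any]:
    """Build keyword arguments for SignDatasetConfig (hypothetical helper).

    The name pattern "videos_and_poses<A>x<B>:<fps>" is an assumption
    inferred from the example above; it is treated here as a label only.
    """
    return dict(
        name=f"videos_and_poses{resolution[0]}x{resolution[1]}:{fps}",
        version=version,
        include_video=True,    # Download and load dataset videos
        process_video=True,    # Process videos to tensors
        fps=fps,
        resolution=resolution,
        include_pose=include_pose,
    )


kwargs = make_config_kwargs(resolution=(256, 256), fps=12)
print(kwargs["name"])  # videos_and_poses256x256:12
```

The resulting dictionary uses the same argument names as the example above, so it could be passed on as `SignDatasetConfig(**kwargs)`.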

Datasets

Dataset	Videos	Poses	Versions
aslg_pc12	N/A	N/A	0.0.1
rwth_phoenix2014_t	Yes	Holistic	3.0.0
autsl	Yes	OpenPose, Holistic	1.0.0
dgs_corpus	Yes	OpenPose, Holistic	3.0.0
how2sign	Yes	OpenPose	1.0.0
sign2mint	Yes		1.0.0
signtyp	Links		1.0.0
swojs_glossario	Yes		1.0.0
SignBank	N/A		1.0.0
wlasl	Failed	OpenPose	None
msasl			None
Video-Based CSL			None
RVL-SLLL ASL			None

Data Interface

Wherever possible, we use the following interface, so that datasets are easy to swap.

{
    "id": tfds.features.Text(),
    "signer": tfds.features.Text() | tf.int32,
    "video": tfds.features.Video(shape=(None, HEIGHT, WIDTH, 3)),
    "depth_video": tfds.features.Video(shape=(None, HEIGHT, WIDTH, 1)),
    "fps": tf.int32,
    "pose": {
        "data": tfds.features.Tensor(shape=(None, 1, POINTS, CHANNELS), dtype=tf.float32),
        "conf": tfds.features.Tensor(shape=(None, 1, POINTS), dtype=tf.float32)
    },
    "gloss": tfds.features.Text(),
    "text": tfds.features.Text()
}
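
To illustrate how the shapes in this interface relate to each other, here is a minimal sketch in plain Python that works on shape tuples only, so it needs no TensorFlow. Both functions are hypothetical helpers, not part of this repository, and the shapes used in the example are made up.

```python
from typing import Tuple


def check_pose_shapes(data_shape: Tuple[int, ...],
                      conf_shape: Tuple[int, ...]) -> None:
    """Check that pose "data" and "conf" shapes agree with the interface.

    data: (frames, 1, POINTS, CHANNELS); conf: (frames, 1, POINTS).
    """
    if len(data_shape) != 4 or len(conf_shape) != 3:
        raise ValueError("expected 4-D pose data and 3-D pose confidence")
    if data_shape[1] != 1 or conf_shape[1] != 1:
        raise ValueError("second dimension must be 1 in both tensors")
    if data_shape[:3] != conf_shape:
        raise ValueError("conf must match data on its first three dimensions")


def duration_seconds(num_frames: int, fps: int) -> float:
    """Clip length implied by a frame count and the "fps" field."""
    return num_frames / fps


# Made-up shapes for illustration: 24 frames, 543 points, 3 channels.
check_pose_shapes((24, 1, 543, 3), (24, 1, 543))
print(duration_seconds(24, 12))  # 2.0
```

A check like this can be useful when swapping datasets, since it fails early if a loader deviates from the shared layout.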

Why not Huggingface Datasets?

Huggingface Datasets does not work well with videos: it lacks native support for a video type, and it lacks support for arbitrary tensors. Furthermore, it currently has memory leaks that prevent saving even the smallest of video datasets.

Cite

@misc{moryossef2021datasets, 
    title={Sign Language Datasets},
    author={Moryossef, Amit},
    howpublished={\url{https://github.com/sign-language-processing/datasets}},
    year={2021}
}
