imdeepmind / processed-imdb-wiki-dataset

License: MIT
Processed IMDB WIKI dataset, ready to be used in any project

Programming Languages

Jupyter Notebook
Python

Processed IMDB WIKI Dataset

This GitHub repository contains the scripts used to preprocess the IMDB WIKI dataset.

Introduction

The IMDB WIKI dataset is the largest dataset of human faces with gender, name, and age information. In this project, I preprocessed the entire dataset so that it is ready to use in machine learning projects.

IMDB WIKI Dataset

The IMDB WIKI dataset is the largest publicly available dataset of human faces with gender, age, and name labels. It contains more than 500,000 images along with their meta information. All the images are in .jpg format.

For more information about the dataset please visit this website.

The Problem

The dataset is great for research purposes: it contains more than 500,000 images of faces. However, it is not ready to be fed to a machine learning algorithm as is. It has several problems:

  • The images come in different sizes
  • Some of the images are completely corrupted
  • Some images don't contain any faces
  • Some of the ages are invalid
  • The gender distribution is unbalanced (there are more male faces than female faces)
  • The meta information is in .mat format, which is tedious to read in Python
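
The "invalid ages" item can be made concrete with a small sketch. It assumes, which this README does not state, that the metadata stores each date of birth as a MATLAB serial date number and each photo date as a year only. The names `datenum_to_age` and `is_valid_age` and the 0 to 100 range are hypothetical choices for illustration, not the rules used by this project.

```python
import math
from datetime import date

# Assumption: MATLAB serial date numbers start one year (366 days) before
# Python's proleptic ordinals, which count from 0001-01-01.
MATLAB_TO_PYTHON_OFFSET = 366


def datenum_to_age(dob_datenum, photo_taken_year):
    """Return the approximate age in years, or None if the birth date is unusable."""
    if math.isnan(float(dob_datenum)):
        return None
    ordinal = int(dob_datenum) - MATLAB_TO_PYTHON_OFFSET
    try:
        birth = date.fromordinal(ordinal)
    except (ValueError, OverflowError):
        return None  # ordinal outside the range Python can represent
    # Only the year of the photo is known, so assume it was taken mid-year.
    age = photo_taken_year - birth.year
    if birth.month >= 7:
        age -= 1
    return age


def is_valid_age(age, lowest=0, highest=100):
    """Hypothetical validity rule: keep ages inside [lowest, highest]."""
    return age is not None and lowest <= age <= highest


print(datenum_to_age(730486, 2010))  # 730486 is 2000-01-01 in MATLAB; prints 10
```

Rows for which `is_valid_age` returns False would be the ones dropped during filtering.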

The Solution

In this project, I filter out the unusable images, resize the remaining ones to 128x128, remove all images with an invalid age, fix the gender distribution problem, and save the results in a suitable format. I also convert the .mat files to .csv files.
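
This README does not say how the gender distribution is fixed. One common option is to downsample the larger group, which the following sketch illustrates with pandas. The function name `balance_by_gender`, the column names, and the toy rows are all made up for the example.

```python
import pandas as pd


def balance_by_gender(meta, column="gender", seed=0):
    """Downsample every gender group to the size of the smallest one."""
    smallest = meta[column].value_counts().min()
    parts = [
        group.sample(n=smallest, random_state=seed)
        for _, group in meta.groupby(column)
    ]
    # Shuffle so that the groups are not stored back to back.
    return pd.concat(parts).sample(frac=1, random_state=seed).reset_index(drop=True)


# Toy metadata: 7 "male" rows and 3 "female" rows (made-up values).
meta = pd.DataFrame({
    "path": ["img_{}.jpg".format(i) for i in range(10)],
    "gender": ["male"] * 7 + ["female"] * 3,
})
balanced = balance_by_gender(meta)
print(balanced["gender"].value_counts().to_dict())
```

Downsampling throws away data from the majority group. Oversampling or class weights are alternatives when every image matters.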

File Structure

This repository contains 3 files:

  • mat.py
  • gender.py
  • age.py

The first file, mat.py, converts the .mat files of the IMDB and WIKI datasets to .csv format and merges them into one file.
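
As a rough illustration of this conversion, and not the actual contents of mat.py, the sketch below reads a metadata struct with `scipy.io.loadmat` and writes a merged CSV with pandas. The field names in `FIELDS`, the top-level struct keys, and the helper names `mat_to_frame` and `merge_to_csv` are assumptions that should be checked against the real files.

```python
import numpy as np
import pandas as pd
from scipy.io import loadmat

# Assumed field names; check them against the actual .mat files.
FIELDS = ("full_path", "dob", "photo_taken", "gender")


def mat_to_frame(mat_path, key):
    """Load one metadata file and return its fields as DataFrame columns.

    `key` is the name of the top-level struct (assumed "imdb" or "wiki").
    """
    struct = loadmat(mat_path)[key][0, 0]
    columns = {}
    for field in FIELDS:
        values = np.ravel(struct[field])
        if values.dtype == object:
            # Cell arrays of strings load as nested one-element arrays.
            values = [str(np.ravel(v)[0]) for v in values]
        columns[field] = values
    return pd.DataFrame(columns)


def merge_to_csv(sources, csv_path):
    """`sources` is a list of (mat_path, key) pairs; writes one merged CSV."""
    frames = [mat_to_frame(path, key) for path, key in sources]
    merged = pd.concat(frames, ignore_index=True)
    merged.to_csv(csv_path, index=False)
    return merged
```

A call might look like `merge_to_csv([("imdb_crop/imdb.mat", "imdb"), ("wiki_crop/wiki.mat", "wiki")], "meta.csv")`, where the paths and the output name are again assumptions.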

The last two files process the images for gender and age classification, respectively.

As the dataset is huge, I cannot upload it here on GitHub.

How to Run Locally

Follow these steps to run it locally:

  • Download the dataset from this link
  • Extract the archives and place the extracted folders in the project directory
  • After that, you should have the following folders:
    • imdb_crop
    • wiki_crop
  • Run mat.py
  • Run age.py and gender.py
  • Now the dataset is preprocessed and ready for your project
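
The steps above can be wrapped in a small helper script. The sketch below assumes that the three scripts take no command-line arguments and are run from the project directory. The README does not specify an order between age.py and gender.py, so the order used here is arbitrary. The names `missing_folders` and `run_pipeline` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

EXPECTED_FOLDERS = ("imdb_crop", "wiki_crop")
SCRIPTS_IN_ORDER = ("mat.py", "age.py", "gender.py")


def missing_folders(project_dir):
    """Return the expected dataset folders that are absent from project_dir."""
    root = Path(project_dir)
    return [name for name in EXPECTED_FOLDERS if not (root / name).is_dir()]


def run_pipeline(project_dir):
    """Run the three scripts in order, stopping at the first failure."""
    missing = missing_folders(project_dir)
    if missing:
        raise FileNotFoundError("missing dataset folders: " + ", ".join(missing))
    for script in SCRIPTS_IN_ORDER:
        # check=True raises CalledProcessError if a script exits with an error.
        subprocess.run([sys.executable, script], cwd=str(project_dir), check=True)
```

Calling `run_pipeline(".")` from the project directory would then fail early with a clear message if the dataset has not been extracted yet.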

Dependencies

  • NumPy 1.15.4
  • SciPy 1.2.0
  • pandas 0.23.4
  • OpenCV (cv2) 4.0.0

Acknowledgments

I am really thankful to these people for providing this amazing dataset.
