EB-Dodo / C Ms Celeb

Licence: gpl-3.0
A clean version (wash list) of MS-Celeb-1M face dataset, containing 6,464,018 face images of 94,682 celebrities

Projects that are alternatives of or similar to C Ms Celeb

Facerecognition guide
This is a guide to face recognition with Python, GNU Octave/MATLAB and OpenCV2 C++. Eigenfaces and Fisherfaces are explained in detail and implemented.
Stars: ✭ 188 (-17.18%)
Mutual labels:  face-recognition
Ros people object detection tensorflow
An extensive ROS toolbox for object detection & tracking and face/action recognition with 2D and 3D support which makes your Robot understand the environment
Stars: ✭ 202 (-11.01%)
Mutual labels:  face-recognition
Face recognition py
Video face recognition based on OpenCV
Stars: ✭ 215 (-5.29%)
Mutual labels:  face-recognition
Facerecognition
Webcam face recognition using tensorflow and opencv
Stars: ✭ 192 (-15.42%)
Mutual labels:  face-recognition
Marvel
Marvel - Face Recognition With Android & OpenCV
Stars: ✭ 199 (-12.33%)
Mutual labels:  face-recognition
Esp32 Cam Webserver
Expanded version of the Espressif ESP webcam
Stars: ✭ 200 (-11.89%)
Mutual labels:  face-recognition
Hms Ml Demo
HMS ML Demo provides an example of integrating Huawei ML Kit service into applications. This example demonstrates how to integrate services provided by ML Kit, such as face detection, text recognition, image segmentation, asr, and tts.
Stars: ✭ 187 (-17.62%)
Mutual labels:  face-recognition
Faceimagequality
Code and information for face image quality assessment with SER-FIQ
Stars: ✭ 223 (-1.76%)
Mutual labels:  face-recognition
Arcface Multiplex Recognition
A face recognition identity authentication system for complex scenes
Stars: ✭ 200 (-11.89%)
Mutual labels:  face-recognition
Howdy
🛡️ Windows Hello™ style facial authentication for Linux
Stars: ✭ 3,237 (+1325.99%)
Mutual labels:  face-recognition
Face Nn
Game character face customization: generating realistic faces with a neural style transfer framework
Stars: ✭ 192 (-15.42%)
Mutual labels:  face-recognition
Mediadevices
Go implementation of the MediaDevices API.
Stars: ✭ 197 (-13.22%)
Mutual labels:  face-recognition
Face.evolve.pytorch
🔥🔥High-Performance Face Recognition Library on PaddlePaddle & PyTorch🔥🔥
Stars: ✭ 2,719 (+1097.8%)
Mutual labels:  face-recognition
Tf Insightface
A better tensorflow implementation of deepinsight, aiming at smoothly production ready for cross-platforms. Currently only with inference, training code later.
Stars: ✭ 191 (-15.86%)
Mutual labels:  face-recognition
Maskinsightface
Face recognition based on extraction of key facial regions (LFW: 99.82%+, CFP_FP: 98.50%+, AgeDB30: 98.25%+)
Stars: ✭ 221 (-2.64%)
Mutual labels:  face-recognition
Mobile Id
Deep Face Model Compression
Stars: ✭ 187 (-17.62%)
Mutual labels:  face-recognition
Facerecognition
Implement face recognition using PCA, LDA and LPP
Stars: ✭ 206 (-9.25%)
Mutual labels:  face-recognition
Insightface Tensorflow
TensorFlow implementation of InsightFace (ArcFace: Additive Angular Margin Loss for Deep Face Recognition).
Stars: ✭ 228 (+0.44%)
Mutual labels:  face-recognition
Ownphotos
Self hosted alternative to Google Photos
Stars: ✭ 2,587 (+1039.65%)
Mutual labels:  face-recognition
Mobilefacenet pytorch
MobileFaceNets: Efficient CNNs for Accurate Real-Time Face Verification on Mobile Devices
Stars: ✭ 209 (-7.93%)
Mutual labels:  face-recognition

C-MS-Celeb

This is a clean version of the MS-Celeb-1M face dataset, containing 6,464,018 images of 94,682 celebrities. Because the original MS-Celeb-1M contains too many mislabeled images, we cleaned the dataset to support better model training.

Many thanks to ha1990-12; the original MS-Celeb-1M dataset is available here.

The paper describing our cleaning work, "A Community Detection Approach to Cleaning Extremely Large Face Database", can be found here.

Data overview

Our C-MS-Celeb cleaned dataset has 6,464,018 images belonging to 94,682 celebrities. The table below compares ours with other publicly available cleaned MS-Celeb datasets:

| Dataset | Celebrities | Images |
| --- | --- | --- |
| Original Dataset | 99,892 | 8,456,240 |
| XiangWu's Cleaned Dataset | 79,099 | 5,049,824 |
| MS-Celeb-1M WashList Cleaned Dataset | 78,579 | 4,621,640 |
| C-MS-Celeb Cleaned Dataset | 94,682 | 6,464,018 |

Our C-MS-Celeb is large, clean and diverse.

Large

First, as the table shows, C-MS-Celeb preserves more celebrities and more images than the other cleaned lists.
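To put the comparison in relative terms, the retention rate of each cleaned list against the original dataset can be computed directly from the table. The sketch below only restates the table's counts; the names `ORIGINAL`, `CLEANED` and `retention` are illustrative and not part of the dataset.

```python
# Counts copied from the comparison table: (celebrities, images).
ORIGINAL = (99_892, 8_456_240)
CLEANED = {
    "XiangWu's Cleaned Dataset": (79_099, 5_049_824),
    "MS-Celeb-1M WashList Cleaned Dataset": (78_579, 4_621_640),
    "C-MS-Celeb Cleaned Dataset": (94_682, 6_464_018),
}


def retention(cleaned, original=ORIGINAL):
    """Return the fractions of celebrities and images kept after cleaning."""
    return cleaned[0] / original[0], cleaned[1] / original[1]


if __name__ == "__main__":
    for name, counts in CLEANED.items():
        people, images = retention(counts)
        print(f"{name}: {people:.1%} of celebrities, {images:.1%} of images")
```

By this measure C-MS-Celeb keeps roughly 95% of the celebrities and 76% of the images in the original dataset.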

Clean

Second, based on our empirical evaluation, approximately 97.3% of the images in C-MS-Celeb are correctly labeled.

Diverse

Third, our community detection based cleaning method can also preserve the diversity of facial images for each individual. Here are some sample images from "Lady Gaga" and "Quinn Cummings" in our cleaning result:

These samples show that images with diverse makeup are preserved during cleaning (Lady Gaga, left half), as are images of the same person at different ages (Quinn Cummings, right half).

Our cleaning method based on community detection

We develop a community detection based pipeline to clean the noisy MS-Celeb-1M face dataset. As the diversity of faces is preserved in multiple large communities, our cleaning results have both high cleanness and rich data diversity. More details can be found in our paper here.

The picture below shows the images of Phil Upchurch before and after our cleaning

Images with red squares on the left are mislabeled images in the MS-Celeb-1M face dataset, and the images on the right are our cleaning results. Again, images of Phil Upchurch at different ages are preserved during cleaning.

The diagram below illustrates our community detection based cleaning method. We first construct a face similarity graph using pre-trained face recognition models. Each node in the similarity graph represents one image and the weight of the link between two nodes quantifies the similarity between these two images. Then we remove the weak links and run the community detection algorithm on this graph. Finally, we preserve the images in the large communities (colored communities on the right in this diagram) and remove the scattered nodes and minor communities (grey nodes in the diagram). Thus, we are able to achieve both high cleanness and rich data diversity during the data cleaning.
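The steps above can be sketched for a single identity as follows. This is a minimal illustration under stated assumptions, not the pipeline from the paper: pairwise similarity scores are assumed to be precomputed, weighted label propagation is swapped in as a simple stand-in for the community detection algorithm, and the names `clean_identity`, `detect_communities`, `weak_threshold` and `min_community_size` are hypothetical.

```python
from collections import defaultdict


def detect_communities(nodes, neighbors, max_iters=100):
    """Weighted label propagation (a stand-in community detection method).

    Each node repeatedly adopts the label with the largest total link weight
    among its neighbours; ties are broken deterministically.
    """
    labels = {node: node for node in nodes}
    for _ in range(max_iters):
        changed = False
        for node in sorted(nodes):
            if not neighbors.get(node):
                continue  # a scattered node keeps its own label
            votes = defaultdict(float)
            for other, weight in neighbors[node].items():
                votes[labels[other]] += weight
            top = max(votes.values())
            if votes.get(labels[node], 0.0) == top:
                continue  # current label is already among the strongest
            labels[node] = min(l for l, w in votes.items() if w == top)
            changed = True
        if not changed:
            break
    communities = defaultdict(set)
    for node, label in labels.items():
        communities[label].add(node)
    return list(communities.values())


def clean_identity(similarities, weak_threshold, min_community_size):
    """Return the images of one identity that survive cleaning.

    similarities maps (image_a, image_b) pairs to a similarity score.
    """
    nodes = set()
    neighbors = defaultdict(dict)
    for (a, b), score in similarities.items():
        nodes.update((a, b))
        if score >= weak_threshold:  # remove weak links
            neighbors[a][b] = score
            neighbors[b][a] = score
    kept = set()
    for community in detect_communities(nodes, neighbors):
        if len(community) >= min_community_size:  # keep large communities only
            kept |= community
    return kept


if __name__ == "__main__":
    # Hypothetical scores: two tight groups joined by a weak link, plus noise.
    sims = {
        ("a1", "a2"): 0.9, ("a1", "a3"): 0.9, ("a2", "a3"): 0.9,
        ("b1", "b2"): 0.9, ("b1", "b3"): 0.9, ("b2", "b3"): 0.9,
        ("a3", "b1"): 0.3, ("noise", "a1"): 0.2,
    }
    print(sorted(clean_identity(sims, weak_threshold=0.5, min_community_size=3)))
```

Keeping every sufficiently large community, rather than only the largest one, is what lets distinct appearances of the same person (different ages, different makeup) survive as separate clusters.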

Benefits of using C-MS-Celeb to train a face recognition model

We trained a face recognition model on C-MS-Celeb, and the image below shows that training on C-MS-Celeb improves the model's performance. See our paper for details.

How to use C-MS-Celeb

C-MS-Celeb has two TXT files in clean_list.7z: "clean_list_128Vec_WT051_P010.txt" and "relabel_list_128Vec_T058.txt", which are the cleaned lists of facial images.

"clean_list_128Vec_WT051_P010.txt" contains the paths of all cleaning results from Stage 2, and "relabel_list_128Vec_T058.txt" contains the paths of all relabeling results from Stage 3 (see our paper for details on both stages). In both files, the first column is the identity label of the image and the second column is the path of the image file.
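A minimal sketch of reading this two-column layout is shown below. It assumes the columns are separated by whitespace, which the description above does not state; the function names and the sample labels and paths are made up for illustration.

```python
from collections import Counter


def parse_label_list(lines):
    """Yield (identity_label, image_path) pairs from a list file."""
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumption: label and path are separated by whitespace.
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"line {number}: expected two columns, got {line!r}")
        yield parts[0], parts[1]


def images_per_identity(lines):
    """Count how many listed images belong to each identity label."""
    return Counter(label for label, _ in parse_label_list(lines))


if __name__ == "__main__":
    # Hypothetical rows; real files would be read with
    # open("clean_list_128Vec_WT051_P010.txt", encoding="utf-8").
    sample = [
        "id_A id_A/0001.jpg",
        "id_A id_A/0002.jpg",
        "id_B id_B/0001.jpg",
    ]
    counts = images_per_identity(sample)
    print(len(counts), "identities,", sum(counts.values()), "images")
```

Counting identities and images this way gives a quick sanity check that a list was extracted and read completely.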

Note that C-MS-Celeb is only the cleaned label list. To use the dataset, first download all images of the MS-Celeb-1M dataset, then filter out the noisy (mislabeled) images according to the paths in C-MS-Celeb's TXT files. You may need to combine the two TXT files into one before filtering. The raw MS-Celeb-1M dataset can be downloaded from this website: https://www.microsoft.com/en-us/research/project/ms-celeb-1m-challenge-recognizing-one-million-celebrities-real-world/
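The combine-then-filter step might look like the sketch below. Several points are assumptions rather than facts about the dataset: the two columns are whitespace-separated, the paths in the lists match the paths of the downloaded images exactly, and when a path appears in both lists the relabel list takes precedence. All function names and sample values are hypothetical.

```python
def load_index(lines):
    """Map image path -> identity label for one list file."""
    index = {}
    for line in lines:
        parts = line.split(maxsplit=1)  # assumption: whitespace-separated
        if len(parts) == 2:
            label, path = parts
            index[path.strip()] = label
    return index


def combine_lists(clean_lines, relabel_lines):
    """Merge the Stage 2 and Stage 3 lists into a single index."""
    index = load_index(clean_lines)
    # Assumption: a relabeled entry overrides a clean-list entry for the same path.
    index.update(load_index(relabel_lines))
    return index


def filter_images(all_paths, index):
    """Keep only listed images, paired with their cleaned labels."""
    return [(index[path], path) for path in all_paths if path in index]


if __name__ == "__main__":
    clean = ["id_A id_A/0001.jpg", "id_A id_A/0002.jpg"]
    relabel = ["id_B id_C/0007.jpg"]  # image stored under id_C, relabeled as id_B
    downloaded = ["id_A/0001.jpg", "id_A/0002.jpg", "id_A/0003.jpg", "id_C/0007.jpg"]
    for label, path in filter_images(downloaded, combine_lists(clean, relabel)):
        print(label, path)
```

Images that appear in neither list, such as `id_A/0003.jpg` in the sample, are treated as noisy and dropped.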

Citation information

If you use this dataset, please cite our paper as below:

Chi Jin, Ruochun Jin, Kai Chen, and Yong Dou, “A Community Detection Approach to Cleaning Extremely Large Face Database,” Computational Intelligence and Neuroscience, vol. 2018, Article ID 4512473, 10 pages, 2018. doi:10.1155/2018/4512473

@article{jin2018community,
  title={A community detection approach to cleaning extremely large face database},
  author={Jin, Chi and Jin, Ruochun and Chen, Kai and Dou, Yong},
  journal={Computational intelligence and neuroscience},
  volume={2018},
  year={2018},
  publisher={Hindawi}
}

The link of our paper "A Community Detection Approach to Cleaning Extremely Large Face Database" is: https://www.hindawi.com/journals/cin/2018/4512473/

