
ellisbrown / name2gender

Licence: other
Extrapolate gender from first names using Naïve-Bayes and PyTorch Char-RNN

Programming Languages

python

Projects that are alternatives of or similar to name2gender

Text-Generate-RNN
Chinese classical poetry generation (text generation)
Stars: ✭ 106 (+341.67%)
Mutual labels:  rnn, char-rnn
neural-namer
Fantasy name generator in TensorFlow
Stars: ✭ 65 (+170.83%)
Mutual labels:  rnn, char-rnn
UHV-OTS-Speech
A data annotation pipeline to generate high-quality, large-scale speech datasets with machine pre-labeling and fully manual auditing.
Stars: ✭ 94 (+291.67%)
Mutual labels:  gender-classification
wod-generator
Crossfit WOD generator using recurrent neural networks (LSTM), implemented in Tensorflow
Stars: ✭ 24 (+0%)
Mutual labels:  char-rnn
nemesyst
Generalised and highly customisable, hybrid-parallelism, database based, deep learning framework.
Stars: ✭ 17 (-29.17%)
Mutual labels:  rnn
Market-Trend-Prediction
This is a project for a knowledge-graph-building course. The project leverages historical stock prices and integrates social media listening from customers to predict market trends on the Dow Jones Industrial Average (DJIA).
Stars: ✭ 57 (+137.5%)
Mutual labels:  rnn
training-charRNN
Training a charRNN model for ml5js
Stars: ✭ 87 (+262.5%)
Mutual labels:  rnn
Selected Stories
An experimental web text editor that runs an LSTM model while you write to suggest new lines
Stars: ✭ 39 (+62.5%)
Mutual labels:  rnn
tf-attend-infer-repeat
TensorFlow-based implementation of "Attend, Infer, Repeat" paper (Eslami et al., 2016, arXiv:1603.08575).
Stars: ✭ 44 (+83.33%)
Mutual labels:  rnn
Predicting-Next-Character-using-RNN
Uses an RNN on the Nietzsche dataset
Stars: ✭ 15 (-37.5%)
Mutual labels:  rnn
keras-utility-layer-collection
Collection of custom layers and utility functions for Keras which are missing in the main framework.
Stars: ✭ 63 (+162.5%)
Mutual labels:  rnn
TCN-TF
TensorFlow Implementation of TCN (Temporal Convolutional Networks)
Stars: ✭ 107 (+345.83%)
Mutual labels:  rnn
rnn-theano
RNNs (LSTM, GRU) in Theano with mini-batch training; character-level language models in Theano
Stars: ✭ 68 (+183.33%)
Mutual labels:  rnn
ACT
Alternative approach for Adaptive Computation Time in TensorFlow
Stars: ✭ 16 (-33.33%)
Mutual labels:  rnn
presidential-rnn
Project 4 for the Metis bootcamp. The objective was to build a character-level RNN trained on Donald Trump's statements using Keras. Also generated Markov chains and a quick PyTorch RNN as a baseline. Attempted a semi-supervised GAN, but was unable to test it in time.
Stars: ✭ 26 (+8.33%)
Mutual labels:  rnn
char-rnn
medium.com/@jctestud/yet-another-text-generation-project-5cfb59b26255
Stars: ✭ 20 (-16.67%)
Mutual labels:  char-rnn
machine learning
Hands-on machine learning, deep learning, and NLP projects
Stars: ✭ 123 (+412.5%)
Mutual labels:  rnn
Motor-Imagery-Tasks-Classification-using-EEG-data
Implementation of Deep Neural Networks in Keras and Tensorflow to classify motor imagery tasks using EEG data
Stars: ✭ 67 (+179.17%)
Mutual labels:  rnn
DiseaseClassifier
Uses a Naive Bayes classifier to infer possible diseases from symptoms
Stars: ✭ 23 (-4.17%)
Mutual labels:  naive-bayes-classifier
MetaTraderForecast
RNN based Forecasting App for Meta Trader and similar trading platforms
Stars: ✭ 103 (+329.17%)
Mutual labels:  rnn

Name2Gender

Using character sequences in first names to predict gender. This is a quick exploration of an interesting problem; see my Medium post, where I elaborate on why it is interesting: https://medium.com/@ellisbrown/name2gender-introduction-626d89378fb0.

I have implemented a Naïve-Bayes approach and a Char-RNN approach, each contained in its own subdirectory.

Table of Contents

Naïve-Bayes /naive_bayes

In this approach, I defined features of first names (last two letters, count of vowels, etc.) from which to learn the genders. I explain this in more detail in my blog post and in the /naive_bayes subdirectory.
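To make the idea concrete, here is a minimal categorical Naïve-Bayes classifier over features of the kind described above. It is a sketch, not the code in /naive_bayes: the exact feature set (last letter, last two letters, and vowel count with "aeiou" as vowels), the class name `NaiveBayesNameClassifier`, the Laplace smoothing constant, and the toy training names are all hypothetical.

```python
import math
from collections import Counter, defaultdict

VOWELS = set("aeiou")  # assumption: "y" is not counted as a vowel


def name_features(name):
    """Hypothetical hand-defined features of a first name."""
    name = name.lower()
    return {
        "last_letter": name[-1:],
        "last_two": name[-2:],
        "vowel_count": sum(ch in VOWELS for ch in name),
    }


class NaiveBayesNameClassifier:
    """Categorical Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, names, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        # counts[label][feature][value] = occurrences in the training set
        self.counts = defaultdict(lambda: defaultdict(Counter))
        self.values = defaultdict(set)  # distinct values seen per feature
        for name, label in zip(names, labels):
            for feat, val in name_features(name).items():
                self.counts[label][feat][val] += 1
                self.values[feat].add(val)
        return self

    def log_posterior(self, name):
        """Unnormalized log P(label) + sum of log P(feature value | label)."""
        scores = {}
        for label, n_label in self.class_counts.items():
            score = math.log(n_label / self.total)
            for feat, val in name_features(name).items():
                count = self.counts[label][feat][val]
                # The "+ 1" reserves probability mass for unseen values.
                denom = n_label + self.alpha * (len(self.values[feat]) + 1)
                score += math.log((count + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, name):
        scores = self.log_posterior(name)
        return max(scores, key=scores.get)


names = ["anna", "maria", "julia", "sophia", "emma",
         "john", "mark", "david", "peter", "robert"]
labels = ["F"] * 5 + ["M"] * 5
clf = NaiveBayesNameClassifier().fit(names, labels)
print(clf.predict("olivia"))  # → F
print(clf.predict("kevin"))   # → M
```

Working in log space avoids underflow when many per-feature probabilities are multiplied, and smoothing keeps a single unseen suffix from zeroing out a class.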

Char-RNN /rnn

In this second approach, I feed the characters of a name one by one through a character-level recurrent neural network built in PyTorch, in the hope that it learns the latent space of character sequences that denote gender without my having to define them a priori. I explain this in more detail in my blog post and in the /rnn subdirectory.
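The character-by-character recurrence can be illustrated without any dependencies. The sketch below is an Elman-style forward pass, not the project's PyTorch model: the lowercase-only vocabulary, the hidden size, the two-class output, and the names `TinyCharRNN`, `one_hot`, and `matvec` are hypothetical, and the weights are random and untrained, so its predictions are meaningless until trained. Each step computes h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), the same form of recurrence that `torch.nn.RNN` computes with its default tanh nonlinearity.

```python
import math
import random
import string

# Assumed vocabulary: lowercase ASCII letters only (hypothetical; real
# name data may contain accents, hyphens, or apostrophes).
ALPHABET = string.ascii_lowercase
CHAR_TO_IDX = {ch: i for i, ch in enumerate(ALPHABET)}


def one_hot(ch):
    """Encode a single character as a one-hot vector over ALPHABET."""
    vec = [0.0] * len(ALPHABET)
    vec[CHAR_TO_IDX[ch]] = 1.0
    return vec


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


class TinyCharRNN:
    """Hypothetical Elman RNN that reads a name one character at a time."""

    def __init__(self, hidden_size=8, num_classes=2, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            # Small random weights; an untrained model predicts arbitrarily.
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        self.W_xh = init(hidden_size, len(ALPHABET))
        self.W_hh = init(hidden_size, hidden_size)
        self.b_h = [0.0] * hidden_size
        self.W_hy = init(num_classes, hidden_size)
        self.b_y = [0.0] * num_classes

    def forward(self, name):
        """Return class probabilities after consuming every character of `name`."""
        h = [0.0] * self.hidden_size  # initial hidden state h_0
        for ch in name.lower():
            if ch not in CHAR_TO_IDX:
                continue  # skip characters outside the assumed vocabulary
            x = one_hot(ch)
            # h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)
            pre = [a + b + c for a, b, c in
                   zip(matvec(self.W_xh, x), matvec(self.W_hh, h), self.b_h)]
            h = [math.tanh(z) for z in pre]
        # Final hidden state -> class logits -> softmax probabilities
        logits = [z + b for z, b in zip(matvec(self.W_hy, h), self.b_y)]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - peak) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


model = TinyCharRNN(seed=0)
print(model.forward("Maria"))  # two probabilities summing to 1 (untrained)
```

Only the final hidden state feeds the classifier, so the network must carry whatever it learns about suffixes and letter patterns forward through the sequence. Training would adjust the weights by backpropagation through time, which PyTorch's autograd handles in the real project.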

Dataset /data

I have aggregated multiple smaller datasets representing various cultures into a large dataset (~135k instances) of gender-labeled first names. See data/dataset.ipynb for further information on how I pulled it together. Note: I did not spend a ton of time going through and pruning this dataset, so it is probably not amazing or particularly clean (I would greatly appreciate any PRs if anyone cares or has the time!).
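As an illustration of the kind of merging step involved (not a transcription of data/dataset.ipynb), the sketch below combines several (name, gender) lists, normalizes case and whitespace, and resolves conflicts by label agreement. The function name `merge_name_datasets`, the `min_agreement` threshold, and the policy of dropping names whose sources disagree are all assumptions; note that dropping them also discards genuinely unisex names, which may or may not be what you want.

```python
from collections import Counter, defaultdict


def merge_name_datasets(*datasets, min_agreement=1.0):
    """Merge (name, gender) lists into one {name: gender} mapping.

    A name is kept only if its most common label accounts for at least
    `min_agreement` of all its labels across sources (hypothetical policy).
    """
    votes = defaultdict(Counter)
    for dataset in datasets:
        for name, gender in dataset:
            key = name.strip().lower()  # normalize so "Anna " == "anna"
            if key:
                votes[key][gender.strip().upper()] += 1
    merged = {}
    for name, counter in votes.items():
        label, count = counter.most_common(1)[0]
        if count / sum(counter.values()) >= min_agreement:
            merged[name] = label
    return merged


source_a = [("Anna", "F"), ("John", "M"), ("Jordan", "M")]
source_b = [("anna ", "f"), ("Jordan", "F"), ("Priya", "F")]
print(merge_name_datasets(source_a, source_b))
# → {'anna': 'F', 'john': 'M', 'priya': 'F'}  ("jordan" is dropped: sources disagree)
```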

Acknowledgement

Below are a bunch of links I found useful:
