kensk8er / Chicksexer

Licence: mit
A Python package for gender classification.

Language: Python


chicksexer - Python package for gender classification

Chicksexer

chicksexer is a Python package that performs gender classification. It takes a person's name as a string and returns a probability estimate of the name's gender, as follows:

>>> from chicksexer import predict_gender
>>> predict_gender('John Smith')
{'female': 0.0027230381965637207, 'male': 0.9972769618034363}

Using the classifier instead of simply looking up known male/female names has several advantages:

  • Simple name lookup sometimes fails. For instance, "Miki" is a Japanese female name, but also a Croatian male name.
  • The classifier can predict the gender of a name that does not appear in any list of male/female names.
  • The classifier can cope with a typo in a name relatively easily.
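
To illustrate the last point, here is a minimal sketch of a plain lookup table. The table and its entries are hypothetical and are not taken from chicksexer. An exact-match lookup has no answer for a misspelled name, whereas the classifier still gives a confident estimate for 'Jesssica' in the examples further down.

```python
# Hypothetical lookup table; the entries are illustrative only.
KNOWN_NAMES = {'jessica': 'female', 'john': 'male'}

def lookup_gender(first_name):
    """Return the listed gender for an exact match, or None if the name is unknown."""
    return KNOWN_NAMES.get(first_name.lower())

print(lookup_gender('Jessica'))   # female
print(lookup_gender('Jesssica'))  # None (the typo is not in the table)
```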

You can also get the estimate as a simple string by passing return_proba=False:

>>> predict_gender('Oliver Butterfield', return_proba=False)
'male'
>>> predict_gender('Naila Ata', return_proba=False)
'female'
>>> predict_gender('Saldivar Anderson', return_proba=False)
'neutral'
>>> predict_gender('Ponyo', return_proba=False)  # name of a character from the film
'male'
>>> predict_gender('Ponya', return_proba=False)  # modify the name such that it sounds like a female name
'female'
>>> predict_gender('Miki Suzuki', return_proba=True)  # Suzuki here is a Japanese surname so Miki is a female name
{'female': 0.9997618066990981, 'male': 0.00023819330090191215}
>>> predict_gender('Miki Adamić', return_proba=True)  # Adamić is a Croatian surname so Miki is a male name
{'female': 0.16958969831466675, 'male': 0.8304103016853333}
>>> predict_gender('Jessica')
{'female': 0.999996105068476, 'male': 3.894931523973355e-06}
>>> predict_gender('Jesssica')  # typo in Jessica
{'female': 0.9999851534785194, 'male': 1.4846521480649244e-05}
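
If you want to apply your own cut-off to the probability output, something like the following sketch may help. The function name label_from_proba and the neutral_margin value are assumptions made for illustration; the rule chicksexer itself uses to return 'neutral' is not described here and may differ.

```python
def label_from_proba(proba, neutral_margin=0.2):
    """Map a {'female': p, 'male': q} dict to 'female', 'male' or 'neutral'.

    neutral_margin is a hypothetical threshold: if the two probabilities
    differ by less than this amount, the name is treated as neutral.
    """
    diff = proba['male'] - proba['female']
    if abs(diff) < neutral_margin:
        return 'neutral'
    return 'male' if diff > 0 else 'female'

# Probabilities copied from the examples above.
print(label_from_proba({'female': 0.0027230381965637207, 'male': 0.9972769618034363}))  # male
print(label_from_proba({'female': 0.999996105068476, 'male': 3.894931523973355e-06}))   # female
```

A wider margin labels more names as 'neutral', which can be useful when a wrong guess is costlier than no guess.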

If you want to predict the gender of multiple names, use the predict_genders (plural) function instead:

>>> from chicksexer import predict_genders
>>> predict_genders(['Ichiro Suzuki', 'Haruki Murakami'])
[{'female': 3.039836883544922e-05, 'male': 0.9999696016311646},
 {'female': 1.2040138244628906e-05, 'male': 0.9999879598617554}]
>>> predict_genders(['Ichiro Suzuki', 'Haruki Murakami'], return_proba=False)
['male', 'male']
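
When a list contains many repeated names, it may be worth predicting each distinct name only once. The helper below is a sketch of that idea, not part of chicksexer: predict_unique is a hypothetical name, and predict_fn stands for any function that takes a list of names and returns one result per name, in order.

```python
def predict_unique(names, predict_fn):
    """Call predict_fn once on the distinct names and map results back to the input order."""
    unique = list(dict.fromkeys(names))   # de-duplicate while preserving order
    results = predict_fn(unique)          # one result per distinct name
    by_name = dict(zip(unique, results))
    return [by_name[name] for name in names]

# Stand-in predictor so the sketch runs without chicksexer installed.
def fake_predict(names):
    return ['male' for _ in names]

print(predict_unique(['Ichiro Suzuki', 'Haruki Murakami', 'Ichiro Suzuki'], fake_predict))
# ['male', 'male', 'male']
```

With chicksexer installed, you could pass lambda ns: predict_genders(ns, return_proba=False) as predict_fn.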

Installation

  • This repository can run on Ubuntu 14.04 LTS & Mac OSX 10.x (not tested on other OSs)
  • Tested only on Python 3.5

chicksexer depends on NumPy and SciPy, Python packages for scientific computing. You might need to install them before installing chicksexer.

You can install chicksexer with:

pip install chicksexer

chicksexer also depends on the tensorflow package. By default, it tries to install the CPU-only version of tensorflow. If you want to use a GPU, you need to install tensorflow with GPU support yourself (cf. Installing TensorFlow).
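
Before or after installing, you may want to check which of the dependencies mentioned above are present in your environment. The following is a small sketch using only the standard library. The helper name missing_dependencies is hypothetical, and the check only tells you whether a package can be found, not which version or build (CPU or GPU) it is.

```python
import importlib.util

def missing_dependencies(packages=('numpy', 'scipy', 'tensorflow')):
    """Return the names of packages that cannot be found in the current environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]

# An empty list means every listed package was found.
print(missing_dependencies())
```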

Model Architecture

The gender classifier is implemented as a character-level multilayer LSTM. The architecture is roughly as follows:

  1. Character Embedding Layer
  2. 1st LSTM Layer
  3. 2nd LSTM Layer
  4. Pooling Layer
  5. Fully Connected Layer

The fully connected layer outputs the probability of a name being a male name. For details, see the _build_graph() method in chicksexer/_classifier.py, which implements the computational graph of the architecture in tensorflow.
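
The last two layers can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the code in _build_graph(): mean pooling, a single sigmoid output unit, and all numeric values are made up, and the real model may use a different pooling operation and layer sizes. The only fact taken from the description above is that the output is the probability of a male name, so the female probability is its complement.

```python
import math

def mean_pool(hidden_states):
    """Average a sequence of hidden vectors (one per character) into a single vector."""
    n_steps = len(hidden_states)
    return [sum(column) / n_steps for column in zip(*hidden_states)]

def fully_connected(vector, weights, bias):
    """Single-output dense layer followed by a sigmoid, giving P(male)."""
    logit = sum(v * w for v, w in zip(vector, weights)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

def to_gender_proba(p_male):
    """Turn P(male) into the two-key dict format that predict_gender returns."""
    return {'female': 1.0 - p_male, 'male': p_male}

# Toy LSTM outputs for a three-character name with hidden size 2 (made-up values).
hidden_states = [[0.2, -0.4], [0.6, 0.0], [0.1, 0.1]]
weights, bias = [1.5, -2.0], 0.1

pooled = mean_pool(hidden_states)
p_male = fully_connected(pooled, weights, bias)
result = to_gender_proba(p_male)
print(result)
```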

Training Data

Names with gender annotations were obtained from the following sources:
