
Reasonence / Lingo

Licence: other

Infer the gender of an individual based on their name.

Programming Languages

139335 projects - #7 most used programming language

Labels

machine-learning statistics bayesian-inference

Projects that are alternatives of or similar to Lingo

UAI 2015. Kernel-based just-in-time learning for expectation propagation

Stars: ✭ 16 (+6.67%)

Mutual labels: bayesian-inference

Project to translate the Stan manual into Japanese

Stars: ✭ 53 (+253.33%)

Mutual labels: bayesian-inference

artificial neural networks

A collection of Methods and Models for various architectures of Artificial Neural Networks

Stars: ✭ 40 (+166.67%)

Mutual labels: bayesian-inference

TrendinessOfTrends

The Trendiness of Trends

Stars: ✭ 14 (-6.67%)

Mutual labels: bayesian-inference

DynamicHMCExamples.jl

Examples for Bayesian inference using DynamicHMC.jl and related packages.

Stars: ✭ 33 (+120%)

Mutual labels: bayesian-inference

NestedSamplers.jl

Implementations of single and multi-ellipsoid nested sampling

Stars: ✭ 32 (+113.33%)

Mutual labels: bayesian-inference

Pure julia implementation of Multiple Affine Invariant Sampling for efficient Approximate Bayesian Computation

Stars: ✭ 28 (+86.67%)

Mutual labels: bayesian-inference

Decision Analysis Course

🎓 Uni-Bonn Decision Analysis graduate course, lectures and materials

Stars: ✭ 17 (+13.33%)

Mutual labels: bayesian-inference

Parallel nested sampling

Stars: ✭ 21 (+40%)

Mutual labels: bayesian-inference

bayesian-stats-with-R

Material for a workshop on Bayesian stats with R

Stars: ✭ 55 (+266.67%)

Mutual labels: bayesian-inference

LogDensityProblems.jl

A common framework for implementing and using log densities for inference.

Stars: ✭ 26 (+73.33%)

Mutual labels: bayesian-inference

Julia package for automatic Bayesian inference on a factor graph with reactive message passing

Stars: ✭ 58 (+286.67%)

Mutual labels: bayesian-inference

autoencoders tensorflow

Automatic feature engineering using deep learning and Bayesian inference using TensorFlow.

Stars: ✭ 66 (+340%)

Mutual labels: bayesian-inference

A Latent Dirichlet Allocation implementation in Python.

Stars: ✭ 51 (+240%)

Mutual labels: bayesian-inference

Deodorant: Solving the problems of Bayesian Optimization

Stars: ✭ 15 (+0%)

Mutual labels: bayesian-inference

Bayesian tools for fitting molecular mechanics torsion parameters to quantum chemical data.

Stars: ✭ 15 (+0%)

Mutual labels: bayesian-inference

Bayesian inference for Gaussian mixture model with some novel algorithms

Stars: ✭ 51 (+240%)

Mutual labels: bayesian-inference

Implementation of normalising flows and constrained random variable transformations

Stars: ✭ 131 (+773.33%)

Mutual labels: bayesian-inference

Probabilistic Programming with Gaussian processes in Julia

Stars: ✭ 318 (+2020%)

Mutual labels: bayesian-inference

Density estimation likelihood-free inference. No longer actively developed see https://github.com/mackelab/sbi instead

Stars: ✭ 66 (+340%)

Mutual labels: bayesian-inference


Lingo

An experimental project that seeks to infer the gender of a person based on their name.

Requirements

1 GB RAM
Python 3.6 and above

Usage

Run these commands from the project directory.

First, train the model with the following command. It reads the file data/training.txt and saves the trained model as JSON in training.json.

python3 learn.py

Then, to use the model, run Lingo.py. A Name: prompt appears as soon as the training data has been loaded into memory.

python3 Lingo.py

TODO

How It Works

TL;DR: Bayes' theorem.

At both training and inference time, each name is broken down into about 300 components called 'metrics'. A few metrics include:

Letter pairs. For example: adnan is split into ad, dn, na ...
Letter triplets. For example: adnan is split into adn, dna, nan ...
Pairs and triplets with their offset from the end of the name, such as 0:an, 1:na or 0:nan, 1:dna
Single letters with their offset from the end of the name (0:n, 1:a, 2:n ...)
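The character-based metrics listed above can be sketched roughly as follows. This is a minimal illustration rather than Lingo's actual code: the function name character_metrics is hypothetical, and the real project may order or encode these metrics differently.

```python
def character_metrics(name):
    """Return character-based metrics for a name (illustrative sketch only)."""
    name = name.lower()
    metrics = []
    for size in (2, 3):
        # All contiguous pairs (size 2) and triplets (size 3).
        grams = [name[i:i + size] for i in range(len(name) - size + 1)]
        metrics.extend(grams)
        # The same n-grams, keyed by their offset from the end of the name.
        metrics.extend(
            f"{offset}:{gram}" for offset, gram in enumerate(reversed(grams))
        )
    # Single letters, keyed by their offset from the end of the name.
    metrics.extend(
        f"{offset}:{letter}" for offset, letter in enumerate(reversed(name))
    )
    return metrics


print(character_metrics("adnan"))
```

For adnan, the result includes the plain pairs and triplets (ad, dn, adn ...) as well as the offset forms quoted in the list (0:an, 1:na, 0:nan, 1:dna, 0:n, 1:a, 2:n).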

Each letter is also represented phonetically in several ways. For example, a can be GutturalVowel, LongGutturalVowel, LongVowel, LongGuttural, Vowel, Guttural or Long (see phonetics.py for a list of the representations of each letter).

These phonetic attributes are taken from the Bengali Alphabet page on Wikipedia, by matching each English letter to its closest phonetic counterpart in the Bengali language.

Next, every combination that can occur between the two (or three) lists of phonetic representations of the letters in a pair (or triplet) is generated and used as a metric. Examples: GutturalVowel-LabialConsonant, Long-LabialAspiratedGenericConsonant-GutturalUnaspirated, Vowel-Consonant-Aspirated

These combinations are then paired with the offset from the end of the name to create yet another set of metrics. Example: 0:GutturalVowel-LabialConsonant. These two processes account for the bulk of the metrics and are what give the model its high accuracy.

Note: Internally, Lingo uses single-letter shorthands for traits (Vowel is just v, and so on), so the actual metrics look more like 0:xwe-fiu
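As a rough sketch of the phonetic metrics, the combinations can be generated with a Cartesian product over the trait lists of each letter. The TRAITS table below is a hypothetical two-letter stand-in for phonetics.py (the traits given for b are assumed for illustration), and it uses full trait names rather than Lingo's single-letter shorthands.

```python
from itertools import product

# Hypothetical trait table for illustration; the real mapping is in phonetics.py.
TRAITS = {
    "a": ["GutturalVowel", "Vowel", "Guttural"],
    "b": ["LabialConsonant", "Labial", "Consonant"],
}


def phonetic_metrics(name):
    """Return phonetic-combination metrics for a name (illustrative sketch only)."""
    name = name.lower()
    metrics = []
    for size in (2, 3):
        grams = [name[i:i + size] for i in range(len(name) - size + 1)]
        for offset, gram in enumerate(reversed(grams)):
            # This toy table is incomplete, so skip n-grams it cannot describe.
            if any(letter not in TRAITS for letter in gram):
                continue
            # One metric per combination of traits, one trait per letter.
            for combo in product(*(TRAITS[letter] for letter in gram)):
                label = "-".join(combo)
                metrics.append(label)
                # The same combination, keyed by offset from the end of the name.
                metrics.append(f"{offset}:{label}")
    return metrics


print(phonetic_metrics("ab"))
```

With three traits per letter, a single pair already yields nine combinations plus their nine offset variants, which suggests how a short name can reach a few hundred metrics.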

Training

During training, the roughly 300 metrics that each name produces are tallied and stored in the training file for later use. The number of male and female names seen is also counted for later use in Bayesian inference.
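A minimal sketch of this tallying step is shown below. The layout of the resulting dictionary, the train function and the pairs extractor are assumptions made for illustration; the actual structure of training.json is not described here and may differ.

```python
import json
from collections import Counter


def train(samples, extract):
    """Tally metrics per gender.

    samples: iterable of (name, gender) with gender "male" or "female".
    extract: function turning a name into a list of metrics.
    """
    metric_counts = {"male": Counter(), "female": Counter()}
    name_counts = Counter()
    for name, gender in samples:
        name_counts[gender] += 1
        metric_counts[gender].update(extract(name))
    # Plain dicts so the result can be serialised as JSON.
    return {
        "names": dict(name_counts),
        "metrics": {g: dict(c) for g, c in metric_counts.items()},
    }


def pairs(name):
    """Toy extractor: letter pairs only."""
    return [name[i:i + 2] for i in range(len(name) - 1)]


model = train([("adnan", "male"), ("anna", "female"), ("hanna", "female")], pairs)
print(json.dumps(model, indent=2))
```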

Inferencing

When making an inference, Lingo creates two buckets in memory: the female bucket and the male bucket. All the metrics for the name are then extracted again using the methods above.

Finally, the tally for each metric is run through a Bayes probability function, multiplied by a weight based on offset and metric type, and added to the corresponding bucket.

Metrics that pertain to the ends of names are given higher weights than other metrics.
Phonetic-trait-based metrics are given precedence over character-based metrics.

If the percentage difference between the levels of the two buckets is higher than 15%, an inference is made. Otherwise the name is considered to be Unisex.
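The inference step might look roughly like the sketch below. Several parts are assumptions rather than Lingo's actual behaviour: the add-one smoothing in posterior, the specific numbers in weight, the model layout, and the reading of "percentage difference" as the gap relative to the fuller bucket.

```python
def posterior(metric, gender, model):
    """P(gender | metric) via Bayes' theorem, with add-one smoothing (assumed)."""
    names = model["names"]
    total = sum(names.values())

    def joint(g):
        likelihood = (model["metrics"][g].get(metric, 0) + 1) / (names[g] + 2)
        prior = names[g] / total
        return likelihood * prior

    return joint(gender) / sum(joint(g) for g in names)


def weight(metric):
    """Hypothetical weights: end-of-name and phonetic metrics count more."""
    value = 1.0
    if ":" in metric:
        offset = int(metric.split(":", 1)[0])
        value *= 2.0 / (1 + offset)  # closer to the end of the name, higher weight
    if "-" in metric:
        value *= 1.5  # phonetic combinations take precedence
    return value


def infer(metrics, model, threshold=0.15):
    """Fill the two buckets and decide, or return "unisex" if they are close."""
    buckets = {"male": 0.0, "female": 0.0}
    for metric in metrics:
        for gender in buckets:
            buckets[gender] += weight(metric) * posterior(metric, gender, model)
    high, low = max(buckets.values()), min(buckets.values())
    if high == 0 or (high - low) / high <= threshold:
        return "unisex"
    return max(buckets, key=buckets.get)


# Toy model in an assumed layout: name counts and per-gender metric tallies.
model = {
    "names": {"male": 2, "female": 2},
    "metrics": {"male": {"0:n": 2}, "female": {"0:a": 2}},
}
print(infer(["0:a"], model), infer(["0:n"], model), infer(["zz"], model))
```

A metric never seen in training contributes equally to both buckets, so a name made up only of such metrics falls under the threshold and is reported as unisex.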

Accuracy

We trained the model on 32 thousand names and checked it against 3,200 names, on which it was 91% accurate. To reproduce this statistic, run checker.py, which reports the percentages of correct and incorrect inferences.

python3 checker.py

License

MIT.

Made With ♥ By


Samiha Tahsin and Omran Jamal


Stars: ✭ 15
