Reasonence / Lingo

Licence: other
Infer the gender of an individual based on their name.

Lingo

An experimental project that seeks to infer the gender of a person based on their name.

Requirements

  • 1 GB RAM
  • Python 3.6 and above

Usage

Make sure you issue these commands from the project directory.

You must first train the model with the following command. This reads the file data/training.txt and saves the trained model as JSON in training.json.

python3 learn.py

Then, to use the model, run Lingo.py. You will be greeted with a Name: prompt as soon as the training data is loaded into memory.

python3 Lingo.py

TODO

  • REST API Mode
  • Proper CLI
  • Silent IPC mode
  • Speed Enhancements
  • Unit Tests

How It Works

TL;DR: Bayes' theorem.

At both training and use time, each name is divided into about 300 components called 'metrics'. A few metrics include:

  • Letter pairs. For example: adnan is split into ad, dn, na ...
  • Letter triplets. For example: adnan is split into adn, dna, nan ...
  • Pairs and triplets tagged with their offset from the end of the name. For example: 0:an, 1:na or 0:nan, 1:dna
  • Single letters with offsets. (0:n, 1:a, 2:n ...)
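
The character-based metrics listed above can be sketched roughly as follows. This is only an illustration, not the code Lingo uses: the function name char_metrics and the exact order of the output are assumptions made for this example.

```python
def char_metrics(name):
    """Return character-based metrics for a name (illustrative sketch only)."""
    name = name.lower()
    pairs = [name[i:i + 2] for i in range(len(name) - 1)]
    triplets = [name[i:i + 3] for i in range(len(name) - 2)]

    def from_end(chunks):
        # Offset 0 is the chunk that ends the name, 1 is the one before it, ...
        return [f"{offset}:{chunk}" for offset, chunk in enumerate(reversed(chunks))]

    return (pairs + triplets
            + from_end(pairs) + from_end(triplets) + from_end(list(name)))


print(char_metrics("adnan"))
```

For adnan this yields the plain pairs and triplets first, followed by the offset-tagged variants such as 0:an, 0:nan and 0:n.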

Each letter is also represented phonetically in multiple ways. For example, a can be GutturalVowel, LongGutturalVowel, LongVowel, LongGuttural, Vowel, Guttural, or Long (see phonetics.py for a list of the representations of each letter).

These phonetic attributes are taken from the Bengali Alphabet page on Wikipedia by matching each English letter to its closest phonetic counterpart in the Bengali language.

Afterwards, all the combinations that can occur between the two (or three) lists of phonetic representations of the letters in a pair (or triplet) are found and used as metrics. Examples: GutturalVowel-LabialConsonant, Long-LabialAspiratedGenericConsonant-GutturalUnaspirated, Vowel-Consonant-Aspirated

The combinations mentioned above are then combined with the offset from the end of the name to create yet another set of metrics. Example: 0:GutturalVowel-LabialConsonant. These two processes account for the bulk of the metrics and are what give the model its high accuracy.
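
A minimal sketch of how such phonetic combinations could be generated is a Cartesian product over the trait lists of each letter. The PHONETICS table below is a hypothetical, heavily shortened stand-in (the real mapping lives in phonetics.py), and phonetic_metrics is a name invented for this example.

```python
from itertools import product

# Hypothetical, shortened trait table; the real one lives in phonetics.py.
PHONETICS = {
    "a": ["GutturalVowel", "Vowel", "Long"],
    "b": ["LabialConsonant", "Consonant"],
}


def phonetic_metrics(chunk, offset=None):
    """All trait combinations for a pair or triplet, optionally offset-tagged."""
    trait_lists = [PHONETICS[letter] for letter in chunk]
    combos = ["-".join(traits) for traits in product(*trait_lists)]
    if offset is not None:
        # Tag each combination with its offset from the end of the name.
        combos = [f"{offset}:{combo}" for combo in combos]
    return combos


print(phonetic_metrics("ab"))
print(phonetic_metrics("ab", offset=0)[0])
```

Because every trait of one letter is paired with every trait of the next, the number of metrics grows multiplicatively, which is consistent with each name producing a few hundred metrics.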

Note: Internally, Lingo uses single-letter shorthands for traits (for example, Vowel is just v), so the actual metrics look similar to: 0:xwe-fiu

Training

During training, the roughly 300 metrics that each name produces are tallied and stored in the training file for later use. The numbers of male and female names found are also counted for later use in Bayesian inference.
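
The tallying step might look something like the sketch below. The layout of the resulting dictionary, the function names train and letter_pairs, and the (name, gender) input format are all assumptions for illustration; the actual format of data/training.txt and training.json may differ.

```python
import json
from collections import Counter


def train(samples, extract):
    """Tally metric counts per gender; samples yields (name, gender) pairs."""
    metric_counts = {"male": Counter(), "female": Counter()}
    name_counts = Counter()
    for name, gender in samples:
        name_counts[gender] += 1
        metric_counts[gender].update(extract(name))
    return {
        "names": dict(name_counts),
        "metrics": {g: dict(c) for g, c in metric_counts.items()},
    }


def letter_pairs(name):
    # Stand-in metric extractor: plain letter pairs only.
    return [name[i:i + 2] for i in range(len(name) - 1)]


model = train(
    [("adnan", "male"), ("anna", "female"), ("adam", "male")],
    letter_pairs,
)
print(json.dumps(model["names"]))
```

Plain dictionaries are used for the result so that it can be written out with json.dump without further conversion.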

Inferencing

When making an inference, Lingo creates two buckets in memory: a female bucket and a male bucket. All the metrics for the name are then computed again using the methods above.

Finally, the tally for each metric is run through a Bayes probability function, multiplied by a weight based on the offset and metric type, and added to the corresponding bucket.

  • Metrics that pertain to the ends of names are given higher weights than other metrics.
  • Phonetic-trait-based metrics are given precedence over character-based metrics.

If the percentage difference between the levels of the two buckets is higher than 15%, an inference is made. Otherwise, the name is considered to be Unisex.
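
The bucket-filling and thresholding can be sketched as below. Several details are assumptions rather than Lingo's actual behaviour: the model layout, the names posterior and infer, the constant default weight, the neutral 0.5 score for unseen metrics, and the reading of "percentage difference" as the gap between buckets relative to their sum.

```python
def posterior(metric, gender, model):
    """Estimate P(gender | metric) from the tallies using Bayes' theorem."""
    total_names = sum(model["names"].values())

    def joint(g):
        # P(metric | g) * P(g), both estimated from the training tallies.
        likelihood = model["metrics"][g].get(metric, 0) / model["names"][g]
        prior = model["names"][g] / total_names
        return likelihood * prior

    evidence = joint("male") + joint("female")
    # A metric never seen in training carries no information either way.
    return joint(gender) / evidence if evidence else 0.5


def infer(metrics, model, weight=lambda metric: 1.0, threshold=0.15):
    """Fill the two buckets and compare them (illustrative sketch only)."""
    buckets = {"male": 0.0, "female": 0.0}
    for metric in metrics:
        for gender in buckets:
            buckets[gender] += weight(metric) * posterior(metric, gender, model)
    total = buckets["male"] + buckets["female"]
    if total == 0:
        return "unisex"
    difference = abs(buckets["male"] - buckets["female"]) / total
    if difference <= threshold:
        return "unisex"
    return max(buckets, key=buckets.get)


# Tiny hand-made model, purely for demonstration.
toy_model = {
    "names": {"male": 2, "female": 2},
    "metrics": {
        "male": {"0:n": 2, "1:a": 1},
        "female": {"0:a": 2, "1:a": 1},
    },
}
print(infer(["0:n", "1:a"], toy_model))
print(infer(["1:a"], toy_model))
```

A custom weight function, for instance one that returns a larger value for metrics starting with 0:, could be passed in to favour the ends of names as described above.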

Accuracy

We trained the model on 32 thousand names and checked it against 3,200 names, and found the model to be 91% accurate. To reproduce this statistic, run checker.py; it reports the percentages of correct and incorrect inferences.

python3 checker.py
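
The accuracy figure is simply the share of held-out names whose inferred gender matches the label. A minimal sketch of that arithmetic follows; the function names, the sample data and the toy predictor are hypothetical, and how checker.py counts Unisex results is not covered here.

```python
def accuracy(labelled, predict):
    """Fraction of (name, gender) pairs for which predict(name) matches."""
    correct = sum(1 for name, gender in labelled if predict(name) == gender)
    return correct / len(labelled)


def toy_predict(name):
    # Hypothetical stand-in for the real model: guesses from the last letter.
    return "female" if name.endswith("a") else "male"


sample = [("adnan", "male"), ("anna", "female"), ("luca", "male"), ("omar", "male")]
print(f"{accuracy(sample, toy_predict):.0%} correct")
```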

License

MIT.

Made By

Samiha Tahsin
Omran Jamal