tolga-b / Debiaswe

License: MIT
Remove problematic gender bias from word embeddings.

Debiaswe: try to make word embeddings less sexist

🔴FAT* 2018 tutorial slides

This repository contains the code and data for the paper "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings" by Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai. Proceedings of NIPS 2016.

Just looking to download a debiased embedding?

You can download a hard-debiased version, in binary or txt format, of Google's word2vec embedding trained on Google News (original: GoogleNews-vectors-negative300.bin.gz, found here).
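
To inspect a downloaded binary file without extra dependencies, a minimal reader can be sketched as follows. It assumes the commonly used word2vec binary layout: a text header line holding the vocabulary size and dimension, then for each word its UTF-8 bytes terminated by a space, followed by that many little-endian float32 values and an optional newline. The names read_word2vec_bin and write_word2vec_bin are hypothetical helpers for illustration and are not part of this repo.

```python
import struct


def write_word2vec_bin(path, vectors):
    """Write a {word: list of floats} dict in the assumed word2vec binary layout."""
    dim = len(next(iter(vectors.values())))
    with open(path, "wb") as f:
        # Header: "<vocab_size> <dim>\n" as plain text.
        f.write(f"{len(vectors)} {dim}\n".encode("utf-8"))
        for word, vec in vectors.items():
            f.write(word.encode("utf-8") + b" ")
            f.write(struct.pack(f"<{dim}f", *vec))
            f.write(b"\n")


def read_word2vec_bin(path):
    """Read the assumed word2vec binary layout into a {word: list of floats} dict."""
    vectors = {}
    with open(path, "rb") as f:
        vocab_size, dim = map(int, f.readline().split())
        for _ in range(vocab_size):
            chars = bytearray()
            while True:
                ch = f.read(1)
                if ch == b"":
                    raise EOFError("unexpected end of file while reading a word")
                if ch == b" ":
                    break
                if ch != b"\n":  # skip the newline that may follow the previous record
                    chars.extend(ch)
            # Each component is a 4-byte little-endian float.
            vec = struct.unpack(f"<{dim}f", f.read(4 * dim))
            vectors[chars.decode("utf-8")] = list(vec)
    return vectors
```

For the full GoogleNews file (300 dimensions, a very large vocabulary) a dictionary of Python lists is memory-hungry, so this is meant for inspecting small files or for understanding the format rather than for production use.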

Python scripts:

  • learn_gender_specific.py: given a word embedding and a seed set of gender-specific words (such as king and she), learns a much larger list of gender-specific words.
  • debias.py: given a word embedding, sets of gender pairs, gender-specific words, and pairs to equalize, outputs a new word embedding. This version reads and writes the word2vec binary file format.

Example usage:

python learn_gender_specific.py ../embeddings/GoogleNews-vectors-negative300.bin 50000 ../data/gender_specific_seed.json gender_specific_full.json
python debias.py ../embeddings/GoogleNews-vectors-negative300.bin ../data/definitional_pairs.json ../data/gender_specific_full.json ../data/equalize_pairs.json ../embeddings/GoogleNews-vectors-negative300-hard-debiased.bin
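
To make the roles of the debias.py inputs concrete, here is a minimal NumPy sketch of hard debiasing in the spirit of the paper's neutralize and equalize steps. It is an illustration under stated assumptions, not the code in debias.py: the embedding is assumed to be a plain dict from word to NumPy vector, the gender direction is assumed to be given, and the names drop and hard_debias are hypothetical.

```python
import numpy as np


def drop(v, direction):
    """Remove from v its component along the unit vector `direction`."""
    return v - direction * v.dot(direction)


def hard_debias(vectors, direction, gender_specific, equalize_pairs):
    """Return a new {word: unit vector} dict after neutralize + equalize."""
    g = direction / np.linalg.norm(direction)
    # Work on unit-length copies so the equalize step is well defined.
    out = {w: v / np.linalg.norm(v) for w, v in vectors.items()}

    # Neutralize: words outside the gender-specific set lose their
    # component along the gender direction and are renormalized.
    for w in out:
        if w not in gender_specific:
            v = drop(out[w], g)
            norm = np.linalg.norm(v)
            if norm > 0:  # a vector parallel to g would collapse to zero
                out[w] = v / norm

    # Equalize: each pair keeps a shared gender-neutral part y and
    # differs only by +z / -z along the gender direction.
    for a, b in equalize_pairs:
        if a in out and b in out:
            y = drop((out[a] + out[b]) / 2, g)
            z = np.sqrt(max(0.0, 1 - np.linalg.norm(y) ** 2))
            if (out[a] - out[b]).dot(g) < 0:
                z = -z
            out[a] = z * g + y
            out[b] = -z * g + y
    return out
```

After this transformation, every neutralized word is orthogonal to the gender direction, and the two words of an equalized pair have the same inner product with every neutralized word.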

The repository also includes the seed data used for debiasing and the crowdsourced data used to evaluate the embeddings.

Data files:

  • gender_specific_seed.json: A list of 218 gender-specific words
  • gender_specific_full.json: A list of 1441 gender-specific words
  • definitional_pairs.json: The ten pairs of words we use to define the gender direction
  • equalize_pairs.json: Crowdsourced female-male (F-M) word pairs that represent the gender direction
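
As an illustration of how a list such as definitional_pairs.json can define a gender direction, the sketch below takes the top principal component of the pair vectors after centering each pair on its own midpoint, which is the approach described in the paper. It assumes the embedding is a dict from word to NumPy vector and that the pairs are two-element lists; the function name gender_direction is hypothetical, and the repo's scripts may differ in detail.

```python
import numpy as np


def gender_direction(vectors, definitional_pairs):
    """Top principal axis of the definitional pairs, each centered on its midpoint."""
    rows = []
    for a, b in definitional_pairs:
        if a in vectors and b in vectors:
            center = (vectors[a] + vectors[b]) / 2
            rows.append(vectors[a] - center)
            rows.append(vectors[b] - center)
    if not rows:
        raise ValueError("no definitional pair found in the embedding")
    matrix = np.array(rows)
    # The rows come in +/- pairs, so the matrix already has zero mean and
    # the right singular vectors are its principal axes, largest first.
    _, _, vt = np.linalg.svd(matrix, full_matrices=False)
    return vt[0]
```

The returned vector has unit length, and its sign is arbitrary, so code that depends on which end is "female" or "male" should check the sign against a known pair.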

🔵 This is only a partial repo at the moment. I will add more features as I get time.

(All external files that I refer to within this repo can be found in this folder.)
