
polywock / Text2gender

Predict the author's gender from their text.


Labels

machine-learning neural-network text-classification

Projects that are alternatives of or similar to Text2gender

Textclassification Keras

Text classification models implemented in Keras, including: FastText, TextCNN, TextRNN, TextBiRNN, TextAttBiRNN, HAN, RCNN, RCNNVariant, etc.

Stars: ✭ 621 (+4335.71%)

Mutual labels: text-classification

Tf Rnn Attention

Tensorflow implementation of attention mechanism for text classification tasks.

Stars: ✭ 735 (+5150%)

Mutual labels: text-classification

Data augmentation for NLP, presented at EMNLP 2019

Stars: ✭ 902 (+6342.86%)

Mutual labels: text-classification

A tool for learning vector representations of words and entities from Wikipedia

Stars: ✭ 655 (+4578.57%)

Mutual labels: text-classification

Text Classification Pytorch

Text classification using deep learning models in Pytorch

Stars: ✭ 683 (+4778.57%)

Mutual labels: text-classification

Nlp In Practice

Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.

Stars: ✭ 790 (+5542.86%)

Mutual labels: text-classification

Multi Class Text Classification Cnn Rnn

Classify Kaggle San Francisco Crime Description into 39 classes. Build the model with CNN, RNN (GRU and LSTM) and Word Embeddings on Tensorflow.

Stars: ✭ 570 (+3971.43%)

Mutual labels: text-classification

Nlp tensorflow project

Use TensorFlow to implement some NLP projects, e.g. classification, chatbot, NER, attention, QA, etc.

Stars: ✭ 27 (+92.86%)

Mutual labels: text-classification

Nlp chinese corpus

Large Scale Chinese Corpus for NLP

Stars: ✭ 6,656 (+47442.86%)

Mutual labels: text-classification

Text Mining in Python

Stars: ✭ 18 (+28.57%)

Mutual labels: text-classification

Eda nlp for chinese

An implementation of the EDA paper for Chinese corpora. An EDA data augmentation tool for Chinese text. NLP data augmentation. Paper reading notes.

Stars: ✭ 660 (+4614.29%)

Mutual labels: text-classification

Text Classification

Implementation of papers for text classification task on DBpedia

Stars: ✭ 682 (+4771.43%)

Mutual labels: text-classification

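A chatbot for the finance and legal domains (with some chit-chat capability). Its main modules include information extraction, NLU, NLG, and a knowledge graph. Django is used to integrate the front-end display, and the NLP and KG components are already wrapped as RESTful interfaces.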

Stars: ✭ 791 (+5550%)

Mutual labels: text-classification

Natural Language Processing Best Practices & Examples

Stars: ✭ 5,783 (+41207.14%)

Mutual labels: text-classification

Concise Ipython Notebooks For Deep Learning

Ipython Notebooks for solving problems like classification, segmentation, generation using latest Deep learning algorithms on different publicly available text and image data-sets.

Stars: ✭ 23 (+64.29%)

Mutual labels: text-classification

A Modern C++ Data Sciences Toolkit

Stars: ✭ 600 (+4185.71%)

Mutual labels: text-classification

A deep learning framework for natural language processing based on PyTorch and torchtext.

Stars: ✭ 739 (+5178.57%)

Mutual labels: text-classification

Text classification

all kinds of text classification models and more with deep learning

Stars: ✭ 7,179 (+51178.57%)

Mutual labels: text-classification

Bert language understanding

Pre-training of Deep Bidirectional Transformers for Language Understanding: pre-train TextCNN

Stars: ✭ 933 (+6564.29%)

Mutual labels: text-classification

Text Classification Benchmark

Text classification benchmark

Stars: ✭ 18 (+28.57%)

Mutual labels: text-classification


Author gender classification from text.

Use at your own risk; this project is not well supported or documented.

Trained on Reddit posts from r/AskMen and r/AskWomen. If I may say so myself, this is a clever, albeit lazy, way to get labelled data. Training was done on posts taken directly from those two subreddits, which introduces its own set of biases. For example, women who post on r/AskWomen may write in a distinctive style inside that subreddit but not outside of it. To rectify this, you could instead identify "women" users from r/AskWomen and look at their posts outside of r/AskWomen, ideally in a subreddit that both men and women visit, like r/AskReddit.
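The relabelling idea above can be sketched roughly as follows. This is not code from the project: the `(author, subreddit, text)` tuple layout and both function names are assumptions made for illustration. Note that posting in r/AskWomen or r/AskMen is only a noisy proxy for the author's gender, so the resulting labels inherit that noise.

```python
from collections import defaultdict

# Hypothetical record layout: (author, subreddit, text).
# This is an illustrative assumption, not the project's actual schema.

def label_authors(posts, female_sub="AskWomen", male_sub="AskMen"):
    """Label each author by the single labelled subreddit they post in."""
    seen = defaultdict(set)
    for author, subreddit, _text in posts:
        if subreddit in (female_sub, male_sub):
            seen[author].add(subreddit)
    labels = {}
    for author, subs in seen.items():
        # Authors who appear in both subreddits are ambiguous, so skip them.
        if len(subs) == 1:
            labels[author] = "female" if female_sub in subs else "male"
    return labels

def neutral_examples(posts, labels, neutral_sub="AskReddit"):
    """Collect (text, label) pairs from a subreddit that both groups visit."""
    return [
        (text, labels[author])
        for author, subreddit, text in posts
        if subreddit == neutral_sub and author in labels
    ]

sample_posts = [
    ("alice", "AskWomen", "a question"),
    ("alice", "AskReddit", "alice writing elsewhere"),
    ("bob", "AskMen", "another question"),
    ("bob", "AskReddit", "bob writing elsewhere"),
    ("carol", "AskMen", "posts in both"),
    ("carol", "AskWomen", "posts in both"),
    ("carol", "AskReddit", "ambiguous author"),
]
sample_labels = label_authors(sample_posts)
sample_examples = neutral_examples(sample_posts, sample_labels)
```

Only the texts written in the neutral subreddit end up in the training examples, while the labelled subreddits are used solely to decide the label.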

Accuracy on real-world data still needs further investigation.

length	accuracy	examples
< 250	67.56%	48481
200 to 500	66.02%	30715
500 to 1000	69.22%	13600
1000 to 2000	72.99%	3654
> 2000	76.96%	599
-	-	-
male below 250	65.98%	23527
male 200 to 500	65.2%	15275
male 500 to 1000	66.51%	6346
male 1000 to 2000	69.99%	1656
male above 2000	73.08%	286
-	-	-
female below 250	69.06%	24954
female 200 to 500	66.83%	15440
female 500 to 1000	71.59%	7254
female 1000 to 2000	75.48%	1998
female above 2000	80.51%	313
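As a quick sanity check on the table, the per-gender rows can be recombined into the overall rows by weighting each accuracy by its example count. The sketch below copies the figures from the table; the function names are made up for illustration, and the tolerance only allows for the rounding of the published percentages.

```python
# Each row: (length bucket, male accuracy %, male examples,
#            female accuracy %, female examples,
#            overall accuracy %, overall examples), copied from the table.
ROWS = [
    ("< 250", 65.98, 23527, 69.06, 24954, 67.56, 48481),
    ("200 to 500", 65.2, 15275, 66.83, 15440, 66.02, 30715),
    ("500 to 1000", 66.51, 6346, 71.59, 7254, 69.22, 13600),
    ("1000 to 2000", 69.99, 1656, 75.48, 1998, 72.99, 3654),
    ("> 2000", 73.08, 286, 80.51, 313, 76.96, 599),
]

def weighted_accuracy(pairs):
    """Average accuracy over (accuracy, example_count) pairs, weighted by count."""
    total = sum(count for _acc, count in pairs)
    return sum(acc * count for acc, count in pairs) / total

def check_rows(rows, tolerance=0.05):
    """Return the buckets whose male/female rows do not recombine to the overall row."""
    mismatches = []
    for label, m_acc, m_n, f_acc, f_n, all_acc, all_n in rows:
        combined = weighted_accuracy([(m_acc, m_n), (f_acc, f_n)])
        if m_n + f_n != all_n or abs(combined - all_acc) > tolerance:
            mismatches.append(label)
    return mismatches

# Accuracy over all length buckets together, weighted by bucket size.
overall = weighted_accuracy([(row[5], row[6]) for row in ROWS])
mismatches = check_rows(ROWS)
```

With the figures above, `check_rows(ROWS)` returns an empty list, meaning the male and female rows are consistent with the overall rows in every bucket.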

Use

Install pipenv and learn how to use it.

Download the required dependencies:

pipenv install

Install the required NLTK data:

pipenv run python3 -m textblob.download_corpora lite

Predict gender by piping in a text file. This prints a value between 0 and 1: male if above 0.5, otherwise female.

cat some_text.txt | pipenv run python3 predict.py
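The decision rule for the printed value can be written out as a small helper. This is an illustrative sketch, not part of the project: both function names are hypothetical, and `label_from_output` assumes that predict.py prints nothing but a single number.

```python
def label_from_score(score, threshold=0.5):
    """Map a 0-to-1 score to a label: male if above the threshold, otherwise female."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    return "male" if score > threshold else "female"

def label_from_output(stdout_text):
    """Parse captured output, assuming it contains only the numeric score."""
    return label_from_score(float(stdout_text.strip()))
```

A score of exactly 0.5 falls on the "otherwise" side of the rule, so it maps to female.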

Train your own model (not required).

Install the required developer dependencies (also ensure you have sqlite3 installed):

pipenv install --dev

Install the required NLTK data:

pipenv run python3 -m textblob.download_corpora lite

Run pipenv run python3 download.py to download Reddit posts using the Pushshift API. This runs indefinitely until you interrupt the process; I recommend collecting around 200k posts. The posts are saved to data.db (a sqlite3 database) in the "posts" table.

Run pipenv run python3 transform.py to transform the posts into training data. The output is stored in data.db in the "examples" table.

Run pipenv run python3 generate_model.py to train and test the model. The model weights are saved to data/model_weights.json and data/model_biases.json.

Predict gender by piping in a text file:

cat some_text.txt | pipenv run python3 predict.py
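Because the download step runs until it is interrupted, it can help to check how many rows have accumulated before stopping it. The helper below is a hypothetical sketch and not part of the project; it relies only on the database file and table names mentioned above and makes no assumption about the columns.

```python
import sqlite3
from contextlib import closing

# Table names taken from the steps above; anything else is rejected so the
# table name can be safely interpolated into the query.
ALLOWED_TABLES = ("posts", "examples")

def count_rows(db_path, table):
    """Return the number of rows in one of the known tables."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table name: {table!r}")
    with closing(sqlite3.connect(db_path)) as conn:
        (row_count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    return row_count

# Example use from the project directory:
#   count_rows("data.db", "posts")
```

Calling it on a table that has not been created yet raises `sqlite3.OperationalError`, which is a reasonable signal that the corresponding step has not run.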


Stars: ✭ 14
