
AckClinkz / ShortText-Fasttext

Licence: other
ShortText classification

Programming Languages

C++
Makefile

Projects that are alternatives of or similar to ShortText-Fasttext

fake-news
This is a further development of the kdnuggets article on fake news classification by George McIntyre
Stars: ✭ 15 (+25%)
Mutual labels:  nlp-machine-learning
deep-semantic-code-search
Deep Semantic Code Search aims to explore a joint embedding space for code and description vectors and then use it for a code search application
Stars: ✭ 63 (+425%)
Mutual labels:  nlp-machine-learning
anuvada
Interpretable Models for NLP using PyTorch
Stars: ✭ 102 (+750%)
Mutual labels:  nlp-machine-learning
Sumrized
Automatic Text Summarization (English/Arabic).
Stars: ✭ 37 (+208.33%)
Mutual labels:  nlp-machine-learning
DeepLearningReading
Deep Learning and Machine Learning mini-projects. Current Project: Deepmind Attentive Reader (rc-data)
Stars: ✭ 78 (+550%)
Mutual labels:  nlp-machine-learning
Engine
The Centrifuge process, filter and saves the relevant documents as recommendations to the relevant users
Stars: ✭ 20 (+66.67%)
Mutual labels:  nlp-machine-learning
SENet-for-Weakly-Supervised-Relation-Extraction
No description or website provided.
Stars: ✭ 39 (+225%)
Mutual labels:  nlp-machine-learning
Naive-Bayes-Evening-Workshop
Companion code for Introduction to Python for Data Science: Coding the Naive Bayes Algorithm evening workshop
Stars: ✭ 23 (+91.67%)
Mutual labels:  nlp-machine-learning
Machine-Learning-Models
In This repository I made some simple to complex methods in machine learning. Here I try to build template style code.
Stars: ✭ 30 (+150%)
Mutual labels:  nlp-machine-learning
Very-deep-cnn-tensorflow
Very deep CNN for text classification
Stars: ✭ 18 (+50%)
Mutual labels:  nlp-machine-learning
topic modelling financial news
Topic modelling on financial news with Natural Language Processing
Stars: ✭ 51 (+325%)
Mutual labels:  nlp-machine-learning
Entity Embedding
Reference implementation of the paper "Word Embeddings for Entity-annotated Texts"
Stars: ✭ 19 (+58.33%)
Mutual labels:  nlp-machine-learning
AI-Sentiment-Analysis-on-IMDB-Dataset
Sentiment Analysis using Stochastic Gradient Descent on 50,000 Movie Reviews Compiled from the IMDB Dataset
Stars: ✭ 55 (+358.33%)
Mutual labels:  nlp-machine-learning
scicle-stopclickbait
Userscript that changes Clickbait headlines by headlines more honest to the news it links to.
Stars: ✭ 16 (+33.33%)
Mutual labels:  nlp-machine-learning
Multi-Type-TD-TSR
Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition:
Stars: ✭ 174 (+1350%)
Mutual labels:  nlp-machine-learning
NLP-Flask-Website
A simple Flask website for all NLP tasks which includes Text Preprocessing, Keyword Extraction, Text Summarization etc. Created Date: 30 Jan 2019
Stars: ✭ 43 (+258.33%)
Mutual labels:  nlp-machine-learning
Deception-Detection-on-Amazon-reviews-dataset
A SVM model that classifies the reviews as real or fake. Used both the review text and the additional features contained in the data set to build a model that predicted with over 85% accuracy without using any deep learning techniques.
Stars: ✭ 42 (+250%)
Mutual labels:  nlp-machine-learning
kex
Kex is a python library for unsupervised keyword extraction from a document, providing an easy interface and benchmarks on 15 public datasets.
Stars: ✭ 46 (+283.33%)
Mutual labels:  nlp-machine-learning
brand-sentiment-analysis
Scripts utilizing Heartex platform to build brand sentiment analysis from the news
Stars: ✭ 21 (+75%)
Mutual labels:  nlp-machine-learning
Quora QuestionPairs DL
Kaggle Competition: Using deep learning to solve quora's question pairs problem
Stars: ✭ 54 (+350%)
Mutual labels:  nlp-machine-learning

ShortText-Fasttext

Short-text classification is a difficult problem in machine learning. Based on Facebook's fastText, this project optimizes fastText's performance on short text and introduces a mechanism named category dropout. This mechanism works very well for Weibo's ad classification. For details about category dropout, see the blog.

Building

git clone https://github.com/AckClinkz/ShortText-Fasttext
cd ShortText-Fasttext
make

Usage

This section assumes that you are somewhat familiar with fastText. You can learn the basics of fastText from the official website.

Category dropout supports only the supervised mode. To train a model with category dropout, set the -cate_dropout flag.

In practice, setting minCount can improve model performance, but it discards the information carried by low-frequency words. Supplying a vocabulary (via -vocab) is an efficient way to balance the two: a word listed in the vocabulary is exempt from the minCount threshold.
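
The rule above can be illustrated with a minimal Python sketch. This is not the project's C++ implementation; the function name `build_vocab` and the `whitelist` argument are hypothetical and stand in for the minCount threshold and the vocabulary file, respectively.

```python
from collections import Counter

def build_vocab(tokens, min_count, whitelist=frozenset()):
    """Keep a word if it is whitelisted or occurs at least min_count times.

    Hypothetical helper: `whitelist` plays the role of the vocabulary file,
    `min_count` the role of the minCount option.
    """
    counts = Counter(tokens)
    return {w for w, c in counts.items() if w in whitelist or c >= min_count}

tokens = "sale sale sale shoes shoes rare".split()

# "rare" occurs only once, but it survives because it is whitelisted.
kept = build_vocab(tokens, min_count=2, whitelist={"rare"})
print(sorted(kept))
```

Without the whitelist, the same call would drop "rare" because it falls below the frequency threshold.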

Putting these points together, ShortText-Fasttext can be used as follows:

fasttext supervised \
  -input your_input.txt \
  -output your_model \
  -dim 120 \
  -lr 0.3 \
  -wordNgrams 3 \
  -minCount 30 \
  -bucket 10000000 \
  -epoch 100 \
  -thread 4 \
  -cate_dropout \
  -vocab ./your_vocabulary.txt
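
For reference, here is a small Python sketch of how the training file passed to -input might be prepared. It assumes this fork keeps upstream fastText's default supervised format, where each line starts with a `__label__`-prefixed category followed by the text; the helper name `to_fasttext_line` and the sample labels are hypothetical.

```python
def to_fasttext_line(label, text, prefix="__label__"):
    """Format one example as '<prefix><label> <text>' on a single line.

    Assumption: the fork reads the same label prefix as upstream fastText.
    """
    # Collapse all whitespace (including newlines) so the example stays on one line.
    return f"{prefix}{label} {' '.join(text.split())}"

examples = [("ads", "big sale\ntoday"), ("normal", "good morning")]
lines = [to_fasttext_line(label, text) for label, text in examples]
print("\n".join(lines))
```

Writing `lines` to a text file, one example per line, would give an input in the shape the supervised command expects under that assumption.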

Updated 2018.11.6

Item2vec is implemented following "Item2Vec: Neural Item Embedding for Collaborative Filtering". Usage is as follows:

fasttext skipgram_item2vec \
  -input ./your_input.txt \
  -output "./your_model" \
  -dim 120 \
  -lr 0.01 \
  -wordNgrams 1 \
  -minCount 2 \
  -bucket 1000000 \
  -epoch 5 \
  -thread 8