donghyeonk / fastText1607

Licence: MIT License
Unofficial Implementation of "Bag of Tricks for Efficient Text Classification", 2016, Armand Joulin et al. (https://arxiv.org/pdf/1607.01759.pdf)

Programming Languages

python
shell

Projects that are alternatives of or similar to fastText1607

fasttext-serving
Serve your fastText models for text classification and word vectors
Stars: ✭ 21 (+5%)
Mutual labels:  fasttext
FastText.NetWrapper
.NET Standard wrapper for the fastText library. Now works on Windows, Linux and macOS!
Stars: ✭ 57 (+185%)
Mutual labels:  fasttext
Embedding
A summary of embedding model code and study notes
Stars: ✭ 25 (+25%)
Mutual labels:  fasttext
fasttext-serverless
Serverless hashtag recommendations using fastText and Python with AWS Lambda
Stars: ✭ 20 (+0%)
Mutual labels:  fasttext
compress-fasttext
Tools for shrinking fastText models (in gensim format)
Stars: ✭ 124 (+520%)
Mutual labels:  fasttext
nlpbuddy
A text analysis application for performing common NLP tasks through a web dashboard interface and an API
Stars: ✭ 115 (+475%)
Mutual labels:  fasttext
fastchess
Predicts the best chess move with 27.5% accuracy by a single matrix multiplication
Stars: ✭ 75 (+275%)
Mutual labels:  fasttext
word embedding
Sample code for training Word2Vec and FastText on a wiki corpus, along with their pretrained word embeddings.
Stars: ✭ 21 (+5%)
Mutual labels:  fasttext
goclassy
An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline.
Stars: ✭ 81 (+305%)
Mutual labels:  fasttext
fasttext-server
Flask web server to serve supervised models trained with FastText.
Stars: ✭ 25 (+25%)
Mutual labels:  fasttext
german-sentiment
A data set and model for German sentiment classification.
Stars: ✭ 37 (+85%)
Mutual labels:  fasttext
fasttext-serving
fastText model serving service
Stars: ✭ 54 (+170%)
Mutual labels:  fasttext
FNet-pytorch
Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms
Stars: ✭ 204 (+920%)
Mutual labels:  feedforward-neural-network
ungoliant
🕷️ The pipeline for the OSCAR corpus
Stars: ✭ 69 (+245%)
Mutual labels:  fasttext
Base-On-Relation-Method-Extract-News-DA-RNN-Model-For-Stock-Prediction--Pytorch
A dual-stage attention mechanism model based on a relational news extraction method, used for stock prediction
Stars: ✭ 33 (+65%)
Mutual labels:  fasttext
actions-suggest-related-links
A GitHub Action to suggest related or similar issues, documents, and links. Based on the power of NLP and fastText.
Stars: ✭ 23 (+15%)
Mutual labels:  fasttext
ticket-tagger
Machine learning driven issue classification bot.
Stars: ✭ 24 (+20%)
Mutual labels:  fasttext
Persian-Sentiment-Analyzer
Persian sentiment analysis ( آناکاوی سهش های فارسی | تحلیل احساسات فارسی )
Stars: ✭ 30 (+50%)
Mutual labels:  fasttext
extremeText
Library for fast text representation and extreme classification.
Stars: ✭ 141 (+605%)
Mutual labels:  fasttext
spacy-fastlang
Language detection using spaCy and fastText
Stars: ✭ 34 (+70%)
Mutual labels:  fasttext

Bag of Tricks for Efficient Text Classification, fastText

Unofficial PyTorch Implementation of "Bag of Tricks for Efficient Text Classification", 2016, A. Joulin, E. Grave, P. Bojanowski, and T. Mikolov (https://arxiv.org/pdf/1607.01759.pdf)

  • The original model

    • fastText, h=10, bigram (See Table 1 of the paper)
  • Dataset

  • Experiment

    # Download a spacy "en_core_web_lg" model
    $ python3 -m spacy download en_core_web_lg --user
    
    # Download datasets (run the script that matches your OS: mac or ubuntu)
    $ sh download_datasets_mac.sh
    
    • AG
    # Create a pickle file: data/ag_news_csv/ag.pkl
    $ python3 dataset.py --data_dir ./data/ag_news_csv --pickle_name ag.pkl --num_classes 4 --max_len 467
    
    # Run
    $ python3 main.py --data_path ./data/ag_news_csv/ag.pkl --batch_size 2048 --lr 0.5 --log_interval 20
    
    • Sogou
    # Create a pickle file: data/sogou_news_csv/sogou.pkl
    $ python3 dataset.py --data_dir ./data/sogou_news_csv --pickle_name sogou.pkl --num_classes 5 --max_len 90064
    
    # Run
    $ python3 main.py --data_path ./data/sogou_news_csv/sogou.pkl --batch_size 1024 --lr 0.1 --log_interval 40
    
    • DBpedia
    # Create a pickle file: data/dbpedia_csv/dbp.pkl
    $ python3 dataset.py --data_dir ./data/dbpedia_csv --pickle_name dbp.pkl --num_classes 14 --max_len 3013
    
    # Run
    $ python3 main.py --data_path ./data/dbpedia_csv/dbp.pkl --batch_size 2048 --lr 0.1 --log_interval 20
    
    • Yelp P.
    # Create a pickle file: data/yelp_review_polarity_csv/yelp_p.pkl
    $ python3 dataset.py --data_dir ./data/yelp_review_polarity_csv --pickle_name yelp_p.pkl --num_classes 2 --max_len 2955
    
    # Run
    $ python3 main.py --data_path ./data/yelp_review_polarity_csv/yelp_p.pkl --batch_size 1024 --lr 0.1 --log_interval 40
    
    • Yelp F.
    # Create a pickle file: data/yelp_review_full_csv/yelp_f.pkl
    $ python3 dataset.py --data_dir ./data/yelp_review_full_csv --pickle_name yelp_f.pkl --num_classes 5 --max_len 2955
    
    # Run
    $ python3 main.py --data_path ./data/yelp_review_full_csv/yelp_f.pkl --batch_size 1024 --lr 0.05 --log_interval 40
    
    • Yahoo A.
    # Create a pickle file: data/yahoo_answers_csv/yahoo_a.pkl
    $ python3 dataset.py --data_dir ./data/yahoo_answers_csv --pickle_name yahoo_a.pkl --num_classes 10 --max_len 8024
    
    # Run
    $ python3 main.py --data_path ./data/yahoo_answers_csv/yahoo_a.pkl --batch_size 1024 --lr 0.05 --log_interval 40
    
    • Amazon F.
    # Create a pickle file: data/amazon_review_full_csv/amazon_f.pkl
    $ python3 dataset.py --data_dir ./data/amazon_review_full_csv --pickle_name amazon_f.pkl --num_classes 5 --max_len 1214
    
    # Run
    $ python3 main.py --data_path ./data/amazon_review_full_csv/amazon_f.pkl --batch_size 4096 --lr 0.25 --log_interval 10
    
    • Amazon P.
    # Create a pickle file: data/amazon_review_polarity_csv/amazon_p.pkl
    $ python3 dataset.py --data_dir ./data/amazon_review_polarity_csv --pickle_name amazon_p.pkl --num_classes 2 --max_len 1318
    
    # Run
    $ python3 main.py --data_path ./data/amazon_review_polarity_csv/amazon_p.pkl --batch_size 4096 --lr 0.25 --log_interval 10
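The per-dataset commands above differ only in a handful of values, so they can be collected in one table and generated programmatically. The following is a minimal sketch, not part of the repository: the names `EXPERIMENTS` and `build_commands` are hypothetical, and the values are copied from the commands listed above (with the Amazon P. run pointing at `amazon_p.pkl`, the pickle its preprocessing step creates). It only builds the command lines; it does not run them.

```python
import shlex

# Settings copied from the commands above (hypothetical helper table):
# name -> (data dir, pickle name, num_classes, max_len, batch_size, lr, log_interval)
EXPERIMENTS = {
    "AG": ("ag_news_csv", "ag.pkl", 4, 467, 2048, 0.5, 20),
    "Sogou": ("sogou_news_csv", "sogou.pkl", 5, 90064, 1024, 0.1, 40),
    "DBpedia": ("dbpedia_csv", "dbp.pkl", 14, 3013, 2048, 0.1, 20),
    "Yelp P.": ("yelp_review_polarity_csv", "yelp_p.pkl", 2, 2955, 1024, 0.1, 40),
    "Yelp F.": ("yelp_review_full_csv", "yelp_f.pkl", 5, 2955, 1024, 0.05, 40),
    "Yahoo A.": ("yahoo_answers_csv", "yahoo_a.pkl", 10, 8024, 1024, 0.05, 40),
    "Amazon F.": ("amazon_review_full_csv", "amazon_f.pkl", 5, 1214, 4096, 0.25, 10),
    "Amazon P.": ("amazon_review_polarity_csv", "amazon_p.pkl", 2, 1318, 4096, 0.25, 10),
}


def build_commands(name):
    """Return (preprocess_cmd, train_cmd) as argument lists for one dataset."""
    folder, pkl, num_classes, max_len, batch_size, lr, log_interval = EXPERIMENTS[name]
    data_dir = f"./data/{folder}"
    preprocess = [
        "python3", "dataset.py",
        "--data_dir", data_dir,
        "--pickle_name", pkl,
        "--num_classes", str(num_classes),
        "--max_len", str(max_len),
    ]
    train = [
        "python3", "main.py",
        "--data_path", f"{data_dir}/{pkl}",
        "--batch_size", str(batch_size),
        "--lr", str(lr),
        "--log_interval", str(log_interval),
    ]
    return preprocess, train


if __name__ == "__main__":
    # Print every command so it can be reviewed or pasted into a shell.
    for dataset_name in EXPERIMENTS:
        for cmd in build_commands(dataset_name):
            print(shlex.join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root if you prefer to run the experiments from Python.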
    
  • Performance (accuracy %)

    • Results may vary slightly depending on your experimental environment.
| Model | AG | Sogou | DBpedia | Yelp P. | Yelp F. | Yahoo A. | Amazon F. | Amazon P. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| fastText, h=10, bigram | 92.5 | 96.8 | 98.6 | 95.7 | 63.9 | 72.3 | 60.2 | 94.6 |
| My implementation of fastText | 92.6 (Ep. 3) | 97.1 (Ep. 5) | 98.1 (Ep. 4) | 95.7 (Ep. 1) | 63.5 (Ep. 1) | 72.5 (Ep. 1) | 57.7 (Ep. 1) | 94.3 (Ep. 1) |
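For readers unfamiliar with what "fastText, h=10, bigram" computes, the paper describes the classifier as: look up an embedding for each word and n-gram feature, average them into a hidden vector of size h, apply a linear layer, and take a softmax. The sketch below illustrates that forward pass in plain Python so the arithmetic is visible. It is not the repository's PyTorch code; the class name `TinyFastText`, the random initialization, and the seed are assumptions made for illustration.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix as a list of lists (illustrative initialization)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyFastText:
    """Forward pass only: average feature embeddings, linear layer, softmax."""

    def __init__(self, vocab_size, num_classes, hidden_dim=10, seed=0):
        rng = random.Random(seed)
        # One hidden_dim-sized vector per feature id (word or bigram).
        self.embedding = make_matrix(vocab_size, hidden_dim, rng)
        # One weight row per class.
        self.output = make_matrix(num_classes, hidden_dim, rng)

    def forward(self, feature_ids):
        if not feature_ids:
            raise ValueError("feature_ids must contain at least one id")
        hidden_dim = len(self.embedding[0])
        # Average the embeddings of all features in the document.
        hidden = [0.0] * hidden_dim
        for idx in feature_ids:
            row = self.embedding[idx]
            for j in range(hidden_dim):
                hidden[j] += row[j]
        hidden = [value / len(feature_ids) for value in hidden]
        # Linear classifier on top of the averaged representation.
        logits = [sum(w * h for w, h in zip(row, hidden)) for row in self.output]
        # Numerically stable softmax over the class scores.
        top = max(logits)
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


if __name__ == "__main__":
    model = TinyFastText(vocab_size=50, num_classes=4)  # 4 classes, as in AG
    print(model.forward([1, 7, 7, 42]))
```

In PyTorch, the same structure is commonly expressed with `torch.nn.EmbeddingBag(mode="mean")` followed by `torch.nn.Linear`; whether this repository uses those exact modules is not stated here.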
  • Training time for an epoch (CPU)
    • Results may vary slightly depending on your experimental environment.
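| Dataset | fastText | My implementation of fastText (Intel i7 8th gen.) |
| --- | --- | --- |
| AG | 1s | 12s |
| Sogou | 7s | 30m |
| DBpedia | 2s | 3m |
| Yelp P. | 3s | 7m |
| Yelp F. | 4s | 8m |
| Yahoo A. | 5s | 24m |
| Amazon F. | 9s | 14m |
| Amazon P. | 10s | 15m |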
  • Dictionary size & data size
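| Dataset | Dictionary size | Is the hashing trick needed? | # train examples | # test examples | # classes |
| --- | --- | --- | --- | --- | --- |
| AG | 1.4M | No | 120K | 7.6K | 4 |
| Sogou | 3.4M | No | 450K | 60K | 5 |
| DBpedia | 6.6M | No | 560K | 70K | 14 |
| Yelp P. | 6.4M | No | 560K | 38K | 2 |
| Yelp F. | 7.1M | No | 650K | 50K | 5 |
| Yahoo A. | 17.9M | Yes | 1.4M | 60K | 10 |
| Amazon F. | 21.7M | Yes | 3M | 650K | 5 |
| Amazon P. | 24.3M | Yes | 3.6M | 400K | 2 |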
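The "hashing trick" column refers to mapping n-grams into a fixed number of buckets instead of storing every distinct n-gram in the dictionary. The paper reports using 10M bins when only bigrams are used, which lines up with the table: the datasets marked "Yes" are the ones whose dictionaries exceed that size. The sketch below shows the general idea under stated assumptions: `zlib.crc32` stands in for the hash function (the paper uses a different one), the function names are hypothetical, and whether this repository uses the same bucket count or id layout is not confirmed.

```python
import zlib

NUM_BUCKETS = 10_000_000  # bigram bin count reported in the paper


def bigrams(tokens):
    """Adjacent token pairs, joined with a space."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def hashed_id(ngram, num_buckets=NUM_BUCKETS):
    """Map an n-gram to a bucket. crc32 is an illustrative stand-in hash."""
    return zlib.crc32(ngram.encode("utf-8")) % num_buckets


def featurize(tokens, word_to_id, num_buckets=NUM_BUCKETS):
    """Return feature ids: known words first, then hashed bigram buckets.

    Bigram ids are offset by the word vocabulary size so the two id ranges
    do not overlap (an assumed layout, chosen for clarity).
    """
    word_ids = [word_to_id[t] for t in tokens if t in word_to_id]
    offset = len(word_to_id)
    bigram_ids = [offset + hashed_id(bg, num_buckets) for bg in bigrams(tokens)]
    return word_ids + bigram_ids


if __name__ == "__main__":
    vocab = {"fast": 0, "text": 1, "classification": 2}
    print(featurize(["fast", "text", "classification"], vocab))
```

Different bigrams can collide in the same bucket; that is the trade-off accepted in exchange for an embedding table whose size is bounded by `len(word_to_id) + num_buckets` regardless of how many distinct bigrams the corpus contains.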