apcode / Tensorflow_fasttext

Licence: mit

Simple embedding based text classifier inspired by fastText, implemented in tensorflow

Programming Languages

python

Labels

tensorflow fasttext

Projects that are alternatives to or similar to Tensorflow fasttext

ungoliant

🕷️ The pipeline for the OSCAR corpus

Stars: ✭ 69 (-76.21%)

Mutual labels: fasttext

ticket-tagger

Machine learning driven issue classification bot.

Stars: ✭ 24 (-91.72%)

Mutual labels: fasttext

extremeText

Library for fast text representation and extreme classification.

Stars: ✭ 141 (-51.38%)

Mutual labels: fasttext

german-sentiment

A data set and model for german sentiment classification.

Stars: ✭ 37 (-87.24%)

Mutual labels: fasttext

goclassy

An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline.

Stars: ✭ 81 (-72.07%)

Mutual labels: fasttext

spacy-fastlang

Language detection using Spacy and Fasttext

Stars: ✭ 34 (-88.28%)

Mutual labels: fasttext

actions-suggest-related-links

A GitHub Action to suggest related or similar issues, documents, and links. Based on the power of NLP and fastText.

Stars: ✭ 23 (-92.07%)

Mutual labels: fasttext

fastText1607

Unofficial Implementation of "Bag of Tricks for Efficient Text Classification", 2016, Armand Joulin et al. (https://arxiv.org/pdf/1607.01759.pdf)

Stars: ✭ 20 (-93.1%)

Mutual labels: fasttext

FastText.NetWrapper

.NET Standard wrapper for fastText library. Now works on Windows, Linux and MacOs!

Stars: ✭ 57 (-80.34%)

Mutual labels: fasttext

Base-On-Relation-Method-Extract-News-DA-RNN-Model-For-Stock-Prediction--Pytorch

A dual-stage attention mechanism model based on a relational news extraction method, used for stock prediction

Stars: ✭ 33 (-88.62%)

Mutual labels: fasttext

NLP-paper

🎨 🎨 NLP natural language processing tutorial 🎨🎨 https://dataxujing.github.io/NLP-paper/

Stars: ✭ 23 (-92.07%)

Mutual labels: fasttext

compress-fasttext

Tools for shrinking fastText models (in gensim format)

Stars: ✭ 124 (-57.24%)

Mutual labels: fasttext

fasttext-server

Flask web server to serve supervised models trained with FastText.

Stars: ✭ 25 (-91.38%)

Mutual labels: fasttext

fasttext-serverless

Serverless hashtag recommendations using fastText and Python with AWS Lambda

Stars: ✭ 20 (-93.1%)

Mutual labels: fasttext

word embedding

Sample code for training Word2Vec and FastText using a wiki corpus, along with their pretrained word embeddings.

Stars: ✭ 21 (-92.76%)

Mutual labels: fasttext

fasttext-serving

Serve your fastText models for text classification and word vectors

Stars: ✭ 21 (-92.76%)

Mutual labels: fasttext

nlpbuddy

A text analysis application for performing common NLP tasks through a web dashboard interface and an API

Stars: ✭ 115 (-60.34%)

Mutual labels: fasttext

node-fasttext

Nodejs binding for fasttext representation and classification.

Stars: ✭ 39 (-86.55%)

Mutual labels: fasttext

Persian-Sentiment-Analyzer

Persian sentiment analysis ( آناکاوی سهش های فارسی | تحلیل احساسات فارسی )

Stars: ✭ 30 (-89.66%)

Mutual labels: fasttext

Embedding

Summary of embedding model code and study notes

Stars: ✭ 25 (-91.38%)

Mutual labels: fasttext

FastText in Tensorflow

This project is based on the ideas in Facebook's FastText but implemented in Tensorflow. However, it is not an exact replica of fastText.

Classification works by embedding each word, averaging the embeddings over the full text, and passing that mean embedding to a linear classifier. The embedding is trained jointly with the classifier. You can also enable character ngrams of length 2 or more. These ngrams are hashed and then embedded in the same manner as the original words. Note that ngrams make training much slower but yield only marginal improvements in performance, at least in English.
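The mean-embedding idea described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's Tensorflow code: the function names, the toy sizes, and the random parameters are all hypothetical, and training is left out entirely.

```python
import math
import random


def mean_embedding(token_ids, table):
    """Average the embedding rows selected by token_ids."""
    dim = len(table[0])
    if not token_ids:
        return [0.0] * dim  # empty text: fall back to a zero vector
    total = [0.0] * dim
    for tid in token_ids:
        for j, value in enumerate(table[tid]):
            total[j] += value
    return [v / len(token_ids) for v in total]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def classify(token_ids, table, weights, bias):
    """Apply a linear classifier to the mean embedding; return class probabilities."""
    doc = mean_embedding(token_ids, table)
    logits = [sum(w * x for w, x in zip(row, doc)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


# Toy, randomly initialised parameters (hypothetical sizes, untrained).
rng = random.Random(0)
VOCAB, DIM, CLASSES = 10, 4, 3
table = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(VOCAB)]
weights = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(CLASSES)]
bias = [0.0] * CLASSES

probs = classify([1, 5, 7], table, weights, bias)
```

In a real model the embedding table, weights, and bias would all be learned together by gradient descent on the classification loss.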

I may implement skipgram and cbow training later, or preloading of embedding tables.

<< Still WIP >>

You can use Horovod to distribute training across multiple GPUs, on one or multiple servers. See usage section below.

FastText Language Identification

I have added utilities to train a classifier to detect languages, as described in Fast and Accurate Language Identification using FastText

See usage below. It basically works in the same way as default usage.

Implemented:

classification of text using word embeddings
char ngrams, hashed to n bins
training and prediction program
serve models on tensorflow serving
preprocess facebook format, or text input into tensorflow records
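
As a rough illustration of "char ngrams, hashed to n bins", the sketch below extracts character ngrams from each word and maps them to a fixed number of bins. It is an assumption-laden sketch: the function names are hypothetical, and `zlib.crc32` is only a stand-in for whatever hash function the project actually uses.

```python
import zlib


def char_ngrams(word, sizes=(2, 3, 4)):
    """Return all character ngrams of the given sizes, in order."""
    grams = []
    for n in sizes:
        for i in range(len(word) - n + 1):
            grams.append(word[i:i + n])
    return grams


def hash_to_bin(gram, num_bins):
    """Map an ngram to a bin index in [0, num_bins) with a deterministic hash."""
    return zlib.crc32(gram.encode("utf-8")) % num_bins


def ngram_ids(text, num_bins, sizes=(2, 3, 4)):
    """Hash every character ngram of every whitespace-separated word."""
    ids = []
    for word in text.split():
        ids.extend(hash_to_bin(g, num_bins) for g in char_ngrams(word, sizes))
    return ids


example_grams = char_ngrams("text", sizes=(2, 3))
example_ids = ngram_ids("some text", num_bins=1000)
```

Hashing keeps the ngram embedding table at a fixed size, at the cost of occasional collisions between unrelated ngrams.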

Not Implemented:

separate word vector training (though can export embeddings)
hierarchical softmax
quantize models (supported by tensorflow, but I haven't tried it yet)

Usage

The following are examples of how to use the applications. Run any of the programs with the --help option for full help.

To transform input data into tensorflow Example format:

process_input.py --facebook_input=queries.txt --output_dir=. --ngrams=2,3,4

Or, using a text file with one example per line with an extra file for labels:

process_input.py --text_input=queries.txt --labels=labels.txt --output_dir=.
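
For reference, fastText's own input format puts one example per line, with each label written as a token prefixed by `__label__`. Assuming that is what `--facebook_input` expects, a line could be split into labels and text roughly as follows. The function name is hypothetical, and process_input.py may tokenise differently.

```python
LABEL_PREFIX = "__label__"


def parse_fasttext_line(line):
    """Split one fastText-format line into (labels, text)."""
    labels, words = [], []
    for token in line.split():
        if token.startswith(LABEL_PREFIX):
            labels.append(token[len(LABEL_PREFIX):])
        else:
            words.append(token)
    return labels, " ".join(words)


labels, text = parse_fasttext_line("__label__weather will it rain today")
```

The two-file variant (`--text_input` plus `--labels`) carries the same information, with the text and label on matching lines of separate files.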

To train a text classifier:

classifier.py \
  --train_records=queries.tfrecords \
  --eval_records=queries.tfrecords \
  --label_file=labels.txt \
  --vocab_file=vocab.txt \
  --model_dir=model \
  --export_dir=model

To predict classifications for text, use a saved model exported by classifier.py. The --export_dir option of classifier.py stores each saved model in a numbered directory below export_dir. Pass that numbered directory to predictor.py to use the model for predictions:

predictor.py \
  --saved_model=model/12345678 \
  --text="some text to classify" \
  --signature_def=proba
To export the embedding layer, run predictor.py with the embedding signature. Note that this exports only the text embedding, not the ngram embeddings.

predictor.py \
  --saved_model=model/12345678 \
  --text="some text to classify" \
  --signature_def=embedding

Use the provided script to train easily:

train_classifier.sh path-to-data-directory

Language Identification

To implement something similar to the method described in Fast and Accurate Language Identification using FastText you need to download the data:

lang_dataset.sh [datadir]

You can then process the training and validation data using process_input.py and classifier.py as described above.

There is a utility script to do this for you:

train_langdetect.sh datadir

It reaches about 96% accuracy using word embeddings alone, and this increases to nearly 99% when adding --ngrams=2,3,4.

Distributed Training

You can run training across multiple GPUs, on either one or multiple servers. To do so, install MPI and Horovod, then add the --horovod option. Training speed scales almost linearly with the number of GPUs: with 2 GPUs on your server, it should run close to 2x as fast.

NUM_GPUS=2
mpirun -np $NUM_GPUS python classifier.py \
  --horovod \
  --train_records=queries.tfrecords \
  --eval_records=queries.tfrecords \
  --label_file=labels.txt \
  --vocab_file=vocab.txt \
  --model_dir=model \
  --export_dir=model

The train_classifier.sh training script also supports this option.

Tensorflow Serving

Besides running a saved model with predictor.py, you can serve it with Tensorflow Serving in a client-server setup. A simple RPC client (predictor_client.py) is supplied that gets predictions from the Tensorflow Serving server.

First make sure you install the Tensorflow Serving binaries. See the Tensorflow Serving documentation for installation instructions.

You then serve the latest saved model by supplying the base export directory where you exported saved models to. This directory will contain the numbered model directories:

tensorflow_model_server --port=9000 --model_base_path=model

Now you can make requests to the server using gRPC calls. An example simple client is provided in predictor_client.py:

predictor_client.py --text="Some text to classify"

Facebook Examples

<< NOT IMPLEMENTED YET >>

You can compare with Facebook's fastText by running similar examples to what's provided in their repository.

./classification_example.sh
./classification_results.sh


Stars: ✭ 290
