
CareerVillage / fasttext-serverless

Licence: MIT license
Serverless hashtag recommendations using fastText and Python with AWS Lambda

Programming Languages

python

Projects that are alternatives of or similar to fasttext-serverless

Text Classification Demos
Neural models for Text Classification in Tensorflow, such as cnn, dpcnn, fasttext, bert ...
Stars: ✭ 144 (+620%)
Mutual labels:  fasttext
Cw2vec
cw2vec: Learning Chinese Word Embeddings with Stroke n-gram Information
Stars: ✭ 224 (+1020%)
Mutual labels:  fasttext
Text Classification TF
Implements various text classification models in TensorFlow, wrapped in a RESTful interface and ready for production use
Stars: ✭ 32 (+60%)
Mutual labels:  fasttext
Fasttext4j
Implementing Facebook's FastText with java
Stars: ✭ 148 (+640%)
Mutual labels:  fasttext
Sentence Classification
Sentence Classifications with Neural Networks
Stars: ✭ 177 (+785%)
Mutual labels:  fasttext
Pytorch Sentiment Analysis
Tutorials on getting started with PyTorch and TorchText for sentiment analysis.
Stars: ✭ 3,209 (+15945%)
Mutual labels:  fasttext
Whatthelang
Lightning Fast Language Prediction 🚀
Stars: ✭ 130 (+550%)
Mutual labels:  fasttext
fasttext-serving
Serve your fastText models for text classification and word vectors
Stars: ✭ 21 (+5%)
Mutual labels:  fasttext
Shallowlearn
An experiment about re-implementing supervised learning models based on shallow neural network approaches (e.g. fastText) with some additional exclusive features and nice API. Written in Python and fully compatible with Scikit-learn.
Stars: ✭ 196 (+880%)
Mutual labels:  fasttext
Simple-Sentence-Similarity
Exploring the simple sentence similarity measurements using word embeddings
Stars: ✭ 99 (+395%)
Mutual labels:  fasttext
Embedding As Service
One-Stop Solution to encode sentence to fixed length vectors from various embedding techniques
Stars: ✭ 151 (+655%)
Mutual labels:  fasttext
Wordvectors
Pre-trained word vectors of 30+ languages
Stars: ✭ 2,043 (+10115%)
Mutual labels:  fasttext
Ai law
All kinds of baseline models for long text classification (text categorization)
Stars: ✭ 243 (+1115%)
Mutual labels:  fasttext
Wordembeddings Elmo Fasttext Word2vec
Using pre trained word embeddings (Fasttext, Word2Vec)
Stars: ✭ 146 (+630%)
Mutual labels:  fasttext
fastchess
Predicts the best chess move with 27.5% accuracy by a single matrix multiplication
Stars: ✭ 75 (+275%)
Mutual labels:  fasttext
Nlp research
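NLP research: a TensorFlow-based NLP deep learning project supporting four major tasks: text classification, sentence matching, sequence labeling, and text generation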
Stars: ✭ 141 (+605%)
Mutual labels:  fasttext
Pyfasttext
Yet another Python binding for fastText
Stars: ✭ 229 (+1045%)
Mutual labels:  fasttext
ungoliant
🕷️ The pipeline for the OSCAR corpus
Stars: ✭ 69 (+245%)
Mutual labels:  fasttext
actions-suggest-related-links
A GitHub Action to suggest related or similar issues, documents, and links. Based on the power of NLP and fastText.
Stars: ✭ 23 (+15%)
Mutual labels:  fasttext
fasttextjs
JavaScript implementation of the FastText prediction algorithm
Stars: ✭ 31 (+55%)
Mutual labels:  fasttext

fasttext-serverless

Serverless hashtag recommendations using fastText and Python with AWS Lambda.

A simple HTTP POST endpoint that returns hashtag recommendations. The function requires a pre-trained fastText model. When you POST a properly formatted string in the request body, the endpoint replies with JSON containing up to 5 topic recommendations that the model believes match that string. It also returns a list of the hashtags already included in the submitted text, so you can handle collisions if you want to. Although the internal function is named tagRecommendations, the HTTP endpoint is exposed as recommendations.
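
The "hashtags already included" part of the description can be illustrated with a minimal sketch. This is not the project's actual handler code: the regular expression, the function names extract_hashtags and new_recommendations, and the choice to lowercase tags are all assumptions made for illustration.

```python
import re

# Assumed pattern: '#' followed by letters, digits, underscores, or hyphens.
HASHTAG_PATTERN = re.compile(r"#([A-Za-z0-9_][A-Za-z0-9_-]*)")


def extract_hashtags(text):
    """Return hashtags found in text, lowercased, in order of first appearance."""
    seen = []
    for match in HASHTAG_PATTERN.finditer(text):
        tag = "#" + match.group(1).lower()
        if tag not in seen:  # skip duplicates, keep first occurrence
            seen.append(tag)
    return seen


def new_recommendations(recommended, already_used):
    """Drop recommended tags that the text already uses (a collision)."""
    used = set(already_used)
    return [tag for tag in recommended if tag not in used]


if __name__ == "__main__":
    question = "I want to help children recover. #doctor #medicine"
    used = extract_hashtags(question)
    print(used)
    print(new_recommendations(["#doctor", "#pediatrician"], used))
```

A caller could use something like new_recommendations to hide suggestions the author has already typed.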

Setup

Step 1: Clone this repo

$ git clone https://github.com/CareerVillage/fasttext-serverless/

Step 2: Install and configure Serverless
Refer to the Serverless docs [1, 2] for help.

$ npm install -g serverless
$ serverless config credentials --provider aws --key AKIAIOSFODNN7EXAMPLE --secret wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

Step 3: Add your pre-trained classification model
Save your pre-trained model file in this project as /trained_models/model_standard.bin.

$ mv model_standard.bin /path/to/fasttext-serverless/trained_models/model_standard.bin

If you have not yet trained a model, refer to the fastText docs for help. You'll be looking to use the fasttext supervised command to generate the model.

Step 4: Deploy to AWS
Assuming you have properly configured Serverless to access AWS, run serverless deploy to deploy the endpoint. Adding the -v ("verbose") flag prints more detailed logs. You should see something like this:

$ serverless deploy -v
Serverless: Packaging service...
Serverless: Excluding development dependencies...
Serverless: Uploading CloudFormation file to S3...
Serverless: Uploading artifacts...
Serverless: Uploading service .zip file to S3 (27.71 MB)...
Serverless: Validating template...
Serverless: Updating Stack...
Serverless: Checking Stack update progress...
..............
Serverless: Stack update finished...
Service Information
service: lambda
stage: dev
region: us-east-1
stack: lambda-dev
api keys:
  None
endpoints:
  POST - https://{your-subdomain-here}.execute-api.{your-region-code-here}.amazonaws.com/dev/recommendations
functions:
  tagRecommendations: lambda-dev-tagRecommendations
Serverless: Removing old service versions...

Usage

You can now send an HTTP POST request directly to the endpoint. For example, using curl:

curl -X POST https://{your-subdomain-here}.execute-api.{your-region-code-here}.amazonaws.com/dev/recommendations --data '{ "text": "What should I do in the evenings and weekends during high school to become a pediatrician? I want to become a doctor after college so that I can help children recover from terrible diseases and illnesses. #doctor #medicine" }'
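
The same request can be sketched in Python using only the standard library. The helper names build_request and fetch_recommendations are hypothetical, and the Content-Type header is an assumption (the curl command above does not set one explicitly). Replace the placeholder URL with the endpoint printed by serverless deploy.

```python
import json
import urllib.request


def build_request(endpoint_url, text):
    """Build a POST request whose JSON body has a single "text" field."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


def fetch_recommendations(endpoint_url, text, timeout=10):
    """Send the request and decode the JSON reply into a dict."""
    request = build_request(endpoint_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholder URL: substitute your own subdomain and region.
    url = "https://{your-subdomain-here}.execute-api.{your-region-code-here}.amazonaws.com/dev/recommendations"
    request = build_request(url, "How do I become a pediatrician? #doctor")
    print(request.get_method(), request.data.decode("utf-8"))
```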

The expected result should be similar to:

{"hashtags_already_used": "#doctor #healthcare #medicine", "hashtags_recommended": "('__label__doctor 0.662109 __label__pediatrician 0.0585938 __label__medicine 0.015625 __label__pre-med 0.0136719 __label__surgeon 0.0136719\\n', '')"}

Success!
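
The hashtags_recommended value in the sample reply is a single string, not a list, so a client has to pull the labels and probabilities out itself. Below is a minimal sketch of one way to do that. It assumes the reply always has the shape shown above, and parse_response is a hypothetical helper name.

```python
import json
import re

# Matches "__label__<tag> <probability>" pairs inside the returned string.
PREDICTION_PATTERN = re.compile(
    r"__label__(\S+)\s+([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)"
)


def parse_response(raw_json):
    """Return (already_used, recommended) from the endpoint's JSON reply.

    already_used is a list of hashtag strings; recommended is a list of
    (hashtag, probability) tuples in the order the endpoint returned them.
    """
    payload = json.loads(raw_json)
    already_used = payload["hashtags_already_used"].split()
    recommended = [
        ("#" + label, float(probability))
        for label, probability in PREDICTION_PATTERN.findall(
            payload["hashtags_recommended"]
        )
    ]
    return already_used, recommended
```

Returning plain tuples keeps the sketch small; a client that needs only the top suggestion can take the first element of the recommended list.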

Updating fastText assets

Updating the fastText binary

It's important to make the fastText binary using the same environment as the one your serverless function will run in. I followed the approach used here to set up an EC2 instance, import everything needed, and then make and download the binary. The fastText binary included in this project was built using fastText version 0.1.0 with:

$ wget https://github.com/facebookresearch/fastText/archive/v0.1.0.zip
$ unzip v0.1.0.zip
$ cd fastText-0.1.0
$ make

If you would like to update the fastText binary, you should follow a similar set of steps: ssh into a running EC2 instance (which is running an Amazon Linux AMI), follow the instructions at https://github.com/facebookresearch/fastText to update to the latest version of fastText so you can make the binary, and then copy (scp) the binary file into the folder for this repo.

Updating the classification model file

To update the model_standard.bin file, you need training data properly formatted for fastText training (e.g., training_set.txt) and cleaned in the same way as the text will be cleaned on the machine doing prediction. For example, for CareerVillage we remove all punctuation, remove all HTML tags, and lowercase all characters. For optimal results, you should also have a local copy of the Wikipedia-based English word vectors file provided by fastText (wiki.en.vec). Training is completed with the following parameters:

$ ./fasttext supervised -input ./data/questions_set_for_training.txt -output model -pretrainedVectors ./data/wiki.en.vec -verbose 2 -lr 1.0 -epoch 20 -dim 300 -wordNgrams 2 -neg 10 -bucket 10000

If you use the pretrained vectors, your model will almost certainly be too large for AWS Lambda, so you will need to use fastText's quantize command to reduce the file size. More information is available at https://github.com/facebookresearch/fastText#text-classification
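
The cleaning steps described above (strip HTML tags, remove punctuation, lowercase) can be sketched as follows. This is an illustration, not CareerVillage's actual preprocessing code. The function names are hypothetical, and deriving the training labels from a question's hashtags is an assumption. The __label__ prefix is fastText's default label marker for supervised training.

```python
import re
import string

TAG_PATTERN = re.compile(r"<[^>]+>")  # crude HTML tag matcher (assumption)
HASHTAG_PATTERN = re.compile(r"#([A-Za-z0-9_][A-Za-z0-9_-]*)")
REMOVE_PUNCTUATION = str.maketrans("", "", string.punctuation)


def clean_text(text):
    """Strip HTML tags, delete punctuation, lowercase, collapse whitespace."""
    without_html = TAG_PATTERN.sub(" ", text)
    without_punctuation = without_html.translate(REMOVE_PUNCTUATION)
    return " ".join(without_punctuation.lower().split())


def training_line(text):
    """Build one fastText training line: labels first, then the cleaned text.

    Assumption: each hashtag in the question becomes one __label__ entry.
    """
    labels = ["__label__" + tag.lower() for tag in HASHTAG_PATTERN.findall(text)]
    body = clean_text(HASHTAG_PATTERN.sub(" ", text))
    return " ".join(labels + [body])


if __name__ == "__main__":
    print(training_line("How do I become a <b>Pediatrician</b>? #doctor #pre-med"))
```

Whatever cleaning you choose, apply the same clean_text step to incoming text at prediction time so the model sees input in the form it was trained on.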

Scaling

By default, AWS Lambda limits the total concurrent executions across all functions within a given region. The limit was 100 when this README was written; AWS has since raised the default to 1,000, so check the current quota for your account. The default is a safety limit that protects you from costs due to runaway or recursive functions during initial development and testing. To raise it above the default, follow the steps in "To request a limit increase for concurrent executions" in the AWS Lambda documentation.
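
If you call the endpoint in bulk, one way to stay under the concurrency limit is to cap the number of in-flight requests on the client side. The sketch below assumes each in-flight request occupies roughly one concurrent execution. recommend_many and its send parameter are hypothetical names; send stands for whatever function performs a single POST.

```python
from concurrent.futures import ThreadPoolExecutor


def recommend_many(texts, send, max_in_flight=10):
    """Call send(text) for every text, running at most max_in_flight at once.

    Results are returned in the same order as the input texts.
    """
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(send, texts))
```

Choose max_in_flight well below your account's limit if other functions in the same region share it.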

References

License

Please refer to the LICENSE file for license information applying to everything in this project except for the fastText binary. The license for the fastText binary is in the LICENSE_FASTTEXT file.
