
NLP - News classification

Train and deploy a news classifier based on ULMFit.

Running on cloud/local machine

To run the application, we can use the pre-built Docker image available on Docker Hub and simply run the following command:

docker run --rm -p 8080:8080 imadelh/news:v1

The application will be available at http://0.0.0.0:8080. To specify the number of workers or provide an HTTPS certificate, run a customized Gunicorn command inside the container:

# Get into the container
docker run -it --rm -v ~/nlp:/cert -p 8080:8080 imadelh/news:v1 bash

# Run Gunicorn with a specific number of workers/threads and HTTPS certificates
gunicorn --certfile '/path_to/chain.pem' --keyfile '/path_to/key.pem' --workers=4 --bind 0.0.0.0:8080 wsgi:app
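Gunicorn's `wsgi:app` argument refers to a WSGI callable named `app` exposed by a `wsgi.py` module inside the image. As a hedged illustration of what that entry point looks like (the project's real app presumably wraps the classifier in a web framework; this stand-in uses only the stdlib WSGI contract):

```python
# wsgi.py - minimal stand-in for the WSGI entry point (illustrative only;
# the real application serves the news classifier, likely via a framework).
def app(environ, start_response):
    # A WSGI callable receives the request environ and a start_response hook.
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    # The body must be an iterable of bytes.
    return [b"news classifier placeholder\n"]
```

Gunicorn would serve this with `gunicorn --bind 0.0.0.0:8080 wsgi:app`.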

Serverless deployment - Google Cloud Run

Cloud Run is a GCP service that allows serverless deployment of containers behind HTTPS endpoints. The app will run on 1 CPU with 2 GB of memory and will scale automatically with the number of concurrent requests.

  • Build image and push it to Container Registry

From a GCP project, we will use Cloud Shell to build the image and push it to the Container Registry (GCR).

# Get name of project 
# For illustration we will call it PROJECT-ID

gcloud config get-value project

Create the following Dockerfile in your Cloud Shell session.

FROM imadelh/news:v_1cpu

# Cloud Run provides the PORT env variable

CMD gunicorn --bind :$PORT wsgi:app

Finally, we can build and submit the image to GCR.

gcloud builds submit --tag gcr.io/PROJECT-ID/news_classifier

  • Deploy on Cloud Run

From the Cloud Run page, we will use the image gcr.io/PROJECT-ID/news_classifier:latest to run the app. Create a new service, enter the address of the image, choose the remaining parameters (region, memory, maximum number of instances), and deploy.

After a few seconds, you will see a link to the app.

The serverless version may suffer from cold starts if the service receives no requests for a long time.

Reproduce results

LR and SVM

  • Requirements

To reproduce the results reported in the blog post, we need to install the requirements in our development environment.

# Open requirements.txt and select torch==1.1.0 instead of the CPU-only version used for inference.
# Then install the requirements
pip install -r requirements.txt

  • Hyper-parameter search

After completing the installation, we can run the hyper-parameter search or train the sklearn models as follows:

# Params search for SVM
cd sklearn_models
python3 params_search.py --model svc --exp_name svmsearch_all --data dataset_processed

# Params search for LR
python3 params_search.py --model lreg --exp_name logreg_all --data dataset_processed

The parameter space is defined in the file sklearn_models/params_search.py. The outputs are saved in the logs folder.
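As an illustration of this kind of search, here is a hedged sketch using scikit-learn's GridSearchCV; the pipeline, parameter names, and values below are assumptions for the example, not the project's actual grid:

```python
# Illustrative hyper-parameter search over a TF-IDF + linear-SVM pipeline;
# the project's real search space lives in sklearn_models/params_search.py.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

texts = [
    "stocks rally as markets open", "team wins the championship game",
    "new phone released this week", "election results announced today",
    "quarterly earnings beat forecasts", "striker scores twice in final",
    "chip maker unveils faster processor", "senate passes the new bill",
]
labels = ["business", "sport", "tech", "politics"] * 2

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LinearSVC()),
])

# Small illustrative grid: TF-IDF n-gram range and the SVM's C parameter,
# addressed through the pipeline step prefixes.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0],
}

search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(texts, labels)
print(search.best_params_)
```

The real script presumably logs `best_params_` and the cross-validation scores to the logs folder.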

  • Training

Training a model with a fixed set of parameters can be done using sklearn_models/baseline.py:

# Specify the parameters of the model inside baseline.py and run
python3 baseline.py --model svc --exp_name svc_all --data dataset_processed

The logs/metrics on the test dataset will be saved in sklearn_models/logs/ and the trained model in sklearn_models/saved_models/.
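The train-then-persist step can be sketched as follows; the TF-IDF + SVM pipeline, file name, and save path here are assumptions, with the actual logic in sklearn_models/baseline.py:

```python
# Illustrative baseline: train a model with fixed hyper-parameters and
# persist it, mirroring what sklearn_models/baseline.py does (details assumed).
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

texts = ["markets fall sharply", "goalkeeper saves penalty",
         "browser update fixes bugs", "prime minister gives speech"]
labels = ["business", "sport", "tech", "politics"]

model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LinearSVC(C=1.0)),  # a fixed parameter set, as in baseline.py
])
model.fit(texts, labels)

# Persist the trained pipeline; the real script writes to saved_models/.
path = os.path.join(tempfile.gettempdir(), "svc_all.joblib")
joblib.dump(model, path)
restored = joblib.load(path)
print(restored.predict(["markets fall sharply"]))
```

Loading the saved pipeline restores both the vectorizer vocabulary and the classifier weights, so inference needs no retraining.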

ULMFit

To reproduce or train the ULMFit model, use the notebooks available in ulmfit/. The same requirements as above are needed. A GPU is required to fine-tune the language model; this can be done on Google Colab.
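ULMFit fine-tuning uses the slanted triangular learning-rate schedule from Howard & Ruder (2018): the rate rises linearly for a short fraction of training, then decays linearly. A stdlib sketch of that schedule (the default values below are illustrative, not the notebooks' settings):

```python
# Slanted triangular learning rate (Howard & Ruder, 2018).
# The rate rises linearly for the first cut_frac of steps, then decays linearly.
import math

def slanted_triangular_lr(t, total_steps, lr_max=0.01, cut_frac=0.1, ratio=32):
    cut = math.floor(total_steps * cut_frac)
    if t < cut:
        p = t / cut                                  # warm-up phase
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))  # decay phase
    return lr_max * (1 + p * (ratio - 1)) / ratio

# The peak lr_max is reached at step cut; both ends sit at lr_max / ratio.
```

In practice fastai applies this schedule for you during `fit`, combined with discriminative learning rates per layer group.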

  • Notebook contents:

    • Data preparation
    • Fine-tuning ULMFit
    • Training the ULMFit classifier
    • Predictions and evaluation
    • Exporting the trained model
    • Inference on CPU

To run the training, we need to specify the path to the folder where the training data is stored.

  • Locally:

Save the data from data/, then specify the absolute PATH at the beginning of the notebook.

# This is the absolute path to the folder where "data" is available
PATH = "/app/analyse/"

  • Google Colab:

Save the data in a Google Drive folder, for example files/nlp/.

# The folder 'data' is saved in Google Drive under "files/nlp/"
# While running the notebook from Google Colab, mount the drive and define PATH to the data
from google.colab import drive
drive.mount('/content/gdrive/')

# Then give the path where your data is stored (in Google Drive)
PATH = "/content/gdrive/My Drive/files/nlp/"

01_ulmfit_balanced_dataset.ipynb - Train ULMFit on the balanced dataset

02_ulmfit_all_data.ipynb - Train ULMFit on the full dataset

Performance

Performance of ULMFit on the test dataset data/dataset_inference (see the end of 02_ulmfit_all_data.ipynb for the definition of the test dataset).

# ULMFit - Performance on test dataset
              precision    recall  f1-score   support
micro avg                               0.73     20086
macro avg          0.66      0.61      0.63     20086
weighted avg       0.72      0.73      0.72     20086

Top-3 accuracy on the test dataset:
0.9044
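Top-3 accuracy counts a prediction as correct when the true label is among the three highest-scoring classes. A stdlib sketch of the metric (the notebook presumably computes it from the model's per-class scores):

```python
# Top-k accuracy: fraction of samples whose true label is among the
# k highest-scoring classes.
def top_k_accuracy(scores, true_labels, k=3):
    hits = 0
    for row, truth in zip(scores, true_labels):
        # Indices of the k largest scores in this row.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += truth in top_k
    return hits / len(true_labels)

# Example with 4 classes: the true class 2 is ranked 2nd, so it counts as a hit.
print(top_k_accuracy([[0.1, 0.5, 0.3, 0.1]], [2], k=3))  # 1.0
```

With k=1 this reduces to ordinary accuracy, which is why top-3 (0.9044) is well above the 0.73 micro-average above.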

The trained model is available for download at: https://github.com/imadelh/NLP-news-classification/releases/download/v1.0/ulmfit_model

This project is a very basic text classifier. Here is a list of other features that could be added:

  • A feedback option to let the user submit a correction to the prediction.
  • Periodic fine-tuning of the model on new feedback.
  • A comparison against other language models (BERT, XLNet, etc.).

Imad El Hanafi
