
dimitreOliveira / bert-as-a-service_TFX

License: MIT License
End-to-end pipeline with TFX to train and deploy a BERT model for sentiment analysis.

Programming Languages

Jupyter Notebook

Projects that are alternatives to, or similar to, bert-as-a-service_TFX

tfx-kubeflow-pipelines
Kubeflow pipelines built on top of Tensorflow TFX library
Stars: ✭ 17 (-46.87%)
Mutual labels:  tfx, mlops
PDN
The official PyTorch implementation of "Pathfinder Discovery Networks for Neural Message Passing" (WebConf '21)
Stars: ✭ 44 (+37.5%)
Mutual labels:  transformer, bert
Kevinpro-NLP-demo
All the NLP you need, here. Personal PyTorch implementations of some fun NLP demos; currently includes 13 NLP applications.
Stars: ✭ 117 (+265.63%)
Mutual labels:  transformer, bert
Filipino-Text-Benchmarks
Open-source benchmark datasets and pretrained transformer models in the Filipino language.
Stars: ✭ 22 (-31.25%)
Mutual labels:  transformer, bert
are-16-heads-really-better-than-1
Code for the paper "Are Sixteen Heads Really Better than One?"
Stars: ✭ 128 (+300%)
Mutual labels:  transformer, bert
sticker2
Further developed as SyntaxDot: https://github.com/tensordot/syntaxdot
Stars: ✭ 14 (-56.25%)
Mutual labels:  transformer, bert
COVID-19-Tweet-Classification-using-Roberta-and-Bert-Simple-Transformers
Rank 1 / 216
Stars: ✭ 24 (-25%)
Mutual labels:  transformer, bert
TabFormer
Code & Data for "Tabular Transformers for Modeling Multivariate Time Series" (ICASSP, 2021)
Stars: ✭ 209 (+553.13%)
Mutual labels:  transformer, bert
golgotha
Contextualised Embeddings and Language Modelling using BERT and Friends in R
Stars: ✭ 39 (+21.88%)
Mutual labels:  transformer, bert
transformer-models
Deep Learning Transformer models in MATLAB
Stars: ✭ 90 (+181.25%)
Mutual labels:  transformer, bert
KitanaQA
KitanaQA: Adversarial training and data augmentation for neural question-answering models
Stars: ✭ 58 (+81.25%)
Mutual labels:  transformer, bert
text-generation-transformer
Text generation based on the Transformer
Stars: ✭ 36 (+12.5%)
Mutual labels:  transformer, bert
tensorflow-ml-nlp-tf2
Hands-on materials for "Natural Language Processing with TensorFlow 2 and Machine Learning (from logistic regression to BERT and GPT-3)"
Stars: ✭ 245 (+665.63%)
Mutual labels:  transformer, bert
NLP-paper
🎨🎨 NLP (natural language processing) tutorials 🎨🎨 https://dataxujing.github.io/NLP-paper/
Stars: ✭ 23 (-28.12%)
Mutual labels:  transformer, bert
FasterTransformer
Transformer related optimization, including BERT, GPT
Stars: ✭ 1,571 (+4809.38%)
Mutual labels:  transformer, bert
rasa milktea chatbot
Chatbot with a Chinese BERT model, based on the Rasa framework (a Chinese chatbot combining BERT intent analysis, built on Rasa)
Stars: ✭ 97 (+203.13%)
Mutual labels:  bert, bert-as-service
sister
SImple SenTence EmbeddeR
Stars: ✭ 66 (+106.25%)
Mutual labels:  transformer, bert
les-military-mrc-rank7
LES Cup: 2nd National "Military Intelligent Machine Reading" Challenge - Rank 7 solution
Stars: ✭ 37 (+15.63%)
Mutual labels:  transformer, bert
Xpersona
XPersona: Evaluating Multilingual Personalized Chatbot
Stars: ✭ 54 (+68.75%)
Mutual labels:  transformer, bert
semantic-document-relations
Implementation, trained models and result data for the paper "Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles"
Stars: ✭ 21 (-34.37%)
Mutual labels:  transformer, bert



BERT as a service

This repository demonstrates a simple yet complete machine learning solution: a BERT model for text sentiment analysis, trained and deployed through a TensorFlow Extended (TFX) end-to-end pipeline that applies some of the best practices from the MLOps domain. It covers every step from data ingestion to model serving, with the served model consumed through either REST or gRPC requests.
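
As a quick illustration of the serving side, here is a minimal sketch of querying such a model through TensorFlow Serving's standard REST predict endpoint. The model name `bert_sentiment`, host, and port are illustrative assumptions, not values taken from this repository:

```python
import requests

# TensorFlow Serving's standard REST "predict" endpoint. The model name,
# host, and port below are illustrative assumptions.
SERVER_URL = "http://localhost:8501/v1/models/bert_sentiment:predict"

# The served model has a built-in tokenizer, so raw text strings can be
# sent directly as instances.
payload = {"instances": ["This movie was a fantastic surprise, loved it!"]}

response = requests.post(SERVER_URL, json=payload)
response.raise_for_status()

# One sentiment score per input review.
print(response.json()["predictions"])
```

A gRPC client would send the same instances through TensorFlow Serving's PredictionService instead of the JSON endpoint.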


Content

  • Pipelines
    • Notebook (Google Colab)
    • GCP (Kubeflow) [link]
    • GCP (Vertex AI) [link]
    • Local (Airflow) TODO
  • Documentation [link]
  • Data [link]

Pipeline description

[Image: overview of the TFX pipeline (image source)]

The end-to-end TFX pipeline covers most of the main areas of a machine learning solution, from data ingestion and validation to model training and serving; those steps are described below. The repository also provides different options for managing the pipeline through orchestrators: the ones covered are Airflow, Kubeflow, and an interactive option that can be used in Google Colab for demonstration purposes. A minimal sketch of how the components are wired together follows the component list.

  • ExampleGen is the initial input component of a pipeline that ingests and optionally splits the input dataset.
    • Reads the IMDB dataset stored as a CSV file and splits the data into train (2/3) and validation (1/3) sets.
  • StatisticsGen calculates statistics for the dataset.
    • Generates statistics for the text and label distributions.
  • SchemaGen examines the statistics and creates a data schema.
  • ExampleValidator looks for anomalies and missing values in the dataset.
    • Validates the input data based on the SchemaGen's schema.
  • Transform performs feature engineering on the dataset.
    • Imputes missing data and performs basic data pre-processing.
  • Tuner uses KerasTuner to perform hyperparameter tuning for the model.
    • The optimal hyperparameters are passed on to the Trainer.
  • Trainer trains the model.
    • Fine-tunes the custom pre-trained BERT model, which also has a built-in text tokenizer.
  • Resolver performs model validation.
    • Resolves the latest blessed model to serve as the baseline for model validation.
  • Evaluator performs deep analysis of the training results and helps you validate your exported models, ensuring that they are "good enough" to be pushed to production.
    • Evaluates the model's accuracy over the complete dataset and across different data slices, and compares new models against the baseline.
  • InfraValidator is used as an early warning layer before pushing a model into production. The name "infra" validator comes from the fact that it validates the model in the actual model serving "infrastructure".
  • Pusher deploys the model on a serving infrastructure.
    • Exports the model for serving if the new model improved over the baseline.
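
Below is a minimal sketch of how these components could be wired together with the TFX Python API, using the local orchestrator and omitting the Tuner, Resolver, Evaluator, and InfraValidator for brevity. Module file names and paths are illustrative assumptions:

```python
from tfx import v1 as tfx

# Data ingestion and validation.
example_gen = tfx.components.CsvExampleGen(input_base="data/")
statistics_gen = tfx.components.StatisticsGen(
    examples=example_gen.outputs["examples"])
schema_gen = tfx.components.SchemaGen(
    statistics=statistics_gen.outputs["statistics"])
example_validator = tfx.components.ExampleValidator(
    statistics=statistics_gen.outputs["statistics"],
    schema=schema_gen.outputs["schema"])

# Feature engineering and training; the module files are assumed to define
# the usual preprocessing_fn and run_fn entry points.
transform = tfx.components.Transform(
    examples=example_gen.outputs["examples"],
    schema=schema_gen.outputs["schema"],
    module_file="transform_module.py")
trainer = tfx.components.Trainer(
    module_file="trainer_module.py",
    examples=transform.outputs["transformed_examples"],
    transform_graph=transform.outputs["transform_graph"],
    train_args=tfx.proto.TrainArgs(num_steps=100),
    eval_args=tfx.proto.EvalArgs(num_steps=10))

# Export the trained model to a serving directory.
pusher = tfx.components.Pusher(
    model=trainer.outputs["model"],
    push_destination=tfx.proto.PushDestination(
        filesystem=tfx.proto.PushDestination.Filesystem(
            base_directory="serving_model/")))

pipeline = tfx.dsl.Pipeline(
    pipeline_name="bert_sentiment_pipeline",
    pipeline_root="pipeline_root/",
    components=[example_gen, statistics_gen, schema_gen,
                example_validator, transform, trainer, pusher])

# Run with the local orchestrator; Airflow and Kubeflow runners accept the
# same pipeline object.
tfx.orchestration.LocalDagRunner().run(pipeline)
```

Swapping LocalDagRunner for an Airflow or Kubeflow DAG runner is what moves the same pipeline between the orchestrators listed above.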

Model description

For the modeling part we use BERT. For better performance we apply transfer learning, which means starting from a model that was pre-trained on another task (usually one that is more generic or similar). From the pre-trained model we keep all layers up to the output of the last embedding; to be more specific, we use only the output of the [CLS] token, shown in the image below. On top of that we add a classifier layer, responsible for classifying the input text as positive or negative. This task is known as sentiment analysis and is very common in natural language processing.

[Image: BERT [CLS] token output used for classification (image source)]
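
The following sketch shows this transfer-learning setup in Keras: a pre-trained BERT encoder from TF Hub with its matching preprocessing (tokenizer) model, topped by a single sigmoid classifier layer over the pooled [CLS] output. The TF Hub handles are assumptions for illustration; the repository may use a different BERT variant:

```python
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # noqa: F401 -- registers ops used by the preprocessor

# Assumed TF Hub handles for illustration (a small BERT for fast training).
PREPROCESS_HANDLE = "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3"
ENCODER_HANDLE = ("https://tfhub.dev/tensorflow/small_bert/"
                  "bert_en_uncased_L-2_H-128_A-2/2")

# Raw strings go in; the preprocessing layer acts as the built-in tokenizer.
text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name="text")
tokenized = hub.KerasLayer(PREPROCESS_HANDLE)(text_input)
encoder_outputs = hub.KerasLayer(ENCODER_HANDLE, trainable=True)(tokenized)

# "pooled_output" is the transformed [CLS] token representation.
cls_output = encoder_outputs["pooled_output"]
prediction = tf.keras.layers.Dense(1, activation="sigmoid")(cls_output)

model = tf.keras.Model(text_input, prediction)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```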


Dataset description

The dataset used for training and evaluating the model is the well-known IMDB review dataset, which contains 25,000 movie reviews labeled as either negative (label 0) or positive (label 1). The dataset was slightly processed for use here: labels were encoded as integers (0 or 1), and, for faster experimentation, the data was reduced to only 5,000 samples.
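
A sketch of that light preprocessing, assuming the raw reviews sit in a CSV file with hypothetical "review" and "sentiment" columns:

```python
import pandas as pd

# Hypothetical file and column names, for illustration only.
df = pd.read_csv("imdb_reviews.csv")

# Encode the string labels as integers: negative -> 0, positive -> 1.
df["sentiment"] = df["sentiment"].map({"negative": 0, "positive": 1})

# Reduce to 5,000 samples for faster experimentation, as described above.
df = df.sample(n=5000, random_state=42).reset_index(drop=True)
df.to_csv("imdb_reviews_small.csv", index=False)
```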
