
brightmart / Sentiment_analysis_fine_grain

Multi-label Classification with BERT; Fine Grained Sentiment Analysis from AI challenger


Introduction

With this repository, you will be able to train a multi-label classifier with BERT,

and deploy BERT for online prediction.

You can also find a short tutorial on how to use BERT with Chinese: BERT short chinese tutorial

You can find an introduction to the fine-grained sentiment analysis task from AI Challenger

Basic Ideas

Add something here.

Experiment on New Models

For more details, check model/bert_cnn_fine_grain_model.py

Performance

Model                                          F1 Score
TextCNN (no pre-training)                      0.678
TextCNN (pre-training + fine-tuning)           0.685
BERT (base_model_zh)                           ADD A NUMBER HERE
BERT (base_model_zh, pre-trained on corpus)    ADD A NUMBER HERE

Note: F1 scores are reported on the validation set.

Usage

BERT for Multi-label Classification [data for fine-tuning and pre-training]

export BERT_BASE_DIR=BERT_BASE_DIR/chinese_L-12_H-768_A-12
export TEXT_DIR=TEXT_DIR
nohup python run_classifier_multi_labels_bert.py \
  --task_name=sentiment_analysis \
  --do_train=true \
  --do_eval=true \
  --data_dir=$TEXT_DIR \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --max_seq_length=512 \
  --train_batch_size=4 \
  --learning_rate=2e-5 \
  --num_train_epochs=3 \
  --output_dir=./checkpoint_bert &

1. First, download the pre-trained model from Google and put it in a folder (e.g. BERT_BASE_DIR):

chinese_L-12_H-768_A-12 from <a href='https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip'>bert</a>

2. Second, you need training data (e.g. train.tsv) and validation data (e.g. dev.tsv); put them in a folder (e.g. TEXT_DIR).

 You can also download data from <a href='https://pan.baidu.com/s/1ZS4dAdOIAe3DaHiwCDrLKw'>data to train bert for AI challenger-Sentiment Analysis</a>.

 It contains processed data that you can use both for fine-tuning on sentiment analysis and for pre-training with BERT.

 It was generated by following this notebook step by step:

 preprocess_char.ipynb

 You can also generate the data yourself, as long as the format is compatible with the processor SentimentAnalysisFineGrainProcessor (alias: sentiment_analysis).


 Data format:  label1,label2,label3\t here is sentence or sentences\t

 It contains only two columns: the first is the target (one or more labels), the second is the input string.

 No tokenization is needed. A parsing sketch follows at the end of this section.
 
 sample:"0_1,1_-2,2_-2,3_-2,4_1,5_-2,6_-2,7_-2,8_1,9_1,10_-2,11_-2,12_-2,13_-2,14_-2,15_1,16_-2,17_-2,18_0,19_-2 浦东五莲路站,老饭店福瑞轩属于上海的本帮菜,交通方便,最近又重新装修,来拨草了,饭店活动满188元送50元钱,环境干净,简单。朋友提前一天来预订包房也没有订到,只有大堂,五点半到店基本上每个台子都客满了,都是附近居民,每道冷菜量都比以前小,味道还可以,热菜烤茄子,炒河虾仁,脆皮鸭,照牌鸡,小牛排,手撕腊味花菜等每道菜都很入味好吃,会员价划算,服务员人手太少,服务态度好,要能团购更好。可以用支付宝方便"
 
 Check the sample data in the ./BERT_BASE_DIR folder.

 For more detail, check create_model and SentimentAnalysisFineGrainProcessor in run_classifier.py.
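
 As an illustration only (this helper is hypothetical, not part of the repo), here is a minimal Python sketch of parsing one line of this format, assuming each label token has the AI Challenger form aspectindex_label (e.g. 0_1 means aspect 0 has label 1):

    # Hypothetical helper, not from this repo: parse one line of the
    # two-column format described above.
    def parse_line(line):
        """Split 'label1,label2,...\tsentence\t' into a label dict and text."""
        labels_part, text = line.rstrip("\t\n").split("\t", 1)
        labels = {}
        for token in labels_part.split(","):
            aspect, value = token.split("_", 1)  # e.g. "1_-2" -> aspect 1, label -2
            labels[int(aspect)] = int(value)
        return labels, text

    labels, text = parse_line("0_1,1_-2,2_-2\t浦东五莲路站,老饭店福瑞轩属于上海的本帮菜\t")
    print(labels)  # {0: 1, 1: -2, 2: -2}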

Pre-train the BERT model based on the open-sourced model, then do the classification task

  1. Generate raw data: [ADD SOMETHING HERE]

    Make sure each line is a single sentence, and leave a blank line between documents (see the sketch after this list).

    You can find the generated data in the zip file.

     Use write_pre_train_doc() from preprocess_char.ipynb
    
  2. Generate data for the pre-training stage:

    export BERT_BASE_DIR=./BERT_BASE_DIR/chinese_L-12_H-768_A-12
    nohup python create_pretraining_data.py \
    --input_file=./PRE_TRAIN_DIR/bert_*_pretrain.txt \
    --output_file=./PRE_TRAIN_DIR/tf_examples.tfrecord \
    --vocab_file=$BERT_BASE_DIR/vocab.txt \
    --do_lower_case=True \
    --max_seq_length=512 \
    --max_predictions_per_seq=60 \
    --masked_lm_prob=0.15 \
    --random_seed=12345 \
    --dupe_factor=5 > nohup_pre.out &
    
  3. Pre-train the model with the generated data:

    python run_pretraining.py

  4. Fine-tune on the classification task:

    python run_classifier.py
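
The real write_pre_train_doc() lives in preprocess_char.ipynb; as a rough sketch under that assumption, a function producing the raw-data format from step 1 (one sentence per line, blank line between documents) could look like this:

    # Illustrative sketch only; the actual write_pre_train_doc() is defined
    # in preprocess_char.ipynb. Writes one sentence per line and a blank
    # line between documents, the layout create_pretraining_data.py expects.
    import os

    def write_pre_train_doc(documents, output_path):
        with open(output_path, "w", encoding="utf-8") as f:
            for doc in documents:              # each doc is a list of sentences
                for sentence in doc:
                    f.write(sentence.strip() + "\n")
                f.write("\n")                  # blank line separates documents

    os.makedirs("PRE_TRAIN_DIR", exist_ok=True)
    write_pre_train_doc(
        [["第一句。", "第二句。"], ["另一篇文档的第一句。"]],
        "./PRE_TRAIN_DIR/bert_0_pretrain.txt",
    )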

TextCNN

  1. Download the cache file for sentiment analysis (tokens are at the word level).

  2. Train the model:

    python train_cnn_fine_grain.py

 The cache file for the TextCNN model was generated by following the steps in preprocess_word.ipynb.

 It contains everything you need to run TextCNN:

 the processed train/validation/test sets, the word vocabulary, and a dict mapping labels to indices.

 Take train_valid_test_vocab_cache.pik and put it under the preprocess_word/ folder.

 The raw data is also included in this zip file.
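
 Assuming the pickle stores exactly the items listed above (the real layout is defined by preprocess_word.ipynb, so verify before relying on this), loading it might look like:

    # Assumed layout only -- check preprocess_word.ipynb for the real one.
    import pickle

    with open("preprocess_word/train_valid_test_vocab_cache.pik", "rb") as f:
        cache = pickle.load(f)

    # Presumed contents: processed train/valid/test sets, the word
    # vocabulary, and the label -> index dict described above.
    train, valid, test, vocab_word, label2index = cache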

Pre-train TextCNN

  1. Pre-train TextCNN with a masked language model (a generic masking sketch follows this list):

    python train_cnn_lm.py

  2. Fine-tune TextCNN:

    python train_cnn_fine_grain.py
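
The repo's masking logic lives in train_cnn_lm.py; as a generic illustration of masked-language-model pre-training (not the repo's exact code), a BERT-style scheme replaces roughly 15% of tokens with a mask id and trains the model to recover them:

    # Generic BERT-style masking sketch, not the repo's exact implementation.
    import random

    def mask_tokens(token_ids, mask_id, mask_prob=0.15):
        """Return (masked inputs, targets); -1 marks positions the loss ignores."""
        inputs, targets = [], []
        for tid in token_ids:
            if random.random() < mask_prob:
                inputs.append(mask_id)   # the model must predict the original token
                targets.append(tid)
            else:
                inputs.append(tid)
                targets.append(-1)       # not masked: excluded from the LM loss
        return inputs, targets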

Deploy BERT for online prediction

With session-and-feed style, you can easily deploy BERT.

For more on online prediction with BERT, check here
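
A minimal session-and-feed sketch (TensorFlow 1.x; the checkpoint path and tensor names are assumptions for illustration, not the repo's actual graph):

    # Illustrative TF 1.x session/feed pattern; tensor names are assumed.
    import numpy as np
    import tensorflow as tf

    sess = tf.Session()
    saver = tf.train.import_meta_graph("./checkpoint_bert/model.ckpt.meta")
    saver.restore(sess, "./checkpoint_bert/model.ckpt")

    graph = tf.get_default_graph()
    # A real BERT graph may also need input_mask / segment_ids placeholders.
    input_ids = graph.get_tensor_by_name("input_ids:0")          # assumed name
    probabilities = graph.get_tensor_by_name("probabilities:0")  # assumed name

    batch_ids = np.zeros((1, 512), dtype=np.int32)  # dummy token ids
    probs = sess.run(probabilities, feed_dict={input_ids: batch_ids})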

Reference

  1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

  2. google-research/bert

  3. pengshuang/AI-Comp

  4. AI Challenger 2018

  5. Convolutional Neural Networks for Sentence Classification
