
yya518 / FinBERT

License: Apache-2.0
A Pretrained BERT Model for Financial Communications. https://arxiv.org/abs/2006.08097

Programming Languages

  • Jupyter Notebook
  • Python

Projects that are alternatives of or similar to FinBERT

bert sa
BERT sentiment analysis with TensorFlow Serving and a RESTful API
Stars: ✭ 35 (-81.87%)
Mutual labels:  sentiment-analysis, bert
bert-sentiment
Fine-grained Sentiment Classification Using BERT
Stars: ✭ 49 (-74.61%)
Mutual labels:  sentiment-analysis, bert
Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518 (+1204.66%)
Mutual labels:  sentiment-analysis, bert
COVID-19-Tweet-Classification-using-Roberta-and-Bert-Simple-Transformers
Rank 1 / 216
Stars: ✭ 24 (-87.56%)
Mutual labels:  sentiment-analysis, bert
Pytorch Sentiment Analysis
Tutorials on getting started with PyTorch and TorchText for sentiment analysis.
Stars: ✭ 3,209 (+1562.69%)
Mutual labels:  sentiment-analysis, bert
NSP-BERT
The code for our paper "NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original Pre-training Task —— Next Sentence Prediction"
Stars: ✭ 166 (-13.99%)
Mutual labels:  sentiment-analysis, bert
text2class
Multi-class text categorization using state-of-the-art pre-trained contextualized language models, e.g. BERT
Stars: ✭ 15 (-92.23%)
Mutual labels:  bert
LSX
A word embeddings-based semi-supervised model for document scaling
Stars: ✭ 42 (-78.24%)
Mutual labels:  sentiment-analysis
MemNet ABSA
No description or website provided.
Stars: ✭ 20 (-89.64%)
Mutual labels:  sentiment-analysis
Text and Audio classification with Bert
Text Classification in Turkish Texts with Bert
Stars: ✭ 34 (-82.38%)
Mutual labels:  bert
Stock-Prediction
LSTM RNN for sentiment-based stock prediction
Stars: ✭ 50 (-74.09%)
Mutual labels:  sentiment-analysis
HistoricalVolatility
A framework for historical volatility estimation and analysis.
Stars: ✭ 22 (-88.6%)
Mutual labels:  financial-analysis
OpenDialog
An Open-Source Package for Chinese Open-domain Conversational Chatbot (a Chinese casual-chat dialogue system with one-click deployment of a WeChat chatbot)
Stars: ✭ 94 (-51.3%)
Mutual labels:  bert
IEX CPP API
Unofficial C++ Lib for the IEXtrading API
Stars: ✭ 34 (-82.38%)
Mutual labels:  financial-analysis
Sentic-GCN
[KBS] Aspect-based sentiment analysis via affective knowledge enhanced graph convolutional networks
Stars: ✭ 19 (-90.16%)
Mutual labels:  sentiment-analysis
Quality-Estimation2
Machine translation subtask: translation quality estimation, fine-tuning a Bi-LSTM stacked on top of a BERT model
Stars: ✭ 31 (-83.94%)
Mutual labels:  bert
KLUE
📖 Korean NLU Benchmark
Stars: ✭ 420 (+117.62%)
Mutual labels:  bert
classy
classy is a simple-to-use library for building high-performance Machine Learning models in NLP.
Stars: ✭ 61 (-68.39%)
Mutual labels:  bert
stock-news-sentiment-analysis
This program uses VADER's SentimentIntensityAnalyzer to calculate the overall sentiment of news headlines for a stock
Stars: ✭ 21 (-89.12%)
Mutual labels:  sentiment-analysis
TextPair
Text-pair relation comparison: semantic similarity, literal similarity, textual entailment, and more
Stars: ✭ 44 (-77.2%)
Mutual labels:  bert

FinBERT

UPDATE:

[July 30, 2021] The fine-tuned FinBERT model for financial sentiment classification has been uploaded and integrated with Huggingface's transformers library. This model is fine-tuned on 10,000 manually annotated (positive, negative, neutral) sentences from analyst reports, and it achieves superior performance on the financial tone analysis task. If you are simply interested in using FinBERT for financial tone analysis, give it a try.

from transformers import BertTokenizer, BertForSequenceClassification
import numpy as np

# Load the fine-tuned FinBERT tone model and its tokenizer from the Huggingface hub.
finbert = BertForSequenceClassification.from_pretrained('yiyanghkust/finbert-tone', num_labels=3)
tokenizer = BertTokenizer.from_pretrained('yiyanghkust/finbert-tone')

sentences = ["there is a shortage of capital, and we need extra financing",
             "growth is strong and we have plenty of liquidity",
             "there are doubts about our finances",
             "profits are flat"]

# Tokenize the batch and run it through the model; the first output holds the logits.
inputs = tokenizer(sentences, return_tensors="pt", padding=True)
outputs = finbert(**inputs)[0]

# Map each sentence to the label with the highest logit.
labels = {0: 'neutral', 1: 'positive', 2: 'negative'}
for idx, sent in enumerate(sentences):
    print(sent, '----', labels[np.argmax(outputs.detach().numpy()[idx])])
    
'''
there is a shortage of capital, and we need extra financing ---- negative
growth is strong and we have plenty of liquidity ---- positive
there are doubts about our finances ---- negative
profits are flat ---- neutral
'''
    

FinBERT is a BERT model pre-trained on financial communication text. The purpose is to enhance financial NLP research and practice. It is trained on the following three financial communication corpora, with a total size of 4.9B tokens.

  • Corporate Reports 10-K & 10-Q: 2.5B tokens
  • Earnings Call Transcripts: 1.3B tokens
  • Analyst Reports: 1.1B tokens

FinBERT achieves state-of-the-art performance on the financial sentiment classification task, a core financial NLP task. With the release of FinBERT, we hope practitioners and researchers can apply it to a wider range of applications where the prediction target goes beyond sentiment, such as financial outcomes including stock returns, stock volatility, and corporate fraud.

You can use FinBERT in two ways:

  1. Pre-trained model. You can fine-tune FinBERT on your own dataset. FinBERT is most suitable for financial NLP tasks. We provide several FinBERT models below, as well as the fine-tuning scripts; an illustrative fine-tuning sketch follows this list.
  2. Fine-tuned model. If you are simply interested in using FinBERT for financial sentiment classification prediction, we provide a fine-tuned FinBERT model, fine-tuned on 10,000 manually annotated analyst statements. This dataset has been used in the accounting literature for analyst tone analysis (Huang et al., The Accounting Review, 2014).
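
For illustration, here is a minimal fine-tuning sketch that uses the Huggingface transformers Trainer API rather than the repo's train_bert.py script; the checkpoint path, example data, and hyperparameters are placeholders you would replace with your own.

import torch
from transformers import (BertTokenizer, BertForSequenceClassification,
                          Trainer, TrainingArguments)

# Placeholder path: point this at the pre-trained FinBERT weights you downloaded.
checkpoint = './FinBERT-FinVocab-Uncased'
tokenizer = BertTokenizer.from_pretrained(checkpoint)
model = BertForSequenceClassification.from_pretrained(checkpoint, num_labels=3)

# Toy labeled data; replace with your own sentences and labels.
texts = ['growth is strong and we have plenty of liquidity',
         'there is a shortage of capital, and we need extra financing']
labels = [1, 2]  # 0 = neutral, 1 = positive, 2 = negative

encodings = tokenizer(texts, truncation=True, padding=True)

class SentimentDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings, self.labels = encodings, labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[i])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir='./finbert-finetuned',
                           num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=SentimentDataset(encodings, labels),
)
trainer.train()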

1. Pre-trained model

Download FinBERT

We provide four versions of pre-trained weights.

FinVocab is a new WordPiece vocabulary built on our financial corpora using the SentencePiece library. We produce both cased and uncased versions of FinVocab, with sizes of 28,573 and 30,873 tokens respectively. These are very similar to the 28,996 and 30,522 token sizes of the original BERT cased and uncased BaseVocab.
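
As a rough illustration of how such a vocabulary can be built (the authors used SentencePiece; this sketch swaps in the Huggingface tokenizers library's WordPiece trainer, and the corpus file name is a placeholder, not the authors' exact pipeline):

from tokenizers import BertWordPieceTokenizer

# Train a cased WordPiece vocabulary on a (placeholder) financial corpus file.
tokenizer = BertWordPieceTokenizer(lowercase=False)
tokenizer.train(
    files=['financial_corpus.txt'],   # placeholder: your raw text corpus
    vocab_size=28573,                 # size reported above for cased FinVocab
    min_frequency=2,
    special_tokens=['[PAD]', '[UNK]', '[CLS]', '[SEP]', '[MASK]'],
)
tokenizer.save_model('.', 'FinVocab-Cased')  # writes FinVocab-Cased-vocab.txt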

Downloading Financial Phrase Bank Dataset

The dataset was collected by Malo et al. (2014) and can be downloaded from this link. The zip file for the Financial Phrase Bank Dataset has been provided for ease of download and use.

Environment:

To set up the environment used to train and test the model, run pip install -r requirements.txt.
We would like to give special thanks to the creators of pytorch-pretrained-bert (i.e., pytorch-transformers).

In order to fine-tune FinBERT on the Financial Phrase Bank dataset, please run the script as follows:

python train_bert.py --cuda_device (cuda:device_id) --output_path (output directory) --vocab (vocab chosen)
--vocab_path (path to new vocab txt file) --data_dir (path to downloaded dataset) --weight_path (path to downloaded weights)

There are four vocab options to choose from: FinVocab-Uncased, FinVocab-Cased, and Google's BERT Base-Uncased and Base-Cased.

Note that to run the script, you should first download the model weights and the Financial Phrase Bank Dataset.
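
For example, an invocation using the FinVocab-Uncased vocabulary might look like the following, where every path is an illustrative placeholder for wherever you saved the downloads:

python train_bert.py --cuda_device cuda:0 --output_path ./output --vocab FinVocab-Uncased --vocab_path ./FinVocab-Uncased.txt --data_dir ./FinancialPhraseBank --weight_path ./FinBERT-FinVocab-Uncased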

2. Fine-tuned model

Using FinBERT for financial sentiment classification

If you are simply interested in using FinBERT for a downstream sentiment classification task, we have a fine-tuned FinBERT for your use. This model is fine-tuned on 10,000 analyst statements for the tone prediction task (positive, negative, neutral). We provide a Jupyter notebook showing how you can use it with your own data. For comparison purposes, we also provide a pre-trained Naive Bayes model. The fine-tuned FinBERT performs significantly better than the Naive Bayes model and can gauge financial text tone with high accuracy.
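
If you would rather have class probabilities than hard labels, a small variation of the quickstart snippet above applies a softmax to the logits (same model and label order as before; this is standard transformers usage, not a separate API of this repo):

import torch
from transformers import BertTokenizer, BertForSequenceClassification

finbert = BertForSequenceClassification.from_pretrained('yiyanghkust/finbert-tone', num_labels=3)
tokenizer = BertTokenizer.from_pretrained('yiyanghkust/finbert-tone')

inputs = tokenizer(["profits are flat"], return_tensors="pt", padding=True)
with torch.no_grad():
    logits = finbert(**inputs)[0]

# Softmax over the three classes; columns follow the label order neutral, positive, negative.
probs = torch.softmax(logits, dim=-1)
print(probs)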

Citation

@misc{yang2020finbert,
    title={FinBERT: A Pretrained Language Model for Financial Communications},
    author={Yi Yang and Mark Christopher Siy UY and Allen Huang},
    year={2020},
    eprint={2006.08097},
    archivePrefix={arXiv},
    }

Contact

Please post a Github issue or contact [email protected] if you have any questions.
