Triple Branch BERT Siamese Network for fake news classification on LIAR-PLUS dataset

Dependencies

Files

  1. bert_siamese.py - Code to train the binary/six-way classifier

  2. main_attention.py - Keras code for Attention model (Need not be trained)

  3. Fake_News_classification.pdf - Explanation of the architectures and techniques used

  4. requirements.txt - File to install all the dependencies

Usage

Install Python 3.5 (should also work for Python > 3.5)

Then install the requirements by running

$ pip3 install -r requirements.txt

Now to run the training code for binary classification, execute

$ python3 bert_siamese.py -num_labels 2

Now to run the training code for six-way classification, execute

$ python3 bert_siamese.py -num_labels 6
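The `-num_labels` flag switches between the two label sets. A minimal sketch of how such a flag could be parsed inside `bert_siamese.py` (illustrative only; the script's actual argument handling may differ):

```python
import argparse

# Sketch of parsing the -num_labels flag shown in the commands above
parser = argparse.ArgumentParser(description="Train the BERT siamese fake news classifier")
parser.add_argument("-num_labels", type=int, default=2, choices=[2, 6],
                    help="2 for binary classification, 6 for six-way classification")
args = parser.parse_args()
print(f"Training a {args.num_labels}-way classifier")
```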

Architecture and Methodology

Highest Accuracy I was able to achieve

Model Architectures tried

Note: I will be referring to Where is your Evidence: Improving Fact-checking by Justification Modeling as “the dataset paper” or “the LIAR-PLUS dataset paper” throughout this documentation.

The dataset paper employs an “enhanced claim/statement representation that captures additional information shown to be useful such as hedging”. I haven't done anything like what they describe in the paper, because I wanted to check whether a state-of-the-art language modeling algorithm can, on its own, do well on a very complex classification task like fake news classification. From my experiments I found that BERT can be fine-tuned to work on this classification task to some extent.

I have experimented with different training strategies, with BERT as the base architecture, fine-tuned for text classification (in this case, fake news classification).

I chose BERT as the base architecture because of its state-of-the-art performance on language modeling and language understanding tasks. I thought it would be a good idea to leverage its pretrained weights and fine-tune it for the task of text classification.

Below are the three training strategies and architectures used to get the desired results.

1. Finetuning BERT:

Fine-tuned the BERT architecture for classification by passing the tensor from BERT into a linear layer (fully connected layer) which gives binary output logits.

Here only the news statements are used for training the network; no metadata or justification data has been used.

Through this, I was able to achieve around 60% accuracy on the binary classification task.
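A minimal sketch of this setup in PyTorch, using the Hugging Face `transformers` API (the repository may use a different BERT wrapper; model and variable names here are illustrative):

```python
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class BertStatementClassifier(nn.Module):
    """BERT encoder followed by a single linear layer that produces the class logits."""
    def __init__(self, num_labels=2):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        # Pooled representation of the statement, shape (batch, 768)
        pooled = self.bert(input_ids=input_ids, attention_mask=attention_mask).pooler_output
        return self.classifier(pooled)  # raw logits; a loss such as CrossEntropyLoss is applied during training

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(["Example news statement to classify."], padding="max_length",
                  truncation=True, max_length=128, return_tensors="pt")
logits = BertStatementClassifier()(batch["input_ids"], batch["attention_mask"])
```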

2. A Siamese Network with BERT as the base network:

Built a siamese network with two branches, each branch containing BERT as the base model.

The input of the first branch is the tokens corresponding to the news statement on which we need to predict the label. The input of the second branch is the tokens corresponding to the justification of the particular news statement passed to the first branch.

The output of each BERT branch is a 1D tensor of shape (768). As we have two branches, we get two 1D tensors of shape (768). These two outputs are concatenated and passed through a linear layer (fully connected layer), which gives two logits, and a softmax activation is applied to get the output probabilities.

In this architecture, both branches share the same weights between them.

This approach is used particularly to leverage the additional information we have, in this case the justifications. This method gave a binary classification accuracy of 65.4%.

In the case of six-way classification, this method achieved an accuracy of 23.6%, which is improved a lot in the next method.
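A sketch of the weight sharing described above (again using `transformers`; names are illustrative): a single `BertModel` instance encodes both the statement and the justification, so both branches share the same weights.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class SiameseBertClassifier(nn.Module):
    """Two-branch siamese classifier: one shared BERT encodes statement and justification."""
    def __init__(self, num_labels=2):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")   # shared between both branches
        self.classifier = nn.Linear(2 * self.bert.config.hidden_size, num_labels)

    def encode(self, input_ids, attention_mask):
        # Each branch yields a (batch, 768) pooled vector
        return self.bert(input_ids=input_ids, attention_mask=attention_mask).pooler_output

    def forward(self, stmt_ids, stmt_mask, just_ids, just_mask):
        stmt_vec = self.encode(stmt_ids, stmt_mask)          # statement branch
        just_vec = self.encode(just_ids, just_mask)          # justification branch
        combined = torch.cat([stmt_vec, just_vec], dim=1)    # (batch, 1536)
        logits = self.classifier(combined)                   # two logits
        return torch.softmax(logits, dim=1)                  # output probabilities
```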

3. Triple Branch Siamese Network with BERT as the base network:

Here the architecture is similar to the one in the previous case, but I have added one more branch with BERT as the base network to the siamese network from the previous case, making this a triple branch siamese network. The input to this additional branch is the remaining metadata available, such as speaker, source, affiliation, etc., apart from the justification.

The second change/addition here is taking into account the authenticity of the publisher.

For this purpose I have defined a feature called “Credit Score”.

As given in the dataset documentation, columns 9-13 in the dataset correspond to the counts of barely true, false, half true, mostly true, and pants on fire statements made by the news source.

So the Credit score is calculated as,

The credit score tells us how false or fake the news published by that author or source is on average.
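A sketch of one way such a score could be computed from the five history counts (this is an assumed weighting for illustration; the exact formula used in the repository may differ):

```python
def credit_score(barely_true, false, half_true, mostly_true, pants_on_fire):
    """Hypothetical credit score: the share of a source's past statements that were
    untruthful, weighted by severity, so sources that publish more fake news score higher."""
    total = barely_true + false + half_true + mostly_true + pants_on_fire
    if total == 0:
        return 0.5  # no history available: assume a neutral score
    weighted = (0.25 * half_true + 0.5 * barely_true    # assumed severity weights
                + 0.75 * false + 1.0 * pants_on_fire)
    return weighted / total
```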

This credit score is multiplied with the tensor resulting from the concatenation of the output tensors from all three branches of the siamese network. The resulting 1D tensor is then passed through a fully connected layer to get two logits as outputs.

The reason I used this credit score is to increase the relative difference between the output activations for the fake and the real cases (as the credit score will be high for a publisher who often publishes fake news, compared to one who does so less).
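Schematically, assuming an `encode` method and a classifier as in the siamese sketch above (here sized for three branches), the credit score could enter the forward pass roughly like this (a sketch, not the repository's exact code):

```python
import torch

def triple_branch_forward(model, stmt, just, meta, credit):
    """Sketch of the triple branch forward pass. `model` is assumed to expose
    `encode(input_ids, attention_mask)` backed by a shared BERT, and a
    `classifier` = nn.Linear(3 * 768, 2). `stmt`, `just` and `meta` are
    (input_ids, attention_mask) pairs; `credit` is a (batch, 1) tensor."""
    branch_outputs = [model.encode(ids, mask) for ids, mask in (stmt, just, meta)]
    combined = torch.cat(branch_outputs, dim=1)   # (batch, 3 * 768)
    scaled = combined * credit                    # scale by the publisher's credit score
    return model.classifier(scaled)               # two output logits
```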

For binary classification, the model wasn't able to learn at all on the training data, as the loss stayed constant throughout training. This may be because there are many moving parts here, like the credit score integration, metadata integration, etc., which made tweaking the network and learning parameters difficult. Also, because of the limited computing resources available to me and the huge training times the network takes, I was not able to properly tune the different parameters and couldn't experiment with different strategies for combining the metadata with the news statements and justifications.

I believe that if some time were invested in this method, there would be good gains in accuracy.

Quite different from binary classification, there was an improvement in accuracy in the case of six-way classification, to 32.8%.

Two further modifications have been made to this method giving better results. They are discussed below.

Modification 1: Added the credit scores to the output of the concatenation layer instead of multiplying, and decreased the learning rate by a factor of 5.
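Relative to the sketch above, this modification amounts to something like the following (illustrative only; `base_lr` is an assumed placeholder value):

```python
import torch

def combine_additively(branch_outputs, credit, base_lr=2e-5):
    """Modification 1 (sketch): add the credit score to the concatenated branch
    outputs instead of multiplying, and cut the learning rate by a factor of 5."""
    combined = torch.cat(branch_outputs, dim=1) + credit   # additive instead of multiplicative
    new_lr = base_lr / 5                                    # learning rate decreased 5x
    return combined, new_lr
```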

Modification 2: Instead of giving inputs of the same sequence length (128) to all three branches, I changed the input sequence length depending on the type of data and the average number of words in it. For the branch which takes news statements as input, the sequence length is 64, as there are only 5-10 inputs with more than 64 words. For the branch which takes justifications as input, the sequence length is 256, as many of the justifications have 128 to 264 words and there are only around 10 inputs with more than 264 words. The same goes for the metadata input, for which the sequence length is fixed to 32, as there are no inputs with more than 32 words. This also allowed me to use the GPU memory more efficiently.
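For illustration, the per-branch sequence lengths above could be applied at tokenization time along these lines (a sketch using the `transformers` tokenizer; only the lengths come from the description above):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Maximum sequence length per branch, taken from the word-count analysis above
MAX_LEN = {"statement": 64, "justification": 256, "metadata": 32}

def encode_branch(texts, branch):
    """Tokenize one branch's inputs to its own fixed length instead of a shared 128."""
    return tokenizer(texts, padding="max_length", truncation=True,
                     max_length=MAX_LEN[branch], return_tensors="pt")
```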

These modifications resolved the problem of the network not learning in the case of binary classification and improved the six-way classification accuracy by a large margin.

References

  1. Keras: The Python Deep Learning library
  2. Keras tutorial on GloVe embeddings
  3. A library of state-of-the-art pretrained models for Natural Language Processing
  4. PyTorch deep learning framework
  5. PyTorch BERT usage example
  6. Attention Is All You Need
  7. Blog on attention networks in Keras
  8. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  9. Example of Siamese networks in PyTorch
  10. LIAR-PLUS dataset (https://aclweb.org/anthology/W18-5513)
  11. GloVe: Global Vectors for Word Representation