
tlatkowski / Multihead Siamese Nets

License: MIT

Implementation of Siamese Neural Networks built upon the multihead attention mechanism for the text semantic similarity task.

Programming Languages

python3

Labels

jupyter-notebook deep-learning tensorflow nlp natural-language-processing deep-neural-networks attention


Siamese Deep Neural Networks for semantic similarity.

This repository contains implementations of Siamese Neural Networks in TensorFlow, built on 3 major deep learning architectures:

Convolutional Neural Networks
Recurrent Neural Networks
Multihead Attention Networks

The main purpose of this repository is to compare well-known implementations of Siamese Neural Networks available on GitHub, which are mainly built upon CNN and RNN architectures, with a Siamese Neural Network based on the multihead attention mechanism originally proposed in the Transformer model from the "Attention Is All You Need" paper.
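The Siamese idea can be sketched in plain Python: both sentences pass through the same encoder, and the two resulting vectors are compared. This is only a toy illustration and not the repository's code. The word vectors, the mean-pooling encoder (a stand-in for the CNN, RNN or multihead encoder) and the exp(-L1 distance) similarity are all assumptions made for the example; only the MSE loss is taken from this README.

```python
import math

# Hypothetical toy vocabulary of 3-dimensional word vectors (illustration only).
EMBEDDINGS = {
    "how": [0.1, 0.3, 0.0],
    "do": [0.0, 0.2, 0.1],
    "i": [0.2, 0.0, 0.1],
    "learn": [0.5, 0.1, 0.4],
    "study": [0.5, 0.2, 0.3],
    "python": [0.9, 0.8, 0.1],
    "cook": [0.0, 0.9, 0.9],
    "rice": [0.1, 0.7, 0.8],
}


def encode(sentence):
    """Shared encoder: mean of word vectors (stand-in for a real encoder)."""
    vectors = [EMBEDDINGS[w] for w in sentence.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def similarity(sentence_a, sentence_b):
    """Encode both sentences with the SAME encoder, then map L1 distance to (0, 1]."""
    a, b = encode(sentence_a), encode(sentence_b)
    l1 = sum(abs(x - y) for x, y in zip(a, b))
    return math.exp(-l1)  # assumed similarity function, not confirmed by the README


def mse_loss(predictions, labels):
    """Mean squared error between predicted similarities and 0/1 labels."""
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical labelled pairs: 1.0 = same meaning, 0.0 = different meaning.
pairs = [
    ("how do i learn python", "how do i study python", 1.0),
    ("how do i learn python", "how do i cook rice", 0.0),
]
predictions = [similarity(a, b) for a, b, _ in pairs]
labels = [label for _, _, label in pairs]
loss = mse_loss(predictions, labels)
```

Because the encoder is shared, swapping it for another architecture leaves the similarity and loss code unchanged, which is what makes a side-by-side comparison of CNN, RNN and multihead encoders straightforward.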

Supported datasets

The current version of the pipeline supports 3 datasets:

The Stanford Natural Language Inference (SNLI) Corpus
Quora Question Pairs
🆕 Adversarial Natural Language Inference (ANLI) benchmark: GitHub, arXiv

Installation

Data preparation

In order to download data, execute the following commands (this process can take a while depending on your network throughput):

cd bin
chmod a+x prepare_data.sh
./prepare_data.sh

As a result of executing the above script, a corpora directory will be created containing the QQP, SNLI and ANLI data.

Dependency installation

This project was developed in and has been tested on Python 3.6. The package requirements are stored in the requirements folder.

To install the requirements for GPU usage, execute:

pip install -r requirements/requirements-gpu.txt

and for CPU usage:

pip install -r requirements/requirements-cpu.txt

Training models

To train a model, run the following command:

python3 run.py train SELECTED_MODEL SELECTED_DATASET --experiment_name NAME --gpu GPU_NUMBER

where SELECTED_MODEL is one of:

cnn
rnn
multihead

and SELECTED_DATASET is one of:

SNLI
QQP
ANLI

--experiment_name is an optional argument that sets the experiment name. The default value is {SELECTED_MODEL}_{EMBEDDING_SIZE}.

--gpu is an optional argument that selects a specific GPU on your machine (the default value is '0').
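The command-line shape described above can be sketched with argparse. This is not the repository's actual run.py; the function names build_parser and resolve_experiment_name are hypothetical, and making the dataset optional is an assumption based on the predict command shown later in this README.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring: run.py MODE MODEL [DATASET] [options]."""
    parser = argparse.ArgumentParser(prog="run.py")
    parser.add_argument("mode", choices=["train", "predict"])
    parser.add_argument("model", choices=["cnn", "rnn", "multihead"])
    # Assumed optional, since the predict example passes no dataset.
    parser.add_argument("dataset", nargs="?", choices=["SNLI", "QQP", "ANLI"])
    parser.add_argument("--experiment_name", default=None)
    parser.add_argument("--gpu", default="0")
    return parser


def resolve_experiment_name(args, embedding_size):
    """Fall back to {SELECTED_MODEL}_{EMBEDDING_SIZE} when no name is given."""
    if args.experiment_name:
        return args.experiment_name
    return f"{args.model}_{embedding_size}"


args = build_parser().parse_args(["train", "cnn", "SNLI", "--gpu", "1"])
name = resolve_experiment_name(args, embedding_size=64)
```

With the default embedding size of 64 from the main configuration file, an unnamed CNN run would be called cnn_64 under this scheme.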

Example (GPU usage): Run the following command to train a CNN-based Siamese Neural Network on the SNLI corpus:

python3 run.py train cnn SNLI --gpu 1

Example (CPU usage): Run the following command to train a CNN-based Siamese Neural Network on the SNLI corpus:

python3 run.py train cnn SNLI

Training configuration

The main training configuration file is located at 'config/main.ini'.

[TRAINING]
num_epochs = 10
batch_size = 512
eval_every = 20
learning_rate = 0.001
checkpoints_to_keep = 5
save_every = 100
log_device_placement = false

[DATA]
logs_path = logs
model_dir = model_dir

[PARAMS]
embedding_size = 64
loss_function = mse
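A file in this format can be read with Python's standard configparser module. The sketch below parses the same content from an inline string so that it is self-contained; whether the repository itself loads the file this way is an assumption.

```python
import configparser

# Inline copy of the configuration shown above, used here instead of
# reading config/main.ini from disk.
MAIN_INI = """
[TRAINING]
num_epochs = 10
batch_size = 512
eval_every = 20
learning_rate = 0.001
checkpoints_to_keep = 5
save_every = 100
log_device_placement = false

[DATA]
logs_path = logs
model_dir = model_dir

[PARAMS]
embedding_size = 64
loss_function = mse
"""

config = configparser.ConfigParser()
config.read_string(MAIN_INI)

training = config["TRAINING"]
num_epochs = training.getint("num_epochs")
batch_size = training.getint("batch_size")
learning_rate = training.getfloat("learning_rate")
log_device_placement = training.getboolean("log_device_placement")

embedding_size = config["PARAMS"].getint("embedding_size")
loss_function = config["PARAMS"]["loss_function"]
```

To load the real file, config.read("config/main.ini") would replace the read_string call.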

Model configuration

Additionally, each model has its own configuration file in which its hyperparameters can be changed.

Multihead Attention Network configuration file

[PARAMS]
num_blocks = 2
num_heads = 8
use_residual = False
dropout_rate = 0.0

Convolutional Neural Network configuration file

[PARAMS]
num_filters = 50,50,50
filter_sizes = 2,3,4
dropout_rate = 0.0

Recurrent Neural Network configuration file

[PARAMS]
hidden_size = 128
cell_type = GRU
bidirectional = True
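Values such as 50,50,50 and True are stored as plain strings in INI files, so they need converting before use. A minimal sketch follows, assuming the files are read with configparser. The helpers load_params and int_list are hypothetical, and pairing each filter size with a filter count is one plausible reading of the CNN settings, not something this README confirms.

```python
import configparser

CNN_INI = """
[PARAMS]
num_filters = 50,50,50
filter_sizes = 2,3,4
dropout_rate = 0.0
"""

RNN_INI = """
[PARAMS]
hidden_size = 128
cell_type = GRU
bidirectional = True
"""


def load_params(ini_text):
    """Parse INI text and return its [PARAMS] section."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser["PARAMS"]


def int_list(raw):
    """Turn a comma-separated string such as '2,3,4' into [2, 3, 4]."""
    return [int(item) for item in raw.split(",")]


cnn = load_params(CNN_INI)
# Assumed pairing: (filter size, number of filters) for each convolution.
conv_layers = list(zip(int_list(cnn["filter_sizes"]), int_list(cnn["num_filters"])))

rnn = load_params(RNN_INI)
bidirectional = rnn.getboolean("bidirectional")  # accepts True/true/yes/1
cell_type = rnn["cell_type"]
```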

Training models with GPU support on Google Colaboratory

If you don't have access to a workstation with a GPU, you can use the example Google Colaboratory notebook below to train your models (CNN, RNN or Multihead) on the SNLI or QQP datasets using the NVIDIA Tesla T4 16GB GPU available in the Google Colaboratory backend: Multihead Siamese Nets in Google Colab

Testing models

Download the pretrained models from the following link: pretrained Siamese Nets models. Unzip them and put them into the ./model_dir directory. After that, you can test the models either using the predict mode of the pipeline:

python3 run.py predict cnn

or using GUI demo:

python3 gui_demo.py

The pictures below present the Multihead Siamese Nets GUI for:

Positive example:

Negative example:

Attention weights visualization

To visualize multihead attention weights for the compared sentences, use the GUI demo: check the 'Visualize attention weights' checkbox, which becomes visible after choosing a model based on the multihead attention mechanism.
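For readers unfamiliar with what such weights are, the sketch below computes one weight matrix per head using the scaled dot-product attention from the Transformer paper, in plain Python. It is a simplified illustration and not the repository's TensorFlow code: learned projection matrices are omitted, each head simply uses its own slice of the token vectors, and the toy token vectors are made up.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys, num_heads):
    """Return one (len(queries) x len(keys)) weight matrix per head.

    Each token vector is split into num_heads equal slices and head h
    scores tokens using only slice h (no learned projections here).
    """
    dim = len(queries[0])
    if dim % num_heads != 0:
        raise ValueError("embedding size must be divisible by num_heads")
    head_dim = dim // num_heads
    heads = []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        matrix = []
        for q in queries:
            scores = [
                sum(a * b for a, b in zip(q[lo:hi], k[lo:hi])) / math.sqrt(head_dim)
                for k in keys
            ]
            matrix.append(softmax(scores))  # each row sums to 1
        heads.append(matrix)
    return heads


# Hypothetical token vectors: 2 tokens in one sentence, 3 in the other.
sentence_a = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, 0.0]]
sentence_b = [[1.0, 0.0, 0.0, 0.5], [0.0, 1.0, 0.5, 0.5], [0.5, 0.5, 1.0, 0.0]]
heads = attention_weights(sentence_a, sentence_b, num_heads=2)
```

Each matrix in heads is the kind of grid a heatmap would display: rows are tokens of one sentence, columns are tokens of the other, and every row sums to 1.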

An example of attention weights visualization looks as follows (4 attention heads):

Comparison of models

Experiments were performed on an Nvidia GeForce GTX 1080Ti GPU.

> SNLI dataset.

Experiment parameters:

Number of epochs : 10
Batch size : 512
Learning rate : 0.001

Number of training instances : 326959
Number of dev instances : 3674
Number of test instances : 36736

Embedding size : 64
Loss function: mean squared error (MSE)
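The epoch times reported below are easier to interpret alongside the number of batches per epoch. The short calculation below derives that from the instance counts and batch size given in this README for SNLI and QQP. It assumes the final partial batch is kept (rounding up), which the README does not state.

```python
import math


def steps_per_epoch(num_instances, batch_size):
    """Number of batches per epoch, keeping the last partial batch (assumption)."""
    return math.ceil(num_instances / batch_size)


BATCH_SIZE = 512
NUM_EPOCHS = 10

snli_steps = steps_per_epoch(326959, BATCH_SIZE)  # 639 batches per epoch
qqp_steps = steps_per_epoch(362646, BATCH_SIZE)   # 709 batches per epoch

snli_total_steps = snli_steps * NUM_EPOCHS  # 6390 batches over 10 epochs
qqp_total_steps = qqp_steps * NUM_EPOCHS    # 7090 batches over 10 epochs
```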

Specific hyperparameters of models:

CNN	RNN	Multihead
num_filters = 50,50,50	hidden_size = 128	num_blocks = 2
filter_sizes = 2,3,4	cell_type = GRU	num_heads = 8
	bidirectional = True	use_residual = False
		layers_normalization = False

Evaluation results:

Model	Mean-Dev-Acc*	Last-Dev-Acc**	Test-Acc	Epoch Time
CNN	76.51	75.08	75.40	15.97s
bi-RNN	79.36	79.52	79.56	1 min 22.95s
Multihead	78.52	79.61	78.29	1 min 00.24s

*Mean-Dev-Acc: the mean development set accuracy over all epochs.

**Last-Dev-Acc: the development set accuracy for the last epoch.
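The two development-set metrics defined above can be computed from a list holding one accuracy value per epoch. The function name and the sample values below are hypothetical and are not taken from the experiments.

```python
def summarize_dev_accuracy(per_epoch_accuracy):
    """Return (Mean-Dev-Acc, Last-Dev-Acc) for a list of per-epoch accuracies."""
    if not per_epoch_accuracy:
        raise ValueError("at least one epoch is required")
    mean_acc = sum(per_epoch_accuracy) / len(per_epoch_accuracy)
    last_acc = per_epoch_accuracy[-1]
    return mean_acc, last_acc


# Made-up accuracies for a 3-epoch run, for illustration only.
mean_acc, last_acc = summarize_dev_accuracy([70.0, 75.0, 80.0])
```

A Last-Dev-Acc above Mean-Dev-Acc, as for the bi-RNN and Multihead rows, indicates that accuracy at the final epoch was higher than the average over the run.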

Training curves (Accuracy & Loss):

> QQP dataset.

Experiment parameters:

Number of epochs : 10
Batch size : 512
Learning rate : 0.001

Number of training instances : 362646
Number of dev instances : 1213
Number of test instances : 40428

Embedding size : 64
Loss function: mean squared error (MSE)

Specific hyperparameters of models:

CNN	RNN	Multihead
num_filters = 50,50,50	hidden_size = 128	num_blocks = 2
filter_sizes = 2,3,4	cell_type = GRU	num_heads = 8
	bidirectional = True	use_residual = False
		layers_normalization = False

Evaluation results:

Model	Mean-Dev-Acc*	Last-Dev-Acc**	Test-Acc	Epoch Time
CNN	79.74	80.83	80.90	49.84s
bi-RNN	82.68	83.66	83.30	4 min 26.91s
Multihead	80.75	81.74	80.99	4 min 58.58s

*Mean-Dev-Acc: the mean development set accuracy over all epochs.

**Last-Dev-Acc: the development set accuracy for the last epoch.

Training curves (Accuracy & Loss):

Contributors

Code Contributors

This project exists thanks to all the people who contribute. [Contribute].

Financial Contributors

Become a financial contributor and help us sustain our community. [Contribute]

Individuals

Organizations

Support this project with your organization. Your logo will show up here with a link to your website. [Contribute]
