
kostyaev / sentence2vec

Licence: MIT license
Deep sentence embedding using Sequence to Sequence learning

Programming Languages

Jupyter Notebook
Lua

Projects that are alternatives to, or similar to, sentence2vec

torch-asg
Auto Segmentation Criterion (ASG) implemented in pytorch
Stars: ✭ 42 (+82.61%)
Mutual labels:  torch, seq2seq
Neuralconvo
Neural conversational model in Torch
Stars: ✭ 773 (+3260.87%)
Mutual labels:  torch, seq2seq
Pytorch Beam Search Decoding
PyTorch implementation of beam search decoding for seq2seq models
Stars: ✭ 204 (+786.96%)
Mutual labels:  torch, seq2seq
skt
Sanskrit compound segmentation using seq2seq model
Stars: ✭ 21 (-8.7%)
Mutual labels:  seq2seq
lang2logic-PyTorch
PyTorch port of the paper "Language to Logical Form with Neural Attention"
Stars: ✭ 34 (+47.83%)
Mutual labels:  seq2seq
NLP-paper
🎨🎨 NLP (natural language processing) tutorial 🎨🎨 https://dataxujing.github.io/NLP-paper/
Stars: ✭ 23 (+0%)
Mutual labels:  seq2seq
flambeau
Nim bindings to libtorch
Stars: ✭ 60 (+160.87%)
Mutual labels:  torch
kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
Stars: ✭ 456 (+1882.61%)
Mutual labels:  seq2seq
dynmt-py
Neural machine translation implementation using dynet's python bindings
Stars: ✭ 17 (-26.09%)
Mutual labels:  seq2seq
tensorflow-ml-nlp-tf2
Practice materials for natural language processing with TensorFlow 2 and machine learning (from logistic regression to BERT and GPT3)
Stars: ✭ 245 (+965.22%)
Mutual labels:  seq2seq
Word-Level-Eng-Mar-NMT
Translating English sentences to Marathi using Neural Machine Translation
Stars: ✭ 37 (+60.87%)
Mutual labels:  seq2seq
probabilistic nlg
Tensorflow Implementation of Stochastic Wasserstein Autoencoder for Probabilistic Sentence Generation (NAACL 2019).
Stars: ✭ 28 (+21.74%)
Mutual labels:  seq2seq
chatbot
A Chinese chatbot based on deep learning, with detailed tutorials and code. Every piece of code is thoroughly commented, which makes it a good choice for learning.
Stars: ✭ 94 (+308.7%)
Mutual labels:  seq2seq
Captcha-Cracking
Crack number and Chinese captcha with both traditional and deep learning methods, based on Torch and python.
Stars: ✭ 35 (+52.17%)
Mutual labels:  torch
Adversarial-Learning-for-Generative-Conversational-Agents
This repository contains a new adversarial training method for Generative Conversational Agents
Stars: ✭ 71 (+208.7%)
Mutual labels:  seq2seq
S2VT-seq2seq-video-captioning-attention
S2VT (seq2seq) video captioning with bahdanau & luong attention implementation in Tensorflow
Stars: ✭ 18 (-21.74%)
Mutual labels:  seq2seq
classifier multi label seq2seq attention
multi-label,classifier,text classification,multi-label text classification,text classification,BERT,ALBERT,multi-label-classification,seq2seq,attention,beam search
Stars: ✭ 26 (+13.04%)
Mutual labels:  seq2seq
ALIGNet
code to train a neural network to align pairs of shapes without needing ground truth warps for supervision
Stars: ✭ 58 (+152.17%)
Mutual labels:  torch
hypnettorch
Package for working with hypernetworks in PyTorch.
Stars: ✭ 66 (+186.96%)
Mutual labels:  torch
vrn-torch-to-keras
Transfer pre-trained VRN model from torch to Keras/Tensorflow
Stars: ✭ 63 (+173.91%)
Mutual labels:  torch



Installing

  1. Install Torch.

  2. Install the following additional Lua libraries:

    luarocks install nn
    luarocks install rnn
    luarocks install penlight

    To train with CUDA, install the latest CUDA drivers and toolkit, then run:

    luarocks install cutorch
    luarocks install cunn

    To train with OpenCL, install the latest OpenCL Torch libraries:

    luarocks install cltorch
    luarocks install clnn
  3. Download the Cornell Movie-Dialogs Corpus and extract all the files into data/cornell_movie_dialogs.
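train.lua does its own data loading, and that code is not reproduced here. As a rough illustration of what the extracted corpus contains, the following Python sketch pairs each utterance with the one that follows it in the same conversation, a common way to build seq2seq training pairs from dialog data. It assumes the corpus's standard files `movie_lines.txt` and `movie_conversations.txt`, whose fields are separated by ` +++$+++ `. The function names `parse_lines`, `parse_pairs` and `load_pairs` are hypothetical, and the pairing rule may differ from what sentence2vec actually uses.

```python
import ast
import os
from typing import Dict, Iterable, List, Tuple

# Field separator used throughout the Cornell Movie-Dialogs Corpus files.
SEP = " +++$+++ "


def parse_lines(rows: Iterable[str]) -> Dict[str, str]:
    """Map line IDs (e.g. 'L194') to utterance text from movie_lines.txt rows."""
    lines = {}
    for row in rows:
        fields = row.rstrip("\n").split(SEP)
        if len(fields) == 5:  # lineID, characterID, movieID, character name, text
            lines[fields[0]] = fields[4]
    return lines


def parse_pairs(rows: Iterable[str], lines: Dict[str, str]) -> List[Tuple[str, str]]:
    """Turn each conversation into (utterance, next utterance) pairs."""
    pairs = []
    for row in rows:
        fields = row.rstrip("\n").split(SEP)
        if len(fields) != 4:  # characterID, characterID, movieID, list of line IDs
            continue
        ids = ast.literal_eval(fields[3])  # e.g. "['L194', 'L195', 'L196']"
        for first, second in zip(ids, ids[1:]):
            if first in lines and second in lines:
                pairs.append((lines[first], lines[second]))
    return pairs


def load_pairs(root: str = "data/cornell_movie_dialogs") -> List[Tuple[str, str]]:
    """Hypothetical helper: read both corpus files from `root` and return the pairs."""
    # The files are usually decoded as ISO-8859-1, which maps every byte to a
    # character, so decoding never fails.
    with open(os.path.join(root, "movie_lines.txt"), encoding="iso-8859-1") as f:
        lines = parse_lines(f)
    with open(os.path.join(root, "movie_conversations.txt"), encoding="iso-8859-1") as f:
        return parse_pairs(f, lines)
```

After extraction, `load_pairs()` returns a list of `(utterance, reply)` string tuples; parsing and pairing are kept as separate functions so they can be checked on in-memory rows without the corpus on disk.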

Training

th train.lua [-h / options]

Use the --dataset NUMBER option to control the size of the dataset. Training on the full dataset takes about 5 hours per epoch.

After each epoch, the model is saved to data/model.t7 if it has improved (i.e. the error decreased).
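The checkpoint rule above amounts to tracking the best error seen so far. Below is a minimal Python sketch of that rule. `train_one_epoch` and `save` are hypothetical callables standing in for the real training step and for writing `data/model.t7`, and the strict `<` comparison (a tie does not count as an improvement) is an assumption.

```python
from typing import Callable, List


def train_with_checkpointing(
    epochs: int,
    train_one_epoch: Callable[[int], float],
    save: Callable[[int], None],
) -> List[int]:
    """Run `epochs` epochs and call `save` only when the error beats the best so far.

    Returns the epoch numbers at which a checkpoint was written."""
    best = float("inf")
    saved_at = []
    for epoch in range(1, epochs + 1):
        error = train_one_epoch(epoch)  # hypothetical: returns this epoch's error
        if error < best:  # strictly lower error counts as an improvement
            best = error
            save(epoch)  # hypothetical: would overwrite data/model.t7
            saved_at.append(epoch)
    return saved_at
```

Under this rule the saved file always holds the lowest-error epoch seen so far, so interrupting a long run loses at most the epochs after the last improvement.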

Getting a pretrained model

Download:

  1. The pretrained model, model.t7
  2. The vocabulary, vocab.t7

Put them into the data directory.

Extracting embeddings from sentences

Run the following command:

th -i extract_embeddings.lua --model_file data/model.t7 --input_file data/test_sentences.txt --output_file data/embeddings.t7 --cuda
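In seq2seq sentence embedding, an encoder reads the sentence token by token, and a fixed-size hidden state summarizes it. Taking the encoder's final hidden state as the embedding is a common choice; that extract_embeddings.lua does exactly this is an assumption, not something this README states. The toy Python sketch below uses a plain tanh RNN with random, untrained weights in place of the real model (the class `ToyEncoder`, its sizes and the vocabulary are hypothetical), and adds cosine similarity, a usual way to compare two such vectors.

```python
import numpy as np


class ToyEncoder:
    """Hypothetical stand-in for a trained seq2seq encoder: a plain tanh RNN."""

    def __init__(self, vocab_size: int, hidden_size: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Random, untrained parameters; a real model would learn these.
        self.embed = rng.normal(scale=0.1, size=(vocab_size, hidden_size))
        self.W_xh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
        self.W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
        self.b = np.zeros(hidden_size)

    def encode(self, token_ids) -> np.ndarray:
        """Consume the tokens in order and return the final hidden state."""
        h = np.zeros_like(self.b)
        for t in token_ids:
            h = np.tanh(self.embed[t] @ self.W_xh + h @ self.W_hh + self.b)
        return h


def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


if __name__ == "__main__":
    vocab = {"how": 0, "are": 1, "you": 2, "doing": 3}
    enc = ToyEncoder(vocab_size=len(vocab), hidden_size=8)
    a = enc.encode([vocab[w] for w in "how are you".split()])
    b = enc.encode([vocab[w] for w in "how are you doing".split()])
    print(a.shape, cosine_similarity(a, b))
```

With trained weights, sentences with similar meaning would be expected to score a higher cosine similarity; with the random weights here the numbers carry no meaning.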

To visualize 2D projections of the embeddings, refer to example.ipynb.
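A 2D projection of sentence embeddings can be sketched with principal component analysis (PCA). The Python sketch below assumes the embeddings are already available as an `(n_sentences, dim)` NumPy array; converting data/embeddings.t7 into that array is not shown, the function name `pca_2d` is hypothetical, and example.ipynb may use a different projection method.

```python
import numpy as np


def pca_2d(embeddings) -> np.ndarray:
    """Project an (n, dim) array onto its top two principal components -> (n, 2)."""
    X = np.asarray(embeddings, dtype=float)
    X = X - X.mean(axis=0)  # center every embedding dimension
    # Rows of Vt are the principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:2].T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder data; replace with embeddings exported from data/embeddings.t7.
    fake_embeddings = rng.normal(size=(10, 16))
    coords = pca_2d(fake_embeddings)
    print(coords.shape)  # (10, 2)
```

Plotting `coords[:, 0]` against `coords[:, 1]` with any plotting library gives the 2D view. PCA is a linear projection, so groups of sentences that are separated only non-linearly may overlap in the plot.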

Acknowledgments

This implementation uses code from Marc-André Cournoyer's repository.

License

MIT License
