
EternalFeather / Transformer In Generating Dialogue

Licence: apache-2.0
An Implementation of 'Attention is all you need' with Chinese Corpus

Programming Languages

python

Projects that are alternatives of or similar to Transformer In Generating Dialogue

Neural sp
End-to-end ASR/LM implementation with PyTorch
Stars: ✭ 408 (+237.19%)
Mutual labels:  attention-mechanism, transformer
Nmt Keras
Neural Machine Translation with Keras
Stars: ✭ 501 (+314.05%)
Mutual labels:  attention-mechanism, transformer
Pytorch Original Transformer
My implementation of the original transformer model (Vaswani et al.). I've additionally included the playground.py file for visualizing otherwise seemingly hard concepts. Currently included IWSLT pretrained models.
Stars: ✭ 411 (+239.67%)
Mutual labels:  attention-mechanism, transformer
galerkin-transformer
[NeurIPS 2021] Galerkin Transformer: a linear attention without softmax
Stars: ✭ 111 (-8.26%)
Mutual labels:  transformer, attention-mechanism
Se3 Transformer Pytorch
Implementation of SE3-Transformers for Equivariant Self-Attention, in Pytorch. This specific repository is geared towards integration with eventual Alphafold2 replication.
Stars: ✭ 73 (-39.67%)
Mutual labels:  attention-mechanism, transformer
Seq2seq chatbot
A simple TensorFlow implementation of a seq2seq-based dialogue system with embedding, attention, and beam search; the dataset is Cornell Movie Dialogs
Stars: ✭ 308 (+154.55%)
Mutual labels:  chatbot, attention-mechanism
Seq2seqchatbots
A wrapper around tensor2tensor to flexibly train, interact, and generate data for neural chatbots.
Stars: ✭ 466 (+285.12%)
Mutual labels:  chatbot, transformer
pytorch-transformer-chatbot
A simple chitchat chatbot built with the Transformer API introduced in PyTorch v1.2
Stars: ✭ 44 (-63.64%)
Mutual labels:  chatbot, transformer
Sockeye
Sequence-to-sequence framework with a focus on Neural Machine Translation based on Apache MXNet
Stars: ✭ 990 (+718.18%)
Mutual labels:  attention-mechanism, transformer
Moel
MoEL: Mixture of Empathetic Listeners
Stars: ✭ 38 (-68.6%)
Mutual labels:  chatbot, transformer
linformer
Implementation of Linformer for Pytorch
Stars: ✭ 119 (-1.65%)
Mutual labels:  transformer, attention-mechanism
Multiturndialogzoo
Multi-turn dialogue baselines written in PyTorch
Stars: ✭ 106 (-12.4%)
Mutual labels:  chatbot, transformer
pynmt
a simple and complete pytorch implementation of neural machine translation system
Stars: ✭ 13 (-89.26%)
Mutual labels:  transformer, attention-mechanism
Transformer
A TensorFlow Implementation of the Transformer: Attention Is All You Need
Stars: ✭ 3,646 (+2913.22%)
Mutual labels:  attention-mechanism, transformer
Transformer-in-Transformer
An Implementation of Transformer in Transformer in TensorFlow for image classification, attention inside local patches
Stars: ✭ 40 (-66.94%)
Mutual labels:  transformer, attention-mechanism
Transformer Tts
A Pytorch Implementation of "Neural Speech Synthesis with Transformer Network"
Stars: ✭ 418 (+245.45%)
Mutual labels:  attention-mechanism, transformer
FragmentVC
Any-to-any voice conversion by end-to-end extracting and fusing fine-grained voice fragments with attention
Stars: ✭ 134 (+10.74%)
Mutual labels:  transformer, attention-mechanism
Image-Caption
Using LSTM or Transformer to solve Image Captioning in Pytorch
Stars: ✭ 36 (-70.25%)
Mutual labels:  transformer, attention-mechanism
Awesome Bert Nlp
A curated list of NLP resources focused on BERT, attention mechanism, Transformer networks, and transfer learning.
Stars: ✭ 567 (+368.6%)
Mutual labels:  attention-mechanism, transformer
Eqtransformer
EQTransformer, a python package for earthquake signal detection and phase picking using AI.
Stars: ✭ 95 (-21.49%)
Mutual labels:  attention-mechanism, transformer

An Implementation of Attention is all you need with Chinese Corpus

  The code is an implementation of the paper Attention Is All You Need, applied to dialogue generation tasks such as chatbots and text generation.
  Thanks to everyone who has raised issues and helped solve them; your contributions are very important to the improvement of this project. Due to the limited support for static graph mode in the code, we decided to move the new features to the 2.0.0-beta1 version. However, if you are worried about version issues when building Docker images or creating services, we still keep an older version of the code, written in eager mode with TensorFlow 1.12.x, for reference.

Documents

|-- root/
    |-- data/
        |-- src-train.csv
        |-- src-val.csv
        |-- tgt-train.csv
        `-- tgt-val.csv
    |-- old_version/
        |-- data_loader.py
        |-- eval.py
        |-- make_dic.py
        |-- modules.py
        |-- params.py
        |-- requirements.txt
        `-- train.py
    |-- tf1.12.0-eager/
        |-- bleu.py
        |-- main.ipynb
        |-- modules.py
        |-- params.py
        |-- requirements.txt
        `-- utils.py
    |-- images/
    |-- bleu.py
    |-- main-v2.ipynb
    |-- modules-v2.py
    |-- params.py
    |-- requirements.txt
    `-- utils-v2.py

Requirements

  • Numpy >= 1.13.1
  • Tensorflow-gpu == 1.12.0
  • Tensorflow-gpu == 2.0.0-beta1
    • cudatoolkit >= 10.0
    • cudnn >= 7.4
    • nvidia cuda driver version >= 410.x
  • tqdm
  • nltk
  • jupyter notebook

Construction

  As we all know, a translation system can be used to implement a conversational model simply by replacing the pairs of sentences in two different languages with question-answer pairs. After all, the basic conversational model, sequence-to-sequence, was developed from translation systems. So why not use the same idea to improve the efficiency of conversational models in generating dialogue?

  With the development of BERT-based models, more and more NLP benchmarks are being refreshed. However, language modelling (generation) is not among the tasks covered by BERT's open-source release, so there is no doubt that we still have a long way to go in this direction.

Model Advantages

  A transformer model handles variable-sized input using stacks of self-attention layers instead of RNNs or CNNs. This general architecture has a number of advantages and special tricks:

  • It makes no assumptions about the temporal/spatial relationships across the data (although AutoML results suggest this is not always an advantage).
  • Layer outputs can be computed in parallel, instead of sequentially as in an RNN (faster training).
  • Distant items can affect each other's output without passing through many RNN steps or CNN layers (lower cost).
  • It can learn long-range dependencies, which is a key challenge in dialogue systems.

Implementation details

  In the newest version of the code, we implement the details described in the paper.

Data Generation

  • Use TFRecord as the unified data storage format.
  • Use tf.data.Dataset to load the processed Chinese token datasets.
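  As a rough sketch of this pipeline (the feature names "source" and "target" and the file name train.tfrecord are illustrative, not the project's actual schema), serialising and reloading token-id pairs could look like:

    import tensorflow as tf

    def serialize_example(src_ids, tgt_ids):
        # Store one tokenised (source, target) pair as a tf.train.Example record.
        feature = {
            "source": tf.train.Feature(int64_list=tf.train.Int64List(value=src_ids)),
            "target": tf.train.Feature(int64_list=tf.train.Int64List(value=tgt_ids)),
        }
        example = tf.train.Example(features=tf.train.Features(feature=feature))
        return example.SerializeToString()

    def parse_example(record):
        # Recover the variable-length token-id sequences from a serialised record.
        spec = {
            "source": tf.io.VarLenFeature(tf.int64),
            "target": tf.io.VarLenFeature(tf.int64),
        }
        parsed = tf.io.parse_single_example(record, spec)
        return tf.sparse.to_dense(parsed["source"]), tf.sparse.to_dense(parsed["target"])

    dataset = (tf.data.TFRecordDataset("train.tfrecord")
               .map(parse_example)
               .padded_batch(64, padded_shapes=([None], [None])))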

Positional Encoding

  • Since the model contains no recurrence or memory mechanism, a positional encoding is added to the token embeddings to give the model information about the relative positions of the words in the sentence. The embeddings map each token into a d-dimensional space where tokens with similar meaning are closer to each other, and the positional encoding is added on top of them.
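  A sketch of the sinusoidal encoding used in the paper, where max_len and d_model stand for the maximum sequence length and the model dimension:

    import numpy as np
    import tensorflow as tf

    def positional_encoding(max_len, d_model):
        # PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
        # PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
        pos = np.arange(max_len)[:, np.newaxis]        # (max_len, 1)
        i = np.arange(d_model)[np.newaxis, :]          # (1, d_model)
        angle_rads = pos / np.power(10000, (2 * (i // 2)) / np.float32(d_model))
        angle_rads[:, 0::2] = np.sin(angle_rads[:, 0::2])   # even dimensions
        angle_rads[:, 1::2] = np.cos(angle_rads[:, 1::2])   # odd dimensions
        return tf.cast(angle_rads[np.newaxis, ...], tf.float32)  # (1, max_len, d_model)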

Mask

  • We create two different types of masks during training: a padding mask, and the decoder's look-ahead mask, which keeps the following tokens invisible while the previous ones are being generated.
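  A sketch of both masks, assuming the padding token id is 0 (the 1 entries mark positions that attention should ignore):

    import tensorflow as tf

    def create_padding_mask(seq):
        # 1.0 where the token id equals the padding id (assumed to be 0), 0.0 elsewhere.
        mask = tf.cast(tf.math.equal(seq, 0), tf.float32)
        # Shape (batch_size, 1, 1, seq_len) so it broadcasts over the attention logits.
        return mask[:, tf.newaxis, tf.newaxis, :]

    def create_look_ahead_mask(size):
        # Upper-triangular ones: position i must not attend to positions j > i.
        return 1 - tf.linalg.band_part(tf.ones((size, size)), -1, 0)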

Scaled dot product attention

  • The attention function used by the transformer takes three inputs: Q, K, V. The attention weights are scaled by a factor of the square root of the depth:

    Attention(Q, K, V) = softmax(Q·K^T / sqrt(d_k)) · V
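  A sketch of that equation in code (masked positions receive a large negative logit so their softmax weight is effectively zero):

    import tensorflow as tf

    def scaled_dot_product_attention(q, k, v, mask=None):
        # q, k, v: (..., seq_len, depth); mask broadcasts to (..., seq_len_q, seq_len_k).
        matmul_qk = tf.matmul(q, k, transpose_b=True)        # (..., seq_len_q, seq_len_k)
        dk = tf.cast(tf.shape(k)[-1], tf.float32)
        scaled_logits = matmul_qk / tf.math.sqrt(dk)
        if mask is not None:
            scaled_logits += mask * -1e9                     # push masked positions towards -inf
        attention_weights = tf.nn.softmax(scaled_logits, axis=-1)
        return tf.matmul(attention_weights, v), attention_weights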

Multi-head attention

  • Multi-head attention consists of four parts: linear layers that split the input into heads, scaled dot-product attention applied to each head, concatenation of the heads, and a final linear layer.
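  A compact sketch of such a layer, reusing the scaled_dot_product_attention function above (d_model and num_heads are hyperparameters such as 512 and 8):

    import tensorflow as tf

    class MultiHeadAttention(tf.keras.layers.Layer):
        def __init__(self, d_model, num_heads):
            super(MultiHeadAttention, self).__init__()
            assert d_model % num_heads == 0
            self.num_heads = num_heads
            self.depth = d_model // num_heads
            self.wq = tf.keras.layers.Dense(d_model)
            self.wk = tf.keras.layers.Dense(d_model)
            self.wv = tf.keras.layers.Dense(d_model)
            self.dense = tf.keras.layers.Dense(d_model)      # final linear layer

        def split_heads(self, x, batch_size):
            # (batch, seq_len, d_model) -> (batch, num_heads, seq_len, depth)
            x = tf.reshape(x, (batch_size, -1, self.num_heads, self.depth))
            return tf.transpose(x, perm=[0, 2, 1, 3])

        def call(self, q, k, v, mask=None):
            batch_size = tf.shape(q)[0]
            q = self.split_heads(self.wq(q), batch_size)
            k = self.split_heads(self.wk(k), batch_size)
            v = self.split_heads(self.wv(v), batch_size)
            out, weights = scaled_dot_product_attention(q, k, v, mask)
            out = tf.transpose(out, perm=[0, 2, 1, 3])       # (batch, seq_len, num_heads, depth)
            out = tf.reshape(out, (batch_size, -1, self.num_heads * self.depth))
            return self.dense(out), weights                  # project concatenated heads back to d_model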

Pointwise Feedforward Network

  • The point-wise feed-forward network consists of two fully connected layers with a ReLU activation in between.
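  A sketch, with d_model and dff standing for the model dimension and the inner-layer dimension (512 and 2048 in the paper):

    import tensorflow as tf

    def point_wise_feed_forward_network(d_model, dff):
        # Two dense layers applied identically and independently at every position.
        return tf.keras.Sequential([
            tf.keras.layers.Dense(dff, activation='relu'),   # inner expansion
            tf.keras.layers.Dense(d_model),                  # project back to d_model
        ])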

Learning Rate Schedule

  • Use the Adam optimizer with a custom learning rate scheduler, following the formula from the paper:

    lrate = d_model^(-0.5) * min(step_num^(-0.5), step_num * warmup_steps^(-1.5))
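  A sketch of that schedule wired into Adam (warmup_steps defaults to 4000 and the beta/epsilon values follow the paper; the class name is illustrative):

    import tensorflow as tf

    class CustomSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
        def __init__(self, d_model, warmup_steps=4000):
            super(CustomSchedule, self).__init__()
            self.d_model = tf.cast(d_model, tf.float32)
            self.warmup_steps = warmup_steps

        def __call__(self, step):
            # lrate = d_model^(-0.5) * min(step^(-0.5), step * warmup_steps^(-1.5))
            step = tf.cast(step, tf.float32)
            arg1 = tf.math.rsqrt(step)
            arg2 = step * (self.warmup_steps ** -1.5)
            return tf.math.rsqrt(self.d_model) * tf.minimum(arg1, arg2)

    optimizer = tf.keras.optimizers.Adam(CustomSchedule(d_model=512),
                                         beta_1=0.9, beta_2=0.98, epsilon=1e-9)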

Model Downsides

However, such a strong architecture still has some downsides:

  • For a time series, the output for a time step is calculated from the entire history instead of only the inputs and the current hidden state (much like the difference between a CRF and an HMM), which may be less efficient.
  • As the first point above says, if the input does have a temporal/spatial relationship, as in text generation, the model may get lost in the context unless positional information is added explicitly.

Usage

  • old_version
    • STEP 1. Download a dialogue corpus in the same format as the sample datasets and extract it to the data/ folder.
    • STEP 2. Adjust the hyperparameters in params.py if you want.
    • STEP 3. Run make_dic.py to generate the vocabulary files in a new folder named dictionary.
    • STEP 4. Run train.py to build the model. Checkpoints will be stored in the checkpoint folder, while the TensorFlow event files can be found in logdir.
    • STEP 5. Run eval.py to evaluate the model on the test data. Results will be stored in the Results folder.
  • new_version (2.0 & 1.12.x with eager mode)
    • Follow the .ipynb notebook to run training, evaluation, and the demo.
      • If you use a GPU to speed up training, please set up your device in the code (multi-worker training is supported).

Results

  • demo
- Source: 肥 宅 初 夜 可 以 賣 多 少 `
- Ground Truth: 肥 宅 還 是 去 打 手 槍 吧
- Predict: 肥 宅 還 是 去 打 手 槍 吧

- Source: 兇 的 女 生 484 都 很 胸
- Ground Truth: 我 看 都 是 醜 的 比 較 凶
- Predict: 我 看 都 是 醜 的 比 較 <UNK>

- Source: 留 髮 不 留 頭
- Ground Truth: 還 好 我 早 就 禿 頭 了
- Predict: 還 好 我 早 就 禿 頭 了

- Source: 當 人 好 痛 苦 R 的 八 卦
- Ground Truth: 去 中 國 就 不 用 當 人 了
- Predict: 去 中 國 就 不 會 有 了 -

- Source: 有 沒 有 今 天 捷 運 的 八 卦
- Ground Truth: 有 - 真 的 有 多
- Predict: 有 - 真 的 有 多

- Source: 2016 帶 走 了 什 麼 `
- Ground Truth: HellKitty 麥 當 勞 歡 樂 送 開 門 -
- Predict: <UNK> 麥 當 勞 歡 樂 送 開 門 -

- Source: 有 沒 有 多 益 很 賺 的 八 卦
- Ground Truth: 比 大 型 包 裹 貴
- Predict: 比 大 型 包 <UNK> 貴

- Source: 邊 緣 人 收 到 地 震 警 報 了
- Ground Truth: 都 跑 到 窗 邊 了 才 來
- Predict: 都 跑 到 <UNK> 邊 了 才 來

- Source: 車 震
- Ground Truth: 沒 被 刪 版 主 是 有 眼 睛 der
- Predict: 沒 被 刪 版 主 是 有 眼 睛 der

- Source: 在 家 跌 倒 的 八 卦 `
- Ground Truth: 傷 到 腦 袋 - 可 憐
- Predict: 傷 到 腦 袋 - 可 憐

- Source: 大 家 很 討 厭 核 核 嗎 `
- Ground Truth: 核 核 欠 幹 阿
- Predict: 核 核 欠 幹 阿

- Source: 館 長 跟 黎 明 打 誰 贏 -
- Ground Truth: 我 愛 黎 明 - 我 愛 黎 明 -
- Predict: 我 愛 <UNK> 明 - 我 愛 <UNK> 明 -

- Source: 嘻 嘻 打 打
- Ground Truth: 媽 的 智 障 姆 咪 滾 喇 幹
- Predict: 媽 的 智 障 姆 咪 滾 喇 幹

- Source: 經 典 電 影 台 詞
- Ground Truth: 超 時 空 要 愛 裡 滿 滿 的 梗
- Predict: 超 時 空 要 愛 裡 滿 滿 滿 的

- Source: 2B 守 得 住 街 亭 嗎 `
- Ground Truth: 被 病 毒 滅 亡 真 的 會 -
- Predict: <UNK> 守 得 住

Comparison

Feed-forward implemented with fully connected layers.

  • Training Accuracy
  • Training Loss

Feed-forward implemented with convolution in only one dimension (see the sketch below).

  • Training Accuracy
  • Training Loss
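  The convolutional variant replaces the two dense layers with 1-D convolutions of kernel size 1, which transform each position independently, just like the fully connected version; a sketch:

    import tensorflow as tf

    def conv_feed_forward_network(d_model, dff):
        # kernel_size=1 means every position is transformed on its own,
        # so this is mathematically equivalent to the dense variant.
        return tf.keras.Sequential([
            tf.keras.layers.Conv1D(dff, kernel_size=1, activation='relu'),
            tf.keras.layers.Conv1D(d_model, kernel_size=1),
        ])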

Tips

  If you use AutoGraph to speed up the training process, please make sure the datasets are padded to a fixed length; otherwise the graph will be rebuilt repeatedly during training, which hurts performance. Our code only guarantees the performance of version 2.0; users of lower versions can still refer to it.
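  For example, continuing from the dataset built in the Data Generation sketch above, padding every batch to a fixed length (MAX_LEN here is an assumed hyperparameter, with sequences already truncated to it) keeps the traced graph stable:

    import tensorflow as tf

    MAX_LEN = 50   # assumed fixed maximum sequence length

    dataset = dataset.padded_batch(
        64,
        padded_shapes=([MAX_LEN], [MAX_LEN]),   # fixed shapes -> a single traced graph
        drop_remainder=True)                    # keep the batch dimension static as well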

Reference

Thanks to Transformer and TensorFlow.
