
vineetjohn / Linguistic Style Transfer

Licence: apache-2.0
Neural network parametrized objective to disentangle and transfer style and content in text

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Linguistic Style Transfer

Style Transfer In Text
Paper List for Style Transfer in Text
Stars: ✭ 1,030 (+871.7%)
Mutual labels:  natural-language-processing, style-transfer
Zhihu
This repo contains the source code for my personal Zhihu column (https://zhuanlan.zhihu.com/zhaoyeyu), implemented in Python 3.6. It includes hands-on Natural Language Processing and Computer Vision projects, such as text generation, machine translation, and deep convolutional GANs.
Stars: ✭ 3,307 (+3019.81%)
Mutual labels:  natural-language-processing, style-transfer
D2l En
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 300 universities from 55 countries including Stanford, MIT, Harvard, and Cambridge.
Stars: ✭ 11,837 (+11066.98%)
Mutual labels:  natural-language-processing
Ios ml
List of Machine Learning, AI, NLP solutions for iOS. The most recent version of this article can be found on my blog.
Stars: ✭ 1,409 (+1229.25%)
Mutual labels:  natural-language-processing
Pytorchnlpbook
Code and data accompanying Natural Language Processing with PyTorch published by O'Reilly Media https://nlproc.info
Stars: ✭ 1,390 (+1211.32%)
Mutual labels:  natural-language-processing
Neural Style Transfer Papers
✏️ Neural Style Transfer: A Review
Stars: ✭ 1,372 (+1194.34%)
Mutual labels:  style-transfer
Magnitude
A fast, efficient universal vector embedding utility package.
Stars: ✭ 1,394 (+1215.09%)
Mutual labels:  natural-language-processing
Awesome Machine Learning
📖 List of some awesome university courses for Machine Learning! Feel free to contribute!
Stars: ✭ 99 (-6.6%)
Mutual labels:  natural-language-processing
Style transfer
Data-parallel image stylization using Caffe.
Stars: ✭ 106 (+0%)
Mutual labels:  style-transfer
Metaknowledge
A Python library for doing bibliometric and network analysis in science and health policy research
Stars: ✭ 102 (-3.77%)
Mutual labels:  natural-language-processing
Mxnet Gluon Style Transfer
Neural Style and MSG-Net
Stars: ✭ 105 (-0.94%)
Mutual labels:  style-transfer
Repo 2016
R, Python and Mathematica Codes in Machine Learning, Deep Learning, Artificial Intelligence, NLP and Geolocation
Stars: ✭ 103 (-2.83%)
Mutual labels:  natural-language-processing
Codesearchnet
Datasets, tools, and benchmarks for representation learning of code.
Stars: ✭ 1,378 (+1200%)
Mutual labels:  natural-language-processing
Spokestack Python
Spokestack is a library that allows a user to easily incorporate a voice interface into any Python application.
Stars: ✭ 103 (-2.83%)
Mutual labels:  natural-language-processing
Atis.keras
Spoken Language Understanding (SLU) / Slot Filling in Keras
Stars: ✭ 100 (-5.66%)
Mutual labels:  natural-language-processing
Easy Bert
A Dead Simple BERT API for Python and Java (https://github.com/google-research/bert)
Stars: ✭ 106 (+0%)
Mutual labels:  natural-language-processing
Clustype
Automatic Entity Recognition and Typing for Domain-Specific Corpora (KDD'15)
Stars: ✭ 99 (-6.6%)
Mutual labels:  natural-language-processing
Pynlp
A pythonic wrapper for Stanford CoreNLP.
Stars: ✭ 103 (-2.83%)
Mutual labels:  natural-language-processing
Anago
Bidirectional LSTM-CRF and ELMo for Named-Entity Recognition, Part-of-Speech Tagging and so on.
Stars: ✭ 1,392 (+1213.21%)
Mutual labels:  natural-language-processing
Pytorch Neural Style Transfer
Reconstruction of the original paper on neural style transfer (Gatys et al.). I've additionally included reconstruction scripts which allow you to reconstruct only the content or the style of the image - for better understanding of how NST works.
Stars: ✭ 106 (+0%)
Mutual labels:  style-transfer

Linguistic Style-Transfer

Neural network model to disentangle and transfer linguistic style in text


Prerequisites


Notes

  • Ignore CUDA_DEVICE_ORDER="PCI_BUS_ID", CUDA_VISIBLE_DEVICES="0" unless you're training with a GPU
  • Input data file format (see the example after this list):
    • ${TEXT_FILE_PATH} should have 1 sentence per line.
    • Similarly, ${LABEL_FILE_PATH} should have 1 label per line.
  • Assuming that you already have g++ and bash installed, run the following commands to set up the kenlm library properly:
    • wget -O - https://kheafield.com/code/kenlm.tar.gz |tar xz
    • mkdir kenlm/build
    • cd kenlm/build
    • sudo apt-get install build-essential libboost-all-dev cmake zlib1g-dev libbz2-dev liblzma-dev (to install basic dependencies)
    • Install Boost:
      • Download boost_1_67_0.tar.bz2 from here
      • tar --bzip2 -xf /path/to/boost_1_67_0.tar.bz2
    • Install Eigen:
      • export EIGEN3_ROOT=$HOME/eigen-eigen-07105f7124f9
      • cd $HOME; wget -O - https://bitbucket.org/eigen/eigen/get/3.2.8.tar.bz2 |tar xj
      • Go back to the kenlm/build folder and run rm CMakeCache.txt
    • cmake ..
    • make -j2
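
Returning to the input data file format noted in the list above, here is a minimal sketch. The file names, sentences, and label strings are hypothetical; the project only requires one sentence per line in the text file and one aligned label per line in the label file.

mkdir -p data
# Hypothetical ${TEXT_FILE_PATH}: one sentence per line.
cat > data/sample-text.txt << 'EOF'
the food was delicious and the staff were friendly
the service was painfully slow
EOF
# Matching hypothetical ${LABEL_FILE_PATH}: one label per line, aligned with the sentences above.
cat > data/sample-labels.txt << 'EOF'
pos
neg
EOF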

Data Sources

Customer Review Datasets

  • Yelp Service Reviews - Link
  • Amazon Product Reviews - Link

Word Embeddings

References to ${VALIDATION_WORD_EMBEDDINGS_PATH} in the instructions below should be replaced by the path to the file glove.6B.100d.txt, which can be downloaded from here.
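
A hedged sketch of fetching and wiring up those embeddings, assuming the Stanford NLP glove.6B.zip archive is still hosted at its usual URL:

# Download the GloVe 6B vectors (URL assumed) and point the placeholder at the 100d file.
wget http://nlp.stanford.edu/data/glove.6B.zip
unzip glove.6B.zip -d glove.6B
export VALIDATION_WORD_EMBEDDINGS_PATH="$(pwd)/glove.6B/glove.6B.100d.txt"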

Opinion Lexicon

The file "data/opinion-lexicon/sentiment-words.txt", referenced in global_config.py can be downloaded from below page.


Pretraining

Run a corpus cleaner/adapter

./scripts/run_corpus_adapter.sh \
linguistic_style_transfer_model/corpus_adapters/${CORPUS_ADAPTER_SCRIPT}

Train word embedding model

./scripts/run_word_vector_training.sh \
--text-file-path ${TRAINING_TEXT_FILE_PATH} \
--model-file-path ${WORD_EMBEDDINGS_PATH}

Train validation classifier

CUDA_DEVICE_ORDER="PCI_BUS_ID" \
CUDA_VISIBLE_DEVICES="0" \
TF_CPP_MIN_LOG_LEVEL=1 \
./scripts/run_classifier_training.sh \
--text-file-path ${TRAINING_TEXT_FILE_PATH} \
--label-file-path ${TRAINING_LABEL_FILE_PATH} \
--training-epochs ${NUM_EPOCHS} --vocab-size ${VOCAB_SIZE}

This will produce a folder like saved-models-classifier/xxxxxxxxxx.
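
To reuse that folder later as ${CLASSIFIER_SAVED_MODEL_PATH}, one hypothetical way to pick up the most recent run (assuming the timestamped directory naming shown above) is:

export CLASSIFIER_SAVED_MODEL_PATH="saved-models-classifier/$(ls -t saved-models-classifier | head -n 1)"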

Train Kneser-Ney Language Model

Use the command below to train an n-gram language model (run it from the kenlm/build folder):

./bin/lmplz -o ${n} --text ${TRAINING_TEXT_FILE_PATH} > ${LANGUAGE_MODEL_PATH}
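
For example, with n set to 5, followed by an optional sanity check using kenlm's query tool. The placeholder paths are assumed to resolve from kenlm/build (e.g. by making them absolute):

./bin/lmplz -o 5 --text "${TRAINING_TEXT_FILE_PATH}" > "${LANGUAGE_MODEL_PATH}"
# Optional sanity check: prints per-sentence log10 scores for held-out text.
./bin/query "${LANGUAGE_MODEL_PATH}" < "${VALIDATION_TEXT_FILE_PATH}"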

Extract label-correlated words

./scripts/run_word_retriever.sh \
--text-file-path ${TEXT_FILE_PATH} \
--label-file-path ${LABEL_FILE_PATH} \
--logging-level ${LOGGING_LEVEL}

Style Transfer Model Training

Train style transfer model

CUDA_DEVICE_ORDER="PCI_BUS_ID" \
CUDA_VISIBLE_DEVICES="0" \
TF_CPP_MIN_LOG_LEVEL=1 \
./scripts/run_linguistic_style_transfer_model.sh \
--train-model \
--text-file-path ${TRAINING_TEXT_FILE_PATH} \
--label-file-path ${TRAINING_LABEL_FILE_PATH} \
--training-embeddings-file-path ${TRAINING_WORD_EMBEDDINGS_PATH} \
--validation-text-file-path ${VALIDATION_TEXT_FILE_PATH} \
--validation-label-file-path ${VALIDATION_LABEL_FILE_PATH} \
--validation-embeddings-file-path ${VALIDATION_WORD_EMBEDDINGS_PATH} \
--classifier-saved-model-path ${CLASSIFIER_SAVED_MODEL_PATH} \
--dump-embeddings \
--training-epochs ${NUM_EPOCHS} \
--vocab-size ${VOCAB_SIZE} \
--logging-level="DEBUG"

This will produce a folder like saved-models/xxxxxxxxxx. It will also produce output/xxxxxxxxxx-training if validation is turned on.
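
The inference, generation, visualization and evaluation commands below all reference this folder; a hypothetical way to point ${SAVED_MODEL_PATH} at the most recent run (again assuming timestamped directory names) is:

export SAVED_MODEL_PATH="saved-models/$(ls -t saved-models | head -n 1)"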

Infer style transferred sentences

CUDA_DEVICE_ORDER="PCI_BUS_ID" \
CUDA_VISIBLE_DEVICES="0" \
TF_CPP_MIN_LOG_LEVEL=1 \
./scripts/run_linguistic_style_transfer_model.sh \
--transform-text \
--evaluation-text-file-path ${TEST_TEXT_FILE_PATH} \
--saved-model-path ${SAVED_MODEL_PATH} \
--logging-level="DEBUG"

This will produce a folder like output/xxxxxxxxxx-inference.

Generate new sentences

CUDA_DEVICE_ORDER="PCI_BUS_ID" \
CUDA_VISIBLE_DEVICES="0" \
TF_CPP_MIN_LOG_LEVEL=1 \
./scripts/run_linguistic_style_transfer_model.sh \
--generate-novel-text \
--saved-model-path ${SAVED_MODEL_PATH} \
--num-sentences-to-generate ${NUM_SENTENCES} \
--logging-level="DEBUG"

This will produce a folder like output/xxxxxxxxxx-generation.


Visualizations

Plot validation accuracy metrics

./scripts/run_validation_scores_visualization_generator.sh \
--saved-model-path ${SAVED_MODEL_PATH}

This will produce a few files like ${SAVED_MODEL_PATH}/validation_xxxxxxxxxx.svg.

Plot t-SNE embedding spaces

./scripts/run_tsne_visualization_generator.sh \
--saved-model-path ${SAVED_MODEL_PATH}

This will produce a few files like ${SAVED_MODEL_PATH}/tsne_plots/tsne_embeddings_plot_xx.svg.


Run evaluation metrics

Style Transfer

CUDA_DEVICE_ORDER="PCI_BUS_ID" \
CUDA_VISIBLE_DEVICES="0" \
TF_CPP_MIN_LOG_LEVEL=1 \
./scripts/run_style_transfer_evaluator.sh \
--classifier-saved-model-path ${CLASSIFIER_SAVED_MODEL_PATH} \
--text-file-path ${GENERATED_TEXT_FILE_PATH} \
--label-index ${GENERATED_TEXT_LABEL}

Alternatively, if you have a file containing the labels, use the command below instead:

CUDA_DEVICE_ORDER="PCI_BUS_ID" \
CUDA_VISIBLE_DEVICES="0" \
TF_CPP_MIN_LOG_LEVEL=1 \
./scripts/run_style_transfer_evaluator.sh \
--classifier-saved-model-path ${CLASSIFIER_SAVED_MODEL_PATH} \
--text-file-path ${GENERATED_TEXT_FILE_PATH} \
--label-file-path ${GENERATED_LABELS_FILE_PATH}

Content Preservation

./scripts/run_content_preservation_evaluator.sh \
--embeddings-file-path ${VALIDATION_WORD_EMBEDDINGS_PATH} \
--source-file-path ${TEST_TEXT_FILE_PATH} \
--target-file-path ${GENERATED_TEXT_FILE_PATH}

Latent Space Predicted Label Accuracy

./scripts/run_label_accuracy_prediction.sh \
--gold-labels-file-path ${TEST_LABEL_FILE_PATH} \
--saved-model-path ${SAVED_MODEL_PATH} \
--predictions-file-path ${PREDICTIONS_LABEL_FILE_PATH}

Language Fluency

./scripts/run_language_fluency_evaluator.sh \
--language-model-path ${LANGUAGE_MODEL_PATH} \
--generated-text-file-path ${GENERATED_TEXT_FILE_PATH}

Log-likelihood values are base 10.
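(As a general relationship, not a statement about this script's exact output format: a mean per-word log10 likelihood of -2 corresponds to a perplexity of 10^2 = 100, since perplexity is 10 raised to the negative per-word log-likelihood.)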

All Evaluation Metrics (works only for the output of this project)

CUDA_DEVICE_ORDER="PCI_BUS_ID" \
CUDA_VISIBLE_DEVICES="0" \
TF_CPP_MIN_LOG_LEVEL=1 \
./scripts/run_all_evaluators.sh \
--embeddings-path ${VALIDATION_WORD_EMBEDDINGS_PATH} \
--language-model-path ${LANGUAGE_MODEL_PATH} \
--classifier-model-path ${CLASSIFIER_SAVED_MODEL_PATH} \
--training-path ${SAVED_MODEL_PATH} \
--inference-path ${GENERATED_SENTENCES_SAVE_PATH}
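
Here ${GENERATED_SENTENCES_SAVE_PATH} is the folder passed as --inference-path; a hypothetical way to point it at the most recent inference output (assuming the output/xxxxxxxxxx-inference naming shown earlier) is:

export GENERATED_SENTENCES_SAVE_PATH="$(ls -dt output/*-inference | head -n 1)"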