
vineetjohn / Linguistic Style Transfer

Licence: apache-2.0
Neural network parametrized objective to disentangle and transfer style and content in text

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Linguistic Style Transfer

Style Transfer In Text
Paper List for Style Transfer in Text
Stars: ✭ 1,030 (+871.7%)
Mutual labels:  natural-language-processing, style-transfer
Zhihu
This repo contains the source code for my personal Zhihu column (https://zhuanlan.zhihu.com/zhaoyeyu), implemented in Python 3.6. It includes hands-on Natural Language Processing and Computer Vision projects, such as text generation, machine translation, and deep convolutional GANs.
Stars: ✭ 3,307 (+3019.81%)
Mutual labels:  natural-language-processing, style-transfer
D2l En
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 300 universities from 55 countries including Stanford, MIT, Harvard, and Cambridge.
Stars: ✭ 11,837 (+11066.98%)
Mutual labels:  natural-language-processing
Ios ml
List of Machine Learning, AI, NLP solutions for iOS. The most recent version of this article can be found on my blog.
Stars: ✭ 1,409 (+1229.25%)
Mutual labels:  natural-language-processing
Pytorchnlpbook
Code and data accompanying Natural Language Processing with PyTorch published by O'Reilly Media https://nlproc.info
Stars: ✭ 1,390 (+1211.32%)
Mutual labels:  natural-language-processing
Neural Style Transfer Papers
✏️ Neural Style Transfer: A Review
Stars: ✭ 1,372 (+1194.34%)
Mutual labels:  style-transfer
Magnitude
A fast, efficient universal vector embedding utility package.
Stars: ✭ 1,394 (+1215.09%)
Mutual labels:  natural-language-processing
Awesome Machine Learning
📖 List of some awesome university courses for Machine Learning! Feel free to contribute!
Stars: ✭ 99 (-6.6%)
Mutual labels:  natural-language-processing
Style transfer
Data-parallel image stylization using Caffe.
Stars: ✭ 106 (+0%)
Mutual labels:  style-transfer
Metaknowledge
A Python library for doing bibliometric and network analysis in science and health policy research
Stars: ✭ 102 (-3.77%)
Mutual labels:  natural-language-processing
Mxnet Gluon Style Transfer
Neural Style and MSG-Net
Stars: ✭ 105 (-0.94%)
Mutual labels:  style-transfer
Repo 2016
R, Python and Mathematica Codes in Machine Learning, Deep Learning, Artificial Intelligence, NLP and Geolocation
Stars: ✭ 103 (-2.83%)
Mutual labels:  natural-language-processing
Codesearchnet
Datasets, tools, and benchmarks for representation learning of code.
Stars: ✭ 1,378 (+1200%)
Mutual labels:  natural-language-processing
Spokestack Python
Spokestack is a library that allows a user to easily incorporate a voice interface into any Python application.
Stars: ✭ 103 (-2.83%)
Mutual labels:  natural-language-processing
Atis.keras
Spoken Language Understanding (SLU) / Slot Filling in Keras
Stars: ✭ 100 (-5.66%)
Mutual labels:  natural-language-processing
Easy Bert
A Dead Simple BERT API for Python and Java (https://github.com/google-research/bert)
Stars: ✭ 106 (+0%)
Mutual labels:  natural-language-processing
Clustype
Automatic Entity Recognition and Typing for Domain-Specific Corpora (KDD'15)
Stars: ✭ 99 (-6.6%)
Mutual labels:  natural-language-processing
Pynlp
A pythonic wrapper for Stanford CoreNLP.
Stars: ✭ 103 (-2.83%)
Mutual labels:  natural-language-processing
Anago
Bidirectional LSTM-CRF and ELMo for Named-Entity Recognition, Part-of-Speech Tagging and so on.
Stars: ✭ 1,392 (+1213.21%)
Mutual labels:  natural-language-processing
Pytorch Neural Style Transfer
Reconstruction of the original paper on neural style transfer (Gatys et al.). I've additionally included reconstruction scripts which allow you to reconstruct only the content or the style of the image - for better understanding of how NST works.
Stars: ✭ 106 (+0%)
Mutual labels:  style-transfer

Linguistic Style-Transfer

Neural network model to disentangle and transfer linguistic style in text


Prerequisites


Notes

  • Ignore CUDA_DEVICE_ORDER="PCI_BUS_ID", CUDA_VISIBLE_DEVICES="0" unless you're training with a GPU
  • Input data file format (see the example after this list):
    • ${TEXT_FILE_PATH} should have 1 sentence per line.
    • Similarly, ${LABEL_FILE_PATH} should have 1 label per line.
  • Assuming that you already have g++ and bash installed, run the following commands to set up the kenlm library properly:
    • wget -O - https://kheafield.com/code/kenlm.tar.gz |tar xz
    • mkdir kenlm/build
    • cd kenlm/build
    • sudo apt-get install build-essential libboost-all-dev cmake zlib1g-dev libbz2-dev liblzma-dev (to install basic dependencies)
    • Install Boost:
      • Download boost_1_67_0.tar.bz2 from here
      • tar --bzip2 -xf /path/to/boost_1_67_0.tar.bz2
    • Install Eigen:
      • export EIGEN3_ROOT=$HOME/eigen-eigen-07105f7124f9
      • cd $HOME; wget -O - https://bitbucket.org/eigen/eigen/get/3.2.8.tar.bz2 |tar xj
      • Go back to the kenlm/build folder and run rm CMakeCache.txt
    • cmake ..
    • make -j2
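
Returning to the input data file format noted in the list above, here is a minimal sketch. The file names, sentences, and label strings are hypothetical; the project only requires one sentence per line in the text file and one aligned label per line in the label file.

mkdir -p data
# Hypothetical ${TEXT_FILE_PATH}: one sentence per line.
cat > data/sample-text.txt << 'EOF'
the food was delicious and the staff were friendly
the service was painfully slow
EOF
# Matching hypothetical ${LABEL_FILE_PATH}: one label per line, aligned with the sentences above.
cat > data/sample-labels.txt << 'EOF'
pos
neg
EOF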

Data Sources

Customer Review Datasets

  • Yelp Service Reviews - Link
  • Amazon Product Reviews - Link

Word Embeddings

References to ${VALIDATION_WORD_EMBEDDINGS_PATH} in the instructions below should be replaced by the path to the file glove.6B.100d.txt, which can be downloaded from here.
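
A hedged sketch of fetching and wiring up those embeddings, assuming the Stanford NLP glove.6B.zip archive is still hosted at its usual URL:

# Download the GloVe 6B vectors (URL assumed) and point the placeholder at the 100d file.
wget http://nlp.stanford.edu/data/glove.6B.zip
unzip glove.6B.zip -d glove.6B
export VALIDATION_WORD_EMBEDDINGS_PATH="$(pwd)/glove.6B/glove.6B.100d.txt"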

Opinion Lexicon

The file "data/opinion-lexicon/sentiment-words.txt", referenced in global_config.py can be downloaded from below page.


Pretraining

Run a corpus cleaner/adapter

./scripts/run_corpus_adapter.sh \
linguistic_style_transfer_model/corpus_adapters/${CORPUS_ADAPTER_SCRIPT}

Train word embedding model

./scripts/run_word_vector_training.sh \
--text-file-path ${TRAINING_TEXT_FILE_PATH} \
--model-file-path ${WORD_EMBEDDINGS_PATH}

Train validation classifier

CUDA_DEVICE_ORDER="PCI_BUS_ID" \
CUDA_VISIBLE_DEVICES="0" \
TF_CPP_MIN_LOG_LEVEL=1 \
./scripts/run_classifier_training.sh \
--text-file-path ${TRAINING_TEXT_FILE_PATH} \
--label-file-path ${TRAINING_LABEL_FILE_PATH} \
--training-epochs ${NUM_EPOCHS} --vocab-size ${VOCAB_SIZE}

This will produce a folder like saved-models-classifier/xxxxxxxxxx.
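
To reuse that folder later as ${CLASSIFIER_SAVED_MODEL_PATH}, one hypothetical way to pick up the most recent run (assuming the timestamped directory naming shown above) is:

export CLASSIFIER_SAVED_MODEL_PATH="saved-models-classifier/$(ls -t saved-models-classifier | head -n 1)"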

Train Kneser-Ney Language Model

Use the command below to train an n-gram language model (run it from the kenlm/build folder):

./bin/lmplz -o ${n} --text ${TRAINING_TEXT_FILE_PATH} > ${LANGUAGE_MODEL_PATH}
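
For example, with n set to 5, followed by an optional sanity check using kenlm's query tool. The placeholder paths are assumed to resolve from kenlm/build (e.g. by making them absolute):

./bin/lmplz -o 5 --text "${TRAINING_TEXT_FILE_PATH}" > "${LANGUAGE_MODEL_PATH}"
# Optional sanity check: prints per-sentence log10 scores for held-out text.
./bin/query "${LANGUAGE_MODEL_PATH}" < "${VALIDATION_TEXT_FILE_PATH}"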

Extract label-correlated words

./scripts/run_word_retriever.sh \
--text-file-path ${TEXT_FILE_PATH} \
--label-file-path ${LABEL_FILE_PATH} \
--logging-level ${LOGGING_LEVEL}

Style Transfer Model Training

Train style transfer model

CUDA_DEVICE_ORDER="PCI_BUS_ID" \
CUDA_VISIBLE_DEVICES="0" \
TF_CPP_MIN_LOG_LEVEL=1 \
./scripts/run_linguistic_style_transfer_model.sh \
--train-model \
--text-file-path ${TRAINING_TEXT_FILE_PATH} \
--label-file-path ${TRAINING_LABEL_FILE_PATH} \
--training-embeddings-file-path ${TRAINING_WORD_EMBEDDINGS_PATH} \
--validation-text-file-path ${VALIDATION_TEXT_FILE_PATH} \
--validation-label-file-path ${VALIDATION_LABEL_FILE_PATH} \
--validation-embeddings-file-path ${VALIDATION_WORD_EMBEDDINGS_PATH} \
--classifier-saved-model-path ${CLASSIFIER_SAVED_MODEL_PATH} \
--dump-embeddings \
--training-epochs ${NUM_EPOCHS} \
--vocab-size ${VOCAB_SIZE} \
--logging-level="DEBUG"

This will produce a folder like saved-models/xxxxxxxxxx. It will also produce output/xxxxxxxxxx-training if validation is turned on.
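
The inference, generation, visualization and evaluation commands below all reference this folder; a hypothetical way to point ${SAVED_MODEL_PATH} at the most recent run (again assuming timestamped directory names) is:

export SAVED_MODEL_PATH="saved-models/$(ls -t saved-models | head -n 1)"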

Infer style transferred sentences

CUDA_DEVICE_ORDER="PCI_BUS_ID" \
CUDA_VISIBLE_DEVICES="0" \
TF_CPP_MIN_LOG_LEVEL=1 \
./scripts/run_linguistic_style_transfer_model.sh \
--transform-text \
--evaluation-text-file-path ${TEST_TEXT_FILE_PATH} \
--saved-model-path ${SAVED_MODEL_PATH} \
--logging-level="DEBUG"

This will produce a folder like output/xxxxxxxxxx-inference.

Generate new sentences

CUDA_DEVICE_ORDER="PCI_BUS_ID" \
CUDA_VISIBLE_DEVICES="0" \
TF_CPP_MIN_LOG_LEVEL=1 \
./scripts/run_linguistic_style_transfer_model.sh \
--generate-novel-text \
--saved-model-path ${SAVED_MODEL_PATH} \
--num-sentences-to-generate ${NUM_SENTENCES} \
--logging-level="DEBUG"

This will produce a folder like output/xxxxxxxxxx-generation.


Visualizations

Plot validation accuracy metrics

./scripts/run_validation_scores_visualization_generator.sh \
--saved-model-path ${SAVED_MODEL_PATH}

This will produce a few files like ${SAVED_MODEL_PATH}/validation_xxxxxxxxxx.svg.

Plot t-SNE embedding spaces

./scripts/run_tsne_visualization_generator.sh \
--saved-model-path ${SAVED_MODEL_PATH}

This will produce a few files like ${SAVED_MODEL_PATH}/tsne_plots/tsne_embeddings_plot_xx.svg.


Run evaluation metrics

Style Transfer

CUDA_DEVICE_ORDER="PCI_BUS_ID" \
CUDA_VISIBLE_DEVICES="0" \
TF_CPP_MIN_LOG_LEVEL=1 \
./scripts/run_style_transfer_evaluator.sh \
--classifier-saved-model-path ${CLASSIFIER_SAVED_MODEL_PATH} \
--text-file-path ${GENERATED_TEXT_FILE_PATH} \
--label-index ${GENERATED_TEXT_LABEL}

Alternatively, if you have a file containing the labels, use the command below instead:

CUDA_DEVICE_ORDER="PCI_BUS_ID" \
CUDA_VISIBLE_DEVICES="0" \
TF_CPP_MIN_LOG_LEVEL=1 \
./scripts/run_style_transfer_evaluator.sh \
--classifier-saved-model-path ${CLASSIFIER_SAVED_MODEL_PATH} \
--text-file-path ${GENERATED_TEXT_FILE_PATH} \
--label-file-path ${GENERATED_LABELS_FILE_PATH}

Content Preservation

./scripts/run_content_preservation_evaluator.sh \
--embeddings-file-path ${VALIDATION_WORD_EMBEDDINGS_PATH} \
--source-file-path ${TEST_TEXT_FILE_PATH} \
--target-file-path ${GENERATED_TEXT_FILE_PATH}

Latent Space Predicted Label Accuracy

./scripts/run_label_accuracy_prediction.sh \
--gold-labels-file-path ${TEST_LABEL_FILE_PATH} \
--saved-model-path ${SAVED_MODEL_PATH} \
--predictions-file-path ${PREDICTIONS_LABEL_FILE_PATH}

Language Fluency

./scripts/run_language_fluency_evaluator.sh \
--language-model-path ${LANGUAGE_MODEL_PATH} \
--generated-text-file-path ${GENERATED_TEXT_FILE_PATH}

Log-likelihood values are base 10.
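(As a general relationship, not a statement about this script's exact output format: a mean per-word log10 likelihood of -2 corresponds to a perplexity of 10^2 = 100, since perplexity is 10 raised to the negative per-word log-likelihood.)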

All Evaluation Metrics (works only for the output of this project)

CUDA_DEVICE_ORDER="PCI_BUS_ID" \
CUDA_VISIBLE_DEVICES="0" \
TF_CPP_MIN_LOG_LEVEL=1 \
./scripts/run_all_evaluators.sh \
--embeddings-path ${VALIDATION_WORD_EMBEDDINGS_PATH} \
--language-model-path ${LANGUAGE_MODEL_PATH} \
--classifier-model-path ${CLASSIFIER_SAVED_MODEL_PATH} \
--training-path ${SAVED_MODEL_PATH} \
--inference-path ${GENERATED_SENTENCES_SAVE_PATH}
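
Here ${GENERATED_SENTENCES_SAVE_PATH} is the folder passed as --inference-path; a hypothetical way to point it at the most recent inference output (assuming the output/xxxxxxxxxx-inference naming shown earlier) is:

export GENERATED_SENTENCES_SAVE_PATH="$(ls -dt output/*-inference | head -n 1)"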