DevinZ1993 / Chinese Poetry Generation

Licence: mit
An RNN-based Chinese Poem Generator

Programming Languages

python

Planning-based Poetry Generation

A classical Chinese quatrain generator based on the RNN encoder-decoder framework.

Here I tried to implement the planning-based architecture proposed in Wang et al. 2016, although technical details may differ from the original paper. My purpose was not to refine the neural network model and achieve better results myself. Rather, I wish to provide a simple framework as described in the paper, along with convenient data processing toolkits, for anyone who wants to experiment with their own ideas on this interesting task.

As of June 2018, this project has been refactored to Python 3 and TensorFlow 1.8.

Code Organization

[Figure: Structure of Code]

The diagram above illustrates the major dependencies in this codebase, in terms of either data or functionality. I tried to organize the code around data and to make every data processing module a singleton at runtime. Batch processing runs only when the produced result is missing or outdated.
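As a rough illustration of the singleton idea, here is a minimal sketch. The class name `CharDict`, the file name, and the placeholder content are all hypothetical and not taken from this codebase. It covers only the "missing" case; the "outdated" case would need an additional timestamp check.

```python
import os
import tempfile


class CharDict:
    """Hypothetical data module: one shared instance, built lazily."""

    _instance = None

    def __init__(self, path):
        self.path = path
        if not os.path.exists(path):
            # Batch processing runs only when the produced file is missing.
            self._generate()
        with open(path, encoding="utf-8") as f:
            self.chars = f.read().split()

    def _generate(self):
        # Placeholder for the real (expensive) processing step.
        with open(self.path, "w", encoding="utf-8") as f:
            f.write("春 江 花 月 夜")

    @classmethod
    def get(cls, path):
        # Reuse the single instance so the file is loaded at most once per run.
        if cls._instance is None:
            cls._instance = cls(path)
        return cls._instance


workdir = tempfile.mkdtemp()
dict_path = os.path.join(workdir, "char_dict.txt")
first = CharDict.get(dict_path)
second = CharDict.get(dict_path)
print(first is second, first.chars)
```

Every caller that goes through `CharDict.get` shares one loaded copy, so the expensive step runs at most once per process.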

Dependencies

Data Processing

Run the following command to generate training data from source text data:

./data_utils.py

Depending on your hardware, this can take anywhere from a tea break to over an hour. The keyword extraction is based on the TextRank algorithm, which can take a long time to converge.
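For readers unfamiliar with TextRank, the sketch below shows the general idea: build a co-occurrence graph over words and run PageRank-style updates on it. It is not the implementation in data_utils.py. The function name, the window size, the damping factor, and the fixed iteration count (used here in place of a convergence test) are all assumptions made for illustration.

```python
from collections import defaultdict


def textrank(tokens, window=2, damping=0.85, iterations=50):
    """Rank words by PageRank-style scores on a co-occurrence graph."""
    # Link every pair of distinct words that appear within `window` positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Repeatedly redistribute each word's score evenly among its neighbors.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    return sorted(scores, key=scores.get, reverse=True)


tokens = ["明月", "松", "清泉", "明月", "石", "明月", "山"]
ranking = textrank(tokens, window=1)
print(ranking[0])
```

Each iteration touches every edge of the graph, which is one reason the extraction step is slow on a large corpus.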

Training

The poem planner was based on Gensim's Word2Vec module. To train it, simply run:

./train.py -p

The poem generator was implemented as an encoder-decoder model with an attention mechanism. To train it, type the following command:

./train.py -g
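As background on the attention mechanism, the following is a minimal sketch of a single decoder step using plain dot-product scoring. The generator in this project may well use a different scoring function, so treat the function name and the scoring rule as assumptions for illustration only.

```python
import math


def attention(query, encoder_states):
    """Return (weights, context) for one decoder step."""
    # Score each encoder state by its dot product with the decoder query.
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]

    # Softmax, shifted by the max for numerical stability, yields the weights.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # The context vector is the weighted average of the encoder states.
    dim = len(encoder_states[0])
    context = [
        sum(w * state[d] for w, state in zip(weights, encoder_states))
        for d in range(dim)
    ]
    return weights, context


encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attention([2.0, 0.0], encoder_states)
print(weights, context)
```

The decoder would combine `context` with its own state before predicting the next character, so each output step can focus on different parts of the input.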

You can also train both models together by running:

./train.py -a

To erase all trained models, run:

./train.py --clean

As it turned out, the attention-based generator model was really hard to train well after the refactor. In my experience, the average loss typically gets stuck at ~5.6 and does not go down any further. There should be considerable room for improvement.
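To put that number in perspective, here is a small worked calculation. It assumes the reported loss is a mean per-character cross-entropy measured in nats, which the text above does not state explicitly. The helper name is hypothetical.

```python
import math


def perplexity(mean_cross_entropy_nats):
    """Convert a mean cross-entropy in nats into perplexity."""
    return math.exp(mean_cross_entropy_nats)


stuck = perplexity(5.6)
print(round(stuck))
```

Under that assumption, a loss of 5.6 corresponds to a perplexity of roughly 270, which is comparable to guessing uniformly among about 270 candidate characters at each step.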

Run Tests

Type the following command:

./main.py

Each time you type in a hint text in Chinese, it should return a poem, although at this stage the output is mostly gibberish. It's up to you to decide how to improve the models and training methods to make them work better.

Improve It

  • To add data processing tools, consider adding dependency configs to __dependency_dict in paths.py. This lets processed data be updated automatically when it goes stale.

  • To improve the planning model, please refine the planner class in plan.py.

  • To improve the generation model, please refine the generator class in generate.py.
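For the first point, a dependency table can drive staleness checks along the lines of the sketch below. The dictionary layout, the file names, and the `is_stale` helper are hypothetical and only show the general pattern. The actual structure of __dependency_dict in paths.py may differ.

```python
import os
import tempfile

# Hypothetical mapping from each processed file to the files it is built from.
dependency_dict = {
    "poems.txt": ["raw_corpus.txt"],
    "keywords.txt": ["poems.txt"],
}


def is_stale(target, base_dir):
    """True if target is missing or older than any transitive dependency."""
    target_path = os.path.join(base_dir, target)
    if not os.path.exists(target_path):
        return True
    target_mtime = os.path.getmtime(target_path)
    for dep in dependency_dict.get(target, []):
        dep_path = os.path.join(base_dir, dep)
        if is_stale(dep, base_dir) or os.path.getmtime(dep_path) > target_mtime:
            return True
    return False


# Demo in a temporary directory with explicitly controlled timestamps.
base = tempfile.mkdtemp()
for offset, name in enumerate(["raw_corpus.txt", "poems.txt"]):
    path = os.path.join(base, name)
    with open(path, "w", encoding="utf-8"):
        pass
    os.utime(path, (1000 + offset, 1000 + offset))

fresh_before = not is_stale("poems.txt", base)

# Simulate an edit to the source text: the derived file is now outdated.
os.utime(os.path.join(base, "raw_corpus.txt"), (2000, 2000))
stale_after = is_stale("poems.txt", base)
missing = is_stale("keywords.txt", base)
print(fresh_before, stale_after, missing)
```

A new processing tool would then only need one more entry in the table for its output to be rebuilt whenever an input changes.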
