
THUDM / Kobe

License: MIT
Source code and dataset for KDD 2019 paper "Towards Knowledge-Based Personalized Product Description Generation in E-commerce"

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Kobe

Tensorflow Nlp
NLP and Text Generation Experiments in TensorFlow 2.x / 1.x
Stars: ✭ 1,487 (+904.73%)
Mutual labels:  knowledge-graph, text-generation
Kenlg Reading
Reading list for knowledge-enhanced text generation, with a survey
Stars: ✭ 257 (+73.65%)
Mutual labels:  knowledge-graph, text-generation
Crslab
CRSLab is an open-source toolkit for building Conversational Recommender System (CRS).
Stars: ✭ 183 (+23.65%)
Mutual labels:  knowledge-graph, text-generation
Nlp Projects
word2vec, sentence2vec, machine reading comprehension, dialog system, text classification, pretrained language model (i.e., XLNet, BERT, ELMo, GPT), sequence labeling, information retrieval, information extraction (i.e., entity, relation and event extraction), knowledge graph, text generation, network embedding
Stars: ✭ 360 (+143.24%)
Mutual labels:  knowledge-graph, text-generation
Kogpt2 Finetuning
🔥 Korean GPT-2, KoGPT2 fine-tuning. Trained on Korean lyrics data 🔥
Stars: ✭ 124 (-16.22%)
Mutual labels:  text-generation
Lic2019 Competition
2019 Language and Intelligence Challenge: knowledge-graph-based proactive conversation
Stars: ✭ 109 (-26.35%)
Mutual labels:  knowledge-graph
Workbase
Grakn Workbase (Knowledge IDE)
Stars: ✭ 106 (-28.38%)
Mutual labels:  knowledge-graph
Delta
DELTA is a deep learning based natural language and speech processing platform.
Stars: ✭ 1,479 (+899.32%)
Mutual labels:  text-generation
Guyu
pre-training and fine-tuning framework for text generation
Stars: ✭ 144 (-2.7%)
Mutual labels:  text-generation
Piggydb
Piggydb is a Web notebook application that provides you with a platform to build your knowledge personally or collaboratively.
Stars: ✭ 130 (-12.16%)
Mutual labels:  knowledge-graph
Pytextrank
Python implementation of TextRank for phrase extraction and summarization of text documents
Stars: ✭ 1,675 (+1031.76%)
Mutual labels:  knowledge-graph
Ampligraph
Python library for Representation Learning on Knowledge Graphs https://docs.ampligraph.org
Stars: ✭ 1,662 (+1022.97%)
Mutual labels:  knowledge-graph
Cluedatasetsearch
Search all Chinese NLP datasets, with commonly used English NLP datasets included
Stars: ✭ 2,112 (+1327.03%)
Mutual labels:  knowledge-graph
Kadot
Kadot, the unsupervised natural language processing library.
Stars: ✭ 108 (-27.03%)
Mutual labels:  text-generation
Renku
The Renku Project provides a platform and tools for reproducible and collaborative data analysis.
Stars: ✭ 141 (-4.73%)
Mutual labels:  knowledge-graph
Web Client
Generic Linked Data browser and UX component framework. Apache license.
Stars: ✭ 105 (-29.05%)
Mutual labels:  knowledge-graph
Datasets knowledge embedding
Datasets for Knowledge Graph Completion with textual information about the entities
Stars: ✭ 116 (-21.62%)
Mutual labels:  knowledge-graph
Theographic Bible Metadata
A knowledge graph of biblical people, places, periods, and passages.
Stars: ✭ 131 (-11.49%)
Mutual labels:  knowledge-graph
Capse
A Capsule Network-based Embedding Model for Knowledge Graph Completion and Search Personalization (NAACL 2019)
Stars: ✭ 114 (-22.97%)
Mutual labels:  knowledge-graph
Onepiece Kg
a knowledge graph project for ONE PIECE (《海贼王》)
Stars: ✭ 123 (-16.89%)
Mutual labels:  knowledge-graph

KOBE

Project | arXiv

Towards KnOwledge-Based pErsonalized Product Description Generation in E-commerce.
Qibin Chen*, Junyang Lin*, Yichang Zhang, Hongxia Yang, Jingren Zhou, Jie Tang.
*Equal contribution.
In KDD 2019 (Applied Data Science Track)

Prerequisites

  • Linux or macOS
  • Python 3.6
  • PyTorch 1.0.1
  • NVIDIA GPU + CUDA + cuDNN

Getting Started

Installation

Clone this repo.

git clone https://github.com/THUDM/KOBE
cd KOBE

Install the dependencies by running

pip install -r requirements.txt
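
After installing, you may want to confirm that the environment matches the prerequisites above. The following snippet is an optional, illustrative check and is not part of the repository:

# optional environment check -- illustrative only
import torch

print("PyTorch version:", torch.__version__)          # expected: 1.0.1
print("CUDA available: ", torch.cuda.is_available())  # expected: True on a GPU machine
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))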

Dataset

  • We use the TaoDescribe dataset, which contains 2,129,187 product titles and descriptions in Chinese.
  • (Optional) You can download the raw, unpreprocessed dataset from here or here (for users in China).

Training

Download preprocessed data

  • First, download the preprocessed TaoDescribe dataset by running python scripts/download_preprocessed_tao.py.
    • If you're in a region where Dropbox is blocked (e.g. mainland China), try python scripts/download_preprocessed_tao.py --cn instead.
  • (Optional) You can peek into data/aspect-user/preprocessed/test.src.str and data/aspect-user/preprocessed/test.tgt.str, which contain the product titles and descriptions in the test set, respectively. In the src files, <x> <y> means the product is intended to be shown with aspect <x> and user category <y>. Note: this differs slightly from the <A-1>, <U-1> format described in the paper, but they are essentially the same thing. You can also peek into data/aspect-user/preprocessed/test.supporting_facts_str to see the knowledge extracted from DBpedia for the corresponding products. A small inspection sketch is given below.
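
The following is a minimal, illustrative sketch for peeking at the preprocessed test set. It is not part of the repository; it assumes each line of test.src.str begins with the aspect token <x> and the user-category token <y>, followed by the title tokens, and that the src and tgt files are line-aligned.

# inspect_preprocessed.py -- illustrative sketch, not part of the repository
src_path = "data/aspect-user/preprocessed/test.src.str"
tgt_path = "data/aspect-user/preprocessed/test.tgt.str"

with open(src_path, encoding="utf-8") as f_src, open(tgt_path, encoding="utf-8") as f_tgt:
    for i, (src, tgt) in enumerate(zip(f_src, f_tgt)):
        aspect, user, *title = src.split()   # assumes the two control tokens lead each line
        print("aspect:", aspect, "| user category:", user)
        print("title:", " ".join(title))
        print("description:", tgt.strip())
        if i == 2:   # only look at the first few examples
            break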

Start training

  • Different configurations for the models in the paper are stored under the configs/ directory. Launch a specific experiment with --config, which specifies the path to the desired model config, and --expname, which sets the name of the experiment used in logging.

  • We include three config files: the baseline, KOBE without external knowledge, and the full KOBE model.

  • Baseline

python core/train.py --config configs/baseline.yaml --expname baseline
  • KOBE without external knowledge
python core/train.py --config configs/aspect_user.yaml --expname aspect-user
  • KOBE
python core/train.py --config configs/aspect_user_knowledge.yaml --expname aspect-user-knowledge

The default batch size is 64. If you run into out-of-memory (OOM) errors, try decreasing it with the --batch-size flag.
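
For example, to train the baseline with a smaller batch size:

python core/train.py --config configs/baseline.yaml --expname baseline --batch-size 32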

Track training progress

  • You can monitor training with TensorBoard. Training takes roughly 12 hours to finish. To get results comparable to those in the paper, you need to train for even longer (by editing epoch in the config files); however, the current setting is enough to demonstrate the effectiveness of our model.
tensorboard --logdir experiments --port 6006

Generation

  • During training, the generated descriptions for the test set are saved to experiments/<expname>/candidate.txt, and the ground truth is saved to reference.txt. These outputs are produced by greedy search to save time during training and do not block repeated terms.
  • To run beam search with a beam width of 10, use the following command.
python core/train.py --config configs/baseline.yaml --mode eval --restore experiments/finals-baseline/checkpoint.pt --expname eval-baseline --beam-size 10

Evaluation

  • BLEU
  • DIVERSITY
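
As a rough, illustrative sanity check (not the evaluation used in the paper), you can compute corpus BLEU over the generated candidate.txt and the ground-truth reference.txt with NLTK; the experiment directory below follows the --expname used during training and should be replaced with your own.

# rough BLEU sanity check -- illustrative only, not the official evaluation
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

exp_dir = "experiments/baseline"   # replace with your own experiment name
with open(exp_dir + "/candidate.txt", encoding="utf-8") as f:
    candidates = [line.split() for line in f]
with open(exp_dir + "/reference.txt", encoding="utf-8") as f:
    references = [[line.split()] for line in f]   # one reference per candidate

smooth = SmoothingFunction().method1
print("corpus BLEU:", corpus_bleu(references, candidates, smoothing_function=smooth))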

If you have ANY difficulty getting the above steps to work, feel free to open an issue. You can expect a reply within 24 hours.

Cite

Please cite our paper if you use this code in your own work:

@article{chen2019towards,
  title={Towards Knowledge-Based Personalized Product Description Generation in E-commerce},
  author={Chen, Qibin and Lin, Junyang and Zhang, Yichang and Yang, Hongxia and Zhou, Jingren and Tang, Jie},
  journal={arXiv preprint arXiv:1903.12457},
  year={2019}
}