FeiSun / Producttitlesummarizationcorpus

Dataset for CIKM 2018 paper "Multi-Source Pointer Network for Product Title Summarization"

Product Title Summarization (PTS) Corpus

Description

Each line in corpus.txt describes one product: a pair of titles (original title, short title), the brand, and the commodity name. The four fields are separated by two consecutive tab characters, in the following format:

<original title>\t\t<short title>\t\t<brand>\t\t<commodity name>
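As a minimal sketch of reading this format in Python, a line can be split on the double-tab separator into its four fields. The names `TitlePair` and `parse_line` are hypothetical (they are not part of the dataset), and the sample line is invented for illustration only.

```python
from typing import NamedTuple


class TitlePair(NamedTuple):
    """One corpus record; the field names are illustrative, not from the dataset."""
    original_title: str
    short_title: str
    brand: str
    commodity_name: str


def parse_line(line: str) -> TitlePair:
    """Split one corpus line on the two-tab separator into its four fields."""
    fields = line.rstrip("\r\n").split("\t\t")
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    return TitlePair(*fields)


# Invented sample line, used only to show the layout of a record.
sample = "Example long product title\t\tShort title\t\tNintendo/任天堂\t\tconsole\n"
record = parse_line(sample)
print(record.short_title)
```

To process the whole file, the same function can be applied to each line of corpus.txt, opened with an explicit encoding (UTF-8 is an assumption here, since the README does not state the encoding).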

Files

  • corpus: the dataset used in the CIKM 2018 paper; every short title has length < 11.

  • big_corpus: a much larger dataset; every short title has length < 13.

    The file is split into 5 parts with the prefix big_corpus.tar.gz_ because github.com limits each file to less than 100 MB.

    To reconstruct the big_corpus file:

    cd big_corpus
    cat big_corpus.tar.gz_* > big_corpus.tar.gz
    tar zxvf big_corpus.tar.gz
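Where `cat` and `tar` are not available, the same reconstruction can be sketched with the Python standard library. The function name `rebuild_big_corpus` is hypothetical, and the sketch assumes the part suffixes sort lexicographically into the correct order, as they do for the shell glob above.

```python
import shutil
import tarfile
from pathlib import Path


def rebuild_big_corpus(directory="big_corpus"):
    """Concatenate the big_corpus.tar.gz_* parts and extract the archive."""
    directory = Path(directory)
    archive = directory / "big_corpus.tar.gz"
    # The trailing underscore keeps the rebuilt archive itself out of the match.
    parts = sorted(directory.glob("big_corpus.tar.gz_*"))
    if not parts:
        raise FileNotFoundError(f"no big_corpus.tar.gz_* parts in {directory}")
    # Equivalent of: cat big_corpus.tar.gz_* > big_corpus.tar.gz
    with open(archive, "wb") as out:
        for part in parts:
            with open(part, "rb") as chunk:
                shutil.copyfileobj(chunk, out)
    # Equivalent of: tar zxvf big_corpus.tar.gz
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(directory)
    return archive
```

Calling `rebuild_big_corpus("big_corpus")` from the repository root would write big_corpus.tar.gz into that directory and extract its contents next to the parts.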
    

Note:

For some products, the brand field contains the brand name in several languages, separated by “/”, e.g., Nintendo/任天堂.
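A short sketch of handling such brand values in Python: split the field on "/" to get the individual language versions. The helper name `split_brand` is hypothetical, and the sketch assumes "/" never occurs inside a single brand name, which the README does not guarantee.

```python
from typing import List


def split_brand(brand: str) -> List[str]:
    """Return the language versions of a brand field, dropping empty pieces."""
    return [name.strip() for name in brand.split("/") if name.strip()]


print(split_brand("Nintendo/任天堂"))  # ['Nintendo', '任天堂']
```

A brand with a single language version comes back as a one-element list, so downstream code can treat both cases the same way.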

Citation

@inproceedings{Sun:CIKM2018,
author = {Fei Sun and Peng Jiang and Hanxiao Sun and Changhua Pei and Wenwu Ou and Xiaobo Wang},
title = {{Multi-Source Pointer Network for Product Title Summarization}},
booktitle = {CIKM},
year = 2018
}