FeiSun / Producttitlesummarizationcorpus
Dataset for CIKM 2018 paper "Multi-Source Pointer Network for Product Title Summarization"
Product Title Summarization(PTS) Corpus
Description
Each line in corpus.txt describes one product: a pair of titles (original title, short title), the brand, and the commodity name. Fields are separated by two consecutive tab characters, in the following format:
<original title>\t\t<short title>\t\t<brand>\t\t<commodity name>
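The format above can be read with a few lines of Python. This is a minimal sketch, not code from the repository: the names `TitleRecord`, `parse_line`, and `read_corpus` are hypothetical, and UTF-8 encoding is an assumption, since the README does not state the file encoding.

```python
from typing import Iterator, NamedTuple


class TitleRecord(NamedTuple):
    """One corpus line; field names are illustrative, not from the repository."""
    original_title: str
    short_title: str
    brand: str
    commodity_name: str


def parse_line(line: str) -> TitleRecord:
    """Split one line on the double-tab delimiter described in the README."""
    fields = line.rstrip("\r\n").split("\t\t")
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    return TitleRecord(*fields)


def read_corpus(path: str, encoding: str = "utf-8") -> Iterator[TitleRecord]:
    """Yield one record per non-empty line (UTF-8 is an assumption)."""
    with open(path, encoding=encoding) as handle:
        for line in handle:
            if line.strip():
                yield parse_line(line)
```

Calling `read_corpus("corpus.txt")` yields records lazily, so the larger big_corpus file would not need to fit in memory at once. Lines that do not split into exactly four fields raise an error instead of being silently skipped.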
File
- corpus: the dataset used in the CIKM 2018 paper; the length of each short title is < 11.
- big_corpus: a much larger dataset; the length of each short title is < 13.
big_corpus is split into 5 files with the prefix big_corpus.tar.gz_ because of the file size limit on github.com (each file must be less than 100 MB). To reconstruct the big_corpus file:
cd big_corpus
cat big_corpus.tar.gz_* > big_corpus.tar.gz
tar zxvf big_corpus.tar.gz
Note: brand may contain multi-language versions (separated by "/") for some products, e.g., Nintendo/任天堂.
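A brand field with several language versions can be split on "/" as sketched below. The helper name `split_brand` is hypothetical, and the sketch assumes that "/" never appears inside a single brand name, which the README does not guarantee.

```python
from typing import List


def split_brand(brand: str) -> List[str]:
    """Return the language variants of a brand field.

    Assumes "/" is used only as the separator between variants;
    a brand whose name itself contains "/" would be split incorrectly.
    """
    return [part.strip() for part in brand.split("/") if part.strip()]
```

For example, `split_brand("Nintendo/任天堂")` gives the English and Chinese names as two separate strings, while a brand without "/" comes back as a single-element list.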
Citation
@inproceedings{Sun:CIKM2018,
author = {Fei Sun and Peng Jiang and Hanxiao Sun and Changhua Pei and Wenwu Ou and Xiaobo Wang},
title = {{Multi-Source Pointer Network for Product Title Summarization}},
booktitle = {CIKM},
year = 2018
}