wb14123 / Couplet Dataset
Licence: agpl-3.0
Dataset for couplets. A database of 700,000 couplets.
Stars: ✭ 589
Programming Languages
python
Projects that are alternatives or similar to Couplet Dataset
Mongodb Json Files
📦 A curated list of JSON / BSON datasets from the web in order to practice / use in MongoDB
Stars: ✭ 456 (-22.58%)
Mutual labels: dataset
Tensorflow object tracking video
Object Tracking in Tensorflow (Localization, Detection, Classification), developed to participate in the ImageNET VID competition
Stars: ✭ 491 (-16.64%)
Mutual labels: dataset
Cdap
An open source framework for building data analytic applications.
Stars: ✭ 509 (-13.58%)
Mutual labels: dataset
Io
Dataset, streaming, and file system extensions maintained by TensorFlow SIG-IO
Stars: ✭ 427 (-27.5%)
Mutual labels: dataset
Doccano
Open source annotation tool for machine learning practitioners.
Stars: ✭ 5,600 (+850.76%)
Mutual labels: dataset
Awesome Twitter Data
A list of Twitter datasets and related resources.
Stars: ✭ 533 (-9.51%)
Mutual labels: dataset
Lidar Bonnetal
Semantic and Instance Segmentation of LiDAR point clouds for autonomous driving
Stars: ✭ 465 (-21.05%)
Mutual labels: dataset
Hate Speech And Offensive Language
Repository for the paper "Automated Hate Speech Detection and the Problem of Offensive Language", ICWSM 2017
Stars: ✭ 543 (-7.81%)
Mutual labels: dataset
Quickdraw Dataset
Documentation on how to access and use the Quick, Draw! Dataset.
Stars: ✭ 4,622 (+684.72%)
Mutual labels: dataset
Voice datasets
🔊 A comprehensive list of open-source datasets for voice and sound computing (50+ datasets).
Stars: ✭ 494 (-16.13%)
Mutual labels: dataset
Cvat
Powerful and efficient Computer Vision Annotation Tool (CVAT)
Stars: ✭ 6,557 (+1013.24%)
Mutual labels: dataset
Total Text Dataset
Total Text Dataset. It consists of 1555 images with more than 3 different text orientations: Horizontal, Multi-Oriented, and Curved, one of a kind.
Stars: ✭ 580 (-1.53%)
Mutual labels: dataset
Couplet dataset.
This project fetches couplets from the blog 冯重朴_梨味斋散叶_的博客.
This dataset contains more than 700,000 couplets.
Run the spider:
scrapy runspider sina_spider.py
It stores the data in ./output/.
Download the data
An already fetched and cleaned dataset is available and can be used directly with the seq2seq model. You can download it here.
The downloaded data contains 5 files:
- train/in.txt: The inputs of the couplets. Each line is one input, with words separated by spaces.
- train/out.txt: The outputs of the couplets. Each line is the output for the same line in train/in.txt, with words separated by spaces.
- test/in.txt: Same as train/in.txt but with less data.
- test/out.txt: Same as train/out.txt but with less data.
- vocabs: The vocabulary file. <s> and <\s> are added as the first entries; they are used when training the seq2seq model.
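The file layout above can be read into aligned input/output pairs with a few lines of Python. This is a minimal sketch, not part of the project: the helper names `read_lines`, `load_pairs` and `load_vocab` are hypothetical, and it assumes the files are UTF-8 encoded and that the vocabs file lists one entry per line.

```python
from pathlib import Path


def read_lines(path):
    """Return one list of space-separated words per line of the file."""
    # Assumption: the dataset files are UTF-8 encoded.
    with open(path, encoding="utf-8") as f:
        return [line.split() for line in f]


def load_pairs(data_dir, split="train"):
    """Pair line i of <split>/in.txt with line i of <split>/out.txt."""
    base = Path(data_dir) / split
    inputs = read_lines(base / "in.txt")
    outputs = read_lines(base / "out.txt")
    # The two files are described as line-aligned, so their lengths must match.
    if len(inputs) != len(outputs):
        raise ValueError(
            f"{split}: {len(inputs)} inputs but {len(outputs)} outputs"
        )
    return list(zip(inputs, outputs))


def load_vocab(path):
    """Map each vocabulary entry to its line index.

    Assumption: the vocabs file holds one entry per line.
    """
    with open(path, encoding="utf-8") as f:
        entries = [line.rstrip("\n") for line in f]
    return {entry: index for index, entry in enumerate(entries) if entry}
```

For example, with the archive unpacked into a directory named couplet (a hypothetical path), `load_pairs("couplet", "test")` would return the test pairs and `load_vocab("couplet/vocabs")` the word-to-index mapping. Lines are never skipped while reading, so the pairing stays aligned by line number.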