wb14123 / Couplet Dataset

Licence: agpl-3.0

Dataset for couplets: a database of 700,000 couplets.

Couplet dataset.

This project fetches couplets from the blog 冯重朴_梨味斋散叶_的博客.

The dataset contains more than 700,000 couplets.

Run the spider:

scrapy runspider sina_spider.py

The spider stores the fetched data in ./output/.

Download the data

A pre-fetched, cleaned dataset is also available and can be used directly with the seq2seq model. You can download it here.

The downloaded data contains 5 files:

train/in.txt: The inputs of the couplets. Each line is one input, with words separated by spaces.
train/out.txt: The outputs of the couplets. Each line is the output for the same line in train/in.txt, with words separated by spaces.
test/in.txt: Same format as train/in.txt, but with less data.
test/out.txt: Same format as train/out.txt, but with less data.
vocabs: The vocabulary file. <s> and <\s> are added as the first entries, for use when training the seq2seq model.
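
As a rough illustration of the file layout described above, the following sketch reads the paired in.txt/out.txt files and the vocabs file. The function names `load_pairs` and `load_vocab` are hypothetical and not part of this project. The sketch also assumes that the files are UTF-8 encoded and that vocabs holds one entry per line, neither of which is stated above.

```python
def load_pairs(in_path, out_path):
    """Return a list of (input_tokens, output_tokens) tuples.

    Line i of in_path is paired with line i of out_path, and each line
    is split on whitespace. UTF-8 encoding is an assumption.
    """
    with open(in_path, encoding="utf-8") as f_in, \
         open(out_path, encoding="utf-8") as f_out:
        # zip stops at the shorter file, so a length mismatch is silently truncated.
        return [(a.split(), b.split()) for a, b in zip(f_in, f_out)]


def load_vocab(vocab_path):
    """Map each vocabulary entry to its 0-based line index.

    Assumes one entry per line, which the description above does not confirm.
    """
    with open(vocab_path, encoding="utf-8") as f:
        return {line.rstrip("\n"): index for index, line in enumerate(f)}
```

Assuming the download is extracted into the current directory, `load_pairs("train/in.txt", "train/out.txt")` would return the training pairs and `load_vocab("vocabs")` the token-to-index mapping.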
