LG-1 / video_music_book_datasets

Licence: MIT license
NLP NER datasets for video/music/book titles in BIO format

Projects that are alternatives of or similar to video music book datasets

Zh Ner Keras
details
Stars: ✭ 252 (+245.21%)
Mutual labels:  ner
KoBERT-NER
NER Task with KoBERT (with Naver NLP Challenge dataset)
Stars: ✭ 76 (+4.11%)
Mutual labels:  ner
neural name tagging
Code for "Reliability-aware Dynamic Feature Composition for Name Tagging" (ACL2019)
Stars: ✭ 39 (-46.58%)
Mutual labels:  ner
Chinese Names Corpus
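Chinese names corpus and name generator: Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, and English names. Usable for Chinese word segmentation and person-name entity recognition.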
Stars: ✭ 3,053 (+4082.19%)
Mutual labels:  ner
Nettychat
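An Android IM library built on Netty + TCP + Protobuf, including Protobuf serialization, TCP packet splitting and sticking handling, long-connection handshake authentication, heartbeat, reconnection after disconnect, message resend, read/write timeouts, offline messages, thread pools, and more.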
Stars: ✭ 1,979 (+2610.96%)
Mutual labels:  bio
ChineseNER
All about Chinese NER
Stars: ✭ 241 (+230.14%)
Mutual labels:  ner
Malaya
Natural Language Toolkit for bahasa Malaysia, https://malaya.readthedocs.io/
Stars: ✭ 239 (+227.4%)
Mutual labels:  ner
datagrand bert
5th-place code for the 2019 Datagrand Cup information extraction competition
Stars: ✭ 20 (-72.6%)
Mutual labels:  ner
discord.bio
🚀 A powerful Node.js wrapper of https://discords.com/bio
Stars: ✭ 15 (-79.45%)
Mutual labels:  bio
PhoNER COVID19
COVID-19 Named Entity Recognition for Vietnamese (NAACL 2021)
Stars: ✭ 55 (-24.66%)
Mutual labels:  ner
ibio
Free bio link generator
Stars: ✭ 46 (-36.99%)
Mutual labels:  bio
Linkees
Awesome Linktree clone made with React ⚛️
Stars: ✭ 68 (-6.85%)
Mutual labels:  bio
sequence tagging
Named Entity Recognition (LSTM + CRF + FastText) with models for [historic] German
Stars: ✭ 25 (-65.75%)
Mutual labels:  ner
Ner Bert Pytorch
PyTorch solution of named entity recognition task Using Google AI's pre-trained BERT model.
Stars: ✭ 249 (+241.1%)
Mutual labels:  ner
extractacy
Spacy pipeline object for extracting values that correspond to a named entity (e.g., birth dates, account numbers, laboratory results)
Stars: ✭ 47 (-35.62%)
Mutual labels:  ner
Pytorch ner bilstm cnn crf
End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF, implemented in PyTorch
Stars: ✭ 249 (+241.1%)
Mutual labels:  ner
NEMO
Neural Modeling for Named Entities and Morphology (Hebrew NER)
Stars: ✭ 25 (-65.75%)
Mutual labels:  ner
NER-and-Linking-of-Ancient-and-Historic-Places
An NER tool for ancient place names based on Pleiades and Spacy.
Stars: ✭ 26 (-64.38%)
Mutual labels:  ner
ai explore
Machine learning and deep learning fundamentals; implementations of recommender-system and NLP algorithms
Stars: ✭ 54 (-26.03%)
Mutual labels:  ner
neuro-comma
🇷🇺 Punctuation restoration production-ready model for Russian language 🇷🇺
Stars: ✭ 46 (-36.99%)
Mutual labels:  ner

video_music_book_datasets

NLP NER datasets for video/music/book titles in BIO format

Introduction

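This is a named-entity recognition dataset in the spirit of the usual person/location/organization datasets. I spent a few days annotating roughly 10,000 sentences that mention videos, music, or books.

The hope is that the data can be used to train NLP models that recognize the titles of videos, music, and books in a sentence.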

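Annotation process:

  • First, about 5,000 sentences were extracted and labeled entirely by hand. A base model was trained on this labeled data and then used to re-examine and correct the annotations.
  • A second model was trained on the corrected data and used to label roughly another 5,000 sentences, which were then manually reviewed and verified.
  • The final dataset contains 9,632 sentences.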

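In principle this is a standard NER task. The difficulties: the same title may refer to a book or to a video (TV series and films are often adapted from novels, and some contexts are about the book while others are about the video), and some sentences are just a long run of titles listed side by side, with little other context to help.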

Examples:

放暑假了,最近剧荒,陈情令也才一个星期更新三次,根本不够看,问问大家有什么好看的电视剧或电影推荐吗?最好是那种搞笑,温暖的那种,日剧也可以,好像道骏枝佑的剧还不错!
label: 陈情令/video

最近有没有好看的电视剧推荐,国内国外的都可以,前两天再追少年派,但剧情走向越来越扯,非常想给编剧寄刀片,现在想看些正常三观的剧,大家有没有推荐哒?
label: 少年派/video

最近有些剧荒啊,有什么好看的电视剧或者电影可以推荐么?我看的也比较杂,权力的游戏,黑色止血钳,最近看的韩剧囚犯医生是大爱啊,类似这种类型的可以给我推荐一些么?
label: 权力的游戏/video	黑色止血钳/video	囚犯医生/video

我个人比较喜欢听古风歌曲,然后呢,我歌单里面可以给你推荐几首,归去来兮琵琶行清明上河图好可以去试着搜索一些古装剧的主题曲或者插曲
label: 归去来兮/music	琵琶行/music	清明上河图好/music

不知道你喜欢什么类型的小说,最近在看十宗罪,悬疑烧脑类的,讲述的是公安部门打击违法犯罪的故事,现在已经出到第六部了,估计够你看一个月了。大冰写的书也可以尝试看一下,文艺小清新类型的
label: 十宗罪/book
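
The `label:` lines in the examples above can be read programmatically. Below is a minimal Python sketch, assuming each label line has the form `label: title/type` with several entries separated by tabs, as displayed here. The function names `parse_labels` and `find_spans` are hypothetical helpers for illustration, not part of the repository, and the sentence is a shortened fragment of the third example.

```python
def parse_labels(label_line):
    """Parse 'label: A/video<TAB>B/video' into [('A', 'video'), ('B', 'video')]."""
    body = label_line.split(":", 1)[1].strip()
    pairs = []
    for item in body.split("\t"):
        item = item.strip()
        if not item:
            continue
        # Split on the last '/' so a title containing '/' is kept intact.
        title, entity_type = item.rsplit("/", 1)
        pairs.append((title, entity_type))
    return pairs


def find_spans(sentence, pairs):
    """Locate the first occurrence of each title.

    Returns (start, end, type) tuples with an exclusive end index.
    Titles that do not occur in the sentence are skipped.
    """
    spans = []
    for title, entity_type in pairs:
        start = sentence.find(title)
        if start != -1:
            spans.append((start, start + len(title), entity_type))
    return spans


# Shortened fragment of one of the example sentences above.
sentence = "我看的也比较杂,权力的游戏,黑色止血钳,最近看的韩剧囚犯医生是大爱啊"
label_line = "label: 权力的游戏/video\t黑色止血钳/video\t囚犯医生/video"

pairs = parse_labels(label_line)
spans = find_spans(sentence, pairs)
for start, end, entity_type in spans:
    print(sentence[start:end], entity_type, (start, end))
```

Matching only the first occurrence is a simplification; a title that appears more than once in a sentence would need a policy for the later mentions.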

The released dataset has been converted to the standard BIO tagging format. Feel free to try it out.

More fine-grained NLP datasets: https://github.com/SimmerChan/corpus
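
To illustrate what BIO tagging means for this kind of data, here is a minimal Python sketch that turns a sentence and its title/type labels into tags and decodes them back. It rests on assumptions that this page does not confirm: one tag per character, tag names of the form `B-book` / `I-book`, and every occurrence of a title being tagged. The actual files in the repository may differ. `to_bio` and `from_bio` are hypothetical names.

```python
def to_bio(sentence, entities):
    """Build character-level BIO tags from a list of (title, type) pairs."""
    tags = ["O"] * len(sentence)
    for title, entity_type in entities:
        start = sentence.find(title)
        while start != -1:
            end = start + len(title)
            # Tag the span only if it is still untagged, so that
            # overlapping titles do not overwrite each other.
            if all(tag == "O" for tag in tags[start:end]):
                tags[start] = f"B-{entity_type}"
                for i in range(start + 1, end):
                    tags[i] = f"I-{entity_type}"
            start = sentence.find(title, end)
    return tags


def from_bio(sentence, tags):
    """Recover (title, type) pairs from character-level BIO tags."""
    entities = []
    start, current = None, None
    # A trailing "O" acts as a sentinel that closes an entity at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and current == tag[2:]
        if start is not None and not continues:
            entities.append((sentence[start:i], current))
            start, current = None, None
        if tag.startswith("B-"):
            start, current = i, tag[2:]
    return entities


# Shortened fragment of the book example.
sentence = "最近在看十宗罪,悬疑烧脑类的"
tags = to_bio(sentence, [("十宗罪", "book")])
for char, tag in zip(sentence, tags):
    print(char, tag)

decoded = from_bio(sentence, tags)
```

Plain string matching cannot settle the book-versus-video ambiguity for a shared title; the type has to come from the human annotation, which is why `to_bio` takes it as input.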

Copyrights & Cite

LG: [email protected]

Blog: https://www.ourantech.club/2019/08/31/029_视频音乐书籍标注数据/

Github: https://github.com/LG-1/video_music_book_datasets
