Weibo terminater
The final Weibo crawler: scrape anything from Weibo, including comments, post contents, and followers. The Terminator.
Stars: ✭ 2,295 (+6854.55%)
Clue
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Stars: ✭ 2,425 (+7248.48%)
OpenDialog
An open-source package for Chinese open-domain conversational chatbots (a Chinese chit-chat dialogue system with one-click deployment of a WeChat chit-chat bot)
Stars: ✭ 94 (+184.85%)
Datasets
Poetry-related datasets developed by the THUAIPoet (Jiuge) group.
Stars: ✭ 111 (+236.36%)
CBLUE
Chinese medical information processing benchmark. CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark
Stars: ✭ 379 (+1048.48%)
Nlp chinese corpus
Large-scale Chinese corpus for NLP
Stars: ✭ 6,656 (+20069.7%)
dialogue-datasets
A collection of open dialogue corpora and some useful data-processing utilities.
Stars: ✭ 24 (-27.27%)
Cluecorpus2020
Large-scale pre-training corpus for Chinese (100 GB)
Stars: ✭ 278 (+742.42%)
thaigov-corpus
A project to collect news from Thai government websites
Stars: ✭ 19 (-42.42%)
SequenceToSequence
A seq2seq dialogue/MT model with attention, implemented in TensorFlow.
Stars: ✭ 11 (-66.67%)
Chi-Wiki
A programmer who is not good at Chinese is not an advanced middle school student.
Stars: ✭ 18 (-45.45%)
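ansj seg
Ansj word segmentation: a true Java implementation of ICT. Both segmentation quality and speed exceed the open-source version of ICT. Chinese word segmentation, person-name recognition, part-of-speech tagging, and user-defined dictionaries.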
Stars: ✭ 6,213 (+18727.27%)
weapp-poem
诗词墨客: the most comprehensive mini program for classical Chinese poetry
Stars: ✭ 409 (+1139.39%)
nytwit
New York Times Word Innovation Types dataset
Stars: ✭ 21 (-36.36%)
DataCLUE
DataCLUE: a data-centric NLP benchmark and toolkit
Stars: ✭ 133 (+303.03%)
Soft-CHS
A place to store some small programs that the author has localized into Chinese. 1.madvr
Stars: ✭ 97 (+193.94%)
kanji-frequency
Kanji usage frequency data collected from various sources
Stars: ✭ 92 (+178.79%)
pinyin4js
An open-source JavaScript library for converting Chinese to pinyin. Stars welcome :P
Stars: ✭ 153 (+363.64%)
babi tools
Augmentation scripts for the bAbI Dialog Tasks dataset
Stars: ✭ 14 (-57.58%)
react-flashcards
A simple React + Firebase flashcard application
Stars: ✭ 29 (-12.12%)
djinni
Chinese documentation for djinni, plus an iOS demo written with djinni that fixes the failure to build on macOS Sierra 10.12
Stars: ✭ 52 (+57.58%)
goSpider
Some small projects and some articles
Stars: ✭ 56 (+69.7%)
unihandecode
unihandecode is a transliteration library that converts all characters/words in Unicode into the ASCII alphabet and is aware of language preference priorities
Stars: ✭ 71 (+115.15%)
ADEM
Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses
Stars: ✭ 25 (-24.24%)
han
Using TensorFlow to train a model to detect miswritten Chinese characters.
Stars: ✭ 12 (-63.64%)
chinese-novel
📙 Chinese novel database: the most complete database of classical Chinese novels.
Stars: ✭ 131 (+296.97%)
gum
Repository for the Georgetown University Multilayer Corpus (GUM)
Stars: ✭ 71 (+115.15%)
chinese-rhymer
A lightweight Chinese rhyming tool with a foolproof command-line interface. Searches for rhymes for Chinese words of 1, 2, 3, and 4 characters (single, double, triple, and quadruple rhymes). Released on PyPI; current version 0.2.6.
Stars: ✭ 72 (+118.18%)
hsk-vocabulary
🇨🇳 Open-source Chinese HSK vocabulary list with example sentences
Stars: ✭ 27 (-18.18%)
shudu
Shudu is an open-source text-processing platform that aims to let readers read and write copy comfortably.
Stars: ✭ 25 (-24.24%)
subed
Subtitle editor for Emacs
Stars: ✭ 143 (+333.33%)
LanguageCodes
A list of languages with their codes, families, regions, etc., plus a list of multilingual corpora (with URLs).
Stars: ✭ 70 (+112.12%)
ocr2text
Convert a PDF via OCR to a TXT file in UTF-8 encoding
Stars: ✭ 90 (+172.73%)
resume
My Chinese and English resumes in LaTeX with Font Awesome 5
Stars: ✭ 296 (+796.97%)
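video-subtitle-extractor
A GUI tool for extracting hard-coded subtitles (hardsubs) from videos and generating srt files. No third-party API is required; text recognition runs locally. A deep-learning-based subtitle extraction framework covering subtitle region detection and subtitle content extraction.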
Stars: ✭ 1,763 (+5242.42%)
lwodf
The Chinese edition of Live Working or Die Fighting: How the Working Class Went Global (劳工的全球化), authored by Paul Mason, translated by the CNPolitics translation team.
Stars: ✭ 25 (-24.24%)
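subox
Subox is an Electron-based desktop application for searching subtitles, built around media files. It indexes all playable files under the configured search directories, honoring ignore paths, and uses file names to index subtitle files or to help search for and download them.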
Stars: ✭ 17 (-48.48%)
iTop-CN
iTop in Chinese
Stars: ✭ 36 (+9.09%)
rasa bot
A compilation: a task-oriented chatbot based on Rasa-NLU and Rasa-Core
Stars: ✭ 51 (+54.55%)
BSD
The Business Scene Dialogue corpus
Stars: ✭ 51 (+54.55%)
youtube tool
Tool for extracting comments or subtitles from YouTube videos
Stars: ✭ 89 (+169.7%)
dialogre
Dialogue-Based Relation Extraction
Stars: ✭ 124 (+275.76%)
libgosubs
Golang library to read and write various subtitle formats
Stars: ✭ 20 (-39.39%)
FCH-TTS
A fast text-to-speech (TTS) model. Works well for English, Mandarin Chinese, Japanese, Korean, Russian, and Tibetan (tested so far).
Stars: ✭ 154 (+366.67%)
fuzzychinese
A small package to fuzzy match Chinese words
Stars: ✭ 50 (+51.52%)