145 open source projects that are alternatives to or similar to Small Chinese Corpus

Weixin public corpus
WeChat official account corpus
Stars: ✭ 465 (+0.65%)
Mutual labels:  corpus, chinese-nlp
Gossiping Chinese Corpus
Chinese question-and-answer corpus from the PTT Gossiping board
Stars: ✭ 137 (-70.35%)
Mutual labels:  corpus, chinese-nlp
Nlp chinese corpus
Large Scale Chinese Corpus for NLP
Stars: ✭ 6,656 (+1340.69%)
Mutual labels:  corpus, chinese-nlp
Chinese Nlp Corpus
Collections of Chinese NLP corpora
Stars: ✭ 438 (-5.19%)
Mutual labels:  corpus, chinese-nlp
open-discourse
Open Discourse is the first fully comprehensive corpus of the plenary proceedings of the federal German Parliament (Bundestag).
Stars: ✭ 47 (-89.83%)
Mutual labels:  corpus
bert tokenization for java
This is a Java version of the Chinese tokenization described in BERT.
Stars: ✭ 39 (-91.56%)
Mutual labels:  chinese-nlp
PoetryCorpus
Poetic corpus of the Russian language
Stars: ✭ 40 (-91.34%)
Mutual labels:  corpus
jrte-corpus
Japanese Realistic Textual Entailment Corpus (NLP 2020, LREC 2020)
Stars: ✭ 66 (-85.71%)
Mutual labels:  corpus
Cluecorpus2020
Large-scale Pre-training Corpus for Chinese (100G)
Stars: ✭ 278 (-39.83%)
Mutual labels:  corpus
Filipino-Text-Benchmarks
Open-source benchmark datasets and pretrained transformer models in the Filipino language.
Stars: ✭ 22 (-95.24%)
Mutual labels:  corpus
Chinese-automatic-speech-recognition
Chinese speech recognition
Stars: ✭ 147 (-68.18%)
Mutual labels:  chinese-nlp
cljs-corpus
A greppable archive of ClojureScript code
Stars: ✭ 37 (-91.99%)
Mutual labels:  corpus
wordfish-python
Extract relationships from standardized terms in a corpus of interest with deep learning 🐟
Stars: ✭ 19 (-95.89%)
Mutual labels:  corpus
bible-corpus
A multilingual parallel corpus created from translations of the Bible.
Stars: ✭ 115 (-75.11%)
Mutual labels:  corpus
Chineseaddress ocr
OCR for Chinese addresses in photographed documents, implemented using CTPN+CTC+Address Correction.
Stars: ✭ 309 (-33.12%)
Mutual labels:  chinese-nlp
CBLUE
CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark
Stars: ✭ 379 (-17.97%)
Mutual labels:  corpus
DeepSentiPers
Repository for the experiments described in the paper named "DeepSentiPers: Novel Deep Learning Models Trained Over Proposed Augmented Persian Sentiment Corpus"
Stars: ✭ 17 (-96.32%)
Mutual labels:  corpus
berserker
Berserker - BERt chineSE woRd toKenizER
Stars: ✭ 17 (-96.32%)
Mutual labels:  chinese-nlp
Zhparser
zhparser is a PostgreSQL extension for full-text search of Chinese language
Stars: ✭ 418 (-9.52%)
Mutual labels:  chinese-nlp
SpiCE-Corpus
An open-access corpus of conversational bilingual speech in Cantonese and English
Stars: ✭ 33 (-92.86%)
Mutual labels:  corpus
LanguageCodes
We present a list of languages with their codes, families, regions, etc. We also present a list of multilingual corpora (with URLs).
Stars: ✭ 70 (-84.85%)
Mutual labels:  corpus
kanji-frequency
Kanji usage frequency data collected from various sources
Stars: ✭ 92 (-80.09%)
Mutual labels:  corpus
thaigov-corpus
A project collecting news from the Thai government website
Stars: ✭ 19 (-95.89%)
Mutual labels:  corpus
Fakenewscorpus
A dataset of millions of news articles scraped from a curated list of data sources.
Stars: ✭ 255 (-44.81%)
Mutual labels:  corpus
chinese-nlp-ner
A BLSTM-CRF solution for Chinese entity recognition
Stars: ✭ 14 (-96.97%)
Mutual labels:  chinese-nlp
BSD
The Business Scene Dialogue corpus
Stars: ✭ 51 (-88.96%)
Mutual labels:  corpus
named-entity-recognition-template
Build a deep learning model for predicting the named entities from text.
Stars: ✭ 51 (-88.96%)
Mutual labels:  corpus
EdgarAllanPoetry
Computer-generated poetry
Stars: ✭ 22 (-95.24%)
Mutual labels:  corpus
ChineseBert
This is a Chinese BERT model specifically for question answering
Stars: ✭ 24 (-94.81%)
Mutual labels:  chinese-nlp
Ltp
Language Technology Platform
Stars: ✭ 3,648 (+689.61%)
Mutual labels:  chinese-nlp
KWDLC
Kyoto University Web Document Leads Corpus
Stars: ✭ 64 (-86.15%)
Mutual labels:  corpus
fastmorph
Fast corpus search engine originally made for the Corpus of Written Tatar language
Stars: ✭ 14 (-96.97%)
Mutual labels:  corpus
CLUEmotionAnalysis2020
CLUE Emotion Analysis Dataset (fine-grained sentiment analysis dataset)
Stars: ✭ 3 (-99.35%)
Mutual labels:  corpus
Corpora
A collection of small corpuses of interesting data for the creation of bots and similar stuff.
Stars: ✭ 4,293 (+829.22%)
Mutual labels:  corpus
pdf-corpus
Python script to quickly create hand-crafted PDF files
Stars: ✭ 17 (-96.32%)
Mutual labels:  corpus
Species-Names-Corpus
Species names corpus. Plant names, animal names.
Stars: ✭ 23 (-95.02%)
Mutual labels:  corpus
egret-wenda-corpus
A Public Corpus for Machine Learning
Stars: ✭ 41 (-91.13%)
Mutual labels:  corpus
Thulac Java
An Efficient Lexical Analyzer for Chinese
Stars: ✭ 285 (-38.31%)
Mutual labels:  chinese-nlp
THUCKE
THU Chinese Keyphrase Extraction Toolkit
Stars: ✭ 116 (-74.89%)
Mutual labels:  chinese-nlp
dialogue-datasets
Collects open dialog corpora and some useful data processing utilities.
Stars: ✭ 24 (-94.81%)
Mutual labels:  corpus
ltp4j
ltp4j: Language Technology Platform For Java
Stars: ✭ 165 (-64.29%)
Mutual labels:  chinese-nlp
Bookcorpus
Crawl BookCorpus
Stars: ✭ 443 (-4.11%)
Mutual labels:  corpus
TV4Dialog
No description or website provided.
Stars: ✭ 33 (-92.86%)
Mutual labels:  corpus
fuzzing-corpus
My fuzzing corpus
Stars: ✭ 120 (-74.03%)
Mutual labels:  corpus
text-classification-cn
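Chinese text classification in practice, based on the Sogou news corpus, using traditional machine learning methods as well as pretrained models and other methods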
Stars: ✭ 81 (-82.47%)
Mutual labels:  corpus
Korpora
Korean corpus repository
Stars: ✭ 270 (-41.56%)
Mutual labels:  corpus
Electra with tensorflow
This is an implementation of ELECTRA according to the paper "ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators"
Stars: ✭ 13 (-97.19%)
Mutual labels:  chinese-nlp
OpenDialog
An Open-Source Package for Chinese Open-domain Conversational Chatbot (Chinese chit-chat dialogue system; one-click deployment of a WeChat chit-chat bot)
Stars: ✭ 94 (-79.65%)
Mutual labels:  corpus
mev-corpus
MEV Data Corpus
Stars: ✭ 77 (-83.33%)
Mutual labels:  corpus
Wordless
An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
Stars: ✭ 378 (-18.18%)
Mutual labels:  corpus
OneStopEnglishCorpus
No description or website provided.
Stars: ✭ 38 (-91.77%)
Mutual labels:  corpus
When-in-Rome
A meta-corpus of functional harmonic analysis.
Stars: ✭ 35 (-92.42%)
Mutual labels:  corpus
textbox
Text collections made available by the CLiGS group.
Stars: ✭ 19 (-95.89%)
Mutual labels:  corpus
malay-dataset
Text corpus for Bahasa Malaysia, https://malaya.readthedocs.io/en/latest/Dataset.html
Stars: ✭ 189 (-59.09%)
Mutual labels:  corpus
Medical-Names-Corpus
Medical corpus. Corpus of medical institution names. Drug standard codes.
Stars: ✭ 26 (-94.37%)
Mutual labels:  corpus
PubMed-PICO-Detection
PubMed PICO Element Detection Dataset
Stars: ✭ 37 (-91.99%)
Mutual labels:  corpus
open2ch-dialogue-corpus
A dialogue corpus created by crawling Open2ch (おーぷん2ちゃんねる)
Stars: ✭ 65 (-85.93%)
Mutual labels:  corpus
Awesome Persian Nlp Ir
Curated List of Persian Natural Language Processing and Information Retrieval Tools and Resources
Stars: ✭ 460 (-0.43%)
Mutual labels:  corpus
folia
FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for proces…
Stars: ✭ 56 (-87.88%)
Mutual labels:  corpus
gum
Repository for the Georgetown University Multilayer Corpus (GUM)
Stars: ✭ 71 (-84.63%)
Mutual labels:  corpus
1-60 of 145 similar projects