
Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e., patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.

Stars: ✭ 112 (-75.91%)

Mutual labels: corpus, linguistics

Fakenewscorpus

A dataset of millions of news articles scraped from a curated list of data sources.

Stars: ✭ 255 (-45.16%)

Mutual labels: corpus, natural-language-processing

Awesome Hungarian Nlp

A curated list of NLP resources for Hungarian

Stars: ✭ 121 (-73.98%)

Mutual labels: corpus, natural-language-processing

Efaqa Corpus Zh

❤️ Emotional First Aid Dataset: a psychological counseling Q&A and chatbot corpus

Stars: ✭ 170 (-63.44%)

Mutual labels: corpus, natural-language-processing

Typing Assistant

Typing Assistant autocompletes words and suggests predictions for the next word. This makes typing faster and more intelligent, and reduces effort.

Stars: ✭ 32 (-93.12%)

Mutual labels: corpus, natural-language-processing

folia

FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for proces…

Stars: ✭ 56 (-87.96%)

Mutual labels: corpus, linguistics

proiel-treebank

Official releases of the PROIEL treebank of ancient Indo-European languages

Stars: ✭ 30 (-93.55%)

Mutual labels: corpus, linguistics

Ua Gec

UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language

Stars: ✭ 108 (-76.77%)

Mutual labels: corpus, natural-language-processing

Pynlpl

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).

Stars: ✭ 426 (-8.39%)

Mutual labels: natural-language-processing, linguistics

Ja.text8

Japanese text8 corpus for word embedding.

Stars: ✭ 79 (-83.01%)

Mutual labels: corpus, natural-language-processing

Chinese Nlp Corpus

A collection of Chinese NLP corpora

Stars: ✭ 438 (-5.81%)

Mutual labels: corpus, chinese-nlp

Coarij

Corpus of Annual Reports in Japan

Stars: ✭ 55 (-88.17%)

Mutual labels: corpus, natural-language-processing

Nlp bahasa resources

A Curated List of Datasets and Usable Library Resources for NLP in Bahasa Indonesia

Stars: ✭ 158 (-66.02%)

Mutual labels: corpus, natural-language-processing

Nlp chinese corpus

Large Scale Chinese Corpus for NLP

Stars: ✭ 6,656 (+1331.4%)

Mutual labels: corpus, chinese-nlp

Insuranceqa Corpus Zh

🚁 Insurance industry corpus, chatbot

Stars: ✭ 821 (+76.56%)

Mutual labels: corpus, natural-language-processing

Nlvr

Cornell NLVR and NLVR2 are natural language grounding datasets. Each example shows a visual input and a sentence describing it, and is annotated with the truth-value of the sentence.

Stars: ✭ 192 (-58.71%)

Mutual labels: corpus, natural-language-processing

Ltp

Language Technology Platform

Stars: ✭ 3,648 (+684.52%)

Mutual labels: natural-language-processing, chinese-nlp


WeChat Official Account Corpus

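A collection of WeChat official account articles scraped from the web. The HTML has been removed, so only plain text remains. Each line holds one article in JSON format: "name" is the official account's name, "account" is the official account's ID, "title" is the article title, and "content" is the body text.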
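The one-article-per-line layout described above can be read with a few lines of Python. This is only a minimal sketch under stated assumptions: the helper name iter_articles and the sample record are made up for illustration, and the real files may differ in encoding or in additional fields. Only the four field names (name, account, title, content) come from the description.

```python
import io
import json
from typing import Iterable, Iterator


def iter_articles(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one article dict per non-empty line of JSON-lines input.

    Hypothetical helper, not part of the corpus repository.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        yield json.loads(line)


# Made-up sample record that uses the four documented fields.
sample = io.StringIO(
    '{"name": "Example Account", "account": "example_id", '
    '"title": "Example title", "content": "Example body text."}\n'
)

for article in iter_articles(sample):
    # Each record is a plain dict keyed by the documented field names.
    print(article["account"], article["title"], len(article["content"]))
```

To read an extracted data file instead of the in-memory sample, an open file handle (for example one opened with encoding="utf-8", which is an assumption about the files) can be passed to the same function, since iterating over a text file also yields one line at a time.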

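The data is compressed as a split (multi-volume) zip archive with no password. For a preview, see preview.json.

The data is currently about 3 GB and is updated and extended periodically.

Please use it for research purposes only.

If you have questions or special requests, open an Issue directly.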


Like-minded people are welcome to join Xiaobao (校宝) and work on interesting things together! https://www.xiaobaoonline.com/pc/contactjoin


Stars: ✭ 465
