Cornell NLVR and NLVR2 are natural language grounding datasets. Each example shows a visual input and a sentence describing it, and is annotated with the truth-value of the sentence.

Stars: ✭ 192 (-28.89%)

Mutual labels: corpus

folia

FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for proces…

Stars: ✭ 56 (-79.26%)

Mutual labels: corpus

textbox

Text collections made available by the CLiGS group.

Stars: ✭ 19 (-92.96%)

Mutual labels: corpus

DeepSentiPers

Repository for the experiments described in the paper named "DeepSentiPers: Novel Deep Learning Models Trained Over Proposed Augmented Persian Sentiment Corpus"

Stars: ✭ 17 (-93.7%)

Mutual labels: corpus

nytwit

New York Times Word Innovation Types dataset

Stars: ✭ 21 (-92.22%)

Mutual labels: corpus

KWDLC

Kyoto University Web Document Leads Corpus

Stars: ✭ 64 (-76.3%)

Mutual labels: corpus

trafilatura

Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

Stars: ✭ 711 (+163.33%)

Mutual labels: corpus

wordfish-python

extract relationships from standardized terms from corpus of interest with deep learning 🐟

Stars: ✭ 19 (-92.96%)

Mutual labels: corpus

proiel-treebank

Official releases of the PROIEL treebank of ancient Indo-European languages

Stars: ✭ 30 (-88.89%)

Mutual labels: corpus

pdf-corpus

Python script to quickly create hand-crafted PDF files

Stars: ✭ 17 (-93.7%)

Mutual labels: corpus

german-nouns

A list of ~100,000 German nouns and their grammatical properties compiled from WiktionaryDE as CSV file. Plus a module to look up the data and parse compound words.

Stars: ✭ 101 (-62.59%)

Mutual labels: corpus

SpiCE-Corpus

An open-access corpus of conversational bilingual speech in Cantonese and English

Stars: ✭ 33 (-87.78%)

Mutual labels: corpus

Awesome Deeplearning Resources

Deep learning and deep reinforcement learning research papers and some code

Stars: ✭ 2,483 (+819.63%)

Mutual labels: corpus

TV4Dialog

No description or website provided.

Stars: ✭ 33 (-87.78%)

Mutual labels: corpus

kanji-frequency

Kanji usage frequency data collected from various sources

Stars: ✭ 92 (-65.93%)

Mutual labels: corpus

Efaqa Corpus Zh

❤️Emotional First Aid Dataset: a corpus of psychological counseling Q&A and chatbot dialogue

Stars: ✭ 170 (-37.04%)

Mutual labels: corpus

PubMed-PICO-Detection

PubMed PICO Element Detection Dataset

Stars: ✭ 37 (-86.3%)

Mutual labels: corpus

mev-corpus

MEV Data Corpus

Stars: ✭ 77 (-71.48%)

Mutual labels: corpus

Species-Names-Corpus

Species names corpus. Plant names, animal names.

Stars: ✭ 23 (-91.48%)

Mutual labels: corpus

When-in-Rome

A meta-corpus of functional harmonic analysis.

Stars: ✭ 35 (-87.04%)

Mutual labels: corpus

thai-language

Computer tools for the Thai language

Stars: ✭ 20 (-92.59%)

Mutual labels: corpus

malay-dataset

Text corpus for Bahasa Malaysia, https://malaya.readthedocs.io/en/latest/Dataset.html

Stars: ✭ 189 (-30%)

Mutual labels: corpus

EdgarAllanPoetry

Computer-generated poetry

Stars: ✭ 22 (-91.85%)

Mutual labels: corpus

gum

Repository for the Georgetown University Multilayer Corpus (GUM)

Stars: ✭ 71 (-73.7%)

Mutual labels: corpus

cljs-corpus

A greppable archive of ClojureScript code

Stars: ✭ 37 (-86.3%)

Mutual labels: corpus

ocr2text

Convert a PDF via OCR to a TXT file in UTF-8 encoding

Stars: ✭ 90 (-66.67%)

Mutual labels: corpus

dialogue-datasets

A collection of open dialogue corpora and some useful data-processing utilities.

Stars: ✭ 24 (-91.11%)

Mutual labels: corpus

opensource-voice-tools

A repo listing known open source voice tools, ordered by where they sit in the voice stack

Stars: ✭ 21 (-92.22%)

Mutual labels: corpus

bible-corpus

A multilingual parallel corpus created from translations of the Bible.

Stars: ✭ 115 (-57.41%)

Mutual labels: corpus

Chatbot-Training-Corpus

A summary of text corpora that can be used for chatbot training, covering both Chinese and English

Stars: ✭ 117 (-56.67%)

Mutual labels: corpus

Medical-Names-Corpus

Medical corpus. Corpus of medical institution names. National drug standard codes.

Stars: ✭ 26 (-90.37%)

Mutual labels: corpus

Speech-Corpus-Collection

A collection of speech corpora for ASR and TTS

Stars: ✭ 113 (-58.15%)

Mutual labels: corpus

PoetryCorpus

Poetic corpus of the Russian language

Stars: ✭ 40 (-85.19%)

Mutual labels: corpus

DANeS

DANeS is an open-source E-newspaper dataset by collaboration between DATASET JSC (dataset.vn) and AIV Group (aivgroup.vn)

Stars: ✭ 64 (-76.3%)

Mutual labels: corpus

fuzzing-corpus

My fuzzing corpus

Stars: ✭ 120 (-55.56%)

Mutual labels: corpus

rclc

Rich Context leaderboard competition, including the corpus and current SOTA for required tasks.

Stars: ✭ 20 (-92.59%)

Mutual labels: corpus

CBLUE

CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark for Chinese medical information processing

Stars: ✭ 379 (+40.37%)

Mutual labels: corpus

megs

A merged version of multiple open-source German speech datasets.

Stars: ✭ 21 (-92.22%)

Mutual labels: corpus

fastmorph

Fast corpus search engine originally made for the Corpus of Written Tatar language

Stars: ✭ 14 (-94.81%)

Mutual labels: corpus

Chinese Names Corpus

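Chinese names corpus and name generator. Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, English names. Useful for Chinese word segmentation and person-name entity recognition.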

Stars: ✭ 3,053 (+1030.74%)

Mutual labels: corpus

jrte-corpus

Japanese Realistic Textual Entailment Corpus (NLP 2020, LREC 2020)

Stars: ✭ 66 (-75.56%)

Mutual labels: corpus

Weibo terminater

The final Weibo crawler: scrape anything from Weibo, including comments, post contents, and followers. The Terminator

Stars: ✭ 2,295 (+750%)

Mutual labels: corpus

OpenDialog

An Open-Source Package for Chinese Open-domain Conversational Chatbot (a Chinese chit-chat dialogue system with one-click deployment of a WeChat chatbot)

Stars: ✭ 94 (-65.19%)

Mutual labels: corpus

LanguageCodes

We present a list of languages with their codes, families, regions, etc. We also present a list of multilingual corpora (with URLs).

Stars: ✭ 70 (-74.07%)

Mutual labels: corpus

Fakenewscorpus

A dataset of millions of news articles scraped from a curated list of data sources.

Stars: ✭ 255 (-5.56%)

Mutual labels: corpus

Indian ParallelCorpus

Curated list of publicly available parallel corpora for Indian languages

Stars: ✭ 23 (-91.48%)

Mutual labels: corpus

open-discourse

Open Discourse is the first fully comprehensive corpus of the plenary proceedings of the federal German Parliament (Bundestag).

Stars: ✭ 47 (-82.59%)

Mutual labels: corpus

OneStopEnglishCorpus

No description or website provided.

Stars: ✭ 38 (-85.93%)

Mutual labels: corpus

text-classification-cn

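Chinese text classification in practice, based on the Sogou news corpus, using traditional machine learning methods as well as pre-trained models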

Stars: ✭ 81 (-70%)

Mutual labels: corpus

1-60 of 106 similar projects