A list of ~100,000 German nouns and their grammatical properties, compiled from WiktionaryDE as a CSV file, plus a module to look up the data and parse compound words.

Stars: ✭ 101 (+172.97%)

Mutual labels: corpus

CNN-Sentence-Classification

A TensorFlow implementation of Convolutional Neural Networks for Sentence Classification

Stars: ✭ 77 (+108.11%)

Mutual labels: sentence-classification

Dialogue-Corpus

No description or website provided.

Stars: ✭ 27 (-27.03%)

Mutual labels: corpus

mev-corpus

MEV Data Corpus

Stars: ✭ 77 (+108.11%)

Mutual labels: corpus

Awesome Deeplearning Resources

Deep learning and deep reinforcement learning research papers and some code

Stars: ✭ 2,483 (+6610.81%)

Mutual labels: corpus

thai-language

Computer tools for the Thai language

Stars: ✭ 20 (-45.95%)

Mutual labels: corpus

Nlvr

Cornell NLVR and NLVR2 are natural language grounding datasets. Each example shows a visual input and a sentence describing it, and is annotated with the truth-value of the sentence.

Stars: ✭ 192 (+418.92%)

Mutual labels: corpus

When-in-Rome

A meta-corpus of functional harmonic analysis.

Stars: ✭ 35 (-5.41%)

Mutual labels: corpus

Nlp bahasa resources

A curated list of datasets and usable library resources for NLP in Bahasa Indonesia

Stars: ✭ 158 (+327.03%)

Mutual labels: corpus

pdf-corpus

Python script to quickly create hand-crafted PDF files

Stars: ✭ 17 (-54.05%)

Mutual labels: corpus

Wp2txt

WP2TXT extracts plain text from a Wikipedia dump file (encoded in XML and compressed with Bzip2), stripping all MediaWiki markup and other metadata.

Stars: ✭ 145 (+291.89%)

Mutual labels: corpus

malay-dataset

Text corpus for Bahasa Malaysia, https://malaya.readthedocs.io/en/latest/Dataset.html

Stars: ✭ 189 (+410.81%)

Mutual labels: corpus

Prosody

Helsinki Prosody Corpus and a system for predicting prosodic prominence from text

Stars: ✭ 139 (+275.68%)

Mutual labels: corpus

KAREN

KAREN: Unifying Hatespeech Detection and Benchmarking

Stars: ✭ 18 (-51.35%)

Mutual labels: sentence-classification

Code Docstring Corpus

Preprocessed Python functions and docstrings for automated code documentation (code2doc) and automated code generation (doc2code) tasks.

Stars: ✭ 137 (+270.27%)

Mutual labels: corpus

gum

Repository for the Georgetown University Multilayer Corpus (GUM)

Stars: ✭ 71 (+91.89%)

Mutual labels: corpus

Khcoder

KH Coder: for Quantitative Content Analysis or Text Mining

Stars: ✭ 126 (+240.54%)

Mutual labels: corpus

egret-wenda-corpus

A Public Corpus for Machine Learning

Stars: ✭ 41 (+10.81%)

Mutual labels: corpus

ocr2text

Convert a PDF via OCR to a TXT file in UTF-8 encoding

Stars: ✭ 90 (+143.24%)

Mutual labels: corpus

folia

FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for proces…

Stars: ✭ 56 (+51.35%)

Mutual labels: corpus

NSP-BERT

The code for our paper "NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original Pre-training Task —— Next Sentence Prediction"

Stars: ✭ 166 (+348.65%)

Mutual labels: sentence-classification

KWDLC

Kyoto University Web Document Leads Corpus

Stars: ✭ 64 (+72.97%)

Mutual labels: corpus

jrte-corpus

Japanese Realistic Textual Entailment Corpus (NLP 2020, LREC 2020)