Cornell NLVR and NLVR2 are natural language grounding datasets. Each example shows a visual input and a sentence describing it, and is annotated with the truth-value of the sentence.

Stars: ✭ 192 (+910.53%)

Mutual labels: corpus

tvsub

TVsub: DCU-Tencent Chinese-English Dialogue Corpus

Stars: ✭ 40 (+110.53%)

Mutual labels: corpus

Wp2txt

WP2TXT extracts plain text from a Wikipedia dump file (encoded in XML and compressed with Bzip2), stripping all MediaWiki markup and other metadata.

Stars: ✭ 145 (+663.16%)

Mutual labels: corpus

Code Docstring Corpus

Preprocessed Python functions and docstrings for automated code documentation (code2doc) and automated code generation (doc2code) tasks.

Stars: ✭ 137 (+621.05%)

Mutual labels: corpus

rclc

Rich Context leaderboard competition, including the corpus and current SOTA for required tasks.

Stars: ✭ 20 (+5.26%)

Mutual labels: corpus

workshops

Scholarly Communications Workshops

Stars: ✭ 13 (-31.58%)

Mutual labels: digital-humanities

poesy

Poetic processing, for Python.

Stars: ✭ 28 (+47.37%)

Mutual labels: literary-studies

ocr2text

Convert a PDF via OCR to a TXT file in UTF-8 encoding

Stars: ✭ 90 (+373.68%)

Mutual labels: corpus

Awesome Deeplearning Resources

Deep learning and deep reinforcement learning research papers and some code

Stars: ✭ 2,483 (+12968.42%)

Mutual labels: corpus

evt-viewer

Edition Visualization Technology 2 - development

Stars: ✭ 66 (+247.37%)

Mutual labels: digital-humanities

Nlp bahasa resources

A Curated List of Dataset and Usable Library Resources for NLP in Bahasa Indonesia

Stars: ✭ 158 (+731.58%)

Mutual labels: corpus

gum

Repository for the Georgetown University Multilayer Corpus (GUM)

Stars: ✭ 71 (+273.68%)

Mutual labels: corpus

Prosody

Helsinki Prosody Corpus and A System for Predicting Prosodic Prominence from Text

Stars: ✭ 139 (+631.58%)

Mutual labels: corpus

2018-2019

The GitHub repository containing all the material related to the Computational Thinking and Programming course of the Digital Humanities and Digital Knowledge degree at the University of Bologna (academic year 2018/2019).

Stars: ✭ 29 (+52.63%)

Mutual labels: digital-humanities

Processando-Processing

An effort to translate reference material about Processing into Portuguese, and to port tutorials and other examples to Processing Python Mode.

Stars: ✭ 12 (-36.84%)

Mutual labels: portuguese

Khcoder

KH Coder: for Quantitative Content Analysis or Text Mining

Stars: ✭ 126 (+563.16%)

Mutual labels: corpus

Speech-Corpus-Collection

A Collection of Speech Corpus for ASR and TTS

Stars: ✭ 113 (+494.74%)

Mutual labels: corpus

Dialog corpus

Datasets for training Chinese and English chatbot systems

Stars: ✭ 1,662 (+8647.37%)

Mutual labels: corpus

Sejong Corpus

Korean sejong corpus download and simple analysis

Stars: ✭ 116 (+510.53%)

Mutual labels: corpus

Probabilistic-RNN-DA-Classifier

Probabilistic Dialogue Act Classification for the Switchboard Corpus using an LSTM model

Stars: ✭ 22 (+15.79%)

Mutual labels: corpus

createurstech.fr

The first collaborative, open-source platform listing French-speaking tech content creators.

Stars: ✭ 174 (+815.79%)

Mutual labels: french

german-nouns

A list of ~100,000 German nouns and their grammatical properties compiled from WiktionaryDE as CSV file. Plus a module to look up the data and parse compound words.

Stars: ✭ 101 (+431.58%)

Mutual labels: corpus

verbecc

Complete Conjugation of any Verb using Machine Learning for French, Spanish, Portuguese, Italian and Romanian

Stars: ✭ 45 (+136.84%)

Mutual labels: french

tei-publisher-app

The main TEI Publisher app

Stars: ✭ 50 (+163.16%)

Mutual labels: digital-humanities

OpenConvert

Text conversion tool (from e.g. Word, HTML, txt) to the corpus formats TEI or FoLiA

Stars: ✭ 20 (+5.26%)

Mutual labels: corpus

megs

A merged version of multiple open-source German speech datasets.

Stars: ✭ 21 (+10.53%)

Mutual labels: corpus

ham4corpus

Data from "Hamilton: An American Musical", formatted for reuse. See below for some interesting text analysis basic findings! I am not throwing away my stopword?

Stars: ✭ 53 (+178.95%)

Mutual labels: digital-humanities

Chinese Names Corpus

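Chinese personal names corpus and name generator. Covers Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, and English names. Useful for Chinese word segmentation and person-name entity recognition.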

Stars: ✭ 3,053 (+15968.42%)

Mutual labels: corpus

trafilatura

Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

Stars: ✭ 711 (+3642.11%)

Mutual labels: corpus

Weibo terminater

Final Weibo crawler: scrape anything from Weibo, including comments, post contents, and followers. The Terminator

Stars: ✭ 2,295 (+11978.95%)

Mutual labels: corpus

LangageLinotte

Official source code of the Linotte programming language, a simple French-language programming language created so that children and people without in-depth computing knowledge can learn programming easily.

Stars: ✭ 29 (+52.63%)

Mutual labels: french

Efaqa Corpus Zh

❤️ Emotional First Aid Dataset: a psychological counseling Q&A and chatbot corpus

Stars: ✭ 170 (+794.74%)

Mutual labels: corpus

Chatbot-Training-Corpus

A collection of text corpora that can be used for hands-on chatbot training, covering Chinese, English, and other languages

Stars: ✭ 117 (+515.79%)

Mutual labels: corpus

Indonesian Nlp Resources

Data resources for Indonesian-language NLP

Stars: ✭ 143 (+652.63%)

Mutual labels: corpus

open2ch-dialogue-corpus

A dialogue corpus built by crawling Open2ch (おーぷん2ちゃんねる)

Stars: ✭ 65 (+242.11%)

Mutual labels: corpus

Clue

Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard

Stars: ✭ 2,425 (+12663.16%)

Mutual labels: corpus

awesome-dhtools

Software for humanities scholars using quantitative or computational methods.

Stars: ✭ 72 (+278.95%)

Mutual labels: digital-humanities

Gossiping Chinese Corpus

Chinese question-answer corpus from the PTT Gossiping board

Stars: ✭ 137 (+621.05%)

Mutual labels: corpus

htr-united

Ground Truth Resources for the HTR of patrimonial documents

Stars: ✭ 23 (+21.05%)

Mutual labels: french

Awesome Chatbot

Awesome Chatbot Projects, Corpus, Papers, Tutorials. Chinese Chatbot =>:

Stars: ✭ 1,785 (+9294.74%)

Mutual labels: corpus

Mis-Comandos-Linux

📋 Annotated list of my 💯 favorite commands ⭐ on GNU/Linux 💻

Stars: ✭ 28 (+47.37%)

Mutual labels: spanish

Cluedatasetsearch

Search all Chinese NLP datasets, with commonly used English NLP datasets included

Stars: ✭ 2,112 (+11015.79%)

Mutual labels: corpus

InMangaKindle

Download manga in Spanish in various formats (PNG, PDF, EPUB, MOBI)

Stars: ✭ 43 (+126.32%)

Mutual labels: spanish

Awesome Hungarian Nlp

A curated list of NLP resources for Hungarian

Stars: ✭ 121 (+536.84%)

Mutual labels: corpus

BARIS

Use the French Open Data Portal API features from R

Stars: ✭ 21 (+10.53%)

Mutual labels: french

linguistic-datasets-portuguese

Linguistic Datasets for Portuguese: a list of flexibly licensed linguistic datasets for the Portuguese language, including databases, word lists, synonyms, antonyms, thematic dictionaries, thesauri, linked data, semantics, ontologies, and knowledge representation

Stars: ✭ 46 (+142.11%)

Mutual labels: portuguese

Colibri Core

Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.

Stars: ✭ 112 (+489.47%)

Mutual labels: corpus

proiel-treebank

Official releases of the PROIEL treebank of ancient Indo-European languages

Stars: ✭ 30 (+57.89%)

Mutual labels: corpus

Datasets

Poetry-related datasets developed by THUAIPoet (Jiuge) group.

Stars: ✭ 111 (+484.21%)

Mutual labels: corpus

Ua Gec

UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language