number-to-wordsconvert number into words (english, french, italian, roman, spanish, portuguese, belgium, dutch, swedish, polish, russian, iranian, roman, aegean)
Stars: ✭ 53 (+178.95%)
pH7-Internationalization🎌 pH7CMS Internationalization (I18N) package 🙊 Get new languages for your pH7CMS website!
Stars: ✭ 17 (-10.53%)
opensource-voice-toolsA repo listing known open source voice tools, ordered by where they sit in the voice stack
Stars: ✭ 21 (+10.53%)
piii.jsA filter for bad words (in Portuguese).
Stars: ✭ 122 (+542.11%)
NlvrCornell NLVR and NLVR2 are natural language grounding datasets. Each example shows a visual input and a sentence describing it, and is annotated with the truth-value of the sentence.
Stars: ✭ 192 (+910.53%)
tvsubTVsub: DCU-Tencent Chinese-English Dialogue Corpus
Stars: ✭ 40 (+110.53%)
Wp2txtWP2TXT extracts plain text data from Wikipedia dump file (encoded in XML/compressed with Bzip2) stripping all the MediaWiki markups and other metadata.
Stars: ✭ 145 (+663.16%)
Code Docstring CorpusPreprocessed Python functions and docstrings for automated code documentation (code2doc) and automated code generation (doc2code) tasks.
Stars: ✭ 137 (+621.05%)
rclcRich Context leaderboard competition, including the corpus and current SOTA for required tasks.
Stars: ✭ 20 (+5.26%)
workshopsScholarly Communications Workshops
Stars: ✭ 13 (-31.58%)
poesyPoetic processing, for Python.
Stars: ✭ 28 (+47.37%)
ocr2textConvert a PDF via OCR to a TXT file in UTF-8 encoding
Stars: ✭ 90 (+373.68%)
evt-viewerEdition Visualization Technology 2 - development
Stars: ✭ 66 (+247.37%)
Nlp bahasa resourcesA Curated List of Dataset and Usable Library Resources for NLP in Bahasa Indonesia
Stars: ✭ 158 (+731.58%)
gumRepository for the Georgetown University Multilayer Corpus (GUM)
Stars: ✭ 71 (+273.68%)
ProsodyHelsinki Prosody Corpus and A System for Predicting Prosodic Prominence from Text
Stars: ✭ 139 (+631.58%)
2018-2019The GitHub repository containing all the material related to the Computational Thinking and Programming course of the Digital Humanities and Digital Knowledge degree at the University of Bologna (a.a. 2018/2019).
Stars: ✭ 29 (+52.63%)
Processando-ProcessingEsforço para: Traduzir para o português material de referência sobre Processing; e portar para o Processing Modo Python tutoriais e outros exemplos.
Stars: ✭ 12 (-36.84%)
KhcoderKH Coder: for Quantitative Content Analysis or Text Mining
Stars: ✭ 126 (+563.16%)
Dialog corpus用于训练中英文对话系统的语料库 Datasets for Training Chatbot System
Stars: ✭ 1,662 (+8647.37%)
Sejong CorpusKorean sejong corpus download and simple analysis
Stars: ✭ 116 (+510.53%)
createurstech.frPremière plateforme collaborative et open source qui référence les créateurs de contenus tech francophone.
Stars: ✭ 174 (+815.79%)
german-nounsA list of ~100,000 German nouns and their grammatical properties compiled from WiktionaryDE as CSV file. Plus a module to look up the data and parse compound words.
Stars: ✭ 101 (+431.58%)
verbeccComplete Conjugation of any Verb using Machine Learning for French, Spanish, Portuguese, Italian and Romanian
Stars: ✭ 45 (+136.84%)
OpenConvertText conversion tool (from e.g. Word, HTML, txt) to corpus formats TEI or FoLiA)
Stars: ✭ 20 (+5.26%)
megsA merged version of multiple open-source German speech datasets.
Stars: ✭ 21 (+10.53%)
ham4corpusData from "Hamilton: An American Musical", formatted for reuse. See below for some interesting text analysis basic findings! I am not throwing away my stopword?
Stars: ✭ 53 (+178.95%)
Chinese Names Corpus中文人名语料库。人名生成器。中文姓名,姓氏,名字,称呼,日本人名,翻译人名,英文人名。可用于中文分词、人名实体识别。
Stars: ✭ 3,053 (+15968.42%)
trafilaturaPython & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Stars: ✭ 711 (+3642.11%)
Weibo terminaterFinal Weibo Crawler Scrap Anything From Weibo, comments, weibo contents, followers, anything. The Terminator
Stars: ✭ 2,295 (+11978.95%)
LangageLinotteCode source officiel du langage de programmation Linotte - Langage de programmation en français simple créé dans le but de permettre aux enfants et aux personnes n'ayant pas une connaissance approfondie de l’informatique d’apprendre la programmation facilement.
Stars: ✭ 29 (+52.63%)
Efaqa Corpus Zh❤️Emotional First Aid Dataset, 心理咨询问答、聊天机器人语料库
Stars: ✭ 170 (+794.74%)
Clue中文语言理解测评基准 Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Stars: ✭ 2,425 (+12663.16%)
awesome-dhtoolsSoftware for humanities scholars using quantitative or computational methods.
Stars: ✭ 72 (+278.95%)
htr-unitedGround Truth Resources for the HTR of patrimonial documents
Stars: ✭ 23 (+21.05%)
Awesome ChatbotAwesome Chatbot Projects,Corpus,Papers,Tutorials.Chinese Chatbot =>:
Stars: ✭ 1,785 (+9294.74%)
Mis-Comandos-Linux📋 Lista descrita de mis 💯 comandos favoritos ⭐ en GNU/Linux 💻
Stars: ✭ 28 (+47.37%)
InMangaKindleDescarga manga en español en diferentes formatos (PNG, PDF, EPUB, MOBI)
Stars: ✭ 43 (+126.32%)
BARISUse the French Open Data Portal API features from R
Stars: ✭ 21 (+10.53%)
linguistic-datasets-portugueseLinguistic Datasets for Portuguese: Lista de conjuntos de dados linguísticos para língua portuguesa com licença flexíveis: banco de dados, lista de palavras, sinônimos, antônimos, dicionário temático, tesauro, linked data, semântica, ontologia e representação de conhecimento
Stars: ✭ 46 (+142.11%)
Colibri CoreColibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller`` whi ch allows you to build, view, manipulate and query pattern models.
Stars: ✭ 112 (+489.47%)
proiel-treebankOfficial releases of the PROIEL treebank of ancient Indo-European languages
Stars: ✭ 30 (+57.89%)
DatasetsPoetry-related datasets developed by THUAIPoet (Jiuge) group.
Stars: ✭ 111 (+484.21%)
Ua GecUA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language
Stars: ✭ 108 (+468.42%)
malay-datasetText corpus for Bahasa Malaysia, https://malaya.readthedocs.io/en/latest/Dataset.html
Stars: ✭ 189 (+894.74%)
TopicsExplorerExplore your own text collection with a topic model – without prior knowledge.
Stars: ✭ 53 (+178.95%)