A list of ~100,000 German nouns and their grammatical properties, compiled from WiktionaryDE as a CSV file, plus a module to look up the data and parse compound words.

Stars: ✭ 101 (+339.13%)

Mutual labels: corpus

thai-language

Computer tools for the Thai language

Stars: ✭ 20 (-13.04%)

Mutual labels: corpus

Dialogue-Corpus

No description or website provided.

Stars: ✭ 27 (+17.39%)

Mutual labels: corpus

NiuTrans.NMT

A fast neural machine translation system. It is developed in C++ and relies on NiuTensor for fast tensor APIs.

Stars: ✭ 112 (+386.96%)

Mutual labels: neural-machine-translation

Chinese Names Corpus

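Chinese personal-name corpus and name generator. Covers Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, and English names. Can be used for Chinese word segmentation and person-name entity recognition.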

Stars: ✭ 3,053 (+13173.91%)

Mutual labels: corpus

dialogue-datasets

Collects open dialogue corpora and some useful data-processing utilities.

Stars: ✭ 24 (+4.35%)

Mutual labels: corpus

Weibo terminater

Final Weibo crawler. Scrape anything from Weibo: comments, post contents, followers, anything. The Terminator.

Stars: ✭ 2,295 (+9878.26%)

Mutual labels: corpus

TV4Dialog

No description or website provided.

Stars: ✭ 33 (+43.48%)

Mutual labels: corpus

Efaqa Corpus Zh

❤️ Emotional First Aid Dataset: a psychological counseling Q&A and chatbot corpus

Stars: ✭ 170 (+639.13%)

Mutual labels: corpus

transformer

Neutron: A PyTorch-based implementation of Transformer and its variants.

Stars: ✭ 60 (+160.87%)

Mutual labels: neural-machine-translation

Indonesian Nlp Resources

Data resources for Indonesian-language NLP

Stars: ✭ 143 (+521.74%)

Mutual labels: corpus

LanguageCodes

We present a list of languages with their codes, families, regions, etc. We also present a list of multilingual corpora (with URLs).

Stars: ✭ 70 (+204.35%)

Mutual labels: corpus

Lyrics Corpora

An unofficial Python API that allows users to create a corpus of lyrical text from their favorite artists and Billboard charts

Stars: ✭ 13 (-43.48%)

Mutual labels: corpus

xl-sum

This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages" published in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021.

Stars: ✭ 160 (+595.65%)

Mutual labels: low-resource-languages

open2ch-dialogue-corpus

A dialogue corpus created by crawling Open2ch (おーぷん2ちゃんねる)

Stars: ✭ 65 (+182.61%)

Mutual labels: corpus

kanji-frequency

Kanji usage frequency data collected from various sources

Stars: ✭ 92 (+300%)

Mutual labels: corpus

Awesome Chatbot

Awesome chatbot projects, corpora, papers, and tutorials. Chinese Chatbot =>:

Stars: ✭ 1,785 (+7660.87%)

Mutual labels: corpus

cljs-corpus

A greppable archive of ClojureScript code

Stars: ✭ 37 (+60.87%)

Mutual labels: corpus

Cluedatasetsearch

Search all Chinese NLP datasets, with commonly used English NLP datasets included

Stars: ✭ 2,112 (+9082.61%)

Mutual labels: corpus

MT-Preparation

Machine Translation (MT) Preparation Scripts

Stars: ✭ 15 (-34.78%)

Mutual labels: neural-machine-translation

Awesome Hungarian Nlp

A curated list of NLP resources for Hungarian

Stars: ✭ 121 (+426.09%)

Mutual labels: corpus

fuzzing-corpus

My fuzzing corpus

Stars: ✭ 120 (+421.74%)

Mutual labels: corpus

Colibri Core

Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e., patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.

Stars: ✭ 112 (+386.96%)

Mutual labels: corpus

parallel-corpora-tools

Tools for filtering and cleaning parallel and monolingual corpora for machine translation and other natural language processing tasks.

Stars: ✭ 35 (+52.17%)

Mutual labels: neural-machine-translation

Ua Gec

UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language

Stars: ✭ 108 (+369.57%)

Mutual labels: corpus

bible-corpus

A multilingual parallel corpus created from translations of the Bible.