gambolputty / german-nouns

Licence: CC-BY-SA-4.0 license

A list of ~100,000 German nouns and their grammatical properties compiled from WiktionaryDE as CSV file. Plus a module to look up the data and parse compound words.

Programming Languages

python

139335 projects - #7 most used programming language

Projects that are alternatives of or similar to german-nouns

wiktionary-de-parser

Extract data from German Wiktionary XML files. Allows you to add your own extraction methods 🚀

Stars: ✭ 22 (-78.22%)

Mutual labels: wiktionary, german-language

Weibo terminater

Final Weibo Crawler Scrap Anything From Weibo, comments, weibo contents, followers, anything. The Terminator

Stars: ✭ 2,295 (+2172.28%)

Mutual labels: corpus

Awesome Hungarian Nlp

A curated list of NLP resources for Hungarian

Stars: ✭ 121 (+19.8%)

Mutual labels: corpus

Clue

中文语言理解测评基准 Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard

Stars: ✭ 2,425 (+2300.99%)

Mutual labels: corpus

Cluedatasetsearch

搜索所有中文NLP数据集，附常用英文NLP数据集

Stars: ✭ 2,112 (+1991.09%)

Mutual labels: corpus

Indonesian Nlp Resources

data resource untuk NLP bahasa indonesia

Stars: ✭ 143 (+41.58%)

Mutual labels: corpus

Colibri Core

Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller`` whi ch allows you to build, view, manipulate and query pattern models.

Stars: ✭ 112 (+10.89%)

Mutual labels: corpus

Dialogue-Corpus

No description or website provided.

Stars: ✭ 27 (-73.27%)

Mutual labels: corpus

Nlvr

Cornell NLVR and NLVR2 are natural language grounding datasets. Each example shows a visual input and a sentence describing it, and is annotated with the truth-value of the sentence.

Stars: ✭ 192 (+90.1%)

Mutual labels: corpus

Prosody

Helsinki Prosody Corpus and A System for Predicting Prosodic Prominence from Text

Stars: ✭ 139 (+37.62%)

Mutual labels: corpus

Gossiping Chinese Corpus

PTT 八卦版問答中文語料

Stars: ✭ 137 (+35.64%)

Mutual labels: corpus

Khcoder

KH Coder: for Quantitative Content Analysis or Text Mining

Stars: ✭ 126 (+24.75%)

Mutual labels: corpus

Nlp bahasa resources

A Curated List of Dataset and Usable Library Resources for NLP in Bahasa Indonesia

Stars: ✭ 158 (+56.44%)

Mutual labels: corpus

Dialog corpus

用于训练中英文对话系统的语料库 Datasets for Training Chatbot System

Stars: ✭ 1,662 (+1545.54%)

Mutual labels: corpus

Awesome Deeplearning Resources

Deep Learning and deep reinforcement learning research papers and some codes

Stars: ✭ 2,483 (+2358.42%)

Mutual labels: corpus

Sejong Corpus

Korean sejong corpus download and simple analysis

Stars: ✭ 116 (+14.85%)

Mutual labels: corpus

Code Docstring Corpus

Preprocessed Python functions and docstrings for automated code documentation (code2doc) and automated code generation (doc2code) tasks.

Stars: ✭ 137 (+35.64%)

Mutual labels: corpus

Wp2txt

WP2TXT extracts plain text data from Wikipedia dump file (encoded in XML/compressed with Bzip2) stripping all the MediaWiki markups and other metadata.

Stars: ✭ 145 (+43.56%)

Mutual labels: corpus

megs

A merged version of multiple open-source German speech datasets.

Stars: ✭ 21 (-79.21%)

Mutual labels: corpus

Chinese Names Corpus

中文人名语料库。人名生成器。中文姓名,姓氏,名字,称呼,日本人名,翻译人名,英文人名。可用于中文分词、人名实体识别。

Stars: ✭ 3,053 (+2922.77%)

Mutual labels: corpus

View All Similar Projects ➔

German nouns

A comma seperated list of ~100 thousand German nouns and their grammatical properties (tense, number, gender) as CSV file. Plus a module to look up the data and parse compound words. Compiled from the WiktionaryDE.

The list can be found here: german_nouns/nouns.csv

If you want to look up nouns or parse compound words, install this package (for Python 3.8+) and follow the instructions below:

Installation

pip install german-nouns

Lookup words

from pprint import pprint
from german_nouns.lookup import Nouns

nouns = Nouns()

# Lookup a word
word = nouns['Fahrrad']
pprint(word)

# Output:
[{'flexion': {'akkusativ plural': 'Fahrräder',
              'akkusativ singular': 'Fahrrad',
              'dativ plural': 'Fahrrädern',
              'dativ singular': 'Fahrrad',
              'dativ singular*': 'Fahrrade',
              'genitiv plural': 'Fahrräder',
              'genitiv singular': 'Fahrrades',
              'genitiv singular*': 'Fahrrads',
              'nominativ plural': 'Fahrräder',
              'nominativ singular': 'Fahrrad'},
  'genus': 'n',
  'lemma': 'Fahrrad',
  'pos': ['Substantiv']}]

# parse compound word
words = nouns.parse_compound('Vermögensbildung')
print(words)

# Output:
['Vermögen', 'Bildung'] # Now lookup nouns['Vermögen'] etc.

Compiling the list

To compile the list yourself, you need Python 3.8+ and Poetry installed.

1. Clone the repository and install dependencies with Poetry:

$ git clone https://github.com/gambolputty/german-nouns
$ cd german-nouns
$ poetry install

2. Compile the list of nouns from a Wiktionary XML file:

Find the latest XML-dump files here: https://dumps.wikimedia.org/dewiktionary/latest, for example this one and download it. Then execute:

$ poetry run python -m german_nouns.parse_dump /path-to-xml-dump-file.xml.bz2

The CSV file will be saved here: german_nouns/nouns.csv.

Remove german_nouns/index.txt to let the script recreate the word-index when using the lookup methods.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].

Cheap and reliable Node.js hosting starts at $3/month, and $1/month static HTML hosting

gambolputty / german-nouns

Programming Languages

Labels

Projects that are alternatives of or similar to german-nouns

German nouns

Installation

Lookup words

Compiling the list

1. Clone the repository and install dependencies with Poetry:

2. Compile the list of nouns from a Wiktionary XML file: