tblock / 10kGNAD

Licence: MIT license

Ten Thousand German News Articles Dataset for Topic Classification

Programming Languages

python

Projects that are alternatives to or similar to 10kGNAD

Ask2Transformers

A Framework for Textual Entailment based Zero Shot text classification

Stars: ✭ 102 (+61.9%)

Mutual labels: text-classification, topic-classification

NewsMTSC

Target-dependent sentiment classification in news articles reporting on political events. Includes a high-quality data set of over 11k sentences and a state-of-the-art classification model.

Stars: ✭ 54 (-14.29%)

Mutual labels: text-classification, news-articles

CoinTaxman

Calculate your taxes from cryptocurrency gains

Stars: ✭ 110 (+74.6%)

Mutual labels: german

Reuters-21578-Classification

Text classification with Reuters-21578 datasets using Gensim Word2Vec and Keras LSTM

Stars: ✭ 44 (-30.16%)

Mutual labels: text-classification

article-tagging

Natural Language Processing of Chicago news articles

Stars: ✭ 41 (-34.92%)

Mutual labels: news-articles

ML4K-AI-Extension

Use machine learning in AppInventor, with easy training using text, images, or numbers through the Machine Learning for Kids website.

Stars: ✭ 18 (-71.43%)

Mutual labels: text-classification

feedIO

A Feed Aggregator that Knows What You Want to Read.

Stars: ✭ 26 (-58.73%)

Mutual labels: text-classification

ml-with-text

[Tutorial] Demystifying Natural Language Processing with Python

Stars: ✭ 18 (-71.43%)

Mutual labels: text-classification

Naive-Bayes-Text-Classifier-in-Java

Naive Bayes Classification used to classify movie reviews as positive or negative

Stars: ✭ 18 (-71.43%)

Mutual labels: text-classification

nlp-lt

Natural Language Processing for Lithuanian language

Stars: ✭ 17 (-73.02%)

Mutual labels: text-classification

yunyi

2018 "Yunyi Cup": scenic spot reputation review score prediction

Stars: ✭ 29 (-53.97%)

Mutual labels: text-classification

FNet-pytorch

Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms

Stars: ✭ 204 (+223.81%)

Mutual labels: text-classification

nlpbuddy

A text analysis application for performing common NLP tasks through a web dashboard interface and an API

Stars: ✭ 115 (+82.54%)

Mutual labels: text-classification

ganbert-pytorch

Enhancing the BERT training with Semi-supervised Generative Adversarial Networks in Pytorch/HuggingFace

Stars: ✭ 60 (-4.76%)

Mutual labels: text-classification

FormaleSysteme

Materials for the lecture "Formale Systeme" (Formal Systems), Faculty of Computer Science, TU Dresden

Stars: ✭ 31 (-50.79%)

Mutual labels: german

nsmc-zeppelin-notebook

Movie review dataset Word2Vec & sentiment classification Zeppelin notebook

Stars: ✭ 26 (-58.73%)

Mutual labels: text-classification

jiten

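jiten - Japanese Android/CLI/web dictionary based on JMdict/KANJIDIC (Japanese dictionary, Japanese-English dictionary, kanji-English dictionary, Japanese-German dictionary, Japanese-Dutch dictionary)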

Stars: ✭ 64 (+1.59%)

Mutual labels: german

AAAI 2019 EXAM

Official implementation of "Explicit Interaction Model towards Text Classification"

Stars: ✭ 68 (+7.94%)

Mutual labels: text-classification

Pytorch-NLU

Pytorch-NLU, a Chinese text classification and sequence annotation toolkit, supports multi-class and multi-label classification tasks for long and short Chinese texts, and supports sequence annotation tasks such as Chinese named entity recognition, part of speech ta…

Stars: ✭ 151 (+139.68%)

Mutual labels: text-classification

NumberRush

A number based React game to help you learn German numbers! 🇩🇪

Stars: ✭ 20 (-68.25%)

Mutual labels: german

Ten Thousand German News Articles Dataset

For more information visit the detailed project page.

1. Install the required Python packages: pip install -r requirements.txt
2. Download the corpus.sqlite3 file into the project root from here (compressed) or directly from here.
3. Run python code/extract_dataset_from_sqlite.py corpus.sqlite3 articles.csv to extract the articles.
4. Run python code/split_articles_into_train_test.py to split the dataset.
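
The last step above splits the extracted articles into a training set and a test set. The sketch below illustrates one common way to do this for topic classification: a stratified split that keeps each topic's share roughly equal in both sets. It is not the repository's script. The function name stratified_split, the (label, text) row layout, the 10% test fraction, the seed, and the sample labels are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(rows, test_fraction=0.1, seed=42):
    """Split (label, text) rows into train and test lists, label by label.

    Hypothetical helper: the row layout and the default test fraction are
    assumptions, not taken from the 10kGNAD scripts.
    """
    # Group the rows by topic label.
    by_label = defaultdict(list)
    for label, text in rows:
        by_label[label].append((label, text))

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        # Send at least one item per label to the test set.
        n_test = max(1, round(len(items) * test_fraction))
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


# Made-up sample rows; real rows would come from the extracted articles file.
rows = [("Sport", f"sport article {i}") for i in range(20)]
rows += [("Web", f"web article {i}") for i in range(10)]

train, test = stratified_split(rows, test_fraction=0.1)
print(len(train), len(test))
```

Splitting per label rather than over the whole list keeps rare topics from vanishing from the test set, which matters when the topics are unevenly sized.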

License

All code in this repository is licensed under an MIT License.

The dataset is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
