tblock / 10kGNAD

Licence: MIT license
Ten Thousand German News Articles Dataset for Topic Classification

Programming Languages

python

Projects that are alternatives of or similar to 10kGNAD

Ask2Transformers
A Framework for Textual Entailment based Zero Shot text classification
Stars: ✭ 102 (+61.9%)
Mutual labels:  text-classification, topic-classification
NewsMTSC
Target-dependent sentiment classification in news articles reporting on political events. Includes a high-quality data set of over 11k sentences and a state-of-the-art classification model.
Stars: ✭ 54 (-14.29%)
Mutual labels:  text-classification, news-articles
CoinTaxman
Calculate your taxes from cryptocurrency gains
Stars: ✭ 110 (+74.6%)
Mutual labels:  german
Reuters-21578-Classification
Text classification with Reuters-21578 datasets using Gensim Word2Vec and Keras LSTM
Stars: ✭ 44 (-30.16%)
Mutual labels:  text-classification
article-tagging
Natural Language Processing of Chicago news articles
Stars: ✭ 41 (-34.92%)
Mutual labels:  news-articles
ML4K-AI-Extension
Use machine learning in AppInventor, with easy training using text, images, or numbers through the Machine Learning for Kids website.
Stars: ✭ 18 (-71.43%)
Mutual labels:  text-classification
feedIO
A Feed Aggregator that Knows What You Want to Read.
Stars: ✭ 26 (-58.73%)
Mutual labels:  text-classification
ml-with-text
[Tutorial] Demystifying Natural Language Processing with Python
Stars: ✭ 18 (-71.43%)
Mutual labels:  text-classification
Naive-Bayes-Text-Classifier-in-Java
Naive Bayes Classification used to classify movie reviews as positive or negative
Stars: ✭ 18 (-71.43%)
Mutual labels:  text-classification
nlp-lt
Natural Language Processing for Lithuanian language
Stars: ✭ 17 (-73.02%)
Mutual labels:  text-classification
yunyi
2018 "Yunyi Cup": scenic spot reputation rating score prediction
Stars: ✭ 29 (-53.97%)
Mutual labels:  text-classification
FNet-pytorch
Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms
Stars: ✭ 204 (+223.81%)
Mutual labels:  text-classification
nlpbuddy
A text analysis application for performing common NLP tasks through a web dashboard interface and an API
Stars: ✭ 115 (+82.54%)
Mutual labels:  text-classification
ganbert-pytorch
Enhancing the BERT training with Semi-supervised Generative Adversarial Networks in Pytorch/HuggingFace
Stars: ✭ 60 (-4.76%)
Mutual labels:  text-classification
FormaleSysteme
Course materials for the lecture "Formale Systeme" (Formal Systems), Faculty of Computer Science, TU Dresden
Stars: ✭ 31 (-50.79%)
Mutual labels:  german
nsmc-zeppelin-notebook
Movie review dataset Word2Vec & sentiment classification Zeppelin notebook
Stars: ✭ 26 (-58.73%)
Mutual labels:  text-classification
jiten
jiten - japanese android/cli/web dictionary based on jmdict/kanjidic — 日本語 辞典 和英辞典 漢英字典 和独辞典 和蘭辞典
Stars: ✭ 64 (+1.59%)
Mutual labels:  german
AAAI 2019 EXAM
Official implementation of "Explicit Interaction Model towards Text Classification"
Stars: ✭ 68 (+7.94%)
Mutual labels:  text-classification
Pytorch-NLU
Pytorch-NLU, a Chinese text classification and sequence annotation toolkit. It supports multi-class and multi-label classification of long and short Chinese texts, as well as sequence annotation tasks such as Chinese named entity recognition, part of speech ta…
Stars: ✭ 151 (+139.68%)
Mutual labels:  text-classification
NumberRush
A number based React game to help you learn German numbers! 🇩🇪
Stars: ✭ 20 (-68.25%)
Mutual labels:  german

Ten Thousand German News Articles Dataset

For more information visit the detailed project page.

  1. Install the required Python packages: pip install -r requirements.txt
  2. Download the corpus.sqlite3 file into the project root from here (compressed) or directly from here.
  3. Run python code/extract_dataset_from_sqlite.py corpus.sqlite3 articles.csv to extract the articles.
  4. Run python code/split_articles_into_train_test.py to split the dataset.
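
To illustrate what the final step does conceptually, here is a minimal sketch of a stratified train/test split using only the Python standard library. It is not the repository's split_articles_into_train_test.py script. The function names read_articles and stratified_split, the semicolon delimiter, the single-quote quote character, the label-then-text column order, and the 10% test fraction are all assumptions made for illustration; check them against the files the scripts actually produce.

```python
import csv
import io
import random
from collections import Counter, defaultdict


def read_articles(stream, delimiter=";", quotechar="'"):
    """Read (label, text) rows from a CSV stream.

    The delimiter, quote character, and column order are assumptions;
    adjust them to match the file the extraction script produces.
    """
    reader = csv.reader(stream, delimiter=delimiter, quotechar=quotechar)
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


def stratified_split(rows, test_fraction=0.1, seed=42):
    """Split rows into train and test so each label keeps a similar share."""
    by_label = defaultdict(list)
    for label, text in rows:
        by_label[label].append((label, text))

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_test = round(len(items) * test_fraction)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


if __name__ == "__main__":
    # Tiny made-up sample standing in for articles.csv.
    sample = io.StringIO(
        "Sport;'Text A'\nSport;'Text B'\nWeb;'Text C'\nWeb;'Text D'\n"
    )
    rows = read_articles(sample)
    train, test = stratified_split(rows, test_fraction=0.5)
    print(Counter(label for label, _ in train))
    print(Counter(label for label, _ in test))
```

To try this on real data, one could pass an open file handle instead of the in-memory sample, for example open("articles.csv", encoding="utf-8", newline=""), provided the file really follows the assumed layout.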

License

All code in this repository is licensed under an MIT License.

The dataset is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
