Alternatives and detailed information of 20-newsgroups_text-classification

gokriznastic / 20-newsgroups_text-classification

Licence: other

"20 newsgroups" dataset - Text Classification using Multinomial Naive Bayes in Python.

Programming Languages

Jupyter Notebook

11667 projects

Projects that are alternatives of or similar to 20-newsgroups text-classification

text-classification-cn

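Chinese text classification in practice, based on the Sogou news corpus, using traditional machine learning methods as well as pretrained models.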

Stars: ✭ 81 (+97.56%)

Mutual labels: text-classification, naive-bayes, scikit-learn

Nepali-News-Classifier

Text classification of Nepali-language documents. This mini project was done in partial fulfillment of the NLP course COMP 473.

Stars: ✭ 13 (-68.29%)

Mutual labels: text-classification, naive-bayes-classifier

Dat8

General Assembly's 2015 Data Science course in Washington, DC

Stars: ✭ 1,516 (+3597.56%)

Mutual labels: naive-bayes, scikit-learn

cnn-text-classification

Text classification with Convolutional Neural Networks on Yelp, IMDB & sentence polarity dataset v1.0

Stars: ✭ 108 (+163.41%)

Mutual labels: text-classification, multiclass-classification

100 Days Of Ml Code

100 Days of ML Coding

Stars: ✭ 33,641 (+81951.22%)

Mutual labels: scikit-learn, naive-bayes-classifier

sentiment-analysis-using-python

Large Data Analysis Course Project

Stars: ✭ 23 (-43.9%)

Mutual labels: naive-bayes, naive-bayes-classifier

GaussianNB

Gaussian Naive Bayes (GaussianNB) classifier

Stars: ✭ 17 (-58.54%)

Mutual labels: naive-bayes, naive-bayes-classifier

Shallowlearn

An experiment about re-implementing supervised learning models based on shallow neural network approaches (e.g. fastText) with some additional exclusive features and nice API. Written in Python and fully compatible with Scikit-learn.

Stars: ✭ 196 (+378.05%)

Mutual labels: text-classification, scikit-learn

Doc2vec

📓 Long(er) text representation and classification using Doc2Vec embeddings

Stars: ✭ 92 (+124.39%)

Mutual labels: text-classification, scikit-learn

Text Analytics With Python

Learn how to process, classify, cluster, summarize, understand syntax, semantics and sentiment of text data with the power of Python! This repository contains code and datasets used in my book, "Text Analytics with Python" published by Apress/Springer.

Stars: ✭ 1,132 (+2660.98%)

Mutual labels: text-classification, scikit-learn

Machine Learning With Python

Practice and tutorial-style notebooks covering wide variety of machine learning techniques

Stars: ✭ 2,197 (+5258.54%)

Mutual labels: naive-bayes, scikit-learn

TextClassification

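Text classification of Sina news implemented with scikit-learn. The dataset contains 1,000,000 documents in 10 classes, split 1:1 between test and training sets. The classifiers are SVM and Bayes, with Bayes serving as the baseline.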

Stars: ✭ 86 (+109.76%)

Mutual labels: text-classification, scikit-learn

Naive-Bayes-Text-Classifier-in-Java

Naive Bayes Classification used to classify movie reviews as positive or negative

Stars: ✭ 18 (-56.1%)

Mutual labels: text-classification, naive-bayes-classifier

Text Classification

Machine Learning and NLP: Text Classification using python, scikit-learn and NLTK

Stars: ✭ 239 (+482.93%)

Mutual labels: text-classification, scikit-learn

bayes

naive bayes in php

Stars: ✭ 61 (+48.78%)

Mutual labels: naive-bayes, naive-bayes-classifier

emoji-prediction

🤓🔮🔬 Emoji prediction from a text using machine learning

Stars: ✭ 41 (+0%)

Mutual labels: scikit-learn

Word-Embeddings-and-Document-Vectors

An evaluation of word-embeddings for classification

Stars: ✭ 32 (-21.95%)

Mutual labels: naive-bayes-classifier

bayarea-2019-scikit-sprint

Bay Area WiMLDS scikit-learn open source sprint (Nov 2, 2019)

Stars: ✭ 16 (-60.98%)

Mutual labels: scikit-learn

lapis-bayes

Naive Bayes classifier for use in Lua

Stars: ✭ 26 (-36.59%)

Mutual labels: naive-bayes-classifier

pycobra

Python library implementing ensemble methods for regression and classification, plus visualisation tools including Voronoi tessellations.

Stars: ✭ 111 (+170.73%)

Mutual labels: scikit-learn

Text Classification in Python using the 20 newsgroup dataset.

"20 newsgroups" dataset - Text Classification using Python.

Dataset

This project uses the well-known "20 Newsgroups" dataset.

The dataset is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups. I've included the dataset in the repo, in the 20_newsgroups\ directory.

The dataset is freely available here.
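
As a rough illustration of how the bundled documents could be read into memory, here is a minimal sketch. It assumes Python 3, that the dataset directory contains one sub-folder per newsgroup with one plain-text file per message, and that the files can be decoded as latin-1. The function name `load_newsgroups` is hypothetical and is not taken from the notebooks.

```python
from pathlib import Path


def load_newsgroups(root):
    """Return (texts, labels) from a folder with one sub-folder per newsgroup.

    Assumed layout (not confirmed by the README): root/<newsgroup>/<message file>.
    """
    texts, labels = [], []
    for group_dir in sorted(Path(root).iterdir()):
        if not group_dir.is_dir():
            continue
        for doc_path in sorted(group_dir.iterdir()):
            if doc_path.is_file():
                # latin-1 maps every byte to a character, so unusual bytes
                # in old posts do not raise a decoding error.
                texts.append(doc_path.read_text(encoding="latin-1"))
                labels.append(group_dir.name)
    return texts, labels
```

Calling `load_newsgroups("20_newsgroups")` from the repo root would then return two parallel lists, one of raw message texts and one of newsgroup names used as class labels.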

The code

The code is straightforward and well documented. Document preprocessing and the classifiers are implemented from scratch, and the results are then compared with scikit-learn's built-in classifiers. The code is organized as IPython Notebooks, each corresponding to a particular "classifier" or "technique" used to classify the dataset.

Requirements

Python 2.7 or above
Python modules:
- scikit-learn
- numpy
- matplotlib

Experiments

For each experiment we use a "feature vector", a "classifier" and a train-test splitting strategy.

Experiment 1: BOW - NB - 25% test

In this experiment each document is represented as a Bag of Words (BOW) vector of term frequencies and classified with a Multinomial Naive Bayes (NB) classifier. 25% of the documents are held out for testing.
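
The setup of this experiment can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the code from the notebooks: it assumes Python 3, and the tokenizer (lowercased runs of ASCII letters), Laplace smoothing with `alpha=1.0`, the fixed shuffle seed, and all names (`tokenize`, `split_train_test`, `ScratchMultinomialNB`, `accuracy`) are hypothetical choices made for the example.

```python
import math
import random
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def split_train_test(texts, labels, test_fraction=0.25, seed=0):
    """Shuffle document indices and hold out `test_fraction` of them."""
    indices = list(range(len(texts)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(idx):
        return [texts[i] for i in idx], [labels[i] for i in idx]

    return pick(train_idx), pick(test_idx)


class ScratchMultinomialNB:
    """Multinomial Naive Bayes over term-frequency bags of words."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, texts, labels):
        self.n_docs = len(labels)
        self.class_doc_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.term_counts[label].update(tokenize(text))
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        self.total_terms = {c: sum(tc.values()) for c, tc in self.term_counts.items()}
        return self

    def predict(self, text):
        # Terms never seen in training are ignored.
        bow = Counter(t for t in tokenize(text) if t in self.vocab)
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_doc_counts):
            # log prior + sum over terms of tf * log smoothed likelihood
            score = math.log(self.class_doc_counts[label] / self.n_docs)
            denom = self.total_terms[label] + self.alpha * len(self.vocab)
            for term, tf in bow.items():
                score += tf * math.log((self.term_counts[label][term] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of documents whose predicted label matches the true label."""
    hits = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return hits / len(labels)
```

With `texts` and `labels` as parallel lists, `(train_x, train_y), (test_x, test_y) = split_train_test(texts, labels)` followed by `accuracy(ScratchMultinomialNB().fit(train_x, train_y), test_x, test_y)` gives the held-out accuracy. In scikit-learn, the corresponding built-in pieces are `CountVectorizer`, `MultinomialNB`, and `train_test_split` with `test_size=0.25`.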

Experiment 12: TF-IDF - NB - 25% test

Ongoing
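
Since this experiment is still in progress, the following is only a sketch of what a TF-IDF feature vector could look like, not the author's implementation. It assumes Python 3 and the plain weighting tf × log(N / df), where N is the number of documents and df is the number of documents containing the term; other variants (for example, smoothed ones) exist. The names `tokenize` and `tfidf_vectors` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(texts):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    bows = [Counter(tokenize(text)) for text in texts]
    # Document frequency: in how many documents each term occurs at least once.
    doc_freq = Counter(term for bow in bows for term in bow)
    n_docs = len(bows)
    return [
        {term: tf * math.log(n_docs / doc_freq[term]) for term, tf in bow.items()}
        for bow in bows
    ]
```

Under this weighting a term that occurs in every document gets weight 0, so very common words contribute nothing, while terms concentrated in a few documents are emphasized.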
