Product Categorization
Multi-Class Text Classification of products based on their description
General info
The goal of this project is to categorize products based on their descriptions using machine learning and deep learning (MLP, CNN, DistilBERT) algorithms. Additionally, we created Doc2vec and Word2vec models, performed topic modeling (with LDA analysis), and carried out an EDA (data exploration, data aggregation, and data cleaning).
Dataset
The dataset comes from http://makeup-api.herokuapp.com/ and was obtained via its API. See my previous project, Extracting Data using API.
Motivation
The aim of the project is multi-class text classification of makeup products based on their descriptions: given a description as input, we predict its category. There are five categories, corresponding to different types of makeup products. In our analysis we used different feature-extraction methods (such as Word2vec and Doc2vec) and various machine learning and deep learning algorithms to obtain more accurate predictions and choose the best-performing model for our problem.
Project contains:
- Multi-class text classification with ML algorithms - Text_analysis.ipynb
- Text classification with DistilBERT model - Bert_products.ipynb
- Text classification with MLP and Convolutional Neural Network (CNN) models - Text_nn.ipynb
- Text classification with Doc2vec model - Doc2vec.ipynb
- Word2vec model - Word2vec.ipynb
- LDA topic modeling - LDA_Topic_modeling.ipynb
- EDA analysis - Products_analysis.ipynb
- Python scripts for data cleaning and the ML model - clean_data.py, text_model.py
- data, models - data and models used in the project.
Summary
We began with data analysis and pre-processing of our dataset. We then used several text representations, such as BoW and TF-IDF, and trained Word2vec and Doc2vec models on our data. We experimented with several machine learning algorithms: Logistic Regression, linear SVM, Multinomial Naive Bayes, Random Forest, and Gradient Boosting, as well as MLP and Convolutional Neural Network (CNN) models, using different combinations of text representations and embeddings. We also applied transfer learning with a pretrained DistilBERT model from the Hugging Face Transformers library.
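As an illustration, here is a minimal sketch of the BoW/TF-IDF + linear SVM setup with scikit-learn; the file path and column names (`data/products.csv`, `description`, `category`) are assumptions for the example, not necessarily those used in the notebooks:

```python
# Minimal sketch: TF-IDF features + linear SVM for multi-class text classification.
# The CSV path and column names below are illustrative assumptions.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

df = pd.read_csv("data/products.csv")  # hypothetical path
X_train, X_val, y_train, y_val = train_test_split(
    df["description"], df["category"],
    test_size=0.2, random_state=42, stratify=df["category"],
)

# TF-IDF starts from raw term counts (BoW) and reweights them by inverse
# document frequency, so one vectorizer covers both representations.
model = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english", ngram_range=(1, 2))),
    ("svm", LinearSVC()),
])
model.fit(X_train, y_train)
print("validation accuracy:", accuracy_score(y_val, model.predict(X_val)))
```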
Our experiments show that the tested models achieve overall high accuracy and similar results on our problem. The SVM (BoW + TF-IDF) and MLP models give the best accuracy on the validation set. Logistic Regression performed very well with both BoW + TF-IDF and Doc2vec, achieving accuracy close to the MLP. The CNN with word embeddings reached a comparable result (0.93), and transfer learning with DistilBERT (see the sketch after the table) gave results similar to the other models. We achieved 93% accuracy on the test set, which shows that the more complex models did not outperform simple machine learning models such as SVM on our problem.
Model | Embeddings | Accuracy |
---|---|---|
CNN | Word embeddings | 0.93 |
DistilBERT | DistilBERT tokenizer | 0.93 |
MLP | Word embeddings | 0.93 |
SVM | Doc2vec (DBOW) | 0.93 |
SVM | BoW + TF-IDF | 0.93 |
Logistic Regression | Doc2vec (DBOW) | 0.91 |
Gradient Boosting | BoW + TF-IDF | 0.91 |
Logistic Regression | BoW + TF-IDF | 0.91 |
Random Forest | BoW + TF-IDF | 0.91 |
Naive Bayes | BoW + TF-IDF | 0.90 |
Logistic Regression | Doc2vec (DM) | 0.89 |
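Below is a minimal sketch of the DistilBERT transfer-learning setup with the Hugging Face Transformers library and TensorFlow/Keras; the placeholder data and the hyperparameters (learning rate, epochs, batch size) are illustrative assumptions, not the exact values used in Bert_products.ipynb:

```python
# Minimal sketch: fine-tuning a pretrained DistilBERT for 5 product categories.
# The texts/labels below are placeholders; hyperparameters are illustrative.
import tensorflow as tf
from transformers import DistilBertTokenizerFast, TFDistilBertForSequenceClassification

texts = ["matte liquid lipstick with a long-lasting finish"]  # placeholder data
labels = [0]                                                  # placeholder labels

tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
encodings = tokenizer(texts, truncation=True, padding=True, return_tensors="tf")

# Load the pretrained encoder with a fresh classification head (transfer learning).
model = TFDistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=5
)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(dict(encodings), tf.constant(labels), epochs=3, batch_size=16)
```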
The project was created with:
- Python 3.6/3.8
- libraries: NLTK, gensim, Keras, TensorFlow, Hugging Face transformers, scikit-learn, pandas, numpy, seaborn, pyLDAvis.
Running the project:
- To run this project, use Jupyter Notebook or Google Colab.