
Jverma / cnn-text-classification-keras

Licence: other
Convolutional Neural Network for Text Classification in Keras

Programming Languages

python

Projects that are alternatives of or similar to cnn-text-classification-keras

medical-diagnosis-cnn-rnn-rcnn
Disease diagnosis from patient descriptions, implemented separately with RNN, CNN, and RCNN
Stars: ✭ 39 (+178.57%)
Mutual labels:  text-classification
support-tickets-classification
This case study shows how to create a model for text analysis and classification and deploy it as a web service in Azure cloud in order to automatically classify support tickets. This project is a proof of concept made by Microsoft (Commercial Software Engineering team) in collaboration with Endava http://endava.com/en
Stars: ✭ 142 (+914.29%)
Mutual labels:  text-classification
node-fasttext
Nodejs binding for fasttext representation and classification.
Stars: ✭ 39 (+178.57%)
Mutual labels:  text-classification
text-classification-svm
The missing SVM-based text classification module implementing HanLP's interface
Stars: ✭ 46 (+228.57%)
Mutual labels:  text-classification
fake-news-detection
This repo is a collection of AWESOME things about fake news detection, including papers, code, etc.
Stars: ✭ 34 (+142.86%)
Mutual labels:  text-classification
kwx
BERT, LDA, and TFIDF based keyword extraction in Python
Stars: ✭ 33 (+135.71%)
Mutual labels:  text-classification
synaptic-simple-trainer
A ready to go text classification trainer based on synaptic (https://github.com/cazala/synaptic)
Stars: ✭ 19 (+35.71%)
Mutual labels:  text-classification
TextUnderstandingTsetlinMachine
Using the Tsetlin Machine to learn human-interpretable rules for high-accuracy text categorization with medical applications
Stars: ✭ 48 (+242.86%)
Mutual labels:  text-classification
Filipino-Text-Benchmarks
Open-source benchmark datasets and pretrained transformer models in the Filipino language.
Stars: ✭ 22 (+57.14%)
Mutual labels:  text-classification
Kaggle-Twitter-Sentiment-Analysis
Kaggle Twitter Sentiment Analysis Competition
Stars: ✭ 18 (+28.57%)
Mutual labels:  text-classification
DaDengAndHisPython
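[WeChat official account: 大邓和他的python (Da Deng and His Python)] Quick introduction to Python syntax: https://www.bilibili.com/video/av44384851; quick introduction to web scraping with Python: https://www.bilibili.com/video/av72010301; contact email: [email protected]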
Stars: ✭ 59 (+321.43%)
Mutual labels:  text-classification
ebe-dataset
Evidence-based Explanation Dataset (AACL-IJCNLP 2020)
Stars: ✭ 16 (+14.29%)
Mutual labels:  text-classification
opentc
OpenTC is a text classification engine using several algorithms in machine learning
Stars: ✭ 27 (+92.86%)
Mutual labels:  text-classification
Binary-Text-Classification-Doc2vec-SVM
A Python implementation of a binary text classifier using Doc2Vec and SVM
Stars: ✭ 16 (+14.29%)
Mutual labels:  text-classification
HiLAP
Code for paper "Hierarchical Text Classification with Reinforced Label Assignment" EMNLP 2019
Stars: ✭ 116 (+728.57%)
Mutual labels:  text-classification
text2class
Multi-class text categorization using state-of-the-art pre-trained contextualized language models, e.g. BERT
Stars: ✭ 15 (+7.14%)
Mutual labels:  text-classification
policy-data-analyzer
Building a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the world of policy analysis through an extensible framework that includes scraping, preprocessing, active learning and text analysis pipelines.
Stars: ✭ 22 (+57.14%)
Mutual labels:  text-classification
Lbl2Vec
Lbl2Vec learns jointly embedded label, document and word vectors to retrieve documents with predefined topics from an unlabeled document corpus.
Stars: ✭ 25 (+78.57%)
Mutual labels:  text-classification
WeSTClass
[CIKM 2018] Weakly-Supervised Neural Text Classification
Stars: ✭ 67 (+378.57%)
Mutual labels:  text-classification
ML2017FALL
Machine Learning (EE 5184) in NTU
Stars: ✭ 66 (+371.43%)
Mutual labels:  text-classification

cnn-text-classification-keras

Convolutional Neural Network for Text Classification in Keras

This is a Keras implementation of Yoon Kim's paper "Convolutional Neural Networks for Sentence Classification", extended so that the code also works with GloVe and fastText vectors.
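The central operation in Kim's architecture is a convolution over windows of consecutive word vectors, followed by max-over-time pooling. The sketch below illustrates that single operation in plain Python for one filter. It is not the code from this repository: the function name `conv_max_pool`, the choice of ReLU as the non-linearity, and the toy vectors are assumptions made for illustration.

```python
def conv_max_pool(sentence, filt, bias=0.0):
    """Apply one convolution filter over word windows, then max-over-time pooling.

    sentence: list of word vectors (each a list of floats)
    filt:     list of h weight vectors, one per position in the window
    Assumes the sentence has at least h words.
    """
    h = len(filt)
    features = []
    for start in range(len(sentence) - h + 1):
        window = sentence[start:start + h]
        # Dot product between the filter and the h word vectors in the window.
        total = bias
        for w_vec, x_vec in zip(filt, window):
            total += sum(w * x for w, x in zip(w_vec, x_vec))
        features.append(max(0.0, total))  # ReLU non-linearity (assumed here)
    # Max-over-time pooling keeps the strongest response of this filter.
    return max(features)


# Toy input: 4 words with 2-dimensional embeddings, window size 2.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
filt = [[1.0, 0.0], [0.0, 1.0]]
pooled = conv_max_pool(sentence, filt)
```

In the full model, many such filters with several window sizes are applied, and their pooled values are concatenated before the final classification layer.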

Requirements:

  • numpy
  • keras
  • cPickle (part of the Python 2 standard library; on Python 3 the equivalent module is pickle)

Usage:

  • Download the pre-trained Google word2vec word embedding vectors as a binary file from here

  • Pre-process the text data

from text_processing_util import TextProcessing

tp = TextProcessing(texts, labels, EMBEDDING_DIM, MAX_SEQUENCE_LENGTH, MAX_NB_WORDS, VALIDATION_SPLIT)

where

- texts: a list of sentences.
- labels: a list of labels, one for each sentence in texts.
- EMBEDDING_DIM: dimension of the word vectors (default is 300, matching the Google word2vec vectors).
- MAX_SEQUENCE_LENGTH: maximum sentence length to consider; longer sentences are truncated to this length (default is 50).
- MAX_NB_WORDS: maximum number of words to be used in the model (default is 10000).
- VALIDATION_SPLIT: fraction of the data to be used for validation (default is 0.2).
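To make the effect of these parameters concrete, here is a minimal pure-Python sketch of what a preprocessing step of this kind might do. It is not the `TextProcessing` implementation: the helper `encode`, the toy data, the small parameter values, and the choice to pad and truncate at the end of the sentence are all assumptions.

```python
import random
from collections import Counter

MAX_SEQUENCE_LENGTH = 5   # small values chosen for the illustration
MAX_NB_WORDS = 4
VALIDATION_SPLIT = 0.25

texts = ["the movie was great fun to watch", "the plot was dull",
         "great cast", "dull and slow movie"]
labels = [1, 0, 1, 0]

# Keep only the most frequent words; ids start at 1 so that 0 can mean padding.
counts = Counter(word for text in texts for word in text.split())
vocab = [w for w, _ in counts.most_common(MAX_NB_WORDS)]
word_index = {w: i + 1 for i, w in enumerate(vocab)}


def encode(text):
    """Map a sentence to a fixed-length list of word ids (hypothetical helper)."""
    ids = [word_index[w] for w in text.split() if w in word_index]
    ids = ids[:MAX_SEQUENCE_LENGTH]                      # truncate long sentences
    return ids + [0] * (MAX_SEQUENCE_LENGTH - len(ids))  # pad short ones


data = [encode(t) for t in texts]

# Shuffle, then hold out the last VALIDATION_SPLIT fraction for validation.
order = list(range(len(data)))
random.Random(0).shuffle(order)
n_val = int(VALIDATION_SPLIT * len(data))
train_idx, val_idx = order[:len(order) - n_val], order[len(order) - n_val:]
x_train = [data[i] for i in train_idx]
y_train = [labels[i] for i in train_idx]
x_val = [data[i] for i in val_idx]
y_val = [labels[i] for i in val_idx]
```

With four sentences and a split of 0.25, one sentence is held out for validation and three remain for training.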
  • Split into training and validation data.
x_train, y_train, x_val, y_val, word_index = tp.preprocess()
  • Build the embeddings index.
embeddings_index = tp.build_embedding_index_from_word2vec(path_to_wordvec_file, word_index)
  • Serialize the data after the processing.
import cPickle  # on Python 3, use `import pickle as cPickle` instead

with open('tokenization_and_embedding.p', 'wb') as f:
    cPickle.dump([word_index, embeddings_index], f)
  • Get labels index.
labels_index = tp.labels_index
  • Build the CNN model
from text_cnn import kimCNN

model = kimCNN(EMBEDDING_DIM, MAX_SEQUENCE_LENGTH, MAX_NB_WORDS, embeddings_index, word_index, labels_index=labels_index)
  • Fit the model
model.fit(x=x_train, y=y_train, batch_size=50, epochs=25, validation_data=(x_val, y_val))
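A model of this kind typically combines `word_index` and `embeddings_index` into an embedding matrix whose row i holds the vector for word id i. The sketch below shows one way that could be done in plain Python. Whether `kimCNN` does exactly this internally is an assumption, as are the helper name `build_embedding_matrix`, the toy dictionaries, and the random initialization for words missing from the pre-trained vectors.

```python
import random

EMBEDDING_DIM = 3  # toy size; the default described above is 300

# Hypothetical stand-ins for the objects produced in the steps above.
word_index = {"good": 1, "bad": 2, "zzz": 3}
embeddings_index = {"good": [0.1, 0.2, 0.3], "bad": [-0.1, -0.2, -0.3]}

rng = random.Random(0)


def build_embedding_matrix(word_index, embeddings_index, dim):
    """Return a list of rows, where row i is the vector for word id i."""
    # Row 0 is left as zeros for the padding id.
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        vector = embeddings_index.get(word)
        if vector is None:
            # Word missing from the pre-trained vectors: small random init.
            vector = [rng.uniform(-0.25, 0.25) for _ in range(dim)]
        matrix[idx] = list(vector)
    return matrix


embedding_matrix = build_embedding_matrix(word_index, embeddings_index, EMBEDDING_DIM)
```

A matrix like this can then be used as the initial weights of an embedding layer, either frozen or fine-tuned during training.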

For a detailed example, see example.py. It uses the same example as Kim's paper and the original Theano code.
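The serialized tokenization and embedding data from the usage steps can be loaded again in a later session, for example to encode new text with the same word ids. The round trip below is a minimal sketch using Python 3's `pickle` module; the toy dictionaries and the temporary directory are assumptions for illustration, and only the file name is taken from the steps above.

```python
import os
import pickle
import tempfile

# Toy stand-ins for the real word_index and embeddings_index.
word_index = {"good": 1, "bad": 2}
embeddings_index = {"good": [0.1, 0.2], "bad": [0.3, 0.4]}

# Write into a temporary directory so the sketch leaves the working directory untouched.
path = os.path.join(tempfile.mkdtemp(), "tokenization_and_embedding.p")

with open(path, "wb") as f:
    pickle.dump([word_index, embeddings_index], f)

# Later: restore both objects in the order they were saved.
with open(path, "rb") as f:
    loaded_word_index, loaded_embeddings_index = pickle.load(f)
```

Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.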

References:
