Classify Kaggle Consumer Finance Complaints into 11 classes. Build the model with CNN (Convolutional Neural Network) and Word Embeddings on Tensorflow.

Stars: ✭ 410

Mutual labels: cnn, text-classification

Multi Class Text Classification Cnn Rnn

Classify Kaggle San Francisco Crime Description into 39 classes. Build the model with CNN, RNN (GRU and LSTM) and Word Embeddings on Tensorflow.

Stars: ✭ 570

Mutual labels: cnn, text-classification

Tensorflow cookbook

Code for Tensorflow Machine Learning Cookbook

Stars: ✭ 5,984

Mutual labels: classification, cnn

Lightnlp

A deep learning framework for natural language processing built on PyTorch and torchtext.

Stars: ✭ 739

Mutual labels: chinese, text-classification

Artificial Adversary

🗣️ Tool to generate adversarial text examples and test machine learning models against them

Stars: ✭ 348

Mutual labels: classification, text-classification

Recurrent Convolutional Neural Networks for Chinese Question Classification on BQuLD

A deep learning-based Chinese question classifier (Keras implementation) on BQuLD

Model Architecture Overview
Bilingual Question Labelling Dataset (BQuLD)
Embedding Preparation
Result

Model Architecture Overview

For more details, click here.

Bilingual Question Labelling Dataset (BQuLD)

This dataset is a bilingual (traditional Chinese & English) question labelling dataset designed for NLP researchers.
The question type definitions are borrowed from the Intelligent Agent Systems Lab.

The dataset originally consisted of 1,216 pairs of questions and question labels, first published by tim5go, the author of this GitHub repository.
There are 9 question types in total, namely:

NUMBER
PERSON
LOCATION
ORGANIZATION
ARTIFACT
TIME
PROCEDURE
AFFIRMATION
CAUSALITY
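The repository's own preprocessing code is not reproduced here. As a minimal sketch (the names QUESTION_TYPES, one_hot and decode are hypothetical, and the index order simply follows the list above), a 9-way softmax classifier needs each label mapped to a fixed integer index and a one-hot target vector:

```python
# Hypothetical label encoding for the nine BQuLD question types.
# The index order follows the list above; the original repository may use another order.
QUESTION_TYPES = [
    "NUMBER", "PERSON", "LOCATION", "ORGANIZATION", "ARTIFACT",
    "TIME", "PROCEDURE", "AFFIRMATION", "CAUSALITY",
]
LABEL_TO_INDEX = {label: i for i, label in enumerate(QUESTION_TYPES)}


def one_hot(label: str) -> list[int]:
    """Return a one-hot target vector of length 9 for a question-type label."""
    vector = [0] * len(QUESTION_TYPES)
    vector[LABEL_TO_INDEX[label]] = 1
    return vector


def decode(probabilities: list[float]) -> str:
    """Map a 9-way probability vector (e.g. a softmax output) back to its label."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return QUESTION_TYPES[best]


print(one_hot("TIME"))  # [0, 0, 0, 0, 0, 1, 0, 0, 0]
```

Fixing the order in one list keeps training targets and decoded predictions consistent.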

Embedding Preparation

In my experiment, I trained a word2vec model on the SogouCA full-web news corpus (全網新聞數據) from Sogou Labs.

For example, on Linux:

clean XML tags

$ cat news_tensite_xml.dat | iconv -f gbk -t utf-8 -c | grep "<content>" \
  | sed 's/<content>//' | sed 's/<\/content>//' > corpus.txt
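Outside a Unix shell, the cleaning step above can be approximated in Python. This is a hypothetical sketch (clean_lines and clean_corpus are illustrative names, not from the repository) that assumes, as the grep-based pipeline does, that each article's <content> element sits on a single line of the GBK-encoded dump. The file is streamed line by line rather than loaded into memory, and undecodable bytes are dropped, similar to iconv -c.

```python
from typing import Iterable, Iterator


def clean_lines(lines: Iterable[str]) -> Iterator[str]:
    """Mirror the grep/sed pipeline: keep lines containing <content>,
    then remove the first <content> and the first </content> tag on each."""
    for line in lines:
        if "<content>" in line:  # same filter as grep "<content>"
            body = line.replace("<content>", "", 1).replace("</content>", "", 1)
            yield body.rstrip("\n")


def clean_corpus(src_path: str, dst_path: str) -> int:
    """Stream a GBK-encoded dump into a UTF-8 corpus, one article per line.
    errors="ignore" silently drops undecodable bytes. Returns the line count."""
    count = 0
    with open(src_path, encoding="gbk", errors="ignore") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for body in clean_lines(src):
            dst.write(body + "\n")
            count += 1
    return count


# Example call, using the file names from the shell pipeline:
# clean_corpus("news_tensite_xml.dat", "corpus.txt")
```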

word segmentation using the LTP command-line tool

$ cws_cmdline --threads 4 --input corpus.txt --segmentor-model cws.model > corpus.seg.txt

simplified to traditional Chinese conversion using OpenCC

$ opencc -i corpus.seg.txt -o corpus_trad.txt -c s2t.json

word2vec training using Google's word2vec tool

$ nohup ./word2vec -train corpus_trad.txt -output sogou_vectors.bin -cbow 0 \
  -size 200 -window 10 -negative 5 -hs 0 -sample 1e-4 -threads 24 -binary 1 -iter 20 -min-count 1 &
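Because the command above uses -binary 1, sogou_vectors.bin is written in word2vec's binary format: a text header with the vocabulary size and vector dimension, then for each word its bytes, a space, and the vector as raw 32-bit floats followed by a newline. The sketch below is a stdlib-only reader plus a helper that turns those vectors into an embedding matrix for a classifier's vocabulary. The function names, the all-zero padding row and the uniform(-0.25, 0.25) initialization for missing words are assumptions rather than the author's code, and the reader assumes little-endian floats as produced on x86 machines. In practice, gensim's KeyedVectors.load_word2vec_format with binary=True does the same job.

```python
import io
import random
import struct
from typing import BinaryIO


def read_word2vec_bin(f: BinaryIO) -> dict[str, list[float]]:
    """Parse the binary output of Google's word2vec tool (-binary 1)."""
    vocab_size, dim = map(int, f.readline().split())
    # Assumption: vectors were written as little-endian float32.
    vec_struct = struct.Struct(f"<{dim}f")
    vectors: dict[str, list[float]] = {}
    for _ in range(vocab_size):
        word = bytearray()
        while True:
            ch = f.read(1)
            if ch == b" ":  # a space terminates the word
                break
            if ch == b"":
                raise EOFError("file ended in the middle of a word")
            if ch != b"\n":  # skip the newline left after the previous vector
                word.extend(ch)
        values = vec_struct.unpack(f.read(vec_struct.size))
        vectors[word.decode("utf-8", errors="ignore")] = list(values)
    return vectors


def build_embedding_matrix(vocab, vectors, dim, seed=0):
    """Row 0 is an all-zero padding row; row i + 1 holds the vector for vocab[i].
    Words missing from the word2vec model get small random vectors (hypothetical choice)."""
    rng = random.Random(seed)
    matrix = [[0.0] * dim]
    for word in vocab:
        if word in vectors:
            matrix.append(list(vectors[word]))
        else:
            matrix.append([rng.uniform(-0.25, 0.25) for _ in range(dim)])
    return matrix


# Tiny in-memory file in the same layout: header, then "word " + dim floats + "\n".
sample = (
    b"2 3\n"
    + "新聞 ".encode("utf-8") + struct.pack("<3f", 1.0, 2.0, 3.0) + b"\n"
    + "問題 ".encode("utf-8") + struct.pack("<3f", 0.5, 0.0, -1.0) + b"\n"
)
vectors = read_word2vec_bin(io.BytesIO(sample))
matrix = build_embedding_matrix(["問題", "未知詞"], vectors, dim=3)
print(vectors["新聞"])  # [1.0, 2.0, 3.0]
print(matrix[1])        # [0.5, 0.0, -1.0]
```

With the real file, dim would be 200 (from -size 200), and the resulting matrix could initialize a Keras Embedding layer, for example through keras.initializers.Constant.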

Result

Training Loss	Training Accuracy	Validation Loss	Validation Accuracy
0.7000	87.11%	0.8945	77.87%

Stars: ✭ 28

tim5go / Cnn Question Classification Keras
