hecongqing / 2018 Daguan Competition

2018 "Daguan Cup" Text Intelligent Processing Challenge: Long-Text Classification (rank 4)

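Many thanks to the Daguan Cup organizers for this opportunity, and to the Kesci platform for providing excellent GPUs.

Competition page:

"Daguan Cup" Text Intelligent Processing Challenge

Task:

DataGrand (达观数据) provides a set of long-text documents with class labels. The goal is to use current NLP and AI techniques to analyze the internal structure and semantics of the text and build a model that classifies it accurately.

Dataset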

https://pan.baidu.com/s/13IMDPMz0rf8kM1JAea53uQ

password: y6m4

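Solution:

Because part of the code is still in use for now, only a single model is released. As a single model it scores up to 0.798 on the B leaderboard.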

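For this text classification task, one small trick is enough to reach a high score even when the model itself is not particularly strong: enhance the word embeddings. Word2vec and GloVe capture different information, and using both yields a more robust word representation. You can also try the word2vec + GloVe + fastText combination. It did not work well for me, possibly because fastText is very similar to word2vec and so dilutes the GloVe representation. I have not tried GloVe vectors on their own either, so that is also worth a try.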
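The text above does not say how the two sets of vectors are merged. The sketch below assumes the simplest option, concatenating the word2vec and GloVe vectors of each token into one row of the embedding matrix. The function names, the `word_index` mapping, and the zero-fill rule for missing tokens are illustrative assumptions, not the repository's code.

```python
def load_text_vectors(path):
    """Read vectors from a plain-text file: a token followed by its
    space-separated components on each line (the layout GloVe writes)."""
    vectors = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.rstrip().split(" ")
            if len(parts) < 3:
                # Skips blank lines and a word2vec-style "count dim" header.
                continue
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def combine_embeddings(word_index, w2v, glove, w2v_dim, glove_dim):
    """Return one row per token id: word2vec vector followed by GloVe vector.

    Assumption: a token missing from one source gets zeros for that half,
    so the other source still contributes. Row 0 stays all zeros and can
    serve as the padding row.
    """
    n_rows = max(word_index.values(), default=0) + 1
    matrix = [[0.0] * (w2v_dim + glove_dim) for _ in range(n_rows)]
    for token, idx in word_index.items():
        left = w2v.get(token, [0.0] * w2v_dim)
        right = glove.get(token, [0.0] * glove_dim)
        matrix[idx] = list(left) + list(right)
    return matrix
```

The resulting matrix could be passed as the initial weights of an embedding layer. Whether concatenation, averaging, or another scheme works best here is something to check empirically.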

As for models, I have open-sourced a two-layer biGruModel and, more recently, rnnCapsuleModel. I hope they help you get better results!

Environment

tensorflow-gpu>=1.10.0
keras==2.16.0
gensim==3.6.0
scikit-learn==0.20.2

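Running the model:

1. Put the raw dataset in the data folder.

2. Run python read_data.py to convert the raw CSV data to Feather format (Feather files load faster).

3. GloVe is used to train both the word vectors and the character vectors. Because it does not come with a Python interface, we use Stanford's open-source C implementation of GloVe.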

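Generate the word vectors

(1) python glove_word.py (prepares the word-level input in the format GloVe requires)

(2) make && sh glove_word.sh (trains the word vectors)

(3) Put the generated word vectors (glove_vectors_word.txt) in the embedding folder

python glove_word.py
make && sh glove_word.sh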

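Generate the character vectors

(1) python glove_char.py (prepares the character-level input in the format GloVe requires)

(2) make && sh glove_char.sh (trains the character vectors)

(3) Put the generated character vectors (glove_vectors_char.txt) in the embedding folder

python glove_char.py
make && sh glove_char.sh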
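As an illustration of the preparation step, the Stanford GloVe tools read a plain-text corpus of whitespace-separated tokens, with one document per line. The sketch below writes such a file from one CSV column. It is not the contents of glove_word.py or glove_char.py: the function name and the file paths are hypothetical, and `word_seg` is taken from the `--column_name` flag used in the training commands.

```python
import csv

# Long documents can exceed the csv module's default field size limit.
csv.field_size_limit(2**31 - 1)


def write_glove_corpus(csv_path, out_path, column):
    """Write one document per line with tokens separated by single spaces.

    Rows whose column is empty are skipped. Returns the number of
    documents written.
    """
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            tokens = row[column].split()
            if tokens:
                dst.write(" ".join(tokens) + "\n")
                count += 1
    return count
```

A call such as `write_glove_corpus("data/train_set.csv", "glove_word_corpus.txt", "word_seg")` would produce the word-level corpus (both paths are placeholders). For the character-level corpus, pass the column that holds the character sequence instead.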

4. Run a model:

biGruModel:

CUDA_VISIBLE_DEVICES=0 python main_glove_word2vec.py  --gpu="0" --column_name="word_seg" --word_seq_len=1800 --embedding_vector=200 --num_words=500000 --model_name="bi_gru_model" --batch_size=128 --KFold=10 --classification=19

rnnCapsuleModel:

CUDA_VISIBLE_DEVICES=0 python main.py  --gpu="0" --column_name="word_seg" --word_seq_len=1800 --embedding_vector=200 --num_words=500000 --model_name="Gru_Capsule_Model" --batch_size=128 --KFold=10 --classification=19

Note: if your GPU has limited memory, use a smaller batch_size.
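The flags in the commands above could be parsed with a parser along these lines. This is a sketch built from the flag names and values shown, not the repository's actual argument handling. The help strings are my reading of what each flag means and should be treated as assumptions.

```python
import argparse


def build_parser():
    """Parser accepting the flags used in the training commands."""
    parser = argparse.ArgumentParser(description="Train a long-text classifier")
    parser.add_argument("--gpu", default="0", help="GPU id to use")
    parser.add_argument("--column_name", default="word_seg",
                        help="input column holding the token sequence")
    parser.add_argument("--word_seq_len", type=int, default=1800,
                        help="assumed: tokens kept per document")
    parser.add_argument("--embedding_vector", type=int, default=200,
                        help="assumed: embedding dimension")
    parser.add_argument("--num_words", type=int, default=500000,
                        help="assumed: vocabulary size cap")
    parser.add_argument("--model_name", default="bi_gru_model")
    parser.add_argument("--batch_size", type=int, default=128)
    parser.add_argument("--KFold", type=int, default=10,
                        help="number of cross-validation folds")
    parser.add_argument("--classification", type=int, default=19,
                        help="assumed: number of target classes")
    return parser
```

For example, `build_parser().parse_args(["--batch_size=64"])` keeps every other value at the defaults above while halving the batch size for a smaller GPU.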

All of the commands above are wrapped in run.sh, so a single command runs everything:

sh run.sh

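That is the overview for now. More details will follow when time allows.

A small plug: I recently started a WeChat official account and hope to learn and grow together with everyone.

About us

AI算法之心 is a platform covering Python, PySpark, machine learning, natural language processing, deep learning, and algorithm competitions. Whether you are a beginner or a seasoned algorithm expert, you are welcome to follow the account and learn with us.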