koomri / Text Segmentation

Implementation of the paper: Text Segmentation as a Supervised Learning Task

Text Segmentation as a Supervised Learning Task

This repository contains the code and supplementary materials required to train and evaluate the model described in the paper Text Segmentation as a Supervised Learning Task.

Download required resources

wiki-727K, wiki-50 datasets:

https://www.dropbox.com/sh/k3jh0fjbyr0gw0a/AADzAd9SDTrBnvs1qLCJY5cza?dl=0

word2vec:

https://drive.google.com/a/audioburst.com/uc?export=download&confirm=zrin&id=0B7XkCwpI5KDYNlNUTTlSS21pQmM

Fill in the relevant paths in configgenerator.py, then execute the script. The git repository already includes the Choi dataset, so only the downloaded resources need to be located.
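Before editing configgenerator.py, it can help to confirm that the downloaded resources are where you expect them. The following is a minimal sketch of such a check. The function name, the resource labels, and the example paths are all placeholders invented for illustration; they are not keys or values read by configgenerator.py.

```python
import os


def missing_resources(paths):
    """Return the labels (sorted) whose path does not exist on disk.

    `paths` maps a free-form label to a filesystem path. The labels are
    only used for reporting; they are hypothetical, not config keys.
    """
    return sorted(label for label, path in paths.items()
                  if not os.path.exists(path))


if __name__ == "__main__":
    # Placeholder locations: replace with wherever you unpacked the downloads.
    resources = {
        "word2vec": "/path/to/word2vec",
        "wiki-727K": "/path/to/wiki_727K",
        "wiki-50": "/path/to/wiki_50",
    }
    for label in missing_resources(resources):
        print("missing: %s -> %s" % (label, resources[label]))
```

Any label reported as missing points at a path that should be corrected before it is copied into configgenerator.py.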

Creating an environment:

conda create -n textseg python=2.7 numpy scipy gensim ipython 
source activate textseg
pip install http://download.pytorch.org/whl/cu80/torch-0.3.0-cp27-cp27mu-linux_x86_64.whl 
pip install tqdm pathlib2 segeval tensorboard_logger flask flask_wtf nltk
pip install pandas xlrd xlsxwriter termcolor

How to run the training process?

python run.py --help

Example:

python run.py --cuda --model max_sentence_embedding --wiki 

How to evaluate a trained model (on the wiki-727K or Choi dataset)?

python test_accuracy.py --help

Example:

python test_accuracy.py --cuda --model <path_to_model> --wiki
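Text segmentation output is commonly scored with the Pk metric (Beeferman et al.): slide a window of width k over the document and count how often the reference and the prediction disagree on whether the two window ends fall in the same segment. The sketch below is a plain-Python illustration of that metric, not the code used by test_accuracy.py. The function names and the input format (a list of segment lengths, in sentences) are assumptions made for the example.

```python
def segment_ids(masses):
    """Expand segment lengths, e.g. [3, 2], into per-sentence ids [0, 0, 0, 1, 1]."""
    ids = []
    for seg_index, length in enumerate(masses):
        ids.extend([seg_index] * length)
    return ids


def pk(reference, hypothesis, k=None):
    """Pk error between two segmentations given as lists of segment lengths.

    Lower is better; 0.0 means every probed window agrees with the reference.
    """
    ref = segment_ids(reference)
    hyp = segment_ids(hypothesis)
    if len(ref) != len(hyp):
        raise ValueError("segmentations must cover the same number of sentences")
    n = len(ref)
    if k is None:
        # Conventional choice: half the average reference segment length.
        k = max(1, int(round(n / (2.0 * len(reference)))))
    if k >= n:
        raise ValueError("window size k must be smaller than the document")
    errors = 0
    for i in range(n - k):
        same_in_ref = ref[i] == ref[i + k]
        same_in_hyp = hyp[i] == hyp[i + k]
        if same_in_ref != same_in_hyp:
            errors += 1
    return errors / float(n - k)
```

For example, pk([3, 3], [3, 3], k=2) is 0.0, while a prediction that misses the only boundary, pk([3, 3], [6], k=2), gives 0.5 because two of the four probed windows straddle the missed boundary.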

How to create a new Wikipedia dataset:

python wiki_processor.py --input <input> --temp <temp_files_folder> --output <output_folder> --train <ratio> --test <ratio>

Here, input is the full path to the Wikipedia dump, temp is the path to a folder for temporary files, and output is the path to the folder where the newly generated Wikipedia dataset is written.
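The --train and --test options each take a ratio. As a rough illustration of how two such ratios can partition a list of articles, here is a minimal sketch. It is not taken from wiki_processor.py: the function name is hypothetical, and treating the leftover articles as a third held-out part (for example a development set) is an assumption, since the README does not say what happens to them.

```python
def split_by_ratio(items, train_ratio, test_ratio):
    """Split `items` into (train, test, rest) by the given ratios.

    Assumption: whatever is not covered by the two ratios is returned
    as `rest`, which could serve as a development set.
    """
    if train_ratio < 0 or test_ratio < 0 or train_ratio + test_ratio > 1:
        raise ValueError("ratios must be non-negative and sum to at most 1")
    n = len(items)
    n_train = int(n * train_ratio)
    n_test = int(n * test_ratio)
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    rest = items[n_train + n_test:]
    return train, test, rest
```

With ten articles and ratios 0.8 and 0.1, this yields eight training articles, one test article, and one left over.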

The Wikipedia dump can be downloaded from the following URL:

https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
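The dump is a large bz2-compressed XML file, so it can be useful to check that the download is intact before handing it to wiki_processor.py. The sketch below reads only the first few lines through Python's standard bz2 module without decompressing the whole file. The function name and the local file name are placeholders, and this check is not part of the repository's scripts.

```python
import bz2
import itertools


def head_of_dump(path, max_lines=5):
    """Return the first `max_lines` lines of a bz2-compressed text file."""
    with bz2.BZ2File(path) as compressed:
        # BZ2File yields raw bytes line by line; decode each line as UTF-8.
        return [line.decode("utf-8").rstrip("\n")
                for line in itertools.islice(compressed, max_lines)]


if __name__ == "__main__":
    # Placeholder path: point this at the downloaded dump.
    for line in head_of_dump("enwiki-latest-pages-articles.xml.bz2"):
        print(line)
```

A truncated or corrupted download typically raises an error on read instead of returning XML lines.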
