koomri / Text Segmentation

Implementation of the paper: Text Segmentation as a Supervised Learning Task

Labels

deep-learning machine-learning nlp neural-network dataset

Text Segmentation as a Supervised Learning Task

This repository contains the code and supplementary materials required to train and evaluate a model as described in the paper Text Segmentation as a Supervised Learning Task.

Download required resources

wiki-727K, wiki-50 datasets:

https://www.dropbox.com/sh/k3jh0fjbyr0gw0a/AADzAd9SDTrBnvs1qLCJY5cza?dl=0

word2vec:

https://drive.google.com/a/audioburst.com/uc?export=download&confirm=zrin&id=0B7XkCwpI5KDYNlNUTTlSS21pQmM

Fill in the relevant paths in configgenerator.py, then run the script. (The Git repository already includes the Choi dataset.)
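Before running the script, it can help to confirm that the downloaded resources are where you think they are. The following is a minimal sketch, not part of the repository: the function name, the resource names, and the placeholder paths are all hypothetical and do not necessarily match the keys used in configgenerator.py.

```python
from __future__ import print_function

import os


def find_missing_paths(paths):
    """Return the names (sorted) whose path does not exist on disk.

    `paths` maps a descriptive name to a filesystem path. The names are
    illustrative only; they are not the keys configgenerator.py expects.
    """
    return sorted(name for name, path in paths.items()
                  if not os.path.exists(path))


if __name__ == "__main__":
    # Placeholder locations (assumptions): replace with your own paths.
    resources = {
        "word2vec": "/path/to/word2vec/file",
        "wiki-727K": "/path/to/wiki_727K",
        "wiki-50": "/path/to/wiki_50",
    }
    missing = find_missing_paths(resources)
    if missing:
        print("Missing resources:", ", ".join(missing))
    else:
        print("All resource paths exist.")
```

Once every path resolves, copy the same values into configgenerator.py.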

Creating an environment:

conda create -n textseg python=2.7 numpy scipy gensim ipython 
source activate textseg
pip install http://download.pytorch.org/whl/cu80/torch-0.3.0-cp27-cp27mu-linux_x86_64.whl 
pip install tqdm pathlib2 segeval tensorboard_logger flask flask_wtf nltk
pip install pandas xlrd xlsxwriter termcolor

How to run the training process?

python run.py --help

Example:

python run.py --cuda --model max_sentence_embedding --wiki

How to evaluate a trained model (on the wiki-727K/Choi dataset)?

python test_accuracy.py --help

Example:

python test_accuracy.py --cuda --model <path_to_model> --wiki
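This README does not say which metric test_accuracy.py reports. A metric commonly used for text segmentation is Pk, which slides a window of k sentences over the document and counts how often the reference and the predicted segmentation disagree on whether the two ends of the window fall in the same segment. The sketch below is a standalone illustration under that assumption; the function names and the segment-length representation are hypothetical, and this is not the repository's evaluation code.

```python
from __future__ import division


def segment_ids(segment_lengths):
    """Expand segment lengths, e.g. [2, 3], into per-sentence ids [0, 0, 1, 1, 1]."""
    ids = []
    for seg_index, length in enumerate(segment_lengths):
        ids.extend([seg_index] * length)
    return ids


def pk(reference, hypothesis, k=None):
    """Pk error between two segmentations given as lists of segment lengths.

    Lower is better; 0.0 means the two segmentations agree on every window.
    If k is not given, it defaults to roughly half the mean reference
    segment length, which is the usual convention.
    """
    ref = segment_ids(reference)
    hyp = segment_ids(hypothesis)
    if len(ref) != len(hyp):
        raise ValueError("segmentations must cover the same number of sentences")
    if k is None:
        k = max(1, int(round(len(ref) / (2.0 * len(reference)))))
    windows = len(ref) - k
    if windows <= 0:
        raise ValueError("document is too short for window size k")
    errors = 0
    for i in range(windows):
        same_in_ref = ref[i] == ref[i + k]
        same_in_hyp = hyp[i] == hyp[i + k]
        if same_in_ref != same_in_hyp:
            errors += 1
    return errors / windows


if __name__ == "__main__":
    # Reference: two segments of 3 sentences; prediction: one segment of 6.
    print(pk([3, 3], [6], k=2))
```

A prediction identical to the reference scores 0.0, and missing the only boundary in the toy example above is penalised in the windows that straddle it.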

How to create a new Wikipedia dataset:

python wiki_processor.py --input <input> --temp <temp_files_folder> --output <output_folder> --train <ratio> --test <ratio>

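Here, input is the full path to the Wikipedia dump, temp is the path to a folder for temporary files, and output is the path where the newly generated Wikipedia dataset is written. The train and test arguments set the ratios of the data assigned to the training and test splits.

The Wikipedia dump can be downloaded from the following URL: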
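To make the ratio arguments concrete, here is a minimal sketch of how two ratios can partition a collection of articles. It is an illustration only: the function name is hypothetical, and it assumes that whatever is left after the train and test shares becomes a development set, which this README does not confirm for wiki_processor.py.

```python
import random


def split_by_ratio(items, train_ratio, test_ratio, seed=0):
    """Shuffle items and split them into (train, dev, test) lists.

    Assumption for illustration: the remainder after the train and test
    shares is returned as a development set.
    """
    if train_ratio < 0 or test_ratio < 0 or train_ratio + test_ratio > 1:
        raise ValueError("ratios must be non-negative and sum to at most 1")
    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_ratio)
    n_test = int(len(shuffled) * test_ratio)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    dev = shuffled[n_train + n_test:]
    return train, dev, test


if __name__ == "__main__":
    train, dev, test = split_by_ratio(range(10), 0.8, 0.1)
    print(len(train), len(dev), len(test))
```

With 10 articles and ratios of 0.8 and 0.1, this yields 8 training articles, 1 test article, and 1 left over.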

https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
