jiesutd / Richwordsegmentor

Neural word segmentation with rich pretraining, code for ACL 2017 paper

Projects that are alternatives to or similar to Richwordsegmentor

Deep Learning Based Ecg Annotator
Annotation of ECG signals using deep learning, TensorFlow/Keras
Stars: ✭ 110 (-21.43%)
Mutual labels:  lstm, segmentation
Bcdu Net
BCDU-Net : Medical Image Segmentation
Stars: ✭ 314 (+124.29%)
Mutual labels:  lstm, segmentation
Canet
The code for paper "CANet: Class-Agnostic Segmentation Networks with Iterative Refinement and Attentive Few-Shot Learning"
Stars: ✭ 135 (-3.57%)
Mutual labels:  segmentation
Ethnicolr
Predict Race and Ethnicity Based on the Sequence of Characters in a Name
Stars: ✭ 137 (-2.14%)
Mutual labels:  lstm
Vpilot
Scripts and tools to easily communicate with DeepGTAV. In the future a self-driving agent will be implemented.
Stars: ✭ 136 (-2.86%)
Mutual labels:  lstm
Easyocr
Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
Stars: ✭ 13,379 (+9456.43%)
Mutual labels:  lstm
Dmm net
Differentiable Mask-Matching Network for Video Object Segmentation (ICCV 2019)
Stars: ✭ 138 (-1.43%)
Mutual labels:  segmentation
Dilation Tensorflow
A native Tensorflow implementation of semantic segmentation according to Multi-Scale Context Aggregation by Dilated Convolutions (2016). Optionally uses the pretrained weights by the authors.
Stars: ✭ 134 (-4.29%)
Mutual labels:  segmentation
Deep Learning For Tracking And Detection
Collection of papers, datasets, code and other resources for object tracking and detection using deep learning
Stars: ✭ 1,920 (+1271.43%)
Mutual labels:  segmentation
Ncrfpp
NCRF++, a Neural Sequence Labeling Toolkit. Easy use to any sequence labeling tasks (e.g. NER, POS, Segmentation). It includes character LSTM/CNN, word LSTM/CNN and softmax/CRF components.
Stars: ✭ 1,767 (+1162.14%)
Mutual labels:  lstm
Document Classifier Lstm
A bidirectional LSTM with attention for multiclass/multilabel text classification.
Stars: ✭ 136 (-2.86%)
Mutual labels:  lstm
Deeplearningfornlpinpytorch
An IPython Notebook tutorial on deep learning for natural language processing, including structure prediction.
Stars: ✭ 1,744 (+1145.71%)
Mutual labels:  lstm
Kiu Net Pytorch
Official Pytorch Code of KiU-Net for Image Segmentation - MICCAI 2020 (Oral)
Stars: ✭ 134 (-4.29%)
Mutual labels:  segmentation
Question Pairs Matching
12th-place solution for the 3rd Mojing Cup competition on question similarity for intelligent customer service
Stars: ✭ 138 (-1.43%)
Mutual labels:  lstm
Handwriting Synthesis
Implementation of "Generating Sequences With Recurrent Neural Networks" https://arxiv.org/abs/1308.0850
Stars: ✭ 135 (-3.57%)
Mutual labels:  lstm
Lung Segmentation 2d
Lung fields segmentation on CXR images using convolutional neural networks.
Stars: ✭ 138 (-1.43%)
Mutual labels:  segmentation
Lstm Crf
A (CNN+)RNN(LSTM/BiLSTM)+CRF model for sequence labelling.😏
Stars: ✭ 134 (-4.29%)
Mutual labels:  lstm
Lstm Crypto Price Prediction
Predicting price trends in cryptomarkets using an lstm-RNN for the use of a trading bot
Stars: ✭ 136 (-2.86%)
Mutual labels:  lstm
Morfessor
Morfessor is a tool for unsupervised and semi-supervised morphological segmentation
Stars: ✭ 137 (-2.14%)
Mutual labels:  segmentation
Actionrecognition
Explore Action Recognition
Stars: ✭ 139 (-0.71%)
Mutual labels:  lstm

RichWordSegmentor

RichWordSegmentor is a package for word segmentation using transition-based neural networks, built on the LibN3L package. It is a state-of-the-art neural word segmenter that supports rich pretraining from external data. With the help of rich pretraining, our model achieves the best result on 5 out of 6 Chinese word segmentation benchmarks. Performance details and the model structure can be found in our ACL paper: Neural Word Segmentation with Rich Pretraining.

Demo system:

  • Download the LibN3L library and configure your system. Please refer to Here
  • Open CMakeLists.txt and change "../LibN3L/" to the directory of your LibN3L package.
  • Run the demo script: sh demo.sh (the demo script does not load pretrained char/bichar embeddings; a sketch of the full setup follows below.)
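
A minimal sketch of these three steps, assuming LibN3L is cloned from https://github.com/SUTDNLP/LibN3L into a sibling directory (the clone location and the sed command are illustrative, not prescribed):

    # Step 1: fetch LibN3L and configure it per its own instructions.
    git clone https://github.com/SUTDNLP/LibN3L.git ../LibN3L
    # Step 2: point CMakeLists.txt at your LibN3L directory if it lives elsewhere,
    # e.g. replace the default "../LibN3L/" with your own path:
    sed -i 's|\.\./LibN3L/|/path/to/LibN3L/|' CMakeLists.txt
    # Step 3: run the demo (no pretrained char/bichar embeddings are loaded).
    sh demo.sh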

The demo system includes Chinese word segmentation sample data ("train.debug", "dev.debug" and "test.debug"), a Chinese word embedding sample file ("ctb.50d.word.debug"), Chinese character and character-bigram pretrained embedding sample files ("char.emb" and "bichar.emb"), and a parameter setting file ("option.STD"). All of these files are located in the folder RichWordSegmentor/example.

Run:

cmake .
make

Training model:
./STDSeg -l -train ${train.data} -dev ${dev.data} -test ${test.data} -option ${option.file} -model ${save_model_to_file} -word ${pretrain_word_emb, optional} -char ${pretrain_char_emb, optional} -bichar ${pretrain_bichar_emb, optional} -numlayer ${pretrain_parameters, optional}
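
For example, a training run on the bundled sample data might look like this, assuming you run from the repository root (the output model name demo.model is an arbitrary choice; every input file ships in RichWordSegmentor/example):

    ./STDSeg -l -train example/train.debug -dev example/dev.debug -test example/test.debug \
        -option example/option.STD -model demo.model \
        -word example/ctb.50d.word.debug -char example/char.emb -bichar example/bichar.emb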

Load model:
./STDSeg -test ${test.data} -model ${load_model_file} -output ${output_file}
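
For example, assuming a model saved as demo.model and a raw input file raw.txt (both names hypothetical; the input format is described under Input below):

    ./STDSeg -test raw.txt -model demo.model -output raw.seg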

Input:

  1. To evaluate model performance, words are separated by a space, one sentence per line. For example:

    就 做 了 一点 微小 的 工作 , 谢谢 大家 。
    一个人 的 命运 啊 , 当然 要 靠 自我 奋斗 , 但是 也要 考虑 到 历史 的 行程 。

    P/R/F scores will be calculated automatically for such input.

  2. For raw text decoding, one sentence per line, without spaces (see the snippet after these examples):

    就做了一点微小的工作,谢谢大家。
    一个人的命运啊,当然要靠自我奋斗,但是也要考虑到历史的行程。
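
Such a raw input file can be produced trivially from the shell (raw.txt is a hypothetical name):

    printf '就做了一点微小的工作,谢谢大家。\n一个人的命运啊,当然要靠自我奋斗,但是也要考虑到历史的行程。\n' > raw.txt
    # raw.txt can now be passed to ./STDSeg via the -test flag.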

Output:

The same format as the training data: words separated by a space, one sentence per line.

就 做 了 一点 微小 的 工作 , 谢谢 大家 。
一个人 的 命运 啊 , 当然 要 靠 自我 奋斗 , 但是 也要 考虑 到 历史 的 行程 。

Trained models, embeddings, and parameters for rich pretraining and the baseline:

We share our trained models at BaiduPan (https://pan.baidu.com/s/1pLO6T9D) so that visitors can reproduce our results.

  1. File ctb.bilstm.joint4.model: the model trained on the CTB6.0 corpus using multitask pretraining. You can simply load this file to decode raw text without training. Run:

    ./STDSeg -test ${input_raw_text} -model ctb.bilstm.joint4.model -output ${output_segmented_text}

  2. Files joint4.all.b10c1.2h.iter17.mchar, .mbichar, and .pmodel are the pretrained character embeddings, character bigram embeddings, and representation parameters, respectively. If you want to train your own model, you can load these three files following the instructions above (see the sketch after this list).

  3. Files gigaword_chn.all.a2b.uni.ite50.vec, gigaword_chn.all.a2b.bi.ite50.vec, and ctb.50d.vec are the char, bichar, and word embeddings of our baseline, respectively.

  4. If you want to run the rich pretraining experiments yourself (to generate the pretrained files above), please refer to TrainEmbMultiTask.
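
A sketch of reusing these pretrained files for training, assuming the .mchar and .mbichar files feed the -char and -bichar flags, and the .pmodel file is the pretrained-parameter argument of -numlayer in the training command above (this flag mapping is our reading of that command, not separately documented):

    ./STDSeg -l -train ${train.data} -dev ${dev.data} -test ${test.data} \
        -option example/option.STD -model my.model \
        -char joint4.all.b10c1.2h.iter17.mchar \
        -bichar joint4.all.b10c1.2h.iter17.mbichar \
        -numlayer joint4.all.b10c1.2h.iter17.pmodel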

Monitoring information

While this segmentation system is running, it may print log information like the following:

Iter 13 finished. Total time taken is: 1260.37s
dev:
Recall: P=57508/59929=0.959602, Accuracy: P=57508/59723=0.962912, Fmeasure: 0.961254
Decode dev finished. Total time taken is: 96.299s
test:
Recall: P=77895/81579=0.954841, Accuracy: P=77895/81159=0.959783, Fmeasure: 0.957306
Decode test finished. Total time taken is: 128.9s
Exceeds best previous performance of 0.960922. Saving model file..

The first "Recall..." line shows the performance of the dev set and the second "Recall..." line shows you the performance of the test set.

Note:

  • The current version is only compatible with LibN3L versions after Dec. 10th, 2015, which contain the model saving and loading module.
  • The example files are only for verifying that the code runs. For copyright reasons, we include only a few hundred sentences as examples. Hence, results on these example datasets do not represent the real performance on large datasets.

Cite:

@InProceedings{yang-zhang-dong:2017:Long,
  author    = {Yang, Jie  and  Zhang, Yue  and  Dong, Fei},
  title     = {Neural Word Segmentation with Rich Pretraining},
  booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  month     = {July},
  year      = {2017},
  address   = {Vancouver, Canada},
  publisher = {Association for Computational Linguistics},
  pages     = {839--849},
  url       = {http://aclweb.org/anthology/P17-1078}
}

Update

  • 2017-April-4: initial version