
egrcc / Cross-Domain-CWS

Licence: MIT license
Code for the IJCAI 2018 paper "Neural Networks Incorporating Unlabeled and Partially-labeled Data for Cross-domain Chinese Word Segmentation"



Cross-Domain-CWS

About

A TensorFlow implementation of the IJCAI 2018 paper "Neural Networks Incorporating Unlabeled and Partially-labeled Data for Cross-domain Chinese Word Segmentation".

Model Overview

Figure: Architecture of our proposed model. It has three main components: the forward language model (pink), the backward language model (yellow), and the BiLSTM segmentation model (blue). A gate mechanism controls how much the language models influence the segmentation model. The outputs of the language models are omitted for simplicity. In this example, we assume that “c1c2c3” is a word.
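
To make the idea of a gate more concrete, here is a minimal sketch of one common form of gating, written in plain Python so that it runs without TensorFlow. It is an illustration only and is not taken from this repository: the function `gated_combine`, the weight matrix `W_g`, the bias `b_g`, and the choice of a sigmoid gate over the concatenated hidden states are all assumptions, and the paper's exact formulation may differ.

```python
import math


def sigmoid(x):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gated_combine(h_seg, h_lm, W_g, b_g):
    """Scale a language-model hidden state by a learned gate (hypothetical form).

    h_seg : hidden state of the segmentation model at one character
    h_lm  : hidden state of a language model at the same character
    W_g   : gate weights, shape len(h_lm) x (len(h_seg) + len(h_lm))
    b_g   : gate bias, length len(h_lm)

    Returns (gate, gated_lm), where gate = sigmoid(W_g [h_seg; h_lm] + b_g)
    and gated_lm = gate * h_lm element-wise.
    """
    concat = h_seg + h_lm                      # [h_seg; h_lm]
    pre_activation = matvec(W_g, concat)
    gate = [sigmoid(p + b) for p, b in zip(pre_activation, b_g)]
    gated_lm = [g * h for g, h in zip(gate, h_lm)]
    return gate, gated_lm


# With all-zero gate parameters every gate value is 0.5,
# so half of the language-model signal passes through.
gate, gated_lm = gated_combine(
    h_seg=[1.0, -1.0],
    h_lm=[0.2, 0.4],
    W_g=[[0.0] * 4, [0.0] * 4],
    b_g=[0.0, 0.0],
)
```

A gate value near 0 suppresses the language-model signal at that position, while a value near 1 lets it through almost unchanged.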

Requirements

  • Python: 2.7
  • TensorFlow >= 1.4.1 (the experiments in our paper used 1.4.1)
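
If it helps to confirm the environment before running anything, the requirements above can be checked with a small script along these lines. This is a hypothetical helper, not part of the repository; `parse_version` and `meets_requirements` are names introduced here, and the check only encodes the two constraints listed above.

```python
import re

# Minimum TensorFlow version stated in the requirements list.
MIN_TF = (1, 4, 1)


def parse_version(text):
    """Turn a version string such as '1.4.1' into a tuple of three ints."""
    parts = [int(p) for p in re.findall(r"\d+", text)[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def meets_requirements(python_version, tf_version):
    """Return True if Python is 2.7 and TensorFlow is at least MIN_TF."""
    is_py27 = tuple(python_version[:2]) == (2, 7)
    return is_py27 and parse_version(tf_version) >= MIN_TF


# Typical use inside the target environment (requires TensorFlow installed):
#   import sys
#   import tensorflow as tf
#   print(meets_requirements(sys.version_info, tf.__version__))
```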

How to run

  1. Build the vocabulary:

    python utils_data.py

  2. Train a model:

    python train.py --model lstmlm --source ctb --target zx --pl True --memory 1.0

  3. Test a model:

    python test.py --model lstmlm --source ctb --target zx --pl True --memory 1.0

  4. Evaluate a model:

    python eval.py ctb zx lstmlm_ctb_True
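
The four steps above can be chained from a small driver script. The sketch below only reuses the script names and flags shown in the commands above; the helper names `build_commands` and `run_pipeline` are introduced here for illustration. It also assumes that the last argument to eval.py is composed as model, source, and the `--pl` value joined by underscores, which matches the example `lstmlm_ctb_True` but is not confirmed by the repository.

```python
from __future__ import print_function

import subprocess


def build_commands(model="lstmlm", source="ctb", target="zx", pl=True, memory=1.0):
    """Return the four command lines from the steps above as argument lists."""
    flags = [
        "--model", model,
        "--source", source,
        "--target", target,
        "--pl", str(pl),
        "--memory", str(memory),
    ]
    # Assumption: the run name passed to eval.py is <model>_<source>_<pl>.
    run_name = "{}_{}_{}".format(model, source, pl)
    return [
        ["python", "utils_data.py"],                      # 1. build the vocabulary
        ["python", "train.py"] + flags,                   # 2. train
        ["python", "test.py"] + flags,                    # 3. test
        ["python", "eval.py", source, target, run_name],  # 4. evaluate
    ]


def run_pipeline(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        print(" ".join(cmd))
        subprocess.check_call(cmd)
```

Calling `run_pipeline(build_commands())` from the repository root would run the same sequence as the manual steps, and `check_call` raises an error as soon as one step exits with a non-zero status.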

Citation

If you find the code helpful, please cite the following paper:

Lujun Zhao, Qi Zhang, Peng Wang and Xiaoyu Liu, Neural Networks Incorporating Unlabeled and Partially-labeled Data for Cross-domain Chinese Word Segmentation, In Proceedings of the 27th International Joint Conference on Artificial Intelligence and the 23rd European Conference on Artificial Intelligence (IJCAI-ECAI-18), July 9-19, 2018, Stockholm, Sweden.

@InProceedings{zhao2018cws,
  author    = {Zhao, Lujun and Zhang, Qi and Wang, Peng and Liu, Xiaoyu},
  title     = {Neural Networks Incorporating Unlabeled and Partially-labeled Data for Cross-domain Chinese Word Segmentation},
  booktitle = {Proceedings of the 27th International Joint Conference on Artificial Intelligence and the 23rd European Conference on Artificial Intelligence (IJCAI-ECAI-18)},
  year      = {2018},
  address   = {Stockholm, Sweden}
}