jcyk / Cws

Source code for an ACL 2016 paper on Chinese word segmentation

Programming Languages

python

CWS

This code implements the word segmentation algorithm proposed in the following paper.

Deng Cai and Hai Zhao, Neural Word Segmentation Learning for Chinese. ACL 2016.

Latest update! We improved the system; the corresponding paper was accepted to ACL 2017, with source code at this repo.

Update! A faster implementation using DyNet as the backend is now available. Run python train.py -d to use the new (DyNet-based) version.

Usage (Theano version; also helpful for the DyNet version):

- train

python train.py -t. To train a model, first check the hyperparameter settings in train.py. At the very beginning, the training procedure writes a config file that preserves your hyperparameter settings; it then saves the trained model parameters to a *.npz file after each epoch.

- test

python test.py params.npz input_file output_path config_file. To test a trained model, pass the file that stores the model parameters as params.npz and the corresponding configuration file as config_file. The test procedure reads data from input_file and writes the result to output_path.

- evaluate

For example, to see the best result (F1-score 95.5) on the PKU dataset reported in our paper, first generate the output file with the trained model (python test.py best_pku.npz ../data/pku_test somepath best_pku_config), then run the command ./score ../data/dic ../data/pku_test somepath.
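To make the F1-score in the evaluation step concrete, here is a minimal Python sketch of word-level precision, recall, and F1 for segmentation. It is not the ./score script from this repository and may differ from it in details; the function names to_spans and segmentation_prf are hypothetical, and the sketch assumes that both the gold file and the output file hold one sentence per line with words separated by whitespace.

```python
def to_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_prf(gold_sentences, pred_sentences):
    """Word-level precision, recall, and F1 over parallel segmented sentences.

    Assumption: each sentence is a string with words separated by whitespace.
    A predicted word counts as correct only if both of its boundaries match
    a gold word exactly.
    """
    n_gold = n_pred = n_correct = 0
    for gold_line, pred_line in zip(gold_sentences, pred_sentences):
        gold = to_spans(gold_line.split())
        pred = to_spans(pred_line.split())
        n_gold += len(gold)
        n_pred += len(pred)
        n_correct += len(gold & pred)
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: the prediction merges two gold words into one.
gold = ["我 喜欢 自然 语言 处理"]
pred = ["我 喜欢 自然语言 处理"]
p, r, f = segmentation_prf(gold, pred)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

In the toy example, 3 of the 4 predicted words match gold spans and the gold sentence has 5 words, so precision is 0.75 and recall is 0.6. To apply the sketch to real files, read the lines of the gold file and the output file into two lists and pass them in the same order.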

Dependencies:

Thanks to these excellent computing tools: DyNet, Theano, NumPy, Gensim.

Author:

Deng Cai. If you have any questions, feel free to contact me by email.

Citation:

If you find this code useful, please cite our paper.

@InProceedings{cai-zhao:2016:P16-1,
  author    = {Cai, Deng  and  Zhao, Hai},
  title     = {Neural Word Segmentation Learning for Chinese},
  booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  month     = {August},
  year      = {2016},
  address   = {Berlin, Germany},
  publisher = {Association for Computational Linguistics},
  pages     = {409--420},
  url       = {http://www.aclweb.org/anthology/P16-1039}
}