
supercoderhawk / Dnn_cws

Chinese word segmentation using deep learning


Projects that are alternatives to or similar to Dnn_cws

Nlpcc Wordseg Weibo
The NLPCC 2016 Weibo word segmentation shared-task project
Stars: ✭ 120 (+106.9%)
Mutual labels:  chinese-word-segmentation
Nlp4han
Chinese natural language processing toolkit [sentence splitting / word segmentation / POS tagging / chunking / syntactic parsing / semantic analysis / NER / n-gram / HMM / pronoun resolution / sentiment analysis / spell checking]
Stars: ✭ 206 (+255.17%)
Mutual labels:  chinese-word-segmentation
Friso
High-performance Chinese tokenizer written in ANSI C, based on the MMSEG algorithm, with support for both GBK and UTF-8 charsets. Its fully modular implementation can be easily embedded in other programs, such as MySQL, PostgreSQL, and PHP.
Stars: ✭ 313 (+439.66%)
Mutual labels:  chinese-word-segmentation
Deeplearning nlp
A natural language processing library based on deep learning
Stars: ✭ 154 (+165.52%)
Mutual labels:  chinese-word-segmentation
Lac
Baidu NLP: word segmentation, POS tagging, named entity recognition, and word importance
Stars: ✭ 2,792 (+4713.79%)
Mutual labels:  chinese-word-segmentation
NLPIR-ICTCLAS
The Java Package of NLPIR-ICTCLAS.
Stars: ✭ 16 (-72.41%)
Mutual labels:  chinese-word-segmentation
Chinesenlp
Datasets and state-of-the-art results for every field of Chinese NLP
Stars: ✭ 1,206 (+1979.31%)
Mutual labels:  chinese-word-segmentation
Jcseg
Jcseg is a lightweight NLP framework developed in Java. It provides CJK and English segmentation based on the MMSEG algorithm, along with keyword extraction, key sentence extraction, and summary extraction based on the TextRank algorithm. Jcseg also has a built-in HTTP server and search modules for the latest Lucene, Solr, and Elasticsearch.
Stars: ✭ 754 (+1200%)
Mutual labels:  chinese-word-segmentation
Monpa
MONPA (罔拍) is a multi-task model providing Traditional Chinese word segmentation, POS tagging, and named entity recognition
Stars: ✭ 203 (+250%)
Mutual labels:  chinese-word-segmentation
nlpir-analysis-cn-ictclas
Lucene/Solr analyzer plugin. Supports macOS, Linux x86/64, and Windows x86/64. A Maven project, which lets you change the Lucene/Solr version for compatibility.
Stars: ✭ 71 (+22.41%)
Mutual labels:  chinese-word-segmentation
G2pc
g2pC: A Context-aware Grapheme-to-Phoneme Conversion module for Chinese
Stars: ✭ 155 (+167.24%)
Mutual labels:  chinese-word-segmentation
Pyhanlp
Chinese word segmentation, POS tagging, named entity recognition, dependency parsing, new word discovery, keyphrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional conversion, and other natural language processing
Stars: ✭ 2,564 (+4320.69%)
Mutual labels:  chinese-word-segmentation
Cross-Domain-CWS
Code for IJCAI 2018 paper "Neural Networks Incorporating Unlabeled and Partially-labeled Data for Cross-domain Chinese Word Segmentation"
Stars: ✭ 14 (-75.86%)
Mutual labels:  chinese-word-segmentation
Symspell
SymSpell: 1 million times faster spelling correction & fuzzy search through the Symmetric Delete spelling correction algorithm
Stars: ✭ 1,976 (+3306.9%)
Mutual labels:  chinese-word-segmentation
Symspellpy
Python port of SymSpell
Stars: ✭ 420 (+624.14%)
Mutual labels:  chinese-word-segmentation
Greedycws
Source code for an ACL2017 paper on Chinese word segmentation
Stars: ✭ 88 (+51.72%)
Mutual labels:  chinese-word-segmentation
Jieba Rs
The Jieba Chinese Word Segmentation Implemented in Rust
Stars: ✭ 219 (+277.59%)
Mutual labels:  chinese-word-segmentation
Deepnlp
A natural language processing library based on deep learning
Stars: ✭ 34 (-41.38%)
Mutual labels:  chinese-word-segmentation
Pkuseg Python
The pkuseg toolkit for multi-domain Chinese word segmentation
Stars: ✭ 5,692 (+9713.79%)
Mutual labels:  chinese-word-segmentation
berserker
Berserker - BERt chineSE woRd toKenizER
Stars: ✭ 17 (-70.69%)
Mutual labels:  chinese-word-segmentation

Chinese Word Segmentation Based on Deep Learning

Deep-learning-based Chinese word segmentation implemented with TensorFlow

This project is written in Python 3; there are no plans to support Python 2.

Note: this project was created mainly for research on Chinese word segmentation and related natural language processing tasks, and is not yet recommended for production use. It is also still under development.

Usage

Preparation

  1. Install TensorFlow:
pip install tensorflow
  2. Clone this repository locally.

  3. Run init.py to generate the training data.

Getting Started

Create a file in the project folder, add the following code to it, and run it:

from seg_dnn import SegDNN
import constant

cws = SegDNN(constant.VOCAB_SIZE, 50, constant.DNN_SKIP_WINDOW)
print(cws.seg('我爱北京天安门')[0])

A more detailed example can be found in test.py.
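Segmenters of this kind typically label each character with a B/M/E/S tag (begin/middle/end of a word, or single-character word) and then decode the tag sequence into words. The following is a minimal, hypothetical sketch of that decoding step (not code from this repository; the function name is illustrative):

```python
# Hypothetical sketch (not from this repo): decoding a BMES tag sequence
# into words, the final step of character-tagging-based segmentation.
def decode_bmes(chars, tags):
    """Combine characters into words according to B/M/E/S tags."""
    words, buf = [], ""
    for ch, tag in zip(chars, tags):
        buf += ch
        if tag in ("E", "S"):  # a word ends on E (end) or S (single)
            words.append(buf)
            buf = ""
    if buf:                    # flush a dangling B/M run, if any
        words.append(buf)
    return words

print(decode_bmes("我爱北京天安门", list("SSBEBME")))
# → ['我', '爱', '北京', '天安门']
```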

Overview of the Main Code Files

  • seg_dnn.py: Chinese word segmentation using a (perceptron-style) feed-forward neural network, corresponding to paper 1
  • seg_lstm.py: Chinese word segmentation using an LSTM network, corresponding to paper 2
  • seg_mmtnn.py: Chinese word segmentation using an MMTNN network, corresponding to paper 3
  • prepare_data.py: preprocesses the corpora, including MSR and PKU
  • init.py: script that generates the training and test data
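As a rough illustration of what the data-preparation step produces, here is a minimal, hypothetical sketch (not the actual prepare_data.py) that converts one whitespace-segmented corpus line into the per-character BMES labels typically used to train character-tagging segmenters:

```python
# Hypothetical sketch (not the actual prepare_data.py): turning a
# whitespace-segmented corpus line into per-character BMES labels.
def to_bmes(line):
    chars, tags = [], []
    for word in line.split():
        chars.extend(word)
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            # first char B, last char E, any middle chars M
            tags.extend("B" + "M" * (len(word) - 2) + "E")
    return chars, tags

print(to_bmes("我 爱 北京 天安门"))
# → (['我', '爱', '北', '京', '天', '安', '门'], ['S', 'S', 'B', 'E', 'B', 'M', 'E'])
```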

Reference papers:

Todo List

  • [ ] Support installation via pip
  • [ ] Add more detailed comments
  • [ ] Provide part-of-speech tagging