supercoderhawk / Dnn_cws
Chinese word segmentation implemented with deep learning
Stars: ✭ 58
Programming Languages
python
139335 projects - #7 most used programming language
Projects that are alternatives of or similar to Dnn cws
Nlpcc Wordseg Weibo
Project for the NLPCC 2016 Weibo word segmentation shared task
Stars: ✭ 120 (+106.9%)
Mutual labels: chinese-word-segmentation
Nlp4han
A Chinese NLP toolkit: sentence splitting / word segmentation / POS tagging / chunking / syntactic parsing / semantic analysis / NER / n-gram language models / HMM / pronoun resolution / sentiment analysis / spell checking
Stars: ✭ 206 (+255.17%)
Mutual labels: chinese-word-segmentation
Friso
A high-performance Chinese tokenizer based on the MMSEG algorithm, written in ANSI C, with support for both the GBK and UTF-8 charsets. Its fully modular implementation can be easily embedded in other programs such as MySQL, PostgreSQL, and PHP.
Stars: ✭ 313 (+439.66%)
Mutual labels: chinese-word-segmentation
NLPIR-ICTCLAS
The Java Package of NLPIR-ICTCLAS.
Stars: ✭ 16 (-72.41%)
Mutual labels: chinese-word-segmentation
Chinesenlp
Datasets and SOTA results for every field of Chinese NLP
Stars: ✭ 1,206 (+1979.31%)
Mutual labels: chinese-word-segmentation
Jcseg
Jcseg is a lightweight NLP framework developed in Java. It provides CJK and English segmentation based on the MMSEG algorithm, along with keyword, key-sentence, and summary extraction based on the TEXTRANK algorithm. Jcseg has a built-in HTTP server and search modules for recent versions of Lucene, Solr, and Elasticsearch.
Stars: ✭ 754 (+1200%)
Mutual labels: chinese-word-segmentation
Monpa
MONPA is a multi-task model providing Traditional Chinese word segmentation, POS tagging, and named entity recognition
Stars: ✭ 203 (+250%)
Mutual labels: chinese-word-segmentation
nlpir-analysis-cn-ictclas
Lucene/Solr analyzer plugin. Supports macOS, Linux x86/64, and Windows x86/64. It is a Maven project, which lets you change the Lucene/Solr version for compatibility.
Stars: ✭ 71 (+22.41%)
Mutual labels: chinese-word-segmentation
G2pc
g2pC: A Context-aware Grapheme-to-Phoneme Conversion module for Chinese
Stars: ✭ 155 (+167.24%)
Mutual labels: chinese-word-segmentation
Pyhanlp
Chinese word segmentation, POS tagging, named entity recognition, dependency parsing, new word discovery, keyword and phrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional conversion, and other natural language processing tasks
Stars: ✭ 2,564 (+4320.69%)
Mutual labels: chinese-word-segmentation
Cross-Domain-CWS
Code for IJCAI 2018 paper "Neural Networks Incorporating Unlabeled and Partially-labeled Data for Cross-domain Chinese Word Segmentation"
Stars: ✭ 14 (-75.86%)
Mutual labels: chinese-word-segmentation
Symspell
SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
Stars: ✭ 1,976 (+3306.9%)
Mutual labels: chinese-word-segmentation
Greedycws
Source code for an ACL2017 paper on Chinese word segmentation
Stars: ✭ 88 (+51.72%)
Mutual labels: chinese-word-segmentation
Jieba Rs
The Jieba Chinese Word Segmentation Implemented in Rust
Stars: ✭ 219 (+277.59%)
Mutual labels: chinese-word-segmentation
Pkuseg Python
The pkuseg toolkit for multi-domain Chinese word segmentation
Stars: ✭ 5,692 (+9713.79%)
Mutual labels: chinese-word-segmentation
berserker
Berserker - BERt chineSE woRd toKenizER
Stars: ✭ 17 (-70.69%)
Mutual labels: chinese-word-segmentation
Chinese Word Segmentation Based on Deep Learning
Deep-learning-based Chinese word segmentation implemented with TensorFlow.
This project is written in Python 3; there are no plans to support Python 2.
Note: this project was created mainly for research on Chinese word segmentation and related natural language processing tasks. It is not yet recommended for production use, and it is still under active development.
Usage
Setup
- Install TensorFlow: pip install tensorflow
- Clone this project locally.
- Run init.py to generate the training data.
Getting Started
Create a file in the project folder, add the following code, and run it:
from seg_dnn import SegDNN
import constant
cws = SegDNN(constant.VOCAB_SIZE, 50, constant.DNN_SKIP_WINDOW)
print(cws.seg('我爱北京天安门')[0])
See test.py for a detailed example.
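Segmenters of this kind typically follow the standard character-tagging formulation: each character is labeled B/M/E/S (begin/middle/end of a multi-character word, or single-character word), and the label sequence is decoded into words. A minimal sketch of that decoding step (a generic illustration of the tagging scheme, not this repo's internal API):

```python
def decode_bmes(chars, tags):
    """Merge a BMES tag sequence over characters into a word list.

    B = begin, M = middle, E = end of a multi-char word; S = single-char word.
    """
    words, buf = [], ''
    for ch, tag in zip(chars, tags):
        buf += ch
        if tag in ('E', 'S'):  # a word boundary has been reached
            words.append(buf)
            buf = ''
    if buf:  # tolerate a sequence that ends mid-word
        words.append(buf)
    return words

print(decode_bmes('我爱北京天安门', list('SSBEBME')))
# → ['我', '爱', '北京', '天安门']
```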
Source File Descriptions
- seg_dnn.py: Chinese word segmentation with a (perceptron-style) feed-forward neural network, corresponding to paper 1
- seg_lstm.py: Chinese word segmentation with an LSTM network, corresponding to paper 2
- seg_mmtnn.py: Chinese word segmentation with an MMTNN network, corresponding to paper 3
- prepare_data.py: preprocesses the corpora, including MSR and PKU
- init.py: script that generates the training and test data
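The window-based (perceptron-style) model classifies each character from a fixed-size window of surrounding characters, padding at the sentence boundaries. A hedged sketch of that feature-extraction idea (the function name and padding token are illustrative and not taken from this repo):

```python
def context_windows(sentence, window=3, pad='<PAD>'):
    """For each character, return the characters in a centered window.

    window is the total window size (odd); sentence boundaries are padded.
    """
    half = window // 2
    padded = [pad] * half + list(sentence) + [pad] * half
    return [padded[i:i + window] for i in range(len(sentence))]

for w in context_windows('北京', window=3):
    print(w)
# prints ['<PAD>', '北', '京'] then ['北', '京', '<PAD>']
```

Each window would then be mapped to character embeddings and fed to the classifier that predicts the character's B/M/E/S tag.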
References:
- Deep Learning for Chinese Word Segmentation and POS Tagging (fully implemented; see seg_dnn.py)
- Long Short-Term Memory Neural Networks for Chinese Word Segmentation (basically implemented and being improved; see seg_lstm.py)
- Max-Margin Tensor Neural Network for Chinese Word Segmentation (implementation in progress; see seg_mmtnn.py)
Todo List
- [ ] pip packaging support
- [ ] More detailed code comments
- [ ] Part-of-speech tagging