NLPchina / ansj_seg

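License: Apache-2.0
Ansj word segmentation: a true Java implementation of ICT. Both segmentation quality and speed exceed the open-source version of ICT. Chinese word segmentation, personal-name recognition, part-of-speech tagging, user-defined dictionaries.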

Programming Languages

java


Ansj Chinese Word Segmentation

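Usage Help
Summary

This is a Java implementation of Chinese word segmentation based on n-Gram + CRF + HMM.

Segmentation speed reaches roughly 2 million characters per second (tested on a MacBook Air), and accuracy exceeds 96%.

Currently implemented features include Chinese word segmentation, Chinese personal-name recognition, user-defined dictionaries, keyword extraction, automatic summarization, and keyword tagging.

It can be applied to natural language processing tasks and suits any project that demands high segmentation quality.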
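
To give a feel for the n-Gram part of the approach named in the summary, here is a minimal sketch of dictionary-based segmentation that picks the split with the highest total unigram log-probability using dynamic programming. It is a toy illustration, not ansj's implementation: the class name `ToyNGramSegmenter`, the tiny dictionary, and all frequencies are invented for the example, and the CRF and HMM components are not modelled at all.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Toy unigram segmenter: chooses the split with the highest total log-probability. */
final class ToyNGramSegmenter {
    // Hypothetical mini-dictionary with made-up frequencies (not ansj's data).
    private static final Map<String, Integer> FREQ = Map.of(
            "中文", 50, "分词", 40, "工具", 30,
            "中", 10, "文", 10, "分", 10, "词", 10);
    private static final int MAX_WORD_LEN = 4;
    private static final double TOTAL =
            FREQ.values().stream().mapToInt(Integer::intValue).sum();
    // Score given to a single character that is missing from the dictionary.
    private static final double UNKNOWN_LOG_PROB = Math.log(0.5 / TOTAL);

    static List<String> segment(String text) {
        int n = text.length();
        double[] best = new double[n + 1]; // best[i]: best score for text[0, i)
        int[] prev = new int[n + 1];       // prev[i]: start index of the last word ending at i
        for (int i = 1; i <= n; i++) {
            best[i] = Double.NEGATIVE_INFINITY;
            for (int j = Math.max(0, i - MAX_WORD_LEN); j < i; j++) {
                Integer freq = FREQ.get(text.substring(j, i));
                double score;
                if (freq != null) {
                    score = Math.log(freq / TOTAL);
                } else if (i - j == 1) {
                    score = UNKNOWN_LOG_PROB; // fall back to an unknown single character
                } else {
                    continue;                 // unknown multi-character span: not a candidate
                }
                if (best[j] + score > best[i]) {
                    best[i] = best[j] + score;
                    prev[i] = j;
                }
            }
        }
        // Walk the back-pointers from the end to recover the words.
        List<String> words = new ArrayList<>();
        for (int i = n; i > 0; i = prev[i]) {
            words.add(text.substring(prev[i], i));
        }
        Collections.reverse(words);
        return words;
    }

    public static void main(String[] args) {
        System.out.println(segment("中文分词工具")); // prints [中文, 分词, 工具]
    }
}
```

A real n-gram segmenter would also score adjacent word pairs (bigrams) and use a far larger dictionary; the dynamic-programming structure stays the same.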

maven
        
        <dependency>
            <groupId>org.ansj</groupId>
            <artifactId>ansj_seg</artifactId>
            <version>5.1.1</version>
        </dependency>
    
Demo

If you have just downloaded the library and only want to try it out, you can call this simple interface:


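 // Requires the ansj_seg dependency declared above
 import org.ansj.splitWord.analysis.ToAnalysis;

 String str = "欢迎使用ansj_seg,(ansj中文分词)在这里如果你遇到什么问题都可以联系我.我一定尽我所能.帮助大家.ansj_seg更快,更准,更自由!";
 System.out.println(ToAnalysis.parse(str));
 
 // Printed result (word/part-of-speech tag, separated by commas):
 欢迎/v,使用/v,ansj/en,_,seg/en,,,(,ansj/en,中文/nz,分词/n,),在/p,这里/r,如果/c,你/r,遇到/v,什么/r,问题/n,都/d,可以/v,联系/v,我/r,./m,我/r,一定/d,尽我所能/l,./m,帮助/v,大家/r,./m,ansj/en,_,seg/en,更快/d,,,更/d,准/a,,,更/d,自由/a,!
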
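
If all you have is the printed string (for example, in a log), the `word/tag` format shown above can be split back into pairs with plain Java. The sketch below is an assumption-laden helper, not part of ansj: the class name `TaggedOutputParser` is hypothetical, and it only handles the format visible in the sample, where untagged tokens such as `(` appear bare and a literal comma token shows up as `,,,`.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits a printed "word/tag,word/tag" line into {word, tag} pairs (tag may be empty). */
final class TaggedOutputParser {
    static List<String[]> parse(String line) {
        List<String[]> tokens = new ArrayList<>();
        String[] parts = line.split(",", -1); // keep empty pieces so ",,," can be detected
        for (int i = 0; i < parts.length; i++) {
            String part = parts[i];
            if (part.isEmpty()) {
                // Two consecutive empty pieces come from a bare "," token between separators.
                if (i + 1 < parts.length && parts[i + 1].isEmpty()) {
                    tokens.add(new String[] {",", ""});
                    i++; // skip the second empty piece
                }
                continue;
            }
            int slash = part.lastIndexOf('/');
            if (slash > 0) {
                tokens.add(new String[] {part.substring(0, slash), part.substring(slash + 1)});
            } else {
                tokens.add(new String[] {part, ""}); // untagged token such as "(" or "_"
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String[] token : parse("更快/d,,,更/d,准/a")) {
            System.out.println(token[0] + "\t" + token[1]);
        }
    }
}
```

Parsing printed text is inherently lossy; when the segmentation result object is available in code, reading the words and tags from it directly is the more robust choice.
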
Join Us

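I thought about this for a long time. Whether or not anyone ends up helping, I am writing it down here: if you are interested and enthusiastic, please contact me.

  • Improve the documentation; add usage examples and explanations
  • Add more rule-based Recognition implementations, for example ID-card number recognition. Still unfinished: time recognition, IP address recognition, email recognition, URL recognition, part-of-speech recognition, and so on...
  • Provide a better-optimized CRF model to replace ansj's default model.
  • Add test cases; many areas are not fully tested. If you are interested, come and help!
  • Refactor the personal-name recognition model. Add models for organization-name recognition and similar entities.
  • Add syntactic and grammatical parsing
  • Implement LSTM-based segmentation
  • Fill in gaps and omissions...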
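
For contributors wondering what a rule-based recognizer from the list above might look like, here is a minimal regex-based sketch for IPv4 addresses and email addresses. Everything in it is an assumption for illustration: the class name `RuleRecognitionSketch`, the `ip` and `email` labels, and the simplified patterns are invented here, and the code does not use or imitate ansj's own Recognition interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Finds spans matching simple rules and labels them in word/tag style. */
final class RuleRecognitionSketch {
    // Simplified patterns for illustration; production rules would need to be stricter.
    static final Pattern IPV4 = Pattern.compile(
            "(?<![\\d.])(?:(?:25[0-5]|2[0-4]\\d|1?\\d?\\d)\\.){3}"
            + "(?:25[0-5]|2[0-4]\\d|1?\\d?\\d)(?![\\d.])");
    static final Pattern EMAIL = Pattern.compile(
            "[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)+");

    /** Returns every match of the pattern in the text as "match/label". */
    static List<String> find(Pattern pattern, String label, String text) {
        List<String> hits = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            hits.add(matcher.group() + "/" + label);
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(find(IPV4, "ip", "服务器192.168.1.10已上线"));
        System.out.println(find(EMAIL, "email", "联系 dev@example.com 获取帮助"));
    }
}
```

In a segmenter, a rule like this would typically run after the initial split and merge the matched characters into a single labelled term.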