
howl-anderson / PaddleTokenizer

License: AGPL-3.0
A DNN-based Chinese tokenizer (word segmentation engine) built with PaddlePaddle



PaddleTokenizer

A Chinese word segmentation engine based on deep neural networks, implemented with PaddlePaddle.

Network architecture

BiLSTM+CRF
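
The README names only the architecture. In a typical BiLSTM+CRF segmenter, the network assigns one position tag to each character, and words are then rebuilt from the tag sequence. The following is a minimal sketch of that last step, assuming a BMES tag scheme (B = begin, M = middle, E = end, S = single-character word). The tag set and the helper name `tags_to_words` are illustrative assumptions and are not taken from this project's code.

```python
def tags_to_words(chars, tags):
    """Rebuild words from per-character BMES tags (assumed tag scheme)."""
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag in ("B", "S") and current:
            # A new word starts, so flush any unfinished word first.
            words.append(current)
            current = ""
        current += ch
        if tag in ("E", "S"):
            # The current word is complete.
            words.append(current)
            current = ""
    if current:
        # Keep a trailing word even if its closing tag is missing.
        words.append(current)
    return words


words = tags_to_words("在北京读书", ["S", "B", "E", "B", "E"])
print(words)
```

The helper tolerates ill-formed sequences (for example two B tags in a row) by closing the open word, which is one common choice; a stricter implementation might raise an error instead.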

Requirements

Python 3.5+

Installation

Install from PyPI:

pip install paddle_tokenizer

Or install locally from a checkout of the repository:

pip install -e ./

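Pretrained model

Download the model file from the project's releases, extract it, and place the test.inference.model directory in the root directory of this project.

Training a model locally

Download the data

The data comes from the People's Daily corpus processing toolkit (人民日报语料处理工具集). Place the CoNLL-format train.txt and test.txt files in the data directory.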
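
For readers unfamiliar with CoNLL-style files, here is a minimal reader sketch. It assumes the common layout of one token per line with whitespace-separated columns (token first, tag last) and a blank line between sentences; the README does not specify the exact columns, so treat the layout, the sample data, and the name `read_conll` as assumptions.

```python
import io


def read_conll(stream):
    """Yield each sentence as a list of (token, tag) pairs."""
    sentence = []
    for line in stream:
        line = line.strip()
        if not line:
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        columns = line.split()
        # Assumed layout: first column is the token, last column is the tag.
        sentence.append((columns[0], columns[-1]))
    if sentence:
        # The file may not end with a blank line.
        yield sentence


# Tiny in-memory sample standing in for data/train.txt.
sample = io.StringIO("北 B\n京 E\n\n读 B\n书 E\n")
sentences = list(read_conll(sample))
print(len(sentences), sentences[0])
```

To inspect a real file, the same function can be given a file object opened with `open("data/train.txt", encoding="utf-8")`.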

Training

python -m paddle_tokenizer.train

Training takes about ten minutes on a 2017 MacBook Pro (2.5 GHz Intel Core i7).

Command-line usage

from paddle_tokenizer.server import server

result = server("王小明在北京的清华大学读书。")
print(result)

Output:

['王', '小明', '在', '北京', '的', '清华', '大学', '读书', '。']
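
Downstream code often needs the character span of each token rather than the bare strings. A minimal sketch that derives spans from a token list like the one shown above, assuming every token appears in the input text in order (this property is not verified for the library here); `token_spans` is a hypothetical helper, not part of paddle_tokenizer.

```python
def token_spans(text, tokens):
    """Return (token, start, end) triples with end exclusive."""
    spans = []
    pos = 0
    for tok in tokens:
        # Search from the end of the previous token to keep order.
        start = text.find(tok, pos)
        if start == -1:
            raise ValueError(
                "token {!r} not found at or after offset {}".format(tok, pos)
            )
        end = start + len(tok)
        spans.append((tok, start, end))
        pos = end
    return spans


text = "王小明在北京的清华大学读书。"
tokens = ['王', '小明', '在', '北京', '的', '清华', '大学', '读书', '。']
spans = token_spans(text, tokens)
for tok, start, end in spans:
    print(tok, start, end)
```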

Local inference demo

To better showcase inference results, this project packages PaddleTokenizer as a server + browser demo.

Start the PaddleTokenizer HTTP server

python ./http_server.py

This starts an HTTP server on localhost:5000, through which users can call PaddleTokenizer. Note that this address serves only the API and provides no user interface.
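
Since the API address has no page to open, a quick way to confirm that the server is up is to test whether the port accepts TCP connections. The sketch below uses only the Python standard library and makes no assumption about the API's routes or parameters, which the README does not document; `is_listening` is a hypothetical helper name.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


print("API server up:", is_listening("localhost", 5000))
```

The same check works for any other local port used by the demo.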

Start the front-end server

bash ./UI.sh

This starts the front-end server on localhost:8000, which users can then visit in a browser.

Open the front-end page

Open http://127.0.0.1:8000 in a browser to use the demo.

LICENSE

AGPL-3.0
