howl-anderson / PaddleTokenizer

License: AGPL-3.0

A Chinese word segmentation (tokenization) engine based on deep neural networks, implemented with PaddlePaddle

Programming Languages

javascript


python



PaddleTokenizer

A deep-neural-network Chinese word segmentation engine built with PaddlePaddle.

Neural network architecture

BiLSTM+CRF
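
In a BiLSTM+CRF tagger, the BiLSTM produces a score for every tag at every character, and the CRF layer picks the tag sequence with the best total score, including tag-to-tag transition scores. The decoding step can be sketched as below. This is a generic Viterbi illustration in plain Python, not this project's code: the function name, the tag names, and the scores are all assumptions made for the example.

```python
from typing import Dict, List


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag path (hypothetical helper).

    emissions[i][t]: score of tag t at position i (the BiLSTM's role).
    transitions[a][b]: score of moving from tag a to tag b (the CRF's role).
    """
    if not emissions:
        return []
    # best[t] holds the score of the best path ending in tag t so far.
    best = {t: emissions[0][t] for t in tags}
    backpointers: List[Dict[str, str]] = []
    for scores in emissions[1:]:
        step_best: Dict[str, float] = {}
        step_back: Dict[str, str] = {}
        for cur in tags:
            # Choose the previous tag that maximizes the path score into cur.
            prev = max(tags, key=lambda p: best[p] + transitions[p][cur])
            step_best[cur] = best[prev] + transitions[prev][cur] + scores[cur]
            step_back[cur] = prev
        best = step_best
        backpointers.append(step_back)
    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(tags, key=lambda t: best[t])]
    for step_back in reversed(backpointers):
        path.append(step_back[path[-1]])
    path.reverse()
    return path
```

The transition scores are what let the CRF reject sequences that a per-character classifier might produce, such as two word-begin tags in a row.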

Requirements

Python 3.5+

Installation

Install from PyPI:

pip install paddle_tokenizer

Or install from a local checkout:

pip install -e ./

Pretrained model

Download the model file from the project's releases, extract it, and place the test.inference.model directory in the root directory of this project.

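Training a model locally

Downloading the data

See the People's Daily corpus processing toolkit (人民日报语料处理工具集), then place the CoNLL-format train.txt and test.txt files in the data directory.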
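
For orientation, a CoNLL-style file can be read as sketched below. The layout is an assumption, since the README does not spell it out: one token per line, whitespace-separated columns with the token first and the tag last, and a blank line between sentences. The function name read_conll is hypothetical and is not part of paddle_tokenizer.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group (token, tag) pairs into sentences (assumed CoNLL layout)."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        # Assumption: first column is the token, last column is the tag.
        current.append((columns[0], columns[-1]))
    if current:
        # The file may end without a trailing blank line.
        sentences.append(current)
    return sentences
```

Any iterable of lines works, so an open file object for data/train.txt (opened with encoding="utf-8") could be passed in directly.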

Training

python -m paddle_tokenizer.train

Training takes about ten minutes on a 2017 MacBook Pro (2.5 GHz Intel Core i7).

Command-line usage

from paddle_tokenizer.server import server

result = server("王小明在北京的清华大学读书。")
print(result)

Output:

['王', '小明', '在', '北京', '的', '清华', '大学', '读书', '。']
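
A sequence tagger labels each character, and the word list is then rebuilt from those labels. The sketch below shows that last step under the assumption of a BMES tag scheme (B = begin, M = middle, E = end, S = single-character word). The README does not state which scheme the model uses, and tags_to_tokens is a hypothetical helper, not a paddle_tokenizer function.

```python
from typing import List


def tags_to_tokens(text: str, tags: List[str]) -> List[str]:
    """Rebuild words from per-character BMES tags (assumed scheme)."""
    if len(text) != len(tags):
        raise ValueError("text and tags must have the same length")
    tokens: List[str] = []
    buffer = ""
    for char, tag in zip(text, tags):
        buffer += char
        # E and S both mark the last character of a word.
        if tag in ("E", "S"):
            tokens.append(buffer)
            buffer = ""
    if buffer:
        # Flush a word left open by a trailing B or M tag.
        tokens.append(buffer)
    return tokens
```

With the tags S B E S B E S B E B E B E S, the example sentence above yields the same token list as the output shown.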

Local inference demo

To better showcase the inference results, this project packages PaddleTokenizer as a server + browser demo.

Start the PaddleTokenizer HTTP server

python ./http_server.py

This starts an HTTP server at localhost:5000 through which users can call PaddleTokenizer. Note that this address only provides an API, not a user interface.
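
To illustrate what an API-only tokenizer endpoint can look like, here is a minimal standard-library sketch. It is not the project's http_server.py: the /tokenize route, the text query parameter, and the JSON response shape are all assumptions, and the real server may differ on every one of them.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, List, Tuple
from urllib.parse import parse_qs, urlparse

Tokenizer = Callable[[str], List[str]]


def build_response(path: str, tokenize: Tokenizer) -> Tuple[int, bytes]:
    """Map a request path to (status, JSON body). The route is hypothetical."""
    url = urlparse(path)
    if url.path != "/tokenize":
        return 404, b'{"error": "not found"}'
    # parse_qs percent-decodes the query string as UTF-8 by default.
    text = parse_qs(url.query).get("text", [""])[0]
    body = json.dumps(tokenize(text), ensure_ascii=False)
    return 200, body.encode("utf-8")


def make_server(tokenize: Tokenizer, host: str = "127.0.0.1",
                port: int = 5000) -> HTTPServer:
    """Create (but do not start) an HTTP server around a tokenizer function."""

    class TokenizeHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            status, body = build_response(self.path, tokenize)
            self.send_response(status)
            self.send_header("Content-Type", "application/json; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return HTTPServer((host, port), TokenizeHandler)
```

With the package installed, the server function imported from paddle_tokenizer.server in the usage section could be passed as tokenize, and make_server(tokenize).serve_forever() would then answer requests until interrupted.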

Start the front-end server

bash ./UI.sh

This starts the front-end server at localhost:8000, which users can visit in a browser.

Open the front-end page

Open http://127.0.0.1:8000 in a browser to use the demo.

LICENSE

AGPL-3.0
