crownpku / Somiao Pinyin

Somiao Pinyin: Train your own Chinese Input Method with Seq2seq Model 搜喵拼音输入法

Blog (in Chinese)

Personalized Chinese Pinyin Input Method with Seq2seq model

The original code, written for research purposes, is at https://github.com/Kyubyong/neural_chinese_transliterator.

This repository aims to experiment with different training data and interactive user input, and possibly to develop into a real Pinyin input product that is data-personalized and model-localized.

Requirements

  • Python (>=3.5)

  • TensorFlow (>=r1.2)

  • xpinyin (for Chinese pinyin annotation)

  • distance (for calculating the similarity score between two strings)

  • tqdm
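As a rough illustration of what a string similarity score can look like, here is a dependency-free sketch of edit (Levenshtein) distance with a normalized similarity built on top. It does not call the distance package, and whether this project normalizes its score this way is an assumption; the names levenshtein and similarity are chosen for this example only.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # prev[j] is the distance between the already processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(
                prev[j] + 1,         # delete ch_a
                curr[j - 1] + 1,     # insert ch_b
                prev[j - 1] + cost,  # substitute, or keep a match
            ))
        prev = curr
    return prev[-1]


def similarity(predicted: str, expected: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(predicted), len(expected))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(predicted, expected) / longest


# One wrong character out of three.
print(levenshtein("大事馆", "大使馆"))  # 1
```

A score like this gives partial credit when a long sentence is converted with only one or two wrong characters, which exact-match accuracy would count as a complete miss.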

Usage

Training:

  • STEP 1. Download the Leipzig Chinese Corpus.

    Extract it and copy zho_news_2007-2009_1M-sentences.txt to the data/ folder.

    Alternatively, use your own Chinese corpus in the same format.

  • STEP 2. Build a Pinyin-Chinese parallel corpus.

    python3 build_corpus.py

  • STEP 3. Run prepro.py to build the vocabulary and training data.

    python3 prepro.py

  • STEP 4. Adjust hyperparameters in hyperparams.py if necessary.

  • STEP 5. Train the model

    python3 train.py
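To make STEP 2 and STEP 3 more concrete, here is a toy sketch of what "parallel corpus" and "vocabulary" mean. It is not the code in build_corpus.py or prepro.py: the tiny TOY_PINYIN lexicon stands in for a real annotator such as xpinyin, and the function names, the special tokens, and the (pinyin, hanzi) tuple layout are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy lexicon; a real pipeline would use a pinyin annotator.
TOY_PINYIN = {"你": "ni", "好": "hao", "成": "cheng", "功": "gong", "了": "le"}


def to_pinyin(sentence):
    """Concatenate per-character pinyin; unknown characters pass through."""
    return "".join(TOY_PINYIN.get(ch, ch) for ch in sentence)


def build_parallel_corpus(sentences):
    """Pair each sentence with its pinyin as (pinyin, hanzi) tuples."""
    return [(to_pinyin(s), s) for s in sentences]


def build_vocab(tokens, specials=("<PAD>", "<UNK>")):
    """Map each distinct token to an integer id, special tokens first."""
    counts = Counter(tokens)
    ordered = list(specials) + [tok for tok, _ in counts.most_common()]
    return {tok: idx for idx, tok in enumerate(ordered)}


pairs = build_parallel_corpus(["你好", "成功了"])
# Source side: pinyin letters. Target side: Chinese characters.
pinyin_vocab = build_vocab(ch for pinyin, _ in pairs for ch in pinyin)
hanzi_vocab = build_vocab(ch for _, hanzi in pairs for ch in hanzi)

for pinyin, hanzi in pairs:
    print(pinyin, hanzi)
```

The model then learns to map sequences of source ids to sequences of target ids, so the corpus you choose in STEP 1 directly determines which characters and phrases it can produce.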

Inference with command line input:

For command line input testing, run:

python3 eval.py

To run the original test-data evaluation instead, change which function is called as the main function in eval.py.
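The two modes can be pictured with the sketch below: an interactive mode that converts one pinyin line at a time, and a batch mode that scores conversions against expected text. This is only one way such a script could be organized, not the contents of eval.py; convert is a lookup-table stand-in for model inference, and every name here is hypothetical.

```python
def convert(pinyin, table):
    """Stand-in for model inference: look the input up in a fixed table."""
    return table.get(pinyin, "?")


def interactive_session(lines, table):
    """Convert each input line in turn, stopping at the first empty line."""
    results = []
    for line in lines:
        pinyin = line.strip().lower()
        if not pinyin:
            break  # an empty line ends the session
        results.append(convert(pinyin, table))
    return results


def batch_evaluation(pairs, table):
    """Return the fraction of (pinyin, expected) pairs converted exactly."""
    if not pairs:
        return 0.0
    hits = sum(convert(p, table) == expected for p, expected in pairs)
    return hits / len(pairs)


TOY_TABLE = {"nihao": "你好", "chenggongle": "成功了"}

if __name__ == "__main__":
    # Swap this call for batch_evaluation(...) to switch modes.
    for output in interactive_session(["nihao", "chenggongle", ""], TOY_TABLE):
        print(output)
```

In a real session the lines would come from input() rather than a list; passing them in as a list keeps the sketch easy to test.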

Testing with pre-trained models:

Download the pre-trained model from the blog and unzip it to generate /log and /data.

Remember to overwrite the pickle files in /data with the pre-trained model data.

Then run the command line input test:

python3 eval.py
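Why the pickle files must match the model: a trained network emits integer ids, and only the vocabulary table it was trained with maps those ids back to the right characters. The sketch below shows the effect with a toy table. The file name vocab.pkl, the table layout, and the helper names are assumptions for illustration and may not match the files actually stored in /data.

```python
import os
import pickle
import tempfile


def save_vocab(path, idx2hanzi):
    """Write the id-to-character table to disk with pickle."""
    with open(path, "wb") as f:
        pickle.dump(idx2hanzi, f)


def load_vocab(path):
    """Read the id-to-character table back from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)


def decode(ids, idx2hanzi):
    """Turn predicted ids into text using the given table."""
    return "".join(idx2hanzi[i] for i in ids)


# Table the (imaginary) pre-trained model was trained with.
pretrained_table = {0: "你", 1: "好", 2: "了"}
# Table produced by re-running preprocessing on a different corpus.
local_table = {0: "了", 1: "你", 2: "好"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vocab.pkl")  # hypothetical file name
    save_vocab(path, pretrained_table)
    restored = load_vocab(path)

predicted_ids = [0, 1]  # ids the imaginary model emits for "nihao"
right = decode(predicted_ids, restored)     # "你好"
wrong = decode(predicted_ids, local_table)  # "了你"
print(right, wrong)
```

If the output of a pre-trained model looks like random characters, a leftover vocabulary file from your own preprocessing run is a likely cause.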

Sample Results

The model is trained on Chinese news from 2007-2009, so many expressions that are common now were never learned. In the transcript below, the prompt 请输入测试拼音 means "Please enter test pinyin".

请输入测试拼音:nihao
你好

请输入测试拼音:chenggongle
成功了

请输入测试拼音:wolegequ
我了个曲

请输入测试拼音:taibangla
太棒啦

请输入测试拼音:dacolehuizenmeyang
打破了会怎么样

请输入测试拼音:pujinghehujintaotongdianhua
普京和胡锦涛通电话

请输入测试拼音:xiangbuqilaishinianqianfashengleshenme
想不起来十年前发生了什么

请输入测试拼音:meiguohongzhawomenzainansilafudedashiguan
美国轰炸我们在南斯拉夫的大事馆

请输入测试拼音:liudehuanageshihouhaonianqing
刘德华那个时候好年轻

请输入测试拼音:shishihouxunlianyixiabilibilideyuliaole
是时候训练一下比例比例的预料了

TODO List

  • Pretrained models on different contexts

  • Model selection: use different models depending on what is being typed (chatting, writing scientific papers, etc.)

  • Function to record LOCALLY what the user has typed, as a personalized corpus

  • User Interface

  • ...
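As a sketch of the local recording item above, the snippet below appends each accepted conversion to a tab-separated file on the user's own machine and reads it back. Nothing like this exists in the repository yet; the file name personal_corpus.tsv, the line format, and the function names are hypothetical.

```python
import os
import tempfile


def record_input(corpus_path, pinyin, chosen_text):
    """Append one accepted conversion to a local tab-separated corpus file."""
    with open(corpus_path, "a", encoding="utf-8") as f:
        f.write("{}\t{}\n".format(pinyin, chosen_text))


def load_personal_corpus(corpus_path):
    """Read recorded (pinyin, text) pairs back, skipping malformed lines."""
    pairs = []
    if not os.path.exists(corpus_path):
        return pairs
    with open(corpus_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) == 2:
                pairs.append((parts[0], parts[1]))
    return pairs


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "personal_corpus.tsv")  # hypothetical file name
    record_input(path, "nihao", "你好")
    record_input(path, "taibangla", "太棒啦")
    recorded = load_personal_corpus(path)

print(recorded)
```

Because the file never leaves the machine, it could later be fed back through the training steps to personalize the model without sending what the user typed anywhere else.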
