
Kyubyong / Neural_chinese_transliterator

Can CNNs transliterate Pinyin into Chinese characters correctly?



Neural Pinyin-to-Chinese Character Converter—can you do better than SwiftKey™ Keyboard?

In this project, we examine how well neural networks can convert Pinyin, the official romanization system for Chinese, into Chinese characters.

Requirements

  • numpy >= 1.11.1
  • TensorFlow >= 1.2
  • xpinyin (for Chinese pinyin annotation)
  • distance (for calculating the similarity score between two strings)
  • tqdm

Background

  • Because Chinese characters are not phonetic, several methods have been proposed for typing them on digital devices. The most popular is Pinyin, the official romanization system for Chinese. When people write Chinese on a smartphone, they usually type Pinyin and expect the intended word(s) to appear on the suggestion bar. How accurately an engine predicts the word(s) the user has in mind is therefore crucial for a Chinese keyboard.
  • Of the several Chinese keyboard layouts, the two major ones are the Qwerty keyboard and the Nine keyboard (see the animations on the right, in which the user types “woaini” to write 我爱你, which means “I love you”; Qwerty is on the left and Nine is on the right). In Qwerty, each letter has its own key, whereas in Nine the machine must determine which letter the user intended out of the 3-4 letters grouped on each key. Not surprisingly, transliteration is more challenging in Nine than in Qwerty.
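To make the Nine ambiguity concrete, here is a minimal sketch that collapses Pinyin letters onto keys. It assumes the standard phone keypad grouping (abc on 2, def on 3, and so on); the names `KEY_GROUPS` and `to_nine` are hypothetical, and the project's own Nine encoding may differ.

```python
# Assumed letter grouping: the standard 9-key phone keypad.
# The project's actual Nine encoding may differ.
KEY_GROUPS = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}

# Invert the grouping so each letter points at its key.
LETTER_TO_KEY = {
    letter: key for key, letters in KEY_GROUPS.items() for letter in letters
}


def to_nine(pinyin):
    """Replace every Pinyin letter with the key it sits on.

    Characters that are not on any key (e.g. punctuation) pass through.
    """
    return "".join(LETTER_TO_KEY.get(ch, ch) for ch in pinyin)


print(to_nine("woaini"))               # 962464
print(to_nine("ni") == to_nine("mi"))  # True: both collapse to "64"
```

Because distinct syllables such as "ni" and "mi" map to the same key sequence, a Nine model has to resolve the letters as well as the characters, which is consistent with Nine being the harder setting.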

Problem Formulation

I frame the problem as a labelling task: every Pinyin character is associated with either a Chinese character or _, which denotes a blank.

Inputs: woaini。
Outputs: 我_爱_你_。
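The alignment in this example can be sketched as follows. This is an illustration rather than the project's preprocessing code: `to_labels` and `strip_blanks` are hypothetical names, and the syllables are passed in by hand instead of being produced by xpinyin.

```python
def to_labels(chars, syllables, blank="_"):
    """Build an (input, output) pair of equal length.

    chars:     the Chinese sentence, e.g. "我爱你。"
    syllables: one Pinyin string per character; punctuation maps to itself.
    """
    if len(chars) != len(syllables):
        raise ValueError("need exactly one syllable per character")
    inputs, outputs = [], []
    for ch, syl in zip(chars, syllables):
        inputs.append(syl)
        # The character labels the first letter of its syllable;
        # the remaining letters are labelled with the blank symbol.
        outputs.append(ch + blank * (len(syl) - 1))
    return "".join(inputs), "".join(outputs)


def strip_blanks(labels, blank="_"):
    """Recover the plain sentence from a label sequence."""
    return labels.replace(blank, "")


x, y = to_labels("我爱你。", ["wo", "ai", "ni", "。"])
print(x)                # woaini。
print(y)                # 我_爱_你_。
print(strip_blanks(y))  # 我爱你。
```

Keeping the input and output the same length means the model only has to emit one label per input position.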

Data

  • For training, we used the Leipzig Chinese Corpus.
  • For evaluation, 1000 Chinese sentences were collected. See data/input.csv.

Model Architecture

Training

  • STEP 1. Download the Leipzig Chinese Corpus.
  • STEP 2. Extract it and copy zho_news_2007-2009_1M-sentences.txt to the data/ folder.
  • STEP 3. Run build_corpus.py to build a Pinyin-Chinese parallel corpus.
  • STEP 4. Run prepro.py to make the vocabulary and training data.
  • STEP 5. Adjust hyperparameters in hyperparams.py if necessary.
  • STEP 6. Run train.py, or download the pretrained files.

Evaluation

  • STEP 1. Run eval.py.
  • STEP 2. Install the latest SwiftKey keyboard app and manually test it for the same sentences. (Luckily, you don't have to because I've done it:))

Results

  • The training curve looks like this:

  • The accuracy changes like this:

  • The evaluation metric is CER (Character Error Rate), computed as

    • CER = edit distance / # characters.
  • The following are the results after 19 (Nine) or 20 (Qwerty) epochs. Details are available in the eval folder.

Layout   Proposed          SwiftKey 6.4.8.57
QWERTY   1203/10437=0.12   717/10437=0.07
NINE     2104/10437=0.20   1775/10437=0.17
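The CER formula above can be sketched in plain Python. This version assumes that the edit distance is Levenshtein distance summed over all sentences and that "# characters" counts the characters of the reference sentences; the project itself lists the distance package for this purpose, and `edit_distance` and `cer` here are hypothetical stand-ins.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(refs, hyps):
    """Total edit distance divided by total reference characters (assumed denominator)."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    chars = sum(len(r) for r in refs)
    return errors / chars


print(edit_distance("我爱你", "我爱他"))  # 1
print(round(1203 / 10437, 2))            # 0.12, the QWERTY "Proposed" cell
```

Dividing the summed distance by the summed length, rather than averaging per-sentence rates, matches the way the table reports a single fraction over 10437 characters.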