xiaohk / pinyin_data

Licence: MIT license

Pinyin Data

Easy to use and portable pronunciation data for Hanzi characters.

Pinyin

./pinyin/pinyin.json and ./pinyin/pinyin.yaml contain the same Pinyin records for 41216 Hanzi characters (both traditional and simplified).

Each file is a dictionary mapping a Hanzi character to a list of its Pinyin readings.

{'长' : ['zhǎng', 'cháng'],
 '長' : ['zhǎng', 'cháng', 'zhàng']}
  • The first element of each Pinyin list is the most frequently used pronunciation.
  • All Pinyin records come from the kMandarin, kXHC1983 ("现代汉语词典", Xiandai Hanyu Cidian), kHanyuPinlu ("现代汉语频率词典", Xiandai Hanyu Pinlü Cidian), and kHanyuPinyin ("汉语大字典", Hanyu Da Zidian) fields of the Unihan reading database.
  • Unihan reading database version: 2016-06-01 07:01:48 GMT
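
As a minimal sketch of how this data might be consumed from Python: the helper names `load_pinyin` and `most_common_reading` are hypothetical and not part of this repository, and the sample dictionary is copied from the example above rather than read from disk.

```python
import json
from pathlib import Path


def load_pinyin(path):
    """Load the character-to-readings mapping from a JSON file (hypothetical helper)."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def most_common_reading(pinyin, char):
    """Return the first (most frequent) reading, or None if the character is absent."""
    readings = pinyin.get(char)
    return readings[0] if readings else None


# In-memory sample taken from the example above; with a local clone you
# would presumably call load_pinyin("pinyin/pinyin.json") instead.
sample = {"长": ["zhǎng", "cháng"], "長": ["zhǎng", "cháng", "zhàng"]}

print(most_common_reading(sample, "长"))
print(most_common_reading(sample, "x"))
```

Relying on the first list element only works because the lists are ordered by frequency, as noted above.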

Polyphone

Some Hanzi characters have multiple pronunciations. ./polyphone/polyphone.json and ./polyphone/polyphone.yaml map each pronunciation to the word contexts in which it is used.

Each file is a dictionary mapping a Hanzi character to an inner dictionary. The inner dictionary maps each Pinyin reading to a list of three word lists: words where the Hanzi character appears at the beginning, in the middle, and at the end, in that order.

{'会': {'huì': [['会合'], [], ['都会']],
        'kuài': [['会计'], [], ['财会']]}}
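
One way this structure could be used is to pick the reading of a character from the word it appears in. The sketch below assumes the nested layout described above; `resolve_reading` is a hypothetical helper, not something shipped with the repository.

```python
def resolve_reading(polyphone, char, word):
    """Return the Pinyin of `char` as used in `word`, or None if the word is not listed."""
    if char not in word:
        return None
    # Each reading maps to three lists: beginning, middle, end.
    for reading, (beginning, middle, end) in polyphone.get(char, {}).items():
        if word in beginning or word in middle or word in end:
            return reading
    return None


# Sample entry copied from the example above.
sample = {"会": {"huì": [["会合"], [], ["都会"]],
                "kuài": [["会计"], [], ["财会"]]}}

print(resolve_reading(sample, "会", "会计"))
print(resolve_reading(sample, "会", "都会"))
```

A word that is not in the collection yields `None`, so a caller would probably fall back to the most frequent reading from the Pinyin data.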

In this version, all polyphone data is parsed from this website. Coverage is still limited, so contributions of additional example words and entries to the polyphone collection are welcome.

  1. You can parse data from other websites and add non-duplicate words to the polyphone dictionary using the same structure. Be aware that such websites may contain many errors.
  2. You can add new words to the correct list in ./polyphone/polyphone.yaml, then run ./parse/update_json.py to sync the change to ./polyphone/polyphone.json.
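
For the first option, adding a non-duplicate word to the right list could be sketched as follows. This is an illustration under assumptions, not the behavior of ./parse/update_json.py: `position_index` and `add_word` are hypothetical names, and a word counts as "beginning" if it starts with the character, "end" if it ends with it, and "middle" otherwise.

```python
def position_index(char, word):
    """Return 0 if `char` starts the word, 2 if it ends it, 1 otherwise (assumed rule)."""
    if word.startswith(char):
        return 0
    if word.endswith(char):
        return 2
    return 1


def add_word(polyphone, char, reading, word):
    """Add `word` under `char`/`reading`; return False if it is already listed."""
    if char not in word:
        raise ValueError(f"{word!r} does not contain {char!r}")
    # Create the character and reading entries on demand, with three empty lists.
    slots = polyphone.setdefault(char, {}).setdefault(reading, [[], [], []])
    slot = slots[position_index(char, word)]
    if word in slot:
        return False
    slot.append(word)
    return True


# Sample entry copied from the example above.
sample = {"会": {"huì": [["会合"], [], ["都会"]],
                "kuài": [["会计"], [], ["财会"]]}}

print(add_word(sample, "会", "huì", "开会"))  # new word, appended to the "end" list
print(add_word(sample, "会", "huì", "开会"))  # second attempt is a duplicate
```

The function only rejects duplicates within the same list; checking the word against the other readings of the character would be a reasonable extra safeguard when importing noisy data.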

Use

Clone the repository, then copy the data you need into your project.

Use of the Pinyin information should follow the Unicode® Terms of Use. All other code is released under the MIT license.

TODO List

  • Add Jyutping records

How to Contribute:

  1. Create an issue.
  2. Add words to the polyphone collection, fix bugs, or add features, then open a pull request.