ReubenBond / Hanbaobao

Mandarin Chinese text segmentation and mobile dictionary Android app (中文分词)

HànBǎoBāo

I wrote this app to assist myself in learning Mandarin.

This repository consists of two parts:

  • A floating dictionary Android app which segments, transliterates, and provides dictionary definitions for Chinese text (simplified & traditional)
  • A program for building the database used by that app

Features:

  • Text Segmentation - split sentences into individual words. Tap a word multiple times to re-split.
  • Transliteration - transliterate words into their Pinyin representation.
  • Dictionary Definitions - tapping a word opens a list of dictionary definitions (CC-CEDICT, NTI Buddhist Dictionary, ADSO, others).
  • Tone Markings - words are marked with their tone using both glyphs over the pinyin and colorization.
  • Tap to Read - tap on text in your chat app to load it into HanBaoBao.
  • Hide by HSK Level - optionally hide transliteration for all words below a given HSK level.
  • Part of Speech Tags - many words have part-of-speech and ontology tags.
  • Translation Tool - drag the icon into the translation tool to translate the sentence using Microsoft Translator or Google Translate (if installed).
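
To illustrate the Tone Markings feature, here is a minimal sketch of turning numbered pinyin (the style used in CC-CEDICT entries, e.g. "han4 bao3 bao1") into pinyin with tone glyphs. It is not the app's actual code: the class and method names (PinyinTones, mark) are made up for this example, it handles lowercase vowels only, and it assumes "u:" or "v" stands for ü. The vowel table uses Unicode escapes so the file compiles regardless of source encoding.

```java
import java.util.Map;

// Hypothetical helper, not taken from the HanBaoBao source.
final class PinyinTones {
    // Toned forms of each vowel for tones 1..4 (string index 0 is tone 1).
    private static final Map<Character, String> TONED = Map.of(
        'a', "\u0101\u00e1\u01ce\u00e0",
        'e', "\u0113\u00e9\u011b\u00e8",
        'i', "\u012b\u00ed\u01d0\u00ec",
        'o', "\u014d\u00f3\u01d2\u00f2",
        'u', "\u016b\u00fa\u01d4\u00f9",
        '\u00fc', "\u01d6\u01d8\u01da\u01dc"); // u with diaeresis

    // Converts one numbered syllable such as "han4" to its tone-marked form.
    static String mark(String syllable) {
        if (syllable.isEmpty()) return syllable;
        char last = syllable.charAt(syllable.length() - 1);
        if (last < '0' || last > '5') return syllable; // no tone digit present
        int tone = last - '0';
        String body = syllable.substring(0, syllable.length() - 1)
                .replace("u:", "\u00fc")
                .replace('v', '\u00fc');
        if (tone == 0 || tone == 5) return body; // neutral tone: no mark
        int pos = markPosition(body);
        if (pos < 0) return body; // no vowel found, leave unmarked
        char marked = TONED.get(body.charAt(pos)).charAt(tone - 1);
        return body.substring(0, pos) + marked + body.substring(pos + 1);
    }

    // Standard placement: 'a' or 'e' if present, else the 'o' of "ou",
    // else the last vowel in the syllable.
    private static int markPosition(String body) {
        int a = body.indexOf('a');
        if (a >= 0) return a;
        int e = body.indexOf('e');
        if (e >= 0) return e;
        int ou = body.indexOf("ou");
        if (ou >= 0) return ou;
        for (int i = body.length() - 1; i >= 0; i--) {
            if (TONED.containsKey(body.charAt(i))) return i;
        }
        return -1;
    }
}
```

For example, PinyinTones.mark("han4") gives hàn and PinyinTones.mark("nu:3") gives nǚ. Colorization could then key off the same tone digit before it is stripped.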

The database building program compiles data from many sources and outputs a SQLite database which is read by the Android app. The database may also be useful for creating other apps and services.

The app uses a custom text segmentation algorithm. It works fairly well for my purposes, particularly since segments (words) can be resegmented by tapping on them.
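
For readers unfamiliar with dictionary-based segmentation, the sketch below shows forward maximum matching, a common baseline technique. It is not the custom algorithm used in the app. The class name MaxMatchSegmenter and the resplit method are hypothetical, and resplit is only a guess at how tap-to-resegment could work: it reruns the match while refusing to return the whole word as a single segment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical baseline segmenter, not taken from the HanBaoBao source.
final class MaxMatchSegmenter {
    private final Set<String> words;
    private final int maxLen; // longest dictionary entry, in code points

    MaxMatchSegmenter(Set<String> words) {
        this.words = words;
        int longest = 1;
        for (String w : words) {
            longest = Math.max(longest, w.codePointCount(0, w.length()));
        }
        this.maxLen = longest;
    }

    // Greedy left-to-right split: at each position take the longest
    // dictionary word, falling back to a single character.
    List<String> segment(String text) {
        return segment(text, maxLen);
    }

    // Assumed "re-split" behaviour: segment the word again, but only
    // allow matches shorter than the word itself.
    List<String> resplit(String word) {
        int len = word.codePointCount(0, word.length());
        return segment(word, Math.min(maxLen, len - 1));
    }

    private List<String> segment(String text, int limit) {
        List<String> out = new ArrayList<>();
        int[] cps = text.codePoints().toArray();
        int i = 0;
        while (i < cps.length) {
            int take = 1; // single-character fallback
            for (int len = Math.min(limit, cps.length - i); len > 1; len--) {
                if (words.contains(new String(cps, i, len))) {
                    take = len;
                    break;
                }
            }
            out.add(new String(cps, i, take));
            i += take;
        }
        return out;
    }
}
```

With a dictionary containing 我, 喜欢, 吃, 汉堡 and 汉堡包, segment("我喜欢吃汉堡包") yields 我 / 喜欢 / 吃 / 汉堡包, and resplit("汉堡包") yields 汉堡 / 包. Greedy matching picks wrong boundaries on ambiguous text, which is one reason a manual re-split is useful.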

Here's an older version of the app in action: https://www.youtube.com/watch?v=a9x9MBoLfxs

The app needs work to support Android 8, and some of the dictionary data is outdated.

The dictionary data contained within is provided without a license; obtain usage permission from the original sources as needed.
