
polm / Cutlet

Licence: MIT
Japanese to romaji converter in Python

Programming Languages

python

Projects that are alternatives of or similar to Cutlet

Memorize
🚀 Japanese-English-Mongolian dictionary. It lets you find words, kanji and more quickly and easily
Stars: ✭ 72 (-41.94%)
Mutual labels:  japanese
Jconv
Pure-JavaScript converter for Japanese character encodings.
Stars: ✭ 91 (-26.61%)
Mutual labels:  japanese
Topokanji
Topologically ordered lists of kanji for effective learning
Stars: ✭ 108 (-12.9%)
Mutual labels:  japanese
Risingstars2016
A complete overview of the JavaScript landscape in 2016: trends about front-end and node.js frameworks, tooling... Available in English, Japanese and Chinese.
Stars: ✭ 75 (-39.52%)
Mutual labels:  japanese
Cheatsheet Of Ui With Fuzzy Behaviors
A cheat sheet of user interfaces whose behavior or specifications are ambiguous
Stars: ✭ 89 (-28.23%)
Mutual labels:  japanese
Toiro
A comparison tool of Japanese tokenizers
Stars: ✭ 95 (-23.39%)
Mutual labels:  japanese
Japanesetab
A Chrome extension that helps you learn Japanese with every new tab 🔴
Stars: ✭ 55 (-55.65%)
Mutual labels:  japanese
Ichiran
Linguistic tools for texts in Japanese language
Stars: ✭ 120 (-3.23%)
Mutual labels:  japanese
Epub Manga Creator
A web GUI for creating Japanese EPUB manga
Stars: ✭ 90 (-27.42%)
Mutual labels:  japanese
Languagepod101 Scraper
Python scraper for Language Pods such as Japanesepod101.com 👹 🗾 🍣 Compatible with Japanese, Chinese, French, German, Italian, Korean, Portuguese, Russian, Spanish and many more! ✨
Stars: ✭ 104 (-16.13%)
Mutual labels:  japanese
Awesome Bert Japanese
📝 A list of pre-trained BERT models for Japanese with word/subword tokenization + vocabulary construction algorithm information
Stars: ✭ 76 (-38.71%)
Mutual labels:  japanese
Qolibri
Continuation of the qolibri EPWING dictionary/book reader
Stars: ✭ 82 (-33.87%)
Mutual labels:  japanese
Nodejs Ja
Node.js Japanese localization
Stars: ✭ 98 (-20.97%)
Mutual labels:  japanese
Sample Boot Micro
Spring Cloud + Gradle Multi Project + Java8
Stars: ✭ 72 (-41.94%)
Mutual labels:  japanese
Posuto
🏣📮〠 Japanese postal code data.
Stars: ✭ 109 (-12.1%)
Mutual labels:  japanese
Kana
Golang library for conversion between Japanese hiragana, katakana and romaji
Stars: ✭ 68 (-45.16%)
Mutual labels:  japanese
The Tab Of Words
A minimal Chrome / Firefox extension to help you learn Japanese words in each new tab.
Stars: ✭ 94 (-24.19%)
Mutual labels:  japanese
Gse
Efficient multilingual NLP and text segmentation in Go; supports English, Chinese, Japanese, and other languages.
Stars: ✭ 1,695 (+1266.94%)
Mutual labels:  japanese
Textlint Rule Preset Jtf Style
JTF Japanese standard style guide for textlint.
Stars: ✭ 112 (-9.68%)
Mutual labels:  japanese
Source Han Code Jp
Source Han Code JP | 源ノ角ゴシック Code
Stars: ✭ 1,362 (+998.39%)
Mutual labels:  japanese


cutlet

cutlet by Irasutoya

Cutlet is a tool to convert Japanese to romaji. Check out the interactive demo!

Issues do not need to be written in English.

Features:

  • support for Modified Hepburn, Kunreisiki, Nihonsiki systems
  • custom overrides for individual mappings
  • custom overrides for specific words
  • built-in exceptions list (Tokyo, Osaka, etc.)
  • uses foreign spelling when available in UniDic
  • proper nouns are capitalized
  • slug mode for URL generation
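
The romanization systems listed above differ mainly in how a handful of kana are spelled, and a per-kana override amounts to replacing one entry in the chosen table. The toy sketch below illustrates that idea with a tiny hand-written table. The names `TABLES` and `toy_romaji` are hypothetical and exist only for this example; this is not cutlet's implementation or API, and it handles neither kanji nor word segmentation.

```python
# Toy illustration only: a handful of kana, not cutlet's real mapping tables.
HEPBURN = {"ふ": "fu", "じ": "ji", "さ": "sa", "ん": "n",
           "つ": "tsu", "づ": "zu", "く": "ku"}
# Kunreisiki spells several of these kana more regularly.
KUNREI = {**HEPBURN, "ふ": "hu", "じ": "zi", "つ": "tu"}
# Nihonsiki additionally writes づ as "du".
NIHON = {**KUNREI, "づ": "du"}

TABLES = {"hepburn": HEPBURN, "kunrei": KUNREI, "nihon": NIHON}


def toy_romaji(kana, system="hepburn", overrides=None):
    """Romanize a kana string one character at a time (hypothetical helper)."""
    table = dict(TABLES[system])   # copy so overrides never leak between calls
    table.update(overrides or {})  # per-kana overrides win over the defaults
    # Characters missing from the toy table are passed through unchanged.
    return "".join(table.get(ch, ch) for ch in kana)


print(toy_romaji("ふじさん"))                        # fujisan
print(toy_romaji("ふじさん", "kunrei"))              # huzisan
print(toy_romaji("つづく", "nihon"))                 # tuduku
print(toy_romaji("つづく", overrides={"づ": "dzu"}))  # tsudzuku
```

Copying the table before applying overrides keeps each call independent, which is one simple way to let different converters carry different customizations.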

Things not supported:

  • traditional Hepburn n-to-m: Shimbashi
  • macrons or circumflexes: Tōkyō, Tôkyô
  • passport Hepburn: Satoh (but you can use an exception)
  • hyphenating words
  • Traditional Hepburn in general is not supported

Internally, cutlet uses fugashi, so you can use the same dictionary you use for normal tokenization.

Installation

Cutlet can be installed through pip as usual.

pip install cutlet

Note that if you don't have a MeCab dictionary installed, you'll also have to install one. If you're just getting started, unidic-lite is probably fine.

pip install unidic-lite

Usage

A command-line script is included for quick testing. Run cutlet, and each line of stdin will be treated as a sentence. You can specify the romanization system to use (hepburn, kunrei, nippon, or nihon) as the first argument.

$ cutlet
ローマ字変換プログラム作ってみた。
Roma ji henkan program tsukutte mita.
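
If you want the same line-per-sentence behaviour inside your own script, a minimal sketch might look like the following. `convert_lines` is a hypothetical helper, not part of cutlet, and skipping blank lines is a choice made for this example rather than documented behaviour of the bundled script. A stand-in converter (`str.upper`) is used so the sketch runs without a dictionary installed.

```python
import io
from typing import Callable, Iterable, Iterator


def convert_lines(lines: Iterable[str],
                  convert: Callable[[str], str]) -> Iterator[str]:
    """Treat every non-empty input line as one sentence and convert it."""
    for line in lines:
        sentence = line.strip()
        if sentence:  # assumption: blank lines are skipped
            yield convert(sentence)


# Stand-in input and converter so the sketch is self-contained.
demo_input = io.StringIO("first line\n\nsecond line\n")
results = list(convert_lines(demo_input, str.upper))
print(results)  # ['FIRST LINE', 'SECOND LINE']
```

In real use you would pass `sys.stdin` as `lines` and a converter's `romaji` method, for example `cutlet.Cutlet('kunrei').romaji`, as `convert`.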

In code:

import cutlet
katsu = cutlet.Cutlet()
katsu.romaji("カツカレーは美味しい")
# => 'Cutlet curry wa oishii'

# you can also generate a slug suitable for URLs
katsu.slug("カツカレーは美味しい")
# => 'cutlet-curry-wa-oishii'

# You can disable using foreign spelling too
katsu.use_foreign_spelling = False
katsu.romaji("カツカレーは美味しい")
# => 'Katsu karee wa oishii'

# kunreisiki, nihonsiki work too
katu = cutlet.Cutlet('kunrei')
katu.romaji("富士山")
# => 'Huzi yama'

# comparison
nkatu = cutlet.Cutlet('nihon')

sent = "彼女は王への手紙を読み上げた。"
katsu.romaji(sent)
# => 'Kanojo wa ou e no tegami wo yomiageta.'
katu.romaji(sent)
# => 'Kanozyo wa ou e no tegami o yomiageta.'
nkatu.romaji(sent)
# => 'Kanozyo ha ou he no tegami wo yomiageta.'
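
To give a feel for what slug mode produces, here is a minimal sketch that turns already-romanized text into a URL slug using only the standard library. `toy_slug` is a hypothetical helper written for this example; it is not how cutlet implements `slug`, and it assumes the input is plain ASCII romaji.

```python
import re


def toy_slug(romaji: str) -> str:
    """Lowercase the text and join its alphanumeric runs with hyphens."""
    lowered = romaji.lower()
    # Replace each run of characters other than a-z and 0-9 with one hyphen.
    hyphenated = re.sub(r"[^a-z0-9]+", "-", lowered)
    # Drop hyphens left over from leading or trailing punctuation.
    return hyphenated.strip("-")


print(toy_slug("Cutlet curry wa oishii"))
# cutlet-curry-wa-oishii
print(toy_slug("Kanojo wa ou e no tegami wo yomiageta."))
# kanojo-wa-ou-e-no-tegami-wo-yomiageta
```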

Alternatives

  • kakasi: Historically important, but not updated since 2014.
  • pykakasi: self-contained; it does its own segmentation and uses its own dictionary.
  • kuroshiro: JavaScript-based.
  • kana: Go-based.