japanese.js

Util collection for Japanese text processing. Hiraganize, Katakanize, and Romanize.

Install

$ npm install --save japanese

Usage

var japanese = require('japanese');

japanese.hiraganize('ヱヴァンゲリヲン');

For crazy syntax sugar junkies:

var japanese = require('japanese/sugar');

'ヱヴァンゲリヲン'.hiraganize();

Command

A command-line interface is also available.

$ npm install japanese -g
$ japanese

  Util collection for Japanese text processing. Hiraganize, Katakanize, and Romanize.

  Usage:
    japanese <input> [options]

  Options:
    -h, --hiraganize   hiraganize input string
    -k, --katakanize   katakanize input string
    -r, --romanize     romanize input string

  Example
    japanese ヱヴァンゲリヲン --hiraganize

API

japanese.hiraganize(text)

Convert input katakana into hiragana.

Arguments

  • text The text to hiraganize

Example

japanese.hiraganize('ヱヴァンゲリヲン');     // ゑゔぁんげりをん
japanese.hiraganize('チヨコバナヽ');         // ちよこばなゝ
japanese.hiraganize('ヹルタースオリジナル'); // ゑ゙るたーすおりじなる
japanese.hiraganize('板垣死ス𪜈');           // 板垣死すとも
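To make the conversion concrete, here is a minimal sketch of the idea. It is not the japanese.js implementation: hiraganizeSketch is a hypothetical name, and the sketch only assumes the Unicode layout in which each basic katakana sits exactly 0x60 code points above its hiragana counterpart. It leaves the long-vowel mark, kanji, ヷ to ヺ, and ligatures such as 𪜈 untouched, so it covers fewer cases than the real function.

```javascript
// Hypothetical sketch, NOT the japanese.js implementation.
// Katakana U+30A1..U+30F6 and the iteration marks U+30FD/U+30FE
// map to hiragana by subtracting 0x60 from the code point.
function hiraganizeSketch(text) {
  return text.replace(/[\u30A1-\u30F6\u30FD\u30FE]/g, function (ch) {
    return String.fromCharCode(ch.charCodeAt(0) - 0x60);
  });
}

console.log(hiraganizeSketch('ヱヴァンゲリヲン')); // ゑゔぁんげりをん
```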

japanese.katakanize(text)

Convert input hiragana into katakana.

Arguments

  • text The text to katakanize

Example

japanese.katakanize('抹茶あいす');       // 抹茶アイス
japanese.katakanize('ばゞへらあいす');   // バヾヘラアイス
japanese.katakanize('ゐ゙よろん');         // ヸヨロン
japanese.katakanize('本日ゟかき氷解禁'); // 本日ヨリカキ氷解禁
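The opposite direction can be sketched the same way. Again this is only an illustration under stated assumptions, not the library's code: katakanizeSketch is a hypothetical name, basic hiragana are shifted up by 0x60, and the digraph ゟ (U+309F) is expanded to ヨリ as in the last example above. Combining sequences such as ゐ゙ are not composed here.

```javascript
// Hypothetical sketch, NOT the japanese.js implementation.
function katakanizeSketch(text) {
  return text
    // Expand the hiragana digraph "yori" (U+309F) to the two katakana ヨリ.
    .replace(/\u309F/g, '\u30E8\u30EA')
    // Hiragana U+3041..U+3096 and the iteration marks U+309D/U+309E
    // map to katakana by adding 0x60 to the code point.
    .replace(/[\u3041-\u3096\u309D\u309E]/g, function (ch) {
      return String.fromCharCode(ch.charCodeAt(0) + 0x60);
    });
}

console.log(katakanizeSketch('抹茶あいす')); // 抹茶アイス
```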

japanese.romanize(text[, config])

Convert input text into romaji.

Important: Most definitions of Japanese romanization require a full understanding of the text, but robots cannot actually think or understand! Some conversions are hopelessly poor. For example, ISO 3602 specifies that "こうし" meaning "講師" must be romanized as "kôsi", while "こうし" meaning "子牛" must be romanized as "kousi" (because 子牛 is a compound of 子 and 牛), even though the two are identical in kana. While japanese.js is very... very very thoroughly tested, this module (like any other romanization machine) cannot distinguish between these meanings. So unfortunately, you cannot rely on this function for official writing or the like. Ugh.

Arguments

  • text The text to romanize
  • config The configuration object or string used to romanize. Described below.

Example

japanese.romanize('れんあいかんじょう');       // ren'aikanjō
japanese.romanize('ツァトゥグァ');             // tsatugwa
japanese.romanize('くうぼをきゅう', 'kunrei'); // kûbookyû
japanese.romanize('でんぢゃらす', 'nihon');    // dendyarasu
japanese.romanize('いいづか とおる', {
	'いい': 'ii',
	'おお': 'oh',
});                                            // iizuka tohru

Configs

A config is a plain object in which each key stands for a group of similar characters and the value determines how those characters are converted. So the object is not simply a conversion table.

The available parameters are as follows.

Key   Available Values
し    si, shi
ち    ti, chi
つ    tu, tsu
ふ    hu, fu
じ    zi, ji
ぢ    di, zi, ji, dzi, dji
づ    du, zu, dsu, dzu
ああ  aa, ah, â, ā, a
いい  ii, ih, î, ī, i
うう  uu, uh, û, ū, u
ええ  ee, eh, ê, ē, e
おお  oo, oh, ô, ō, o
あー  a-, aa, ah, â, ā, a
えい  ei, ee, eh, ê, ē, e
おう  ou, oo, oh, ô, ō, o
んあ  na, n'a, n-a
んば  nba, mba
っち  tti, tchi, cchi
ゐ    i, wi
を    o, wo
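To illustrate how one key can govern a whole group of characters rather than a single table entry, here is a small self-contained sketch. Everything in it is hypothetical (the romanizeSketch name, the tiny KANA_SKETCH table, and the matching rule); it only assumes that the おう key decides how any o-row kana followed by う is written. It is not how japanese.js is implemented.

```javascript
// Hypothetical sketch, NOT the japanese.js implementation.
// A deliberately tiny kana table, just enough for the demo.
var KANA_SKETCH = { 'お': 'o', 'こ': 'ko', 'そ': 'so', 'と': 'to', 'う': 'u', 'し': 'si' };

function romanizeSketch(text, config) {
  // Assumption: the 'おう' value applies to every "o-row kana + う" pair.
  var longO = (config && config['おう']) || 'ou';
  var out = '';
  for (var i = 0; i < text.length; i++) {
    var roman = KANA_SKETCH[text[i]];
    if (roman === undefined) {
      out += text[i]; // pass unknown characters through unchanged
    } else if (roman.slice(-1) === 'o' && text[i + 1] === 'う') {
      out += roman.slice(0, -1) + longO; // こう, そう, とう, おう all use one setting
      i++; // skip the う that was just consumed
    } else {
      out += roman;
    }
  }
  return out;
}

console.log(romanizeSketch('こうし', { 'おう': 'oh' })); // kohsi
console.log(romanizeSketch('とうそう', { 'おう': 'oo' })); // toosoo
```

A single おう entry changes こう, そう, and とう alike, which is why the config is more than a one-to-one lookup.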

You can also select one of the following predefined configs by passing its name as a string. The default is 'wikipedia'.

Key   'wikipedia'  'traditional hepburn'  'modified hepburn'  'kunrei'  'nihon'
し    shi          shi                    shi                 si        si
ち    chi          chi                    chi                 ti        ti
つ    tsu          tsu                    tsu                 tu        tu
ふ    fu           fu                     fu                  hu        hu
じ    ji           ji                     ji                  zi        zi
ぢ    ji           ji                     ji                  zi        di
づ    zu           zu                     zu                  zu        du
ああ  aa           aa                     ā                   â         ā
いい  ii           ii                     ii                  î         ī
うう  ū            ū                      ū                   û         ū
ええ  ee           ee                     ē                   ê         ē
おお  ō            ō                      ō                   ô         ō
あー  ā            ā                      ā                   â         ā
えい  ei           ei                     ei                  ei        ei
おう  ō            ō                      ō                   ô         ō
んあ  n'a          n-a                    n'a                 n'a       n'a
んば  nba          mba                    nba                 nba       nba
っち  tchi         tchi                   tchi                tti       tti
ゐ    i            i                      i                   i         wi
を    o            wo                     o                   o         wo
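The relationship between a preset name and a config object can be sketched as follows. This is an illustration only: resolveConfigSketch and PRESETS_SKETCH are hypothetical names, the presets carry just three keys copied from the table above, and the rule that a partial object is layered over the default preset is an assumption inferred from the 'いいづか とおる' example, not a documented guarantee.

```javascript
// Hypothetical sketch, NOT the japanese.js implementation.
// Three rows of the preset table, keyed by preset name.
var PRESETS_SKETCH = {
  'wikipedia':           { 'んあ': "n'a", 'んば': 'nba', 'っち': 'tchi' },
  'traditional hepburn': { 'んあ': 'n-a', 'んば': 'mba', 'っち': 'tchi' },
  'modified hepburn':    { 'んあ': "n'a", 'んば': 'nba', 'っち': 'tchi' },
  'kunrei':              { 'んあ': "n'a", 'んば': 'nba', 'っち': 'tti' },
  'nihon':               { 'んあ': "n'a", 'んば': 'nba', 'っち': 'tti' }
};

function resolveConfigSketch(config) {
  if (config === undefined) {
    return Object.assign({}, PRESETS_SKETCH['wikipedia']); // documented default
  }
  if (typeof config === 'string') {
    if (!Object.prototype.hasOwnProperty.call(PRESETS_SKETCH, config)) {
      throw new Error('Unknown preset: ' + config);
    }
    return Object.assign({}, PRESETS_SKETCH[config]);
  }
  // Assumption: keys missing from a partial object fall back to the default preset.
  return Object.assign({}, PRESETS_SKETCH['wikipedia'], config);
}

console.log(resolveConfigSketch('kunrei')['っち']); // tti
```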

And here are short notes about these romanizations.

Wikipedia style

Source: http://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Japan-related_articles#Romanization

The most modern and widely used form of romanization. Wikipedia uses this guideline for its article titles and text. It is a mix of traditional and modified Hepburn and is easily recognizable to everyone.

Traditional and Modified Hepburn

Source: http://en.wikipedia.org/wiki/Hepburn_romanization

Actually, this is not a specification. Hepburn romanization is very widely known, but nobody other than Hepburn knows the REAL definition of this method.

Kunrei-shiki and Nihon-shiki

Source: http://www.iso.org/iso/catalogue_detail.htm?csnumber=9029

Kunrei-shiki is defined in ISO 3602 and Nihon-shiki in ISO 3602 Strict. These romanizations are somewhat obsolete today, but they are still the only internationally standardized romanizations.

Roadmap

  • japanese.deromanize()
  • japanese.cyrillize()
  • japanese.decyrillize()
  • japanese.hangulize()
  • japanese.dehangulize()
  • japanese.arabize()
  • japanese.dearabize()
  • japanese.gyarumojize()
  • japanese.isKatakana()
  • japanese.isHiragana()
  • japanese.isKanji()
  • japanese.isJoyoKanji()
  • japanese.isKinsoku() (JIS X 4051 compatibility is preferred)
  • CLI
    • --input <file> and --output <file> options
    • make japanese --hiraganize <string> work
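None of the roadmap predicates exist yet, so the following is purely a sketch of one shape they might take. The function names are hypothetical, and the approach (Unicode script property escapes, available in recent Node.js versions) is an assumption rather than the project's plan. Note that the long-vowel mark ー belongs to the Common script, so a word like ラーメン would not pass the katakana check as written.

```javascript
// Hypothetical sketches of roadmap items, NOT part of japanese.js.
function isHiraganaSketch(text) {
  return /^\p{Script=Hiragana}+$/u.test(text);
}

function isKatakanaSketch(text) {
  return /^\p{Script=Katakana}+$/u.test(text);
}

function isKanjiSketch(text) {
  return /^\p{Script=Han}+$/u.test(text);
}

console.log(isKatakanaSketch('ヱヴァンゲリヲン')); // true
console.log(isHiraganaSketch('ヱヴァンゲリヲン')); // false
```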

...and any proposal or idea for enhancing japanese.js is welcome! Tell me, tell me, tell me!

License

MIT © hakatashi
