polm / unidic-py

Licence: other
Unidic packaged for installation via pip.

Programming Languages

python
shell

Projects that are alternatives of or similar to unidic-py

jiten
jiten - Japanese Android/CLI/web dictionary based on JMdict/KANJIDIC (Japanese dictionary; Japanese-English, kanji-English, Japanese-German, and Japanese-Dutch)
Stars: ✭ 64 (+276.47%)
Mutual labels:  japanese
KWDLC
Kyoto University Web Document Leads Corpus
Stars: ✭ 64 (+276.47%)
Mutual labels:  japanese
activitypub
Unofficial Japanese translation of ActivityPub
Stars: ✭ 23 (+35.29%)
Mutual labels:  japanese
YuzuMarker
🍋 [WIP] Manga Translation Tool
Stars: ✭ 76 (+347.06%)
Mutual labels:  japanese
textlint-ja
Japanese community for textlint / rule ideas
Stars: ✭ 41 (+141.18%)
Mutual labels:  japanese
kanji-web-app
Angular.js kanji web application
Stars: ✭ 45 (+164.71%)
Mutual labels:  japanese
bunkai
Sentence boundary disambiguation tool for Japanese texts (日本語文境界判定器)
Stars: ✭ 154 (+805.88%)
Mutual labels:  japanese
KanjiRecognitionDictionary
Perfect for those who forget kanji pronunciation
Stars: ✭ 14 (-17.65%)
Mutual labels:  japanese
jp-ocr-prunned-cnn
Attempting feature map pruning on a CNN trained for Japanese OCR
Stars: ✭ 15 (-11.76%)
Mutual labels:  japanese
Japanese-Words
Compiled Japanese N2 vocabulary (New Standard Japanese, elementary and intermediate)
Stars: ✭ 41 (+141.18%)
Mutual labels:  japanese
sakubun
A tool that helps you improve your Japanese vocabulary and kanji skills with practice that's customized to your needs.
Stars: ✭ 20 (+17.65%)
Mutual labels:  japanese
kanji
Haskell suite for determining what 級 (level) of the 漢字検定 (national Kanji exam) a given Kanji belongs to.
Stars: ✭ 19 (+11.76%)
Mutual labels:  japanese
gazou
Japanese OCR for Linux & Windows
Stars: ✭ 32 (+88.24%)
Mutual labels:  japanese
kanji poster
Poster of 2200 jōyō and WaniKani kanji
Stars: ✭ 19 (+11.76%)
Mutual labels:  japanese
Haxe-Macro-Book
A book on Haxe macros
Stars: ✭ 20 (+17.65%)
Mutual labels:  japanese
wanikani-userscripts
Userscripts for the WaniKani.com website
Stars: ✭ 16 (-5.88%)
Mutual labels:  japanese
analyze-desumasu-dearu
JavaScript library that analyzes whether sentences use the polite style (desu/masu) or the plain style (de aru)
Stars: ✭ 15 (-11.76%)
Mutual labels:  japanese
japanese-pitch-accent-resources
Trying to consolidate Japanese phonetic resources, and pitch accent resources in particular, into one list
Stars: ✭ 64 (+276.47%)
Mutual labels:  japanese
sample-ui-react
Material-UI+ React.js + Redux [ Pug / Scss / Babel ]
Stars: ✭ 15 (-11.76%)
Mutual labels:  japanese
wana kana rust
Utility library for checking and converting between Japanese characters - Hiragana, Katakana - and Romaji
Stars: ✭ 46 (+170.59%)
Mutual labels:  japanese

unidic-py

This is a version of UniDic packaged for use with pip.

It currently supports UniDic 3.1.0, the latest version. Note that this takes up 770MB on disk after installation. If you want a smaller package, try unidic-lite.

The data for this dictionary is hosted as part of the AWS Open Data Sponsorship Program. You can read the announcement here.

After installing via pip, you need to download the dictionary using the following command:

python -m unidic download
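
If you want to confirm that the download finished before creating a tagger, a check along these lines may help. This is only a sketch: the helper name `missing_dictionary_files` is made up for illustration, and the file list is an assumption based on what a compiled MeCab system dictionary usually contains, not something this package documents.

```python
import os

# Assumption: files a compiled MeCab system dictionary normally contains.
REQUIRED_FILES = ("dicrc", "sys.dic", "matrix.bin", "char.bin", "unk.dic")


def missing_dictionary_files(dicdir):
    """Return the names from REQUIRED_FILES that are not present in dicdir."""
    return [
        name
        for name in REQUIRED_FILES
        if not os.path.isfile(os.path.join(dicdir, name))
    ]


# Example (requires unidic to be installed):
#   import unidic
#   missing = missing_dictionary_files(unidic.DICDIR)
#   if missing:
#       print("Run `python -m unidic download` first; missing:", missing)
```

An empty list suggests the dictionary is in place; any names returned point to an incomplete or skipped download.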

With fugashi or mecab-python3, unidic is used automatically once it is installed. If you prefer, you can pass the MeCab arguments manually:

import fugashi
import unidic
tagger = fugashi.Tagger('-d "{}"'.format(unidic.DICDIR))
# that's it!
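
If your code should work with either unidic or unidic-lite, whichever is installed, a small helper can pick the dictionary directory and build the same `-d` argument shown above. This is a rough sketch: `find_dicdir` and `mecab_args` are hypothetical names, and it assumes that unidic-lite exposes a `DICDIR` attribute in the same way unidic does.

```python
import importlib


def find_dicdir(candidates=("unidic", "unidic_lite")):
    """Return DICDIR from the first importable candidate package, or None."""
    for name in candidates:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # package not installed, try the next one
        dicdir = getattr(module, "DICDIR", None)
        if dicdir:
            return dicdir
    return None


def mecab_args(dicdir):
    """Build the MeCab argument string, quoting the path as in the example above."""
    return '-d "{}"'.format(dicdir)


# Example (requires fugashi plus one of the dictionary packages):
#   import fugashi
#   dicdir = find_dicdir()
#   if dicdir is not None:
#       tagger = fugashi.Tagger(mecab_args(dicdir))
```

Listing unidic first means the full dictionary is preferred when both packages are present; swap the order if you would rather default to the smaller one.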

Differences from the Official UniDic Release

This has a few changes from the official UniDic release to make it easier to use.

  • entries for 令和 have been added
  • single-character numeric and alphabetic words have been deleted
  • unk.def has been modified so unknown punctuation won't be marked as a noun

See the extras directory for details on how to replicate the build process.

License

The modern Japanese UniDic is available under the GPL, LGPL, or BSD license; see here. UniDic is developed by NINJAL, the National Institute for Japanese Language and Linguistics. UniDic is copyrighted by the UniDic Consortium and is distributed here under the terms of the BSD License.

The code in this repository is not written or maintained by NINJAL. The code is available under the MIT or WTFPL License, as you prefer.
