scriptin / jmdict-simplified

License: CC-BY-SA-4.0
JMdict, JMnedict, Kanjidic, KRADFILE/RADKFILE in JSON format

Programming languages: Kotlin, TypeScript, JavaScript
Projects that are alternatives to or similar to jmdict-simplified

jiten
jiten - japanese android/cli/web dictionary based on jmdict/kanjidic — 日本語 辞典 和英辞典 漢英字典 和独辞典 和蘭辞典
Stars: ✭ 64 (-33.33%)
Mutual labels:  dictionary, japanese, jmdict, kanjidic
unofficial-jisho-api
Encapsulates the official Jisho.org API and also provides kanji, example, and stroke diagram search.
Stars: ✭ 88 (-8.33%)
Mutual labels:  dictionary, japanese, japanese-language
Yomichan
Japanese pop-up dictionary extension for Chrome and Firefox.
Stars: ✭ 464 (+383.33%)
Mutual labels:  dictionary, japanese, japanese-language
jmdict-kindle
Japanese - English dictionary for Kindle based on the JMdict / EDICT database
Stars: ✭ 151 (+57.29%)
Mutual labels:  dictionary, japanese, jmdict
Ichiran
Linguistic tools for texts in Japanese language
Stars: ✭ 120 (+25%)
Mutual labels:  dictionary, japanese, japanese-language
Genki Study Resources
A collection of exercises for practicing what is taught in Genki: An Integrated Course in Elementary Japanese.
Stars: ✭ 232 (+141.67%)
Mutual labels:  japanese, japanese-language
dic-nico-intersection-pixiv
ニコニコ大百科とピクシブ百科事典の共通部分のIME辞書
Stars: ✭ 49 (-48.96%)
Mutual labels:  dictionary, japanese
nihongo
Japanese Dictionary
Stars: ✭ 77 (-19.79%)
Mutual labels:  dictionary, japanese
Languagepod101 Scraper
Python scraper for Language Pods such as Japanesepod101.com 👹 🗾 🍣 Compatible with Japanese, Chinese, French, German, Italian, Korean, Portuguese, Russian, Spanish and many more! ✨
Stars: ✭ 104 (+8.33%)
Mutual labels:  japanese, japanese-language
Jotoba
A free online, self-hostable, multilang Japanese dictionary.
Stars: ✭ 87 (-9.37%)
Mutual labels:  dictionary, japanese
python-doc-ja
Python ドキュメント日本語訳プロジェクト
Stars: ✭ 130 (+35.42%)
Mutual labels:  japanese, japanese-language
Mecab Ipadic Neologd
Neologism dictionary based on the language resources on the Web for mecab-ipadic
Stars: ✭ 2,408 (+2408.33%)
Mutual labels:  dictionary, japanese-language
Kanji Data Media
Japanese language data on kanji and radicals, media files, fonts and related resources from Kanji alive
Stars: ✭ 186 (+93.75%)
Mutual labels:  japanese, japanese-language
Memorize
🚀 Japanese-English-Mongolian dictionary. It lets you find words, kanji and more quickly and easily
Stars: ✭ 72 (-25%)
Mutual labels:  dictionary, japanese
Topokanji
Topologically ordered lists of kanji for effective learning
Stars: ✭ 108 (+12.5%)
Mutual labels:  japanese, japanese-language
Google Ime Dictionary
日英変換・英語略語展開のための IME 追加辞書 📙 日本語から英語への和英変換や英語略語の展開を Google 日本語入力や ATOK などで可能にする IME 拡張辞書です
Stars: ✭ 30 (-68.75%)
Mutual labels:  dictionary, japanese
Qolibri
Continuation of the qolibri EPWING dictionary/book reader
Stars: ✭ 82 (-14.58%)
Mutual labels:  dictionary, japanese
Kagome
Self-contained Japanese Morphological Analyzer written in pure Go
Stars: ✭ 554 (+477.08%)
Mutual labels:  japanese, japanese-language
The Tab Of Words
A minimal Chrome / Firefox extension to help you learn Japanese words in each new tab.
Stars: ✭ 94 (-2.08%)
Mutual labels:  japanese, japanese-language
Emoji Ime Dictionary
日本語で絵文字入力をするための IME 追加辞書 📙 Google 日本語入力などで日本語から絵文字への変換を可能にする IME 拡張辞書です
Stars: ✭ 172 (+79.17%)
Mutual labels:  dictionary, japanese

jmdict-simplified

JMdict, JMnedict, Kanjidic, and Kradfile/Radkfile in JSON format
with more comprehensible structure and beginner-friendly documentation

Download JSON files | Format docs

NPM package: @scriptin/jmdict-simplified-types
NPM package: @scriptin/jmdict-simplified-loader


Why?

The original XML files are less than ideal in terms of format. (This is only my opinion; the JMdict/JMnedict project in general is absolutely awesome!) This project makes the following changes and improvements:

  1. JSON instead of XML (or the custom text format of RADKFILE/KRADFILE). Because the original format uses some "advanced" XML features, such as entities and DOCTYPE, it can be quite difficult to use in some tech stacks, e.g. when your programming language of choice has no library that parses this syntax.
  2. Regular structure for every item in every collection, with no implicit "same as in previous" values. These are a problem in the original XML files because users' code has to keep track of state while traversing collections. In this project, I tried to make every item of every collection "self-contained," with every field carrying its own value, so there is no need to refer to preceding items.
  3. Avoiding null (with a few exceptions) and missing fields, preferring empty arrays. See http://thecodelesscode.com/case/6 for the inspiration.
  4. Human-readable field names instead of cryptic abbreviations with no explanations.
  5. Documentation in a single file instead of obscure pages scattered across multiple sites. In my opinion, the documentation is the weakest part of the JMdict/JMnedict project.
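
To illustrate point 2, here is a minimal TypeScript sketch of what "self-contained" means in practice. The `RawSense` and `Sense` types, the `partOfSpeech` and `gloss` field names, and the `makeSelfContained` function are hypothetical simplifications for this example, not the project's actual schema or converter code. The sketch only shows the idea: an omitted value is inherited from the preceding item, copied into the current one, and "no value" becomes an empty array rather than null.

```typescript
// Hypothetical input: a field may be omitted, meaning "same as in previous item".
interface RawSense {
  partOfSpeech?: string[];
  gloss: string[];
}

// Hypothetical output: every field is always present; "no value" is an empty array.
interface Sense {
  partOfSpeech: string[];
  gloss: string[];
}

function makeSelfContained(raw: RawSense[]): Sense[] {
  let inherited: string[] = []; // nothing to inherit before the first item
  return raw.map((item) => {
    if (item.partOfSpeech !== undefined && item.partOfSpeech.length > 0) {
      inherited = item.partOfSpeech;
    }
    // Copy the arrays so that items never share mutable state.
    return { partOfSpeech: [...inherited], gloss: [...item.gloss] };
  });
}

const senses = makeSelfContained([
  { partOfSpeech: ["n"], gloss: ["dictionary"] },
  { gloss: ["lexicon"] }, // implicitly "same as in previous"
]);

// Each item now carries its own part of speech, so a consumer can read any
// item in isolation without tracking state from earlier items.
console.log(senses.map((s) => s.partOfSpeech));
```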

Format

See the Format documentation or TypeScript types

Please also read the original documentation if you have more questions.

There are also Kotlin types, although they contain some methods and annotations you might not need.

Full, "common-only", and language-specific versions

There are two main types of JSON files for the JMdict dictionary:

  • full - same as the original files, with no entries omitted
  • "common-only" - contains only the dictionary entries considered "common". An entry is common if any of its /k_ele/ke_pri or /r_ele/re_pri elements in the XML files contains one of these markers: "news1", "ichi1", "spec1", "spec2", "gai1". One such element is enough for the whole word to be considered common. This corresponds to how online dictionaries such as https://jisho.org classify words as "common". Common-only distributions are much smaller. They are marked with the "common" keyword in file names; see the latest release
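
The "common" rule above can be sketched in TypeScript as follows. The marker list is taken from the text; the `EntryPriorities` shape and the `isCommon` name are assumptions made for this example, standing in for whatever representation of the `ke_pri` and `re_pri` values your own code uses. This is not the project's converter code.

```typescript
// Priority markers that make an entry "common", as listed above.
const COMMON_MARKERS: ReadonlySet<string> = new Set([
  "news1",
  "ichi1",
  "spec1",
  "spec2",
  "gai1",
]);

// Hypothetical, simplified view of one XML entry: the text values of all of
// its ke_pri elements (kanji) and re_pri elements (readings).
interface EntryPriorities {
  kePri: string[];
  rePri: string[];
}

// A single matching marker anywhere in the entry is enough.
function isCommon(entry: EntryPriorities): boolean {
  return [...entry.kePri, ...entry.rePri].some((marker) =>
    COMMON_MARKERS.has(marker),
  );
}

console.log(isCommon({ kePri: ["news2"], rePri: ["ichi1"] })); // true
console.log(isCommon({ kePri: ["news2", "nf30"], rePri: [] })); // false
```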

Also, JMdict and Kanjidic have language-specific versions with language codes (3-letter ISO 639-2 codes for JMdict, 2-letter ISO 639-1 codes for Kanjidic) in file names:

  • all - all languages, i.e. no language filter was applied
  • eng/en - English
  • ger/de - German
  • rus/ru - Russian
  • hun/hu - Hungarian
  • dut/nl - Dutch
  • spa/es - Spanish
  • fre/fr - French
  • swe/sv - Swedish
  • slv/sl - Slovenian

JMnedict has only one version, since it's (currently) English-only, and has no "common" indicators on entries.

Requirements for running the conversion script

You don't need to install Gradle; just use the Gradle wrapper provided in this repository: gradlew (for Linux/Mac) or gradlew.bat (for Windows).

Converting XML dictionaries

NOTE: You can grab the pre-built JSON files from the latest release.

Use the included scripts: gradlew (for Linux/Mac OS) or gradlew.bat (for Windows).

Tasks to convert dictionary files and create distribution archives:

  • ./gradlew clean - clean all build artifacts to start a fresh build, in cases when you need to re-download and convert from scratch
  • ./gradlew download - download and extract original dictionary XML files into build/dict-xml
  • ./gradlew convert - convert all dictionaries to JSON and place into build/dict-json
  • ./gradlew archive - create distribution archives (zip, tar+gzip) in build/distributions

Utility tasks (for CI/CD workflows):

  • ./gradlew --quiet jmdictHasChanged, ./gradlew --quiet jmnedictHasChanged, and ./gradlew --quiet kanjidicHasChanged - check whether dictionary files have changed by comparing checksums of the downloaded files with those stored in the checksums directory. Each outputs YES or NO. Run these only after the download task! The --quiet flag silences Gradle logs, e.g. when you need to put the values into environment variables.
  • ./gradlew updateChecksums - update checksum files in the checksums directory. Run it after creating distribution archives and commit the checksum files to the repository, so that next time the CI/CD workflow knows whether it needs to rebuild anything.
  • ./gradlew uberJar - create an Uber JAR for standalone use (i.e. without Gradle). The JAR program shows help messages and should be intuitive to use if you know how to run it.
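
As an illustration of how a CI/CD script might consume the YES/NO output of the `...HasChanged` tasks, here is a small TypeScript sketch. It assumes the task output has already been captured as a string by your workflow; `parseHasChanged` and `needsRebuild` are hypothetical helper names, not something this repository provides.

```typescript
// Convert the task output ("YES" or "NO", possibly with surrounding
// whitespace or a trailing newline) into a boolean.
function parseHasChanged(output: string): boolean {
  const value = output.trim();
  if (value === "YES") return true;
  if (value === "NO") return false;
  // Anything else probably means Gradle logs leaked in (missing --quiet)
  // or the task failed, so fail loudly instead of guessing.
  throw new Error(`Unexpected output: ${JSON.stringify(output)}`);
}

// Hypothetical helper: rebuild if at least one dictionary has changed.
function needsRebuild(outputs: string[]): boolean {
  return outputs.map(parseHasChanged).some((changed) => changed);
}

console.log(needsRebuild(["NO\n", "YES\n", "NO\n"])); // true
```

Throwing on unexpected input is a design choice for this sketch: a silent default of "NO" could hide a broken workflow and skip a needed rebuild.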

For the full list of available tasks, run ./gradlew tasks

Troubleshooting

  • Make sure to run tasks in order: download -> convert -> archive
  • If running Gradle fails, make sure java is available on your $PATH
  • Run Gradle with --stacktrace, --info, or --debug arguments to see more details if you get an error
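
The required task order (download -> convert -> archive) can be encoded once so that scripts never run a later step without the earlier ones. The following TypeScript sketch only builds the command strings; the `PIPELINE` constant and the `commandsUpTo` function are hypothetical names for this example, and actually executing the commands is left to your own tooling.

```typescript
// Task order taken from the troubleshooting note above.
const PIPELINE = ["download", "convert", "archive"] as const;
type Task = (typeof PIPELINE)[number];

// Return every command needed to reach `target`, in the required order.
// Pass "gradlew.bat" as the wrapper on Windows.
function commandsUpTo(target: Task, wrapper = "./gradlew"): string[] {
  const end = PIPELINE.indexOf(target);
  return PIPELINE.slice(0, end + 1).map((task) => `${wrapper} ${task}`);
}

console.log(commandsUpTo("convert"));
// [ './gradlew download', './gradlew convert' ]
```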

License

JMdict and JMnedict

The original XML files - JMdict.xml, JMdict_e.xml, and JMnedict.xml - are the property of the Electronic Dictionary Research and Development Group, and are used in conformance with the Group's license. The project was started in 1991 by Jim Breen.

All derived files are distributed under the same license, as the original license requires it.

Kanjidic

The original kanjidic2.xml file is released under Creative Commons Attribution-ShareAlike License v4.0. See the Copyright and Permissions section on the Kanjidic wiki for details.

All derived files are distributed under the same license, as the original license requires it.

RADKFILE/KRADFILE

The RADKFILE and KRADFILE files are copyrighted and available under the EDRDG Licence. The copyright of the RADKFILE2 and KRADFILE2 files is held by Jim Rose.

NPM packages

The NPM packages @scriptin/jmdict-simplified-types and @scriptin/jmdict-simplified-loader are available under the MIT license.

Other files

The source code and other files of this project, excluding the files and packages mentioned above, are available under the Creative Commons Attribution-ShareAlike License v4.0. See LICENSE.txt.
