mcthulhu / jorkens

Licence: other
epub reader based on epub.js for foreign language learners

Programming languages: CSS, JavaScript, HTML

Projects that are alternatives of or similar to jorkens

kthoom
Comic Book Reader in the Browser
Stars: ✭ 97 (+177.14%)
Mutual labels:  epub, epub-reader
Folioreader Android
A Java ePub reader and parser framework for Android.
Stars: ✭ 2,025 (+5685.71%)
Mutual labels:  epub, epub-reader
termpub
Epubreader for the terminal
Stars: ✭ 25 (-28.57%)
Mutual labels:  epub, epub-reader
Folioreaderkit
📚 A Swift ePub reader and parser framework for iOS.
Stars: ✭ 2,382 (+6705.71%)
Mutual labels:  epub, epub-reader
readium-css
🌈 A set of reference stylesheets for EPUB Reading Systems, starting with Readium Mobile
Stars: ✭ 78 (+122.86%)
Mutual labels:  epub, epub-reader
epub-viewer
android epub viewer
Stars: ✭ 32 (-8.57%)
Mutual labels:  epub, epub-reader
quasar-epub-reader
📚 👓 An epub reader made with quasar framework and epubjs
Stars: ✭ 22 (-37.14%)
Mutual labels:  epub-reader, epubjs
onesync-reader-app
Cross-platform ebook reader built using Xamarin.Forms
Stars: ✭ 33 (-5.71%)
Mutual labels:  epub, epub-reader
R2D2BC
https://d-i-t-a.github.io/R2D2BC/
Stars: ✭ 27 (-22.86%)
Mutual labels:  epub, epub-reader
iRead
iRead is an EPUB reader for iOS written in Swift
Stars: ✭ 83 (+137.14%)
Mutual labels:  epub, epub-reader
EveReader
Epub Reader, focused on annotation.
Stars: ✭ 68 (+94.29%)
Mutual labels:  epub, epub-reader
pubcrawl
🍺📖 Convert 'epub' Files to Text (Use https://github.com/ropensci/epubr instead)
Stars: ✭ 22 (-37.14%)
Mutual labels:  epub
mw-thesaurus.el
Merriam-Webster Thesaurus in Emacs
Stars: ✭ 84 (+140%)
Mutual labels:  dictionary
vocabulary-titan
Chatbot for searching vocabulary on mainstream dictionaries
Stars: ✭ 70 (+100%)
Mutual labels:  dictionary
react-native-ebook
React Native E-book (.mobi, .epub)
Stars: ✭ 45 (+28.57%)
Mutual labels:  epub
condict
Dictionary software for constructed languages.
Stars: ✭ 21 (-40%)
Mutual labels:  dictionary
jiten
jiten - japanese android/cli/web dictionary based on jmdict/kanjidic — 日本語 辞典 和英辞典 漢英字典 和独辞典 和蘭辞典
Stars: ✭ 64 (+82.86%)
Mutual labels:  dictionary
Jotoba
A free online, self-hostable, multilang Japanese dictionary.
Stars: ✭ 87 (+148.57%)
Mutual labels:  dictionary
epub kitty
a beautiful flutter epub reader!
Stars: ✭ 49 (+40%)
Mutual labels:  epub
google-dictionary
An android library that provides easy access to meanings of any word, phrase, or slang via Google, within any application
Stars: ✭ 46 (+31.43%)
Mutual labels:  dictionary

jorkens

Jorkens is a desktop epub reader (an Electron application) based on epub.js and intended for foreign language learners. If Calibre is installed (recommended), Jorkens can also use Calibre's conversion tool to convert other ebook formats to epub transparently before opening them. Users can also extend Jorkens with their own Python scripts. In addition to this page and the Wiki, there is a subreddit, r/Jorkens, for discussions of Jorkens, though bug reports should be submitted as issues on the GitHub repository.

A binary installer can be downloaded from the releases page. The source code is likely to be more current, however, and can be run as described below.

Don't forget to look at the Wiki, which among other things has tips on sources of dictionary data, as well as a couple of sample Python plugins.

Prerequisites (Important)

Jorkens relies on the Stanza NLP library for lemmatization, so it expects to find Python and Stanza installed. (Python 3.8 is no longer required; the latest Python should be fine.) You also need to install the Stanza data for the language you're reading. Jorkens itself will download the Python script for lemmatization (see below under Lemmatization). For a few languages, TreeTagger may need to be installed instead of Stanza.
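A quick way to confirm that the Python interpreter on your machine can see Stanza is a small check like the one below. It is a standalone sketch, not part of Jorkens, and it assumes that the `python` you run it with is the same one Jorkens will call; the per-language Stanza data still has to be downloaded separately with `stanza.download()`.

```python
import importlib.util
import sys


def check_lemmatizer_prereqs() -> dict:
    """Report whether this interpreter looks ready for Stanza-based lemmatization."""
    return {
        # Python 3.8 is no longer required; any current Python 3 should do.
        "python_version": "%d.%d.%d" % sys.version_info[:3],
        # find_spec returns None when the package cannot be imported.
        "stanza_installed": importlib.util.find_spec("stanza") is not None,
    }


if __name__ == "__main__":
    for name, value in check_lemmatizer_prereqs().items():
        print(f"{name}: {value}")
```

If `stanza_installed` is False, `pip install stanza` with the same interpreter should fix it.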

As noted above, you should install Calibre if you need ebook format conversion; if you only read existing epubs, it is not necessary.
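The conversion step can be pictured roughly as follows. This is a sketch, not Jorkens' actual code: it assumes Calibre's `ebook-convert` command-line tool is on the PATH, and the helper names are hypothetical. Files that are already epubs are passed through untouched, which is why Calibre is optional.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source: str, out_dir: str) -> list:
    """Build an ebook-convert command that writes <stem>.epub into out_dir."""
    src = Path(source)
    target = Path(out_dir) / (src.stem + ".epub")
    # ebook-convert picks the output format from the target file's extension.
    return ["ebook-convert", str(src), str(target)]


def convert_to_epub(source: str, out_dir: str) -> str:
    """Return a path to an epub, converting through Calibre only when needed."""
    if Path(source).suffix.lower() == ".epub":
        return source  # already an epub; Calibre is not needed
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("Calibre's ebook-convert was not found on PATH")
    cmd = build_convert_command(source, out_dir)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if conversion fails
    return cmd[-1]
```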

Screen Shots (single and parallel books)

single book (Italian)

screen shot

parallel books side by side (Spanish and English)

screen shot

Functionality

Jorkens can search for definitions of foreign words in numerous online dictionaries, and also uses a SQLite database for a local glossary and translation memory. (Note: The database is initially empty, and needs to be populated before search results can be returned.) A highlighted word or partial word will be automatically looked up in the glossary, and any matches will be displayed; if none are found, a concordance search will be done next (see next paragraph). Glossaries can be imported from and exported to text files. See the Wiki for some useful sources of dictionary/glossary data.
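Since the glossary lives in SQLite, the lookup described above can be sketched with Python's built-in `sqlite3` module: exact matches first, then a prefix match for a partial word, with an empty result being the point where a concordance search would take over. The table name, columns, and tab-separated import/export format are assumptions for illustration, not Jorkens' actual schema or file format.

```python
import sqlite3


def open_glossary(path: str = ":memory:") -> sqlite3.Connection:
    # Hypothetical schema for illustration; Jorkens' real database layout may differ.
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS glossary (term TEXT, definition TEXT)")
    return conn


def import_glossary(conn: sqlite3.Connection, lines) -> int:
    """Import 'term<TAB>definition' lines; lines without a tab are skipped."""
    rows = [line.rstrip("\r\n").split("\t", 1) for line in lines if "\t" in line]
    conn.executemany("INSERT INTO glossary VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


def _prefix_pattern(text: str) -> str:
    # Escape LIKE wildcards so a highlighted '%' or '_' is matched literally.
    escaped = text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return escaped + "%"


def lookup(conn: sqlite3.Connection, highlighted: str) -> list:
    """Exact matches first; otherwise treat the selection as a partial word (prefix).

    An empty list is where a concordance search would be tried next.
    """
    exact = conn.execute(
        "SELECT term, definition FROM glossary WHERE term = ?", (highlighted,)
    ).fetchall()
    if exact:
        return exact
    return conn.execute(
        "SELECT term, definition FROM glossary WHERE term LIKE ? ESCAPE '\\' ORDER BY term",
        (_prefix_pattern(highlighted),),
    ).fetchall()


def export_glossary(conn: sqlite3.Connection) -> list:
    """One 'term<TAB>definition' line per entry, ready to write to a text file."""
    rows = conn.execute("SELECT term, definition FROM glossary ORDER BY term")
    return [f"{term}\t{definition}" for term, definition in rows]
```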

If you have versions of the same book in two languages, Jorkens can display them in a parallel text view, side by side (see the second image above).

The local translation memory can be used for bilingual concordance searches, showing all sentences where the highlighted word was found as well as their translations. It can thus function indirectly as a secondary dictionary. The results returned are currently limited to 100 hits. Jorkens can also perform similar concordance searches on the Linguee Web site. Translation memories can be imported from .tmx (Translation Memory eXchange) files.
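A rough model of the translation memory side: read segment pairs from a TMX file with the standard library's XML parser, store them in SQLite, and return at most 100 bilingual hits for a highlighted word. The table layout and function names are hypothetical; only the TMX structure (`tu` units holding one `tuv` per language, each with a `seg`) comes from the TMX format itself. The search here is a plain substring match, so "casa" also finds "casale".

```python
import sqlite3
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
MAX_HITS = 100  # mirrors the current limit on concordance results


def read_tmx(data, src_lang: str, tgt_lang: str):
    """Yield (source, target) segment pairs from TMX text or bytes."""
    root = ET.fromstring(data)
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            # TMX 1.4 uses xml:lang; older files use a plain lang attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # "es-ES" and "es" are both treated as "es".
                segs[lang.split("-")[0]] = "".join(seg.itertext()).strip()
        if src_lang in segs and tgt_lang in segs:
            yield segs[src_lang], segs[tgt_lang]


def build_tm(pairs) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical table for illustration only.
    conn.execute("CREATE TABLE tm (source TEXT, target TEXT)")
    conn.executemany("INSERT INTO tm VALUES (?, ?)", pairs)
    return conn


def concordance(conn: sqlite3.Connection, word: str) -> list:
    """Return up to MAX_HITS (sentence, translation) pairs containing word."""
    return conn.execute(
        "SELECT source, target FROM tm WHERE source LIKE ? LIMIT ?",
        ("%" + word + "%", MAX_HITS),
    ).fetchall()
```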

Jorkens can also search Google Images for highlighted words; image search results can be surprisingly useful for showing the meanings of foreign words, as well as finding images to use in flashcards.

Jorkens currently supports text-to-speech (TTS) using Windows TTS voices (as long as the user has installed the ones needed), as well as Amazon Polly for shorter passages (note that Amazon Polly voices can also be downloaded and installed locally). Jorkens can also search Forvo for pronunciations of individual words.

Jorkens supports machine translation mainly through Amazon Translate; some support for Google Translate has also been added. You may need AWS credentials on your machine for Amazon Translate and Amazon Polly to work correctly.

Jorkens has an internal flashcard database and a basic flashcard review mode: cards are presented in random order so you can test whether you know each one, and a score is kept for the current review session. This is not a spaced repetition system (SRS), though that may come later. However, Jorkens' flashcards can be exported to text files for import into Anki, a very good SRS program, and Anki can also be opened from the Jorkens menu.
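The review loop and the Anki export might look roughly like this. It is a sketch under assumptions: the `Card` fields, the `knows` callback standing in for the interactive prompt, and the tab-separated export layout are illustrative choices, not Jorkens' own format, although Anki can import plain text files with one note per line and tab-separated fields.

```python
import csv
import io
import random
from dataclasses import dataclass


@dataclass
class Card:
    front: str  # foreign word or phrase
    back: str   # translation or definition


def review(cards, knows, rng=random):
    """Show cards in random order; knows(card) returns True if the learner knew it.

    Returns (correct, total) for this session only; nothing is scheduled (no SRS).
    """
    order = list(cards)  # copy so the caller's list keeps its order
    rng.shuffle(order)
    correct = sum(1 for card in order if knows(card))
    return correct, len(order)


def export_for_anki(cards) -> str:
    """Tab-separated front/back lines, one card per line, for Anki's text import."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for card in cards:
        writer.writerow([card.front, card.back])
    return buf.getvalue()
```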

The Tools menu includes an option to generate a word frequency list and save it as a .csv file. Jorkens removes stopwords and lemmatizes the words before producing the list.
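The frequency list boils down to a few steps: tokenize, drop stopwords, map each remaining word to its lemma, count, and write CSV. In this sketch the stopword set and the lemma table are tiny hypothetical stand-ins for a real stopword list and for the Stanza or TreeTagger output Jorkens would use; the CSV column names are also illustrative.

```python
import csv
import re
from collections import Counter

# Tiny hypothetical stand-ins: a real run would use a full stopword list for
# the language and lemmas produced by Stanza or TreeTagger.
STOPWORDS = {"il", "la", "i", "e", "di", "che", "un", "una"}
LEMMAS = {"gatti": "gatto", "dorme": "dormire", "dormono": "dormire"}


def word_frequencies(text: str, stopwords=STOPWORDS, lemmas=LEMMAS) -> Counter:
    """Count lemmas in text after lowercasing and dropping stopwords."""
    counts: Counter = Counter()
    for word in re.findall(r"\w+", text.lower()):
        if word in stopwords:
            continue
        counts[lemmas.get(word, word)] += 1
    return counts


def save_frequency_csv(counts: Counter, path: str) -> None:
    """Write lemma,count rows, most frequent first."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["lemma", "count"])
        writer.writerows(counts.most_common())
```

Counting lemmas rather than surface forms is what lets "gatto" and "gatti" land in a single row.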

Users can run Python scripts, for example ones using natural language processing toolkits such as NLTK or spaCy, against the book's or chapter's text for further analysis. Scripts placed in Documents\Jorkens\Python should appear under the External/Python scripts menu. Some sample Python scripts can be found on the Wiki.
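As an idea of what such a script could compute, here is a standard-library-only sketch (no NLTK or spaCy) that reports sentence and word counts. How Jorkens actually hands the text to a script is not described here, so see the sample scripts on the Wiki for that; this sketch simply assumes the text arrives as a file path on the command line, or else on standard input.

```python
import re
import sys
from collections import Counter


def text_stats(text: str) -> dict:
    """Rough figures: sentences, words, mean sentence length, most common words."""
    sentences = [s for s in re.split(r"[.!?…]+", text) if s.strip()]
    words = re.findall(r"\w+", text.lower())
    return {
        "sentences": len(sentences),
        "words": len(words),
        "avg_sentence_len": round(len(words) / len(sentences), 2) if sentences else 0.0,
        "top_words": Counter(words).most_common(5),
    }


if __name__ == "__main__":
    # Assumption: text comes from a file named on the command line, else stdin.
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as f:
            text = f.read()
    else:
        text = sys.stdin.read()
    for key, value in text_stats(text).items():
        print(f"{key}: {value}")
```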

For future goals and desired features, see the Wiki.

Jorkens has so far been tested mostly on Windows 10, but it should be possible to build Linux and macOS versions with minor modifications. Jorkens seems to run fairly well from the source code under Linux. Compiled releases will be posted occasionally; in between, users should be able to run the current source code after installing Node.js, npm, and Electron, by executing 'npm start' at the command line in the main jorkens directory.

The name Jorkens is from the storyteller character in the short story collections by Lord Dunsany.

Lemmatization

Lemmatization, converting inflected forms of words to their dictionary forms (lemmas), greatly improves dictionary lookup; e.g. highlighting the Italian word "tuffiamo" will show results for the infinitive "tuffare." Jorkens now uses Stanza, the Stanford NLP Group's Python library, as the default lemmatizer for most languages, and TreeTagger support may be dropped in the future. Stanza supports 66 languages; see https://stanfordnlp.github.io/stanza/available_models.html for details. Users will need to have Python and Stanza installed (pip install stanza), and to download the models for the languages they need (e.g. "import stanza" and then "stanza.download('es')"). Jorkens will place the stanza-lemmatizer.py script, from the Python scripts page in the Wiki, in the Jorkens/Python folder under the Documents folder.
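The effect of lemmatization on lookup can be sketched as a two-step search: try the form as highlighted, then its lemma. Here a small dictionary stands in for Stanza so the example runs without any models; `make_stanza_lemmatizer` shows how a real Stanza pipeline could be plugged in instead, assuming Stanza and the model for the language are installed. This is not the code in stanza-lemmatizer.py, and lemmatizing a single word out of context is less reliable than lemmatizing whole sentences.

```python
from typing import Callable, Dict, List, Tuple

# Stand-in for Stanza output so the sketch runs without models (illustrative only).
FAKE_LEMMAS: Dict[str, str] = {"tuffiamo": "tuffare", "tuffato": "tuffare"}


def fake_lemmatize(word: str) -> str:
    return FAKE_LEMMAS.get(word, word)


def make_stanza_lemmatizer(lang: str) -> Callable[[str], str]:
    """Optional real lemmatizer; needs `pip install stanza` and stanza.download(lang)."""
    import stanza  # imported lazily so the rest of the sketch works without Stanza

    # "mwt" (multi-word token expansion) is used for languages such as Italian;
    # check the Stanza model list for the processors your language supports.
    nlp = stanza.Pipeline(lang, processors="tokenize,mwt,pos,lemma")

    def lemmatize(word: str) -> str:
        doc = nlp(word)
        if not doc.sentences or not doc.sentences[0].words:
            return word
        return doc.sentences[0].words[0].lemma or word

    return lemmatize


def lookup_with_lemma(
    word: str,
    glossary: Dict[str, str],
    lemmatize: Callable[[str], str] = fake_lemmatize,
) -> List[Tuple[str, str]]:
    """Try the highlighted form first, then its lemma; return (headword, definition)."""
    form = word.lower()
    candidates = [form]
    lemma = lemmatize(form)
    if lemma not in candidates:
        candidates.append(lemma)
    return [(c, glossary[c]) for c in candidates if c in glossary]
```

With Stanza available, `lookup_with_lemma("tuffiamo", glossary, make_stanza_lemmatizer("it"))` would use real lemmas instead of the stand-in table.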

Jorkens is also using TreeTagger from https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/ for some languages. Users who wish to take advantage of this should install the Windows version per the instructions at that link, as well as the parameter files for the foreign languages they need to use. The graphical interface mentioned there is not necessary. The languages supported by TreeTagger include German, English, French, Italian, Danish, Swedish, Norwegian, Dutch, Spanish, Bulgarian, Russian, Portuguese, Galician, Greek, Chinese, Swahili, Slovak, Slovenian, Latin, Estonian, Polish, Romanian, and Czech.

For Linux, Jorkens will expect the TreeTagger executable to be at ~/TreeTagger/bin/tree-tagger.
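A sketch of calling TreeTagger and reading its output, assuming the Linux path given above and a parameter file such as `italian.par` (the exact file name depends on which parameter files you downloaded). With the `-token` and `-lemma` options TreeTagger prints one `token<TAB>tag<TAB>lemma` line per token and writes `<unknown>` when it has no lemma; the bare `tree-tagger` binary expects pre-tokenized input with one token per line. The function names are hypothetical.

```python
from pathlib import Path
from typing import List, Tuple

# Location Jorkens expects on Linux, per the paragraph above.
LINUX_TREETAGGER = Path.home() / "TreeTagger" / "bin" / "tree-tagger"


def treetagger_command(param_file: str, input_file: str,
                       exe: Path = LINUX_TREETAGGER) -> List[str]:
    # -token and -lemma make TreeTagger print "token<TAB>tag<TAB>lemma" per line.
    return [str(exe), "-token", "-lemma", param_file, input_file]


def parse_treetagger(output: str) -> List[Tuple[str, str]]:
    """Return (token, lemma) pairs, keeping the token when the lemma is <unknown>."""
    pairs = []
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        token, _tag, lemma = parts
        pairs.append((token, token if lemma == "<unknown>" else lemma))
    return pairs
```

Falling back to the surface form keeps unknown words searchable in the glossary instead of dropping them.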

Licenses

  • epub.js - FreeBSD

  • sweetalert2 - MIT

  • Tabulator - MIT
