
tmalsburg / Guess Language.el

Emacs minor mode that detects the language you're typing in. Automatically switches spell checker. Supports multiple languages per document.



[[https://melpa.org/#/guess-language][file:https://melpa.org/packages/guess-language-badge.svg]]

guess-language: Emacs minor mode for robust automatic language detection

Emacs minor mode that detects the language of what you're typing. Automatically switches the spell checker and typo-mode (if present).

Key features:

- Detection algorithm is robust, efficient, and dead simple. Based on character trigrams.
- Support for many languages. More can be easily added.
- Stays out of your way. Set up once, then forget it exists.
- Works with documents written in multiple languages.

I write a lot of text in multiple languages and was getting tired of constantly having to switch the dictionary of my spell-checker. In true Emacs spirit, I decided to dust off my grandpa's parentheses and wrote some code to address this problem. The result is ~guess-language-mode~, a minor mode for Emacs that guesses the language of the current paragraph and then changes the dictionary of ispell and the language settings of typo-mode (if present). It also reruns Flyspell on the current paragraph, but only on that paragraph because I want to leave paragraphs in other languages untouched. Language guessing is triggered when Flyspell detects an unknown word, but only if the paragraph has enough material to allow for robust detection of the language (about 35 characters).
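To make the idea of trigram-based detection concrete, here is a minimal sketch. It is not the code used by guess-language: the function names, the tiny hand-picked profiles, and the simple "count matching trigrams" score are all assumptions made for illustration. The real package ships proper per-language statistics in its ~trigrams~ directory.

#+BEGIN_SRC elisp
(require 'cl-lib)

(defun my-trigrams (text)
  "Return the list of character trigrams in TEXT, lowercased."
  (let ((s (downcase text))
        (result '()))
    (dotimes (i (max 0 (- (length s) 2)))
      (push (substring s i (+ i 3)) result))
    (nreverse result)))

(defun my-trigram-score (text profile)
  "Count the trigrams of TEXT that occur in PROFILE, a list of strings."
  (cl-count-if (lambda (tri) (member tri profile))
               (my-trigrams text)))

(defun my-guess-language (text profiles)
  "Return the key of the entry in PROFILES that best matches TEXT.
PROFILES is an alist of (LANG . TRIGRAMS)."
  (let ((best nil)
        (best-score -1))
    (dolist (entry profiles best)
      (let ((score (my-trigram-score text (cdr entry))))
        (when (> score best-score)
          (setq best (car entry)
                best-score score))))))

;; Hypothetical toy profiles; real profiles contain hundreds of trigrams.
(defvar my-toy-profiles
  '((en . ("the" "he " " th" "ing" "and" " an" "nd "))
    (de . ("en " "er " "ch " "der" "die" "sch" "ein" "ich"))))

(my-guess-language "the cat and the dog" my-toy-profiles)
#+END_SRC

With so few trigrams per language, short inputs often score zero for every candidate, which illustrates why a minimum paragraph length is needed before guessing.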

Currently, the following languages are supported: Arabic, Czech, Danish, Dutch, English, Esperanto, Finnish, French, German, Italian, Norwegian, Polish, Portuguese, Russian, Serbian (Cyrillic and Latin), Slovak, Slovenian, Spanish, Swedish, Vietnamese.

It is easy to add more languages, and this repository includes the necessary language statistics for 47 additional languages. (These were copied from [[https://github.com/kent37/guess-language][guess_language.py]].) If we already have the required language data (see directory [[https://github.com/tmalsburg/guess-language.el/tree/master/trigrams][trigrams]]), all you need to do is to add an entry to the variable ~guess-language-langcodes~. See [[https://github.com/tmalsburg/guess-language.el/commit/bbafdeaf380c41e4546510df7c257b898b702d65][here]] for the commit that added support for Serbian. See the code of [[https://github.com/jorgenschaefer/typoel][typo-mode]] to determine the quoting style needed for the language that you’re adding. An overview of quoting styles across languages can be found on [[https://en.wikipedia.org/wiki/Quotation_mark][Wikipedia]]. PRs adding new languages are welcome.
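As a rough sketch of what such an entry could look like in a personal configuration, the snippet below registers Hungarian. It rests on several assumptions that you should check first: that trigram data for the code ~hu~ is present in the ~trigrams~ directory, that an Ispell dictionary named "hungarian" is installed, and that typo-mode has no matching setting (hence ~nil~).

#+BEGIN_SRC elisp
(with-eval-after-load 'guess-language
  ;; Assumed values: dictionary name "hungarian", no typo-mode setting.
  (add-to-list 'guess-language-langcodes '(hu . ("hungarian" nil)))
  ;; Also make the new language a detection candidate.
  (add-to-list 'guess-language-languages 'hu))
#+END_SRC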

** Prerequisites

This mode assumes that Flyspell is activated and configured for all relevant languages, i.e., those listed in ~guess-language-languages~. If [[https://github.com/jorgenschaefer/typoel][typo-mode]] is present, guess-language also changes the language there. Typo-mode is not a dependency, though.
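If Flyspell is not yet enabled in your setup, one common way to turn it on for text buffers is shown below. This is only a sketch. Installing the dictionaries themselves is done outside Emacs and depends on your spell-checking backend.

#+BEGIN_SRC elisp
;; Enable on-the-fly spell checking in all text-mode buffers.
(add-hook 'text-mode-hook #'flyspell-mode)
#+END_SRC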

** Installation

Guess-language-mode is available through [[https://melpa.org/#/guess-language][MELPA]].
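A minimal sketch of installing from MELPA with the built-in package manager follows, assuming MELPA is not yet in your ~package-archives~. If you already use MELPA, ~M-x package-install RET guess-language RET~ is enough.

#+BEGIN_SRC elisp
(require 'package)
(add-to-list 'package-archives '("melpa" . "https://melpa.org/packages/") t)
(package-initialize)

;; Install guess-language only if it is missing.
(unless (package-installed-p 'guess-language)
  (package-refresh-contents)
  (package-install 'guess-language))
#+END_SRC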

** Configuration

*** Language settings

#+BEGIN_SRC elisp
(require 'guess-language)

;; Optionally:
(setq guess-language-languages '(en de))
(setq guess-language-min-paragraph-length 35)
#+END_SRC

~guess-language-languages~ defines the candidate languages that should be considered. It is recommended to only include languages that are actually used because this improves performance. Languages are identified using ISO 639-1 codes (see table below).

~guess-language-min-paragraph-length~ specifies the minimal length that a paragraph needs to have before guess-language-mode starts guessing. Based on some informal tests, texts shorter than 30 characters are not enough to give good results, whereas above 40 characters the algorithm performs well. Of course, these numbers depend on the target language (some are easier to detect than others) and on the number of candidate languages that are considered. (Open the files in the directory ~testdata~ and do ~M-x guess-language-mark-lines~ to see for yourself.)

For each language, there is a default Ispell dictionary that guess-language-mode tries to use. However, for some languages there are several dictionaries available and guess-language can’t know which one you’d like to use. For example, there are several different dictionaries for German and for English. If the dictionary that guess-language-mode uses by default is not present, you will get an error message like the following:

#+BEGIN_SRC elisp
Error in post-command-hook (flyspell-post-command-hook): (error "Undefined dictionary: en")
#+END_SRC

In this case, use the variable ~guess-language-langcodes~ to tell guess-language-mode which dictionary should be used instead. For example, use the following definition if you want to use British English and Swiss German:

#+BEGIN_SRC elisp
(setq guess-language-langcodes
      '((en . ("en_GB" "English"))
        (de . ("de_CH" "German"))))
#+END_SRC

The key of each entry in this alist is an ISO 639-1 language code. The first element of the value is the name of the dictionary that should be used (i.e., what you would enter when setting the language via ~M-x ispell-change-dictionary~). The second element is the name of the language setting that should be used with typo-mode (if present). If a language is not supported by typo-mode or if you are not using typo-mode, enter ~nil~.
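If you only want to override a few entries and keep the defaults for all other languages, one possible alternative is to update individual keys of the alist. The sketch below assumes Emacs 25 or later (for ~setf~ on ~alist-get~). The dictionary names are taken from the examples in this document and must exist on your system.

#+BEGIN_SRC elisp
(with-eval-after-load 'guess-language
  ;; British English, with typo-mode's "English" setting.
  (setf (alist-get 'en guess-language-langcodes) '("en_GB" "English"))
  ;; Dutch, with no typo-mode setting.
  (setf (alist-get 'nl guess-language-langcodes) '("nederlands" nil)))
#+END_SRC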

For a list of all dictionaries available for spell-checking, use the following:

#+BEGIN_SRC elisp
(mapcar #'car ispell-dictionary-alist)
#+END_SRC

Languages that are currently supported by guess-language-mode:

| Language           | ISO 639-1 code | Default Ispell dictionary | Default typo-mode setting        |
|--------------------+----------------+---------------------------+----------------------------------|
| Arabic             | ar             | ar                        |                                  |
| Czech              | cs             | czech                     | Czech                            |
| Danish             | da             | dansk                     |                                  |
| Dutch              | nl             | nederlands                |                                  |
| English            | en             | en                        | English                          |
| Esperanto          | eo             | esperanto                 | English                          |
| Finnish            | fi             | finnish                   | Finnish                          |
| French             | fr             | francais                  | French                           |
| German             | de             | de                        | German                           |
| Italian            | it             | italiano                  | Italian                          |
| Norwegian          | nb             | norsk                     |                                  |
| Polish             | pl             | polish                    |                                  |
| Portuguese         | pt             | portuguese                |                                  |
| Russian            | ru             | russian                   | Russian                          |
| Serbian (Cyrillic) | sr             | serbian                   | German (most similar to Serbian) |
| Serbian (Latin)    | sr             | sr-lat                    | German (most similar to Serbian) |
| Slovak             | sk             | slovak                    |                                  |
| Slovenian          | sl             | slovenian                 |                                  |
| Spanish            | es             | spanish                   |                                  |
| Swedish            | sv             | svenska                   |                                  |
| Vietnamese         | vi             | viet                      |                                  |

*** Custom functions to be run when a new language is detected

While changing the spell-checker’s dictionary is the main purpose of guess-language, there are other things that a user might want to do when a new language is detected. For instance, a user might want to change the input method. Things like that can be easily achieved by adding custom functions to the hook ~guess-language-after-detection-functions~. Functions on this hook take three arguments:

- ~LANG~: the ISO 639-1 code of the detected language (as a symbol).
- ~BEGINNING~: the start of the region in which the language was detected.
- ~END~: the end of that region.

Template:

#+BEGIN_SRC elisp
(defun my-custom-function (lang beginning end)
  (do-something))

(add-hook 'guess-language-after-detection-functions #'my-custom-function)
#+END_SRC
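As one possible instance of this template, the sketch below switches the input method when Russian is detected and turns it off otherwise. The function name ~my-switch-input-method~ is hypothetical, and the choice of the "russian-computer" input method is an assumption. Substitute whatever input methods you actually use.

#+BEGIN_SRC elisp
(defun my-switch-input-method (lang _beginning _end)
  "Switch the input method according to LANG.

LANG is the ISO 639-1 code of the language (as a symbol).
The region endpoints are ignored."
  (pcase lang
    ;; Assumed mapping; adapt to your languages and input methods.
    ('ru (set-input-method "russian-computer"))
    (_ (deactivate-input-method))))

(add-hook 'guess-language-after-detection-functions #'my-switch-input-method)
#+END_SRC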

** Usage

Activate ~guess-language-mode~ in the buffer in which you want to use it. To activate it automatically in buffers containing text (as opposed to code), add guess-language-mode to ~text-mode-hook~:

#+BEGIN_SRC elisp
(add-hook 'text-mode-hook (lambda () (guess-language-mode 1)))
#+END_SRC

*** Changing the voice used by the Festival text-to-speech system

The code snippet below illustrates how guess-language can be configured to automatically change the voice used by the text-to-speech engine [[http://www.cstr.ed.ac.uk/projects/festival/][Festival]] (install [[https://www.emacswiki.org/emacs/festival.el][festival.el]] for this to work):

#+BEGIN_SRC elisp
(defun guess-language-switch-festival-function (lang beginning end)
  "Switch the voice used by festival.

LANG is the ISO 639-1 code of the language (as a symbol).
BEGINNING and END are the endpoints of the region in which LANG
was detected but these are ignored."
  (when (and (featurep 'festival)
             (festivalp))
    (pcase lang
      ('en (festival-voice-english-female))
      ('de (festival-voice-german-female)))))

(add-hook 'guess-language-after-detection-functions #'guess-language-switch-festival-function)
#+END_SRC

The ~pcase~ needs to be modified to use the voices that are installed on your system. Refer to the documentation of Festival for details.

*** Changing the language of Synosaurus

[[https://github.com/hpdeifel/synosaurus][Synosaurus]] is an Emacs package providing access to a German or English thesaurus. Using the code below, the language of the thesaurus is automatically changed to the language of the current paragraph. Refer to the documentation of Synosaurus for details.

#+BEGIN_SRC elisp
(defun guess-language-switch-synosaurus (lang beginning end)
  "Switch the thesaurus language.

LANG is the ISO 639-1 code of the language (as a symbol).
BEGINNING and END are the endpoints of the region in which LANG
was detected. These are ignored."
  (when (featurep 'synosaurus)
    (pcase lang
      ('en (setq synosaurus-backend 'synosaurus-backend-wordnet))
      ('de (setq synosaurus-backend 'synosaurus-backend-openthesaurus)))))

(add-hook 'guess-language-after-detection-functions #'guess-language-switch-synosaurus)
#+END_SRC

** Notes

Support for Latin Serbian is based on trigrams transliterated from Cyrillic Serbian. Since some Cyrillic trigrams transliterate to 4-grams in Latin, we truncated those to three characters, which leaves two duplicates, "e n" and "ra ". This is not ideal, but the results are probably still robust enough. Nonetheless, it would be good if someone could compute proper Latin trigrams one day.
