kshramt / company-ngram

company-ngram

A company backend for N-gram based completion.

This backend produces completion candidates by fuzzily matching the words before the cursor against N-gram data. The N-gram data is constructed automatically from the *.txt files placed directly under the directory named by company-ngram-data-dir. The N-gram order is set by company-ngram-n: if you set it to 4, the three words before the cursor are used to produce completion candidates.
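
As a rough illustration of the idea (not the backend's actual implementation), the sketch below builds a table from each (N-1)-word context to the words that follow it, then looks up the words before the cursor. The names build_ngram_table and candidates are hypothetical, tokenization by whitespace is an assumption, and only exact context matches are handled here.

```python
from collections import Counter, defaultdict


def build_ngram_table(text, n):
    """Map each (n-1)-word context to a Counter of the words that follow it.

    Assumes n >= 2 and whitespace-separated words (an illustrative choice).
    """
    words = text.split()
    table = defaultdict(Counter)
    for i in range(len(words) - n + 1):
        context = tuple(words[i:i + n - 1])
        table[context][words[i + n - 1]] += 1
    return table


def candidates(table, words_before_cursor, n):
    """Return next-word candidates for the last n-1 words, most frequent first."""
    context = tuple(words_before_cursor[-(n - 1):])
    return [word for word, _ in table.get(context, Counter()).most_common()]


corpus = "Dear Dr. Aki, thank you . Dear Dr. Aki, hello . Dear Dr. Aki, thank him"
table = build_ngram_table(corpus, 4)
print(candidates(table, ["Dear", "Dr.", "Aki,"], 4))
```

With N set to 4, only the last three words of the input take part in the lookup, which mirrors the role of company-ngram-n described above.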

To mitigate the data sparsity problem, this backend uses a fuzzy-matching strategy. Given the text Dear Dr. Aki, before the cursor, the backend produces completion candidates that match at least one of the following prefixes,

Dear Dr. Aki,
*    Dr. Aki,
Dear *   Aki,
*    *   Aki,
Dear Dr. *
*    Dr. *
Dear *   *

where * matches an arbitrary word. Hence, even if your *.txt files do not contain the word Aki, you still have a chance of getting completion candidates.
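
The prefix list above can be generated mechanically. The following is a minimal sketch, assuming each context word is either kept or replaced by a wildcard and that the all-wildcard pattern is excluded, as in the list. The names fuzzy_prefixes and matches are hypothetical and are not part of company-ngram.

```python
WILDCARD = "*"


def fuzzy_prefixes(words):
    """Return all patterns in which some, but not all, words are wildcards.

    Bit i of `mask` decides whether words[i] is replaced by the wildcard.
    The mask with every bit set (all wildcards) is deliberately skipped.
    """
    k = len(words)
    patterns = []
    for mask in range(2 ** k - 1):
        patterns.append(tuple(
            WILDCARD if (mask >> i) & 1 else word
            for i, word in enumerate(words)
        ))
    return patterns


def matches(pattern, context):
    """True if `context` agrees with `pattern` at every non-wildcard position."""
    return len(pattern) == len(context) and all(
        p == WILDCARD or p == c for p, c in zip(pattern, context)
    )


for pattern in fuzzy_prefixes(["Dear", "Dr.", "Aki,"]):
    print(" ".join(pattern))
```

For three context words this yields seven patterns, in the same order as the list above. A context such as Dear Dr. Kim, seen in the data would match the pattern Dear Dr. *, which is how a candidate can still be found when Aki never appears in the text files.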

Configurations

;; ~/.emacs.d/init.el

(with-eval-after-load 'company-ngram
  ;; ~/data/ngram/*.txt are used as data
  (setq company-ngram-data-dir "~/data/ngram")
  ;; company-ngram supports Python 3 or newer
  (setq company-ngram-python "python3")
  (company-ngram-init)
  ;; register the backend globally (a bare `cons' would discard its result),
  ;; or use `M-x turn-on-company-ngram' and
  ;; `M-x turn-off-company-ngram' on individual buffers
  (add-to-list 'company-backends 'company-ngram-backend)
  ;; save the cache of candidates whenever Emacs has been idle for
  ;; 7200 seconds (2 hours)
  (run-with-idle-timer 7200 t
                       (lambda ()
                         (company-ngram-command "save_cache"))))

(require 'company-ngram nil t)

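The RFC Editor site provides plain-text RFCs that are handy for a quick trial. The command below relies on brace expansion, so it assumes a shell such as bash or zsh.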

wget --directory-prefix ~/data/ngram https://www.rfc-editor.org/rfc/rfc{5661,6716,4949}.txt

License

The GNU General Public License version 3.
