
papahigh / elasticsearch-keyboard-layout

Licence: Apache-2.0 License
Elasticsearch plugin for keyboard layout suggestions

Programming Languages

java

Projects that are alternatives to or similar to elasticsearch-keyboard-layout

Strata
A keyboard layout for those who love Markdown and write in Russian
Stars: ✭ 70 (+233.33%)
Mutual labels:  keyboard-layout, russian
vim-plugin-ruscmd
Vim plugin: support command mode in Russian keyboard layout
Stars: ✭ 60 (+185.71%)
Mutual labels:  keyboard-layout, russian
ukrainian-typographic-layouts
Typographic keyboard layouts for the Ukrainian and Russian languages
Stars: ✭ 69 (+228.57%)
Mutual labels:  keyboard-layout, russian
keymapper
A cross-platform context-aware key remapper.
Stars: ✭ 39 (+85.71%)
Mutual labels:  keyboard-layout
DeepMorphy
A morphological analyzer for the Russian language, written in C# for .NET
Stars: ✭ 23 (+9.52%)
Mutual labels:  russian
elasticsearch-approximate-nearest-neighbor
Plugin to integrate approximate nearest neighbor(ANN) search with Elasticsearch
Stars: ✭ 53 (+152.38%)
Mutual labels:  elasticsearch-plugin
raise-ergo
⌨️ Raise Ergo: an ergonomic keyboard layout for the Dygma Raise keyboard, geared towards programming & command line on macOS & Ubuntu.
Stars: ✭ 30 (+42.86%)
Mutual labels:  keyboard-layout
spacy russian tokenizer
Custom Russian tokenizer for spaCy
Stars: ✭ 35 (+66.67%)
Mutual labels:  russian
elasticsearch-dynamic-synonym
Elasticsearch Plugin for Dynamic Synonym Token Filter.
Stars: ✭ 38 (+80.95%)
Mutual labels:  elasticsearch-plugin
udar
UDAR Does Accented Russian: A finite-state morphological analyzer of Russian that handles stressed wordforms.
Stars: ✭ 15 (-28.57%)
Mutual labels:  russian
simple-about-rust
Step-by-step lessons on the Rust programming language for beginners
Stars: ✭ 25 (+19.05%)
Mutual labels:  russian
NeoLayoutViewer
Keyboard Layout Viewer for Neo 2.
Stars: ✭ 24 (+14.29%)
Mutual labels:  keyboard-layout
SI
SIGame and related products
Stars: ✭ 89 (+323.81%)
Mutual labels:  russian
RooCMS
RooCMS - This is easy and convenient content management system designed to quickly create websites.
Stars: ✭ 21 (+0%)
Mutual labels:  russian
number-to-words
convert number into words (english, french, italian, roman, spanish, portuguese, belgium, dutch, swedish, polish, russian, iranian, roman, aegean)
Stars: ✭ 53 (+152.38%)
Mutual labels:  russian
russian-language
Russian Language Pack for Invision Community 4
Stars: ✭ 24 (+14.29%)
Mutual labels:  russian
soar
SQL Optimizer And Rewriter
Stars: ✭ 7,786 (+36976.19%)
Mutual labels:  suggestion
kiselyov
Geometry according to Kiselyov
Stars: ✭ 16 (-23.81%)
Mutual labels:  russian
MLSummerSchool
Materials for an elective course on machine learning and artificial intelligence
Stars: ✭ 27 (+28.57%)
Mutual labels:  russian
vector-search-plugin
Elasticsearch plugin for fast nearest neighbours of vectors (Similar use as FAISS)
Stars: ✭ 102 (+385.71%)
Mutual labels:  elasticsearch-plugin

Elasticsearch plugin for keyboard layout suggestions


This plugin exposes a keyboard_layout term suggester, which suggests terms as they would read if typed with the other keyboard layout active, i.e. when the user forgot to switch layouts.

Examples of suggestions this plugin helps to provide:
шзрщту ч 64пи               ⟶    iphone x 64gb
nt[yjkjus]                  ⟶    технології
dszdbo ytrfkmrb dsgflrfo    ⟶    выявіў некалькі выпадкаў
тшлу rhjccjdrb runner 2     ⟶    nike кроссовки runner 2
;tcnrbq lbcr 1n,            ⟶    жесткий диск 1тб
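Conceptually, each of these suggestions retypes a token on the same physical keys with the other layout active and, roughly speaking, keeps the result only if it is a known term. The following Python sketch illustrates that idea for the Russian ЙЦУКЕН and US QWERTY layouts only (Ukrainian and Belarusian would need their own tables for letters such as і, ї and ў). The mapping table, the in-memory vocabulary set and the names switch_token and suggest are assumptions for illustration; this is not the plugin's Java implementation, which fetches candidates from an index field.

```python
from typing import Set

# US QWERTY keys and the Russian ЙЦУКЕН letters on the same physical keys,
# row by row: backtick and top letter row, home row, bottom row.
QWERTY = "`qwertyuiop[]asdfghjkl;'zxcvbnm,."
RUSSIAN = "ёйцукенгшщзхъфывапролджэячсмитьбю"

# The two alphabets are disjoint, so one table can switch in both directions.
SWITCH = {**dict(zip(QWERTY, RUSSIAN)), **dict(zip(RUSSIAN, QWERTY))}


def switch_token(token: str) -> str:
    """Retype a lowercase token as if the other layout had been active.

    Characters outside the table (digits, spaces) are left unchanged.
    """
    return "".join(SWITCH.get(ch, ch) for ch in token)


def suggest(text: str, vocabulary: Set[str]) -> str:
    """Switch each whitespace-separated token only if the switched form is a
    known term and the token as typed is not (a simplifying assumption)."""
    result = []
    for token in text.split():
        switched = switch_token(token)
        if switched in vocabulary and token not in vocabulary:
            result.append(switched)
        else:
            result.append(token)
    return " ".join(result)


print(suggest("тшлу rhjccjdrb runner 2", {"nike", "кроссовки", "runner"}))
# prints: nike кроссовки runner 2
```

Switching only unknown tokens is a design choice of this sketch: it leaves already-correct tokens such as runner and 2 untouched, mirroring the mixed-language example above.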

The following keyboard layouts are supported:

  • Russian

  • Ukrainian

  • Belarusian

Feel free to open a pull request with any other keyboard layouts.

This plugin can be combined with the default term suggester, which is based on string similarity, to build a Google-like "did you mean?" search experience.
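A minimal sketch of such a combined request body, assuming Python on the client side: the same global suggest text is sent both to the built-in term suggester and to keyboard_layout. The suggestion names spelling and layout and the helper did_you_mean_body are hypothetical; the resulting dictionary would be sent as the body of POST _search.

```python
import json
from typing import Any, Dict


def did_you_mean_body(text: str, field: str, language: str) -> Dict[str, Any]:
    """Build a _search body that runs the term and keyboard_layout suggesters together."""
    return {
        "suggest": {
            # Global suggest text shared by both suggestions.
            "text": text,
            # Built-in Elasticsearch suggester based on string similarity (typos).
            "spelling": {"term": {"field": field}},
            # This plugin's suggester for text typed with the wrong layout active.
            "layout": {"keyboard_layout": {"field": field, "language": language}},
        }
    }


body = did_you_mean_body("шзрщту ч 64пи", "content", "russian")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

A client could then, for example, prefer the layout suggestion when it returns options and fall back to the spelling suggestion otherwise; that policy is up to the application.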

Installation

⚠️ Due to a serialization issue, this plugin is available only for Elasticsearch 7.0.0 and above.

To install the plugin, choose a version and run:

$ bin/elasticsearch-plugin install URL

where URL points to the zip file of the release that matches your Elasticsearch version.

The plugin must be installed on every node in the cluster, and each node must be restarted after installation.

For example, for Elasticsearch 7.6.0:

# install plugin on Elasticsearch 7.6.0
$ bin/elasticsearch-plugin install https://github.com/papahigh/elasticsearch-keyboard-layout/raw/7.6.0/dist/keyboard-layout-7.6.0.zip

After installation, the plugin exposes a new token filter and a new term suggester, both named keyboard_layout.

Getting started with Suggester

To use the keyboard_layout suggester, add a suggest section to a search request:

POST _search
{
  "suggest": {
    "text": "шЗрщту ЧЫ 64пи",
    "keyboard_suggestion": {
      "keyboard_layout": {
        "field": "content",
        "language": "russian",
        "lowercase_token": true,
        "preserve_case": true,
        "add_original": false
      }
    }
  }
}
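The request above enables both lowercase_token and preserve_case. The Python sketch below shows what the preserve_case step amounts to, assuming it re-applies the original token's capitalisation character by character to the lowercased, switched text. restore_case is a hypothetical helper, not the plugin's code, and it ignores shifted punctuation keys.

```python
def restore_case(original: str, switched: str) -> str:
    """Re-apply the original token's per-character capitalisation to its switched form."""
    # A one-to-one key mapping keeps the length unchanged; if it does not,
    # give up and return the switched text as is.
    if len(original) != len(switched):
        return switched
    return "".join(
        s.upper() if o.isupper() else s
        for o, s in zip(original, switched)
    )


# lowercase_token: "шЗрщту" is looked up as "шзрщту", which switches to "iphone";
# preserve_case then restores the capital letter on the second key.
print(restore_case("шЗрщту", "iphone"))  # iPhone
```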

For each token of the suggest text, the response contains the token's start offset and length and, if any were found, its switched keyboard layout options. Each entry in the options array includes the suggested text and its document frequency. To also get the original token and its frequency, set the add_original option.

{
  "suggest": {
    "keyboard_suggestion": [
      {
        "text": "шЗрщту",
        "offset": 0,
        "length": 6,
        "options": [
          {
            "text": "iPhone",
            "freq": 4,
            "switch": true
          }
        ]
      },
      {
        "text": "ЧЫ",
        "offset": 7,
        "length": 2,
        "options": [
          {
            "text": "XS",
            "freq": 2,
            "switch": true
          }
        ]
      },
      {
        "text": "64пи",
        "offset": 10,
        "length": 4,
        "options": [
          {
            "text": "64gb",
            "freq": 1,
            "switch": true
          }
        ]
      }
    ]
  }
  ...
}
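One way a client might use this response is to splice the best switched option for each token back into the original text by its offset and length. The Python sketch below assumes that the highest-frequency option with "switch": true is the one to use, that an original token returned via add_original would not carry "switch": true, and that offsets count characters the same way Python strings do (which holds for Cyrillic and Latin text). apply_suggestions is a hypothetical helper name.

```python
from typing import Any, Dict


def apply_suggestions(original: str, response: Dict[str, Any], name: str) -> str:
    """Rebuild the suggest text with the best switched option for each token."""
    result = original
    # Go right to left so that replacing one token never shifts the offsets
    # of tokens that are still to be processed.
    for entry in sorted(response["suggest"][name], key=lambda e: e["offset"], reverse=True):
        switched = [o for o in entry["options"] if o.get("switch")]
        if not switched:
            continue  # no switched-layout candidate: keep the token as typed
        best = max(switched, key=lambda o: o["freq"])
        start = entry["offset"]
        end = start + entry["length"]
        result = result[:start] + best["text"] + result[end:]
    return result


# The "suggest" part of the example response above.
response = {
    "suggest": {
        "keyboard_suggestion": [
            {"text": "шЗрщту", "offset": 0, "length": 6,
             "options": [{"text": "iPhone", "freq": 4, "switch": True}]},
            {"text": "ЧЫ", "offset": 7, "length": 2,
             "options": [{"text": "XS", "freq": 2, "switch": True}]},
            {"text": "64пи", "offset": 10, "length": 4,
             "options": [{"text": "64gb", "freq": 1, "switch": True}]},
        ]
    }
}
print(apply_suggestions("шЗрщту ЧЫ 64пи", response, "keyboard_suggestion"))  # iPhone XS 64gb
```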

An extension for the Go client github.com/olivere/elastic is available at: https://github.com/aaerofeev/go-elasic-keyboard-layout

Suggester options

The suggester supports the following options:

text

The suggest text. The suggest text is a required option that needs to be set globally or per suggestion.

field

The field to fetch the candidate suggestions from. This is a required option that needs to be set either globally or per suggestion.

language

The language of the keyboard layout. This is a required option. Supported values are: russian, belarusian, ukrainian.

analyzer

The analyzer to analyse the suggest text with. Defaults to the whitespace analyzer.

lowercase_token

Lowercases the terms after the suggest text has been analyzed and before their frequencies are evaluated. Defaults to false.

preserve_case

Whether the original case should be preserved in the switched suggest options. When lowercase_token is set to true, this option restores the original case. Defaults to false.

min_freq

The minimum number of documents a suggestion must appear in. It can be specified as an absolute number or as a relative fraction of the number of documents. Setting it can improve quality by suggesting only high-frequency terms. Defaults to 0, which disables it. If a value higher than 1 is specified, it cannot be fractional. Shard-level document frequencies are used for this option.

max_freq

The maximum number of documents a suggest text token can appear in and still be included. It can be specified as a relative fraction (e.g. 0.4) or as an absolute document frequency. If a value higher than 1 is specified, it cannot be fractional. Defaults to -1, which disables it. Use it to exclude high-frequency terms from switched keyboard layout suggestions. Shard-level document frequencies are used for this option.

add_original

Whether original term and its frequency should be included in the suggest options. Default is false.
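The Python sketch below builds a single keyboard_layout suggestion from the options listed above, sending only values that differ from the documented defaults and rejecting fractional min_freq/max_freq values above 1. It also includes resolve_threshold, a simplified reading of how a min_freq/max_freq value turns into a document count (values of 1 or more are absolute, smaller positive values are fractions of the document count, values of 0 or below disable the threshold); the exact rounding the plugin uses is an assumption. All function names are hypothetical.

```python
from typing import Any, Dict, Optional

SUPPORTED_LANGUAGES = {"russian", "ukrainian", "belarusian"}


def check_freq(name: str, value: float) -> None:
    """Values higher than 1 are absolute document counts and cannot be fractional."""
    if value > 1 and value != int(value):
        raise ValueError(f"{name} above 1 must be a whole number, got {value}")


def resolve_threshold(value: float, num_docs: int) -> Optional[int]:
    """Approximate document count for a min_freq/max_freq value (None = disabled)."""
    if value <= 0:
        return None  # min_freq default 0 and max_freq default -1 both land here
    if value >= 1:
        return int(value)  # absolute number of documents
    return int(value * num_docs)  # fraction of the shard's documents (rounding assumed)


def keyboard_layout_suggestion(
    field: str,
    language: str,
    text: Optional[str] = None,
    analyzer: Optional[str] = None,
    lowercase_token: bool = False,
    preserve_case: bool = False,
    min_freq: float = 0,
    max_freq: float = -1,
    add_original: bool = False,
) -> Dict[str, Any]:
    """Build one entry for the "suggest" section, sending only non-default options."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    check_freq("min_freq", min_freq)
    check_freq("max_freq", max_freq)
    options: Dict[str, Any] = {"field": field, "language": language}
    if analyzer is not None:  # otherwise the whitespace analyzer is used
        options["analyzer"] = analyzer
    if lowercase_token:
        options["lowercase_token"] = True
    if preserve_case:
        options["preserve_case"] = True
    if min_freq != 0:
        options["min_freq"] = min_freq
    if max_freq != -1:
        options["max_freq"] = max_freq
    if add_original:
        options["add_original"] = True
    suggestion: Dict[str, Any] = {"keyboard_layout": options}
    if text is not None:  # per-suggestion text; omit it when "text" is set globally
        suggestion["text"] = text
    return suggestion


# Rebuilds the "suggest" section of the earlier request example.
suggest = {
    "text": "шЗрщту ЧЫ 64пи",
    "keyboard_suggestion": keyboard_layout_suggestion(
        "content", "russian", lowercase_token=True, preserve_case=True
    ),
}
```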

Contribute

Use the issue tracker and/or open pull requests.

Licence

This project is released under version 2.0 of the Apache Licence.
