mrgambal / Elasticsearch Ukrainian Lemmatizer

Licence: MIT
Ukrainian lemmatizer plugin for ElasticSearch

Ukrainian lemmatizer plugin for ElasticSearch [1.7 - 5.x]

The plugin lets ElasticSearch installations prior to version 6 (to be expanded) search across documents written in Ukrainian using words in different forms. Starting from version 5.0, ElasticSearch uses Lucene 6.2, which supports Ukrainian language analysis out of the box. However, this plugin is still worthwhile, I swear! It uses the latest and greatest from the BrUk project and, moreover, it allows specifying arbitrary stop words.

Principles

The plugin lets you index not the source words but their lemmas (a lemma is the canonical form of a word) and then look documents up using different forms of the same word. Needless to say, the magic happens under the hood! No more doubts like: "What if I put this word in the plural? Maybe it'll finally find something?". Before a term settles in storage, it passes through the analyzer, which checks whether there is a lemma for the term; if there is, the lemma goes into the index. The same sequence takes place when you search over documents stored using the analyzer: it converts your search terms according to the dictionary and returns results if there is any match. The plugin uses the dictionary from the BrUk project as its source of lemmas.
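
As a rough illustration of the idea (not the plugin's actual code), the sketch below maps a few word forms to one lemma and matches a query against stored words by comparing lemmas. The tiny hand-written lookup in `lemma_of`, the function names, and the choice to keep unknown words unchanged are all assumptions made for the example; the real plugin takes its lemmas from the BrUk dictionary inside the ES analyzer.

```shell
#!/bin/sh
# Toy stand-in for the dictionary lookup (hypothetical, hand-written table).
lemma_of() {
  case "$1" in
    гусяти|гусятам|гусятах) printf '%s\n' "гусеня" ;;
    *) printf '%s\n' "$1" ;;   # assumption: words without a known lemma stay as-is
  esac
}

# Print every stored word whose lemma equals the lemma of the query.
# Usage: search QUERY WORD...
search() {
  query_lemma=$(lemma_of "$1")
  shift                        # the remaining arguments are the stored words
  for word in "$@"; do
    if [ "$(lemma_of "$word")" = "$query_lemma" ]; then
      printf '%s\n' "$word"
    fi
  done
}

# The query form differs from both stored forms, yet both are found.
search "гусятах" "гусяти" "гусятам" "підострожує"
# prints:
# гусяти
# гусятам
```

Comparing lemmas on both sides is what makes the surface form of the query irrelevant: the stored word and the search term only have to share a canonical form.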

Get plugin

Note: I won't release a build for ES 2.2.0 due to an ugly bug.

You can always get the latest ready-to-go builds on the Releases page. Download the zip file that matches your ES version and install it with:

ES 1.7.+

<path_to_es_bin_dir>/plugin --url file://<path_to_distribution>/elasticsearch-ukrainian-lemmatizer-1.0-SNAPSHOT.zip --install ukrainian-lemmatizer

ES 2.0.0-2.4.6

<path_to_es_bin_dir>/plugin install file:<path_to_distribution>/elasticsearch-ukrainian-lemmatizer-<plugin_version>.zip

ES 5.x

<path_to_es_bin_dir>/elasticsearch-plugin install file:<path_to_distribution>/elasticsearch-ukrainian-lemmatizer-<plugin_version>.zip
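
If you script installations across several ES versions, the three command forms above can be chosen by major version. The helper below is only a sketch: `install_command` is a hypothetical name, the paths in the usage line are placeholders, and it only prints the command rather than running it. It looks at the major version alone and does not check whether a release exists for the exact ES version.

```shell
#!/bin/sh
# Print the install command for a given ES version (nothing is executed).
# Arguments: ES version, path to the ES bin dir, path to the plugin zip.
install_command() {
  es_version="$1"
  es_bin="$2"
  plugin_zip="$3"
  case "${es_version%%.*}" in   # keep only the major version number
    1) printf '%s\n' "$es_bin/plugin --url file://$plugin_zip --install ukrainian-lemmatizer" ;;
    2) printf '%s\n' "$es_bin/plugin install file:$plugin_zip" ;;
    5) printf '%s\n' "$es_bin/elasticsearch-plugin install file:$plugin_zip" ;;
    *) printf 'unsupported ES version: %s\n' "$es_version" >&2
       return 1 ;;
  esac
}

# Placeholder paths; substitute your own bin dir and downloaded zip.
install_command 5.6.16 /path/to/es/bin /path/to/plugin.zip
```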

Build the plugin

Building the plugin manually takes only a few steps:

  • Clone this repository
  • Get inside the root dir of the cloned repo and run gradle release
  • Find the built artifact in build/distributions/.

Usage

Here is a simple example of plugin usage that relies on the ES HTTP API. First we need to create an index that includes our analyzer. But let's do it in a slightly fancier way than usual: make it part of a custom analyzer with an additional list of stop words. In effect, only the word "гусята" is blacklisted.

# Create index with settings
curl -XPUT "http://localhost:9200/ukrainian/" -H 'Content-Type: application/json' -d '
{
    "settings": {
        "analysis": {
            "analyzer": {
                "my_ukrainian": {
                    "type": "ukrainian",
                    "stopwords": [
                        "гусята"
                    ]
                }
            }
        }
    }
}
'

Then we create a simple mapping:

# Define mapping
curl -XPOST "http://localhost:9200/ukrainian/user/_mapping" -H 'Content-Type: application/json' -d '
{
   "user":{
      "properties":{
         "test":{
            "type":"string",
            "analyzer":"my_ukrainian"
         }
      }
   }
}
'

And fill the index with some sample data:

# Create Documents
curl -XPOST "http://localhost:9200/ukrainian/user/_bulk" -H 'Content-Type: application/json' -d '
{"create": {"_id": 1}}
{ "test": "гусята" }
{"create": {"_id": 2}}
{ "test": "гусяти" }
{"create": {"_id": 3}}
{ "test": "гусятам" }
{"create": {"_id": 4}}
{ "test": "підострожує" }
{"create": {"_id": 5}}
{ "test": "п’яничка" }
'

With the index created and filled with some data, we can query it using the same analyzer:

# Search
curl -XPOST "http://localhost:9200/ukrainian/user/_search?pretty=true" -H 'Content-Type: application/json' -d '
{
   "query":{
      "match":{
         "test": {
             "query": "гусятах",
             "analyzer": "my_ukrainian"
         }
      }
   }
}
'

And here is what you'll receive:

{
    "took": 104,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 0.5945348,
        "hits": [{
            "_index": "ukrainian",
            "_type": "user",
            "_id": "AWE8Vt4G8T79yKC4TtYm",
            "_score": 0.5945348,
            "_source": {
                "test": "гусятам"
            }
        }, {
            "_index": "ukrainian",
            "_type": "user",
            "_id": "AWE8Vt4G8T79yKC4TtYo",
            "_score": 0.5945348,
            "_source": {
                "test": "гусяти"
            }
        }]
    }
}

Note that this particular example is available as test.sh inside the repository; you can use it to check that the plugin works after you install it.

Requirements

  • ES
    • 1.7.+ (release v1.0)
    • 2.0.0 (release v1.1.0)
    • 2.0.1 (release v1.1.1)
    • 2.0.2 (release v1.1.3)
    • 2.1.0 (release v1.2.0)
    • 2.1.1 (release v1.2.1)
    • 2.1.2 (release v1.2.2)
    • 2.2.1 (release v1.3.0)
    • 2.3.3 (release v1.4.1)
    • 2.3.5 (release v1.4.3)
    • 2.4.6 (release v1.5.3)
    • 5.6.16 (release v1.6.0)
  • Java 8
  • Gradle 6+