
codelibs / elasticsearch-analysis-synonym

Licence: Apache-2.0 License
NGramSynonymTokenizer for Elasticsearch

Programming Languages

java

Projects that are alternatives to, or similar to, elasticsearch-analysis-synonym

elasticsearch-dynamic-synonym
Elasticsearch Plugin for Dynamic Synonym Token Filter.
Stars: ✭ 38 (+52%)
Mutual labels:  synonyms, elasticsearch-plugin
vector-search-plugin
Elasticsearch plugin for fast nearest neighbours of vectors (Similar use as FAISS)
Stars: ✭ 102 (+308%)
Mutual labels:  elasticsearch-plugin
Typesense
Fast, typo tolerant, fuzzy search engine for building delightful search experiences ⚡ 🔍 ✨ An Open Source alternative to Algolia and an Easier-to-Use alternative to ElasticSearch.
Stars: ✭ 8,644 (+34476%)
Mutual labels:  synonyms
querqy-elasticsearch
Querqy for Elasticsearch
Stars: ✭ 37 (+48%)
Mutual labels:  synonyms
wordhoard
This Python module can be used to obtain antonyms, synonyms, hypernyms, hyponyms, homophones and definitions.
Stars: ✭ 78 (+212%)
Mutual labels:  synonyms
elasticsearch-report-engine
An Elasticsearch plugin to return query results as either PDF,HTML or CSV.
Stars: ✭ 49 (+96%)
Mutual labels:  elasticsearch-plugin
Elastik Nearest Neighbors
Go to: https://github.com/alexklibisz/elastiknn
Stars: ✭ 249 (+896%)
Mutual labels:  elasticsearch-plugin
reactivesearch-api
API Gateway for Elasticsearch with declarative querying and out-of-the-box access controls
Stars: ✭ 146 (+484%)
Mutual labels:  elasticsearch-plugin
elasticsearch-sudachi
The Japanese analysis plugin for elasticsearch
Stars: ✭ 129 (+416%)
Mutual labels:  elasticsearch-plugin
elasticsearch-langfield
This plugin provides a useful feature for multi-language
Stars: ✭ 13 (-48%)
Mutual labels:  elasticsearch-plugin
syn
syn - the thesaurus
Stars: ✭ 45 (+80%)
Mutual labels:  synonyms
synonym-extractor
Extract synonyms, keywords from sentences using modified implementation of Aho Corasick algorithm
Stars: ✭ 38 (+52%)
Mutual labels:  synonyms
elasticsearch plugin
Nodeos plugin for archiving blockchain data into Elasticsearch.
Stars: ✭ 57 (+128%)
Mutual labels:  elasticsearch-plugin
Meilisearch
Powerful, fast, and an easy to use search engine
Stars: ✭ 20,236 (+80844%)
Mutual labels:  synonyms
elasticsearch-approximate-nearest-neighbor
Plugin to integrate approximate nearest neighbor(ANN) search with Elasticsearch
Stars: ✭ 53 (+112%)
Mutual labels:  elasticsearch-plugin
Synonyms
🌿 Chinese synonyms: a toolkit for chatbots and intelligent question answering
Stars: ✭ 4,027 (+16008%)
Mutual labels:  synonyms
docker-curator
docker images for elasticsearch curator
Stars: ✭ 23 (-8%)
Mutual labels:  elasticsearch-plugin
elasticsearch-keyboard-layout
Elasticsearch plugin for keyboard layout suggestions
Stars: ✭ 21 (-16%)
Mutual labels:  elasticsearch-plugin
node-thesaurus-com
Look up synonyms/antonyms on thesaurus.com.
Stars: ✭ 17 (-32%)
Mutual labels:  synonyms
rosette-elasticsearch-plugin
Document Enrichment plugin for Elasticsearch
Stars: ✭ 25 (+0%)
Mutual labels:  elasticsearch-plugin

Elasticsearch Analysis Synonym

Overview

Elasticsearch Analysis Synonym Plugin provides NGramSynonymTokenizer. For more details, see LUCENE-5252.
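To make the idea more concrete, here is a simplified Java sketch of what an n-gram synonym tokenizer does, based on our reading of LUCENE-5252. Words registered in the synonym dictionary are kept whole and emitted together with their synonyms, the remaining text is split into n-grams, and shorter "edge" grams are emitted next to registered words, which, as we understand it, lets phrases that straddle a registered word still match. This is not the plugin's code: the class name NGramSynonymSketch is hypothetical, greedy longest matching is an assumption, and token positions, offsets and Lucene's TokenStream API are ignored entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified illustration of the n-gram synonym idea.
// Not the plugin's implementation: no positions, offsets or TokenStream API.
class NGramSynonymSketch {

    // Dictionary words stay whole (followed by their synonyms); other text becomes n-grams.
    static List<String> tokenize(String text, List<List<String>> synonymGroups, int n) {
        Map<String, List<String>> groupOf = new HashMap<>();
        int maxLen = 0;
        for (List<String> group : synonymGroups) {
            for (String word : group) {
                groupOf.put(word, group);
                maxLen = Math.max(maxLen, word.length());
            }
        }
        List<String> tokens = new ArrayList<>();
        StringBuilder run = new StringBuilder(); // ordinary text between dictionary words
        boolean afterWord = false;               // was the current run preceded by a dictionary word?
        int i = 0;
        while (i < text.length()) {
            String match = null;
            // Assumption: greedy longest match of a dictionary word starting at i.
            for (int len = Math.min(maxLen, text.length() - i); len > 0; len--) {
                String candidate = text.substring(i, i + len);
                if (groupOf.containsKey(candidate)) { match = candidate; break; }
            }
            if (match == null) {
                run.append(text.charAt(i));
                i++;
                continue;
            }
            emitRun(run.toString(), n, afterWord, true, tokens);
            run.setLength(0);
            tokens.add(match);
            for (String synonym : groupOf.get(match)) {
                if (!synonym.equals(match)) tokens.add(synonym);
            }
            afterWord = true;
            i += match.length();
        }
        emitRun(run.toString(), n, afterWord, false, tokens);
        return tokens;
    }

    // Emits the n-grams of one run, plus (n-1)-length edge grams touching dictionary words.
    private static void emitRun(String run, int n, boolean afterWord, boolean beforeWord,
                                List<String> tokens) {
        if (run.isEmpty()) return;
        if (run.length() < n) { tokens.add(run); return; } // too short to n-gram: keep as-is
        int edge = n - 1;
        if (afterWord && edge > 0) tokens.add(run.substring(0, edge));
        for (int start = 0; start + n <= run.length(); start++) {
            tokens.add(run.substring(start, start + n));
        }
        if (beforeWord && edge > 0) tokens.add(run.substring(run.length() - edge));
    }

    public static void main(String[] args) {
        List<List<String>> synonyms = List.of(List.of("あ", "かき", "さしす", "たちつて", "なにぬねの"));
        System.out.println(tokenize("あいうえお", synonyms, 2));
    }
}
```

With the sample dictionary used later in this README and n = 2, the sketch turns "あいうえお" into あ, its four synonyms, and then い, いう, うえ, えお. The real tokenizer may order or position its tokens differently.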

Version

Versions in Maven Repository

Issues/Questions

Please file an issue. (A Japanese-language forum is also available.)

Installation

For 5.x

$ $ES_HOME/bin/elasticsearch-plugin install org.codelibs:elasticsearch-analysis-synonym:5.3.0

For 2.x

$ $ES_HOME/bin/plugin install org.codelibs/elasticsearch-analysis-synonym/2.4.0

Getting Started

Create synonym.txt File

First, create a synonym dictionary file, synonym.txt, in $ES_CONF (e.g. /etc/elasticsearch). The following content is only a sample:

$ cat /etc/elasticsearch/synonym.txt
あ,かき,さしす,たちつて,なにぬねの
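Presumably each line of the file lists one group of equivalent words. Assuming the common Solr-style format (comma-separated equivalents, one group per line, "#" for comment lines), a minimal Java sketch of reading such lines into groups could look like this. The class name SynonymLineParser is hypothetical, and "=>" mapping rules and escaped commas, which a real parser would need to handle, are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: parses Solr-style equivalence lines ("a,b,c") into word groups.
// Assumption: "#" starts a comment line; "=>" rules and escaping are not handled.
class SynonymLineParser {

    static List<List<String>> parse(List<String> lines) {
        List<List<String>> groups = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) continue; // skip blanks and comments
            List<String> group = new ArrayList<>();
            for (String word : trimmed.split(",")) {
                String w = word.trim();
                if (!w.isEmpty()) group.add(w);
            }
            if (!group.isEmpty()) groups.add(group);
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("あ,かき,さしす,たちつて,なにぬねの")));
    }
}
```

To parse the actual file, the lines could come from `Files.readAllLines(Path.of("/etc/elasticsearch/synonym.txt"))` (java.nio.file, Java 11 or later), which reads UTF-8 by default.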

Create Index

NGramSynonymTokenizer is registered as the "ngram_synonym" tokenizer type. The following request creates an index that uses it:

$ curl -XPUT localhost:9200/sample?pretty -d '
{
  "settings":{
    "index":{
      "analysis":{
        "tokenizer":{
          "2gram_synonym":{
            "type":"ngram_synonym",
            "n":"2",
            "synonyms_path":"synonym.txt"
          }
        },
        "analyzer":{
          "2gram_synonym_analyzer":{
            "type":"custom",
            "tokenizer":"2gram_synonym"
          }
        }
      }
    }
  },
  "mappings":{
    "item":{
      "properties":{
        "id":{
          "type":"string",
          "index":"not_analyzed"
        },
        "msg":{
          "type":"string",
          "analyzer":"2gram_synonym_analyzer"
        }
      }
    }
  }
}'

Then insert a document:

$ curl -XPOST localhost:9200/sample/item/1 -d '
{
  "id":"1",
  "msg":"あいうえお"
}'

Check Search Results

Try the following searches:

$ curl -XPOST "http://localhost:9200/sample/_search" -d '
{
   "query": {
      "match_phrase": {
         "msg": "あ"
      }
   }
}'

$ curl -XPOST "http://localhost:9200/sample/_search" -d '
{
   "query": {
      "match_phrase": {
         "msg": "あい"
      }
   }
}'

$ curl -XPOST "http://localhost:9200/sample/_search" -d '
{
   "query": {
      "match_phrase": {
         "msg": "かき"
      }
   }
}'

$ curl -XPOST "http://localhost:9200/sample/_search" -d '
{
   "query": {
      "match_phrase": {
         "msg": "かきい"
      }
   }
}'

Reload synonyms_path File Dynamically

Setting the "dynamic_reload" property to true makes NGramSynonymTokenizer reload the synonyms_path file on the fly (strictly speaking, the file is reloaded when the tokenizer's reset() method is called). To change the interval at which the file timestamp is checked, add "reload_interval".
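The reload behaviour described above can be sketched in Java as follows. This is a hypothetical illustration, not the plugin's implementation: the class ReloadingSynonymFile and its maybeReload method are invented names, the current time is passed in explicitly so the logic is easy to test, and a "reload_interval" of "10s" is assumed to correspond to 10,000 ms. At most once per interval, the sketch compares the file's modification time with the one it saw last and re-reads the file only if it changed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch of timestamp-based reloading with a check interval.
class ReloadingSynonymFile {
    private final Path path;
    private final long reloadIntervalMillis; // assumption: "10s" -> 10_000
    private long lastCheckedMillis;
    private long lastModifiedMillis;
    private List<String> lines;

    ReloadingSynonymFile(Path path, long reloadIntervalMillis, long nowMillis) throws IOException {
        this.path = path;
        this.reloadIntervalMillis = reloadIntervalMillis;
        this.lastCheckedMillis = nowMillis;
        this.lastModifiedMillis = Files.getLastModifiedTime(path).toMillis();
        this.lines = Files.readAllLines(path, StandardCharsets.UTF_8);
    }

    // In this sketch, a tokenizer would call this from reset().
    // Returns true only if the file was actually re-read.
    boolean maybeReload(long nowMillis) throws IOException {
        if (nowMillis - lastCheckedMillis < reloadIntervalMillis) return false; // too soon to check
        lastCheckedMillis = nowMillis;
        long modified = Files.getLastModifiedTime(path).toMillis();
        if (modified == lastModifiedMillis) return false; // timestamp unchanged
        lastModifiedMillis = modified;
        lines = Files.readAllLines(path, StandardCharsets.UTF_8);
        return true;
    }

    List<String> lines() {
        return lines;
    }
}
```

Checking the timestamp only once per interval keeps the per-reset() cost to a clock comparison in the common case, at the price of picking up edits up to one interval late.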

$ curl -XPUT localhost:9200/sample?pretty -d '
{
  "settings":{
    "index":{
      "analysis":{
        "tokenizer":{
          "2gram_synonym":{
            "type":"ngram_synonym",
            "n":"2",
            "synonyms_path":"synonym.txt",
            "dynamic_reload":true,
            "reload_interval":"10s"
          }
        },
...