
shikeio / Elasticsearch Analysis Hanlp

Licence: apache-2.0

Programming Languages

java

Projects that are alternatives of or similar to Elasticsearch Analysis Hanlp

Elastik Nearest Neighbors
Go to: https://github.com/alexklibisz/elastiknn
Stars: ✭ 249 (+538.46%)
Mutual labels:  elasticsearch-plugin
vector-search-plugin
Elasticsearch plugin for fast nearest neighbours of vectors (Similar use as FAISS)
Stars: ✭ 102 (+161.54%)
Mutual labels:  elasticsearch-plugin
Alerting
📟 Open Distro for Elasticsearch Alerting Plugin
Stars: ✭ 259 (+564.1%)
Mutual labels:  elasticsearch-plugin
elasticsearch-langfield
This plugin provides a useful feature for multi-language
Stars: ✭ 13 (-66.67%)
Mutual labels:  elasticsearch-plugin
rosette-elasticsearch-plugin
Document Enrichment plugin for Elasticsearch
Stars: ✭ 25 (-35.9%)
Mutual labels:  elasticsearch-plugin
elasticsearch-dynamic-synonym
Elasticsearch Plugin for Dynamic Synonym Token Filter.
Stars: ✭ 38 (-2.56%)
Mutual labels:  elasticsearch-plugin
Mirage
🎨 GUI for simplifying Elasticsearch Query DSL
Stars: ✭ 2,143 (+5394.87%)
Mutual labels:  elasticsearch-plugin
Elasticsearch Readonlyrest Plugin
Free Elasticsearch security plugin and Kibana security plugin: super-easy Kibana multi-tenancy, Encryption, Authentication, Authorization, Auditing
Stars: ✭ 917 (+2251.28%)
Mutual labels:  elasticsearch-plugin
elasticsearch-sudachi
The Japanese analysis plugin for elasticsearch
Stars: ✭ 129 (+230.77%)
Mutual labels:  elasticsearch-plugin
elasticsearch-analysis-synonym
NGramSynonymTokenizer for Elasticsearch
Stars: ✭ 25 (-35.9%)
Mutual labels:  elasticsearch-plugin
docker-curator
docker images for elasticsearch curator
Stars: ✭ 23 (-41.03%)
Mutual labels:  elasticsearch-plugin
elasticsearch plugin
Nodeos plugin for archiving blockchain data into Elasticsearch.
Stars: ✭ 57 (+46.15%)
Mutual labels:  elasticsearch-plugin
reactivesearch-api
API Gateway for Elasticsearch with declarative querying and out-of-the-box access controls
Stars: ✭ 146 (+274.36%)
Mutual labels:  elasticsearch-plugin
Elasticsearch
Elasticsearch is a real-time distributed search and analytics engine.
Stars: ✭ 23 (-41.03%)
Mutual labels:  elasticsearch-plugin
Elasticsearch Hq
Monitoring and Management Web Application for ElasticSearch instances and clusters.
Stars: ✭ 4,832 (+12289.74%)
Mutual labels:  elasticsearch-plugin
Emoji Search
😄 Emoji synonyms to build your own emoji-capable search engine (elasticsearch, solr)
Stars: ✭ 184 (+371.79%)
Mutual labels:  elasticsearch-plugin
elasticsearch-approximate-nearest-neighbor
Plugin to integrate approximate nearest neighbor(ANN) search with Elasticsearch
Stars: ✭ 53 (+35.9%)
Mutual labels:  elasticsearch-plugin
Elasticsearch Analysis Dynamic Synonym
Elasticsearch synonym hot-reload plugin. Supports updates from local files and from remote files over HTTP, and fixes several bugs.
Stars: ✭ 30 (-23.08%)
Mutual labels:  elasticsearch-plugin
Gem
💎 GUI for Data Modeling with Elasticsearch
Stars: ✭ 654 (+1576.92%)
Mutual labels:  elasticsearch-plugin
elasticsearch-keyboard-layout
Elasticsearch plugin for keyboard layout suggestions
Stars: ✭ 21 (-46.15%)
Mutual labels:  elasticsearch-plugin

Important

Thanks to these great projects:

  1. lucene
  2. elasticsearch
  3. HanLP

The package com.hankcs.lucene is copied from hanlp-lucene-plugin.

Issue

Custom dictionaries cannot be used under JDK 9, so targetCompatibility is set to 1.8.

All published releases were built on JDK 9.

Build and Install

Install lib

gradle mvn

Import HanLP data

  1. Download the HanLP data. See HanLP Releases.
  2. Modify the data root in the config: change ${data.root} to your own HanLP data root directory.

Modify Plugin Security Policy

Append the following line to the end of ${elasticsearchHome}/config/jvm.options:

-Djava.security.policy=file://${elasticsearchHome}/plugins/analysis-hanlp/plugin-security.policy
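If you script this step, the following is a minimal sketch, not part of the plugin (the class and method names are hypothetical). It builds the flag above for a concrete Elasticsearch home directory and appends it to config/jvm.options only when an identical line is not already present, so rerunning it does not add duplicates.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: not shipped with elasticsearch-analysis-hanlp.
public class AppendPolicyOption {

    /** Builds the security-policy JVM flag for a concrete Elasticsearch home directory. */
    static String policyOption(String elasticsearchHome) {
        return "-Djava.security.policy=file://" + elasticsearchHome
                + "/plugins/analysis-hanlp/plugin-security.policy";
    }

    /** Appends the line unless an identical line already exists; returns true if the file changed. */
    static boolean appendIfMissing(Path jvmOptions, String line) throws IOException {
        String text = new String(Files.readAllBytes(jvmOptions), StandardCharsets.UTF_8);
        for (String existing : text.split("\r?\n")) {
            if (existing.trim().equals(line)) {
                return false; // already configured, leave the file untouched
            }
        }
        // Start on a fresh line if the file does not already end with a newline.
        String prefix = text.isEmpty() || text.endsWith("\n") ? "" : "\n";
        Files.write(jvmOptions, (prefix + line + "\n").getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.APPEND);
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: AppendPolicyOption <elasticsearchHome>");
            return;
        }
        Path jvmOptions = Paths.get(args[0], "config", "jvm.options");
        boolean changed = appendIfMissing(jvmOptions, policyOption(args[0]));
        System.out.println(changed ? "appended policy option" : "policy option already present");
    }
}
```

Restart Elasticsearch after changing jvm.options so the new flag takes effect.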

Index and Highlight

Two analyzers are supported:

  1. HanLPAnalyzer: the standard analyzer, alias hanlp
  2. HanLPIndexAnalyzer: the index analyzer, alias hanlp-index

Test Analyzer

GET /_analyze

{
  "analyzer" : "hanlp-index",
  "text": ["中华人民共和国","地大物博"]
}

The response is:

{
  "tokens": [
    {
      "token": "中华人民共和国",
      "start_offset": 0,
      "end_offset": 7,
      "type": "ns",
      "position": 0
    },
    {
      "token": "中华人民",
      "start_offset": 0,
      "end_offset": 4,
      "type": "nz",
      "position": 1
    },
    {
      "token": "中华",
      "start_offset": 0,
      "end_offset": 2,
      "type": "nz",
      "position": 2
    },
    {
      "token": "华人",
      "start_offset": 1,
      "end_offset": 3,
      "type": "n",
      "position": 3
    },
    {
      "token": "人民共和国",
      "start_offset": 2,
      "end_offset": 7,
      "type": "nz",
      "position": 4
    },
    {
      "token": "人民",
      "start_offset": 2,
      "end_offset": 4,
      "type": "n",
      "position": 5
    },
    {
      "token": "共和国",
      "start_offset": 4,
      "end_offset": 7,
      "type": "n",
      "position": 6
    },
    {
      "token": "共和",
      "start_offset": 4,
      "end_offset": 6,
      "type": "n",
      "position": 7
    },
    {
      "token": "地大物博",
      "start_offset": 8,
      "end_offset": 12,
      "type": "nz",
      "position": 8
    },
    {
      "token": "地大",
      "start_offset": 8,
      "end_offset": 10,
      "type": "nz",
      "position": 9
    }
  ]
}
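The hanlp-index output above emits the full word and then every shorter dictionary word it contains, ordered by start offset and, within one start offset, longest first. The toy sketch below reproduces that shape to make the token list easier to read. It is NOT HanLP's segmentation algorithm: the class name, the brute-force substring scan and the ten-word dictionary are hypothetical, chosen only to match this one response.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy illustration of index-style output; not how HanLP actually segments text.
public class IndexModeSketch {

    /** A token with character offsets, mirroring the start_offset/end_offset fields above. */
    static final class Token {
        final String term;
        final int start;
        final int end;

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return term + "[" + start + "," + end + ")";
        }
    }

    /** For each start offset, emits every dictionary word (length >= 2) beginning there, longest first. */
    static List<Token> segment(String text, int baseOffset, Set<String> dictionary, int maxWordLength) {
        List<Token> tokens = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            int longestEnd = Math.min(text.length(), start + maxWordLength);
            for (int end = longestEnd; end >= start + 2; end--) {
                String candidate = text.substring(start, end);
                if (dictionary.contains(candidate)) {
                    tokens.add(new Token(candidate, baseOffset + start, baseOffset + end));
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical dictionary containing exactly the words seen in the response.
        Set<String> dictionary = new HashSet<>(Arrays.asList(
                "中华人民共和国", "中华人民", "中华", "华人", "人民共和国",
                "人民", "共和国", "共和", "地大物博", "地大"));
        List<Token> tokens = new ArrayList<>(segment("中华人民共和国", 0, dictionary, 7));
        // The second array value starts at offset 8: 7 characters plus an offset gap of 1.
        tokens.addAll(segment("地大物博", 8, dictionary, 7));
        for (int position = 0; position < tokens.size(); position++) {
            System.out.println(position + " " + tokens.get(position));
        }
    }
}
```

Printing the list index as the position gives the same token, offset and position sequence as the response; the part-of-speech "type" values come from HanLP and are not modeled here.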

Mapping

PUT test/_mapping/test

{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "hanlp-index",
      "search_analyzer": "hanlp-index",
      "index_options": "offsets"
    }
  }
}

Index Document

PUT /test/test/1

{
  "content": ["中华人民共和国","地大物博"]
}
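Because content is an array, the offsets of later values do not restart at 0. In the _analyze response above, 地大物博 starts at offset 8: the 7 characters of 中华人民共和国 plus a gap of 1, which is consistent with Lucene's default Analyzer offset gap. The sketch below (hypothetical class and method names, and it assumes that gap of 1) maps a global offset range back to the array value and local range it came from.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; assumes a fixed offset gap between the values of a multi-valued field.
public class ArrayOffsets {

    /** Returns {valueIndex, localStart, localEnd}, or null if the range is not inside a single value. */
    static int[] locate(List<String> values, int offsetGap, int start, int end) {
        int base = 0;
        for (int i = 0; i < values.size(); i++) {
            int length = values.get(i).length();
            if (start >= base && end <= base + length) {
                return new int[] {i, start - base, end - base};
            }
            // The next value begins after this value's characters plus the gap.
            base += length + offsetGap;
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> content = Arrays.asList("中华人民共和国", "地大物博");
        int[] hit = locate(content, 1, 8, 10); // offsets of the 地大 token in the _analyze response
        String value = content.get(hit[0]);
        System.out.println(value.substring(hit[1], hit[2]));
    }
}
```

Running main prints 地大, the token whose global offsets are 8 to 10.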

Highlight

POST /test/test/_search

{
  "query": {
    "match": {
      "content": "中华"
    }
  },
  "highlight": {
    "pre_tags": [
      "<tag1>"
    ],
    "post_tags": [
      "</tag1>"
    ],
    "fields": {
      "content": {}
    }
  }
}

The response is:

{
  "took": 384,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "test",
        "_type": "test",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "content": [
            "中华人民共和国",
            "地大物博"
          ]
        },
        "highlight": {
          "content": [
            "<tag1>中华</tag1>人民共和国"
          ]
        }
      }
    ]
  }
}
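Setting index_options to offsets in the mapping stores each token's start and end offsets in the index, which lets the highlighter locate 中华 at offsets [0, 2) of the first value without re-analyzing the text. The sketch below shows only the final step, wrapping known offset ranges in pre_tags/post_tags. It is a hypothetical illustration, not Elasticsearch's highlighter, and it assumes the ranges are sorted and non-overlapping.

```java
// Hypothetical illustration of tagging stored offsets; not Elasticsearch's highlighter.
public class HighlightSketch {

    /** ranges: sorted, non-overlapping [start, end) character ranges within value. */
    static String highlight(String value, int[][] ranges, String preTag, String postTag) {
        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (int[] range : ranges) {
            out.append(value, cursor, range[0]); // text before this match
            out.append(preTag).append(value, range[0], range[1]).append(postTag);
            cursor = range[1];
        }
        out.append(value.substring(cursor)); // text after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        // 中华 occupies offsets [0, 2) of the first content value.
        System.out.println(highlight("中华人民共和国", new int[][] {{0, 2}}, "<tag1>", "</tag1>"));
    }
}
```

Running main prints the same fragment as the highlight in the response: <tag1>中华</tag1>人民共和国.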