
medcl / Elasticsearch Analysis Stconvert

Licence: apache-2.0
STConvert is an analyzer that converts Chinese characters between Traditional and Simplified.

Programming language: Java


STConvert Analysis for Elasticsearch

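STConvert is an analyzer that converts Chinese characters between Traditional and Simplified. [Chinese Simplified/Traditional conversion] [Simplified to Traditional] [Traditional to Simplified] [Simplified/Traditional query expansion]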

You can download the pre-built package from the release page.

The plugin provides four components, each named stconvert: an analyzer, a tokenizer, a token filter, and a char filter.

Supported config:

  • convert_type: default s2t; available options:

    1. s2t: convert characters from Simplified Chinese to Traditional Chinese
    2. t2s: convert characters from Traditional Chinese to Simplified Chinese
  • keep_both: default false

  • delimiter: default , (a comma)

Custom example:

PUT /stconvert/
{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "tsconvert" : {
                    "tokenizer" : "tsconvert"
                }
            },
            "tokenizer" : {
                "tsconvert" : {
                    "type" : "stconvert",
                    "delimiter" : "#",
                    "keep_both" : false,
                    "convert_type" : "t2s"
                }
            },
            "filter" : {
                "tsconvert" : {
                    "type" : "stconvert",
                    "delimiter" : "#",
                    "keep_both" : false,
                    "convert_type" : "t2s"
                }
            },
            "char_filter" : {
                "tsconvert" : {
                    "type" : "stconvert",
                    "convert_type" : "t2s"
                }
            }
        }
    }
}
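
The settings body above can also be generated programmatically, which helps when several indices need the same components with different options. The following is a minimal Python sketch, not part of the plugin: the helper name build_stconvert_settings is hypothetical, and it only uses the option names listed under "Supported config". It builds the request body and prints it; sending it to a cluster is left to whichever HTTP client you use.

```python
import json

# Option values documented in the "Supported config" list.
VALID_CONVERT_TYPES = {"s2t", "t2s"}


def build_stconvert_settings(name, convert_type="s2t", keep_both=False, delimiter=","):
    """Build an index-settings body registering an analyzer, tokenizer,
    token filter and char filter, all called `name`.

    Hypothetical helper for illustration; defaults mirror the documented ones.
    """
    if convert_type not in VALID_CONVERT_TYPES:
        raise ValueError(
            f"convert_type must be one of {sorted(VALID_CONVERT_TYPES)}"
        )
    # Options shared by the tokenizer and the token filter.
    component = {
        "type": "stconvert",
        "delimiter": delimiter,
        "keep_both": keep_both,
        "convert_type": convert_type,
    }
    return {
        "settings": {
            "analysis": {
                "analyzer": {name: {"tokenizer": name}},
                "tokenizer": {name: dict(component)},
                "filter": {name: dict(component)},
                # The char filter in the example above only sets convert_type.
                "char_filter": {
                    name: {"type": "stconvert", "convert_type": convert_type}
                },
            }
        }
    }


if __name__ == "__main__":
    body = build_stconvert_settings("tsconvert", convert_type="t2s", delimiter="#")
    print(json.dumps(body, indent=2, ensure_ascii=False))
```

Called with the arguments shown, the helper reproduces the body of the PUT /stconvert/ request above.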

Analyze tests

GET stconvert/_analyze
{
  "tokenizer" : "keyword",
  "filter" : ["lowercase"],
  "char_filter" : ["tsconvert"],
  "text" : "国际國際"
}

Output:
{
  "tokens": [
    {
      "token": "国际国际",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 0
    }
  ]
}
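
To check a response like this in a script rather than by eye, the token list can be pulled out of the JSON. The sketch below is only an illustration: the helper names are hypothetical, and the sample response is the one shown above, embedded as a string so the code runs without a cluster.

```python
import json

# The _analyze response shown above, embedded so the sketch is self-contained.
SAMPLE_RESPONSE = """
{
  "tokens": [
    {
      "token": "国际国际",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 0
    }
  ]
}
"""


def token_texts(response_json):
    """Return the token strings from an _analyze response, in order."""
    return [t["token"] for t in json.loads(response_json)["tokens"]]


def is_single_token_over(response_json, text):
    """True if the response holds exactly one token covering all of `text`.

    Offsets are compared with len(text), which matches for characters in the
    Basic Multilingual Plane such as the ones in this example.
    """
    tokens = json.loads(response_json)["tokens"]
    return (
        len(tokens) == 1
        and tokens[0]["start_offset"] == 0
        and tokens[0]["end_offset"] == len(text)
    )


if __name__ == "__main__":
    print(token_texts(SAMPLE_RESPONSE))
    print(is_single_token_over(SAMPLE_RESPONSE, "国际國際"))
```

The single token shows that the keyword tokenizer kept the input whole and that the t2s char filter rewrote the Traditional 國際 to the Simplified 国际.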

Normalizer usage

DELETE index
PUT index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "tsconvert": {
          "type": "stconvert",
          "convert_type": "t2s"
        }
      },
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [
            "tsconvert"
          ],
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "foo": {
        "type": "keyword",
        "normalizer": "my_normalizer"
      }
    }
  }
}

PUT index/_doc/1
{
  "foo": "國際"
}

PUT index/_doc/2
{
  "foo": "国际"
}

GET index/_search
{
  "query": {
    "term": {
      "foo": "国际"
    }
  }
}

GET index/_search
{
  "query": {
    "term": {
      "foo": "國際"
    }
  }
}
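
Because a keyword normalizer is applied both when a value is indexed and to the input of a term query, both searches above should return both documents. The Python sketch below imitates that behavior outside Elasticsearch. It is an illustration under loud assumptions: T2S_SAMPLE is a hypothetical two-character table standing in for the plugin's real Traditional-to-Simplified dictionary, and normalize and term_search are made-up helpers, not plugin or client APIs.

```python
# Hypothetical stand-in for the plugin's t2s mapping; covers this example only.
T2S_SAMPLE = {"國": "国", "際": "际"}

# The two documents indexed above.
DOCS = {
    "1": {"foo": "國際"},
    "2": {"foo": "国际"},
}


def normalize(value):
    """Imitate my_normalizer: t2s char filter first, then lowercase."""
    converted = "".join(T2S_SAMPLE.get(ch, ch) for ch in value)
    return converted.lower()


def term_search(docs, field, term):
    """Return ids of documents whose normalized field equals the normalized term."""
    wanted = normalize(term)
    return sorted(
        doc_id for doc_id, source in docs.items()
        if normalize(source[field]) == wanted
    )


if __name__ == "__main__":
    print(term_search(DOCS, "foo", "国际"))  # ['1', '2']
    print(term_search(DOCS, "foo", "國際"))  # ['1', '2']
```

Both the stored values and the query terms collapse to the Simplified form, so the script the query was typed in does not affect which documents match.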