sing1ee / Elasticsearch Jieba Plugin

License: MIT
jieba analysis plugin for elasticsearch 7.0.0, 6.4.0, 6.0.0, 5.4.0, 5.3.0, 5.2.2, 5.2.1, 5.2, 5.1.2, 5.1.1

Programming Languages

java

elasticsearch-jieba-plugin

jieba analysis plugin for elasticsearch: 7.7.0, 7.4.2, 7.3.0, 7.0.0, 6.4.0, 6.0.0, 5.4.0, 5.3.0, 5.2.2, 5.2.1, 5.2.0, 5.1.2, 5.1.1

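Features

  • Supports adding dictionaries dynamically, without restarting ES.

Adapting to different ES versions with simple changes

See here

Adding dictionaries dynamically without restarting ES

See here

Using jieba_index and jieba_search

See here

New tokenizer support

If you are on ES 6.4.0, use the latest code from the 6.4.0 branch or from the master branch. You can also download the 6.4.1 release. Upgrading is strongly recommended!

The 6.4.1 release fixes the PositionIncrement issue. For details, see "ES分词PositionIncrement解析" (an analysis of PositionIncrement in ES tokenization).

Version mapping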

Branch   Tag          Elasticsearch version   Release link
7.7.0    tag v7.7.1   v7.7.0                  Download: v7.7.0
7.4.2    tag v7.4.2   v7.4.2                  Download: v7.4.2
7.3.0    tag v7.3.0   v7.3.0                  Download: v7.3.0
7.0.0    tag v7.0.0   v7.0.0                  Download: v7.0.0
6.4.0    tag v6.4.1   v6.4.0                  Download: v6.4.1
6.4.0    tag v6.4.0   v6.4.0                  Download: v6.4.0
6.0.0    tag v6.0.0   v6.0.0                  Download: v6.0.1
5.4.0    tag v5.4.0   v5.4.0                  Download: v5.4.0
5.3.0    tag v5.3.0   v5.3.0                  Download: v5.3.0
5.2.2    tag v5.2.2   v5.2.2                  Download: v5.2.2
5.2.1    tag v5.2.1   v5.2.1                  Download: v5.2.1
5.2      tag v5.2.0   v5.2.0                  Download: v5.2.0
5.1.2    tag v5.1.2   v5.1.2                  Download: v5.1.2
5.1.1    tag v5.1.1   v5.1.1                  Download: v5.1.1
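
The version table can also be read as a simple lookup from Elasticsearch version to the release to download. The Python sketch below encodes the "Release link" column as a dictionary. The names `RELEASES` and `release_for` are hypothetical helpers for illustration, not part of the plugin. For 6.4.0, which has two releases, it picks v6.4.1, following the upgrade recommendation above.

```python
from typing import Optional

# Elasticsearch version -> release to download, copied from the table above.
# For 6.4.0 the table lists two releases; v6.4.1 is the recommended one.
RELEASES = {
    "7.7.0": "v7.7.0",
    "7.4.2": "v7.4.2",
    "7.3.0": "v7.3.0",
    "7.0.0": "v7.0.0",
    "6.4.0": "v6.4.1",
    "6.0.0": "v6.0.1",
    "5.4.0": "v5.4.0",
    "5.3.0": "v5.3.0",
    "5.2.2": "v5.2.2",
    "5.2.1": "v5.2.1",
    "5.2.0": "v5.2.0",
    "5.1.2": "v5.1.2",
    "5.1.1": "v5.1.1",
}


def release_for(es_version: str) -> Optional[str]:
    """Return the release listed for an exact ES version, or None if unlisted."""
    # Accept both "7.7.0" and "v7.7.0" as input.
    return RELEASES.get(es_version.lstrip("v"))


print(release_for("6.4.0"))
print(release_for("8.0.0"))
```

The lookup is an exact match on purpose: a version missing from the table returns `None` rather than falling back to the nearest release.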

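More details

  • Choose the source code version that matches your Elasticsearch version.
  • Clone the repository and build the plugin:
git clone https://github.com/sing1ee/elasticsearch-jieba-plugin.git --recursive
cd elasticsearch-jieba-plugin
./gradlew clean pz
  • Copy the zip file to the plugins directory (replace 5.1.2 with the version you built):
cp build/distributions/elasticsearch-jieba-plugin-5.1.2.zip ${path.home}/plugins
  • Unzip it in the plugins directory and remove the zip file:
cd ${path.home}/plugins
unzip elasticsearch-jieba-plugin-5.1.2.zip
rm elasticsearch-jieba-plugin-5.1.2.zip
  • Start Elasticsearch from ${path.home}:
./bin/elasticsearch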

Custom User Dict

Put your dictionary file, with the suffix .dict, into ${path.home}/plugins/jieba/dic. Each line holds a word followed by its frequency, like this:

小清新 3
百搭 3
显瘦 3
隨身碟 100
your_word word_freq
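
As a minimal sketch of the format above (one entry per line: the word, whitespace, then an integer frequency), the following Python snippet parses dictionary text and reports malformed lines before you copy the file into the plugin directory. The function name `parse_user_dict` and the strictness of the check are assumptions made for illustration; the plugin's own loader may be more lenient.

```python
from typing import List, Tuple


def parse_user_dict(text: str) -> Tuple[List[Tuple[str, int]], List[int]]:
    """Split dictionary text into (word, freq) entries and bad line numbers.

    Hypothetical helper: assumes each non-empty line is `word freq`,
    separated by whitespace, as in the sample above.
    """
    entries: List[Tuple[str, int]] = []
    bad_lines: List[int] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        parts = line.split()
        # Expect exactly two fields, the second a non-negative integer.
        if len(parts) == 2 and parts[1].isdecimal():
            entries.append((parts[0], int(parts[1])))
        else:
            bad_lines.append(lineno)
    return entries, bad_lines


sample = "小清新 3\n百搭 3\n显瘦 3\n隨身碟 100\n"
entries, bad_lines = parse_user_dict(sample)
print(entries)
print(bad_lines)
```

A non-empty `bad_lines` list points at the 1-based line numbers that do not match the `word freq` shape.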

Using stopwords

  • Find stopwords.txt in ${path.home}/plugins/jieba/dic.
  • Create a folder named stopwords under ${path.home}/config:
mkdir -p ${path.home}/config/stopwords
  • Copy stopwords.txt into the folder you just created:
cp ${path.home}/plugins/jieba/dic/stopwords.txt ${path.home}/config/stopwords
  • Create the index. The jieba_synonym filter below also expects a synonyms file at ${path.home}/config/synonyms/synonyms.txt:
PUT http://localhost:9200/jieba_index
{
  "settings": {
    "analysis": {
      "filter": {
        "jieba_stop": {
          "type":        "stop",
          "stopwords_path": "stopwords/stopwords.txt"
        },
        "jieba_synonym": {
          "type":        "synonym",
          "synonyms_path": "synonyms/synonyms.txt"
        }
      },
      "analyzer": {
        "my_ana": {
          "tokenizer": "jieba_index",
          "filter": [
            "lowercase",
            "jieba_stop",
            "jieba_synonym"
          ]
        }
      }
    }
  }
}
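
For readers who script this step, here is a sketch of the same index-creation request using only the Python standard library. The host `localhost:9200` and the index name `jieba_index` come from the request above; the function name `build_create_index_request` is hypothetical. The snippet only builds the request. Sending it assumes a running node with the plugin installed and both the stopwords and synonyms files in place.

```python
import json
import urllib.request

# Settings copied from the request body above.
SETTINGS = {
    "settings": {
        "analysis": {
            "filter": {
                "jieba_stop": {
                    "type": "stop",
                    "stopwords_path": "stopwords/stopwords.txt",
                },
                "jieba_synonym": {
                    "type": "synonym",
                    "synonyms_path": "synonyms/synonyms.txt",
                },
            },
            "analyzer": {
                "my_ana": {
                    "tokenizer": "jieba_index",
                    "filter": ["lowercase", "jieba_stop", "jieba_synonym"],
                }
            },
        }
    }
}


def build_create_index_request(
    base_url: str = "http://localhost:9200", index: str = "jieba_index"
) -> urllib.request.Request:
    """Build (but do not send) the PUT request that creates the index."""
    body = json.dumps(SETTINGS).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_create_index_request()
print(request.get_method(), request.full_url)

# To actually create the index (requires a running Elasticsearch node):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```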
  • Test the analyzer:
POST http://localhost:9200/jieba_index/_analyze
{
  "analyzer" : "my_ana",
  "text" : "黄河之水天上来"
}

The response looks like this:

{
    "tokens": [
        {
            "token": "黄河",
            "start_offset": 0,
            "end_offset": 2,
            "type": "word",
            "position": 0
        },
        {
            "token": "黄河之水天上来",
            "start_offset": 0,
            "end_offset": 7,
            "type": "word",
            "position": 0
        },
        {
            "token": "之水",
            "start_offset": 2,
            "end_offset": 4,
            "type": "word",
            "position": 1
        },
        {
            "token": "天上",
            "start_offset": 4,
            "end_offset": 6,
            "type": "word",
            "position": 2
        },
        {
            "token": "上来",
            "start_offset": 5,
            "end_offset": 7,
            "type": "word",
            "position": 2
        }
    ]
}
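
The response shows that jieba_index emits overlapping tokens: the full phrase and its sub-words, with several tokens sharing one position. The Python sketch below, written for illustration only, checks that each token is exactly the slice of the input given by its offsets and then groups tokens by position. The token list is copied from the response above, trimmed to the fields used, and `tokens_by_position` is a hypothetical helper name. Python indexes strings by code point, which matches the offsets here because every character in this text is in the Basic Multilingual Plane.

```python
from collections import defaultdict
from typing import Dict, List

text = "黄河之水天上来"

# Tokens copied from the _analyze response above ("type" omitted).
tokens = [
    {"token": "黄河", "start_offset": 0, "end_offset": 2, "position": 0},
    {"token": "黄河之水天上来", "start_offset": 0, "end_offset": 7, "position": 0},
    {"token": "之水", "start_offset": 2, "end_offset": 4, "position": 1},
    {"token": "天上", "start_offset": 4, "end_offset": 6, "position": 2},
    {"token": "上来", "start_offset": 5, "end_offset": 7, "position": 2},
]


def tokens_by_position(items: List[dict]) -> Dict[int, List[str]]:
    """Group token strings by their position value, keeping response order."""
    grouped: Dict[int, List[str]] = defaultdict(list)
    for item in items:
        grouped[item["position"]].append(item["token"])
    return dict(grouped)


# Every token should equal the slice of the input given by its offsets.
mismatches = [
    t for t in tokens if text[t["start_offset"]:t["end_offset"]] != t["token"]
]
grouped = tokens_by_position(tokens)

print(mismatches)
print(grouped)
```

Tokens that share a position, such as 天上 and 上来, occupy the same slot in the token stream even though their character offsets differ.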

Note

This plugin was migrated from jieba-solr.

Roadmap

I plan to add support for more analyzers:

  • Stanford Chinese analyzer
  • Fudan NLP analyzer
  • ...

If you have ideas, please open an issue and we can work on them together.
