THULAC Analysis for Elasticsearch

A Chinese word segmentation (analysis) plugin for Elasticsearch, built on THULAC.

Versions

Plugin version   ES version      THULAC version   Link
master           7.x -> master   lite
7.9.1            7.9.1           lite             Download
6.4.1-181027     6.4.1           lite             Download
6.4.0-181027     6.4.0           lite             Download
6.3.0-181027     6.3.0           lite             Download
6.2.0-181027     6.2.0           lite             Download
6.1.0-181027     6.1.0           lite             Download

Install from a download

Download a prebuilt plugin package and extract it into the plugins directory of Elasticsearch.

Build and install from source

1. Build the package

git clone git@github.com:microbun/elasticsearch-thulac-plugin.git
cd elasticsearch-thulac-plugin
./gradlew release

2. Install into Elasticsearch

cp build/distributions/elasticsearch-thulac-plugin-7.9.1.zip ${ES_HOME}/plugins
cd ${ES_HOME}/plugins
unzip elasticsearch-thulac-plugin-7.9.1.zip
rm elasticsearch-thulac-plugin-7.9.1.zip

After extraction, the plugins directory contains a thulac folder:

thulac
 |-elasticsearch-thulac-plugin-7.9.1.jar
 |-models # model directory
 |-plugin-descriptor.properties
 |-plugin.xml

3. Because the THULAC models are large, the plugin does not bundle the model data. Download the models (lite) from THULAC and copy them into the models directory.
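
Since the models are copied by hand, it can help to check the plugin directory before starting Elasticsearch. The following is a minimal sketch, not part of the plugin: the function name check_thulac_plugin is hypothetical, and it only checks the entries shown in the directory layout above. It does not know the names of the model files, so it only verifies that models is not empty.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that an unpacked thulac plugin directory looks complete.
# Usage: check_thulac_plugin /path/to/elasticsearch/plugins/thulac
check_thulac_plugin() {
  local dir="$1"
  local status=0

  # The descriptor file appears in the directory layout of the plugin.
  if [ ! -f "$dir/plugin-descriptor.properties" ]; then
    echo "missing: plugin-descriptor.properties"
    status=1
  fi

  # The jar name depends on the version that was built, so match by pattern.
  if ! ls "$dir"/elasticsearch-thulac-plugin-*.jar >/dev/null 2>&1; then
    echo "missing: elasticsearch-thulac-plugin-*.jar"
    status=1
  fi

  # The models directory must exist and contain at least one file.
  if [ ! -d "$dir/models" ] || [ -z "$(ls -A "$dir/models" 2>/dev/null)" ]; then
    echo "missing: model files in models/"
    status=1
  fi

  if [ "$status" -eq 0 ]; then
    echo "ok"
  fi
  return "$status"
}
```

For example, check_thulac_plugin "${ES_HOME}/plugins/thulac" prints ok when all three checks pass, and otherwise prints one line per missing item and returns a non-zero status.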

Examples

1. Create an index

1.1 Use the default analyzer

curl -H "Content-Type:application/json" -XPUT http://localhost:9200/index -d'
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "thulac"
      }
    }
  }
}
'

1.2 Define a custom analyzer

curl -H "Content-Type:application/json" -XPUT http://localhost:9200/index -d'
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "custom_thulac_tokenizer": {
          "type": "thulac",
          "user_dict": "userdict.txt",
          "t2s": true,
          "filter": false
        }
      },
      "analyzer": {
        "custom_thulac_analyzer": {
          "tokenizer": "custom_thulac_tokenizer",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "custom_thulac_analyzer"
      }
    }
  }
}'
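
Parameter   Description
t2s         Convert Traditional Chinese text to Simplified Chinese. Values: false/true. Default: true.
filter      Use a filter to remove words that carry little meaning, such as “可以” ("can"). Values: false/true. Default: false.
user_dict   Path to a custom dictionary with one word per line, UTF-8 encoded. Relative and absolute paths are both accepted. Relative path: userdict.txt loads ${ES_HOME}/plugins/module/userdict.txt. Absolute path: /home/elasticsearch/userdict.txt. Default: userdict.txt.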
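
The user_dict format (one word per line, UTF-8) can be produced with a few lines of shell. This is only a sketch: the function name write_user_dict and the sample words are hypothetical, and the file is UTF-8 only if the shell and terminal that run it use a UTF-8 locale.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a user dictionary with one word per line.
# Usage: write_user_dict /path/to/userdict.txt word1 word2 ...
write_user_dict() {
  local path="$1"
  shift

  # With no words, create (or truncate to) an empty file.
  if [ "$#" -eq 0 ]; then
    : > "$path"
    return 0
  fi

  # printf reuses the format for every remaining argument,
  # so each word ends up on its own line.
  printf '%s\n' "$@" > "$path"
}
```

For example, write_user_dict "${ES_HOME}/plugins/module/userdict.txt" 清华大学 分词器 writes a two-line dictionary at the relative-path location described in the table.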

2. View the index

curl http://localhost:9200/index

3. Test the segmentation

curl -H "Content-Type:application/json"  -XPOST http://localhost:9200/index/_analyze -d'
{
 "analyzer":"thulac", 
 "text":"我是中国人"
}
'
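
To compare the default thulac analyzer with the custom_thulac_analyzer from step 1.2 on the same text, the request can be wrapped in a small function. This is a sketch under assumptions: the names ES_URL, analyze_body and analyze are hypothetical, and the JSON escaping covers only backslashes and double quotes, so it suits plain text without control characters.

```shell
#!/usr/bin/env bash
# Assumed base URL; override by exporting ES_URL before sourcing this file.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the JSON body for an _analyze request.
# Only backslashes and double quotes in the text are escaped.
analyze_body() {
  local analyzer="$1"
  local text="$2"
  text=${text//\\/\\\\}
  text=${text//\"/\\\"}
  printf '{"analyzer":"%s","text":"%s"}' "$analyzer" "$text"
}

# Send the text to the _analyze endpoint of an index.
# Usage: analyze <index> <analyzer> <text>
analyze() {
  curl -s -H "Content-Type:application/json" \
    -XPOST "$ES_URL/$1/_analyze" -d "$(analyze_body "$2" "$3")"
}
```

With an index created as in step 1.2, analyze index thulac "我是中国人" and analyze index custom_thulac_analyzer "我是中国人" return the token lists of the two analyzers for side-by-side comparison.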

4. Delete the index

curl -XDELETE http://localhost:9200/index