microbun / Elasticsearch Thulac Plugin
thulac analysis plugin for elasticsearch
THULAC Analysis for Elasticsearch
A Chinese word-segmentation plugin for Elasticsearch, built on THULAC.
Versions
Plugin version | ES version | THULAC version | Link |
---|---|---|---|
master | 7.x -> master | lite | |
7.9.1 | 7.9.1 | lite | Download |
6.4.1-181027 | 6.4.1 | lite | Download |
6.4.0-181027 | 6.4.0 | lite | Download |
6.3.0-181027 | 6.3.0 | lite | Download |
6.2.0-181027 | 6.2.0 | lite | Download |
6.1.0-181027 | 6.1.0 | lite | Download |
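Install from a prebuilt package
Download a prebuilt plugin package and unzip it into the plugins directory of your Elasticsearch installation.
Build and install from source
1. Build the package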
git clone git@github.com:microbun/elasticsearch-thulac-plugin.git
cd elasticsearch-thulac-plugin
./gradlew release
2. Install into Elasticsearch
cp build/distributions/elasticsearch-thulac-plugin-7.9.1.zip ${ES_HOME}/plugins
cd ${ES_HOME}/plugins
unzip elasticsearch-thulac-plugin-7.9.1.zip
rm elasticsearch-thulac-plugin-7.9.1.zip
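After unzipping, the plugins directory contains a thulac folder:
thulac
|-elasticsearch-thulac-plugin-7.9.1.jar
|-models   # model directory
|-plugin-descriptor.properties
|-plugin.xml
3. The THULAC models are too large to bundle, so the plugin ships without model data. Download the lite models from the THULAC website and copy them into the models directory.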
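The layout described in steps 2 and 3 can be checked before starting Elasticsearch. The following is a minimal sketch, not part of the plugin: the function name `check_thulac_layout` is hypothetical, and it assumes only the file names shown in the tree above and that a populated models directory contains at least one file.

```shell
#!/bin/sh
# Hypothetical helper: verify that the thulac plugin folder matches the tree above.
# Usage: check_thulac_layout "${ES_HOME}/plugins/thulac"
check_thulac_layout() {
    dir=$1
    rc=0

    # The plugin descriptor must be present.
    if [ ! -f "$dir/plugin-descriptor.properties" ]; then
        echo "missing: plugin-descriptor.properties" >&2
        rc=1
    fi

    # At least one plugin jar must be present.
    if ! ls "$dir"/elasticsearch-thulac-plugin-*.jar >/dev/null 2>&1; then
        echo "missing: elasticsearch-thulac-plugin-*.jar" >&2
        rc=1
    fi

    # The models directory must exist and hold the downloaded model files.
    if [ ! -d "$dir/models" ] || [ -z "$(ls -A "$dir/models" 2>/dev/null)" ]; then
        echo "missing or empty: models/" >&2
        rc=1
    fi

    return $rc
}
```

A call such as `check_thulac_layout "${ES_HOME}/plugins/thulac"` returns a non-zero status and names what is missing, which makes a forgotten model download easy to spot.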
Examples
1. Create an index
1.1 Use the default analyzer
curl -H "Content-Type:application/json" -XPUT http://localhost:9200/index -d'
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "thulac"
      }
    }
  }
}
'
1.2 Define a custom tokenizer
curl -H "Content-Type:application/json" -XPUT http://localhost:9200/index -d'
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "custom_thulac_tokenizer": {
          "type": "thulac",
          "user_dict": "userdict.txt",
          "t2s": true,
          "filter": false
        }
      },
      "analyzer": {
        "custom_thulac_analyzer": {
          "tokenizer": "custom_thulac_tokenizer",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "custom_thulac_analyzer"
      }
    }
  }
}'
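Parameter | Meaning | Values |
---|---|---|
t2s | Convert Traditional Chinese to Simplified Chinese. Default: true | false/true |
filter | Use a filter to remove words that carry little meaning, such as “可以” ("can"). Default: false | false/true |
user_dict | Path to a custom dictionary file: UTF-8 encoded, one word per line. Default: userdict.txt | Relative path, e.g. userdict.txt (loads ${ES_HOME}/plugins/module/userdict.txt), or absolute path, e.g. /home/elasticsearch/userdict.txt |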
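The user_dict parameter expects a UTF-8 file with one word per line. The following is a minimal sketch of producing such a file. The function name `write_user_dict` and the sample words are illustrative and not part of the plugin, and the sketch assumes the shell session itself passes the words as UTF-8.

```shell
#!/bin/sh
# Hypothetical helper: write a user dictionary with one word per line.
# Usage: write_user_dict /path/to/userdict.txt word1 word2 ...
write_user_dict() {
    out=$1
    shift
    if [ "$#" -eq 0 ]; then
        # No words given: create an empty dictionary file.
        : > "$out"
        return 0
    fi
    # printf repeats its format for every remaining argument,
    # so each word ends up on its own line.
    printf '%s\n' "$@" > "$out"
}
```

For example, `write_user_dict /home/elasticsearch/userdict.txt 清华大学 分词` writes a two-line dictionary that can then be referenced by its absolute path in the user_dict setting.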
2. Inspect the index
curl http://localhost:9200/index
3. Test the analyzer
curl -H "Content-Type:application/json" -XPOST http://localhost:9200/index/_analyze -d'
{
  "analyzer": "thulac",
  "text": "我是中国人"
}
'
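The same request can be wrapped in small shell functions so that different analyzers are easy to compare on the same text. This is a sketch under stated assumptions: the function names `analyze_body` and `analyze_text` are hypothetical, Elasticsearch is assumed to listen on localhost:9200 as in the commands above, and the analyzer name and text must not contain double quotes or backslashes because no JSON escaping is done.

```shell
#!/bin/sh
# Hypothetical helper: build the JSON body for an _analyze request.
# Assumes neither argument contains double quotes or backslashes.
# Usage: analyze_body analyzer text
analyze_body() {
    printf '{"analyzer":"%s","text":"%s"}' "$1" "$2"
}

# Hypothetical helper: send text to the _analyze endpoint of an index.
# Usage: analyze_text index analyzer text
analyze_text() {
    curl -s -H "Content-Type:application/json" \
        -XPOST "http://localhost:9200/$1/_analyze" \
        -d "$(analyze_body "$2" "$3")"
}
```

With the index from step 1.2 in place, `analyze_text index thulac 我是中国人` and `analyze_text index custom_thulac_analyzer 我是中国人` return the tokens produced by the default and the custom analyzer respectively.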
4. Delete the index
curl -XDELETE http://localhost:9200/index