o19s / Elyzer

Licence: apache-2.0
"Stop worrying about Elasticsearch analyzers", my therapist says

Elyzer

See step-by-step how Elasticsearch custom analyzers decompose your text into tokens.

My therapist said this would be a good idea...

Doug:  hi
Eliza: How are you today.. What would you like to discuss?
Doug:  well I can't really see how my Elasticsearch analyzers work, on the inside
Eliza: How do you know you can't really see how your Elasticsearch analyzers work, on the inside?
Doug:  cause I call _analyze, and it does the whole analyzer or just one step
Eliza: What other reasons might there be?
Doug:  I guess nobody's gotten around to making a good debugger
Eliza: I'm not sure I understand you fully.
Doug:  well just keep reading this README then...
Eliza: Come, come, elucidate your thoughts.

Installation

Install through pip3

(ES 2.x & 5.x)

pip3 install elyzer

Usage

Give Elyzer some text and have it analyzed. See the process broken down step-by-step on the command line.

Assuming I'd created an english_bigrams custom analyzer for the index tmdb, I would run the analyzer like so:

$ elyzer --es "http://localhost:9200" --index tmdb --analyzer english_bigrams "Mary had a little lamb"
TOKENIZER: standard
{1:Mary}    {2:had} {3:a}   {4:little}  {5:lamb}    
TOKEN_FILTER: standard
{1:Mary}    {2:had} {3:a}   {4:little}  {5:lamb}    
TOKEN_FILTER: lowercase
{1:mary}    {2:had} {3:a}   {4:little}  {5:lamb}    
TOKEN_FILTER: porter_stem
{1:mari}    {2:had} {3:a}   {4:littl}   {5:lamb}    
TOKEN_FILTER: bigram_filter
{1:mari had}    {2:had a}   {3:a littl} {4:littl lamb}  

Output is each token, prefixed by the numerical position attribute in the token stream at each step.
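
To make that output format concrete, here is a minimal sketch of rendering one step. It is not Elyzer's actual code: format_tokens is a hypothetical helper, and the sample list is hand-written to mirror the last step above rather than captured from a cluster. The dict keys follow the token and position fields that Elasticsearch's _analyze API returns for each token.

```python
def format_tokens(tokens):
    """Render tokens as '{position:text}' cells separated by tabs.

    `tokens` is assumed to be a list of dicts with 'token' and 'position'
    keys, the same field names the _analyze API uses per token.
    """
    return "\t".join("{%d:%s}" % (t["position"], t["token"]) for t in tokens)


# Hand-written sample mirroring the bigram_filter step shown above
# (illustrative data, not real cluster output).
sample = [
    {"token": "mari had", "position": 1},
    {"token": "had a", "position": 2},
    {"token": "a littl", "position": 3},
    {"token": "littl lamb", "position": 4},
]

print("TOKEN_FILTER: bigram_filter")
print(format_tokens(sample))
```

Keeping the position next to the text is what lets you spot steps that merge, drop, or stack tokens, since the numbering changes when they do.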

Args

There are four required command line args:

  • es: the Elasticsearch host (e.g. http://localhost:9200)
  • index: name of the index where your custom analyzer can be found
  • analyzer: name of your custom analyzer
  • text: the text to analyze
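
If you'd rather drive Elyzer from a script, the four args above map onto an argument list like the one in this sketch. build_elyzer_command and run_elyzer are hypothetical helper names; only the --es, --index and --analyzer flags and the trailing text come from the usage example in this README. Actually running it assumes elyzer is installed and on your PATH, and that the cluster is reachable.

```python
import subprocess


def build_elyzer_command(es, index, analyzer, text):
    """Build the argv list for one elyzer invocation (hypothetical helper)."""
    # Passing the text as its own list element avoids shell-quoting problems.
    return ["elyzer", "--es", es, "--index", index, "--analyzer", analyzer, text]


def run_elyzer(es, index, analyzer, text):
    """Run elyzer and return its stdout.

    Assumes elyzer is on PATH and Elasticsearch is reachable at `es`.
    """
    cmd = build_elyzer_command(es, index, analyzer, text)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


cmd = build_elyzer_command(
    "http://localhost:9200", "tmdb", "english_bigrams", "Mary had a little lamb"
)
print(cmd)
```

Only build_elyzer_command is exercised here; call run_elyzer once you have a live cluster to point it at.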

Shortcomings

aka "Areas for Improvement"

  • Only works for custom analyzers right now (as it accesses the settings for your index)
  • Attributes besides the token text and position would be handy
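
The first shortcoming follows from where the analyzer definition lives: a custom analyzer is declared under the index's analysis settings, while a built-in one is not. The sketch below shows the kind of lookup involved, under stated assumptions: analyzer_steps is a hypothetical helper, the settings dict is hand-written in the shape of a GET /tmdb/_settings response, and none of this is taken from Elyzer's source.

```python
def analyzer_steps(settings, index, analyzer):
    """Return the char filters, tokenizer and token filters of a custom analyzer.

    `settings` is assumed to be shaped like a GET /<index>/_settings response.
    Raises KeyError if the analyzer is not defined in the index settings,
    which is the case for built-in analyzers.
    """
    analysis = settings[index]["settings"]["index"].get("analysis", {})
    analyzers = analysis.get("analyzer", {})
    if analyzer not in analyzers:
        raise KeyError("no custom analyzer %r in index %r" % (analyzer, index))
    definition = analyzers[analyzer]

    def as_list(value):
        # A single filter may be given as a bare string instead of a list.
        if value is None:
            return []
        return [value] if isinstance(value, str) else list(value)

    return {
        "char_filter": as_list(definition.get("char_filter")),
        "tokenizer": definition["tokenizer"],
        "filter": as_list(definition.get("filter")),
    }


# Hand-written settings matching the steps in the usage example (illustrative only).
settings = {
    "tmdb": {"settings": {"index": {"analysis": {"analyzer": {
        "english_bigrams": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["standard", "lowercase", "porter_stem", "bigram_filter"],
        }
    }}}}}
}

steps = analyzer_steps(settings, "tmdb", "english_bigrams")
print(steps["tokenizer"], steps["filter"])
```

Asking for a built-in analyzer raises KeyError here because nothing in the index settings describes it.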

Who?

Created by OpenSource Connections

License

Released under Apache 2
