
agora-team / Elasticsearch Synonyms

Licence: mit
Curated synonym files and helpers for the Elasticsearch synonym token filter

Programming Languages

shell

Projects that are alternatives to or similar to Elasticsearch Synonyms

Code4java
Repository for my java projects.
Stars: ✭ 164 (+221.57%)
Mutual labels:  solr, elasticsearch
Typo3 Docker Boilerplate
🍲 TYPO3 Docker Boilerplate project (NGINX, Apache HTTPd, PHP-FPM, MySQL, Solr, Elasticsearch, Redis, FTP)
Stars: ✭ 240 (+370.59%)
Mutual labels:  solr, elasticsearch
Query Translator
Query Translator is a search query translator with AST representation
Stars: ✭ 165 (+223.53%)
Mutual labels:  solr, elasticsearch
Ik Analyzer
Supports Lucene 5/6/7/8+; maintained long-term.
Stars: ✭ 112 (+119.61%)
Mutual labels:  solr, elasticsearch
Datafari
Open Source, Distributed, Big Data Enterprise Search Engine
Stars: ✭ 47 (-7.84%)
Mutual labels:  solr, elasticsearch
Srchx
A standalone lightweight full-text search engine built on top of blevesearch and Go with multiple storage (scorch, boltdb, leveldb, badger)
Stars: ✭ 118 (+131.37%)
Mutual labels:  solr, elasticsearch
Relevant Search Book
Code and Examples for Relevant Search
Stars: ✭ 231 (+352.94%)
Mutual labels:  solr, elasticsearch
Vectorsinsearch
Dice.com repo to accompany the dice.com 'Vectors in Search' talk by Simon Hughes, from the Activate 2018 search conference, and the 'Searching with Vectors' talk from Haystack 2019 (US). Builds upon my conceptual search and semantic search work from 2015
Stars: ✭ 71 (+39.22%)
Mutual labels:  solr, elasticsearch
Pdf
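Programming e-books, e-books, programming books, covering C, C#, Docker, Elasticsearch, Git, Hadoop, HeadFirst, Java, Javascript, jvm, Kafka, Linux, Maven, MongoDB, MyBatis, MySQL, Netty, Nginx, Python, RabbitMQ, Redis, Scala, Solr, Spark, Spring, SpringBoot, SpringCloud, TCPIP, Tomcat, Zookeeper, artificial intelligence, big data, concurrent programming, databases, data mining, new interview questions, architecture design, algorithms, computer science, design patterns, software testing, refactoring and optimization, and more categories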
Stars: ✭ 12,009 (+23447.06%)
Mutual labels:  solr, elasticsearch
Devops Python Tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+696.08%)
Mutual labels:  solr, elasticsearch
Spring Boot 2.x Examples
Spring Boot 2.x code examples
Stars: ✭ 104 (+103.92%)
Mutual labels:  solr, elasticsearch
Springbootexamples
Spring Boot learning tutorials
Stars: ✭ 794 (+1456.86%)
Mutual labels:  solr, elasticsearch
Springboot Templates
Spring Boot integration with Dubbo and Netty, NoSQL templates for Redis and MongoDB, MQ templates for Kafka, RocketMQ and RabbitMQ, and Solr, SolrCloud and Elasticsearch search engines
Stars: ✭ 100 (+96.08%)
Mutual labels:  solr, elasticsearch
Transformalize
Configurable Extract, Transform, and Load
Stars: ✭ 125 (+145.1%)
Mutual labels:  solr, elasticsearch
Logisland
Scalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink being in the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready to use processors, data sources and sinks are available.
Stars: ✭ 97 (+90.2%)
Mutual labels:  solr, elasticsearch
Open Semantic Etl
Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipelines & ingestor to Solr or Elastic search index & linked data graph database
Stars: ✭ 165 (+223.53%)
Mutual labels:  solr, elasticsearch
Janusgraph
JanusGraph: an open-source, distributed graph database
Stars: ✭ 4,277 (+8286.27%)
Mutual labels:  solr, elasticsearch
Php Docker Boilerplate
🍲 PHP Docker Boilerplate for Symfony, Wordpress, Joomla or any other PHP Project (NGINX, Apache HTTPd, PHP-FPM, MySQL, Solr, Elasticsearch, Redis, FTP)
Stars: ✭ 503 (+886.27%)
Mutual labels:  solr, elasticsearch
Nagios Plugins
450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...
Stars: ✭ 1,000 (+1860.78%)
Mutual labels:  solr, elasticsearch
Elasticsearch Ukrainian Lemmatizer
Ukrainian lemmatizer plugin for ElasticSearch
Stars: ✭ 44 (-13.73%)
Mutual labels:  elasticsearch

Elasticsearch Synonyms


This repository contains a curated dataset of synonyms in Solr format. These synonyms can be used to configure the Elasticsearch synonym token filter.
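In Solr format each line is one rule: a comma-separated list declares equivalent terms, `=>` declares an explicit one-way mapping, and lines starting with `#` are comments. The sketch below classifies rules under those assumptions. `parse_rule` is a hypothetical helper, not part of this package, and it deliberately ignores escaped commas (`\,`) for brevity.

```python
from typing import List, Optional, Tuple


def parse_rule(line: str) -> Optional[Tuple[str, List[str], List[str]]]:
    """Classify one Solr-format synonym line (hypothetical helper).

    Returns None for blank lines and comments, otherwise a tuple
    (kind, left_terms, right_terms) where kind is "equivalent" or "explicit".
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    if "=>" in stripped:
        # Explicit mapping: terms on the left are replaced by terms on the right.
        left, right = stripped.split("=>", 1)
        return ("explicit", _terms(left), _terms(right))
    # Equivalent synonyms: every term is interchangeable with the others.
    return ("equivalent", _terms(stripped), [])


def _terms(part: str) -> List[str]:
    # Split on commas and trim surrounding whitespace from each term.
    return [t.strip() for t in part.split(",")]


for rule in ["colour, color", "i-pod, i pod => ipod", "# a comment"]:
    print(parse_rule(rule))
# ('equivalent', ['colour', 'color'], [])
# ('explicit', ['i-pod', 'i pod'], ['ipod'])
# None
```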

Additional helper tools in this repository:

  • synlint: Command-line tool to lint and validate the synonym files.
  • synonyms.sublime-syntax: Syntax highlighting file for Sublime Text 3.

If you're using Elasticsearch with Django, you might find dj-elasticsearch-flex useful.

Why?

While configuring synonyms in Elasticsearch, I found the documentation surprisingly scattered. The docs that do exist don't do the topic much justice and miss many corner cases.

For instance, an incorrect Solr mapping such as `hello, world,` (note the trailing comma) would be happily accepted in the index configuration. However, as soon as you tried to re-open the index, you'd get a malform_input_exception (discussion thread).

This repository addresses such problems with a linter tool that can validate the synonym files beforehand.
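To illustrate the kind of check such a linter can perform, here is a minimal sketch that flags empty terms (such as the one produced by the trailing comma in `hello, world,`) and lines containing more than one `=>`. It is not the synlint implementation; `find_problems` is a hypothetical name, and the real tool may check more or different things.

```python
from typing import List, Tuple


def find_problems(lines: List[str]) -> List[Tuple[int, str]]:
    """Return (line_number, message) pairs for suspicious synonym rules."""
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are allowed
        sides = line.split("=>")
        if len(sides) > 2:
            problems.append((lineno, "more than one '=>'"))
            continue
        for side in sides:
            # An empty term usually comes from a stray or trailing comma.
            if any(not term.strip() for term in side.split(",")):
                problems.append((lineno, "empty term"))
                break
    return problems


print(find_problems(["colour, color", "hello, world,", "a => b => c"]))
# [(2, 'empty term'), (3, "more than one '=>'")]
```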

Datasets

The synonym files in data/ can be used directly in Elasticsearch configuration.

The following datasets are currently available:

  • be-ae: British English and American English spellings, from AVKO.org.
  • medical-terms: A synonym file of medical terminology and abbreviations, with their resolutions.
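Because these are plain Solr-format text files, one option is to copy a file such as `be-ae.synonyms` into the Elasticsearch config directory and point the synonym filter's `synonyms_path` parameter at it (the path is resolved relative to that directory). The sketch below only builds the index-settings body as a Python dict; the `analysis/be-ae.synonyms` location and the `be_ae_synonyms` / `be_ae_analyzer` names are assumptions for illustration.

```python
import json


def synonym_settings(synonyms_path: str) -> dict:
    """Build index settings whose synonym filter reads rules from a file."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "be_ae_synonyms": {  # hypothetical filter name
                        "type": "synonym",
                        "synonyms_path": synonyms_path,
                    }
                },
                "analyzer": {
                    "be_ae_analyzer": {  # hypothetical analyzer name
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "be_ae_synonyms"],
                    }
                },
            }
        }
    }


# Path relative to the Elasticsearch config directory (assumed location).
print(json.dumps(synonym_settings("analysis/be-ae.synonyms"), indent=2))
```

With `synonyms_path`, the file has to be present on every node; inlining the rules avoids that at the cost of a larger index configuration.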

Installation

If you want to use the synlint tool, install the package from PyPI with pip:

pip install elasticsearch-synonym-toolkit

The Python package is installed as es_synonyms. Installing it also provides a linter command, es-synlint. Run it with:

es-synlint [synonymfile]

Usage

In most cases, you'd want to use this module as a helper for loading validated synonyms from a file or a URL:

from es_synonyms import load_synonyms

# Load synonym file at some URL:
be_ae_syns = load_synonyms('https://to.noop.pw/2sI9x4s')
# Or, from filesystem:
other_syns = load_synonyms('data/be-ae.synonyms')
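The exact return type of load_synonyms isn't spelled out here. As a rough picture of what such a loader involves, the following hypothetical stand-in, `read_synonym_rules`, reads a local path or an http(s) URL and returns the non-blank, non-comment lines as a list of rule strings, the form the synonym filter's `synonyms` parameter accepts. Unlike load_synonyms, it performs no validation.

```python
from pathlib import Path
from typing import List
from urllib.parse import urlparse
from urllib.request import urlopen


def read_synonym_rules(source: str) -> List[str]:
    """Hypothetical loader: return non-empty, non-comment lines as rules.

    `source` may be an http(s) URL or a local file path.
    """
    if urlparse(source).scheme in ("http", "https"):
        with urlopen(source) as response:
            text = response.read().decode("utf-8")
    else:
        text = Path(source).read_text(encoding="utf-8")
    rules = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and Solr-style '#' comments.
        if line and not line.startswith("#"):
            rules.append(line)
    return rules
```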

Configuring a synonym token filter with elasticsearch-dsl-py is easy, too:

from elasticsearch_dsl import analyzer, token_filter
from es_synonyms import load_synonyms

be_ae_syns = load_synonyms('https://to.noop.pw/2sI9x4s')

# Create a tokenfilter
brit_spelling_tokenfilter = token_filter(
  'my_tokenfilter',     # Any name for the filter
  'synonym',            # Synonym filter type
  synonyms=be_ae_syns   # Synonyms mapping will be inlined
)
# Create analyzer
brit_english_analyzer = analyzer(
  'my_analyzer',
  tokenizer='standard',
  filter=[
    'lowercase',
    brit_spelling_tokenfilter
  ])
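For reference, the analyzer above should correspond roughly to the following index-settings body, with the rules inlined through the synonym filter's `synonyms` parameter. This is a hand-written sketch, not output captured from elasticsearch-dsl, and the two sample rules are placeholders.

```python
import json
from typing import List


def inline_synonym_settings(rules: List[str]) -> dict:
    """Index settings mirroring the analyzer above, with rules inlined."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "my_tokenfilter": {"type": "synonym", "synonyms": rules}
                },
                "analyzer": {
                    "my_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        # Lowercase runs before the synonym filter, so
                        # matching is effectively case-insensitive.
                        "filter": ["lowercase", "my_tokenfilter"],
                    }
                },
            }
        }
    }


# Placeholder rules; in practice pass the list returned by load_synonyms().
print(json.dumps(inline_synonym_settings(["colour, color", "favourite => favorite"]), indent=2))
```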

To use the underlying linter directly, import the SynLint class.

Development

  • Clone this repository.
  • Install package dependencies via pip with: pip install -r requirements.txt.
  • To run tests:
./panda test:all

License

The tools and code are licensed under the MIT License. The datasets are used under fair use and are derived from the original sources.
