jolicode / Emoji Search

Licence: other
😄 Emoji synonyms to build your own emoji-capable search engine (elasticsearch, solr)

Emoji, flags and emoticons support for Elasticsearch

Add support for emoji and flags in any Lucene-compatible search engine!

If you wish to search 🍩 to find donuts in your documents, you came to the right place. This project offers synonym files ready to use in an Elasticsearch analyzer.

Test all synonym files on a real Elasticsearch

Requirements to index emoji in Elasticsearch

Version | Requirements
Elasticsearch >= 6.7 | The standard tokenizer now understands emoji 🎉 thanks to Lucene 7.7.0; no plugin needed!
Elasticsearch >= 6.4 and < 6.7 | You need to install the official ICU Plugin. See our blog post about this change.
Elasticsearch < 6.4 | You need our custom ICU Tokenizer Plugin, see our blog post (2016).

Run the following test to verify that you get 4 EMOJI tokens:

GET _analyze
{
  "text": ["🍩 🇫🇷 👩‍🚒 🚣🏾‍♀"]
}
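If you run this check from a script rather than by eye, the token count can be verified programmatically. The following is a minimal Python sketch, assuming the `_analyze` response has already been parsed into a dict and that emoji tokens are reported with the type `<EMOJI>`. The function name `count_emoji_tokens` and the abridged sample response are illustrative, not part of this project.

```python
def count_emoji_tokens(analyze_response):
    """Count tokens typed "<EMOJI>" in a parsed _analyze response body."""
    tokens = analyze_response.get("tokens", [])
    return sum(1 for token in tokens if token.get("type") == "<EMOJI>")


# Hand-written, abridged sample shaped like an _analyze response
# (offsets omitted; this was not captured from a real cluster).
sample_response = {
    "tokens": [
        {"token": "🍩", "type": "<EMOJI>", "position": 0},
        {"token": "🇫🇷", "type": "<EMOJI>", "position": 1},
        {"token": "👩‍🚒", "type": "<EMOJI>", "position": 2},
        {"token": "🚣🏾‍♀", "type": "<EMOJI>", "position": 3},
    ]
}

if __name__ == "__main__":
    print(count_emoji_tokens(sample_response))
```

If the count is lower than 4, the tokenizer in use is probably splitting or dropping some emoji, and the version requirements above are worth re-checking.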

The Synonyms, flags and emoticons

What you need to search with emoji is a way to expand them to words that can match searches and documents, in your language. That's the goal of the synonym dictionaries.

We build Solr / Lucene compatible synonym files in all languages supported by Unicode CLDR so you can set them up in an analyzer. It looks like this:

👩‍🚒 => 👩‍🚒, firefighter, firetruck, woman
👩‍✈ => 👩‍✈, pilot, plane, woman
🥓 => 🥓, bacon, meat, food
🥔 => 🥔, potato, vegetable, food
😅 => 😅, cold, face, open, smile, sweat
😆 => 😆, face, laugh, mouth, open, satisfied, smile
🚎 => 🚎, bus, tram, trolley
🇫🇷 => 🇫🇷, france
🇬🇧 => 🇬🇧, united kingdom
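To make the file format concrete, here is a small Python sketch that reads one line of the explicit `=>` form shown above. It only handles that form (plus blank lines and `#` comments), which is an assumption about what you need. The helper name `parse_synonym_line` is hypothetical and is not part of this repository's tooling.

```python
def parse_synonym_line(line):
    """Parse one "emoji => term, term, ..." line.

    Returns (emoji, [terms]) or None for blank lines, comments,
    and lines that do not use the explicit "=>" form.
    """
    line = line.strip()
    if not line or line.startswith("#") or "=>" not in line:
        return None
    left, right = line.split("=>", 1)
    terms = [term.strip() for term in right.split(",") if term.strip()]
    return left.strip(), terms


if __name__ == "__main__":
    print(parse_synonym_line("🥓 => 🥓, bacon, meat, food"))
```

Note that the emoji itself appears on the right-hand side, so the original token is kept alongside its expansions.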

For emoticons, use the emoticons.txt mapping file with a mapping char_filter to replace emoticons with emoji.
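As a rough illustration of how the emoticon mapping could be wired in, the Python sketch below builds the analysis settings for a `mapping` char_filter that loads its rules from a file. The names `emoticons_char_filter` and `with_emoticons` are made up for this example, and the default path assumes the file was stored as described in the Installation section.

```python
import json


def emoticon_analysis_settings(mappings_path="analysis/emoticons.txt"):
    """Build index analysis settings with a mapping char_filter.

    The filter and analyzer names are illustrative; only the
    "mapping" type and "mappings_path" key are Elasticsearch settings.
    """
    return {
        "analysis": {
            "char_filter": {
                "emoticons_char_filter": {
                    "type": "mapping",
                    "mappings_path": mappings_path,
                }
            },
            "analyzer": {
                "with_emoticons": {
                    "type": "custom",
                    # Char filters run before the tokenizer, so emoticons
                    # become emoji before the text is split into tokens.
                    "char_filter": ["emoticons_char_filter"],
                    "tokenizer": "standard",
                    "filter": ["lowercase"],
                }
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(emoticon_analysis_settings(), indent=2))
```

The printed JSON can be used as the "settings" part of an index creation request, and the same char_filter can be added to an analyzer that also applies the emoji synonym filter.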

Installation

Download the emoji and emoticon files you want from this repository and store them in PATH_ES/config/analysis (or anywhere Elasticsearch can read).

config
├── analysis
│   ├── cldr-emoji-annotation-synonyms-en.txt
│   └── emoticons.txt
├── elasticsearch.yml
...

Use them like this (this is a complete English example with Elasticsearch >= 6.7):

PUT /tweets
{
  "settings": {
    "analysis": {
      "filter": {
        "english_emoji": {
          "type": "synonym",
          "synonyms_path": "analysis/cldr-emoji-annotation-synonyms-en.txt" 
        },
        "emoji_variation_selector_filter": {
          "type": "pattern_replace",
          "pattern": "\\uFE0E|\\uFE0F",
          "replacement": ""
        },
        "english_stop": {
          "type":       "stop",
          "stopwords":  "_english_"
        },
        "english_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["example"]
        },
        "english_stemmer": {
          "type":       "stemmer",
          "language":   "english"
        },
        "english_possessive_stemmer": {
          "type":       "stemmer",
          "language":   "possessive_english"
        }
      },
      "analyzer": {
        "english_with_emoji": {
          "tokenizer": "standard",
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "emoji_variation_selector_filter",
            "english_emoji",
            "english_stop",
            "english_keywords",
            "english_stemmer"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "english_with_emoji"
      }
    }
  }
}

You can now test the result with:

GET tweets/_analyze
{
  "field": "content",
  "text": "🍩 🇫🇷 👩‍🚒 🚣🏾‍♀"
}
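The emoji_variation_selector_filter in the index settings removes the variation selectors U+FE0E and U+FE0F from tokens. Our reading is that this lets an emoji typed with a presentation selector match the same synonym entry as the bare emoji. The Python sketch below mimics that replacement outside Elasticsearch; the function name `strip_variation_selectors` is illustrative only.

```python
import re

# Same alternation as the filter's pattern: text-presentation (U+FE0E)
# and emoji-presentation (U+FE0F) variation selectors.
VARIATION_SELECTORS = re.compile("\uFE0E|\uFE0F")


def strip_variation_selectors(token):
    """Remove U+FE0E / U+FE0F from a token, leaving the base characters."""
    return VARIATION_SELECTORS.sub("", token)


if __name__ == "__main__":
    heart_with_selector = "\u2764\uFE0F"  # heavy black heart + U+FE0F
    cleaned = strip_variation_selectors(heart_with_selector)
    print(len(heart_with_selector), len(cleaned))
```

Tokens without a selector pass through unchanged, so the filter is safe to apply to every token in the stream.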

How to contribute

Build from CLDR SVN

You will need:

  • php cli
  • php zip and curl extensions

Edit the tag in tools/build-released.php and run php tools/build-released.php.

Update emoticons

Run php tools/build-emoticon.php.

Licenses

Emoji data courtesy of CLDR. See unicode-license.txt for details. Some modifications are done on the data, see here. Emoticon data based on https://github.com/wooorm/emoticon/ (MIT).

This repository is distributed under the MIT License. Feel free to use and contribute as you please!
