ThijsFeryn / Elasticsearch_tutorial

An action-packed, example-based ElasticSearch tutorial

Disclaimer

This tutorial is built for ElasticSearch version 5.2. Version 5 features a bunch of breaking changes in terms of query DSL and mapping.

If you're still running version 2.x, please have a look at the v2 branch of this repository.

ElasticSearch examples

I've lined up a bunch of examples to showcase the features and the sheer power of ElasticSearch. A lot of the information is based on "ElasticSearch, The Definitive Guide".

Installing

Download ElasticSearch & Kibana here, then follow these simple steps:

Exercise 1: the basics

Exercise 1 is very simple: the goal is to get the hang of the ElasticSearch RESTful interface.

Topics:

  • Navigating to the ElasticSearch landing page
  • Searching all documents
  • Counting documents
  • Adding documents to the index
  • Full document updates
  • Partial document updates
  • Retrieve individual documents
  • Searching all documents for a specific index

Load exercise 1

Exercise 2: load data in bulk

In exercise 2 we will be indexing a lot of data. To improve the performance, we're doing this in bulk.

This data contains information from the Combell blog. I've indexed the following information:

  • Title
  • Author
  • Date
  • Categories
  • Language
  • GUID

This data will be used in the other exercises.
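As a rough illustration of what a bulk request looks like: the `_bulk` endpoint takes newline-delimited JSON, with an action line followed by the document itself. The sketch below assumes a "blog" index and a "post" type, and the sample documents are made up; only the field names come from the list above.

```python
import json


def to_bulk_body(index, doc_type, docs):
    """Serialize (id, document) pairs into the newline-delimited body for POST /_bulk."""
    lines = []
    for doc_id, doc in docs:
        # Action line: tells ElasticSearch where to index the next line.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the body has to end with a newline


# Hypothetical sample documents using the fields listed above.
sample_docs = [
    (1, {"title": "First post", "author": "Jane Doe", "date": "2016-01-15 10:00:00",
         "categories": ["news"], "language": "en", "guid": 1001}),
    (2, {"title": "Tweede post", "author": "Jan Janssens", "date": "2016-02-20 09:30:00",
         "categories": ["nieuws"], "language": "nl", "guid": 1002}),
]

bulk_body = to_bulk_body("blog", "post", sample_docs)
print(bulk_body)
```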

Load the blog data in bulk

Exercise 3: search, getting to know the query DSL

In exercise 3 we're performing some basic queries using the ElasticSearch query DSL. The DSL is JSON-based and the queries are full-text searches.

Here are a couple of the searches we're performing:

  • Search for a single term in an index
  • Search for multiple terms in an index
  • Perform searches on multiple terms using the "and" operator
  • Define the minimum number of matches a document should have
  • Define the proximity of terms you're searching

Load exercise 3

Exercise 4: analysis

In exercise 4, we're going to focus on the analysis of full-text and human language. We'll ignore the database capabilities of ElasticSearch and throw some text at it, and see how it tokenizes the data.

Depending on the analyzer you use, ElasticSearch will tokenize and store the data in a different way. Don't worry: the original data remains in the source of the document. Only the inverted index changes.
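To see how a given analyzer tokenizes text, you can send it to the `_analyze` endpoint. A minimal sketch, with an arbitrary sample sentence and a hand-picked set of built-in analyzers (not necessarily the ones the exercise uses):

```python
import json


def analyze_body(text, analyzer="standard"):
    """Build the request body for POST /_analyze."""
    return {"analyzer": analyzer, "text": text}


sample_text = "The quick brown foxes jumped over the lazy dog"  # arbitrary sample

# Same text, different built-in analyzers: compare the tokens each one returns.
for analyzer in ("standard", "whitespace", "english"):
    print("POST /_analyze", json.dumps(analyze_body(sample_text, analyzer)))
```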

Load exercise 4

Exercise 5: schemaless? Not really.

Exercise 5 is all about the schema of an index. ElasticSearch is marketed as being schemaless. In reality, ElasticSearch will guess the schema for you.

I'll show you examples where it guesses successfully and examples where it doesn't.

Load exercise 5

Exercise 6: mapping

To keep ElasticSearch from guessing the schema wrong, an explicit mapping is a good idea. Exercise 6 will set up the right mapping for our blog example and re-insert the data.

Integers and strings will be defined accordingly and the date will have the right format.

The explicit mapping will be used in exercise 7.
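An explicit mapping for the blog data might look something like the sketch below. The type name "post", the choice of "guid" as the integer field and the date format are all assumptions; check the exercise file for the real definitions.

```python
import json

# Hypothetical explicit mapping for ElasticSearch 5.x (which still uses mapping types).
blog_mapping = {
    "mappings": {
        "post": {  # assumed type name
            "properties": {
                "title": {"type": "text"},
                "author": {"type": "text"},
                # Assumed date format; it has to match the indexed values.
                "date": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss"},
                "categories": {"type": "text"},
                "language": {"type": "text"},
                "guid": {"type": "integer"},  # assumed to be numeric
            }
        }
    }
}

# The mapping is sent as the body when creating the index.
print("PUT /blog")
print(json.dumps(blog_mapping, indent=2))
```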

Load exercise 6

Exercise 7: search using explicit mapping

The 2 searches in exercise 5 that failed will now be executed again. Thanks to explicit mapping, the output will be correct.

  • Query 1 won't return anything, because the range doesn't match
  • Queries 2 & 3 will return the documents that fit the date range
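For reference, a date range search can be sketched like this. The "date" field name and the boundaries are assumptions; the actual queries in the exercise may use different ranges.

```python
import json


def date_range(field, gte, lt):
    """Build a range query; the format option tells ElasticSearch how to parse the bounds."""
    return {"query": {"range": {field: {"gte": gte, "lt": lt, "format": "yyyy-MM-dd"}}}}


# Hypothetical boundaries: everything published in 2016.
posts_in_2016 = date_range("date", "2016-01-01", "2017-01-01")
print("GET /blog/_search", json.dumps(posts_in_2016))
```

With the field mapped as a date, the bounds are compared as dates rather than as strings.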

Load exercise 7

Exercise 8: non-analyzed fields

In exercise 8, we will define yet another mapping on our blog index. This mapping only treats the "title" field as full-text. The rest of the strings will not be analyzed and tokenized. They will be stored "as is".

This data will be used in exercise 9.
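A sketch of what such a mapping could look like in version 5, where "text" fields are analyzed and "keyword" fields are stored as is. The type name "post" and the exact field definitions are assumptions.

```python
import json

# Non-analyzed strings: indexed as a single exact value.
KEYWORD = {"type": "keyword"}

# Hypothetical mapping: only "title" is full-text.
blog_mapping = {
    "mappings": {
        "post": {  # assumed type name
            "properties": {
                # Full-text field with an exact-value "keyword" sub-field.
                "title": {"type": "text", "fields": {"keyword": KEYWORD}},
                "author": KEYWORD,
                "categories": KEYWORD,
                "language": KEYWORD,
            }
        }
    }
}

print(json.dumps(blog_mapping, indent=2))
```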

Load exercise 8

Exercise 9: filters, full-text vs. exact values

In exercise 9, I'll show you the difference between full-text searches using queries and exact value matches using queries in filter mode.

The mapping that was done in exercise 8 has made sure there is now a "keyword" field on the title property. This means that queries on "title" are treated as full-text searches and boolean filters on the regular "title.keyword" field are treated as exact value matches.

In one of the examples, I'll also show you how to combine multiple queries and filters.

This is what we'll do in this exercise:

  • Use a prefix query in filter context to perform a wildcard search, even if the fields are not analyzed
  • Do a standard query using the "keyword" field
  • Use a boolean query in filter mode to combine multiple filters based on the "and", "or" & "not" operators
  • Use a regular boolean query and notice how the behaviour of the (should) clause changes
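Roughly, the four searches above could be expressed as follows. Field names and values are placeholders, so treat this as a sketch rather than the content of the exercise file.

```python
import json

# 1. Prefix query in filter context: no scoring, matches exact values starting with "Thi".
prefix_filter = {"query": {"bool": {"filter": {"prefix": {"author": "Thi"}}}}}

# 2. Exact match on the non-analyzed "keyword" sub-field of the title.
exact_title = {"query": {"term": {"title.keyword": "An exact blog title"}}}

# Placeholder clauses, reused by queries 3 and 4.
clauses = {
    "must": [{"term": {"language": "en"}}],         # "and"
    "should": [{"term": {"categories": "php"}}],    # "or"
    "must_not": [{"term": {"author": "Jane Doe"}}]  # "not"
}

# 3. Boolean query in filter mode: documents either match or they don't.
filtered = {"query": {"bool": {"filter": {"bool": clauses}}}}

# 4. Regular boolean query: with a must clause present, should clauses
#    only influence the score instead of being required.
scored = {"query": {"bool": clauses}}

for body in (prefix_filter, exact_title, filtered, scored):
    print("GET /blog/_search", json.dumps(body))
```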

Load exercise 9

Exercise 10: language-based mapping

We will again remap the data. This time, we will treat the "title" property as an analyzed field. By default the "standard" analyzer is used. Because our data is both in Dutch and English, I added 2 fields:

  • The "en" field explicitly uses the English analyzer
  • The "nl" field explicitly uses the Dutch analyzer

This is the final version of the mapping. The other examples will use this mapping and data.
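The "title" part of such a mapping could be sketched like this, using the built-in "english" and "dutch" analyzers. The type name "post" is an assumption, and the other properties are left out.

```python
import json

# Hypothetical excerpt of the mapping: only the "title" property is shown.
title_mapping = {
    "mappings": {
        "post": {  # assumed type name
            "properties": {
                "title": {
                    "type": "text",  # uses the "standard" analyzer by default
                    "fields": {
                        "en": {"type": "text", "analyzer": "english"},
                        "nl": {"type": "text", "analyzer": "dutch"},
                    },
                }
            }
        }
    }
}

print(json.dumps(title_mapping, indent=2))
```

The sub-fields are then addressed as "title.en" and "title.nl" in queries.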

Load exercise 10

Exercise 11: using languages

Exercise 11 is all about the analysis of text, based on the language. Exercise 4 was a hint towards the analysis of data. Now we'll actually perform searches that depend on language analysis.

  • Query 1 will look for the term "work" on the "title" property
  • Query 2 will look for the term "work" on the "title.en" field (which uses the English analyzer)
  • Query 3 will look for the term "werk" on the "title" property
  • Query 4 will look for the term "werk" on the "title.nl" field (which uses the Dutch analyzer)
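The four queries above differ only in the field and the term, so they can be sketched with one small helper. The "blog" index name is an assumption.

```python
import json


def match_query(field, term):
    """Build a simple match query on a single field."""
    return {"query": {"match": {field: term}}}


# (field, term) pairs taken from the four queries listed above.
searches = [
    ("title", "work"),
    ("title.en", "work"),
    ("title", "werk"),
    ("title.nl", "werk"),
]

for field, term in searches:
    print("GET /blog/_search", json.dumps(match_query(field, term)))
```

Because the language analyzers apply stemming, the language-specific fields will likely also match related word forms that the plain "title" field misses.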

Load exercise 11

Exercise 12: geo data

In exercise 12, we'll create a new "cities" index, that contains all the cities that are located in the West-Vlaanderen province of Belgium. The index stores the name of the city and its geo coordinates.

The explicit mapping and the data will be used in other exercises.
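A minimal sketch of a mapping with a `geo_point` field and one matching document. The type name "city", the field names "name" and "location" and the rounded coordinates are assumptions for illustration.

```python
import json

# Hypothetical mapping for the "cities" index.
cities_mapping = {
    "mappings": {
        "city": {  # assumed type name
            "properties": {
                "name": {"type": "text"},
                "location": {"type": "geo_point"},
            }
        }
    }
}

# A geo_point can be indexed as an object with "lat" and "lon".
# The coordinates are a rough approximation, for illustration only.
sample_city = {"name": "Diksmuide", "location": {"lat": 51.03, "lon": 2.86}}

print("PUT /cities", json.dumps(cities_mapping))
print("PUT /cities/city/1", json.dumps(sample_city))
```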

Load exercise 12

Exercise 13: geo searches

In the previous exercise, we created a new index and indexed some geo data. In exercise 13, we'll actually perform searches on this data.

2 queries will be showcased:

  • A query that displays all cities within 5km of Diksmuide
  • A query that displays all cities that are located in a specific bounding box (between Koksijde & Nieuwpoort)
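Both searches can be sketched as filters on a `geo_point` field. The field name "location" and all coordinates below are rough, hand-picked approximations, not the values from the exercise.

```python
import json

# Approximate coordinates of Diksmuide (assumption, rounded).
DIKSMUIDE = {"lat": 51.03, "lon": 2.86}

# All cities within 5km of a point.
within_5km = {
    "query": {
        "bool": {
            "filter": {
                "geo_distance": {"distance": "5km", "location": DIKSMUIDE}
            }
        }
    }
}

# All cities inside a box, defined by its top-left and bottom-right corners.
# The corners roughly enclose Koksijde and Nieuwpoort (assumed values).
in_bounding_box = {
    "query": {
        "bool": {
            "filter": {
                "geo_bounding_box": {
                    "location": {
                        "top_left": {"lat": 51.14, "lon": 2.64},
                        "bottom_right": {"lat": 51.09, "lon": 2.76},
                    }
                }
            }
        }
    }
}

for body in (within_5km, in_bounding_box):
    print("GET /cities/_search", json.dumps(body))
```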

Load exercise 13

Exercise 14: aggregation data

In exercise 14, we'll load data into yet another index. This index is called "cars" and it contains car sales information. Every transaction keeps track of the following information:

  • The price of the sale
  • The make of the car that was sold
  • The color of the car
  • The date of the sale

This information will be used in exercise 15.
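A transaction with those four pieces of information could be modelled like this. The type name "transactions", the field names and the sample values are hypothetical; the exercise file defines the real ones.

```python
import json

# Hypothetical mapping for the "cars" index.
cars_mapping = {
    "mappings": {
        "transactions": {  # assumed type name
            "properties": {
                "price": {"type": "integer"},
                "make": {"type": "keyword"},   # exact values, suitable for grouping
                "color": {"type": "keyword"},
                "sold": {"type": "date"},
            }
        }
    }
}

# Made-up sample transaction.
sample_sale = {"price": 25000, "make": "ford", "color": "blue", "sold": "2016-05-18"}

print("PUT /cars", json.dumps(cars_mapping))
print("POST /cars/transactions", json.dumps(sample_sale))
```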

Load exercise 14

Exercise 15: performing aggregations

Aggregations are a very powerful feature of ElasticSearch. They're basically like "group by" in SQL, but way more powerful. Aggregations are the reason why ElasticSearch is popular in the big data and data science community.

These are the aggregations we'll execute in this exercise:

  • Get the top 10 most popular authors of the Combell blog
  • Get the top 10 most popular authors of the Combell blog and display how many posts they wrote in each language
  • Get all the blog posts written in Dutch that were published in 2016. Use aggregations to see the number of posts per month
  • Get the top 3 most popular cars
  • Get the average price of a sold car
  • Get extended statistics on the price of a sold car
  • Get the total revenue for cars per price range, with an interval of 20000 USD
  • Calculate the average price of a Ford, versus the total average price of all the cars that were sold
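A few of the aggregations above, sketched as request bodies. Field names ("author", "language", "date", "make", "price"), the "nl" language code and the aggregation names are assumptions; the grouped fields are assumed to be non-analyzed. The `interval` option on `date_histogram` is the version 5 syntax.

```python
import json

# "size": 0 skips the search hits and returns only the aggregation results.

# Top 10 authors, each broken down per language (a nested terms aggregation).
authors_by_language = {
    "size": 0,
    "aggs": {
        "popular_authors": {
            "terms": {"field": "author", "size": 10},
            "aggs": {"languages": {"terms": {"field": "language"}}},
        }
    },
}

# Dutch posts from 2016, counted per month.
dutch_2016_per_month = {
    "size": 0,
    "query": {"bool": {"filter": [
        {"term": {"language": "nl"}},
        {"range": {"date": {"gte": "2016-01-01", "lt": "2017-01-01", "format": "yyyy-MM-dd"}}},
    ]}},
    "aggs": {"per_month": {"date_histogram": {"field": "date", "interval": "month"}}},
}

# Revenue per price range of 20000: histogram buckets with a sum inside each.
revenue_per_range = {
    "size": 0,
    "aggs": {
        "price_ranges": {
            "histogram": {"field": "price", "interval": 20000},
            "aggs": {"revenue": {"sum": {"field": "price"}}},
        }
    },
}

# Average Ford price versus the average of all cars: the "global"
# aggregation ignores the query and looks at every document.
ford_vs_all = {
    "size": 0,
    "query": {"match": {"make": "ford"}},
    "aggs": {
        "ford_avg": {"avg": {"field": "price"}},
        "all_cars": {"global": {}, "aggs": {"all_avg": {"avg": {"field": "price"}}}},
    },
}

for body in (authors_by_language, dutch_2016_per_month, revenue_per_range, ford_vs_all):
    print(json.dumps(body))
```

Swapping `avg` for `extended_stats` in the same position returns the extended statistics on the price instead of just the average.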

Load exercise 15
