dwyl / Learn Elasticsearch

🔍 Learn how to use ElasticSearch to power a great search experience for your project/product/website.

In the next 30 mins you will learn how to use ElasticSearch to power a great search experience for your project/product/website.

Why?

For anything more than a basic website, people (visiting/using your site/app) expect to be able to search through your content (blog posts, recipes, products, reviews, etc.)

You could use Google Custom Search to provide this functionality and side-step having to run your own (cluster of) search server(s). But we suspect your project/customer wants/needs more control over the search experience, and that's why you're reading this intro.

Why Not XYZ Database (that has Full-Text-Search) ?

Simple/Short answer: Pick the Best tool for the job.

In the past we've used MongoDB's full-text search (and even wrote a tutorial for it!) and MySQL full-text search to reasonable success (Deal Searcher V.1 @Groupon), and many of our Rails friends swear by Postgres full-text search. But none of these databases was designed from scratch to provide scalable full-text search. So, if you want search, use Elasticsearch!

What?

Elasticsearch is a search server based on Lucene. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface and schema-free JSON documents. i.e. awesomeness in a box!

Read more: http://www.elasticsearch.org/overview/elasticsearch/

Whhaaaat...?

Feeling bewildered by that buzzword fest? Let's break it down:

  • Real-Time: a system in which input data is processed within milliseconds so that it is available virtually immediately as feedback to the process from which it is coming - i.e. things happen without a noticeable delay. An example of "real time" is instant messaging.
    see: https://en.wikipedia.org/wiki/Real-time_computing

  • "Near" Real-Time: means there is a small (but noticeable) delay. You can insert/update a record in the "index" and it will be searchable in less than a second. (It is not immediate, but it's close, so they say "Near" Real Time.) An example of "near real time" is email (not quite instant).

  • Full-Text Search: means that when you search through the records in an ElasticSearch database (cluster), your search term(s) will be searched for everywhere in the desired field(s) of the document. For example: imagine you have a blog and each blog post has a Title, Intro, Body and Comments section. When searching for a particular string, e.g. "this is awesomeness", you could search in all of the fields, which could return a result in one of the comments.
    read more: https://en.wikipedia.org/wiki/Full_text_search

  • Distributed means you can have several ElasticSearch nodes in different data centers or regions to improve retrieval reliability.
    see: https://en.wikipedia.org/?title=Distributed_computing

  • Having a REST API means you can access your ElasticSearch cluster using standard HTTP requests.
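
To make the Full-Text Search bullet above more concrete, here is a toy inverted index in Python. It is only a sketch of the general idea (map every word to the places it occurs, then intersect those sets for a query); it is not how Elasticsearch or Lucene are implemented, and it skips phrase matching, relevance scoring and language analysis. The sample posts and all function names are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of (doc_id, field) pairs that contain it."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            for token in tokenize(text):
                index[token].add((doc_id, field))
    return index


def search(index, query):
    """Return the (doc_id, field) pairs whose text contains every query token."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    hits = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        hits &= index.get(token, set())
    return hits


# Hypothetical blog posts with the four fields from the example above.
posts = {
    "post-1": {
        "title": "Learning Elasticsearch",
        "intro": "A short intro to search",
        "body": "Search servers index your content.",
        "comments": "Wow, this is awesomeness!",
    },
    "post-2": {
        "title": "Vagrant basics",
        "intro": "Boot a VM",
        "body": "Nothing about search here... is it?",
        "comments": "Thanks",
    },
}

index = build_index(posts)
print(search(index, "this is awesomeness"))  # the match is in a comment
```

Because every field is indexed, the query finds the comment on the first post even though the words never appear in its title or body.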

How?

There are a few options for running ElasticSearch:
A. Boot a Virtual Machine with ES and all its dependencies (using Vagrant)
B. Install the "binary" package for your Operating System.
C. Don't install anything and just use a free Heroku instance! (See: Heroku section below)

Download & Install

ElasticSearch requires Java 8, so if you want to install ElasticSearch ("natively") on your local machine you will need to have Java running. We prefer not to have Java running on our personal machines (because it's chronically insecure), so we created a Vagrant box to consistently boot ES in a VM; see below.

Running ElasticSearch on Any Operating System with Vagrant

If you aren't using Vagrant, read our Vagrant tutorial now: https://github.com/docdis/learn-vagrant

If you are already using Vagrant, simply clone this repo:

git clone git@github.com:docdis/learn-elasticsearch.git && cd learn-elasticsearch

Then run this command (in your terminal):

vagrant up

Note: expect the installation to take a few minutes. Go for a walk, or skip to the Tutorial section below and start watching the video.

Ubuntu

Mac

If you don't mind having Java running on your Mac, you can use Homebrew to install ES:

brew install elasticsearch

To have launchd start elasticsearch at login:

ln -sfv /usr/local/opt/elasticsearch/*.plist ~/Library/LaunchAgents

Then to load elasticsearch now:

launchctl load ~/Library/LaunchAgents/homebrew.mxcl.elasticsearch.plist

Or, if you don't want/need launchctl, you can just run:

elasticsearch --config=/usr/local/opt/elasticsearch/config/elasticsearch.yml

Windows

see: https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-service-win.html
(but, seriously, try Vagrant!)

ElasticSearch Server Status

To confirm that everything is working as expected, open your terminal and run the following command:

curl -XGET http://localhost:9200

You should expect to see something similar to:

[screenshot: elasticsearch-status-response]
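
If you would rather run the status check from a script, a minimal Python sketch follows. It assumes the node answers on http://localhost:9200 and that the JSON response contains a `name` field and a `version.number` field; the exact fields vary between releases, which is why the code falls back to "unknown". The function names are ours, not part of any Elasticsearch client.

```python
import json
import urllib.request


def fetch_status(base_url="http://localhost:9200"):
    """GET the root endpoint and return the decoded JSON body (needs a running node)."""
    with urllib.request.urlopen(base_url, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize(status):
    """Pick out the node name and version; .get() tolerates missing fields."""
    name = status.get("name", "unknown")
    version = status.get("version", {}).get("number", "unknown")
    return "node={} version={}".format(name, version)


# Usage, once Elasticsearch is running:
# print(summarize(fetch_status()))
```

If the node is not running, `fetch_status` raises `urllib.error.URLError`, which is a quick way to tell "not started yet" apart from "started but misconfigured".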

Tutorial

Once you have installed ElasticSearch (following the instructions above), visit https://www.elastic.co/webinars/getting-started-with-elasticsearch (register using fake data if you want to avoid email spam) and watch the video.

Inserting a record using cURL (REST API)

curl -XPUT 'http://localhost:9200/twitter/tweet/1' -H 'Content-Type: application/json' -d '{"user":"kimchy","post_date":"2009-11-15T14:12:12","message" : "trying out Elasticsearch"}'
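
The same insert can be issued from a script. The sketch below uses only Python's standard library and mirrors the cURL command above. The helper name `build_index_request` is hypothetical, and the Content-Type header is included on the assumption that your Elasticsearch version insists on one (6.0 and later do). The request is built but not sent, so you can inspect it before pointing it at a real node.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: a local node on the default port


def build_index_request(index, doc_type, doc_id, document):
    """Build (but do not send) a PUT request that stores `document` under the given id."""
    url = "{}/{}/{}/{}".format(BASE_URL, index, doc_type, doc_id)
    return urllib.request.Request(
        url,
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


tweet = {
    "user": "kimchy",
    "post_date": "2009-11-15T14:12:12",
    "message": "trying out Elasticsearch",
}
request = build_index_request("twitter", "tweet", 1, tweet)
print(request.get_method(), request.full_url)

# To actually send it (requires a running node):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```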

Video Tutorial Code:

If you want to follow along with the ElasticSearch getting started video:

Insert a record:

curl -XPUT 'http://localhost:9200/vehicles/tv/one' -H 'Content-Type: application/json' -d '{"color":"green","driver":{"born":"1959-09-07","name":"Walter White"},"make":"Pontiac","model":"Aztek","value_usd":5000.0, "year":2003}'

Check the mapping for the index:

curl 'http://localhost:9200/vehicles/_mapping?pretty'

To delete an index you accidentally created:

curl -XDELETE 'http://localhost:9200/vehicles/'

Search:

curl 'localhost:9200/vehicles/tv/_search?q=_id:one&pretty'

Insert another document/record:

curl -XPUT 'http://localhost:9200/vehicles/tv/two' -H 'Content-Type: application/json' -d '{"color":"black","driver":{"born":"1949-01-09","name":"Michael Knight"},"make":"Pontiac","model":"Trans Am","value_usd":9999999.00, "year":1982}'

Search the whole index for a term (both Pontiacs should match):

curl 'http://localhost:9200/vehicles/_search?q=pontiac&pretty'
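
When search terms come from user input, they need to be URL-encoded before they go into the `q=` parameter. Here is a small Python sketch of building the two search URLs used above; `build_search_url` is a hypothetical helper name, and the host and port are assumed to be the local defaults.

```python
from urllib.parse import urlencode


def build_search_url(index, query, doc_type=None,
                     base_url="http://localhost:9200"):
    """Return a URI-search URL; doc_type is optional, as in the commands above."""
    parts = [base_url, index]
    if doc_type is not None:
        parts.append(doc_type)
    parts.append("_search")
    # urlencode percent-encodes reserved characters such as ":" in "_id:one"
    return "/".join(parts) + "?" + urlencode({"q": query, "pretty": "true"})


by_id = build_search_url("vehicles", "_id:one", doc_type="tv")
all_fields = build_search_url("vehicles", "pontiac")
print(by_id)
print(all_fields)
```

Either URL can be passed straight to `curl` or to `urllib.request.urlopen` once a node is running.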

Updating a Record (Index)

The Update API is quite well documented: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-update.html
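
As a rough illustration of a partial update, the Python sketch below builds a request that changes one field of the first vehicle record. It assumes the type-based URL layout used throughout this tutorial (`/{index}/{type}/{id}/_update`) and a request body that wraps the changed fields in a `doc` object; newer Elasticsearch releases use a different path, so check the documentation for your version. The helper name is hypothetical and the request is built but not sent.

```python
import json
import urllib.request


def build_update_request(index, doc_type, doc_id, changes,
                         base_url="http://localhost:9200"):
    """Build (but do not send) a partial-update request for one record."""
    # Assumption: type-based path, matching the other commands in this tutorial.
    url = "{}/{}/{}/{}/_update".format(base_url, index, doc_type, doc_id)
    body = {"doc": changes}  # only the listed fields are merged into the record
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_update_request("vehicles", "tv", "one", {"color": "yellow"})
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```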

Node.js

see: /nodejs folder for sample scripts you can run in node.js

Elixir

This section is about using ElasticSearch within the Elixir programming language. If you are new to Elixir, see: github.com/dwyl/learn-elixir (you're in for a treat!)

Once you know a bit about Elixir, writing to an ElasticSearch cluster is quite straightforward thanks to @Zatvobor's module tirexs. See: https://github.com/Zatvobor/tirexs#getting-started

We've included a simple Write/Read example in /elixir/lib/elastic.ex and /elixir/lib/elastic_test.ex

To try it out on your local computer, simply run the following command(s):

git clone git@github.com:dwyl/learn-elasticsearch.git
cd learn-elasticsearch
mix deps.get
mix test

Tip: you can copy paste the whole block and run all the commands in order.

Useful Links

Video

Background Reading

ELK

ELK is a logging stack composed of Elasticsearch, Logstash & Kibana.

tl;dr

History

I chose Elasticsearch to power the search for a project I lead at News after careful consideration of Solr. There are great Heroku add-ons (we used Bonsai because they have a free dev tier) and the quality of the search results is superb.

Troubleshooting

see ERRORS.md

How do we Archive a Record?

need to research this

Which Node.js Module Should I Use for ElasticSearch?

There are over a hundred modules for ElasticSearch on NPM
see: http://node-modules.com/search?q=elasticsearch

While writing this post we tried the following modules:

We Wrote a Simpler Node.js Module!

We got frustrated using the other modules, so we wrote a better one: https://github.com/dwyl/esta

How is it "Better"?

  • [x] Focus on simplicity
  • [x] Readable code
  • [x] Zero Dependencies (never worry about upgrading to the latest version of node or the module)
  • [x] 100% Test Coverage
  • [x] Optional Backup of Data

Graphical User Interfaces to ES

http://www.elasticsearch.org/guide/en/elasticsearch/client/community/current/front-ends.html

Security

Pitfalls

The Split Brain Problem

Where your cluster loses communication between nodes and you end up with two masters.
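
The usual guard against split brain is to require a strict majority (a quorum) of master-eligible nodes before any master can be elected, so that at most one side of a network split can qualify. The Python sketch below shows only the arithmetic. In older Elasticsearch releases this number is what you would put in the `discovery.zen.minimum_master_nodes` setting; newer releases handle the quorum for you, so treat this as an illustration rather than configuration advice. The function names are ours.

```python
def minimum_master_nodes(master_eligible):
    """Smallest group that is a strict majority of the master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def can_elect_master(partition_size, master_eligible):
    """True if one side of a network split is big enough to elect a master."""
    return partition_size >= minimum_master_nodes(master_eligible)


# A 3-node cluster split into 2 + 1: only the larger side may elect a master.
print(can_elect_master(2, 3), can_elect_master(1, 3))

# A 2-node cluster split into 1 + 1: neither side may, so there is no second
# master (and no master at all until the nodes reconnect).
print(can_elect_master(1, 2))
```

This is also why an odd number of master-eligible nodes is commonly recommended: with three nodes the cluster survives the loss of one, while with two it does not.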

Hosted ElasticSearch Providers

If you prefer not to administer your own database/cluster there are a few services you can use:

Host your own ElasticSearch
