mosuka / Blast

License: Apache-2.0
Blast is a full text search and indexing server, written in Go, built on top of Bleve.

Programming Languages

go
31211 projects - #10 most used programming language
golang
3204 projects

Projects that are alternatives of or similar to Blast

Riot
Go Open Source, Distributed, Simple and efficient Search Engine. Warning: this is the V1 beta release; because of its high memory consumption, V2 will be a complete rewrite.
Stars: ✭ 6,025 (+545.07%)
Mutual labels:  search, search-engine, index
Sonic
🦔 Fast, lightweight & schema-less search backend. An alternative to Elasticsearch that runs on a few MBs of RAM.
Stars: ✭ 12,347 (+1221.95%)
Mutual labels:  search, search-engine, index
Typesense
Fast, typo tolerant, fuzzy search engine for building delightful search experiences ⚡ 🔍 ✨ An Open Source alternative to Algolia and an Easier-to-Use alternative to ElasticSearch.
Stars: ✭ 8,644 (+825.48%)
Mutual labels:  search, search-engine, raft
Bayard
A full-text search and indexing server written in Rust.
Stars: ✭ 1,555 (+66.49%)
Mutual labels:  grpc, search-engine, raft
Instantsearch Ios
⚡️ A library of widgets and helpers to build instant-search applications on iOS.
Stars: ✭ 498 (-46.68%)
Mutual labels:  search, search-engine
Pisa
PISA: Performant Indexes and Search for Academia
Stars: ✭ 489 (-47.64%)
Mutual labels:  search, search-engine
Resin
Hardware-accelerated vector-based search engine. Available as a HTTP service or as an embedded library.
Stars: ✭ 529 (-43.36%)
Mutual labels:  search, search-engine
Filemasta
A search application to explore, discover and share online files
Stars: ✭ 571 (-38.87%)
Mutual labels:  search, search-engine
Open Semantic Search
Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for fulltext search, faceted search & knowledge graph)
Stars: ✭ 386 (-58.67%)
Mutual labels:  search, search-engine
Fess
Fess is very powerful and easily deployable Enterprise Search Server.
Stars: ✭ 561 (-39.94%)
Mutual labels:  search, search-engine
Search Ui
🔍 A set of UI components to build a fully customized search!
Stars: ✭ 24 (-97.43%)
Mutual labels:  search, search-engine
Xenon
The MySQL Cluster Autopilot Management with GTID and Raft
Stars: ✭ 461 (-50.64%)
Mutual labels:  raft, cluster
Lucene Solr
Apache Lucene and Solr open-source search software
Stars: ✭ 4,217 (+351.5%)
Mutual labels:  search, search-engine
Weaviate
Weaviate is a cloud-native, modular, real-time vector search engine
Stars: ✭ 509 (-45.5%)
Mutual labels:  restful-api, search-engine
Opensearchserver
Open-source Enterprise Grade Search Engine Software
Stars: ✭ 408 (-56.32%)
Mutual labels:  search, search-engine
Algoliasearch Client Php
⚡️ A fully-featured and blazing-fast PHP API client to interact with Algolia.
Stars: ✭ 565 (-39.51%)
Mutual labels:  search, search-engine
Manticoresearch
Database for search
Stars: ✭ 610 (-34.69%)
Mutual labels:  search, search-engine
Elasticsuite
Smile ElasticSuite - Magento 2 merchandising and search engine built on ElasticSearch
Stars: ✭ 647 (-30.73%)
Mutual labels:  search, search-engine
Dotnext
Next generation API for .NET
Stars: ✭ 379 (-59.42%)
Mutual labels:  raft, cluster
Dbreeze
C# .NET MONO NOSQL ( key value store embedded ) ACID multi-paradigm database management system.
Stars: ✭ 383 (-58.99%)
Mutual labels:  search, search-engine

Blast

Blast is a full-text search and indexing server written in Go, built on top of Bleve.
It provides its functionality through gRPC (HTTP/2 + Protocol Buffers) or a traditional RESTful API (HTTP/1.1 + JSON).
Blast implements the Raft consensus algorithm via hashicorp/raft. It achieves consensus across all nodes, ensuring that every change made to the system is applied to a quorum of nodes, or to none at all. Blast makes it easy for programmers to develop search applications with advanced features.

Features

  • Full-text search/indexing
  • Faceted search
  • Spatial/Geospatial search
  • Search result highlighting
  • Index replication
  • Cluster bring-up
  • An easy-to-use HTTP API
  • Command-line interface (CLI)
  • Docker container image

Install build dependencies

Blast requires some C/C++ libraries if you need to enable cld2, icu, libstemmer, or leveldb. The following sections describe how to satisfy these dependencies on particular platforms.

Ubuntu 18.10

$ sudo apt-get update
$ sudo apt-get install -y \
    libicu-dev \
    libstemmer-dev \
    libleveldb-dev \
    gcc-4.8 \
    g++-4.8 \
    build-essential

$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 80
$ sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-8 80
$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.8 90
$ sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.8 90

$ export GOPATH=${HOME}/go
$ mkdir -p ${GOPATH}/src/github.com/blevesearch
$ cd ${GOPATH}/src/github.com/blevesearch
$ git clone https://github.com/blevesearch/cld2.git
$ cd ${GOPATH}/src/github.com/blevesearch/cld2
$ git clone https://github.com/CLD2Owners/cld2.git
$ cd cld2/internal
$ ./compile_libs.sh
$ sudo cp *.so /usr/local/lib
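
On Linux, after copying the shared libraries into /usr/local/lib, refresh the dynamic linker cache so they can be found when building and running Blast:

$ sudo ldconfig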

macOS High Sierra Version 10.13.6

$ brew install \
    icu4c \
    leveldb

$ export GOPATH=${HOME}/go
$ go get -u -v github.com/blevesearch/cld2
$ cd ${GOPATH}/src/github.com/blevesearch/cld2
$ git clone https://github.com/CLD2Owners/cld2.git
$ cd cld2/internal
$ perl -p -i -e 's/soname=/install_name,/' compile_libs.sh
$ ./compile_libs.sh
$ sudo cp *.so /usr/local/lib

Build

Build Blast as follows:

$ mkdir -p ${GOPATH}/src/github.com/mosuka
$ cd ${GOPATH}/src/github.com/mosuka
$ git clone https://github.com/mosuka/blast.git
$ cd blast
$ make

If you omit GOOS and GOARCH, the binary is built for the platform you are running on.
If you want to target a specific platform, set the GOOS and GOARCH environment variables.

Linux

$ make GOOS=linux build

macOS

$ make GOOS=darwin build

Windows

$ make GOOS=windows build
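
For example, to target a specific architecture as well (assuming the Makefile passes GOARCH through in the same way as GOOS):

$ make GOOS=linux GOARCH=arm64 build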

Build with extensions

Blast supports some Bleve Extensions (blevex). If you want to build with them, set CGO_LDFLAGS, CGO_CFLAGS, CGO_ENABLED, and BUILD_TAGS. For example, to build with the ICU extension enabled:

$ make GOOS=linux \
       BUILD_TAGS=icu \
       CGO_ENABLED=1 \
       build

Linux

$ make GOOS=linux \
       BUILD_TAGS="kagome icu libstemmer cld2" \
       CGO_ENABLED=1 \
       build

macOS

$ make GOOS=darwin \
       BUILD_TAGS="kagome icu libstemmer cld2" \
       CGO_ENABLED=1 \
       CGO_LDFLAGS="-L/usr/local/opt/icu4c/lib" \
       CGO_CFLAGS="-I/usr/local/opt/icu4c/include" \
       build

Build flags

Refer to the following table for the build flags of the supported Bleve extensions:

BUILD_TAGS | CGO_ENABLED | Description
cld2       | 1           | Enable Compact Language Detector
kagome     | 0           | Enable Japanese Language Analyser
icu        | 1           | Enable ICU Tokenizer, Thai Language Analyser
libstemmer | 1           | Enable Language Stemmer (Danish, German, English, Spanish, Finnish, French, Hungarian, Italian, Dutch, Norwegian, Portuguese, Romanian, Russian, Swedish, Turkish)

If you want to enable a feature whose CGO_ENABLED is 1, install the required libraries as described in the Install build dependencies section above.

Binary

After a successful build, you can find the binary like so:

$ ls ./bin
blast

Test

If you want to test your changes, run the following command:

$ make test

If you want to specify the target platform, set GOOS and GOARCH environment variables in the same way as the build.
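
For example (assuming the Makefile handles GOOS for tests in the same way as for builds):

$ make GOOS=linux test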

Package

To create a distribution package, run the following command:

$ make dist

Configure

Blast's startup options can be set with configuration files, environment variables, and command-line arguments.
Refer to the following table for the options that can be configured; an example configuration file follows the table.

CLI flag | Environment variable | Configuration file | Description
--config-file | - | - | Config file. If omitted, blast.yaml in /etc and the home directory will be searched
--id | BLAST_ID | id | Node ID
--raft-address | BLAST_RAFT_ADDRESS | raft_address | Raft server listen address
--grpc-address | BLAST_GRPC_ADDRESS | grpc_address | gRPC server listen address
--http-address | BLAST_HTTP_ADDRESS | http_address | HTTP server listen address
--data-directory | BLAST_DATA_DIRECTORY | data_directory | Data directory that stores the index and Raft logs
--mapping-file | BLAST_MAPPING_FILE | mapping_file | Path to the index mapping file
--peer-grpc-address | BLAST_PEER_GRPC_ADDRESS | peer_grpc_address | Listen address of an existing gRPC server in the cluster to join
--certificate-file | BLAST_CERTIFICATE_FILE | certificate_file | Path to the client/server TLS certificate file
--key-file | BLAST_KEY_FILE | key_file | Path to the client/server TLS key file
--common-name | BLAST_COMMON_NAME | common_name | Certificate common name
--cors-allowed-methods | BLAST_CORS_ALLOWED_METHODS | cors_allowed_methods | CORS allowed methods (e.g. GET,PUT,DELETE,POST)
--cors-allowed-origins | BLAST_CORS_ALLOWED_ORIGINS | cors_allowed_origins | CORS allowed origins (e.g. http://localhost:8080,http://localhost:80)
--cors-allowed-headers | BLAST_CORS_ALLOWED_HEADERS | cors_allowed_headers | CORS allowed headers (e.g. content-type,x-some-key)
--log-level | BLAST_LOG_LEVEL | log_level | Log level
--log-file | BLAST_LOG_FILE | log_file | Log file
--log-max-size | BLAST_LOG_MAX_SIZE | log_max_size | Max size of a log file in megabytes
--log-max-backups | BLAST_LOG_MAX_BACKUPS | log_max_backups | Max backup count of log files
--log-max-age | BLAST_LOG_MAX_AGE | log_max_age | Max age of a log file in days
--log-compress | BLAST_LOG_COMPRESS | log_compress | Compress rotated log files
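
For example, the options used in the Start section below could be collected into a configuration file instead of being passed as flags. This is a minimal sketch: the key names come from the Configuration File column above, and the exact YAML layout is an assumption.

$ cat <<EOF > ./blast.yaml
id: node1
raft_address: ":7000"
grpc_address: ":9000"
http_address: ":8000"
data_directory: /tmp/blast/node1
mapping_file: ./examples/example_mapping.json
EOF

$ ./bin/blast start --config-file=./blast.yaml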

Start

Start a server as follows:

$ ./bin/blast start \
              --id=node1 \
              --raft-address=:7000 \
              --http-address=:8000 \
              --grpc-address=:9000 \
              --data-directory=/tmp/blast/node1 \
              --mapping-file=./examples/example_mapping.json

You can get the node information with the following command:

$ ./bin/blast node | jq .

or the following URL:

$ curl -X GET http://localhost:8000/v1/node | jq .

The result of the above command is:

{
  "node": {
    "raft_address": ":7000",
    "metadata": {
      "grpc_address": ":9000",
      "http_address": ":8000"
    },
    "state": "Leader"
  }
}
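
For scripting, you can pull individual fields out of this response with jq, for example the node state:

$ curl -s http://localhost:8000/v1/node | jq -r '.node.state'
Leader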

Health check

You can check the health status of the node.

$ ./bin/blast healthcheck | jq .

The following REST APIs are also provided.

Liveness probe

This endpoint always returns 200 and should be used to check server health.

$ curl -X GET http://localhost:8000/v1/liveness_check | jq .

Readiness probe

This endpoint returns 200 when the server is ready to serve traffic (i.e., respond to queries).

$ curl -X GET http://localhost:8000/v1/readiness_check | jq .
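
For example, a startup script can wait until the readiness endpoint returns 200 before sending traffic. This is a small sketch using standard curl options against the endpoint shown above:

$ until [ "$(curl -s -o /dev/null -w '%{http_code}' http://localhost:8000/v1/readiness_check)" = "200" ]; do
    sleep 1
  done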

Put a document

To put a document, execute the following command:

$ ./bin/blast set 1 '
{
  "fields": {
    "title": "Search engine (computing)",
    "text": "A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload. The most public, visible form of a search engine is a Web search engine which searches for information on the World Wide Web.",
    "timestamp": "2018-07-04T05:41:00Z",
    "_type": "example"
  }
}
' | jq .

or, you can use the RESTful API as follows:

$ curl -X PUT 'http://127.0.0.1:8000/v1/documents/1' --data-binary '
{
  "fields": {
    "title": "Search engine (computing)",
    "text": "A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload. The most public, visible form of a search engine is a Web search engine which searches for information on the World Wide Web.",
    "timestamp": "2018-07-04T05:41:00Z",
    "_type": "example"
  }
}
' | jq .

or

$ curl -X PUT 'http://127.0.0.1:8000/v1/documents/1' -H "Content-Type: application/json" --data-binary @./examples/example_doc_1.json

Get a document

To get a document, execute the following command:

$ ./bin/blast get 1 | jq .

or, you can use the RESTful API as follows:

$ curl -X GET 'http://127.0.0.1:8000/v1/documents/1' | jq .

The result of the above command is:

{
  "fields": {
    "_type": "example",
    "text": "A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload. The most public, visible form of a search engine is a Web search engine which searches for information on the World Wide Web.",
    "timestamp": "2018-07-04T05:41:00Z",
    "title": "Search engine (computing)"
  }
}

Search documents

To search documents, execute the following command:

$ ./bin/blast search '
{
  "search_request": {
    "query": {
      "query": "+_all:search"
    },
    "size": 10,
    "from": 0,
    "fields": [
      "*"
    ],
    "sort": [
      "-_score"
    ]
  }
}
' | jq .

or, you can use the RESTful API as follows:

$ curl -X POST 'http://127.0.0.1:8000/v1/search' --data-binary '
{
  "search_request": {
    "query": {
      "query": "+_all:search"
    },
    "size": 10,
    "from": 0,
    "fields": [
      "*"
    ],
    "sort": [
      "-_score"
    ]
  }
}
' | jq .

The result of the above command is:

{
  "search_result": {
    "facets": null,
    "hits": [
      {
        "fields": {
          "_type": "example",
          "text": "A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload. The most public, visible form of a search engine is a Web search engine which searches for information on the World Wide Web.",
          "timestamp": "2018-07-04T05:41:00Z",
          "title": "Search engine (computing)"
        },
        "id": "1",
        "index": "/tmp/blast/node1/index",
        "score": 0.09703538256409851,
        "sort": [
          "_score"
        ]
      }
    ],
    "max_score": 0.09703538256409851,
    "request": {
      "explain": false,
      "facets": null,
      "fields": [
        "*"
      ],
      "from": 0,
      "highlight": null,
      "includeLocations": false,
      "query": {
        "query": "+_all:search"
      },
      "search_after": null,
      "search_before": null,
      "size": 10,
      "sort": [
        "-_score"
      ]
    },
    "status": {
      "failed": 0,
      "successful": 1,
      "total": 1
    },
    "took": 171880,
    "total_hits": 1
  }
}
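
The Features section lists faceted search and search result highlighting; both are requested through the same search endpoint. The following is a sketch that assumes Bleve's standard facets and highlight request fields; the facet name "types" and the "html" highlight style are arbitrary choices for illustration:

$ curl -X POST 'http://127.0.0.1:8000/v1/search' --data-binary '
{
  "search_request": {
    "query": {
      "query": "+_all:search"
    },
    "size": 10,
    "from": 0,
    "fields": [
      "*"
    ],
    "facets": {
      "types": {
        "size": 10,
        "field": "_type"
      }
    },
    "highlight": {
      "style": "html",
      "fields": [
        "text"
      ]
    }
  }
}
' | jq .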

Delete a document

To delete a document, execute the following command:

$ ./bin/blast delete 1

or, you can use the RESTful API as follows:

$ curl -X DELETE 'http://127.0.0.1:8000/v1/documents/1'

Index documents in bulk

To index documents in bulk, execute the following command:

$ ./bin/blast bulk-index --file ./examples/example_bulk_index.json

or, you can use the RESTful API as follows:

$ curl -X PUT 'http://127.0.0.1:8000/v1/documents' -H "Content-Type: application/x-ndjson" --data-binary @./examples/example_bulk_index.json

Delete documents in bulk

To delete documents in bulk, execute the following command:

$ ./bin/blast bulk-delete --file ./examples/example_bulk_delete.txt

or, you can use the RESTful API as follows:

$ curl -X DELETE 'http://127.0.0.1:8000/v1/documents' -H "Content-Type: text/plain" --data-binary @./examples/example_bulk_delete.txt

Bringing up a cluster

Blast makes it easy to bring up a cluster. The node started above is already running, but it is not fault tolerant. If you need to increase fault tolerance, bring up two more data nodes like so:

$ ./bin/blast start \
              --id=node2 \
              --raft-address=:7001 \
              --http-address=:8001 \
              --grpc-address=:9001 \
              --peer-grpc-address=:9000 \
              --data-directory=/tmp/blast/node2 \
              --mapping-file=./examples/example_mapping.json
$ ./bin/blast start \
              --id=node3 \
              --raft-address=:7002 \
              --http-address=:8002 \
              --grpc-address=:9002 \
              --peer-grpc-address=:9000 \
              --data-directory=/tmp/blast/node3 \
              --mapping-file=./examples/example_mapping.json

The above example runs each Blast node on the same host, so each node must listen on a different port. This would not be necessary if each node ran on a different host.

This instructs each new node to join an existing node; each node recognizes the cluster it is joining when it starts. You now have a 3-node cluster that can tolerate the failure of one node. You can check the cluster with the following command:

$ ./bin/blast cluster | jq .

or, you can use the RESTful API as follows:

$ curl -X GET 'http://127.0.0.1:8000/v1/cluster' | jq .

The result of the above command, in JSON format, is:

{
  "cluster": {
    "nodes": {
      "node1": {
        "raft_address": ":7000",
        "metadata": {
          "grpc_address": ":9000",
          "http_address": ":8000"
        },
        "state": "Leader"
      },
      "node2": {
        "raft_address": ":7001",
        "metadata": {
          "grpc_address": ":9001",
          "http_address": ":8001"
        },
        "state": "Follower"
      },
      "node3": {
        "raft_address": ":7002",
        "metadata": {
          "grpc_address": ":9002",
          "http_address": ":8002"
        },
        "state": "Follower"
      }
    },
    "leader": "node1"
  }
}

An odd number of nodes, 3 or more, is recommended for the cluster. With a single node, data loss is inevitable in failure scenarios, so avoid single-node deployments.

In the above example, the nodes join the cluster at startup, but you can also join a node that was started in standalone mode to the cluster later, as follows:

$ ./bin/blast join --grpc-address=:9000 node2 127.0.0.1:9001

or, you can use the RESTful API as follows:

$ curl -X PUT 'http://127.0.0.1:8000/v1/cluster/node2' --data-binary '
{
  "raft_address": ":7001",
  "metadata": {
    "grpc_address": ":9001",
    "http_address": ":8001"
  }
}
'

To remove a node from the cluster, execute the following command:

$ ./bin/blast leave --grpc-address=:9000 node2

or, you can use the RESTful API as follows:

$ curl -X DELETE 'http://127.0.0.1:8000/v1/cluster/node2'
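
You can verify that the node has left by checking the cluster again:

$ ./bin/blast cluster | jq .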

The following command indexes documents to any node in the cluster:

$ ./bin/blast set 1 '
{
  "fields": {
    "title": "Search engine (computing)",
    "text": "A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload. The most public, visible form of a search engine is a Web search engine which searches for information on the World Wide Web.",
    "timestamp": "2018-07-04T05:41:00Z",
    "_type": "example"
  }
}
' --grpc-address=:9000 | jq .

You can then get the document from the node specified in the above command as follows:

$ ./bin/blast get 1 --grpc-address=:9000 | jq .

The result of the above command is the document indexed above, the same JSON shown in the Get a document section.

You can also get the same document from other nodes in the cluster as follows:

$ ./bin/blast get 1 --grpc-address=:9001 | jq .
$ ./bin/blast get 1 --grpc-address=:9002 | jq .

The result of the above command is:

{
  "fields": {
    "_type": "example",
    "text": "A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload. The most public, visible form of a search engine is a Web search engine which searches for information on the World Wide Web.",
    "timestamp": "2018-07-04T05:41:00Z",
    "title": "Search engine (computing)"
  }
}

Docker

Build Docker container image

You can build the Docker container image like so:

$ make docker-build

Pull Docker container image from docker.io

You can also use the Docker container image already registered in docker.io like so:

$ docker pull mosuka/blast:latest

See https://hub.docker.com/r/mosuka/blast/tags/

Start on Docker

To run a Blast data node on Docker, start the node like so:

$ docker run --rm --name blast-node1 \
    -p 7000:7000 \
    -p 8000:8000 \
    -p 9000:9000 \
    -v $(pwd)/etc/blast_mapping.json:/etc/blast_mapping.json \
    mosuka/blast:latest start \
      --id=node1 \
      --raft-address=:7000 \
      --http-address=:8000 \
      --grpc-address=:9000 \
      --data-directory=/tmp/blast/node1 \
      --mapping-file=/etc/blast_mapping.json

You can execute commands inside the Docker container as follows:

$ docker exec -it blast-node1 blast node --grpc-address=:9000
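
Because the container publishes ports 7000, 8000, and 9000, you can also reach the node from the host using the same endpoints as in the earlier examples:

$ curl -X GET http://localhost:8000/v1/node | jq .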

Securing Blast

Blast supports HTTPS access, ensuring that all communication between clients and a cluster is encrypted.

Generating a certificate and private key

One way to generate the necessary resources is via openssl. For example:

$ openssl req -x509 -nodes -newkey rsa:4096 -keyout ./etc/blast_key.pem -out ./etc/blast_cert.pem -days 365 -subj '/CN=localhost'
Generating a 4096 bit RSA private key
............................++
........++
writing new private key to './etc/blast_key.pem'
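
You can inspect the generated certificate with openssl to confirm its subject and validity period:

$ openssl x509 -in ./etc/blast_cert.pem -noout -subject -dates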

Secure cluster example

Start the nodes with HTTPS enabled and node-to-node encryption. It is assumed that the X.509 certificate and key are at ./etc/blast_cert.pem and ./etc/blast_key.pem respectively, as generated above.

$ ./bin/blast start \
             --id=node1 \
             --raft-address=:7000 \
             --http-address=:8000 \
             --grpc-address=:9000 \
             --peer-grpc-address=:9000 \
             --data-directory=/tmp/blast/node1 \
             --mapping-file=./etc/blast_mapping.json \
             --certificate-file=./etc/blast_cert.pem \
             --key-file=./etc/blast_key.pem \
             --common-name=localhost
$ ./bin/blast start \
             --id=node2 \
             --raft-address=:7001 \
             --http-address=:8001 \
             --grpc-address=:9001 \
             --peer-grpc-address=:9000 \
             --data-directory=/tmp/blast/node2 \
             --mapping-file=./etc/blast_mapping.json \
             --certificate-file=./etc/blast_cert.pem \
             --key-file=./etc/blast_key.pem \
             --common-name=localhost
$ ./bin/blast start \
             --id=node3 \
             --raft-address=:7002 \
             --http-address=:8002 \
             --grpc-address=:9002 \
             --peer-grpc-address=:9000 \
             --data-directory=/tmp/blast/node3 \
             --mapping-file=./etc/blast_mapping.json \
             --certificate-file=./etc/blast_cert.pem \
             --key-file=./etc/blast_key.pem \
             --common-name=localhost

You can access the secured cluster by adding the TLS-related flags, as in the following command:

$ ./bin/blast cluster --grpc-address=:9000 --certificate-file=./etc/blast_cert.pem --common-name=localhost | jq .

or

$ curl -X GET https://localhost:8000/v1/cluster --cacert ./etc/blast_cert.pem | jq .