Cheap and reliable Node.js hosting starts at $3/month, and $1/month static HTML hosting

NBoost is a scalable, search-api-boosting platform for deploying transformer models to improve the relevance of search results on different platforms (i.e. Elasticsearch)

Stars: ✭ 549 (-19.74%)

Mutual labels: search-engine, elasticsearch

Haystack

🔍 Haystack is an open source NLP framework that leverages Transformer models. It enables developers to implement production-ready neural search, question answering, semantic document search and summarization for a wide range of applications.

Stars: ✭ 3,409 (+398.39%)

Mutual labels: search-engine, elasticsearch

Rusticsearch

Lightweight Elasticsearch compatible search server.

Stars: ✭ 171 (-75%)

Mutual labels: search-engine, elasticsearch

Srchx

A standalone lightweight full-text search engine built on top of blevesearch and Go with multiple storage (scorch, boltdb, leveldb, badger)

Stars: ✭ 118 (-82.75%)

Mutual labels: search-engine, elasticsearch

Elasticsearch

Free and Open, Distributed, RESTful Search Engine

Stars: ✭ 57,778 (+8347.08%)

Mutual labels: search-engine, elasticsearch

Gnes

GNES is Generic Neural Elastic Search, a cloud-native semantic search system based on deep neural network.

Stars: ✭ 1,178 (+72.22%)

Mutual labels: search-engine, elasticsearch

Thesaurus Of Job Titles

Open Source Thesaurus of Job Titles in US English

Stars: ✭ 77 (-88.74%)

Mutual labels: search-engine, elasticsearch

Adam qas

ADAM - A Question Answering System. Inspired from IBM Watson

Stars: ✭ 330 (-51.75%)

Mutual labels: elasticsearch, natural-language-processing

Ipfs Search

Search engine for the Interplanetary Filesystem.

Stars: ✭ 519 (-24.12%)

Mutual labels: search-engine, elasticsearch

View All Similar Projects ➔

Elasticsearch meets BERT

Below is a job search example:

System architecture

Requirements

Docker
Docker Compose >= 1.22.0

Getting Started

1. Download a pretrained BERT model

List of released pretrained BERT models (click to expand...)

BERT-Base, Uncased	12-layer, 768-hidden, 12-heads, 110M parameters
BERT-Large, Uncased	24-layer, 1024-hidden, 16-heads, 340M parameters
BERT-Base, Cased	12-layer, 768-hidden, 12-heads , 110M parameters
BERT-Large, Cased	24-layer, 1024-hidden, 16-heads, 340M parameters
BERT-Base, Multilingual Cased (New)	104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
BERT-Base, Multilingual Cased (Old)	102 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
BERT-Base, Chinese	Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, 110M parameters

$ wget https://storage.googleapis.com/bert_models/2018_10_18/cased_L-12_H-768_A-12.zip
$ unzip cased_L-12_H-768_A-12.zip

2. Set environment variables

You need to set a pretrained BERT model and Elasticsearch's index name as environment variables:

$ export PATH_MODEL=./cased_L-12_H-768_A-12
$ export INDEX_NAME=jobsearch

3. Run Docker containers

$ docker-compose up

CAUTION: If possible, assign high memory(more than 8GB) to Docker's memory configuration because BERT container needs high memory.

4. Create index

You can use the create index API to add a new index to an Elasticsearch cluster. When creating an index, you can specify the following:

Settings for the index
Mappings for fields in the index
Index aliases

For example, if you want to create jobsearch index with title, text and text_vector fields, you can create the index by the following command:

$ python example/create_index.py --index_file=example/index.json --index_name=jobsearch
# index.json
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 1
  },
  "mappings": {
    "dynamic": "true",
    "_source": {
      "enabled": "true"
    },
    "properties": {
      "title": {
        "type": "text"
      },
      "text": {
        "type": "text"
      },
      "text_vector": {
        "type": "dense_vector",
        "dims": 768
      }
    }
  }
}

CAUTION: The dims value of text_vector must need to match the dims of a pretrained BERT model.

5. Create documents

Once you created an index, you’re ready to index some document. The point here is to convert your document into a vector using BERT. The resulting vector is stored in the text_vector field. Let`s convert your data into a JSON document:

$ python example/create_documents.py --data=example/example.csv --index_name=jobsearch
# example/example.csv
"Title","Description"
"Saleswoman","lorem ipsum"
"Software Developer","lorem ipsum"
"Chief Financial Officer","lorem ipsum"
"General Manager","lorem ipsum"
"Network Administrator","lorem ipsum"

After finishing the script, you can get a JSON document like follows:

# documents.jsonl
{"_op_type": "index", "_index": "jobsearch", "text": "lorem ipsum", "title": "Saleswoman", "text_vector": [...]}
{"_op_type": "index", "_index": "jobsearch", "text": "lorem ipsum", "title": "Software Developer", "text_vector": [...]}
{"_op_type": "index", "_index": "jobsearch", "text": "lorem ipsum", "title": "Chief Financial Officer", "text_vector": [...]}
...

6. Index documents

After converting your data into a JSON, you can adds a JSON document to the specified index and makes it searchable.

$ python example/index_documents.py

7. Open browser

Go to http://127.0.0.1:5000.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].

Stars: ✭ 684

Visit Git Page 🔗Visit User Page 🔗Visit Issues Page (1) 🔗