filiphanes / fts-elastic

Licence: Unknown (COPYING), MIT (COPYING.MIT)
ElasticSearch FTS implementation for the Dovecot mail server

fts-elastic

fts-elastic is a Dovecot full-text search indexing plugin that uses ElasticSearch as a backend.

Dovecot communicates with Elasticsearch using HTTP/JSON queries. The plugin supports automatic indexing and searching of e-mail. For mailboxes with more than 10000 messages, it uses the Elasticsearch scroll API.
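As a rough illustration of how scroll paging works in general (this is not the plugin's C code), the sketch below walks a result set page by page. The `send` callable is a hypothetical transport that posts a JSON body to a path and returns the decoded response; the function name, the page size and the `1m` keep-alive are assumptions made for the example.

```python
from typing import Callable, Iterator

# Hypothetical transport: send(path, json_body) -> decoded JSON response.
Send = Callable[[str, dict], dict]


def scroll_ids(send: Send, index: str, query: dict,
               page_size: int = 10000, keep_alive: str = "1m") -> Iterator[str]:
    """Yield the _id of every hit, following the scroll cursor until a page is empty."""
    # The first request opens the scroll context and returns the first page.
    page = send(f"/{index}/_search?scroll={keep_alive}",
                {"query": query, "size": page_size, "_source": False})
    while page["hits"]["hits"]:
        for hit in page["hits"]["hits"]:
            yield hit["_id"]
        # Each follow-up request passes back the most recent scroll id.
        page = send("/_search/scroll",
                    {"scroll": keep_alive, "scroll_id": page["_scroll_id"]})
```

Injecting the transport keeps the paging logic testable without a running Elasticsearch server. A real client would also clear the scroll context when it is done.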

Requirements

  • Dovecot 2.2+
  • JSON-C
  • ElasticSearch 6.x, 7.x
  • Autoconf 2.53+

Compiling

This plugin needs to compile against the Dovecot source for the version you intend to run it on. A dovecot-devel package is unfortunately insufficient as it does not include the required fts API header files.

You can provide the path to your source tree by passing --with-dovecot=<path> to ./configure.

Install dependencies

# sudo apt install dovecot
sudo apt install gcc make libjson-c-dev dovecot-dev

An example build may look like:

./autogen.sh
./configure --with-dovecot=/usr/lib/dovecot/
make
make install
sudo ln -s /usr/lib/dovecot/lib21_fts_elastic_plugin.so /usr/lib/dovecot/modules/lib21_fts_elastic_plugin.so

Configuration

Create /etc/dovecot/conf.d/90-fts.conf with content:

mail_plugins = $mail_plugins fts fts_elastic

plugin {
  fts = elastic
  fts_elastic = debug url=http://localhost:9200/m/ bulk_size=5000000 refresh=fts rawlog_dir=/var/log/fts-elastic/

  # no: new emails are indexed when the user runs a search
  # yes: every email is indexed when it is delivered
  fts_autoindex = no
  fts_autoindex_exclude = \Junk
  fts_autoindex_exclude2 = \Trash
}

and (re)start dovecot:

dovecot stop; dovecot

The fts_elastic setting accepts these space-separated options:

  • url=<elasticsearch url> Required. Elasticsearch URL including the index name; it must end with a slash (/).
  • bulk_size=<positive integer> Maximum size in bytes of the bulk requests sent to Elasticsearch (default=5000000).
  • refresh={fts,index,never} When to refresh the Elasticsearch index so that new emails become searchable:
    • fts: when the Dovecot fts plugin requests it (typically before a search)
    • index: after each bulk update, using the ?refresh=true query parameter (produces inefficient indexes when combined with fts_autoindex=yes)
    • never: leave it to Elasticsearch; indexed emails may not be searchable immediately
  • debug Enables HTTP debugging.
  • rawlog_dir=<directory> Directory where the HTTP communication with the Elasticsearch server is written (useful for debugging the plugin or the Elasticsearch schema).
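To make the option rules above concrete, here is a minimal Python sketch that parses and validates an fts_elastic value. It is not the plugin's own parser: the function name and error messages are hypothetical, and because the text above does not state a default for refresh, the sketch leaves it as None when it is not given.

```python
ALLOWED_REFRESH = {"fts", "index", "never"}


def parse_fts_elastic(setting: str) -> dict:
    """Parse a space-separated fts_elastic value into a dict of options."""
    opts = {"url": None, "bulk_size": 5000000, "refresh": None,
            "debug": False, "rawlog_dir": None}
    for token in setting.split():
        key, sep, value = token.partition("=")
        if key == "debug" and not sep:
            opts["debug"] = True          # bare flag, takes no value
        elif key == "bulk_size" and sep:
            opts["bulk_size"] = int(value)
        elif key in ("url", "refresh", "rawlog_dir") and sep:
            opts[key] = value
        else:
            raise ValueError(f"unknown option: {token!r}")
    # Validation rules taken from the option list above.
    if not opts["url"] or not opts["url"].endswith("/"):
        raise ValueError("url is required and must end with a slash")
    if opts["bulk_size"] <= 0:
        raise ValueError("bulk_size must be a positive integer")
    if opts["refresh"] is not None and opts["refresh"] not in ALLOWED_REFRESH:
        raise ValueError("refresh must be one of fts, index, never")
    return opts
```

Called with the value from the example configuration, it returns debug=True, the URL http://localhost:9200/m/, bulk_size=5000000 and refresh="fts". A URL without the trailing slash is rejected.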

ElasticSearch index

This plugin stores all messages in one Elasticsearch index. You can use sharding to support large numbers of users. Because it uses a routing key, each update and search accesses only one shard. The _id has the form "_id":"uid/mbox-guid/user@domain", for example: "_id":"3/f40efa2f8f44ad54424000006e8130ae/[email protected]"
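The _id layout described above can be composed and taken apart with a few lines of Python. This is only a sketch of the format: the helper names are hypothetical and the address user@example.com is a placeholder.

```python
from typing import Tuple


def make_doc_id(uid: int, box_guid: str, user: str) -> str:
    """Build an _id in the uid/mbox-guid/user@domain form."""
    return f"{uid}/{box_guid}/{user}"


def split_doc_id(doc_id: str) -> Tuple[int, str, str]:
    """Split an _id back into (uid, mailbox GUID, user)."""
    # Split on the first two slashes only, so the user part is kept whole.
    uid, box_guid, user = doc_id.split("/", 2)
    return int(uid), box_guid, user


doc_id = make_doc_id(3, "f40efa2f8f44ad54424000006e8130ae", "user@example.com")
print(doc_id)
print(split_doc_id(doc_id))
```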

You can set up the index mapping on Elasticsearch 6.x with the command:

curl -X PUT "http://elasticIP:9200/m?pretty" -H 'Content-Type: application/json' -d "@elastic6-schema.json"

Elasticsearch 7.x uses a different date format parser, so you need to use a different schema:

curl -X PUT "http://elasticIP:9200/m?pretty" -H 'Content-Type: application/json' -d "@elastic7-schema.json"
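If you script the setup, the choice between the two schema files can be derived from the server version. The sketch below assumes you already have the version string (Elasticsearch reports it as version.number in the response of its root endpoint); the function name is hypothetical.

```python
def schema_file_for(version: str) -> str:
    """Return the schema file name matching an Elasticsearch version string."""
    major = int(version.split(".")[0])
    if major == 6:
        return "elastic6-schema.json"
    if major == 7:
        return "elastic7-schema.json"
    # Only 6.x and 7.x are listed in the requirements.
    raise ValueError(f"unsupported Elasticsearch version: {version}")


print(schema_file_for("7.10.2"))
```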

The box and user fields need to be keyword fields, as you can see in the file elastic-schema.json. Our schema leaves _source enabled because we don't see much storage savings when _source is disabled, and the Elasticsearch documentation doesn't recommend disabling it either. This plugin doesn't use _source and explicitly disables it in the responses to its queries, but you can use it for better management of and insight into indexed emails, or when you want to use Elasticsearch for purposes other than Dovecot FTS (analysis, spammer detection, ...). _source is also needed if you reindex within Elasticsearch.

You can reindex a user's mailbox at any time with these doveadm commands:

doveadm fts rescan -u user@domain
doveadm index -u user@domain -q '*'

An example of pushed document:

{
  "user": "[email protected]",
  "box": "f40efa2f8f44ad54424000006e8130ae",
  "uid": 3,
  "date": "Thu, 08 Jan 2015 00:20:05 +0000",
  "from": "josh <[email protected]>",
  "sender": "Filip Hanes",
  "to": "<[email protected]>",
  "cc": "User <[email protected]>",
  "bcc": "\"Test User\" <[email protected]>",
  "subject": "Test #3",
  "message-id": "<[email protected]>",
  "body": "This is the body of test #3.\n"
}
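Documents like the one above are sent through the Elasticsearch bulk API, whose request body is newline-delimited JSON: one action line, then one document line, with a trailing newline. The following Python sketch shows one way to group documents into bodies that stay near a byte limit such as bulk_size. It is not the plugin's implementation: the function name is hypothetical, and both the flush rule (start a new body when the next entry would exceed the limit) and the use of the user as routing value are assumptions.

```python
import json
from typing import Iterable, Iterator


def bulk_bodies(docs: Iterable[dict], bulk_size: int = 5000000) -> Iterator[bytes]:
    """Group documents into NDJSON bulk bodies of roughly bulk_size bytes."""
    buf = bytearray()
    for doc in docs:
        action = {"index": {
            "_id": f'{doc["uid"]}/{doc["box"]}/{doc["user"]}',
            "routing": doc["user"],  # assumption: documents are routed by user
        }}
        entry = (json.dumps(action) + "\n" + json.dumps(doc) + "\n").encode("utf-8")
        # Flush before appending if this entry would push the body over the limit.
        if buf and len(buf) + len(entry) > bulk_size:
            yield bytes(buf)
            buf.clear()
        buf += entry
    if buf:
        yield bytes(buf)
```

An entry larger than the limit is still sent, alone in its own body, so no document is dropped.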

An example search:

curl -X POST "http://elasticIP:9200/m/_search?pretty" -H 'Content-Type: application/json' -d '
{
  "query": {
    "bool": {
      "filter": [
        {"term": {"user": "[email protected]"}},
        {"term": {"box": "f40efa2f8f44ad54424000006e8130ae"}}
      ],
      "must": [
        {
          "multi_match": {
            "query": "test",
            "operator": "and",
            "fields": ["from","to","cc","bcc","sender","subject","body"]
          }
        }
      ]
    }
  },
  "size": 100
}
'
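The same request body can be generated programmatically. This Python sketch mirrors the structure of the curl example; the function name and the address user@example.com are placeholders. It also sets "_source": false, which is one way to suppress _source in responses. The exact mechanism the plugin uses is an assumption here.

```python
import json

SEARCH_FIELDS = ["from", "to", "cc", "bcc", "sender", "subject", "body"]


def build_search_body(user: str, box: str, text: str,
                      fields=SEARCH_FIELDS, size: int = 100) -> dict:
    """Build a search body limited to one user's mailbox."""
    return {
        "query": {
            "bool": {
                # Filters do not affect scoring; they only narrow the result set.
                "filter": [
                    {"term": {"user": user}},
                    {"term": {"box": box}},
                ],
                "must": [
                    {"multi_match": {"query": text, "operator": "and",
                                     "fields": list(fields)}},
                ],
            }
        },
        "size": size,
        "_source": False,  # assumption: only ids are needed, not stored documents
    }


body = build_search_body("user@example.com", "f40efa2f8f44ad54424000006e8130ae", "test")
print(json.dumps(body, indent=2))
```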

TODO

Thanks

This plugin borrows heavily from Dovecot itself, particularly for the automatic detection of dovecot-config (see m4/dovecot.m4). The fts-solr and fts-squat plugins were also used as reference material for understanding the Dovecot FTS API. fts-lucene was used as a reference for implementing a proper rescan.
