Dice.com's relevancy feedback solr plugin created by Simon Hughes (Dice). Contains request handlers for doing MLT style recommendations, conceptual search, semantic search and personalized search

Stars: ✭ 19 (-54.76%)

Mutual labels: search-engine

Dawnlightsearch

A Linux version of Everything Search Engine.

Stars: ✭ 26 (-38.1%)

Mutual labels: search-engine

Infinispan

Infinispan is an open source data grid platform and highly scalable NoSQL cloud data store.

Stars: ✭ 862 (+1952.38%)

Mutual labels: search-engine

Riot

An open-source, distributed, simple and efficient search engine written in Go. Warning: this is V1, a beta version with high memory consumption; V2 will be a complete rewrite.

Stars: ✭ 6,025 (+14245.24%)

Mutual labels: search-engine

Dbworld Search

🔍 A simple search engine built with the Django framework

Stars: ✭ 39 (-7.14%)

Mutual labels: search-engine

Censys Ruby

Ruby API client for the Censys internet-wide network-scan search engine

Stars: ✭ 8 (-80.95%)

Mutual labels: search-engine

Flexsearch

Next-Generation full text search library for Browser and Node.js

Stars: ✭ 8,108 (+19204.76%)

Mutual labels: search-engine

Covid 19 Bert Researchpapers Semantic Search

BERT semantic search engine for searching literature research papers for coronavirus covid-19 in google colab

Stars: ✭ 23 (-45.24%)

Mutual labels: search-engine

Blast

Blast is a full text search and indexing server, written in Go, built on top of Bleve.

Stars: ✭ 934 (+2123.81%)

Mutual labels: search-engine

Inverted index

A simple in memory inverted index in Python

Stars: ✭ 12 (-71.43%)

Mutual labels: search-engine

Funpyspidersearchengine

Word2vec personalized search (tailored to each user) + Scrapy 2.3.0 (data crawling) + Elasticsearch 7.9.1 (data storage and external RESTful API) + Django 3.1.1 search

Stars: ✭ 782 (+1761.9%)

Mutual labels: search-engine

Duckietv

A web application built with AngularJS to track your favorite tv-shows with semi-automagic torrent integration

Stars: ✭ 942 (+2142.86%)

Mutual labels: search-engine

Bertsearch

Elasticsearch with BERT for advanced document search.

Stars: ✭ 684 (+1528.57%)

Mutual labels: search-engine

Yub

yub.js - A command-line for the web

Stars: ✭ 10 (-76.19%)

Mutual labels: search-engine

Algolia Webcrawler

Simple node worker that crawls sitemaps in order to keep an algolia index up-to-date

Stars: ✭ 40 (-4.76%)

Mutual labels: search-engine

Awesome Seo

Google SEO research and traffic monetization

Stars: ✭ 942 (+2142.86%)

Mutual labels: search-engine

Shebanq

Exposing the Hebrew Text Database of the ETCBC

Stars: ✭ 13 (-69.05%)

Mutual labels: search-engine


**********************************
phinde - generic web search engine
**********************************

A self-hosted search engine for your static blog or just about any other website that needs search functionality.

My live instance is at http://search.cweiske.de/ and indexes my website, blog and all linked URLs.

========
Features
========

- Crawler and indexer with the ability to run many in parallel
- Shows and highlights text that contains search words
- Boolean search queries:

  - ``foo bar`` searches for ``foo AND bar``
  - ``foo OR bar``
  - ``title:foo`` searches for ``foo`` only in the page title

- Facets for tag, domain, language and type
- Date search:

  - ``before:2016-08-30`` - modification date before that day
  - ``after:2016-08-30`` - modified after that day
  - ``date::2016-08-30`` - exact modification day match

- Site search:

  - Query: ``foo bar site:example.org/dir/``
  - or use the ``site`` GET parameter: ``/?q=foo&site=example.org/dir``

- OpenSearch support with HTML and Atom result lists
- Instant indexing with WebSub (formerly PubSubHubbub)
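
The query operators and the ``site`` GET parameter listed above can also be used from a script. The following is a minimal Python sketch, not part of phinde itself: the helper name ``build_search_url`` is hypothetical, and ``http://phinde.example.org/`` is a placeholder for the address of your own instance.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, site=None):
    """Return a phinde search URL for the given query string.

    `base_url` is a placeholder for your own instance. `query` may use the
    operators listed above, e.g. "title:foo" or "foo before:2016-08-30".
    """
    params = {"q": query}
    if site is not None:
        # Alternative to writing site:example.org/dir/ inside the query
        params["site"] = site
    # urlencode() percent-encodes spaces, colons and slashes as needed
    return base_url.rstrip("/") + "/?" + urlencode(params)


print(build_search_url("http://phinde.example.org/", "foo OR bar"))
print(build_search_url("http://phinde.example.org/", "foo", site="example.org/dir"))
```

Passing the site restriction as a separate parameter keeps the user's query text untouched, which is convenient when the search box lives on one specific site.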

============
Dependencies
============

- PHP 5.5+
- Elasticsearch 2.0
- MySQL or MariaDB for WebSub subscriptions
- Gearman (Debian 9: ``gearman-job-server``, not ``gearman-server``)
- PHP Gearman extension
- Console_CommandLine
- Net_URL2
- Twig 1.x

=====
Setup
=====

#. Install and run Elasticsearch and Gearman
#. Install php-gearman
#. Get a local copy of the code::

     $ git clone https://git.cweiske.de/phinde.git phinde

#. Install dependencies via composer::

     $ composer install

#. Point your webserver's document root to phinde's ``www`` directory
#. Copy ``data/config.php.dist`` to ``data/config.php`` and adjust it.
   Make sure you add your domain to the crawl whitelist.
#. Create a MySQL database and import the schema from ``data/schema.sql``
#. Run ``bin/setup.php``, which sets up the Elasticsearch schema
#. Put your homepage into the queue::

     $ ./bin/process.php http://example.org/

#. Start at least one worker to process the crawl+index queue::

     $ ./bin/phinde-worker.php

#. Check phinde's status page in your browser.
   The number of open tasks should be > 0, and so should the number of workers.

Re-index when your site changes
===============================

When your site changes, the search engine needs to re-crawl and re-index the affected pages.

Simply tell phinde that something changed by running::

  $ ./bin/process.php http://example.org/foo.htm

phinde supports HTML pages and Atom feeds, so if your blog has a feed it's enough to let phinde reindex that one. It will find all linked pages automatically.
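
When several pages change at once, for example after a deployment, the same command can be run once per URL. A minimal Python sketch under stated assumptions: the helper name ``queue_for_reindex`` and its ``run`` parameter are hypothetical, and the script path ``./bin/process.php`` is assumed to be relative to the phinde checkout.

```python
import subprocess


def queue_for_reindex(urls, process_script="./bin/process.php", run=subprocess.run):
    """Queue each URL for crawling by calling phinde's process script.

    `run` defaults to subprocess.run and can be swapped out for a dry run.
    """
    for url in urls:
        # check=True raises CalledProcessError if the script exits non-zero
        run([process_script, url], check=True)


# Example call (executes the script, so run it inside the phinde directory):
# queue_for_reindex(["http://example.org/feed.atom"])
```

Queuing only the feed URL is usually enough, since linked pages are found automatically as described above.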

Website integration
===================

Adding a simple search form to your website is easy. It needs two things:

- A ``<form>`` tag with an action that points to the phinde instance
- A search text field named ``q``

Example::
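
A minimal sketch of such a form, printed by a small Python helper so the action URL can be filled in per site. Only the ``<form>`` action and the ``q`` field name come from the description above. The helper name ``search_form_html``, the ``method="get"`` attribute, the placeholder text, the submit button and the URL ``http://phinde.example.org/`` are assumptions for illustration.

```python
from html import escape


def search_form_html(action_url):
    """Render a minimal search form that submits to a phinde instance."""
    return (
        '<form method="get" action="{action}">\n'
        '  <input type="text" name="q" placeholder="Search"/>\n'
        '  <button type="submit">Search</button>\n'
        '</form>'
    ).format(action=escape(action_url, quote=True))


# Placeholder URL; replace it with the address of your own instance
print(search_form_html("http://phinde.example.org/"))
```

The printed markup can be pasted into a static page template as-is.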

System service
==============

When using systemd, you can let it run multiple worker instances when the system boots up:

#. Copy files ``data/systemd/phinde*.service`` into ``/etc/systemd/system/``
#. Adjust user and group names, and the work directories
#. Enable three worker processes::

     $ systemctl daemon-reload
     $ systemctl enable [email protected]
     $ systemctl enable [email protected]
     $ systemctl enable [email protected]
     $ systemctl enable phinde
     $ systemctl start phinde

#. Now three workers are running.
   Restarting the ``phinde`` service also restarts the workers.

Cron job
========

Run ``bin/renew-subscriptions.php`` once a day with cron. It will renew the WebSub subscriptions.

=====
Howto
=====

Delete index data from one domain::

  $ curl -iv -XDELETE -H 'Content-Type: application/json' -d '{"query":{"term":{"domain":"example.org"}}}' http://127.0.0.1:9200/phinde/_query

This uses the delete-by-query plugin for Elasticsearch 2.0; see https://www.elastic.co/guide/en/elasticsearch/plugins/2.0/delete-by-query-usage.html
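
The same request can be prepared from a script, which helps when cleaning up several domains. A minimal Python sketch, assuming Elasticsearch listens on ``127.0.0.1:9200`` and the index is named ``phinde`` as in the command above; the helper name ``build_delete_request`` is hypothetical. The request is only built here, not sent.

```python
import json
import urllib.request


def build_delete_request(domain, es_url="http://127.0.0.1:9200"):
    """Build (but do not send) the delete-by-query request for one domain."""
    body = json.dumps({"query": {"term": {"domain": domain}}}).encode("utf-8")
    return urllib.request.Request(
        es_url.rstrip("/") + "/phinde/_query",
        data=body,
        method="DELETE",
        headers={"Content-Type": "application/json"},
    )


request = build_delete_request("example.org")
print(request.get_method(), request.full_url)
# To actually delete the documents, send it to a running Elasticsearch:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```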

Subscribe to a website/feed
===========================

Phinde supports WebSub__ to subscribe to changes of a website. When phinde is notified by the website's hub about changes, it immediately crawls and indexes the changed pages.

Subscribe to a website's feed::

  $ php bin/subscribe.php http://example.org/feed.atom

Phinde will determine the website's hub and send a registration request to it.
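
To illustrate what hub discovery and registration involve in WebSub, here is a minimal Python sketch. It is not phinde's implementation: it only reads ``<link rel="hub">`` and ``<link rel="self">`` from an Atom feed (the specification also allows HTTP ``Link`` headers) and encodes the form fields of a subscription request. The function names, the sample feed, the hub URL and the callback URL are all hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def discover_hub_and_topic(atom_xml):
    """Return (hub_url, topic_url) from the <link> elements of an Atom feed."""
    root = ET.fromstring(atom_xml)
    links = {}
    for link in root.findall(ATOM_NS + "link"):
        rel = link.get("rel")
        # Keep the first link of each relation type
        if rel in ("hub", "self") and rel not in links:
            links[rel] = link.get("href")
    return links.get("hub"), links.get("self")


def subscription_form(topic_url, callback_url):
    """Encode the form fields a subscriber POSTs to the hub."""
    return urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic_url,
        "hub.callback": callback_url,
    })


# Made-up sample feed with hypothetical hub and topic URLs
feed = """<feed xmlns="http://www.w3.org/2005/Atom">
  <link rel="hub" href="https://hub.example.org/"/>
  <link rel="self" href="http://example.org/feed.atom"/>
</feed>"""

hub, topic = discover_hub_and_topic(feed)
print(hub, topic)
# The callback URL is a placeholder, not phinde's real endpoint
print(subscription_form(topic, "http://phinde.example.org/push-callback"))
```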

The status page shows the number of working subscriptions and the number of open ones.

Unsubscribing is also done on the command line::

  $ php bin/unsubscribe.php http://example.org/feed.atom

__ https://www.w3.org/TR/websub/

============
About phinde
============

Source code
===========

phinde's source code is available from http://git.cweiske.de/phinde.git or the mirror on GitHub__.

__ https://github.com/cweiske/phinde

License
=======

phinde is licensed under the `AGPL v3 or later`__.

__ http://www.gnu.org/licenses/agpl.html

Author
======

phinde was written by `Christian Weiske`__.

__ http://cweiske.de/

