
**********************************
phinde - generic web search engine
**********************************


Self-hosted search engine for your static blog, or for just about any other website that needs search functionality.

My live instance is at http://search.cweiske.de/ and indexes my website, blog and all linked URLs.

========
Features
========

  • Crawler and indexer with the ability to run many in parallel

  • Shows and highlights text that contains search words

  • Boolean search queries:

    • ``foo bar`` searches for ``foo`` AND ``bar``
    • ``foo OR bar``
    • ``title:foo`` searches for ``foo`` only in the page title

  • Facets for tag, domain, language and type

  • Date search:

    • ``before:2016-08-30`` - modification date before that day
    • ``after:2016-08-30`` - modification date after that day
    • ``date:2016-08-30`` - exact modification day match

  • Site search:

    • Query: ``foo bar site:example.org/dir/``
    • or use the ``site`` GET parameter: ``/?q=foo&site=example.org/dir``

  • OpenSearch support with HTML and Atom result lists

  • Instant indexing with WebSub (formerly PubSubHubbub)
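
The site search described above can also be driven from a script. The following is a minimal sketch, not part of phinde: it only assembles a search URL from the documented ``q`` and ``site`` GET parameters. The helper name ``build_search_url`` and the instance address ``http://phinde.example.org`` are assumptions made for illustration.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, site=None):
    """Return a search URL using the `q` and optional `site` GET parameters."""
    params = {"q": query}
    if site is not None:
        params["site"] = site
    # safe="/" leaves slashes in the site path unescaped, matching the
    # /?q=foo&site=example.org/dir form shown in the feature list.
    return base_url.rstrip("/") + "/?" + urlencode(params, safe="/")


# Hypothetical instance address; replace it with your own phinde URL.
print(build_search_url("http://phinde.example.org", "foo", site="example.org/dir"))
```

Spaces in the query are encoded as ``+``, so ``foo bar`` becomes ``q=foo+bar``.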

============
Dependencies
============

  • PHP 5.5+
  • Elasticsearch 2.0
  • MySQL or MariaDB for WebSub subscriptions
  • Gearman (Debian 9: gearman-job-server, not gearman-server)
  • PHP Gearman extension
  • Console_CommandLine
  • Net_URL2
  • Twig 1.x

=====
Setup
=====

#. Install and run Elasticsearch and Gearman
#. Install php-gearman
#. Get a local copy of the code::

     $ git clone https://git.cweiske.de/phinde.git phinde

#. Install dependencies via composer::

     $ composer install

#. Point your webserver's document root to phinde's ``www`` directory
#. Copy ``data/config.php.dist`` to ``data/config.php`` and adjust it.
   Make sure you add your domain to the crawl whitelist.
#. Create a MySQL database and import the schema from ``data/schema.sql``
#. Run ``bin/setup.php``, which sets up the Elasticsearch schema
#. Put your homepage into the queue::

     $ ./bin/process.php http://example.org/

#. Start at least one worker to process the crawl+index queue::

     $ ./bin/phinde-worker.php

#. Check phinde's status page in your browser.
   The number of open tasks should be > 0, the number of workers also.

Re-index when your site changes
===============================

When your site changes, the search engine needs to re-crawl and re-index the affected pages.

Tell phinde that something changed by running::

    $ ./bin/process.php http://example.org/foo.htm

phinde supports HTML pages and Atom feeds, so if your blog has a feed it's enough to let phinde reindex that one. It will find all linked pages automatically.
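
If several pages change at once, for example after a deployment, the call can be scripted. The sketch below is an illustration rather than part of phinde: it runs ``./bin/process.php`` once per URL from inside the checkout. The function name ``reindex``, the ``dry_run`` switch and the path ``/path/to/phinde`` are assumptions.

```python
import subprocess


def reindex(phinde_dir, urls, dry_run=False):
    """Queue each URL for crawling by calling ./bin/process.php once per URL.

    phinde_dir is the (assumed) path of the local phinde checkout.
    With dry_run=True the commands are only returned, not executed.
    """
    commands = [["./bin/process.php", url] for url in urls]
    if not dry_run:
        for command in commands:
            # cwd makes the relative script path resolve inside the checkout;
            # check=True raises CalledProcessError on a non-zero exit status.
            subprocess.run(command, cwd=phinde_dir, check=True)
    return commands


# Dry run: show what would be executed for a changed page and the feed.
changed = ["http://example.org/foo.htm", "http://example.org/feed.atom"]
for command in reindex("/path/to/phinde", changed, dry_run=True):
    print(" ".join(command))
```

Passing only the feed URL is usually enough, since phinde follows the links it finds there.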

Website integration
===================

Adding a simple search form to your website is easy. It needs two things:

  • A ``<form>`` tag with an action that points to the phinde instance
  • A search text field with the name ``q``

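Example, with ``http://phinde.example.org/`` standing in for the address of your phinde instance::

    <form method="get" action="http://phinde.example.org/">
      <input type="text" name="q" placeholder="Search text"/>
      <button type="submit">Search</button>
    </form>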

System service
==============

When using systemd, you can let it run multiple worker instances when the system boots up:

#. Copy files ``data/systemd/phinde*.service`` into ``/etc/systemd/system/``
#. Adjust user and group names, and the work directories
#. Enable three worker processes::

     $ systemctl daemon-reload
     $ systemctl enable [email protected]
     $ systemctl enable [email protected]
     $ systemctl enable [email protected]
     $ systemctl enable phinde
     $ systemctl start phinde

#. Now three workers are running.
   Restarting the phinde service also restarts the workers.

Cron job
========

Run bin/renew-subscriptions.php once a day with cron. It will renew the WebSub subscriptions.

=====
Howto
=====

Delete index data from one domain::

    $ curl -iv -XDELETE -H 'Content-Type: application/json' -d '{"query":{"term":{"domain":"example.org"}}}' http://127.0.0.1:9200/phinde/_query

That's delete-by-query 2.0, see https://www.elastic.co/guide/en/elasticsearch/plugins/2.0/delete-by-query-usage.html
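
The same request can be prepared from Python, which avoids quoting the JSON body by hand. This is a sketch under the same assumptions as the curl call above (Elasticsearch on ``127.0.0.1:9200``, an index named ``phinde``, the delete-by-query plugin installed). The function name ``delete_by_domain_request`` is made up for illustration, and the request is only built here, not sent.

```python
import json
import urllib.request


def delete_by_domain_request(domain, es_url="http://127.0.0.1:9200", index="phinde"):
    """Build the DELETE request that removes all documents of one domain."""
    body = json.dumps({"query": {"term": {"domain": domain}}}).encode("utf-8")
    return urllib.request.Request(
        f"{es_url}/{index}/_query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="DELETE",
    )


request = delete_by_domain_request("example.org")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To actually send it (this deletes data, so it is left commented out):
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```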

Subscribe to a website/feed
===========================

Phinde supports WebSub__ to subscribe to changes of a website. When phinde gets notified by the website's hub about changes, it immediately crawls and indexes the changed pages.

Subscribe to a website's feed::

    $ php bin/subscribe.php http://example.org/feed.atom

Phinde will determine the website's hub and send a registration request to it.
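
To make the hub discovery step more concrete: in WebSub, a feed can advertise its hub and its canonical URL through ``<link>`` elements with ``rel="hub"`` and ``rel="self"``. The sketch below shows that lookup for an Atom feed. It is an illustration of the protocol, not phinde's own implementation; the function name ``discover_websub_links`` and the sample feed are assumptions, and discovery through HTTP ``Link`` headers is not covered.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def discover_websub_links(atom_xml):
    """Return (hub_urls, self_url) from the <link> elements of an Atom feed."""
    root = ET.fromstring(atom_xml)
    hubs, self_url = [], None
    for link in root.findall(ATOM_NS + "link"):
        rel = link.get("rel")
        if rel == "hub":
            hubs.append(link.get("href"))
        elif rel == "self" and self_url is None:
            self_url = link.get("href")
    return hubs, self_url


# Made-up sample feed; hub.example.org is a placeholder hub address.
feed = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example</title>
  <link rel="hub" href="https://hub.example.org/"/>
  <link rel="self" href="http://example.org/feed.atom"/>
</feed>"""

print(discover_websub_links(feed))
```

A feed without a ``rel="hub"`` link yields an empty hub list, in which case there is nothing to subscribe to.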

The status page shows the number of working subscriptions and the number of open ones.

Unsubscribing also happens on the command line::

    $ php bin/unsubscribe.php http://example.org/feed.atom

__ https://www.w3.org/TR/websub/

============
About phinde
============

Source code
===========

phinde's source code is available from http://git.cweiske.de/phinde.git or the mirror on GitHub__.

__ https://github.com/cweiske/phinde

License
=======

phinde is licensed under the `AGPL v3 or later`__.

__ http://www.gnu.org/licenses/agpl.html

Author
======

phinde was written by `Christian Weiske`__.

__ http://cweiske.de/
