
cblgh / Lieu

Licence: agpl-3.0
community search engine

Programming Languages

go

Projects that are alternatives of or similar to Lieu

Awesome Solr
A curated list of Awesome Apache Solr links and resources.
Stars: ✭ 69 (-9.21%)
Mutual labels:  search, search-engine
Minisearch
Tiny and powerful JavaScript full-text search engine for browser and Node
Stars: ✭ 737 (+869.74%)
Mutual labels:  search, search-engine
Elasticsuite
Smile ElasticSuite - Magento 2 merchandising and search engine built on ElasticSearch
Stars: ✭ 647 (+751.32%)
Mutual labels:  search, search-engine
Typesense
Fast, typo tolerant, fuzzy search engine for building delightful search experiences ⚡ 🔍 ✨ An Open Source alternative to Algolia and an Easier-to-Use alternative to ElasticSearch.
Stars: ✭ 8,644 (+11273.68%)
Mutual labels:  search, search-engine
Flexsearch
Next-Generation full text search library for Browser and Node.js
Stars: ✭ 8,108 (+10568.42%)
Mutual labels:  search, search-engine
Manticoresearch
Database for search
Stars: ✭ 610 (+702.63%)
Mutual labels:  search, search-engine
Riot
Go Open Source, Distributed, Simple and efficient Search Engine; Warning: This is V1 and beta version, because of big memory consume, and the V2 will be rewrite all code.
Stars: ✭ 6,025 (+7827.63%)
Mutual labels:  search, search-engine
Resin
Hardware-accelerated vector-based search engine. Available as a HTTP service or as an embedded library.
Stars: ✭ 529 (+596.05%)
Mutual labels:  search, search-engine
Better Search
Better Search WordPress plugin
Stars: ✭ 9 (-88.16%)
Mutual labels:  search, search-engine
Blast
Blast is a full text search and indexing server, written in Go, built on top of Bleve.
Stars: ✭ 934 (+1128.95%)
Mutual labels:  search, search-engine
Filemasta
A search application to explore, discover and share online files
Stars: ✭ 571 (+651.32%)
Mutual labels:  search, search-engine
Github Awesome Autocomplete
Add instant search capabilities to GitHub's search bar
Stars: ✭ 1,015 (+1235.53%)
Mutual labels:  search, search-engine
Algoliasearch Client Php
⚡️ A fully-featured and blazing-fast PHP API client to interact with Algolia.
Stars: ✭ 565 (+643.42%)
Mutual labels:  search, search-engine
Searx
Privacy-respecting metasearch engine
Stars: ✭ 10,074 (+13155.26%)
Mutual labels:  search, search-engine
Fess
Fess is very powerful and easily deployable Enterprise Search Server.
Stars: ✭ 561 (+638.16%)
Mutual labels:  search, search-engine
Search cop
Search engine like fulltext query support for ActiveRecord
Stars: ✭ 660 (+768.42%)
Mutual labels:  search, search-engine
Pisa
PISA: Performant Indexes and Search for Academia
Stars: ✭ 489 (+543.42%)
Mutual labels:  search, search-engine
Instantsearch Ios
⚡️ A library of widgets and helpers to build instant-search applications on iOS.
Stars: ✭ 498 (+555.26%)
Mutual labels:  search, search-engine
Search Ui
🔍 A set of UI components to build a fully customized search!
Stars: ✭ 24 (-68.42%)
Mutual labels:  search, search-engine
Opensse
Open Sketch Search Engine- 3D object retrieval based on sketch image as input
Stars: ✭ 883 (+1061.84%)
Mutual labels:  search, search-engine

Lieu

an alternative search engine

Created in response to the environs of apathy concerning the use of hypertext search and discovery. In Lieu, the internet is not what is made searchable, but instead one's own neighbourhood. Put differently, Lieu is a neighbourhood search engine, a way for personal webrings to increase serendipitous connexions.

[Image: lieu screenshot]

Goals

  • Enable serendipitous discovery
  • Support personal communities
  • Be reusable, easily

Usage

$ lieu help
Lieu: neighbourhood search engine

Commands
- precrawl  (scrapes config's general.url for a list of links: <li> elements containing an anchor <a> tag)
- crawl     (start crawler, crawls all urls in config's crawler.webring file)
- ingest    (ingest crawled data, generates database)
- search    (interactive cli for searching the database)
- host      (hosts search engine over http)

Example:
    lieu precrawl > data/webring.txt
    lieu ingest
    lieu host

Lieu's crawl & precrawl commands output to standard output, for easy inspection of the data. You typically want to redirect their output to the files Lieu reads from, as defined in the config file. See below for a typical workflow.
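To make the precrawl step concrete, here is a minimal Go sketch of the idea stated in the help text: collect the link from each <li> element that contains an <a> tag, and print one link per line to standard output. This is not Lieu's actual implementation. The function name extractListLinks and the regular expressions are assumptions made for illustration, and a regex-based approach will not handle nested lists or unusual markup the way a real HTML parser would.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// liPattern captures the inner HTML of each <li>...</li> element.
// The optional attribute group keeps it from matching tags such as <link>.
var liPattern = regexp.MustCompile(`(?is)<li(?:\s[^>]*)?>(.*?)</li>`)

// hrefPattern captures the href value of the first anchor tag in a fragment.
var hrefPattern = regexp.MustCompile(`(?is)<a\s[^>]*?href\s*=\s*["']([^"']+)["']`)

// extractListLinks (hypothetical helper) returns the href of the first anchor
// found inside each <li> element, in document order. List items without an
// anchor are skipped.
func extractListLinks(html string) []string {
	var links []string
	for _, li := range liPattern.FindAllStringSubmatch(html, -1) {
		if m := hrefPattern.FindStringSubmatch(li[1]); m != nil {
			links = append(links, strings.TrimSpace(m[1]))
		}
	}
	return links
}

func main() {
	// A made-up page standing in for the one at the config's general.url.
	page := `<ul>
  <li><a href="https://example.org">Example</a></li>
  <li>no link here</li>
  <li class="x"><a class="y" href='https://example.com/blog'>Blog</a></li>
</ul>`

	// Printing to standard output mirrors how precrawl output can be
	// inspected first and redirected to a file afterwards.
	for _, link := range extractListLinks(page) {
		fmt.Println(link)
	}
}
```

Because the sketch only prints, its output can be checked by eye and then redirected to a file, which is the same pattern the crawl and precrawl commands follow.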

Workflow

  • Edit the config
  • Add domains to crawl in config.crawler.webring
    • If you have a webpage with links you want to crawl:
      • Set the config's url field to that page
      • Populate the list of domains to crawl with precrawl: lieu precrawl > data/webring.txt
  • Crawl: lieu crawl > data/crawled.txt (the file set as source in the config)
  • Create database: lieu ingest
  • Host engine: lieu host

After ingesting the data with lieu ingest, you can also use lieu to search the corpus in the terminal with lieu search.

Config

The config file is written in TOML.

[general]
name = "Merveilles Webring"
# used by the precrawl command and linked to in /about route
url = "https://webring.xxiivv.com"
port = 10001

[data]
# the source file should contain the crawl command's output 
source = "data/crawled.txt"
# location & name of the sqlite database
database = "data/searchengine.db"
# contains words and phrases disqualifying scraped paragraphs from being presented in search results
heuristics = "data/heuristics.txt"
# aka stopwords, in the search engine biz: https://en.wikipedia.org/wiki/Stop_word
wordlist = "data/wordlist.txt"

[crawler]
# manually curated list of domains, or the output of the precrawl command
webring = "data/webring.txt"
# domains that are banned from being crawled but might originally be part of the webring
bannedDomains = "data/banned-domains.txt"
# file suffixes that are banned from being crawled
bannedSuffixes = "data/banned-suffixes.txt"
# phrases and words which won't be scraped (e.g. if contained in a link)
boringWords = "data/boring-words.txt"
# domains that won't be output as outgoing links
boringDomains = "data/boring-domains.txt"
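As a rough illustration of what the bannedDomains and bannedSuffixes files are for, the Go sketch below decides whether a URL should be crawled. It is not taken from Lieu's source. The function name shouldCrawl, the subdomain matching rule, and the case-insensitive comparison are all assumptions, and the lists are passed in as slices rather than read from the files named in the config.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// shouldCrawl (hypothetical helper) reports whether rawURL passes two filters:
// its host is neither a banned domain nor a subdomain of one, and its path
// does not end with a banned suffix. Unparseable or host-less URLs are rejected.
func shouldCrawl(rawURL string, bannedDomains, bannedSuffixes []string) bool {
	u, err := url.Parse(rawURL)
	if err != nil || u.Hostname() == "" {
		return false
	}

	host := strings.ToLower(u.Hostname())
	for _, d := range bannedDomains {
		d = strings.ToLower(strings.TrimSpace(d))
		if d == "" {
			continue // ignore blank entries
		}
		if host == d || strings.HasSuffix(host, "."+d) {
			return false
		}
	}

	path := strings.ToLower(u.Path)
	for _, s := range bannedSuffixes {
		s = strings.ToLower(strings.TrimSpace(s))
		if s != "" && strings.HasSuffix(path, s) {
			return false
		}
	}
	return true
}

func main() {
	// Example entries; the real lists would come from the files in [crawler].
	bannedDomains := []string{"ads.example"}
	bannedSuffixes := []string{".pdf", ".png"}

	for _, link := range []string{
		"https://example.org/notes.html",
		"https://example.org/paper.pdf",
		"https://tracker.ads.example/page",
	} {
		fmt.Printf("%s crawl=%v\n", link, shouldCrawl(link, bannedDomains, bannedSuffixes))
	}
}
```

Checking the suffix against the URL path rather than the full URL means a query string such as ?file=a.pdf does not trigger the suffix filter. Whether that is the desired behaviour is a design choice.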

For your own use, the following config fields should be customized:

  • name
  • url
  • port
  • source
  • webring
  • bannedDomains

The following config-defined files can stay as-is unless you have specific requirements:

  • database
  • heuristics
  • wordlist
  • bannedSuffixes

For a full rundown of the files and their various jobs, see the files description.
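As one example of those jobs, the wordlist file holds stopwords. The Go sketch below shows the general technique of dropping stopwords from a query before it is matched against an index. It is an assumption-laden illustration rather than Lieu's code: the function name stripStopwords, the lowercasing, and the whitespace tokenisation are all hypothetical choices.

```go
package main

import (
	"fmt"
	"strings"
)

// stripStopwords (hypothetical helper) lowercases the query, splits it on
// whitespace, and returns the words that are not in the stopword set.
func stripStopwords(query string, stopwords map[string]bool) []string {
	var kept []string
	for _, w := range strings.Fields(strings.ToLower(query)) {
		if !stopwords[w] {
			kept = append(kept, w)
		}
	}
	return kept
}

func main() {
	// A tiny example set; a real list would be loaded from the wordlist file.
	stopwords := map[string]bool{"the": true, "a": true, "of": true}

	fmt.Println(stripStopwords("The history of a webring", stopwords))
	// prints: [history webring]
}
```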

License

The source code is licensed AGPL-3.0-or-later. Inter is available under the SIL Open Font License, Version 1.1. Noto Serif is licensed under the Apache License, Version 2.0.
