rtrevinnoc / FUTURE

License: GPL-3.0
A private, free, open-source search engine built on a P2P network

FUTURE

FUTURE is a fully standalone, open-source search engine focused on privacy and decentralization: any user can self-host an instance and contribute to a shared index of web pages that is accessible through any instance. Because its index is currently small, it also works as a meta-search engine, mixing its own results with results from public Searx instances so that it can answer any query properly. A short presentation shows why FUTURE is different and better, and how it achieves that.
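The mixing of local and Searx results described above could be sketched as follows. This is only an illustration under stated assumptions: the merge_results function, the result dictionaries with a "url" key, and the round-robin interleaving are hypothetical and are not taken from FUTURE's source code.

```python
from itertools import zip_longest


def merge_results(local_results, remote_results, limit=10):
    """Interleave two ranked result lists, dropping duplicate URLs.

    Hypothetical helper: each result is assumed to be a dict with a "url" key.
    """
    merged = []
    seen_urls = set()
    # Walk both lists in parallel so each source contributes in turn.
    for pair in zip_longest(local_results, remote_results):
        for result in pair:
            if result is None:
                continue  # one list was shorter than the other
            if result["url"] in seen_urls:
                continue  # already contributed by the other source
            seen_urls.add(result["url"])
            merged.append(result)
    return merged[:limit]


if __name__ == "__main__":
    local = [{"url": "https://example.org/a"}, {"url": "https://example.org/b"}]
    remote = [{"url": "https://example.org/b"}, {"url": "https://example.org/c"}]
    for result in merge_results(local, remote):
        print(result["url"])
```

Interleaving keeps the local index visible even when the remote list is much longer; a real instance would more likely rank by relevance scores.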

Decentralization is a core feature of the search engine: it allows anyone to expand the index and improve the service, and it increases reliability through redundancy. The main node is currently located at https://wearebuildingthefuture.com.

If you are planning to host your own instance, we strongly encourage you to consider using Uberspace as they offer an excellent service and instances for a fair price.

HOW DOES IT WORK?

[Figure: graph of how FUTURE works]

DOCUMENTATION

Documentation is available online at https://wearebuildingthefuture.readthedocs.io/en/latest/ and in the docs directory.

QUICKSTART

After cloning the repository, add a config.py file. It lets you customize important parts of your instance without modifying the source code directly, which would make updates harder. We suggest starting with this configuration template, which is essentially the same as the one used for the main instance:

#!/usr/bin/env python3
# -*- coding: utf8 -*-
import secrets
from web3 import Web3
from tranco import Tranco

t = Tranco(cache=True, cache_dir='.tranco')

WTF_CSRF_ENABLED = True
SECRET_KEY = secrets.token_urlsafe(16)
HOST_NAME = "my_public_future_instance"         # THE NAMES 'private' and 'wearebuildingthefuture.com' are reserved for private and main nodes, respectively.
SEED_URLS = ["http://" + x for x in t.list().top(1000)]
PEER_PORT = 3000
HOME_URL = "wearebuildingthefuture.com"
LIMIT_DOMAINS = None
ALLOWED_DOMAINS = []
CONCURRENT_REQUESTS = 10
CONCURRENT_REQUESTS_PER_DOMAIN = 2.0
CONCURRENT_ITEMS = 100
REACTOR_THREADPOOL_MAXSIZE = 20
DOWNLOAD_MAXSIZE = 10000000
AUTOTHROTTLE = True
TARGET_CONCURRENCY = 2.0
MAX_DELAY = 30.0
START_DELAY = 1.0
DEPTH_PRIORITY = 1
LOG_LEVEL = 'INFO'
CONTACT = "[email protected]"
MAINTAINER = "Roberto Treviño Cervantes"
FIRST_NOTICE = "Written and Maintained By <a href='https://keybase.io/rtrevinnoc'>Roberto Treviño</a>"
SECOND_NOTICE = "Proudly Hosted on <a href='https://uberspace.de/en/'>Uberspace</a>"
DONATE = "<a href='https://www.buymeacoffee.com/searchatfuture'>DONATE</a>"
COLABORATE = "<a href='https://github.com/rtrevinnoc/FUTURE'>COLABORATE</a>"
CACHE_TIMEOUT = 15
CACHE_THRESHOLD = 100
COMPLEMENTARY_VECTOR_CACHE = -1
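try:
	# Optional Ethereum settings; they require a local node at 127.0.0.1:8545.
	WEB3API = Web3(Web3.HTTPProvider('http://127.0.0.1:8545'))
	ETHEREUM_ACCOUNT = WEB3API.eth.accounts[0]
	CONTRACT_CODE = 'future-token/build/contracts/FUTURE.json'
	CONTRACT_ADDRESS = "0x2ebDA3D6B2F24aE57164b0384daa9af2C0D17323"
except Exception:
	# If the node is unreachable, skip the Ethereum settings.
	pass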
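As a minimal sketch of two points in the template, the snippet below checks HOST_NAME against the reserved names mentioned in the template's comment and builds seed URLs the same way the SEED_URLS line does. The helpers validate_host_name and build_seed_urls are hypothetical and not part of FUTURE; the case-insensitive comparison and the example domains are assumptions. It also assumes a public instance, since 'private' is the name reserved for private nodes.

```python
# Names the template's comment marks as reserved.
RESERVED_HOST_NAMES = {"private", "wearebuildingthefuture.com"}


def validate_host_name(host_name):
    """Return the trimmed host name, or raise ValueError if it is unusable.

    Hypothetical helper for a public instance; comparing case-insensitively
    is an assumption, not documented FUTURE behaviour.
    """
    cleaned = host_name.strip()
    if not cleaned:
        raise ValueError("HOST_NAME must not be empty")
    if cleaned.lower() in RESERVED_HOST_NAMES:
        raise ValueError("HOST_NAME %r is reserved" % cleaned)
    return cleaned


def build_seed_urls(domains):
    """Prefix each bare domain with http://, as the SEED_URLS line does."""
    return ["http://" + domain for domain in domains]


if __name__ == "__main__":
    print(validate_host_name("my_public_future_instance"))
    # Placeholder domains; the template takes them from the Tranco list.
    print(build_seed_urls(["example.org", "example.com"]))
```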

NOTE: If you want to use a Docker container, simply run the following commands before any of the steps below (or use the pre-built image from Docker Hub):

docker build -t future .
docker run -i -t -p 3000:3000 future bash

After you have configured your FUTURE instance, and before you can start the server, you must add a minimum of ~25 URLs to your local index by executing:

chmod +x bootstrap.sh
./bootstrap.sh
./build_index.sh

At any time, you can check how many web pages are in your local index by executing:

python3 count_index.py

When you want to stop the crawler, you can interrupt it by executing:

./save_index.sh

You can restart it at any time with ./build_index.sh. With the index in place, start your development server with:

./future.py

However, if you plan to contribute to the shared index by making your instance public, we recommend using uWSGI. Create a uwsgi.ini file (for example with touch uwsgi.ini) and start from this configuration template, which is the one used on the main instance:

[uwsgi]
module = future:app
pidfile = future.pid
http-socket = :3000
chmod-socket = 660
strict = true
master = true
enable-threads = true
vacuum = true                        ; Delete sockets during shutdown
single-interpreter = true
die-on-term = true                   ; Shutdown when receiving SIGTERM (default is respawn)
need-app = true

disable-logging = true               ; Disable built-in logging
log-4xx = true                       ; but log 4xx's anyway
log-5xx = true                       ; and 5xx's

cheaper-algo = busyness
processes = 6                        ; Maximum number of workers allowed
cheaper = 1                          ; Minimum number of workers allowed
cheaper-initial = 2                  ; Workers created at startup
cheaper-overload = 1                 ; Length of a cycle in seconds
cheaper-step = 1                     ; How many workers to spawn at a time

cheaper-busyness-multiplier = 30     ; How many cycles to wait before killing workers
cheaper-busyness-min = 20            ; Below this threshold, kill workers (if stable for multiplier cycles)
cheaper-busyness-max = 70            ; Above this threshold, spawn new workers
cheaper-busyness-backlog-alert = 4   ; Spawn emergency workers if more than this many requests are waiting in the queue
cheaper-busyness-backlog-step = 2    ; How many emergency workers to create if there are too many requests in the queue

Finally, start your public node to contribute to the shared network with the following command:

uwsgi uwsgi.ini
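Before starting the node, it can help to check that the worker limits in uwsgi.ini are consistent. The sketch below uses Python's standard configparser module to read the three worker settings from the template. The check_worker_limits function is hypothetical, and the ordering it checks (cheaper <= cheaper-initial <= processes) is an assumption read off the template's comments (minimum, startup, and maximum workers), not a rule quoted from the uWSGI documentation.

```python
import configparser

# Worker settings copied from the uwsgi.ini template above.
SAMPLE_INI = """
[uwsgi]
processes = 6                        ; Maximum number of workers allowed
cheaper = 1                          ; Minimum number of workers allowed
cheaper-initial = 2                  ; Workers created at startup
"""

REQUIRED = ("cheaper", "cheaper-initial", "processes")


def check_worker_limits(ini_text):
    """Return a list of problems found in the [uwsgi] worker settings."""
    parser = configparser.ConfigParser(
        inline_comment_prefixes=(";",),  # strip the "; ..." comments
        interpolation=None,              # treat values as plain text
    )
    parser.read_string(ini_text)
    section = parser["uwsgi"]

    missing = [key for key in REQUIRED if key not in section]
    if missing:
        return ["missing setting: " + key for key in missing]

    minimum = section.getint("cheaper")
    initial = section.getint("cheaper-initial")
    maximum = section.getint("processes")

    problems = []
    if minimum > initial:
        problems.append("cheaper is larger than cheaper-initial")
    if initial > maximum:
        problems.append("cheaper-initial is larger than processes")
    return problems


if __name__ == "__main__":
    for line in check_worker_limits(SAMPLE_INI) or ["worker limits look consistent"]:
        print(line)
```

To check a real file, pass its contents to the function, for example check_worker_limits(open("uwsgi.ini").read()).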

DEPENDENCIES

FUTURE relies on the following projects:

Name                    License
----------------------  ------------
Flask                   BSD 3-Clause
Werkzeug                BSD 3-Clause
SymSpell                MIT
Polyglot                GPL v3
Beautifulsoup           BSD 2-Clause
BSON Python bindings    Apache 2.0
NumPy                   BSD 3-Clause
GeoPy                   MIT
SciKit Learn            BSD 3-Clause
Pandas                  BSD 3-Clause
Gensim                  LGPL 2.1
NLTK                    Apache 2.0
Scrapy                  BSD License
H5PY                    BSD 3-Clause
LMDB                    OpenLDAP
LMDB Python bindings    OpenLDAP
tldextract              BSD 3-Clause
WTForms                 BSD 3-Clause
Flask_wtf               BSD 3-Clause
HNSWLib                 Apache 2.0
JQuery                  MIT
JQuery UI               MIT
Particles JS            MIT
Ionicons                MIT
Source Sans Pro         OFL 1.1
GloVe                   Apache 2.0
SPARQLWrapper           W3C License
TextScrambler           BSD-like

FUTURE on w3m

[Recording: asciicast of FUTURE running in w3m]
