nicholaskajoh / devsearch

Licence: other
A web search engine built with Python which uses TF-IDF and PageRank to sort search results.

Programming languages: Python, HTML, CSS, Dockerfile, shell

devsearch

Stack

  • Flask (Python 3)
  • Scrapy
  • LXML
  • MongoEngine (MongoDB)
  • Bootstrap 4

Requirements

  • Docker
  • Docker Compose

Setup

  • Install Docker and Docker Compose.
  • Clone or download this repo.
  • Create a .env file from .env.example.
  • Run docker-compose up.

Crawling

  • Update the SPIDER_ALLOWED_DOMAINS variable in .env with domains you want the spider to crawl.
  • Add at least one URL to the crawl_list collection (in MongoDB) to give the spider a starting point.
  • Run docker-compose run web flask crawl to crawl new web pages.
  • To update pages that have already been crawled, add the --recrawl option: docker-compose run web flask crawl --recrawl True.
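
To illustrate what restricting the spider to a set of domains means in practice, here is a minimal sketch of a domain filter. It is not the project's code: the helper names are hypothetical, and the comma-separated format of SPIDER_ALLOWED_DOMAINS is an assumption (check .env.example for the actual format). The matching rule mirrors how Scrapy treats allowed domains: a listed domain also covers its subdomains.

```python
from urllib.parse import urlparse


def parse_allowed_domains(raw):
    """Split a comma-separated string into a clean, lowercase domain list.

    Assumption: SPIDER_ALLOWED_DOMAINS is comma-separated.
    """
    return [d.strip().lower() for d in raw.split(",") if d.strip()]


def is_allowed(url, allowed_domains):
    """Return True if the URL's host is a listed domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


# Hypothetical value, as it might appear in .env.
allowed = parse_allowed_domains("example.com, docs.python.org")

for url in ("https://blog.example.com/post", "https://notexample.com/"):
    print(url, is_allowed(url, allowed))
```

A URL whose host only ends with the same characters (notexample.com) is rejected, because the check requires either an exact match or a match on a dot boundary.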

Indexing

  • To index crawled pages, run docker-compose run web flask index.
  • To compute TF-IDF, run the following commands in order:
    • docker-compose run web flask idf
    • docker-compose run web flask tfidf
  • To compute PageRank, run docker-compose run web flask rank.
  • To compute page-word scores, run docker-compose run web flask score.
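
For readers unfamiliar with the quantities these commands compute, the sketch below shows textbook versions of IDF, TF-IDF and PageRank on a tiny in-memory corpus. It is an illustration under stated assumptions, not devsearch's implementation: the function names, the data layout, the damping factor of 0.85 and, in particular, the way TF-IDF and PageRank are combined into a page-word score (a simple product here) are all hypothetical.

```python
import math
from collections import Counter


def compute_idf(docs):
    """docs: {page: [tokens]}. Returns {word: log(N / document frequency)}."""
    n = len(docs)
    df = Counter(word for tokens in docs.values() for word in set(tokens))
    return {word: math.log(n / count) for word, count in df.items()}


def compute_tfidf(docs, idf):
    """Returns {page: {word: tf * idf}}, with tf = count / page length."""
    scores = {}
    for page, tokens in docs.items():
        counts = Counter(tokens)
        scores[page] = {w: (c / len(tokens)) * idf[w] for w, c in counts.items()}
    return scores


def pagerank(links, damping=0.85, iterations=50):
    """links: {page: [pages it links to]}. Returns {page: rank}; ranks sum to 1."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is shared among all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


def page_word_score(tfidf, rank, page, word):
    """Hypothetical combination: TF-IDF weighted by the page's PageRank."""
    return tfidf[page].get(word, 0.0) * rank[page]


# Toy corpus and link graph (made up for illustration).
docs = {
    "a": ["python", "search", "engine"],
    "b": ["python", "web", "crawler"],
    "c": ["search", "ranking"],
}
links = {"a": ["b"], "b": ["a", "c"], "c": ["a"]}

idf = compute_idf(docs)
tfidf = compute_tfidf(docs, idf)
rank = pagerank(links)

# Order the pages for the query word "search" by their combined score.
results = sorted(docs, key=lambda p: page_word_score(tfidf, rank, p, "search"), reverse=True)
print(results)
```

The order of the steps in the sketch matches the order of the commands above: IDF needs the indexed pages, TF-IDF needs IDF, and the final score needs both TF-IDF and PageRank.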

Deploy

  • Create a .env.secret file from .env.secret.example.
  • Run docker-compose -f docker-compose.prod.yml up --build -d.