
palmchou / Disec

Distributed Image Search Engine Crawler

Programming Languages

python

Projects that are alternatives of or similar to Disec

Jlitespider
A lite distributed Java spider framework :-)
Stars: ✭ 151 (+1272.73%)
Mutual labels:  crawler, distributed
Scaleable Crawler With Docker Cluster
A scalable and efficient crawler with a Docker cluster; crawls a million pages in 2 hours with a single machine
Stars: ✭ 96 (+772.73%)
Mutual labels:  crawler, distributed
Lizard
💐 Full Amazon Automatic Download
Stars: ✭ 41 (+272.73%)
Mutual labels:  crawler, distributed
Scrapy Redis
Redis-based components for Scrapy.
Stars: ✭ 4,998 (+45336.36%)
Mutual labels:  crawler, distributed
Dotnetspider
DotnetSpider, a .NET Standard web crawling library. It is a lightweight, efficient, and fast high-level web crawling and scraping framework.
Stars: ✭ 3,233 (+29290.91%)
Mutual labels:  crawler, distributed
Spoon
🥄 A package for building site-specific proxy pools.
Stars: ✭ 173 (+1472.73%)
Mutual labels:  crawler, distributed
Goscraper
Golang pkg to quickly return a preview of a webpage (title/description/images)
Stars: ✭ 72 (+554.55%)
Mutual labels:  crawler, image
Laosj
Lightweight image crawler written in Go
Stars: ✭ 199 (+1709.09%)
Mutual labels:  crawler, image
Haipproxy
💖 Highly available distributed IP proxy pool, powered by Scrapy and Redis
Stars: ✭ 4,993 (+45290.91%)
Mutual labels:  crawler, distributed
Xxl Crawler
A distributed web crawler framework (XXL-CRAWLER).
Stars: ✭ 561 (+5000%)
Mutual labels:  crawler, distributed
Appcrawler
Web crawler for Android app markets
Stars: ✭ 25 (+127.27%)
Mutual labels:  crawler
Subnode.org
SubNode: Social Media App
Stars: ✭ 25 (+127.27%)
Mutual labels:  distributed
Beian Domain
Crawler that fetches the latest list of domain names available for ICP filing
Stars: ✭ 9 (-18.18%)
Mutual labels:  crawler
Cachep2p
"More users = More capacity"
Stars: ✭ 855 (+7672.73%)
Mutual labels:  distributed
Scrapit
Scraping scripts for various websites.
Stars: ✭ 25 (+127.27%)
Mutual labels:  crawler
Cometa
Super fast, on-demand and on-the-fly, image processing.
Stars: ✭ 8 (-27.27%)
Mutual labels:  image
V Photoswipe
Vue plugin for image preview based on PhotoSwipe
Stars: ✭ 25 (+127.27%)
Mutual labels:  image
Remit
RabbitMQ-backed microservices supporting RPC, pubsub, automatic service discovery and scaling with no code changes.
Stars: ✭ 24 (+118.18%)
Mutual labels:  distributed
Appcrawler
Automatic app traversal tool based on Appium
Stars: ✭ 925 (+8309.09%)
Mutual labels:  crawler
Bootstrap Image Hover
Image hover effects that work with or without bootstrap
Stars: ✭ 858 (+7700%)
Mutual labels:  image

DiSec


Dependency

Beautiful Soup 4. Install it with pip: pip install bs4
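Beautiful Soup is the HTML parser the crawler depends on. As a rough illustration of the kind of parsing involved, the sketch below pulls image URLs out of a results page. The sample HTML, the data-src fallback, and the extract_image_urls name are assumptions for illustration only; real Baidu, Google, and Bing result pages use different and frequently changing markup, and DiSec's own parsing code may look quite different.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for a downloaded results page;
# real search-engine markup is different and changes over time.
SAMPLE_HTML = """
<div class="results">
  <img src="https://example.com/a.jpg" alt="cat 1">
  <img data-src="https://example.com/b.jpg" alt="cat 2">
  <img alt="no source">
</div>
"""


def extract_image_urls(html):
    """Return image URLs from <img> tags, checking src first, then data-src."""
    soup = BeautifulSoup(html, "html.parser")  # stdlib-backed parser, no extra install
    urls = []
    for img in soup.find_all("img"):
        url = img.get("src") or img.get("data-src")
        if url:  # skip tags that carry no usable source
            urls.append(url)
    return urls


if __name__ == "__main__":
    print(extract_image_urls(SAMPLE_HTML))
```

Checking data-src as a fallback reflects a common lazy-loading pattern; which attributes matter for each engine would have to be confirmed against the actual pages.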

Features

  • Crawl image search results for the given keywords
  • Supports Baidu, Google, and Bing
  • Distributed server-worker design: multiple worker processes can crawl at the same time
  • Keywords can be grouped into categories
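The server-worker split can be pictured as a manager that hands out one keyword at a time and workers that keep asking for the next one until none are left. The sketch below is a minimal, hypothetical version built on HTTP from the Python standard library. The README does not say which protocol DiSec actually uses, and the /task endpoint, the SETTINGS and KEYWORDS values, and every function name here are made up for illustration.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical stand-ins for local_settings.json and keywords.json contents.
SETTINGS = {"host": "127.0.0.1", "port": 0}  # port 0 lets the OS pick a free port
KEYWORDS = [{"category": "animals", "keyword": "cat"},
            {"category": "animals", "keyword": "dog"}]


class TaskPool:
    """Hands out each keyword once; locked because the server is threaded."""

    def __init__(self, items):
        self._items = list(items)
        self._lock = threading.Lock()

    def next(self):
        with self._lock:
            return self._items.pop(0) if self._items else None


def make_handler(pool):
    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/task":
                self.send_error(404)
                return
            body = json.dumps(pool.next()).encode()  # "null" once the pool is empty
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep the example quiet

    return Handler


def start_manager(settings, keywords):
    """Start the hypothetical manager in a background thread."""
    server = ThreadingHTTPServer((settings["host"], settings["port"]),
                                 make_handler(TaskPool(keywords)))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def worker_loop(base_url):
    """Fetch tasks until the manager reports that none are left."""
    done = []
    while True:
        with urllib.request.urlopen(base_url + "/task") as resp:
            task = json.loads(resp.read())
        if task is None:
            return done
        # A real worker would crawl image results for task["keyword"] here.
        done.append(task["keyword"])


if __name__ == "__main__":
    server = start_manager(SETTINGS, KEYWORDS)
    host, port = server.server_address[:2]
    print(worker_loop(f"http://{host}:{port}"))
    server.shutdown()
    server.server_close()
```

Because the pool pops each task under a lock, several workers pointed at the same manager never receive the same keyword twice, which is the property a distributed crawl needs.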

Get Started

  1. Configure the crawler by creating local_settings.json; an example file is provided.
  2. Create keyword_list.json and fill in the keywords to crawl.
  3. Run keywords_creater.py, which reads the user-defined keyword_list.json and generates keywords.json for the manager server to use.
  4. Run manager_server.py. The manager server starts and listens on the port set in local_settings.json.
  5. Run SEARCH_ENGINE_worker.py, where SEARCH_ENGINE is one of the supported search engines, to start crawling. Multiple workers can run at the same time.
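Step 3 turns the categorised keyword list into the file the manager reads. A minimal sketch of that conversion follows, assuming keyword_list.json maps each category name to a list of keywords. The real schema and the output format of keywords_creater.py may differ, and create_keywords is a hypothetical name.

```python
import json
import os
import tempfile


def create_keywords(keyword_list_path, keywords_path):
    """Hypothetical stand-in for keywords_creater.py.

    Assumes keyword_list.json looks like {"animals": ["cat", "dog"]} and
    writes a flat list of {"category": ..., "keyword": ...} records.
    """
    with open(keyword_list_path, encoding="utf-8") as f:
        keyword_list = json.load(f)

    keywords = []
    for category, words in keyword_list.items():
        for word in words:
            word = word.strip()
            if word:  # skip blank entries
                keywords.append({"category": category, "keyword": word})

    with open(keywords_path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII keywords (e.g. Chinese) readable
        json.dump(keywords, f, ensure_ascii=False, indent=2)
    return keywords


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "keyword_list.json")
        dst = os.path.join(tmp, "keywords.json")
        with open(src, "w", encoding="utf-8") as f:
            json.dump({"animals": ["cat", " dog ", ""], "scenery": ["山"]},
                      f, ensure_ascii=False)
        print(create_keywords(src, dst))
```

Keeping the category on every record lets a worker file downloaded images by category without re-reading the original list.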
