palmchou / Disec
Distributed Image Search Engine Crawler
Stars: ✭ 11
Programming Languages
python
Projects that are alternatives of or similar to Disec
Jlitespider
A lite distributed Java spider framework :-)
Stars: ✭ 151 (+1272.73%)
Mutual labels: crawler, distributed
Scaleable Crawler With Docker Cluster
A scalable and efficient crawler with a Docker cluster; crawls a million pages in 2 hours on a single machine
Stars: ✭ 96 (+772.73%)
Mutual labels: crawler, distributed
Scrapy Redis
Redis-based components for Scrapy.
Stars: ✭ 4,998 (+45336.36%)
Mutual labels: crawler, distributed
Dotnetspider
DotnetSpider, a .NET Standard web crawling library. It is a lightweight, efficient, and fast high-level web crawling and scraping framework
Stars: ✭ 3,233 (+29290.91%)
Mutual labels: crawler, distributed
Spoon
🥄 A package for building specific Proxy Pool for different Sites.
Stars: ✭ 173 (+1472.73%)
Mutual labels: crawler, distributed
Goscraper
Golang pkg to quickly return a preview of a webpage (title/description/images)
Stars: ✭ 72 (+554.55%)
Mutual labels: crawler, image
Haipproxy
💖 Highly available distributed IP proxy pool, powered by Scrapy and Redis
Stars: ✭ 4,993 (+45290.91%)
Mutual labels: crawler, distributed
Xxl Crawler
A distributed web crawler framework (XXL-CRAWLER).
Stars: ✭ 561 (+5000%)
Mutual labels: crawler, distributed
Cometa
Super fast, on-demand and on-the-fly, image processing.
Stars: ✭ 8 (-27.27%)
Mutual labels: image
V Photoswipe
Vue plugin for image preview based on PhotoSwipe
Stars: ✭ 25 (+127.27%)
Mutual labels: image
Remit
RabbitMQ-backed microservices supporting RPC, pubsub, automatic service discovery and scaling with no code changes.
Stars: ✭ 24 (+118.18%)
Mutual labels: distributed
Bootstrap Image Hover
Image hover effects that work with or without bootstrap
Stars: ✭ 858 (+7700%)
Mutual labels: image
DiSec
Distributed Image Search Engine Crawler
Dependency
Beautiful Soup 4. Install it using pip: pip install bs4
Features
- Crawls image results for given keywords
- Supports Baidu, Google, and Bing
- Distributed server-worker design
- Keywords can be categorised
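The server-worker design can be pictured with a small in-process sketch. It is not the project's code: a thread-safe queue stands in for the manager server (which, in the project, listens on a network port), threads stand in for the worker processes, and the names run_workers and crawl are hypothetical. It also assumes that each keyword is handed to exactly one worker, which the README does not state.

```python
import queue
import threading


def run_workers(tasks, worker_count=3, crawl=None):
    """Distribute (category, keyword) tasks across worker threads.

    Hypothetical stand-in for the manager server: a thread-safe queue
    plays the manager, and threads play the worker processes.
    """
    if crawl is None:
        # Placeholder: a real worker would query a search engine here.
        crawl = lambda category, keyword: f"{category}/{keyword}"

    pending = queue.Queue()
    for task in tasks:
        pending.put(task)

    results = []
    results_lock = threading.Lock()

    def worker(name):
        while True:
            try:
                category, keyword = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to crawl
            outcome = crawl(category, keyword)
            with results_lock:
                results.append((name, outcome))

    threads = [
        threading.Thread(target=worker, args=(f"worker-{i}",))
        for i in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Because every worker pulls from the same queue, adding workers spreads the keywords across them without any keyword being crawled twice. Which worker handles which keyword is not deterministic.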
Get Started
- Configure the settings by creating local_settings.json; an example of it is provided.
- Create keyword_list.json and fill in the keywords.
- Run keywords_creater.py, which reads the user-defined keyword_list.json and generates keywords.json, which is used by the manager server.
- Run manager_server.py; the manager server starts and listens on the port set in local_settings.json.
- Run SEARCH_ENGINE_worker.py to start crawling.
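The keyword-generation step can be sketched as follows. The README does not document either file format, so both are assumptions here: keyword_list.json is taken to map each category to a list of keywords, and keywords.json is taken to be a flat list of category/keyword records. The function names are hypothetical and this is not the project's keywords_creater.py.

```python
import json


def flatten_keywords(keyword_list):
    """Turn {category: [keyword, ...]} into a flat list of task records.

    Both the input and output shapes are assumed formats.
    """
    tasks = []
    for category, keywords in keyword_list.items():
        for keyword in keywords:
            tasks.append({"category": category, "keyword": keyword})
    return tasks


def create_keywords_file(src="keyword_list.json", dst="keywords.json"):
    """Read the user-defined list, write the flattened file, return the task count."""
    with open(src, encoding="utf-8") as f:
        keyword_list = json.load(f)
    tasks = flatten_keywords(keyword_list)
    with open(dst, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII keywords readable in the file.
        json.dump(tasks, f, ensure_ascii=False, indent=2)
    return len(tasks)
```

Keeping the category on every record means a worker that receives a single task still knows how to file its results.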
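Documentation
Dependency
Beautiful Soup 4. Install it using pip: pip install bs4
Features
- Crawls image search results for a given list of keywords
- Supports Baidu, Google, and Bing
- Distributed design: multiple worker processes can crawl at the same time
- Supports keyword categories
How to Use
- Following the provided example, configure local_settings.json.
- Following the provided example, create keyword_list.json and fill in the list of keywords to crawl.
- Use keywords_creater.py to read the user-defined keyword_list.json and generate keywords.json.
- Run manager_server.py; the manager server listens on the port set in local_settings.json.
- Run SEARCH_ENGINE_worker.py to start crawling.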