A series of distributed components for Scrapy, including RabbitMQ-based, Kafka-based, and RedisBloom-based components.

✭ 38

python redis crawler kafka spider rabbitmq scraping crawling scrapy distributed-spider redisbloom rabbitmq-pipeline
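The RedisBloom-based components use a Bloom filter to deduplicate requests. Below is a minimal in-memory Python sketch of that data structure. It assumes double hashing over SHA-256 and the standard sizing formulas. The class name `BloomFilter` is hypothetical, and this is neither the project's code nor the RedisBloom API.

```python
import hashlib
import math


class BloomFilter:
    """In-memory Bloom filter: may report false positives, never false negatives."""

    def __init__(self, capacity: int, error_rate: float = 0.01) -> None:
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hash functions.
        self.size = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
        self.num_hashes = max(1, round(self.size / capacity * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> bool:
        """Insert item; return True if it was (probably) not seen before."""
        new = False
        for pos in self._positions(item):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                new = True
                self.bits[byte] |= 1 << bit
        return new

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))
```

A filter sized for 1,000 items at a 1% error rate uses about 9,600 bits. RedisBloom keeps a similar structure on the Redis server, so several Scrapy workers can share one filter.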

douban-movie

Get movie info from Douban (豆瓣) and display it in your terminal.

✭ 17

python terminal spider douban douban-movie

aliexscrape

Get Aliexpress product details in JSON

✭ 80

javascript json crawler scraper spider aliexpress hacktoberfest dropship aliexpress-crawler aliexpress-spider dropshipping aliexpress-api aliexpress-scraper hacktoberfest2019 hacktoberfest19

OpenScraper

An open source webapp for scraping: towards a public service for webscraping

✭ 80

python HTML CSS javascript html scraper spider mongodb bulma tornado xpath scrapy entrepreneur-interet-general

OpenYspider

Image and video crawler for the tens-of-millions scale [open-source edition] Image Spider

✭ 122

HTML java image spider spring-boot selenium selenium-webdriver yande rosi tujidao tangyun yalayi mzsock

photo-spider-scrapy

10 photo website spiders: Scrapy spider code for 10 foreign stock-photo sites

✭ 17

python crawler spider unsplash wikimedia scrapy photo pexels absfreepic magdeleine photock stocksnap stockvault visualhunt wallhalla

ChineseStarsRelationship

Chinese celebrity data crawling. You can even obtain the relationships between all the people on the internet; what you do next is up to you! Based on this data, you can do many more interesting things, such as social network analysis, relationship network visualization, algorithm research, and more.

✭ 26

java spider knowledge-graph

spider-school

Automatic quiz-answering program 🎉

✭ 37

python HTML spider

node-html-crawler

Easy-to-use Node.js HTML crawler (spider) for website pages

✭ 30

javascript HTML crawler node spider

Scrapy IPProxyPool

Free IP proxy pool. A plugin for the Scrapy crawler framework.

✭ 100

python crawler spider schedule crawl scrapy proxypool ipproxy

elves

🎊 Design and implementation of a lightweight crawler framework.

✭ 322

java spider scrapy douban-movie elves 163news

spider

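🌟 Powered by Python 3 (simple spider learning). Scrapes Baidu Wenku (百度文库), NetEase Cloud Music songs, Douban movies, GitHub, JD.com, QQ Zone, weather, a VIP video-parsing helper, TED transcript text, a Wi-Fi cracking script, setting Bing images as the desktop wallpaper, and more.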

✭ 124

python javascript HTML julia crawler spider wifi

wget-lua

Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.

✭ 52

c python perl shell Module Management System M4 crawler scraper downloader spider ftp scraping crawling archiving wget crawl zstd crawlers warc webarchiving archiveteam wget-lua

163Music

163music spider built with Scrapy.

✭ 60

python spider scrapy 163music

😚 Q & A website based on Spring Boot.

✭ 46

CSS java HTML javascript redis spider spring-boot jsoup mybatis

nodejs-meizitu

Full-site scraper for Meizitu (妹子图), collecting 10 GB of photo sets

✭ 80

javascript nodejs pictures spider

spider

A web spider framework

✭ 25

javascript nodejs crawler spider puppeteer nodejs-spider

Subbranch-China

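Bank and branch names. A crawler for the branch names of every bank in every region of China. The data comes from the WeChat Merchant Platform and is already organized into SQL files that can be imported directly.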

✭ 31

python spider bank subbranch banksdata

go-movies

Golang movie spider/crawler

✭ 168

go HTML docker redis crawler movies spider fasthttp colly gocolly

Sina Spider

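Sina crawler based on Python + Selenium. After a simulated login it saves the cookies so that the logged-in state persists. Enter a keyword to crawl the popular Weibo posts related to it.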

✭ 25

python spider simulation sina

Spider

The Spider project will be continuously updated with the crawling methods I have learned and used!

✭ 16

python spider selenium scrape

weibo topic

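Weibo topic keywords, personal Weibo post collection, and one-click deletion of Weibo posts. Selenium obtains the cookies and requests handles the rest.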

✭ 28

python spider selenium requests weibo

crawler

A PHP crawler

✭ 13

PHP HTML crawler spider

devsearch

A web search engine built with Python which uses TF-IDF and PageRank to sort search results.

✭ 52

python HTML CSS Dockerfile shell search search-engine flask crawler spider mongodb pagerank scrapy tf-idf
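devsearch's description names two ranking signals, TF-IDF and PageRank. The Python sketch below shows roughly how the two can work together. It assumes whitespace tokenization, a plain power-iteration PageRank with damping 0.85, and ranking by the product of the two scores. The project's actual way of combining them is not stated here, and the helper names `pagerank`, `tfidf` and `search` are hypothetical.

```python
import math
from collections import Counter


def pagerank(links: dict[str, list[str]], damping: float = 0.85, iters: int = 50) -> dict[str, float]:
    """Power-iteration PageRank; pages without out-links spread their rank evenly."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:  # dangling page: distribute its rank over all pages
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank


def tfidf(query: str, docs: dict[str, str]) -> dict[str, float]:
    """Score each document as the sum of tf * idf over the query terms."""
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    n = len(docs)
    scores = {}
    for d, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for t in tokenized.values() if term in t)
            if df and counts[term]:
                score += (counts[term] / len(tokens)) * math.log(n / df)
        scores[d] = score
    return scores


def search(query: str, docs: dict[str, str], links: dict[str, list[str]]) -> list[str]:
    pr = pagerank(links)
    relevance = tfidf(query, docs)
    # Assumed combination: relevance weighted by PageRank; irrelevant pages are dropped.
    hits = [(relevance[d] * pr.get(d, 0.0), d) for d in docs if relevance[d] > 0]
    return [d for _, d in sorted(hits, reverse=True)]
```

Consider docs = {"a": "python crawler", "b": "python spider tutorial", "c": "cooking recipes"} with links = {"a": ["b"], "b": ["a"], "c": ["a"]}. Here search("python", docs, links) returns ["a", "b"]. Both pages contain the term, and "a" has both the higher TF-IDF score and more inbound links.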

Shadow

Computer science fundamentals, data structures, design patterns, and an implementation of the Tomcat middleware

✭ 19

java javascript HTML ocr spider repitle

NScrapy

NScrapy is a .NET Core cross-platform distributed spider framework that provides an easy way to write your own spider.

✭ 88

C# spider dotnet distributed scrapy

ICP-Checker

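ICP filing lookup: queries the ICP filing records of a company or domain, completes the slider CAPTCHA automatically, and saves the results to an Excel spreadsheet. Works with the new 2022 version of the MIIT (工信部) filing management system website. No more dragging the slider over and over, and no more falling for a certain site's tool (某站*工具) that requires a VIP subscription before you can view filing information.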

✭ 119

python spider information-security information-gathering icp beian osint-tool

robotstxt

robots.txt file parsing and checking for R

✭ 65

r crawler scraper spider rstats r-package webscraping robotstxt peer-reviewed
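robotstxt itself is an R package. The example below shows what parsing and checking a robots.txt file involves, using Python's standard-library `urllib.robotparser`. This is not the R package's API, and the user-agent names and rules are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: everyone is kept out of /private/, BadBot is banned entirely.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5

User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from memory instead of fetching a URL

print(parser.can_fetch("MySpider", "https://example.com/index.html"))  # True
print(parser.can_fetch("MySpider", "https://example.com/private/a"))   # False
print(parser.can_fetch("BadBot", "https://example.com/index.html"))    # False
print(parser.crawl_delay("MySpider"))                                  # 5
```

In a real crawler you would call set_url() and read() to fetch the live file, then check can_fetch() before every request.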

js block

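Research and study of various kinds of blocking: anti-crawler measures, ad blocking, preventing ad injection, fighting ticket scalpers, and more.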

✭ 59

HTML javascript CSS nodejs crawler spider block-ad block-res block-spider

zhihu-crawler

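Scheduled crawling of Zhihu implemented from scratch. It mines valuable information from the crawled data and visualizes it on a web page.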

✭ 56

python HTML CSS redis crawler spider mongodb selenium zhihu developing pipenv

yutto

🧊 A cute and willful Bilibili video downloader (bilili V2)

✭ 383

python Dockerfile downloader video spider cross-platform coroutines aiohttp asyncio danmaku bangumi bilibili

job-spider

Multithreaded crawling of the recruitment websites commonly used in the internet industry

✭ 28

python spider lagou

scripter

Some scripts and tools

✭ 20

javascript HTML python CSS steam spider proxy userscript booru danbooru scripts scripter konachan yandere neteasecloudmusic mygalgame

get LibSeat

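Automatic reservation & check-in program for the 利昂 (Li'ang) library reservation system. Supports the library systems of 38 universities, including Renmin University of China, Beijing Normal University, University of Jinan, and Harbin Institute of Technology.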

✭ 39

python http spider https requests automatic

ZUCC ZhenFangHelper

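Automatic login, course selection, and information retrieval for the student edition of the Zhengfang (正方) academic affairs management system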

✭ 36

python java crawler ocr spider zucc-zhenfanghelper

Bilibili manga download

Bilibili Manga download tool with a graphical interface

✭ 52

python crawler downloader qt spider bilibili pyside6

ComicSpider

Original-resolution image crawler for the desktop version of the 动漫之家 (Dongman Zhijia) comics site

✭ 67

python Batchfile crawler spider phantomjs comics

Tieba-Birthday-Spider

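Baidu Tieba birthday crawler. It scrapes the birthdays of members of a Tieba forum and automatically sends birthday wishes on the corresponding dates.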

✭ 28

python config spider mongodb queue pymongo requests threading post tieba beautifulsoup birthday

fetchurls

A bash script to spider a site, follow links, and fetch urls (with built-in filtering) into a generated text file.

✭ 97

shell website spider wget crawl urls

PTT Beauty Spider

Crawler and image downloader for PTT's Beauty board (表特版)

✭ 47

crawler beauty image spider download beautifulsoup ptt

bangumi yearly report

No description or website provided.

✭ 24

HTML python go shell spider

MoMo

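Uses the sharing feature of 墨墨背单词 (MoMo vocabulary app) to claim the daily bonus of 20 extra words on the word limit (multithreaded)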

✭ 45

python spider

DSpiderDemo-Android

Android demo of a client-side crawler

✭ 43

java javascript spider dspider

goSpider

Some small projects and some articles

✭ 56

Jupyter Notebook python spider network learning-python chinese douban spiders spiderbasic

feaplat

Crawler management system with cluster support and elastic scaling. Runs feapder, Scrapy, Selenium, Playwright, and other frameworks and scripts.

✭ 42

shell crawler spider feapder feaplat

TikTokDownloader PyWebIO

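🚀 "Douyin_TikTok_Download_API" is a ready-to-use, high-performance asynchronous Douyin | TikTok data scraping tool. It supports API calls and online batch parsing and downloading.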

✭ 919

python javascript CSS api crawler scraper spider async aiohttp web-scraping asyncio asgi douyin tiktok fastapi pywebio no-watermark online-parsing douyin-tiktok-api douyin-tiktok-download

small-spider-project

Everyday crawlers

✭ 14

python javascript crawler spider scrapy

crawlerdetect

Golang module to detect bots and crawlers via the user agent

✭ 22

go shell crawler user-agent spider detect bot-detection crawler-detection
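crawlerdetect is a Go module. The following is only a Python illustration of user-agent-based detection: it matches the header against a small, hypothetical list of bot signatures. It is not the module's API or its pattern list, and real pattern lists are typically much longer.

```python
import re

# Tiny, illustrative subset of bot signatures (hypothetical, not crawlerdetect's list).
BOT_PATTERNS = [
    r"googlebot",
    r"bingbot",
    r"baiduspider",
    r"yandexbot",
    r"duckduckbot",
    r"curl/",
    r"python-requests",
    r"scrapy",
    r"\b(bot|crawler|spider)\b",  # generic self-identifying words
]
BOT_RE = re.compile("|".join(BOT_PATTERNS), re.IGNORECASE)


def is_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent string matches any known bot signature."""
    return bool(BOT_RE.search(user_agent or ""))
```

Clients can send any User-Agent they like, so a check like this only catches crawlers that identify themselves.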

scraper

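Image scraping and download tool. It quickly scrapes and downloads the images, photos, and illustrations uploaded by designers/users on ZCOOL (站酷, https://www.zcool.com.cn/) and CNU Visual (CNU 视觉, http://www.cnu.cc/).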

✭ 64

python scraper spider image-download

seenreq

Generates an object for testing whether a request has already been sent; "request" here refers to Mikeal's request library.

✭ 42

javascript url crawler spider request post duplicates-removed
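At its core, seenreq's job is request deduplication. Below is a minimal Python sketch of the idea. It assumes a request is identified by its method, a normalized URL and its body. Normalization here means a lower-cased scheme and host, sorted query parameters and no fragment. The names `normalize_url`, `SeenRequests` and `exists` are hypothetical and are not seenreq's JavaScript API.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lower-case scheme and host, sort query parameters, and drop the fragment."""
    parts = urlsplit(url)
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", query, ""))


class SeenRequests:
    def __init__(self) -> None:
        self._fingerprints: set[str] = set()

    def fingerprint(self, url: str, method: str = "GET", body: str = "") -> str:
        # Hash method + normalized URL + body into a fixed-size key.
        raw = "\n".join([method.upper(), normalize_url(url), body])
        return hashlib.sha1(raw.encode("utf-8")).hexdigest()

    def exists(self, url: str, method: str = "GET", body: str = "") -> bool:
        """Return True if an equivalent request was seen before; otherwise record it."""
        fp = self.fingerprint(url, method, body)
        if fp in self._fingerprints:
            return True
        self._fingerprints.add(fp)
        return False
```

With this normalization, `http://Example.com/a?b=2&a=1` and `http://example.com/a?a=1&b=2#top` count as the same request. A POST to the same URL does not.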

spider

Python crawlers (Amazon, Confluence ...)

✭ 21

python spider

blinkist-m4a-downloader

Grabs all of the audio files from all of the Blinkist books

✭ 100

go crawler data-mining scraper books spider data-processing audiobooks blinkist data-archiving

301-360 of 395 spider projects
