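INFO-SPIDER is a crawler toolbox 🧰 that brings many data sources together in one place. It is designed to help users retrieve their own data safely and quickly, and its code is open source with a transparent process. Supported data sources include GitHub, QQ Mail, NetEase Mail, Alibaba Mail, Sina Mail, Hotmail, Outlook, JD.com, Taobao, Alipay, China Mobile, China Unicom, China Telecom, Zhihu, Bilibili, NetEase Cloud Music, QQ friends, QQ groups, WeChat Moments photo-album generation, browser history, 12306, Cnblogs, CSDN blogs, OSChina blogs, and Jianshu.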

Stars: ✭ 5,984 (+37300%)

Mutual labels: spider, selenium

Alipayspider Scrapy

AlipaySpider on Scrapy (uses ChromeDriver); an Alipay crawler based on Scrapy

Stars: ✭ 70 (+337.5%)

Mutual labels: spider, selenium

Python Spider

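Douban Movie Top 250; scraping JSON data and photos of pretty women from Douyu; Taobao; Youyuan; using CrawlSpider to scrape basic profile information of matchmaking users on Hongniang, plus distributed crawling of Hongniang with storage in Redis; small crawler demos; Selenium; crawling Dmall; building APIs with Django; scraping Youyuan profiles; simulated logins to Zhihu, GitHub, and Tuchong; crawling the entire Dmall store site; scraping the article history of WeChat Official Accounts; scraping articles shared in WeChat groups or by WeChat friends; using itchat to monitor articles shared by specified WeChat Official Accounts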

Stars: ✭ 615 (+3743.75%)

Mutual labels: spider, selenium

Python3 Spider

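Hands-on Python crawling: simulated logins to major websites, including but not limited to slider CAPTCHAs, Pinduoduo, Meituan, Baidu, bilibili, Dianping, and Taobao. If you like it, please star it ❤️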

Stars: ✭ 2,129 (+13206.25%)

Mutual labels: spider, selenium

Pddspider

A Pinduoduo crawler that scrapes all products, reviews, and other information

Stars: ✭ 121 (+656.25%)

Mutual labels: spider, selenium

zhihu-crawler

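Scheduled Zhihu crawling implemented from scratch, mining the results for valuable information and visualizing the scraped data on a web page.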

Stars: ✭ 56 (+250%)

Mutual labels: spider, selenium

throughout

🎪 End-to-end testing made simple (using Jest and Puppeteer)

Stars: ✭ 16 (+0%)

Mutual labels: selenium

Shadow

Computer science fundamentals, data structures, design patterns, and an implementation of Tomcat middleware

Stars: ✭ 19 (+18.75%)

Mutual labels: spider

ha-multiscrape

Home Assistant custom component for scraping (html, xml or json) multiple values (from a single HTTP request) with a separate sensor/attribute for each value. Support for (login) form-submit functionality.

Stars: ✭ 103 (+543.75%)

Mutual labels: scrape

vaccipy

Automatic vaccination appointment booking for www.impfterminservice.de

Stars: ✭ 548 (+3325%)

Mutual labels: selenium

NScrapy

NScrapy is a cross-platform, distributed spider framework for .NET Core that provides an easy way to write your own spiders.

Stars: ✭ 88 (+450%)

Mutual labels: spider

google-image-downloader

A script to download images from images.google.com

Stars: ✭ 28 (+75%)

Mutual labels: selenium

TikTok

Download public videos on TikTok using Python with Selenium

Stars: ✭ 37 (+131.25%)

Mutual labels: selenium


Spider

The Spider project will be continually updated with the crawling methods I have learned and used!

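1. The crawling approach in the primary folder is a fairly primitive, basic one. It is the approach I used first, for a public-opinion analysis project during my first year of graduate school. Within it, the hotel folder scrapes hotel review data: it first crawls the URLs, then crawls the reviews from those URLs, in the url_spider and comment_spider files respectively. Environment: Ubuntu, Python 2.7. It has not been run on Windows, but it should work much the same.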

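2. The selenium folder contains the approach I used at my company to scrape web pages for colleagues. Selenium + PhantomJS/ChromeDriver is a fairly popular way to crawl dynamic web pages, including AJAX pages. Environment: Windows, Python 2.7. It should also run on Ubuntu.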

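3. CheckCaptha uses a CNN to recognize CAPTCHAs, reaching an accuracy of 98%, and the code is thoroughly commented. The recognition algorithm simply uses a convolutional neural network to predict digits plus uppercase and lowercase letters, so it is not general-purpose, but it can serve as a simple example for learning CNNs and TensorFlow. Environment: Ubuntu, Python 2.7.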

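4. LagouProject is a crawler I wrote to scrape job listings from Lagou. It touches on quite a few techniques, including cookies, multithreading, IP proxies, and Scrapy, which makes it a good program for learning web crawling. It is not practical to describe it in detail here; see my blog post: http://blog.csdn.net/demohui/article/details/77370313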
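
The two-stage approach of the hotel crawler in item 1 (first collect URLs, then collect the reviews behind them) can be sketched with Python 3's standard library. This is a minimal sketch, not the code in url_spider or comment_spider: the markup it looks for (links with class hotel-link, reviews in <p class="comment"> elements) is hypothetical, and fetch_fn is injectable so the parsing can be exercised without network access.

```python
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Stage 1 (the url_spider role): collect absolute hrefs of <a> tags carrying a CSS class."""

    def __init__(self, base_url, css_class):
        super().__init__()
        self.base_url, self.css_class, self.urls = base_url, css_class, []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href") and self.css_class in (attrs.get("class") or "").split():
            self.urls.append(urljoin(self.base_url, attrs["href"]))


class TextCollector(HTMLParser):
    """Stage 2 (the comment_spider role): collect the text of <tag class="css_class"> elements."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag, self.css_class = tag, css_class
        self.depth = 0      # nesting depth of `tag` inside the current match; 0 = outside
        self.current = []   # text fragments of the element being read
        self.comments = []

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        if self.depth:
            self.depth += 1
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth, self.current = 1, []

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1
            if self.depth == 0:
                # Join fragments (e.g. text split by <b>) and normalize whitespace.
                self.comments.append(" ".join("".join(self.current).split()))

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def fetch(url):
    """Default fetcher: plain GET via urllib, decoded as UTF-8."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl_reviews(listing_url, fetch_fn=fetch):
    """Stage 1 collects detail-page URLs; stage 2 visits each one and extracts its reviews."""
    links = LinkCollector(listing_url, "hotel-link")  # hypothetical class name
    links.feed(fetch_fn(listing_url))
    links.close()
    reviews = {}
    for url in links.urls:
        parser = TextCollector("p", "comment")  # hypothetical review markup
        parser.feed(fetch_fn(url))
        parser.close()
        reviews[url] = parser.comments
    return reviews
```

Keeping the stages separate means the collected URL list can be saved and the review stage rerun on its own if it fails partway through.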

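For the dynamic, AJAX-driven pages in item 2, the key step is waiting until the JavaScript-rendered content exists before reading the page source. The sketch below assumes Selenium 4 with headless Chrome instead of PhantomJS, which is no longer maintained; the URL and CSS selector are placeholders. Selenium's own WebDriverWait performs the same kind of polling; a hand-written loop is used here only so the function can be tested with a stand-in driver.

```python
import time

# Selenium's locator-strategy string for CSS selectors (the value of By.CSS_SELECTOR).
CSS_SELECTOR = "css selector"


def make_headless_chrome():
    """Start headless Chrome; requires the selenium package and a compatible Chrome install."""
    from selenium import webdriver  # imported lazily so the rest of the module works without it

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    return webdriver.Chrome(options=options)


def fetch_rendered_html(url, css_selector, driver_factory=make_headless_chrome,
                        timeout=10.0, poll=0.5):
    """Load `url`, poll until `css_selector` matches (i.e. the AJAX content has rendered),
    then return the page source. Raises TimeoutError if it never appears."""
    driver = driver_factory()
    try:
        driver.get(url)
        deadline = time.monotonic() + timeout
        while not driver.find_elements(CSS_SELECTOR, css_selector):
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{css_selector!r} not found within {timeout}s")
            time.sleep(poll)
        return driver.page_source
    finally:
        driver.quit()  # always release the browser process, even on timeout
```

Waiting for a specific element rather than sleeping for a fixed time keeps the crawler fast on quick pages and still correct on slow ones.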

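A CNN that reads a whole CAPTCHA at once, as in item 3, typically has one group of outputs per character position, each group scoring the 62 possible characters (10 digits, 26 uppercase, 26 lowercase letters). The sketch below shows one common way to encode training labels and decode predictions under that scheme. It is an assumption, not necessarily how CheckCaptha does it, and the fixed length of 4 characters is hypothetical.

```python
import string

# Character set from item 3: digits plus upper- and lowercase letters (62 classes).
CHARSET = string.digits + string.ascii_uppercase + string.ascii_lowercase
CAPTCHA_LEN = 4  # hypothetical: assumes fixed-length, 4-character CAPTCHAs


def encode_label(text):
    """Turn a CAPTCHA string into a flat vector of length CAPTCHA_LEN * 62.

    Position i occupies slots [i*62, (i+1)*62); exactly one slot per position is 1.
    """
    if len(text) != CAPTCHA_LEN:
        raise ValueError(f"expected {CAPTCHA_LEN} characters, got {len(text)}")
    n = len(CHARSET)
    vec = [0.0] * (CAPTCHA_LEN * n)
    for i, ch in enumerate(text):
        vec[i * n + CHARSET.index(ch)] = 1.0  # ValueError if ch is outside CHARSET
    return vec


def decode_prediction(scores):
    """Pick the highest-scoring character in each 62-slot group (argmax per position)."""
    n = len(CHARSET)
    chars = []
    for i in range(CAPTCHA_LEN):
        group = scores[i * n:(i + 1) * n]
        chars.append(CHARSET[max(range(n), key=group.__getitem__)])
    return "".join(chars)
```

With this layout the network's final layer would have CAPTCHA_LEN * 62 outputs, and decode_prediction applied to those outputs yields the predicted string.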

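Item 4 mentions cookies, multithreading, and IP proxies. The sketch below shows one way these pieces fit together using only the standard library (urllib, http.cookiejar, concurrent.futures); it is not Scrapy and not the LagouProject code. A shared cookie jar keeps session cookies across requests, a thread pool fetches pages concurrently, and a lock-protected rotator hands out proxies round-robin. The proxy addresses are placeholders.

```python
import threading
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from http.cookiejar import CookieJar
from itertools import cycle


class ProxyRotator:
    """Hand out proxies round-robin; the lock makes next() safe across threads."""

    def __init__(self, proxies):
        self._cycle = cycle(proxies)
        self._lock = threading.Lock()

    def next(self):
        with self._lock:
            return next(self._cycle)


# One cookie jar shared by all requests so session cookies persist (CookieJar locks internally).
SHARED_COOKIES = CookieJar()


def build_opener(proxy, cookie_jar=SHARED_COOKIES):
    """urllib opener that sends/stores cookies and routes HTTP(S) traffic through `proxy`."""
    return urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookie_jar),
        urllib.request.ProxyHandler({"http": proxy, "https": proxy}),
    )


def fetch(url, proxy, timeout=10):
    """GET `url` through `proxy` and decode the body as UTF-8."""
    with build_opener(proxy).open(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl_all(urls, proxies, fetch_fn=fetch, max_workers=4):
    """Fetch every URL on a thread pool, giving each request the next proxy in turn."""
    rotator = ProxyRotator(proxies)

    def task(url):
        return url, fetch_fn(url, rotator.next())

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(task, urls))
```

For example, crawl_all(job_urls, ["http://127.0.0.1:8080", "http://127.0.0.1:8081"]) would fetch job_urls (a hypothetical list of listing URLs) through two local proxies. In Scrapy, the corresponding pieces are its built-in cookie and proxy middlewares and concurrency settings such as CONCURRENT_REQUESTS.
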
demoyhui / Spider
