RecluseXU / learning_spider

Licence: MIT license

These are really my study notes. They include learning records, a scraping practice platform (website), and home-made tool scripts.

Labels

website web spider learning-by-doing

Projects that are alternatives of or similar to learning spider

163music spider by scrapy.

Stars: ✭ 60 (+11.11%)

Mutual labels: spider

Automatic quiz-answering program 🎉

Stars: ✭ 37 (-31.48%)

Mutual labels: spider

Get movie info from douban(豆瓣) and display in your terminal

Stars: ✭ 17 (-68.52%)

Mutual labels: spider

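🌟 powered by python3 (simple learning of spider): scrapes Baidu Wenku, NetEase Cloud Music songs, Douban movies, GitHub, JD.com, QQ Zone, weather, a VIP parsing assistant, TED transcripts, a Wi-Fi cracking script, setting Bing images as the desktop wallpaper, and more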

Stars: ✭ 124 (+129.63%)

Mutual labels: spider

node-html-crawler

Easy-to-use Node HTML crawler (spider) for a site's web pages

Stars: ✭ 30 (-44.44%)

Mutual labels: spider

photo-spider-scrapy

10 photo website spiders: Scrapy crawler code for 10 foreign stock-photo sites

Stars: ✭ 17 (-68.52%)

Mutual labels: spider

learning-computer-science

Learning data structures, algorithms, machine learning and various computer science constructs by programming practice from resources around the web.

Stars: ✭ 28 (-48.15%)

Mutual labels: learning-by-doing

A web spider for shodan.io without using the Developer API.

Stars: ✭ 30 (-44.44%)

Mutual labels: spider

🕰 A customizable analog clock built using React

Stars: ✭ 16 (-70.37%)

Mutual labels: learning-by-doing

Get Aliexpress product details in JSON

Stars: ✭ 80 (+48.15%)

Mutual labels: spider

🎊 Design and implementation of a lightweight crawler framework.

Stars: ✭ 322 (+496.3%)

Mutual labels: spider

Nutshell-Machine-Learning

This is a repository built by the community for the community.

Stars: ✭ 77 (+42.59%)

Mutual labels: learning-by-doing

Image and video spider at the ten-million scale [open-source edition] Image Spider

Stars: ✭ 122 (+125.93%)

Mutual labels: spider

Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.

Stars: ✭ 52 (-3.7%)

Mutual labels: spider

scrapy-distributed

A series of distributed components for Scrapy. Including RabbitMQ-based components, Kafka-based components, and RedisBloom-based components for Scrapy.

Stars: ✭ 38 (-29.63%)

Mutual labels: spider

😚 Q & A website based on Spring Boot.

Stars: ✭ 46 (-14.81%)

Mutual labels: spider

ChineseStarsRelationship

Scraping data on Chinese celebrities. You can even obtain the relationships between all the people on the internet, and then take it from there! With this data you can do many more interesting things, such as social network analysis, relationship-network visualization, algorithm research, and more.

Stars: ✭ 26 (-51.85%)

Mutual labels: spider

Scrapes QQ user information (basic details such as QQ number, nickname, birthday, and address) and does a brief analysis.

Stars: ✭ 21 (-61.11%)

Mutual labels: spider

landchina-spider

This project is outdated! It no longer works on the redesigned website.

Stars: ✭ 13 (-75.93%)

Mutual labels: spider

An open source webapp for scraping: towards a public service for webscraping

Stars: ✭ 80 (+48.15%)

Mutual labels: spider

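Introduction

The project is split into three parts:

Scraping case studies
Scrape data from various websites, noting the difficulty and key points of each
Website http://learnspider.evilrecluse.top/
Drawing on anti-anti-scraping experience, add and write various kinds of scraping obstacles, while studying front-end, back-end, and server topics along the way
Helper tools/scripts
Build various tools and scripts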

Completed so far (estimated progress: 5%)

Case studies

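Difficulty	Target	Info	Method	Key challenge
Basic	Simple usage of various libraries	Basic usage	Read the docs and write demos
Beginner	Maoyan Movies Top 100 ranking	Static page	requests
	Amazon China store search page	Static page	requests
	Toutiao search results	Dynamic page	requests
	Weibo mobile user feed	Dynamic page	requests	Finding where the since_id parameter comes from
	Bilibili "Guancha Jianqi" (观察者见齐) index	Dynamic page	requests	Restoring the compressed index data
	Simplest slider CAPTCHA	Dynamic page	Selenium	Moving the slider
Easy	A router's password encryption	Single JS file		Finding the encryption function
	Handling infinite debugger	Dynamic page	reres	Anti-debugging
	Decoding AAEncode	Dynamic page	DevTools	Encoding-based obfuscation
	CSS absolute-positioning anti-scraping	Static page	pyppeteer	Restoring element order
	CSS pseudo-element anti-scraping	Static page	requests	Restoring pseudo-element content
	58.com branded apartments	Static page	requests	Static font encryption
	Anjuke fingerprinting research	Single JS file	DevTools	Working out what each piece of collected information means
Medium	Zhihu article info	Dynamic page	requests	Encryption of the `x-zse-86` header parameter; time-based anti-debugging
	china_cn font encryption	Dynamic page	fontTools	Handling dynamic font encryption
	Baidu obfuscated code	Single JS file	@babel	Writing various deobfuscation plugins
	Jiasule (加速乐) obfuscated code	Intercepted cookie-setting code	@babel	Restoring OB-obfuscated code
Hard	Fetching an image from carbosynth	Simple TLS fingerprinting	Changing the default security-component configuration	Understanding TLS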
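In the Weibo mobile feed case, the key challenge is finding where the `since_id` parameter comes from. A common pattern in this kind of cursor pagination is that each response carries the cursor for the next page.

The sketch below is a minimal illustration under stated assumptions:
- It assumes a response shape of `{"data": {"cards": [...], "cardlistInfo": {"since_id": ...}}}`. This shape is an assumption, not taken from this project.
- It takes a caller-supplied, hypothetical `fetch_page` function, so it runs without network access. In a real scraper that function would wrap something like `requests.get(url, params={..., "since_id": since_id}).json()`.

```python
from typing import Callable, Dict, Iterator, Optional

# Hypothetical page fetcher: takes the cursor (None for the first page) and
# returns decoded JSON. The response shape used below is an assumption:
# {"data": {"cards": [...], "cardlistInfo": {"since_id": <next cursor>}}}
FetchPage = Callable[[Optional[int]], Dict]


def iter_posts(fetch_page: FetchPage, max_pages: int = 50) -> Iterator[Dict]:
    """Yield posts page by page, passing each page's since_id into the next request."""
    since_id: Optional[int] = None  # the first request carries no cursor
    for _ in range(max_pages):  # hard cap so a broken cursor cannot loop forever
        data = fetch_page(since_id).get("data", {})
        yield from data.get("cards", [])
        since_id = data.get("cardlistInfo", {}).get("since_id")
        if not since_id:  # a missing or empty cursor marks the last page
            break


if __name__ == "__main__":
    # Two fake pages standing in for real responses
    pages = {
        None: {"data": {"cards": [{"id": 1}, {"id": 2}], "cardlistInfo": {"since_id": 200}}},
        200: {"data": {"cards": [{"id": 3}], "cardlistInfo": {}}},
    }
    print([card["id"] for card in iter_posts(lambda sid: pages[sid])])  # [1, 2, 3]
```

Injecting the fetcher keeps the cursor logic testable offline. It also leaves headers, cookies and rate limiting to whoever writes the real request.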
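Static font encryption (the 58.com case) means the characters in the page source are not the ones a reader sees: certain code points are drawn by a custom font as different digits or characters.

When the font is static, the table mapping code points to real characters only has to be worked out once. One way is to list the font's code points with fontTools, since `TTFont(path).getBestCmap()` maps code points to glyph names. Each glyph is then identified by eye or by OCR. For dynamic fonts, as in the china_cn case, the table would have to be rebuilt for each font served.

The table in this sketch is hypothetical. The sketch only shows the final substitution step, using `str.translate`.

```python
from typing import Dict

# Hypothetical table: private-use code points the page renders with its custom
# font -> the character each glyph actually shows. Real values must be
# recovered from the site's own font file.
FONT_MAP: Dict[int, str] = {
    0xE3A1: "1",
    0xE3A2: "2",
    0xE3A3: "5",
    0xE3A4: "8",
}

# str.maketrans accepts a {code point: replacement string} dict. Building it
# once is only valid because the font does not change between requests.
TRANSLATION = str.maketrans(FONT_MAP)


def decode_font_text(text: str) -> str:
    """Replace custom-font code points with real characters; other text is untouched."""
    return text.translate(TRANSLATION)


if __name__ == "__main__":
    print(decode_font_text("Rent: \ue3a2\ue3a3\ue3a1\ue3a4 yuan/month"))  # Rent: 2518 yuan/month
```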

Basic practice website

Website (ICP-registered): http://learnspider.evilrecluse.top/

Challenges

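Type	Difficulty	Name	Details
Slider CAPTCHA	Beginner	Simplest slider CAPTCHA	Drag the slider to the end to pass; there is no detection at all
Slider CAPTCHA	Easy	SliderCaptcha	Deployed with default settings; includes basic human verification, so dragging at a constant speed or in a straight line will not pass
CSS anti-scraping	Beginner	Absolute-positioning anti-scraping	Writes the data into the HTML scattered and shuffled, then uses absolute-positioning coordinates to restore how it looks on screen
	Easy	Pseudo-element anti-scraping	Exploits the fact that a pseudo-element's `content` property can display text, and shows part of the data through `content`
	Medium	Static font-encryption anti-scraping	Renders some Unicode characters with a custom font, so anyone decoding with standard Unicode cannot get the data; the font does not change within a single visit
JS anti-scraping	Beginner	Anti-debugging	Uses timed or nested `debugger` statements to keep the browser stuck in a debugging state it cannot leave
	Easy	Debugging blocked	Code that prevents the browser console from being opened
	Easy	AAEncode	Replaces common characters with emoticon-like characters, making the code hard to read
	Easy	JSFuck	Replaces most common characters with a handful of basic characters, making the code hard to read
Data encryption	Medium	AES symmetric encryption	Encrypts the transmitted data
Data encryption	Medium	Custom-alphabet Base64	Encodes the transmitted data with Base64 using a non-standard alphabet
Fingerprint anti-scraping	Easy	Simplest Selenium detection	Checks for two variables that Selenium creates automatically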
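The absolute-positioning challenge writes characters into the HTML out of order, and CSS coordinates put them back in visual order. The scraper therefore has to sort by coordinate.

This is a minimal sketch using the standard-library `html.parser`. It assumes, hypothetically, that:
- every character sits in its own `<span>`;
- each span has an inline `left: Npx` style;
- all spans are on a single line.

Real pages may use class-based styles, overlapping decoy elements or multiple rows, which this does not handle. The case-study table lists pyppeteer for this target. This sketch swaps in plain HTML parsing instead.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Matches "left: 16px" but not "padding-left: 16px"
LEFT_RE = re.compile(r"(?<![\w-])left\s*:\s*(-?\d+(?:\.\d+)?)px")


class PositionedSpanParser(HTMLParser):
    """Collect (left offset, text) for every <span> with an inline `left: Npx` style."""

    def __init__(self) -> None:
        super().__init__()
        self.items: List[Tuple[float, str]] = []
        self._current_left: Optional[float] = None

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            match = LEFT_RE.search(dict(attrs).get("style") or "")
            self._current_left = float(match.group(1)) if match else None

    def handle_endtag(self, tag):
        if tag == "span":
            self._current_left = None

    def handle_data(self, data):
        # Only keep text that sits inside a positioned span
        if self._current_left is not None and data.strip():
            self.items.append((self._current_left, data.strip()))


def restore_visual_order(html: str) -> str:
    """Join span texts left to right, i.e. in the order a human sees them."""
    parser = PositionedSpanParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(text for _, text in sorted(parser.items, key=lambda item: item[0]))


if __name__ == "__main__":
    html = ('<div><span style="left:32px">8</span>'
            '<span style="left:0px">2</span><span style="left:16px">5</span></div>')
    print(restore_visual_order(html))  # 258
```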
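The custom-alphabet Base64 challenge uses ordinary Base64 with a shuffled 64-character alphabet, so the standard decoder produces garbage. Once the alphabet has been recovered from the site's JavaScript, decoding has two steps: translate each character back to the standard alphabet, then call `base64.b64decode`.

The reversed alphabet in this sketch is a hypothetical stand-in, not the practice site's actual table.

```python
import base64
from typing import Callable, Tuple

STANDARD_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def make_codec(custom_alphabet: str) -> Tuple[Callable[[bytes], str], Callable[[str], bytes]]:
    """Return (encode, decode) functions for Base64 with a custom 64-character alphabet."""
    if len(custom_alphabet) != 64 or len(set(custom_alphabet)) != 64 or "=" in custom_alphabet:
        raise ValueError("alphabet must be 64 distinct characters and must not contain '='")
    to_standard = str.maketrans(custom_alphabet, STANDARD_ALPHABET)
    from_standard = str.maketrans(STANDARD_ALPHABET, custom_alphabet)

    def encode(data: bytes) -> str:
        # Standard Base64 first, then swap each character into the custom alphabet
        return base64.b64encode(data).decode("ascii").translate(from_standard)

    def decode(text: str) -> bytes:
        # Swap back to the standard alphabet; '=' padding passes through unchanged
        return base64.b64decode(text.translate(to_standard))

    return encode, decode


if __name__ == "__main__":
    # Hypothetical alphabet: the standard one reversed
    encode, decode = make_codec(STANDARD_ALPHABET[::-1])
    print(decode(encode(b'{"page": 1}')))  # b'{"page": 1}'
```

Since this is only a substitution on top of Base64, it hides data from a casual look but offers no real secrecy once the alphabet is known.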

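Technology stack

Area	Technology	Notes
Convention	REST	Consistent API design and consistent responses
CDN	bootcdn.cn	Free CDN acceleration for open-source front-end projects
Front end	jQuery 2.2.4	A fast, concise JavaScript library
	Materialize	A responsive front-end framework based on Material Design
	twitter-bootstrap 3.4.1	An open-source front-end toolkit released by Twitter
	font-awesome 4.7.0	An icon font library and CSS toolkit
	metisMenu 3.0.6	A vanilla-JS collapsible menu plugin
Proxy server	nginx	High-performance HTTP and reverse proxy server
Web server	uWSGI	An application server for WSGI web apps
Back end	Flask 1.1.2	Lightweight Python web framework
	Flask-RESTful 0.3.8	A Flask extension for quickly building REST APIs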

Tools/scripts

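Tool	Description
Auto DL ChromeWebDriver	A Windows script that automatically downloads Selenium's ChromeDriver. It reads the Chrome version from the registry and downloads the best-matching WebDriver from Google so that Selenium can run. (In practice, a better approach is to run Docker on a server, pull the Selenium image, deploy it, and call it remotely.)
slother	A wrapper layer over Selenium that handles common problems met when scraping with Selenium
@Babel/traverse API document	Self-written documentation and examples for the Babel/traverse API; the content has moved to a separate repository. Babel does not officially document Babel/traverse, so the content was recorded and written from my own reading of the source code. It may contain errors, and corrections are welcome.
Font Encryption Detective	An OCR-based script for breaking font encryption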
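The Auto DL ChromeWebDriver script reads the Chrome version from the registry and then downloads the closest matching driver. The author's code is not shown here. The sketch below is an independent illustration of those two steps, under these assumptions:
- The registry location (`HKEY_CURRENT_USER\Software\Google\Chrome\BLBeacon`, value `version`) is a commonly used one, but it is an assumption.
- The matching rule loosely follows ChromeDriver's documented guidance of matching on MAJOR.MINOR.BUILD, falling back to the major version.
- The caller supplies the list of available driver versions; fetching that list is left out.

```python
from typing import Iterable, Optional, Tuple


def read_chrome_version_windows() -> str:
    """Windows only: read the installed Chrome version from the registry.

    The key below is a commonly used location (an assumption; installs may differ).
    """
    import winreg  # imported lazily so the module still loads on non-Windows systems

    with winreg.OpenKey(winreg.HKEY_CURRENT_USER, r"Software\Google\Chrome\BLBeacon") as key:
        version, _value_type = winreg.QueryValueEx(key, "version")
    return version


def parse_version(text: str) -> Tuple[int, ...]:
    """'96.0.4664.45' -> (96, 0, 4664, 45) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def best_driver_version(chrome_version: str, available: Iterable[str]) -> Optional[str]:
    """Newest driver sharing MAJOR.MINOR.BUILD with Chrome, else sharing MAJOR, else None."""
    chrome = parse_version(chrome_version)
    candidates = sorted(available, key=parse_version, reverse=True)  # newest first
    for width in (3, 1):  # strict match on the first three parts, then major only
        for candidate in candidates:
            if parse_version(candidate)[:width] == chrome[:width]:
                return candidate
    return None


if __name__ == "__main__":
    drivers = ["95.0.4638.69", "96.0.4664.18", "96.0.4664.35", "97.0.4692.20"]
    print(best_driver_version("96.0.4664.45", drivers))  # 96.0.4664.35
```

Keeping the matching logic separate from the registry read and the download means it can be tested on any OS.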

TODO

Finish the 3DM login case
Refactor the practice platform
Improve the Babel plugins

November 7, 2021

Stars: ✭ 54
