
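Source code for "Data Collection: From Getting Started to Giving Up" (《数据采集从入门到放弃》). Contents: introduction to crawlers, the job market, and interview questions for crawler engineers; introduction to the HTTP protocol; using Requests; introduction to the XPath parser; MongoDB and MySQL; multithreaded crawlers; introduction to Scrapy; introduction to Scrapy-redis; deploying with Docker; managing a Docker cluster with Nomad; querying Docker logs with EFK.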

Stars: ✭ 118 (+2.61%)

Mutual labels: crawler, mysql, requests

Examples Of Web Crawlers

Some interesting Python crawler examples that are friendly to beginners, mainly crawling sites such as Taobao, Tmall, WeChat, Douban, and QQ.

Stars: ✭ 10,724 (+9225.22%)

Mutual labels: multithreading, crawler, spider

Tieba-Birthday-Spider

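Baidu Tieba birthday crawler: collects the birthdays of members of a Tieba forum and automatically sends greetings on the corresponding dates.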

Stars: ✭ 28 (-75.65%)

Mutual labels: spider, queue, requests

Price Monitor

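JD.com product price monitor: watches the prices of products the user specifies and sends email/WeChat alerts when a price drops. Tech: Python crawler, IP proxy pool, JS API crawling, Selenium page crawling.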

Stars: ✭ 634 (+451.3%)

Mutual labels: crawler, mysql, requests

Geziyor

Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.

Stars: ✭ 1,246 (+983.48%)

Mutual labels: crawler, spider

Weibo Album Crawler

Multithreaded crawler for full-size images in Sina Weibo albums.

Stars: ✭ 83 (-27.83%)

Mutual labels: crawler, requests

Geetest

Slider CAPTCHA; hope it helps ❤️

Stars: ✭ 114 (-0.87%)

Mutual labels: bilibili, spider

Douyinsdk

Douyin SDK: data collection and crawling are no longer just a dream.

Stars: ✭ 99 (-13.91%)

Mutual labels: crawler, spider

Pkulaw spider

Crawls the Pkulaw (北大法宝) site at http://www.pkulaw.cn/Case/

Stars: ✭ 113 (-1.74%)

Mutual labels: crawler, spider

Puppeteer Walker

a puppeteer walker 🕷 🕸

Stars: ✭ 78 (-32.17%)

Mutual labels: crawler, spider

Zhihuspider

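Crawler for the public profile information of Zhihu users; can also crawl follow relationships. Based on Python, with proxies and multithreading.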

Stars: ✭ 92 (-20%)

Mutual labels: spider, mysql

Crawler examples

Some classic web crawler projects.

Stars: ✭ 74 (-35.65%)

Mutual labels: crawler, spider

Gopa Abandoned

GOPA, a spider written in Go. (NOTE: this project moved to https://github.com/infinitbyte/gopa)

Stars: ✭ 98 (-14.78%)

Mutual labels: crawler, spider

Spider

python crawler spider

Stars: ✭ 70 (-39.13%)

Mutual labels: crawler, spider

Thesaurusspider

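Python crawler that downloads lexicon files from the Sogou, Baidu, and QQ input methods; can be used to build vocabularies for different industries.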

Stars: ✭ 98 (-14.78%)

Mutual labels: multithreading, crawler

Ruia

Async Python 3.6+ web scraping micro-framework based on asyncio

Stars: ✭ 1,366 (+1087.83%)

Mutual labels: crawler, spider

Crawler Detect

🕷 CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent

Stars: ✭ 1,549 (+1246.96%)

Mutual labels: crawler, spider


bilibili_member_crawler

Bilibili user information crawler (stars appreciated (^o^)/~). For entertainment and learning purposes only.

Environment

Python 3.6+
MySQL 5.7+

Download and installation

Download the source:

git clone git@github.com:cwjokaka/bilibili_member_crawler.git

Or download the zip file from https://github.com/cwjokaka/bilibili_member_crawler

Install the dependencies:

pip install -r requirements.txt

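File overview

bilibili_member_crawler.py: crawler entry point
distributor.py: task distributor; generates tasks and puts them on the task queue
worker.py: worker thread; pulls tasks from the task queue and persists Bilibili user information to MySQL
res_manager.py: resource manager; manages the task queue
variable.py: configuration file; contains the proxy, database, and thread settings
sql/bilibili.sql: database initialization file
user-agents.txt: list of browser user-agent strings
exception/*: exception classes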
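
The file list describes a producer/consumer layout: a distributor fills a task queue and worker threads drain it. The sketch below shows that pattern in plain Python. It is not the project's actual code: the function names (distributor, worker, run), the use of member ids as tasks, the None stop marker, and the fetch and save callables (standing in for the HTTP request and the MySQL write) are all assumptions made for illustration.

```python
import queue
import threading


def distributor(task_queue, start_id, end_id, worker_count):
    """Put one task per member id on the queue, then one stop marker per worker."""
    for mid in range(start_id, end_id + 1):
        task_queue.put(mid)  # blocks while the bounded queue is full
    for _ in range(worker_count):
        task_queue.put(None)  # hypothetical sentinel telling a worker to exit


def worker(task_queue, fetch, save):
    """Pull ids from the queue, fetch each record, and persist it."""
    while True:
        mid = task_queue.get()
        if mid is None:
            break
        record = fetch(mid)  # assumed to return a dict, or None on failure
        if record is not None:
            save(record)


def run(start_id, end_id, fetch, save, worker_count=4, queue_size=100):
    """Start the workers, feed them from the distributor, and wait for them."""
    task_queue = queue.Queue(maxsize=queue_size)
    threads = [
        threading.Thread(target=worker, args=(task_queue, fetch, save))
        for _ in range(worker_count)
    ]
    for t in threads:
        t.start()
    distributor(task_queue, start_id, end_id, worker_count)
    for t in threads:
        t.join()
```

For a dry run without network or database access, something like run(1, 20, lambda mid: {"mid": mid}, results.append) with results = [] exercises the queue logic. The bounded queue keeps the distributor from running far ahead of the workers.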

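Notes

Please keep your speed under control (it is determined by the thread count and crawl interval set in variable.py).
The PROXIES setting needs to be replaced periodically (an external proxy pool was added on 2019/9/4).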
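
The two notes above come down to spacing requests out and not leaning on a single proxy. A rough Python sketch of both ideas follows, under stated assumptions: the Throttle and ProxyPool classes are hypothetical helpers, not part of this project, and the actual setting names in variable.py may differ. The dictionary returned by ProxyPool.next() follows the {"http": ..., "https": ...} shape that the requests library accepts for its proxies argument.

```python
import itertools
import threading
import time


class Throttle:
    """Enforce a minimum interval between requests, shared by all threads."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self._interval = interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0
        self._lock = threading.Lock()

    def wait(self):
        # Holding the lock while sleeping serializes callers, which is what
        # makes the interval a global limit rather than a per-thread one.
        with self._lock:
            now = self._clock()
            if now < self._next_allowed:
                self._sleep(self._next_allowed - now)
                now = self._next_allowed
            self._next_allowed = now + self._interval


class ProxyPool:
    """Hand out proxies in round-robin order (hypothetical helper)."""

    def __init__(self, proxy_urls):
        if not proxy_urls:
            raise ValueError("proxy_urls must not be empty")
        self._cycle = itertools.cycle(list(proxy_urls))
        self._lock = threading.Lock()

    def next(self):
        with self._lock:
            url = next(self._cycle)
        return {"http": url, "https": url}
```

A worker would call throttle.wait() before each request and pass pool.next() as the proxies argument. Round-robin rotation is only one option; dropping proxies that fail repeatedly is a natural extension.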

PS: I will add some statistics when I have time (gotta run).


Stars: ✭ 115
