cwjokaka / Bilibili_member_crawler
Licence: MIT
Bilibili user crawler. Yay~ it's a crawler
Stars: ✭ 115
Projects that are alternatives to or similar to Bilibili member crawler
Decryptlogin
APIs for logging in to some websites using requests.
Stars: ✭ 1,861 (+1518.26%)
Mutual labels: bilibili, crawler, spider, requests
Bilili
🍻 Bilibili video (including bangumi) and danmaku downloader
Stars: ✭ 379 (+229.57%)
Mutual labels: bilibili, crawler, spider, requests
Scrapingoutsourcing
ScrapingOutsourcing focuses on sharing crawler code, aiming to publish one new example per week.
Stars: ✭ 164 (+42.61%)
Mutual labels: crawler, spider, requests
Docs
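Source code for "Data Collection: From Getting Started to Giving Up". Contents: an introduction to crawlers, the job market, and crawler-engineer interview questions; the HTTP protocol; using Requests; the XPath parser; MongoDB and MySQL; multithreaded crawlers; Scrapy; Scrapy-redis; deploying with Docker; managing Docker clusters with Nomad; querying Docker logs with EFK.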
Stars: ✭ 118 (+2.61%)
Mutual labels: crawler, mysql, requests
Examples Of Web Crawlers
Some very interesting Python crawler examples that are beginner-friendly, mainly scraping sites such as Taobao, Tmall, WeChat, Douban, and QQ.
Stars: ✭ 10,724 (+9225.22%)
Mutual labels: multithreading, crawler, spider
Tieba-Birthday-Spider
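Baidu Tieba birthday crawler: scrapes the birthdays of forum members on Tieba and automatically sends birthday wishes on the corresponding dates.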
Stars: ✭ 28 (-75.65%)
Mutual labels: spider, queue, requests
Price Monitor
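JD.com product price monitor: tracks the prices of user-specified products and sends email/WeChat alerts when a price drops. Tech: Python crawler / IP proxy pool / JS API scraping / Selenium page scraping.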
Stars: ✭ 634 (+451.3%)
Mutual labels: crawler, mysql, requests
Geziyor
Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.
Stars: ✭ 1,246 (+983.48%)
Mutual labels: crawler, spider
Zhihuspider
Crawler for Zhihu users' public profile information; can crawl users' follow relationships. Python-based, uses proxies and multithreading.
Stars: ✭ 92 (-20%)
Mutual labels: spider, mysql
Crawler examples
Some classic web crawler projects.
Stars: ✭ 74 (-35.65%)
Mutual labels: crawler, spider
Gopa Abandoned
GOPA, a spider written in Go. (NOTE: this project moved to https://github.com/infinitbyte/gopa)
Stars: ✭ 98 (-14.78%)
Mutual labels: crawler, spider
Thesaurusspider
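A Python crawler that downloads dictionary files from the Sogou, Baidu, and QQ input methods; can be used to build vocabulary lists for different industries.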
Stars: ✭ 98 (-14.78%)
Mutual labels: multithreading, crawler
Ruia
Async Python 3.6+ web scraping micro-framework based on asyncio
Stars: ✭ 1,366 (+1087.83%)
Mutual labels: crawler, spider
Crawler Detect
🕷 CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent
Stars: ✭ 1,549 (+1246.96%)
Mutual labels: crawler, spider
bilibili_member_crawler
Bilibili user information crawler (a Star would be appreciated (^o^)/~). For entertainment and learning purposes only.
Environment
- python 3.6+
- mysql 5.7+
Download and Installation
- Download the source code:
git clone git@github.com:cwjokaka/bilibili_member_crawler.git
or download the ZIP file from https://github.com/cwjokaka/bilibili_member_crawler
- Install the dependencies:
pip install -r requirements.txt
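File Overview
- bilibili_member_crawler.py: crawler entry point
- distributor.py: task distributor; generates tasks and puts them on the task queue
- worker.py: worker thread; pulls tasks from the task queue and persists Bilibili user information to MySQL
- res_manager.py: resource manager; manages the task queue
- variable.py: configuration file containing proxy, database, and thread settings
- sql/bilibili.sql: database initialization script
- user-agents.txt: list of browser User-Agent strings
- exception/*: custom exception classes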
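The file layout above describes a producer/consumer design: distributor.py fills a task queue, and worker.py threads drain it and write user information to MySQL. Below is a minimal sketch of that pattern, assuming each task is a numeric user ID (`mid`) and using hypothetical names throughout. `fetch_member` is a placeholder for the real HTTP request, and an in-memory SQLite database with a made-up `member` table stands in for MySQL so the sketch runs without a server. It illustrates the pattern only and is not the project's actual code.

```python
import queue
import sqlite3
import threading


def fetch_member(mid):
    """Hypothetical stand-in for the HTTP request that fetches one user's profile."""
    return {"mid": mid, "name": f"user{mid}"}


def distributor(tasks, start_mid, end_mid, n_workers):
    """Producer (cf. distributor.py): enqueue one task per user ID, then one stop sentinel per worker."""
    for mid in range(start_mid, end_mid + 1):
        tasks.put(mid)  # blocks while the bounded queue is full
    for _ in range(n_workers):
        tasks.put(None)


def worker(tasks, conn, db_lock):
    """Consumer (cf. worker.py): pull user IDs, fetch each profile, persist it."""
    while True:
        mid = tasks.get()
        if mid is None:  # sentinel: no more work for this thread
            break
        member = fetch_member(mid)
        with db_lock:  # serialize writes on the shared connection
            conn.execute(
                "INSERT OR REPLACE INTO member (mid, name) VALUES (?, ?)",
                (member["mid"], member["name"]),
            )
            conn.commit()


def run(start_mid, end_mid, n_workers=4):
    # In-memory SQLite stands in for MySQL; the table schema is hypothetical.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE member (mid INTEGER PRIMARY KEY, name TEXT)")
    db_lock = threading.Lock()
    tasks = queue.Queue(maxsize=100)  # bounded task queue (cf. res_manager.py)
    threads = [
        threading.Thread(target=worker, args=(tasks, conn, db_lock))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()  # start consumers first so the bounded queue can drain
    distributor(tasks, start_mid, end_mid, n_workers)
    for t in threads:
        t.join()
    return conn


if __name__ == "__main__":
    conn = run(1, 20, n_workers=3)
    print(conn.execute("SELECT COUNT(*) FROM member").fetchone()[0])  # 20
```

Starting the workers before the distributor matters because the queue is bounded: the producer blocks when it is full and relies on running consumers to drain it. A production worker would also catch request errors, since an uncaught exception ends its thread and can leave the producer blocked.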
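Notes
- Please keep your crawl rate under control (it is determined by the thread count and crawl interval in variable.py).
- The proxies in PROXIES need to be replaced periodically (an external proxy pool was added on 2019/9/4).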
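The notes above tie crawl speed to the thread count and crawl interval in variable.py, and mention rotating PROXIES and the browser agents in user-agents.txt. Below is a sketch of how per-request settings could be assembled, assuming a plain list of proxy URLs rotated round-robin and a random User-Agent per request. `RequestConfig`, `crawl`, `load_user_agents`, and the injected `fetch` callable are hypothetical names, not the project's API. The returned dict matches the `headers`, `proxies`, and `timeout` keyword arguments of `requests.get`.

```python
import itertools
import random
import threading
import time


def load_user_agents(lines):
    """Parse user-agents.txt-style content: one User-Agent per line, blank lines skipped."""
    return [line.strip() for line in lines if line.strip()]


class RequestConfig:
    """Hands out per-request settings: a random User-Agent and the next proxy in rotation."""

    def __init__(self, user_agents, proxies):
        self.user_agents = list(user_agents)
        self._proxies = itertools.cycle(proxies)  # round-robin over the proxy pool
        self._lock = threading.Lock()  # guard the shared iterator across worker threads

    def next_kwargs(self):
        with self._lock:
            proxy = next(self._proxies)
        # Shape matches keyword arguments accepted by requests.get().
        return {
            "headers": {"User-Agent": random.choice(self.user_agents)},
            "proxies": {"http": proxy, "https": proxy},
            "timeout": 10,
        }


def crawl(mids, fetch, config, interval):
    """Fetch each user ID in turn, sleeping `interval` seconds after each request."""
    results = []
    for mid in mids:
        results.append(fetch(mid, **config.next_kwargs()))
        time.sleep(interval)  # per-thread pause that throttles the crawl rate
    return results
```

With requests installed, `fetch` could be something like `lambda mid, **kw: requests.get(url_for(mid), **kw)`, where `url_for` is a hypothetical helper that builds the profile URL. As a rough budget, N threads each sleeping `interval` seconds after every request issue at most about N / interval requests per second (fewer once response time is included); for example, 4 threads with a 0.5 s interval stay under roughly 8 requests per second.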
PS: I'll put together some statistics when I have time (off I go~)