
cwjokaka / Bilibili_member_crawler

Licence: MIT
Bilibili user crawler. Yay, it's a crawler!

Programming Languages

python
139335 projects - #7 most used programming language
python3
1442 projects

Projects that are alternatives of or similar to Bilibili member crawler

Decryptlogin
APIs for logging in to some websites using requests.
Stars: ✭ 1,861 (+1518.26%)
Mutual labels:  bilibili, crawler, spider, requests
Bilili
🍻 bilibili video (including bangumi) and danmaku downloader
Stars: ✭ 379 (+229.57%)
Mutual labels:  bilibili, crawler, spider, requests
Scrapingoutsourcing
ScrapingOutsourcing focuses on sharing crawler code, aiming to add one new crawler per week.
Stars: ✭ 164 (+42.61%)
Mutual labels:  crawler, spider, requests
Docs
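Source code for 《数据采集从入门到放弃》 ("Data Collection: From Getting Started to Giving Up"). Contents: introduction to crawlers, the job market, and interview questions for crawler engineers; introduction to the HTTP protocol; using Requests; introduction to the XPath parser; MongoDB and MySQL; multithreaded crawlers; introduction to Scrapy; introduction to Scrapy-redis; deploying with Docker; managing a Docker cluster with Nomad; querying Docker logs with EFK.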
Stars: ✭ 118 (+2.61%)
Mutual labels:  crawler, mysql, requests
Examples Of Web Crawlers
Some very interesting Python crawler examples that are friendly to beginners, mainly crawling sites such as Taobao, Tmall, WeChat, Douban, and QQ.
Stars: ✭ 10,724 (+9225.22%)
Mutual labels:  multithreading, crawler, spider
Tieba-Birthday-Spider
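Baidu Tieba birthday crawler: collects the birthdays of members of a Tieba forum and automatically sends greetings on the corresponding dates.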
Stars: ✭ 28 (-75.65%)
Mutual labels:  spider, queue, requests
Price Monitor
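JD.com product price monitor: tracks the prices of user-specified products and sends email/WeChat alerts on price drops. Tech: Python crawler / IP proxy pool / JS API crawling / Selenium page crawling.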
Stars: ✭ 634 (+451.3%)
Mutual labels:  crawler, mysql, requests
Geziyor
Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.
Stars: ✭ 1,246 (+983.48%)
Mutual labels:  crawler, spider
Weibo Album Crawler
Multithreaded crawler for full-size images from Sina Weibo albums.
Stars: ✭ 83 (-27.83%)
Mutual labels:  crawler, requests
Geetest
Slider CAPTCHA; hope it helps you ❤️
Stars: ✭ 114 (-0.87%)
Mutual labels:  bilibili, spider
Douyinsdk
Douyin SDK for data collection; crawling is no longer just a dream.
Stars: ✭ 99 (-13.91%)
Mutual labels:  crawler, spider
Pkulaw spider
Crawls the PKULaw site http://www.pkulaw.cn/Case/
Stars: ✭ 113 (-1.74%)
Mutual labels:  crawler, spider
Puppeteer Walker
a puppeteer walker 🕷 🕸
Stars: ✭ 78 (-32.17%)
Mutual labels:  crawler, spider
Zhihuspider
Crawler for the public profile information of Zhihu users; can crawl follow relationships. Based on Python, with proxies and multithreading.
Stars: ✭ 92 (-20%)
Mutual labels:  spider, mysql
Crawler examples
Some classic web crawler projects.
Stars: ✭ 74 (-35.65%)
Mutual labels:  crawler, spider
Gopa Abandoned
GOPA, a spider written in Go.(NOTE: this project moved to https://github.com/infinitbyte/gopa )
Stars: ✭ 98 (-14.78%)
Mutual labels:  crawler, spider
Spider
python crawler spider
Stars: ✭ 70 (-39.13%)
Mutual labels:  crawler, spider
Thesaurusspider
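A Python crawler that downloads lexicon files for the Sogou, Baidu, and QQ input methods; can be used to build vocabulary lists for different industries.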
Stars: ✭ 98 (-14.78%)
Mutual labels:  multithreading, crawler
Ruia
Async Python 3.6+ web scraping micro-framework based on asyncio
Stars: ✭ 1,366 (+1087.83%)
Mutual labels:  crawler, spider
Crawler Detect
🕷 CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent
Stars: ✭ 1,549 (+1246.96%)
Mutual labels:  crawler, spider

bilibili_member_crawler

Bilibili user information crawler (stars appreciated (^o^)/~). For entertainment and learning purposes only.

Environment

  • python 3.6+
  • mysql 5.7+

Download and installation

  • Download the source code:
git clone git@github.com:cwjokaka/bilibili_member_crawler.git

Or download the zip file from https://github.com/cwjokaka/bilibili_member_crawler
  • Install the dependencies:
pip install -r requirements.txt

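File overview

  • bilibili_member_crawler.py: crawler entry point
  • distributor.py: task distributor; generates tasks and puts them on the task queue
  • worker.py: worker thread; pulls tasks from the task queue and persists Bilibili user information to MySQL
  • res_manager.py: resource manager; manages the task queue
  • variable.py: configuration file; contains the proxy, database, and thread settings, among others
  • sql/bilibili.sql: database initialization file
  • user-agents.txt: list of browser user-agent strings
  • exception/*: the various exception classes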
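
The division of labor between the distributor, the task queue, and the worker threads can be illustrated with a minimal producer/consumer sketch. This is not the project's actual code: the function names, the `SENTINEL` shutdown marker, and the `fetch_member` stub are hypothetical, and an in-memory list stands in for the MySQL persistence step.

```python
import queue
import threading

SENTINEL = None  # hypothetical marker that tells a worker to stop


def distribute(task_queue, start_mid, end_mid, worker_count):
    """Producer: put member IDs in [start_mid, end_mid) on the task queue."""
    for mid in range(start_mid, end_mid):
        task_queue.put(mid)
    # One sentinel per worker so that every worker thread exits.
    for _ in range(worker_count):
        task_queue.put(SENTINEL)


def fetch_member(mid):
    """Hypothetical stand-in for the HTTP request that fetches one user."""
    return {"mid": mid, "name": f"user_{mid}"}


def work(task_queue, results, lock):
    """Consumer: pull tasks until a sentinel arrives, then stop."""
    while True:
        mid = task_queue.get()
        if mid is SENTINEL:
            break
        info = fetch_member(mid)
        # Stand-in for persistence: the real worker writes to MySQL.
        with lock:
            results.append(info)


def run(start_mid, end_mid, worker_count=4):
    """Start the workers, feed them tasks, and wait for them to finish."""
    task_queue = queue.Queue(maxsize=100)  # bounded, so the producer cannot run far ahead
    results = []
    lock = threading.Lock()
    workers = [
        threading.Thread(target=work, args=(task_queue, results, lock))
        for _ in range(worker_count)
    ]
    for worker in workers:
        worker.start()
    distribute(task_queue, start_mid, end_mid, worker_count)
    for worker in workers:
        worker.join()
    return results
```

Calling `run(1, 51)` in this sketch processes member IDs 1 through 50 across four threads; the order of the collected records depends on thread scheduling.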

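Notes

  • Keep your crawl rate under control (it is determined by the thread count and the crawl interval in variable.py)
  • The PROXIES setting needs to be replaced periodically (an external proxy pool was added on 2019/9/4)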
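
As a rough illustration of the two notes above, the sketch below pauses between requests and rotates through a proxy list while picking a random user agent per request. The setting names other than `PROXIES`, the placeholder values, and the `fetch` callback signature are all assumptions made for this example; the real settings live in variable.py and may look different.

```python
import itertools
import random
import time

# Hypothetical settings; placeholder values only.
FETCH_INTERVAL_SECONDS = 1.0
PROXIES = ["http://127.0.0.1:8001", "http://127.0.0.1:8002"]
USER_AGENTS = ["Mozilla/5.0 (example agent A)", "Mozilla/5.0 (example agent B)"]


def request_settings(proxies, user_agents):
    """Yield (proxy, headers) pairs: proxies round-robin, user agent at random."""
    for proxy in itertools.cycle(proxies):
        yield proxy, {"User-Agent": random.choice(user_agents)}


def crawl(mids, fetch, interval=FETCH_INTERVAL_SECONDS,
          proxies=PROXIES, user_agents=USER_AGENTS):
    """Call fetch(mid, proxy, headers) for each ID, sleeping between calls."""
    settings = request_settings(proxies, user_agents)
    results = []
    for mid in mids:
        proxy, headers = next(settings)
        results.append(fetch(mid, proxy, headers))
        time.sleep(interval)  # throttle: a larger interval means a gentler crawl
    return results
```

In a multithreaded crawler the effective request rate is roughly the thread count divided by the interval, so both values need to be tuned together.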

PS: I'll put together some statistics when I have time (gotta run~
