chenjiandongx / Bili Spider

Licence: MIT

📺 Bilibili site-wide video information spider

Programming Languages

python


Labels

spider bilibili

Projects that are alternatives of or similar to Bili Spider

Animesearcher

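Aggregates video and danmaku (bullet comment) resources from third-party sites to give freeloaders the best experience for watching anime and TV series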

Stars: ✭ 101 (-75.6%)

Mutual labels: bilibili, spider

Bilibili member crawler

Bilibili user crawler. Yay~ it's a crawler!

Stars: ✭ 115 (-72.22%)

Mutual labels: bilibili, spider

Bilibili Api

A module for calling Bilibili's API

Stars: ✭ 704 (+70.05%)

Mutual labels: bilibili, spider

Bilili

🍻 bilibili video (including bangumi) and danmaku downloader

Stars: ✭ 379 (-8.45%)

Mutual labels: bilibili, spider

Biliutil

Bilibili.com batch video download toolkit

Stars: ✭ 212 (-48.79%)

Mutual labels: bilibili, spider

Decryptlogin

APIs for logging in to some websites using requests.

Stars: ✭ 1,861 (+349.52%)

Mutual labels: bilibili, spider

Geetest

Slider CAPTCHA; hope it helps you ❤️

Stars: ✭ 114 (-72.46%)

Mutual labels: bilibili, spider

Bilibili manga download

A Bilibili manga download tool with a graphical interface

Stars: ✭ 52 (-87.44%)

Mutual labels: spider, bilibili

Videospider

Scrapes TV series, movie, anime, and actor information from Douban, bilibili, and other sites

Stars: ✭ 186 (-55.07%)

Mutual labels: bilibili, spider

Bilibili User Information Spider

A crawler for the information of Bilibili's 300 million users (mid, nickname, gender, following, followers, level)

Stars: ✭ 136 (-67.15%)

Mutual labels: bilibili, spider

bilibili-smallvideo

🕷️ Crawls the top 100 short videos on Bilibili

Stars: ✭ 133 (-67.87%)

Mutual labels: spider, bilibili

yutto

🧊 A cute and willful Bilibili video downloader (bilili V2)

Stars: ✭ 383 (-7.49%)

Mutual labels: spider, bilibili

Fictiondown

Stars: ✭ 362 (-12.56%)

Mutual labels: spider

Bilihelper Personal

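(Bilibili) Automatic melon-seed (瓜子) collection, live-stream helper, live-stream idling script, and main-site helper for Bilibili - PHP edition (Personal)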

Stars: ✭ 362 (-12.56%)

Mutual labels: bilibili

Searchdialog

Imitates bilibili's search box effect (implemented in three lines of code)

Stars: ✭ 361 (-12.8%)

Mutual labels: bilibili

Templatespider

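A site-ripping tool: pick any website you like, specify its URL, and it is automatically scraped and turned into a template. Any website you see can be put to your own use!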

Stars: ✭ 390 (-5.8%)

Mutual labels: spider

Biliroku

bilibili live broadcast (生放送) recording

Stars: ✭ 382 (-7.73%)

Mutual labels: bilibili

Freshonions Torscraper

Fresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion

Stars: ✭ 348 (-15.94%)

Mutual labels: spider

Xcrawler

A fast, concise, and powerful PHP crawler framework

Stars: ✭ 344 (-16.91%)

Mutual labels: spider

Bilibili Helper O

A helper tool for Bilibili (bilibili.com) that can replace the player, push notifications, and perform some shortcut operations

Stars: ✭ 3,717 (+797.83%)

Mutual labels: bilibili


Bilibili site-wide video information spider

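I'm sure everyone is familiar with Bilibili (B 站), and a quick search turns up plenty of Bilibili crawlers online. But as the old poem says, knowledge from books is ultimately shallow; to truly understand something, you must do it yourself. I code, therefore I am. In the end, a total of 13 million records were crawled.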

Development environment: Windows 10 + Python 3

Preparation

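First, open Bilibili and click into any video on the home page. As usual, open the developer tools. This time the goal is to get video information by calling the API that Bilibili provides rather than parsing web pages, because parsing pages is too slow and makes it easy to get your IP banned.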

Select the JS filter and press F5 to refresh the page.

Among the requests, the API address shows up.

Copy it and strip the unnecessary parts to get https://api.bilibili.com/x/web-interface/archive/stat?aid=15906633. Opening it in a browser returns the video's statistics as JSON.
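A minimal sketch of building the stat URL for a given aid and pulling the counters out of the JSON body. The sample body is hypothetical: every number in it is invented, and only the field names (and the assumption that they sit under a top-level `data` key) are taken from the crawler code. The helpers `stat_url` and `parse_stat` are illustrative names, not part of the original project.

```python
import json

# URL pattern of the stat API found in the developer tools.
STAT_API = "https://api.bilibili.com/x/web-interface/archive/stat?aid={}"

# Hypothetical response body: field names match what the crawler reads,
# but all of the numbers are made up for illustration.
SAMPLE_BODY = """
{"data": {"aid": 15906633, "view": 1000, "danmaku": 20,
          "reply": 30, "favorite": 40, "coin": 50, "share": 6}}
"""


def stat_url(aid):
    """Build the stat API URL for one video ID (aid)."""
    return STAT_API.format(aid)


def parse_stat(body):
    """Return (aid, view, danmaku, reply, favorite, coin, share) from a JSON body."""
    data = json.loads(body)["data"]
    return (data["aid"], data["view"], data["danmaku"], data["reply"],
            data["favorite"], data["coin"], data["share"])


print(stat_url(15906633))
print(parse_stat(SAMPLE_BODY))
```

Since aids are plain integers, walking a numeric range of aids through `stat_url` is enough to enumerate candidate videos without parsing any HTML.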

Writing the code

With that, the code can be written: use requests to fetch the data repeatedly, aid after aid, and use multithreading to make the spider more efficient.

Core code

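import threading
import time
import requests
headers = {"User-Agent": "Mozilla/5.0"}  # request headers (example value; adjust as needed)
result = []              # collected (number, aid, view, ...) tuples
total = 1                # running record number shared by all threads
lock = threading.Lock()  # protects result and total
def run(url):
    global total
    try:
        req = requests.get(url, headers=headers, timeout=6).json()
        time.sleep(0.6)     # delay so the IP is not banned for requesting too fast
        data = req['data']
        with lock:
            video = (
                total,
                data['aid'],        # video ID
                data['view'],       # view count
                data['danmaku'],    # danmaku (bullet comment) count
                data['reply'],      # comment count
                data['favorite'],   # favorite count
                data['coin'],       # coin count
                data['share'],      # share count
            )
            result.append(video)
            if total % 100 == 0:
                print(total)        # progress report every 100 records
            total += 1
    except Exception:
        pass    # skip aids that fail (network error, deleted video, missing fields)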
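The blanket `except` in the core code silently drops every failure, so a network error and a deleted video look the same. A minimal sketch of checking the payload explicitly instead, assuming (not confirmed by the source) that a failed lookup returns a body whose `data` is missing or not an object. `extract_row` and `FIELDS` are hypothetical names.

```python
# Counter fields read by the crawler, in output order.
FIELDS = ("aid", "view", "danmaku", "reply", "favorite", "coin", "share")


def extract_row(payload):
    """Return the stat tuple, or None when the response has no usable data."""
    data = payload.get("data") if isinstance(payload, dict) else None
    if not isinstance(data, dict):
        return None  # e.g. a deleted or non-existent aid (assumed response shape)
    if any(field not in data for field in FIELDS):
        return None  # incomplete record: skip it instead of raising in the worker
    return tuple(data[field] for field in FIELDS)


ok = {"data": {"aid": 1, "view": 10, "danmaku": 2, "reply": 3,
               "favorite": 4, "coin": 5, "share": 6}}
print(extract_row(ok))              # (1, 10, 2, 3, 4, 5, 6)
print(extract_row({"data": None}))  # None
print(extract_row({}))              # None
```

With this split, only genuine transport errors need a `try`/`except` around `requests.get`, and empty aids can be counted or logged separately.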

Crawling in bulk

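from concurrent import futures
# covers aids 0 to 9999 only; widen the range to crawl more videos
urls = ["http://api.bilibili.com/archive_stat/stat?aid={}".format(i)
        for i in range(10000)]
with futures.ThreadPoolExecutor(32) as executor:    # 32 worker threads
    executor.map(run, urls)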
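To see why `run` takes the lock before touching the shared counter, here is an offline sketch of the same thread-pool pattern. The HTTP call is replaced by a hypothetical `fake_fetch` that pretends every third aid does not exist, so the sketch runs without network access; `worker`, `rows`, and `counter` are illustrative names.

```python
import threading
from concurrent import futures

lock = threading.Lock()
rows = []
counter = 0


def fake_fetch(aid):
    """Stand-in for the HTTP call: pretend every third aid does not exist."""
    if aid % 3 == 0:
        return None
    return (aid, aid * 10)  # (aid, made-up view count)


def worker(aid):
    global counter
    row = fake_fetch(aid)
    if row is None:
        return
    with lock:  # number and append the row as one atomic step
        counter += 1
        rows.append((counter,) + row)


with futures.ThreadPoolExecutor(32) as executor:
    # list() drains the iterator so an exception raised in a worker resurfaces here
    list(executor.map(worker, range(1, 1001)))

print(counter, len(rows))  # 667 667
```

`executor.map` only re-raises a worker's exception when that result is read, so a bare `executor.map(run, urls)` that never reads its results also hides errors; draining it with `list()` surfaces them.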

After crawling, the data was stored in a MySQL database: more than 13 million records in total.
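The write-up does not show how the rows reach MySQL. A minimal sketch of a batch insert, with Python's built-in `sqlite3` swapped in for MySQL so it runs anywhere; a MySQL DB-API driver would use the same `executemany` idea with its own placeholder style. The `videos` table and its schema are hypothetical, mirroring the tuple built in `run`.

```python
import sqlite3

# Hypothetical schema: a running number plus the seven counters from run().
# The column is named "views" to steer clear of SQL's VIEW keyword.
SCHEMA = """
CREATE TABLE IF NOT EXISTS videos (
    num INTEGER, aid INTEGER PRIMARY KEY, views INTEGER, danmaku INTEGER,
    reply INTEGER, favorite INTEGER, coin INTEGER, share INTEGER
)
"""


def save_rows(conn, rows):
    """Insert many rows in one transaction; rows with an already stored aid are ignored."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO videos VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
save_rows(conn, [(1, 101, 500, 3, 4, 5, 6, 7), (2, 102, 800, 1, 2, 3, 4, 5)])
save_rows(conn, [(3, 101, 999, 0, 0, 0, 0, 0)])  # aid 101 again: skipped
print(conn.execute("SELECT COUNT(*), SUM(views) FROM videos").fetchone())  # (2, 1300)
```

Writing in batches keeps the number of round trips low when millions of rows are involved, and keying on `aid` makes a re-run over the same range harmless.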

The first 7.5 million records are available here: bili.zip


Stars: ✭ 414
