facert / Tumblr_spider
Licence: MIT
A multithreaded Tumblr crawler in Python
Stars: ✭ 458
Programming Languages
python
139335 projects - #7 most used programming language
Projects that are alternatives of or similar to Tumblr spider
Webster
a reliable high-level web crawling & scraping framework for Node.js.
Stars: ✭ 364 (-20.52%)
Mutual labels: spider
Templatespider
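A website-ripping tool: pick a site you like, give it the URL, and it automatically pulls the site down and turns it into a template. Any website you see can be put to your own use!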
Stars: ✭ 390 (-14.85%)
Mutual labels: spider
Qqzonemood
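QQZone mood spider and analysis. A multithreaded crawler and data-mining tool for QQ Zone. It offers an online service: scan a QR code to log in and it crawls and analyzes your data automatically, with data presentation in the style of NetEase Cloud Music's annual report. The program is packaged with docker-compose for easy deployment, and a QQ Zone lottery mini program is also included.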
Stars: ✭ 439 (-4.15%)
Mutual labels: spider
Freshonions Torscraper
Fresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion
Stars: ✭ 348 (-24.02%)
Mutual labels: spider
Toplist
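Today's Hot List (今日热榜), an aggregator site that collects the top headlines from major popular websites. Written in Go, it fetches information quickly and asynchronously using multiple goroutines. Preview: https://mo.fish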
Stars: ✭ 4,331 (+845.63%)
Mutual labels: spider
Spiders
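Python crawlers that return information in a fixed format, support downloading, and expose a simple API via Flask. Covers Douyin (no watermark), Pipixia, Kuaishou, NetEase Cloud Music, QQ Music, Migu Music, Lizhi FM audio, Zhihu video, Zuiyou voice and video, Weibo......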
Stars: ✭ 372 (-18.78%)
Mutual labels: spider
Bilili
🍻 bilibili video (including bangumi) and danmaku downloader
Stars: ✭ 379 (-17.25%)
Mutual labels: spider
Kindlebookmaker
Kindle Book Maker with KindleGen, Make Book from RSS/single URL/directory and so on.
Stars: ✭ 364 (-20.52%)
Mutual labels: spider
Fictiondown
Novel download | novel crawling | Qidian | Biquge | export to Markdown | export to txt | convert to epub | ad filtering | automatic proofreading
Stars: ✭ 362 (-20.96%)
Mutual labels: spider
Bdp Dataplatform
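A data platform for big-data ecosystem solutions: a big-data solution built on big data, data platforms, microservices, machine learning, e-commerce, automated operations, DevOps, a container deployment platform, and data-platform collection, storage, computation, development, and application building.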
Stars: ✭ 456 (-0.44%)
Mutual labels: spider
Crawly
Crawly, a high-level web crawling & scraping framework for Elixir.
Stars: ✭ 440 (-3.93%)
Mutual labels: spider
tumblr_spider
A multithreaded Tumblr crawler in Python
install
pip install -r requirements.txt
run
python tumblr.py username (where username is the username of any popular blogger)
snapshot
Crawl results
user.txt holds the crawled blogger usernames; source.txt holds the collected video URLs.
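How it works
Given the username of a popular blogger, the script automatically collects the usernames of the other bloggers whose posts that blogger has reblogged, adds them to the crawl queue, and crawls them recursively.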
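The queue-driven expansion described above could look roughly like the sketch below. It is not the project's actual code: `crawl`, `fetch_reblog_sources`, and the `REBLOGS` table are hypothetical names invented for illustration, and an in-memory dictionary stands in for real Tumblr requests. The "recursion" is expressed as a shared work queue that several threads drain, with a set guarding against visiting the same username twice.

```python
import queue
import threading


def crawl(seed, fetch_reblog_sources, num_workers=4):
    """Collect every username reachable from `seed`.

    `fetch_reblog_sources(username)` is a hypothetical callable that returns
    the usernames of the blogs that `username` has reblogged from.
    """
    pending = queue.Queue()
    seen = {seed}
    seen_lock = threading.Lock()
    pending.put(seed)

    def worker():
        while True:
            username = pending.get()
            if username is None:  # sentinel: no more work, shut down
                pending.task_done()
                return
            try:
                for other in fetch_reblog_sources(username):
                    with seen_lock:
                        if other in seen:
                            continue
                        seen.add(other)
                    # Enqueue before task_done() so join() cannot return early.
                    pending.put(other)
            except Exception:
                pass  # a real crawler would log the failure and maybe retry
            finally:
                pending.task_done()

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(num_workers)]
    for t in threads:
        t.start()

    pending.join()  # returns once every queued username has been processed
    for _ in threads:
        pending.put(None)
    for t in threads:
        t.join()
    return seen


# Stand-in data (assumption): who reblogged from whom.
REBLOGS = {
    "alice": ["bob", "carol"],
    "bob": ["carol", "dave"],
    "carol": ["alice"],
    "dave": [],
}

found = crawl("alice", lambda name: REBLOGS.get(name, []))
print(sorted(found))  # ['alice', 'bob', 'carol', 'dave']
```

In a real run, `fetch_reblog_sources` would fetch and parse a blog's posts, and the usernames in `seen` would be what ends up in user.txt.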
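Disclaimer
This is a respectable crawler (serious face). What it collects depends heavily on the first username you give it. Also, for certain reasons Tumblr is blocked by the Great Firewall, so the simplest option is to run it on a VPS located abroad.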