
sparrow629 / Tumblr_crawler

License: GPL-3.0
This is a multi-threaded crawler for Tumblr.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to Tumblr_crawler

Tumblr Crawler
Easily download all the photos/videos from Tumblr blogs. (Download the images and videos from a specified Tumblr blog.)
Stars: ✭ 1,118 (+333.33%)
Mutual labels:  tumblr, crawler
TumblTwo
TumblTwo, an Improved Fork of TumblOne, a Tumblr Downloader.
Stars: ✭ 57 (-77.91%)
Mutual labels:  crawler, tumblr
Tumblthree
A Tumblr Backup Application
Stars: ✭ 211 (-18.22%)
Mutual labels:  tumblr, crawler
Tumblthree
A Tumblr Blog Backup Application
Stars: ✭ 923 (+257.75%)
Mutual labels:  tumblr, crawler
Tumblr crawler
A website for parsing Tumblr content
Stars: ✭ 83 (-67.83%)
Mutual labels:  tumblr, crawler
Media Scraper
Scrapes all photos and videos in a web page / Instagram / Twitter / Tumblr / Reddit / pixiv / TikTok
Stars: ✭ 206 (-20.16%)
Mutual labels:  tumblr, crawler
Annie
👾 Fast and simple video download library and CLI tool written in Go
Stars: ✭ 16,369 (+6244.57%)
Mutual labels:  tumblr, crawler
snapcrawl
Crawl a website and take screenshots
Stars: ✭ 37 (-85.66%)
Mutual labels:  crawler
eastmoney
python requests + Django+ nodejs koa+ mysql to crawl eastmoney fund and stock data,for data analysis and visualiaztion .
Stars: ✭ 56 (-78.29%)
Mutual labels:  crawler
bots-zoo
No description or website provided.
Stars: ✭ 59 (-77.13%)
Mutual labels:  crawler
WebCrawler
A lightweight, fast, multi-threaded, multi-pipeline, flexibly configurable web crawler.
Stars: ✭ 39 (-84.88%)
Mutual labels:  crawler
ZhengFang System Spider
🐛 A small crawler that logs into the ZhengFang educational administration system and scrapes data
Stars: ✭ 21 (-91.86%)
Mutual labels:  crawler
weibo-scraper
Simple Weibo Scraper
Stars: ✭ 50 (-80.62%)
Mutual labels:  crawler
octopus
Recursive and multi-threaded broken link checker
Stars: ✭ 19 (-92.64%)
Mutual labels:  crawler
lightnovel epub
🍭 EPUB generator for (light) novels; supported sites: 轻之国度 and 轻小说文库
Stars: ✭ 89 (-65.5%)
Mutual labels:  crawler
PY-Login
Simulates logins to all kinds of websites and drives their APIs to do all sorts of indescribable things
Stars: ✭ 26 (-89.92%)
Mutual labels:  crawler
JQScrollNumberLabel
JQScrollNumberLabel: a scrolling number label modeled on Tumblr's note counter; a control for displaying numbers that animates a scroll whenever its value changes, with configurable limits on animation and digit count, optional dynamic creation and instantiation, customizable font styles, and more.
Stars: ✭ 29 (-88.76%)
Mutual labels:  tumblr
dijnet-bot
All your bills in yet another place :)
Stars: ✭ 17 (-93.41%)
Mutual labels:  crawler
rankr
🇰🇷 Realtime integrated information analysis service
Stars: ✭ 21 (-91.86%)
Mutual labels:  crawler
MyCrawler
My collection of crawlers
Stars: ✭ 55 (-78.68%)
Mutual labels:  crawler

Tumblr_Crawler

This is a multi-threaded crawler for Tumblr. It can download an entire blog or any single post that you like.

There are two crawler modules: one for video and one for images (including GIFs). The main file is the Crawler.
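
A minimal sketch of what that video/image split might look like in one place (the function name and regular expressions below are illustrative assumptions, not the code shipped in the two modules):

```python
import re
import requests

def classify_and_extract(post_url):
    """Fetch a post page and pull out its media URLs, deciding between video and images.

    Illustrative sketch only; the real project splits this into separate video and
    image crawler modules that are driven by the main Crawler file.
    """
    html = requests.get(post_url, timeout=10).text
    video_urls = re.findall(r'<source[^>]+src="([^"]+)"', html)
    if video_urls:
        return "video", video_urls
    image_urls = re.findall(r'<img[^>]+src="([^"]+\.(?:jpg|jpeg|png|gif))"', html)
    return "image", image_urls
```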

Change Log

Update 2.0: download any post

This version of TumblrCrawler combines video and image (including GIF) handling in the same file. What's more, it can recognize whether a post's main content is video or photos. The current version only downloads a post page directly.
The whole-blog search function is still in progress. That search will be simple and will ignore the JavaScript: the idea is to use the archive page to collect all the post pages, then visit every page and download its content.

Update 3.0

This version adds the whole-blog crawling function, which means the crawler can download all the files of one blog, both images and videos, in a single run.
The crawler uses the threading.Thread module. Each Tumblr listing page of 10 posts is handled by its own thread, and the multi-threading speeds up the whole process considerably. It needs no cookies and can crawl any account. Of course, the more posts there are, the longer it takes to crawl them all.
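
A minimal sketch of that one-page-per-thread fan-out, assuming the default theme paginates at /page/N with about 10 posts per page (the blog URL and helper names are placeholders):

```python
import threading
import requests

def crawl_page(blog_url, page_number):
    """Fetch one listing page (about 10 posts on the default theme) in its own thread."""
    page_url = "{}/page/{}".format(blog_url.rstrip('/'), page_number)
    resp = requests.get(page_url, timeout=10)
    if resp.ok:
        # ...parse the post URLs out of resp.text and download each one...
        print("fetched {}: {} bytes".format(page_url, len(resp.text)))

def crawl_blog(blog_url, pages):
    """Spawn one thread per listing page, mirroring the one-page-per-thread design."""
    threads = [threading.Thread(target=crawl_page, args=(blog_url, n))
               for n in range(1, pages + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

if __name__ == "__main__":
    crawl_blog("http://staff.tumblr.com/", pages=5)
```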

Update 4.0

I found that some blogs install a personal theme, which means they use a different stylesheet from the default one. That makes crawling from the home page unreliable, so I switched to searching the default Archive page instead. Every archive page uses the same stylesheet, but each one lists 50 posts, which means a single thread has to process 50 post URL downloads. This definitely slows the whole process down a lot, but it has to be done this way.

The PersonalThemeSearch.py module determines whether a blog uses the default stylesheet or a personal one.
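
A rough sketch of the kind of check such a module could make; the marker string below is purely an assumption, not what PersonalThemeSearch.py actually looks for:

```python
import requests

# Assumed marker of the default stylesheet; the real PersonalThemeSearch.py may test
# for something entirely different.
DEFAULT_THEME_MARKER = 'id="posts"'

def uses_default_theme(blog_url):
    """Guess whether a blog still uses the default stylesheet (illustrative heuristic)."""
    html = requests.get(blog_url, timeout=10).text
    return DEFAULT_THEME_MARKER in html
```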

ArchiveSearch.py is the module that crawls all the post URLs from the archive pages; every archive page holds 50 post URLs, whereas the original main-page approach yields only 10 posts per page.
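
A sketch of what collecting those URLs might look like, under the assumption that post permalinks can be pulled out of the archive HTML with a regular expression (this is not the module's actual code):

```python
import re
import requests

def archive_post_urls(blog_url):
    """Collect post permalinks from a blog's /archive page.

    Illustrative only: this fetches a single archive page, while the real
    ArchiveSearch.py walks all archive pages (about 50 posts each).
    """
    archive_url = blog_url.rstrip('/') + '/archive'
    html = requests.get(archive_url, timeout=10).text
    # Post permalinks generally look like http(s)://<blog>/post/<id>/<slug>
    return sorted(set(re.findall(r'https?://[^"\']+/post/\d+[^"\']*', html)))
```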

This version only solves finding all the posts on blogs with any kind of stylesheet. A more universal function for crawling the content of posts on personal-theme blogs still needs to be designed.

What's more, this version fixes some exceptions on non-post pages and a small logic problem with input handling. There are some special cases of URL formats, like "https://.*?" or "http://wanimal1983.org/" (WTF? A redirection? It points to http://wanimal1983.tumblr.com).
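
One way to smooth over such special cases is to resolve the input URL before crawling; a small sketch (not the project's actual input handling, and with no error handling):

```python
import requests

def resolve_blog_url(raw_url):
    """Follow redirects so a custom domain (e.g. http://wanimal1983.org/) resolves to
    the underlying *.tumblr.com address when the blog redirects there."""
    if '://' not in raw_url:
        raw_url = 'http://' + raw_url
    resp = requests.get(raw_url, timeout=10, allow_redirects=True)
    return resp.url
```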

Update 5.0

This may be the final version. It fixes the problem of not being able to download content from blogs with special stylesheets, along with all the problems from the last version. It adds detection of whether the input is a homepage or a post page, which means the user can download either a whole blog or a specific post.

The main function works for lots of blogs, including ones with special URLs or themes. Of course, there may still be some unusual blog stylesheets that are incompatible. You are welcome to let me know if you find one. :)

Update 5.5 (stable version)

Fixes the URL decoding problem, so there should be no more 'URL not found' errors for posts that can be viewed in the browser.
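
A fix of this kind typically amounts to percent-decoding the URL before requesting it; a minimal standard-library example (the URL below is hypothetical):

```python
from urllib.parse import unquote

# Hypothetical percent-encoded post URL
encoded = "https://example.tumblr.com/post/123/%E5%9B%BE%E7%89%87"
print(unquote(encoded))   # https://example.tumblr.com/post/123/图片
```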

Update 6.0

Tumblr updated the format of video URLs, so versions before 6.0 may fail to download videos. I modified the regular expression accordingly.
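
The exact pattern depends on Tumblr's current markup; the idea is simply to adjust a regular expression such as the one below (both the sample HTML and the pattern are illustrative assumptions, not the repository's actual regex):

```python
import re

sample_html = '<source src="https://va.media.tumblr.com/tumblr_example_720.mp4" type="video/mp4">'
video_pattern = re.compile(r'src="(https?://[^"]+\.mp4[^"]*)"')
print(video_pattern.findall(sample_html))   # ['https://va.media.tumblr.com/tumblr_example_720.mp4']
```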

Environment

Developed under Python 3.5 with some basic packages, such as requests.
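
A quick way to sanity-check the interpreter and the requests dependency (other basic packages may also be needed):

```python
import sys
import requests  # one of the basic third-party packages mentioned above

assert sys.version_info >= (3, 5), "developed against Python 3.5"
print("requests", requests.__version__)
```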

Run

Run TumblrCrawler.py directly. The input can be a blog's URL, such as http://name.tumblr.com/, or the URL of any single post that you like.
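
A typical invocation might look like this (the exact prompt behavior and the blog name are placeholders):

```
$ python TumblrCrawler.py
http://name.tumblr.com/
```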

Finally, enjoy your exciting downloads! :)

You can support me by scanning the WeChat Wallet QR code.

(WeChat Wallet QR code image)
