elvisyjlin / Media Scraper

Licence: MIT
Scrapes all photos and videos in a web page / Instagram / Twitter / Tumblr / Reddit / pixiv / TikTok

Programming Languages

python
Projects that are alternatives of or similar to Media Scraper

Skraper
Kotlin/Java library and cli tool for scraping posts and media from various sources with neither authorization nor full page rendering (Facebook, Instagram, Twitter, Youtube, Tiktok, Telegram, Twitch, Reddit, 9GAG, Pinterest, Flickr, Tumblr, IFunny, VK, Pikabu)
Stars: ✭ 72 (-65.05%)
Mutual labels:  tumblr, scraper, twitter, instagram, reddit
Spam Bot 3000
Social media research and promotion, semi-autonomous CLI bot
Stars: ✭ 79 (-61.65%)
Mutual labels:  scraper, twitter, instagram, reddit
Ripme
Downloads albums in bulk
Stars: ✭ 2,748 (+1233.98%)
Mutual labels:  tumblr, twitter, instagram, reddit
Liked-Saved-Image-Downloader
Save content you enjoy!
Stars: ✭ 80 (-61.17%)
Mutual labels:  reddit, tumblr, pixiv
Onegram
This repository is no longer maintained.
Stars: ✭ 137 (-33.5%)
Mutual labels:  crawler, scraper, instagram
alternative-front-ends
Overview of alternative open source front-ends for popular internet platforms (e.g. YouTube, Twitter, etc.)
Stars: ✭ 1,664 (+707.77%)
Mutual labels:  instagram, twitter, reddit
Socialmanagertools Gui
🤖 👻 Desktop application for Instagram Bot, Twitter Bot and Facebook Bot
Stars: ✭ 293 (+42.23%)
Mutual labels:  scraper, twitter, instagram
Socialreaper
Social media scraping / data collection library for Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs
Stars: ✭ 338 (+64.08%)
Mutual labels:  tumblr, twitter, reddit
Rsshub
🍰 Everything is RSSible
Stars: ✭ 18,111 (+8691.75%)
Mutual labels:  pixiv, twitter, instagram
Clone Wars
100+ open-source clones of popular sites like Airbnb, Amazon, Instagram, Netflix, Tiktok, Spotify, Whatsapp, Youtube etc. See source code, demo links, tech stack, github stars.
Stars: ✭ 12,604 (+6018.45%)
Mutual labels:  twitter, instagram, reddit
Network Avatar Picker
A npm module that returns user's social network avatar. Supported providers: facebook, instagram, twitter, tumblr, vimeo, github, youtube and gmail
Stars: ✭ 74 (-64.08%)
Mutual labels:  tumblr, twitter, instagram
Postwill
Posting to the most popular social media from Ruby
Stars: ✭ 181 (-12.14%)
Mutual labels:  tumblr, twitter, instagram
Reaper
Social media scraping / data collection tool for the Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs
Stars: ✭ 240 (+16.5%)
Mutual labels:  tumblr, twitter, reddit
Annie
👾 Fast and simple video download library and CLI tool written in Go
Stars: ✭ 16,369 (+7846.12%)
Mutual labels:  tumblr, crawler, scraper
Instagram Scraper
scrapes medias, likes, followers, tags and all metadata. Inspired by instagram-php-scraper,bot
Stars: ✭ 2,209 (+972.33%)
Mutual labels:  crawler, scraper, instagram
Social Scraper
A collection of scripts for crawling data from social networks and Vietnamese-language websites
Stars: ✭ 47 (-77.18%)
Mutual labels:  crawler, scraper, instagram
Socialcounters
jQuery/PHP - Collection of Social Media APIs that display number of your social media fans. Facebook Likes, Twitter Followers, Instagram Followers, YouTube Subscribers, etc..
Stars: ✭ 104 (-49.51%)
Mutual labels:  tumblr, twitter, instagram
Instagram Crawler
Crawl instagram photos, posts and videos for download.
Stars: ✭ 178 (-13.59%)
Mutual labels:  crawler, scraper, instagram
Instagram Bot
An Instagram bot developed using the Selenium Framework
Stars: ✭ 138 (-33.01%)
Mutual labels:  crawler, instagram
Google Play Scraper
Google play scraper for Python inspired by <facundoolano/google-play-scraper>
Stars: ✭ 143 (-30.58%)
Mutual labels:  crawler, scraper

Media Scraper

media-scraper scrapes all photos and videos in a web page. It supports general-purpose scraping as well as SNS-specific scraping.

media-scraper uses a web driver to simulate a user browsing web pages. With the web driver, sessions and cookies can be handled easily, but scraping is somewhat slow. I am also migrating another repository, which crawls media using only HTTP requests, into this one. See here.

General-purpose Scraping

The general media scraper scrapes and downloads all photos and videos in all links <a/>, images <img/> and videos <video/> of a web page.
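
As a rough illustration of the idea, the following sketch collects candidate media URLs from the <a>, <img> and <video> tags of an HTML string using only the Python standard library. It is not media-scraper's implementation, which drives a web driver. The names MediaLinkParser and collect_media_urls, the handling of <source> tags, and the list of file extensions are all assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical extension list; the real project may recognise a different set.
MEDIA_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.gif', '.mp4', '.webm')


class MediaLinkParser(HTMLParser):
    """Collects URLs from <a href>, <img src>, <video src> and <source src>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'a':
            candidate = attrs.get('href')
            # Keep a link only if it points directly at a media file.
            if candidate and not self._looks_like_media(candidate):
                candidate = None
        elif tag in ('img', 'video', 'source'):
            # <source> is included because it often carries a <video>'s URL.
            candidate = attrs.get('src')
        else:
            candidate = None
        if candidate:
            # Resolve relative URLs against the page URL and skip duplicates.
            url = urljoin(self.base_url, candidate)
            if url not in self.urls:
                self.urls.append(url)

    @staticmethod
    def _looks_like_media(url):
        return urlparse(url).path.lower().endswith(MEDIA_EXTENSIONS)


def collect_media_urls(html, base_url):
    """Returns the media URLs found in `html`, in document order."""
    parser = MediaLinkParser(base_url)
    parser.feed(html)
    return parser.urls
```

A static parser like this only sees what is in the HTML it is given. Pages that load media with JavaScript are the case where a web driver becomes necessary.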

SNS-specific

Currently there are an Instagram scraper and a Twitter scraper. Each crawls all posts of a given user and downloads the media in the way appropriate to that SNS.

Updates

Currently, media-scraper has been merged to contain two methods of scraping: by request and by browser.

Usage

python3 m-scraper.py rq instagram [USERNAME1 USERNAME2 ...] [-e] [-s SAVE_PATH] [-c CRED_FILE]
python3 m-scraper.py rq tumblr [SITE1 SITE2 ...] [-e] [-s SAVE_PATH] [-c CRED_FILE]
python3 m-scraper.py rq reddit [SUBREDDIT1 SUBREDDIT2 ...] [-e] [-s SAVE_PATH] [-c CRED_FILE]
python3 m-scraper.py rq pixiv [USERID1 USERID2 ...] [-e] [-s SAVE_PATH] [-c CRED_FILE]
python3 m-scraper.py rq tiktok [USERID1 USERID2 ...] [-e] [-s SAVE_PATH] [-c CRED_FILE]
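
If you prefer to launch these commands from a Python script, a minimal sketch might look like the following. It only assembles the arguments shown in the usage lines (-s and -c). The helper names build_command and run_scraper are hypothetical, and the -e flag is left out because its meaning is not described here.

```python
import subprocess
import sys

# Services listed in the usage lines.
SERVICES = ('instagram', 'tumblr', 'reddit', 'pixiv', 'tiktok')


def build_command(service, targets, save_path=None, cred_file=None):
    """Builds the argument list for one `m-scraper.py rq` call (hypothetical helper)."""
    if service not in SERVICES:
        raise ValueError('unsupported service: {}'.format(service))
    command = [sys.executable, 'm-scraper.py', 'rq', service] + list(targets)
    if save_path is not None:
        command += ['-s', save_path]
    if cred_file is not None:
        command += ['-c', cred_file]
    return command


def run_scraper(service, targets, **options):
    """Runs m-scraper.py from the repository root; raises if it exits with an error."""
    subprocess.run(build_command(service, targets, **options), check=True)
```

For example, run_scraper('reddit', ['SUBREDDIT1'], save_path='download') would run the reddit scraper with -s download, assuming the current directory is the repository root.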

To download with your own credentials, i.e. while logged in to your account, put your username and password in credentials.json and run m-scraper.py with -c credentials.json.

mv ./credentials.json.example ./credentials.json
vim ./credentials.json

Note that pixiv requires a user login to view all illustrations and manga. If you scrape pixiv without logging in, you'll get only some of them.

To scrape TikTok videos, you'll need to get the user id first. Go to the user page in your TikTok app, share it via link, paste the link in a browser, and you'll see the user id in the URL bar. For example, the user id of the TikTok account is 107955. Note that m-scraper fetches the video list via TikTok's shared content, which contains most of a user's videos but not all of them. I'll dig into the mobile app API in the future.

Installation

Clone the media-scraper git repository.

git clone https://github.com/elvisyjlin/media-scraper.git
cd media-scraper

Install Python 3 (at least 3.5) and get all dependencies.

pip3 install -r requirements.txt

Web Driver

media-scraper loads the content of a web page using a web driver (PhantomJS). The required web driver is downloaded automatically the first time it is used.

If you run into a permission error, for example,

selenium.common.exceptions.WebDriverException: Message: 'phantomjs' executable may have wrong permissions.

Please set the permissions of the web driver executable to 777 for convenience.

chmod 777 webdriver/phantomjsdriver_2.1.1_win32/phantomjs.exe
chmod 777 webdriver/phantomjsdriver_2.1.1_mac64/phantomjs
chmod 777 webdriver/phantomjsdriver_2.1.1_linux64/phantomjs
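
Mode 777 grants read, write and execute permission to every user. If that is broader than you want, the sketch below adds only the execute bits to the file's existing permissions, which should be enough when the error is caused by a missing execute bit. The function name make_executable is made up for this example.

```python
import os
import stat


def make_executable(path):
    """Adds execute permission for user, group and others; other bits are unchanged.

    Returns the resulting permission bits.
    """
    current = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, current | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return stat.S_IMODE(os.stat(path).st_mode)
```

On Linux this could be called as make_executable('webdriver/phantomjsdriver_2.1.1_linux64/phantomjs'), using the path from the chmod commands.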

To Scrape

python3 -m mediascraper.general [WEB PAGE 1] [WEB PAGE 2] ...

The media will be stored in the folder download/general.

python3 -m mediascraper.instagram [USER NAME 1] [USER NAME 2] ...

The media will be stored in the folder download/instagram.

python3 -m mediascraper.twitter [USER NAME 1] [USER NAME 2] ...

The media will be stored in the folder download/twitter.

For example, to scrape the media of the user Twitter:

python3 -m mediascraper.twitter Twitter

Login with Credentials

If you want to scrape a user's media with your account, just rename credentials.json.example to credentials.json and fill in your username and password.

To Import

It is easy to import media-scraper into your scripts and make use of it. Please refer to the example code for more details.

Media Scraper

python3 -m mediascraper.general [URL1 URL2 ...]
python3 -m mediascraper.instagram [USERNAME1 USERNAME2 ...]
python3 -m mediascraper.twitter [USERNAME1 USERNAME2 ...]

Parameters of Scraper

Parameter     | Description                        | Default Value
scroll_pause  | the pause interval when scrolling  | 0.5 (seconds)
mode          | 'silent', 'normal' or 'verbose'    | 'normal'
debug         | prints debugging messages if True  | False
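
To give a feel for what a parameter like scroll_pause controls, here is a common Selenium-style pattern for scrolling a page until it stops growing, pausing between scrolls so that lazily loaded media has time to appear. This is an illustrative sketch, not the code in media-scraper. The function name scroll_to_bottom and the max_rounds limit are assumptions, and driver is any object exposing Selenium's execute_script method.

```python
import time


def scroll_to_bottom(driver, scroll_pause=0.5, max_rounds=100):
    """Scrolls until the page height stops changing; returns the number of scrolls."""
    last_height = driver.execute_script('return document.body.scrollHeight')
    for rounds in range(1, max_rounds + 1):
        driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
        time.sleep(scroll_pause)  # give lazily loaded content time to appear
        new_height = driver.execute_script('return document.body.scrollHeight')
        if new_height == last_height:
            # Nothing new was loaded by the last scroll, so stop.
            return rounds
        last_height = new_height
    return max_rounds
```

A shorter pause finishes sooner but risks stopping before slow content has loaded; a longer one is safer on slow connections.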

Connect Methods of Scraper

Scraper           | Methods
MediaScraper      | connect(URL)
InstagramScraper  | username(USERNAME)
TwitterScraper    | username(USERNAME)

Note

Instagram

Instagram changed its API three times this year (2018), so the query API in media-scraper is out of date. Please see instagramer.py, which works well for downloading all media from Instagram. The instructions are here.

Twitter [Solved]

For some reason, Twitter uses blob URLs for videos, which media-scraper does not currently support. I'm still working on this problem.

06/07/2018 Update: media-scraper now supports downloading videos from Twitter!
