
tzuhsial / Instagramcrawler

License: MIT
A non-API Python program to crawl public photos, posts, or followers

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Instagramcrawler

Instagram Profilecrawl
📝 quickly crawl the information (e.g. followers, tags etc...) of an instagram profile.
Stars: ✭ 816 (+133.81%)
Mutual labels:  crawler, instagram, selenium
Instagram Profilecrawl
💻 Quickly crawl the information (e.g. followers, tags, etc...) of an instagram profile. No login required!
Stars: ✭ 110 (-68.48%)
Mutual labels:  crawler, instagram, selenium
Instagram Bot
An Instagram bot developed using the Selenium Framework
Stars: ✭ 138 (-60.46%)
Mutual labels:  crawler, instagram, selenium
ig-automatic-story-viewer
Python Program To Send Instagram Story Views
Stars: ✭ 17 (-95.13%)
Mutual labels:  instagram, selenium
Pychromeless
Python Lambda Chrome Automation (naming pending)
Stars: ✭ 219 (-37.25%)
Mutual labels:  crawler, selenium
Awesome Java Crawler
This repository collects and organizes crawler-related resources, mainly for Java development.
Stars: ✭ 228 (-34.67%)
Mutual labels:  crawler, selenium
Instagram Crawler
Crawl instagram photos, posts and videos for download.
Stars: ✭ 178 (-49%)
Mutual labels:  crawler, instagram
Insta-Bot
Python bot using Selenium increasing Instagram Followers.
Stars: ✭ 62 (-82.23%)
Mutual labels:  instagram, selenium
instagram-profilecrawl
📝 quickly crawl the information (e.g. followers, tags etc...) of an instagram profile.
Stars: ✭ 964 (+176.22%)
Mutual labels:  instagram, selenium
InstagramLocationScraper
No description or website provided.
Stars: ✭ 13 (-96.28%)
Mutual labels:  instagram, selenium
Instagram-Comments-Scraper
Instagram comment scraper using python and selenium. Save the comments into excel.
Stars: ✭ 73 (-79.08%)
Mutual labels:  instagram, selenium
Media Scraper
Scrapes all photos and videos in a web page / Instagram / Twitter / Tumblr / Reddit / pixiv / TikTok
Stars: ✭ 206 (-40.97%)
Mutual labels:  crawler, instagram
Tianyancha
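A pip-installable Tianyancha crawler API that saves the business registration information of one or more specified companies to Excel/JSON in one step. A batteries-included scraper API for Tianyancha, the best Chinese business data and investigation platform.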
Stars: ✭ 206 (-40.97%)
Mutual labels:  crawler, selenium
InstaPy
📷 Instagram Bot - Tool for automated Instagram interactions
Stars: ✭ 14,719 (+4117.48%)
Mutual labels:  instagram, selenium
Zhihu fun
A Selenium-based keyword crawler for Zhihu
Stars: ✭ 185 (-46.99%)
Mutual labels:  crawler, selenium
InstaBot
Simple and friendly Bot for Instagram, using Selenium and Scrapy with Python.
Stars: ✭ 32 (-90.83%)
Mutual labels:  instagram, selenium
instagram-post-scheduler
Python Program To Schedule Your Instagram Posts
Stars: ✭ 30 (-91.4%)
Mutual labels:  instagram, selenium
lostark-wait-notifier
🐤️ Lost Ark wait notifier
Stars: ✭ 38 (-89.11%)
Mutual labels:  crawler, selenium
Instagram-Scraper-2021
Scrape Instagram content and stories anonymously, using a new technique based on the har file (No Token + No public API).
Stars: ✭ 57 (-83.67%)
Mutual labels:  instagram, selenium
Python3 Spider
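Python crawlers in practice: simulated login to major websites, including but not limited to slider verification, Pinduoduo, Meituan, Baidu, bilibili, Dianping, and Taobao. If you like it, please star ❤️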
Stars: ✭ 2,129 (+510.03%)
Mutual labels:  crawler, selenium

05/03/2019 Repo is now archived.

I am now officially archiving this repo after a long time of, well, not maintaining it.


InstagramCrawler

A non-API Python program to crawl public photos, posts, followers, and following

Log in to crawl followers/following

To crawl followers or following, you need to log in with your credentials, either by filling in 'auth.json' or by typing them in (as you would when simply browsing Instagram).

To use the file, copy 'auth.json.example' to 'auth.json' and fill in your username and password.
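
As a rough illustration of how a script might read such a file, here is a minimal sketch. The key names "username" and "password" and the helper name load_credentials are assumptions made for this example; check 'auth.json.example' in the repository for the real layout.

```python
import json
from pathlib import Path


def load_credentials(path="auth.json"):
    """Read a username/password pair from a JSON file.

    The key names "username" and "password" are assumptions; the real
    names are whatever 'auth.json.example' uses.
    """
    with Path(path).open(encoding="utf-8") as fh:
        data = json.load(fh)
    # Reject a file that still has empty or missing fields.
    missing = [key for key in ("username", "password") if not data.get(key)]
    if missing:
        raise ValueError(f"{path} is missing: {', '.join(missing)}")
    return data["username"], data["password"]
```

Failing early on an unfilled template avoids starting a browser session that can never log in.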

PhantomJS for headless browsing

To run with a headless browser, install phantomjs and then add '-l' to the arguments.

Examples:

Download the first 100 photos and captions (the user's posts, if any) from the username "instagram"

NOTE: When I ran this on the public account 'instagram', it stopped at caption 29 for some reason.
$ python instagramcrawler.py -q 'instagram' -c -n 100

Search for the hashtag "#breakfast" and download the first 50 photos

$ python instagramcrawler.py -q '#breakfast' -n 50
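
The query convention used in these examples (a plain string is a username, a leading '#' marks a hashtag) can be sketched as a small URL helper. This is an illustration only: the function name query_to_url and the URL layout are assumptions based on Instagram's public profile and tag pages, not code taken from this repository.

```python
from urllib.parse import quote

BASE_URL = "https://www.instagram.com"  # assumed base URL


def query_to_url(query):
    """Map a -q style query to a page URL (hypothetical helper)."""
    query = query.strip()
    if query.startswith("#"):
        tag = query[1:]
        if not tag:
            raise ValueError("hashtag query is empty")
        # Hashtag search, e.g. '#breakfast'
        return f"{BASE_URL}/explore/tags/{quote(tag)}/"
    if not query:
        raise ValueError("query is empty")
    # Plain username, e.g. 'instagram'
    return f"{BASE_URL}/{quote(query)}/"
```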

Record the first 30 followers of the username "instagram" (requires login)

$ python instagramcrawler.py -q 'instagram' -t 'followers' -n 30 -a auth.json

Full usage:

usage: instagramcrawler.py [-h] [-d DIR] [-q QUERY] [-t CRAWL_TYPE] [-n NUMBER] [-c] [-a AUTHENTICATION]
  • [-d DIR]: the directory in which to save crawling results; the default is './data/[query]'
  • [-q QUERY]: a username, or a hashtag when prefixed with '#', e.g. 'username', '#hashtag'
  • [-t CRAWL_TYPE]: what to crawl; options: 'photos | followers | following'
  • [-n NUMBER]: the number of posts, followers, or following to crawl
  • [-c]: add this flag to download captions (what users wrote to describe their photos)
  • [-a AUTHENTICATION]: path to a JSON file containing your Instagram credentials; see 'auth.json'
  • [-l HEADLESS]: if set, uses the PhantomJS driver to run the script headless
  • [-f FIREFOX_PATH]: path to the Firefox binary (not the script) on your system (see this issue in Selenium: https://github.com/SeleniumHQ/selenium/issues/3884#issuecomment-296988595)
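
For readers who want to see how the options above fit together, here is a minimal argparse sketch of an equivalent command line. It is reconstructed from the usage text only: the dest names, the 'photos' default for -t, and the helper resolve_dir are assumptions, and the real script may define its arguments differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (a sketch, not the original)."""
    parser = argparse.ArgumentParser(prog="instagramcrawler.py")
    parser.add_argument("-d", dest="dir", default=None,
                        help="directory to save crawling results")
    parser.add_argument("-q", dest="query", default=None,
                        help="username, or '#hashtag'")
    # Default of 'photos' is an assumption: the examples omit -t for photos.
    parser.add_argument("-t", dest="crawl_type", default="photos",
                        choices=["photos", "followers", "following"])
    parser.add_argument("-n", dest="number", type=int, default=None,
                        help="number of items to crawl")
    parser.add_argument("-c", dest="caption", action="store_true",
                        help="also download captions")
    parser.add_argument("-a", dest="authentication", default=None,
                        help="path to the credentials JSON file")
    parser.add_argument("-l", dest="headless", action="store_true",
                        help="run headless with PhantomJS")
    parser.add_argument("-f", dest="firefox_path", default=None,
                        help="path to the Firefox binary")
    return parser


def resolve_dir(args):
    """Apply the documented './data/[query]' default when -d is not given."""
    return args.dir or f"./data/{args.query}"
```

Parsing the followers example, `build_parser().parse_args(["-q", "instagram", "-t", "followers", "-n", "30", "-a", "auth.json"])`, yields crawl_type 'followers' and number 30, with the output directory falling back to './data/instagram'.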

Installation

There are two required packages: selenium and requests

NOTE: I used selenium 3.4 and geckodriver 0.16 (which fixed a bug in previous versions)
$ pip install -r requirements.txt
Optional: install geckodriver and phantomjs if they are not present on your system
$ bash utils/get_gecko.sh
$ bash utils/get_phantomjs.sh
$ source utils/set_path.sh
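
To check whether the optional drivers are already reachable before running the helper scripts, a short sketch using only the standard library could look like this. The function name missing_tools is hypothetical, and the check assumes the executables are named 'geckodriver' and 'phantomjs' on your PATH.

```python
import shutil


def missing_tools(names=("geckodriver", "phantomjs")):
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("geckodriver and phantomjs are both available.")
```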