
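A pip-installable Tianyancha scraper API that saves the business registration information of one or more specified companies to Excel/JSON in one step. A batteries-included scraper API for Tianyancha, the best Chinese business data and investigation platform.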

Stars: ✭ 206 (-40.97%)

Mutual labels: crawler, selenium

InstaPy

📷 Instagram Bot - Tool for automated Instagram interactions

Stars: ✭ 14,719 (+4117.48%)

Mutual labels: instagram, selenium

Zhihu fun

A Selenium-based keyword crawler for Zhihu

Stars: ✭ 185 (-46.99%)

Mutual labels: crawler, selenium

InstaBot

Simple and friendly Bot for Instagram, using Selenium and Scrapy with Python.

Stars: ✭ 32 (-90.83%)

Mutual labels: instagram, selenium

instagram-post-scheduler

Python Program To Schedule Your Instagram Posts

Stars: ✭ 30 (-91.4%)

Mutual labels: instagram, selenium

lostark-wait-notifier

🐤️ Lost Ark wait notifier

Stars: ✭ 38 (-89.11%)

Mutual labels: crawler, selenium

Instagram-Scraper-2021

Scrape Instagram content and stories anonymously, using a new technique based on the har file (No Token + No public API).

Stars: ✭ 57 (-83.67%)

Mutual labels: instagram, selenium

Python3 Spider

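Python crawlers in practice: simulated login to major websites, including but not limited to slider CAPTCHA verification, Pinduoduo, Meituan, Baidu, bilibili, Dianping, and Taobao. If you like it, please star ❤️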

Stars: ✭ 2,129 (+510.03%)

Mutual labels: crawler, selenium


05/03/2019 Repo is now archived.

I am now officially archiving this repo after a long time of, well, not maintaining it.

InstagramCrawler

A non-API Python program to crawl public photos, posts, followers, and following

Login to crawl followers/following

To crawl followers or following, you need to log in with your credentials, either by filling in 'auth.json' or by typing them in (as you would when simply browsing Instagram).

To use the file, copy 'auth.json.example' to 'auth.json' and fill in your username and password.
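
As a rough illustration of how such a credentials file could be read, here is a minimal sketch. It assumes 'auth.json' is a flat JSON object with "username" and "password" keys; the exact layout is defined by 'auth.json.example' in the repo, and the helper name load_credentials is hypothetical, not part of the crawler.

```python
import json
from pathlib import Path


def load_credentials(path="auth.json"):
    """Return (username, password) read from a JSON credentials file.

    Assumption: the file is a flat JSON object with "username" and
    "password" keys. Check 'auth.json.example' for the real layout.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Fail early with a readable message if a field is absent or empty.
    missing = [key for key in ("username", "password") if not data.get(key)]
    if missing:
        raise ValueError(f"{path} is missing: {', '.join(missing)}")
    return data["username"], data["password"]
```

Failing early on an empty field avoids starting a browser session that can only end at the login form.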

PhantomJS for headless browser

To run with a headless browser, install PhantomJS and then add '-l' to the arguments.

Examples:

Download the first 100 photos and captions (the user's posts, if any) from the username "instagram"

NOTE: When I ran this on the public account 'instagram', it somehow stopped at caption 29

$ python instagramcrawler.py -q 'instagram' -c -n 100

Search for the hashtag "#breakfast" and download the first 50 photos

$ python instagramcrawler.py -q '#breakfast' -n 50

Record the first 30 followers of the username "instagram" (requires login)

$ python instagramcrawler.py -q 'instagram' -t 'followers' -n 30 -a auth.json

Full usage:

usage: instagramcrawler.py [-h] [-d DIR] [-q QUERY] [-t CRAWL_TYPE] [-n NUMBER] [-c] [-a AUTHENTICATION] [-l] [-f FIREFOX_PATH]

[-d DIR]: the directory to save crawling results; default is './data/[query]'
[-q QUERY]: a username, or a hashtag when prefixed with '#', e.g. 'username', '#hashtag'
[-t CRAWL_TYPE]: what to crawl; options: 'photos | followers | following'
[-n NUMBER]: number of posts, followers, or following to crawl
[-c]: add this flag to download captions (what users wrote to describe their photos)
[-a AUTHENTICATION]: path to a JSON file containing your Instagram credentials; see 'auth.json'
[-l]: if set, uses the PhantomJS driver to run the script headless
[-f FIREFOX_PATH]: path to the Firefox binary (not the script) on your system (see this Selenium issue: https://github.com/SeleniumHQ/selenium/issues/3884#issuecomment-296988595)
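
For readers who want to see how these options fit together, the following is a sketch of an argparse parser that mirrors the list above. It is not the script's actual source: the dest names, the 'photos' default for -t (inferred from the examples, which download photos when -t is omitted), treating -l as a boolean flag, and the helpers is_hashtag and output_dir are all assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="instagramcrawler.py")
    parser.add_argument("-d", dest="dir", help="directory to save results")
    parser.add_argument("-q", dest="query", help="username or '#hashtag'")
    # Assumption: 'photos' is the default, as the examples suggest.
    parser.add_argument("-t", dest="crawl_type", default="photos",
                        choices=["photos", "followers", "following"])
    parser.add_argument("-n", dest="number", type=int,
                        help="number of items to crawl")
    parser.add_argument("-c", dest="captions", action="store_true",
                        help="also download captions")
    parser.add_argument("-a", dest="auth", help="path to credentials JSON")
    # Assumption: -l is a plain on/off flag ("if set").
    parser.add_argument("-l", dest="headless", action="store_true",
                        help="use PhantomJS to run headless")
    parser.add_argument("-f", dest="firefox_path",
                        help="path to the Firefox binary")
    return parser


def is_hashtag(query):
    """A query prefixed with '#' is a hashtag search; otherwise a username."""
    return query.startswith("#")


def output_dir(args):
    """Apply the documented default of './data/[query]' when -d is omitted."""
    return args.dir if args.dir else f"./data/{args.query}"


if __name__ == "__main__":
    args = build_parser().parse_args()
    if args.query is None:
        raise SystemExit("a query is required for this sketch (-q)")
    kind = "hashtag" if is_hashtag(args.query) else "username"
    print(f"crawl {args.crawl_type} for {kind} {args.query!r} "
          f"into {output_dir(args)}")
```

Restricting -t with choices makes a typo such as 'follower' fail at parse time rather than partway through a crawl.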

Installation

There are two package dependencies: selenium and requests

NOTE: I used selenium 3.4 and geckodriver 0.16, which fix a bug present in previous versions

$ pip install -r requirements.txt

Optional: install geckodriver and phantomjs if they are not already present on your system

bash utils/get_gecko.sh
bash utils/get_phantomjs.sh
source utils/set_path.sh
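
To confirm that the drivers are actually reachable after setting the path, a small check like the one below may help. It is a sketch using only the standard library; the function name find_drivers is hypothetical and the script is not part of the repo.

```python
import shutil


def find_drivers(names=("geckodriver", "phantomjs")):
    """Map each driver name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_drivers().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

Run it in the same shell session where 'utils/set_path.sh' was sourced, since a PATH change made with source only applies to that session.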


Stars: ✭ 349
