
aahouzi / Instagram-Scraper-2021

Licence: MIT License
Scrape Instagram content and stories anonymously, using a new technique based on the har file (No Token + No public API).

Programming languages: Jupyter Notebook, Python

Projects that are alternatives of or similar to Instagram-Scraper-2021

Instagram Proxy Api
CORS compliant API to access Instagram's public data
Stars: ✭ 245 (+329.82%)
Mutual labels:  instagram, data, scraper, instagram-scraper, instagram-api
instagram
Php instagram library. With this library, you can use many of the same features in the mobile application.
Stars: ✭ 45 (-21.05%)
Mutual labels:  instagram-feed, instagram-scraper, instagram-api, instagram-stories, instagram-bot
igFame
📷 igFame - Tool for automated Instagram interactions [PHP]
Stars: ✭ 16 (-71.93%)
Mutual labels:  instagram, instagram-scraper, instagram-api, instagram-bot
Instagram-Auto-Pilot
Automate common Instagram activities such as following, unfollowing, commenting and reposting images from instagram accounts.
Stars: ✭ 50 (-12.28%)
Mutual labels:  instagram-feed, instagram-scraper, instagram-api, instagram-bot
Instagram-to-discord
Monitor instagram user account and automatically post new images to discord channel via a webhook. Working 2022!
Stars: ✭ 113 (+98.25%)
Mutual labels:  instagram, scraper, instagram-scraper, instagram-bot
insta-story
🤖 📷 Instagram Story Downloader Anonymously - PHP
Stars: ✭ 25 (-56.14%)
Mutual labels:  instagram-feed, instagram-scraper, instagram-api, instagram-bot
InstagramLocationScraper
No description or website provided.
Stars: ✭ 13 (-77.19%)
Mutual labels:  instagram, scraper, selenium, instagram-scraper
Instaloader
Download pictures (or videos) along with their captions and other metadata from Instagram.
Stars: ✭ 3,655 (+6312.28%)
Mutual labels:  instagram, instagram-feed, instagram-scraper, instagram-stories
Instagram-Giveaways-Winner
Instagram Bot which when given a post url will spam mentions to increase the chances of winning. Win Instagram Giveaways!
Stars: ✭ 95 (+66.67%)
Mutual labels:  instagram, selenium, instagram-scraper, instagram-bot
Socialmanagertools Igbot
🤖 📷 Instagram Bot made with love and nodejs
Stars: ✭ 699 (+1126.32%)
Mutual labels:  instagram, selenium, instagram-scraper, instagram-api
Instagram Scraper
Scrapes an instagram user's photos and videos
Stars: ✭ 5,664 (+9836.84%)
Mutual labels:  instagram, scraper, instagram-scraper, instagram-api
Instagram-Comments-Scraper
Instagram comment scraper using python and selenium. Save the comments into excel.
Stars: ✭ 73 (+28.07%)
Mutual labels:  instagram, scraper, selenium, instagram-scraper
Scrapstagram
An Instagram Scrapper
Stars: ✭ 50 (-12.28%)
Mutual labels:  instagram, scraper, selenium, instagram-scraper
nanogram.js
📷 An easy-to-use and simple Instagram package that allows you to fetch media content without API and access token.
Stars: ✭ 62 (+8.77%)
Mutual labels:  instagram, instagram-feed, instagram-scraper, instagram-api
Instascrape
Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
Stars: ✭ 202 (+254.39%)
Mutual labels:  instagram, instagram-scraper, webscraping
FCommunity
multi Checkers (Hma/Hulu/Spotify/Call of duty/Instagram/smtp2go/VyprVpn) in One Tool Named FCommunity
Stars: ✭ 26 (-54.39%)
Mutual labels:  instagram, instagram-api, instagram-bot
Instagram Php Scraper
Get account information, photos, videos, stories and comments.
Stars: ✭ 2,490 (+4268.42%)
Mutual labels:  instagram, instagram-scraper, instagram-api
instagram-get-images
Instagram get images 🌄 (hashtags, account, locations) with puppeteer
Stars: ✭ 69 (+21.05%)
Mutual labels:  instagram, scraper, instagram-scraper
jekyll-instagram
A Jekyll plugin for displaying your recent Instagram photos
Stars: ✭ 24 (-57.89%)
Mutual labels:  instagram, instagram-feed, instagram-api
bot
Completely free and open-source human-like Instagram bot. Powered by UIAutomator2 and compatible with basically any Android device 5.0+ that can run Instagram - real or emulated.
Stars: ✭ 321 (+463.16%)
Mutual labels:  instagram, scraper, instagram-bot

Scrape Instagram content & stories | 2021 version.

Enseirb-Matmeca, Bordeaux INP | Anas AHOUZI


🧐 Description

  • This project lets the user scrape the full feed of a public Instagram page or hashtag, and scrape an account's stories anonymously, given the username or hashtag.

  • In 2021, Instagram made it even more difficult to scrape data from its GraphQL API. Although many open-source projects enable you to scrape content from Instagram, many of them no longer work, or work only partially and return just a small portion of the data you need.

  • In this project, I used a new technique based on the HAR (HTTP Archive) file. This file records all the GET requests sent by the Instagram page to its GraphQL API, along with their responses. By getting access to this file, we can capture all the JSON responses containing the data we want to scrape.
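
To make the idea concrete, here is a minimal sketch of pulling GraphQL JSON bodies out of a HAR file that has already been loaded as a Python dict. It is not the repository's code. The `log.entries[].request.url` and `response.content` fields follow the HAR 1.2 format, while the function name `graphql_responses`, the `"graphql/query"` URL fragment used as a filter, and the sample data are assumptions made for illustration.

```python
import base64
import json


def graphql_responses(har):
    """Return the decoded JSON bodies of the GraphQL responses in a HAR dict."""
    bodies = []
    for entry in har.get("log", {}).get("entries", []):
        url = entry.get("request", {}).get("url", "")
        # Assumption: GraphQL calls can be recognised by this URL fragment.
        if "graphql/query" not in url:
            continue
        content = entry.get("response", {}).get("content", {})
        text = content.get("text")
        if not text:
            continue  # the response body was not captured
        if content.get("encoding") == "base64":
            # HAR allows bodies to be stored base64-encoded.
            text = base64.b64decode(text).decode("utf-8")
        try:
            bodies.append(json.loads(text))
        except json.JSONDecodeError:
            continue  # skip bodies that are not valid JSON
    return bodies


# Tiny hand-written HAR: one GraphQL entry and one unrelated entry.
sample_har = {
    "log": {
        "entries": [
            {
                "request": {"url": "https://www.instagram.com/graphql/query/?query_hash=abc"},
                "response": {"content": {"mimeType": "application/json",
                                         "text": '{"data": {"ok": true}}'}},
            },
            {
                "request": {"url": "https://www.instagram.com/static/logo.png"},
                "response": {"content": {"mimeType": "image/png", "text": ""}},
            },
        ]
    }
}

responses = graphql_responses(sample_har)
print(len(responses))
```

In a real run, the HAR dict would come from the proxy rather than from a hand-written sample, and response bodies are only present if the proxy was configured to capture content.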

🚀 Repository Structure

The repository contains the following files & directories:

  • scraper/insta_feed_scraper.py: Scrapes the content/feed of a user's public page.
  • scraper/insta_story_scraper.py: Scrapes the stories of a user's public page.
  • scraper/insta_hashtag_scraper.py: Scrapes the content of a hashtag page.
  • data_analysis.ipynb: Contains some data analysis of the scraped nike page feed.

📜 Scraping process

  • Before executing the code, the user needs to get browsermob-proxy-2.1.4 from here and put it in the project directory. This proxy gives us access to the HAR file while Selenium is running.

  • Scraping stories is an easy task, since we don't need to analyze GraphQL responses or get the HAR file: we only open Instagram and retrieve every story by its XPath with Selenium.

  • To scrape content, the user is asked to enter the username or hashtag to scrape, and the program then opens the corresponding page directly. However, Instagram sometimes blocks direct access to public pages and asks the user to log in. In that case, the program logs in with an account that was created for scraping purposes. Once the page is accessible, Selenium executes JavaScript code that keeps scrolling down until all the content is loaded. After this step, we analyze the resulting HAR file to extract all the GraphQL responses in JSON format. Finally, we loop through every response to collect all the information we need. Here's a small demo of scraping the nike page feed:
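
As a rough illustration of two steps described above (scrolling until everything is loaded, then looping over the responses), the sketch below may help. It is not the repository's code. `scroll_until_loaded` only relies on the WebDriver method `execute_script`; the function names, the `pause` and `max_rounds` parameters, and in particular the JSON path `data.user.edge_owner_to_timeline_media.edges[].node` are assumptions about how the 2021 responses were laid out.

```python
import time


def scroll_until_loaded(driver, pause=2.0, max_rounds=500):
    """Scroll to the bottom repeatedly until the page height stops growing.

    `driver` is expected to be a Selenium WebDriver; only `execute_script`
    is used. Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # give the page time to request and render more posts
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so we assume the end was reached
        last_height = new_height
    return rounds


def iter_posts(responses):
    """Yield one flat dict per post found in decoded GraphQL responses.

    Assumption: posts live under data.user.edge_owner_to_timeline_media.
    Responses that do not match this layout are silently skipped.
    """
    for body in responses:
        user = (body.get("data") or {}).get("user") or {}
        media = user.get("edge_owner_to_timeline_media") or {}
        for edge in media.get("edges", []):
            node = edge.get("node", {})
            yield {
                "shortcode": node.get("shortcode"),
                "is_video": node.get("is_video"),
                "display_url": node.get("display_url"),
            }
```

A height that stops changing is only a heuristic for "all content is loaded": a slow network can stall the page for longer than `pause`, so a larger pause or a retry before giving up may be needed in practice.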

💡 Scraping comments

An improvement for this project would be to use the same HAR file technique to scrape all the comments of a publication, given its link. It can be implemented with the same strategy: open the publication (format: https://www.instagram.com/p/***********), scroll through the comments, and click the plus button each time to load more comments. Every click produces another GraphQL response, and each response carries 12 comments. However, scraping comments will take much longer than scraping content, since a publication can have thousands of comments and collecting them 12 per GraphQL response is time-consuming.
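
To get a feel for that cost, the following back-of-the-envelope sketch counts how many GraphQL responses a publication would require at 12 comments per response. The 12 comes from the paragraph above. The function names and the `seconds_per_click` delay are assumptions chosen only for illustration.

```python
import math

COMMENTS_PER_RESPONSE = 12  # figure stated in the text above


def responses_needed(total_comments, per_response=COMMENTS_PER_RESPONSE):
    """Number of GraphQL responses needed to cover all comments."""
    return math.ceil(total_comments / per_response)


def estimated_seconds(total_comments, seconds_per_click=2.0):
    """Rough scraping time; seconds_per_click is an assumed delay per load."""
    return responses_needed(total_comments) * seconds_per_click


print(responses_needed(5000))   # 417
print(estimated_seconds(5000))  # 834.0
```

Under these assumptions, a publication with 5000 comments needs 417 loads, or roughly 14 minutes at two seconds per click.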


📪 License & Contact

This code is free to use, share and modify for any non-commercial purpose. Any commercial use is strictly prohibited without the author's consent. This project is for educational purposes and is not intended to violate Instagram's data privacy policies. For any information, feedback or questions, please contact me.
