
jpmondoni / instagram_explorer

Licence: LGPL-3.0 License
📷 An app to scrape Instagram posts and analyze data.

Programming Languages

python
HTML
CSS
javascript

Projects that are alternatives to, or similar to, instagram_explorer

Instagram-to-discord
Monitor instagram user account and automatically post new images to discord channel via a webhook. Working 2022!
Stars: ✭ 113 (+564.71%)
Mutual labels:  scraping, instagram-scraper
Instagram Scraper
Scrape the Instagram frontend. Inspired by twitter-scraper by @kennethreitz.
Stars: ✭ 903 (+5211.76%)
Mutual labels:  scraping, instagram-scraper
whatsapp-tracking
Scraping the status of WhatsApp contacts
Stars: ✭ 49 (+188.24%)
Mutual labels:  scraping
ARGUS
ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different websites. On the websites, ARGUS is able to perform tasks like scraping texts or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9
Stars: ✭ 68 (+300%)
Mutual labels:  scraping
python-overwatch
A simple API for scraping Overwatch stats
Stars: ✭ 14 (-17.65%)
Mutual labels:  scraping
TorScrapper
A Scraper made 100% in Python using BeautifulSoup and Tor. It can be used to scrape both normal and onion links. Happy Scraping :)
Stars: ✭ 24 (+41.18%)
Mutual labels:  scraping
raspagem-de-dados-fatec
📓 Short course on web data scraping with Python, taught at the Technology Week of FATEC Jundiaí
Stars: ✭ 22 (+29.41%)
Mutual labels:  scraping
humanparser
Parse a human name string into salutation, first name, middle name, last name, suffix.
Stars: ✭ 78 (+358.82%)
Mutual labels:  scraping
auto-Instagram-posting-bot
A bot that downloads 9gag and Instagram posts and re-uploads them to your Instagram account
Stars: ✭ 87 (+411.76%)
Mutual labels:  instagram-scraper
webdext
Intelligent Web Data Extractor
Stars: ✭ 75 (+341.18%)
Mutual labels:  scraping
bots-zoo
No description or website provided.
Stars: ✭ 59 (+247.06%)
Mutual labels:  scraping
api-flight.com
Main API Flight Git Repository
Stars: ✭ 26 (+52.94%)
Mutual labels:  scraping
Zeiver
A Scraper, Downloader, & Recorder for static open directories.
Stars: ✭ 14 (-17.65%)
Mutual labels:  scraping
scraper
Nodejs web scraper. Contains a command line, docker container, terraform module and ansible roles for distributed cloud scraping. Supported databases: SQLite, MySQL, PostgreSQL. Supported headless clients: Puppeteer, Playwright, Cheerio, JSdom.
Stars: ✭ 37 (+117.65%)
Mutual labels:  scraping
papercut
Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.
Stars: ✭ 15 (-11.76%)
Mutual labels:  scraping
jazz
The Scripting Engine that Combines Speed, Safety, and Simplicity
Stars: ✭ 132 (+676.47%)
Mutual labels:  scraping
InstantInsta
Android Application To Download and Manage Instagram Images And Videos
Stars: ✭ 47 (+176.47%)
Mutual labels:  instagram-scraper
PyLex
Perform lexical analysis on words, one word at a time.
Stars: ✭ 60 (+252.94%)
Mutual labels:  scraping
memes-api
API for scraping common meme sites
Stars: ✭ 17 (+0%)
Mutual labels:  scraping
Instagram-Scraper-2021
Scrape Instagram content and stories anonymously, using a new technique based on the har file (No Token + No public API).
Stars: ✭ 57 (+235.29%)
Mutual labels:  instagram-scraper

Instagram Explorer - Scraping and Learning

A simple, basic package for building social media datasets from public Instagram posts, using web scraping techniques with BeautifulSoup.

Purpose

This repo provides a set of scraping functions that work with Instagram's "Explore" page as of March 2018.
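For illustration only, a minimal sketch of the extraction step under loudly stated assumptions: it assumes the 2018-era Explore tag URL pattern https://www.instagram.com/explore/tags/<tag>/ and that the page embedded its data as JSON after a `window._sharedData = ` marker inside a script tag. Both are assumptions about Instagram's markup at the time, not taken from this project, and Instagram has changed its pages since, so this is not expected to work against the live site. It uses the standard library `json` module instead of BeautifulSoup so it runs without extra packages; `explore_url` and `extract_shared_data` are hypothetical names.

```python
import json

# Assumed 2018-era URL pattern for a hashtag's Explore page (not taken from this project).
EXPLORE_TAG_URL = "https://www.instagram.com/explore/tags/{tag}/"

# Assumed marker that prefixed the JSON payload embedded in the page's <script> tag.
SHARED_DATA_MARKER = "window._sharedData = "


def explore_url(tag: str) -> str:
    """Build the (assumed) Explore URL for a hashtag, dropping a leading '#'."""
    return EXPLORE_TAG_URL.format(tag=tag.lstrip("#"))


def extract_shared_data(html: str) -> dict:
    """Return the JSON object that follows the window._sharedData marker."""
    start = html.find(SHARED_DATA_MARKER)
    if start == -1:
        raise ValueError("no window._sharedData payload found in the page")
    payload = html[start + len(SHARED_DATA_MARKER):]
    # raw_decode parses a single JSON value and ignores trailing text such as ';</script>'.
    data, _end = json.JSONDecoder().raw_decode(payload)
    return data


if __name__ == "__main__":
    sample = '<script>window._sharedData = {"entry_data": {"TagPage": []}};</script>'
    print(explore_url("#soccer"))
    print(extract_shared_data(sample))
```

In a real scraper the HTML would come from an HTTP request to the URL returned by explore_url; the network step is left out here.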

Goals

The goal of this project is to provide a tool for analysts, programmers, data scientists and students who need to build datasets from social media posts, such as Instagram posts. The initial idea was to use the official Instagram API, but it currently does not provide an endpoint that retrieves public posts by hashtag or location. The project also aims to provide tweaks that help with data augmentation of datasets.

Usage

As of v0.1.0-beta.2, the app accepts three types of arguments: a single hashtag, a list of hashtags read from a file, or a hashtag plus similar words.
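The three modes map naturally onto mutually exclusive command-line flags. The following is a hypothetical sketch using the standard argparse module, not the project's actual read_tags.py; only the flag names -w, -wn and -f come from this README, while build_parser, parse_cli and the mode labels are illustrative.

```python
import argparse
from typing import List, Optional, Tuple


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the three input modes described in this README."""
    parser = argparse.ArgumentParser(prog="read_tags.py")
    modes = parser.add_mutually_exclusive_group(required=True)
    modes.add_argument("-w", dest="word", metavar="HASHTAG",
                       help="scrape a single hashtag")
    modes.add_argument("-wn", dest="wordnet", metavar="HASHTAG",
                       help="scrape a hashtag plus WordNet-similar words")
    modes.add_argument("-f", dest="file", metavar="FILE",
                       help="scrape every hashtag listed in FILE, one per line")
    return parser


def parse_cli(argv: Optional[List[str]] = None) -> Tuple[str, str]:
    """Return (mode, value); argparse exits with an error if no mode is given."""
    args = build_parser().parse_args(argv)
    if args.word is not None:
        return "single", args.word
    if args.wordnet is not None:
        return "wordnet", args.wordnet
    return "file", args.file


if __name__ == "__main__":
    print(parse_cli())
```

argparse checks for an exact option string before anything else, so -wn is matched as its own flag rather than as -w with the value n.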

Single Hashtag

A single hashtag can be passed as a command-line argument, such as:

python read_tags.py -w soccer

The app will explore only the given word, soccer, and collect only its results.

Hashtag and Similar Words

NLTK provides a WordNet interface, which is used to discover words similar to a given word. This feature is still being refined and is of limited use as of v0.1.0-beta.2, but improvements are planned. For instance, it does not work with adjectives.

To use this feature, pass the -wn argument on the command line, as shown below:

python read_tags.py -wn sunshine

The console will print all the words used to scrape Instagram data. For the sunshine example, the result will be:

words =  [['sunshine', 1.0], ['sunlight', 1.0], ['fair_weather', 0.1]]

Each word in the list is scraped individually. Currently, the database does not distinguish between user-chosen words and generated words, so there is no way to tell where a result came from; this will be implemented later.
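One possible way to add that distinction, sketched as a hypothetical illustration rather than the project's planned design: wrap each [word, score] pair in a small record that flags whether the word is the user's seed hashtag. The SearchTerm class and label_terms helper are invented names.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class SearchTerm:
    word: str
    similarity: float  # path_similarity score reported for this word
    is_seed: bool      # True for the word the user typed, False for WordNet-generated words


def label_terms(seed: str, words: Sequence[Sequence]) -> List[SearchTerm]:
    """Convert [[word, score], ...] pairs into SearchTerm records."""
    return [SearchTerm(str(word), float(score), word == seed) for word, score in words]


if __name__ == "__main__":
    words = [['sunshine', 1.0], ['sunlight', 1.0], ['fair_weather', 0.1]]
    for term in label_terms("sunshine", words):
        # Each term would be scraped individually; here it is only printed.
        print(term)
```

Storing is_seed and similarity next to each scraped post would let later analysis filter out generated words.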

The second element of each pair in the list is the similarity score calculated by NLTK's a.path_similarity(b) method. Please refer to the NLTK documentation for more information. This score will be stored in the database in the future.
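NLTK documents path_similarity as a score based on the shortest path connecting two senses in the is-a (hypernym/hyponym) taxonomy, computed as 1 / (shortest path length + 1), so identical senses score 1.0. The sketch below reimplements that formula on a tiny, made-up taxonomy with a breadth-first search, so it runs without NLTK or the WordNet corpus; the HYPERNYMS graph and the function names are hypothetical and do not reproduce real WordNet data or scores.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical toy taxonomy (child -> parent); NOT real WordNet data.
HYPERNYMS: Dict[str, str] = {
    "sunlight": "light",
    "light": "energy",
    "heat": "energy",
    "energy": "entity",
}


def neighbours(node: str) -> List[str]:
    """Nodes one is-a edge away: any children plus the parent."""
    result = [child for child, parent in HYPERNYMS.items() if parent == node]
    if node in HYPERNYMS:
        result.append(HYPERNYMS[node])
    return result


def shortest_path_length(a: str, b: str) -> Optional[int]:
    """Breadth-first search over is-a edges; None if b is unreachable from a."""
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in neighbours(node):
            if nxt == b:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def path_similarity(a: str, b: str) -> Optional[float]:
    """1 / (shortest path length + 1), mirroring the formula NLTK documents."""
    dist = shortest_path_length(a, b)
    return None if dist is None else 1.0 / (dist + 1)


if __name__ == "__main__":
    print(path_similarity("sunlight", "sunlight"))  # 1.0
    print(path_similarity("sunlight", "heat"))      # 0.25 (sunlight -> light -> energy -> heat)
```

A score of 0.1, as for fair_weather above, corresponds to a shortest path of 9 edges under this formula.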

Hashtag List

A list of hashtags can be passed to the package through a command-line argument.

  1. Create a text file with a list of words, one word per line, as shown below:
     soccer
     brazil
     neymar
     worldcup
     ronaldinho
  2. Use the argument -f filename.txt to execute the code, like: python read_tags.py -f my_words.txt
  3. The code will read the file and print its content in Python list format, as:
     words = ['soccer','brazil','neymar','worldcup','ronaldinho']
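A minimal sketch of the file-reading step, assuming a UTF-8 text file with one hashtag per line. Skipping blank lines and dropping a leading # are assumptions, not documented behaviour of this project, and parse_hashtag_lines and read_hashtags are hypothetical names.

```python
from typing import Iterable, List


def parse_hashtag_lines(lines: Iterable[str]) -> List[str]:
    """Strip whitespace and a leading '#', skipping blank lines."""
    words = []
    for line in lines:
        word = line.strip().lstrip("#")
        if word:
            words.append(word)
    return words


def read_hashtags(path: str) -> List[str]:
    """Read one hashtag per line from a UTF-8 text file."""
    with open(path, encoding="utf-8") as handle:
        return parse_hashtag_lines(handle)


if __name__ == "__main__":
    import sys

    print("words =", read_hashtags(sys.argv[1]))
```

Splitting the parsing from the file access keeps the cleanup logic testable without touching the filesystem.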