ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different websites. On the websites, ARGUS is able to perform tasks like scraping texts or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9

Stars: ✭ 68 (+300%)

Mutual labels: scraping

python-overwatch

A simple API for scraping Overwatch stats

Stars: ✭ 14 (-17.65%)

Mutual labels: scraping

TorScrapper

A Scraper made 100% in Python using BeautifulSoup and Tor. It can be used to scrape both normal and onion links. Happy Scraping :)

Stars: ✭ 24 (+41.18%)

Mutual labels: scraping

raspagem-de-dados-fatec

📓 Short course on web data scraping with Python, taught at FATEC Jundiaí's Technology Week

Stars: ✭ 22 (+29.41%)

Mutual labels: scraping

humanparser

Parse a human name string into salutation, first name, middle name, last name, suffix.

Stars: ✭ 78 (+358.82%)

Mutual labels: scraping

auto-Instagram-posting-bot

A bot that downloads 9gag and Instagram posts and re-uploads them to your Instagram account

Stars: ✭ 87 (+411.76%)

Mutual labels: instagram-scraper

webdext

Intelligent Web Data Extractor

Stars: ✭ 75 (+341.18%)

Mutual labels: scraping

bots-zoo

No description or website provided.

Stars: ✭ 59 (+247.06%)

Mutual labels: scraping

api-flight.com

Main API Flight Git Repository

Stars: ✭ 26 (+52.94%)

Mutual labels: scraping

Zeiver

A Scraper, Downloader, & Recorder for static open directories.

Stars: ✭ 14 (-17.65%)

Mutual labels: scraping

scraper

Nodejs web scraper. Contains a command line, docker container, terraform module and ansible roles for distributed cloud scraping. Supported databases: SQLite, MySQL, PostgreSQL. Supported headless clients: Puppeteer, Playwright, Cheerio, JSdom.

Stars: ✭ 37 (+117.65%)

Mutual labels: scraping

papercut

Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.

Stars: ✭ 15 (-11.76%)

Mutual labels: scraping

jazz

The Scripting Engine that Combines Speed, Safety, and Simplicity

Stars: ✭ 132 (+676.47%)

Mutual labels: scraping

InstantInsta

Android Application To Download and Manage Instagram Images And Videos

Stars: ✭ 47 (+176.47%)

Mutual labels: instagram-scraper

PyLex

Perform lexical analysis on words, one word at a time.

Stars: ✭ 60 (+252.94%)

Mutual labels: scraping

memes-api

API for scraping common meme sites

Stars: ✭ 17 (+0%)

Mutual labels: scraping

Instagram-Scraper-2021

Scrape Instagram content and stories anonymously, using a new technique based on the har file (No Token + No public API).

Stars: ✭ 57 (+235.29%)

Mutual labels: instagram-scraper


Instagram Explorer - Scraping and Learning

A simple package for building social media datasets from public Instagram posts, using web scraping techniques with BeautifulSoup.

Purpose

This repo provides a set of scraping functions that work with the Instagram "Explore" page as of March 2018.

Goals

The goal of this project is to provide a tool for analysts, programmers, data scientists and students who need to build datasets from social media posts, such as those on Instagram. The initial idea was to use the official Instagram API, but it currently does not offer an endpoint that retrieves public posts based on hashtags or locations. The intent is also to create tweaks that help with data augmentation of datasets.

Usage

As of v0.1.0-beta.2, this app accepts three types of arguments: a single hashtag, a list of hashtags read from a file, or a hashtag plus words similar to it.
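The three modes map onto the -w, -wn and -f flags described in the sections that follow. As a rough illustration only, the sketch below shows one way such flags could be parsed with argparse. The build_parser function and the dest names (word, similar, file) are assumptions made for this example; the actual read_tags.py may parse its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the three documented modes (not the project's own code)."""
    parser = argparse.ArgumentParser(prog="read_tags.py")
    # Exactly one mode is expected per run, so the flags are mutually exclusive.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("-w", dest="word", help="scrape a single hashtag")
    mode.add_argument("-wn", dest="similar", help="scrape a hashtag plus similar words")
    mode.add_argument("-f", dest="file", help="scrape every hashtag listed in a text file")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is repeatable.
args = build_parser().parse_args(["-w", "soccer"])
print(args.word)
```

Making the group required means running the script with no flag at all produces a usage error instead of silently doing nothing.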

Single Hashtag

A single hashtag can be passed as a command-line argument, such as:
python read_tags.py -w soccer
The app will explore only the hashtag given after -w (sys.argv[2], which is soccer here) and return only its results.

Hashtag and Similar Words

NLTK provides a WordNet interface, which is used to discover words similar to a given word. This feature is still being refined, so it is of limited use as of v0.1.0-beta.2, but improvements will come. It won't work with adjectives, for instance.

To use this function, pass the -wn argument on the command line, as shown below:
python read_tags.py -wn sunshine
The console will print all the words used to scrape Instagram data. For the sunshine example, the result will be:

words =  [['sunshine', 1.0], ['sunlight', 1.0], ['fair_weather', 0.1]]

All words within the list will be scraped individually. Currently the database does not distinguish between chosen words and generated words, but this will be implemented later.

The second item of each entry in the list is the similarity score calculated by NLTK's a.path_similarity(b) function. Please refer to the NLTK documentation for more information. This score will be stored in the database in the future.
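To make the scoring step more concrete, here is a minimal sketch of how a [word, score] list could be assembled from WordNet synsets. It is an assumption about the general approach, not the project's actual code: the similar_words function and its synsets_for parameter are hypothetical names, and the real package may pick synsets and lemmas differently. The NLTK calls used (wordnet.synsets, Synset.path_similarity, Synset.lemma_names) exist in NLTK and need the WordNet corpus installed via nltk.download("wordnet").

```python
from typing import Callable, List, Optional


def similar_words(word: str, synsets_for: Optional[Callable] = None) -> List[list]:
    """Return [[lemma, score], ...], scoring each synset against the first one found."""
    if synsets_for is None:
        # Imported lazily so the helper can be exercised without NLTK installed.
        from nltk.corpus import wordnet
        synsets_for = wordnet.synsets

    synsets = synsets_for(word)
    if not synsets:
        # No WordNet entry: fall back to the word itself.
        return [[word, 1.0]]

    base = synsets[0]
    best = {word: 1.0}  # the chosen word always scores 1.0
    for candidate in synsets:
        score = base.path_similarity(candidate)
        if score is None:  # path_similarity returns None when no path exists
            continue
        for name in candidate.lemma_names():
            # Keep the highest score seen for each lemma.
            if score > best.get(name, 0.0):
                best[name] = score

    # Highest similarity first; ties keep their discovery order.
    return [[name, s] for name, s in sorted(best.items(), key=lambda item: -item[1])]
```

The lookup function is injectable so the ranking logic can be checked with stand-in synset objects; calling similar_words("sunshine") with no second argument uses WordNet itself.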

Hashtag List

A list of hashtags can be passed to this package with a command-line argument.

Create a text file with a list of words, one word per line, as shown below:

soccer
brazil
neymar
worldcup
ronaldinho

Use the argument -f filename.txt to execute the code, like:
python read_tags.py -f my_words.txt
The code will read the file and print its contents as a Python list:

words = ['soccer','brazil','neymar','worldcup','ronaldinho']
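The file-reading step can be sketched as below. This is a minimal illustration assuming a UTF-8 text file with one word per line; read_hashtags is a hypothetical helper name and not necessarily how read_tags.py implements it. Skipping blank lines is also an assumption.

```python
from pathlib import Path
from typing import List


def read_hashtags(path: str) -> List[str]:
    """Read one hashtag per line, skipping blank lines and surrounding whitespace."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```

Called on the my_words.txt file shown above, this returns the same five words in file order.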


jpmondoni / instagram_explorer
