Socialreaper: Social media scraping / data collection library for Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs
Stars: ✭ 338 (+865.71%)
Reaper: Social media scraping / data collection tool for the Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs
Stars: ✭ 240 (+585.71%)
rubium: Rubium is a lightweight alternative to Selenium/Capybara/Watir if you need to perform operations (such as web scraping) using headless Chromium and Ruby
Stars: ✭ 65 (+85.71%)
ha-multiscrape: Home Assistant custom component for scraping multiple values (HTML, XML, or JSON) from a single HTTP request, with a separate sensor/attribute for each value. Supports (login) form submission.
Stars: ✭ 103 (+194.29%)
scrapman: Retrieves the real HTML of a URL (after JavaScript has executed); ultra fast, and supports loading multiple web pages in parallel
Stars: ✭ 21 (-40%)
ferenda: Transform unstructured document collections to structured Linked Data
Stars: ✭ 22 (-37.14%)
Instagram-to-discord: Monitor an Instagram user account and automatically post new images to a Discord channel via a webhook. Working 2022!
Stars: ✭ 113 (+222.86%)
chesf: CHeSF is the Chrome Headless Scraping Framework, very early alpha code for scraping JavaScript-intensive web pages
Stars: ✭ 18 (-48.57%)
zcrawl: An open-source web crawling platform
Stars: ✭ 21 (-40%)
puppeteer-botcheck: 🕵♂ Bot detection tests for Puppeteer. Hide and seek!
Stars: ✭ 42 (+20%)
ogpParser: Open Graph Protocol Parser for Node.js
Stars: ✭ 43 (+22.86%)
crawling-framework: Easily crawl news portals or blog sites using Storm Crawler.
Stars: ✭ 22 (-37.14%)
kuwala: Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data into data sc…
Stars: ✭ 474 (+1254.29%)
browser-pool: A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppeteer, Playwright, or SecretAgent.
Stars: ✭ 71 (+102.86%)
covid19br-pub: A project for monitoring official publications related to COVID-19 in Brazil.
Stars: ✭ 12 (-65.71%)
oversmash: Overwatch API library for player details and career stats
Stars: ✭ 42 (+20%)
ioweb: Web scraping framework
Stars: ✭ 31 (-11.43%)
internet-affordability: 🌍 Dataset that shows Internet affordability by country (a shocking reality!)
Stars: ✭ 13 (-62.86%)
Scrapping: Mastering the art of scraping 🎓
Stars: ✭ 24 (-31.43%)
scrapy-fieldstats: A Scrapy extension to log item field coverage when the spider shuts down
Stars: ✭ 17 (-51.43%)
proxycrawl-python: ProxyCrawl Python library for scraping and crawling
Stars: ✭ 51 (+45.71%)
document-dl: Command-line program to download documents from web portals.
Stars: ✭ 14 (-60%)
htmltab: Command-line utility to convert HTML tables into CSV files
Stars: ✭ 13 (-62.86%)
subscene scraper: Library to download subtitles from subscene.com
Stars: ✭ 14 (-60%)
browser-automation-api: Browser automation API for repetitive web-based tasks, with a friendly user interface. You can use it to scrape content or for many other tasks, such as capturing screenshots, generating PDFs, extracting content, or executing custom Puppeteer or Playwright functions.
Stars: ✭ 24 (-31.43%)
go-scrapy: Web crawling and scraping framework for Golang
Stars: ✭ 17 (-51.43%)
ksoup: Kotlin wrapper for Jsoup
Stars: ✭ 59 (+68.57%)
web-clipper: Easily download the main content of a web page in HTML, Markdown, and/or EPUB format from the command line.
Stars: ✭ 15 (-57.14%)
wget-lua: Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Stars: ✭ 52 (+48.57%)
yttrex: YouTube & TikTok analysis + youchoose recommendation customizer: backend, extensions, and tooling
Stars: ✭ 31 (-11.43%)
Captcha-Tools: All-in-one Python (and now Go!) module to help solve CAPTCHAs with the Capmonster, 2captcha, and Anticaptcha APIs!
Stars: ✭ 23 (-34.29%)
selectorlib: A library that reads a YAML file of XPath or CSS selectors and uses them to extract data from HTML pages
Stars: ✭ 53 (+51.43%)
sg-food-ml: This script scrapes images from the Internet to classify five common noodle ("mee") dishes in Singapore: Wanton Mee, Bak Chor Mee, Lor Mee, Prawn Mee, and Mee Siam.
Stars: ✭ 18 (-48.57%)
Scraper-Projects: 🕸 List of mini projects that involve web scraping 🕸
Stars: ✭ 25 (-28.57%)
docker-selenium-lambda: The simplest demo of Chrome automation with Python and Selenium in AWS Lambda
Stars: ✭ 172 (+391.43%)
torchestrator: Spin up Tor containers and then proxy HTTP requests via these Tor instances
Stars: ✭ 32 (-8.57%)
node-red-contrib-nbrowser: Provides a virtual web browser (a.k.a. "headless browser") appearing as a node.
Stars: ✭ 31 (-11.43%)
proxi: A proxy pool that finds and checks proxies, with a REST API for querying the results. Can find over 25k proxies in under 5 minutes.
Stars: ✭ 32 (-8.57%)
asyncio-hn: Python (asyncio) wrapper for the Hacker News API
Stars: ✭ 27 (-22.86%)
scavenger: Scrape and take screenshots of dynamic and static webpages
Stars: ✭ 14 (-60%)
shorter.recipes: A website dedicated to making recipes from any website easy to read.
Stars: ✭ 27 (-22.86%)
AngleParse: HTML parsing and processing tool for PowerShell.
Stars: ✭ 35 (+0%)
anime-scraper: [partially working] Scrape and add anime episode stream URLs to uGet (Linux) or IDM (Windows) ~ Python3
Stars: ✭ 21 (-40%)
diffbot-php-client: [Deprecated, maintenance mode; please use the APIs directly!] The official Diffbot client library
Stars: ✭ 53 (+51.43%)
RARBG-scraper: RARBG scraper with Selenium headless browsing and CAPTCHA solving
Stars: ✭ 38 (+8.57%)
4cat: The 4CAT Capture and Analysis Toolkit provides modular data capture & analysis for a variety of social media platforms.
Stars: ✭ 144 (+311.43%)
scrapy-distributed: A series of distributed components for Scrapy, including RabbitMQ-based, Kafka-based, and RedisBloom-based components.
Stars: ✭ 38 (+8.57%)
copycat: A PHP scraping class
Stars: ✭ 70 (+100%)
ScrapeBot: A Selenium-driven tool for automated website interaction and scraping.
Stars: ✭ 16 (-54.29%)
InstaBot: Simple and friendly bot for Instagram, using Selenium and Scrapy with Python.
Stars: ✭ 32 (-8.57%)
image-collector: Download images from Google Image Search
Stars: ✭ 38 (+8.57%)
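Several entries above, htmltab in particular, turn HTML tables into CSV. The following is a rough illustration of that idea, written as a minimal sketch using only Python's standard `html.parser` and `csv` modules. It is not htmltab's implementation, and the names `TableParser` and `table_to_csv` are hypothetical. The sketch assumes markup with explicit closing `</td>`, `</th>`, and `</tr>` tags, and it deliberately ignores nested tables, `colspan`, and `rowspan`.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Hypothetical helper: collect cell text from every <table> as lists of rows."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per <table>; each is a list of rows
        self._row = None   # cells of the row currently being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell to single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)


def table_to_csv(html, index=0):
    """Return the table at position `index` on the page as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.tables[index])
    return out.getvalue()
```

The event-driven `HTMLParser` keeps the sketch dependency-free. A production tool would more likely rely on a more tolerant parser to cope with implied end tags and spanned cells.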