Home Assistant custom component for scraping multiple values (HTML, XML, or JSON) from a single HTTP request, with a separate sensor/attribute for each value. Supports (login) form-submit functionality.

Stars: ✭ 103 (+692.31%)

Mutual labels: scraping

puppeteer-botcheck

🕵‍♂ Bot detection tests for Puppeteer. Hide and seek!

Stars: ✭ 42 (+223.08%)

Mutual labels: scraping

scrapers

Scrapers for building your own image databases

Stars: ✭ 46 (+253.85%)

Mutual labels: scraping

scavenger

Scrape and take screenshots of dynamic and static webpages

Stars: ✭ 14 (+7.69%)

Mutual labels: scraping

crawling-framework

Easily crawl news portals or blog sites using Storm Crawler.

Stars: ✭ 22 (+69.23%)

Mutual labels: scraping

wget-lua

Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.

Stars: ✭ 52 (+300%)

Mutual labels: scraping

covid19br-pub

Project for monitoring official publications related to COVID-19 in Brazil.

Stars: ✭ 12 (-7.69%)

Mutual labels: scraping

InstaBot

Simple and friendly Bot for Instagram, using Selenium and Scrapy with Python.

Stars: ✭ 32 (+146.15%)

Mutual labels: scraping

asyncio-hn

Python (asyncio) wrapper for the Hacker News API

Stars: ✭ 27 (+107.69%)

Mutual labels: scraping

namecoin-core

Namecoin full node + wallet based on the current Bitcoin Core codebase.

Stars: ✭ 425 (+3169.23%)

Mutual labels: human-rights

diffbot-php-client

[Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library

Stars: ✭ 53 (+307.69%)

Mutual labels: scraping

Instagram-to-discord

Monitor an Instagram user account and automatically post new images to a Discord channel via a webhook. Working 2022!

Stars: ✭ 113 (+769.23%)

Mutual labels: scraping

ScrapeBot

A Selenium-driven tool for automated website interaction and scraping.

Stars: ✭ 16 (+23.08%)

Mutual labels: scraping

fabric8-analytics-vscode-extension

Red Hat Dependency Analytics extension

Stars: ✭ 125 (+861.54%)

Mutual labels: insights

socials

👨‍👩‍👦 Social account detection and extraction in Python, e.g. for crawling/scraping.

Stars: ✭ 37 (+184.62%)

Mutual labels: scraping

zcrawl

An open source web crawling platform

Stars: ✭ 21 (+61.54%)

Mutual labels: scraping

html-table-extractor

Extract data from HTML tables

Stars: ✭ 74 (+469.23%)

Mutual labels: scraping

etf4u

📊 Python tool to scrape real-time information about ETFs from the web and mix them together by proportionally distributing their asset allocations

Stars: ✭ 29 (+123.08%)

Mutual labels: scraping

browser-pool

A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppeteer, Playwright, or SecretAgent.

Stars: ✭ 71 (+446.15%)

Mutual labels: scraping

yttrex

YouTube & TikTok analysis + YouChoose recommendation customizer. Backend, extensions, and tooling

Stars: ✭ 31 (+138.46%)

Mutual labels: scraping

rubium

Rubium is a lightweight alternative to Selenium/Capybara/Watir if you need to perform some operations (like web scraping) using Headless Chromium and Ruby

Stars: ✭ 65 (+400%)

Mutual labels: scraping

selectorlib

A library to read a YML file with Xpath or CSS Selectors and extract data from HTML pages using them

Stars: ✭ 53 (+307.69%)

Mutual labels: scraping

Scrapping

Mastering the art of scraping 🎓

Stars: ✭ 24 (+84.62%)

Mutual labels: scraping

reason-rust-scraper

🦀 Scraping & crawling websites using Rust, and ReasonML

Stars: ✭ 21 (+61.54%)

Mutual labels: scraping

document-dl

Command line program to download documents from web portals.

Stars: ✭ 14 (+7.69%)

Mutual labels: scraping

docker-selenium-lambda

The simplest demo of Chrome automation with Python and Selenium in AWS Lambda

Stars: ✭ 172 (+1223.08%)

Mutual labels: scraping

copycat

A PHP Scraping Class

Stars: ✭ 70 (+438.46%)

Mutual labels: scraping

wikicensorship.github.io

An open encyclopedia of Internet censorship

Stars: ✭ 91 (+600%)

Mutual labels: human-rights

illuminsight

💡👀 Read EPUB books with built-in insights from wikis, definitions, translations, and Google.

Stars: ✭ 55 (+323.08%)

Mutual labels: insights

oversmash

Overwatch API library for player details and career stats

Stars: ✭ 42 (+223.08%)

Mutual labels: scraping

scrap

Scraping Facebook with JavaScript.

Stars: ✭ 25 (+92.31%)

Mutual labels: scraping

ioweb

Web Scraping Framework

Stars: ✭ 31 (+138.46%)

Mutual labels: scraping

scrapy-distributed

A series of distributed components for Scrapy. Including RabbitMQ-based components, Kafka-based components, and RedisBloom-based components for Scrapy.

Stars: ✭ 38 (+192.31%)

Mutual labels: scraping

scrapy-fieldstats

A Scrapy extension to log items coverage when the spider shuts down

Stars: ✭ 17 (+30.77%)

Mutual labels: scraping

ProjectLockdown

Project Lockdown (an initiative from The IO Foundation) is a civic tech, interactive platform providing an overview of the state of Human and Digital Rights around the globe. It evaluates policies obtained from official sources that may impact their observance. It provides, among other tools, a layered map interface that allows for a visual repr…

Stars: ✭ 34 (+161.54%)

Mutual labels: human-rights

nrql-simple

nrql-simple provides a convenient way to interact with the New Relic Insights query API.

Stars: ✭ 13 (+0%)

Mutual labels: insights

sg-food-ml

This script scrapes images from the Internet to classify 5 common noodle ("mee") dishes in Singapore: Wanton Mee, Bak Chor Mee, Lor Mee, Prawn Mee, and Mee Siam.

Stars: ✭ 18 (+38.46%)

Mutual labels: scraping

4cat

The 4CAT Capture and Analysis Toolkit provides modular data capture & analysis for a variety of social media platforms.

Stars: ✭ 144 (+1007.69%)

Mutual labels: scraping

htmltab

Command-line utility to convert HTML tables into CSV files

Stars: ✭ 13 (+0%)

Mutual labels: scraping

html-table-to-json

Generate JSON representations of HTML tables

Stars: ✭ 39 (+200%)

Mutual labels: scraping

chesf

CHeSF is the Chrome Headless Scraping Framework, very early alpha code for scraping JavaScript-intensive web pages

Stars: ✭ 18 (+38.46%)

Mutual labels: scraping

stateOfVeganism

🌱 Get insights into the current state of Veganism around the world based on global news

Stars: ✭ 26 (+100%)

Mutual labels: insights

browser-automation-api

Browser automation API for repetitive web-based tasks, with a friendly user interface. You can use it to scrape content, capture a screenshot, generate a PDF, extract content, or execute custom Puppeteer or Playwright functions.

Stars: ✭ 24 (+84.62%)

Mutual labels: scraping

trafilatura

Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

Stars: ✭ 711 (+5369.23%)

Mutual labels: scraping

angel.co-companies-list-scraping

No description or website provided.

Stars: ✭ 54 (+315.38%)

Mutual labels: scraping

ksoup

Kotlin Wrapper for Jsoup