Apify Js: Apify SDK, the scalable web scraping and crawling library for JavaScript/Node.js. It enables development of data extraction and web automation jobs with headless Chrome and Puppeteer, among other tools.
Stars: ✭ 3,154 (+10074.19%)
ARGUS: ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and can crawl a broad range of different websites. On these websites, ARGUS can perform tasks such as scraping text or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9
Stars: ✭ 68 (+119.35%)
zcrawl: An open-source web crawling platform.
Stars: ✭ 21 (-32.26%)
Autoscraper: A smart, automatic, fast and lightweight web scraper for Python.
Stars: ✭ 4,077 (+13051.61%)
chesf: CHeSF is the Chrome Headless Scraping Framework, very early alpha code for scraping JavaScript-intensive web pages.
Stars: ✭ 18 (-41.94%)
gotor: This program provides efficient web scraping services for Tor and non-Tor sites. The program has both a CLI and a REST API.
Stars: ✭ 97 (+212.9%)
trafilatura: A Python package and command-line tool to gather text on the Web: web crawling/scraping and extraction of text, metadata, and comments.
Stars: ✭ 711 (+2193.55%)
Humanoid: A Node.js package to bypass Cloudflare's anti-bot JavaScript challenges.
Stars: ✭ 88 (+183.87%)
extractnet: A Dragnet that also extracts author, headline, date, and keywords from context.
Stars: ✭ 52 (+67.74%)
raspagem-de-dados-fatec: 📓 Short course on web data scraping with Python, taught during the Technology Week at FATEC Jundiaí.
Stars: ✭ 22 (-29.03%)
Gazpacho: 🥫 The simple, fast, and modern web scraping library.
Stars: ✭ 525 (+1593.55%)
browser-automation-api: A browser automation API for repetitive web-based tasks, with a friendly user interface. You can use it to scrape content or do many other things, such as capturing screenshots, generating PDFs, extracting content, or executing custom Puppeteer or Playwright functions.
Stars: ✭ 24 (-22.58%)
OLX Scraper: 📻 An OLX scraper using Scrapy + MongoDB. It scrapes recent ads posted for a requested product and dumps them into a NoSQL MongoDB database.
Stars: ✭ 15 (-51.61%)
R Web Scraping Cheat Sheet: Guide, reference, and cheat sheet for web scraping using rvest, httr, and RSelenium.
Stars: ✭ 207 (+567.74%)
Configs: A public, free-to-use repository of digger configs for scraping/extracting data from various e-commerce websites and online stores.
Stars: ✭ 37 (+19.35%)
newspaperjs: News extraction and scraping; article parsing.
Stars: ✭ 59 (+90.32%)
selectorlib: A library that reads a YML file of XPath or CSS selectors and uses them to extract data from HTML pages.
Stars: ✭ 53 (+70.97%)
browser-pool: A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppeteer, Playwright, or SecretAgent.
Stars: ✭ 71 (+129.03%)
anime-scraper: [partially working] Scrape anime episode stream URLs and add them to uGet (Linux) or IDM (Windows). Written in Python 3.
Stars: ✭ 21 (-32.26%)
top-github-scraper: Scrape top GitHub repositories and users based on keywords.
Stars: ✭ 40 (+29.03%)
PythonScrapyBasicSetup: Basic setup with random user agents and IP addresses for the Python Scrapy framework.
Stars: ✭ 57 (+83.87%)
papercut: Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like page caching and geosearch.
Stars: ✭ 15 (-51.61%)
Gopa: [WIP] GOPA, a spider written in Golang, for Elasticsearch. Demo: http://index.elasticsearch.cn
Stars: ✭ 277 (+793.55%)
Detect Cms: A PHP library for detecting CMSs.
Stars: ✭ 78 (+151.61%)
Dotnetcrawler: DotnetCrawler is a straightforward, lightweight web crawling/scraping library for .NET Core that outputs to Entity Framework Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible to fit your custom requirements. Medium article: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Stars: ✭ 100 (+222.58%)
Scrapple: A framework for creating semi-automatic web content extractors.
Stars: ✭ 464 (+1396.77%)
Instago: Download/access Instagram photos, videos, stories, story highlights, postlives, following, and followers.
Stars: ✭ 59 (+90.32%)
Phpscraper: PHP Scraper, a highly opinionated web interface for PHP.
Stars: ✭ 148 (+377.42%)
Sqrape: Simple query scraping with CSS and Go reflection (moved to GitLab).
Stars: ✭ 144 (+364.52%)
Scrape Linkedin Selenium: `scrape_linkedin` is a Python package that allows you to scrape personal LinkedIn profiles and company pages, turning the data into structured JSON.
Stars: ✭ 239 (+670.97%)
codepen-puppeteer: Use Puppeteer to download pens from Codepen.io as single HTML pages.
Stars: ✭ 22 (-29.03%)
web-poet: Web scraping Page Objects core library.
Stars: ✭ 67 (+116.13%)
Crypto-Webminer: Use the Crypto Webminer JavaScript miner on various Stratum pools: Cryptonight, CN-Lite, CN-Fast, CN-Fast2, CN-Pico, CN-RWZ, CN-UPX2, CN-Half, CN-Heavy, CN-Saber (BitTube), and Argon2id - Chukwa.
Stars: ✭ 166 (+435.48%)
scrapy-wayback-machine: A Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.
Stars: ✭ 92 (+196.77%)
core: The complete web scraping toolkit for PHP.
Stars: ✭ 1,110 (+3480.65%)
Architeuthis: An MITM HTTP(S) proxy with integrated load balancing, rate limiting, and error handling. Built for automated web scraping.
Stars: ✭ 35 (+12.9%)
google scraper live view: An application for extracting large amounts of data from Google search results pages.
Stars: ✭ 17 (-45.16%)
diffbot-php-client: [Deprecated, maintenance mode; please use the APIs directly!] The official Diffbot client library.
Stars: ✭ 53 (+70.97%)
4cat: The 4CAT Capture and Analysis Toolkit provides modular data capture and analysis for a variety of social media platforms.
Stars: ✭ 144 (+364.52%)
info-bot: 🤖 A versatile Telegram bot.
Stars: ✭ 37 (+19.35%)
super-anime-downloader: A program that takes an anime name or URL and downloads the specified range of episodes.
Stars: ✭ 26 (-16.13%)
crawlzone: Crawlzone is a fast, asynchronous internet crawling framework for PHP.
Stars: ✭ 70 (+125.81%)
socials: 👨👩👦 Social account detection and extraction in Python, e.g. for crawling/scraping.
Stars: ✭ 37 (+19.35%)
Goirate: Pillaging the seven seas for torrents, pieces of eight, and other bounty.
Stars: ✭ 20 (-35.48%)
fBrowser: Helpful Selenium functions to make web scraping easier and faster.
Stars: ✭ 16 (-48.39%)
2017-summer-workshop: Exercises, data, and more for our 2017 summer workshop (funded by the Estes Fund and in partnership with Project Jupyter and Berkeley's D-Lab).
Stars: ✭ 33 (+6.45%)
chopper: Chopper is a tool to extract elements from HTML while preserving their ancestors and CSS rules.
Stars: ✭ 22 (-29.03%)
scrapers: Scrapers for building your own image databases.
Stars: ✭ 46 (+48.39%)
readability-cli: A CLI for Mozilla Readability. Get clean, uncluttered, ready-to-read HTML from any webpage!
Stars: ✭ 41 (+32.26%)
shorter.recipes: A website dedicated to making recipes from any website easy to read.
Stars: ✭ 27 (-12.9%)