Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.

Stars: ✭ 15 (-98.1%)

Mutual labels: crawler, scraper, scraping

Geziyor

Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.

Stars: ✭ 1,246 (+57.92%)

Mutual labels: crawler, scraper, scraping

Zeiver

A Scraper, Downloader, & Recorder for static open directories.

Stars: ✭ 14 (-98.23%)

Mutual labels: scraper, downloader, scraping

Gopa

[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn

Stars: ✭ 277 (-64.89%)

Mutual labels: crawler, scraping, crawling

img-cli

An interactive command-line interface built in Node.js for downloading one or more images to disk from a URL

Stars: ✭ 15 (-98.1%)

Mutual labels: crawler, downloader, crawling

Annie

👾 Fast and simple video download library and CLI tool written in Go

Stars: ✭ 16,369 (+1974.65%)

Mutual labels: crawler, scraper, downloader

Goose Parser

Universal scraping tool that lets you extract data using multiple environments

Stars: ✭ 211 (-73.26%)

Mutual labels: crawler, scraper, scraping

Scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

Stars: ✭ 42,343 (+5266.67%)

Mutual labels: crawler, scraping, crawling

Autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

Stars: ✭ 4,077 (+416.73%)

Mutual labels: crawler, scraper, scraping

Sasila

A flexible, friendly crawler framework

Stars: ✭ 286 (-63.75%)

Mutual labels: crawler, scraping, crawling

Dataflowkit

Extract structured data from websites. Website scraping.

Stars: ✭ 456 (-42.21%)

Mutual labels: scraper, scraping, crawling

Dotnetcrawler

DotnetCrawler is a straightforward, lightweight web crawling/scraping library with Entity Framework Core output, based on .NET Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible for your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c

Stars: ✭ 100 (-87.33%)

Mutual labels: crawler, scraping, crawling

Newspaper

News, full-text, and article metadata extraction in Python 3. Advanced docs:

Stars: ✭ 11,545 (+1363.24%)

Mutual labels: crawler, scraper, crawling

diffbot-php-client

[Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library

Stars: ✭ 53 (-93.28%)

Mutual labels: scraper, scraping, crawling

proxycrawl-python

ProxyCrawl Python library for scraping and crawling

Stars: ✭ 51 (-93.54%)

Mutual labels: scraper, scraping, crawling

Newcrawler

Free Web Scraping Tool with Java

Stars: ✭ 589 (-25.35%)

Mutual labels: crawler, scraping

patreon-scraper

WIP Patreon attachment downloader written in TypeScript

Stars: ✭ 25 (-96.83%)

Mutual labels: scraper, downloader

scrapy facebooker

Collection of scrapy spiders which can scrape posts, images, and so on from public Facebook Pages.

Stars: ✭ 22 (-97.21%)

Mutual labels: scraper, scraping

whatsapp-tracking

Scraping the status of WhatsApp contacts

Stars: ✭ 49 (-93.79%)

Mutual labels: scraper, scraping

pomp

Screen scraping and web crawling framework

Stars: ✭ 61 (-92.27%)

Mutual labels: scraping, crawling

Scraper-Projects

🕸 List of mini projects that involve web scraping 🕸

Stars: ✭ 25 (-96.83%)

Mutual labels: scraper, scraping

feedsearch-crawler

Crawl sites for RSS, Atom, and JSON feeds.

Stars: ✭ 23 (-97.08%)

Mutual labels: scraping, crawling

flink-crawler

Continuous scalable web crawler built on top of Flink and crawler-commons

Stars: ✭ 48 (-93.92%)

Mutual labels: crawler, crawling

arachnod

High performance crawler for Nodejs

Stars: ✭ 17 (-97.85%)

Mutual labels: crawler, scraper

TorScrapper

A Scraper made 100% in Python using BeautifulSoup and Tor. It can be used to scrape both normal and onion links. Happy Scraping :)

Stars: ✭ 24 (-96.96%)

Mutual labels: scraper, scraping

TumblTwo

TumblTwo, an Improved Fork of TumblOne, a Tumblr Downloader.

Stars: ✭ 57 (-92.78%)

Mutual labels: crawler, downloader

ARGUS

ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different websites. On the websites, ARGUS is able to perform tasks like scraping texts or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9

Stars: ✭ 68 (-91.38%)

Mutual labels: scraping, crawling

dijnet-bot

All your bills in yet another place :)

Stars: ✭ 17 (-97.85%)

Mutual labels: crawler, scraper

facebook-discussion-tk

A collection of tools to (semi-)automatically collect and analyze data from online discussions on Facebook groups and pages.

Stars: ✭ 33 (-95.82%)

Mutual labels: scraper, scraping

fiction-dl

A content downloader, capable of retrieving works of (fan)fiction from the web and saving them in a few common file formats.

Stars: ✭ 22 (-97.21%)

Mutual labels: scraper, downloader

lightnovel epub

🍭 EPUB generator for (light) novels. Supported sites: 轻之国度, 轻小说文库

Stars: ✭ 89 (-88.72%)

Mutual labels: crawler, scraper

MyCrawler

My crawler collection

Stars: ✭ 55 (-93.03%)

Mutual labels: crawler, scraper

Spidy

The simple, easy to use command line web crawler.

Stars: ✭ 257 (-67.43%)

Mutual labels: crawler, crawling

Fbcrawl

A Facebook crawler

Stars: ✭ 536 (-32.07%)

Mutual labels: crawler, scraper

Captcha-Tools

All-in-one Python (and now Go!) module to help solve captchas with the Capmonster, 2captcha and Anticaptcha APIs!

Stars: ✭ 23 (-97.08%)

Mutual labels: scraper, scraping

scraper

Nodejs web scraper. Contains a command line, docker container, terraform module and ansible roles for distributed cloud scraping. Supported databases: SQLite, MySQL, PostgreSQL. Supported headless clients: Puppeteer, Playwright, Cheerio, JSdom.

Stars: ✭ 37 (-95.31%)

Mutual labels: scraper, scraping

weibo-scraper

Simple Weibo Scraper

Stars: ✭ 50 (-93.66%)

Mutual labels: crawler, scraper

Crawler

A high performance web crawler in Elixir.

Stars: ✭ 781 (-1.01%)

Mutual labels: crawler, scraper

Imagescraper

✂️ High performance, multi-threaded image scraper

Stars: ✭ 630 (-20.15%)

Mutual labels: scraper, scraping

Scrapy Crawlera

Crawlera middleware for Scrapy

Stars: ✭ 281 (-64.39%)

Mutual labels: crawler, scraping

Linkedin Scraper using Selenium Web Driver, Chromium headless, Docker and Scrapy

Stars: ✭ 309 (-60.84%)

Mutual labels: scraper, scraping

Rcrawler

An R web crawler and scraper

Stars: ✭ 274 (-65.27%)

Mutual labels: crawler, scraper

Hquery.php

An extremely fast web scraper that parses megabytes of invalid HTML in a blink of an eye. PHP5.3+, no dependencies.

Stars: ✭ 295 (-62.61%)

Mutual labels: crawler, scraper

Spidermon

Scrapy extension for monitoring spider execution.

Stars: ✭ 309 (-60.84%)

Mutual labels: scraping, crawling

Pornhub Downloader

Download videos from pornhub.

Stars: ✭ 346 (-56.15%)

Mutual labels: crawler, downloader

Xcrawler

A fast, concise, and powerful PHP crawler framework

Stars: ✭ 344 (-56.4%)

Mutual labels: crawler, scraper

Freshonions Torscraper

Fresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion

Stars: ✭ 348 (-55.89%)

Mutual labels: crawler, scraper

Apify Js

Apify SDK — The scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.

Stars: ✭ 3,154 (+299.75%)

Mutual labels: scraping, crawling

Webster

A reliable high-level web crawling & scraping framework for Node.js.

Stars: ✭ 364 (-53.87%)

Mutual labels: crawler, crawling

Katana

A Python tool for Google hacking

Stars: ✭ 355 (-55.01%)

Mutual labels: scraper, scraping

1-60 of 1299 similar projects
