Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.

Stars: ✭ 15 (-96.59%)

Mutual labels: crawler, scraper, scraping

Django Dynamic Scraper

Creating Scrapy scrapers via the Django admin interface

Stars: ✭ 1,024 (+132.73%)

Mutual labels: spider, scraper, scraping

Antch

Antch, a fast, powerful and extensible web crawling & scraping framework for Go

Stars: ✭ 198 (-55%)

Mutual labels: crawler, scraping, crawling

Awesome Crawler

A collection of awesome web crawlers and spiders in different languages

Stars: ✭ 4,793 (+989.32%)

Mutual labels: crawler, spider, scraper

Webster

A reliable high-level web crawling & scraping framework for Node.js.

Stars: ✭ 364 (-17.27%)

Mutual labels: crawler, spider, crawling

Scrapyrt

HTTP API for Scrapy spiders

Stars: ✭ 637 (+44.77%)

Mutual labels: crawler, scraper, crawling

Fbcrawl

A Facebook crawler

Stars: ✭ 536 (+21.82%)

Mutual labels: crawler, spider, scraper

Newcrawler

Free Web Scraping Tool with Java

Stars: ✭ 589 (+33.86%)

Mutual labels: crawler, spider, scraping

Spidr

A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.

Stars: ✭ 656 (+49.09%)

Mutual labels: crawler, spider, scraper

Arachnid

Powerful web scraping framework for Crystal

Stars: ✭ 68 (-84.55%)

Mutual labels: crawler, spider, crawling

Scrapit

Scraping scripts for various websites.

Stars: ✭ 25 (-94.32%)

Mutual labels: crawler, spider, scraper

Freshonions Torscraper

Fresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion

Stars: ✭ 348 (-20.91%)

Mutual labels: crawler, spider, scraper

Sasila

A flexible, friendly crawler framework

Stars: ✭ 286 (-35%)

Mutual labels: crawler, scraping, crawling

Not Your Average Web Crawler

A web crawler (for bug hunting) that gathers more than you can imagine.

Stars: ✭ 107 (-75.68%)

Mutual labels: crawler, spider, scraper

Avbook

AV movie management system; crawlers for avmoo, javbus, and javlibrary; online AV film library; AV magnet link database. Japanese Adult Video Library, Adult Video Magnet Links - Japanese Adult Video Database

Stars: ✭ 8,133 (+1748.41%)

Mutual labels: crawler, spider, scraper

Dotnetcrawler

DotnetCrawler is a straightforward, lightweight web crawling/scraping library with Entity Framework Core output, based on .NET Core. It is designed like other strong crawler libraries such as WebMagic and Scrapy, but can be extended to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c

Stars: ✭ 100 (-77.27%)

Mutual labels: crawler, scraping, crawling

Newspaper

News, full-text, and article metadata extraction in Python 3. Advanced docs:

Stars: ✭ 11,545 (+2523.86%)

Mutual labels: crawler, scraper, crawling

Xcrawler

A fast, concise, and powerful PHP crawler framework

Stars: ✭ 344 (-21.82%)

Mutual labels: crawler, spider, scraper

Awesome Python Primer

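An index of high-quality Chinese-language resources for self-taught Python beginners, including books / docs / videos, suitable for crawling / Web / data analysis / machine learning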

Stars: ✭ 57 (-87.05%)

Mutual labels: crawler, spider, scraping

Dataflowkit

Extract structured data from websites. Website scraping.

Stars: ✭ 456 (+3.64%)

Mutual labels: scraper, scraping, crawling

Skycaiji

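Skycaiji (蓝天采集器) is free crawler software for collecting and publishing data. Developed with PHP + MySQL, it can be deployed on a cloud server, can collect almost all types of web pages, integrates seamlessly with all kinds of CMS site builders, publishes data in real time without logging in, and runs fully automatically with no manual intervention. It is a fully cross-platform cloud crawler system for web big-data collection.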

Stars: ✭ 1,514 (+244.09%)

Mutual labels: crawler, spider, crawling

Goose Parser

Universal scraping tool that allows you to extract data using multiple environments

Stars: ✭ 211 (-52.05%)

Mutual labels: crawler, scraper, scraping

crawler-chrome-extensions

Chrome extensions commonly used by crawler developers

Stars: ✭ 53 (-87.95%)

Mutual labels: scraper, spider, scraping

proxycrawl-python

ProxyCrawl Python library for scraping and crawling

Stars: ✭ 51 (-88.41%)

Mutual labels: scraper, scraping, crawling

Easy Scraping Tutorial

Simple but useful Python web scraping tutorial code.

Stars: ✭ 583 (+32.5%)

Mutual labels: crawler, scraping, crawling

diffbot-php-client

[Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library

Stars: ✭ 53 (-87.95%)

Mutual labels: scraper, scraping, crawling

Crawler

A high performance web crawler in Elixir.

Stars: ✭ 781 (+77.5%)

Mutual labels: crawler, spider, scraper

Goribot

[Crawler/Scraper for Golang] 🕷 A lightweight, distributed-friendly Golang crawler framework.

Stars: ✭ 190 (-56.82%)

Mutual labels: crawler, spider, scraper

scrapy-distributed

A series of distributed components for Scrapy, including RabbitMQ-based, Kafka-based, and RedisBloom-based components.

Stars: ✭ 38 (-91.36%)

Mutual labels: spider, scraping, crawling

Autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

Stars: ✭ 4,077 (+826.59%)

Mutual labels: crawler, scraper, scraping

Scraper-Projects

🕸 List of mini projects that involve web scraping 🕸

Stars: ✭ 25 (-94.32%)

Mutual labels: scraper, scraping

pomp

Screen scraping and web crawling framework

Stars: ✭ 61 (-86.14%)

Mutual labels: scraping, crawling

feedsearch-crawler

Crawl sites for RSS, Atom, and JSON feeds.

Stars: ✭ 23 (-94.77%)

Mutual labels: scraping, crawling

talospider

talospider - A simple, lightweight scraping micro-framework

Stars: ✭ 57 (-87.05%)

Mutual labels: spider, crawling

whatsapp-tracking

Scraping the status of WhatsApp contacts

Stars: ✭ 49 (-88.86%)

Mutual labels: scraper, scraping

img-cli

An interactive command-line interface built in Node.js for downloading one or more images to disk from a URL

Stars: ✭ 15 (-96.59%)

Mutual labels: crawler, crawling

crawler

A simple and flexible web crawler framework for Java.

Stars: ✭ 20 (-95.45%)

Mutual labels: crawler, spider

Captcha-Tools

All-in-one Python (and now Go!) module to help solve captchas with the Capmonster, 2captcha and Anticaptcha APIs!

Stars: ✭ 23 (-94.77%)

Mutual labels: scraper, scraping

TorScrapper

A Scraper made 100% in Python using BeautifulSoup and Tor. It can be used to scrape both normal and onion links. Happy Scraping :)

Stars: ✭ 24 (-94.55%)

Mutual labels: scraper, scraping

Spider Flow

A next-generation crawler platform that defines crawler workflows graphically, so you can build a crawler without writing code.

Stars: ✭ 365 (-17.05%)

Mutual labels: crawler, spider

WebCrawler

A lightweight, fast, multi-threaded, multi-pipeline, flexibly configurable web crawler.

Stars: ✭ 39 (-91.14%)

Mutual labels: crawler, spider

slime

🍰 A visual crawler platform

Stars: ✭ 27 (-93.86%)

Mutual labels: crawler, spider

dijnet-bot

All your bills in one more place :)

Stars: ✭ 17 (-96.14%)

Mutual labels: crawler, scraper

scraper

Nodejs web scraper. Contains a command line, docker container, terraform module and ansible roles for distributed cloud scraping. Supported databases: SQLite, MySQL, PostgreSQL. Supported headless clients: Puppeteer, Playwright, Cheerio, JSdom.

Stars: ✭ 37 (-91.59%)

Mutual labels: scraper, scraping

ZhengFang System Spider

🐛 A small crawler that logs in to the ZhengFang educational administration system and scrapes data

Stars: ✭ 21 (-95.23%)

Mutual labels: crawler, spider

1-60 of 1201 similar projects