DotnetCrawler is a straightforward, lightweight web crawling/scraping library built on .NET Core that outputs to Entity Framework Core. It is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c

Stars: ✭ 100 (-41.52%)

Mutual labels: crawler, scraping, crawling

Jsonframe Cheerio

Simple multi-level scraper with JSON input/output for Cheerio

Stars: ✭ 196 (+14.62%)

Mutual labels: json, scraper, scraping

Scrape Linkedin Selenium

`scrape_linkedin` is a python package that allows you to scrape personal LinkedIn profiles & company pages - turning the data into structured json.

Stars: ✭ 239 (+39.77%)

Mutual labels: linkedin, scraper, scraping

double-agent

A test suite of common scraper detection techniques. See how detectable your scraper stack is.

Stars: ✭ 123 (-28.07%)

Mutual labels: scraping, crawling, puppeteer

Not Your Average Web Crawler

A web crawler (for bug hunting) that gathers more than you can imagine.

Stars: ✭ 107 (-37.43%)

Mutual labels: crawler, spider, scraper

diffbot-php-client

[Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library

Stars: ✭ 53 (-69.01%)

Mutual labels: scraper, scraping, crawling

LInkedIn-Reverese-Lookup

🔎Search LinkedIn profile by email address📧

Stars: ✭ 20 (-88.3%)

Mutual labels: linkedin, scraping, puppeteer

whatsapp-tracking

Scraping the status of WhatsApp contacts

Stars: ✭ 49 (-71.35%)

Mutual labels: scraper, scraping, puppeteer

papercut

Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.

Stars: ✭ 15 (-91.23%)

Mutual labels: crawler, scraper, scraping

arachnod

High performance crawler for Nodejs

Stars: ✭ 17 (-90.06%)

Mutual labels: crawler, scraper, spider

Sasila

A flexible, friendly crawler framework

Stars: ✭ 286 (+67.25%)

Mutual labels: crawler, scraping, crawling

Querylist

🕷️ The progressive PHP crawler framework! An elegant, progressive PHP scraping framework.

Stars: ✭ 2,392 (+1298.83%)

Mutual labels: crawler, spider, scraper

proxycrawl-python

ProxyCrawl Python library for scraping and crawling

Stars: ✭ 51 (-70.18%)

Mutual labels: scraper, scraping, crawling

Awesome Puppeteer

A curated list of awesome puppeteer resources.

Stars: ✭ 1,728 (+910.53%)

Mutual labels: scraping, crawling, puppeteer

Squidwarc

Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head

Stars: ✭ 125 (-26.9%)

Mutual labels: crawler, crawling, puppeteer

Freshonions Torscraper

Fresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion

Stars: ✭ 348 (+103.51%)

Mutual labels: crawler, spider, scraper

Ppspider

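A web spider built on Puppeteer that supports task queues and task scheduling via decorators, supports nedb / mongodb, and supports data visualization. A Puppeteer-based web crawler framework that provides flexible task queue management and scheduling, convenient data storage (nedb/mongodb), and implementations for data visualization and user interaction.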

Stars: ✭ 237 (+38.6%)

Mutual labels: crawler, spider, puppeteer

Chromium for spider

dynamic crawler for web vulnerability scanner

Stars: ✭ 220 (+28.65%)

Mutual labels: crawler, spider, puppeteer

crawler-chrome-extensions

Chrome extensions commonly used by crawler developers

Stars: ✭ 53 (-69.01%)

Mutual labels: scraper, spider, scraping

Goose Parser

Universal scraping tool, which allows you to extract data using multiple environments

Stars: ✭ 211 (+23.39%)

Mutual labels: crawler, scraper, scraping

Skycaiji

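Skycaiji is free crawler software for collecting and publishing data, developed with PHP and MySQL. It can be deployed on a cloud server, can collect almost any type of web page, integrates seamlessly with various CMS site builders, publishes data in real time without logging in, and runs fully automatically with no manual intervention. It is a fully cross-platform, cloud-based crawler system for large-scale web data collection.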

Stars: ✭ 1,514 (+785.38%)

Mutual labels: crawler, spider, crawling

socials

👨‍👩‍👦 Social account detection and extraction in Python, e.g. for crawling/scraping.

Stars: ✭ 37 (-78.36%)

Mutual labels: linkedin, scraping, crawling

Scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

Stars: ✭ 42,343 (+24661.99%)

Mutual labels: crawler, scraping, crawling

Gosint

OSINT Swiss Army Knife

Stars: ✭ 401 (+134.5%)

Mutual labels: crawler, spider, scraper

Newspaper

News, full-text, and article metadata extraction in Python 3. Advanced docs:

Stars: ✭ 11,545 (+6651.46%)

Mutual labels: crawler, scraper, crawling

scrapy facebooker

Collection of scrapy spiders which can scrape posts, images, and so on from public Facebook Pages.

Stars: ✭ 22 (-87.13%)

Mutual labels: scraper, spider, scraping

scrapy-distributed

A series of distributed components for Scrapy. Including RabbitMQ-based components, Kafka-based components, and RedisBloom-based components for Scrapy.

Stars: ✭ 38 (-77.78%)

Mutual labels: spider, scraping, crawling

flink-crawler

Continuous scalable web crawler built on top of Flink and crawler-commons

Stars: ✭ 48 (-71.93%)

Mutual labels: crawler, spider, crawling

Awesome Crawler

A collection of awesome web crawlers and spiders in different languages

Stars: ✭ 4,793 (+2702.92%)

Mutual labels: crawler, spider, scraper

Apify Js

Apify SDK — The scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.

Stars: ✭ 3,154 (+1744.44%)

Mutual labels: scraping, crawling, puppeteer

Scrapedin

LinkedIn Scraper (currently working 2020)

Stars: ✭ 453 (+164.91%)

Mutual labels: linkedin, crawler, scraper

Fbcrawl

A Facebook crawler

Stars: ✭ 536 (+213.45%)

Mutual labels: crawler, spider, scraper

Easy Scraping Tutorial

Simple but useful Python web scraping tutorial code.

Stars: ✭ 583 (+240.94%)

Mutual labels: crawler, scraping, crawling

Xcrawler

A fast, concise, and powerful PHP crawler framework

Stars: ✭ 344 (+101.17%)

Mutual labels: crawler, spider, scraper

Crawler

A high performance web crawler in Elixir.

Stars: ✭ 781 (+356.73%)

Mutual labels: crawler, spider, scraper

Autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

Stars: ✭ 4,077 (+2284.21%)

Mutual labels: crawler, scraper, scraping

Scrapit

Scraping scripts for various websites.

Stars: ✭ 25 (-85.38%)

Mutual labels: crawler, spider, scraper

Django Dynamic Scraper

Creating Scrapy scrapers via the Django admin interface

Stars: ✭ 1,024 (+498.83%)

Mutual labels: spider, scraper, scraping

Dataflowkit

Extract structured data from web sites. Web sites scraping.

Stars: ✭ 456 (+166.67%)

Mutual labels: scraper, scraping, crawling

Linkedin Scraper using Selenium Web Driver, Chromium headless, Docker and Scrapy

Stars: ✭ 309 (+80.7%)

Mutual labels: linkedin, scraper, scraping

Spidr

A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.

Stars: ✭ 656 (+283.63%)

Mutual labels: crawler, spider, scraper

Arachnid

Powerful web scraping framework for Crystal

Stars: ✭ 68 (-60.23%)

Mutual labels: crawler, spider, crawling

Avbook

AV movie management system; crawler for avmoo, javbus, and javlibrary; online AV video library; AV magnet link database. Japanese Adult Video Library, Adult Video Magnet Links - Japanese Adult Video Database

Stars: ✭ 8,133 (+4656.14%)

Mutual labels: crawler, spider, scraper

Scrapyrt

HTTP API for Scrapy spiders

Stars: ✭ 637 (+272.51%)

Mutual labels: crawler, scraper, crawling

Jvppeteer

Headless Chrome for Java (Java crawler)

Stars: ✭ 193 (+12.87%)

Mutual labels: crawler, scraper, puppeteer

Antch

Antch, a fast, powerful and extensible web crawling & scraping framework for Go