Webmagic: A scalable web crawler framework for Java.
Stars: ✭ 10,186 (+10086%)
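Docs: Source code for "Data Collection: From Getting Started to Giving Up". Contents: introduction to crawlers, the job market, and crawler engineer interview questions; introduction to the HTTP protocol; using Requests; introduction to the XPath parser; MongoDB and MySQL; multithreaded crawlers; introduction to Scrapy; introduction to Scrapy-redis; deploying with Docker; managing a Docker cluster with Nomad; querying Docker logs with EFK.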
Stars: ✭ 118 (+18%)
Crawler: Web crawlers, HTTP proxies, and simulated login!
Stars: ✭ 106 (+6%)
Squidwarc: A high-fidelity, user-scriptable archival crawler that uses Chrome or Chromium, with or without a head.
Stars: ✭ 125 (+25%)
Crawlab Lite: Lite version of Crawlab, a lightweight crawler management platform.
Stars: ✭ 122 (+22%)
Youtube Projects: This repository contains all the code I use in my YouTube tutorials.
Stars: ✭ 144 (+44%)
D4n155: OWASP D4N155 - Intelligent and dynamic wordlist using OSINT
Stars: ✭ 105 (+5%)
N2h4: A tool for collecting Naver news.
Stars: ✭ 177 (+77%)
Goose Parser: Universal scraping tool that lets you extract data using multiple environments.
Stars: ✭ 211 (+111%)
Spidermon: Scrapy extension for monitoring spider execution.
Stars: ✭ 309 (+209%)
Seleniumcrawler: An example that uses Selenium WebDriver for Python and the Scrapy framework to create a web scraper that crawls an ASP site.
Stars: ✭ 117 (+17%)
socials: 👨👩👦 Social account detection and extraction in Python, e.g. for crawling/scraping.
Stars: ✭ 37 (-63%)
Simplcommerce: A simple, cross-platform, modularized e-commerce system built on .NET Core.
Stars: ✭ 3,474 (+3374%)
Vault: Swiss army knife for hackers.
Stars: ✭ 346 (+246%)
zcrawl: An open source web crawling platform.
Stars: ✭ 21 (-79%)
crawling-framework: Easily crawl news portals or blog sites using Storm Crawler.
Stars: ✭ 22 (-78%)
InstaBot: Simple and friendly bot for Instagram, using Selenium and Scrapy with Python.
Stars: ✭ 32 (-68%)
Polite: Be nice on the web.
Stars: ✭ 253 (+153%)
chesf: CHeSF is the Chrome Headless Scraping Framework, very early alpha code for scraping JavaScript-intensive web pages.
Stars: ✭ 18 (-82%)
go-scrapy: Web crawling and scraping framework for Golang.
Stars: ✭ 17 (-83%)
wget-lua: Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression, and URL-agnostic deduplication.
Stars: ✭ 52 (-48%)
Module Shop: A simple, cross-platform, modular e-commerce system built on .NET Core.
Stars: ✭ 398 (+298%)
hk0weather: Web scraper project that collects useful Hong Kong weather data from the HKO website.
Stars: ✭ 49 (-51%)
pomp: Screen scraping and web crawling framework.
Stars: ✭ 61 (-39%)
Wswp: Code for the second edition of the book Web Scraping with Python by Packt Publications.
Stars: ✭ 112 (+12%)
flink-crawler: Continuous, scalable web crawler built on top of Flink and crawler-commons.
Stars: ✭ 48 (-52%)
memes-api: API for scraping common meme sites.
Stars: ✭ 17 (-83%)
img-cli: An interactive command-line interface, built in Node.js, for downloading one or more images to disk from a URL.
Stars: ✭ 15 (-85%)
Dotnetspider: DotnetSpider, a .NET Standard web crawling library. It is a lightweight, efficient, and fast high-level web crawling and scraping framework.
Stars: ✭ 3,233 (+3133%)
Webster: A reliable high-level web crawling and scraping framework for Node.js.
Stars: ✭ 364 (+264%)
Dataflowkit: Extract structured data from websites. Website scraping.
Stars: ✭ 456 (+356%)
Email Extractor: The main functionality is to extract all the email addresses from one or several URLs.
Stars: ✭ 81 (-19%)
Linkedin: LinkedIn scraper using Selenium WebDriver, headless Chromium, Docker, and Scrapy.
Stars: ✭ 309 (+209%)
Fbcrawl: A Facebook crawler.
Stars: ✭ 536 (+436%)
Scrapy Selenium: Scrapy middleware to handle JavaScript pages using Selenium.
Stars: ✭ 550 (+450%)
Icrawler: A multithreaded crawler framework with many built-in image crawlers.
Stars: ✭ 629 (+529%)
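Awesome Python Primer: An index of high-quality Chinese-language resources for self-learning Python, including books, documentation, and videos, covering web crawling, web development, data analysis, and machine learning.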
Stars: ✭ 57 (-43%)
papercut: Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.
Stars: ✭ 15 (-85%)
Scrapy Redis: Redis-based components for Scrapy.
Stars: ✭ 4,998 (+4898%)
Gazpacho: 🥫 The simple, fast, and modern web scraping library.
Stars: ✭ 525 (+425%)
Haipproxy: 💖 Highly available distributed IP proxy pool, powered by Scrapy and Redis.
Stars: ✭ 4,993 (+4893%)
Newcrawler: Free web scraping tool with Java.
Stars: ✭ 589 (+489%)
Scrapy Cluster: This Scrapy project uses Redis and Kafka to create a distributed, on-demand scraping cluster.
Stars: ✭ 921 (+821%)
Awesome Puppeteer: A curated list of awesome Puppeteer resources.
Stars: ✭ 1,728 (+1628%)
Memorious: Distributed crawling framework for documents and structured data.
Stars: ✭ 248 (+148%)
Pdf downloader: A Scrapy spider for downloading PDF files from a webpage.
Stars: ✭ 18 (-82%)
Configs: Public, free-to-use repository of digger configs for scraping and extracting data from various e-commerce websites and online stores.
Stars: ✭ 37 (-63%)
Crawlab: Distributed web crawler admin platform for managing spiders, regardless of language or framework.
Stars: ✭ 8,392 (+8292%)