AntchAntch, a fast, powerful and extensible web crawling & scraping framework for Go
Stars: ✭ 198 (-30.77%)
CollyElegant Scraper and Crawler Framework for Golang
Stars: ✭ 15,535 (+5331.82%)
ScrapyScrapy, a fast high-level web crawling & scraping framework for Python.
Stars: ✭ 42,343 (+14705.24%)
Lulu[Unmaintained] A simple and clean video/music/image downloader 👾
Stars: ✭ 789 (+175.87%)
WebmagicA scalable web crawler framework for Java.
Stars: ✭ 10,186 (+3461.54%)
FerretDeclarative web scraping
Stars: ✭ 4,837 (+1591.26%)
Gopa[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ✭ 277 (-3.15%)
Linkedin Profile Scraper🕵️♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Stars: ✭ 171 (-40.21%)
DotnetcrawlerDotnetCrawler is a straightforward, lightweight web crawling/scrapying library for Entity Framework Core output based on dotnet core. This library designed like other strong crawler libraries like WebMagic and Scrapy but for enabling extandable your custom requirements. Medium link : https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Stars: ✭ 100 (-65.03%)
TransistorTransistor, a Python web scraping framework for intelligent use cases.
Stars: ✭ 205 (-28.32%)
CrawlyCrawly, a high-level web crawling & scraping framework for Elixir.
Stars: ✭ 440 (+53.85%)
bots-zooNo description or website provided.
Stars: ✭ 59 (-79.37%)
ScrapyrtHTTP API for Scrapy spiders
Stars: ✭ 637 (+122.73%)
Creeper🐾 Creeper - The Next Generation Crawler Framework (Go)
Stars: ✭ 762 (+166.43%)
Social ScraperTổng hợp script crawl dữ liệu từ các mạng xã hội & website tiếng Việt
Stars: ✭ 47 (-83.57%)
Awesome Python Primer自学入门 Python 优质中文资源索引,包含 书籍 / 文档 / 视频,适用于 爬虫 / Web / 数据分析 / 机器学习 方向
Stars: ✭ 57 (-80.07%)
D4n155OWASP D4N155 - Intelligent and dynamic wordlist using OSINT
Stars: ✭ 105 (-63.29%)
SquidwarcSquidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head
Stars: ✭ 125 (-56.29%)
NewspaperNews, full-text, and article metadata extraction in Python 3. Advanced docs:
Stars: ✭ 11,545 (+3936.71%)
Instagram BotAn Instagram bot developed using the Selenium Framework
Stars: ✭ 138 (-51.75%)
N2h4네이버 뉴스 수집을 위한 도구
Stars: ✭ 177 (-38.11%)
CrawlerGo process used to crawl websites
Stars: ✭ 147 (-48.6%)
Price Monitor京东商品价格监控:监控用户设定商品价格,降价邮件/微信提醒。技术:Python爬虫/IP代理池/JS接口爬取/Selenium页面爬取
Stars: ✭ 634 (+121.68%)
Course Crawler🎓 中国大学MOOC、学堂在线、网易云课堂、好大学在线、爱课程 MOOC 课程下载。
Stars: ✭ 611 (+113.64%)
NewcrawlerFree Web Scraping Tool with Java
Stars: ✭ 589 (+105.94%)
GeziyorGeziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.
Stars: ✭ 1,246 (+335.66%)
ArachnidPowerful web scraping framework for Crystal
Stars: ✭ 68 (-76.22%)
ArachnidCrawl all unique internal links found on a given website, and extract SEO related information - supports javascript based sites
Stars: ✭ 224 (-21.68%)
info-bot🤖 A Versatile Telegram Bot
Stars: ✭ 37 (-87.06%)
double-agentA test suite of common scraper detection techniques. See how detectable your scraper stack is.
Stars: ✭ 123 (-56.99%)
Docs《数据采集从入门到放弃》源码。内容简介:爬虫介绍、就业情况、爬虫工程师面试题 ;HTTP协议介绍; Requests使用 ;解析器Xpath介绍; MongoDB与MySQL; 多线程爬虫; Scrapy介绍 ;Scrapy-redis介绍; 使用docker部署; 使用nomad管理docker集群; 使用EFK查询docker日志
Stars: ✭ 118 (-58.74%)
DecryptloginAPIs for loginning some websites by using requests.
Stars: ✭ 1,861 (+550.7%)
Zhihu Spider一个获取知乎用户主页信息的多线程Python爬虫程序。
Stars: ✭ 137 (-52.1%)
crawling-frameworkEasily crawl news portals or blog sites using Storm Crawler.
Stars: ✭ 22 (-92.31%)
socials👨👩👦 Social account detection and extraction in Python, e.g. for crawling/scraping.
Stars: ✭ 37 (-87.06%)
zcrawlAn open source web crawling platform
Stars: ✭ 21 (-92.66%)
proxycrawl-pythonProxyCrawl Python library for scraping and crawling
Stars: ✭ 51 (-82.17%)
Apify JsApify SDK — The scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
Stars: ✭ 3,154 (+1002.8%)
Goose ParserUniversal scrapping tool, which allows you to extract data using multiple environments
Stars: ✭ 211 (-26.22%)
scrapy-fieldstatsA Scrapy extension to log items coverage when the spider shuts down
Stars: ✭ 17 (-94.06%)
diffbot-php-client[Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library
Stars: ✭ 53 (-81.47%)
Php Curl ClassPHP Curl Class makes it easy to send HTTP requests and integrate with web APIs
Stars: ✭ 2,903 (+915.03%)
GooglescraperA Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, ...). Including asynchronous networking support.
Stars: ✭ 2,363 (+726.22%)
scrapy-distributedA series of distributed components for Scrapy. Including RabbitMQ-based components, Kafka-based components, and RedisBloom-based components for Scrapy.
Stars: ✭ 38 (-86.71%)
pompScreen scraping and web crawling framework
Stars: ✭ 61 (-78.67%)
img-cliAn interactive Command-Line Interface Build in NodeJS for downloading a single or multiple images to disk from URL
Stars: ✭ 15 (-94.76%)
go-scrapyWeb crawling and scraping framework for Golang
Stars: ✭ 17 (-94.06%)
papercutPapercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.
Stars: ✭ 15 (-94.76%)
flink-crawlerContinuous scalable web crawler built on top of Flink and crawler-commons
Stars: ✭ 48 (-83.22%)
ScrappleA framework for creating semi-automatic web content extractors
Stars: ✭ 464 (+62.24%)
Web Bee🐝 Web vertical crawler framework for fun
Stars: ✭ 184 (-35.66%)
wget-luaWget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Stars: ✭ 52 (-81.82%)