DotnetCrawler is a straightforward, lightweight web crawling/scraping library based on .NET Core, with Entity Framework Core output. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c

Stars: ✭ 100 (-65.03%)

Mutual labels: crawler, scraping, crawling

Headless Chrome Crawler

Distributed crawler powered by Headless Chrome

Stars: ✭ 5,129 (+1693.36%)

Mutual labels: crawler, scraping, crawling

Transistor

Transistor, a Python web scraping framework for intelligent use cases.

Stars: ✭ 205 (-28.32%)

Mutual labels: scraping, framework, requests

Crawly

Crawly, a high-level web crawling & scraping framework for Elixir.

Stars: ✭ 440 (+53.85%)

Mutual labels: crawler, scraping, crawling

bots-zoo

No description or website provided.

Stars: ✭ 59 (-79.37%)

Mutual labels: crawler, scraping, crawling

Scrapyrt

HTTP API for Scrapy spiders

Stars: ✭ 637 (+122.73%)

Mutual labels: crawler, crawling

Creeper

🐾 Creeper - The Next Generation Crawler Framework (Go)

Stars: ✭ 762 (+166.43%)

Mutual labels: crawler, framework

Social Scraper

A collection of scripts for crawling data from Vietnamese-language social networks & websites

Stars: ✭ 47 (-83.57%)

Mutual labels: crawler, requests

Weibo Album Crawler

Multithreaded crawler for full-size images from Sina Weibo albums.

Stars: ✭ 83 (-70.98%)

Mutual labels: crawler, requests

Awesome Python Primer

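An index of high-quality Chinese-language resources for self-taught Python beginners, including books / docs / videos, suited to the crawler / Web / data analysis / machine learning tracks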

Stars: ✭ 57 (-80.07%)

Mutual labels: crawler, scraping

D4n155

OWASP D4N155 - Intelligent and dynamic wordlist using OSINT

Stars: ✭ 105 (-63.29%)

Mutual labels: crawler, scraping

Squidwarc

Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head

Stars: ✭ 125 (-56.29%)

Mutual labels: crawler, crawling

Newspaper

News, full-text, and article metadata extraction in Python 3. Advanced docs:

Stars: ✭ 11,545 (+3936.71%)

Mutual labels: crawler, crawling

Instagram Bot

An Instagram bot developed using the Selenium Framework

Stars: ✭ 138 (-51.75%)

Mutual labels: crawler, crawling

Scrapingoutsourcing

ScrapingOutsourcing focuses on sharing crawler code, aiming to add a new one every week

Stars: ✭ 164 (-42.66%)

Mutual labels: crawler, requests

N2h4

A tool for collecting Naver news

Stars: ✭ 177 (-38.11%)

Mutual labels: crawler, crawling

Crawler

Go process used to crawl websites

Stars: ✭ 147 (-48.6%)

Mutual labels: crawler, crawling

Scrapy Crawlera

Crawlera middleware for Scrapy

Stars: ✭ 281 (-1.75%)

Mutual labels: crawler, scraping

Price Monitor

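JD.com product price monitor: tracks the prices of user-specified products and sends email/WeChat alerts on price drops. Tech: Python crawler / IP proxy pool / JS API crawling / Selenium page crawling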

Stars: ✭ 634 (+121.68%)

Mutual labels: crawler, requests

Course Crawler

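🎓 MOOC course downloader for China University MOOC (中国大学MOOC), XuetangX (学堂在线), NetEase Cloud Classroom (网易云课堂), CNMOOC (好大学在线), and iCourse (爱课程).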

Stars: ✭ 611 (+113.64%)

Mutual labels: crawler, requests

Newcrawler

Free Web Scraping Tool with Java

Stars: ✭ 589 (+105.94%)

Mutual labels: crawler, scraping

Geziyor

Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.

Stars: ✭ 1,246 (+335.66%)

Mutual labels: crawler, scraping

Arachnid

Powerful web scraping framework for Crystal

Stars: ✭ 68 (-76.22%)

Mutual labels: crawler, crawling

Arachnid

Crawl all unique internal links found on a given website, and extract SEO-related information. Supports JavaScript-based sites

Stars: ✭ 224 (-21.68%)

Mutual labels: crawler, scraping

info-bot

🤖 A Versatile Telegram Bot

Stars: ✭ 37 (-87.06%)

Mutual labels: scraping, requests

double-agent

A test suite of common scraper detection techniques. See how detectable your scraper stack is.

Stars: ✭ 123 (-56.99%)

Mutual labels: scraping, crawling

Docs

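Source code for 《数据采集从入门到放弃》 ("Data Collection: From Getting Started to Giving Up"). Contents: introduction to crawlers, the job market, and crawler engineer interview questions; introduction to the HTTP protocol; using Requests; introduction to the XPath parser; MongoDB and MySQL; multithreaded crawlers; introduction to Scrapy; introduction to Scrapy-redis; deploying with Docker; managing a Docker cluster with Nomad; querying Docker logs with EFK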

Stars: ✭ 118 (-58.74%)

Mutual labels: crawler, requests

Decryptlogin

APIs for logging in to some websites using requests.

Stars: ✭ 1,861 (+550.7%)

Mutual labels: crawler, requests

Zhihu Spider

A multithreaded Python crawler that fetches Zhihu user profile page information.

Stars: ✭ 137 (-52.1%)

Mutual labels: crawler, requests

Bilibili member crawler

Bilibili user crawler. Yay, it's a crawler!

Stars: ✭ 115 (-59.79%)

Mutual labels: crawler, requests

crawling-framework

Easily crawl news portals or blog sites using Storm Crawler.

Stars: ✭ 22 (-92.31%)

Mutual labels: scraping, crawling

socials

👨‍👩‍👦 Social account detection and extraction in Python, e.g. for crawling/scraping.

Stars: ✭ 37 (-87.06%)

Mutual labels: scraping, crawling

zcrawl

An open source web crawling platform

Stars: ✭ 21 (-92.66%)

Mutual labels: scraping, crawling

proxycrawl-python

ProxyCrawl Python library for scraping and crawling

Stars: ✭ 51 (-82.17%)

Mutual labels: scraping, crawling

Apify Js

Apify SDK — The scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.

Stars: ✭ 3,154 (+1002.8%)

Mutual labels: scraping, crawling

Goose Parser

Universal scraping tool, which allows you to extract data using multiple environments

Stars: ✭ 211 (-26.22%)

Mutual labels: crawler, scraping

scrape-github-trending

Tutorial for web scraping / crawling with Node.js.

Stars: ✭ 42 (-85.31%)

Mutual labels: scraping, crawling

scrapy-fieldstats

A Scrapy extension to log items coverage when the spider shuts down

Stars: ✭ 17 (-94.06%)

Mutual labels: scraping, crawling

diffbot-php-client

[Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library

Stars: ✭ 53 (-81.47%)

Mutual labels: scraping, crawling

Php Curl Class

PHP Curl Class makes it easy to send HTTP requests and integrate with web APIs

Stars: ✭ 2,903 (+915.03%)

Mutual labels: framework, requests

Googlescraper

A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, ...). Including asynchronous networking support.

Stars: ✭ 2,363 (+726.22%)

Mutual labels: crawler, scraping

scrapy-distributed

A series of distributed components for Scrapy. Including RabbitMQ-based components, Kafka-based components, and RedisBloom-based components for Scrapy.

Stars: ✭ 38 (-86.71%)

Mutual labels: scraping, crawling

feedsearch-crawler

Crawl sites for RSS, Atom, and JSON feeds.

Stars: ✭ 23 (-91.96%)

Mutual labels: scraping, crawling

pomp

Screen scraping and web crawling framework

Stars: ✭ 61 (-78.67%)

Mutual labels: scraping, crawling

img-cli

An interactive command-line interface built in Node.js for downloading single or multiple images to disk from a URL

Stars: ✭ 15 (-94.76%)

Mutual labels: crawler, crawling

go-scrapy

Web crawling and scraping framework for Golang

Stars: ✭ 17 (-94.06%)

Mutual labels: scraping, crawling

papercut

Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.

Stars: ✭ 15 (-94.76%)

Mutual labels: crawler, scraping

flink-crawler

Continuous scalable web crawler built on top of Flink and crawler-commons

Stars: ✭ 48 (-83.22%)

Mutual labels: crawler, crawling

Scrapple

A framework for creating semi-automatic web content extractors

Stars: ✭ 464 (+62.24%)

Mutual labels: crawler, scraping

Web Bee

🐝 Web vertical crawler framework for fun