Top 80 open-source crawling projects

Memorious
Distributed crawling framework for documents and structured data.
Colly
Elegant Scraper and Crawler Framework for Golang
Antch
Antch, a fast, powerful and extensible web crawling & scraping framework for Go
Nutch
Apache Nutch is an extensible and scalable web crawler
N2h4
A tool for collecting Naver news.
Linkedin Profile Scraper
🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Holiday Cn
📅🇨🇳 Chinese statutory holiday data, scraped automatically every day from State Council announcements.
Crawler
Go process used to crawl websites
Massivedl
Download a large list of files concurrently
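
For flavor, here is a minimal Python sketch of the core idea behind tools like this (concurrent downloads from a URL list via a thread pool); it illustrates the technique, not massivedl's own CLI, and the URLs are placeholders:

```python
# Minimal sketch of concurrent file downloading (illustration only,
# not massivedl's own interface). URLs and output dir are hypothetical.
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor

urls = [
    "https://example.com/files/a.pdf",
    "https://example.com/files/b.pdf",
]

def download(url, out_dir="downloads"):
    os.makedirs(out_dir, exist_ok=True)
    dest = os.path.join(out_dir, url.rsplit("/", 1)[-1])
    urllib.request.urlretrieve(url, dest)  # fetch one file to disk
    return dest

# A thread pool keeps several HTTP requests in flight at once.
with ThreadPoolExecutor(max_workers=8) as pool:
    for path in pool.map(download, urls):
        print("saved", path)
```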
Newspaper
News, full-text, and article metadata extraction in Python 3.
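
A short example of Newspaper's documented Article workflow (the URL is a placeholder):

```python
# Extract the full text and metadata of one article with newspaper3k.
from newspaper import Article

article = Article("https://example.com/some-news-story")
article.download()   # fetch the raw HTML
article.parse()      # extract title, authors, body text, etc.

print(article.title)
print(article.authors)
print(article.publish_date)
print(article.text[:200])
```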
Bhban rpa
Example code from the book "Work Automation That Finishes Six Months of Work in a Single Day" (생능출판사, 2020). The examples are written for readers who have never learned Python and cover a wide range of office-automation topics, from Excel and design to macros and crawling.
Corpuscrawler
Crawler for linguistic corpora
Squidwarc
Squidwarc is a high-fidelity, user-scriptable archival crawler that uses Chrome or Chromium, headless or headful.
Scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
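
A minimal Scrapy spider in the style of the project's own tutorial (quotes.toscrape.com is Scrapy's public practice site):

```python
# Crawl a site and yield structured items, following pagination.
import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com/"]

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }
        # Follow "next page" links and parse them the same way.
        yield from response.follow_all(response.css("li.next a"), self.parse)
```

Run it with `scrapy runspider quotes_spider.py -O quotes.json` to write the items to a file.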
Dotnetcrawler
DotnetCrawler is a straightforward, lightweight web crawling/scraping library for .NET Core that writes its output through Entity Framework Core. It is designed along the lines of established crawler libraries such as WebMagic and Scrapy, while remaining extensible for custom requirements. Medium article: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Grawler
Grawler is a tool written in PHP with a web interface that automates Google dork queries, scrapes the results, and stores them in a file.
Dig Etl Engine
Download DIG to run on your laptop or server.
Arachnid
Powerful web scraping framework for Crystal
Crawling Projects
Web scraping and automation using Python.
Pdf downloader
A Scrapy Spider for downloading PDF files from a webpage.
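
A sketch of how such a spider typically looks in Scrapy (an illustration of the pattern, not necessarily this repo's exact code; the start URL is a placeholder):

```python
# Find every PDF linked from a page and save it to disk.
import scrapy

class PdfSpider(scrapy.Spider):
    name = "pdf"
    start_urls = ["https://example.com/reports"]

    def parse(self, response):
        # Select anchors whose href ends in .pdf and fetch each one.
        for href in response.css('a[href$=".pdf"]::attr(href)').getall():
            yield response.follow(href, callback=self.save_pdf)

    def save_pdf(self, response):
        filename = response.url.rsplit("/", 1)[-1]
        with open(filename, "wb") as f:
            f.write(response.body)  # raw PDF bytes
```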
Lulu
[Unmaintained] A simple and clean video/music/image downloader 👾
Scrapyrt
HTTP API for Scrapy spiders
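
Scrapyrt exposes spiders over HTTP; a sketch of a client call, assuming a Scrapyrt server running inside a Scrapy project on its default port 9080, with a hypothetical spider named "quotes":

```python
# Trigger a crawl through Scrapyrt's /crawl.json endpoint and read
# the items and stats from the JSON response.
import requests

resp = requests.get(
    "http://localhost:9080/crawl.json",
    params={"spider_name": "quotes", "url": "https://quotes.toscrape.com/"},
)
data = resp.json()
print(data["stats"])          # crawl statistics
for item in data["items"]:    # items yielded by the spider
    print(item)
```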
Scrapy Selenium
Scrapy middleware to handle JavaScript pages using Selenium.
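
A sketch of the documented scrapy-selenium setup; the driver name, executable path, and URL below are assumptions to adjust for your environment:

```python
# settings.py: wire the Selenium middleware into the project.
SELENIUM_DRIVER_NAME = "firefox"
SELENIUM_DRIVER_EXECUTABLE_PATH = "/usr/local/bin/geckodriver"
SELENIUM_DRIVER_ARGUMENTS = ["-headless"]  # run the browser headless
DOWNLOADER_MIDDLEWARES = {"scrapy_selenium.SeleniumMiddleware": 800}

# spider: issue SeleniumRequest instead of scrapy.Request so the page
# is rendered in the browser before the response reaches parse().
import scrapy
from scrapy_selenium import SeleniumRequest

class JsPageSpider(scrapy.Spider):
    name = "js_pages"

    def start_requests(self):
        yield SeleniumRequest(url="https://example.com", callback=self.parse)

    def parse(self, response):
        yield {"title": response.css("title::text").get()}
```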
Dataflowkit
Extract structured data from websites.
Crawly
Crawly, a high-level web crawling & scraping framework for Elixir.
Isp Data Pollution
ISP Data Pollution to Protect Private Browsing History with Obfuscation
Webster
A reliable, high-level web crawling & scraping framework for Node.js.
Spidermon
Scrapy extension for monitoring spider execution.
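
A sketch of a Spidermon monitor along the lines of its getting-started docs; the item-count threshold and the settings module path are illustrative:

```python
# A monitor that fails when a spider scrapes too few items.
from spidermon import Monitor, MonitorSuite, monitors

@monitors.name("Item count")
class ItemCountMonitor(Monitor):
    @monitors.name("Minimum number of items extracted")
    def test_minimum_number_of_items(self):
        items_extracted = getattr(self.data.stats, "item_scraped_count", 0)
        self.assertTrue(items_extracted >= 10,
                        msg="Extracted fewer than 10 items")

class SpiderCloseMonitorSuite(MonitorSuite):
    monitors = [ItemCountMonitor]

# settings.py wiring (the monitors module path is hypothetical):
# SPIDERMON_ENABLED = True
# EXTENSIONS = {"spidermon.contrib.scrapy.extensions.Spidermon": 500}
# SPIDERMON_SPIDER_CLOSE_MONITORS = ("myproject.monitors.SpiderCloseMonitorSuite",)
```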
Sasila
A flexible, friendly crawler framework.
Gopa
[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Apify Js
Apify SDK — The scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
Spidy
The simple, easy-to-use command-line web crawler.
Skycaiji
SkyCaiji (蓝天采集器) is a free data-collection and publishing crawler built with PHP + MySQL. It can be deployed on a cloud server, collects nearly every type of web page, integrates seamlessly with common CMS platforms, and publishes data in real time without logins or manual intervention. A fully cross-platform cloud crawler system for large-scale web data collection.
ARGUS
ARGUS is an easy-to-use web scraping tool built on the Scrapy Python framework. It can crawl a broad range of websites, performing tasks such as scraping text and collecting hyperlinks between sites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9
flink-crawler
Continuous scalable web crawler built on top of Flink and crawler-commons
img-cli
An interactive command-line interface built in Node.js for downloading one or more images to disk from a URL.
popular restaurants from officials
A project that analyzes Seoul city officials' business-expense records to find genuinely good restaurants.
serverless-instagram-crawler
A serverless Instagram hashtag crawler built on AWS Lambda and DynamoDB.
talospider
A simple, lightweight scraping micro-framework.
pomp
Screen scraping and web crawling framework
kasthack.osp
A generator of raw dumps of VK user data.
EngineeringTeam
A repository for organizing materials from the YBIGTA engineering team.
Mimo-Crawler
A web crawler that uses Firefox and JavaScript injection to interact with web pages and crawl their content, written in Node.js.
crawlkit
A crawler based on Phantom. Allows discovery of dynamic content and supports custom scrapers.
scrapy-distributed
A series of distributed components for Scrapy, including RabbitMQ-, Kafka-, and RedisBloom-based components.
custom-crawler
🌌 A high-productivity, semi-automatic crawler generator 🛠️🧰
go-scrapy
Web crawling and scraping framework for Golang
wget-lua
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.