A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.

Stars: ✭ 656 (+730.38%)

Mutual labels: web-crawler

Awesome Crawler

A collection of awesome web crawlers and spiders in different languages

Stars: ✭ 4,793 (+5967.09%)

Mutual labels: web-crawler

Spider Flow

A next-generation crawler platform that defines crawl flows graphically, so you can build a crawler without writing code.

Stars: ✭ 365 (+362.03%)

Mutual labels: web-crawler

Sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

Stars: ✭ 362 (+358.23%)

Mutual labels: web-crawler

Ache

ACHE is a web crawler for domain-specific search.

Stars: ✭ 320 (+305.06%)

Mutual labels: web-crawler

Supercrawler

A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.

Stars: ✭ 306 (+287.34%)

Mutual labels: web-crawler

Gopa

[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn

Stars: ✭ 277 (+250.63%)

Mutual labels: web-crawler

Spidy

The simple, easy-to-use command-line web crawler.

Stars: ✭ 257 (+225.32%)

Mutual labels: web-crawler

Lagoujob

Job data mining repo for lagou.com

Stars: ✭ 256 (+224.05%)

Mutual labels: web-crawler

UnChain

A tool to find redirection chains in multiple URLs

Stars: ✭ 77 (-2.53%)

Mutual labels: web-crawler

ComicBookMaker

Script to fetch webcomics and use them to create ebooks.

Stars: ✭ 27 (-65.82%)

Mutual labels: web-crawler

CrawlBox

An easy way to brute-force web directories.

Stars: ✭ 118 (+49.37%)

Mutual labels: web-crawler

flink-crawler

Continuous scalable web crawler built on top of Flink and crawler-commons

Stars: ✭ 48 (-39.24%)

Mutual labels: web-crawler

SchweizerMesser

🎯 A collection of hands-on Python 3 web crawlers and data analysis | Dangdang | NetEase Cloud Music | unsplash | Pizza Hut | Maoyan |

Stars: ✭ 89 (+12.66%)

Mutual labels: web-crawler

evine

Interactive CLI Web Crawler

Stars: ✭ 140 (+77.22%)

Mutual labels: web-crawler

pyCreeper

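An information-gathering (crawler) framework for quickly extracting web page content, with support for dynamically loading and controlling pages.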

Stars: ✭ 25 (-68.35%)

Mutual labels: web-crawler

proxi

Proxy pool. Finds and checks proxies, with a REST API for querying results. Can find over 25k proxies in under 5 minutes.

Stars: ✭ 32 (-59.49%)

Mutual labels: web-crawler

Mimo-Crawler

A web crawler that uses Firefox and JavaScript injection to interact with webpages and crawl their content, written in Node.js.

Stars: ✭ 22 (-72.15%)

Mutual labels: web-crawler

siteshooter

📷 Automate full website screenshots and PDF generation with multiple viewport support.

Stars: ✭ 63 (-20.25%)

Mutual labels: web-crawler

OLX Scraper

📻 An OLX scraper using Scrapy + MongoDB. It scrapes recent ads posted for the requested product and dumps them to MongoDB (NoSQL).

Stars: ✭ 15 (-81.01%)

Mutual labels: web-crawler

WebCrawler

Just a simple web crawler that returns crawled links as IObservable using Reactive Extensions and async/await.

Stars: ✭ 55 (-30.38%)

Mutual labels: web-crawler

learncpp-download

Scrape bot, to get you an offline copy of tutorials

Stars: ✭ 23 (-70.89%)

Mutual labels: web-crawler

bolsa

A Python library that makes it easier to access data on your stock market investments (B3/CEI) through the Portal CEI.

Stars: ✭ 46 (-41.77%)

Mutual labels: web-crawler

leek

Distributed task redisqueue (the simplest distributed function scheduling framework for Python)

Stars: ✭ 60 (-24.05%)

Mutual labels: web-crawler

json-web-crawler

Use JSON to list all elements (with CSS 3 and jQuery selectors) that you want to crawl.

Stars: ✭ 17 (-78.48%)

Mutual labels: web-crawler

StackOverflow-Crawler

A web crawler that crawls the Stack Overflow website (http://stackoverflow.com/) and finds the currently most popular technologies by collecting the tags of the newest questions asked on the site.

Stars: ✭ 25 (-68.35%)

Mutual labels: web-crawler

WeReadScan

A crawler that scans your purchased books on WeRead (微信读书) and downloads them as local PDFs

Stars: ✭ 273 (+245.57%)

Mutual labels: web-crawler

Raspagem-de-dados-para-iniciantes

Data scraping for beginners using Scrapy and other basic libraries

Stars: ✭ 113 (+43.04%)

Mutual labels: web-crawler

ant

A web crawler for Go

Stars: ✭ 264 (+234.18%)

Mutual labels: web-crawler

doc crawler.py

Explore a website recursively and download all the wanted documents (PDF, ODT…)

Stars: ✭ 22 (-72.15%)

Mutual labels: web-crawler

Market-Trend-Prediction

This is a project for a course on building knowledge graphs. It leverages historical stock prices and integrates social media listening from customers to predict the market trend of the Dow Jones Industrial Average (DJIA).

Stars: ✭ 57 (-27.85%)

Mutual labels: web-crawler

Strong Web Crawler

An advanced web crawler based on C#/.NET, PhantomJS, and Selenium. It can execute JavaScript code, trigger various events, and manipulate the page's DOM structure.

Stars: ✭ 238 (+201.27%)

Mutual labels: web-crawler

Kochat

Open-source Korean chatbot framework

Stars: ✭ 204 (+158.23%)

Mutual labels: web-crawler

Antch

Antch, a fast, powerful and extensible web crawling & scraping framework for Go

Stars: ✭ 198 (+150.63%)

Mutual labels: web-crawler

Nutch

Apache Nutch is an extensible and scalable web crawler

Stars: ✭ 2,277 (+2782.28%)

Mutual labels: web-crawler

Zhihu Crawler People

A simple distributed crawler for Zhihu, plus data analysis

Stars: ✭ 182 (+130.38%)

Mutual labels: web-crawler

Crawler Commons

A set of reusable Java components that implement functionality common to any web crawler

Stars: ✭ 173 (+118.99%)

Mutual labels: web-crawler

Abot

Cross Platform C# web crawler framework built for speed and flexibility. Please star this project! +1.

Stars: ✭ 1,961 (+2382.28%)

Mutual labels: web-crawler

Awesome Web Scraper

A collection of awesome web scrapers and crawlers.

Stars: ✭ 147 (+86.08%)

Mutual labels: web-crawler

Collector Http

Norconex HTTP Collector is a flexible web crawler for collecting, parsing, and manipulating data from the Internet (or Intranet) to various data repositories such as search engines.

Stars: ✭ 130 (+64.56%)

Mutual labels: web-crawler

Proxy

A simple tool for fetching usable proxies from several websites.

Stars: ✭ 124 (+56.96%)

Mutual labels: web-crawler

Crawlab Lite

Lite version of Crawlab, the crawler management platform.

Stars: ✭ 122 (+54.43%)

Mutual labels: web-crawler

Pspider

A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560