Norconex HTTP Collector is a flexible web crawler for collecting, parsing, and manipulating data from the Internet (or an intranet) and sending it to various data repositories such as search engines.

✭ 130

java search-engine flexible web-crawler

Proxy

A simple tool for fetching usable proxies from several websites.

✭ 124

python web-crawler proxies proxy-list proxypool

Crawlab Lite

Lite version of Crawlab, the crawler management platform.

✭ 122

vue crawler spider scrapy platform web-crawler

Pspider

A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560

✭ 1,611

python crawler spider multiprocessing multi-threading web-crawler proxies python-spider web-spider

Pulsar

Turn large websites into tables and charts using simple SQL queries.

✭ 100

html data-science selenium web-scraping web-crawler

Infinitycrawler

A simple but powerful web crawler library for .NET

✭ 97

hacktoberfest crawler web-crawler

Ultimate Dork

Web Crawler

✭ 79

python web-crawler

Ospider

Open-source tool for acquiring and preprocessing vector geographic data (POI/AOI/administrative divisions/road networks/land use)

✭ 74

python free download web-crawler poi

Cvpr2019

Displays all accepted CVPR 2019 papers in an easy-to-parse format.

✭ 65

python html computer-vision imagemagick web-crawler lda

Abotx

Cross-platform C# web crawler framework with a headless browser and parallel crawling. Please star this project!

✭ 63

csharp framework cross-platform spider netcore netstandard headless web-crawler

Terpene Profile Parser For Cannabis Strains

Parser and database that indexes the terpene profiles of different cannabis strains from online databases.

✭ 63

python database data-science crawler bioinformatics analysis scrapy health web-crawler

Crawlab

Distributed web crawler admin platform for managing spiders written in any language or framework.

✭ 8,392

go Dockerfile Makefile shell docker crawler spider scrapy platform web-crawler webcrawler scrapyd-ui webspider crawling-tasks crawlab spiders-management

Maman

Rust web crawler that saves pages to Redis

✭ 39

rust web http crawler spider web-crawler

Dutsso

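A Python module for quickly logging in to Dalian University of Technology's unified identity authentication system (SSO). It makes it easy to build features such as grade notifications, course-registration sniping, Yulan campus card lookups, and personal information queries.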

✭ 32

python sso web-crawler

Storm Crawler

A scalable, mature and versatile web crawler based on Apache Storm

✭ 703

java distributed web-crawler

Spidr

A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.

✭ 656

ruby web crawler spider scraper web-scraping web-crawler web-scraper

Awesome Crawler

A collection of awesome web crawlers and spiders in different languages

✭ 4,793

awesome crawler spider scraper web-crawler web-scraper node-crawler

Spider Flow

A next-generation crawler platform that defines crawler workflows graphically, so you can build a crawler without writing any code.

✭ 365

java crawler spider xpath web-crawler jsoup

Sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

✭ 362

java search spark distributed-systems big-data search-engine information-retrieval solr web-crawler

Ache

ACHE is a web crawler for domain-specific search.

✭ 320

java web-scraping web-crawler

Supercrawler

A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.

✭ 306

javascript crawler robot sitemap web-crawler

Gopa

[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn

✭ 277

go elasticsearch crawler spider lightweight scraping web-scraping crawling web-crawler

Spidy

The simple, easy-to-use command-line web crawler.

✭ 257

python python3 crawler crawling web-crawler

Lagoujob

Job data mining repo for lagou.com

✭ 256

python python3 machine-learning nlp data-analysis data-mining web-crawler

UnChain

A tool to find redirection chains in multiple URLs

✭ 77

go Nix url redirection web-crawler url-redirection reconnaissance

ComicBookMaker

Script to fetch webcomics and use them to create ebooks.

✭ 27

python web-crawler comic webcomic ebook beautiful-soup comics mobi comic-downloader download-comic kindle beautifulsoup

CrawlBox

An easy way to brute-force web directories.

✭ 118

python crawler web-crawler wordlist admin-finder

flink-crawler

Continuous scalable web crawler built on top of Flink and crawler-commons

✭ 48

java crawler spider web-crawler crawling flink web-crawling

SchweizerMesser

🎯 A collection of hands-on Python 3 web crawling and data analysis projects | Dangdang | NetEase Cloud Music | unsplash | Pizza Hut | Maoyan

✭ 89

HTML python spider web-crawler selenium

evine

Interactive CLI Web Crawler

✭ 140

go cli crawler data-mining scraper osint web-crawler fuzzing

pyCreeper

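An information-gathering (crawler) framework for quickly extracting web page content, with support for dynamically loading and controlling web pages.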

✭ 25

python phantomjs web-crawler gevent

proxi

Proxy pool that finds and checks proxies, with a REST API for querying the results. Can find over 25k proxies in under 5 minutes.

✭ 32

go shell Makefile Dockerfile crawler proxy web-crawler scraping http-proxy scrapy proxypool proxy-list

Mimo-Crawler

A web crawler written in Node.js that uses Firefox and JavaScript injection to interact with web pages and crawl their content.

✭ 22

javascript nodejs firefox crawler scraper framework browser webpage web-crawler crawling webcrawler webscraping xvfb mimo js-injection mimo-crawler web-spidering mimo-api crawl-webpages

siteshooter

📷 Automate full website screenshots and PDF generation with multiple viewport support.

✭ 63

javascript sitemap screenshot phantomjs seo web-crawler salesforce pdf-generation

OLX Scraper

📻 An OLX scraper using Scrapy + MongoDB. It scrapes recent ads posted for a requested product and dumps them into a MongoDB (NoSQL) database.

✭ 15

python data-science data machine-learning scraper mongodb nosql web-crawler pymongo web-scraper artificial-intelligence web-scraping scrapping scrapy scraping-websites web-crawling olx web-crawler-python nosql-mongodb

WebCrawler

A simple web crawler that returns crawled links as an IObservable using Reactive Extensions and async/await.

✭ 55

C# crawler dotnet web-crawler reactive-extension

learncpp-download

A scraping bot that gets you an offline copy of the LearnCpp tutorials

✭ 23

python Dockerfile pdf crawler offline web-crawler pdf-format learncpp learncpp-offline wwwlearncppcom learncpp-download learncpp-tutorials learncpp-content learncpp-python learncpp-crawler

bolsa

A Python library that makes it easier to access data about your stock exchange investments (B3/CEI) through the CEI Portal.

✭ 46

python Makefile crawler web-crawler portal asyncio b3 bolsa cei bolsa-de-valores

leek

Distributed task redisqueue (the simplest Python distributed function scheduling framework)

✭ 60

python redis kafka web-crawler thread-pool sqlite3 producer-consumer redisqueue distribute-crawler queue-tasks leek

json-web-crawler

Use JSON to list all the elements you want to crawl, using CSS3 and jQuery selectors.

✭ 17

javascript jquery json crawler web-crawler

StackOverflow-Crawler

A web crawler that crawls the Stack Overflow website (http://stackoverflow.com/) and finds the currently most popular technologies by collecting the tags of the newest questions asked on the site.

✭ 25

python web-crawler stackoverflow-crawler stackoverfolw-website

WeReadScan

A crawler that scans books purchased on WeChat Read and downloads them as local PDFs

✭ 273

python web-crawler selenium pdf-converter weread

Raspagem-de-dados-para-iniciantes

Web scraping for beginners using Scrapy and other basic libraries

✭ 113

python Jupyter Notebook opensource web-crawler jupyter-notebook scrapy hacktoberfest spyder estudo datascraping webcrawling raspagem-de-dados

ant

A web crawler for Go

✭ 264

go Makefile scraper spider web-crawler

doc crawler.py

Explore a website recursively and download all the wanted documents (PDF, ODT…)

✭ 22

crawler downloader web-crawler recursive file-download pdf-extractor web-crawler-python

Market-Trend-Prediction

A project for a knowledge graph building course. It combines historical stock prices with social media listening from customers to predict market trends for the Dow Jones Industrial Average (DJIA).

✭ 57

julia facebook twitter jupyter web-crawler prediction semantic-web knowledge-graph lstm yahoo-finance-api rnn twitter-crawler social-media-mining facebook-crawler djia dow-jones-industrial-average market-trend-prediction knowledge-graph-course

1-54 of 54 web-crawler projects