Kochat: Open-source Korean chatbot framework
Antch: Antch, a fast, powerful and extensible web crawling & scraping framework for Go
Nutch: Apache Nutch is an extensible and scalable web crawler
Crawler Commons: A set of reusable Java components that implement functionality common to any web crawler
Abot: Cross-platform C# web crawler framework built for speed and flexibility. Please star this project! +1.
Collector Http: Norconex HTTP Collector is a flexible web crawler for collecting, parsing, and manipulating data from the Internet (or Intranet) to various data repositories such as search engines.
Proxy: A simple tool for fetching usable proxies from several websites.
Pspider: A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560
Pulsar: Turn large websites into tables and charts using simple SQL.
Ospider: Open-source tool for acquiring and preprocessing vector geographic data (POI/AOI/administrative divisions/road networks/land use)
Cvpr2019: Displays all the 2019 CVPR accepted papers in a way that makes them easy to parse.
Abotx: Cross-platform C# web crawler framework, headless browser, parallel crawler. Please star this project! +1.
Crawlab: Distributed web crawler admin platform for managing spiders, regardless of language or framework.
Maman: Rust web crawler that saves pages to Redis
Dutsso: Python module for quickly logging in to the Dalian University of Technology unified identity authentication system (SSO); makes it easy to implement grade notifications, course grabbing, Yulan card information lookup, personal information queries, and similar features.
Storm Crawler: A scalable, mature and versatile web crawler based on Apache Storm
Spidr: A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
Awesome Crawler: A collection of awesome web crawlers and spiders in different languages
Sparkler: Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Ache: ACHE is a web crawler for domain-specific search.
Supercrawler: A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
Gopa: [WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Spidy: The simple, easy-to-use command-line web crawler.
Lagoujob: Job data mining repo for lagou.com
UnChain: A tool to find redirection chains in multiple URLs
ComicBookMaker: Script to fetch webcomics and use them to create ebooks.
CrawlBox: An easy way to brute-force web directories.
flink-crawler: Continuous scalable web crawler built on top of Flink and crawler-commons
SchweizerMesser: 🎯 Python 3 web crawling in practice and data analysis collection | Dangdang | NetEase Cloud Music | unsplash | Pizza Hut | Maoyan |
evine: Interactive CLI web crawler
pyCreeper: An information collection (crawler) framework for quickly extracting web page content, with support for dynamically loading and controlling web pages.
proxi: Proxy pool. Finds and checks proxies, with a REST API for querying results. Can find over 25k proxies in under 5 minutes.
Mimo-Crawler: A web crawler that uses Firefox and JS injection to interact with webpages and crawl their content, written in Node.js.
siteshooter: 📷 Automate full website screenshots and PDF generation with multiple viewport support.
OLX Scraper: 📻 An OLX scraper using Scrapy + MongoDB. It scrapes recent ads posted for a requested product and dumps them to MongoDB (NoSQL).
WebCrawler: A simple web crawler that returns crawled links as an IObservable using Reactive Extensions and async/await.
bolsa: Python library intended to make it easier to access data on your stock market investments (B3/CEI) through the Portal CEI.
leek: Distributed task Redis queue (the simplest Python distributed function scheduling framework)
json-web-crawler: Use JSON to list all elements (with CSS 3 and jQuery selectors) that you want to crawl.
StackOverflow-Crawler: A web crawler that crawls the Stack Overflow website (http://stackoverflow.com/) and finds the currently most popular technologies by collecting tag info from the newest questions asked on the site.
doc crawler.py: Explore a website recursively and download all the wanted documents (PDF, ODT…)
Market-Trend-Prediction: A project for a knowledge graph construction course. It leverages historical stock prices and integrates social media listening from customers to predict the market trend of the Dow Jones Industrial Average (DJIA).