liuslnlp / Crawler_examples

Licence: apache-2.0

Some classic web crawler projects.

Programming Languages

Python

Labels

web crawler spider

Projects that are alternatives to or similar to Crawler examples

Powerful web scraping framework for Crystal

Stars: ✭ 68 (-8.11%)

Mutual labels: crawler, spider

💐 Full Amazon Automatic Download

Stars: ✭ 41 (-44.59%)

Mutual labels: crawler, spider

Scraping scripts for various websites.

Stars: ✭ 25 (-66.22%)

Mutual labels: crawler, spider

Gospider - Fast web spider written in Go

Stars: ✭ 785 (+960.81%)

Mutual labels: crawler, spider

python crawler spider

Stars: ✭ 70 (-5.41%)

Mutual labels: crawler, spider

Dark Web OSINT Tool

Stars: ✭ 821 (+1009.46%)

Mutual labels: crawler, spider

Rust Web Crawler saving pages on Redis

Stars: ✭ 39 (-47.3%)

Mutual labels: crawler, spider

A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.

Stars: ✭ 656 (+786.49%)

Mutual labels: crawler, spider

Incredibly fast crawler designed for OSINT.

Stars: ✭ 8,332 (+11159.46%)

Mutual labels: crawler, spider

AV movie management system: a crawler for avmoo, javbus and javlibrary, an online AV film library, and an AV magnet-link database (Japanese Adult Video Library, Adult Video Magnet Links - Japanese Adult Video Database)

Stars: ✭ 8,133 (+10890.54%)

Mutual labels: crawler, spider

A high performance web crawler in Elixir.

Stars: ✭ 781 (+955.41%)

Mutual labels: crawler, spider

A Golang crawler that scrapes the used-car product listings on Autohome (汽车之家)

Stars: ✭ 57 (-22.97%)

Mutual labels: crawler, spider

🐾 Creeper - The Next Generation Crawler Framework (Go)

Stars: ✭ 762 (+929.73%)

Mutual labels: crawler, spider

zhihu-crawler is a high-performance, horizontally scalable, distributed crawler project written in Java, with support for a free HTTP proxy pool

Stars: ✭ 890 (+1102.7%)

Mutual labels: crawler, spider

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

Stars: ✭ 680 (+818.92%)

Mutual labels: crawler, spider

[DEPRECATED] Simple, flexible, delightful web crawler/spider package

Stars: ✭ 33 (-55.41%)

Mutual labels: crawler, spider

Baiduimagespider

A super-lightweight Baidu Images crawler

Stars: ✭ 591 (+698.65%)

Mutual labels: crawler, spider

A multi-thread crawler framework with many builtin image crawlers provided.

Stars: ✭ 629 (+750%)

Mutual labels: crawler, spider

Distributed web crawler admin platform for managing spiders, regardless of language or framework.

Stars: ✭ 8,392 (+11240.54%)

Mutual labels: crawler, spider

Awesome Python Primer

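An index of high-quality Chinese-language resources for teaching yourself Python, including books, documentation and videos, aimed at web crawling, web development, data analysis and machine learning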

Stars: ✭ 57 (-22.97%)

Mutual labels: crawler, spider


crawler examples

A collection of the small projects I built while learning web crawling.

Runtime environment

Windows/Linux/Mac OS

Python 3.5.2

Third-party libraries you may need

requests
bs4
pillow
lxml
pymongo
scrapy
numpy
redis
matplotlib

Other programs you may need

MongoDB
Redis

Contents

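Baidu_Picture : Baidu Tieba picture crawler that downloads every image in a given thread.
CSDN_Blog : CSDN blog crawler built on Scrapy. Starting from a start URL, it scrapes each post's title, content, author, modification date, tags and other details, saves them to a database, and then moves on to the next page.
DouBan_Movie_Top250 : Douban Movie crawler built on Scrapy that scrapes the Douban Movie Top 250 and stores it in MongoDB.
IT_Juzi : IT Juzi (IT桔子) crawler that scrapes recently funded companies and the latest major funding events from the IT Juzi website.
QiuShi : Qiushibaike crawler that periodically scrapes jokes from Qiushibaike and pushes them to your phone.
TaoBao_Lady : Taobao model (淘女郎) crawler that scrapes the models' profile information and photos.
Lagou : Lagou.com crawler that scrapes job postings from Lagou and generates a report.
ZhiHu : Zhihu crawler that scrapes Zhihu users' profile information; a distributed version is also provided.
Wikipedia : Wikipedia crawler that collects the IP addresses of contributors to Wikipedia articles and tallies their distribution by country; a distributed version is also provided.
DouBan_Movie : Douban Movie crawler that starts from Douban Movie's TOP page, scrapes all the movie information, and saves it in MongoDB.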
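The Baidu_Picture entry boils down to two steps: parse the thread's HTML for image URLs, then download each one. The following is a minimal sketch of the first step, not the repository's code. It uses only the standard library's `html.parser` instead of the `bs4` listed above, and it avoids f-strings and dict ordering so it also runs on Python 3.5. The optional CSS-class filter is an assumption: Tieba post images have historically carried a class such as `BDE_Image`, but check the live markup before relying on it. `ImageCollector` and `extract_image_urls` are hypothetical names, and downloading each URL (for example with `requests`) is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collects the src of <img> tags, optionally only those with a given CSS class."""

    def __init__(self, base_url, css_class=None):
        super().__init__()
        self.base_url = base_url
        self.css_class = css_class
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <img .../> also lands here via the default handle_startendtag
        if tag != "img":
            return
        attrs = dict(attrs)
        src = attrs.get("src")
        if not src:
            return
        classes = (attrs.get("class") or "").split()
        if self.css_class is None or self.css_class in classes:
            # Turn relative paths such as /static/x.png into absolute URLs
            self.urls.append(urljoin(self.base_url, src))


def extract_image_urls(html, base_url, css_class=None):
    """Return unique image URLs in the order they appear on the page."""
    parser = ImageCollector(base_url, css_class)
    parser.feed(html)
    parser.close()
    # De-duplicate with a set plus a list so order is kept on Python 3.5 as well
    seen, unique = set(), []
    for url in parser.urls:
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique
```

Passing `css_class` keeps avatars, icons and ads out of the result. Without it, every `<img>` on the page is returned.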

For detailed instructions, open the corresponding folder.
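For the Wikipedia crawler, the counting step can be sketched as follows. The sketch assumes the crawler has already collected the contributor names from an article's revision history. Anonymous edits have historically been attributed to the editor's IP address, so it keeps only names that parse as IPv4/IPv6 addresses and counts each address once. `tally_countries` and `lookup_country` are hypothetical names. `lookup_country` stands in for whatever IP-to-country lookup you use (for instance a GeoIP database), and the repository's own geolocation method may differ.

```python
import ipaddress
from collections import Counter


def is_ip(name):
    """True if a contributor name is a bare IPv4/IPv6 address, i.e. an anonymous edit."""
    try:
        ipaddress.ip_address(name)
        return True
    except ValueError:
        return False


def tally_countries(contributors, lookup_country):
    """Count distinct anonymous contributors per country.

    contributors: iterable of usernames/IPs taken from revision histories.
    lookup_country: hypothetical callable mapping an IP string to a country name.
    """
    # A set, so an IP that edited many times is counted only once
    anonymous = {name for name in contributors if is_ip(name)}
    return Counter(lookup_country(ip) for ip in anonymous)
```

Counting distinct addresses rather than edits is a design choice. To weight prolific editors, count over the raw list instead of the set.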


Stars: ✭ 74
