
liuslnlp / Crawler_examples

Licence: apache-2.0
Some classic web crawler projects.

Programming Languages

python

Projects that are alternatives to or similar to Crawler examples

Arachnid
Powerful web scraping framework for Crystal
Stars: ✭ 68 (-8.11%)
Mutual labels:  crawler, spider
Lizard
💐 Full Amazon Automatic Download
Stars: ✭ 41 (-44.59%)
Mutual labels:  crawler, spider
Scrapit
Scraping scripts for various websites.
Stars: ✭ 25 (-66.22%)
Mutual labels:  crawler, spider
Gospider
Gospider - Fast web spider written in Go
Stars: ✭ 785 (+960.81%)
Mutual labels:  crawler, spider
Spider
python crawler spider
Stars: ✭ 70 (-5.41%)
Mutual labels:  crawler, spider
Torbot
Dark Web OSINT Tool
Stars: ✭ 821 (+1009.46%)
Mutual labels:  crawler, spider
Maman
Rust Web Crawler saving pages on Redis
Stars: ✭ 39 (-47.3%)
Mutual labels:  crawler, spider
Spidr
A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
Stars: ✭ 656 (+786.49%)
Mutual labels:  crawler, spider
Photon
Incredibly fast crawler designed for OSINT.
Stars: ✭ 8,332 (+11159.46%)
Mutual labels:  crawler, spider
Avbook
AV movie management system: crawlers for avmoo, javbus, and javlibrary, an online AV film library, and an AV magnet-link database. Japanese Adult Video Library, Adult Video Magnet Links - Japanese Adult Video Database
Stars: ✭ 8,133 (+10890.54%)
Mutual labels:  crawler, spider
Crawler
A high performance web crawler in Elixir.
Stars: ✭ 781 (+955.41%)
Mutual labels:  crawler, spider
Car Prices
A Golang crawler that scrapes the used-car product database of Autohome (汽车之家)
Stars: ✭ 57 (-22.97%)
Mutual labels:  crawler, spider
Creeper
🐾 Creeper - The Next Generation Crawler Framework (Go)
Stars: ✭ 762 (+929.73%)
Mutual labels:  crawler, spider
Zhihu Crawler
zhihu-crawler is a high-performance, horizontally scalable, distributed crawler project written in Java, with support for a free HTTP proxy pool
Stars: ✭ 890 (+1102.7%)
Mutual labels:  crawler, spider
Grab Site
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
Stars: ✭ 680 (+818.92%)
Mutual labels:  crawler, spider
Nodespider
[DEPRECATED] Simple, flexible, delightful web crawler/spider package
Stars: ✭ 33 (-55.41%)
Mutual labels:  crawler, spider
Baiduimagespider
An ultra-lightweight Baidu Images crawler
Stars: ✭ 591 (+698.65%)
Mutual labels:  crawler, spider
Icrawler
A multi-thread crawler framework with many builtin image crawlers provided.
Stars: ✭ 629 (+750%)
Mutual labels:  crawler, spider
Crawlab
Distributed web crawler admin platform for managing spiders, regardless of language or framework.
Stars: ✭ 8,392 (+11240.54%)
Mutual labels:  crawler, spider
Awesome Python Primer
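An index of high-quality Chinese-language resources for teaching yourself Python, including books / documentation / videos, for crawling / web / data analysis / machine learning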
Stars: ✭ 57 (-22.97%)
Mutual labels:  crawler, spider

crawler examples

A collection of the small projects I built while learning web crawling.

Runtime environment

Windows/Linux/Mac OS

Python 3.5.2

Third-party libraries you may need

  • requests
  • bs4
  • pillow
  • lxml
  • pymongo
  • scrapy
  • Numpy
  • redis
  • matplotlib

Other programs you may need

  • MongoDB
  • Redis

Contents

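  • Baidu_Picture: a Baidu Tieba image crawler that downloads all images in a given thread.
  • CSDN_Blog: a CSDN blog crawler built on scrapy. Starting from a start URL, it extracts each post's title, content, author, modification date, tags, and similar fields, saves them to a database, and then moves on to the next page to continue crawling.
  • DouBan_Movie_Top250: a Douban Movie crawler built on scrapy that scrapes the Douban Movie Top 250 and stores it in MongoDB.
  • IT_Juzi: an IT Juzi (IT桔子) crawler that scrapes recently funded companies and recent major funding events from the IT Juzi website.
  • QiuShi: a Qiushibaike (糗事百科) crawler that periodically scrapes jokes from Qiushibaike and pushes them to your phone.
  • TaoBao_Lady: a Taobao model (淘女郎) crawler that scrapes the models' profile information and photos.
  • Lagou: a Lagou (拉勾网) crawler that scrapes job postings from Lagou and generates a report.
  • ZhiHu: a Zhihu crawler that scrapes Zhihu users' profile information; a distributed version is also provided.
  • Wikipedia: a Wikipedia crawler that collects the IP addresses of article contributors and computes their distribution by country; a distributed version is also provided.
  • DouBan_Movie: a Douban Movie crawler that starts from the Douban Movie TOP page, scrapes information on all the movies, and stores it in MongoDB.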
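Baidu_Picture's core step is collecting every image URL from a thread page. The sketch below illustrates that step using only the standard library (`html.parser` and `urllib.parse`) instead of the requests/bs4 stack listed above. `ImageLinkParser` and `extract_image_urls` are hypothetical names, and the sketch assumes images appear as plain `<img src=...>` tags, which may not match real Tieba markup (lazy-loaded images, for example, may keep the real URL in a different attribute).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkParser(HTMLParser):
    """Collects the src of every <img> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        # attrs is a list of (name, value) pairs; value may be None
        src = dict(attrs).get("src")
        if src:
            url = urljoin(self.base_url, src)  # turn relative paths into absolute URLs
            if url not in self.image_urls:     # keep first-seen order, drop duplicates
                self.image_urls.append(url)


def extract_image_urls(html, base_url):
    """Return the unique, absolute image URLs found in an HTML page."""
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.image_urls


# Usage on a hypothetical page snippet:
page = ('<div><img src="/a.jpg"><img src="https://img.example.com/b.png"/>'
        '<img src="/a.jpg"><img alt="no src"></div>')
print(extract_image_urls(page, "https://tieba.example.com/p/1"))
# ['https://tieba.example.com/a.jpg', 'https://img.example.com/b.png']
```

Downloading each URL (for example with requests) and writing the bytes to disk would complete the crawler.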
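CSDN_Blog and DouBan_Movie both follow the same pattern: crawl a page, extract records, then move on to the next page. A minimal, framework-free sketch of that loop follows. `crawl_pages`, `fetch`, and `parse` are hypothetical names, and the site-specific parsing is left to the caller. In the actual scrapy-based projects, the same effect is typically achieved by yielding a request for the next-page URL from the spider's parse callback.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# A page parser returns (records found on the page, URL of the next page or None).
ParseResult = Tuple[List[dict], Optional[str]]


def crawl_pages(start_url: str,
                fetch: Callable[[str], str],
                parse: Callable[[str], ParseResult],
                max_pages: int = 100) -> Iterator[dict]:
    """Yield records page by page, following 'next page' links from start_url."""
    seen = set()
    url = start_url
    pages = 0
    while url is not None and url not in seen and pages < max_pages:
        seen.add(url)             # guard against pagination loops
        pages += 1
        records, url = parse(fetch(url))
        for record in records:
            yield record


# Usage with fake pages: fetch returns the URL itself, parse looks it up.
fake_site = {
    "page1": ([{"title": "A"}], "page2"),
    "page2": ([{"title": "B"}, {"title": "C"}], None),
}
print(list(crawl_pages("page1", fetch=lambda u: u, parse=lambda html: fake_site[html])))
# [{'title': 'A'}, {'title': 'B'}, {'title': 'C'}]
```

Passing `fetch` in as a function keeps network access separate from the pagination logic, so the loop can be exercised without touching a real site.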
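The Wikipedia project's statistics step boils down to three operations: keep only contributors whose name is an IP address, map each IP to a country, and count. The sketch below assumes, as the project description implies, that anonymous contributors are listed under their IP address. `country_distribution` is a hypothetical name, and `lookup_country` stands in for whatever IP geolocation source you use; how the original project resolves countries is not described here.

```python
import ipaddress
from collections import Counter
from typing import Callable, Iterable, Optional


def is_ip_address(name: str) -> bool:
    """True if a contributor name parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(name)
        return True
    except ValueError:
        return False


def country_distribution(contributors: Iterable[str],
                         lookup_country: Callable[[str], Optional[str]]) -> Counter:
    """Count distinct IP contributors per country (hypothetical lookup_country)."""
    counts = Counter()
    for name in set(contributors):   # count each distinct IP once
        if is_ip_address(name):
            counts[lookup_country(name) or "Unknown"] += 1
    return counts


# Usage with a hypothetical lookup table in place of a real geolocation source:
lookup = {"203.0.113.5": "AU", "198.51.100.7": "US", "2001:db8::1": "US"}.get
names = ["Alice", "203.0.113.5", "198.51.100.7", "2001:db8::1", "203.0.113.5", "Bob"]
print(country_distribution(names, lookup))
```

Counting each distinct IP once is a design choice. To weight by number of edits instead, iterate over `contributors` directly rather than `set(contributors)`.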

For detailed instructions, see each project's folder.
