
ld000 / spider

Licence: other
Python crawlers (amazon, confluence ...)

Programming Languages

python
139335 projects - #7 most used programming language

Labels

Projects that are alternatives of or similar to spider

weixin article spiders
A spider program for Weixin, built with Express and cheerio
Stars: ✭ 33 (+57.14%)
Mutual labels:  spider
dcard-spider
A spider on Dcard. Strong and speedy.
Stars: ✭ 91 (+333.33%)
Mutual labels:  spider
ant
A web crawler for Go
Stars: ✭ 264 (+1157.14%)
Mutual labels:  spider
TaobaoSpider
This taobao spider has been archived
Stars: ✭ 28 (+33.33%)
Mutual labels:  spider
php-crawler
🕷️ A simple crawler (spider) written in PHP just for fun, with zero dependencies
Stars: ✭ 39 (+85.71%)
Mutual labels:  spider
sede
Text-to-SQL in the Wild: A Naturally-Occurring Dataset Based on Stack Exchange Data
Stars: ✭ 83 (+295.24%)
Mutual labels:  spider
gospider
⚡ Lightweight Golang spider framework
Stars: ✭ 183 (+771.43%)
Mutual labels:  spider
crawlBaiduWenku
This may be the most complete project for crawling Baidu Wenku
Stars: ✭ 63 (+200%)
Mutual labels:  spider
tuchong Spider
⭐ Crawler for Tuchong (图虫网)
Stars: ✭ 16 (-23.81%)
Mutual labels:  spider
SpiderCard
Spider Solitaire for Mac
Stars: ✭ 29 (+38.1%)
Mutual labels:  spider
ZSpider
An Electron-based crawler program
Stars: ✭ 37 (+76.19%)
Mutual labels:  spider
gathertool
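gathertool is a scripting-oriented development library for Golang that aims to make development for its target scenarios more efficient; it includes a lightweight crawler library, an API testing and load testing library, a DB operations library, and more.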
Stars: ✭ 36 (+71.43%)
Mutual labels:  spider
glyphhanger
Your web font utility belt. It can subset web fonts. It can find unicode-ranges for you automatically. It makes julienne fries.
Stars: ✭ 422 (+1909.52%)
Mutual labels:  spider
grapy
Grapy, a fast high-level web crawling framework for Python 3.3 or later, based on asyncio.
Stars: ✭ 18 (-14.29%)
Mutual labels:  spider
Novel-crawler
A novel crawler written in Python
Stars: ✭ 75 (+257.14%)
Mutual labels:  spider
crawler-chrome-extensions
Chrome extensions commonly used by crawler developers
Stars: ✭ 53 (+152.38%)
Mutual labels:  spider
spider-mzitu
Mzitu (妹子图)
Stars: ✭ 13 (-38.1%)
Mutual labels:  spider
blinkist-m4a-downloader
Grabs all of the audio files from all of the Blinkist books
Stars: ✭ 100 (+376.19%)
Mutual labels:  spider
bet365-websocket-crawler
bet365 bot: real-time match scores and real-time odds from bet365
Stars: ✭ 67 (+219.05%)
Mutual labels:  spider
DeadPool
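A crawler application built on Celery as its core framework. Crawl tasks can be added flexibly, and multiple sites can be crawled at the same time. All components natively support concurrency at scale and distributed operation; combined with Celery's native distributed task invocation, this enables large-scale concurrency.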
Stars: ✭ 38 (+80.95%)
Mutual labels:  spider

spider

normal spider

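iushibaike_spider.py: crawls the front-page content of Qiushibaike (糗事百科).

tieba_spider.py: crawls Baidu Tieba posts floor by floor.

location_code_spider.py: crawls administrative division codes from the National Bureau of Statistics and outputs INSERT SQL statements.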
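
The last step of location_code_spider.py, turning scraped rows into INSERT statements, could look roughly like the sketch below. This is not the script's actual code: the table name `location_code`, the columns `code` and `name`, the helper `to_insert_sql`, and the sample rows are all assumptions made for illustration.

```python
# -*- coding: utf-8 -*-
# Hypothetical sketch: render scraped (code, name) pairs as INSERT statements.
# Table and column names are assumptions, not taken from the repository.


def to_insert_sql(rows, table="location_code"):
    """Return one INSERT statement per (code, name) pair."""
    statements = []
    for code, name in rows:
        # Double any single quote so the name stays a valid SQL string literal.
        safe_name = name.replace("'", "''")
        statements.append(
            "INSERT INTO {0} (code, name) VALUES ('{1}', '{2}');".format(
                table, code, safe_name
            )
        )
    return statements


# Sample rows for illustration only.
rows = [("110000", "Beijing"), ("310000", "Shanghai")]
for stmt in to_insert_sql(rows):
    print(stmt)
```

String formatting keeps the sketch short; if the rows were written straight to a database rather than to a .sql file, parameterized queries would be the safer choice.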

scrapy spider

Requires Python 2.7 and Scrapy 1.0+.

how to use

cd confluence
scrapy crawl confluence

amazonsims

Crawls Amazon's "customers also bought" lists.

confluence

Edit the allowed_domains, start_urls, base_url, and cookies parameters in spider.py.

For example:

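# Domain the spider is allowed to crawl
allowed_domains = ["www.confluence.com"]
# Page where the crawl starts
start_urls = [
    'http://www.confluence.com/dashboard.action',
]
base_url = 'http://www.confluence.com'
# Cookies copied from a logged-in browser session
cookies = {
    'JSESSIONID': '338CACC64F0C6C9CA88550EAB7978674',
    'doc-sidebar': '300px',
}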

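JSESSIONID is the session ID found in your cookies after logging in. This is handled in a simple way here: page login is not implemented, so implement it yourself if you need it.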
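
To make the role of these settings more concrete, here is a standard-library-only sketch of what they amount to on the wire. It is not the spider's code: the helpers `cookie_header`, `absolute_url`, and `is_allowed` and the sample link are hypothetical. In Scrapy itself the dictionary would more likely be passed through the `cookies` argument of `Request`, and Scrapy's `allowed_domains` also matches subdomains, which this exact-match check does not.

```python
try:
    from urllib.parse import urljoin, urlparse  # Python 3
except ImportError:
    from urlparse import urljoin, urlparse  # Python 2.7

# Values copied from the example configuration above.
allowed_domains = ["www.confluence.com"]
base_url = 'http://www.confluence.com'
cookies = {
    'JSESSIONID': '338CACC64F0C6C9CA88550EAB7978674',
    'doc-sidebar': '300px',
}


def cookie_header(cookies):
    """Serialize the cookie dict the way it appears in a Cookie header."""
    return "; ".join("{0}={1}".format(k, cookies[k]) for k in sorted(cookies))


def absolute_url(href):
    """Resolve a relative link found on a page against base_url."""
    return urljoin(base_url, href)


def is_allowed(url):
    """Simplified domain check: exact host match only."""
    return urlparse(url).netloc in allowed_domains


# Hypothetical relative link, for illustration only.
link = absolute_url('/pages/viewpage.action?pageId=1')
print(cookie_header(cookies))
print(link, is_allowed(link))
```

Because the session cookie is pasted in by hand, the crawl stops working once the session expires; a fresh JSESSIONID then has to be copied from the browser.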

babynames

https://www.familyeducation.com/baby-names/browse-origin/surname

Crawls personal names from various countries.
