ld000 / spider

Licence: other

Python spiders (Amazon, Confluence, ...)

Programming Languages

python


Projects that are alternatives of or similar to spider

weixin article spiders

A spider program for Weixin, built with Express & cheerio

Stars: ✭ 33 (+57.14%)

Mutual labels: spider

dcard-spider

A spider on Dcard. Strong and speedy.

Stars: ✭ 91 (+333.33%)

Mutual labels: spider

ant

A web crawler for Go

Stars: ✭ 264 (+1157.14%)

Mutual labels: spider

TaobaoSpider

This taobao spider has been archived

Stars: ✭ 28 (+33.33%)

Mutual labels: spider

php-crawler

🕷️ A simple crawler (spider) written in PHP just for fun, with zero dependencies

Stars: ✭ 39 (+85.71%)

Mutual labels: spider

sede

Text-to-SQL in the Wild: A Naturally-Occurring Dataset Based on Stack Exchange Data

Stars: ✭ 83 (+295.24%)

Mutual labels: spider

gospider

⚡ Lightweight Golang spider framework

Stars: ✭ 183 (+771.43%)

Mutual labels: spider

crawlBaiduWenku

Possibly the most complete project for crawling Baidu Wenku

Stars: ✭ 63 (+200%)

Mutual labels: spider

tuchong Spider

⭐ A spider for the Tuchong website

Stars: ✭ 16 (-23.81%)

Mutual labels: spider

SpiderCard

Spider Solitaire for Mac

Stars: ✭ 29 (+38.1%)

Mutual labels: spider

ZSpider

An Electron-based spider program

Stars: ✭ 37 (+76.19%)

Mutual labels: spider

gathertool

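gathertool is a scripting-style development library for Golang, intended to speed up development for the scenarios it targets: a lightweight crawler library, an API testing and stress testing library, a DB operations library, and more.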

Stars: ✭ 36 (+71.43%)

Mutual labels: spider

glyphhanger

Your web font utility belt. It can subset web fonts. It can find unicode-ranges for you automatically. It makes julienne fries.

Stars: ✭ 422 (+1909.52%)

Mutual labels: spider

grapy

Grapy, a fast high-level web crawling framework for Python 3.3 or later, based on asyncio.

Stars: ✭ 18 (-14.29%)

Mutual labels: spider

Novel-crawler

A novel crawler written in Python

Stars: ✭ 75 (+257.14%)

Mutual labels: spider

crawler-chrome-extensions

Chrome extensions commonly used by crawler developers

Stars: ✭ 53 (+152.38%)

Mutual labels: spider

spider-mzitu

Mzitu

Stars: ✭ 13 (-38.1%)

Mutual labels: spider

blinkist-m4a-downloader

Grabs all of the audio files from all of the Blinkist books

Stars: ✭ 100 (+376.19%)

Mutual labels: spider

bet365-websocket-crawler

bet365 bot: live match scores and live odds from bet365

Stars: ✭ 67 (+219.05%)

Mutual labels: spider

DeadPool

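A crawler application built around Celery as its core framework. Crawl tasks can be added flexibly and multiple sites can be crawled at the same time. All components natively support scaled concurrency and distribution, and together with Celery's native distributed calls this enables large-scale concurrency.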

Stars: ✭ 38 (+80.95%)

Mutual labels: spider


spider

normal spider

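iushibaike_spider.py: crawls the front-page content of Qiushibaike

tieba_spider.py: crawls Baidu Tieba posts floor by floor

location_code_spider.py: crawls administrative division codes from the national statistics bureau and outputs INSERT SQL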
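The "outputs INSERT SQL" step can be illustrated with a minimal sketch. It is not the code in location_code_spider.py: the table name `location_code`, the column names `code` and `name`, and the helper functions are all hypothetical, and it is written for Python 3 rather than the Python 2.7 the Scrapy spiders require.

```python
def sql_quote(value):
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def to_insert_sql(rows, table="location_code"):
    """Build one INSERT statement per (code, name) pair.

    The table and column names are assumptions for illustration only.
    """
    statements = []
    for code, name in rows:
        statements.append(
            "INSERT INTO {} (code, name) VALUES ({}, {});".format(
                table, sql_quote(code), sql_quote(name)
            )
        )
    return statements


if __name__ == "__main__":
    # Sample rows standing in for scraped (code, name) pairs.
    rows = [("110000", "Beijing"), ("120000", "Tianjin")]
    for statement in to_insert_sql(rows):
        print(statement)
```

Doubling single quotes is only enough for generating a static SQL file. When inserting directly into a database, parameterized queries are the safer choice.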

scrapy spider

Requires Python 2.7 and Scrapy 1.0+

how to use

cd confluence
scrapy crawl confluence

amazonsims

Amazon's "customers who bought this item also bought" list

confluence

Edit the allowed_domains, start_urls, base_url, and cookies parameters in spider.py

e.g.

allowed_domains = ["www.confluence.com"]
start_urls = [
    'http://www.confluence.com/dashboard.action',
]
base_url = 'http://www.confluence.com'
cookies = {
    'JSESSIONID': '338CACC64F0C6C9CA88550EAB7978674',
    'doc-sidebar': '300px',
}

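JSESSIONID is the session ID found in your cookies after logging in. This is handled in a simple way here: page login is not implemented, so implement it yourself if you need it.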
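To show roughly what the settings above are for, here is a standard-library sketch, not the spider's actual code and not Scrapy's implementation. The helpers `cookie_header` and `resolve` are hypothetical names, the session ID is a placeholder, and the code targets Python 3. Scrapy's own `allowed_domains` check also accepts subdomains, whereas this sketch only matches the host exactly.

```python
from urllib.parse import urljoin, urlparse

# Same shape as the settings in spider.py; the session ID is a placeholder.
allowed_domains = ["www.confluence.com"]
base_url = "http://www.confluence.com"
cookies = {"JSESSIONID": "<your session id>", "doc-sidebar": "300px"}


def cookie_header(cookie_dict):
    """Serialize a cookie dict into the value of a Cookie request header."""
    return "; ".join("{}={}".format(k, v) for k, v in cookie_dict.items())


def resolve(href):
    """Resolve a link against base_url; return None if it leaves allowed_domains."""
    url = urljoin(base_url, href)
    return url if urlparse(url).hostname in allowed_domains else None


if __name__ == "__main__":
    print(cookie_header(cookies))
    print(resolve("/dashboard.action"))
    print(resolve("http://example.org/elsewhere"))
```

Sending the logged-in session cookie with every request is what lets the crawl skip the login page. Once the session expires, the JSESSIONID value has to be replaced by hand.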

babynames

https://www.familyeducation.com/baby-names/browse-origin/surname

Crawls personal names from various countries
