
Bin-Huang / Nodespider

Licence: apache-2.0
[DEPRECATED] Simple, flexible, delightful web crawler/spider package

Programming Languages

typescript

Projects that are alternatives to or similar to Nodespider

Python3 Spider
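Python crawlers in practice: simulated login for major websites, including but not limited to slider CAPTCHA verification, Pinduoduo, Meituan, Baidu, bilibili, Dianping, and Taobao. If you like it, please star ❤️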
Stars: ✭ 2,129 (+6351.52%)
Mutual labels:  crawler, spider, crawl
Grab Site
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
Stars: ✭ 680 (+1960.61%)
Mutual labels:  crawler, spider, crawl
Go spider
[Crawler framework (golang)] An awesome Go concurrent crawler (spider) framework. The crawler is flexible and modular. It can easily be extended into a customized crawler, or you can use only the default crawl components.
Stars: ✭ 1,745 (+5187.88%)
Mutual labels:  crawler, spider, pipeline
Proxy pool
Proxy IP pool for Python crawlers (proxy pool)
Stars: ✭ 13,964 (+42215.15%)
Mutual labels:  crawler, spider, crawl
Fooproxy
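A robust, efficient, score-based, target-specific IP proxy pool with an API service. You can plug in your own collectors to crawl proxy IPs at any time, and it builds a separate database of working proxies for each of your crawler's target websites. Supports MongoDB 4.0 and uses Python 3.7.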
Stars: ✭ 195 (+490.91%)
Mutual labels:  async, crawler, spider
Zhihu Login
Simulated Zhihu login, with support for extracting CAPTCHAs and saving cookies
Stars: ✭ 340 (+930.3%)
Mutual labels:  crawler, spider, crawl
Ok ip proxy pool
🍿 Proxy IP pool for crawlers (proxy pool), in Python 🍟 A reasonably OK IP proxy pool
Stars: ✭ 196 (+493.94%)
Mutual labels:  async, crawler, spider
Fbcrawl
A Facebook crawler
Stars: ✭ 536 (+1524.24%)
Mutual labels:  crawler, spider, crawl
Icrawler
A multi-thread crawler framework with many builtin image crawlers provided.
Stars: ✭ 629 (+1806.06%)
Mutual labels:  crawler, spider
P Map
Map over promises concurrently
Stars: ✭ 639 (+1836.36%)
Mutual labels:  async, promise
Spidr
A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
Stars: ✭ 656 (+1887.88%)
Mutual labels:  crawler, spider
Infospider
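INFO-SPIDER is a crawler toolbox 🧰 that brings many data sources together, designed to help users retrieve their own data safely and quickly. The tool's code is open source and its process is transparent. Supported data sources include GitHub, QQ Mail, NetEase Mail, Alibaba Mail, Sina Mail, Hotmail, Outlook, JD.com, Taobao, Alipay, China Mobile, China Unicom, China Telecom, Zhihu, bilibili, NetEase Cloud Music, QQ friends, QQ groups, WeChat Moments album generation, browser history, 12306, Cnblogs, CSDN blogs, OSChina blogs, and Jianshu.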
Stars: ✭ 5,984 (+18033.33%)
Mutual labels:  spider, crawl
Baiduimagespider
A super lightweight Baidu image crawler
Stars: ✭ 591 (+1690.91%)
Mutual labels:  crawler, spider
Newcrawler
Free Web Scraping Tool with Java
Stars: ✭ 589 (+1684.85%)
Mutual labels:  crawler, spider
Creeper
🐾 Creeper - The Next Generation Crawler Framework (Go)
Stars: ✭ 762 (+2209.09%)
Mutual labels:  crawler, spider
Wx Promise Pro
✨ A powerful, elegant async library for WeChat Mini Programs 🚀
Stars: ✭ 762 (+2209.09%)
Mutual labels:  async, promise
Douyin
API of DouYin for Humans used to Crawl Popular Videos and Musics
Stars: ✭ 580 (+1657.58%)
Mutual labels:  crawler, spider
Awaitkit
The ES8 Async/Await control flow for Swift
Stars: ✭ 709 (+2048.48%)
Mutual labels:  async, promise
Crawler
A high performance web crawler in Elixir.
Stars: ✭ 781 (+2266.67%)
Mutual labels:  crawler, spider
Ws Promise Client
PROJECT MOVED: https://github.com/kdex/ws-promise
Stars: ✭ 6 (-81.82%)
Mutual labels:  async, promise

[DEPRECATED]

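[No longer maintained]


(In development; some interfaces may still change, though this is unlikely.)

NodeSpider is a next-generation web crawler framework built on Node.js.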

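Features

  • Works out of the box: build a fully featured crawler with minimal code
  • Plans (the rules) are kept separate from tasks, so even complex crawling requirements are easy to implement
  • Simple, easy-to-use data pipes make saving scraped data effortless
  • Automatic UTF-8 conversion, jQuery-style selectors, and the other handy small features you would expect
  • Excellent performance, with full control over asynchronous concurrency
  • Rich yet minimal extension interfaces that fit together like building blocks
  • Supports modern promises and async functions
  • More features waiting to be discovered...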
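The separation of plans from tasks, combined with a concurrency cap, can be illustrated with a small self-contained sketch. This is not NodeSpider's implementation: `MiniScheduler`, its methods, the `demo` function, and the example URLs are all hypothetical names chosen for illustration, and the network request is replaced by a timer.

```javascript
// Hypothetical sketch; MiniScheduler is an illustrative name, not part of nodespider.
class MiniScheduler {
  constructor(concurrency) {
    this.concurrency = concurrency;
    this.plans = new Map(); // plan name -> async handler (the rules)
    this.queue = [];        // pending tasks: { planName, url }
    this.active = 0;        // handlers currently running
    this.peak = 0;          // highest number of handlers seen running at once
  }

  // Register a named plan: how a page of this kind should be handled.
  plan(name, handle) {
    this.plans.set(name, handle);
  }

  // Queue a task: which URL to process, and under which plan.
  add(planName, url) {
    if (!this.plans.has(planName)) {
      throw new Error(`unknown plan: ${planName}`);
    }
    this.queue.push({ planName, url });
  }

  // Start `concurrency` workers; each one pulls tasks until the queue is empty.
  async run() {
    const worker = async () => {
      while (this.queue.length > 0) {
        const task = this.queue.shift();
        this.active++;
        this.peak = Math.max(this.peak, this.active);
        try {
          await this.plans.get(task.planName)(task.url);
        } finally {
          this.active--;
        }
      }
    };
    await Promise.all(Array.from({ length: this.concurrency }, worker));
  }
}

// Assumed usage: one plan, four tasks, at most two handlers in flight.
async function demo() {
  const visited = [];
  const scheduler = new MiniScheduler(2);
  scheduler.plan("extract", async (url) => {
    await new Promise((resolve) => setTimeout(resolve, 10)); // stand-in for a request
    visited.push(url);
  });
  for (let i = 1; i <= 4; i++) {
    scheduler.add("extract", `https://example.com/page/${i}`);
  }
  await scheduler.run();
  return { visited, peak: scheduler.peak };
}

demo().then(({ visited, peak }) => console.log(visited.length, peak));
```

Keeping the handler under a plan name means new URLs can be queued from anywhere, including from inside a handler, without repeating the extraction logic.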

Install

npm:

npm install nodespider --save

yarn:

yarn add nodespider

Example

const { Spider, jqPlan, csvPipe } = require("nodespider")
const s = new Spider()

// Declare a data pipe
s.pipe(csvPipe({
  name: "data",
  path: "./data.csv",
  items: ["url", "count"],
}))

// Declare a crawl plan
s.plan(jqPlan({
  name: "extract",
  toUtf8: true, // automatically convert the response to UTF-8
  retries: 3, // retry automatically on failure
  handle: ($, current) => {
    const title = $("title").text() // use any jQuery-style selector you like
    console.log(title)
    s.save("data", {
      count: $("body").text().length,
      url: current.url,
    }) // save data through the pipe
    s.addU("extract", $("a").urls())  // add new tasks
  },
}))

s.add("extract", "https://github.com/Bin-Huang/NodeSpider") // add a new task
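To make the role of the `items` option more concrete, here is a minimal sketch of how a list of column names can turn saved records into CSV lines. It is an assumption about the general idea, not NodeSpider's `csvPipe` code: `escapeCsv` and `createCsvFormatter` are hypothetical helpers, and writing to a file is left out.

```javascript
// Hypothetical sketch, not nodespider's csvPipe: format records as CSV lines
// using a fixed list of column names, like the `items` option above.
function escapeCsv(value) {
  const text = value === undefined || value === null ? "" : String(value);
  // Quote fields that contain commas, quotes, or line breaks; double embedded quotes.
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

function createCsvFormatter(items) {
  return {
    // Header line built from the column names.
    header: () => items.map(escapeCsv).join(","),
    // One line per record; column order follows `items`, not the record's key order.
    row: (record) => items.map((key) => escapeCsv(record[key])).join(","),
  };
}

const csv = createCsvFormatter(["url", "count"]);
console.log(csv.header());
console.log(csv.row({ count: 1234, url: "https://example.com/a,b" }));
```

Because the column order comes from `items`, a record can list its fields in any order, as the example above does with `count` before `url`. In this sketch, fields missing from a record become empty cells and extra fields are ignored.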

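Documentation

Options

Crawl plans (Plan)

Data pipes (Pipe)

Methods (API)

Events

Differences from 0.9.x

The current version is almost a complete rewrite of 0.9: most of the API and the design philosophy have changed. If your project depends on 0.9.x, the documentation for version 0.9.3 has been kept.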

Contribute

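  • Questions, suggestions, and bug reports are all welcome: please open an issue
  • Share this young project with other developers, communities, and mailing lists
  • Pull requests are welcome, especially for:
    • Translating the documentation into other languages
    • Revising and expanding the documentation
    • ...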