
jiangtao / puppeteer-usage

Licence: MIT license
Practices and applications built on puppeteer

Programming Languages

javascript
Dockerfile


puppeteer-usage

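A collection of example applications built on puppeteer. This tutorial is intended for technical exchange only; please do not use it for commercial purposes.

If you also use puppeteer to build interesting things, feel free to get in touch.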

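Crawler

A crawler for Baidu image search. Pages of this kind share the following characteristics:

  • Images are loaded as the page scrolls
  • Data is requested from an API, and the API changes over time
  • The page structure changes over time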

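Why use puppeteer?

Its API is concise and, compared with other tools, it is easier for front-end developers to pick up. The main benefit of puppeteer is that DOM changes and API changes can be ignored: puppeteer can drive browser behavior such as scroll loading, and it can also listen to requests and read header information. The specific code can be viewed in the repository.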
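As a rough illustration of the scroll-and-listen approach described above, the sketch below scrolls a page a few times while recording every response whose content-type header marks it as an image. It is not the repository's code: isImageResponse, collectImageUrls, main, the number of scroll rounds, and the one-second pause are all assumptions chosen for the example.

```javascript
// Illustrative sketch; helper names and defaults are assumptions, not the repository's code.

// Return true when the response headers describe an image.
// Puppeteer reports header names in lower case.
function isImageResponse(headers) {
  const type = headers['content-type'] || '';
  return type.startsWith('image/');
}

// Scroll `page` (a Puppeteer Page) `rounds` times and collect image URLs
// from the network responses instead of parsing the DOM.
async function collectImageUrls(page, url, rounds = 5, pauseMs = 1000) {
  const found = new Set();
  page.on('response', (response) => {
    if (isImageResponse(response.headers())) {
      found.add(response.url());
    }
  });
  await page.goto(url, { waitUntil: 'networkidle2' });
  for (let i = 0; i < rounds; i++) {
    // Jumping to the bottom triggers the page's scroll-based loading.
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
    await new Promise((resolve) => setTimeout(resolve, pauseMs));
  }
  return [...found];
}

// Not invoked automatically; call main(url) to try it (assumes puppeteer is installed).
async function main(url) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    const urls = await collectImageUrls(page, url);
    console.log(urls.join('\n'));
  } finally {
    await browser.close();
  }
}
```

Because the URLs come from network responses rather than from selectors, a sketch like this keeps working when the page markup or the underlying API path changes, which is the point made above.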

Run the following commands to try the example:

git clone https://github.com/ijs/puppeteer-usage.git
cd puppeteer-usage
yarn install
node src/samples/scrawler/pic.baidu.com.js 卡通

To crawl other similar pages (such as google, weibo, flickr, or huaban), change the url in the code accordingly.

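Automated testing

The example here is a loan marketplace (贷款超市), a company project I previously took over; the tests simulate user behavior.

Testing considerations

Principle: first make sure pages display correctly > then eliminate all runtime exceptions > then eliminate front-end factors that make pages too slow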

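  1. Test whether every page displays correctly, using the following indicators
  • Whether the interface appears
  • White-screen time
  • Whether the UI displays correctly, and by what standard that is judged
  • How compatibility should be tested
  2. Functional testing
  • Which features need to be covered by written test cases
  • APIs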
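The first group of checks above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's test code: checkPage, summarize, and the 2000 ms budget are hypothetical, and white-screen time is approximated by the browser's first-paint timing entry.

```javascript
// Illustrative sketch; function names and the 2000 ms budget are assumptions.

// Open `url` in `page` (a Puppeteer Page) and gather basic health signals.
async function checkPage(page, url) {
  const errors = [];
  // 'pageerror' fires for uncaught exceptions thrown inside the page.
  page.on('pageerror', (err) => errors.push(err.message));
  const response = await page.goto(url, { waitUntil: 'load' });
  // Read the paint timing entries; first-paint approximates white-screen time.
  const firstPaintMs = await page.evaluate(() => {
    const entry = performance
      .getEntriesByType('paint')
      .find((e) => e.name === 'first-paint');
    return entry ? entry.startTime : null;
  });
  return { status: response ? response.status() : null, errors, firstPaintMs };
}

// Turn the raw signals into a list of problems, in the priority order
// display > runtime exceptions > slowness.
function summarize(result, maxFirstPaintMs = 2000) {
  const problems = [];
  if (result.status === null || result.status >= 400) {
    problems.push(`bad status: ${result.status}`);
  }
  if (result.errors.length > 0) {
    problems.push(`runtime errors: ${result.errors.length}`);
  }
  if (result.firstPaintMs === null) {
    problems.push('no first paint recorded');
  } else if (result.firstPaintMs > maxFirstPaintMs) {
    problems.push(`slow first paint: ${Math.round(result.firstPaintMs)} ms`);
  }
  return problems;
}
```

An empty list from summarize means none of the three kinds of problem was detected. Whether the UI looks right and how to test compatibility remain open questions that a check like this does not answer.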

Questions

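  1. How to keep test data from polluting business statistics
  • Pass in a parameter and add a check for it in the business statistics module
  • Alternatively, inject a global variable into the page
  2. How to compare pages against the UI design mockups to ensure consistency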

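The approach considered so far: take screenshots, compare them by eye, and annotate the differences. A more rigorous option is page-diff.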
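For the first question above (keeping test traffic out of business statistics), the two ideas could look roughly like this. The from_test query parameter and the __IS_TEST__ global are hypothetical names; the statistics module would have to check for whichever one is chosen.

```javascript
// Illustrative sketch; `from_test` and `__IS_TEST__` are hypothetical names
// that the business statistics module would need to recognize.

// Option 1: add a query parameter that marks the visit as test traffic.
function withTestParam(rawUrl, name = 'from_test') {
  const url = new URL(rawUrl);
  url.searchParams.set(name, '1');
  return url.toString();
}

// Option 2: define a global before any page script runs.
// `page` is a Puppeteer Page; evaluateOnNewDocument runs in every new
// document the page loads, before the document's own scripts.
async function markAsTest(page) {
  await page.evaluateOnNewDocument(() => {
    window.__IS_TEST__ = true;
  });
}
```

The query parameter is visible to the server as well as to the page, while the injected global is only visible to scripts running in the browser, so the choice depends on where the statistics are recorded.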

Test implementation

  1. Visit the page on different device models and take screenshots to confirm that it is usable

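To limit storage size, screenshots are saved as full-page jpg at quality 0.6 (60 on the 0-100 scale used by puppeteer's screenshot quality option). For now the comparison is done by eye.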
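A minimal sketch of this step is shown below. The device list, the file naming, and the helper names are assumptions made for illustration; the viewports are hand-written stand-ins rather than puppeteer's built-in device descriptors.

```javascript
// Illustrative sketch; the device list, file naming and helper names are assumptions.

// A few hand-written viewports standing in for "different device models".
const DEVICES = [
  { name: 'small-phone', width: 320, height: 568, deviceScaleFactor: 2 },
  { name: 'large-phone', width: 414, height: 736, deviceScaleFactor: 3 },
];

// Build the options for a full-page JPEG. Puppeteer's `quality` is an
// integer from 0 to 100, so a quality of 0.6 becomes 60.
function screenshotOptions(deviceName, pageName) {
  return {
    path: `${pageName}-${deviceName}.jpg`,
    type: 'jpeg',
    quality: 60,
    fullPage: true,
  };
}

// Visit `url` once per device with `page` (a Puppeteer Page) and save a
// screenshot for each; returns the list of written file paths.
async function captureAll(page, url, pageName) {
  const files = [];
  for (const device of DEVICES) {
    await page.setViewport({
      width: device.width,
      height: device.height,
      deviceScaleFactor: device.deviceScaleFactor,
      isMobile: true,
    });
    await page.goto(url, { waitUntil: 'networkidle2' });
    const options = screenshotOptions(device.name, pageName);
    await page.screenshot(options);
    files.push(options.path);
  }
  return files;
}
```

The resulting files can then be reviewed side by side for each device.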

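  2. Simulate user interactions to confirm that core features are usable
  • Whether the home page entry points are clickable, which is in fact similar to the check in 1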
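Checking that an entry point is clickable might be sketched as follows. The selector is whatever identifies the entry on the page under test and is therefore hypothetical here, as are the helper names; the sketch also assumes that clicking the entry changes the page URL.

```javascript
// Illustrative sketch; helper names are assumptions and the selector passed in
// depends entirely on the page under test.

// Click `selector` on `page` (a Puppeteer Page) and report where it led.
async function clickAndReport(page, selector) {
  const before = page.url();
  await page.waitForSelector(selector, { visible: true });
  // Start waiting for the navigation before clicking so it is not missed.
  const [response] = await Promise.all([
    page.waitForNavigation({ waitUntil: 'load' }),
    page.click(selector),
  ]);
  return {
    before,
    after: page.url(),
    // The response is null for navigations that do not hit the network,
    // such as History API or hash changes.
    status: response ? response.status() : null,
  };
}

// The entry counts as working when the click led to a different URL
// without an HTTP error status.
function entryWorks(report) {
  const statusOk = report.status === null || report.status < 400;
  return report.after !== report.before && statusOk;
}
```

If an entry opens a dialog instead of navigating, waitForNavigation would time out, so that case needs a different check, for example waiting for the dialog's selector.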

TODO

  • Crawl Zhihu images
  • Add logging
  • Integrate Docker
  • Expose an API
  • Image display UI