
RxGirlz / OpenYspider

Licence: other
Ten-million-scale image and video spider [open-source edition] (Image Spider)

Programming Languages

HTML
java

Projects that are alternatives of or similar to OpenYspider

Instagram-Giveaways-Winner
Instagram Bot which when given a post url will spam mentions to increase the chances of winning. Win Instagram Giveaways!
Stars: ✭ 95 (-22.13%)
Mutual labels:  selenium, selenium-webdriver
spydriver
🕵️ Lightweight utility to intercept WebDriver and WebElement method calls.
Stars: ✭ 24 (-80.33%)
Mutual labels:  selenium, selenium-webdriver
selenium-grid-docker-swarm
web scraping in parallel with Selenium Grid and Docker
Stars: ✭ 32 (-73.77%)
Mutual labels:  selenium, selenium-webdriver
TRA-Ticket-Booker
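(No longer compatible with the new Taiwan Railways booking system and no longer maintained.) Taiwan Railways (TRA) ticket-booking application for one-way and round-trip tickets, built on Selenium + PyQt4.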
Stars: ✭ 26 (-78.69%)
Mutual labels:  selenium, selenium-webdriver
Spider
The Spider project will be continually updated with the crawling methods the author has learned and used.
Stars: ✭ 16 (-86.89%)
Mutual labels:  spider, selenium
frontend testing
Repository containing sample code used in a Frontend Testing workshop
Stars: ✭ 14 (-88.52%)
Mutual labels:  selenium, selenium-webdriver
python-appium-framework
Complete Python Appium framework in 360 degree
Stars: ✭ 43 (-64.75%)
Mutual labels:  selenium, selenium-webdriver
Pddspider
Pinduoduo crawler that scrapes all products, reviews, and related information.
Stars: ✭ 121 (-0.82%)
Mutual labels:  spider, selenium
weibo topic
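Collects Weibo topic keywords and personal Weibo posts, and deletes Weibo posts in one click; Selenium obtains the cookie and requests handles the rest.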
Stars: ✭ 28 (-77.05%)
Mutual labels:  spider, selenium
resgen
Keep track of jobs you've applied to, automate resume & cover letter creation; generate PDFs from .odt templates on the fly while scraping the job post and tracking employer status.
Stars: ✭ 31 (-74.59%)
Mutual labels:  selenium, selenium-webdriver
selenium-cheatsheet-java
A comprehensive list of selenium commands in Java
Stars: ✭ 20 (-83.61%)
Mutual labels:  selenium, selenium-webdriver
Selenium.HtmlElements.Net
Elements model for Selenium.WebDriver
Stars: ✭ 26 (-78.69%)
Mutual labels:  selenium, selenium-webdriver
google-meet-bot
Bot for scheduling and entering google meet sessions automatically
Stars: ✭ 33 (-72.95%)
Mutual labels:  selenium, selenium-webdriver
ScatterFly
An attempt to improve user privacy by intelligent data obfuscation.
Stars: ✭ 49 (-59.84%)
Mutual labels:  selenium, selenium-webdriver
Python3 Spider
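Python crawlers in practice: simulated login to major websites, including but not limited to slider CAPTCHA verification, Pinduoduo, Meituan, Baidu, bilibili, Dianping, and Taobao. If you like it, please star ❤️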
Stars: ✭ 2,129 (+1645.08%)
Mutual labels:  spider, selenium
frameworkium-examples
Sample project which utilises frameworkium-core, a framework for writing maintainable Selenium and REST API tests and facilitates reporting and integration to JIRA.
Stars: ✭ 52 (-57.38%)
Mutual labels:  selenium, selenium-webdriver
Hive
Lots of spiders
Stars: ✭ 110 (-9.84%)
Mutual labels:  spider, selenium-webdriver
Examples Of Web Crawlers
Some very interesting Python crawler examples that are friendly to beginners, mainly scraping Taobao, Tmall, WeChat, Douban, QQ, and other sites.
Stars: ✭ 10,724 (+8690.16%)
Mutual labels:  spider, selenium
zhihu-crawler
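A from-scratch implementation that crawls Zhihu on a schedule, mines valuable information from it, and visualizes the crawled data on a web page.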
Stars: ✭ 56 (-54.1%)
Mutual labels:  spider, selenium
PhpScreenRecorder
A slim PHP wrapper around ffmpeg to record the screen, best for recording your acceptance tests using Selenium; easy to use, with a clean OOP interface
Stars: ✭ 44 (-63.93%)
Mutual labels:  selenium, selenium-webdriver

OpenYspider 4.x

Ten-million-scale image and video spider [open-source edition]

Introduction

OpenYspider is a simple crawler written in Java. Its main technology stack is:

  1. spring-boot-starter-web
  2. spring-boot-starter-test
  3. mybatis-plus-boot-starter
  4. springfox-boot-starter
  5. lombok
  6. jsoup
  7. mockito + jacoco

Sites currently under LTS (long-term support):

  1. tujidao.com

Deprecated sites (see the commit history):

  1. tangyun365.com
  2. yalayi.com
  3. rosmm88.com
  4. mzsock.com
  5. meinvla.net
  6. leetcode-cn.com

Development environment

Windows 11 + JDK 17 + MySQL 8.x

$ java --version
openjdk 17.0.1 2021-10-19
OpenJDK Runtime Environment (build 17.0.1+12-39)
OpenJDK 64-Bit Server VM (build 17.0.1+12-39, mixed mode, sharing)

After running the launcher class OpenYspiderApplication, open http://localhost:23333/swagger-ui/index.html#/ in a browser.

Database scripts: sql_scripts

Crawled sites

Statistics as of 2022-02-12

1 Tujidao (图集岛, formerly 美图日) [ 2,647,717P / 905G ]

select count(*) from oys_tujidao_album_t where album_id > 0 and album_id <= 10000; -- 9995 ok
select count(*) from oys_tujidao_album_t where album_id > 10000 and album_id <= 20000; -- 10000
select count(*) from oys_tujidao_album_t where album_id > 20000 and album_id <= 30000; -- 9999 [23001]
select count(*) from oys_tujidao_album_t where album_id > 30000 and album_id <= 40000; -- 10000
select count(*) from oys_tujidao_album_t where album_id > 40000 and album_id <= 50000; -- 8925 [46018]
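The bucket counts above show how many albums each range of 10,000 IDs holds, but not which IDs are absent. As a minimal sketch that is not part of the project, and assuming the album IDs of one bucket have already been loaded into a set, the gaps could be listed as follows. The class name AlbumGapFinder and the method findMissing are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not taken from the OpenYspider code base.
class AlbumGapFinder {

    /**
     * Returns every id in the range (lowExclusive, highInclusive] that is
     * absent from presentIds, in ascending order. The range bounds mirror
     * the "album_id > low and album_id <= high" filters in the SQL above.
     */
    static List<Integer> findMissing(Set<Integer> presentIds, int lowExclusive, int highInclusive) {
        List<Integer> missing = new ArrayList<>();
        for (int id = lowExclusive + 1; id <= highInclusive; id++) {
            if (!presentIds.contains(id)) {
                missing.add(id);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Toy data: ids 1..10 with 4 and 7 left out.
        Set<Integer> present = new TreeSet<>();
        for (int id = 1; id <= 10; id++) {
            if (id != 4 && id != 7) {
                present.add(id);
            }
        }
        System.out.println(findMissing(present, 0, 10)); // prints [4, 7]
    }
}
```

In practice the set would be filled from a query such as select album_id from oys_tujidao_album_t restricted to the same range; that loading step is left out here to keep the sketch free of database dependencies.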

Sample results
