
yhangf / Pythoncrawler

License: MIT
💗 A collection of web crawler projects written in Python

Programming Languages

python
139335 projects - #7 most used programming language
python3
1442 projects

Labels

Projects that are alternatives of or similar to Pythoncrawler

Xshok Proxmox
proxmox post installation scripts
Stars: ✭ 260 (-69.63%)
Mutual labels:  scripts
Tpt Oracle
Tanel Poder's Troubleshooting & Performance Tools for Oracle Databases
Stars: ✭ 429 (-49.88%)
Mutual labels:  scripts
Oscp
Collection of things made during my OSCP journey
Stars: ✭ 709 (-17.17%)
Mutual labels:  scripts
Unity Script Collection
A maintained collection of useful & free unity scripts / library's / plugins and extensions
Stars: ✭ 3,640 (+325.23%)
Mutual labels:  scripts
Tcl
The Tcl Core. (Mirror of core.tcl-lang.org)
Stars: ✭ 342 (-60.05%)
Mutual labels:  scripts
Pentestkit
Useful tools and scripts during Penetration Testing engagements
Stars: ✭ 463 (-45.91%)
Mutual labels:  scripts
Sophia-Script-for-Windows
⚡ The most powerful PowerShell module on GitHub for Windows 10 & Windows 11 fine-tuning and tweaking
Stars: ✭ 4,311 (+403.62%)
Mutual labels:  scripts
Sneaky Scripts
Automated setup of development environments and other miscellaneous scripts.
Stars: ✭ 7 (-99.18%)
Mutual labels:  scripts
Velociraptor
An alternative to npm scripts for Deno
Stars: ✭ 386 (-54.91%)
Mutual labels:  scripts
Digispark Scripts
USB Rubber Ducky type scripts written for the DigiSpark.
Stars: ✭ 629 (-26.52%)
Mutual labels:  scripts
Rofimoji
An emoji and character picker for rofi 😁
Stars: ✭ 319 (-62.73%)
Mutual labels:  scripts
Utility Bash Scripts
🤓 Useful bash scripts to do automatable tasks with a single command
Stars: ✭ 336 (-60.75%)
Mutual labels:  scripts
Python Programs
My collection of Python Programs
Stars: ✭ 518 (-39.49%)
Mutual labels:  scripts
Edizon cheatsconfigsandscripts
The official EdiZon Editor Config and Editor Script repository.
Stars: ✭ 271 (-68.34%)
Mutual labels:  scripts
Denon
👀 Monitor any changes in your Deno application and automatically restart.
Stars: ✭ 725 (-15.3%)
Mutual labels:  scripts
Windows 10 Sophia Script
⚡ The most powerful PowerShell module on GitHub for Windows 10 & Windows 11 fine-tuning and tweaking
Stars: ✭ 4,133 (+382.83%)
Mutual labels:  scripts
Smarthome
@skalavala 👍 Nothing But Smarthome Stuff! - By Mahasri Kalavala
Stars: ✭ 437 (-48.95%)
Mutual labels:  scripts
Distro.tools
Mirror
Stars: ✭ 25 (-97.08%)
Mutual labels:  scripts
Dotfiles
My dotfiles.
Stars: ✭ 5 (-99.42%)
Mutual labels:  scripts
Penetration Testing Tools
A collection of more than 140+ tools, scripts, cheatsheets and other loots that I have developed over years for Red Teaming/Pentesting/IT Security audits purposes. Most of them came handy on at least one of my real-world engagements.
Stars: ✭ 614 (-28.27%)
Mutual labels:  scripts
        (                                                                        
       )\ )          )    )               (                       (             
      (()/( (     ( /( ( /(               )\   (       )  (  (    )\   (   (    
      /(_)))\ )  )\()))\())  (    (    (((_)  )(   ( /(  )\))(  ((_) ))\  )(   
      (_)) (()/( (_))/((_)\   )\   )\ ) )\___ (()\  )(_))((_)()\  _  /((_)(()\  
      | _ \ )(_))| |_ | |(_) ((_) _(_/(((/ __| ((_)((_)_ _(()((_)| |(_))   ((_)
      |  _/| || ||  _|| ' \ / _ \| ' \))| (__ | '_|/ _` |\ V  V /| |/ -_) | '_|
      |_|   \_, | \__||_||_|\___/|_||_|  \___||_|  \__,_| \_/\_/ |_|\___| |_|   
      |__/  
                                                  —————— by yanghangfeng

PythonCrawler: a collection of web crawler projects written in Python 🐛

Overview of the spiderFile module

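  1. baidu_sy_img.py: Scrapes high-resolution photography images from Baidu.
  2. baidu_wm_img.py: Scrapes the "aesthetic mood" (唯美意境) section of Baidu Images.
  3. get_photos.py: Scrapes all images under a given topic on Baidu Tieba.
  4. get_web_all_img.py: Scrapes the images from an entire website.
  5. lagou_position_spider.py: Takes any keyword, scrapes the job postings related to it in one step, and saves them to a local file.
  6. student_img.py: Exploits a URL vulnerability in the author's school website to retrieve the enrollment ID photos of all registered students.
  7. JD_spider.py: Bulk-scrapes product IDs and tags from JD.com.
  8. ECUT_pos_html.py: Scrapes all campus recruitment postings from the school website and saves them as HTML, with the images embedded in the HTML.
  9. ECUT_get_grade.py: Simulates logging in to the school website, scrapes grades, and computes the credit-weighted average grade.
  10. github_hot.py: Scrapes the projects listed under GitHub's popular languages and saves each project's description and homepage URL to a local file.
  11. xz_picture_spider.py: Written at the request of a Zhihu user; scrapes all portrait photos from a certain website.
  12. one_img.py: Scrapes images from the ONE literary website.
  13. get_baike.py: Takes any keyword and scrapes its Baidu Baike introduction.
  14. kantuSpider.py: Scrapes all images from the Kantu image website.
  15. fuckCTF.py: Uses selenium to simulate logging in to the Hetian website and automatically changes the original password.
  16. one_update.py: An updated version of the ONE scraper that also scrapes the site's one-sentence maxim.
  17. get_history_weather.py: Scrapes weather data for Guangzhou for the first quarter of 2019.
  18. search_useful_camera_ip_address.py: Simulates logging in to a scanning website to obtain potential camera IP addresses, then uses weak-password checks to filter out the camera IP addresses that can be logged in to.
  19. get_top_sec_com.py: Retrieves the market-capitalization ranking of companies in the cybersecurity sector of the A-share market and saves it as an image.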
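
Several of the scripts above (items 1 to 4, 11, 12, and 14) collect image URLs from web pages. The README does not show how they do it, so the following is only a minimal sketch of one common approach, using nothing but Python's standard library. The names `ImgCollector` and `extract_image_urls` and the example.com URLs are hypothetical and are not part of this project.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgCollector(HTMLParser):
    """Hypothetical helper: collects the src of every <img> tag it sees."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the page URL and skip duplicates.
            absolute = urljoin(self.base_url, src)
            if absolute not in self.urls:
                self.urls.append(absolute)


def extract_image_urls(html, base_url):
    """Return the absolute image URLs found in html, in document order."""
    collector = ImgCollector(base_url)
    collector.feed(html)
    return collector.urls


# Offline usage example with an inline page (no network access needed).
page = (
    '<html><body>'
    '<img src="/a.png">'
    '<img src="http://cdn.example.com/b.jpg">'
    '<img src="/a.png">'
    '</body></html>'
)
print(extract_image_urls(page, "http://example.com/gallery/"))
```

In a real crawler the `html` argument would come from an HTTP response, and each returned URL would then be downloaded and written to disk.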

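Overview of the spiderAPI module

This module provides crawler API interfaces for several websites. The feature set may be incomplete, which leaves plenty of room for extension; if you are interested, you are welcome to improve it.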

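1. Dianping (大众点评)

from spiderAPI.dianping import *

'''
Reference tables: city name -> city ID, and ranking name -> rankType value.

citys = {
    '北京': '2', '上海': '1', '广州': '4', '深圳': '7',      # Beijing, Shanghai, Guangzhou, Shenzhen
    '成都': '8', '重庆': '9', '杭州': '3', '南京': '5',      # Chengdu, Chongqing, Hangzhou, Nanjing
    '沈阳': '18', '苏州': '6', '天津': '10', '武汉': '16',   # Shenyang, Suzhou, Tianjin, Wuhan
    '西安': '17', '长沙': '344', '大连': '19', '济南': '22', # Xi'an, Changsha, Dalian, Jinan
    '宁波': '11', '青岛': '21', '无锡': '13', '厦门': '15',  # Ningbo, Qingdao, Wuxi, Xiamen
    '郑州': '160'                                            # Zhengzhou
}

ranktype = {
    '最佳餐厅': 'score',     # best restaurants
    '人气餐厅': 'popscore',  # most popular restaurants
    '口味最佳': 'score1',    # best taste
    '环境最佳': 'score2',    # best ambience
    '服务最佳': 'score3'     # best service
}
'''

result = bestRestaurant(cityId=1, rankType='popscore')  # most popular restaurants (cityId 1 is Shanghai in the table above)

shoplist = dpindex(cityId=1, page=1)  # merchant leaderboard

restaurantlist = restaurantList('http://www.dianping.com/search/category/2/10/p2')  # restaurants on a listing page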
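
The listing URL passed to `restaurantList` above ends in `/2/10/p2`. Assuming those path segments are a city ID (2 is Beijing in the reference table), a category ID, and a page number, the URLs for several pages could be generated as sketched below. This reading of the URL is a guess from the single example in this README, and `build_search_url` is a hypothetical helper, not a function in `spiderAPI`.

```python
def build_search_url(city_id, category_id=10, page=1):
    """Build a listing URL shaped like the example URL in this README.

    Assumption: the path is /search/category/<city_id>/<category_id>/p<page>.
    """
    if page < 1:
        raise ValueError("page must be >= 1")
    return (
        "http://www.dianping.com/search/category/"
        f"{city_id}/{category_id}/p{page}"
    )


# URLs for the first three listing pages of city ID 2.
urls = [build_search_url(2, page=p) for p in range(1, 4)]
for url in urls:
    print(url)
    # Each URL could then be passed to restaurantList(url).
```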

2. Getting proxy IPs

Scrapes proxy IP addresses.

from spiderAPI.proxyip import get_enableips

enableips = get_enableips()
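
The README does not say what `get_enableips()` returns. Assuming it yields a list of `"ip:port"` strings, one way to use the result is to turn an address into the `proxies` mapping that the `requests` library accepts. The helpers `to_requests_proxies` and `pick_proxy` and the sample addresses below are hypothetical.

```python
import random


def to_requests_proxies(address):
    """Turn an "ip:port" string into a proxies mapping for requests."""
    proxy_url = f"http://{address}"
    return {"http": proxy_url, "https": proxy_url}


def pick_proxy(addresses, rng=random):
    """Choose one address at random and return its proxies mapping."""
    if not addresses:
        raise ValueError("no proxies available")
    return to_requests_proxies(rng.choice(addresses))


# Sample addresses standing in for the (assumed) output of get_enableips().
sample_ips = ["203.0.113.5:8080", "203.0.113.9:3128"]
proxies = pick_proxy(sample_ips)
print(proxies)
# With requests installed, the mapping would be used like:
#   requests.get(url, proxies=proxies, timeout=10)
```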

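3. Baidu Maps

The API that Baidu Maps provides places some restrictions on queries, so this module uses the query interface found on the web version instead.

from spiderAPI.baidumap import *

city_list = citys()  # get the list of cities (stored under a new name so the citys function is not shadowed)
result = search(keyword="美食", citycode="257", page=1)  # get search results for the keyword "美食" (food)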

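4. Simulated GitHub login

from spiderAPI.github import GitHub

github = GitHub()
github.login()  # this step prompts you for your username and password
github.show_timeline()  # fetch the timeline from the GitHub home page
# more features are left for you to discover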

5. Lagou

from spiderAPI.lagou import *

lagou_spider(key='数据挖掘', page=1)  # get job postings for the keyword "数据挖掘" (data mining)