
1414044032 / Sina_Spider

Licence: other
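A Sina Weibo crawler built on Python and Selenium. After a simulated login it saves the cookies so that the logged-in state persists. Enter a keyword to crawl popular Weibo posts related to that keyword.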

Programming Languages

python

Projects that are alternatives of or similar to Sina Spider

Capturer
Capture pictures from websites like sina, lofter, huaban and so on
Stars: ✭ 76 (+204%)
Mutual labels:  spider, sina
Shadow
Computer science fundamentals, data structures, design patterns, and an implementation of Tomcat middleware
Stars: ✭ 19 (-24%)
Mutual labels:  spider
fetchurls
A bash script to spider a site, follow links, and fetch urls (with built-in filtering) into a generated text file.
Stars: ✭ 97 (+288%)
Mutual labels:  spider
yutto
🧊 A cute and willful Bilibili video downloader (bilili V2)
Stars: ✭ 383 (+1432%)
Mutual labels:  spider
ComicSpider
Crawler for original images from the desktop version of the 动漫之家 manga site
Stars: ✭ 67 (+168%)
Mutual labels:  spider
js block
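Research and study of various blocking techniques: anti-crawler measures, ad blocking, ad-injection prevention, fighting scalpers, and so on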
Stars: ✭ 59 (+136%)
Mutual labels:  spider
bangumi yearly report
No description or website provided.
Stars: ✭ 24 (-4%)
Mutual labels:  spider
weibo topic
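Collects Weibo topic keywords and personal Weibo posts, with one-click deletion of Weibo posts. Cookies are obtained with selenium and the rest is handled with requests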
Stars: ✭ 28 (+12%)
Mutual labels:  spider
NScrapy
NScrapy is a cross-platform .NET Core distributed spider framework which provides an easy way to write your own spider
Stars: ✭ 88 (+252%)
Mutual labels:  spider
job-spider
Multithreaded crawler for recruitment websites commonly used in the internet industry
Stars: ✭ 28 (+12%)
Mutual labels:  spider
scripter
Some scripts and tools
Stars: ✭ 20 (-20%)
Mutual labels:  spider
Bilibili manga download
Bilibili manga download tool with a graphical interface
Stars: ✭ 52 (+108%)
Mutual labels:  spider
robotstxt
robots.txt file parsing and checking for R
Stars: ✭ 65 (+160%)
Mutual labels:  spider
Tieba-Birthday-Spider
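Baidu Tieba birthday crawler that scrapes the birthdays of forum members and automatically sends greetings on the corresponding dates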
Stars: ✭ 28 (+12%)
Mutual labels:  spider
devsearch
A web search engine built with Python which uses TF-IDF and PageRank to sort search results.
Stars: ✭ 52 (+108%)
Mutual labels:  spider
PTT Beauty Spider
Crawler and image downloader for the PTT Beauty board
Stars: ✭ 47 (+88%)
Mutual labels:  spider
get LibSeat
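Automatic reservation and check-in program for the 利昂 library reservation system. Supports the library systems of 38 universities, including Renmin University of China, Beijing Normal University, University of Jinan, and Harbin Institute of Technology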
Stars: ✭ 39 (+56%)
Mutual labels:  spider
zhihu-crawler
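A from-scratch implementation that crawls Zhihu on a schedule, mines valuable information from it, and visualizes the crawled data on a web page.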
Stars: ✭ 56 (+124%)
Mutual labels:  spider
Spider
The Spider project will be continually updated with the crawling methods the author has learned and used.
Stars: ✭ 16 (-36%)
Mutual labels:  spider
crawler
A PHP crawler
Stars: ✭ 13 (-48%)
Mutual labels:  spider

Sina_Spider


Environment and tools:

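Python 3.6 + selenium + firefox_Driver
firefox_Driver download links: https://pan.baidu.com/s/1WGo7kVGsfRlE2XFvQRPHJA https://github.com/mozilla/geckodriver/releases
Make sure the driver version matches the browser version.
After downloading the driver, you can place it in the C:\Python36\Scripts directory. Otherwise you need to configure the environment variables and add the driver's directory to Path.
The Firefox browser must be installed: download it from the official website.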
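Whether the driver can actually be found on Path is easy to check before starting a crawl. The following is a minimal sketch, not part of the project: the helper name `find_driver` is hypothetical, and it assumes the executable is called `geckodriver`, as in the GitHub releases linked above.

```python
import shutil


def find_driver(name="geckodriver"):
    """Return the full path of the driver executable if it is on Path, else None.

    `find_driver` is a hypothetical helper for illustration only.
    shutil.which searches the directories listed in the PATH environment variable.
    """
    return shutil.which(name)


if __name__ == "__main__":
    location = find_driver()
    if location is None:
        print("geckodriver was not found on Path; add its directory to Path first.")
    else:
        print(f"geckodriver found at {location}")
```

If the function returns None, the driver directory has not been added to Path (or the file has not been copied into a directory that is already on it).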

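In main, just change the account and password to your own. When the browser window opens and logs in, watch for a captcha. In testing, logging in with an email address usually does not trigger a captcha; logging in with a phone number does, and so does logging in from an unusual location. If a captcha appears, add time.sleep(20) before driver.find_element_by_css_selector("div.info_list:nth-child(6) > a:nth-child(1)").click() so that the driver pauses and you can enter the captcha manually (within 20 seconds). After that the cookie can be obtained normally. The obtained cookie is saved as a txt file in the same directory, so the next login does not need the simulated login.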
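The cookie-saving step described above can be sketched as follows. This is an illustration under stated assumptions, not the project's own code: the function names, the file name `cookies.txt`, and the JSON encoding are all hypothetical (the README only says the cookie is saved as a txt file). The functions accept any object that offers Selenium's `get_cookies()` and `add_cookie()` methods.

```python
import json
from pathlib import Path

# Hypothetical file name; the README only states that a txt file is used.
COOKIE_FILE = Path("cookies.txt")


def save_cookies(driver, path=COOKIE_FILE):
    """Write the session's cookies to a text file as JSON and return their count."""
    cookies = driver.get_cookies()  # Selenium returns a list of dicts
    Path(path).write_text(json.dumps(cookies), encoding="utf-8")
    return len(cookies)


def load_cookies(driver, path=COOKIE_FILE):
    """Add previously saved cookies to the driver.

    Returns False when no cookie file exists, which signals that the
    simulated login has to be run first.
    """
    path = Path(path)
    if not path.exists():
        return False
    for cookie in json.loads(path.read_text(encoding="utf-8")):
        driver.add_cookie(cookie)
    return True
```

With a real Selenium driver, the browser generally has to be on a page of the cookie's domain before `add_cookie` is called, and the page needs a refresh afterwards for the restored login state to take effect. Saved cookies can also expire, in which case the simulated login has to be repeated.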
