zkqiang / job-spider

License: Apache-2.0
Multi-threaded crawler for the job sites commonly used in the internet industry

Job Site Spider

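Crawls the major job sites commonly used in the internet industry, collects the key fields of each posting, and writes them to a CSV file.
The crawler and the file writer run as two separate processes. Process A runs the spider for each site in multiple threads; each spider is a generator that yields its data, which is passed through a queue to process B for writing.
Note: this crawler is for learning and exchange only. Do not use the scraped data for any illegal purpose.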
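
The two-process layout described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `demo_spider`, `run_spider`, `crawl_process`, `write_process`, the site names, the row fields, and the use of one thread per site with a `None` sentinel are all assumptions made for the example.

```python
import csv
import multiprocessing as mp
import threading

FIELDS = ["site", "title", "city"]  # hypothetical columns


def demo_spider(site, job, city):
    """Hypothetical spider: a generator that yields one dict per posting."""
    for i in range(3):
        yield {"site": site, "title": f"{job} #{i}", "city": city}


def run_spider(spider, out_queue):
    """Drain one spider generator into the shared queue."""
    for row in spider:
        out_queue.put(row)


def crawl_process(job, city, out_queue):
    """Process A: run one thread per site (an assumption), then signal the end."""
    sites = ["site_a", "site_b"]  # placeholder site names
    threads = [
        threading.Thread(target=run_spider,
                         args=(demo_spider(site, job, city), out_queue))
        for site in sites
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    out_queue.put(None)  # sentinel: no more rows will arrive


def write_process(path, out_queue):
    """Process B: read rows from the queue and write them to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        while True:
            row = out_queue.get()
            if row is None:
                break
            writer.writerow(row)


def main(job, city, path):
    queue = mp.Queue()
    crawler = mp.Process(target=crawl_process, args=(job, city, queue))
    writer = mp.Process(target=write_process, args=(path, queue))
    crawler.start()
    writer.start()
    crawler.join()
    writer.join()


if __name__ == "__main__":
    main("backend", "Beijing", "jobs.csv")
```

Keeping the writer in its own process means slow disk writes never block the crawling threads, and the queue is the only shared state between the two sides.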

Requirements

  • Python 3
  • requests
  • lxml

Usage

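Method 1: pass command-line arguments (-j is the job keyword, here 后端, "backend"; -c is the city, here 北京, "Beijing")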
$ python3 run.py -j 后端 -c 北京

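Method 2: run directly and enter the parameters when prompted (the first prompt asks for the job, the second for the city)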
$ python3 run.py
请输入职业:后端
请输入城市:北京
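
The two invocation styles above could be handled by one entry point along these lines. This is only a sketch under stated assumptions: the `-j` and `-c` flags and the two prompts come from the examples above, while `parse_params`, the long option names `--job` and `--city`, and the `ask` parameter are hypothetical and not taken from run.py.

```python
import argparse


def parse_params(argv=None, ask=input):
    """Return (job, city) from the command line, prompting for anything missing.

    `ask` is a hypothetical hook so the prompt function can be swapped out.
    """
    parser = argparse.ArgumentParser(description="Job site spider (sketch)")
    parser.add_argument("-j", "--job", help="job keyword, e.g. 后端")
    parser.add_argument("-c", "--city", help="city name, e.g. 北京")
    args = parser.parse_args(argv)

    # Fall back to interactive prompts when a flag was not given.
    job = args.job or ask("请输入职业:")
    city = args.city or ask("请输入城市:")
    return job, city


if __name__ == "__main__":
    print(parse_params(["-j", "后端", "-c", "北京"]))
```

Calling `parse_params()` with no arguments reads `sys.argv` and prompts for whichever value is missing, which covers both methods with one code path.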

Configuration

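To add a custom spider, define a spider class at the end of spider.py. The class must inherit from the BaseSpider base class and use the SpiderMeta metaclass, and it must implement a crawl method that yields the scraped data. See the existing spider classes for the expected data format.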
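
The shape of such a class might look like the sketch below. The names `BaseSpider`, `SpiderMeta`, and `crawl` come from the description above, but their bodies here are stand-ins written for illustration: the real definitions live in spider.py and may differ. The registry list, the constructor arguments, `ExampleSpider`, and the row fields are all assumptions.

```python
class SpiderMeta(type):
    """Stand-in metaclass: records every class created with it (assumed behavior)."""
    spiders = []

    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        mcs.spiders.append(cls)
        return cls


class BaseSpider:
    """Stand-in base class holding the search parameters (assumed fields)."""

    def __init__(self, job, city):
        self.job = job
        self.city = city

    def crawl(self):
        raise NotImplementedError("subclasses must implement crawl()")


class ExampleSpider(BaseSpider, metaclass=SpiderMeta):
    """Hypothetical custom spider; a real one would fetch and parse pages."""

    def crawl(self):
        # Yield one dict per posting; the keys here are placeholders.
        for i in range(2):
            yield {"title": f"{self.job} #{i}", "city": self.city}


if __name__ == "__main__":
    # Every registered spider can be discovered and run without listing it by hand.
    for spider_cls in SpiderMeta.spiders:
        for row in spider_cls("backend", "Beijing").crawl():
            print(row)
```

A metaclass registry of this kind lets the runner pick up new spiders automatically, which would explain why adding a class to the end of spider.py is all that is required.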
