
shuizhubocai / crawler

Licence: MIT license
A requests + lxml crawler with a simple crawler architecture

Programming Languages

python

Projects that are alternatives of or similar to crawler

iHealth crawler
Content crawler for the iHealth project (a medical consultation crawler built on Python and MongoDB)
Stars: ✭ 24 (-66.67%)
Mutual labels:  requests, lxml
Requests Html
Pythonic HTML Parsing for Humans™
Stars: ✭ 12,268 (+16938.89%)
Mutual labels:  requests, lxml
dnevnikru
dnevnik.ru parser
Stars: ✭ 20 (-72.22%)
Mutual labels:  requests, lxml
Instagram Stalker Scraper
(UNMAINTAINED) Fetch data of any public Instagram profile, without using api
Stars: ✭ 39 (-45.83%)
Mutual labels:  requests, lxml
resto
🔗 a CLI app can send pretty HTTP & API requests with TUI
Stars: ✭ 113 (+56.94%)
Mutual labels:  requests
TSdownloader
Template for downloading segmented video (.m3u8/.ts) from streaming websites
Stars: ✭ 17 (-76.39%)
Mutual labels:  requests
WorkAggregation
A job-posting aggregation system with crawling, data analysis, visualization, and interactive features
Stars: ✭ 258 (+258.33%)
Mutual labels:  lxml
SD-streams
Anime streaming without ads using Beautifulsoup and requests Python
Stars: ✭ 18 (-75%)
Mutual labels:  requests
python3-concurrency
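Theory validation for a Python 3 crawler series. It first studies I/O models, implementing TCP servers and clients in Python under blocking I/O, nonblocking I/O, and I/O multiplexing. It then compares the efficiency of synchronous I/O (sequential download, multi-process concurrency, multi-threaded concurrency) with asynchronous I/O (asyncio)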
Stars: ✭ 49 (-31.94%)
Mutual labels:  requests
covid-19
Data ETL & Analysis on the global and Mexican datasets of the COVID-19 pandemic.
Stars: ✭ 14 (-80.56%)
Mutual labels:  requests
get LibSeat
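Automatic reservation and check-in program for the 利昂 library booking system. Supports the library systems of 38 universities, including Renmin University of China, Beijing Normal University, University of Jinan, and Harbin Institute of Technology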
Stars: ✭ 39 (-45.83%)
Mutual labels:  requests
feupy
The sigarra scraping library no one asked for
Stars: ✭ 13 (-81.94%)
Mutual labels:  requests
option chain analysis
NSE Nifty Option chain analysis on the web page.
Stars: ✭ 63 (-12.5%)
Mutual labels:  requests
web full stack application
show full stack technology applications : Scrapy + webservice[restful] + websocket + VueJS + MongoDB
Stars: ✭ 16 (-77.78%)
Mutual labels:  requests
gists
Methods for working with the GitHub Gist API. Node.js/JavaScript
Stars: ✭ 96 (+33.33%)
Mutual labels:  requests
pyitau
Unofficial client to access your Itaú bank data
Stars: ✭ 28 (-61.11%)
Mutual labels:  requests
cpr
C++ Requests: Curl for People, a spiritual port of Python Requests.
Stars: ✭ 5,005 (+6851.39%)
Mutual labels:  requests
Tieba-Birthday-Spider
Baidu Tieba birthday crawler: scrapes the birthdays of a forum's members and automatically sends greetings on the matching dates
Stars: ✭ 28 (-61.11%)
Mutual labels:  requests
PT-Tracking
Application for registering and tracking CTT Expresso parcels. It automates the online lookup of tracking status for multiple shipments and keeps a record of payments for cash-on-delivery shipments. Shipments that require attention, due to delays in delivery or in receipt of the corresponding payment, as well as cheques whose date …
Stars: ✭ 18 (-75%)
Mutual labels:  requests
usim800
usim800 is a Python driver module for SIM800 GSM/GPRS .
Stars: ✭ 36 (-50%)
Mutual labels:  requests

Crawling a website with requests + lxml

crawler

Site crawled

  • The crawler collects post titles from Dong Weiming's (董伟明) blog

The crawler consists of 6 modules

  • url: URL manager
  • download: downloader
  • parser: parser
  • output: data export
  • crawler: crawler scheduler
  • useragent: User-Agent pool
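
How the six modules fit together can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class and function names (UrlManager, UserAgentPool, download, parse_titles, crawl) and the XPath expression are assumptions made for the example, and the real module interfaces may differ.

```python
import random
from collections import deque


class UrlManager:
    """URL manager: tracks pending URLs and skips ones already seen."""

    def __init__(self):
        self._pending = deque()
        self._seen = set()

    def add(self, url):
        # Queue a URL only the first time it is encountered.
        if url and url not in self._seen:
            self._seen.add(url)
            self._pending.append(url)

    def has_next(self):
        return bool(self._pending)

    def next(self):
        return self._pending.popleft()


class UserAgentPool:
    """User-Agent pool: hands out a random User-Agent string per request."""

    def __init__(self, agents):
        self._agents = list(agents)

    def pick(self):
        return random.choice(self._agents)


def download(url, user_agent):
    """Downloader: fetch a page with requests; None on a non-200 response."""
    import requests  # imported lazily so the other parts run without it

    response = requests.get(url, headers={"User-Agent": user_agent}, timeout=10)
    return response.text if response.status_code == 200 else None


def parse_titles(page_html):
    """Parser: extract titles with lxml (the XPath is a placeholder)."""
    from lxml import html

    tree = html.fromstring(page_html)
    return [text.strip() for text in tree.xpath("//h2/a/text()")]


def crawl(start_urls, agents, download=download, parse=parse_titles, output=print):
    """Scheduler: pull URLs, download, parse, and pass results to output."""
    urls = UrlManager()
    for url in start_urls:
        urls.add(url)
    pool = UserAgentPool(agents)
    collected = []
    while urls.has_next():
        page = download(urls.next(), pool.pick())
        if page is None:
            continue  # skip pages that failed to download
        titles = parse(page)
        collected.extend(titles)
        output(titles)
    return collected
```

Passing the downloader, parser, and output step into crawl as arguments keeps the scheduler independent of the network, so each part can be swapped or tested on its own.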

Using the project

  • Running the project in an isolated environment with virtualenv is recommended
  • pip3 install -r requirements.txt
  • python crawler.py

Notes

  • Use lxml version 3.5.0. Versions above 3.5.0 are currently incompatible
  • Use Python version 3.6.0
  • Use pip3 version 10.0.1