Albert-W / Python_crawler

A simple, tiny, practical Python crawler that uses JSON and SQLite instead of MySQL or MongoDB. The target website is Zhihu.com.

python_crawler

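This project aims to be a lightweight, readable, and easily extensible Zhihu crawler.

From the outset, the design avoids extra frameworks and database engines wherever possible, so it is a plain-Python crawler and the database is SQLite, the most lightweight option. All customization settings are loaded from the config file; editing it customizes the crawler's behavior.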

Demo

Frontend

image

Database

image

Prerequisites

SQLAlchemy is used to map database rows to Python objects, and Flask provides the web server. No other packages are required.

pip install sqlalchemy
pip install flask

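File overview

Root directory

  1. zhihu.db: the SQLite database file that stores the crawled data.

  2. temp.json: temporary information that does not need to go into the database.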
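
As a rough illustration of how a file like temp.json can hold scratch state outside the database, here is a minimal sketch using only the standard library. The helper names `save_temp` and `load_temp` and the example keys are hypothetical; the README does not say what the project actually stores in this file.

```python
import json
from pathlib import Path


def save_temp(path, data):
    """Write a dict of temporary values to a JSON file (hypothetical helper)."""
    # ensure_ascii=False keeps Chinese text readable in the file.
    Path(path).write_text(
        json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8"
    )


def load_temp(path):
    """Read the temporary values back, or return an empty dict if the file is missing."""
    p = Path(path)
    if not p.exists():
        return {}
    return json.loads(p.read_text(encoding="utf-8"))
```

For example, `save_temp("temp.json", {"lastPage": 3})` followed by `load_temp("temp.json")` would give back the same dict, where `lastPage` is an invented key.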

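backend

Handles crawling and persistence.

  1. config.py: all configuration is managed centrally through config.py. Editing it extends what the program can do.

  2. create_table.py: defines the table schema and creates the table in the database through the ORM.

  3. dbTool.py: wraps database operations as Python functions.

  4. zhihu.py: implements all of the crawling logic.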
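
The config-driven idea above can be sketched roughly as follows: the configuration holds the target URL and one regular expression per field, and the crawler applies them to the downloaded HTML. This is an assumption about the general shape, not the project's actual code. The `CONFIG` dict, the `extract` function, and the `data-author` / `data-vote` attributes in the sample markup are all invented for illustration and do not reflect Zhihu's real pages.

```python
import re

# Hypothetical stand-in for config.py: target page plus one regex per field.
CONFIG = {
    "url": "https://www.zhihu.com/",
    "patterns": {
        "authorName": r'data-author="([^"]+)"',
        "vote": r'data-vote="(\d+)"',
    },
}


def extract(html, patterns):
    """Apply every configured regex and zip the matches into one dict per record."""
    columns = {name: re.findall(rx, html) for name, rx in patterns.items()}
    # Use the shortest column so a missing match cannot cause an IndexError.
    count = min(len(values) for values in columns.values()) if columns else 0
    return [{name: columns[name][i] for name in columns} for i in range(count)]


# Made-up markup standing in for a downloaded page.
sample = (
    '<div data-author="alice" data-vote="12"></div>'
    '<div data-author="bob" data-vote="7"></div>'
)
records = extract(sample, CONFIG["patterns"])
```

With this layout, supporting a new field means adding one entry to `patterns` rather than touching the extraction code, which is the kind of extension the config file is meant to allow.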

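frontend

Folder for the visual presentation.

  1. templates/: HTML templates, the entry point of the frontend.

  2. static/: resource files such as images, CSS, and JS.

  3. run.py: implements the Flask routes, which do two things:

    1. Pass JSON data to the frontend.

    2. Serve the display page to the frontend.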
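
The two responsibilities of run.py might look roughly like the sketch below. Only the `/zhihu` page path comes from this README (see the default URL under usage); the `/zhihu/data` path, the function names, the inline template, and the placeholder rows are assumptions. The real project renders files from templates/ and reads from the database.

```python
from flask import Flask, jsonify, render_template_string

app = Flask(__name__)

# Placeholder rows; the real project would query zhihu.db instead.
SAMPLE_ROWS = [{"authorName": "example", "vote": 3}]

# Inline stand-in for a file in templates/.
PAGE = "<!doctype html><title>zhihu</title><div id='chart'></div>"


@app.route("/zhihu")
def zhihu_page():
    """Serve the display page."""
    return render_template_string(PAGE)


@app.route("/zhihu/data")  # hypothetical path for the JSON endpoint
def zhihu_data():
    """Return the crawled rows as JSON for the frontend scripts."""
    return jsonify(SAMPLE_ROWS)

# To serve locally, call app.run(); Flask listens on http://127.0.0.1:5000 by default.
```

Keeping the JSON endpoint separate from the page lets the template stay static while the frontend scripts fetch the data they need.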

Database design

Fields to store

  • id = Column(Integer, primary_key = True, autoincrement = True)
  • articleId = Column(Integer)
  • authorName = Column(String(length = 32))
  • authorId = Column(Integer)
  • followers = Column(Integer)
  • createTime = Column(String)
  • createDate = Column(Text)
  • vote = Column(Integer)
  • content = Column(Text)
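
Put together, the columns above could form a declarative SQLAlchemy model along these lines. This is a sketch assuming SQLAlchemy 1.4 or later; the class name `Answer`, the table name `answers`, and the in-memory database are hypothetical choices, since the README lists only the columns.

```python
from sqlalchemy import Column, Integer, String, Text, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Answer(Base):  # hypothetical class name
    __tablename__ = "answers"  # hypothetical table name

    id = Column(Integer, primary_key=True, autoincrement=True)
    articleId = Column(Integer)
    authorName = Column(String(length=32))
    authorId = Column(Integer)
    followers = Column(Integer)
    createTime = Column(String)
    createDate = Column(Text)
    vote = Column(Integer)
    content = Column(Text)


# In-memory database for illustration; the project defaults to zhihu.db.
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)  # creates the table through the ORM

Session = sessionmaker(bind=engine)
session = Session()
session.add(Answer(articleId=1, authorName="example", authorId=42,
                   followers=0, createTime="12:00:00",
                   createDate="2020-01-01", vote=3, content="hello"))
session.commit()

row = session.query(Answer).filter_by(articleId=1).one()
```

Swapping the engine URL for `sqlite:///zhihu.db` would persist the table to a file instead of memory.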

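Usage

  1. Edit config.py and enter the pages you want to crawl along with the corresponding regular expressions.
  2. Run create_table.py to create the database and its table.
  3. Run zhihu.py to crawl the configured pages and write the results to the database (default: zhihu.db).
  4. Run run.py to start the web server, then open it in a browser (default: http://127.0.0.1:5000/zhihu).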