Albert-W / Python_crawler
It's designed to be a simple, tiny, practical Python crawler that uses JSON and SQLite instead of MySQL or MongoDB. The target website is Zhihu.com.
Stars: ✭ 45
Programming Languages
javascript
Projects that are alternatives of or similar to Python crawler
Flask Restplus Boilerplate
A boilerplate for flask restful web service
Stars: ✭ 466 (+935.56%)
Mutual labels: sqlalchemy, flask
Flask Marshmallow
Flask + marshmallow for beautiful APIs
Stars: ✭ 666 (+1380%)
Mutual labels: sqlalchemy, flask
Potion
Flask-Potion is a RESTful API framework for Flask and SQLAlchemy, Peewee or MongoEngine
Stars: ✭ 484 (+975.56%)
Mutual labels: sqlalchemy, flask
Data Driven Web Apps With Flask
Course demo code and other hand-out materials for our data-driven web apps in Flask course
Stars: ✭ 388 (+762.22%)
Mutual labels: sqlalchemy, flask
Flask Jwt Router
Flask JWT Router is a Python library that adds authorised routes to a Flask app.
Stars: ✭ 43 (-4.44%)
Mutual labels: sqlalchemy, flask
Mini Shop Server
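A WeChat mini-program backend built on the Flask framework, used to build the back office of a mini-program store (e-commerce; RBAC permission management; auto-generated Swagger-style API docs; suitable as a "Python graduation project"; part of the imooc series). Related blog link: 🌟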
Stars: ✭ 446 (+891.11%)
Mutual labels: sqlalchemy, flask
Flask Rest Jsonapi
Flask extension to build REST APIs around JSONAPI 1.0 specification.
Stars: ✭ 566 (+1157.78%)
Mutual labels: sqlalchemy, flask
Safrs
SqlAlchemy Flask-Restful Swagger Json:API OpenAPI
Stars: ✭ 255 (+466.67%)
Mutual labels: sqlalchemy, flask
Flask Sqlalchemy Booster
Collection of utilities and decorators which add extensive querying and serializing capabilities to Flask-SQLAlchemy models
Stars: ✭ 5 (-88.89%)
Mutual labels: sqlalchemy, flask
Enferno
A Python framework based on Flask microframework, with batteries included, and best practices in mind.
Stars: ✭ 385 (+755.56%)
Mutual labels: sqlalchemy, flask
Ecache
👏👏 Integrate cache(redis) [flask etc.] with SQLAlchemy.
Stars: ✭ 28 (-37.78%)
Mutual labels: sqlalchemy, flask
Flask Sqlalchemy
Adds SQLAlchemy support to Flask
Stars: ✭ 3,658 (+8028.89%)
Mutual labels: sqlalchemy, flask
Full Stack
Full stack, modern web application generator. Using Flask, PostgreSQL DB, Docker, Swagger, automatic HTTPS and more.
Stars: ✭ 451 (+902.22%)
Mutual labels: sqlalchemy, flask
Flask Sqlacodegen
🍶 Automatic model code generator for SQLAlchemy with Flask support
Stars: ✭ 283 (+528.89%)
Mutual labels: sqlalchemy, flask
nim-gatabase
Connection-Pooling Compile-Time ORM for Nim
Stars: ✭ 103 (+128.89%)
Mutual labels: sqlalchemy, sqlite3
flaskbooks
A very light social network & RESTful API for sharing books using flask!
Stars: ✭ 19 (-57.78%)
Mutual labels: sqlalchemy, sqlite3
Mixer
Mixer is a fixtures replacement. Supports Django, Flask, SQLAlchemy, and custom Python objects.
Stars: ✭ 743 (+1551.11%)
Mutual labels: sqlalchemy, flask
Flask Bones
An example of a large scale Flask application using blueprints and extensions.
Stars: ✭ 849 (+1786.67%)
Mutual labels: sqlalchemy, flask
python_crawler
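This project aims to be a lightweight, readable, and easily extensible Zhihu crawler.
From the start, the design avoided pulling in extra frameworks and database engines, so it is a plain Python crawler and stores its data in SQLite, the most lightweight option. All customization settings are imported from the config file; editing it is how you customize the crawler.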
Screenshots
Frontend
Database
Prerequisites
SQLAlchemy is used to map database tables to Python objects, and Flask is used to serve the web pages. No other packages are required.
pip install sqlalchemy
pip install flask
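Files
Root directory
- zhihu.db: the SQLite database file that stores the crawled data.
- temp.json: temporary information that does not need to be stored in the database.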
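The README does not say what temp.json contains. As a minimal sketch, assuming it holds crawl state such as already-visited URLs and the last page reached, loading and saving it with the standard json module could look like this. The keys `visited_urls` and `last_page` and the helpers `load_temp`/`save_temp` are hypothetical.

```python
import json
import os

# Hypothetical layout: the README only says temp.json holds temporary data
# that is not written to zhihu.db. The keys below are assumptions.
TEMP_PATH = "temp.json"


def load_temp(path=TEMP_PATH):
    """Return the saved temporary state, or an empty default if the file is missing."""
    if not os.path.exists(path):
        return {"visited_urls": [], "last_page": 0}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_temp(state, path=TEMP_PATH):
    """Write the temporary state back to disk as UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps Chinese text readable in the file
        json.dump(state, f, ensure_ascii=False, indent=2)
```

Typical use: `state = load_temp()`, append the URL just crawled to `state["visited_urls"]`, then `save_temp(state)`, so an interrupted run can resume without touching the database.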
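backend
Handles crawling and persistence.
- config.py: all configuration is managed centrally in config.py. Editing config.py extends what the program does.
- create_table.py: defines the table structure and creates the table in the database through the ORM.
- dbTool.py: wraps database operations in Python functions.
- zhihu.py: implements all of the crawling functionality.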
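frontend
The folder for the visual display.
- templates/: HTML templates; the entry point of the frontend display.
- static/: resource files such as images, CSS, and JS.
- run.py: implements the Flask routes, which serve two purposes:
  1. Sending JSON data to the frontend.
  2. Serving the display page to the frontend.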
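For the first route, the database rows have to be turned into JSON the page's JavaScript can consume. A minimal sketch, assuming rows come back as tuples in the column order listed under the database design; `rows_to_json` and `COLUMNS` are hypothetical names, not taken from run.py.

```python
import json

# Assumed column order, matching the fields listed in the database design.
COLUMNS = ("id", "articleId", "authorName", "authorId", "followers",
           "createTime", "createDate", "vote", "content")


def rows_to_json(rows):
    """Turn database row tuples into a JSON array of objects for the frontend."""
    records = [dict(zip(COLUMNS, row)) for row in rows]
    # ensure_ascii=False keeps Chinese author names and content readable
    return json.dumps(records, ensure_ascii=False)
```

Inside a Flask view, the same list of dicts could instead be returned with `flask.jsonify`, which also sets the `application/json` content type.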
Database design
Fields to be stored
- id = Column(Integer, primary_key = True, autoincrement = True)
- articleId = Column(Integer)
- authorName = Column(String(length = 32))
- authorId = Column(Integer)
- followers = Column(Integer)
- createTime = Column(String)
- createDate = Column(Text)
- vote = Column(Integer)
- content = Column(Text)
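To see what these column definitions amount to in zhihu.db, here is a sketch of the equivalent table in plain SQL, plus two dbTool.py-style helpers, using the standard sqlite3 module so it runs without SQLAlchemy. The table name `articles` and the helpers `open_db`, `save_article`, and `top_articles` are assumptions; the project itself creates the table through SQLAlchemy in create_table.py.

```python
import sqlite3

# Plain-SQL counterpart of the SQLAlchemy columns above. The table name is
# an assumption. INTEGER PRIMARY KEY makes SQLite assign ids automatically.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    id         INTEGER PRIMARY KEY,
    articleId  INTEGER,
    authorName VARCHAR(32),
    authorId   INTEGER,
    followers  INTEGER,
    createTime VARCHAR,
    createDate TEXT,
    vote       INTEGER,
    content    TEXT
)
"""


def open_db(path="zhihu.db"):
    """Open (or create) the database file and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_article(conn, article):
    """Insert one article given as a dict; 'id' is assigned by SQLite."""
    conn.execute(
        "INSERT INTO articles (articleId, authorName, authorId, followers, "
        "createTime, createDate, vote, content) VALUES "
        "(:articleId, :authorName, :authorId, :followers, "
        ":createTime, :createDate, :vote, :content)",
        article,
    )
    conn.commit()


def top_articles(conn, limit=10):
    """Return (articleId, authorName, vote) for the most up-voted articles."""
    cur = conn.execute(
        "SELECT articleId, authorName, vote FROM articles "
        "ORDER BY vote DESC LIMIT ?",
        (limit,),
    )
    return cur.fetchall()
```

Passing `":memory:"` to `open_db` gives a throwaway database for experiments; with SQLAlchemy the table creation step is usually `Base.metadata.create_all(engine)`.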
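Usage
- Edit config.py: enter the pages you want to crawl and the corresponding regular expressions.
- Run create_table.py to create the database and table.
- Run zhihu.py to crawl the configured pages and store the results in the database (default: zhihu.db).
- Run run.py to start the web server, then open it in a browser (default: http://127.0.0.1:5000/zhihu).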
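The first and third steps pair target pages with regular expressions. A minimal sketch of that idea, using only the standard library: a hypothetical `CONFIG` dict stands in for config.py, `fetch` downloads a page with `urllib.request`, and `extract` applies each pattern with `re.findall`. The config keys, the placeholder URL, and the HTML patterns are all assumptions; real Zhihu markup differs, so the patterns would need adapting.

```python
import re
import urllib.request

# Hypothetical stand-in for config.py: variable names, URL and patterns
# are illustrative only and do not come from the project.
CONFIG = {
    "start_url": "https://www.zhihu.com/",  # placeholder target page
    "patterns": {
        "authorName": r'<a class="author"[^>]*>([^<]+)</a>',
        "vote": r'<span class="vote">(\d+)</span>',
    },
}


def fetch(url, timeout=10):
    """Download a page and decode it as UTF-8 text."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract(html, patterns):
    """Apply each configured regex and collect every captured match per field."""
    return {field: re.findall(pattern, html) for field, pattern in patterns.items()}


if __name__ == "__main__":
    # Offline demo on a tiny HTML snippet instead of a live request.
    sample = '<a class="author" href="#">Alice</a><span class="vote">12</span>'
    print(extract(sample, CONFIG["patterns"]))
    # {'authorName': ['Alice'], 'vote': ['12']}
```

Keeping the patterns in configuration rather than in the crawling code is what lets the target pages change by editing config.py alone.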