Projects that are alternatives of or similar to weibo spider

weibo crawler

Collects Sina Weibo data

Stars: ✭ 82 (+241.67%)

Mutual labels: weibospider

Weibospider

This is a Sina Weibo spider built with Scrapy [Weibo crawler / actively maintained]

Stars: ✭ 2,408 (+9933.33%)

Mutual labels: weibospider

Weibospider

⚡ A distributed crawler for Weibo, built with celery and requests.

Stars: ✭ 4,670 (+19358.33%)

Mutual labels: weibospider

Weibospider

Sina Weibo crawler; scrapes Sina Weibo data with Python

Stars: ✭ 4,861 (+20154.17%)

Mutual labels: weibospider

weibo_spider

Description

Reads a list of user IDs (userid, not nickname) from the Weibo_user table in MySQL, then crawls those users' Weibo messages and saves them to the MySQL database.
main.py: entry point that starts the program.
MysqlUtil.py: connects to MySQL and executes CRUD operations.
WeiboProducer.py: reads the user ID list from MySQL and puts the IDs into the queue.
WeiboConsumer.py: reads user IDs from the queue and crawls their Weibo messages.
weibo_rss.sql: database SQL script, including the table structure.
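
The producer/consumer split described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: run_pipeline, produce, consume, and fake_fetch are hypothetical names, the MySQL read is replaced by a plain list, and the crawl step is replaced by a stand-in function. It is written for Python 3 (the queue module is named Queue in Python 2.7).

```python
import queue
import threading


def produce(user_ids, q, n_consumers):
    """Put every user ID on the queue, then one stop marker per consumer."""
    for uid in user_ids:
        q.put(uid)
    for _ in range(n_consumers):
        q.put(None)  # sentinel: tells one consumer thread to exit


def consume(q, fetch, results, lock):
    """Take user IDs off the queue until a stop marker arrives."""
    while True:
        uid = q.get()
        if uid is None:
            break
        messages = fetch(uid)  # stand-in for the real crawl step
        with lock:  # protect the shared dict across threads
            results[uid] = messages


def run_pipeline(user_ids, fetch, n_consumers=3):
    """Start the consumer threads, feed them, and wait for them to finish."""
    q = queue.Queue()
    results = {}
    lock = threading.Lock()
    workers = [
        threading.Thread(target=consume, args=(q, fetch, results, lock))
        for _ in range(n_consumers)
    ]
    for w in workers:
        w.start()
    produce(user_ids, q, n_consumers)
    for w in workers:
        w.join()
    return results


def fake_fetch(uid):
    """Hypothetical replacement for crawling one user's messages."""
    return ["message from %s" % uid]


if __name__ == "__main__":
    print(run_pipeline(["1001", "1002", "1003"], fake_fetch, n_consumers=2))
```

The queue decouples reading IDs from crawling, so the number of consumer threads can be changed without touching the producer.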

Environment

Python: 2.7.*
System: Ubuntu
MySQL: 5.5

Usage

To run main.py normally, do the following:

Log in to weibo.cn (the mobile page) to obtain a login cookie.
Copy the cookie and assign it to the variable cookie in WeiboConsumer.py (line 25).
Install MySQL, then create the database and tables.
Set the start parameter -t (number of WeiboConsumer threads; optional).

Example:

python main.py -t 3
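
For illustration, an optional -t flag like the one above could be parsed with argparse as sketched here. This is an assumption about how such a flag might be handled, not the project's actual parsing code; parse_args is a hypothetical helper, and the default of 1 thread is a placeholder because the original does not state a default.

```python
import argparse


def parse_args(argv=None):
    """Parse the optional -t flag (number of consumer threads)."""
    parser = argparse.ArgumentParser(description="Weibo crawler launcher (sketch)")
    parser.add_argument(
        "-t",
        dest="threads",
        type=int,
        default=1,  # assumed default; the original README does not give one
        help="number of WeiboConsumer threads",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print("consumer threads:", args.threads)
```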

How to get cookie:

Open weibo.cn in Firefox or Chrome.
Open the developer tools, go to the Network tab, and find the request header of the weibo.cn login request.
Copy the cookie from the request header into the program.
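
Once copied, the cookie string is typically sent back as a Cookie request header. The sketch below shows one way to attach it using the Python 3 standard library (urllib2 plays this role in Python 2.7). The COOKIE value is a placeholder, build_request is a hypothetical helper, and the https scheme and User-Agent string are assumptions; the project's own request code may differ.

```python
import urllib.request

# Placeholder only: paste the cookie string copied from the browser here.
COOKIE = "name1=value1; name2=value2"


def build_request(url, cookie):
    """Build (but do not send) a request that carries the login cookie."""
    headers = {
        "Cookie": cookie,
        "User-Agent": "Mozilla/5.0",  # assumed; some sites reject the default agent
    }
    return urllib.request.Request(url, headers=headers)


if __name__ == "__main__":
    req = build_request("https://weibo.cn/", COOKIE)
    print(req.get_header("Cookie"))
    # To send it: urllib.request.urlopen(req).read()
```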

Kevinsss / weibo_spider
