
Xarrow / weibo-scraper

License: MIT License
Simple Weibo Scraper

Programming Languages

python
139335 projects - #7 most used programming language
shell
77523 projects

Projects that are alternatives of or similar to weibo-scraper

WeiboCrawler
Cookie-free Weibo crawler that can continuously crawl the profiles of one or more Sina Weibo users, along with their tweets and the comments and reposts on those tweets.
Stars: ✭ 45 (-10%)
Mutual labels:  crawler, weibo, weibo-spider
Goose Parser
Universal scraping tool which allows you to extract data using multiple environments
Stars: ✭ 211 (+322%)
Mutual labels:  crawler, scraper
Media Scraper
Scrapes all photos and videos in a web page / Instagram / Twitter / Tumblr / Reddit / pixiv / TikTok
Stars: ✭ 206 (+312%)
Mutual labels:  crawler, scraper
Annie
👾 Fast and simple video download library and CLI tool written in Go
Stars: ✭ 16,369 (+32638%)
Mutual labels:  crawler, scraper
Colly
Elegant Scraper and Crawler Framework for Golang
Stars: ✭ 15,535 (+30970%)
Mutual labels:  crawler, scraper
Tianyancha
A pip-installable Tianyancha scraper API: save the business-registration information of one or more specified companies to Excel/JSON in one click. A batteries-included scraper API for Tianyancha, the best Chinese business data and investigation platform.
Stars: ✭ 206 (+312%)
Mutual labels:  crawler, scraper
dijnet-bot
All your bills in one place :)
Stars: ✭ 17 (-66%)
Mutual labels:  crawler, scraper
Instagram Crawler
Crawl Instagram photos, posts and videos for download.
Stars: ✭ 178 (+256%)
Mutual labels:  crawler, scraper
Weibopicdownloader
A crawler that downloads Weibo images without logging in.
Stars: ✭ 247 (+394%)
Mutual labels:  crawler, weibo
Polite
Be nice on the web
Stars: ✭ 253 (+406%)
Mutual labels:  crawler, scraper
zeekEye
A Fast and Powerful Scraping and Web Crawling Framework.
Stars: ✭ 36 (-28%)
Mutual labels:  weibo, weibo-spider
Querylist
🕷️ The elegant, progressive PHP crawler framework!
Stars: ✭ 2,392 (+4684%)
Mutual labels:  crawler, scraper
Jvppeteer
Headless Chrome for Java (a Java crawler)
Stars: ✭ 193 (+286%)
Mutual labels:  crawler, scraper
bots-zoo
No description or website provided.
Stars: ✭ 59 (+18%)
Mutual labels:  crawler, scraper
Goribot
[Crawler/Scraper for Golang] 🕷 A lightweight, distributed-friendly Golang crawler framework.
Stars: ✭ 190 (+280%)
Mutual labels:  crawler, scraper
Ruiji.net
crawler framework, distributed crawler extractor
Stars: ✭ 220 (+340%)
Mutual labels:  crawler, scraper
Datmusic Api
Alternative for VK Audio API
Stars: ✭ 160 (+220%)
Mutual labels:  crawler, scraper
Linkedin Profile Scraper
🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Stars: ✭ 171 (+242%)
Mutual labels:  crawler, scraper
Skrape.it
A Kotlin-based testing/scraping/parsing library providing the ability to analyze and extract data from HTML (server & client-side rendered). It places particular emphasis on ease of use and a high level of readability by providing an intuitive DSL. It aims to be a testing lib, but can also be used to scrape websites in a convenient fashion.
Stars: ✭ 231 (+362%)
Mutual labels:  crawler, scraper
papercut
Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.
Stars: ✭ 15 (-70%)
Mutual labels:  crawler, scraper

Weibo Scraper



Simple Weibo tweet scraper. Crawl Weibo tweets without authorization. The official API has many limitations; in general, we can instead use the mobile site, which has its own API that can be inspected with Chrome DevTools.
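To illustrate that mobile-site approach (this is a sketch, not part of this package's API): the JSON endpoint that DevTools reveals on m.weibo.cn can be addressed by building a query URL. The endpoint path and parameter names below are what the mobile site exposed at the time of writing and may change:

```python
from urllib.parse import urlencode

# The mobile site's JSON endpoint, as observed in Chrome DevTools.
# Treat the path and parameter names as a snapshot, not a stable API.
MOBILE_API = "https://m.weibo.cn/api/container/getIndex"

def build_tweets_url(container_id: str, page: int = 1) -> str:
    """Build the URL for one page of a container's tweets."""
    return MOBILE_API + "?" + urlencode({"containerid": container_id, "page": page})

print(build_tweets_url("1076033637346297"))
```

Fetching that URL returns JSON the same way the mobile web app consumes it, which is exactly the surface this scraper builds on.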


Why

  1. Crawl Weibo data for big-data research.

  2. Back up data against Weibo's shameful blocking.


Installation

pip

$ pip install weibo-scraper

Or upgrade it:

$ pip install --upgrade weibo-scraper

pipenv

$ pipenv install weibo-scraper

Or upgrade it:

$ pipenv update --outdated # show packages which are outdated

$ pipenv update weibo-scraper # just update weibo-scraper

Only Python 3.6+ is supported.


Usage

CLI

$ weibo-scraper -h

usage: weibo-scraper [-h] [-u U] [-p P] [-o O] [-f FORMAT]
                     [-efn EXPORTED_FILE_NAME] [-s] [-d] [--more] [-v]

weibo-scraper-1.0.7-beta 🚀

optional arguments:
  -h, --help            show this help message and exit
  -u U                  username [nickname] which want to exported
  -p P                  pages which exported [ default 1 page ]
  -o O                  output file path which expected [ default 'current
                        dir' ]
  -f FORMAT, --format FORMAT
                        format which expected [ default 'txt' ]
  -efn EXPORTED_FILE_NAME, --exported_file_name EXPORTED_FILE_NAME
                        file name which expected
  -s, --simplify        simplify available info
  -d, --debug           open debug mode
  --more                more
  -v, --version         weibo scraper version

API

  1. First, you can get a weibo profile by name or uid.
>>> from weibo_scraper import get_weibo_profile
>>> weibo_profile = get_weibo_profile(name='来去之间',)
>>> ....

You will get a weibo profile response of type weibo_base.UserMeta, which includes the fields below:

field              chinese     type          sample
id                 用户id      str
screen_name        微博昵称    Option[str]
avatar_hd          高清头像    Option[str]   'https://ww2.sinaimg.cn/orj480/4242e8adjw8elz58g3kyvj20c80c8myg.jpg'
cover_image_phone  手机版封面  Option[str]   'https://tva1.sinaimg.cn/crop.0.0.640.640.640/549d0121tw1egm1kjly3jj20hs0hsq4f.jpg'
description        描述        Option[str]
follow_count       关注数      Option[int]   3568
follower_count     被关注数    Option[int]   794803
gender             性别        Option[str]   'm'/'f'
raw_user_response  原始返回    Option[dict]
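To show how those fields hang together, here is a minimal, hypothetical stand-in for weibo_base.UserMeta built from a made-up raw response dict. ProfileSketch and the dict keys are illustrative assumptions, not the library's real class or payload:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProfileSketch:
    """Illustrative stand-in for weibo_base.UserMeta (not the real class)."""
    id: str
    screen_name: Optional[str] = None
    follow_count: Optional[int] = None
    follower_count: Optional[int] = None
    gender: Optional[str] = None

# A made-up raw response using the field names from the table above.
raw = {"id": "123456789", "screen_name": "来去之间",
       "follow_count": 3568, "follower_count": 794803, "gender": "m"}

profile = ProfileSketch(id=raw["id"],
                        screen_name=raw.get("screen_name"),
                        follow_count=raw.get("follow_count"),
                        follower_count=raw.get("follower_count"),
                        gender=raw.get("gender"))
print(profile.screen_name, profile.follower_count)
```

The real UserMeta additionally keeps the untouched payload in raw_user_response, so anything not surfaced as a typed field is still reachable.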
  2. Second, you can also get weibo tweets via tweet_container_id; this is a rarely used approach, but it works well.
>>> from weibo_scraper import  get_weibo_tweets
>>> for tweet in get_weibo_tweets(tweet_container_id='1076033637346297',pages=1):
>>>     print(tweet)
>>> ....
  3. Of course, you can also get raw weibo tweets by an existing nickname. The pages param is optional.
>>> from weibo_scraper import  get_weibo_tweets_by_name
>>> for tweet in get_weibo_tweets_by_name(name='嘻红豆', pages=1):
>>>     print(tweet)
>>> ....
  4. If you want to get all tweets, set the pages param to None.
>>> from weibo_scraper import  get_weibo_tweets_by_name
>>> for tweet in get_weibo_tweets_by_name(name='嘻红豆', pages=None):
>>>     print(tweet)
>>> ....
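The pages=None behaviour can be pictured as a loop that keeps requesting pages until one comes back empty. This is only a sketch of the idea, with a fake page fetcher standing in for the network; it is not the library's actual implementation:

```python
from typing import Callable, Iterator, List, Optional

def iterate_tweets(fetch_page: Callable[[int], List[str]],
                   pages: Optional[int] = None) -> Iterator[str]:
    """Yield tweets page by page; pages=None keeps going until an empty page."""
    page = 1
    while pages is None or page <= pages:
        batch = fetch_page(page)
        if not batch:  # an empty page signals the end of the timeline
            break
        yield from batch
        page += 1

# Fake fetcher: three pages of two tweets each, then nothing.
fake = lambda p: [f"tweet-{p}-{i}" for i in range(2)] if p <= 3 else []
print(list(iterate_tweets(fake)))           # all three pages
print(list(iterate_tweets(fake, pages=1)))  # only the first page
```

Because the function is a generator, tweets stream out as each page arrives instead of being buffered for the whole account first.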
  5. You can also get formatted tweets via the weibo_scraper.get_formatted_weibo_tweets_by_name API:
>>> from weibo_scraper import  get_formatted_weibo_tweets_by_name
>>> result_iterator = get_formatted_weibo_tweets_by_name(name='嘻红豆', pages=None)
>>> for user_meta in result_iterator:
>>>     if user_meta is not None:
>>>         for tweetMeta in user_meta.cards_node:
>>>             print(tweetMeta.mblog.text)
>>> ....


  6. Get realtime hot words.
>>> import weibo_scraper
>>> hotwords = weibo_scraper.get_realtime_hotwords()
>>> for hw in hotwords:
>>>     print(str(hw))
  7. Get realtime hot words at a fixed interval.
>>> wt = Timer(name="realtime_hotword_timer", fn=weibo_scraper.get_realtime_hotwords, interval=1)
>>> wt.set_ignore_ex(True)
>>> wt.scheduler()
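The Timer above is the package's own utility. As a rough illustration of what such an interval runner does, here is a self-contained sketch built on the standard library; the class name, parameters, and behaviour are assumptions for illustration, not the package's implementation:

```python
import threading

class RepeatingTimer:
    """Minimal periodic runner: call fn every `interval` seconds until stopped."""
    def __init__(self, fn, interval: float, ignore_ex: bool = False):
        self.fn = fn
        self.interval = interval
        self.ignore_ex = ignore_ex
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait doubles as an interruptible sleep: it returns False on
        # timeout (keep looping) and True once stop() sets the event.
        while not self._stop.wait(self.interval):
            try:
                self.fn()
            except Exception:
                if not self.ignore_ex:
                    raise

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
```

A real poller would pass a fetch function such as weibo_scraper.get_realtime_hotwords as fn and swallow transient network errors with ignore_ex=True, mirroring set_ignore_ex(True) above.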

LICENSE

MIT

This project is powered by a JetBrains open source license.

