gaussic / Weibo_wordcloud

License: MIT
Scrapes Weibo posts by keyword, then generates a word cloud from them.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Weibo wordcloud

Weibo Analyst
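A toolbox for analyzing Chinese social media (Weibo) comments. Features: 1. scraping Weibo comment data; 2. word segmentation and keyword extraction; 3. word clouds and word-frequency statistics; 4. sentiment analysis; 5. topic clustering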
Stars: ✭ 430 (+179.22%)
Mutual labels:  crawler, weibo
Weibo Crawler
Sina Weibo crawler: uses Python to scrape Sina Weibo data and download Weibo images and videos
Stars: ✭ 1,019 (+561.69%)
Mutual labels:  crawler, weibo
Fess
Fess is very powerful and easily deployable Enterprise Search Server.
Stars: ✭ 561 (+264.29%)
Mutual labels:  search, crawler
Crawlertutorial
A minimal crawler tutorial (fetch, parse, search, multiprocessing, API), using PTT as the example
Stars: ✭ 282 (+83.12%)
Mutual labels:  search, crawler
Decryptlogin
APIs for logging in to some websites using requests.
Stars: ✭ 1,861 (+1108.44%)
Mutual labels:  crawler, weibo
Jivesearch
A search engine that doesn't track you.
Stars: ✭ 364 (+136.36%)
Mutual labels:  search, crawler
Scrapy Azuresearch Crawler Samples
Scrapy as a Web Crawler for Azure Search Samples
Stars: ✭ 20 (-87.01%)
Mutual labels:  search, crawler
Algoliasearch Netlify
Official Algolia Plugin for Netlify. Index your website to Algolia when deploying your project to Netlify with the Algolia Crawler
Stars: ✭ 208 (+35.06%)
Mutual labels:  search, crawler
Baiduspider
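BaiduSpider, a crawler for Baidu search results. It currently supports Baidu web search, image search, Zhidao search, video search, news search, Wenku search, Jingyan search, and Baike search.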
Stars: ✭ 105 (-31.82%)
Mutual labels:  search, crawler
Weibo Album Crawler
Multithreaded crawler for full-size images in Sina Weibo albums.
Stars: ✭ 83 (-46.1%)
Mutual labels:  crawler, weibo
weibo-scraper
Simple Weibo Scraper
Stars: ✭ 50 (-67.53%)
Mutual labels:  crawler, weibo
Weibo Topic Spider
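Weibo super-topic crawler with word-frequency statistics, sentiment analysis, and simple classification; scraped data for the pneumonia super topic was newly added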
Stars: ✭ 128 (-16.88%)
Mutual labels:  crawler, weibo
indieweb-search
Source code for the IndieWeb search engine.
Stars: ✭ 16 (-89.61%)
Mutual labels:  search, crawler
Opensearchserver
Open-source Enterprise Grade Search Engine Software
Stars: ✭ 408 (+164.94%)
Mutual labels:  search, crawler
WeiboCrawler
Cookie-free Weibo crawler that can continuously scrape the profiles, posts, and post comments and reposts of one or more Sina Weibo users.
Stars: ✭ 45 (-70.78%)
Mutual labels:  crawler, weibo
Filemasta
A search application to explore, discover and share online files
Stars: ✭ 571 (+270.78%)
Mutual labels:  search, crawler
Weibopicdownloader
Crawler that downloads Weibo images without logging in
Stars: ✭ 247 (+60.39%)
Mutual labels:  crawler, weibo
Lxspider
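A collection of crawler examples, including but not limited to: Taobao, JD, Tmall, Douban, Douyin, Kuaishou, Weibo, WeChat, Alibaba, Toutiao, pdd, Youku, iQIYI, Ctrip, 12306, 58, Sohu, Baidu Index, VIP and Wanfang, Zlibraty, Oalib, novels, bidding sites, procurement sites, and Xiaohongshu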
Stars: ✭ 60 (-61.04%)
Mutual labels:  crawler, weibo
Sina Weibo Album Downloader
Multithreaded downloader for all HD photos in a user's Sina Weibo album.
Stars: ✭ 125 (-18.83%)
Mutual labels:  crawler, weibo
Search
An Open Source Search Engine
Stars: ✭ 139 (-9.74%)
Mutual labels:  search, crawler

Weibo Crawler and Word Cloud Display

Environment

  • Python 3
  • requests
  • jieba
  • matplotlib
  • wordcloud
  • scipy

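Crawler

The mobile web version of Weibo does not place heavy restrictions on crawlers, so part of the Weibo search data can be scraped directly. The search API is: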

https://m.weibo.cn/api/container/getIndex?type=wb&queryVal={}&containerid=100103type=2%26q%3D{}&page={}
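
To make the three `{}` placeholders concrete, here is a minimal sketch of how the template might be filled and requested. The helper names `build_search_url` and `fetch_page` are hypothetical and are not taken from weibo_search.py. Percent-encoding the keyword with `quote` in both slots is an assumption about what the endpoint expects, and the shape of the returned JSON is not assumed at all.

```python
from urllib.parse import quote

# Template copied from the README; the three {} slots are keyword, keyword, page.
SEARCH_URL = (
    "https://m.weibo.cn/api/container/getIndex"
    "?type=wb&queryVal={}&containerid=100103type=2%26q%3D{}&page={}"
)


def build_search_url(keyword: str, page: int = 1) -> str:
    """Fill the search template for one keyword and one result page."""
    # Assumption: the keyword is percent-encoded as UTF-8 in both slots.
    encoded = quote(keyword, safe="")
    return SEARCH_URL.format(encoded, encoded, page)


def fetch_page(keyword: str, page: int = 1, timeout: float = 10.0) -> dict:
    """Download one page of search results and return the decoded JSON."""
    # Third-party dependency listed in the README; imported here so that
    # URL building still works when requests is not installed.
    import requests

    response = requests.get(build_search_url(keyword, page), timeout=timeout)
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    print(build_search_url("iPhone", page=2))
```

Looping `page` from 1 upward and pausing between requests would be one way to collect several pages; how many pages the endpoint serves is not stated in the source.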

This API returns a certain amount of JSON data (see sample.json for the raw data). After processing, each record has the following format:

{
    "mid": "4199434918992223",
    "text": "【深度学习的终极形态】近期,院友袁进辉博士回到微软亚洲研究院做了题为《打造最强深度学习引擎》的报告,分享了深度学习框架方面的技术进展。他在报告中启发大家思考如何才能“鱼和熊掌兼得”,让软件发挥灵活性,硬件发挥高效率。我们整理了本次报告的重点,希望能对大家有所帮助!  ​...全文",
    "userid": "1286528122",
    "username": "微软亚洲研究院",
    "reposts_count": 21,
    "comments_count": 1,
    "attitudes_count": 9
}
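
As a rough illustration of the processing step, the sketch below flattens one raw post into the record format shown above. The raw layout is an assumption, since sample.json is not reproduced here: the nested `user` object with `id` and `screen_name` fields, and the presence of HTML tags in `text`, are guesses, and `to_record` is a hypothetical helper rather than the function used in weibo_search.py.

```python
import re

# Matches simple HTML tags such as <a href="..."> or </a>.
TAG_RE = re.compile(r"<[^>]+>")


def to_record(mblog: dict) -> dict:
    """Flatten one raw post into the processed record format."""
    # Assumption: author details live in a nested "user" object.
    user = mblog.get("user") or {}
    return {
        "mid": str(mblog.get("mid", "")),
        # Assumption: the raw text may contain HTML tags, which are stripped.
        "text": TAG_RE.sub("", mblog.get("text", "")).strip(),
        "userid": str(user.get("id", "")),
        "username": user.get("screen_name", ""),
        "reposts_count": mblog.get("reposts_count", 0),
        "comments_count": mblog.get("comments_count", 0),
        "attitudes_count": mblog.get("attitudes_count", 0),
    }
```

Missing fields fall back to empty strings or zero, so a partially filled post still yields a record with all seven keys.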

See weibo_search.py for the full crawler.

Word Cloud

The word cloud can be built with wordcloud. The basic steps are:

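  1. Word segmentation and keyword extraction: Chinese text must be segmented and stripped of its many stop words, for example 你 (you), 我 (I), 他 (he), and 这是 (this is), so that the generated word cloud is more meaningful. The TF-IDF keyword extraction in the jieba tokenizer handles this step directly.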

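  2. wordcloud takes a string and a base image as input. Join the keywords from step 1 with spaces. For the base image, prefer one with a plain white background, so that the generated image more closely resembles the original.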
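
The two steps above can be sketched roughly as follows. This is not the code from weibo_cloud.py: the function names, the `top_k` default, and all file paths are hypothetical. The base image is loaded with Pillow and NumPy, which is a substitution chosen for this sketch rather than the method the project necessarily uses, and `font_path` must point to a font that contains Chinese glyphs.

```python
from typing import Iterable, List


def extract_keywords(texts: Iterable[str], top_k: int = 200) -> List[str]:
    """Step 1: segment the posts and keep the highest-ranked TF-IDF terms."""
    import jieba.analyse  # third-party; imported lazily

    # extract_tags segments the text and ranks the terms by TF-IDF weight.
    return jieba.analyse.extract_tags("\n".join(texts), topK=top_k)


def keywords_to_text(keywords: Iterable[str]) -> str:
    """Step 2a: wordcloud takes a single string, so join with spaces."""
    return " ".join(k.strip() for k in keywords if k.strip())


def render_cloud(text: str, mask_path: str, font_path: str, out_path: str) -> None:
    """Step 2b: draw the words inside the non-white area of the base image."""
    import numpy as np  # third-party; imported lazily
    from PIL import Image
    from wordcloud import WordCloud

    # Pure white pixels in the mask are treated as "do not draw here".
    mask = np.array(Image.open(mask_path))
    cloud = WordCloud(font_path=font_path, background_color="white", mask=mask)
    cloud.generate(text).to_file(out_path)


if __name__ == "__main__":
    posts = ["深度学习框架方面的技术进展", "让软件发挥灵活性,硬件发挥高效率"]
    text = keywords_to_text(extract_keywords(posts, top_k=50))
    # Hypothetical paths: replace with a real base image and a CJK font file.
    render_cloud(text, "mask.png", "SimHei.ttf", "cloud.png")
```

Keeping the joining step separate from rendering makes it easy to inspect the keyword string before an image is drawn.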

See weibo_cloud.py for the code.

Examples

Keyword: iPhone

apple

Keyword: 微软 (Microsoft)

microsoft

Keyword: 谷歌 (Google)

google