
xjtushilei / ChineseStarsRelationship

License: Apache-2.0
Chinese celebrity data crawling. You can even obtain the relationships between all the people on the internet, and then take it from there! Based on this data you can do many more interesting things, such as social network analysis, relationship network visualization, algorithm research, and more.

Programming Languages

Java

Projects that are alternatives to or similar to ChineseStarsRelationship

Economic audit knowledge graph
Knowledge graph for economic-responsibility auditing: web crawler, relation extraction, domain-term identification
Stars: ✭ 98 (+276.92%)
Mutual labels:  spider, knowledge-graph
Web kg
Crawls Chinese Baidu Baike pages, extracts triples, and builds a Chinese knowledge graph
Stars: ✭ 549 (+2011.54%)
Mutual labels:  spider, knowledge-graph
go-movies
golang spider Crawler (crawler, movies)
Stars: ✭ 168 (+546.15%)
Mutual labels:  spider
spider
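🌟 powered by python3 (simple learning of spider): crawls Baidu Wenku, NetEase Cloud Music songs, Douban movies, GitHub, JD.com, QQ Zone, weather, a VIP parsing helper, TED transcripts, a Wi-Fi cracking script, setting Bing images as the desktop background, and more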
Stars: ✭ 124 (+376.92%)
Mutual labels:  spider
KG4Rec
Knowledge-aware recommendation papers.
Stars: ✭ 76 (+192.31%)
Mutual labels:  knowledge-graph
obo-relations
RO is an ontology of relations for use with biological ontologies
Stars: ✭ 63 (+142.31%)
Mutual labels:  knowledge-graph
WSDM2021 NSM
Improving Multi-hop Knowledge Base Question Answering by Learning Intermediate Supervision Signals. WSDM 2021.
Stars: ✭ 84 (+223.08%)
Mutual labels:  knowledge-graph
Social-Knowledge-Graph-Papers
A paper list of research about social knowledge graph
Stars: ✭ 27 (+3.85%)
Mutual labels:  knowledge-graph
node-html-crawler
Simple for use node html crawler (spider) of site web pages
Stars: ✭ 30 (+15.38%)
Mutual labels:  spider
qa
😚 Q & A website based on Spring Boot.
Stars: ✭ 46 (+76.92%)
Mutual labels:  spider
PaperMachete
A project that uses Binary Ninja and GRAKN.AI to perform static analysis on binary files with the goal of identifying bugs in software.
Stars: ✭ 49 (+88.46%)
Mutual labels:  knowledge-graph
nodejs-meizitu
Whole-site scraper for Meizitu (妹子图), collecting 10 GB of photo sets
Stars: ✭ 80 (+207.69%)
Mutual labels:  spider
Subbranch-China
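Bank and branch names: a crawler for the branch names of every bank across all regions of China, sourced from the WeChat merchant platform, with the data already organized into SQL files that can be imported directly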
Stars: ✭ 31 (+19.23%)
Mutual labels:  spider
wget-lua
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Stars: ✭ 52 (+100%)
Mutual labels:  spider
cognipy
In-memory Graph Database and Knowledge Graph with Natural Language Interface, compatible with Pandas
Stars: ✭ 31 (+19.23%)
Mutual labels:  knowledge-graph
elves
🎊 Design and implement of lightweight crawler framework.
Stars: ✭ 322 (+1138.46%)
Mutual labels:  spider
NBFNet
Official implementation of Neural Bellman-Ford Networks (NeurIPS 2021)
Stars: ✭ 106 (+307.69%)
Mutual labels:  knowledge-graph
typedb
TypeDB: a strongly-typed database
Stars: ✭ 3,152 (+12023.08%)
Mutual labels:  knowledge-graph
163Music
163music spider by scrapy.
Stars: ✭ 60 (+130.77%)
Mutual labels:  spider
spider-school
Automatic quiz-answering program 🎉
Stars: ✭ 37 (+42.31%)
Mutual labels:  spider

Chinese Celebrity Data Crawling

Goal

The code itself is not technically sophisticated; the point is simply to share a good data source!

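Crawl data from the web to build a complete network of relationships between people. This repository covers the crawling part. jsoup is all that is needed; the real value lies in how good the source website is.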
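Since the goal is a network of people linked by labelled relationships, one minimal way to hold the crawled results in memory is an adjacency map from each person to their outgoing relations. This is a sketch under assumptions: `RelationGraph`, `addRelation`, and `relationsOf` are hypothetical names, not taken from the repository, and the jsoup fetching and parsing step is deliberately left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory store for the crawled relationship network:
// each person maps to the list of labelled edges that start at them.
class RelationGraph {
    // One directed, labelled edge, e.g. ("A", "spouse", "B").
    static final class Relation {
        final String from;
        final String label;
        final String to;

        Relation(String from, String label, String to) {
            this.from = from;
            this.label = label;
            this.to = to;
        }
    }

    // Insertion-ordered so people appear in the order they were discovered.
    private final Map<String, List<Relation>> edges = new LinkedHashMap<>();

    // Records "from --label--> to"; both people become nodes of the graph.
    void addRelation(String from, String label, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(new Relation(from, label, to));
        edges.computeIfAbsent(to, k -> new ArrayList<>());
    }

    // Outgoing relations of one person; empty if the person is unknown.
    List<Relation> relationsOf(String person) {
        return Collections.unmodifiableList(edges.getOrDefault(person, List.of()));
    }

    // Number of distinct people seen so far.
    int personCount() {
        return edges.size();
    }
}
```

Storing the relation label on each edge keeps the data usable both for plain graph algorithms (ignore the label) and for visualization (show the label on the edge).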

Validity

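Still working as of 2017. Changes to the page structure of the Hudong Baike (互动百科) website may break the crawler, and its usability will no longer be maintained. If you just want the data, download it directly from the release.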

Method

Crawl depth-first until no seeds remain in the queue. Multithreading is not used for now.
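A single-threaded depth-first crawl over a seed frontier can be sketched as follows. The README calls the pending seeds a queue; depth-first order actually needs last-in-first-out, so this sketch uses `ArrayDeque` as a stack. `DepthFirstCrawler` and the `fetchNeighbors` function are hypothetical: `fetchNeighbors` stands in for "download a person's page and extract the names of related people", which the real project does with jsoup.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical single-threaded depth-first crawler.
class DepthFirstCrawler {
    // Returns people in the order they were crawled.
    static List<String> crawl(List<String> seeds, Function<String, List<String>> fetchNeighbors) {
        Deque<String> frontier = new ArrayDeque<>(); // used as a stack (push/pop) for depth-first order
        Set<String> visited = new HashSet<>();       // avoids crawling the same person twice
        List<String> order = new ArrayList<>();

        // Push seeds in reverse so the first seed is popped first.
        for (int i = seeds.size() - 1; i >= 0; i--) {
            frontier.push(seeds.get(i));
        }
        while (!frontier.isEmpty()) {                // stop once no seeds remain
            String person = frontier.pop();
            if (!visited.add(person)) {
                continue;                            // already crawled via another path
            }
            order.add(person);
            List<String> neighbors = fetchNeighbors.apply(person);
            // Push in reverse so neighbors are explored in page order.
            for (int i = neighbors.size() - 1; i >= 0; i--) {
                String next = neighbors.get(i);
                if (!visited.contains(next)) {
                    frontier.push(next);
                }
            }
        }
        return order;
    }
}
```

Passing the fetch step in as a function keeps the traversal testable against an in-memory graph; swapping the stack for a FIFO queue (`offer`/`poll`) would turn the same loop into a breadth-first crawl.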

Example

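The page http://www.baike.com/wiki/%E5%91%A8%E6%9D%B0%E4%BC%A6 (the entry for Jay Chou, 周杰伦) contains the complete relationship-network information; a little simple parsing is all it takes.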
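The example link is just the person's name, UTF-8 percent-encoded, appended to a wiki path. A small helper that rebuilds such URLs from names could look like this. `BaikeUrl` and `forName` are hypothetical names, the base URL is copied from the example above (the site has since changed, so treat it as historical), and `URLEncoder.encode(String, Charset)` requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a wiki page URL for a person's name in the
// same shape as the example link (base URL taken from that link).
class BaikeUrl {
    static final String BASE = "http://www.baike.com/wiki/";

    static String forName(String name) {
        // URLEncoder produces form encoding, where a space becomes '+'.
        // A path segment needs "%20" instead; a literal '+' in the name is
        // already encoded as "%2B", so this replace only affects spaces.
        String encoded = URLEncoder.encode(name, StandardCharsets.UTF_8).replace("+", "%20");
        return BASE + encoded;
    }
}
```

For example, `BaikeUrl.forName("周杰伦")` yields exactly the example URL above.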

Results

Crawling process (log4j log output)

Screenshots

Results (crawl not yet finished)
