LiuTianyong / Ncov2019_data_crawler
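Epidemic data crawler: a data warehouse for the 2019 novel coronavirus, covering patient trajectories, co-passenger (shared-trip) data, and news reports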
Stars: ✭ 175
Programming Languages
python
Projects that are alternatives to or similar to Ncov2019 data crawler
Weixin Spider
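WeChat Official Account crawler: historical articles, article comments, read counts and "Wow" (在看) counts, with a web visualization page; deployable on Windows servers. Built with Python 3 on flask/mysql/redis/mitmproxy/pywin32 and more. An efficient crawler for WeChat Official Accounts, historical articles, article comments, and data updates.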
Stars: ✭ 287 (+64%)
Mutual labels: api, crawler, spider
Crawlertutorial
A minimal web-crawling tutorial (fetch, parse, search, multiprocessing, API), using PTT as the example
Stars: ✭ 282 (+61.14%)
Mutual labels: api, crawler, spider
Crawler China Mainland Universities
A crawler for the list of universities in mainland China
Stars: ✭ 143 (-18.29%)
Mutual labels: crawler, spider, data
Go spider
[Crawler framework (golang)] An awesome concurrent crawler (spider) framework for Go. The crawler is flexible and modular: it can easily be extended into a customized crawler, or you can use only the default crawl components.
Stars: ✭ 1,745 (+897.14%)
Mutual labels: crawler, spider
Amazonbigspider
😱 Fully automatic distributed Amazon spider | collects data and selects products across four Amazon international sites | account: admin, password: adminadmin
Stars: ✭ 140 (-20%)
Mutual labels: crawler, spider
Python3 Spider
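Python crawlers in practice: simulated login to major websites, including but not limited to slider CAPTCHAs, Pinduoduo, Meituan, Baidu, bilibili, Dianping, and Taobao. If you like it, please star it ❤️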
Stars: ✭ 2,129 (+1116.57%)
Mutual labels: crawler, spider
Weibo Topic Spider
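Weibo super-topic crawler with word-frequency statistics, sentiment analysis, and simple classification of Weibo posts; now also crawls the pneumonia (COVID-19) super-topic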
Stars: ✭ 128 (-26.86%)
Mutual labels: crawler, spider
Apis
Making data readily available to anyone interested
Stars: ✭ 143 (-18.29%)
Mutual labels: api, data
Abot
Cross Platform C# web crawler framework built for speed and flexibility. Please star this project! +1.
Stars: ✭ 1,961 (+1020.57%)
Mutual labels: crawler, spider
Linkedin Profile Scraper
🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Stars: ✭ 171 (-2.29%)
Mutual labels: crawler, spider
Digger
Digger is a powerful and flexible web crawler implemented by pure golang
Stars: ✭ 130 (-25.71%)
Mutual labels: crawler, spider
Reddit Detective
Play detective on Reddit: Discover political disinformation campaigns, secret influencers and more
Stars: ✭ 129 (-26.29%)
Mutual labels: api, data
Spoon
🥄 A package for building site-specific proxy pools.
Stars: ✭ 173 (-1.14%)
Mutual labels: crawler, spider
Jlitespider
A lite distributed Java spider framework :-)
Stars: ✭ 151 (-13.71%)
Mutual labels: crawler, spider
Scrapingoutsourcing
ScrapingOutsourcing shares crawler code, aiming to add one new crawler per week
Stars: ✭ 164 (-6.29%)
Mutual labels: crawler, spider
2019-nCov-data
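This project is a time-series data warehouse of the 2019 novel coronavirus (COVID-19/2019-nCoV) epidemic. Its data sources are DXY (丁香园), Nandu Media (南都传媒), and Tencent News.
The data include patient trajectory data, co-passenger data, news data, and rumor data (other categories will be added later to keep the warehouse as complete as possible).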
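I built this data warehouse in the hope that the data will be used for scientific research. It publishes CSV files directly, which most statistical software can open as-is, to save everyone some work. Later I plan to deploy a server that provides an API and a JSON data interface; follow the project if you need it. I will clean the data first and then wrap it in a callable interface.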
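The JSON interface mentioned above is only planned. Until it exists, the published CSV files can be converted to JSON locally. The following is a minimal sketch using only Python's standard library; `csv_to_json` is a hypothetical helper, not part of this project, and it assumes each CSV file has a header row.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text (header row plus data rows) into a JSON array of objects.

    Hypothetical helper: each row becomes an object keyed by the header names,
    and every value stays a string exactly as it appears in the CSV.
    """
    records = list(csv.DictReader(io.StringIO(csv_text)))
    # ensure_ascii=False keeps Chinese place names readable instead of \u escapes
    return json.dumps(records, ensure_ascii=False)


if __name__ == "__main__":
    print(csv_to_json("id,title\n1,Example headline\n"))
    # prints [{"id": "1", "title": "Example headline"}]
```

Because values stay strings, numeric columns such as `id` would still need explicit conversion as part of any data-cleaning step.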
CSV file list
- News data: covid_news.csv
- Trajectory data: covid_patient_track.csv
- Rumor data: rumor.csv
- Co-passenger data: covid_virus_trip.csv
- Tencent News trajectory data: covid_txnew_track.csv
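As an illustration of using these files for analysis, the sketch below counts trajectory records per province. It assumes, without confirmation from the project, that each CSV is UTF-8 encoded and has a header row whose names match the MySQL columns (for example `province` in `covid_patient_track`); `count_by_column` is a hypothetical helper.

```python
import csv
from collections import Counter
from pathlib import Path


def count_by_column(path: Path, column: str = "province") -> Counter:
    """Count CSV rows grouped by the value in `column`.

    Assumes a header row whose names match the MySQL columns. Rows with a
    missing or empty value are counted under "" rather than dropped.
    """
    counts: Counter = Counter()
    # utf-8-sig strips a leading byte-order mark if present, else reads plain UTF-8
    with path.open(newline="", encoding="utf-8-sig") as f:
        for row in csv.DictReader(f):
            counts[(row.get(column) or "").strip()] += 1
    return counts


if __name__ == "__main__":
    track_file = Path("covid_patient_track.csv")
    if track_file.exists():
        # Ten provinces with the most trajectory records
        for province, n in count_by_column(track_file).most_common(10):
            print(province, n)
```

Since the warehouse keeps every historical update, such counts may include repeated versions of the same record; deduplicate first if you need unique entries.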
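Project description
The crawler visits the sources and collects data once an hour. The interval is configurable in the program, but to limit the load on the target servers, an interval of 10 to 60 minutes is recommended. Data are stored in MySQL, and every historical update is kept, in the hope that this helps anyone tracing the course of the epidemic in the future.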
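The description above says each crawl keeps every historical update instead of overwriting earlier data. The data tables in the next section seem consistent with this: `covid_news`, `covid_rumor` and `covid_virus_trip` each have an auto-increment `id` separate from an upstream identifier (`new_id`, `rumor_id`, `virus_trip_id`), so the same upstream item can be stored once per crawl. That is an inference from the schema, not documented behavior. The sketch below shows append-only storage on a fixed interval, using Python's built-in `sqlite3` as a stand-in for MySQL; the table `rumor_history`, its `crawled_at` column, and the functions `init_db`, `check_interval`, `store_snapshot` and `crawl_loop` are all hypothetical, and the injected `fetch` callable stands in for the real crawler.

```python
import sqlite3
import time
from datetime import datetime, timezone
from typing import Callable, Iterable, Optional, Tuple

# Recommended crawl interval bounds from the project description, in minutes.
MIN_INTERVAL_MIN = 10
MAX_INTERVAL_MIN = 60

Row = Tuple[str, str]  # (upstream rumor id, title)


def init_db(conn: sqlite3.Connection) -> None:
    """Create a hypothetical history table: one row per item per crawl."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rumor_history ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"  # local row id, like `id`
        " rumor_id TEXT,"  # upstream id, like `rumor_id`
        " title TEXT,"
        " crawled_at TEXT)"  # hypothetical snapshot timestamp
    )


def check_interval(minutes: float) -> float:
    """Enforce the recommended 10-60 minute window (a real crawler might only warn)."""
    if not MIN_INTERVAL_MIN <= minutes <= MAX_INTERVAL_MIN:
        raise ValueError(
            f"interval {minutes} min is outside {MIN_INTERVAL_MIN}-{MAX_INTERVAL_MIN} min"
        )
    return minutes


def store_snapshot(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """Append one crawl's rows; earlier rows are never updated or deleted."""
    crawled_at = datetime.now(timezone.utc).isoformat()
    conn.executemany(
        "INSERT INTO rumor_history (rumor_id, title, crawled_at) VALUES (?, ?, ?)",
        [(rid, title, crawled_at) for rid, title in rows],
    )
    conn.commit()


def crawl_loop(
    conn: sqlite3.Connection,
    fetch: Callable[[], Iterable[Row]],
    interval_min: float = 60,
    rounds: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Fetch and store a snapshot every `interval_min` minutes.

    `rounds=None` runs forever; a finite value and a no-op `sleep` help testing.
    """
    check_interval(interval_min)
    done = 0
    while rounds is None or done < rounds:
        store_snapshot(conn, fetch())
        done += 1
        if rounds is None or done < rounds:
            sleep(interval_min * 60)  # time.sleep takes seconds


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    init_db(demo)
    # Two rounds with a no-op sleep: the same item is stored once per crawl.
    crawl_loop(demo, lambda: [("r1", "example title")], interval_min=60, rounds=2,
               sleep=lambda seconds: None)
    print(demo.execute("SELECT COUNT(*) FROM rumor_history").fetchone()[0])  # 2
```

Injecting `sleep` and `fetch` keeps the loop testable without waiting or touching the network; the trade-off of append-only storage is that readers must pick the latest `crawled_at` per upstream id when they want only the current state.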
Data tables
-- News data (covid_news.csv)
CREATE TABLE `covid_news` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`adoptType` int(255) DEFAULT NULL,
`dataInfoOperator` varchar(255) DEFAULT NULL,
`dataInfoState` int(255) DEFAULT NULL,
`createTime` bigint(20) DEFAULT NULL,
`dataInfoTime` bigint(20) DEFAULT NULL,
`entryWay` int(255) DEFAULT NULL,
`infoSource` varchar(255) DEFAULT NULL,
`infoType` int(11) DEFAULT NULL,
`modifyTime` bigint(20) DEFAULT NULL,
`provinceId` int(11) DEFAULT NULL,
`provinceName` varchar(255) DEFAULT NULL,
`pubDate` bigint(20) DEFAULT NULL,
`pubDateStr` text,
`sourceUrl` text,
`summary` text,
`title` text,
`new_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1401 DEFAULT CHARSET=utf8 COMMENT=']';
-- Patient trajectory data (covid_patient_track.csv)
CREATE TABLE `covid_patient_track` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`city` varchar(255) DEFAULT NULL,
`district` varchar(255) DEFAULT NULL,
`street` varchar(255) DEFAULT NULL,
`place` varchar(255) DEFAULT NULL,
`location` varchar(255) DEFAULT NULL,
`remark` varchar(255) DEFAULT NULL,
`source` varchar(255) DEFAULT NULL,
`link` varchar(255) DEFAULT NULL,
`is_today` varchar(255) DEFAULT NULL,
`province` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=10960 DEFAULT CHARSET=utf8;
-- Rumor data (rumor.csv)
CREATE TABLE `covid_rumor` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`body` text,
`mainSummary` varchar(255) DEFAULT NULL,
`rumorType` int(255) DEFAULT NULL,
`score` int(255) DEFAULT NULL,
`sourceUrl` varchar(255) DEFAULT NULL,
`summary` varchar(255) DEFAULT NULL,
`title` varchar(255) DEFAULT NULL,
`rumor_id` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=105 DEFAULT CHARSET=utf8;
-- Co-passenger data (covid_virus_trip.csv)
CREATE TABLE `covid_virus_trip` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`tripType` varchar(255) DEFAULT NULL,
`tripDate` varchar(255) DEFAULT NULL,
`tripNo` varchar(255) DEFAULT NULL,
`tripDepname` varchar(255) DEFAULT NULL,
`tripArrname` varchar(255) DEFAULT NULL,
`tripDepcode` varchar(255) DEFAULT NULL,
`tripArrcode` varchar(255) DEFAULT NULL,
`tripDeptime` varchar(255) DEFAULT NULL,
`tripArrtime` varchar(255) DEFAULT NULL,
`carriage` varchar(255) DEFAULT NULL,
`seatNo` varchar(255) DEFAULT NULL,
`tripMemo` text,
`link` text,
`publisher` varchar(255) DEFAULT NULL,
`publishtime` varchar(255) DEFAULT NULL,
`verified` varchar(255) DEFAULT NULL,
`codeList` varchar(255) DEFAULT NULL,
`nameIndex` varchar(255) DEFAULT NULL,
`createtime` varchar(255) DEFAULT NULL,
`updatetime` varchar(255) DEFAULT NULL,
`virus_trip_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2451 DEFAULT CHARSET=utf8;
-- Tencent News trajectory data (covid_txnew_track.csv)
CREATE TABLE `covid_txnew_track` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`confid` varchar(255) DEFAULT NULL,
`province` varchar(255) DEFAULT NULL,
`city` varchar(255) DEFAULT NULL,
`county` varchar(255) DEFAULT NULL,
`location` varchar(255) DEFAULT NULL,
`user_num` varchar(255) DEFAULT NULL,
`user_name` varchar(255) DEFAULT NULL,
`other_info` varchar(255) DEFAULT NULL,
`track` varchar(255) DEFAULT NULL,
`target` varchar(255) DEFAULT NULL,
`pub_time` varchar(255) DEFAULT NULL,
`source` varchar(255) DEFAULT NULL,
`source_url` varchar(255) DEFAULT NULL,
`contact` varchar(255) DEFAULT NULL,
`contact_detail` varchar(255) DEFAULT NULL,
`hashtag` varchar(255) DEFAULT NULL,
`lasttime` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=21 DEFAULT CHARSET=utf8;
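Several `covid_news` time columns (`createTime`, `modifyTime`, `pubDate`, `dataInfoTime`) are stored as `bigint(20)`. The schema does not say what unit they use; the sketch below assumes Unix epoch milliseconds, which you should verify against `pubDateStr` before relying on it. `ms_to_datetime` is a hypothetical helper, and the sample value is illustrative, not taken from the data.

```python
from datetime import datetime, timedelta, timezone

# Assumption: bigint time columns hold Unix epoch milliseconds.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
BEIJING = timezone(timedelta(hours=8), "UTC+08:00")


def ms_to_datetime(ms: int, tz: timezone = BEIJING) -> datetime:
    """Convert epoch milliseconds to a timezone-aware datetime in `tz`.

    Integer timedelta arithmetic keeps the milliseconds exact, avoiding the
    float rounding that datetime.fromtimestamp(ms / 1000) can introduce.
    """
    return (EPOCH + timedelta(milliseconds=ms)).astimezone(tz)


if __name__ == "__main__":
    print(ms_to_datetime(1580000000123).isoformat())
    # prints 2020-01-26T08:53:20.123000+08:00
```

If the converted dates do not line up with `pubDateStr`, the columns are probably in seconds instead, and `timedelta(seconds=...)` would be the right conversion.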
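Donations
This project does not need any donations. Medical resources are in short supply all over the country. If you would like to donate, please go to a Red Cross society or another officially recognized donation platform; they can put the money to better use and help those in greater need. Best wishes to everyone.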
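Final statement
- This project is purely non-profit and for the public good. If it is used for commercial purposes in the future or gives rise to any copyright dispute, this project bears no responsibility;
- This project only retrieves and stores epidemic data from DXY and Nandu Media, which own the data. I cannot authorize any individual or organization to use these data in research or commercial projects; if you need to, please contact DXY and Nandu Media and obtain their permission;
- If you have any other questions, please leave a message;
- Thanks to my friend Yurong (玉容) for helping me collect materials from health commissions across the country (this part has not started yet and will begin as soon as possible).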
Note that the project description data, including the texts, logos, images, and/or trademarks,
for each open source project belongs to its rightful owner.