datasets / Covid 19

Novel Coronavirus 2019 time series data on cases

COVID-19 dataset

Coronavirus disease 2019 (COVID-19) time series listing confirmed cases, reported deaths and reported recoveries. Data is disaggregated by country (and sometimes subregion). COVID-19 is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and has had a worldwide effect. On March 11, 2020, the World Health Organization (WHO) declared it a pandemic, pointing to the more than 118,000 cases of the illness in over 110 countries and territories around the world at the time.

This dataset includes time series data tracking the number of people affected by COVID-19 worldwide, including:

  • confirmed tested cases of Coronavirus infection
  • the number of people who have reportedly died while sick with Coronavirus
  • the number of people who have reportedly recovered from it

Data

Data is in CSV format and updated daily. It is sourced from the upstream repository maintained by the amazing team at Johns Hopkins University Center for Systems Science and Engineering (CSSE), who have been doing a great public service from an early point by collating data from around the world.

We have cleaned and normalized that data, for example by tidying dates and consolidating several files into normalized time series. We have also added some metadata, such as column descriptions, and packaged the data as a Data Package.
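As an illustration of the kind of normalization described above, here is a minimal sketch. It assumes the upstream files use a wide layout with one column per date in M/D/YY form, which is an assumption and not something this README states. The sample data, column names and function names are hypothetical. The sketch uses only the standard library so it runs anywhere; the repository's own scripts use Pandas and may work differently.

```python
import csv
import io
from datetime import datetime

# Hypothetical wide-format samples: one column per date (M/D/YY).
CONFIRMED_WIDE = """Province/State,Country/Region,1/22/20,1/23/20
,Exampleland,1,3
North,Samplestan,0,2
"""
DEATHS_WIDE = """Province/State,Country/Region,1/22/20,1/23/20
,Exampleland,0,1
North,Samplestan,0,0
"""

# Assumed identifier columns; everything else is treated as a date column.
ID_COLUMNS = ("Province/State", "Country/Region")


def wide_to_long(text):
    """Turn one wide CSV into {(province, country, iso_date): value}."""
    out = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = tuple(row[column] for column in ID_COLUMNS)
        for column, value in row.items():
            if column in ID_COLUMNS:
                continue
            # Tidy the date: "1/22/20" becomes "2020-01-22".
            iso_date = datetime.strptime(column, "%m/%d/%y").date().isoformat()
            out[key + (iso_date,)] = int(value)
    return out


def consolidate(wide_files):
    """Merge several wide files into one list of long-format records."""
    series = {name: wide_to_long(text) for name, text in wide_files.items()}
    keys = sorted(set().union(*series.values()))
    records = []
    for province, country, date in keys:
        record = {"Date": date, "Country": country, "Province": province}
        for name, values in series.items():
            # A measure missing for this key is left as None.
            record[name] = values.get((province, country, date))
        records.append(record)
    return records


records = consolidate({"Confirmed": CONFIRMED_WIDE, "Deaths": DEATHS_WIDE})
for record in records:
    print(record)
```

Each output record carries one date, one location and all measures, which is the shape a normalized time series usually takes.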

You can view the data and its structure, and download it in alternative formats (e.g. JSON), from the DataHub:

https://datahub.io/core/covid-19
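Once a CSV file has been downloaded, a typical first step is to roll subregions up to country totals and, if needed, emit JSON. The following is a rough sketch only: the column names (Date, Country, Province, Confirmed) and the sample rows are hypothetical, so check the actual file's header before adapting it.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical long-format sample; the real column names may differ.
SAMPLE_CSV = """Date,Country,Province,Confirmed
2020-01-22,Samplestan,North,2
2020-01-22,Samplestan,South,3
2020-01-22,Exampleland,,1
2020-01-23,Samplestan,North,4
"""


def country_totals(text):
    """Sum confirmed cases over subregions for each (date, country) pair."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(text)):
        totals[(row["Date"], row["Country"])] += int(row["Confirmed"])
    return [
        {"Date": date, "Country": country, "Confirmed": count}
        for (date, country), count in sorted(totals.items())
    ]


totals = country_totals(SAMPLE_CSV)
# Serialize the aggregated rows as JSON records.
print(json.dumps(totals, indent=2))
```

To use it on a real file, replace `io.StringIO(text)` with an open file handle and adjust the column names to match the header.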

Sources

The upstream dataset currently lists the following upstream data sources:

We will endeavour to provide more detail on how regularly and by which technical means the data is updated. Additional background is available in the CSSE blog, and in the Lancet paper (DOI), which includes this figure:

[Figure: countries timeline]

Preparation

This repository uses Pandas to process and normalize the data.

You first need to install the dependencies:

pip install -r scripts/requirements.txt

Then run the following scripts:

python scripts/process_worldwide.py
python scripts/process_us.py

The automated workflow is defined in .github/workflows/actions.yml and uses Python 3.8.

License

This dataset is licensed under the Open Data Commons Public Domain Dedication and License (PDDL).

The data comes from a variety of public sources and was collated in the first instance by Johns Hopkins University on GitHub. We have used that data and processed it further. Given the public sources and factual nature of the data, we believe it is in the public domain, and we are therefore releasing the results under the Public Domain Dedication and License. We are also, of course, explicitly licensing any contribution of ours under that license.
