
mongodb-developer / open-data-covid-19

Licence: Apache-2.0 license
Open Data Repository for the Covid-19 dataset.

Programming Languages

Python, Java, Go, Shell, JavaScript, Dockerfile, Makefile

Projects that are alternatives of or similar to open-data-covid-19

COVID-19-DETECTION
Detect Covid-19 with Chest X-Ray Data
Stars: ✭ 43 (+126.32%)
Mutual labels:  covid-19, covid-virus, covid, covid19, covid-data, covid19-data
us-covid19
Data repository of State's Health Department stats for COVID19 in the United States
Stars: ✭ 37 (+94.74%)
Mutual labels:  covid-19, covid-virus, covid, covid19, covid-data, covid19-data
coviddata
Daily COVID-19 statistics by country, region, and city
Stars: ✭ 49 (+157.89%)
Mutual labels:  covid-19, covid, covid19, covid-data, covid19-data
covid19-api
Covid19 Data API (JSON) - LIVE
Stars: ✭ 20 (+5.26%)
Mutual labels:  open-data, covid-19, covid, covid-data, covid19-data
covid-dashboard
Help welcomed if you have expertise in public health web technology, data modeling and munging, or visualization.
Stars: ✭ 106 (+457.89%)
Mutual labels:  covid-19, covid, covid19, covid-data, covid19-data
PhoNER COVID19
COVID-19 Named Entity Recognition for Vietnamese (NAACL 2021)
Stars: ✭ 55 (+189.47%)
Mutual labels:  covid-19, covid, covid19, covid19-data
COVID19
A web app to display the live graphical state-wise reported corona cases in India so far. It also shows the latest news for COVID-19. Stay Home, Stay Safe!
Stars: ✭ 122 (+542.11%)
Mutual labels:  covid-19, covid19, covid-data, covid19-data
CoronaVirusDatabase
A repository for analyzing references and database of "gisanddata.maps.arcgis.com" website for Corona Virus.
Stars: ✭ 38 (+100%)
Mutual labels:  covid-19, covid-virus, covid19, covid-data
covid-19
An app made with Flutter to track COVID-19 case counts.
Stars: ✭ 47 (+147.37%)
Mutual labels:  covid-19, covid-virus, covid19, covid19-data
covid19-timeseries
Covid19 timeseries data store
Stars: ✭ 38 (+100%)
Mutual labels:  covid-19, covid, covid19, covid19-data
covid19-mx-time-series
Time series data of the COVID-19 epidemic in Mexico
Stars: ✭ 36 (+89.47%)
Mutual labels:  covid-19, covid19, covid-data, covid19-data
Covid19arData
Up-to-date COVID-19 data for Argentina, in open formats.
Stars: ✭ 51 (+168.42%)
Mutual labels:  open-data, covid-19, covid, covid-data
covid-19-image-repository
Anonymized dataset of COVID-19 cases with a focus on radiological imaging. This includes images (x-ray / ct) with extensive metadata, such as admission-, ICU-, laboratory-, and patient master-data.
Stars: ✭ 42 (+121.05%)
Mutual labels:  covid-19, covid19, covid-data, covid19-data
CoronaVirusOutbreakAPI
A tiny PHP program to crawl and analyze the COVID-19 outbreak worldwide and in every country.
Stars: ✭ 20 (+5.26%)
Mutual labels:  covid-19, covid-virus, covid, covid19
covid19-visualized
COVID-19 world updates with data visualization (includes Indonesian cases)
Stars: ✭ 23 (+21.05%)
Mutual labels:  covid-19, covid-virus, covid19, covid19-data
COVID19Py
A tiny Python package for easy access to up-to-date Coronavirus (COVID-19, SARS-CoV-2) cases data.
Stars: ✭ 86 (+352.63%)
Mutual labels:  covid-19, covid19, covid19-data
covid19gr
Open Data Aggregation & Knowledge Base Repository for the evolution of the SARS-COV-2 pandemic in Greece.
Stars: ✭ 21 (+10.53%)
Mutual labels:  covid-19, covid19, covid19-data
data2019nCoV
COVID-19 Pandemic Data R Package
Stars: ✭ 40 (+110.53%)
Mutual labels:  covid-19, covid, covid-data
covid-19-vis
This repository contains data visualizations based on RKI and DIVI using kepler.gl
Stars: ✭ 25 (+31.58%)
Mutual labels:  covid-19, covid19, covid-data
COVID-19-AI
Collection of AI resources to fight against Coronavirus (COVID-19)
Stars: ✭ 25 (+31.58%)
Mutual labels:  covid-19, covid19, covid19-data

MongoDB Open Data COVID-19

This project retrieves the Johns Hopkins University (JHU) COVID-19 dataset published on GitHub and inserts it into MongoDB.

Please read the blog post associated with this repository.

Databases and Collections

Database covid19

This database contains a few collections that have been carefully engineered to be as useful and as convenient as possible to work with.

In each of these collections:

  • the keys have been renamed to be consistent across the different collections,
  • the fields have been cast to their correct types,
  • the loc fields contain GeoJSON points,
  • the dates are ISODates,
  • etc.
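As a rough illustration of these transformations, the Python sketch below cleans one row of the UID_ISO_FIPS_LookUp_Table.csv file and attaches a time series date. The raw column names (UID, iso2, Province_State, Long_ and so on) and the KEY_MAP table are assumptions based on the JHU lookup table and on the example documents below. The repository's own import scripts may do this differently.

```python
from datetime import datetime, timezone

# Hypothetical mapping from raw JHU lookup-table headers to the cleaned keys
# used in the example documents (an assumption, not the project's own code).
KEY_MAP = {
    "UID": "uid",
    "iso2": "country_iso2",
    "iso3": "country_iso3",
    "code3": "country_code",
    "FIPS": "fips",
    "Admin2": "county",
    "Province_State": "state",
    "Country_Region": "country",
    "Combined_Key": "combined_name",
    "Population": "population",
}
# Cleaned fields stored as integers instead of strings.
INT_FIELDS = {"uid", "country_code", "fips", "population"}


def clean_row(raw, date_header):
    """Rename keys, cast types, build a GeoJSON point and parse the date."""
    doc = {}
    for raw_key, key in KEY_MAP.items():
        value = raw.get(raw_key, "")
        if value == "":
            continue  # skip empty CSV cells rather than storing ""
        # float() first, in case a number was exported with a decimal point.
        doc[key] = int(float(value)) if key in INT_FIELDS else value
    if raw.get("Lat") and raw.get("Long_"):
        # GeoJSON points are [longitude, latitude], in that order.
        doc["loc"] = {"type": "Point",
                      "coordinates": [float(raw["Long_"]), float(raw["Lat"])]}
    # Time series headers look like "1/22/20" (month/day/two-digit year).
    doc["date"] = datetime.strptime(date_header, "%m/%d/%y").replace(tzinfo=timezone.utc)
    return doc
```

Keeping the renaming in one table makes it easy to apply the same key names to every collection.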

5 collections are available:

  • metadata
  • global (the data from the time series global files)
  • us_only (the data from the time series US files)
  • global_and_us (the most complete one)
  • countries_summary (same as global but countries are grouped in a single doc for each date)

Collection metadata

This collection contains a single document. It lists all the distinct values (obtained with the MongoDB distinct function) of the major fields, along with the first and last dates.

{
  _id : "metadata",
  countries : [ "Afghanistan", "Albania", "Algeria", "..." ],
  states : [ "Alabama", "Alaska", "Alberta", "..." ],
  states_us : [ "Alabama", "Alaska", "American Samoa", "..." ],
  counties : [ "Abbeville", "Acadia", "Accomack", "..." ],
  iso3s : [ "ABW", "AFG", "AGO", "..." ],
  uids : [ 4, 8, 12, ... ],
  first_date : 2020-01-22T00:00:00.000+00:00,
  last_date : 2020-04-24T00:00:00.000+00:00
}
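Here is a pure-Python sketch of how such a metadata document could be computed from a list of cleaned documents. It emulates distinct() with a sorted set that skips documents missing the field. Restricting states_us to documents whose country is "US" is an assumption based on the example values above.

```python
def build_metadata(docs):
    """Emulate distinct() on the major fields, plus min/max of the date."""

    def distinct(field, keep=lambda d: True):
        # Like distinct(): documents without the field are ignored.
        # Sorted so that the output is stable.
        return sorted({d[field] for d in docs if field in d and keep(d)})

    dates = [d["date"] for d in docs]
    return {
        "_id": "metadata",
        "countries": distinct("country"),
        "states": distinct("state"),
        # Assumption: states_us only covers US documents.
        "states_us": distinct("state", lambda d: d.get("country") == "US"),
        "counties": distinct("county"),
        "iso3s": distinct("country_iso3"),
        "uids": distinct("uid"),
        "first_date": min(dates),
        "last_date": max(dates),
    }
```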

Collection global

This collection contains the equivalent of what is in these 3 files:

  • time_series_covid19_confirmed_global.csv
  • time_series_covid19_deaths_global.csv
  • time_series_covid19_recovered_global.csv

Each document is also joined with its associated line from the UID_ISO_FIPS_LookUp_Table.csv file.

This collection contains one document per entry and per day, i.e. nb_entries(time_series_covid19_confirmed_global.csv) * number_days documents.
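This count follows from how the data is reshaped: each wide CSV row, which has one column per day, becomes one document per day. The sketch below is a hypothetical, stdlib-only version of that reshaping. It merges the confirmed, deaths and recovered files on the (Province/State, Country/Region) pair. The column names are assumed from the JHU global time series headers. For brevity, the dates are left as raw header strings.

```python
import csv
import io

# Identifier columns assumed from the JHU global time series files.
ID_COLUMNS = ("Province/State", "Country/Region", "Lat", "Long")


def melt_time_series(files):
    """Turn wide time series CSVs into one document per (entry, day).

    `files` maps a metric name ("confirmed", "deaths", "recovered") to the
    CSV text of the matching file.
    """
    docs = {}
    for metric, text in files.items():
        for row in csv.DictReader(io.StringIO(text)):
            key = (row["Province/State"], row["Country/Region"])
            for column, value in row.items():
                if column in ID_COLUMNS:
                    continue  # every other column is a date such as "1/22/20"
                doc = docs.setdefault(
                    (key, column),
                    {"state": key[0], "country": key[1], "date": column},
                )
                doc[metric] = int(value)
    return list(docs.values())
```

With 2 entries and 2 date columns, this produces 2 * 2 = 4 documents. That is the nb_entries * number_days rule above.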

Here is an example document:

{
	"_id" : ObjectId("5ea45f2a8049cddb8cfa3822"),
	"uid" : 4,
	"country_iso2" : "AF",
	"country_iso3" : "AFG",
	"country_code" : 4,
	"country" : "Afghanistan",
	"combined_name" : "Afghanistan",
	"population" : 38928341,
	"loc" : {
		"type" : "Point",
		"coordinates" : [
			67.71,
			33.9391
		]
	},
	"date" : ISODate("2020-01-22T00:00:00Z"),
	"confirmed" : 0,
	"deaths" : 0,
	"recovered" : 0
}

Collection us_only

This collection contains the equivalent of what is in these 2 files:

  • time_series_covid19_confirmed_US.csv
  • time_series_covid19_deaths_US.csv

Each document is also joined with its associated line from the UID_ISO_FIPS_LookUp_Table.csv file.

In this collection we have nb_entries(time_series_covid19_confirmed_US.csv) * number_days documents.

Here is an example document:

{
	"_id" : ObjectId("5ea45f2b8049cddb8cfa9912"),
	"uid" : 16,
	"country_iso2" : "AS",
	"country_iso3" : "ASM",
	"country_code" : 16,
	"fips" : 60,
	"state" : "American Samoa",
	"country" : "US",
	"combined_name" : "American Samoa, US",
	"population" : 55641,
	"loc" : {
		"type" : "Point",
		"coordinates" : [
			-170.132,
			-14.271
		]
	},
	"date" : ISODate("2020-01-22T00:00:00Z"),
	"confirmed" : 0,
	"deaths" : 0
}

Note: JHU does not provide recovered data for the US files. It's currently only available in the global files.

Collection countries_summary

This collection is calculated from the data in the global collection. It contains the same data, but all the entries of a country are grouped into a single document for each date.

So in this collection, you will find nb_countries * nb_days documents.

Also, because all the states of a country are grouped into a single document per date, some fields are arrays in this collection.

You can see the detailed aggregation pipeline in the 2-smart-insert.py file in the data-import folder.

Here is an example for France:

{
	"_id" : ObjectId("5eb1e2bdeb5fc5a3a38a33f8"),
	"uids" : [ 312, 175, 638, 666, 258, 250, 663, 254, 652, 540, 474 ],
	"confirmed" : 169583,
	"deaths" : 25204,
	"country" : "France",
	"date" : ISODate("2020-05-04T00:00:00Z"),
	"country_iso2s" : [ "GF", "MF", "PF", "YT", "GP", "RE", "PM", "MQ", "FR", "BL", "NC" ],
	"country_iso3s" : [ "SPM", "NCL", "REU", "BLM", "MAF", "MYT", "MTQ", "PYF", "GLP", "FRA", "GUF" ],
	"country_codes" : [ 652, 474, 666, 175, 250, 258, 254, 663, 312, 638, 540 ],
	"combined_names" : [
		"Saint Pierre and Miquelon, France",
		"Guadeloupe, France",
		"Reunion, France",
		"New Caledonia, France",
		"Saint Barthelemy, France",
		"France",
		"Martinique, France",
		"Mayotte, France",
		"French Polynesia, France",
		"French Guiana, France",
		"St Martin, France"
	],
	"population" : 298682,
	"recovered" : 51476,
	"states" : [
		"French Guiana",
		"French Polynesia",
		"Guadeloupe",
		"Mayotte",
		"New Caledonia",
		"Reunion",
		"Saint Barthelemy",
		"St Martin",
		"Martinique",
		"Saint Pierre and Miquelon"
	]
}
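The real aggregation pipeline lives in 2-smart-insert.py. The sketch below is a hedged, pure-Python approximation of the same idea. It groups the global documents by (country, date), sums the counts the way $sum would, and collects distinct values the way $addToSet would. Field names follow the France example. The exact stages of the real pipeline may differ, and population is left out here.

```python
from collections import defaultdict

# (per-region field, grouped array field) pairs, as in the France example.
ARRAY_FIELDS = (
    ("uid", "uids"),
    ("country_iso2", "country_iso2s"),
    ("country_iso3", "country_iso3s"),
    ("country_code", "country_codes"),
    ("combined_name", "combined_names"),
    ("state", "states"),
)
METRICS = ("confirmed", "deaths", "recovered")


def summarize_countries(global_docs):
    """Group per-region documents into one document per (country, date)."""
    groups = defaultdict(lambda: {**{dst: [] for _, dst in ARRAY_FIELDS},
                                  **{m: 0 for m in METRICS}})
    for doc in global_docs:
        group = groups[(doc["country"], doc["date"])]
        for src, dst in ARRAY_FIELDS:
            # Like $addToSet: skip missing fields and duplicate values.
            if src in doc and doc[src] not in group[dst]:
                group[dst].append(doc[src])
        for metric in METRICS:
            # Like $sum: a missing field contributes nothing.
            group[metric] += doc.get(metric, 0)
    return [{"country": c, "date": d, **g} for (c, d), g in groups.items()]
```

Because the arrays are built independently, their positions are not guaranteed to line up. In the France example above, uids and country_codes list the same territories in different orders, so don't rely on index alignment across these arrays.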

Collection global_and_us

This is the most complete collection in this database. It contains all the documents from these collections:

  • global
  • us_only

But with a little trick on top: the US cases are counted in both collections:

  • at a country level in the first one,
  • and in a more detailed level (county and state) in the second one.

To take advantage of both collections, I removed the confirmed and deaths counts from the US documents that come from the global collection.

This allows me to keep track of the recovered cases in the US while also tracking confirmed cases and deaths at a more detailed level. This is the best we can do here, because JHU doesn't report recovered cases at a detailed level for the US.

With this trick, the totals of confirmed cases, deaths and recovered persons for a given date are correct. US confirmed cases and deaths are counted only once, through the county-level documents.
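Here is a minimal sketch of the trick, using plain Python dicts in place of MongoDB documents. The country-level US documents from global lose their confirmed and deaths fields, and the us_only documents are appended. Per-date totals then count each US case once. Identifying the country-level US documents by country == "US" within global is an assumption; the example further down shows uid 840.

```python
METRICS = ("confirmed", "deaths", "recovered")


def build_global_and_us(global_docs, us_only_docs):
    """Merge both collections without counting US cases twice."""
    merged = []
    for doc in global_docs:
        doc = dict(doc)  # copy so the input documents are not mutated
        if doc.get("country") == "US":
            # Keep only 'recovered' on the country-level US documents.
            # Confirmed and deaths come from the county-level us_only documents.
            doc.pop("confirmed", None)
            doc.pop("deaths", None)
        merged.append(doc)
    merged.extend(dict(doc) for doc in us_only_docs)
    return merged


def totals(docs, date):
    """Per-date totals. A missing field contributes 0, like MongoDB's $sum."""
    day = [d for d in docs if d["date"] == date]
    return {m: sum(d.get(m, 0) for d in day) for m in METRICS}
```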

This is the collection I use to build the charts in my charts blog posts.

The documents in this collection are exactly the same as in the collections mentioned above.

  • For a document that comes from the global collection:
{
	"_id" : ObjectId("5ea49768865a48ecca6d5ccb"),
	"uid" : 250,
	"country_iso2" : "FR",
	"country_iso3" : "FRA",
	"country_code" : 250,
	"country" : "France",
	"combined_name" : "France",
	"population" : 65273512,
	"loc" : {
		"type" : "Point",
		"coordinates" : [
			2.2137,
			46.2276
		]
	},
	"date" : ISODate("2020-04-24T00:00:00Z"),
	"confirmed" : 158636,
	"deaths" : 22245,
	"recovered" : 43493
}
  • For a document that comes from the us_only collection:
{
	"_id" : ObjectId("5ea4976b865a48ecca70df4d"),
	"uid" : 84042101,
	"country_iso2" : "US",
	"country_iso3" : "USA",
	"country_code" : 840,
	"fips" : 42101,
	"county" : "Philadelphia",
	"state" : "Pennsylvania",
	"country" : "US",
	"combined_name" : "Philadelphia, Pennsylvania, US",
	"population" : 1584064,
	"loc" : {
		"type" : "Point",
		"coordinates" : [
			-75.1379,
			40.0034
		]
	},
	"date" : ISODate("2020-04-24T00:00:00Z"),
	"confirmed" : 11877,
	"deaths" : 449
}
  • For the special document that comes from the global collection and represents the entire US at country level (the one with the trick):
{
	"_id" : ObjectId("5ea49768865a48ecca6d84d1"),
	"uid" : 840,
	"country_iso2" : "US",
	"country_iso3" : "USA",
	"country_code" : 840,
	"country" : "US",
	"combined_name" : "US",
	"population" : 329466283,
	"loc" : {
		"type" : "Point",
		"coordinates" : [
			-100,
			40
		]
	},
	"date" : ISODate("2020-04-24T00:00:00Z"),
	"recovered" : 99079
}

Pay attention: the documents that come from the detailed US source don't have a recovered field, because JHU doesn't provide this data. Only the documents (one per date) that represent the US at country level contain the recovered field.

Also, JHU doesn't provide the recovered count for every country and state in the global files.

In this collection, we have (nb_entries(time_series_covid19_confirmed_global.csv) + nb_entries(time_series_covid19_confirmed_US.csv)) * number_days documents.
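The document counts stated for each collection of the covid19 database can be gathered into one small helper. This is handy as a sanity check after an import, for example by comparing against count_documents({}) in PyMongo. The entry and day counts are inputs you would read from the CSV files; no real figures are assumed here.

```python
def expected_counts(n_global, n_us, n_countries, n_days):
    """Expected document counts per collection, following the rules above.

    n_global: entries in time_series_covid19_confirmed_global.csv
    n_us: entries in time_series_covid19_confirmed_US.csv
    n_countries: distinct countries in the global file
    n_days: number of date columns
    """
    return {
        "metadata": 1,
        "global": n_global * n_days,
        "us_only": n_us * n_days,
        "countries_summary": n_countries * n_days,
        "global_and_us": (n_global + n_us) * n_days,
    }
```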

Database covid19jhu

This database contains the raw CSV files imported with the mongoimport tool.

This database is updated by the 1-mongoimport-everything.sh script in the data-import folder.

This script imports all the files matching these wildcard patterns:

  • jhu/csse_covid_19_data/csse_covid_19_daily_reports/*.csv
  • jhu/csse_covid_19_data/csse_covid_19_daily_reports_us/*.csv
  • jhu/csse_covid_19_data/csse_covid_19_time_series/*.csv
  • jhu/csse_covid_19_data/UID_ISO_FIPS_LookUp_Table.csv

And it creates these collections:

  • Errata
  • UID_ISO_FIPS_LookUp_Table
  • daily
  • daily_us
  • time_series_covid19_confirmed_US
  • time_series_covid19_confirmed_global
  • time_series_covid19_deaths_US
  • time_series_covid19_deaths_global
  • time_series_covid19_recovered_global
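The mapping from files to collections can be sketched as below. It is inferred from the wildcard patterns and collection names listed above and is hypothetical. The Errata collection is assumed to come from an Errata.csv file matched by the time series wildcard. The mongoimport flags shown (--db, --collection, --type csv, --headerline, --file) are standard ones, but they are not necessarily the flags used by 1-mongoimport-everything.sh. The command is only built here, not executed.

```python
from pathlib import PurePosixPath


def collection_for(path):
    """Guess which covid19jhu collection a raw JHU CSV file lands in."""
    p = PurePosixPath(path)
    folder = p.parent.name
    if folder == "csse_covid_19_daily_reports":
        return "daily"      # all daily reports share one collection
    if folder == "csse_covid_19_daily_reports_us":
        return "daily_us"
    # Time series files, Errata.csv (assumed) and the lookup table keep their name.
    return p.stem


def mongoimport_args(path):
    """Build a mongoimport command line for one CSV file (connection options omitted)."""
    return ["mongoimport", "--db", "covid19jhu",
            "--collection", collection_for(path),
            "--type", "csv", "--headerline", "--file", path]
```

Such a list could be passed to subprocess.run() once connection options are added.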

Note: these collections are not cleaned, and their schemas are not great to work with because they come from raw, flat CSV files, but at least they contain exactly what JHU delivers.
