
jgehrcke / Covid 19 Germany Gae

License: MIT
COVID-19 statistics for Germany. For states and counties. With time series data. Daily updates. Official RKI numbers.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Covid 19 Germany Gae

WikiChron
Data visualization tool for wikis evolution
Stars: ✭ 19 (-83.33%)
Mutual labels:  time-series, history
laravel-quasar
⏰📊✨Laravel Time Series - Provides an API to create and maintain data projections (statistics, aggregates, etc.) from your Eloquent models, and convert them to time series.
Stars: ✭ 78 (-31.58%)
Mutual labels:  time-series, timeline
Timeline
Visually display various historical time periods and historical maps.
Stars: ✭ 127 (+11.4%)
Mutual labels:  timeline, history
Symbolic Execution
History of symbolic execution (as well as SAT/SMT solving, fuzzing, and taint data tracking)
Stars: ✭ 395 (+246.49%)
Mutual labels:  timeline, history
Mdline
Markdown timeline format and toolkit.
Stars: ✭ 111 (-2.63%)
Mutual labels:  timeline, history
TTTTRPG
Timeline Tree of Tabletop Role-Playing Games, celebrating more than 40 years of game design innovation
Stars: ✭ 34 (-70.18%)
Mutual labels:  timeline, history
Covid 19 Timeline
A timeline of the progression of the COVID-19 pandemic since late 2019, compiled systematically in the normative style of a sociological yearbook.
Stars: ✭ 1,887 (+1555.26%)
Mutual labels:  timeline, history
timeline-component-lwc
This component enables timeline view for Salesforce Record history.
Stars: ✭ 18 (-84.21%)
Mutual labels:  timeline, history
bitcoin-development-history
Data and an example for an open-source timeline of the history of Bitcoin development
Stars: ✭ 27 (-76.32%)
Mutual labels:  timeline, history
Grand Timeline
Interactive grand unified timeline of 30,800 ancient Chinese people / 古人全表
Stars: ✭ 83 (-27.19%)
Mutual labels:  timeline, history
Griddb
GridDB is a next-generation open source database that makes time series, IoT, and big data fast and easy.
Stars: ✭ 1,587 (+1292.11%)
Mutual labels:  time-series
Carbon
Carbon is one of the components of Graphite, and is responsible for receiving metrics over the network and writing them down to disk using a storage backend.
Stars: ✭ 1,435 (+1158.77%)
Mutual labels:  time-series
Tsmoothie
A Python library for time-series smoothing and outlier detection in a vectorized way.
Stars: ✭ 109 (-4.39%)
Mutual labels:  time-series
Cutbox
CutBox makes your macOS pasteboard awesome.
Stars: ✭ 112 (-1.75%)
Mutual labels:  history
Time Series Forecasting With Python
A use-case focused tutorial for time series forecasting with python
Stars: ✭ 105 (-7.89%)
Mutual labels:  time-series
Pyrate
A Python tool for estimating velocity and time-series from Interferometric Synthetic Aperture Radar (InSAR) data.
Stars: ✭ 110 (-3.51%)
Mutual labels:  time-series
Hyperapp Fx
Effects for use with Hyperapp
Stars: ✭ 105 (-7.89%)
Mutual labels:  history
Dmm
Deep Markov Models
Stars: ✭ 103 (-9.65%)
Mutual labels:  time-series
Nnet Ts
Neural network architecture for time series forecasting.
Stars: ✭ 103 (-9.65%)
Mutual labels:  time-series
Fast
End-to-end earthquake detection pipeline via efficient time series similarity search
Stars: ✭ 114 (+0%)
Mutual labels:  time-series

COVID-19 case numbers for Germany 😷

🇩🇪 Übersicht

(see below for an English version)

  • COVID-19 case numbers for Bundesländer and Landkreise (states and counties).
  • Automatically updated multiple times per day.
  • With time series (including 7-day incidence time series).
  • Current population figures and GeoJSON data, with transparent sources.
  • Precise, machine-readable CSV files. Timestamps in ISO 8601 notation; column names use ISO 3166 country codes, among others.
  • Two different perspectives on the historical evolution (see the English overview below).

🇺🇸 Overview

  • Historical (time series) data for individual Bundesländer and Landkreise (states and counties).
  • Automatic updates, multiple times per day.
  • 7-day incidence time series (so that you don't need to compute those).
  • Population data and GeoJSON data, with transparent references and code for reproduction.
  • Provided through machine-readable (CSV) files: timestamps are encoded using ISO 8601 time string notation. Column names use the ISO 3166 notation for individual states.
  • Two perspectives on the historical evolution:
    • Official RKI time series data, based on an ArcGIS HTTP API (docs) provided by the Esri COVID-19 GeoHub Deutschland. These time series are being re-written as data gets better over time (accounting for delay in reporting, etc.), and provide a credible, curated view into the past weeks and months.
    • Time series data provided by the crowdsourcing effort coordinated by Risklayer GmbH (the foundation for what various German newspapers and TV channels, such as ZDF, show on a daily basis, and also for what the JHU publishes about Germany).
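The 7-day incidence provided in these time series can, in principle, be derived from cumulative case counts and a population figure: new cases over the trailing seven days per 100,000 inhabitants. A minimal sketch of that calculation (all numbers below are made up for illustration):

```python
# Sketch: derive a 7-day incidence time series from cumulative case counts.
# The numbers and column name below are illustrative, not taken from the
# repository's actual CSV files.
import pandas as pd

population = 100_000  # hypothetical county population

cumulative = pd.Series(
    [10, 12, 15, 20, 26, 30, 33, 40, 52, 60],
    index=pd.date_range("2020-11-01", periods=10, freq="D"),
    name="cases",
)

# New cases per day, then the rolling 7-day sum of those.
new_cases = cumulative.diff().fillna(cumulative.iloc[0])
seven_day_sum = new_cases.rolling(window=7).sum()

# Incidence: new cases in the trailing 7 days per 100,000 inhabitants.
incidence = seven_day_sum / population * 100_000

print(incidence.dropna())
```

Note that the first six samples have no defined incidence (the trailing window is incomplete), which is why the repository ships precomputed incidence time series rather than leaving this step to every consumer.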

Contact, questions, contributions

You probably have a number of questions. Just as I had (and still have). Your feedback, your contributions, and your questions are highly appreciated! Please use the GitHub issue tracker (preferred) or contact me via mail. For updates, you can also follow me on Twitter: @gehrcke.

Plots

Note that these plots are updated multiple times per day. Feel free to hotlink them.

Note: there is a systematic difference between the RKI data-based death rate curve and the Risklayer-based death rate curve. Both curves are wrong, and yet both are legit. The incidents of death that we learn about today may have happened days or weeks in the past; neither curve attempts to show the exact time of death (sadly! :-)). The RKI curve, in fact, is based on the point in time when the corresponding COVID-19 case that led to death was registered in the first place ("Meldedatum" of the corresponding case). The Risklayer data set, to my knowledge, pretends that the incidents of death we learn about today happened yesterday. While this is not true, the resulting curve is a little more intuitive. Despite its limitations, the Risklayer data set is the best view we have on the "current" evolution of deaths.

The individual data files

  • RKI data (most credible view into the past): time series data provided by the Robert Koch-Institut (updated daily):
    • cases-rki-by-ags.csv and deaths-rki-by-ags.csv: per-Landkreis time series
    • cases-rki-by-state.csv and deaths-rki-by-state.csv: per-Bundesland time series
    • 7-day incidence time series resolved by county based on RKI data can be found in more-data/.
    • This is the only data source that rigorously accounts for Meldeverzug (reporting delay). The historical evolution of data points in these files is updated daily based on a (less accessible) RKI ArcGIS system. These time series see amendments weeks and months into the past as data gets better over time. This data source has its strength in the past, but it often does not yet reflect the latest from today and yesterday.
  • Crowdsourcing data (fresh view into the last 1-2 days): the crowdsourcing effort coordinated by Risklayer GmbH (see "Attributions" below):
  • ags.json:
    • for translating the "amtlicher Gemeindeschlüssel" (AGS) into Landkreis/Bundesland details, including latitude and longitude.
    • containing per-county population data (see pull/383 for details).
  • JSON endpoint /now: Germany's total case count (updated in real time, always fresh, for the sensationalists) -- Update Feb 2021: the HTTP API was disabled.
  • data.csv: history, mixed data source based on RKI/ZEIT ONLINE. This powered the per-Bundesland time series exposed by the HTTP JSON API until Jan 2021.
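To illustrate how ags.json can be combined with per-county case counts, here is a sketch; the JSON structure and all numbers below are simplified stand-ins, and the real file's field names may differ:

```python
# Sketch: resolve AGS codes to county names and compute per-capita numbers.
# The structure of ags.json is simplified here; the exact field names in
# the real file may differ (this stand-in is an assumption for illustration).
import json

ags_json = json.loads("""
{
  "09162": {"name": "München", "population": 1488202},
  "11000": {"name": "Berlin", "population": 3669491}
}
""")

latest_cases = {"09162": 51000, "11000": 130000}  # made-up case counts

for ags, count in latest_cases.items():
    meta = ags_json[ags]
    per_100k = count / meta["population"] * 100_000
    print(f"{meta['name']} ({ags}): {count} cases, {per_100k:.0f} per 100k")
```

Keying everything on the AGS avoids ambiguity from county name spellings, which is why the CSV files identify counties by AGS in the first place.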

How is this data set different from others?

  • It includes historical data for individual Bundesländer and Landkreise (states and counties).
  • Its time series data is being re-written as data gets better over time. This is based on official RKI-provided time series data, which receives daily updates even for days and weeks in the past (accounting for delay in reporting).

CSV file details

Focus: predictable, robust machine readability and backwards compatibility (columns may be added, but none have been removed so far).

  • The column names use the ISO 3166 code for individual states.
  • The points in time are encoded using localized ISO 8601 time string notation.

Note that the numbers for "today" as presented in the media often actually refer to the last known state of data on the evening before. To address this ambiguity, the sample timestamps in the CSV files in this repository contain the time of day (and not just the day). With that, consumers get at least a vague impression of whether a sample represents the state in the morning or in the evening -- a common ambiguity in other data sets.
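Parsing such a timestamp as a timezone-aware datetime makes the morning/evening distinction explicit. A sketch with a made-up sample value:

```python
# Sketch: parse a timezone-aware ISO 8601 timestamp as found in the CSV
# files' time column (the sample value below is made up).
from datetime import datetime

sample = "2020-11-04T19:00:00+01:00"  # hypothetical sample timestamp
ts = datetime.fromisoformat(sample)

print(ts.isoformat())
# Distinguish a morning snapshot from an evening snapshot:
snapshot = "morning" if ts.hour < 12 else "evening"
print(snapshot)
```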

The recovered metric is not presented because it is rather blurry. Feel free to consume it from other sources!

Quality data sources published by Bundesländer

I tried to discover these step by step; they are possibly underrated (April 2020, with minor updates towards the end of 2020):

Further resources

HTTP API details

Update Feb 2021: I disabled the HTTP API. It's best to directly use the data files from this repository.
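One way to consume the files directly is via GitHub's raw-file URLs. The repository path and branch name in this sketch are assumptions; adjust them to wherever you actually fetch the data from:

```python
# Sketch: build the raw-file URL for a data file in this repository.
# Repository path and branch name are assumptions; adjust as needed.
REPO = "jgehrcke/covid-19-germany-gae"
BRANCH = "master"

def raw_url(filename: str) -> str:
    """Return the raw.githubusercontent.com URL for a file in the repo."""
    return f"https://raw.githubusercontent.com/{REPO}/{BRANCH}/{filename}"

url = raw_url("cases-rki-by-state.csv")
print(url)
# Fetching it could then look like this (requires network access):
# import pandas as pd
# df = pd.read_csv(url, index_col=0, parse_dates=True)
```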

What you should know before reading these numbers

Please question the conclusiveness of these numbers. Some directions along which you may want to think:

  • Germany seems to perform a large number of tests. But think about how much insight you actually have into how the testing rate (and its spatial distribution) evolves over time. In my opinion, one absolutely should know a whole lot about the testing effort itself before drawing conclusions from the time evolution of case count numbers.
  • Each confirmed case is implicitly associated with a reporting date. We do not know for sure how that reporting date relates to the date of taking the sample.
  • We believe that each "confirmed case" actually corresponds to a polymerase chain reaction (PCR) test for the SARS-CoV-2 virus with a positive outcome. Well, I think that's true; we can have that much trust in the system.
  • We seem to believe that the change of the number of confirmed COVID-19 cases over time is somewhat expressive: but what does it shed light on, exactly? The amount of testing performed, and its spatial coverage? The efficiency with which the virus spreads through the population ("basic reproduction number")? The actual, absolute number of people infected? The virus' potential to exhibit COVID-19 in an infected human body?

If you keep these (and more) ambiguities and questions in mind then I think you are ready to look at these numbers and their time evolution :-) 😷.

Thoughts about reporting delays

In Germany, every step along the chain of reporting (Meldekette) introduces a noticeable delay. This is not necessary, but sadly the current state of affairs. The Robert Koch-Institut (RKI) seems to be working on a more modern reporting system that might mitigate some of these delays along the Meldekette in the future. Until then, it is fair to assume that case numbers published by RKI have 1-2 days delay over the case numbers published by Landkreise, which themselves have an unknown lag relative to the physical tests. In some cases, the Meldekette might even be entirely disrupted, as discussed in this SPIEGEL article (German). Also see this discussion.

Wishlist: every case should be tracked with its own time line, and transparently change state over time. The individual cases (and their time lines) should be aggregated on a country-wide level, anonymously, and get published in almost real time, through an official, structured data source, free to consume for everyone.

Attributions

Beginning of March 2020: shout-out to ZEIT ONLINE for continuously collecting and publishing the state-level data with little delay.

Edit March 21, 2020: Notably, by now the Berliner Morgenpost seems to do an equally good job of quickly aggregating the state-level data. We are using that in here, too. Thanks!

Edit March 26, 2020: Risklayer is coordinating a crowdsourcing effort to process verified Landkreis data as quickly as possible. Tagesspiegel is verifying this effort and using it in their overview page. As far as I can tell, this is so far the most transparent data flow, and also the fastest, getting us the freshest case count numbers. Great work!

Edit December 13, 2020: for the *-rl-crowdsource*.csv files proper legal attribution goes to

Risklayer GmbH (www.risklayer.com) and Center for Disaster Management and Risk Reduction Technology (CEDIM) at Karlsruhe Institute of Technology (KIT) and the Risklayer-CEDIM-Tagesspiegel SARS-CoV-2 Crowdsourcing Contributors
