dathere / covid19-time-series-utilities

License: CC-BY-SA-4.0
Several utilities to help wrangle COVID-19 data into a time-series format

Programming languages: shell, PLpgSQL, Dockerfile

COVID-19 - time-series utilities

This repo contains several utilities for wrangling COVID-19 data from the Johns Hopkins University (JHU) COVID-19 repository.

NOTE: The utilities currently do not work because of the new file formats. They will be updated shortly to work with the revised formats.

Requirements

Cloning

Because the COVID19 directory is a git submodule, cloning this repo takes an extra step:

  • after cloning, you must initialize the submodule. In the project's top-level directory, run git submodule init and then git submodule update to clone the JHU repo as a submodule
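
As a minimal sketch, the clone and submodule steps can be wrapped in two small shell functions. The repository URL below is an assumption inferred from the project name (dathere/covid19-time-series-utilities), and the function names are hypothetical; adjust them if your copy lives elsewhere.

```shell
#!/usr/bin/env bash
# Assumption: the URL is inferred from the project name; override REPO_URL if needed.
REPO_URL="${REPO_URL:-https://github.com/dathere/covid19-time-series-utilities.git}"

# init_submodules DIR: register and fetch every submodule of the repo in DIR.
# Equivalent to running `git submodule init` followed by `git submodule update`.
init_submodules() {
  git -C "$1" submodule update --init
}

# clone_with_data DEST: clone the project into DEST, then fetch the JHU submodule.
clone_with_data() {
  git clone "$REPO_URL" "$1" && init_submodules "$1"
}
```

For example, clone_with_data covid19-time-series-utilities clones into a directory of that name. Recent versions of git can also do both steps at once with git clone --recurse-submodules.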

Content

The files in this directory and how they're used:

  • covid-19_ingest.sh: script that converts the JHU COVID-19 daily-report data to a time-series database using TimescaleDB.
  • covid-refine: OpenRefine automation script that converts JHU COVID-19 time-series data into a normalized, enriched format and uploads it to TimescaleDB. (RECOMMENDED)
  • schema.sql: data definition language (DDL) script that creates the necessary tables & hypertables.
  • environment: Default environment values used in Docker containers.

Using the Timescale covid19-ingest script

  1. Create a TimescaleDB instance (download or sign up)
  2. Create a database named covid_19 and an application user named covid19_user

    psql
    create database covid_19;
    create user covid19_user WITH PASSWORD 'your-password-here';
    alter database covid_19 OWNER TO covid19_user;
    \quit

  3. Run schema.sql as covid19_user. VACUUM and ANALYZE require owner privileges.

    psql -U covid19_user -h <the.server.hostname> -f schema.sql covid_19

  4. Install csvkit

    • Ubuntu: sudo apt-get install csvkit
    • macOS: using Homebrew, run brew install csvkit
  5. Using a text editor, replace the values of the environment variables PGHOST, PGUSER and PGPASSWORD in covid-19_ingest.sh

  6. Run the script

    bash covid-19_ingest.sh

  7. (OPTIONAL) Add the shell script to crontab to run daily

  8. Slice and dice the data using the full power of PostgreSQL along with Timescale's time-series capabilities!
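
For the optional crontab step, one possible approach is a small wrapper that runs the ingest script from the repo directory and keeps a dated log. This is only a sketch: REPO_DIR, LOG_DIR, the function name and the crontab path are all hypothetical and not defined by this project.

```shell
#!/usr/bin/env bash
# Hypothetical cron wrapper: REPO_DIR and LOG_DIR are illustrative defaults,
# not paths defined by this project. Adjust both to your setup.
REPO_DIR="${REPO_DIR:-$HOME/covid19-time-series-utilities}"
LOG_DIR="${LOG_DIR:-$HOME/covid19-ingest-logs}"

# run_daily_ingest: run covid-19_ingest.sh from the repo directory and append
# its output (stdout and stderr) to a log file named after today's date.
run_daily_ingest() {
  mkdir -p "$LOG_DIR" || return 1
  ( cd "$REPO_DIR" && bash covid-19_ingest.sh ) >> "$LOG_DIR/ingest-$(date +%F).log" 2>&1
}

# Example crontab entry (hypothetical path), running every day at 06:00:
#   0 6 * * * /bin/bash /path/to/run_daily_ingest.sh
```

Saved as a standalone script, the file would end with a call to run_daily_ingest. The function returns a non-zero status if the repo directory is missing or the ingest script fails, so cron mail or monitoring can pick up failures.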

Using COVIDrefine

NOTE: Due to the changing file format of JHU's daily report data, covid-refine is recommended over covid-19_ingest.sh. COVIDrefine has the added benefit of producing fully normalized, non-sparse, geo-enriched data.

See the detailed README.

If you just want to download the COVIDrefine data, the latest version can be found here.

Using docker-compose

  1. Remember to initialize the submodule: run git submodule init and git submodule update
  2. Run docker-compose build
  3. Run docker-compose up
  4. That's all. You can now go to Swagger or PostgREST
NOTES

  • the JHU COVID-19 repository is a git submodule. This was done to automate getting the latest data from their repo.
  • the scripts only work in a *nix environment (Linux, Unix, macOS)
  • both scripts maintain a hidden directory called ~/.covid-19 in your home directory.
    • covid-19_ingest.sh checks the lastcsvprocessed file in that directory. Delete that file to process all daily-report files from the beginning, or change the date in the file to start processing files AFTER the entered date.
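
The two ways of resetting the lastcsvprocessed marker could be sketched as follows. The function names are hypothetical, and the date format is an assumption: check what the script itself wrote (for example with cat ~/.covid-19/lastcsvprocessed) and reuse that format.

```shell
#!/usr/bin/env bash
# Marker file that, per the notes above, covid-19_ingest.sh checks on each run.
STATE_FILE="${STATE_FILE:-$HOME/.covid-19/lastcsvprocessed}"

# reprocess_all: delete the marker so the next run starts from the first daily report.
reprocess_all() {
  rm -f "$STATE_FILE"
}

# reprocess_after DATE: overwrite the marker so only files after DATE are processed.
# Assumption: DATE must match the format the ingest script writes to this file.
reprocess_after() {
  mkdir -p "$(dirname "$STATE_FILE")" && printf '%s\n' "$1" > "$STATE_FILE"
}
```

After either call, run bash covid-19_ingest.sh again to pick up the change.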

TODO

  • use PostgREST to add a REST API in front of the TimescaleDB database
  • create a Grafana dashboard
  • create a Carto visualization
  • create a Superset visualization

ACKNOWLEDGEMENTS

  • thanks to Avtar Sewrathan (@avthars), Prashant Sridharan (@CoolAssPuppy) and Mike Freedman (@michaelfreedman) at Timescale for their help & support in taking this project from idea to implementation in 5 days!
  • thanks to Julian Simioni (@orangejulius) at Geocode.earth for allowing us to use the Geocode.earth API!

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
