corona-zahlen-landkreis / corona_landkreis_fallzahlen_scraping

Licence: GPL-3.0 license
Scraping the websites of Germany's local districts for newer corona case numbers!

Programming Languages

Python, JavaScript, PHP, Vue, Shell, Dockerfile

Web crawler for corona case numbers in Germany (county or sub-county level)

69 Landkreise, Stadtkreise and kreisfreie Städte (counties, urban districts and district-free cities) are currently supported.

6 of those also have sub-county (Stadt/Gemeinde) level support.

WE NEED YOUR HELP 🙌

Inspiration

The official data published by the RKI and the federal states is often several days old. What could be more obvious than to retrieve this data directly from the websites of the administrative districts (counties)? That way the numbers come "directly from the source" and are as up to date as possible.

In addition, a crowdsourcing website is in the works on which current case numbers per county (including the source) can be submitted.

Want to participate? No problem!

Search the press-release website of your district (or of neighboring districts) and try to answer the following questions:

  • Check whether we already collect the data, i.e. whether there is a script for your district in landkreise/get-<location>.py.
  • If not: where are the newest corona case numbers for your district published?
    • Which words do we have to search for? (We also accept RegEx :-) )
  • Is there a list (RSS feed?) of press releases that includes the corona messages, in case the URL changes all the time?
  • Optional: with which URL/search words do we get the case numbers at sub-county level? (They might be in the district's press releases.)
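The search words asked for above typically end up in a regular expression that is applied to the text of a press release. The following is a minimal sketch of that idea. The sample sentence, the pattern and the function name extract_cases are assumptions made for illustration and are not taken from any existing scraper in this repository.

```python
import re

# Hypothetical press-release sentence; real district pages are worded differently.
TEXT = "Stand 02.04.2020: Im Kreis gibt es 1.234 bestätigte Corona-Fälle, davon 17 Todesfälle."

# Hypothetical search pattern: a number (optionally with the German thousands
# separator ".") followed by the search words "bestätigte Corona-Fälle".
CASES_RE = re.compile(r"(\d{1,3}(?:\.\d{3})*|\d+)\s+bestätigte\s+Corona-Fälle")


def extract_cases(text):
    """Return the case count as an int, or None if the pattern does not match."""
    match = CASES_RE.search(text)
    if match is None:
        return None
    # German number formatting uses "." as thousands separator, so strip it.
    return int(match.group(1).replace(".", ""))


print(extract_cases(TEXT))  # prints 1234
```

Returning None instead of raising lets a caller distinguish "page layout changed" from "zero cases", which is useful when a district rewords its press releases.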

How to start the included API service

See API README Website where people can choose a district, enter the current number of cases (including status date) and source. (and a backend which forwards these requests with the possibility to access these data and include it in the data set.)

How to use the web-crawler - scraper

Scraper that also generates community level output:

./run.sh
# or to update a single district / community
# run any of the get-<location>.py files
python3 landkreise/get-soest.py

Newer abstracted scraper (depends on scrape.py):

# SCRAPER debug mode/logging can be enabled as follows
SCRAPER_DEBUG=yes python3 landkreise/get-aachen.py
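How the scrapers evaluate SCRAPER_DEBUG internally is not described here. As a rough sketch, an environment switch like this is often wired to Python's logging module as shown below. The function names and the assumption that only the value "yes" enables debug output are hypothetical.

```python
import logging
import os


def debug_enabled(environ=os.environ):
    """True if SCRAPER_DEBUG is set to "yes" (assumed; the real check may differ)."""
    return environ.get("SCRAPER_DEBUG", "").strip().lower() == "yes"


def configure_logging(environ=os.environ):
    """Set the root log level depending on SCRAPER_DEBUG and return that level."""
    level = logging.DEBUG if debug_enabled(environ) else logging.WARNING
    logging.basicConfig(level=level, format="%(levelname)s %(name)s: %(message)s")
    return level


if __name__ == "__main__":
    configure_logging()
    # This message is only emitted when the script is started with SCRAPER_DEBUG=yes.
    logging.getLogger("scraper").debug("debug logging is active")
```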

The result data is saved in landkreise/data/ as CSV files. Each CSV file is named after its official ID. The districtId and communityId values can be looked up in sources.csv or in the JSON file location.json (the replacement for sources.csv in master). The unique district/community IDs are the official identifiers described at https://de.wikipedia.org/wiki/Amtlicher_Gemeindeschl%C3%BCssel.

District data:

landkreise/data/0<districtId>.csv

Community data (hopefully aggregates to the district):

landkreise/data/0<districtId>/<communityId>.csv
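The path layout above can be expressed as two small helpers. This is only a sketch that follows the two patterns literally. The function names and the example IDs are hypothetical, and it assumes the district ID is passed without the leading "0" that the patterns prepend.

```python
from pathlib import Path

DATA_DIR = Path("landkreise/data")


def district_csv(district_id):
    """Path of a district file: landkreise/data/0<districtId>.csv"""
    return DATA_DIR / f"0{district_id}.csv"


def community_csv(district_id, community_id):
    """Path of a community file: landkreise/data/0<districtId>/<communityId>.csv"""
    return DATA_DIR / f"0{district_id}" / f"{community_id}.csv"


# Placeholder IDs for illustration only; look up real ones in sources.csv or location.json.
print(district_csv(1234).as_posix())
print(community_csv(1234, 1234056).as_posix())
```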

Requirements:

  • Python3
    • requests (often included)
    • CacheControl[filecache] (persistent cache does not work yet)
      • lockfile
    • Beautiful Soup 4
    • tqdm (for running all scrapers)

pip3 line:

pip3 install requests bs4 cachecontrol[filecache] lockfile tqdm

Debian/Ubuntu packages:

sudo apt-get install python3 python3-bs4 python3-cachecontrol python3-lockfile python3-tqdm

A Makefile for the Docker container also exists.
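To see quickly whether the Python requirements are installed, a short check like the following can help. It is a sketch that uses only the standard library. The import names are the usual ones for the listed packages (bs4 for Beautiful Soup 4), and the function name missing_modules is made up for this example.

```python
import importlib.util

# Import names of the packages listed in the requirements above.
REQUIRED = ["requests", "bs4", "cachecontrol", "lockfile", "tqdm"]


def missing_modules(names=REQUIRED):
    """Return the module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```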

Writing / Maintaining a scraper

Newer abstracted scraper:

# For all scrapers that have been migrated to use scraper.py:
#   requests are cached automatically
#   user agents rotate
#   debug mode is available
#
# SCRAPER debug mode/logging can be enabled as follows
SCRAPER_DEBUG=yes python3 landkreise/get-fulda.py

New scrapers should use scraper.py and load URLs with request_url. This should cache the website responses and reduce data transfer. The user agent should also rotate, at least on every scraper start.
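The behaviour described for request_url (cached responses, a user agent that changes per scraper start) can be illustrated with a small standard-library sketch. This is not the project's implementation, which builds on requests and CacheControl. The class CachingFetcher, its in-memory cache and the user-agent strings are assumptions for illustration only.

```python
import random
import urllib.request

# Example user-agent strings; any realistic list would do.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
]


class CachingFetcher:
    """Fetch URLs once per run and reuse the response for repeated requests."""

    def __init__(self, fetch=None, user_agents=USER_AGENTS):
        # The fetch function is injectable so the class can be tested offline.
        self._fetch = fetch or self._http_get
        self._cache = {}
        # One user agent is chosen per instance, i.e. per scraper start.
        self.user_agent = random.choice(user_agents)

    @staticmethod
    def _http_get(url, user_agent):
        request = urllib.request.Request(url, headers={"User-Agent": user_agent})
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.read().decode("utf-8", errors="replace")

    def get(self, url):
        """Return the page text, hitting the network only on the first request."""
        if url not in self._cache:
            self._cache[url] = self._fetch(url, self.user_agent)
        return self._cache[url]
```

A scraper would create one CachingFetcher at start-up and call get(url) wherever it needs a page. The cache here lives only in memory, whereas a file-based cache would also survive between runs.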

Unparsable Landkreise

It would be nice if you could check these for new data and open a PR!

name website id
Kreis Vorpommern-Greifswald https://www.kreis-vg.de/'/index.php?object=tx%7C3079.14723.1%27 13075

For more, see the project tab!
