github / Covid 19 Repo Data

Licence: cc0-1.0
Data archive of identifiable COVID-19 related public projects on GitHub

Projects that are alternatives to, or similar to, Covid 19 Repo Data

Openml R
R package to interface with OpenML
Stars: ✭ 81 (-65.68%)
Mutual labels:  open-data, dataset
Atsd Use Cases
Axibase Time Series Database: Usage Examples and Research Articles
Stars: ✭ 335 (+41.95%)
Mutual labels:  open-data, dataset
Adresse.data.gouv.fr
Le site officiel de l'Adresse
Stars: ✭ 117 (-50.42%)
Mutual labels:  open-data, dataset
Geodata Br
Free open public domain geographic data of Brazil available in multiple languages and formats.
Stars: ✭ 57 (-75.85%)
Mutual labels:  open-data, dataset
Fma
FMA: A Dataset For Music Analysis
Stars: ✭ 1,391 (+489.41%)
Mutual labels:  open-data, dataset
Crypto
Cryptocurrency Historical Market Data R Package
Stars: ✭ 112 (-52.54%)
Mutual labels:  open-data, dataset
Awesome Italian Public Datasets
A selection of interesting Open dataset from the Italian Public Administration and Civic Data use cases
Stars: ✭ 132 (-44.07%)
Mutual labels:  open-data, dataset
Datatable
A go in-memory table
Stars: ✭ 215 (-8.9%)
Mutual labels:  dataset
Vehicle reid Collection
🚗 the collection of vehicle re-ID papers, datasets. 🚗
Stars: ✭ 225 (-4.66%)
Mutual labels:  dataset
Dialogrpt
EMNLP 2020: "Dialogue Response Ranking Training with Large-Scale Human Feedback Data"
Stars: ✭ 216 (-8.47%)
Mutual labels:  dataset
Ava downloader
⏬ Download AVA dataset (A Large-Scale Database for Aesthetic Visual Analysis)
Stars: ✭ 214 (-9.32%)
Mutual labels:  dataset
Bccd dataset
BCCD (Blood Cell Count and Detection) Dataset is a small-scale dataset for blood cells detection.
Stars: ✭ 216 (-8.47%)
Mutual labels:  dataset
Stocknet Dataset
A comprehensive dataset for stock movement prediction from tweets and historical stock prices.
Stars: ✭ 228 (-3.39%)
Mutual labels:  dataset
Dataset Serialize
JSON to DataSet and DataSet to JSON converter for Delphi and Lazarus (FPC)
Stars: ✭ 213 (-9.75%)
Mutual labels:  dataset
Datalad
Keep code, data, containers under control with git and git-annex
Stars: ✭ 234 (-0.85%)
Mutual labels:  dataset
Short Jokes Dataset
Python scripts for building 'Short Jokes' dataset, featured on Kaggle
Stars: ✭ 215 (-8.9%)
Mutual labels:  dataset
Img2poem
Stars: ✭ 238 (+0.85%)
Mutual labels:  dataset
Datasets
source{d} datasets ("big code") for source code analysis and machine learning on source code
Stars: ✭ 231 (-2.12%)
Mutual labels:  dataset
Stationary
Get hourly meteorological data from one of thousands of global stations
Stars: ✭ 225 (-4.66%)
Mutual labels:  dataset
Automated Resume Screening System
Automated Resume Screening System using Machine Learning (With Dataset)
Stars: ✭ 224 (-5.08%)
Mutual labels:  dataset

COVID-19 Public Repository Data

A comprehensive, versioned dataset of repositories and their relevant metadata, covering public projects hosted on GitHub that relate to the 2019 Novel Coronavirus and the associated COVID-19 disease.

For a view of the latest projects, see the covid-19 topic on GitHub. To preview and interact with the data provided, see the subsection below.

Why is GitHub doing this?

We have received a number of enquiries from researchers and the community about open collaboration on projects on the platform related to COVID-19, the disease caused by the SARS-CoV-2 virus. Many projects, ordered by star count, can be found using the covid-19 topic on GitHub. However, discovering other important projects is difficult because users self-identify their work in different ways. There are some great awesome lists, such as https://github.com/soroushchehresa/awesome-coronavirus, that document useful projects, but they are not versioned over time.

As this is such an important topic to many people at this time, we've decided to take regular, versioned extracts of data from our systems and make them available to researchers under an open license, so that teams outside of GitHub can analyze these public projects in more depth.

If you have created any interesting research based on this data, we would love to hear about it so that we can help feature it more prominently. Please open a PR against the file USER_SUBMISSIONS.md with a link to your research. We are especially interested in highlighting the most promising and impactful projects in need of community help and support.

Open data

Open source is bigger than any company or community. The dataset is released under CC0-1.0 for anyone to use and learn from.

There are two main sets of files, released in TSV and JSON formats for public consumption in the directory data/. A comprehensive data dictionary that explains the contents of these files is here. The files are sorted in descending order by the count of distinct contributors at the time of extract.
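As a rough illustration of working with one of the TSV extracts, the sketch below parses tab-separated rows and checks the descending contributor ordering described above. The sample rows and the column names repo_name and count_distinct_contributors are assumptions made for the example (license_name is the one field named in this README), so consult the data dictionary for the real schema.

```python
import csv
import io

# Hypothetical sample standing in for one TSV extract from data/.
# Column names other than license_name are assumptions, not the real schema.
SAMPLE_TSV = (
    "repo_name\tlicense_name\tcount_distinct_contributors\n"
    "example-org/tracker\tMIT License\t12\n"
    "example-user/dashboard\t\t3\n"
)


def read_extract(handle):
    """Parse a tab-separated extract into a list of dicts, one per repository."""
    return list(csv.DictReader(handle, delimiter="\t"))


def is_sorted_by_contributors(rows, column="count_distinct_contributors"):
    """Return True if rows are in descending order of the contributor count."""
    counts = [int(row[column]) for row in rows]
    return all(a >= b for a, b in zip(counts, counts[1:]))


rows = read_extract(io.StringIO(SAMPLE_TSV))
print(len(rows), is_sorted_by_contributors(rows))
```

To read a real file, pass an open file handle (for example, one opened with newline="" and encoding="utf-8") to read_extract in place of the in-memory sample.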

The files have been versioned based on a weekly snapshot of identified repositories from the week of 2020-01-20 onward.
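The weekly versioning can be pictured with a small date calculation. This is only a sketch: it assumes snapshot weeks start on Monday (2020-01-20 was a Monday) and that a repository is assigned to the week in which it was created. The function names are hypothetical and say nothing about how the data files are actually named.

```python
from datetime import date, timedelta

# First snapshot week stated in the README; 2020-01-20 was a Monday.
FIRST_SNAPSHOT_WEEK = date(2020, 1, 20)


def snapshot_week(created: date) -> date:
    """Return the Monday that starts the week containing `created` (assumed convention)."""
    return created - timedelta(days=created.weekday())


def week_index(created: date) -> int:
    """Count whole weeks from the first snapshot week to the week of `created`."""
    return (snapshot_week(created) - FIRST_SNAPSHOT_WEEK).days // 7


example = date(2020, 3, 11)  # an arbitrary creation date for illustration
print(snapshot_week(example), week_index(example))
```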

We will update this repository with new data files on a monthly basis, generally on the first Tuesday of each month. We will revisit this commitment each month and provide an update on whether it will continue.

Identification methodology

Rather than relying on any one GitHub topic to find potential COVID-19 related projects, the dataset is produced using a more comprehensive set of search criteria that identifies projects likely to be COVID-19 related.

Note: This approach may include a small number of false positives. However, we decided it was better to cast a wide net and allow consumers of the data to perform additional cleaning if they wish.

Furthermore, since this data is versioned based on the week the repo was initially created, it may include data for repos that were public at the time of extract but have since been made private and are currently inaccessible.

The following parts of public metadata are currently used to identify public projects (licensed or not) as COVID-19 related:

  • The repo's description
  • The name of the repo
  • The topics associated with the repo
  • The organization bio description where that exists

Search terms matched against these metadata include variations of: covid, coronavirus, ncov, and sars-cov-2.
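The matching described above can be approximated with a case-insensitive keyword search across the four metadata fields. This is a minimal sketch, not the query GitHub actually runs. The dictionary keys are hypothetical, and since the exact "variations" are unspecified, optional separators in sars-cov-2 are assumed here.

```python
import re

# Terms named in the README. Treating the hyphens in "sars-cov-2" as optional
# or replaceable by "_" or a space is an assumption about the "variations".
PATTERN = re.compile(r"covid|coronavirus|ncov|sars[-_ ]?cov[-_ ]?2", re.IGNORECASE)

# Hypothetical keys for the four metadata sources listed above.
FIELDS = ("name", "description", "topics", "org_description")


def is_covid_related(repo: dict) -> bool:
    """Return True if any listed metadata field contains one of the search terms."""
    for field in FIELDS:
        value = repo.get(field) or ""
        if isinstance(value, (list, tuple)):
            value = " ".join(value)  # topics may arrive as a list of strings
        if PATTERN.search(value):
            return True
    return False


print(is_covid_related({"name": "nCoV-tracker"}))
print(is_covid_related({"name": "weather-app", "description": None}))
```

Because this is plain substring matching, unrelated repositories whose metadata happens to contain one of the terms would also match, which is the kind of false positive the note above warns about.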

License

The data and associated documentation in this repo are open data released under the very permissive CC0-1.0 public domain dedication. However, please understand:

  • Third party rights:
    • Each project referenced is licensed under its own terms (see the license_name field in the extract, and visit individual project repositories for details).
    • Users or others may have rights to user-provided data such as repository, organization, and user names and descriptions.
    • If you're unsure about your right to use any user-provided data or material from referenced projects, it's up to you to verify your rights.
  • Open data norms:
    • If you use this dataset in a publication, a link to or citation of this repository would be appreciated.
    • If you extend this dataset, sharing your additions as open data would also be appreciated.
  • If you use this dataset as a starting point for further research which involves accessing and using additional GitHub data, you will need to abide by our privacy statement and related terms.
  • CC0-1.0 does not grant any trademark permissions. GitHub® and its stylized versions and the Invertocat mark are GitHub's Trademarks or registered Trademarks. When using GitHub's logos, be sure to follow the GitHub logo guidelines.