SARS-CoV-2 Contextual Data Specification - Collection template and associated materials for SARS-CoV-2 metadata

PHA4GE overview

The Public Health Alliance for Genomic Epidemiology (PHA4GE) is a global coalition that is actively working to establish consensus standards, document and share best practices, improve the availability of critical bioinformatic tools and resources, and advocate for greater openness, interoperability, accessibility and reproducibility in public health microbial bioinformatics.

In the face of the current SARS-CoV-2 pandemic, PHA4GE has identified a clear and present need for a fit-for-purpose, open source SARS-CoV-2 contextual data standard. As such, we have developed a SARS-CoV-2 contextual data specification based on harmonizable, publicly available, community standards.

SARS-CoV-2 contextual data specification overview

Public health genomics contextual data includes sample metadata, lab/clinical/epidemiological data, and analysis methods information. Contextual data enables the interpretation of the sequence data, informs decision making for public health responses, and facilitates scientific understanding of infectious disease. Structured and consistent contextual data can also be more easily processed, aggregated, and reused by both humans and computers for different types of analyses.

The SARS-CoV-2 contextual data specification includes a metadata collection template, reference guides, controlled vocabulary, and mapping to existing standards. We also provide a suite of protocols for submitting sequence data and contextual data to public repositories, enabling global interoperability of the data. The specification and all of the supporting materials are freely available and detailed below.

Content description

SARS-CoV-2 contextual data specification package

Spreadsheet-based (.xlsx) collection template

It contains the following items (tabs in the spreadsheet):

  1. a template for populating the complete set of contextual data;

The collection template contains "required" (colour-coded yellow), "strongly recommended" (colour-coded purple) and "optional" (colour-coded white) fields.

  2. guidance for populating the template;

The reference guide aims to facilitate the use of the collection template. It contains field definitions, further guidance/instructions, and examples of structured data.

  3. ontology-mapped controlled vocabulary for the picklists.

Lists of controlled vocabulary, agreed upon by PHA4GE, are provided here for populating the template.

The specification contextual data collection template in machine-amenable JSON format

Because of limitations of the JSON format, this version deviates slightly from the collection template: the "required" (colour-coded yellow) fields are set as required, while the "strongly recommended" (colour-coded purple) and "optional" (colour-coded white) fields are both set as optional.
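The collapse from three requirement levels to two can be sketched as follows. This is a minimal illustration only: the field names, the `build_schema` helper, and the `missing_required` helper are hypothetical and are not taken from the published JSON file.

```python
import json

# Hypothetical field names with their template requirement levels;
# the real template defines its own fields.
TEMPLATE_FIELDS = {
    "specimen_collector_sample_id": "required",
    "sample_collection_date": "required",
    "host_age": "strongly recommended",
    "purpose_of_sequencing": "optional",
}


def build_schema(fields):
    """Collapse three template levels into a required/not-required split."""
    return {
        "type": "object",
        "properties": {name: {"type": "string"} for name in fields},
        # Only "required" survives; the other two levels become optional.
        "required": sorted(
            name for name, level in fields.items() if level == "required"
        ),
    }


def missing_required(record, schema):
    """Return the required fields that are absent or empty in a record."""
    return [field for field in schema["required"] if not record.get(field)]


schema = build_schema(TEMPLATE_FIELDS)
record = {"specimen_collector_sample_id": "sample-001", "host_age": "42"}
print(json.dumps(schema["required"]))
print(missing_required(record, schema))
```

In this sketch a "strongly recommended" field such as `host_age` is indistinguishable from an "optional" one once the schema is built, which is the deviation from the spreadsheet described above.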

Standard operating procedure

A Standard Operating Procedure (SOP) containing instructions for using the collection template. This SOP provides users with step-by-step instructions for populating the template, looking up standardized terms, and structuring sample descriptions. Also included are a number of ethical, practical, and privacy considerations for data sharing. The SOP is available on protocols.io under the DOI dx.doi.org/10.17504/protocols.io.btpznmp6.

Supporting materials

PHA4GE to WHO and sequence repository field mappings

WHO recommended contextual data mapping to PHA4GE fields

PHA4GE fields are mapped to corresponding contextual data elements recommended by the World Health Organization (World Health Organization. Guidance for surveillance of SARS-CoV-2 variants: interim guidance. WHO/2019-nCoV/surveillance/variants2021.1).

PHA4GE to sequence repository field mappings

A mapping file indicating which PHA4GE fields correspond to which fields within the different repository submission forms is provided to facilitate data transformations for submissions. Field mappings to the following repositories are available in the document:

  • GISAID
  • ENA
  • NCBI
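A field mapping of this kind could be applied programmatically along the following lines. This is a sketch under stated assumptions: the PHA4GE and repository field names in `FIELD_MAP` are illustrative placeholders rather than entries copied from the mapping file, and `to_repository` is a hypothetical helper.

```python
# Illustrative mapping only: the PHA4GE and repository field names below
# are assumptions, not copied from the published mapping file.
FIELD_MAP = {
    "NCBI": {
        "sample_collection_date": "collection_date",
        "host_scientific_name": "host",
    },
    "GISAID": {
        "sample_collection_date": "Collection date",
        "host_scientific_name": "Host",
    },
}


def to_repository(record, repository, field_map=FIELD_MAP):
    """Rename PHA4GE-style keys to a repository's field names.

    Fields with no mapping are returned separately so that nothing is
    dropped silently.
    """
    mapping = field_map[repository]
    mapped, unmapped = {}, {}
    for key, value in record.items():
        if key in mapping:
            mapped[mapping[key]] = value
        else:
            unmapped[key] = value
    return mapped, unmapped


example = {"sample_collection_date": "2020-03-16", "lab_note": "resequenced"}
mapped, unmapped = to_repository(example, "NCBI")
print(mapped)
print(unmapped)
```

Returning the unmapped fields rather than discarding them makes it easier to spot contextual data that a given repository's submission form cannot hold.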

Submission protocols

Submission protocols are available for the following data repositories:

NCBI

The entire collection of submission protocols is available as a workflow at protocols.io under the DOI dx.doi.org/10.17504/protocols.io.bsypnfvn. This workflow includes the following individual protocols:

  1. PHA4GE contextual metadata SOP: This SOP provides users with step-by-step instructions for populating the collection template. The SOP is available on protocols.io under the DOI dx.doi.org/10.17504/protocols.io.btpznmp6.

  2. Overview of NCBI's submission process and the metadata required: Provides an overview of the submission process and includes a brief training video. The protocol is available at protocols.io under the DOI dx.doi.org/10.17504/protocols.io.bsbpnamn.

  3. SARS-CoV-2 NCBI submission protocol: SRA, BioSample, and BioProject: Step-by-step instructions for establishing a new NCBI laboratory submission account, creating and linking a new BioProject to an existing umbrella effort, and submitting raw sequence data with associated metadata to SRA and BioSample. The protocol is available at protocols.io under the DOI dx.doi.org/10.17504/protocols.io.bsypnfvn.

  4. SARS-CoV-2 NCBI consensus submission protocol: GenBank: Step-by-step instructions for submitting SARS-CoV-2 consensus sequences to NCBI GenBank and linking them to existing BioProject, BioSamples, and raw data. The protocol is available at protocols.io under the DOI dx.doi.org/10.17504/protocols.io.bid7ka9n.

ENA

The entire collection of submission protocols is available as a workflow at protocols.io under the DOI dx.doi.org/10.17504/protocols.io.buqnnvve.

Three separate protocols are included:

  1. SOP for populating EBI submission templates: The protocol is available at protocols.io under the DOI dx.doi.org/10.17504/protocols.io.bh5dj826.

  2. SARS-CoV-2 EBI submission protocol: ENA, BioSample, and BioProject: The protocol is available at protocols.io under the DOI dx.doi.org/10.17504/protocols.io.bhwdj7a6.

  3. SARS-CoV2 EBI assembly submission protocol: The protocol is available at protocols.io under the DOI dx.doi.org/10.17504/protocols.io.bhwqj7dw.

GISAID

This protocol provides the steps needed to establish a new GISAID submission environment for your laboratory. Once that environment is established, the protocol covers the submission of genomes and sample metadata to GISAID. The protocol is available on protocols.io under the DOI dx.doi.org/10.17504/protocols.io.bumknu4w.

JSON Specification Generation

The JSON is produced automatically from the CSV version of the template using the script available from the SARS-CoV-2-Data-Spec-JSON repository.

Table 1. Terms for the SARS-CoV-2 submission template according to the PHA4GE contextual data collection specification, in PHA4GE SARS-CoV-2 Standardised Terms

| Column | Description |
| --- | --- |
| Interface Label | Column headers in the submission template |
| Required/Optional | Type of requirement according to PHA4GE's template specification. Limited to the values "Optional", "Recommended" and "Required". |
| Definition | Short description for the expected interface label value. |
| Ontology | Ontology ID for the label |
| Value Type | Expected interface label's value type. Expected values: "String", "Int", "Float", "Bioproject_ID", "Biosample_ID", "SRA_ID", "Genbank_ID", "GISAID_ID", "Email", "Date" and "Integer_or_Range". |
| Example | Example for the expected interface label value. |
| Guidance | Detailed description for the expected interface label value. |
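To make the CSV-to-JSON step more concrete, here is a minimal sketch of how a terms table with the columns listed above might be turned into a JSON-Schema-like structure. It is not the script from the SARS-CoV-2-Data-Spec-JSON repository: the two sample rows, the `JSON_TYPES` translation, and the `terms_to_schema` function are all assumptions made for illustration.

```python
import csv
import io
import json

# Two made-up rows using the Table 1 column headers; real rows come from
# the CSV export of the template.
SAMPLE_CSV = """Interface Label,Required/Optional,Definition,Ontology,Value Type,Example,Guidance
sample_collection_date,Required,Date the sample was collected.,,Date,2020-03-16,Use ISO 8601.
host_age,Recommended,Age of the host at sampling.,,Integer_or_Range,79,Leave blank if unknown.
"""

# Assumed translation of some Table 1 value types to JSON Schema types;
# anything not listed falls back to "string".
JSON_TYPES = {"Int": "integer", "Float": "number"}


def terms_to_schema(csv_text):
    """Build a JSON-Schema-like dict from the terms table."""
    properties, required = {}, []
    for row in csv.DictReader(io.StringIO(csv_text)):
        label = row["Interface Label"]
        properties[label] = {
            "type": JSON_TYPES.get(row["Value Type"], "string"),
            "description": row["Definition"],
            "examples": [row["Example"]],
        }
        # "Recommended" and "Optional" rows are both left out of "required".
        if row["Required/Optional"] == "Required":
            required.append(label)
    return {"type": "object", "properties": properties, "required": required}


schema = terms_to_schema(SAMPLE_CSV)
print(json.dumps(schema, indent=2))
```

Specialised value types such as "Date" or "Integer_or_Range" are simply treated as strings here; a fuller implementation would presumably attach formats or patterns to them.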

Contacts

For more information and/or assistance, contact [email protected] or open an issue on the issue page of this repository.

Citation

Emma J Griffiths, Ruth E Timme, Catarina Inês Mendes, Andrew J Page, Nabil-Fareed Alikhan, Dan Fornika, Finlay Maguire, Josefina Campos, Daniel Park, Idowu B Olawoye, Paul E Oluniyi, Dominique Anderson, Alan Christoffels, Anders Gonçalves da Silva, Rhiannon Cameron, Damion Dooley, Lee S Katz, Allison Black, Ilene Karsch-Mizrachi, Tanya Barrett, Anjanette Johnston, Thomas R Connor, Samuel M Nicholls, Adam A Witney, Gregory H Tyson, Simon H Tausch, Amogelang R Raphenya, Brian Alcock, David M Aanensen, Emma Hodcroft, William W L Hsiao, Ana Tereza R Vasconcelos, Duncan R MacCannell, on behalf of the Public Health Alliance for Genomic Epidemiology (PHA4GE) consortium, Future-proofing and maximizing the utility of metadata: The PHA4GE SARS-CoV-2 contextual data specification package, GigaScience, Volume 11, 2022, giac003, https://doi.org/10.1093/gigascience/giac003

License

CC-BY 4.0 International
