greenelab / Hgsc_subtypes

License: BSD-3-Clause
Two or three subtypes of high-grade serous ovarian cancer fit data from different populations better than four

High-Grade Serous Ovarian Cancer Subtypes - Why has the field settled on four?

Summary

In this repository, we compare high-grade serous ovarian cancer (HGSC) subtypes across Australian, American, and Japanese populations. We determine that two or three subtypes are most consistent across different datasets. A full report of this analysis is published in G3: Genes, Genomes, Genetics (Way et al. 2016). Instructions to reproduce the analysis are provided in release version 1.3.

We use data extracted from the Bioconductor package curatedOvarianData (Ganzfried et al. 2013) as well as a dataset we uploaded to GEO (GSE74357). We apply a unified, unsupervised bioinformatics pipeline to compare subtypes across these populations and determine that specific subtypes are reliably identified. The most replicable subtypes are mesenchymal-like and proliferative-like, and their sample representation was highly concordant with other independent clustering studies performed on single populations.

We are currently working on adding African American HGSC samples to this pipeline to determine the representation of HGSC subtypes in an additional population. This project is in development and will be associated with a future release.

Contact

For all analysis or coding-related questions, please file a GitHub issue.

Environment

To ensure analysis reproducibility, most packages are versioned using conda. The only exceptions are MCPcounter and ESTIMATE, which are downloaded by running install_custom.R.

To create a complete instance of this environment run the following:

conda env create --force --file environment.yml
source activate hgsc_subtypes

R --no-save < install_custom.R
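
Before running any analysis, it can help to confirm that the environment created above is actually active. The following is a minimal sketch, not part of the repository: the helper name env_is_active is hypothetical, and it assumes conda has set the CONDA_DEFAULT_ENV variable, which it does on activation.

```shell
#!/usr/bin/env bash
# Sketch: check whether the expected conda environment is active.
# env_is_active is a hypothetical helper, not provided by the repository.
# CONDA_DEFAULT_ENV is set by conda when an environment is activated.

env_is_active() {
  local expected="$1"
  # Succeeds only when the active environment name matches exactly.
  [ "${CONDA_DEFAULT_ENV:-}" = "$expected" ]
}

if env_is_active hgsc_subtypes; then
  echo "hgsc_subtypes environment is active"
else
  echo "hgsc_subtypes environment is not active; run: source activate hgsc_subtypes" >&2
fi
```

The check only compares the environment name; it does not verify that MCPcounter and ESTIMATE were installed by install_custom.R.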

Analyses

There are currently two pipelines in place to analyze HGSC subtypes. To reproduce the results of either pipeline, activate the hgsc_subtypes environment and run:

# Cross-population HGSC subtypes analysis 
bash hgsc_subtypes_pipeline.sh

# African American HGSC subtypes analysis
bash aaces_subtypes_pipeline.sh
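
Because a full pipeline run can take a while, keeping its output in a log file makes failures easier to trace. The sketch below is one possible wrapper under stated assumptions: the function run_pipeline and the logs directory are hypothetical and are not provided by the repository; only the two pipeline script names come from the commands above.

```shell
#!/usr/bin/env bash
# Sketch: run one pipeline script and keep a log of its output.
# run_pipeline and the default logs/ directory are hypothetical.

run_pipeline() {
  local script="$1"
  local log_dir="${2:-logs}"
  if [ ! -f "$script" ]; then
    echo "pipeline script not found: $script" >&2
    return 1
  fi
  mkdir -p "$log_dir"
  local log_file
  log_file="$log_dir/$(basename "$script" .sh).log"
  # Capture stdout and stderr; the function returns the script's exit status.
  bash "$script" > "$log_file" 2>&1
}

# Example, run from the repository root with the environment active:
# run_pipeline hgsc_subtypes_pipeline.sh
# run_pipeline aaces_subtypes_pipeline.sh
```

Each run overwrites the previous log for the same script, so copy a log aside if it needs to be kept.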

Data

All data was retrieved from curatedOvarianData except for the Mayo data and AACES data.

Acknowledgements

This work was supported by the Institute for Quantitative Biomedical Sciences (Dartmouth); The graduate program in Genomics and Computational Biology (Penn); The Norris Cotton Cancer Center Developmental Funds; the National Cancer Institute at the National Institutes of Health (R01 CA168758 to J.A.D., F31 CA186625 to J.R., R01 CA122443 to E.L.G.); The Mayo Clinic Ovarian Cancer SPORE (P50 CA136393 to E.L.G.); The Mayo Clinic Comprehensive Cancer Center-Gene Analysis Shared Resource (P30 CA15083); The Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative (grant number GBMF 4552 to C.S.G.); and The American Cancer Society (grant number IRG 8200327 to C.S.G.).
